Last Updated: 6 Apr, 2021

Repeated DNA Sequences

Moderate
Asked in companies
eBayBank Of AmericaAmadeus Lab

Problem statement

You are given a string of size ‘N’ consisting of only characters ‘a’,’c’,’g’ and ’t’. You have to find all the sub-strings of the given string of length 10 that is occurring more than once in the given string.

Input Format:
The first line of input contains an integer ‘T’ denoting the number of test cases. 

The first line of each test case consists of a single integer ‘N’ denoting the size of the string.

The next line consists of ‘N’ length string consisting of only  ‘a’,’c’,’g’ and ’t’.
Output Format:
For each test case, return all possible sub-strings of size 10 that occurs more than one time. You may return the substrings in any order.
Constraints:
1 <= ’T’ <= 50
1 <= ’N’ <= 5* 10^3
s[i] = {‘a’,‘c’,’g’ ,’t’ }

Time Limit: 1 sec
Note:
You do not need to print anything, it has already been taken care of. Just implement the given function.

Approaches

01 Approach

The main idea is to store the count of all substrings.

 

Algorithm:

  • Create a map ‘OP’ of string and int type.
  • Run a loop from 0 to ‘N - 10’ where ‘N’ is the length of the string.
  • Create an empty string temp. Run another loop from ‘i’ to ‘i’+10.Add all the characters in ‘TEMP’ string from ‘i’ to ‘i’ + 10.
  • Increase the count of temp substring in ‘OP’ map by 1.
  • After loops traverse over the map and the one which occurs more than once push it in the ‘ANS’ list.
  • Return the 'ANS’ list.