Repeated DNA Sequences

Moderate
0/80
Average time to solve is 30m
profile
Contributed by
4 upvotes
Asked in companies
eBayBank Of AmericaAmadeus Lab

Problem statement

You are given a string of size ‘N’ consisting of only characters ‘a’,’c’,’g’ and ’t’. You have to find all the sub-strings of the given string of length 10 that is occurring more than once in the given string.

Detailed explanation ( Input/output format, Notes, Images )
Input Format:
The first line of input contains an integer ‘T’ denoting the number of test cases. 

The first line of each test case consists of a single integer ‘N’ denoting the size of the string.

The next line consists of ‘N’ length string consisting of only  ‘a’,’c’,’g’ and ’t’.
Output Format:
For each test case, return all possible sub-strings of size 10 that occurs more than one time. You may return the substrings in any order.
Constraints:
1 <= ’T’ <= 50
1 <= ’N’ <= 5* 10^3
s[i] = {‘a’,‘c’,’g’ ,’t’ }

Time Limit: 1 sec
Note:
You do not need to print anything, it has already been taken care of. Just implement the given function.
Sample Input 1
2
12
aaaaaaaaaaab
20
aaagggccctaaagggccct
Sample Output 1
aaaaaaaaaa
aaagggccct
Sample Output Explanation
For First Test-case total number of substrings of size 10 are:-
[aaaaaaaaaa,aaaaaaaaaa,aaaaaaaaab]
From the above substrings, aaaaaaaaaa occurs more than once.

For SecondTestcase total number of substrings of size 10 are:-
[aaagggccct,aagggcccta,agggccctaa,gggccctaaa,ggccctaaag,gccctaaagg, ccctaaaggg, cctaaagggc ,ctaaagggcc ,taaagggccc ,aaagggccct]

From the above substrings, aaagggccct occurs more than once.
Sample Input 2
2
11
ggggggggggg
20
aaaaagggggaaaaaggggg
Sample Output 2
gggggggggg
aaaaaggggg
Hint

Store the count of substrings of length 10.

Approaches (1)
Mapping

The main idea is to store the count of all substrings.

 

Algorithm:

  • Create a map ‘OP’ of string and int type.
  • Run a loop from 0 to ‘N - 10’ where ‘N’ is the length of the string.
  • Create an empty string temp. Run another loop from ‘i’ to ‘i’+10.Add all the characters in ‘TEMP’ string from ‘i’ to ‘i’ + 10.
  • Increase the count of temp substring in ‘OP’ map by 1.
  • After loops traverse over the map and the one which occurs more than once push it in the ‘ANS’ list.
  • Return the 'ANS’ list.
Time Complexity

O(N) where ‘N’ is the length of the string.

 

We are running nested loops which will take N * 10 steps where n is the length of the string. Hence time complexity is O(N).

Space Complexity

O(N), where ‘N’ is the length of the string.

 

We are storing all substrings of size 10 and there are total N-9 substrings of size 10 where n is the length of the string. Hence space complexity is O(N).

Code Solution
(100% EXP penalty)
Repeated DNA Sequences
Full screen
Console