Optimal Superstring

Hard
0/120
Average time to solve is 10m
profile
Contributed by
19 upvotes
Asked in companies
BarclaysGoogle inc

Problem statement

You are given a set of strings. You have to find the length of the Optimal Superstring for this set.

Note:

1. A string is said to be the Superstring of a set of strings if all the strings of the set are present in it as a substring.

2. A Superstring is said to be an Optimal Superstring if its length is minimum out of all possible Superstrings for a given set of strings. 

3. Each string of the set consists of the uppercase English alphabet.
Detailed explanation ( Input/output format, Notes, Images )
Input Format:
The first line of the input contains an integer 'N', denoting the size of the set.

The following 'N' lines contain one string denoting the strings, part of the given set.
Output Format:
For each test case, print a single line containing a single integer denoting the length of the optimal Superstring.

The output of each test case will be printed in a separate line.
Note:
You don't have to print anything, it has already been taken care of. Just Implement the given function.
Constraints:
1 <= N <= 14
1 <= |Si| <= 100

Time Limit: 1 sec.
Sample Input 1:
2
AA
AAAA
Sample Output 1:
4
Explanation of sample input 1:
In string “AAAA”, both strings “AA” and “AAAA” are present as substrings.
Sample input 2:
3
ABC
CDE
EFG
Sample output 2:
7
Explanation of sample input 2:
In string “ABCDEFG”, all the strings given in the set are present as a substring.
Hint

Can we use DP with bitmasking to denote the already added strings?

Approaches (1)
DP with Bitmasks
  • First of all, we need to remove the strings which are already present as a substring in another string of the set.
  • Now for any 2 strings, we can have the following two cases:
    • Some part of both the strings overlap with each other(suffix or prefix).In this case, we can remove the common part from one of the string in order to utilise the same characters as a part of both strings.
    • They do not overlap with each other. In the case of non-overlapping strings, we need to add complete strings in the superstring.
  • In order to find the length of optimal superstring, we try out all possible permutations of the given set of strings and compute the length of superstring for each permutation.
  • One thing to observe here is that while appending a string to some superstring, only the last added string matters for updating the length of superstring not the all previous strings included in the superstring.
  • So instead of generating all possible permutations, we can use a recursive function which takes in last appended string (let’s i’th string) in the superstring and a mask where ‘1’ denotes the strings already included in the current superstring. And this function will return the minimum length of superstring for the given state.
  • The base condition for this function will be when all the strings are included in the superstring, which is when ‘mask’ contains 1’s equal to the size of the set of strings.
  • In cases where the base condition is not satisfied, we try to append all the non appended strings so far in the superstring and return the minimum length possible for superstring can be generated from the current state.
  • Also, we can reach a certain state multiple times, hence we store the results for each state in a table (let’s call it dp) for further calls.
  • Recurrence relation for a certain state will be:
    • Dp[i][mask] = min(dp[x][ mask | (1 << x)] + costOfCombining(i, x)), for all x in range (1, N) where x’th bit in the mask is unset (zero). Here costOfCombining(i, x) is additional characters to be added to superstring in order to append x’th string ahead of i’th string.
  • The cost of combining two string can be determined by storing the non-overlapping suffix of a current string with the previous string.
  • Calling this function with ‘i’ as -1 denoting no strings included so far and mask value as 0. And this will return the length of Optimal Superstring.
Time Complexity

O((N ^ 2) * 2 ^ N + (N * |L|) ^ 2), where ‘N’ is the size of the set and, ‘|L|’ is the length of string present in the set.

 

As we are computing values for all states which are (N * 2 ^ N) and for finding the value of each state we need to select which next string will be in the sequence, which takes extra O(N) operation per state. Additionally, we need to precompute the non-overlapping length, of all possible pairs of two strings.

Space Complexity

O(N * 2 ^ N + N ^ 2), where ‘N’ is the size of the set and, ‘|L|’ is the length of string present in the set.

 

As we need to store the answer for each possible state which are of order O(N * 2 ^ N) and also we need to store the non-overlapping length of all possible pairs of two strings.

Video Solution
Unlock at level 3
(75% EXP penalty)
Code Solution
(100% EXP penalty)
Optimal Superstring
Full screen
Console