Last Updated: 21 Oct, 2020

Distinct Subsequences

Moderate
Asked in companies
MicrosoftUberMeesho

Problem statement

You have been given string 'S' of length 'N' that may contain duplicate alphabets. Your task is to return the count of distinct subsequences of it.

For example:

For the given string “deed” :
The possible subsequences are {“”}, {“d”}, {“e”}, {“de”}, {“e”}, {“de”}, {“ee”}, {“dee”}, {“d”}, {“dd”}, {“ed”}, {“ded”}, {“ed”}, {“ded”}, {“eed”} and {“deed”}.

As, {“d”}, {“e”}, {“de”}, {“ed”} and {“ded”} are repeated. 

The distinct subsequences are {“”}, {“d”}, {“e”}, {“de”}, {“ee”}, {“dee”}, {“dd”}, {“ed”}, {“ded”}, {“eed”} and {“deed”}

Thus, the output will be 11. 

Note:

As the answer can be large, return your answer modulo 10^9  + 7.  
Input format:
The first line contains an integer 'T' which denotes the number of test cases or queries to be run. Then, the T test cases follow.

The first and only line of each test case or query contains the string 'S'. 
Output format:
For each test case, print a single line containing the count of distinct subsequences modulo 10^9+7.

The output of each test case will be printed in a separate line.

Note:

You do not need to print anything, it has already been taken care of. Just implement the given function.
Constraints:
1 <= T <= 10 
1 <= N <= 10 ^ 4 

where 'T' is the number of test cases and 'N' is the length of the given string 'S'.

Time limit: 1 sec.

Approaches

01 Approach

  • The IDea is to call a helper function so as to create all possible subsequences of the string which will contain:
    • String CUR = the possible subsequence of the string.
    • Int ID = the index of character which will either be included or excluded.
  • We maintain a set to count the distinct subsequences, each time a new subsequence is created, we add it into the set. Remember, the set does not contain duplicate elements.
  • We return the size of the set which is the count of distinct subsequences received from the helper function.

Algorithm:

void helper(STR, set, ID, CUR) :

  • Base Case: if ID >= STR.size()’, this means that CUR is a possible subsequence, thus, add CUR to set and return.
  • Call helper(STR, set, ID+1, CUR), this is where you exclude the current character.
  • Call helper(STR, set, ID+1, CUR + STR[ID]), this is where you include the current character.

 

int distinctSubsequences(S) :

  • Initialise an empty set.
  • Call helper function, helper(STR, set, 0, “”).
  • Return the size of the set.

02 Approach

  • The idea is to use dynamic programming.
  • Take a ‘DP’ list which at every index stores the number of subsequences created by the given string which ends at that particular index. Mathematically, DP[id] will store the number of subsequences created by string ‘STR’ if the length of the string is id.
  • We maintain a ‘MAP’ which stores the index of the latest occurrence of any particular alphabet in the string.
  • Now, to calculate DP[id] the steps follow:

    • If the character at index id has not appeared before then there will be no possibility for repetition in subsequence. Hence, DP[id] = 2 * DP[id - 1].
    • Otherwise, we need to subtract the repetitions and as the last occurrence of the current character in the string is at the MAP[STR[id-1]]. So the count of subsequences till this index is repeated so we will subtract the count. Hence, DP[id] = 2 * DP[id - 1] - DP[ MAP[STR[id-1]].
  • Update the latest index of the current index in the MAP.
  • Return DP[N] where N is the size of the string. 
     

For example, let’s take the string ”abbcb”

 

  • Size of the given string is 5. Initialise a MAP to store the latest index of each character. Along with this also initialise a DP array DP[6]
  • The base case, DP[0] = 1 This stores the number of subsequences that can be created if string size is 0 which is 1, i.e a null string.
  • From index 1 to the length of the string:
    • Iteration 1: STR[1-1], i.e ‘a’ occurs for the first time, DP[1] = 2 * DP[0]. Thus, DP[1] = 2. Also, MAP[a] = 0.
    • Iteration 2: STR[2-1], i.e ‘b’ occurs for the first time, DP[2] = 2 * DP[1]. Thus, DP[2] = 4. Also, MAP[b] = 1.
    • Iteration 3: STR[3-1], i.e ‘b’ is repeated, DP[3] = 2 * DP[2] - DP[ MAP[ b ]]. Thus, DP[3] = 2 * 4 - 2, DP[3] = 6. Also, MAP[b] = 2.
    • Iteration 4: STR[4-1], i.e ‘c’ occurs for the first time, DP[4] = 2 * DP[3]. Thus, DP[4] = 12. Also, MAP[c] = 3.
    • Iteration 5: STR[5-1], i.e ‘b’ is repeated, DP[5] = 2 * DP[4] - DP[ MAP[ b ]]. Thus, DP[5] = 2 * 12 - 4, DP[5] = 20. Also, MAP[b] = 4.
    • The final count is stored in DP[5] i.e 20. The DP list is [1, 2, 4, 6, 12, 20].

03 Approach

  • Take a variable, say ‘LEVELCOUNT’ which stores the count of subsequences ending at a particular character.
  • Take another variable, say ‘ALLCOUNT’ which stores the count of total subsequences created yet.
  • Take a ‘MAP’ which maps the character with its respective ‘LEVELCOUNT’.
  • Now, if the string is not an empty string, let’s work on the very first character (C). The only subsequence ending with C will be “C”, thus,
    • LEVELCOUNT becomes 1.
    • In the same way, ALLCOUNT becomes 1
    • Hash C with its LEVELCOUNT i.e. MAP[C] = 1.
  • Iterate through the string:
    • Update LEVELCOUNT with ALLCOUNT + 1.
    • If the character has not appeared before then there will be no possibility for repetition in subsequence. Hence, ALLCOUNT = ALLCOUNT + LEVELCOUNT.
    • Otherwise, we need to subtract the repetitions and as the MAP stores the number of repeated subsequences, hence, ALLCOUNT = ALLCOUNT + LEVELCOUNT - MAP[character].
  • Return ALLCOUNT + 1 as we include empty subsequence as well.

 

For example, let’s take the string “ada”

 

  • Let us take the very first character i.e. ‘a’. The number of subsequences ending with ‘a’ = 1 i.e (‘a’). Thus total subsequences = 1 (‘a’).
  • Now, take the next character i.e. ‘d’. The number of subsequences ending with ‘d’ = 2 i.e (‘d’, ‘ad’) i.e. Total subsequences + 1. Thus total subsequences = 3 (‘a’, ‘d’, ‘ad’) i.e (Total subsequences) + (subsequences ending with ‘d’).
  • Now, take the next character i.e. ‘a’. The number of subsequences ending with ‘a’ = 4 i.e (‘aa’, ‘da’, ‘ada’, ‘a’) i.e. Total subsequences + 1. Total subsequences = 7 (‘a’, ‘d’, ‘ad’, ‘aa’, ‘da’, ‘ada’, ‘a’) i.e (Total subsequences) + (subsequences ending with ‘a’), but, this time, the subsequences are repeated as the character ‘a’ is also repeated.
  • Thus, the total subsequences will be 6 which is (total subsequences) + (subsequences ending with ‘a’) - (total subsequences that ended with previous occurrence of ‘a’). Finally, (‘a’, ‘d’, ‘ad’) + ( ‘aa’, ‘da’, ‘ada’, ‘a’) - (‘a’)

Note: We have excluded empty string here.