Last Updated: 4 Jan, 2021

Longest Common Subsequence

Moderate
Asked in companies
GoogleAmazonVisa

Problem statement

You have been given two Strings “STR1” and “STR2” of characters. Your task is to find the length of the longest common subsequence.

A String ‘a’ is a subsequence of a String ‘b’ if ‘a’ can be obtained from ‘b’ by deletion of several (possibly, zero or all) characters. A common subsequence of two Strings is a subsequence that is common to both Strings.

Input Format :
The first line contains an integer ‘T’ denoting the number of test cases. Then each test case follows. 

The first input line of each test case contains two space-separated Strings “STR1” and “STR2” representing two Strings. 
Output Format :
For each test case, print the length of the longest common subsequence. 

Print the output of each test case in a separate line. 
Note:
You are not required to print the expected output; it has already been taken care of. Just implement the function. 
Constraints :
1 <= T <= 50 
1 <= |STR1| <= 10^2
1 <= |STR2| <= 10^2

Where |STR1| and |STR2| denote the length of “STR1” and “STR2” respectively. 

Time Limit: 1 sec 

Approaches

01 Approach

The basic idea of this approach is to break the original problem into sub-problems. Let us assume we want to find the length of the longest common subsequence of “STR1” and “STR2” whose length is ‘N’ and ‘M’ respectively. 

 

Now, let us define a recursive function 

 

LCS(Int I, int J, string STR1, string STR2)

Which returns the length of the longest common subsequence of string STR1[0: I-1] and STR1[0: J-1] where STR1[l: R] denotes the substring of “STR1” having characters from index ‘l’ to index ‘R’.

 

Now, consider the following steps :

  1. If ( I == 0 ) i.e. “STR1” is empty then, the length of the LCS will be 0. So return 0.
  2. Similarly, if ( J == 0 ) return 0.
  3. Now if ( STR1[I-1] == STR2[J-1] ) which means the last character of both string matches then it is optimal to include the current character into the LCS. Therefore, find the LCS of the remaining characters of both strings. So return LCS(I-1, J-1, STR1, STR2) + 1
  4. Otherwise, we can not have both STR1[I-1] and STR2[J-1] at the end of the LCS which means either of them can be ignored. Now there are two possibilities, either we ignore STR1[I-1] and get the LCS of the rest of the characters and vice versa. We should choose the option which maximizes the LCS i.e. return max( LCS(I-1, J, STR1, STR2), LCS(I, J-1, STR1, STR2).


 

02 Approach

Let us observe the recursion tree for the case where STR1 = “abc” and STR2 = “cabd”

 

After observing the tree, we’ll find that there are some redundant function calls which means that there are some overlapping subproblems. The repetition of such sub-problems suggests that we can use dynamic programming to optimize our approach.

 

The key idea behind a dynamic programming approach is to use memoization, i.e. we’ll save the result of our sub-problem in a matrix so that it can be used later on.

 

Let DP[ I ][ J ] be our dynamic programming matrix to store the length of LCS of substring STR1[0: i-1] and STR2[0: j-1].

 

Now, consider the following steps:

  1. Before calling the function for any valid (I, J), we will first check whether we have already a solution corresponding to the (I, J), if we have already a solution then we return the solution else we will solve it with the following steps :
  2. If ( I == 0 ) i.e. “STR1” is empty then, the length of the LCS will be 0. So the answer is 0.
  3. Similarly, if ( J == 0 ) the answer is 0.
  4. Now if ( STR1[I-1] == STR2[J-1] ) which means the last character of both string matches then it is optimal to include the current character in the LCS. Therefore, find the LCS of the remaining characters of both strings. So return LCS(I-1, J-1, STR1, STR2) + 1
  5. Otherwise, we can not have both STR1[I-1] and STR2[J-1] at the end of the LCS which means either of them can be ignored. Now there are two possibilities, either we ignore STR1[I-1] and get the LCS of the rest of the characters and vice versa. We should choose the option which maximizes the LCS i.e. the answer will be max( LCS(I-1, J, STR1, STR2), LCS(I, J-1, STR1, STR2).
  6. Store the answer in the “DP” table for later use.


 

03 Approach

The basic idea of this approach is to solve the problem iteratively.

 

Let DP[I] J] be our dynamic programming matrix to store the length of LCS of substring STR1[0: I-1] and STR2[0:J-1].

 

Now, consider the following steps:

Start traversing the “STR1” using a variable ‘i’
Create a nested loop and start traversing the “STR2” using a variable ‘J’

 

If either ‘I’ or ‘J’ is zero then DP[I][J] = 0 because one of the strings is empty.
If ( STR1[I-1] == STR2[J-1] ) which means the current character of both strings matches then it is optimal to include the current character in the LCS. Therefore, find the LCS of the remaining characters of both strings. DP[I][J] = DP[I-1][J-1] + 1


Otherwise, we can not have both STR1[I-1] and STR2[J-1] at the end of the LCS which means either of them can be ignored. Now there are two possibilities, either we ignore STR1[I-1] and get the LCS of the rest of the characters and vice versa. We should choose the option which maximises the LCS i.e. DP[I][J] = max(DP[I-1][J], DP[I][J-1])

 

Now we can get the length of LCS from DP[N][M] where ‘N’ and ‘M’ are the length of “STR1” and “STR2” respectively.
 

04 Approach

We can observe in the previous approach that the current state depends on one immediate previous row. This suggests using a dynamic programming matrix of 2*M dimension instead of using N*M dimension.


Let DP[ 2 ][ M ] be our dynamic programming matrix to store the length of LCS of the substring.

 

Now, consider the following steps :

Initialize a variable ‘T’ to zero which denotes the current row of “DP” matrix.


Start traversing the “STR1” using a variable ‘I’


Create a nested loop and start traversing the “STR2” using a variable ‘J’

If either ‘I’ or ‘J’ is zero then DP[T][J] = 0 because one of the strings is empty.


If ( STR1[i-1] == str2[j-1] ) which means the current character of both string matches then it is optimal to include the current character in the LCS. Therefore, find the LCS of the remaining characters of both strings. DP[t][j] = DP[t^1][j-1] + 1


Otherwise, we can not have both STR1[i-1] and str2[j-1] at the end of the LCS which means either of them can be ignored. Now there are two possibilities, either we ignore STR1[i-1] and get the LCS of the rest of the characters and vice versa. We should choose the option which maximises the LCS i.e. DP[t][j] = max(DP[t^1][j], DP[t][j-1])


Now, toggle the ‘T’ i.e. T = (T XOR 1). The reason behind this is we want to use the current row of “DP” matrix in the next iteration.

 

Now we can get the length of LCS from DP[t^1][M] where ‘M’ is the length of “STR2”.