Data structures & algorithms (Beginner to Intermediate)

Free guided path13 chapters99+ problems

Earn badges and level up

Introduction

Today, let’s learn about a famous and commonly asked Interview Problem of Data Structure, i.e., the Longest Common Subsequence.

It is a famous and common problem that is solved using dynamic programming in its most optimized version.

I know the Dynamic programmingproblem is a little tricky to understand starting, but if you go step by step in this blog, the problem will be much clear for you to solve other problems based on a similar concept.

Given two strings, s1 and s2, find the length of their longest common subsequence. If there is no common subsequence, return 0.

A subsequence of a string is a new string generated from the original string with some (or none) characters deleted without changing the relative order of the remaining characters.

So, if a string is “abcde,” some of the possible subsequences of the given string are “cde,” “bc,” etc.

However, “bae” is not a valid subsequence as the relative ordering of characters is not maintained.

Example:

Let the two strings be “acbaed” and “abcadf.”

The longest Common Subsequence of the strings is “acad,” as this subsequence is present in both string1 and string2 and is the longest one.

So, the length of Longest Common Subsequence = size of “acad” = 4.

Let's consider another example: Let the two strings be “acb” and “dfe”.

The longest Common Subsequence of the strings is “” (empty string) as it has no subsequence, which is present in both string1 and string2.

So, the length of Longest Common Subsequence = length of “” = 0.

Now, let's get started and learn various approaches to solve this problem.

Get the tech career you deserve, faster!

Connect with our expert counsellors to understand how to hack your way to success

User rating 4.7/5

1:1 doubt support

95% placement record

Akash Pal

Senior Software Engineer

326% Hike After Job Bootcamp

Himanshu Gusain

Programmer Analyst

32 LPA After Job Bootcamp

After Job Bootcamp

Longest Common Subsequence Algorithm

There are many ways to find the longest common subsequence of two strings. These algorithms have different time and space complexities, but they solve a common problem. The following sections will discuss the different approaches to finding the longest common subsequence.

Approach1 - Using Recursion

The naive (or brute force) solution to this problem could be finding all possible configurations of different subsequences and finding the longest common subsequence possible.

So, we maintain two indexes, i and j, one for each string and increment one by one to find the best possible solution.

Note: If any of the indices reach any of the string’s length, the longest common subsequence would be 0 as the range to find the longest common subsequence.

System.out.println("Longest Common Subsequence "+ lcsans); } }

Output:

Enter String1 acbaed
Enter String2 abcadf
Longest Common Subsequence 4

Implementation in C++

C++

C++

// Recursive implementation of // Longest Common subsequence problem

#include <bits/stdc++.h> using namespace std;

int N, M;

// For calculating the length of the longest common subsequence

int lcs(string S, string T, int i, int j) { // one or both of the strings are fully traversed

if (i == N || j == M) return 0;

// check if current characters are equal

if (S[i] == T[j]) // check for next character return (lcs(S, T, i + 1, j + 1) + 1); else // check which call has maximum value and add to length return max(lcs(S, T, i + 1, j), lcs(S, T, i, j + 1)); } int main() { string S = " abcde"; string T = "acek";

N = S.length(); M = T.length(); cout << "Length of longest common subsequence " << lcs(S, T, 0, 0);

return 0; }

Output:

Length of longest common subsequence 3

Time and Space Complexity

Time Complexity: O(2 ^ N), i.e., exponential as we generate and compare all the subsequences of both the strings.

Note: The total number of subsequences of a string is O(2 ^ N).

Where ‘N’ is the length of the shortest of the two strings.

Approach 2 - Using Dynamic Programming

We could optimize the time complexity of our previous approach by maintaining a “dp array” where - dp[i][j] stores the longest common subsequence value that could be formed via considering string1 till i-th index and string2 till jth index.

Let's look at the recursive tree:

Since the recursive approach had a lot of overlapping subproblems, the dp array would also help us avoid that repetitive work.

So, let’s consider the two strings “aggtab” and “gxtxayb.”

“-” here indicates that the value of LCS = 0 after considering the string1 and string2 till ith and jth index, respectively.

Every time a matching character is found, the LCS value increases by one else; it remains as it is.

// Using Top Down dp approach public static int LCS(String s1, String s2, int m, int n, int[][] dp){

// Base Case if (m <= 0 || n <= 0) return 0;

// LookUp if (dp[m][n] != 0) return dp[m][n];

// Calculating Longest Common Subsequence if (s1.charAt(m) == s2.charAt(n)) dp[m][n] = LCS(s1, s2, m - 1, n - 1, dp) + 1; else dp[m][n] = Math.max(LCS(s1, s2, m - 1, n, dp), LCS(s1, s2, m, n - 1, dp));

return dp[m][n]; }

public static void main (String[] args){ Scanner s = new Scanner(System.in);

// Take Input System.out.println("Enter String1"); String s1 = s.next();

int main() { string S = "abcde"; string T = "acek";

N = S.length(); M = T.length();

// table for memoization

int memo[N][max_value];

// initializing memo table with -1

for (int i = 0; i < N; i++) for (int j = 0; j < M; j++) memo[i][j] = -1;

cout << "Length of longest common subsequence " << lcs(S, T, 0, 0, memo); return 0; }

Output:

Length of longest common subsequence 3

Time and Space Complexity

Time Complexity: O(N ^ 2) as for each length, we are traversing from 0 to that particular length to calculate our max possible profit for the current length.

Space Complexity: O(N ^ 2) as extra space is used to store the longest common subsequence value after considering both the strings until a particular index.

Where ‘N’ is the length of the shortest of the two strings.

// Using dp approach public static int LCS(String s1, String s2){

int m = s1.length(); int n = s2.length();

// Forming dp array int dp[][] = new int[m + 1][n + 1];

// Calculating Longest Common Subsequence for (int i=0; i <= m; i++){ for (int j=0; j <= n; j++){ if (i == 0 || j == 0) dp[i][j] = 0;

/* LCS length increases by one when there is a character match */ else if (s1.charAt(i - 1) == s2.charAt(j - 1)) dp[i][j] = dp[i - 1][j - 1] + 1; else dp[i][j] = Math.max(dp[i-1][j], dp[i][j-1]); } } return dp[m][n]; }

public static void main (String[] args){ Scanner s = new Scanner(System.in);

// Take Input System.out.println("Enter String1"); String s1 = s.next();

System.out.println("Longest Common Subsequence "+ lcsans); } }

Output:

Enter String1 acbaed
Enter String2 abcadf
Longest Common Subsequence 4

Implementation in C++

Java

Java

/* Dynamic Programming C++ implementation of LCS problem */ #include<bits/stdc++.h> using namespace std;

int max(int a, int b);

/* Returns length of LCS for X[0..m-1], Y[0..n-1] */ int lcs( char *X, char *Y, int m, int n ) { int L[m + 1][n + 1]; int i, j;

/* Following steps build L[m+1][n+1] in bottom up fashion. Note that L[i][j] contains length of LCS of X[0..i-1] and Y[0..j-1] */ for (i = 0; i <= m; i++) { for (j = 0; j <= n; j++) { if (i == 0 || j == 0) L[i][j] = 0;

cout << "Length of longest common subsequence " << lcs( X, Y, m, n );

return 0; }

Output:

Length of longest common subsequence 3

Time and Space Complexity

Time Complexity: O(N ^ 2) as for each length, we are traversing from 0 to that particular length to calculate our max possible profit for the current length.

Space Complexity: O(N ^ 2) as extra space is used to store the longest common subsequence value after considering both the strings until a particular index.

Where ‘N’ is the length of the shortest of the two strings.

Longest Common Subsequence Applications

The following are some applications of Longest common subsequence:-

1. Version Control Systems (e.g., Git)

2. DNA sequence comparison in bioinformatics

3. Text comparison and plagiarism detection

4. Data differencing and file synchronization

5. Natural Language Processing tasks like spell checkers and similarity analysis

Frequently Asked Questions

What is meant by longest common subsequence?

The Longest Common Subsequence (LCS) is the longest sequence of elements present in two or more sequences, with the same relative order but not necessarily consecutive. It is a fundamental concept in dynamic programming and string matching algorithms.

What is the longest common subsequence measure?

A measure of similarity between two sequences, indicating the length of the longest common subsequence (LCS) they share.

What is the longest common subsequence of an array?

The longest subsequence common to two or more arrays, representing elements appearing in the same order in each array.

What is the longest common subsequence in linear space?

Efficient algorithms like the Hunt–Szymanski algorithm achieve LCS computation with linear space complexity.

Conclusion

In this blog, we learned various approaches to the Longest Common Subsequence.

Longest Common Subsequence is a standard recursive problem that is optimized via Dynamic Programming.

A subsequence of a string is a new string generated from the original string with some (or none) characters deleted without changing the relative order of the remaining characters.

The optimized time complexity of this problem is O(N ^ 2) for each character at an index, and we are calculating and comparing the LCS value to the ith and jth index value of string1 and string2, respectively.