In algorithmic problem-solving, the Longest Common Subsequence (LCS) stands as a fundamental concept with applications spanning from computational biology to data compression. LCS seeks to find the longest sequence that can be derived from two or more sequences, maintaining the order of elements without necessarily being contiguous. Understanding LCS involves delving into dynamic programming strategies, sequence alignment techniques, and real-world scenarios where it plays a pivotal role in pattern recognition and similarity analysis.
Given two strings, s1 and s2, find the length of their longest common subsequence. If there is no common subsequence, return 0.
A subsequence of a string is a new string generated from the original string with some (or none) characters deleted without changing the relative order of the remaining characters.
So, if a string is “abcde,” some of the possible subsequences of the given string are “cde,” “bc,” etc.
However, “bae” is not a valid subsequence as the relative ordering of characters is not maintained.
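The definition above can be checked mechanically. The following is a minimal sketch, not part of the original article; the helper name `is_subsequence` is an assumption chosen for illustration. It walks through the original string once and confirms that the candidate's characters appear in the same relative order.

```python
def is_subsequence(candidate: str, text: str) -> bool:
    """Return True if candidate can be obtained from text by deleting characters."""
    remaining = iter(text)
    # `ch in remaining` consumes the iterator up to and including the first match,
    # so every later character has to be found after the previous one.
    return all(ch in remaining for ch in candidate)


if __name__ == "__main__":
    for candidate in ("cde", "bc", "bae", ""):
        print(repr(candidate), is_subsequence(candidate, "abcde"))
```

With this check, “cde” and “bc” are accepted for “abcde,” while “bae” is rejected because “a” does not occur after “b” in the original string.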
Example:
Let the two strings be “acbaed” and “abcadf.”
A longest common subsequence of these strings is “acad”: it is present in both string1 and string2, and no common subsequence is longer (“abad” is another one of the same length).
So, the length of Longest Common Subsequence = size of “acad” = 4.
Let's consider another example: Let the two strings be “acb” and “dfe”.
The longest common subsequence of these strings is “” (the empty string), because the two strings share no characters and therefore have no non-empty common subsequence.
So, the length of Longest Common Subsequence = length of “” = 0.
Now, let's get started and learn various approaches to solve this problem.
Longest Common Subsequence Algorithm
There exist multiple algorithms to determine the longest common subsequence between two strings, each varying in time and space complexities while addressing the same problem. Subsequent sections will explore these distinct approaches for finding the longest common subsequence.
Longest Common Subsequence (LCS) using Recursion
The naive (or brute-force) solution is to explore every way of pairing up characters of the two strings and keep the longest common subsequence found.
We maintain two indexes, i and j, one for each string. If the characters at i and j match, they extend the common subsequence by one and both indexes advance. Otherwise, we advance i and j separately and keep the better of the two results.
Note: If either index reaches the end of its string, the longest common subsequence of the remaining parts is 0, because one of the remaining ranges is empty.
Implementation
Let’s have a look at its implementation:
C
#include <stdio.h>
#include <string.h>

/* Returns the LCS length of the suffixes s1[i..] and s2[j..]. */
int LCS(char s1[], char s2[], int i, int j) {
    // Base case: one of the strings is exhausted
    if (s1[i] == '\0' || s2[j] == '\0')
        return 0;

    // Matching characters extend the LCS by one
    if (s1[i] == s2[j])
        return 1 + LCS(s1, s2, i + 1, j + 1);

    // Otherwise skip one character of either string and keep the better result
    int option1 = LCS(s1, s2, i + 1, j);
    int option2 = LCS(s1, s2, i, j + 1);
    return (option1 > option2) ? option1 : option2;
}

int main() {
    char s1[100], s2[100];

    // Take Input (at most 99 characters per string)
    printf("Enter String1: ");
    scanf("%99s", s1);
    printf("Enter String2: ");
    scanf("%99s", s2);

    int lcsans = LCS(s1, s2, 0, 0);
    printf("Longest Common Subsequence: %d\n", lcsans);
    return 0;
}
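Time and Space Complexity
Time Complexity: exponential in the worst case, bounded by O(2 ^ (M + N)), as every mismatch branches into two recursive calls.
Space Complexity: O(M + N) for the recursion stack.
Where ‘M’ and ‘N’ are the lengths of the two strings.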
Longest Common Subsequence (LCS) Using Dynamic Programming
We can improve the time complexity of the previous approach by maintaining a “dp array,” where dp[i][j] stores the length of the longest common subsequence of the first i characters of string1 and the first j characters of string2.
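In the recursion tree of the previous approach, the same pair of indexes (i, j) is reached along many different paths.
Since the recursive approach has a lot of these overlapping subproblems, storing each result in the dp array lets us avoid that repetitive work.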
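To make the overlap concrete, the sketch below counts how often the recursion is entered with and without a cache. It is only an illustration: the function names are assumptions, and Python's standard `functools.lru_cache` stands in for the dp array.

```python
from functools import lru_cache


def count_naive_calls(s1: str, s2: str) -> int:
    """Count how many times the plain recursion is entered."""
    calls = 0

    def lcs(i: int, j: int) -> int:
        nonlocal calls
        calls += 1
        if i == len(s1) or j == len(s2):
            return 0
        if s1[i] == s2[j]:
            return 1 + lcs(i + 1, j + 1)
        return max(lcs(i + 1, j), lcs(i, j + 1))

    lcs(0, 0)
    return calls


def count_memoized_calls(s1: str, s2: str) -> int:
    """Count how many distinct (i, j) subproblems are computed when results are cached."""
    calls = 0

    @lru_cache(maxsize=None)
    def lcs(i: int, j: int) -> int:
        nonlocal calls
        calls += 1  # the body runs only on a cache miss
        if i == len(s1) or j == len(s2):
            return 0
        if s1[i] == s2[j]:
            return 1 + lcs(i + 1, j + 1)
        return max(lcs(i + 1, j), lcs(i, j + 1))

    lcs(0, 0)
    return calls


if __name__ == "__main__":
    a, b = "abcdefgh", "ijklmnop"  # no shared characters: worst case for the recursion
    print("naive calls:   ", count_naive_calls(a, b))
    print("memoized calls:", count_memoized_calls(a, b))
```

The cached version can never compute more than (len(s1) + 1) × (len(s2) + 1) subproblems, whereas the plain recursion revisits the same pairs again and again.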
As an example, consider the dp table for the two strings “aggtab” and “gxtxayb.”
In that table, “-” indicates that the value of LCS = 0 after considering string1 and string2 up to the i-th and j-th index, respectively.
Every time a matching character is found, the LCS value increases by one; otherwise, it is carried over from the larger of the two neighbouring cells.
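The table for “aggtab” and “gxtxayb” can be reproduced with a short script. This is a sketch under the conventions described above (rows for string1, columns for string2, “-” for 0); the function names `lcs_table` and `format_table` are assumptions made for this example.

```python
def lcs_table(s1: str, s2: str) -> list[list[int]]:
    """Build the full dp table; dp[i][j] is the LCS length of s1[:i] and s2[:j]."""
    m, n = len(s1), len(s2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s1[i - 1] == s2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp


def format_table(s1: str, s2: str, dp: list[list[int]]) -> str:
    """Render the table as text, printing '-' for cells whose value is 0."""
    lines = ["    " + " ".join(s2)]  # column labels, shifted past the empty-prefix column
    for i, row in enumerate(dp):
        label = s1[i - 1] if i > 0 else " "
        cells = " ".join("-" if value == 0 else str(value) for value in row)
        lines.append(f"{label} {cells}")
    return "\n".join(lines)


if __name__ == "__main__":
    s1, s2 = "aggtab", "gxtxayb"
    table = lcs_table(s1, s2)
    print(format_table(s1, s2, table))
    print("LCS length:", table[-1][-1])
```

The bottom-right cell holds the answer for the full strings; the first row and column stay at 0 because an empty prefix has no common subsequence with anything.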
1. Longest Common Subsequence (LCS) Using Top-Down DP
Implementation
Let’s have a look at its implementation
C
#include <stdio.h>
#include <string.h>

/* dp[m][n] caches the LCS length of the first m characters of s1 and
   the first n characters of s2; -1 marks a value not computed yet. */
int LCS(char s1[], char s2[], int m, int n, int dp[][101]) {
    // Base Case
    if (m <= 0 || n <= 0)
        return 0;

    // LookUp
    if (dp[m][n] != -1)
        return dp[m][n];

    // Calculating Longest Common Subsequence
    if (s1[m - 1] == s2[n - 1]) {
        dp[m][n] = LCS(s1, s2, m - 1, n - 1, dp) + 1;
    } else {
        int option1 = LCS(s1, s2, m - 1, n, dp);
        int option2 = LCS(s1, s2, m, n - 1, dp);
        dp[m][n] = (option1 > option2) ? option1 : option2;
    }
    return dp[m][n];
}

int main() {
    char s1[100], s2[100];

    // Take Input (at most 99 characters per string)
    printf("Enter String1: ");
    scanf("%99s", s1);
    printf("Enter String2: ");
    scanf("%99s", s2);

    int m = strlen(s1);
    int n = strlen(s2);

    // Forming dp array; memset with -1 sets every int to -1
    int dp[101][101];
    memset(dp, -1, sizeof(dp));

    int lcsans = LCS(s1, s2, m, n, dp);
    printf("Longest Common Subsequence: %d\n", lcsans);
    return 0;
}
Java
import java.util.Arrays;
import java.util.Scanner;

public class Main {
    // Using Top Down dp approach
    public static int LCS(String s1, String s2, int m, int n, int[][] dp) {
        // Base Case
        if (m <= 0 || n <= 0)
            return 0;

        // LookUp (-1 marks a value that has not been computed yet)
        if (dp[m][n] != -1)
            return dp[m][n];

        // Calculating Longest Common Subsequence
        if (s1.charAt(m - 1) == s2.charAt(n - 1))
            dp[m][n] = LCS(s1, s2, m - 1, n - 1, dp) + 1;
        else
            dp[m][n] = Math.max(LCS(s1, s2, m - 1, n, dp), LCS(s1, s2, m, n - 1, dp));
        return dp[m][n];
    }

    public static void main(String[] args) {
        Scanner s = new Scanner(System.in);

        // Take Input
        System.out.println("Enter String1");
        String s1 = s.next();
        System.out.println("Enter String2");
        String s2 = s.next();

        int m = s1.length();
        int n = s2.length();

        // Forming dp array
        int[][] dp = new int[m + 1][n + 1];
        for (int[] row : dp)
            Arrays.fill(row, -1);

        int lcsans = LCS(s1, s2, m, n, dp);
        System.out.println("Longest Common Subsequence " + lcsans);
    }
}
Output
Enter String1 acbaed
Enter String2 abcadf
Longest Common Subsequence 4
Time and Space Complexity
Time Complexity: O(M × N), as each of the (M + 1) × (N + 1) subproblems is computed once and afterwards read from the dp array.
Space Complexity: O(M × N), as extra space is used to store the longest common subsequence value for every pair of prefix lengths of the two strings.
Where ‘M’ and ‘N’ are the lengths of the two strings; this becomes O(N ^ 2) when both strings have length N.
2. Longest Common Subsequence (LCS) Using Bottom-Up DP
Implementation
Let’s have a look at its implementation
C
#include <stdio.h>
#include <string.h>

int LCS(char s1[], char s2[]) {
    int m = strlen(s1);
    int n = strlen(s2);

    // Forming dp array
    int dp[m + 1][n + 1];

    // Calculating Longest Common Subsequence
    for (int i = 0; i <= m; i++) {
        for (int j = 0; j <= n; j++) {
            if (i == 0 || j == 0)
                dp[i][j] = 0;  // an empty prefix has no common subsequence
            else if (s1[i - 1] == s2[j - 1])
                dp[i][j] = dp[i - 1][j - 1] + 1;
            else
                dp[i][j] = (dp[i - 1][j] > dp[i][j - 1]) ? dp[i - 1][j] : dp[i][j - 1];
        }
    }
    return dp[m][n];
}

int main() {
    char s1[100], s2[100];

    // Take Input (at most 99 characters per string)
    printf("Enter String1: ");
    scanf("%99s", s1);
    printf("Enter String2: ");
    scanf("%99s", s2);

    int lcsans = LCS(s1, s2);
    printf("Longest Common Subsequence: %d\n", lcsans);
    return 0;
}
Java
import java.util.Scanner;

public class Main {
    // Using dp approach
    public static int LCS(String s1, String s2) {
        int m = s1.length();
        int n = s2.length();

        // Forming dp array
        int[][] dp = new int[m + 1][n + 1];

        // Calculating Longest Common Subsequence
        for (int i = 0; i <= m; i++) {
            for (int j = 0; j <= n; j++) {
                if (i == 0 || j == 0)
                    dp[i][j] = 0;
                /* LCS length increases by one when there is a character match */
                else if (s1.charAt(i - 1) == s2.charAt(j - 1))
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                else
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
            }
        }
        return dp[m][n];
    }

    public static void main(String[] args) {
        Scanner s = new Scanner(System.in);

        // Take Input
        System.out.println("Enter String1");
        String s1 = s.next();
        System.out.println("Enter String2");
        String s2 = s.next();

        int lcsans = LCS(s1, s2);
        System.out.println("Longest Common Subsequence " + lcsans);
    }
}
Python
def LCS(s1, s2):
    m = len(s1)
    n = len(s2)

    # Forming dp array
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    # Calculating Longest Common Subsequence
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s1[i - 1] == s2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

# Take Input
s1 = input("Enter String1 ")
s2 = input("Enter String2 ")
lcsans = LCS(s1, s2)
print("Longest Common Subsequence", lcsans)
Output:
Enter String1 acbaed
Enter String2 abcadf
Longest Common Subsequence 4
Time and Space Complexity
Time Complexity: O(M × N), as the two nested loops fill each of the (M + 1) × (N + 1) cells of the dp array exactly once, in constant time per cell.
Space Complexity: O(M × N), as extra space is used to store the longest common subsequence value for every pair of prefix lengths of the two strings.
Where ‘M’ and ‘N’ are the lengths of the two strings; this becomes O(N ^ 2) when both strings have length N.
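The implementations above return only the length, but the filled dp array also holds enough information to recover one actual subsequence by walking back from the bottom-right cell. The following is a sketch of that idea, not something taken from the article; the name `lcs_string` and the tie-breaking rule (prefer moving up) are assumptions, and a different rule may return a different, equally long subsequence.

```python
def lcs_string(s1: str, s2: str) -> str:
    """Return one longest common subsequence of s1 and s2."""
    m, n = len(s1), len(s2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s1[i - 1] == s2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    # Walk back from dp[m][n], collecting matched characters in reverse order.
    chars = []
    i, j = m, n
    while i > 0 and j > 0:
        if s1[i - 1] == s2[j - 1]:
            chars.append(s1[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1  # the value came from the cell above
        else:
            j -= 1  # the value came from the cell to the left
    return "".join(reversed(chars))


if __name__ == "__main__":
    print(lcs_string("acbaed", "abcadf"))
```

For “acbaed” and “abcadf” the result has length 4, matching the output shown earlier.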
Longest Common Subsequence Applications
The following are some applications of the longest common subsequence:
1. Version Control Systems (e.g., Git)
2. DNA sequence comparison in bioinformatics
3. Text comparison and plagiarism detection
4. Data differencing and file synchronization
5. Natural Language Processing tasks like spell checkers and similarity analysis
The longest common subsequence (LCS) of two sequences is the longest sequence that appears in both sequences in the same order but not necessarily consecutively. Its length measures how much ordered content the two sequences share.
Why do we use LCS?
We use LCS to find similar parts within different sequences (text, code).
What are the conditions for LCS algorithm?
LCS works on any two sequences whose elements can be compared for equality. The sequences do not need to be sorted or to have the same length.
What is the recursive formula for LCS?
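If the last elements of the two sequences match, the LCS length is 1 plus the LCS length of both sequences with that element removed. Otherwise, it is the maximum of the two LCS lengths obtained by removing the last element from either sequence.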
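Written out as a formula, the recurrence looks as follows. The symbol L(i, j) is notation introduced here for illustration: it denotes the LCS length of the first i elements of the first sequence and the first j elements of the second, which is the same quantity the dp array stores at dp[i][j].

```latex
L(i, j) =
\begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0, \\
L(i-1,\, j-1) + 1 & \text{if } s_1[i] = s_2[j], \\
\max\bigl(L(i-1,\, j),\; L(i,\, j-1)\bigr) & \text{otherwise,}
\end{cases}
```

Here s_1[i] and s_2[j] are the i-th and j-th elements counted from 1, and the answer for sequences of lengths M and N is L(M, N).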
What is the longest common subsequence measure?
A measure of similarity between two sequences, indicating the length of the longest common subsequence (LCS) they share.
What is the longest common subsequence of an array?
The longest subsequence common to two or more arrays, representing elements appearing in the same order in each array.
What is the longest common subsequence in linear space?
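Hirschberg's algorithm computes an LCS in O(M × N) time while using only linear space. If only the length is required, it is enough to keep two rows of the dp array at a time. The Hunt–Szymanski algorithm, by contrast, is aimed at reducing the running time when the sequences have few matching positions.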
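When only the length is needed, the space saving can be illustrated with a bottom-up pass that keeps two rows of the table. This is a minimal sketch with an assumed function name; it does not recover the subsequence itself and is not an implementation of Hirschberg's algorithm.

```python
def lcs_length_two_rows(s1: str, s2: str) -> int:
    """Return the LCS length using O(min(len(s1), len(s2))) extra space."""
    if len(s2) > len(s1):
        s1, s2 = s2, s1  # iterate over the longer string, keep rows as short as possible

    prev = [0] * (len(s2) + 1)  # row for the previous prefix of s1
    for ch in s1:
        curr = [0]  # column 0: empty prefix of s2
        for j, other in enumerate(s2, start=1):
            if ch == other:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(lcs_length_two_rows("acbaed", "abcadf"))
```

Each new row depends only on the row directly above it, which is why older rows can be discarded as the loop advances.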
Conclusion
In this blog, we learned various approaches to the Longest Common Subsequence.
Longest Common Subsequence is a standard recursive problem that is optimized via Dynamic Programming.
A subsequence of a string is a new string generated from the original string with some (or none) characters deleted without changing the relative order of the remaining characters.
The optimized time complexity of this problem is O(M × N), where M and N are the lengths of string1 and string2, because the LCS value is calculated once for every pair of indexes i and j.