In string processing and algorithmic design, the concept of finding commonalities between strings holds profound significance. One such fundamental problem is that of identifying the longest common substring (LCS). An LCS between two or more strings is a sequence that appears in the same order in each string, yet it may not be contiguous.
Problem Statement
In this problem, we will be given two strings. We have to find the length of the longest substring, which is common in the two given strings.
Example
Assume the two given strings are:
s1 = “qdef”
s2 = “defq”
“q”, “d”, “e”, “f”, “de”, “ef”, and “def” are the common substrings between s1 and s2. “def” is the longest common substring. So, the length of the longest common substring of s1 and s2 will be “3”.
1. Simple Approach
The simplest approach is to generate all possible substrings of one string. Then check if each of them is a substring of the other string. This approach has a time complexity of O(n^3). n is the length of the longest input string. It is not efficient for large input sizes. It is mainly used for small instances or as a baseline for comparison.
Implementation
C
C++
Java
Python
Javascript
PHP
C#
C
#include <stdio.h> #include <string.h>
// Simple approach int LCSubstr(const char* str1, const char* str2) { int result = 0; int len1 = strlen(str1); int len2 = strlen(str2);
for (int i = 0; i < len1; i++) { for (int j = 0; j < len2; j++) { int k = 0; while ((i + k) < len1 && (j + k) < len2 && str1[i + k] == str2[j + k]) { k++; } if (k > result) { result = k; } } } return result; }
// Driver code int main() { const char* X = "qdef"; const char* Y = "defq";
printf("Length of Longest Common Substring is %d\n", LCSubstr(X, Y)); return 0; }
#include <iostream> #include <string.h> using namespace std;
//simple approach int LCSubstr(string str1 ,string str2){ int result = 0;
/* loop to find all substrings of string 1. check if current substring matches. maximize the lenght of common substring */ for (int i = 0; i < str1.length(); i++) { for (int j = 0; j < str2.length(); j++) { int k = 0; while ((i + k) < str1.length() && (j + k) < str2.length() && str1[i + k] == str2[j + k]){ k = k + 1; }
result = max(result, k); } } return result; }
// Driver code int main(){ string X = "qdef"; string Y = "defq";
cout << "Length of Longest Common Substring is " << LCSubstr(X, Y); return 0; }
You can also try this code with Online C++ Compiler
for i in range(len(str1)): for j in range(len(str2)): k = 0 while i + k < len(str1) and j + k < len(str2) and str1[i + k] == str2[j + k]: k += 1 result = max(result, k)
return result
# Driver code if __name__ == "__main__": X = "qdef" Y = "defq"
print(f"Length of Longest Common Substring is {LCSubstr(X, Y)}")
You can also try this code with Online Python Compiler
public class Program { // Simple approach public static int LCSubstr(string str1, string str2) { int result = 0;
for (int i = 0; i < str1.Length; i++) { for (int j = 0; j < str2.Length; j++) { int k = 0; while (i + k < str1.Length && j + k < str2.Length && str1[i + k] == str2[j + k]) { k++; } result = Math.Max(result, k); } } return result; }
// Driver code public static void Main() { string X = "qdef"; string Y = "defq";
Console.WriteLine("Length of Longest Common Substring is " + LCSubstr(X, Y)); } }
Output
2. Recursive Approach
This problem can be solved using Recursion. We can consider one by one each character in the two strings. The following two cases will arise:
1. The two characters from the two strings are the same:
In this case, increase the length of the longest common substring and move to the next characters in both strings.
2. The two characters from the two strings are different:
In this case, take the maximum of the current length of the longest common string and the two lengths of the longest common substring from two cases - one by moving to the next character in only the first string and another by moving to the next character in only the second string.
Let’s understand this algorithm in detail and its C++ code.
Algorithm
Step 1. Create a recursive function “longestCommonSubstringLength()” which takes input
s1 - First given string
s2 - Second given string
i - It denotes the current length of s1
j - It denotes the current length of s2
maxLen - For storing the length of the longest common substring
Step 2. Write the base case of the recursive function. If i <= 0 or j <= 0 i.e. if the current length of any of the string becomes 0, then simply return ‘maxLen’.
Step 3. Now check if the characters at index ‘i-1’ in the first string and ‘j-1’ in the second string are the same, then increase the value of ‘maxLen’ by one and call the function for the smaller sub-problem.
Step 4. Else If the two characters are not the same, then update the value as the maximum of ‘maxLen’ and the value returned by calling the recursive function for two smaller sub-problems.
Step 5. Finally, return ‘maxLen,’ i.e., the length of the longest common substring.
Implementation
C
C++
Java
Python
Javascript
PHP
C#
C
#include <stdio.h> #include <string.h>
int longestCommonSubstringLength(char *s1, char *s2, int i, int j, int maxLen) { // Base Case if (i <= 0 || j <= 0) { return maxLen; }
int maxLen1 = maxLen;
// If the characters are same in both the string then increase maxLen by 1 if (s1[i - 1] == s2[j - 1]) { maxLen1 = longestCommonSubstringLength(s1, s2, i - 1, j - 1, maxLen + 1); }
int maxLen2 = longestCommonSubstringLength(s1, s2, i, j - 1, 0); int maxLen3 = longestCommonSubstringLength(s1, s2, i - 1, j, 0);
int longestCommonSubstringLength(string s1, string s2, int i, int j, int maxLen){ // Base Case if (i <= 0 || j <= 0) { return maxLen; }
int maxLen1 = maxLen;
// If the characters are same in both the string then increase maxLen by 1 if (s1[i - 1] == s2[j - 1]) { maxLen1 = longestCommonSubstringLength(s1, s2, i - 1, j - 1, maxLen + 1); }
int maxLen2 = longestCommonSubstringLength(s1, s2, i, j - 1, 0); int maxLen3 = longestCommonSubstringLength(s1, s2, i - 1, j, 0);
return max(maxLen1, max(maxLen2, maxLen3)); }
int main(){ string s1 = "qdef"; string s2 = "defq";
int maxLen=0;
cout<<"The Length of Longest Common Substring is "<< longestCommonSubstringLength(s1, s2, s1.length(), s2.length(), maxLen)<<endl;
return 0; }
You can also try this code with Online C++ Compiler
public class Main { public static int longestCommonSubstringLength(String s1, String s2, int i, int j, int maxLen) { // Base Case if (i <= 0 || j <= 0) { return maxLen; }
int maxLen1 = maxLen;
// If the characters are same in both the string then increase maxLen by 1 if (s1.charAt(i - 1) == s2.charAt(j - 1)) { maxLen1 = longestCommonSubstringLength(s1, s2, i - 1, j - 1, maxLen + 1); }
int maxLen2 = longestCommonSubstringLength(s1, s2, i, j - 1, 0); int maxLen3 = longestCommonSubstringLength(s1, s2, i - 1, j, 0);
def longestCommonSubstringLength(s1, s2, i, j, maxLen): # Base Case if i <= 0 or j <= 0: return maxLen
maxLen1 = maxLen
# If the characters are same in both the string then increase maxLen by 1 if s1[i - 1] == s2[j - 1]: maxLen1 = longestCommonSubstringLength(s1, s2, i - 1, j - 1, maxLen + 1)
maxLen2 = longestCommonSubstringLength(s1, s2, i, j - 1, 0) maxLen3 = longestCommonSubstringLength(s1, s2, i - 1, j, 0)
function longestCommonSubstringLength(s1, s2, i, j, maxLen) { // Base Case if (i <= 0 || j <= 0) { return maxLen; }
let maxLen1 = maxLen;
// If the characters are same in both the string then increase maxLen by 1 if (s1[i - 1] === s2[j - 1]) { maxLen1 = longestCommonSubstringLength(s1, s2, i - 1, j - 1, maxLen + 1); }
const maxLen2 = longestCommonSubstringLength(s1, s2, i, j - 1, 0); const maxLen3 = longestCommonSubstringLength(s1, s2, i - 1, j, 0);
<?php function longestCommonSubstringLength($s1, $s2, $i, $j, $maxLen) { // Base Case if ($i <= 0 || $j <= 0) { return $maxLen; }
$maxLen1 = $maxLen;
// If the characters are same in both the string then increase maxLen by 1 if ($s1[$i - 1] == $s2[$j - 1]) { $maxLen1 = longestCommonSubstringLength($s1, $s2, $i - 1, $j - 1, $maxLen + 1); }
public class Program { public static int longestCommonSubstringLength(string s1, string s2, int i, int j, int maxLen) { // Base Case if (i <= 0 || j <= 0) { return maxLen; }
int maxLen1 = maxLen;
// If the characters are same in both the string then increase maxLen by 1 if (s1[i - 1] == s2[j - 1]) { maxLen1 = longestCommonSubstringLength(s1, s2, i - 1, j - 1, maxLen + 1); }
int maxLen2 = longestCommonSubstringLength(s1, s2, i, j - 1, 0); int maxLen3 = longestCommonSubstringLength(s1, s2, i - 1, j, 0);
// Driver code public static void Main() { string s1 = "qdef"; string s2 = "defq"; int maxLen = 0;
Console.WriteLine("The Length of Longest Common Substring is " + longestCommonSubstringLength(s1, s2, s1.Length, s2.Length, maxLen)); } }
Output
Time complexity: O(3^(M+N))
As the function is called 3 times, and it checks for all the characters, i.e. M + N,
where M and N are the lengths of the respective strings. A recursive tree is created.
Space complexity: O(M+N)
The recursive function takes a space of N+M for auxiliary stack space during recursion.
3. Dynamic Programming Approach
In the recursive approach, there are a lot of duplicate calls for the same state i.e. for the same values of i, j, and maxLen, which leads to exponential time complexity.
The above solution can be modified, and the results can be stored in a table to avoid the repetitive calling of recursive functions for the same state. Thus dynamic programming approach will significantly reduce the time complexity as compared to the recursive solution.
The algorithm will be similar to the above recursive approach. The only difference is that instead of repetitively calling the recursive function, we will store the length of the longest common substring for each state in a table.
Algorithm
Create a matrix dp of size (m+1) x (n+1).
Initialize the first row and the first column of the matrix dp with zeros.
Initialize two variables: maxLength to store the length of the longest common substring and endIndex to store the ending index of the longest common substring.
Start a nested loop.
If i == 0 or j == 0, then dp[i][j] = 0.
If str1[i] == str2[j], then dp[i][j] = 1 + dp[i – 1][j – 1].
Keep track of the maximum value and return it.
Implementation
C
C++
Java
Python
Javascript
PHP
C#
C
#include <stdio.h> #include <string.h>
int lcsDP(char* string_1, char* string_2, int M, int N) { int dp[M + 1][N + 1]; int count = 0;
for (int i = 0; i <= M; i++) { for (int j = 0; j <= N; j++) { if (i == 0 || j == 0) dp[i][j] = 0; else if (string_1[i - 1] == string_2[j - 1]) { dp[i][j] = dp[i - 1][j - 1] + 1; count = (count > dp[i][j]) ? count : dp[i][j]; } else dp[i][j] = 0; } } return count; }
def lcsDP(string_1, string_2, M, N): dp = [[0] * (N + 1) for _ in range(M + 1)] count = 0
for i in range(1, M + 1): for j in range(1, N + 1): if string_1[i - 1] == string_2[j - 1]: dp[i][j] = dp[i - 1][j - 1] + 1 count = max(count, dp[i][j]) else: dp[i][j] = 0
Console.WriteLine("Length of Longest Common Substring is " + lcsDP(str1, str2, str1.Length, str2.Length)); } }
Output
Time Complexity: O(M*N)
where M and N are the lengths of the given string1 and string2. The nested loop for filling the entries of the 2-D dp table requires a time on O(M*N).
Space Complexity: O(M*N)
where M and N are the lengths of the given string1 and string2. 2-D dp array of size [M+1][N+1] is required.
Frequently asked questions
How do you find the longest common substring?
By using algorithms like dynamic programming, iterate through possible substrings of two strings to find the longest sequence present in both.
What is the difference between LCS and longest common substring?
LCS (Longest Common Subsequence) considers non-contiguous elements, while longest common substring requires elements to be contiguous.
What is the recursive method for the longest common substring?
Recursively compare characters of two strings. If they match, recursively check the next characters; otherwise, start checking from the next positions.
What is the complexity of finding longest common substring?
Finding the longest common substring can be done in O(m*n) time using dynamic programming. For each substring of both strings, determine the length of the longest common suffix and record that length in a table. Use the previously computed values to determine the answer of larger suffixes in O(1) time.
What is the naive approach to the longest common substring?
The naive approach for finding the longest common substring between two strings involves comparing all possible substrings of both strings. It has a time complexity of O(m^2 * n), making it impractical for large inputs.
Conclusion
In this article, we discussed the Longest Common Substring Problem, the various approaches to solve this problem programmatically, and the time and space complexities. We find out how to use extra space and drastically reduce the time complexity from exponential to polynomial. If you want to practice this problem, then you can visit Code360. Recommended problems -