Data structures & algorithms (Beginner to Intermediate)

Free guided path13 chapters99+ problems

Earn badges and level up

Introduction

This blog will discuss the various approaches to solving the Longest Common Substring Problem. Before jumping into the solution, let’s first recall what a substring is and then understand the Longest Common Substring problem.

A substringis a contiguous part of the string. Its length will be less than or equal to the original string length. Let’s get into the article now.

Problem Statement

In this problem, we will be given two strings. We have to find the length of the longest substring, which is common in the two given strings.

Example

Assume the two given strings are:

s1 = “qdef”

s2 = “defq”

“q”, “d”, “e”, “f”, “de”, “ef”, and “def” are the common substrings between s1 and s2. “def” is the longest common substring. So, the length of the longest common substring of s1 and s2 will be “3”.

Get the tech career you deserve, faster!

Connect with our expert counsellors to understand how to hack your way to success

User rating 4.7/5

1:1 doubt support

95% placement record

Akash Pal

Senior Software Engineer

326% Hike After Job Bootcamp

Himanshu Gusain

Programmer Analyst

32 LPA After Job Bootcamp

After Job Bootcamp

1. Simple Approach

The simplest approach is to generate all possible substrings of one string. Then check if each of them is a substring of the other string. This approach has a time complexity of O(n^3). n is the length of the longest input string. It is not efficient for large input sizes. It is mainly used for small instances or as a baseline for comparison.

C++ Code of Simple Approach

#include <iostream>
#include <string.h>
using namespace std;
//simple approach
int LCSubstr(string str1 ,string str2){
int result = 0;
/*
loop to find all substrings of string 1.
check if current substring matches.
maximize the lenght of common substring
*/
for (int i = 0; i < str1.length(); i++) {
for (int j = 0; j < str2.length(); j++) {
int k = 0;
while ((i + k) < str1.length() &&
(j + k) < str2.length() && str1[i + k] == str2[j + k]){
k = k + 1;
}
result = max(result, k);
}
}
return result;
}
// Driver code
int main(){
string X = "qdef";
string Y = "defq";
cout << "Length of Longest Common Substring is " << LCSubstr(X, Y);
return 0;
}

Output

Java Code of Simple Approach

// This is the Simple Approach.
class Main{
static String str1, str2;
public static int lcsubstr(String str1, String str2) {
int result = 0;
for(int i = 0; i < str1.length(); i++) {
for (int j = 0; j < str2.length(); j++) {
int k = 0;
while ((i + k) < str1.length() &&
(j + k) < str2.length() && str1.charAt(i + k) == str2.charAt(j + k)) {
k = k + 1;
}
result = Math.max(result, k);
}
}
return result;
}
// Driver code
public static void main(String[] args) {
str1 = "qdef";
str2 = "defq";
System.out.println("Length of Longest Common Substring is "+ lcsubstr(str1, str2));
}
}

Output

Python Code of Simple Approach

def lcs(str1, str2):
result = 0
for i in range(len(str1)):
for j in range(len(str2)):
k = 0
while ((i + k) < len(str1) and (j + k) < len(str2)
and str1[i + k] == str2[j + k]):
k = k + 1
result = max(result, k)
return result
def main():
s1 = "qdef"
s2 = "defq"
print("The Length of Longest Common Substring is", lcs(s1, s2))
if __name__ == '__main__':
main()

Output

2. Recursive Approach

This problem can be solved using Recursion. We can consider one by one each character in the two strings. The following two cases will arise:

1. The two characters from the two strings are the same:

In this case, increase the length of the longest common substring and move to the next characters in both strings.

2. The two characters from the two strings are different:

In this case, take the maximum of the current length of the longest common string and the two lengths of the longest common substring from two cases - one by moving to the next character in only the first string and another by moving to the next character in only the second string.

Let’s understand this algorithm in detail and its C++ code.

Algorithm

Step 1. Create a recursive function “longestCommonSubstringLength()” which takes input

s1 - First given string

s2 - Second given string

i - It denotes the current length of s1

j - It denotes the current length of s2

maxLen - For storing the length of the longest common substring

Step 2. Write the base case of the recursive function. If i <= 0 or j <= 0 i.e. if the current length of any of the string becomes 0, then simply return ‘maxLen’.

Step 3. Now check if the characters at index ‘i-1’ in the first string and ‘j-1’ in the second string are the same, then increase the value of ‘maxLen’ by one and call the function for the smaller sub-problem.

Step 4. Else If the two characters are not the same, then update the value as the maximum of ‘maxLen’ and the value returned by calling the recursive function for two smaller sub-problems.

Step 5. Finally, return ‘maxLen,’ i.e., the length of the longest common substring.

Implementation in C++

#include <bits/stdc++.h>
using namespace std;
int longestCommonSubstringLength(string s1, string s2, int i, int j, int maxLen){
// Base Case
if (i <= 0 || j <= 0) {
return maxLen;
}
int maxLen1 = maxLen;
// If the characters are same in both the string then increase maxLen by 1
if (s1[i - 1] == s2[j - 1]) {
maxLen1 = longestCommonSubstringLength(s1, s2, i - 1, j - 1, maxLen + 1);
}
int maxLen2 = longestCommonSubstringLength(s1, s2, i, j - 1, 0);
int maxLen3 = longestCommonSubstringLength(s1, s2, i - 1, j, 0);
return max(maxLen1, max(maxLen2, maxLen3));
}
int main(){
string s1 = "qdef";
string s2 = "defq";
int maxLen=0;
cout<<"The Length of Longest Common Substring is "<<
longestCommonSubstringLength(s1, s2, s1.length(), s2.length(), maxLen)<<endl;
return 0;
}

Output

Implementation in Java

// This is the recursive Approach
class Main {
static String string_1, string_2;
static int LCSRecursive(int i, int j, int count) {
// Base Case
if (i == 0 || j == 0) {
return count;
}
/*
If the last character is same then the
next character need to be compared
increase the count by 1
*/
if (string_1.charAt(i - 1) == string_2.charAt(j - 1)) {
count = LCSRecursive(i - 1, j - 1, count + 1);
}
/*
if the characters are not the same then you need to
decrease the i once and update the count to 0
In the other case we decrement the j and update the count to 0
Keep track of maximum value.
*/
count = Math.max(count, Math.max(LCSRecursive(i, j - 1, 0), LCSRecursive(i - 1, j, 0)));
return count;
}
// Driver code
public static void main(String[] args) {
string_1 = "qdef";
string_2 = "defq";
System.out.println("The Length of Longest Common Substring is "+ LCSRecursive(string_1.length(), string_2.length(), 0));
}
}

As the function is called 3 times, and it checks for all the characters, i.e. M + N,

where M and N are the lengths of the respective strings. A recursive tree is created.

Space complexity: O(M+N)

The recursive function takes a space of N+M for auxiliary stack space during recursion.

3. Dynamic Programming Approach

In the recursive approach, there are a lot of duplicate calls for the same state i.e. for the same values of i, j, and maxLen, which leads to exponential time complexity.

The above solution can be modified, and the results can be stored in a table to avoid the repetitive calling of recursive functions for the same state. Thus dynamic programming approach will significantly reduce the time complexity as compared to the recursive solution.

The algorithm will be similar to the above recursive approach. The only difference is that instead of repetitively calling the recursive function, we will store the length of the longest common substring for each state in a table.

Algorithm

Create a matrix dp of size (m+1) x (n+1).

Initialize the first row and the first column of the matrix dp with zeros.

Initialize two variables: maxLength to store the length of the longest common substring and endIndex to store the ending index of the longest common substring.

Start a nested loop.

If i == 0 or j == 0, then dp[i][j] = 0.

If str1[i] == str2[j], then dp[i][j] = 1 + dp[i – 1][j – 1].

Keep track of the maximum value and return it.

C++ Implementation

#include <iostream>
#include <string.h>
using namespace std;
int lcsDP(string string_1, string string_2, int M, int N) {
// create a 2-D array named dp and intitialise all the values as zero.
int dp[M + 1][N + 1];
int count = 0;
/*
intitialise a nested loop
Check the base Case
*/
for (int i = 0; i <= M; i++) {
for (int j = 0; j <= N; j++) {
if (i == 0 || j == 0)
dp[i][j] = 0;
else if (string_1[i - 1] == string_2[j - 1]) {
dp[i][j] = dp[i - 1][j - 1] + 1;
count = max(count, dp[i][j]);
}
else
dp[i][j] = 0;
}
}
return count;
}
// Driver code
int main() {
string str1 = "qdef";
string str2 = "defq";
cout << "Length of Longest Common Substring is " << lcsDP(str1, str2, str1.length(),str2.length());
return 0;
}

Output

Java Implementation

class Main {
// function which returns the length of longest common substring
static int lcsDP(String string_1, String string_2, int M, int N) {
// create a 2-D array named dp and intitialise all the values as zero.
int dp[][] = new int[M + 1][N + 1];
int count = 0;
/*
fill the values in the dp array in bottom-up manner
initialize a nested loop
if the characters at string_1 and string_2 matches
keep track of maximun element
if the characters did not match then the value is zero
*/
for (int i = 0; i <= M; i++) {
for (int j = 0; j <= N; j++) {
if (i == 0 || j == 0) {
dp[i][j] = 0;
}
else if (string_1.charAt(i - 1) == string_2.charAt(j - 1)) {
dp[i][j] = dp[i - 1][j - 1] + 1;
count = Integer.max(count, dp[i][j]);
}
else {
dp[i][j] = 0;
}
}
}
return count;
}
// Driver code
public static void main(String[] args) {
String str1 = "qdef";
String str2 = "defq";
System.out.println("The Length of the Longest Common Substring is " + lcsDP(str1, str2, str1.length(),str2.length()));
}
}

Output

Python Implementation

def lcsDP(string_1, string_2, M, N):
# create a 2-D array named dp and intitialise all the values as zero.
dp = [[0 for k in range(N+1)] for l in range(M+1)]
# To store the length of the longest common substring
count = 0
# initialize a nested loop
# if the characters at string_1 and string_2 matches
# keep track of maximun element
# if the characters did not match then the value is zero
for i in range(M + 1):
for j in range(N + 1):
if (i == 0 or j == 0):
dp[i][j] = 0
elif (string_1[i-1] == string_2[j-1]):
dp[i][j] = dp[i-1][j-1] + 1
count = max(count, dp[i][j])
else:
dp[i][j] = 0
return count
# Driver code
if __name__ == "__main__":
string_1 = "qdef"
string_2 = "defq"
print("The Length of Longest Common Substring is", lcsDP(string_1, string_2, len(string_1), len(string_2)))

Output

Time Complexity: O(M*N)

where M and N are the lengths of the given string1 and string2. The nested loop for filling the entries of the 2-D dp table requires a time on O(M*N).

Space Complexity: O(M*N)

where M and N are the lengths of the given string1 and string2. 2-D dp array of size [M+1][N+1] is required.

Frequently asked questions

What is the complexity of finding longest common substring?

Finding the longest common substring can be done in O(m*n) time using dynamic programming. For each substring of both strings, determine the length of the longest common suffix and record that length in a table. Use the previously computed values to determine the answer of larger suffixes in O(1) time.

What is the naive approach to the longest common substring?

The naive approach for finding the longest common substring between two strings involves comparing all possible substrings of both strings. It has a time complexity of O(m^2 * n), making it impractical for large inputs.

What is the real life application of the longest common subsequence?

The Longest common subsequence is used in plagiarism detection, version control systems, bioinformatics for DNA sequence alignment, data comparison, speech recognition, spell checkers, music analysis, image processing, file comparison, and network routing to identify common patterns in data sequences.

Conclusion

In this article, we discussed the Longest Common Substring Problem, the various approaches to solve this problem programmatically, and the time and space complexities. We find out how to use extra space and drastically reduce the time complexity from exponential to polynomial. If you want to practice this problem, then you can visit Coding Ninjas Studio. Recommended problems -