Code360 powered by Coding Ninjas X Naukri.com. Code360 powered by Coding Ninjas X Naukri.com
Table of contents
1.
Introduction
2.
Brute Force Approach
2.1.
Algorithm
2.2.
C++
2.2.1.
Algorithm Complexity: 
3.
Top-Down Approach
3.1.
Algorithm 
3.2.
C++
3.2.1.
Algorithm Complexity:
4.
Bottom-Up Approach
4.1.
Algorithm 
4.2.
C++
4.2.1.
Algorithm Complexity: 
5.
Frequently Asked Questions
5.1.
What is meant by longest repeating subsequence?
5.2.
What is the longest repeating subsequence in pi?
5.3.
What is longest continuous common subsequence?
5.4.
What is the meaning of longest palindromic subsequence?
6.
Conclusion
Last Updated: Aug 27, 2024
Medium

Longest Repeating Subsequence

Author Riya
1 upvote

Introduction

This blog will discuss the various approaches to solve the Longest Repeating Subsequence problem. Before jumping into the problem, let’s first recall what a subsequence is and then understand the Longest Repeating Subsequence problem.

longest repeating subsequence

A subsequence of a string is generated by deleting zero, one, or more characters of the original string without disturbing the relative positions of the remaining characters.

In this problem, we will be given a string. We have to find the length of the longest subsequence occurring at least two times (i.e., the subsequence is repeated in the given string)  in the given string such that the two subsequences shouldn’t contain any character with the same index in the original string.

Let’s take an example to understand it further:

Let the given string be S= “XXYZYVWW” and assume 0-based indexing. 

In the above string, we can see that the repeating subsequences are ‘X,’ ‘Y,’ and ‘W,’ “XYW.” So, the longest repeating subsequence will be “XYW,” and the length of the longest Repeating Subsequence will be 3. We can see that there are two occurrences of the subsequence “XYZ,” one will be the subsequence formed by the 0th, 2nd, and 6th characters (shown in blue color in the table below), and the other will be the subsequence formed by 1st,  4th, and 7th characters (shown in red color in the table below).

X

X

Y

Z

Y

V

W

W

 

If we understand this question carefully, we can see that this question is just a variation of the Longest Common Subsequence Problem (LCS). Here, we have to find the length of the longest common subsequence of and where is the given string, with the condition that while checking for the same characters in the two strings, we have to take care that the two same characters shouldn’t be at the same index in the given string.

Brute Force Approach

Let’s assume that the given string is S, and we have to find the Longest Repeating Subsequence in S. As discussed above, this problem is a modification of the Longest Common Subsequence Problem. Now this problem is reduced to find the LCS (SS) with the condition that not any character in the resulting subsequence should be at the same index in both the S, where LCS (s1, s2) is the function to find the length of the longest common subsequence of the two strings s1 and s2.

So, longestRepeatingSubsequence(S) = LCS (SS) with a restriction that the resulting subsequence shouldn’t contain any character with the same index in their corresponding original string.

We can understand this by taking an example:

Let S= “XXYZYVWW”

LCS(S, S) = S, as the whole string is the same but if we consider the restriction that is written above then, the two ‘X’s shown in red, two ‘Y’s shown in blue and two ‘W’s shown in orange will be in the common subsequence from the two strings. Thus the LCS(S, S) with the condition that not any character in the resulting subsequence should be at the same index in both the S,  will be “XYW”. We can see that each character in “XYW” is at different indexes in both the S as shown below.

S= “XXYZYVWW”

S= “XXYZYVWW

 

Thus the code for this problem will be similar to the LCS problem. We will find LCS(S, S) with the above described condition. So, we need to consider each character in the two strings one by one. (Please note that the two strings are same here)

So, on considering one by one each character in the two strings, two situations will arise:

  1. The two characters are identical and are not occurring at the same index in their respective strings.
  2. The two characters either occur at the same index in their respective strings or differ.

If the first condition arises, we will increase the longest repeating subsequence by one and move to the next character of both strings. If the second condition occurs, we will take the maximum of the two cases - one by moving to the next character in the first string and the other by moving to the next character in the second string.

Algorithm

Step 1. Create a recursive function ‘longestRepeatingSubsequence()’ that will accept the below parameters -

  • ‘str’: The given string for which we have to calculate the length of the longest repeating subsequence.
  • ‘n’: Length of the given string 
  • ‘pos1’: The variable denoting the current character of ‘str1’.
  • ‘pos2’: The variable denoting the current character of ‘str2’.
     

Step 2. Then write the base case of the recursive function i.e., if ‘pos1’ or ‘pos2’ reaches the end of the string, then we can’t add more characters to the longest repeating subsequence, so return 0.

Step 3. Now check for the two conditions described in the brute force approach section and do accordingly.

  • C++

C++

#include <bits/stdc++.h>

using namespace std;

int longestRepeatingSubsequence(string str, int pos1, int pos2, int n)

{



        /*

           base case

           if pos1 or pos2 reaches n, it means there are no further characters

           to consider, so return 0

       */       

        if(pos1 == n || pos2 == n) return 0;



        /*

           If the two characters are equal and not occurring at the same index,

           then increase the length of the longest repeating subsequence by 1

        */

        if(pos1 != pos2 && str[pos1] == str[pos2]) return 1+longestRepeatingSubsequence(str, pos1 + 1, pos2 + 1, n);

        else return max(longestRepeatingSubsequence(str, pos1 + 1, pos2, n), longestRepeatingSubsequence(str, pos1, pos2 + 1, n));

}



int main()

 {

string str="XXYZYVWW";

cout<<"The length of the Longest Repeating Subsequence in the given  string is: "<<longestRepeatingSubsequence(str, 0, 0, 8)<<endl;

return 0;

 }


Output:

The length of the Longest Repeating Subsequence in the given string is: 3

Algorithm Complexity: 

Time Complexity: O(2 ^ N)

In every recursive call ‘longestRepeatingSubsequence()’, we are making two more calls to the same function, so calls are increasing at a rate of 2 until ‘pos1’ or ‘pos2’ reaches N. There will be, at most, ‘2 ^ N’ calls in this code. Therefore the overall time complexity will be O(2 ^ N), where ‘N’ is the length of the given string.

Space Complexity: O(1) 

As we are using constant space, the overall space complexity will be O(1).

Also check out - Substr C++

Top-Down Approach

In the above approach, there are a lot of duplicate calls for the same state ‘pos1’ and ‘pos2’, which leads to exponential complexity. So, we will use a dynamic programming algorithm to store the states to reduce the time complexity.

We can avoid calling a recursive function again for the same state by using a two-dimensional vector and storing the state in that vector. If we encounter the same state during execution, we will return from the vector instead of recomputing. This approach will drastically decrease the time complexity.  We will need to make some changes in the above recursive code to make it a top-down dynamic programming solution.

Top-Down Approach

Algorithm 

Step 1. Create a two-dimensional vector ‘dp’ of size (N + 1) x (N + 1), where ‘N’ is the length of the given string. Initialize all the elements of the vector with ‘-1’ and call the ‘longestRepeatingSubsequence()’ function, which will now have one extra argument - two-dimensional vector ‘dp’ other than those that were there in the recursive solution.

We have taken the size N x N as pos1 and pos2 are varying from 0 to N.

Step 2. Base case will be if pos1 = N or pos2 = N, then return 0. 

Step 3. Check if the value is present in the vector’ dp’. If yes, then return it.

Step 4. If the value is not present in the ‘dp’ vector, then call the recursive function as in the brute force solution, but here we have to do one more thing: at each step, we have to store the calculated result in our ‘dp’ vector.

  • C++

C++

#include <bits/stdc++.h>

using namespace std;

int longestRepeatingSubsequence(string str, int pos1, int pos2, int n, vector<vector<int>>& dp)

     {



         /*

             base case

             if pos1 or pos2 reaches n, it means there are no further

             characters to consider, so return 0

         */

         if(pos1 == n || pos2 == n) return 0;



         /*

             If this state is already computed, then directly return it from

             the “dp” vector

         */

         if(dp[pos1][pos2] != -1) return dp[pos1][pos2];

 

         if(pos1 != pos2 && str[pos1] == str[pos2]) return dp[pos1][pos2] = 1 + longestRepeatingSubsequence(str, pos1 + 1, pos2 + 1, n, dp);

         else return dp[pos1][pos2] = max(longestRepeatingSubsequence(str, pos1 + 1, pos2, n, dp),longestRepeatingSubsequence(str,pos1,pos2+1,n,dp));

     }



int main()

 {

string str = "XXYZYVWW";

vector<vector<int>> dp(9,vector<int>(9,-1));

cout<<"The length of the Longest Repeating Subsequence in the given string is: " <<longestRepeatingSubsequence(str, 0, 0, 8, dp)<<endl;

return 0;

 }


Output:

The length of the Longest Repeating Subsequence in the given string is: 3

Algorithm Complexity:

Time Complexity: O(N ^ 2)

There will be ‘N * N’ calls in a recursive function as we are using a 2D vector, and if the value is present in this array, then we are not making duplicate calls, so there will be utmost N * N combinations. Therefore the overall time complexity is O(N ^ 2), where ‘N’ is the length of the given string.

Space Complexity: O(N ^2)

As we are using a 2D vector of size ‘N x N’, the overall space complexity is O(N^2), where N is the length of the given string.

Bottom-Up Approach

In the above approach, the overhead of the internal stack is being used in recursion, and we can eliminate it by using an iterative approach. We will take a 2D array and will start populating it from the bottom ( 0th index ) to top ( Nth index ); hence by using this approach, we can make our algorithm more memory efficient. 

Algorithm 

Step 1. Create a function that will accept the below parameters:

  1. ‘str’: The given string for which we have to calculate the length of the longest repeating subsequence.
  2. ‘n’: Length of the given string.

Step 2. Create a 2D array ‘dp’ of size (N+1) x (N+1). 

Step 3. Run two for loops to consider all the characters one by one to find LCS(str, str) with the condition that no two characters in the resulting subsequence will have the same index in their corresponding original string, resulting subsequence will be our longest repeating subsequence.

Step 4: If i=0 or j=0, i.e if the length of the string will be 0, then dp[i][j]=0

Step 4: Then check if the two characters are not at the same index and are identical, then increase the length of the resulting subsequence by 1, else take the maximum of the two cases -  one by moving to the next character in the first string and the another by moving to the next character in the second string.

Step 5:  Return dp[N][N]

  • C++

C++

#include <bits/stdc++.h>

using namespace std;

int longestRepeatingSubsequence(string str, int n)

{

   int dp[n+1][n+1];

   for(int i = 0; i <= n; i++)

     {

         for(int j = 0; j <= n; j++)

          {

/*

                  if length of the string will be 0, then length

                  of longest repeating subsequence will be 0  

              */              

              if(i == 0 || j == 0) dp[i][j] = 0;



              /*

                  If the two characters are not at same position and are equal,

                  then we will increase the length of the longest repeating

                  subsequence by one, else we will take maximum of the two

                  cases as described in the algorithm section

              */                   

              else if(i != j && str[i-1] == str[j-1]) dp[i][j] = 1 + dp[i-1][j-1];

              else dp[i][j] = max(dp[i-1][j], dp[i][j-1]);

          }

     }

   return dp[n][n];

}



int main()

 {

string str="XXYZYVWW";

vector<vector<int>> dp(9, vector<int>(9, -1));

cout<<"The length of the Longest Repeating Subsequence in the given string is: " <<longestRepeatingSubsequence(str, 8) <<endl;

return 0;

 }


Output:

The length of the Longest Repeating Subsequence in the given string is: 3

Algorithm Complexity: 

Time Complexity: O(N ^ 2)

There will be ‘N * N’ calls in a 2D array using the nested ‘for’ loop. Therefore the overall time complexity is O(N ^2), where N is the length of the given string.

Space Complexity: O(N ^ 2)

As we are using a 2D array of size ‘N * N’, the overall space complexity is O(N^2), where N is the length of the given string.

Frequently Asked Questions

What is meant by longest repeating subsequence?

The longest repeating subsequence is the longest sequence within a string that repeats at least twice without overlapping. It helps identify patterns and is used in areas like DNA sequence analysis and text compression.

What is the longest repeating subsequence in pi?

This refers to the longest sequence of digits that repeats within the infinite decimal expansion of Pi. Finding such sequences can be challenging due to Pi's non-repeating, non-terminating nature.

What is longest continuous common subsequence?

The longest continuous common subsequence is the longest sequence of characters that appears in the same order and without any interruptions in both strings. It's useful in comparing DNA, texts, or patterns.

What is the meaning of longest palindromic subsequence?

The longest palindromic subsequence is the longest sequence within a string that reads the same forwards and backwards. It's a common problem in computational biology and string processing, offering insights into symmetry and structure.

Conclusion

In this article, we discussed what the Longest Repeating Subsequence Problem is, discussed the various approaches to solve this problem programmatically, and discussed the time and space complexities. We find out how to use extra space and drastically reduce the time complexity from exponential to polynomial.  

Recommended Readings: 

You can refer to our blogs on Code360 to upskill yourself.

Live masterclass