Table of contents
1.
Introduction
2.
Problem Statement
3.
Approach
4.
Pseudocode
5.
Code in Python3
5.1.
Time Complexity
5.2.
Space Complexity
6.
Frequently Asked Questions
7.
Key Takeaways
Last Updated: Mar 27, 2024

Repeated Substring Pattern

Author HET FADIA
1 upvote
Career growth poll
Do you think IIT Guwahati certified course can help you in your career?

Introduction

This article aims to familiarise you with the string algorithms used for pattern matching. We will be solving the Repeated Substring Pattern using KMP and the z algorithm.

To brush up on your knowledge of the Knuth Morris Pratt algorithm, you can read the article Knuth Morris Pratt on Coding Ninjas Studio.

Let’s see the problem statement in the next section.

Problem Statement

Given a string, you need to print True if you can construct the string adding repeated copies of one of its proper substrings. 

Note: You can’t take the entire string as its substring here.

More precisely, you are given a string S and you have to output True if there exists a substring S[i:j] which added multiple times becomes S otherwise, you have to output False.

Let us see a few examples for a better understanding.

Input

abaabaabaaba

Output

True

Explanation:

We can make the input string by taking its substring “aba” 4 times. More precisely S= “aba” + “aba” + “aba” + “aba” 

Input

abacabac

Output

True

Explanation: S= “abac” + “abac”

Input

abac

Output:

False

Explanation: No such substring of “abac” exists, which when added multiple times becomes “abac”. Note that here we can’t take the whole string as a substring.

Note: We will use the term “ lps” which means longest proper prefix which is also a suffix. For more information about lps, check out KMP algorithm.

Approach

The approach is simple. If we find out the length of the substring which gets repeated in the given string then we can easily calculate the answer.

Now the length of the required substring = N - lps[N-1]. This is because of S[0… N-length] matches with S[length …N]. From this information, we get to know that S[i]=S[i-length] for all i where length<=i<N. Hence, we can derive that S[0… length] is equal to S[length+1 .. 2*length] and so on.

So, we can know that S[0.. length] when added multiple times results in S.

Let us see this through an example:

Consider string a= “abaabaaba”.

Figure 1: Shows the matching for the string “abaabaaba” in the LPS array

Thus lps_array is [0, 0, 1, 1, 2, 3, 4, 5, 6].

Length = 9-lps[9-1]=9-6=3. Thus the substring has length 3.

Similarly for the z_algorithm when z[i]= n-i, the suffix starting from i is S[i.. N-1]. Thus S[0…N-i] is equal to S[i.. N-1]. Thus from this information we get to know that S[i]=S[i-length] for all i where length<=i<N.

Pseudocode

Approach 1: pseudocode to find if a string is repeated substring

DEFINE FUNCTION z_function(S):
    l=0
    r=0
    SET n TO len(S)
    SET z TO [0 FOR i IN range(n)]
    FOR i IN 1 to n-1:
        IF i<=r:
            z[i]=min(r-i+1,z[i-l])
        WHILE i+z[i]<n and S[z[i]]==S[i+z[i]]:
            z[i]+=1
        IF i+z[i]-1>r:
            l=i
            r=i+z[i]-1
    SET z[0] TO n
    RETURN z
DEFINE FUNCTION is_repeated_subarray(s):
    z=z_function(s)
    n=len(s)
    length=n
    FOR i IN range(1,n):
        IF z[i]==n-i:
            length=i
            break
    IF len(s)%length!=0 or length==len(s):
        RETURN False
    FOR i IN range(length,len(s)):
        IF s[i]!=s[i-length]:
            RETURN False
    RETURN True

Approach 2: Pseudocode for z function algorithm

DEFINE FUNCTION get_lps(search):
    set lps_arr to [0 FOR i IN range(len(search))]
    previous=0
    curr_index=1
    WHILE curr_index is less than length of the (lps_arr):
        IF search[curr_index]==search[previous]:
            previous+=1
            lps_arr[curr_index]=previous
            curr_index+=1
        ELSEIF previous==0:
            lps_arr[curr_index]=0
            curr_index+=1
        ELSE:
            previous=lps_arr[previous-1]
    RETURN lps_arr
DEFINE FUNCTION is_repeated_subarray(s):
    lps=get_lps(s)
    length=len(s)-lps[-1]

    IF len(s)%length!=0 or length==len(s):
        RETURN False
    FOR i IN range(length,len(s)):
        IF s[i]!=s[i-length]:
            RETURN False
    RETURN True

Also check out - Substr C++

Code in Python3

Approach 1: Python code to find if it is a repeated substring pattern using lps( longest proper prefix which is also a suffix) array

def get_lps(search):
    # lps array is the longest proper prefix which is also a suffix
    lps_arr=[0 for i in range(len(search))]
    # previous is the last longest proper prefix which is also a suffix
    previous=0
    # Here we are checking proper prefix so that we start with the index 1
    # as the prefix can't be itself
    curr_index=1
    while curr_index<len(lps_arr):
        if search[curr_index]==search[previous]:
            """This means that we have found a match for the last longest lps
            and the current prefix
            Thus we update the lps array"""
            previous+=1
            lps_arr[curr_index]=previous
            curr_index+=1
        elif previous==0:
            """This means that we have not found a match for the current prefix with the
            last lps.
            As previous=0 then there does not exist previous longest lps."""
            # no lps[curr index] exist so lps[curr_index]=0
            lps_arr[curr_index]=0
            curr_index+=1
        else:
            # we replace current_index
            previous=lps_arr[previous-1]
    return lps_arr
def is_repeated_subarray(s):
    lps=get_lps(s)
    length=len(s)-lps[-1]
   
    if len(s)%length!=0 or length==len(s):
        return False
    for i in range(length,len(s)):
        if s[i]!=s[i-length]:
            return False
    return True

s="abaabaabaaba"
answer=is_repeated_subarray(s)
print(answer)
You can also try this code with Online Python Compiler
Run Code

Approach 2: Code to find if the string is repeated substring using z functions.

def z_function(S):
    l=0
    r=0
    n = len(S)
    z = [0 for i in range(n)]
    for i in range(1, n):
        if i<=r:
            z[i]=min(r-i+1,z[i-l])
        while i+z[i]<n and S[z[i]]==S[i+z[i]]:
            z[i]+=1
        if i+z[i]-1>r:
            l=i
            r=i+z[i]-1
    z[0] = n
    return z
def is_repeated_subarray(s):
    z=z_function(s)
    n=len(s)
    length=n
    for i in range(1,n):
        if z[i]==n-i:
            length=i
            break
    if len(s)%length!=0 or length==len(s):
        return False
    for i in range(length,len(s)):
        if s[i]!=s[i-length]:
            return False
    return True
s="abaabaabaaba"
res=is_repeated_subarray(s)
print(res)
You can also try this code with Online Python Compiler
Run Code

Input

abaabaabaaba

Output

True

Time Complexity

The time complexity for both approaches is O(N).

For approach 1, we are finding lps_array, which takes O(N) time. From that LPS array, we know the length of the substring.

For approach 2, the z-function also takes O(N) time. Similarly, we can know the length of the substring from the z_array.

After knowing the length, we can check if the string can be made from the substring in O(N) time.

Space Complexity

The space complexity for both approaches is O(N).

For the LPS approach, we are making an LPS array of size N. 

For the z-function approach, we are making a Z array of size N.

 

Also see, Morris Traversal for Inorder.and Rabin Karp Algorithm

 

Frequently Asked Questions

1. What is the time complexity of the naive algorithm?

The time complexity of the naive algorithm is O(N*N). The reason is we are checking for every s[0..i] where i= 1,2, …N whether s can be made from s[0..i].

2. What is the space complexity of the naive algorithm?

The space complexity for the naive approach will be O(1) as we are using no extra space.

3. What is the z algorithm, and how can we solve this question using z_function?

The z function returns us an array Z. Z[i] represents the length of the longest common prefix for S[i… N-1] and S. Thus, if Z[i]= N-i means that S[i… N-1] and S is N-i have the longest common prefix of N-i. Since the first string is of length N-i, the longest common prefix of S[i … N-1] and S is N-i. Thus S[i]=S[i-length] for all i where length<=i<N. Thus S[: length] is the repeating substring.

4. How can we optimise the naive algorithm?

The substring is repeated multiple times and added to form the string. Thus the string can be formed from the substring only when the length of the substring divides the length of the string. Thus the brute force can be optimised by only checking on those substrings whose length divides the size of the given string. 

5. Why does making LPS array make the code significantly efficient?

LPS array precalculates the longest proper prefix, which is also a suffix. Using this, we get the length of the repeating substring. Thus it helps efficiently to reduce the time complexity.

Key Takeaways

The article helps us understand how the KMP and z function to know if a string can be made by adding repeated copies of one of its substrings. Comments have been added to the code for better understanding. You can also copy the code and run it on your system on multiple inputs to understand the approach better.

Check out this problem - Longest Common Prefix

Check out Coding Ninjas Studio to learn more about competitive programming, DSA, attempt mock tests, interview preparation and much more.

Happy Learning!!!

Live masterclass