Code360 powered by Coding Ninjas X Naukri.com. Code360 powered by Coding Ninjas X Naukri.com
1.
Introduction
2.
Problem Statement
3.
Approach
4.
Pseudocode
5.
Code in Python3
5.1.
Time Complexity
5.2.
Space Complexity
6.
7.
Key Takeaways
Last Updated: Mar 27, 2024

# Repeated Substring Pattern

1 upvote
Master Python: Predicting weather forecasts
Speaker
Ashwin Goyal
Product Manager @

## Introduction

This article aims to familiarise you with the string algorithms used for pattern matching. We will be solving the Repeated Substring Pattern using KMP and the z algorithm.

To brush up on your knowledge of the Knuth Morris Pratt algorithm, you can read the article Knuth Morris Pratt on Coding Ninjas Studio.

Letâ€™s see the problem statement in the next section.

## Problem Statement

Given a string, you need to print True if you can construct the string adding repeated copies of one of its proper substrings.

Note: You canâ€™t take the entire string as its substring here.

More precisely, you are given a string S and you have to output True if there exists a substring S[i:j] which added multiple times becomes S otherwise, you have to output False.

Let us see a few examples for a better understanding.

Input

``abaabaabaaba``

Output

``True``

Explanation:

We can make the input string by taking its substring â€śabaâ€ť 4 times. More precisely S= â€śabaâ€ť + â€śabaâ€ť + â€śabaâ€ť + â€śabaâ€ť

Input

``abacabac``

Output

``True``

Explanation: S= â€śabacâ€ť + â€śabacâ€ť

Input

``abac``

Output:

``False``

Explanation: No such substring of â€śabacâ€ť exists, which when added multiple times becomes â€śabacâ€ť. Note that here we canâ€™t take the whole string as a substring.

Note: We will use the term â€ś lpsâ€ť which means longest proper prefix which is also a suffix. For more information about lps, check out KMP algorithm.

Get the tech career you deserve, faster!
Connect with our expert counsellors to understand how to hack your way to success
User rating 4.7/5
1:1 doubt support
95% placement record
Akash Pal
Senior Software Engineer
326% Hike After Job Bootcamp
Himanshu Gusain
Programmer Analyst
32 LPA After Job Bootcamp
After Job
Bootcamp

## Approach

The approach is simple. If we find out the length of the substring which gets repeated in the given string then we can easily calculate the answer.

Now the length of the required substring = N - lps[N-1]. This is because of S[0â€¦ N-length] matches with S[length â€¦N]. From this information, we get to know that S[i]=S[i-length] for all i where length<=i<N. Hence, we can derive that S[0â€¦ length] is equal to S[length+1 .. 2*length] and so on.

So, we can know that S[0.. length] when added multiple times results in S.

Let us see this through an example:

Consider string a= â€śabaabaabaâ€ť.

Figure 1: Shows the matching for the string â€śabaabaabaâ€ť in the LPS array

Thus lps_array is [0, 0, 1, 1, 2, 3, 4, 5, 6].

Length = 9-lps[9-1]=9-6=3. Thus the substring has length 3.

Similarly for the z_algorithm when z[i]= n-i, the suffix starting from i is S[i.. N-1]. Thus S[0â€¦N-i] is equal to S[i.. N-1]. Thus from this information we get to know that S[i]=S[i-length] for all i where length<=i<N.

## Pseudocode

Approach 1: pseudocode to find if a string is repeated substring

``````DEFINE FUNCTION z_function(S):
l=0
r=0
SET n TO len(S)
SET z TO [0 FOR i IN range(n)]
FOR i IN 1 to n-1:
IF i<=r:
z[i]=min(r-i+1,z[i-l])
WHILE i+z[i]<n and S[z[i]]==S[i+z[i]]:
z[i]+=1
IF i+z[i]-1>r:
l=i
r=i+z[i]-1
SET z[0] TO n
RETURN z
DEFINE FUNCTION is_repeated_subarray(s):
z=z_function(s)
n=len(s)
length=n
FOR i IN range(1,n):
IF z[i]==n-i:
length=i
break
IF len(s)%length!=0 or length==len(s):
RETURN False
FOR i IN range(length,len(s)):
IF s[i]!=s[i-length]:
RETURN False
RETURN True``````

Approach 2: Pseudocode for z function algorithm

``````DEFINE FUNCTION get_lps(search):
set lps_arr to [0 FOR i IN range(len(search))]
previous=0
curr_index=1
WHILE curr_index is less than length of the (lps_arr):
IF search[curr_index]==search[previous]:
previous+=1
lps_arr[curr_index]=previous
curr_index+=1
ELSEIF previous==0:
lps_arr[curr_index]=0
curr_index+=1
ELSE:
previous=lps_arr[previous-1]
RETURN lps_arr
DEFINE FUNCTION is_repeated_subarray(s):
lps=get_lps(s)
length=len(s)-lps[-1]

IF len(s)%length!=0 or length==len(s):
RETURN False
FOR i IN range(length,len(s)):
IF s[i]!=s[i-length]:
RETURN False
RETURN True``````

Also check out - Substr C++

## Code in Python3

Approach 1: Python code to find if it is a repeated substring pattern using lps( longest proper prefix which is also a suffix) array

``````def get_lps(search):
# lps array is the longest proper prefix which is also a suffix
lps_arr=[0 for i in range(len(search))]
# previous is the last longest proper prefix which is also a suffix
previous=0
# Here we are checking proper prefix so that we start with the index 1
# as the prefix can't be itself
curr_index=1
while curr_index<len(lps_arr):
if search[curr_index]==search[previous]:
"""This means that we have found a match for the last longest lps
and the current prefix
Thus we update the lps array"""
previous+=1
lps_arr[curr_index]=previous
curr_index+=1
elif previous==0:
"""This means that we have not found a match for the current prefix with the
last lps.
As previous=0 then there does not exist previous longest lps."""
# no lps[curr index] exist so lps[curr_index]=0
lps_arr[curr_index]=0
curr_index+=1
else:
# we replace current_index
previous=lps_arr[previous-1]
return lps_arr
def is_repeated_subarray(s):
lps=get_lps(s)
length=len(s)-lps[-1]

if len(s)%length!=0 or length==len(s):
return False
for i in range(length,len(s)):
if s[i]!=s[i-length]:
return False
return True

s="abaabaabaaba"

Approach 2: Code to find if the string is repeated substring using z functions.

``````def z_function(S):
l=0
r=0
n = len(S)
z = [0 for i in range(n)]
for i in range(1, n):
if i<=r:
z[i]=min(r-i+1,z[i-l])
while i+z[i]<n and S[z[i]]==S[i+z[i]]:
z[i]+=1
if i+z[i]-1>r:
l=i
r=i+z[i]-1
z[0] = n
return z
def is_repeated_subarray(s):
z=z_function(s)
n=len(s)
length=n
for i in range(1,n):
if z[i]==n-i:
length=i
break
if len(s)%length!=0 or length==len(s):
return False
for i in range(length,len(s)):
if s[i]!=s[i-length]:
return False
return True
s="abaabaabaaba"
res=is_repeated_subarray(s)
print(res)``````

Input

``abaabaabaaba``

Output

``True``

### Time Complexity

The time complexity for both approaches is O(N).

For approach 1, we are finding lps_array, which takes O(N) time. From that LPS array, we know the length of the substring.

For approach 2, the z-function also takes O(N) time. Similarly, we can know the length of the substring from the z_array.

After knowing the length, we can check if the string can be made from the substring in O(N) time.

### Space Complexity

The space complexity for both approaches is O(N).

For the LPS approach, we are making an LPS array of size N.

For the z-function approach, we are making a Z array of size N.

Also see, Morris Traversal for Inorder.and Rabin Karp Algorithm

## Frequently Asked Questions

1. What is the time complexity of the naive algorithm?

The time complexity of the naive algorithm is O(N*N). The reason is we are checking for every s[0..i] where i= 1,2, â€¦N whether s can be made from s[0..i].

2. What is the space complexity of the naive algorithm?

The space complexity for the naive approach will be O(1) as we are using no extra space.

3. What is the z algorithm, and how can we solve this question using z_function?

The z function returns us an array Z. Z[i] represents the length of the longest common prefix for S[iâ€¦ N-1] and S. Thus, if Z[i]= N-i means that S[iâ€¦ N-1] and S is N-i have the longest common prefix of N-i. Since the first string is of length N-i, the longest common prefix of S[i â€¦ N-1] and S is N-i. Thus S[i]=S[i-length] for all i where length<=i<N. Thus S[: length] is the repeating substring.

4. How can we optimise the naive algorithm?

The substring is repeated multiple times and added to form the string. Thus the string can be formed from the substring only when the length of the substring divides the length of the string. Thus the brute force can be optimised by only checking on those substrings whose length divides the size of the given string.

5. Why does making LPS array make the code significantly efficient?

LPS array precalculates the longest proper prefix, which is also a suffix. Using this, we get the length of the repeating substring. Thus it helps efficiently to reduce the time complexity.

## Key Takeaways

The article helps us understand how the KMP and z function to know if a string can be made by adding repeated copies of one of its substrings. Comments have been added to the code for better understanding. You can also copy the code and run it on your system on multiple inputs to understand the approach better.

Check out this problem - Longest Common Prefix

Check out Coding Ninjas Studio to learn more about competitive programming, DSA, attempt mock tests, interview preparation and much more.

Happy Learning!!!

Live masterclass