Code360 powered by Coding Ninjas X Naukri.com. Code360 powered by Coding Ninjas X Naukri.com
Table of contents
1.
Introduction
1.1.
Problem Statement
1.2.
Sample Example
2.
Brute Force Approach
3.
Main-Lorentz Approach
3.1.
Implementation
3.2.
Output
4.
Complexity Analysis
4.1.
Time Complexity
4.2.
Space Complexity
5.
Frequently Asked Questions
5.1.
Why do we apply divide and conquer for this problem?
5.2.
What is rbegin()?
5.3.
What are crossing repetitions?
5.4.
What is Z-Function?
5.5.
What is a string match-making problem?
6.
Conclusion
Last Updated: Mar 27, 2024
Medium

Finding Repetitions

Author Ujjawal Gupta
0 upvote
Roadmap to SDE career at Amazon
Speaker
Anubhav Sinha
SDE-2 @
25 Jun, 2024 @ 01:30 PM

Introduction

Hey ninja, this article will help you understand the problem of finding repetition, which is a string match-making problem. In this problem, if you are unfamiliar, you need to find the number of repeated substrings in the given string.  The problem we are going to discuss is well known.

We will use the divide and conquer approach to tackle this problem and the main-Lorentz algorithm, which is an algorithm that comes under the divide and conquers technique.

It is based on a well-researched theorem developed by Main and Lorentz, which will give us the answer in better time complexity.

Must read, kth largest element in an array and Euclid GCD Algorithm

Problem Statement

The problem is that we will have a nonempty string; in this particular string, we need to determine the number of repetitions. Repetitions here mean the occurrence of a string in a row like a string  X will have a substring S[i…j] where i<j consist of two same strings one after the other.

Sample Example

Let S = “abacacabb”, here the string contains the following repetition:

  • S[2…5] = “acac”
  • S[3…6] = “caca”
  • S[7…8] = “bb”
example

Brute Force Approach

A naive approach to this problem will be to find all the possible substrings of the given string and then have them; we will discover their repetitions one by one. This is not a good approach in terms of time complexity since it will take O(N^2), where N is the length of the string.

Get the tech career you deserve, faster!
Connect with our expert counsellors to understand how to hack your way to success
User rating 4.7/5
1:1 doubt support
95% placement record
Akash Pal
Senior Software Engineer
326% Hike After Job Bootcamp
Himanshu Gusain
Programmer Analyst
32 LPA After Job Bootcamp
After Job
Bootcamp

Main-Lorentz Approach

The main -Lorentz Approach is the most optimal to solve this problem. In this algorithm, we will first divide the string into two halves to calculate the number of repetitions. The hardest part is to find out the crossing repetitions. Crossing repetitions are those repetitions that start in the first half and end in the second half. 

We can calculate those repetitions by dividing them into two parts of left crossing and right crossing.

We use the divide and conquer technique and can implement this in logarithmic time complexity.

Implementation

#include <bits/stdc++.h>
using namespace std;
vector<int> zFunction(string const &s)
{
    int n = s.size();
    vector<int> zvector(n);
    for (int j= 1, l = 0, r = 0; j < n; j++)
    {
        if (j<= r)
          zvector[j] = min(r - j + 1, zvector[j - l]);
        while (j + zvector[j] < n && s[zvector[j]] == s[j + zvector[j]])
            zvector[j]++;
        if (j + zvector[j] - 1 > r)
        {
            l = j;
            r = j + zvector[j] - 1;
        }
    }
    return zvector;
}
 
int getZ(vector<int> const &z, int i)
{
    if (0 <= i && i < (int)z.size())
        return z[i];
    else
        return 0;
}
 
vector<pair<int, int>> repetitions;
void getRepetitions(int shift, bool left, int cntr, int l, int k1, int k2)
{
    for (int l1 = max(1, l - k2); l1 <= min(l, k1); l1++)
    {
        if (left && l1 == l)
            break;
        int l2 = l - l1;
        int pos = shift + (left ? cntr - l1 : cntr - l - l1 + 1);
        repetitions.emplace_back(pos, pos + 2 * l - 1);
    }
}

/* Recursive function with zfunction and getRepetitions function */
void StringRepetitions(string s, int shift = 0)
{
    int n = s.size();
    if (n == 1)
        return;
 /* Calculating size of each half */
    int nu = n / 2;
    int nv = n - nu;

/* Defining half strings and their reverse*/
    string u = s.substr(0, nu);
    string v = s.substr(nu);
    string ru(u.rbegin(), u.rend());
    string rv(v.rbegin(), v.rend());
    
 /* Implementing recursion for both halves */
    StringRepetitions(u, shift);
    StringRepetitions(v, shift + nu);
 
 /* Calling zFunction */
    vector<int> z1 = zFunction(ru);
    vector<int> z2 = zFunction(v + '#' + u);
    vector<int> z3 = zFunction(ru + '#' + rv);
    vector<int> z4 = zFunction(v);
 
 /* Calculating the number of repetitions through fixed center and
     lenght l, k1, and k2*/
    for (int cntr = 0; cntr < n; cntr++)
    {
        int l, k1, k2;
        if (cntr < nu)
        {
            l = nu - cntr;
            k1 = getZ(z1, nu - cntr);
            k2 = getZ(z2, nv + 1 + cntr);
        }
        else
        {
            l = cntr - nu + 1;
            k1 = getZ(z3, nu + 1 + nv - 1 - (cntr - nu));
            k2 = getZ(z4, (cntr - nu) + 1);
        }
        if (k1 + k2 >= l)
            getRepetitions(shift, cntr < nu, cntr, l, k1, k2);
    }
}
int main()
{
    string s;
    cout<<"Enter the String: ";
    cin >> s;
    StringRepetitions(s);
    cout <<"Total number of repetitions are: "<<repetitions.size();
}

Output

output

Complexity Analysis

Time Complexity

Time complexity is O(N * log(N)), because of the divide and conquer technique applied in it through recursion, and N is the length of the string.

Space Complexity

Space complexity for this program is O(N) where N is the size of the vector applied in the Z-function rest of the operations used O(1) space.

Frequently Asked Questions

Why do we apply divide and conquer for this problem?

The reason to apply the divide and conquer technique for this problem is that we can have the result in O(N * log(N)) complexity, which is way better than O(N^2).

What is rbegin()?

It is a function given in C++ STL which returns a reverse iterator for the given string.

What are crossing repetitions?

Crossing repetitions are those repetitions that start in one half and end in another half.

Example: cac| aba

| line divides the halves and we can see caca repetition is there but a lies in the other half. 

What is Z-Function?

The Z-function of a particular string is a vector of length n, where the ith element's value is the largest number of letters beginning at the position i which corresponds to the first characters of the string.

What is a string match-making problem?

String match-making is a problem where we find out the occurrence of a particular pattern in a given string.

Conclusion

In this article, we learned how to find the number of repetitions in a given string with the help of the main-Lorentz algorithm, which is a divide and conquers technique-based algorithm. Due to this, we can have output in O(N * log(N)), where N is the length of the string.

For more articles based on string matchmaking algorithms, you can refer following links:

String Manipulation in Python

KMP String Matching Algorithm

String Matching naive

To learn more about DSA, competitive coding, and many more knowledgeable topics, please look into the guided paths on Coding Ninjas Studio. Also, you can enroll in our courses and check out the mock test and problems available to you. Please check out our interview experiences and interview bundle for placement preparations.

Previous article
Fibonacci numbers
Next article
Tiling Problem Using Divide and Conquer Algorithm
Live masterclass