Hey ninja, this article will help you understand the problem of finding repetition, which is a string match-making problem. In this problem, if you are unfamiliar, you need to find the number of repeated substrings in the given string. The problem we are going to discuss is well known.
We will use the divide and conquer approach to tackle this problem and the main-Lorentz algorithm, which is an algorithm that comes under the divide and conquers technique.
It is based on a well-researched theorem developed by Main and Lorentz, which will give us the answer in better time complexity.
The problem is that we will have a nonempty string; in this particular string, we need to determine the number of repetitions. Repetitions here mean the occurrence of a string in a row like a string X will have a substring S[i…j] where i<j consist of two same strings one after the other.
Sample Example
Let S = “abacacabb”, here the string contains the following repetition:
S[2…5] = “acac”
S[3…6] = “caca”
S[7…8] = “bb”
Brute Force Approach
A naive approach to this problem will be to find all the possible substrings of the given string and then have them; we will discover their repetitions one by one. This is not a good approach in terms of time complexity since it will take O(N^2), where N is the length of the string.
Main-Lorentz Approach
The main -Lorentz Approach is the most optimal to solve this problem. In this algorithm, we will first divide the string into two halves to calculate the number of repetitions. The hardest part is to find out the crossing repetitions. Crossing repetitions are those repetitions that start in the first half and end in the second half.
We can calculate those repetitions by dividing them into two parts of left crossing and right crossing.
We use the divide and conquer technique and can implement this in logarithmic time complexity.
Implementation
#include <bits/stdc++.h>
using namespace std;
vector<int> zFunction(string const &s)
{
int n = s.size();
vector<int> zvector(n);
for (int j= 1, l = 0, r = 0; j < n; j++)
{
if (j<= r)
zvector[j] = min(r - j + 1, zvector[j - l]);
while (j + zvector[j] < n && s[zvector[j]] == s[j + zvector[j]])
zvector[j]++;
if (j + zvector[j] - 1 > r)
{
l = j;
r = j + zvector[j] - 1;
}
}
return zvector;
}
int getZ(vector<int> const &z, int i)
{
if (0 <= i && i < (int)z.size())
return z[i];
else
return 0;
}
vector<pair<int, int>> repetitions;
void getRepetitions(int shift, bool left, int cntr, int l, int k1, int k2)
{
for (int l1 = max(1, l - k2); l1 <= min(l, k1); l1++)
{
if (left && l1 == l)
break;
int l2 = l - l1;
int pos = shift + (left ? cntr - l1 : cntr - l - l1 + 1);
repetitions.emplace_back(pos, pos + 2 * l - 1);
}
}
/* Recursive function with zfunction and getRepetitions function */
void StringRepetitions(string s, int shift = 0)
{
int n = s.size();
if (n == 1)
return;
/* Calculating size of each half */
int nu = n / 2;
int nv = n - nu;
/* Defining half strings and their reverse*/
string u = s.substr(0, nu);
string v = s.substr(nu);
string ru(u.rbegin(), u.rend());
string rv(v.rbegin(), v.rend());
/* Implementing recursion for both halves */
StringRepetitions(u, shift);
StringRepetitions(v, shift + nu);
/* Calling zFunction */
vector<int> z1 = zFunction(ru);
vector<int> z2 = zFunction(v + '#' + u);
vector<int> z3 = zFunction(ru + '#' + rv);
vector<int> z4 = zFunction(v);
/* Calculating the number of repetitions through fixed center and
lenght l, k1, and k2*/
for (int cntr = 0; cntr < n; cntr++)
{
int l, k1, k2;
if (cntr < nu)
{
l = nu - cntr;
k1 = getZ(z1, nu - cntr);
k2 = getZ(z2, nv + 1 + cntr);
}
else
{
l = cntr - nu + 1;
k1 = getZ(z3, nu + 1 + nv - 1 - (cntr - nu));
k2 = getZ(z4, (cntr - nu) + 1);
}
if (k1 + k2 >= l)
getRepetitions(shift, cntr < nu, cntr, l, k1, k2);
}
}
int main()
{
string s;
cout<<"Enter the String: ";
cin >> s;
StringRepetitions(s);
cout <<"Total number of repetitions are: "<<repetitions.size();
}
You can also try this code with Online C++ Compiler
Time complexity is O(N * log(N)), because of the divide and conquer technique applied in it through recursion, and N is the length of the string.
Space Complexity
Space complexity for this program is O(N) where N is the size of the vector applied in the Z-function rest of the operations used O(1) space.
Frequently Asked Questions
Why do we apply divide and conquer for this problem?
The reason to apply the divide and conquer technique for this problem is that we can have the result in O(N * log(N)) complexity, which is way better than O(N^2).
What is rbegin()?
It is a function given in C++ STL which returns a reverse iterator for the given string.
What are crossing repetitions?
Crossing repetitions are those repetitions that start in one half and end in another half.
Example: cac| aba
| line divides the halves and we can see caca repetition is there but a lies in the other half.
What is Z-Function?
The Z-function of a particular string is a vector of length n, where the ith element's value is the largest number of letters beginning at the position i which corresponds to the first characters of the string.
What is a string match-making problem?
String match-making is a problem where we find out the occurrence of a particular pattern in a given string.
Conclusion
In this article, we learned how to find the number of repetitions in a given string with the help of the main-Lorentz algorithm, which is a divide and conquers technique-based algorithm. Due to this, we can have output in O(N * log(N)), where N is the length of the string.
For more articles based on string matchmaking algorithms, you can refer following links: