Table of contents
1.
Problem
1.1.
Example
2.
Approach
2.1.
Implementation
2.2.
Time Complexity
2.3.
Space Complexity
3.
FAQs
4.
Key Takeaways
Last Updated: Mar 27, 2024

Equal substrings

Career growth poll
Do you think IIT Guwahati certified course can help you in your career?

Problem

You are given a string S of length N and an array ‘queries’ of length Q. For each ‘i’ (1 <= i <= Q) find the number of ways to choose exactly ‘queries[i]’ equal substrings from the given string.

Example

Input:

S = “aabaa”

Queries = [ 1, 2, 3 ]

Output:

15 7 4

Explanation:

The following are the substrings that can be formed from S:

“a”, “a’’,  “b”, “a”, “a”
“aa”, “ab”, “ba”, “aa”,
“aab”, “aba”, “baa”,
“aaba”, “abaa”,
“aabaa”

Thus, there are a total of 15 substrings. 

  1. For the first query, we have to choose exactly 1 equal substring. Since any string is equal to itself, we can choose any substring. Therefore, the number of ways to choose 1 equal substring is 15C1 = 15. 
  2. For the second query, we have to choose 2 equal substrings. The following are the sets of equal substrings:
    [“a”, “a”, “a”, “a”] = 4C2 = 6
    [“aa”, “aa”] = 2C2 = 1
    So, the answer for this query is 6 + 1 = 7
  3. For the third query, we have to choose 3 equal substrings. The only available set is : [“a”, “a”, “a”, “a”]. Thus the answer is 4C3 = 4.
     

Also check out - Substr C++

Approach

Let cnt[i] be the count of different substrings that occur exactly i times in the given string. Then, we can see from the explanation that the answer to a single query will be:

Σni = k(cnt[i] * nCr(i,k)).   

Now we have to find some efficient way to calculate this ‘cnt’ array. Recall that in the Z-algorithm, we calculate the Z-array. The Z array for any string 'S' of length 'N' is an array of size 'N' where the ith element is equal to the greatest number of characters starting from the position ‘i’ that coincides with the first characters of ‘S’.

Consider some suffix P of S, i.e., P = S[i…N]. The Z-array Z[i] for this prefix will denote the maximal equal substring that starts from i and matches a prefix of P. 

For example, for the sample example, the Z-array will for the whole string will be:

S = “aabaa”
Z[1] = 5, maximum prefix matching of “aabaa” and “aabaa”.
Z[2] = 1, maximum prefix matching of “aabaa” and “abaa”.
Z[3] = 0, maximum prefix matching of “aabaa” and “baa”.
Z[4] = 2, maximum prefix matching of “aabaa” and “aa”.
Z[5] = 1, maximum prefix matching of “aabaa” and “a”.

Now how can we use this array for calculating the cnt[] array?

From the above example, we can say that we have 1 substring which can be chosen at least 4 times. Why? Because 4 entries in the Z-array are greater than or equal to 1. Similarly, we have 2 substrings that can be chosen at least 2 times, 3,4 and 5 substrings that can be chosen at least 1 time. Therefore, we increment the values of cnt[4] by 1, cnt[2] by 1, and cnt[1] by 3.

We can do this for all the suffixes of S. Now, the cnt[i] will store the number of different strings that can be chosen at least i times. To find the number of strings that can be chosen exactly i times, we can subtract from cnt[i] the value of (cnt[i+1] + cnt[i+2] + … cnt[N]).

Implementation

#include <bits/stdc++.h>
using namespace std;

const int mod = 1e9 + 7;

int ans[100010];
int cnt[100010];
int cnt2[100010];

// Function to calculate Z array
vector<int> z_function(string s)
{
   int n = (int)s.length();
   vector<int> z(n);
   for (int i = 1, l = 0, r = 0; i < n; ++i)
   {
      if (i <= r)
         z[i] = min(r - i + 1, z[i - l]);
      while (i + z[i] < n && s[z[i]] == s[i + z[i]])
         ++z[i];
      if (i + z[i] - 1 > r)
         l = i, r = i + z[i] - 1;
   }
   return z;
}

// Fact[i] will store i factorial
int fact[100010];

// inv[i] will store inverse(i factorial MOD mod)
int inv[100010];

// Function to return o^s
int bin_pow(int o, int s)
{
   if (s == 0)
      return 1;
   if (s == 1)
      return o % mod;
   int d = bin_pow(o, s / 2);
   d = (d * 1ll * d) % mod;
   if (s % 2 == 1)
      d = (d * 1ll * o) % mod;
   return d;
}

// Function to calculate nCr
int C(int n, int k)
{
   int res = 1;
   res = (res * 1ll * fact[n]) % mod;
   res = (res * 1ll * inv[k]) % mod;
   res = (res * 1ll * inv[n - k]) % mod;
   return res;
}

// Function to count equal substrings
void solve(string s, vector<int> queries)
{
   int n = s.length();
   fact[0] = inv[0] = 1;
   ans[0] = 0;
   for (int i = 1; i <= n; ++i)
   {
      // Calculating fact and inv array
      fact[i] = (fact[i - 1] * 1ll * i) % mod;
      inv[i] = bin_pow(fact[i], mod - 2);
      ans[i] = 0;
      cnt2[i] = 0;
   }

   for (int i = 0; i < n; ++i)
   {

      // Calling z function for every substring
      vector<int> zz = z_function(s.substr(i, n - i));
      for (int i = 0; i <= n; ++i)
         cnt[i] = 0;

      // Updating count values for every z-array
      for (auto x : zz)
      {
         cnt[0]++;
         cnt[x + 1]--;
      }

      // converting at least i times to exactly i times
      for (int i = 1; i <= n; ++i)
      {
         cnt[i] += cnt[i - 1];
         cnt2[cnt[i]]++;
      }

      ans[1] += n - i;
   }

   for (int i = 1; i <= n; ++i)
   {
      for (int j = 1; j <= i; ++j)
      {

         // Calculating the answer by cnt[i] * nCr(i, j)
         ans[j + 1] += (cnt2[i] * 1ll * C(i, j)) % mod;
         ans[j + 1] %= mod;
      }
   }
   
   // Iterating over queries
   for (auto x : queries)
   {

      // If query is greater than N, ans will be 0
      if (x > n)
         cout << "0 ";
      else

         // Printing the pre-calculated equal substrings
         cout << ans[x] << " ";
   }
   cout<<endl;
}

// Driver Code
int main()
{
   string s = "aabaa";
   vector<int> queries = {1, 2, 3};
   solve(s, queries);
   return 0;
}

Time Complexity

The Z-array can be calculated in O(N) time.

For every subarray, we are calculating the Z-array, so the time taken to create the cnt[] array is O(N ^ 2). 

Now, for every i, ans[i] will be the summation of cnt[j] * nCr(i, j) for j<=i. Iterating over all i and j will take O(N ^ 2) and since we are calculating nCr(i, j) in O(1) using precalculated factorial and inverse factorial, the overall time complexity of the above approach is O(N ^ 2).

Space Complexity

We are creating cnt[] and z[] arrays to calculate the final answer. These arrays take O(N) space. So the overall space complexity is O(N).

Check out this problem - Longest Subarray With Sum K 

FAQs

  1. What is the Z-array in the Z-algorithm?
    The Z array for any string 'S' of length 'N' is an array of size 'N' where the ith element is equal to the greatest number of characters starting from the position ‘i’ that coincides with the first characters of ‘S’. It can be calculated in O(N) time using the Z-algorithm.
  2. How to calculate the number of ways to choose K strings from N strings?
    The number of ways to choose K strings from N is NCK. Here, NCK is equal to factorial(N) / (factorial(K) * factorial(N-K)).
  3. How to calculate the number of different substrings that occur at least ‘i’ times in a string?
    We can use the Z-algorithm to find this. Consider some suffix P of S, i.e., P = S[j…N]. The Z-array Z[j] for this prefix will denote the maximal equal substring that starts from j and matches a prefix of P. Thus, the number of positions where Z[j] is >= to ‘i’ will denote that a substring starting from ‘j’ occurs at least ‘i’ times. We can do this for every ‘j’ to calculate the final answer.

Key Takeaways

In this article, we have discussed the approach to finding the number of ways to select exactly ‘query[i]’ equal substrings from the given string. We saw how we can solve it using the Z-algorithm and its implementation in C++. We hope that this blog has helped you enhance your knowledge of manacher's algorithm and if you would like to learn more, check out our articles on Coding Ninjas Studio. Do upvote our blog to help other ninjas grow. Happy Coding!

Live masterclass