Problem
You are given a string S of length N and an array ‘queries’ of length Q. For each ‘i’ (1 <= i <= Q) find the number of ways to choose exactly ‘queries[i]’ equal substrings from the given string.
Example
Input:
S = “aabaa”
Queries = [ 1, 2, 3 ]
Output:
15 7 4
Explanation:
The following are the substrings that can be formed from S:
“a”, “a’’, “b”, “a”, “a”
“aa”, “ab”, “ba”, “aa”,
“aab”, “aba”, “baa”,
“aaba”, “abaa”,
“aabaa”
Thus, there are a total of 15 substrings.
- For the first query, we have to choose exactly 1 equal substring. Since any string is equal to itself, we can choose any substring. Therefore, the number of ways to choose 1 equal substring is 15C1 = 15.
-
For the second query, we have to choose 2 equal substrings. The following are the sets of equal substrings:
[“a”, “a”, “a”, “a”] = 4C2 = 6
[“aa”, “aa”] = 2C2 = 1
So, the answer for this query is 6 + 1 = 7 -
For the third query, we have to choose 3 equal substrings. The only available set is : [“a”, “a”, “a”, “a”]. Thus the answer is 4C3 = 4.
Also check out - Substr C++
Approach
Let cnt[i] be the count of different substrings that occur exactly i times in the given string. Then, we can see from the explanation that the answer to a single query will be:
Σni = k(cnt[i] * nCr(i,k)).
Now we have to find some efficient way to calculate this ‘cnt’ array. Recall that in the Z-algorithm, we calculate the Z-array. The Z array for any string 'S' of length 'N' is an array of size 'N' where the ith element is equal to the greatest number of characters starting from the position ‘i’ that coincides with the first characters of ‘S’.
Consider some suffix P of S, i.e., P = S[i…N]. The Z-array Z[i] for this prefix will denote the maximal equal substring that starts from i and matches a prefix of P.
For example, for the sample example, the Z-array will for the whole string will be:
S = “aabaa”
Z[1] = 5, maximum prefix matching of “aabaa” and “aabaa”.
Z[2] = 1, maximum prefix matching of “aabaa” and “abaa”.
Z[3] = 0, maximum prefix matching of “aabaa” and “baa”.
Z[4] = 2, maximum prefix matching of “aabaa” and “aa”.
Z[5] = 1, maximum prefix matching of “aabaa” and “a”.
Now how can we use this array for calculating the cnt[] array?
From the above example, we can say that we have 1 substring which can be chosen at least 4 times. Why? Because 4 entries in the Z-array are greater than or equal to 1. Similarly, we have 2 substrings that can be chosen at least 2 times, 3,4 and 5 substrings that can be chosen at least 1 time. Therefore, we increment the values of cnt[4] by 1, cnt[2] by 1, and cnt[1] by 3.
We can do this for all the suffixes of S. Now, the cnt[i] will store the number of different strings that can be chosen at least i times. To find the number of strings that can be chosen exactly i times, we can subtract from cnt[i] the value of (cnt[i+1] + cnt[i+2] + … cnt[N]).
Implementation
#include <bits/stdc++.h>
using namespace std;
const int mod = 1e9 + 7;
int ans[100010];
int cnt[100010];
int cnt2[100010];
// Function to calculate Z array
vector<int> z_function(string s)
{
int n = (int)s.length();
vector<int> z(n);
for (int i = 1, l = 0, r = 0; i < n; ++i)
{
if (i <= r)
z[i] = min(r - i + 1, z[i - l]);
while (i + z[i] < n && s[z[i]] == s[i + z[i]])
++z[i];
if (i + z[i] - 1 > r)
l = i, r = i + z[i] - 1;
}
return z;
}
// Fact[i] will store i factorial
int fact[100010];
// inv[i] will store inverse(i factorial MOD mod)
int inv[100010];
// Function to return o^s
int bin_pow(int o, int s)
{
if (s == 0)
return 1;
if (s == 1)
return o % mod;
int d = bin_pow(o, s / 2);
d = (d * 1ll * d) % mod;
if (s % 2 == 1)
d = (d * 1ll * o) % mod;
return d;
}
// Function to calculate nCr
int C(int n, int k)
{
int res = 1;
res = (res * 1ll * fact[n]) % mod;
res = (res * 1ll * inv[k]) % mod;
res = (res * 1ll * inv[n - k]) % mod;
return res;
}
// Function to count equal substrings
void solve(string s, vector<int> queries)
{
int n = s.length();
fact[0] = inv[0] = 1;
ans[0] = 0;
for (int i = 1; i <= n; ++i)
{
// Calculating fact and inv array
fact[i] = (fact[i - 1] * 1ll * i) % mod;
inv[i] = bin_pow(fact[i], mod - 2);
ans[i] = 0;
cnt2[i] = 0;
}
for (int i = 0; i < n; ++i)
{
// Calling z function for every substring
vector<int> zz = z_function(s.substr(i, n - i));
for (int i = 0; i <= n; ++i)
cnt[i] = 0;
// Updating count values for every z-array
for (auto x : zz)
{
cnt[0]++;
cnt[x + 1]--;
}
// converting at least i times to exactly i times
for (int i = 1; i <= n; ++i)
{
cnt[i] += cnt[i - 1];
cnt2[cnt[i]]++;
}
ans[1] += n - i;
}
for (int i = 1; i <= n; ++i)
{
for (int j = 1; j <= i; ++j)
{
// Calculating the answer by cnt[i] * nCr(i, j)
ans[j + 1] += (cnt2[i] * 1ll * C(i, j)) % mod;
ans[j + 1] %= mod;
}
}
// Iterating over queries
for (auto x : queries)
{
// If query is greater than N, ans will be 0
if (x > n)
cout << "0 ";
else
// Printing the pre-calculated equal substrings
cout << ans[x] << " ";
}
cout<<endl;
}
// Driver Code
int main()
{
string s = "aabaa";
vector<int> queries = {1, 2, 3};
solve(s, queries);
return 0;
}
Time Complexity
The Z-array can be calculated in O(N) time.
For every subarray, we are calculating the Z-array, so the time taken to create the cnt[] array is O(N ^ 2).
Now, for every i, ans[i] will be the summation of cnt[j] * nCr(i, j) for j<=i. Iterating over all i and j will take O(N ^ 2) and since we are calculating nCr(i, j) in O(1) using precalculated factorial and inverse factorial, the overall time complexity of the above approach is O(N ^ 2).
Space Complexity
We are creating cnt[] and z[] arrays to calculate the final answer. These arrays take O(N) space. So the overall space complexity is O(N).
Check out this problem - Longest Subarray With Sum K