Table of contents
1.
Problem statement
1.1.
Example 1
1.1.1.
Input
1.1.2.
Output
1.1.3.
Explanation
2.
Approach
3.
Brute force Code:
3.1.
Input:
3.2.
Output:
4.
Time Complexity 
5.
Space Complexity 
6.
Optimized Approach
7.
Optimized Code:
7.1.
Input:
7.2.
Output:
8.
Time Complexity 
9.
Space Complexity 
10.
FAQs
11.
Key Takeaways
Last Updated: Mar 27, 2024

Good Substrings

Author Apoorv
1 upvote
Career growth poll
Do you think IIT Guwahati certified course can help you in your career?

Problem statement

Rohit is playing a game ‘good substrings’. Mohit challenges Rohit to win this game of good substrings but Rohit is a cheater and he is asking for help from a lot of people he is also seeking help from your side.

 In this game, You have the string s, which is made up of small English letters. Some of the English letters are good, but the rest are bad. A substring of s can be denoted as s[l...r] where (1 ≤ l ≤ r ≤ |s|) and s  =  s1s2...s|s| (where |s| is the total length of string s). If there are at most x bad letters among the letters sl, sl + 1,..., sr, the substring s[l...r] is excellent (see at the sample's examples for a better understanding). Your task is to find the total number of different good substrings of the given string s. 

Two substrings s[a...b] and s[c...d] are considered different if their content is different, i.e. s[a...b] ≠ s[c...d].The first line of the input contains the non-empty string s, which is made up of small English letters and has a maximum length of 1500 characters. 

The string of characters "0" and "1" on the second line of the input is exactly 26 characters long. The i-th English letter is excellent if the i-th character of this string equals "1," otherwise, it is bad. This means that the first character in this string corresponds to the letter "a," the second to the letter "b," and so on. The third line of the input contains a single integer x (0 x |s|), which is the maximum number of bad characters in a good substring that can be tolerated.

So solve this problem and help Rohit to win this game.

Also see about, kth largest element in an array, and Euclid GCD Algorithm

Example 1

Input

acbacbacaa

00000000000000000000000000

2

Output

8

Explanation

After exploring all the substrings in the given these strings are considered to be as good substrings "a", "aa", "ac", "b", "ba", "c", "ca", "cb".Here the value of x is 2 so a substring should have at most of 2 good characters and then the string will be considered as good substring

Approach

The brute force approach to solving this problem "Good Substrings" is that we can iterate in a nested way to generate all the substrings and then check whether that substring is a good substring or not, so by chance, if it is a good substring, then increment the count otherwise move forward, but doing so will cost high time complexity because in every case, we are iterating three times in an array which will cost the time complexity of O(N ^ 3) where 'N' is the size of the given string. Below is the detailed code for this approach.

Brute force Code:

// Code for count of Good Substrings
#include <bits/stdc++.h>
using namespace std;

int main() {
    
    // Taking input for problem count of Good Substrings
    string s;cin>>s;
    string check;cin>>check;
    int x;cin>>x;
    
    // Set to store the unique substrings
    set<string> set_string;
    
    // Iterating on every possible substring of the given string
    for(int i = 0 ; i < s.length() ; i++){
        for(int j = i ;  j < s.length() ; j++){
            int x_limit = 0;

            // find count of a bad characters in the substring
            for(int k = i  ;  k <= j  ;  k++){
                char to_check = s[k];
                if(check[to_check-'a'] == '0')
                {
                    x_limit++;
                }
            }

            // If the string is good substring increase the count of Good Substrings by pushing in set
            if(x_limit<=x){
                int len = (j - i + 1);
                set_string.insert(s.substr(i, len));
            }
        }
    }

    // size of set data structure will give the count of Good Substrings
    cout<<set_string.size();
    return 0;
}
You can also try this code with Online C++ Compiler
Run Code

Input:

acbacbacaa
00000000000000000000000000
2
You can also try this code with Online C++ Compiler
Run Code

 

Output:

8
You can also try this code with Online C++ Compiler
Run Code

Also check out - Substr C++

Time Complexity 

O(N ^ 3)

The time complexity to find the count of Good Substrings in a given string is O(N ^ 3), where 'N' is the size of the given string. Since in the entire implementation, we are generating all the possible substring for that, we are iterating two times in a nested loop. Then we are iterating in the generated substring to check if it is a good substring or not, which will cost Iteration three times in a nested loop which leads to the time complexity of O(N ^ 3).

Space Complexity 

O(N)

The space complexity to find the count of Good Substrings in a given string is O(N), where 'N' is the size of the given string. In the entire implementation, we have used a set data structure for storing the unique elements, which means the work has been done in linear space.

Optimized Approach

So to optimize this approach, we can use a suffix array with lcp. We can first pre-compute the bad character count for all the substrings and store it in a table. Then, iterating through the suffix array and avoiding duplicate substrings using the lcp table, we can find the answer. This will do the work for us in quadratic time complexity.

Optimized Code:

// Code for count of Good Substrings
#include<bits/stdc++.h>
using namespace std;
 
 
#define MAX_LEN 1505
 
char str[MAX_LEN];
char charMask[26];
 
// creating required array for precomputation
int pos[MAX_LEN],rnk[MAX_LEN],cnt[MAX_LEN],nxt[MAX_LEN],lcp[MAX_LEN];
bool bh[MAX_LEN],b2h[MAX_LEN];
 
int dp[MAX_LEN][MAX_LEN];
 
bool smaller_first_char(int a,int b)
{
    return str[a]<str[b];
}
 
void computeSuffixArray(int);
int findAns(int,int);
 
int main(){
    int k;

    // Taking input
    scanf("%s",str);
    scanf("%s",charMask);
    scanf("%d",&k);
    int res=findAns(k,strlen(str));
    printf("%d\n",res);
    return 0;
}
 
int findAns(int K,int len){
    int res=0;

    // computing suffix array
    computeSuffixArray(len);
    for(int i=0; i<len; ++i){
        if( charMask[str[i]-'a']=='0')
            dp[i][i] = 1;
    }
    
    // finding total number of good substrings which are unique
    for(int k = 2; k <= len; ++k){
        for(int i = 0; len - i >= k; ++i){
            int start = i, end = i + k - 1;
            dp[start][end] = dp[start][end - 1] + ( charMask[str[end] - 'a'] == '0' ? 1 : 0);
        }
    }
    
    for(int i=0 ;i<len ;++i){
        int start = pos[i];
        int end = (i == 0) ? start : (start + lcp[i]);
        while( end<len ){
            if(dp[start][end] <= K)
                ++res;
            ++end;
        }
    }
    return res;
}
    
// predefined suffix array 
void computeSuffixArray(int n)
{
    for(int i = 0; i < n; ++i){ pos[i] = i; }
    sort(pos, pos + n, smaller_first_char);
    for(int i = 0; i < n; ++i)
    {
        bh[i] = ( i == 0 || str[pos[i]] != str[pos[i - 1]] );
        b2h[i] = false;
    }
    for( int h = 1; h < n; h <<= 1 )
    {
        int buckets = 0;
        for(int i = 0, j; i < n; i = j)
        {
            j = i + 1;
            while( j<n && !bh[j] ){ ++j; }
            nxt[i] = j;
            ++buckets;
        }
        if( buckets == n ){ break; }
        for(int i = 0; i < n; i = nxt[i])
        {
            cnt[i] = 0;
            for(int j = i; j < nxt[i]; ++j)
            {
                rnk[pos[j]] = i;
            }
        }
        cnt[rnk[n - h]]++;
        b2h[rnk[n - h]] = true;
        for(int i = 0; i < n; i = nxt[i])
        {
            for(int j = i; j < nxt[i];++j)
            {
                int s = pos[j] - h;
                if(s >= 0)
                {
                    int head = rnk[s];
                    rnk[s] = head + cnt[head]++;
                    b2h[rnk[s]] = true;
                }
            }
            for(int j = i; j < nxt[i]; ++j)
            {
                int s = pos[j] - h;
                if(s >= 0 && b2h[rnk[s]])
                {
                    for(int k = rnk[s] + 1; !bh[k] && b2h[k]; ++k){ b2h[k] = false; }
                }
            }
        }
        for( int i = 0; i < n; ++i)
        {
            pos[rnk[i]] = i;
            bh[i] = b2h[i];
        }
    }
    lcp[0] = 0;
    for(int i = 0, h = 0; i < n; ++i)
    { 
        if(rnk[i] > 0)
        {
            int j = pos[rnk[i] - 1];
            while( i < n && j < n && str[i + h] == str[j + h] ){ ++h; }
            lcp[rnk[i]] = h;
            if(h > 0){ --h;}
        }
    }
    return;
}
You can also try this code with Online C++ Compiler
Run Code

Input:

acbacbacaa
00000000000000000000000000
2

Output:

8
You can also try this code with Online C++ Compiler
Run Code

Time Complexity 

O(N * N)

The time complexity to find count of Good Substrings in a given string is O(N * N), where ‘N’ is the size of the given string. Since in implementation, We have first pre-computed the bad character count for all the substrings and then we have stored it in a table then, iterating through the suffix array and avoiding duplicate substrings using the lcp table,will cost double nested loop which leads to quadratic time complexity. 

Space Complexity 

O(N)

The space complexity to find the count of Good Substrings in a given string is O(N), where ‘N’ is the size of the given string. In the implementation, we have used suffix array which is used to pre-compute the bad character count which means the work has been done in linear space.

FAQs

What is LCP?

The longest common prefix array (LCP array) is an auxiliary data structure to the suffix array in computer science. In a sorted suffix array, it keeps the lengths of the longest common prefixes (LCPs) between all pairs of subsequent suffixes.

What is a precomputation array in computer science?

Precomputation, in general, is when you break your algorithm into phases and execute some generic computation that creates some data from the input in one phase, and then use the already computed data in a later phase to speed up calculations in another.

Key Takeaways

In this blog, we discussed the solution of the problem statement “count of Good Substrings''.Along with the solution, We also explored both the approaches brute force and optimized approach along with their implementations with the time and space complexity of the solution.

Check out this problem - Longest Common Prefix

If you want to learn more about such hidden algorithms and want to practice some quality questions which require you to excel your preparation s a notch higher, then you can visit our Guided Path for these algorithms on  Coding Ninjas Studio.To be more confident in data structures and algorithms, try out our DS and Algo Course. Until then, All the best for your future endeavors, and Keep Coding.

 

Live masterclass