Count Substrings having Frequency of a Character Exceeding that of Another Character in a String

Introduction

Hey Ninjas! This blog discusses a coding challenge based on prefix sum and segment trees named Count substrings having frequency of a character exceeding that of another character in a string. Segment trees are one of the advanced topics actively asked in programming contests and technical interviews.

Segment trees are a must-to-know topic for programmers. This blog approaches the problem with a naive approach and then moves towards the efficient approach deriving clues from the naive one.

Problem Statement

Given a string consisting of only three types of characters, 'a', 'b', and 'c', the task is to count the number of substrings in the given string that have more 'a' characters than 'c' characters. A substring is a contiguous sequence of characters within a string. For example, the string "abc" has the following substrings: "a", "b", "c", "ab", "bc", and "abc".

Also see, Euclid GCD Algorithm

Sample Example

We will take an example so the problem statement will be clear for you

Input

"aca"

Output

Number of substrings with more 'a' than 'c': 3

Explanation

Substrings "a", "aca", and "a" satisfy the condition mentioned in the problem statement. Hence the number of such substrings is 3.

Bruteforce Approach

A simple method is to determine the difference between the frequencies of character a and character c in all possible substrings of the given string. Add one to the answer if character ‘a’ frequency is greater than character ‘c’ in a substring.

Algorithm

To store the difference in the frequencies of characters ‘a’ and ‘c’, initialise a variable count to 0 and To store the number of substrings satisfying the condition given in the problem statement, Initialise a variable countSubstrings to 0.
Start a for loop with index variable 'i' from 0 to N - 1.
Assign count = 0. To calculate the difference from this starting index.
Start a for loop with index variable 'j' from 'i' to N - 1 and do the following
If 'str[j]' is 'a', increment 'count' by one. Otherwise, if 'str[j]' is 'c', decrement 'count' by one.
If 'count' is greater than 0, increment 'countSubstrings' by one.

Dry Run

We have covered the problem statement, example, and algorithm. Now, let's discuss the above algorithm using an input value, i.e., dry run.

Given a string str = "aca". We calculate its length and put it in a variable called N.
Two variables, count and countsubstrings, are initialised to 0. The difference between the frequency of the letters ‘a’ and ‘c’ in the current substring is tracked by count. countsubstrings keeps track of how many substrings meet the specified requirement.
Now, using two nested loops, we consider all potential substrings of the input string. Positions 0, 1, and 2 are the possible starting positions for substrings in the outer loop, and positions 0, 1 and 2 are the possible ending positions for substrings in the inner loop (i.e., positions 0, 1, and 2).
Starting with the substring "a" (i = 0, j = 0), we proceed. count is set to 1 because the substring only has one 'a' and no 'c'. We increase countsubstrings to 1 because count is greater than 0.
Next, we take into account the substring "ac" (i = 0, j = 1). count is set to 0 because the substring has one 'a' and one 'c'. We don't increase countsubstrings because count is less than 0.
Now, consider the substring "aca" (i = 0, j = 2). count is set to 1 because the substring has two 'a's and one 'c'. We increase countsubstrings by 1, and countsubstrings become 2. because count is greater than 0.
The following step is to move to the subsequent substring starting place (i = 1). We proceed with the substring "c" (i = 1, j = 1). The count is set to -1 because the substring only has one 'c'. We don't increase countsubstrings because count is less than 0.
Next, we consider the substring "ca" (i = 1, j = 2). Count is set to 0 because the substring has one 'a' and one 'c'. We don't increase countsubstrings because count equals 0.
The following step is to move to the subsequent substring starting place (i = 2). We proceed with the substring "a" (i = 2, j = 2). The count is set to 1 because the substring only has one 'a'. We increase countsubstrings by 1, and countsubstrings becomes 3 because count is greater than 0.
There the total number of substrings having frequency of character ‘a’ greater than that of character ‘c’ is 3.

Implementation in C++

#include <iostream>
#include <string>
using namespace std;

int main() {
    // Initialize the string.
    string str = "aca";
    // Find the size of the string.
    int N = str.length();
    // To store the difference in the frequencies of characters a and c.
    int count = 0;
    // To store the number of substrings satisfying the condition.
    int countSubstrings = 0;
    // Consider all the substrings.
    for (int i = 0; i < N; i++) {
        count = 0;
        for (int j = i; j < N; j++) {
            // If the current character is a, increment the count by one.
            if (str[j] == 'a') {
                count += 1;
            }
            // else if the current character is c, decrement the count by one.
            else if (str[j] == 'c') {
                count -= 1;
            }
            // If the frequency of a is greater than that of c, increase the answer by one.
            if (count > 0) {
                countSubstrings += 1;
            }
        }
    }
    cout << "Number of substrings with more 'a' than 'c': " << countSubstrings << endl;
    return 0;
}

Output:

Implementation in Python

# Initialize the string.
str = "aca"
# Find the size of the string.
N = len(str)
# To store the difference in the frequencies of characters a and c.
count = 0
# To store the number of substrings satisfying the condition.
countSubstrings = 0
# Consider all the substrings.
for i in range(N):
    count = 0
    for j in range(i, N):
        # If the current character is a, increment the count by one.
        if str[j] == 'a':
            count += 1
        # else if the current character is c, decrement the count by one.
        elif str[j] == 'c':
            count -= 1
        # If the frequency of a is greater than that of c, increase the answer by one.
        if count > 0:
            countSubstrings += 1
print("Number of substrings with more 'a' than 'c':", countSubstrings)

Output:

Time Complexity

The time complexity of the given code is O(n^2), where n is the length of the input string. This is because two nested loops are used to iterate over all the substrings of the string.

Space Complexity

The space complexity of the given code is O(1). This is because the variables used in the code have fixed sizes and do not depend on the length of the input string.

Optimised Approach

We'll apply the same fundamental concept from the prior strategy. For each prefix of the input string, we will use segment trees to store the difference in the frequencies of the characters a and c to reduce the time complexity.

Algorithm

We create function countSubstring() with the given string str as the function parameter and Store the size of string 'str' in the variable n.
Create a segment tree 'prefseg' of size 4 * N.
Create a variable 'countSubs' to store the number of valid substrings.
Create a variable 'count' to store the differences in the frequencies of characters ‘a’ and ‘c’ for each prefix and initialise it with zero.
Update the segment tree by incrementing the element at index 'n' and updating the parent nodes subsequently. This is to consider the case of an empty string where the frequency difference is 0.
Loop through the string and let the current index be 'ind'. If 'str[ind]' == 'a', increment 'count' by one. Otherwise, if 'str[ind]' == 'c', decrement 'count' by one. Update the segment tree by incrementing the element at index count + n and updating the parent nodes subsequently. Make a range sum query to find the sum of elements from indices 0 to count + n - 1 and store the sum in 'partialAns'. Add 'partialAns' to ‘countSubs'.
Return 'countSubs'.

Dry Run

The segment tree is first built with zero values. The input string "aca" is three characters long.
The segment tree's leaf node at index 3 is increased by calling the updateQuery() function after the frequency difference count has been initialised to zero with count + N = 3. N is the length of the string. The partial answer countSubs is also initialised to zero.
Then the loop through the string's characters starts:
The count is raised by 1 for the initial character ‘a’. To increase the leaf node at index 4 of the segment tree, updateQuery() is invoked with count + N = 4. The number of valid substrings that finish at this index is determined by calling the sumQuery() function with the range [0, count + N - 1] = [0, 3]. The partial response partialAns is set to 1 since the only substring that can legitimately end at index 0 is an empty substring. When countSubs is increased by this amount, the result is 1.
The count decreases to 0 with the second character, ‘c’. To increase the leaf node at index 3 of the segment tree, updateQuery() is invoked with count + N = 3. The number of valid substrings that finish at this index is determined by calling the sumQuery() function with the range [0, count + N - 1] = [0, 2]. partialAns is set to 0 since no valid substrings finish at index 1. CountSubs is increased by this amount, which stays at 1.
The count is increased by 1 once more for the third character ‘c’. To increase the leaf node at index 4 of the segment tree, updateQuery() is invoked with count + N = 4. The number of valid substrings that finish at this index is determined by calling the sumQuery() function with the range [0, count + N - 1] = [0, 3]. "aca" and "a" are two acceptable substrings that terminate at index 2. countSubs is set to 1, and partialAns is set to 2, adding up to 3.
The final response, 3, is output from the countSubstrings variable to the console.
The number of valid substrings in the string "aca," thus, is 3, making that the final answer.

Implementation in C++

#include<iostream>
#include<string>
#include<vector>
using namespace std;


// Segment tree implementation.
int sumQuery(int a,int b,vector<int>& tree,int n){


   // Update the leaf nodes indices.
   a += n; b += n;
   // To store the range sum.
   int rangeSum = 0;
   while(a <= b){
       // Node lies completely in the range of a to b.
       if(a % 2 == 1) {
           rangeSum = rangeSum + tree[a++];
       }
       if(b % 2 == 0) {
           rangeSum = rangeSum + tree[b--];
       }
       // Indices of parent nodes.
       a /= 2; b /= 2;
   }
   // Return the sum.
   return rangeSum;
}
void updateQuery(int a,vector<int>& tree,int n){
   // Update the value of index a.
   a += n;
   // Increment the leaf node
   tree[a]++;
   for(a /= 2; a >= 1;a /= 2){
       // Update the parent nodes.
       tree[a] = tree[2*a] + tree[2*a + 1];
   }
}
// Function to find the count of valid substrings.
int countValidSubstrings(string str){
   // Find the size of the string.
   int N = str.size();
   // segment tree initialised with zeros.
   vector<int> prefseg(4 * N, 0);
   // To store the difference in the frequencies of characters a and c.
   int count = 0;
   // To account for empty substring.
   updateQuery(count + N, prefseg, 2 * N);
   // To store the final answer.
   int countSubs = 0;
   for(int ind = 0; ind < N; ind++){
       // If the current character is a, increment the count by one.
       if(str[ind] == 'a') count++;
       // else if the current character is c,decrement the count by one.
       else if(str[ind] == 'c') count--;
       // Update the current prefix difference.
       updateQuery(count + N, prefseg, 2 * N);
       // To find the valid substrings ending at ind.
       int partialAns = sumQuery(0, count + N - 1, prefseg, 2 * N);


       // Update to the final answer.
       countSubs += partialAns;
   }
   return countSubs;
}
int main(){
   string str="aca";
   // To store the number of substrings satisfying the condition.
   int countSubstrings = countValidSubstrings(str);
     // Output the answer.
   cout << "Number of substrings with more 'a' than 'c': " << countSubstrings << endl;
}

You can also try this code with Online C++ Compiler

Run Code

Output:

Implementation in Python

# Import required modules
import math

# Segment tree implementation
def sumQuery(a, b, tree, n):
    # Update the leaf nodes indices
    a += n
    b += n
    # To store the range sum
    rangeSum = 0
    while a <= b:
        # Node lies completely in the range of a to b
        if a % 2 == 1:
            rangeSum += tree[a]
            a += 1
        if b % 2 == 0:
            rangeSum += tree[b]
            b -= 1
        # Indices of parent nodes
        a //= 2
        b //= 2
    # Return the sum
    return rangeSum

def updateQuery(a, tree, n):
    # Update the value of index a
    a += n
    # Increment the leaf node
    tree[a] += 1
    a //= 2
    while a >= 1:
        # Update the parent nodes
        tree[a] = tree[2*a] + tree[2*a + 1]
        a //= 2

# Function to find the count of valid substrings
def countValidSubstrings(s):
    # Find the size of the string
    N = len(s)
    # segment tree initialised with zeros
    prefseg = [0] * (4 * N)
    # To store the difference in the frequencies of characters a and c
    count = 0
    # To account for empty substring
    updateQuery(count + N, prefseg, 2 * N)
    # To store the final answer
    countSubs = 0
    for ind in range(N):
        # If the current character is a, increment the count by one
        if s[ind] == 'a':
            count += 1
        # else if the current character is c, decrement the count by one
        elif s[ind] == 'c':
            count -= 1
        # Update the current prefix difference
        updateQuery(count + N, prefseg, 2 * N)
        # To find the valid substrings ending at ind
        partialAns = sumQuery(0, count + N - 1, prefseg, 2 * N)
        # Update to the final answer
        countSubs += partialAns
    return countSubs

# Driver code
str = "aca"
# To store the number of substrings satisfying the condition
countSubstrings = countValidSubstrings(str)
# Output the answer
print("Number of substrings with more 'a' than 'c':", countSubstrings)

You can also try this code with Online C++ Compiler

Run Code

Output:

Time Complexity

The time complexity of the above approach is O(N * logN), where ‘N’ is the size of the input string. The time complexity of the update and sum query in segment trees is O(logN). Since we call the update and sum queries on the segment tree ‘N’ times, the overall time complexity is O(N * log N).

Space Complexity

The space complexity of the above approach is O(N), where ‘N’ is the size of the input string. It is because we are creating a segment tree of order ‘N’ size.

Also check out - Substr C++

Frequently Asked Questions

What is a segment tree?

A segment tree data structure enables quick updates and range searches on an array. It divides the array into segments and creates a tree structure to hold the total of each segment's elements. This makes it possible to compute range sums quickly using a method similar to binary search.

What further uses do segment trees have?

The minimum/maximum element in a range, computing the sum of the elements in a range, and determining the number of items in a range that satisfy a given condition are all range query problems that can be solved using segment trees. They can be enhanced to allow additional activities like lazy propagation for quicker and range updates.

What is the substring frequency difference?

Substring frequency difference refers to the difference between the number of occurrences of two characters (let's say A and B) in a substring of a given string.

What is time complexity?

The time an algorithm or code takes to execute each instruction is known as its time complexity.

What is space complexity?

The total memory space used by an algorithm or code, including the space of input values, is referred to as "space complexity".

Conclusion

This article discusses the topic of the count of cycles in connected undirected graphs. In detail, we have seen the problem statement, sample example, algorithm, dry run, and implementation in two languages. Along with this, we also saw the time and space complexity.

We hope this blog has helped you enhance your knowledge of the count of cycles in a connected undirected graph. If you want to learn more, then check out our articles.

And many more on our platform Coding Ninjas Studio.

Recommended problems -

Refer to our Guided Path to upskill yourself in DSA, Competitive Programming, JavaScript, System Design, and many more! If you want to test your competency in coding, you may check out the mock test series and participate in the contests hosted on Coding Ninjas Studio!

But suppose you have just started your learning process and are looking for questions from tech giants like Amazon, Microsoft, Uber, etc. In that case, you must look at the problems, interview experiences, and interview bundles for placement preparations.

However, you may consider our paid courses to give your career an edge over others!

Happy Learning!