Table of contents
1.
Introduction
2.
Naive Approach
2.1.
PseudoCode
2.2.
Implementation in C++
2.2.1.
Complexity Analysis
3.
Efficient Approach
3.1.
Implementation in C++
3.1.1.
Complexity Analysis
4.
Frequently Asked Questions
5.
Key Takeaways
Last Updated: Mar 27, 2024

Longest Substring with At Least K Repeating Characters

Author aniket verma
2 upvotes
Career growth poll
Do you think IIT Guwahati certified course can help you in your career?

Introduction

In this blog, we will discuss how to find the Longest Substring with At Least K Repeating Characters. It’s a significant problem to check your understanding of hashing and strings. Such questions are frequently asked in interviews and is a fundamental problem.

Before solving the problem, it’s recommended to understand hashmaps, strings, the sliding window, and the divide and conquer technique. This blog will dive deep into each detail to get a firm hold over the application of strings and hashing in problems.  

Let’s look at the problem statement.

You are given a string s, containing only lowercase characters and an integer k. The task is to find the maximum length of a substring of s, such that every unique character in the substring has a frequency greater than or equal to k.

Let’s understand the problem using an example.

Input:  string s = “ababcc”, k =2

Output 6

Explanation: Since the unique characters have a count greater than or equal to k.

Recommended topic, kth largest element in an array, and Euclid GCD Algorithm

Naive Approach

First, let’s look at the naive approach that would probably strike anybody’s mind before jumping to an optimal approach.

So, the basic approach is to find all the substrings, count the frequency of unique characters in the substrings, and maximize the length of such substrings where the frequency of each unique character is greater than k. 

So the naive/basic approach could be formulated as:

  1. Compute all substrings of the given substring. 
  2. Count the frequency of every unique character in each substring.
  3. If the condition is satisfied in the substring, then maximize the answer.  

Now let’s look at the PseudoCode.

PseudoCode

Algorithm

___________________________________________________________________

procedure longestSubstring(s, k):

___________________________________________________________________

1.    ans ← 0

2.    for each substring in substrings(s) do  

3.        if hasKFrequencyCharacters(substring, k) do # check the condition 

4.          ans ← max(ans, substring.size())    

5.     end if   

6.  return ans

7end procedure

___________________________________________________________________

Implementation in C++

//C++ program to solve the problem  Longest Substring with At Least K Repeating //Characters
#include <bits/stdc++.h>
using namespace std;
int longestSubstring(string &s, int k){
    int ans = 0; // length of the subtring to return 
    if(s.size() < k) return ans; // edge case if the size of the string is less than k 
    // iterate over all subtrings from given index
    for(int l = 0; l<s.size(); ++l){
        vector<int> countMap(26, 0); // maintain a map to store count of each character
        for(int r = l; r<s.size(); ++r){ // go through all possible length substrings from l 
            countMap[s[r]-'a']++; // increment the count of this character 
            bool hasKFrequencyCharacters = 1; // flag to check if the count of each character >= k
            for(int i=0;i<26;++i){
                if(countMap[i]>0 and countMap[i]<k){ // check if condition is violated
                    hasKFrequencyCharacters = 0;
                }
            }
            //if condition satisfies maximise the answer
            if(hasKFrequencyCharacters){
                ans = max(ans, r-l+1);
            }
        }
    }
    // return the ans
    return ans;
}
int main() {
    // string s
	string s = "ababcc";
	// k
	int k = 2;

	//find the answer
	cout<<"The longest substring having atleast k frequency of each unique character is: "<<longestSubstring(s, k);
	return 0;
}
You can also try this code with Online C++ Compiler
Run Code

 

Output

The longest substring having atleast k frequency of each unique character is: 6

Complexity Analysis

Time Complexity: O(|s|2)

This is because we iterate through all the subtsrings of s

Space complexity: O(1) at the worst case as we use constant memory to store the characters of the string s1 and s2.

The above algorithm works in quadratic time, which is good but not an efficient algorithm. This gives us the motivation to improve our algorithm.

So let’s think about an efficient approach to solve the problem.

Also check out - Substr C++

Efficient Approach

If we want to build up an efficient solution, we will need to find out if there is any redundancy or any repetition that we are doing. The redundancy in our code is that we check for every substring of the string s if the condition is satisfied. We can make some observations that we can infer to get rid of the extra redundant computation.

Notice that the thing of prime importance is that we must have unique characters in the substrings, which has a count of each unique character >= k. The substring which follows this condition, then we should maximize our answer. In the naive approach, what we did was going through substring of every length. The substring could have any number of unique characters with frequency >= k; obviously, it should not be greater than the total size of the string s.

So the point we are trying to make here is that instead of focusing on lengths, let’s give prime importance to the number of unique characters in the substring and try to find the substring with a given number of unique characters in the substring and count of the unique characters is >=k. If we get such strings, we can easily maximize our answer.

NOTE: we are not trying to say that the substring with a greater number of unique characters will contain the count of each unique character >=k. 

We thought of giving importance to the number of unique characters in the substring because the number of unique characters in any string <= 26(as only lower case letters are allowed). Then we can find only the substring with the frequency of each unique character >=k. 

Let’s also look at the given example in the problem statement and understand the idea.

Now let’s look at substrings with L number of unique characters where L can take values from {1, 2, 3}For each substring with exactly L unique characters, we can look for the substrings with the frequency of each unique character >= k.  

Let’s look at how we find the substrings with L = 2 unique characters such that the condition is satisfied. We iterate through the string in a sliding window fashion. 

 

Since the number of unique characters is 1 we can extend the window.

Since the number of unique characters is 2, but the frequency condition is not satisfied, we can extend the window.

Since the number of unique characters is 2, but the frequency condition is not satisfied, we can extend the window.

Since the number of unique characters is 2 and the frequency condition is satisfied, we can still extend the window to maximize our answer, but locally we can update our answer from 0 to 4.

Since the number of unique characters is 3, we will shrink the window and drop the starting character of the window. 

To save time, we can drop the next 2 characters at the front of the window because they will still exceed the unique characters.

Now the number of unique characters is 2, but the frequency condition is not met. Now we can increase the window size.

Now the number of unique characters is 2, but the frequency condition is not met. We can’t increase the window size further; hence for L = 2, we got the maximum length = 4.

Now, you can try dry running for L =1, 3 also, and you will find the optimal answer we get is for L = 3, i.e., 6. 

Let’s look at the Code.

Implementation in C++

//C++ program to solve the problem Longest Substring with At Least K Repeating //Characters
#include <bits/stdc++.h>
using namespace std;

int longestSubstring(string &s, int k){
    //maintain a count map to store the count of unique characters 
    vector<int> countMap(26, 0);
    int maximumUniqueChars = 0; // variable that stores the maximum number of unique characters
    for(char c:s) 
        countMap[c-'a']++;
    for(int i=0;i<26;++i)
        if(countMap[i]>0)
            maximumUniqueChars++;
    int ans = 0; // final answer that would be returned
    
    // iterate over number of unique characters in a substring
    for (int currUnique = 1; currUnique <= maximumUniqueChars; currUnique++) {
        // fill countMap with 0
        fill(countMap.begin(), countMap.end(), 0);
        int l = 0, r = 0; // l, r  for representing a window[l,r]
        int countAtLeastK = 0; // number of unique characters in the window with freq >= k.  
        int unique = 0; // numberof unique characters in the window
        
        // start iterating over the window
        while (r < s.size()) {
            if(countMap[s[r]-'a']==0) unique++; // increment the count of unique character
            countMap[s[r]-'a']++; // increment the count of current character
            if(countMap[s[r]-'a']==k) countAtLeastK++; // increment if freq == k
            
            // if number of unique characters are more start shrinking the window
            if(unique>currUnique){ 
                //shrinking the window
                while(l<r and unique>currUnique){
                    countMap[s[l]-'a']--; // decrement the count
                    if(countMap[s[l]-'a']==0){
                        unique--;        // reduce the number of unique characters
                    }
                    l++;
                }
                int countKfreqchars = 0;
                for(int i=0;i<26;++i) if(countMap[i]>=k) countKfreqchars++;
                countAtLeastK = countKfreqchars; // update the count of unique chars with freq>=k
            }
            //update the answer if the condition is satisfied
            if(unique == currUnique and countAtLeastK==unique){
                ans = max(ans, r-l+1);
            }
            // increment the window size
            r++;
        }
        // finally check if the condition is satisfied
        if(unique == currUnique and countAtLeastK==unique){
            ans = max(ans, r-l);
        }
    }
    return ans;
}

int main() {
    // string s
	string s = "ababcc";
	// k
	int k = 2;

	//find the answer
	cout<<"The longest substring having atleast k frequency of each unique character is: "<<longestSubstring(s, k);
	return 0;
}
You can also try this code with Online C++ Compiler
Run Code

Output

The longest substring having atleast k frequency of each unique character is: 6

Complexity Analysis

Time Complexity: O(|s|*26)

We are traversing the whole string s maximum unique character times <= 26 in the worst case.

Space complexity: O(1) as we are using constant memory to store the count of characters.

Hence we reached an efficient solution from a quadratic solution. 

Frequently Asked Questions

Q1. When is the Sliding window technique useful?                                                        

Answer) The sliding window technique is useful when you need to keep track of a contiguous sequence of elements.

 

Q2. Why is hashing important?                                                                               

Answer) Hashing is very useful and important because it helps store key-value pairs, increasing the solution’s efficiency and providing fast checks.  

 

Q3. How often are strings asked in interviews?                                                                                    

Answer) String is a favorite topic of the top companies, including Apple, Google, Amazon, etc. It is often asked in interviews and questions formed on strings, usually club 2-3 essential concepts as discussed in the blog. 

Key Takeaways

This article taught us how to find the Longest Substring with At Least K Repeating Characters. We also saw how to approach the problem using a naive approach followed by an efficient solution. We discussed an iterative implementation using examples, pseudocode, and proper code in detail.

We hope you could take away critical techniques like analyzing problems by walking over the execution of the examples and finding out how we can apply hashing in strings and the sliding window technique to simplify our task and make fewer comparisons. 
Check out this problem - Longest String Chain

Now, we recommend you practice problem sets based on the concepts of finding the Longest Substring with At Least K Repeating Characters to master your fundamentals. You can get a wide range of questions similar to the problem of finding the Longest Substring with At Least K Repeating Characters on Coding Ninjas Studio

Happy Learning!!!

Live masterclass