Last Updated: 15 Oct, 2020

Minimum Character Deletion

Moderate
Asked in companies
American ExpressDeutsche BankMeesho

Problem statement

You are given a string ‘STR’. You need to find and return the minimum number of characters to be deleted from ‘STR’ so that the frequency of each character in the string becomes unique.

Example:
If the given string is “aaBBccc” then the frequency of characters: { a:2, B:2, c:3 }. Now, as ‘a’ and ‘B’ both have the same frequency 2, we need to delete one character either one ‘a’ or one ‘B’, to make their frequency different. After deleting any character we will get frequency as 1,2 and 3, as they all are different. Thus we got our solution as 1.
Input Format:
The first line of input contains a single integer 'T', representing the number of test cases.

The first line of each test case contains a string 'STR'. 
Output Format:
For each test case, print an integer denoting the minimum number of characters to be deleted from the string 'STR'.
Note:
You are not required to print the output, it has already been taken care of. Just implement the given function.
Constraints:
1 <= T <= 10
1 <= |STR| <= 10^5

Where |STR| is the length of the string 'STR'.

Time limit: 1 sec

Approaches

01 Approach

As we are only allowed to delete the character, thus the resulting string after deletion of some character would be the subsequence of the string. So, we have to find such a subsequence which has the unique frequency of each character. 

 

Initialise a variable ‘ans’ that will store the minimum number of characters that are needed to remove from the string. Create all the subsequences of the string and check whether it satisfies the condition, simultaneously update the ‘ans’ with the minimum of ‘ans’ and ('N' - length of subsequence). Finally, return the ‘ans’.

 

For creating the subsequence of string:

 

  1. Call a helper function to create all the subsequences.
  2. Recursively call 2 functions, one by including the current character and one by excluding the current character.
  3. The base case would be when input length becomes zero, then check for the condition of unique frequency of character and update the ans accordingly.

 

For checking the unique frequency condition:

 

  1. Create a map to store the frequency of each character.
  2. Traverse the subsequence and store the frequencies in the map.
  3. Create a set.
  4. Traverse the keys of the map and store its frequency in the set.
  5. If the frequency is already in the set then return false.
  6. If all the array is traversed, then return true.

02 Approach

Create a max heap of integers, which will store the frequency of each character. Initialize a variable ‘count’ that will store the number of deleted characters. Create a map, and store the frequency of each character by traversing the string. Also store all the keys frequency in the heap.

 

Now, while the heap is not empty, remove the highest occurrence of a character. If the current top element of the heap is equal to the frequency of the highest occurring character then decrement the frequency of the highest occurred character and push it back into the heap if the new frequency is not zero. Increment the 'count' by 1. Finally, return the count.