Last Updated: 21 Nov, 2020

Minimum Length Of Compressed String

Ninja
Asked in company
Amdocs

Problem statement

Run-length Encoding is the string compression technique that compresses the given string by replacing the number of consecutive identical characters (repeated 2 or more times) with the concatenation of the character and its count of characters.

For Example :

If the given string is  “wwwmpp”, then, Run-length encoded string will be “w3mp2”,
Here, we replace ‘www’ with ‘w3’, as the frequency of ‘w’ is 3, 
And, we replace ‘pp’ with ‘p2’, as the frequency of ‘p’ is 2, 
Thus, the given string is compressed to “w3mp2”, 
Note, for ‘m’ we are not adding 1 because it is mentioned in the definition of Run-Length Encoding, only 2 or more repeated character count will be concatenated.

You are given a string 'Str' and an integer 'K'. Your task is to find the minimum possible length of the run-length encoded string, which can be generated by deleting at most 'K' characters from the string 'Str'.

For Example :

Consider the string 'Str' = “cccdeeef” and 'K' =2, 
Current Run-length encoded string is "c3de3f" having length 6. 
Here we can delete at most 2 characters, so If we delete any of the characters 'c' or 'e', it would at most decrease the length of the compressed string to 5, 
For example, if we delete 2 'c' then we will have the string "cdeeef" which can be compressed to “cde3f”. Therefore, the optimal way is to delete 'd' and 'f', then the compressed string will be "c3e3" of length 4.

Input Format :

The first line of input contains a single integer 'T', representing the number of test cases or queries to be run. 

Then the 'T' test cases follow.

The first line of each test case contains a string 'STR'.

The second line of each test case contains a single integer 'K', representing the maximum deletions allowed in the string.

Output Format :

For each test case, print the length of the minimum compressed string after deleting at-most 'K' characters in a separate line.

Note :

You do not need to print anything, it has already been taken care of. Just implement the given function.

Constraints :

1 <= T <= 50    
1 <= |STR| <= 100 
0 <= K <= |STR|   

Where |STR| denotes the length of string 'STR'.
All the characters of the string will be in lowercase


Time Limit: 1 sec

Approaches

01 Approach

We find all possible strings that can be obtained by deleting at most ‘K’ characters using a backtracking algorithm. Then one by one we compress all these strings and find the minimum length after compression.

 

Algorithm

  • Create an empty list of strings ‘possibleStr’.
  • Create an empty string ‘curStr’.
  • Call a function findPossibleString(possibleStr, curStr, K, curIdx) Where ‘K’ is the number of deletion allowed and curIdx represents the current Index of string ‘str’. This function will backtrack to generate all possible strings as follows
    • If curIdx = length of given string ‘Str’,  then append curStr in list possibeStr and then return.
    • Otherwise if K > 0 then call findPossibleString(possibleStr, curStr, K-1, curIdx+1) to find possible string after removal of current character.
    • And append Str[curIdx] in curStr, and then call findPossibleString(possibleStr, curStr, K, curIdx+1) to find all possible strings after retention of current character.
    • Remove this character from curStr.
  • List ‘possibleStr’ will now have all possible strings that can be obtained after removing at most ‘K’ characters.
  • Initialize an integer variable  ‘result’ by infinity.
  • Iterate over the list ‘possibleStr’ and compress each string of this list as follows.
    • Initialize a variable ‘cnt’ := 0.
    • Create an empty string ‘compressedStr’.
    • Traverse over the string and if the current character is the same as the previous character then increment ‘cnt’ by ‘1’  otherwise, append the current character with ‘cnt’ in  ‘compressedStr’ and change ‘cnt’ value to  1.
    • Update result with a minimum of its current value and length of ‘compressedStr’.
  • Return ‘result’.

02 Approach

Let Len(curIdx, K, prevChar, cntPrevChar) denote the minimum run-length-encoding obtained starting from index curIdx where K denotes the number of deletions remaining, prevChar denotes the last character with frequency cntPrevChar so far.

 

Now two possibilities exist, either to delete the current character or to retain it.

 

If K > 0 and we delete the current character then -:

 

L1 = Len(curIdx+1, K-1, prevChar, cntPrevChar)

 

If we retain the current character then there are two possible cases -:

 

  1. If prevChar and current character are the same, then

            L2 = Len(curIdx+1, K, prevChar, cntPrevChar+1)

  1. If prevChar and Current character are not the same, then

            L2 = Len(curIdx+1, K, current character, 1)

 

Len(curIdx, K, prevChar, cntPrevChar) = min(L1, L2).

      

Algorithm

  • Create a function  helper(curIdx, K, prevChar, cntPrevChar) where
    • curIdx: current index in str.
    • K: Remaining number of deletions.
    • prevChar: previous character.
    • cntPrevChar: Number of occurrences of the previous character.
  • The function can be implemented recursively as follows -:
    • If the entire string has been traversed i.e curIdx = N, then return 0, because there is no more character so no further addition will be in length.
    • If K > 0 then call helper(curIdx+1, K-1, prevChar, cntPrevChar); It will give a minimum run-length encoding by removing the current character. Store this value in the integer variable ‘result’.
    • If Str[curIdx] == prevChar then call helper(curIdx+1, K, prevChar, cntPrevChar+1), Otherwise call helper(curIdx+1, K, Str[curIdx], 1). The minimum run-length encoding by retaining the current character will be helper(curIdx+1, K, Str[curIdx], 1) + 1 if Str[curIdx] != prevChar, otherwise it will be helper(curIdx+1, K, prevChar, cntPrevChar+1) + 1 when cntPrevChar is 2, 9 or 99 as number of digits increment by 1, and helper(curIdx+1, K, prevChar, cntPrevChar+1) for all other values of cntPrevChar.  Update result with maximum of its previous value and this value.
    • Return ‘result’.
  • We will call helper(0, K, #, 0) to calculate the minimum length of the compressed string (minimum run-length encoding) of the given string.  Note here ‘#’ is the character that symbolizes the start of the string.

03 Approach

In the previous approach, we can observe that we have done a lot of repeated work. To avoid repeated work we will store the result of already computed subproblems in a ‘dp’ table.  

 

The ‘dp’ table will have four dimension dp[curIdx][K][prevChar][cntPrevChar] where curIdx: current index in, K: Remaining number of deletion, prevChar: a previous character we represent a character with an integer between 1 to 26., cntPrevChar: Number of occurrences of the previous character.

 

Algorithm

  • Create a ‘dp’ table of dimension N*K*26*N and fill it by -1.
  • Create a function  helper(curIdx, K, prevChar, cntPrevChar) where
    • curIdx: current index in str.
    • K: Remaining number of deletions.
    • prevChar: previous character (represented by an integer).
    • cntPrevChar: Number of occurrences of the previous character.
  • The function can be implemented recursively as follows -:
    • If the entire string has been traversed i.e curIdx = N, then return 0, because there is no more character so no further addition will be in length.
    •  If dp[curIdx][K][prevChar][cntPrevChar] != -1,  i.e the subproblem is already computed so be return its result then simply return it.
    • If K > 0 then call helper(curIdx+1, K-1, prevChar, cntPrevChar); It will give a minimum run-length encoding by removing the current character. Store this value in the integer variable ‘result’.
    • If Str[curIdx] - ‘a’ == prevChar then call helper(curIdx+1, K, prevChar, cntPrevChar+1), Otherwise call helper(curIdx+1, K, Str[curIdx] - ‘a’, 1). The minimum run-length encoding by retaining the current character will be helper(curIdx+1, K, Str[curIdx] - ‘a’, 1) + 1 if Str[curIdx] -’a’!= prevChar, otherwise it will be helper(curIdx+1, K, prevChar, cntPrevChar+1) + 1 when cntPrevChar is 2, 9 or 99 as number of digits increment by 1, and helper(curIdx+1, K, prevChar, cntPrevChar+1) for all other values of cntPrevChar.  Update result with maximum of its previous value and this value.
    • Store ‘result’ in dp[curIdx][K][prevChar][cntPrevChar].
    • Return ‘result’. 
  • We will call a helper(0, K, 27, 0) to calculate the minimum length of the compressed string (minimum run-length encoding) of the given string.  Note here 27 is the integer representation of the character that symbolizes the start of the string.