Last Updated: 22 Mar, 2021

Queries on Strings list

Hard
Asked in companies
UberAmazonD.E.Shaw

Problem statement

Initially you have an empty list/array of strings. You need to process ‘N’ queries of the following 4 types :

1. Add ‘KEY’ : You have to append 1 occurrence of string ‘KEY’ in that list/array.

2. Rem ‘KEY’ : You have to erase any 1 occurrence of string ‘KEY’ from that list/array.

3. GetMaxKey : You have to find the lexicographically smallest string present in the list/array that has the highest frequency.

4. GetMinKey : You have to find the lexicographically smallest string present in the list/array that has the lowest frequency.

You should print a string list/array consisting of answers to all the queries of types 3 and 4 in the same order in which they have been asked.

Note:
1. It is guaranteed that you can always process all these queries i.e there will be at least one occurrence of string ‘KEY’ in list/array when the query of type 2 is asked, and list/array is not empty when the query of type 3 or 4 is asked.
2. It is guaranteed that there will be at least 1 query of type 3 or 4.
Follow up:
1. Can you process all 4 types of queries in O(1) time? (Assume the length of string ‘KEY’ is constant)
2. Can you process all 4 types of queries in O(1) time if you can find any string in the 3rd or 4th query that has the highest and lowest frequency respectively? (Assume the length of string ‘KEY’ is constant).
Example:
Consider the following ‘9’ queries.
Add “abc”
Add “aaa”
Add “pqrs”
Rem “pqrs”
Add “abc”
Add “cd”
GetMaxKey
Rem “abc” 
GetMinKey 

Initially, we have an empty string list/array i.e []
After 1st query -:  [“abc”]
After 2nd query -:  [“abc”, “aaa”]
After 3rd query -:  [“abc”, “aaa”, “pqrs”]
After 4th query -:  [“abc”, “aaa”, “pqrs”]
After 5th query -:   [“abc”, “aaa”, “pqrs”, “abc”]
After 6th query -:   [“abc”, “aaa”, “pqrs”, “abc”, “cd”]
The answer to the 7th query clearly will be “abc” as it is the only string with the highest frequency 2.
After 8th query -:   [“aaa”, “pqrs”, “abc”, “cd”] (Note you can remove any occurrence of “abc”)
The answer to the 9th query will be “aaa” as all the strings now have frequency 1 which is the smallest but the lexicographically smallest string among them is “aaa”

Thus we should return list/array = [“abc”, “aaa”].
Input format:
The first line of input contains an integer ‘T’ denoting the number of test cases. then ‘T’ test cases follow.

The first line of each test case consists of a single integer ‘N’ representing the number of queries.

Then next ‘N’ lines follow in each test case. Each of these ‘N’ lines consists of a query of one of the 4 types as described above in the problem statement.
Output format :
For each test case, print single space-separated outputs to the queries of type 3 and 4 in a single line.

If there are ‘K’ queries of type 3 and 4, then print a single line consisting of ‘K’ single space-separated strings representing answers of queries of type 3 and 4 in the same order they have been asked.

Note:

You do not need to print anything, it has already been taken care of. Just implement the given function.
Constraints:
1 <= T <= 50
1 <= N <= 10^4
1 <= |KEY| <= 5

Where |KEY| is the length of string ‘KEY’.

Time limit: 1 sec

Approaches

01 Approach

This approach is straightforward, we will make a HashMap where the key will be the strings given in queries and the value will be their current frequency.  We can process queries of types 1 and 2 in O(1) times by simply increasing and decreasing the value of the string ‘KEY’ given in the Add or Remove query. 

 

For queries of type 3 and 4, we need to iterate over the HashMap in order to find the lexicographically smallest string with the highest and lowest frequency respectively.

 

Algorithm

  1. Create an empty HashMap and let it be named  ‘FREQ’.
  2. Create an empty string list/array ‘RESULT’.
  3. Iterate over the queries and do the following -:
    • If the query is of type-1 (Add ‘KEY’), then if ‘KEY’ is not present in ‘FREQ’, insert it with value ‘1’ otherwise,  increment its value by ‘1’.
    • If the query is of type-2 (Rem ‘KEY’), then if the value for key ‘KEY’ in  FREQ’ is 1, then remove this key, otherwise,  decrement its value by ‘1’.
    • If the query is of type-3 (GetMaxKey), then do the following -:.
      • Initialize an integer variable ‘MAXFREQ’: = 0, and create an empty string ANS.
      • Iterate over the HashMap ‘FREQ’, and for each (key, value) pair, if the value is greater than ‘MAXFREQ’ then assign the key to ‘ANS’ and value to ‘MAXFREQ’, If the value is equal to ‘MAXFREQ’ then assign the key to ‘ANS’ if the key is lexicographically smaller than ‘ANS’
      • Append ‘ANS’ in the ‘RESULT’.
    • If the query is of type-4 (GetMixKey), then do the following -:.
      • Initialize an integer variable ‘MINFREQ’: = INF, and create an empty string ANS.
      • Iterate over the HashMap ‘FREQ’, and for each (key, value) pair, if the value is smaller than ‘MINFREQ’ then assign the key to ‘ANS’ and value to ‘MINFREQ’, If the value is equal to ‘MINFREQ’ then assign the key to ‘ANS’ if the key is lexicographically smaller than ‘ANS’
      • Append ‘ANS’ in the ‘RESULT’.
  4. Return ‘RESULT’.

02 Approach

In this approach for processing the query of types 1 and 2, we will keep a HashMap like the previous approach, where the key will be the strings given in queries and the value will be their current frequency.  We can then process queries of type 1 and 2  by simply increasing and decreasing the value of the string ‘KEY’ given in the Add or Rem query. 

 

For queries of type 3 and 4, we need a data structure called buckets. Here, each bucket will have a Self-Balanced Binary Search tree (Set of strings in C++ or TreeSet of strings in Java), and a pointer ‘PREV’ and ‘NEXT’ to the previous and next non-empty bucket. We create an array of buckets of size ‘N + 1’ and let it be named ‘BUCKETS’.  All the strings having frequency ‘X’ will be stored in the Set (Self-Balanced Binary Search tree) of  ‘BUCKETS[X]’. We also keep two pointers ‘HEAD’ and ‘TAIL’ at the non-empty bucket with lowest and highest index respectively.

 

Now when we perform type-1 query i.e (Add ‘KEY’), and if the frequency of ‘KEY’ after this becomes ‘X’,  then we remove it from ’BUCKETS[X - 1]’ and add it to ‘BUCKETS[X]’. Then we may need to update  ‘PREV’, ‘NEXT’ pointers of both buckets and  ‘HEAD’, ‘TAIL’ pointers accordingly.

 

Similarly when we perform type-2 query i.e (Rem ‘KEY’), and if the frequency of ‘KEY’ after this becomes ‘X’,  then we remove it from ’BUCKETS[X]’ and add it to ‘BUCKETS[X - 1]’. Then we may need to update ‘PREV’, ‘NEXT’ pointers of both buckets and  ‘HEAD’, ‘TAIL’ pointers accordingly.

 

As each bucket stores strings in a set of strings based on the Self-Balanced Binary Search tree,  then we can find lexicographically smallest strings with the highest and lowest frequency, by simply finding the start elements of a set of buckets pointed by ‘TAIL’ and ‘HEAD’ pointers respectively.

 

Algorithm

  1. Create an empty HashMap and let it be named  ‘FREQ’.
  2. Create an empty string list/array ‘RESULT’.
  3. Create a list/array of bucket BUCKETS of size ‘N’+1. Initially each bucket should have ‘PREV’ = 0, ‘NEXT’ = 0, and empty Set of strings (self-balanced-BST ) ‘STRSET'.
  4. Initialize two pointers ‘HEAD’:= 0, ‘TAIL’:= 0.
  5. Iterate over the queries and do the following -:
    • If the query is of type-1 (Add ‘KEY’), then do the following -:
      • if ‘KEY’ is not present in ‘FREQ’, insert it with the value ‘1’ otherwise,  increment its value by ‘1’.
      • Let ‘X: = FREQ[‘KEY’].
      • Insert ‘KEY’ in BUCKET[X].STRSET.
      • If X > 1, then erase ‘KEY’ from BUCKET[X-1].STRSET.
      • Now we will update pointers as follows -:
      • If the list/array is empty before this query i.e ‘HEAD’:= 0 and ‘TAIL’:= 0 ,  do ‘HEAD’:=1, ‘TAIL’:=1.
      • Otherwise, If BUCKETS[X].PREV != 0 OR  BUCKETS[X].NEXT != 0,  then do nothing as pointers are already set correctly.
      • Otherwise, If X < HEAD,  then do BUCKET[X].NEXT = ‘HEAD’ and ‘HEAD’:= ‘X’  because X’ is now the smallest index of the non-empty bucket.
      • Otherwise, do BUCKETS[X].next = BUCKETS[X-1].next, BUCKETS[X].prev = X-1, TAIL = max(TAIL, X).
      • Otherwise, there can be two possible cases (i) BUCKETS[X-1].STRSET is not empty, in this case we simply do BUCKETS[X-1].NEXT:= ‘X’,  (ii) BUCKETS[X-1].STRSET is empty, in this case we erase BUCKETS[X-1] by setting pointers correctly.
    • If the query is of type-2 (Rem ‘KEY’), then if the value for key ‘KEY’ in  FREQ’ is 1, then remove this key, otherwise,  decrement its value by ‘1’.   After that we have to update BUCKETS and pointers, we can do it in a way similar to the type-1 query.
    • If the query is of type-3 (GetMaxKey), then append the first (lexicographically smallest) string of BUCKETS[TAIL].STRSET in ‘RESULT’.
    • If the query is of type-4 (GetMixKey), then append the first (lexicographically smallest) string of BUCKETS[HEAD].STRSET in ‘RESULT’.
  6. Return ‘RESULT’.