Last Updated: 27 Feb, 2021

Minimum Distinct Labels

Moderate
Asked in companies
Expedia GroupMorgan Stanley

Problem statement

'N' boxes are placed on a table. Each box has an integer label on it. The labels present on each box are given in the array 'ARR'. Two different boxes may or may not have the same label value. You are given an integer 'M'. Your task is to remove any 'M' of the 'N' boxes such that after their removal, the number of distinct labels left on the table are minimum.

For example:

Consider M = 2 and the array ARR = [ 3, 4, 5, 3 ] 
If we remove the second and the third box, then all the boxes remaining on the table will have label 3. Hence, the minimum number of distinct labels left will be 1 in this case.
Input Format:
The first line of the input contains an integer, 'T,’ denoting the number of test cases.

The first line of each test case contains two space-separated integers, 'N' and 'M', denoting the number of boxes and the number of boxes to be removed respectively.

The second line of each test case contains 'N' space-separated integers denoting the labels of each box.
Output Format:
For each test case, return the minimum number of distinct labels left on the table after removing exactly 'M' boxes.
Note :
You do not need to print anything. It has already been taken care of. Just implement the given function.
Constraints:
1 <= T <= 10
1 <= N <= 10^4 
0 <= M <= N
1 <= ARR[i] <= 10^9

Time Limit: 1 sec

Approaches

01 Approach

The idea is to generate all the possible combinations of the ‘M’ boxes that can be removed, and for each combination, find out the number of distinct labels that are left on the table and compare it with results obtained from all other combinations. In the end we will return minimum possible labels obtained.

To check the number of distinct labels left, we will insert all the labels on the boxes which are not removed into a set and the number of distinct labels will be equal to the number of elements in the set. 

To generate all the combinations of removals, we will take help of the next permutation function. We will create an array having ‘M’ zeros and ‘N’ - ‘M’ ones to store the current combination. The ‘M’ zeros will represent the indices of the boxes that will be removed, and the ‘N’ - ‘M’ ones will represent the indices of the boxes left on the table. We will place all the ‘M’ zeros at the start of the array and the ones at the end. To move to the next combination we will call the next permutation function for the current permutation.

 

Steps:

  1. Let 'ANS' be the minimum distinct labels left. Initialize it as the number of distinct labels initially present on the table.
  2. Create an array 'PERM' having ‘M’ zeros followed by ‘N’ - ‘M’ ones to store the current permutation.
  3. While a greater permutation for the current permutation exists,
    • Change current permutation to the next permutation.
    • Create an empty set ‘LABS’ to store the distinct labels.
    • Iterate from ‘j’ = 0 to ‘N - 1’
      • If 'PERM'[j] is equal to 1, then insert ARR[i] to the set ‘LABELS’. 
    • Let the number of elements in the set ‘LABELS’ be ‘K’
    • Set 'ANS' to the minimum value among 'ANS' and ‘K’. 
  4. Return the variable 'ANS'.

02 Approach

The idea is to create a HashMap to store the 'FREQUENCIES' of each label and it is optimal to always remove the box having the least frequency. As our goal is to minimize the number of distinct labels left which is the same as maximizing the number of distinct labels that are completely removed from the table, i.e., no occurrence of that label is present on the table. As we want more and more labels to be completely removed we can select the least frequent label, as removing it completely will take the least amount of removals compared to any other label. This observation is very intuitive and can be found out by simulating the answers for some examples.  

We can always select the label having the least frequency, until we have exhausted all ‘M’ removals. In the end our answers will be the number of labels that have not been removed completely till now.

 

Steps: 

  1. Iterate through ‘i’ = 0 to ‘N - 1’
    • If ‘ARR[i]’ is present in the HashMap, then we will increase its frequency. Otherwise, will add ARR[i] to the HashMap, and Initialize it's frequency as 1 as this is the first time the label ARR[i] is being inserted into the HashMap. 
  2. Create an array 'FREQUENCIES' containing the value of each key in the HashMap.
  3. Sort the 'FREQUENCIES' array in ascending order.
  4. Let 'DISTINCT_LABELS_LEFT' be the number of labels that are not removed till now. Initialize it as the number of elements in the 'FREQUENCIES' array, as no elements have been removed till now.
  5. Iterate through the ‘FREQUENCIES’ array
    • Let 'CURR' denote the current element.
    • If 'CURR' is smaller than or equal to ‘M’, then subtract 'CURR' from ‘M’, and decrease 'DISTINCT_LABELS_LEFT' by 1. As it is possible to remove the particular label completely, so we are removing at and decreasing the number of distinct labels left.
    • Otherwise we will end up iterating through the array, as no new labels can be removed because all the labels that are not removed will have a greater frequency.
  6. Return the 'DISTINCT_LABELS_LEFT' variable.