Last Updated: 25 Apr, 2021

Subarrays With ‘K’ Distinct Values

Moderate
Asked in companies
Goldman SachsVMware IncD.E.Shaw

Problem statement

You are given an array ‘ARR’ having ‘N’ integers. You are also given an integer ‘K’. The task is to count the number of subarrays that have ‘K’ distinct values.


Subarray: A consecutive sequence of one or more values taken from an array.


For Example :
‘N’ = 4, ‘K’ = 2
‘ARR’ = [1, 1, 2, 3]

There are ‘3’ subarrays with ‘2’ distinct elements, which are as follows: [1, 2], [2, 3], [1, 1, 2].
Thus, you should return ‘3’ as the answer.
Input Format :
The first line contains two integers, ‘N’ and ‘K’.

The second line contains ‘N’ integers representing the array ‘ARR’.
Output Format :
Print the number of subarrays that have exactly ‘K’ distinct values.
Note :
You do not need to print anything, it has already been taken care of. Just implement the function.

Approaches

01 Approach

We can generate all the subarrays of ‘ARR’ using two nested loops. In the outer loop, use a hash map to store the distinct values in the subarrays generated in the inner loop. If a subarray ‘[L, R]’ has more than ‘K’ distinct values, then all the subarrays ‘[L, R+1], [L, R+2], .., [L, N-1]’ will also have more than ‘K’ distinct values. So, we don’t need to check these subarrays.

 

Algorithm

 

  • Set ‘COUNT = 0’. Use it to store the count of subarrays with ‘K’ distinct values.
  • Run a loop where ‘L’ ranges from ‘0’ to ‘N - 1’:
    • Create a hash map ‘hashMap’ to store integers. Use it to store the distinct values in the subarrays starting at index ‘L’.
    • Run a loop where ‘R’ ranges from ‘L’ to ‘N-1’ to iterate through all subarrays starting from ‘L’:
      • Insert ‘ARR[R]’ into ‘hashMap’.
      • If ‘hashMap.size() == K’, then the subarray ‘[L, R]’ has ‘K’ distinct elements:
        • ‘COUNT += 1’. The subarray ‘[L, R]’ at most ‘K’ distinct values.
      • If ‘hashMap.size() > K’, then the subarray ‘[L, R]’ has more than ‘K’ distinct elements:
        • Break the loop/
  • Return ‘COUNT’ as the answer.

02 Approach

To find subarrays with exactly k distinct integers, we can solve the problem of finding subarrays with less than or equal to k distinct integers first. Then, we subtract the result of count of subarrays with less than or equal to (k-1) distinct integers from the count of subarrays with less than or equal to (k) distinct integers to get the final solution.

Now to find subarrays with less than or equal to k distinct integers first.

Let’s say we have a subarray having indexes ‘[L, R]’ such that there are at most ‘K’ distinct values from ‘L’ to ‘R’, and ‘L’ is the smallest possible index. Then, all the ‘R - L + 1’ subarrays of the form ‘[L, R]’, [L+1, R], [L+2, R], …, [R, R]’ will also have at most ‘K’ distinct values. Thus, if we know the value ‘L’, we know the valid subarrays ending with index ‘R’. Use a hash map to store the ‘(Key, Value)’ pairs as ‘(Distinct values in ‘[L, R]’, The number of times they occur in ‘[L, R]’)’. The size of the hash map is equal to the number of ‘(Keys, Value)’ pairs having ‘Value > 0’. If at any point a ‘Value’ becomes zero, decrease the size of the hash map.

 

To find the count of valid subarrays ending at ‘R+1’, increase the count of ‘ARR[R+1]’ in the hash map. Now, two cases can arise:

 

  • The size of the hash map becomes greater than ‘K’: Remove the values at ‘L’ from the hash map, move ‘L’ to ‘L+1’. Keep doing this until the size of the hash map is not equal to ‘K’.
  • The size of the hashMap remains less than equal to ‘K’: We don’t have to do anything.

 

Now, the number of valid subarrays ending at ‘R+1 will be ‘(R+1) - L + 1’. We have the base case of ‘[L, R] = [0, 0]’, i.e., the first index and the hash map storing a single ‘(Key, Value)’ pair ‘(ARR[R], 1)’. Follow the steps mentioned above to find the valid subarrays ending at index ‘1’, ‘2’, …, up to ‘N-1’. This is called the sliding window technique.

 

Algorithm

 

  • Set ‘count = 0’. Use it to store the count of subarrays with at most ‘K’ distinct values.
  • Set ‘L = R = 0’.
  • Create a hash map ‘hashMap’ that stores ‘(Key, Value)’ pairs as ‘(Distinct values in ‘[L, R]’, The number of times they occur in ‘[L, R]’)
  • Set ‘size = 0’. Use it to store the keep the count of pairs with non-zero values in the ‘hashMap’.
  • Run a loop where ‘R’ ranges from ‘0’ to ‘N - 1’:
    • ‘hashMap[ARR[R]] += 1’. Insert the value ‘ARR[R]’ into ‘hashMap’.
    • If ‘hashMap[ARR[R]]’ is equal to ‘1’:
      • ‘size += 1’. Increase the size of ‘hashMap’.
    • Run a loop while ‘size’ is greater than ‘K’:
      • ‘hashMap[ARR[L]] -= 1’
      • If ‘hashMap[ARR[L]]’ is equal to ‘0’:
        • ‘size -= 0’. Decrease the size of ‘hashMap’ as we have a ‘(Key, Value)’ pair with ‘Value’ as zero.
      • ‘L += 1’. Move ‘L’ to the next index.
    • ‘count += R - L + 1’. The number of valid subarrays ending at index ‘R’.
    • ‘R += 1’. Move ‘R’ to the next index.
  • Return ‘count’ as the answer.