Last Updated: 25 Apr, 2021

Game of Dominoes

Moderate
Asked in companies
OlaProcol

Problem statement

Rafiq loves to play with dominoes. On his birthday his father gifted him ‘N’ piles of dominoes, where each pile contains a positive number of dominoes stacked above.

Rafiq loves to play with piles if they are of equal heights, so his father decides to make changes to the piles. His father wonders for each consecutive window of length ‘K’ what is the minimum cost to make them all of the equal height. Hence his father wants to calculate the cost for each window.

The cost for removing and adding one domino on any pile is one unit.

Determine what is the cost to make all elements equal in a window of size 'K', for all windows of size 'K' in an 'N'-sized array/list 'heights'.

Input Format:
The first line contains a single integer ‘T’ representing the number of test cases. 

The first line of each test case will contain a single integer ‘N’ i.e., length of the pile and ‘K’ size of the window.

The next line of each test case contains 'N' numbers a1, a2, a3... aN denoting the initial height of each pile.
Output Format:
For each test case, output the minimum cost for each window.

Output for every test case will be printed in a separate line.
Note:
You don’t need to print anything; It has already been taken care of.
Constraints:
1 <= T <= 10
1 <= N <= 10000 
1 <= K <= N 
1 <= height[i] <= 10^5 

where 'T' is the number of test cases, 'N' is the number of piles, 'K' is the size of the window as discussed above and 'height[i]' represents the height of each pile.

Time limit: 1 sec

Approaches

01 Approach

Let's reduce our question to a fixed array instead of every subarray of size ‘K’ and try to find an answer specific to that array/list ‘height’, so we could do the same thing for each window, now, as mentioned in the hint, we need to calculate the sum of absolute difference of every element from its median, so after getting the new array, next step would be to calculate the cost of that window, so first we will sort the array and select the ‘K’/2 th element as we know the size of the new array will be ‘K’, now we could calculate the cost easily by iterating over the window.

 

Reference: proof for optimal cost to make all elements equal would be at its median value.

 

The steps are as follows:

 

  1. Declare a list/vector ‘ret’ in which we store our answer.
  2. Run a loop while ‘i’ = 0 to ‘N’ - ‘K’.
    • Let's calculate the median of the ‘K’ size subarray/sublist ‘temp’ and initialize it as ‘Median’. Note that there are various methods to calculate the median of the ‘temp’.
      • select the ‘N’/2 th element in the sorted ‘temp’, here ‘N’ denotes the size of the ‘temp’.
      • Use kth-order statistics to find the median in linear time.
    • For each element ‘X’ in the ‘temp’, we add abs(‘X’ - ‘Median’) to ‘Answer’, as we need this quantity of steps to make ‘X’ equal to ‘Median’, by incrementing or decrementing ‘X’ by 1.
    • Add this ‘Answer’ into ‘ret’.
  3. Finally, return 'ret'.

 

02 Approach

Now as we know that we have to convert each element in the window to the median of that window to get the optimal cost. A median of the window is equal to the element at the ‘P’/2 th position in the sorted window where ‘P’ denotes the size of the window. in our case, ‘P’ is equal to ‘K’. 

 

Fenwick Tree can be used to find range sum, point updates, so we could find range sum in log(N) time, here ‘N’ is the size of the ‘pile’ that we declare. Alternatively, you can use any range sum data structure like which supports updates, as we will use keep data according to the sliding window, and keep only data in the current window.

 

We are using an Ordered set for getting the median in log(K) time as ‘K’ is the size of the window, also to find the count of elements less than some number. Note that the Ordered set stores only unique elements or unique data types. But we need multiset instead  of set, here so we will replace element ‘X’ with pair data type like (‘X’,’indexOfX’) , here ‘indexOfX’ is the index of ‘X’ in the given ‘pile’, now it is assured each pair will be unique as each pair consists index which is unique and ranges from [0,’N’) here ‘N’ is the number of piles. 

 

Now as we saw in Approach-1 that we have to calculate ‘Answer’ for each window of size ‘K’. Let's rephrase the approach a little bit which we were using for a particular subarray/sublist, for the third step in Approach-1 we could write the sum of absolute(‘X’ - ‘Median’)  as ‘S1’ + ‘S2’  where ‘S1’ is equal to Sum of (‘X’ - ‘Median’ ) if ‘X’ > ‘Median’ and ‘S2’ is equal to Sum of (‘Median’ - ‘X’) if ‘X’ < ‘Median’.

 

Now as we know Fenwick tree can be used to find range sum so that we can find sum of elements in a certain range like:

 Sum of (‘X’ - ‘Median’ ) = Sum of (‘X’) - ‘CountLessX’ * ‘Median’, here ‘CountLessX’ can be found using the ordered set and denotes the number of elements less than the median, we are using less than here instead of less than equal to as there might be many elements equal to ‘Median’, so it will be more convenient for us. Also, Sum of ‘X’ is basically a range sum query on the range [1, Median).

 

Similarly, Sum of (‘Median’ - ‘X’) = ‘CountGreaterX’ * ‘Median’ - Sum of (‘X’), where ‘CountGreaterX’ can be found by ordered set and denotes the number of array/list ‘pile’ elements greater than median and Sum of (‘X’) is equal to range sum query for range [Median, 10^5). Here 10^5 denotes the maximum array element.

 

So now, it is clear that we can get the total required steps for a particular subarray/sublist in log(N) time as log(N) is the required time complexity for Range Sum query on Fenwick tree also for finding count of elements in the given range using ordered set and finding element at ‘K’/2 th index.

So let’s start simulating our approach for each window, steps for simulation are as follows.

 

  1. Declare Fenwick tree as ‘FTree’, Ordered Set as ‘OS’.
  2. Declare an empty list (say ‘Ret’).
  3. Initialize a variable ‘CurAnswer’ = ‘0’/
  4. Run a for loop for ‘i’ in the range [0,’K’):
    • Update or Increment Value at the index of ‘height[i]’ by 'height[i]' in the Fenwick tree.
    • Insert element in Ordered Set as make_pair('height[i]' , ‘i’).
  5. Declare and initialize variables like ‘Median’, ‘CountGreaterX’, ‘CountLessX’, ‘SumLess’,  ‘SumGreater’  (refer above for definition).
    • ‘Median’ = OS.getMedian(),
    • ‘CountGreaterX’ = OS.CountGreater(‘Median’).
    • ‘CountLessX’ = OS.CountLess(‘Median’).
    • ‘SumLess’ = FTree.RangeSum(1,Median-1).
    • ‘SumGreater’ = FTree.RangeSum(Median + 1, 10^5).
  6. ‘CurAnswer’ = (‘SumGreater’  -   ‘CountGreaterX’ * ‘Median’ ) + (‘CountLessX’ * Median -  ‘SumLess’ ) = ‘SumGreater’ - ‘SumLess’ - ‘Median’ * (‘CountGreaterX’ - ‘CountLessX’).
  7. Add ‘currAnswer’ to list ‘ret’.
  8. Run a for loop for ‘i’ in range [‘K’,’N’):
    • Remove Array[i-k] from Fenwick tree (‘FTree’) and ordered set(‘OS’).
    • Insert Array[i] in Fenwick tree (‘FTree’) and ordered set(‘OS’).
    • ‘Median’ = OS.getMedian().
    • ‘CountGreaterX’ = OS.CountGreater(‘Median’).
    • ‘CountLessX’ = OS.CountLess(‘Median’).
    • ‘SumLess’ = FTree.RangeSum(1,Median-1).
    • ‘SumGreater’ = FTree.RangeSum(Median + 1, 10^5).
    • ‘CurAnswer’ = ‘SumGreater’ - ‘SumLess’ - ‘Median’ * (‘CountGreaterX’ - ‘CountLessX’).
    • Add ‘currAnswer’ to list ‘ret’.
  9. Return list ‘ret’.

03 Approach

If you could notice what we did in Approach-2, all we need is the sum of half elements and the sum of the other half elements. Also, we need the median of this window, we do this for all windows of size ‘K’.

So we will maintain two multisets ‘Left’ and ‘Right’ here to do so, also we need to remember that half of the sorted elements are in ‘Left’ multiset and remaining in ‘Right’ multiset and elements in ‘Left’ is sorted in non-increasing order and in non-decreasing order for ‘Right’.

So, we could get the median easily by selecting the top or first element in the ‘Left’ multiset.

We will also maintain two variables ‘LeftSum’ and ‘RightSum’ for calculating the sum in left multiset and right multiset. Also, we will always maintain the above-mentioned sorting order for the multisets.

Now as we did in the previous approach, we first inserted all our data of our first ‘K’ sized window i.e., range [0, K) in our data structure similarly, here we will also do the same by 

Sorting first K elements and inserting them into two multisets half in ‘Left’ and half in ‘Right’.

Now we can calculate the cost for the first window using the same equation as we did in the previous Approach-2 And now we can update our data structure i.e.multiset in log(N) time, as we are moving our sliding window, only one element is getting added and removed so at most log(N) time is required. 

Remember we are trying to maintain the size of both sets as close as possible, so we know that the size of each set could be differing by at most 1. Also here numbers in ‘Left’ are stored in non-increasing order and in non-decreasing order for the ‘Right’ multiset, as we need only access to the highest element for ‘Left’ and lowest element for ‘Right’. 

The steps of simulation are as follows, note that it will be much similar to the second approach as we are only changing our data structure to calculate range sum query and median.

 

  1. Declare Two Multisets as ‘Left’ and ‘Right’,
  2. Declare variables ‘LeftSum’ and ‘RightSum’.
  3. Declare a list ‘Ret’, so we could store our answer for each window.
  4. Run a loop for 'i' in the range [0, N):
    • Declare variable ‘curEle’ = Arr[i]
    • if Arr[i] is less than equal to the highest element insert Arr[i] in multiset ‘Left’ and increment ‘LeftSum’ by ‘CurEle’ else insert in ‘Right’ and increment ‘RightSum’ by ‘CurEle’.
    • The balanced size of both sets also makes changes to ‘LeftSum’ and ‘RightSum’ accordingly.
    • If ‘i’ is not less than ‘K’:
      • Declare variable ‘curMedian’ = highest element in ‘Left’
      • Declare variable ‘MinCost’ =   ‘curMedian’ * ( sizeof(‘Left’) - sizeof(‘Right’)) - ‘LeftSum’ + ‘RightSum’
      • Push ‘Mincost’ to list/vector ‘Ret’.
  5. Return list ‘Ret’.