Last Updated: 9 Dec, 2020

Median in a stream

Hard
Asked in companies
OlaInfo Edge India (Naukri.com)Samsung

Problem statement

Given that integers are read from a data stream. Your task is to find the median of the elements read so far.

Median is the middle value in an ordered integer list. If the size of the list is even there is no middle value. So the median is the floor of the average of the two middle values.

For example :
[2,3,4] - median is 3.
[2,3] - median is floor((2+3)/2) = 2.


Input Format:
The first line of input contains a single integer T, representing the number of test cases or queries to be run. 

Then the T test cases follow.
The first line of each test case contains the number of elements, N, in the input data stream.

The second line of each test case contains N space separated elements of the input data stream.
Output Format:
For each test case, print the median of the elements after each incoming element. Each median value is separated by a single space.

The output of each test case is printed in a separate line.
Note :
You do not need to print anything, just return the vector of medians after each element is read from the stream. 
Constraints :
1 <= T <= 10
1 <= N <= 10^4
0 <= data <= 10^8
Where T is the number of test cases, N is the number of elements in the data stream.

Time Limit : 1 sec

Approaches

01 Approach

Store the incoming data elements in a vector. Sort the vector everytime you need to output the median.

Algorithm:

  1. Store the incoming elements of the data stream in a vector.
  2. Step by step insert one element and sort the vector as soon as any element is added in the vector.
  3. If the size of the vector is odd, print the middle element of the sorted vector.
    If the size of the vector is even, return the mean of the middle two elements of the sorted vector.

02 Approach

If we sort the data as it appears, the median can be located easily. Insertion Sort is one such algorithm that sorts the data appeared so far. At any instance of sorting, say after sorting i-th element, the first ‘i’ elements of the array are sorted. Insertion sort considers data sorted so far while inserting the next element. 

Algorithm:

  1. Store the incoming elements in a vector.
  2. Use insert operation of insertion sort to sort the elements step by step calculating medians.
  3. If the size of the vector is odd, print the middle element of the sorted vector.
    If the size of the vector is even, return the mean of the middle two elements of the sorted vector.

03 Approach

The idea is to use two heaps, one max - heap and one min - heap. The max-heap will be used to represent elements that are less than the effective median, and min-heap will be used to represent the elements that are greater than the effective median.
After processing the incoming element, the number of elements in heaps differ utmost by 1 element and this will be the case when there are odd number of elements. Thus the heap having more number of elements will give its root value as median. When both the heaps contains same number of elements, average of heaps root data is the effective median.

Algorithm:

  1. Two heaps :
    a.   Max-heap ‘lo’ for storing the smaller half of numbers.
    b.   Min-heap ‘hi’ for storing the larger half of numbers.
  2. The max-heap ‘lo’ is allowed to store, at worst, one element more than the min-heap ‘hi’.
    If ‘K’ elements have been processed:
    a.   If ‘K’ is odd then ‘lo’ will have one more element than ‘hi’.
    b.   If ‘K’ is even then both the heaps will have an equal number of elements.
    Thus in case a) root of ‘lo’ will give us the median and in case b) the average of the roots of both the heaps will be the median.
  3. Adding a number num :

a.  Add num to max-heap ‘lo’. Since ‘lo’ received a new element, we must do a balancing step for ‘lo’ and offer its largest element to ‘hi’.

b.  The min-heap ‘hi’ might end up holding more elements than max-heap ‘lo’, after previous operation. We fix that by removing the smallest element  from ‘hi’ and offering it to ‘lo’.