Last Updated: 9 Sep, 2022

Find Median from Data Stream

Moderate
Asked in company
Amazon

Problem statement

The median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value and the median is the mean of the two middle values.

Example:
Median for ‘arr’ = [1,2,3,4,5] is 3.
Median for ‘arr’ = [1,2,3,4] is (2+3)/2 = 2.5.

Implement the MedianFinder class:

MedianFinder() initialises the MedianFinder object.

1. Void addNum(int ‘num’) adds the integer ‘num’ from the datastream to the data structure.

2. Double findMedian() returns the median of all elements so far.

Note : Answers within 10^-5 of actual answer will be accepted.

Example:
Input: 
5
1 1
1 2
2
1 5
2

Output:     
1.5
2

Explanation:
MedianFinder() initialises the MedianFinder object.
Add 1 to the data structure ‘arr’, so arr = [1].
Add 2 to arr, so arr = [1,2]
Find Median of current arr, that is (1+2)/2 = 1.5.
Add 5 to arr, so arr = [1,2,5]
Find Median of current arr, that is 2.0.
Input Format :
The first line of the input contains an integer, ‘T’, denoting the number of test cases.

The First line of each test case contains an integer, ‘N’ denoting the number of calls to addNum and findMedian.
Output format :
Return an integer if the call is made to findMedian.
Constraints :
1  <= T <= 10
-100000 <= nums <= 100000
1 <= N <= 50000
There would be at least 1 element in the data structure before calling findMedian.
Time Limit: 1 sec

Approaches

01 Approach

The solution to the problem lies in storing the data in min and max heaps.

We can use priority queue as it is a heap based data structure. For adding data from datastream into the data structure such that the median would be either the top elements of the priority queue or the mean of top elements of both the priority queues. We always divide the data from between such that first half is stored in the min heap based priority queue and other half in max heap based priority queue.

 

The steps are as follows :

Class MedianFinder

  1. Public members:
    • Create a max heap ‘maxHeap’.
    • Create a min heap‘minHeap’.

Function addNum(int ‘num’)

  • If maxHeap is empty or num <= top element of maxHeap
    • Push num to maxHeap.
  • else
    • Push num to minHeap.
  • If size of maxHeap  > size of minHeap+1.
    • Push the top element of maxHeap to minHeap.
    • Pop top element of maxHeap.
  • Else if size of maxHeap < minHeap.
    • Push the top element of minHeap to maxHeap.
    • Pop top element of minHeap.

Function findMedian()

  • If size of maxHeap  == size of minHeap.
    • Return mean of top element of both minHeap and maxHeap.
  • Else
    • Return the top element of maxHeap.