Find Median from Data Stream

Moderate
0/80
Average time to solve is 20m
profile
Contributed by
2 upvotes
Asked in company
Amazon

Problem statement

The median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value and the median is the mean of the two middle values.

Example:
Median for ‘arr’ = [1,2,3,4,5] is 3.
Median for ‘arr’ = [1,2,3,4] is (2+3)/2 = 2.5.

Implement the MedianFinder class:

MedianFinder() initialises the MedianFinder object.

1. Void addNum(int ‘num’) adds the integer ‘num’ from the datastream to the data structure.

2. Double findMedian() returns the median of all elements so far.

Note : Answers within 10^-5 of actual answer will be accepted.

Example:
Input: 
5
1 1
1 2
2
1 5
2

Output:     
1.5
2

Explanation:
MedianFinder() initialises the MedianFinder object.
Add 1 to the data structure ‘arr’, so arr = [1].
Add 2 to arr, so arr = [1,2]
Find Median of current arr, that is (1+2)/2 = 1.5.
Add 5 to arr, so arr = [1,2,5]
Find Median of current arr, that is 2.0.
Detailed explanation ( Input/output format, Notes, Images )
Input Format :
The first line of the input contains an integer, ‘T’, denoting the number of test cases.

The First line of each test case contains an integer, ‘N’ denoting the number of calls to addNum and findMedian.
Output format :
Return an integer if the call is made to findMedian.
Constraints :
1  <= T <= 10
-100000 <= nums <= 100000
1 <= N <= 50000
There would be at least 1 element in the data structure before calling findMedian.
Time Limit: 1 sec
Sample Input 1 :
2
5
1 1
1 2
2
1 5
2 
4
1 1
2
1 5
2
Sample Output 1 :
1.5
2
1
3
Explanation Of Sample Input 1 :
Test 1:
MedianFinder() initialises the MedianFinder object.
Add 1 to the data structure ‘arr’, so arr = [1].
Add 2 to arr, so arr = [1,2]
Find Median of the current arr, that is (1+2)/2 = 1.5.
Add 5 to arr, so arr = [1,2,5]
Find Median of current arr, that is 2.0

Test 2:
MedianFinder() initialises the MedianFinder object.
Add 1 to the data structure ‘arr’, so arr = [1].
Find Median of current arr, that is 1.0.
Add 5 to arr, so arr = [1,5]
Find Median of current arr, that is (1+5)/2 = 3.0.
Sample Input 2 :
2
4
1 1
1 5
1 2
2
5
1 1
1 5
2
1 3
2
Sample Output 2 :
2.0
3
3
Hint

Use 2 heap data structure - min and max heap to store data.

Approaches (1)
Heap

The solution to the problem lies in storing the data in min and max heaps.

We can use priority queue as it is a heap based data structure. For adding data from datastream into the data structure such that the median would be either the top elements of the priority queue or the mean of top elements of both the priority queues. We always divide the data from between such that first half is stored in the min heap based priority queue and other half in max heap based priority queue.

 

The steps are as follows :

Class MedianFinder

  1. Public members:
    • Create a max heap ‘maxHeap’.
    • Create a min heap‘minHeap’.

Function addNum(int ‘num’)

  • If maxHeap is empty or num <= top element of maxHeap
    • Push num to maxHeap.
  • else
    • Push num to minHeap.
  • If size of maxHeap  > size of minHeap+1.
    • Push the top element of maxHeap to minHeap.
    • Pop top element of maxHeap.
  • Else if size of maxHeap < minHeap.
    • Push the top element of minHeap to maxHeap.
    • Pop top element of minHeap.

Function findMedian()

  • If size of maxHeap  == size of minHeap.
    • Return mean of top element of both minHeap and maxHeap.
  • Else
    • Return the top element of maxHeap.
Time Complexity

O( N x log(N) ), Where ‘N’ is the number of integers inserted from data stream. 

As for every addNum call, we are inserting data into the heap which takes log(N) per insertion.  

Hence the time complexity is O( N x log(N) ).

Space Complexity

O( N ), Where ‘N’ is the number of integers inserted from the data stream. 

We are maintaining 2 priority queues of N size.

Hence the space complexity is O( N ). 

Code Solution
(100% EXP penalty)
Find Median from Data Stream
Full screen
Console