Statistics From A Large Sample

Moderate
0/80
Average time to solve is 25m
profile
Contributed by
11 upvotes
Asked in companies
AmazonHike

Problem statement

You have been given a sample of integers in the range [0, 255]. Since the sample is quite large, you are provided with an array/list “count” whose i-th element denotes the number of times ‘i’ appears in the sample.

You are supposed to calculate the following statistics :
1. minimum: The minimum element in the sample.

2. maximum: The maximum element in the sample.

3. mean: The average of the sample, calculated as the total sum of all elements divided by the total number of elements.

4. median:
If the sample has an odd number of elements, then the median is the middle element once the sample is sorted.
If the sample has an even number of elements, then the median is the average of the two middle elements once the sample is sorted.

5. mode: The number that appears the most in the sample. It is guaranteed to be unique.
Detailed explanation ( Input/output format, Notes, Images )
Input Format :
The first line contains an integer ‘T’ denoting the number of test cases. Then each test case follows.

The first input line of each test case contains 256 space-separated integers where i-th integer denotes the frequency of i-th integer in the sample.
Output Format :
For each test case, print 5 space-separated values denoting minimum, maximum, mean, mode and median respectively. The values must be correct up to 5 decimal places.

Print the output of each test case in a separate line.
Note :
You are not required to print the expected output; it has already been taken care of. Just implement the function.
Constraints :
1 <= T <= 50
0 <= count[i] <= 1000    
Where count[i] is the i-th element of the “count” array/list.

Time limit: 1 sec
Sample Input 1 :
2
1 0 1 1 1 1 1 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
1 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
Sample Output 1 :
0.00000 8.00000 4.66667 5.00000 7.00000
0.00000 2.00000 1.00000 1.00000 1.00000
Explanation of Sample Input 1 :
For the first test case, the elements in the sample are [0, 2, 3, 4, 5, 6, 7, 7, 8]. The minimum and maximum element is 0 and 8 respectively. Mean is (0 + 2 + 3 + 4 + 5 + 6 + 7 + 7 + 8) / 9 = 4.66667. Since the sample size is odd, median is the middle element i.e. 5. The mode is 7 as it appears the most in the sample.

For the second test case, the elements in the sample are [0, 1, 1, 2]. The minimum and maximum element is 0 and 2 respectively. Mean is (0 + 1 + 1 + 2) / 4 = 1.00000. Since the sample size is even, the median is the average of the two middle elements i.e. (1 + 1) / 2 = 1. The mode is 1 as it appears the most in the sample.
Hint

Can you think of iterating through the given “count” array/list?

Approaches (1)
Brute Force Approach

The basic idea of this approach is to iterate through the sample twice. In the first iteration, we will get the minimum, maximum, mean and mode. In the second iteration, we’ll find the median of the sample.

 

Algorithm

 

  • Create 5 variables “minimum”, “maximum”, “mean”, “median” and “mode”. Also, create two variables “sum” and “total” to store the sum and count of elements in the sample. Also, create a variable called “modeCount” and initialize it to zero to store the count of the mode.
  • Start iterating through the given array/list using the variable ‘i’.
    • If frequency of the current element is non-zero then,
      • Update the “minimum” as minimum = min(minimum, count[i]).
      • Update the “maximum” as maximum = max(maximum, count[i]).
      • Update the “sum” as sum = sum + i * count[i].
      • Update the “total” as total = total + count[i].
      • Now, if the “modeCount” is less than the frequency of the current element, update the “mode” as mode = i and modeCount = count[i].
  • Update “mean” as “mean” = “sum” / “total”.
  • Create a variable called “cur” to store the count of elements.
  • Now, start iterating the given array/list to find the median using the variable ‘i’.
    • We will perform the following steps if the frequency of the current element is non-zero.
    • If “cur”  is less than “count” / 2 then we’ll continue to loop. Because we are trying to find the (count/2)-th and ((count+1)/2) -th element.
    • If “total” is odd then the current element will be the median. So “median” = i and break the loop.
    • Otherwise, we’ll have to take the average two middle elements.
    • If “median” is zero which means the current element is the first middle element.
      • If 2 * cur > total which means both middle elements is the current element. So “median” = i and break the loop.
      • Otherwise, the current element is the first middle element. So “median”  = i
    • Otherwise, the current element will be the second middle element. So “median” = “median” + i. Now, take the average so “median” = “median” / 2 and break the loop.
  • Return the “minimum”, “maximum”, “mean”, “median” and “mode”.
Time Complexity

O(1)

 

Since we are iterating through the array/list twice which has a length of 256. So, the overall time complexity will be O(1).

Space Complexity

O(1)

 

We are not using any extra space.

Code Solution
(100% EXP penalty)
Statistics From A Large Sample
Full screen
Console