Code360 powered by Coding Ninjas X Naukri.com
Table of contents
1. Introduction
2. Mean
2.1. Arithmetic Mean
2.2. Geometric Mean
2.3. Harmonic Mean
3. Median
4. Mode
5. Implementation
6. Frequently Asked Questions
7. Key Takeaways
Last Updated: Mar 27, 2024

Numerical Data - Measure of Central Tendency

Author Mayank Goyal

Introduction

A measure of central tendency is a single value that describes a dataset by identifying its central position. Measures of central tendency are also called measures of central location, and they belong to the broader family of summary statistics.

 

There are three main measures of central tendency: mean, median, and mode. Each can be calculated with methods from the Python pandas library.

Mean

There are three common ways to calculate the mean of a dataset: the arithmetic mean, the geometric mean, and the harmonic mean.

Arithmetic Mean

It is the simplest of the three: the sum of the observations in the dataset divided by the number of observations.

 

Mathematically, for n observations x_1, x_2, ..., x_n,

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

 

Geometric mean

The geometric mean is the n-th root of the product of the n observations in a data series. Mathematically,

$$GM = \left(\prod_{i=1}^{n} x_i\right)^{1/n}$$

 

Harmonic Mean

The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of the observations in a data series. Because small values have large reciprocals, it gives higher weight to lower-magnitude observations, and it is generally preferred when that weighting is wanted.

 

Mathematically,

$$HM = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$
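The three means defined above can be compared side by side. The snippet below is a minimal sketch using only the Python standard library; the sample `values` is a hypothetical list chosen for easy arithmetic, and it assumes every observation is positive, which the geometric and harmonic means require.

```python
import math

# Hypothetical sample (not from the article); all values must be positive.
values = [1, 2, 4]
n = len(values)

# Arithmetic mean: sum of observations divided by their count.
arithmetic_mean = sum(values) / n

# Geometric mean: n-th root of the product of the observations.
geometric_mean = math.prod(values) ** (1 / n)

# Harmonic mean: n divided by the sum of the reciprocals.
harmonic_mean = n / sum(1 / x for x in values)

print("Arithmetic:", arithmetic_mean)
print("Geometric: ", geometric_mean)
print("Harmonic:  ", harmonic_mean)
```

For positive data the three values always satisfy arithmetic ≥ geometric ≥ harmonic, which is a quick sanity check on any implementation.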

 


Median

The median is the middle value of a data series sorted in ascending or descending order. For an odd number of observations n, the median is the ((n+1)/2)-th observation. For an even number of observations, it is the average of the (n/2)-th and ((n/2)+1)-th observations.
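The odd and even cases can be sketched in a few lines of Python. The function name `median` and the sample lists are illustrative choices, not part of the article's own code.

```python
def median(observations):
    """Return the median of a non-empty series of numbers."""
    ordered = sorted(observations)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty series is undefined")
    middle = n // 2
    if n % 2 == 1:
        # Odd count: the ((n+1)/2)-th observation, i.e. zero-based index n // 2.
        return ordered[middle]
    # Even count: average of the (n/2)-th and ((n/2)+1)-th observations.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([7, 1, 5]))     # odd count
print(median([7, 1, 5, 3]))  # even count
```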

 

When the data is given as a frequency distribution, we first calculate the cumulative frequency, as below:

Random variable (x)    Frequency of x    Cumulative frequency
1                      10                10
2                      12                22
3                      15                37
4                      16                53
5                      18                71
6                      23                94
7                      28                122
Total                  122

Here the total number of observations is N = 122, which is even, so the median is the average of the N/2 = 61st and the (N/2)+1 = 62nd observations. The cumulative frequency tells us where these fall. There are 53 observations that are ≤ 4, and the next 18 observations (the 54th through the 71st) all have the value 5. Since 53 + 8 = 61, the 61st observation is the 8th of those fives and the 62nd is the 9th. Both are 5, so the median is 5.
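The cumulative-frequency lookup can be reproduced programmatically. This is a minimal sketch using the values and frequencies from the table above; the helper name `kth_observation` is a hypothetical choice.

```python
from itertools import accumulate

# Values and frequencies taken from the table above.
values = [1, 2, 3, 4, 5, 6, 7]
frequencies = [10, 12, 15, 16, 18, 23, 28]

# Running totals give the cumulative frequency column.
cumulative = list(accumulate(frequencies))
total = cumulative[-1]


def kth_observation(k):
    """Return the value of the k-th observation (1-based) in sorted order."""
    for value, running_total in zip(values, cumulative):
        if k <= running_total:
            return value
    raise ValueError("k exceeds the number of observations")


if total % 2 == 1:
    median = kth_observation((total + 1) // 2)
else:
    # Even count: average the two middle observations.
    median = (kth_observation(total // 2) + kth_observation(total // 2 + 1)) / 2

print(cumulative)
print(total, median)
```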

 

When the observations are grouped into classes, first locate the median class, i.e., the class in which the cumulative frequency first reaches N/2, just as we did for the discrete data. The median can then be estimated with the formula below:

$$\text{Median} = L_m + \frac{\frac{N}{2} - CF}{F_m}\,(U_m - L_m)$$

Where,

Lm = lower limit of the median class

Um = upper limit of the median class

CF = cumulative frequency of the classes before the median class

Fm = frequency of the median class

N = total number of observations.

 

Mode

The mode is one of the most straightforward measures: it is the observation with the maximum frequency.

For example, in the series 2, 3, 4, 4, 4, 4, 5, 7, the mode is 4 because it has the maximum frequency: it occurs four times. Now suppose we have a continuous (grouped) distribution like the one below:

Age Range    Count
25-30        29
30-35        43
35-40        22
40-45        41

The age group 30-35 has the maximum count, 43, so it is the modal class. To estimate the mode, use the formula below:

$$\text{Mode} = L_m + \frac{F_{mo} - F_1}{2F_{mo} - F_1 - F_2} \times CL_{mo}$$

Where,

Lm = lower limit of the modal class

Fmo = frequency of the modal class

F1 = frequency of the class preceding the modal class

F2 = frequency of the class succeeding the modal class

CLmo = length (width) of the modal class.

 

Implementation

Importing Libraries

import numpy as np
import pandas as pd 
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')  # suppress library warnings in the output

 

Creating a dataframe

data = pd.DataFrame({
        "Data": [2,6,8,9,-2,1,-3,4,5,7,10,12,13]
})

 

Calculating Mean

Mean = data.mean()
Mean

 

Output

Data    5.538462

dtype: float64

 

Inserting Mean into Dataframe

data["Mean"] = [float(Mean["Data"]) for _ in range(len(data))]
data.plot()

 

 

Median

 

Reading the Dataset

df=pd.read_csv(r"C:\Users\goyal\Desktop\ml\jupyter\train.csv")

 

Calculating Median

Median=df['Fare'].median()
Median

 

Output

14.4542

 

Inserting Median into the dataframe

x=pd.DataFrame({"fare":df['Fare']})
x["Median"] = [float(Median) for _ in range(len(x))]
x.plot()

 

 

Mode

 

Creating Dataframe

data = pd.DataFrame({
        "Data": [2,6,8,9,-2,1,-3,4,5,7,10,12,13,2,2,2,3,4,5,64,3,4,5,6,3,8,2,2,1,1,-1,-1,-3,-4]
})

 

Plotting the Dataframe

y=sns.catplot(x="Data",kind="count",palette="ch:.95",data=data)

 

 

Calculating Mode

Mode = data.mode()
Mode
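A dataset can have more than one mode when several values tie for the highest frequency, and pandas' `mode()` can return more than one row in that case. The standard-library sketch below shows the same behaviour with `statistics.multimode`; the `series` list is the example from the Mode section, while `tied` is a hypothetical list added to show a tie.

```python
from statistics import multimode

# Series from the Mode section: 4 occurs four times.
series = [2, 3, 4, 4, 4, 4, 5, 7]
series_modes = multimode(series)

# Hypothetical series in which 1 and 2 both occur twice.
tied = [1, 1, 2, 2, 3]
tied_modes = multimode(tied)

print(series_modes)
print(tied_modes)
```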

 

Frequently Asked Questions

  1. What is the best measurement of central tendency?
    There is no single best measure of central tendency. The choice depends on the data at hand, and each measure has its own strengths and weaknesses for different kinds of data.
     
  2. When is the mean considered the best measure of central tendency?
    The mean is usually the best measure when the data distribution is continuous and symmetrical, as in a normal distribution. However, it all depends on what you are trying to show from your data.
     
  3. Which is greatest in a normally distributed data set: mode, median, or mean?
    None of them: the mean, median, and mode are all equal if the data set is normally distributed without any skewness.

Key Takeaways

Let us briefly recap the article.

First, we saw the different measures of central tendency that apply to numerical data, along with their basic implementation. Then, we saw which measures work best for different kinds of data.

Central tendencies are the foundation for all advanced statistical work and help interpret the data for better results.

I hope you liked the article. Stay tuned for more exciting articles.

Happy Learning Ninjas!
