Table of contents
1.
Introduction
2.
Variance
2.1.
Implementation using python
3.
Standard Deviation
3.1.
Implementation using python
4.
Comparison between Variance and Standard Deviation
5.
Application of Variance and Standard Deviation
6.
Frequently Asked Questions
7.
Conclusion
Last Updated: Mar 27, 2024
Easy

Variance and standard deviation

Introduction

Variance is the measure of the average of squared values of the dataset from each point to the mean. If the variance of any dataset is zero, it means all values are identical. Standard deviation is calculated simply as the square root of the variance. It is one of the popular measures of variability as it can return the original units of the data values. In this article, we will be covering both these terms variance and standard deviation and learn how to implement them using python. So let’s get started and dive more into the terms variance and standard deviation.

Also, see -  Locally Weighted Regression.

Variance

Variance is responsible for quantifying spread or dispersion. The characteristics of sample or production describe how much changeability is in it. 

High variance signifies that dataset values are far from their mean. Our dataset will be responsible for high levels of variability. On the other side, low variance signifies that the values are very close to their mean. Our dataset will be responsible for low levels of variability. The steps for calculating variance are:

  • First, calculate the mean(the average of the numbers).
  • For finding the squared difference, for every number subtract the result and then find its square.
  • Lastly, find the average between the squared differences.

 

The variance is expressed by the following mathematical expression:

In the expression given above, 2is the representation for variance, xistands for the individual values from the dataset.  depicts the mean or average value of the dataset and n is the total number of values in the dataset. Now, let’s go through an example for a better understanding. Let’s say you have 5 boxes and their heights are as follows: 120mm, 200mm, 150mm, 400mm, and 320mm.

Mean = (120+200+150+400+320)/5

          = 1190/5

          = 934

So, the mean is 934mm. Now, let’s calculate the difference between the heights. The difference are: -814, -734, -784, -534, -614. Now, we have reached to the final step. Let’s calculate the variance.

Variance = (-814)2+(-734)2+(-784)2+(-534)2+(-614)2/ 5

              = ( 662596 + 538756 + 614656 + 285156 + 376996) / 5

              = 495632

Hence, the variance is found to be 495632.

Standard deviation =495632

                               = 704.011

Implementation using python

 The average can be calculated using the numpy.average() in python. The syntax for the same looks like:

Syntax:

numpy.average(a, axis=None, weights=None, returned=false)

Parameters are:

a: a is the array of data values to be averaged,

axis: the value with which axes to be done with a,

weights: it signifies an array of weights value,

returned: the default value is set to false. If it is true, only a tuple will be returned.

# Python program for average of a list

# Importing the NumPy module
import numpy as np

# Taking a list of elements
list = [1, 2, 3, 4, 5, 6, 7, 8]

# Calculating average using average()
print(np.average(list))
You can also try this code with Online Python Compiler
Run Code

 

Output:

Now, for calculating the variance, take the same dataset as in average. First, calculate the deviations from their mean then find the average of their squared values.

In general, 

(1-4.5)2=12.25

(2-4.5)2=6.25

(3-4.5)2=2.25

(4-4.5)2=0.25

(5-4.5)2=0.25

(6-4.5)2=2.25

(7-4.5)2=6.25

(8-4.5)2=12.25

Variance = (12.25+6.25+2.25+0.25+0.25+2.25+6.25+12.25) / 8 = 5.25

# Python program for average of a list

# Importing the NumPy module
import numpy as np

# Taking a list of elements
list = [1, 2, 3, 4, 5, 6, 7, 8]

# Calculating variance using var()
print(np.var(list))
You can also try this code with Online Python Compiler
Run Code

 

Output:

Standard Deviation

The measure of variation or dispersion is termed as standard deviation. It is applicable for numeric values only. In other words, standard deviation is the square root of the variance and is calculated as 2. hence, for calculating standard deviation just find, =2.

The syntax for the same is:

numpy.std(a, axis=None, dtype=None, ddof=0, keepdims=<no value>)

Parameters used here are:

a: a is the array of data values to be averaged,

axis: the value with which axes to be done with a,

type: it denotes the computing variance type,

out: an alternate output array for keeping the result,

ddof: delta degrees of freedom,

keepdims: if the vague of keepdims is set to true, the axes produced will be in the left side of the result.

Implementation using python

# Importing the NumPy module
import numpy as np

# Taking a list of elements
mylist = [1, 2, 3, 4, 5, 6, 7, 8]

# Calculating standard deviation using std()
print(np.std(mylist))
You can also try this code with Online Python Compiler
Run Code

 

Output:

Comparison between Variance and Standard Deviation

Recommended Topic, Agents in Artificial Intelligence

Application of Variance and Standard Deviation

Variance plays an important role in data interpretation related to statistics. Its real-life usage lies in the polls. Generally, for any opinion-based results, data agencies cannot go to each and every person and take their vote. They set criteria based on the annual income of the population, ethnicity, educational level, salary, etc.

Standard deviation is preferred for data comparison. One of the important usages of standard deviation is weather forecasting. Let’s say if the data forecasted is yielding low results, then it means the sample size probably varies reliably and vice-versa.

See, More, Descriptive Statistics

Frequently Asked Questions

  1. Why variance is not considered a popular measure of variability?
    The reason behind this is that variance does not have the same units of measure as the original data. If the original data is in cm, then the unit of variance will be in cm2.
     
  2. Variance is good or bad for stock investors?
    Low variance tends to be good for risk-prone investments and high variance is good for less risk tolerance investments.
     
  3. Define high standard deviation?
    High standard deviation indicates that a lot of variances is present in the observed data circled around the mean. This also indicates that the data points are very spread out. 

Conclusion

This article covered the introduction to variance and standard deviation, how to calculate variance and standard deviation, and the implementations of variance and standard deviation in python. Keeping the theoretical knowledge at our fingertips helps us get about half the work done. To gain complete understanding, practice is a must. To achieve thorough knowledge on machine learning you may refer to our machine learning course.

Live masterclass