Table of contents
1. Introduction
2. Properties of CDF
3. Implementation
   3.1. Using Histogram
   3.2. Sorting Data
4. FAQs
5. Key Takeaways
Last Updated: Mar 27, 2024

Cumulative Distribution Function

Author Mayank Goyal

Introduction

In probability theory and statistics, the cumulative distribution function (CDF) of a real-valued random variable X, or simply the distribution function of X, evaluated at x, is the probability that X takes a value less than or equal to x. The CDF is the companion of the probability density function (PDF): where the PDF describes how probability is spread over individual values, the CDF accumulates it.
A defining feature of CDFs is that they are monotonic. More specifically, they are monotonically non-decreasing: the accumulated probability F(x) can only stay the same or grow as x increases, never shrink.
For a continuous distribution of a scalar variable, the CDF gives the area under the probability density function from minus infinity to x.

The CDF of a real-valued X is given by

F_X(x) = P(X <= x)

The right side of this equation represents the probability that the random variable X takes a value less than or equal to x.

The probability that X lies in the interval (a, b], where a < b, is therefore

P(a < X <= b) = F_X(b) - F_X(a)
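
To make the interval formula concrete, here is a minimal sketch that assumes X follows a standard normal distribution (a choice made only for illustration). The helper names normal_cdf and interval_probability are hypothetical; the closed form of the normal CDF uses the error function from Python's math module.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution, written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def interval_probability(a, b, cdf=normal_cdf):
    """P(a < X <= b) = F(b) - F(a), valid for a < b."""
    if not a < b:
        raise ValueError("expected a < b")
    return cdf(b) - cdf(a)

# Probability that a standard normal value falls within one
# standard deviation of the mean.
p = interval_probability(-1.0, 1.0)
print(round(p, 4))  # about 0.6827
```

Any other CDF can be passed through the cdf argument, since the subtraction F(b) - F(a) does not depend on the distribution.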

In simpler terms, we check the probability of X being less than or equal to x. The graph of a CDF never goes down: it stays flat over ranges of x that carry no probability and rises everywhere else. That is why a CDF is monotonic non-decreasing, and why its plot climbs from 0 on the left towards 1 on the right. CDFs are also always right-continuous: at every point x, F(x) equals the limit of the function as x is approached from the right, so at a jump the function takes the upper value.

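The CDF of a continuous random variable X can be expressed as the integral of its probability density function f_X as follows:

F_X(x) = \int_{-\infty}^{x} f_X(t) \, dt

In the case of a random variable X that has a discrete distribution, the CDF is the sum of the probabilities of all values x_i that do not exceed x:

F_X(x) = \sum_{x_i \le x} P(X = x_i)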
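
As a concrete discrete case, take a fair six-sided die (an example chosen here for illustration, not taken from the article). Its CDF at x is the running sum of the probabilities of all faces not exceeding x. The names faces, pmf and die_cdf below are hypothetical, and exact fractions are used so the sums carry no rounding error.

```python
from fractions import Fraction
from itertools import accumulate

faces = [1, 2, 3, 4, 5, 6]
pmf = [Fraction(1, 6)] * 6          # P(X = face) for a fair die

# Running totals of the PMF: the CDF evaluated at each face.
cdf_values = list(accumulate(pmf))

def die_cdf(x):
    """F(x) = sum of P(X = x_i) over all faces x_i <= x."""
    return sum((p for face, p in zip(faces, pmf) if face <= x), Fraction(0))

print(cdf_values[-1])  # 1
print(die_cdf(3))      # 1/2
print(die_cdf(3.5))    # 1/2, the CDF is flat between faces
print(die_cdf(0))      # 0
```

The flat stretch between 3 and 4 and the jump at each face are what give a discrete CDF its staircase shape.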

 

 

Properties of CDF

  • Every cumulative distribution function F(x) is non-decreasing.
  • F(x) approaches 0 as x tends to minus infinity and 1 as x tends to plus infinity. If X has a largest possible value x_max, then F(x) = 1 for all x >= x_max.
  • The CDF ranges from 0 to 1.
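
These properties can be spot-checked numerically. The sketch below assumes a standard normal CDF as the test subject and evaluates it on a grid from -6 to 6; the function and variable names are illustrative only, and a grid check is a sanity test rather than a proof.

```python
import math

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

grid = [i / 10.0 for i in range(-60, 61)]   # -6.0, -5.9, ..., 6.0
values = [normal_cdf(x) for x in grid]

# Property 1: each value is at least as large as the previous one.
is_non_decreasing = all(lo <= hi for lo, hi in zip(values, values[1:]))

# Property 3: every value lies between 0 and 1.
stays_in_unit_interval = all(0.0 <= v <= 1.0 for v in values)

print(is_non_decreasing, stays_in_unit_interval)

# Property 2: close to 0 at the far left, close to 1 at the far right.
print(values[0], values[-1])
```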

Implementation

Using Histogram

Import modules

import numpy as np
import matplotlib.pyplot as plt
# Jupyter-only magic that renders plots inline; omit it in a plain Python script
%matplotlib inline

Generating Sample data points

N = 1000
# initializing random values
data = np.random.randn(N)

 

# getting data of the histogram
count, bins_count = np.histogram(data, bins=20)

Finding PDF and CDF using histogram data.

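# Normalize the counts so they sum to 1. This is the probability mass
# in each bin, a discrete stand-in for the PDF (not a true density).
pdf = count / count.sum()
# np.cumsum gives the running total, i.e. the CDF at each bin's right edge
cdf = np.cumsum(pdf)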

Plotting Histogram

plt.hist(data, bins=20, density=True)

Output

Plotting CDF and PDF

plt.plot(bins_count[1:], pdf, color="red", label="PDF")
plt.plot(bins_count[1:], cdf, label="CDF")
plt.legend()

Output
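
Note that pdf in the snippet above holds the probability mass per bin, while plt.hist(..., density=True) draws a density (mass divided by bin width), so the two are on different scales. A minimal sketch of the density-based variant follows; the seeded generator and the names density, edges, widths and mass are assumptions added for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)           # seeded so the sketch is repeatable
data = rng.standard_normal(1000)

# density=True divides each count by (total count * bin width)
density, edges = np.histogram(data, bins=20, density=True)
widths = np.diff(edges)

mass = density * widths                  # probability mass per bin
cdf = np.cumsum(mass)                    # running total at each right edge

print(np.isclose(mass.sum(), 1.0))       # True: the masses sum to one
print(np.isclose(cdf[-1], 1.0))          # True: the CDF ends at one
```

Plotting density against edges[1:] would then line up with the bars drawn by plt.hist, while cdf is unchanged by the choice of scale.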

 

Sorting Data

Importing Libraries

import numpy as np
import matplotlib.pyplot as plt
# Jupyter-only magic that renders plots inline; omit it in a plain Python script
%matplotlib inline

Data Sample Generation

N = 10000
# normal distribution
data = np.random.randn(N)

Sorting the Data

X = np.sort(data)

Calculating the CDF value

# the i-th smallest value (counting from 1) has i of the N points at or below it
Y = np.arange(1, N + 1) / N

Plotting CDF

plt.xlabel('x')
plt.ylabel('F(x)')

plt.title('CDF using sorting the data')

# X and Y are the sorted data and CDF values computed above
plt.plot(X, Y, marker='o')

Output
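
The sorting approach can also be wrapped into a reusable function that evaluates the empirical CDF at any point, not just at the observed values. This is a sketch under the assumption that NumPy's searchsorted is used for the counting step; make_ecdf and ecdf are hypothetical names.

```python
import numpy as np

def make_ecdf(sample):
    """Return a function F(x) = (number of observations <= x) / n."""
    sorted_sample = np.sort(np.asarray(sample, dtype=float))
    n = sorted_sample.size

    def ecdf(x):
        # side="right" counts the observations that are <= x
        return np.searchsorted(sorted_sample, x, side="right") / n

    return ecdf

ecdf = make_ecdf([3.0, 1.0, 2.0, 2.0])
print(ecdf(0.5))   # 0.0
print(ecdf(2.0))   # 0.75
print(ecdf(10.0))  # 1.0
```

Because searchsorted accepts arrays as well as scalars, ecdf can be evaluated on a whole grid of x values in one call.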

FAQs

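  1. Is CDF always continuous?
    The CDF is continuous for a continuous random variable. For a discrete random variable it is a step function: it jumps at each value the variable can take, although it remains right-continuous.
     
  2. Why is PDF derivative of CDF?
    For a continuous random variable, the CDF is the integral of the PDF, so differentiating the CDF recovers the PDF. The PDF is therefore the slope of the CDF: its magnitude at x indicates how densely probability is concentrated around x, not the probability of measuring exactly x. Because the CDF never decreases, a PDF can never be negative.
     
  3. What is the significant difference between PDF and CDF?
    The CDF gives the probability that the random variable takes a value less than or equal to x. The PDF gives the probability density at x; probabilities come from integrating it over an interval, and for a continuous random variable the probability of any single exact value is zero. For a discrete random variable, the analogous function is the probability mass function, which does give the probability that X is exactly equal to x.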
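
The derivative relationship from question 2 can be checked numerically. The sketch below assumes a standard normal distribution, differentiates its CDF with a finite-difference approximation, and compares the result with the known PDF formula; the grid and variable names are illustrative.

```python
import math
import numpy as np

x = np.linspace(-4.0, 4.0, 801)    # grid with spacing 0.01

# Standard normal CDF and PDF evaluated on the grid.
cdf = np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in x])
pdf = np.exp(-x**2 / 2.0) / math.sqrt(2.0 * math.pi)

# Finite-difference slope of the CDF.
slope = np.gradient(cdf, x)

max_error = np.max(np.abs(slope - pdf))
print(max_error < 1e-3)            # True: the slope tracks the PDF closely
```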

Key Takeaways

Let us briefly recap the article.
Firstly, we saw the CDF and its monotonic nature. Further, we saw the properties of the CDF, and lastly, we saw two methods of implementing it. There is certainly more to explore about CDFs, but a broad understanding of the function is the essential starting point. This was a quick introduction to CDFs.

I hope you like this article.
Happy Learning Ninjas!
