Introduction
Probability and statistics form the backbone of many machine learning techniques. In this blog, we’ll go through one such aspect of statistics: random variables. We will quickly brush up on some fundamental terminology in statistics before we dive deeper into random variables.
Basic Terminologies in Statistics
Discrete variable - to put it very simply, a discrete variable takes distinct values that can be counted (often a finite number of them). For example,
The change in your pocket is countable; hence, it can be classified as a discrete variable.
The number of animals in a sanctuary is finite and can be classified as a discrete variable.
The amount of money in all the banks in the entire city is also finite. It might take a lot of time to count that number, but it’s still countable.
Continuous variable - Continuous variables can take any value within a specified range. The number of possible values is infinite. These are measured values instead of counted values. For example,
The weight of a student in a class where weights range between 50 and 70 kg.
The age of a person. It’s essential to understand that age isn’t a discrete variable since age can be as accurate as we want it to be.
Random variable - a variable whose value depends on the outcome of a random event. It is also referred to as a random quantity; formally, it is a measurable function that maps each outcome to a number.
A random variable is, by convention, denoted by a capital letter.
The probability distribution of a random variable Y describes all possible values of Y and the probabilities associated with them.
A random variable can be discrete as well as continuous in nature.
Discrete Random Variable
A discrete random variable is a random variable that is discrete in nature. That means the values that the random variable can take are distinct and countable.
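For example, let Y be a random variable representing the sum of two fair six-sided dice rolled simultaneously. The probability distribution of Y over all its possible values is:
Y    | 2    | 3    | 4    | 5    | 6    | 7    | 8    | 9    | 10   | 11   | 12
P(Y) | 1/36 | 2/36 | 3/36 | 4/36 | 5/36 | 6/36 | 5/36 | 4/36 | 3/36 | 2/36 | 1/36
Here, P(Y) is the probability associated with a given value of the random variable Y.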
A column chart can represent a discrete random variable probability distribution.
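The distribution of the sum of two dice can be checked by enumerating all outcomes. The snippet below is a minimal sketch that assumes two fair six-sided dice; the function name `two_dice_sum_distribution` is made up for illustration.

```python
from fractions import Fraction
from itertools import product


def two_dice_sum_distribution():
    """Return {y: P(Y = y)} for Y = sum of two fair six-sided dice (assumed fair)."""
    counts = {}
    # Enumerate all 36 equally likely (first die, second die) outcomes.
    for a, b in product(range(1, 7), repeat=2):
        counts[a + b] = counts.get(a + b, 0) + 1
    total = 36
    return {y: Fraction(c, total) for y, c in sorted(counts.items())}


dist = two_dice_sum_distribution()
for y, p in dist.items():
    print(y, p)
```

The probabilities are kept as exact fractions, so they can be compared directly and add up to exactly 1.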
Continuous Random Variable
As the name suggests, a continuous random variable is a random variable that is continuous in nature. That means the number of possible values the random variable can take is infinite and cannot be counted.
Unlike discrete random variables where a column chart could represent the probability distribution, the probability distribution for a continuous random variable is represented by a density curve.
The probability that X falls within a given range of values is the area under the density curve over that range.
The probability of a continuous random variable X being exactly equal to any single number is zero.
Probability Density Function
Probability density functions are used for describing probabilities of continuous random variables.
The pdf, denoted by ƒ, of a continuous random variable X satisfies the following conditions:
𝑓(𝑥)≥0, for all 𝑥∈ℝ
𝑓 is piecewise continuous
$\int_{-\infty}^{\infty} f(x)\,dx = 1$
$P(a \le X \le b) = \int_a^b f(x)\,dx$
The first three conditions are necessary for a function to be a valid pdf, while the fourth shows how to use the pdf to compute probabilities.
Example:
Let a random variable X represent the time, in minutes, that a person waits for an elevator. Suppose the maximum waiting time is 2 minutes, so the possible values lie in the interval [0,2]. A possible pdf is
$f(x) = \begin{cases} x, & 0 \le x \le 1 \\ 2 - x, & 1 < x \le 2 \\ 0, & \text{otherwise} \end{cases}$
The graph of ƒ is given below, and it can be noted that it satisfies the first three conditions.
From the graph, ƒ(x) ≥ 0 for every real number x.
ƒ(x) is piecewise continuous.
We can compute: $\int_{-\infty}^{\infty} f(x)\,dx = \int_0^2 f(x)\,dx = \int_0^1 x\,dx + \int_1^2 (2-x)\,dx = \frac{1}{2} + \frac{1}{2} = 1$
So, if we want to compute the probability of a person waiting less than 30 seconds (0.5 minutes) for the elevator, the expression is
$P(0 \le X \le 0.5) = \int_0^{0.5} x\,dx = 0.125$
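The two integrals in the elevator example can also be checked numerically. The sketch below assumes the triangular pdf implied by those integrals (x on [0, 1], 2 - x on (1, 2], zero elsewhere) and uses a simple midpoint rule; the helper names `f` and `integrate` are illustrative, not from any library.

```python
def f(x):
    """Assumed pdf of the waiting time X in minutes (triangular on [0, 2])."""
    if 0 <= x <= 1:
        return x
    if 1 < x <= 2:
        return 2 - x
    return 0.0


def integrate(func, a, b, n=100_000):
    """Approximate the integral of func over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))


total_area = integrate(f, 0, 2)      # should be close to 1 for a valid pdf
p_under_30s = integrate(f, 0, 0.5)   # P(0 <= X <= 0.5)
print(total_area, p_under_30s)
```

The first value checks the third condition (total area of 1), and the second applies the fourth condition to the 30-second question.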
Cumulative Distribution Function
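The cumulative distribution function (cdf) of a random variable X gives, for each value x, the probability that X is at most x: $F(x) = P(X \le x)$. The cdf applies to continuous and discrete random variables alike.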
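To illustrate that one definition, F(x) = P(X ≤ x), covers both cases, here is a small sketch. It reuses the two examples from this blog under stated assumptions: the sum of two fair dice for the discrete case, and the triangular waiting-time pdf (x on [0, 1], 2 - x on (1, 2]) for the continuous case. The function names are hypothetical.

```python
from fractions import Fraction

# Discrete case: pmf of Y = sum of two fair dice, P(Y = y) = (6 - |y - 7|) / 36.
pmf = {y: Fraction(6 - abs(y - 7), 36) for y in range(2, 13)}


def discrete_cdf(y):
    """P(Y <= y): add up the pmf over all values not exceeding y."""
    return sum(p for value, p in pmf.items() if value <= y)


def continuous_cdf(x):
    """P(X <= x) for the assumed triangular pdf, obtained by integrating it."""
    if x <= 0:
        return 0.0
    if x <= 1:
        return x * x / 2             # integral of t dt from 0 to x
    if x <= 2:
        return 1 - (2 - x) ** 2 / 2  # 1 minus the remaining area to the right of x
    return 1.0


print(discrete_cdf(7), continuous_cdf(0.5))
```

In the discrete case the cdf is a sum of probabilities, while in the continuous case it is an area under the density curve.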
Means and Variances of Random Variables
The mean of a discrete random variable X is the weighted average of its possible values, where the weight of each value $x_i$ is its associated probability $p_i$. It is given by the formula
$\mu_X = \sum_i x_i p_i$
The mean obtained is generally referred to as the expected value of X.
Law of large numbers - as the number of observations of X increases, the mean of the observed values, $\bar{x}$, approaches the mean of the population, µ.
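A quick simulation makes the law of large numbers concrete. The sketch below assumes a single fair six-sided die as the random variable; the function name `sample_mean_of_die` and the fixed seed are illustrative choices, not part of the original example.

```python
import random
from fractions import Fraction

# Population mean as a probability-weighted average of the possible values.
population_mean = sum(x * Fraction(1, 6) for x in range(1, 7))


def sample_mean_of_die(n, seed=0):
    """Mean of n simulated rolls of a fair die (seed fixed for repeatability)."""
    rng = random.Random(seed)
    return sum(rng.randint(1, 6) for _ in range(n)) / n


print("population mean:", float(population_mean))
for n in (10, 1_000, 100_000):
    print(n, sample_mean_of_die(n))
```

With more rolls, the printed sample means should tend to sit closer to the population mean, although any single run can still fluctuate.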
Rules for means
If X is a random variable and a and b are fixed numbers, then
$\mu_{a+bX} = a + b\mu_X$
If X and Y are two random variables, then
$\mu_{X+Y} = \mu_X + \mu_Y$
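Both rules for means can be verified exactly on a small example. This sketch assumes X and Y are two fair dice and picks a = 3 and b = 2 arbitrarily. The joint distribution is built assuming independence only for convenience, since the sum rule does not require it.

```python
from fractions import Fraction
from itertools import product


def mean(dist):
    """Weighted average of the values in dist = {value: probability}."""
    return sum(x * p for x, p in dist.items())


die = {x: Fraction(1, 6) for x in range(1, 7)}  # distribution of X (and of Y)

# Distribution of a + bX: same probabilities, transformed values.
a, b = 3, 2
linear = {a + b * x: p for x, p in die.items()}

# Distribution of X + Y, built from all pairs of outcomes.
sum_dist = {}
for (x, px), (y, py) in product(die.items(), repeat=2):
    sum_dist[x + y] = sum_dist.get(x + y, 0) + px * py

print(mean(linear), a + b * mean(die))
print(mean(sum_dist), mean(die) + mean(die))
```

Each printed pair should match, which is what the two rules state.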
Variance in Discrete Random Variable
Given that X is a discrete random variable with mean µ, its variance is given by
$\sigma_X^2 = \sum_i (x_i - \mu)^2 p_i$
Rules for Variances
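If X is a random variable and a and b are fixed numbers, then
$\sigma_{a+bX}^2 = b^2 \sigma_X^2$
If X and Y are independent random variables, then
$\sigma_{X+Y}^2 = \sigma_X^2 + \sigma_Y^2$ and $\sigma_{X-Y}^2 = \sigma_X^2 + \sigma_Y^2$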
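The variance of a discrete random variable, and the standard rules for the variance of a + bX and of sums or differences of independent variables, can be checked exactly on a toy case. The sketch below assumes X and Y are independent fair dice and uses a = 3, b = 2 as arbitrary constants; all helper names are illustrative.

```python
from fractions import Fraction
from itertools import product


def mean(dist):
    """Weighted average of the values in dist = {value: probability}."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Probability-weighted average of squared deviations from the mean."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


def combine(dist_x, dist_y, op):
    """Distribution of op(X, Y) for independent X and Y."""
    out = {}
    for (x, px), (y, py) in product(dist_x.items(), dist_y.items()):
        out[op(x, y)] = out.get(op(x, y), 0) + px * py
    return out


die = {x: Fraction(1, 6) for x in range(1, 7)}
a, b = 3, 2
linear = {a + b * x: p for x, p in die.items()}

var_x = variance(die)
var_linear = variance(linear)                               # expect b**2 * var_x
var_sum = variance(combine(die, die, lambda x, y: x + y))   # expect 2 * var_x
var_diff = variance(combine(die, die, lambda x, y: x - y))  # expect 2 * var_x
print(var_x, var_linear, var_sum, var_diff)
```

Note that the constant a does not affect the variance, and that subtracting an independent variable still adds its variance.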
Frequently Asked Questions
What are discrete and continuous variables?
Discrete variables take distinct, countable values. Example - the number of birds on a tree.
Continuous variables can take any value within a range, which means infinitely many possible values. Example - the weight of a student in a class.
What is a variance?
Variance can be defined as the expectation of the squared deviation of a random variable from its mean (its expected value).
What are random variables?
A random variable is a variable whose value is not known in advance and depends on the outcome of a random event.
Conclusion
This blog covered the types of variables and explained and contrasted discrete and continuous random variables. Readers are advised to work through each section carefully, since random variables are one of the fundamental ideas in data science.
We hope this blog has helped you enhance your knowledge of random variables. To learn more about Data Science and Machine Learning concepts, refer to our data science and machine learning articles.