Introduction
We generally use hypothesis testing to make statistical decisions from experimental data. A hypothesis is an assumption that we make about a population parameter, and a hypothesis test checks that assumption against a sample.
Hypothesis testing is an essential part of statistics. We use a hypothesis test to evaluate which of the two mutually exclusive statements about a population is best supported by the sample data.
Basics of Hypothesis Testing
Null Hypothesis
Statistical hypothesis tests are based on a null hypothesis, a statement that assumes no relationship or association between the variables being tested. The null hypothesis is a baseline assumption based on domain knowledge. For example, the average age of students in a class is eighteen.
Alternative hypothesis
Hypothesis testing aims to determine whether the sample data provide enough evidence against the null hypothesis. If the data are consistent with the null hypothesis, we fail to reject it, which is not the same as proving it true. Otherwise, if the observed data would be unlikely under the null hypothesis, we reject the null in favor of the alternative hypothesis.
The alternative hypothesis is the statement that contradicts the null hypothesis. It generally claims that the observations result from a real effect. In the example above, the alternative hypothesis is that the average age of the students is not eighteen.
Level of Significance
Once we have the null and alternative hypotheses in hand, we choose a significance level. It is a probability threshold that determines when we reject the null hypothesis. Because we work with a sample, we can never be 100% certain when rejecting or retaining a hypothesis. Therefore, we select a level of significance, usually 5%, which means we accept a 5% chance of rejecting the null hypothesis when it is actually true.
P-value
After carrying out a test, we compute the p-value: the probability of seeing a result as extreme as, or more extreme than, the one observed, assuming the null hypothesis is true.
What counts as "extreme" depends on how we test the hypothesis, for example whether the test is one-sided or two-sided.
If the P-value is less than the chosen significance level, we reject the null hypothesis, i.e., accept that our sample gives reasonable evidence to support the alternative hypothesis.
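As a minimal sketch of how a p-value arises, the snippet below computes a one-sample t-statistic by hand and converts it to a two-sided p-value using the t distribution. The `sample` values are made up for illustration, and the hypothesized mean of 18 is borrowed from the age example above.

```python
import numpy as np
from scipy import stats

# Hypothetical ages of 8 students (illustrative data, not from a real study)
sample = np.array([18, 19, 17, 20, 18, 21, 19, 18])
mu0 = 18  # mean claimed by the null hypothesis

n = sample.size
# t-statistic: distance of the sample mean from mu0 in standard-error units
t_stat = (sample.mean() - mu0) / (sample.std(ddof=1) / np.sqrt(n))
# Two-sided p-value: probability of |T| >= |t_stat| when H0 is true
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)

# Cross-check against SciPy's built-in one-sample t-test
t_ref, p_ref = stats.ttest_1samp(sample, popmean=mu0)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```

The hand-computed values should agree with what `ttest_1samp` returns, which is a quick way to confirm what the library function is doing.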
Type I error
A Type I error occurs when we reject the null hypothesis even though it is true. The probability of a Type I error is denoted by alpha and equals the significance level. The part of the distribution curve that marks the critical (rejection) region is called the alpha region in hypothesis testing.
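One way to see what alpha means is to simulate it. The sketch below, under assumed settings (a normal population with mean 18 and standard deviation 2, samples of size 30), repeatedly tests a null hypothesis that is true by construction and counts how often it is wrongly rejected.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05
n_trials = 5000

# Draw every sample from a population where H0 (mean = 18) is actually true
samples = rng.normal(loc=18, scale=2, size=(n_trials, 30))
# One t-test per row; axis=1 tests each simulated sample separately
_, p_values = stats.ttest_1samp(samples, popmean=18, axis=1)

# Any rejection here is a Type I error, because H0 holds by construction
type_1_rate = np.mean(p_values < alpha)
print(f"Observed Type I error rate: {type_1_rate:.3f}")
```

The observed rate should land close to the chosen alpha of 0.05, up to simulation noise.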
Type II error
A Type II error occurs when we fail to reject the null hypothesis even though it is false. The probability of a Type II error is denoted by beta. In hypothesis testing, the part of the curve that marks the acceptance region is called the beta region.
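Beta can be estimated in the same simulated way. In this sketch the null hypothesis claims a mean of 18 while the data are drawn from a population whose true mean is 19. All of these numbers are assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha = 0.05
n_trials = 5000

# H0 claims a mean of 18, but the data really come from a population with mean 19
samples = rng.normal(loc=19, scale=2, size=(n_trials, 30))
_, p_values = stats.ttest_1samp(samples, popmean=18, axis=1)

# Failing to reject here is a Type II error, because H0 is false by construction
type_2_rate = np.mean(p_values >= alpha)
power = 1 - type_2_rate
print(f"Estimated beta: {type_2_rate:.3f}, power: {power:.3f}")
```

Increasing the sample size or the gap between the true and hypothesized means should shrink the estimated beta.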
Now let us see some of the widely used hypothesis tests:
T-test
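We use the t-test to compare means: either the means of two samples, or the mean of one sample against a known value. The t-test assumes that the data are approximately normally distributed. We use it when the population parameters (in particular the standard deviation) are unknown and must be estimated from the sample.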
There are three versions of the t-test:
Independent samples or a Two-sample T-test that compares the mean for two groups.
Paired sample t-test compares means of the same group at different intervals.
One sample t-test tests the mean of one sample against a known mean.
The test statistic is called the t-statistic. For two independent samples with equal variances, it is calculated as:
t = (x1 - x2) / (s_p * √(1/n1 + 1/n2))
Where:
x1 = mean of sample 1
x2 = mean of sample 2
n1 = size of sample 1
n2 = size of sample 2
s_p = pooled standard deviation of the two samples
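The following is a minimal sketch, assuming the pooled (equal-variance) form of the two-sample t-statistic, in which the difference of the sample means is divided by the pooled standard deviation times √(1/n1 + 1/n2). The two samples are hypothetical, and the result is compared with SciPy's `ttest_ind`.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements for two independent groups
sample1 = np.array([5.1, 4.9, 5.6, 5.8, 6.0, 5.2])
sample2 = np.array([4.4, 4.8, 5.0, 4.6, 5.1, 4.7])
n1, n2 = sample1.size, sample2.size

# Pooled standard deviation, assuming both groups share the same variance
s_pooled = np.sqrt(
    ((n1 - 1) * sample1.var(ddof=1) + (n2 - 1) * sample2.var(ddof=1))
    / (n1 + n2 - 2)
)
t_manual = (sample1.mean() - sample2.mean()) / (s_pooled * np.sqrt(1 / n1 + 1 / n2))

# SciPy's independent-samples t-test (equal_var=True is the pooled version)
t_scipy, p_scipy = stats.ttest_ind(sample1, sample2, equal_var=True)
print(f"manual t = {t_manual:.3f}, scipy t = {t_scipy:.3f}, p = {p_scipy:.4f}")
```

If equal variances cannot be assumed, passing `equal_var=False` switches `ttest_ind` to Welch's t-test, which uses a different denominator.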
Implementation
One sample t-test
Hypothesis:
H0 (Null Hypothesis): The population mean height of the students is 170, i.e., μ = 170.
H1 (Alternative Hypothesis): The population mean height is less than 170, i.e., μ < 170.
Importing libraries
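from scipy.stats import ttest_1samp
import numpy as np

# Hypothetical sample of student heights in cm (the original data is not shown)
heights = np.array([172, 168, 171, 169, 174, 170, 173, 167, 175, 171])
# One-sided test of H0: mu = 170 against H1: mu < 170
tstat, pval = ttest_1samp(heights, popmean=170, alternative="less")

if pval < 0.05:
    print("Null hypothesis rejected!")
else:
    print("Null hypothesis accepted!")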
Output
Null hypothesis accepted!
ANOVA Test
The t-test works well when dealing with two samples, but sometimes we want to compare more than two groups simultaneously. In that case, we have to compare the means of all the samples. We could carry out a separate t-test for each pair of samples, but conducting many tests increases the chance of a false positive.
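To make the false-positive problem concrete, the sketch below computes the chance of at least one false positive when every pair of groups gets its own t-test at a 5% level. It relies on the simplifying assumption that the tests are independent, and `familywise_error_rate` is a helper name introduced here for illustration.

```python
from math import comb

alpha = 0.05  # significance level of each individual t-test


def familywise_error_rate(n_groups, alpha=alpha):
    """Chance of at least one false positive across all pairwise tests,
    under the simplifying assumption that the tests are independent."""
    n_tests = comb(n_groups, 2)  # number of distinct pairs of groups
    return 1 - (1 - alpha) ** n_tests


for k in (3, 4, 5):
    print(k, "groups:", round(familywise_error_rate(k), 3))
```

With only three groups there are already three pairwise tests, and the combined false-positive risk rises from 5% to roughly 14%.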
ANOVA, also known as analysis of variance, compares multiple (three or more) samples with a single test. There are two primary flavors of ANOVA:
One-way ANOVA compares the means of three or more samples defined by a single independent variable.
Two-way ANOVA tests the effect of two independent variables on the outcome, and can also test whether the two variables interact.
The hypotheses that we test in ANOVA are:
Null Hypothesis: All samples have the same mean.
Alternate Hypothesis: At least one sample mean differs significantly from the others.
Code
Importing Libraries
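from scipy.stats import f_oneway
import numpy as np

# Hypothetical measurements for three groups (the original data is not shown)
group1 = np.array([20, 22, 19, 24, 21])
group2 = np.array([21, 23, 20, 22, 22])
group3 = np.array([19, 21, 23, 20, 22])
# One-way ANOVA: H0 is that all three group means are equal
stat, p = f_oneway(group1, group2, group3)

if p > 0.05:
    print('Probably the same distribution')
else:
    print('Probably different distributions')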
Output
Probably the same distribution
Chi-square Test
The test is applied when we have two categorical variables from a single population. We use the chi-square test to determine whether there is a significant association between the two variables.
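As an illustration, the sketch below runs a chi-square test of independence with SciPy's `chi2_contingency`. The contingency table is hypothetical: the rows stand for two groups and the columns for three categories of a second variable.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts: rows = two groups, columns = three preference categories
observed = np.array([[30, 10, 20],
                     [20, 25, 15]])

# Returns the test statistic, p-value, degrees of freedom and expected counts
chi2, p, dof, expected = chi2_contingency(observed)

print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p:.4f}")
if p < 0.05:
    print("Reject H0: the two variables appear to be associated")
else:
    print("Fail to reject H0: no evidence of an association")
```

The `expected` array holds the counts we would expect if the two variables were independent. The test statistic measures how far the observed counts deviate from them.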
Z-Test
In a z-test, the sample is assumed to follow a normal distribution. We calculate the z-score using population parameters such as the mean and standard deviation, so the test applies when the population standard deviation is known or the sample is large. It is used to check the hypothesis that the sample was drawn from the stated population.
Null Hypothesis: The sample mean is the same as the population mean.
Alternate Hypothesis: The sample mean is not the same as the population mean.
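When the population standard deviation is known, the z-score can be computed directly. The sketch below does this for a hypothetical sample, with an assumed population mean of 100 and standard deviation of 15, and reads the two-sided p-value off the standard normal distribution.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical setup: population mean and standard deviation are assumed known
pop_mean, pop_std = 100, 15
sample = np.array([108, 95, 112, 101, 99, 117, 104, 96, 110, 103])

# z-score: how many standard errors the sample mean lies from pop_mean
z = (sample.mean() - pop_mean) / (pop_std / np.sqrt(sample.size))
# Two-sided p-value from the standard normal distribution
p_value = 2 * norm.sf(abs(z))
print(f"z = {z:.3f}, p = {p_value:.4f}")
```

Unlike the t-test, nothing here is estimated from the sample except its mean, which is why the normal distribution is used instead of the t distribution.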
One sample Z-test
Importing Libraries
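import pandas as pd
from statsmodels.stats import weightstats as stests

# Hypothetical sample of heights in cm (the original data is not shown)
heights = pd.Series([172, 168, 171, 169, 174, 170, 173, 167, 175, 171])
# Two-sided z-test of H0: mean = 170 (an assumed value for illustration)
ztest_score, pval = stests.ztest(heights, value=170)

if pval < 0.05:
    print("Null hypothesis rejected!")
else:
    print("Null hypothesis accepted!")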
Output
Null hypothesis accepted!
FAQs
Can we change our hypothesis?
It is not good practice to change hypotheses. We form the hypothesis before collecting data and should not change it afterward, because adjusting it to fit the data invalidates the test.
What is the primary goal of hypothesis testing?
Hypothesis testing checks whether the sample data provide enough evidence to reject the null hypothesis. If we reject the null hypothesis, the data support the research (alternative) hypothesis.
What does a p-value of 0.05 mean?
A p-value of 0.05 means that, if the null hypothesis were true, there would be a 5% chance of observing a result at least as extreme as the one obtained. A statistically significant result (P ≤ 0.05) means the data are unlikely under the null hypothesis, so we reject it. A P value greater than 0.05 does not prove that the null hypothesis is true or that there is no effect. It only means the data do not provide enough evidence to reject it.
Key Takeaways
Let us recap the article.
First, we saw what hypothesis testing is. Next, we covered the basics that are needed before testing a hypothesis. Lastly, we saw different types of hypothesis tests and the circumstances under which they are used.
I hope you all like this article. Happy Learning Ninjas!