Introduction
Have you ever wondered how your email account accurately segregates regular emails, important emails, and spam? It’s not a very complex trick, and we’ll learn the secret behind it: a supervised learning model called logistic regression. (It can be done with other machine learning algorithms too, but for the sake of this blog, we’ll stick to logistic regression.)
Logistic regression is employed in supervised learning tasks, more specifically for classification. We know the name throws some people off, but the “regression” in logistic regression is misleading: it is NOT a regression model. Logistic regression is a probabilistic classifier, meaning it makes use of the probabilities of events to make its predictions.
Methodology
Suppose we are given a task: we have a customer’s banking history and need to decide whether the customer can be sanctioned a loan. Basically, we need to find out whether, if given a loan, the customer will default on payment or not. We can use logistic regression for this purpose; it will be a binary classification between ‘Yes’ and ‘No’.
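Before we get into the underlying math, here’s roughly what this loan task could look like end to end. This is a minimal sketch using scikit-learn; the single credit-score feature and the labels below are made-up values, purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# One hypothetical input feature per customer (say, a normalised credit score)
X = np.array([[0.2], [0.4], [0.5], [0.7], [0.8], [0.9]])
y = np.array([1, 1, 1, 0, 0, 0])  # 1 = 'Yes' (will default), 0 = 'No'

model = LogisticRegression()
model.fit(X, y)

# Predicted probability of default for a new customer
print(model.predict_proba([[0.6]])[0][1])
```

Under the hood, fit() learns the weights of exactly the kind of model we’re about to derive.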
Logistic regression makes use of a sigmoid function. To see where its form comes from, let’s start with the straight line equation -
y = w0 + w1x
We know the sigmoid function has a range between 0 and 1, while the right-hand side above can take any real value. So instead of modelling y directly, let’s look at the odds, y / (1-y) -
y / (1-y) : 0 for y = 0 and ∞ as y → 1
But we require our quantity to range between -∞ and +∞, like the straight line does. For that, we’ll take the logarithm, so the new equation is:
log(y / (1-y)) = w0 + w1x
Upon simplifying (exponentiate both sides and solve for y), our final equation then becomes -
y = 1 / (1 + e^-(w0 + w1x))
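To make the final equation concrete, here’s a minimal sketch in Python with NumPy; the weight values and the input below are made-up numbers, purely for illustration:

```python
import numpy as np

def predict_probability(x, w0, w1):
    """Logistic regression prediction: y = 1 / (1 + e^-(w0 + w1*x))."""
    z = w0 + w1 * x                   # the linear part
    return 1.0 / (1.0 + np.exp(-z))   # squash into (0, 1) with the sigmoid

# Hypothetical weights and input, just for illustration
w0, w1 = -4.0, 1.5
print(predict_probability(2.0, w0, w1))  # ~0.27, a valid probability
```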
Here y = the predicted probability of belonging to the default class (the default class is 1, i.e. ‘Yes’)
w0 + w1x = the linear model within logistic regression.
Also, the function has the form of a sigmoid. The sigmoid function has a range between 0 and 1 and therefore forms an S-like curve.
The logistic function predicts the probability of an outcome, hence its value lies anywhere between 0 and 1, and that is where logistic regression gets its name. We choose a threshold value above which the final prediction is 1, and 0 otherwise.
Let’s talk about the linear equation w0 + w1x within the logistic function. Why do we need the logistic function in the first place if it stems from linear regression?
It’s because the output of the linear regression equation isn’t confined within a range, unlike logistic regression, and it would be a very difficult task to assign a threshold value for class membership to an unbounded output. Thus we feed the predicted value to a sigmoid function, which squashes it into the range between 0 and 1. Now, since the output behaves like a probability (no values outside 0 to 1), probabilistic classification becomes convenient.
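Here’s a quick sketch of that squashing effect in Python; the weights w0 = -4.0 and w1 = 1.5 are arbitrary numbers, just for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.linspace(-10, 10, 5)
z = -4.0 + 1.5 * x   # linear outputs: unbounded, from -19.0 up to 11.0
p = sigmoid(z)       # sigmoid outputs: all strictly between 0 and 1
print(z)  # [-19.  -11.5  -4.    3.5  11. ]
print(p)  # [~0.0  ~0.0  ~0.018  ~0.97  ~1.0]
```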
The linear part, w0 + w1x, represents a linear relationship between the input feature and the output.
Here x = input feature
w0 = bias term
w1 = weight associated with the input variable
Now suppose we take 0.5 as our threshold value. That means
A predicted value >0.5 from the logistic function would have the final prediction as 1 and,
A predicted value ≤0.5 from the logistic function would have the final prediction as 0.
This is also called the decision boundary.
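Here’s a minimal sketch of that thresholding step; the probability values are made-up sigmoid outputs:

```python
import numpy as np

probabilities = np.array([0.12, 0.47, 0.50, 0.51, 0.93])  # hypothetical sigmoid outputs
predictions = (probabilities > 0.5).astype(int)           # decision boundary at 0.5
print(predictions)  # [0 0 0 1 1]
```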
Plotting both functions makes clear what separates logistic regression from linear regression: the straight line runs off to ±∞, while the sigmoid curve stays within 0 and 1.
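Here’s a small sketch that draws that comparison, assuming matplotlib is available; the weights are arbitrary:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-10, 10, 200)
linear = 0.5 * x                       # linear regression: unbounded line
logistic = 1 / (1 + np.exp(-0.5 * x))  # logistic regression: S-shaped curve in (0, 1)

plt.plot(x, linear, label="linear: w0 + w1x")
plt.plot(x, logistic, label="logistic: sigmoid(w0 + w1x)")
plt.axhline(0.5, linestyle="--", label="threshold = 0.5")
plt.legend()
plt.show()
```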