Regression is a subfield of supervised machine learning that aims to establish a relationship between quantitative variables by fitting mathematical equations to data. Typical uses include:

• Predicting the price of a house
• Forecasting the weather
• Estimating the time a vehicle takes to go from one point to another
• Estimating product sales


Regression Techniques in Machine Learning

There are several regression techniques in machine learning, including:

• Linear Regression
• Logistic Regression
• Ridge Regression
• LASSO Regression
• Polynomial Regression

Linear Regression

Linear regression is one of the oldest, simplest, and most widely used supervised machine learning algorithms for predictive analytics. It is a statistical technique used to predict or estimate a quantitative variable based on another quantitative variable.

There are two types of Linear Regression –

• Simple Linear Regression
• Multiple Linear Regression


Simple Linear Regression

Simple linear regression consists of generating a regression model (the equation of a line) that explains the linear relationship between two variables. The dependent variable, or response, is denoted Y, and the predictor, or independent variable, is denoted X.

The simple linear regression model is described according to the equation:

Y = β0 + β1X + ϵ

Here β0 is the intercept (the value of Y when X is zero), β1 is the slope, and ϵ is the random error.

The random error represents the difference between the value fitted by the line and the observed value. It collects the effect of all the variables that influence Y but are not included in the model as predictors. Its estimate from a sample is known as the residual.

In the vast majority of cases, the population values of β0 and β1 are unknown, so they are estimated from a sample. The estimates are written β̂0 and β̂1.

These estimates are known as regression coefficients or least squares coefficient estimates, since they take the values that minimize the residual sum of squares, giving the line that passes closest to all the points. (There are alternatives to the method of least squares for estimating the coefficients.)
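As a rough illustration of least squares estimation, the sketch below computes the intercept and slope for a tiny made-up sample using the standard closed-form expressions. The data values and the function name `fit_simple_ols` are assumptions chosen for the example, not part of any library.

```python
# Hypothetical sample (made-up numbers): x is the predictor, y the response.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.9, 5.2, 6.8, 9.1, 11.0]


def fit_simple_ols(x, y):
    """Return the least squares estimates (intercept, slope) for y = b0 + b1*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: sum of cross-deviations divided by the sum of squared x deviations.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


b0_hat, b1_hat = fit_simple_ols(x, y)

# Residuals: observed value minus the value fitted by the line.
residuals = [yi - (b0_hat + b1_hat * xi) for xi, yi in zip(x, y)]
rss = sum(r ** 2 for r in residuals)  # the quantity least squares minimizes

print("intercept:", b0_hat)
print("slope:", b1_hat)
print("residual sum of squares:", rss)
```

Any other choice of intercept and slope would give a larger residual sum of squares on this sample, which is what "least squares" refers to.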


Multiple Linear Regression

Multiple linear regression generates a linear model in which the value of the dependent variable or response (Y) is determined from a set of independent variables called predictors (X1, X2, X3…). It is an extension of simple linear regression, so understanding the latter is essential.

Multiple regression models can be used to predict the value of the dependent variable or to evaluate the influence that the predictors have on it (the latter must be analyzed with caution, so as not to mistake association for cause and effect).

Multiple linear models have the following form:

Yi = (β0 + β1X1i + β2X2i + ⋯ + βnXni) + ei

β0: the intercept, the value of the dependent variable Y when all predictors are zero.

βj (j = 1, …, n): the average effect on the dependent variable Y of a one-unit increase in the predictor variable Xj, keeping the rest of the variables constant. These are known as partial regression coefficients.

ei: the residual or error, the difference between the observed value and the one estimated by the model.

It is important to bear in mind that the magnitude of each partial regression coefficient depends on the units in which the predictor variable to which it corresponds is measured, so its magnitude is not associated with the importance of each predictor.

To determine the impact each variable has on the model, standardized partial coefficients are used. They are obtained by standardizing the predictor variables (subtracting the mean and dividing by the standard deviation) before fitting the model.
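The point about units can be seen in a small experiment. The sketch below is only an illustration on synthetic data: the two predictors, their scales, and the helper `fit_ols` are assumptions made for the example. It fits the same model twice with NumPy's least squares solver, once on the raw predictors and once on standardized ones.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Two hypothetical predictors measured on very different scales.
x1 = rng.normal(0.0, 1.0, n)
x2 = rng.normal(0.0, 1000.0, n)
y = 5.0 + 2.0 * x1 + 0.003 * x2 + rng.normal(0.0, 0.5, n)


def fit_ols(X, y):
    """Least squares fit with an intercept; returns [b0, b1, ..., bp]."""
    X_design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    return coef


X = np.column_stack([x1, x2])
raw = fit_ols(X, y)

# Standardize each predictor: subtract its mean, divide by its standard deviation.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
std = fit_ols(X_std, y)

print("raw partial coefficients:         ", raw[1:])
print("standardized partial coefficients:", std[1:])
```

On the raw scale the coefficient of `x2` looks negligible next to that of `x1`, yet after standardizing it is the larger of the two, because a one-standard-deviation change in `x2` moves the response more.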


Logistic Regression

Logistic Regression is a regression method that estimates the probability of a binary qualitative variable as a function of a quantitative variable. One of its main applications is binary classification, in which observations are assigned to one group or the other depending on the value of the variable used as a predictor.

For example, classifying an unknown individual as male or female based on jaw size.

It is important to bear in mind that, although logistic regression can be used to classify, it is a regression model: it models the logarithm of the odds (the log-odds) of belonging to a group as a linear function of the predictor. The final allocation is made based on the predicted probabilities.

Logistic regression also allows calculating the probability that the dependent variable belongs to each of the two categories based on the value acquired by the independent variable.
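To make the link between the regression model and the final classification concrete, here is a minimal sketch of a one-predictor logistic regression fitted by plain gradient descent on the average log loss. The synthetic data, the learning rate, the iteration count, and the function names are all assumptions for illustration; in practice a statistics or machine learning library would normally be used instead.

```python
import numpy as np


def sigmoid(z):
    """Map a log-odds value to a probability between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(x, y, lr=0.1, n_iter=5000):
    """Fit P(y=1 | x) = sigmoid(b0 + b1*x) by gradient descent on the mean log loss."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        p = sigmoid(b0 + b1 * x)
        # Gradients of the mean log loss with respect to b0 and b1.
        b0 -= lr * np.mean(p - y)
        b1 -= lr * np.mean((p - y) * x)
    return b0, b1


# Synthetic, overlapping groups: group 0 centered at -1, group 1 centered at +1.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-1.0, 1.0, 100), rng.normal(1.0, 1.0, 100)])
y = np.concatenate([np.zeros(100), np.ones(100)])

b0, b1 = fit_logistic(x, y)

prob = sigmoid(b0 + b1 * x)          # predicted probability of group 1
pred = (prob >= 0.5).astype(int)     # allocation based on the probabilities
accuracy = np.mean(pred == y)

# The log-odds recovered from the probabilities are linear in x.
log_odds = np.log(prob / (1.0 - prob))

print("b0, b1:", b0, b1)
print("training accuracy:", accuracy)
```

The 0.5 cutoff used for the allocation is a common default, not a requirement; a different threshold trades one kind of misclassification for the other.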

Ridge Regression

Ridge Regression is a corrective measure that alleviates the problem of multicollinearity among the predictors of a model. It handles multicollinearity by shrinking the coefficient estimates of highly correlated variables toward zero. For large values of the tuning parameter the coefficients become very close to zero, although ridge does not set them exactly to zero.

Ridge regression introduces a small amount of bias into the coefficient estimates in order to reduce their variance. It is advisable to combine ridge regression with a model selection technique (e.g. cross-validation) to choose the tuning parameter, and with it the set of estimated coefficients, best suited to the given data.
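The effect on correlated predictors can be sketched with the closed-form ridge solution. In the example below the data are synthetic, the second predictor is deliberately almost a copy of the first, and `fit_ridge` and `lam` (the penalty strength) are names chosen for illustration. The intercept is left unpenalized by centering the data first, which is a common convention rather than the only option.

```python
import numpy as np


def fit_ridge(X, y, lam):
    """Closed-form ridge fit: minimizes ||y - b0 - X b||^2 + lam * ||b||^2."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean          # centering keeps the intercept out of the penalty
    yc = y - y_mean
    p = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return intercept, beta


rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)   # nearly identical to x1
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.5, size=n)

results = {}
for lam in (0.0, 1.0, 100.0):
    intercept, beta = fit_ridge(X, y, lam)
    results[lam] = beta
    print(f"lam={lam:6.1f}  coefficients={beta}  norm={np.linalg.norm(beta):.3f}")
```

With `lam=0.0` this is ordinary least squares, and the two nearly collinear coefficients can take large, unstable values. A modest penalty spreads the effect across the two predictors, and a large penalty shrinks both toward zero.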

LASSO Regression

LASSO (Least Absolute Shrinkage and Selection Operator) Regression is also a regularization method that helps reduce the complexity of a model. It penalizes the absolute size of the regression coefficients, pulling their values toward zero.

As the penalty increases, more coefficients become exactly zero, and vice versa. LASSO uses L1 regularization, in which the tuning parameter controls the amount of shrinkage.

As the tuning parameter increases, the bias increases, and as it decreases, the variance increases. If it is zero, no coefficient is shrunk and the fit is the ordinary least squares one. As it tends to infinity, all coefficients become zero.
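One way to see coefficients drop out as the penalty grows is a small coordinate descent sketch. The version below minimizes (1/(2n))·||y − Xβ||² + lam·||β||₁ on standardized predictors and a centered response. The synthetic data, the fixed iteration count, and the names `fit_lasso`, `soft_threshold` and `lam` are assumptions for illustration, not a reference implementation.

```python
import numpy as np


def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam, returning exactly zero when |rho| <= lam."""
    return np.sign(rho) * max(abs(rho) - lam, 0.0)


def fit_lasso(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/(2n))*||y - X b||^2 + lam*||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # Residual with predictor j's current contribution added back in.
            r_j = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r_j / n
            z = X[:, j] @ X[:, j] / n
            beta[j] = soft_threshold(rho, lam) / z
    return beta


rng = np.random.default_rng(3)
n = 200
X = rng.normal(size=(n, 5))
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardized predictors
# Only the first two predictors actually influence the response.
y = 4.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=n)
y = y - y.mean()                           # centered response, so no intercept

coefs = {}
for lam in (0.0, 0.1, 1.0, 10.0):
    coefs[lam] = fit_lasso(X, y, lam)
    n_nonzero = int(np.count_nonzero(coefs[lam]))
    print(f"lam={lam:5.1f}  nonzero={n_nonzero}  coefficients={np.round(coefs[lam], 3)}")
```

With `lam=0.0` the result is the least squares fit with all five coefficients nonzero, while a sufficiently large `lam` drives every coefficient to exactly zero.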

Polynomial Regression

Polynomial Regression is a special case of Linear Regression. It extends the linear model by adding extra predictors obtained by raising each of the original predictors to a power. For example, a cubic regression on a single variable X uses three predictors: X, X² and X³. This approach is a simple way to fit a nonlinear relationship to the data.

The standard method for extending Linear Regression to a non-linear relationship between the dependent and independent variables has been to replace the linear model with a polynomial function.
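Because the model stays linear in its coefficients, a polynomial fit can be obtained with the same least squares machinery by building the powers of the predictor as extra columns. The sketch below does this for synthetic data with a cubic trend; the data and the helper names `poly_design`, `fit_poly` and `rss` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(-2.0, 2.0, 50)
# Synthetic response with a cubic trend plus noise.
y = 1.0 - 2.0 * x + 0.5 * x**3 + rng.normal(scale=0.2, size=x.size)


def poly_design(x, degree):
    """Design matrix with columns 1, x, x^2, ..., x^degree."""
    return np.column_stack([x**k for k in range(degree + 1)])


def fit_poly(x, y, degree):
    """Least squares fit of a polynomial of the given degree."""
    coef, *_ = np.linalg.lstsq(poly_design(x, degree), y, rcond=None)
    return coef


def rss(x, y, coef):
    """Residual sum of squares of a fitted polynomial."""
    fitted = poly_design(x, len(coef) - 1) @ coef
    return float(np.sum((y - fitted) ** 2))


coef_line = fit_poly(x, y, 1)    # straight line: predictors 1, x
coef_cubic = fit_poly(x, y, 3)   # cubic: predictors 1, x, x^2, x^3

print("line coefficients: ", coef_line, " RSS:", rss(x, y, coef_line))
print("cubic coefficients:", coef_cubic, " RSS:", rss(x, y, coef_cubic))
```

A higher degree always lowers the residual sum of squares on the training data, so the degree is usually chosen with a validation technique rather than by training error alone.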

Conclusion

Regression is a broad methodology, and there are other types of regression analysis beyond the ones listed here. These, however, are among the most popular in machine learning, and they can help you get more from your data and make better business decisions.
