Table of contents
1. Introduction
2. Types of Regularisation
  2.1. L1 Regularisation
  2.2. L2 Regularisation
3. What do L1 and L2 regularisation achieve?
  3.1. Comparison of L1 and L2 regularisation
4. FAQs
5. Key Takeaways
Last Updated: Aug 13, 2025

L1 and L2 regularisation

Author: Tashmit

Introduction

We have studied various deep learning and regression algorithms, but we have not yet covered an important concept: regularisation. Regularisation is a group of techniques used to prevent overfitting. Overfitting occurs when a model fits the training set well but cannot generalise to new inputs. In this article, we will study the main techniques used in regularisation.

Also read about ResNet-50 Architecture.

Types of Regularisation

There are mainly three types of Regularisation:

  • L1 or Lasso Regularisation
  • L2 or Ridge Regularisation
  • Dropout Regularisation

 

Our primary focus will be on L1 and L2 regularisation. You can visit this article to study Dropout regularisation. 

Also, a prior understanding of linear regression is essential; you can visit the Introduction to Linear Regression and Linear Regression: Theory and Code articles to brush up on your knowledge.

L1 Regularisation

Let us understand this with an example. Assume the dataset has two features, experience and salary, and we must predict salary from years of experience. Now, suppose our training dataset has only two points. The best-fit line of linear regression will pass exactly through both of them, so the cost function, the sum of squared residuals, is 0 on the training data.

On the test data, however, the predictions fall far from the true values: the model has overfitted the training set. We use L1 and L2 regularisation to overcome this overfitting.
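The idea above can be sketched numerically. This is a minimal example with made-up numbers (two training points, one test point): a line through two points has zero training residual but non-zero test residual.

```python
# A hypothetical two-point dataset: experience (years) vs. salary (thousands).
import numpy as np

X_train = np.array([1.0, 5.0])    # years of experience
y_train = np.array([30.0, 70.0])  # salary

# A straight line through two points fits them exactly.
slope, intercept = np.polyfit(X_train, y_train, 1)
train_residual = np.sum((slope * X_train + intercept - y_train) ** 2)
print(train_residual)  # ~0 (up to floating-point error)

# A new, unseen point need not lie on that line, so the test error is non-zero.
X_test = np.array([3.0])
y_test = np.array([45.0])
test_residual = np.sum((slope * X_test + intercept - y_test) ** 2)
print(test_residual)
```

Here the fitted line predicts 50 for 3 years of experience while the true salary is 45, giving a test residual of 25 even though the training residual is essentially zero.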

In L1 regularisation, we modify the cost function by adding the sum of the absolute values of the weights of the various features as a penalty term. The mathematical representation is

Cost = Σᵢ (yᵢ − ŷᵢ)² + λ Σⱼ |wⱼ|

L1 regularisation reduces the steepness of the best-fit line toward zero, which lowers the value of the cost function on unseen data and results in more accurate outputs. Apart from that, it is also used for feature selection: some weights are driven to exactly zero, and the corresponding features can be excluded from the data used for prediction. The value of lambda is decided by cross-validation.
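The feature-selection effect can be seen directly with scikit-learn's `Lasso`. This is a small sketch on synthetic data (the features and alpha value are made up): only the first two features influence the target, and Lasso drives the remaining weights to exactly zero.

```python
# Sketch: Lasso zeroes out uninformative features (synthetic data).
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# Only the first two features actually influence the target.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=100)

lasso = Lasso(alpha=0.5)
lasso.fit(X, y)
print(lasso.coef_)  # weights for the three irrelevant features are exactly 0
```

The non-zero coefficients are also shrunk (roughly from 3 to 2.5 and from −2 to −1.5 here), which is the bias that the L1 penalty trades for lower variance.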

Sample code 

# importing necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model
from sklearn import datasets
#importing the dataset
X, y = datasets.load_diabetes(return_X_y=True)
print("Computing regularization path using the LARS ...")
_, _, coefs = linear_model.lars_path(X, y, method="lasso", verbose=True)
xx = np.sum(np.abs(coefs.T), axis=1)
xx /= xx[-1]
#plotting the graph
plt.plot(xx, coefs.T)
ymin, ymax = plt.ylim()
plt.vlines(xx, ymin, ymax, linestyle="dashed")
plt.xlabel("|coef| / max|coef|")
plt.ylabel("Coefficients")
plt.title("LASSO Path")
plt.axis("tight")
plt.show()

Output: a plot of the Lasso coefficient paths ("LASSO Path") as the regularisation is relaxed.

L2 Regularisation

L2 regularisation is responsible for dealing with multicollinearity problems. It reduces the cost function by adding the sum of the squares of the weights as a penalty term.

L2 regularisation also reduces the steepness of the best-fit line and the sum of residuals, but it shrinks the weights toward zero without ever making them exactly zero. The mathematical representation is

Cost = Σᵢ (yᵢ − ŷᵢ)² + λ Σⱼ wⱼ²

In L2 regularisation as well, the value of lambda is decided by cross-validation.
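One way to carry out that cross-validation is scikit-learn's `RidgeCV`, which selects alpha (the lambda of the formula above) from a candidate grid. This is a hedged sketch on synthetic data; the grid and data are made up.

```python
# Sketch: choosing lambda (alpha in scikit-learn) by cross-validation.
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, -2.0, 0.0, 0.0, 0.0]) + 0.1 * rng.normal(size=100)

# RidgeCV evaluates each candidate alpha by (leave-one-out) cross-validation.
ridge = RidgeCV(alphas=np.logspace(-3, 3, 13))
ridge.fit(X, y)
print(ridge.alpha_)  # the lambda chosen by cross-validation
print(ridge.coef_)   # shrunk toward zero, but none exactly zero
```

Note that, unlike the Lasso example, even the weights of the irrelevant features stay (very small but) non-zero: the L2 penalty shrinks weights, it does not select features.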

Sample Code

#importing necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model
# X is the 10x10 Hilbert matrix, which has strongly collinear columns
X = 1.0 / (np.arange(1, 11) + np.arange(0, 10)[:, np.newaxis])
y = np.ones(10)
# Compute paths
n_alphas = 200
alphas = np.logspace(-10, -2, n_alphas)
coefs = []
for a in alphas:
    ridge = linear_model.Ridge(alpha=a, fit_intercept=False)
    ridge.fit(X, y)
    coefs.append(ridge.coef_)
# Display results
ax = plt.gca()
ax.plot(alphas, coefs)
ax.set_xscale("log")
ax.set_xlim(ax.get_xlim()[::-1])  # reverse axis
plt.xlabel("alpha")
plt.ylabel("weights")
plt.title("Ridge coefficients as a function of the regularization")
plt.axis("tight")
plt.show()

Output: a plot of the Ridge coefficients as a function of the regularisation strength alpha.

What do L1 and L2 regularisation achieve?

By performing L1 and L2 regularisation, the weight values are pushed toward zero. In L1 regularisation, weights can become exactly zero, while in L2 regularisation they only approach zero. When choosing the regularisation term lambda, the goal is to strike the right balance between the low complexity of the model and the accuracy.


The model will be simple but risk underfitting the data if the lambda value is too high. The model won't learn enough about the training data to make valuable predictions.

If the lambda value is too low, the model will be more complex and run the risk of overfitting the data. The model will learn too much about the particularities of the training data and won't generalize to new data.
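The two regimes can be illustrated with Lasso on synthetic data (alpha values here are arbitrary, chosen only to make the contrast visible): a very large lambda forces every weight to zero, an underfit model that predicts a constant, while a small lambda stays close to ordinary least squares.

```python
# Sketch: effect of a too-high vs. a small regularisation strength.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=50)

# Too much regularisation: every weight is forced to zero (underfitting).
print(Lasso(alpha=100.0).fit(X, y).coef_)  # [0. 0. 0.]

# A small alpha keeps the fit close to ordinary least squares.
print(Lasso(alpha=0.01).fit(X, y).coef_)
```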

Do check out this article to learn the difference between L1 and L2 regularisation.

Comparison of L1 and L2 regularisation

  • L1 regularisation is preferred when a large number of features are provided, as it produces a sparse solution and performs implicit feature selection. However, with collinear features it may arbitrarily drop one of a group of relevant features, losing information.
  • L2 regularisation is useful when collinear or codependent features are present, since it distributes weight among them instead of discarding them. Because it keeps every feature, though, it can result in a less interpretable model on a high-dimensional dataset.

 

For a detailed comparison of L1 and L2 regularisation, please refer to our blog on L1 vs L2 Regularisation.

FAQs

  1. What is the importance of regularisation?
    Regularisation plays a vital role in machine learning. It is responsible for overcoming the problem of overfitting.
     
  2. What will happen if the value of lambda is high?
    If the value of lambda increases, the penalty term contributes more to the cost, so the weights are shrunk more aggressively. An overly high lambda makes the model too simple and causes it to underfit.
     
  3. How do you choose between L1 and L2 regularisation?
    L1 and L2 regularisation have different but equally essential properties. L1 tends to shrink some coefficients exactly to zero, whereas L2 shrinks all coefficients evenly toward zero. L1 is therefore useful for feature selection, as we can drop any variables associated with coefficients that go to zero.

Key Takeaways

This article gave a brief explanation of L1 and L2 regularisation. We discussed the types of regularisation and sample code for L1 and L2 regularisation, respectively. To dive deeper into machine learning, check out our industry-level courses on Coding Ninjas.
