Table of contents
1. Introduction
  1.1. Need for dropout
  1.2. Tips to use Dropout regularization
  1.3. Implementation of Dropout in Keras
2. Frequently Asked Questions
3. Key Takeaways
Last Updated: Mar 27, 2024

Dropout - Regularization Method

Author: Soham Medewar

Introduction

Let us first understand what regularization means. Regularization is a measure taken against overfitting. Overfitting occurs when a model describes the training set but cannot generalize well over new inputs. Overfitted models have no predictive capacity for data that they haven't seen. 

Regularization works by modifying the training objective (and hence the gradient updates) so that optimization does not step in directions that lead the model to overfit. Common regularization techniques include: 

  • Dropout
  • DropConnect
  • L1 regularization
  • L2 regularization


In this article, we will study dropout regularization.

Dropout is a mechanism used to improve the training of neural networks by randomly omitting hidden units, and it also speeds up training. A dropped neuron contributes neither to the forward pass nor to backpropagation for that training step.

Need for dropout

When a fully-connected layer has many neurons, co-adaptation is more likely to happen. Co-adaptation occurs when two or more neurons extract nearly identical features (hidden representations) from the input data, i.e., when different neurons learn to behave almost the same.

This poses two problems for our model: 

  • It wastes the machine's resources computing the same output more than once. 
  • If many neurons are extracting the same features, those features gain extra significance in our model. This leads to overfitting if the duplicated features are specific to only the training set.


We can solve the above problem in the following way:

As the title suggests, we use dropout while training neural networks to minimize co-adaptation.

In dropout, we randomly shut down some fraction of a layer's neurons at each training step by zeroing out their values. The fraction of neurons to be zeroed out is the dropout rate, r_d. The remaining neurons have their values scaled by 1 / (1 - r_d) so that the expected sum of the neuron values stays the same; this scheme is commonly called inverted dropout.
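To make the arithmetic concrete, here is a minimal NumPy sketch of inverted dropout (the function name and the sample activations are illustrative, not taken from any library):

import numpy as np

rng = np.random.default_rng(seed=0)

def inverted_dropout(activations, rate):
    # Zero out a `rate` fraction of units and rescale the survivors
    # by 1 / (1 - rate) so the expected sum of activations is unchanged.
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob  # True = keep the unit
    return activations * mask / keep_prob

# One layer's activations during a single training step (made-up values)
a = np.array([0.2, 0.9, 0.5, 0.7, 0.3, 0.8])
print(inverted_dropout(a, rate=0.5))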

Here is an image that shows a neural network before and after dropout regularization.


Tips to use Dropout regularization

Dropout is a powerful regularization method that we can use across many models. It is a computationally inexpensive way of regularizing model training by temporarily removing units from the network. Dropout works with nearly every type of neural network architecture and has been shown to work well with stochastic gradient descent (SGD). 

  • Dropout works by temporarily setting a unit's activation to 0.0. For input-layer neurons, the probability of retaining an activation is kept between 0.5 and 1.0, usually close to 1.0 (e.g., 0.8).
  • In hidden layers, dropout is typically done with a retention probability of 0.5. Randomly omitting neurons prevents co-adaptation among feature detectors, which helps models generalize better on held-out data.
  • In general, almost all settings of the dropout rate help, short of extreme values. A rate of 0.5 is common and has been seen to work well on a range of networks and tasks.
  • Rather than applying dropout to a single layer, apply it to all the hidden layers, as in the sketch after this list, to improve the model's generalization. Even when other regularization approaches are absent, dropout tends to render hidden-unit activations sparse, resulting in sparse representations.
  • Dropout may be less beneficial when there is a huge amount of training data. It is most effective on models that are likely to overfit, which is often the case when training data is limited.
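As a rough sketch of these tips in practice, the Keras model below applies a small dropout to the inputs (rate 0.2, i.e., retention 0.8) and a rate of 0.5 after every hidden layer. The layer sizes, input shape, and exact rates are illustrative choices, not prescriptions:

from tensorflow import keras
from tensorflow.keras import layers

# Illustrative network: input dropout at rate 0.2 and rate 0.5
# after each hidden layer, following the tips above.
model = keras.Sequential([
    keras.Input(shape=(784,)),
    layers.Dropout(0.2),                  # input-layer dropout
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),                  # hidden-layer dropout
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="sgd",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()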

Implementation of Dropout in Keras

Keras's Dropout layer takes three arguments; a short usage sketch follows the list:

keras.layers.Dropout(rate, noise_shape=None, seed=None)
  • rate - the fraction of the input units to drop, a float between 0 and 1.
  • noise_shape - the shape of the binary dropout mask that is multiplied with the input. For example, if the input shape is (batch_size, timesteps, features) and you want the same dropout mask at every timestep, specify (batch_size, 1, features) as noise_shape.
  • seed - a Python integer to use as the random seed.
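Here is a small usage sketch of these arguments (the tensor shape and values are illustrative). It contrasts the default element-wise mask with a mask shared across timesteps via noise_shape:

import numpy as np
from tensorflow import keras

# A batch of 2 sequences with 3 timesteps and 4 features (made-up shape).
x = np.ones((2, 3, 4), dtype="float32")

# Default behaviour: every element gets its own independent dropout mask.
plain = keras.layers.Dropout(rate=0.5, seed=42)

# noise_shape=(None, 1, 4) shares one mask across the timestep axis, so the
# same features are dropped at every timestep of a given sequence.
shared = keras.layers.Dropout(rate=0.5, noise_shape=(None, 1, 4), seed=42)

# Dropout is only active when training=True; at inference it is the identity.
print(plain(x, training=True))
print(shared(x, training=True))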

 


Frequently Asked Questions

  1. What does dropout do?
    Dropout helps to prevent the model from overfitting. It simply excludes a random subset of the neurons in the hidden layers at each training step.
     
  2. What is overfitting in Machine Learning?
    Overfitting indicates that your model is too complex for the problem it is solving: it fits the training data closely but fails to generalize to new data.
     
  3. Where do I put a dropout?
    Usually, dropout is placed on the fully connected layers only, because they are the ones with the greater number of parameters and are therefore likely to co-adapt excessively, causing overfitting.
     
  4. Why does dropout prevent overfitting?
    Dropout prevents overfitting due to a layer's "over-reliance" on a few of its inputs. Because these inputs aren't always present during training (i.e. they are dropped at random), the layer learns to use all of its inputs, improving generalization.

Key Takeaways

In this article, we have discussed the following topics:

  • Regularization and its types
  • Dropout regularization
  • The need for dropout
  • How to use dropout?



Happy Coding!
