Introduction
Let us first understand what regularization means. Regularization is a set of measures taken against overfitting. Overfitting occurs when a model fits the training set closely but cannot generalize well to new inputs; an overfitted model has little predictive power on data it has not seen.
Regularization modifies the training objective (and hence the gradient updates) so that the model does not step in directions that lead it to overfit. Common regularization techniques include the following:
- Dropout
- DropConnect
- L1 regularization
- L2 regularization
In this article, we will study dropout regularization.
Dropout is a mechanism used to improve the training of neural networks by randomly omitting hidden units during training; it can also speed up training. A dropped neuron does not contribute to the forward pass or to backpropagation for that training step.
Need for dropout
When a fully connected layer has many neurons, co-adaptation is more likely to happen. Co-adaptation occurs when two or more neurons extract nearly the same features (hidden representations) from the input data, that is, when different neurons become nearly identical.
This poses two different problems to our model:
- Wasted computation, since multiple neurons spend the machine's resources producing nearly the same output.
- If many neurons extract the same features, those features gain disproportionate importance in the model. This leads to overfitting if the duplicated features are specific to the training set.
We can address this problem as follows: as the title suggests, we use dropout while training the neural network to minimize co-adaptation.
In dropout, we randomly shut down a fraction of a layer's neurons at each training step by zeroing out their values. The fraction of neurons to be zeroed out is the dropout rate, rd. The remaining neurons have their values multiplied by 1 / (1 - rd) so that the expected sum of the neuron values stays roughly the same.
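To make this concrete, here is a minimal sketch of the "inverted dropout" scaling described above, written in NumPy; the array values, the rate rd = 0.5, and the function name are illustrative choices, not part of any library.

import numpy as np

def inverted_dropout(activations, rd, rng):
    # Keep each unit with probability (1 - rd); dropped units are zeroed out.
    keep_mask = rng.random(activations.shape) >= rd
    # Scale the survivors by 1 / (1 - rd) so the expected sum of activations is unchanged.
    return activations * keep_mask / (1.0 - rd)

rng = np.random.default_rng(0)
activations = np.array([0.2, 0.5, 1.0, 0.8])
print(inverted_dropout(activations, rd=0.5, rng=rng))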
Here is an image that shows a neural network before and after dropout regularization.
Tips for using dropout regularization
Dropout is a powerful method of regularization that we can use across many models. It is a computationally inexpensive way of regularizing model training by removing units from the network. Dropout works with nearly every type of neural network architecture and has been shown to work well with SGD.
- Dropout works by temporarily setting a unit's activation to 0.0. For input-layer units, the probability of retaining an activation is typically set between 0.5 and 1.0.
- In hidden layers, dropout is commonly applied with a probability of 0.5. Randomly omitting neurons in this way prevents co-adaptation among feature detectors and helps the model generalize better on held-out data.
- In general, almost all dropout settings help, short of extreme values. A rate of 0.5 is common and has been found to work well across a range of networks and tasks.
- Rather than applying dropout to a single layer, apply it to all the hidden layers to get the most benefit, as shown in the sketch after this list. Even without other regularization approaches, dropout tends to make hidden unit activations sparse, resulting in sparse representations.
- Dropout may be less beneficial when there is a very large amount of training data. It is most effective for models that are likely to overfit the training data, including models trained on limited data.
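As a rough illustration of these tips, here is a minimal Keras sketch that applies dropout to the input and after every hidden layer; the layer sizes, the input dropout rate of 0.2, and the hidden dropout rate of 0.5 are assumed values for illustration, not prescriptions.

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(784,)),           # hypothetical flattened input
    layers.Dropout(0.2),                 # drop 20% of input units
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),                 # drop 50% of this hidden layer's units
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),                 # dropout after every hidden layer
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="sgd", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()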
Implementation of Dropout in Keras
The Dropout layer in Keras takes three arguments:
keras.layers.Dropout(rate, noise_shape=None, seed=None)
- rate - the fraction of the input units to drop, a float between 0 and 1.
- noise_shape - the shape of the binary dropout mask that is multiplied with the input. For example, if the input shape is (batch_size, timesteps, features) and you want the same dropout mask for all timesteps, specify (batch_size, 1, features) as noise_shape (see the sketch below this list).
- seed - random seed.
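For instance, here is a small sketch of how noise_shape shares the dropout mask across timesteps; the batch size, timestep count, and feature count are hypothetical.

import numpy as np
from tensorflow import keras

x = np.ones((4, 3, 2), dtype="float32")    # (batch_size, timesteps, features)

# The mask has shape (batch_size, 1, features) and is broadcast over the
# timestep axis, so a dropped feature is zeroed for every timestep.
layer = keras.layers.Dropout(rate=0.5, noise_shape=(4, 1, 2), seed=42)
y = layer(x, training=True)                # training=True forces dropout to be applied
print(y.numpy())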
Frequently Asked Questions
What does dropout do?
Dropout helps prevent the model from overfitting by randomly excluding some of the neurons in the hidden layers during training.
What is overfitting in Machine Learning?
Overfitting means that your model is too complex for the problem it is solving: it fits the training data closely but does not generalize to new data.
Where do I put dropout?
Dropout is usually placed on the fully connected layers, because they have the largest number of parameters and are therefore the most likely to co-adapt excessively and cause overfitting.
Why does dropout prevent overfitting?
Dropout prevents a layer from over-relying on a few of its inputs. Because those inputs are not always present during training (they are dropped at random), the layer learns to use all of its inputs, which improves generalization.