Table of contents
1. Introduction
2. Normalization
3. Why Batch normalization?
4. How does Batch normalization actually work?
5. Implementation of batch normalization using Keras
6. Advantages of batch normalization
7. FAQs
8. Key Takeaways
Last Updated: Mar 27, 2024

Batch Normalization - Implementation


Introduction

Deep Learning is a fascinating Machine Learning technique that tries to mimic the functioning of the human brain with the help of neurons. It powers many remarkable applications, such as driverless cars and virtual assistants like Siri and Alexa.

It involves a set of complicated algorithms that enable the computer to learn from the data we provide. But that data needs to be meaningful so that our model can learn from it correctly.

So, before training our neural network, we need to pre-process our data. One component of this pre-processing step is normalization, and Batch Normalization is a type of normalization.


Normalization

Normalization means converting our data to a common scale. It is necessary because our data may contain some values that are very high and some that are very low, and the features in our dataset can differ significantly in scale. For example, suppose we have data on the annual salary of some individuals: one feature tells us the individual’s age and another the salary, so the two features are not on the same scale. Such datasets can cause problems because the larger values give rise to an imbalance in the gradients. This problem is famously known as the exploding gradient problem.

 

[Figure: scatter plot of the data before and after normalization]

As the figure shows, after normalization all the data points are closer to the center, with all the values on the same scale.
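
As a quick illustration, here is a minimal NumPy sketch of this kind of feature-wise standardization; the age and salary values below are made up for the example:

import numpy as np

# Hypothetical data: each row is one individual, columns are [age, salary].
data = np.array([[25.0, 40000.0],
                 [32.0, 75000.0],
                 [47.0, 120000.0],
                 [51.0, 52000.0]])

# Standardize each feature (column) to zero mean and unit standard deviation.
mean = data.mean(axis=0)
std = data.std(axis=0)
normalized = (data - mean) / std

print(normalized.mean(axis=0))  # approximately [0, 0]
print(normalized.std(axis=0))   # approximately [1, 1]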

Let us now understand what batch normalization is and why it is used.

Why Batch normalization?

Now we understand the necessity of normalization. However, normalizing the entire dataset beforehand can be troublesome, and it only fixes the scale of the input: the distributions of the intermediate layer outputs can still shift during training, which makes the model harder to train and slows down learning.

Batch normalization is a type of normalization applied between the layers of a neural network. Instead of operating on the entire input data at once, it normalizes the activations one mini-batch at a time as they flow through the network.

We can add batch normalization between the layers of our neural network using existing Python libraries. The main aim of batch normalization is to standardize the inputs to a layer so that they have a mean of zero and a standard deviation of one.

How does Batch normalization actually work?

Batch normalization aims to transform our data so that its mean is zero and its variance is one. The output of one layer is therefore normalized before being passed as input to the next layer.

 

[Figure: a neural network in which the output of each layer is batch normalized before the next layer]

As demonstrated in the above image, Batch normalization is used to normalize the output of each layer.

Batch normalization does the following calculations:

1. Calculate the batch mean and batch variance

First, the original mean and variance of the batch are calculated using the formula:

𝜇B = (1/m) Σᵢ xᵢ

𝝈B² = (1/m) Σᵢ (xᵢ − 𝜇B)²

Where xᵢ are the data points in the batch and m is the batch size.

 

2. Normalize the inputs

The data is then normalized by using the mean and variance calculated in the previous step.

x̂ᵢ = (xᵢ − 𝜇B) / √(𝝈B² + 𝝐)

Where 𝝈B² is the batch variance, 𝜇B is the batch mean, and 𝝐 is a small float value added to the denominator to avoid dividing by zero.

 

3. Scale and shift the data and create the output

Finally, the normalized input is scaled and shifted to get the output of our batch normalization layer.

 

yᵢ = 𝜸x̂ᵢ + 𝞫

Where yᵢ is the output, 𝜸 is the scale, and 𝞫 is the offset; both 𝜸 and 𝞫 are learned during training.
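
To make the three steps concrete, here is a minimal NumPy sketch of the batch normalization forward pass; the mini-batch values, 𝜸, 𝞫, and 𝝐 below are illustrative assumptions:

import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-3):
    # Step 1: batch mean and batch variance, computed per feature.
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    # Step 2: normalize the inputs, with eps guarding against division by zero.
    x_hat = (x - mu) / np.sqrt(var + eps)
    # Step 3: scale by gamma and shift by beta (both learned during training).
    return gamma * x_hat + beta

# A toy mini-batch of four samples with two features on very different scales.
x = np.array([[1.0, 200.0],
              [2.0, 220.0],
              [3.0, 180.0],
              [4.0, 240.0]])
gamma = np.ones(2)   # scale, initialized to 1 as in Keras
beta = np.zeros(2)   # offset, initialized to 0 as in Keras
print(batch_norm_forward(x, gamma, beta))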

These are the basic calculations performed by the batch normalization layer during training. Note that the layer accepts many other arguments that we can use to make the model more accurate; the complete list of arguments for Keras's BatchNormalization() can be found in the Keras documentation. Let us now look at how we can implement batch normalization in our code.

Implementation of batch normalization using Keras

Batch normalization can be easily implemented in our neural network using existing Python libraries. We will focus on the implementation of batch normalization using Keras.

Batch normalization can be applied in our neural network using the BatchNormalization layer.

BatchNormalization()

 

Batch normalization can be applied before or after the activation function of the previous layer. 

Generally, it is advised to implement batch normalization before the activation layer. For example:

...
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3)))  # convolution without a fused activation
model.add(BatchNormalization())            # normalize the pre-activations
model.add(Activation('relu'))              # activation applied after batch norm
...

 

However, batch normalization sometimes can yield a better result if used after the activation layer. For example:

...
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu'))
model.add(BatchNormalization())  # normalize the activated outputs
...

 

It is advisable to train the neural network with both placements of batch normalization and keep the one that gives the best result; a sketch of such an experiment follows.
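
One way to run that comparison is a small helper that builds the same network with either placement. The build_model function and the layer sizes below are a hypothetical sketch, not a standard Keras API:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation, BatchNormalization, Conv2D, Dense, Flatten

def build_model(bn_before_activation=True):
    model = Sequential()
    if bn_before_activation:
        # Batch norm between the convolution and its activation.
        model.add(Conv2D(32, kernel_size=(3, 3), input_shape=(48, 48, 1)))
        model.add(BatchNormalization())
        model.add(Activation('relu'))
    else:
        # Batch norm after the activated convolution.
        model.add(Conv2D(32, kernel_size=(3, 3), activation='relu',
                         input_shape=(48, 48, 1)))
        model.add(BatchNormalization())
    model.add(Flatten())
    model.add(Dense(7, activation='softmax'))
    return model

# Train both variants on the same data and keep the better one.
model_before = build_model(bn_before_activation=True)
model_after = build_model(bn_before_activation=False)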

This is an example of adding batch normalization layers between the layers of a Convolutional neural network.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import Dense, Dropout, Flatten

model = Sequential()

# Convolutional block 1: batch norm after the activated convolutions.
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(48, 48, 1)))
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

# Convolutional block 2: batch norm between the pooling and convolution layers.
model.add(Conv2D(128, kernel_size=(3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(BatchNormalization())
model.add(Conv2D(128, kernel_size=(3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

# Classifier head: batch norm before the dense layer.
model.add(Flatten())
model.add(BatchNormalization())
model.add(Dense(1024, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(7, activation='softmax'))

Advantages of batch normalization

  • It reduces the training time of our model.
  • It reduces internal covariate shift.
  • It makes training less sensitive to the choice of hyperparameters.
  • Higher learning rates can be used to train the model.
  • It also regularizes the model to some extent, reducing the need for other regularization techniques.

FAQs

  1. Can we train our model without using batch normalization?
    Yes, we can train our model without batch normalization or even without normalization. However, it is highly advised to use normalization as discussed above.
     
  2. Can we use the BatchNormalization() layer as the first layer of our model?
    Yes, we can use it as the first layer of our model. We have to pass the keyword argument input_shape while using it as the first layer (see the snippet after this list).
     
  3. What are the different arguments in BatchNormalization()?
    There are various arguments that can be used in BatchNormalization() according to our needs. For example, epsilon is a small float added to the variance to avoid dividing by zero.
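
As a minimal illustration of both points above, here is a short sketch; the input shape and epsilon value are arbitrary assumptions for the example:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import BatchNormalization, Dense

model = Sequential()
# BatchNormalization as the first layer, so input_shape is required;
# epsilon is the small float added to the variance (Keras default is 1e-3).
model.add(BatchNormalization(input_shape=(10,), epsilon=1e-3))
model.add(Dense(1, activation='sigmoid'))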

Key Takeaways

This blog gives us a brief introduction to Batch normalization and its implementation. We saw how it could be easily implemented in our model using Keras. There are various other normalization methods. You can learn more about Machine learning and Neural networks on our Coding Ninja machine learning course.
