Why Batch normalization?
Now we understand the necessity of normalization. However, normalizing only the input dataset beforehand is not enough: the distributions of the intermediate activations still shift as the network trains, which makes the model harder to train and slows down learning.
Batch normalization is a type of normalization applied between the layers of a neural network. Rather than operating on the entire dataset at once, it normalizes each mini-batch as it flows through the network.
We can add batch normalization between the layers of our neural network using existing Python libraries. The main aim of batch normalization is to standardize the inputs to a layer so that their mean is zero and their standard deviation is one.
How does Batch normalization actually work?
Batch normalization transforms the data so that its mean is zero and its variance is one: the output of one layer is normalized before being fed into the next layer.
Batch normalization does the following calculations:
1. Calculate the batch mean and batch variance
First, the mean and variance of the mini-batch are calculated using the formulas:
𝜇_{B} = (1/m) ∑_{i=1}^{m} x_{i}
𝝈_{B}² = (1/m) ∑_{i=1}^{m} (x_{i} − 𝜇_{B})²
Where x_{i} are the data points in the batch and m is the batch size.
2. Normalize the inputs
The data is then normalized by using the mean and variance calculated in the previous step.
x̂_{i} = (x_{i} − 𝜇_{B}) / √(𝝈_{B}² + 𝝐)
Where 𝝈_{B}² is the batch variance, 𝜇_{B} is the batch mean, and 𝝐 is a small float value added in the formula to avoid dividing by zero.
3. Scale and shift the data and create the output
Finally, the normalized input is scaled and shifted to get the output of our batch normalization layer.
y_{i} = 𝝲 x̂_{i} + 𝞫
Where y_{i} is the output, 𝝲 is a scale factor, and 𝞫 is an offset; both 𝝲 and 𝞫 are learned during training.
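The three steps above can be sketched in NumPy. This is a minimal, illustrative training-time forward pass, not Keras's internal implementation; 𝝲 and 𝞫 are passed in as fixed values here, though in practice they are learned:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Training-time batch normalization over a mini-batch x of shape (m, features)."""
    mu = x.mean(axis=0)                       # step 1: batch mean
    var = x.var(axis=0)                       # step 1: batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)     # step 2: normalize
    return gamma * x_hat + beta               # step 3: scale and shift

x = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
y = batch_norm(x, gamma=1.0, beta=0.0)
print(y.mean(axis=0))  # ≈ [0, 0]
print(y.std(axis=0))   # ≈ [1, 1]
```

With gamma=1 and beta=0, the output of each feature has (approximately) zero mean and unit standard deviation, exactly as the formulas prescribe.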
These are the basic calculations performed by the batch normalization layer during training. Note that the layer also has several other parameters that we can tune to make the model more accurate. Let us now look at how we can implement batch normalization in our code.
In Keras, BatchNormalization() accepts various arguments according to our needs. The complete information about these arguments can be found in the Keras documentation.
Implementation of batch normalization using Keras
Batch normalization can be easily implemented in our neural network using existing Python libraries. We will focus on the implementation of batch normalization using Keras.
Batch normalization can be applied in our neural network using the BatchNormalization layer.
BatchNormalization()
Batch normalization can be applied before or after the activation function of the previous layer.
Generally, it is advised to implement batch normalization before the activation layer. For example:
...
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), input_shape=(48, 48, 1)))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Conv2D(64, kernel_size=(3, 3)))
model.add(BatchNormalization())
model.add(Activation('relu'))
...
However, batch normalization sometimes can yield a better result if used after the activation layer. For example:
...
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(48, 48, 1)))
model.add(BatchNormalization())
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(BatchNormalization())
...
It is advised to train the neural network with both placements of batch normalization and keep the one that gives the best result.
This is an example of adding batch normalization layers between the layers of a convolutional neural network.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import Dense, Dropout, Flatten
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(48,48,1)))
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(128, kernel_size=(3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(BatchNormalization())
model.add(Conv2D(128, kernel_size=(3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(BatchNormalization())
model.add(Dense(1024, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(7, activation='softmax'))
Advantages of batch normalization
- It reduces the training time of our model.
- It reduces internal covariate shift.
- It makes the model less sensitive to hyperparameter choices.
- Higher learning rates can be used to train the model.
- It also regularizes the model to some extent, reducing the need for other regularization techniques.
FAQs

Can we train our model without using batch normalization?
Yes, we can train our model without batch normalization or even without normalization. However, it is highly advised to use normalization as discussed above.

Can we use the BatchNormalization() layer as the first layer of our model?
Yes, we can use it as the first layer of our model. We have to use the keyword argument input_shape while using it as the first layer.

What are the different arguments in BatchNormalization()?
There are various arguments that can be used in BatchNormalization() according to our needs. For example, epsilon is a small float added to the variance to avoid division by zero.
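To see why epsilon matters, here is a small NumPy sketch (illustrative only, not Keras's internal code). If a feature is constant across the batch, its variance is zero, and without epsilon the normalization step would divide by zero:

```python
import numpy as np

# A mini-batch where the second feature is constant (zero variance).
x = np.array([[1.0, 5.0],
              [2.0, 5.0],
              [3.0, 5.0]])

mean = x.mean(axis=0)
var = x.var(axis=0)          # variance of the constant feature is 0

eps = 1e-3                   # Keras's default epsilon value
x_hat = (x - mean) / np.sqrt(var + eps)

print(np.isfinite(x_hat).all())  # epsilon keeps every entry finite
```

The constant feature simply normalizes to zeros instead of producing NaN or infinity.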
Key Takeaways
This blog gives us a brief introduction to Batch normalization and its implementation. We saw how it could be easily implemented in our model using Keras. There are various other normalization methods. You can learn more about Machine learning and Neural networks on our Coding Ninja machine learning course.