**Introduction**

First, let us get to know the history of AlexNet. The AlexNet CNN architecture won the 2012 ImageNet ILSVRC challenge by a large margin: it achieved a 17% top-5 error rate, while the second-best entry achieved 26%. It was developed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton. ImageNet is a dataset of 1,281,167 training images and 50,000 validation images. The images are classified into 1,000 classes, and the dataset is around 150 GB in size.

The structure of AlexNet is similar to LeNet-5, but it is much larger and deeper. It was the first convolutional neural network to stack convolutional layers directly on top of one another, rather than placing a pooling layer after every convolutional layer.

Before going on to the architecture of AlexNet, let us get to know some terms that will be useful in understanding its structure.

**Some important terms**

**Stride**

Stride denotes how far the filter moves across the input in each step along one direction. For example, if the stride is 1, the filter moves 1 pixel at a time.

Let us understand stride using an example.

The above figure shows a 5×5 input layer. Zero padding of 1 is applied around it (the border of zeros), and a filter of size 3×3 is slid over the padded input. If S is the stride, the width of the output layer after applying the filter is (W − F + 2×P)/S + 1, where W is the input width, F is the size of the filter, P is the amount of zero padding, and S is the stride.

If the value of S is 2, then the resulting layer will be of size (5 − 3 + 2×1)/2 + 1 = 3, i.e., 3×3.
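The output-size formula above can be sketched as a small helper function (the function name `conv_output_size` is illustrative, not from any particular library):

```python
def conv_output_size(w, f, p, s):
    """Output width of a convolution for input width w, filter size f,
    zero padding p, and stride s: (w - f + 2*p) // s + 1."""
    return (w - f + 2 * p) // s + 1

# The example from the text: 5x5 input, 3x3 filter, padding 1, stride 2.
print(conv_output_size(5, 3, 1, 2))  # -> 3, i.e. a 3x3 output
```

Note that with stride 1 and padding 1 the same 3×3 filter would preserve the 5×5 size, which is why "same" padding is so common in practice.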

**Kernels and filters**

A kernel is a 2D matrix of weights. A filter can be referred to as multiple kernels stacked together: in other words, a filter is a 3D structure formed by placing several kernels on top of each other, typically one per input channel.
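The distinction is easy to see in the array shapes. A minimal sketch with NumPy (the shapes below assume a 3-channel RGB input; the variable names are illustrative):

```python
import numpy as np

# A single 3x3 kernel: a 2D matrix of weights.
kernel = np.random.randn(3, 3)

# A filter for a 3-channel input: three kernels stacked into a 3D array,
# laid out here as (channels, height, width).
filt = np.random.randn(3, 3, 3)

print(kernel.ndim, filt.ndim)  # -> 2 3
```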

**Dropout regularization**

Dropout is a regularization mechanism used to improve the generalization of neural networks by randomly omitting hidden units during training. At each training step, each neuron is dropped with some probability so that it contributes neither to the forward pass nor to backpropagation.
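A minimal sketch of the idea, using the "inverted dropout" formulation common in modern implementations (the survivors are rescaled at training time so the expected activation stays the same; the function and its signature are illustrative):

```python
import numpy as np

def dropout(x, p, rng):
    """Inverted dropout: zero each unit with probability p and scale the
    survivors by 1/(1 - p) so the expected activation is unchanged."""
    mask = rng.random(x.shape) >= p  # True for the units that survive
    return x * mask / (1.0 - p)

rng = np.random.default_rng(0)
activations = np.ones(10)
print(dropout(activations, 0.5, rng))  # roughly half the units are zeroed
```

AlexNet applied dropout with a rate of 0.5 to its first two fully connected layers.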

**Max Pooling**

Max pooling is an operation that takes the maximum value over patches of a feature map, producing a downsampled feature map. It is generally applied after a convolutional layer.
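The operation can be sketched as a pair of loops over pooling windows (the function name `max_pool` is illustrative; real frameworks provide optimized versions):

```python
import numpy as np

def max_pool(x, size=2, stride=2):
    """Max pooling over strided windows of a 2D feature map."""
    h = (x.shape[0] - size) // stride + 1
    w = (x.shape[1] - size) // stride + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            # Take the maximum over each size x size window.
            out[i, j] = x[i*stride:i*stride+size, j*stride:j*stride+size].max()
    return out

fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 7, 8],
                 [9, 2, 1, 0],
                 [3, 4, 5, 6]])
print(max_pool(fmap))  # -> [[6. 8.]
                       #     [9. 6.]]
```

AlexNet itself used overlapping max pooling, with 3×3 windows and a stride of 2.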