Table of contents
1. Introduction
2. What is CNN?
3. Why CNN?
4. How does CNN work?
4.1. Convolution Layer
4.2. Pooling Layer
4.3. Flattening
4.4. Fully Connected Layer
5. AlexNet image classification
6. Practical implementation of CNN
7. Architectures of CNN
8. Applications of CNN
9. Frequently Asked Questions
10. Key Takeaways
Last Updated: Mar 27, 2024

Understanding of Convolutional Neural Network

Author Tashmit

Introduction

AI has made tremendous progress in bridging the gap between the capabilities of humans and machines. Researchers work on many aspects of the field to make extraordinary things happen. One such area is the domain of Computer Vision.

The goal of this field is to enable machines to view the world as humans do and to use that understanding for a multitude of tasks such as image and video recognition, image analysis and classification, media recreation, recommendation systems, and natural language processing. Advances in computer vision with deep learning have been built and refined over time, primarily around one particular algorithm: the Convolutional Neural Network.


What is CNN?

A Convolutional Neural Network (CNN) is a Deep Learning algorithm that can take in an input image, assign importance (learnable weights and biases) to various aspects of the picture, and differentiate one from another. A CNN requires much less preprocessing than other classification algorithms.

The architecture of a CNN is analogous to the connectivity pattern of neurons in the human brain.

Why CNN?

A CNN can successfully capture the spatial and temporal dependencies in an image by applying relevant filters. It fits image data better because it has fewer parameters and reuses the same weights across the whole image, which means the network can be trained to understand the finer details of an image better.
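To make the effect of weight reuse concrete, the sketch below counts the learnable parameters of a fully connected layer and of a convolutional layer. The 28x28 grayscale input, the 100 hidden units, and the 20 filters of size 5x5 are illustrative assumptions, and both helper functions are hypothetical names introduced only for this example.

```python
# Sketch: compare learnable parameters of a dense layer and a convolutional layer.
# The input size and layer sizes below are illustrative assumptions.

def dense_params(n_inputs, n_outputs):
    """One weight per input-output pair, plus one bias per output."""
    return n_inputs * n_outputs + n_outputs

def conv_params(in_channels, out_channels, kernel_size):
    """Each filter holds kernel_size * kernel_size * in_channels weights plus one bias.
    The same filter is reused at every image position, so the count does not
    depend on the image size."""
    return out_channels * (kernel_size * kernel_size * in_channels) + out_channels

dense = dense_params(28 * 28, 100)  # every pixel wired to each of 100 hidden units
conv = conv_params(1, 20, 5)        # 20 filters of size 5x5 on one input channel

print(dense)  # 78500
print(conv)   # 520
```

Under these assumptions the convolutional layer needs roughly 150 times fewer parameters than the dense layer.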

How does CNN work?


CNNs stand out from other neural networks through their superior performance with image, speech, and audio signal inputs. A CNN has four main building blocks:

  • Convolutional layer
  • Pooling layer
  • Flattening
  • Fully-connected (FC) layer

Convolution Layer

The convolution layer is the first layer that extracts features from an input image. It preserves the relationship between pixels by learning image features from small squares of input data.
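The sliding-window idea can be sketched in plain Python. This is a minimal illustration, not how a deep learning library implements it: it assumes a single channel, stride 1, and no padding, and, like most deep learning libraries, it does not flip the kernel (strictly speaking a cross-correlation). The function name `conv2d` and the tiny image and kernel are hypothetical.

```python
# Sketch: slide a small kernel over an image and sum the element-wise products.

def conv2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            # Multiply the kernel with the small square of pixels under it
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output

# A 4x4 image whose left half is dark (0) and right half is bright (1)
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# A 2x2 kernel that responds to a dark-to-bright vertical edge
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The feature map is non-zero only in the column where the edge sits, which is the sense in which a filter "detects" a feature. In a real CNN the kernel values are learned during training rather than written by hand.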


Pooling Layer

The pooling layer is another building block of a CNN and plays a vital role in downsampling the feature maps. It reduces the spatial size of the representation, which lowers the number of parameters and the amount of computation in later layers when the image is large.
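As a minimal sketch of the most common variant, max pooling, the plain-Python function below keeps only the largest value in each window. The 2x2 window, the stride of 2, and the sample feature map are illustrative assumptions; the function name mirrors PyTorch's `max_pool2d` but is a hypothetical stand-in written for this example.

```python
# Sketch: 2x2 max pooling with stride 2 on a single feature map.

def max_pool2d(feature_map, size=2, stride=2):
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    pooled = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Collect the values inside the current window and keep the largest
            window = [
                feature_map[i * stride + di][j * stride + dj]
                for di in range(size)
                for dj in range(size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled

feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [1, 1, 3, 7],
]

pooled = max_pool2d(feature_map)
print(pooled)  # [[6, 2], [2, 9]]
```

The 4x4 map shrinks to 2x2, so the layers that follow have a quarter as many values to process.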


Flattening

After pooling, the next step is flattening. As the name suggests, the pooled result is flattened: the matrix produced by the pooling layer is converted into a 1-D vector, with all the values placed one after another in a single column.


This is done so that the result can be fed into the fully connected part of the network for further processing.
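A minimal sketch of flattening, assuming two pooled 2x2 feature maps with made-up values. The helper `flatten` is a hypothetical name; it reads the values channel by channel and row by row, which is the same order a reshape such as PyTorch's `view` produces on a contiguous tensor.

```python
# Sketch: turn a stack of pooled feature maps into one long vector.

def flatten(feature_maps):
    return [value for fmap in feature_maps for row in fmap for value in row]

pooled_maps = [
    [[6, 2], [2, 9]],  # channel 0
    [[1, 0], [5, 3]],  # channel 1
]

vector = flatten(pooled_maps)
print(vector)       # [6, 2, 2, 9, 1, 0, 5, 3]
print(len(vector))  # 8
```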


Fully Connected Layer

In the earlier, partially connected layers, the pixel values of the input image are not directly connected to the output layer. In the fully-connected layer, however, each neuron in the output layer connects directly to every node in the previous layer. This layer performs classification based on the features extracted by the earlier layers and their different filters.


In other words, each neuron in a fully connected layer is connected to every neuron in the previous layer.
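The classification step can be sketched as a weighted sum followed by a softmax that turns the scores into probabilities. The three input features, the two output classes, and all weight and bias values are made-up assumptions for illustration; in a trained network they would be learned. Both function names are hypothetical.

```python
import math

# Sketch: a fully connected layer followed by softmax.

def fully_connected(inputs, weights, biases):
    """Each output neuron sums every input multiplied by its own weight, plus a bias."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

features = [1.0, 2.0, 0.5]   # flattened features from the earlier layers
weights = [
    [0.2, -0.1, 0.4],        # weights of output neuron 0
    [0.5, 0.3, -0.2],        # weights of output neuron 1
]
biases = [0.0, 0.1]

scores = fully_connected(features, weights, biases)
probs = softmax(scores)
predicted = probs.index(max(probs))
print(predicted)  # 1
```

Every output neuron reads all three inputs, which is what distinguishes this layer from a convolutional layer that only looks at a small window.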

AlexNet image classification


AlexNet is an image classification convolutional neural network designed by Alex Krizhevsky, together with Ilya Sutskever and Geoffrey Hinton, that won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012. Although this network is similar to the LeNet architecture, it has more filters per layer and is deeper. It uses 11x11, 5x5, and 3x3 convolutions, max pooling, strides, data augmentation, and ReLU activations. It is built as two pipelines because it was trained on two GPUs, a run that took about six days.


Practical implementation of CNN

# Import the necessary libraries
import torch
import matplotlib.pyplot as plt
import numpy as np
import torch.nn.functional as func
from torch import nn
from torchvision import datasets, transforms

# Load the built-in MNIST training set, converting each image to a tensor normalized to [-1, 1]
transform1 = transforms.Compose([transforms.Resize((28, 28)), transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
training_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform1)
training_loader = torch.utils.data.DataLoader(dataset=training_dataset, batch_size=50, shuffle=True)

# Convert a normalized image tensor back to a NumPy array so that it can be plotted
def convert_image(tensor):
    image = tensor.clone().detach().numpy()
    image = image.transpose(1, 2, 0)  # (channels, height, width) -> (height, width, channels)
    image = image * 0.5 + 0.5         # undo Normalize((0.5,), (0.5,))
    image = image.clip(0, 1)
    return image.squeeze()            # drop the single grayscale channel for imshow

# Wrap the training loader in an iterator so that we can fetch one batch at a time
dataiter = iter(training_loader)
images, labels = next(dataiter)
fig = plt.figure(figsize=(50, 30))
for idx in np.arange(20):
    ax = fig.add_subplot(5, 20, idx + 1)
    ax.imshow(convert_image(images[idx]), cmap='gray')
    ax.set_title(str(labels[idx].item()))
plt.show()

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 20, 5, 1)   # 1 input channel, 20 filters, 5x5 kernel, stride 1
        self.conv2 = nn.Conv2d(20, 50, 5, 1)  # 20 input channels, 50 filters, 5x5 kernel, stride 1
        self.fully1 = nn.Linear(4 * 4 * 50, 500)
        self.fully2 = nn.Linear(500, 10)      # one output per digit class

    def forward(self, x):  # nn.Module runs forward() when the model is called on an input
        x = func.relu(self.conv1(x))
        x = func.max_pool2d(x, 2, 2)
        x = func.relu(self.conv2(x))
        x = func.max_pool2d(x, 2, 2)
        x = x.view(-1, 4 * 4 * 50)     # flatten each sample into a vector
        x = func.relu(self.fully1(x))  # apply ReLU to the first fully connected layer
        x = self.fully2(x)             # return raw scores; a loss such as nn.CrossEntropyLoss applies softmax internally
        return x

model = Model()
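The `4*4*50` input size of the first fully connected layer in the code above follows from the layer settings. The sketch below traces the spatial size of a 28x28 MNIST image through the two convolution and pooling stages using the standard output-size formula; the helper `conv_output_size` is a hypothetical name introduced for this check, and it assumes no padding, as in the code above.

```python
# Sketch: trace the spatial size of the feature maps through the network.

def conv_output_size(size, kernel, stride=1, padding=0):
    """Output width (or height) of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1

size = 28                                          # MNIST images are 28x28
size = conv_output_size(size, kernel=5)            # conv1, 5x5 kernel: 24
size = conv_output_size(size, kernel=2, stride=2)  # max pooling 2x2:  12
size = conv_output_size(size, kernel=5)            # conv2, 5x5 kernel: 8
size = conv_output_size(size, kernel=2, stride=2)  # max pooling 2x2:  4

flattened = size * size * 50                       # 50 feature maps from conv2
print(size, flattened)  # 4 800
```

If the input resolution or a kernel size changes, this number changes too, and the first `nn.Linear` layer has to be adjusted to match.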

Output

The code displays 20 images from a randomly shuffled training batch, each titled with its digit label. Because the loader shuffles the data, the digits shown differ from run to run.

Architectures of CNN

Several CNN architectures are in common use:

  • LeNet: It is used by banks to read handwritten digits on cheques.
  • AlexNet: It is used for large-scale image classification.
  • ZF Net: It is a refinement of AlexNet whose layer visualizations are used in a diagnostic role to understand and tune the network.
  • GoogLeNet: It is used for image classification and object detection.
  • VGGNet: It is used for image recognition.
  • ResNet: It uses residual (skip) connections to train very deep neural networks.

Applications of CNN

Some typical applications of CNN can be seen in:

  • Marketing: Using face detection, social media platforms suggest who might be in a photograph, making it easier to tag friends in photo albums.
  • Healthcare: CNNs have been incorporated into radiology technology, helping doctors identify cancerous tumors and distinguish them from healthy anatomy.
  • Retail: Some platforms offer image search, allowing brands to recommend items that look similar to a photo.
  • Automotive: Driverless cars are trained on tasks such as lane line detection to improve driver and passenger safety.


Frequently Asked Questions

  1. What is the difference between neural networks and convolution neural networks?
    A CNN has a different design than a regular neural network. A regular neural network transforms an input by passing it through a series of hidden layers, where every neuron in a layer is fully connected to all neurons in the previous layer. In a CNN, the neurons in a convolutional layer connect only to a small region of the previous layer and share the same weights across the image.
     
  2. Why is CNN used for image classification?
    CNN is used for image classification and recognition because of its high accuracy. A CNN follows a hierarchical model that narrows like a funnel: early layers extract simple features, later layers combine them, and a final fully-connected layer, where all the neurons are connected, produces the output.
     
  3. Why is CNN preferred over other algorithms?
    The main reason CNN is preferred over other algorithms is that it learns the necessary features automatically, without manual feature engineering. For example, if it is given many labeled pictures of cats and dogs, it learns distinctive features for each class by itself. CNN is also computationally efficient because it shares weights and downsamples with pooling.

Key Takeaways

CNN is one of the most commonly used algorithms for image classification; it detects the essential features in an image without manual feature engineering. This article described what a CNN is, how it works, and the architectures that are used in practice. It also walked through a practical implementation of a CNN in code.
