Code360 powered by Coding Ninjas X Naukri.com
Last Updated: Mar 27, 2024
Difficulty: Medium

ResNet in PyTorch

Introduction

For a computer to understand images, it must learn the patterns that distinguish one image from another, from simple features such as edges to complex ones such as whole objects. Deep networks with many layers can learn both kinds of features, and in principle a deeper network should learn better. In practice, very deep networks run into the "vanishing gradient" problem: as gradients are propagated backward through many layers, they can become too small for the early layers to learn anything useful. ResNet, introduced in 2015, was designed to address this problem.

In this article, we will discuss ResNet in PyTorch. We will also explore its architecture and compare it with other CNN architectures.

PyTorch

PyTorch is an open-source framework for building and training deep learning models. It provides a rich set of tools for creating neural networks. One of its key features is the dynamic computational graph, which is built as the code runs and makes it easier to debug models and to develop them flexibly. Many developers find PyTorch approachable because it relies on simple, familiar Python constructs. It also has a large community that supports and improves it, so it is widely used in AI and machine learning.
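
To illustrate what a dynamic computational graph means in practice, here is a minimal sketch. The function name `double_until_large` and the threshold of 10 are hypothetical choices for this illustration. The number of operations recorded by autograd depends on an ordinary Python `while` loop that is evaluated at run time.

```python
import torch

def double_until_large(x, threshold=10.0):
    # Hypothetical helper: plain Python control flow decides, at run time,
    # how many multiplications end up in the autograd graph.
    y = x
    steps = 0
    while y.norm() < threshold:
        y = y * 2
        steps += 1
    return y.sum(), steps

x = torch.tensor([1.0, 1.0], requires_grad=True)
out, steps = double_until_large(x)
out.backward()  # gradients flow back through exactly the operations that ran

print("doublings:", steps)
print("gradient:", x.grad)
```

Because the loop ran three times for this input, the output is 8 times the sum of `x`, so each element of `x.grad` is 8. A different input could take a different number of iterations, and the graph would change accordingly.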

ResNets

ResNet, short for "Residual Network," is a deep neural network architecture that significantly changed how computers recognize images. Its key idea is the "residual block": instead of learning the full mapping from input to output, each block learns the residual, that is, the difference between the block's output and its input. This makes very deep networks easier to train, which in turn makes them more accurate. ResNet has enabled better models for tasks such as image classification, object detection, image segmentation, and even image generation. It has become one of the best-known and most successful deep learning models.
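
The idea of learning a difference can be written down compactly. In the sketch below, the symbols are assumptions introduced for illustration and do not appear in the original text: H(x) is the mapping a block should ideally compute, F(x) is the residual that the stacked layers learn, y is the block output, and L is the training loss.

```latex
\begin{aligned}
F(x) &= H(x) - x \\
y &= F(x) + x \\
\frac{\partial \mathcal{L}}{\partial x}
  &= \frac{\partial \mathcal{L}}{\partial y}
     \left( I + \frac{\partial F}{\partial x} \right)
\end{aligned}
```

The identity term in the last line suggests why training becomes easier: even when the derivative of F is very small, part of the gradient still reaches x unchanged.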

Residual Block

In ResNet, a "residual block" is the building block of the whole network. Each block stacks a few layers. The key step is that the block's input is added to the output of those stacked layers, and an activation function is then applied to the sum. This direct connection that bypasses the layers is called a "shortcut" or "skip connection." Shortcuts give gradients a direct path back to earlier layers, which mitigates the vanishing gradient problem and makes it possible to train very deep networks effectively. Because of these shortcuts, ResNet performs well on image tasks such as object recognition.
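
The effect of a shortcut on gradient flow can be seen in a deliberately extreme toy case. The sketch below is not the article's ResNet block: `TinyBlock` and `input_gradient` are hypothetical names, and the blocks use linear layers whose weights are all set to zero, so the stacked layers contribute nothing. It compares the gradient that reaches the input with and without the skip connection.

```python
import torch
import torch.nn as nn

class TinyBlock(nn.Module):
    """Hypothetical toy block: two linear layers with an optional skip connection."""
    def __init__(self, dim, use_skip):
        super().__init__()
        self.branch = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.use_skip = use_skip

    def forward(self, x):
        out = self.branch(x)
        return out + x if self.use_skip else out

def input_gradient(use_skip, depth=10, dim=4):
    net = nn.Sequential(*[TinyBlock(dim, use_skip) for _ in range(depth)])
    # Extreme case: zero every weight and bias so each branch outputs zeros
    for p in net.parameters():
        nn.init.zeros_(p)
    x = torch.ones(2, dim, requires_grad=True)
    y = net(x)
    y.sum().backward()
    return y.detach(), x.grad

y_skip, grad_skip = input_gradient(use_skip=True)
y_plain, grad_plain = input_gradient(use_skip=False)

print("with skip    -> gradient at input:", grad_skip[0])
print("without skip -> gradient at input:", grad_plain[0])
```

With the shortcut, the ten blocks act as an identity mapping and the gradient at the input is all ones. Without it, both the output and the gradient are all zeros, so no learning signal reaches the input.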

Implementation of ResNet

Let's take a step-by-step example to understand the working of ResNet better.

Importing the Libraries

In the first step, we import the required libraries: NumPy for numerical computations; torch, the core PyTorch package; torch.nn, which provides the building blocks for neural networks; datasets and transforms from torchvision, for loading datasets and preprocessing images; and SubsetRandomSampler, for drawing random samples from a given subset of a dataset. We also select the device, using the GPU when one is available.

Python

import numpy as np
import torch
import torch.nn as nn
from torchvision import datasets
from torchvision import transforms
from torch.utils.data.sampler import SubsetRandomSampler

# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

Loading the dataset

In the second step, we load the dataset. We use the torchvision library, which gives easy access to many computer vision datasets and to the transforms needed to preprocess them. Here we use CIFAR-10. We define a function called data_loader that returns either the training and validation loaders or the test loader, depending on its arguments.

Python

def data_loader(data_dir,
                batch_size,
                random_seed=42,
                valid_size=0.1,
                shuffle=True,
                test=False):
 
    # Normalize the dataset with mean and standard deviation
    normalize = transforms.Normalize(
        mean=[0.4914, 0.4822, 0.4465],
        std=[0.2023, 0.1994, 0.2010],
    )
    # Define transforms
    transform = transforms.Compose([
            transforms.Resize((224,224)),
            transforms.ToTensor(),
            normalize,
    ])
    if test:
        # Load the test dataset
        dataset = datasets.CIFAR10(
          root=data_dir, train=False,
          download=True, transform=transform,
        )
        # Create a data loader for the test dataset
        test_loader = torch.utils.data.DataLoader(
            dataset, batch_size=batch_size, shuffle=shuffle
        )
        return test_loader
    # load the training and validation datasets
    train_dataset = datasets.CIFAR10(
        root=data_dir, train=True,
        download=True, transform=transform,
    )
    valid_dataset = datasets.CIFAR10(
        root=data_dir, train=True,
        download=True, transform=transform,
    )
    num_train = len(train_dataset)
    indices = list(range(num_train))
    split = int(np.floor(valid_size * num_train))
    if shuffle:
        np.random.seed(random_seed)
        np.random.shuffle(indices)
    # Split the training dataset
    train_idx, valid_idx = indices[split:], indices[:split]
    train_sampler = SubsetRandomSampler(train_idx)
    valid_sampler = SubsetRandomSampler(valid_idx)
    train_loader = torch.utils.data.DataLoader(
        train_dataset, batch_size=batch_size, sampler=train_sampler)
 
    valid_loader = torch.utils.data.DataLoader(
        valid_dataset, batch_size=batch_size, sampler=valid_sampler)
    return (train_loader, valid_loader)
# CIFAR10 dataset
train_loader, valid_loader = data_loader(data_dir='./data',
                                         batch_size=64)
test_loader = data_loader(data_dir='./data',
                              batch_size=64,
                              test=True)

The transform resizes each image to 224x224 and normalizes it with fixed per-channel mean and standard deviation values, which are commonly used values for CIFAR-10. Data loaders are helpful because they let us iterate over the data in small batches, which is useful when dealing with large datasets with millions of images.
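
As a rough sketch of where such per-channel values come from, the example below computes a mean and standard deviation for each color channel and applies the same (x - mean) / std operation that transforms.Normalize performs. The random tensor `images` is a stand-in assumption used so the example runs without downloading CIFAR-10, so the numbers it produces are not the CIFAR-10 statistics.

```python
import torch

torch.manual_seed(0)

# Stand-in for a stack of RGB images with shape (N, C, H, W) and values in [0, 1]
images = torch.rand(100, 3, 32, 32)

# Reduce over the batch, height, and width axes, leaving one value per channel
mean = images.mean(dim=(0, 2, 3))
std = images.std(dim=(0, 2, 3))

# Per-channel normalization, broadcast over the other axes
normalized = (images - mean[None, :, None, None]) / std[None, :, None, None]

print("per-channel mean:", mean)
print("per-channel std: ", std)
print("after normalization:", normalized.mean(dim=(0, 2, 3)), normalized.std(dim=(0, 2, 3)))
```

After normalization, each channel has a mean close to 0 and a standard deviation close to 1, which tends to make optimization better behaved.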

Build Residual Block

To begin constructing the network, we build a residual block that we can reuse throughout the network. The block contains a skip connection that adds the input x directly to the block's output. It also takes an optional downsample module, which is applied to x when the input and output shapes differ so that the two can be added. Residual learning of this kind makes the network easier to train.

Python

class CodingNinjasResidualBlock(nn.Module):

    # Set parameters and attributes
    def __init__(self, cn_in_channels, cn_out_channels, cn_stride=1, cn_downsample=None):
        super(CodingNinjasResidualBlock, self).__init__()

        # First convolutional layer
        self.cn_conv1 = nn.Sequential(
            nn.Conv2d(cn_in_channels, cn_out_channels, kernel_size=3, stride=cn_stride, padding=1),
            nn.BatchNorm2d(cn_out_channels),
            nn.ReLU()
        )

        # Second convolutional layer
        self.cn_conv2 = nn.Sequential(
            nn.Conv2d(cn_out_channels, cn_out_channels, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(cn_out_channels)
        )
        self.cn_downsample = cn_downsample
        self.cn_relu = nn.ReLU()
        self.cn_out_channels = cn_out_channels
    
    def forward(self, x):
        residual = x
        out = self.cn_conv1(x)
        out = self.cn_conv2(out)
        if self.cn_downsample is not None:
            residual = self.cn_downsample(x)
        out += residual
        out = self.cn_relu(out)
        return out
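
To see why the optional downsample module is needed, consider a block that halves the spatial size and doubles the channel count. The sketch below rebuilds only the two paths of such a block as standalone modules. The names `main_path` and `shortcut` are hypothetical and stand in for the two convolutional layers and the downsample argument of the class above.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 56, 56)  # (batch, channels, height, width)

# Main path: same layer pattern as the block above, with stride 2 in the first conv
main_path = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),
    nn.BatchNorm2d(128),
    nn.ReLU(),
    nn.Conv2d(128, 128, kernel_size=3, stride=1, padding=1),
    nn.BatchNorm2d(128),
)

# Shortcut path: a 1x1 convolution with the same stride projects x to the new shape
shortcut = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=1, stride=2),
    nn.BatchNorm2d(128),
)

out = main_path(x)
projected = shortcut(x)

print("input:    ", tuple(x.shape))
print("main path:", tuple(out.shape))
print("shortcut: ", tuple(projected.shape))

# The addition only works because the shortcut was projected to the same shape
result = torch.relu(out + projected)
```

Adding `x` itself to `out` would fail here because the shapes differ, whereas the projected shortcut matches the main path exactly.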

Construct ResNet

Now that our residual block is ready, we can build ResNet, a powerful deep learning architecture for image classification. The network has four stages of residual blocks, and we build each stage with a helper function called _make_layer. This function adds the blocks one by one and attaches a downsample module to the first block when the stride or the number of channels changes.

After the residual stages, we add an average pooling layer to reduce the spatial dimensions further. Finally, a linear layer produces one score for each class in our image classification task.

Python

class CodingNinjasResNet(nn.Module):
    # Set the parameters and attributes
    def __init__(self, cn_block, cn_layers, cn_num_classes=10):
        super(CodingNinjasResNet, self).__init__()
        self.cn_inplanes = 64
        self.cn_conv1 = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),
            nn.BatchNorm2d(64),
            nn.ReLU()
        )
        self.cn_maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.cn_layer0 = self._make_layer(cn_block, 64, cn_layers[0], stride=1)
        self.cn_layer1 = self._make_layer(cn_block, 128, cn_layers[1], stride=2)
        self.cn_layer2 = self._make_layer(cn_block, 256, cn_layers[2], stride=2)
        self.cn_layer3 = self._make_layer(cn_block, 512, cn_layers[3], stride=2)
        self.cn_avgpool = nn.AvgPool2d(7, stride=1)
        self.cn_fc = nn.Linear(512, cn_num_classes)
    # Acts like a helper function to create a sequence of residual blocks in a layer
    def _make_layer(self, cn_block, cn_planes, cn_blocks, stride=1):
        cn_downsample = None
        if stride != 1 or self.cn_inplanes != cn_planes:
            cn_downsample = nn.Sequential(
                nn.Conv2d(self.cn_inplanes, cn_planes, kernel_size=1, stride=stride),
                nn.BatchNorm2d(cn_planes),
            )
        layers = []
        layers.append(cn_block(self.cn_inplanes, cn_planes, stride, cn_downsample))
        self.cn_inplanes = cn_planes
        for i in range(1, cn_blocks):
            layers.append(cn_block(self.cn_inplanes, cn_planes))
        return nn.Sequential(*layers)
    def forward(self, x):
        x = self.cn_conv1(x)
        x = self.cn_maxpool(x)
        x = self.cn_layer0(x)
        x = self.cn_layer1(x)
        x = self.cn_layer2(x)
        x = self.cn_layer3(x)
        x = self.cn_avgpool(x)
        x = x.view(x.size(0), -1)
        x = self.cn_fc(x)
        return x

# Example usage:
# Instantiate the CodingNinjasResNet
cn_resnet = CodingNinjasResNet(CodingNinjasResidualBlock, [2, 2, 2, 2], cn_num_classes=10)

# Generate a random input tensor (batch_size=1, 3 channels, 224x224 image)
input_tensor = torch.randn(1, 3, 224, 224)

# Pass the input tensor through the model
output = cn_resnet(input_tensor)

# Print the output shape
print("Output shape:", output.shape)
Output:

Output shape: torch.Size([1, 10])
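
The spatial sizes inside the network can be checked by hand with the standard output-size formula for convolution and pooling layers. The sketch below is pure Python, and the helper names `out_size` and `trace_sizes` are hypothetical. It traces one spatial dimension through the first convolution, the max pooling layer, and the four residual stages defined above, stopping just before the average pooling layer.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Output length of a conv or pooling layer along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def trace_sizes(size):
    """Sizes after conv1, maxpool, and each of the four residual stages."""
    sizes = [size]
    size = out_size(size, kernel=7, stride=2, padding=3)  # cn_conv1
    sizes.append(size)
    size = out_size(size, kernel=3, stride=2, padding=1)  # cn_maxpool
    sizes.append(size)
    for stride in (1, 2, 2, 2):  # cn_layer0 .. cn_layer3 (first 3x3 conv of each stage)
        size = out_size(size, kernel=3, stride=stride, padding=1)
        sizes.append(size)
    return sizes

print(trace_sizes(224))  # [224, 112, 56, 56, 28, 14, 7]
print(trace_sizes(32))   # [32, 16, 8, 8, 4, 2, 1]
```

For a 224x224 input, the last stage produces a 7x7 map, which the 7x7 average pooling reduces to 1x1, leaving 512 features for the linear layer. For a native 32x32 CIFAR-10 image, the map would already be 1x1 before pooling, which is smaller than the pooling window. This is presumably why the transform resizes images to 224x224.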

Setting the Hyperparameters

Next, we set the hyperparameters. It is usually necessary to experiment with different values to find the most suitable setting for a model. Here we specify the number of epochs, the batch size, and the learning rate, and we also choose the loss function and the optimizer.

Python

cn_num_classes = 10
cn_num_epochs = 20
cn_batch_size = 16  # note: the loaders created earlier use batch_size=64
cn_learning_rate = 0.01
cn_model = CodingNinjasResNet(CodingNinjasResidualBlock, [3, 4, 6, 3]).to(device)

# Loss and optimizer
cn_criterion = nn.CrossEntropyLoss()
cn_optimizer = torch.optim.SGD(cn_model.parameters(), lr=cn_learning_rate, weight_decay=0.001, momentum=0.9)

# Number of batches per epoch (train_loader comes from the data-loading step)
cn_total_step = len(train_loader)
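
To make the optimizer settings more concrete, the sketch below reproduces by hand what torch.optim.SGD does with momentum and weight decay, using the same values as above. The toy parameter `w`, the quadratic loss, and the name `velocity` are assumptions chosen for illustration. The manual update follows the rule in the PyTorch documentation for the default case (no dampening, no Nesterov momentum).

```python
import torch

lr, momentum, weight_decay = 0.01, 0.9, 0.001

# Toy parameter updated by the real optimizer
w = torch.tensor([1.0, -2.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=lr, momentum=momentum, weight_decay=weight_decay)

# Independent copy updated by hand
w_manual = w.detach().clone()
velocity = torch.zeros_like(w_manual)

for _ in range(3):
    # Optimizer step on loss = sum(w^2), whose gradient is 2 * w
    optimizer.zero_grad()
    loss = (w ** 2).sum()
    loss.backward()
    optimizer.step()

    # Manual step: weight decay adds weight_decay * w to the gradient,
    # momentum accumulates gradients in a running buffer
    grad = 2 * w_manual + weight_decay * w_manual
    velocity = momentum * velocity + grad
    w_manual = w_manual - lr * velocity

print("optimizer:", w.detach())
print("manual:   ", w_manual)
```

Both versions end at the same values, which shows that weight decay acts as an extra pull toward zero and momentum as a running sum of past gradients.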

Train the Model

To train the model, we load images in batches from train_loader. For each batch, the model predicts labels, the loss function measures the loss between the predictions and the true labels, and backpropagation computes the gradients that the optimizer uses to update the weights. After each epoch, we evaluate the model on the validation set with gradient computation turned off, which speeds up evaluation and saves memory. Repeating these steps over many epochs gradually improves the model on our image classification task.

Python

import gc

cn_total_step = len(train_loader)

for cn_epoch in range(cn_num_epochs):
    cn_model.train()  # training mode: batch norm uses batch statistics
    for cn_i, (cn_images, cn_labels) in enumerate(train_loader):
        # Move the batch to the configured device
        cn_images = cn_images.to(device)
        cn_labels = cn_labels.to(device)

        # Forward pass
        cn_outputs = cn_model(cn_images)
        cn_loss = cn_criterion(cn_outputs, cn_labels)

        # Backward pass and optimization
        cn_optimizer.zero_grad()
        cn_loss.backward()
        cn_optimizer.step()

        # Release the memory held by this batch
        del cn_images, cn_labels, cn_outputs
        torch.cuda.empty_cache()
        gc.collect()

    # Report the loss of the last batch in this epoch
    print('Epoch [{}/{}], Loss: {:.4f}'
          .format(cn_epoch + 1, cn_num_epochs, cn_loss.item()))

    # Validation: evaluation mode, no gradients needed
    cn_model.eval()
    with torch.no_grad():
        cn_correct = 0
        cn_total = 0
        for cn_images, cn_labels in valid_loader:
            cn_images = cn_images.to(device)
            cn_labels = cn_labels.to(device)
            cn_outputs = cn_model(cn_images)
            _, cn_predicted = torch.max(cn_outputs, 1)
            cn_total += cn_labels.size(0)
            cn_correct += (cn_predicted == cn_labels).sum().item()
            del cn_images, cn_labels, cn_outputs

    print('Accuracy of the network on the {} validation images: {} %'.format(cn_total, 100 * cn_correct / cn_total))
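
The accuracy computation in the validation loop can be followed on a tiny hand-made batch. In the sketch below, the `logits` and `labels` values are made up for illustration. The predicted class is the index of the largest score in each row, and accuracy is the share of predictions that match the labels.

```python
import torch

# Made-up scores for 4 images and 3 classes (one row per image)
logits = torch.tensor([[2.0, 0.5, 0.1],
                       [0.2, 1.5, 0.3],
                       [0.1, 0.2, 3.0],
                       [1.0, 0.9, 0.8]])
labels = torch.tensor([0, 1, 2, 1])

with torch.no_grad():  # no gradients are needed for evaluation
    predicted = logits.argmax(dim=1)              # index of the highest score per row
    correct = (predicted == labels).sum().item()  # number of matching predictions

accuracy = 100 * correct / labels.size(0)
print("predicted:", predicted.tolist())
print("accuracy: {} %".format(accuracy))
```

Three of the four predictions match, so the accuracy is 75 %. The training loop above accumulates the same two counts over every validation batch before dividing.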

Comparing ResNet with other CNN architectures

Below is a comparison of ResNet with other CNN architectures.

| Architecture | Year | Features | Use cases |
|---|---|---|---|
| LeNet | 1998 | One of the earliest applications of CNNs. Uses sigmoid or tanh activation functions. | Mostly used in handwritten digit recognition. |
| AlexNet | 2012 | Popularized the ReLU activation in deep CNNs. Trained on GPUs. | Mainly used in image recognition. |
| VGGNet | 2014 | A deep network with small convolutional filters. Has several configurations, such as VGG16 and VGG19. | Used in image recognition. |
| ResNet | 2015 | Uses residual connections, also called "skip connections" or "shortcuts," to tackle vanishing gradients. Has several configurations, such as ResNet-50, ResNet-101, and ResNet-152. | Mostly used in large-scale image recognition. |
| GoogLeNet | 2014 | Uses Inception modules for multi-scale features. It is also known as Inception v1; later versions include Inception v2 and v3. | Primarily used in image classification and object recognition. |

Frequently Asked Questions

What is ResNet in PyTorch?

ResNet is a deep neural network architecture, readily implemented in PyTorch, that was created to overcome the vanishing gradient problem by using residual blocks with skip connections.

How does ResNet work using skip connections?

ResNet uses skip connections to carry a block's input directly past one or more layers and add it to their output. During training, this gives gradients a direct path backward, which improves gradient flow.

What are residual blocks in ResNet?

Residual blocks are the essential building blocks of ResNet. Each block has a shortcut connection that adds the original input to the output of a group of convolutional layers.

What issue is resolved by ResNet architecture?

ResNet addresses the degradation problem, in which adding more layers makes a network harder to train and less accurate. It enables very deep networks (with more than 100 layers) to train without suffering from vanishing gradients.

How is the architecture of ResNet categorized?

ResNet architectures are categorized by depth, such as ResNet-18, ResNet-34, and ResNet-50. The number reflects the total number of weighted layers, which includes the convolutional and fully connected layers.
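
The naming can be checked with simple arithmetic. The sketch below counts the first convolution, the convolutions inside the residual blocks, and the final fully connected layer. The helper `resnet_depth` is a hypothetical name. The per-stage block counts are the standard ones for each variant, and the 1x1 convolutions on downsampling shortcuts are conventionally not counted.

```python
def resnet_depth(blocks_per_stage, convs_per_block):
    """First conv + convs inside all residual blocks + final fully connected layer."""
    return 1 + convs_per_block * sum(blocks_per_stage) + 1

# (blocks in each of the four stages, convolutions per block)
# Basic blocks have 2 convolutions; bottleneck blocks have 3.
variants = {
    "ResNet-18": ([2, 2, 2, 2], 2),
    "ResNet-34": ([3, 4, 6, 3], 2),
    "ResNet-50": ([3, 4, 6, 3], 3),
    "ResNet-101": ([3, 4, 23, 3], 3),
    "ResNet-152": ([3, 8, 36, 3], 3),
}

depths = {name: resnet_depth(blocks, convs) for name, (blocks, convs) in variants.items()}
for name, depth in depths.items():
    print(name, "->", depth, "layers")
```

By this count, the [2, 2, 2, 2] configuration used in the example usage earlier corresponds to the ResNet-18 layout, and the [3, 4, 6, 3] configuration used for training, combined with the two-convolution block, corresponds to the ResNet-34 layout.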

Conclusion

In this article, we learned about ResNet in PyTorch. We looked at some of the features of PyTorch that attract developers to it, and at what a residual block is and why it is crucial. To understand ResNet better, we implemented it, trained it on the CIFAR-10 dataset, and measured its accuracy on a validation set. We also compared ResNet with other architectures such as VGGNet and AlexNet to understand what makes ResNet special.


You can find more informative articles and blogs on our platform. You can also practice coding problems and prepare for interview questions from well-known companies on Coding Ninjas Studio.
