Table of contents
1. Introduction
2. Brief Introduction to CNNs
2.1. CNN Layers
3. Introduction to Pooling Layer
3.1. Types of Pooling Operations
4. Why do we need Pooling Layers in CNN?
4.1. Dimensionality Reduction
4.2. Reducing Overfitting
4.3. Increasing Receptive Field
5. Pooling Operations in Python
5.1. Using PyTorch
5.2. Using TensorFlow
6. Advantages of Pooling Layer in CNNs
6.1. Spatial Hierarchy
6.2. Translation Invariance
6.3. Computational Efficiency
6.4. Feature Generalization
7. Disadvantages of Pooling Layer in CNNs
7.1. Loss of Spatial Information
7.2. Sensitivity to Hyperparameters
7.3. Unsuitable for All Data Types
8. Alternatives to Pooling Layer in CNN
8.1. Strided Convolution
8.2. Global Average Pooling
8.3. Attention Mechanisms
9. Frequently Asked Questions
9.1. What are some commonly used Activation Functions?
9.2. Can a CNN work without Pooling Layers?
9.3. Can CNNs work with other types of data besides images?
10. Conclusion
Last Updated: Mar 27, 2024

Pooling Layer in Convolutional Neural Network

Author Abhinav Anand

Introduction

Convolutional Neural Networks, or CNNs, are neural networks specialized for tasks involving visual data. They contain layers such as the convolutional layer, which detects features, and the Pooling Layer, which handles downsampling.

[Figure: pooling layer in a CNN]

This article explains the Pooling Layer in Convolutional Neural Networks. We first go over the different layers in a CNN and then focus on the Pooling Layer. You will also learn how to apply pooling operations with PyTorch and TensorFlow.

Let’s get started.

Brief Introduction to CNNs

[Figure: CNN architecture. Source: commons.wikimedia.org]

A Convolutional Neural Network is a type of deep learning model designed to process visual data such as images and videos. It consists of many layers that work together to extract features from the input data and process them.

CNN Layers

A CNN consists of the following layers:-

  • Input Layer: This layer receives the raw input data, such as the pixel values of an image, and passes it on to the following layers without transforming it.
     
  • Convolutional Layer(s): These layers apply convolution operations to the data received from the input layer. They slide learned filters over the input to detect patterns such as edges and textures, and store the results in feature maps.
     
  • Activation Layer: After the convolution, an activation function (commonly ReLU) is applied element-wise to the feature maps to introduce non-linearity into the CNN.
     
  • Pooling Layers: These layers reduce the spatial dimensions of the data while retaining crucial information. This reduction in dimensions is also known as downsampling. We will discuss these layers in the next section.
     
  • Additional Convolutional and Activation Layers: CNNs usually consist of multiple stacks of these layers to learn more complex hierarchical features.
     
  • Flattening Layer: This layer reshapes the stack of 2D feature maps into a single 1D vector, because the fully connected layers that follow expect a flat input.
     
  • Fully Connected Layers: These layers make high-level decisions based on the features extracted from the input data.
     
  • Output Layer: This layer produces the network’s outputs based on the data processed in the previous layers. The number of nodes in this layer depends on the task; for example, a classifier for 10 classes has 10 output nodes.
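To make the order of these layers concrete, here is a minimal NumPy sketch of one forward pass through a toy CNN with a single convolution, activation, and pooling stage. The 8x8 image size, the four 3x3 filters, the random weights, and the helper names (`conv2d`, `max_pool2d`, `forward`) are illustrative assumptions rather than any library's API; a real network learns its weights during training and usually stacks several convolution and pooling stages.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(image, kernels):
    """Valid (no padding), stride-1 convolution of a 2D image with a stack of kernels.
    Like deep learning libraries, this computes cross-correlation."""
    n_filters, k, _ = kernels.shape
    h, w = image.shape
    out = np.zeros((n_filters, h - k + 1, w - k + 1))
    for f in range(n_filters):
        for i in range(h - k + 1):
            for j in range(w - k + 1):
                out[f, i, j] = np.sum(image[i:i + k, j:j + k] * kernels[f])
    return out

def max_pool2d(maps, size=2):
    """Non-overlapping max pooling (stride == size) applied to each feature map."""
    c, h, w = maps.shape
    h, w = h // size * size, w // size * size  # drop rows/cols that don't fill a window
    return maps[:, :h, :w].reshape(c, h // size, size, w // size, size).max(axis=(2, 4))

def forward(image, kernels, fc_weights, fc_bias):
    features = conv2d(image, kernels)      # convolutional layer -> (4, 6, 6)
    features = np.maximum(features, 0)     # activation layer (ReLU)
    features = max_pool2d(features)        # pooling layer -> (4, 3, 3)
    flat = features.reshape(-1)            # flattening layer -> (36,)
    logits = fc_weights @ flat + fc_bias   # fully connected layer -> (3,)
    exp = np.exp(logits - logits.max())    # output layer: softmax over 3 classes
    return exp / exp.sum()

image = rng.random((8, 8))                 # input layer: an 8x8 grayscale image
kernels = rng.standard_normal((4, 3, 3))   # 4 hypothetical 3x3 filters
fc_weights = rng.standard_normal((3, 4 * 3 * 3))
fc_bias = np.zeros(3)

probs = forward(image, kernels, fc_weights, fc_bias)
print(probs.shape, probs.sum())
```

Note how the pooling layer shrinks each 6x6 feature map to 3x3, so the fully connected layer needs 36 inputs instead of 144.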
     

The next section will discuss the Pooling Layer in Convolutional Neural Networks.

Introduction to Pooling Layer

The convolutional layers in a CNN extract features from the input data and produce feature maps with large spatial dimensions that can increase the computational complexity. The Pooling Layer reduces the spatial dimensions of feature maps while preserving important information. Let’s see how this layer works.

[Figure: downsampling example]

The Pooling Layer slides a two-dimensional window, often called a filter or kernel, over the input feature map and computes one output value for each region it covers. Unlike a convolutional filter, this window has no learnable weights. It moves from region to region until it has covered the entire feature map. There are two important hyperparameters for the filter:-

  1. Size: Determines the size of the region considered at each step.
     
  2. Stride: Determines how many pixels the filter will move after each operation.
     

If the stride is smaller than the filter size, neighboring regions overlap; otherwise, they do not. Without padding, the size of each spatial dimension of the output feature map can be found using the following formula:-

Output Size = floor((Input Size - Filter Size) / Stride) + 1
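As a quick check of the formula, the sketch below evaluates it with integer (floor) division for one spatial dimension. The function name `pooled_output_size` is a hypothetical choice for illustration. With default settings (no padding, `ceil_mode=False` in PyTorch, `padding="valid"` in Keras), this floor rounding is what those libraries use.

```python
def pooled_output_size(input_size, filter_size, stride):
    """Output length along one spatial dimension for pooling without padding."""
    return (input_size - filter_size) // stride + 1

# 2x2 filter, stride 2 on a 4x4 map: non-overlapping regions, output is 2x2
print(pooled_output_size(4, 2, 2))  # 2
# 3x3 filter, stride 2 on a 7x7 map: overlapping regions (stride < filter size)
print(pooled_output_size(7, 3, 2))  # 3
# 2x2 filter, stride 2 on a 5x5 map: the last row/column does not fill a window
print(pooled_output_size(5, 2, 2))  # 2
```

The last case shows why the floor matters: the leftover edge row and column are simply dropped.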

Types of Pooling Operations

The output value for each region can be computed using the following operations:-

[Figure: types of pooling operations]
  • Max Pooling: This operation captures the most dominant feature in a region by using the maximum value within that region.
     
  • Average Pooling: This operation results in a smoothed representation of a region as the output value is computed by taking the average of all the values within the region.
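Both operations follow the same sliding-window loop and differ only in how each region is reduced. Here is a from-scratch NumPy sketch; the function name `pool2d` and its `mode` parameter are illustrative, not a library API. It uses the same 4x4 matrix as the PyTorch and TensorFlow examples later in this article.

```python
import numpy as np

def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Slide a size x size window over a 2D feature map and reduce each region.

    mode="max" keeps the largest value in each region (max pooling);
    mode="avg" keeps the mean of each region (average pooling).
    No padding is used, so windows that would run past the edge are dropped.
    """
    reduce = np.max if mode == "max" else np.mean
    h, w = feature_map.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.empty((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            region = feature_map[i * stride:i * stride + size,
                                 j * stride:j * stride + size]
            out[i, j] = reduce(region)
    return out

matrix = np.arange(1, 17, dtype=float).reshape(4, 4)  # rows 1-4, 5-8, 9-12, 13-16

print(pool2d(matrix, mode="max"))           # expected values: [[6, 8], [14, 16]]
print(pool2d(matrix, mode="avg"))           # expected values: [[3.5, 5.5], [11.5, 13.5]]
print(pool2d(matrix, size=3, stride=1))     # overlapping windows: [[11, 12], [15, 16]]
```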

 

There is another type of pooling, known as Global Pooling. In this technique, each entire feature map is reduced to a single value by applying the max or average operation over all of its positions at once, instead of sliding a filter over it.
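In array terms, global pooling is just a reduction over the height and width axes. The sketch below assumes a hypothetical stack of three 4x4 feature maps in (channels, height, width) order. In the frameworks, the same idea is available as, for example, `torch.nn.functional.adaptive_avg_pool2d(x, 1)` in PyTorch and `tf.keras.layers.GlobalAveragePooling2D` in Keras.

```python
import numpy as np

# A hypothetical stack of 3 feature maps, shape (channels, height, width)
feature_maps = np.arange(3 * 4 * 4, dtype=float).reshape(3, 4, 4)

# Global pooling reduces each whole map to one number, with no sliding window
global_max = feature_maps.max(axis=(1, 2))   # shape (3,)
global_avg = feature_maps.mean(axis=(1, 2))  # shape (3,)

print(global_max)  # values 15, 31, 47
print(global_avg)  # values 7.5, 23.5, 39.5
```

The output length equals the number of channels, whatever the height and width of the maps are.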

Now, let’s discuss some reasons for using pooling layers in convolutional neural networks.

Why do we need Pooling Layers in CNN?

Pooling layers are an important part of CNNs, and they are used for the following reasons:-

Dimensionality Reduction

Pooling layers are primarily used for reducing the spatial dimension (width and height) of feature maps generated by convolutional layers while retaining important information.

Reducing Overfitting

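Pooling layers have no learnable parameters. By summarizing the information in feature maps, they reduce the number of values, and therefore the number of weights, that later layers such as fully connected layers need, which acts as a mild form of regularization.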

Increasing Receptive Field

Nodes in the deeper layers of a CNN have a larger receptive field, which means they can perceive larger regions of the input image. Stacking convolutional layers already grows the receptive field, but each pooling layer makes it grow much faster, which helps the network capture larger, more complex hierarchical patterns.
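The effect of pooling on the receptive field can be computed with the standard recurrence for stacked layers: each layer adds (kernel size - 1) times the current spacing between units, and its stride multiplies that spacing. The sketch below assumes square kernels, and the function name `receptive_field` is illustrative.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of one unit after a stack of layers.

    `layers` is a list of (kernel_size, stride) pairs. Padding shifts where the
    field sits but does not change its size, so it is ignored here.
    """
    rf, jump = 1, 1  # a single input pixel; spacing between adjacent units
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf

conv3 = (3, 1)  # 3x3 convolution, stride 1
pool2 = (2, 2)  # 2x2 pooling, stride 2

print(receptive_field([conv3, conv3, conv3]))                # 7
print(receptive_field([conv3, pool2, conv3, pool2, conv3]))  # 18
```

With the same three convolutions, inserting two pooling layers raises the receptive field from 7x7 to 18x18 input pixels.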

In the next section, you will learn how to use pooling in Python.

Pooling Operations in Python

We will use PyTorch and TensorFlow to perform both max and average pooling operations on a 4x4 matrix, which is similar to a small grayscale input image.

Using PyTorch


```python
import torch
import torch.nn.functional as F

# creating a sample 2D matrix
matrix = torch.tensor([[1, 2, 3, 4],
                       [5, 6, 7, 8],
                       [9, 10, 11, 12],
                       [13, 14, 15, 16]], dtype=torch.float32)

# reshaping the input matrix to match the shape that PyTorch expects
matrix = matrix.view(1, 1, 4, 4)  # (batch_size, channels, height, width)

# max pooling with a 2x2 window and stride 2
max_pooled = F.max_pool2d(matrix, kernel_size=2, stride=2)
print("Max Pooled Matrix:\n", max_pooled)

# average pooling with a 2x2 window and stride 2
avg_pooled = F.avg_pool2d(matrix, kernel_size=2, stride=2)
print("Average Pooled Matrix:\n", avg_pooled)
```

 

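Output:

[Output screenshot not reproduced. The max-pooled tensor holds the values 6, 8, 14, and 16, and the average-pooled tensor holds 3.5, 5.5, 11.5, and 13.5, each with shape (1, 1, 2, 2).]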

 

We first imported PyTorch and its functional module. Then, we created a sample 2D matrix and reshaped it to the (batch_size, channels, height, width) format that PyTorch expects. Finally, we applied the max and average pooling operations using the max_pool2d and avg_pool2d functions from torch.nn.functional.
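If you want to check these numbers without installing PyTorch, non-overlapping pooling (window size equal to stride) can be written in NumPy as a reshape followed by a reduction. This is a cross-check that only works when the height and width are divisible by the window size; it is not how PyTorch implements pooling internally.

```python
import numpy as np

matrix = np.array([[1, 2, 3, 4],
                   [5, 6, 7, 8],
                   [9, 10, 11, 12],
                   [13, 14, 15, 16]], dtype=np.float32)

# Split the 4x4 matrix into a 2x2 grid of 2x2 blocks;
# axes are (block_row, row_in_block, block_col, col_in_block)
blocks = matrix.reshape(2, 2, 2, 2)

max_pooled = blocks.max(axis=(1, 3))   # largest value in each block
avg_pooled = blocks.mean(axis=(1, 3))  # mean of each block
print(max_pooled)  # expected values: [[6, 8], [14, 16]]
print(avg_pooled)  # expected values: [[3.5, 5.5], [11.5, 13.5]]
```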

Using TensorFlow


```python
import tensorflow as tf

# creating a sample 2D matrix
matrix = tf.constant([[1, 2, 3, 4],
                      [5, 6, 7, 8],
                      [9, 10, 11, 12],
                      [13, 14, 15, 16]], dtype=tf.float32)

# expanding dimensions to match the expected input shape for Keras
matrix = tf.expand_dims(matrix, axis=0)  # adds batch size 1 to the tensor

# expanding dimensions to specify the number of channels (1 for grayscale)
matrix = tf.expand_dims(matrix, axis=-1)  # adds a single color channel
# matrix now has shape (batch, height, width, channels) = (1, 4, 4, 1)

# max pooling with a 2x2 filter and stride 2
max_pooling_layer = tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2))
max_pooled = max_pooling_layer(matrix)
print("Max Pooled Matrix:\n", max_pooled)

# average pooling with a 2x2 filter and stride 2
avg_pooling_layer = tf.keras.layers.AveragePooling2D(pool_size=(2, 2), strides=(2, 2))
avg_pooled = avg_pooling_layer(matrix)
print("Average Pooled Matrix:\n", avg_pooled)
```

 

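Output:

[Output screenshot not reproduced. The max-pooled tensor holds the values 6, 8, 14, and 16, and the average-pooled tensor holds 3.5, 5.5, 11.5, and 13.5, each with shape (1, 2, 2, 1).]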

We first imported TensorFlow and created a 2D tensor representing a grayscale image. Keras pooling layers expect a 4D input of shape (batch, height, width, channels) by default, so we used tf.expand_dims twice to add the batch and channel dimensions. We then created pooling layers with the MaxPooling2D and AveragePooling2D classes from tf.keras.layers and applied them to the tensor.

In the following sections, we will discuss the various advantages and disadvantages of Pooling Layers in Convolutional Neural Networks.

Advantages of Pooling Layer in CNNs

The following are some advantages of the Pooling Layer in CNNs:-

Spatial Hierarchy

Pooling Layers help the network capture increasingly high-level features as it goes deeper. For example, the initial layers may only detect simple features such as edges and textures, while the later layers, thanks to downsampling, cover larger regions of the input and can combine those simple features into more complex ones.

Translation Invariance

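The downsampling by Pooling Layers makes the CNN less sensitive to the exact position of features in an image: a feature that shifts slightly, within the same pooling region, produces the same pooled output. This helps the network recognize patterns even when they are not perfectly aligned, although pooling on its own only provides invariance to small shifts.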
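The following NumPy sketch illustrates how local this invariance is. It assumes a hypothetical 4x4 feature map with a single strong activation and uses 2x2 max pooling with stride 2; the helper name `max_pool_2x2` is illustrative.

```python
import numpy as np

def max_pool_2x2(x):
    """Non-overlapping 2x2 max pooling (stride 2) for a map with even height and width."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# A 4x4 map with one strong activation (a detected "feature") in the top-left corner
original = np.zeros((4, 4))
original[0, 0] = 9.0

# The same feature shifted one pixel right, still inside the same 2x2 window
shifted_within = np.roll(original, shift=1, axis=1)

# The feature shifted two pixels right, which moves it into the neighboring window
shifted_across = np.roll(original, shift=2, axis=1)

print(np.array_equal(max_pool_2x2(original), max_pool_2x2(shifted_within)))  # True
print(np.array_equal(max_pool_2x2(original), max_pool_2x2(shifted_across)))  # False
```

A one-pixel shift leaves the pooled map unchanged, while a shift into the next window does not, which is why deeper stacks of pooling layers are needed for tolerance to larger shifts.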

Computational Efficiency

Pooling operations are very cheap to compute compared with convolutions, and they have no learnable parameters. They also reduce the spatial dimensions of the feature maps, so the layers that follow have less data to process.
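To put a number on the savings, the sketch below counts the multiply-accumulate operations (MACs) of a 3x3 convolution applied before and after a 2x2, stride-2 pooling layer. The layer sizes (64 channels, 56x56 maps) are hypothetical, and the count assumes stride-1 "same" padding and ignores biases and the (cheap) pooling step itself.

```python
def conv_macs(height, width, in_channels, out_channels, kernel_size):
    """Multiply-accumulate operations for a stride-1, 'same'-padded convolution."""
    return height * width * out_channels * in_channels * kernel_size * kernel_size

# Hypothetical sizes: 64 feature maps of 56x56 feeding a 3x3 convolution with 64 filters
before = conv_macs(56, 56, 64, 64, 3)
after = conv_macs(28, 28, 64, 64, 3)  # after 2x2 pooling with stride 2

print(before, after, before / after)  # 115605504 28901376 4.0
```

Halving the height and width leaves a quarter of the positions, so the next convolution does four times less work.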

Feature Generalization

Pooling helps CNNs focus on important features by summarizing the information in local regions. Max pooling in particular keeps only the strongest response in each region, which suppresses weak, noisy activations.

Disadvantages of Pooling Layer in CNNs

The following are some disadvantages of the Pooling Layer in CNNs:-

Loss of Spatial Information

Pooling Layers downsample the input feature maps and discard fine details, which can be important in applications like medical image analysis.
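A small NumPy sketch shows what "discarding fine details" means: two different patterns inside one pooling window become indistinguishable after pooling. The 4x4 maps holding a tiny "\" stroke and a tiny "/" stroke are hypothetical inputs, and `pool_2x2` is an illustrative helper.

```python
import numpy as np

def pool_2x2(x, reduce):
    """Non-overlapping 2x2 pooling (stride 2); `reduce` is np.max or np.mean."""
    h, w = x.shape
    return reduce(x.reshape(h // 2, 2, w // 2, 2), axis=(1, 3))

# Two different 4x4 maps: a "\" stroke and a "/" stroke in the top-left 2x2 window
backslash = np.zeros((4, 4))
backslash[0, 0] = backslash[1, 1] = 1.0
slash = np.zeros((4, 4))
slash[0, 1] = slash[1, 0] = 1.0

print(np.array_equal(backslash, slash))  # False: the inputs differ
for name, reduce in (("max", np.max), ("average", np.mean)):
    same = np.array_equal(pool_2x2(backslash, reduce), pool_2x2(slash, reduce))
    print(name, same)  # True for both: the stroke orientation is lost
```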

Sensitivity to Hyperparameters

The behavior of pooling depends strongly on the size and stride hyperparameters: a large window or stride discards more information, while a small one reduces the dimensions less. Selecting appropriate values can be challenging and may require some trial and error.

Unsuitable for All Data Types

Standard pooling operations assume a regular grid, so they cannot be applied directly to irregular data like graphs or point clouds, where each data point can have a different number of neighbors or relationships.

Now, let’s take a look at some alternatives to pooling layers.

Alternatives to Pooling Layer in CNN

The following are some alternatives to pooling layers in CNNs:-

Strided Convolution

Instead of using pooling layers, the stride of the convolutional layers can be increased. A convolution with a stride greater than 1 computes outputs at fewer positions, so it shrinks the feature maps much like pooling does, but the downsampling uses learned weights.
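Here is a minimal NumPy sketch of a single-channel strided convolution. The 8x8 input and the averaging 3x3 kernel are hypothetical (a trained network would learn the kernel), and `conv2d_strided` is an illustrative name. In the frameworks, the same idea is expressed by passing a stride to the convolution layer, for example `nn.Conv2d(..., stride=2)` in PyTorch or `Conv2D(..., strides=2)` in Keras.

```python
import numpy as np

def conv2d_strided(image, kernel, stride=1):
    """Single-channel 'valid' convolution (cross-correlation, as in CNN libraries)
    evaluated only at every `stride`-th position."""
    k = kernel.shape[0]
    h, w = image.shape
    out_h = (h - k) // stride + 1
    out_w = (w - k) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + k, j * stride:j * stride + k]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(64, dtype=float).reshape(8, 8)
kernel = np.ones((3, 3)) / 9.0  # hypothetical kernel; a real network learns it

print(conv2d_strided(image, kernel, stride=1).shape)  # (6, 6)
print(conv2d_strided(image, kernel, stride=2).shape)  # (3, 3)
```

The output size follows the same formula as pooling, so a stride of 2 roughly halves each spatial dimension.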

Global Average Pooling

Instead of sliding filters over the feature maps, a single value is computed for each map by averaging its values across the entire spatial extent. It is commonly used at the end of a network, in place of flattening, to turn each feature map into one number. We discussed global pooling in a previous section as well.

Attention Mechanisms

Attention mechanisms give each part of the input a learned, input-dependent weight instead of treating all parts equally, allowing models to focus on the most relevant parts of the input. These mechanisms are common in deep learning models for natural language processing and computer vision.
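As a rough illustration of how attention can replace a fixed pooling rule, here is a simplified single-query attention pooling sketch in NumPy: each spatial position is scored against a query vector, and a softmax turns the scores into weights for a weighted average. This is not the full multi-head attention used in Transformers; the random features, the `query` vector (which a trained model would learn), and the function names are assumptions for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def attention_pool(features, query):
    """Summarize N feature vectors (shape (N, D)) into one vector of shape (D,).

    Each position is scored by its dot product with `query`; the softmax turns the
    scores into non-negative weights that sum to 1, so positions that match the
    query contribute more than the rest.
    """
    scores = features @ query           # shape (N,)
    weights = softmax(scores)           # shape (N,)
    return weights @ features, weights  # weighted average of the feature vectors

rng = np.random.default_rng(0)
features = rng.standard_normal((16, 8))  # e.g. a 4x4 map with 8 channels, flattened to 16 positions
query = rng.standard_normal(8)           # hypothetical query; learned in a real model

pooled, weights = attention_pool(features, query)
print(pooled.shape, weights.sum())

# With a zero query every position gets the same weight, which is plain average pooling
uniform_pooled, _ = attention_pool(features, np.zeros(8))
print(np.allclose(uniform_pooled, features.mean(axis=0)))  # True
```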

Frequently Asked Questions

What are some commonly used Activation Functions?

Some commonly used activation functions in neural networks are the sigmoid function, the hyperbolic tangent (tanh) function, and the rectified linear unit (ReLU). Of these, ReLU is the most popular in CNNs because it is simple and cheap to compute, and it helps reduce the vanishing gradient problem.
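The three functions are short enough to write directly in NumPy. This sketch defines sigmoid and ReLU by hand (the helper names are illustrative) and uses NumPy's built-in `np.tanh`.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(x, 0.0)

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x))  # squashes values into (0, 1); sigmoid(0) = 0.5
print(np.tanh(x))  # squashes values into (-1, 1); tanh(0) = 0
print(relu(x))     # zeroes negatives, keeps positives: values 0, 0, 2
```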

Can a CNN work without Pooling Layers?

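Yes, CNNs can work without Pooling Layers. Such networks typically downsample with strided convolutions instead, or keep higher-resolution feature maps in some stages. They are often used in tasks that require sensitivity to fine details, such as object detection and image segmentation.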

Can CNNs work with other types of data besides images?

Even though CNNs are primarily designed for grid-like data, they can be adapted to other data types. For example, 1D CNNs are used for sequential data like time series, and graph convolutional networks (GCNs) are used for graph-structured data.
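Pooling carries over to sequences in the same way. Below is a minimal NumPy sketch of 1D max pooling on a short, hypothetical time series; the function name `max_pool1d` is illustrative. The framework counterparts are layers such as `nn.MaxPool1d` in PyTorch and `MaxPooling1D` in Keras.

```python
import numpy as np

def max_pool1d(signal, size=2, stride=2):
    """1D max pooling without padding, as used in 1D CNNs for sequences."""
    n_out = (len(signal) - size) // stride + 1
    return np.array([signal[i * stride:i * stride + size].max() for i in range(n_out)])

# A short hypothetical time series (e.g. hourly sensor readings)
series = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])

print(max_pool1d(series))                    # values 3, 4, 9, 6
print(max_pool1d(series, size=3, stride=1))  # overlapping windows: values 4, 4, 5, 9, 9, 9
```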

Conclusion

In this article, we discussed the Pooling Layer in Convolutional Neural Networks. We started with a brief introduction to CNNs and then looked at the details of Pooling Layers. We also used PyTorch and TensorFlow to perform pooling operations on 2D matrices.


Happy Learning!
