Introduction
Have you ever wondered how human beings recognize images so quickly? For example, if there is an image of dogs, then you can easily recognize whether it is a puppy or an adult dog.
This is because of the remarkable capabilities of our neural networks, specifically the visual system in the brain. The visual system consists of complex networks of interconnected neurons that work together to process and interpret visual information.
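Convolutional Neural Networks (CNNs) try to imitate this ability, but they are not perfect. Sometimes, a model may simply say that a Panda is a Bear. Weird, right?
So, in this article, we will discuss a technique that helps CNNs process an image without losing the information at its borders. That technique is called Padding.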
So, let us start with revising what CNN is.
What is a Convolutional Neural Network (CNN)?
Convolutional Neural Networks (CNNs, or ConvNets) are powerful deep learning models used for image recognition and other computer vision tasks. A CNN takes an input image, assigns importance to various aspects of the picture, and differentiates one from another.
The main idea behind CNNs is to automatically learn and extract meaningful features from images. Think of features as distinctive patterns or characteristics that help us identify objects or shapes. For example, in a cat image, features could include the shape of the ears or the presence of whiskers.
The architecture of a CNN is analogous to the connectivity pattern of neurons in the human brain.
What is Padding in CNN?
Padding, in the context of CNNs, refers to the technique of adding extra pixels or values around the input data (usually images) before applying the convolution operations.
Now, one may wonder: why perform an additional operation at all?
Think of padding like adding a protective border around a picture. In a Convolutional Neural Network (CNN), this border is made up of extra pixels. These extra pixels have a very important job: they make sure that the CNN doesn't lose important details and keeps the size of its "feature maps" consistent as it goes through different layers.
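To see why keeping the feature maps consistent matters, here is a minimal sketch that tracks the width of a feature map through a stack of layers. It assumes 3x3 kernels, a stride of 1, and a 32-pixel-wide input; the function name and these numbers are only illustrative.

```python
def sizes_through_layers(n, kernel=3, pad=0, layers=4):
    """Return the feature-map width after each convolutional layer (stride 1)."""
    sizes = [n]
    for _ in range(layers):
        # One layer: the input grows by 2 * pad, then shrinks by kernel - 1.
        n = n + 2 * pad - kernel + 1
        sizes.append(n)
    return sizes

no_pad = sizes_through_layers(32, pad=0)
with_pad = sizes_through_layers(32, pad=1)

print(no_pad)    # [32, 30, 28, 26, 24]
print(with_pad)  # [32, 32, 32, 32, 32]
```

Without padding, every layer trims two pixels off the width. With one pixel of padding, the size stays the same no matter how deep the stack gets.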
Before moving forward, let us discuss the different types of layers of a Convolutional neural network:
Layers of a Convolutional Neural Network
Let's have a look at each layer of a CNN:
Input layer
This is where the CNN starts its journey. It's like a big sheet with lots of squares, and each square represents a pixel in the image. Think of this layer as the canvas where the CNN begins to analyze the image.
Convolutional layer
After the input layer, we have the convolutional layer. Imagine placing a magnifying glass over the canvas. This layer looks at the image by sliding a small window (like the magnifying glass) across the canvas, examining the values of the pixels within that window.
Here's where padding comes in. We add a little extra space around the canvas, ensuring that the magnifying glass doesn't miss any parts of the image, even the edges.
The convolutional layer is like a detective searching for patterns in the image. If it finds a pattern it recognizes (like the shape of a cat's ear or a wheel of a car), it signals this discovery with enthusiasm. If it doesn't see a familiar pattern, it's not as excited and gives a weaker signal.
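The sliding window can be sketched in a few lines of NumPy. This is a simplified illustration, not how a deep learning library implements it: the image and the kernel are made-up values, the stride is 1, and no padding is used. Like most deep learning libraries, it slides the kernel without flipping it.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = image[i:i + kh, j:j + kw]   # pixels under the "magnifying glass"
            out[i, j] = np.sum(window * kernel)  # strength of the match
    return out

# Hypothetical 5x5 image: dark on the left, bright on the right.
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)
# Hypothetical kernel that responds to a dark-to-bright vertical edge.
kernel = np.array([[-1, 0, 1]] * 3, dtype=float)

response = conv2d_valid(image, kernel)
print(response.shape)  # (3, 3)
print(response)        # every row is [3. 3. 0.]
```

The response is strong (3) where the window covers the edge and weak (0) over the flat bright area. Notice also that the 5x5 image has already shrunk to a 3x3 output.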
Pooling layer
Next in our CNN journey is the pooling layer. This layer takes the findings from the convolutional layer and decides to make things a bit smaller. Imagine this layer as a filter that lets only the most important information through. It's like having a smaller piece of paper to work with instead of the entire canvas.
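As a rough sketch of this idea, the snippet below applies 2x2 max pooling, one common kind of pooling, to a made-up 4x4 feature map. The function name and the values are assumptions for illustration.

```python
import numpy as np

def max_pool2d(feature_map, size=2):
    """Keep only the largest value in each non-overlapping size x size block."""
    h, w = feature_map.shape
    out_h, out_w = h // size, w // size
    # Drop any leftover rows/columns that do not fill a whole block.
    trimmed = feature_map[:out_h * size, :out_w * size]
    blocks = trimmed.reshape(out_h, size, out_w, size)
    return blocks.max(axis=(1, 3))

feature_map = np.array([[1, 3, 2, 0],
                        [4, 2, 1, 1],
                        [0, 1, 5, 2],
                        [2, 2, 3, 6]])

pooled = max_pool2d(feature_map)
print(pooled)
# [[4 2]
#  [2 6]]
```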
Fully connected layer
Finally, we reach the fully connected layer, which is the decision-maker of the CNN. It's like the brain of the network. All the information from the previous layers comes together here, and every input is connected to every neuron in the layer. It's like all the detectives in our analogy gathering to make a final decision about what the image is all about.
In a nutshell, padding in a CNN is like adding extra space around the canvas and ensuring that the magnifying glass and filters cover the entire image, even the edges. This extra space helps preserve important details and makes the CNN more effective in recognizing patterns and making decisions about what it sees in an image.
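A fully connected layer can be sketched as a flatten step followed by a matrix multiplication. Everything below is a stand-in: the pooled features and the weights are random numbers, and the three classes are hypothetical. In a real network, the weights would be learned during training.

```python
import numpy as np

rng = np.random.default_rng(0)

pooled = rng.random((2, 2))        # stand-in for the pooled feature map
features = pooled.reshape(-1)      # flatten to a vector of length 4

weights = rng.normal(size=(3, 4))  # 3 hypothetical classes x 4 features
bias = np.zeros(3)

scores = weights @ features + bias    # one score per class
exp = np.exp(scores - scores.max())   # softmax, shifted for numerical stability
probs = exp / exp.sum()

predicted_class = int(np.argmax(probs))
print(probs, predicted_class)
```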
Let us discuss more on it.
Why Do We Need Padding in CNNs?
When an image undergoes convolution, the kernel moves across it according to the stride. While moving, the kernel scans some pixels many times and others (those at the borders) fewer times. In general, pixels in the middle are used more often than pixels on the corners and edges. This, in turn, may cause poor detection of features at the borders. We can reduce this problem using padding.
Let us take an example to understand it better. When a 3x3 kernel slides over a 5x5 image with a stride of 1, it scans each corner pixel only once, but it scans the center pixel nine times.
So, the kernel scans the border pixels fewer times than the middle pixels.
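The uneven scanning can be checked by counting how many kernel windows touch each pixel. The sketch below assumes a 5x5 image, a 3x3 kernel, and a stride of 1; the function name is only illustrative.

```python
import numpy as np

def coverage_counts(height, width, kernel=3, pad=0):
    """Count how many kernel windows touch each original pixel (stride 1)."""
    counts = np.zeros((height + 2 * pad, width + 2 * pad), dtype=int)
    out_h = counts.shape[0] - kernel + 1
    out_w = counts.shape[1] - kernel + 1
    for i in range(out_h):
        for j in range(out_w):
            counts[i:i + kernel, j:j + kernel] += 1
    # Drop the padded border so only the original pixels remain.
    return counts[pad:pad + height, pad:pad + width]

without_padding = coverage_counts(5, 5, pad=0)
with_padding = coverage_counts(5, 5, pad=1)

print(without_padding[0, 0], without_padding[2, 2])  # 1 9
print(with_padding[0, 0], with_padding[2, 2])        # 4 9
```

With one pixel of padding, a corner pixel is scanned 4 times instead of once, while the center pixel is still scanned 9 times. Padding narrows the gap between border and middle pixels, although it does not remove it completely.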
Here, padding comes into the picture. In padding, as we have discussed above, we add extra pixels around the border of the image.
Here’s how you can do this:
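One possible way, assuming NumPy is available, is `np.pad`. The 3x3 "image" below is made up for illustration, and one pixel of zeros is added on every side.

```python
import numpy as np

image = np.arange(1, 10).reshape(3, 3)  # hypothetical 3x3 image with values 1..9

# Add a border of one zero-valued pixel on every side.
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)

print(padded.shape)  # (5, 5)
print(padded)
# [[0 0 0 0 0]
#  [0 1 2 3 0]
#  [0 4 5 6 0]
#  [0 7 8 9 0]
#  [0 0 0 0 0]]
```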
Easy, right?
Next, let us discuss how padding works and the types of padding:
How Does Padding Work?
Padding Basics: Padding adds extra rows and columns of pixels around the border of the input (an image or a feature map) before the kernel slides over it.
Padding Amount: A padding of p adds p pixels to every side, so an n x n input becomes (n + 2p) x (n + 2p).
Effect on Output: With an f x f kernel and a stride of 1, the output is (n + 2p - f + 1) x (n + 2p - f + 1). Choosing p = (f - 1) / 2 for an odd f keeps the output the same size as the input.
Purpose: Padding is commonly used to preserve information at the borders and to control the size of the feature maps.
Types of Padding
There are several types of padding, such as Zero Padding, Valid, Same, Causal, Constant, Reflection and Replication.
Out of these, the most popular are Valid and Same.
Let us see them more clearly.
Zero Padding:
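Zero padding involves adding zeros to the edges of the data. When a fixed value other than zero is used, it is called constant padding.
Example: If you have a 1D array [1, 2, 3, 4] and you apply zero padding of 1 element to each side, it becomes [0, 1, 2, 3, 4, 0].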
Reflective Padding:
Reflective padding involves mirroring the existing values at the edges of the data.
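Example: If you have a 1D array [1, 2, 3, 4], the values next to each edge are mirrored outwards, without repeating the edge value itself.
So, the value mirrored on the left is 2, and the value mirrored on the right is 3.
If you apply reflective padding of 1 element to each side, it becomes [2, 1, 2, 3, 4, 3].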
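The padding types can be compared side by side with NumPy's `np.pad`. This sketch assumes the same 1D array and a padding of 1 element per side. In NumPy's naming, `"constant"` gives zero padding, `"reflect"` gives reflective padding, and `"edge"` corresponds to replication padding.

```python
import numpy as np

data = np.array([1, 2, 3, 4])

zero_padded = np.pad(data, 1, mode="constant", constant_values=0)
reflect_padded = np.pad(data, 1, mode="reflect")  # mirror, edge value not repeated
replicate_padded = np.pad(data, 1, mode="edge")   # repeat the edge value

print(zero_padded)       # [0 1 2 3 4 0]
print(reflect_padded)    # [2 1 2 3 4 3]
print(replicate_padded)  # [1 1 2 3 4 4]
```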
Valid Padding:
Valid padding (also known as "no padding") involves not adding any extra elements to the data. It can result in a smaller output size compared to the input.
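Here is a small sketch of the shrinking effect, using NumPy's `np.convolve` with `mode="valid"`. The filter values are made up. The filter is symmetric, so the flipping that `np.convolve` performs does not change the result.

```python
import numpy as np

signal = np.array([1, 2, 3, 4])
kernel = np.array([1, 1, 1])  # hypothetical filter of size 3

# "valid" keeps only the positions where the filter fits entirely inside the data.
valid_out = np.convolve(signal, kernel, mode="valid")

print(valid_out)       # [6 9]
print(valid_out.size)  # 2, i.e. 4 - 3 + 1
```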
Same Padding:
Same padding is typically used in convolutional neural networks (CNNs) to ensure that the output size matches the input size by adding padding as needed.
Example: If you have a 1D array [1, 2, 3, 4] and you apply a filter of size 3 with "same" padding, it adds 1 element to each side, so the padded input becomes [0, 1, 2, 3, 4, 0] and the output again has 4 elements.
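The same example can be sketched in NumPy. The helper name `same_pad_1d` and the filter values are assumptions for illustration. The helper assumes an odd filter size and a stride of 1, where the padding per side is (filter size - 1) / 2.

```python
import numpy as np

def same_pad_1d(signal, kernel_size):
    """Zero-pad so that a stride-1 filter of odd size keeps the length unchanged."""
    pad = (kernel_size - 1) // 2
    return np.pad(signal, pad, mode="constant")

signal = np.array([1, 2, 3, 4])
kernel = np.array([1, 1, 1])  # hypothetical symmetric filter of size 3

padded = same_pad_1d(signal, kernel.size)
same_out = np.convolve(padded, kernel, mode="valid")

print(padded)    # [0 1 2 3 4 0]
print(same_out)  # [3 6 9 7]
```

The output has 4 elements, the same as the original input.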
Difference between Stride And Padding
The table below summarizes the key differences between the two:
| Characteristic | Stride | Padding |
|---|---|---|
| Definition | The number of pixels that the filter moves over the input image at a time. | The addition of extra pixels (usually zeros) around the edges of the input image. |
| Effect on output image size | A larger stride results in a smaller output image. | More padding results in a larger output image, because it increases the size of the input that the filter scans. |
| Default value | 1 | 0 |
| Computational complexity | Lower for larger strides | Higher for larger padding |
| Ability to preserve detail | Lower for larger strides | Higher for larger padding |
| Ability to prevent information loss | Lower for larger strides | Lower for valid padding, higher for same padding |
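The effect of both settings on the output size can be captured in one formula: output = floor((n + 2p - f) / s) + 1, where n is the input size, f the filter size, p the padding and s the stride. Below is a minimal sketch with a 7-pixel-wide input and a filter of size 3, both chosen only for illustration.

```python
def conv_output_size(n, kernel, stride=1, pad=0):
    """Output width of a convolution: floor((n + 2*pad - kernel) / stride) + 1."""
    return (n + 2 * pad - kernel) // stride + 1

print(conv_output_size(7, 3, stride=1, pad=0))  # 5
print(conv_output_size(7, 3, stride=2, pad=0))  # 3
print(conv_output_size(7, 3, stride=1, pad=1))  # 7
print(conv_output_size(7, 3, stride=2, pad=1))  # 4
```

A larger stride shrinks the output, while padding grows it back.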
Frequently Asked Questions
What is padding in the context of data processing?
Padding involves adding extra elements or values to the edges of a data structure, such as an array or an image, to achieve specific goals, such as maintaining a desired output size or preserving information near the boundaries.
What are the main types of padding?
The main types of padding include zero padding, reflective (mirror) padding, valid padding, and same padding. These types are used in different contexts and for different purposes.
What is zero padding, and when is it used?
Zero padding involves adding zeros to the edges of the data (padding with another fixed value is called constant padding). It is often used because it is simple and cheap to compute. In convolutional neural networks (CNNs), zero padding can help maintain the size of feature maps.
Conclusion
In conclusion, padding is a fundamental technique used in data processing, particularly in fields such as image processing, neural networks, and signal processing. It involves adding extra elements or values to the edges of a data structure for various purposes.