Introduction
Suppose we feed an image to a ConvNet, and we know it is an image of a bus. Now, instead of setting up the cross-entropy loss to maximise the bus class, we set it up to maximise the ostrich class and backpropagate through the network. We do not modify any of the weights or parameters; we change only the image. In other words, we know this is a bus image, but we set the objective so that the network fires for the ostrich class, and we backpropagate to change the image so that the likelihood of the ostrich class increases. It turns out that very minimal changes to the image are enough to fool the convolutional neural network.
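The procedure can be sketched in a few lines of NumPy. This is only a minimal illustration, not the setup used for the bus and ostrich images: a linear softmax classifier with random, frozen weights stands in for the ConvNet, and the names `targeted_attack`, `W`, `b`, `step` and `max_iters` are hypothetical. The point it shows is that the gradient of the cross-entropy loss is taken with respect to the input, and only the input is updated.

```python
import numpy as np

def softmax(z):
    z = z - z.max()              # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def targeted_attack(W, b, x, target, step=0.01, max_iters=1000):
    """Lower the cross-entropy loss for `target` by changing the input only."""
    x_adv = x.copy()
    for _ in range(max_iters):
        p = softmax(W @ x_adv + b)
        if p.argmax() == target:         # the classifier is already fooled
            break
        onehot = np.zeros_like(p)
        onehot[target] = 1.0
        # For loss = -log p[target] and logits = W x + b,
        # d(loss)/d(logits) = p - onehot, so d(loss)/dx = W^T (p - onehot).
        grad_x = W.T @ (p - onehot)
        x_adv -= step * grad_x           # W and b are never modified
    return x_adv

# Assumption: a toy 3-class model on 100-dimensional inputs, not a real ConvNet.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 100))
b = np.zeros(3)
x = rng.normal(size=100)                 # plays the role of the bus image

original = int(softmax(W @ x + b).argmax())
target = (original + 1) % 3              # plays the role of the ostrich class
x_adv = targeted_attack(W, b, x, target)

print("original class:", original)
print("class after attack:", int(softmax(W @ x_adv + b).argmax()))
print("relative size of change:", np.linalg.norm(x_adv - x) / np.linalg.norm(x))
```

With a real ConvNet the same loop would use the framework's automatic differentiation to obtain the gradient with respect to the image, but the structure stays the same: fixed weights, a loss for the wrong class, and updates applied to the input.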
So this is the change.
We have the original image, and the second image is the amount of change we made. The third image is the original image plus this change. To the human eye, there is no distinction: you would all still say this is a bus, and in fact I cannot even see any noise in the third image. The same works with other inputs, such as some birds and a temple. In all cases, the neural network predicts the modified image as an ostrich.
We can also do this starting from random images and optimising them so that the network predicts some class. In all of these cases, the classifier is 99.6% confident in the class. The network is not merely leaning towards that class; it is predicting it with very high confidence. So why is this happening?
Images are incredibly high-dimensional objects. Each one is a 227 x 227 matrix, which is a very high-dimensional object. No matter how much training data we have, we see only a tiny part of this high-dimensional space, because every one of the 227 x 227 entries is a real number. Just imagine the number of possibilities out there. Whether we have 1 million or 10 million training samples, this is far smaller than the number of points in the space. And of these points, only a few are natural images.
So now think of all the 227 x 227 matrices that you can make, and how many of them will be natural images. The probability that such a matrix is a realistic image is very, very small. Most of them are random things: just matrices that do not make any sense, which look like the images you see here.
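To get a feel for the numbers, here is a rough back-of-the-envelope calculation. It assumes a single-channel 227 x 227 image whose entries take one of 256 values (8 bits); both are simplifying assumptions, and real-valued or RGB inputs only make the space larger.

```python
import math

side = 227
levels = 256                      # assumption: 8-bit values per entry
dims = side * side                # number of entries in one image: 51529

# The number of distinct images is levels ** dims, far too large to print,
# so we work with its base-10 logarithm instead.
log10_images = dims * math.log10(levels)      # roughly 124,000
log10_training = math.log10(10_000_000)       # 10 million samples -> 7

print("dimensions per image:", dims)
print("distinct images: about 10^%d" % log10_images)
print("training samples: 10^%d" % log10_training)
```

Even a training set of 10 million images covers a fraction of the space on the order of 10^7 out of 10 raised to a power above one hundred thousand, which is why almost every point the classifier decides on is one it has never seen.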
We fit decision boundaries using the training images; these are the decision boundaries we get. The red region is class 1, and the remaining green part is class 2. We are drawing such decision boundaries for some thousand classes. While doing so, we also make decisions for many points that we have never seen. We have seen very little of the space, yet we have decided that all of these points belong to the green class.
Similarly, we decided that some of them belong to the red class. In particular, consider the cheetah class.
We saw a few pictures of cheetahs and drew a boundary around them to say that this is the cheetah class. But because this is a very high-dimensional space, a lot of other points also fall inside that boundary, and some of them are random points that have no relation to a cheetah. We have been so aggressive in fitting the training data that the boundaries we have drawn also include many of these points. We only need to start with a random image that lands somewhere inside this boundary, and we are all set: the network will classify it as a cheetah, because that is where the classifier has drawn the boundary.
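The same effect shows up even in a tiny two-class toy problem. The sketch below is an illustration under stated assumptions, not the classifier discussed above: it trains a plain logistic regression on two small 2-D clusters (hypothetical data) and then asks it about random points scattered over a much larger region that it never saw during training.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training data: two tight clusters standing in for two classes.
n = 50
X = np.vstack([rng.normal([-2.0, 0.0], 0.3, size=(n, 2)),
               rng.normal([2.0, 0.0], 0.3, size=(n, 2))])
y = np.concatenate([np.zeros(n), np.ones(n)])

def sigmoid(z):
    # Equivalent to 1 / (1 + exp(-z)), written with tanh to avoid overflow.
    return 0.5 * (1.0 + np.tanh(0.5 * z))

# Fit logistic regression with plain gradient descent.
w = np.zeros(2)
b = 0.0
for _ in range(2000):
    p = sigmoid(X @ w + b)
    w -= 0.5 * (X.T @ (p - y)) / len(y)
    b -= 0.5 * np.mean(p - y)

train_acc = np.mean((sigmoid(X @ w + b) > 0.5) == (y == 1))

# Random points from a region far larger than the training clusters.
R = rng.uniform(-50.0, 50.0, size=(10000, 2))
pr = sigmoid(R @ w + b)
confidence = np.maximum(pr, 1.0 - pr)
frac_confident = np.mean(confidence > 0.99)

print("training accuracy:", train_acc)
print("random points classified with > 99% confidence:", frac_confident)
```

The boundary was fitted to a hundred points near two small clusters, yet it assigns a confident label to most of the plane. A random point only has to land on one side of the boundary to be labelled with high confidence, which is the situation described for the cheetah class.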
Frequently Asked Questions

What is fooling a convolutional network?
Fooling a neural network means flipping its classification by finding the smallest change to the input that is needed to do so.

What is an image?
An image is a two-dimensional function that represents a measure of some characteristic, such as the brightness or colour, of a viewed scene.

How many layers are there in a convolutional neural network?
There are three main types of layers in a convolutional neural network:
1. Convolutional layer
2. Pooling layer
3. Fully connected layer

What is a pixel?
An image consists of a finite number of elements, each of which has a particular location and value; these elements are called pixels.

What is a neural network?
A neural network consists of neurons, which are its core processing units. The neurons are arranged in an input layer, one or more hidden (middle) layers, and an output layer.