Table of contents
1. Introduction
2. Purpose of Grad CAM
3. CAM
4. Grad CAM
5. Grad CAM Implementation using Keras
6. Limitations of Grad CAM
7. Frequently Asked Questions
7.1. What is the need for Grad CAM?
7.2. How is Grad CAM different from CAM?
7.3. What are the further improvements in Grad CAM?
8. Conclusion
Last Updated: Mar 27, 2024

Grad CAM

Author Arun Nawani

Introduction

Convolutional Neural Networks are used extensively in today’s world, with applications ranging from medicine to everyday consumer products. While they can reach unprecedented levels of accuracy in their predictions, we often don’t know how exactly they arrive at those predictions. This lack of model interpretability is a major drawback of deep learning, since interpretability is essential for understanding and debugging a model. This is where Gradient-weighted Class Activation Mapping, or Grad CAM, comes in.

Purpose of Grad CAM

Deep Learning models are often regarded as black-box methods for the following reasons:

  • We don’t know which features of the input image the model is focusing on. 
  • We don’t know which neurons were activated during forward propagation. 
  • We don’t know how the model arrived at its prediction. 

 

This calls the model’s reliability into question, since there’s no way to validate how it makes its predictions. The Grad CAM algorithm addresses the problem by producing a heatmap of the features the model is focusing on. Using Grad CAM, practitioners can visually validate the model’s predictions themselves, ensuring the model is focusing on the correct features. If the model isn’t focusing on the correct features, then it means:

  • The model hasn’t learned the underlying patterns in the input images. 
  • The training procedure needs to be revisited. 
  • The dataset might not be sufficient. 
  • And most importantly, the model isn’t ready to be deployed yet. 

CAM

Class Activation Mapping (CAM) is a technique for finding the discriminative image regions behind a CNN’s predictions by generating class activation maps. The base network is modified: all the fully connected layers at the end are removed and replaced with a layer that takes the global-average-pooled convolutional feature maps as input and gives the probability of each class as output, as sketched below.

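In the CAM paper’s notation, if f_k(x, y) is the activation of feature map k at spatial location (x, y) and w_k^c is the learned weight connecting the global-average-pooled feature map k to class c, the class activation map for class c is

M_c(x, y) = \sum_k w_k^c f_k(x, y)

Regions where M_c(x, y) is large are the ones the network associates with class c.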

This requirement to modify and retrain the network is CAM’s major drawback. Grad CAM addresses the problem: it doesn’t require you to drop the fully connected layers and can be applied to any general CNN model.

Grad CAM

Grad CAM makes use of the CAM procedure but extends its applicability by incorporating gradient information. The gradient of the class score with respect to the feature maps of the final convolutional layer determines the weight associated with each feature map; the gradients flowing back are global-average-pooled to obtain these weights.


Grad CAM’s heatmap is a weighted combination of feature maps, just like CAM’s, but it is additionally passed through a ReLU function, as shown in the equations below.

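Concretely, in the Grad CAM paper’s notation, the weight of feature map A^k for class c is the global-average-pooled gradient of the class score y^c:

\alpha_k^c = \frac{1}{Z} \sum_i \sum_j \frac{\partial y^c}{\partial A_{ij}^k}

where Z is the number of spatial locations in the feature map. The heatmap is then

L_{Grad-CAM}^c = ReLU\left( \sum_k \alpha_k^c A^k \right)

The ReLU keeps only the features that have a positive influence on the class of interest.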

Grad CAM is a generalization of CAM, and that is what makes it applicable to any CNN-based architecture.

Grad CAM Implementation using Keras

The implementation sample below is from the official Keras documentation.

import numpy as np
import tensorflow as tf
from tensorflow import keras

# Display
from IPython.display import Image, display
import matplotlib.pyplot as plt
import matplotlib.cm as cm

 

model_builder = keras.applications.xception.Xception
img_size = (299, 299)
preprocess_input = keras.applications.xception.preprocess_input
decode_predictions = keras.applications.xception.decode_predictions

last_conv_layer_name = "block14_sepconv2_act"

# The local path to our target image
img_path = keras.utils.get_file(
  "african_elephant.jpg", "https://i.imgur.com/Bvro0YD.png"
)

display(Image(img_path))

 

Grad CAM algorithm

def get_img_array(img_path, size):
  # `img` is a PIL image of size 299x299
  img = keras.preprocessing.image.load_img(img_path, target_size=size)
  # `array` is a float32 Numpy array of shape (299, 299, 3)
  array = keras.preprocessing.image.img_to_array(img)
  # We add a dimension to transform our array into a "batch"
  # of size (1, 299, 299, 3)
  array = np.expand_dims(array, axis=0)
  return array


def make_gradcam_heatmap(img_array, model, last_conv_layer_name, pred_index=None):
  # First, we create a model that maps the input image to the activations
  # of the last conv layer as well as the output predictions
  grad_model = tf.keras.models.Model(
      [model.inputs], [model.get_layer(last_conv_layer_name).output, model.output]
  )

  # Then, we compute the gradient of the top predicted class for our input image
  # with respect to the activations of the last conv layer
  with tf.GradientTape() as tape:
      last_conv_layer_output, preds = grad_model(img_array)
      if pred_index is None:
          pred_index = tf.argmax(preds[0])
      class_channel = preds[:, pred_index]

  # This is the gradient of the output neuron (top predicted or chosen)
  # with regard to the output feature map of the last conv layer
  grads = tape.gradient(class_channel, last_conv_layer_output)

  # This is a vector where each entry is the mean intensity of the gradient
  # over a specific feature map channel
  pooled_grads = tf.reduce_mean(grads, axis=(0, 1, 2))

  # We multiply each channel in the feature map array
  # by "how important this channel is" with regard to the top predicted class
  # then sum all the channels to obtain the heatmap class activation
  last_conv_layer_output = last_conv_layer_output[0]
  heatmap = last_conv_layer_output @ pooled_grads[..., tf.newaxis]
  heatmap = tf.squeeze(heatmap)

  # For visualization purpose, we will also normalize the heatmap between 0 & 1
  heatmap = tf.maximum(heatmap, 0) / tf.math.reduce_max(heatmap)
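  # Note: this divides by zero if the heatmap is all zeros (e.g., a class the
  # model assigns no positive evidence to); tf.math.divide_no_nan is a safe
  # alternative for that edge case.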
  return heatmap.numpy()

 

Testing

img_array = preprocess_input(get_img_array(img_path, size=img_size))

# Make model
model = model_builder(weights="imagenet")

# Remove last layer's softmax
model.layers[-1].activation = None
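# (Grad CAM is computed on the raw class scores: with softmax in place, the
# gradient of one class's score would also depend on every other class's score.)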

# Print what the top predicted class is
preds = model.predict(img_array)
print("Predicted:", decode_predictions(preds, top=1)[0])

# Generate class activation heatmap
heatmap = make_gradcam_heatmap(img_array, model, last_conv_layer_name)

# Display heatmap
plt.matshow(heatmap)
plt.show()

 

Predicted: [('n02504458', 'African_elephant', 9.862389)]

 


 

Creating a superimposed visualisation

def save_and_display_gradcam(img_path, heatmap, cam_path="cam.jpg", alpha=0.4):
  # Load the original image
  img = keras.preprocessing.image.load_img(img_path)
  img = keras.preprocessing.image.img_to_array(img)

  # Rescale heatmap to a range 0-255
  heatmap = np.uint8(255 * heatmap)

  # Use jet colormap to colorize heatmap
  jet = cm.get_cmap("jet")

  # Use RGB values of the colormap
  jet_colors = jet(np.arange(256))[:, :3]
  jet_heatmap = jet_colors[heatmap]

  # Create an image with RGB colorized heatmap
  jet_heatmap = keras.preprocessing.image.array_to_img(jet_heatmap)
  jet_heatmap = jet_heatmap.resize((img.shape[1], img.shape[0]))
  jet_heatmap = keras.preprocessing.image.img_to_array(jet_heatmap)

  # Superimpose the heatmap on original image
  superimposed_img = jet_heatmap * alpha + img
  superimposed_img = keras.preprocessing.image.array_to_img(superimposed_img)

  # Save the superimposed image
  superimposed_img.save(cam_path)

  # Display Grad CAM
  display(Image(cam_path))


save_and_display_gradcam(img_path, heatmap)

 

Output: the Grad CAM heatmap superimposed on the original elephant image, highlighting the regions that drove the prediction.

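The pred_index argument of make_gradcam_heatmap can also be used to visualize what the model associates with a class other than its top prediction. A minimal sketch, assuming index 385 corresponds to 'Indian_elephant' in the ImageNet class ordering used by Keras (verify the exact index with decode_predictions before relying on it):

# Hypothetical: inspect a class other than the top prediction.
# Class index 385 is assumed to be 'Indian_elephant'; verify before use.
heatmap = make_gradcam_heatmap(img_array, model, last_conv_layer_name, pred_index=385)
save_and_display_gradcam(img_path, heatmap, cam_path="cam_indian_elephant.jpg")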
Limitations of Grad CAM

  • The Grad CAM algorithm fails to localise multiple occurrences of an object in the input image. 
  • The heatmap’s localisation may be inaccurate with respect to class-region coverage, a consequence of the partial-derivative premise behind the weights. 
  • The continual up- and down-sampling may lead to signal loss. 

Also read: Sampling and Quantization

Frequently Asked Questions

What is the need for Grad CAM? 

Deep learning models aren’t very transparent about how they come up with their predictions. However, as a practitioner, it’s important to ensure the model is focusing on the right features to make its predictions. Grad CAM addresses this problem by generating a heatmap that indicates which features in the image the model focuses on, and to what extent.

How is Grad CAM different from CAM? 

The CAM method requires you to drop the fully connected layers, a modification that means the model has to be retrained. Grad CAM preserves the fully connected layers.

What are the further improvements in Grad CAM?

Further improvements to Grad CAM have been made in terms of better localisation, as well as explaining multiple occurrences of an object in a single image. These are implemented in an improved version of Grad CAM, namely Grad CAM++.

Conclusion 

The blog begins by highlighting a major drawback of deep learning models: their lack of transparency, a problem addressed by Grad CAM. The blog then thoroughly covers the algorithm along with its implementation and limitations.

Refer to our Guided Path on Coding Ninjas Studio to upskill yourself in Data Structures and Algorithms, Competitive Programming, JavaScript, System Design, Machine Learning, and many more! If you want to test your competency in coding, you may check out the mock test series and participate in the contests hosted on Coding Ninjas Studio! But if you have just started your learning process and are looking for questions asked by tech giants like Amazon, Microsoft, Uber, etc., you must look at the problems, interview experiences, and interview bundle for placement preparations.

Nevertheless, you may consider our paid courses to give your career an edge over others!

Do upvote our blogs if you find them helpful and engaging!

Happy Learning!!
