Table of contents
1. Introduction
2. Introduction to Automatic Differentiation
3. What is tf.GradientTape?
3.1. Example 1
3.2. Example 2
4. Various Methods in tf.GradientTape
4.1. Introduction
4.1.1. Automatically Watched Variables
4.1.2. watch_accessed_variables = False
4.1.3. persistent = True
4.1.4. stop_recording()
5. Frequently Asked Questions
5.1. What is TensorFlow?
5.2. Name three working components of TensorFlow architecture.
5.3. What is gradient descent?
5.4. What are the variables in TensorFlow?
5.5. What is the MNIST dataset?
6. Conclusion
Last Updated: Mar 27, 2024

Finding Gradients in TensorFlow using tf.GradientTape


Introduction

The GradientTape function records operations for automatic differentiation. Operations executed within this context manager are recorded, provided at least one of their inputs is being watched. Tensors can be watched manually by invoking the watch method on the context manager.


Keep reading this blog to understand better how to find gradients in TensorFlow using tf.GradientTape. Before diving into the methods, we will first look at automatic differentiation and then at tf.GradientTape itself.

Introduction to Automatic Differentiation 

Automatic differentiation (AD), also known as algorithmic differentiation or simply "auto-diff," is a family of techniques created to efficiently and accurately evaluate the derivatives of numerical functions expressed as computer programs. AD applications include computational fluid dynamics, atmospheric sciences, and engineering design optimization. AD is a small but well-established field. Until very recently, the fields of machine learning and AD have remained largely unconnected, and in some cases they have independently discovered each other's findings.

General-purpose AD has been absent from the machine learning toolbox despite its applicability; this is slowly changing as it gains ground under the names "dynamic computational graphs" and "differentiable programming." We discuss the key implementation strategies, the applications where AD is directly applicable, and the intersection of AD and machine learning. By precisely defining the main differentiation techniques and their interrelationships, we aim to clarify the usage of the terms "auto-diff," "automatic differentiation," and "symbolic differentiation," which are appearing more and more frequently in machine learning settings.
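As a tiny illustration of the idea (a minimal sketch, not from the original article), the code below differentiates a small composite function by applying the chain rule step by step; this per-operation bookkeeping is exactly what reverse-mode AD, and therefore GradientTape, automates.

import math

# f(x) = sin(x**2); by the chain rule, f'(x) = cos(x**2) * 2*x
def f_and_grad(x):
    u = x ** 2            # inner operation, du/dx = 2*x
    y = math.sin(u)       # outer operation, dy/du = cos(u)
    dy_du = math.cos(u)
    du_dx = 2 * x
    return y, dy_du * du_dx   # chain rule: dy/dx = dy/du * du/dx

value, grad = f_and_grad(1.5)
print(value, grad)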

What is tf.GradientTape?

GradientTape() is used to record operations for automatic differentiation.

Syntax: tensorflow.GradientTape(persistent, watch_accessed_variables)

persistent (optional): It can be either True or False, with the default value False. It defines whether a persistent gradient tape is created.

watch_accessed_variables (optional): A boolean defining whether the tape automatically watches any (trainable) variables accessed while the tape is active. The default value is True.

Example 1

# Importing the library
import tensorflow as tf

x = tf.constant(5.0)

# Using GradientTape
with tf.GradientTape() as tape:
    tape.watch(x)
    y = x * x * x

# Computing the gradient of y = x**3 with respect to x
res = tape.gradient(y, x)

# Printing the result
print("result: ", res)

Output

result:  tf.Tensor(75.0, shape=(), dtype=float32)

Example 2 

# Importing the library
import tensorflow as tf

p = tf.constant(4.0)

# Using GradientTape
with tf.GradientTape() as outer_tape:
    outer_tape.watch(p)

    # Using a nested GradientTape for calculating a higher-order derivative
    with tf.GradientTape() as inner_tape:
        inner_tape.watch(p)
        y = p * p * p
    # Computing the first-order gradient
    first_order_derivative = inner_tape.gradient(y, p)

# Computing the second-order gradient
second_order_derivative = outer_tape.gradient(first_order_derivative, p)

# Printing the results
print("first_order_derivative: ", first_order_derivative)
print("second_order_derivative: ", second_order_derivative)

Output

first_order_derivative:  tf.Tensor(48.0, shape=(), dtype=float32)
second_order_derivative:  tf.Tensor(24.0, shape=(), dtype=float32)

Various Methods in tf.GradientTape

Introduction 

We can track computations and compute gradients with tf.GradientTape as follows:

p = tf.Variable(2.0, trainable=True)
with tf.GradientTape() as tape:
    y = p**3


print(tape.gradient(y, p).numpy()) 

Output

12.0

 

  • GradientTape doesn't track constants by default, so we instruct it to watch them with tape.watch(variable).
  • We then perform some computations on the watched tensor, ranging from cubing it to passing it through a neural network.
     

If, at any point, we want to use multiple variables in our calculations, we pass a list or tuple of those variables to tape.gradient, as shown below.
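A minimal sketch (the variable names are illustrative, not from the original article) of taking gradients with respect to two variables at once:

import tensorflow as tf

a = tf.Variable(2.0, trainable=True)
b = tf.Variable(3.0, trainable=True)

with tf.GradientTape() as tape:
    y = a ** 2 + b ** 3

# Passing a list returns one gradient per variable
grad_a, grad_b = tape.gradient(y, [a, b])
print(grad_a.numpy(), grad_b.numpy())  # 4.0 (= 2*a) and 27.0 (= 3*b**2)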

Automatically Watched Variables 

GradientTape automatically watches all trainable variables, so if p is a trainable variable instead of a constant, there is no need to instruct the tape to watch it.

import tensorflow as tf

p = tf.Variable(2.0, trainable=True)
with tf.GradientTape() as tape:
    y = p**3
print(tape.gradient(y, p).numpy())

Output

12.0

If we re-run this, replacing the definition of p with:

p = tf.constant(4.0)

or

p = tf.Variable(4.0, trainable=False)

the code would raise an error, as GradientTape would not be watching p.
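To make this concrete, here is a minimal sketch (an illustration, not from the original article) of what happens with an unwatched constant:

import tensorflow as tf

p = tf.constant(4.0)
with tf.GradientTape() as tape:
    y = p**3

grad = tape.gradient(y, p)
print(grad)        # None, because the constant was never watched
# grad.numpy()     # would raise an AttributeError, since grad is None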

watch_accessed_variables = False

We can pass watch_accessed_variables=False to GradientTape if we do not want it to watch all trainable variables automatically.

import tensorflow as tf

p = tf.Variable(2.0, trainable=True)
with tf.GradientTape(watch_accessed_variables=False) as tape:
    y = p**3

print(tape.gradient(y, p))

Disabling watch_accessed_variables gives us fine-grained control over which variables we want to watch.

If you have many trainable variables and are not optimizing them all at once, disabling watch_accessed_variables protects you from mistakes.

Output

None
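For example, a minimal sketch (the variable names are illustrative, not from the original article) of watching only one of two trainable variables:

import tensorflow as tf

w = tf.Variable(2.0, trainable=True)
b = tf.Variable(1.0, trainable=True)

with tf.GradientTape(watch_accessed_variables=False) as tape:
    tape.watch(w)              # explicitly watch only w
    y = w * 3.0 + b

# One gradient call over both variables: only the watched one gets a gradient
grads = tape.gradient(y, [w, b])
print(grads)                   # [tf.Tensor(3.0, ...), None]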

persistent = True

If we run the given code: 

import tensorflow as tf
x = tf.Variable(5.0, trainable=True)
y = tf.Variable(3.0, trainable=True)
with tf.GradientTape() as tape:
    y1 = x ** 2
    y2 = y ** 3
print(tape.gradient(y1, x).numpy())
print(tape.gradient(y2, y).numpy())

Expected Output should be:

10

27 

 

But when we actually call tape.gradient a second time, it will raise an error. To resolve this error, we need to set persistent=True.

import tensorflow as tf
x = tf.Variable(5.0, trainable=True)
y = tf.Variable(3.0, trainable=True)
with tf.GradientTape(persistent=True) as tape:
    y1 = x ** 2
    y2 = y ** 3

print(tape.gradient(y1, x).numpy())
print(tape.gradient(y2, y).numpy())

Output

10.0
27.0

stop_recording()

It temporarily pauses recording, reducing overhead and leading to greater computation speed.

import tensorflow as tf

p = tf.Variable(5.0, trainable=True)
with tf.GradientTape() as tape:
    y = p**2
    with tape.stop_recording():
        print(tape.gradient(y, p).numpy())
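A slightly fuller sketch of the usual pattern (illustrative, not from the original article): pause recording for work that does not need gradients, such as logging an intermediate value, then resume.

import tensorflow as tf

x = tf.Variable(4.0, trainable=True)
with tf.GradientTape(persistent=True) as tape:
    y = x ** 2
    with tape.stop_recording():
        # Nothing in this block is traced onto the tape
        print("intermediate value:", y.numpy())
    z = y ** 2                  # recording resumes here, so z = x**4 is traced

print(tape.gradient(z, x).numpy())  # d(x**4)/dx at x = 4 -> 256.0
del tape  # a persistent tape holds resources until it is deleted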

Frequently Asked Questions 

What is TensorFlow? 

A Python-based library called TensorFlow is used to build machine learning programs. It is a basic toolkit for performing complicated math. It provides users with the flexibility to create experimental learning architectures. 

Name three working components of TensorFlow architecture. 

Preprocessing the data -> Building the model -> Training the model
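As a rough, hypothetical sketch of these three steps using the Keras API (the toy data and layer sizes are illustrative, not from the original article):

import numpy as np
import tensorflow as tf

# 1. Preprocessing the data (here: random toy features and binary labels)
x = np.random.rand(100, 4).astype("float32")
y = (x.sum(axis=1) > 2.0).astype("float32")

# 2. Building the model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# 3. Training the model
model.fit(x, y, epochs=5, batch_size=16, verbose=0)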
 

What is gradient descent? 

To put it simply, it is a numerical method for finding the inputs to a system of equations that minimize its output. In the context of machine learning, that system of equations represents our model: the inputs are the model's unobserved parameters, and the output is a loss function that must be minimized, reflecting the amount of error between the model and our data.
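A minimal sketch of gradient descent on a single parameter, using tf.GradientTape to compute the gradient (the loss and learning rate are illustrative, not from the original article):

import tensorflow as tf

w = tf.Variable(5.0, trainable=True)    # parameter to optimize
learning_rate = 0.1

for step in range(50):
    with tf.GradientTape() as tape:
        loss = (w - 2.0) ** 2           # minimized at w = 2
    grad = tape.gradient(loss, w)
    w.assign_sub(learning_rate * grad)  # w <- w - lr * d(loss)/dw

print(w.numpy())  # close to 2.0 after a few dozen steps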

What are the variables in TensorFlow? 

Variables in TensorFlow are objects whose values can change as the program's state changes, and they hold their most recent value at the time of execution. A variable's life begins when its initializer, tf.Variable.initializer, is run.
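A short sketch (illustrative, not from the original article) of creating and updating a variable in eager execution:

import tensorflow as tf

v = tf.Variable(3.0)     # created and initialized immediately in eager mode
print(v.numpy())         # 3.0

v.assign(7.0)            # replace the stored value
v.assign_add(1.0)        # in-place addition
print(v.numpy())         # 8.0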

What is the MNIST dataset? 

The MNIST (Modified National Institute of Standards and Technology) database is a dataset of handwritten digits that is widely used to train and evaluate visual image recognition systems.
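A minimal sketch of loading MNIST through the Keras datasets API (assuming tf.keras is available; not from the original article):

import tensorflow as tf

# 60,000 training and 10,000 test images of 28x28 handwritten digits
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

print(x_train.shape, y_train.shape)  # (60000, 28, 28) (60000,)
print(x_test.shape, y_test.shape)    # (10000, 28, 28) (10000,)

# Pixel values are 0-255; scale them to [0, 1] before feeding a model
x_train = x_train.astype("float32") / 255.0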

Conclusion

In this blog, we introduced automatic differentiation, discussed tf.GradientTape, and then walked through the various methods available in tf.GradientTape.

Refer to our guided paths on Coding Ninjas Studio to learn more about DSA, Competitive Programming, JavaScript, System Design, etc. Enroll in our courses and refer to the mock tests and problems available; look at the Top 150 Interview Puzzles, interview experiences, and interview bundle for placement preparations. Read our blogs on aptitude, competitive programming, interview questions, IT certifications, and data structures and algorithms for the best practice.
