Table of contents
1. Introduction
2. Background: RNN and LSTM
   2.1. Recurrent Neural Networks (RNN)
   2.2. Long Short Term Memory (LSTM)
3. Gated Recurrent Unit (GRU)
   3.1. Working of GRU
      3.1.1. Reset gate
      3.1.2. Update gate
      3.1.3. Calculating the output by using these two gates
      3.1.4. Final output of GRU
4. Frequently Asked Questions
   4.1. Which is better, LSTM or GRU?
   4.2. What are the three gates in LSTM called?
   4.3. How many hidden states are in GRU?
5. Conclusion
Last Updated: Jul 11, 2024

Gated Recurrent Units (GRUs)


Introduction

Neural networks have gained a lot of value due to their ability to solve many problems with great accuracy. A great deal of research goes into making these networks better, faster, and more accurate. The gated recurrent unit (GRU) is a relatively recent development in recurrent neural networks. Let us learn more about GRUs in detail.



Background: RNN and LSTM

Recurrent Neural Networks (RNN)

An RNN is an artificial neural network in which the nodes are connected in a sequence. At each step, it combines the output from the previous step with the current input to produce the current output. Its hidden state acts as an internal memory that carries information forward. Because it applies the same function, with the same weights, at every step, it is called 'recurrent'.
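In a basic (vanilla) RNN cell, this update can be written as follows, where W and U are the input and recurrent weight matrices and f is an activation such as tanh (notation assumed here for illustration; bias omitted):

ht = f(W · xt + U · ht-1)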

RNN Cell

Because an RNN processes the previous inputs, it performs well on sequential data. However, repeating the same calculation over every step causes the 'vanishing gradient' or 'exploding gradient' problem. As a result, it fails to handle long-term dependencies: information from inputs seen long ago fades away and becomes useless.

Long Short Term Memory (LSTM)

LSTM (Long Short Term Memory) is a modified RNN. A vanilla RNN cell has a single tanh layer, while an LSTM cell has three sigmoid gates and one tanh layer. The gates decide which information is passed on to the next step and which is discarded. Because the cell state lets gradients flow through largely unchanged, LSTM mitigates the vanishing gradient problem.

LSTM Cell 

LSTM has long been used to address the vanishing gradient problem. GRU is a more recent addition to this family that works in a similar way.

Gated Recurrent Unit (GRU)

The Gated Recurrent Unit (GRU) is an improved version of the standard RNN. GRUs were introduced in 2014 by Cho et al. Like LSTM, a GRU uses gating mechanisms to control the flow of information through the cell. It aims to solve the vanishing gradient problem and performs better than a standard RNN. Let us see what makes it so effective.

Working of GRU

A GRU uses a reset gate and an update gate to tackle the vanishing gradient problem. These gates decide what information is passed on to the output. They can retain information from many steps back without it being washed out as the sequence is processed. The two gates and the computations they perform are described below.


Reset gate


The reset gate determines how much of the past information the cell should forget. It has the same form as the update gate, but uses its own set of weights.
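In the standard GRU formulation (bias terms omitted), the reset gate is computed as:

rt = sigmoid(W(r) · xt + U(r) · ht-1)

Here W(r) and U(r) are the reset gate's own weights for the current input xt and the previous state ht-1.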

Update gate

The update gate is responsible for long-term memory. It determines how much of the information from the previous steps must be passed along. The equation used in the update gate is:
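zt = sigmoid(W(z) · xt + U(z) · ht-1)

(This is the standard form of the gate; bias terms are omitted here for simplicity.)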

zt is the output of the update gate at step t. xt is the current input and is multiplied by its weight W(z). ht-1 holds the information from the previous t-1 steps, and U(z) is the corresponding weight of ht-1. The two products are added, and the sigmoid activation function is applied to the sum.

Calculating the output by using these two gates

We use these two gates to calculate the final output of the GRU. First, a candidate memory h't is created; it uses the reset gate to decide how much of the past information to store. It is calculated by:
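h't = tanh(W · xt + rt ⊙ (U · ht-1))

Here ⊙ denotes the Hadamard (element-wise) product, and bias terms are again omitted.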

We multiply xt, the current input, by its weight W, and ht-1 by its weight U. We then take the Hadamard product, i.e., the element-wise product, of the reset gate output rt and U · ht-1. Finally, we add the two terms and apply the tanh activation function.

Final output of GRU

The final output ht of the GRU is calculated using the update gate and the candidate memory h't from the previous step. ht is calculated by:
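ht = zt ⊙ ht-1 + (1 - zt) ⊙ h't

(Some references write this as (1 - zt) ⊙ ht-1 + zt ⊙ h't; the two forms are equivalent up to how the gate's value is interpreted.)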

We take the Hadamard product of the update gate zt with ht-1, and of (1 - zt) with h't, and then sum the two terms to get the output of the GRU.


This is how the GRU addresses the vanishing gradient problem: it keeps the relevant information and passes it down to the next step. When trained properly, it can perform excellently.
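To make the equations above concrete, here is a minimal NumPy sketch of a single GRU step. The weight names (W_z, U_z, W_r, U_r, W_h, U_h) are placeholders chosen for illustration, and bias terms are omitted to match the equations above; a production implementation (for example, torch.nn.GRU) adds biases, batching, and multiple layers.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W_z, U_z, W_r, U_r, W_h, U_h):
    # One GRU step following the equations above (bias terms omitted).
    z_t = sigmoid(W_z @ x_t + U_z @ h_prev)              # update gate
    r_t = sigmoid(W_r @ x_t + U_r @ h_prev)              # reset gate
    h_cand = np.tanh(W_h @ x_t + r_t * (U_h @ h_prev))   # candidate memory h't
    return z_t * h_prev + (1.0 - z_t) * h_cand           # final output ht

# Tiny usage example with random weights: input size 4, hidden size 3.
rng = np.random.default_rng(0)
n_in, n_hid = 4, 3
W_z, W_r, W_h = (rng.standard_normal((n_hid, n_in)) for _ in range(3))
U_z, U_r, U_h = (rng.standard_normal((n_hid, n_hid)) for _ in range(3))

h = np.zeros(n_hid)
for x in rng.standard_normal((5, n_in)):  # a sequence of five input vectors
    h = gru_step(x, h, W_z, U_z, W_r, U_r, W_h, U_h)
print(h)  # final hidden state after processing the sequence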

Frequently Asked Questions

Which is better, LSTM or GRU?

Both have their benefits. A GRU uses fewer parameters, so it needs less memory and runs faster. An LSTM, on the other hand, tends to be more accurate on larger datasets.
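As a rough illustration of the parameter difference, the following sketch (assuming PyTorch is available; the layer sizes are arbitrary examples) counts the parameters of comparable GRU and LSTM layers:

import torch.nn as nn

def n_params(model):
    return sum(p.numel() for p in model.parameters())

input_size, hidden_size = 128, 256  # arbitrary example sizes
gru = nn.GRU(input_size, hidden_size)
lstm = nn.LSTM(input_size, hidden_size)

# A GRU layer learns three sets of gate weights versus the LSTM's four,
# so it ends up with roughly 25% fewer parameters.
print("GRU parameters: ", n_params(gru))
print("LSTM parameters:", n_params(lstm))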

What are the three gates in LSTM called?

They are called the input gate, forget gate, and output gate.

How many hidden states are in GRU?

There is only one hidden state in a GRU. Unlike LSTM, it does not maintain a separate cell state.

Conclusion

In this article, we have extensively discussed GRUs. We saw how they work and how they use the reset gate and the update gate. You can learn more about RNN and LSTM at Coding Ninjas. To get a complete understanding of various machine learning algorithms, check out our Machine Learning course.

