Table of contents
1. Introduction
2. What is an RNN cell?
3. Types of RNN cells
3.1. LSTM
3.1.1. Working of LSTM
3.1.2. Input gate
3.1.3. Forget Gate
3.1.4. Output gate
3.2. GRU
3.2.1. Update gate
3.2.2. Reset gate
4. Applications
5. Frequently Asked Questions
6. Key Takeaways
Last Updated: Mar 27, 2024

Understanding an RNN cell

Author: Tashmit

Introduction

A Recurrent Neural Network (RNN) is a generalization of a feed-forward neural network that has internal memory. RNNs are a special kind of neural network designed to deal effectively with sequential data, such as time series, text documents, or audio. An RNN performs the same function for every input, while the output for the current input depends on past computations. To make a decision, it considers the current input together with what it has learned from previous inputs.



What is an RNN cell?

According to the TensorFlow documentation, “An RNN cell, in the most abstract setting, is anything that has a state and performs some operation that takes a matrix of inputs.”

Recurrent Neural Network cells differ from regular neurons in that they have a state and can therefore recall details from the past. RNN cells form the backbone of recurrent networks.

Mathematically, a sequence of inputs is passed through an RNN cell one element at a time. The cell's state helps it remember the earlier parts of the sequence and combine those details with the current input to produce an output. An easier way to view this is to unroll what happens over the sequence, which reveals a simpler deep network.


When the cell is unfolded in this way, its output at each step is computed from the current input together with the previous step's output.
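As a concrete illustration, here is a minimal NumPy sketch of the unrolled computation (all names and dimensions are illustrative, not a standard API): the same function is applied at every step, and the state h carries information forward.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One unrolled RNN step: combine the current input with the previous state."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

rng = np.random.default_rng(0)
W_xh = rng.normal(size=(8, 4))        # input-to-hidden weights (8 features, 4 hidden units)
W_hh = rng.normal(size=(4, 4))        # hidden-to-hidden (recurrent) weights
b_h = np.zeros(4)

h = np.zeros(4)                        # initial state
for x_t in rng.normal(size=(10, 8)):   # a sequence of 10 inputs
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)  # the state remembers the past sequence
```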

Types of RNN cells

There are two common types of RNN cells: LSTMs and GRUs. Both use gates, which take values between 0 and 1 for each input. The purpose of these gates is to selectively forget or retain information, meaning these cells can both remember information from the past and let it go when required, which lets them handle sequences better.

Now, let us look at the two most common cell types.

LSTM

LSTM is short for Long Short-Term Memory. It is an artificial recurrent neural network architecture used in the field of deep learning. Unlike standard feed-forward neural networks, an LSTM has feedback connections. It can process not only single data points but also entire sequences of data, such as speech or video. LSTM cells have a more complicated structure than an average recurrent neuron, which allows them to better regulate what to learn and what to forget from the different input sources.

Working of LSTM

An LSTM cell consists of three gates: an input gate, a forget gate, and an output gate.


Input gate

The input gate determines which values from the input should be used to modify the memory. A sigmoid function decides which values to let through, outputting a number between 0 and 1, and a tanh function gives weight to the values passed, rating their importance on a scale from -1 to 1.
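In the standard LSTM formulation (a sketch following common textbook notation; W_i, W_C and b_i, b_C are learned weights and biases), the input gate i_t and the candidate values \tilde{C}_t can be written as:

\[
i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i), \qquad \tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)
\]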

Forget Gate

The forget gate determines which details should be discarded from the block. A sigmoid function decides this: it looks at the previous state (h_t-1) and the current input (x_t) and outputs a number between 0 (omit this) and 1 (keep this) for each number in the cell state C_t-1.
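In the same sketch notation (with learned parameters W_f and b_f), the forget gate computes:

\[
f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)
\]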

Output gate

The output is decided using the input and the memory of the block. A sigmoid function determines which values to let through, outputting a number between 0 and 1, while a tanh function gives weight to the values passed, rating their importance from -1 to 1; the tanh result is multiplied by the sigmoid output.
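Putting the three gates together in the same notation, the output gate o_t, the new cell state C_t, and the new hidden state h_t are:

\[
o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o), \qquad C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t, \qquad h_t = o_t \odot \tanh(C_t)
\]

Below is a minimal NumPy sketch of one LSTM step under these equations. The function name and the packing of the four gate pre-activations into a single weight matrix W are illustrative choices, not a fixed API:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step; W maps [h_prev, x_t] to the four gate pre-activations."""
    z = np.concatenate([h_prev, x_t]) @ W + b
    f, i, o, g = np.split(z, 4)                   # forget, input, output, candidate
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)  # gate values in (0, 1)
    g = np.tanh(g)                                # candidate values in (-1, 1)
    c_t = f * c_prev + i * g                      # forget old memory, add new
    h_t = o * np.tanh(c_t)                        # expose a filtered version of the state
    return h_t, c_t
```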

GRU

The GRU (Gated Recurrent Unit) is a newer variant of the RNN cell and is similar to the LSTM. A GRU has no separate forget gate; forgetting and updating are handled together. It consists of only two gates: the update gate and the reset gate.
 


Update gate

The update gate plays a role similar to the input and forget gates of the LSTM combined: it judges what information to throw away and what new information to store.
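In equation form (a sketch; biases omitted for brevity, W_z is a learned weight matrix), the update gate is:

\[
z_t = \sigma(W_z \cdot [h_{t-1}, x_t])
\]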

Reset gate

The reset gate decides how much of the past information to forget.
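In the same sketch notation, the reset gate r_t, the candidate state \tilde{h}_t, and the new state h_t are:

\[
r_t = \sigma(W_r \cdot [h_{t-1}, x_t]), \qquad \tilde{h}_t = \tanh(W \cdot [r_t \odot h_{t-1}, x_t]), \qquad h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
\]

And a minimal NumPy sketch of one GRU step under these equations (function and parameter names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W_z, W_r, W_h):
    """One GRU step with an update gate z and a reset gate r (biases omitted)."""
    hx = np.concatenate([h_prev, x_t])
    z = sigmoid(hx @ W_z)                                      # update gate: how much to renew
    r = sigmoid(hx @ W_r)                                      # reset gate: how much past to forget
    h_cand = np.tanh(np.concatenate([r * h_prev, x_t]) @ W_h)  # candidate state
    return (1 - z) * h_prev + z * h_cand                       # blend old state and candidate
```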

Applications

  • Searching the entire parametric space: beginning with all possible connections and then pruning the redundant ones, leaving us with the important ones. Since starting with ‘all’ possible links is computationally infeasible, the actual search space in these methods is usually limited.


  • Growing a cell one node at a time: this method relies on tactics used to build decision trees. After every iteration, a new node is added at the top of the graph. The tree grows from the output and ends at leaf nodes once both inputs, x_t and h_t-1, are reached.


  • Genetic Algorithms: RNN cell architectures that are the star performers in the current generation are cross-bred to produce better cell architectures in the next generation.


Frequently Asked Questions

  1. What does an RNN give as output?
    By default, the output of an RNN layer is a single vector per sample. This vector is the output of the RNN cell at the last timestep and carries information about the entire input sequence (see the sketch after this list).

  2. What is a fully connected layer in RNN?
    A fully connected layer is the layer that maps the output of the LSTM layer to the desired output size.

  3. What are the activation functions used in an RNN?
    The most commonly used activation functions in RNN modules are sigmoid, tanh, and ReLU.
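To make the first answer concrete, here is a small sketch using Keras's SimpleRNN layer (assuming TensorFlow is installed; the shapes are illustrative). With the default return_sequences=False, the layer returns one vector per sample, taken from the last timestep:

```python
import numpy as np
import tensorflow as tf

x = np.random.rand(32, 10, 8).astype("float32")  # (batch, timesteps, features)
layer = tf.keras.layers.SimpleRNN(4)             # return_sequences=False by default
out = layer(x)
print(out.shape)                                 # (32, 4): one vector per sample
```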

Key Takeaways

RNNs are suitable for making predictions on sequence data but suffer from short-term memory. LSTMs and GRUs were built to mitigate short-term memory using mechanisms called gates: internal structures that regulate the flow of information along the sequence chain. LSTMs and GRUs are used in deep learning applications like speech recognition, speech synthesis, and natural language understanding. If you’re interested in going deeper, check out our industry-oriented machine learning course curated by our faculty from Stanford University and industry experts.
