Table of contents
1. Introduction
2. Use-Cases
3. Drawbacks of ANN
4. Unrolled RNN
5. Frequently Asked Questions
6. Key Takeaways
Last Updated: Mar 27, 2024

Unrolling Recurrent Neural Network

Author Rajkeshav

Introduction

In deep learning, we have already looked at Artificial Neural Networks (ANNs) and Convolutional Neural Networks (CNNs), which are mainly used for image processing. This blog covers the Recurrent Neural Network (RNN), which is used primarily for natural language processing (NLP) tasks, although it has other use cases as well. We will see how a Recurrent Neural Network works and look at applications of RNNs in NLP and other domains, starting with some real-life use cases where sequence models are helpful.
 


Use-Cases

1. The first use case is auto-completion. You have probably used Gmail: when you start typing a sentence, it offers to complete it for you. Google has a Recurrent Neural Network embedded in it that auto-completes the sentence as we type.

Figure: Auto-Completion

2. Another use case is translation. You must have used Google Translate to quickly translate sentences from one language to another. 

Figure: Google Translate

3. The third use case is Named Entity Recognition. Here the input X is a statement, and the output Y tells us which words are a person's name, a company, or a time.

Figure: Named Entity Recognition

4. The fourth use case is Sentiment Analysis, where the input is a paragraph, such as a product review, and the network tells us whether it corresponds to one star, two stars, and so on.

Figure: Sentiment Analysis

Drawbacks of ANN

You might wonder why we can't use a simple neural network to solve these problems. All of them are called sequence modelling problems because the order of the sequence is essential.

In human language, word order matters. For example, "how are you" makes sense, but "you are how" does not. Suppose we build a simple neural network for language translation, where the input is an English statement and the output is the corresponding Hindi statement.
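The point about word order can be checked with a small sketch. It is only an illustration, not part of any translation model: it compares the two example sentences first as unordered word counts and then as ordered sequences.

```python
from collections import Counter

a = "how are you".split()
b = "you are how".split()

same_words = Counter(a) == Counter(b)   # word counts ignore order
same_sequence = a == b                  # lists compare position by position
print(same_words, same_sequence)  # True False
```

Any representation that discards order treats the two sentences as identical, which is why sequence problems need a model that processes words in order.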


  • What if the sentence size changes? We may input sentences of different lengths, and a fixed neural network architecture will not work because we have to decide in advance how many neurons are in the input and output layers. So with language translation, the number of neurons becomes a problem.


  • The second issue is too much computation. Neural networks work on numbers, not strings, so we have to convert each word into a vector. One way to do this is one-hot encoding: if there are 25 thousand words in our vocabulary, each word becomes a vector of 25 thousand entries containing a single 1. We have to do the same for the output. With every word converted into such a vector, we need a humongous layer of neurons, and the amount of computation grows accordingly.


  • The third issue is that, when we translate, two different English sentences might map to a single Hindi statement, because the same thing can be said in different ways. The neural network must then learn a different set of edges for each phrasing, because parameters are not shared.

To sum up, there are three issues with using Artificial Neural Networks for sequence problems.

  1. Variable size of input or output neurons
  2. Too much computation
  3. No parameter sharing
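The first two issues can be made concrete with a short sketch. The toy vocabulary, the fixed sentence length of 10 words, and the hidden layer width of 100 are assumptions chosen for illustration; only the 25 thousand word vocabulary comes from the example above.

```python
import numpy as np

# Hypothetical toy vocabulary, used only to show what one-hot vectors look like.
vocab = ["how", "are", "you", "today"]
word_to_index = {word: i for i, word in enumerate(vocab)}

def one_hot(word, word_to_index):
    """Return a vector of zeros with a single 1 at the word's index."""
    vec = np.zeros(len(word_to_index))
    vec[word_to_index[word]] = 1.0
    return vec

sentence = ["how", "are", "you"]
# A plain feed-forward network sees the whole sentence as one long input vector.
encoded = np.concatenate([one_hot(w, word_to_index) for w in sentence])

# Size of a fixed dense input layer for the 25 thousand word vocabulary.
vocab_size = 25_000   # from the example in the text
max_words = 10        # assumed fixed sentence length
hidden_units = 100    # assumed hidden layer width
input_neurons = vocab_size * max_words
first_layer_weights = input_neurons * hidden_units
print(input_neurons, first_layer_weights)  # 250000 25000000
```

Even with these modest assumed sizes, the first layer alone needs 25 million weights, and a sentence longer than `max_words` does not fit the network at all.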

Unrolled RNN 

Figure: The architecture of an Unrolled Recurrent Neural Network

Parameters

W = Input-to-hidden weights

U = Hidden-to-hidden weights

V = Hidden-to-output weights

W, U, and V are all shared across the time slices.

A Recurrent Neural Network is a neural network in which the output of previous time slices is fed as input to the current time slice. We start with an initial state, and from the initial state and the first input value we compute the hidden activation. After that, using V, we calculate the output. Taken apart, each time slice is a small network that can be trained with backpropagation, but the slices are connected to one another through the hidden layer across time.
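The computation for one time slice can be written out as a pair of equations. This is a sketch under common assumptions that the text does not specify: a tanh activation and no bias terms. Here x_t is the input at time slice t, h_t the hidden state, and y_t the output; these symbol names are introduced for illustration.

```latex
h_t = \tanh\left(W x_t + U h_{t-1}\right), \qquad y_t = V h_t
```

The same W, U, and V appear for every t, which is what parameter sharing means here.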

Every time we feed in a new input, we get an output, and an error (E) is computed. The error values are therefore spread across the different time slices of the model. In an unrolled model with n time slices, we feed the n inputs in one after the other, one per time slice, and share the weights across all time slices. We then compute the hidden states h0, h1, and so on.

We can unroll a recurrent neural network through time, share the hidden weights across time, and compute the output in the same manner as for an ordinary neural network. The only change is how the h values are computed: each one depends on the previous h value stored in memory, the current input, and the weights that connect them to the hidden layer.
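The unrolled forward pass described above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming a tanh activation, no bias terms, and small made-up layer sizes; the function name `rnn_forward` is hypothetical.

```python
import numpy as np

def rnn_forward(inputs, W, U, V, h0):
    """Run an RNN over a sequence, reusing W, U, V at every time slice."""
    h = h0
    hidden_states, outputs = [], []
    for x in inputs:                 # one loop iteration per time slice
        h = np.tanh(W @ x + U @ h)   # new hidden state from input and previous state
        hidden_states.append(h)
        outputs.append(V @ h)        # output for this time slice
    return hidden_states, outputs

rng = np.random.default_rng(0)
input_size, hidden_size, output_size = 4, 3, 2   # assumed toy sizes
W = rng.normal(size=(hidden_size, input_size))   # input to hidden
U = rng.normal(size=(hidden_size, hidden_size))  # hidden to hidden
V = rng.normal(size=(output_size, hidden_size))  # hidden to output
h0 = np.zeros(hidden_size)                       # initial state

short_seq = [rng.normal(size=input_size) for _ in range(3)]
long_seq = [rng.normal(size=input_size) for _ in range(7)]

_, short_out = rnn_forward(short_seq, W, U, V, h0)
_, long_out = rnn_forward(long_seq, W, U, V, h0)
print(len(short_out), len(long_out))  # 3 7
```

The same three weight matrices handle a sequence of 3 inputs and a sequence of 7 inputs, so the variable input size and parameter sharing issues of the fixed network do not arise.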

 

An RNN has to be unrolled by the length of the input sequence. When an RNN is unrolled N times, its neurons are replicated N times inside the network, which consumes a humongous amount of memory, especially for long input sequences. This obstructs small-footprint implementations of online learning or adaptation. This "full unrolling" also makes parallel training with multiple rows inefficient on shared-memory models such as graphics processing units (GPUs).
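A rough estimate shows how the memory for an unrolled network grows with N. This sketch counts only one stored hidden state per time slice and assumes 4-byte floating point values; the hidden size of 512 and batch size of 32 are made-up figures, and a real training framework keeps additional intermediate values.

```python
def stored_activation_bytes(time_steps, hidden_size, batch_size=1, bytes_per_value=4):
    """Bytes needed to keep one hidden state per time slice for backpropagation."""
    return time_steps * batch_size * hidden_size * bytes_per_value

for n in (10, 100, 1000):
    print(n, stored_activation_bytes(n, hidden_size=512, batch_size=32))
# 10 655360
# 100 6553600
# 1000 65536000
```

The weights are shared and stored once, but the per-slice activations grow linearly with the number of time slices.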

Figure: A Recurrent Neural Network with cycles

Figure: Unrolled original Recurrent Neural Network with no cycles

Frequently Asked Questions 

1. What are the three major drawbacks of Artificial Neural Networks?

  1. Variable size of input or output neurons
  2. Too much computation
  3. No parameter sharing

2. What is the major drawback of Recurrent Neural Networks?

=>  The memory required to train a large recurrent network can grow quickly as the number of time steps climbs into the hundreds.

3. What are Recurrent Neural Networks?

=>  Recurrent Neural Networks are a type of neural network in which the output of previous time slices is fed as input to the current time slice.

4. What are the characteristics of unrolled Recurrent Neural Networks?

=> An unrolled recurrent neural network is acyclic. It forms a directed acyclic network, so its nodes can be put in a topological order.

5. What are the applications of Recurrent Neural Networks?

  1.  Speech recognition
  2.  Image captioning
  3.  Stock price prediction, etc. 

Key Takeaways

In this blog, we learned why Recurrent Neural Networks are needed and what the drawbacks of Artificial Neural Networks are for sequence problems. We also saw how a Recurrent Neural Network can be unrolled into a directed acyclic network.
