Introduction
In deep learning, we have looked at Artificial Neural Networks and Convolutional Neural Networks (mainly used for image processing). This blog talks about the Recurrent Neural Network, used primarily in Natural Language Processing tasks. Just as CNNs are suited to images, RNNs are suited to natural language, though there are other use cases as well. We will understand how a Recurrent Neural Network works, look at different applications of RNNs in natural language processing and other domains, and go through some real-life use cases where sequence models are helpful.
1. You must have used Google Search or Gmail. When you type a sentence, it auto-completes it. Google has a Recurrent Neural Network embedded here: as we type in any sentence, the network suggests how to complete it.
3. The third use case is "Named Entity Recognition," where the input X is a statement given to the neural network, and in the output Y, the network identifies the person's name, the company, and the time.
Now, you might think: can't we use a simple neural network to solve these problems? All of these are called sequence modelling problems because the order of the sequence is essential.
When it comes to human language, word order is necessary. For example, "how are you" makes sense, while "you are how" does not. So for language translation, we will build this kind of neural network. Once we build this network, the input is the English statement, and the output could be the Hindi statement.
What if my sentence size changes? We might input sentences of different sizes, and a fixed neural network architecture will not work, because we have to decide in advance how many neurons are in the input and output layers. So with language translation, the number of neurons becomes a problem.
The second issue is too much computation. Neural networks work on numbers, not strings, so we have to convert each word into a vector. One way of doing this is one-hot encoding: suppose there are 25,000 words in our vocabulary; then each word becomes a 25,000-dimensional vector with a single 1. We have to do the same for the output. This greatly increases computation, since each word must be converted into a vector, and we need a humongous layer of neurons.
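The one-hot idea above can be sketched in a few lines. This is a minimal illustration with a toy five-word vocabulary (a real system, as noted, might have around 25,000 words, making each vector 25,000-dimensional); the vocabulary and function names here are our own choices, not from any particular library.

```python
# A toy vocabulary; a real one could have ~25,000 words.
vocab = ["how", "are", "you", "hello", "world"]
word_to_index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    # Vector of zeros with a single 1 at the word's index.
    vec = [0] * len(vocab)
    vec[word_to_index[word]] = 1
    return vec

print(one_hot("you"))  # [0, 0, 1, 0, 0]
```

With 25,000 words, every input and output word costs a 25,000-element vector, which is where the computational blow-up comes from.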
The third issue is that when we translate language, two different English sentences might map to a single Hindi statement, because the same thing can be said in different ways. With a plain neural network, each input position must learn a different set of edge weights, because parameters are not shared.
Summing up, there are three issues with using Artificial Neural Networks for sequence problems:
Variable size of input or output neurons
Too much computation
No parameter sharing
Unrolled RNN
The architecture of an Unrolled Recurrent Neural Network
W = Input to hidden weights
U = Hidden to hidden weights
V = Hidden to output weights
All of W, U, V are shared parameters.
A Recurrent Neural Network is a neural network in which the output of previous time slices is fed as input to the current time slice. We start with an initial state, and using the initial state and the first input value, we compute the activation. After that, using V, we compute the output. Each time slice, taken apart, is a small feedforward network, but the slices are connected through the hidden layer through time.
Every time we feed in a new input, we get an output, and an error E is computed. That means error values are spread across the different time slices of the model. In the unrolled model there are n time slices; we feed the n inputs into the network one after another, sharing the weights across the time slices, and compute the hidden states h0, h1, and so on.
We can unroll a recurrent neural network through time, share the hidden weights across time, and compute the output in the same manner as for an ordinary neural network. The only change is in the computation of the hidden values: each h is computed from the previous value stored in memory, so the current input and the previous hidden state together feed the hidden layer.
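The unrolled forward pass described above can be sketched with NumPy. This is a minimal illustration, assuming a tanh activation and small toy dimensions (both are common choices, not specifics from the blog): at each time slice, the new hidden state is computed from the current input via W and the previous hidden state via U, and the output via V, with the same three weight matrices reused at every step.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim = 4, 3, 2

W = rng.normal(size=(hidden_dim, input_dim))   # input -> hidden
U = rng.normal(size=(hidden_dim, hidden_dim))  # hidden -> hidden
V = rng.normal(size=(output_dim, hidden_dim))  # hidden -> output

def forward(inputs):
    # W, U, V are reused at every time slice: parameter sharing.
    h = np.zeros(hidden_dim)  # initial hidden state h0
    outputs = []
    for x_t in inputs:
        h = np.tanh(W @ x_t + U @ h)  # new state from input + memory
        outputs.append(V @ h)         # output for this time slice
    return outputs

sequence = [rng.normal(size=input_dim) for _ in range(5)]
outputs = forward(sequence)  # one output per time step
```

Note that the loop carries `h` forward, which is exactly the "connected through the hidden layer through time" structure; unrolling just means drawing one copy of this computation per time step.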
RNNs must be unrolled to the length of the input sequence. When an RNN is unrolled N times, the neurons are replicated N times inside the network, consuming a humongous amount of memory, especially for long input sequences. This obstructs small-footprint implementations of online learning or adaptation. This "full unrolling" also makes parallel training with multiple sequences inefficient on shared-memory devices such as graphics processing units (GPUs).
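A rough back-of-the-envelope sketch makes the memory point concrete: backpropagation through time must keep every hidden state h_t around, so activation memory grows linearly with the number of unrolled time steps. The dimensions and float size below are illustrative assumptions, not figures from the blog.

```python
# Illustrative assumptions: 512 hidden units stored as float32.
hidden_dim = 512
bytes_per_float = 4

def activation_memory_bytes(time_steps):
    # One hidden-state vector must be kept per unrolled time step.
    return time_steps * hidden_dim * bytes_per_float

for steps in (10, 100, 1000):
    print(steps, "steps ->", activation_memory_bytes(steps), "bytes")
```

Per sequence this is still modest, but multiplied across large batches, deeper stacks, and much wider layers, full unrolling over long sequences quickly dominates memory.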
Unrolled original Recurrent Neural Network with no cycles
Frequently Asked Questions
1. What are the three major drawbacks of Artificial Neural Networks?
Variable size of input or output neurons
Too much computation
No parameter sharing
2. What is the major drawback of Recurrent Neural Network?
=> The memory required for training an extensive recurrent network can grow quickly as the number of time steps climbs into the hundreds.
3. What are Recurrent Neural Networks?
=> Recurrent Neural Networks are a type of neural network in which the output of previous time slices is fed as input to the current time slice.
4. What are the characteristics of unrolled Recurrent Neural Networks?
=> An unrolled recurrent neural network is acyclic, i.e. a directed acyclic network, so its nodes admit a topological ordering.
5. What are the applications of Recurrent Neural Networks?
Speech recognition
Image captioning
Stock price prediction, etc.
Key Takeaways
In this blog, we learned the requirements of Recurrent Neural Networks and the drawbacks of Artificial Neural Networks. We also saw how a Recurrent Neural Network could be unfolded into directed acyclic components.