Introduction
Recurrent Neural Networks (RNNs) are a class of artificial neural networks that are designed to tackle sequential data analysis. They have an internal memory mechanism that efficiently processes sequential inputs such as text, speech, and time-series data.
This article will teach you about Recurrent Neural Networks, and you will build a simple language model using PyTorch.
Let’s get started.
What is a Recurrent Neural Network?
A Recurrent Neural Network (RNN) is a type of artificial neural network designed to process sequential data, like time series or natural language. It utilizes feedback connections, allowing the network to maintain a hidden state that captures information from previous steps in the sequence.
What is Sequential Data?
Sequential data consists of inputs that are interconnected through time, like words in a sentence or successive data points in a time series. The order of the elements matters and affects how the data is interpreted or analyzed.
Some examples of sequential data include time series data (e.g., stock prices over days) and text sequences (e.g., book sentences).
In the next section, we will look at the architecture of a simple recurrent neural network.
Architecture of a Recurrent Neural Network
The architecture of an RNN is designed to handle sequential data by using feedback connections that allow the network to maintain a memory of past inputs. This memory, also known as the hidden state, enables the RNN to capture hidden patterns and temporal relationships in the input sequence.
Let’s take a closer look at each component of an RNN.
Input Layer
At each sequence step, the RNN takes an input vector representing the current element in the sequence. For example, this could be the vector representation of a word in a given sentence in language processing.
Hidden State
The RNN maintains a hidden state that changes as the sequence progresses. The hidden state stores information from previous steps and serves as the memory of the network.
Recurrent Connections
The hidden state from the previous step is combined with the current input to produce an output and update the hidden state for the current step. This looping connection allows the RNN to capture relations among sequential inputs.
Weight Sharing
The same set of weights and biases is used for each step in the sequence. This weight sharing enables the RNN to learn and apply the same patterns to different elements in the sequence.
Output Layer
The output layer produces a prediction or output based on the combination of the current input and the hidden state. This output can be used for various tasks, depending on the application, such as predicting the next word in a sentence or forecasting future values in a time series.
Now that you are familiar with the architecture of a simple recurrent neural network, let’s understand how exactly the outputs and hidden state values are calculated.
Inner Workings of a Recurrent Neural Network
At each step, the hidden state from the previous time step and the current input are multiplied by weight matrices, and the result is passed through an activation function, which introduces non-linearity into the computation. Some commonly used activation functions are:-
Sigmoid Function
Hyperbolic Tangent Function
Rectified Linear Unit
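For reference, here is a minimal NumPy sketch of these three activation functions (illustrative definitions only):
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))   # squashes values into (0, 1)

def tanh(x):
    return np.tanh(x)             # squashes values into (-1, 1)

def relu(x):
    return np.maximum(0, x)       # zeroes out negative values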
The hidden state and output at step "t" are calculated using the following formulas:-
hidden_t = f(weight_input · input_t + weight_hidden · hidden_(t-1))
output_t = weight_output · hidden_t
Where f is the activation function, and weight_input, weight_hidden, and weight_output are weight matrices.
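A minimal NumPy sketch of a single recurrent step is shown below. The names mirror the weight matrices in the formulas above; bias terms are omitted for simplicity, and the function itself is only illustrative.
import numpy as np

def rnn_step(input_t, hidden_prev, weight_input, weight_hidden, weight_output):
    # Combining the current input with the previous hidden state
    hidden_t = np.tanh(weight_input @ input_t + weight_hidden @ hidden_prev)
    # Producing the output for the current step
    output_t = weight_output @ hidden_t
    return output_t, hidden_t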
Recurrent neural networks are trained to produce the desired outputs for a given set of inputs.
The training process involves the following steps:-
Forward Pass
For each time step in the input sequence, the RNN performs the computations as we discussed earlier and passes the hidden state to the next step.
Comparison with Target
The output generated by the RNN is compared to the actual target value at the corresponding time step. A loss value is calculated, which measures how far the predicted value is from the actual value.
Accumulating Loss
The losses across all time steps are accumulated to calculate an overall loss value for the entire sequence. The gradients of this loss are then calculated with respect to the weight matrices used in the network, indicating how much each weight matrix contributed to the error.
Backpropagation
The computed gradients are backpropagated through each step to update the three weight matrices discussed above. Over repeated iterations, this process allows the model to learn to produce accurate outputs.
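As a rough PyTorch sketch of this procedure, the snippet below runs an RNN cell step by step, accumulates the loss across time steps, and backpropagates once at the end. The tensors and sizes are hypothetical placeholders.
import torch
from torch import nn

rnn_cell = nn.RNNCell(input_size=8, hidden_size=16)
readout = nn.Linear(16, 8)
loss_fn = nn.MSELoss()

inputs = torch.randn(5, 1, 8)    # 5 time steps, batch of 1, 8 features
targets = torch.randn(5, 1, 8)

hidden = torch.zeros(1, 16)      # initial hidden state
loss = 0.0
for t in range(inputs.size(0)):
    hidden = rnn_cell(inputs[t], hidden)         # forward pass for step t
    output = readout(hidden)                     # prediction for step t
    loss = loss + loss_fn(output, targets[t])    # accumulating the loss

loss.backward()  # backpropagation through time computes the gradients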
In the next section, we will implement a simple RNN character model with PyTorch.
Building a Simple RNN using PyTorch
We will build a model to complete a sentence based on a few characters or words given as input. Before starting, you should install NumPy and PyTorch using the following command:-
pip install torch numpy
Once you have installed them, follow the steps below:-
Step 1: Importing Packages and Defining Sentences
import torch
from torch import nn
import numpy as np
text = ['hello how are you', 'i am fine', 'how are you doing']
chars = set(''.join(text))
numtochar = dict(enumerate(chars))
chartonum = {char: ind for ind, char in numtochar.items()}
Here, we have extracted unique characters from the combined sentences and created two dictionaries – one maps characters to integer indices, and the other maps integer indices back to characters.
Step 2: Padding Input Sentences
We will pad the input sentences to ensure they all have the same length by appending spaces to the shorter sentences.
maxsize = len(max(text, key=len))
# Padding each sentence with spaces until it matches the longest one
for i in range(len(text)):
    while maxsize > len(text[i]):
        text[i] += ' '
Step 3: Defining Input and Target Sequences
As the model will predict the next character in the sequence at each time step, we will have to divide each input sentence into two sequences:-
Input Sequence: all characters in the sentence except the last one
Target Sequence: all characters except the first one (i.e., the input shifted one step ahead)
Then, we will convert these sequences to integers by mapping them using the dictionaries we created earlier.
iseq = []
tseq = []
for i in range(len(text)):
    # Input sequence: all characters except the last one
    iseq.append(text[i][:-1])
    # Target sequence: all characters except the first one
    tseq.append(text[i][1:])
    print("Input Sequence: {}\nTarget Sequence: {}".format(iseq[i], tseq[i]))

# Mapping the characters in each sequence to their integer indices
for i in range(len(text)):
    iseq[i] = [chartonum[character] for character in iseq[i]]
    tseq[i] = [chartonum[character] for character in tseq[i]]
Output:
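Input Sequence: hello how are yo
Target Sequence: ello how are you
Input Sequence: i am fine
Target Sequence:  am fine
Input Sequence: how are you doin
Target Sequence: ow are you doing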
Step 4: One-Hot Encoding
This technique is used to represent categorical data, such as words or characters, in a binary format. For example, with a dictionary of 5 characters, the character at index 2 would be encoded as [0, 0, 1, 0, 0].
dsize = len(chartonum)
slen = maxsize - 1
bsize = len(text)
def encode(sequence, dsize, slen, bsize):
    # Creating an array of zeros with shape (batch size, sequence length, dictionary size)
    features = np.zeros((bsize, slen, dsize), dtype=np.float32)
    # Replacing the 0 at each character's index with a 1
    for i in range(bsize):
        for u in range(slen):
            features[i, u, sequence[i][u]] = 1
    return features

iseq = encode(iseq, dsize, slen, bsize)
Here, we defined the encode() function that creates arrays of zeros for each character and replaces the corresponding character index with a 1.
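As a quick sanity check, the encoded input is now a NumPy array of shape (batch size, sequence length, dictionary size):
print(iseq.shape)   # (3, 16, dsize) for the three padded sentences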
Step 5: Converting to Tensors
After all the data pre-processing, we can now move the data from NumPy arrays to PyTorch tensors.
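A minimal sketch of this conversion is shown below; it also selects the device (GPU if available, otherwise CPU) that the tensors and model will use later.
# Converting the one-hot encoded inputs and the integer targets to tensors
iseq = torch.from_numpy(iseq)
tseq = torch.Tensor(tseq)

# Using a GPU if one is available, otherwise falling back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')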
Step 6: Defining the Model
We will define the RNN_Model class, which inherits from PyTorch's nn.Module base class. The constructor initializes the model's parameters – input_size (dimension of the input data), output_size (dimension of the output), dim (number of hidden units in the RNN), and layers (number of RNN layers).
class RNN_Model(nn.Module):
    def __init__(self, input_size, output_size, dim, layers):
        super(RNN_Model, self).__init__()
        self.n_layers = layers
        self.dim = dim
        # RNN layer followed by a fully connected output layer
        self.rnn = nn.RNN(input_size, dim, layers, batch_first=True)
        self.fc = nn.Linear(dim, output_size)

    def forward(self, x):
        batch_size = x.size(0)
        # Initializing the hidden state on the same device as the input
        hidden = self.init_first(batch_size).to(x.device)
        # Passing the input and hidden state into the model
        out, hidden = self.rnn(x, hidden)
        # Reshaping the outputs so they can be fed to the fully connected layer
        out = out.contiguous().view(-1, self.dim)
        out = self.fc(out)
        return out, hidden

    def init_first(self, batch_size):
        # This method generates the first (all-zero) hidden state
        hidden = torch.zeros(self.n_layers, batch_size, self.dim)
        return hidden
The forward() method defines how data flows through the model during training, and the init_first() method initializes the hidden state for the RNN’s first input.
Step 7: Instantiating the Model and Defining Hyperparameters
After defining the model above, we'll instantiate the model with the relevant parameters and define the following hyperparameters:-
epochs: This hyperparameter specifies the number of times the model will go through the training set.
lr: This is the learning rate that affects the rate at which our model updates the weights each time backpropagation is done.
We will also define the loss function and the optimizer.
# Instantiating the model
model = RNN_Model(input_size=dsize, output_size=dsize, dim=12, layers=1)
# Defining hyperparameters
epochs = 100
lr=0.01
# Defining Loss and Optimizer functions
centloss = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
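Step 8: Training the Model
Below is a minimal sketch of a training loop that uses the loss function and optimizer defined above. Note that CrossEntropyLoss expects integer class indices, so the target tensor is flattened and cast to long before computing the loss.
# Moving the model and data to the device selected earlier
model = model.to(device)
iseq = iseq.to(device)
tseq = tseq.to(device)

for epoch in range(1, epochs + 1):
    optimizer.zero_grad()                          # clearing gradients from the previous step
    output, hidden = model(iseq)                   # forward pass over the whole batch
    loss = centloss(output, tseq.view(-1).long())  # comparing predictions with targets
    loss.backward()                                # backpropagation through time
    optimizer.step()                               # updating the weights
    if epoch % 10 == 0:
        print('Epoch: {}/{} ......... Loss: {:.4f}'.format(epoch, epochs, loss.item()))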
Step 9: Defining Prediction Functions
We will define two helper functions – predict() and sample(). We will use them to get an output from our model and convert it back into text.
def predict(model, character):
    # Encoding the input characters to one-hot (binary) form
    character = np.array([[chartonum[c] for c in character]])
    character = encode(character, dsize, character.shape[1], 1)
    character = torch.from_numpy(character)
    character = character.to(device)

    out, hidden = model(character)

    # Picking the character with the highest probability as the prediction
    prob = nn.functional.softmax(out[-1], dim=0).data
    char_ind = torch.max(prob, dim=0)[1].item()
    return numtochar[char_ind], hidden
def sample(model, out_len, start='hey'):
    model.eval()
    start = start.lower()
    # Splitting the starting text into individual characters
    chars = [ch for ch in start]
    size = out_len - len(chars)
    # Predicting one character at a time until the desired length is reached
    for ii in range(size):
        char, h = predict(model, chars)
        chars.append(char)
    return ''.join(chars)
Step 10: Testing
Now, we will test the model with some input words.
print(sample(model, 23, 'hello'))
print(sample(model, 23, 'hello i am'))
Output:
You can see that our model accurately completes the sentences using the input words.
In the next section, we will discuss the drawbacks of recurrent neural networks.
Drawbacks of Recurrent Neural Networks
The following are some limitations/drawbacks of traditional recurrent neural networks:-
Vanishing Gradient Problem
When loss gradients become extremely small as they are backpropagated, the network struggles to learn long-range relationships between distant time steps. This is known as the vanishing gradient problem.
Short-Term Memory
As the length of the input sequence increases, the hidden state struggles to retain information from the earliest steps, which leads to the loss of relevant information.
Exploding Gradient Problem
RNNs can also suffer from the exploding gradient problem, where gradients grow exponentially as they are propagated through steps. This can cause instability during training.
To deal with these problems, many advanced recurrent neural network architectures have been developed over the years. We will discuss them briefly in the next section.
Variants of RNN
The following RNN architectures handle long-range dependencies and vanishing/exploding gradient issues:-
Long Short-Term Memory (LSTM)
These networks use gating mechanisms to handle long-range relationships and the vanishing gradient problem. These gating mechanisms control how much information is added or discarded from the cell state. LSTM uses the following gating mechanisms:-
Input Gate: Controls how much new information is added to the cell state.
Forget Gate: Controls what information should be discarded from the cell state.
Output Gate: Determines what information from the cell state should be included in the output.
Gated Recurrent Unit (GRU)
GRU is a simplified version of LSTM that combines the hidden and cell states into a single vector. It only uses the following two gating mechanisms:-
Reset Gate: This determines the amount of the previous hidden state that can be forgotten.
Update Gate: Decides how much of the new input and the previous hidden state will be combined.
LSTM and GRU architectures can both capture long-range relationships and mitigate the vanishing gradient problem, but the GRU is simpler and less computationally intensive.
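In PyTorch, swapping the plain RNN layer for an LSTM or a GRU is a small change. The sketch below uses illustrative sizes (input size 16, hidden size 12) and shows that an LSTM additionally carries a cell state.
import torch
from torch import nn

# Drop-in alternatives to nn.RNN (illustrative sizes)
gru = nn.GRU(input_size=16, hidden_size=12, num_layers=1, batch_first=True)
lstm = nn.LSTM(input_size=16, hidden_size=12, num_layers=1, batch_first=True)

x = torch.randn(3, 10, 16)   # (batch size, sequence length, input size)
h0 = torch.zeros(1, 3, 12)   # initial hidden state
c0 = torch.zeros(1, 3, 12)   # initial cell state (LSTM only)

out_gru, h_gru = gru(x, h0)                     # GRU keeps a single hidden state
out_lstm, (h_lstm, c_lstm) = lstm(x, (h0, c0))  # LSTM also carries a cell state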
In the next section, we will take a look at some real-world applications of recurrent neural networks.
Real-World Applications of RNN
The following are some real-world applications of recurrent neural networks:-
Language Generation: RNNs are used to generate text, such as in chatbots, language translation, and text generation tasks.
Sentiment Analysis: RNNs can analyze the sentiment of a piece of text, helping determine the emotional tone of the content.
Speech Recognition: RNNs convert spoken language into written text, which can be used to implement voice assistants and transcription services.
Stock Price Prediction: RNNs can also predict stock prices by learning patterns and trends from historical data.
Weather Forecasting: RNNs are used to predict weather patterns and conditions based on historical weather data.
Frequently Asked Questions
What are weight matrices in an RNN?
Weight matrices in a Recurrent Neural Network (RNN) are learnable parameters that determine how input and hidden state information is transformed and combined during the network's computations.
What is an optimizer function?
An optimizer function in PyTorch is a tool used during neural network training to adjust the model's parameters (weights and biases) based on the calculated gradients.
What are hyperparameters?
Hyperparameters are the settings or configurations that are chosen before training a machine learning model. Unlike model parameters, which are learned during training, hyperparameters are set by the machine learning engineer or researcher.
Conclusion
In this article, you learned about the basics of a recurrent neural network. We looked at a simple RNN architecture and how it computes output values. We also built a simple RNN that can complete sentences based on some input words or characters.