Last Updated: Mar 27, 2024

Recurrent Neural Networks with PyTorch

Author Abhinav Anand

Introduction

Recurrent Neural Networks (RNNs) are a class of artificial neural networks that are designed to tackle sequential data analysis. They have an internal memory mechanism that efficiently processes sequential inputs such as text, speech, and time-series data.

In this article, you will learn about recurrent neural networks and build a simple character-level language model using PyTorch.

Let’s get started.

What is a Recurrent Neural Network?

A Recurrent Neural Network (RNN) is a type of artificial neural network designed to process sequential data, like time series or natural language. It utilizes feedback connections, allowing the network to maintain a hidden state that captures information from previous steps in the sequence.

What is Sequential Data?

Sequential data consists of inputs that are connected through time, like words in a sentence or successive data points in a time series. The order of the elements matters, because it affects how the data is interpreted or analyzed.

Some examples of sequential data include time series data (e.g., stock prices over days) and text sequences (e.g., book sentences).

In the next section, we will look at the architecture of a simple recurrent neural network.

Architecture of a Recurrent Neural Network

The architecture of an RNN is designed to handle sequential data by using feedback connections that allow the network to maintain a memory of past inputs. This memory, also known as the hidden state, enables the RNN to capture hidden patterns and temporal relationships in the input sequence.

[Figure: Architecture of an RNN]

Let’s take a closer look at each component of an RNN.

Input Layer

At each step of the sequence, the RNN takes an input vector representing the current element. In language processing, for example, this could be the vector representation of one word in a sentence.

Hidden State

The RNN maintains a hidden state that changes as the sequence progresses. The hidden state stores information from previous steps and serves as the memory of the network.

Recurrent Connections

The hidden state from the previous step is combined with the current input to produce an output and update the hidden state for the current step. This looping connection allows the RNN to capture relations among sequential inputs.

Weight Sharing 

The same set of weights and biases is used for each step in the sequence. This weight sharing enables the RNN to learn and apply the same patterns to different elements in the sequence.

Output Layer

The output layer produces a prediction or output based on the combination of the current input and the hidden state. This output can be used for various tasks, depending on the application, such as predicting the next word in a sentence or forecasting future values in a time series.

Now that you are familiar with the architecture of a simple recurrent neural network, let’s understand how exactly the outputs and hidden state values are calculated.

Inner Workings of a Recurrent Neural Network

At each step, the hidden state from the previous time step and the current input are each multiplied by a weight matrix, the two products are added together, and the sum is passed to an activation function, which introduces non-linearity into the computation. Some commonly used activation functions are:-

  • Sigmoid Function
     
  • Hyperbolic Tangent Function
     
  • Rectified Linear Unit
     

The hidden state and output for step t are calculated using the following formulas:-

$$\text{hidden}_t = f(W_{\text{hidden}} \cdot \text{hidden}_{t-1} + W_{\text{input}} \cdot \text{input}_t)$$
$$\text{output}_t = W_{\text{output}} \cdot \text{hidden}_t$$

Here, f is the activation function, and W_input, W_hidden, and W_output are the weight matrices for the input, the hidden state, and the output.
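The two formulas can be traced by hand in a few lines of NumPy. The following is a minimal sketch rather than what PyTorch does internally: the sizes, the random weights, and the names rnn_forward, w_input, w_hidden, and w_output are illustrative assumptions, biases are left out to match the formulas, and tanh is assumed as the activation function f.

```python
import numpy as np

def rnn_forward(inputs, w_input, w_hidden, w_output):
    """Apply the two RNN formulas step by step, reusing the same weights each time."""
    hidden = np.zeros(w_hidden.shape[0])  # initial hidden state (all zeros)
    outputs = []
    for x_t in inputs:
        # hidden_t = f(W_hidden . hidden_{t-1} + W_input . input_t), with f = tanh
        hidden = np.tanh(w_hidden @ hidden + w_input @ x_t)
        # output_t = W_output . hidden_t
        outputs.append(w_output @ hidden)
    return np.stack(outputs), hidden

rng = np.random.default_rng(0)
input_size, hidden_size, output_size, steps = 4, 3, 2, 5  # illustrative sizes
w_input = rng.normal(size=(hidden_size, input_size))
w_hidden = rng.normal(size=(hidden_size, hidden_size))
w_output = rng.normal(size=(output_size, hidden_size))

sequence = rng.normal(size=(steps, input_size))
outputs, last_hidden = rnn_forward(sequence, w_input, w_hidden, w_output)
print(outputs.shape, last_hidden.shape)  # (5, 2) (3,)
```

Note that the loop reuses the same three matrices at every step, which is the weight sharing described in the architecture section.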

A recurrent neural network is trained by showing it input sequences together with the desired outputs and adjusting its weight matrices until its predictions match those outputs closely.

The training process involves the following steps:-

Forward Pass

For each time step in the input sequence, the RNN performs the computations as we discussed earlier and passes the hidden state to the next step.

Comparison with Target

The output generated by the RNN is compared to the actual target value at the corresponding time step. A loss value is calculated, which measures how far the predicted value is from the actual value.

Accumulating Loss

The losses across all time steps are accumulated to calculate an overall loss value for the entire sequence, and the gradients are calculated with respect to the weight matrices used in the network, which indicates how much each weight matrix contributed to the error.

Backpropagation

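The computed gradients are propagated backward through each time step, a procedure known as backpropagation through time, and are used to update the three weight matrices discussed above. Repeating this process over many passes gradually reduces the loss, so the model's outputs move closer to the targets.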
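The comparison and accumulation steps can be sketched with plain NumPy. This is only an illustration under stated assumptions: the scores and targets are made-up numbers, cross_entropy is a hypothetical helper written for this sketch, and cross-entropy is assumed as the loss because it is the one used later in this article. In the actual model, PyTorch computes both the loss and the gradients.

```python
import numpy as np

def cross_entropy(scores, target_index):
    """Cross-entropy loss for one time step, given raw output scores."""
    shifted = scores - scores.max()  # subtract the max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[target_index]

# Made-up output scores for 3 time steps over a 4-character vocabulary
step_scores = np.array([
    [2.0, 0.1, 0.1, 0.1],
    [0.1, 0.1, 3.0, 0.1],
    [0.5, 0.5, 0.5, 0.5],
])
targets = [0, 2, 1]  # index of the correct character at each step

step_losses = [cross_entropy(s, t) for s, t in zip(step_scores, targets)]
total_loss = sum(step_losses)  # accumulated over the whole sequence
print(step_losses, total_loss)
```

The third step gives every character the same score, so its loss equals ln(4), the loss of a uniform guess over four characters.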

In the next section, we will implement a simple RNN character model with PyTorch.

Building a Simple RNN using PyTorch

We will build a model that completes a sentence given its first few characters or words. Before starting, install NumPy and PyTorch using the following command:-

pip install torch numpy

 

Once you have installed them, follow the steps below:-

Step 1: Importing Packages and Defining Sentences

import torch
from torch import nn
import numpy as np

text = ['hello how are you', 'i am fine', 'how are you doing']

chars = set(''.join(text))

numtochar = dict(enumerate(chars))

chartonum = {char: ind for ind, char in numtochar.items()}

 

Here, we have extracted unique characters from the combined sentences and created two dictionaries – one maps characters to integer indices, and the other maps integer indices back to characters.
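Because a Python set has no guaranteed order, the index assigned to each character in the code above can differ from one run to the next. One possible variation, shown here as a sketch, sorts the characters first so that the mapping is reproducible. The names sorted_chars, index_to_char, and char_to_index are illustrative and are not used in the rest of the article.

```python
text = ['hello how are you', 'i am fine', 'how are you doing']

# Sorting makes the character-to-index mapping identical on every run
sorted_chars = sorted(set(''.join(text)))
index_to_char = dict(enumerate(sorted_chars))
char_to_index = {char: ind for ind, char in index_to_char.items()}

# Round trip: characters -> indices -> characters
encoded = [char_to_index[c] for c in 'hello']
decoded = ''.join(index_to_char[i] for i in encoded)
print(len(sorted_chars), encoded, decoded)
```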

Step 2: Padding Input Sentences

We will pad the input sentences with trailing spaces so that the shorter sentences reach the length of the longest one.

maxsize = len(max(text, key=len))
for i in range(len(text)):
    while maxsize>len(text[i]):
        text[i] += ' '
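The same padding can also be written with the built-in str.ljust method. This is an equivalent sketch of the loop above, not a required change; the name padded is introduced only for illustration.

```python
text = ['hello how are you', 'i am fine', 'how are you doing']

maxsize = max(len(sentence) for sentence in text)
# str.ljust pads on the right with spaces up to the requested width
padded = [sentence.ljust(maxsize) for sentence in text]

print(maxsize, [len(sentence) for sentence in padded])  # 17 [17, 17, 17]
```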

Step 3: Defining Input and Target Sequences

As the model will predict the next character in the sequence at each time step, we will have to divide each input sentence into two sequences:-

  • Input Sequence
     
  • Target Sequence
     

Then, we will convert these sequences to integers by mapping them using the dictionaries we created earlier.

iseq = []
tseq = []

for i in range(len(text)):
    iseq.append(text[i][:-1])
    tseq.append(text[i][1:])
    print("Input Sequence: {}\nTarget Sequence: {}".format(iseq[i], tseq[i]))

for i in range(len(text)):
    iseq[i] = [chartonum[character] for character in iseq[i]]
    tseq[i] = [chartonum[character] for character in tseq[i]]

 

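Output:

Input Sequence: hello how are yo
Target Sequence: ello how are you
Input Sequence: i am fine
Target Sequence:  am fine
Input Sequence: how are you doin
Target Sequence: ow are you doing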

Step 4: One-Hot Encoding

One-hot encoding represents categorical data, such as words or characters, in a binary format: each character becomes a vector of zeros with a single 1 at the position of that character's index.

dsize = len(chartonum)
slen = maxsize - 1
bsize = len(text)

def encode(sequence, dsize, slen, bsize):
    features = np.zeros((bsize, slen, dsize), dtype=np.float32)
    
    # Replacing 0s at the character index with 1
    for i in range(bsize):
        for u in range(slen):
            features[i, u, sequence[i][u]] = 1
    return features

iseq = encode(iseq, dsize, slen, bsize)

 

Here, we defined the encode() function that creates arrays of zeros for each character and replaces the corresponding character index with a 1.
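For comparison, the same encoding can be produced without explicit loops by indexing into an identity matrix. This is an alternative sketch, not the function used in the rest of the article; one_hot and the sample indices are made up for illustration.

```python
import numpy as np

def one_hot(index_sequences, vocab_size):
    """One-hot encode a (batch, steps) array of integer indices."""
    indices = np.asarray(index_sequences)
    # Row i of the identity matrix is the one-hot vector for index i
    return np.eye(vocab_size, dtype=np.float32)[indices]

batch = [[0, 2, 1], [3, 3, 0]]  # made-up character indices
features = one_hot(batch, vocab_size=4)
print(features.shape)  # (2, 3, 4)
print(features[0, 1])  # [0. 0. 1. 0.]
```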

Step 5: Converting Tensors

With the pre-processing done, we can convert the data from NumPy arrays and Python lists to PyTorch tensors.

tseq = torch.Tensor(tseq)
iseq = torch.from_numpy(iseq)

Step 6: Defining the Model

We will define the RNN_Model class, which inherits from nn.Module, PyTorch's base class for models. The constructor initializes the model's parameters – input_size (dimension of input data), output_size (dimension of output), dim (number of hidden units in the RNN), and layers (number of RNN layers).

class RNN_Model(nn.Module):
    def __init__(self, input_size, output_size, dim, layers):
        super(RNN_Model, self).__init__()

        self.n_layers = layers
        self.dim = dim

        self.rnn = nn.RNN(input_size, dim, layers, batch_first=True)   
        self.fc = nn.Linear(dim, output_size)
    
    def forward(self, x):
        
        batch_size = x.size(0)

        #Initializing the hidden state
        hidden = self.init_first(batch_size)

        # Passing the input and hidden state into the model
        out, hidden = self.rnn(x, hidden)
        
        out = out.contiguous().view(-1, self.dim)
        out = self.fc(out)
        
        return out, hidden
    
    def init_first(self, batch_size):
        # This method generates the first hidden state
        hidden = torch.zeros(self.n_layers, batch_size, self.dim)
        return hidden

 

The forward() method defines how data flows through the model during training, and the init_first() method initializes the hidden state for the RNN’s first input.
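To see why forward() reshapes the RNN output before the linear layer, it helps to check the tensor shapes. The sketch below uses sizes that match this article's data (3 sentences, 16 time steps, 16 distinct characters, 12 hidden units) and passes zero tensors purely to inspect shapes; the variable names are illustrative.

```python
import torch
from torch import nn

batch_size, seq_len, input_size, hidden_size, layers = 3, 16, 16, 12, 1

rnn = nn.RNN(input_size, hidden_size, layers, batch_first=True)
x = torch.zeros(batch_size, seq_len, input_size)
h0 = torch.zeros(layers, batch_size, hidden_size)

out, hidden = rnn(x, h0)
print(out.shape)     # torch.Size([3, 16, 12]): one hidden vector per time step
print(hidden.shape)  # torch.Size([1, 3, 12]): final hidden state of each layer

# Flatten batch and time into one axis so the linear layer scores every step
flat = out.contiguous().view(-1, hidden_size)
print(flat.shape)    # torch.Size([48, 12])
```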

Step 8: Instantiating Model and Defining Hyperparameters

After defining the model above, we'll instantiate the model with the relevant parameters and define the following hyperparameters:-

  • epochs: This hyperparameter specifies the number of times the model will go through the training set.
     
  • lr: This is the learning rate that affects the rate at which our model updates the weights each time backpropagation is done.
     

We will also define the loss function and the optimizer.

# Instantiating the model
model = RNN_Model(input_size=dsize, output_size=dsize, dim=12, layers=1)

# Defining hyperparameters
epochs = 100
lr=0.01

# Defining Loss and Optimizer functions
centloss = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
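nn.CrossEntropyLoss expects one row of raw scores per prediction and one integer class index per target, which is why the model flattens its output and the training loop flattens the targets. The sketch below uses made-up tensors with 48 rows (3 sentences of 16 steps) and equal scores for all 16 characters, to show the loss value that corresponds to a uniform guess.

```python
import math
import torch
from torch import nn

loss_fn = nn.CrossEntropyLoss()

num_classes = 16
logits = torch.zeros(48, num_classes)        # equal raw scores for every class
targets = torch.zeros(48, dtype=torch.long)  # made-up class indices

loss = loss_fn(logits, targets)
print(loss.item(), math.log(num_classes))  # both are about 2.7726
```

A loss near ln(16) during training therefore suggests the model is doing no better than guessing uniformly among the 16 characters.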

Step 9: Training

In this step, we will start training our model.

device = torch.device("cpu")
iseq = iseq.to(device)
for epoch in range(1, epochs + 1):
    optimizer.zero_grad()
    output, hidden = model(iseq)
    output = output.to(device)
    tseq = tseq.to(device)
    loss = centloss(output, tseq.view(-1).long())
    loss.backward() # Performs backpropagation
    optimizer.step() # Updates weight matrices
    
    if epoch%10 == 0:
        print('Epoch: {}/{}.............'.format(epoch, epochs), end=' ')
        print("Loss: {:.4f}".format(loss.item()))

 

Output:

[Figure: training log, with the loss printed every 10 epochs]

Step 10: Defining Helper Functions

We will define two functions – predict() and sample(). We will use them to get an output from our model and convert it to text form.

def predict(model, character):
    # Encoding the input to binary form
    character = np.array([[chartonum[c] for c in character]])
    character = encode(character, dsize, character.shape[1], 1)
    character = torch.from_numpy(character)
    character = character.to(device)
    
    out, hidden = model(character)

    prob = nn.functional.softmax(out[-1], dim=0).data
    char_ind = torch.max(prob, dim=0)[1].item()

    return numtochar[char_ind], hidden

def sample(model, out_len, start='hey'):
    model.eval()
    start = start.lower()
    chars = [ch for ch in start]
    size = out_len - len(chars)
    for ii in range(size):
        char, h = predict(model, chars)
        chars.append(char)

    return ''.join(chars)
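predict() applies softmax to the scores of the last time step and then picks the character with the highest probability. Since softmax preserves the ordering of the scores, this selects the same index as the largest raw score. The pure-Python sketch below illustrates this with made-up scores; softmax here is a hypothetical helper, not the PyTorch function.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

scores = [1.2, -0.3, 3.1, 0.4]  # made-up scores for a 4-character vocabulary
probs = softmax(scores)

best_by_prob = probs.index(max(probs))
best_by_score = scores.index(max(scores))
print(best_by_prob, best_by_score)  # 2 2
```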

Step 11: Testing

Now, we will test the model with some input words.

print(sample(model, 23, 'hello'))
print(sample(model, 23, 'hello i am'))

 

Output:

[Figure: the two completed sentences printed by sample()]

 

You can see that our model accurately completes the sentences using the input words.

In the next section, we will discuss the drawbacks of recurrent neural networks.

Drawbacks of Recurrent Neural Networks

The following are some limitations/drawbacks of traditional recurrent neural networks:-

Vanishing Gradient Problem

When loss gradients become extremely small as they are backpropagated, the network struggles to learn long-range relationships between distant time steps. This is known as the vanishing gradient problem.

Short-Term Memory

As the length of input sequences increases, hidden states cannot retain information from earlier steps. This leads to the loss of relevant information.

Exploding Gradient Problem

RNNs can also suffer from the exploding gradient problem, where gradients grow exponentially as they are propagated through steps. This can cause instability during training.
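Both gradient problems come from the same source: during backpropagation, the gradient is multiplied by a similar factor once per time step. The sketch below is a deliberate simplification that assumes a single scalar factor (in a real RNN the factors are matrices), and gradient_scale is a hypothetical helper used only for illustration.

```python
def gradient_scale(factor, steps):
    """Scale of a gradient after being multiplied by `factor` once per time step."""
    return factor ** steps

for steps in (5, 20, 50):
    shrinking = gradient_scale(0.5, steps)  # factor below 1: the gradient vanishes
    growing = gradient_scale(1.5, steps)    # factor above 1: the gradient explodes
    print(steps, shrinking, growing)
```

After 50 steps the first value is below 1e-10 and the second is above 1e8, which is why long sequences are hard for a simple RNN to learn from.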

To deal with these problems, several more advanced recurrent neural network architectures have been developed over the years. We will discuss them briefly in the next section.

Variants of RNN

The following RNN architectures handle long-range dependencies and vanishing/exploding gradient issues:-

Long Short-Term Memory(LSTM)

These networks use gating mechanisms to handle long-range relationships and the vanishing gradient problem. These gating mechanisms control how much information is added or discarded from the cell state. LSTM uses the following gating mechanisms:-

  1. Input Gate: Controls how much new information is added to the cell state.
     
  2. Forget Gate: Controls what information should be discarded from the cell state.
     
  3. Output Gate: Determines what information from the cell state should be included in the output.

Gated Recurrent Unit(GRU)

GRU is a simplified version of LSTM that combines the hidden and cell states into a single vector. It only uses the following two gating mechanisms:-

  1. Reset Gate: This determines the amount of the previous hidden state that can be forgotten.
     
  2. Update Gate: Decides how much of the new input and the previous hidden state will be combined.

 

Both LSTM and GRU architectures can capture long-range relationships and mitigate the vanishing gradient problem, but GRU is less complex and less computationally intensive.
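One rough way to compare the complexity of these layers is to count their learnable parameters in PyTorch. The sketch below assumes single-layer modules with the sizes used earlier in this article (16 input features, 12 hidden units); count_parameters is a hypothetical helper written for this comparison.

```python
from torch import nn

def count_parameters(module):
    """Total number of learnable values in a module."""
    return sum(p.numel() for p in module.parameters())

input_size, hidden_size = 16, 12  # sizes used earlier in this article

rnn_params = count_parameters(nn.RNN(input_size, hidden_size, batch_first=True))
gru_params = count_parameters(nn.GRU(input_size, hidden_size, batch_first=True))
lstm_params = count_parameters(nn.LSTM(input_size, hidden_size, batch_first=True))

print(rnn_params, gru_params, lstm_params)  # 360 1080 1440
```

With these sizes, the GRU layer has three times and the LSTM layer four times as many parameters as the simple RNN layer, because each gate and candidate state brings its own set of weights.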

In the next section, we will take a look at some real-world applications of recurrent neural networks.

Real-World Applications of RNN

The following are some real-world applications of recurrent neural networks:-

  • Language Generation: RNNs are used to generate text in applications such as chatbots and language translation.
     
  • Sentiment Analysis: RNNs can analyze the sentiment of a piece of text, helping determine the emotional tone of the content.
     
  • Speech Recognition: RNNs convert spoken language into written text, which can be used to implement voice assistants and transcription services.
     
  • Stock Price Prediction: RNNs can also predict stock prices by learning patterns and trends from historical data.
     
  • Weather Forecasting: RNNs are used to predict weather patterns and conditions based on historical weather data.

Frequently Asked Questions

What are weight matrices in an RNN?

Weight matrices in a Recurrent Neural Network (RNN) are learnable parameters that determine how input and hidden state information is transformed and combined during the network's computations.

What is an optimizer function?

An optimizer function in PyTorch is a tool used during neural network training to adjust the model's parameters (weights and biases) based on the calculated gradients. 

What are hyperparameters?

Hyperparameters are the settings or configurations that are chosen before training a machine learning model. Unlike model parameters, which are learned during training, hyperparameters are set by the machine learning engineer or researcher.

Conclusion

In this article, you learned about the basics of a recurrent neural network. We looked at a simple RNN architecture and how it computes output values. We also built a simple RNN that can complete sentences based on some input words or characters.


Happy Learning!
