Code360 powered by Coding Ninjas X Naukri.com
Last Updated: Mar 27, 2024
Difficulty: Medium

LSTMs and Bi-LSTM in PyTorch

Introduction

LSTM stands for “Long Short-Term Memory”, a recurrent neural network architecture used in artificial intelligence for deep sequential modeling. An LSTM maintains a memory, called the cell state, that stores and updates information across many time steps of a sequence. This article covers two variants: the standard (unidirectional) LSTM and the bidirectional LSTM (Bi-LSTM).

In this article, we will discuss what LSTMs and Bi-LSTMs are, how they differ, and how to implement each of them in PyTorch.

LSTM in PyTorch

In this section, we will first discuss LSTM in PyTorch. An LSTM is an advanced version (or an extension) of the recurrent neural network (RNN), designed to model chronological sequences and their long-range dependencies more precisely than a conventional RNN.

The main advantage of an LSTM is that it addresses two related weaknesses of conventional RNNs: they struggle to process longer sequences, and they suffer from short-term memory, because gradients tend to vanish as they are propagated back through many time steps.

Structure of LSTM

In this sub-section, we will discuss the structure of an LSTM. An LSTM carries contextual information along the sequence in its cell state and controls what enters and leaves that state using “gates”. Inside each cell, four layers interact to produce the output of the cell along with its state: three sigmoid gate layers and one tanh layer that proposes new candidate values. The structure of an LSTM is shown below:

structure of lstm

An LSTM cell contains three gates:

  • Forget Gate: decides which information in the existing memory (cell state) should be discarded.

  • Input Gate: decides which new information should be added to the memory.

  • Output Gate: takes the updated memory and decides which part of it should be used to generate the output of the cell (its hidden state).
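
To make the role of the gates concrete, here is a minimal sketch of a single LSTM time step written with plain tensor operations. The function name lstm_cell_step and the toy sizes are assumptions made for this illustration, and the gate ordering (input, forget, candidate, output) follows the layout PyTorch uses for its own LSTM weights.

```python
import torch

def lstm_cell_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step (illustrative helper, not part of PyTorch).

    x: (batch, input_size); h_prev, c_prev: (batch, hidden_size)
    W: (input_size, 4 * hidden_size), U: (hidden_size, 4 * hidden_size)
    b: (4 * hidden_size,)
    """
    # All four layers are computed in one matrix product, then split
    gates = x @ W + h_prev @ U + b
    i, f, g, o = gates.chunk(4, dim=1)
    i = torch.sigmoid(i)   # input gate: how much new information to add
    f = torch.sigmoid(f)   # forget gate: how much old memory to keep
    g = torch.tanh(g)      # candidate values for the memory
    o = torch.sigmoid(o)   # output gate: how much memory to expose
    c = f * c_prev + i * g       # updated cell state (memory)
    h = o * torch.tanh(c)        # updated hidden state (cell output)
    return h, c

torch.manual_seed(0)
batch, input_size, hidden_size = 2, 3, 5   # toy sizes
x = torch.randn(batch, input_size)
h0 = torch.zeros(batch, hidden_size)
c0 = torch.zeros(batch, hidden_size)
W = torch.randn(input_size, 4 * hidden_size)
U = torch.randn(hidden_size, 4 * hidden_size)
b = torch.zeros(4 * hidden_size)

h1, c1 = lstm_cell_step(x, h0, c0, W, U, b)
print(h1.shape, c1.shape)
```

In practice you would use nn.LSTM or nn.LSTMCell, which implement the same update in optimized form.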

Applications of LSTM

In this sub-section, we will discuss the applications of LSTM. There are various applications of LSTM; here are some of them:

  • It can be used in language modeling and text generation, where the model predicts the next word from the words that precede it.

  • It can also be used in speech and handwriting recognition.

  • It can be used in machine translation, that is, mapping a sequence in one language to a sequence in another language.

  • It can be used in predicting musical notes, which is useful in music generation.

Bi-LSTM in PyTorch

In this section, we will discuss Bi-LSTM in PyTorch. Bi-LSTM stands for bidirectional LSTM, an extension of the traditional LSTM that reads a sequence in both directions and can improve model performance when the whole sequence is available at once. Real-life applications of Bi-LSTM include sequence classification, speech recognition, and forecasting models.

A Bi-LSTM consists of two LSTMs, where the first LSTM takes the input in the forward direction, and the second LSTM takes the input in the backward or reverse direction.

Structure of Bi-LSTM

In this sub-section, we will discuss the structure of Bi-LSTM. As discussed above, a Bi-LSTM consists of two LSTMs: the first reads the input in the forward direction and the second reads it in the backward (reverse) direction. At each time step, the outputs of the two are combined, typically by concatenation.

structure of bi-lstm

(src: ieeexplore.ieee.org)
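
The two directions can be observed directly in PyTorch. The following is a minimal sketch, with toy sizes chosen only for illustration: it runs a bidirectional nn.LSTM, then changes the last element of the sequence and checks which half of the output at the first time step is affected.

```python
import torch
from torch import nn

torch.manual_seed(0)
hidden_size = 4  # toy size
lstm = nn.LSTM(input_size=3, hidden_size=hidden_size,
               batch_first=True, bidirectional=True)

x = torch.randn(1, 5, 3)            # (batch, seq_len, features)
out, (h_n, c_n) = lstm(x)
print(out.shape)   # torch.Size([1, 5, 8]): forward and backward halves concatenated
print(h_n.shape)   # torch.Size([2, 1, 4]): one final state per direction

# Change only the LAST time step of the input and run the model again
x_changed = x.clone()
x_changed[0, -1] += 10.0
out_changed, _ = lstm(x_changed)

# Output at the FIRST time step: the forward half has not seen the change,
# while the backward half has already read the last element
fwd_same = torch.allclose(out[0, 0, :hidden_size], out_changed[0, 0, :hidden_size])
bwd_same = torch.allclose(out[0, 0, hidden_size:], out_changed[0, 0, hidden_size:])
print(fwd_same, bwd_same)
```

The first hidden_size values of each output vector come from the forward LSTM and the remaining ones from the backward LSTM, which is why a layer placed on top of a Bi-LSTM takes hidden_size * 2 inputs.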

Advantages of Bi-LSTM over LSTM

In this sub-section, we will discuss the advantages of Bi-LSTM over LSTM, which are as follows:

  • It can take the inputs in both directions (forward and backward), so the representation at each time step draws on both past and future context, unlike LSTM.

  • On tasks where the whole sequence is available up front (for example, sequence classification), this extra context often improves on the performance of LSTM models.

These gains come at a cost. A Bi-LSTM runs two LSTMs, so for the same hidden size it has roughly twice the parameters and requires more resources and time to train, not less. It also does not by itself reduce overfitting, and it cannot be applied where future inputs are not yet available, such as step-by-step prediction on a live stream.
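
As a rough check on the resource cost, the sketch below counts the trainable parameters of a single-layer nn.LSTM with and without bidirectional=True. The helper count_parameters is a name introduced for this example; the sizes match the MNIST setup used later in the article.

```python
from torch import nn

def count_parameters(module):
    # Total number of trainable values in the module (illustrative helper)
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

uni = nn.LSTM(input_size=28, hidden_size=256, num_layers=1)
bi = nn.LSTM(input_size=28, hidden_size=256, num_layers=1, bidirectional=True)

print(count_parameters(uni))  # 292864
print(count_parameters(bi))   # 585728
```

For a single layer the bidirectional model has exactly twice the parameters. With stacked layers the gap grows further, because every layer after the first receives the concatenated output of both directions as its input.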

Implementation of LSTM in PyTorch

In this section, we will see how to use an LSTM in PyTorch. As an example, we take the clean jokes dataset, which is available in the form of a CSV file, and train an LSTM network as a text generation model that predicts the next words given a series of words.

Importing Libraries

Our first step is to import all the necessary libraries to use LSTM in PyTorch, which you can do with the code below:

Code

import numpy as np
import pandas as pd
import torch
from collections import Counter
from torch import nn, optim
from torch.utils.data import DataLoader, Dataset

Creating Dataset Class

After importing the libraries, we need to create a dataset class that loads the text from the CSV file and serves individual training samples, which can be done with the following code:

Code

# custom dataset class that serves windows of word indexes built from the csv file
# note: the class reuses the name of the imported base class Dataset and
# shadows it from this point on
class Dataset(Dataset):
    def __init__(self):
        self.words = self.load_words()
        self.uniq_words = self.get_uniq_words()

        # lookup tables between words and integer indexes
        self.index_to_word = {index: word for index, word in enumerate(self.uniq_words)}
        self.word_to_index = {word: index for index, word in enumerate(self.uniq_words)}

        # the whole text as a list of word indexes
        self.words_indexes = [self.word_to_index[w] for w in self.words]

    def load_words(self):
        # join every entry of the 'Joke' column into one text and split it into words
        train_df = pd.read_csv('cleanjokes.csv')
        text = train_df['Joke'].str.cat(sep=' ')
        return text.split(' ')

    def get_uniq_words(self):
        # unique words, most frequent first
        word_counts = Counter(self.words)
        return sorted(word_counts, key=word_counts.get, reverse=True)

    # returns the number of 4-word windows in the dataset
    def __len__(self):
        return len(self.words_indexes) - 4

    # returns an input window and the same window shifted one word ahead
    def __getitem__(self, index):
        return (
            torch.tensor(self.words_indexes[index:index+4]),
            torch.tensor(self.words_indexes[index+1:index+4+1]))


Explanation

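In the above code, we created a class named “Dataset” that reads the ‘cleanjokes.csv’ file, joins every entry of its ‘Joke’ column (the column containing all the jokes) into one long text, and splits that text into words. Each word is mapped to an integer index, with the most frequent words receiving the smallest indexes. Each sample is a pair of windows of four word indexes: the input window, and a target window that is the same window shifted one word ahead.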
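
The windowing can be illustrated without the CSV file. This is a minimal sketch on a toy word list; the sentence and the helper make_sample are assumptions made for the example, and words are kept as strings rather than indexes so that the shift is easy to see.

```python
# Toy text standing in for the words loaded from the CSV file
words = "where do sick boats go the dock".split(" ")
sequence_length = 4

def make_sample(index):
    # Input window and target window shifted one word ahead (illustrative helper)
    inputs = words[index:index + sequence_length]
    targets = words[index + 1:index + sequence_length + 1]
    return inputs, targets

num_samples = len(words) - sequence_length
for i in range(num_samples):
    print(make_sample(i))
# (['where', 'do', 'sick', 'boats'], ['do', 'sick', 'boats', 'go'])
# (['do', 'sick', 'boats', 'go'], ['sick', 'boats', 'go', 'the'])
# (['sick', 'boats', 'go', 'the'], ['boats', 'go', 'the', 'dock'])
```

At every position of the input window, the target is the word that follows it, which is exactly what a next-word prediction model needs to learn.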

Creating Model

After creating the dataset class, we need to create a custom model class that defines the model architecture. Here is the code for the same:

Code

# custom model class to define the model architecture
class Model(nn.Module):
    def __init__(self, dataset):
        super().__init__()
        self.lstm_size = 128
        self.embedding_dim = 128
        self.num_layers = 3

        n_vocab = len(dataset.uniq_words)

        # embedding layer: maps each word index to a dense vector
        self.embedding = nn.Embedding(
            num_embeddings=n_vocab,
            embedding_dim=self.embedding_dim,
        )

        # lstm layer: it reads the embedding output, so input_size must
        # equal embedding_dim
        self.lstm = nn.LSTM(
            input_size=self.embedding_dim,
            hidden_size=self.lstm_size,
            num_layers=self.num_layers,
            dropout=0.1,
        )

        # fully connected layer: maps each LSTM output to one score per word
        self.fc = nn.Linear(self.lstm_size, n_vocab)

    def forward(self, x, prev_state):
        embed = self.embedding(x)
        output, state = self.lstm(embed, prev_state)
        logits = self.fc(output)
        return logits, state

    def init_state(self, sequence_length):
        # nn.LSTM defaults to batch_first=False, so the state is sized by the
        # second dimension of the input, which holds the window length here
        return (torch.zeros(self.num_layers, sequence_length, self.lstm_size),
                torch.zeros(self.num_layers, sequence_length, self.lstm_size))

 

Explanation

In the above code, we created a class named “Model” that defines the model architecture. It stacks three layers: an embedding layer that turns word indexes into vectors, an LSTM layer created with nn.LSTM using the parameters input_size, hidden_size, num_layers, and dropout, and a fully connected layer that maps each LSTM output to a score for every word in the vocabulary.
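
To see how data flows through these three layers, here is a minimal sketch that traces tensor shapes with toy sizes (the sizes are assumptions chosen to keep the example small). Note that nn.LSTM defaults to batch_first=False, so it treats the first input dimension as time steps and the second as the batch; the initial state must therefore match the second dimension.

```python
import torch
from torch import nn

n_vocab, embedding_dim, lstm_size, num_layers = 50, 8, 16, 2  # toy sizes

embedding = nn.Embedding(n_vocab, embedding_dim)
lstm = nn.LSTM(input_size=embedding_dim, hidden_size=lstm_size, num_layers=num_layers)
fc = nn.Linear(lstm_size, n_vocab)

x = torch.randint(0, n_vocab, (6, 4))   # 6 windows of 4 word indexes each
# The state is sized by the second dimension of x (4 here)
state = (torch.zeros(num_layers, 4, lstm_size),
         torch.zeros(num_layers, 4, lstm_size))

embed = embedding(x)                    # (6, 4, 8): one vector per word
output, state = lstm(embed, state)      # (6, 4, 16): one LSTM output per word
logits = fc(output)                     # (6, 4, 50): one score per vocabulary word
print(embed.shape, output.shape, logits.shape, state[0].shape)
```

If you prefer the more common (batch, sequence, features) layout, nn.LSTM accepts batch_first=True; the initial state would then have to be sized by the batch size instead.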

Training the Model

Next comes the first main part of the implementation, which is to train the model on the given dataset (.csv file). Here is the function that trains the model:

Code

def train(dataset, model):
    model.train()

    dataloader = DataLoader(dataset, batch_size=64)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters())

    for epoch in range(7):
        # start every epoch from a zero state sized for 4-word windows
        state_h, state_c = model.init_state(4)

        for batch, (x, y) in enumerate(dataloader):
            y_pred, (state_h, state_c) = model(x, (state_h, state_c))
            # CrossEntropyLoss expects the class (vocabulary) dimension second
            loss = criterion(y_pred.transpose(1, 2), y)

            # detach the state so that gradients do not flow back into
            # earlier batches
            state_h = state_h.detach()
            state_c = state_c.detach()

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            print({'epoch': epoch, 'batch': batch, 'loss': loss.item()})

 

Explanation

In the above code, we created a function called train() that takes two arguments: the dataset and the model that we created earlier. The outer loop runs 7 times, which is the number of epochs, that is, full passes over the given dataset (.csv file). For every batch, the epoch number, the batch number, and the loss value are printed.
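
The transpose(1, 2) call in the loss line can look puzzling. nn.CrossEntropyLoss expects the class dimension in second position, so logits of shape (batch, sequence, vocabulary) have to be rearranged to (batch, vocabulary, sequence). The sketch below, using random toy tensors, checks that this gives the same loss as flattening every position into one long batch.

```python
import torch
from torch import nn

torch.manual_seed(0)
batch, seq_len, n_vocab = 3, 4, 10   # toy sizes
logits = torch.randn(batch, seq_len, n_vocab)
targets = torch.randint(0, n_vocab, (batch, seq_len))

criterion = nn.CrossEntropyLoss()

# Variant used in train(): move the vocabulary dimension to position 1
loss_transposed = criterion(logits.transpose(1, 2), targets)

# Equivalent variant: treat every word position as a separate sample
loss_flattened = criterion(logits.reshape(-1, n_vocab), targets.reshape(-1))

print(torch.allclose(loss_transposed, loss_flattened))  # True
```

Both variants average the loss over every word position in the batch, so either form can be used.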

Testing the Model

Next comes the second main part of the implementation, which is to test the trained model by generating text from a given series of words. Here is the function that generates the text:

Code

def predict(dataset, model, text, next_words=50):
    model.eval()

    # every word in text must appear in the dataset vocabulary
    words = text.split(' ')
    state_h, state_c = model.init_state(len(words))

    # no gradients are needed while generating text
    with torch.no_grad():
        for i in range(0, next_words):
            x = torch.tensor([[dataset.word_to_index[w] for w in words[i:]]])
            y_pred, (state_h, state_c) = model(x, (state_h, state_c))

            # sample the next word from the predicted distribution
            last_word_logits = y_pred[0][-1]
            p = torch.nn.functional.softmax(last_word_logits, dim=0).numpy()
            word_index = np.random.choice(len(last_word_logits), p=p)
            words.append(dataset.index_to_word[word_index])

    return words

dataset = Dataset()
model = Model(dataset)

train(dataset, model)
print(predict(dataset, model, text='do sick boats'))

 

Explanation

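In the above code, a function called predict() is created with the arguments dataset, model, and text, the series of words from which the generation starts. The text passed is ‘do sick boats’, which is part of the joke ‘Where do sick boats go? The dock!’. The function returns these words followed by next_words (50 by default) generated words. Because each word is sampled at random from the predicted distribution, the result differs from run to run and is not guaranteed to reproduce the original joke.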
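
The sampling step decides how varied the generated text is. Here is a minimal sketch that contrasts greedy selection with random sampling on a hypothetical four-word vocabulary and made-up scores; it uses torch.multinomial in place of the np.random.choice call in predict(), which is a substitution made for this example.

```python
import torch

torch.manual_seed(0)
vocab = ["the", "dock", "boats", "go"]          # hypothetical toy vocabulary
logits = torch.tensor([2.0, 1.0, 0.5, -1.0])    # made-up scores for the next word

# Softmax turns the scores into probabilities that sum to 1
probs = torch.softmax(logits, dim=0)

# Greedy decoding: always pick the most likely word
greedy_index = torch.argmax(probs).item()

# Sampling: pick a word at random, in proportion to its probability
sampled_index = torch.multinomial(probs, num_samples=1).item()

print(vocab[greedy_index])    # the
print(vocab[sampled_index])   # depends on the random seed
```

Greedy decoding is deterministic but tends to repeat itself, while sampling gives more varied text at the price of occasional unlikely words.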

Execution of Code

Here, the implementation of LSTM in PyTorch is complete. Assuming the code is saved in a file named lstm_implementation.py, it can be executed using the command below:

Code

python lstm_implementation.py

 

Output

lstm output


Explanation

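Here, the output is the list of words returned by predict(): the given series of words (‘do sick boats’) followed by the words generated by the model. In the run shown above, this corresponds to the joke ‘Where do sick boats go? The dock!’; because the words are sampled at random, other runs will generally produce different text.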

Implementation of Bi-LSTM in PyTorch

In this section, we will discuss the implementation of Bi-LSTM in PyTorch. As an example, we train a Bi-LSTM to classify the handwritten digits of the MNIST dataset.

Importing Libraries

Our first step is to import all the necessary libraries to use Bi-LSTM in PyTorch, which you can do with the code below:

Code:

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
import torchvision.datasets as datasets
import torchvision.transforms as transforms

# Set device: use a GPU when one is available, otherwise the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

Setting Hyperparameters

After importing the libraries, we set the hyperparameters and variables needed to build and train the model. Here is the code for the same:

Code:

# Hyperparameters
learning_rate = 0.01
batch_size = 64
num_epochs = 4
input_size = 28
sequence_length = 28
num_layers = 2
hidden_size = 256
num_classes = 10


Explanation:

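In the above code, the hyperparameters are set. Each 28×28 MNIST image is treated as a sequence of 28 rows (sequence_length) with 28 pixel values per row (input_size). The model has 2 stacked LSTM layers (num_layers) with 256 hidden units per direction (hidden_size) and predicts one of 10 digit classes (num_classes). The learning rate, batch size, and number of epochs control the training.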
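
The way an image becomes a sequence can be sketched without downloading any data. The example below uses a random tensor as a stand-in for a batch of MNIST images (an assumption made so that the example runs offline) and removes the channel dimension, leaving one 28-value row per time step.

```python
import torch

batch_size, sequence_length, input_size = 64, 28, 28

# Stand-in for a batch of MNIST images: (batch, channels, height, width)
images = torch.rand(batch_size, 1, 28, 28)

# Drop the single channel so that each image is a sequence of rows
sequences = images.squeeze(1)
print(sequences.shape)         # torch.Size([64, 28, 28])

# One time step of one image is a row of 28 pixel values
print(sequences[0, 0].shape)   # torch.Size([28])
```

The resulting shape (batch, sequence_length, input_size) is what an nn.LSTM created with batch_first=True expects.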

Load Data

Now, the data can be loaded using the MNIST dataset, which is provided by the torchvision module. Here is the code for the same:

Code:

# Load Data
train_dataset = datasets.MNIST(root="dataset/", train=True, transform=transforms.ToTensor(), download=True)

test_dataset = datasets.MNIST(root="dataset/", train=False, transform=transforms.ToTensor(), download=True)

train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
# shuffling is not needed for evaluation
test_loader = DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False)


Explanation:

In the above code, the training and test splits of MNIST are downloaded and converted to tensors, and each is wrapped in a DataLoader that serves it in batches: train_loader uses train_dataset and test_loader uses test_dataset.

Bi-LSTM Model Class

Now that the data is loaded, we create a custom model class that inherits from nn.Module, the base class for models in the torch.nn module. Here is the code for the same:

Code:

# Create a bidirectional LSTM model class
class BiLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super().__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        # bidirectional=True runs a forward and a backward LSTM in every layer
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True, bidirectional=True)
        # the two directions are concatenated, hence hidden_size * 2
        self.fc = nn.Linear(hidden_size * 2, num_classes)

    def forward(self, x):
        # initial hidden and cell states: one per layer and per direction
        h0 = torch.zeros(self.num_layers * 2, x.size(0), self.hidden_size, device=x.device)
        c0 = torch.zeros(self.num_layers * 2, x.size(0), self.hidden_size, device=x.device)

        out, _ = self.lstm(x, (h0, c0))
        # classify from the output at the last time step
        out = self.fc(out[:, -1, :])

        return out


Explanation:

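In the above code, a class named ‘BiLSTM’ is created. Setting bidirectional=True makes nn.LSTM run a forward and a backward LSTM in every layer, which is why the initial states have num_layers * 2 entries and the fully connected layer takes hidden_size * 2 inputs. The forward() method defines the forward pass of the model (its name is unrelated to the reading direction): it runs the Bi-LSTM over the input and classifies the image from the output at the last time step.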
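
One detail of the model above is worth knowing. At the last time step, the forward half of the output has read the whole sequence, but the backward half has read only one element. A commonly used alternative, sketched below with toy sizes, builds the summary from the final hidden state of each direction instead. This is an optional variation, not the method used in this article, and the variable names are chosen for the example.

```python
import torch
from torch import nn

torch.manual_seed(0)
hidden_size, num_layers = 6, 2   # toy sizes
lstm = nn.LSTM(input_size=5, hidden_size=hidden_size, num_layers=num_layers,
               batch_first=True, bidirectional=True)

x = torch.randn(3, 7, 5)         # (batch, seq_len, features)
out, (h_n, _) = lstm(x)

# h_n is ordered by layer, then by direction; the last two entries
# belong to the top layer
forward_final = h_n[-2]          # forward LSTM after reading the whole sequence
backward_final = h_n[-1]         # backward LSTM after reading the whole sequence
summary = torch.cat([forward_final, backward_final], dim=1)
print(summary.shape)             # torch.Size([3, 12])

# The same values appear in the output: forward state at the last step,
# backward state at the first step
print(torch.allclose(forward_final, out[:, -1, :hidden_size], atol=1e-6))
print(torch.allclose(backward_final, out[:, 0, hidden_size:], atol=1e-6))
```

The summary tensor has the same width (hidden_size * 2) as out[:, -1, :], so it could be passed to the same fully connected layer.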

Setting Custom Network

Now that the Bi-LSTM model class is created, it needs to be initialized, along with the loss function and the optimizer. Here is the code for the same:

Code:

# Initialize the custom network
model = BiLSTM(input_size, hidden_size, num_layers, num_classes).to(device)

# Loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)


Explanation:

In the above code, the ‘BiLSTM’ model is initialized and moved to the device. The criterion is the cross-entropy loss used for classification, and the optimizer is Adam, which updates the model parameters using the given learning rate.

Training the Model

Next comes the first main part of the implementation, which is to train the model on the given dataset. Here is the loop that trains the model:

Code:

# Train Network
for epoch in range(num_epochs):
    for batch_idx, (data, targets) in enumerate(train_loader):
        # Get data to cuda if possible
        data = data.to(device=device).squeeze(1)
        targets = targets.to(device=device)

        # forward
        scores = model(data)
        loss = criterion(scores, targets)

        # backward
        optimizer.zero_grad()
        loss.backward()

        # gradient descent or adam step
        optimizer.step()


Explanation:

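In the above code, the model is trained with num_epochs set to 4, so it makes 4 passes over the training data. For each batch, squeeze(1) removes the channel dimension so that every image becomes a sequence of 28 rows. The forward pass then computes the scores and the loss, the backward pass computes the gradients, and the optimizer step updates the parameters. These forward and backward passes are steps of training and are separate from the two reading directions of the Bi-LSTM, both of which are applied inside the forward pass of the model.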

Checking Accuracy

Next comes the second main part of the implementation, which is to find the accuracy of the model on the given dataset. Here is the function that checks the accuracy:

Code:

def check_accuracy(loader, model):
    if loader.dataset.train:
        print("Checking accuracy on the training data")
    else:
        print("Checking accuracy on the test data")

    num_correct = 0
    num_samples = 0
    model.eval()

    with torch.no_grad():
        for x, y in loader:
            x = x.to(device=device).squeeze(1)
            y = y.to(device=device)

            scores = model(x)
            # the predicted class is the index of the highest score
            _, predictions = scores.max(1)
            num_correct += (predictions == y).sum().item()
            num_samples += predictions.size(0)

    accuracy = num_correct / num_samples * 100
    print(f"Correct examples / total examples = {num_correct} / {num_samples} with accuracy {accuracy:.2f}%")

    model.train()

check_accuracy(train_loader, model)
check_accuracy(test_loader, model)


Explanation:

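In the above code, the function check_accuracy() is created. It switches the model to evaluation mode, disables gradient tracking, takes the class with the highest score as the prediction for each image, and reports the share of correct predictions. It is then called once with train_loader and once with test_loader, along with the model.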
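
The accuracy computation can be followed on a small hand-made example. The scores and targets below are made-up values for four samples and three classes, chosen only to illustrate the steps.

```python
import torch

# Made-up scores for 4 samples and 3 classes
scores = torch.tensor([[0.1, 2.0, 0.3],
                       [1.5, 0.2, 0.1],
                       [0.0, 0.4, 0.9],
                       [0.7, 0.6, 0.1]])
targets = torch.tensor([1, 0, 1, 0])

# max(1) returns the highest score per row and its index (the predicted class)
_, predictions = scores.max(1)
num_correct = (predictions == targets).sum().item()
accuracy = num_correct / targets.size(0) * 100

print(predictions.tolist())   # [1, 0, 2, 0]
print(f"{accuracy:.2f}%")     # 75.00%
```

Three of the four predictions match their targets, which gives an accuracy of 75%.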

Execution of Code

Here, the implementation of Bi-LSTM in PyTorch is complete. Assuming the code is saved in a file named bi_lstm_implement.py, it can be executed using the command below:

Code

python bi_lstm_implement.py

 

Output

bi-lstm output

Explanation:

In the above output, the accuracy on the training data and on the test data is printed as 95.8% and 95.56%, respectively.

Difference between LSTM and Bi-LSTM

In this section, we will discuss the difference between LSTM and Bi-LSTM. The table below shows the differences:

Basis | LSTM | Bi-LSTM
Definition | An advanced version (or an extension) of recurrent neural networks (RNNs) that was designed to model chronological sequences. | An extension of the traditional LSTM that reads the sequence in both directions, which can improve model performance.
Applications | Robot control, speech recognition, music composition, and handwriting recognition. | Sequence classification, speech recognition, and forecasting models.
Resources and Time | For the same hidden size, it requires fewer resources and less time to train. | It runs two LSTMs, so it requires more resources and time to train.
Directions of Input | It can only take inputs in the forward direction. | It can take the inputs in both directions (forward and backward).
Model Performance | It improves on conventional RNNs and is faster than Bi-LSTM, but it sees only past context. | It often performs better than LSTM on tasks where the whole sequence is available, at a higher computational cost.

Frequently Asked Questions

What is Deep Sequence Modelling?

Deep sequence models are deep learning models that learn from ordered data, where the information at earlier timestamps is used to interpret or predict later ones. They are used in areas such as stock prediction, medical diagnostics, climate change, and autonomous driving.

In which IDEs can PyTorch be used?

PyTorch is a Python library, so it can be used from any editor or IDE (Integrated Development Environment) that can run Python, such as Microsoft Visual Studio Code, Sublime Text, PyCharm, and Jupyter Notebook, or even a plain text editor such as Notepad.

Why use the PyTorch library?

PyTorch provides good flexibility and high speed for deep neural networks and can be very beneficial in applications such as image classification, object detection, and generative tasks.

Why is LSTM better than RNN?

A conventional RNN struggles with longer sequences because of the vanishing gradient problem: gradients shrink as they are propagated back through many time steps. The gates and cell state of an LSTM largely mitigate this problem, so an LSTM can learn longer-range dependencies.

Conclusion

Both LSTM and Bi-LSTM are used for deep sequential modeling; Bi-LSTM is an extension of LSTM that can improve on its performance by reading the sequence in both directions. In this article, we discussed what LSTM is, along with its structure and applications. Then we discussed what Bi-LSTM is, along with its structure and its advantages over LSTM. Lastly, we implemented both LSTM and Bi-LSTM in PyTorch.

Happy Learning!
