Table of contents
1. Introduction
2. Understanding Transformers
3. Encoder in Transformer Model
4. Decoder in Transformer Model
5. Hugging Face Transformer API
6. Classifying texts with transformers in PyTorch
7. Tokenizing
8. Getting features
9. Creating the features dataset
10. Dataset class
11. Custom PyTorch model
12. Training and testing
13. Frequently Asked Questions
13.1. What is the use of self-attention in transformers in PyTorch?
13.2. What is the Hugging Face Transformer API?
13.3. What are tokens in Tokenization?
13.4. What are transformers in PyTorch?
14. Conclusion
Last Updated: Mar 27, 2024

Transformers in PyTorch


Introduction

PyTorch is an open-source Python framework used to build machine learning models for tasks such as image recognition and natural language processing. Transformers in PyTorch are a type of neural network model used to translate, classify, and generate text. These models are based on the concept of self-attention and learn from the different patterns in text. They are an important part of Natural Language Processing (NLP).


In this article, we will discuss transformers in PyTorch in detail, along with their implementation.

Understanding Transformers

Transformers in PyTorch are a class of neural network models that are widely used in Natural Language Processing (NLP). PyTorch's flexible framework makes it easy to build and train transformer models. Models such as BERT, GPT-3, and T5 are based on transformers. They are mainly used for NLP tasks such as text classification, machine translation, and sentiment analysis.
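
PyTorch also ships a ready-made torch.nn.Transformer module. The snippet below is a minimal sketch of creating one with the dimensions of the original architecture; the sequence lengths and batch size are only illustrative values.

Code

import torch
import torch.nn as nn

# A full transformer: 512-dimensional model, 8 attention heads,
# 6 encoder layers and 6 decoder layers
transformer = nn.Transformer(d_model=512, nhead=8, num_encoder_layers=6, num_decoder_layers=6)

# Dummy source (length 10) and target (length 20) sequences with batch size 32
src = torch.rand(10, 32, 512)
tgt = torch.rand(20, 32, 512)
out = transformer(src, tgt)
print(out.shape)  # torch.Size([20, 32, 512])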

Encoder in Transformer Model 

The encoder is the part of the transformer model that processes the input sequence and records all the useful information in it. Each encoder layer is made up of two main sub-layers: multi-head self-attention (with eight heads in the original architecture) and a position-wise feed-forward neural network. The attention mechanism lets the encoder focus on different parts of the sequence at the same time.

In the original transformer, the model dimension used throughout the encoder is 512, while the inner layer of the feed-forward network has a dimension of 2048.
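
These dimensions map directly onto PyTorch's built-in torch.nn.TransformerEncoderLayer. The snippet below is a minimal sketch; the sequence length and batch size are only assumed values for illustration.

Code

import torch
import torch.nn as nn

# One encoder layer: 512-dimensional model, 8 attention heads,
# and a 2048-dimensional feed-forward inner layer
encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, dim_feedforward=2048)

# Stack several layers to form the full encoder (6 layers in the original architecture)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=6)

# Dummy input: sequence length 10, batch size 32, embedding size 512
src = torch.rand(10, 32, 512)
memory = encoder(src)
print(memory.shape)  # torch.Size([10, 32, 512])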

Decoder in Transformer Model

The decoder is the part of the transformer model that generates text, for example when translating from one language to another. It mirrors the encoder: each decoder layer also contains self-attention and feed-forward sub-layers, plus an additional attention block over the encoder output. It takes in two inputs (a minimal sketch using PyTorch's built-in decoder module follows the list):

  • It takes the output from the encoder as input for multi-headed attention.
  • Target sentence embeddings are also taken as input in the decoder.
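
Here is that sketch, using torch.nn.TransformerDecoderLayer; the encoder output (memory), the target embeddings, and the sequence lengths and batch size are all assumed values for illustration.

Code

import torch
import torch.nn as nn

# One decoder layer with the same dimensions as the encoder
decoder_layer = nn.TransformerDecoderLayer(d_model=512, nhead=8, dim_feedforward=2048)
decoder = nn.TransformerDecoder(decoder_layer, num_layers=6)

# Input 1: the encoder output (memory), e.g. source length 10, batch size 32
memory = torch.rand(10, 32, 512)
# Input 2: the target sentence embeddings, e.g. target length 20, batch size 32
tgt = torch.rand(20, 32, 512)

out = decoder(tgt, memory)
print(out.shape)  # torch.Size([20, 32, 512])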

Hugging Face Transformer API

Transformers in PyTorch can be implemented using the Hugging Face API. The Hugging Face Transformer API is a Python library that provides a user-friendly interface for working with transformer-based models in NLP.

To install the Hugging Face transformers library, we can use the command:

Code

pip install transformers

To download a pre-trained model, we can type the code as given below:

Code

from transformers import AutoTokenizer, AutoModel

# The tokenizer corresponding to a pre-trained model
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Instantiating the weights from a pre-trained model
model = AutoModel.from_pretrained("bert-base-uncased")
inputs = tokenizer("Introduction to transformers", return_tensors="pt")
outputs = model(**inputs)
print(outputs)

For a specific task, such as checking whether two sentences are paraphrases of each other, we can use a task-specific model class, as shown below.

Code

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Tokenizer and model instances
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased-finetuned-mrpc")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased-finetuned-mrpc")

# Two possible outcomes
classes = ["not paraphrase", "is paraphrase"]

# sequences to classify
sequence_0 = "The company Coding Ninjas is based in India"
sequence_1 = "AMangoes are good for your health"
sequence_2 = "Coding Ninjas headquarters are situated in Gurugram"

# Getting tokenizer outputs
paraphrase = tokenizer(sequence_0, sequence_2, return_tensors="pt")
not_paraphrase = tokenizer(sequence_0, sequence_1, return_tensors="pt")

# Outputs from the tokenizer are fed into the model object
paraphrase_classification_logits = model(**paraphrase).logits
not_paraphrase_classification_logits = model(**not_paraphrase).logits

# get the probability scores
paraphrase_results = torch.softmax(paraphrase_classification_logits, dim=1).tolist()[0]
not_paraphrase_results = torch.softmax(not_paraphrase_classification_logits, dim=1).tolist()[0]

# Results for the pair that should be a paraphrase
for i in range(len(classes)):
    print(f"{classes[i]}: {int(round(paraphrase_results[i] * 100))}%")
    
# Results for the pair that should not be a paraphrase
for i in range(len(classes)):
    print(f"{classes[i]}: {int(round(not_paraphrase_results[i] * 100))}%")

Explanation

With the help of the BERT model and softmax, the probability of each sentence pair being a paraphrase or not is printed.

Next, let us implement the code for classifying texts with transformers in PyTorch.

Classifying texts with transformers in PyTorch

Text classification is a very important part of NLP. It is used for detecting spam emails, sentiment analysis, and similar tasks.

We first need to install three libraries to build this text classification model.

Code

pip install transformers
pip install datasets
pip install tokenizers

The datasets library offers many datasets for different tasks, and they can be accessed through the Hugging Face API. Here we will use the emotion dataset, loaded with the load_dataset function.

Code

from datasets import load_dataset

# Load the emotion dataset from the Hugging Face Hub
emotions = load_dataset("emotion")
print(emotions)

Tokenizing

The AutoTokenizer class in the transformers library makes it very easy to load the tokenizer that matches a pre-trained model. In this example, we will use the DistilBERT model. This gives us a ready-to-use model together with a tokenizer that breaks the text into smaller parts. The code below implements this.

Code

from transformers import AutoModel, AutoTokenizer
import torch
model_checkpoint = "distilbert-base-uncased"
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
model = AutoModel.from_pretrained(model_checkpoint).to(device)
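
Before tokenizing the whole dataset, it can help to see what the tokenizer produces for a single sentence; the sentence below is just an example chosen for illustration.

Code

# Tokenize a single example sentence to inspect the output
sample = tokenizer("Transformers in PyTorch are powerful", return_tensors="pt")
print(sample["input_ids"])       # token ids, including the special [CLS] and [SEP] tokens
print(sample["attention_mask"])  # 1 for real tokens, 0 for padding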

 

Next, we can tokenize the whole dataset by applying the tokenizer instance with the map function from the datasets library.

Code

def custom_tokenize(input_batch):
    # The emotion dataset stores its text in the "text" column
    return tokenizer(input_batch["text"], padding=True, truncation=True)

encoded_emotions = emotions.map(custom_tokenize, batch_size=None, batched=True)
print(encoded_emotions)

Getting features

Here, we will use the pre-trained model to extract features from the text. The last hidden state that the transformer produces for the input (specifically, the hidden state of the first token) is what we use to classify texts. We will compute the last hidden states inside torch.no_grad() mode so that no gradients are tracked.

Now we will map the encoded_emotions object to extract the hidden states.

Code

def extract_hidden_states(input_batch):
    """Move the batch tensors to the device and compute the model's last hidden states."""

    inputs = {key: value.to(device) for key, value in input_batch.items() if key in tokenizer.model_input_names}
    with torch.no_grad():
        last_hidden_state = model(**inputs).last_hidden_state
    # Keep only the hidden state of the first ([CLS]) token for each example
    return {"hidden_state": last_hidden_state[:, 0].cpu().numpy()}

encoded_emotions.set_format("torch", columns=["input_ids", "attention_mask", "label"])
emotions_hidden_states = encoded_emotions.map(extract_hidden_states, batched=True)
print(emotions_hidden_states)

Creating the features dataset

Next, the hidden_state column is combined with NumPy and pandas to create the feature dataset.

Code

import numpy as np

# Extract the hidden states and labels from the mapped dataset
X_train_features = np.array(emotions_hidden_states["train"]["hidden_state"])
X_valid_features = np.array(emotions_hidden_states["validation"]["hidden_state"])
y_train_labels = np.array(emotions_hidden_states["train"]["label"])
y_valid_labels = np.array(emotions_hidden_states["validation"]["label"])

# Check the shapes of the extracted features 
print("Shape of training features:", X_train_features.shape)
print("Shape of validation features:", X_valid_features.shape)


The length of each hidden state vector is 768. The training set contains 16,000 examples, while the validation set contains 2,000 examples.

The DataFrame is built using the pandas library, as shown below.

Code

import pandas as pd

# Concatenate the training features and labels into one DataFrame
train_data = pd.concat([pd.DataFrame(X_train_features), pd.Series(y_train_labels)], axis=1)

# Concatenate the validation features and labels into one DataFrame
valid_data = pd.concat([pd.DataFrame(X_valid_features), pd.Series(y_valid_labels)], axis=1)

Dataset class

Next, we will create a custom PyTorch dataset class to load the dataset objects.

Code

class CustomDataset(torch.utils.data.Dataset):

    def __init__(self, dataframe):
        # Assign the dataframe to a class attribute
        self.dataframe = dataframe
        # Extract the feature columns and target column
        x_features = dataframe.iloc[:, 0:768].values
        y_targets = dataframe.iloc[:, 768].values
        # Convert features to float tensors and targets to long tensors (CrossEntropyLoss expects long labels)
        self.features = torch.tensor(x_features, dtype=torch.float32)
        self.targets = torch.tensor(y_targets, dtype=torch.long)
        
    def __len__(self):
        # Return the length of the dataset (number of samples)
        return len(self.targets)
        
    def __getitem__(self, idx):
        # Retrieve the feature and target for a specific index
        feature_sample = self.features[idx]
        target_sample = self.targets[idx]
        return feature_sample, target_sample
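
As a quick sanity check (a hypothetical usage example, not part of the original code), we can wrap the training DataFrame and inspect one sample:

Code

# Wrap the training DataFrame and look at a single sample
train_dataset = CustomDataset(train_data)
print(len(train_dataset))        # number of training examples
features, target = train_dataset[0]
print(features.shape, target)    # torch.Size([768]) and the corresponding class label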

Custom PyTorch model

We will define a feed-forward neural network in PyTorch to classify the extracted features.
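
The setup code further below instantiates a NeuralNetwork class that is not defined anywhere else in the article, so here is a minimal sketch of what such a class might look like, assuming a single hidden layer with a ReLU activation (the hidden size of 128 is an arbitrary choice):

Code

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader

class NeuralNetwork(nn.Module):
    """A simple feed-forward classifier over the 768-dimensional features."""

    def __init__(self, input_size, num_classes, hidden_size=128):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # x has shape (batch_size, input_size)
        out = self.relu(self.fc1(x))
        return self.fc2(out)

With a model class in place, the rest of the setup creates the DataLoader objects for batched loading of the examples, the loss function, and the optimizer.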

Code

# Check if GPU is available, otherwise use CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Define input size, number of classes, learning rate, batch size, and epochs
input_feat_size = 768
num_output_classes = pd.Series(emotions['train']['label']).unique().size
learn_rate = 0.001
batch_size_val = 64
num_epochs = 3

# Prepare training dataset and data loader
training_dataset = CustomDataset(train_data)
train_data_loader = DataLoader(dataset=training_dataset, batch_size=batch_size_val, shuffle=True)

# Use the validation portion as the test set in this context
testing_dataset = CustomDataset(valid_data)
test_data_loader = DataLoader(dataset=testing_dataset, batch_size=batch_size_val, shuffle=False)

# Create an instance of the neural network model and move it to the designated device (GPU or CPU)
nn_model = NeuralNetwork(input_feat_size, num_output_classes).to(device)

# Define the loss criterion for the classification task
loss_criterion = nn.CrossEntropyLoss()

# Set up the Adam optimizer to update the model parameters
model_optimizer = optim.Adam(nn_model.parameters(), lr=learn_rate)



Training and testing

Next, we will put all the parts together and define the training loop for our model. We will also write a function to check the accuracy of the results.

Code

# Function to check the accuracy of the model on a given data loader
def assess_accuracy(data_loader, model_instance):
    num_correct_predictions = 0
    num_total_samples = 0
    model_instance.eval()
    with torch.no_grad():
        for batch_input, batch_labels in data_loader:
            batch_input = batch_input.to(device=device)
            batch_labels = batch_labels.to(device=device)
            batch_scores = model_instance(batch_input)
            # The class with the highest score is the prediction
            _, batch_predictions = batch_scores.max(1)
            num_correct_predictions += (batch_predictions == batch_labels).sum()
            num_total_samples += batch_predictions.size(0)
    accuracy_percentage = float(num_correct_predictions) / float(num_total_samples) * 100
    print(f'Accuracy: {accuracy_percentage:.2f}')
    model_instance.train()

# Training loop
for current_epoch in range(num_epochs):
    for batch_idx, (input_data, true_labels) in enumerate(train_data_loader):
        input_data = input_data.to(device=device)
        true_labels = true_labels.to(device=device)
        # Forward pass
        predicted_scores = nn_model(input_data)
        batch_loss = loss_criterion(predicted_scores, true_labels)
        # Backward pass and parameter update
        model_optimizer.zero_grad()
        batch_loss.backward()
        model_optimizer.step()

# Evaluate accuracy on both training and test data
assess_accuracy(train_data_loader, nn_model)
assess_accuracy(test_data_loader, nn_model)

Frequently Asked Questions

What is the use of self-attention in transformers in PyTorch?

Self-attention in transformers is used to calculate the importance of different words in a text sequence and the relationships between them. It does this by giving more weight to certain words, which improves the model's ability to work out the meaning of the sentence.
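
As a rough illustration (not part of the original article), scaled dot-product self-attention can be computed in a few lines of PyTorch, using random tensors for the queries, keys, and values:

Code

import math
import torch

# Random queries, keys, and values: sequence length 5, embedding size 64
q = torch.rand(5, 64)
k = torch.rand(5, 64)
v = torch.rand(5, 64)

# Attention weights: softmax of the scaled dot products between queries and keys
weights = torch.softmax(q @ k.T / math.sqrt(64), dim=-1)

# Each output vector is a weighted sum of the values
output = weights @ v
print(weights.shape, output.shape)  # torch.Size([5, 5]) torch.Size([5, 64])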

What is the Hugging Face Transformer API?

Transformers in PyTorch can be built with the help of the Hugging Face API. The Hugging Face Transformer API is a powerful Python library that provides an easy-to-use interface for working with transformer-based models in NLP.

What are tokens in Tokenization?

Tokenization divides text into smaller units called tokens. These tokens can be words, subwords, or characters. Tokens allow transformers in PyTorch to process and understand language efficiently.
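
For example, a BERT-style tokenizer (reusing the bert-base-uncased checkpoint from earlier in this article) splits a sentence into word and subword tokens:

Code

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Split a sentence into tokens; rare words are broken into subwords prefixed with ##
print(tokenizer.tokenize("Transformers tokenize text efficiently"))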

What are transformers in PyTorch?

Transformers in PyTorch are a set of neural network models used in Natural Language Processing (NLP). Models such as BERT, GPT-3, and T5 are based on transformers and are used for classifying texts, analyzing sentiment, and similar tasks.

Conclusion 

Kudos on finishing this article! We have discussed how the Hugging Face library can be used to classify texts with the help of transformers in PyTorch. This not only reduces the complexity but also increases the efficiency of NLP models.

We hope this blog has helped you understand transformers in PyTorch better. Keep learning! We suggest you read some of our other articles related to PyTorch: 

  1. Introduction to PyTorch
  2. PyTorch Tensors
  3. Machine Learning with Python
  4. Introduction to Deep Learning
  5. torch.nn module in PyTorch
     

Refer to our Guided Path to enhance your skills in DSA, Competitive Programming, JavaScript, System Design, and many more! If you want to test your competency in coding, you may check out the mock test series and participate in the contests hosted on Coding Ninjas!

If you are just a beginner looking for questions from tech giants like Amazon, Microsoft, Uber, etc., or preparing for placements, you should look at the problems, interview experiences, and interview bundles.

Best of Luck! 

Happy Learning!
