Introduction
This article covers the theoretical background and a hands-on implementation of sentiment analysis using BERT.
Bidirectional Encoder Representations from Transformers (BERT) was proposed by Google AI Language researchers in 2018. Although the original goal was to better understand the meaning of Google Search queries, BERT has become one of the most important and versatile architectures for natural language processing, producing state-of-the-art results on sentence-pair classification, question answering, and many other language tasks.
Architecture
One of BERT's key advantages is its versatility: a single pre-trained model can be fine-tuned for many NLP tasks with state-of-the-art accuracy, much like the transfer learning we use in computer vision. The original paper also describes the task-specific setups used for this purpose. In this post, we'll apply the BERT architecture to a single-sentence classification problem, specifically the setup used for the binary classification task CoLA (Corpus of Linguistic Acceptability). We went over the BERT architecture in detail in the previous post, but let's review some of the key points:
BERT was released in two sizes:
- BERT BASE: a stack of 12 encoder layers with 12 bidirectional self-attention heads and 768 hidden units.
- BERT LARGE: a stack of 24 encoder layers with 16 bidirectional self-attention heads and 1024 hidden units.
Google has released TensorFlow implementations of both BERT BASE and BERT LARGE in two variants: Uncased and Cased. In the uncased version, the text is lowercased before WordPiece tokenization.
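To see what "uncased" means in practice, a quick sketch like the one below tokenizes the same sentence with both checkpoints (the exact subword splits depend on each checkpoint's vocabulary):

from transformers import BertTokenizer

# Load both variants of the BERT BASE tokenizer
uncased_tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
cased_tokenizer = BertTokenizer.from_pretrained("bert-base-cased")

# The uncased tokenizer lowercases the text before WordPiece tokenization,
# while the cased tokenizer preserves the original casing.
print(uncased_tokenizer.tokenize("This Movie was GREAT"))  # all tokens lowercased
print(cased_tokenizer.tokenize("This Movie was GREAT"))    # casing kept; all-caps words may split into subwords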
Sentiment Analysis with BERT
Steps needed to train the sentiment analysis model:
- Install the Transformers library;
- Load the BERT classifier and tokenizer, along with the InputExample and InputFeatures classes;
- Download the IMDB Reviews data and create a processed dataset (this takes several steps);
- Configure the loaded BERT model and fine-tune it;
- Use the fine-tuned model to make predictions.
Installing Transformers
Installing the Transformers library is pretty straightforward:
pip install transformers

Once the installation is complete, we'll load the pre-trained BERT tokenizer and sequence classifier, as well as InputExample and InputFeatures. Then we'll create our model and tokenizer from the sequence classifier and BERT's tokenizer.
CODE-
from transformers import BertTokenizer, TFBertForSequenceClassification
from transformers import InputExample, InputFeatures
model = TFBertForSequenceClassification.from_pretrained("bert-base-uncased")
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

IMDB Dataset
The IMDB Reviews dataset, compiled by Andrew L. Maas, is a large movie-review dataset collected from the leading movie-rating site. The task is to determine whether a review is positive or negative. The dataset contains 25,000 movie reviews for training and 25,000 for testing, all of which are labeled and can be used for supervised deep learning. It also includes another 50,000 unlabeled reviews, which we will not use here. In this case study, we will work solely with the training split.
TensorFlow and pandas will be our first two imports:
CODE-
import tensorflow as tf
import pandas as pd

Get the Data from the Stanford Repo
Then we can download the dataset from Stanford's repository with the tf.keras.utils.get_file function, as shown below:
URL = "https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz"
dataset = tf.keras.utils.get_file(fname="aclImdb_v1.tar.gz", origin=URL,
                                  untar=True, cache_dir='.', cache_subdir='')
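For orientation, the extracted aclImdb folder looks roughly like this; the unsup folder holds the 50,000 unlabeled reviews mentioned earlier:

aclImdb/
    train/
        pos/     12,500 labeled positive reviews
        neg/     12,500 labeled negative reviews
        unsup/   50,000 unlabeled reviews (removed in the next step)
    test/
        pos/     12,500 labeled positive reviews
        neg/     12,500 labeled negative reviews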
Remove Unlabeled Reviews

The following operations remove the unlabeled reviews; each operation is explained in the comments below:
# The shutil module provides a number of high-level
# operations on files and collections of files.
import os
import shutil
# Create main directory path ("/aclImdb")
main_dir = os.path.join(os.path.dirname(dataset), 'aclImdb')
# Create sub directory path ("/aclImdb/train")
train_dir = os.path.join(main_dir, 'train')
# Remove unsup folder since this is a supervised learning task
remove_dir = os.path.join(train_dir, 'unsup')
shutil.rmtree(remove_dir)
# View the final train folder
print(os.listdir(train_dir))

Train and Test Split
Now that we've cleaned and prepared our data, we can build our datasets with tf.keras.preprocessing.text_dataset_from_directory, as shown below. I'd like to process all of the data in one go, which is why I went with a large batch size:
# Create the training and validation sets from our
# "aclImdb/train" directory with an 80/20 split.
train = tf.keras.preprocessing.text_dataset_from_directory(
    'aclImdb/train', batch_size=30000, validation_split=0.2,
    subset='training', seed=123)
test = tf.keras.preprocessing.text_dataset_from_directory(
    'aclImdb/train', batch_size=30000, validation_split=0.2,
    subset='validation', seed=123)

Convert to Pandas to View and Process
Now that we have our raw train and test datasets, let's prepare them for the BERT model. To make the data easier to inspect and process, I'll convert the TensorFlow Dataset objects into pandas dataframes. The train Dataset object is converted to a train dataframe with the following code (the test split gets the same treatment right after):
for i in train.take(1):
    train_feat = i[0].numpy()
    train_lab = i[1].numpy()

train = pd.DataFrame([train_feat, train_lab]).T
train.columns = ['DATA_COLUMN', 'LABEL_COLUMN']
train['DATA_COLUMN'] = train['DATA_COLUMN'].str.decode("utf-8")
train.head()
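The conversion functions in the next section expect the test split as a dataframe as well, so we apply exactly the same steps to the test Dataset object:

for j in test.take(1):
    test_feat = j[0].numpy()
    test_lab = j[1].numpy()

test = pd.DataFrame([test_feat, test_lab]).T
test.columns = ['DATA_COLUMN', 'LABEL_COLUMN']
test['DATA_COLUMN'] = test['DATA_COLUMN'].str.decode("utf-8")
test.head()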
Creating Input Sequences

We now have two pandas dataframes that need to be converted into objects the BERT model can accept. We'll use the InputExample class to build sequences from our dataset. An InputExample is constructed like this:

InputExample(guid=None, text_a="Hello, world", text_b=None, label=1)

We'll now construct two major functions:
1. convert_data_to_examples: takes our train and test dataframes and turns every row into an InputExample object.
2. convert_examples_to_tf_dataset: tokenizes the InputExample objects, builds the required input format from the tokenized objects, and finally creates a tf.data.Dataset to feed to the model.
CODE-
def convert_data_to_examples(train, test, DATA_COLUMN, LABEL_COLUMN):
    # Turn every row of the train dataframe into an InputExample object
    train_InputExamples = train.apply(
        lambda x: InputExample(guid=None,  # globally unique ID, not used here
                               text_a=x[DATA_COLUMN],
                               text_b=None,
                               label=x[LABEL_COLUMN]),
        axis=1)

    # Do the same for the test (validation) dataframe
    validation_InputExamples = test.apply(
        lambda x: InputExample(guid=None,
                               text_a=x[DATA_COLUMN],
                               text_b=None,
                               label=x[LABEL_COLUMN]),
        axis=1)

    return train_InputExamples, validation_InputExamples


def convert_examples_to_tf_dataset(examples, tokenizer, max_length=128):
    features = []  # will hold one InputFeatures object per example

    for e in examples:
        input_dict = tokenizer.encode_plus(
            e.text_a,
            add_special_tokens=True,    # adds [CLS] and [SEP]
            max_length=max_length,      # truncates if len(s) > max_length
            return_token_type_ids=True,
            return_attention_mask=True,
            padding='max_length',       # pads to the right up to max_length
            truncation=True)

        input_ids, token_type_ids, attention_mask = (input_dict["input_ids"],
            input_dict["token_type_ids"], input_dict["attention_mask"])

        features.append(
            InputFeatures(
                input_ids=input_ids,
                attention_mask=attention_mask,
                token_type_ids=token_type_ids,
                label=e.label))

    def gen():
        for f in features:
            yield (
                {
                    "input_ids": f.input_ids,
                    "attention_mask": f.attention_mask,
                    "token_type_ids": f.token_type_ids,
                },
                f.label,
            )

    return tf.data.Dataset.from_generator(
        gen,
        ({"input_ids": tf.int32, "attention_mask": tf.int32, "token_type_ids": tf.int32}, tf.int64),
        (
            {
                "input_ids": tf.TensorShape([None]),
                "attention_mask": tf.TensorShape([None]),
                "token_type_ids": tf.TensorShape([None]),
            },
            tf.TensorShape([]),
        ),
    )


DATA_COLUMN = 'DATA_COLUMN'
LABEL_COLUMN = 'LABEL_COLUMN'

We call the functions above as follows:
train_InputExamples, validation_InputExamples = convert_data_to_examples(train, test, DATA_COLUMN, LABEL_COLUMN)
train_data = convert_examples_to_tf_dataset(list(train_InputExamples), tokenizer)
train_data = train_data.shuffle(100).batch(32).repeat(2)
validation_data = convert_examples_to_tf_dataset(list(validation_InputExamples), tokenizer)
validation_data = validation_data.batch(32)

Our dataset containing the processed input sequences is ready to be fed to the model.
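As an optional sanity check (a quick sketch assuming the batch size of 32 and the max_length of 128 used above), you can peek at one batch to confirm the shapes the model will receive:

# Inspect one batch of the processed training dataset
for batch_inputs, batch_labels in train_data.take(1):
    print(batch_inputs["input_ids"].shape)       # (batch_size, max_length), e.g. (32, 128)
    print(batch_inputs["attention_mask"].shape)  # same shape as input_ids
    print(batch_labels.shape)                    # (batch_size,)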
Configuring the BERT model and Fine-tuning
Our optimizer will be Adam, our loss function will be SparseCategoricalCrossentropy, and our accuracy metric will be SparseCategoricalAccuracy. Fine-tuning the model for two epochs gives around 95% accuracy, which is fantastic.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=3e-5, epsilon=1e-08, clipnorm=1.0),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy('accuracy')])
model.fit(train_data, epochs=2, validation_data=validation_data)

Making Predictions
We made a list with two reviews: the first one is clearly positive, while the second one is clearly negative.
pred_sentences = ['This was an awesome movie. I watch this beautiful movie twice my time if I have known it was this good',
                  'Worst movies of all time. I lost two hours of my life because of this movie']

We'll use our pre-trained BERT tokenizer to tokenize the reviews, feed the tokenized sequences into the model, and run a final softmax layer to get the predictions. The argmax function then determines whether each predicted sentiment is positive or negative. Finally, a simple for loop prints the results. All of these steps are performed in the following lines:
tf_batch = tokenizer(pred_sentences, max_length=128, padding=True, truncation=True, return_tensors='tf')
tf_outputs = model(tf_batch)
tf_predictions = tf.nn.softmax(tf_outputs[0], axis=-1)
labels = ['Negative','Positive']
label = tf.argmax(tf_predictions, axis=1)
label = label.numpy()
for i in range(len(pred_sentences)):
    print(pred_sentences[i], ": \n", labels[label[i]])

You've created a transformer network using a pre-trained BERT model and achieved a sentiment analysis accuracy of around 95% on the IMDB reviews dataset!