Introduction
Sentiment expresses feelings, thoughts, ideas, or attitudes toward a specific issue or topic. Machine learning models that predict sentiments have helpful real-world applications, such as measuring public opinion and determining customer happiness.
In this blog, we will predict sentiments with Keras. Let's get started!
What is Sentiment Analysis?
Sentiment analysis, also known as opinion mining, is an NLP (Natural Language Processing) task that identifies the sentiment or emotion expressed in a text. Its main objective is to understand the author's underlying attitude towards a given topic, whether positive, negative, or neutral.
Some applications of sentiment analysis are social media monitoring, brand monitoring, business intelligence, and market research. It enables companies to examine text data for useful information, keep track of public opinion, and base decisions on the opinions and sentiments of their customers.
In this blog, we will predict sentiments with Keras on the IMDB movie review dataset.
Understanding the IMDB Datasets
The IMDB dataset is a standard dataset for sentiment analysis and is frequently used as a benchmark to evaluate the performance of sentiment analysis models. It consists of movie reviews, each labeled with a positive (1) or negative (0) sentiment.
The dataset is easily accessible through the keras.datasets module in Python.
Characteristics of IMDB Datasets
The characteristics of the IMDB dataset are:
Size: There are 50,000 movie reviews in the IMDB dataset. They are split into two sets: a training set that contains 25,000 reviews and a testing set that contains the remaining 25,000 reviews.
Label Distribution: Each review is labeled either positive or negative, so this is a binary sentiment classification problem. The dataset is evenly balanced, with 50% positive and 50% negative reviews.
Text Data: The length of each movie review in the dataset varies, and each review is a sequence of words. These English-language reviews cover various movie genres and themes.
Pre-requisites
Before predicting sentiments with Keras, you should have some experience with the technologies mentioned below.
Familiarity with Python and Natural Language Processing concepts
Familiarity with Keras and TensorFlow for building deep learning models
Some basic knowledge of the numpy and matplotlib libraries in Python
Building Model to Predict Sentiments with Keras
In this section, we will build an NLP model in Keras that predicts whether the sentiment of a given review (text) is positive or negative, based on the words used in the text. We will use the IMDB dataset to train and evaluate it.
Preprocessing of Data
In this section of "Predicting Sentiments with Keras," we will import all the important and necessary libraries along with the Keras framework.
Code
# Loading libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import tensorflow as tf
# Importing Keras modules
import keras
from keras.preprocessing.sequence import pad_sequences
from keras.layers import Embedding, LSTM, Dense
from keras import Sequential
# Importing the IMDB dataset module
from keras.datasets import imdb
import warnings
warnings.filterwarnings('ignore')
Next, we will load the data and check, by printing the sizes, that it is split into 25,000 training reviews and 25,000 test reviews.
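The loading step can be sketched roughly as follows. This is a minimal sketch, not the original code of this blog: the helper names `load_imdb` and `describe_split` are our own, and the vocabulary limit of 5,000 is assumed from the model settings used later.

```python
def load_imdb(num_words=5000):
    """Return ((x_train, y_train), (x_test, y_test)) from keras.datasets.imdb.

    TensorFlow is imported inside the function so that describe_split can be
    used without it. The first call downloads the dataset.
    """
    from tensorflow.keras.datasets import imdb
    # num_words keeps only the most frequent words; rarer ones become "unknown"
    return imdb.load_data(num_words=num_words)


def describe_split(name, x, y):
    """Summarise one split (hypothetical helper) as a short string."""
    return f"{name}: {len(x)} reviews, {len(y)} labels"
```

Calling `(x_train, y_train), (x_test, y_test) = load_imdb()` and then printing `describe_split("train", x_train, y_train)` and `describe_split("test", x_test, y_test)` should report 25,000 reviews for each split.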
Analysis of the IMDB Dataset
In this section of "Predicting Sentiments with Keras," we will analyze the IMDB Dataset by exploring it. First, we will check the number of unique words and sentiment classes in the training dataset.
Code
print("Number of Unique Words: ")
print(len(np.unique(np.hstack(x_train))))
print("Sentiment Classes:")
print(np.unique(y_train))
Now, we will look at a few words and their indices in the IMDB dataset. The function get_word_index() returns a dictionary that maps each word to its index, and we will print the first 10 entries of that dictionary with the code below:
Code
w_i = tf.keras.datasets.imdb.get_word_index()
for x in list(w_i)[0:10]:
    print("{}:{}".format(x, w_i[x]))
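The word index can also be used in the other direction, to turn an encoded review back into words. One detail to keep in mind is that, with the default settings of imdb.load_data(), the indices in the reviews are shifted by 3, because 0, 1, and 2 are reserved for padding, the start marker, and unknown words. The sketch below uses a tiny made-up dictionary (`toy_word_index`, an assumption for illustration) in place of the real one.

```python
# Reserved indices used by imdb.load_data() with its default arguments
PAD, START, UNKNOWN, INDEX_FROM = 0, 1, 2, 3

# Toy dictionary in the same format as get_word_index(): word -> rank.
# The entries are hypothetical and only serve as an illustration.
toy_word_index = {"the": 1, "movie": 2, "great": 3}


def decode_review(encoded, word_index):
    """Map a list of integer indices back to a readable string."""
    lookup = {rank + INDEX_FROM: word for word, rank in word_index.items()}
    lookup.update({PAD: "<pad>", START: "<start>", UNKNOWN: "<unk>"})
    return " ".join(lookup.get(i, "<unk>") for i in encoded)


print(decode_review([1, 4, 5, 2, 6], toy_word_index))
# prints: <start> the movie <unk> great
```

With the real data, the same function could be called as `decode_review(x_train[0], w_i)`.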
A plot of the review lengths (not reproduced here) shows that most of the mass of the distribution can be covered by clipping reviews to a length somewhere between 400 and 1,000 words.
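To see how much of the data a given clipping length covers, we can compute a few simple statistics over the review lengths. The helper `length_summary` and the toy reviews below are our own illustration, not part of the dataset.

```python
def length_summary(reviews, max_len):
    """Return the mean length, the longest length, and the share of
    reviews that already fit within max_len words."""
    lengths = [len(review) for review in reviews]
    covered = sum(1 for n in lengths if n <= max_len)
    return {
        "mean": sum(lengths) / len(lengths),
        "longest": max(lengths),
        "covered": covered / len(lengths),
    }


# Four made-up reviews with 120, 300, 450 and 900 word indices
toy_reviews = [[1] * 120, [1] * 300, [1] * 450, [1] * 900]
summary = length_summary(toy_reviews, max_len=500)
print(summary)  # {'mean': 442.5, 'longest': 900, 'covered': 0.75}
```

On the real data, `length_summary(x_train, max_len=500)` would show what fraction of the training reviews is kept in full at a limit of 500 words.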
Truncating Review Words
The reviews in the dataset differ in length. To truncate the reviews, we will use the maxlen parameter.
maxlen: This parameter specifies the maximum length of a text, such as a movie review, that will be used. We must set a length limit for the reviews in order to process them efficiently. Any review that is longer than maxlen will be truncated.
For neural networks, all reviews must be of the same length. So we will bring all the reviews to a fixed length of 500 using the pad_sequences function in Keras.
Reviews longer than 500 words are cut down to 500 words, and shorter reviews are padded with zeros.
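With Keras, this step is usually a single call such as `pad_sequences(x_train, maxlen=500)`. To make its effect visible, here is a small pure-Python sketch of what pad_sequences does with its default settings (zeros are added at the front, and long sequences lose their first elements). The function name `pad_or_truncate` is our own.

```python
def pad_or_truncate(sequence, max_len, pad_value=0):
    """Mimic the defaults of pad_sequences: truncate from the front,
    then pad at the front (assumes max_len >= 1)."""
    trimmed = sequence[-max_len:]  # keep the last max_len items
    return [pad_value] * (max_len - len(trimmed)) + trimmed


print(pad_or_truncate([1, 2, 3], max_len=5))           # [0, 0, 1, 2, 3]
print(pad_or_truncate([4, 5, 6, 7, 8, 9], max_len=5))  # [5, 6, 7, 8, 9]
```

The short sequence is left-padded with zeros, while the long one keeps only its last five entries.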
Building IMDB Model
A simple neural network for this task would be a multi-layer perceptron with a single hidden layer. In this blog, the model will instead be constructed using an embedding layer, an LSTM layer, and a dense layer.
LSTM models are preferred for this problem since they handle sequential data and are better at retaining long-range dependencies. Using an LSTM in NLP tasks is advantageous since it can take a whole sentence as input for prediction rather than a single word.
Each word in the input will be transformed by the embedding layer into a dense vector of a specific size (the embedding dimension). To increase the model's accuracy, we can also tune hyperparameters such as the batch size, the number of epochs, and the number of LSTM units.
The layers are stacked in Keras using a Sequential model. To improve the model, you can experiment with the number of layers, and to prevent overfitting, you can add dropout layers.
Code
num_words = 5000      # vocabulary size
embedding_dim = 32    # size of each word vector
output_lstm = 100     # number of LSTM units
# Setting up the model
model = Sequential()
model.add(Embedding(num_words, embedding_dim, input_length=500))
model.add(LSTM(output_lstm))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer='adam', metrics=['accuracy'])
model.summary()
We have set the vocabulary size to 5,000 words, the word vector size to 32 dimensions, and the input length to 500. For each review, the output of this first layer will be a 500×32 matrix.
The output layer has one neuron with sigmoid activation. It produces a value between 0 and 1, which is rounded to 0 (negative) or 1 (positive) to get the prediction.
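As a sanity check on the model summary, the number of trainable parameters can be worked out by hand from the three settings above. The short calculation below is a sketch using the standard formulas for embedding, LSTM, and dense layers.

```python
vocab_size, embedding_dim, lstm_units = 5000, 32, 100

# Embedding: one 32-dimensional vector per word in the vocabulary
embedding_params = vocab_size * embedding_dim

# LSTM: four weight sets (input, forget and output gates plus the cell
# candidate), each with input weights, recurrent weights and a bias
lstm_params = 4 * ((embedding_dim + lstm_units) * lstm_units + lstm_units)

# Dense: one weight per LSTM unit plus one bias for the single output neuron
dense_params = lstm_units * 1 + 1

total_params = embedding_params + lstm_params + dense_params
print(embedding_params, lstm_params, dense_params, total_params)
# prints: 160000 53200 101 213301
```

Most of the parameters sit in the embedding layer, so the vocabulary size and embedding dimension have the largest effect on model size.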
Training of IMDB Model
We train the model on the training set and pass (x_test, y_test) as validation_data, so that the accuracy and loss are reported for both the training and validation sets at each epoch.
To train our model for two epochs, we will use model.fit().
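The training call can be sketched as follows. To keep the example small and quick to run, it uses random stand-in data and a much smaller model. The array sizes, layer sizes, and batch size are assumptions for illustration and are not the settings of this blog.

```python
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

# Synthetic stand-in data (assumption): random word indices and random labels
rng = np.random.default_rng(0)
vocab_size, max_len = 50, 20
x_train = rng.integers(1, vocab_size, size=(64, max_len))
y_train = rng.integers(0, 2, size=(64,))
x_test = rng.integers(1, vocab_size, size=(32, max_len))
y_test = rng.integers(0, 2, size=(32,))

# Same layer types as the model in this blog, with much smaller sizes
model = Sequential([
    Embedding(vocab_size, 8),
    LSTM(16),
    Dense(1, activation="sigmoid"),
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

# validation_data makes Keras report val_loss and val_accuracy after each epoch
history = model.fit(
    x_train, y_train,
    validation_data=(x_test, y_test),
    epochs=2,
    batch_size=16,
    verbose=0,
)

for epoch, (acc, val_acc) in enumerate(
        zip(history.history["accuracy"], history.history["val_accuracy"]), start=1):
    print(f"epoch {epoch}: accuracy={acc:.3f}, val_accuracy={val_acc:.3f}")
```

Because the stand-in labels are random, the accuracy printed here should hover around chance. With the padded IMDB arrays in place of the synthetic ones, the same model.fit() call trains the actual sentiment model.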
After two epochs, the model reaches an accuracy of about 87%. Training for more epochs may improve the accuracy further.
If no validation data is provided while training the model, the model can be assessed separately using the following code:
Code
loss, accuracy = model.evaluate(x_test, y_test, verbose=0)
print("Test accuracy: {:.2f}%".format(accuracy * 100))
Next, we will load the saved model and test it on a review of our own to check whether the model is working correctly. The review's words are converted to indices, the sequence is padded to the expected length, and the model predicts the review's sentiment.
Code
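# Load the saved model and score a new review
from keras.datasets import imdb
from keras.models import load_model
from keras.preprocessing.sequence import pad_sequences
loaded_model = load_model('imdb_analysis.h5')
word_index = imdb.get_word_index()
# Encode the review like imdb.load_data(num_words=5000) does:
# 1 = start marker, 2 = unknown word, real word indices are shifted by 3
words = 'This movie is adventures.'.lower().strip('.').split()
ids = [word_index.get(w, -1) + 3 for w in words]
sequence = [1] + [i if 3 < i < 5000 else 2 for i in ids]
test = pad_sequences([sequence], maxlen=500)
# The sigmoid output is the predicted probability of a positive review
score = float(loaded_model.predict(test)[0][0])
print(score)
print('Positive' if score >= 0.5 else 'Negative')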
The model returns a score of 0.684. This value is not an accuracy but the model's sigmoid output, and since it is closer to 1 (positive) than to 0 (negative), the model classifies the review as positive. Training for more epochs may increase the model's accuracy.
Frequently Asked Questions
What is Keras?
Keras is an open-source deep-learning library written in Python. It offers a high-level interface for creating neural networks. TensorFlow is frequently used as the backend for Keras.
Which datasets can be preferred for sentiment analysis with Keras?
IMDB movie reviews, Twitter Sentiment140, Amazon product reviews, and Yelp reviews are well-known datasets for sentiment analysis. These pre-labeled datasets are readily available online.
How to deal with unbalanced sentiment classes in the dataset?
Suppose, for example, that the dataset contains many more positive reviews than negative reviews. In that case, you can tackle the imbalance during training using oversampling, undersampling, or class weighting strategies.
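Class weighting, for instance, can be sketched in a few lines. The helper `balanced_class_weights` is our own name, and it uses the common "balanced" heuristic, where each class gets the weight n_samples / (n_classes × class count). The labels are made up for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Give rarer classes proportionally larger weights."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Made-up imbalanced labels: 80 positive (1) and 20 negative (0) reviews
labels = [1] * 80 + [0] * 20
weights = balanced_class_weights(labels)
print(weights)  # {1: 0.625, 0: 2.5}
```

In Keras, such a dictionary can be passed to training through the class_weight argument, as in `model.fit(x_train, y_train, class_weight=weights)`, so that mistakes on the rarer class count more in the loss.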
Conclusion
In this blog, we have discussed training an LSTM model in Keras to predict the sentiment (positive or negative) of movie review text using the IMDB dataset. We have trained the model and also tested how it works on new input.
We hope this blog has helped you gain knowledge of predicting sentiments with Keras. Do not stop learning!