Implementation
import numpy as np # linear algebra
import pandas as pd # data processing
import os

import re # for removing non-letter characters
import nltk # natural language toolkit
nltk.download('stopwords') # download the stopword list used below

from nltk.corpus import stopwords # to remove uninformative words
from nltk.stem.porter import PorterStemmer # stemming

from sklearn.model_selection import train_test_split # to split the data for training and testing
# for building model
import tensorflow as tf
import seaborn as sns

#for data visualization
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
%matplotlib inline

#loading dataset
data = pd.read_csv("Twitter_Data.csv")
data.head(5)

#checking for missing data (null data)
data.isnull().sum()
data.shape

#dropping missing data
data.dropna(axis=0, inplace=True)
data.shape #data dimensions
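As a quick illustration of what `dropna(axis=0)` does, here is a toy frame (hypothetical data, not the Twitter set):

```python
import pandas as pd

# Toy frame with a null in each column (hypothetical data, not the Twitter set)
df = pd.DataFrame({"clean_text": ["good day", None, "bad day"],
                   "category": [1.0, 0.0, None]})
df = df.dropna(axis=0)  # drop every row that contains at least one null
print(df.shape)  # (1, 2) - only the fully populated row survives
```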

#mapping tweet categories
data['category'] = data['category'].map({-1.0:'Negative', 0.0:'Neutral', 1.0:'Positive'})
data.head()

#distribution of sentiments
data.groupby('category').count().plot(kind='bar')


labels = ['Negative', 'Neutral', 'Positive']
sizes = []
colors = ['red', 'yellow', 'green']
p = 0
n = 0
N = 0
for i in data['category']:
    if i == 'Negative':
        n += 1
    elif i == 'Positive':
        p += 1
    else:
        N += 1
sizes.append(n)
sizes.append(N)
sizes.append(p)
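The counting loop above can also be done in one line with pandas `value_counts`; a sketch on a toy column (the real `data['category']` is assumed to already hold the mapped labels):

```python
import pandas as pd

# Toy stand-in for data['category'] after the Negative/Neutral/Positive mapping
demo = pd.Series(["Negative", "Positive", "Neutral", "Positive"])
counts = demo.value_counts()
sizes = [counts.get(c, 0) for c in ["Negative", "Neutral", "Positive"]]
print(sizes)  # [1, 1, 2]
```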

#pie chart for tweets
explode = (0.05, 0.05, 0.05)
plt.pie(sizes, explode=explode, colors=colors, labels=labels, autopct='%1.1f%%', shadow=True, startangle=90)
plt.axis('equal')
plt.title("Tweets Distribution")
plt.show()


def tweet_to_words(tweet):
    text = tweet.lower() # convert all letters to lowercase
    text = re.sub(r"[^a-zA-Z0-9]", " ", text) # remove non-alphanumeric characters
    words = text.split() # tokenize
    words = [w for w in words if w not in stopwords.words("english")] # remove stopwords
    words = [PorterStemmer().stem(w) for w in words] # stem each word
    return words

print("\nOriginal tweet -> ", data['clean_text'][0])
print("\nProcessed tweet -> ", tweet_to_words(data['clean_text'][0]))

Original tweet -> when modi promised “minimum government maximum governance” expected him begin the difficult job reforming the state why does take years get justice state should and not business and should exit psus and temples
Processed tweet -> ['modi', 'promis', 'minimum', 'govern', 'maximum', 'govern', 'expect', 'begin', 'difficult', 'job', 'reform', 'state', 'take', 'year', 'get', 'justic', 'state', 'busi', 'exit', 'psu', 'templ']
X = list(map(tweet_to_words, data['clean_text']))

from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
Y = le.fit_transform(data['category'])
print(X[0])
print(Y[0])

['modi', 'promis', 'minimum', 'govern', 'maximum', 'govern', 'expect', 'begin', 'difficult', 'job', 'reform', 'state', 'take', 'year', 'get', 'justic', 'state', 'busi', 'exit', 'psu', 'templ']
0
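Worth noting: `LabelEncoder` assigns integer codes in alphabetical order of the class labels, which is why a Negative tweet encodes to 0. A minimal sketch:

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
codes = le.fit_transform(["Neutral", "Positive", "Negative"])
print(list(le.classes_))  # ['Negative', 'Neutral', 'Positive']
print(list(codes))        # [1, 2, 0]
```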
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=1)
print('Number of tweets in the total set : {}'.format(len(X)))
print('Number of tweets in the training set : {}'.format(len(X_train)))
print('Number of tweets in the testing set : {}'.format(len(X_test)))

Number of tweets in the total set : 162969
Number of tweets in the training set : 130375
Number of tweets in the testing set : 32594
# Bag of words
from sklearn.feature_extraction.text import CountVectorizer
count_vector = CountVectorizer(max_features=5000, preprocessor=lambda x: x, tokenizer=lambda x: x)
X_train = count_vector.fit_transform(X_train).toarray()
X_test = count_vector.transform(X_test).toarray() # reuse the vocabulary learned from the training set
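The distinction between `fit_transform` and `transform` matters here: the test set must be encoded with the vocabulary learned from the training set, so unseen words are simply dropped. A small sketch with made-up documents:

```python
from sklearn.feature_extraction.text import CountVectorizer

train_docs = ["good movie", "bad movie"]   # made-up training documents
test_docs = ["good good film"]             # 'film' never appears in training
cv = CountVectorizer()
X_tr = cv.fit_transform(train_docs)  # learns the vocabulary from train only
X_te = cv.transform(test_docs)       # reuses that vocabulary on test
print(sorted(cv.vocabulary_))  # ['bad', 'good', 'movie']
print(X_te.toarray())          # [[0 2 0]] - 'film' is out of vocabulary
```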

from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

max_words = 5000
max_len = 50

def tokenize_pad_sequences(text):
    '''
    Tokenize the input text into sequences of integers and then
    pad each sequence to the same length.
    '''
    # Text tokenization
    tokenizer = Tokenizer(num_words=max_words, lower=True, split=' ')
    tokenizer.fit_on_texts(text)
    # Transform text to a sequence of integers
    X = tokenizer.texts_to_sequences(text)
    # Pad sequences to the same length
    X = pad_sequences(X, padding='post', maxlen=max_len)
    return X, tokenizer
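To make the effect concrete, here is a plain-Python toy (with an assumed three-word index, not Keras) mimicking what `texts_to_sequences` plus `pad_sequences(padding='post')` produce:

```python
# Assumed toy word index; Keras builds one like this, ranked by word frequency
word_index = {"modi": 1, "promised": 2, "governance": 3}

def to_padded_sequence(words, maxlen):
    seq = [word_index[w] for w in words if w in word_index]  # unknown words are dropped
    return seq + [0] * (maxlen - len(seq))                   # 'post' padding appends zeros

print(to_padded_sequence(["modi", "promised", "good", "governance"], 6))
# [1, 2, 3, 0, 0, 0]
```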

print('Before Tokenization & Padding \n', data['clean_text'][0])

Before Tokenization & Padding
when modi promised “minimum government maximum governance” expected him begin the difficult job reforming the state why does take years get justice state should and not business and should exit psus and temples

X, tokenizer = tokenize_pad_sequences(data['clean_text'])
print('After Tokenization & Padding \n', X[0])

After Tokenization & Padding
[ 42 1 307 66 1726 1119 40 2378 2 1211 205 2 215 32
155 100 49 69 1068 215 50 3 6 546 3 50 4179 3
2806 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0]

# Convert categorical variable into dummy/indicator variables.
y = pd.get_dummies(data['category'])
# Train and Test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Extract a validation set from the end of the train set
valid_size = 1000
X_valid, y_valid = X_train[-valid_size:], y_train[-valid_size:]
X_train, y_train = X_train[:-valid_size], y_train[:-valid_size]

print('Train Set ->', X_train.shape, y_train.shape)
print('Validation Set ->', X_valid.shape, y_valid.shape)
print('Test Set ->', X_test.shape, y_test.shape)

Train Set -> (113078, 50) (113078, 3)
Validation Set -> (1000, 50) (1000, 3)
Test Set -> (48891, 50) (48891, 3)
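The validation carve-out relies on ordinary negative slicing; a toy NumPy sketch of the same move:

```python
import numpy as np

X_train = np.arange(10).reshape(10, 1)  # stand-in for the real train split
valid_size = 3
X_valid = X_train[-valid_size:]   # last 3 rows become the validation set
X_train = X_train[:-valid_size]   # remaining 7 rows stay in the train set
print(X_train.shape, X_valid.shape)  # (7, 1) (3, 1)
```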

import keras.backend as K

def f1_score(precision, recall):
    # Harmonic mean of precision and recall
    f1_val = 2 * (precision * recall) / (precision + recall + K.epsilon())
    return f1_val
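The same harmonic-mean formula can be checked in plain Python (no Keras backend needed) against the precision and recall reported later in this article:

```python
# Plain-Python version of the F1 formula; eps guards against division by zero
def f1(precision, recall, eps=1e-7):
    return 2 * precision * recall / (precision + recall + eps)

print(round(f1(0.9890, 0.9853), 4))  # 0.9871
```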

# Machine learning model
from keras.models import Sequential
from keras.layers import Embedding, Conv1D, MaxPooling1D, Bidirectional, LSTM, Dense, Dropout
from keras.metrics import Precision, Recall

vocab_size = 5000
embedding_size = 32

# Build model
model = Sequential()
model.add(Embedding(vocab_size, embedding_size, input_length=max_len))
model.add(Conv1D(filters=32, kernel_size=3, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(Bidirectional(LSTM(32)))
model.add(Dropout(0.4))
model.add(Dense(3, activation='softmax'))
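As a sanity check on the architecture, the parameter counts can be derived by hand (assuming Keras's standard LSTM parameterization of four gates with input, recurrent, and bias weights):

```python
vocab_size, embedding_size, filters, kernel_size, lstm_units, classes = 5000, 32, 32, 3, 32, 3

embedding = vocab_size * embedding_size                         # 160000
conv1d = kernel_size * embedding_size * filters + filters       # 3104
lstm = 4 * ((filters + lstm_units) * lstm_units + lstm_units)   # 8320 per direction
bilstm = 2 * lstm                                               # 16640, both directions
dense = 2 * lstm_units * classes + classes                      # 195 (BiLSTM output is 64-wide)
total = embedding + conv1d + bilstm + dense
print(total)  # 179939
```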

# Compile model
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['accuracy', Precision(), Recall()])

# Train model
num_epochs = 10
batch_size = 32
history = model.fit(X_train, y_train, validation_data=(X_valid, y_valid),
                    batch_size=batch_size, epochs=num_epochs, verbose=0)

# Evaluate model on the test set
loss, accuracy, precision, recall = model.evaluate(X_test, y_test, verbose=0)
# Print metrics
print('')
print('Accuracy : {:.4f}'.format(accuracy))
print('Precision : {:.4f}'.format(precision))
print('Recall : {:.4f}'.format(recall))
print('F1 Score : {:.4f}'.format(f1_score(precision, recall)))

Accuracy : 0.9869
Precision : 0.9890
Recall : 0.9853
F1 Score : 0.9871

def predict_class(text):
    '''Predict the sentiment class of the passed list of texts'''
    sentiment_classes = ['Negative', 'Neutral', 'Positive']
    max_len = 50
    # Transform text to a sequence of integers using the fitted tokenizer
    xt = tokenizer.texts_to_sequences(text)
    # Pad sequences to the same length
    xt = pad_sequences(xt, padding='post', maxlen=max_len)
    # Predict with the trained model
    yt = model.predict(xt).argmax(axis=1)
    # Print the predicted sentiment
    print('The predicted sentiment is', sentiment_classes[yt[0]])

predict_class(['hello how are you'])

The predicted sentiment is Neutral

FAQs
1. What is the significance of text emotion detection?
Emotion detection is an important area of study in human-computer interaction. Researchers have made substantial efforts to detect emotions from facial and auditory data, but recognizing emotions from textual data is still a new and active research topic.
2. What are sentiment analysis and emotion detection?
Sentiment analysis aims to determine whether a text expresses positive, neutral, or negative feelings. Emotion analysis, on the other hand, seeks to identify specific emotions such as anger, disgust, fear, happiness, sadness, and surprise through the expression of words.
3. In sentiment analysis, what technology is used?
To assign weighted sentiment scores to entities, topics, themes, and categories inside a sentence or phrase, a sentiment analysis system for text analysis combines natural language processing (NLP) and machine learning techniques.
Key Takeaways
Let us briefly recap the article.
Firstly, we saw the importance of sentiment analysis and emotion detection in our daily lives. Then, we saw how emotion detection can be implemented using a bidirectional LSTM. Lastly, we walked through a detailed implementation of the same.
I hope you all liked this article.
Happy Learning Ninjas!