Table of contents
1. Introduction
2. Understanding Sentiment Analysis
2.1. Overview of sentiment analysis
2.2. Applications and use cases of sentiment analysis
3. Overview of NRC Lexicon
3.1. Explanation of the NRC Lexicon dataset
3.2. Structure and format of NRC Lexicon
4. Implementing NRC Lexicon in Python
4.1. Introduction to Python libraries for sentiment analysis
4.2. Loading and preprocessing the NRC Lexicon dataset
4.3. Sentiment analysis using NRC Lexicon in Python
4.4. Emotional analysis using NRC Lexicon in Python
5. Limitations and challenges of NRC Lexicon
6. Frequently Asked Questions
6.1. What is sentiment analysis?
6.2. What is the NRC Lexicon?
6.3. How does NRC Lexicon handle negations?
7. Conclusion
Last Updated: Mar 27, 2024

NRC Lexicon In Python

Author Gaurav Singh

Introduction

Hi Ninjas, in this blog, we will learn about the NRC Lexicon in Python. In recent years, Natural Language Processing has gained much recognition. It helps us extract subjective information from text and classify whether a text is positive or negative, which allows us to understand the emotions and sentiments it expresses. But to classify any word as positive, negative, or neutral, we need a dictionary that associates words with sentiment values. Such dictionaries are called lexicons.

The National Research Council of Canada developed the NRC Lexicon, hence the name NRC. It is a dictionary that associates English words with eight basic emotions and with two sentiment categories, positive and negative.


So in this blog, we will cover various topics on using the NRC Lexicon in Python. We will first understand the importance and applications of sentiment analysis, then look at the NRC Lexicon in more detail. We will see how to implement the NRC Lexicon in Python and finally discuss its limitations and challenges.

Understanding Sentiment Analysis

Overview of sentiment analysis

Sentiment Analysis is one of the most used subfields of Natural Language Processing. It helps us understand the emotions or sentiments of a given text. It analyses text such as reviews, news articles, and social media posts to determine the sentiment and emotions expressed.

The primary goal of sentiment analysis is to classify the given text as positive, negative, or neutral. A typical sentiment analysis workflow has the following stages:

  • Text Preprocessing: It includes tokenization, which is splitting text into individual words or tokens; removing stop words, which are the most common English words that carry little semantic meaning; and lastly handling negations and removing punctuation marks.
     
  • Approaches: There are two main approaches to sentiment analysis, Lexicon-Based and Machine Learning:

    1. Lexicon-Based Approach: We use sentiment lexicons, such as NRC lexicons. It simply associates the words to some sentiment values or scores and then aggregates them for the entire text to determine the sentiment of the text.
       
    2. Machine Learning Approach: We employ ML for sentiment analysis. We train a model on labeled datasets so that it learns the patterns between words and sentiments (the labeled dependent variable). The trained model can then classify new, unseen text data.
       
  • Sentiment Classification: Once the preprocessing is done and the lexicon or trained model is ready, the sentiment of each text is classified as positive, neutral, or negative.
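
The stages above can be sketched in a few lines of Python for the lexicon-based approach. This is only a minimal illustration: TOY_LEXICON and STOP_WORDS are small made-up tables (they are not the real NRC data), and the function names preprocess and classify are hypothetical.

```python
import re

# Hypothetical toy lexicon for illustration only; NOT the real NRC data.
TOY_LEXICON = {
    "amazing": 1,
    "loved": 1,
    "great": 1,
    "terrible": -1,
    "boring": -1,
    "hated": -1,
}

# A tiny hand-picked stop-word list (assumption; real lists are much longer).
STOP_WORDS = {"this", "is", "i", "it", "the", "a", "and", "was"}


def preprocess(text):
    """Lowercase the text, drop punctuation, tokenize, and remove stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def classify(text):
    """Sum the per-word scores and map the total to a sentiment label."""
    score = sum(TOY_LEXICON.get(t, 0) for t in preprocess(text))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(preprocess("This movie is amazing! I absolutely loved it."))
print(classify("This movie is amazing! I absolutely loved it."))
print(classify("The plot was boring and the acting was terrible."))
```

Words that are missing from the lexicon contribute a score of 0, so a text made only of unknown words is classified as neutral.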

Applications and use cases of sentiment analysis

Sentiment Analysis helps many organizations gain real-time insights into customer sentiment. It helps them understand the experiences of their customers and their brand reputation. Sentiment analysis tools are also applied to emails, tweets, forums, and YouTube comments to understand the sentiment of the text. Common use cases include:

| Use Case | Description |
|---|---|
| Customer Feedback Analysis | Analyzing customer feedback from surveys, reviews, and interactions to identify areas for improvement and monitor satisfaction levels. |
| Brand Reputation Management | Monitoring brand mentions on social media and other platforms to maintain a positive brand image and address negative sentiment proactively. |
| Social Media Monitoring | Tracking and analyzing sentiment trends in social media conversations to understand customer opinions, emerging trends, and campaign effectiveness. |
| Market Research | Gaining insights into consumer preferences, product perception, and market trends through sentiment analysis of surveys, focus groups, and discussions. |
| Financial Analysis | Analyzing sentiment in financial news, social media, and reports to predict market trends, assess risks, and guide investment and trading decisions. |
| Political Analysis | Understanding public sentiment toward political candidates, policies, and issues through sentiment analysis of speeches, news, and social media. |
| Healthcare and Public Health | Assessing patient satisfaction, improving healthcare quality, and monitoring public health sentiment through sentiment analysis of patient feedback and discussions. |

Overview of NRC Lexicon

The NRC Lexicon was created by the National Research Council of Canada. As discussed earlier, it is one of the most used sentiment lexicons. It provides a large collection of English words and their associated emotion and sentiment labels. To use it well, it is essential to understand the structure and format of the NRC Lexicon and the categories it covers.

Explanation of the NRC Lexicon dataset

The NRC Lexicon is a dataset that consists of about 14,000 English words. Each word is labeled with emotion and sentiment categories, which helps us understand the emotional tone of a text.

By looking up the words of a sentence in the lexicon, we can tell which emotions it expresses and whether it is positive or negative overall. Having both kinds of labels supports a more detailed sentiment classification.

Structure and format of NRC Lexicon

The NRC Lexicon is stored in the tab-separated values (TSV) file format. In this file, each row represents a word and its associated categories. The columns in the TSV file are as follows:

  • Word: It contains the English words (about 14,000) for which the categories are given.
     
  • Emotion Categories: These cover the eight basic emotions: Anger, Anticipation, Disgust, Fear, Joy, Sadness, Surprise, and Trust. Each emotion has a binary value of 0 or 1 that tells whether the word is associated with that emotion.
     
  • Sentiment Categories: Apart from the emotion categories, there are two sentiment categories, negative and positive. Each is again a binary value of 0 or 1 that tells whether the word is associated with that sentiment.
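
To make the layout concrete, here is a minimal sketch that parses a tiny TSV sample with one row per word and one 0/1 column per category, as described above. The two sample rows are made up for illustration and the helper name load_lexicon is hypothetical; the real file may be organized differently, so treat the column layout as an assumption. The sketch uses only the standard library; reading the same data with pandas.read_csv and sep="\t" would work similarly.

```python
import csv
import io

# Made-up sample rows in the layout described above (illustrative, not real data).
SAMPLE_TSV = (
    "word\tanger\tanticipation\tdisgust\tfear\tjoy\tsadness\tsurprise\ttrust\tnegative\tpositive\n"
    "happy\t0\t1\t0\t0\t1\t0\t0\t1\t0\t1\n"
    "abandon\t0\t0\t0\t1\t0\t1\t0\t0\t1\t0\n"
)


def load_lexicon(handle):
    """Map each word to the set of categories whose flag is 1."""
    reader = csv.DictReader(handle, delimiter="\t")
    lexicon = {}
    for row in reader:
        word = row.pop("word")
        lexicon[word] = {category for category, flag in row.items() if flag == "1"}
    return lexicon


lexicon = load_lexicon(io.StringIO(SAMPLE_TSV))
print(sorted(lexicon["happy"]))
print(sorted(lexicon["abandon"]))
```

Storing only the categories flagged with 1 keeps lookups simple: checking whether a word carries an emotion becomes a set membership test.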

Implementing NRC Lexicon in Python

Introduction to Python libraries for sentiment analysis

Python has various powerful packages that can be used for sentiment analysis. Some of the most popular libraries include NLTK, TextBlob, and VADER (vaderSentiment). In this section of the blog, we will focus on the NRC Lexicon in Python, which we will access through the NRCLex package.

Loading and preprocessing the NRC Lexicon dataset

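When pursuing a project on sentiment analysis, we first need to load and preprocess the data. As discussed, the NRC Lexicon is a tab-separated values (TSV) file, which could be loaded into a DataFrame with the Pandas library. In this blog we take a simpler route and use the NRCLex package, which comes with its own copy of the lexicon data. We install it with pip: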

pip install nrclex

Sentiment analysis using NRC Lexicon in Python

from nrclex import NRCLex

# Create an instance of NRCLex using the input text
text = "This movie is amazing! I absolutely loved it."
sentiment = NRCLex(text)

# Get sentiment scores
positive_score = sentiment.affect_frequencies['positive']
negative_score = sentiment.affect_frequencies['negative']

# Determine the overall sentiment based on scores
if positive_score > negative_score:
    print("Positive sentiment")
elif negative_score > positive_score:
    print("Negative sentiment")
else:
    print("Neutral sentiment")


Output:

Positive sentiment


In this example code, we created an instance of the NRCLex class from the nrclex package using the input text. The class tokenizes the text and matches the tokens against the NRC Lexicon to calculate the scores. We then read the scores from the affect_frequencies attribute, which gives the relative frequencies of the positive and negative categories, and compare them to pick the overall label.
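
To build some intuition for what a relative frequency means here, the sketch below normalizes raw category counts so that they sum to 1. It assumes that a frequency is a category's count divided by the total count over all categories; this is an illustrative assumption, not a statement about the package's internals. The raw_counts values and the helper name to_frequencies are hypothetical.

```python
def to_frequencies(raw_counts):
    """Divide each category count by the total count (assumed normalization)."""
    total = sum(raw_counts.values())
    if total == 0:
        # No lexicon word matched: every frequency is zero.
        return {category: 0.0 for category in raw_counts}
    return {category: count / total for category, count in raw_counts.items()}


# Hypothetical raw counts for a short text.
raw_counts = {"positive": 2, "negative": 0, "joy": 1, "fear": 0}
freqs = to_frequencies(raw_counts)

for category, freq in freqs.items():
    print(category, round(freq, 2))
```

Because the values are relative, comparing positive and negative frequencies gives the same answer as comparing their raw counts, while also being comparable across texts of different lengths.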

Emotional analysis using NRC Lexicon in Python

We can also perform emotional analysis using the NRC Lexicon. The following code calculates the frequencies of the different emotions present in the text:

from nrclex import NRCLex

# Create an instance of NRCLex using the input text
text = "This movie is amazing! I absolutely loved it."
emotions = NRCLex(text).affect_frequencies

# Print the emotions and their frequencies
for emotion, frequency in emotions.items():
    print(emotion, frequency)


Output:

anticipation 0.0
joy 0.6666666666666666
trust 0.0
fear 0.0
surprise 0.0
sadness 0.0
disgust 0.0
anger 0.0


In this example code, we first created an instance of NRCLex using the input text. Then we accessed the emotional frequencies using the affect_frequencies attribute, which is a dictionary with category names as keys and their frequencies as values. The same dictionary holds the positive and negative entries used in the previous example; the output above lists only the eight emotions.

The above code snippet prints the emotions and their frequencies for the input text. In this case, the input text expresses a positive sentiment and is related to the emotion of "joy" with a frequency of approximately 0.67, which is the highest of all. The other emotions have frequencies of 0 since no words in the text are associated with them.
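
Once we have such a dictionary, picking the dominant emotion is a matter of sorting. The sketch below copies the values from the output shown above into a plain dictionary so that it runs without the nrclex package; with the package installed, the same steps could be applied to affect_frequencies directly.

```python
# Values copied from the output shown above.
emotions = {
    "anticipation": 0.0,
    "joy": 0.6666666666666666,
    "trust": 0.0,
    "fear": 0.0,
    "surprise": 0.0,
    "sadness": 0.0,
    "disgust": 0.0,
    "anger": 0.0,
}

# Sort the emotions from the highest frequency to the lowest.
ranked = sorted(emotions.items(), key=lambda item: item[1], reverse=True)
dominant, score = ranked[0]

# Keep only the emotions that actually occur in the text.
expressed = [name for name, freq in ranked if freq > 0]

print(f"{dominant} ({score:.2f})")
print(expressed)
```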

Limitations and challenges of NRC Lexicon

The NRC Lexicon is a useful technique for sentiment analysis, but it has some limitations:

  1. Word Coverage and Context: The NRC Lexicon does not include every word, and it does not take into consideration the surrounding words that provide the context of the sentence. As a result, it can misjudge words whose meaning depends on context.
     
  2. Prediction Accuracy: Although it provides positive and negative labels, it does not capture intensity, sarcasm, or cultural variations. This limits the accuracy of its output.

Frequently Asked Questions

What is sentiment analysis?

Sentiment analysis is the process of extracting opinions from text. It determines whether the sentiment expressed is positive, negative, or neutral.

What is the NRC Lexicon?

The NRC Lexicon is a dataset associating English words with emotions and sentiments. It helps identify positive and negative sentiments, as well as emotions such as joy or fear, in text for sentiment analysis tasks.

How does NRC Lexicon handle negations?

The NRC Lexicon doesn't handle negations directly, but by detecting negation words like "not," sentiment polarity can be reversed for words following the negation to capture negated sentiments.
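
A minimal sketch of this idea is shown below. It flips the polarity of the words that follow a negation word, within a small window. TOY_POLARITY and NEGATIONS are made-up tables for illustration (not the real NRC data), and the function name score_with_negation and its window parameter are hypothetical.

```python
import re

# Hypothetical toy polarity table for illustration only; NOT the real NRC data.
TOY_POLARITY = {"good": 1, "amazing": 1, "loved": 1, "bad": -1, "boring": -1}

# A small set of negation words (assumption; real lists are longer).
NEGATIONS = {"not", "no", "never"}


def score_with_negation(text, window=1):
    """Sum word polarities, flipping the sign of up to `window` tokens after a negation."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0
    flips_left = 0
    for token in tokens:
        # Treat listed negation words and contractions such as "isn't" as negations.
        if token in NEGATIONS or token.endswith("n't"):
            flips_left = window
            continue
        polarity = TOY_POLARITY.get(token, 0)
        if flips_left > 0:
            polarity = -polarity
            flips_left -= 1
        total += polarity
    return total


print(score_with_negation("This movie is good"))
print(score_with_negation("This movie is not good"))
print(score_with_negation("It was not very good", window=2))
```

The window controls how far a negation reaches. With window=1, the negation in "not very good" is used up by "very" and never reaches "good", which is why a slightly larger window is passed in the last call.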

Conclusion

In conclusion, we have seen in this blog what the NRC Lexicon is and why it is important in sentiment analysis. We took an overview of sentiment analysis and the NRC Lexicon, and saw how to use it in Python code to understand the emotions or sentiments of a text. Lastly, we also saw the limitations and challenges of the NRC Lexicon.
