Table of contents
1. Introduction
2. Lemmatization
3. Basic Implementation
4. Lemmatization with Parts Of Speech
5. FAQs
6. Key Takeaways
Last Updated: Mar 27, 2024
Easy

Lemmatization


Introduction

Lemmatization is a technique used to convert words into their normalized form. It is similar to stemming, but lemmatization uses morphological analysis of words to transform them into their base forms, and it relies on a dictionary to map the different variants of a word to its root. For example, consider the words 'is', 'was' and 'were': a lemmatizer will convert all of them to their root form, 'be'. We will go through lemmatization in detail in the coming sections.
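To make the dictionary idea concrete, here is a minimal sketch in plain Python. The tiny LEMMA_DICT table and the lemmatize_with_dict() helper are hypothetical and exist only for illustration; a real lemmatizer, such as the one in nltk, relies on a full lexical database rather than a hand-written table.

```python
# A toy, dictionary-based lemmatizer (illustration only; not nltk's implementation).
# LEMMA_DICT is a hypothetical, hand-written table mapping inflected forms to lemmas.
LEMMA_DICT = {
    "am": "be",
    "is": "be",
    "are": "be",
    "was": "be",
    "were": "be",
}


def lemmatize_with_dict(word: str) -> str:
    """Return the lemma of `word` if the table knows it, else the word unchanged."""
    return LEMMA_DICT.get(word.lower(), word)


for w in ["is", "was", "were", "cat"]:
    print(w, "->", lemmatize_with_dict(w))
```

Words missing from the table, such as 'cat' here, are simply returned as they are.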

Lemmatization

Natural Language Processing (NLP) is a field of machine learning that focuses on processing and analyzing text and speech.
In the NLP pipeline, the first step we come across is text processing: we need to extract useful, well-structured text from the text or speech we are processing before any further analysis can be done. Text processing contains various steps, including but not limited to normalization, tokenization, named entity recognition, stemming and lemmatization. Among them, lemmatization is similar to stemming: in both cases, we reduce words to their normalized base forms.
Stemming simply strips the tail ends off the given words using rule-based algorithms, without knowing the actual meaning of the words.
Lemmatization, by contrast, takes into account which part of speech a word belongs to and what the word means.
This makes lemmatization a more powerful technique for the text processing step of the NLP pipeline. It uses a dictionary to map words to their root forms based on their part of speech and meaning. The root form of a word produced by lemmatization is called its lemma, and it plays the same role as the stem in stemming. Let's say we have two words, 'good' and 'better'. A stemmer leaves both words as they are, i.e., 'good' and 'better'. A lemmatizer, on the other hand, can recognize that 'better', used as an adjective, is the comparative form of 'good', and map both words to the lemma 'good'.
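The contrast between the two approaches can be illustrated with a small, self-contained sketch. naive_stem() is a hypothetical suffix-stripping rule set, much cruder than real stemmers such as the Porter stemmer, and lookup_lemma() uses a hypothetical lemma table that ignores part of speech for simplicity; neither is an nltk function.

```python
# Toy comparison of stemming vs. lemmatization (illustration only, not nltk code).

# Hypothetical suffix list for a crude stemmer: endings are chopped with no notion of meaning.
SUFFIXES = ("ing", "ies", "ed", "es", "s")


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, as long as at least 3 letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical lemma dictionary: inflected form -> dictionary form.
LEMMA_LOOKUP = {"better": "good", "studies": "study"}


def lookup_lemma(word: str) -> str:
    """Return the dictionary form of the word, or the word itself if it is not listed."""
    return LEMMA_LOOKUP.get(word, word)


for word in ["good", "better", "studies"]:
    print(f"{word}: stem={naive_stem(word)!r}, lemma={lookup_lemma(word)!r}")
```

The stemmer leaves 'better' untouched and turns 'studies' into the non-word 'stud', while the dictionary lookup returns the real words 'good' and 'study'.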

Basic Implementation

Python's nltk (Natural Language Toolkit) is a large and popular library that provides many useful methods for natural language processing. Many data scientists use it for text processing and various other tasks.
It is also a convenient option for implementing lemmatization in Python.
nltk's lemmatizer uses the WordNet lexical database to reduce words to their root forms; the WordNet data can be downloaded with nltk.download("wordnet").
To use the lemmatizer, we need to import WordNetLemmatizer, which is packaged in the nltk.stem.wordnet module:

from nltk.stem.wordnet import WordNetLemmatizer

Then we can lemmatize a word with:

lemmed_data = WordNetLemmatizer().lemmatize(word)  # word: the string to be reduced to its lemma

Here, the lemmatize() method looks the given word up in the WordNet database and returns its lemma. If no lemma is found, the word is returned unchanged.
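nltk's lemmatize() also takes an optional part-of-speech argument, pos, which defaults to 'n' (noun). The hypothetical sketch below mimics that behaviour with a tiny table keyed by (word, pos); LEMMA_TABLE and toy_lemmatize() are illustrative names, not part of nltk. It shows why, with the default, only nouns such as 'ones' change.

```python
# Toy POS-aware lemmatizer (illustration only; not nltk's implementation).
# LEMMA_TABLE is hypothetical; keys are (word, pos) with 'n' = noun, 'v' = verb.
LEMMA_TABLE = {
    ("ones", "n"): "one",
    ("started", "v"): "start",
    ("boring", "v"): "bore",
}


def toy_lemmatize(word: str, pos: str = "n") -> str:
    """Mimic a lemmatize(word, pos='n') call: unknown (word, pos) pairs come back unchanged."""
    return LEMMA_TABLE.get((word, pos), word)


print(toy_lemmatize("ones"))              # noun by default -> one
print(toy_lemmatize("started"))           # treated as a noun, so unchanged -> started
print(toy_lemmatize("started", pos="v"))  # treated as a verb -> start
```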

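# Basic example of lemmatization with nltk
import re
import nltk
from nltk.corpus import stopwords

# Download the stop-word list and the WordNet data on first use
# (some nltk versions also need the "omw-1.4" package for the WordNet lemmatizer)
nltk.download("stopwords")
nltk.download("wordnet")
nltk.download("omw-1.4")

text = "Are the human people the ones who started the war? Is AI a bad thing ?, It will change your view of the matrix. Look at it at least twice and definitely watch part 2. The first time you see The Second Renaissance it may look boring."
# Normalize: lowercase and replace every non-alphanumeric character with a space
text = re.sub(r"[^a-zA-Z0-9]", " ", text.lower())
# Tokenize: split on whitespace
words = text.split()
print(words)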

The output is the list of tokens produced by splitting the text:

['are', 'the', 'human', 'people', 'the', 'ones', 'who', 'started', 'the', 'war', 'is', 'ai', 'a', 'bad', 'thing', 'it', 'will', 'change', 'your', 'view', 'of', 'the', 'matrix', 'look', 'at', 'it', 'at', 'least', 'twice', 'and', 'definitely', 'watch', 'part', '2', 'the', 'first', 'time', 'you', 'see', 'the', 'second', 'renaissance', 'it', 'may', 'look', 'boring']
# Remove stop words such as 'the', 'you' and 'it' (build the set once for fast lookups)
stop_words = set(stopwords.words("english"))
words = [word for word in words if word not in stop_words]

#Implementing the Lemmatization concept.
from nltk.stem.wordnet import WordNetLemmatizer as wnl
# Reduce words to their root form
lemmed = [wnl().lemmatize(word) for word in words]
print(lemmed)

The output of this Python snippet is:

['human', 'people', 'one', 'started', 'war', 'ai', 'bad', 'thing', 'change', 'view', 'matrix', 'look', 'least', 'twice', 'definitely', 'watch', 'part', '2', 'first', 'time', 'see', 'second', 'renaissance', 'may', 'look', 'boring']

Lemmatization with Parts Of Speech

The example above shows that the only change after lemmatization is 'ones' to 'one', i.e., a plural noun turned into a singular noun. This is because the lemmatizer needs to know, or make an assumption about, the part of speech of each word before it can reduce the word to its normalized form, and by default lemmatize() assumes every word is a noun.
It therefore makes sense to tell the lemmatizer which part of speech to use.
This is what the optional pos parameter of the lemmatize() method is for:

lemmed_data = wnl().lemmatize(word, pos="v")  # "v" = verb

For the same example, we can pass pos='v' to reduce 'started' and 'boring' to the verb roots 'start' and 'bore'. This can be done as shown below:

lemmed = [wnl().lemmatize(word, pos='v') for word in lemmed]
print(lemmed)

The output, as discussed earlier, will be:

['human', 'people', 'one', 'start', 'war', 'ai', 'bad', 'thing', 'change', 'view', 'matrix', 'look', 'least', 'twice', 'definitely', 'watch', 'part', '2', 'first', 'time', 'see', 'second', 'renaissance', 'may', 'look', 'bore']

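Other values of pos can be used when needed: 'n' for nouns (the default), 'v' for verbs, 'a' for adjectives, 'r' for adverbs and 's' for satellite adjectives. Note that applying pos='v' to every word is a blunt approach: in the sample text, 'boring' is used as an adjective ("it may look boring"), yet it was reduced to the verb 'bore'. For the best results, each word should be lemmatized with its own part of speech.
Comparing the two techniques, stemming does not use a dictionary the way lemmatization does; thus, stemming is generally considered less memory-intensive than lemmatization.
You can learn more about this functionality in the nltk Lemmatization documentation.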
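In practice, the part of speech usually comes from a POS tagger rather than being fixed to one value for all words. nltk's pos_tag() returns Penn Treebank tags such as 'NN', 'VBD' or 'JJ', whereas lemmatize() expects the WordNet codes 'n', 'v', 'a' or 'r'. The helper below, penn_to_wordnet(), is a hypothetical sketch of that conversion based on the tag prefix; it is plain Python so it runs without any nltk data, and the example tags are assumed for illustration rather than taken from a real tagger run.

```python
# Hypothetical helper: convert a Penn Treebank tag (e.g. from nltk.pos_tag)
# into the single-letter part-of-speech code used by WordNet.
def penn_to_wordnet(tag: str) -> str:
    """Map 'JJ*' -> 'a', 'VB*' -> 'v', 'RB*' -> 'r'; everything else -> 'n' (the default)."""
    if tag.startswith("JJ"):
        return "a"  # adjective
    if tag.startswith("VB"):
        return "v"  # verb
    if tag.startswith("RB"):
        return "r"  # adverb
    return "n"  # noun, also used as a fallback


# Assumed tags for a few words from the sample text (not real tagger output).
tagged = [("started", "VBD"), ("boring", "JJ"), ("ones", "NNS"), ("definitely", "RB")]
for word, tag in tagged:
    print(word, tag, "->", penn_to_wordnet(tag))
```

The resulting code can then be passed as the pos argument, e.g. wnl().lemmatize(word, pos=penn_to_wordnet(tag)), so that an adjective such as 'boring' is looked up as an adjective rather than as a verb.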

FAQs

  1. What is the Lemmatization of words?
    Lemmatization of words is an important step in the Natural Language Processing pipeline, in which we reduce words to their root forms by using morphological analysis of the words.
  2. What are the common differences between Lemmatization and Stemming?
    Lemmatization uses morphological analysis of words to reduce them to their root forms, whereas stemming uses rule-based algorithms that cut off the tail ends of words. Lemmatization uses a dictionary, but stemming doesn't.
  3. How does Python support Lemmatization?
    Python's nltk library provides the WordNetLemmatizer class, which uses the WordNet database. Its lemmatize() method takes a word and an optional part of speech (pos) as parameters and returns the word's lemma.
  4. What is the output of the Lemmatization step?
    The output of the Lemmatization step is the lemma of the input word, where the lemma is the reduced or normalized form of the word.

Key Takeaways

In this article, we have briefly discussed the concept of lemmatization, how it is used, the differences between stemming and lemmatization, and how to implement lemmatization in Python.
Here is a small task for you:
Given the words 'Change', 'Changes', 'Changed' and 'Changing', what is the output when we apply stemming and lemmatization?
Comment your answer below.

Happy Learning!
