Code360 powered by Coding Ninjas X Naukri.com. Code360 powered by Coding Ninjas X Naukri.com
Table of contents
1.
Introduction
2.
Working Of Levenshtein Distance
3.
Example
4.
Computing Alignments
5.
FAQs
6.
Key Takeaways
Last Updated: Mar 27, 2024
Easy

Levenshtein Distance - NLP

Author Mayank Goyal
0 upvote
Leveraging ChatGPT - GenAI as a Microsoft Data Expert
Speaker
Prerita Agarwal
Data Specialist @
23 Jul, 2024 @ 01:30 PM

Introduction

The Levenshtein distance is a string metric used to compare two sequences. The Levenshtein distance between two words is the smallest number of single-character modifications (insertions, deletions, or substitutions) required to transform one word into the other.

The Levenshtein distance is named after Vladimir Levenshtein, a Russian scientist who invented the method in 1965 if we don't know how to spell or pronounce Levenshtein.

For example, the Levenshtein distance between "kitty" and "sitting" is three because changing one into the other requires at least three modifications.

kitten-> sittin (the letter "s" is substituted for the letter "k")

sitten -> sittin (substitution of "i" for "e")

sittin ->sitting (insertion of "g" at the end).

An "edit" is defined as the addition of a character, the deletion of a character, or the substitution of a character.

The Levenshtein distance algorithm has been utilized in the following applications:

  • Checking your spelling
  • Speech recognition is a service that allows you to recognize
  • An examination of DNA
  • Detection of plagiarism

Working Of Levenshtein Distance

We can count the minimal character modifications in two words in a basic scenario. Edits are characterized by insertions, deletions, or substitutions on one or more characters. – Minimum distance can be thought of as a search task in which we look for the shortest path—a sequence of edits—from one string to the next.

Because the universe of all possible modifications is vast, we can't search haphazardly. 

  • Instead of recomputing all of the different edit pathways, we could memorize the shortest path to a state each time we saw it.
  • We can accomplish this with dynamic programming.
  • Dynamic programming refers to a group of algorithms that use a table-based approach. Combining solutions to sub-issues is a way of solving difficulties.

For two strings – the source string X of length n and the target string Y of length m:

  •  We define lev(i,j) as the edit distance between X[1..i] and Y[1..j], i.e., the first I characters of X and the first j characters of Y (n,m).
  • The edit distance between X and Y is thus lev(n,m).

We will compute lev(n,m) from the bottom up, combining subproblem solutions. First, compute the base cases: 

lev(i,0) =i, a length I source substring and an empty target string necessitate i delete.

lev(0,j) = j, j inserts are required for a target substring of length j and an empty source string.

After computing lev(i,j) for tiny i, j, we calculate larger lev(i,j) based on the smaller values previously obtained.

Therefore, Levenshtein Distance is given by:

                                                                 Img_src

Where, 

a = string #1

b = string #2

i = the terminal character position of string #1.

j = the terminal character position of string #2.

Get the tech career you deserve, faster!
Connect with our expert counsellors to understand how to hack your way to success
User rating 4.7/5
1:1 doubt support
95% placement record
Akash Pal
Senior Software Engineer
326% Hike After Job Bootcamp
Himanshu Gusain
Programmer Analyst
32 LPA After Job Bootcamp
After Job
Bootcamp

Example

The Levenshtein distance for strings A and B can be calculated by using a matrix. It is initialized in the following way:

String a=sitting

String b=kitten

index   0 1 2 3 4 5 6 7
    # S I T T I N G
0 # 0 1 2 3 4 5 6 7
1 K 1              
2 I 2              
3 T 3              
4 T 4              
5 E 5              
6 N 6              

 

Now the updated matrix will look like this:

index   0 1 2 3 4 5 6 7
    # S I T T I N G
0 # 0 1 2 3 4 5 6 7
1 K 1 1            
2 I 2              
3 T 3              
4 T 4              
5 E 5              
6 N 6              

 

Now, we will find Lev(2,1):

The updated matrix will look like this:

index   0 1 2 3 4 5 6 7
    # S I T T I N G
0 # 0 1 2 3 4 5 6 7
1 K 1 1            
2 I 2 2            
3 T 3              
4 T 4              
5 E 5              
6 N 6              

 

The final updated matrix will look like this:

index   0 1 2 3 4 5 6 7
    # S I T T I N G
0 # 0 1 2 3 4 5 6 7
1 K 1 1 2 3 4 5 6 7
2 I 2 2 1 2 3 4 5 6
3 T 3 3 2 1 2 3 4 5
4 T 4 4 3 2 1 2 3 4
5 E 5 5 4 3 2 2 3 4
6 N 6 6 5 4 3 3 2 3

 

The final Levenshtein Distance Value is always in the corner which is highlighted with red marker(lev(6,7)).

Computing Alignments

Editing distance isn't enough. We frequently need to align each character of two strings to each other. To do so, we preserve a "backtrace." We remember where we came from every time we entered a cell. When we conclude, read off the alignment by tracing the path from the upper right corner.

FAQs

  1. What is the Levenshtein distance used for?
    The Levenshtein distance is a statistic used in linguistics to quantify linguistic distance, or how different two languages are from one other.
  2. What is the best way to read Levenshtein distance?
    The Levenshtein distance is a numerical value that indicates how different two strings are. The more significant the difference between the two strings, the higher the number. For example, the Levenshtein distance between "kitty" and "sitting" is three because changing one into the other requires at least three modifications.
  3. What does Levenshtein have to offer?
    Return Number: If one of the arguments exceeds the 255-character limit, the Levenshtein() function returns an integer value of the Levenshtein distance, else -1.
  4. On Levenshtein, how do you calculate the distance between two strings?
    The Levenshtein distance is commonly calculated by looping through a matrix of size (M+1)x(N+1) —where M and N are the lengths of the two words—and conducting various calculations inside each iteration.

Key Takeaways

Let us brief the article.

Firstly, we looked into Levenshtein distance, its use, and its applications. Later, we looked into the working of Levenshtein distance with the help of an example. 

That's the end of the article. I hope you all like it.

Check out this problem - Frog Jump

Do not worry if you want to have in-depth knowledge of different techniques. We have a perfect tutor to help you out.

Happy Learning Ninjas!

 

Live masterclass