Introduction
The Levenshtein distance is a string metric used to compare two sequences. The Levenshtein distance between two words is the smallest number of single-character modifications (insertions, deletions, or substitutions) required to transform one word into the other.
The Levenshtein distance is named after Vladimir Levenshtein, a Russian scientist who invented the method in 1965 if we don't know how to spell or pronounce Levenshtein.
For example, the Levenshtein distance between "kitty" and "sitting" is three because changing one into the other requires at least three modifications.
kitten-> sittin (the letter "s" is substituted for the letter "k")
sitten -> sittin (substitution of "i" for "e")
sittin ->sitting (insertion of "g" at the end).
An "edit" is defined as the addition of a character, the deletion of a character, or the substitution of a character.
The Levenshtein distance algorithm has been utilized in the following applications:
- Checking your spelling
- Speech recognition is a service that allows you to recognize
- An examination of DNA
- Detection of plagiarism
Working Of Levenshtein Distance
We can count the minimal character modifications in two words in a basic scenario. Edits are characterized by insertions, deletions, or substitutions on one or more characters. – Minimum distance can be thought of as a search task in which we look for the shortest path—a sequence of edits—from one string to the next.
Because the universe of all possible modifications is vast, we can't search haphazardly.
- Instead of recomputing all of the different edit pathways, we could memorize the shortest path to a state each time we saw it.
- We can accomplish this with dynamic programming.
- Dynamic programming refers to a group of algorithms that use a table-based approach. Combining solutions to sub-issues is a way of solving difficulties.
For two strings – the source string X of length n and the target string Y of length m:
- We define lev(i,j) as the edit distance between X[1..i] and Y[1..j], i.e., the first I characters of X and the first j characters of Y (n,m).
- The edit distance between X and Y is thus lev(n,m).
We will compute lev(n,m) from the bottom up, combining subproblem solutions. First, compute the base cases:
lev(i,0) =i, a length I source substring and an empty target string necessitate i delete.
lev(0,j) = j, j inserts are required for a target substring of length j and an empty source string.
After computing lev(i,j) for tiny i, j, we calculate larger lev(i,j) based on the smaller values previously obtained.
Therefore, Levenshtein Distance is given by:
Where,
a = string #1
b = string #2
i = the terminal character position of string #1.
j = the terminal character position of string #2.