Introduction
The parts of speech of a sentence play a crucial role in conveying its meaning. Tagging the words of a sentence with their parts of speech, such as Verb, Noun, Adjective, etc., is an important step in many applications, including spelling and grammar checking. There are many ways to do this, and one of them is the Hidden Markov Model (HMM). An HMM tags sentences with parts of speech by determining the syntactic category of each word from the words in its surrounding context.
Hidden Markov Model and an Example
For a given sentence, an HMM, being a sequence model, predicts a label (tag) for each unit. It does this probabilistically, choosing the most probable sequence of labels for the sentence. Generally, we use Markov models in situations where we need to compute probabilities for observable events. Often, however, the events of interest are hidden: in parts of speech tagging, the tags themselves are never observed directly. Using a Markov model in this situation leads to the term Hidden Markov Model.
We will try to understand the workings of the Hidden Markov Model using a simple, easily understandable, and well-known example:
Statement 1 -> “Jane will spot Will”.
Here Jane is a Noun, will is a Modal Verb, spot is a Verb, and Will is a Noun.
While moving through the sentence from one end to the other, we observe which parts of speech follow which other parts of speech.
For the sentence “Jane will spot Will”, the tag sequence is Noun -> Modal Verb -> Verb -> Noun. The probability that one tag follows another, such as a Modal Verb following a Noun, is called a "Transition Probability".
The probability that a given tag produces a particular word, such as the Noun being the word "Jane" or the Modal Verb being the word "will", is called an "Emission Probability".
Let's say we have a corpus as shown below:
Mary Jane can see Will. -> N N M V N
Spot will see Mary. -> N M V N
Will Jane spot Mary? -> M N V N
Mary will pat Spot. -> N M V N
Then the Emission Probabilities are as follows:

        |  N  |  M  |  V
Mary    | 4/9 |  0  |  0
Jane    | 2/9 |  0  |  0
Will    | 1/9 | 3/4 |  0
Spot    | 2/9 |  0  | 1/4
Can     |  0  | 1/4 |  0
See     |  0  |  0  | 1/2
Pat     |  0  |  0  | 1/4
This table says that, given that a word is a Noun, the probability of the word being “Mary” is 4/9, and so on. As a quick check, these values can be derived by counting, as in the sketch below.
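Here is a minimal Python sketch that counts (tag, word) pairs in the toy corpus to recover the emission probabilities (the corpus encoding and variable names are our own; words are lowercased so that “Will” and “will” are counted together, as in the table):

    from collections import Counter

    # The four-sentence toy corpus as (word, tag) pairs; words are
    # lowercased so "Will" and "will" share one entry, as in the table.
    corpus = [
        [("mary", "N"), ("jane", "N"), ("can", "M"), ("see", "V"), ("will", "N")],
        [("spot", "N"), ("will", "M"), ("see", "V"), ("mary", "N")],
        [("will", "M"), ("jane", "N"), ("spot", "V"), ("mary", "N")],
        [("mary", "N"), ("will", "M"), ("pat", "V"), ("spot", "N")],
    ]

    pair_counts = Counter()  # count of each (tag, word) pair
    tag_counts = Counter()   # count of each tag
    for sentence in corpus:
        for word, tag in sentence:
            pair_counts[(tag, word)] += 1
            tag_counts[tag] += 1

    # Emission probability: P(word | tag) = count(tag, word) / count(tag)
    def emission(tag, word):
        return pair_counts[(tag, word)] / tag_counts[tag]

    print(emission("N", "mary"))  # 4/9, matching the first row of the table
    print(emission("M", "will"))  # 3/4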
Next come the Transition Probabilities. Since every tag sequence starts at the beginning of a sentence and ends at its end, the markers <S> and <E> represent the start and the end of every sentence:

     |  N  |  M  |  V  | <E>
<S>  | 3/4 | 1/4 |  0  |  0
N    | 1/9 | 1/3 | 1/9 | 4/9
M    | 1/4 |  0  | 3/4 |  0
V    |  1  |  0  |  0  |  0
Here let's consider the value 3/4 in row M and column V: it is the probability that a Verb follows a Modal Verb. A similar counting sketch recovers these transition probabilities.
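This is the matching counting sketch for the transition probabilities, again a toy illustration with our own variable names; each sentence's tag sequence is padded with <S> and <E>:

    from collections import Counter

    # Tag sequences of the four corpus sentences, padded with <S> and <E>.
    tag_sequences = [
        ["<S>", "N", "N", "M", "V", "N", "<E>"],
        ["<S>", "N", "M", "V", "N", "<E>"],
        ["<S>", "M", "N", "V", "N", "<E>"],
        ["<S>", "N", "M", "V", "N", "<E>"],
    ]

    bigram_counts = Counter()  # count of each (previous tag, current tag) pair
    prev_counts = Counter()    # how often each tag occurs as the previous tag
    for tags in tag_sequences:
        for prev, curr in zip(tags, tags[1:]):
            bigram_counts[(prev, curr)] += 1
            prev_counts[prev] += 1

    # Transition probability: P(curr | prev) = count(prev, curr) / count(prev)
    def transition(prev, curr):
        return bigram_counts[(prev, curr)] / prev_counts[prev]

    print(transition("M", "V"))    # 3/4, the value discussed above
    print(transition("N", "<E>"))  # 4/9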
The Hidden Markov Model for this corpus will then be:
[Figure: a state diagram with hidden states N, M, and V (plus <S> and <E>), arrows between states labeled with the transition probabilities, and each state emitting words with the emission probabilities above.]
Here the parts of speech are the hidden states, as they are not directly observable.
Each transition probability depends only on the previous hidden state, and each emission probability depends only on the hidden state we are currently in. The probability that the Hidden Markov Model generates a sentence with a given tag sequence is therefore the product of all the transition and emission probabilities along the way.
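To make the product concrete, here is a small sketch that scores the tag sequence N -> M -> V -> N for “Jane will spot Will”, with the relevant table entries hard-coded (a toy illustration, not a general tagger):

    # Relevant entries from the emission and transition tables above.
    emission = {("N", "jane"): 2/9, ("M", "will"): 3/4,
                ("V", "spot"): 1/4, ("N", "will"): 1/9}
    transition = {("<S>", "N"): 3/4, ("N", "M"): 1/3, ("M", "V"): 3/4,
                  ("V", "N"): 1.0, ("N", "<E>"): 4/9}

    words = ["jane", "will", "spot", "will"]
    tags = ["N", "M", "V", "N"]

    # P(words, tags) = product over i of P(t_i | t_{i-1}) * P(w_i | t_i),
    # starting from <S> and finishing with the transition into <E>.
    prob = 1.0
    prev = "<S>"
    for word, tag in zip(words, tags):
        prob *= transition[(prev, tag)] * emission[(tag, word)]
        prev = tag
    prob *= transition[(prev, "<E>")]
    print(prob)  # about 0.000386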
Here the words Mary, Jane, Will, Spot, Can, See, and Pat are called Observations.
From this, let us state the task of the HMM tagger formally. For a sequence of observations O = o_1, o_2, ..., o_T, the HMM tagger finds the most probable sequence of hidden states Q = q_1, q_2, ..., q_T. For parts of speech tagging in particular, the task is to find the tag sequence t_1^n that maximizes its probability given the sequence of n observed words w_1^n:

    estimated t_1^n = argmax over t_1^n of P(t_1^n | w_1^n)

By Bayes' rule and the HMM's independence assumptions, this is equivalent to maximizing the product over i of P(w_i | t_i) * P(t_i | t_{i-1}), which is exactly the product of the emission and transition probabilities above.
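The maximization itself is usually carried out with the Viterbi algorithm, a dynamic programming procedure that avoids enumerating every possible tag sequence. Below is a compact, illustrative Python sketch using the toy tables from this example; unseen (tag, word) and (tag, tag) pairs are treated as probability 0, whereas a real tagger would add smoothing:

    def viterbi(words, tags, transition, emission):
        """Return the most probable tag sequence for `words` under the HMM."""
        # best[i][t]: probability of the best tag sequence for words[:i+1]
        # ending in tag t; back[i][t]: the tag before t on that best path.
        best = [{} for _ in words]
        back = [{} for _ in words]
        for t in tags:
            best[0][t] = transition.get(("<S>", t), 0) * emission.get((t, words[0]), 0)
        for i in range(1, len(words)):
            for t in tags:
                p, prev = max((best[i - 1][s] * transition.get((s, t), 0), s)
                              for s in tags)
                best[i][t] = p * emission.get((t, words[i]), 0)
                back[i][t] = prev
        # Close the sequence with <E>, then walk the backpointers.
        p, last = max((best[-1][t] * transition.get((t, "<E>"), 0), t) for t in tags)
        path = [last]
        for i in range(len(words) - 1, 0, -1):
            path.append(back[i][path[-1]])
        return list(reversed(path))

    transition = {("<S>", "N"): 3/4, ("<S>", "M"): 1/4, ("N", "N"): 1/9,
                  ("N", "M"): 1/3, ("N", "V"): 1/9, ("N", "<E>"): 4/9,
                  ("M", "N"): 1/4, ("M", "V"): 3/4, ("V", "N"): 1.0}
    emission = {("N", "mary"): 4/9, ("N", "jane"): 2/9, ("N", "will"): 1/9,
                ("N", "spot"): 2/9, ("M", "will"): 3/4, ("M", "can"): 1/4,
                ("V", "see"): 1/2, ("V", "spot"): 1/4, ("V", "pat"): 1/4}

    print(viterbi(["jane", "will", "spot", "will"], ["N", "M", "V"],
                  transition, emission))
    # -> ['N', 'M', 'V', 'N'], i.e. Noun, Modal Verb, Verb, Noun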