Table of contents
1. Introduction
2. Hidden Markov Model and an Example
3. Viterbi Algorithm
4. FAQs
5. Key Takeaways
Last Updated: Mar 27, 2024

Parts Of Speech Tagging - HMM


Introduction

The Parts of Speech of a sentence play a crucial role in delivering its meaning, and tagging each word with its Part of Speech, such as Verb, Noun, or Adjective, is an important step in many applications, including spell checking and grammar checking. This can be done in many ways, one of which is the Hidden Markov Model (HMM). An HMM tags a sentence with Parts of Speech by determining the syntactic category of each word from the words in its surrounding context.

Hidden Markov Model and an Example

For a given sentence, an HMM, being a sequence model, predicts a label (tag) for each unit by choosing the most probable sequence of labels for the sentence. Generally, we use Markov models in situations where we need to compute probabilities for observable events. Often, however, the events of interest are hidden; in Parts of Speech tagging, the tags are hidden. Using a Markov model in this situation leads to the term Hidden Markov Model.

We will try to understand the working of the Hidden Markov Model using a simple, easily understandable, and well-known example:

Statement 1 -> “Jane will spot Will”.
Here “Jane” is a Noun, “will” is a Modal Verb, “spot” is a Verb, and “Will” is a Noun.
While moving from one end of the sentence to the other, we need to observe which Parts of Speech follow which other Parts of Speech.
For this sentence, we observe the sequence Noun -> Modal Verb -> Verb -> Noun. The probability that a Modal Verb follows a Noun, or that a Noun follows a Verb, is called a "Transition Probability".
The probability that the Noun is the word "Jane", that the Modal Verb is the word "will", and so on, are called "Emission Probabilities".

Let's say we have a corpus as shown below:

  1. Mary/N  Jane/N  can/M  see/V  Will/N.
  2. Spot/N  will/M  see/V  Mary/N.
  3. Will/M  Jane/N  spot/V  Mary/N?
  4. Mary/N  will/M  pat/V  Spot/N.

The Emission Probabilities are then as follows:

 

Word | N   | M   | V
-----|-----|-----|-----
Mary | 4/9 | 0   | 0
Jane | 2/9 | 0   | 0
Will | 1/9 | 3/4 | 0
Spot | 2/9 | 0   | 1/4
Can  | 0   | 1/4 | 0
See  | 0   | 0   | 1/2
Pat  | 0   | 0   | 1/4

 

This table says that, given that a word's tag is Noun, the probability that the word is “Mary” is 4/9, and so on. Next come the Transition Probabilities. Since every tag sequence starts at the beginning of a sentence and finishes at its end, the tokens <S> and <E> represent the start and the end of each sentence.

 

     | N   | M   | V   | <E>
-----|-----|-----|-----|-----
<S>  | 3/4 | 1/4 | 0   | 0
N    | 1/9 | 1/3 | 1/9 | 4/9
M    | 1/4 | 0   | 3/4 | 0
V    | 1   | 0   | 0   | 0

 

For example, consider the value 3/4 in row M and column V: it is the probability that a Verb follows a Modal Verb. The Hidden Markov Model for this corpus will then be:

[Image: Hidden Markov Model for this corpus, with hidden states N, M, and V and their transition and emission probabilities (source)]

Here the Parts of Speech are the hidden states, as they are not directly visible for observation.
Each transition probability depends only on the previous hidden state, whereas each emission probability depends only on the hidden state we are currently in; thus the probability that the Hidden Markov Model generates a sentence is the product of both.
The words Mary, will, Spot, Jane, Will, can, see, and pat are called Observations. The sketch below recomputes both probability tables from the corpus by simple counting.
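To make these numbers concrete, here is a minimal Python sketch that recomputes both tables by counting. The corpus is the four tagged sentences above; the variable names are our own, not from the original article.

```python
from collections import Counter
from fractions import Fraction

# The four tagged sentences from the corpus above
# (N = Noun, M = Modal Verb, V = Verb); words are lowercased so that
# "Will" the Noun and "will" the Modal Verb share one row, as in the table.
corpus = [
    [("mary", "N"), ("jane", "N"), ("can", "M"), ("see", "V"), ("will", "N")],
    [("spot", "N"), ("will", "M"), ("see", "V"), ("mary", "N")],
    [("will", "M"), ("jane", "N"), ("spot", "V"), ("mary", "N")],
    [("mary", "N"), ("will", "M"), ("pat", "V"), ("spot", "N")],
]

tag_counts = Counter()         # how often each tag occurs
emission_counts = Counter()    # (tag, word) pairs
transition_counts = Counter()  # (previous tag, current tag) pairs

for sentence in corpus:
    previous = "<S>"
    for word, tag in sentence:
        tag_counts[tag] += 1
        emission_counts[(tag, word)] += 1
        transition_counts[(previous, tag)] += 1
        previous = tag
    transition_counts[(previous, "<E>")] += 1  # close the sentence
tag_counts["<S>"] = len(corpus)                # one <S> per sentence

# Emission: P(word | tag) = count(tag, word) / count(tag)
print(Fraction(emission_counts[("N", "mary")], tag_counts["N"]))  # 4/9
print(Fraction(emission_counts[("M", "will")], tag_counts["M"]))  # 3/4

# Transition: P(tag | previous tag) = count(previous, tag) / count(previous)
print(Fraction(transition_counts[("<S>", "N")], tag_counts["<S>"]))  # 3/4
print(Fraction(transition_counts[("N", "M")], tag_counts["N"]))      # 1/3
print(Fraction(transition_counts[("M", "V")], tag_counts["M"]))      # 3/4
```

Dividing every count this way reproduces the two tables exactly.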
From this, let us derive a formula for the HMM tagger. For a sequence of observations \(O = o_1, o_2, \dots, o_T\), the HMM tagger finds the most probable sequence of states \(Q = q_1, q_2, \dots, q_T\). For Parts of Speech tagging, the task is to find the tag sequence \(\hat{t}_1^n\) that is most probable given the observed sequence of \(n\) words \(w_1^n\):

\[
\hat{t}_1^n = \operatorname*{argmax}_{t_1^n} P(t_1^n \mid w_1^n)
\]

Applying Bayes' rule, together with the two HMM assumptions that each word depends only on its own tag and each tag depends only on the previous tag, this reduces to a product of the emission and transition probabilities:

\[
\hat{t}_1^n = \operatorname*{argmax}_{t_1^n} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})
\]
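As a quick sanity check, we can score the correct tag sequence N M V N for “Jane will spot Will” with this formula, pairing one transition and one emission probability per word (all values taken from the two tables above) and closing with the end-of-sentence transition:

\[
P = \underbrace{\tfrac{3}{4}\cdot\tfrac{2}{9}}_{\text{Jane}}
  \cdot \underbrace{\tfrac{1}{3}\cdot\tfrac{3}{4}}_{\text{will}}
  \cdot \underbrace{\tfrac{3}{4}\cdot\tfrac{1}{4}}_{\text{spot}}
  \cdot \underbrace{1\cdot\tfrac{1}{9}}_{\text{Will}}
  \cdot \tfrac{4}{9}
  = \tfrac{1}{2592} \approx 3.9\times10^{-4}
\]

This is the same value the Viterbi algorithm recovers for the best path in the next section.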

   

Viterbi Algorithm

The Viterbi Algorithm helps us by finding the chain of states that generates the observations with the highest probability. Let's say we need to tag the Parts of Speech for the sentence “Jane will spot Will”.
The corresponding probability graph will look like this:

[Image: probability graph (trellis) for “Jane will spot Will” (source)]

Now we need to calculate the path probabilities. We write \(v_t(j)\) for the probability of being in state \(j\) after the first \(t\) observations, having followed the most probable path there; it is computed from the previous column of the graph as

\[
v_t(j) = \max_{i=1}^{N} v_{t-1}(i)\, a_{ij}\, b_j(o_t),
\]

where \(a_{ij}\) is the transition probability from state \(i\) to state \(j\) and \(b_j(o_t)\) is the emission probability of the observation \(o_t\) from state \(j\).

For example, for “will” (the second word of “Jane will spot Will”), the path probability of the Noun state coming from the previous Noun state is 1/6 (the previous Noun path probability) × 1/9 (the N → N transition probability) × 1/9 (the emission probability of “will” as a Noun) = 1/486.
Similarly, coming from the Modal Verb state it is 0 × 1/4 × 1/9 = 0.
And coming from the Verb state it is 0 × 1 × 1/9 = 0.
We pick the highest of these and assign it as the node's path probability. On calculating all the path probabilities, we get:

[Image: the trellis with all path probabilities calculated (source)]

Then we remove all the nodes whose path probabilities are zero, keep the incoming path with the highest probability at each node, and obtain the path:

<S> -> N -> M -> V -> N -> <E>

Thus the Viterbi algorithm tags the sentence “Jane will spot Will” with its Parts of Speech: “Jane” as a Noun, “will” as a Modal Verb, “spot” as a Verb, and “Will” as a Noun.
In this way, the HMM tags sentences with their Parts of Speech.
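To tie everything together, here is a minimal, self-contained Python sketch of this Viterbi tagger for the toy corpus. The probability tables are hard-coded from the two tables above; the function name and structure are our own, not from the original article.

```python
from fractions import Fraction as F

states = ["N", "M", "V"]

# Emission probabilities P(word | tag), from the emission table above.
emission = {
    "N": {"mary": F(4, 9), "jane": F(2, 9), "will": F(1, 9), "spot": F(2, 9)},
    "M": {"will": F(3, 4), "can": F(1, 4)},
    "V": {"spot": F(1, 4), "see": F(1, 2), "pat": F(1, 4)},
}

# Transition probabilities P(tag | previous tag), from the transition table.
transition = {
    "<S>": {"N": F(3, 4), "M": F(1, 4), "V": F(0)},
    "N":   {"N": F(1, 9), "M": F(1, 3), "V": F(1, 9), "<E>": F(4, 9)},
    "M":   {"N": F(1, 4), "M": F(0),    "V": F(3, 4), "<E>": F(0)},
    "V":   {"N": F(1),    "M": F(0),    "V": F(0),    "<E>": F(0)},
}

def viterbi(words):
    # v[t][s]: probability of the best path ending in state s after t+1 words.
    # back[t][s]: the previous state on that best path.
    v = [{s: transition["<S>"][s] * emission[s].get(words[0], F(0))
          for s in states}]
    back = [{s: "<S>" for s in states}]
    for t in range(1, len(words)):
        v.append({})
        back.append({})
        for s in states:
            prob, prev = max(
                (v[t - 1][p] * transition[p][s] * emission[s].get(words[t], F(0)), p)
                for p in states
            )
            v[t][s], back[t][s] = prob, prev
    # Close each surviving path with the end-of-sentence transition.
    prob, last = max((v[-1][s] * transition[s]["<E>"], s) for s in states)
    # Follow the back-pointers to recover the best tag sequence.
    tags = [last]
    for t in range(len(words) - 1, 0, -1):
        tags.append(back[t][tags[-1]])
    return list(reversed(tags)), prob

tags, prob = viterbi(["jane", "will", "spot", "will"])
print(tags, prob)  # ['N', 'M', 'V', 'N'] 1/2592
```

Running it prints ['N', 'M', 'V', 'N'] with probability 1/2592 ≈ 0.000386, matching both the hand calculation of the path probabilities and the worked scoring example earlier.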

FAQs

  1. What is Part of Speech Tagging?
    Part of Speech tagging is similar to a classification problem: we tag each word or token of a sentence with its part of speech, in effect adding a description to each token of the sentence.
  2. What are the observations in the Parts of Speech Tagging problem?
    In the example discussed above, the observations are the tokens or words, such as Jane, Will, Spot, will, can, see, etc. In other words, the words of a sentence are simply termed Observations.
  3. How do we model Parts of Speech Tagging using an HMM?
    As discussed earlier, an HMM uses a sequence of observations to come to a decision; similarly, in the Parts of Speech tagging problem, the HMM observes the sequence of words in the input sentence and decides on the highest-probability sequence of hidden states, which gives the parts of speech of the words.
  4. What is a Hidden Markov Model?
    A Hidden Markov Model is a statistical model in which the system being modelled is assumed to be a Markov process with hidden (unobserved) states. It describes the evolution of observations that depend on internal factors which are not directly visible.

Key Takeaways

Hooray! We have reached the end of our learning. As usual, learning never ends, so keep improving your skill set by exploring other concepts. In this article, we discussed Parts of Speech tagging, the use of HMMs for the Parts of Speech tagging problem, and the Viterbi algorithm, which makes the HMM tagger work much more efficiently.
Hey Ninjas! You can check out more courses on machine learning concepts through our official website, Coding Ninjas, and check out Coding Ninjas Studio to learn through articles and other resources for your growth.

Happy Learning!
