Last Updated: Mar 27, 2024

PoS Tagging with HMM - Implementation



This article covers the theory and implementation of POS tagging with a Hidden Markov Model. Part-of-speech (POS) tagging is the practice of labelling the words in a phrase with parts of speech such as nouns, verbs, adjectives, adverbs, and so on.

Hidden Markov Models (HMMs) are a fundamental tool for describing complex sequential processes. They are used in machine translation, speech recognition and synthesis, gene recognition in bioinformatics, computer vision, human gesture detection, and more.

Types of POS taggers

POS-tagging algorithms fall into two distinct groups:

  • Rule-Based POS Taggers
  • Stochastic POS Taggers

Rule-Based Tagging

In natural language processing, statistical techniques have outperformed rule-based methods for automatic part-of-speech tagging.

Typical rule-based techniques use contextual information to assign tags to unknown or ambiguous words. Disambiguation draws on the grammatical properties of the word, the preceding word, the following word, and other features.

Stochastic Part-of-Speech Tagging

The term 'stochastic tagger' can apply to various approaches to the POS tagging problem. Any model that includes frequency or probability in some way qualifies as stochastic.

  • The simplest stochastic taggers disambiguate words based only on the likelihood that a word occurs with a specific tag. In other words, the tag assigned to an ambiguous instance of a word is the one that occurs most frequently with that word in the training set. One flaw is that, although this approach produces a legitimate tag for each individual word, it can also produce inadmissible tag sequences.
  • Calculating the probability of a specific sequence of tags occurring is an alternative to the word frequency approach. The best tag for a given word is determined by the probability that it occurs with the n preceding tags, which is sometimes referred to as the n-gram approach. Because it evaluates the tag for each word in its context, this technique makes more sense than the previous one.
  • The next level of complexity combines the two approaches and uses both tag sequence probabilities and word frequency data. This is known as the Hidden Markov Model (HMM).
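
To make the first bullet concrete, here is a minimal sketch of a most-frequent-tag baseline. The toy corpus, the function names `train_most_frequent_tag` and `tag_sentence`, and the `NOUN` fallback for unseen words are all assumptions made for illustration; they are not part of the article's implementation.

```python
from collections import Counter, defaultdict

# Toy tagged corpus (hypothetical data, for illustration only).
tagged_words = [
    ("the", "DET"), ("can", "NOUN"), ("is", "VERB"), ("empty", "ADJ"),
    ("we", "PRON"), ("can", "VERB"), ("go", "VERB"),
    ("they", "PRON"), ("can", "VERB"), ("wait", "VERB"),
]

def train_most_frequent_tag(pairs):
    """Map each word to the tag it carries most often in the training pairs."""
    counts = defaultdict(Counter)
    for word, tag in pairs:
        counts[word][tag] += 1
    return {word: tag_counts.most_common(1)[0][0]
            for word, tag_counts in counts.items()}

def tag_sentence(words, model, default="NOUN"):
    """Tag each word on its own; unseen words fall back to a default tag."""
    return [(word, model.get(word, default)) for word in words]

model = train_most_frequent_tag(tagged_words)
print(tag_sentence(["the", "can", "is", "empty"], model))
```

In this toy data "can" occurs twice as a verb and once as a noun, so the baseline tags it as a verb even directly after "the". That is the kind of questionable tag sequence the first bullet warns about, because each word is tagged without looking at its neighbours.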

An HMM has two types of probabilities.

  • Emission probabilities indicate the chance of making a specific observation (a word) given a particular state (a tag).
  • Transition probabilities describe the likelihood of moving to another state given the current one.
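
Both probabilities can be estimated from a tagged corpus as relative frequencies. The sketch below assumes a tiny hand-made corpus and hypothetical helper names (`emission_prob`, `transition_prob`); it counts transitions only within a sentence and applies no smoothing.

```python
from collections import Counter

# Hypothetical toy corpus: each sentence is a list of (word, tag) pairs.
sentences = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]

tag_counts = Counter()         # how often each tag occurs
emission_counts = Counter()    # how often a (tag, word) pair occurs
transition_counts = Counter()  # how often tag t1 is directly followed by t2

for sent in sentences:
    for i, (word, tag) in enumerate(sent):
        tag_counts[tag] += 1
        emission_counts[(tag, word)] += 1
        if i + 1 < len(sent):
            transition_counts[(tag, sent[i + 1][1])] += 1

def emission_prob(word, tag):
    """Estimate P(word | tag) = count(tag, word) / count(tag)."""
    if tag_counts[tag] == 0:
        return 0.0
    return emission_counts[(tag, word)] / tag_counts[tag]

def transition_prob(t2, t1):
    """Estimate P(t2 | t1) = count(t1 followed by t2) / count(t1)."""
    if tag_counts[t1] == 0:
        return 0.0
    return transition_counts[(t1, t2)] / tag_counts[t1]

print(emission_prob("dog", "NOUN"))    # "dog" is 2 of the 3 NOUN tokens
print(transition_prob("NOUN", "DET"))  # every DET is followed by a NOUN
```

A real tagger would usually add smoothing so that unseen words and unseen tag pairs do not get a probability of exactly zero.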


Several techniques can be used for POS tagging:

  • Rule-based POS tagging: Rule-based POS tagging models assign POS tags to words based on handwritten rules and contextual information. These rules are commonly called context frame rules. An example: "If an ambiguous/unknown word ends in the suffix 'ing' and is followed by a Verb, identify it as a Verb."
  • Transformation-based tagging: Transformation-based methods combine a set of handcrafted rules with rules that are induced automatically during training.
  • Deep learning models: Various deep learning models, such as Meta-BiLSTM, have been used for POS tagging and have achieved an accuracy of around 97 percent.
  • Stochastic (probabilistic) tagging: A stochastic approach relies on frequency, probability, and statistics. The most basic stochastic strategy tags each word in unannotated text with the tag most frequently assigned to that word in the annotated training data. However, this strategy can produce tag sequences that are not allowed by the grammar of the language. A better approach is to calculate the probabilities of the different tag sequences that are possible for a sentence and to assign the POS tags from the most likely sequence. Hidden Markov Models (HMMs) are probabilistic methods of this kind.

POS tagging with Hidden Markov Model

HMM (Hidden Markov Model) is a stochastic POS tagging algorithm. Hidden Markov models are also used in reinforcement learning and temporal pattern recognition, in applications such as handwriting recognition, musical score following, gesture recognition, partial discharge analysis, and bioinformatics.
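
For reference, the usual way to write the tagging objective of a first-order (bigram) HMM is shown below. The notation is an assumption introduced here, not taken from the article: w_i is the i-th word, t_i is its tag, and t_0 stands for a start-of-sentence marker.

```latex
\hat{t}_{1:n} = \arg\max_{t_{1:n}} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})
```

The first factor is the emission probability and the second is the transition probability. The tagger looks for the tag sequence that maximizes their product over the whole sentence.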


# Importing essential libraries
import nltk as nl
from sklearn.model_selection import train_test_split
import numpy as np
import pandas as pd
import random
import pprint, time

# Downloading the Treebank corpus from NLTK
nl.download('treebank')

# Downloading the universal tagset from NLTK
nl.download('universal_tagset')

# Reading the Treebank tagged sentences
nl_data = list(nl.corpus.treebank.tagged_sents(tagset='universal'))


# Split the data 75:25 into training and test sets
tr_set, ts_set = train_test_split(nl_data, train_size=0.75, test_size=0.25)

# Flatten the sentences into lists of (word, tag) tuples
tr_tg_word = [tup for sent in tr_set for tup in sent]
ts_tg_word = [tup for sent in ts_set for tup in sent]

# Unique tags present in the training data
tags = {tag for word, tag in tr_tg_word}

# Counts for the emission probability P(word | tag)
def word_given_tag(word, tag, tr_bag=tr_tg_word):
    # all training pairs that carry the given tag
    tg_lis = [pair for pair in tr_bag if pair[1] == tag]
    ct_tag = len(tg_lis)  # number of times the tag occurs in tr_bag
    # occurrences of the given word with the given tag
    w_given_tg_lis = [pair[0] for pair in tg_lis if pair[0] == word]
    ct_w_given_tg = len(w_given_tg_lis)
    # the emission probability is ct_w_given_tg / ct_tag
    return (ct_w_given_tg, ct_tag)

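# Counts for the transition probability P(t2 | t1)
def t2_with_t1(t2, t1, tr_bag=tr_tg_word):
    tag_seq = [pair[1] for pair in tr_bag]  # tag sequence of the whole training bag
    ct_t1 = len([t for t in tag_seq if t == t1])  # number of times t1 occurs
    ct_t2_t1 = 0  # number of times t2 directly follows t1
    for index in range(len(tag_seq) - 1):
        if tag_seq[index] == t1 and tag_seq[index + 1] == t2:
            ct_t2_t1 += 1
    # the transition probability is ct_t2_t1 / ct_t1
    return (ct_t2_t1, ct_t1)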

# t x t transition matrix over the training tags
# tgs_mtx[i, j] is P(jth tag follows the ith tag)
tag_list = sorted(tags)  # fixed order so rows and columns line up
tgs_mtx = np.zeros((len(tag_list), len(tag_list)), dtype='float32')
for i, t1 in enumerate(tag_list):
    for j, t2 in enumerate(tag_list):
        ct_t2_t1, ct_t1 = t2_with_t1(t2, t1)
        tgs_mtx[i, j] = ct_t2_t1 / ct_t1

tags_df = pd.DataFrame(tgs_mtx, columns=tag_list, index=tag_list)
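
The nested loop above rescans the whole training list for every pair of tags. As an alternative sketch, the same counts can be gathered in a single pass with `collections.Counter`. The function name `transition_table` and the toy input are assumptions for illustration. Like the code above, it counts transitions over the flattened list of tagged words, so pairs that span a sentence boundary are included.

```python
from collections import Counter

def transition_table(tagged_words):
    """Return table[t1][t2] = count(t1 followed by t2) / count(t1)."""
    tag_seq = [tag for _, tag in tagged_words]
    tag_counts = Counter(tag_seq)                     # occurrences of each tag
    pair_counts = Counter(zip(tag_seq, tag_seq[1:]))  # adjacent tag pairs
    tag_list = sorted(tag_counts)
    return {
        t1: {t2: pair_counts[(t1, t2)] / tag_counts[t1] for t2 in tag_list}
        for t1 in tag_list
    }

# Hypothetical toy input in the same (word, tag) format as tr_tg_word.
toy_words = [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB"),
             ("the", "DET"), ("cat", "NOUN")]
table = transition_table(toy_words)
print(table["NOUN"]["VERB"])
```

The nested dictionary can be passed to `pd.DataFrame(table).T` if a labelled matrix with one row per preceding tag is preferred.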






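1. What is HMM POS tagging?
Part-of-speech (POS) tagging is the practice of labelling the words in a phrase with parts of speech such as nouns, verbs, adjectives, adverbs, and so on. HMM POS tagging performs this task with a Hidden Markov Model, which combines emission probabilities and transition probabilities to choose the tags.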

2. What is POS tagging in the context of HMMs and NLP?
Part-of-speech (POS) tagging is a text processing technique that helps interpret the meaning of a text correctly. It assigns the appropriate POS marker (noun, pronoun, adverb, etc.) to each word in an input text.

3. What are the applications of POS tagging?
Named Entity Recognition (NER), sentiment analysis, question answering, and word sense disambiguation all use POS tagging.

4. What is Viterbi in NLP?
The Viterbi algorithm is a dynamic programming approach for determining the most likely sequence of hidden states (called the Viterbi path) that results in a sequence of observed events, especially in Markov information sources and hidden Markov models (HMMs).
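
A minimal sketch of Viterbi decoding for a bigram HMM tagger follows. The function signature, the dictionary layout of the probability tables, and all numbers in the toy model are assumptions chosen for illustration. In practice the tables would be estimated from a tagged corpus, and log probabilities are commonly used to avoid underflow on long sentences.

```python
def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most likely tag sequence for `words` under a bigram HMM.

    start_p[t]      : P(t at the start of a sentence)
    trans_p[t1][t2] : P(t2 | t1)
    emit_p[t][w]    : P(w | t); words missing from the table get probability 0
    """
    if not words:
        return []
    # best[i][t] is the probability of the best path that ends in tag t at word i
    best = [{t: start_p[t] * emit_p[t].get(words[0], 0.0) for t in tags}]
    back = [{}]  # back[i][t] is the previous tag on that best path
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] * trans_p[p][t]) for p in tags),
                key=lambda item: item[1],
            )
            best[i][t] = score * emit_p[t].get(words[i], 0.0)
            back[i][t] = prev
    # Trace back from the most probable final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))

# Hypothetical toy model (all numbers invented for illustration).
tags = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.8, "NOUN": 0.1, "VERB": 0.1}
trans_p = {
    "DET": {"DET": 0.0, "NOUN": 0.9, "VERB": 0.1},
    "NOUN": {"DET": 0.1, "NOUN": 0.2, "VERB": 0.7},
    "VERB": {"DET": 0.5, "NOUN": 0.3, "VERB": 0.2},
}
emit_p = {
    "DET": {"the": 1.0},
    "NOUN": {"can": 0.4, "rusts": 0.1, "dog": 0.5},
    "VERB": {"can": 0.6, "rusts": 0.4},
}
print(viterbi(["the", "can", "rusts"], tags, start_p, trans_p, emit_p))
# → ['DET', 'NOUN', 'VERB']
```

In this toy model "can" is more likely to be emitted by VERB than by NOUN, yet the decoder tags it as NOUN because a noun is far more likely than a verb after a determiner. This is the benefit of combining emission and transition probabilities.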

Key Takeaways

In this article, we discussed PoS tagging with HMM, from the types of POS taggers to computing emission and transition probabilities on the Treebank corpus.

We hope that this blog has helped you enhance your knowledge of PoS tagging with Hidden Markov Models. If you would like to learn more, check out our articles on machine learning. Do upvote our blog to help other ninjas grow. Happy Coding!
