Introduction
This article studies the theory and implementation of POS tagging. Labeling words with parts of speech such as nouns, verbs, adjectives, and adverbs is known as part-of-speech (POS) tagging.
Hidden Markov Models (HMMs) are a fundamental tool for describing intricate real-world processes such as machine translation, speech recognition, speech synthesis, gene recognition in bioinformatics, computer vision, and human gesture detection.
Types of POS taggers
POS-tagging algorithms fall into two distinctive groups:
- Rule-Based POS Taggers
- Stochastic POS Taggers
Rule-Based Tagging
In natural language processing, statistical techniques have outperformed rule-based methods in automatic part-of-speech tagging.
Typical rule-based techniques assign tags to unknown or ambiguous words based on contextual information. The grammatical properties of the word, its preceding word, its following word, and other features are used to disambiguate it.
Stochastic Part-of-Speech Tagging
The term 'stochastic tagger' can refer to a variety of approaches to the POS tagging problem. Any model that incorporates frequency or probability in some way qualifies as stochastic.
- The simplest stochastic taggers disambiguate words based solely on the likelihood that a word occurs with a specific tag. In other words, the tag assigned to an ambiguous instance of a word is the one that appears most frequently with that word in the training set (a minimal sketch of this baseline follows this list). One flaw is that while this produces a legitimate tag for each individual word, it may also yield inadmissible tag sequences.
- Calculating the probability of a specific sequence of tags occurring is an alternative to the word-frequency approach. Here the optimal tag for a given word is determined from how often it appears with the n preceding tags, which is sometimes referred to as the n-gram approach. Because it evaluates the tag for each word in context, this technique makes more sense than the previous one.
- A further complexity can be added to a stochastic tagger by combining tag-sequence probabilities with word-frequency data. This is called the Hidden Markov Model (HMM).
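To make the simplest case concrete, here is a minimal sketch of the most-frequent-tag baseline; the words, tags, and counts form a made-up toy corpus and are purely illustrative:

from collections import Counter, defaultdict

# toy tagged corpus: (word, tag) pairs, invented for illustration
tagged = [('the', 'DET'), ('dog', 'NOUN'), ('runs', 'VERB'),
          ('the', 'DET'), ('run', 'NOUN'), ('run', 'VERB'), ('run', 'NOUN')]

# count how often each word carries each tag
counts = defaultdict(Counter)
for word, tag in tagged:
    counts[word][tag] += 1

# assign every word its single most frequent tag
most_frequent = {w: c.most_common(1)[0][0] for w, c in counts.items()}
print(most_frequent['run'])  # 'NOUN' (2 NOUN vs. 1 VERB in the toy data)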
In HMM models, we have two types of probabilities.
- The first is emission probabilities, which indicate the chance of making a specific observation given a particular state.
- On the other hand, transition probabilities describe the likelihood of shifting to another state given a current one.
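To make these two probability types concrete, here is a minimal sketch of a two-tag HMM; all the numbers are invented for illustration:

# transition probabilities: P(next tag | current tag)
transition = {
    'NOUN': {'NOUN': 0.3, 'VERB': 0.7},
    'VERB': {'NOUN': 0.6, 'VERB': 0.4},
}
# emission probabilities: P(word | tag)
emission = {
    'NOUN': {'dog': 0.8, 'runs': 0.2},
    'VERB': {'dog': 0.1, 'runs': 0.9},
}
# probability of the tag sequence NOUN -> VERB emitting "dog runs",
# assuming the sequence starts in NOUN:
p = emission['NOUN']['dog'] * transition['NOUN']['VERB'] * emission['VERB']['runs']
print(p)  # 0.8 * 0.7 * 0.9 = 0.504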
Techniques
There are various techniques that can be used for POS tagging such as
- Rule-based POS tagging: Rule-based POS tagging models assign POS tags to words based on handwritten rules and contextual information. These rules are commonly called context frame rules. An example: "If an ambiguous/unknown word ends in the suffix -ing and is followed by a verb, label it as a verb." (A minimal regex-tagger sketch follows this list.)
- Transformation-based tagging: Transformation-based methods employ a combination of handcrafted rules and rules automatically induced during training.
- Deep learning models: Various deep learning models, such as Meta-BiLSTM, have been used for POS tagging and have achieved accuracies of around 97 percent.
- Stochastic (probabilistic) tagging: A stochastic approach incorporates frequency, probability, and statistics. The most basic stochastic strategy tags each word in unannotated text with the tag most frequently assigned to that word in the annotated training data. However, this strategy can produce tag sequences for sentences that a language's grammar does not allow. A better approach calculates the probabilities of the different tag sequences possible for a sentence and assigns the POS tags from the sequence with the highest likelihood. Hidden Markov Models (HMMs) are probabilistic methods for assigning POS tags in this way.
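As a quick illustration of rule-based tagging, NLTK's RegexpTagger applies exactly this kind of suffix rule; the pattern list below is a small illustrative sample, not a complete rule set:

import nltk

# each pattern maps a word-shape regex to a tag; order matters,
# and the final catch-all acts as the default
patterns = [
    (r'.*ing$', 'VERB'),   # gerunds, e.g. "running"
    (r'.*ed$', 'VERB'),    # simple past, e.g. "walked"
    (r'.*ly$', 'ADV'),     # adverbs, e.g. "quickly"
    (r'[0-9]+$', 'NUM'),   # numbers
    (r'.*', 'NOUN'),       # default: everything else becomes a noun
]
regex_tagger = nltk.RegexpTagger(patterns)
print(regex_tagger.tag(['He', 'was', 'running', 'quickly']))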
POS tagging with Hidden Markov Model
HMM (Hidden Markov Model) is a stochastic POS tagging algorithm. Hidden Markov models are used in reinforcement learning and temporal pattern recognition, with applications in handwriting recognition, musical score following, gesture recognition, partial-discharge analysis, and bioinformatics.
CODE
# Importing essential libraries
import nltk as nl
from sklearn.model_selection import train_test_split
import numpy as np
import pandas as pd
import random
import pprint, time
# downloading the treebank corpus from nltk
nl.download('treebank')
# downloading the universal tagset from nltk
nl.download('universal_tagset')
# reading the Treebank tagged sentences
nl_data = list(nl.corpus.treebank.tagged_sents(tagset='universal'))
# split data into 75:25
tr_set, ts_set = train_test_split(nl_data, train_size=0.75, test_size=0.25)
# create list of test and train tagged words
tr_tg_word = [ tup for sent in tr_set for tup in sent ]
ts_tg_word = [ tup for sent in ts_set for tup in sent ]
#Unique tags present in training data
tags = {tag for word,tag in tr_tg_word}
# defining the emission probability
def word_given_tag(word, tag, tr_bag=tr_tg_word):
    tg_lis = [pair for pair in tr_bag if pair[1] == tag]
    ct_tag = len(tg_lis)  # how often the passed tag occurred in the training bag
    w_given_tg_lis = [pair[0] for pair in tg_lis if pair[0] == word]
    # now count how often the passed word occurred with the passed tag
    ct_w_given_tg = len(w_given_tg_lis)
    return (ct_w_given_tg, ct_tag)
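# Illustrative usage (the word/tag pair below is just an example): the
# function returns raw counts, so the maximum-likelihood emission
# probability P(word | tag) is their ratio.
ct_w, ct_t = word_given_tag('will', 'VERB')
print(ct_w / ct_t)  # relative frequency of 'will' among VERB tokens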
# computation of the transition probability
def t2_with_t1(t2, t1, tr_bag=tr_tg_word):
    tags = [pair[1] for pair in tr_bag]
    ct_t1 = len([t for t in tags if t == t1])
    ct_t2_t1 = 0
    for index in range(len(tags) - 1):
        if tags[index] == t1 and tags[index + 1] == t2:
            ct_t2_t1 += 1
    return (ct_t2_t1, ct_t1)
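# Illustrative usage (the tag pair below is just an example): the raw counts
# give the maximum-likelihood transition probability P(t2 follows t1).
ct_vn, ct_n = t2_with_t1('VERB', 'NOUN')
print(ct_vn / ct_n)  # how often a VERB directly follows a NOUN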
# t x t transition matrix of tags
# Matrix(i, j) holds P(jth tag follows the ith tag)
tgs_mtx = np.zeros((len(tags), len(tags)), dtype='float32')
for i, t1 in enumerate(list(tags)):
    for j, t2 in enumerate(list(tags)):
        ct_t2_t1, ct_t1 = t2_with_t1(t2, t1)
        tgs_mtx[i, j] = ct_t2_t1 / ct_t1
print(tgs_mtx)
tags_df = pd.DataFrame(tgs_mtx, columns=list(tags), index=list(tags))
display(tags_df)  # display() is available in Jupyter/IPython environments
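With the emission and transition estimates in hand, the tagger can pick, for each word, the tag that maximizes the transition probability times the emission probability. The following is a minimal greedy decoding sketch (a simplification of a full Viterbi search) built on the functions and tags_df defined above; hmm_tag is a name introduced here, and treating '.' as the start state is an assumption that mirrors sentence boundaries in the universal tagset.

# greedy decoding sketch, not an optimized or full Viterbi implementation
def hmm_tag(words, train_bag=tr_tg_word):
    tag_list = list(tags)
    state = []
    for key, word in enumerate(words):
        p = []
        for tag in tag_list:
            if key == 0:
                trans_p = tags_df.loc['.', tag]  # assume a sentence starts after a boundary '.'
            else:
                trans_p = tags_df.loc[state[-1], tag]
            ct_w, ct_t = word_given_tag(word, tag, train_bag)
            emis_p = ct_w / ct_t if ct_t > 0 else 0
            p.append(trans_p * emis_p)
        # unseen words get zero emission for every tag, so argmax falls back
        # to the first tag; a fuller implementation would apply smoothing
        state.append(tag_list[int(np.argmax(p))])
    return list(zip(words, state))

# example run on a short sentence of common treebank words
print(hmm_tag(['The', 'company', 'will', 'pay']))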