Code360 powered by Coding Ninjas X Naukri.com
Table of contents
1. Introduction
2. What is PoS Tagging?
3. Different PoS Tagging techniques
4. Implementation of PoS Tagging
5. Frequently Asked Questions
6. Conclusion
Last Updated: Mar 27, 2024
Easy

Part of Speech (PoS) Tagging

Author Prakriti

Introduction

Language is an essential tool for communication. Natural Language Processing (NLP) aims to enable computers to understand and interpret human language. NLP powers many everyday tasks, such as machine translation (for example, Google Translate) and handwriting recognition. However, computers cannot directly understand raw text, so we need to process it into a form that machines can work with. In school, we learn about the parts of speech in the grammar of a language. Parts of speech are essential to understanding any language, so in NLP, machines also need to identify them.

What is PoS Tagging?

Part of Speech (PoS) Tagging is the process of classifying words and labeling each one with its part of speech. PoS tags describe a word's grammatical role, usage, and function in a sentence. They help capture the relationships between words and the structure of a sentence, both of which are essential for understanding its meaning.

The set of labels (tags) used to tag words is known as a tagset.

PoS tagging is useful in tasks such as information retrieval, question answering, and word sense disambiguation.

Example:

I will book a flight to India.

I am reading a book.

The word "book" appears in both sentences. However, the first sentence uses it as a verb, and the second uses it as a noun.
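To make the ambiguity concrete, the sketch below stores each example sentence as a list of (word, tag) pairs. This is the same shape that NLTK's pos_tag returns later in this article. The Penn Treebank tags were assigned by hand for illustration and are not tagger output. The name tags_of is a hypothetical helper.

```python
# The two example sentences, tagged by hand with Penn Treebank tags (illustration only).
tagged_sentences = [
    [("I", "PRP"), ("will", "MD"), ("book", "VB"), ("a", "DT"),
     ("flight", "NN"), ("to", "TO"), ("India", "NNP"), (".", ".")],
    [("I", "PRP"), ("am", "VBP"), ("reading", "VBG"), ("a", "DT"),
     ("book", "NN"), (".", ".")],
]


def tags_of(word, sentences):
    """Return the set of tags that `word` receives across the tagged sentences."""
    return {tag for sentence in sentences for w, tag in sentence if w == word}


print(sorted(tags_of("book", tagged_sentences)))  # ['NN', 'VB']
```

The same surface word maps to two different tags. That is why a tagger has to look at context instead of simply looking each word up in a dictionary.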

Traditional English grammar recognizes the following eight parts of speech:


  • Noun (N): Mary, home, table, chair, book
  • Pronoun (PRO): he, she, his, her, this
  • Verb (V): run, speak, read, go
  • Adverb (ADV): happily, greatly, thankfully
  • Adjective (ADJ): happy, green, purple, four, big
  • Preposition (P): in, on, from, at
  • Conjunction (CON): but, or, and
  • Interjection (INT): Hi!, Wow!, Great!
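Taggers such as NLTK's use a finer-grained tagset: the Penn Treebank tags listed later in this article. The sketch below collapses those fine tags into the eight coarse classes above. Both the mapping and the name to_coarse are simplified assumptions of our own. IN covers both prepositions and subordinating conjunctions, TO is treated as a preposition even when it marks an infinitive, and cardinal numbers (CD) are mapped to ADJ to match "four" in the list.

```python
# Hypothetical, simplified mapping from Penn Treebank tag prefixes to the eight
# coarse classes listed above. No tag matches two prefixes, so order does not matter.
PREFIX_TO_COARSE = [
    ("NN", "N"),                    # NN, NNS, NNP, NNPS
    ("PRP", "PRO"), ("WP", "PRO"),  # PRP, PRP$, WP, WP$
    ("VB", "V"), ("MD", "V"),       # VB, VBD, VBG, VBN, VBP, VBZ and modals
    ("RB", "ADV"), ("WRB", "ADV"),  # RB, RBR, RBS and WH-adverbs
    ("JJ", "ADJ"), ("CD", "ADJ"),   # adjectives; cardinal numbers treated as adjectives
    ("IN", "P"), ("TO", "P"),       # IN also covers subordinating conjunctions
    ("CC", "CON"),                  # coordinating conjunctions
    ("UH", "INT"),                  # interjections
]


def to_coarse(penn_tag):
    """Collapse a fine-grained Penn Treebank tag into one of the eight coarse classes."""
    for prefix, coarse in PREFIX_TO_COARSE:
        if penn_tag.startswith(prefix):
            return coarse
    return "OTHER"  # determiners, particles, punctuation, symbols, etc.


for penn in ["VBZ", "NNP", "JJS", "PRP$", "DT"]:
    print(penn, "->", to_coarse(penn))
```

Tags with no counterpart in the eight-class scheme, such as DT (determiner) or RP (particle), fall through to "OTHER".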

Different PoS Tagging techniques

There are several techniques for PoS tagging, for example:

  • Lexical tagging
    Here, we assign each word the tag it occurs with most frequently in the training corpus.
  • Rule-based tagging
    Here, we assign tags based on hand-written rules. For example, we can use a rule that words ending with “ing” or “ed” are verbs.
    Rule-based tagging can be combined with lexical tagging to tag words that appear in the test data but never occurred in the training corpus.
  • Probabilistic tagging
    Here, we assign tags based on probabilities learned from a tagged corpus, taking into account both the word itself and the tags around it. Hidden Markov Models (HMMs) and Conditional Random Fields (CRFs) are two common models for this approach.
  • Deep learning-based tagging
    Recurrent Neural Networks can also be used to assign PoS tags.
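Lexical tagging and the suffix rules described above can be combined in a few lines. This is a minimal sketch, not a production tagger. It assumes a tiny hand-tagged training corpus invented for illustration, and the names train_lexical, rule_tag, and tag are hypothetical. Real taggers are trained on corpora with many thousands of sentences.

```python
from collections import Counter, defaultdict

# Toy training corpus of (word, tag) pairs, tagged by hand for illustration only.
TRAIN = [
    [("I", "PRP"), ("am", "VBP"), ("reading", "VBG"), ("a", "DT"), ("book", "NN"), (".", ".")],
    [("I", "PRP"), ("will", "MD"), ("book", "VB"), ("a", "DT"), ("flight", "NN"), (".", ".")],
    [("The", "DT"), ("book", "NN"), ("is", "VBZ"), ("new", "JJ"), (".", ".")],
]


def train_lexical(sentences):
    """Lexical tagging: map each (lowercased) word to its most frequent training tag."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def rule_tag(word):
    """Rule-based fallback for words never seen in training (simple suffix heuristics)."""
    if word.endswith("ing"):
        return "VBG"  # present participle / gerund
    if word.endswith("ed"):
        return "VBD"  # past tense (could also be VBN; a rule this simple cannot tell)
    return "NN"       # default to the most common open-class tag


def tag(tokens, lexicon):
    """Use the lexicon when the word is known, otherwise fall back to the rules."""
    return [(w, lexicon.get(w.lower(), rule_tag(w))) for w in tokens]


lexicon = train_lexical(TRAIN)
print(tag(["I", "booked", "a", "book", "."], lexicon))
# [('I', 'PRP'), ('booked', 'VBD'), ('a', 'DT'), ('book', 'NN'), ('.', '.')]
```

"book" is NN twice and VB once in this toy corpus, so the lexical tagger always says NN, even in "I will book a flight". That limitation is what motivates the context-aware probabilistic and deep learning approaches.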

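The probabilistic approach can be illustrated with a Hidden Markov Model decoded by the Viterbi algorithm. Viterbi picks the tag sequence that maximizes the product of start, transition (P(tag | previous tag)), and emission (P(word | tag)) probabilities. In this sketch the three-tag model's probabilities are made-up numbers, not estimates from any corpus. The names START, TRANS, EMIT, and viterbi are hypothetical. Log-probabilities are used so that long sentences do not underflow.

```python
import math

# Toy HMM with invented probabilities, for illustration only.
TAGS = ["PRP", "VB", "NN"]
START = {"PRP": 0.8, "VB": 0.1, "NN": 0.1}  # P(first tag)
TRANS = {                                    # P(next tag | current tag)
    "PRP": {"PRP": 0.05, "VB": 0.8, "NN": 0.15},
    "VB": {"PRP": 0.3, "VB": 0.1, "NN": 0.6},
    "NN": {"PRP": 0.2, "VB": 0.4, "NN": 0.4},
}
EMIT = {                                     # P(word | tag)
    "PRP": {"i": 0.6, "you": 0.4},
    "VB": {"book": 0.3, "read": 0.7},
    "NN": {"book": 0.6, "flights": 0.4},
}
FLOOR = 1e-12  # tiny probability for unseen (tag, word) pairs, avoids log(0)


def emit(tag, word):
    return math.log(EMIT[tag].get(word, FLOOR))


def viterbi(words):
    """Return the most probable tag sequence for `words` under the toy HMM."""
    words = [w.lower() for w in words]
    # best[i][t] = (log-prob of the best path ending in tag t at position i, previous tag)
    best = [{t: (math.log(START[t]) + emit(t, words[0]), None) for t in TAGS}]
    for word in words[1:]:
        column = {}
        for t in TAGS:
            prev, score = max(
                ((p, best[-1][p][0] + math.log(TRANS[p][t])) for p in TAGS),
                key=lambda pair: pair[1],
            )
            column[t] = (score + emit(t, word), prev)
        best.append(column)
    # Follow the backpointers from the best final tag.
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return list(reversed(path))


print(viterbi(["I", "book", "flights"]))  # ['PRP', 'VB', 'NN']
print(viterbi(["you", "read", "book"]))   # ['PRP', 'VB', 'NN']
```

"book" is tagged VB in the first sentence and NN in the second because the surrounding tags change which path scores highest. A simple lexical lookup cannot make that distinction.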
Implementation of PoS Tagging

Various libraries support PoS tagging, such as NLTK, spaCy, and TextBlob.

Let us see how to do PoS tagging using NLTK.

import nltk
from pprint import pprint
from nltk.tokenize import word_tokenize

# Download the tokenizer, tagger, and tag-help resources quietly (no progress logs).
# Older NLTK releases use the first name in each pair; newer releases (3.9+) use the second.
for resource in ["punkt", "punkt_tab",
                 "averaged_perceptron_tagger", "averaged_perceptron_tagger_eng",
                 "tagsets", "tagsets_json"]:
    nltk.download(resource, quiet=True)

tokens = word_tokenize("Coding Ninjas is one of the best learning platform.")  # split into words
tags = nltk.pos_tag(tokens)  # assign a Penn Treebank tag to each token
pprint(tags)

Output

[('Coding', 'VBG'),
 ('Ninjas', 'NNP'),
 ('is', 'VBZ'),
 ('one', 'CD'),
 ('of', 'IN'),
 ('the', 'DT'),
 ('best', 'JJS'),
 ('learning', 'NN'),
 ('platform', 'NN'),
 ('.', '.')]
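pos_tag returns a plain list of (word, tag) tuples, so ordinary Python is enough to post-process it. The sketch below copies the tagged output shown above into a literal so it runs without NLTK. It then extracts the nouns and finds the most frequent tag. The name words_with_tag_prefix is a hypothetical helper.

```python
from collections import Counter

# The tagged output shown above, copied as a literal so this snippet runs without NLTK.
tagged = [("Coding", "VBG"), ("Ninjas", "NNP"), ("is", "VBZ"), ("one", "CD"),
          ("of", "IN"), ("the", "DT"), ("best", "JJS"), ("learning", "NN"),
          ("platform", "NN"), (".", ".")]


def words_with_tag_prefix(pairs, prefix):
    """Return the words whose Penn Treebank tag starts with `prefix` ("NN" matches all nouns)."""
    return [word for word, tag in pairs if tag.startswith(prefix)]


print(words_with_tag_prefix(tagged, "NN"))              # ['Ninjas', 'learning', 'platform']
print(Counter(tag for _, tag in tagged).most_common(1))  # [('NN', 2)]
```

The tagger labeled "Coding" as VBG even though it is part of the name "Coding Ninjas". Pretrained taggers can make mistakes on unusual proper nouns.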

 

To learn more about a specific tag, we can use NLTK's help module. For example, to look up the JJS tag:

nltk.help.upenn_tagset("JJS")

 

Output

JJS: adjective, superlative
    calmest cheapest choicest classiest cleanest clearest closest commonest
    corniest costliest crassest creepiest crudest cutest darkest deadliest
    dearest deepest densest dinkiest …

 

To see the complete list of Penn Treebank tags, call the same function without an argument:

nltk.help.upenn_tagset()

 

Output

$: dollar
    $ -$ --$ A$ C$ HK$ M$ NZ$ S$ U.S.$ US$
'': closing quotation mark
    ' ''
(: opening parenthesis
    ( [ {
): closing parenthesis
    ) ] }
,: comma
    ,
--: dash
    --
.: sentence terminator
    . ! ?
:: colon or ellipsis
    : ; ...
CC: conjunction, coordinating
    & 'n and both but either et for less minus neither nor or plus so
    therefore times v. versus vs. whether yet
CD: numeral, cardinal
    mid-1890 nine-thirty forty-two one-tenth ten million 0.5 one forty-
    seven 1987 twenty '79 zero two 78-degrees eighty-four IX '60s .025
    fifteen 271,124 dozen quintillion DM2,000 ...
DT: determiner
    all an another any both del each either every half la many much nary
    neither no some such that the them these this those
EX: existential there
    there
FW: foreign word
    gemeinschaft hund ich jeux habeas Haementeria Herr K'ang-si vous
    lutihaw alai je jour objets salutaris fille quibusdam pas trop Monte
    terram fiche oui corporis ...
IN: preposition or conjunction, subordinating
    astride among uppon whether out inside pro despite on by throughout
    below within for towards near behind atop around if like until below
    next into if beside ...
JJ: adjective or numeral, ordinal
    third ill-mannered pre-war regrettable oiled calamitous first separable
    ectoplasmic battery-powered participatory fourth still-to-be-named
    multilingual multi-disciplinary ...
JJR: adjective, comparative
    bleaker braver breezier briefer brighter brisker broader bumper busier
    calmer cheaper choosier cleaner clearer closer colder commoner costlier
    cozier creamier crunchier cuter ...
JJS: adjective, superlative
    calmest cheapest choicest classiest cleanest clearest closest commonest
    corniest costliest crassest creepiest crudest cutest darkest deadliest
    dearest deepest densest dinkiest ...
LS: list item marker
    A A. B B. C C. D E F First G H I J K One SP-44001 SP-44002 SP-44005
    SP-44007 Second Third Three Two * a b c d first five four one six three
    two
MD: modal auxiliary
    can cannot could couldn't dare may might must need ought shall should
    shouldn't will would
NN: noun, common, singular or mass
    common-carrier cabbage knuckle-duster Casino afghan shed thermostat
    investment slide humour falloff slick wind hyena override subhumanity
    machinist ...
NNP: noun, proper, singular
    Motown Venneboerger Czestochwa Ranzer Conchita Trumplane Christos
    Oceanside Escobar Kreisler Sawyer Cougar Yvette Ervin ODI Darryl CTCA
    Shannon A.K.C. Meltex Liverpool ...
NNPS: noun, proper, plural
    Americans Americas Amharas Amityvilles Amusements Anarcho-Syndicalists
    Andalusians Andes Andruses Angels Animals Anthony Antilles Antiques
    Apache Apaches Apocrypha ...
NNS: noun, common, plural
    undergraduates scotches bric-a-brac products bodyguards facets coasts
    divestitures storehouses designs clubs fragrances averages
    subjectivists apprehensions muses factory-jobs ...
PDT: pre-determiner
    all both half many quite such sure this
POS: genitive marker
    ' 's
PRP: pronoun, personal
    hers herself him himself hisself it itself me myself one oneself ours
    ourselves ownself self she thee theirs them themselves they thou thy us
PRP$: pronoun, possessive
    her his mine my our ours their thy your
RB: adverb
    occasionally unabatingly maddeningly adventurously professedly
    stirringly prominently technologically magisterially predominately
    swiftly fiscally pitilessly ...
RBR: adverb, comparative
    further gloomier grander graver greater grimmer harder harsher
    healthier heavier higher however larger later leaner lengthier less-
    perfectly lesser lonelier longer louder lower more ...
RBS: adverb, superlative
    best biggest bluntest earliest farthest first furthest hardest
    heartiest highest largest least less most nearest second tightest worst
RP: particle
    aboard about across along apart around aside at away back before behind
    by crop down ever fast for forth from go high i.e. in into just later
    low more off on open out over per pie raising start teeth that through
    under unto up up-pp upon whole with you
SYM: symbol
    % & ' '' ''. ) ). * + ,. < = > @ A[fj] U.S U.S.S.R * ** ***
TO: "to" as preposition or infinitive marker
    to
UH: interjection
    Goodbye Goody Gosh Wow Jeepers Jee-sus Hubba Hey Kee-reist Oops amen
    huh howdy uh dammit whammo shucks heck anyways whodunnit honey golly
    man baby diddle hush sonuvabitch ...
VB: verb, base form
    ask assemble assess assign assume atone attention avoid bake balkanize
    bank begin behold believe bend benefit bevel beware bless boil bomb
    boost brace break bring broil brush build ...
VBD: verb, past tense
    dipped pleaded swiped regummed soaked tidied convened halted registered
    cushioned exacted snubbed strode aimed adopted belied figgered
    speculated wore appreciated contemplated ...
VBG: verb, present participle or gerund
    telegraphing stirring focusing angering judging stalling lactating
    hankerin' alleging veering capping approaching traveling besieging
    encrypting interrupting erasing wincing ...
VBN: verb, past participle
    multihulled dilapidated aerosolized chaired languished panelized used
    experimented flourished imitated reunifed factored condensed sheared
    unsettled primed dubbed desired ...
VBP: verb, present tense, not 3rd person singular
    predominate wrap resort sue twist spill cure lengthen brush terminate
    appear tend stray glisten obtain comprise detest tease attract
    emphasize mold postpone sever return wag ...
VBZ: verb, present tense, 3rd person singular
    bases reconstructs marks mixes displeases seals carps weaves snatches
    slumps stretches authorizes smolders pictures emerges stockpiles
    seduces fizzes uses bolsters slaps speaks pleads ...
WDT: WH-determiner
    that what whatever which whichever
WP: WH-pronoun
    that what whatever whatsoever which who whom whosoever
WP$: WH-pronoun, possessive
    whose
WRB: Wh-adverb
    how however whence whenever where whereby whereever wherein whereof why
``: opening quotation mark
    ` ``

Frequently Asked Questions

1. What is PoS tagging?
PoS tagging assigns a part-of-speech label to each word based on its definition and its context in the sentence.

2. What is PoS tagging used for?
PoS tagging is used in text analysis tools, corpus search, information retrieval, question answering, word sense disambiguation, and other NLP tasks.

3. What is JJ in PoS tagging?
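In the Penn Treebank tagset used by NLTK, JJ is the tag for an adjective (or ordinal numeral). JJR and JJS mark comparative and superlative adjectives.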

4. What does NLTK pos_tag do?
nltk.pos_tag takes a list of tokens and returns a list of (word, tag) tuples. It assigns each word a Penn Treebank tag based on the word itself and its context.

5. What is NN in PoS tagging?
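In the Penn Treebank tagset, NN is the tag for a singular or mass common noun. NNS, NNP, and NNPS mark plural common nouns, singular proper nouns, and plural proper nouns.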

Conclusion

This article discussed Part of Speech (PoS) tagging, common tagging techniques, and how to tag text in Python with NLTK.

We hope this blog has helped you enhance your knowledge of PoS tagging. If you would like to learn more, check out our free content on NLP and more unique courses. Do upvote our blog to help other ninjas grow.

Happy Coding!
