Implementation of NLTK
To install NLTK, you need Python 3.7, 3.8, 3.9, or 3.10.
Installing NLTK on Windows
- Install Python 3.10 from https://www.python.org/downloads/ if you do not already have Python installed.
- Install NLTK from https://pypi.python.org/pypi/nltk
- Run “import nltk” in a Python shell to check whether NLTK is installed properly, as in the sketch below.
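For example, from a Windows Command Prompt, the install and the check might look like the following minimal sketch (assuming Python and pip are already on your PATH).
pip install nltk           # install NLTK from PyPI
python -c "import nltk"    # finishes silently if NLTK was installed correctly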
Installing NLTK on Mac/Unix
- Install Python 3.10 from https://www.python.org/downloads/ if you do not already have Python installed.
- Run the command “pip install --user -U nltk”.
- Run “import nltk” in a Python shell to check whether NLTK is installed properly.
Using NLTK in Google Colab
We can install NLTK using the pip command.
pip install nltk #installing nltk
Now, run the following command to check if NLTK is installed properly.
import nltk #importing nltk
If everything goes fine, NLTK is installed properly and ready to use.
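NLTK often comes preinstalled in Colab, so the pip step may simply report that the requirement is already satisfied. As a quick sanity check, the following sketch prints the installed version and the directories NLTK will search for downloaded data (nltk.__version__ and nltk.data.path are standard NLTK attributes).
print(nltk.__version__)  # version of the installed NLTK package
print(nltk.data.path)    # directories NLTK searches for downloaded corpora and models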
NLTK ships with many datasets and pre-trained models that are easy to use. The detailed list of available data packages can be found at https://www.nltk.org/nltk_data/.
Let’s use the famous Brown corpus present in NLTK.
nltk.download('brown') #first we need to download the data
from nltk.corpus import brown #then we can import the data
print(brown.words())
Output
[nltk_data] Downloading package brown to /root/nltk_data...
[nltk_data] Unzipping corpora/brown.zip.
['The', 'Fulton', 'County', 'Grand', 'Jury', 'said', ...]
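Besides words(), the Brown corpus reader exposes a few other handy accessors; the short sketch below uses the standard categories(), sents(), and the categories keyword of words().
print(brown.categories())                    # genres such as 'news', 'fiction', 'romance', ...
print(brown.sents()[0])                      # the first sentence as a list of tokens
print(brown.words(categories='news')[:10])   # words restricted to the 'news' genre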
Instead of downloading the datasets separately, we can download everything in a single go using the following command.
nltk.download('all')
Output
[nltk_data] Downloading collection 'all'
[nltk_data] |
[nltk_data] | Downloading package abc to /root/nltk_data...
[nltk_data] | Unzipping corpora/abc.zip.
[nltk_data] | Downloading package alpino to /root/nltk_data...
[nltk_data] | Unzipping corpora/alpino.zip.
…
[nltk_data] | Downloading package words to /root/nltk_data...
[nltk_data] | Unzipping corpora/words.zip.
[nltk_data] | Downloading package ycoe to /root/nltk_data...
[nltk_data] | Unzipping corpora/ycoe.zip.
[nltk_data] |
[nltk_data] Done downloading collection all
True
Similarly, if we want to download only the corpora, we can use nltk.download("all-corpora").
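If you only need a handful of resources, it is usually quicker to download them individually. A minimal sketch of the packages used later in this article is shown below (punkt, averaged_perceptron_tagger, and stopwords are standard NLTK data package names).
nltk.download('punkt')                        # tokenizer models used by word_tokenize()
nltk.download('averaged_perceptron_tagger')   # model used by pos_tag()
nltk.download('stopwords')                    # common stopword lists for many languages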
Now, let’s try out some functions of NLTK.
import nltk
sentence="Coding Ninjas is one of the best learning platforms."
tokens = nltk.word_tokenize(sentence)
print(tokens)
Output
['Coding', 'Ninjas', 'is', 'one', 'of', 'the', 'best', 'learning', 'platforms', '.']
Our sentence is split into tokens in a single step using NLTK's word_tokenize().
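word_tokenize() has a sentence-level counterpart, sent_tokenize(), and the tokens combine nicely with NLTK's stopword lists. The following is a brief sketch assuming the punkt and stopwords packages have already been downloaded.
from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize

text = "Coding Ninjas is one of the best learning platforms. It offers many courses."
print(sent_tokenize(text))  # splits the text into its two sentences

stop_words = set(stopwords.words('english'))
filtered = [t for t in tokens if t.lower() not in stop_words]  # tokens from the example above
print(filtered)  # stopwords such as 'is', 'of', and 'the' are removed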
Now, if we want to do part-of-speech (POS) tagging of the tokens, we can do the following.
tagged_tokens = nltk.pos_tag(tokens)
print(tagged_tokens)
Output
[('Coding', 'VBG'), ('Ninjas', 'NNP'), ('is', 'VBZ'), ('one', 'CD'), ('of', 'IN'), ('the', 'DT'), ('best', 'JJS'), ('learning', 'NN'), ('platforms', 'NNS'), ('.', '.')]
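The tagged tokens can be passed straight to NLTK's named-entity chunker, nltk.ne_chunk(); the sketch below assumes the maxent_ne_chunker and words data packages have been downloaded (both are part of the 'all' collection).
entities = nltk.ne_chunk(tagged_tokens)  # group the tagged tokens into named-entity chunks
print(entities)                          # a Tree in which recognised entities appear as labelled subtrees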
Frequently Asked Questions
1. What is NLTK used for?
The Natural Language Toolkit (NLTK) is used for NLP tasks such as tokenizing text, removing stopwords, POS tagging, and more.
2. What is the difference between NLP and NLTK?
Natural Language Processing (NLP) is the field concerned with understanding and interpreting human language to perform tasks such as language translation and automatic question answering. The Natural Language Toolkit (NLTK) is a Python package that provides libraries and tools for performing NLP tasks.
3. Why is NLTK the best?
NLTK is widely used because it bundles many corpora, pre-trained models, and algorithms that make common NLP tasks quick and easy to perform.
4. How do I use NLTK in Python?
You can use NLTK in any Python environment, including Google Colab. Install it with the command “pip install nltk” and import it with “import nltk”.
5. What is an alternative to NLTK?
spaCy is a popular alternative library that covers many of the same NLP tasks as NLTK.
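For comparison, a minimal spaCy sketch of the same tokenization and POS-tagging steps might look like the following (it assumes the en_core_web_sm model has been installed with “python -m spacy download en_core_web_sm”).
import spacy

nlp = spacy.load("en_core_web_sm")                  # load a small English pipeline
doc = nlp("Coding Ninjas is one of the best learning platforms.")
print([(token.text, token.pos_) for token in doc])  # tokens with coarse POS tags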
Conclusion
This article discussed the package NLTK, its use cases, installation, and implementation. In conclusion, NLTK in Python is a robust library for natural language processing that simplifies tasks like text analysis, tokenization, and linguistic modeling. It serves as a valuable tool for learners and professionals to efficiently work with human language data.
Recommended Readings: