Code360 powered by Coding Ninjas
Last Updated: Mar 27, 2024

N-Gram Modelling



Language modelling is a way of determining the probability of any sequence of words. Before getting started with N-Gram modelling, we need to know a little about Markov chains. A Markov chain can be thought of as a chain of states. Suppose we have different states, say a, b, c, d, e, f, g, and so on. Writing them in a sequence gives a chain of states: we can go from state a to state b, from state b to state c, and so on. The next state depends only on the current state. As a simple example, consider two states, a and b, with a probability attached to each transition between them.

The probability of going from state a to state b is 50%, from state a to state a is 50%, from state b to state a is 50%, and from state b to state b is 50%. Let's assume that the initial state is a. Based on these probabilities, the next state can be either a or b, because both have an equal chance.

In this way, we can form a sequence of different states. Such a sequence, or chain, of states is called a Markov chain.
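The two-state chain described above can be simulated in a few lines. This is only a minimal sketch: the names TRANSITIONS and generate_chain are made up for illustration, and the seed is fixed only so that a run is repeatable.

```python
import random

# Transition probabilities from the example: every move has a 50% chance.
TRANSITIONS = {
    "a": {"a": 0.5, "b": 0.5},
    "b": {"a": 0.5, "b": 0.5},
}

def generate_chain(start, steps, rng):
    """Return the start state followed by `steps` randomly sampled states."""
    chain = [start]
    for _ in range(steps):
        options = TRANSITIONS[chain[-1]]  # depends only on the current state
        next_state = rng.choices(list(options), weights=list(options.values()))[0]
        chain.append(next_state)
    return chain

rng = random.Random(0)  # fixed seed (an arbitrary choice) for repeatability
chain = generate_chain("a", 10, rng)
print(" -> ".join(chain))
```

Changing the numbers in TRANSITIONS is enough to model a chain where some moves are more likely than others.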

An N-Gram is a contiguous sequence of N items from a sample of text. These items play the role of the states we saw in Markov chains. The items can be characters, words, or sentences, and the scope can be widened further, for example to whole articles. When n is 2, we call it a Bigram; when n is 3, we call it a Trigram, and so on. In the case of characters, we consider each character to be a state of the Markov chain.

Sentence = "I am a good boy", n = 2

Character Bigrams (taken within each word) = 'am', 'go', 'oo', 'od', 'bo', 'oy'

Now consider the character Trigrams for the sentence "The bird is flying in the blue sky".

Trigram = 'The', 'bir', 'ird', 'fly', 'lyi', 'yin', 'ing', etc.
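Extracting N-Grams amounts to sliding a window of N items over a sequence. The sketch below shows one way to do it for both words and characters. The helper name ngrams is an assumption made for this example (NLTK ships its own utilities for the same job).

```python
def ngrams(items, n):
    """Return every run of n consecutive items as a list of tuples."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]

sentence = "I am a good boy"

# Word-level bigrams: the items are the words of the sentence.
word_bigrams = ngrams(sentence.split(), 2)
print(word_bigrams)

# Character-level bigrams, taken inside each word (words shorter than n give nothing).
char_bigrams = ["".join(g) for word in sentence.split() for g in ngrams(word, 2)]
print(char_bigrams)
```

The same function gives Trigrams with n = 3, for example ngrams("flying", 3).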

Working of N-Gram model

An N-Gram language model gives the probability of occurrence of a given N-Gram within any sequence of words. With it, we can estimate the likelihood of a word (w) given the history of previous observations (h), which contains the n-1 preceding words. The joint probability of a whole sequence can be computed from the conditional probability of each word given the words before it (the chain rule):

p(w1, w2,...wn) = p(w1) x p(w2 | w1) x p(w3 | w1 w2) x p(w4 | w1 w2 w3)...p(wn | w1 w2 w3…wn-1)

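N-Gram modelling makes a simplifying assumption: the probability of a word depends only on the previous n-1 words, not on the whole history. For a Bigram model (n = 2) this becomes:

p(wk | w1 w2 w3…wk-1) ≈ p(wk | wk-1)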
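To make the Bigram assumption concrete, here is a small sketch that estimates p(wk | wk-1) from counts and multiplies the terms to score a sentence. The toy corpus, the <s> and </s> boundary markers, and the function names are assumptions made for this illustration. The estimate is a plain count ratio with no smoothing.

```python
from collections import Counter

# Toy corpus, invented for illustration only.
corpus = [
    "i am a good boy",
    "i am a student",
    "a good boy is happy",
]

unigram_counts = Counter()
bigram_counts = Counter()
for line in corpus:
    # <s> and </s> mark the start and end of each sentence.
    words = ["<s>"] + line.split() + ["</s>"]
    unigram_counts.update(words)
    bigram_counts.update(zip(words, words[1:]))

def bigram_prob(prev, word):
    """Estimate p(word | prev) as count(prev word) / count(prev)."""
    if unigram_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / unigram_counts[prev]

def sentence_prob(sentence):
    """Probability of a sentence as the product of its bigram probabilities."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(words, words[1:]):
        prob *= bigram_prob(prev, word)
    return prob

print(bigram_prob("a", "good"))
print(sentence_prob("i am a good boy"))
```

Any sentence containing a Bigram that never appears in the corpus gets probability 0 here, which is why practical models add smoothing.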



Let's implement an N-Gram model with n = 3 on the Reuters corpus, a collection of thousands of news documents containing over a million words. We will use the NLTK package to build the N-Gram model quickly.

from nltk.corpus import reuters
from nltk import trigrams
from collections import defaultdict

# Assumes the Reuters corpus and the Punkt sentence tokenizer data are installed,
# e.g. via nltk.download('reuters') and nltk.download('punkt')
# (resource names can vary between NLTK versions).

# Placeholder for the model: model[(w1, w2)][w3] holds a count, later a probability
model = defaultdict(lambda: defaultdict(int))

# Count how often each word w3 follows the pair (w1, w2)
for sentence in reuters.sents():
    for w1, w2, w3 in trigrams(sentence, pad_right=True, pad_left=True):
        model[(w1, w2)][w3] += 1

# Transform the counts into probabilities
for w1_w2 in model:
    total_count = float(sum(model[w1_w2].values()))
    for w3 in model[w1_w2]:
        model[w1_w2][w3] /= total_count

We will start with the two words "today the" and ask our model to predict the word that follows:

# Probability of each word that follows "today the"
print(dict(model["today", "the"]))



A larger N captures more context from the text. However, the higher the value of n, the higher the computational overhead and the sparser the data becomes: most long N-Grams occur rarely or never in the training corpus.
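The sparsity effect can be seen even on a tiny text. The sketch below counts N-Grams for increasing n and reports how many were seen only once. The text and the helper name ngram_counts are invented for this illustration, and a real corpus would behave the same way on a much larger scale.

```python
from collections import Counter

# Toy text, invented for illustration; a real corpus would be far larger.
text = ("the bird is flying in the blue sky and the bird is singing "
        "in the blue sky while the sun is shining in the sky").split()

def ngram_counts(words, n):
    """Count every run of n consecutive words."""
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

for n in range(1, 5):
    counts = ngram_counts(text, n)
    seen_once = sum(1 for c in counts.values() if c == 1)
    print(f"n={n}: {len(counts)} distinct N-Grams, {seen_once} seen only once")
```

An N-Gram seen only once gives a very unreliable probability estimate, so the share of such N-Grams is a rough indicator of how sparse the model is.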

Frequently Asked Questions

  1. What is language modelling?
    Language modelling is a way of determining the probability of any sequence of words.
  2. What are the applications of N-Gram modelling?
    N-Gram models are used in speech recognition, spam detection, recommendation systems, etc.
  3. How many methods of language modelling are there?
    There are two methods of language modelling-
    1. Statistical language modelling
    2. Neural language modelling
  4. What is conditional probability?
    Conditional probability is the probability of occurrence of any event given another event has already occurred.
  5. What are the metrics of language modelling?
    There are three metrics of language modelling-
    1. Entropy
    2. Cross-entropy
    3. Perplexity 
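As a rough illustration of how two of these metrics relate, the sketch below computes cross-entropy (in bits per word) and perplexity from the probabilities a model assigned to each word of a test sentence. The probability values are hypothetical and the function names are chosen for this example.

```python
import math

def cross_entropy(word_probs):
    """Average negative log2-probability per word (bits per word)."""
    return -sum(math.log2(p) for p in word_probs) / len(word_probs)

def perplexity(word_probs):
    """Perplexity is 2 raised to the cross-entropy."""
    return 2 ** cross_entropy(word_probs)

# Hypothetical probabilities a model assigned to the 4 words of a test sentence.
probs = [0.25, 0.25, 0.25, 0.25]
print(cross_entropy(probs))
print(perplexity(probs))
```

A lower perplexity means the model was, on average, less surprised by the test text.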

Key Takeaways 

This blog covered the basics of the N-Gram model and its implementation. I hope you found N-Gram modelling interesting and straightforward.
