Latent Dirichlet Allocation
Latent Dirichlet Allocation (LDA) is a technique for Topic Modeling that uses Dirichlet distributions to categorize the text in a collection of documents into topics and to describe each topic through the words it tends to use.
LDA makes two fundamental assumptions:
- Documents are made up of topics, and
- Topics are made up of tokens (or words)
The words within these topics are generated from probability distributions. In statistical terms, each document is a probability distribution over topics, and each topic is a probability distribution over words.
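For intuition, here is a tiny, purely hypothetical sketch of these two distributions in Python; the topic names, words, and probabilities below are made up for illustration only:

# Hypothetical per-document topic distribution: one document's probabilities over topics (sums to 1)
document_topics = {"movies": 0.6, "books": 0.3, "cricket": 0.1}

# Hypothetical per-topic word distributions: each topic's probabilities over words (each sums to 1)
topic_words = {
    "movies": {"movie": 0.4, "netflix": 0.3, "watch": 0.3},
    "books": {"book": 0.5, "read": 0.3, "author": 0.2},
    "cricket": {"wicket": 0.5, "india": 0.3, "test": 0.2},
}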
LDA can be viewed as an algebraic (matrix factorization) algorithm, and like any ML algorithm it has three steps:
- Representation: Define the problem in terms of mathematical entities.
- Loss: Define the loss function.
- Optimization: Minimize the loss.
Let's understand these three things in LDA.
How Does LDA Work?
First, LDA applies its two key assumptions to the corpus at hand. Assume we have a corpus containing the following five documents:
Document 1: This weekend, I'd want to see a movie.
Document 2: Yesterday, I went shopping. New Zealand defeated India at Southampton by eight wickets to win the World Test Championship.
Document 3: I am not a cricket fan. Netflix and Amazon Prime both have excellent movie selections.
Document 4: While watching movies is a fun way to unwind, I'd rather paint and read some good books this time. It's been a long time!
Document 5: I love this blueberry milkshake! Dr Joe Dispenza's books are worth reading. His work is revolutionary! His works aided in discovering a lot of information on how our beliefs affect our biology and how our brains can be rewired.
Any corpus (a collection of documents) can be represented as a document-word matrix, also known as a document-term matrix (DTM).
The first step with text data is to clean, preprocess, and tokenize the text into words. After preprocessing the documents, we get the following document-word matrix:
The five documents are D1, D2, D3, D4, and D5, and the words are represented by W1 to W8, so there are eight distinct words.
As a result, the matrix has the shape 5 × 8 (five rows and eight columns):

As a result, the corpus is now represented by the above preprocessed document-word matrix, in which each row represents a document and each column represents a token (word).
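As a rough sketch of this preprocessing step (not the exact pipeline used above, so the vocabulary and matrix shape will differ from the simplified 5 × 8 example), the document-word matrix can be built with scikit-learn's CountVectorizer:

from sklearn.feature_extraction.text import CountVectorizer

documents = [
    "This weekend, I'd want to see a movie.",
    "Yesterday, I went shopping. New Zealand defeated India at Southampton by eight wickets to win the World Test Championship.",
    "I am not a cricket fan. Netflix and Amazon Prime both have excellent movie selections.",
    "While watching movies is a fun way to unwind, I'd rather paint and read some good books this time. It's been a long time!",
    "I love this blueberry milkshake! Dr Joe Dispenza's books are worth reading.",
]

# Lowercase, tokenize, and drop English stop words to build the document-word matrix
vectorizer = CountVectorizer(stop_words="english")
document_term_matrix = vectorizer.fit_transform(documents)

print(document_term_matrix.shape)           # (number of documents, number of distinct words)
print(vectorizer.get_feature_names_out())   # the vocabulary, i.e. the columns of the matrix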
As demonstrated below, LDA converts this document-word matrix into two other matrices: a Document-Topic matrix and a Topic-Word matrix.

Below is a description of these matrices:
The Document-Topic matrix holds the possible topics (represented by K above) that each document can contain. Assuming six topics and five documents, this matrix has dimensions 5 × 6.
The Topic-Word matrix lists the words (or terms) that each topic can contain. With six topics and eight distinct tokens in the vocabulary, this matrix has the shape 6 × 8.
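A minimal NumPy sketch (with random, made-up values) shows how the two shapes fit together: multiplying the 5 × 6 Document-Topic matrix by the 6 × 8 Topic-Word matrix gives back a 5 × 8 distribution over words for each document:

import numpy as np

n_docs, n_topics, n_words = 5, 6, 8

# Hypothetical Document-Topic matrix: each row is a document's distribution over 6 topics
doc_topic = np.random.dirichlet(np.ones(n_topics), size=n_docs)    # shape (5, 6)

# Hypothetical Topic-Word matrix: each row is a topic's distribution over 8 words
topic_word = np.random.dirichlet(np.ones(n_words), size=n_topics)  # shape (6, 8)

# Their product gives, for every document, a distribution over the 8 words
doc_word = doc_topic @ topic_word                                   # shape (5, 8)
print(doc_word.shape)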
Representation of LDA

- The yellow box represents the entire corpus of documents (M is the number of documents). In our example, M = 5 because we have 5 documents.
- N, in the peach-coloured box, is the number of words in a document.
- Inside this peach box are the individual words; w, in the blue circle, is one such word.
The LDA model has two parameters that control these distributions:
- Alpha (α) controls the per-document topic distribution.
- Beta (β) controls the per-topic word distribution.
To summarize:
- M: total documents in the corpus
- N: number of words in the document
- w: Word in a document
- z: latent topic assigned to a word
- theta (θ): per-document topic distribution
- LDA model's parameters: Alpha (ɑ) and Beta (ꞵ)
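Putting these symbols together, here is a minimal sketch of LDA's generative story, assuming small made-up values for alpha, beta, the number of topics, and the vocabulary size (phi is simply the name used here for the per-topic word distributions):

import numpy as np

rng = np.random.default_rng(0)
n_topics, vocab_size, doc_length = 6, 8, 10
alpha, beta = 0.1, 0.01   # assumed hyperparameter values

# theta: the document's topic distribution, drawn from Dirichlet(alpha)
theta = rng.dirichlet(alpha * np.ones(n_topics))

# phi: one word distribution per topic, each drawn from Dirichlet(beta)
phi = rng.dirichlet(beta * np.ones(vocab_size), size=n_topics)

document = []
for _ in range(doc_length):
    z = rng.choice(n_topics, p=theta)      # latent topic z for this word position
    w = rng.choice(vocab_size, p=phi[z])   # word w drawn from that topic's distribution
    document.append(w)

print(document)   # word indices generated for one synthetic document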
Optimization:
The ultimate goal of LDA is to find the most optimal representation of the Document-Topic and Topic-Word matrices, i.e., the best Document-Topic and Topic-Word distributions.
Because LDA assumes that documents are made up of a variety of topics, and topics are made up of a variety of words, it starts at the document level to determine which topics and words would have generated these documents.
Now, let's look at our corpus, which consists of five documents (d1 to d5), each with a different number of words:
d1: (w1, w2, w3, w4, w5, w6, w7, w8)
d2: (w'1, w'2, w'3, w'4, w'5, w'6, w'7, w'8, w'9, w'10)
d3: (w''1, w''2, w''3, w''4, w''5, w''6, w''7, w''8, w''9, w''10, w''11, w''12, w''13, w''14, w''15)
d4: (w'''1, w'''2, w'''3, w'''4, w'''5, w'''6, w'''7, w'''8, w'''9, w'''10, w'''11, w'''12)
d5: (w''''1, w''''2, w''''3, w''''4, w''''5, w''''6, w''''7, w''''8, w''''9, w''''10, …, w''''32, w''''33, w''''34)
After the first iteration, LDA provides initial document-topic and topic-word matrices. The goal is to improve these estimates, which LDA accomplishes by iterating over all the documents and words.
LDA assumes that every existing topic assignment is correct except for the current word. Using those topic-word assignments, LDA iterates over each document 'D' and each word 'w' and tries to find a better topic assignment for 'w'.
How would it go about doing that? It does it by computing two probabilities for each topic (k): p1 and p2.
P1: the proportion of words in document D that are currently assigned to topic k.
P2: the proportion of assignments of the word w, across all documents, that go to topic k. In other words, p2 captures how often the word w is assigned to topic k throughout the corpus.
The formulas for p1 and p2 are as follows:
p1 = proportion(topic k / document D), and p2 = proportion(word w / topic k).
Using these probabilities, LDA computes a new probability as the product p1 * p2; through this product, LDA identifies the new, most relevant topic for the current word.
The product probability of p1 * p2 is used to reassign the word 'w' from the document 'D' to a new topic 'k.'
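A rough sketch of this reassignment step, assuming we maintain simple count matrices over the current assignments (doc_topic_counts of shape documents × topics and topic_word_counts of shape topics × vocabulary, both names chosen here for illustration), might look like this:

import numpy as np

def reassign_word(d, w, doc_topic_counts, topic_word_counts, rng):
    # Simplified sketch: full collapsed Gibbs sampling would also exclude the current
    # word's own assignment from the counts and smooth them with alpha and beta.
    # p1: proportion of words in document d currently assigned to each topic k
    p1 = doc_topic_counts[d] / doc_topic_counts[d].sum()
    # p2: proportion of assignments of the word w that currently go to each topic k
    p2 = topic_word_counts[:, w] / topic_word_counts[:, w].sum()
    scores = p1 * p2
    probs = scores / scores.sum()            # normalize p1 * p2 over the topics
    return rng.choice(len(probs), p=probs)   # sample the new topic k for this word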
Now, during the step of selecting a new topic 'k', LDA is run for a large number of iterations until a steady state is reached. LDA reaches its convergence point when it produces the most optimal document-topic and topic-word matrix representations.
This completes the working of Latent Dirichlet Allocation.
Implementation
# Parameter tuning using grid search
from sklearn.model_selection import GridSearchCV
from sklearn.decomposition import LatentDirichletAllocation

# document_term_matrix is the document-word matrix built earlier
# (e.g., with CountVectorizer as in the preprocessing sketch above)
grid_params = {'n_components': list(range(5, 10))}  # candidate numbers of topics

# LDA model wrapped in a grid search over the number of topics
lda = LatentDirichletAllocation()
lda_model = GridSearchCV(lda, param_grid=grid_params)
lda_model.fit(document_term_matrix)

# Best estimator found by the grid search
lda_model1 = lda_model.best_estimator_
print("Best LDA model's params", lda_model.best_params_)
print("Best log likelihood score for the LDA model", lda_model.best_score_)
print("LDA model perplexity on train data", lda_model1.perplexity(document_term_matrix))

There are three major hyperparameters in LDA: 'alpha', the document-topic density factor; 'beta', the topic-word density factor; and 'n_components', the number of topics you wish to cluster the documents into.
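In scikit-learn's LatentDirichletAllocation these map onto the doc_topic_prior (alpha), topic_word_prior (beta), and n_components arguments; a small example of setting them explicitly (the values below are arbitrary):

from sklearn.decomposition import LatentDirichletAllocation

lda = LatentDirichletAllocation(
    n_components=6,          # number of topics
    doc_topic_prior=0.1,     # alpha: document-topic density
    topic_word_prior=0.01,   # beta: topic-word density
    random_state=42,
)
# lda.fit(document_term_matrix)   # fit on the document-word matrix built earlier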

FAQs
1. What is a good explanation of latent Dirichlet allocation?
Latent Dirichlet Allocation (LDA) is a popular form of statistical topic modelling. In LDA, documents are represented as mixtures of topics, and a topic is a group of words. These topics reside in a hidden, or latent, layer.
2. Why do we use latent Dirichlet allocation?
The aim of LDA is to find the topics a document belongs to based on the words it contains. It assumes that documents about similar topics will use similar groups of words. This lets each document be mapped to a probability distribution over latent topics, where each topic is itself a probability distribution over words.
3. What is the difference between LDA and LSA?
Both LSA and LDA take the same input: a bag-of-words matrix. LSA focuses on reducing the dimensionality of this matrix, while LDA solves the topic-modelling problem probabilistically. We won't go through the mathematical details here, as there is a lot of great material covering them.
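As a quick sketch, both models can be fit on the same bag-of-words matrix (document_term_matrix as built earlier); TruncatedSVD gives the LSA-style low-rank reduction, while LatentDirichletAllocation gives per-document topic distributions:

from sklearn.decomposition import TruncatedSVD, LatentDirichletAllocation

# LSA: purely algebraic low-rank decomposition of the bag-of-words matrix
lsa = TruncatedSVD(n_components=6)
lsa_features = lsa.fit_transform(document_term_matrix)

# LDA: probabilistic topic model on the same input
lda = LatentDirichletAllocation(n_components=6, random_state=42)
doc_topic_dist = lda.fit_transform(document_term_matrix)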
4. Is Latent Dirichlet Allocation Bayesian?
Yes. Latent Dirichlet Allocation (LDA) is a generative probabilistic model for collections of discrete data such as text corpora. It is a three-level hierarchical Bayesian model in which each item of a collection is modelled as a finite mixture over an underlying set of topics.
Conclusion
In a nutshell, LDA is a statistical model used for topic modelling: it discovers the abstract "topics" that occur in a collection of documents and can be used to assign the text in a document to a particular topic.
Hey Ninjas! Don't stop here; check out Coding Ninjas for Machine Learning, more unique courses, and guided paths. Also, try Coding Ninjas Studio for more exciting articles, interview experiences, and fantastic Data Structures and Algorithms problems.
Happy Learning!