Code360 powered by Coding Ninjas X Code360 powered by Coding Ninjas X
Table of contents
Markov Random Fields
Conditional Random Fields
CRF Theory and Likelihood Optimization
Key Takeaways
Last Updated: Mar 27, 2024

Conditional Random Fields

Author soham Medewar
0 upvote
Leveraging ChatGPT - GenAI as a Microsoft Data Expert
Prerita Agarwal
Data Specialist @
23 Jul, 2024 @ 01:30 PM


Conditional Random Fields (CRFs) are a type of discriminative model that is well suited to prediction tasks in which contextual information or the status of the neighbors influences the present prediction. CRFs are used to solve problems such as named entity recognition, part of speech tagging, gene prediction, noise reduction, and object detection.

In this article, we'll go through the fundamental math and jargon associated with Markov Random Fields, which is the concept on which CRF is based. Furthermore, we will discuss a simple Conditional Random Fields model, demonstrating why they are ideally suited to sequential prediction issues. Then, in the context of the CRF model, we will go through the likelihood maximization problem and related derivations.

Markov Random Fields

A Markov Random Field, also known as a Markov Network, is a type of graphical model in which random variables are connected in an undirected graph. The random variables' dependency or independence is determined by the graph's structure.


  • The graph G = (V, E) represents a Markov Network, with the vertices or nodes representing random variables and the edges reflecting the interdependence between those variables.
  • The graph can be factored into J separate cliques or factors, each regulated by a factor function ϕⱼ, each with a subset of random variables as its scope. For all conceivable values of Dⱼ. ϕⱼ(dⱼ) should be strictly positive.
  • The product of all the factor functions is the unnormalized joint probability of the variables, therefore for the MRF illustrated above with V= (A, B, C, D), the joint probability may be stated as:


The denominator is the sum of the factors' products across all possible values for the random variables. It's a constant that's also known as the partition function and is abbreviated as Z.

Read about, Machine Learning

Get the tech career you deserve, faster!
Connect with our expert counsellors to understand how to hack your way to success
User rating 4.7/5
1:1 doubt support
95% placement record
Akash Pal
Senior Software Engineer
326% Hike After Job Bootcamp
Himanshu Gusain
Programmer Analyst
32 LPA After Job Bootcamp
After Job

Conditional Random Fields

Let's assume we have a Markov Random Field that is divided into two sets of random variables, Y and X.

"When we condition the graph on X globally, i.e., when the values of random variables in X are fixed or given, all the random variables in set Y follow the Markov property p(Yᵤ/X, Yᵥ, u≠v) = p(Yᵤ/X, Yₓ, Yᵤ~Yₓ), where Yᵤ~Yₓ implies that Yᵤ and Yₓ are neighbors in the graph." The Markov Blanket of a variable is made up of its adjacent nodes or variables.

The chain-structured graph shown below is one such graph that satisfies the aforementioned property:



As the CRF is a discriminative model, it models the conditional probability P(Y/X), which means that X is always given or observed. As a result, the graph eventually descends into a simple chain.

We call X and Y the evidence and label variables, respectively, because we condition on X and aim to find the appropriate Yᵢ for every Xᵢ.

We can see that the "factor reduced" CRF model in the above figure follows Markov's property as shown for variable Y₂ in the below equation. As we can see in the equation below, the conditional probability of Y₂ depends only on its neighboring nodes.

CRF Theory and Likelihood Optimization

Let's start by defining the parameters, then use the Gibbs notation to construct the equations for joint (and conditional) probabilities.

1. Label domain: Assume that the domain of random variables in set Y is {m ϵ ℕ | 1≤m ≤M}, i.e., the first M natural numbers.

2. Evidence structure and a domain: Assume that the random variables in set X are F-dimensional real-valued vectors, i.e., ∀ Xᵢ ϵ X, Xᵢ ϵ Rˢ.

3. The length of the CRF chain should be L, which includes L labels and L evidence variables.

4. Let βᵢ(Yᵢ, Yⱼ) = Wcc’ if Yᵢ = c, Yⱼ = c’ and j = i+1, 0 otherwise.

5. Let β’ᵢ(Yᵢ, Xᵢ) = W’c . Xᵢ, if Yᵢ = c and 0 otherwise.

6. The total number of parameters is M x M + M x S, indicating that there is a single parameter for each label transition ( possible label transitions = M x M ) and S parameters for each label (M possible labels) that will be multiplied to the observation variable (a vector of size S) for that label.

7. Let D = {(xn, yn)} for n=1 to N, be the training data comprising of N examples.

So, the energy and the likelihood can be expressed in the following way:

As a result, the training problem boils down to maximizing the log-likelihood for all Wcc' and W'cs model parameters.

The gradient of the log-likelihood with respect to W’cs is derived in the below equation:

Note that the second term in the above equation denotes the sum of marginal probability of y’ᵢ being equal to c, weighted by xnis. The y’-i here denotes the set of label/y variables at each position except position i.

For dL/dWcc', a similar derivation may be figured out as shown below.


1. What do you mean by CRF?

CRF is also known as a conditional random field. It is a type of discriminative model that's best for prediction tasks when the current forecast is influenced by contextual information or the status of the neighbors.

2. What is CRF in image segmentation?

When the class labels for different inputs are not independent, a conditional random field is utilized as a discriminative statistical modelling tool. For example, the class label for a pixel is also determined by the labels of its neighbors.

3. What is the difference between CRF and HMM (Hidden Markov Model)?

HMM is a directed graph, whereas the CRF is an undirected graph. HMM predicts the probability of co-occurrence by explicitly modelling the transition probability and the phenotypic probability.

4. What is the difference between CRF and MRF (Markov Random Fields)?

A Conditional Random Field (CRF) is a type of MRF that determines a posterior for variables x given data z. The factorization into the data distribution P (x|z) and the prior P (x) is not made explicit, unlike the hidden MRF.

Key Takeaways

In this article, we have discussed the following topics:

  • Introduction to CRF
  • MRF
  • CRF Theory and Likelihood Optimization

Want to learn more about Machine Learning? Here is an excellent course that can guide you in learning. 

Happy Coding!

Live masterclass