**Conditional Random Fields**

Let's assume we have a Markov Random Field that is divided into two sets of random variables, Y and X.

When we condition the graph on X globally, i.e., when the values of the random variables in X are fixed or given, all the random variables in set Y follow the Markov property p(Yᵤ | X, Yᵥ, v ≠ u) = p(Yᵤ | X, Yᵥ, Yᵥ ~ Yᵤ), where Yᵥ ~ Yᵤ means that Yᵤ and Yᵥ are neighbors in the graph. The Markov blanket of a variable is made up of its adjacent (neighboring) nodes.

The chain-structured graph shown below is one such graph that satisfies the aforementioned property:

*(Figure: a chain-structured CRF over label variables Y and evidence variables X.)*

As the CRF is a discriminative model, it models the conditional probability P(Y | X), which means that X is always given or observed. As a result, the graph reduces to a simple chain over the label variables.

We call X and Y the evidence and label variables, respectively, because we condition on X and aim to find the appropriate Yáµ¢ for every Xáµ¢.

We can see that the "factor-reduced" CRF model in the above figure follows the Markov property, as shown for the variable Y₂: its conditional probability depends only on its neighboring nodes, Y₁ and Y₃, i.e., p(Y₂ | X, Y₁, Y₃, Y₄, …) = p(Y₂ | X, Y₁, Y₃).

**CRF Theory and Likelihood Optimization**

Let's start by defining the parameters, then use the Gibbs notation to construct the equations for joint (and conditional) probabilities.

**1. Label domain:** Assume that the domain of the random variables in set Y is {m ∈ ℕ | 1 ≤ m ≤ M}, i.e., the first M natural numbers.

**2. Evidence structure and domain:** Assume that the random variables in set X are S-dimensional real-valued vectors, i.e., ∀ Xᵢ ∈ X, Xᵢ ∈ ℝˢ.

**3.** Let the length of the CRF chain be L, i.e., L labels and L evidence variables.

**4.** Let βᵢ(Yᵢ, Yⱼ) = Wcc′ if Yᵢ = c, Yⱼ = c′, and j = i + 1; 0 otherwise.

**5.** Let β′ᵢ(Yᵢ, Xᵢ) = W′c · Xᵢ if Yᵢ = c, and 0 otherwise.

**6.** The total number of parameters is M × M + M × S: a single parameter for each possible label transition (M × M transitions), and S parameters for each of the M labels, which are multiplied with the observation vector (of size S) for that label.

**7.** Let D = {(xⁿ, yⁿ)}, for n = 1 to N, be the training data comprising N examples.
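A minimal sketch of these definitions in code may help. The sizes and random weights below are illustrative assumptions, not values from the article; the score function simply sums the transition terms βᵢ and emission terms β′ᵢ along the chain:

```python
import numpy as np

# Illustrative (hypothetical) sizes: M labels, S-dim evidence, chain length L.
M, S, L = 3, 4, 5
rng = np.random.default_rng(0)

W = rng.normal(size=(M, M))        # W[c, c']: one weight per label transition
W_prime = rng.normal(size=(M, S))  # W'[c]: one weight vector per label

# Total parameter count matches M x M + M x S from the definitions above.
assert W.size + W_prime.size == M * M + M * S

x = rng.normal(size=(L, S))        # evidence sequence X_1..X_L
y = rng.integers(0, M, size=L)     # a candidate label sequence Y_1..Y_L

def chain_score(y, x):
    """Sum of transition terms beta_i(Y_i, Y_{i+1}) and emission terms beta'_i(Y_i, X_i)."""
    transitions = sum(W[y[i], y[i + 1]] for i in range(len(y) - 1))
    emissions = sum(W_prime[y[i]] @ x[i] for i in range(len(y)))
    return transitions + emissions

print(chain_score(y, x))
```

Note that β′ᵢ contributes a dot product W′c · Xᵢ, so each label's emission weights form a vector of the same dimension S as the evidence.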

So, the energy and the likelihood can be expressed in the following way:

E(x, y) = −Σᵢ βᵢ(yᵢ, yᵢ₊₁) − Σᵢ β′ᵢ(yᵢ, xᵢ)

P(y | x) = exp(−E(x, y)) / Z(x), where Z(x) = Σ_y′ exp(−E(x, y′)) is the partition function, and the log-likelihood of the training data is L = Σₙ log P(yⁿ | xⁿ).
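For a small chain, the conditional likelihood can be sanity-checked by brute force: enumerate all Mᴸ labelings, compute the energy of each, and normalize by Z(x). All sizes and weights below are illustrative assumptions:

```python
import itertools
import numpy as np

# Toy brute-force check of P(y | x) = exp(-E(x, y)) / Z(x).
M, S, L = 2, 3, 4
rng = np.random.default_rng(1)
W = rng.normal(size=(M, M))   # transition weights W_cc'
Wp = rng.normal(size=(M, S))  # emission weights W'_c
x = rng.normal(size=(L, S))

def energy(y):
    # E(x, y) = -(sum_i beta_i(y_i, y_{i+1}) + sum_i beta'_i(y_i, x_i))
    return -(sum(W[y[i], y[i + 1]] for i in range(L - 1))
             + sum(Wp[y[i]] @ x[i] for i in range(L)))

all_labelings = list(itertools.product(range(M), repeat=L))
Z = sum(np.exp(-energy(y)) for y in all_labelings)  # partition function Z(x)

def p(y):
    return np.exp(-energy(y)) / Z

# The conditional distribution sums to 1 over all label sequences.
print(round(sum(p(y) for y in all_labelings), 6))  # → 1.0
```

Enumeration costs O(Mᴸ) and is only feasible for toy chains; in practice Z(x) and the marginals are computed with the forward-backward algorithm in O(L·M²).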

As a result, the training problem boils down to maximizing the log-likelihood for all Wcc' and W'cs model parameters.

The gradient of the log-likelihood with respect to W′cs is:

∂L/∂W′cs = Σₙ Σᵢ 𝟙(yⁿᵢ = c) xⁿᵢₛ − Σₙ Σᵢ p(y′ᵢ = c | xⁿ) xⁿᵢₛ, where p(y′ᵢ = c | xⁿ) = Σ_y′₋ᵢ p(y′ᵢ = c, y′₋ᵢ | xⁿ) and 𝟙(·) is the indicator function.

Note that the second term in the above equation denotes the sum of the marginal probabilities of y′ᵢ being equal to c, weighted by xⁿᵢₛ. The y′₋ᵢ here denotes the set of label (y) variables at every position except position i.

For ∂L/∂Wcc′, a similar derivation gives:

∂L/∂Wcc′ = Σₙ Σᵢ 𝟙(yⁿᵢ = c, yⁿᵢ₊₁ = c′) − Σₙ Σᵢ p(y′ᵢ = c, y′ᵢ₊₁ = c′ | xⁿ)
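Both gradients have the form "empirical feature counts minus model-expected feature counts". As a sketch, we can verify the W′ gradient for a single toy example by comparing the analytic expression (using brute-force marginals) against finite differences; all sizes and weights here are hypothetical:

```python
import itertools
import numpy as np

# Gradient check for dL/dW'_cs on a toy chain.
M, S, L = 2, 3, 4
rng = np.random.default_rng(2)
W = rng.normal(size=(M, M))   # transition weights W_cc'
Wp = rng.normal(size=(M, S))  # emission weights W'_cs
x = rng.normal(size=(L, S))
y = tuple(rng.integers(0, M, size=L))  # observed labels for this example

def score(lab, Wp_):
    return (sum(W[lab[i], lab[i + 1]] for i in range(L - 1))
            + sum(Wp_[lab[i]] @ x[i] for i in range(L)))

def log_lik(Wp_):
    # log P(y | x) = score(y) - log Z(x)
    logZ = np.log(sum(np.exp(score(lab, Wp_))
                      for lab in itertools.product(range(M), repeat=L)))
    return score(y, Wp_) - logZ

# Node marginals p(Y_i = c | x) by brute-force enumeration.
seqs = list(itertools.product(range(M), repeat=L))
prob = np.exp([score(lab, Wp) for lab in seqs])
prob /= prob.sum()
marg = np.zeros((L, M))
for pr, lab in zip(prob, seqs):
    for i, c in enumerate(lab):
        marg[i, c] += pr

# Analytic gradient: sum_i x_is * (1[y_i = c] - p(Y_i = c | x)).
grad = np.zeros((M, S))
for i in range(L):
    grad[y[i]] += x[i]
    grad -= np.outer(marg[i], x[i])

# Numerical gradient by central finite differences.
eps = 1e-6
num = np.zeros((M, S))
for c in range(M):
    for s in range(S):
        Wp_plus = Wp.copy(); Wp_plus[c, s] += eps
        Wp_minus = Wp.copy(); Wp_minus[c, s] -= eps
        num[c, s] = (log_lik(Wp_plus) - log_lik(Wp_minus)) / (2 * eps)

print(np.allclose(grad, num, atol=1e-4))  # → True
```

The same check applies to ∂L/∂Wcc′ with pairwise marginals p(Yᵢ = c, Yᵢ₊₁ = c′ | x) in place of the node marginals.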

**FAQs**

**1. What do you mean by CRF?**

CRF stands for Conditional Random Field. It is a type of discriminative model that is best suited for prediction tasks in which contextual information or the states of neighboring variables influence the current prediction.

**2. What is CRF in image segmentation?**

When the class labels of different inputs are not independent, a conditional random field can be used as a discriminative statistical modelling tool. In image segmentation, for example, the class label of a pixel also depends on the labels of its neighboring pixels.

**3. What is the difference between CRF and HMM (Hidden Markov Model)?**

HMM is based on a directed graph, whereas CRF is based on an undirected graph. HMM is a generative model: it predicts the probability of co-occurrence by explicitly modelling the transition probabilities and the emission probabilities, whereas CRF models the conditional probability directly.

**4. What is the difference between CRF and MRF (Markov Random Fields)?**

A Conditional Random Field (CRF) is a type of MRF that models the posterior over the variables x given the data z directly. Unlike the hidden MRF, the factorization into a data term P(z | x) and a prior P(x) is not made explicit.

**Key Takeaways**

In this article, we have discussed the following topics:

- Introduction to CRF
- MRF
- CRF Theory and Likelihood Optimization

Want to learn more about __Machine Learning__? Here is an excellent course that can guide you in learning.

Happy Coding!