1. Introduction
2. Clustering
3. GMMs
4. Expectation Maximisation
5. Applications of GMMs
6. FAQs
7. Key Takeaways

Last Updated: Mar 27, 2024

# Gaussian Mixture Models

Arun Nawani

## Introduction

We've learned a lot about various machine learning models in our previous blogs and how each of them can be used depending on the conditions. Machine learning models can be divided into two subcategories: supervised and unsupervised learning. In this blog, we'll focus on unsupervised learning models, more specifically a clustering algorithm: the Gaussian Mixture Model (GMM).

## Clustering

Before we dive deep into GMMs, it's essential to understand what clustering is and what clustering models exist. Clustering is an unsupervised learning technique in which similar data points are grouped together to form clusters. Each cluster contains data points that the model identifies as similar in some aspect. One such clustering technique is the Gaussian Mixture Model. GMM is a distribution-based model, as opposed to a distance-based model like K-means.

## GMMs

GMMs work on the idea that the data is generated by a certain number of Gaussian distributions, each of which forms a cluster. Suppose there are three Gaussian distributions, each with its own mean and variance. The probability of every data point belonging to each distribution is calculated. GMMs are probabilistic models and use a soft clustering technique to assign data points to clusters; this is one fundamental difference between hard K-means clustering and GMMs. Hard-clustering techniques assign each data point definitively to a single cluster, whereas in soft clustering a data point may be shared among clusters, to an extent determined by the probability of the point belonging to each cluster.
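The soft-versus-hard distinction is easy to see in code. Below is a minimal sketch using scikit-learn's `GaussianMixture` (this library usage is our illustration, not part of the original blog): `predict_proba` returns per-cluster membership probabilities (soft clustering), while `predict` collapses them to a single hard label.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy data: two overlapping 1D Gaussian clusters
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0, 1, 200), rng.normal(4, 1, 200)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

# Soft clustering: each point gets a probability for each cluster,
# and each row of probabilities sums to 1
probs = gmm.predict_proba(X)

# Hard assignment is just the most probable cluster for each point
labels = gmm.predict(X)
```

Points near the overlap of the two bumps get probabilities like 0.6/0.4 rather than a forced all-or-nothing assignment, which is exactly what K-means cannot express.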

Let's take a step back and understand what Gaussian distributions are. A Gaussian distribution is a bell curve spread symmetrically around its mean value.

The spread of the curve depends on the variance of the Gaussian distribution: the greater the variance, the more spread out the curve is.

For a single-variable distribution, depicted as a curve in 2D space, the probability density function is given by

$$f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

For a distribution depicted in 3D space, that is, one constituting two variables, the probability density is given by

$$f(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma) = \frac{1}{2\pi\sqrt{|\Sigma|}}\, \exp\!\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{T}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)$$

where

- x = input vector
- μ = 2D mean vector
- Σ = 2 × 2 covariance matrix

This is the equation for a two-variable Gaussian distribution. It can be generalized to an n-dimensional distribution, where x and μ become length-n vectors, Σ becomes an n × n covariance matrix, and the normalizing constant becomes $(2\pi)^{n/2}\,|\Sigma|^{1/2}$.
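The general n-dimensional density above can be implemented directly from the formula. The sketch below (our own illustration, with SciPy used only as a sanity check) evaluates it for a 2D Gaussian:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_pdf(x, mu, cov):
    """Density of an n-dimensional Gaussian at point x, straight from the formula."""
    n = len(mu)
    diff = x - mu
    norm_const = 1.0 / np.sqrt((2 * np.pi) ** n * np.linalg.det(cov))
    exponent = -0.5 * diff @ np.linalg.inv(cov) @ diff
    return norm_const * np.exp(exponent)

mu = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.3],
                [0.3, 2.0]])
x = np.array([0.5, -0.5])

ours = gaussian_pdf(x, mu, cov)
# Cross-check against SciPy's reference implementation
ref = multivariate_normal(mu, cov).pdf(x)
```

Both values agree, confirming the formula is transcribed correctly; in practice you would use the library version, which handles numerical stability better than an explicit matrix inverse.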

## Expectation Maximisation

Expectation Maximisation (EM) is a technique for determining model parameters and is best used when there are missing data values. It is hard to set the right model parameters when some variable values are missing, so EM first estimates optimum values for the missing variables from the available data and then finds the right model parameters. It's a 2-step algorithm:

• Estimation step (E-step) - Estimate the missing values using the available data.
• Maximisation step (M-step) - Decide the model parameters after estimating the missing values.

Say we want to form K clusters, that is, K Gaussian distributions. The parameters to determine for each cluster are the mean, the covariance, and an additional density-of-distribution parameter (π). Initially, these values are assigned randomly, or initialized using K-means or hierarchical clustering, and then the E-step follows:

• E-step - For every point, calculate the probability (responsibility) of it belonging to each cluster:

$$\gamma_{ik} = \frac{\pi_k\, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j\, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}$$

• M-step - The density value is given by

$$\pi_k = \frac{1}{N}\sum_{i=1}^{N} \gamma_{ik}$$

The mean and covariance matrix are calculated and updated as per the following formulas:

$$\mu_k = \frac{\sum_{i=1}^{N} \gamma_{ik}\, x_i}{\sum_{i=1}^{N} \gamma_{ik}}, \qquad \Sigma_k = \frac{\sum_{i=1}^{N} \gamma_{ik}\,(x_i-\mu_k)(x_i-\mu_k)^{T}}{\sum_{i=1}^{N} \gamma_{ik}}$$

It's an iterative process: the two steps are repeated until the log-likelihood function stops improving, i.e., until convergence.

## Applications of GMMs

GMMs have a variety of real-world applications. Some of them are listed below.

• Used for signal processing
• Used for customer churn analysis
• Used for language identification
• Used in the video game industry
• Genre classification of songs

## FAQs

1. How do GMMs differentiate from K-means clustering?
GMMs and K-means are both clustering algorithms used for unsupervised learning tasks. However, the basic difference between the two is that K-means is a distance-based clustering method, while GMM is a distribution-based clustering method.

2. Briefly explain EM algorithm.
It is a statistical algorithm used to find the right model parameters when there are missing values in the data. The missing variables are called latent variables. The latent variables are first estimated using the available values, and then the parameters are updated with the completed data. It's a two-step iterative process: the E-step (estimation of latent variables) followed by the M-step (updating the model parameters).

## Key Takeaways

This blog briefly explains a very powerful clustering algorithm, the Gaussian Mixture Model, and highlights its salient features by contrasting it with other clustering algorithms. However, we recommend trying it out yourself to better understand the nitty-gritty of this clustering method. You may check out our industry-oriented courses on machine learning to give yourself that promising start to your machine learning journey.
Happy Learning!!
