Table of contents
1. Introduction
2. What is the Diffusion Model?
   2.1. Denoising the Diffusion Model
3. Architecture of the Diffusion Model
   3.1. Forward Diffusion Process
   3.2. Reconstruction
   3.3. Training and Results
      3.3.1. Construction of the Model
      3.3.2. Computation of Loss Function
4. Implementation of the Diffusion Model
5. Applications and Use Cases
6. Frequently Asked Questions
   6.1. How does a diffusion model work?
   6.2. Are diffusion models better than GANs?
   6.3. What is a diffusion model analysis?
7. Conclusion
Last Updated: Mar 27, 2024

A Quick Guide to Diffusion Model

Author: Sanjana Yadav

Introduction

Hello there!

Machine Learning models are programs trained to discover patterns in data and generate predictions on new data. Mathematically, a model is represented as a function that accepts requests in the form of input data, makes predictions based on that input, and returns an output in response.

In this article, we will learn one such Machine Learning model called the Diffusion Model.


What is the Diffusion Model?

Diffusion Models are generative models, meaning they create data comparable to the data on which they were trained. Fundamentally, Diffusion Models work by corrupting training data with successively added Gaussian noise and then learning to recover the data by reversing this noising process. After training, we can use a Diffusion Model to produce new data by simply passing randomly sampled noise through the learned denoising process.

Diffusion models are inspired by non-equilibrium thermodynamics. They define a Markov chain of diffusion steps that gradually adds random noise to data, and then learn to reverse the diffusion process to construct the desired data samples from the noise. Unlike VAE or flow models, Diffusion Models are trained with a fixed, predetermined noising procedure, and their latent variables have the same high dimensionality as the original data.

Denoising the Diffusion Model

The ideas behind denoising diffusion models have been known for a long time. They derive from the notion of Diffusion Maps, one of the dimensionality reduction strategies used in the Machine Learning field, and incorporate ideas from probabilistic approaches like Markov Chains, which have been employed in various applications. Sohl-Dickstein et al. proposed the initial Denoising Diffusion technique.

Denoising diffusion modeling consists of two processes: forward diffusion and reverse diffusion (reconstruction). In the forward diffusion process, Gaussian noise is gradually added to the data until it becomes pure noise. The reverse/reconstruction process removes the noise by employing a neural network model to learn the conditional probability densities.

The image below gives an idea of the diffusion model.

[Image: the forward (noising) and reverse (denoising) processes of a diffusion model. Src: https://bit.ly/3uBA7eD]

Architecture of the Diffusion Model

A Diffusion Model, as previously stated, consists of a forward process, in which a data point is gradually noised, and a reverse process, in which noise is turned back into a sample from the target distribution.

Forward Diffusion Process

The forward diffusion process does not need training since, unlike the encoder in a VAE, it can be explicitly defined as a Markov Chain. Starting with the original data point x_0, we add Gaussian noise in T steps to get a collection of noisy samples x_1, ..., x_T. Because the probability density at time t depends only on its immediate predecessor at time t-1, the conditional probability density may be written as follows:

q(x_t \mid x_{t-1}) = \mathcal{N}\left(x_t;\ \sqrt{1-\beta_t}\, x_{t-1},\ \beta_t \mathbf{I}\right)

The joint distribution of the entire process may then be calculated as follows:

q(x_{1:T} \mid x_0) = \prod_{t=1}^{T} q(x_t \mid x_{t-1})

The mean and the variance of the density function are controlled by the parameter β_t, a hyperparameter whose value can be kept constant throughout the procedure or gradually altered over the subsequent steps. A range of functions (e.g., sigmoid, tanh, linear, etc.) can be used to define this variance schedule.
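For instance, here is a minimal sketch of two common schedule shapes in PyTorch (the endpoint values 1e-4 and 0.02 are illustrative assumptions, not prescribed values):

import torch

T = 1000
# linear schedule: beta_t grows linearly from 1e-4 to 0.02
betas_linear = torch.linspace(1e-4, 0.02, T)
# sigmoid schedule: same endpoints, but changing slowly at the start and end
betas_sigmoid = torch.sigmoid(torch.linspace(-6, 6, T)) * (0.02 - 1e-4) + 1e-4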

The above derivation is sufficient to predict successive states. However, if we want to sample at an arbitrary time step t without going through all of the intermediary steps, allowing for an efficient implementation, we can reformulate the above equation by substituting the hyperparameter α_t = 1 - β_t and its cumulative product ᾱ_t = α_1 α_2 ⋯ α_t. The equation is then reframed as follows:

q(x_t \mid x_0) = \mathcal{N}\left(x_t;\ \sqrt{\bar{\alpha}_t}\, x_0,\ (1-\bar{\alpha}_t)\, \mathbf{I}\right)
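In code, this closed-form noising step can be sketched as follows (a minimal PyTorch illustration reusing the linear schedule above; the tensor shapes are assumptions for the example):

import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)        # linear schedule, as above
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)    # cumulative product of alpha_t

def q_sample(x0, t, noise=None):
    # Sample x_t ~ q(x_t | x_0) directly, skipping the intermediate steps.
    if noise is None:
        noise = torch.randn_like(x0)
    a_bar = alpha_bars[t].view(-1, *([1] * (x0.dim() - 1)))  # broadcast over batch
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise

# usage: noise a batch of 8 images at random time steps
x0 = torch.rand(8, 3, 32, 32)
t = torch.randint(0, T, (8,))
xt = q_sample(x0, t)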

To generate samples at time step t using the probability density estimate available at time step t-1, we may use another concept borrowed from thermodynamics, known as Langevin dynamics. According to stochastic gradient Langevin dynamics, we can sample new states of the system in a Markov Chain update using only the gradient of the log density. A new data point at time t, for a step size δ and depending on the preceding point at time t-1, may then be sampled as follows:

x_t = x_{t-1} + \frac{\delta}{2} \nabla_x \log p(x_{t-1}) + \sqrt{\delta}\, \epsilon_t, \quad \epsilon_t \sim \mathcal{N}(0, \mathbf{I})
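A minimal sketch of one such Langevin update, assuming a function score_fn that returns the gradient of the log density (the score); the step size is an illustrative choice:

import torch

def langevin_step(x, score_fn, step_size=1e-3):
    # x_t = x_{t-1} + (step/2) * grad log p(x_{t-1}) + sqrt(step) * noise
    noise = torch.randn_like(x)
    return x + 0.5 * step_size * score_fn(x) + (step_size ** 0.5) * noise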

Reconstruction

Given the system's current state, the reverse procedure necessitates estimating the probability density at the earlier time step. This requires estimating q(x_{t-1} | x_t) for every t, starting from t = T, and thus creating a data sample from isotropic Gaussian noise.

In contrast to the forward process, estimating the previous state from the current state requires knowledge of all the previous gradients, which we cannot obtain without a learning model that can produce such estimates. As a result, we must train a neural network model that estimates p_θ(x_{t-1} | x_t) based on learned weights θ and the current state at time t. This is estimated as follows:

p_\theta(x_{0:T}) = p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t)

p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right)

Ho et al. suggested a parameterization for the mean function, which may be calculated as follows:

\mu_\theta(x_t, t) = \frac{1}{\sqrt{\alpha_t}} \left( x_t - \frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\, \epsilon_\theta(x_t, t) \right)

where ε_θ(x_t, t) is the noise predicted by the network at step t.

Ho et al. also proposed using a fixed variance function, Σ_θ = β_t I. The sample at time t-1 may therefore be calculated as follows:

x_{t-1} = \frac{1}{\sqrt{\alpha_t}} \left( x_t - \frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\, \epsilon_\theta(x_t, t) \right) + \sqrt{\beta_t}\, z, \quad z \sim \mathcal{N}(0, \mathbf{I})
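Putting this together, here is a minimal sketch of one reverse (denoising) step under this parameterization; eps_model stands in for a trained noise-prediction network, and the schedule tensors are the ones defined in the forward-process sketch:

import torch

@torch.no_grad()
def p_sample(eps_model, xt, t, betas, alphas, alpha_bars):
    # One reverse step: estimate x_{t-1} from x_t with fixed variance beta_t.
    eps = eps_model(xt, t)                                   # predicted noise
    mean = (xt - betas[t] / (1 - alpha_bars[t]).sqrt() * eps) / alphas[t].sqrt()
    if t == 0:
        return mean                                          # no noise at the final step
    z = torch.randn_like(xt)
    return mean + betas[t].sqrt() * z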

Training and Results

Construction of the Model

The model used in diffusion model training follows patterns similar to a VAE network. However, it is frequently kept considerably simpler and more straightforward than other network designs. The input layer has the same size as the data dimension. Depending on the network's depth requirements, multiple hidden layers may exist; the middle layers are linear layers, each with its own activation function. The final layer has the same size as the original input layer, allowing the original data dimensions to be reconstructed. The last layer of a Denoising Diffusion Network consists of two separate outputs, one for the mean and one for the variance of the estimated probability density.
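As a hedged illustration of this description, here is a minimal PyTorch sketch (the layer sizes, activation choice, and the way the time step is fed in are our assumptions for the example, not a prescription):

import torch
import torch.nn as nn

class DenoisingMLP(nn.Module):
    # Input and output match the data dimension; two heads output the mean
    # and (log-)variance of the estimated density p(x_{t-1} | x_t).
    def __init__(self, data_dim=784, hidden_dim=512):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(data_dim + 1, hidden_dim),  # +1 input for the time step
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
        )
        self.mean_head = nn.Linear(hidden_dim, data_dim)
        self.logvar_head = nn.Linear(hidden_dim, data_dim)

    def forward(self, x, t):
        h = self.body(torch.cat([x, t.float().unsqueeze(-1)], dim=-1))
        return self.mean_head(h), self.logvar_head(h)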

Computation of Loss Function

The network model's goal is to optimize the following loss function:

L = \mathbb{E}_q\left[-\log p_\theta(x_0)\right] \le \mathbb{E}_q\left[\log \frac{q(x_{1:T} \mid x_0)}{p_\theta(x_{0:T})}\right]

Sohl-Dickstein et al. suggested a simplified version of this loss function that expresses the loss as a linear combination of KL divergences between two Gaussian distributions and a collection of entropies. This simplifies the computation and makes the loss function straightforward to implement. The loss function is then:

L = \mathbb{E}_q\Big[ D_{KL}\big(q(x_T \mid x_0)\,\|\,p(x_T)\big) + \sum_{t>1} D_{KL}\big(q(x_{t-1} \mid x_t, x_0)\,\|\,p_\theta(x_{t-1} \mid x_t)\big) - \log p_\theta(x_0 \mid x_1) \Big]

Ho et al. provided a further simplification and enhancement of the loss function by using the parameterization for the mean described earlier. As a result, the loss function becomes:

L_{\text{simple}} = \mathbb{E}_{t,\, x_0,\, \epsilon}\left[ \left\| \epsilon - \epsilon_\theta\!\left(\sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon,\ t\right) \right\|^2 \right]
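In code, this simplified objective can be sketched as follows, reusing the schedule from the forward-process sketch; eps_model is again an assumed noise-prediction network:

import torch
import torch.nn.functional as F

def diffusion_loss(eps_model, x0, T, alpha_bars):
    # Simplified DDPM objective: MSE between the true and predicted noise.
    t = torch.randint(0, T, (x0.shape[0],))                  # random time steps
    noise = torch.randn_like(x0)
    a_bar = alpha_bars[t].view(-1, *([1] * (x0.dim() - 1)))
    xt = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise    # closed-form q(x_t | x_0)
    return F.mse_loss(eps_model(xt, t), noise)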

Implementation of the Diffusion Model

While Diffusion Models have not yet been democratized to the same extent as older Machine Learning architectures and approaches, there are still implementations available for use. The simplest way to use a Diffusion Model in PyTorch is the denoising-diffusion-pytorch package, which implements an image diffusion model. To install the package, simply run the following command in the terminal:

pip install denoising_diffusion_pytorch

Steps to train the model and generate results (a sketch follows the list):

  1. Import the necessary packages.
  2. Define the network architecture.
  3. Define the diffusion model.
  4. Train the model and generate images.
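A minimal sketch of these steps, following the package's documented usage (the argument values here are illustrative, and exact parameter names may vary between package versions):

import torch
from denoising_diffusion_pytorch import Unet, GaussianDiffusion

# 2. define the network architecture
model = Unet(dim=64, dim_mults=(1, 2, 4, 8))

# 3. define the diffusion model
diffusion = GaussianDiffusion(model, image_size=128, timesteps=1000)

# 4. train on images (a random tensor stands in for a real dataset here)
training_images = torch.rand(8, 3, 128, 128)
loss = diffusion(training_images)
loss.backward()
# ... optimizer step, repeated over many batches ...

sampled_images = diffusion.sample(batch_size=4)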

Applications and Use Cases

As previously noted, Diffusion Model research has expanded in recent years. Diffusion Models, inspired by non-equilibrium thermodynamics, now generate state-of-the-art image quality.

Apart from cutting-edge image quality, Diffusion Models provide several other advantages. Some of these are:

  • They do not require adversarial training.
  • In terms of training efficiency, Diffusion Models provide the extra benefits of scalability and parallelizability.

Frequently Asked Questions

How does a diffusion model work?

Diffusion models work by corrupting training data with noise and then learning to recover the data by reversing the noising process.

Are diffusion models better than GANs?

Diffusion models have been shown to generate more realistic images than GANs on several benchmarks, though sampling from them is typically slower.

What is a diffusion model analysis?

A diffusion model analysis is based on a multi-dimensional search for the best estimates of all free parameters, such that the expected and observed response time distributions are close.

Conclusion

In this article, we learned about the diffusion model from scratch. We learned about its architecture, implementation, and uses.
 

We hope that the article helped you learn about the diffusion model and its uses in an easy and insightful manner. You may read more about the Machine Learning models and much more here.

You can also visit our website to read more such blogs. Make sure you enroll in our courses, take mock tests, solve problems, and interview puzzles. Also, you can prepare for interviews with interview experiences and an interview bundle.

Keep learning and keep growing, Ninjas!

Thank you
