Table of contents
1. Introduction
2. Autoencoders
   2.1. Contractive Autoencoders
   2.2. Working of Contractive Autoencoders
   2.3. Mathematics Behind Contractive Autoencoders
   2.4. Implementation
3. Frequently Asked Questions
4. Key Takeaways
Last Updated: Mar 27, 2024

Contractive Autoencoder

Author Mayank Goyal

Introduction

In this big-data era, where leading social networking and e-commerce sites generate tons of data every day, we live in a world of data abundance. Yet our machine learning algorithms have mainly exploited labeled datasets. Most of the data generated by these sites is unstructured and unlabelled, so it is high time the machine learning community focused on unsupervised learning algorithms to push AI forward.

To learn from unstructured data, we take the help of autoencoders. Autoencoders are learning models that work on unstructured data and use the power of neural networks to perform representation learning. In machine learning, representation learning means embedding the components and features of the original data in some low-dimensional structure for better understanding, visualization, and extraction of useful information. These low-dimensional vectors reveal a great deal about our data, such as how close two instances of the dataset are and which structures and patterns are present in the dataset.

Autoencoders

Autoencoders, also known as self-encoders, are trained to reproduce their inputs. They come under the category of unsupervised learning algorithms, although they are more precisely self-supervised: for a training example x, the label is x itself. They are generally deemed unsupervised because there are no separate classification or regression labels.

If an autoencoder does this perfectly, the output vector x_output equals x. The autoencoder is designed as a two-part structure: an encoder and a decoder.

AutoEncoder = Decoder(Encoder(x))

The model is trained using a reconstruction loss that minimizes the difference between x and x_output. If the inputs are real-valued, we can define the reconstruction loss as MSE(x, x_output). Because the hidden representation is smaller than the input, autoencoders are also called bottleneck neural networks: we are forcing the network to learn a compressed knowledge representation.
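
As a quick illustration, here is a minimal sketch of such a bottleneck autoencoder in Keras. The 784-dimensional input (e.g. a flattened 28 x 28 image), the 32-unit bottleneck, and the layer names are illustrative assumptions, not something prescribed by the article.

import tensorflow as tf

# The encoder compresses the input into a small 'bottleneck' representation;
# the decoder tries to reconstruct the original input from it.
inputs = tf.keras.Input(shape=(784,))
encoded = tf.keras.layers.Dense(32, activation='sigmoid', name='bottleneck')(inputs)
decoded = tf.keras.layers.Dense(784, activation='sigmoid')(encoded)

autoencoder = tf.keras.Model(inputs, decoded)
autoencoder.compile(optimizer='adam', loss='mse')  # reconstruction loss MSE(x, x_output)
# autoencoder.fit(x_train, x_train, epochs=10, batch_size=128)  # the input is its own target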

Contractive Autoencoders

A contractive autoencoder is an unsupervised deep learning technique that helps a neural network encode unlabeled training data. The idea behind it is to make the autoencoder robust to small changes in the training dataset.

We use autoencoders to learn a representation, or encoding, for a set of unlabeled data, usually as a first step towards dimensionality reduction or generating new data. A contractive autoencoder aims to learn representations that are invariant to unimportant transformations of the given data.

Working of Contractive Autoencoders

A contractive autoencoder is made less sensitive to slight variations in the training dataset. We achieve this by adding a penalty term, or regularizer, to whatever cost or objective function the algorithm is trying to minimize. The result is a learned representation that is less sensitive to the training input. This regularizer corresponds to the Frobenius norm of the Jacobian matrix of the encoder activations with respect to the input.

If this value is zero, the learned hidden representation does not change at all as we change the input values. If the value is huge, the learned representation is highly unstable as the input values change.
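
As a rough sketch of what this sensitivity means in practice, the snippet below computes the Jacobian of a hypothetical, untrained encoder with tf.GradientTape and measures its squared Frobenius norm; the input size and layer width are arbitrary assumptions.

import tensorflow as tf

# Hypothetical encoder: 784 inputs -> 32 sigmoid hidden units
encoder = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(784,)),
    tf.keras.layers.Dense(32, activation='sigmoid'),
])

x = tf.random.normal((1, 784))
with tf.GradientTape() as tape:
    tape.watch(x)  # x is not a Variable, so watch it explicitly
    h = encoder(x)
jacobian = tape.batch_jacobian(h, x)               # shape (1, 32, 784)
sensitivity = tf.reduce_sum(tf.square(jacobian))   # squared Frobenius norm
print(float(sensitivity))  # small value -> hidden units barely react to input changes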

We generally employ a contractive autoencoder as one of several autoencoder nodes; it becomes active only when the other encoding schemes fail to label a data point.

Mathematics Behind Contractive Autoencoders

We need to know some essential concepts before deriving the contractive autoencoder, i.e., the Jacobian matrix and the Frobenius norm. In vector calculus, the Jacobian matrix of a vector-valued function of several variables is the matrix of all its first-order partial derivatives. When this matrix is square, i.e., when the function takes the same number of variables as input as it produces vector components as output, its determinant is called the Jacobian determinant.
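
For a vector-valued function f : R^n -> R^m with components f_1, ..., f_m, the Jacobian matrix is

J_f(x) =
\begin{bmatrix}
\frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\
\vdots & \ddots & \vdots \\
\frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n}
\end{bmatrix}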

As we know, we calculate the magnitude of a vector to measure how large it is. Similarly, we calculate a norm to gauge how large the elements of a vector or matrix are. There are various ways to calculate a norm, as shown below.

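Two commonly used examples are the Euclidean (L2) norm of a vector v and the Frobenius norm of an m x n matrix A:

\|v\|_2 = \sqrt{\sum_i v_i^2},
\qquad
\|A\|_F = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2}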

The Frobenius norm, also known as the Euclidean matrix norm, of an m x n matrix is defined as the square root of the sum of the absolute squares of its elements. To make the autoencoder robust to small changes in the training dataset, we add another penalty term to the loss function of the autoencoder. We define the loss function as follows.
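
Writing h(x) for the hidden representation, x_output for the reconstruction, and λ for the regularization strength, the contractive autoencoder loss is commonly written as

\mathcal{L}(x, x_{\text{output}}) = \|x - x_{\text{output}}\|^2 + \lambda \, \|J_h(x)\|_F^2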

The added penalty term, λ ||J(x)||_F^2, is the squared Frobenius norm of the Jacobian matrix of partial derivatives of the hidden representation with respect to the input, scaled by λ.
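
Written out element-wise, the penalty sums the squared partial derivative of every hidden unit h_j with respect to every input dimension x_i:

\lambda \, \|J_h(x)\|_F^2 = \lambda \sum_{i} \sum_{j} \left( \frac{\partial h_j(x)}{\partial x_i} \right)^2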

 

In the above penalty term, we first need to calculate the Jacobian matrix of the hidden layer with respect to the input, which is similar to a gradient calculation. Let us first write down the hidden representation:
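
Assuming a single hidden layer with sigmoid activation φ, weight matrix W, and bias b, the j-th hidden unit is

Z_j = \sum_i W_{ji} \, x_i + b_j, \qquad h_j = \phi(Z_j) = \frac{1}{1 + e^{-Z_j}}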

        

Here φ is the sigmoid nonlinearity, and Z_j is the dot product of the input features x_i with the corresponding weights of the j-th hidden unit. Then, using the chain rule and substituting our expressions above for Z and h, we get:
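
Since the derivative of the sigmoid is φ'(z) = φ(z)(1 - φ(z)), each entry of the Jacobian is

\frac{\partial h_j(x)}{\partial x_i} = \phi'(Z_j) \, W_{ji} = h_j (1 - h_j) \, W_{ji}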

                             

The above derivation is similar to how we calculate gradients for backpropagation. Still, there is one significant difference: we treat h(X) as a vector-valued function, with each hidden unit as a separate output. Our primary aim is to calculate the norm, so we can simplify the implementation so that we do not need to construct a diagonal matrix:
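
Because h_j(1 - h_j) does not depend on i, the squared Frobenius norm factorizes, which is exactly the form the implementation below computes:

\|J_h(x)\|_F^2 = \sum_j \sum_i \big[h_j (1 - h_j)\big]^2 W_{ji}^2 = \sum_j \big[h_j (1 - h_j)\big]^2 \sum_i W_{ji}^2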

Implementation

Here is the pseudo-code for the contractive autoencoder's loss function.

Importing the libraries

import numpy as np
import tensorflow as tf

 

Defining the contractive loss

lam = 100  # regularization strength (the lambda in the penalty term)

def loss(x, x_bar):
    # 'model' is assumed to be a Keras model with a hidden layer named 'bottleneck'.
    # Reconstruction term: mean squared error over the flattened input.
    mse = tf.reduce_mean(tf.keras.losses.mse(x, x_bar))
    mse *= 28 * 28  # rescale, assuming 28 x 28 inputs flattened to 784 features

    # Weights of the bottleneck layer, transposed to shape (hidden_units, input_dim)
    W = tf.Variable(model.get_layer('bottleneck').get_weights()[0])
    W = tf.transpose(W)

    h = model.get_layer('bottleneck').output  # bottleneck is the hidden layer
    dh = h * (1 - h)  # derivative of the sigmoid activation

    # Contractive penalty: lambda * sum_j [h_j(1 - h_j)]^2 * sum_i W_ji^2
    contractive = lam * tf.reduce_sum(
        tf.linalg.matmul(dh ** 2, tf.square(W)), axis=1)

    total_loss = mse + contractive
    return total_loss
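
A minimal, hypothetical sketch of how this loss could be wired up is shown below; the layer sizes and names are assumptions, except that the hidden layer must be named 'bottleneck' so that model.get_layer('bottleneck') in the loss above can find it. Depending on the TensorFlow version, accessing a layer's symbolic output inside a custom loss may need to be adapted (e.g. by computing the penalty in a custom training step).

inputs = tf.keras.Input(shape=(784,))
hidden = tf.keras.layers.Dense(64, activation='sigmoid', name='bottleneck')(inputs)
outputs = tf.keras.layers.Dense(784, activation='sigmoid')(hidden)
model = tf.keras.Model(inputs, outputs)

model.compile(optimizer='adam', loss=loss)
# model.fit(x_train, x_train, epochs=10, batch_size=128)  # reconstruct the input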

 

Frequently Asked Questions

  1. What is the need for contractive autoencoders?
    A contractive autoencoder is considered an unsupervised deep learning technique. It helps a neural network to encode unlabeled training data. We use autoencoders to learn a representation, or encoding, for a set of unlabeled data. It is usually the first step towards dimensionality reduction or generating new data models.
     
  2. What are the different layers of autoencoders?
    An autoencoder consists of an input layer (the first layer), a hidden layer, and an output layer (the last layer). The objective of the network is for the output layer to be the same as the input layer.
     
  3. Are autoencoders unsupervised?
    Autoencoders come under the category of unsupervised learning algorithms, although they are generally considered self-supervised: for a training example x, the label is x itself. They are deemed unsupervised because there are no classification or regression labels.

Key Takeaways

Let us briefly recap the article.

Firstly, we saw what autoencoders are and what they are used for. Then we saw what contractive autoencoders are and what makes them different from other autoencoders. Furthermore, we looked at how contractive autoencoders work and the mathematics behind them. Lastly, we saw an implementation of the contractive autoencoder loss.

I hope you all like this article.

Happy Learning Ninjas!
