**Introduction**

In this big-data era, leading social networking and e-commerce sites generate tons of data, and we live in a world of data abundance. Our machine learning algorithms have mainly exploited labeled datasets. Yet most of the data generated by these sites is unstructured and unlabeled, so it is high time the machine learning community focused on unsupervised learning algorithms to excel in AI.

To learn from unstructured data, we can use autoencoders. Autoencoders are learning models for unstructured data that use neural networks to perform representation learning. In machine learning, representation learning means embedding the components and features of the original data in a low-dimensional structure for better understanding, visualization, and extraction of useful information. These low-dimensional vectors give us insight into our data, such as how close two instances of the dataset are, or which structures and patterns are present in the dataset.

**Autoencoders**

Autoencoders, also known as self-encoders, are neural networks trained to reproduce their inputs. Strictly speaking, they are self-supervised: for a training example x, the label is x itself. They are nonetheless generally grouped with unsupervised learning algorithms because there are no classification or regression labels.

If an autoencoder does this perfectly, the output vector x_output equals x. An autoencoder has a two-part structure: an encoder, which maps x to a compressed code, and a decoder, which reconstructs x_output from that code.

AutoEncoder(x) = Decoder(Encoder(x))

The model is trained with a reconstruction loss that measures the difference between x and x_output. If the inputs are real-valued, we can define the reconstruction loss as MSE(x, x_output). Because the hidden layer between the encoder and the decoder usually has fewer units than the input, autoencoders are also called bottleneck neural networks: the bottleneck forces a compressed knowledge representation.
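A minimal NumPy sketch of the two-part structure and the MSE reconstruction loss may help. It assumes a single sigmoid hidden layer of 3 units for 8-dimensional inputs and randomly initialized, untrained weights; the names `encoder`, `decoder`, `W_enc`, and `W_dec` are illustrative, not taken from any library.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 8, 3  # bottleneck: fewer hidden units than inputs

# Illustrative, untrained parameters
W_enc = rng.normal(scale=0.1, size=(n_hidden, n_in))
b_enc = np.zeros(n_hidden)
W_dec = rng.normal(scale=0.1, size=(n_in, n_hidden))
b_dec = np.zeros(n_in)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def encoder(x):
    # x: (batch, n_in) -> compressed code h: (batch, n_hidden)
    return sigmoid(x @ W_enc.T + b_enc)

def decoder(h):
    # h: (batch, n_hidden) -> reconstruction: (batch, n_in)
    return h @ W_dec.T + b_dec

def reconstruction_loss(x, x_output):
    # MSE(x, x_output), averaged over the batch and the features
    return np.mean((x - x_output) ** 2)

x = rng.normal(size=(4, n_in))
x_output = decoder(encoder(x))  # Decoder(Encoder(x))
loss = reconstruction_loss(x, x_output)
print(x_output.shape, loss)
```

Training would adjust the four parameter arrays to drive this loss down; the sketch only shows the forward pass and the loss.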

**Contractive Autoencoders**

A contractive autoencoder is an unsupervised deep learning technique that helps a neural network encode unlabeled training data. The idea behind it is to make the autoencoder robust to small changes in the training dataset.

We use autoencoders to learn a representation, or encoding, for a set of unlabeled data. This is usually the first step towards dimensionality reduction or generating new data models. A contractive autoencoder aims to learn representations that are invariant to unimportant transformations of the given data.

**Working of Contractive Autoencoders**

A contractive autoencoder is less sensitive to slight variations in the training dataset. We achieve this by adding a penalty term, or regularizer, to whatever cost or objective function the algorithm is trying to minimize. This penalty reduces the learned representation's sensitivity to the training input. The regularizer is the squared Frobenius norm of the Jacobian matrix of the encoder activations with respect to the input.

If this value is zero, the learned hidden representation does not change as we change the input values. If the value is huge, the learned representation changes sharply even for small changes in the input, which makes the model unstable.
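To see why the size of the Jacobian matters, the sketch below nudges an input by a small vector δ and compares the change in the hidden representation with the first-order prediction J δ, whose length can never exceed ‖J‖_F ‖δ‖. It assumes a single sigmoid encoder layer with random weights; `W`, `b`, `encoder`, and `jacobian` are illustrative names.

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hidden = 5, 3
W = rng.normal(size=(n_hidden, n_in))  # illustrative encoder weights
b = rng.normal(size=n_hidden)

def encoder(x):
    # Sigmoid hidden layer: h = sigmoid(W x + b)
    return 1.0 / (1.0 + np.exp(-(W @ x + b)))

def jacobian(x):
    # J[j, i] = dh_j / dx_i = h_j (1 - h_j) W[j, i] for a sigmoid layer
    h = encoder(x)
    return (h * (1 - h))[:, None] * W

x = rng.normal(size=n_in)
delta = 1e-4 * rng.normal(size=n_in)  # a small perturbation of the input

actual_change = encoder(x + delta) - encoder(x)
predicted_change = jacobian(x) @ delta  # first-order (linear) approximation
bound = np.linalg.norm(jacobian(x), "fro") * np.linalg.norm(delta)

print(np.linalg.norm(actual_change), np.linalg.norm(predicted_change), bound)
```

A small Frobenius norm therefore caps how far the representation can move for a small input change, which is exactly what the contractive penalty rewards.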

Contractive autoencoders are often employed as one of several autoencoder nodes, becoming active only when the other encoding schemes fail to label a data point.

**Mathematics Behind Contractive Autoencoders**

Before deriving the contractive autoencoder, we need two essential concepts: the Jacobian matrix and the Frobenius norm. In vector calculus, the Jacobian matrix of a vector-valued function of several variables is the matrix of all its first-order partial derivatives. When this matrix is square, i.e., when the function takes as many input variables as its output has components, its determinant is called the Jacobian determinant.
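As a concrete illustration, the Jacobian can be approximated numerically with central differences. The helper `numerical_jacobian` and the example function f(x, y) = (x²y, 5x + sin y) are hypothetical, chosen so the result can be checked against the analytic Jacobian [[2xy, x²], [5, cos y]].

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    """Approximate J[j, i] = df_j / dx_i with central differences."""
    x = np.asarray(x, dtype=float)
    f0 = np.asarray(f(x))
    J = np.zeros((f0.size, x.size))
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        J[:, i] = (np.asarray(f(x + step)) - np.asarray(f(x - step))) / (2 * eps)
    return J

# Example: f(x, y) = (x^2 * y, 5x + sin(y)) maps R^2 -> R^2
def f(v):
    x, y = v
    return np.array([x**2 * y, 5 * x + np.sin(y)])

point = np.array([1.0, 2.0])
J = numerical_jacobian(f, point)
# Analytic Jacobian at (1, 2): [[2xy, x^2], [5, cos(y)]] = [[4, 1], [5, cos(2)]]
print(J)
print(np.linalg.det(J))  # J is square here, so the Jacobian determinant is defined
```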

Just as we compute the magnitude of a vector to measure how large it is, we compute a norm to gauge how large the elements of a matrix are. There are various ways to define a matrix norm; the one we need here is the Frobenius norm.

The Frobenius norm, also known as the Euclidean norm, of an m x n matrix A is defined as the square root of the sum of the absolute squares of its elements:

$$\|A\|_F = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2}$$

To make the autoencoder robust to small changes in the training dataset, we add a penalty term to its loss function. We define the loss function as:

$$\mathcal{L}(x) = \mathrm{MSE}(x, x_{\text{output}}) + \lambda \, \|J_h(x)\|_F^2$$

The added penalty term is given by

$$\|J_h(x)\|_F^2 = \sum_{i,j} \left( \frac{\partial h_j(x)}{\partial x_i} \right)^2$$

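The penalty term, $\lambda \|J_h(x)\|_F^2$, is the squared Frobenius norm of $J_h(x)$, the Jacobian matrix of partial derivatives of the hidden representation $h(x)$ with respect to the input $x$, scaled by the hyperparameter $\lambda$.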
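A quick check of the Frobenius norm, computed once from its definition and once with NumPy's `np.linalg.norm(A, "fro")`. The 2 x 3 matrix `A` is an arbitrary example; the squared norm, which is what the penalty uses, is 1 + 4 + 9 + 16 + 0 + 1 = 31.

```python
import numpy as np

A = np.array([[1.0, -2.0, 3.0],
              [4.0, 0.0, -1.0]])  # an m x n = 2 x 3 matrix

# Definition: square root of the sum of the absolute squares of the elements
fro_manual = np.sqrt(np.sum(np.abs(A) ** 2))

# NumPy's built-in Frobenius norm
fro_numpy = np.linalg.norm(A, "fro")

print(fro_manual, fro_numpy)  # both equal sqrt(31)
```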

To evaluate the penalty term, we first need the Jacobian matrix of the hidden layer. Calculating the Jacobian of the hidden layer with respect to the input is similar to a gradient calculation. Let's first write down the hidden layer:

$$h_j = \phi(Z_j), \qquad Z_j = \sum_i W_{ji} \, x_i$$

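Here $\phi$ is the sigmoid nonlinearity and $Z_j$ is the j-th hidden unit's dot product of the input features $x_i$ with the corresponding weights $W_{ji}$. Applying the chain rule to these expressions for Z and h, and using $\phi'(z) = \phi(z)(1 - \phi(z))$, we get:

$$\frac{\partial h_j}{\partial x_i} = \phi'(Z_j) \, W_{ji} = h_j (1 - h_j) \, W_{ji}$$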
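The chain-rule result can be sanity-checked numerically. This sketch assumes a random 4-input, 3-unit sigmoid layer (`W`, `b`, and `hidden` are illustrative names) and includes a bias term, which shifts $Z_j$ but does not change the derivative. It builds the analytic Jacobian $\mathrm{diag}(h \odot (1-h))\,W$ and compares it with central differences.

```python
import numpy as np

rng = np.random.default_rng(2)
n_in, n_hidden = 4, 3
W = rng.normal(size=(n_hidden, n_in))  # W[j, i]: weight from feature i to hidden unit j
b = rng.normal(size=n_hidden)          # illustrative bias (does not affect dh/dx)

def hidden(x):
    z = W @ x + b                     # Z_j = sum_i W[j, i] * x_i + b_j
    return 1.0 / (1.0 + np.exp(-z))   # h_j = sigmoid(Z_j)

def analytic_jacobian(x):
    # Chain rule: dh_j/dx_i = sigmoid'(Z_j) * W[j, i] = h_j (1 - h_j) W[j, i]
    h = hidden(x)
    return np.diag(h * (1 - h)) @ W

def numerical_jacobian(x, eps=1e-6):
    # Central differences, one input feature at a time
    cols = []
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        cols.append((hidden(x + step) - hidden(x - step)) / (2 * eps))
    return np.stack(cols, axis=1)

x = rng.normal(size=n_in)
print(np.max(np.abs(analytic_jacobian(x) - numerical_jacobian(x))))
```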

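The above method is similar to how we calculate gradients for gradient descent, with one significant difference: we treat h(x) as a vector-valued function and differentiate each of its components as a separate output. In matrix form, $J_h(x) = \mathrm{diag}\big(h \odot (1 - h)\big) \, W$. Since our primary aim is the norm, we can simplify the implementation so that we don't need to construct the diagonal matrix:

$$\|J_h(x)\|_F^2 = \sum_j \sum_i \big(h_j (1 - h_j)\big)^2 W_{ji}^2 = \sum_j \big(h_j (1 - h_j)\big)^2 \sum_i W_{ji}^2$$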
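The simplification can be verified on a batch: the sketch computes the squared Frobenius norm once by building each per-sample Jacobian explicitly and once with the diagonal-free formula, written as a single matrix product over the batch (the same shape of computation as the `tf.linalg.matmul(dh**2, ...)` line in the implementation). The sizes and random weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
batch, n_in, n_hidden = 5, 6, 4
W = rng.normal(size=(n_hidden, n_in))   # W[j, i], illustrative weights
b = rng.normal(size=n_hidden)
X = rng.normal(size=(batch, n_in))

H = 1.0 / (1.0 + np.exp(-(X @ W.T + b)))  # sigmoid hidden activations, (batch, n_hidden)
dH = H * (1 - H)                           # h_j (1 - h_j) for every sample

# Explicit route: build J = diag(dh) W per sample, then take its squared Frobenius norm
explicit = np.array([np.linalg.norm(np.diag(d) @ W, "fro") ** 2 for d in dH])

# Simplified route: sum_j dh_j^2 * sum_i W[j, i]^2, no diagonal matrix needed
row_norms_sq = np.sum(W ** 2, axis=1)      # sum_i W[j, i]^2, shape (n_hidden,)
simplified = (dH ** 2) @ row_norms_sq      # one penalty value per sample, shape (batch,)

print(np.allclose(explicit, simplified))
```

The simplified route costs one matrix-vector product per batch instead of one n_hidden x n_in matrix per sample.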

**Implementation**

Here is a TensorFlow implementation of the loss for contractive autoencoders.

Importing the libraries

```python
import numpy as np
import tensorflow as tf
```

Defining the contractive loss

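```python
lam = 100  # lambda: weight of the contractive penalty

def loss(x, x_bar, h, W):
    """Reconstruction MSE plus lam * ||J_h(x)||_F^2 for a sigmoid hidden layer.

    x, x_bar: (batch, 784) inputs and reconstructions of 28 x 28 images
    h: (batch, n_hidden) sigmoid outputs of the 'bottleneck' (hidden) layer,
       from the same forward pass as x_bar
    W: kernel of the 'bottleneck' Dense layer, shape (784, n_hidden), e.g.
       model.get_layer('bottleneck').kernel (a live variable, so the penalty
       stays differentiable with respect to the weights)
    """
    mse = tf.reduce_mean(tf.keras.losses.mse(x, x_bar))
    mse *= 28 * 28  # scale the per-pixel mean up to a per-image sum
    W = tf.transpose(W)  # (n_hidden, 784), so W[j, i] matches the derivation
    dh = h * (1 - h)  # sigmoid derivative h_j (1 - h_j)
    # sum_j dh_j^2 * sum_i W_ji^2, computed without building diag(dh)
    contractive = lam * tf.reduce_sum(
        tf.linalg.matmul(dh**2, tf.square(W)), axis=1)
    total_loss = mse + contractive
    return total_loss
```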