What are Autoencoders?
An autoencoder is a feed-forward neural network in which the input and the output are the same. An autoencoder encodes an image into a compressed representation and then decodes it to reconstruct the same image. The core idea of autoencoders is that the middle layer must contain enough information to represent the input.
There are three important properties of autoencoders:
1. Data Specific: We can only use an autoencoder on the kind of data it has been trained on. For instance, to encode an MNIST digit image, we have to use an autoencoder that was trained on the MNIST digits dataset.
2. Lossy: Information is lost while encoding and decoding the images using autoencoders, which means that the reconstructed image will have some missing details compared to the original image.
3. Unsupervised: Autoencoders belong to the unsupervised machine learning category because we do not require explicit labels for the data; the data itself acts as both input and output.

Caption: Architecture of an Autoencoder
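To make the architecture concrete, here is a minimal sketch of a dense autoencoder in Keras. The framework choice, the 784-dimensional input (a flattened 28x28 MNIST image), and the 32-dimensional code size are illustrative assumptions, not prescribed by the text.

```python
# Minimal dense autoencoder sketch (assumes TensorFlow/Keras is installed).
import numpy as np
from tensorflow.keras import layers, models

input_dim = 784   # flattened 28x28 image (illustrative)
code_dim = 32     # size of the bottleneck "middle layer" (illustrative)

inputs = layers.Input(shape=(input_dim,))
# Encoder: compress the input into the code
code = layers.Dense(code_dim, activation="relu")(inputs)
# Decoder: reconstruct the input from the code
outputs = layers.Dense(input_dim, activation="sigmoid")(code)

autoencoder = models.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")

# The data itself acts as both input and target, so no explicit labels are needed.
# Random data stands in for MNIST digits here.
x = np.random.rand(1000, input_dim).astype("float32")
autoencoder.fit(x, x, epochs=1, batch_size=64, verbose=0)
```

Note that the model is trained with the same array as input and target, which is what makes the setup unsupervised in the sense described above.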
Sparse Autoencoder
Sparse autoencoders are a particularly useful type of autoencoder. The idea behind them is that we can achieve an information bottleneck (representing the same information with fewer active neurons) without reducing the number of neurons in the hidden layers. In fact, the hidden layer can have more neurons than the input layer.
We achieve this by imposing a sparsity constraint during learning. Under the sparsity constraint, only a small percentage of nodes can be active in a hidden layer at any time: neurons whose output is close to 1 are active, whereas neurons whose output is close to 0 are inactive.
More specifically, we add a penalty to the loss function so that only a few neurons are active in a layer. This forces the autoencoder to represent the input information with a small number of active neurons rather than by shrinking the layer. As a result, we can even increase the code size, because only a few neurons in that layer are active for any given input.
Caption: Sparse Autoencoder
Source: www.medium.com
In sparse autoencoders, we apply L1 regularization or a KL-divergence penalty to the hidden activations so that the network learns useful features from the input.
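Below is a sketch of the L1 variant in Keras, continuing the earlier example. The hidden layer is deliberately larger than the input, and an L1 activity regularizer keeps most of its activations near zero; the 1024-unit width and the 1e-5 penalty weight are illustrative assumptions.

```python
# Sparse autoencoder sketch: sparsity, not layer size, provides the bottleneck.
from tensorflow.keras import layers, models, regularizers

input_dim = 784
hidden_dim = 1024  # larger than the input (illustrative)

inputs = layers.Input(shape=(input_dim,))
# The L1 activity regularizer adds 1e-5 * sum(|activations|) to the loss,
# pushing most hidden units toward zero (inactive) for any given input.
hidden = layers.Dense(
    hidden_dim,
    activation="relu",
    activity_regularizer=regularizers.l1(1e-5),
)(inputs)
outputs = layers.Dense(input_dim, activation="sigmoid")(hidden)

sparse_autoencoder = models.Model(inputs, outputs)
sparse_autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
```

The KL-divergence alternative works the same way conceptually: instead of penalizing the magnitude of activations directly, it penalizes the gap between the average activation of each hidden unit and a small target value.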