Introduction
The main applications of deep neural networks are in supervised learning, such as classification or regression. But Generative Adversarial Networks (GANs), which use neural networks for generative modelling, have a fundamentally different goal. Generative modelling is a machine learning activity that involves automatically identifying and learning the regularities or patterns in the input data such that the model may be used to produce new examples that might have been reasonably derived from the original dataset.
This blog provides information on how to generate human faces using GAN, as well as the details of the generator, discriminator, and training of the GAN model.
Without further ado, let's get started.
What is GAN?
A Generative Adversarial Network (GAN) is the technique behind several kinds of generative AI applications, such as artistic style transfer, face swapping, natural voice generation and music synthesis. A GAN approaches learning differently from other kinds of neural networks: its framework uses two networks, a Generator and a Discriminator, that "compete" against one another to produce the intended outcome. The Discriminator's job is to tell real images from fake ones, while the Generator's job is to produce fake images that look realistic. When both are performing well, the result is images that look like real-life photos.
Let's look at the details of the working of GAN.
Working of GAN
By pitting two neural networks against each other, a GAN can learn the probability distribution of a dataset.
The Generator is a neural network that creates new data instances, whereas the Discriminator is a neural network that decides whether each instance it reviews actually belongs to the training dataset. The generator produces new, fake images and passes them to the discriminator, hoping that they will be accepted as authentic even though they are fake. In this description, the fake image is produced from a 100-dimensional noise sample drawn uniformly between -1.0 and 1.0 using transposed convolutions, which upsample their input and are often loosely described as the reverse of convolution (they are not a true mathematical inverse). Note that the generator code later in this article uses a 32-dimensional latent vector instead.
The steps a GAN takes are as follows:
The generator outputs an image after receiving random numbers.
The discriminator receives this created image in addition to a stream of photos from the real, ground-truth dataset.
The discriminator takes in both real and fake images and returns a probability between 0 and 1 for each, with 1 representing a prediction of authenticity and 0 representing a fake.
Consequently, you have two feedback loops:
The discriminator is in a feedback loop with the images' known ground truth.
The discriminator and the generator are connected in a feedback loop.
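The two feedback loops can be made concrete with binary cross-entropy, the loss that the discriminator code later in this article is compiled with. The following is a minimal sketch in plain Python rather than the article's training code; the function names and the sample probabilities are made up for the illustration.

```python
import math

def bce(prob, label):
    """Binary cross-entropy for one prediction; label is 1 (real) or 0 (fake)."""
    eps = 1e-7  # keep log() away from zero
    prob = min(max(prob, eps), 1 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1 - prob))

def discriminator_loss(real_probs, fake_probs):
    """Loop 1: the discriminator is scored against the known ground truth."""
    losses = [bce(p, 1) for p in real_probs] + [bce(p, 0) for p in fake_probs]
    return sum(losses) / len(losses)

def generator_loss(fake_probs):
    """Loop 2: the generator is rewarded when its fakes are scored as real."""
    return sum(bce(p, 1) for p in fake_probs) / len(fake_probs)

# Hypothetical discriminator outputs for three real and three generated images.
real_probs = [0.9, 0.8, 0.95]
fake_probs = [0.1, 0.3, 0.2]

print("discriminator loss:", round(discriminator_loss(real_probs, fake_probs), 4))
print("generator loss:", round(generator_loss(fake_probs), 4))
```

With these made-up numbers the discriminator is doing well, so its loss is low while the generator's loss is high. The generator can only lower its loss by producing images that push the discriminator's output towards 1.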
Let's look into the details of Generator Network.
Generator Network
The Generator Network takes random noise as its input and processes it with a differentiable function (a neural network) to give it a recognisable structure that resembles the images in the training dataset. The choice of input noise determines the Generator's output, so running the Generator on different noise samples produces a variety of realistic output images.
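To illustrate how the chosen noise determines the output, here is a toy sketch in NumPy. The "generator" is only a fixed random linear map followed by tanh, standing in for a trained network. The names `sample_noise` and `toy_generator`, the 8 x 8 output size and the weight initialisation are assumptions made for this example; the 100-dimensional uniform noise follows the description given earlier.

```python
import numpy as np

LATENT_DIM = 100      # noise size mentioned in the text; the Keras code later uses 32
OUT_SHAPE = (8, 8)    # tiny stand-in for an image, chosen for the example

rng = np.random.default_rng(0)
# Fixed weights play the role of a trained generator's parameters.
weights = rng.normal(0.0, 0.1, size=(LATENT_DIM, OUT_SHAPE[0] * OUT_SHAPE[1]))

def sample_noise(batch_size, rng):
    """Draw noise vectors uniformly from [-1, 1)."""
    return rng.uniform(-1.0, 1.0, size=(batch_size, LATENT_DIM))

def toy_generator(noise):
    """Deterministic, differentiable map from noise to 'images' in [-1, 1]."""
    flat = np.tanh(noise @ weights)
    return flat.reshape(-1, *OUT_SHAPE)

noise = sample_noise(2, rng)
images = toy_generator(noise)

print(images.shape)
# The same noise always gives the same output ...
print(np.allclose(toy_generator(noise), images))
# ... while two different noise vectors give two different outputs.
print(np.allclose(images[0], images[1]))
```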
Let's look into the details of Discriminator Network.
Discriminator Network
A discriminator network is a basic classifier that outputs the probability that an image is real. During training, it is shown real images from the training set half the time and fake images from the Generator the other half. Its goal is to assign real photos a probability close to 1 and fake images a probability close to 0.
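A half-real, half-fake training batch could be assembled along these lines. This is a sketch under assumptions: `make_discriminator_batch` is a hypothetical helper, and random arrays stand in for the real photos and the generator's output.

```python
import numpy as np

def make_discriminator_batch(real_images, fake_images, rng):
    """Stack real and generated images with labels 1 (real) and 0 (fake)."""
    images = np.concatenate([real_images, fake_images], axis=0)
    labels = np.concatenate([
        np.ones((len(real_images), 1)),
        np.zeros((len(fake_images), 1)),
    ])
    order = rng.permutation(len(images))  # shuffle so the classes are interleaved
    return images[order], labels[order]

rng = np.random.default_rng(0)
# Placeholder arrays standing in for real photos and generator output.
real = rng.uniform(0.0, 1.0, size=(4, 128, 128, 3))
fake = rng.uniform(0.0, 1.0, size=(4, 128, 128, 3))

batch_images, batch_labels = make_discriminator_batch(real, fake, rng)
print(batch_images.shape, batch_labels.shape)
print("fraction real:", batch_labels.mean())
```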
Dataset
This dataset is well suited to training and testing face detection models, particularly for recognising facial attributes such as brown hair, a smile or glasses. The images cover large pose variations, background clutter and a wide range of people, and they come with detailed annotations. The directory name img_align_celeba used in the loading code below suggests that this is the CelebA (CelebFaces Attributes) dataset.
You can download the dataset from Kaggle. Our goal is to develop a model that can produce realistic images of people who do not exist in reality.
Let's look at the code of loading the dataset.
Loading Dataset
The following code loads the dataset:
Code:
from tqdm import tqdm
import numpy as np
import os
from PIL import Image
from matplotlib import pyplot as plt

PIC_DIR = './drive/img_align_celeba/'
IMG_COUNT = 10000   # number of images to load
ORG_WIDTH = 178
ORG_HEIGHT = 208
diff = (ORG_HEIGHT - ORG_WIDTH) // 2
WIDTH = 128
HEIGHT = 128

# Crop box (left, upper, right, lower) that cuts a square out of each image
crop_rect = (0, diff, ORG_WIDTH, ORG_HEIGHT - diff)

images = []
for pic_file in tqdm(os.listdir(PIC_DIR)[:IMG_COUNT]):
    pic = Image.open(PIC_DIR + pic_file).crop(crop_rect)
    # Image.ANTIALIAS was removed in Pillow 10; LANCZOS is the same filter
    pic.thumbnail((WIDTH, HEIGHT), Image.LANCZOS)
    images.append(np.asarray(pic, dtype=np.uint8))

# Normalize the images to the range [0, 1]
images = np.array(images) / 255
print(images.shape)

# Plot the first 25 images
plt.figure(1, figsize=(10, 10))
for i in range(25):
    plt.subplot(5, 5, i + 1)
    plt.imshow(images[i])
    plt.axis('off')
plt.show()
Let's look into the details of creating a Generator.
Creating a Generator
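The generator plays the opposite role: it is the artist attempting to deceive the discriminator. We begin by feeding the input, referred to as gen_input, a latent vector of size L_DIM, into a dense layer whose output is reshaped into a 16 x 16 feature map with 128 channels. This is followed by seven convolutional layers: one ordinary convolution, three transposed convolutions with stride 2 that upsample the map from 16 x 16 to 128 x 128, and three more ordinary convolutions. Every layer except the last is followed by a leaky ReLU; the code as listed does not include batch normalisation. The final convolution uses the tanh activation function and outputs an image with CHANNELS (3) colour channels.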
Code:
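# Imports assume the Keras API bundled with TensorFlow
from tensorflow.keras.layers import (Input, Dense, Reshape, LeakyReLU,
                                     Conv2D, Conv2DTranspose)
from tensorflow.keras.models import Model

L_DIM = 32      # size of the latent (noise) vector
CHANNELS = 3    # RGB output

def create_generator():
    gen_input = Input(shape=(L_DIM, ))

    # Project the latent vector and reshape it into a 16 x 16 x 128 feature map
    y = Dense(128 * 16 * 16)(gen_input)
    y = LeakyReLU()(y)
    y = Reshape((16, 16, 128))(y)

    y = Conv2D(256, 5, padding='same')(y)
    y = LeakyReLU()(y)

    # Each stride-2 transposed convolution doubles the spatial size: 16 -> 32 -> 64 -> 128
    y = Conv2DTranspose(256, 4, strides=2, padding='same')(y)
    y = LeakyReLU()(y)
    y = Conv2DTranspose(256, 4, strides=2, padding='same')(y)
    y = LeakyReLU()(y)
    y = Conv2DTranspose(256, 4, strides=2, padding='same')(y)
    y = LeakyReLU()(y)

    y = Conv2D(512, 5, padding='same')(y)
    y = LeakyReLU()(y)
    y = Conv2D(512, 5, padding='same')(y)
    y = LeakyReLU()(y)

    # tanh keeps the output pixel values in [-1, 1]
    y = Conv2D(CHANNELS, 7, activation='tanh', padding='same')(y)

    generator = Model(gen_input, y)
    return generator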
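The spatial sizes inside create_generator() can be checked with a little arithmetic, without building the model. The sketch below assumes the usual rules for 'same' padding: an ordinary convolution keeps the size divided by the stride (rounded up), and a transposed convolution multiplies the size by the stride. The helper names are made up for the example.

```python
def same_conv_out(size, stride=1):
    """Output size of a 'same'-padded convolution: ceil(size / stride)."""
    return -(-size // stride)

def same_transposed_conv_out(size, stride):
    """Output size of a 'same'-padded transposed convolution: size * stride."""
    return size * stride

# (layer kind, filters, stride) in the order used by create_generator().
layers = [
    ("conv", 256, 1),
    ("transposed", 256, 2),
    ("transposed", 256, 2),
    ("transposed", 256, 2),
    ("conv", 512, 1),
    ("conv", 512, 1),
    ("conv", 3, 1),
]

size = 16  # spatial size after Reshape((16, 16, 128))
shapes = []
for kind, filters, stride in layers:
    if kind == "conv":
        size = same_conv_out(size, stride)
    else:
        size = same_transposed_conv_out(size, stride)
    shapes.append((size, size, filters))
    print(f"{kind:<10} -> {size} x {size} x {filters}")
```

The last entry is 128 x 128 x 3, which matches the HEIGHT, WIDTH and CHANNELS that the discriminator expects as input.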
Let's look into the details of creating a Discriminator.
Creating a Discriminator
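Similar to the generator network, the discriminator network is made up of convolutional layers. Each of its five convolutions is followed by a leaky ReLU, and the last four use a stride of 2 to shrink the feature map. The result is flattened, passed through dropout (rate 0.4) to reduce overfitting, and fed to a single sigmoid unit that outputs the probability that the image is real. The code as listed does not include batch normalisation.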
Code:
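# Imports assume the Keras API bundled with TensorFlow
from tensorflow.keras.layers import (Input, Conv2D, LeakyReLU, Flatten,
                                     Dropout, Dense)
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import RMSprop

def create_discriminator():
    disc_input = Input(shape=(HEIGHT, WIDTH, CHANNELS))

    # Five unpadded convolutions; the strided ones roughly halve the feature map
    x = Conv2D(256, 3)(disc_input)
    x = LeakyReLU()(x)
    x = Conv2D(256, 4, strides=2)(x)
    x = LeakyReLU()(x)
    x = Conv2D(256, 4, strides=2)(x)
    x = LeakyReLU()(x)
    x = Conv2D(256, 4, strides=2)(x)
    x = LeakyReLU()(x)
    x = Conv2D(256, 4, strides=2)(x)
    x = LeakyReLU()(x)

    # Classifier head: one sigmoid unit giving the probability that the image is real
    x = Flatten()(x)
    x = Dropout(0.4)(x)
    x = Dense(1, activation='sigmoid')(x)

    discriminator = Model(disc_input, x)

    optimizer = RMSprop(
        learning_rate=.0001,  # named lr in older Keras releases
        clipvalue=1.0,
        # decay is only accepted by older Keras optimizers;
        # newer releases expect a learning-rate schedule instead
        decay=1e-8
    )
    discriminator.compile(
        optimizer=optimizer,
        loss='binary_crossentropy'
    )
    return discriminator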
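Because the Conv2D layers in create_discriminator() do not set a padding argument, Keras applies its default of no padding ('valid'), so every layer shrinks the feature map. The following sketch traces the sizes using the standard formula for unpadded convolutions; the helper name is made up for the example.

```python
def valid_conv_out(size, kernel, stride=1):
    """Output size of a convolution with no padding ('valid')."""
    return (size - kernel) // stride + 1

# (kernel size, stride) for the five Conv2D layers in create_discriminator().
conv_layers = [(3, 1), (4, 2), (4, 2), (4, 2), (4, 2)]
FILTERS = 256

size = 128  # HEIGHT and WIDTH of the input image
sizes = []
for kernel, stride in conv_layers:
    size = valid_conv_out(size, kernel, stride)
    sizes.append(size)
    print(f"kernel {kernel}, stride {stride} -> {size} x {size} x {FILTERS}")

flattened = size * size * FILTERS
print("features entering the Dense layer:", flattened)
```

The feature map shrinks from 128 to 6 pixels per side, so the final Dense layer sees 6 * 6 * 256 = 9216 features.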
Training the GAN Model
The generator and the discriminator can then be combined into a single larger GAN model. This larger model is used to train the generator's weights, using the output and the error computed by the discriminator. Because the discriminator is trained separately, its weights are marked as not trainable inside the larger model, which guarantees that only the generator's weights are updated. This change to the trainability of the discriminator's weights applies only to the combined GAN model; it does not affect the discriminator when it is used on its own.
The input to the larger GAN model is a point in the latent space. The generator turns this point into an image, which is passed to the discriminator, which in turn decides whether the image is real or fake.
Since the Discriminator's output is a sigmoid, the loss is calculated using binary cross-entropy. In this setup, RMSProp as an optimizer produces more convincing fake images than Adam. The learning rate is 0.0001. In the later stages of training, learning is stabilised by learning rate decay and gradient value clipping (the decay and clipvalue arguments in the code above). If you change the learning rate, you should adjust the decay as well.
A GAN tries to replicate a probability distribution. We should therefore use loss functions that reflect the distance between the distribution of the data generated by the GAN and the distribution of the real data.
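The alternating scheme described here (train the discriminator, then train the generator through a frozen discriminator) can be sketched without any deep learning framework. The example below is a deliberately tiny one-dimensional analogue in NumPy, not the article's Keras training loop: the "generator" is G(z) = a * z + b, the "discriminator" is D(x) = sigmoid(w * x + c), the gradients are written out by hand, and the "real data" is drawn from a normal distribution. All names and numbers are assumptions made for the illustration. In Keras, the freezing step is usually expressed by setting the discriminator's trainable attribute to False before compiling the combined model.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def train_discriminator(d, g, real, noise, lr):
    """Update only the discriminator on a half-real, half-fake batch."""
    fake = g["a"] * noise + g["b"]
    x = np.concatenate([real, fake])
    y = np.concatenate([np.ones_like(real), np.zeros_like(fake)])
    err = sigmoid(d["w"] * x + d["c"]) - y   # gradient of BCE w.r.t. the logit
    return {"w": d["w"] - lr * np.mean(err * x),
            "c": d["c"] - lr * np.mean(err)}

def train_generator(d, g, noise, lr):
    """Update only the generator; the discriminator is read but never changed."""
    fake = g["a"] * noise + g["b"]
    err = sigmoid(d["w"] * fake + d["c"]) - 1.0   # target label is "real"
    grad_fake = err * d["w"]                      # error passed back through D
    return {"a": g["a"] - lr * np.mean(grad_fake * noise),
            "b": g["b"] - lr * np.mean(grad_fake)}

rng = np.random.default_rng(0)
d = {"w": 0.1, "c": 0.0}    # discriminator parameters
g = {"a": 1.0, "b": 0.0}    # generator parameters

for step in range(200):
    real = rng.normal(4.0, 0.5, size=64)      # stand-in "real data"
    noise = rng.uniform(-1.0, 1.0, size=64)   # points in the latent space
    d = train_discriminator(d, g, real, noise, lr=0.05)
    d_before = dict(d)
    g = train_generator(d, g, noise, lr=0.05)
    assert d == d_before                      # the generator step left D untouched

print("generator parameters:", g)
print("discriminator parameters:", d)
```

The important detail is in train_generator: the discriminator's parameters are used to compute the error signal, but only a and b are updated, which mirrors marking the discriminator as not trainable inside the combined model.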
As the generator improves with training, the discriminator's performance gets worse, because it becomes harder to tell real images from fake ones. If the generator succeeds perfectly, the discriminator has 50% accuracy; in effect, it flips a coin to make its prediction. The discriminator's feedback therefore becomes less meaningful over time, which makes it harder for the GAN as a whole to converge. If training continues past the point where the discriminator is giving completely random feedback, the generator starts to train on junk feedback, and the quality of its output may collapse.
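The 50% figure can be reproduced with a small simulation. In this sketch the discriminator's scores are simply drawn at random; the score ranges and the `accuracy` helper are hypothetical and only serve to contrast a discriminator that separates the two classes with one that cannot.

```python
import random

def accuracy(real_scores, fake_scores, threshold=0.5):
    """Fraction of images the discriminator labels correctly."""
    correct = sum(s >= threshold for s in real_scores)
    correct += sum(s < threshold for s in fake_scores)
    return correct / (len(real_scores) + len(fake_scores))

random.seed(0)
n = 5000

# Early in training (hypothetical): real and fake scores are well separated.
early = accuracy([random.uniform(0.6, 1.0) for _ in range(n)],
                 [random.uniform(0.0, 0.4) for _ in range(n)])

# Perfect generator (hypothetical): both score sets come from the same distribution.
late = accuracy([random.uniform(0.0, 1.0) for _ in range(n)],
                [random.uniform(0.0, 1.0) for _ in range(n)])

print("accuracy with separable scores:", early)
print("accuracy with indistinguishable scores:", late)
```

When the scores for real and generated images follow the same distribution, the accuracy hovers around 0.5 no matter where the threshold is placed, so the discriminator's output carries no useful signal for the generator.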
The software employs a generative adversarial network (GAN) approach, in which two neural networks engage in a cat-and-mouse game, with one trying to create false images that are indistinguishable from actual photographs and the other trying to recognise the difference.
What is the difference between Generator and Discriminator?
The generator creates new data from inputs that are simply random variables, whereas the discriminator is a classifier that tries to distinguish "real" input from "generated" input.
Is GAN used for model generation?
In a broader sense, GANs are a model architecture for training generative models, and deep learning models are frequently used in this architecture.
Conclusion
In this article, we have discussed human face generation using a GAN, including how a GAN works, the details of the generator and the discriminator, and how the GAN model is trained.