Table of contents
1. Introduction
2. Implementation
2.1. Progressive Growing Custom Layers
2.2. Weighted Sum Layer
2.3. Minibatch Stdev
2.4. Pixel Normalization
2.5. Progressive Growing Discriminator Model
2.6. Progressive Growing Generator Model
2.7. Composite Models for Training the Generators
3. FAQs
4. Key Takeaways
Last Updated: Mar 27, 2024
Easy

Progressive Growing GAN - Part 1

Author: Soham Medewar

Introduction

Before moving on to the main topic, let us briefly discuss GANs. GANs (Generative Adversarial Networks) are a powerful family of neural networks used for unsupervised learning. A GAN is composed of two competing neural networks, a generator model and a discriminator model, which together can analyze, capture, and reproduce the variations within a dataset.

Progressive Growing GAN is also called ProGAN. The architecture was developed by NVIDIA in 2018. Basic GANs can generate images of size 32 × 32 (CIFAR-10) or 128 × 128 (ImageNet), whereas progressive growing GANs can generate images of size 1024 × 1024.

The approach starts with a very small image and gradually adds blocks of layers so that the output size of the generator model and the input size of the discriminator model grow until the required image size is reached. For example, training can progress through 4×4, 8×8, 16×16, and so on up to 1024×1024 images. This method has been shown to be very effective at producing high-quality synthetic images that are extremely lifelike.

 

Let us start implementing a progressive growing model.

Implementation

We will implement a ProGAN model that generates random faces of celebrities. In this model, we will use the dataset from this link.

We'll start by defining some custom layers that will be used to define the generator and discriminator models, and then we'll define functions to construct and grow the discriminator and generator models themselves.
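
All of the code below relies on standard Keras classes and utilities. A minimal setup sketch is shown here (assuming standalone Keras with a TensorFlow backend, as implied by the lr argument to Adam used later; with tf.keras, the imports would come from tensorflow.keras instead):

# imports assumed by the code in this tutorial (a sketch)
from keras import backend
from keras.models import Model, Sequential
from keras.layers import Input, Dense, Flatten, Reshape, Conv2D, UpSampling2D, AveragePooling2D, LeakyReLU, Add, Layer
from keras.initializers import RandomNormal
from keras.constraints import max_norm
from keras.optimizers import Adam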

Progressive Growing Custom Layers

To create the progressive growing generative adversarial network, three custom layers are required.

WeightedSum: During a growth phase, this layer is used to control the weighted sum of the outputs of the old and new layers.

MinibatchStdev: In the discriminator, this is used to summarise statistics for a batch of photos.

PixelNormalization: In the generator model, it's used to normalize activation maps.

In addition, the paper employs a weight constraint known as the "equalized learning rate". This, too, would require a custom layer. To keep this tutorial from getting too long, we won't use the equalized learning rate here and will instead use a simple max norm weight constraint.

Weighted Sum Layer

# outputs weighted sum
class WeightedSum(Add):
    # init with the default value
    def __init__(self, alpha=0.0, **kwargs):
        super(WeightedSum, self).__init__(**kwargs)
        self.alpha = backend.variable(alpha, name='ws_alpha')

    # outputs weighted sum of inputs
    def _merge_function(self, inputs):
        # supports weighted sum of only two inputs
        assert (len(inputs) == 2)
        # ((1-a) * input1) + (a * input2)
        output = ((1.0 - self.alpha) * inputs[0]) + (self.alpha * inputs[1])
        return output
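
The alpha attribute is annealed from 0.0 to 1.0 over the course of a fade-in phase during training (covered in the second part of this blog). Below is a minimal sketch of such an update helper; update_fadein is a hypothetical name, and it assumes the training loop tracks the current step:

# hypothetical helper: update the alpha value in every WeightedSum layer
# of the given models, moving linearly from 0 to 1 over n_steps
def update_fadein(models, step, n_steps):
    alpha = step / float(n_steps - 1)
    for model in models:
        for layer in model.layers:
            if isinstance(layer, WeightedSum):
                backend.set_value(layer.alpha, alpha)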

Minibatch Stdev

# mini-batch standard deviation layer
class MinibatchStdev(Layer):
    # initialising the layer
    def __init__(self, **kwargs):
        super(MinibatchStdev, self).__init__(**kwargs)

    # performing the operation
    def call(self, inputs):
        # calculating the mean value for each pixel position across the batch
        mn = backend.mean(inputs, axis=0, keepdims=True)
        # calculating the squared differences between the pixel values and the mean
        squared_differences = backend.square(inputs - mn)
        # calculating the average of the squared differences (variance)
        mean_square_differences = backend.mean(squared_differences, axis=0, keepdims=True)
        # adding a small value to avoid a blow-up when we are calculating stdev
        mean_square_differences += 1e-8
        # taking square root of the variance (stdev)
        stdev = backend.sqrt(mean_square_differences)
        # calculating the mean standard deviation across each pixel coord
        mean_pix = backend.mean(stdev, keepdims=True)
        # scaling this up to be the size of one input feature map for each sample
        shape = backend.shape(inputs)
        output = backend.tile(mean_pix, (shape[0], shape[1], shape[2], 1))
        # concatenating with the input feature maps
        combined = backend.concatenate([inputs, output], axis=-1)
        return combined
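
As a quick sanity check (a sketch, assuming the imports above), the layer should append exactly one extra feature map to its input:

# the output should have one more channel than the input: (None, 4, 4, 129)
inp = Input(shape=(4, 4, 128))
out = MinibatchStdev()(inp)
print(Model(inp, out).output_shape)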

Pixel Normalization

# pixel-wise feature vector normalization layer
class PixelNormalization(Layer):
    # initialising the layer
    def __init__(self, **kwargs):
        super(PixelNormalization, self).__init__(**kwargs)

    # performing the operation
    def call(self, inputs):
        # calculating square pixel values
        val = inputs**2.0
        # calculating the mean pixel values
        mean_val = backend.mean(val, axis=-1, keepdims=True)
        # ensuring the mean is not zero
        mean_val += 1.0e-8
        # calculating the sqrt of the mean squared value (L2 norm)
        l2 = backend.sqrt(mean_val)
        # normalizing values by the l2 norm
        normalized = inputs / l2
        return normalized
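
As a quick check (again a sketch under the same assumptions), the mean squared activation across channels should be approximately 1.0 after normalization:

import numpy as np
# build a tiny model around the layer and push random activations through it
inp = Input(shape=(4, 4, 128))
m = Model(inp, PixelNormalization()(inp))
y = m.predict(np.random.randn(8, 4, 4, 128).astype('float32'))
# per-position mean of squared values across channels, expected to be close to 1.0
print(np.square(y).mean(axis=-1)[0, 0, 0])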

Progressive Growing Discriminator Model

The discriminator model is a deep convolutional neural network that takes a 4×4 color image as input and predicts whether it is real or fake.

The first hidden layer is a 1×1 convolutional layer. The output block comprises a MinibatchStdev layer, 3×3 and 4×4 convolutional layers, and a fully connected layer that outputs the prediction, as sketched below. Leaky ReLU activation functions are used after all layers, while the output layer uses a linear activation function.
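
The base 4×4 model described above is not shown separately in this listing, so here is a minimal sketch of it. define_base_discriminator is a hypothetical helper name, the layer sizes are assumed to match the growth function below, and wasserstein_loss is the loss function discussed (and sketched) in the composite-models section:

# hypothetical helper: the base 4x4 discriminator described above
def define_base_discriminator(input_shape=(4, 4, 3)):
    init = RandomNormal(stddev=0.02)
    const = max_norm(1.0)
    in_image = Input(shape=input_shape)
    # conv 1x1 input processing layer
    d = Conv2D(128, (1,1), padding='same', kernel_initializer=init, kernel_constraint=const)(in_image)
    d = LeakyReLU(alpha=0.2)(d)
    # output block: minibatch stdev, conv 3x3, conv 4x4, dense prediction
    d = MinibatchStdev()(d)
    d = Conv2D(128, (3,3), padding='same', kernel_initializer=init, kernel_constraint=const)(d)
    d = LeakyReLU(alpha=0.2)(d)
    d = Conv2D(128, (4,4), padding='same', kernel_initializer=init, kernel_constraint=const)(d)
    d = LeakyReLU(alpha=0.2)(d)
    d = Flatten()(d)
    out_class = Dense(1)(d)
    model = Model(in_image, out_class)
    model.compile(loss=wasserstein_loss, optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
    return model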

# adding a discriminator block
def add_discriminator_block(old_model, n_input_layers=3):
    # initializing weights
    intl = RandomNormal(stddev=0.02)
    # weight constraint
    cnst = max_norm(1.0)
    # getting shape of existing model
    in_shape = list(old_model.input.shape)
    # defining new input shape as double the size
    i_s = (int(in_shape[-2])*2, int(in_shape[-2])*2, int(in_shape[-1]))
    in_img = Input(shape=i_s)
    # defining new input processing layer
    d = Conv2D(128, (1,1), padding='same', kernel_initializer=intl, kernel_constraint=cnst)(in_img)
    d = LeakyReLU(alpha=0.2)(d)
    # defining new block
    d = Conv2D(128, (3,3), padding='same', kernel_initializer=intl, kernel_constraint=cnst)(d)
    d = LeakyReLU(alpha=0.2)(d)
    d = Conv2D(128, (3,3), padding='same', kernel_initializer=intl, kernel_constraint=cnst)(d)
    d = LeakyReLU(alpha=0.2)(d)
    d = AveragePooling2D()(d)
    block_new = d
    # skipping the input layers of the old model
    for i in range(n_input_layers, len(old_model.layers)):
        d = old_model.layers[i](d)
    # defining straight-through model
    m1 = Model(in_img, d)
    # compiling model
    m1.compile(loss=wasserstein_loss, optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
    # downsampling the new larger image
    downsample = AveragePooling2D()(in_img)
    # connecting old input processing to the downsampled new input
    block_old = old_model.layers[1](downsample)
    block_old = old_model.layers[2](block_old)
    # fading in the output of the old model's input layer with the new block
    d = WeightedSum()([block_old, block_new])
    # skipping the input layers of the old model
    for i in range(n_input_layers, len(old_model.layers)):
        d = old_model.layers[i](d)
    # defining fade-in model
    m2 = Model(in_img, d)
    # compiling model
    m2.compile(loss=wasserstein_loss, optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
    return [m1, m2]
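
As with the generator below, the base model and the growth function can be combined into a builder that returns a list of paired models, one pair per resolution. This is a sketch (a hypothetical define_discriminator, assuming the define_base_discriminator helper sketched earlier):

# hypothetical builder: base 4x4 model plus n_blocks - 1 grown versions
def define_discriminator(n_blocks, input_shape=(4, 4, 3)):
    model_list = list()
    base = define_base_discriminator(input_shape)
    # the base model fills both the straight-through and fade-in slots
    model_list.append([base, base])
    for i in range(1, n_blocks):
        old_model = model_list[i - 1][0]
        model_list.append(add_discriminator_block(old_model))
    return model_list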

Progressive Growing Generator Model

The generator model takes a random point from the latent space as input and generates a synthetic image.

The generator models are defined in the same way as the discriminator models.

A base model for creating 4×4 images is constructed first, and grown versions of the model are then created for larger image output sizes.

# adding a generator block
def add_generator_block(old_model):
    # initializing weights
    init = RandomNormal(stddev=0.02)
    # weight constraint
    const = max_norm(1.0)
    # getting the end of the last block
    block_end = old_model.layers[-2].output
    # upsampling, and defining new block
    upsampling = UpSampling2D()(block_end)
    g = Conv2D(128, (3,3), padding='same', kernel_initializer=init, kernel_constraint=const)(upsampling)
    g = PixelNormalization()(g)
    g = LeakyReLU(alpha=0.2)(g)
    g = Conv2D(128, (3,3), padding='same', kernel_initializer=init, kernel_constraint=const)(g)
    g = PixelNormalization()(g)
    g = LeakyReLU(alpha=0.2)(g)
    # adding new output layer
    out_image = Conv2D(3, (1,1), padding='same', kernel_initializer=init, kernel_constraint=const)(g)
    # defining straight-through model
    model1 = Model(old_model.input, out_image)
    # getting the output layer from old model
    out_old = old_model.layers[-1]
    # connecting the upsample to the old output layer
    out_image2 = out_old(upsampling)
    # defining new output image as the weighted sum of the old and new models
    merged = WeightedSum()([out_image2, out_image])
    # defining fade-in model
    model2 = Model(old_model.input, merged)
    return [model1, model2]

# defining generator models
def define_generator(latent_dim, n_blocks, in_dim=4):
    # initializing weights
    init = RandomNormal(stddev=0.02)
    # weight constraint
    const = max_norm(1.0)
    model_list = list()
    # base model latent input
    in_latent = Input(shape=(latent_dim,))
    # linearly scaling up to activation maps
    g = Dense(128 * in_dim * in_dim, kernel_initializer=init, kernel_constraint=const)(in_latent)
    g = Reshape((in_dim, in_dim, 128))(g)
    # conv 4x4, input block
    g = Conv2D(128, (3,3), padding='same', kernel_initializer=init, kernel_constraint=const)(g)
    g = PixelNormalization()(g)
    g = LeakyReLU(alpha=0.2)(g)
    # conv 3x3
    g = Conv2D(128, (3,3), padding='same', kernel_initializer=init, kernel_constraint=const)(g)
    g = PixelNormalization()(g)
    g = LeakyReLU(alpha=0.2)(g)
    # conv 1x1, output block
    out_image = Conv2D(3, (1,1), padding='same', kernel_initializer=init, kernel_constraint=const)(g)
    # defining model
    model = Model(in_latent, out_image)
    # storing the base model (used for both straight-through and fade-in slots)
    model_list.append([model, model])
    # creating submodels
    for i in range(1, n_blocks):
        # getting the prior model without the fade-in
        old_model = model_list[i - 1][0]
        # creating new models for the next resolution
        models = add_generator_block(old_model)
        # storing models
        model_list.append(models)
    return model_list

Composite Models for Training the Generators

The generator models are not compiled because they are not trained directly. Instead, they are trained via the discriminator models using the Wasserstein loss. This involves presenting the generated images to the discriminator labeled as real images and calculating the loss, which is then used to update the generator models.

A given generator model must be paired with a given discriminator model both in terms of image size (e.g., 4×4 or 8×8) and training phase: the fade-in phase (where the new block is introduced) or the normal, straight-through phase.

This can be accomplished by building a new model for each pair that stacks the generator on top of the discriminator, so that a synthetic image passes straight into the discriminator model to be judged real or fake. This composite model can then be used to train the generator via the discriminator, with the discriminator's weights marked as not trainable (only in this composite model) to guarantee that they are not updated while the generator is being updated.
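
The wasserstein_loss function referenced in the compile() calls is not defined in this part of the tutorial. A minimal sketch of the implementation commonly used with this architecture (assuming real and fake images are labeled 1 and -1 respectively) is:

# Wasserstein loss: the mean of the label-weighted critic scores
def wasserstein_loss(y_true, y_pred):
    return backend.mean(y_true * y_pred)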

# defining composite models for training generators via discriminators
def define_composite(discriminators, generators):
    model_list = list()
    # creating composite models
    for i in range(len(discriminators)):
        g_models, d_models = generators[i], discriminators[i]
        # straight-through model
        d_models[0].trainable = False
        model1 = Sequential()
        model1.add(g_models[0])
        model1.add(d_models[0])
        model1.compile(loss=wasserstein_loss, optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
        # fade-in model
        d_models[1].trainable = False
        model2 = Sequential()
        model2.add(g_models[1])
        model2.add(d_models[1])
        model2.compile(loss=wasserstein_loss, optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
        # storing
        model_list.append([model1, model2])
    return model_list
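
Putting the pieces together, the builders can be used like this (a hedged usage sketch, assuming the define_discriminator helper sketched earlier, a 100-dimensional latent space, and 6 growth levels, i.e. 4×4 up to 128×128 output):

# build the paired model lists for every resolution level
n_blocks = 6
latent_dim = 100
d_models = define_discriminator(n_blocks)
g_models = define_generator(latent_dim, n_blocks)
gan_models = define_composite(d_models, g_models)
# each entry holds [straight-through model, fade-in model] for one resolution
print(len(g_models), len(d_models), len(gan_models))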

The training procedure for this model will be covered in the second part of this blog.



FAQs

1. Do GANs need a lot of data?

GAN models are data-hungry, requiring massive amounts of varied and high-quality training samples to create high-fidelity natural pictures of various categories.

 

2. What is a discriminator in GAN?

In a GAN, the discriminator is simply a classifier. It attempts to distinguish real data from data generated by the generator. It can use any network architecture suitable for the type of data it classifies.

 

3. What is proGAN?

The term "ProGAN" refers to a type of generative adversarial network that was invented at NVIDIA. Progressive Growing of GANs For Improved Quality, Stability, and Variation.

 

4. What is Wasserstein loss?

The Wasserstein loss function aims to maximize the gap between the critic's scores for real and generated images. The function is summarized in the paper as: Critic Loss = [average critic score on real images] − [average critic score on fake images].

Key Takeaways

In this article, we have discussed the architecture and model implementation of proGAN. The training part of the generator models is covered in the second part of this blog.



Happy Coding!
