**Introduction**

Before moving on to the main topic, let us briefly discuss GANs. GANs (Generative Adversarial Networks) are a powerful family of neural networks used for unsupervised learning. A GAN is composed of two competing neural networks, a generator and a discriminator, which learn to analyze, capture, and reproduce the variations within a dataset.

The Progressively Growing GAN is also called ProGAN. The architecture was developed by NVIDIA and published in 2018. Basic GANs can generate images of size 32 × 32 (CIFAR-10) or 128 × 128 (ImageNet); progressively growing GANs can generate images of size 1024 × 1024.

The idea is to start with a very small image and gradually add blocks of layers, so that the output size of the generator model and the input size of the discriminator model both grow until the required image size is reached. This approach has been shown to be very effective at producing high-quality synthetic images that are remarkably lifelike.
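During each growth phase, the old and new pathways are blended by a weight alpha that ramps from 0 to 1. A minimal sketch of this schedule (a common convention; the helper name is ours, not from the paper):

```python
# linear fade-in schedule used during a growth phase:
# the network blends the old pathway with weight (1 - alpha)
# and the new pathway with weight alpha
def fadein_alpha(step, n_steps):
    # ramps from 0.0 at the first step to 1.0 at the last step
    return step / float(n_steps - 1)

print([round(fadein_alpha(s, 5), 2) for s in range(5)])  # [0.0, 0.25, 0.5, 0.75, 1.0]
```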

Let us start implementing a progressive growing model.

**Implementation**

We will implement a ProGAN model that generates random faces of celebrities, using the dataset from this __link__.

We'll start by defining the custom layers used by the generator and discriminator models, and then define functions that construct and grow the two models themselves.

**Progressive Growing Custom Layers**

To create the progressive growing generative adversarial network, three custom layers are required.

**WeightedSum:** During a growth phase, this layer computes a weighted sum of the outputs of the old and new layers, controlled by a blending parameter alpha.

**MinibatchStdev:** In the discriminator, this layer summarizes the statistics of a batch of images.

**PixelNormalization:** In the generator model, it's used to normalize activation maps.

In addition, the paper employs a weight constraint known as the "equalized learning rate", which would also require a custom layer. To keep this tutorial manageable, we won't use the equalized learning rate; instead, we'll use a simple max-norm weight constraint.

**Weighted Sum Layer**

```
# outputs weighted sum
from keras import backend
from keras.layers import Add

class WeightedSum(Add):
    # init with the default value for alpha
    def __init__(self, alpha=0.0, **kwargs):
        super(WeightedSum, self).__init__(**kwargs)
        self.alpha = backend.variable(alpha, name='ws_alpha')

    # outputs the weighted sum of two inputs
    def _merge_function(self, inputs):
        # supports weighted sum of only two inputs
        assert (len(inputs) == 2)
        # ((1-a) * input1) + (a * input2)
        output = ((1.0 - self.alpha) * inputs[0]) + (self.alpha * inputs[1])
        return output
```
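To see exactly what the layer computes, here is the same blend in plain NumPy (illustrative only; the function name is ours):

```python
import numpy as np

# the blend computed by WeightedSum: ((1 - alpha) * old) + (alpha * new)
def weighted_sum(old, new, alpha):
    return (1.0 - alpha) * old + alpha * new

old = np.zeros((2, 2))
new = np.ones((2, 2))
print(weighted_sum(old, new, 0.25))  # every element is 0.25
```

At alpha = 0 the output is entirely the old pathway, and at alpha = 1 it is entirely the new one, which is what makes the fade-in smooth.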

**Minibatch Stdev**

```
# mini-batch standard deviation layer
from keras import backend
from keras.layers import Layer

class MinibatchStdev(Layer):
    # initialising the layer
    def __init__(self, **kwargs):
        super(MinibatchStdev, self).__init__(**kwargs)

    # performing the operation
    def call(self, inputs):
        # calculating the mean value for each pixel across the batch
        mean = backend.mean(inputs, axis=0, keepdims=True)
        # calculating the squared differences between the pixel values and the mean
        squared_differences = backend.square(inputs - mean)
        # calculating the average of the squared differences (variance)
        mean_square_differences = backend.mean(squared_differences, axis=0, keepdims=True)
        # adding a small value to avoid a blow-up when calculating the stdev
        mean_square_differences += 1e-8
        # taking the square root of the variance (stdev)
        stdev = backend.sqrt(mean_square_differences)
        # calculating the mean standard deviation across each pixel coord
        mean_pix = backend.mean(stdev, keepdims=True)
        # scaling this up to be the size of one input feature map for each sample
        shape = backend.shape(inputs)
        output = backend.tile(mean_pix, (shape[0], shape[1], shape[2], 1))
        # concatenating the statistic map with the input
        combined = backend.concatenate([inputs, output], axis=-1)
        return combined
```
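The layer's arithmetic is easier to follow in NumPy. This sketch (our own, mirroring the Keras code above) computes the average per-feature standard deviation across the batch and appends it as one extra channel:

```python
import numpy as np

def minibatch_stdev(x, eps=1e-8):
    # x has shape (batch, height, width, channels)
    mean = x.mean(axis=0, keepdims=True)
    stdev = np.sqrt(((x - mean) ** 2).mean(axis=0, keepdims=True) + eps)
    # a single scalar: the mean stdev over all positions and channels
    mean_pix = stdev.mean()
    # broadcast the scalar to one extra feature map per sample
    extra = np.full(x.shape[:3] + (1,), mean_pix)
    return np.concatenate([x, extra], axis=-1)

x = np.random.randn(4, 8, 8, 3)
print(minibatch_stdev(x).shape)  # (4, 8, 8, 4): one channel added
```

Because the extra channel carries a batch-wide statistic, the discriminator can detect generated batches with suspiciously little variety, which discourages mode collapse.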

**Pixel Normalization**

```
# pixel-wise feature vector normalization layer
from keras import backend
from keras.layers import Layer

class PixelNormalization(Layer):
    # initialising the layer
    def __init__(self, **kwargs):
        super(PixelNormalization, self).__init__(**kwargs)

    # performing the operation
    def call(self, inputs):
        # calculating squared pixel values
        values = inputs ** 2.0
        # calculating the mean squared value across the channels
        mean_values = backend.mean(values, axis=-1, keepdims=True)
        # ensuring the mean is not zero
        mean_values += 1.0e-8
        # calculating the sqrt of the mean squared value (L2 norm)
        l2 = backend.sqrt(mean_values)
        # normalizing values by the l2 norm
        normalized = inputs / l2
        return normalized
```
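Equivalently, in NumPy: each pixel's feature vector is divided by its root-mean-square across the channels, so after normalization the mean squared activation per pixel is approximately 1. A quick sanity check (the helper name is ours):

```python
import numpy as np

def pixel_norm(x, eps=1e-8):
    # divide each pixel's channel vector by its RMS across channels
    return x / np.sqrt((x ** 2).mean(axis=-1, keepdims=True) + eps)

np.random.seed(1)
x = np.random.randn(2, 4, 4, 8)
y = pixel_norm(x)
# after normalization the mean squared activation per pixel is ~1
print(np.allclose((y ** 2).mean(axis=-1), 1.0, atol=1e-4))  # True
```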

**Progressive Growing Discriminator Model**

The discriminator model is a deep convolutional neural network that takes a 4×4 color image as input and predicts whether it is real or fake.

The first hidden layer is a 1×1 convolutional layer. The output block comprises a MinibatchStdev layer, 3×3 and 4×4 convolutional layers, and a fully connected layer that outputs the prediction. Leaky ReLU activation functions are used after every layer except the output layer, which uses a linear activation.

```
# adding a discriminator block
from keras.models import Model
from keras.layers import Input, Conv2D, LeakyReLU, AveragePooling2D
from keras.initializers import RandomNormal
from keras.constraints import max_norm
from keras.optimizers import Adam

def add_discriminator_block(old_model, n_input_layers=3):
    # initializing weights
    init = RandomNormal(stddev=0.02)
    # weight constraint
    const = max_norm(1.0)
    # getting the shape of the existing model
    in_shape = list(old_model.input.shape)
    # defining the new input shape as double the size
    input_shape = (int(in_shape[-2]) * 2, int(in_shape[-2]) * 2, int(in_shape[-1]))
    in_image = Input(shape=input_shape)
    # defining the new input-processing layer
    d = Conv2D(128, (1,1), padding='same', kernel_initializer=init, kernel_constraint=const)(in_image)
    d = LeakyReLU(alpha=0.2)(d)
    # defining the new block
    d = Conv2D(128, (3,3), padding='same', kernel_initializer=init, kernel_constraint=const)(d)
    d = LeakyReLU(alpha=0.2)(d)
    d = Conv2D(128, (3,3), padding='same', kernel_initializer=init, kernel_constraint=const)(d)
    d = LeakyReLU(alpha=0.2)(d)
    d = AveragePooling2D()(d)
    block_new = d
    # appending the old model's layers, skipping its input block
    for i in range(n_input_layers, len(old_model.layers)):
        d = old_model.layers[i](d)
    # defining the straight-through model
    model1 = Model(in_image, d)
    # compiling the model (wasserstein_loss is assumed to be defined)
    model1.compile(loss=wasserstein_loss, optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
    # downsampling the new, larger image
    downsample = AveragePooling2D()(in_image)
    # connecting the old input-processing layers
    block_old = old_model.layers[1](downsample)
    block_old = old_model.layers[2](block_old)
    # fading in the output of the old model's input layers with the new block
    d = WeightedSum()([block_old, block_new])
    # appending the old model's layers, skipping its input block
    for i in range(n_input_layers, len(old_model.layers)):
        d = old_model.layers[i](d)
    # defining the fade-in model
    model2 = Model(in_image, d)
    # compiling the model
    model2.compile(loss=wasserstein_loss, optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
    return [model1, model2]
```
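The growth function above assumes an existing base 4×4 discriminator and a `wasserstein_loss` function, neither of which has been shown yet. Below is a minimal, self-contained sketch of both, following the architecture described earlier. The function name `define_base_discriminator` is ours, the MinibatchStdev layer is rewritten with raw TensorFlow ops, and the imports target `tensorflow.keras` so the snippet runs standalone; treat it as an illustrative baseline rather than the exact original.

```python
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Conv2D, LeakyReLU, Flatten, Dense, Layer
from tensorflow.keras.initializers import RandomNormal
from tensorflow.keras.constraints import max_norm
from tensorflow.keras.optimizers import Adam

# the Wasserstein loss referenced by the compile() calls in this article
def wasserstein_loss(y_true, y_pred):
    return tf.reduce_mean(y_true * y_pred)

# minibatch stdev layer, written with tf ops so this snippet stands alone
class MinibatchStdev(Layer):
    def call(self, inputs):
        mean = tf.reduce_mean(inputs, axis=0, keepdims=True)
        stdev = tf.sqrt(tf.reduce_mean(tf.square(inputs - mean), axis=0, keepdims=True) + 1e-8)
        # scalar mean stdev, tiled into one extra feature map per sample
        mean_pix = tf.reduce_mean(stdev, keepdims=True)
        shape = tf.shape(inputs)
        feature = tf.tile(mean_pix, (shape[0], shape[1], shape[2], 1))
        return tf.concat([inputs, feature], axis=-1)

def define_base_discriminator(input_shape=(4, 4, 3)):
    init = RandomNormal(stddev=0.02)
    const = max_norm(1.0)
    in_image = Input(shape=input_shape)
    # 1x1 conv input-processing layer
    d = Conv2D(128, (1, 1), padding='same', kernel_initializer=init, kernel_constraint=const)(in_image)
    d = LeakyReLU(0.2)(d)
    # output block: minibatch stdev, 3x3 and 4x4 convs, linear dense output
    d = MinibatchStdev()(d)
    d = Conv2D(128, (3, 3), padding='same', kernel_initializer=init, kernel_constraint=const)(d)
    d = LeakyReLU(0.2)(d)
    d = Conv2D(128, (4, 4), padding='same', kernel_initializer=init, kernel_constraint=const)(d)
    d = LeakyReLU(0.2)(d)
    d = Flatten()(d)
    out_class = Dense(1)(d)
    model = Model(in_image, out_class)
    model.compile(loss=wasserstein_loss,
                  optimizer=Adam(learning_rate=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
    return model
```

Calling `add_discriminator_block` repeatedly on this base model then yields the pairs of straight-through and fade-in models for each resolution.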

**Progressive Growing Generator Model**

The generator model generates a synthetic image using a random point from the latent space as input.

The generator models are defined in the same way as the discriminator models: a base model for generating 4×4 images is constructed first, and grown versions of the model are then created for larger output sizes.

```
# adding a generator block
from keras.models import Model
from keras.layers import Input, Dense, Reshape, Conv2D, UpSampling2D, LeakyReLU
from keras.initializers import RandomNormal
from keras.constraints import max_norm

def add_generator_block(old_model):
    # initializing weights
    init = RandomNormal(stddev=0.02)
    # weight constraint
    const = max_norm(1.0)
    # getting the end of the last block
    block_end = old_model.layers[-2].output
    # upsampling, and defining the new block
    upsampling = UpSampling2D()(block_end)
    g = Conv2D(128, (3,3), padding='same', kernel_initializer=init, kernel_constraint=const)(upsampling)
    g = PixelNormalization()(g)
    g = LeakyReLU(alpha=0.2)(g)
    g = Conv2D(128, (3,3), padding='same', kernel_initializer=init, kernel_constraint=const)(g)
    g = PixelNormalization()(g)
    g = LeakyReLU(alpha=0.2)(g)
    # adding the new output layer
    out_image = Conv2D(3, (1,1), padding='same', kernel_initializer=init, kernel_constraint=const)(g)
    # defining the straight-through model
    model1 = Model(old_model.input, out_image)
    # getting the output layer from the old model
    out_old = old_model.layers[-1]
    # connecting the upsampled output to the old output layer
    out_image2 = out_old(upsampling)
    # defining the new output image as the weighted sum of the old and new outputs
    merged = WeightedSum()([out_image2, out_image])
    # defining the fade-in model
    model2 = Model(old_model.input, merged)
    return [model1, model2]

# defining the generator models
def define_generator(latent_dim, n_blocks, in_dim=4):
    # initializing weights
    init = RandomNormal(stddev=0.02)
    # weight constraint
    const = max_norm(1.0)
    model_list = list()
    # base model latent input
    in_latent = Input(shape=(latent_dim,))
    # linearly scaling up to activation maps
    g = Dense(128 * in_dim * in_dim, kernel_initializer=init, kernel_constraint=const)(in_latent)
    g = Reshape((in_dim, in_dim, 128))(g)
    # conv 4x4, input block
    g = Conv2D(128, (3,3), padding='same', kernel_initializer=init, kernel_constraint=const)(g)
    g = PixelNormalization()(g)
    g = LeakyReLU(alpha=0.2)(g)
    # conv 3x3
    g = Conv2D(128, (3,3), padding='same', kernel_initializer=init, kernel_constraint=const)(g)
    g = PixelNormalization()(g)
    g = LeakyReLU(alpha=0.2)(g)
    # conv 1x1, output block
    out_image = Conv2D(3, (1,1), padding='same', kernel_initializer=init, kernel_constraint=const)(g)
    # defining the base model
    model = Model(in_latent, out_image)
    # storing the base model as both the straight-through and fade-in versions
    model_list.append([model, model])
    # creating submodels
    for i in range(1, n_blocks):
        # getting the prior straight-through model (without the fade-in)
        old_model = model_list[i - 1][0]
        # creating a new model pair for the next resolution
        models = add_generator_block(old_model)
        # storing the pair of models
        model_list.append(models)
    return model_list
```
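Each call to `add_generator_block` doubles the output resolution, so `n_blocks` determines the final image size. A quick sanity check of this relationship (the helper name is ours):

```python
# output resolution after n_blocks growth stages, starting from a 4x4 base
def output_resolution(n_blocks, in_dim=4):
    return in_dim * 2 ** (n_blocks - 1)

print([output_resolution(n) for n in (1, 2, 3, 9)])  # [4, 8, 16, 1024]
```

So reaching the paper's 1024 × 1024 images requires nine blocks in total, while a smaller experiment at 128 × 128 needs only six.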

**Composite Models for Training the Generators**

Because the generator models are not trained directly, they are not compiled. Instead, they are trained via the discriminator models using the Wasserstein loss: generated images are presented to the discriminator labeled as real, and the resulting loss is used to update the generator models.

A given generator model must be paired with a given discriminator model both in terms of image size (e.g., 4×4 or 8×8) and in terms of training phase: the growth phase, in which the new block is faded in, or the normal, straight-through phase.

This can be accomplished by building, for each pair of models, a new model that stacks the generator on top of the discriminator, so that the synthetic image passes straight into the discriminator to be judged real or fake. This composite model can then be used to train the generator via the discriminator, with the discriminator's weights marked as not trainable (only within this composite model) to guarantee that they are not updated during this step.

```
# defining composite models for training the generators via the discriminators
from keras.models import Sequential
from keras.optimizers import Adam

def define_composite(discriminators, generators):
    model_list = list()
    # creating a composite model for each pair
    for i in range(len(discriminators)):
        g_models, d_models = generators[i], discriminators[i]
        # straight-through model
        d_models[0].trainable = False
        model1 = Sequential()
        model1.add(g_models[0])
        model1.add(d_models[0])
        model1.compile(loss=wasserstein_loss, optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
        # fade-in model
        d_models[1].trainable = False
        model2 = Sequential()
        model2.add(g_models[1])
        model2.add(d_models[1])
        model2.compile(loss=wasserstein_loss, optimizer=Adam(lr=0.001, beta_1=0, beta_2=0.99, epsilon=10e-8))
        # storing the pair
        model_list.append([model1, model2])
    return model_list
```
```

The training procedure for this model will be covered separately on this blog.

Also read, __Sampling and Quantization__