Modifying StarGAN V2 using Modulated Convolutions

In this tutorial we will replace the Adaptive Instance Normalization (AdaIN) layers in the StarGAN V2 model and train it on images at a resolution of 512x512 pixels.

Vitaliy Hramchenko
4 min read · Jul 12, 2021
Model results at 512px resolution

Why StarGAN V2?

Today, there are many models for generating high-quality images. For the attribute-swapping task specifically (as of 2021), the best quality is achieved by models that are further developments of StyleGAN, or obtained by distilling it, which requires significant computational time for training on new data domains. The proposed model generates images like those shown at the beginning of the article after 24 hours of training from scratch on a single Google Colab GPU.

StarGAN V2 Architecture

StarGAN V2[1] is an image-to-image model that transfers image style using AdaIN layers controlled by a conditional encoder. It handles information about the object's structure and its texture separately, which allows the user to obtain combined images.

The parts of StarGAN related to image generation are shown in the picture below. They include a ResNet-like[2] encoder (marked in green), a decoder with AdainResBlk modules, described below (purple), and a set of condition-dependent style-information encoders (gray-blue) with shared head layers (marked in turquoise).

StarGAN V2 inference

StarGAN works as follows. First, the style encoders extract low-level features from the image. Then the generator encodes the object's geometry information and passes it to the pyramid of AdainResBlk modules.

Each AdainResBlk block contains StyleGAN's Adaptive Instance Normalization (AdaIN) modules[3], which modulate the abstract geometric representation of the object with the information received from the style encoder.
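As a reminder, AdaIN normalizes each feature map with instance normalization and then re-scales and shifts it with parameters predicted from the style vector. A minimal PyTorch sketch of the idea (simplified relative to the module in core/model.py):

```python
import torch
import torch.nn as nn

class AdaIN(nn.Module):
    """Adaptive Instance Normalization: the style vector s predicts
    a per-channel scale (gamma) and shift (beta) that are applied
    after parameter-free instance normalization."""
    def __init__(self, style_dim, num_features):
        super().__init__()
        self.norm = nn.InstanceNorm2d(num_features, affine=False)
        self.fc = nn.Linear(style_dim, num_features * 2)

    def forward(self, x, s):
        h = self.fc(s)                          # (B, 2*C)
        h = h.view(h.size(0), h.size(1), 1, 1)  # (B, 2*C, 1, 1)
        gamma, beta = torch.chunk(h, chunks=2, dim=1)
        return (1 + gamma) * self.norm(x) + beta
```

Given features of shape (B, C, H, W) and a style vector of shape (B, style_dim), the output keeps the feature shape while its per-channel statistics are now driven by the style.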

AdainResBlk module structure

Let's start our project and replace AdaIN normalization with the modulated convolutions from StyleGAN 2[4].

StarGAN modification

First, we need the original StarGAN repo: git clone https://github.com/clovaai/stargan-v2.git.

The source code of AdainResBlk is located in the core/model.py file and is shown below.

Now we replace AdainResBlk with lucidrains' StyleGAN 2 modules[5]. Functionality similar to AdainResBlk is implemented in the GeneratorBlock class (file stylegan2_pytorch.py). Let's copy this class and its dependencies (Conv2DMod, Blur and RGBBlock) into our repo.
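The key piece here is the modulated convolution. Instead of normalizing activations, the style vector scales the convolution weights per input channel, and each output filter is then "demodulated" back to unit norm, as in StyleGAN 2. A simplified sketch of the idea (lucidrains' Conv2DMod adds weight initialization and more general padding on top of this):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModulatedConv2d(nn.Module):
    """Simplified StyleGAN 2-style modulated convolution: the style
    scales the shared weights per input channel (modulation), then
    each output filter is rescaled to unit L2 norm (demodulation)."""
    def __init__(self, in_ch, out_ch, kernel, demod=True, eps=1e-8):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, kernel, kernel))
        self.demod = demod
        self.eps = eps

    def forward(self, x, style):
        b, c, h, w = x.shape
        # Modulation: a per-sample, per-input-channel scale on the weights
        weight = self.weight[None] * (style[:, None, :, None, None] + 1)  # (B,O,I,k,k)
        if self.demod:
            # Demodulation: normalize each output filter's L2 norm
            d = torch.rsqrt((weight ** 2).sum(dim=(2, 3, 4), keepdim=True) + self.eps)
            weight = weight * d
        # Per-sample weights via a grouped convolution over the folded batch
        out_ch, kernel = weight.shape[1], weight.shape[-1]
        x = x.reshape(1, b * c, h, w)
        weight = weight.reshape(b * out_ch, c, kernel, kernel)
        x = F.conv2d(x, weight, padding=kernel // 2, groups=b)
        return x.reshape(b, out_ch, h, w)
```

The grouped-convolution trick lets every sample in the batch use its own style-scaled filter bank in a single conv2d call.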

Generator block with Modulated Convolutions

The final version of the generator’s block is shown below.

For simplicity, we will not change StyleGAN's original conception of two streams, a feature stream and an RGB image stream, so it is necessary to modify the generator's forward method.

Replace the following lines:

with this block of code:
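The essence of the change is that each decoder block now receives and returns both the feature tensor and the partially built RGB image, so the decode loop threads two streams instead of one. A sketch of the pattern with the block interface reduced to a stub (the real blocks are the GeneratorBlock modules copied earlier, which also upsample and use modulated convolutions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StubBlock(nn.Module):
    """Stand-in for GeneratorBlock: takes (features, rgb, style) and
    returns updated features plus an accumulated RGB image."""
    def __init__(self, dim, style_dim):
        super().__init__()
        self.conv = nn.Conv2d(dim, dim, 3, padding=1)
        self.to_style = nn.Linear(style_dim, dim)
        self.to_rgb = nn.Conv2d(dim, 3, 1)

    def forward(self, x, rgb, s):
        # Inject the style as a simple per-channel bias in this stub
        x = F.leaky_relu(self.conv(x) + self.to_style(s)[:, :, None, None], 0.2)
        new_rgb = self.to_rgb(x)
        return x, new_rgb if rgb is None else rgb + new_rgb

def decode(blocks, x, s):
    rgb = None                     # the RGB stream starts empty
    for block in blocks:
        x, rgb = block(x, rgb, s)  # thread both streams through each block
    return rgb                     # the generator now returns the RGB stream
```

Compared to the original loop, which only carried x through the AdainResBlk pyramid and converted to RGB at the end, the output image is now accumulated progressively.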

To avoid OOM on the test call, comment out the “latent-guided image synthesis” and “reference-guided image synthesis” blocks in the debug_image function (file utils.py).

Fake images pool

To train the model on 512x512 images, we have to reduce the batch size to 1. To stabilize the training process we will use a fake-image buffer (from the pytorch-CycleGAN-and-pix2pix repo), which allows us to update the discriminator's weights using a history of generated data rather than only the latest fake output.
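The buffer stores a history of generated images; with probability 0.5 the discriminator is shown a stored old fake instead of the current one. A sketch close to the ImagePool class from the pytorch-CycleGAN-and-pix2pix repo:

```python
import random
import torch

class ImagePool:
    """Buffer of previously generated images for discriminator updates."""
    def __init__(self, pool_size=50):
        self.pool_size = pool_size
        self.images = []

    def query(self, images):
        if self.pool_size == 0:            # buffer disabled: pass through
            return images
        out = []
        for image in images:
            image = image.detach().unsqueeze(0)
            if len(self.images) < self.pool_size:
                self.images.append(image)  # fill the buffer first
                out.append(image)
            elif random.random() > 0.5:
                idx = random.randrange(self.pool_size)
                old = self.images[idx].clone()  # return a stored old fake...
                self.images[idx] = image        # ...and keep the new one
                out.append(old)
            else:
                out.append(image)          # otherwise return the new fake
        return torch.cat(out, 0)
```

In the training loop, the discriminator loss is then computed on pool.query(fake_images) instead of the fresh fakes.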

Notes about Colab

If you train the model in the Colab environment, you can fix the step parameter in the _save_checkpoint and _load_checkpoint functions (in any case, Google Drive creates backups) and add lines to the train function that copy the current model to Drive:
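For example, after mounting Drive, a small helper like the following can be called from train() every few thousand steps (the paths and checkpoint name below are placeholders; adjust them to your setup):

```python
import os
import shutil

def backup_checkpoint(src, dst_dir):
    """Copy the current checkpoint file to a backup folder
    (e.g. a mounted Google Drive directory)."""
    os.makedirs(dst_dir, exist_ok=True)
    dst = os.path.join(dst_dir, os.path.basename(src))
    shutil.copyfile(src, dst)
    return dst

# In Colab, after drive.mount('/content/drive'), one would call e.g.:
# backup_checkpoint('expr/checkpoints/100000_nets_ema.ckpt',
#                   '/content/drive/MyDrive/stargan_backup')
```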

Model training

After putting AFHQ in the data/ folder, we can start training.

Training on 256x256 images can be started with:
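The exact flags depend on your setup; based on the official StarGAN v2 training command for AFHQ, the invocation looks roughly like:

```shell
python main.py --mode train --num_domains 3 --w_hpf 0 \
       --lambda_sty 1 --lambda_ds 2 --lambda_cyc 1 \
       --train_img_dir data/afhq/train --val_img_dir data/afhq/val
```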

Model output on 256x256 images

To train at 512x512px resolution, run:
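Presumably the same command with the image size raised and the batch size dropped to 1 (the --img_size and --batch_size flags exist in the original repo's main.py; the combination below is an assumption):

```shell
python main.py --mode train --num_domains 3 --w_hpf 0 \
       --img_size 512 --batch_size 1 \
       --lambda_sty 1 --lambda_ds 2 --lambda_cyc 1 \
       --train_img_dir data/afhq/train --val_img_dir data/afhq/val
```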

Model output on 512x512 images

The source code is available here.

References

  1. Yunjey Choi, Youngjung Uh, Jaejun Yoo, Jung-Woo Ha, StarGAN v2: Diverse Image Synthesis for Multiple Domains.
  2. Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, Deep Residual Learning for Image Recognition.
  3. Tero Karras, Samuli Laine, Timo Aila, A Style-Based Generator Architecture for Generative Adversarial Networks.
  4. Tero Karras, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, Timo Aila, Analyzing and Improving the Image Quality of StyleGAN.
  5. https://github.com/lucidrains/stylegan2-pytorch/
