Modifying StarGAN V2 using Modulated Convolutions

Model results at 512px resolution

Why StarGAN V2?

Today, there are many models for generating high-quality images. For the attribute-swapping task specifically (as of 2021), the best quality comes from models that extend StyleGAN or are obtained by distilling it, which requires significant compute time for training on new data domains. The model proposed here generates images like those shown at the beginning of the article after 24 hours of training from scratch on a single Google Colab GPU.

StarGAN V2 Architecture

StarGAN V2 [1] is an image-to-image model that transfers image style using AdaIN layers driven by a conditional style encoder. It handles information about the object's structure and its texture separately, which allows the user to combine structure from one image with style from another.

The parts of StarGAN related to image generation are shown in the picture below. They include the ResNet-like [2] encoder (marked in green), the decoder built from AdainResBlk modules (described below, marked in purple), and a set of condition-dependent style encoders (gray-blue) with shared head layers (marked in turquoise).

StarGAN V2 inference

StarGAN works as follows. First, the style encoder extracts low-level style features from the image. Then the generator encodes the object's geometry and passes it to the pyramid of AdainResBlk modules.

Each AdainResBlk block contains StyleGAN's Adaptive Instance Normalization (AdaIN) modules [3], which modulate the object's abstract geometric representation with the information received from the style encoder.
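Concretely, AdaIN instance-normalizes the content features and then re-scales and re-shifts each channel using values derived from the style code. A minimal functional sketch (the adain name and the explicit gamma/beta arguments are illustrative, not the repo's API):

    import torch

    def adain(x, gamma, beta, eps=1e-5):
        # x: content features (N, C, H, W); gamma, beta: style-derived (N, C)
        mu = x.mean(dim=(2, 3), keepdim=True)
        sigma = x.std(dim=(2, 3), keepdim=True)
        x_norm = (x - mu) / (sigma + eps)  # instance normalization
        return gamma[:, :, None, None] * x_norm + beta[:, :, None, None]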

AdainResBlk module structure

Let's start our project: replacing AdaIN normalization with the modulated convolutions from StyleGAN 2 [4].

StarGAN modification

First, we need the original StarGAN V2 repo; assuming the official clovaai release (the paper authors' implementation), clone it:
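    git clone https://github.com/clovaai/stargan-v2.git
    cd stargan-v2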

The source code of AdainResBlk is located in the core/model.py file. The code is shown below.
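For reference, the block looks roughly like this, together with the AdaIN module it depends on (reproduced from the upstream repository; consult core/model.py for the exact version):

    import math
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class AdaIN(nn.Module):
        # maps a style code s to per-channel scale/shift of normalized features
        def __init__(self, style_dim, num_features):
            super().__init__()
            self.norm = nn.InstanceNorm2d(num_features, affine=False)
            self.fc = nn.Linear(style_dim, num_features * 2)

        def forward(self, x, s):
            h = self.fc(s).view(s.size(0), -1, 1, 1)
            gamma, beta = torch.chunk(h, chunks=2, dim=1)
            return (1 + gamma) * self.norm(x) + beta

    class AdainResBlk(nn.Module):
        # residual block whose convolutions are preceded by style-driven AdaIN
        def __init__(self, dim_in, dim_out, style_dim=64, w_hpf=0,
                     actv=nn.LeakyReLU(0.2), upsample=False):
            super().__init__()
            self.w_hpf = w_hpf
            self.actv = actv
            self.upsample = upsample
            self.learned_sc = dim_in != dim_out
            self.conv1 = nn.Conv2d(dim_in, dim_out, 3, 1, 1)
            self.conv2 = nn.Conv2d(dim_out, dim_out, 3, 1, 1)
            self.norm1 = AdaIN(style_dim, dim_in)
            self.norm2 = AdaIN(style_dim, dim_out)
            if self.learned_sc:
                self.conv1x1 = nn.Conv2d(dim_in, dim_out, 1, 1, 0, bias=False)

        def _shortcut(self, x):
            if self.upsample:
                x = F.interpolate(x, scale_factor=2, mode='nearest')
            if self.learned_sc:
                x = self.conv1x1(x)
            return x

        def _residual(self, x, s):
            x = self.actv(self.norm1(x, s))
            if self.upsample:
                x = F.interpolate(x, scale_factor=2, mode='nearest')
            x = self.conv1(x)
            x = self.actv(self.norm2(x, s))
            x = self.conv2(x)
            return x

        def forward(self, x, s):
            out = self._residual(x, s)
            if self.w_hpf == 0:
                out = (out + self._shortcut(x)) / math.sqrt(2)
            return out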

Now we replace AdainResBlk with modules from lucidrains' StyleGAN 2 implementation [5]. Functionality similar to AdainResBlk is implemented in the GeneratorBlock class (file stylegan2_pytorch/stylegan2_pytorch.py). Let's copy this class and its dependencies (Conv2DMod, Blur, and RGBBlock) to our repo.
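The key dependency is Conv2DMod, which implements StyleGAN 2's weight modulation and demodulation: the style vector scales the convolution kernel per input channel, and each output filter is then renormalized. A condensed sketch of its logic (see the upstream file for the exact code):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Conv2DMod(nn.Module):
        def __init__(self, in_chan, out_chan, kernel, demod=True, eps=1e-8):
            super().__init__()
            self.filters = out_chan
            self.demod = demod
            self.kernel = kernel
            self.eps = eps
            self.weight = nn.Parameter(torch.randn(out_chan, in_chan, kernel, kernel))
            nn.init.kaiming_normal_(self.weight, a=0, mode='fan_in',
                                    nonlinearity='leaky_relu')

        def forward(self, x, y):
            b, c, h, w = x.shape
            # modulate: scale the kernel per sample and per input channel
            weights = self.weight[None, :, :, :, :] * (y[:, None, :, None, None] + 1)
            if self.demod:
                # demodulate: normalize each output filter to unit norm
                d = torch.rsqrt((weights ** 2).sum(dim=(2, 3, 4), keepdim=True) + self.eps)
                weights = weights * d
            # grouped convolution applies a distinct kernel set to each sample
            x = x.reshape(1, -1, h, w)
            weights = weights.reshape(b * self.filters, c, self.kernel, self.kernel)
            x = F.conv2d(x, weights, padding=self.kernel // 2, groups=b)
            return x.reshape(-1, self.filters, h, w)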

Generator block with Modulated Convolutions

The final version of the generator’s block is shown below.
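A possible version, adapting lucidrains' GeneratorBlock to take StarGAN's style code s directly (the class name ModulatedResBlk and the omission of StyleGAN's noise injection are assumptions for this sketch; Conv2DMod and RGBBlock are the classes copied above):

    import torch.nn as nn

    class ModulatedResBlk(nn.Module):
        # replaces AdainResBlk: two modulated convolutions plus an RGB branch
        def __init__(self, dim_in, dim_out, style_dim=64,
                     upsample=False, upsample_rgb=True):
            super().__init__()
            self.upsample = nn.Upsample(scale_factor=2) if upsample else None
            self.to_style1 = nn.Linear(style_dim, dim_in)
            self.conv1 = Conv2DMod(dim_in, dim_out, 3)
            self.to_style2 = nn.Linear(style_dim, dim_out)
            self.conv2 = Conv2DMod(dim_out, dim_out, 3)
            self.actv = nn.LeakyReLU(0.2)
            self.to_rgb = RGBBlock(style_dim, dim_out, upsample_rgb)

        def forward(self, x, prev_rgb, s):
            if self.upsample is not None:
                x = self.upsample(x)
            x = self.actv(self.conv1(x, self.to_style1(s)))
            x = self.actv(self.conv2(x, self.to_style2(s)))
            # accumulate this scale's RGB output into the running image
            rgb = self.to_rgb(x, prev_rgb, s)
            return x, rgb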

For simplicity, we will keep StyleGAN's original two-stream design (a feature stream and an RGB image stream), so we need to modify the generator's forward method.

Replace the lines of the decoding loop (shown roughly as they appear in the upstream core/model.py, with the mask logic omitted):
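    # core/model.py, Generator.forward (decoding part)
    for block in self.decode:
        x = block(x, s)
    return self.to_rgb(x)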

with the following block, which threads an RGB stream through the decoder:
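    # carry the RGB stream alongside the feature stream (sketch)
    rgb = None
    for block in self.decode:
        x, rgb = block(x, rgb, s)
    return rgb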

To avoid running out of memory (OOM) during the test call, comment out the “latent-guided image synthesis” and “reference-guided image synthesis” blocks in the debug_image function (file core/utils.py).
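In the upstream core/utils.py, those blocks end in calls to helper functions, so commenting out the calls is enough (a sketch; the names follow the upstream repo):

    # core/utils.py, inside debug_image

    # latent-guided image synthesis: commented out to save memory
    # translate_using_latent(nets, args, x_src, y_trg_list, z_trg_list,
    #                        psi, filename)

    # reference-guided image synthesis: commented out to save memory
    # translate_using_reference(nets, args, x_src, x_ref, y_ref, filename)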

Fake images pool

To train the model on 512x512 images we'll have to reduce the batch size to 1. To stabilize the training process, we use the fake image buffer (from the pytorch-CycleGAN-and-pix2pix repo), which lets us update the discriminator's weights using a history of generated images rather than only the latest fake output.
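The buffer is the ImagePool class from that repo's util/image_pool.py; its logic, condensed:

    import random
    import torch

    class ImagePool:
        # stores up to pool_size generated images; query() sometimes swaps a
        # new fake for a stored one, so the discriminator also sees samples
        # produced by earlier generator states
        def __init__(self, pool_size=50):
            self.pool_size = pool_size
            self.num_imgs = 0
            self.images = []

        def query(self, images):
            if self.pool_size == 0:  # pool disabled: pass images through
                return images
            return_images = []
            for image in images:
                image = torch.unsqueeze(image.data, 0)
                if self.num_imgs < self.pool_size:
                    # pool not full yet: store and return the new image
                    self.num_imgs += 1
                    self.images.append(image)
                    return_images.append(image)
                elif random.uniform(0, 1) > 0.5:
                    # return a random stored image, replace it with the new one
                    idx = random.randint(0, self.pool_size - 1)
                    tmp = self.images[idx].clone()
                    self.images[idx] = image
                    return_images.append(tmp)
                else:
                    return_images.append(image)
            return torch.cat(return_images, 0)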

Notes about Colab

If you train the model in the Colab environment, you can fix the step parameter in the _save_checkpoint and _load_checkpoint functions, so a single checkpoint file is overwritten (in any case, Google Drive keeps backups), and add lines to the train function that copy the current model to Drive:
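A possible version (the destination folder is a hypothetical example; assumes Google Drive is mounted at /content/drive):

    import shutil

    # inside the training loop, next to the periodic checkpoint save
    if (i + 1) % args.save_every == 0:
        self._save_checkpoint(step=i + 1)
        shutil.copytree('expr/checkpoints',
                        '/content/drive/MyDrive/stargan_backup',
                        dirs_exist_ok=True)  # requires Python 3.8+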

Model training

After putting the AFHQ dataset in the data/ folder, we can start training.

Training on 256x256 images can be started with a command along the lines of the official repository's AFHQ example (flag names as in main.py):
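    python main.py --mode train --num_domains 3 --w_hpf 0 \
                   --lambda_reg 1 --lambda_sty 1 --lambda_ds 2 --lambda_cyc 1 \
                   --img_size 256 \
                   --train_img_dir data/afhq/train --val_img_dir data/afhq/val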

Model output on 256x256 images

To train at 512x512px resolution, run the same command with the resolution raised and, as discussed above, the batch size reduced to 1:
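    python main.py --mode train --num_domains 3 --w_hpf 0 \
                   --lambda_reg 1 --lambda_sty 1 --lambda_ds 2 --lambda_cyc 1 \
                   --img_size 512 --batch_size 1 \
                   --train_img_dir data/afhq/train --val_img_dir data/afhq/val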

Model output on 512x512 images

The source code is available here.


  1. Yunjey Choi, Youngjung Uh, Jaejun Yoo, Jung-Woo Ha, StarGAN v2: Diverse Image Synthesis for Multiple Domains.
  2. Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, Deep Residual Learning for Image Recognition.
  3. Tero Karras, Samuli Laine, Timo Aila, A Style-Based Generator Architecture for Generative Adversarial Networks.
  4. Tero Karras, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, Timo Aila, Analyzing and Improving the Image Quality of StyleGAN.
  5. lucidrains, stylegan2-pytorch, https://github.com/lucidrains/stylegan2-pytorch.



