Modifying StarGAN V2 using Modulated Convolutions
In this tutorial we will replace the Adaptive Instance Normalization (AdaIN) layers in the StarGAN V2 model with modulated convolutions and train the model on images at a resolution of 512x512 pixels.
Why StarGAN V2?
Today there are many models for generating high-quality images. For the attribute-swapping task specifically (as of 2021), the best quality comes from models that either extend StyleGAN or are obtained by distilling it, which requires significant compute to train on new data domains. The model proposed here generates images like those shown at the beginning of the article after 24 hours of training from scratch on a single Google Colab GPU.
StarGAN V2 Architecture
StarGAN V2 is an image-to-image model that transfers image style using AdaIN layers driven by a conditional encoder. It handles information about the object's structure and its texture separately, which allows the user to obtain combined images.
The parts of StarGAN related to image generation are shown in the picture below. They include a ResNet-like encoder (marked in green), a decoder built from AdainResBlk modules, described below (purple), and a set of condition-dependent style encoders (gray-blue) with shared head layers (turquoise).
StarGAN works as follows. First, the style encoders extract low-level features from the image. Then the generator encodes the object's geometry and feeds it to the pyramid of AdainResBlk modules.
Each AdainResBlk block contains StyleGAN's Adaptive Instance Normalization (AdaIN) modules, which modulate the abstract geometric representation of the object with the information received from the style encoder.
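For reference, the AdaIN layer itself is tiny. The sketch below is paraphrased from memory of the version in StarGAN v2's core/model.py, so check the repo for the exact code:

```python
import torch
import torch.nn as nn

class AdaIN(nn.Module):
    """Adaptive Instance Normalization: the style vector s predicts a
    per-channel scale (gamma) and shift (beta) that re-style the
    instance-normalized feature map."""
    def __init__(self, style_dim, num_features):
        super().__init__()
        self.norm = nn.InstanceNorm2d(num_features, affine=False)
        self.fc = nn.Linear(style_dim, num_features * 2)

    def forward(self, x, s):
        h = self.fc(s)                          # (B, 2*C)
        h = h.view(h.size(0), h.size(1), 1, 1)  # (B, 2*C, 1, 1)
        gamma, beta = torch.chunk(h, chunks=2, dim=1)
        return (1 + gamma) * self.norm(x) + beta
```

Note that normalization statistics are computed per image and per channel, so the style fully determines the output scale and shift regardless of the batch.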
Let’s start our project to replace AdaIN normalization with modulated convolutions from StyleGAN 2.
First, we need the original StarGAN repo: `git clone https://github.com/clovaai/stargan-v2.git`.
The source code of AdainResBlk is located in the core/model.py file and is shown below.
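In case the embedded gist does not render, here is a condensed, self-contained sketch of AdainResBlk, paraphrased from core/model.py; the original additionally supports a high-pass-filter weight (w_hpf), omitted here for brevity:

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaIN(nn.Module):
    """Style vector s predicts per-channel scale and shift for the
    instance-normalized feature map."""
    def __init__(self, style_dim, num_features):
        super().__init__()
        self.norm = nn.InstanceNorm2d(num_features, affine=False)
        self.fc = nn.Linear(style_dim, num_features * 2)

    def forward(self, x, s):
        h = self.fc(s).view(s.size(0), -1, 1, 1)
        gamma, beta = torch.chunk(h, chunks=2, dim=1)
        return (1 + gamma) * self.norm(x) + beta

class AdainResBlk(nn.Module):
    """Residual block whose normalizations are style-driven AdaINs."""
    def __init__(self, dim_in, dim_out, style_dim=64, upsample=False):
        super().__init__()
        self.upsample = upsample
        self.learned_sc = dim_in != dim_out  # 1x1 conv if channels change
        self.actv = nn.LeakyReLU(0.2)
        self.conv1 = nn.Conv2d(dim_in, dim_out, 3, 1, 1)
        self.conv2 = nn.Conv2d(dim_out, dim_out, 3, 1, 1)
        self.norm1 = AdaIN(style_dim, dim_in)
        self.norm2 = AdaIN(style_dim, dim_out)
        if self.learned_sc:
            self.conv1x1 = nn.Conv2d(dim_in, dim_out, 1, 1, 0, bias=False)

    def _shortcut(self, x):
        if self.upsample:
            x = F.interpolate(x, scale_factor=2, mode='nearest')
        if self.learned_sc:
            x = self.conv1x1(x)
        return x

    def _residual(self, x, s):
        x = self.actv(self.norm1(x, s))
        if self.upsample:
            x = F.interpolate(x, scale_factor=2, mode='nearest')
        x = self.conv1(x)
        x = self.actv(self.norm2(x, s))
        x = self.conv2(x)
        return x

    def forward(self, x, s):
        # unit-variance residual sum, as in the original implementation
        return (self._residual(x, s) + self._shortcut(x)) / math.sqrt(2)
```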
Now we replace AdainResBlk with lucidrains' StyleGAN 2 modules. Functionality similar to AdainResBlk is implemented in the GeneratorBlock class (file stylegan2_pytorch.py). Let's copy this class and its dependencies — Conv2DMod, Blur, and RGBBlock — into our repo.
The final version of the generator’s block is shown below.
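To illustrate the idea, here is a simplified, self-contained sketch: a stride-1 Conv2DMod following lucidrains' implementation, and a hypothetical replacement block (ModResBlk is an illustrative name, not the repo's) that modulates convolution weights per sample instead of normalizing activations, while carrying an RGB stream as StyleGAN 2 blocks do. Consult stylegan2_pytorch.py for the originals:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Conv2DMod(nn.Module):
    """Modulated convolution (StyleGAN 2): the style scales the conv
    weights per sample, then the weights are demodulated to unit norm.
    Simplified: stride 1, 'same' padding."""
    def __init__(self, in_chan, out_chan, kernel, demod=True):
        super().__init__()
        self.filters = out_chan
        self.demod = demod
        self.kernel = kernel
        self.weight = nn.Parameter(torch.randn(out_chan, in_chan, kernel, kernel))
        nn.init.kaiming_normal_(self.weight, a=0, mode='fan_in',
                                nonlinearity='leaky_relu')

    def forward(self, x, y):
        b, c, h, w = x.shape
        # modulate: scale the kernel's input channels by (style + 1)
        w1 = y[:, None, :, None, None]          # (B, 1, C_in, 1, 1)
        w2 = self.weight[None]                  # (1, C_out, C_in, k, k)
        weights = w2 * (w1 + 1)
        if self.demod:
            d = torch.rsqrt((weights ** 2).sum(dim=(2, 3, 4), keepdim=True) + 1e-8)
            weights = weights * d
        # grouped-conv trick: one group per batch sample
        x = x.reshape(1, b * c, h, w)
        weights = weights.reshape(b * self.filters, c, self.kernel, self.kernel)
        x = F.conv2d(x, weights, padding=self.kernel // 2, groups=b)
        return x.reshape(b, self.filters, h, w)

class ModResBlk(nn.Module):
    """Hypothetical AdainResBlk replacement: two modulated convs plus
    a skip connection into an RGB stream."""
    def __init__(self, dim_in, dim_out, style_dim, upsample=False):
        super().__init__()
        self.upsample = upsample
        self.to_style1 = nn.Linear(style_dim, dim_in)
        self.conv1 = Conv2DMod(dim_in, dim_out, 3)
        self.to_style2 = nn.Linear(style_dim, dim_out)
        self.conv2 = Conv2DMod(dim_out, dim_out, 3)
        self.to_rgb = nn.Conv2d(dim_out, 3, 1)

    def forward(self, x, rgb, s):
        if self.upsample:
            x = F.interpolate(x, scale_factor=2, mode='bilinear',
                              align_corners=False)
            if rgb is not None:
                rgb = F.interpolate(rgb, scale_factor=2, mode='bilinear',
                                    align_corners=False)
        x = F.leaky_relu(self.conv1(x, self.to_style1(s)), 0.2)
        x = F.leaky_relu(self.conv2(x, self.to_style2(s)), 0.2)
        new_rgb = self.to_rgb(x)
        rgb = new_rgb if rgb is None else rgb + new_rgb
        return x, rgb
```

The key difference from AdaIN is that the style no longer rescales activations after normalization; it rescales the convolution weights themselves, and demodulation restores unit variance, which removes the droplet artifacts analyzed in the StyleGAN 2 paper.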
For simplicity, we will not change StyleGAN's original concept of two streams — a feature stream and an RGB image stream — so it is necessary to modify the generator's forward method.
Replace these lines:
with the following block of code:
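Conceptually, the decoder loop changes from passing a single tensor through the blocks to threading an RGB stream alongside the features. The toy sketch below (stub block, hypothetical names) shows only the control flow, not the repo's actual diff:

```python
import torch
import torch.nn as nn

class _StubBlock(nn.Module):
    """Stand-in for a modulated decoder block: it refines the feature
    stream and accumulates the image in a parallel RGB stream."""
    def __init__(self, dim, style_dim):
        super().__init__()
        self.conv = nn.Conv2d(dim, dim, 3, padding=1)
        self.style = nn.Linear(style_dim, dim)
        self.to_rgb = nn.Conv2d(dim, 3, 1)

    def forward(self, x, rgb, s):
        x = torch.relu(self.conv(x) * (1 + self.style(s)[:, :, None, None]))
        new_rgb = self.to_rgb(x)
        return x, (new_rgb if rgb is None else rgb + new_rgb)

class Decoder(nn.Module):
    def __init__(self, dim, style_dim, n_blocks=3):
        super().__init__()
        self.decode = nn.ModuleList(
            _StubBlock(dim, style_dim) for _ in range(n_blocks))

    def forward(self, x, s):
        rgb = None                      # RGB stream starts empty
        for block in self.decode:
            x, rgb = block(x, rgb, s)   # each block updates both streams
        return rgb                      # the output image is the RGB stream
```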
To avoid OOM during the test call, comment out the "latent-guided image synthesis" and "reference-guided image synthesis" blocks in the debug_image function (file utils.py).
Fake images pool
To train the model on 512x512 images we have to reduce the batch size to 1. To stabilize training we will use a fake-image buffer (from the pytorch-CycleGAN-and-pix2pix repo), which lets us update the discriminator's weights using a history of generated images rather than only the latest fake output.
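The buffer itself is small; a version paraphrased from util/image_pool.py in the pytorch-CycleGAN-and-pix2pix repo looks roughly like this:

```python
import random

import torch

class ImagePool:
    """Buffer of previously generated images: each query returns a mix
    of fresh fakes and randomly recalled older ones."""
    def __init__(self, pool_size):
        self.pool_size = pool_size
        self.images = []

    def query(self, images):
        if self.pool_size == 0:          # pool disabled: pass-through
            return images
        return_images = []
        for image in images:
            image = torch.unsqueeze(image.data, 0)
            if len(self.images) < self.pool_size:
                # pool not full yet: store and return the new image
                self.images.append(image)
                return_images.append(image)
            elif random.uniform(0, 1) > 0.5:
                # 50%: swap the new image with a stored one and
                # return the old image to the discriminator
                idx = random.randint(0, self.pool_size - 1)
                tmp = self.images[idx].clone()
                self.images[idx] = image
                return_images.append(tmp)
            else:
                # 50%: return the new image unchanged
                return_images.append(image)
        return torch.cat(return_images, 0)
```

In the training loop, the discriminator's fake batch is simply replaced by `pool.query(fake)` while the generator still receives gradients from the fresh output.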
Notes about Colab
If you train the model in the Colab environment, you can fix the step parameter in the _save_checkpoint and _load_checkpoint functions (in any case, Google Drive creates backups) and add lines to the train function that copy the current model to Drive:
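For example, a small helper like the following (the function name and paths are illustrative, assuming Drive is mounted at /content/drive) can be called from train every few thousand steps:

```python
import os
import shutil

def backup_checkpoint(src, dst_dir='/content/drive/MyDrive/stargan-v2'):
    """Copy a checkpoint file (e.g. something like
    expr/checkpoints/100000_nets_ema.ckpt) to a mounted Google Drive
    folder; both paths here are assumptions, adjust to your setup."""
    os.makedirs(dst_dir, exist_ok=True)
    dst = os.path.join(dst_dir, os.path.basename(src))
    shutil.copy(src, dst)
    return dst
```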
After putting the AFHQ dataset in the data/ folder, we can start training.
Training on 256x256 images can be started with:
To train at 512x512px resolution, run:
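With the flags exposed by the original repo's main.py (the loss weights follow the repo's published AFHQ example; verify the exact flag names against main.py before use), the two runs could look like this:

```shell
# 256x256 run, loss weights as in the original AFHQ example
python main.py --mode train --num_domains 3 --w_hpf 0 \
       --lambda_reg 1 --lambda_sty 1 --lambda_ds 2 --lambda_cyc 1 \
       --train_img_dir data/afhq/train --val_img_dir data/afhq/val

# 512x512 run: same flags, larger image size, batch size 1 to fit GPU memory
python main.py --mode train --num_domains 3 --w_hpf 0 \
       --img_size 512 --batch_size 1 \
       --lambda_reg 1 --lambda_sty 1 --lambda_ds 2 --lambda_cyc 1 \
       --train_img_dir data/afhq/train --val_img_dir data/afhq/val
```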
Source codes are available here.
- Yunjey Choi, Youngjung Uh, Jaejun Yoo, Jung-Woo Ha, StarGAN v2: Diverse Image Synthesis for Multiple Domains.
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, Deep Residual Learning for Image Recognition.
- Tero Karras, Samuli Laine, Timo Aila, A Style-Based Generator Architecture for Generative Adversarial Networks.
- Tero Karras, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, Timo Aila, Analyzing and Improving the Image Quality of StyleGAN.