7 Cycle GAN

Author: Rammy
  • One application of CycleGAN is style transfer: here we build a model that transforms an input image so that it appears to come from the same collection as a given set of style images.

  • Below, the style collection is zebra images and the base collection is horse images. You can see the style transfer applied to our base image (a horse).

  • Once CycleGAN is trained, the same model can perform style transfer in both directions: from horses (base images) to zebras (style collection), or the other way around. This is possible because of the way CycleGAN is trained.

/static/images/gan/gan_cycle_gan.gif

What's special about it :

  • So far, [[AI/GAN/4-GAN| Vanilla GAN]] and [[AI/GAN/3-Variational autoencoders | VAE ]] have been used to produce novel observations by randomly sampling a vector from a standard normal distribution; there is no input image.
  • With style transfer, however, our goal is to extract the style component from a collection of images and embed it into the base input images.
  • One of the major advantages of CycleGAN is that, unlike pix2pix or StyleGAN, it doesn't need paired images for training.
/static/images/gan/gan_cycle_gan_sample.png
  • As you can see above, pix2pix needs a paired dataset: each base image together with how it should look once the style transfer is done. pix2pix uses these pairs during training to learn the style transfer, but we often don't have paired data.
  • Also, unlike pix2pix and StyleGAN, CycleGAN is capable of bidirectional style transfer.

How does it achieve this :

  • CycleGAN contains 2 generators and 2 discriminators. Let's say we have 2 image collections: A --> apples; B --> oranges.
  • We want to perform style transfer from oranges to apples and from apples to oranges. A minimal sketch of how the four networks fit together follows the figure below.
/static/images/gan/gan_cycle_gan_arch.png
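A minimal PyTorch sketch (not the paper's exact architecture) of how the four networks are wired together; the `make_generator` / `make_discriminator` helpers are hypothetical stand-ins for the real U-Net generator and PatchGAN discriminator described below:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in networks so the wiring below runs end to end;
# the real generator (U-Net) and discriminator (PatchGAN) are sketched
# in the following sections.
def make_generator():
    return nn.Sequential(nn.Conv2d(3, 3, 3, padding=1), nn.Tanh())

def make_discriminator():
    return nn.Sequential(nn.Conv2d(3, 1, 4, stride=2, padding=1))

G_AB, G_BA = make_generator(), make_generator()        # A -> B and B -> A
D_A, D_B = make_discriminator(), make_discriminator()  # real vs. fake per domain

real_A = torch.randn(1, 3, 256, 256)  # a batch of domain-A (apple) images
fake_B = G_AB(real_A)                 # apple rendered in the orange style
rec_A = G_BA(fake_B)                  # cycled back; should resemble real_A
```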

Generator Architecture :

  • Generators perform the style transfer from one collection to the other :

  • Generator $G_{AB}$ : Learns to convert an image from domain A to domain B (fake B).

  • Generator $G_{BA}$ : Learns to convert an image from domain B to domain A (fake A).

  • The generators in CycleGAN use the U-Net architecture.

  • U-Net blends the high-level abstract information captured during the downsampling process (i.e., image style) with the specific spatial information retained in the low-level layers, via skip connections during upsampling.

/static/images/gan/gan_cycle_gan_unet.png
  • The generator of CycleGAN uses InstanceNormalization layers rather than BatchNormalization layers, which tends to give more satisfying results in style transfer tasks. A minimal generator sketch follows.
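To make the two ideas above concrete, here is a hedged PyTorch sketch of a U-Net-style generator with one skip connection and InstanceNorm layers; the real CycleGAN generator is deeper, but the blending of downsampled features via skip connections is the same:

```python
import torch
import torch.nn as nn

class TinyUNetGenerator(nn.Module):
    """Minimal U-Net-style generator sketch: InstanceNorm throughout and
    one skip connection; the actual CycleGAN generator is deeper."""
    def __init__(self, ch=3, nf=64):
        super().__init__()
        # downsampling path: captures increasingly abstract (style) features
        self.down1 = nn.Sequential(
            nn.Conv2d(ch, nf, 4, stride=2, padding=1),
            nn.InstanceNorm2d(nf), nn.LeakyReLU(0.2))
        self.down2 = nn.Sequential(
            nn.Conv2d(nf, nf * 2, 4, stride=2, padding=1),
            nn.InstanceNorm2d(nf * 2), nn.LeakyReLU(0.2))
        # upsampling path: restores spatial resolution
        self.up1 = nn.Sequential(
            nn.ConvTranspose2d(nf * 2, nf, 4, stride=2, padding=1),
            nn.InstanceNorm2d(nf), nn.ReLU())
        # input channels double because the skip connection is concatenated
        self.up2 = nn.Sequential(
            nn.ConvTranspose2d(nf * 2, ch, 4, stride=2, padding=1),
            nn.Tanh())

    def forward(self, x):
        d1 = self.down1(x)           # low-level spatial detail
        d2 = self.down2(d1)          # high-level abstract features
        u1 = self.up1(d2)
        u1 = torch.cat([u1, d1], 1)  # skip connection: reuse d1's detail
        return self.up2(u1)
```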

Discriminators :

  • The discriminators here are similar to those in a vanilla GAN: they try to identify which images are real and which are fake.

  • Discriminator $D_{A}$ : Learns the difference between real images from domain A and fake images generated by $G_{BA}$.

  • Discriminator $D_{B}$ : Learns the difference between real images from domain B and fake images generated by $G_{AB}$.

  • One difference from the vanilla GAN discriminator is that the output is an 8x8 single-channel tensor rather than a single number. CycleGAN inherits this from PatchGAN, where the discriminator divides the image into patches and guesses real or fake for each patch.

  • The benefit of using a PatchGAN discriminator is that the loss function can then measure how good the discriminator is at distinguishing images based on their style rather than their content. Since each individual element of the discriminator's prediction is based only on a small square of the image, it must use the style of the patch, rather than its content, to make its decision. This is exactly what we require: we would rather our discriminator be good at identifying when two images differ in style than in content. A sketch of such a discriminator follows.
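A hedged PyTorch sketch of a PatchGAN-style discriminator; for a 256x256 input, five stride-2 convolutions bring the output down to the 8x8 single-channel patch grid mentioned above:

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """PatchGAN-style discriminator sketch: outputs a grid of real/fake
    scores, one per image patch, instead of a single number."""
    def __init__(self, ch=3, nf=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(ch, nf, 4, stride=2, padding=1),          # 256 -> 128
            nn.LeakyReLU(0.2),
            nn.Conv2d(nf, nf * 2, 4, stride=2, padding=1),      # 128 -> 64
            nn.InstanceNorm2d(nf * 2), nn.LeakyReLU(0.2),
            nn.Conv2d(nf * 2, nf * 4, 4, stride=2, padding=1),  # 64 -> 32
            nn.InstanceNorm2d(nf * 4), nn.LeakyReLU(0.2),
            nn.Conv2d(nf * 4, nf * 8, 4, stride=2, padding=1),  # 32 -> 16
            nn.InstanceNorm2d(nf * 8), nn.LeakyReLU(0.2),
            nn.Conv2d(nf * 8, 1, 4, stride=2, padding=1))       # 16 -> 8

    def forward(self, x):
        return self.net(x)  # shape (N, 1, 8, 8): one score per patch

scores = PatchDiscriminator()(torch.randn(1, 3, 256, 256))
print(scores.shape)  # torch.Size([1, 1, 8, 8])
```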

Generator Loss :

  • We judge the two generators simultaneously on three criteria :
    • Validity : Do the images produced by each generator fool the relevant discriminator? (For example, does the output of $G_{BA}$ fool $D_{A}$, and does the output of $G_{AB}$ fool $D_{B}$?)

    • Reconstruction : If we apply the two generators one after the other (in both directions), do we return to the original image? CycleGAN gets its name from this cyclic reconstruction criterion.

    • Identity : If we apply each generator to images from its own target domain, does the image remain unchanged?

/static/images/gan/gan_cycle_gan_generator_loss.png
The 3 losses are shown above
  • These 3 losses are weighted and combined to form the final loss for the generators (a code sketch follows this list).
  • Every loss term has its importance, and CycleGAN is quite sensitive to the weights given to the loss terms; look at the example below, where the identity loss is given a weight of 0.
/static/images/gan/gan_cycle_gan_identity_loss.png
Trained without identity loss
  • The CycleGAN has still managed to translate the oranges into apples, but the color of the tray holding the oranges has flipped from black to white, as there is no longer an identity loss term to prevent this shift in background colors. The identity term helps regularize the generator, ensuring it only adjusts the parts of the image that are necessary to complete the transformation and no more.
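Here is a hedged sketch of the combined generator objective. The weights `lambda_cyc` and `lambda_id` are illustrative (the CycleGAN paper uses 10 for the cycle term and half that for the identity term), and the networks are assumed to be the generators/discriminators sketched earlier:

```python
import torch
import torch.nn.functional as F

def generator_loss(G_AB, G_BA, D_A, D_B, real_A, real_B,
                   lambda_cyc=10.0, lambda_id=5.0):
    fake_B, fake_A = G_AB(real_A), G_BA(real_B)

    # 1. validity: least-squares GAN loss; fakes should score close to 1
    pred_B, pred_A = D_B(fake_B), D_A(fake_A)
    valid = (F.mse_loss(pred_B, torch.ones_like(pred_B)) +
             F.mse_loss(pred_A, torch.ones_like(pred_A)))

    # 2. reconstruction (cycle): A -> B -> A should recover the input
    cyc = (F.l1_loss(G_BA(fake_B), real_A) +
           F.l1_loss(G_AB(fake_A), real_B))

    # 3. identity: a generator fed its own target domain should change nothing
    ident = (F.l1_loss(G_BA(real_A), real_A) +
             F.l1_loss(G_AB(real_B), real_B))

    return valid + lambda_cyc * cyc + lambda_id * ident
```

Setting `lambda_id=0` reproduces the failure mode shown above, where background colors are free to drift.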

Pseudo code :

/static/images/gan/gan_cycle_gan_pseudo_code.png
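To complement the pseudo code image, here is a hedged PyTorch training-loop sketch. It reuses the networks and the `generator_loss` helper from the earlier sketches, and `loader_A` / `loader_B` are assumed (hypothetical) unpaired image loaders:

```python
import itertools
import torch

# one optimizer for both generators, one for both discriminators
opt_G = torch.optim.Adam(
    itertools.chain(G_AB.parameters(), G_BA.parameters()),
    lr=2e-4, betas=(0.5, 0.999))
opt_D = torch.optim.Adam(
    itertools.chain(D_A.parameters(), D_B.parameters()),
    lr=2e-4, betas=(0.5, 0.999))

for real_A, real_B in zip(loader_A, loader_B):
    # --- update generators ---
    opt_G.zero_grad()
    loss_G = generator_loss(G_AB, G_BA, D_A, D_B, real_A, real_B)
    loss_G.backward()
    opt_G.step()

    # --- update discriminators (least-squares targets: real=1, fake=0) ---
    opt_D.zero_grad()
    fake_A, fake_B = G_BA(real_B).detach(), G_AB(real_A).detach()
    loss_D = (((D_A(real_A) - 1) ** 2).mean() + (D_A(fake_A) ** 2).mean() +
              ((D_B(real_B) - 1) ** 2).mean() + (D_B(fake_B) ** 2).mean())
    loss_D.backward()
    opt_D.step()
```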

Training Details

  • We apply 2 techniques from recent works to stabilize our model training procedure.
      1. We replace the negative log-likelihood with a least-squares loss. This loss is more stable during training and generates higher-quality images.
      • For example, the least-squares loss for discriminator $D_{A}$ is --> $\min \big[ (D_{A}(A) - 1)^2 + D_{A}(G_{BA}(B))^2 \big]$
      • Similarly, for generator $G_{BA}$ --> $\min \big[ (D_{A}(G_{BA}(B)) - 1)^2 \big]$
      2. To reduce oscillation, we update the discriminators using a history of generated images rather than only the latest generation. We keep an image buffer to hold previous generations (a sketch follows this list).
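A sketch of the image buffer described in point 2, following the 50-image history buffer used in the CycleGAN paper:

```python
import random

class ImageBuffer:
    """History buffer sketch: with probability 0.5 the discriminator is
    shown an older generated image instead of the newest one, which
    reduces oscillation during training."""
    def __init__(self, capacity=50):
        self.capacity = capacity
        self.images = []

    def push_and_pop(self, image):
        if len(self.images) < self.capacity:
            self.images.append(image)  # buffer not full: store and return new
            return image
        if random.random() < 0.5:
            idx = random.randrange(self.capacity)
            old, self.images[idx] = self.images[idx], image
            return old                 # return an old image, store the new one
        return image                   # otherwise pass the new image through

# usage inside the discriminator update (hypothetical names):
#   fake_A_for_D = buffer_A.push_and_pop(fake_A)
```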

Evaluation metrics for GAN systems:

  • FCN Score : (labels --> photo)
    • This metric is used when a GAN is used to create photos from label maps.
    • The intuition is that if we generate a photo from a label map of “car on the road”, then we have succeeded if an FCN applied to the generated photo detects “car on the road”.
    • The label map produced by the FCN can be compared with the ground truth in terms of per-pixel accuracy, per-class accuracy, and class IoU.
/static/images/gan/gan_cycle_gan_metrics.png
  • Semantic segmentation metrics : To evaluate the performance of photo --> labels, we use standard metrics such as per-pixel accuracy, per-class accuracy, and mean class Intersection over Union (IoU). Illustrative implementations are sketched below.
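Illustrative NumPy implementations of the three segmentation metrics named above, computed from integer-valued predicted and ground-truth label maps (a sketch, not the paper's exact evaluation code):

```python
import numpy as np

def segmentation_metrics(pred, gt, n_classes):
    # confusion matrix: rows = ground-truth class, cols = predicted class
    cm = np.bincount(n_classes * gt.flatten() + pred.flatten(),
                     minlength=n_classes ** 2).reshape(n_classes, n_classes)
    per_pixel_acc = np.diag(cm).sum() / cm.sum()
    per_class_acc = np.nanmean(np.diag(cm) / cm.sum(axis=1))
    iou = np.diag(cm) / (cm.sum(axis=1) + cm.sum(axis=0) - np.diag(cm))
    return per_pixel_acc, per_class_acc, np.nanmean(iou)
```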

Challenges

  • CycleGAN works well on translation tasks that involve color and texture; however, on tasks that require geometric changes, its performance is quite poor.

  • For example, on the task of dog --> cat transfiguration, the learned translation degenerates into making minimal changes to the input.

  • Some failure cases are caused by the distribution characteristics of the training datasets. For example, the model gets confused in the horse --> zebra example: a zebra pattern is drawn over the man sitting on the horse, as the model never saw such a case during training.

  • **Find below a few failure cases:**

/static/images/gan/gan_cycle_gan_failure.png
/static/images/gan/gan_cycle_gan_failure_2.png
  • Training on paired images still outperforms CycleGAN at segmenting parts of an image. However, integrating weak or semi-supervised data may lead to substantially more powerful translators, at a fraction of the annotation cost of fully supervised systems.

Tags: #gan #cycle_gan #neural_style_transfer
