CycleGAN: Learning to Translate Images (Without Paired Training Data)

Sarah Wolf · Nov 20, 2018 · 7 min read

An image of zebras translated to horses, using a CycleGAN

Image-to-image translation is the task of transforming an image from one domain (e.g.,
images of zebras) to another (e.g., images of horses). Ideally, other features of the
image — anything not directly related to either domain, such as the background —
should stay recognizably the same. As we might imagine, a good image-to-image
translation system could have an almost unlimited number of applications. Changing art
styles, going from sketch to photo, or changing the season of the landscape in a photo
are just a few examples.


Examples of paired and unpaired data. *Image taken from the paper.

While there has been a great deal of research into this task, most of it has utilized
supervised training, where we have access to (x, y) pairs of corresponding images from
the two domains we want to learn to translate between. CycleGAN was introduced in the
now well-known 2017 paper out of Berkeley, Unpaired Image-to-Image Translation
using Cycle-Consistent Adversarial Networks. It was notable because it did not
require paired training data: while a set of x images and a set of y images is still
required, the two sets do not need to correspond directly to each other. In other words,
if you wanted to translate between sketches and photos, you would still need to train on a
bunch of sketches and a bunch of photos, but the sketches would not need to depict the
exact scenes shown in the photos.

Since paired data is hard to find in most domains, and impossible to obtain in some, the
unsupervised training capability of CycleGAN is quite useful.

How does it work?

A Tale of Two Generators


CycleGAN is a Generative Adversarial Network (GAN) that uses two generators and two
discriminators. (Note: if you are not familiar with GANs, you may want to read up on
them before continuing.)

We call one generator G, and have it convert images from the X domain to the Y domain.
The other generator is called F, and converts images from Y to X.

Both G and F are generators that take an image from one domain and translate it to another. G maps from X to
Y, whereas F goes in the opposite direction, mapping Y to X.

Each generator has a corresponding discriminator, which attempts to tell its
synthesized images apart from real ones.

One discriminator provides adversarial training for G, and the other does the same for F.

The Objective Function


There are two components to the CycleGAN objective function, an adversarial loss and a
cycle consistency loss. Both are essential to getting good results.

If you are familiar with GANs, the adversarial loss should come as no surprise. Each
generator is attempting to "fool" its corresponding discriminator, so that the
discriminator becomes less able to distinguish generated images from the real versions.
We use the least squares loss (found by Mao et al. to be more effective than the typical
log-likelihood loss) to capture this.
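
Concretely, for the mapping G : X → Y and its discriminator D_Y (written L_GAN(G, D_Y, X, Y) in the paper), the least-squares variant trains the two networks against the following targets; the analogous terms apply to F and D_X:

```latex
\mathcal{L}_{D_Y} = \mathbb{E}_{y \sim p_{\mathrm{data}}(y)}\!\left[(D_Y(y) - 1)^2\right]
                  + \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\!\left[D_Y(G(x))^2\right],
\qquad
\mathcal{L}_{G} = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\!\left[(D_Y(G(x)) - 1)^2\right]
```

So the discriminator pushes real images toward a score of 1 and fakes toward 0, while the generator pushes its fakes toward 1.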

However, the adversarial loss alone is not sufficient to produce good images, as it leaves
the model under-constrained. It enforces that the generated output be of the appropriate
domain, but does not enforce that the input and output are recognizably the same. For
example, a generator that output an image y that was an excellent example of that
domain, but looked nothing like x, would do well by the standard of the adversarial loss,
despite not giving us what we really want.

The cycle consistency loss addresses this issue. It relies on the expectation that if you
convert an image to the other domain and back again, by successively feeding it through
both generators, you should get back something similar to what you put in. It enforces
that F(G(x)) ≈ x and G(F(y)) ≈ y.
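
In the paper, this round trip is penalized with an L1 norm:

```latex
\mathcal{L}_{\mathrm{cyc}}(G, F) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\!\left[\lVert F(G(x)) - x \rVert_1\right]
+ \mathbb{E}_{y \sim p_{\mathrm{data}}(y)}\!\left[\lVert G(F(y)) - y \rVert_1\right]
```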

We can create the full objective function by putting these loss terms together, and
weighting the cycle consistency loss by a hyperparameter λ. The paper suggests setting λ = 10.
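
Putting the terms together, the full objective is:

```latex
\mathcal{L}(G, F, D_X, D_Y) =
  \mathcal{L}_{\mathrm{GAN}}(G, D_Y, X, Y)
+ \mathcal{L}_{\mathrm{GAN}}(F, D_X, Y, X)
+ \lambda \, \mathcal{L}_{\mathrm{cyc}}(G, F)
```

As a concrete illustration, here is a minimal PyTorch sketch of the generator-side loss. The function and argument names are placeholders of my own, not code from the paper:

```python
import torch.nn.functional as nnf

def generator_loss(G, F_gen, D_X, D_Y, real_x, real_y, lam=10.0):
    """Generator-side CycleGAN objective: least-squares adversarial
    terms plus the lambda-weighted cycle-consistency terms."""
    fake_y, fake_x = G(real_x), F_gen(real_y)
    # Adversarial: each generator wants its fakes scored as real (target 1).
    adv = ((D_Y(fake_y) - 1) ** 2).mean() + ((D_X(fake_x) - 1) ** 2).mean()
    # Cycle consistency: translate there and back, compare with L1.
    cyc = nnf.l1_loss(F_gen(fake_y), real_x) + nnf.l1_loss(G(fake_x), real_y)
    return adv + lam * cyc
```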

Generator Architecture
Each CycleGAN generator has three sections: an encoder, a transformer, and a decoder.
The input image is fed directly into the encoder, which shrinks the representation size
while increasing the number of channels. The encoder is composed of three convolution
layers. The resulting activation is then passed to the transformer, a series of six residual
blocks. It is then expanded again by the decoder, which uses two transpose convolutions
to enlarge the representation size, and one output layer to produce the final image in
RGB.

You can see the details in the figure below. Please note that each layer is followed by an
instance normalization and a ReLU layer, but these have been omitted for simplicity.


An architecture for a CycleGAN generator. As you can see above, the representation size shrinks in the
encoder phase, stays constant in the transformer phase, and expands again in the decoder phase. The
representation size that each layer outputs is listed below it, in terms of the input image size, k. On each layer
is listed the number of filters, the size of those filters, and the stride. Each layer is followed by an instance
normalization and ReLU activation.

Since the generators are fully convolutional, they can handle arbitrarily large inputs
once trained.
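
To make the structure concrete, here is a minimal PyTorch sketch of such a generator. The filter counts and kernel sizes follow the common reference implementation of the paper; treat them as illustrative assumptions rather than the exact values in the figure:

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.InstanceNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.InstanceNorm2d(channels),
        )

    def forward(self, x):
        return x + self.block(x)  # skip connection around the block

class Generator(nn.Module):
    def __init__(self, n_res_blocks=6):
        super().__init__()
        def conv(in_c, out_c, k, s):
            return [nn.Conv2d(in_c, out_c, k, stride=s, padding=k // 2),
                    nn.InstanceNorm2d(out_c), nn.ReLU(inplace=True)]
        def up(in_c, out_c):
            return [nn.ConvTranspose2d(in_c, out_c, 3, stride=2,
                                       padding=1, output_padding=1),
                    nn.InstanceNorm2d(out_c), nn.ReLU(inplace=True)]
        self.model = nn.Sequential(
            # Encoder: three convolutions that grow channels, shrink size.
            *conv(3, 64, 7, 1), *conv(64, 128, 3, 2), *conv(128, 256, 3, 2),
            # Transformer: a series of residual blocks at constant size.
            *[ResidualBlock(256) for _ in range(n_res_blocks)],
            # Decoder: two transpose convolutions, then an RGB output layer.
            *up(256, 128), *up(128, 64),
            nn.Conv2d(64, 3, 7, padding=3), nn.Tanh(),
        )

    def forward(self, x):
        return self.model(x)
```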

Discriminator Architecture
The discriminators are PatchGANs: fully convolutional neural networks that look at a
"patch" of the input image and output the probability of the patch being "real". This is
more computationally efficient than evaluating the entire input image at once, and it is
also more effective: it allows the discriminator to focus on surface-level features,
like texture, which is often the sort of thing being changed in an image translation task.

If you’ve read about other image-to-image translation systems, you may already be
familiar with PatchGAN. By the time of the CycleGAN paper, a version of PatchGAN had
already been successfully used in paired image-to-image translation by Isola et al in
Image-to-Image Translation with Conditional Adversarial Nets.


An example architecture for a PatchGAN discriminator. PatchGAN is a fully convolutional network that takes
in an image and produces a matrix of probabilities, each referring to the probability of the corresponding
"patch" of the image being "real" (as opposed to generated). The representation size that each layer outputs is
listed below it, in terms of the input image size, k. On each layer is listed the number of filters, the size of those
filters, and the stride.

As you can see in the example architecture above, the PatchGAN halves the
representation size and doubles the number of channels until the desired output size is
reached. In this case, it was most effective to have the PatchGAN evaluate 70×70
patches of the input.
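
Below is a minimal PyTorch sketch of a 70×70 PatchGAN. The layer widths follow the common reference implementation and are assumptions, not the figure's exact values:

```python
import torch.nn as nn

class PatchGAN(nn.Module):
    """Maps an image to a grid of logits, one per overlapping patch."""
    def __init__(self):
        super().__init__()
        def block(in_c, out_c, stride, norm=True):
            layers = [nn.Conv2d(in_c, out_c, 4, stride=stride, padding=1)]
            if norm:
                layers.append(nn.InstanceNorm2d(out_c))
            layers.append(nn.LeakyReLU(0.2, inplace=True))
            return layers
        self.model = nn.Sequential(
            # The stride-2 blocks halve the spatial size and double channels.
            *block(3, 64, 2, norm=False),
            *block(64, 128, 2),
            *block(128, 256, 2),
            *block(256, 512, 1),
            # One logit per patch; each output unit sees a 70x70 region.
            nn.Conv2d(512, 1, 4, stride=1, padding=1),
        )

    def forward(self, x):
        return self.model(x)
```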

Reducing Model Oscillation


To prevent the model from changing drastically from iteration to iteration, the
discriminators were fed a history of generated images, rather than just the ones
produced by the latest versions of the generators. To do this, we keep a pool to store the
50 most recently generated images. This technique of reducing model oscillation was
pioneered by Shrivastava et al. in Learning from Simulated and Unsupervised Images
through Adversarial Training.
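
A minimal sketch of such an image pool is below. It follows the common reference implementation; the 50/50 swap probability is an assumption taken from that implementation, not something stated in this post:

```python
import random

class ImagePool:
    """Buffer of past generator outputs, used to update the discriminators
    on a history of images rather than only the most recent ones."""
    def __init__(self, size=50):
        self.size = size
        self.images = []

    def query(self, image):
        # Until the pool is full, store every new image and return it.
        if len(self.images) < self.size:
            self.images.append(image)
            return image
        # Afterwards, half the time swap the new image in and hand the
        # discriminator an older one; otherwise pass the new image through.
        if random.random() < 0.5:
            idx = random.randrange(self.size)
            old = self.images[idx]
            self.images[idx] = image
            return old
        return image
```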

Other Training Details

The training approach was fairly typical for an image-to-image translation task. The
Adam optimizer, a common variant of gradient descent, was used to make training more
stable and efficient. The learning rate was set to 0.0002 for the first half of training, and
then linearly reduced to zero over the remaining iterations. The batch size was set to 1,
which is why we refer to instance normalization, rather than batch normalization, in the
architecture diagrams above.
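
In PyTorch, that schedule might look like the sketch below. The beta values and the 200-epoch horizon come from the paper's reference implementation and are assumptions here; `Generator` and `PatchGAN` are the sketches from earlier:

```python
import itertools
import torch

G, F = Generator(), Generator()      # X -> Y and Y -> X (sketches above)
D_X, D_Y = PatchGAN(), PatchGAN()

opt_gen = torch.optim.Adam(
    itertools.chain(G.parameters(), F.parameters()),
    lr=2e-4, betas=(0.5, 0.999))
opt_disc = torch.optim.Adam(
    itertools.chain(D_X.parameters(), D_Y.parameters()),
    lr=2e-4, betas=(0.5, 0.999))

# Constant learning rate for the first half of training, then a linear
# decay to zero over the second half (assuming 200 epochs total).
n_epochs, decay_start = 200, 100
def linear_decay(epoch):
    return 1.0 - max(0, epoch - decay_start) / (n_epochs - decay_start)

sched_gen = torch.optim.lr_scheduler.LambdaLR(opt_gen, lr_lambda=linear_decay)
sched_disc = torch.optim.lr_scheduler.LambdaLR(opt_disc, lr_lambda=linear_decay)
```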

Strengths and Limitations


Overall, the results produced by CycleGAN are very good — image quality approaches
that of paired image-to-image translation on many tasks. This is impressive, because
paired translation tasks are a form of fully supervised learning, and this is not. When the
CycleGAN paper came out, it handily surpassed other unsupervised image translation
techniques available at the time. In “real vs fake” experiments, humans were unable to
distinguish the synthesized image from the real one about 25% of the time.

CycleGAN can be used for collection style transfer, where the entire works of an artist are used to train the
model. *Image taken from paper.

If you are planning to use CycleGAN for a practical application, it is important to be
aware of its strengths and limitations. It works well on tasks that involve color or texture
changes, like day-to-night photo translations, or photo-to-painting tasks like collection
style transfer (see above). However, tasks that require substantial geometric changes to
the image, such as cat-to-dog translations, usually fail.


A very unimpressive attempt at a cat-to-dog image translation. Don’t try to use a CycleGAN for this. *Image
taken from paper.

Translations on the training data often look substantially better than those done on test
data.

Conclusion
Thanks for reading! I hope this was a useful overview. If you would like to see more
implementation details, there are some great public implementations out there you can
refer to. Please leave a comment if you have questions, corrections, or suggestions for
improving this post.
