
Pacific Graphics 2018 Volume 37 (2018), Number 7

H. Fu, A. Ghosh, and J. Kopf


(Guest Editors)

FashionGAN: Display your fashion design using Conditional Generative Adversarial Nets

Y. R. Cui1 , Q. Liu1 , C. Y. Gao†1 and Z. Su1,2

[email protected], [email protected], [email protected], [email protected]


1 School of Data and Computer Science, Sun Yat-sen University, China
2 National Engineering Research Center of Digital Life, Sun Yat-sen University

Figure 1: Some examples (left to right: fashion sketch to garment image; garment image to another garment image; contour image to garment image; each group shows Input1, Input2 and Output). FashionGAN simplifies the pipeline of traditional virtual garment display. It directly generates specified garment images from garment design elements such as fashion sketches and fabric patterns. In addition, FashionGAN offers reusability: existing design elements (e.g. contours or garment images) can be reused to generate other garment images.

Abstract
Virtual garment display plays an important role in fashion design because it can directly show the design effect of a garment without making a sample garment, as the traditional clothing industry requires. In this paper, we propose an end-to-end virtual garment display method based on Conditional Generative Adversarial Networks. Different from existing 3D virtual garment methods, which need complex interactions and domain-specific user knowledge, our method only needs users to input a desired fashion sketch and a specified fabric image; the image of a virtual garment whose shape and texture are consistent with the input fashion sketch and fabric image is then produced quickly and automatically. Moreover, the method can also be extended to contour images and garment images, which further improves the reuse rate of fashion designs. Compared with existing image-to-image methods, the images generated by our method are better in terms of color and shape.
CCS Concepts
•Networks → Network reliability; •Computing methodologies → Computer vision;

1. Introduction

The workflow of garment production includes two main stages: design and manufacture. Before mass customized production, sample garments had to be produced to display the design concept. However, it is expensive and time-consuming to make a sample garment. In recent years, personalized garment customization has become more and more popular, requiring fashion enterprises to improve the workflow of garment production to meet the demand for customization. The pipeline of garment customization is a process of interactive customer selection of fabric and other garment design elements. Its core technology is presenting the effect of the garment in the design phase, which helps the customer preview the virtual garment style based on their chosen elements.

† Corresponding author.

Plenty of display methods have been proposed to preview virtual garments in recent years. Most of them focus on displaying 3D virtual garments. These methods display details of the garment in the design stage from different views without making sample garments, which reduces the cost of research and development in garment production. However, the pipeline of 3D virtual garment display contains a series of complex operations, such as making patterns from fashion sketches and 2D pattern sewing. Further, these operations require extensive interaction from experienced users.

Different from the 3D virtual garment display methods, we propose a novel end-to-end 2D virtual garment display framework based on conditional Generative Adversarial Nets. Users only need to input a fashion sketch and a fabric image into the framework. A garment image, whose shape matches the fashion sketch and whose texture follows the fabric pattern, is then automatically generated. This makes it convenient for fashion designers and consumers to preview the realistic effect of a garment in the design phase. Thus, the proposed framework greatly improves the efficiency of research and development in garment production.

Generating a virtual garment image from a fashion sketch and a fabric image is a problem of image-to-image translation. Convolutional neural networks are powerful deep learning models that can solve such problems [ZIE16]. However, standard CNN architectures require pre-specified loss functions, and such losses may not be suitable for some realistic images. In recent years, the Generative Adversarial Network (GAN) [GPAM∗14] and its improved models, such as the conditional Generative Adversarial Network (cGAN) [MO14], have been widely used for image-to-image translation.

Pix2pix [IZZE17] first proposed using a cGAN with an input-image condition to solve this problem. In pix2pix, the shape of the generated image follows the input, but its color and texture are random. In order to generate a deterministic result, BicycleGAN [ZZP∗17] adds an encoder and establishes a bijection between the output space and the latent space, but the visual information contained in the latent vector is still undetermined and inconsistent with the image condition of the generator. TextureGAN [XSA∗18] tries to attach extra information (such as a cloth patch) to the image condition, but its results depend on the position and size of the attached patch. The above models are trained on paired images. Recently, CycleGAN [ZPIE17] and MUNIT [HLBK18] proposed using unsupervised learning to solve the image translation problem; a user obtains the desired image by specifying a content latent vector or by transferring from an existing image. However, these methods struggle to generate images containing detailed texture, for example a striped garment image in fashion design. We design a Generative Adversarial Network (GAN) architecture for previewing garment designs, named FashionGAN, which maps the desired fabric image to a selected fashion sketch or a contour image of the garment (see Figure 1). This network architecture is similar to BicycleGAN [ZZP∗17], but we encode the fabric image to generate the latent vector and define a novel loss function, which makes the architecture more suitable for generating the virtual garment image.

Our main contributions are as follows:
• We propose an end-to-end 2D virtual garment display framework, which is seamlessly integrated with the existing methods of garment design and production in the fashion industry. The framework can automatically generate virtual garment images in the design phase without any complex operations or experienced user interactions, which can greatly improve the efficiency of garment design and production.
• By directly encoding the fabric image to generate latent codes containing visual information, our method can generate a garment image with a specified fabric image, whereas other existing image-to-image translation methods cannot establish a one-to-one map between the garment image and the fabric image. Furthermore, this way of encoding the latent vector ensures that the generated garment image has similar fabric information even if the fabric image does not belong to the training dataset.
• Inspired by the architecture of BicycleGAN, we design a two-to-one GAN architecture which can generate an image with specified visual information, and we formulate a loss function for generating regular texture images, such as stripe textures.

2. Related Work

Virtual garment display. Virtual garment display is useful in improving the efficiency of fashion design. Graphics researchers have investigated virtual garment display for decades, but most of their efforts emphasize dressed characters. These methods are mainly divided into two categories: 2D pattern-based garment modeling and sketch-based garment modeling. 2D pattern-based garment modeling systems [VCMT05, FCRC05, BGK∗13] allow users to edit patterns in 2D and use physics-based simulation to generate a 3D garment model. These systems require significant time and domain-specific user knowledge to create patterns that achieve a desired 3D garment. Sketch-based virtual garment modeling systems allow users to trace garment silhouettes on top of a mannequin and then automatically infer a plausible 3D garment shape [DJW∗06, TWB∗07, RMSC11, JHR∗15]. These systems create garment models which are frequently simple, non-physical and hence unrealizable. Different from these 3D virtual garment display methods, we propose a 2D virtual garment display based on the generative adversarial model, which avoids the need for domain-specific user knowledge and the time costs of traditional garment modeling.

Image-to-image GAN. The Generative Adversarial Network (GAN) [GPAM∗14] is an impressive model that can generate an image from a noise vector. It is composed of a generator and a discriminator which are trained in an adversarial manner. Due to the random output of the original GAN, researchers have tried to add conditions to constrain the generator. The conditional GAN (cGAN) [MO14] was one of those improved models. Types of conditions include discrete labels [MO14, DCSF15], texts [RAY∗16, ZXL∗17] and so on. In particular, Isola et al. [IZZE17] first proposed regarding an image as the condition, which accomplishes image translation at the pixel level. Afterward, the cGAN with an image condition has been successfully applied to many fields such as style transfer [ZPIE17], terrain authoring [DPWB17], high-resolution image generation [WLZ∗18], and so on. In order to generate the desired results, researchers


tried to add some stricter image conditions, such as a sketch with some colored scribbles [SLF∗17], or a sketch with masks and texture patches [XSA∗18]. However, these methods find it hard to generate uncommon shapes, colors or textures [SLF∗17]. Inspired by their ideas, we use a sketch to constrain the shape of the generated image; in addition, we also use an encoded latent vector to allow the generator to output arbitrary color and texture. In order to generate diverse results, some researchers proposed multi-modal methods [ZZP∗17, GKN∗18, HLBK18]. Our FashionGAN is a cGAN architecture with an image condition as well. Instead of using only ground truths and sketch/contour images as the paired images to train the network, we add a fabric pattern image to each pair so that the visual information of the fabric pattern is mapped to the contour image naturally. Beyond cGAN, adding extra loss terms to the value function of GANs can achieve impressive image-to-image tasks such as inpainting [PKD∗16, YCYL∗17] and super-resolution [LTH∗17]. Rather than using a noise latent vector, VAE-GAN [LSLW16] places a variational auto-encoder that generates visual attribute vectors before the generator of the original GAN.

3. Architecture of FashionGAN

To help users preview the effect of a garment in the design phase, the goal is to provide an image synthesis pipeline that can generate garment images based on a fashion sketch and a specified fabric image. Theoretically, this goal is to learn a mapping G among three different image domains X, C, Y. During the training phase, three images in the three domains, x ∈ X, c ∈ C, y ∈ Y, are given; they are essentially samples of the joint distribution p(x, c, y). During the test phase, given an image x ∈ X and an image c ∈ C, the generated image with the desired content is G(x, c) ∈ Y, a sample of the conditional distribution p(y | x, c). BicycleGAN provides a feasible way to establish this image synthesis pipeline. However, it is a one-to-one mapping between the latent vector and the output space, which is not completely suitable for our goal. In the following, we first review BicycleGAN and then extend it to formulate our FashionGAN.

3.1. Preliminary

BicycleGAN learns a bijection between two image domains by considering cVAE-GAN [LSLW16] and cLR-GAN [DKD16] simultaneously. cVAE-GAN is essentially a Variational Auto-Encoder (VAE), which reconstructs the ground truth y using the GAN loss L_GAN(G, D, E) and the L1 loss L1^im(G, E). The encoded vector E(y) is expected to follow a Gaussian distribution, so a KL divergence term L_KL(E) is needed. Thus E : y → z, G : {z, x} → ŷ. cLR-GAN aims to ensure that each random latent vector z can generate reasonable results, using an adversarial loss term L_GAN(G, D) and an L1 loss term L1^vec(G, E). Thus G : z → ŷ, E : ŷ → ẑ. cLR-GAN constructs a mapping between the output of G and the latent vector, but its discriminator cannot see the ground truth y. Hence, BicycleGAN takes advantage of both cVAE-GAN and cLR-GAN:

G^{*} = \arg\min_{G}\max_{D}\; \mathcal{L}_{GAN}(G,D,E) + \lambda_{im}\,\mathcal{L}_{1}^{im}(G,E) + \lambda_{KL}\,\mathcal{L}_{KL}(E) + \mathcal{L}_{GAN}(G,D) + \lambda_{vec}\,\mathcal{L}_{1}^{vec}(G,E). \qquad (1)

Enlightened by this idea of acquiring input information by encoding and carrying different features through a latent vector, we establish a mapping between the output and the random latent vector: the randomness of latent vectors ensures the diversity of the output, and encoding the output ensures the rationality of the latent vector.

BicycleGAN has two issues that prevent it from being used directly in our fashion design display framework. Firstly, in the cVAE-GAN of BicycleGAN, the latent vector of the input ground truth contains some shape information of the image; meanwhile, the input contour image, as the condition of the generator, also constrains the shape of the generated image. Shape information conveyed by two different inputs may cause redundancy and inconsistency. Furthermore, the ground truth is often unavailable, especially in fashion design. Secondly, texture details in the fabric pattern cannot be restored with only the L1 loss; Figure 5(d) demonstrates that G just generates the average color of the texture in the fabric pattern.

3.2. FashionGAN

In order to solve these two problems, we improve the BicycleGAN model. We use images of real fabric patterns to train the encoder E, so that only the material and color information of the fabric pattern image is contained in the latent vector. In other words, the shape of the final generated image is constrained by the fashion sketch/contour image, and the color together with the material is constrained by the fabric pattern. We then add an extra local loss module to control the texture synthesized by the generator. For simplicity, we name our solution FashionGAN.

FashionGAN learns a two-to-one mapping among three image domains, namely E : c → z, G : {z, x} → ŷ (see Figure 2). Based on BicycleGAN, we combine two types of networks called netA and netB. The input of netA and netB is a tuple including a contour image x, a fabric pattern image c and the corresponding ground truth (desired garment image) y. x is extracted from y using Holistically-Nested Edge Detection (HED) [XT15], and c is randomly cropped from y. NetA uses the cVAE-GAN architecture: first, c is encoded by E so that the latent vector z contains the visual information of the fabric pattern, and then the cGAN procedure is conducted. Thus E : c → z, G : {z, x} → ŷ. The sampling term in the cGAN loss function should be modified to z ∼ E(c):

\mathcal{L}_{GAN}(G,D,E) = \mathbb{E}_{x,y \sim p_{data}(x,y)}[\log D(x,y)] + \mathbb{E}_{x \sim p_{data}(x),\, z \sim E(c)}[\log(1 - D(x, G(x,z)))]. \qquad (2)

It is noted that the original GAN loss may cause mode collapse, which reduces the diversity of the results. In our training, we therefore use the WGAN [ACB17] loss, namely

\mathcal{L}_{GAN}(G,D,E) = \mathbb{E}_{x,y \sim p_{data}(x,y)}[D(x,y)] - \mathbb{E}_{x \sim p_{data}(x),\, z \sim E(c)}[D(x, G(x,z))]. \qquad (3)

We also use the image L1 loss to keep ŷ as close as possible to y, namely mapping the fabric pattern onto the contour image; its sampling term is modified in the same way:

\mathcal{L}_{1}^{im}(G,E) = \mathbb{E}_{x,y \sim p_{data}(x,y),\, z \sim E(c)}[\,\|y - G(x,z)\|_{1}\,]. \qquad (4)
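For concreteness, the following PyTorch-style sketch shows how the loss terms in Equations (2)–(4) could be computed with the fabric-encoded latent code z ∼ E(c). The modules E, G and D and the helper name are illustrative assumptions, not the implementation released with the paper.

```python
import torch

def global_losses(E, G, D, x, c, y, lambda_im=10.0):
    """Sketch of Equations (2)-(4) using the WGAN-style critic of Eq. (3).

    x: contour/sketch batch, c: fabric patch batch, y: ground-truth garments.
    E, G, D are assumed nn.Modules; D(x, img) is a conditional critic.
    """
    z = E(c)                                   # latent code from the fabric patch, z ~ E(c)
    y_hat = G(x, z)                            # generated garment image

    # Eq. (3): the critic scores real pairs high and generated pairs low.
    d_real = D(x, y).mean()
    d_fake = D(x, y_hat).mean()
    loss_D = -(d_real - d_fake)                # minimized by the critic
    loss_G_adv = -d_fake                       # generator raises the critic score of fakes

    # Eq. (4): L1 reconstruction of the ground truth, sampled with z ~ E(c).
    loss_L1 = torch.mean(torch.abs(y - y_hat))

    return loss_D, loss_G_adv + lambda_im * loss_L1
```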


Figure 2: Architecture of netA and netB. The generator, the discriminator and the encoder are the same in the two architectures; netA and netB are trained alternately. (a) Architecture of netA: the fabric pattern c is encoded by E (ResNet, with a KL loss); the contour image x and the latent code are fed to the generator G (U-Net), whose output ŷ is judged by the discriminator D (PatchGAN) and compared to the ground truth y with an L1 loss, while a local texture loss module compares fabric patches c_p and ŷ_p through VGG style (Gram) matrices, an L1 loss and a second discriminator D2. (b) Architecture of netB: the contour image x and a random latent vector are fed to G (U-Net); D (PatchGAN) judges the output against the ground truth y, and the encoder E (ResNet) maps the generated image back to a latent vector with an L1 loss.

We have demonstrated that BicycleGAN struggles to restore texture details in the fabric pattern using the aforementioned global losses L_GAN and L1^im. In order to describe the fine-grained texture, we add a local loss module, referencing TextureGAN [XSA∗18], to netA. These local loss terms include a local style loss L_s, a local pixel loss L_p, and a local discriminator loss L_gan. Different from TextureGAN, FashionGAN aims to encode the input fabric pattern and then output the mapping result together with diverse reference results; its architecture is based on cVAE-GAN and cLR-GAN. TextureGAN, in contrast, transforms an input image attached with a fabric patch into a realistic image, essentially trying to "propagate" the texture patch within the contour image, and its architecture is based on cGAN. Following the architecture of netA, G generates an image ŷ that maps the input fabric pattern c. We sample two patches ŷ_p and c_p of random size, from 45 × 45 to 65 × 65, from ŷ and c.

The local style loss L_s is inspired by deep-learning-based texture synthesis and style transfer work [GEB15, GEB16]. The style of an image can be represented by the Gram matrices of its feature maps, which are extracted from a pre-trained classification CNN; namely, the inner product of the vectorized feature maps i and j in layer l:

G^{l}_{ij} = \sum_{k} F^{l}_{ik} F^{l}_{jk}. \qquad (5)

As a CNN contains more than one convolution layer, we select Gram matrices at multiple scales. Let G^{l}_{ij} and A^{l}_{ij} be the Gram matrices of ŷ_p and c_p in layer l, respectively. The local style loss is:

L_{s} = \frac{1}{|L|} \sum_{l \in L} \frac{1}{4 N_{l}^{2} M_{l}^{2}} \sum_{i,j} \big(G^{l}_{ij} - A^{l}_{ij}\big)^{2}, \qquad (6)

where N_l is the number of feature maps in layer l, M_l is the number of pixels in each feature map, and L represents the chosen layers. Following Gatys et al. [GEB15], we choose the feature maps of ŷ_p and c_p on layers conv2_2, conv3_2 and conv4_2 of the VGG-19 network and use Equation (6) to compute the style loss among them.
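As an illustration, here is a minimal PyTorch sketch of the Gram matrices of Equation (5) and the multi-scale style loss of Equation (6), assuming torchvision's pre-trained VGG-19; the layer indices used for conv2_2, conv3_2 and conv4_2 follow torchvision's standard layout and are assumptions, not part of the paper.

```python
import torch
from torchvision import models

# Assumed indices of conv2_2, conv3_2 and conv4_2 in torchvision's VGG-19 `features`.
VGG_LAYERS = {"conv2_2": 7, "conv3_2": 12, "conv4_2": 21}
vgg = models.vgg19(pretrained=True).features.eval()

def vgg_features(img):
    """Collect the feature maps of the chosen layers for a (B, 3, H, W) batch."""
    feats, x = {}, img
    for i, layer in enumerate(vgg):
        x = layer(x)
        for name, idx in VGG_LAYERS.items():
            if i == idx:
                feats[name] = x
    return feats

def gram_matrix(feat):
    """Eq. (5): inner products of vectorized feature maps, per image."""
    b, n, h, w = feat.shape
    f = feat.view(b, n, h * w)
    return torch.bmm(f, f.transpose(1, 2))          # (B, N_l, N_l)

def local_style_loss(y_patch, c_patch):
    """Eq. (6): normalized squared Gram differences averaged over the chosen layers."""
    fy, fc = vgg_features(y_patch), vgg_features(c_patch)
    loss = 0.0
    for name in VGG_LAYERS:
        b, n, h, w = fy[name].shape
        m = h * w
        diff = gram_matrix(fy[name]) - gram_matrix(fc[name])
        loss = loss + diff.pow(2).sum(dim=(1, 2)) / (4.0 * n ** 2 * m ** 2)
    return (loss / len(VGG_LAYERS)).mean()
```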
The pixel loss. We use the L1 loss to ensure that the generated texture is the same as that in the input fabric pattern, namely

L_{p} = \|\hat{y}_{p} - c_{p}\|_{1}. \qquad (7)

The local discriminator loss L_gan. We train another conditional discriminator D2 to judge whether two patches have the same texture. As with D, G and E, the value function of D2, G and E is:

L_{gan}(G, D_{2}, E) = \mathbb{E}_{c_{p} \sim p(c_{p})}[\log D_{2}(c_{p}, c_{p})] + \mathbb{E}_{x \sim p_{data}(x),\, c_{p} \sim p(c_{p}),\, z \sim E(c)}[\log(1 - D_{2}(c_{p}, W[G(x, z)]))], \qquad (8)

where G(x, z) = ŷ and W(ŷ) = ŷ_p. A pair of patches (c_p, c_p) with the same texture is regarded as a positive example; otherwise (c_p, ŷ_p) is regarded as a negative example.

To sum up, the local texture loss consists of the above three terms, namely:

L_{t} = \lambda_{s} L_{s} + \lambda_{p} L_{p} + \lambda_{gan} L_{gan}. \qquad (9)
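A sketch of how the patch sampling and the three local terms of Equation (9) could be assembled is given below (PyTorch-style). D2, the default weights passed in, and the resizing of patches are illustrative assumptions; local_style_loss refers to the sketch above.

```python
import random
import torch
import torch.nn.functional as F

def random_patch(img, lo=45, hi=65):
    """Crop one random square patch with side length in [lo, hi] from (B, C, H, W)."""
    _, _, h, w = img.shape
    s = random.randint(lo, min(hi, h, w))              # keep the patch inside the image
    top, left = random.randint(0, h - s), random.randint(0, w - s)
    return img[:, :, top:top + s, left:left + s]

def local_texture_loss(y_hat, c, D2, lam_s=2000.0, lam_p=1.0, lam_gan=1.0):
    """Sketch of Eq. (9): L_t = lam_s*L_s + lam_p*L_p + lam_gan*L_gan on sampled patches.

    D2 is an assumed patch discriminator returning the probability that the two
    patches share the same texture; local_style_loss is the sketch above.
    """
    y_p = random_patch(y_hat)                          # patch W[G(x, z)] from the output
    c_p = random_patch(c)                              # patch from the fabric pattern
    c_p = F.interpolate(c_p, size=y_p.shape[-2:])      # align sizes for the pixel loss

    L_s = local_style_loss(y_p, c_p)                   # Eq. (6)
    L_p = torch.mean(torch.abs(y_p - c_p))             # Eq. (7)
    L_gan = torch.mean(torch.log(D2(c_p, c_p) + 1e-8)  # Eq. (8), value-function form
                       + torch.log(1.0 - D2(c_p, y_p) + 1e-8))

    return lam_s * L_s + lam_p * L_p + lam_gan * L_gan
```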

Therefore, combining the global and local terms, the loss function of netA is:

L_{netA}(G, D, E) = \mathcal{L}_{GAN}(G, D, E) + \lambda_{im} \mathcal{L}_{1}^{im}(G, E) + \lambda_{KL} \mathcal{L}_{KL}(E) + L_{t}. \qquad (10)

As in BicycleGAN, the cLR-GAN is taken as our netB. In general, a fashion designer expects to see more reference results besides the desired one. NetB plays the role of increasing output diversity by inputting random noise, as well as ensuring that each random latent vector z ∼ N(0, 1) can generate a rational result: it maps a random noise vector to a generated image, namely z → ŷ, and then maps the generated image back to the noise, namely ŷ → ẑ. Multi-modal results can thus be obtained with netB. The value function of netB is:

L_{netB}(G, D, E) = \mathcal{L}_{GAN}(G, D) + \lambda \mathcal{L}_{1}^{vec}(G, E). \qquad (11)

In summary, the overall value function of FashionGAN is:

L_{All} = L_{netA}(G, D, E) + L_{netB}(G, D, E). \qquad (12)

Figure 2 shows the architecture of netA and netB. When training FashionGAN, triplets {x^(i), y^(i), c^(i)} are input.


G, D and E in netA are trained once according to Equation (10) by executing forward and backward propagation; then, keeping the same G, D and E, netB is trained once according to Equation (11). This alternation is repeated.
Figure 3: Architecture of FashionGAN in the test phase (the input cloth is passed through the encoder, and the fashion sketch plus the resulting latent code are passed through the generator to produce the mapping result and diverse results).

Figure 3 shows the architecture of FashionGAN in the test phase. Users just input a fashion sketch or contour of a garment together with a fabric image, and the generator outputs a realistic garment image with the desired fabric. Users can also input a list of random latent vectors to obtain diverse references.
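A minimal sketch of this test-phase use, assuming trained modules E and G (function names, argument names and tensor shapes are illustrative):

```python
import torch

@torch.no_grad()
def preview_garment(E, G, sketch, fabric, n_extra=3, z_dim=8):
    """Test-phase use as in Figure 3 (module and argument names are illustrative).

    sketch: (1, 1, 256, 256) contour or fashion-sketch tensor.
    fabric: (1, 3, 64, 64) fabric pattern tensor.
    """
    z = E(fabric)                                    # fabric -> latent code
    mapped = G(sketch, z)                            # garment matching sketch + fabric
    references = [G(sketch, torch.randn(1, z_dim))   # random z ~ N(0, 1) for diversity
                  for _ in range(n_extra)]
    return mapped, references
```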
4. Implementation

In this section, we first describe our garment dataset and the experimental environment. Then we introduce the architecture and training details.
4.1. Dataset and experiment environment

We built a garment dataset. Garment images should have a clean background to avoid generating unreasonable results, so we collected a large number of garment images and manually selected the desired samples. Our data comes from two different sources:
• Crawling 5,000 images from E-commerce websites such as Taotaosou†.
• Filtering 19,000 images from public large-scale garment datasets such as DeepFashion‡ [LLQ∗16].

Then we built a fabric pattern dataset by cropping a patch from each image in the garment dataset. In order to train G, D and E, garment image ground truths, the corresponding contour images extracted by HED, and the fabric pattern images are input to FashionGAN. The training set contains 21,000 pure-color garment images and 1,000 striped garment images; the test set contains 1,800 pure-color garment images and 100 striped garment images. We trained two different models on these two datasets. Each model was trained for 160 epochs on an NVIDIA GeForce GTX 1080 Ti. At test time, we just input a fashion sketch/contour/garment image together with the desired fabric pattern, and FashionGAN outputs the mapping results within 0.5 s per image.

† http://www.taotaosou.com/
‡ http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html
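A small sketch of how one training triplet could be assembled from a garment photo follows. The paper uses HED [XT15] for contour extraction; the Canny edge map below is only a lightweight stand-in so the example stays self-contained, and the function name is hypothetical.

```python
import random
import cv2

def make_training_triplet(garment_path, image_size=256, patch_size=64):
    """Build one (contour x, fabric patch c, ground truth y) triplet.

    Contours in the paper come from HED; cv2.Canny is used here only as a
    self-contained stand-in for that edge detector.
    """
    y = cv2.imread(garment_path)
    y = cv2.resize(y, (image_size, image_size))

    gray = cv2.cvtColor(y, cv2.COLOR_BGR2GRAY)
    x = cv2.Canny(gray, 100, 200)                     # contour image x (HED stand-in)

    top = random.randint(0, image_size - patch_size)  # fabric pattern c: random crop of y
    left = random.randint(0, image_size - patch_size)
    c = y[top:top + patch_size, left:left + patch_size]

    return x, c, y
```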
4.2. Architecture details

This subsection describes the implementation details of the encoder E, generator G and discriminator D. For E, we use ResNet [HZRS16], which is extensively employed in image classification and visual feature description; a 64 × 64 fabric pattern image is input to E. For G, we also use the U-Net architecture, as proposed in DCGAN [RMC15], the first CNN version of GAN. The inputs of G are a garment contour image of size 256 × 256 × 1 and an 8 × 1 encoded latent vector. At test time, a user can input an arbitrary contour, fashion sketch or garment image to G. As for D, following pix2pix we utilize the PatchGAN discriminator, which can be understood as a form of texture/style loss. The networks G, D and E are composed of basic modules such as residual or convolution blocks, so they can easily be extended to higher-resolution images; we also implement 512 × 512 and 1024 × 1024 versions. In order to compare with the baseline methods, we use the 256 × 256 version.
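The following sketch shows how the encoder side could be assembled in PyTorch, with a torchvision ResNet-18 standing in for the ResNet encoder; only the 64 × 64 input and the 8-dimensional latent code follow the text, and the generator/discriminator classes named in the comment are hypothetical.

```python
import torch.nn as nn
from torchvision import models

class FabricEncoder(nn.Module):
    """Maps a 64x64 fabric patch to the 8-dimensional latent code described above.

    A torchvision ResNet-18 is used here as an assumed stand-in for the ResNet
    encoder; only the input size and the latent dimension follow the text.
    """
    def __init__(self, z_dim=8):
        super().__init__()
        backbone = models.resnet18(pretrained=False)
        backbone.fc = nn.Linear(backbone.fc.in_features, z_dim)
        self.net = backbone

    def forward(self, fabric):            # fabric: (B, 3, 64, 64)
        return self.net(fabric)           # (B, 8)

# The generator (a U-Net over the 256x256x1 contour plus the latent code, e.g.
# tiled and concatenated as extra channels) and the PatchGAN discriminator are
# assumed to be defined elsewhere, for example:
#   G = UNetGenerator(in_channels=1 + 8, out_channels=3)    # hypothetical class
#   D = PatchDiscriminator(in_channels=1 + 3)                # hypothetical class
```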
4.3. Training details

In order to reduce mode collapse, we apply the WGAN loss [ACB17] to train G and D. Following the authors' advice, we use the RMSprop optimizer with a learning rate of 1e-4 and a batch size of 1. For all our experiments, we use different weight settings for the losses: for netA we set λ_im = 10 and λ_KL = 0.01 in Equation (10), and λ_s = 2,000, λ_p = 1, λ_gan = 1 in Equation (9); for netB we set λ = 0.5 in Equation (11).
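Putting the alternation of Section 3.2 together with these settings, a training loop could look like the sketch below; the four loss helpers are hypothetical wrappers around Equations (10) and (11), not released code.

```python
import itertools
import torch

def train_fashiongan(E, G, D, D2, loader, epochs=160, lr=1e-4):
    """Alternating netA / netB updates with the Section 4.3 settings
    (RMSprop, learning rate 1e-4, batch size 1). netA_disc_loss, netA_gen_loss,
    netB_disc_loss and netB_gen_loss are hypothetical helpers for Eqs. (10)-(11).
    """
    opt_ge = torch.optim.RMSprop(itertools.chain(G.parameters(), E.parameters()), lr=lr)
    opt_d = torch.optim.RMSprop(itertools.chain(D.parameters(), D2.parameters()), lr=lr)

    for _ in range(epochs):
        for x, c, y in loader:                 # contour, fabric patch, ground truth
            # netA step (Eq. 10): critic update on detached fakes, then G/E update.
            opt_d.zero_grad(); netA_disc_loss(E, G, D, D2, x, c, y).backward(); opt_d.step()
            opt_ge.zero_grad(); netA_gen_loss(E, G, D, D2, x, c, y).backward(); opt_ge.step()

            # netB step (Eq. 11): same pattern with a random code z ~ N(0, 1).
            opt_d.zero_grad(); netB_disc_loss(G, D, x, y).backward(); opt_d.step()
            opt_ge.zero_grad(); netB_gen_loss(E, G, D, x, y).backward(); opt_ge.step()
```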
5. Experiment results and analysis

In this section, we present baselines and evaluation metrics, conduct comparative experiments on these baselines, and give a qualitative and quantitative evaluation of the results. Lastly, we show further examples of fashion design, such as garment parts, handbags, shoes, and user-generated sketches.

5.1. Baselines

We compare against five of the most closely related projects:
• pix2pix [IZZE17]. We use paired data (ground truth and contour image) to train it.
• CycleGAN [ZPIE17]. An unsupervised model trained with an adversarial loss and a cycle reconstruction loss in two domains A and B. In our experiment, contour images are in domain A and ground truths are in domain B.
• BicycleGAN [ZZP∗17]. It can specify the content of the generated image by encoding the ground truth, and it can also generate multi-modal outputs from random noise. We also use paired data to train it.
• MUNIT [HLBK18]. Another unsupervised model, consisting of two auto-encoders E_A, E_B and using content and style latent codes. In our experiment, the contour image and the ground truth are input to E_A and E_B separately.
• TextureGAN [XSA∗18]. A cGAN model whose condition is a contour image with a fabric patch. We assume the fabric patch is located in the center of the contour.

For the sake of fairness, we train all models for 160 epochs.


Figure 4: Example results of our method. (a) Mapping a fabric pattern to a fashion sketch. (b) Mapping a fabric pattern to a garment image. (c) Mapping a fabric pattern to a contour image. (d) Mapping a stripe fabric pattern to a contour image. (e) Shoes, bags, garment parts and user-generated sketch mapping examples. The first column of each sub-figure shows the input contour, fashion sketch or garment image; the second column is the input fabric pattern; the third column is the corresponding mapping result; the remaining columns contain more reference results.
Figure 5: Results of our method and other methods. (a) Mapping a fabric pattern to a fashion sketch. (b) Mapping a stripe fabric pattern to a contour image. (c) Reconstructing the ground truth. (d) Mapping a fabric pattern to a contour image. Columns 1–3 are our results; the following columns are the results of pix2pix, BicycleGAN, MUNIT and CycleGAN, and the last two columns show the results of TextureGAN.

5.2. Evaluation metrics

We use the following quantitative metrics to evaluate our model and the baselines.

Human preference. To compare the mapping results of the different methods, we perform a human perceptual study on the website of our laboratory. 30 volunteers are shown 100 random input image pairs, each consisting of a contour/fashion sketch and the corresponding fabric pattern, together with 6 different output results for each pair. They grade the 6 outputs on color accuracy, shape accuracy, and overall realism, where the highest score is 5 and the lowest is 0. Lastly, we average these scores.

Table 1: Human performance.

Inception score. The inception score (IS) [SVI∗16, SGZ∗16] is a well-known metric for quantitatively evaluating generated results on a labelled dataset:

IS = \exp\big(\mathbb{E}_{x}\, KL(p(y|x)\,\|\,p^{*}(y))\big), \qquad (13)

where x is one sample, p(y|x) is the softmax output of a trained classifier, and p^*(y) is the overall label distribution of the generated samples. A higher inception score indicates high-quality and diverse results.
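A sketch of how Equation (13) could be evaluated with a pre-trained classifier (here torchvision's Inception v3); preprocessing and batching are omitted and assumed to be handled by the caller.

```python
import torch
import torch.nn.functional as F
from torchvision.models import inception_v3

@torch.no_grad()
def inception_score(images):
    """Eq. (13): IS = exp( E_x KL( p(y|x) || p*(y) ) ).

    `images` is an (N, 3, 299, 299) batch already preprocessed for Inception v3.
    """
    net = inception_v3(pretrained=True, transform_input=False).eval()
    probs = F.softmax(net(images), dim=1)            # p(y|x) for every sample
    marginal = probs.mean(dim=0, keepdim=True)       # p*(y), marginal over the samples
    kl = (probs * (probs.log() - marginal.log())).sum(dim=1)
    return torch.exp(kl.mean()).item()
```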
5.3. Comparison against baselines

In this subsection, we first conduct four different groups of experiments (Figure 4) to illustrate our model. Then we conduct comparative experiments on the baselines and give a qualitative and quantitative analysis. Lastly, we present more general results of our model.

Figure 4 (a), (b) and (c) present examples of mapping a fabric pattern to a fashion sketch, a garment image and a contour image, and Figure 4 (d) shows examples of mapping a stripe fabric pattern to a contour. The results demonstrate that FashionGAN can generate desired, realistic and reasonable garment images. In particular, clear body curves can be observed in the third row of Figure 4 (c), so FashionGAN can also distinguish men's and women's garments. Besides the single-color fabric pattern mapping problem, FashionGAN can also map stripe patterns since it contains the local loss module.

Figure 5 compares FashionGAN with pix2pix [IZZE17], BicycleGAN [ZZP∗17], MUNIT [HLBK18], CycleGAN [ZPIE17] and TextureGAN [XSA∗18]. We retrain these models on our dataset containing 24,000+ garment images.

Figure 5 (a) and (d) show the fashion sketch and contour image mapping examples. Except for pix2pix, all mapping results keep the outline of the input fashion sketch; however, only our model maps the color correctly while keeping the internal details. Pix2pix can generate realistic images, but the color is totally mismatched to the input fabric pattern. The reason is that pix2pix tries to build a one-to-one mapping between the ground truth and the contour image; when the ground truth does not exist, it cannot generate the desired output. BicycleGAN generates mapping results that keep the color only partially, because the encoded latent vector contains only part of the visual information of the fabric pattern. The results of MUNIT keep the color, but the shape, and especially the internal details, of the generated images are deficient. The MUNIT model is trained in an unsupervised manner and uses a style code to control the results; its style encoder consists of several strided convolutional layers followed by a global average pooling layer and a fully connected (FC) layer [HLBK18]. This architecture works well for natural images but is not suitable for fashion sketches or contour images. The results of CycleGAN are the worst: it cannot map the color correctly. TextureGAN tries to "propagate" the input fabric patch, so its results depend on the size and position of the patch; when the patch is larger and located at the center of the fashion sketch, the filling results are better. In addition, Xian et al. [XSA∗18] state that they need to specify an image mask of the filling region, which brings complex operations to the user. Compared with these methods, our FashionGAN does not need the mask: the color and texture of the generated image are controlled by the encoded latent vector, and the shape is constrained by the fashion sketch.

Figure 5 (b) shows the stripe fashion sketch mapping examples. We can observe that pix2pix cannot correctly map shape and color, and CycleGAN cannot inherit the color of the fabric pattern. BicycleGAN cannot map the details of the stripe fabric pattern into the contour; its results are close to the average color of the input pattern. The results of MUNIT keep the stripes but are not smooth enough compared with ours. The reasons for these experimental results are the same as for (a). To a certain extent, the texture can be propagated using TextureGAN; for a regular texture like stripes, however, this level of filling degrades the visual effect. Figure 5 (c) shows examples of reconstructing the ground truth. We find that our model and CycleGAN can reconstruct the shape and color of the ground truth well. As shown in the second row of Figure 5 (c), the results of pix2pix and BicycleGAN exhibit color deficiencies, and the results of MUNIT and TextureGAN are relatively worse.

Table 1 lists the average scores for color accuracy, shape accuracy and overall impression of the generated garment images. First, the unsupervised models (CycleGAN, MUNIT) obtain lower scores, indicating that they are not appropriate for specified image-to-image translation. Second, the supervised models achieve higher scores in shape accuracy; they all belong to the cGAN family, where the input contour is the constraint condition of the generator, causing the generated image to keep the garment shape. Lastly, FashionGAN, BicycleGAN, MUNIT and TextureGAN obtain higher color accuracy, indicating that if color information is provided, the generator can output colorful garment images. MUNIT and TextureGAN try to convey this information through the conditions of the generator, while FashionGAN and BicycleGAN use the latent vector to convey it. BicycleGAN encodes the input ground truth, and its generator is also constrained by the contour image, so there is always redundancy and inconsistency, such as contour information appearing in both the latent vector and the condition. Therefore, we utilize the encoder to express the visual information contained in the fabric pattern and use the condition of the generator to convey the shape information; the experimental results show that this is the most effective arrangement.


Table 2: Inception score.

Test case                       FashionGAN  pix2pix  BicycleGAN  CycleGAN  MUNIT  TextureGAN
Contour + cloth patch              3.408     3.454     3.368      3.680    3.466     3.171
Contour + ground truth             2.643     2.071     2.396      2.191    2.498     2.282
Fashion sketch + cloth patch       4.567     4.393     4.502      4.184    5.074     3.840
Contour + stripe patch             3.097     3.183     3.421      3.128    3.323     3.383
Average                            3.429     3.275     3.422      3.296    3.590     3.169

Table 2 displays the inception scores of the different methods in four test cases. This score reflects the diversity and quality of the generated images. Except for MUNIT, our FashionGAN is better than the other models. TextureGAN and pix2pix are one-to-one mapping models, so their inception scores are lower. Our multi-modal module references BicycleGAN, and their inception scores are very close. MUNIT is the state-of-the-art multi-modal model and achieves the highest IS; the reason is that it assumes all images share a content space. In future work, we plan to add a content encoder to improve the diversity of our results.

5.4. More applications

FashionGAN is a robust model. Figure 4 (e) shows some mapping examples of garment parts and user-generated sketches. These test cases are totally different from the training data, but they are still mapped correctly by the model trained on the garment dataset. Besides garment design, shoe and bag design are very common in fashion, so we also retrain FashionGAN on the shoes and handbags datasets provided by BicycleGAN [ZZP∗17]; Figure 4 gives some examples of test cases, and FashionGAN generates the desired results when the fabric pattern is specified. FashionGAN can further be extended to general image-to-image translation such as colorization: Figure 6 shows examples like changing the color of seawater, grass or a fabric pattern.

Figure 6: Examples of general image-to-image results.

5.5. Limitation and future work

Figure 7 shows some failed mapping examples. Our FashionGAN works for single-color and regular (e.g. stripe) fabric patterns, but it cannot map an irregular one. Specifically, we selected 2,000 garment images with irregular textures and trained FashionGAN on them; the generator only outputs images with the average color of the input fabric pattern. We analyze the reasons as follows. First, the encoder in our network can learn the mapping between a single color or a simple color combination (like stripes) and the latent vector, but there are many kinds of irregular textures in real scenes, each containing complex color combinations, and it is hard to describe them with a latent vector in the current architecture. Second, irregular textures come in varied shapes, and the current local loss module cannot reconstruct them well. In the future, we aim to construct a new local loss module that describes irregular textures. What is more, we will extend our model to more general image translation tasks, such as image colorization and style transfer.

Figure 7: Examples of some failed mapping results.

6. Conclusion

We propose FashionGAN for exhibiting the design effect of garments. Our model is an end-to-end framework: users only need to input a desired fabric image and a fashion sketch, and a garment image based on the input data is automatically generated. Based on cGAN, FashionGAN establishes a bijection between the fabric and the latent vector, so that the latent vector can be explained by the fabric information. Moreover, we explore adding local losses to help generate images with stripe textures. The experiments show that our method can indeed achieve impressive results in fashion design.

7. Acknowledgements

This research is supported by the National Key Research and Development Plan of China (2018YFC0830500), the Guangzhou Science Technology and Innovation Commission (GZSTI16EG14/201704030079), and the National Natural Science Foundation of China (61772140).

References

[ACB17] Arjovsky M., Chintala S., Bottou L.: Wasserstein GAN. arXiv preprint arXiv:1701.07875 (2017).
[BGK∗13] Berthouzoz F., Garg A., Kaufman D. M., Grinspun E., Agrawala M.: Parsing sewing patterns into 3D garments. ACM Transactions on Graphics (TOG) 32, 4 (2013), 1–12.
[DCSF15] Denton E., Chintala S., Szlam A., Fergus R.: Deep generative image models using a Laplacian pyramid of adversarial networks. In Advances in Neural Information Processing Systems (NIPS) (2015), pp. 1486–1494.
[DJW∗06] Decaudin P., Julius D., Wither J., Boissieux L., Sheffer A., Cani M.-P.: Virtual garments: A fully geometric approach for clothing design. In Computer Graphics Forum (2006), vol. 25, Wiley Online Library, pp. 625–634.
[DKD16] Donahue J., Krähenbühl P., Darrell T.: Adversarial feature learning. arXiv preprint arXiv:1605.09782 (2016).
[DPWB17] Digne J., Peytavie A., Wolf C., Benes B.: Interactive example-based terrain authoring with conditional generative adversarial networks. ACM Transactions on Graphics (TOG) 36, 6 (2017), 228.
[FCRC05] Fontana M., Carubelli A., Rizzi C., Cugini U.: ClothAssembler: a CAD module for feature-based garment pattern assembly. Computer-Aided Design and Applications 2, 6 (2005), 795–804.
[GEB15] Gatys L. A., Ecker A. S., Bethge M.: Texture synthesis using convolutional neural networks. In Advances in Neural Information Processing Systems (2015), pp. 262–270.
[GEB16] Gatys L. A., Ecker A. S., Bethge M.: Image style transfer using convolutional neural networks. In Computer Vision and Pattern Recognition (CVPR) (2016), pp. 2414–2423.
[GKN∗18] Ghosh A., Kulharia V., Namboodiri V. P., Torr P. H., Dokania P. K.: Multi-agent diverse generative adversarial networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (June 2018).
[GPAM∗14] Goodfellow I. J., Pouget-Abadie J., Mirza M., Xu B., Warde-Farley D., Ozair S., Courville A., Bengio Y.: Generative adversarial nets. In Advances in Neural Information Processing Systems (NIPS) (2014), pp. 2672–2680.
[HLBK18] Huang X., Liu M.-Y., Belongie S., Kautz J.: Multimodal unsupervised image-to-image translation. In European Conference on Computer Vision (ECCV) (2018).
[HZRS16] He K., Zhang X., Ren S., Sun J.: Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (June 2016).
[IZZE17] Isola P., Zhu J.-Y., Zhou T., Efros A. A.: Image-to-image translation with conditional adversarial networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (July 2017).
[JHR∗15] Jung A., Hahmann S., Rohmer D., Begault A., Boissieux L., Cani M.-P.: Sketching folds: developable surfaces from non-planar silhouettes. ACM Transactions on Graphics (TOG) 34, 5 (2015), 1–12.
[LLQ∗16] Liu Z., Luo P., Qiu S., Wang X., Tang X.: DeepFashion: Powering robust clothes recognition and retrieval with rich annotations. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (June 2016).
[LSLW16] Larsen A. B. L., Sonderby S. K., Larochelle H., Winther O.: Autoencoding beyond pixels using a learned similarity metric. In International Conference on Machine Learning (ICML) (2016), pp. 1558–1566.
[LTH∗17] Ledig C., Theis L., Huszar F., Caballero J., Cunningham A., Acosta A., Aitken A., Tejani A., Totz J., Wang Z., Shi W.: Photo-realistic single image super-resolution using a generative adversarial network. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (July 2017).
[MO14] Mirza M., Osindero S.: Conditional generative adversarial nets. Computer Science (2014), 2672–2680.
[PKD∗16] Pathak D., Krahenbuhl P., Donahue J., Darrell T., Efros A. A.: Context encoders: Feature learning by inpainting. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (June 2016).
[RAY∗16] Reed S. E., Akata Z., Yan X., Logeswaran L., Schiele B., Lee H.: Generative adversarial text to image synthesis. In International Conference on Machine Learning (ICML) (2016), pp. 1060–1069.
[RMC15] Radford A., Metz L., Chintala S.: Unsupervised representation learning with deep convolutional generative adversarial networks. Computer Science (2015).
[RMSC11] Robson C., Maharik R., Sheffer A., Carr N.: Context-aware garment modeling from sketches. Computers & Graphics (CG) 35, 3 (2011), 604–613.
[SGZ∗16] Salimans T., Goodfellow I., Zaremba W., Cheung V., Radford A., Chen X.: Improved techniques for training GANs. In Advances in Neural Information Processing Systems (2016), pp. 2234–2242.
[SLF∗17] Sangkloy P., Lu J., Fang C., Yu F., Hays J.: Scribbler: Controlling deep image synthesis with sketch and color. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (July 2017).
[SVI∗16] Szegedy C., Vanhoucke V., Ioffe S., Shlens J., Wojna Z.: Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016), pp. 2818–2826.
[TWB∗07] Turquin E., Wither J., Boissieux L., Cani M.-P., Hughes J. F.: A sketch-based interface for clothing virtual characters. IEEE Computer Graphics and Applications (CGA) 27, 1 (2007).
[VCMT05] Volino P., Cordier F., Magnenat-Thalmann N.: From early virtual garment simulation to interactive fashion design. Computer-Aided Design 37, 6 (2005), 593–608.
[WLZ∗18] Wang T.-C., Liu M.-Y., Zhu J.-Y., Tao A., Kautz J., Catanzaro B.: High-resolution image synthesis and semantic manipulation with conditional GANs. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (June 2018).
[XSA∗18] Xian W., Sangkloy P., Agrawal V., Raj A., Lu J., Fang C., Yu F., Hays J.: TextureGAN: Controlling deep image synthesis with texture patches. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (June 2018).
[XT15] Xie S., Tu Z.: Holistically-nested edge detection. International Journal of Computer Vision 125, 1-3 (2015), 1–16.
[YCYL∗17] Yeh R. A., Chen C., Yian Lim T., Schwing A. G., Hasegawa-Johnson M., Do M. N.: Semantic image inpainting with deep generative models. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (July 2017).
[ZIE16] Zhang R., Isola P., Efros A. A.: Colorful image colorization. In European Conference on Computer Vision (2016), Springer, pp. 649–666.
[ZPIE17] Zhu J.-Y., Park T., Isola P., Efros A. A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In IEEE International Conference on Computer Vision (ICCV) (Oct 2017).
[ZXL∗17] Zhang H., Xu T., Li H., Zhang S., Huang X., Wang X., Metaxas D.: StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks. In IEEE International Conference on Computer Vision (ICCV) (2017), pp. 5907–5915.
[ZZP∗17] Zhu J.-Y., Zhang R., Pathak D., Darrell T., Efros A. A., Wang O., Shechtman E.: Toward multimodal image-to-image translation. In Advances in Neural Information Processing Systems (NIPS) (2017), Curran Associates, Inc., pp. 465–476.
