FashionGAN: Display Your Fashion Design Using Conditional Generative Adversarial Nets
Figure 1: Some examples (fashion sketch to garment image; garment image to another garment image; contour image to garment image). FashionGAN simplifies the pipeline of traditional virtual garment display: it directly generates specified garment images from garment design elements such as fashion sketches and fabric patterns. In addition, FashionGAN is reusable, which means existing design elements (e.g. contours or garment images) can be reused to generate other garment images.
Abstract
Virtual garment display plays an important role in fashion design, since it can directly show the design effect of a garment without making a physical sample as in the traditional clothing industry. In this paper, we propose an end-to-end virtual garment display method based on Conditional Generative Adversarial Networks. Different from existing 3D virtual garment methods, which need complex interactions and domain-specific user knowledge, our method only requires users to input a desired fashion sketch and a specified fabric image; an image of the virtual garment, whose shape and texture are consistent with the input fashion sketch and fabric image, is then produced quickly and automatically. Moreover, the method can also be extended to contour images and garment images, which further improves the reuse rate of fashion designs. Compared with existing image-to-image methods, the images generated by our method are better in terms of color and shape.
CCS Concepts
• Networks → Network reliability; • Computing methodologies → Computer vision;
…help the customer preview the virtual garment style through their chosen.

Plenty of display methods have been proposed to preview virtual garments in recent years. Most of them focus on displaying 3D virtual garments. These methods display details of the garment in the design stage from different views without making sample garments, which reduces the cost of research and development in garment production. However, the pipeline of 3D virtual garment display contains a series of complex operations, such as making patterns from fashion sketches and sewing 2D patterns, and these operations require a great deal of interaction from experienced users.

Different from the 3D virtual garment display methods, we propose a novel end-to-end 2D virtual garment display framework based on conditional Generative Adversarial Nets. Users only need to input a fashion sketch and a fabric image into the framework; a garment image, whose shape matches the fashion sketch and whose texture follows the fabric pattern, is generated automatically. This makes it very convenient for fashion designers and consumers to preview the realistic effect of a garment in the design phase. Thus, the proposed framework greatly improves the efficiency of research and development in garment production.

Generating a virtual garment image from a fashion sketch and a fabric image is a problem of image-to-image translation. Convolutional neural networks are powerful deep learning models that can solve such problems [ZIE16]. However, usual CNN architectures require pre-specified loss functions, and such losses may not be suitable for some realistic images. In recent years, the Generative Adversarial Network (GAN) [GPAM∗14] and its improved models, such as the conditional Generative Adversarial Network (cGAN) [MO14], have been widely used for image-to-image translation.

Pix2pix [IZZE17] first proposed using a cGAN with an input image condition to solve this problem. In pix2pix, the shape of the generated image follows the input, but its color and texture are random. In order to generate a deterministic result, BicycleGAN [ZZP∗17] adds an encoder and establishes a bijection between the output space and the latent space, but the visual information contained in the latent vector is still undetermined and inconsistent with the image condition of the generator. TextureGAN [XSA∗18] tries to attach extra information (like a cloth patch) to the image condition, but its results depend on the position and size of the attached patch. The above models are trained with paired images. Recently, CycleGAN [ZPIE17] and MUNIT [HLBK18] proposed using unsupervised learning to solve the image translation problem: a user obtains the desired image by specifying a content latent vector or by transferring from an existing image. However, these methods struggle to generate images containing detailed texture, for example a striped garment image in fashion design. We design a Generative Adversarial Network (GAN) architecture for previewing garment designs, named FashionGAN, which maps a desired fabric image to a selected fashion sketch or a contour image of the garment (see Figure 1). This architecture is similar to BicycleGAN [ZZP∗17], but we encode the fabric image to generate the latent vector and define a novel loss function, which makes the architecture more suitable for generating virtual garment images.

Our main contributions are as follows:

• We propose an end-to-end 2D virtual garment display framework which is seamlessly integrated with the existing methods of garment design and production in the fashion industry. The framework can automatically generate virtual garment images in the design phase without any complex operations or experienced user interactions, which greatly improves the efficiency of garment design and production.
• By directly encoding the fabric image to generate latent codes containing visual information, our method can generate a garment image with a specified fabric image, while other existing image-to-image translation methods cannot establish a one-to-one map between the garment image and the fabric image. Furthermore, the latent-vector encoding ensures that the generated garment image has similar fabric information even if the fabric image does not belong to the training dataset.
• Inspired by the architecture of BicycleGAN, we design a two-to-one GAN architecture which can generate an image with specified visual information, and we formulate a loss function to generate regular texture images, like stripe textures.

2. Related Work

Virtual garment display. Virtual garment display is useful for improving the efficiency of fashion design. Graphics researchers have investigated virtual garment display for decades, but most of their efforts emphasize dressed characters. These methods are mainly divided into two categories: 2D pattern-based garment modeling and sketch-based garment modeling. 2D pattern-based garment modeling systems [VCMT05, FCRC05, BGK∗13] allow users to edit patterns in 2D and use physics-based simulation to generate a 3D garment model. These systems require significant time and domain-specific user knowledge to create patterns that achieve a desired 3D garment. Sketch-based virtual garment modeling systems allow users to trace garment silhouettes on top of a mannequin and then automatically infer a plausible 3D garment shape [DJW∗06, TWB∗07, RMSC11, JHR∗15]. These systems create garment models which are frequently simple, non-physical and hence unrealizable. Different from these 3D virtual garment display methods, we propose a 2D virtual garment display based on the generative adversarial model, which avoids the demand for domain-specific user knowledge and the time consumption of traditional garment modeling.

Image-to-image GAN. The generative adversarial network (GAN) [GPAM∗14] is an impressive model that can generate an image from a noise vector. It is composed of a generator and a discriminator which are trained in an adversarial manner. Due to the random output of the original GAN, researchers have tried to add conditions to constrain the generator; the conditional GAN (cGAN) [MO14] was one of those improved models. Types of conditions include discrete labels [MO14, DCSF15], texts [RAY∗16, ZXL∗17] and so on. In particular, Isola et al. [IZZE17] first proposed regarding an image as the condition, which accomplishes image translation at the pixel level. Afterward, the cGAN with an image condition has been successfully applied to many fields like style transfer [ZPIE17], terrain authoring [DPWB17], high-resolution image generation [WLZ∗18], and so on.
In order to generate the desired results, researchers tried to add stricter image conditions, such as a sketch with some colored scribbles [SLF∗17] or a sketch with masks and texture patches [XSA∗18]. However, these methods find it hard to generate uncommon shapes, colors or textures [SLF∗17]. Inspired by their ideas, we use a sketch to constrain the shape of the generated image; in addition, we use an encoded latent vector to constrain the generator to output arbitrary colors and textures. In order to generate diverse results, some researchers proposed multi-modal methods [ZZP∗17, GKN∗18, HLBK18]. Our FashionGAN is a cGAN architecture with an image condition as well. Instead of using ground truths and sketch/contour images as the paired images to train the network, we add a fabric pattern image to each pair so that the visual information of the fabric pattern is mapped to the contour image naturally. Besides cGAN, adding extra loss terms to the value function of GANs can achieve impressive image-to-image tasks like inpainting [PKD∗16, YCYL∗17] and super-resolution [LTH∗17]. Rather than using a noise latent vector, VAE-GAN [LSLW16] adds a variational auto-encoder that generates visual attribute vectors before the generator of the original GAN.

3. Architecture of FashionGAN

To help users preview the effect of a garment in the design phase, the goal is to provide an image synthesis pipeline that can generate garment images based on a fashion sketch and a specified fabric image. Theoretically, this goal is to learn a mapping G among three different image domains X, C, Y. During the training phase, three images in the three domains, x ∈ X, c ∈ C, y ∈ Y, are given; they are essentially samples of the joint distribution p(x, c, y). During the test phase, given an image x ∈ X and an image c ∈ C, the generated image with the desired content is G(x, c) ∈ Y, a sample of the conditional distribution p(y | x, c). BicycleGAN provides a feasible way to establish this image synthesis pipeline. However, it is a one-to-one mapping between the latent vector and the output space, which does not completely suit our goal. In the following, we first review BicycleGAN and then extend it to formulate our FashionGAN.

3.1. Preliminary

BicycleGAN learns a bijection between two image domains by considering cVAE-GAN [LSLW16] and cLR-GAN [DKD16] simultaneously. cVAE-GAN is essentially a Variational Auto-Encoder (VAE), which reconstructs the ground truth y using the GAN loss L_GAN(G, D, E) and the L1 loss L1_{im}(G, E). The encoded vector E(y) is expected to follow a Gaussian distribution, so a KL divergence term L_{KL}(E) is needed. Thus E : y → z, G : {z, x} → ŷ. cLR-GAN aims to ensure that each random latent vector z can generate a reasonable result, using the adversarial loss term L_GAN(G, D) and the L1 loss term L1_{vec}(G, E); thus G : z → ŷ, E : ŷ → ẑ. cLR-GAN constructs a mapping between the output of G and the latent vector, but its discriminator never sees the ground truth y. Hence, BicycleGAN takes advantage of both cVAE-GAN and cLR-GAN.
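To make the preliminary concrete, the following is a minimal PyTorch-style sketch of the two cycles reviewed above. The module interfaces (E returning a Gaussian mean and log-variance, G taking an image and a latent code, D returning a realism score) and the non-saturating GAN term are illustrative assumptions, not the exact implementation of [ZZP∗17].

```python
import torch
import torch.nn.functional as F

def bicyclegan_generator_losses(G, E, D, x, y):
    """Sketch of the two BicycleGAN cycles (generator/encoder side only).

    cVAE-GAN: y -> E -> z, (z, x) -> G -> y_hat, reconstruct y (L1_im, KL).
    cLR-GAN:  z ~ N(0, I), (z, x) -> G -> y_hat, E recovers z (L1_vec).
    """
    # cVAE-GAN cycle: encode the ground truth and reconstruct it.
    mu, logvar = E(y)
    z_enc = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
    y_hat = G(x, z_enc)
    l_gan_vae = F.softplus(-D(x, y_hat)).mean()        # non-saturating GAN term
    l1_im = F.l1_loss(y_hat, y)                        # image reconstruction
    l_kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())

    # cLR-GAN cycle: a random latent code must be recoverable from the output.
    z_rand = torch.randn_like(mu)
    y_rand = G(x, z_rand)
    l_gan_lr = F.softplus(-D(x, y_rand)).mean()
    mu_hat, _ = E(y_rand)
    l1_vec = F.l1_loss(mu_hat, z_rand)                 # latent reconstruction

    return l_gan_vae, l1_im, l_kl, l_gan_lr, l1_vec
```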
Enlightened by this idea of acquiring input information by encoding and carrying different features through a latent vector, we establish a mapping between the output and the random latent vector. The randomness of the latent vectors ensures the diversity of the output, and encoding the output ensures the rationality of the latent vector.

However, BicycleGAN has two issues that prevent it from being used directly in our fashion design display framework. Firstly, in the cVAE-GAN of BicycleGAN, the latent vector of the input ground truth contains some shape information of the image, while the input contour image, as the condition of the generator, also constrains the shape of the generated image; shape information conveyed by two different inputs may cause redundancy and inconsistency. Furthermore, the ground truth is often unavailable, especially in fashion design. Secondly, texture details in the fabric pattern cannot be restored with only an L1 loss; Figure 5(d) shows that G then just generates the average color of the texture in the fabric pattern.

3.2. FashionGAN

In order to solve these two problems, we improve the model of BicycleGAN. We use the image of the real fabric pattern to train the encoder E so that only the material and color information of the real fabric pattern is contained in the latent vector. In other words, the shape of the final generated image is constrained by the fashion sketch/contour image, and the color together with the material is constrained by the fabric pattern. Then we add an extra local loss module to control the texture synthesized by the generator. For simplicity, we name our solution FashionGAN.

FashionGAN learns a two-to-one mapping among three image domains, namely E : c → z, G : {z, x} → ŷ (see Figure 2). Based on BicycleGAN, we combine two types of networks called netA and netB. The input of netA and netB is a tuple including a contour image x, a fabric pattern image c and the corresponding ground truth (desired garment image) y. x is extracted from y using Holistically-Nested Edge Detection (HED) [XT15], and c is randomly cropped from y. NetA uses the cVAE-GAN architecture. At first, c is encoded by E so that the latent vector z contains the visual information of the fabric pattern; then the cGAN procedure is conducted. Thus E : c → z, G : {z, x} → ŷ, and the sampling term in the cGAN loss function should be modified to z ∼ E(c):

L_{GAN}(G, D, E) = \mathbb{E}_{x, y \sim p_{data}(x, y)}[\log D(x, y)] + \mathbb{E}_{x \sim p_{data}(x),\, z \sim E(c)}[\log(1 - D(x, G(x, z)))].    (2)

It is noted that the original GAN loss may bring mode collapse, which reduces the diversity of results. In our training, we therefore use the WGAN loss [ACB17], namely

L_{GAN}(G, D, E) = \mathbb{E}_{x, y \sim p_{data}(x, y)}[D(x, y)] - \mathbb{E}_{x \sim p_{data}(x),\, z \sim E(c)}[D(x, G(x, z))].    (3)
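A minimal sketch of how the modified sampling z ∼ E(c) enters the WGAN objective of Equation (3) is given below. It assumes E maps a fabric patch directly to a latent vector and D is a critic as in WGAN; the weight clipping (or gradient penalty) that the critic needs is omitted.

```python
def critic_loss_eq3(D, G, E, x, y, c):
    """Critic side of the WGAN objective in Eq. (3): real pairs (x, y) should
    score higher than generated pairs (x, G(x, z)) with z ~ E(c)."""
    z = E(c)                             # latent code from the fabric patch
    y_fake = G(x, z).detach()            # no gradient into G / E for this step
    return -(D(x, y).mean() - D(x, y_fake).mean())

def generator_loss_eq3(D, G, E, x, c):
    """Generator/encoder side of Eq. (3): raise the critic score of G(x, E(c))."""
    return -D(x, G(x, E(c))).mean()
```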
We have demonstrated that BicycleGAN struggles to restore texture details in the fabric pattern using the aforementioned global losses L_GAN and L1_{im}. In order to describe the fine-grained texture, we add a local loss module, referencing TextureGAN [XSA∗18], to netA. These local loss items include the local style loss L_s, the local pixel loss L_p, and the local discriminator loss L_gan. Different from TextureGAN, FashionGAN aims to encode the input fabric pattern and then output the mapping result together with diverse reference results, and its architecture is based on cVAE-GAN and cLR-GAN; TextureGAN, in contrast, transforms an input image attached with a fabric pattern into a realistic image, trying to "propagate" the texture patch within the contour image, and its architecture is based on cGAN. Following the architecture of netA, G generates an image ŷ mapping the input fabric pattern c. We sample two patches ŷ_p and c_p of random size, from 45 × 45 to 65 × 65, from ŷ and c.

The local style loss L_s is inspired by the deep-learning-based texture synthesis and style transfer work [GEB15, GEB16]. The style of an image can be represented by the Gram matrices of its feature maps, normalized by the number of pixels in each feature map; L represents the chosen layers. Following Gatys et al. [GEB15], we choose feature maps of ŷ and c at the layers conv2_2, conv3_2 and conv4_2 of the VGG-19 network, and the style loss among them is computed with Equation (6).

The pixel loss. We use the L1 loss to ensure the generated texture is the same as that in the input fabric pattern, namely

L_p = \| \hat{y}_p - c_p \|_1.    (7)

The local discriminator loss L_gan. We train another conditional discriminator D_2 to judge whether two patches have the same texture. As for D, G, E, the value function of D_2, G, E is

L_{gan}(G, D_2, E) = \mathbb{E}_{c_p \sim p(c_p)}[\log D_2(c_p, c_p)] + \mathbb{E}_{x \sim p_{data}(x),\, c_p \sim p(c_p),\, z \sim E(c)}[\log(1 - D_2(c_p, W[G(x, z)]))],    (8)

where G(x, z) = ŷ and W(ŷ) = ŷ_p. A pair of patches (c_p, c_p) that have the same texture is regarded as a positive example; otherwise, (c_p, ŷ_p) is regarded as a negative example.

To sum up, the local texture loss consists of the above three terms, namely

L_t = \lambda_s L_s + \lambda_p L_p + \lambda_{gan} L_{gan}.    (9)

The value function of netA is then

L_{NetA}(G, D, E) = L_{GAN}(G, D, E) + \lambda_{im} L1_{im}(G, E) + \lambda_{KL} L_{KL}(E) + L_t.    (10)
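The local texture loss of Equation (9) can be sketched as below. The VGG-19 feature extractor for the conv2_2/conv3_2/conv4_2 layers is assumed to be supplied by the caller, the Gram matrices are compared with a mean-squared error as in Gatys et al. [GEB15], and the generator-side form of the local discriminator term is only one possible choice; the weights follow Section 4.3.

```python
import torch.nn.functional as F

def gram_matrix(feat):
    """Gram matrix of a B x C x H x W feature map, normalized by its size."""
    b, c, h, w = feat.shape
    f = feat.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def local_texture_loss(y_patch, c_patch, vgg_features, D2,
                       lam_s=2000.0, lam_p=1.0, lam_gan=1.0):
    """Lt = lam_s * Ls + lam_p * Lp + lam_gan * Lgan (Eq. 9), sketched.

    vgg_features(img) is assumed to return the conv2_2, conv3_2 and conv4_2
    feature maps of a fixed VGG-19; D2 is the local patch discriminator.
    """
    # Style loss Ls (Eq. 6): match Gram matrices of the two patches.
    ls = sum(F.mse_loss(gram_matrix(fy), gram_matrix(fc))
             for fy, fc in zip(vgg_features(y_patch), vgg_features(c_patch)))
    # Pixel loss Lp (Eq. 7): L1 between the generated and fabric patches.
    lp = F.l1_loss(y_patch, c_patch)
    # Local adversarial term (generator side of Eq. 8): D2 should accept
    # (c_patch, y_patch) as a same-texture pair.
    lgan = F.softplus(-D2(c_patch, y_patch)).mean()
    return lam_s * ls + lam_p * lp + lam_gan * lgan
```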
As in BicycleGAN, the cLR-GAN is taken as our netB. In general, a fashion designer expects to see more reference results besides the desired one. NetB plays a role in increasing output diversity by inputting random noise, as well as ensuring that each random latent vector z ∼ N(0, 1) can generate a rational result. In netB, G maps a random noise vector to the generated image, namely z → ŷ, and E maps the generated image back to the noise, namely ŷ → ẑ. Multi-modal results can be obtained with netB. The value function of netB is

L_{NetB}(G, D, E) = L_{GAN}(G, D) + \lambda L1_{vec}(G, E).    (11)

In summary, the overall value function of FashionGAN is

L_{All} = L_{NetA}(G, D, E) + L_{NetB}(G, D, E).    (12)

Figure 2: Architecture of netA (a) and netB (b), built from the encoder E (ResNet), the generator G (U-Net) and the discriminator D (PatchGAN); netA additionally applies the local losses (VGG/Gram-matrix style loss and L1 loss) between the fabric patch c_p and the generated patch ŷ_p. The generator, the discriminator and the encoder are the same in the two architectures, and netA and netB are trained alternately.

Figure 2 shows the architecture of netA and netB. When training FashionGAN, tuples {x(i), y(i), c(i)} are input. G, D, E in netA are trained once according to Equation (10) by executing forward and backward propagation; then, keeping G, D, E, netB is trained once according to Equation (11). This alternation is repeated.
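The alternation described above can be sketched as follows; the two step functions are assumed to perform the forward/backward passes and optimizer updates (including the separate discriminator and generator/encoder updates) for Equations (10) and (11).

```python
def train_fashiongan(step_net_a, step_net_b, loader, epochs=160):
    """Training schedule of Section 3.2: per tuple, one netA update (Eq. 10)
    followed by one netB update (Eq. 11), repeated for every epoch."""
    for _ in range(epochs):
        for x, y, c in loader:       # contour image, ground truth, fabric patch
            step_net_a(x, y, c)      # cVAE-GAN branch with the local texture loss
            step_net_b(x)            # cLR-GAN branch for diverse outputs
```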
Figure 3: Architecture of FashionGAN in the test phase (the input cloth is encoded and, together with the fashion sketch, fed to the generator to produce the mapping result and diverse results).

Figure 3 shows the architecture of FashionGAN in the test phase. Users just input a fashion sketch or contour of a garment together with a fabric image, and the generator outputs a realistic garment image with the desired fabric. Users can also input a list of random latent vectors to show diverse references.
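A sketch of this test-phase use is given below; the 8-dimensional latent size is taken from Section 4.2, and the module interfaces and batch handling are assumptions for illustration.

```python
import torch

@torch.no_grad()
def preview(G, E, sketch, fabric, n_refs=3, z_dim=8):
    """Generate the mapped result from the encoded fabric plus a few diverse
    references from random latent vectors, as in the test phase of Figure 3."""
    z = E(fabric)                                        # fabric patch -> latent code
    mapped = G(sketch, z)                                # result following the fabric
    refs = [G(sketch, torch.randn(sketch.size(0), z_dim))
            for _ in range(n_refs)]                      # diverse reference results
    return mapped, refs
```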
4. Implementation

In this section, we first describe our garment dataset together with the experimental environment. Then we introduce the architecture and training details.

4.1. Dataset and experiment environment

We built a garment dataset. Garment images should have a clean background to avoid generating unreasonable results, so we collected a large number of garment images and manually selected the desired samples. Our data comes from two different sources:

• Crawling 5,000 images from E-commerce websites like taotaosou†.
• Filtering 19,000 images from public large-scale garment datasets like DeepFashion‡ [LLQ∗16].

† http://www.taotaosou.com/
‡ http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html

Then we built a fabric pattern dataset by cropping a patch from each image in the garment dataset. In order to train G, D, E, the garment image ground truths, the corresponding contour images extracted by HED and the fabric pattern images are input to FashionGAN. The training set contains 21,000 pure-color garment images and 1,000 striped garment images; the test set contains 1,800 pure-color garment images and 100 striped garment images. We trained two different models on these two datasets. For each model, we trained 160 epochs on an NVIDIA GeForce GTX 1080 Ti. At test time, a user just inputs a fashion sketch/contour/garment image together with the desired fabric pattern, and FashionGAN outputs the mapping result within 0.5 s per image.
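One way to build the training tuples described above is sketched below. The hed callable stands in for a Holistically-Nested Edge Detection [XT15] model, and the 64 × 64 patch size is borrowed from the encoder input in Section 4.2; both are assumptions about details the text does not spell out.

```python
import random

def make_training_tuple(garment, hed, patch=64):
    """Build one (contour x, fabric pattern c, ground truth y) training tuple.

    `garment` is a C x H x W image array/tensor; `hed` is assumed to return a
    one-channel edge map of the same spatial size."""
    y = garment                                   # ground-truth garment image
    x = hed(y)                                    # contour extracted from y
    _, h, w = y.shape
    top = random.randint(0, h - patch)
    left = random.randint(0, w - patch)
    c = y[:, top:top + patch, left:left + patch]  # randomly cropped fabric patch
    return x, c, y
```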
4.2. Architecture details

In this subsection we illustrate the implementation details of the encoder E, the generator G and the discriminator D. For E, we use ResNet [HZRS16], which is extensively employed in image classification and visual feature description; a 64 × 64 fabric pattern image is input to E. For G, we also use the U-Net architecture on top of the convolutional GAN design of DCGAN [RMC15], which is the first CNN version of GAN. The inputs of G are a 256 × 256 × 1 garment contour image and an 8 × 1 encoded latent vector. When testing, a user can input an arbitrary contour, fashion sketch or garment image to G. As for D, following pix2pix, we utilize the PatchGAN discriminator, which can be understood as a form of texture/style loss. The networks of G, D and E are composed of basic modules like residual or convolution blocks, so the model can easily be extended to higher resolutions; we also implemented 512 × 512 and 1024 × 1024 versions. In order to compare with the baseline methods, we use the 256 × 256 version.

4.3. Training details

In order to reduce mode collapse, we apply the WGAN loss [ACB17] to train G and D. Following the authors' advice, we use the RMSprop optimizer with a learning rate of 1e-4 and a batch size of 1. For all our experiments, we use different weight settings for the losses: for netA, we set λ_im = 10 and λ_KL = 0.01 in Equation (10), and λ_s = 2,000, λ_p = 1, λ_gan = 1 in Equation (9); for netB, we set λ = 0.5 in Equation (11).
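The optimizer setup of Section 4.3 can be written as below; splitting the parameters between the generator/encoder and the discriminator is common practice and an assumption here, as is collecting the loss weights in one place.

```python
import torch

def build_optimizers(G, D, E, lr=1e-4):
    """RMSprop optimizers with the learning rate from Section 4.3."""
    opt_ge = torch.optim.RMSprop(
        list(G.parameters()) + list(E.parameters()), lr=lr)
    opt_d = torch.optim.RMSprop(D.parameters(), lr=lr)
    return opt_ge, opt_d

# Loss weights used in Equations (9), (10) and (11).
LOSS_WEIGHTS = {
    "lambda_im": 10.0, "lambda_kl": 0.01,                    # Eq. (10)
    "lambda_s": 2000.0, "lambda_p": 1.0, "lambda_gan": 1.0,  # Eq. (9)
    "lambda_vec": 0.5,                                       # Eq. (11)
}
```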
5. Experiment results and analysis

In this section, we give some baselines and evaluation metrics. We conduct comparative experiments against these baselines and give qualitative and quantitative evaluations of the results. Lastly, we show other examples of fashion design, such as garment parts, handbags, shoes, and user-generated sketches.

5.1. Baselines

We compare against five of the most related projects:

• pix2pix [IZZE17]. We used the paired data (ground truth and contour image) to train it.
• CycleGAN [ZPIE17]. It is an unsupervised model trained with an adversarial loss and a cycle reconstruction loss in two domains A and B. In our experiment, contour images are in domain A, and ground truths are in domain B.
• BicycleGAN [ZZP∗17]. It can specify the content of the generated image by encoding the ground truth. In addition, it can also generate multi-modal outputs by inputting random noise. We also use the paired data to train it.
• MUNIT [HLBK18]. It is also an unsupervised model, which consists of two auto-encoders E_A and E_B and uses content and style latent codes. In our experiment, the contour images and the ground truths are input to E_A and E_B separately.
• TextureGAN [XSA∗18]. It is a cGAN model whose condition is a contour image with a fabric pattern. We assume the fabric pattern is located in the center of the contour.

For the sake of fairness, we train all models for 160 epochs.
Figure 4: Example results of our method. (a) Mapping a fabric pattern to a fashion sketch. (b) Mapping a fabric pattern to a garment image. (c) Mapping a fabric pattern to a contour image. (d) Mapping a stripe fabric pattern to a contour image. (e) Shoes, bags, garment parts and user-generated sketch mapping examples. The first column of each sub-figure represents the input contour, fashion sketch or garment image; the second column is the input fabric pattern; the third column is the corresponding mapping result; the remaining columns contain more reference results.
Figure 5: Results of our method and other methods. (a) Mapping a fabric pattern to a fashion sketch. (b) Mapping a stripe fabric pattern to a contour image. (c) Reconstructing the ground truth. (d) Mapping a fabric pattern to a contour image. Columns 1-3 are our results; the following columns are the results of pix2pix, BicycleGAN, MUNIT and CycleGAN; the last two columns show the results of TextureGAN.
Figure 6: Examples of general image-to-image results.

Table 2 displays the Inception Score of the different methods in four test cases. This score reflects the diversity and quality of the generated images. Except for MUNIT, our FashionGAN is better than the other models. TextureGAN and pix2pix are one-to-one mapping models, thus their Inception Scores are lower. Our multi-modal module references BicycleGAN, and their Inception Scores are very close. MUNIT is the state-of-the-art multi-modal model, which achieves the highest IS; the reason is that it assumes all images share a content space. In future work, we plan to add a content encoder to improve the diversity of results.
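For reference, the Inception Score reported in Table 2 can be computed from softmax class probabilities as sketched below; running the generated images through a pretrained Inception network to obtain probs is assumed to happen elsewhere, and the common split-averaging is omitted.

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """IS = exp(E_x[ KL(p(y|x) || p(y)) ]) for an (N, num_classes) array of
    per-image softmax outputs from a pretrained Inception network."""
    p_y = probs.mean(axis=0, keepdims=True)              # marginal label distribution
    kl = (probs * (np.log(probs + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))
```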
5.4. More applications

FashionGAN is a robust model. Figure 4 (e) shows some mapping examples of garment parts and user-generated sketches. These test cases are totally different from the training data, but they are still mapped correctly by the model trained on the garment dataset. Besides garment design, shoe and bag design are very common in the fashion field; we also retrained our FashionGAN on the shoes and bags datasets provided by BicycleGAN [ZZP∗17], and Figure 4 gives some examples of test cases. FashionGAN can generate the desired results by specifying the fabric pattern. FashionGAN can also be extended to general image-to-image translation like colorization; Figure 6 shows examples such as changing the color of seawater, grass or a fabric pattern.

5.5. Limitation and future work

Figure 7 shows some failed mapping examples. Our FashionGAN works for single-color and regular (like stripe) fabric patterns, but it cannot map an irregular one. Specifically, we selected 2,000 garment images with an irregular texture and trained FashionGAN on them; the generator outputs images with only the average color of the input fabric pattern. We analyze the reasons as follows. First, the encoder in our network can learn the mapping between a single color or a simple color combination (like stripes) and the latent vector, but there are various kinds of irregular textures in the real world, each containing complex color combinations, and they are hard to describe with a latent vector in the current architecture. Second, the irregular textures come in varying shapes, and the current local loss module cannot reconstruct them well. In the future, we aim to construct a new local loss module describing irregular textures. What is more, we will extend our model to more general image translation tasks, like image colorization and style transfer.

6. Conclusion

We propose FashionGAN for exhibiting the design effect of garments. Our model is an end-to-end framework which only needs users to input a desired fabric image and a fashion sketch; a garment image based on the input data is then generated automatically. Based on cGAN, FashionGAN establishes a bijection between the fabric and the latent vector so that the latent vector can be explained with fabric information. Moreover, we explore adding local losses to help generate images with stripe textures. The experiments show that our method can indeed achieve impressive results in fashion design.

7. Acknowledgements

This research is supported by the National Key Research and Development Plan in China (2018YFC0830500), the Guangzhou Science Technology and Innovation Commission (GZSTI16EG14/201704030079), and the National Natural Science Foundation of China (61772140).
[IZZE17] Isola P., Zhu J.-Y., Zhou T., Efros A. A.: Image-to-image translation with conditional adversarial networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (July 2017).
[JHR∗15] Jung A., Hahmann S., Rohmer D., Begault A., Boissieux L., Cani M. P.: Sketching folds: developable surfaces from non-planar silhouettes. ACM Transactions on Graphics (TOG) 34, 5 (2015), 1–12.
[LLQ∗16] Liu Z., Luo P., Qiu S., Wang X., Tang X.: DeepFashion: Powering robust clothes recognition and retrieval with rich annotations. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (June 2016).
[LSLW16] Larsen A. B. L., Sonderby S. K., Larochelle H., Winther O.: Autoencoding beyond pixels using a learned similarity metric. In International Conference on Machine Learning (ICML) (2016), pp. 1558–1566.
[LTH∗17] Ledig C., Theis L., Huszar F., Caballero J., Cunningham A., Acosta A., Aitken A., Tejani A., Totz J., Wang Z., Shi W.: Photo-realistic single image super-resolution using a generative adversarial network. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (July 2017).
[XT15] Xie S., Tu Z.: Holistically-nested edge detection. International Journal of Computer Vision 125, 1-3 (2015), 1–16.
[YCYL∗17] Yeh R. A., Chen C., Yian Lim T., Schwing A. G., Hasegawa-Johnson M., Do M. N.: Semantic image inpainting with deep generative models. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (July 2017).
[ZIE16] Zhang R., Isola P., Efros A. A.: Colorful image colorization. In European Conference on Computer Vision (2016), Springer, pp. 649–666.
[ZPIE17] Zhu J.-Y., Park T., Isola P., Efros A. A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In IEEE International Conference on Computer Vision (ICCV) (Oct 2017).
[ZXL∗17] Zhang H., Xu T., Li H., Zhang S., Huang X., Wang X., Metaxas D.: StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks. In IEEE International Conference on Computer Vision (ICCV) (2017), pp. 5907–5915.
[ZZP∗17] Zhu J.-Y., Zhang R., Pathak D., Darrell T., Efros A. A., Wang O., Shechtman E.: Toward multimodal image-to-image translation. In Advances in Neural Information Processing Systems (NIPS) (2017), Curran Associates, Inc., pp. 465–476.