Fashion Analysis and Understanding With Artificial Intelligence
Xiaoling Gu, Fei Gao, Min Tan, Pai Peng

a Hangzhou Dianzi University School of Computer Science and Technology, Key Laboratory of Complex Systems Modeling and Simulation, China
b Tencent Technology (Shanghai) Co., YoutuLab, China
Keywords: Cross-media fashion data, Fashion recognition, Fashion understanding, Fashion application, Fashion industry, Artificial intelligence

Abstract: As handling fashion big data with Artificial Intelligence (AI) has become an exciting challenge for computer scientists, fashion studies have received increasing attention in the computer vision, machine learning and multimedia communities in the past few years. In this paper, we introduce the progress in fashion research and provide a taxonomy of these fashion studies that includes low-level fashion recognition, middle-level fashion understanding and high-level fashion applications. Finally, we discuss the challenges the fashion industry faces as it embraces AI technologies.
1. Introduction
From time immemorial, fashion has been intimately associated with being a human as evidenced by beads and jewelry found even
in most ancient cultures. In contemporary society, fashion has had a significant effect on every aspect of social life, causing and
reflecting changes in social, economic, political, and cultural landscapes. The fashion industry has become one of the biggest seg-
ments of the economy in the world, estimated at 3 trillion dollars as of 2018, representing two percent of global GDP.1
On the other hand, the growing popularity of social media and the prosperity of e-commerce have produced massive amounts of
cross-media fashion data, such as street data shared by users, runway show data released by fashion brands and product data
provided by e-commerce sites, displaying a rich and complex set of multimedia contents. Therefore, understanding and analyzing the
semantics of large-scale cross-media fashion data through machine learning and computer vision techniques is one of the essential
business analytics and technology tools for revolutionizing the industry and reshaping the mechanics of fashion. For instance, an
increasing number of popular designers and brands are leveraging leading social networks to survey customer preferences, such as
opinions, ideas, feedback, and trends.2
Due to its societal and economic impact, handling cross-media fashion data with new techniques has become an exciting challenge
for computer scientists. Fortunately, fashion studies have received increasing attention in computer vision, machine learning and
multimedia communities in recent years. The proposed theoretical studies and promising applications provide sufficient technical
support for understanding and analyzing cross-media fashion data from all aspects. The main goal of this paper is to introduce the
progress in fashion research and provide an overview of these works published by computer scientists.
Motivated by Song and Mei (2018), we classify the topics of fashion research into three levels, namely, low-level fashion re-
cognition, middle-level fashion understanding and high-level fashion applications, as shown in Table 1. Fashion recognition aims to
identify fashion garments at the pixel level for assisting fashion understanding. Two fundamental tasks are involved with low-level
fashion recognition, that is, clothing/human parsing and landmark detection. The goal of fashion understanding is to explore
⁎ Corresponding author.
E-mail addresses: [email protected] (X. Gu), [email protected] (F. Gao), [email protected] (M. Tan), [email protected] (P. Peng).
1 https://fashionunited.com/global-fashion-industry-statistics/
2 https://www.smartdatacollective.com/how-big-data-changing-fashion-industry/
https://doi.org/10.1016/j.ipm.2020.102276
Received 31 December 2019; Received in revised form 7 April 2020; Accepted 20 April 2020
Available online 29 April 2020
0306-4573/ © 2020 Elsevier Ltd. All rights reserved.
X. Gu, et al. Information Processing and Management 57 (2020) 102276
Table 1
The taxonomy of fashion studies.
Field                  Subfield                       Methods
Fashion Recognition    Clothing/Human Parsing         Graphical Model, Non-parametric Model, Parselets Representation Method, CNN Model, Adversarial Model
                       Landmark Detection             Deep Learning Methods
Fashion Understanding  Clothing Attribute Prediction  Single-task Learning, Multi-task Learning, Transfer Learning
                       Fashion Style Prediction       Supervised Learning, Unsupervised Learning
Fashion Applications   Fashion Retrieval              Cross-scenario Retrieval Model, Interactive Retrieval Model
                       Fashion Recommendation         Complementary Recommendation Model, Personalized Recommendation Model, Scenario-oriented Recommendation Model, Explainable Recommendation Model, Generative Model
                       Fashion Compatibility          Pairwise Compatibility Learning, Outfit Compatibility Learning
                       Fashion Image Synthesis        Pose Guided Generative Model, Text Guided Generative Model, Virtual Try-on Model, Image Transformation Model, Fashion Design Model
                       Fashion Data Mining            Fashion Trends Analysis, Hybrid Analytics
semantics (e.g. clothing attributes, fashion styles) from fashion data for supporting advanced fashion applications. Hence, there are
two popular tasks for middle-level fashion understanding, including clothing attribute prediction and fashion style prediction. At the
highest level, fashion applications cover a wide range of studies such as fashion retrieval, fashion recommendation, fashion compatibility, fashion image synthesis and fashion data mining, which bring us a step closer to the AI-enhanced fashion industry.
In addition to the aforementioned academic works, researchers from top technology companies are transforming fashion at a
faster pace than ever. For example, an Amazon team has developed an algorithm that learns about a particular fashion style and
creates similar images from scratch.3 IBM has teamed up with Tommy Hilfiger and the Fashion Institute of Technology for a project
called Reimagine Retail for helping give retailers an edge by equipping them with skills in AI design.4 Alibaba has collaborated with
GUESS to launch a pilot FashionAI concept shop, providing customers with a more enriching shopping experience that combines
online and offline shopping behaviors.5
Although some promising results have been achieved in the field of fashion studies, there are still many challenges as AI changes
the fashion world in different aspects. For instance, from the perspective of the fashion industry, the most serious problem is the huge
gap between various value chains of the fashion industry such as design, manufacturing and marketing. From the perspective of
improving the aforementioned fashion studies, main challenges involve lacking a unified large-scale hybrid fashion dataset, learning
a good feature representation of fashion data, the difficulty of fashion assessment, improving users' shopping experiences and developing AI assistants for fashion designers, etc.
The rest of this paper is organized as follows. In Section 2, we introduce five kinds of approaches for addressing clothing/human
parsing tasks and several approaches for landmark detection. In Section 3, we first discuss three types of approaches for clothing
attribute prediction and then introduce supervised and unsupervised methods for fashion style prediction. In Section 4, we introduce
the latest works on fashion retrieval, fashion recommendation, fashion compatibility, fashion image synthesis and fashion data
mining, respectively. In Section 5, we review a variety of fashion benchmark datasets. In Section 6, we discuss the future challenges
the fashion industry faces as it embraces AI technologies.
2. Fashion recognition

Fashion recognition focuses on pixel-level computation of fashion images, which includes clothing parsing (or human parsing) and
landmark detection. Clothing parsing predicts pixel-wise labeling for garment items (e.g., hair, head, upper clothes and pants), which
builds a foundation for other fashion understanding tasks. Human parsing further partitions the human body along with clothing
items into semantic regions. Clothing/human parsing is extremely challenging due to the wide variety of garment items, possible
variations in combination, layering, and occlusion.
Landmark detection aims to localize fashion landmarks such as corners of the neckline, hemline, and cuff, which can better distinguish
the attributes of the clothes for retrieval and recommendation. These landmarks not only implicitly capture bounding boxes of
clothes, but also indicate their functional regions. Due to the large variation and non-rigid deformation of clothes, fashion landmark
detection is also a challenging task.
Many researchers have proposed different kinds of approaches for solving the clothing/human parsing problem, which can be categorized into five classes according to the underlying techniques: (1) Graphical models (e.g., conditional random fields) (Krähenbühl & Koltun, 2011) are the common methods to enforce spatial contiguity in the output label maps. However, graphical models mainly
3 https://www.technologyreview.com/s/608668/amazon-has-developed-an-ai-fashion-designer/
4 https://www.whichplm.com/the-future-of-fashion-ai-changing-fashion-retail-industry/
5 https://www.businesswire.com/news/home/20180709005279/en/GUESS-Collaborates-Alibaba-Bring-Artificial-Intelligence-Fashion
focus on the constrained parsing problem and handle low-level inconsistencies within a small scope. (2) Non-parametric methods do not require much prior knowledge and rely on over-segmentation and pose estimation. However, non-parametric methods are limited by inaccurate matching, which introduces noise during label transfer. (3) Parselets representation methods use parselets as the building blocks for clothing/human parsing to overcome the inconsistent targets between pose estimation and clothing
parsing. Parselets are a group of parsable segments which can generally be obtained by low-level over-segmentation algorithms and
bear strong semantic meaning. (4) CNN models have made great progress in clothing/human parsing. However, using the pixel-wise
classification loss, CNN usually ignores the micro context between pixels and the macro context between semantic parts. (5) Ad-
versarial models reduce the semantic inconsistency and the local inconsistency in the parsing results by the use of adversarial
networks.
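The spatial-contiguity idea behind the graphical-model family can be illustrated with a toy numpy sketch (not any published method): relabeling each pixel to the majority label of its 4-neighborhood removes isolated misclassifications, a crude stand-in for the pairwise smoothness term a real CRF would optimize jointly with unary potentials.

```python
import numpy as np

def smooth_labels(labels, iters=1):
    """Toy illustration of the contiguity prior in graphical models:
    each pixel is relabeled to the majority label of itself plus its
    4-neighborhood, suppressing isolated misclassified pixels.
    (A real CRF minimizes unary + pairwise energies; this is a sketch.)"""
    labels = labels.copy()
    h, w = labels.shape
    for _ in range(iters):
        out = labels.copy()
        for i in range(h):
            for j in range(w):
                neigh = [labels[i, j]]
                if i > 0: neigh.append(labels[i - 1, j])
                if i < h - 1: neigh.append(labels[i + 1, j])
                if j > 0: neigh.append(labels[i, j - 1])
                if j < w - 1: neigh.append(labels[i, j + 1])
                vals, counts = np.unique(neigh, return_counts=True)
                out[i, j] = vals[np.argmax(counts)]
        labels = out
    return labels

# A 5x5 "skirt" region (label 1) with one noisy "background" pixel (0).
noisy = np.ones((5, 5), dtype=int)
noisy[2, 2] = 0
clean = smooth_labels(noisy)
```

After one pass the isolated background pixel is absorbed by its garment neighborhood, which is exactly the kind of low-level inconsistency these models are used to repair.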
Liu, Luo, Qiu, Wang, and Tang (2016) first proposed FashionNet to learn clothing features by jointly predicting clothing attributes
and landmarks. Later, Liu, Yan, Luo, Wang, and Tang (2016) proposed a three-stage deep fashion alignment (DFA) framework for
landmark detection. Yan et al. (2017) trained a Deep LAndmark Network (DLAN) iteratively for jointly estimating bounding boxes
and landmarks in an end-to-end manner. Both Liu, Yan, et al. (2016) and Yan et al. (2017) are based on the regression model.
Wang, Xu, Shen, and Zhu (2018) indicated that the regression model is highly non-linear and difficult to optimize. They proposed a
knowledge-guided fashion network (AttentiveNet) for fashion landmark localization and clothing category classification by extending
neural network with domain-specific grammars. Later, Lee, Oh, Jung, and Kim (2019) introduced contextual knowledge of clothes
and proposed a global-local embedding module (Global-Local) to achieve more accurate landmark prediction performance.
3. Fashion understanding

Clothing attributes are an informative and compact representation for describing people. As illustrated in Fig. 1, beyond color and
pattern, clothing attributes include other important features such as material, collar, length, cut and fastener. Fine-grained clothing attribute recognition can be used for fashion retrieval, fashion recommendation and fashion analysis. Different from clothing attributes, fashion styles emerge organically from how people assemble outfits of clothing, serving as the expressions of an individual's
Fig. 1. An illustrative example of fine-grained clothing attributes (adapted from Di et al., 2013).
characters and aesthetics. For example, the Hipster Wars project (Kiapour, Yamaguchi, Berg, & Berg, 2014) defines five style cate-
gories including hipster, goth, preppy, pinup and bohemian. Fashion style can benefit various fashion analysis tasks such as fashion
compatibility and fashion trends analysis.
We classify clothing attribute prediction approaches into the following three types: (1) Single-task learning only focuses on
learning clothing attributes of fashion images from a specific fashion domain. (2) Multi-task learning learns clothing attributes and
other tasks (e.g. landmark detection) simultaneously. (3) Transfer learning learns clothing attributes by bridging the gap between
fashion images of different domains.
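In all three settings, attribute prediction is typically cast as multi-label classification: one independent sigmoid per attribute. The sketch below is a hypothetical single-task illustration in numpy; the attribute names, weights and threshold are made up for the example, where a real model would learn the parameters from labeled fashion images.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_attributes(features, W, b, names, threshold=0.5):
    """Multi-label attribute sketch: one independent sigmoid score per
    attribute (e.g. 'floral', 'v-neck'), thresholded into a predicted
    attribute set. W and b stand in for trained model parameters."""
    probs = sigmoid(features @ W + b)
    return [n for n, p in zip(names, probs) if p >= threshold], probs

# Hand-set toy weights: attribute 0 fires on feature 0, attribute 1 on feature 1.
names = ["floral", "v-neck"]
W = np.array([[4.0, 0.0],
              [0.0, 4.0]])
b = np.array([-2.0, -2.0])
feat = np.array([1.0, 0.0])          # an image expressing only feature 0
attrs, probs = predict_attributes(feat, W, b, names)
```

Because each attribute has its own sigmoid, a garment can be "floral" and "v-neck" at once; this independence is what distinguishes attribute prediction from the single-label style classification discussed next.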
Some existing works on fashion style prediction learn style representations in a supervised way, i.e., by constructing classifiers for fashion style prediction. Considering that manually defined style categories would be too abstract to capture subtle style differences, other works have focused on learning style representations in an unsupervised way.
4. Fashion applications

Supported by the low-level fashion recognition and middle-level fashion understanding techniques, high-level fashion applica-
tions blossom in fashion retrieval, fashion recommendation, fashion compatibility, fashion image synthesis and fashion data mining.
In the fashion domain, fashion retrieval focuses on identifying clothing items from an image database based on an input query, while
fashion recommendation emphasizes recommending clothing items or outfits under certain conditions such as occasion, location and
users’ preferences. Different from retrieval and recommendation, fashion compatibility computes the matching score between
clothing items. Very recently, the success of generative adversarial networks encourages researchers to devote themselves to fashion
image synthesis. Meanwhile, with the big fashion data from the Internet, fashion data mining is another popular research topic.
Initially, a few related works on fashion retrieval were designed only within one scenario (Wang & Zhang, 2011). As online shopping
has become an exponentially growing market, some works are devoted to cross-scenario clothing retrieval (Huang, Feris, Chen, &
Yan, 2015; Jiang, Wu, & Fu, 2016; Kalantidis, Kennedy, & Li, 2013; Kiapour, Han, Lazebnik, Berg, & Berg, 2015; Liu, Song, et al.,
2012). To obtain more precise search results, some works propose to provide interactive search techniques that allow a user to
iteratively refine the results retrieved by the fashion search engine (Ak, Kassim, Lim, & Tham, 2018; Guo et al., 2018; Kovashka,
Parikh, & Grauman, 2012; Liu, Huang, He, Chiew, & Gao, 2015; Mizuochi, Kanezaki, & Harada, 2014; Xu et al., 2019; Zhao, Feng,
Wu, & Yan, 2017).
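Whatever the scenario, these retrieval systems share a common core: embed the query and the gallery items into one feature space and rank by similarity. A minimal numpy sketch of that ranking step follows; the toy 2-D vectors stand in for CNN features a trained model would produce.

```python
import numpy as np

def retrieve(query, gallery, k=3):
    """Rank gallery items by cosine similarity to the query embedding
    and return the indices of the top-k matches. In a real system both
    sides would be embeddings from a network trained on fashion images."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = g @ q
    return np.argsort(-sims)[:k]

# Toy gallery: items 0 and 2 point in nearly the query's direction.
gallery = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [0.9, 0.1],
                    [-1.0, 0.0]])
query = np.array([1.0, 0.05])
top = retrieve(query, gallery, k=2)
```

Cross-scenario methods mainly differ in how they close the domain gap between street and shop photos before this ranking; interactive methods re-run it after folding user feedback into the query.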
There has been a large body of research literatures on fashion recommendation as it can promote people’s participation in online
shopping. As shown in Fig. 2, some of the works identified whether two products are complementary (Huynh, Ciptadi, Tyagi, &
Agrawal, 2018; Iwata, Wanatabe, & Sawada, 2011; Jagadeesh, Piramuthu, Bhardwaj, Di, & Sundaresan, 2014; Kumar & Gupta, 2019;
Zhou, Di, Zhou, & Zhang, 2018), some built personalized models by learning people’s preferences implicitly or explicitly (Bracher,
Heinz, & Vollgraf, 2016; He & McAuley, 2016a; 2016b; Hidayati et al., 2018; Hu, Yi, & Davis, 2015; Yu et al., 2018), some considered
conditions (e.g. occasion, location) for clothing recommendation (Kang, Kim, Leskovec, Rosenberg, & McAuley, 2018; Liu, Feng,
et al., 2012; Zhang et al., 2017), some studied the tasks of explainable outfit recommendation (Feng et al., 2018; Hou et al., 2019; Lin
et al., 2018), and some improved fashion recommendation by connecting it with fashion image generation (Hsiao, Katsman, Wu,
Parikh, & Grauman, 2019; Kang, Fang, Wang, & McAuley, 2017; Lin et al., 2019).
Fig. 2. Some illustrative examples of fashion recommendation studies: (a) An example of complementary item recommendation (adapted from
Jagadeesh et al., 2014). (b) An example of personalized fashion recommendation (adapted from Hidayati et al., 2018). (c) An example of scenario-
oriented outfit recommendation (adapted from Liu, Feng, et al., 2012). (d) An example of explainable fashion recommendation (adapted from
Lin et al., 2018). (e) An example of outfit improvement with generation (adapted from Hsiao et al., 2019).
Zhou et al. (2018) proposed to incorporate expert knowledge, including purchase behaviors, image contents and product descriptions, for providing fashion item recommendation.
Methods for fashion compatibility learning usually fall within two categories, namely, pairwise compatibility learning and outfit
compatibility learning, where the former takes a fashion item as a query and searches compatible items from different categories, and
the latter selects fashion items of different categories to form compatible outfits.
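The pairwise setting can be sketched in a few lines: embed each item and score a candidate pair, for example with a sigmoid over the embeddings' dot product. The embeddings below are hand-set toys, not learned; metric-learning approaches in the literature train them so that compatible pairs score high.

```python
import numpy as np

def compatibility(emb_a, emb_b):
    """Pairwise compatibility sketch: score a (top, bottom) pair by the
    sigmoid of the dot product of their embeddings. Real models learn
    these embeddings so compatible pairs land close in the space."""
    return 1.0 / (1.0 + np.exp(-np.dot(emb_a, emb_b)))

def best_match(query_emb, candidate_embs):
    """Given one item, pick the most compatible candidate from another category."""
    scores = [compatibility(query_emb, c) for c in candidate_embs]
    return int(np.argmax(scores)), scores

top = np.array([1.0, 1.0])              # hypothetical blouse embedding
bottoms = [np.array([0.9, 0.8]),        # candidate bottoms
           np.array([-1.0, -1.0]),
           np.array([0.1, -0.2])]
idx, scores = best_match(top, bottoms)
```

Outfit compatibility learning generalizes this pairwise score to a whole set of items, e.g. by aggregating pairwise terms or modeling the outfit jointly.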
Recent years have seen remarkable advances especially on generative models such as Generative Adversarial Networks (GANs)
(Goodfellow et al., 2014) and Variational Autoencoders (VAEs) (Kingma & Welling, 2014). Extensive studies have been conducted on
fashion image synthesis by using generative models, such as pose guided fashion/person image generation (Balakrishnan, Zhao,
Dalca, Durand, & Guttag, 2018; Dong et al., 2018; Esser, Sutter, & Ommer, 2018; Lassner, Pons-Moll, & Gehler, 2017; Ma, Jia, Sun,
et al., 2017; 2018; Pumarola, Agudo, Sanfeliu, & Moreno-Noguer, 2018; Siarohin, Sangineto, Lathuilière, & Sebe, 2018), text guided
fashion image synthesis (Günel, Erdem, & Erdem, 2018; Zhou et al., 2019; Zhu, Fidler, Urtasun, Lin, & Loy, 2017), virtual try-on
applications (Chou, Lee, Zhang, Lee, & Hsu, 2018; Dong et al., 2019; Han, Wu, Wu, Yu, & Davis, 2018; Wang, Zheng, et al., 2018; Wu,
Lin, Tao, & Cai, 2018), fashion image transformation (Han, Wu, Huang, Scott, & Davis, 2019; Jetchev & Bergmann, 2017; Mo, Cho, &
Shin, 2018; Raj et al., 2018; Yoo, Kim, Park, Paek, & Kweon, 2016; Zhao et al., 2018) and fashion design (Cui, Liu, Gao, & Su, 2018;
Jiang & Fu, 2017; Sbai, Elhoseiny, Bordes, LeCun, & Couprie, 2018; Xian et al., 2018; Yildirim, Seward, & Bergmann, 2018). Fig. 3
displays several examples of studies on fashion image synthesis.
Fig. 3. Some illustrative examples of studies on fashion image synthesis: (a) An example of pose guided fashion image generation (adapted from
Ma, Jia, Sun, et al., 2017). (b) An example of text guided fashion image synthesis (adapted from Zhu et al., 2017). (c) An example of virtual try-on
applications (adapted from Han et al., 2018). (d) An example of fashion image transformation (adapted from Han et al., 2019). (e) An example of
fashion design synthesis (adapted from Cui et al., 2018).
Such approaches remain limited when facing the large spatial deformation challenge in realistic virtual try-on tasks. Further, Yu, Wang, and Xie (2019) proposed the
VTNFP model by first generating warped clothing, followed by generating a body segmentation map of the person wearing the target
clothing and ending with a try-on synthesis module. To generate a new person image after fitting the desired clothes into the input
image and manipulate human poses, Dong et al. (2019) proposed a Multi-pose Guided Virtual Try-on Network.
Extracting valuable knowledge from cross-media fashion data has attracted great interest from industry and academia because of its promising opportunities for boosting the fashion industry. For instance, investigating the fashion trends (Abe et al., 2017; Al-Halah, Stiefelhagen, & Grauman, 2017; Chen, Chen, Cong, Hsu, & Luo, 2015; Gu et al., 2017; Matzen, Bala, & Snavely, 2017; Vittayakorn, Yamaguchi, Berg, & Berg, 2015) every year is valuable for the industry as well as for sociology and psychology. On the
other hand, extensive hybrid analytics have also been conducted due to the wide variety of demands (Chang, Cheng, Wu, & Hua,
2017; Chen & Luo, 2017; Jia et al., 2016; Kwak, Murillo, Belhumeur, Kriegman, & Belongie, 2013; Murillo, Kwak, Bourdev,
Kriegman, & Belongie, 2012; Song, Wang, Hua, & Yan, 2011; Takagi, Simo-Serra, Iizuka, & Ishikawa, 2017; Vittayakorn, Berg, & Berg,
2017; Yamaguchi, Berg, & Ortiz, 2014; Zou et al., 2016).
5. Fashion benchmark datasets

A variety of benchmark datasets have been introduced and contributed to a comprehensive understanding of fashion. Some
datasets are specifically tailored for a particular task such as clothing parsing, style prediction, fashion recommendation, fashion
compatibility and fashion trends analysis, while some are designed to evaluate multiple tasks of fashion understanding and analysis
simultaneously. Table 2 summarizes the comparison among the most representative fashion datasets.
In this part, we select some representative experimental results conducted on the DeepFashion dataset since it is the most widely
used fashion benchmark.
Table 2
Comparison of the representative fashion datasets.
Datasets                                          # images  # category  # attributes  # bboxes  # landmarks  # masks  Tasks
Fashionpedia (Jia, Shi, Sirotenko, & Cui, 2019)   50K       46          92            ✗         ✗            50K      Clothing Parsing, Attribute Prediction
FashionAI (Zou et al., 2019)                      357K      6           245           ✗         324K         ✗        Landmark Detection, Attribute Prediction
WTBI (Kiapour et al., 2015)                       425K      11          ✗             39K       ✗            ✗        Attribute Prediction, Clothing Retrieval
DARN (Huang et al., 2015)                         540K      9           179           7K        ✗            ✗        Attribute Prediction, Clothing Retrieval
Amazon Fashion (He & McAuley, 2016a)              431K      6           ✗             ✗         ✗            ✗        Fashion Recommendation
FashionVC (Song et al., 2017)                     20K       2           ✗             ✗         ✗            ✗        Fashion Compatibility
DeepFashion (Liu, Luo, et al., 2016)              800K      50          1000          ✗         120K         ✗        Landmark Detection, Attribute Prediction, Clothing Retrieval, Fashion Image Synthesis
DeepFashion2 (Ge et al., 2019)                    491K      13          ✗             801K      801K         801K     Landmark Detection, Clothing Parsing, Clothing Retrieval
Table 3
Quantitative results for clothing attribute prediction on the DeepFashion dataset with top-k accuracy. Higher values are better. The best scores are marked in bold.
Method Category Texture Fabric Shape Part Style All
top-3 | top-5 top-3 | top-5 top-3 | top-5 top-3 | top-5 top-3 | top-5 top-3 | top-5 top-3 | top-5
WTBI (Chen et al., 2012) 43.73 | 66.26 24.21 | 32.65 25.38 | 36.06 23.39 | 31.26 26.31 | 33.24 49.85 | 58.68 27.46 | 35.37
DARN (Huang et al., 2015) 59.48 | 79.58 36.15 | 48.15 36.64 | 48.52 35.89 | 46.93 39.17 | 50.14 66.11 | 71.36 42.35 | 51.95
FashionNet (Liu, Luo, et al., 2016) 82.58 | 90.17 37.46 | 49.52 39.30 | 49.84 39.47 | 48.59 44.13 | 54.02 66.43 | 73.16 45.52 | 54.61
Corbière (Corbière et al., 2017) 86.30 | 92.80 53.60 | 63.20 39.10 | 48.80 50.10 | 59.50 38.80 | 48.90 30.50 | 38.30 23.10 | 30.40
AttentiveNet (Wang, Xu, et al., 2018) 90.99 | 95.78 50.31 | 65.48 40.31 | 48.23 53.32 | 61.05 40.65 | 56.32 68.70 | 74.25 51.53 | 60.95
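The top-k accuracy reported in Table 3 counts a prediction as correct when the ground-truth label appears among the k highest-scoring classes. A minimal numpy version of that metric is below; the benchmark's exact per-attribute-group protocol may differ in detail from this generic definition.

```python
import numpy as np

def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring
    classes. scores: (n_samples, n_classes); labels: (n_samples,)."""
    topk = np.argsort(-scores, axis=1)[:, :k]   # indices of the k best classes
    hits = (topk == labels[:, None]).any(axis=1)
    return hits.mean()

scores = np.array([[0.1, 0.7, 0.2],   # true class 1 -> ranked 1st
                   [0.5, 0.3, 0.2],   # true class 2 -> ranked 3rd
                   [0.4, 0.1, 0.5]])  # true class 0 -> ranked 2nd
labels = np.array([1, 2, 0])
```

On this toy batch, accuracy rises from 1/3 at top-1 to 2/3 at top-2 and 1.0 at top-3, which is why the top-5 columns in Table 3 are always at least as high as the top-3 columns.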
6. Challenges and future directions

Currently, some promising results have been achieved in fashion studies including fashion recognition, fashion understanding and
fashion applications. Several new techniques have been seamlessly embedded in the products for customers. For instance, fashion
recognition and fashion recommendation algorithms have been applied to the world’s largest eCommerce website, Taobao.
However, from the perspective of the fashion industry, the biggest problem is the huge gap between design, manufacturing and
marketing, which results in a huge waste of resources. Although many researchers have participated in equipping the fashion industry
with technologies, most of them only focus on a specific task but neglect to build connections among design, manufacturing and
marketing. New AI technologies that link the various value chains of the fashion industry are promising directions. Besides, there are
still many challenges for improving the aforementioned fashion studies, such as lacking a unified large-scale hybrid fashion dataset,
learning a good feature representation of fashion data, the difficulty of fashion assessment, improving users' shopping experiences
and developing AI assistants for fashion designers. In the following, we mainly discuss these research challenges.
Table 4
Quantitative results for landmark detection on the DeepFashion dataset with normalized error (NE). Smaller values of NE indicate better results. The best scores are marked in bold.
Method L.Collar R.Collar L.Sleeve R.Sleeve L.Waistline R.Waistline L.Hem R.Hem Avg.
FashionNet (Liu, Luo, et al., 2016) 0.0854 0.0902 0.0973 0.0935 0.0854 0.0845 0.0812 0.0823 0.0872
DFA (Liu, Yan, et al., 2016) 0.0628 0.0637 0.0658 0.0621 0.0726 0.0702 0.0658 0.0663 0.0660
DLAN (Yan et al., 2017) 0.0570 0.0611 0.0672 0.0647 0.0703 0.0694 0.0624 0.0627 0.0643
AttentiveNet (Wang, Xu, et al., 2018) 0.0415 0.0404 0.0496 0.0449 0.0502 0.0523 0.0537 0.0551 0.0484
Global-Local (Lee et al., 2019) 0.0312 0.0324 0.0427 0.0434 0.0361 0.0373 0.0442 0.0475 0.0393
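The normalized error (NE) in Table 4 is commonly defined as the mean Euclidean distance between predicted and ground-truth landmark coordinates, normalized by the image size so that results are resolution-independent. A minimal numpy version under that assumption:

```python
import numpy as np

def normalized_error(pred, gt, img_size):
    """Normalized error (NE) for landmark detection: mean Euclidean
    distance between predicted and ground-truth landmarks, divided by
    the image size. pred, gt: (n_landmarks, 2) pixel coordinates."""
    dists = np.linalg.norm(pred - gt, axis=1)
    return dists.mean() / img_size

gt = np.array([[50.0, 40.0],    # e.g. left collar
               [120.0, 40.0]])  # e.g. right collar
pred = np.array([[53.0, 44.0],  # off by (3, 4) -> distance 5 pixels
                 [120.0, 40.0]])
ne = normalized_error(pred, gt, img_size=100)
```

Here the average landmark error is 2.5 pixels on a 100-pixel image, giving NE = 0.025, comparable in scale to the per-landmark columns of Table 4.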
Fig. 4. Results of in-shop clothing retrieval (adapted from Liu, Luo, et al., 2016). (a) Example queries along with top-5 retrieved images. (b) Retrieval accuracies of different methods.
Fig. 5. Results of consumer-to-shop clothing retrieval (adapted from Liu, Luo, et al., 2016). (a) Example queries along with top-5 retrieved images. (b) Retrieval accuracies of different methods.
1. Collection of a unified large-scale hybrid fashion dataset. Currently, available fashion datasets are either too small, from a single data source, tailored for a specific task, or spanning a short period of time. There is a lack of a good benchmark dataset for training, testing, evaluating and comparing the performance of different algorithms for fashion analysis. Therefore, it would be tremendously helpful for researchers if there existed a unified large-scale fashion dataset that contains multi-modal data and spans a long period of time.
2. Good representation learning approaches of fashion data. Studying a good representation of fashion data is an indispensable step for
fashion analysis and understanding, such as fashion trend analysis, fashion information retrieval and fashion recommendation.
Fashion data is a special kind of cross-media data with distinctive characteristics: it is multi-modal, multi-domain and weakly labeled. Conventional representation learning approaches cannot be directly applied to fashion data since these models do not consider the
distinctive characteristics of fashion data. Hence, how to effectively integrate complementary features from multiple channels is a
great challenge for studying a good representation of cross-media fashion data.
3. Difficulty of fashion assessment. Not all fashion analysis tasks can be evaluated with objective metrics. For instance, predicting how
fashionable a person looks on a particular photograph involves many aesthetic factors such as the garments the subject is wearing,
how visually appealing the person is, and how appealing the scene behind the person is. Similar tasks mostly rely on the
results of user studies, which are obtained from a small group of people. However, user studies can be easily influenced by the
users’ personal preferences or the environment. Thus, it is vital to build a novel and objective assessment metric for fashion
analysis.
4. Improving user’s shopping experiences. With e-commerce becoming a central way that people shop, new techniques such as virtual
try-on systems and digital wardrobe assistants that help users decide what to wear have become popular in both industry and academia. For virtual try-on systems, the main challenges are caused by the difficulties of rendering clothing and 3-D human body
modeling for arbitrary people. Thus, there are two key factors for developing new virtual try-on methods: modeling clothing geometry and capturing body shape details. For digital wardrobe assistants, the main challenges are caused by
the subjectivity of fashion compatibility and dynamic changes in user preferences. The key to this problem lies in modeling
fashion compatibility by using online learning algorithms, as online fashion data is changing rapidly over time.
5. Developing AI assistants for fashion designers. Generating new realistic fashion designs automatically through image generation
would be significant in the design process. For instance, synthesizing fashion designs conditioned on user-specified multiple
Fig. 6. A visual comparison of three virtual try-on methods (adapted from Yu et al., 2019).
fashion attributes such as color, shape and texture would greatly reduce the cost of producing clothes. Some progress has been
made with the success of generative models; however, current synthesis results are too coarse to serve fashion designers. Besides,
the diverse attributes of fashion images in color, shape, pattern and style make it challenging to generate realistic fashion images.
Therefore, research on how to handle such multimodal conditions and generate high-resolution realistic fashion designs should be encouraged.
CRediT authorship contribution statement

Xiaoling Gu: Conceptualization, Methodology, Writing - original draft. Fei Gao: Writing - review & editing. Min Tan: Software,
Visualization. Pai Peng: Supervision.
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grants 61802100, 61971172, 61972119,
61702145 and 61971339. This work was also supported by the China Post-Doctoral Science Foundation under Grant 2019M653563.
References
Abe, K., Suzuki, T., Ueta, S., Nakamura, A., Satoh, Y., & Kataoka, H. (2017). Changing fashion cultures. CoRR. arXiv:1703.07920.
Ak, K. E., Kassim, A. A., Lim, J.-H., & Tham, J. Y. (2018). Learning attribute representations with localization for flexible fashion search. CVPR, 7708–7717.
Al-Halah, Z., Stiefelhagen, R., & Grauman, K. (2017). Fashion forward: Forecasting visual style in fashion. ICCV, 388–397.
Balakrishnan, G., Zhao, A., Dalca, A. V., Durand, F., & Guttag, J. V. (2018). Synthesizing images of humans in unseen poses. CVPR, 8340–8348.
Bossard, L., Dantone, M., Leistner, C., Wengert, C., Quack, T., & Gool, L. V. (2012). Apparel classification with style. ACCV.
Bracher, C., Heinz, S., & Vollgraf, R. (2016). Fashion dna: Merging content and sales data for recommendation and article mapping. CoRR. arXiv:1609.02489.
Chang, Y.-T., Cheng, W.-H., Wu, B., & Hua, K.-L. (2017). Fashion world map: Understanding cities through streetwear fashion. ACM Multimedia, 91–99.
Chen, H., Gallagher, A. C., & Girod, B. (2012). Describing clothing by semantic attributes. ECCV (3), 609–623.
Chen, K., Chen, K., Cong, P., Hsu, W. H., & Luo, J. (2015). Who are the devils wearing Prada in New York City? ACM Multimedia, 177–180.
Chen, K.-T., & Luo, J. (2017). When fashion meets big data: Discriminative mining of best selling clothing features. WWW (Companion Volume), 15–22.
Chen, L., & He, Y. (2018). Dress fashionably: Learn fashion collocation with deep mixed-category metric learning. AAAI, 2103–2110.
Chen, Q., Huang, J., Feris, R. S., Brown, L. M., Dong, J., & Yan, S. (2015). Deep domain adaptation for describing people based on fine-grained clothing attributes.
CVPR, 5315–5324.
Chen, W., Huang, P., Xu, J., Guo, X., Guo, C., Sun, F., ... Zhao, B. (2019). Pog: Personalized outfit generation for fashion recommendation at alibaba ifashion. CoRR.
arXiv:1905.01866.
Chen, X., Chen, H., Xu, H., Zhang, Y., Cao, Y., Qin, Z., & Zha, H. (2019). Personalized fashion recommendation with visual explanations based on multimodal attention
network: Towards visually explainable recommendation. SIGIR, 765–774.
Chou, C.-T., Lee, C.-H., Zhang, K., Lee, H.-C., & Hsu, W. H. (2018). Pivtons: Pose invariant virtual try-on shoe with conditional image completion. ACCV (6), 654–668.
Corbière, C., Ben-younes, H., Ramé, A., & Ollion, C. (2017). Leveraging weakly annotated data for fashion image retrieval and label prediction. ICCV Workshops, 2268–2274.
X. Gu, et al. Information Processing and Management 57 (2020) 102276
Cui, Y. R., Liu, Q., Gao, C. Y., & Su, Z. (2018). Fashiongan: Display your fashion design using conditional generative adversarial nets. Computer Graphics Forum, 37(7), 109–119.
Cui, Z., Li, Z., Wu, S., Zhang, X., & Wang, L. (2019). Dressing as a whole: Outfit compatibility learning based on node-wise graph neural networks. WWW, 307–317.
Di, W., Wah, C., Bhardwaj, A., Piramuthu, R., & Sundaresan, N. (2013). Style finder: Fine-grained clothing style detection and retrieval. CVPR Workshops, 8–13.
Dong, H., Liang, X., Gong, K., Lai, H., Zhu, J., & Yin, J. (2018). Soft-gated warping-gan for pose-guided person image synthesis. NeurIPS, 472–482.
Dong, H., Liang, X., Wang, B., Lai, H., Zhu, J., & Yin, J. (2019). Towards multi-pose guided virtual try-on network. CoRR. arXiv:1902.11026.
Dong, J., Chen, Q., Shen, X., Yang, J., & Yan, S. (2014). Towards unified human parsing and pose estimation. CVPR, 843–850.
Dong, J., Chen, Q., Xia, W., Huang, Z., & Yan, S. (2013). A deformable mixture parsing model with parselets. ICCV, 3408–3415.
Dong, Q., Gong, S., & Zhu, X. (2017). Multi-task curriculum transfer deep learning of clothing attributes. WACV, 520–529.
Esser, P., Sutter, E., & Ommer, B. (2018). A variational u-net for conditional appearance and shape generation. CVPR, 8857–8866.
Farabet, C., Couprie, C., Najman, L., & LeCun, Y. (2013). Learning hierarchical features for scene labeling. IEEE Transactions on Pattern Analysis and Machine Intelligence,
35(8), 1915–1929.
Feng, Z., Yu, Z., Yang, Y., Jing, Y., Jiang, J., & Song, M. (2018). Interpretable partitioned embedding for customized multi-item fashion outfit composition. ICMR, 143–151.
Ge, Y., Zhang, R., Wu, L., Wang, X., Tang, X., & Luo, P. (2019). Deepfashion2: A versatile benchmark for detection, pose estimation, segmentation and re-identification
of clothing images. CoRR. arXiv:1901.07973.
Gong, K., Liang, X., Zhang, D., Shen, X., & Lin, L. (2017). Look into person: Self-supervised structure-sensitive learning and a new benchmark for human parsing.
CVPR, 6757–6765.
Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., ... Bengio, Y. (2014). Generative adversarial nets. NIPS, 2672–2680.
Gu, X., Wong, Y., Peng, P., Shou, L., Chen, G., & Kankanhalli, M. S. (2017). Understanding fashion trends from street photos via neighbor-constrained embedding learning.
ACM Multimedia, 190–198.
Gu, X., Wong, Y., Shou, L., Peng, P., Chen, G., & Kankanhalli, M. S. (2019). Multi-modal and multi-domain embedding learning for fashion retrieval and analysis. IEEE
Transactions on Multimedia, 21(6), 1524–1537.
Guo, X., Wu, H., Cheng, Y., Rennie, S., Tesauro, G., & Feris, R. S. (2018). Dialog-based interactive image retrieval. NeurIPS, 676–686.
Günel, M., Erdem, E., & Erdem, A. (2018). Language guided fashion image manipulation with feature-wise transformations. CoRR. arXiv:1808.04000.
Han, X., Wu, Z., Huang, P. X., Zhang, X., Zhu, M., Li, Y., ... Davis, L. S. (2017). Automatic spatially-aware fashion concept discovery. ICCV, 1472–1480.
Han, X., Wu, Z., Huang, W., Scott, M. R., & Davis, L. S. (2019). Compatible and diverse fashion image inpainting. CoRR. arXiv:1902.01096.
Han, X., Wu, Z., Jiang, Y.-G., & Davis, L. S. (2017). Learning fashion compatibility with bidirectional lstms. ACM Multimedia, 1078–1086.
Han, X., Wu, Z., Wu, Z., Yu, R., & Davis, L. S. (2018). Viton: An image-based virtual try-on network. CVPR, 7543–7552.
He, R., & McAuley, J. J. (2016a). Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering. WWW, 507–517.
He, R., & McAuley, J. J. (2016b). Vbpr: Visual Bayesian personalized ranking from implicit feedback. AAAI, 144–150.
He, R., Packer, C., & McAuley, J. J. (2016). Learning compatibility across categories for heterogeneous item recommendation. ICDM, 937–942.
Hidayati, S. C., Hsu, C.-C., Chang, Y.-T., Hua, K.-L., Fu, J., & Cheng, W.-H. (2018). What dress fits me best? Fashion recommendation on the clothing style for personal body
shape. ACM Multimedia, 438–446.
Hou, M., Wu, L., Chen, E., Li, Z., Zheng, V. W., & Liu, Q. (2019). Explainable fashion recommendation: A semantic attribute region guided approach. CoRR. arXiv:1905.12862.
Hsiao, W.-L., & Grauman, K. (2017). Learning the latent "look": Unsupervised discovery of a style-coherent embedding from fashion images. ICCV, 4213–4222.
Hsiao, W.-L., & Grauman, K. (2018). Creating capsule wardrobes from fashion images. CVPR, 7161–7170.
Hsiao, W.-L., Katsman, I., Wu, C.-Y., Parikh, D., & Grauman, K. (2019). Fashion++: Minimal edits for outfit improvement. CoRR. arXiv:1904.09261.
Hu, Y., Yi, X., & Davis, L. S. (2015). Collaborative fashion recommendation: A functional tensor factorization approach. ACM Multimedia, 129–138.
Hu, Z., Yang, Z., Salakhutdinov, R. R., Qin, L., Liang, X., Dong, H., & Xing, E. P. (2018). Deep generative models with learnable knowledge constraints.
NeurIPS, 10522–10533.
Huang, J., Feris, R. S., Chen, Q., & Yan, S. (2015). Cross-domain image retrieval with a dual attribute-aware ranking network. ICCV, 1062–1070.
Huynh, C. P., Ciptadi, A., Tyagi, A., & Agrawal, A. (2018). Craft: Complementary recommendations using adversarial feature transformer. CoRR. arXiv:1804.10871.
Iwata, T., Watanabe, S., & Sawada, H. (2011). Fashion coordinates recommender system using photographs from fashion magazines. IJCAI, 2262–2267.
Jagadeesh, V., Piramuthu, R., Bhardwaj, A., Di, W., & Sundaresan, N. (2014). Large scale visual recommendations from street fashion images. KDD, 1925–1934.
Jetchev, N., & Bergmann, U. (2017). The conditional analogy gan: Swapping fashion articles on people images. ICCV Workshops, 2287–2292.
Jia, J., Huang, J., Shen, G., He, T., Liu, Z., Luan, H.-B., & Yan, C. (2016). Learning to appreciate the aesthetic effects of clothing. AAAI, 1216–1222.
Jia, M., Shi, M., Sirotenko, M., & Cui, Y. (2019). The fashionpedia ontology and fashion segmentation dataset. https://fashionpedia.github.io/home/Fashionpedia_overview.html.
Jia, M., Zhou, Y., Shi, M., & Hariharan, B. (2018). A deep-learning-based fashion attributes detection model. CoRR. arXiv:1810.10148.
Jiang, S., & Fu, Y. (2017). Fashion style generator. IJCAI, 3721–3727.
Jiang, S., Shao, M., Jia, C., & Fu, Y. (2016). Consensus style centralizing auto-encoder for weak style classification. AAAI, 1223–1229.
Jiang, S., Wu, Y., & Fu, Y. (2016). Deep bi-directional cross-triplet embedding for cross-domain clothing retrieval. ACM Multimedia, 52–56.
Kalantidis, Y., Kennedy, L., & Li, L.-J. (2013). Getting the look: Clothing recognition and segmentation for automatic product suggestions in everyday photos. ICMR.
Kang, W.-C., Fang, C., Wang, Z., & McAuley, J. J. (2017). Visually-aware fashion recommendation and design with generative image models. ICDM, 207–216.
Kang, W.-C., Kim, E., Leskovec, J., Rosenberg, C., & McAuley, J. J. (2018). Complete the look: Scene-based complementary product recommendation. CoRR.
arXiv:1812.01748.
Kiapour, M. H., Han, X., Lazebnik, S., Berg, A. C., & Berg, T. L. (2015). Where to buy it: Matching street clothing photos in online shops. ICCV, 3343–3351.
Kiapour, M. H., Yamaguchi, K., Berg, A. C., & Berg, T. L. (2014). Hipster wars: Discovering elements of fashion styles. ECCV (1), 472–488.
Kingma, D. P., & Welling, M. (2014). Auto-encoding variational Bayes. ICLR.
Kovashka, A., Parikh, D., & Grauman, K. (2012). Whittlesearch: Image search with relative attribute feedback. CVPR, 2973–2980.
Krähenbühl, P., & Koltun, V. (2011). Efficient inference in fully connected CRFs with Gaussian edge potentials. NIPS, 109–117.
Kumar, S., & Gupta, M. D. (2019). C+gan: Complementary fashion item recommendation. CoRR. arXiv:1906.05596.
Kwak, I. S., Murillo, A. C., Belhumeur, P. N., Kriegman, D. J., & Belongie, S. J. (2013). From bikers to surfers: Visual recognition of urban tribes. BMVC.
Lassner, C., Pons-Moll, G., & Gehler, P. V. (2017). A generative model of people in clothing. ICCV, 853–862.
Lee, H., Seol, J., & Lee, S.-g. (2017). Style2vec: Representation learning for fashion items from style sets. CoRR. arXiv:1708.04014.
Lee, S., Oh, S., Jung, C., & Kim, C. (2019). A global-local embedding module for fashion landmark detection. CoRR. arXiv:1908.10548.
Li, Y., Cao, L., Zhu, J., & Luo, J. (2017). Mining fashion outfit composition using an end-to-end deep learning approach on set data. IEEE Transactions on Multimedia,
19(8), 1946–1955.
Liang, X., Liu, S., Shen, X., Yang, J., Liu, L., Dong, J., ... Yan, S. (2015). Deep human parsing with active template regression. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 37(12), 2402–2414.
Liang, X., Xu, C., Shen, X., Yang, J., Liu, S., Tang, J., ... Yan, S. (2015). Human parsing with contextualized convolutional neural network. ICCV, 1386–1394.
Lin, Y., Ren, P., Chen, Z., Ren, Z., Ma, J., & de Rijke, M. (2018). Explainable fashion recommendation with joint outfit matching and comment generation. CoRR.
arXiv:1806.08977.
Lin, Y., Ren, P., Chen, Z., Ren, Z., Ma, J., & de Rijke, M. (2019). Improving outfit recommendation with co-supervision of fashion generation. WWW, 1095–1105.
Liu, A., Su, Y., Nie, W., & Kankanhalli, M. S. (2017). Hierarchical clustering multi-task learning for joint human action grouping and recognition. IEEE Transactions on
Pattern Analysis and Machine Intelligence, 39(1), 102–114.
Liu, S., Feng, J., Domokos, C., Xu, H., Huang, J., Hu, Z., & Yan, S. (2014). Fashion parsing with weak color-category labels. IEEE Transactions on Multimedia, 16,
253–265.
Liu, S., Feng, J., Song, Z., Zhang, T., Lu, H., Xu, C., & Yan, S. (2012). Hi, magic closet, tell me what to wear!. ACM Multimedia, 619–628.
Liu, S., Liang, X., Liu, L., Lu, K., Lin, L., & Yan, S. (2014). Fashion parsing with video context. ACM Multimedia, 467–476.
Liu, S., Liang, X., Liu, L., Shen, X., Yang, J., Xu, C., ... Yan, S. (2015). Matching-cnn meets knn: Quasi-parametric human parsing. CVPR, 1419–1427.
Liu, S., Song, Z., Wang, M., Xu, C., Lu, H., & Yan, S. (2012). Street-to-shop: Cross-scenario clothing retrieval via parts alignment and auxiliary set. ACM Multimedia, 1335–1336.
Liu, Z., Huang, H., He, Q., Chiew, K., & Gao, Y. (2015). Rare category exploration on linear time complexity. DASFAA (2), 37–54.
Liu, Z., Luo, P., Qiu, S., Wang, X., & Tang, X. (2016). Deepfashion: Powering robust clothes recognition and retrieval with rich annotations. CVPR, 1096–1104.
Liu, Z., Yan, S., Luo, P., Wang, X., & Tang, X. (2016). Fashion landmark detection in the wild. ECCV (2), 229–245.
Luc, P., Couprie, C., Chintala, S., & Verbeek, J. (2016). Semantic segmentation using adversarial networks. CoRR. arXiv:1611.08408.
Luo, Y., Zheng, Z., Zheng, L., Guan, T., Yu, J., & Yang, Y. (2018). Macro-micro adversarial network for human parsing. ECCV (9), 424–440.
Oramas M., J., & Tuytelaars, T. (2016). Modeling visual compatibility through hierarchical mid-level elements. CoRR. arXiv:1604.00036.
Ma, L., Jia, X., Sun, Q., Schiele, B., Tuytelaars, T., & Gool, L. V. (2017). Pose guided person image generation. NIPS, 405–415.
Ma, L., Sun, Q., Georgoulis, S., Gool, L. V., Schiele, B., & Fritz, M. (2018). Disentangled person image generation. CVPR, 99–108.
Ma, Y., Jia, J., Zhou, S., Fu, J., Liu, Y., & Tong, Z. (2017). Towards better understanding the clothing fashion styles: A multimodal deep learning approach. AAAI, 38–44.
Matzen, K., Bala, K., & Snavely, N. (2017). Streetstyle: Exploring world-wide clothing styles from millions of photos. CoRR. arXiv:1706.01869.
McAuley, J. J., Targett, C., Shi, Q., & van den Hengel, A. (2015). Image-based recommendations on styles and substitutes. SIGIR, 43–52.
Mizuochi, M., Kanezaki, A., & Harada, T. (2014). Clothing retrieval based on local similarity with multiple images. ACM Multimedia, 1165–1168.
Mo, S., Cho, M., & Shin, J. (2018). Instagan: Instance-aware image-to-image translation. CoRR. arXiv:1812.10889.
Murillo, A. C., Kwak, I. S., Bourdev, L. D., Kriegman, D. J., & Belongie, S. J. (2012). Urban tribes: Analyzing group photos from a social perspective. CVPR Workshops, 28–35.
Nakamura, T., & Goto, R. (2018). Outfit generation and style extraction via bidirectional lstm and autoencoder. CoRR. arXiv:1807.03133.
Papandreou, G., Chen, L.-C., Murphy, K., & Yuille, A. L. (2015). Weakly- and semi-supervised learning of a dcnn for semantic image segmentation. CoRR. arXiv:1502.02734.
Park, S., Nie, B. X., & Zhu, S.-C. (2018). Attribute and-or grammar for joint parsing of human pose, parts and attributes. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 40(7), 1555–1569.
Pumarola, A., Agudo, A., Sanfeliu, A., & Moreno-Noguer, F. (2018). Unsupervised person image synthesis in arbitrary poses. CVPR, 8620–8628.
Raj, A., Sangkloy, P., Chang, H., Hays, J., Ceylan, D., & Lu, J. (2018). Swapnet: Image based garment transfer. ECCV, 679–695.
Rostamzadeh, N., Hosseini, S., Boquet, T., Stokowiec, W., Zhang, Y., Jauvin, C., & Pal, C. (2018). Fashion-gen: The generative fashion dataset and challenge. CoRR.
arXiv:1806.08317.
Sbai, O., Elhoseiny, M., Bordes, A., LeCun, Y., & Couprie, C. (2018). Design: Design inspiration from generative networks. ECCV Workshops, 37–44.
Shih, Y.-S., Chang, K.-Y., Lin, H.-T., & Sun, M. (2018). Compatibility family learning for item recommendation and generation. AAAI, 2403–2410.
Siarohin, A., Sangineto, E., Lathuilière, S., & Sebe, N. (2018). Deformable gans for pose-based human image generation. CVPR, 3408–3416.
Simo-Serra, E., Fidler, S., Moreno-Noguer, F., & Urtasun, R. (2015). Neuroaesthetics in fashion: Modeling the perception of fashionability. CVPR, 869–877.
Simo-Serra, E., & Ishikawa, H. (2016). Fashion style in 128 floats: Joint ranking and classification using weak data for feature extraction. CVPR, 298–307.
Singh, K. K., & Lee, Y. J. (2016). End-to-end localization and ranking for relative attributes. ECCV (6), 753–769.
Song, S., & Mei, T. (2018). When multimedia meets fashion. IEEE Multimedia, 25(3), 102–108.
Song, X., Feng, F., Han, X., Yang, X., Liu, W., & Nie, L. (2018). Neural compatibility modeling with attentive knowledge distillation. SIGIR, 5–14.
Song, X., Feng, F., Liu, J., Li, Z., Nie, L., & Ma, J. (2017). Neurostylist: Neural compatibility modeling for clothing matching. ACM Multimedia, 753–761.
Song, Z., Wang, M., Hua, X.-S., & Yan, S. (2011). Predicting occupation via human clothing and contexts. ICCV, 1084–1091.
Takagi, M., Simo-Serra, E., Iizuka, S., & Ishikawa, H. (2017). What makes a style: Experimental analysis of fashion prediction. ICCV Workshops, 2247–2253.
Vaccaro, K., Shivakumar, S., Ding, Z., Karahalios, K., & Kumar, R. (2016). The elements of fashion style. UIST, 777–785.
Vasileva, M. I., Plummer, B. A., Dusad, K., Rajpal, S., Kumar, R., & Forsyth, D. A. (2018). Learning type-aware embeddings for fashion compatibility. ECCV (16), 405–421.
Veit, A., Kovacs, B., Bell, S., McAuley, J. J., Bala, K., & Belongie, S. J. (2015). Learning visual clothing style with heterogeneous dyadic co-occurrences. ICCV, 4642–4650.
Vittayakorn, S., Berg, A. C., & Berg, T. L. (2017). When was that made? WACV, 715–724.
Vittayakorn, S., Yamaguchi, K., Berg, A. C., & Berg, T. L. (2015). Runway to realway: Visual analysis of fashion. WACV, 951–958.
Wang, B., Zheng, H., Liang, X., Chen, Y., Lin, L., & Yang, M. (2018). Toward characteristic-preserving image-based virtual try-on network. ECCV (13), 607–623.
Wang, W., Xu, Y., Shen, J., & Zhu, S.-C. (2018). Attentive fashion grammar network for fashion landmark detection and clothing category classification. CVPR, 4271–4280.
Wang, X., & Zhang, T. (2011). Clothes search in consumer photos via color matching and attribute learning. ACM Multimedia, 1353–1356.
Wu, Z., Lin, G., Tao, Q., & Cai, J. (2018). M2e-try on net: Fashion from model to everyone. CoRR. arXiv:1811.08599.
Xian, W., Sangkloy, P., Agrawal, V., Raj, A., Lu, J., Fang, C., ... Hays, J. (2018). Texturegan: Controlling deep image synthesis with texture patches. CVPR, 8456–8465.
Xu, N., Zhang, H., Liu, A., Nie, W., Su, Y., Nie, J., & Zhang, Y. (2019). Multi-level policy and reward-based deep reinforcement learning framework for image
captioning. IEEE Transactions on Multimedia.
Yamaguchi, K., Berg, T. L., & Ortiz, L. E. (2014). Chic or social: Visual popularity analysis in online fashion networks. ACM Multimedia, 773–776.
Yamaguchi, K., Kiapour, M. H., & Berg, T. L. (2013). Paper doll parsing: Retrieving similar styles to parse clothing items. ICCV, 3519–3526.
Yamaguchi, K., Kiapour, M. H., Ortiz, L. E., & Berg, T. L. (2012). Parsing clothing in fashion photographs. CVPR, 3570–3577.
Yamaguchi, K., Kiapour, M. H., Ortiz, L. E., & Berg, T. L. (2015). Retrieving similar styles to parse clothing. IEEE Transactions on Pattern Analysis and Machine
Intelligence, 37(5), 1028–1040.
Yamaguchi, K., Okatani, T., Sudo, K., Murasaki, K., & Taniguchi, Y. (2015). Mix and match: Joint model for clothing and attribute recognition. BMVC, 51.1–51.12.
Yan, S., Liu, Z., Luo, P., Qiu, S., Wang, X., & Tang, X. (2017). Unconstrained fashion landmark detection via hierarchical recurrent transformer networks. ACM
Multimedia, 172–180.
Yang, W., Luo, P., & Lin, L. (2014). Clothing co-parsing by joint image segmentation and labeling. CVPR, 3182–3189.
Yang, X., He, X., Wang, X., Ma, Y., Feng, F., Wang, M., & Chua, T.-S. (2019). Interpretable fashion matching with rich attributes. SIGIR, 775–784.
Yang, X., Ma, Y., Liao, L., Wang, M., & Chua, T.-S. (2019). Transnfcm: Translation-based neural fashion compatibility modeling. AAAI, 403–410.
Yildirim, G., Seward, C., & Bergmann, U. (2018). Disentangling multiple conditional inputs in gans. CoRR. arXiv:1806.07819.
Yin, R., Li, K., Lu, J., & Zhang, G. (2019). Enhancing fashion recommendation with visual compatibility relationship. WWW, 3434–3440.
Yoo, D., Kim, N., Park, S., Paek, A. S., & Kweon, I.-S. (2016). Pixel-level domain transfer. ECCV (8), 517–532.
Yu, R., Wang, X., & Xie, X. (2019). Vtnfp: An image-based virtual try-on network with body and clothing feature preservation. ICCV, 10510–10519.
Yu, W., Zhang, H., He, X., Chen, X., Xiong, L., & Qin, Z. (2018). Aesthetic-based clothing recommendation. WWW, 649–658.
Zanfir, M., Popa, A.-I., Zanfir, A., & Sminchisescu, C. (2018). Human appearance transfer. CVPR, 5391–5399.
Zhang, N., Paluri, M., Ranzato, M., Darrell, T., & Bourdev, L. D. (2014). Panda: Pose aligned networks for deep attribute modeling. CVPR, 1637–1644.
Zhang, X., Jia, J., Gao, K., Zhang, Y., Zhang, D., Li, J., & Tian, Q. (2017). Trip outfits advisor: Location-oriented clothing recommendation. IEEE Transactions on
Multimedia, 19(11), 2533–2544.
Zhao, B., Feng, J., Wu, X., & Yan, S. (2017). Memory-augmented attribute manipulation networks for interactive fashion search. CVPR, 6156–6164.
Zhao, B., Wu, X., Cheng, Z.-Q., Liu, H., Jie, Z., & Feng, J. (2018). Multi-view image generation from a single-view. ACM Multimedia, 383–391.
Zheng, S., Yang, F., Kiapour, M. H., & Piramuthu, R. (2018). Modanet: A large-scale street fashion dataset with polygon annotations. ACM Multimedia, 1670–1678.
Zhou, X., Huang, S., Li, B., Li, Y., Li, J., & Zhang, Z. (2019). Text guided person image synthesis. CoRR. arXiv:1904.05118.
Zhou, Z., Di, X., Zhou, W., & Zhang, L. (2018). Fashion sensitive clothing recommendation using hierarchical collocation model. ACM Multimedia, 1119–1127.
Zhu, S., Fidler, S., Urtasun, R., Lin, D., & Loy, C. C. (2017). Be your own prada: Fashion synthesis with structural coherence. ICCV, 1689–1697.
Zou, Q., Zhang, Z., Wang, Q., Li, Q., Chen, L., & Wang, S. (2016). Who leads the clothing fashion: Style, color, or texture? A computational study. CoRR. arXiv:1608.07444.
Zou, X., Kong, X., Wong, W., Wang, C., Liu, Y., & Cao, Y. (2019). Fashionai: A hierarchical dataset for fashion understanding. CVPR Workshops.