
Information Processing and Management 57 (2020) 102276


Fashion analysis and understanding with artificial intelligence


Xiaoling Gu a,*, Fei Gao a, Min Tan a, Pai Peng b

a Hangzhou Dianzi University School of Computer Science and Technology, Key Laboratory of Complex Systems Modeling and Simulation, China
b Tencent Technology (Shanghai) Co., YoutuLab, China

ARTICLE INFO

Keywords: Cross-media fashion data; Fashion recognition; Fashion understanding; Fashion application; Fashion industry; Artificial intelligence

ABSTRACT

As handling fashion big data with Artificial Intelligence (AI) has become an exciting challenge for computer scientists, fashion studies have received increasing attention in the computer vision, machine learning and multimedia communities in the past few years. In this paper, we introduce the progress in fashion research and provide a taxonomy of these fashion studies that includes low-level fashion recognition, middle-level fashion understanding and high-level fashion applications. Finally, we discuss the challenges that arise when the fashion industry embraces AI technologies.

1. Introduction

From time immemorial, fashion has been intimately associated with being human, as evidenced by beads and jewelry found even in the most ancient cultures. In contemporary society, fashion has had a significant effect on every aspect of social life, causing and reflecting changes in social, economic, political, and cultural landscapes. The fashion industry has become one of the biggest segments of the world economy, estimated at 3 trillion dollars as of 2018, representing two percent of global GDP.1
On the other hand, the growing popularity of social media and the prosperity of e-commerce have produced massive amounts of
cross-media fashion data, such as street data shared by users, runway show data released by fashion brands and product data
provided by e-commerce sites, displaying a rich and complex set of multimedia contents. Therefore, understanding and analyzing the
semantics of large-scale cross-media fashion data through machine learning and computer vision techniques is one of the essential
business analytics and technology tools for revolutionizing the industry and reshaping the mechanics of fashion. For instance, an
increasing number of popular designers and brands are leveraging leading social networks to survey customer preferences, such as
opinions, ideas, feedback, and trends.2
Due to its societal and economic impact, handling cross-media fashion data with new techniques has become an exciting challenge for computer scientists. Fortunately, fashion studies have received increasing attention in the computer vision, machine learning and multimedia communities in recent years. The proposed theoretical research and promising applications provide sufficient technical support for understanding and analyzing cross-media fashion data from all aspects. The main goal of this paper is to introduce the
progress in fashion research and provide an overview of these works published by computer scientists.
Motivated by Song and Mei (2018), we classify the topics of fashion research into three levels, namely, low-level fashion re-
cognition, middle-level fashion understanding and high-level fashion applications, as shown in Table 1. Fashion recognition aims to
identify fashion garments at the pixel level for assisting fashion understanding. Two fundamental tasks are involved with low-level
fashion recognition, that is, clothing/human parsing and landmark detection. The goal of fashion understanding is to explore


Corresponding author.
E-mail addresses: [email protected] (X. Gu), [email protected] (F. Gao), [email protected] (M. Tan), [email protected] (P. Peng).
1 https://fashionunited.com/global-fashion-industry-statistics/
2 https://www.smartdatacollective.com/how-big-data-changing-fashion-industry/

https://doi.org/10.1016/j.ipm.2020.102276
Received 31 December 2019; Received in revised form 7 April 2020; Accepted 20 April 2020
Available online 29 April 2020
0306-4573/ © 2020 Elsevier Ltd. All rights reserved.

Table 1
The taxonomy of fashion studies.
| Field | Subfield | Methods |
|---|---|---|
| Fashion Recognition | Clothing/Human Parsing | Graphical Model, Non-parametric Model, Parselets Representation Method, CNN Model, Adversarial Model |
| Fashion Recognition | Landmark Detection | Deep Learning Methods |
| Fashion Understanding | Clothing Attribute Prediction | Single-task Learning, Multi-task Learning, Transfer Learning |
| Fashion Understanding | Fashion Style Prediction | Supervised Learning, Unsupervised Learning |
| Fashion Applications | Fashion Retrieval | Cross-scenario Retrieval Model, Interactive Retrieval Model |
| Fashion Applications | Fashion Recommendation | Complementary Recommendation Model, Personalized Recommendation Model, Scenario-oriented Recommendation Model, Explainable Recommendation Model, Generative Model |
| Fashion Applications | Fashion Compatibility | Pairwise Compatibility Learning, Outfit Compatibility Learning |
| Fashion Applications | Fashion Image Synthesis | Pose Guided Generative Model, Text Guided Generative Model, Virtual Try-on Model, Image Transformation Model, Fashion Design Model |
| Fashion Applications | Fashion Data Mining | Fashion Trends Analysis, Hybrid Analytics |

semantics (e.g. clothing attributes, fashion styles) from fashion data for supporting advanced fashion applications. Hence, there are
two popular tasks for middle-level fashion understanding, including clothing attribute prediction and fashion style prediction. At the
highest level, fashion applications cover a wide range of studies such as fashion retrieval, fashion recommendation, fashion com-
patibility, fashion image synthesis and fashion data mining, which lead us a step closer to the AI-enhanced fashion industry.
In addition to the aforementioned academic works, researchers from top technology companies are transforming fashion at a
faster pace than ever. For example, an Amazon team has developed an algorithm that learns about a particular fashion style and
creates similar images from scratch.3 IBM has teamed up with Tommy Hilfiger and the Fashion Institute of Technology on a project called Reimagine Retail, helping give retailers an edge by equipping them with AI design skills.4 Alibaba has collaborated with
GUESS to launch a pilot FashionAI concept shop, providing customers with a more enriching shopping experience that combines
online and offline shopping behaviors.5
Although some promising results have been achieved in the field of fashion studies, there are still many challenges as AI changes the fashion world in different aspects. For instance, from the perspective of the fashion industry, the most serious problem is the huge gap between the various value chains of the fashion industry, such as design, manufacturing and marketing. From the perspective of improving the aforementioned fashion studies, the main challenges include the lack of a unified large-scale hybrid fashion dataset, learning good feature representations of fashion data, the difficulty of fashion assessment, improving users' shopping experiences and developing AI assistants for fashion designers.
The rest of this paper is organized as follows. In Section 2, we introduce five kinds of approaches for addressing clothing/human
parsing tasks and several approaches for landmark detection. In Section 3, we first discuss three types of approaches for clothing
attribute prediction and then introduce supervised and unsupervised methods for fashion style prediction. In Section 4, we introduce
the latest works on fashion retrieval, fashion recommendation, fashion compatibility, fashion image synthesis and fashion data
mining, respectively. In Section 5, we review a variety of fashion benchmark datasets. In Section 6, we discuss the future challenges
that arise when the fashion industry embraces AI technologies.

2. Low-level fashion recognition

Fashion recognition focuses on pixel level computation of fashion images, which includes clothing parsing (or human parsing) and
landmark detection. Clothing parsing predicts pixel-wise labeling for garment items (e.g., hair, head, upper clothes and pants), which
builds a foundation for other fashion understanding tasks. Human parsing further partitions the human body along with clothing
items into semantic regions. Clothing/human parsing is extremely challenging due to the wide variety of garment items, possible
variations in combination, layering, and occlusion.
Landmark detection aims to localize fashion landmarks such as the corners of the neckline, hemline, and cuffs, which can better distinguish
the attributes of the clothes for retrieval and recommendation. These landmarks not only implicitly capture bounding boxes of
clothes, but also indicate their functional regions. Due to the large variation and non-rigid deformation of clothes, fashion landmark
detection is also a challenging task.

2.1. Clothing/human parsing

Many researchers have proposed different kinds of approaches for solving clothing/human parsing problem, which can be ca-
tegorized into five classes according to the underlying techniques: (1) Graphical models (e.g., conditional random fields) (Krähenbühl &
Koltun, 2011) are the common methods to enforce spatial contiguity in the output label maps. However, graphical models mainly

3
https://www.technologyreview.com/s/608668/amazon-has-developed-an-ai-fashion-designer/ .
4
https://www.whichplm.com/the-future-of-fashion-ai-changing-fashion-retail-industry/ .
5
https://www.businesswire.com/news/home/20180709005279/en/GUESS-Collaborates-Alibaba-Bring-Artificial-Intelligence-Fashion .


focus on the constrained parsing problem and handle low-level inconsistencies within a small scope. (2) Non-parametric methods do not require much prior knowledge and rely on over-segmentation and pose estimation. However, non-parametric methods are limited by inaccurate matching, which introduces noise during label transfer. (3) Parselets representation methods use
parselet as the building blocks for clothing/human parsing to overcome the inconsistent targets between pose estimation and clothing
parsing. Parselets are a group of parsable segments which can generally be obtained by low-level over-segmentation algorithms and
bear strong semantic meaning. (4) CNN models have made great progress in clothing/human parsing. However, using the pixel-wise
classification loss, CNN usually ignores the micro context between pixels and the macro context between semantic parts. (5) Ad-
versarial models reduce the semantic inconsistency and the local inconsistency in the parsing results by the use of adversarial
networks.

2.1.1. Graphical models


Yamaguchi, Kiapour, Ortiz, and Berg (2012) first introduced the clothing parsing task and utilized a superpixel labeling approach on a conditional random field by relying on human pose estimation. Later, Yang, Luo, and Lin (2014) developed an integrated system of clothing co-parsing to jointly parse a set of clothing images into semantic configurations. Liu, Feng, et al. (2014) proposed a weakly-supervised fashion parsing framework that combines the human pose estimation module, the MRF-based color and category inference module and the (super)pixel-level category classifier learning module.
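To make the graphical-model approach concrete, the sketch below (a generic CRF energy of the kind these parsers minimize; the exact potentials vary from paper to paper) assigns each pixel i a garment label y_i and minimizes

```latex
E(\mathbf{y} \mid I) = \sum_{i} \psi_u(y_i \mid I) \;+\; \sum_{(i,j) \in \mathcal{E}} \psi_p(y_i, y_j \mid I),
```

where the unary potential \psi_u scores how well label y_i fits the local appearance (and, in pose-aware variants, the estimated pose), and the pairwise potential \psi_p over neighboring pixels \mathcal{E} penalizes inconsistent labels, which is how these models enforce spatial contiguity in the output label maps.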

2.1.2. Non-parametric models


Yamaguchi, Kiapour, and Berg (2013) and Yamaguchi, Kiapour, Ortiz, and Berg (2015) proposed a clothing parsing method based
on nearest neighbor style retrieval by combining a pre-trained global clothing model, a local clothing model from retrieved examples
and transferred parse masks from retrieved examples. Further, Liu, Liang, et al. (2015) built a deep quasi-parametric human parsing
framework for matching any semantic region of a KNN image to the testing image. By leveraging video contexts without extra
annotation, Liu, Liang, et al. (2014) proposed a novel semi-supervised learning strategy to address human parsing, which contains the
contextual video parsing and the non-parametric human parsing.

2.1.3. Parselets representation methods


Dong, Chen, Xia, Huang, and Yan (2013) first proposed Deformable Mixture Parsing Model for human parsing by using the
Parselet representation. Then, Dong, Chen, Shen, Yang, and Yan (2014) further proposed a unified framework for simultaneous
human parsing and pose estimation by utilizing Parselets and Mixture of Joint-Group Templates as the representations for the
semantic parts.

2.1.4. CNN Models


As convolutional neural networks facilitate great advances in general object/scene semantic segmentation (Farabet, Couprie, Najman, & LeCun, 2013; Liu, Su, Nie, & Kankanhalli, 2017; Papandreou, Chen, Murphy, & Yuille, 2015), Liang, Xu, et al. (2015) proposed a Contextualized CNN architecture for the human parsing task, which integrates the cross-layer context, global image-level context, within-super-pixel context and cross-super-pixel neighborhood context into a unified network. Next, Liang, Liu, et al. (2015) formulated the human parsing task as an Active Template Regression problem. Further, Gong, Liang, Zhang, Shen, and Lin (2017) proposed a self-supervised structure-sensitive learning approach for clothing parsing.

2.1.5. Adversarial models


As adversarial networks yield competitive results in semantic segmentation (Luc, Couprie, Chintala, & Verbeek, 2016), Luo et al. (2018) proposed a new framework called Macro-Micro Adversarial Network (MMAN) for human parsing, which significantly reduces the semantic inconsistency and the local inconsistency in the parsing results.
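In sketch form (following the generic adversarial segmentation setup of Luc et al. (2016) rather than MMAN's exact losses), the parser S and a discriminator D are trained with

```latex
\min_{S} \max_{D} \;\; \mathcal{L}_{ce}\big(S(I), y\big) \;+\; \lambda \Big[ \log D(I, y) + \log\big(1 - D(I, S(I))\big) \Big],
```

where the cross-entropy term fits the pixel-wise labels and the adversarial term rewards label maps that the discriminator cannot distinguish from ground truth; MMAN instantiates two such discriminators, a macro one for semantic consistency and a micro one for local consistency.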

2.2. Landmark detection

Liu, Luo, Qiu, Wang, and Tang (2016) first proposed FashionNet to learn clothing features by jointly predicting clothing attributes
and landmarks. Later, Liu, Yan, Luo, Wang, and Tang (2016) proposed a three-stage deep fashion alignment (DFA) framework for
landmark detection. Yan et al. (2017) trained a Deep LAndmark Network (DLAN) iteratively for jointly estimating bounding boxes
and landmarks in an end-to-end manner. Both Liu, Yan, et al. (2016) and Yan et al. (2017) are based on regression models. Wang, Xu, Shen, and Zhu (2018) indicated that the regression model is highly non-linear and difficult to optimize. They proposed a knowledge-guided fashion network (AttentiveNet) for fashion landmark localization and clothing category classification by extending the neural network with domain-specific grammars. Later, Lee, Oh, Jung, and Kim (2019) introduced contextual knowledge of clothes
and proposed a global-local embedding module (Global-Local) to achieve more accurate landmark prediction performance.

3. Middle-level fashion understanding

Clothing attributes are an informative and compact representation for describing people. As illustrated in Fig. 1, beyond color and
pattern, clothing attributes include other important features such as material, collar, length, cut and fastener. Fine-grained clothing
attributes recognition can be used for fashion retrieval, fashion recommendation and fashion analysis. Different from clothing at-
tributes, fashion styles emerge organically from how people assemble outfits of clothing, serving as the expressions of an individual’s


Fig. 1. An illustrative example of fine-grained clothing attributes (adapted from Di et al., 2013).

character and aesthetics. For example, the Hipster Wars project (Kiapour, Yamaguchi, Berg, & Berg, 2014) defines five style cate-
gories including hipster, goth, preppy, pinup and bohemian. Fashion style can benefit various fashion analysis tasks such as fashion
compatibility and fashion trends analysis.

3.1. Clothing attribute prediction

We classify clothing attribute prediction approaches into the following three types: (1) Single-task learning only focuses on
learning clothing attributes of fashion images from a specific fashion domain. (2) Multi-task learning learns clothing attributes and
other tasks (e.g. landmark detection) simultaneously. (3) Transfer learning learns clothing attributes by bridging the gap between
fashion images of different domains.

3.1.1. Single-task learning


Chen, Gallagher, and Girod (2012) described clothing appearance by semantic attributes using a CRF based approach. Similarly,
Bossard et al. (2012) used a random forest approach to classify the apparel attributes. Di, Wah, Bhardwaj, Piramuthu, and
Sundaresan (2013) defined a set of style-related visual attributes and learned individual classifiers for each attribute. Zhang, Paluri, Ranzato, Darrell, and Bourdev (2014) proposed to infer human attributes from images of people by augmenting deep CNNs. Different from the above methods that conduct attribute recognition based only on annotated attribute labels, Corbière, Ben Younes, Ramé,
and Ollion (2017) proposed to identify attribute vocabulary using weakly labeled image-text data from shopping sites.

3.1.2. Multi-task learning


Park, Nie, and Zhu (2018) presented an attribute and-or grammar model for jointly inferring human body pose and human
attributes in a parse graph with attributes augmented to nodes in the hierarchical representation. Yamaguchi, Okatani, Sudo,
Murasaki, and Taniguchi (2015) proposed a joint model for clothing and attribute recognition. Han, Wu, Huang, et al. (2017)
proposed to learn spatial-aware concept representations by projecting images and their attributes into a joint visual-semantic em-
bedding space. Singh and Lee (2016) proposed an end-to-end deep CNN to simultaneously localize and rank relative visual attributes
by combining a localization module with a ranking module.
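As a minimal sketch of this multi-task setting (a toy backbone and heads for illustration, not any of the cited architectures), the snippet below jointly trains a multi-label attribute classifier and a landmark regressor on a shared representation:

```python
# Multi-task sketch: shared CNN features feed an attribute head (multi-label
# binary cross-entropy) and a landmark head (L2 regression); the two losses
# are simply summed. All sizes here are illustrative.
import torch
import torch.nn as nn

class MultiTaskFashionNet(nn.Module):
    def __init__(self, num_attributes=1000, num_landmarks=8):
        super().__init__()
        self.backbone = nn.Sequential(            # stand-in for a deep CNN
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.attr_head = nn.Linear(64, num_attributes)   # attribute logits
        self.lm_head = nn.Linear(64, num_landmarks * 2)  # (x, y) per landmark

    def forward(self, x):
        feat = self.backbone(x)
        return self.attr_head(feat), self.lm_head(feat)

model = MultiTaskFashionNet()
images = torch.randn(4, 3, 224, 224)
attr_targets = torch.randint(0, 2, (4, 1000)).float()    # binary attribute labels
lm_targets = torch.rand(4, 16)                           # normalized coordinates
attr_logits, lm_pred = model(images)
loss = (nn.BCEWithLogitsLoss()(attr_logits, attr_targets)
        + nn.MSELoss()(lm_pred, lm_targets))             # joint objective
loss.backward()
```

The intuition is that gradients from the landmark task encourage the shared features to localize functional regions, which in turn helps attribute prediction.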

3.1.3. Transfer learning


Chen, Huang, et al. (2015) presented a double-path deep domain adaptation network for the problem of describing people based
on fine-grained clothing attributes. Dong, Gong, and Zhu (2017) proposed a transfer learning model to explore multiple sources of
different types of web annotations with multi-labeled fine-grained attributes, which can effectively recognize attributes given un-
constrained images taken from-the-wild. Jia, Zhou, Shi, and Hariharan (2018) proposed a deep model built on the Faster R-CNN
model for recognizing fashion attributes with a large-scale image dataset of 594 fine-grained attributes under different scenarios.

3.2. Fashion style prediction

Some existing works on fashion style prediction learn style representations in a supervised manner, i.e., by constructing classifiers for fashion style prediction. Considering that manually defined style categories may be too abstract to capture subtle style differences, other works have focused on learning style representations in an unsupervised manner.

3.2.1. Supervised learning


Kiapour et al. (2014) defined five fashion styles and trained an SVM over a set of handcrafted features to categorize items into
clothing styles. Instead of using handcrafted features, Simo-Serra and Ishikawa (2016) extracted discriminative style features by
jointly training a deep feature extraction network and a deep classification network. Jiang, Shao, Jia, and Fu (2016) further proposed
a consensus style centralizing auto-encoder to extract robust style feature representations for style classification. Different from the
above methods that construct a classification network, Ma, Jia, Zhou, et al. (2017) built a fashion semantic space for describing
clothing fashion styles.


3.2.2. Unsupervised learning


Vaccaro, Shivakumar, Ding, Karahalios, and Kumar (2016) presented a model that learns correspondences between fashion design
elements and styles by training polylingual topic models on outfit data collected from Polyvore. Likewise, Hsiao and Grauman (2017)
proposed an unsupervised approach to learn a style-coherent representation by probabilistic polylingual topic models. By leveraging
a large collection of user-created style sets, Lee, Seol, and goo Lee (2017) built a representation learning framework that can learn
latent style features of fashion items.

4. High-level fashion applications

Supported by the low-level fashion recognition and middle-level fashion understanding techniques, high-level fashion applica-
tions blossom in fashion retrieval, fashion recommendation, fashion compatibility, fashion image synthesis and fashion data mining.
In the fashion domain, fashion retrieval focuses on identifying clothing items from an image database based on an input query, while
fashion recommendation emphasizes recommending clothing items or outfits under certain conditions such as occasion, location and
users’ preferences. Different from retrieval and recommendation, fashion compatibility computes the matching score between
clothing items. Very recently, the success of generative adversarial networks encourages researchers to devote themselves to fashion
image synthesis. Meanwhile, with the big fashion data from the Internet, fashion data mining is another popular research topic.

4.1. Fashion retrieval

Initially, a few related works on fashion retrieval were designed only within one scenario (Wang & Zhang, 2011). As online shopping
has become an exponentially growing market, some works are devoted to cross-scenario clothing retrieval (Huang, Feris, Chen, &
Yan, 2015; Jiang, Wu, & Fu, 2016; Kalantidis, Kennedy, & Li, 2013; Kiapour, Han, Lazebnik, Berg, & Berg, 2015; Liu, Song, et al.,
2012). To obtain more precise search results, some works propose to provide interactive search techniques that allow a user to
iteratively refine the results retrieved by the fashion search engine (Ak, Kassim, Lim, & Tham, 2018; Guo et al., 2018; Kovashka,
Parikh, & Grauman, 2012; Liu, Huang, He, Chiew, & Gao, 2015; Mizuochi, Kanezaki, & Harada, 2014; Xu et al., 2019; Zhao, Feng,
Wu, & Yan, 2017).

4.1.1. Cross-scenario retrieval model


Liu, Song, et al. (2012) first considered the cross-scenario retrieval problem of finding similar clothing in online stores given a
daily human photo captured in the wild. Kalantidis et al. (2013) improved the cross-scenario clothing retrieval that can trivially scale
to hundreds of product classes and millions of product images. Kiapour et al. (2015) utilized deep learning techniques (WTBI) to
match a real-world example of a garment item to the same item in an online shop. Meanwhile, Huang et al. (2015) presented a dual
attribute-aware ranking network (DARN) for the problem of cross-domain image retrieval. Next, Jiang, Wu, et al. (2016) proposed a
deep bi-directional cross-triplet embedding algorithm to jointly solve the bi-directional shop-to-street and street-to-shop clothing
retrieval problems.
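The common thread in these cross-scenario models is a shared embedding space in which a street photo lies close to its shop counterpart. The sketch below (a hypothetical small encoder, not DARN or WTBI) trains such an embedding with a triplet margin loss:

```python
# Street-to-shop embedding sketch: pull the street photo (anchor) toward the
# matching shop image (positive) and away from a non-matching one (negative).
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(                       # toy stand-in for a deep CNN
    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 128))

def embed(x):
    return F.normalize(encoder(x), dim=1)      # unit-length style vectors

street, shop_pos, shop_neg = (torch.randn(8, 3, 224, 224) for _ in range(3))
loss = F.triplet_margin_loss(embed(street), embed(shop_pos), embed(shop_neg),
                             margin=0.2)
loss.backward()
# At query time, gallery (shop) images are ranked by their distance to the
# embedded street photo.
```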

4.1.2. Interactive retrieval model


Mizuochi et al. (2014) developed a clothing retrieval system, where users can retrieve their desired clothes which are globally
similar to an image and partially similar to another image. Kovashka et al. (2012) proposed an effective new form of feedback for
image search using relative attributes. Zhao et al. (2017) introduced a new fashion search protocol where attribute manipulation is
allowed within the interaction between users and search engines. Guo et al. (2018) addressed a new task for interactive visual content
search, where the dialog agent learns to interact with a human user throughout several dialog turns, and the user gives feedback in
natural language.

4.2. Fashion recommendation

There has been a large body of research literatures on fashion recommendation as it can promote people’s participation in online
shopping. As shown in Fig. 2, some of the works identified whether two products are complementary (Huynh, Ciptadi, Tyagi, &
Agrawal, 2018; Iwata, Wanatabe, & Sawada, 2011; Jagadeesh, Piramuthu, Bhardwaj, Di, & Sundaresan, 2014; Kumar & Gupta, 2019;
Zhou, Di, Zhou, & Zhang, 2018), some built personalized models by learning people’s preferences implicitly or explicitly (Bracher,
Heinz, & Vollgraf, 2016; He & McAuley, 2016a; 2016b; Hidayati et al., 2018; Hu, Yi, & Davis, 2015; Yu et al., 2018), some considered
conditions (e.g. occasion, location) for clothing recommendation (Kang, Kim, Leskovec, Rosenberg, & McAuley, 2018; Liu, Feng,
et al., 2012; Zhang et al., 2017), some studied the tasks of explainable outfit recommendation (Feng et al., 2018; Hou et al., 2019; Lin
et al., 2018), and some improve fashion recommendation by connecting it with fashion image generation (Hsiao, Katsman, Wu,
Parikh, & Grauman, 2019; Kang, Fang, Wang, & McAuley, 2017; Lin et al., 2019).

4.2.1. Complementary recommendation model


Iwata et al. (2011) proposed a probabilistic topic model to recommend tops for bottoms by learning information about co-
ordinates from visual features in each fashion item. Jagadeesh et al. (2014) proposed two classes of recommenders to
discover key fashion insights from large quantities of online fashion images. Kumar and Gupta (2019) introduced a conditional
generative adversarial model to draw realistic samples from paired fashion clothing distribution and provide real samples to pair with


Fig. 2. Some illustrative examples of fashion recommendation studies: (a) An example of complementary item recommendation (adapted from
Jagadeesh et al., 2014). (b) An example of personalized fashion recommendation (adapted from Hidayati et al., 2018). (c) An example of scenario-
oriented outfit recommendation (adapted from Liu, Feng, et al., 2012). (d) An example of explainable fashion recommendation (adapted from
Lin et al., 2018). (e) An example of outfit improvement with generation (adapted from Hsiao et al., 2019).

arbitrary fashion units. Zhou et al. (2018) proposed to incorporate expert knowledge, including purchase behaviors, image contents and product descriptions, to provide fashion item recommendations.

4.2.2. Personalized recommendation model


Hu et al. (2015) proposed a personalized outfit recommendation method that models user-item interactions based on tensor
decomposition. To capture the aesthetic preference of consumers at a particular time, Yu et al. (2018) proposed an approach that
incorporates aesthetic features into a tensor factorization model. By using users’ opinions or feedback, He and McAuley (2016b)
introduced a scalable matrix factorization approach that incorporates visual signals of product images into predictors of people’s
opinions. And later, He and McAuley (2016a) built visually-aware recommender systems by combining high-level visual features,
users’ past feedback, as well as evolving trends within the community. Particularly, Hidayati et al. (2018) proposed a framework for
learning the compatibility of clothing styles and body shapes from social big data.
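As one concrete instance, the visually-aware predictor of He and McAuley (2016b) scores a user-item pair roughly as (notation slightly simplified):

```latex
\hat{x}_{u,i} = \alpha + \beta_u + \beta_i + \gamma_u^{\top} \gamma_i + \theta_u^{\top} (\mathbf{E} f_i),
```

where \alpha, \beta_u, \beta_i are global, user and item offsets, \gamma_u and \gamma_i are latent factors, f_i is a pre-trained CNN feature of the product image, \mathbf{E} projects it into a low-dimensional visual space, and \theta_u encodes the user's visual preferences.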

4.2.3. Scenario-oriented recommendation model


Liu, Feng, et al. (2012) studied both occasion oriented outfit recommendation and pairing problems. Likewise, Kang et al. (2018)
proposed an approach to recommend visually compatible products based on scene images. And Zhang et al. (2017) studied the
correlation between clothing and location automatically using online travel photos.

4.2.4. Explainable recommendation model


Feng et al. (2018) proposed an interpretable partitioned embedding network to extract interpretable partitioned embedding from
clothing items. Lin et al. (2018) proposed a neural fashion recommender model that can simultaneously provide fashion re-
commendations and generate abstractive comments as explanations. Chen, Chen, et al. (2019) took a first step towards personalized fashion recommendation with visual explanations by jointly leveraging image region-level features and user review information.

4.2.5. Generative model


Lin et al. (2019) incorporated the supervision of generation loss to improve outfit recommendation by a neural co-supervision
learning framework. Kang et al. (2017) proposed an end-to-end approach for both personalized recommendation and design, which is
not only capable of suggesting existing items to a user but also capable of generating new fashion images that match user preferences.
Hsiao et al. (2019) introduced a deep image generation neural network to recommend minimal adjustments to a full-body clothing
outfit that will have maximal impact on its fashionability.

4.3. Fashion compatibility

Methods for fashion compatibility learning usually fall within two categories, namely, pairwise compatibility learning and outfit
compatibility learning, where the former takes a fashion item as a query and searches compatible items from different categories, and
the latter selects fashion items of different categories to form compatible outfits.


4.3.1. Pairwise compatibility learning


Early works on pairwise compatibility learning followed the idea to map the items into a common latent compatibility space and
estimate the distance between style vectors of items (Chen & He, 2018; M. & Tuytelaars, 2016; McAuley, Targett, Shi, & van den
Hengel, 2015; Song et al., 2017; Veit et al., 2015). Lately, some works proposed to model the compatibility of items by mapping items
into several latent visual spaces and jointly modeled the distances in these latent spaces, which can measure the compatibility of
items in different aspects (He, Packer, & McAuley, 2016; Shih, Chang, Lin, & Sun, 2018).
However, the above-mentioned methods tend to place items of the same category close together in the learned latent visual spaces.
Thus, some works proposed to add the categorical information for improving the item embedding learning (Vasileva et al., 2018;
Yang, Ma, Liao, Wang, & Chua, 2019). On the other hand, most methods on compatibility modeling rely on data-driven methods but
neglect fashion domain knowledge. To address this deficiency, Song et al. (2018) explored fashion domain knowledge for compat-
ibility modeling by following a teacher-to-student learning scheme. Yin, Li, Lu, and Zhang (2019) used an external expert fashion
collection dataset and incorporated visual compatibility relationships as well as style information into fashion compatibility learning.
Yang, He, et al. (2019) developed an attribute-based interpretable compatibility method to inject interpretability into the pairwise
compatibility modeling.
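In distilled form (a sketch; the exact projection and loss differ from paper to paper), an item with visual features f_i is mapped by a learned matrix W into the compatibility space, compatibility is measured by the distance

```latex
d(i, j) = \lVert W f_i - W f_j \rVert_2, \qquad
\mathcal{L} = \sum_{(i,\, j^{+},\, j^{-})} \max\big(0,\; m + d(i, j^{+}) - d(i, j^{-})\big),
```

where j^{+} is an item known to be compatible with i (e.g. co-purchased or co-worn), j^{-} is a sampled negative, and m is a margin; the multi-space variants replace the single W with several projections whose distances are combined.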

4.3.2. Outfit compatibility modeling


Recently, a few works have been trying to directly measure the compatibility of a whole outfit. To take contextual information
(such as titles and categories) into consideration, Li, Cao, Zhu, and Luo (2017) proposed to represent an outfit as a sequence and
evaluate the outfit compatibility with a Recurrent Neural Network by extracting multi-modal information from items. Han, Wu,
Jiang, and Davis (2017) further represented an outfit as a bidirectional sequence with a specific order (i.e., top to bottom and then to
accessories). Nakamura and Goto (2018) proposed an architecture containing three subnetworks to learn outfit compatibility and
extract the outfit style simultaneously in an unsupervised manner.
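A minimal sketch of this sequence view (much simpler than the bidirectional model of Han, Wu, Jiang, and Davis (2017); sizes and the regression proxy are illustrative):

```python
# Treat an outfit as an ordered sequence of item embeddings and train an
# LSTM to predict each next item; coherent outfits then score higher.
import torch
import torch.nn as nn

emb_dim, hidden = 128, 256
lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
to_item = nn.Linear(hidden, emb_dim)      # map hidden states back to item space

outfit = torch.randn(1, 5, emb_dim)       # 5 items: top -> bottom -> accessories
states, _ = lstm(outfit)                  # states: (1, 5, hidden)
pred = to_item(states[:, :-1])            # predict item t+1 from state at t
target = outfit[:, 1:]
incompat = nn.MSELoss()(pred, target)     # lower = items follow more naturally
# Han et al. additionally run the sequence backwards and score predictions
# against a learned item vocabulary; this forward regression is just the core idea.
```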
Chen, Huang, et al. (2019) argued that it is unreasonable to regard an outfit as an ordered sequence and proposed to connect user
preferences regarding individual items and outfits with a Transformer architecture. Simo-Serra, Fidler, Moreno-Noguer, and
Urtasun (2015) implicitly learned the outfit compatibility by predicting how fashionable a person looks on a particular photograph.
Cui, Li, Wu, Zhang, and Wang (2019) constructed a Fashion Graph to model the outfit compatibility by representing an outfit as a
graph, where each node represents a category and each edge represents the interaction between two categories. In contrast to most
works that require labeled data to learn compatibility, Hsiao and Grauman (2018) proposed a topic model to generate compatible
outfits learned from unlabeled images of people wearing outfits.

4.4. Fashion image synthesis

Recent years have seen remarkable advances especially on generative models such as Generative Adversarial Networks (GANs)
(Goodfellow et al., 2014) and Variational Autoencoders (VAEs) (Kingma & Welling, 2014). Extensive studies have been conducted on
fashion image synthesis by using generative models, such as pose guided fashion/person image generation (Balakrishnan, Zhao,
Dalca, Durand, & Guttag, 2018; Dong et al., 2018; Esser, Sutter, & Ommer, 2018; Lassner, Pons-Moll, & Gehler, 2017; Ma, Jia, Sun,
et al., 2017; 2018; Pumarola, Agudo, Sanfeliu, & Moreno-Noguer, 2018; Siarohin, Sangineto, Lathuilière, & Sebe, 2018), text guided fashion image synthesis (Günel, Erdem, & Erdem, 2018; Zhou et al., 2019; Zhu, Fidler, Urtasun, Lin, & Loy, 2017), virtual try-on
applications (Chou, Lee, Zhang, Lee, & Hsu, 2018; Dong et al., 2019; Han, Wu, Wu, Yu, & Davis, 2018; Wang, Zheng, et al., 2018; Wu,
Lin, Tao, & Cai, 2018), fashion image transformation (Han, Wu, Huang, Scott, & Davis, 2019; Jetchev & Bergmann, 2017; Mo, Cho, &
Shin, 2018; Raj et al., 2018; Yoo, Kim, Park, Paek, & Kweon, 2016; Zhao et al., 2018) and fashion design (Cui, Liu, Gao, & Su, 2018;
Jiang & Fu, 2017; Sbai, Elhoseiny, Bordes, LeCun, & Couprie, 2018; Xian et al., 2018; Yildirim, Seward, & Bergmann, 2018). Fig. 3
displays several examples of studies on fashion image synthesis.

4.4.1. Pose guided generative model


Ma, Jia, Sun, et al. (2017) was the first work to synthesize fashion/person images in arbitrary poses from an image while keeping the clothing the same with a U-Net-like network. More recent methods proposed either novel architectures or losses to improve the
results (Balakrishnan et al., 2018; Dong et al., 2018; Esser et al., 2018; Hu et al., 2018; Lassner et al., 2017; Ma et al., 2018; Pumarola
et al., 2018; Siarohin et al., 2018).

4.4.2. Text guided generative model


Zhu et al. (2017) solved the task of generating new outfits with precise regions conforming to a language description while
retaining the wearer’s body structure. Günel et al. (2018) proposed an approach for language conditioned editing of fashion images,
which leverages feature-wise linear modulation to relate and transform visual features with natural language representations without
using extra spatial information. Zhou et al. (2019) presented a method to manipulate the visual appearance of a person’s image
according to natural language descriptions.

4.4.3. Virtual try-on model


Image-based virtual try-on applications without using 3D information are popular due to their practical value.
Han et al. (2018) first proposed the VITON model to transfer a clothing item in a product image to a person with a virtual try-on
network. Later on, Wang, Zheng, et al. (2018) proposed the CP-VTON model that mainly addresses the characteristic preserving issue


Fig. 3. Some illustrative examples of studies on fashion image synthesis: (a) An example of pose guided fashion image generation (adapted from
Ma, Jia, Sun, et al., 2017). (b) An example of text guided fashion image synthesis (adapted from Zhu et al., 2017). (c) An example of virtual try-on
applications (adapted from Han et al., 2018). (d) An example of fashion image transformation (adapted from Han et al., 2019). (e) An example of
fashion design synthesis (adapted from Cui et al., 2018).

when facing large spatial deformation challenge in the realistic virtual try-on tasks. Further, Yu, Wang, and Xie (2019) proposed the
VTNFP model by first generating warped clothing, followed by generating a body segmentation map of the person wearing the target
clothing and ending with a try-on synthesis module. To generate a new person image after fitting the desired clothes into the input
image and manipulate human poses, Dong et al. (2019) proposed Multi-pose Guided Virtual Try-on Network.

4.4.4. Image transformation model


Mo et al. (2018) proposed an instance-aware GAN to translate a source fashion item to a target fashion item in fashion
images. Han et al. (2019) presented a two-stage image-to-image generation framework for compatible and diverse fashion image
inpainting. Zhao et al. (2018) proposed a framework to modify the viewpoint or the pose of a person from an image while keeping the
clothing the same. Yoo et al. (2016) proposed a pixel-level domain transfer model for generating a standalone piece of clothing given
a person image. Zanfir, Popa, Zanfir, and Sminchisescu (2018) proposed an automatic person-to-person appearance transfer model
for photographic image synthesis. Similarly, Raj et al. (2018) presented a SwapNet for garment transfer with arbitrary body pose,
shape, and clothing.

4.4.5. Fashion design model


Sbai et al. (2018) introduced specific conditioning of GANs on texture and shape elements for generating fashion design images.
Jiang and Fu (2017) addressed the fashion style generation problem which generates a clothing image with a certain style given a
basic clothing image and a fashion style image. Xian et al. (2018) proposed an approach for controlling fashion image synthesis with
input sketch and texture patches by training the generative network with a local texture loss in addition to adversarial and content
loss. Further, Cui et al. (2018) proposed an end-to-end framework for displaying the design effect of garments, which only needs users
to input a desired fabric image and a specified sketch.

4.5. Fashion data mining

Extracting valuable knowledge from cross-media fashion data has become of great interest to industry and academia because of its promising opportunities for boosting the fashion industry. For instance, investigating fashion trends (Abe et al., 2017; Al-
Halah, Stiefelhagen, & Grauman, 2017; Chen, Chen, Cong, Hsu, & Luo, 2015; Gu et al., 2017; Matzen, Bala, & Snavely, 2017;
Vittayakorn, Yamaguchi, Berg, & Berg, 2015) every year is remarkable for the industry as well as sociology and psychology. On the
other hand, extensive hybrid analytics have also been conducted due to the wide variety of demands (Chang, Cheng, Wu, & Hua,
2017; Chen & Luo, 2017; Jia et al., 2016; Kwak, Murillo, Belhumeur, Kriegman, & Belongie, 2013; Murillo, Kwak, Bourdev,
Kriegman, & Belongie, 2012; Song, Wang, Hua, & Yan, 2011; Takagi, Simo-Serra, Iizuka, & Ishikawa, 2017; Vittayakorn, Berg, & Berg,
2017; Yamaguchi, Berg, & Ortiz, 2014; Zou et al., 2016).


4.5.1. Fashion trends analysis


Some works on fashion trends analysis focus on runway fashion, while others focus on street fashion. Chen, Chen, et al. (2015)
discovered fashion trends in New York City by utilizing semantic clothing attributes. Vittayakorn et al. (2015) analyzed how fashion
trends transfer from runway collections to the real-life dressing patterns. Matzen et al. studied street fashion trends by creating a
visual embedding of clothing style (Matzen et al., 2017). Gu et al. proposed an embedding learning network for street fashion analysis
(Gu et al., 2017). Chang et al. (2017) depicted the street fashion of a city by discovering fashion items that are most iconic for the city.
Al-Halah et al. proposed to predict the future popularity of styles discovered from fashion images in an unsupervised manner (Al-
Halah et al., 2017).

4.5.2. Hybrid analytics


Since identity is linked to how a person chooses to dress, clothing can be predictive of occupation (Song et al., 2011) or one’s
social “urban tribe” (Kwak et al., 2013; Murillo et al., 2012). Yamaguchi et al. (2014) presented a vision-based approach to quan-
titatively evaluate the influence of social factors and content factors on popularity in a large real-world fashion social network.
Chen and Luo (2017) designed a system that detects the possible popular and attractive clothing features by utilizing a large-scale
online shopping dataset. Jia et al. (2016) explored the relationship between visual features and aesthetic words of clothing.
Gu et al. (2019) proposed a multi-view embedding learning approach for multi-modal and multi-domain fashion data representation
and fashion analysis.

5. Fashion benchmark datasets

A variety of benchmark datasets have been introduced and contributed to a comprehensive understanding of fashion. Some
datasets are specifically tailored for a particular task such as clothing parsing, style prediction, fashion recommendation, fashion
compatibility and fashion trends analysis, while some are designed to evaluate multiple tasks of fashion understanding and analysis
simultaneously. Table 2 summarizes the comparison among the most representative fashion datasets.

5.1. Overview of fashion datasets

5.1.1. Benchmark datasets for a single task


Yamaguchi et al. (2012) built the Fashionista dataset with 685 fully parsed images for the human parsing task, which is the first benchmark for clothing parsing. In terms of style prediction, Kiapour et al. (2014) built the Hipster dataset containing five styles (e.g. hipster, bohemian, goth, preppy, and pinup). For fashion compatibility, Song et al. (2017) presented the FashionVC dataset that consists of 20,726 outfits with 14,871 tops and 13,663 bottoms. He and McAuley (2016a) collected the Amazon Fashion dataset that includes product images along with users’ review histories for fashion recommendation. Vittayakorn et al. (2015) collected the Runway2Realway dataset for analyzing visual trends in fashion, which contains 348,598 images of 9328 fashion shows covering 15 years. Recently, Rostamzadeh et al. (2018) proposed the Fashion-Gen dataset with 293,008 fashion images for the text-to-image and attributes-to-image synthesis tasks.

5.1.2. Benchmark datasets for multiple tasks


DeepFashion (Liu, Luo, et al., 2016) has been one of the most popular datasets for multiple tasks of fashion studies including
landmark detection, attribute prediction, clothing retrieval and fashion image synthesis. It contains over 800K images annotated with
categories, attributes, landmarks, and consumer-commercial image pairs. DeepFashion2 (Ge et al., 2019) is a versatile benchmark
covering multiple tasks in fashion understanding. It contains 801K clothing items and 873K consumer-commercial clothes pairs,
where each clothing item has rich annotations such as style, scale, viewpoint, occlusion, bounding box, dense landmarks, and masks.
Kiapour et al. (2015) collected WTBI dataset for matching a real-world example of a clothing item to the same item in an online shop
and attribute prediction. Zou et al. (2019) proposed the FashionAI dataset with 24 key points, 245 attribute labels that cover 6
categories of women’s clothing, and a total of 41 subcategories for landmark detection and attribute prediction. Huang et al. (2015)
collected DARN dataset composed of cross-scenario image pairs with fine-grained attributes for attribute prediction and clothing
retrieval, in which 450K online images and 90K offline counterparts are collected. Zheng, Yang, Kiapour, and Piramuthu (2018)
introduced ModaNet dataset for clothing parsing and attribute prediction, which has more than 55K fully-annotated images with
pixel-level segments, polygons and bounding boxes covering 13 categories.

5.2. Experimental results

In this part, we select some representative experimental results conducted on the DeepFashion dataset since it is the most widely
used fashion benchmark.

5.2.1. Clothing attribute prediction


The most popular evaluation metric used in clothing attribute prediction is the top-k accuracy, which is defined as the ratio of
queries with at least one matching item retrieved within the top-k returned results. Table 3 shows the performance comparisons of
leading clothing attribute prediction methods on the Deepfashion dataset. AttentiveNet (Wang, Xu, et al., 2018) achieves the best
score on clothing category classification and the best average score over all attributes, which demonstrates the effectiveness of introducing attention mechanisms in the proposed fashion grammar model.

Table 2
Comparison of the representative fashion datasets.

| Datasets | # images | # category | # attributes | # bboxes | # landmarks | # masks | Tasks |
|---|---|---|---|---|---|---|---|
| Hipster (Kiapour et al., 2014) | 1893 | 5 | ✗ | ✗ | ✗ | ✗ | Style Prediction |
| Fashionista (Yamaguchi et al., 2012) | 158K | 56 | ✗ | ✗ | ✗ | 685 | Clothing Parsing |
| Fashion144k (Simo-Serra et al., 2015) | 227K | ✗ | ✗ | ✗ | ✗ | ✗ | Style Prediction, Fashion Trends Analysis |
| ModaNet (Zheng et al., 2018) | 55K | 13 | ✗ | 55K | ✗ | 55K | Clothing Parsing, Attribute Prediction |
| Runway2Realway (Vittayakorn et al., 2015) | 348K | ✗ | ✗ | ✗ | ✗ | ✗ | Fashion Trends Analysis |
| FashionGen (Rostamzadeh et al., 2018) | 293K | 48 | ✗ | ✗ | ✗ | ✗ | Fashion Image Synthesis |
| Fashionpedia (Jia, Shi, Sirotenko, & Cui, 2019) | 50K | 46 | 92 | ✗ | ✗ | 50K | Clothing Parsing, Attribute Prediction |
| FashionAI (Zou et al., 2019) | 357K | 6 | 245 | ✗ | 324K | ✗ | Landmark Detection, Attribute Prediction |
| WTBI (Kiapour et al., 2015) | 425K | 11 | ✗ | 39K | ✗ | ✗ | Attribute Prediction, Clothing Retrieval |
| DARN (Huang et al., 2015) | 540K | 9 | 179 | 7K | ✗ | ✗ | Attribute Prediction, Clothing Retrieval |
| Amazon Fashion (He & McAuley, 2016a) | 431K | 6 | ✗ | ✗ | ✗ | ✗ | Fashion Recommendation |
| FashionVC (Song et al., 2017) | 20K | 2 | ✗ | ✗ | ✗ | ✗ | Fashion Compatibility |
| Deepfashion (Liu, Luo, et al., 2016) | 800K | 50 | 1000 | ✗ | 120K | ✗ | Landmark Detection, Attribute Prediction, Clothing Retrieval, Fashion Image Synthesis |
| Deepfashion2 (Ge et al., 2019) | 491K | 13 | ✗ | 801K | 801K | 801K | Landmark Detection, Clothing Parsing, Clothing Retrieval |

Table 3
Quantitative results for clothing attribute prediction on the Deepfashion dataset with top-k accuracy (top-3 / top-5). Higher values are better.
| Method | Category (top-3 / top-5) | Texture (top-3 / top-5) | Fabric (top-3 / top-5) | Shape (top-3 / top-5) | Part (top-3 / top-5) | Style (top-3 / top-5) | All (top-3 / top-5) |
|---|---|---|---|---|---|---|---|
| WTBI (Chen et al., 2012) | 43.73 / 66.26 | 24.21 / 32.65 | 25.38 / 36.06 | 23.39 / 31.26 | 26.31 / 33.24 | 49.85 / 58.68 | 27.46 / 35.37 |
| DARN (Huang et al., 2015) | 59.48 / 79.58 | 36.15 / 48.15 | 36.64 / 48.52 | 35.89 / 46.93 | 39.17 / 50.14 | 66.11 / 71.36 | 42.35 / 51.95 |
| FashionNet (Liu, Luo, et al., 2016) | 82.58 / 90.17 | 37.46 / 49.52 | 39.30 / 49.84 | 39.47 / 48.59 | 44.13 / 54.02 | 66.43 / 73.16 | 45.52 / 54.61 |
| Corbiere (Corbière et al., 2017) | 86.30 / 92.80 | 53.60 / 63.20 | 39.10 / 48.80 | 50.10 / 59.50 | 38.80 / 48.90 | 30.50 / 38.30 | 23.10 / 30.40 |
| AttentiveNet (Wang, Xu, et al., 2018) | 90.99 / 95.78 | 50.31 / 65.48 | 40.31 / 48.23 | 53.32 / 61.05 | 40.65 / 56.32 | 68.70 / 74.25 | 51.53 / 60.95 |

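In code, the top-k accuracy above can be sketched as follows (illustrative random data, not the benchmark's): for attribute prediction, a sample counts as correct if at least one of its ground-truth attributes appears among the k highest-scoring predictions.

```python
# Top-k accuracy sketch: a sample is a hit if any ground-truth attribute
# is among its k highest-scoring predictions.
import numpy as np

def topk_attribute_accuracy(scores, labels, k=3):
    # scores: (N, A) predicted attribute scores; labels: (N, A) binary ground truth
    topk = np.argsort(-scores, axis=1)[:, :k]            # k best attributes per sample
    hits = np.take_along_axis(labels, topk, axis=1).any(axis=1)
    return hits.mean()

rng = np.random.default_rng(0)
scores = rng.normal(size=(100, 1000))
labels = (rng.random((100, 1000)) < 0.01).astype(int)    # sparse attribute labels
print(topk_attribute_accuracy(scores, labels, k=3))
```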

5.2.2. Landmark detection


The most popular evaluation metric used in fashion landmark detection is the normalized error (NE), which is defined as the ℓ2 distance between the detected and the ground truth landmarks in the normalized coordinate space. Table 4 shows the performance
comparisons of leading landmark detection methods on the Deepfashion dataset. The total NE score of the Global-Local method
(Lee et al., 2019) achieves state-of-the-art at 0.0312 and it is noteworthy that the Global-Local method consistently improves the
accuracy in all landmarks.
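In symbols (a sketch of the common definition; papers differ slightly in the normalization), for an image with K annotated landmarks,

```latex
\mathrm{NE} = \frac{1}{K} \sum_{k=1}^{K} \big\lVert \hat{\ell}_k - \ell_k \big\rVert_2,
```

where \hat{\ell}_k and \ell_k are the predicted and ground-truth positions of landmark k, with coordinates divided by the image width and height so that errors are comparable across image sizes.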

5.2.3. Clothing retrieval


Similar to clothing attribute prediction, top-k retrieval accuracy is one of the most widely used evaluation metrics in clothing retrieval. Figs. 4(b) and 5(b) show the performance comparisons of top-k retrieval accuracy on the in-shop retrieval task and the consumer-to-shop retrieval task with k ranging from 1 to 50, respectively. As we can see, FashionNet (Liu, Luo, et al., 2016) achieves the best performance among all the compared methods in the two retrieval tasks, while WTBI (Kiapour et al., 2015) has the lowest accuracy. On the other hand, compared with in-shop retrieval, consumer-to-shop retrieval models achieve much lower accuracies, which indicates the inherent difficulty of consumer-to-shop clothes retrieval. Figs. 4(a) and 5(a) display some sample queries along with their top matches for the in-shop retrieval task and the consumer-to-shop retrieval task, respectively.

5.2.4. Virtual try-on


The evaluation for virtual try-on typically includes quantitative analysis and qualitative analysis. Quantitative evaluation is generally based on objective metrics (e.g. inception score, structural similarity) or user studies. Qualitative analysis generally compares which approach produces more realistic-looking virtual try-on images through visualization. Fig. 6 shows a visual comparison of three state-of-the-art virtual try-on methods. As we can see, compared with VITON (Han et al., 2018) and CP-VTON (Wang, Zheng, et al., 2018), VTNFP (Yu et al., 2019) generates more realistic try-on results and better preserves both the clothing texture and person body features, especially when a person’s posture is complex, e.g. when the arms are crossing.
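As one concrete piece of the quantitative side, structural similarity can be computed with scikit-image as in the sketch below (the images are random stand-ins; older scikit-image versions take multichannel=True instead of channel_axis):

```python
import numpy as np
from skimage.metrics import structural_similarity

ref = np.random.rand(256, 192, 3)   # stand-in for the reference person image
gen = np.random.rand(256, 192, 3)   # stand-in for the generated try-on result
score = structural_similarity(ref, gen, channel_axis=2, data_range=1.0)
print(f"SSIM: {score:.3f}")         # closer to 1.0 means more similar
```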

6. Discussion and future directions

Currently, some promising results have been achieved in fashion studies including fashion recognition, fashion understanding and
fashion applications. Several new techniques have been seamlessly embedded in the products for customers. For instance, fashion
recognition and fashion recommendation algorithms have been applied to the world’s largest eCommerce website, Taobao.
However, from the perspective of the fashion industry, the biggest problem is the huge gap between design, manufacturing and
marketing, which results in a huge waste of resources. Although many researchers have participated in equipping the fashion industry
with technologies, most of them only focus on a specific task but neglect to build connections among design, manufacturing and
marketing. New AI technologies that link the various value chains of the fashion industry are promising directions. Besides, there are
still many challenges for improving the aforementioned fashion studies, such as the lack of a unified large-scale hybrid fashion dataset, learning good feature representations of fashion data, the difficulty of fashion assessment, improving users’ shopping experiences
and developing AI assistants for fashion designers. In the following, we mainly discuss these research challenges.

Table 4
Quantitative results for landmark detection on the Deepfashion dataset with normalized error (NE). Smaller values of NE indicate better results.
| Method | L.Collar | R.Collar | L.Sleeve | R.Sleeve | L.Waistline | R.Waistline | L.Hem | R.Hem | Avg. |
|---|---|---|---|---|---|---|---|---|---|
| FashionNet (Liu, Luo, et al., 2016) | 0.0854 | 0.0902 | 0.0973 | 0.0935 | 0.0854 | 0.0845 | 0.0812 | 0.0823 | 0.0872 |
| DFA (Liu, Yan, et al., 2016) | 0.0628 | 0.0637 | 0.0658 | 0.0621 | 0.0726 | 0.0702 | 0.0658 | 0.0663 | 0.0660 |
| DLAN (Yan et al., 2017) | 0.0570 | 0.0611 | 0.0672 | 0.0647 | 0.0703 | 0.0694 | 0.0624 | 0.0627 | 0.0643 |
| AttentiveNet (Wang, Xu, et al., 2018) | 0.0415 | 0.0404 | 0.0496 | 0.0449 | 0.0502 | 0.0523 | 0.0537 | 0.0551 | 0.0484 |
| Global-Local (Lee et al., 2019) | 0.0312 | 0.0324 | 0.0427 | 0.0434 | 0.0361 | 0.0373 | 0.0442 | 0.0475 | 0.0393 |


Fig. 4. Results of in-shop clothing retrieval (adopted from Liu, Luo, et al., 2016). (a) Example queries along with top-5 retrieved images. (b)
Retrieval accuracies of different methods.

Fig. 5. Results of consumer-to-shop clothing retrieval (adopted from Liu, Luo, et al., 2016). (a) Example queries along with top-5 retrieved images.
(b) Retrieval accuracies of different methods.

1. Collection of a unified large-scale hybrid fashion dataset. Currently, available fashion datasets are either too small, or from a single
data source, or tailored for a specific task, or spanning a short period of time. There is a lack of good benchmark dataset for
training, testing, evaluating and comparing the performance of different algorithms for fashion analysis. Therefore, it would be
tremendously helpful for researchers if there existed a unified large-scale fashion dataset that contains multi-modal data and spans a long period of time.
2. Good representation learning approaches of fashion data. Studying a good representation of fashion data is an indispensable step for
fashion analysis and understanding, such as fashion trend analysis, fashion information retrieval and fashion recommendation.
Fashion data is a special kind of cross-media data with distinctive characteristics: it is multi-modal, multi-domain and weakly labeled.
Conventional representation learning approaches cannot be directly applied to fashion data since these models do not consider the
distinctive characteristics of fashion data. Hence, how to effectively integrate complementary features from multiple channels is a
great challenge for studying a good representation of cross-media fashion data.
3. Difficulty of fashion assessment. Not all fashion analysis tasks can be evaluated with objective metrics. For instance, predicting how
fashionable a person looks on a particular photograph involves many aesthetic factors such as the garments the subject is wearing,
how visually appealing the person is, and how appealing the scene behind the person is. Such similar tasks mostly rely on the
results of user studies, which are obtained from a small group of people. However, user studies can be easily influenced by the
users’ personal preferences or the environment. Thus, it is vital to build a novel and objective assessment metric for fashion
analysis.
4. Improving user’s shopping experiences. With e-commerce becoming a central way that people shop, new techniques such as virtual
try-on systems and digital wardrobe assistants that help you decide what to wear have been popular in both industry and aca-
demia. For virtual try-on systems, the main challenges are caused by the difficulties of rendering clothing and 3-D human body
modeling for arbitrary people. Thus, there are two key factors for developing new virtual try-on methods: modeling clothing geometry and capturing body shape details. For digital wardrobe assistants, the main challenges are caused by
the subjectivity of fashion compatibility and dynamic changes in user preferences. The key to this problem lies in modeling
fashion compatibility by using online learning algorithms, as online fashion data is changing rapidly over time.
5. Developing AI assistants for fashion designers. Generating new realistic fashion designs automatically through image generation
would be significant in the design process. For instance, synthesizing fashion designs conditioned on user-specified multiple


Fig. 6. A visual comparison of three virtual try-on methods (adapted from Yu et al., 2019).

fashion attributes such as color, shape and texture would greatly reduce the cost of producing clothes. Some progress has been
made with the success of generative models, however, current synthesis results are too coarse to serve fashion designers. Besides,
the diverse attributes of fashion images in color, shape, pattern and style make it challenging to generate realistic fashion images.
Therefore, research works on how to handle such multimodal conditions as well as generate high-resolution realistic fashion
designs should be inspired.
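To make the multi-channel fusion challenge of point 2 concrete, the Python sketch below shows one simple way a joint embedding could be learned. It is an illustrative assumption rather than a method from any of the surveyed papers: the module names, feature dimensions and the triplet margin are all hypothetical choices.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiModalFashionEncoder(nn.Module):
    """Fuses a visual feature and a textual feature into one joint embedding."""
    def __init__(self, img_dim=2048, txt_dim=300, joint_dim=256):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, joint_dim)    # e.g. CNN pooling features
        self.txt_proj = nn.Linear(txt_dim, joint_dim)    # e.g. averaged word vectors
        self.fuse = nn.Linear(2 * joint_dim, joint_dim)  # simple late fusion

    def forward(self, img_feat, txt_feat):
        v = F.relu(self.img_proj(img_feat))
        t = F.relu(self.txt_proj(txt_feat))
        joint = self.fuse(torch.cat([v, t], dim=-1))
        return F.normalize(joint, dim=-1)                # unit-length embedding

# A triplet loss pulls an item towards a matching item from another domain
# (e.g. a shop photo of the same garment) and away from a mismatched one,
# which sidesteps the need for strong labels.
encoder = MultiModalFashionEncoder()
anchor = encoder(torch.randn(8, 2048), torch.randn(8, 300))
positive = encoder(torch.randn(8, 2048), torch.randn(8, 300))
negative = encoder(torch.randn(8, 2048), torch.randn(8, 300))
loss = F.triplet_margin_loss(anchor, positive, negative, margin=0.2)
loss.backward()

The same skeleton extends naturally to further channels (user comments, item metadata) or to domain-specific branches for street versus shop photos.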
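For the compatibility challenge of point 4, the following sketch illustrates the kind of incremental update an online learning scheme could apply as user feedback streams in. The bilinear score, the logistic loss and the learning rate are all hypothetical choices made for illustration, not the design of any particular surveyed system.

import numpy as np

rng = np.random.default_rng(0)
d = 64                  # dimensionality of the item embeddings
W = np.zeros((d, d))    # bilinear compatibility matrix, learned online
lr = 0.05               # learning rate for each streaming update

def score(top, bottom):
    """Compatibility score of a top/bottom pair under the current model."""
    return float(top @ W @ bottom)

def online_update(top, bottom, label):
    """One SGD step on the logistic loss for a single feedback event
    (label 1 = user judged the pair compatible, 0 = incompatible)."""
    global W
    p = 1.0 / (1.0 + np.exp(-score(top, bottom)))
    W += lr * (label - p) * np.outer(top, bottom)

# Simulated stream of feedback events; in practice these would arrive
# one by one from user interactions, so the model tracks drifting
# preferences without ever being retrained from scratch.
for _ in range(1000):
    top, bottom = rng.normal(size=d), rng.normal(size=d)
    label = 1 if top @ bottom > 0 else 0   # toy stand-in for real feedback
    online_update(top, bottom, label)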
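Finally, for the design-assistant challenge of point 5, the sketch below shows the basic shape of attribute-conditioned synthesis in the spirit of conditional variants of GANs (Goodfellow et al., 2014). The layer sizes, the 32 x 32 output resolution and the multi-hot attribute encoding are illustrative assumptions; a practical system would also need a discriminator, a training loop and far higher resolution.

import torch
import torch.nn as nn

class AttributeConditionedGenerator(nn.Module):
    """Maps a noise vector plus a multi-hot attribute vector
    (color, shape, texture, ...) to a small RGB image."""
    def __init__(self, z_dim=100, attr_dim=20, img_channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim + attr_dim, 256, 4, 1, 0), nn.ReLU(),
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(64, img_channels, 4, 2, 1), nn.Tanh(),
        )

    def forward(self, z, attrs):
        # Condition the generator by concatenating noise and attributes,
        # then reshape to a 1x1 spatial map for the deconvolution stack.
        x = torch.cat([z, attrs], dim=1).unsqueeze(-1).unsqueeze(-1)
        return self.net(x)    # (batch, 3, 32, 32) images in [-1, 1]

g = AttributeConditionedGenerator()
z = torch.randn(4, 100)
attrs = torch.zeros(4, 20)
attrs[:, [2, 7]] = 1.0        # hypothetical attributes, e.g. "red" + "striped"
fake = g(z, attrs)            # four generated 32x32 designs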

CRediT authorship contribution statement

Xiaoling Gu: Conceptualization, Methodology, Writing - original draft. Fei Gao: Writing - review & editing. Min Tan: Software,
Visualization. Pai Peng: Supervision.

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grants 61802100, 61971172, 61972119,
61702145 and 61971339. This work was also supported by the China Postdoctoral Science Foundation under Grant 2019M653563.

References

Abe, K., Suzuki, T., Ueta, S., Nakamura, A., Satoh, Y., & Kataoka, H. (2017). Changing fashion cultures. CoRR. arXiv:1703.07920.
Ak, K. E., Kassim, A. A., Lim, J.-H., & Tham, J. Y. (2018). Learning attribute representations with localization for flexible fashion search. CVPR, 7708–7717.
Al-Halah, Z., Stiefelhagen, R., & Grauman, K. (2017). Fashion forward: Forecasting visual style in fashion. ICCV, 388–397.
Balakrishnan, G., Zhao, A., Dalca, A. V., Durand, F., & Guttag, J. V. (2018). Synthesizing images of humans in unseen poses. CVPR, 8340–8348.
Bossard, L., Dantone, M., Leistner, C., Wengert, C., Quack, T., & Gool, L. V. (2012). Apparel classification with style. ACCV.
Bracher, C., Heinz, S., & Vollgraf, R. (2016). Fashion DNA: Merging content and sales data for recommendation and article mapping. CoRR. arXiv:1609.02489.
Chang, Y.-T., Cheng, W.-H., Wu, B., & Hua, K.-L. (2017). Fashion world map: Understanding cities through streetwear fashion. ACM Multimedia, 91–99.
Chen, H., Gallagher, A. C., & Girod, B. (2012). Describing clothing by semantic attributes. ECCV (3), 609–623.
Chen, K., Chen, K., Cong, P., Hsu, W. H., & Luo, J. (2015). Who are the devils wearing Prada in New York City? ACM Multimedia, 177–180.
Chen, K.-T., & Luo, J. (2017). When fashion meets big data: Discriminative mining of best selling clothing features. WWW (Companion Volume), 15–22.
Chen, L., & He, Y. (2018). Dress fashionably: Learn fashion collocation with deep mixed-category metric learning. AAAI, 2103–2110.
Chen, Q., Huang, J., Feris, R. S., Brown, L. M., Dong, J., & Yan, S. (2015). Deep domain adaptation for describing people based on fine-grained clothing attributes. CVPR, 5315–5324.
Chen, W., Huang, P., Xu, J., Guo, X., Guo, C., Sun, F., ... Zhao, B. (2019). POG: Personalized outfit generation for fashion recommendation at Alibaba iFashion. CoRR. arXiv:1905.01866.
Chen, X., Chen, H., Xu, H., Zhang, Y., Cao, Y., Qin, Z., & Zha, H. (2019). Personalized fashion recommendation with visual explanations based on multimodal attention network: Towards visually explainable recommendation. SIGIR, 765–774.
Chou, C.-T., Lee, C.-H., Zhang, K., Lee, H.-C., & Hsu, W. H. (2018). PIVTONS: Pose invariant virtual try-on shoe with conditional image completion. ACCV (6), 654–668.
Corbière, C., Ben Younes, H., Ramé, A., & Ollion, C. (2017). Leveraging weakly annotated data for fashion image retrieval and label prediction. ICCV Workshops, 2268–2274.
Cui, Y. R., Liu, Q., Gao, C. Y., & Su, Z. (2018). FashionGAN: Display your fashion design using conditional generative adversarial nets. Computer Graphics Forum, 37(7), 109–119.
Cui, Z., Li, Z., Wu, S., Zhang, X., & Wang, L. (2019). Dressing as a whole: Outfit compatibility learning based on node-wise graph neural networks. WWW, 307–317.
Di, W., Wah, C., Bhardwaj, A., Piramuthu, R., & Sundaresan, N. (2013). Style finder: Fine-grained clothing style detection and retrieval. CVPR Workshops, 8–13.
Dong, H., Liang, X., Gong, K., Lai, H., Zhu, J., & Yin, J. (2018). Soft-gated warping-GAN for pose-guided person image synthesis. NeurIPS, 472–482.
Dong, H., Liang, X., Wang, B., Lai, H., Zhu, J., & Yin, J. (2019). Towards multi-pose guided virtual try-on network. CoRR. arXiv:1902.11026.
Dong, J., Chen, Q., Shen, X., Yang, J., & Yan, S. (2014). Towards unified human parsing and pose estimation. CVPR, 843–850.
Dong, J., Chen, Q., Xia, W., Huang, Z., & Yan, S. (2013). A deformable mixture parsing model with parselets. ICCV, 3408–3415.
Dong, Q., Gong, S., & Zhu, X. (2017). Multi-task curriculum transfer deep learning of clothing attributes. WACV, 520–529.
Esser, P., Sutter, E., & Ommer, B. (2018). A variational U-Net for conditional appearance and shape generation. CVPR, 8857–8866.
Farabet, C., Couprie, C., Najman, L., & LeCun, Y. (2013). Learning hierarchical features for scene labeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1915–1929.
Feng, Z., Yu, Z., Yang, Y., Jing, Y., Jiang, J., & Song, M. (2018). Interpretable partitioned embedding for customized multi-item fashion outfit composition. ICMR, 143–151.
Ge, Y., Zhang, R., Wu, L., Wang, X., Tang, X., & Luo, P. (2019). DeepFashion2: A versatile benchmark for detection, pose estimation, segmentation and re-identification of clothing images. CoRR. arXiv:1901.07973.
Gong, K., Liang, X., Zhang, D., Shen, X., & Lin, L. (2017). Look into person: Self-supervised structure-sensitive learning and a new benchmark for human parsing. CVPR, 6757–6765.
Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., ... Bengio, Y. (2014). Generative adversarial nets. NIPS, 2672–2680.
Gu, X., Wong, Y., Peng, P., Shou, L., Chen, G., & Kankanhalli, M. S. (2017). Understanding fashion trends from street photos via neighbor-constrained embedding learning. ACM Multimedia, 190–198.
Gu, X., Wong, Y., Shou, L., Peng, P., Chen, G., & Kankanhalli, M. S. (2019). Multi-modal and multi-domain embedding learning for fashion retrieval and analysis. IEEE Transactions on Multimedia, 21(6), 1524–1537.
Guo, X., Wu, H., Cheng, Y., Rennie, S., Tesauro, G., & Feris, R. S. (2018). Dialog-based interactive image retrieval. NeurIPS, 676–686.
Günel, M., Erdem, E., & Erdem, A. (2018). Language guided fashion image manipulation with feature-wise transformations. CoRR. arXiv:1808.04000.
Han, X., Wu, Z., Huang, P. X., Zhang, X., Zhu, M., Li, Y., ... Davis, L. S. (2017). Automatic spatially-aware fashion concept discovery. ICCV, 1472–1480.
Han, X., Wu, Z., Huang, W., Scott, M. R., & Davis, L. S. (2019). Compatible and diverse fashion image inpainting. CoRR. arXiv:1902.01096.
Han, X., Wu, Z., Jiang, Y.-G., & Davis, L. S. (2017). Learning fashion compatibility with bidirectional LSTMs. ACM Multimedia, 1078–1086.
Han, X., Wu, Z., Wu, Z., Yu, R., & Davis, L. S. (2018). VITON: An image-based virtual try-on network. CVPR, 7543–7552.
He, R., & McAuley, J. J. (2016a). Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering. WWW, 507–517.
He, R., & McAuley, J. J. (2016b). VBPR: Visual Bayesian personalized ranking from implicit feedback. AAAI, 144–150.
He, R., Packer, C., & McAuley, J. J. (2016). Learning compatibility across categories for heterogeneous item recommendation. ICDM, 937–942.
Hidayati, S. C., Hsu, C.-C., Chang, Y.-T., Hua, K.-L., Fu, J., & Cheng, W.-H. (2018). What dress fits me best?: Fashion recommendation on the clothing style for personal body shape. ACM Multimedia, 438–446.
Hou, M., Wu, L., Chen, E., Li, Z., Zheng, V. W., & Liu, Q. (2019). Explainable fashion recommendation: A semantic attribute region guided approach. CoRR. arXiv:1905.12862.
Hsiao, W.-L., & Grauman, K. (2017). Learning the latent "look": Unsupervised discovery of a style-coherent embedding from fashion images. ICCV, 4213–4222.
Hsiao, W.-L., & Grauman, K. (2018). Creating capsule wardrobes from fashion images. CVPR, 7161–7170.
Hsiao, W.-L., Katsman, I., Wu, C.-Y., Parikh, D., & Grauman, K. (2019). Fashion++: Minimal edits for outfit improvement. CoRR. arXiv:1904.09261.
Hu, Y., Yi, X., & Davis, L. S. (2015). Collaborative fashion recommendation: A functional tensor factorization approach. ACM Multimedia, 129–138.
Hu, Z., Yang, Z., Salakhutdinov, R. R., Qin, L., Liang, X., Dong, H., & Xing, E. P. (2018). Deep generative models with learnable knowledge constraints. NeurIPS, 10522–10533.
Huang, J., Feris, R. S., Chen, Q., & Yan, S. (2015). Cross-domain image retrieval with a dual attribute-aware ranking network. ICCV, 1062–1070.
Huynh, C. P., Ciptadi, A., Tyagi, A., & Agrawal, A. (2018). CRAFT: Complementary recommendations using adversarial feature transformer. CoRR. arXiv:1804.10871.
Iwata, T., Watanabe, S., & Sawada, H. (2011). Fashion coordinates recommender system using photographs from fashion magazines. IJCAI, 2262–2267.
Jagadeesh, V., Piramuthu, R., Bhardwaj, A., Di, W., & Sundaresan, N. (2014). Large scale visual recommendations from street fashion images. KDD, 1925–1934.
Jetchev, N., & Bergmann, U. (2017). The conditional analogy GAN: Swapping fashion articles on people images. ICCV Workshops, 2287–2292.
Jia, J., Huang, J., Shen, G., He, T., Liu, Z., Luan, H.-B., & Yan, C. (2016). Learning to appreciate the aesthetic effects of clothing. AAAI, 1216–1222.
Jia, M., Shi, M., Sirotenko, M., & Cui, Y. (2019). The Fashionpedia ontology and fashion segmentation dataset. https://fashionpedia.github.io/home/Fashionpedia_overview.html.
Jia, M., Zhou, Y., Shi, M., & Hariharan, B. (2018). A deep-learning-based fashion attributes detection model. CoRR. arXiv:1810.10148.
Jiang, S., & Fu, Y. (2017). Fashion style generator. IJCAI, 3721–3727.
Jiang, S., Shao, M., Jia, C., & Fu, Y. (2016). Consensus style centralizing auto-encoder for weak style classification. AAAI, 1223–1229.
Jiang, S., Wu, Y., & Fu, Y. (2016). Deep bi-directional cross-triplet embedding for cross-domain clothing retrieval. ACM Multimedia, 52–56.
Kalantidis, Y., Kennedy, L., & Li, L.-J. (2013). Getting the look: Clothing recognition and segmentation for automatic product suggestions in everyday photos. ICMR.
Kang, W.-C., Fang, C., Wang, Z., & McAuley, J. J. (2017). Visually-aware fashion recommendation and design with generative image models. ICDM, 207–216.
Kang, W.-C., Kim, E., Leskovec, J., Rosenberg, C., & McAuley, J. J. (2018). Complete the look: Scene-based complementary product recommendation. CoRR. arXiv:1812.01748.
Kiapour, M. H., Han, X., Lazebnik, S., Berg, A. C., & Berg, T. L. (2015). Where to buy it: Matching street clothing photos in online shops. ICCV, 3343–3351.
Kiapour, M. H., Yamaguchi, K., Berg, A. C., & Berg, T. L. (2014). Hipster wars: Discovering elements of fashion styles. ECCV (1), 472–488.
Kingma, D. P., & Welling, M. (2014). Auto-encoding variational Bayes. ICLR.
Kovashka, A., Parikh, D., & Grauman, K. (2012). WhittleSearch: Image search with relative attribute feedback. CVPR, 2973–2980.
Krähenbühl, P., & Koltun, V. (2011). Efficient inference in fully connected CRFs with Gaussian edge potentials. NIPS, 109–117.
Kumar, S., & Gupta, M. D. (2019). C+GAN: Complementary fashion item recommendation. CoRR. arXiv:1906.05596.
Kwak, I. S., Murillo, A. C., Belhumeur, P. N., Kriegman, D. J., & Belongie, S. J. (2013). From bikers to surfers: Visual recognition of urban tribes. BMVC.
Lassner, C., Pons-Moll, G., & Gehler, P. V. (2017). A generative model of people in clothing. ICCV, 853–862.
Lee, H., Seol, J., & Lee, S.-g. (2017). Style2Vec: Representation learning for fashion items from style sets. CoRR. arXiv:1708.04014.
Lee, S., Oh, S., Jung, C., & Kim, C. (2019). A global-local embedding module for fashion landmark detection. CoRR. arXiv:1908.10548.
Li, Y., Cao, L., Zhu, J., & Luo, J. (2017). Mining fashion outfit composition using an end-to-end deep learning approach on set data. IEEE Transactions on Multimedia, 19(8), 1946–1955.
Liang, X., Liu, S., Shen, X., Yang, J., Liu, L., Dong, J., ... Yan, S. (2015). Deep human parsing with active template regression. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(12), 2402–2414.
Liang, X., Xu, C., Shen, X., Yang, J., Liu, S., Tang, J., ... Yan, S. (2015). Human parsing with contextualized convolutional neural network. ICCV, 1386–1394.
Lin, Y., Ren, P., Chen, Z., Ren, Z., Ma, J., & de Rijke, M. (2018). Explainable fashion recommendation with joint outfit matching and comment generation. CoRR. arXiv:1806.08977.
Lin, Y., Ren, P., Chen, Z., Ren, Z., Ma, J., & de Rijke, M. (2019). Improving outfit recommendation with co-supervision of fashion generation. WWW, 1095–1105.
Liu, A., Su, Y., Nie, W., & Kankanhalli, M. S. (2017). Hierarchical clustering multi-task learning for joint human action grouping and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(1), 102–114.
Liu, S., Feng, J., Domokos, C., Xu, H., Huang, J., Hu, Z., & Yan, S. (2014). Fashion parsing with weak color-category labels. IEEE Transactions on Multimedia, 16, 253–265.
Liu, S., Feng, J., Song, Z., Zhang, T., Lu, H., Xu, C., & Yan, S. (2012). Hi, magic closet, tell me what to wear!. ACM Multimedia, 619–628.
Liu, S., Liang, X., Liu, L., Lu, K., Lin, L., & Yan, S. (2014). Fashion parsing with video context. ACM Multimedia, 467–476.
Liu, S., Liang, X., Liu, L., Shen, X., Yang, J., Xu, C., ... Yan, S. (2015). Matching-CNN meets KNN: Quasi-parametric human parsing. CVPR, 1419–1427.
Liu, S., Song, Z., Wang, M., Xu, C., Lu, H., & Yan, S. (2012). Street-to-shop: Cross-scenario clothing retrieval via parts alignment and auxiliary set. ACM Multimedia, 1335–1336.
Liu, Z., Huang, H., He, Q., Chiew, K., & Gao, Y. (2015). Rare category exploration on linear time complexity. DASFAA (2), 37–54.
Liu, Z., Luo, P., Qiu, S., Wang, X., & Tang, X. (2016). DeepFashion: Powering robust clothes recognition and retrieval with rich annotations. CVPR, 1096–1104.
Liu, Z., Yan, S., Luo, P., Wang, X., & Tang, X. (2016). Fashion landmark detection in the wild. ECCV (2), 229–245.
Luc, P., Couprie, C., Chintala, S., & Verbeek, J. (2016). Semantic segmentation using adversarial networks. CoRR. arXiv:1611.08408.
Luo, Y., Zheng, Z., Zheng, L., Guan, T., Yu, J., & Yang, Y. (2018). Macro-micro adversarial network for human parsing. ECCV (9), 424–440.
Oramas M., J., & Tuytelaars, T. (2016). Modeling visual compatibility through hierarchical mid-level elements. CoRR. arXiv:1604.00036.
Ma, L., Jia, X., Sun, Q., Schiele, B., Tuytelaars, T., & Gool, L. V. (2017). Pose guided person image generation. NIPS, 405–415.
Ma, L., Sun, Q., Georgoulis, S., Gool, L. V., Schiele, B., & Fritz, M. (2018). Disentangled person image generation. CVPR, 99–108.
Ma, Y., Jia, J., Zhou, S., Fu, J., Liu, Y., & Tong, Z. (2017). Towards better understanding the clothing fashion styles: A multimodal deep learning approach. AAAI, 38–44.
Matzen, K., Bala, K., & Snavely, N. (2017). StreetStyle: Exploring world-wide clothing styles from millions of photos. CoRR. arXiv:1706.01869.
McAuley, J. J., Targett, C., Shi, Q., & van den Hengel, A. (2015). Image-based recommendations on styles and substitutes. SIGIR, 43–52.
Mizuochi, M., Kanezaki, A., & Harada, T. (2014). Clothing retrieval based on local similarity with multiple images. ACM Multimedia, 1165–1168.
Mo, S., Cho, M., & Shin, J. (2018). InstaGAN: Instance-aware image-to-image translation. CoRR. arXiv:1812.10889.
Murillo, A. C., Kwak, I. S., Bourdev, L. D., Kriegman, D. J., & Belongie, S. J. (2012). Urban tribes: Analyzing group photos from a social perspective. CVPR Workshops, 28–35.
Nakamura, T., & Goto, R. (2018). Outfit generation and style extraction via bidirectional LSTM and autoencoder. CoRR. arXiv:1807.03133.
Papandreou, G., Chen, L.-C., Murphy, K., & Yuille, A. L. (2015). Weakly- and semi-supervised learning of a DCNN for semantic image segmentation. CoRR. arXiv:1502.02734.
Park, S., Nie, B. X., & Zhu, S.-C. (2018). Attribute and-or grammar for joint parsing of human pose, parts and attributes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(7), 1555–1569.
Pumarola, A., Agudo, A., Sanfeliu, A., & Moreno-Noguer, F. (2018). Unsupervised person image synthesis in arbitrary poses. CVPR, 8620–8628.
Raj, A., Sangkloy, P., Chang, H., Hays, J., Ceylan, D., & Lu, J. (2018). SwapNet: Image based garment transfer. ECCV, 679–695.
Rostamzadeh, N., Hosseini, S., Boquet, T., Stokowiec, W., Zhang, Y., Jauvin, C., & Pal, C. (2018). Fashion-Gen: The generative fashion dataset and challenge. CoRR. arXiv:1806.08317.
Sbai, O., Elhoseiny, M., Bordes, A., LeCun, Y., & Couprie, C. (2018). DesIGN: Design inspiration from generative networks. ECCV Workshops, 37–44.
Shih, Y.-S., Chang, K.-Y., Lin, H.-T., & Sun, M. (2018). Compatibility family learning for item recommendation and generation. AAAI, 2403–2410.
Siarohin, A., Sangineto, E., Lathuilière, S., & Sebe, N. (2018). Deformable GANs for pose-based human image generation. CVPR, 3408–3416.
Simo-Serra, E., Fidler, S., Moreno-Noguer, F., & Urtasun, R. (2015). Neuroaesthetics in fashion: Modeling the perception of fashionability. CVPR, 869–877.
Simo-Serra, E., & Ishikawa, H. (2016). Fashion style in 128 floats: Joint ranking and classification using weak data for feature extraction. CVPR, 298–307.
Singh, K. K., & Lee, Y. J. (2016). End-to-end localization and ranking for relative attributes. ECCV (6), 753–769.
Song, S., & Mei, T. (2018). When multimedia meets fashion. IEEE Multimedia, 25(3), 102–108.
Song, X., Feng, F., Han, X., Yang, X., Liu, W., & Nie, L. (2018). Neural compatibility modeling with attentive knowledge distillation. SIGIR, 5–14.
Song, X., Feng, F., Liu, J., Li, Z., Nie, L., & Ma, J. (2017). NeuroStylist: Neural compatibility modeling for clothing matching. ACM Multimedia, 753–761.
Song, Z., Wang, M., Hua, X.-S., & Yan, S. (2011). Predicting occupation via human clothing and contexts. ICCV, 1084–1091.
Takagi, M., Simo-Serra, E., Iizuka, S., & Ishikawa, H. (2017). What makes a style: Experimental analysis of fashion prediction. ICCV Workshops, 2247–2253.
Vaccaro, K., Shivakumar, S., Ding, Z., Karahalios, K., & Kumar, R. (2016). The elements of fashion style. UIST, 777–785.
Vasileva, M. I., Plummer, B. A., Dusad, K., Rajpal, S., Kumar, R., & Forsyth, D. A. (2018). Learning type-aware embeddings for fashion compatibility. ECCV (16), 405–421.
Veit, A., Kovacs, B., Bell, S., McAuley, J. J., Bala, K., & Belongie, S. J. (2015). Learning visual clothing style with heterogeneous dyadic co-occurrences. ICCV, 4642–4650.
Vittayakorn, S., Berg, A. C., & Berg, T. L. (2017). When was that made? WACV, 715–724.
Vittayakorn, S., Yamaguchi, K., Berg, A. C., & Berg, T. L. (2015). Runway to realway: Visual analysis of fashion. WACV, 951–958.
Wang, B., Zheng, H., Liang, X., Chen, Y., Lin, L., & Yang, M. (2018). Toward characteristic-preserving image-based virtual try-on network. ECCV (13), 607–623.
Wang, W., Xu, Y., Shen, J., & Zhu, S.-C. (2018). Attentive fashion grammar network for fashion landmark detection and clothing category classification. CVPR, 4271–4280.
Wang, X., & Zhang, T. (2011). Clothes search in consumer photos via color matching and attribute learning. ACM Multimedia, 1353–1356.
Wu, Z., Lin, G., Tao, Q., & Cai, J. (2018). M2E-Try On Net: Fashion from model to everyone. CoRR. arXiv:1811.08599.
Xian, W., Sangkloy, P., Agrawal, V., Raj, A., Lu, J., Fang, C., ... Hays, J. (2018). TextureGAN: Controlling deep image synthesis with texture patches. CVPR, 8456–8465.
Xu, N., Zhang, H., Liu, A., Nie, W., Su, Y., Nie, J., & Zhang, Y. (2019). Multi-level policy and reward-based deep reinforcement learning framework for image captioning. IEEE Transactions on Multimedia.
Yamaguchi, K., Berg, T. L., & Ortiz, L. E. (2014). Chic or social: Visual popularity analysis in online fashion networks. ACM Multimedia, 773–776.
Yamaguchi, K., Kiapour, M. H., & Berg, T. L. (2013). Paper doll parsing: Retrieving similar styles to parse clothing items. ICCV, 3519–3526.
Yamaguchi, K., Kiapour, M. H., Ortiz, L. E., & Berg, T. L. (2012). Parsing clothing in fashion photographs. CVPR, 3570–3577.
Yamaguchi, K., Kiapour, M. H., Ortiz, L. E., & Berg, T. L. (2015). Retrieving similar styles to parse clothing. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(5), 1028–1040.
Yamaguchi, K., Okatani, T., Sudo, K., Murasaki, K., & Taniguchi, Y. (2015). Mix and match: Joint model for clothing and attribute recognition. BMVC, 51.1–51.12.
Yan, S., Liu, Z., Luo, P., Qiu, S., Wang, X., & Tang, X. (2017). Unconstrained fashion landmark detection via hierarchical recurrent transformer networks. ACM Multimedia, 172–180.
Yang, W., Luo, P., & Lin, L. (2014). Clothing co-parsing by joint image segmentation and labeling. CVPR, 3182–3189.
Yang, X., He, X., Wang, X., Ma, Y., Feng, F., Wang, M., & Chua, T.-S. (2019). Interpretable fashion matching with rich attributes. SIGIR, 775–784.
Yang, X., Ma, Y., Liao, L., Wang, M., & Chua, T.-S. (2019). TransNFCM: Translation-based neural fashion compatibility modeling. AAAI, 403–410.
Yildirim, G., Seward, C., & Bergmann, U. (2018). Disentangling multiple conditional inputs in GANs. CoRR. arXiv:1806.07819.
Yin, R., Li, K., Lu, J., & Zhang, G. (2019). Enhancing fashion recommendation with visual compatibility relationship. WWW, 3434–3440.
Yoo, D., Kim, N., Park, S., Paek, A. S., & Kweon, I.-S. (2016). Pixel-level domain transfer. ECCV (8), 517–532.
Yu, R., Wang, X., & Xie, X. (2019). VTNFP: An image-based virtual try-on network with body and clothing feature preservation. ICCV, 10510–10519.
Yu, W., Zhang, H., He, X., Chen, X., Xiong, L., & Qin, Z. (2018). Aesthetic-based clothing recommendation. WWW, 649–658.
Zanfir, M., Popa, A.-I., Zanfir, A., & Sminchisescu, C. (2018). Human appearance transfer. CVPR, 5391–5399.
Zhang, N., Paluri, M., Ranzato, M., Darrell, T., & Bourdev, L. D. (2014). PANDA: Pose aligned networks for deep attribute modeling. CVPR, 1637–1644.
Zhang, X., Jia, J., Gao, K., Zhang, Y., Zhang, D., Li, J., & Tian, Q. (2017). Trip outfits advisor: Location-oriented clothing recommendation. IEEE Transactions on Multimedia, 19(11), 2533–2544.
Zhao, B., Feng, J., Wu, X., & Yan, S. (2017). Memory-augmented attribute manipulation networks for interactive fashion search. CVPR, 6156–6164.
Zhao, B., Wu, X., Cheng, Z.-Q., Liu, H., Jie, Z., & Feng, J. (2018). Multi-view image generation from a single-view. ACM Multimedia, 383–391.
Zheng, S., Yang, F., Kiapour, M. H., & Piramuthu, R. (2018). ModaNet: A large-scale street fashion dataset with polygon annotations. ACM Multimedia, 1670–1678.
Zhou, X., Huang, S., Li, B., Li, Y., Li, J., & Zhang, Z. (2019). Text guided person image synthesis. CoRR. arXiv:1904.05118.
Zhou, Z., Di, X., Zhou, W., & Zhang, L. (2018). Fashion sensitive clothing recommendation using hierarchical collocation model. ACM Multimedia, 1119–1127.
Zhu, S., Fidler, S., Urtasun, R., Lin, D., & Loy, C. C. (2017). Be your own Prada: Fashion synthesis with structural coherence. ICCV, 1689–1697.
Zou, Q., Zhang, Z., Wang, Q., Li, Q., Chen, L., & Wang, S. (2016). Who leads the clothing fashion: Style, color, or texture? A computational study. CoRR. arXiv:1608.07444.
Zou, X., Kong, X., Wong, W., Wang, C., Liu, Y., & Cao, Y. (2019). FashionAI: A hierarchical dataset for fashion understanding. CVPR Workshops.
