Human Parsing For Image-Based Virtual Try-On Using Pix2Pix
Abstract— Image-based virtual try-on is a method that can let people try on clothes virtually. One of the challenges in image-based virtual try-on is segmentation. The segmentation needed in a virtual try-on implementation is one that can divide humans into several objects based on their body parts, such as hair, face, neck, hands, upper body, and lower body. This type of segmentation is called human parsing. Several human parsing methods and datasets have achieved great results. Unfortunately, some limitations make these methods unsuitable for an image-based virtual try-on model. We propose human parsing using the Pix2Pix model with the VITON dataset. Our model yields an average accuracy of 89.76%, an average F1-score of 86.80%, and an average IoU of 76.79%. These satisfactory results allow our model to be used in upcoming image-based virtual try-on research.

Keywords—Human Parsing, Pix2Pix, Image Segmentation, Generative Adversarial Networks

I. INTRODUCTION

In recent years, online shopping trends for fashion products have been increasing. This is driving the growth of research on image-based virtual try-on. Image-based virtual try-on is a method that can let people try on clothes virtually. After Jetchev et al. [1] introduced CAGAN, various image-based virtual try-on methods have been proposed [2], [3], [4]. One of the challenges in image-based virtual try-on is segmentation. There are several types of image segmentation, such as semantic segmentation [5], instance segmentation [6], and panoptic segmentation [7], which can distinguish each object. Unfortunately, when applied to person image segmentation, a person is counted as a single object. Meanwhile, the segmentation needed in a virtual try-on implementation is one that can divide humans into several objects based on their body parts, such as hair, face, neck, hands, upper body, and lower body. This type of segmentation is called human parsing. Human parsing is a method to decompose human images into semantic body regions, which is an important component of image-based virtual try-on research.

Previously, convolutional neural networks (CNNs) have achieved great results in human parsing. Liang et al. [8] proposed Active Template Regression (ATR), which designed two separate convolutional neural networks to build the end-to-end relation between the input image and the structure outputs. The results obtained are an average accuracy rate of 91.11% and an average F1-score rate of 64.38%. Liang et al. [9] proposed the Contextualized Convolutional Neural Network (Co-CNN), which integrated the cross-layer context, global image-level context, within-super-pixel context, and cross-super-pixel neighborhood context into a unified network. The results obtained are an average accuracy rate of 96.02% and an average F1-score rate of 80.14%. Liang et al. [10] proposed Local-Global Long Short-Term Memory (LG-LSTM), which seamlessly incorporates short-distance and long-distance spatial dependencies into the feature learning over all pixel positions. The results obtained are an average accuracy rate of 96.85% and an average F1-score rate of 84.12%. However, the performance of those CNN-based approaches heavily relies on the availability of annotated images for training from the ATR dataset [8]. Unfortunately, the ATR dataset [8] does not come with the pose label annotations required in an image-based virtual try-on model, and this dataset is no longer published.

Inspired by the limitation of the ATR dataset, Gong et al. [11] introduced the LIP dataset, which is annotated with both human parsing and pose labels. There are various pieces of research in human parsing using the LIP dataset [11]. Liang et al. [12] proposed the Self-supervised Structure-sensitive Joint human Parsing and Pose estimation Network (SS-JPPNet), which imposed human pose structures on the parsing results without resorting to extra supervision. The results obtained are an average accuracy rate of 54.94% and an average IoU rate of 44.73%. Chen et al. [13] proposed DeepLabv3+, which extended DeepLabv3 by adding a simple yet effective decoder module to refine the segmentation results, especially along object boundaries. The results obtained are an average accuracy rate of 55.62% and an average IoU rate of 44.80%. Wang et al. [14] proposed Hierarchical Human Parsing (HHP), which addressed the inference over the loopy human structure. The results obtained are an average accuracy rate of 70.58% and an average IoU rate of 59.25%. These quantitative results are lower than those on the ATR dataset [8] because the LIP dataset [11] contains images of people appearing with challenging poses and viewpoints, heavy occlusions, various appearances, and in a wide range of resolutions. Furthermore, the background of the images in the LIP dataset is also more complex and diverse than in the ATR dataset [8]. For that reason, the LIP dataset [11] is not suitable for use in image-based virtual try-on models.

Recently, Choi et al. [15] proposed High-Resolution Virtual Try-On (VITON-HD), which shows outstanding results as an image-based virtual try-on model. VITON-HD [15] adopts the U-Net architecture [16] for its segmentation generator, which successfully generates segmentation maps. These results inspired us to use the U-Net architecture [16] as part of our human parsing model. Unfortunately, the VITON-HD dataset is not publicly available. Therefore, we use the most widely used dataset for image-based virtual try-on models, namely the VITON dataset [2]. To experiment on VITON, Han et al. [2] collected a dataset consisting of around 19,000 frontal-view women and top clothing image pairs. Due to some noisy
B. Network Architecture

The Pix2Pix model consists of a generator with a U-Net-based architecture [16] and a discriminator with a convolutional PatchGAN classifier architecture [18].

Figure 3. PatchGAN architecture used
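To make the discriminator concrete, the following is a minimal TensorFlow/Keras sketch of a PatchGAN-style discriminator. The filter progression (C64-C128-C256-C512) follows the original Pix2Pix design [18], while the input shapes, strides, initializer settings, and layer names are simplified, illustrative assumptions rather than our exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers

def patchgan_discriminator(input_shape=(256, 256, 3)):
    """Sketch of a PatchGAN-style discriminator (filter progression from Pix2Pix [18])."""
    init = tf.keras.initializers.RandomNormal(stddev=0.02)

    # Conditional input: the person image and the (real or generated) parsing map.
    person = layers.Input(shape=input_shape, name="person_image")
    parse = layers.Input(shape=input_shape, name="parsing_map")
    x = layers.Concatenate()([person, parse])

    # C64-C128-C256-C512: 4x4 convolutions with stride 2 and LeakyReLU activations;
    # batch normalization is skipped in the first block, as in Pix2Pix.
    for filters, use_norm in [(64, False), (128, True), (256, True), (512, True)]:
        x = layers.Conv2D(filters, 4, strides=2, padding="same",
                          kernel_initializer=init)(x)
        if use_norm:
            x = layers.BatchNormalization()(x)
        x = layers.LeakyReLU(0.2)(x)

    # Final 1-channel map: each output value scores one image patch as real or fake.
    patch_scores = layers.Conv2D(1, 4, padding="same", kernel_initializer=init)(x)
    return tf.keras.Model([person, parse], patch_scores)
```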
U-Net is a CNN-based architecture designed for semantic segmentation tasks on biomedical images. The network is built on a fully convolutional network [5], which has been modified and extended to work with fewer training images and to produce more precise segmentation.

U-Net is made up of two paths: a contracting path and an expansive path. The convolutional layers in the contracting path downsample the image while extracting information. The expansive path is constructed from transposed convolutional layers that upsample the image. Figure 2 depicts the procedure that takes place in the U-Net architecture. It begins with the downsampling layers (blue), where every convolutional block extracts data and sends it into the following convolutional block, which extracts more data before it hits the bottleneck in the center (purple). When the process hits the bottleneck, it continues to the upsampling layers (orange), where every transposed convolutional block expands data from the preceding block while concatenating data from the related downsampling layer. Based on this data, the network can produce a more precise segmentation.
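As a sketch of this structure, the TensorFlow/Keras snippet below builds a small U-Net-style generator with strided-convolution downsampling blocks, transposed-convolution upsampling blocks, and skip connections. The depth, filter counts, and tanh output activation are assumptions chosen for brevity and do not necessarily match the exact configuration of our generator.

```python
import tensorflow as tf
from tensorflow.keras import layers

def downsample(filters, apply_norm=True):
    """Contracting-path block: a 4x4 strided convolution halves the spatial resolution."""
    block = tf.keras.Sequential()
    block.add(layers.Conv2D(filters, 4, strides=2, padding="same", use_bias=False))
    if apply_norm:
        block.add(layers.BatchNormalization())
    block.add(layers.LeakyReLU(0.2))
    return block

def upsample(filters):
    """Expansive-path block: a transposed convolution doubles the spatial resolution."""
    block = tf.keras.Sequential()
    block.add(layers.Conv2DTranspose(filters, 4, strides=2, padding="same", use_bias=False))
    block.add(layers.BatchNormalization())
    block.add(layers.ReLU())
    return block

def unet_generator(input_shape=(256, 256, 3), out_channels=3):
    """U-Net generator sketch: encoder features are concatenated back in the decoder."""
    inputs = layers.Input(shape=input_shape)
    down_stack = [downsample(64, apply_norm=False), downsample(128),
                  downsample(256), downsample(512)]
    up_stack = [upsample(256), upsample(128), upsample(64)]

    x, skips = inputs, []
    for down in down_stack:                               # contracting path
        x = down(x)
        skips.append(x)
    for up, skip in zip(up_stack, reversed(skips[:-1])):  # expansive path
        x = up(x)
        x = layers.Concatenate()([x, skip])               # skip connection
    outputs = layers.Conv2DTranspose(out_channels, 4, strides=2,
                                     padding="same", activation="tanh")(x)
    return tf.keras.Model(inputs, outputs)
```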
IV. EXPERIMENTS

A. Experiment Setup

Dataset. We used the VITON dataset [2], which contains 16,253 images divided into a training set of 14,221 images and a test set of 2,032 images at a 192 x 256 resolution. The dataset consists of four categories: cloth, cloth-mask, image, and image-parse. For the human parsing model, the only categories used are image and image-parse. Before the dataset is used in the model, there are several pre-processing steps. First, the images are resized to a 256 x 256 resolution. Then, each image is concatenated with its paired image-parse map. After these steps, the dataset is ready to use.
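These pre-processing steps can be expressed as a short TensorFlow input function. The sketch below assumes each person image is paired with its image-parse map by file path and normalizes pixel values to [-1, 1] (a common Pix2Pix convention); neither detail is stated explicitly above.

```python
import tensorflow as tf

def load_pair(image_path, parse_path):
    """Resize a person image and its image-parse map to 256x256 and concatenate them."""
    def read(path):
        raw = tf.io.read_file(path)
        img = tf.io.decode_image(raw, channels=3, expand_animations=False)
        img = tf.image.resize(img, [256, 256])      # step 1: resize to 256 x 256
        return img / 127.5 - 1.0                    # scale pixels to [-1, 1] (assumed)
    person = read(image_path)                       # "image" category
    parse = read(parse_path)                        # "image-parse" category
    return tf.concat([person, parse], axis=-1)      # step 2: concatenate the pair

# Illustrative usage with a tf.data pipeline (the file lists are hypothetical):
# dataset = tf.data.Dataset.from_tensor_slices((image_paths, parse_paths))
# dataset = dataset.map(load_pair).batch(1)
```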
Training and Inference. The code was developed in TensorFlow and ran on an Nvidia K80 GPU with 12 GB of RAM. For training, we use minibatch SGD with the Adam optimizer for both the generator and the discriminator, with a learning rate of 0.0002 and momentum parameter β1 = 0.5.
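In TensorFlow/Keras, this setting corresponds to the following optimizer setup; β2 is assumed to keep its Adam default of 0.999 since only β1 is reported.

```python
import tensorflow as tf

# Adam optimizers for the generator and the discriminator with the reported
# hyperparameters (learning rate 0.0002, beta_1 = 0.5); beta_2 is left at the
# Keras default of 0.999, which is an assumption.
generator_optimizer = tf.keras.optimizers.Adam(learning_rate=2e-4, beta_1=0.5)
discriminator_optimizer = tf.keras.optimizers.Adam(learning_rate=2e-4, beta_1=0.5)
```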