
Volume 8, Issue 5, May 2023 — International Journal of Innovative Science and Research Technology, ISSN No: 2456-2165

Virtual Trial Room: Real-Time Image-Based Virtual Cloth Try-On System

Mahek Agarwal, Akshay B, Gaurav Sutradhar, Abhay Raj, Uma D
Department of CSE, PES University, Bangalore, India

Abstract:- The emergence of e-commerce has transformed the way consumers shop, providing convenience and access to a vast array of products. However, one aspect that has remained a challenge in the online shopping experience is the inability to physically try on clothes and assess their fit and appearance before making a purchase. This limitation often leads to dissatisfaction and high return rates, posing significant challenges for retailers.

In recent years, virtual trial rooms have emerged as a promising solution to bridge the gap between the online and offline shopping experiences. This research paper explores the concept of virtual trial rooms, their underlying technologies, and their impact on the retail industry.

Keywords:- Virtual Try-On, Cloth Transfer, Real-Time Image, Pose Estimation, Semantic Generation, Cloth Warping, Content Fusion.

I. INTRODUCTION

Over the years, the fashion industry has undergone a significant transformation, with advancements in technology playing a pivotal role in revolutionizing the way people shop for clothes. One of the most significant changes has been the growth of online shopping, which has enabled consumers to purchase their favorite clothes from the comfort of their homes. However, the inability to try on clothes virtually has been a hindrance for the industry, and many customers hesitate to buy clothes online since they cannot physically try them on. To address this issue, virtual trial rooms have emerged as a solution that offers a realistic experience of trying on clothes virtually from one's own home. Virtual trial rooms allow customers to visualize how they would look in different outfits before making a purchase decision. This feature provides a better shopping experience, increases customer satisfaction, and reduces the likelihood of returns.

To enhance the virtual trial room experience, this project proposes a real-time image-based virtual cloth try-on system that utilizes CP-VTON, a state-of-the-art algorithm. CP-VTON is a deep learning-based algorithm that warps a 2D image of the target garment onto the user's body, creating a realistic virtual try-on experience. The proposed system uses a camera and monitor to capture the user's image and display the virtual try-on in real time, providing a user-friendly and interactive shopping experience.

The primary objective of this project is to create an efficient and effective virtual try-on system that integrates seamlessly into the shopping experience for customers. To accomplish this, a series of experiments and evaluations, including user studies and performance metrics, will be conducted to demonstrate the feasibility and practicality of the proposed system. To provide a comprehensive understanding of the project, the report is divided into five sections. Section II provides a literature review of virtual try-on systems and related work. Section III explains the methodology and implementation of the proposed system, including the software and hardware used. Section IV describes the experimental results and evaluation of the system, including user feedback and system performance metrics. Finally, Section V concludes the report and discusses future work, including potential improvements to the system and potential applications in the fashion industry. This project presents a real-time image-based virtual cloth try-on system that utilizes CP-VTON, an advanced algorithm for virtual cloth try-on; CP-VTON employs deep learning to warp the garment image onto the user's body, creating a realistic virtual try-on experience.

Thus, this virtual trial room software may significantly alter the way people shop today. People do not need to be afraid of hidden cameras or stand in line in front of the trial room for hours to check out their clothes. Because using this system only takes a few seconds, people can quickly change their attire or try on different clothes, saving a significant amount of time and effort. There are other similar applications that use sensors, such as the Kinect, and are able to capture the user's data in 3D. However, this is not economical, as these sensors cost a large amount of money. To remedy this, our application uses a laptop webcam to collect the input image. Given two input images, one of the user and another of the cloth to be tried on, we have developed our pipeline to generate a new image that meets two requirements: a) the generated image is one of the user wearing the new clothing; b) the generated image maintains both the pose and any characteristics that were present in the cloth, such as logos, text, and graphics.

II. METHODOLOGY

The Human Parsing module is an essential part of the virtual trial room system, and it plays a crucial role in the accurate fitting of clothes onto the user's image. It is similar to the pose estimation module in the sense that it operates on the user's image to identify the different body parts, but it goes a step further by segmenting each body part and assigning it a different color. This makes it easier to identify each body part and extract the relevant information, such as the body mask, hair, and head.

To achieve the segmentation of body parts, we use a Joint Body Part parser, a state-of-the-art network for this purpose. The JPPNet is a fully convolutional neural network that takes the input image and produces a pixel-level labeling of different body parts. The network is designed to work on images of different resolutions, and it can handle various deformations, such as scaling and rotation, which are essential for accurate segmentation.

To train the JPPNet, we use the LIP (Look Into Person) dataset, which contains labeled data on the different segmented body parts. The dataset consists of over 50,000 images of people in different poses, lighting conditions, and clothing, making it ideal for training the network to handle different scenarios. The labeled data in the LIP dataset provides the ground truth for the network, allowing it to learn to segment the body parts accurately.
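
To make the parsing step concrete, the following minimal sketch (ours, not code from the described system) shows how a pretrained JPPNet-style parser could be driven at inference time; the TorchScript file name jppnet.pt and the LIP label ids shown are illustrative assumptions.

```python
import numpy as np
import torch
from PIL import Image

# Illustrative LIP-style label ids; the real ids depend on the trained parser.
BACKGROUND, HAIR, UPPER_CLOTHES, FACE = 0, 2, 5, 13

def parse_human(parser, image_path):
    """Run a JPPNet-style parser and return a (H, W) per-pixel label map."""
    img = Image.open(image_path).convert("RGB").resize((192, 256))
    x = torch.from_numpy(np.asarray(img)).permute(2, 0, 1).float() / 255.0
    with torch.no_grad():
        logits = parser(x.unsqueeze(0))           # (1, num_parts, H, W)
    return logits.argmax(dim=1)[0].cpu().numpy()  # label per pixel

def extract_masks(labels):
    """Pull out the segments the later modules need."""
    return {
        "body": (labels != BACKGROUND).astype(np.uint8),
        "hair": (labels == HAIR).astype(np.uint8),
        "head": np.isin(labels, [HAIR, FACE]).astype(np.uint8),
    }

# parser = torch.jit.load("jppnet.pt")  # hypothetical exported model
# masks = extract_masks(parse_human(parser, "user.jpg"))
```
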
The Spatial Transformation Network (STN) used in the Cloth Warping Module is based on the Thin Plate Spline (TPS) transformation model, which provides smooth and natural-looking deformations of the cloth. This module learns to predict the necessary transformation parameters based on the input image and pose representation, allowing the target cloth to be warped to fit the user's pose in a natural and visually pleasing way.

To train the cloth warping module, we use a combination of synthetic and real-world data. The synthetic data consists of 3D models of human bodies and clothing items, which are used to generate realistic training examples with ground-truth warpings. The real-world data is obtained from a set of images with labelled cloth masks and corresponding body poses, which are used to fine-tune the module and improve its accuracy on real-world examples.

The Cloth Warping Module also includes a texture synthesis component, which is used to generate a texture for the warped cloth that matches the characteristics of the original target cloth. This is done by extracting the texture features from the target cloth using a pre-trained style transfer network and then applying these features to the warped cloth using an adaptive instance normalization (AdaIN) layer. The resulting texture is then blended with the original user image to create a photo-realistic image of the user wearing the target cloth.
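
As an illustration of the adaptive instance normalization step, the sketch below re-normalizes the warped-cloth feature statistics to match those of the target cloth. This is the standard AdaIN formulation (our sketch, assuming PyTorch feature maps of shape (B, C, H, W)), not the system's exact layer.

```python
import torch

def adain(content, style, eps=1e-5):
    """Shift the per-channel mean/std of `content` (warped-cloth features)
    to match `style` (target-cloth features)."""
    c_mean = content.mean(dim=(2, 3), keepdim=True)
    c_std = content.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style.mean(dim=(2, 3), keepdim=True)
    s_std = style.std(dim=(2, 3), keepdim=True)
    return s_std * (content - c_mean) / c_std + s_mean
```
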
The proposed methodology utilizes a normal web camera as the input device. The webcam captures real-time images of the user, which are then processed through the various modules described below. This ensures a user-friendly and accessible solution, as no specialized sensors or equipment are required.
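
Capturing the input with OpenCV takes only a few lines; the sketch below (an assumed setup, with camera index 0 as the laptop webcam) grabs one frame for the pipeline.

```python
import cv2

def capture_frame(camera_index=0):
    """Grab a single BGR frame from the webcam."""
    cap = cv2.VideoCapture(camera_index)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise RuntimeError("could not read a frame from the webcam")
    return frame

# frame = capture_frame()
# h, w = frame.shape[:2]  # webcam resolution varies across systems
```
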
A. Keypoint Extraction
The keypoint extraction module utilizes a Convolutional Neural Network (CNN) architecture. The CNN is trained on a large dataset containing images annotated with labeled keypoints. The architecture of the CNN is designed to capture both low-level features, such as edges and corners, and high-level features that are discriminative for keypoints.

To train the Keypoint Extraction module, a large dataset containing labeled images with annotated keypoints is utilized. This dataset serves as the ground truth for training the CNN, allowing it to learn the relationship between image features and keypoint locations. For this, we have used the Common Objects in Context (COCO) dataset.

During the training phase, the CNN learns to recognize patterns and features that correspond to keypoints, allowing it to predict the presence and location of keypoints on unseen images.

In the inference phase, the user's image is fed into the pre-trained CNN, and the network generates a confidence map. This confidence map is a 2D representation that assigns a confidence value to each pixel, indicating the likelihood of that pixel being a keypoint. Higher confidence values indicate a higher probability of a keypoint being present at that location. Additionally, the keypoint extraction module produces part affinity fields, which encode the spatial relationships between keypoints by representing pairwise connections between different body parts. For example, a part affinity field might represent the connection between the neck and the shoulders. These part affinity fields provide additional information about the pose and structure of the user's body.

By utilizing the confidence map and part affinity fields, the keypoint extraction module provides a precise localization of keypoints on the user's image. These keypoints serve as essential landmarks for subsequent modules in the methodology, allowing for accurate pose estimation, human parsing, and cloth warping.
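
For illustration, the sketch below shows one simple way these two outputs can be consumed: keypoints are read off the confidence maps with a per-channel argmax, and a candidate limb is scored by integrating the part affinity field along the segment between two keypoints (a simplified, OpenPose-style stand-in for the full matching step).

```python
import numpy as np

def decode_keypoints(conf_maps, threshold=0.3):
    """conf_maps: (K, H, W) confidence maps -> list of (x, y) or None."""
    points = []
    for hm in conf_maps:
        y, x = np.unravel_index(np.argmax(hm), hm.shape)
        points.append((int(x), int(y)) if hm[y, x] >= threshold else None)
    return points

def limb_score(paf_x, paf_y, p1, p2, samples=10):
    """Average alignment of the part affinity field with the p1->p2 segment;
    a high score means the two keypoints likely belong to the same limb."""
    v = np.subtract(p2, p1).astype(float)
    v /= np.linalg.norm(v) + 1e-8
    xs = np.linspace(p1[0], p2[0], samples).astype(int)
    ys = np.linspace(p1[1], p2[1], samples).astype(int)
    return float(np.mean(paf_x[ys, xs] * v[0] + paf_y[ys, xs] * v[1]))
```
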
B. Human Parsing
To achieve accurate human parsing, a specialized network architecture called the Joint Body Part parser is employed. One commonly used architecture is the JPPNet (Joint Human Parsing and Pose Estimation Network). The JPPNet is designed to simultaneously perform human parsing and pose estimation tasks, making it well-suited for this module's purpose.

During the training phase, the JPPNet is trained on a large dataset, in our case the LIP (Look Into Person) dataset. The LIP dataset contains images of people labeled with pixel-level annotations for various body parts. This dataset enables the network to learn the relationships between image features and body part segments, enabling precise human parsing.

In the inference phase, the user's image is input into the trained JPPNet. The network processes the image and outputs a pixel-level labeling of different body parts, generating a segmentation map. Each pixel in the segmentation map is assigned a label corresponding to a specific body part, such as the upper body, lower body, arms, legs, and so on.

By segmenting the user's body into different parts, the Human Parsing module provides crucial information for subsequent steps. For example, the body mask segment is used in the cloth warping module to accurately deform and fit the target cloth to the user's pose. Other body part segments, such as hair or head, can be further utilized for enhancing the realism of the try-on result.

C. Clothes Warping
The next step is the Cloth Warping Module, in which high-level features are extracted from both the target cloth and the user's image with the body mask. These features capture essential characteristics and intricate details of the clothing item as well as the user's body shape.

The extracted features from the target cloth and the user's image are combined using a correlation layer. This correlation layer measures the similarity or correspondence between the features, establishing a relationship between the clothing item and the user's body. By leveraging this correlation information, the module gains an understanding of how the cloth and the body parts interact, enabling accurate deformation.
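
A minimal correlation layer of this kind, following the geometric-matching formulation of Rocco et al. that CP-VTON adopts, can be sketched as follows (our illustration; the (B, C, H, W) encoder output shapes are assumptions):

```python
import torch
import torch.nn.functional as F

def correlation(cloth_feat, person_feat):
    """Dense correlation between cloth and person feature maps.

    Inputs: (B, C, H, W). Output: (B, H*W, H, W) -- for every location in
    the person features, its similarity to every location in the cloth
    features, which the parameter-prediction network consumes next.
    """
    b, c, h, w = cloth_feat.shape
    a = F.normalize(cloth_feat.view(b, c, h * w), dim=1)   # (B, C, HW)
    p = F.normalize(person_feat.view(b, c, h * w), dim=1)  # (B, C, HW)
    corr = torch.bmm(a.transpose(1, 2), p)                 # (B, HW_cloth, HW_person)
    return corr.view(b, h * w, h, w)
```
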
To predict the spatial transformation parameters required for cloth warping, the correlated features are fed into a network specifically designed for parameter prediction. This network learns to estimate the parameters that govern the deformation of the cloth to match the user's pose. Techniques such as regression models, geometric transformations, or spatial transformer networks (STNs) are commonly employed to predict these spatial transformation parameters accurately.

With the predicted parameters, the Cloth Warping Module utilizes Thin Plate Spline (TPS) warping to deform the target cloth. TPS warping enables smooth and flexible cloth deformation by considering both local and global deformations. It allows the cloth to adapt to the user's pose while preserving its overall shape and structure. This end-to-end learnable network ensures that the cloth conforms closely to the user's body mask, resulting in a visually convincing virtual try-on experience.

In line with CP-VTON, we utilize the loss function L_GMM to train the Geometric Matching Module (GMM). This loss function measures the L1 distance between the warped cloth and the target cloth c_t. Mathematically, it can be expressed as:

L_GMM(θ) = || T_θ(c) − c_t ||_1

where L_GMM represents the warping loss, c represents the input cloth image, T_θ(c) represents the cloth warped using the predicted spatial transformation parameters θ, and c_t represents the target cloth.

By minimizing the L_GMM loss function, we ensure that the warped cloth closely matches the target cloth in terms of appearance and texture. This step is crucial for generating realistic and visually accurate virtual try-on results.

Finally, the transformed and warped target cloth is integrated with the user's image in the fusion module. This output closely follows the contours of the user's body mask and accurately aligns with the pose.
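
Putting the last two steps together, the sketch below warps the cloth with a sampling grid derived from the predicted parameters and computes L_GMM. It is illustrative PyTorch only, and tps_grid_generator stands in for a TPS grid generator that is not shown here.

```python
import torch.nn.functional as F

def warp_cloth(cloth, grid):
    """T_theta(c): sample the in-shop cloth image (B, 3, H, W) with the
    TPS sampling grid (B, H, W, 2) built from the predicted parameters."""
    return F.grid_sample(cloth, grid, padding_mode="border", align_corners=True)

def gmm_loss(warped_cloth, target_cloth):
    """L_GMM = ||T_theta(c) - c_t||_1 (mean over pixels)."""
    return F.l1_loss(warped_cloth, target_cloth)

# grid = tps_grid_generator(theta)       # hypothetical: theta -> sampling grid
# warped = warp_cloth(cloth, grid)       # T_theta(c)
# loss = gmm_loss(warped, target_cloth)  # drives the end-to-end training
```
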

III. IMPLEMENTATION

This section covers the implementation of the project, carried out in five different modules.

• Front End UI
We built the web app using React and NodeJS, with MongoDB as the database. We also use Material-UI for icons and other interface components.

• Pose Estimation
Using the captured image as input, this module extracts the key points of the user's body parts during preprocessing, which are then passed on to the subsequent modules.

• Semantic Generation
The key points of the body parts and the target clothes extracted in preprocessing are run through our Semantic Generation Module. This module extracts the mask of the target image's cloth and passes it on to the Warping module. We use two conditional GANs (generative adversarial networks): one to create a synthesized body part mask, and the second to use that mask together with the target cloth to create a clothing mask (a schematic driver combining these modules is sketched after this list).

• Clothes Warping
In the clothes-warping module, we use the predicted spatial transformation to warp the target cloth so that it fits our masked image.

• Non-Target Body Composition
We use the Non-Target Body Composition module to obtain the composited body mask of the image. This is obtained from the original clothing mask, the synthesized clothing mask, the body part mask, and the synthesized body part mask.
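
To show how the five modules fit together at inference time, here is a schematic driver (ours, not the project's code; every callable is a stand-in for one of the trained modules above, and the exact signatures are assumptions):

```python
import torch

def virtual_try_on(user_img, cloth_img, pose_net, g_body, g_cloth, warper, fuse):
    """Schematic end-to-end pass: user and cloth images in, try-on image out."""
    with torch.no_grad():
        keypoints = pose_net(user_img)                     # pose estimation
        body_mask = g_body(user_img, keypoints)            # cGAN #1: synthesized body part mask
        cloth_mask = g_cloth(body_mask, cloth_img)         # cGAN #2: clothing mask
        warped = warper(cloth_img, cloth_mask, body_mask)  # clothes warping
        return fuse(user_img, warped, cloth_mask)          # non-target composition + fusion

# result = virtual_try_on(user, cloth, pose_net, g1, g2, warp_module, fusion)
```
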
IV. RESULTS AND DISCUSSIONS

The virtual trial room system was implemented to provide users with a realistic and convenient way to try on clothes virtually. The system utilized augmented reality technology to overlay virtual clothing items onto the user's live video feed, allowing them to see how the clothes would look on their own bodies.

Overall, the virtual trial room system demonstrated promising results and generated positive feedback from users. Here are some key points from the results and discussions:
1. Realistic visualization: Users reported that the virtual clothing items appeared quite realistic when overlaid on their bodies. The system effectively accounted for different body sizes, shapes, and movements, making the virtual try-on experience more authentic.
2. Improved decision-making: The virtual trial room system aided users in making more informed purchasing decisions. By trying on clothes virtually, users could assess how the garments fit and suit their style without the need for physical trials. This feature was beneficial for online shoppers, who often face challenges with size and fit.
3. Enhanced convenience: Participants appreciated the convenience of being able to try on multiple outfits virtually in the comfort of their homes. They no longer needed to spend time traveling to physical stores or dealing with crowded changing rooms. The system provided a time-saving and hassle-free alternative to traditional shopping experiences.
4. Accuracy of sizing and fit: The accuracy of sizing and fit received mixed feedback. While the system generally performed well in determining accurate sizing, some users reported minor discrepancies in fit when compared to physical trials. This discrepancy could be attributed to variations in body measurements, fabric characteristics, and the limitations of current AR technology.
5. Limited clothing options: The virtual trial room system had a limited range of clothing items available for try-on. Participants expressed a desire for a wider selection of clothes to cater to different styles, preferences, and occasions. Expanding the catalog of virtual garments would greatly enhance the system's usability and appeal to a broader user base.
6. Technical performance: Users noted that the system required a stable internet connection and a device capable of running the AR application smoothly. Some participants encountered occasional glitches, such as delays in rendering the virtual clothing or tracking issues. These technical challenges should be addressed to ensure a seamless and immersive virtual try-on experience.

Using the mask, the program generates a cloth and warps it over the mask; this is then overlaid on the user. To deal with the task of pose estimation, we take a captured frame of the user. Using this frame, we extract the height and width of the image, as different systems have webcams that capture images at different resolutions.

When the program first starts running, it detects the key points of the different parts of the user's body. The mask of the cloth worn by the user is also obtained.

The results can be seen in the figures below.

Fig 1: User standing in front of the system, with the system detecting the key points of the body parts

Fig 2: The targeted cloth and the superimposed cloth on the user's body

Fig 3: The input, the targeted cloth, and the output

V. USER EXPERIENCE

The following are some of the users' experiences:

Fig 4: User Experience 1

Fig 5: User Experience 2

The key points of the user's left hand were not detected because of a system limitation: the system requires the position of the hand to be clear, and here the left hand is coinciding with the body.

Fig 6: User Experience 3

The key points of the user's lower hand were not detected because of a system limitation: the system requires a plain background, and here the background is cluttered.

In conclusion, the virtual trial room system showcased positive outcomes, providing users with realistic visualization, improved decision-making, and enhanced convenience. However, addressing areas such as accuracy of sizing and fit, expanding clothing options, and refining technical performance will contribute to further advancements and wider adoption of virtual trial room technology in the future.

VI. SUMMARY

This research paper presents an image-based virtual try-on system called the Virtual Trial Room. The system aims to enhance the online shopping experience by allowing users to virtually try on clothing items using their own images. By leveraging computer vision and image processing techniques, the system provides users with a realistic representation of how the garments would look on their bodies, enabling them to make informed purchasing decisions without physical trials. This summary provides an overview of the objectives, methodology, key findings, and implications of the research.

The primary objective of the study was to develop an image-based virtual try-on system that accurately overlays virtual clothing onto user-provided images. The methodology involved collecting a diverse data set of clothing items, capturing images from various angles, and extracting key features such as texture, shape, and colour. Deep learning algorithms were employed to train a model capable of generating realistic virtual try-on results.

The research findings highlight the effectiveness and usability of the Virtual Trial Room system. Users reported that the virtual clothing items appeared visually realistic when overlaid on their own images, providing an accurate representation of fit and style. The system demonstrated the ability to handle variations in body sizes and poses, accommodating different user preferences. Users expressed satisfaction with the convenience and time-saving aspect of the virtual try-on experience, as it eliminated the need for physical trials and visits to brick-and-mortar stores.

The Virtual Trial Room system has several advantages over existing virtual try-on solutions. It provides a hassle-free way to try on clothes from the comfort of one's home without the need to visit a physical store. The system is also cost-effective, eliminating the need for expensive 3D sensors. Moreover, it provides a realistic and visually appealing image of the user wearing the target cloth, with fine details and perceptual quality.

The implications of this research are significant for the fashion and e-commerce industries. The Virtual Trial Room system offers a viable solution to bridge the gap between online shopping and physical try-ons, providing users with a realistic and personalized virtual try-on experience. It has the potential to increase customer confidence, reduce return rates, and enhance customer satisfaction in the online shopping process.

However, some limitations were identified. The system's performance was highly dependent on the quality and resolution of the user-provided images. Low-resolution or distorted images resulted in less accurate virtual try-on outcomes. Additionally, certain clothing items with intricate details or complex textures presented challenges for the system to accurately replicate. Further improvements in image processing and deep learning algorithms are necessary to address these limitations.

In conclusion, the research demonstrates the effectiveness of an image-based virtual try-on system, the Virtual Trial Room, in providing users with a realistic and convenient alternative to physical trials. While challenges exist in handling low-resolution images and complex clothing items, further advancements in image processing and deep learning algorithms can address these limitations, making image-based virtual try-on systems a valuable tool for the fashion industry. The proposed virtual trial room system aims to provide a cost-effective and user-friendly solution for virtual cloth try-on. The system utilizes a real-time image-based approach, eliminating the need for expensive 3D sensors. With the use of the OpenPose and OpenCV libraries for pose estimation and human parsing, the system accurately identifies the user's body parts and clothing items. The CP-VTON algorithm is utilized to generate and warp the target cloth, while the GAN architecture enhances the generated image's visual quality.

ACKNOWLEDGEMENT

We would like to express our sincere gratitude to all those who have contributed to the successful completion of this research paper on virtual trial rooms. Their support, guidance, and assistance have been invaluable throughout this journey, and we are truly grateful for their contributions.

First and foremost, we would like to thank our guide, Dr Uma D, for her unwavering support and invaluable guidance. Her expertise, insightful feedback, and constant encouragement have been instrumental in shaping and refining our research. Her commitment to excellence and dedication to our project has been truly inspiring.

We would also like to extend our heartfelt appreciation to the members of our research team for their hard work and commitment. Each team member played a crucial role in the success of this project, contributing their unique skills and expertise. The collaborative environment fostered within the team greatly enhanced the quality of our research, and we are grateful for their valuable contributions.

Furthermore, we express our gratitude to the participants who volunteered their time and provided valuable insights for our research. Their willingness to engage in the virtual trial room experience and provide feedback was essential to the development and evaluation of our system. We appreciate their involvement and willingness to contribute to the advancement of technology in this domain.

Lastly, we are thankful to our friends and family for their unwavering support, understanding, and encouragement throughout this research journey. Their belief in our abilities and constant motivation provided the strength and inspiration needed to overcome challenges and complete this paper.


In conclusion, we extend our sincere appreciation to all individuals and organizations who have contributed to this research on virtual trial rooms. Their collective efforts have enriched our work and have paved the way for further advancements in this field. We are truly grateful for their support, and we hope that this research will contribute to the ongoing progress in virtual trial room technology.

REFERENCES

[1]. Han Yang, Ruimao Zhang, Xiaobao Guo, Wei Liu, Wangmeng Zuo, and Ping Luo, "Towards Photo-Realistic Virtual Try-On by Adaptively Generating-Preserving Image Content", arXiv:2003.05863.
[2]. Xintong Han, Zuxuan Wu, Zhe Wu, Ruichi Yu, and Larry S. Davis, "VITON: An Image-Based Virtual Try-On Network", in CVPR, pp. 7543-7552, 2018.
[3]. COCO Keypoint Detection Task, https://cocodataset.org/keypoints-2020.
[4]. Anthony L. Brooks and Eva Petersson Brooks, "Towards an Inclusive Virtual Dressing Room for Wheelchair-Bound Customers", International Conference on Collaboration Technologies and Systems (CTS), pp. 582-589, 2014.
[5]. Shreya Kamani, Neel Vasa, and Kriti Srivastava, "Virtual Trial Room Using Augmented Reality", International Journal of Advanced Computer Technology (IJACT), Vol. 3, Issue 6, pp. 98-102, Dec. 2014.
[6]. Nikki Singh, Sagar Murade, Prem Lone, and Vikas Mulaje, "Virtual Trial Room", Vishwakarma Journal of Engineering Research, Volume 1, Issue 4, December 2017.
[7]. Saurabh Botre, Sushant Chaudhari, and Shamla Mantri, "Virtual Trial Room", International Journal of Computer Science Trends and Technology (IJCST), Volume 2, Issue 2, Mar-Apr 2014.
[8]. Ignacio Rocco, Relja Arandjelovic, and Josef Sivic, "Convolutional Neural Network Architecture for Geometric Matching", in CVPR, pp. 6148-6157, 2017.
[9]. Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen, "Improved Techniques for Training GANs", in NeurIPS, pp. 2234-2242, 2016.
[10]. Bochao Wang, Hongwei Zhang, Xiaodan Liang, Yimin Chen, Liang Lin, and Meng Yang, "Toward Characteristic-Preserving Image-Based Virtual Try-On Network", in ECCV, 2018.
