April 2023
Sunita R. Patil
Department of Computer Engineering, K.J. Somaiya Institute of Engineering & Information Technology,
Mumbai, Maharashtra, India, [email protected]
Bhoir, Smita V. and Patil, Sunita R., "The FASHION Visual Search using Deep Learning Approach" (2023).
Library Philosophy and Practice (e-journal). 7569.
https://digitalcommons.unl.edu/libphilprac/7569
The FASHION Visual Search using Deep Learning Approach
Keywords: E-commerce; Visual Search; Deep Fashion Convolution Neural Network; Fashion
Classification
Abstract. In recent years, the World Wide Web (WWW) has established itself as a popular source of
information. Making the most of these resources requires an effective way to search the vast amount
of information available on the internet. Visual data cannot be indexed with text-based indexing
algorithms because it is significantly larger and more complex than text. Content-Based Image
Retrieval (CBIR) has therefore gained widespread attention in the scientific community. A CBIR
system that depends on low-level visible features of the input image is difficult to use, because
users find it hard to formulate a query in terms of such features, and the system often does not
produce adequate results. CBIR performance relies heavily on effective feature representations and
appropriate similarity measures, both of which remain active areas of research. In particular, the
semantic gap between low-level image pixels and the high-level semantics perceived by humans has
been identified as the root cause of the issue. This study addresses two difficult issues that the
e-commerce industry currently faces. First, merchants must manually label their products and upload
product photographs to the platform for sale; when a product is misclassified, it does not appear in
the search results. Second, customers who do not know the exact keywords and only have a general
idea of what they want to buy may encounter a bottleneck when placing their orders. By allowing
buyers to take a picture of an object and search for related products without typing anything, an
image-based search algorithm can help e-commerce reach its full potential. Inspired by the current
success of deep learning methods in computer vision (CV) applications, we set out to test a deep
learning method, the Convolutional Neural Network (CNN), for investigating feature representations
and similarity measures. The experimental results presented in this study indicate that a deep
learning approach can address these issues effectively. A proposed Deep Fashion Convolution Neural
Network (DFCNN) model that takes advantage of transfer learning features is used to classify fashion
products and predict their performance. The experimental results for image-based search show
improved performance on the parameters that were evaluated.
1. Introduction
The fashion industry is expanding at a rapid pace in both the United States and internationally.
Compared with 2020, the domestic fashion market is predicted to grow by 8.8 percent by 2024,
reaching USD 26.288 million in value [1][2]. The online apparel and fashion market was expected
to reach USD 113 million in 2018 and is growing at a rate of 18 percent per year, with mobile
shopping accounting for approximately 40 percent of the total.
The fashion industry also includes accessories such as purses, shoes, and jewellery. The
automation, development, and digitization of manufacturing and distribution have resulted in a
new paradigm for the fashion industry. As the industry becomes more digital, systems evaluate
customers' fashion desires and attempt to match them as quickly as possible. As a result,
digital technology in the fashion industry is attracting a diverse range of consumers who,
thanks to the shorter manufacturing cycle, have access to a greater variety of products.
Customers' use of smart gadgets has increased to more than 10 million as a result of the
introduction of a new e-commerce platform [3]. Traditional garment sales channels such as
brick-and-mortar stores have seen sales decline, whereas non-store retailers such as online and
mobile merchants have seen sales increase at the fastest rate. The online fashion market grew
by 16 percent per year from 2011 to 2016, and from January to September 2017 it grew by 20
percent over the previous year [4]. It is expected to become a major channel in the fashion
industry in the coming years.
Because it provides a hassle-free shopping experience and delivery to the user, e-commerce has
transformed the world of consumerism and triggered an increase in demand for goods. E-commerce
platforms offer a greater selection of fashion products than traditional retail stores, so shopping
for fashion apparel online differs significantly from shopping in a store. As a result, a system
that helps consumers search for the goods they desire and recommends suitable products is
becoming increasingly important to businesses. Communication in the fashion industry cannot
rely solely on words, because visual cues and design elements are so important. The existing
fashion retrieval system employs a text-based retrieval approach built on product attribute
information (product name, category, brand name, and so on), and it is not without flaws. A
text-based search strategy has limitations when it comes to providing appropriate search
results in the fashion industry, which includes a significant design component. Numerous
"shopping how" and "smart lens" systems have been developed in recent years, but they have
failed to deliver good or noteworthy results when searching for products using fashion images
as a starting point. To use these systems, users must either find an image of the goods online
or take their own photographs.
Two major issues affect the industry, and this study discusses them from both the seller's and
the buyer's perspectives [5]. To sell a product on an e-commerce site, sellers must upload
photos of the product and add relevant labels to its description. Because humans are involved,
this process is prone to errors. Product misclassification can cause products to be missed in
search results, resulting in lower or no sales for the company that makes them. Machine
learning models can classify the photographs with high accuracy, which in turn helps vendors
categorize them correctly. In addition, a customer's purchase may be delayed because he or she
does not know the correct terminology [5]. When shopping on an e-commerce website, a customer
typically enters the product's keywords into the search bar. The site's search algorithm
compares the keywords entered by the user with the product labels stored in its database and
returns results that are relevant to the user's inquiry. When the user locates the desired
product, he or she orders it directly from the search results page. A text-based search
requires the consumer to have a thorough understanding of the product and to know the terms
that should be entered into the search bar. This is not always the case. In our daily lives we
come into contact with a wide variety of things whose names we do not know. Occasionally, we
are unable to conduct a product search on the e-commerce website due to technical difficulties.
Visual search techniques can overcome this difficulty. Shoppers who conduct visual searches
look for products using images or other visual cues rather than keywords. Customers can search
for similar products by simply taking a picture of what they want and uploading it to the
visual search engine.
Any visual search algorithm can benefit from machine learning models, which can be used to learn
attributes of new photographs and search for comparable products. Once the target image has been
uploaded, the visual search engine returns additional images with similar features so that the
search results are more relevant. For example, autoencoders and other visual search methods can
generate the latent properties of images. To further enhance the retrieval of image embedding
features, pre-trained deep neural network models can be applied to the problem [4][6][7].
Social media and capable smartphones give users access to a large number of images from all
over the Internet at once. In these circumstances, the ability to search for, filter, and
organize photos becomes increasingly important. Manually searching for the desired images is
an option when the collection is small, but it becomes impossible as the number of images
grows. To keep up with this rapid expansion, it is necessary to develop picture retrieval
systems that are capable of operating across a wide range of platforms. Our goal is to develop
a retrieval system that is accurate in handling and querying a database of photographs [8][9].
To summarize, this paper will pursue three broad objectives:
• Image Labeling: In e-commerce, appropriate labeling of products is important for
purchasing, yet many images on the web are unlabeled [10].
• Image Classification: To design and train different neural network models to learn from
large sets of product images from an e-commerce website [11].
• Image Search: To use autoencoders and cosine similarity to identify similar images [12][13].
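The image search objective ranks catalogue images by the cosine similarity between feature
vectors. The following is a minimal sketch of that ranking step only, assuming each image has
already been reduced to an embedding (for example by an autoencoder). The function names, the
product ids, and the three-dimensional vectors are hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


def top_k_similar(query, catalogue, k=5):
    """Return the k (product_id, score) pairs most similar to the query embedding."""
    scored = [(pid, cosine_similarity(query, emb)) for pid, emb in catalogue.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings; a real encoder would emit much longer vectors.
catalogue = {
    "shirt_a": [0.9, 0.1, 0.0],
    "shirt_b": [0.8, 0.2, 0.1],
    "watch_a": [0.0, 0.1, 0.9],
}
results = top_k_similar([1.0, 0.0, 0.0], catalogue, k=2)
print(results)
```

Because cosine similarity ignores vector length, two images whose embeddings point in the same
direction score highly even if their magnitudes differ.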
2. Transfer Learning
Deep learning [14] belongs to a large family of machine learning methods [15]. Unlike
traditional neural network classifiers, deep learning classifiers use numerous hidden layers
to identify the salient low-level features of a picture. Transfer learning is a deep learning
technique that reuses the features an artificial neural network learned on a previous problem
to solve a new problem within the same domain. Transfer learning [16][17][18] is advantageous
for several reasons.
• First, it saves computational time by reusing information from a previous training
process rather than starting from scratch with a new model [16].
• Second, it builds on the knowledge and experience gained through the use of previous
models [17].
• Third, transfer learning is extremely beneficial when the new training dataset is
small [18].
Transfer learning has the potential to benefit a wide range of applications, including computer
vision, audio categorization, and natural language processing. Many attempts have been made to
automate the classification of images, either to speed up the process or to improve accuracy.
The convolutional neural network (CNN) was developed as one of the first approaches to the
picture categorization problem. With the work of Krizhevsky et al. [19], CNNs outperformed all
other methods for picture classification in 2012, achieving state-of-the-art performance in
the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) and surpassing other commonly
used machine learning algorithms. In all of these scenarios, training the weights of the deep
network from scratch requires a significant amount of time and a large amount of data (hundreds
of thousands of images). These constraints make deep learning algorithms extremely difficult
to apply in image domains where only a few images are available [20]. Annotating fashion
images requires a significant amount of time and knowledge, and this is where transfer learning
may be beneficial. Using an architecture that has already been trained eliminates the need to
start from scratch. Several studies have used CNNs to identify fashion images, either through
transfer learning or through novel architectures. According to [24], deep CNN-based fashion
image categorization models are presented in the reviewed literature [21–35]. Some critical
open questions are presented here to improve the utilization of transfer learning in the
fashion industry.
Table I: Top-5 accuracy, top-1 accuracy, and the number of parameters of AlexNet, VGG,
Inception, and ResNet in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC).
Architecture   Number of Parameters   Top-5 Accuracy   Top-1 Accuracy
Xception       22,910,480             84.60%           63.30%
VGG-19         138,357,544            91.90%           74.40%
GoogLeNet      23,000,000             92.2%            74.80%
ResNet-152     25,000,000             94.29%           78.57%
DenseNet       8,062,504              93.34%           76.39%
AlexNet        62,378,344             94.50%           79.00%
Index  ID     Gender  Master category  Sub-category  Article type  Colour     Season  Year    Usage   Product name
0      15970  Men     Apparel          Topwear       Shirts        Navy Blue  Fall    2011.0  Casual  Turtle Check Men Navy Blue Shirt
1      39386  Men     Apparel          Bottomwear    Jeans         Blue       Summer  2012.0  Casual  Peter England Men Party Blue Jeans
2      59263  Women   Accessories      Watches       Watches       Silver     Winter  2016.0  Casual  Titan Women Silver Watch
3      21379  Men     Apparel          Bottomwear    Track Pants   Black      Fall    2011.0  Casual  Manchester United Men Solid Black Track Pants
6      30805  Men     Apparel          Topwear       Shirts        Green      Summer  2012.0  Ethnic  Fabindia Men Striped Green Shirt
7      26960  Women   Apparel          Topwear       Shirts        Purple     Summer  2012.0  Casual  Jealous 21 Women Purple Shirt
8      29114  Men     Accessories      Socks         Socks         Navy Blue  Summer  2012.0  Casual  Puma Men Pack of 3 Socks
f) Auto Labelling
To allow automatic labeling (assigning unique values), the model must be able to
understand what is depicted in the picture as the information is processed. It has
to be trained to know which tag should be attached to each data unit based on article
type.
g) Normalization
Normalization is a method often used in machine learning data preparation. Its objective
is to bring the numerical column values of the dataset to a similar scale without
distorting the values or losing information.
h) Creating a List with Unique Values
A list of unique values is created, and a column is added that holds the position of
each element in that list. The labeled dataset is stored for training the model.
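The unique-value step can be sketched as a simple label encoding: collect the distinct article
types, fix their order, and store each row's position in that list as its label. This is an
illustrative sketch only. The helper name `build_label_index`, the `label` column name, and the
choice of alphabetical ordering are assumptions, not details given in the paper.

```python
def build_label_index(article_types):
    """Map each distinct article type to its position in a sorted list of unique values."""
    unique_values = sorted(set(article_types))
    return {name: position for position, name in enumerate(unique_values)}


# A few rows in the spirit of the product metadata; only the fields needed here.
rows = [
    {"id": 15970, "articleType": "Shirts"},
    {"id": 39386, "articleType": "Jeans"},
    {"id": 59263, "articleType": "Watches"},
    {"id": 21379, "articleType": "Track Pants"},
    {"id": 30805, "articleType": "Shirts"},
]

label_index = build_label_index(row["articleType"] for row in rows)

# Add the numeric label as a new column on every row.
for row in rows:
    row["label"] = label_index[row["articleType"]]

print(label_index)
print([row["label"] for row in rows])
```

Sorting the unique values keeps the mapping stable between runs, so a stored labeled dataset
and a later prediction step agree on which integer stands for which article type.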
• We create an augmented data frame that holds each file name and maps it to the product id.
• We also perform labeling, an important task in which we assign a unique id to each
article type.
• Gray scaling converts the colored images into grayscale images.
Normalization is a technique that is commonly used in data processing for machine learning.
The goal of normalization is to transform the values of the dataset's numeric columns to
comparable scales without distorting ranges of values or losing information. Figure 5 shows
sample output images.
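The gray scaling and normalization steps can be illustrated on a tiny image held as nested
lists. This is a sketch under stated assumptions: the luminance weights (0.299, 0.587, 0.114)
are the common ITU-R BT.601 values, and dividing by 255 to reach the range [0, 1] is one usual
way to normalize 8-bit pixels. The paper does not state which weights or scaling it used.

```python
def to_grayscale(rgb_image):
    """Convert rows of (R, G, B) pixels into rows of single luminance values."""
    return [
        [0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
        for row in rgb_image
    ]


def normalize(gray_image, max_value=255.0):
    """Scale pixel values from [0, max_value] to [0, 1]."""
    return [[pixel / max_value for pixel in row] for row in gray_image]


# A hypothetical 2x2 image: white, black, pure red, pure blue.
image = [
    [(255, 255, 255), (0, 0, 0)],
    [(255, 0, 0), (0, 0, 255)],
]
gray = to_grayscale(image)
normalized = normalize(gray)
print(normalized)
```

After both steps every pixel is a single value on a common scale, which is the property the
normalization paragraph asks for.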
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{3}$$
Precision is defined as the number of true positives divided by the sum of true positives
and false positives. This measure is concerned with correctness: it indicates how "precise"
the model is when it predicts the positive class, that is, how many of its positive
predictions are correct.
$$\text{Precision} = \frac{TP}{TP + FP} \tag{4}$$
The F-score is the harmonic mean of precision and recall. It is primarily concerned with the
positive class. A high value of this metric indicates that the model performs well on the
positive class.
$$\text{F-score} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{5}$$
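Equations (3) to (5) can be checked with a short calculation. The sketch below assumes recall
follows the usual definition TP / (TP + FN). The function name and the counts are made up for
illustration and are not results from the paper.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F-score from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Guard against empty denominators; returning 0.0 there is a convention chosen here.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f_score = 2 * precision * recall / (precision + recall)
    else:
        f_score = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


# Hypothetical counts for one positive class.
metrics = classification_metrics(tp=40, tn=45, fp=10, fn=5)
print(metrics)
```

With these counts the accuracy is 0.85 and the precision is 0.8. The F-score of about 0.842
lies between the precision and the recall, as expected of a harmonic mean.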
We plot the accuracy evolution graph, the training and validation accuracy graph, the
confusion matrix, and the model accuracy graph. We also display 3-4 results on internal images
and 5-6 results on external images for different classes (shirts, T-shirts, watches, sarees,
jeans, jewelry, etc.). A confusion matrix is a table that is commonly used to describe the
performance of a classification model (or "classifier") on a set of test data for which the
true values are known. The confusion matrix itself is simple to understand, but the
terminology connected with it may be confusing.
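A small example may make the confusion matrix concrete. In this sketch rows are true classes
and columns are predicted classes. That orientation is a common convention assumed here, and
the labels are invented for illustration rather than taken from the experiments.

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels, classes):
    """Return a matrix where entry [i][j] counts items of class i predicted as class j."""
    counts = Counter(zip(true_labels, predicted_labels))
    return [[counts[(t, p)] for p in classes] for t in classes]


classes = ["Shirts", "Jeans", "Watches"]
true_labels = ["Shirts", "Shirts", "Jeans", "Watches", "Watches", "Jeans"]
predicted = ["Shirts", "Jeans", "Jeans", "Watches", "Shirts", "Jeans"]

matrix = confusion_matrix(true_labels, predicted, classes)
for name, row in zip(classes, matrix):
    print(name, row)

# Correct predictions sit on the diagonal, so accuracy is their share of all items.
correct = sum(matrix[i][i] for i in range(len(classes)))
accuracy = correct / len(true_labels)
```

Off-diagonal entries show which classes are confused with each other. Here one shirt is
predicted as jeans and one watch is predicted as a shirt.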
A. Overall Performance Analysis
The performance analysis of Master Category, ArticleType, SubCategory, and MasterGender
Category is given in Table II, Table III, Table IV, and Table V.
Table II. Master Category based Classification
Category         Total classes   Splitting dataset   Training Accuracy   Validation Accuracy   No. of epochs
masterCategory   7               80:20               0.99                0.98                  50
masterCategory   7               70:30               0.99                0.97                  50
masterCategory   7               50:50               0.99                0.96                  50
Table III. SubCategory based Classification
Category      Total classes   Splitting dataset   Training Accuracy   Validation Accuracy   No. of epochs
subCategory   45              80:20               0.99                0.90                  50
subCategory   45              70:30               0.99                0.93                  50
subCategory   45              50:50               0.99                0.91                  50
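The tables report results for 80:20, 70:30, and 50:50 splits of the dataset into training and
validation parts. One simple way to produce such a split is sketched below. The shuffle, the
fixed seed, and the function name are assumptions, since the paper does not describe how the
split was drawn.

```python
import random


def split_dataset(items, train_fraction, seed=0):
    """Shuffle a copy of the items and cut it into training and validation parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# 100 hypothetical sample ids split 80:20.
train_ids, validation_ids = split_dataset(range(100), train_fraction=0.8)
print(len(train_ids), len(validation_ids))
```

Every item lands in exactly one of the two parts, so validation accuracy is measured on
samples the model did not see during training.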
Apparel 21397
Accessories 11274
Footwear 9219
Personal Care 2403
Free Items 105
Sporting Goods 25
Home 1
• Prediction Results
The results of the predictions are given in Figure 11. PC indicates predicted class and TC
indicates True Class of product.
[Figure panels: External image as input; Top 5 results]
Figure 18: Search results for External image or photograph given as input
Acknowledgments
Ms. Smita Bhoir
M.E. Computer Engineering, PhD (pursuing). She is pursuing her PhD at K.J. Somaiya College
of Engineering, Mumbai, India. She has 12 years of R&D experience. She has published 15 research
papers, delivered 45 technical talks, and organized 40 workshops and training programs. Her areas
of specialization are Data Science, Deep Web Mining, and Wireless Networks.
Dr. Sunita Patil
PhD, Computer Engineering. She is working as Vice Principal, Dean Academics, and Professor in the
Department of Computer Engineering, K.J. Somaiya IEIT, Sion, Mumbai, India. She has more than
20 years of R&D experience. She has published 30 research papers, organized more than 50
workshops, and delivered more than 50 technical talks. Her area of specialization is Data Mining.
REFERENCES