Fashion Recommendation System Using Machine Learning


Proceedings of the Fourth International Conference on Smart Electronics and Communication (ICOSEC-2023)

IEEE Xplore Part Number : CFP23V90-ART ; ISBN : 979-8-3503-0088-8

Fashion Recommendation System using Machine Learning
Lingala Sivaranjani
Assistant Professor, School of Computing, Mohan Babu University, Tirupati, India

Sandeep Kumar Rachamadugu
Assistant Professor, Dept. of CSE, G. Pulla Reddy Engineering College, Kurnool, A.P., India

B.V. Suresh Reddy
Dept. of CSE, Koneru Lakshmaiah Education Foundation, Vaddeswaram, A.P., India
2023 4th International Conference on Smart Electronics and Communication (ICOSEC) | 979-8-3503-0088-8/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICOSEC58147.2023.10275967

Basi Reddy A
Assistant Professor, School of Computing, Mohan Babu University, Tirupati, India

M. Sakthivel
Professor, School of Computing, Mohan Babu University (Erstwhile Sree Vidyanikethan Engineering College), Tirupati, India

Sivakumar Depuru
Assistant Professor, School of Computing, Mohan Babu University (Erstwhile Sree Vidyanikethan Engineering College), Tirupati, India

Abstract- In the past few years, fast fashion has become very popular, which has had a great impact on the textile and fashion industries. Fashion is an integral part of daily life and has a significant impact on identity and self-expression. With the increasing availability of digital platforms and e-commerce websites, the fashion industry has transformed, and the way one shops for clothing has evolved. The rise of e-commerce has brought about a rise in the use of fashion recommendation systems. These systems aim to offer personalized product recommendations to users according to their interests and preferences. The field of machine learning has made significant strides in the development of image processing techniques, parsing techniques, image classification, segmentation, and networking, making it an ideal candidate for powering these recommendation systems. With these advancements in technology, the potential for fashion recommendation systems to provide even more accurate and individualized product recommendations is substantial. In this study, the dataset is split into training and testing data for machine learning. A convolutional neural network (CNN) is used to produce similar items in the recommendation system. The CNN layers are upgraded by using RESNET50, and content filtering is based on the data provided for each product. RESNET50 helps overcome the vanishing gradient problem. The KNN algorithm is then used to recommend similar items. The main idea behind KNN is the use of Euclidean distance and cosine similarity, which help in finding similar products. The user's past behavior is significant here, and the CNN model was also utilized for picture categorization and recognition. In the experimental results, the system achieved retrieval accuracy that outperformed the baseline.

Keywords: Deep Learning Model, Image Recognition, CNN, KNN, RESNET50, Image Dataset.

I. INTRODUCTION
Fashion is an ever-evolving industry. Given the growth of e-commerce and online shopping, consumers have access to a vast array of clothing and accessories. Through unique fashion attire, people's outward appearance acts as a representation of their inner perceptions. However, with so many options available, it can be challenging for consumers to find items that suit their personal style and preferences. The popularity of personalized fashion recommendations has given rise to the development of fashion recommendation systems that rely on machine learning techniques. These systems consider a user's previous fashion choices and interests and take into account factors such as datasets and recommendation algorithms such as CNN and KNN. Fashion recommendation systems are gaining traction in the fashion industry due to their ability to provide customized suggestions based on user preferences, which are achieved through the use of machine learning and deep learning techniques such as CNNs and KNN. CNNs are particularly useful in fashion recommendation systems due to their image recognition capabilities, which allow for the extraction of visual information from fashion images. This information can be used to identify similar items based on visual characteristics such as style, pattern, and colour. Fashion recommendation systems using CNNs have the potential to improve customer satisfaction and increase sales for fashion retailers. Overall, fashion recommendation systems using CNNs are a promising area of research that has the potential to revolutionize the fashion industry by providing personalized recommendations to consumers. These systems can be used to recommend clothing, accessories, and other fashion items to online shoppers. Clothing is said to carry great significance for how individuals appear outwardly and for casual communication based on preferences, personalities, jobs, social position, and view on life. Customers can now keep up with worldwide fashion trends, which influence their shopping selections.

Several factors impact consumer fashion interests, including geography, personal preferences, societal influences, age, gender, season, and culture. Combining the visual attributes of fashion preferences with the above-mentioned aspects of clothing choices may help marketers better understand consumer preferences. As a result, researching customer preferences and suggestions is advantageous for both fashion designers and retailers. Also,


data on customer preferences for clothing and products is now accessible online in the form of text, comments, and photographs. The goal of the recommender system is to present consumers with potential complementary items. Unfortunately, text-based search has several drawbacks and overlooks other aspects of the product photos, including colors, patterns, textures, and shapes. In a text-based search, the user must be specific about each product's description, and the description might not cover the complete item. Here, the study suggests a potential fix: base the search for the appropriate product on a supplied input image. Through the feature engineering process, we attempt to train the model on different features, such as colors, textures, and object shapes in the image. ConvNets, or convolutional neural networks, are a type of artificial neural network that has become increasingly popular in computer vision applications like object detection, image categorization, and segmentation. They have been shown to achieve impressive results in various tasks and are designed to learn feature hierarchies directly from raw input data like images. A typical CNN architecture consists of several layers, including the input, convolutional, activation, pooling, fully connected, and output layers, which work together to extract features from the input data. In the context of a CNN-based recommendation system, these layers are utilized to suggest items to the user. To address the issue of vanishing gradients often encountered in deep neural networks, the RESNET50 architecture can be implemented.

RESNET50 is a 50-layer residual network which has 48 convolution layers and 2 pooling layers. Initially, we enter the data and use the input layer to pre-process its properties. Each attribute feature vector is created by the embedding layer, which may be regarded as extracting features from the pre-processed data. Following the embedding process, the fully connected procedure is used to link attribute features and establish the user feature and item feature. The predicted rating is then calculated using the user and item features. For recommendation, the top-k goods with high predicted ratings but no user ratings are picked. The secret to its success is the use of local connections and weight sharing. On the one hand, this reduces the number of weights, which makes the networks easier to optimize. On the other hand, utilizing deep neural networks for recommendations may mitigate the risk of overfitting. Humans also tend to overlook the temporal context of recommendation requests, such as time of day, situational context, and other temporal factors. Incorporating such temporal contextual factors can improve the accuracy and relevance of recommendations by providing a more holistic understanding of the user's preferences and behavior. Deep neural networks are capable of learning these temporal patterns from large amounts of data, resulting in more personalized and accurate recommendations. With this, image processing is completed. KNN is then used in this study to recommend comparable items. Only users in the target's cluster are searched when utilizing the nearest neighbor technique, making it extremely efficient. The algorithm finds the k closest training instances in the feature space to a particular test sample and bases its output on the labels or values of these nearest neighbors.

II. LITERATURE SURVEY
Many studies have been conducted in recent years to find an accurate method for fashion recommendation systems.

Some of the studies include:

One approach to creating accurate fashion recommendation systems is through parametric distance transformations, as demonstrated by McAuley et al. [1]. This method calculates the distance between different clothing combinations, with the goal of recommending better-fitting combinations that have a smaller distance. The authors also utilized photographs to generate ideas for alternative designs. This approach can be useful for recommending clothing combinations that are both aesthetically pleasing and well-fitting, providing a more personalized experience for the user.

Sattar et al. [2] developed a method for fashion recommendation that utilizes a multi-photo technique to evaluate the user's body shape. This approach can create a conditional model of clothing category recommendations that cater to the user's specific body shape, leading to more personalized recommendations. The study also showed that clothing categories and body shapes are related in real-world data, and that the multi-photo technique used in their study outperforms models based on single-view shape estimations or manually annotated body types. Overall, this approach can be helpful in improving the accuracy and relevance of fashion recommendations for individual users.

Hsiao et al. [3] created an algorithm that assembles a small selection of fashion goods that yields the most mix-and-match ensembles. The program has the capacity to emulate professional stylists.

Q. Wu, P. Zhao, and Z. Cui proposed a Visual and Textual Jointly Enhanced Interpretable (VTJEI) model for fashion recommendation systems [4]. This model leverages both visual and textual information to provide more accurate suggestions and explanations. Additionally, they developed a bidirectional two-layer adaptive attention review model that can capture users' preferences and provide textual explanations by highlighting specific terms.

Y. Hu et al. [5] proposed a recommendation system that recommends groups of fashion items that interact with each other, rather than individual items. They used functional tensor factorization to model user-fashion item interactions and a gradient boosting-based approach to transfer the feature vectors from the feature space to a low-dimensional latent space.

Kang et al. [6] developed a method to generate new fashion item photos according to the user's choices and product category. They used the Generative Adversarial Networks framework to learn the distribution of fashion photographs and generate new items that optimize customers' preferences.

Wang et al. [7, 19] built a fashion network based on the VGG-16 architecture and a bidirectional convolutional recurrent neural network. They used domain-specific grammars to improve message transmission over grammar topologies and provided regularized landmark layouts.

Tan et al. [8], based on the updated Xception model, propose a clothing picture categorization technique. We


provide a clothing image classification system based on the improved Xception model. To begin, the last fully connected layer of the original network is replaced with another fully connected layer to recognise eight classes rather than 1,000 classes. Second, our network's activation function utilizes both the exponential linear unit (ELU) and the rectified linear unit (ReLU), which may improve the network's nonlinear and learning capabilities.

Meshkini et al. [9] used the Fashion-MNIST dataset to evaluate the performance of a CNN-based deep learning architecture, and the results were compared with GoogLeNet, VGG, ResNet, DenseNet, and SqueezeNet to determine which architecture performed the best.

Duan et al. [10] used the VGG-11 network to classify the Fashion-MNIST dataset. They used small stacked convolution kernels instead of larger convolution kernels to learn more complex patterns at a lower cost and added a multilayer nonlinearity layer and a batch normalization layer to improve model performance.

Chen et al. [11, 18] incorporated the SE-Net module into their model to strengthen the useful feature channels while weakening the useless ones. Multi-scale depthwise separable convolution was used to boost the richness of the model's feature information.

III. PROPOSED TECHNOLOGY AND FEATURES
The system is categorized into three parts:
• Image Pre-processing
• Training and Testing Model
• Recommendation Model

Image Pre-processing:
A dataset of fashion-related photos, which must be further filtered, is needed for the proposed system. Image processing is the practice of manipulating digital photographs using computer algorithms. The fundamental objective of image processing is to enhance image pixels and refine image data by removing noise. A picture is nothing more than a two-dimensional array of integers between 0 and 255. It is defined by the mathematical function f(a, b), where a and b are the horizontal and vertical coordinates, respectively. The value of f(a, b) at a particular point is the pixel value of the image at that point.

The following procedures are used to pre-process the image:

Orientation:
The metadata associated with a photograph tells our computers how to display the input image relative to how it is stored on disk.

Resize:
Although changing an image's size can seem easy, there are some considerations to make. Although many model architectures demand square input images, few devices actually capture perfectly square images. One method is to stretch the image to make it square; another is to keep the image's aspect ratio and add extra pixels to fill in the resulting "dead space."

Random Flips:
Our model must take into account the fact that an item need not always be read from left to right or top to bottom. This is done by randomly reflecting the image around its x- or y-axis.

Training Model and Testing Model:
Training a convolutional neural network requires a dataset of labelled data, along with weight training and expression labelling. This involves training the network on a set of input data with known labels and updating its weights accordingly. The labelled data is crucial in enabling the network to learn and identify patterns in the input data, which allows it to make accurate predictions, or recommendations in the case of a recommendation system. The input for training is a .csv file. The input file is first split into two parts, a training set and a testing set, of 80% and 20% respectively. The dataset is first subjected to pre-processing and then categorized. The image pixel values and the image's index in the categories list are stored in an array called training. Following training, the data is tagged and exhibited as arrays. The dataset's features are normalised using the Keras API. To ease training, the convolutional network is trained using an image dataset. The dataset contains many labels according to the cloth types and models. The clothing collections generated from the training process yield optimal results when utilized in conjunction with the training data. The trained model, when tested on the images in the dataset, provides results according to the trained images. According to the cloth type, the result should be produced with related pictures in the output.

Recommendation Model:
The Recommendation model is the actual implementation of the fashion recommendation system (FRS), where several algorithms and techniques such as CNN, KNN, and ResNet50 are used to produce highly efficient results.

Algorithm:
The following step-by-step procedure is used to produce the FRS:
Step 1: The search query is given as an input image.
Step 2: From the image, the searching method is implemented.
Step 3: The image is resized into the prescribed shape and passed into the RESNET50 network.
Step 4: The network produces one single feature vector. Then we calculate the Euclidean distance between this feature vector and each stored feature vector.
Step 5: Using the calculated distances, similar images are produced.

Fig. 1. Algorithm Flowchart
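Steps 4 and 5 of the algorithm can be illustrated with a minimal sketch. It assumes the feature vectors have already been extracted; the function name top_k_similar and the four-dimensional toy vectors are hypothetical stand-ins for ResNet50 features, not the implementation used in this study.

```python
import math


def top_k_similar(query_vec, catalog_vecs, k=5):
    """Return (index, distance) pairs for the k catalog vectors closest to query_vec."""
    # Step 4: Euclidean distance between the query vector and every stored vector.
    scored = [(math.dist(query_vec, vec), idx) for idx, vec in enumerate(catalog_vecs)]
    # Step 5: the smallest distances correspond to the most similar images.
    scored.sort()
    return [(idx, dist) for dist, idx in scored[:k]]


# Toy 4-dimensional vectors standing in for ResNet50 feature vectors (made-up values).
catalog = [
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 3.0, 4.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
]
query = [1.0, 0.0, 0.0, 0.0]
recommendations = top_k_similar(query, catalog, k=2)
print(recommendations)  # [(1, 0.0), (0, 1.0)]
```

Sorting (distance, index) pairs keeps the ranking deterministic when two items are equally distant. For a large catalog, an indexed nearest-neighbour search would normally replace this linear scan.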
Convolution Neural Network:


As a deep learning technique, a convolutional neural network takes a picture as input and assigns importance to relevant characteristics in the image, allowing them to be distinguished from one another. Compared to other classification techniques, a ConvNet requires much less pre-processing. Whereas filters in primitive methods must be engineered by hand, ConvNets can learn these filters and characteristics. Their architecture is analogous to the connectivity of neurons in the human brain and is modelled after how the visual cortex is organized. Individual neurons react to stimuli only within a small portion of the visual field, called the receptive field. A group of such fields overlap to cover the entire visual region. The main goal of this field is to allow computers to perceive the world in a manner similar to that of living things. Furthermore, computers can leverage this knowledge for a diverse range of tasks, including video and image recognition, image analysis and image classification, media reproduction, recommender systems, natural language processing, and more. This is because deep learning models, such as convolutional neural networks, can extract complex features and patterns from large amounts of data, enabling them to perform various tasks with a high degree of accuracy and precision.

Deep learning developments in machine vision have been built and enhanced over time, particularly around one approach, the artificial neural network. A vast number of linked neurons with trainable weights and biases make up CNN models. Layers of neurons are distributed throughout the architecture of a CNN. It has an input layer, many hidden layers, and output units. A network is deemed a deep convolutional neural network if it includes a sizable number of hidden layers. As opposed to fully connected networks, a CNN's layers attach only to a small portion of the input vector produced by the preceding layer. This reduces the number of connection weights (parameters) in a CNN compared to a multilayer perceptron (MLP). For this reason, a CNN trains more quickly than fully connected networks of comparable size. CNNs frequently take two-dimensional (2-D) arrays, such as images, as input. A convolutional neural network is a feed-forward system in which the connectivity pattern between nodes is inspired by the animal visual cortex. CNNs are a regularized type of multilayer perceptron. Multilayer perceptrons are usually constructed as fully connected networks, meaning that each neuron in a layer is linked to every neuron in the subsequent layer. The complete interconnectivity of these networks makes them susceptible to overfitting the data. CNNs instead learn and extract features from input data, such as images or videos, by applying filters to the input and reducing the spatial dimensions of the data through pooling operations. As a result, CNNs are commonly used in computer vision applications for tasks such as image recognition and object detection.

CNN Architecture:
A standard CNN structure comprises convolutional and pooling layers, in addition to one or more fully connected layers. The initial input to the network is a raw image, which undergoes convolution with a set of learnable filters, generating a set of feature maps. These feature maps are then subjected to a non-linear activation function, such as the rectified linear unit (ReLU), to introduce non-linearity into the network.

Fig. 2. CNN Architecture

After passing through the activation function, the feature maps are processed by a pooling layer, which reduces their spatial dimensionality. The most commonly used method for this is max pooling, which takes the largest value within a small window of the feature map and passes it to the next layer. This process of convolution and pooling is repeated several times, with each layer learning more complex and abstract features of the input image. Finally, the feature maps are flattened into a vector and fed through one or more fully connected layers. These layers produce a vector of probabilities, one for each possible class label of the input image.

Training:
Training a CNN involves minimizing a loss function that measures the difference between the network's predicted output and the actual labels of the input images. The cross-entropy loss is commonly used in classification problems, and gradient descent is frequently used to optimize the process. In this process, the gradients of the loss function are computed with respect to the network weights, and the weights are updated in the direction that reduces the loss. Overfitting is a major challenge in training CNNs. It occurs when the network memorizes the training data rather than learning generalizable features. To prevent overfitting, various techniques have been developed, such as dropout, early stopping, and data augmentation. Dropout randomly drops a fraction of neurons during training, forcing the network to learn more robust features that are not dependent on specific neurons. Early stopping involves monitoring the network's validation loss during training and stopping the process when the validation loss starts to increase, indicating that the network is overfitting the training dataset. Data augmentation involves artificially increasing the size of the training dataset by applying random transformations, such as rotation, scaling, and translation, to the input images, which helps the network learn features that are invariant to these transformations and improves its ability to generalize to new data.

ResNet50:
ResNet50 is a deep neural network architecture developed in 2015 by Microsoft researchers. It belongs to the ResNet family of models, short for "Residual Networks," which focuses on using residual connections to enhance the performance of deep neural networks. Convolutional and pooling layers are commonly used in deep neural networks for image classification, object recognition, and segmentation tasks in computer vision. However, as the depth of these networks increases, they may encounter the vanishing gradient problem, where the gradients become too small to update the weights effectively during training,


resulting in reduced accuracy despite increasing network depth. Residual connections, on the other hand, help alleviate this problem by allowing gradients to bypass several non-linear activation functions, enabling deeper networks to be trained more effectively.

Residual Connections:
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun introduced residual connections in their 2016 paper titled "Deep Residual Learning for Image Recognition". The idea is to add a shortcut connection that bypasses one or more layers of the network, allowing the input to be added directly to the output of a later layer.
This can be represented mathematically as:
y = F(x) + x
where x is the input to the layer, F(x) is the output of the layer after applying the convolutional and activation functions, and y is the output of the layer after adding the shortcut connection. Intuitively, the residual connection allows the network to learn the difference between the input and output, rather than trying to learn the entire mapping from input to output. This makes it easier for the network to optimize the weights, especially for deep networks where the gradients can become very small.

Resnet50 Architecture:
ResNet50 is an architecture comprising 50 layers, consisting of convolutional layers, pooling layers, fully connected layers, and residual connections. The model takes a 224x224 RGB image as input and generates a probability distribution over 1000 classes as output.

The initial stage of the network involves a convolutional layer that has 64 filters and a kernel size of 7x7. This is followed by a batch normalization layer and a ReLU activation function. The ReLU function is used to mitigate the vanishing gradient problem. It is mathematically expressed as
f(a) = max(a, 0)
This function returns the input itself if the input is positive and returns 0 otherwise.

Next in the network architecture comes a max pooling layer with a stride of 2 and a 3x3 kernel size. This layer decreases the spatial dimensions of the input by a factor of 2.

The network is composed of four stages, with each stage having several residual blocks. In ResNet50, each residual block is made up of three convolutional layers. The first convolutional layer has a 1x1 kernel and is used to reduce the number of channels, the second has a 3x3 kernel, and the third has a 1x1 kernel that expands the number of channels again. Each convolution is followed by a batch normalization layer, with a ReLU activation function after the first two, and the output of each residual block is added to its input through a residual connection. The output of the final stage is fed through a global average pooling layer, which averages over the feature map's spatial dimensions, and a fully connected layer, which generates a probability distribution across the classes.

Fig. 3. RESNET50 Architecture

As previously stated, the 50-layer ResNet architecture is made up of the following components:
• A 7x7 convolution with 64 kernels and a stride of 2.
• A max pooling layer with a stride of 2.
• 9 more layers: a 1x1 convolution with 64 kernels, a 3x3 convolution with 64 kernels, and a 1x1 convolution with 256 kernels, with these three layers repeated 3 times.
• 12 further layers: a 1x1 convolution with 128 kernels, a 3x3 convolution with 128 kernels, and a 1x1 convolution with 512 kernels, repeated 4 times.
• 18 more layers: a 1x1 convolution with 256 kernels, a 3x3 convolution with 256 kernels, and a 1x1 convolution with 1024 kernels, repeated 6 times.
• 9 additional layers: a 1x1 convolution with 512 kernels, a 3x3 convolution with 512 kernels, and a 1x1 convolution with 2048 kernels, repeated 3 times.
• After average pooling, a fully connected layer with 1000 nodes, using the softmax activation function.

Two key design ideas govern the ResNet architecture. First, layers that produce feature maps of the same size have the same number of filters. Second, when the feature map's size is halved, the number of filters is doubled to keep the time complexity of each layer uniform.

ResNet, which stands for Residual Network, was designed to address the degradation in accuracy brought on by an excess of network layers. For the best-performing model, 16 to 30 layers are the appropriate number. A ResNet is made out of residual blocks with the ReLU function. The skip connection feature of residual blocks is significant. This feature makes the training of deeper networks easier because it adds the outputs of earlier layers to the outputs of later stacked layers, offering an alternative shortcut path for the gradient. Without any adjustments or structural changes, the model was able to deliver satisfactory results after being trained and evaluated on the dataset. The model's successful performance inspired us to test deep learning models.

Table 1. 9x9 grid of pooling layer
30  0 25 22 29 61  8  9 51
 8  6 24 45 36 25 47 25 36
19 18  7 48  2 30 37  4  1
15 28 19 51 59  4 18 23 24
 5 26  3  1 36  5 37 27 50
 9  8 12  0 15  2 25 48 52
17 26 29 12 10 28  9 78 55
 0  4  6 31 24 16 44 62 12
21  3  9 34  2  6 33 24 17
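The relationship between the 9x9 grid of Table 1 and the 3x3 grid of Table 2 can be checked with a short sketch. The helper max_pool_2d is a hypothetical name introduced for illustration. The tables correspond to non-overlapping 3x3 windows (a stride of 3), which is an inference from the numbers and differs from the stride of 2 used by the ResNet50 pooling layer.

```python
def max_pool_2d(grid, window, stride):
    """Slide a window x window square over grid and keep the maximum of each position."""
    rows, cols = len(grid), len(grid[0])
    pooled = []
    for r in range(0, rows - window + 1, stride):
        out_row = []
        for c in range(0, cols - window + 1, stride):
            out_row.append(
                max(grid[r + dr][c + dc] for dr in range(window) for dc in range(window))
            )
        pooled.append(out_row)
    return pooled


# Values of Table 1 (9x9 input grid).
TABLE_1 = [
    [30, 0, 25, 22, 29, 61, 8, 9, 51],
    [8, 6, 24, 45, 36, 25, 47, 25, 36],
    [19, 18, 7, 48, 2, 30, 37, 4, 1],
    [15, 28, 19, 51, 59, 4, 18, 23, 24],
    [5, 26, 3, 1, 36, 5, 37, 27, 50],
    [9, 8, 12, 0, 15, 2, 25, 48, 52],
    [17, 26, 29, 12, 10, 28, 9, 78, 55],
    [0, 4, 6, 31, 24, 16, 44, 62, 12],
    [21, 3, 9, 34, 2, 6, 33, 24, 17],
]

# Non-overlapping 3x3 windows reproduce the 3x3 grid of Table 2.
pooled = max_pool_2d(TABLE_1, window=3, stride=3)
print(pooled)  # [[30, 61, 51], [28, 59, 52], [29, 34, 78]]
```

With the same 3x3 window and a stride of 2, the 9x9 grid would instead yield a 4x4 output, since the windows overlap.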
Table 2. 3x3 grid max pooling layer
30 61 51
28 59 52
29 34 78

K Nearest Neighbor:
KNN is a machine learning technique that uses shared fashion photos to identify groups of people who have similar traits and then uses the average ratings of the top k nearest neighbors to produce suggestions. Both content


based learning and collaborative based learning are categories under which KNN falls. Memory-based algorithms assume no model: they operate directly on the values of recorded interactions and rely primarily on nearest neighbor search, for example finding the users closest to a user of interest and suggesting the most popular items among these neighbors. The user-user strategy looks for users with the closest interaction profiles (nearest neighbours) in order to suggest the items that are most popular among these neighbours (and that are new to our user). This strategy is known as "user-centered" because it models users based on how they interact with items and calculates the distance between them.

The KNN algorithm works by finding the k nearest neighbors of a given data point in the feature space. The feature space is defined by the input variables or features of the dataset. For example, if we have a dataset of students' test scores and we want to predict their final grade, the feature space would include variables such as math score, science score, and English score.

To determine the k nearest neighbors, the method calculates the distance between the provided data point and each data point in the dataset using a distance metric such as Euclidean distance, Manhattan distance, or Minkowski distance. It then selects the k neighbors with the shortest distances. Once the k nearest neighbors are identified, the method makes a prediction based on either the majority class or the average value of those neighbors: for classification tasks, the prediction is the majority class among the k nearest neighbors, whereas for regression tasks it is their mean value.

In the KNN algorithm, the value of k is a hyperparameter that must be tuned. It determines the number of neighbors considered when making a prediction. If k is too small, the algorithm may be affected by outliers or noise in the data. On the other hand, if k is too large, the algorithm may not capture the local structure of the data.

To choose the optimal value of k, we can use techniques such as cross-validation or grid search. Cross-validation involves dividing the dataset into training and validation sets and training the algorithm with various k values on the training set. The algorithm's performance is then assessed on the validation set, and the k value that produces the best performance is chosen. In grid search, a range of values for k is specified, and the algorithm is trained and evaluated for each value of k. The value of k that gives the best performance is then selected.

Determine Distance Metrics:

The Euclidean distance is used to compute the distance in order to retrieve similar images. The formula for Euclidean distance is:

$$d(p, q) = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2}$$

where p and q are the two n-dimensional real-valued vectors being compared. This is the most commonly used distance metric, and it applies only to real-valued vectors. The formula above gives the straight-line distance between the query point and the other point being measured.

IV. EXPERIMENTATION AND DISCUSSIONS

Dataset:
A dataset is a collection of data that has been formatted for use in machine learning, data analysis, and other applications. It is a structured and ordered body of data that is intended to be examined by a computer program. In this study, we used a publicly available fashion dataset containing images of men's and women's clothing items. The dataset consisted of 44,797 images with labels for clothing categories such as T-shirts, dresses, jeans, and skirts. We used a portion of the dataset for training the deep learning model and the remaining images for evaluation. The performance of the fashion recommendation system was evaluated using standard metrics such as precision, recall, and F1-score, calculated on the top-K recommendations provided by the system. The quality and size of the dataset can have a substantial influence on machine learning model performance. A larger and more diverse dataset is usually preferable for training models that are more accurate and generalize better to new data.

Context
We use data from the Women Apparel Recommendation Engine (amazon.com), downloaded from kaggle.com. In the dataset, each image is categorized by brand, color, and estimated price. Images are accepted only at the specified size of 128 x 128. The dataset also comes with a .json file that describes the items in more detail. In addition, the dataset consists of professionally shot, high-resolution product images.

Content
Each product is identified by a numeric ID such as xxxxxx. Using this ID, the image for the product can be fetched from image/xxxx.jpg and the complete metadata from tops_fashion/xxxxx.

Table 3. Dataset Table

Attribute           Description
Name                Fashion Product Images
Source              Kaggle
Features            Images and attribute labels
No. of Instances    44,797
File Format         JPEG images and CSV file with attribute labels
No. of Attributes   10
Attribute Types     Categorical (color, gender, article type), numerical (height, width, etc.)
Tasks               Image Classification, Attribute Recognition, Visual Search

This is a demonstration of a fashion recommendation system that utilizes a pre-trained ResNet50 deep learning model. The goal of the research is to achieve accurate results for the recommender system, and one of the key


findings is the performance boost achieved through the use of content-based information, such as product images, in the recommendation process. The app is built using Streamlit and loads a pre-trained ResNet50 model with weights that were trained on the ImageNet dataset. The model's weights are frozen (set to non-trainable).

The app loads a pre-saved set of image features and filenames for a dataset of fashion images, which have been extracted using the ResNet50 model. When a user uploads an image through the Streamlit interface, the app extracts the image features using the ResNet50 model and the extract_feature function and then uses a recommend function to find the 5 nearest neighbors (based on Euclidean distance) to the uploaded image's features in the pre-saved feature dataset. The app then displays the 5 recommended images in a row of 5 columns using the Streamlit interface. The interface as a whole showcases the use of a pre-existing deep learning model for extracting features from images, which then serve as the basis of a basic fashion image recommendation system.

The histogram of distances to the nearest neighbors is a useful tool for analyzing the similarity between a query image and the images in a dataset. The graph displays the distances between the query image and its nearest neighbors on the x-axis and the frequency of those distances on the y-axis. A higher frequency of shorter distances indicates that the query image is more like the images in the dataset, while a higher frequency of longer distances indicates that the query image is less similar. This information is valuable for evaluating the performance of a recommendation system and making improvements to it.

The resulting scatter plot shows the distribution of the feature vectors in the reduced 2D space, giving insights into the structure and similarity of the features.

For any dataset, feature extraction is the major task in processing the images, and hence the proposed system


is able to produce results faster than an existing system, with a decrease of 80.64 percent.

V. CONCLUSION
In conclusion, the fashion recommendation system using a convolutional neural network is an effective way to suggest personalized fashion choices to users. By analyzing the user's input images, the system can recommend outfits that are tailored to their style. The convolutional neural network approach provides an accurate and efficient way to process large amounts of data and make recommendations based on user behavior. Overall, this system has the potential to revolutionize the way people shop for clothes by providing a more personalized and convenient shopping experience. As technology advances, more sophisticated and accurate recommendation systems are expected to further enhance the user experience.
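
As a minimal sketch of the nearest-neighbor retrieval step described in Section IV, the following pure-Python example computes Euclidean distances between a query feature vector and a small set of stored vectors and returns the closest matches. The function bodies, the toy two-dimensional vectors, and the file names are illustrative assumptions, not the authors' implementation; `recommend` merely borrows the name used in the text, and real features would come from the ResNet50 extractor.

```python
import math
from typing import List, Sequence, Tuple


def euclidean_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Straight-line distance between two equal-length real-valued vectors."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((qi - pi) ** 2 for pi, qi in zip(p, q)))


def recommend(
    query: Sequence[float],
    features: Sequence[Sequence[float]],
    filenames: Sequence[str],
    k: int = 5,
) -> List[Tuple[str, float]]:
    """Return the k stored images closest to the query, nearest first."""
    scored = [
        (euclidean_distance(query, vec), name)
        for vec, name in zip(features, filenames)
    ]
    scored.sort()  # ascending distance; ties are broken by file name
    return [(name, dist) for dist, name in scored[:k]]


# Toy 2-D "feature vectors" (hypothetical; real extracted features are much longer).
features = [[0.0, 0.0], [3.0, 4.0], [1.0, 1.0], [6.0, 8.0]]
filenames = ["a.jpg", "b.jpg", "c.jpg", "d.jpg"]

# Ask for the 3 nearest stored images to a query located at the origin.
top = recommend([0.0, 0.0], features, filenames, k=3)
print(top)
```

A brute-force scan like this costs one distance computation per stored image, which is acceptable for small catalogs; for larger ones, an indexed nearest-neighbor search would be the usual substitute.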
