Final Research Paper
In today's era, online shopping is among the most practical ways to buy because it is not limited by
store hours. Ordering, payment, and delivery are completed quickly in a comfortable, user-friendly
environment. This research uses consumer behaviour to pinpoint current fashion trends and to inform
marketing plans. Using natural language processing (NLP), clustering techniques, and
recommendation systems, this research article investigates several facets of Myntra fashion
products. TF-IDF scores are used to extract keywords from text by identifying the most significant
and pertinent words and phrases. K-means clustering is used to improve product classification by
grouping related products automatically. A recommendation system that takes into account price,
individual category, rating, and review data can enhance customers' shopping experiences. The results
show that the dataset features well-known brands and fashion accessories, identified based on sales,
ratings, reviews, and price. Keyword extraction with TF-IDF scores can support marketing strategies
and e-commerce experiences, while K-means clustering can improve search results and customer
satisfaction. A system of recommendations based on ratings, reviews, prices, and categories might
improve the buying experience.
Keywords: Data Analyst, Myntra Fashion Products, Natural Language Processing (NLP), Clustering,
K-means clustering, Recommendation System.
Introduction
Myntra is an Indian e-commerce platform founded in 2007 by Ashutosh Lawania, Vineet Saxena, and
Mukesh Bansal. Initially focused on selling personalized gifts, Myntra transitioned to fashion and
lifestyle products in 2011. Although it was acquired by Flipkart in 2014, Myntra continues to operate
independently. Today the platform is a comprehensive destination for fashion enthusiasts, offering
clothing and accessories across categories such as sportswear, bags, watches, sunglasses, footwear,
and apparel. With a catalog of over 1 million fashion products, Myntra Fashion Products provides
product attributes such as name, description, price, brand, category, and images. The platform aims
to make fashion accessible to all through personalized recommendations, easy returns, and
responsive customer service for consumers across India.
The motivation for this study is the fashion industry's growing dependence on e-commerce sites such
as Myntra, as well as the difficulty of efficiently organizing and selecting a wide variety of fashion
items to satisfy changing consumer demands. As online shopping continues to gain popularity, it is
critical for e-commerce platforms to understand and address issues related to product discoverability,
personalized recommendations, inventory management, and customer satisfaction. Our research aims
to address these difficulties by employing data analytics approaches to identify the underlying
patterns and trends in Myntra's fashion product offerings.
Even though current algorithms and systems have significantly improved the accessibility and
functioning of e-commerce platforms, there is still much room for innovation and development,
especially in the retail fashion sector. Conventional recommendation systems, for example, frequently
depend on content-based or collaborative filtering techniques, which may not adequately represent
the complex tastes and style sensibilities of fashion-conscious customers. Similarly, clustering
algorithms can find it difficult to classify the wide variety of fashion products offered on sites like
Myntra, where styles and trends are ever-changing.
Our research attempts to close these gaps and open up new paths for enhancing user experiences and
promoting business growth by investigating techniques and algorithms suited to the complexities of
Myntra's product ecosystem. This paper explores the Myntra Fashion Products dataset and uses it to
develop a product recommendation system that can learn from user feedback and improve
recommendations over time. It examines various aspects of Myntra fashion products using Natural
Language Processing (NLP), recommendation systems, and clustering techniques. NLP is used to
analyze textual data associated with Myntra products, including product descriptions, user reviews,
and fashion trends. Recommendation systems are developed to enhance user engagement and
personalize shopping experiences. Clustering techniques organize Myntra's vast product inventory
into meaningful categories, optimizing inventory selection and providing intuitive browsing
experiences [7].
In summary, this research paper aims to contribute to the understanding of Myntra fashion products
through a multidimensional analysis using NLP, recommendation systems, clustering, and sentiment
analysis. The findings can inform strategic decisions, drive innovation, and enhance the
competitiveness of Myntra in the ever-evolving e-commerce landscape.
LITERATURE REVIEW
1. Awasthi et al. (2021) [1]: The increasing volume of internet data calls for ways to
convert it into understandable information. Text summarization, which addresses this
need, can be classified into extractive and abstractive methods. Extractive methods
build a summary by selecting relevant sentences from the source text, and many Natural
Language Processing (NLP) researchers focus on them. This paper studies extractive
and abstractive methods for summarizing texts, analyzing their effectiveness in
producing a more focused summary.
2. Boorugu et al. (2020) [4]: The increasing use of smartphones and the internet has led to a
rise in online shopping. To identify genuine products and make informed decisions, users
often read long reviews. Text summarization, a field of interest for NLP researchers, can
help reduce lengthy texts to short, concise sentences. This paper surveys various text
summarization techniques, including approaches that use seq2seq models, LSTMs, and
attention mechanisms for improved accuracy.
3. Manojkumar et al. (2023) [6]: This study investigates which algorithm best summarizes
reviews on websites like Yelp, focusing on text summarization techniques. The research
analyzes metrics such as cosine similarity, the bilingual evaluation understudy (BLEU)
score, and recall-oriented (ROUGE) analysis. Results show that LexRank, a text
summarization technique, outperforms the other methods, with precision and recall
values of 0.586 and 0.346, respectively.
4. Bellini et al. (2023) [3]: This paper proposes a recommendation system for fashion retail
shops that uses a multi-clustering approach to predict the purchase behavior of newly
acquired customers. The system uses mining techniques and is validated both in store
and online. The authors note that current marketing solutions focus on popular items,
losing sight of customer centricity and personality. The system was developed in the
Feedback project, partially funded by Regione Toscana, and tested on Tessilform and
Patrizia Pepe.
5. Goyani et al. (2020) [5]: Recommendation systems are popular tools for automated
decision-making, particularly in the movie industry. These systems use collaborative and
content-based filtering to find useful information, and combining these methods can
improve results. This paper surveys state-of-the-art methods for movie
recommendation, including content-based filtering, collaborative filtering, hybrid
approaches, and deep learning based methods.
6. Sharma et al. (2021) [8]: In today's tech era, startups and companies need effective
promotion mechanisms such as recommender systems to communicate better with users.
These systems predict and show desired items, helping companies decide which products
to launch in the market. They are beneficial in various domains, including music,
books, movies, and research articles.
7. Prasad (2007) [7]: This paper presents Recommend Ex, a knowledge-based product
recommendation system for B2C e-commerce that uses previous users' purchase
patterns to recommend products to new users. The system uses Case-Based Reasoning
Plan Recognition and Automated Collaborative Filtering approaches, and is tested in a
simulated environment.
8. Armstrong et al. (2020) [2]: The study explores the practice of framing a price as a
discount from an earlier one, arguing that a higher initial price signals a high-quality
product and that consumers with reference-dependent preferences are more likely to buy
if they perceive the price as a bargain. The welfare effects of regulation to prevent
fictitious pricing depend on consumer sophistication and naivety.
OBJECTIVE:
Data description:
We use a dataset obtained from the Kaggle website. The Myntra Fashion Product dataset is a
large and diverse dataset of fashion products from the Indian e-commerce
platform Myntra. The dataset contains 184913 rows and 12 columns and covers a
wide variety of product information, including product descriptions, categories, attributes,
and prices. It is a valuable resource for research in areas such as fashion
recommendation systems, cluster analysis, and fashion trend analysis.
Data Preprocessing:
Data preprocessing is a crucial step in machine learning, ensuring the quality and format of
raw data for data mining. It involves tasks like handling missing values, removing outliers,
scaling features, and encoding categorical variables. This process ensures the data is in the
right format and ready for model training.
Typical data preprocessing steps include the following:
[Figure: data preprocessing diagram; recoverable labels: Data Preprocessing, Data Transformation]
Data Cleaning: Data cleaning involves identifying and addressing missing values
in a dataset. This can be done by removing records with missing values or by imputing
them using techniques like mean, median, or regression. The goal is to ensure data accuracy,
consistency, and reliability.
The first step is to check for missing values in the dataset.
The check reveals that two columns, DiscountPrice (in Rs) and DiscountOffer, have missing
values. These missing values can be replaced in Microsoft Excel using a formula
based on the relationship between DiscountPrice (in Rs), OriginalPrice (in
Rs), and DiscountOffer.
Imputing data that is missing: [8] By using the following formula we can replace the
missing values
\[
\text{DiscountPrice (in Rs)} = \frac{\text{OriginalPrice (in Rs)} \times \left(100 - \text{DiscountOffer (in \%)}\right)}{100}
\]
After imputation of missing values, we can see that there is no missing value present in the
dataset.[8]
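The missing-value check and the formula-based imputation described above can also be sketched in Python with pandas. This is a minimal illustration, not the procedure used in the study (which the text describes as an Excel formula). The toy rows are invented, the column names follow the text, and DiscountOffer is assumed to be already stored as a numeric percentage.

```python
import numpy as np
import pandas as pd

# Toy rows standing in for the Kaggle data; values are invented for illustration.
df = pd.DataFrame({
    "OriginalPrice (in Rs)": [1000.0, 2000.0, 1500.0, 800.0],
    "DiscountOffer": [20.0, np.nan, 40.0, 50.0],   # assumed numeric percent
    "DiscountPrice (in Rs)": [800.0, 1500.0, np.nan, 400.0],
})


def impute_discounts(frame: pd.DataFrame) -> pd.DataFrame:
    """Fill missing discount fields from the price/discount relationship."""
    out = frame.copy()
    original = out["OriginalPrice (in Rs)"]

    # Missing discounted price: OriginalPrice * (100 - DiscountOffer) / 100
    computed_price = original * (100 - out["DiscountOffer"]) / 100
    out["DiscountPrice (in Rs)"] = out["DiscountPrice (in Rs)"].fillna(computed_price)

    # Missing discount percentage: the same relationship, solved for DiscountOffer
    computed_offer = 100 * (1 - out["DiscountPrice (in Rs)"] / original)
    out["DiscountOffer"] = out["DiscountOffer"].fillna(computed_offer)
    return out


missing_before = df.isnull().sum()   # count of missing values per column
clean = impute_discounts(df)
missing_after = clean.isnull().sum()

print(missing_before)
print(missing_after)
```

Rows in which both DiscountPrice (in Rs) and DiscountOffer are missing cannot be recovered this way and would remain empty.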
Model development
From the above figure, it is clear that "Roadster" is the brand with the highest sales volume
when compared to other brands. The Roadster brand sells more than 20,000 units. Out of all
brands, HERE&NOW has the second-highest sales volume.
From the above figure, we can see that the category labelled "Western" is the most preferred
one compared to other categories. This suggests that most shoppers in this dataset prefer
western clothing over other types.
Distribution Plot on Product Ratings:
From the above figure, the dataset's rating distribution is skewed to the left (the mean of 3.7
lies below the mode of 4). This suggests that there are more ratings of 4 than any other rating value.
This distribution plot also makes it clear that while some customers were not happy with the
products, the majority of customers had a good experience.
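The shape of a rating distribution can be checked numerically alongside the plot. The sketch below is illustrative only: the ratings are invented toy values, and the column name "Ratings" is an assumption about the dataset. A negative skewness indicates a longer tail toward low ratings, which is the pattern expected when the mean lies below the mode.

```python
import pandas as pd

# Toy ratings; invented values, not taken from the Myntra dataset.
ratings = pd.Series([4, 4, 4, 5, 4, 3, 4, 5, 2, 4, 1, 4, 5, 3, 4], name="Ratings")

mean_rating = ratings.mean()          # arithmetic mean
mode_rating = ratings.mode().iloc[0]  # most frequent rating
skewness = ratings.skew()             # sample skewness; negative means a left tail

print(f"mean={mean_rating:.2f}, mode={mode_rating}, skewness={skewness:.2f}")
```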
Extract keywords from product descriptions by using NLP technique:
Natural Language Processing (NLP) techniques are used to extract keywords from
product descriptions, improving search and recommendation systems. The Term
Frequency-Inverse Document Frequency (TF-IDF) score is applied: it assigns higher
weights to terms that occur frequently within a description and penalizes terms that
are common across all descriptions. The process involves tokenization, removing stop
words, calculating TF and IDF, and selecting the words with the highest TF-IDF scores.
A simplified example in Python using the scikit-learn library:
Interpretation:
Our conclusion from the above-mentioned clusters is that the K-means algorithm can be used
to group product descriptions and individual categories into sets of similar items.
With the help of these clusters, we can quickly identify the category we want based on
the product description.
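The grouping of product descriptions can be sketched as follows. This is an illustrative example under assumptions, not the study's configuration: the descriptions are invented, TF-IDF vectors are assumed as the input features, and the number of clusters (3) is chosen to match the toy data.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Invented descriptions covering three kinds of products.
descriptions = [
    "men blue slim fit denim jeans",
    "men black skinny fit denim jeans",
    "women red printed cotton kurta",
    "women yellow embroidered cotton kurta",
    "unisex white running sports shoes",
    "unisex grey training sports shoes",
]

# Represent each description as a TF-IDF vector.
features = TfidfVectorizer(stop_words="english").fit_transform(descriptions)

# n_clusters=3 is an assumption for this toy data; random_state fixes the run.
kmeans = KMeans(n_clusters=3, n_init=20, random_state=42)
labels = kmeans.fit_predict(features)

for label, text in sorted(zip(labels, descriptions)):
    print(label, text)
```

On real data the number of clusters is not known in advance and would need to be chosen, for example by comparing inertia or silhouette scores across several values of k.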
Methodology
The Myntra Fashion Industry study uses data from online sources to
examine several brands of Myntra fashion products throughout India. The process
entails gathering the dataset, preprocessing and exploring it, creating features,
performing clustering analysis, building recommendation systems, and evaluating the models.
The dataset, which was acquired from Kaggle, includes details about a range of fashion
products available on the Myntra platform, such as the product category, brand, cost, and
user feedback.
Dataset identification, missing data handling, and the creation of summary statistics are all part
of the EDA process. By identifying popular brands and fashion trends, feature engineering
improves the dataset's relevance and suitability for analysis. Clustering analysis uses
unsupervised machine learning algorithms like K-Means clustering to identify patterns and
groupings within the data, while model evaluation uses metrics like accuracy, precision, recall,
and F1-score.
By using exploratory data analysis, we are able to determine which clothing accessories and
brands are popular with consumers based on factors like price, ratings, reviews, and sales.
The distribution plot of product ratings shows that few customers have had bad experiences with
the products, and the majority of customers have had positive experiences.
We can conclude that TF-IDF scoring, an NLP technique, extracts keywords from text by
highlighting the most important and relevant words and phrases. Extracting keywords from
product descriptions can enhance marketing strategies, e-commerce experiences, and search
capabilities. It also facilitates the discovery of market trends and can help raise the
quality of products.
We can draw the conclusion that by automatically assembling comparable products into
clusters, K-means clustering enhances both product classification and user experience. This
improves the relevance and accuracy of search results, cuts down on search time, and raises
customer satisfaction.
We can create a recommendation system that enhances customers' shopping experiences by
taking into account factors like price, individual category, rating, and review. The tailored
suggestions facilitate decision-making. Customers can find products within their budgets
by using this method, which indicates that the recommendations are relevant
to the needs of the users.
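A recommendation step based on price, category, rating, and review count can be sketched as a simple filter-and-rank rule. This is an illustrative sketch, not the system built in the study. The product rows are invented, the column names are assumptions, and the function recommend with its min_reviews threshold is hypothetical.

```python
import pandas as pd

# Invented catalog rows; column names are assumptions about the dataset.
products = pd.DataFrame({
    "Product": ["Shirt A", "Shirt B", "Shirt C", "Jeans D", "Kurta E"],
    "Category": ["Western", "Western", "Western", "Western", "Indian Wear"],
    "Price": [799, 1299, 999, 1499, 899],
    "Ratings": [4.3, 4.6, 3.9, 4.4, 4.5],
    "Reviews": [120, 15, 300, 80, 60],
})


def recommend(frame, category, max_price, min_reviews=20, top_n=3):
    """Filter by category, budget, and review count, then rank by rating."""
    candidates = frame[
        (frame["Category"] == category)
        & (frame["Price"] <= max_price)
        & (frame["Reviews"] >= min_reviews)
    ]
    # Higher rating first; review count breaks ties.
    ranked = candidates.sort_values(["Ratings", "Reviews"], ascending=False)
    return ranked.head(top_n)


picks = recommend(products, "Western", max_price=1000)
print(picks[["Product", "Price", "Ratings", "Reviews"]])
```

Requiring a minimum number of reviews is a design choice that keeps items with very few ratings from outranking well-reviewed ones. A production system would more likely combine such rules with content-based or collaborative filtering.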
References:
1. Awasthi, I., Gupta, K., Bhogal, P. S., Anand, S. S., & Soni, P. K. (2021, January). Natural
language processing (NLP) based text summarization-a survey. In 2021 6th International
Conference on Inventive Computation Technologies (ICICT) (pp. 1310-1317). IEEE.
2. Armstrong, M., & Chen, Y. (2020). Discount pricing. Economic Inquiry, 58(4), 1614-1627.
3. Bellini, P., Palesi, L. A. I., Nesi, P., & Pantaleo, G. (2023). Multi clustering recommendation
system for fashion retail. Multimedia Tools and Applications, 82(7), 9989-10016.
4. Boorugu, R., & Ramesh, G. (2020, July). A survey on NLP based text summarization for
summarizing product reviews. In 2020 Second International Conference on Inventive
Research in Computing Applications (ICIRCA) (pp. 352-356). IEEE.
5. Goyani, M., & Chaurasiya, N. (2020). A review of movie recommendation system:
Limitations, Survey and Challenges. ELCVIA: electronic letters on computer vision and
image analysis, 19(3), 0018-37.
6. Manojkumar, V. K., Mathi, S., & Gao, X. Z. (2023). An experimental investigation on
unsupervised text summarization for customer reviews. Procedia Computer Science, 218,
1692-1701.
7. Prasad, B. (2007). A knowledge-based product recommendation system for
e-commerce. International Journal of Intelligent Information and Database Systems, 1(1),
18-36.
8. Sharma, J., Sharma, K., Garg, K., & Sharma, A. K. (2021). Product recommendation system a
comprehensive review. In IOP conference series: materials science and engineering (Vol.
1022, No. 1, p. 012021). IOP Publishing.