Best Product Recommendation System
The history of recommendation systems dates back to the early 1990s, and they have been evolving continuously ever since. Recommendation systems have become an integral part of web applications, mainly because e-commerce websites carry such large product inventories that the data to be displayed to the user can be overwhelming. As a result, users may face difficulties searching for the product they are looking for, and this is where recommendation systems come into play. The Harvard Business Review has noted that these recommendation algorithms are the key difference between “born digital” enterprises and legacy companies. This is mainly because a recommendation engine can provide personalized product suggestions to users based on their browsing patterns and previous purchase history. Because it provides these personalized recommendations, users are more likely to come back and make further purchases in the interactive environment provided by our system. With more visits from users to our website, we can collect more data about the users and the products, which in turn provides a chance to identify areas of improvement for the products. The better the quality of the products, the more likely we are to acquire more users. This is a continuous cyclic process, as shown in the figure below [1]. A recommendation system also helps increase the sales of items related to a particular product. For example, if a user buys a monitor from the website, the recommendation system will suggest different types of keyboards and mice, increasing the sales of items that the user may not have searched for on their own. Recommendation systems also reduce the load on the database, as they provide personal recommendations to each individual user instead of displaying the whole inventory.
Recommendation systems are usually based on two different filtering techniques, namely collaborative filtering and content-based filtering. Content-based filtering is based on content, i.e., item-item relationships: we form a relationship, or cluster, among a group of items and display the relevant items to the customer when the customer searches for one item from the cluster. Collaborative filtering is based on user behaviour: we form a cluster of different customers depending on their previous purchases and history, and then recommend similar products to the remaining users in the cluster.
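To make the collaborative filtering idea concrete, below is a minimal item-based sketch using cosine similarity over a toy rating matrix. The ratings, user indices, and function names are illustrative only, not drawn from the project's dataset or codebase.

```python
import numpy as np

# Toy user-item rating matrix (rows: users, columns: items); 0 means "not rated".
ratings = np.array([
    [5, 3, 0, 0],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

def item_similarity(R):
    """Cosine similarity between the item columns of the rating matrix."""
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0               # guard against division by zero
    R_normed = R / norms
    return R_normed.T @ R_normed

def recommend_items(R, user, k=2):
    """Score the user's unrated items by similarity-weighted ratings."""
    sim = item_similarity(R)
    scores = R[user] @ sim                # weight items by similarity to rated ones
    scores[R[user] > 0] = -np.inf         # never re-recommend already-rated items
    return list(np.argsort(scores)[::-1][:k])

print(recommend_items(ratings, user=0))
```

User 0 has rated items 0 and 1, so the sketch ranks the remaining items by how similar they are to those. Content-based filtering would follow the same scoring shape but compute the item-item similarity from product attributes rather than from co-ratings.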
Problem Statement:
Background: In the rapidly evolving landscape of e-commerce, providing personalized and
relevant product recommendations has become a critical factor for retaining and engaging
customers. Traditional recommendation systems often fall short in accurately understanding
individual preferences and fail to adapt to changing user behavior. This project aims to
develop a cutting-edge product recommendation system that leverages advanced machine
learning techniques to deliver highly tailored and timely recommendations, thereby
enhancing the overall customer experience.
Objectives: The primary objective of this project is to design, develop, and implement a state-
of-the-art product recommendation system capable of generating precise and contextually
appropriate product suggestions for each user. The system should address the following key
challenges:
1. User Personalization: Create a recommendation engine that understands the unique
preferences, interests, and behaviours of individual users, adapting to their evolving
tastes over time.
2. Real-time Adaptation: Develop mechanisms to dynamically adjust recommendations
based on the user's current context, such as browsing behaviour, purchase history, and
search queries.
3. Scalability and Efficiency: Build a system that can handle a large user base and a
vast product catalogue efficiently, ensuring low-latency response times even under
high load.
4. Cross-selling and Up-selling: Implement strategies to intelligently suggest
complementary or higher-value products, thereby increasing the average transaction
value.
5. Novelty and Diversity: Ensure that the system recommends a diverse set of products
to avoid monotony and introduce users to new and interesting items.
6. Cold Start Problem: Address the challenge of providing relevant recommendations
for new users who do not have an established history of interactions.
7. Explainability and Transparency: Incorporate mechanisms to provide users with
understandable explanations for the recommendations made, increasing trust and user
satisfaction.
8. Privacy and Data Security: Prioritize user privacy by implementing robust data
anonymization and protection measures to safeguard sensitive information.
9. A/B Testing and Evaluation: Establish a framework for rigorous testing and
evaluation of recommendation algorithms to continuously refine and optimize the
system's performance.
10. Integration and Deployment: Ensure seamless integration of the recommendation
system into the existing e-commerce platform, with provisions for easy scalability and
maintenance.
Deliverables:
1. Algorithmic Approach: Document the chosen recommendation algorithms and their
rationale, highlighting their suitability for addressing the specified objectives.
2. System Architecture: Provide a detailed architectural overview of the
recommendation system, outlining the components, their interactions, and the data
flow.
3. Data Collection and Preprocessing: Describe the process of acquiring, cleaning, and
preparing the data necessary for training and testing the recommendation models.
4. Model Training and Validation: Present the methodology for training and validating
the recommendation models, including hyperparameter tuning and performance
metrics.
5. User Interface (UI) Design: Develop an intuitive and user-friendly interface for
displaying the product recommendations within the e-commerce platform.
6. Performance Evaluation and Benchmarking: Conduct extensive experiments to
evaluate the system's performance in terms of recommendation accuracy, diversity,
and scalability.
7. Documentation and User Manual: Compile comprehensive documentation,
including a user manual, to facilitate easy integration, maintenance, and
troubleshooting of the recommendation system.
8. Presentation and Demonstration: Prepare a detailed presentation and live
demonstration showcasing the capabilities and effectiveness of the developed product
recommendation system.
By addressing these objectives and delivering the specified components, this project aims to
revolutionize the way products are recommended to users, ultimately leading to increased
customer satisfaction, higher conversion rates, and improved business outcomes for the e-
commerce platform.
Scope of research:
The scope of research encompasses a wide range of areas, each crucial for the successful
development and implementation of an effective recommendation system. Here are the key
aspects that should be considered:
1. Recommendation Algorithms:
- Explore and evaluate various recommendation algorithms such as collaborative filtering,
content-based filtering, matrix factorization, deep learning-based methods, and hybrid
approaches.
- Investigate the suitability of advanced techniques like reinforcement learning, contextual
bandits, and sequence modelling for personalized recommendations.
6. Evaluation Metrics:
- Define appropriate evaluation metrics for assessing the performance of recommendation
models, considering factors like accuracy, diversity, novelty, serendipity, and user
engagement.
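For instance, two of the accuracy metrics commonly paired with the factors above can be computed per user as follows; the item IDs and recommendation list are purely hypothetical.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision@k and recall@k for a single user's ranked recommendation list."""
    top_k = recommended[:k]
    hits = len(set(top_k) & set(relevant))
    precision = hits / k                                  # fraction of top-k that is relevant
    recall = hits / len(relevant) if relevant else 0.0    # fraction of relevant items retrieved
    return precision, recall

recommended = ["B01", "B07", "B03", "B09", "B02"]   # ranked output of some recommender
relevant = {"B03", "B02", "B11"}                    # items the user actually interacted with
p, r = precision_recall_at_k(recommended, relevant, k=5)
print(p, r)
```

Diversity, novelty, and serendipity need additional signals (item categories, popularity counts, surprise relative to a baseline), which is why they are usually reported alongside, not instead of, accuracy metrics.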
By considering these research areas, the project can aim to develop a comprehensive and
innovative product recommendation system that addresses the diverse needs of users and the
e-commerce platform. This multi-faceted approach will contribute to the creation of a system
that stands out in terms of accuracy, personalization, and overall user satisfaction.
Research Hypotheses:
1. Hypothesis 1: Personalized Recommendations Improve User Engagement
Null Hypothesis (H0): There is no significant difference in user engagement
metrics between personalized and non-personalized recommendations.
Alternative Hypothesis (H1): Personalized recommendations lead to a
statistically significant increase in user engagement, including click-through
rates, conversion rates, and time spent on site.
2. Hypothesis 2: Contextually Adapted Recommendations Enhance User
Satisfaction
Null Hypothesis (H0): There is no significant difference in user satisfaction
ratings between contextually adapted recommendations and static
recommendations.
Alternative Hypothesis (H1): Contextually adapted recommendations result in
a statistically significant improvement in user satisfaction scores.
3. Hypothesis 3: Incorporating Novelty and Diversity Increases User Interaction
Null Hypothesis (H0): There is no significant impact on user interaction
metrics when incorporating novelty and diversity in recommendations.
Alternative Hypothesis (H1): Recommendations that prioritize novelty and
diversity lead to a statistically significant increase in user interactions,
including exploring new products and categories.
4. Hypothesis 4: Explainable Recommendations Foster Trust and User Confidence
Null Hypothesis (H0): There is no significant difference in user trust and
confidence levels between explainable and non-explainable recommendations.
Alternative Hypothesis (H1): Explainable recommendations result in a
statistically significant increase in user trust and confidence.
5. Hypothesis 5: Effective Handling of Cold Start Problem Enhances New User
Engagement
Null Hypothesis (H0): There is no significant difference in engagement
metrics for new users between systems that address the cold start problem and
those that do not.
Alternative Hypothesis (H1): Systems that effectively handle the cold start
problem lead to a statistically significant increase in engagement for new
users.
6. Hypothesis 6: Cross-selling and Up-selling Strategies Boost Average Transaction
Value
Null Hypothesis (H0): There is no significant difference in average transaction
values between systems that incorporate cross-selling and up-selling strategies
and those that do not.
Alternative Hypothesis (H1): Systems that employ cross-selling and up-selling
strategies result in a statistically significant increase in average transaction
values.
7. Hypothesis 7: Privacy-Preserving Techniques Do Not Compromise
Recommendation Quality
Null Hypothesis (H0): Employing privacy-preserving techniques degrades
recommendation quality by more than a negligible margin.
Alternative Hypothesis (H1): Recommendation quality with privacy-preserving
techniques is equivalent, within a pre-specified margin, to that of systems
without them (an equivalence test, since the aim is to show no compromise).
8. Hypothesis 8: Advanced Machine Learning Techniques Outperform Traditional
Approaches
Null Hypothesis (H0): There is no significant difference in recommendation
accuracy between systems using advanced machine learning techniques and
those using traditional approaches.
Alternative Hypothesis (H1): Systems employing advanced machine learning
techniques achieve significantly higher recommendation accuracy.
These research hypotheses provide a framework for systematically evaluating different
aspects of the recommendation system, allowing for rigorous testing and validation of its
effectiveness in enhancing user experience and driving business outcomes.
Objectives:
1. Problem Statement: Clearly define the problem that your project aims to address.
Explain why a product recommendation system is needed and what challenges it aims
to overcome.
2. Scope of the Project:
Define the scope of your project, including the types of products or services
the recommendation system will cover.
Specify any limitations or constraints of the system.
3. User Persona and Target Audience:
Define the intended users of the recommendation system (e.g., e-commerce
shoppers, movie enthusiasts, etc.).
Explain their characteristics, preferences, and behaviors that are relevant to the
recommendation process.
4. Data Collection and Preprocessing:
Describe the data sources used for training the recommendation system (e.g.,
user behavior, product attributes, reviews, etc.).
Explain any preprocessing steps performed on the data, such as cleaning,
normalization, or feature extraction.
5. Algorithm Selection and Implementation:
Discuss the algorithms and techniques chosen for building the
recommendation system (e.g., collaborative filtering, content-based filtering,
hybrid methods, etc.).
Explain why these methods were chosen and how they address the problem.
6. Evaluation Metrics:
Specify the metrics used to evaluate the performance of the recommendation
system (e.g., accuracy, precision, recall, F1-score, etc.).
Justify the choice of these metrics and explain how they measure the
effectiveness of the system.
7. User Experience and Interface Design:
Describe the user interface and interaction flow of the recommendation
system.
Discuss any user experience considerations, such as ease of use,
personalization, and feedback mechanisms.
8. Performance Analysis and Results:
Present the results of the evaluation using the chosen metrics.
Compare the performance of different algorithms or configurations if
applicable.
Discuss any challenges faced and how they were addressed.
9. Recommendation System Enhancements:
Propose potential improvements or enhancements to the recommendation
system based on the findings from the evaluation.
Discuss any future work or research directions that could further optimize the
system.
10. Conclusion and Summary:
Summarize the key findings and contributions of the project.
Reflect on the effectiveness of the recommendation system in meeting its
objectives.
11. References:
List all the sources, papers, and tools that were referenced or used during the
project.
12. Appendices (if necessary):
Include any supplementary material, code snippets, or additional data that
supports the main content of the report.
With the increasing amount of information that people browse daily, how to quickly obtain
information items that meet people’s needs has become an urgent issue. Efforts in information
retrieval have brought great convenience to people, who tend to retrieve information by entering a
query or keywords. If information-intensive websites can proactively suggest products or
information items that users may be interested in, it will greatly improve the efficiency and
satisfaction with which users obtain information. Research in the field of recommender systems
originated precisely on this subject. Over the past years, tremendous progress has been made in
this area, from non-personalised to personalised and, more recently, to deep learning recommender
systems. Although recommender systems have been widely applied, there are still many issues and
challenges in designing high-quality recommender systems. To measure the quality of a
recommender system, a scientific and rigorous evaluation process is required. This report reviews
some existing, well-established recommender systems and investigates some existing metrics for
evaluating them. Besides, this report gives details of the project’s implementation: a web
application for the offline evaluation of three major collaborative filtering recommendation
algorithms, namely item-based, user-based, and matrix factorisation. The application supports a
wide range of readily configurable evaluation metrics for users to visualise the performance of
different recommendation algorithms. The project aims to provide a comprehensive platform for
designers to evaluate recommender systems and guide them in designing better recommender systems.
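As a rough sketch of the third algorithm, matrix factorisation learns low-dimensional user and item vectors whose dot products approximate the observed ratings. The SGD loop below is illustrative only; the toy matrix and hyperparameters are invented and this is not the application's actual implementation.

```python
import numpy as np

def factorise(R, n_factors=2, lr=0.02, reg=0.05, epochs=1000, seed=0):
    """SGD matrix factorisation over the observed (non-zero) entries of R."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = rng.normal(0, 0.1, (n_users, n_factors))    # user factor vectors
    Q = rng.normal(0, 0.1, (n_items, n_factors))    # item factor vectors
    observed = [(u, i, R[u, i]) for u in range(n_users)
                for i in range(n_items) if R[u, i] > 0]
    for _ in range(epochs):
        for u, i, r in observed:
            err = r - P[u] @ Q[i]                   # prediction error on one rating
            P[u] += lr * (err * Q[i] - reg * P[u])  # gradient step with L2 shrinkage
            Q[i] += lr * (err * P[u] - reg * Q[i])
    return P, Q

R = np.array([[5, 3, 0], [4, 0, 1], [1, 1, 5]], dtype=float)
P, Q = factorise(R)
print(P[0] @ Q[2])   # predicted score for user 0 on the unrated item 2
```

Once trained, any missing cell of the rating matrix can be filled with the corresponding dot product, which is what makes this approach an offline-evaluable recommender.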
Nowadays, a recommender system can be found in almost every information-intensive website. For
example, a list of likely preferred products is recommended to a customer when browsing a target
product on Amazon. Moreover, when watching a video clip on YouTube, a recommender system
suggests relevant videos to users by learning from the users’ previously generated behaviour. So to
speak, recommender systems have deeply changed the way we obtain information. Recommender
systems not only make it easier and more convenient for people to receive information, but also
offer great potential for economic growth. As more and more people realise the importance and
power of recommender systems, the exploration of designing high-quality recommender systems
has remained an active topic in the community over the past decade. Thanks to continuous efforts
in the field, many recommender systems have been developed and used in a variety of domains.
Based on this, a key question is how to assess the performance of recommender systems so that the
most suitable ones can be applied in certain contexts or domains. The answer lies in evaluating
recommender systems by conducting rigorous and scientific evaluation experiments. Evaluating
recommender systems has become increasingly important with their growing popularity in
applications. It is often the case that an application designer needs to choose between a set of
candidate recommendation algorithms, which can be achieved by comparing the performance of
these algorithms in evaluation experiments. Besides, evaluating recommender systems can help
researchers select, tune, and design recommender systems as a whole, because when designing a
recommender system, some key factors influencing the system’s quality often come to light in the
process of evaluation. For example, Herlocker et al. highlight that, when considering evaluation
metrics, evaluators should take into account not only accuracy metrics but also some extra quality
metrics, or beyond-accuracy metrics, which attach importance to the fact that users are often not
interested in the items that they already know and surely like, but sometimes in discovering new
items and exploring diverse items. Motivated by the importance of evaluating recommender
systems and the emphasis on comprehensive metric considerations in evaluation experiments, the
project aims to explore the most scientific method possible for evaluating recommender systems.
The implementation of the project is presented in the form of a web application.
evaluation metrics. It also stresses that the metrics should be available for comparison between algorithms.
• Offline evaluation. There are two main approaches to evaluating recommender systems:
online trials and offline experiments. Online trials use A/B testing to evaluate different
algorithms by analysing real usage, while offline experiments evaluate algorithm
components by using real-world rating datasets [7]. The scope of this project is
offline evaluation.
• Collaborative filtering. Among the recommendation algorithms that are well-established
and applied, the collaborative filtering (CF) algorithms [3, 2, 7, 4, 5, 11] are considered
the most widely used. Although the project concentrates on CF algorithms, the application
developed is extensible and can readily support the addition of new RS algorithms.
Overall, the project aims to provide a good platform for designers to evaluate recommender systems
and guide them in designing better recommender systems. Based on the project’s specification,
five main objectives have been extracted. Below is a brief description of each objective.
1. A web application. A web application is developed to provide GUI interfaces for users
to conduct experiments more easily and conveniently. The implementation process goes
from prototyping, to the design of pages, to coding, and finally to optimisation. Along the
way, the key idea borne in mind has been to make the front-end interfaces as easily
interactive as possible.
2. Implementation of three algorithms. The three algorithms are all collaborative filtering
algorithms: item-based, user-based, and matrix factorisation.
3. Evaluation visualisation. This objective can be considered the core part of the project. First,
three evaluation methodologies [32] should be integrated into the system: repeated random
sub-sampling, leave-one-out, and K-fold cross-validation. Next, some basic evaluation metrics
(here primarily accuracy metrics) are included. Last, the compare-and-contrast step,
namely metrics visualisation between multiple experiments, should be finished.
4. Session mechanism. The system is developed to support two different modes. The first
is that a user can access the application as a guest, which means there is no session
between the user and the server. The other mode is session mode, which allows users to
log in to the system using a unique session code. In this mode, users are in session with
the server and can use the services it provides. This objective is outside the project
specification. Here, the session mechanism is used to manage experiments and store users’
running data, including experiment information, running times, metric details, and other
core data of experiments.
5. Extension and enhancement. This objective is considered the advanced improvement for
the project. The first extension concerns the dataset: a file-uploading interface is provided
for users to supply a dataset in a certain format. In addition, the most important extension
is beyond-accuracy metrics, which is where the project's core implementation lies. The
beyond-accuracy metrics that are extended and enhanced include popularity and diversity,
among others.
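The three evaluation methodologies all split rating data into train and test portions; K-fold cross-validation, for example, can be sketched as below. The fold count, seed, and the RMSE accuracy metric are illustrative choices, not the project's fixed configuration.

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Yield (train, test) index arrays; each record is held out exactly once."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)
    folds = np.array_split(order, k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test

def rmse(y_true, y_pred):
    """Root mean squared error, a basic accuracy metric for predicted ratings."""
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.sqrt(np.mean(diff ** 2)))

# Each of the 10 records appears in exactly one test fold.
for train_idx, test_idx in kfold_indices(10, k=5):
    assert len(test_idx) == 2 and len(train_idx) == 8
print(rmse([4, 2, 5], [3, 3, 5]))
```

Repeated random sub-sampling differs only in drawing independent random splits each round, and leave-one-out is the K-fold case where K equals the number of records.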
SOFTWARE REQUIREMENTS:
• Database: MySQL
• Tools: Microsoft Visual Studio, Flask, Web Browser (Google Chrome or Firefox)
HARDWARE REQUIREMENTS:
• 6 GB RAM
• Intel Core i5 processor
Python is a versatile and dynamically-typed programming language that has gained immense
popularity since its creation in the late 1980s by Guido van Rossum. Known for its simplicity and
readability, Python is often praised for its elegant syntax, making it an ideal choice for both beginners
and experienced developers alike. Its design philosophy emphasizes code readability and ease of use,
which has contributed to its widespread adoption across various domains.
One of Python's major strengths lies in its extensive standard library, which provides a rich set of
modules and packages for tasks ranging from web development to scientific computing. This
extensive ecosystem significantly reduces the need to reinvent the wheel, allowing developers to
leverage existing code and focus on solving specific problems.
Python's versatility extends beyond traditional software development. It serves as a powerful tool for
tasks such as automation, data analysis, machine learning, artificial intelligence, scientific
simulations, and more. Its popularity in the field of data science is particularly notable, owing to
libraries like NumPy, Pandas, and Matplotlib, which facilitate efficient data manipulation, analysis,
and visualization.
The language's object-oriented nature allows for the creation of modular and reusable code,
promoting best practices in software engineering. Additionally, Python supports multiple
programming paradigms, including procedural, functional, and object-oriented, giving developers the
flexibility to choose the approach that best suits their needs.
One of Python's standout features is its strong community support. The Python community is known
for its inclusivity, helpfulness, and a wealth of resources, including extensive documentation, forums,
and an abundance of third-party libraries. This collective effort has propelled Python to become a go-
to language for a wide array of applications.
Furthermore, Python's cross-platform compatibility ensures that code written in Python can run
seamlessly on various operating systems, including Windows, macOS, and Linux. This flexibility is
invaluable for projects that need to be deployed in diverse environments.
In recent years, Python's influence has extended into emerging technologies like machine learning
and artificial intelligence, where libraries such as TensorFlow and PyTorch have become instrumental.
This has solidified Python's position as a leading language in cutting-edge fields.
In summary, Python's appeal lies in its simplicity, readability, extensive standard library, versatility,
and strong community support. These attributes, combined with its adaptability to a wide range of
applications, have firmly established Python as a powerhouse in the world of programming, making
it an essential tool for developers across various industries.
MYSQL:
MySQL is a robust and widely-used open-source relational database management system (RDBMS)
that has played a pivotal role in the world of data storage and retrieval since its inception in the mid-
1990s. Developed by a Swedish company called MySQL AB, it was later acquired by Sun
Microsystems and subsequently by Oracle Corporation. MySQL's popularity stems from its efficiency,
reliability, and scalability, making it a top choice for businesses, web applications, and a wide range of
software projects.
One of MySQL's key strengths lies in its ability to efficiently manage large volumes of structured data.
It employs a tabular structure based on tables with rows and columns, allowing for easy organization
and retrieval of information. This relational model, combined with the Structured Query Language
(SQL), provides a powerful and intuitive way to interact with databases. SQL, a standardized language
for managing relational databases, enables developers to perform a plethora of operations, including
querying, updating, and modifying data.
MySQL is well-known for its speed and performance optimizations. It utilizes various techniques,
such as indexing, caching, and multi-threading, to ensure swift data retrieval and manipulation. This
makes it particularly suitable for applications with high transactional demands, such as e-commerce
platforms and social media sites.
The platform's flexibility and cross-platform compatibility make it a versatile choice for a wide array
of applications. Whether used in conjunction with popular web development frameworks like PHP,
Python, or Java, or integrated into enterprise-level systems, MySQL seamlessly adapts to diverse
environments. Its support for multiple operating systems, including Windows, Linux, and macOS,
further extends its adaptability.
MySQL's open-source nature fosters a thriving community of developers, which has contributed to
its continuous improvement and evolution. This community-driven approach ensures that MySQL
remains up-to-date with the latest technological advancements and security measures. Additionally,
it has led to the development of a vast ecosystem of tools, libraries, and third-party applications that
enhance MySQL's functionality and compatibility with various platforms and technologies.
In recent years, MySQL has embraced features like JSON support, spatial data processing, and
advanced replication mechanisms, catering to modern application development requirements. It has
also been integrated with cloud platforms, allowing for seamless deployment and management of
databases in cloud environments.
In conclusion, MySQL stands as a stalwart in the world of relational database management systems,
owing to its efficiency, reliability, scalability, and open-source nature. Its adaptability to various
platforms, extensive community support, and continual evolution have solidified its position as a go-
to choice for businesses and developers seeking a robust and versatile database solution.
Pandas:
pandas is a software library written for the Python programming language for data manipulation and
analysis. In particular, it offers data structures and operations for manipulating numerical tables and
time series. It is free software released under the three-clause BSD license.
Numpy:
NumPy is a library for the Python programming language, adding support for large, multi-
dimensional arrays and matrices, along with a large collection of high-level mathematical functions
to operate on these arrays.
Scipy:
SciPy is a free and open-source Python library used for scientific computing and technical computing.
SciPy contains modules for optimization, linear algebra, integration, interpolation, special functions,
FFT, signal and image processing, ODE solvers and other tasks common in science and engineering.
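In a recommendation setting these libraries typically work together: most user-item pairs are unrated, so the rating matrix is stored sparsely. A small illustrative example (the indices and ratings below are made up):

```python
import numpy as np
from scipy.sparse import csr_matrix

# (user index, item index, rating) records, as they might come from a ratings file
users = np.array([0, 0, 1, 2])
items = np.array([0, 2, 1, 2])
vals = np.array([5.0, 3.0, 4.0, 2.0])

# CSR stores only the 4 observed ratings instead of all 9 cells of the dense grid
R = csr_matrix((vals, (users, items)), shape=(3, 3))
print(R.nnz)         # number of stored ratings
print(R.toarray())   # dense view, zeros where no rating exists
```

At real scale (millions of users and products) the dense grid would not fit in memory, which is why the project's collaborative filtering code works from records like these rather than a full matrix.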
SYSTEM DESIGN:
ARCHITECTURE:
USE CASE DIAGRAM:
ACTIVITY DIAGRAM:
SEQUENCE DIAGRAM:
ALGORITHMS:
In deep learning, a computer model learns to perform classification tasks directly from
images, text, or sound. Deep learning models can achieve state-of-the-art accuracy,
sometimes exceeding human-level performance. Models are trained by using a large set of
labeled data and neural network architectures that contain many layers.
Why does deep learning matter? In a word, accuracy. Deep learning achieves recognition
accuracy at higher levels than ever before. This helps consumer electronics meet user
expectations, and it is crucial for safety-critical applications like driverless cars. Recent
advances have improved to the point where deep learning outperforms humans in some tasks,
such as classifying objects in images.
While deep learning was first theorized in the 1980s, there are two main reasons it has only
recently become useful:
1. Deep learning requires large amounts of labeled data. For example, driverless car
development requires millions of images and thousands of hours of video.
2. Deep learning requires substantial computing power. High-performance GPUs have a
parallel architecture that is efficient for deep learning and, combined with clusters or
cloud computing, greatly reduce training time.
Deep learning applications are used in industries from automated driving to medical devices.
Automated Driving: Automotive researchers are using deep learning to automatically detect
objects such as stop signs and traffic lights. In addition, deep learning is used to detect
pedestrians, which helps decrease accidents.
Aerospace and Defense: Deep learning is used to identify objects from satellites that locate
areas of interest, and identify safe or unsafe zones for troops.
Medical Research: Cancer researchers are using deep learning to automatically detect cancer
cells. Teams at UCLA built an advanced microscope that yields a high-dimensional data set
used to train a deep learning application to accurately identify cancer cells.
Industrial Automation: Deep learning is helping to improve worker safety around heavy
machinery by automatically detecting when people or objects are within an unsafe distance
of machines.
Electronics: Deep learning is being used in automated hearing and speech translation. For
example, home assistance devices that respond to your voice and know your preferences are
powered by deep learning applications.
• Graph-Based Recommendations: Leverages user-item interaction graphs to infer
relationships and make recommendations based on graph algorithms. In recent years,
knowledge graphs have been used in recommender systems in order to overcome the
problem of user-item interactions sparsity and the cold start problem which CF methods
suffer from by leveraging properties about items and users and representing them in one
single data structure.
The figure below depicts a movie Knowledge Graph and shows how a knowledge-graph
based recommendation can be provided to the user u₂.
Beyond the simple lists of properties already managed by previous versions of recommender
systems, KGs represent and leverage semantically rich relations between entities. By
construction, KGs can easily be linked between each other. For example, it would be
straightforward to extend the graph from the figure above to include movies’ main
characteristics. One remarkable thing about KG recommender systems is their ability to make
use of the KG structure to provide better recommendations.
In general, existing KG-based recommendations can be classified into two main categories:
Path-based methods which explore “the various patterns of connections among items in a
KG to provide additional guidance for recommendations”. However, they heavily rely on
manually designed meta-paths which are hard to optimize in practice. In [5], Yu et al. use the
matrix factorization method to compute latent representation of entities for different sub-
graphs extracted from a heterogeneous KG, and then use an aggregation method to group all
the generated latent representation to compute a recommendation probability. Inspired by
the work proposed in [5], Zhao et al. consider the KG as a heterogeneous information
network (HIN). They extract path-based latent features to represent the connectivity
between users and items along different types of relation paths. The drawback of these
methods is that they commonly need expert knowledge to define the type and number of
meta-paths. With the development of deep learning algorithms, different models have been
proposed to automatically encode KG meta-paths through embeddings to overcome the
above mentioned limitations.
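As an illustrative sketch of the path-based idea described above (not the method of [5], and using entirely hypothetical users and items), candidate items can be scored by counting short user-item-user-item paths in a small interaction graph: the more co-raters connect a user to an unseen item, the higher that item's score.

```python
from collections import defaultdict

# Hypothetical user-item interactions (user -> set of items rated).
interactions = {
    "u1": {"m1", "m2"},
    "u2": {"m1"},
    "u3": {"m2", "m3"},
}

def path_based_scores(user, interactions):
    """Score unseen items by counting user -> item -> other-user -> item paths."""
    seen = interactions[user]
    scores = defaultdict(int)
    for item in seen:
        for other, their_items in interactions.items():
            if other == user or item not in their_items:
                continue
            for candidate in their_items - seen:
                scores[candidate] += 1  # one connecting path found
    return dict(scores)

print(path_based_scores("u2", interactions))
```

For user u2, who has only rated m1, the single path through co-rater u1 surfaces m2 as a candidate; real path-based systems weight such paths by relation type rather than counting them uniformly.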
SAMPLE CODE:
import numpy as np
import pandas as pd
import os
import math
import json
import time
import scipy.sparse
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
%matplotlib inline
# loading data
electronics_data = pd.read_csv("C:/Users/mukhu/OneDrive/Desktop/New folder/Dataset.csv",
                               names=['userId', 'productId', 'Rating', 'timestamp'])
electronics_data.head()
electronics_data.shape
electronics_data=electronics_data.iloc[:1048576,0:]
electronics_data.dtypes
electronics_data.info()
electronics_data.describe()['Rating'].T
with sns.axes_style('white'):
    # bar chart of the rating distribution
    sns.countplot(x='Rating', data=electronics_data)
plt.show()
print("The above bar chart represents the distribution of ratings by users. From the chart, the "
      "highest proportion of users rated the products 5.0, while the lowest number of users rated "
      "the products 2.0.")
print("-" * 50)
electronics_data.drop(['timestamp'], axis=1,inplace=True)
no_of_rated_products_per_user = electronics_data.groupby(by='userId')['Rating'].count().sort_values(ascending=False)
no_of_rated_products_per_user.head()
print("The output above shows the five users who have rated the most products.")
no_of_rated_products_per_user.describe()
quantiles = no_of_rated_products_per_user.quantile(np.arange(0, 1.01, 0.01), interpolation='higher')
plt.figure(figsize=(10, 10))
quantiles.plot()
# mark every 5th and every 25th quantile on the curve
plt.scatter(x=quantiles.index[::5], y=quantiles.values[::5], c='orange',
            label="quantiles with 0.05 intervals")
plt.scatter(x=quantiles.index[::25], y=quantiles.values[::25], c='m',
            label="quantiles with 0.25 intervals")
plt.ylabel('No of ratings by user')
plt.xlabel('Value at the quantile')
plt.legend(loc='best')
plt.show()
# Getting the new dataframe which contains users who have given 50 or more ratings
new_df = electronics_data.groupby("userId").filter(lambda x: x['Rating'].count() >= 50)
no_of_ratings_per_product = new_df.groupby(by='productId')['Rating'].count().sort_values(ascending=False)
fig = plt.figure(figsize=plt.figaspect(.5))
ax = plt.gca()
plt.plot(no_of_ratings_per_product.values)
plt.xlabel('Product')
ax.set_xticklabels([])
plt.show()
new_df.groupby('productId')['Rating'].mean().head()
new_df.groupby('productId')['Rating'].mean().sort_values(ascending=False).head()
new_df.groupby('productId')['Rating'].count().sort_values(ascending=False).head()
ratings_mean_count = pd.DataFrame(new_df.groupby('productId')['Rating'].mean())
ratings_mean_count['rating_counts'] = pd.DataFrame(new_df.groupby('productId')['Rating'].count())
ratings_mean_count.head()
print('The above output shows the products with the highest number of ratings.')
ratings_mean_count['rating_counts'].max()
plt.figure(figsize=(8,6))
plt.rcParams['patch.force_edgecolor'] = True
ratings_mean_count['rating_counts'].hist(bins=50)
plt.figure(figsize=(8,6))
plt.rcParams['patch.force_edgecolor'] = True
ratings_mean_count['Rating'].hist(bins=50)
print('The above histogram indicates that the ratings were negatively skewed. This implies that '
      'there were fewer products with lower ratings, while most products had higher ratings.')
plt.figure(figsize=(8,6))
plt.rcParams['patch.force_edgecolor'] = True
popular_products = pd.DataFrame(new_df.groupby('productId')['Rating'].count())
most_popular = popular_products.sort_values('Rating', ascending=False)
most_popular.head(30).plot(kind="bar")
from surprise import Dataset, Reader, KNNWithMeans, accuracy
from surprise.model_selection import train_test_split
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(new_df[['userId', 'productId', 'Rating']], reader)
trainset, testset = train_test_split(data, test_size=0.3, random_state=10)
# item-based KNN with Pearson similarity, per the implementation description
algo = KNNWithMeans(k=5, sim_options={'name': 'pearson_baseline', 'user_based': False})
algo.fit(trainset)
test_pred = algo.test(testset)
test_pred[1:100]
print('The output above shows the predicted ratings for user-product pairs in the test set.')
# get RMSE
accuracy.rmse(test_pred, verbose=True)
print("Root mean square error (RMSE) was used for evaluating the performance of the "
      "recommender on the test data. The RMSE was 1.3446, which implied that the "
      "recommender system performed moderately.")
new_df1 = new_df.head(10000)
ratings_matrix = new_df1.pivot_table(values='Rating', index='userId', columns='productId', fill_value=0)
ratings_matrix.head()
ratings_matrix.shape
X = ratings_matrix.T
X.head()
X.shape
from sklearn.decomposition import TruncatedSVD
SVD = TruncatedSVD(n_components=10)
decomposed_matrix = SVD.fit_transform(X)
decomposed_matrix.shape
#Correlation Matrix
correlation_matrix = np.corrcoef(decomposed_matrix)
correlation_matrix.shape
X.index[75]
i = "B00000K135"
product_names = list(X.index)
product_ID = product_names.index(i)
product_ID
correlation_product_ID = correlation_matrix[product_ID]
correlation_product_ID.shape
Recommend = list(X.index[correlation_product_ID > 0.65])
Recommend.remove(i)  # exclude the queried product itself
Recommend[0:24]
SAMPLE OUTPUT SCREENS:
IMPLEMENTATION:
The system was built using Python as the programming language. Python was chosen for its ease
of use and the large number of libraries available, which makes the implementation convenient. In
the initial step we consider the dataset that we need for the recommender system. The dataset has
four columns: the first column is the user ID, which is unique for every user; the second column is
the product ID, which is unique for every product in the dataset; the third column holds the rating
given to a product by the respective user; and the fourth column is a timestamp that records the
time of the rating. After considering the dataset, the next step is to perform Exploratory Data
Analysis (EDA), as it helps provide the context needed for building a good model. EDA yields
information used for identifying any errors in the dataset. We pre-process the dataset to check
that each field of all four columns holds accurate data and to avoid any null values. The results of
the pre-processing of the dataset can be observed in Table 1.
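A minimal sketch of the null-value check described above, using a small in-memory frame in place of the real CSV (the column names follow the dataset description; the values are made up):

```python
import pandas as pd

# Hypothetical stand-in for the real ratings CSV.
df = pd.DataFrame({
    "userId": ["A1", "A2", None, "A4"],
    "productId": ["P1", "P2", "P3", "P4"],
    "Rating": [5.0, 3.0, 4.0, None],
    "timestamp": [1, 2, 3, 4],
})

# Count missing values per column, then keep only complete rows.
print(df.isnull().sum())
clean = df.dropna(subset=["userId", "productId", "Rating"])
print(len(clean))
```

Here two of the four toy rows survive the check; on the real dataset the same `isnull().sum()` / `dropna()` pattern confirms whether any of the four columns contain missing entries.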
After reading the dataset from the CSV file, we want to ensure that all the columns are read into
the system correctly, without any missing values. Table 2 displays the total number of ratings,
users, and products in the dataset.
Using .info() and .dtypes, we can get information about the index dtype, column dtypes, and
memory usage. The results can be seen in Table 3.
To see the structure of the dataset, we have used the .describe() function, which gives information
about the count, mean, standard deviation, and other quantile values such as min, 50%, and max,
as shown in Table 4 below.
After reading the dataset and finding all the details, we move to the ratings column. Since the
system uses ratings from different users, we plot the overall ratings to see if they are well
distributed, as shown in Figure 4. The plot shows that five-star ratings are given most often by
users, while the lowest number of users rated products with two stars. To plot this graph we have
used the seaborn library, which is built on top of matplotlib and integrated with pandas.
From this dataset we drop the unnecessary columns to make it more usable. Next we find the
popular products among all the products and the average rating for each product. In the next step
we have to choose a classification algorithm to generate the recommendations. In this paper it
was decided to use the KNN classification algorithm, mainly because there is a large amount of
training data available compared to the number of features (columns) of the dataset, so KNN can
do a better job than SVM, another classification algorithm. The KNN algorithm is imported from
the “Surprise” Python library and is used to find similarity between the products, as shown in
Figure 5, based on the other products’ features and ratings, and on similar purchases made by
other users.
CONCLUSION AND FUTURE SCOPE:
The primary goal of this project is to provide recommendations to the user in an e-commerce
website by making use of machine learning algorithms. We have designed and implemented the
system using collaborative filtering and the Pearson correlation coefficient. The dataset considered
has the ratings given by other users to a specific product, and depending on the similarity between
the rated products, we try to recommend products to the current user. Future work on the project
includes improving the efficiency of the system. It should also be able to give appropriate
recommendations to users who have no previous purchase history, i.e., new users. In the future we
can try to use recurrent neural networks and deep learning. With the help of deep learning
techniques we can overcome some of the drawbacks of the matrix factorization technique: deep
learning uses recurrent neural networks to accommodate time in the recommender system, which
is not possible in the matrix factorization method. We can also work on providing sub-optimal
recommendations to the user, recording the user's reaction to them, and letting the system use
that feedback in the future.
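As a self-contained illustration of the Pearson-correlation step used in this system (with toy rating vectors, not the real dataset), two products' columns from a pivoted ratings matrix can be compared with `np.corrcoef`, exactly as in the sample code's correlation matrix:

```python
import numpy as np

# Hypothetical rating vectors for two products across the same five users
# (0 = not rated, matching the fill value used in the pivoted ratings matrix).
product_a = np.array([5.0, 4.0, 0.0, 3.0, 5.0])
product_b = np.array([4.0, 5.0, 0.0, 2.0, 5.0])

# Pearson correlation coefficient between the two products' rating patterns.
r = np.corrcoef(product_a, product_b)[0, 1]
print(round(r, 3))
```

Products whose coefficient with a query product exceeds a threshold (0.65 in the sample code) are added to the recommendation list; the two toy vectors above correlate strongly because users who liked one tended to like the other.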
Although this web application has been well established to provide a platform for designers to
evaluate recommender systems comprehensively using different evaluation metrics, there is still
room for improvement:
• Expand to support more algorithms. The application currently supports only three
collaborative filtering algorithms.
• Include more evaluation criteria. In the literature, many evaluation metrics have been proposed
to measure the performance of recommender systems, and there are also many variations of a
given metric, for example, many approaches for measuring coverage. Although the current
system has some important metrics implemented, different domains often require different
evaluation criteria. Hence, it is necessary to include more evaluation metrics in the system.
REFERENCES:
[7]. https://pandas.pydata.org/
[8]. https://matplotlib.org/
[9]. https://scikit-learn.org/stable/
[10]. http://surpriselib.com/
[11]. https://seaborn.pydata.org/
[12]. https://www.scipy.org/
[13]. Silvana Aciar, Debbie Zhang, Simeon Simoff, John Debenham, Recommendation System Based
(https://ieeexplore.ieee.org/document/4061458)
[14]. Kunal Shah, Akshaykumar Salunke, Saurabh Dongare, Kisandas Antala, Recommender systems:
[15]. Yuri Stekh, Mykhoylo Lobur, Vitalij Artsibasov, Vitalij Chystyak, Methods and tools for building
recommender systems.
(http://files.grouplens.org/papers/www10_sarwar.pdf)
(https://www.researchgate.net/publication/318493578_Product_Recommendation_System_from_Users_Reviews_using_Sentiment_Analysis)
(https://www.researchgate.net/publication/326555590_Product_Recommendation_Systems_a_Comprehensive_Review)
International Journal of Innovative Technology and Exploring Engineering (IJITEE) ISSN: 2278-3075,
(http://snap.stanford.edu/class/cs224w-2012/projects/cs224w-044-final.v01.pdf)
[25]. https://www.anaconda.com/
[26]. https://pypi.org/project/pip/
(https://www.ijert.org/research/recommender-systems-types-of-filtering-techniquesIJERTV3IS110197.pdf)