Report 2
Submitted by
Satyajit Biswal (2001229045)
Rudraprasad Mohapatra (2001229042)
Biswajit Sahoo (2001229079)
2020 - 2024
Under the Guidance of
Dr. Mamata Rath
Asso. Professor, Dept. of CSE
Certificate
This is to certify that this is a bonafide Project report, titled “Book Recommendation System
using Collaborative Filtering”, done satisfactorily by Satyajit Biswal (2001229045),
Rudraprasad Mohapatra (2001229042), and Biswajit Sahoo (2001229079) in partial fulfillment of
the requirements for the degree of B.Tech. in Computer Science & Engineering under Biju
Patnaik University of Technology (BPUT).
This Project report on the above-mentioned topic has not been submitted for any other
examination earlier than in this institution and does not form part of any other course
undergone by the candidates.
Acknowledgement
We express our indebtedness to our guide, Dr. Mamata Rath, Associate Professor of
the Computer Science & Engineering department, who spared her valuable time to go through
the manuscript and offer her scholarly advice on the writing. Her guidance, encouragement,
and all-out help have been invaluable to us. Words fall short of expressing our
gratitude and thankfulness to her.
We are grateful to all the teachers of the Computer Science & Engineering department,
DRIEMS, for their encouragement, advice, and help.
We would also like to express our sincere gratitude to Prof. Surajit Mohanty,
H.O.D. of the Computer Science & Engineering department, for the moral support he extended to
us throughout the duration of this project.
We are also thankful to our friends who have helped us, directly or indirectly, to make
this project a success.
Abstract
The main purpose of a recommendation system is to suggest relevant items to users, making
their choices easier. The quantity of information on the internet is growing rapidly, and users
need tools to find and access the data that is appropriate for them. One such tool is the
recommendation system, which proposes the products that are most relevant to a given user.
Nowadays, online book-selling websites compete with one another in various ways. One of the
most powerful methods for increasing profits and retaining customers is a recommendation
framework that can recommend books that interest the customer. The fundamental aim of this
system is therefore to support people who have an interest in reading and to encourage those
who are developing the habit of reading. By building a book recommendation system, we aim to
help people choose books that interest them and so encourage them to read more. With the help
of datasets and machine learning, we believe we can choose the right book for a reader based
on their interests and on data from many other readers. For this reason, we use a
collaborative filtering method.
Keywords: Recommendation system, Collaborative filtering
CONTENTS
LIST OF FIGURES
CHAPTER 1
1 INTRODUCTION
1.1 LITERATURE SURVEY
1.2 BRIEF ABOUT RECOMMENDATION SYSTEM
1.3 MOTIVATION OF WORK
1.4 OBJECTIVES
1.5 ALGORITHMS
1.5.1 SUPERVISED LEARNING
1.5.2 UNSUPERVISED LEARNING
1.6 APPLICATION OF RECOMMENDATION SYSTEM
CHAPTER 2
2 METHODOLOGY
2.1 PROPOSED SYSTEM
2.2 MODULES DIVISION
2.3 CONTENT-BASED FILTERING
2.4 USER INTERFACE
CHAPTER 3
3 EXPERIMENTAL ANALYSIS AND RESULT
Chapter 1
INTRODUCTION
In today's digital age, with an abundance of books available in both physical and digital formats,
finding the perfect book to read can be overwhelming. This challenge has given rise to the need
for sophisticated recommendation systems that can help users discover books tailored to their
preferences. One such recommendation approach that has gained popularity is collaborative
filtering. Collaborative filtering is a powerful technique that leverages user behaviour and
preferences to make personalized recommendations. In this report, we describe how we built a
book recommendation system using collaborative filtering techniques. We do not stop at the
algorithms; we also cover web development for the backend and Android development for the
user interface. In this introduction, we provide an overview of the report as
well as the motivation behind the project. As avid readers ourselves, we understand the thrill of
discovering a new book that aligns perfectly with your tastes and interests. However, the sheer
volume of books available today, both in print and digital formats, makes this endeavour
increasingly challenging. Traditional bookstores offer limited shelf space, and even online
retailers struggle to curate personalized suggestions. This is where recommendation systems step
in to bridge the gap between readers and the vast literary landscape. The concept of
recommendation systems is not new. They have been employed successfully in various domains,
from e-commerce to streaming services, to help users discover products, movies, and music they
are likely to enjoy. Collaborative filtering, a subset of recommendation techniques, focuses on
user behaviour and preferences to make personalized suggestions. By analysing the choices of
like-minded readers, it uncovers hidden gems that might have otherwise remained buried in the
digital abyss.
1.1 LITERATURE SURVEY
Paul Milgram et al. [1]: This research paper introduces the concept of collaborative
filtering in the context of book recommendation systems. The authors explore various
collaborative filtering techniques, including user-based and item-based approaches. They
discuss the strengths and limitations of these techniques and their relevance to book
recommendations.
Carlos González et al. [2]: In this review of literature, the authors investigate the use of
Android technology in the context of recommendation systems. They examine Android-based
applications for book recommendations and how they leverage collaborative
filtering algorithms to provide personalized book suggestions to users.
Pratishtha Gupta et al. [3]: This study delves into the integration of web technologies with
collaborative filtering for book recommendations. The authors discuss web-based
platforms that employ collaborative filtering to offer book recommendations to users
through web browsers. They highlight the user experience and scalability aspects of
these systems.
Vaishnavi Ravindran et al. [4]: In their survey paper, the authors provide an overview of
machine learning techniques used in collaborative filtering for book recommendations.
They explore the application of matrix factorization, deep learning, and content-based
filtering in the context of book recommendation systems. The paper also discusses the
challenges and future directions in this area.
Sreyan Ghosh et al. [5]: This research study investigates the impact of machine learning-based
book recommendation systems on user engagement and book discovery. The
authors present a case study involving Android and web technologies, where they
analyze user interactions and preferences to evaluate the effectiveness of their
collaborative filtering approach.
1.2 BRIEF ABOUT RECOMMENDATION SYSTEM
1.3 MOTIVATION OF WORK
There are several potential motivations for using book recommendation in the real world. Here
are a few:
• Personalized Book Discovery: Many readers are overwhelmed by the vast number of
books available today. A personalized book recommendation system can help users
discover books that align with their unique interests and reading history. By tailoring
recommendations, we aim to improve user engagement and satisfaction.
• Promoting Lesser-Known Books: Often, high-demand books receive more attention,
leaving lesser-known books overlooked. Our recommendation system can help promote
diverse books and authors by suggesting hidden gems to users who might not have
discovered them otherwise.
• Boosting Sales and Engagement: For authors, publishers, and bookstores, personalized
recommendations can increase book sales and user engagement. By connecting readers
with books they are more likely to enjoy, the system can drive revenue and support the
growth of the book industry.
• By integrating Android and web development, our recommendation system can reach a
broad audience. Android apps cater to mobile users, while web services provide
accessibility on various devices. This cross-platform approach ensures wider adoption.
Overall, the motivation behind this project is to create a powerful and user-centric book
recommendation system that leverages Android and web development for broad accessibility
and uses machine learning to deliver personalized book suggestions. By addressing the needs of
both readers and stakeholders in the book industry, we aim to create a platform that enhances the
reading experience, promotes diverse literature, and drives engagement and revenue for all
involved parties.
1.4 OBJECTIVES
The objectives of developing a book recommendation system using Android development, web
development for backend services, and machine learning for creating the book recommendation
model are:
• Create a system that delivers personalized book recommendations to users based on their
reading preferences, history, and behavior, enhancing the user experience and improving
book discovery.
• Promote a wide range of books, including lesser-known and niche titles, to ensure users
are exposed to a diverse selection of reading materials, contributing to a more inclusive
literary ecosystem.
• Increase user engagement by providing relevant book suggestions, thereby encouraging
users to spend more time on the platform and read more books.
• Drive book sales by connecting readers with books they are likely to purchase and enjoy,
benefiting authors, publishers, and bookstores.
• Develop a recommendation system accessible via Android mobile applications and web
browsers to reach a broader audience and provide recommendations on various devices.
• Implement a scalable and efficient backend infrastructure to handle a growing user base
and deliver recommendations in real time.
• Build and train a machine learning model that utilizes user data and book attributes to
generate accurate and evolving book recommendations.
• Implement a recommendation system that continuously learns from user interactions,
adapting to changing user preferences and staying up to date with new book releases.
• Enhance the overall user experience by reducing the time users spend searching for
books and increasing their satisfaction with the platform.
• Establish a competitive edge in the digital book market by offering a state-of-the-art
recommendation system that caters to the specific tastes and needs of users.
1.5 ALGORITHMS
1.5.1 Supervised learning
For the algorithm to assign information correctly to the respective model groups, those groups
must first be specified. In other words, the system learns on the basis of given input and
output pairs. During supervised learning, a programmer, who acts as a kind of teacher,
provides the appropriate output values for each input. The aim is to train the system over
successive calculations with different inputs and outputs so that it establishes the
connections between them.
Supervised learning is where you have input variables (X) and an output variable (Y) and you
use an algorithm to learn the mapping function from the input to the output: Y = f(X). The goal is
to approximate the mapping function so well that when you have new input data (X), you
can predict the output variable (Y) for that data. It is called supervised learning because the
process of an algorithm learning from the training dataset can be thought of as a teacher
supervising the learning process. We know the correct answers; the algorithm iteratively makes
predictions on the training data and is corrected by the teacher. Learning stops when the
algorithm achieves an acceptable level of performance. Supervised machine learning techniques
include linear and logistic regression, multi-class classification, decision trees, and
support vector machines. Supervised learning problems can be further grouped into
regression and classification problems. The difference between the two is that the dependent
attribute is numerical for regression and categorical for classification:
Regression
Linear regression is a linear model, i.e. a model that assumes a linear relationship between the
input variables (x) and the single output variable (y). More specifically, y can be calculated
from a linear combination of the input variables (x). When there is a single input variable (x),
the method is referred to as simple linear regression. When there are multiple input variables,
literature from statistics often refers to the method as multiple linear regression.
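As a small illustration of simple linear regression, the following sketch fits a line to one input variable using the closed-form least-squares solution. The function name and the sample data are assumptions made for this example and are not taken from the project.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) minimizing squared error for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative data generated from y = 2x + 1 (not from the project dataset).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_simple_linear_regression(xs, ys)
prediction = slope * 5.0 + intercept  # predicted y for a new input x = 5
```

With multiple input variables, the same idea generalizes to solving for one coefficient per variable, which is usually left to a library.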
Classification
Classification is the process of categorizing a given set of data into classes. It can be
performed on both structured and unstructured data. The process starts with predicting the
class of given data points. The classes are often referred to as targets, labels, or
categories. In short, classification builds a model from a training set and its class labels,
and uses that model to predict categorical class labels for new data. There are a
number of classification models, including Logistic Regression, Decision
Tree, Random Forest, Gradient Boosted Tree, One-vs.-One, and Naïve Bayes.
1.5.2 Unsupervised learning
In unsupervised learning, the system learns without predefined target values and
without rewards. It is mainly used for learning segmentation (clustering). The machine tries to
structure and sort the input data according to certain characteristics. For example, a machine
could (very simply) learn that coins of different colors can be sorted according to the
characteristic "color" in order to structure them. Unsupervised machine learning algorithms are
used when the information used for training is neither classified nor labeled. The system does
not figure out the right output; instead, it explores the data and draws inferences from
datasets to describe hidden structures in unlabeled data, acting on that information without
guidance. Unsupervised learning is classified into two
categories of algorithms:
Clustering
A clustering problem is where you want to discover the inherent grouping in the data such as
grouping customers by purchasing behavior.
Association
An Association rule learning problem is where you want to discover rules that describe large
portions of your data such as people that buy X also tend to buy Y.
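To make the association idea concrete, the sketch below computes the confidence of a rule "X implies Y", i.e. the fraction of transactions containing X that also contain Y. The function name and the baskets are hypothetical and serve only as an illustration.

```python
def rule_confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent: P(consequent | antecedent)."""
    with_antecedent = [t for t in transactions if antecedent in t]
    if not with_antecedent:
        return 0.0  # the rule never applies, so report zero confidence
    both = sum(1 for t in with_antecedent if consequent in t)
    return both / len(with_antecedent)


# Hypothetical purchase baskets; each set holds the items bought together.
baskets = [{"X", "Y"}, {"X", "Y", "Z"}, {"X"}, {"Y", "Z"}]
confidence = rule_confidence(baskets, "X", "Y")  # 2 of the 3 baskets with X also hold Y
```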
1.6 APPLICATION OF RECOMMENDATION SYSTEM
Recommendation systems have diverse applications across various industries and domains.
These systems are designed to provide personalized suggestions or recommendations to users,
enhancing their experiences, and often leading to increased engagement, customer satisfaction,
and business revenue. Here are some common application areas of recommendation systems:
• E-Commerce: Online retailers like Amazon and eBay use recommendation systems to
suggest products to customers based on their browsing history, purchase history, and
preferences. These systems boost sales and help customers discover new products.
• Streaming Services: Platforms like Netflix, Spotify, and YouTube employ
recommendation systems to suggest movies, TV shows, music, and videos to users. This
keeps users engaged and encourages them to explore more content.
• Social Media: Social networks like Facebook, Twitter, and LinkedIn use
recommendation algorithms to suggest friends to connect with, posts to view, and groups
to join, increasing user engagement and network growth.
• News and Content Aggregation: News websites and content aggregators recommend
articles, news stories, and videos based on a user's interests and reading history, allowing
users to stay informed and engaged.
Chapter 2
METHODOLOGY
2.1 PROPOSED SYSTEM
System architecture describes “the overall structure of the system and the ways in
which the structure provides conceptual integrity”. The system architecture for
building a recommendation system involves the following five major steps.
1. Data Acquisition
2. Data Pre-processing
3. Feature Extraction
4. Training Methods
5. Testing Data
In Step 1, the data was collected from the Goodreads website; it consists of three
datasets, i.e. the Books dataset, the Ratings dataset, and the Users dataset.
In Step 2, the datasets were pre-processed to make them suitable for developing the
recommendation system.
In Step 3, feature extraction is performed, in which Truncated SVD is used to
reduce the features of the dataset, and data splitting is done, in which the data is
divided into training and testing datasets in an 80:20 ratio.
In Step 4, a Content-Based Filtering system is developed, which takes the book
description as input, and a Collaborative Filtering system is developed
by building a model using the K-Means algorithm, which was chosen over a Gaussian
Mixture model after comparing Silhouette scores.
In Step 5, the model is tested with the test data.
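The data-splitting and clustering steps can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the Truncated SVD step is omitted, the ten two-dimensional "reduced user features" are invented, and the helper names train_test_split and k_means are local to this example.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    centroids = random.Random(seed).sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels


# Hypothetical reduced user features: two dimensions, two obvious groups.
users = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0), (0.1, 0.1), (0.2, 0.2),
         (5.0, 5.1), (5.2, 4.9), (4.9, 5.0), (5.1, 5.2), (5.0, 4.8)]
train, test = train_test_split(users)       # 80:20 split
centroids, labels = k_means(train, k=2)     # cluster the training users
```

Fixing the random seed keeps the split and the initial centroids reproducible, which makes comparisons between candidate models easier.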
Fig: 2.1 System Architecture
2.2 MODULES DIVISION
Data is collected from various online sources like databases and files. The size and the
quality of the data in the collected dataset will determine the efficiency of the model.
The Books dataset contains the following attributes:
• ISBN
• Book-Title
• Book-Author
• Year of publication
• Publisher
• Image-URL-S
• Image-URL-M
• Image-URL-L
One more dataset, i.e. the Ratings dataset, was also collected from the Kaggle website.
One more dataset, i.e. the Users dataset, was also collected from the Kaggle website.
The goal of this step is to study and understand the nature of the data acquired in the
previous step and to assess its quality. In this step, we check for null values
and remove them, as they may affect the efficiency of the model. Duplicates in the dataset are
also identified and removed in this step.
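The null and duplicate checks described above can be sketched as follows. This is a plain-Python illustration; the record layout and the field names user_id, isbn, and rating are assumptions made for the example, not the project's actual schema.

```python
def clean_ratings(rows):
    """Drop rows with missing values, then drop exact duplicates (first one wins)."""
    seen = set()
    cleaned = []
    for row in rows:
        # Treat None and empty strings as missing values.
        if any(value is None or value == "" for value in row.values()):
            continue
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # an identical row was already kept
        seen.add(key)
        cleaned.append(row)
    return cleaned


# Hypothetical rating records; the field names are illustrative.
raw = [
    {"user_id": 1, "isbn": "A1", "rating": 8},
    {"user_id": 1, "isbn": "A1", "rating": 8},    # exact duplicate
    {"user_id": 2, "isbn": None, "rating": 5},    # missing ISBN
    {"user_id": 3, "isbn": "B2", "rating": 10},
]
cleaned = clean_ratings(raw)
```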
Once the clustering model has been trained on the pre-processed dataset, it is tested using
different data points. In this testing step, the silhouette score of the model is computed to
check the goodness of the clustering. All the training methods need to be verified to find the
best model to use. After fitting our model to the training data, we used it to
predict values for the test dataset. These predicted values on the testing data are used for
model comparison. The users in the test set, on average, rated their clusters' favorite books
higher than a random set of 10 books by 0.47 stars, or nearly half a star.
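For reference, the silhouette score used to judge the goodness of a clustering can be computed as in the sketch below. It uses Euclidean distance and invented sample points; the function is written out only to show the calculation and is not the code used in the project.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points, using Euclidean distance."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes a score of 0
        # a: mean distance to the other members of the point's own cluster
        # (the distance from the point to itself is 0, so it drops out of the sum).
        a = sum(math.dist(point, other) for other in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(sum(math.dist(point, other) for other in members) / len(members)
                for other_label, members in clusters.items() if other_label != label)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Illustrative points: two tight groups far apart.
points = [(0, 0), (0, 1), (5, 5), (5, 6)]
good = silhouette_score(points, [0, 0, 1, 1])  # labels follow the groups
bad = silhouette_score(points, [0, 1, 0, 1])   # labels cut across the groups
```

A score close to 1 indicates compact, well-separated clusters, while a negative score suggests that points sit closer to a neighboring cluster than to their own.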
2.3 CONTENT-BASED FILTERING
A content-based filtering system recommends items whose content is similar to that of a given
item. This system uses the descriptions of the items and gives recommendations that are similar
in description. We used cosine similarity as the similarity function for this system. The
item-content matrix, which describes the attributes of the items, is taken as input. Cosine
similarity is calculated from the angle between the vectors. We improve the quality of the
content-based system by normalizing and weighting the attributes with the TF-IDF
vectorizer. TF (Term Frequency) is the frequency of a word in a document. IDF (Inverse
Document Frequency) measures how rare a word is across all documents, so that words appearing
in many documents receive lower weights. The TF-IDF vectorizer tokenizes
documents, learns the vocabulary and inverse document frequency weightings, and allows
new documents to be encoded. It transforms text into feature vectors that can be used as input
to an estimator. The vocabulary is a dictionary that maps each token (word) to a feature index
in the matrix; each unique token gets its own feature index. For example, the vocabulary might
show that the token 'me' is represented by a particular feature index in the output matrix, and
a vocabulary of 8 words learned from the documents assigns each word a unique integer index in
the output vector. In this project, a TF-IDF vectorizer that
takes stop_words as a parameter transforms the book descriptions into a matrix of vectors.
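The following plain-Python sketch shows how TF-IDF weighting and cosine similarity fit together. It is a simplified stand-in for a library vectorizer: the stop-word list, the smoothed IDF formula, and the three book descriptions are all assumptions chosen for illustration.

```python
import math
from collections import Counter

STOP_WORDS = {"a", "an", "and", "the", "of", "in"}  # tiny illustrative list


def tokenize(text):
    return [word for word in text.lower().split() if word not in STOP_WORDS]


def tfidf_vectors(documents):
    """Return (vocabulary, vectors); vocabulary maps each token to a feature index."""
    tokenized = [tokenize(doc) for doc in documents]
    tokens = sorted({tok for doc in tokenized for tok in doc})
    vocabulary = {tok: index for index, tok in enumerate(tokens)}
    n_docs = len(documents)
    doc_freq = Counter(tok for doc in tokenized for tok in set(doc))
    vectors = []
    for doc in tokenized:
        vec = [0.0] * len(vocabulary)
        for tok, count in Counter(doc).items():
            # Smoothed IDF: tokens found in fewer documents get larger weights.
            idf = math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1
            vec[vocabulary[tok]] = count * idf
        vectors.append(vec)
    return vocabulary, vectors


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical book descriptions.
descriptions = [
    "a young wizard attends a school of magic",
    "a wizard and a dragon guard the school",
    "a detective solves a murder in london",
]
vocabulary, vectors = tfidf_vectors(descriptions)
sim_01 = cosine_similarity(vectors[0], vectors[1])  # share "wizard" and "school"
sim_02 = cosine_similarity(vectors[0], vectors[2])  # share no non-stop words
```

Ranking all other books by their similarity to a chosen book's vector and taking the top few gives a basic content-based recommendation list.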
2.4 USER INTERFACE
A user interface is essential for any project, because everyone who uses the system
accesses it through an interface. Our system has a user interface
built to help users access the services we provide. Users can log in or sign up
using the interface. They can view all the existing books in our database; the
books extracted from the datasets are stored in a database. They can search for any book by
its title or by its author. Users can also view the books they have rated, and they can log
out. The web application interface, which serves as the front end of our project, is
accessible through any web browser. It has been developed using Flask, a web framework that
enables us to create dynamic and responsive web pages.
In addition to the web interface, we are also working on an Android application as part of our
project. The Android app will provide users with another convenient way to access and interact
with our book recommendation system. Users will be able to view and search for books, and the
app will also offer book recommendations when users search for specific books, similar to the
web interface. This multi-platform approach ensures that users can access our services through
both web and mobile channels, providing flexibility and convenience.
Fig: 2.9 Web Interface (Light)
Chapter 3
EXPERIMENTAL ANALYSIS AND RESULTS
A requirement is a feature that the system must have or a constraint that it must satisfy to be
accepted by the client. Requirements engineering aims at defining the requirements of the
system under construction. It includes two main activities: requirements elicitation,
which results in a specification of the system that the client understands, and analysis, which
results in an analysis model that the developer can unambiguously interpret. A requirement is a
statement about what the proposed system will do.
• Functional Requirements
• Non-Functional Requirements
Functional Requirements
The book recommendation system must perform data preprocessing, including cleaning,
formatting, and structuring the raw book data, preparing it for analysis. It should apply machine
learning recommendation algorithms, such as collaborative filtering and content-based filtering,
on the preprocessed training data to generate personalized book recommendations for users.
Users should be able to create profiles by providing information such as usernames and
passwords. User profiles will store their preferences, ratings, and reading history. Secure user
authentication is necessary, allowing users to log in using their credentials to access
personalized book recommendations. The system must facilitate book searches, allowing users
to explore books by criteria such as title, author, genre, or other relevant attributes. Detailed
book information, including title, author, description, ratings, and user reviews, should be
accessible to users. Additionally, users should have the ability to rate books they have read, with
these ratings stored in their profiles for future recommendations. A logout feature should enable
users to securely log out from their accounts when done.
Integration with an Android app is a crucial functional requirement, ensuring that the app
seamlessly displays book recommendations and provides search functionality. The generated
recommendations should be displayed to users, who can then choose to view books aligned with
their preferences.
Non-Functional Requirements
• Accuracy
• Reliability
• Flexibility
NumPy
Besides its obvious scientific uses, NumPy can also be used as an efficient multidimensional
container of generic data.
Pandas
Pandas is an open-source library that is built on top of the NumPy library. It is a Python
package that offers various data structures and operations for manipulating numerical data and
time series. It is fast and provides high-performance, easy-to-use data structures and data
analysis tools for the Python language. Pandas is used in a wide range of academic and
commercial domains, including economics, statistics, and analytics.
SKLearn
Scikit-learn (Sklearn) is a widely used and robust library for machine learning in Python. It is
an open-source Python library that implements a range of machine learning, pre-processing,
cross-validation and visualization algorithms using a unified interface. Sklearn provides a
selection of efficient tools for machine learning and statistical modeling, including
classification, regression, clustering and dimensionality reduction, via a consistent interface
in Python. This library, which is largely written in Python, is built upon NumPy, SciPy and
Matplotlib.
Pickle
The Python pickle module is used for serializing and de-serializing a Python object structure.
Pickling is a way to convert a Python object (list, dict, etc.) into a byte stream. The idea is
that this byte stream contains all the information necessary to reconstruct the object in
another Python script. Pickling is useful for applications where you need some degree of
persistence in your data. Your program's state data can be saved to disk, so you can continue
working on it later.
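A minimal round trip with the standard pickle module looks like the sketch below. The model_state dictionary is a hypothetical stand-in for a trained model; any picklable Python object is handled the same way.

```python
import pickle

# A stand-in for a trained model: any picklable Python object works the same way.
model_state = {"n_clusters": 3, "centroids": [[0.1, 0.2], [0.5, 0.4], [0.9, 0.8]]}

# Serialize the object to bytes (pickle produces a byte stream, not text).
payload = pickle.dumps(model_state)

# Later, or in another script, rebuild an equal object from those bytes.
restored = pickle.loads(payload)
```

To persist the object on disk, pickle.dump and pickle.load do the same with a file opened in binary mode. Pickle data should only be loaded from trusted sources, since unpickling can execute arbitrary code.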
A preliminary investigation examines project feasibility, i.e. the likelihood that the system
will be useful to the organization. The main objective of the feasibility study is to test the
technical, operational and economic feasibility of adding new modules and debugging the old
running system. All systems are feasible given unlimited resources and infinite time. There are
three aspects in the feasibility study portion of the preliminary investigation:
• Economic Feasibility
• Technical Feasibility
• Operational Feasibility
3.3.1 Economic Feasibility
A system that can be developed technically and that will be used if installed must still be a
good investment for the organization. In the economic feasibility study, the development cost
of creating the system is evaluated against the ultimate benefit derived from the new system.
Financial benefits must equal or exceed the costs. The system is economically feasible. It does
not require any additional hardware or software. Since the interface for this system is
developed using existing resources and technologies (Java 1.6, open source), the expenditure is
nominal and the system is certainly economically feasible.
3.3.2 Technical Feasibility
This assessment focuses on the technical resources available to the organization. It helps
organizations determine whether the technical resources meet capacity and whether the technical
team is capable of converting the ideas into working systems. Technical feasibility also involves
evaluation of the hardware, software, and other technology requirements of the proposed system.
This assessment is based on an outline design of system requirements, to determine whether the
company has the technical expertise to handle completion of the project. When writing a
feasibility report, the following should be taken into consideration:
• A brief description of the business, to assess the possible factors that could affect the
study
• The possible solutions to the problem
At this level, the concern is whether the proposal is
both technically and legally feasible (assuming moderate cost). The technical feasibility
assessment is focused on gaining an understanding of the present technical resources of the
organization and their applicability to the expected needs of the proposed system. It is an
evaluation of the hardware and software and how they meet the needs of the proposed system.
3.3.3 Operational Feasibility
Proposed projects are beneficial only if they can be turned into information systems that
meet the organization's operating requirements. The operational feasibility aspects of the
project are to be taken as an important part of the project implementation. Some of the
important issues raised to test the operational feasibility of a project include the following:
• Will the system be used and work properly once it is developed and implemented?
• Will there be any resistance from the users that will undermine the possible application
benefits?
This system is designed to address the above-mentioned issues.
The management issues and user requirements have been taken into consideration beforehand,
so we do not expect resistance from the users that could undermine the possible application
benefits.
CONCLUSION AND FUTURE WORK
Conclusion
In this project, we have recommended books for a user using a model trained with K-Means
clustering, which is a collaborative filtering technique. We have also compared
different models built using different methods, identified the best model, and justified why
we chose it. We have used the books dataset that is available on the Goodreads
website, which consists of more than 3000 books. The models are built using reduced
features, obtained with Truncated SVD. Based on those features, we built a model that
gives a positive Silhouette score. The model suggested in this report is useful for book
readers. The system we have developed can also make recommendations for new users.
Future Work
The system has adequate scope for modification in the future if necessary. Developing and
launching the mobile app, refining existing services and adding more services, and improving
system security, data security and reliability are the main areas of future work. An API
for shopping and a payment gateway can be added so that users can also buy a book
immediately. The existing system has only some selected categories, so as an extension to the
site we can add more categories. We can also add an admin side with
functionality such as book management and user management.
REFERENCES
1. Avi Rana and K. Deeba et al., “Online Book Recommendation System using Collaborative
Filtering (With Jaccard Similarity)” in IOP ebooks 1362, 2019.
2. G. Naveen Kishore, V. Dhiraj, Sk Hasane Ahammad, Sivaramireddy Gudise, Balaji
Kummaraa and Likhita Ravuru Akkala, “Online Book Recommendation System”
International Journal of Scientific & Technology Research vol.8, issue 12, Dec 2019.
3. Uko E Okon, B O Eke and P O Asaga, “An Improved Online Book Recommender System
using Collaborative Filtering Algorithm”, International Journal of Computer Applications
vol.179-Number 46, 2018.
4. Jinny Cho, Ryan Gorey, Sofia Serrano, Shatian Wang, JordiKai Watanabe-Inouye, “Book
Recommendation System” Winter 2016.
5. Ms. Sushma Rjpurkar, Ms. Darshana Bhatt and Ms. Pooja Malhotra, “Book
Recommendation System” International Journal for Innovative Research in Science &
Technology vol.1, issue 11, April 2015.
6. Abhay E. Patil, Simran Patil, Karanjit Singh, Parth Saraiya and Aayusha Sheregar, “Online
Book Recommendation System using Association Rule Mining and Collaborative Filtering”
International Journal of Computer Science and Mobile Computing vol.8, April 2019.
7. Suhas Patil and Dr. Varsha Nandeo, “A Proposed Hybrid Book Recommender System”
International Journal of Computer Applications vol.6 – No.6,Nov – Dec 2016.
8. Ankit Khera, “Online Recommendation System” SJSU ScholarWorks, Master's Theses and
Graduate Research, Master's Projects, 2008.
9. Anagha Vaidya and Dr. Subhash Shinde, “Hybrid Book Recommendation System”
International Research Journal of Engineering and Technology (IRJET), July 2019.
10. Dhirman Sarma, Tanni Mittra and Mohammad Shahadat Hossain, “Personalized Book
Recommendation System using Machine Learning Algorithm” The Science and Information
Organization vol.12,2019.