Content Based ML Repo
Recommendation System
ABHISHEK DUTTA
Regd.No: 2001289265
Content-Based Movie
Recommendation System
Submitted in Partial Fulfillment of
The Requirement for the 7th
Sem. Project
Bachelor of Technology
In
Computer Science & Engineering
Submitted by
ABHISHEK DUTTA
Regd.No: 2001289265
CERTIFICATE OF APPROVAL
----------------------------------
Place: Bhubaneswar Dr. Subhra Swetanisha
DECLARATION
I, Abhishek Dutta, declare that the Minor Project Work presented through this
report was carried out by me in accordance with the requirements and in
compliance with the Academic Regulations of the Biju Patnaik University of
Technology for the Bachelor of Technology (B.Tech.) Degree Programme in
Computer Science & Engineering (Artificial Intelligence and Machine Learning)
and that it has not been submitted for any other academic award. Except where
indicated by specific reference in the text, the work is solely my own work. Work
done in collaboration with, or with the assistance of, others, has been
acknowledged and is indicated as such. Any views expressed in the report are
those of the author.
CERTIFICATE
This is to certify that this report of Minor Project work on the topic
entitled Content-Based Movie Recommendation System, which is
submitted by Abhishek Dutta bearing Registration No.: 2001289265 in
partial fulfillment of the requirement for the award of Bachelor of
Technology in Computer Science & Engineering (Artificial
Intelligence and Machine Learning) of Biju Patnaik University of
Technology, Odisha, is a record of the candidate's own work carried out
by him under my supervision.
ABSTRACT
ACKNOWLEDGMENT
CONTENTS
1. Introduction
1.1 Background
2. Introduction to Machine Learning
2.1 Types of Learning Algorithms
2.2 Applications of Machine Learning
3. Data Collection and Preprocessing
3.1 Data Source
3.2 Data Preprocessing
4. Feature Engineering
4.1 User Context
4.2 Movie Features
5. Machine Learning Model
5.1 Algorithm Selection
5.2 Training
5.3 Prediction
6. Implementation Using NumPy and Pandas
6.1 NumPy Integration
6.2 Pandas Integration
6.3 Python Code
7. Evaluation Metrics
7.1 Accuracy
7.2 Precision and Recall
8. Results and Discussion
8.1 Performance Metrics
8.2 Discussion
9. Conclusion
9.1 Summary
9.2 Future Work
10. References
CHAPTER 1
Introduction
1.1 Background
relevance. The rationale behind this shift lies in the recognition that user
preferences are not static; they evolve based on various contextual cues.
CHAPTER 2
Introduction to Machine Learning
The process of learning begins with observations or data, such as examples, direct
experience, or instruction, in order to look for patterns in data and make better
decisions in the future based on the examples that we provide. The primary aim
is to allow computers to learn automatically, without human intervention or
assistance, and adjust their actions accordingly.
The processes involved in machine learning are similar to those of data mining and
predictive modelling. Both require searching through data to look for patterns and
adjusting program actions accordingly. Many people are familiar with machine
learning from shopping on the internet and being served ads related to their
purchase. This happens because recommendation engines use machine learning
to personalize online ad delivery in almost real time. Beyond personalized
marketing, other common machine learning use cases include fraud detection,
spam filtering, network security threat detection, predictive maintenance and
building news feeds.
Machine learning algorithms are often categorized as supervised or unsupervised.
Supervised algorithms require a data scientist or data analyst with machine
learning skills to provide both input and desired output, in addition to furnishing
feedback about the accuracy of predictions during algorithm training. Data
scientists determine which variables, or features, the model should analyse and
use to develop predictions. Once training is complete, the algorithm will apply
what was learned to new data.
Supervised Learning
Supervised machine learning algorithms can apply what has been learned in the
past to new data using labeled examples to predict future events. Starting from
the analysis of a known training dataset, the learning algorithm produces an
inferred function to make predictions about the output values. The system is able
to provide targets for any new input after sufficient training. The learning
algorithm can also compare its output with the correct, intended output and find
errors in order to modify the model accordingly.
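The supervised workflow described above can be sketched in a few lines of NumPy. This is only an illustrative example, not part of the project: the toy data (points on the line y = 2x + 1) and the helper name `predict` are assumptions chosen for clarity, and least squares stands in for whatever learning algorithm is actually used.

```python
import numpy as np

# Labeled training examples: inputs x with known outputs y (here y = 2x + 1).
x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_train = 2.0 * x_train + 1.0

# Design matrix with a bias column, so the inferred function is y = w*x + b.
X = np.column_stack([x_train, np.ones_like(x_train)])

# Least-squares fit: choose w and b that minimize the error on the training set.
(w, b), *_ = np.linalg.lstsq(X, y_train, rcond=None)


def predict(x_new):
    """Apply the inferred function to new, unseen inputs."""
    return w * np.asarray(x_new) + b


# Compare the model's output with the intended output to measure the error.
training_error = np.abs(predict(x_train) - y_train).max()
```

Calling `predict(5.0)` then applies what was learned from the labeled examples to an input that was not in the training set.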
Fig. 2.1: Supervised Learning Workflow
Unsupervised Learning
Fig. 2.2: Unsupervised Learning Workflow
Others
and software agents to automatically determine the ideal behaviour within
a specific context in order to maximize its performance. Simple reward
feedback is required for the agent to learn which action is best; this is
known as the reinforcement signal.
Web Search Engine: One of the reasons why search engines like Google and
Bing work so well is that the system has learnt how to rank pages
through a complex learning algorithm.
Spam Detector: Mail agents like Gmail or Hotmail do a lot of hard
work for us in classifying mails and moving spam mails to the spam
folder. This is again achieved by a spam classifier running in the back end
of the mail application.
Hardwiring intelligence into a machine is difficult. The best approach is to give
machines a mechanism for learning things themselves: if a machine can learn from
input, then it does the hard work for us. This is where Machine Learning comes
into action. Some examples of machine learning are:
CHAPTER 3
Data Collection and Preprocessing
3.2 Data Preprocessing
CHAPTER 4
Feature Engineering
- Viewing History: Examining the user's past viewing history to identify patterns
and trends, which contribute to a more comprehensive understanding of their
content preferences.
- User Ratings: Incorporating user ratings for movies as a quantitative measure
of their preferences. This helps in distinguishing between movies that a user
merely watched and those they actively enjoyed.
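As a rough illustration of the two user-context features above, viewing history and user ratings can be summarized per user and genre with Pandas. The `views` table, its column names, and its values are hypothetical and stand in for the project's real interaction data.

```python
import pandas as pd

# Hypothetical interaction log: one row per (user, movie) viewing.
views = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "genre": ["Action", "Action", "Drama", "Comedy", "Drama"],
    "rating": [5.0, 4.0, 2.0, 3.0, 5.0],
})

# Viewing history: how often each user watched each genre (0 if never).
watch_counts = (
    views.groupby(["user_id", "genre"]).size().unstack(fill_value=0)
)

# User ratings: mean rating per genre, which separates movies a user merely
# watched from those they actively enjoyed (NaN where nothing was rated).
mean_ratings = (
    views.groupby(["user_id", "genre"])["rating"].mean().unstack()
)
```

In this toy data, user 1 watched Action twice with a mean rating of 4.5 and Drama once with a rating of 2.0, so the rating feature shows a preference that the counts alone would understate.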
for trends, ensuring that recommendations align with both genre preferences and
temporal relevance.
In the subsequent sections, the report will delve into the implementation of
machine learning algorithms and the utilization of Python libraries NumPy and
Pandas to process and analyze these features, ultimately translating them into
accurate and context-aware movie recommendations.
CHAPTER 5
Machine Learning Model
5.2 Training
Once the algorithm is selected, the next step involves training the model using the
preprocessed dataset. This dataset, enriched with user context and relevant movie
features, serves as the foundation for the machine learning model. During the
training phase, the model learns patterns, correlations, and relationships between
user preferences and movie features.
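The algorithm chosen for this project is not reproduced in this section, so the following is only a sketch of one common way a content-based model can "learn" a user's preferences: building a user profile as the rating-weighted average of the feature vectors of the movies the user has rated. The feature matrix, the ratings, and the function name `build_user_profile` are assumptions made for illustration.

```python
import numpy as np

# Hypothetical one-hot movie features; columns: Action, Comedy, Drama.
movie_features = np.array([
    [1, 0, 0],  # movie 0
    [1, 0, 1],  # movie 1
    [0, 1, 0],  # movie 2
    [0, 0, 1],  # movie 3
], dtype=float)

# Ratings one user gave to movies 0, 1 and 2 (movie 3 is unseen).
rated_idx = np.array([0, 1, 2])
ratings = np.array([5.0, 4.0, 1.0])


def build_user_profile(features, idx, user_ratings):
    """Rating-weighted average of the feature vectors of the rated movies."""
    weights = user_ratings / user_ratings.sum()
    return weights @ features[idx]


profile = build_user_profile(movie_features, rated_idx, ratings)
```

Here the profile comes out as roughly `[0.9, 0.1, 0.4]`, meaning the user leans strongly towards Action, somewhat towards Drama, and hardly at all towards Comedy.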
5.3 Prediction
CHAPTER 6
Implementation Using NumPy and Pandas
For instance:
NumPy arrays provide a structured and efficient way to handle numerical data,
enhancing the system's computational efficiency.
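To make the efficiency point concrete, the sketch below computes the similarity between every pair of movies in a single vectorized NumPy expression instead of nested Python loops. Cosine similarity is used here as an assumed similarity measure, and the small `features` array and the function name `cosine_similarity_matrix` are illustrative only.

```python
import numpy as np


def cosine_similarity_matrix(features):
    """Pairwise cosine similarity between the rows of a 2-D array."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero for all-zero rows
    unit = features / norms
    return unit @ unit.T


# Hypothetical movie feature vectors, one row per movie.
features = np.array([
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
], dtype=float)

sim = cosine_similarity_matrix(features)
```

Entry `sim[i, j]` is the similarity between movie `i` and movie `j`; movies that share no features get a score of 0, and every movie has a score of 1 with itself.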
6.3 Python Code
Code Snippets Illustrating the Implementation:
Fig. 6.4: Code Snippet
This comprehensive code snippet illustrates the integration of NumPy and Pandas
in loading, preprocessing, and training a machine learning model for movie
recommendations. It encompasses additional details such as data cleaning
techniques, feature engineering, and evaluation metrics. Adjust the code
according to your specific dataset and requirements.
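For readers who want a runnable starting point, the following is a minimal sketch of such a pipeline. It is not the code shown in the figure: the tiny in-memory dataset, the column names, the "Unknown" placeholder, and the `recommend` helper are all assumptions. It covers data cleaning, feature engineering, and similarity-based recommendation only, with no separate training or evaluation step.

```python
import numpy as np
import pandas as pd

# Hypothetical in-memory dataset standing in for the project's movie file.
movies = pd.DataFrame({
    "title": ["Alpha", "Bravo", "Charlie", "Delta"],
    "genres": ["Action|Sci-Fi", "Action", None, "Drama|Romance"],
    "director": ["Lee", "Lee", "Khan", None],
})

# Data cleaning: replace missing values with an explicit placeholder.
movies = movies.fillna({"genres": "Unknown", "director": "Unknown"})

# Feature engineering: one-hot encode genres and director.
genre_features = movies["genres"].str.get_dummies(sep="|")
director_features = pd.get_dummies(movies["director"], prefix="dir")
features = pd.concat(
    [genre_features, director_features], axis=1
).to_numpy(dtype=float)

# Cosine similarity between every pair of movies.
unit = features / np.linalg.norm(features, axis=1, keepdims=True)
similarity = unit @ unit.T


def recommend(title, top_n=2):
    """Return the top_n titles most similar to the given title."""
    idx = movies.index[movies["title"] == title][0]
    order = np.argsort(-similarity[idx])
    return [movies["title"].iloc[i] for i in order if i != idx][:top_n]
```

For example, `recommend("Alpha", top_n=1)` returns `["Bravo"]`, because the two movies share a genre and a director.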
CHAPTER 7
Evaluation Metrics
7.1 Accuracy
7.2 Precision and Recall
Precision and recall offer more nuanced insights into the recommendation
system's effectiveness, particularly when there is an imbalance between positive
and negative instances.
- Recall: Recall, also known as sensitivity or true positive rate, assesses the
system's ability to capture all relevant positive instances. It represents the ratio of
correctly predicted positive observations to the total actual positive instances. A
high recall indicates that the system is effective in identifying a significant portion
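of actual positive preferences. The recall is calculated by:
Recall = TP / (TP + FN)
where TP is the number of true positives and FN is the number of false negatives.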
Balanced precision and recall are indicative of a recommendation system that not
only predicts positive preferences accurately but also captures a substantial
portion of the actual positive instances. These metrics provide a more
comprehensive evaluation, capturing the system's performance from both
accuracy and relevance perspectives.
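A short sketch shows how both metrics can be computed for a single user's recommendation list. The function name `precision_recall` and the example titles are hypothetical; a movie counts as a true positive when it appears both in the recommended list and in the set of movies the user actually liked.

```python
def precision_recall(recommended, relevant):
    """Precision and recall of a recommendation list against the liked set."""
    recommended, relevant = set(recommended), set(relevant)
    true_positives = len(recommended & relevant)
    # Precision: share of recommended movies that the user actually liked.
    precision = true_positives / len(recommended) if recommended else 0.0
    # Recall: share of liked movies that were actually recommended.
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Four movies recommended, three actually liked, two in common.
p, r = precision_recall(["A", "B", "C", "D"], ["A", "C", "E"])
```

In this example two of the four recommendations are relevant (precision 0.5) and two of the three liked movies were found (recall about 0.67), which illustrates how the two metrics can differ for the same list.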
CHAPTER 8
Results and Discussion
- Precision: Precision delves deeper into the system's ability to make accurate
positive predictions. It gauges the proportion of correctly predicted positive
observations among the instances predicted as positive. A heightened precision
score implies that the system is adept at accurately identifying and recommending
movies that match users' preferences.
- Recall: In tandem with precision, recall measures the system's effectiveness in
capturing all relevant positive instances. It assesses the ratio of correctly predicted
positive observations to the total actual positive instances. A robust recall
indicates the system's capability to comprehensively capture a substantial portion
of users' actual positive preferences.
8.2 Discussion
The discussion phase serves as a critical juncture for interpreting the results
gleaned from the evaluation metrics and identifying potential avenues for
refinement.
- Precision and Recall Analysis: The analysis of precision and recall provides
deeper insights into the system's ability to make positive predictions accurately
and comprehensively capture positive instances. Striking a balance between
precision and recall is pivotal for ensuring that the system not only makes accurate
recommendations but also covers a significant portion of users' actual positive
preferences.
CHAPTER 9
Conclusion
9.1 Summary
The journey commenced with an exhaustive data collection phase, where a rich
and diverse movie dataset was curated, encompassing a myriad of genres,
directors, release years, and user interactions. This dataset, a cornerstone of the
system, underwent meticulous preprocessing, including the handling of missing
values and transformations for optimal model compatibility.
The feature engineering process emerged as a pivotal step, weaving together the
intricate tapestry of user context and movie features. User-centric elements, such
as preferences, viewing history, and ratings, were harmoniously integrated with
movie features like genre, director, and release year. This intricate dance of
features laid the foundation for a recommendation system that is not just
algorithmic but deeply personalized and context-aware.
At the heart of the system lies the machine learning model, carefully selected to
align with the unique demands of movie recommendation. NumPy and Pandas
played instrumental roles, seamlessly integrating numerical operations, array
manipulations, and efficient data manipulation into the training and prediction
phases. This synergy resulted in a robust and efficient model that forms the
backbone of our recommendation system.
optimize the model's predictive capabilities and responsiveness across a spectrum
of user profiles.
CHAPTER 10
References
[1]. Zhang, J.; Wang, Y.; Yuan, Z.; Jin, Q.; “Personalized
Real-Time Movie Recommendation System: Practical
Technology, vol: 25, 2020, pp: 180-191