WT MiniProj
Abstract
Date:
In today’s digital age, users are overwhelmed with vast content choices, especially in the entertainment
industry. Recommendation systems serve as intelligent filters that help users discover content tailored to
their preferences. This mini-project focuses on developing a content-based movie recommendation
system using Python’s Scikit-learn library.
The core idea is to utilize movie metadata such as genres, directors, actors, and plot summaries to
compute the similarity between movies and recommend titles similar to the one provided by the user. By
using TF-IDF vectorization on textual data and cosine similarity as the metric for measuring content
closeness, the system can effectively identify and suggest related movies.
This system follows a content-based filtering approach, meaning it does not depend on user ratings or
collaborative inputs. It builds a profile for each movie based on its metadata and matches it against other
movies. This makes the system especially useful in scenarios where user data is scarce or unavailable.
The goal of this project is not only to develop a functional recommendation engine but also to explore the
practical implementation of Natural Language Processing (NLP) techniques, feature engineering, and
machine learning methodologies. By the end of this project, students will have a deeper understanding of
how real-world recommendation engines work and how machine learning models can be deployed to
enhance user experience in digital applications.
This system lays the groundwork for future improvements such as incorporating user preferences, building
hybrid recommendation engines, and integrating the solution into web or mobile applications.
Project Members
Sabale Nikita,
Shaikh Khushi,
Sahane Sahil.
Mini-Project Web Technology Lab (2024-25) Third Year Computer Engineering, MET’s, IOE, BKC Nashik 1
2. Introduction
In the era of digital transformation, people are exposed to an overwhelming amount of multimedia
content—particularly movies and TV shows—across various platforms like Netflix, Amazon Prime, and
YouTube. The abundance of content can make it difficult for users to decide what to watch next. To
resolve this dilemma, recommendation systems have become a vital part of digital platforms, guiding
users by predicting and suggesting items of interest.
A recommendation system is a software application that filters information to present users with items
(movies, books, products, etc.) that are most relevant to their interests. These systems rely on data
analysis, machine learning, and user interaction history to provide personalized suggestions.
Recommendation systems are mainly of three types: collaborative filtering, content-based filtering, and
hybrid models. Each has its strengths, depending on the data available and the use case.
The project leverages Python, one of the most popular programming languages for data science, and
Scikit-learn, a powerful machine learning library. It employs TF-IDF (Term Frequency-Inverse
Document Frequency) to convert textual metadata into numerical vectors, allowing us to measure the
cosine similarity between movies.
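As a point of reference, the textbook form of the TF-IDF weight can be written as below. Here tf(t, d) is the number of times term t occurs in the text of movie d, N is the number of movies, and df(t) is the number of movies whose text contains t. This is the standard definition rather than the exact formula applied in this project; Scikit-learn's TfidfVectorizer uses a smoothed variant by default, so its weights differ slightly.

```latex
\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \cdot \log\frac{N}{\mathrm{df}(t)}
```

A term that appears in every movie has df(t) = N, so its weight is zero in this form, while a term found in only a few movies receives a larger weight.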
By integrating machine learning and natural language processing (NLP) techniques, this project not only
delivers practical experience but also mimics how modern movie recommendation systems function
behind the scenes. The project also encourages students to understand the importance of data
preprocessing, vectorization of text, feature selection, and model evaluation.
3. Literature Survey
Recommendation systems have undergone significant evolution in recent years, transforming from static,
rule-based systems to dynamic, machine learning-driven models. This transformation is largely driven by
the exponential growth of digital content and user data, as well as advances in computing power and
algorithms. The literature on recommendation systems can be broadly classified into three major
categories: content-based filtering, collaborative filtering, and hybrid approaches.
Content-based filtering has been a popular technique in early recommendation engines. It works on the
premise of recommending items similar to those the user has liked in the past, based on item features. For
instance, Pazzani and Billsus (1997) discussed adaptive web-based systems that suggest content to users
by learning from their preferences. In the context of movies, metadata such as genre, cast, director, and
plot summaries are commonly used.
Collaborative filtering, on the other hand, relies on the collective behavior of users. It was popularized
by platforms like Amazon and Netflix, which collect large-scale user interaction data. In particular,
user-based collaborative filtering compares a target user’s preferences with those of other users to recommend new
items, while item-based collaborative filtering focuses on the similarity between items. Sarwar et al.
(2001) introduced item-based collaborative filtering that significantly improved scalability and accuracy
in large datasets.
Hybrid models aim to overcome the limitations of both content-based and collaborative filtering. They
combine the strengths of each method and offer better accuracy and adaptability. For example, Netflix's
recommendation system uses a hybrid approach by integrating content analysis with collaborative
behavior.
Recent advancements in recommendation systems also involve the use of deep learning and NLP
techniques. Models such as Word2Vec, LSTMs, and transformers are used to capture deeper semantic
relationships in content and user preferences. However, these models require a vast amount of data and
computational resources, which may not be practical for every use case or educational setting.
This mini-project uses a more traditional yet effective approach—content-based filtering with TF-IDF
vectorization and cosine similarity—to demonstrate the core principles of recommendation systems.
This methodology is widely recognized in academic literature as an excellent starting point for building
4. Problem Statement
5. Motivation
In today’s digital age, users are exposed to an enormous amount of content, especially in the entertainment
industry where thousands of movies are released each year and countless titles are available on streaming
platforms. With this ever-increasing library of content, users often face the dilemma of choice overload.
Instead of enjoying the experience of watching movies, they end up wasting time scrolling endlessly to
find something worthwhile. This issue is not only frustrating for users but also detrimental to the
engagement goals of platforms like Netflix, Amazon Prime, Disney+, and others.
The motivation behind this project stems from the real-world need to simplify the content discovery
process. People want relevant suggestions that align with their tastes without the hassle of searching for
them manually. Recommendation systems have emerged as a solution to this problem, acting as digital
curators that tailor suggestions based on user interests. Among the different types of recommendation
engines, content-based filtering stands out when dealing with new users or cold-start problems, where
there isn’t enough interaction data to support collaborative filtering methods.
This project is particularly motivating for students and learners in the data science and machine learning
fields, as it offers a hands-on opportunity to work with:
Another compelling reason for choosing this project is its practicality and relatability. Most people are
familiar with movie recommendations—they experience them daily on platforms they use. Hence,
understanding how such systems work not only satisfies technical curiosity but also provides insights into
how data-driven decisions influence user behavior and platform engagement.
From an educational perspective, this project is highly beneficial. It helps bridge the gap between theory
and real-world application by demonstrating how basic machine learning algorithms can be used to create
meaningful products. It also introduces foundational skills like data preprocessing, feature extraction, and
model evaluation.
6. Scope of the Project
The scope of this project revolves around developing a functional, content-based movie recommendation
system using machine learning techniques, particularly the Scikit-learn library in Python. The system
focuses on recommending movies similar to a given movie, based solely on metadata attributes such as
genre, director, actors, and plot summary.
This project strictly uses content-based filtering. It does not utilize user preferences, ratings, or behavior
data. The system relies on analyzing movie features alone to make recommendations. This approach is
beneficial in scenarios where user interaction data is limited or unavailable, making it highly suitable
for startup platforms or new applications.
Metadata-Driven Recommendation
Movie Title
Genre
Director
Actors
Plot
These fields are preprocessed and combined to form a textual corpus for each movie. This metadata is
then vectorized using TF-IDF, which helps capture the importance of terms across all movies. The use of
textual data aligns the project with basic Natural Language Processing (NLP) concepts.
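The combination and vectorization step described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the three movie records, the column names, and the choice to leave the title out of the corpus are all assumptions made for the example.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy records; the column names loosely mirror the fields listed above.
movies = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C"],
    "genre": ["Action Sci-Fi", "Action Thriller", "Romance Drama"],
    "director": ["Jane Roe", "Jane Roe", "John Doe"],
    "actors": ["Actor One Actor Two", "Actor Two Actor Three", "Actor Four"],
    "plot": [
        "a pilot explores a distant moon",
        "a pilot uncovers a conspiracy",
        "two strangers meet on a train",
    ],
})

# Build one text document per movie by joining the metadata fields.
fields = ["genre", "director", "actors", "plot"]
movies["corpus"] = movies[fields].fillna("").apply(" ".join, axis=1)

# TF-IDF gives lower weight to terms that occur in many movies (here, "actor").
vectorizer = TfidfVectorizer(stop_words="english")
tfidf_matrix = vectorizer.fit_transform(movies["corpus"])

print(tfidf_matrix.shape)  # (number of movies, vocabulary size)
```

Each row of `tfidf_matrix` is the numerical profile of one movie, which is the input that the similarity step operates on.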
Similarity-Based Recommendation
After feature extraction, the similarity between movies is calculated using cosine similarity, a widely
used metric for text-based data comparison. This mathematical approach ensures that only the most
relevant and contextually similar movies are recommended to the user.
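Cosine similarity is the dot product of two vectors divided by the product of their lengths, so it depends on the direction of the vectors and not on their magnitude. The sketch below computes it by hand with NumPy and also through Scikit-learn's `cosine_similarity`; the three vectors are hypothetical term-count vectors chosen for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def cosine(u, v):
    """Cosine of the angle between two non-zero vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical term-count vectors for three movies over a 4-term vocabulary.
a = np.array([1.0, 1.0, 0.0, 0.0])
b = np.array([1.0, 0.0, 1.0, 0.0])
c = np.array([0.0, 0.0, 0.0, 2.0])

print(cosine(a, b))  # about 0.5: one of the two terms in each movie is shared
print(cosine(a, c))  # 0.0: the movies share no terms

# The library call returns the full pairwise matrix in one step.
sim = cosine_similarity(np.vstack([a, b, c]))
```

Because term weights are never negative, the scores fall between 0 (no shared terms) and 1 (identical direction).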
7. Objectives
The primary objective of this mini-project is to develop a content-based movie recommendation system
using Python and the Scikit-learn library. The system should be able to recommend similar movies based
on a user's input (e.g., favorite movie) by analyzing the content features such as genres, keywords, cast,
and director. This project explores natural language processing (NLP), vectorization, cosine similarity,
and other machine learning techniques for recommendation.
In the rapidly evolving domain of data science and artificial intelligence, recommendation systems have
emerged as powerful tools that influence user choices and enhance user experience across various
platforms. In keeping with this, the system analyzes a movie’s metadata and suggests similar movies
to the user based on content similarity.
This project’s key aim is to design a model that can automatically identify and recommend movies that
share similar characteristics with a given movie. We achieve this by analyzing textual metadata such as
keywords, genres, cast, and director. By using Natural Language Processing (NLP) and machine
learning tools such as CountVectorizer and cosine similarity, the system transforms movie features into
a mathematical form that allows for similarity calculations.
Another important objective is to ensure that the system is user-friendly, fast, scalable, and easily
integrable into larger applications or platforms. The solution should also serve as a foundation for more
complex systems involving hybrid models or collaborative filtering techniques.
8. Requirement Analysis
A thorough understanding of the system’s functional and non-functional requirements is essential for
effective development. This section outlines what the system is expected to do and how it should perform
under various constraints.
Functional Requirements
Non-Functional Requirements
Usability: The system should be easy to understand and use, especially by individuals who are not
data scientists or programmers.
Efficiency: Recommendations should be computed and returned within 1–2 seconds.
Maintainability: The system must be built with modular code that can be reused or extended.
Portability: It should work across multiple operating systems with minimal setup.
Data Requirements
The system uses a dataset containing over 4800 movies with metadata like genres, keywords, cast,
and directors.
Null values must be handled appropriately.
Text preprocessing is required for feature extraction and vectorization.
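One way to meet the null-handling and preprocessing requirements is sketched below. The two sample rows and the helper name `clean_text_columns` are hypothetical, and lower-casing and trimming are assumed as the preprocessing steps; the real dataset may need more.

```python
import pandas as pd

# Hypothetical rows standing in for the real dataset; None marks a missing value.
raw = pd.DataFrame({
    "title": ["Movie A", "Movie B"],
    "genres": ["Action Adventure", None],
    "keywords": [None, "space war"],
    "cast": ["Actor One", "Actor Two"],
    "director": ["Jane Roe", None],
})

text_columns = ["genres", "keywords", "cast", "director"]

def clean_text_columns(df, columns):
    """Return a copy with nulls replaced by empty strings and text normalised."""
    cleaned = df.copy()
    for col in columns:
        cleaned[col] = (
            cleaned[col]
            .fillna("")      # missing metadata becomes an empty string
            .astype(str)
            .str.lower()     # case-fold so "Action" and "action" match
            .str.strip()
        )
    return cleaned

clean = clean_text_columns(raw, text_columns)
print(clean[text_columns].isnull().sum().sum())  # count of remaining nulls
```

Working on a copy leaves the raw data untouched, which makes it easier to re-run the cleaning with different rules.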
9. Software Requirement Specification (SRS)
The SRS provides a detailed description of the system, its components, functionalities, and constraints. It
acts as a reference for developers throughout the lifecycle of the project.
1. Introduction
The recommendation system uses machine learning and text mining techniques to recommend movies
similar to the one input by the user. It is a standalone system that can later be integrated into a web or
mobile application.
2. Product Features
4. Performance Requirements
5. Constraints
10. Methodology and Proposed System Block Diagram
Methodology
This project uses content-based filtering which relies on movie metadata to compute similarity. The
methodology is broken into several phases:
1. Data Collection and Cleaning: The dataset is imported, and null fields in keywords, cast, genres,
and director are filled with blank strings.
2. Feature Engineering: The selected features are combined into a single string for each movie entry.
3. Vectorization: CountVectorizer converts the combined text into term-count vectors.
4. Similarity Measurement: Cosine similarity between the vectors identifies movies with the
most overlap in features.
5. Recommendation Engine: Based on the similarity matrix, the top N most similar movies are recommended
to the user.
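The last phase, selecting the top N entries from the similarity matrix, can be sketched in isolation as follows. The 4x4 matrix and the helper name `top_n_similar` are hypothetical; this is one possible way to do the lookup, not necessarily the one used in the project.

```python
import numpy as np

def top_n_similar(similarity, index, n=3):
    """Return the positions of the n items most similar to `index`, excluding itself."""
    scores = similarity[index].astype(float).copy()
    scores[index] = -np.inf                     # never recommend the query movie itself
    order = np.argsort(-scores, kind="stable")  # descending; ties keep original order
    return order[:n].tolist()

# Hypothetical cosine-similarity matrix (symmetric, ones on the diagonal).
sim = np.array([
    [1.0, 0.8, 0.1, 0.4],
    [0.8, 1.0, 0.3, 0.2],
    [0.1, 0.3, 1.0, 0.9],
    [0.4, 0.2, 0.9, 1.0],
])

print(top_n_similar(sim, 0, n=2))  # [1, 3]
```

Masking the query movie explicitly, instead of dropping the first sorted entry, keeps the result correct even when another movie has identical metadata and therefore also scores 1.0.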
ACTIVITY DIAGRAM:
Start
↓
Load and Clean Dataset
↓
Combine Features
↓
Apply Vectorization
↓
Calculate Cosine Similarity
↓
Take User Input
↓
Find Similar Movies
↓
Display Results
↓
End
11. Implementation
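import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Load dataset
df = pd.read_csv('movie_dataset.csv')

# Fill null metadata fields with blank strings so they can be concatenated
features = ['keywords', 'cast', 'genres', 'director']
for feature in features:
    df[feature] = df[feature].fillna('')

# Combine features into a single string per movie
def combine_features(row):
    return row['keywords'] + " " + row['cast'] + " " + row['genres'] + " " + row['director']

df['combined_features'] = df.apply(combine_features, axis=1)

# Vectorize the combined text and compute the pairwise cosine similarity
count_matrix = CountVectorizer().fit_transform(df['combined_features'])
cosine_sim = cosine_similarity(count_matrix)

# Helpers that map between a title and the dataset's "index" column
# (this assumes the "index" column matches the row position)
def get_title_from_index(index):
    return df[df.index == index]["title"].values[0]

def get_index_from_title(title):
    try:
        return df[df['title'] == title]["index"].values[0]
    except IndexError:
        return None

# Recommendation function: the 10 movies most similar to the given title
def recommend_movies(movie_title):
    movie_index = get_index_from_title(movie_title)
    if movie_index is None:
        return "Movie not found in dataset."
    similar_movies = list(enumerate(cosine_sim[movie_index]))
    # Sort by similarity, highest first; skip the first entry, which is the movie itself
    sorted_movies = sorted(similar_movies, key=lambda x: x[1], reverse=True)[1:11]
    return [get_title_from_index(element[0]) for element in sorted_movies]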
12. Results and Test Cases
Test Case 1
Input: "Avatar"
Expected Output: Top 10 similar movies.
Actual Output: ["Aliens", "Titanic", "Guardians of the Galaxy", ...]
Test Case 2
Test Case 3
In these tests, the system returns relevant movie recommendations within a second.
13. Challenges Faced
Data Preprocessing: Handling missing values and inconsistent formats in the dataset required
careful cleaning.
String Matching: Ensuring exact title matches with user input was tricky and could be improved
using fuzzy matching.
Vectorization Issues: Giving the vectorizer too many noisy terms, or too few features, sometimes
led to poor recommendations.
Interpretability: Making the recommendations understandable and justifiable for users.
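The fuzzy matching mentioned under String Matching could be approached with Python's standard `difflib` module, as in the sketch below. The helper name `find_closest_title` and the 0.6 cutoff are assumptions for illustration; the titles are taken from the sample output in the test cases.

```python
import difflib

def find_closest_title(query, titles, cutoff=0.6):
    """Return the title closest to `query` (case-insensitive), or None if nothing is close enough."""
    lookup = {t.lower(): t for t in titles}
    matches = difflib.get_close_matches(query.lower(), list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else None

titles = ["Avatar", "Aliens", "Titanic", "Guardians of the Galaxy"]

print(find_closest_title("avatr", titles))  # Avatar
print(find_closest_title("zzzz", titles))   # None
```

A lookup like this could run before the exact-match step, so that a small typo in the user's input still resolves to a title in the dataset.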
14. Conclusion and Future Scope
This project demonstrated the successful development of a basic content-based movie recommendation
system using Scikit-learn. By using vectorization and cosine similarity, the system recommends movies
similar to the one provided by the user. In the test cases run, the results were relevant and well
suited to casual film discovery.
Future Enhancements
15. References