POORVIKA
POORVIKA
A PROJECT REPORT
On
"Scalable AI-Driven Movie Recommender System with
Intelligent Engineering Principles"
Bachelor of Technology
In
Computer Science and Engineering (AI & ML)
SOET, CMR University, Bangalore
Submitted by:
POORVIKA A C (22BBTCA116)
JEEVITHA K (22BBTCA067)
KAMURTHY SRAVYA (22BBTCA073)
PANJALA NIKHILA (22BBTCA111)
CERTIFICATE
This is to certify that the project report entitled “Scalable AI-Driven Movie Recommender System
Computer Science and Engineering(AI & ML), SoET, CMR University, Bangalore during the
academic year 2024-25, under the supervision and guidance of Dr Elango S, SoET, CMR University.
Signature Signature
Dr. ELANGO S, Dr. MANJUNATH C R,
Assistant Professor, Professor and Head,
Dept of CSE(AI&ML) Dept of CSE(AI&ML)
i
TABLE OF CONTENTS
Chapter
Title Page No
No
ABSTRACT 1
1 INTRODUCTION
1.1 Background and Context
2-3
1.2 Problem Statement
1.3 Objectives
2 LITERATURE SURVEY
2.1 Traditional Approaches
4-5
2.2 Matrix Factorization and Latent Factor Models
2.3 Applications in the Entertainment Industry
3 DESIGN TOOLS
3.1 Python
3.2 Streamlit
3.3 Pickle 6-7
3.4 The Movie Database (TMDb) API
3.5 Requests Library
3.6 Jupyter Notebook / IPython
4 SYSTEM DESIGN
4.1 System Architecture Overview
8 - 10
4.2 Recommendation Flow diagrams
4.3 Key Design Components
5 SYSTEM ANALYSIS
5.1 Feasibility Study 12 - 14
5.2 System Limitations
6 RESULT ANALYSIS 15 – 16
7 CONCLUSION 17
REFERENCES 18
Personalized AI-Driven Movie Recommendation System
4AIML2061 - Software Engineering for AI
ABSTRACT
In the current digital entertainment landscape, users are presented with an overwhelming array of
movies and streaming options, making it increasingly challenging to decide what to watch next. With
thousands of films spanning multiple genres, languages, and platforms, viewers often experience
decision fatigue and may struggle to find content that truly aligns with their tastes and preferences.
This abundance of choice, although a positive sign of media growth, highlights the pressing need for
intelligent, personalized content delivery mechanisms.
To address this challenge, personalized recommendation systems have emerged as vital tools. These
systems leverage user behavior, preferences, and content similarity to suggest movies that are more
likely to be enjoyed by the individual. By analyzing patterns in user interactions and film
characteristics, such systems reduce the time and effort involved in content discovery, ultimately
enhancing user satisfaction and engagement.
This project proposes the design and development of a Personalized AI-Driven Movie
Recommender System that effectively combines both collaborative filtering and content-based
filtering techniques. The collaborative filtering component draws inferences from user-movie
interactions and similarities among user preferences, while the content-based filtering module
evaluates movie metadata such as genres, descriptions, and ratings to make accurate suggestions.
To enhance real-world applicability, the system is integrated with real-time data from The Movie
Database (TMDb) API, ensuring that users receive updated information about each recommended
movie. This includes not only the movie's title and genre but also rich metadata like high-resolution
posters, official overviews, user ratings, and embedded trailers.
The final output is delivered through a clean and interactive Streamlit-based web interface,
allowing users to input a movie they like and instantly receive a curated list of top-rated, genre-
matched, and thematically relevant movie suggestions. The integration of visual media and
recommendation logic creates a compelling user experience that goes beyond basic list-based
suggestions, offering a comprehensive entertainment guide tailored to each individual.
CHAPTER – 1
INTRODUCTION
1.1 Introduction
With the explosion of digital content in recent years, particularly in the entertainment industry, users
are faced with a daunting challenge: choosing what to watch. Modern streaming platforms offer
access to thousands of movies and TV shows across diverse genres, languages, and formats. While
this abundance provides viewers with unprecedented access to entertainment, it also introduces a
significant cognitive burden known as the “paradox of choice.” Users often find themselves
overwhelmed, scrolling through endless catalogs in search of something enjoyable, frequently giving
up or settling for content that doesn’t fully meet their interests
Traditional search and filter mechanisms on these platforms, although helpful, are often limited by
generic categorization and lack true personalization. They do not learn from the user's evolving
preferences or past interactions. As a result, users are either presented with trending content that
lacks personal relevance or have to manually search for movies that match their mood, taste, or
viewing history. This inefficiency not only affects user satisfaction but also impacts content
engagement and platform retention metrics.
To combat this challenge, the concept of personalized recommendation systems has gained
significant traction. By analyzing user behavior, preferences, and interactions, these systems can
suggest content that aligns with individual tastes, thus streamlining the decision-making process. The
integration of Artificial Intelligence (AI) and Machine Learning (ML) into these systems further
enhances their ability to learn and adapt dynamically, leading to more accurate and meaningful
recommendations.
1.3 Objectives
This project, titled “Personalized AI-Driven Movie Recommender System,” aims to develop a
smart and user-friendly solution that alleviates the content overload problem by offering precise
movie suggestions tailored to each user’s preferences. The primary objectives of the project are as
follows:
• To make use of real-world data from the TMDb (The Movie Database) API, allowing the
system to provide current and visually enriched movie information including trailers, posters,
genres, ratings, and overviews.
• To offer an intuitive and responsive user experience through an interactive frontend built
using Streamlit, where users can simply select a movie they like and instantly receive five
personalized movie recommendations.
• To analyze and evaluate the system’s performance based on key criteria such as
recommendation relevance, user interface usability, data accuracy, and overall responsiveness.
By accomplishing these objectives, the project seeks to demonstrate the effectiveness of AI-driven
techniques in solving real-world problems and to provide a practical, scalable solution that enhances
the everyday experience of movie selection and consumption.
CHAPTER – 2
LITERATURE SURVEY
Recommendation systems have become an integral part of digital platforms in the modern era,
especially within entertainment services such as Netflix, Amazon Prime, YouTube, and Hotstar.
These platforms rely heavily on sophisticated algorithms to provide personalized content to their
users, thereby improving user engagement and retention. Over the past two decades, research in the
field of recommendation systems has evolved from basic heuristics to complex machine learning and
deep learning models, each offering distinct advantages and trade-offs.
Traditional Approaches
Early recommendation systems primarily employed collaborative filtering (CF) and content-based
filtering (CBF) techniques. Collaborative filtering is based on the assumption that users who agreed
in the past will likely agree in the future as well. It uses the preferences of similar users (user-user) or
similar items (item-item) to generate recommendations. This method, while effective, suffers from
the cold start problem—difficulty in recommending items to new users or for new items due to lack
of historical data.
Content-based filtering, on the other hand, relies on item features such as genre, director, cast, and
textual descriptions. It creates user profiles based on their previously liked items and recommends
similar content. While this method works well for users with limited interaction data, it tends to limit
diversity by recommending only items similar to the user’s previous preferences.
To overcome the shortcomings of both methods, hybrid models were introduced. These models
combine multiple filtering techniques, often using a weighted, switching, or mixed hybridization
approach. Burke (2002) was one of the earliest researchers to formalize the concept of hybrid
recommender systems, which have since become the foundation of many commercial systems.
The Netflix Prize (2006) catalyzed significant advancements in recommender systems. Yehuda
Koren and his team introduced matrix factorization techniques, particularly Singular Value
Decomposition (SVD), which became the gold standard for collaborative filtering. These methods
decompose the user-item interaction matrix into latent factors representing underlying patterns, such
as user taste and movie genre preferences. Matrix factorization is scalable and accurate, but it still
requires sufficient data density.
More recently, deep learning has made its way into recommender systems, with techniques such as
Neural Collaborative Filtering (NCF), Autoencoders, and Recurrent Neural Networks (RNNs).
These models can learn complex user-item interactions and temporal dynamics. He et al. (2017)
proposed NCF, a neural network architecture that generalizes matrix factorization by using a multi-
layer perceptron to capture non-linear relationships.
In the domain of content-rich platforms, context-aware recommender systems have also emerged.
These systems consider contextual information such as time of day, location, and device type to
personalize recommendations. They are especially useful for mobile and on-demand services.
Streaming giants like Netflix and Amazon Prime Video heavily rely on hybrid and deep learning-
based recommendation systems. Netflix uses a multi-armed bandit framework for real-time
personalization and continuously A/B tests its algorithms. YouTube leverages a two-stage
recommendation pipeline using deep neural networks to serve billions of personalized videos daily.
Several open-source projects and academic implementations have attempted to replicate these
systems. One such example is MovieLens, a widely-used dataset developed by GroupLens Research
that has been used in thousands of recommender system studies.
Our project draws inspiration from both classical and modern approaches. It combines content-
based filtering (using genres and metadata from TMDb) and collaborative filtering (through a
similarity matrix built using cosine similarity on movie vectors). This hybrid approach provides a
balanced recommendation engine that delivers diverse yet relevant suggestions. By integrating the
TMDb API, the system maintains real-time accuracy and provides visually rich metadata, enhancing
user experience beyond simple text-based outputs.
This literature foundation provides the theoretical and practical groundwork upon which our
recommender system is built. It ensures that the project aligns with state-of-the-art practices while
maintaining accessibility and scalability for future expansion.
CHAPTER – 3
DESIGN TOOLS
To implement the Personalized AI-Driven Movie Recommender System, a range of powerful
design tools and technologies were used. Each tool was selected for its specific strengths, relevance
to the task, and ability to streamline different aspects of system development—from data processing
and backend logic to user interface design and real-time API integration
1. Python
Python is the core programming language used in the development of this project. It is highly
favored in the data science and machine learning communities due to its readable syntax, vast
ecosystem of libraries, and strong community support.
2. Streamlit
3. Pickle
The Pickle module in Python was used for model serialization. It stores the pre-computed data
structures needed by the recommender system:
• movie_list.pkl: A serialized list of all available movie titles and their corresponding metadata
Using Pickle improves performance and efficiency, as it allows loading large pre-processed data
objects without recalculating them at runtime.
TMDb API is a powerful and widely-used movie data service that provides structured information
on films, TV shows, and people in the entertainment industry. The API is RESTful and supports
JSON responses, making it ideal for integration into Python applications.
• Movie posters
API integration ensures that the recommender system remains up-to-date with real-time data,
enriching the user experience by providing contextually relevant and current information about each
recommended film.
5. Requests Library
Python’s requests library was used to interact with the TMDb API. A retry strategy with timeouts
and error handling was implemented using Retry and HTTPAdapter from urllib3 to ensure robust
API calls, especially when network issues or rate limits were encountered.
For initial development and experimentation, the backend logic and algorithm design were
prototyped in Jupyter Notebook. Its interactive interface allowed easy debugging, visualization of
data, and rapid iteration of models and logic.
CHAPTER - 4
SYSTEM DESIGN
The design of the Movie Recommender System involves both architectural and functional aspects
that work in synergy to deliver a smooth and responsive user experience. The system is modular,
scalable, and built with modern tools that allow seamless integration between data handling, logic
processing, and user interface. The design includes components such as the data model,
recommendation engine, API integration, and the front-end interface.
4.1 METHODOLOGY
The architecture of the system can be broken down into three major layers:
o Built using Streamlit, the UI allows the user to interact with the system by selecting a movie
title.
o Displays recommended movies with posters, ratings, genres, overviews, and trailers.
o Uses pre-computed similarity scores (cosine similarity) between movies stored in Pickle files.
o Communicates with the TMDb API to retrieve metadata and visual content.
3. Data Layer:
o Consists of:
▪ movie_list.pkl: Stores a DataFrame of movies with their titles and TMDb IDs.
▪ similarity.pkl: Stores a similarity matrix used to find movies most similar to the selected one.
• The selected title is mapped to its corresponding index in the similarity matrix.
b. Recommendation Engine
• Identifies the top 5 most similar movies, excluding the selected one.
• The similarity data is pre-computed and stored to ensure fast runtime responses.
• Poster images, genres, overview, average rating, and trailers are retrieved in real-time.
• The system handles retry logic to ensure smooth API interaction even during temporary failures.
• Uses Streamlit’s layout capabilities (like st.columns, st.expander, and st.video) to organize and
present the output neatly.
o Movie poster
o Title
o Expandable overview
4. Design Considerations
• Scalability: The modular design allows the addition of more movies by updating the dataset and
similarity matrix.
• Resilience: Network calls to the TMDb API include retry and timeout strategies, ensuring the app
remains functional under mild connectivity issues.
• Maintainability: Clearly separated frontend (Streamlit), backend (Python logic), and data (Pickle
files) layers make the system easy to update and debug.
This design ensures that the Movie Recommender System not only functions efficiently but also
provides a rich, engaging, and personalized user experience. The separation of concerns, robust error
handling, and interactive UI together contribute to a system that is both technically sound and user-
centric.
CHAPTER - 5
IMPLEMENTATION
The implementation of the NLP-based Movie Recommender System involves several key stages—
from data preparation and natural language processing to similarity computation, API integration,
and frontend visualization. Each component has been developed using Python and seamlessly
integrated to ensure a fast, user-friendly, and effective recommendation workflow.
1. Data Preparation
• The core dataset includes movie titles, overviews, genres, and unique IDs.
• This data is processed and stored in a DataFrame and serialized using Pickle for rapid loading
and scalability.
• Only textual fields are used in this project to demonstrate the power of NLP in semantic content-
based filtering.
2. Text Preprocessing
Text preprocessing is a critical step to prepare movie descriptions for vectorization. It includes:
These steps ensure uniformity in text and improve the quality of the feature vectors generated later.
3. TF-IDF Vectorization
• Captures the importance of each term relative to its frequency across the entire corpus.
tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(movie_data['overview'])
5. Recommendation Logic
• When a user selects a movie, its index is found in the movie list.
• The top 5 most similar movies are selected using the corresponding row in the similarity matrix.
• These movie indices are used to extract TMDb IDs to fetch metadata.
def recommend(movie):
distances = list(enumerate(similarity_matrix[index]))
• TMDb API is used to fetch live metadata, enhancing each recommendation visually and
contextually.
o Poster
o Overview
o Rating
o Genre
o YouTube Trailer
import requests
def fetch_movie_details(movie_id):
metadata_url = f'https://api.themoviedb.org/3/movie/{movie_id}?api_key=API_KEY'
trailer_url = f'https://api.themoviedb.org/3/movie/{movie_id}/videos?api_key=API_KEY'
7. Streamlit Frontend
• Components used:
st.set_page_config(page_title="Movie Recommender")
if st.button("Show Recommendations"):
results = recommend(selected_movie)
st.image(movie['poster'])
st.markdown(movie['title'])
• The requests session includes retry logic using urllib3’s Retry and HTTPAdapter.
• Streamlit’s responsive nature ensures that even failed API fetches do not crash the app.
CHAPTER – 6
RESULT ANALYSIS
In the evolving landscape of digital entertainment, the challenge of choosing a movie from thousands
of available titles has become increasingly complex. The Personalized AI-Driven Movie
Recommender System was conceptualized to analyze this problem and provide a tailored solution
based on user preferences. This section explores the problem definition, feasibility, and requirements
that shape the system’s design and functionality.
Feasibility Study
a. Technical Feasibility
The project leverages Python for the backend logic, which supports popular libraries such as pickle,
requests, and pandas, ensuring strong data handling and API integration. Streamlit is used for the
frontend due to its simplicity and compatibility with data-driven applications. TMDb's API provides
a reliable and free resource for retrieving dynamic movie data.
b. Operational Feasibility
The system operates smoothly on any device with a modern web browser and internet connectivity.
Since the application is built using Streamlit, it requires minimal setup and is user-friendly, making it
suitable for non-technical users as well.
c. Economic Feasibility
The system uses free or open-source tools, such as TMDb’s API (within rate limits), Streamlit, and
local .pkl files for movie data, which makes it economically viable for academic or personal use.
Cloud deployment (optional) can be done on low-cost platforms like Heroku or Streamlit Cloud.
Functional Requirements
• Recommendation Output: Based on the selected movie, the system displays the top 5
recommended movies.
• Movie Metadata: Each recommended movie is shown with its poster, rating, genre, and
overview.
• Trailer Support: Embedded trailers are played (if available) using data from the TMDb API.
Non-Functional Requirements
• Scalability: The modular code design allows for easy integration of more movies or alternative
recommendation algorithms.
• Reliability: The system uses retry logic for API requests to ensure data availability even during
intermittent connection issues.
• Usability: Simple UI layout with clear instructions for users. The app requires no prior training
or technical knowledge to operate.
System Limitations
• The recommendations are based solely on similarity data and are not adjusted for user feedback
over time.
• Relies on TMDb API, which has request limits unless upgraded to a paid plan.
• Does not support multilingual recommendations or advanced personalization like user login
history or preferences.
CHAPTER -7
CONCLUSION
The Personalized AI-Driven Movie Recommender System successfully addresses one of the most
pressing challenges in the digital content space: helping users navigate an overwhelming number of
choices and discover movies tailored to their personal tastes. In today’s world, where digital content
consumption is at an all-time high and continues to grow, personalized recommendation engines are
not just optional enhancements but essential tools for improving user experience and platform
engagement.
This project has achieved its goal of designing and implementing a hybrid recommender system that
combines the strengths of content-based filtering and collaborative filtering to deliver intelligent
and contextually relevant movie suggestions. By integrating precomputed similarity matrices with
real-time metadata from the TMDb API, the system offers more than just textual suggestions—it
provides a visually engaging and informative experience. Users can interact with the system via an
intuitive Streamlit-based UI, receive top 5 movie recommendations based on their input, and access
rich media content such as posters, overviews, genres, ratings, and embedded trailers.
One of the major highlights of the system is its simplicity in design coupled with powerful
functionality. The use of open-source tools and APIs ensures that the system is cost-effective and
accessible, while the modular architecture ensures ease of maintenance and extensibility. For
instance, additional features such as user login, preference tracking, sentiment analysis from user
reviews, or even reinforcement learning-based recommendation feedback loops could be integrated
in the future without overhauling the existing framework.
Furthermore, this project has helped in deepening our understanding of key machine learning
concepts, system architecture, API integration, and frontend/backend synchronization. It also
provided practical experience in solving real-world problems using AI-driven approaches, reflecting
the current trends in tech industries where intelligent automation is becoming the standard.
In conclusion, the system stands as a robust, scalable, and user-centric application that delivers
intelligent movie recommendations in a visually engaging and responsive environment. It not only
meets the objectives set at the beginning of the project but also lays a strong foundation for future
enhancements, research, and deployment in larger-scale platforms.
REFERENCES
1. He, X., Liao, L., Zhang, H., Nie, L., Hu, X., & Chua, T. S. (2017).
Neural Collaborative Filtering. Proceedings of the 26th International Conference on World Wide Web
(WWW '17).
https://doi.org/10.1145/3038912.3052569
2. Aggarwal, C. C. (2016).
Recommender Systems: The Textbook. Springer. ISBN: 978-3-319-29658-1
https://link.springer.com/book/10.1007/978-3-319-29659-8
10. McInerney, J., Hill, S., Volkovs, M., & Larochelle, H. (2020).
Explore, Exploit, and Explain: Personalizing Explainable Recommendations with Bandits. Proceedings
of the 13th ACM International Conference on Web Search and Data Mining (WSDM '20).
https://doi.org/10.1145/3336191.3371792