Project Problem Statement

The project developed a movie recommendation system using R and the Shiny framework, analyzing data from the IMDb dataset to suggest movies based on user preferences. It involved processing multiple TSV files containing movie information, implementing data manipulation and visualization techniques, and creating an interactive web application. The project enhanced skills in data handling, cleaning, and application development while providing insights into recommendation systems.

Uploaded by

Sanskar

Project Problem Statement & Dataset

Statistics with R Project by Mohit Narang 222113 & Neerh Bordoloi 221098

Overview

In this project, we developed a movie recommendation system in R that analyzes data
from the IMDb dataset. The aim was to create an interactive web application, built with
the Shiny framework, that suggests movies to users based on their preferences. The
dataset comprises several TSV files containing movie titles, ratings, genres, crew, and
other details. We started by reading the IMDb data files and handling the parsing issues
that arose during import. Next, we filtered the data to keep only movies, merged in the
ratings data, and split the genres column so that each genre could be analyzed
separately. The recommendation system lets users select a genre, rate three movies, and
receive the top 10 recommendations based on average rating and number of votes. We
also used `ggplot2` to present the results visually. Throughout the project, we relied
on R packages for data manipulation, cleaning, and visualization, and implemented error
handling to manage potential data issues. The result is a functional, interactive system
that offers movie suggestions tailored to each user's preferences.
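
The ranking step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the toy `movies` table, the `recommend_top()` helper, and the `min_votes` threshold are assumptions introduced here. Only the idea of ordering titles within a chosen genre by average rating and number of votes comes from the text; the column names follow the IMDb files.

```r
library(dplyr)
library(ggplot2)

# Toy data standing in for the merged title/ratings table (values are invented).
movies <- data.frame(
  primaryTitle  = c("Film A", "Film B", "Film C", "Film D"),
  genre         = c("Drama", "Drama", "Comedy", "Drama"),
  averageRating = c(8.1, 8.1, 7.4, 9.0),
  numVotes      = c(1200, 5400, 800, 15)
)

# Hypothetical helper: keep one genre, drop titles with very few votes
# (min_votes is an assumed threshold), then order by rating with ties
# broken by vote count, and keep the top n rows.
recommend_top <- function(data, chosen_genre, n = 10, min_votes = 100) {
  data %>%
    filter(genre == chosen_genre, numVotes >= min_votes) %>%
    arrange(desc(averageRating), desc(numVotes)) %>%
    slice_head(n = n)
}

top_drama <- recommend_top(movies, "Drama")

# Horizontal bar chart of the recommended titles.
p <- ggplot(top_drama,
            aes(x = reorder(primaryTitle, averageRating), y = averageRating)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "Average rating", title = "Top Drama recommendations")
```

A vote threshold of this kind keeps a title with a high rating but only a handful of votes (such as "Film D" in the toy data) from outranking widely rated films; whether the original application applied one is not stated.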

Dataset

The dataset used in this project was downloaded from
[IMDb datasets](https://datasets.imdbws.com). It consists of multiple tab-separated
values (TSV) files containing various types of information about movies, such as titles,
ratings, genres, crew members, and more. We worked with the following seven files; some
drive the recommendations directly, while others supply context or are reserved for
future extensions:

1. name.basics.tsv: This file contains information about people in the film industry,
including their names, birth and death years, and the titles they've been involved with.
Although not used directly for recommendations, it provides additional context about
crew members.

2. title.akas.tsv: This file includes alternate titles for movies across different regions and
languages. It was used to account for variations in movie titles, ensuring comprehensive
matching when users input movie names.

3. title.basics.tsv: This file contains primary information about each title, such as its name,
release year, and genres. It served as the main source for filtering titles by type (keeping
movies only) and extracting genres for recommendation purposes. The genres column
was further split to allow multi-genre analysis.

4. title.crew.tsv: This file provides details on the directors and writers of movies. While
not directly used for recommendations, it could enhance future iterations of the system
by incorporating crew-based filtering.

5. title.episode.tsv: This file lists episode-specific information for TV shows, helping to
filter out non-movie entries from our data.

6. title.principals.tsv: This file contains information about key cast and crew members
associated with movies. Though not utilized in the current version, it holds potential for
future enhancements in recommendations based on cast.

7. title.ratings.tsv: This file includes user ratings and the number of votes each movie
has received. It was critical for ranking movies by average rating and number of votes to
generate meaningful recommendations.

Each of these files was processed using R packages like `readr` for reading the data
and `dplyr` for filtering, merging, and cleaning the information to ensure quality
recommendations.
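
The import and preparation steps can be sketched as below. This is an illustration under stated assumptions rather than the project's code: the two small files are written to a temporary directory with invented rows, and the choice of an inner join is an assumption. The settings `na = "\\N"` (IMDb marks missing values with `\N`) and `quote = ""` (titles can contain unbalanced double quotes, a common cause of parsing warnings) are typical for these files, but the report does not say which options the authors used.

```r
library(readr)
library(dplyr)
library(tidyr)

# Write two tiny sample files in the layout of title.basics.tsv and
# title.ratings.tsv. All rows are invented for illustration.
basics_path  <- tempfile(fileext = ".tsv")
ratings_path <- tempfile(fileext = ".tsv")
writeLines(c(
  "tconst\ttitleType\tprimaryTitle\tstartYear\tgenres",
  "tt0000001\tmovie\tFilm A\t1999\tDrama,Romance",
  "tt0000002\ttvSeries\tShow B\t2005\tComedy",
  "tt0000003\tmovie\tFilm C\t\\N\t\\N"
), basics_path)
writeLines(c(
  "tconst\taverageRating\tnumVotes",
  "tt0000001\t7.9\t1500",
  "tt0000002\t8.3\t900"
), ratings_path)

# Treat \N as missing and disable quote handling while reading.
basics  <- read_tsv(basics_path,  na = "\\N", quote = "", show_col_types = FALSE)
ratings <- read_tsv(ratings_path, na = "\\N", quote = "", show_col_types = FALSE)

movies_long <- basics %>%
  filter(titleType == "movie") %>%        # keep movies only
  inner_join(ratings, by = "tconst") %>%  # attach averageRating and numVotes
  filter(!is.na(genres)) %>%              # drop titles with no genre listed
  separate_rows(genres, sep = ",")        # one row per (movie, genre) pair
```

Splitting the comma-separated `genres` column into one row per genre is what lets a multi-genre film appear under each of its genres when the user picks one.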

Tools and Technologies

- R: The primary programming language used for data analysis and building the
application.
- Shiny: Used for creating the interactive web-based movie recommendation system.
- dplyr: For data manipulation, filtering, and merging datasets.
- readr: To read TSV files and handle data import.
- tidyr: To transform and tidy the data, including splitting genres.
- ggplot2: For creating visualizations to enhance data presentation.
- stringr: For handling string operations during data preprocessing.
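
To show how these pieces fit together in Shiny, here is a bare-bones application skeleton. It is a sketch only: the genre list, the input and output IDs (`genre`, `recommendations`), and the placeholder table are assumptions made for illustration, and the original application's layout is not described in this report.

```r
library(shiny)

# Placeholder genre list; a real app would derive it from the prepared data.
genres <- c("Action", "Comedy", "Drama")

ui <- fluidPage(
  titlePanel("Movie recommendations"),
  sidebarLayout(
    sidebarPanel(
      selectInput("genre", "Choose a genre:", choices = genres)
    ),
    mainPanel(
      tableOutput("recommendations")
    )
  )
)

server <- function(input, output, session) {
  # Re-runs whenever the selected genre changes.
  output$recommendations <- renderTable({
    # Placeholder result; the ranked movie table would be returned here.
    data.frame(genre = input$genre, note = "top titles would be listed here")
  })
}

app <- shinyApp(ui, server)
# In an interactive session, runApp(app) starts the application.
```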

Expected Outcomes

Upon completing this project, we gained valuable skills and insights, including:

- Handling and parsing complex datasets in TSV format using `readr`.
- Implementing data cleaning techniques to manage missing values and handle irregular
data.
- Filtering, merging, and manipulating data frames using `dplyr` and `tidyr`.
- Developing an interactive web application using the Shiny framework in R.
- Designing a recommendation system driven by user input (genre choice and movie ratings) and ranked by average rating and number of votes.
- Visualizing data using `ggplot2` to display results in a user-friendly manner.
- Applying error handling techniques to manage data import issues gracefully.
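
One way to handle import problems gracefully is to wrap the read in `tryCatch()`. The helper below, `safe_read_tsv()`, is a hypothetical name and a minimal sketch of the idea; the report does not show how the authors implemented their error handling.

```r
library(readr)

# Hypothetical wrapper: returns a tibble on success, or NULL (with a message)
# if the file is missing or cannot be read at all.
safe_read_tsv <- function(path) {
  tryCatch(
    {
      data <- read_tsv(path, na = "\\N", quote = "", show_col_types = FALSE)
      issues <- problems(data)  # rows readr could not parse as expected
      if (nrow(issues) > 0) {
        message(nrow(issues), " parsing issue(s) found in ", basename(path))
      }
      data
    },
    error = function(e) {
      message("Could not read ", path, ": ", conditionMessage(e))
      NULL
    }
  )
}

# A path that does not exist yields NULL instead of stopping the script.
missing_data <- safe_read_tsv(file.path(tempdir(), "no_such_file.tsv"))
```

Returning `NULL` lets the calling code decide whether to skip the file or stop, while parsing issues in otherwise readable files are reported without discarding the data.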

Conclusion

This project provided a comprehensive introduction to data manipulation, interactive
application development, and data visualization in R. It enabled us to apply our skills in
a practical context, enhancing our understanding of recommendation systems. Thank
you for the opportunity to learn and grow through this project.
