
Project Problem Statement

The project developed a movie recommendation system using R and the Shiny framework, analyzing data from the IMDb dataset to suggest movies based on user preferences. It involved processing multiple TSV files containing movie information, implementing data manipulation and visualization techniques, and creating an interactive web application. The project enhanced skills in data handling, cleaning, and application development while providing insights into recommendation systems.

Uploaded by

Sanskar

Project Problem Statement & Dataset

Statistics with R Project by Mohit Narang 222113 & Neerh Bordoloi 221098

Overview

In this project, we developed a movie recommendation system using R to analyze data
from the IMDb dataset. The aim was to create an interactive web application that
suggests movies to users based on their preferences, using the Shiny framework. The
dataset included several TSV files containing information on movie titles, ratings,
genres, crew, and other relevant details. We started by reading the IMDb data files and
handling any parsing issues during data import. Next, we filtered the data to focus on
movies, merged the ratings data, and split genres for a more detailed analysis. The
recommendation system allows users to select a genre, rate three movies, and get top
10 recommendations based on average ratings and the number of votes. We also
included data visualization using `ggplot2` to present the results in an intuitive format.
Throughout the project, we utilized various R packages for data manipulation, cleaning,
and visualization, while implementing error handling to manage potential data issues.
The result is a functional and interactive system that offers users movie suggestions
tailored to their preferences.
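
The preparation and ranking steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the two small tables stand in for `title.basics.tsv` and `title.ratings.tsv` with made-up values, and the helper name `recommend_top` is hypothetical.

```r
library(dplyr)
library(tidyr)

# Toy stand-ins for title.basics.tsv and title.ratings.tsv (values are made up).
basics <- tibble::tibble(
  tconst       = c("tt0000001", "tt0000002", "tt0000003", "tt0000004"),
  titleType    = c("movie", "movie", "tvEpisode", "movie"),
  primaryTitle = c("Alpha", "Beta", "Gamma", "Delta"),
  genres       = c("Action,Drama", "Drama", "Comedy", "Action")
)
ratings <- tibble::tibble(
  tconst        = c("tt0000001", "tt0000002", "tt0000003", "tt0000004"),
  averageRating = c(8.1, 7.4, 9.0, 8.1),
  numVotes      = c(1200, 50000, 300, 90000)
)

# Keep movies only, attach ratings, and split the comma-separated genres
# so that each movie-genre pair gets its own row.
movies <- basics %>%
  filter(titleType == "movie") %>%
  inner_join(ratings, by = "tconst") %>%
  separate_rows(genres, sep = ",")

# Hypothetical helper: top n movies in one genre, ranked by average rating,
# with the number of votes used to break ties.
recommend_top <- function(data, selected_genre, n = 10) {
  data %>%
    filter(genres == selected_genre) %>%
    arrange(desc(averageRating), desc(numVotes)) %>%
    slice_head(n = n)
}

top_action <- recommend_top(movies, "Action")
```

Sorting on rating first and votes second is only one possible reading of "based on average ratings and the number of votes"; a minimum-vote threshold or a weighted score would be equally plausible choices.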

Dataset

The dataset used in this project was downloaded from [IMDb datasets](https://datasets.imdbws.com).
It consists of multiple tab-separated values (TSV) files
containing various types of information about movies, such as titles, ratings, genres,
crew members, and more. We worked with the following seven files; some fed the
recommendation system directly, while others were kept for context or future use:

1. name.basics.tsv: This file contains information about people in the film industry,
including their names, birth and death years, and the titles they've been involved with.
Although not used directly for recommendations, it provides additional context about
crew members.

2. title.akas.tsv: This file includes alternate titles for movies across different regions and
languages. It was used to account for variations in movie titles, ensuring comprehensive
matching when users input movie names.

3. title.basics.tsv: This file contains primary information about each title, such as its
name, release year, and genres. It served as the main source for filtering titles by type
(keeping movies only) and extracting genres for recommendation purposes. The genre
column was further split to allow multi-genre analysis.

4. title.crew.tsv: This file provides details on the directors and writers of movies. While
not directly used for recommendations, it could enhance future iterations of the system
by incorporating crew-based filtering.

5. title.episode.tsv: This file lists episode-specific information for TV shows, helping to
filter out non-movie entries from our data.

6. title.principals.tsv: It contains information about key cast and crew members
associated with movies. Though not utilized in the current version, it holds potential for
future enhancements in recommendations based on cast.

7. title.ratings.tsv: This file includes user ratings and the number of votes each movie
has received. It was critical for ranking movies by average rating and number of votes to
generate meaningful recommendations.

Each of these files was processed using R packages like `readr` for reading the data
and `dplyr` for filtering, merging, and cleaning the information to ensure quality
recommendations.
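
As an illustration of the import step, the sketch below reads a TSV file with `readr` and falls back gracefully if the read fails. It assumes the IMDb convention of marking missing fields with `\N`; the wrapper name `read_imdb_tsv` and the temporary sample file are hypothetical and exist only to keep the example self-contained.

```r
library(readr)

# Write a tiny TSV to a temporary file so the sketch runs on its own.
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c(
  "tconst\taverageRating\tnumVotes",
  "tt0000001\t5.7\t2000",
  "tt0000002\t\\N\t\\N"
), tsv_path)

# Hypothetical wrapper: read one IMDb-style TSV, returning NULL on failure.
read_imdb_tsv <- function(path) {
  tryCatch(
    read_tsv(
      path,
      na = "\\N",          # IMDb marks missing fields with \N
      quote = "",          # titles may contain stray quote characters
      col_types = cols(.default = col_character()),
      progress = FALSE
    ),
    error = function(e) {
      message("Could not read ", path, ": ", conditionMessage(e))
      NULL
    }
  )
}

ratings_raw <- read_imdb_tsv(tsv_path)

# Rows that readr could not parse cleanly can be inspected afterwards.
parse_issues <- problems(ratings_raw)
```

Reading every column as character first and converting types later is a cautious choice for large, irregular files; specifying exact column types in `col_types` would be a reasonable alternative.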

Tools and Technologies

- R: The primary programming language used for data analysis and building the
application.
- Shiny: Used for creating the interactive web-based movie recommendation system.
- dplyr: For data manipulation, filtering, and merging datasets.
- readr: To read TSV files and handle data import.
- tidyr: To transform and tidy the data, including splitting genres.
- ggplot2: For creating visualizations to enhance data presentation.
- stringr: For handling string operations during data preprocessing.

Expected Outcomes

Upon completing this project, we gained valuable skills and insights, including:

- Handling and parsing complex datasets in TSV format using `readr`.
- Implementing data cleaning techniques to manage missing values and handle irregular
data.
- Filtering, merging, and manipulating data frames using `dplyr` and `tidyr`.
- Developing an interactive web application using the Shiny framework in R.
- Designing a recommendation system that combines user input (a chosen genre and movie ratings) with ranking by average rating and number of votes.
- Visualizing data using `ggplot2` to display results in a user-friendly manner.
- Applying error handling techniques to manage data import issues gracefully.
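
The visualization outcome listed above could look something like the following horizontal bar chart of recommended titles. The data frame is a made-up placeholder for a top-N result, and the chart type is an assumption; the project may have used a different plot.

```r
library(ggplot2)

# Placeholder for a top-N recommendation result (values are made up).
top_movies <- data.frame(
  primaryTitle  = c("Delta", "Alpha", "Beta"),
  averageRating = c(8.1, 8.0, 7.4)
)

# Horizontal bars, ordered so the highest-rated title appears at the top.
p <- ggplot(
  top_movies,
  aes(x = reorder(primaryTitle, averageRating), y = averageRating)
) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "Average rating", title = "Top recommendations")
```

Printing `p` (or returning it from `renderPlot()` in a Shiny server function) draws the chart.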

Conclusion

This project provided a comprehensive introduction to data manipulation, interactive
application development, and data visualization in R. It enabled us to apply our skills in
a practical context, enhancing our understanding of recommendation systems. Thank
you for the opportunity to learn and grow through this project.
