Final
INTRODUCTION
Objectives of EDA:
• The dataset consists of 250 records from the IMDb Top 250 list for Indian movies,
featuring essential attributes such as ranking, title, rating, release year, duration, genre,
and a brief description.
STEPS OF EDA
Data Cleaning Steps
• Univariate Analysis: Explored IMDb ratings and movie release years through visualizations.
• Bivariate Analysis: Examined relationships using scatterplots and correlation matrices.
• Multivariate Analysis: Utilized dimensionality reduction and advanced visualizations.
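As a rough illustration of the univariate and bivariate steps, the sketch below bins ratings into a frequency table and computes a Pearson correlation between duration and rating. It uses only the Python standard library. The sample rows, the field names (`year`, `duration_min`, `rating`) and the helper `pearson` are assumptions made for this example; they are not taken from the actual dataset.

```python
from collections import Counter
from math import sqrt
from statistics import mean

# Hypothetical sample rows; the real dataset has 250 records and its exact
# column names are not given in this document.
movies = [
    {"title": "Movie A", "year": 1975, "duration_min": 204, "rating": 8.1},
    {"title": "Movie B", "year": 1987, "duration_min": 145, "rating": 8.6},
    {"title": "Movie C", "year": 2003, "duration_min": 156, "rating": 8.1},
    {"title": "Movie D", "year": 2009, "duration_min": 170, "rating": 8.4},
    {"title": "Movie E", "year": 2014, "duration_min": 153, "rating": 8.0},
    {"title": "Movie F", "year": 2016, "duration_min": 161, "rating": 8.3},
    {"title": "Movie G", "year": 2018, "duration_min": 139, "rating": 8.2},
    {"title": "Movie H", "year": 2019, "duration_min": 128, "rating": 8.0},
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


ratings = [m["rating"] for m in movies]
durations = [m["duration_min"] for m in movies]

# Univariate: frequency of ratings in 0.5-wide bins (8.0, 8.5, ...).
rating_bins = Counter(int(r * 2) / 2 for r in ratings)

# Bivariate: linear association between duration and rating.
r_duration_rating = pearson(durations, ratings)

print(sorted(rating_bins.items()))
print(round(r_duration_rating, 3))
```

With the real data, the same two quantities would correspond to a rating histogram and one cell of a correlation matrix.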
DATA COLLECTION
Curated a dataset of 250 records from IMDb's Top 250 list for Indian
movies. Extracted crucial details including ranking, title, rating, release
year, duration, genre, and brief descriptions. This comprehensive
dataset sets the stage for a detailed exploration of highly acclaimed
Indian films.
DATA ANALYSIS
The univariate analysis examined the distribution of IMDb ratings and of movie
release years, giving a snapshot of each variable on its own. The bivariate
analysis then looked at relationships between pairs of variables, using
scatterplots and correlation matrices to identify associations. The multivariate
analysis applied dimensionality reduction (PCA) and further visualizations to
study interactions among several variables at once. Exploratory modeling
techniques were also applied, although they were not the primary focus, to help
reveal underlying patterns in the dataset.
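The PCA step mentioned above can be sketched as follows. This is a minimal example that assumes NumPy is available and that the numeric features are release year, duration and rating; the feature choice and the sample values are hypothetical and do not reproduce the analysis actually run.

```python
import numpy as np

# Hypothetical numeric features per movie: [release year, duration (min), rating].
X = np.array(
    [
        [1975, 204, 8.1],
        [1987, 145, 8.6],
        [2003, 156, 8.1],
        [2009, 170, 8.4],
        [2014, 153, 8.0],
        [2016, 161, 8.3],
    ],
    dtype=float,
)

# Standardize each column so that no feature dominates because of its scale.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# PCA via singular value decomposition of the standardized data.
# Rows of Vt are the principal axes; singular values come sorted descending.
U, S, Vt = np.linalg.svd(Z, full_matrices=False)

# Share of total variance captured by each principal component.
explained_variance_ratio = S**2 / np.sum(S**2)

# Coordinates of each movie in principal-component space.
scores = Z @ Vt.T

print(np.round(explained_variance_ratio, 3))
```

Plotting the first two columns of `scores` gives the usual two-dimensional PCA view of the movies.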
INSIGHTS
• The IMDb Top 250 list features iconic classics with near-perfect ratings.
• Movie ratings have evolved over time, reflecting changes in film quality, genres,
and audience preferences.
• Movies from the 1980s tend to have high average ratings.
• Genres like drama, crime, adventure, and action are associated with highly rated
films.
• Dramas and documentaries generally receive higher ratings; comedy and horror
have more mixed ratings.
• The 2010s saw the highest number of Top 250 releases.
• There's a modest correlation between movie duration and IMDb rating, with longer
films slightly higher-rated on average.
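The decade-level insights above (average rating per decade, number of releases per decade) come down to a group-by on release year. A minimal sketch of that aggregation is shown here; the `(year, rating)` pairs are hypothetical sample values, not records from the dataset.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (release year, IMDb rating) pairs.
movies = [
    (1957, 8.3), (1975, 8.1), (1983, 8.4), (1987, 8.6), (2003, 8.1),
    (2009, 8.4), (2012, 8.2), (2014, 8.0), (2016, 8.3), (2018, 8.2),
]

# Group ratings by decade, e.g. 1987 -> 1980.
by_decade = defaultdict(list)
for year, rating in movies:
    by_decade[year // 10 * 10].append(rating)

# For each decade: (number of movies, mean rating).
summary = {
    decade: (len(ratings), round(mean(ratings), 2))
    for decade, ratings in sorted(by_decade.items())
}

# Decade with the most entries in the list.
busiest_decade = max(summary, key=lambda d: summary[d][0])

for decade, (count, avg) in summary.items():
    print(f"{decade}s: {count} movies, mean rating {avg}")
```

Decades with only a few movies produce unstable averages, so the counts are worth reading alongside the means.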
LIMITATIONS AND RECOMMENDATIONS
This analysis offers valuable insights, but it's important to recognize its limitations:
∙ Data Scope: The analysis relies on the provided dataset, which may not represent the entire universe of
Indian movies.
∙ Causation vs. Correlation: Correlations observed don't imply causation, as unaccounted factors may
influence trends.
CONCLUSION
In this Exploratory Data Analysis (EDA) of IMDb ratings for Indian movies, I have uncovered several key
takeaways:
Rating Distribution: The initial distribution of IMDb ratings for Indian movies in the dataset was right-skewed,
with a concentration of movies receiving ratings around 8.0.
Year-Based Analysis: I conducted a hypothesis test to compare IMDb ratings before and after the year 2000.
The results indicated that there is no statistically significant difference in ratings between these two time periods.
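The document does not say which hypothesis test was used for the before/after 2000 comparison. As one possible approach, the sketch below runs a two-sided permutation test on the difference in mean rating, using only the standard library. The function name `permutation_test`, its parameters and the two sample lists are assumptions for illustration, not the original analysis.

```python
import random
from statistics import mean


def permutation_test(a, b, n_resamples=5000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns an approximate p-value: the share of random relabelings whose
    absolute mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[: len(a)]) - mean(pooled[len(a):]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_resamples + 1)


# Hypothetical ratings for movies released before 2000 and from 2000 onward.
ratings_before_2000 = [8.3, 8.1, 8.4, 8.6, 8.2, 8.1]
ratings_from_2000 = [8.1, 8.4, 8.2, 8.0, 8.3, 8.2, 8.1]

p_value = permutation_test(ratings_before_2000, ratings_from_2000)
print(f"p-value: {p_value:.3f}")
```

A p-value above the chosen significance level (commonly 0.05) would be consistent with the conclusion that ratings do not differ significantly between the two periods.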
THANK YOU