qxc6bs1pw: 0.0.1 Matplotlib Assignment

The document outlines a series of data visualization tasks using Python's Pandas and Matplotlib libraries, focusing on automobile sales and Netflix data. It includes instructions for creating line charts, scatter plots, pie charts, heatmaps, and bar plots to analyze trends and correlations in sales, advertising expenditure, and IMDb ratings. Each section provides code snippets and explanations for visualizing the data effectively.

Uploaded by

anuj rawat

qxc6bs1pw

December 26, 2024

0.0.1 Matplotlib Assignment

[1]: import pandas as pd

# Load the dataset
url = 'https://itv-contentbucket.s3.ap-south-1.amazonaws.com/Exams/AWP/Matplotlib/historical_automobile_sales.csv'
df = pd.read_csv(url)

1. Develop a Line chart using pandas to show how automobile sales fluctuate from year to year
[2]: import matplotlib.pyplot as plt

# Aggregate automobile sales by year
sales_per_year = df.groupby('Year')['Automobile_Sales'].sum().reset_index()

# Plot the line chart
plt.figure(figsize=(12, 6))
plt.plot(sales_per_year['Year'], sales_per_year['Automobile_Sales'],
         marker='o', linestyle='-')
plt.title('Automobile Sales Fluctuation from Year to Year')
plt.xlabel('Year')
plt.ylabel('Automobile Sales')
plt.grid(True)
plt.show()
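
The task asks for a line chart "using pandas", while the cell above draws with `plt.plot`. As a minimal sketch of the pandas-native route, the aggregated Series can be plotted through its own `.plot()` method. The `sample` frame below is hypothetical stand-in data, not the real dataset, and the `Agg` backend is only selected so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the real dataset: several rows per year
sample = pd.DataFrame({
    "Year": [2019, 2019, 2020, 2020, 2021, 2021],
    "Automobile_Sales": [120.0, 80.0, 60.0, 50.0, 90.0, 110.0],
})

# Summing per year leaves a Series indexed by Year, which pandas can plot directly
yearly_sales = sample.groupby("Year")["Automobile_Sales"].sum()

ax = yearly_sales.plot(kind="line", marker="o", figsize=(12, 6), grid=True,
                       title="Automobile Sales Fluctuation from Year to Year")
ax.set_xlabel("Year")
ax.set_ylabel("Automobile Sales")
plt.close(ax.figure)  # in a notebook, call plt.show() instead
```

Because the index of `yearly_sales` is the year, pandas uses it for the x-axis without an explicit `reset_index()`.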

2. Plot different lines for categories of vehicle type and analyze the trend during recession periods
[3]: # Aggregate automobile sales by year and vehicle type
sales_per_year_vehicle = df.groupby(['Year', 'Vehicle_Type'])['Automobile_Sales'].sum().unstack()

# Plot one line per vehicle type
plt.figure(figsize=(14, 7))
for vehicle_type in sales_per_year_vehicle.columns:
    plt.plot(sales_per_year_vehicle.index, sales_per_year_vehicle[vehicle_type],
             marker='o', linestyle='-', label=vehicle_type)

# Highlight recession periods with one shaded band per recession year
recession_periods = df[df['Recession'] == 1]['Year'].unique()
for year in recession_periods:
    plt.axvspan(year - 0.5, year + 0.5, color='gray', alpha=0.3)

plt.title('Sales Trends by Vehicle Type During Recession Periods')
plt.xlabel('Year')
plt.ylabel('Automobile Sales')
plt.legend(title='Vehicle Type')
plt.grid(True)
plt.show()
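
The cell above shades each recession year separately. One possible refinement, sketched here under the assumption that recession years often come in consecutive runs, is to merge adjacent years so that each recession is drawn as a single band. The helper `contiguous_spans` and the example years are hypothetical and not part of the original notebook.

```python
def contiguous_spans(years):
    """Group years into (first, last) runs of consecutive values."""
    spans = []
    for year in sorted(set(int(y) for y in years)):
        if spans and year == spans[-1][1] + 1:
            # Extend the current run by one year
            spans[-1] = (spans[-1][0], year)
        else:
            # Start a new run
            spans.append((year, year))
    return spans


# Hypothetical recession years, not taken from the dataset
example_years = [1980, 1981, 1982, 1991, 2008, 2009]
spans = contiguous_spans(example_years)
print(spans)  # [(1980, 1982), (1991, 1991), (2008, 2009)]
```

Each `(start, end)` pair could then be passed to `plt.axvspan(start - 0.5, end + 0.5, color='gray', alpha=0.3)`, which produces one band per recession rather than one per year.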

3. Visualization to compare the sales trend per vehicle type for recession and non-recession periods
[4]: # Separate recession and non-recession periods
recession_sales = df[df['Recession'] == 1].groupby(['Year', 'Vehicle_Type'])['Automobile_Sales'].sum().unstack()
non_recession_sales = df[df['Recession'] == 0].groupby(['Year', 'Vehicle_Type'])['Automobile_Sales'].sum().unstack()

# Plot the comparison side by side on a shared y-axis
fig, axes = plt.subplots(1, 2, figsize=(18, 8), sharey=True)

# Recession period sales
axes[0].set_title('Sales Trend During Recession Periods')
for vehicle_type in recession_sales.columns:
    axes[0].plot(recession_sales.index, recession_sales[vehicle_type],
                 marker='o', linestyle='-', label=vehicle_type)
axes[0].set_xlabel('Year')
axes[0].set_ylabel('Automobile Sales')
axes[0].legend(title='Vehicle Type')
axes[0].grid(True)

# Non-recession period sales
axes[1].set_title('Sales Trend During Non-Recession Periods')
for vehicle_type in non_recession_sales.columns:
    axes[1].plot(non_recession_sales.index, non_recession_sales[vehicle_type],
                 marker='o', linestyle='-', label=vehicle_type)
axes[1].set_xlabel('Year')
axes[1].legend(title='Vehicle Type')
axes[1].grid(True)

plt.show()
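
The two panels above plot yearly totals, so a numeric summary may help when reading them side by side. The following is a minimal sketch, on hypothetical stand-in data, of a `pivot_table` that puts the mean sales per vehicle type for recession and non-recession rows in adjacent columns. The vehicle type names and values are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in rows; the real dataset has many more columns
sample = pd.DataFrame({
    "Recession": [1, 1, 0, 0, 1, 0],
    "Vehicle_Type": ["Sports", "Sedan", "Sports", "Sedan", "Sports", "Sedan"],
    "Automobile_Sales": [10.0, 30.0, 50.0, 70.0, 20.0, 90.0],
})

# One row per vehicle type, one column per recession flag, mean sales in each cell
comparison = sample.pivot_table(
    index="Vehicle_Type",
    columns="Recession",
    values="Automobile_Sales",
    aggfunc="mean",
).rename(columns={0: "Non-Recession", 1: "Recession"})

print(comparison)
```

Using the mean rather than the sum is a design choice made here because the two periods may contain different numbers of rows.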

4. Scatter plot to identify the correlation between average vehicle price and sales volume during recessions
[5]: # Calculate average price and total sales during recession periods
avg_price_sales_recession = (
    df[df['Recession'] == 1]
    .groupby('Vehicle_Type')
    .agg({'Price': 'mean', 'Automobile_Sales': 'sum'})
    .reset_index()
)

# Scatter plot
plt.figure(figsize=(10, 6))
plt.scatter(avg_price_sales_recession['Price'], avg_price_sales_recession['Automobile_Sales'])

# Label each point with its vehicle type
for i, txt in enumerate(avg_price_sales_recession['Vehicle_Type']):
    plt.annotate(txt, (avg_price_sales_recession['Price'][i],
                       avg_price_sales_recession['Automobile_Sales'][i]))

plt.title('Correlation Between Average Vehicle Price and Sales Volume During Recessions')
plt.xlabel('Average Vehicle Price')
plt.ylabel('Total Automobile Sales')
plt.grid(True)
plt.show()
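
A scatter plot shows the relationship visually; the strength of a linear relationship can also be expressed as a Pearson correlation coefficient. The sketch below uses `Series.corr` on hypothetical values chosen so that sales fall in a perfectly straight line as price rises. The vehicle type names and numbers are assumptions, not results from the dataset.

```python
import pandas as pd

# Hypothetical per-type summary, shaped like avg_price_sales_recession
summary = pd.DataFrame({
    "Vehicle_Type": ["TypeA", "TypeB", "TypeC", "TypeD"],
    "Price": [10.0, 20.0, 30.0, 40.0],
    "Automobile_Sales": [400.0, 300.0, 200.0, 100.0],
})

# Pearson correlation (the default method of Series.corr)
r = summary["Price"].corr(summary["Automobile_Sales"])
print(round(r, 2))  # -1.0
```

A value near -1 or +1 indicates a strong linear relationship and a value near 0 a weak one. With only a handful of vehicle types, any such coefficient should be read cautiously.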

5. Pie chart to display the portion of advertising expenditure of Automotives during recession and non-recession periods
[6]: # Calculate total advertising expenditure during recession and non-recession periods
ad_exp_recession = df[df['Recession'] == 1]['Advertising_Expenditure'].sum()
ad_exp_non_recession = df[df['Recession'] == 0]['Advertising_Expenditure'].sum()

# Create pie chart
labels = ['Recession Period', 'Non-Recession Period']
sizes = [ad_exp_recession, ad_exp_non_recession]
colors = ['#ff9999', '#66b3ff']
explode = (0.1, 0)  # pull the recession slice slightly out of the pie

plt.figure(figsize=(8, 8))
plt.pie(sizes, explode=explode, labels=labels, colors=colors,
        autopct='%1.1f%%', shadow=True, startangle=140)
plt.title('Advertising Expenditure During Recession and Non-Recession Periods')
plt.axis('equal')  # keep the pie circular
plt.show()
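
The `autopct='%1.1f%%'` argument labels each slice with its share of the total, formatted to one decimal place. As a small sketch of that arithmetic, the shares can be computed directly. The expenditure figures below are hypothetical.

```python
# Hypothetical totals, standing in for ad_exp_recession and ad_exp_non_recession
ad_spend = {"Recession Period": 250.0, "Non-Recession Period": 750.0}

total = sum(ad_spend.values())

# Each share is the slice value divided by the total, expressed as a percentage
shares = {label: round(100 * value / total, 1) for label, value in ad_spend.items()}
print(shares)  # {'Recession Period': 25.0, 'Non-Recession Period': 75.0}
```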

6. Heatmap to understand correlation between IMDb Score, Hidden Gem Score, and IMDb Votes
[12]: import pandas as pd

# Load the dataset
url = 'https://itv-contentbucket.s3.ap-south-1.amazonaws.com/Exams/AWP/pandas/Netflix.csv'
df = pd.read_csv(url)

# Print the column names
print(df.columns)

Index(['Title', 'Genre', 'Languages', 'Series or Movie', 'Hidden Gem Score',
       'Country Availability', 'Runtime', 'Director', 'Writer', 'Actors',
       'View Rating', 'IMDb Score', 'Rotten Tomatoes Score',
       'Metacritic Score', 'Awards Nominated For', 'Boxoffice', 'Release Date',
       'Netflix Release Date', 'Netflix Link', 'IMDb Votes'],
      dtype='object')

[16]: import seaborn as sns
import matplotlib.pyplot as plt

# Ensure relevant columns are numeric
df['IMDb Score'] = pd.to_numeric(df['IMDb Score'], errors='coerce')
df['Hidden Gem Score'] = pd.to_numeric(df['Hidden Gem Score'], errors='coerce')
df['IMDb Votes'] = pd.to_numeric(df['IMDb Votes'], errors='coerce')

# Drop rows with NaN values in the relevant columns
correlation_data = df[['IMDb Score', 'Hidden Gem Score', 'IMDb Votes']].dropna()

# Calculate the correlation matrix
correlation_matrix = correlation_data.corr()

# Create the heatmap
plt.figure(figsize=(10, 6))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt='.2f')
plt.title('Correlation between IMDb Score, Hidden Gem Score, and IMDb Votes')
plt.show()
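
The cell above relies on `errors='coerce'`, which turns anything that cannot be parsed as a number into `NaN` instead of raising an error. The sketch below illustrates that behaviour on hypothetical strings. Whether the real columns contain such values (for example, vote counts with thousands separators) is an assumption and would need to be checked against the data.

```python
import pandas as pd

# Hypothetical raw values; unparseable entries become NaN under errors="coerce"
raw_scores = pd.Series(["7.9", "N/A", "8.4", "unknown"])
scores = pd.to_numeric(raw_scores, errors="coerce")
print(scores.isna().sum())  # 2

# A string with a thousands separator is also coerced to NaN ...
raw_votes = pd.Series(["1,234", "567"])
votes_naive = pd.to_numeric(raw_votes, errors="coerce")

# ... unless the separator is stripped first
votes_clean = pd.to_numeric(raw_votes.str.replace(",", "", regex=False), errors="coerce")
print(votes_clean.tolist())  # [1234, 567]
```

Since coerced rows are later removed by `dropna()`, comparing `len(correlation_data)` with `len(df)` is one way to see how many rows were silently dropped.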

7. Plot lines for categories of every movie type and analyze how they have received IMDb Votes. Create a subplot to compare the same categories with IMDb Score
[17]: # Ensure 'IMDb Votes' and 'IMDb Score' columns are numeric
df['IMDb Votes'] = pd.to_numeric(df['IMDb Votes'], errors='coerce')
df['IMDb Score'] = pd.to_numeric(df['IMDb Score'], errors='coerce')

# Aggregate IMDb Votes (total) and IMDb Score (mean) by 'Series or Movie'
votes_by_type = df.groupby('Series or Movie')['IMDb Votes'].sum().reset_index()
score_by_type = df.groupby('Series or Movie')['IMDb Score'].mean().reset_index()

# Create subplots
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# IMDb Votes plot
sns.lineplot(ax=axes[0], data=votes_by_type, x='Series or Movie', y='IMDb Votes', marker='o')
axes[0].set_title('IMDb Votes by Movie Type')
axes[0].set_xlabel('Movie Type')
axes[0].set_ylabel('IMDb Votes')
axes[0].grid(True)

# IMDb Score plot
sns.lineplot(ax=axes[1], data=score_by_type, x='Series or Movie', y='IMDb Score', marker='o')
axes[1].set_title('IMDb Score by Movie Type')
axes[1].set_xlabel('Movie Type')
axes[1].set_ylabel('IMDb Score')
axes[1].grid(True)

plt.tight_layout()
plt.show()
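
The cell above reduces each type to a single aggregate, so each line connects only one point per category. An alternative reading of "plot lines for categories" is one line per type over time. The sketch below builds such a table on hypothetical rows; it assumes that `Release Date` can be parsed by `pd.to_datetime`, which has not been verified against the real file.

```python
import pandas as pd

# Hypothetical rows using the notebook's column names
sample = pd.DataFrame({
    "Series or Movie": ["Movie", "Series", "Movie", "Series", "Movie"],
    "Release Date": ["2019-03-01", "2019-07-15", "2020-01-10", "2020-05-20", "2020-11-02"],
    "IMDb Votes": [100.0, 40.0, 300.0, 60.0, 200.0],
})

# Derive the release year; unparseable dates would become NaT and be left out of the groups
sample["Release Year"] = pd.to_datetime(sample["Release Date"], errors="coerce").dt.year

# Rows are years, columns are types, cells are total votes
votes_by_year = (
    sample.groupby(["Release Year", "Series or Movie"])["IMDb Votes"].sum().unstack()
)
print(votes_by_year)
```

Calling `votes_by_year.plot(marker='o')` would then draw one line per column, that is, one for movies and one for series.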

8. Create 2 bar plots to understand movies and web series by languages in which they have been made
[ ]: # Count titles per 'Languages' value for movies and web series
movie_languages = df[df['Series or Movie'] == 'Movie']['Languages'].value_counts().reset_index()
movie_languages.columns = ['Language', 'Count']

series_languages = df[df['Series or Movie'] == 'Series']['Languages'].value_counts().reset_index()
series_languages.columns = ['Language', 'Count']

# Create subplots for bar plots; the y-axes are not shared because each
# panel has its own set of language categories
fig, axes = plt.subplots(1, 2, figsize=(18, 8))

# Movies by language
sns.barplot(ax=axes[0], data=movie_languages.head(10), x='Count', y='Language',
            hue='Language', palette='viridis', dodge=False, legend=False)
axes[0].set_title('Top 10 Languages of Movies')
axes[0].set_xlabel('Count')
axes[0].set_ylabel('Language')

# Web series by language
sns.barplot(ax=axes[1], data=series_languages.head(10), x='Count', y='Language',
            hue='Language', palette='inferno', dodge=False, legend=False)
axes[1].set_title('Top 10 Languages of Web Series')
axes[1].set_xlabel('Count')
axes[1].set_ylabel('Language')

plt.tight_layout()
plt.show()
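
`value_counts()` on the `Languages` column counts each distinct cell value as a whole. If a cell lists several languages in one comma-separated string (an assumption about the data, not confirmed by the notebook), then a title in "English, Spanish" is counted separately from one in "English". A minimal sketch of counting individual languages instead, on hypothetical rows:

```python
import pandas as pd

# Hypothetical rows; the comma-separated format is an assumption
sample = pd.DataFrame({
    "Series or Movie": ["Movie", "Movie", "Series", "Movie"],
    "Languages": ["English, Spanish", "English", "Korean, English", None],
})

movies = sample[sample["Series or Movie"] == "Movie"]

# Split each cell into a list, give every language its own row, trim spaces, then count
language_counts = (
    movies["Languages"]
    .dropna()
    .str.split(",")
    .explode()
    .str.strip()
    .value_counts()
)
print(language_counts)
```

The resulting Series could be converted with `reset_index()` and passed to `sns.barplot` in the same way as `movie_languages`.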
