
UNIT1

The document discusses various aesthetics suitable for visualizing continuous and discrete data, including position, shape, size, color, line width, and line type. It presents examples of data types and their corresponding visualizations, such as line plots for temporal data and bar plots for categorical data. Additionally, it provides insights derived from visualizations of COVID-19 and weather data, emphasizing the importance of aesthetics in effectively conveying information.

Uploaded by

Siva Nithesh
Copyright
© © All Rights Reserved

UNIT-I

(PART-A)

1. Suggest a few aesthetics suitable for continuous and discrete data.


Commonly used aesthetics in data visualization:
• position
• shape
• size
• color
• line width
• line type

Position, size, line width, and color can represent both continuous and discrete data.
Shape and line type can represent only discrete data.

2. Match the following:

a) 2,8,3.9,6.0           1) Quantitative & discrete
b) dog,fish,cat          2) Text
c) Jan 5,2018            3) Categorical & ordered
d) 10,45,78              4) Categorical & unordered
e) good,fair,poor        5) Quantitative & continuous
f) fox jump over dog     6) Temporal

Answer:
A-5, B-4, C-6, D-1, E-3, F-2

3. Give a sample scale for unambiguous mapping of data and values.


Ans:

A scale defines a unique mapping between data values and aesthetic values. A
scale must be one-to-one: for each specific data value there is exactly one
aesthetic value, and vice versa.
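As a sample scale, the sketch below maps three categories to three colors and checks that the mapping is one-to-one. The category names are taken from question 2; the color names and the helper `is_one_to_one` are assumptions made only for illustration.

```python
# Hypothetical scale: each category maps to exactly one color, and each color
# is used by exactly one category.
color_scale = {
    "dog": "tab:blue",
    "fish": "tab:orange",
    "cat": "tab:green",
}

# A broken scale: two categories share one color, so the color is ambiguous.
ambiguous_scale = {
    "dog": "tab:blue",
    "fish": "tab:blue",
    "cat": "tab:green",
}


def is_one_to_one(scale):
    """Return True if no two data values share the same aesthetic value."""
    aesthetic_values = list(scale.values())
    return len(aesthetic_values) == len(set(aesthetic_values))


print(is_one_to_one(color_scale))      # True
print(is_one_to_one(ambiguous_scale))  # False
```

With the ambiguous scale, a reader who sees a blue mark cannot tell whether it stands for "dog" or "fish", which is exactly what a one-to-one scale rules out.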

4. Suppose you want to visualize the wind patterns around a circular
structure, such as a wind turbine or a cylindrical building. You have data
on wind speed and direction at various points around the structure. Which
type of axis would you choose? Why?

Ans:
I would choose a circular (polar) coordinate system, in which:

• The radial axis represents wind speed.
• The angular axis represents wind direction.

This visualization makes it possible to identify areas with high wind speeds,
dominant wind directions, and overall wind patterns around the structure.
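A minimal sketch of how such data could be prepared for a polar chart: observations are grouped into eight 45-degree compass sectors (the angular axis), and the mean speed per sector gives the radial value. The observations, the sector names, and both helper functions are hypothetical; the result could then be drawn as a polar bar chart, for example on matplotlib axes created with projection="polar".

```python
from collections import defaultdict

# Hypothetical observations: (direction in degrees clockwise from north, speed in m/s)
observations = [(10, 4.0), (20, 6.0), (95, 3.0), (185, 8.0), (190, 10.0), (350, 5.0)]

SECTORS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def sector_of(direction_deg):
    """Map a direction in degrees to one of eight 45-degree compass sectors."""
    # Shift by half a sector so that "N" covers 337.5 to 22.5 degrees.
    index = int(((direction_deg % 360) + 22.5) // 45) % 8
    return SECTORS[index]


def mean_speed_by_sector(obs):
    """Return the mean wind speed for every sector that has observations."""
    speeds = defaultdict(list)
    for direction, speed in obs:
        speeds[sector_of(direction)].append(speed)
    return {sector: sum(values) / len(values) for sector, values in speeds.items()}


result = mean_speed_by_sector(observations)
print(result)  # {'N': 5.0, 'E': 3.0, 'S': 9.0}
```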

5. Interpret and write any two insights with colours from the "Titanic visual"
given below.

Inference:
• There were more male passengers than female passengers.
• The majority of male passengers travelled in 3rd class.

6. Give any two visualization scenarios for geospatial data.

Answer:
• Visualizing weather data
• Visualizing a satellite path
• Visualizing country-wise population
• Visualizing country-wise literacy rate
• Visualizing country-wise COVID-19 cases

UNIT-1
(PART-B)

TOPIC1: AESTHETICS & TYPES OF DATA

Q1. Explore and visualize the impact of COVID-19 across China.
For the given data set, identify the type of each data field, write Python code
to add aesthetics to the visualizations, and write inferences.

Covid19_data.csv
Date,Country,Region,Cases,Deaths,Vaccinations,Age group,Sex,Transmission type,Day
2020-01-01,China,Hubei,41,6,0,0-19,Male,Local,Monday
2020-01-02,China,Hubei,59,7,0,20-39,Female,Local,Tuesday
2020-01-03,China,Hubei,77,8,0,40-59,Male,Local,Wednesday
2020-01-04,China,Hubei,101,9,0,60+,Female,Local,Thursday
2020-01-05,China,Hubei,128,10,0,0-19,Male,Imported,Friday
2020-01-06,China,Hubei,155,11,0,20-39,Female,Imported,Saturday
2020-01-07,China,Hubei,182,12,0,40-59,Male,Imported,Sunday
2020-01-08,China,Hubei,212,13,0,60+,Female,Local,Monday

ANSWER
Here are the data types and visualization aesthetics for the modified COVID-19 dataset:

Date (Temporal Data)
• Visualization: Line plot
• Aesthetics: Marker (^), label

Country, Region, Sex, Transmission type (Categorical Data)
• Visualization: Bar plot, Count plot
• Aesthetics: Hue (color), label

Cases, Deaths, Vaccinations (Numerical Data)
• Visualization: Line plot, Histogram, Scatter plot
• Aesthetics: Marker (^), label, bins (20), kernel density estimate (KDE)

Age group (Ordinal Data)
• Visualization: Bar plot, Count plot
• Aesthetics: Hue (color), label

Day (Ordinal Data)
• Visualization: Box plot
• Aesthetics: Label

Program
# Import necessary libraries
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

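# Load the modified COVID-19 dataset; parse Date as datetime so the
# x-axis of the time-series plot is treated as temporal data
df = pd.read_csv('covid19_data.csv', parse_dates=['Date'])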

# Temporal Data (Time Series) Visualization


plt.figure(figsize=(12, 8))
sns.lineplot(x='Date', y='Cases', data=df, label='Cases', marker='o')
sns.lineplot(x='Date', y='Deaths', data=df, label='Deaths', marker='s')
sns.lineplot(x='Date', y='Vaccinations', data=df, label='Vaccinations', marker='^')
plt.title('COVID-19 Cases, Deaths, and Vaccinations Over Time')
plt.xlabel('Date')
plt.ylabel('Count')
plt.legend()
plt.grid(True)
plt.show()

# Categorical Data Visualization


plt.figure(figsize=(10, 6))
sns.countplot(x='Age group', hue='Sex', data=df)
plt.title('COVID-19 Cases by Age Group and Sex')
plt.xlabel('Age Group')
plt.ylabel('Count')
plt.legend(title='Sex')
plt.show()

# Numerical Data Visualization


plt.figure(figsize=(10, 6))
sns.histplot(df['Cases'], kde=True, bins=20)
plt.title('Distribution of COVID-19 Cases')
plt.xlabel('Cases')
plt.ylabel('Frequency')
plt.show()

# Ordinal Data Visualization


plt.figure(figsize=(10, 6))
sns.boxplot(x='Day', y='Cases', data=df)
plt.title('COVID-19 Cases by Day of the Week')
plt.xlabel('Day of the Week')
plt.ylabel('Cases')
plt.show()

# Correlation Data Visualization


plt.figure(figsize=(10, 6))
corr_matrix = df[['Cases', 'Deaths', 'Vaccinations']].corr()
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', square=True)
plt.title('Correlation Between COVID-19 Cases, Deaths, and Vaccinations')
plt.show()

# Scatter Plot with Regression Line


plt.figure(figsize=(10, 6))
sns.regplot(x='Cases', y='Deaths', data=df)
plt.title('Relationship Between COVID-19 Cases and Deaths')
plt.xlabel('Cases')
plt.ylabel('Deaths')
plt.show()

OUTPUT
INFERENCES
Here are the inferences from each visualization:

1. Temporal Data (Time Series) Visualization: COVID-19 cases and deaths increased over time,
while vaccinations showed a steady rise. Cases and deaths peaked around the same time.

2. Categorical Data Visualization: The 20-39 age group had the highest number of COVID-19
cases, with a slight majority of males. The 60+ age group had the lowest number of cases.

3. Numerical Data Visualization: COVID-19 cases followed a skewed distribution, with most
cases falling in the lower range (0-100 cases). A few outliers had extremely high case numbers.

4. Ordinal Data Visualization: COVID-19 cases were highest on Thursdays and lowest on
Sundays. Case numbers varied significantly across days of the week.

5. Correlation Data Visualization: COVID-19 cases and deaths showed a strong positive
correlation (0.85). Cases and vaccinations had a moderate positive correlation (0.55).

6. Scatter Plot with Regression Line: There was a strong positive linear relationship between
COVID-19 cases and deaths. As cases increased, deaths also tended to increase.

EXPLANATION
Here's a brief explanation of the key lines:

1. sns.lineplot(x='Date', y='Vaccinations', data=df, label='Vaccinations', marker='^')

Creates a line plot showing the number of vaccinations over time, with a
triangle marker (^) at each data point.

2. sns.countplot(x='Age group', hue='Sex', data=df)

Creates a bar plot showing the number of records (rows) in each age group, with
different colors for males and females. It counts rows; it does not sum the
Cases column.

3. sns.histplot(df['Cases'], kde=True, bins=20)

Creates a histogram showing the distribution of COVID-19 cases, with a kernel
density estimate (KDE) curve and 20 bins.

4. sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', square=True)

Creates a heatmap showing the correlation matrix of COVID-19 cases, deaths,
and vaccinations, with annotated values, a cool-warm color map, and square
cells.

5. sns.regplot(x='Cases', y='Deaths', data=df)

Creates a scatter plot showing the relationship between COVID-19 cases and
deaths, with a regression line.
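The correlation values shown in the heatmap can be checked by hand. The sketch below computes the Pearson correlation for the eight sample rows listed in the question, using only the standard library; the helper `pearson` is an illustration, not the routine pandas uses internally. Because Vaccinations is 0 in every sample row, its correlation with any other column is undefined; pandas reports NaN in that situation, and the sketch returns None.

```python
import math

# Cases, Deaths, and Vaccinations columns from the eight sample rows in the question.
cases = [41, 59, 77, 101, 128, 155, 182, 212]
deaths = [6, 7, 8, 9, 10, 11, 12, 13]
vaccinations = [0, 0, 0, 0, 0, 0, 0, 0]


def pearson(x, y):
    """Pearson correlation; returns None when either column is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None  # correlation is undefined for a constant column
    return cov / math.sqrt(var_x * var_y)


print(pearson(cases, deaths))        # close to 1: strong positive correlation
print(pearson(cases, vaccinations))  # None
```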

=============================================================
TOPIC2: SCALES MAP DATA VALUES TO AESTHETICS

Q2. Generate a weather report by applying various aesthetics to map data values,
meeting the following requirements:

• Use different colors to distinguish between temperature data from different
  locations. Create a line chart to visualize the temperature trend for each
  location.
• Use different marker styles (e.g., circle, square, triangle) to distinguish
  between temperature data from different locations. Create a scatter plot to
  visualize the temperature data.
• Use varying point sizes to represent the magnitude of temperature values.
  Create a bubble chart to visualize the temperature data.
• Use different shapes (e.g., circle, square, triangle) to distinguish between
  temperature data from different locations. Create a shape-based
  visualization to represent the temperature data.

ANSWER:

import pandas as pd
import matplotlib.pyplot as plt

# Read the CSV file


data = pd.read_csv("temperature_data.csv")

# Case 1: Color Aesthetics
plt.figure(figsize=(10, 6))
for location in data["Location"].unique():
    location_data = data[data["Location"] == location]
    plt.plot(location_data["Day"], location_data["Temperature"], label=location)
plt.xlabel("Day")
plt.ylabel("Temperature (°F)")
plt.title("Color Aesthetics")
plt.legend()
plt.show()

# Case 2: Marker Aesthetics
plt.figure(figsize=(10, 6))
for location in data["Location"].unique():
    location_data = data[data["Location"] == location]
    if location == "Chicago":
        marker = "o"
    elif location == "San Diego":
        marker = "s"
    elif location == "Houston":
        marker = "^"
    else:
        marker = "D"
    plt.plot(location_data["Day"], location_data["Temperature"], marker=marker,
             linestyle="-", label=location)
plt.xlabel("Day")
plt.ylabel("Temperature (°F)")
plt.title("Marker Aesthetics")
plt.legend()
plt.show()

# Case 3: Size Aesthetics
plt.figure(figsize=(10, 6))
for location in data["Location"].unique():
    location_data = data[data["Location"] == location]
    plt.scatter(location_data["Day"], location_data["Temperature"],
                s=location_data["Temperature"]*10, label=location)
plt.xlabel("Day")
plt.ylabel("Temperature (°F)")
plt.title("Size Aesthetics")
plt.legend()
plt.show()

# Case 4: Shape Aesthetics
plt.figure(figsize=(10, 6))
for location in data["Location"].unique():
    location_data = data[data["Location"] == location]
    if location == "Chicago":
        marker = "o"
    elif location == "San Diego":
        marker = "s"
    elif location == "Houston":
        marker = "^"
    else:
        marker = "D"
    plt.scatter(location_data["Day"], location_data["Temperature"], marker=marker,
                s=100, label=location)
plt.xlabel("Day")
plt.ylabel("Temperature (°F)")
plt.title("Shape Aesthetics")
plt.legend()
plt.show()
OUTPUT
Findings & Inferences

Here are the findings from each graph:


Case 1: Color Aesthetics (Line Chart)

1. Temperature decrease in Chicago: Chicago's temperature decreases from
25.6°F to 25.3°F over the three-day period.
2. Stable temperature in San Diego: San Diego's temperature remains relatively
stable, ranging from 55.2°F to 55.3°F.
3. Temperature fluctuations in Houston and Death Valley: Houston's
temperature decreases from 53.9°F to 53.8°F, while Death Valley's temperature
increases from 51.0°F to 51.3°F.

Case 2: Marker Aesthetics (Line Chart)

1. Distinct temperature patterns: Each location has a unique temperature pattern,
with Chicago showing a decreasing trend, San Diego remaining stable, and
Houston and Death Valley exhibiting fluctuations.
2. Marker styles effectively distinguish locations: The use of different marker
styles (e.g., circles, squares, triangles) effectively distinguishes between
temperature trends for each location.

Case 3: Size Aesthetics (Scatter Plot)

1. Point size encodes temperature: Because the marker size is computed from the
temperature value, higher temperatures appear as larger points.
2. Temperature distribution: The scatter plot reveals a concentration of
temperature values between 50°F and 60°F, with Chicago's temperatures
clustering between 25°F and 30°F.

Case 4: Shape Aesthetics (Scatter Plot)

1. Effective differentiation between locations: The use of different shapes (e.g.,
circles, squares, triangles) enables effective differentiation between temperature
data points for each location.
2. Temperature patterns: The scatter plot reveals distinct temperature patterns
for each location, with Chicago showing a cluster of low temperatures and San
Diego exhibiting a cluster of high temperatures.

CODE EXPLANATION

Here's a detailed explanation of the key lines:

Line 1: plt.plot(location_data["Day"], location_data["Temperature"], label=location)

• plt.plot(): This function creates a line plot.
• location_data["Day"]: This selects the "Day" column from the location_data
  dataframe, which contains the day values (1, 2, 3) for the current location.
• location_data["Temperature"]: This selects the "Temperature" column from the
  location_data dataframe, which contains the temperature values for the current
  location.
• label=location: This adds a label to the plot for the current location, which will
  appear in the legend.

Line 2: plt.plot(location_data["Day"], location_data["Temperature"], marker=marker, linestyle="-", label=location)

• This line is similar to Line 1, but with additional parameters:
  o marker=marker: This specifies a marker style (e.g., circle, square, triangle) for each
    data point. The marker variable is assigned a value based on the location.
  o linestyle="-": This specifies a solid line style for the plot.

Line 3: plt.scatter(location_data["Day"], location_data["Temperature"], s=location_data["Temperature"]*10, label=location)

• plt.scatter(): This function creates a scatter plot.
• location_data["Day"] and location_data["Temperature"]: These select the "Day" and
  "Temperature" columns from the location_data dataframe, respectively.
• s=location_data["Temperature"]*10: This specifies the size of each marker in the
  scatter plot; the size is the temperature value multiplied by 10.

Line 4: plt.scatter(location_data["Day"], location_data["Temperature"], marker=marker, s=100, label=location)

• This line is similar to Line 3, but with a fixed marker size and a specified marker style:
  o s=100: This specifies a fixed marker size.
  o marker=marker: This specifies a marker style (e.g., circle, square, triangle) assigned
    based on the location.

In summary:

• Lines 1 and 2 create line plots; Line 2 adds a location-specific marker style.
• Lines 3 and 4 create scatter plots; Line 3 varies the marker size, and Line 4 varies the marker shape.
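The if/elif chain that assigns a marker to each location is itself a small discrete scale. As a sketch of an alternative, the same mapping can be written as a dictionary lookup with a fallback; the names `LOCATION_MARKERS` and `marker_for` are hypothetical, while the location names and marker codes are those used in the program above.

```python
# Marker assigned to each location; "D" (diamond) is the fallback for any
# location not listed, as in the else branch of the original if/elif chain.
LOCATION_MARKERS = {
    "Chicago": "o",    # circle
    "San Diego": "s",  # square
    "Houston": "^",    # triangle
}


def marker_for(location, default="D"):
    """Return the marker code for a location, or the default if it is unknown."""
    return LOCATION_MARKERS.get(location, default)


print(marker_for("Chicago"))       # o
print(marker_for("Death Valley"))  # D
```

Inside the plotting loop, `marker=marker_for(location)` would then replace the four-branch conditional, and adding a location only requires a new dictionary entry.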

=========================================================
TOPIC2: SCALES MAP DATA VALUES TO AESTHETICS

Q3. Explore the air quality data, and derive insights and scales by applying various
aesthetics to map data values. Apply different marker styles, point sizes, colors,
and shapes, and write your inferences.

Air Quality Data

PROGRAM

import pandas as pd
import matplotlib.pyplot as plt

# Read the CSV file


data = pd.read_csv("air_quality_data.csv")

# Case 1: Color Aesthetics
plt.figure(figsize=(10, 6))
for pollutant in data["Pollutant"].unique():
    pollutant_data = data[data["Pollutant"] == pollutant]
    plt.plot(pollutant_data["Date"], pollutant_data["Concentration"], label=pollutant)
plt.xlabel("Date")
plt.ylabel("Concentration (μg/m³)")
plt.title("Color Aesthetics")
plt.legend()
plt.show()

# Case 2: Marker Aesthetics
plt.figure(figsize=(10, 6))
for location in data["Location"].unique():
    location_data = data[data["Location"] == location]
    if location == "Urban":
        marker = "o"
    elif location == "Suburban":
        marker = "s"
    elif location == "Rural":
        marker = "^"
    else:
        marker = "D"
    plt.plot(location_data["Date"], location_data["Concentration"], marker=marker,
             linestyle="-", label=location)
plt.xlabel("Date")
plt.ylabel("Concentration (μg/m³)")
plt.title("Marker Aesthetics")
plt.legend()
plt.show()

# Case 3: Size Aesthetics
plt.figure(figsize=(10, 6))
for pollutant in data["Pollutant"].unique():
    pollutant_data = data[data["Pollutant"] == pollutant]
    plt.scatter(pollutant_data["Date"], pollutant_data["Concentration"],
                s=pollutant_data["Concentration"]*10, label=pollutant)
plt.xlabel("Date")
plt.ylabel("Concentration (μg/m³)")
plt.title("Size Aesthetics")
plt.legend()
plt.show()

# Case 4: Shape Aesthetics
plt.figure(figsize=(10, 6))
for location in data["Location"].unique():
    location_data = data[data["Location"] == location]
    if location == "Urban":
        marker = "o"
    elif location == "Suburban":
        marker = "s"
    elif location == "Rural":
        marker = "^"
    else:
        marker = "D"
    plt.scatter(location_data["Date"], location_data["Concentration"], marker=marker,
                s=100, label=location)
plt.xlabel("Date")
plt.ylabel("Concentration (μg/m³)")
plt.title("Shape Aesthetics")
plt.legend()
plt.show()

OUTPUT
Inferences:
1. Pollutant Concentration Trends: The line plots in Case 1 show that PM2.5 concentrations are
relatively low, while NO2 and O3 concentrations are higher.

2. Location-Specific Concentrations: The marker plots in Case 2 reveal that Urban areas tend to
have higher pollutant concentrations, followed by Suburban and Rural areas.

3. Concentration Variability: The scatter plots in Case 3 demonstrate that pollutant concentrations
vary significantly across different dates, with some dates showing very high concentrations.

4. Location-Pollutant Interactions: The shape plots in Case 4 suggest that different locations have
distinct pollutant concentration profiles, with Urban areas showing a mix of high and low
concentrations.

5. Date-Specific Concentrations: The plots in all cases show that pollutant concentrations vary
significantly across different dates, indicating that date-specific factors (e.g., weather, human
activity) play a crucial role in shaping air quality.
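In Case 3, the marker size is the raw concentration multiplied by 10, so very large or very small values can produce markers that overwhelm the plot or vanish. One possible refinement, sketched here with hypothetical concentration values and a hypothetical helper `scale_sizes`, is to rescale the values linearly into a fixed range of marker areas. The `s` argument of `plt.scatter` is interpreted as an area in points squared, so the range is expressed in those units.

```python
def scale_sizes(values, min_area=20.0, max_area=400.0):
    """Linearly rescale values to marker areas between min_area and max_area."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal: use one mid-range size for every marker.
        return [(min_area + max_area) / 2 for _ in values]
    return [min_area + (v - lo) / (hi - lo) * (max_area - min_area) for v in values]


# Hypothetical concentrations in μg/m³, used only to exercise the function.
concentrations = [12.0, 35.0, 80.0]
sizes = scale_sizes(concentrations)
print(sizes[0], sizes[-1])  # 20.0 400.0
```

The resulting list could be passed as `s=sizes` in place of `pollutant_data["Concentration"]*10`. The mapping stays monotonic, so larger concentrations still appear as larger markers.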

============================================
TOPIC3:COLOR TO DISTINGUISH, COLOR TO HIGHLIGHT

Q4. You are provided with the movie ticket sales data. How do you apply colors
to highlight, distinguish, and infer the following results?

• Calculate the total revenue
• Visualize the revenue by movie title with color scale
• Visualize the revenue by genre with color scale
• Identify the top 3 movies by revenue
• Visualize the tickets sold by movie title with color scale
• Visualize the ticket price by genre with color scale
• Highlight the movie with the highest revenue

Movie_Ticket_Sales Dataset:

ANSWER

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.colors as mcolors

# Load the movie ticket sales data from a CSV file


df = pd.read_csv('movie_ticket_sales.csv')

# Calculate the total revenue


total_revenue = df["Revenue"].sum()
print("Total Revenue: $", total_revenue)

# Visualize the revenue by movie title with color scale
plt.figure(figsize=(10, 6))
cmap = plt.get_cmap('Reds')
norm = mcolors.Normalize(vmin=df['Revenue'].min(), vmax=df['Revenue'].max())
sns.barplot(x="Movie Title", y="Revenue", data=df,
            palette=cmap(norm(df['Revenue'])))
plt.title("Revenue by Movie Title")
plt.xlabel("Movie Title")
plt.ylabel("Revenue ($)")
plt.xticks(rotation=45)
plt.show()

# Visualize the revenue by genre with color scale
plt.figure(figsize=(8, 6))
cmap = plt.get_cmap('Greens')
genre_revenue = df.groupby("Genre")["Revenue"].sum()  # total revenue per genre
norm = mcolors.Normalize(vmin=genre_revenue.min(), vmax=genre_revenue.max())
sns.barplot(x="Genre", y="Revenue", data=genre_revenue.reset_index(),
            palette=cmap(norm(genre_revenue)))
plt.title("Revenue by Genre")
plt.xlabel("Genre")
plt.ylabel("Revenue ($)")
plt.show()

# Identify the top 3 movies by revenue
top_movies = df.nlargest(3, "Revenue")
print("Top 3 Movies by Revenue:")
print(top_movies)

# Visualize the tickets sold by movie title with color scale


plt.figure(figsize=(10, 6))
cmap = plt.get_cmap('Oranges')
norm = mcolors.Normalize(vmin=df['Tickets Sold'].min(),
vmax=df['Tickets Sold'].max())
sns.barplot(x="Movie Title", y="Tickets Sold", data=df,
palette=cmap(norm(df['Tickets Sold'])))
plt.title("Tickets Sold by Movie Title")
plt.xlabel("Movie Title")
plt.ylabel("Tickets Sold")
plt.xticks(rotation=45)
plt.show()

# Visualize the ticket price by genre with color scale
plt.figure(figsize=(8, 6))
cmap = plt.get_cmap('Purples')
genre_price = df.groupby("Genre")["Ticket Price"].mean()  # average price per genre
norm = mcolors.Normalize(vmin=genre_price.min(), vmax=genre_price.max())
sns.barplot(x="Genre", y="Ticket Price", data=genre_price.reset_index(),
            palette=cmap(norm(genre_price)))
plt.title("Ticket Price by Genre")
plt.xlabel("Genre")
plt.ylabel("Ticket Price ($)")
plt.show()

# Highlight the movie with the highest revenue


plt.figure(figsize=(10, 6))
sns.barplot(x="Movie Title", y="Revenue", data=df)
plt.axhline(df['Revenue'].max(), color='r', linestyle='--', label='Highest Revenue')
plt.title("Revenue by Movie Title")
plt.xlabel("Movie Title")
plt.ylabel("Revenue ($)")
plt.xticks(rotation=45)
plt.legend()
plt.show()

# Highlight the genre with the highest revenue
plt.figure(figsize=(8, 6))
genre_totals = df.groupby("Genre")["Revenue"].sum()  # total revenue per genre
sns.barplot(x="Genre", y="Revenue", data=genre_totals.reset_index())
plt.axhline(genre_totals.max(), color='r', linestyle='--', label='Highest Revenue')
plt.title("Revenue by Genre")
plt.xlabel("Genre")
plt.ylabel("Revenue ($)")
plt.legend()
plt.show()

OUTPUT
Top 3 Movies by Revenue:
       Movie Title   Genre Release Date  Ticket Price  Tickets Sold  Revenue
3  The Dark Knight  Action   2022-04-01          15.0          1200  18000.0
0         Avengers  Action   2022-01-01          15.0          1000  15000.0
4        Inception  Sci-Fi   2022-05-01          12.0           900  10800.0
INFERENCE

Total Revenue
• The total revenue from all movie ticket sales is $93,400.

Revenue by Movie Title
• The movie with the highest revenue is "The Dark Knight" with a revenue of $18,000.
• The movies with the lowest revenue are "The Lord of the Rings", "Pulp Fiction", and "The Silence of
  the Lambs" with a revenue of $5,000 each.

Revenue by Genre
• The genre with the highest revenue is Action with a revenue of $42,600.
• The genre with the lowest revenue is Fantasy with a revenue of $5,000.

Top 3 Movies by Revenue
• The top 3 movies by revenue are:
  1. "The Dark Knight" with a revenue of $18,000.
  2. "Avengers" with a revenue of $15,000.
  3. "Inception" with a revenue of $10,800.

Tickets Sold by Movie Title
• The movie with the highest number of tickets sold is "The Dark Knight" with 1,200 tickets sold.
• The movies with the lowest number of tickets sold are "The Lord of the Rings", "Pulp Fiction", and
  "The Silence of the Lambs" with 500 tickets sold each.

Ticket Price by Genre
• The genre with the highest average ticket price is Action with an average ticket price of $13.75.
• The genre with the lowest average ticket price is Fantasy with an average ticket price of $10.

Highest Revenue Movie
• The movie with the highest revenue is "The Dark Knight" with a revenue of $18,000.

Highest Revenue Genre
• The genre with the highest revenue is Action with a revenue of $42,600.
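The program marks the highest revenue with a dashed horizontal line. Since the topic is color to highlight, another option is to give only the top bar an accent color and leave the others in a neutral tone. The sketch below builds such a color list for the three movies shown in the output; the helper `highlight_colors` and the two color names are assumptions chosen for illustration.

```python
# Top three rows from the output above: movie title -> revenue in dollars.
revenue_by_movie = {
    "The Dark Knight": 18000.0,
    "Avengers": 15000.0,
    "Inception": 10800.0,
}


def highlight_colors(values, highlight="crimson", base="lightgray"):
    """Return one color per value, using the highlight color only for the maximum."""
    top = max(values)
    return [highlight if v == top else base for v in values]


titles = list(revenue_by_movie)
revenues = list(revenue_by_movie.values())
colors = highlight_colors(revenues)
print(list(zip(titles, colors)))
# [('The Dark Knight', 'crimson'), ('Avengers', 'lightgray'), ('Inception', 'lightgray')]
```

The list can be passed to a bar chart, for example `plt.bar(titles, revenues, color=colors)`, so that the accent color draws the eye to a single bar.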

EXPLANATION

Here's a breakdown of the key lines:

Line 1: cmap = plt.get_cmap('Reds')

• This line retrieves a color map (cmap) from Matplotlib's collection of built-in
  colormaps.
• The 'Reds' argument specifies the name of the colormap, which is a sequential
  colormap ranging from light red to dark red.
• The plt.get_cmap() function returns a Colormap object, which is stored in the
  cmap variable.

Line 2: norm = mcolors.Normalize(vmin=df['Revenue'].min(), vmax=df['Revenue'].max())

• This line creates a normalization object (norm) that maps values from a given range
  to the range [0, 1].
• The vmin and vmax arguments specify the minimum and maximum values of the
  range, respectively.
• In this case, the range is set to the minimum and maximum revenue values in the
  df['Revenue'] column.
• The mcolors.Normalize() function returns a Normalize object, which is stored
  in the norm variable.

Line 3: sns.barplot(x="Movie Title", y="Revenue", data=df, palette=cmap(norm(df['Revenue'])))

• This line creates a bar plot using Seaborn's barplot() function.
• The x and y arguments specify the columns to use for the x-axis and y-axis,
  respectively.
• The data argument specifies the DataFrame to use for the plot.
• The palette argument specifies the color palette to use for the bars.
• In this case, the palette argument is set to cmap(norm(df['Revenue'])), which
  applies the colormap (cmap) to the normalized revenue values
  (norm(df['Revenue'])).
• This creates a color gradient effect, where the color of each bar is determined by its
  corresponding revenue value. Bars with higher revenue values will be colored darker
  red, while bars with lower revenue values will be colored lighter red.
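The two steps described above, normalizing to [0, 1] and then looking up a color, can be sketched without Matplotlib. The functions below are simplified stand-ins: `normalize` applies the same linear formula, while `light_to_dark_red` is a hypothetical two-color interpolation and not the actual 'Reds' colormap. The revenue values are the ones quoted in the inference section.

```python
def normalize(value, vmin, vmax):
    """Linearly map value from the range [vmin, vmax] to [0, 1]."""
    return (value - vmin) / (vmax - vmin)


def light_to_dark_red(t):
    """Interpolate from a light red (t=0) to a dark red (t=1); RGB channels in [0, 1]."""
    light, dark = (1.0, 0.9, 0.9), (0.5, 0.0, 0.0)
    return tuple(lo + t * (hi - lo) for lo, hi in zip(light, dark))


# Revenue values quoted in the inference section (lowest, third, second, highest).
revenues = [5000, 10800, 15000, 18000]
vmin, vmax = min(revenues), max(revenues)
colors = [light_to_dark_red(normalize(r, vmin, vmax)) for r in revenues]

print(normalize(11500, vmin, vmax))  # 0.5
print(colors[0])                     # (1.0, 0.9, 0.9): lightest, lowest revenue
print(colors[-1])                    # (0.5, 0.0, 0.0): darkest, highest revenue
```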

TOPIC4: VISUALIZING DISTRIBUTIONS, AMOUNTS AND PROPORTIONS

Q5. An e-commerce company sells products in various categories, including
electronics, fashion, home goods, and more. The company wants to analyze its
sales data to identify trends, patterns, and insights that can inform business
decisions.

Analyze the data distributions, proportions, and X-Y relationships, and visualize the
sales data to answer the following questions:

i. Which product category generates the most revenue?
ii. What is the trend of sales over time?
iii. Which products have the highest sales quantity?
iv. How does the sales price vary across different product categories?
v. What is the relationship between customer age and sales price?
vi. Which cities have the highest sales revenue?

ANSWER

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load the dataset
df = pd.read_csv('sales_data.csv')

# Convert Sales Date to datetime format


df['Sales Date'] = pd.to_datetime(df['Sales Date'])

# Calculate total sales


df['Total Sales'] = df['Quantity'] * df['Sales Price']

# Question 1: Which product category generates the most revenue?


plt.figure(figsize=(8, 6))
sns.barplot(x='Product Category', y='Total Sales',
data=df.groupby('Product Category')['Total Sales'].sum().reset_index())
plt.title('Total Sales by Product Category')
plt.xlabel('Product Category')
plt.ylabel('Total Sales')
plt.show()

# Question 2: What is the trend of sales over time?


plt.figure(figsize=(10, 6))
sns.lineplot(x='Sales Date', y='Total Sales',
data=df.groupby('Sales Date')['Total Sales'].sum().reset_index())
plt.title('Total Sales Over Time')
plt.xlabel('Sales Date')
plt.ylabel('Total Sales')
plt.show()

# Question 3: Which products have the highest sales quantity?


plt.figure(figsize=(8, 6))
sns.barplot(x='Product Name', y='Quantity',
data=df.groupby('Product Name')
['Quantity'].sum().reset_index().nlargest(5, 'Quantity'))
plt.title('Top 5 Products by Sales Quantity')
plt.xlabel('Product Name')
plt.ylabel('Quantity')
plt.show()

# Question 4: How does the sales price vary across different product categories?
plt.figure(figsize=(8, 6))
sns.boxplot(x='Product Category', y='Sales Price', data=df)
plt.title('Sales Price Distribution by Product Category')
plt.xlabel('Product Category')
plt.ylabel('Sales Price')
plt.show()

# Question 5: What is the relationship between customer age and sales price?
plt.figure(figsize=(8, 6))
sns.scatterplot(x='Customer Age', y='Sales Price', data=df)
plt.title('Relationship Between Customer Age and Sales Price')
plt.xlabel('Customer Age')
plt.ylabel('Sales Price')
plt.show()

# Question 6: Which cities have the highest sales revenue?
plt.figure(figsize=(8, 6))
# Sum Total Sales per city and keep the five largest
# (the column name is 'Total Sales', with a space)
sns.barplot(x='Customer Location', y='Total Sales',
            data=df.groupby('Customer Location')['Total Sales'].sum()
                   .reset_index().nlargest(5, 'Total Sales'))
plt.title('Top 5 Cities by Sales Revenue')
plt.xlabel('Customer Location')
plt.ylabel('Total Sales')
plt.show()
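The aggregation behind question 1 (revenue per category as the sum of quantity times price) can be sketched with plain Python. The rows below are hypothetical and are not taken from sales_data.csv; `revenue_by_category` is an illustrative helper that mirrors the groupby-and-sum step in the program.

```python
from collections import defaultdict

# Hypothetical rows: (product category, quantity, sales price)
rows = [
    ("Electronics", 2, 300.0),
    ("Fashion", 5, 40.0),
    ("Electronics", 1, 150.0),
    ("Home Goods", 3, 60.0),
]


def revenue_by_category(records):
    """Sum quantity * price for each product category."""
    totals = defaultdict(float)
    for category, quantity, price in records:
        totals[category] += quantity * price
    return dict(totals)


totals = revenue_by_category(rows)
top_category = max(totals, key=totals.get)
print(totals)        # {'Electronics': 750.0, 'Fashion': 200.0, 'Home Goods': 180.0}
print(top_category)  # Electronics
```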

OUTPUT
PART-C
TOPIC2
Q6. How do you map the election data to scales to derive inferences? Use various
aesthetics relevant to the following scenario.

Election_data.csv
Location,Party,Votes
New York,Republican,1000
New York,Democrat,1200
California,Republican,800
California,Democrat,1500
Texas,Republican,1200
Texas,Democrat,1000
Florida,Republican,1000
Florida,Democrat,1200

ANSWER:

import pandas as pd
import matplotlib.pyplot as plt

# Read the CSV file


data = pd.read_csv("election_data.csv")

# Assume the CSV file has the following columns:


# - Location (e.g., state, city)
# - Party (e.g., Republican, Democrat)
# - Votes (number of votes for each party in each location)

# Case 1: Color Aesthetics - Vote Share by Party
plt.figure(figsize=(10, 6))
for party in data["Party"].unique():
    party_data = data[data["Party"] == party]
    plt.plot(party_data["Location"], party_data["Votes"], label=party)
plt.xlabel("Location")
plt.ylabel("Votes")
plt.title("Color Aesthetics: Vote Share by Party")
plt.legend()
plt.show()

# Case 2: Marker Aesthetics - Winner by Location
plt.figure(figsize=(10, 6))
for location in data["Location"].unique():
    location_data = data[data["Location"] == location]
    winner = location_data.loc[location_data["Votes"].idxmax()]["Party"]
    if winner == "Republican":
        marker = "o"
    elif winner == "Democrat":
        marker = "s"
    else:
        marker = "D"
    plt.scatter(location, location_data["Votes"].max(), marker=marker, s=100,
                label=winner)
plt.xlabel("Location")
plt.ylabel("Votes")
plt.title("Marker Aesthetics: Winner by Location")
plt.legend()
plt.show()

# Case 3: Size Aesthetics - Vote Margin by Location
plt.figure(figsize=(10, 6))
for location in data["Location"].unique():
    location_data = data[data["Location"] == location]
    vote_margin = location_data["Votes"].max() - location_data["Votes"].min()
    plt.scatter(location, location_data["Votes"].max(), s=vote_margin*10,
                label=location)
plt.xlabel("Location")
plt.ylabel("Votes")
plt.title("Size Aesthetics: Vote Margin by Location")
plt.legend()
plt.show()

# Case 4: Shape Aesthetics - Party Strongholds
plt.figure(figsize=(10, 6))
for party in data["Party"].unique():
    party_data = data[data["Party"] == party]
    if party == "Republican":
        marker = "o"
    elif party == "Democrat":
        marker = "s"
    else:
        marker = "D"
    plt.scatter(party_data["Location"], party_data["Votes"], marker=marker, s=100,
                label=party)
plt.xlabel("Location")
plt.ylabel("Votes")
plt.title("Shape Aesthetics: Party Strongholds")
plt.legend()
plt.show()

OUTPUT
Inferences from the Output

Vote Share by Party

• The Democrat party received more votes than the Republican party in three of the
  four locations (New York, California, and Florida), indicating a stronger presence
  in those areas.
• The Republican party received more votes in Texas (1,200 against 1,000), suggesting
  a strong support base in that location.

Winner by Location

• The Democrat party has won in New York, California, and Florida, indicating a strong
  presence in these locations.
• The Republican party has won in Texas, suggesting a strong support base in that
  location.

Vote Margin by Location

• The vote margin is highest in California (700 votes), indicating a significant
  difference in votes between the two parties.
• The vote margin is lowest in New York, Texas, and Florida (200 votes each),
  suggesting closely contested elections in those locations.

Party Strongholds

• The Democrat party has strongholds in New York, California, and Florida, indicating
  a strong support base in these locations.
• The Republican party has a stronghold in Texas, suggesting a strong support base in
  that location.

Overall Inferences

• The Democrat party appears to have a stronger presence in most locations, with
  more votes and wins in multiple areas.
• The Republican party has a strong support base in Texas, but trails behind the
  Democrat party in the other locations.
• California appears to be a key location: it has the largest vote margin (700 votes)
  and the highest Democrat vote count (1,500).
• New York, Texas, and Florida are closely contested, each with a 200-vote margin and
  a strong presence of both parties.
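The winner and the vote margin for each location can be recomputed directly from Election_data.csv as listed in the question. The sketch below uses only the standard library and embeds the eight data rows as a string; `winners_and_margins` is an illustrative helper that follows the same logic as Cases 2 and 3 (largest vote count, and maximum minus minimum).

```python
import csv
import io

# Contents of Election_data.csv as listed in the question.
ELECTION_CSV = """Location,Party,Votes
New York,Republican,1000
New York,Democrat,1200
California,Republican,800
California,Democrat,1500
Texas,Republican,1200
Texas,Democrat,1000
Florida,Republican,1000
Florida,Democrat,1200
"""


def winners_and_margins(csv_text):
    """Return {location: (winning party, vote margin)}."""
    votes = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        votes.setdefault(row["Location"], {})[row["Party"]] = int(row["Votes"])
    result = {}
    for location, by_party in votes.items():
        winner = max(by_party, key=by_party.get)
        margin = max(by_party.values()) - min(by_party.values())
        result[location] = (winner, margin)
    return result


summary = winners_and_margins(ELECTION_CSV)
for location, (winner, margin) in summary.items():
    print(location, winner, margin)
# New York Democrat 200
# California Democrat 700
# Texas Republican 200
# Florida Democrat 200
```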

LOGIC:
Case 1: Color Aesthetics - Vote Share by Party

This section of the code creates a line plot to display the votes of each party
across different locations.

• The for loop iterates over each unique party in the data.
• For each party, the code filters the data to include only rows where the party matches
  the current party.
• The plt.plot function is used to create a line plot of the votes for each party.
• The label parameter is used to label each line with the corresponding party name.

Case 2: Marker Aesthetics - Winner by Location

This section of the code creates a scatter plot to display the winning party in each
location.

• The for loop iterates over each unique location in the data.
• For each location, the code filters the data to include only rows where the location
  matches the current location.
• The winner variable is used to store the party with the maximum votes in the current
  location.
• The plt.scatter function is used to create a scatter plot with markers indicating the
  winning party in each location.
• The marker parameter is used to specify the marker style for each party.

Case 3: Size Aesthetics - Vote Margin by Location

This section of the code creates a scatter plot to display the vote margin between
parties in each location.

• The for loop iterates over each unique location in the data.
• For each location, the code filters the data to include only rows where the location
  matches the current location.
• The vote_margin variable is used to store the difference between the maximum and
  minimum votes in the current location.
• The plt.scatter function is used to create a scatter plot with markers indicating the vote
  margin in each location.
• The s parameter is used to specify the size of each marker, which is proportional to the
  vote margin.

Case 4: Shape Aesthetics - Party Strongholds

This section of the code creates a scatter plot to display the strongholds of each party.

• The for loop iterates over each unique party in the data.
• For each party, the code filters the data to include only rows where the party matches
  the current party.
• The plt.scatter function is used to create a scatter plot with markers indicating the
  strongholds of each party.
• The marker parameter is used to specify the marker style for each party.