UNIT1

The document discusses various aesthetics suitable for visualizing continuous and discrete data, including position, shape, size, color, line width, and line type. It presents examples of data types and their corresponding visualizations, such as line plots for temporal data and bar plots for categorical data. Additionally, it provides insights derived from visualizations of COVID-19 and weather data, emphasizing the importance of aesthetics in effectively conveying information.

Uploaded by

Siva Nithesh


UNIT-I

(PART-A)

1. Suggest a few aesthetics suitable for continuous and discrete data.

Commonly used aesthetics in data visualization:
• position
• shape
• size
• color
• line width
• line type

Position, size, line width, and color can represent both continuous and discrete data.
Shape and line type represent discrete data only.

2. Match the following:

a) 2,8,3.9,6.0          1) Quantitative & discrete
b) dog, fish, cat       2) Text
c) Jan 5, 2018          3) Categorical & ordered
d) 10, 45, 78           4) Categorical & unordered
e) good, fair, poor     5) Quantitative & continuous
f) fox jump over dog    6) Temporal

Answer:
a-5, b-4, c-6, d-1, e-3, f-2

3. Give a sample scale for unambiguous mapping of data and values.

Ans:

A scale defines a unique mapping between data and aesthetics. A scale must
be one-to-one, such that for each specific data value there is exactly one
aesthetic value and vice versa.

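As a minimal sketch of such a scale, the snippet below maps four location names to matplotlib marker codes and checks that the mapping is one-to-one. The helper names `is_one_to_one` and `invert` are illustrative assumptions, not part of the original answer.

```python
# Hypothetical shape scale: each location (data value) gets exactly one marker
# (aesthetic value), and no marker is shared by two locations.
shape_scale = {
    "Chicago": "o",       # circle
    "San Diego": "s",     # square
    "Houston": "^",       # triangle
    "Death Valley": "D",  # diamond
}


def is_one_to_one(scale):
    """Return True if no two data values share the same aesthetic value."""
    return len(set(scale.values())) == len(scale)


def invert(scale):
    """Build the reverse mapping (aesthetic value -> data value)."""
    if not is_one_to_one(scale):
        raise ValueError("scale is ambiguous: two data values share an aesthetic")
    return {aesthetic: value for value, aesthetic in scale.items()}


print(is_one_to_one(shape_scale))  # True
print(invert(shape_scale)["s"])    # San Diego
```

Because the mapping can be inverted, a reader who sees a square marker can tell which location it stands for, which is what makes the scale unambiguous.
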
4. Suppose you want to visualize the wind patterns around a circular
structure, such as a wind turbine or a cylindrical building. You have data
on wind speed and direction at various points around the structure. Which
type of axis would you choose? Why?

Ans:
I would choose a polar (circular) coordinate system, in which:

• The radial axis represents wind speed.
• The angular axis represents wind direction.

Because wind direction is periodic, this visualization makes it easy to identify
areas with high wind speeds, dominant wind directions, and overall wind patterns.
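The choice of axes can be sketched with matplotlib's polar projection. The readings below are invented for illustration, and the function name `plot_wind` is an assumption rather than anything given in the question.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical readings at eight points around the structure (assumed values).
direction_deg = np.array([0, 45, 90, 135, 180, 225, 270, 315])  # wind direction
speed = np.array([4.0, 6.5, 9.0, 7.5, 5.0, 3.5, 2.5, 3.0])      # wind speed, m/s


def plot_wind(direction_deg, speed):
    """Draw wind speed (radial axis) against wind direction (angular axis)."""
    fig, ax = plt.subplots(subplot_kw={"projection": "polar"}, figsize=(6, 6))
    theta = np.deg2rad(direction_deg)   # polar axes expect angles in radians
    ax.set_theta_zero_location("N")     # put 0 degrees (north) at the top
    ax.set_theta_direction(-1)          # angles increase clockwise, like a compass
    ax.scatter(theta, speed, c=speed, cmap="viridis", s=80)
    ax.set_title("Wind speed by direction")
    return ax


ax = plot_wind(direction_deg, speed)
```

Calling `plt.show()` afterwards displays the figure. Points far from the centre mark directions with high wind speed.
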

5. Interpret and write any two insights with colours from the "titanic visual"
given below.

Inference:
There were more male passengers than female passengers.
The majority of male passengers travelled in 3rd class.

6. Give any two visualization scenarios for geospatial data.

Answer:
Visualizing weather data
Visualizing satellite paths
Visualizing country-wise population
Visualizing country-wise literacy rate
Visualizing country-wise COVID-19 cases

UNIT-I
(PART-B)

TOPIC1:AESTHETICS & TYPES OF DATA

Q1. Explore and visualize the impact of COVID-19 across China.
For the given data set, identify the type of each data field, write Python code to add
aesthetics to the visualizations, and write inferences.

Covid19_data.csv
Date,Country,Region,Cases,Deaths,Vaccinations,Age group,Sex,Transmission type,Day
2020-01-01,China,Hubei,41,6,0,0-19,Male,Local,Monday
2020-01-02,China,Hubei,59,7,0,20-39,Female,Local,Tuesday
2020-01-03,China,Hubei,77,8,0,40-59,Male,Local,Wednesday
2020-01-04,China,Hubei,101,9,0,60+,Female,Local,Thursday
2020-01-05,China,Hubei,128,10,0,0-19,Male,Imported,Friday
2020-01-06,China,Hubei,155,11,0,20-39,Female,Imported,Saturday
2020-01-07,China,Hubei,182,12,0,40-59,Male,Imported,Sunday
2020-01-08,China,Hubei,212,13,0,60+,Female,Local,Monday

ANSWER
Here are the data types and visualization aesthetics for the modified COVID-19 dataset:

Date (Temporal Data)
• Visualization: Line plot
• Aesthetics: Marker (^), label

Country, Region, Sex, Transmission type (Categorical Data)
• Visualization: Bar plot, Count plot
• Aesthetics: Hue (color), label

Cases, Deaths, Vaccinations (Numerical Data)
• Visualization: Line plot, Histogram, Scatter plot
• Aesthetics: Marker (^), label, bins (20), kernel density estimate (KDE)

Age group (Ordinal Data)
• Visualization: Bar plot, Count plot
• Aesthetics: Hue (color), label

Day (Ordinal Data)
• Visualization: Box plot
• Aesthetics: Label

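The data types listed above can also be recorded in pandas, so that plotting libraries treat each column appropriately. This is a minimal sketch on four rows shaped like the excerpt. Using an ordered categorical for the age bands is one reasonable choice, not the only one.

```python
import pandas as pd

# A few rows in the shape of the Covid19_data.csv excerpt.
df = pd.DataFrame({
    "Date": ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"],
    "Cases": [41, 59, 77, 101],
    "Age group": ["0-19", "20-39", "40-59", "60+"],
    "Sex": ["Male", "Female", "Male", "Female"],
})

# Temporal: parse the text dates into datetime64 values.
df["Date"] = pd.to_datetime(df["Date"])

# Ordinal: an ordered categorical keeps the age bands in their natural order.
age_order = ["0-19", "20-39", "40-59", "60+"]
df["Age group"] = pd.Categorical(df["Age group"], categories=age_order, ordered=True)

# Nominal (unordered categorical): no ordering is implied.
df["Sex"] = df["Sex"].astype("category")

print(df.dtypes)
```

With the ordering stored in the column, comparisons such as `df["Age group"] >= "40-59"` become meaningful, which is not the case for an unordered category such as Sex.
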
Program
# Import necessary libraries
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

# Load the COVID-19 dataset (file name as given in the question) and parse the dates

df = pd.read_csv('Covid19_data.csv', parse_dates=['Date'])

# Temporal Data (Time Series) Visualization

plt.figure(figsize=(12, 8))
sns.lineplot(x='Date', y='Cases', data=df, label='Cases', marker='o')
sns.lineplot(x='Date', y='Deaths', data=df, label='Deaths', marker='s')
sns.lineplot(x='Date', y='Vaccinations', data=df, label='Vaccinations', marker='^')
plt.title('COVID-19 Cases, Deaths, and Vaccinations Over Time')
plt.xlabel('Date')
plt.ylabel('Count')
plt.legend()
plt.grid(True)
plt.show()

# Categorical Data Visualization

plt.figure(figsize=(10, 6))
sns.countplot(x='Age group', hue='Sex', data=df)
plt.title('COVID-19 Cases by Age Group and Sex')
plt.xlabel('Age Group')
plt.ylabel('Count')
plt.legend(title='Sex')
plt.show()

# Numerical Data Visualization

plt.figure(figsize=(10, 6))
sns.histplot(df['Cases'], kde=True, bins=20)
plt.title('Distribution of COVID-19 Cases')
plt.xlabel('Cases')
plt.ylabel('Frequency')
plt.show()

# Ordinal Data Visualization

plt.figure(figsize=(10, 6))
sns.boxplot(x='Day', y='Cases', data=df)
plt.title('COVID-19 Cases by Day of the Week')
plt.xlabel('Day of the Week')
plt.ylabel('Cases')
plt.show()

# Correlation Data Visualization

plt.figure(figsize=(10, 6))
corr_matrix = df[['Cases', 'Deaths', 'Vaccinations']].corr()
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', square=True)
plt.title('Correlation Between COVID-19 Cases, Deaths, and Vaccinations')
plt.show()

# Scatter Plot with Regression Line

plt.figure(figsize=(10, 6))
sns.regplot(x='Cases', y='Deaths', data=df)
plt.title('Relationship Between COVID-19 Cases and Deaths')
plt.xlabel('Cases')
plt.ylabel('Deaths')
plt.show()

OUTPUT
INFERENCES
Here are the inferences from each visualization:

1. Temporal Data (Time Series) Visualization: COVID-19 cases and deaths increased over time,
while vaccinations showed a steady rise. Cases and deaths peaked around the same time.

2. Categorical Data Visualization: The 20-39 age group had the highest number of COVID-19
cases, with a slight majority of males. The 60+ age group had the lowest number of cases.

3. Numerical Data Visualization: COVID-19 cases followed a skewed distribution, with most
cases falling in the lower range (0-100 cases). A few outliers had extremely high case numbers.

4. Ordinal Data Visualization: COVID-19 cases were highest on Thursdays and lowest on
Sundays. Case numbers varied significantly across days of the week.

5. Correlation Data Visualization: COVID-19 cases and deaths showed a strong positive
correlation (0.85). Cases and vaccinations had a moderate positive correlation (0.55).

6. Scatter Plot with Regression Line: There was a strong positive linear relationship between
COVID-19 cases and deaths. As cases increased, deaths also tended to increase.
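As a small worked check, the correlation between cases and deaths can be computed directly on the eight rows shown in the question. The excerpt is short, so values computed from it will not necessarily match the figures quoted above. The variable names are illustrative.

```python
import pandas as pd

# Cases, Deaths, and Vaccinations from the eight-row excerpt in the question.
excerpt = pd.DataFrame({
    "Cases": [41, 59, 77, 101, 128, 155, 182, 212],
    "Deaths": [6, 7, 8, 9, 10, 11, 12, 13],
    "Vaccinations": [0, 0, 0, 0, 0, 0, 0, 0],
})

# Pearson correlation between cases and deaths.
r_cases_deaths = excerpt["Cases"].corr(excerpt["Deaths"])
print(round(r_cases_deaths, 3))

# Vaccinations is constant (all zeros) in the excerpt, so it has zero variance
# and any correlation with it is undefined.
vaccinations_constant = excerpt["Vaccinations"].nunique() == 1
print(vaccinations_constant)
```

On these rows both cases and deaths rise every day, so the coefficient is close to 1. A correlation involving vaccinations only becomes meaningful once the column contains more than one distinct value.
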

EXPLANATION
Here's a brief explanation of the key lines:

1. sns.lineplot(x='Date', y='Vaccinations', data=df, label='Vaccinations', marker='^')

Creates a line plot showing the number of vaccinations over time, with a
triangle marker (^) at each data point.

2. sns.countplot(x='Age group', hue='Sex', data=df)

Creates a count plot showing the number of records (rows) in each age group, with
different colors for males and females. It counts rows; it does not sum the Cases column.

3. sns.histplot(df['Cases'], kde=True, bins=20)

Creates a histogram showing the distribution of COVID-19 cases, with a kernel
density estimate (KDE) curve and 20 bins.

4. sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', square=True)

Creates a heatmap showing the correlation matrix of COVID-19 cases, deaths,
and vaccinations, with annotated values, a cool-warm color map, and square
cells.

5. sns.regplot(x='Cases', y='Deaths', data=df)

Creates a scatter plot showing the relationship between COVID-19 cases and
deaths, with a regression line.

====================================================================================
TOPIC2:SCALES MAP DATA VALUES TO AESTHETICS

Q2. Generate a weather report by applying various aesthetics to map data values,
meeting the following requirements:

• Use different colors to distinguish between temperature data from different
locations. Create a line chart to visualize the temperature trend for each
location.
• Use different marker styles (e.g., circle, square, triangle) to distinguish
between temperature data from different locations. Create a scatter plot to
visualize the temperature data.
• Use varying point sizes to represent the magnitude of temperature values.
Create a bubble chart to visualize the temperature data.
• Use different shapes (e.g., circle, square, triangle) to distinguish between
temperature data from different locations. Create a shape-based
visualization to represent the temperature data.

ANSWER:

import pandas as pd
import matplotlib.pyplot as plt

# Read the CSV file

data = pd.read_csv("temperature_data.csv")

# Case 1: Color Aesthetics

plt.figure(figsize=(10, 6))
for location in data["Location"].unique():
    location_data = data[data["Location"] == location]
    plt.plot(location_data["Day"], location_data["Temperature"], label=location)
plt.xlabel("Day")
plt.ylabel("Temperature (°F)")
plt.title("Color Aesthetics")
plt.legend()
plt.show()

# Case 2: Marker Aesthetics

plt.figure(figsize=(10, 6))
for location in data["Location"].unique():
    location_data = data[data["Location"] == location]
    if location == "Chicago":
        marker = "o"
    elif location == "San Diego":
        marker = "s"
    elif location == "Houston":
        marker = "^"
    else:
        marker = "D"
    plt.plot(location_data["Day"], location_data["Temperature"], marker=marker,
             linestyle="-", label=location)
plt.xlabel("Day")
plt.ylabel("Temperature (°F)")
plt.title("Marker Aesthetics")
plt.legend()
plt.show()

# Case 3: Size Aesthetics

plt.figure(figsize=(10, 6))
for location in data["Location"].unique():
    location_data = data[data["Location"] == location]
    plt.scatter(location_data["Day"], location_data["Temperature"],
                s=location_data["Temperature"]*10, label=location)
plt.xlabel("Day")
plt.ylabel("Temperature (°F)")
plt.title("Size Aesthetics")
plt.legend()
plt.show()

# Case 4: Shape Aesthetics

plt.figure(figsize=(10, 6))
for location in data["Location"].unique():
    location_data = data[data["Location"] == location]
    if location == "Chicago":
        marker = "o"
    elif location == "San Diego":
        marker = "s"
    elif location == "Houston":
        marker = "^"
    else:
        marker = "D"
    plt.scatter(location_data["Day"], location_data["Temperature"], marker=marker,
                s=100, label=location)
plt.xlabel("Day")
plt.ylabel("Temperature (°F)")
plt.title("Shape Aesthetics")
plt.legend()
plt.show()
OUTPUT
Findings & Inferences

Here are the findings from each graph:

Case 1: Color Aesthetics (Line Chart)

1. Temperature decrease in Chicago: Chicago's temperature decreases from
25.6°F to 25.3°F over the three-day period.
2. Stable temperature in San Diego: San Diego's temperature remains relatively
stable, ranging from 55.2°F to 55.3°F.
3. Temperature fluctuations in Houston and Death Valley: Houston's
temperature decreases from 53.9°F to 53.8°F, while Death Valley's temperature
increases from 51.0°F to 51.3°F.

Case 2: Marker Aesthetics (Line Chart)

1. Distinct temperature patterns: Each location has a unique temperature pattern,
with Chicago showing a decreasing trend, San Diego remaining stable, and
Houston and Death Valley exhibiting fluctuations.
2. Marker styles effectively distinguish locations: The use of different marker
styles (e.g., circles, squares, triangles) effectively distinguishes between
temperature trends for each location.

Case 3: Size Aesthetics (Scatter Plot)

1. Point size encodes temperature: Because the marker size is computed from the
temperature value, higher temperatures appear as larger points.
2. Temperature distribution: The scatter plot reveals a concentration of
temperature values between 50°F and 60°F, with Chicago's temperatures
clustering between 25°F and 30°F.

Case 4: Shape Aesthetics (Scatter Plot)

1. Effective differentiation between locations: The use of different shapes (e.g.,
circles, squares, triangles) enables effective differentiation between temperature
data points for each location.
2. Temperature patterns: The scatter plot reveals distinct temperature patterns
for each location, with Chicago showing a cluster of low temperatures and San
Diego exhibiting a cluster of high temperatures.

CODE EXPLANATION

Here's a detailed explanation of each line:

Line 1: plt.plot(location_data["Day"], location_data["Temperature"], label=location)

• plt.plot(): This function creates a line plot.
• location_data["Day"]: This selects the "Day" column from the location_data
dataframe, which contains the day values (1, 2, 3) for the current location.
• location_data["Temperature"]: This selects the "Temperature" column from the
location_data dataframe, which contains the temperature values for the current
location.
• label=location: This adds a label to the plot for the current location, which will appear
in the legend.

Line 2: plt.plot(location_data["Day"], location_data["Temperature"], marker=marker, linestyle="-", label=location)

• This line is similar to Line 1, but with additional parameters:
  o marker=marker: This specifies a marker style (e.g., circle, square, triangle) for each
  data point. The marker variable is assigned a value based on the location.
  o linestyle="-": This specifies a solid line style for the plot.

Line 3: plt.scatter(location_data["Day"], location_data["Temperature"], s=location_data["Temperature"]*10, label=location)

• plt.scatter(): This function creates a scatter plot.
• location_data["Day"] and location_data["Temperature"]: These select the "Day" and
"Temperature" columns from the location_data dataframe, respectively.
• s=location_data["Temperature"]*10: This specifies the size of each marker in the
scatter plot. In matplotlib, s is the marker area in points squared; here it is set to the
temperature value multiplied by 10.

Line 4: plt.scatter(location_data["Day"], location_data["Temperature"], marker=marker, s=100, label=location)

• This line is similar to Line 3, but with a fixed marker size and specified marker style:
  o s=100: This specifies a fixed marker size.
  o marker=marker: This specifies a marker style (e.g., circle, square, triangle) assigned
  based on the location.

In summary:

• Lines 1 and 2 create line plots; Line 2 adds a location-specific marker at each data point.
• Lines 3 and 4 create scatter plots; Line 3 varies the marker size with temperature, while
Line 4 uses a fixed size and a location-specific marker shape.
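The location-to-marker assignment described for Lines 2 and 4 can also be written as a dictionary lookup instead of an if/elif chain. The sketch below is one possible rewrite, not the program from the answer. The small DataFrame stands in for temperature_data.csv, its values are illustrative, and the names `MARKERS`, `DEFAULT_MARKER`, and `plot_by_location` are assumptions.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for temperature_data.csv (same column names as the
# program in this answer; the values are illustrative only).
data = pd.DataFrame({
    "Location": ["Chicago"] * 3 + ["San Diego"] * 3,
    "Day": [1, 2, 3, 1, 2, 3],
    "Temperature": [25.6, 25.4, 25.3, 55.2, 55.3, 55.2],
})

# Shape scale: one marker per location, with a fallback for anything unlisted.
MARKERS = {"Chicago": "o", "San Diego": "s", "Houston": "^"}
DEFAULT_MARKER = "D"


def plot_by_location(data):
    """Scatter temperature against day, one marker style per location."""
    fig, ax = plt.subplots(figsize=(10, 6))
    for location, group in data.groupby("Location"):
        ax.scatter(group["Day"], group["Temperature"],
                   marker=MARKERS.get(location, DEFAULT_MARKER),
                   s=100, label=location)
    ax.set_xlabel("Day")
    ax.set_ylabel("Temperature (°F)")
    ax.legend()
    return ax


ax = plot_by_location(data)
```

Keeping the scale in one dictionary makes it easy to confirm that no two locations share a marker, and adding a location needs only one new entry.
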

=========================================================
TOPIC2:SCALES MAP DATA VALUES TO AESTHETICS

Q3. Explore the air quality data and derive insights by applying various
aesthetics to map data values onto scales. Apply different marker styles, point sizes, colors,
and shapes, and write your inferences.

Air Quality Data

PROGRAM

import pandas as pd
import matplotlib.pyplot as plt

# Read the CSV file

data = pd.read_csv("air_quality_data.csv")

# Case 1: Color Aesthetics

plt.figure(figsize=(10, 6))
for pollutant in data["Pollutant"].unique():
    pollutant_data = data[data["Pollutant"] == pollutant]
    plt.plot(pollutant_data["Date"], pollutant_data["Concentration"], label=pollutant)
plt.xlabel("Date")
plt.ylabel("Concentration (μg/m³)")
plt.title("Color Aesthetics")
plt.legend()
plt.show()
# Case 2: Marker Aesthetics

plt.figure(figsize=(10, 6))
for location in data["Location"].unique():
    location_data = data[data["Location"] == location]
    if location == "Urban":
        marker = "o"
    elif location == "Suburban":
        marker = "s"
    elif location == "Rural":
        marker = "^"
    else:
        marker = "D"
    plt.plot(location_data["Date"], location_data["Concentration"], marker=marker,
             linestyle="-", label=location)
plt.xlabel("Date")
plt.ylabel("Concentration (μg/m³)")
plt.title("Marker Aesthetics")
plt.legend()
plt.show()

# Case 3: Size Aesthetics

plt.figure(figsize=(10, 6))
for pollutant in data["Pollutant"].unique():
    pollutant_data = data[data["Pollutant"] == pollutant]
    plt.scatter(pollutant_data["Date"], pollutant_data["Concentration"],
                s=pollutant_data["Concentration"]*10, label=pollutant)
plt.xlabel("Date")
plt.ylabel("Concentration (μg/m³)")
plt.title("Size Aesthetics")
plt.legend()
plt.show()

# Case 4: Shape Aesthetics

plt.figure(figsize=(10, 6))
for location in data["Location"].unique():
    location_data = data[data["Location"] == location]
    if location == "Urban":
        marker = "o"
    elif location == "Suburban":
        marker = "s"
    elif location == "Rural":
        marker = "^"
    else:
        marker = "D"
    plt.scatter(location_data["Date"], location_data["Concentration"], marker=marker,
                s=100, label=location)
plt.xlabel("Date")
plt.ylabel("Concentration (μg/m³)")
plt.title("Shape Aesthetics")
plt.legend()
plt.show()

OUTPUT
Inferences:
1. Pollutant Concentration Trends: The line plots in Case 1 show that PM2.5 concentrations are
relatively low, while NO2 and O3 concentrations are higher.

2. Location-Specific Concentrations: The marker plots in Case 2 reveal that Urban areas tend to
have higher pollutant concentrations, followed by Suburban and Rural areas.

3. Concentration Variability: The scatter plots in Case 3 demonstrate that pollutant concentrations
vary significantly across different dates, with some dates showing very high concentrations.

4. Location-Pollutant Interactions: The shape plots in Case 4 suggest that different locations have
distinct pollutant concentration profiles, with Urban areas showing a mix of high and low
concentrations.

5. Date-Specific Concentrations: The plots in all cases show that pollutant concentrations vary
significantly across different dates, indicating that date-specific factors (e.g., weather, human
activity) play a crucial role in shaping air quality.
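Inferences of this kind can be backed by simple aggregates as well as by reading the plots. The sketch below uses invented rows with the column names the program assumes (Date, Location, Pollutant, Concentration). The numbers are hypothetical and are not the actual air quality data.

```python
import pandas as pd

# Hypothetical rows in the shape the program assumes for air_quality_data.csv;
# the concentrations are invented for illustration.
data = pd.DataFrame({
    "Date": ["2023-01-01"] * 3 + ["2023-01-02"] * 3,
    "Location": ["Urban", "Suburban", "Rural"] * 2,
    "Pollutant": ["NO2"] * 6,
    "Concentration": [40.0, 30.0, 20.0, 46.0, 31.0, 22.0],
})

# Mean concentration per location, highest first.
by_location = (data.groupby("Location")["Concentration"]
                   .mean()
                   .sort_values(ascending=False))
print(by_location)

# Spread across dates for each pollutant (max minus min of the daily means).
daily = data.groupby(["Pollutant", "Date"])["Concentration"].mean()
date_range = daily.groupby(level="Pollutant").agg(lambda s: s.max() - s.min())
print(date_range)
```

The first table supports statements about which location type has the highest concentrations. The second puts a number on how much each pollutant varies from date to date.
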

============================================
TOPIC3:COLOR TO DISTINGUISH, COLOR TO HIGHLIGHT

Q4. You are provided with the movie ticket sales data. How would you apply colors
to highlight, distinguish, and infer the following results?

• Calculate the total revenue
• Visualize the revenue by movie title with color scale
• Visualize the revenue by genre with color scale
• Identify the top 3 movies by revenue
• Visualize the tickets sold by movie title with color scale
• Visualize the ticket price by genre with color scale
• Highlight the movie with the highest revenue

Movie_Ticket_Sales Dataset:
ANSWER

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.colors as mcolors

# Load the movie ticket sales data from a CSV file

df = pd.read_csv('movie_ticket_sales.csv')

# Calculate the total revenue

total_revenue = df["Revenue"].sum()
print("Total Revenue: $", total_revenue)

# Visualize the revenue by movie title with color scale

plt.figure(figsize=(10, 6))
cmap = plt.get_cmap('Reds')
norm = mcolors.Normalize(vmin=df['Revenue'].min(), vmax=df['Revenue'].max())
sns.barplot(x="Movie Title", y="Revenue", data=df,
            palette=cmap(norm(df['Revenue'])))
plt.title("Revenue by Movie Title")
plt.xlabel("Movie Title")
plt.ylabel("Revenue ($)")
plt.xticks(rotation=45)
plt.show()

# Visualize the revenue by genre with color scale

plt.figure(figsize=(8, 6))
cmap = plt.get_cmap('Greens')
# Sum the revenue per genre once and reuse it for the scale and the plot
genre_revenue = df.groupby("Genre")["Revenue"].sum()
norm = mcolors.Normalize(vmin=genre_revenue.min(), vmax=genre_revenue.max())
sns.barplot(x="Genre", y="Revenue", data=genre_revenue.reset_index(),
            palette=cmap(norm(genre_revenue)))
plt.title("Revenue by Genre")
plt.xlabel("Genre")
plt.ylabel("Revenue ($)")
plt.show()
# Identify the top 3 movies by revenue
top_movies = df.nlargest(3, "Revenue")
print("Top 3 Movies by Revenue:")
print(top_movies)

# Visualize the tickets sold by movie title with color scale

plt.figure(figsize=(10, 6))
cmap = plt.get_cmap('Oranges')
norm = mcolors.Normalize(vmin=df['Tickets Sold'].min(),
vmax=df['Tickets Sold'].max())
sns.barplot(x="Movie Title", y="Tickets Sold", data=df,
palette=cmap(norm(df['Tickets Sold'])))
plt.title("Tickets Sold by Movie Title")
plt.xlabel("Movie Title")
plt.ylabel("Tickets Sold")
plt.xticks(rotation=45)
plt.show()

# Visualize the ticket price by genre with color scale

plt.figure(figsize=(8, 6))
cmap = plt.get_cmap('Purples')
norm = mcolors.Normalize(vmin=df.groupby("Genre")["Ticket Price"].mean().min(),
vmax=df.groupby("Genre")["Ticket Price"].mean().max())
sns.barplot(x="Genre", y="Ticket Price",
data=df.groupby("Genre")["Ticket Price"].mean().reset_index(),
palette=cmap(norm(df.groupby("Genre")["Ticket Price"].mean())))
plt.title("Ticket Price by Genre")
plt.xlabel("Genre")
plt.ylabel("Ticket Price ($)")
plt.show()

# Highlight the movie with the highest revenue

plt.figure(figsize=(10, 6))
sns.barplot(x="Movie Title", y="Revenue", data=df)
plt.axhline(df['Revenue'].max(), color='r', linestyle='--', label='Highest Revenue')
plt.title("Revenue by Movie Title")
plt.xlabel("Movie Title")
plt.ylabel("Revenue ($)")
plt.xticks(rotation=45)
plt.legend()
plt.show()
# Highlight the genre with the highest revenue
plt.figure(figsize=(8, 6))
sns.barplot(x="Genre", y="Revenue", data=df.groupby("Genre")
["Revenue"].sum().reset_index())
plt.axhline(df.groupby("Genre")["Revenue"].sum().max(), color='r', linestyle='--',
label='Highest Revenue')
plt.title("Revenue by Genre")
plt.xlabel("Genre")
plt.ylabel("Revenue ($)")
plt.legend()
plt.show()
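The two highlight plots above mark the maximum with a dashed reference line. An alternative that uses color itself to highlight is sketched below. The titles and revenues are the three rows reported in this answer's top-3 output, and the helper `highlight_colors` and its color names are assumptions made for illustration.

```python
import matplotlib.pyplot as plt

# Titles and revenues from the top-3 rows printed in this answer's output.
titles = ["The Dark Knight", "Avengers", "Inception"]
revenue = [18000.0, 15000.0, 10800.0]


def highlight_colors(values, highlight="crimson", base="lightgray"):
    """Return one color per value: `highlight` for the maximum, `base` otherwise."""
    top = max(values)
    return [highlight if v == top else base for v in values]


colors = highlight_colors(revenue)

fig, ax = plt.subplots(figsize=(10, 6))
ax.bar(titles, revenue, color=colors)
ax.set_title("Revenue by Movie Title (highest highlighted)")
ax.set_xlabel("Movie Title")
ax.set_ylabel("Revenue ($)")
```

Muting every bar except one draws the eye to the highlighted movie, whereas a sequential color scale is better suited to comparing all bars with one another.
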

OUTPUT
Top 3 Movies by Revenue:
       Movie Title   Genre Release Date  Ticket Price  Tickets Sold  Revenue
3  The Dark Knight  Action   2022-04-01          15.0          1200  18000.0
0         Avengers  Action   2022-01-01          15.0          1000  15000.0
4        Inception  Sci-Fi   2022-05-01          12.0           900  10800.0

INFERENCE

Total Revenue
• The total revenue from all movie ticket sales is $93,400.

Revenue by Movie Title
• The movie with the highest revenue is "The Dark Knight" with a revenue of $18,000.
• The movies with the lowest revenue are "The Lord of the Rings", "Pulp Fiction", and "The Silence of
the Lambs" with a revenue of $5,000 each.

Revenue by Genre
• The genre with the highest revenue is Action with a revenue of $42,600.
• The genre with the lowest revenue is Fantasy with a revenue of $5,000.

Top 3 Movies by Revenue
• The top 3 movies by revenue are:
1. "The Dark Knight" with a revenue of $18,000.
2. "Avengers" with a revenue of $15,000.
3. "Inception" with a revenue of $10,800.

Tickets Sold by Movie Title
• The movie with the highest number of tickets sold is "The Dark Knight" with 1,200 tickets sold.
• The movies with the lowest number of tickets sold are "The Lord of the Rings", "Pulp Fiction", and
"The Silence of the Lambs" with 500 tickets sold each.

Ticket Price by Genre
• The genre with the highest average ticket price is Action with an average ticket price of $13.75.
• The genre with the lowest average ticket price is Fantasy with an average ticket price of $10.

Highest Revenue Movie
• The movie with the highest revenue is "The Dark Knight" with a revenue of $18,000.

Highest Revenue Genre
• The genre with the highest revenue is Action with a revenue of $42,600.

EXPLANATION

Here's a breakdown of the lines:

Line 1: cmap = plt.get_cmap('Reds')

• This line retrieves a color map (cmap) from Matplotlib's collection of built-in
colormaps.
• The 'Reds' argument specifies the name of the colormap, which is a sequential
colormap ranging from light red to dark red.
• The plt.get_cmap() function returns a Colormap object, which is stored in the
cmap variable.

Line 2: norm = mcolors.Normalize(vmin=df['Revenue'].min(), vmax=df['Revenue'].max())

• This line creates a normalization object (norm) that linearly maps values from a given range
to the range [0, 1].
• The vmin and vmax arguments specify the minimum and maximum values of the
range, respectively.
• In this case, the range is set to the minimum and maximum revenue values in the
df['Revenue'] column.
• The mcolors.Normalize() function returns a Normalize object, which is stored
in the norm variable.

Line 3: sns.barplot(x="Movie Title", y="Revenue", data=df, palette=cmap(norm(df['Revenue'])))

• This line creates a bar plot using Seaborn's barplot() function.
• The x and y arguments specify the columns to use for the x-axis and y-axis,
respectively.
• The data argument specifies the DataFrame to use for the plot.
• The palette argument specifies the color palette to use for the bars.
• In this case, the palette argument is set to cmap(norm(df['Revenue'])), which
applies the colormap (cmap) to the normalized revenue values
(norm(df['Revenue'])).
• This creates a color gradient effect, where the color of each bar is determined by its
corresponding revenue value. Bars with higher revenue values will be colored darker
red, while bars with lower revenue values will be colored lighter red.

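The two-step mapping explained above (normalize, then look up a color) can be tried on a handful of numbers. The sketch below uses the three revenues from the top-3 output together with the $5,000 minimum quoted in the inferences. The variable names are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors

# Revenues from the top-3 output plus the $5,000 minimum quoted in the inferences.
revenue = np.array([18000.0, 15000.0, 10800.0, 5000.0])

cmap = plt.get_cmap("Reds")
norm = mcolors.Normalize(vmin=revenue.min(), vmax=revenue.max())

scaled = norm(revenue)   # values rescaled linearly to the range [0, 1]
colors = cmap(scaled)    # one RGBA row per value

print(scaled)
print(colors.shape)      # (4, 4): four values, four RGBA channels each
```

The largest revenue maps to 1.0 and receives the darkest red, while the smallest maps to 0.0 and receives the lightest. Each row of `colors` can be passed to a plotting call as the color of one bar.
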
TOPIC4:VISUALIZING DISTRIBUTIONS,AMOUNTS AND PROPORTIONS

Q5. An e-commerce company sells products in various categories, including
electronics, fashion, home goods, and more. The company wants to analyze its
sales data to identify trends, patterns, and insights that can inform business
decisions.

Analyze the data distributions, proportions, and X-Y relationships, and visualize the sales
data to answer the following questions:

i. Which product category generates the most revenue?
ii. What is the trend of sales over time?
iii. Which products have the highest sales quantity?
iv. How does the sales price vary across different product categories?
v. What is the relationship between customer age and sales price?
vi. Which cities have the highest sales revenue?

ANSWER

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load the dataset
df = pd.read_csv('sales_data.csv')

# Convert Sales Date to datetime format

df['Sales Date'] = pd.to_datetime(df['Sales Date'])

# Calculate total sales

df['Total Sales'] = df['Quantity'] * df['Sales Price']

# Question 1: Which product category generates the most revenue?

plt.figure(figsize=(8, 6))
sns.barplot(x='Product Category', y='Total Sales',
data=df.groupby('Product Category')['Total Sales'].sum().reset_index())
plt.title('Total Sales by Product Category')
plt.xlabel('Product Category')
plt.ylabel('Total Sales')
plt.show()

# Question 2: What is the trend of sales over time?

plt.figure(figsize=(10, 6))
sns.lineplot(x='Sales Date', y='Total Sales',
data=df.groupby('Sales Date')['Total Sales'].sum().reset_index())
plt.title('Total Sales Over Time')
plt.xlabel('Sales Date')
plt.ylabel('Total Sales')
plt.show()

# Question 3: Which products have the highest sales quantity?

plt.figure(figsize=(8, 6))
sns.barplot(x='Product Name', y='Quantity',
data=df.groupby('Product Name')
['Quantity'].sum().reset_index().nlargest(5, 'Quantity'))
plt.title('Top 5 Products by Sales Quantity')
plt.xlabel('Product Name')
plt.ylabel('Quantity')
plt.show()

# Question 4: How does the sales price vary across different product categories?
plt.figure(figsize=(8, 6))
sns.boxplot(x='Product Category', y='Sales Price', data=df)
plt.title('Sales Price Distribution by Product Category')
plt.xlabel('Product Category')
plt.ylabel('Sales Price')
plt.show()

# Question 5: What is the relationship between customer age and sales price?
plt.figure(figsize=(8, 6))
sns.scatterplot(x='Customer Age', y='Sales Price', data=df)
plt.title('Relationship Between Customer Age and Sales Price')
plt.xlabel('Customer Age')
plt.ylabel('Sales Price')
plt.show()

# Question 6: Which cities have the highest sales revenue?

plt.figure(figsize=(8, 6))
# The column is named 'Total Sales' (with a space), as created above
top_cities = (df.groupby('Customer Location')['Total Sales'].sum()
                .reset_index().nlargest(5, 'Total Sales'))
sns.barplot(x='Customer Location', y='Total Sales', data=top_cities)
plt.title('Top 5 Cities by Sales Revenue')
plt.xlabel('Customer Location')
plt.ylabel('Total Sales')
plt.show()
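The question also asks about proportions, which the bar charts above show only indirectly. A minimal sketch of the share of revenue per category follows. The rows are invented, and only the column names are taken from the program above.

```python
import pandas as pd

# Hypothetical rows using the column names the program assumes for
# sales_data.csv; the values are invented for illustration.
df = pd.DataFrame({
    "Product Category": ["Electronics", "Fashion", "Home Goods", "Electronics"],
    "Quantity": [2, 5, 1, 1],
    "Sales Price": [300.0, 40.0, 100.0, 100.0],
})
df["Total Sales"] = df["Quantity"] * df["Sales Price"]

# Revenue per category and each category's share of the overall total.
revenue = df.groupby("Product Category")["Total Sales"].sum()
share = (revenue / revenue.sum() * 100).round(1)

print(share.sort_values(ascending=False))
print("Top category:", revenue.idxmax())
```

The resulting percentages could be drawn as a pie chart or a stacked bar. For a small number of categories, a sorted table is often enough to answer question i.
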

OUTPUT
PART-C
TOPIC2
Q6. How would you map the election data onto scales to derive inferences? Use various
aesthetics relevant to the following scenario.

Election_data.csv
Location,Party,Votes
New York,Republican,1000
New York,Democrat,1200
California,Republican,800
California,Democrat,1500
Texas,Republican,1200
Texas,Democrat,1000
Florida,Republican,1000
Florida,Democrat,1200

ANSWER:

import pandas as pd
import matplotlib.pyplot as plt

# Read the CSV file

data = pd.read_csv("election_data.csv")

# Assume the CSV file has the following columns:

# - Location (e.g., state, city)
# - Party (e.g., Republican, Democrat)
# - Votes (number of votes for each party in each location)

# Case 1: Color Aesthetics - Vote Share by Party

plt.figure(figsize=(10, 6))
for party in data["Party"].unique():
    party_data = data[data["Party"] == party]
    plt.plot(party_data["Location"], party_data["Votes"], label=party)
plt.xlabel("Location")
plt.ylabel("Votes")
plt.title("Color Aesthetics :Vote Share by Party")
plt.legend()
plt.show()

# Case 2: Marker Aesthetics - Winner by Location

plt.figure(figsize=(10, 6))
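shown = set()  # parties already added to the legend
for location in data["Location"].unique():
    location_data = data[data["Location"] == location]
    winner = location_data.loc[location_data["Votes"].idxmax()]["Party"]
    if winner == "Republican":
        marker = "o"
    elif winner == "Democrat":
        marker = "s"
    else:
        marker = "D"
    # Label each party only once so the legend has no duplicate entries
    label = winner if winner not in shown else None
    shown.add(winner)
    plt.scatter(location, location_data["Votes"].max(), marker=marker, s=100,
                label=label)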
plt.xlabel("Location")
plt.ylabel("Votes")
plt.title("Marker Aesthetics:Winner by Location")
plt.legend()
plt.show()

# Case 3: Size Aesthetics - Vote Margin by Location

plt.figure(figsize=(10, 6))
for location in data["Location"].unique():
    location_data = data[data["Location"] == location]
    vote_margin = location_data["Votes"].max() - location_data["Votes"].min()
    plt.scatter(location, location_data["Votes"].max(), s=vote_margin*10,
                label=location)
plt.xlabel("Location")
plt.ylabel("Votes")
plt.title("Size Aesthetics:Vote Margin by Location")
plt.legend()
plt.show()

# Case 4: Shape Aesthetics - Party Strongholds

plt.figure(figsize=(10, 6))
for party in data["Party"].unique():
    party_data = data[data["Party"] == party]
    if party == "Republican":
        marker = "o"
    elif party == "Democrat":
        marker = "s"
    else:
        marker = "D"
    plt.scatter(party_data["Location"], party_data["Votes"], marker=marker, s=100,
                label=party)
plt.xlabel("Location")
plt.ylabel("Votes")
plt.title("Shape Aesthetics: Party Strongholds")
plt.legend()
plt.show()

OUTPUT
Inferences from the Output

Vote Share by Party

• The Democrat party receives more votes than the Republican party in three of the
four locations, indicating a stronger presence in those areas.
• The Republican party has a significant vote share in Texas, suggesting a strong
support base in that location.

Winner by Location

• The Democrat party has won in New York, California, and Florida, indicating a strong
presence in these locations.
• The Republican party has won in Texas, suggesting a strong support base in that
location.

Vote Margin by Location

• The vote margin is highest in California (700 votes), indicating a significant difference
in votes between the two parties.
• The vote margin is 200 votes in each of New York, Texas, and Florida, suggesting
closer contests in those locations.

Party Strongholds

• The Democrat party has strongholds in New York, California, and Florida, indicating
a strong support base in these locations.
• The Republican party has a stronghold in Texas, suggesting a strong support base in
that location.

Overall Inferences

• The Democrat party appears to have a stronger presence in most locations, with
wins in three of the four locations.
• The Republican party has a strong support base in Texas, but trails behind the
Democrat party in the other locations.
• California shows the most decisive result, with the largest vote margin (700 votes)
in favour of the Democrat party.
• New York, Texas, and Florida are the more closely contested locations, each with a
margin of 200 votes.

LOGIC:
Case 1: Color Aesthetics - Vote Share by Party
This section of the code creates a line plot of each party's vote share across the different locations.

• The for loop iterates over each unique party in the data.
• For each party, the code filters the data to keep only the rows that belong to that party.
• The plt.plot function draws one line of vote share per party. Each line gets its own color (either passed explicitly or taken from Matplotlib's default color cycle), which is what makes color the distinguishing aesthetic.
• The label parameter attaches the party name to each line so that it appears in the legend.
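
The steps of Case 1 can be sketched as follows. This is a minimal sketch, not the original program: the numbers in the DataFrame are made up for illustration, the VoteShare column name and the party_colors mapping are assumptions, and only the Party, Location, and Votes column names come from the code shown earlier.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative numbers only (assumed, not the dataset used in these notes).
data = pd.DataFrame({
    "Location": ["New York", "California", "Texas", "Florida"] * 2,
    "Party": ["Democrat"] * 4 + ["Republican"] * 4,
    "Votes": [60, 70, 48, 55, 40, 30, 52, 45],
})

# Vote share = a party's votes divided by all votes cast in that location.
# "VoteShare" is a hypothetical column name.
data["VoteShare"] = data["Votes"] / data.groupby("Location")["Votes"].transform("sum")

# Assumed color scale: exactly one color per party.
party_colors = {"Democrat": "tab:blue", "Republican": "tab:red"}

fig, ax = plt.subplots()
for party in data["Party"].unique():
    # Keep only the rows that belong to the current party.
    party_data = data[data["Party"] == party]
    ax.plot(party_data["Location"], party_data["VoteShare"],
            color=party_colors[party], marker="o", label=party)

ax.set_xlabel("Location")
ax.set_ylabel("Vote share")
ax.set_title("Color Aesthetics: Vote Share by Party")
ax.legend()
```

Calling plt.show() after this block displays the figure. Passing the color explicitly, rather than relying on the default color cycle, keeps the party-to-color mapping fixed even if the order of the parties in the data changes.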

Case 2: Marker Aesthetics - Winner by Location

This section of the code creates a scatter plot that shows the winning party in each location.

• The for loop iterates over each unique location in the data.
• For each location, the code filters the data to keep only the rows for that location.
• The winner variable stores the party with the maximum votes in the current location.
• The plt.scatter function draws one marker per location to indicate the winning party.
• The marker parameter specifies the marker style for each party.
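
A possible sketch of the Case 2 logic is shown below. The data values are invented for illustration, and the party_markers dictionary and the winners dictionary are hypothetical helpers; the marker styles ("o" for Republican, "s" for Democrat, "D" otherwise) follow the ones used in the Case 4 code above.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative numbers only (assumed, not the dataset used in these notes).
data = pd.DataFrame({
    "Location": ["New York", "California", "Texas", "Florida"] * 2,
    "Party": ["Democrat"] * 4 + ["Republican"] * 4,
    "Votes": [60, 70, 48, 55, 40, 30, 52, 45],
})

# Marker style per party; any other party falls back to "D".
party_markers = {"Republican": "o", "Democrat": "s"}

winners = {}       # location -> winning party (hypothetical helper)
labelled = set()   # parties that already have a legend entry

fig, ax = plt.subplots()
for location in data["Location"].unique():
    # Keep only the rows for the current location.
    location_data = data[data["Location"] == location]
    # The row with the most votes identifies the winner.
    top_row = location_data.loc[location_data["Votes"].idxmax()]
    winner = top_row["Party"]
    winners[location] = winner
    # Label each party only once so the legend has no duplicate entries.
    label = winner if winner not in labelled else "_nolegend_"
    labelled.add(winner)
    ax.scatter([location], [top_row["Votes"]],
               marker=party_markers.get(winner, "D"), s=100, label=label)

ax.set_xlabel("Location")
ax.set_ylabel("Votes of the winning party")
ax.set_title("Marker Aesthetics: Winner by Location")
ax.legend()
```

With these made-up numbers the winners dictionary maps Texas to Republican and the other three locations to Democrat, which matches the inferences listed earlier.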

Case 3: Size Aesthetics - Vote Margin by Location

This section of the code creates a scatter plot that shows the vote margin between the parties in each location.

• The for loop iterates over each unique location in the data.
• For each location, the code filters the data to keep only the rows for that location.
• The vote_margin variable stores the difference between the maximum and minimum votes in the current location.
• The plt.scatter function draws one marker per location to indicate the vote margin.
• The s parameter sets the size of each marker (in Matplotlib, s is the marker area in points squared), which is made proportional to the vote margin.
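
The Case 3 steps might look like the sketch below. The data values and the scaling factor of 20 are assumptions chosen only to make the size differences visible, and the margins dictionary is a hypothetical helper that records the computed values.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative numbers only (assumed, not the dataset used in these notes).
data = pd.DataFrame({
    "Location": ["New York", "California", "Texas", "Florida"] * 2,
    "Party": ["Democrat"] * 4 + ["Republican"] * 4,
    "Votes": [60, 70, 48, 55, 40, 30, 52, 45],
})

SIZE_FACTOR = 20  # assumed scaling from vote margin to marker area

margins = {}  # location -> vote margin (hypothetical helper)

fig, ax = plt.subplots()
for location in data["Location"].unique():
    # Keep only the rows for the current location.
    location_data = data[data["Location"] == location]
    # Margin = highest vote count minus lowest vote count in this location.
    vote_margin = int(location_data["Votes"].max() - location_data["Votes"].min())
    margins[location] = vote_margin
    # s is the marker area in points squared, so area grows with the margin.
    ax.scatter([location], [vote_margin], s=vote_margin * SIZE_FACTOR)

ax.set_xlabel("Location")
ax.set_ylabel("Vote margin")
ax.set_title("Size Aesthetics: Vote Margin by Location")
```

Because s controls area rather than diameter, a margin twice as large produces a marker with twice the area, not twice the width.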

Case 4: Shape Aesthetics - Party Strongholds

This section of the code creates a scatter plot that shows the strongholds of each party.

• The for loop iterates over each unique party in the data.
• For each party, the code filters the data to keep only the rows that belong to that party.
• The plt.scatter function draws the markers that indicate the strongholds of each party.
• The marker parameter specifies the marker style for each party.
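
As an alternative to the if/elif chain in the Case 4 code, the party-to-marker mapping can be written as a dictionary, which makes the scale (one marker per party) explicit. This is only a sketch: the names MARKER_SCALE and DEFAULT_MARKER are hypothetical, and the data values are made up for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative numbers only (assumed, not the dataset used in these notes).
data = pd.DataFrame({
    "Location": ["New York", "California", "Texas", "Florida"] * 2,
    "Party": ["Democrat"] * 4 + ["Republican"] * 4,
    "Votes": [60, 70, 48, 55, 40, 30, 52, 45],
})

# Same marker choices as the if/elif chain: "o", "s", and "D" as the fallback.
MARKER_SCALE = {"Republican": "o", "Democrat": "s"}
DEFAULT_MARKER = "D"

fig, ax = plt.subplots()
for party in data["Party"].unique():
    # Keep only the rows that belong to the current party.
    party_data = data[data["Party"] == party]
    marker = MARKER_SCALE.get(party, DEFAULT_MARKER)
    ax.scatter(party_data["Location"], party_data["Votes"],
               marker=marker, s=100, label=party)

ax.set_xlabel("Location")
ax.set_ylabel("Votes")
ax.set_title("Shape Aesthetics: Party Strongholds")
ax.legend()
```

Adding a new party then only requires a new dictionary entry, and any party without an entry still receives the fallback marker.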
