UNIT1
(PART-A)
Position, size, line width, and color can represent both continuous and discrete data.
Shape and line type can represent only discrete data.
Answer:
A-5, B-4, C-6, D-1, E-3, F-2
A scale defines a unique mapping between data values and aesthetic values. A scale must
be one-to-one, such that for each specific data value there is exactly one
aesthetic value, and vice versa.
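A minimal sketch of a discrete scale as a Python dictionary, assuming three hypothetical location categories and colors. Because each category maps to exactly one color and no two categories share a color, the mapping can be inverted, which is the one-to-one property described above.

```python
# Hypothetical discrete color scale: each data value maps to exactly one aesthetic value
color_scale = {"Urban": "red", "Suburban": "orange", "Rural": "green"}


def is_one_to_one(scale):
    """Return True if no two data values share the same aesthetic value."""
    return len(set(scale.values())) == len(scale)


def invert(scale):
    """Build the reverse mapping (aesthetic value -> data value)."""
    if not is_one_to_one(scale):
        raise ValueError("scale is not one-to-one, so it cannot be inverted")
    return {aesthetic: value for value, aesthetic in scale.items()}


print(is_one_to_one(color_scale))    # True
print(invert(color_scale)["green"])  # Rural

# A broken scale: two categories share "red", so a reader could not tell them apart
bad_scale = {"Urban": "red", "Suburban": "red", "Rural": "green"}
print(is_one_to_one(bad_scale))      # False
```

If the check fails, the legend cannot be read backwards from color to category, which is exactly why a scale must be one-to-one.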
4. Suppose you want to visualize the wind patterns around a circular
structure, such as a wind turbine or a cylindrical building. You have data
on wind speed and direction at various points around the structure. Which
type of axis would you choose? Why?
Ans:
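I would choose a polar (circular) coordinate system, with wind direction mapped to the angle
and wind speed mapped to the radius. This visualization would allow us to identify areas with
high wind speeds, dominant wind directions, and overall wind patterns around the structure.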
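A minimal sketch of such a polar plot with matplotlib, assuming eight hypothetical measurement points (the directions and speeds below are made up for illustration). Direction becomes the angle, speed becomes the radius, and speed is additionally encoded by color and point size.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical measurements taken at eight points around the structure
directions_deg = np.array([0, 45, 90, 135, 180, 225, 270, 315])  # wind direction (degrees)
speeds = np.array([5.2, 7.8, 12.1, 9.4, 4.3, 3.1, 6.5, 8.9])     # wind speed (m/s)

# Polar axes expect angles in radians
theta = np.deg2rad(directions_deg)

fig, ax = plt.subplots(subplot_kw={"projection": "polar"}, figsize=(6, 6))
ax.set_theta_zero_location("N")  # 0 degrees (north) at the top
ax.set_theta_direction(-1)       # angles increase clockwise, like a compass
points = ax.scatter(theta, speeds, c=speeds, s=speeds * 20, cmap="viridis")
fig.colorbar(points, ax=ax, label="Wind speed (m/s)")
ax.set_title("Wind speed by direction around the structure")
plt.show()
```

Setting north at the top and a clockwise direction makes the angular axis read like a compass, which is how wind direction is usually reported.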
5. Interpret and write any two insights with colours from “titanic visual”
given below
Inference:
There were more male passengers than female passengers.
The majority of male passengers traveled in 3rd class.
6. Give any two visualization scenarios for geospatial data.
Answer:
Visualizing weather data
Visualizing satellite path
Visualizing country-wise population
Visualizing country-wise literacy rate
Visualizing country-wise COVID-19 cases
UNIT-1
(PART-B)
Covid19_data.csv
Date,Country,Region,Cases,Deaths,Vaccinations,Age group,Sex,Transmission type,Day
2020-01-01,China,Hubei,41,6,0,0-19,Male,Local,Monday
2020-01-02,China,Hubei,59,7,0,20-39,Female,Local,Tuesday
2020-01-03,China,Hubei,77,8,0,40-59,Male,Local,Wednesday
2020-01-04,China,Hubei,101,9,0,60+,Female,Local,Thursday
2020-01-05,China,Hubei,128,10,0,0-19,Male,Imported,Friday
2020-01-06,China,Hubei,155,11,0,20-39,Female,Imported,Saturday
2020-01-07,China,Hubei,182,12,0,40-59,Male,Imported,Sunday
2020-01-08,China,Hubei,212,13,0,60+,Female,Local,Monday
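A hedged sketch of how these sample rows could be loaded with pandas and tagged by data type: Date as temporal, Cases/Deaths/Vaccinations as numerical, Day as ordinal, and Country, Region, Sex and Transmission type as nominal categories. Age group is also stored as an ordered category here because its bins have a natural order; that choice is an assumption, not part of the original answer.

```python
import io

import pandas as pd

# The eight sample rows shown above
csv_text = """Date,Country,Region,Cases,Deaths,Vaccinations,Age group,Sex,Transmission type,Day
2020-01-01,China,Hubei,41,6,0,0-19,Male,Local,Monday
2020-01-02,China,Hubei,59,7,0,20-39,Female,Local,Tuesday
2020-01-03,China,Hubei,77,8,0,40-59,Male,Local,Wednesday
2020-01-04,China,Hubei,101,9,0,60+,Female,Local,Thursday
2020-01-05,China,Hubei,128,10,0,0-19,Male,Imported,Friday
2020-01-06,China,Hubei,155,11,0,20-39,Female,Imported,Saturday
2020-01-07,China,Hubei,182,12,0,40-59,Male,Imported,Sunday
2020-01-08,China,Hubei,212,13,0,60+,Female,Local,Monday
"""

# Temporal: parse Date as datetime; Cases/Deaths/Vaccinations load as integers
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

# Ordinal: categories with a meaningful order
df["Age group"] = pd.Categorical(
    df["Age group"], categories=["0-19", "20-39", "40-59", "60+"], ordered=True)
df["Day"] = pd.Categorical(
    df["Day"],
    categories=["Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday"],
    ordered=True)

# Nominal: unordered categories
for col in ["Country", "Region", "Sex", "Transmission type"]:
    df[col] = df[col].astype("category")

print(df.dtypes)
```

Storing Day as an ordered category also makes plotting libraries keep Monday to Sunday order instead of sorting alphabetically.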
ANSWER
Here are the data types and visualization aesthetics for the modified COVID-19 dataset:
OUTPUT
INFERENCES
Here are the inferences from each visualization:
1. Temporal Data (Time Series) Visualization: COVID-19 cases and deaths increased over time,
while vaccinations showed a steady rise. Cases and deaths peaked around the same time.
2. Categorical Data Visualization: The 20-39 age group had the highest number of COVID-19
cases, with a slight majority of males. The 60+ age group had the lowest number of cases.
3. Numerical Data Visualization: COVID-19 cases followed a right-skewed distribution, with most
values falling in the lower range (0-100 cases) and a few outliers with extremely high case numbers.
4. Ordinal Data Visualization: COVID-19 cases were highest on Thursdays and lowest on
Sundays. Case numbers varied significantly across days of the week.
5. Correlation Data Visualization: COVID-19 cases and deaths showed a strong positive
correlation (0.85). Cases and vaccinations had a moderate positive correlation (0.55).
6. Scatter Plot with Regression Line: There was a strong positive linear relationship between
COVID-19 cases and deaths. As cases increased, deaths also tended to increase.
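The correlation values quoted above presumably come from the full modified dataset, which is not reproduced here. As a hedged sanity check, the sketch below computes Pearson correlations on the eight sample rows only. In this excerpt Cases and Deaths are almost perfectly correlated, while Vaccinations is constant (all 0), so its correlation with any column is undefined (NaN); the 0.85 and 0.55 figures therefore cannot be reproduced from the sample alone.

```python
import pandas as pd

# Numeric columns from the eight sample rows shown above
sample = pd.DataFrame({
    "Cases": [41, 59, 77, 101, 128, 155, 182, 212],
    "Deaths": [6, 7, 8, 9, 10, 11, 12, 13],
    "Vaccinations": [0, 0, 0, 0, 0, 0, 0, 0],
})

# Pearson correlation matrix; a constant column yields NaN correlations
corr = sample.corr()
print(corr.round(3))
```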
EXPLANATION
Here's a brief explanation of each line:
1. Creates a line plot showing the number of vaccinations over time, with a
triangle marker (^) at each data point.
2. sns.countplot(x='Age group', hue='Sex', data=df)
Creates a bar chart showing the number of records (rows) in each age group, with
different colors for males and females. Note that countplot counts rows; it does
not sum the Cases column.
3. Creates a scatter plot showing the relationship between COVID-19 cases and
deaths, with a regression line.
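The code lines for items 1 and 3 are not shown above. A hedged sketch of all three plots on the eight sample rows, assuming plt.plot for the line plot and sns.regplot for the regression line (both are assumptions about the missing code, not a reconstruction of it):

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Columns from the eight sample rows shown above
df = pd.DataFrame({
    "Date": pd.date_range("2020-01-01", periods=8, freq="D"),
    "Cases": [41, 59, 77, 101, 128, 155, 182, 212],
    "Deaths": [6, 7, 8, 9, 10, 11, 12, 13],
    "Vaccinations": [0] * 8,
    "Age group": ["0-19", "20-39", "40-59", "60+"] * 2,
    "Sex": ["Male", "Female"] * 4,
})

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# 1. Line plot of vaccinations over time with a triangle marker at each point
axes[0].plot(df["Date"], df["Vaccinations"], marker="^")
axes[0].tick_params(axis="x", rotation=45)
axes[0].set_title("Vaccinations over time")

# 2. Number of rows (records) per age group, colored by sex
sns.countplot(x="Age group", hue="Sex", data=df, ax=axes[1])
axes[1].set_title("Records by age group and sex")

# 3. Scatter plot of cases vs deaths with a fitted linear regression line
sns.regplot(x="Cases", y="Deaths", data=df, ax=axes[2])
axes[2].set_title("Cases vs deaths")

plt.tight_layout()
plt.show()
```

To plot total cases per age group instead of row counts, sns.barplot(x='Age group', y='Cases', hue='Sex', data=df, estimator=sum) would aggregate the Cases column.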
=============================================================
TOPIC2:SCALES MAP DATA VALUES TO AESTHETICS
Q2. Generate a weather report by applying various aesthetics to map data values,
meeting the following requirements:
import pandas as pd
import matplotlib.pyplot as plt
CODE EXPLANATION
This line is similar to Line 3, but with a fixed marker size and a specified marker style:
- s=100: specifies a fixed marker size.
- marker=marker: specifies a marker style (e.g., circle, square, triangle) assigned
based on the location.
In summary:
=========================================================
TOPIC2:SCALES MAP DATA VALUES TO AESTHETICS
Q3. Explore the air quality data, derive insights, and build scales by applying various
aesthetics to map data values. Apply different marker styles, point sizes, colors,
and shapes, and write your inferences.
PROGRAM
import pandas as pd
import matplotlib.pyplot as plt

# Load the air quality data. The file name is an assumption; the expected
# columns are Date, Location, Pollutant and Concentration.
data = pd.read_csv("air_quality_data.csv", parse_dates=["Date"])

# One marker style per location; any other location gets a diamond ("D")
markers = {"Urban": "o", "Suburban": "s", "Rural": "^"}

# Marker aesthetics: one line per location, each with its own marker style
plt.figure(figsize=(10, 6))
for location in data["Location"].unique():
    location_data = data[data["Location"] == location]
    marker = markers.get(location, "D")
    plt.plot(location_data["Date"], location_data["Concentration"], marker=marker,
             linestyle="-", label=location)
plt.xlabel("Date")
plt.ylabel("Concentration (μg/m³)")
plt.title("Marker Aesthetics")
plt.legend()
plt.show()

# Size aesthetics: point area grows with concentration, one color per pollutant
plt.figure(figsize=(10, 6))
for pollutant in data["Pollutant"].unique():
    pollutant_data = data[data["Pollutant"] == pollutant]
    plt.scatter(pollutant_data["Date"], pollutant_data["Concentration"],
                s=pollutant_data["Concentration"] * 10, label=pollutant)
plt.xlabel("Date")
plt.ylabel("Concentration (μg/m³)")
plt.title("Size Aesthetics")
plt.legend()
plt.show()

# Shape aesthetics: fixed point size (s=100), marker shape chosen by location
plt.figure(figsize=(10, 6))
for location in data["Location"].unique():
    location_data = data[data["Location"] == location]
    marker = markers.get(location, "D")
    plt.scatter(location_data["Date"], location_data["Concentration"], marker=marker,
                s=100, label=location)
plt.xlabel("Date")
plt.ylabel("Concentration (μg/m³)")
plt.title("Shape Aesthetics")
plt.legend()
plt.show()
OUTPUT
Inferences:
1. Pollutant Concentration Trends: The line plots in Case 1 show that PM2.5 concentrations are
relatively low, while NO2 and O3 concentrations are higher.
2. Location-Specific Concentrations: The marker plots in Case 2 reveal that Urban areas tend to
have higher pollutant concentrations, followed by Suburban and Rural areas.
3. Concentration Variability: The scatter plots in Case 3 demonstrate that pollutant concentrations
vary significantly across different dates, with some dates showing very high concentrations.
4. Location-Pollutant Interactions: The shape plots in Case 4 suggest that different locations have
distinct pollutant concentration profiles, with Urban areas showing a mix of high and low
concentrations.
5. Date-Specific Concentrations: The plots in all cases show that pollutant concentrations vary
significantly across different dates, indicating that date-specific factors (e.g., weather, human
activity) play a crucial role in shaping air quality.
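Inferences 1 and 2 compare average concentrations across pollutants and locations. A hedged sketch of how such averages could be computed with pandas, using a few hypothetical rows in place of the real air quality file (whose values are not reproduced here), so the numbers below illustrate the method only:

```python
import pandas as pd

# Hypothetical rows with the same columns as the air quality data
data = pd.DataFrame({
    "Date": pd.to_datetime(["2023-01-01", "2023-01-01", "2023-01-02",
                            "2023-01-02", "2023-01-03", "2023-01-03"]),
    "Location": ["Urban", "Rural", "Urban", "Suburban", "Suburban", "Rural"],
    "Pollutant": ["PM2.5", "NO2", "O3", "PM2.5", "NO2", "O3"],
    "Concentration": [35.0, 20.0, 60.0, 18.0, 40.0, 25.0],
})

# Mean concentration for every location/pollutant pair (NaN where no reading exists)
summary = data.pivot_table(index="Location", columns="Pollutant",
                           values="Concentration", aggfunc="mean")
print(summary)

# Mean concentration per location, highest first (the ordering checked in inference 2)
means = data.groupby("Location")["Concentration"].mean().sort_values(ascending=False)
print(means)
```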
============================================
TOPIC3:COLOR TO DISTINGUISH, COLOR TO HIGHLIGHT
Q4. You are provided with the movie ticket sales data. How would you apply colors
to highlight, distinguish, and infer the following results?
Movie_Ticket_Sales Dataset:
ANSWER
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.colors as mcolors
OUTPUT
Top 3 Movies by Revenue:
   Movie Title      Genre   Release Date  Ticket Price  Tickets Sold  Revenue
3  The Dark Knight  Action  2022-04-01    15.0          1200          18000.0
0  Avengers         Action  2022-01-01    15.0          1000          15000.0
4  Inception        Sci-Fi  2022-05-01    12.0           900          10800.0
INFERENCE
Total Revenue
The total revenue from all movie ticket sales is $93,400.
Revenue by Genre
The genre with the highest revenue is Action with a revenue of $42,600.
The genre with the lowest revenue is Fantasy with a revenue of $5,000.
EXPLANATION
This line retrieves a color map (cmap) from Matplotlib's collection of built-in
colormaps.
The 'Reds' argument specifies the name of the colormap, which is a sequential
colormap ranging from light red to dark red.
The plt.get_cmap() function returns a Colormap object, which is stored in the
cmap variable.
This line creates a normalization object (norm) that maps values from a given range
to the range [0, 1].
The vmin and vmax arguments specify the minimum and maximum values of the
range, respectively.
In this case, the range is set to the minimum and maximum revenue values in the
df['Revenue'] column.
Calling mcolors.Normalize() creates a Normalize object, which is stored
in the norm variable.
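The two lines described above can be combined into a small sketch that colors bars by revenue, using the three movies from the output table. The bar chart and the colorbar are assumptions about how the colors are used; only plt.get_cmap('Reds') and mcolors.Normalize(vmin=..., vmax=...) come from the explanation.

```python
import matplotlib.colors as mcolors
import matplotlib.pyplot as plt
import pandas as pd

# Top-3 rows from the output table above
df = pd.DataFrame({
    "Movie Title": ["The Dark Knight", "Avengers", "Inception"],
    "Revenue": [18000.0, 15000.0, 10800.0],
})

cmap = plt.get_cmap("Reds")  # sequential colormap: light to dark red
norm = mcolors.Normalize(vmin=df["Revenue"].min(), vmax=df["Revenue"].max())

# norm maps each revenue into [0, 1]; cmap turns that number into an RGBA color
colors = [cmap(norm(value)) for value in df["Revenue"]]

fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(df["Movie Title"], df["Revenue"], color=colors)
fig.colorbar(plt.cm.ScalarMappable(norm=norm, cmap=cmap), ax=ax, label="Revenue ($)")
ax.set_ylabel("Revenue ($)")
ax.set_title("Revenue of the top 3 movies (darker = higher)")
plt.show()
```

With vmin = 10800 and vmax = 18000, Avengers maps to (15000 - 10800) / 7200 ≈ 0.58, so it gets a mid-range red between the other two bars.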
Analyze the data distributions, proportions and X-Y relationships to visualize the sales
data to answer the following questions:
ANSWER
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load the dataset
df = pd.read_csv('sales_data.csv')
# Question 4: How does the sales price vary across different product categories?
plt.figure(figsize=(8, 6))
sns.boxplot(x='Product Category', y='Sales Price', data=df)
plt.title('Sales Price Distribution by Product Category')
plt.xlabel('Product Category')
plt.ylabel('Sales Price')
plt.show()
# Question 5: What is the relationship between customer age and sales price?
plt.figure(figsize=(8, 6))
sns.scatterplot(x='Customer Age', y='Sales Price', data=df)
plt.title('Relationship Between Customer Age and Sales Price')
plt.xlabel('Customer Age')
plt.ylabel('Sales Price')
plt.show()
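The question also asks about proportions, which the two plots above do not cover. One hedged way to show the share of sales records per product category is a pie chart built from value_counts(normalize=True); the rows below are hypothetical, since sales_data.csv is not reproduced here.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rows; the real data would come from pd.read_csv('sales_data.csv')
df = pd.DataFrame({
    "Product Category": ["Electronics", "Clothing", "Electronics",
                         "Home", "Clothing", "Electronics"],
    "Sales Price": [499.0, 39.0, 899.0, 120.0, 59.0, 249.0],
})

# Proportion of sales records in each category (the values sum to 1)
shares = df["Product Category"].value_counts(normalize=True)
print(shares)

plt.figure(figsize=(6, 6))
plt.pie(shares, labels=shares.index, autopct="%1.1f%%")
plt.title("Share of Sales Records by Product Category")
plt.show()
```

To show the share of revenue rather than of records, df.groupby('Product Category')['Sales Price'].sum() could be plotted instead.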
OUTPUT
PART-C
TOPIC2
Q6. How do you use scales to map the election data to aesthetics and derive inferences? Use various
aesthetics relevant to the following scenario:
Election_data.csv
Location,Party,Votes
New York,Republican,1000
New York,Democrat,1200
California,Republican,800
California,Democrat,1500
Texas,Republican,1200
Texas,Democrat,1000
Florida,Republican,1000
Florida,Democrat,1200
ANSWER:
import pandas as pd
import matplotlib.pyplot as plt
OUTPUT
Inferences from the Output
The Democrat party has more votes than the Republican party in three of the four locations
(New York, California, and Florida), indicating a stronger presence in those areas.
The Republican party receives its highest vote count (1,200) in Texas, suggesting a strong
support base in that location.
Winner by Location
The Democrat party has won in New York, California, and Florida, indicating a strong
presence in these locations.
The Republican party has won in Texas, suggesting a strong support base in that
location.
Party Strongholds
The Democrat party has strongholds in New York, California, and Florida, indicating
a strong support base in these locations.
The Republican party has a stronghold in Texas, suggesting a strong support base in
that location.
Overall Inferences
The Democrat party appears to have a stronger presence in most locations, with
significant vote shares and wins in multiple areas.
The Republican party has a strong support base in Texas but trails the
Democrat party in the other three locations.
California appears to be a key location, with the largest vote margin (700 votes) and the
highest total turnout (2,300 votes).
Texas is the only location the Republican party wins, but its 200-vote margin is the same as
in New York and Florida, so it is no more closely contested than those two states.
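The winners, margins, and totals behind these inferences can be checked directly from Election_data.csv. A minimal pandas sketch (the column names Winner, Margin, and Total are illustrative additions):

```python
import pandas as pd

# Election_data.csv as given above
rows = [
    ("New York", "Republican", 1000), ("New York", "Democrat", 1200),
    ("California", "Republican", 800), ("California", "Democrat", 1500),
    ("Texas", "Republican", 1200), ("Texas", "Democrat", 1000),
    ("Florida", "Republican", 1000), ("Florida", "Democrat", 1200),
]
df = pd.DataFrame(rows, columns=["Location", "Party", "Votes"])

# One row per location, one column per party
votes = df.pivot(index="Location", columns="Party", values="Votes")
votes["Winner"] = votes[["Democrat", "Republican"]].idxmax(axis=1)
votes["Margin"] = (votes["Democrat"] - votes["Republican"]).abs()
votes["Total"] = votes["Democrat"] + votes["Republican"]
print(votes)

# Total votes per party across all four locations
party_totals = df.groupby("Party")["Votes"].sum()
print(party_totals)
```

The Margin column makes it clear that Texas, New York, and Florida are all decided by 200 votes, while California stands out at 700.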
LOGIC:
Case 1: Color Aesthetics - Vote Share by Party
This section of the code creates a line plot to display the vote share of each party
across different locations.
The for loop iterates over each unique party in the data.
For each party, the code filters the data to include only rows where the party matches
the current party.
The plt.plot function is used to create a line plot of the vote share for each party.
The label parameter is used to label each line with the corresponding party name.
Case 2: Size Aesthetics - Vote Margin by Location
The for loop iterates over each unique location in the data.
For each location, the code filters the data to include only rows where the location
matches the current location.
The vote_margin variable is used to store the difference between the maximum and
minimum votes in the current location.
The plt.scatter function is used to create a scatter plot with markers indicating the vote
margin in each location.
The s parameter is used to specify the size of each marker, which is proportional to the
vote margin.
Case 3: Shape Aesthetics - Party Strongholds
The for loop iterates over each unique party in the data.
For each party, the code filters the data to include only rows where the party matches
the current party.
The plt.scatter function is used to create a scatter plot with markers indicating the
strongholds of each party.
The marker parameter is used to specify the marker style for each party.
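The program for Q6 is not shown beyond its imports. A hedged sketch that follows the logic described above for all three cases; the marker placement in Case 2 (at the winning vote count) and the per-party marker styles in Case 3 are assumptions.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# Election_data.csv as given above
csv_text = """Location,Party,Votes
New York,Republican,1000
New York,Democrat,1200
California,Republican,800
California,Democrat,1500
Texas,Republican,1200
Texas,Democrat,1000
Florida,Republican,1000
Florida,Democrat,1200
"""
data = pd.read_csv(io.StringIO(csv_text))

# Case 1: Color aesthetics - one colored line per party across locations
plt.figure(figsize=(8, 5))
for party in data["Party"].unique():
    party_data = data[data["Party"] == party]
    plt.plot(party_data["Location"], party_data["Votes"], marker="o", label=party)
plt.xlabel("Location")
plt.ylabel("Votes")
plt.title("Vote Share by Party")
plt.legend()
plt.show()

# Case 2: Size aesthetics - marker area proportional to the vote margin
margins = {}
plt.figure(figsize=(8, 5))
for location in data["Location"].unique():
    location_data = data[data["Location"] == location]
    vote_margin = location_data["Votes"].max() - location_data["Votes"].min()
    margins[location] = int(vote_margin)
    # Assumption: each marker is placed at the winning vote count
    plt.scatter([location], [location_data["Votes"].max()], s=vote_margin, label=location)
plt.xlabel("Location")
plt.ylabel("Winning votes")
plt.title("Vote Margin by Location (marker size = margin)")
plt.legend()
plt.show()

# Case 3: Shape aesthetics - one marker style per party (styles are assumptions)
party_markers = {"Democrat": "o", "Republican": "s"}
plt.figure(figsize=(8, 5))
for party in data["Party"].unique():
    party_data = data[data["Party"] == party]
    plt.scatter(party_data["Location"], party_data["Votes"],
                marker=party_markers.get(party, "D"), s=100, label=party)
plt.xlabel("Location")
plt.ylabel("Votes")
plt.title("Party Strongholds")
plt.legend()
plt.show()

print(margins)  # {'New York': 200, 'California': 700, 'Texas': 200, 'Florida': 200}
```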