Data Visualization Part 2

Uploaded by

mrunalikulkarni0331

Study Notes

Data Visualization:
• Bar Plots
• Count Plots
• Histograms
• Cat Plots (Box, Violin, Swarm, Boxen)
• Multiple Plots using FacetGrid
• Joint Plots
• KDE Plots
• Pairplots
• Heatmaps
• Scatter Plots
Study Notes- Data Visualization

1. Bar Plots
Bar plots are an effective way to visualize various data types, including counts,
frequencies, percentages, or averages. They are particularly valuable for comparing data
across different categories.

Use Cases:

1. Categorical Comparison: In a bar plot, each bar represents a specific category, and the
height of the bar reflects the aggregated value associated with that category (such as
count, sum, or mean).

For instance, you can use a bar plot to show the average age of Titanic passengers by
passenger type (man, woman, or child), which is what the "who" column records.

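# Simple barplot: mean age for each value of "who"
import seaborn as sns
import matplotlib.pyplot as plt

# Load seaborn's sample dataset (assumed source of `titanic`)
titanic = sns.load_dataset('titanic')

sns.barplot(data=titanic, x="who", y="age", estimator='mean',
            errorbar=None, palette='viridis')
plt.title('Simple Barplot')
plt.xlabel('Person')
plt.ylabel('Average Age')
plt.show()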
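
The bar heights are a per-category aggregate, so they can be checked against a pandas group-by. A minimal sketch, assuming a small hand-made table (the `passengers` frame and its values are hypothetical, not the Titanic data):

```python
import pandas as pd

# Hypothetical mini-dataset standing in for the Titanic passengers
passengers = pd.DataFrame({
    "who": ["man", "woman", "child", "man", "woman", "child"],
    "age": [30.0, 28.0, 4.0, 40.0, 32.0, 8.0],
})

# Mean age per category: the value a bar plot with estimator='mean' would draw
mean_age = passengers.groupby("who")["age"].mean()
print(mean_age)
```

Swapping `.mean()` for `.sum()` or `.count()` gives the heights for the other aggregations mentioned above.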


2. Proportional Representation with Stacked Bar Charts:

Bar plots can also be used to visualize proportions or percentages. By adjusting the
height of each bar to reflect the proportion of observations within a category, stacked
bar charts allow for a comparison of the relative distribution across different categories.


For example, a stacked bar chart could show how many of the passengers who embarked at
each town were male, as a part of that town's total.

# Prepare data for next plot: total and male passenger counts per town
data = titanic.groupby('embark_town').agg({'who': 'count', 'sex': lambda x: (x == 'male').sum()}).reset_index()
data.rename(columns={'who': 'total', 'sex': 'male'}, inplace=True)
data.sort_values('total', inplace=True)

# Barplot Showing Part of Total
# The full-length bar is the total; the darker "male" bar is drawn on top of it,
# so the part left visible corresponds to the female passengers.
sns.set_color_codes("pastel")
sns.barplot(x="total", y="embark_town", data=data,
            label="Female", color="b")
sns.set_color_codes("muted")
sns.barplot(x="male", y="embark_town", data=data,
            label="Male", color="b")
plt.title('Barplot Showing Part of Total')
plt.xlabel('Number of Persons')
plt.legend(loc='upper right')
plt.show()
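
To read the overlaid bars as proportions, the counts can be turned into shares of each town's total. A minimal sketch, assuming hypothetical records (the `records` frame, its values, and the `male_share` column are illustrative, not taken from the Titanic data):

```python
import pandas as pd

# Hypothetical passenger records (not the real Titanic data)
records = pd.DataFrame({
    "embark_town": ["Southampton"] * 4 + ["Cherbourg"] * 2 + ["Queenstown"] * 2,
    "sex": ["male", "male", "male", "female", "male", "female", "female", "female"],
})

# Count all passengers and male passengers per town
summary = records.groupby("embark_town").agg(
    total=("sex", "size"),
    male=("sex", lambda s: (s == "male").sum()),
)

# Fraction of the full bar covered by the overlaid "male" bar
summary["male_share"] = summary["male"] / summary["total"]
print(summary)
```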


3. Comparing Subcategories within Categories using Clustered Bar Plots:

Clustered bar plots group multiple bars within each category to represent different
subcategories, making it easier to compare and analyze data across them.

For instance, you could use a clustered bar plot to compare the average age of males and
females within each class.


# Clustered barplot: one bar per sex within each passenger class
sns.barplot(data=titanic, x='class', y='age', hue='sex',
            estimator='mean', errorbar=None, palette='viridis')
plt.title('Clustered Barplot')
plt.xlabel('Class')
plt.ylabel('Average Age')
plt.show()


2. Count Plots
A count plot visualizes the frequency of occurrences for each category within a
categorical variable. The x-axis shows the categories, while the y-axis indicates the count
or frequency of each category.

Use Cases:

• Frequency Distribution of Categorical Variables: Each bar in the plot represents a
category, and its height reflects the number of observations in that category, helping
identify the most and least common categories.

For example, a count plot can show how many Titanic passengers survived and how many did not.

# Simple Countplot: number of passengers per survival status
sns.countplot(data=titanic, x='alive', palette='viridis')
plt.title('Simple Countplot')
plt.show()

• Analyzing the relationship between different categorical variables
For example, examining the survival status of Titanic passengers by passenger type (man,
woman, or child).

# Clustered Countplot: survival status within each passenger type
sns.countplot(data=titanic, y="who",
              hue="alive", palette='viridis')
plt.title('Clustered Countplot')
plt.show()
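
The numbers behind both count plots can be reproduced with pandas. A minimal sketch, assuming a hypothetical `observations` table (its rows are made up for illustration):

```python
import pandas as pd

# Hypothetical observations (not the real Titanic data)
observations = pd.DataFrame({
    "who":   ["man", "man", "man", "woman", "woman", "child"],
    "alive": ["no",  "no",  "yes", "yes",   "yes",   "yes"],
})

# One categorical variable: the bar heights of a simple count plot
counts = observations["alive"].value_counts()

# Two categorical variables: the bar heights of a clustered count plot (hue="alive")
table = pd.crosstab(observations["who"], observations["alive"])

print(counts)
print(table)
```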


3. Histograms
Histograms are visual representations that display the distribution of a dataset, helping
to uncover key characteristics such as normality, skewness, or multiple peaks. They
show the frequency or count of data points within specific intervals or "bins." The x-axis
represents the range of values in the dataset, divided into equal bins, while the y-axis
shows the frequency or count of observations within each bin. The height of each bar
corresponds to the number of data points in that bin.
Use Cases:
1. Visualize the distribution, central tendency, range, and spread of a continuous or
numeric variable, and identify any patterns or outliers.

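# Histogram with KDE
# Load seaborn's sample dataset (assumed source of `iris`)
iris = sns.load_dataset('iris')

sns.histplot(data=iris, x='sepal_width', kde=True)
plt.title('Histogram with KDE')
plt.show()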
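
The binning described above can be reproduced numerically. A minimal sketch using NumPy, assuming a hypothetical list of values and a hand-picked number of bins:

```python
import numpy as np

# Hypothetical measurements
values = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5, 9])

# Split the range [min, max] into 4 equal-width bins and count the points in each
counts, edges = np.histogram(values, bins=4)

print(edges)   # bin boundaries: 1, 3, 5, 7, 9
print(counts)  # points per bin: 3, 5, 1, 1
```

Each bin includes its left edge and excludes its right edge, except the last bin, which includes both. The isolated value 9 ends up alone in the final bin, which is how an outlier shows up in a histogram.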


2. Compare the distribution of multiple continuous variables

For example, comparing the distributions of sepal length and sepal width in iris flowers.

# Histogram with multiple features: one histogram per selected column
sns.histplot(data=iris[['sepal_length', 'sepal_width']])
plt.title('Multi-Column Histogram')
plt.show()


3. Compare the distribution of a continuous variable across different categories

For example, comparing the distribution of sepal length among the iris species.

# Stacked Histogram: species counts are stacked within each bin
sns.histplot(iris, x='sepal_length', hue='species', multiple='stack',
             linewidth=0.5)
plt.title('Stacked Histogram')
plt.show()


4. Cat Plots (Box, Violin, Swarm, Boxen)

A catplot is a high-level, flexible function that integrates several categorical seaborn
plots, such as boxplots, violinplots, swarmplots, pointplots, barplots, and countplots.
Use Cases:

• Analyze the relationship between categorical and continuous variables

• Obtain a statistical summary of a continuous variable
Examples:

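# Boxplot
# Load seaborn's sample dataset (assumed source of `tips`)
tips = sns.load_dataset('tips')

sns.boxplot(data=tips, x='time', y='total_bill', hue='sex', palette='viridis')
plt.title('Boxplot')
plt.show()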

# Violinplot
sns.violinplot(data=tips, x='day', y='total_bill', palette='viridis')
plt.title('Violinplot')
plt.show()

# Swarmplot
sns.swarmplot(data=tips, x='time', y='tip', dodge=True, palette='viridis', hue='sex', s=6)
plt.title('SwarmPlot')
plt.show()

# StripPlot
sns.stripplot(data=tips, x='tip', hue='size', y='day', s=25, alpha=0.2,
              jitter=False, marker='D', palette='viridis')
plt.title('StripPlot')
plt.show()


5. Multiple Plots using FacetGrid

FacetGrid is a class in the Seaborn library that draws the same kind of plot for multiple
subsets of a dataset, arranged in a grid. Each plot in the grid represents one category, and the
subsets are defined by the column names passed to the 'col' and 'row' parameters of FacetGrid().
The plots in the grid can be of any type supported by Seaborn, such as scatter plots, line plots,
bar plots, or histograms.
Use Cases:

• Compare and analyze different groups or categories within a dataset

• Create subplots efficiently
Example: Boxplots for pulse rate during various activities

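# Creating subplots using FacetGrid
# Load seaborn's sample dataset (assumed source of `exercise`)
exercise = sns.load_dataset('exercise')
g = sns.FacetGrid(exercise, col='kind', palette='Paired')

# Drawing a plot on every facet
g.map(sns.boxplot, 'pulse')

g.set_titles(col_template="Pulse rate for {col_name}")
g.add_legend()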


Example: Scatter plots of flipper length against body mass for penguins from different islands

# Creating subplots using FacetGrid
# Load seaborn's sample dataset (assumed source of `penguins`)
penguins = sns.load_dataset('penguins')
g = sns.FacetGrid(penguins, col='island', hue='sex', palette='Paired')

# Drawing a plot on every facet
g.map(sns.scatterplot, 'flipper_length_mm', 'body_mass_g')
g.set_titles(template="Penguins of {col_name} Island")
g.add_legend()
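
Both examples facet on columns only. A minimal sketch of a grid that uses 'row' and 'col' together, assuming a hypothetical `sessions` frame loosely modelled on the exercise data (all values are made up):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical data: 2 diets x 3 activity kinds, two measurements each
sessions = pd.DataFrame({
    "kind": ["rest", "walking", "running"] * 4,
    "diet": ["low fat"] * 6 + ["no fat"] * 6,
    "minute": [1, 1, 1, 15, 15, 15] * 2,
    "pulse": [85, 90, 100, 86, 95, 130, 83, 92, 110, 84, 98, 140],
})

# One facet per (diet, kind) combination: rows follow 'diet', columns follow 'kind'
g = sns.FacetGrid(sessions, row="diet", col="kind")
g.map_dataframe(sns.scatterplot, x="minute", y="pulse")
g.set_titles(template="{row_name} | {col_name}")

print(g.axes.shape)  # one Axes per facet
```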


6. Joint Plots
A joint plot combines univariate and bivariate visualizations in one figure. The central plot
typically features a scatter plot or hexbin plot to represent the joint distribution of two
variables. Additional plots, such as histograms or Kernel Density Estimates (KDEs), are displayed
along the axes to show the individual distributions of each variable.
Use Cases:

• Analyzing the relationship between two variables

• Comparing the individual distributions of two variables

Example: Comparing displacement and miles per gallon (MPG) for cars

# Hex Plot with Histogram margins
# Load seaborn's sample dataset (assumed source of `mpg`)
mpg = sns.load_dataset('mpg')
sns.jointplot(x="mpg", y="displacement", data=mpg,
              height=5, kind='hex', ratio=2, marginal_ticks=True)


Example: Comparing acceleration and horsepower for cars from different countries of origin

# Scatter Plot with KDE Margins (adding hue switches the margins to KDE curves)
sns.jointplot(x="horsepower", y="acceleration", data=mpg,
              hue="origin", height=5, ratio=2, marginal_ticks=True)

7. KDE Plots
A KDE (Kernel Density Estimate) plot provides a smooth, continuous estimate of the
probability density function of a continuous random variable. The x-axis displays the variable's
values, while the y-axis shows the estimated density: higher values mark regions where
observations are more concentrated, and the total area under the curve equals 1.
Use Cases:

• Visualizing the distribution of a single variable (univariate analysis)

• Gaining insights into the shape, peaks, and skewness of the distribution
Example: Comparing the horsepower distribution of cars grouped by number of cylinders

# Overlapping KDE Plots: one filled density curve per cylinder count
sns.kdeplot(data=mpg, x='horsepower', hue='cylinders', fill=True,
            palette='viridis', alpha=.5, linewidth=0)
plt.title('Overlapping KDE Plot')
plt.show()


Example: Comparing the weight of cars across different countries of origin

# Stacked KDE Plots: densities for each origin are stacked on top of each other
sns.kdeplot(data=mpg, x="weight", hue="origin", multiple="stack")
plt.title('Stacked KDE Plot')
plt.show()
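
To make the idea of a kernel density estimate concrete, here is a minimal sketch that builds one by hand with NumPy: a Gaussian bump is centred on every observation and the bumps are averaged. The helper name `gaussian_kde_1d`, the sample values, and the fixed bandwidth are assumptions for illustration; seaborn chooses the bandwidth automatically.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Average of Gaussian bumps centred on each sample, evaluated on grid."""
    samples = np.asarray(samples, dtype=float)
    # Standardised distance between every grid point and every sample
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / bandwidth

samples = [90, 95, 100, 150, 155]   # hypothetical horsepower values
grid = np.linspace(0, 250, 2001)
density = gaussian_kde_1d(samples, grid, bandwidth=10.0)

# Riemann-sum approximation of the area under the curve (should be close to 1)
area = np.sum(density) * (grid[1] - grid[0])
print(round(area, 3))
```

A smaller bandwidth gives a bumpier curve that follows individual observations, while a larger one smooths the two clusters into a single broad peak.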

8. Pairplots
A pair plot is a visualization technique that helps explore relationships between multiple
variables in a dataset. It creates a grid of scatter plots where each variable is plotted against
every other variable, with diagonal entries displaying histograms or density plots to show the
distribution of values for each variable.
Use Cases:

• Identifying correlations or patterns between variables, such as linear or non-linear
relationships, clusters, or outliers
Example: Visualizing the relationships between different features of penguins

# Simple Pairplot (corner=True draws only the lower triangle of the grid)
sns.pairplot(data=penguins, corner=True)

# Pairplot with hues
sns.pairplot(data=penguins, hue='species')


By adding hue to the plot, we can clearly distinguish key differences between the various
species of penguins.

9. Heatmaps
Heatmaps are visualizations that use color-coded cells to represent the values within a matrix
or table of data. In a heatmap, the rows and columns correspond to two different variables, and
the color intensity of each cell indicates the value or magnitude of the data point at their
intersection.
Use Cases:

• Correlation analysis and visualizing pivot tables that aggregate data by rows and
columns.
Example: Visualizing the correlation between all the numerical columns in the mpg
dataset.

# Select the numeric columns from the dataset
num_cols = list(mpg.select_dtypes(include='number'))

fig = plt.figure(figsize=(12, 7))

# Correlation Heatmap
sns.heatmap(data=mpg[num_cols].corr(),
            annot=True, cmap=sns.cubehelix_palette(as_cmap=True))
plt.title('Heatmap of Correlation matrix')
plt.show()
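
The other use case named above, a heatmap of a pivot table, has no example in these notes. A minimal sketch, assuming a hypothetical `cars` frame (the rows are made up and are not the mpg data):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical cars (not the real mpg dataset)
cars = pd.DataFrame({
    "origin": ["usa", "usa", "usa", "japan", "japan", "europe"],
    "cylinders": [8, 8, 4, 4, 4, 4],
    "mpg": [15.0, 17.0, 26.0, 32.0, 34.0, 28.0],
})

# Rows = origin, columns = cylinders, cell = mean mpg of the matching cars
pivot = cars.pivot_table(index="origin", columns="cylinders",
                         values="mpg", aggfunc="mean")

# Combinations with no cars are NaN in the pivot table and are left blank in the heatmap
ax = sns.heatmap(pivot, annot=True, fmt=".1f")
ax.set_title("Mean mpg by origin and cylinders")
```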

10. Scatter Plots

A scatter plot visualizes the relationship between two continuous variables by displaying
individual data points on a graph. The x-axis represents one variable, and the y-axis represents
the other, creating a pattern of scattered points that illustrates their interaction.

Use Cases:

1. Relationship Analysis: Scatter plots help identify the relationship between two variables, such
as positive correlation (both increase together), negative correlation (one increases as the other
decreases), or no correlation.
Example: A scatter plot can show that the horsepower and weight of cars are positively
correlated.

# Simple Scatterplot
sns.scatterplot(data=mpg, x='weight', y='horsepower', s=150, alpha=0.7)
plt.title('Simple Scatterplot')
plt.show()

Outlier Detection: Scatter plots effectively highlight outliers, which are data points that significantly
deviate from the general trend or pattern.

Clustering and Group Identification: By analyzing the distribution of points, scatter plots can reveal
natural groupings or patterns among the variables.
Example: Comparing the horsepower and weight of cars manufactured in different countries.

# Scatterplot with Hue: color encodes the country of origin
sns.scatterplot(data=mpg, x='weight', y='horsepower', s=150, alpha=0.7,
                hue='origin', palette='viridis')
plt.title('Scatterplot with Hue')
plt.show()

# Scatterplot with Hue and Markers: color and marker shape both encode origin
sns.scatterplot(data=mpg, x='weight', y='horsepower', s=150, alpha=0.7,
                style='origin', palette='viridis', hue='origin')
plt.title('Scatterplot with Hue and Markers')
plt.show()


# Scatterplot with Hue & Size: color encodes origin, marker size encodes cylinders
sns.scatterplot(data=mpg, x='weight', y='horsepower', sizes=(40, 400), alpha=.5,
                palette='viridis', hue='origin', size='cylinders')
plt.title('Scatterplot with Hue & Size')
plt.show()

• Trend Analysis: Scatter plots can illustrate the progression or changes in variables over
time by plotting data points in chronological order, making it easier to identify trends or
shifts in behavior.
• Model Validation: Scatter plots are useful for assessing a model's accuracy by
comparing predicted values against actual values, highlighting any deviations or
patterns in the model's predictions.
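
The model-validation use case can be sketched as a predicted-versus-actual scatter plot with a reference line. This is a minimal illustration using Matplotlib directly; the `actual` and `predicted` arrays are hypothetical numbers, not the output of any real model:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical true values and model predictions
actual = np.array([10.0, 20.0, 30.0, 40.0])
predicted = np.array([12.0, 18.0, 33.0, 39.0])

# Signed error of each prediction and the mean absolute error
residuals = predicted - actual
mean_abs_error = np.abs(residuals).mean()

fig, ax = plt.subplots()
ax.scatter(actual, predicted)
# Reference line y = x: a perfect model would put every point on it
limits = [actual.min(), actual.max()]
ax.plot(limits, limits, linestyle="--")
ax.set_xlabel("Actual")
ax.set_ylabel("Predicted")
ax.set_title("Predicted vs Actual")
```

Points above the dashed line are over-predictions and points below it are under-predictions, so a systematic drift to one side indicates bias in the model.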
