Data Visualization part 2

Study Notes

Data Visualization:
• Bar Plots
• Count Plots
• Histograms
• Cat Plots (Box, Violin, Swarm, Boxen)
• Multiple Plots using FacetGrid
• Joint Plots
• KDE Plots
• Pairplots
• Heatmaps
• Scatter Plots
Study Notes- Data Visualization

1. Bar Plots
Bar plots are an effective way to visualize various data types, including counts,
frequencies, percentages, or averages. They are particularly valuable for comparing data
across different categories.

Use Cases:

1. Categorical Comparison: In a bar plot, each bar represents a specific category, and the
height of the bar reflects the aggregated value associated with that category (such as
count, sum, or mean).

For instance, you can use a bar plot to show the average age of Titanic passengers by
passenger group (man, woman, or child).

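# Simple barplot
import seaborn as sns
import matplotlib.pyplot as plt

# Load the Titanic dataset (assumed here to be seaborn's bundled sample dataset)
titanic = sns.load_dataset("titanic")

sns.barplot(data=titanic, x="who", y="age", estimator='mean',
            errorbar=None, palette='viridis')
plt.title('Simple Barplot')
plt.xlabel('Person')
plt.ylabel('Average Age')
plt.show()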


2. Proportional Representation with Stacked Bar Charts:

Bar plots can also be used to visualize proportions or parts of a whole. When each bar is
split (or overlaid) so that one segment reflects a subgroup's share of the category total,
the relative distribution can be compared across different categories.

For example, a bar chart could show how many of the passengers from each embarkation town
aboard the Titanic were male.

# Prepare data for the next plot: total and male passenger counts per town
data = titanic.groupby('embark_town').agg({'who':'count','sex': lambda x: (x=='male').sum()}).reset_index()
data.rename(columns={'who':'total', 'sex':'male'}, inplace=True)
data.sort_values('total', inplace=True)

# Barplot Showing Part of Total
# The lighter bar spans the total; the darker male bar is drawn over it,
# so the part left visible corresponds to female passengers.
sns.set_color_codes("pastel")
sns.barplot(x="total", y="embark_town", data=data,
            label="Female", color="b")
sns.set_color_codes("muted")
sns.barplot(x="male", y="embark_town", data=data,
            label="Male", color="b")
plt.title('Barplot Showing Part of Total')
plt.xlabel('Number of Persons')
plt.legend(loc='upper right')
plt.show()
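
The plot above is drawn from raw counts. A version based on proportions could be sketched as follows. This is only an illustration: the `passengers` records are made up (they are not the Titanic data), and `pd.crosstab` with `normalize="index"` is one possible way to turn counts into shares.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical passenger records; the notes themselves use the titanic dataset.
passengers = pd.DataFrame({
    "embark_town": ["Southampton", "Southampton", "Southampton", "Cherbourg",
                    "Cherbourg", "Queenstown", "Queenstown", "Queenstown"],
    "sex": ["male", "male", "female", "female", "male", "female", "female", "male"],
})

# Share of each sex within every town; each row sums to 1.
shares = pd.crosstab(passengers["embark_town"], passengers["sex"], normalize="index")

# Stacked horizontal bars: every bar has total length 1, split by sex.
ax = shares.plot(kind="barh", stacked=True)
ax.set_xlabel("Proportion of passengers")
ax.set_title("Proportion of males and females by town")
```

Calling `plt.show()` afterwards displays the figure. Because every bar has the same length, towns with very different passenger totals can still be compared directly.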


3. Comparing Subcategories within Categories using Clustered Bar Plots:


Clustered bar plots group multiple bars within each category to represent different
subcategories, making it easier to compare and analyze data across them.

For instance, you could use a clustered bar plot to compare the average age of males and
females within each class.


# Clustered barplot
sns.barplot(data=titanic, x='class', y='age', hue='sex',
estimator='mean', errorbar=None, palette='viridis')
plt.title('Clustered Barplot')
plt.xlabel('Class')
plt.ylabel('Average Age')
plt.show();


2. Count Plots
A count plot visualizes the frequency of occurrences for each category within a
categorical variable. The x-axis shows the categories, while the y-axis indicates the count
or frequency of each category.

Use Cases:

• Frequency Distribution of Categorical Variables: Each bar in the plot represents a
category, and its height reflects the number of observations in that category, helping
identify the most and least common categories.

For example, a count plot can show how many passengers on the Titanic survived and how
many did not.

# Simple Countplot
sns.countplot(data=titanic, x='alive', palette='viridis')
plt.title('Simple Countplot')
plt.show();

• Relationship Between Categorical Variables: Splitting each bar by a second categorical
variable shows how the two variables relate.

For example, examining the survival status of passengers by group (man, woman, or child)
on the Titanic.

# Clustered Countplot
sns.countplot(data=titanic, y="who",
hue="alive", palette='viridis')
plt.title('Clustered Countplot')
plt.show();


3. Histograms
Histograms are visual representations that display the distribution of a dataset, helping
to uncover key characteristics such as normality, skewness, or multiple peaks. They
show the frequency or count of data points within specific intervals or "bins." The x-axis
represents the range of values in the dataset, divided into equal-width bins, while the y-axis
shows the frequency or count of observations within each bin. The height of each bar
corresponds to the number of data points in that bin.
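
The binning step described above can be sketched with NumPy. This is a minimal illustration, assuming a small made-up sample (`values`) and four equal-width bins; neither comes from these notes.

```python
import numpy as np

# Hypothetical sample; not taken from the iris dataset used in these notes.
values = np.array([2.0, 2.3, 2.9, 3.0, 3.1, 3.2, 3.4, 3.8, 4.1, 4.4])

# Split the range [min, max] into 4 equal-width bins and count the points in each.
counts, edges = np.histogram(values, bins=4)

# Each (left, right) pair is one bin; the last bin also includes its right edge.
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.2f} to {right:.2f}: {count} observation(s)")
```

The `counts` array holds the bar heights a histogram would draw, and `edges` holds the bin boundaries along the x-axis.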
Use Cases:
1. Visualize the distribution, central tendency, range, and spread of a continuous or
numeric variable, and identify any patterns or outliers.

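# Histogram with KDE
# Load the iris dataset (assumed here to be seaborn's bundled sample dataset)
iris = sns.load_dataset("iris")
sns.histplot(data=iris, x='sepal_width', kde=True)
plt.title('Histogram with KDE')
plt.show()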


2. Compare the distribution of multiple continuous variables

For example, comparing the distribution of sepal length and sepal width in flowers.

# Histogram with multiple features


sns.histplot(data=iris[['sepal_length','sepal_width']])
plt.title('Multi-Column Histogram')
plt.show()


3. Compare the distribution of a continuous variable across different categories


For example, comparing the distribution of sepal length among various flower species.

#Stacked Histogram
sns.histplot(iris, x='sepal_length', hue='species', multiple='stack',
linewidth=0.5)
plt.title('Stacked Histogram')
plt.show()


4. Cat Plots (Box, Violin, Swarm, Boxen)


catplot is a high-level, figure-level function that provides a single interface to seaborn's
categorical plots, such as box, violin, swarm, strip, boxen, point, bar, and count plots. The
plot type is selected through its kind parameter. The examples below call the underlying
plot functions directly.
Use Cases:

• Analyze the relationship between categorical and continuous variables
• Obtain a statistical summary of a continuous variable

Examples:

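# Boxplot
# Load the tips dataset (assumed here to be seaborn's bundled sample dataset)
tips = sns.load_dataset("tips")
sns.boxplot(data=tips, x='time', y='total_bill', hue='sex', palette='viridis')
plt.title('Boxplot')
plt.show()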

# Violinplot
sns.violinplot(data=tips, x='day', y='total_bill', palette='viridis')
plt.title('Violinplot')
plt.show()

#Swarmplot
sns.swarmplot(data=tips, x='time', y='tip', dodge=True, palette='viridis', hue='sex', s=6)
plt.title('SwarmPlot')
plt.show()

#StripPlot
sns.stripplot(data=tips, x='tip', hue='size', y='day', s=25, alpha=0.2,
jitter=False, marker='D',palette='viridis')
plt.title('StripPlot')
plt.show()


5. Multiple Plots using FacetGrid


FacetGrid is a class in the Seaborn library that draws the same kind of plot for multiple subsets
of the data, arranged in a grid. Each plot in the grid represents one category, and the subsets
are defined by the column names passed to the 'col' and 'row' parameters of FacetGrid(). The
plots in the grid can be drawn with Seaborn plotting functions such as scatter plots, line plots,
bar plots, or histograms.
Use Cases:

• Compare and analyze different groups or categories within a dataset
• Create subplots efficiently

Example: Boxplots for pulse rate during various activities

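# Creating subplots using FacetGrid
# Load the exercise dataset (assumed here to be seaborn's bundled sample dataset)
exercise = sns.load_dataset("exercise")
g = sns.FacetGrid(exercise, col='kind')

# Drawing a plot on every facet
# (no hue is used here, so no palette or legend is needed)
g.map_dataframe(sns.boxplot, y='pulse')
g.set_titles(col_template="Pulse rate for {col_name}")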


Example: Scatter plots of flipper length and body mass for penguins from different islands

# Creating subplots using FacetGrid
# Load the penguins dataset (assumed here to be seaborn's bundled sample dataset)
penguins = sns.load_dataset("penguins")
g = sns.FacetGrid(penguins, col='island', hue='sex', palette='Paired')

# Drawing a plot on every facet
g.map(sns.scatterplot, 'flipper_length_mm', 'body_mass_g')
g.set_titles(template="Penguins of {col_name} Island")
g.add_legend()


6. Joint Plots
A joint plot combines univariate and bivariate visualizations in one figure. The central plot
typically features a scatter plot or hexbin plot to represent the joint distribution of two
variables. Additional plots, such as histograms or Kernel Density Estimates (KDEs), are displayed
along the axes to show the individual distributions of each variable.
Use Cases:

• Analyzing the relationship between two variables
• Comparing the individual distributions of two variables

Example: Comparing displacement and miles per gallon (MPG) for cars

# Hex Plot with Histogram margins
# Load the mpg dataset (assumed here to be seaborn's bundled sample dataset)
mpg = sns.load_dataset("mpg")
sns.jointplot(x="mpg", y="displacement", data=mpg,
              height=5, kind='hex', ratio=2, marginal_ticks=True)


Example: Comparing acceleration and horsepower for cars from different regions of origin


# Scatter Plot with KDE Margins
sns.jointplot(x="horsepower", y="acceleration", data=mpg,
hue="origin", height=5, ratio=2, marginal_ticks=True);

7. KDE Plots
A KDE (Kernel Density Estimate) plot provides a smooth, continuous estimate of the
probability density function of a continuous random variable. The x-axis displays the variable's
values, while the y-axis shows the estimated density: higher values mark ranges where
observations are more concentrated. The density itself is not a probability; probabilities
correspond to areas under the curve.
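
To make the idea concrete, a Gaussian kernel density estimate can be sketched by hand with NumPy. This is an illustration only: the `sample` values and the fixed `bandwidth` are made up, whereas seaborn chooses a bandwidth automatically.

```python
import numpy as np

# Hypothetical sample and bandwidth, chosen only for illustration.
sample = np.array([1.0, 1.5, 2.0, 2.2, 3.5, 4.0])
bandwidth = 0.5

# Points on the x-axis where the density is evaluated.
grid = np.linspace(-2.0, 7.0, 901)

# Place a Gaussian kernel on every observation and average the kernels.
z = (grid[:, None] - sample[None, :]) / bandwidth
kernels = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
density = kernels.mean(axis=1) / bandwidth

# A density should integrate to approximately 1 over the grid.
area = density.sum() * (grid[1] - grid[0])
print(f"area under the curve: {area:.3f}")
```

A smaller bandwidth gives a bumpier curve that follows individual observations, and a larger one gives a smoother curve that can hide separate peaks.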
Use Cases:

• Visualizing the distribution of a single variable (univariate analysis)
• Gaining insights into the shape, peaks, and skewness of the distribution

Example: Comparing the horsepower of cars in relation to the number of cylinders

#Overlapping KDE Plots


sns.kdeplot(data=mpg, x='horsepower', hue='cylinders', fill=True,
palette='viridis', alpha=.5, linewidth=0)
plt.title('Overlapping KDE Plot')
plt.show()


Comparing the weight of cars across different countries:

#Stacked KDE Plots


sns.kdeplot(data=mpg, x="weight", hue="origin", multiple="stack")
plt.title('Stacked KDE Plot')
plt.show();

8. Pairplots
A pair plot is a visualization technique that helps explore relationships between multiple
variables in a dataset. It creates a grid of scatter plots where each variable is plotted against
every other variable, with diagonal entries displaying histograms or density plots to show the
distribution of values for each variable.
Use Cases:

• Identifying correlations or patterns between variables, such as linear or non-linear
relationships, clusters, or outliers

Example: Visualizing the relationships between different features of penguins

#Simple Pairplot
sns.pairplot(data=penguins, corner=True);

# Pairplot with hues


sns.pairplot(data=penguins, hue='species');


By adding hue to the plot, we can clearly distinguish key differences between the various
species of penguins.

9. Heatmaps
Heatmaps are visualizations that use color-coded cells to represent the values within a matrix
or table of data. In a heatmap, the rows and columns correspond to two different variables, and
the color intensity of each cell indicates the value or magnitude of the data point at their
intersection.
Use Cases:

• Correlation analysis and visualizing pivot tables that aggregate data by rows and
columns.

Example: Visualizing the correlation between all the numerical columns in the mpg
dataset.

# Selection of numeric columns from the dataset
num_cols = list(mpg.select_dtypes(include='number'))

fig = plt.figure(figsize=(12,7))

# Correlation Heatmap
sns.heatmap(data=mpg[num_cols].corr(),
            annot=True, cmap=sns.cubehelix_palette(as_cmap=True))
plt.title('Heatmap of Correlation matrix')
plt.show()
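
The use case above also mentions pivot tables. A possible sketch of that variant, assuming a small made-up table of cars (`cars`) shaped like the mpg dataset rather than the real data:

```python
import pandas as pd
import seaborn as sns

# Hypothetical records shaped like the mpg dataset (made-up values).
cars = pd.DataFrame({
    "origin": ["usa", "usa", "usa", "japan", "japan", "europe", "europe", "europe"],
    "cylinders": [4, 8, 8, 4, 4, 4, 6, 6],
    "mpg": [28.0, 14.0, 16.0, 32.0, 34.0, 26.0, 20.0, 22.0],
})

# Rows = origin, columns = cylinders, cells = mean mpg for that combination.
table = cars.pivot_table(index="origin", columns="cylinders", values="mpg", aggfunc="mean")

# Combinations with no cars are NaN in the table and are left blank in the heatmap.
ax = sns.heatmap(table, annot=True, fmt=".1f", cmap="viridis")
ax.set_title("Mean mpg by origin and cylinders")
```

Here the color of each cell encodes an aggregated value (the mean) instead of a correlation coefficient, but the reading of rows, columns, and color intensity is the same.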

10. Scatter Plots


A scatter plot visualizes the relationship between two continuous variables by displaying
individual data points on a graph. The x-axis represents one variable, and the y-axis represents
the other, creating a pattern of scattered points that illustrates their interaction.

Use Cases:

1. Relationship Analysis: Scatter plots help identify the relationship between two variables, such
as positive correlation (both increase together), negative correlation (one increases as the other
decreases), or no correlation.
Example: A scatter plot can show that the horsepower and weight of cars are positively
correlated.

# Simple Scatterplot
sns.scatterplot(data=mpg, x='weight', y='horsepower', s=150, alpha=0.7)
plt.title('Simple Scatterplot')
plt.show();
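
The direction and strength of such a relationship can also be summarized by a correlation coefficient. A minimal sketch with NumPy, using made-up measurements rather than the mpg dataset:

```python
import numpy as np

# Hypothetical measurements; not the mpg dataset.
weight = np.array([1800, 2200, 2600, 3000, 3400, 3800, 4200], dtype=float)
horsepower = np.array([60, 75, 90, 110, 130, 150, 180], dtype=float)
mileage = np.array([36, 31, 27, 23, 19, 16, 13], dtype=float)

# Pearson r: close to +1 for a rising trend, close to -1 for a falling one.
r_positive = np.corrcoef(weight, horsepower)[0, 1]
r_negative = np.corrcoef(weight, mileage)[0, 1]
print(f"weight vs horsepower: {r_positive:.2f}")
print(f"weight vs mileage:    {r_negative:.2f}")
```

A coefficient near 0 would correspond to a scatter plot with no visible linear trend.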

2. Outlier Detection: Scatter plots effectively highlight outliers, which are data points that
deviate significantly from the general trend or pattern.

3. Clustering and Group Identification: By analyzing the distribution of points, scatter plots can
reveal natural groupings or patterns among the variables.
Example: Comparing the horsepower and weight of cars manufactured in different regions.

# Scatterplot with Hue


sns.scatterplot(data=mpg, x='weight', y='horsepower', s=150, alpha=0.7,
hue='origin', palette='viridis')
plt.title('Scatterplot with Hue')
plt.show()

# Scatterplot with Hue and Markers


sns.scatterplot(data=mpg, x='weight', y='horsepower', s=150, alpha=0.7,
style='origin',palette='viridis', hue='origin')
plt.title('Scatterplot with Hue and Markers')
plt.show()


# Scatterplot with Hue & Size


sns.scatterplot(data=mpg, x='weight', y='horsepower', sizes=(40, 400), alpha=.5,
palette='viridis', hue='origin', size='cylinders')
plt.title('Scatterplot with Hue & Size')
plt.show()

4. Trend Analysis: Scatter plots can illustrate the progression or changes in variables over
time by plotting data points in chronological order, making it easier to identify trends or
shifts in behavior.
5. Model Validation: Scatter plots are useful for assessing a model's accuracy by
comparing predicted values against actual values, highlighting any deviations or
patterns in the model's predictions.
