Data Visualization part 2
Data Visualization:
Bar Plots
Count Plots
Histograms
Cat Plots (Box, Violin, Swarm, Boxen)
Multiple Plots using FacetGrid
Joint Plots
KDE Plots
Pairplots
Heatmaps
Scatter Plots
Study Notes- Data Visualization
1. Bar Plots
Bar plots are an effective way to visualize various data types, including counts,
frequencies, percentages, or averages. They are particularly valuable for comparing data
across different categories.
Use Cases:
1. Categorical Comparison: In a bar plot, each bar represents a specific category, and the
height of the bar reflects the aggregated value associated with that category (such as
count, sum, or mean).
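For instance, you can use a bar plot to show the average age of Titanic passengers by
passenger type (man, woman, or child).
import seaborn as sns
import matplotlib.pyplot as plt

titanic = sns.load_dataset('titanic')  # Seaborn's example Titanic dataset

# Simple barplot: mean age for each value of 'who'
sns.barplot(data=titanic, x="who", y="age", estimator='mean',
            errorbar=None, palette='viridis')
plt.title('Simple Barplot')
plt.xlabel('Person')
plt.ylabel('Average Age')
plt.show()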
using Seaborn
For example, a stacked bar chart could show the proportion of males from various towns
aboard the Titanic.
For instance, you could use a clustered bar plot to compare the average age of males and
females within each class.
# Clustered barplot: mean age per class, with one bar per sex
sns.barplot(data=titanic, x='class', y='age', hue='sex',
            estimator='mean', errorbar=None, palette='viridis')
plt.title('Clustered Barplot')
plt.xlabel('Class')
plt.ylabel('Average Age')
plt.show()
2. Count Plots
A count plot visualizes the frequency of occurrences for each category within a
categorical variable. The x-axis shows the categories, while the y-axis indicates the count
or frequency of each category.
Use Cases:
For example, a count plot can show how many Titanic passengers survived and how many did not.
# Simple Countplot: number of passengers for each value of 'alive'
sns.countplot(data=titanic, x='alive', palette='viridis')
plt.title('Simple Countplot')
plt.show()
Analyzing the relationship between different categorical variables
For example, examining the survival status of Titanic passengers by passenger type (man,
woman, or child).
# Clustered Countplot: horizontal bars, one per 'alive' value within each 'who' group
sns.countplot(data=titanic, y="who",
              hue="alive", palette='viridis')
plt.title('Clustered Countplot')
plt.show()
3. Histograms
Histograms are visual representations that display the distribution of a dataset.
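iris = sns.load_dataset('iris')  # Seaborn's example iris dataset

# Stacked Histogram: sepal length, with the species stacked within each bin
sns.histplot(data=iris, x='sepal_length', hue='species', multiple='stack',
             linewidth=0.5)
plt.title('Stacked Histogram')
plt.show()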
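4. Cat Plots (Box, Violin, Swarm, Boxen)
tips = sns.load_dataset('tips')  # Seaborn's example tips dataset

# Boxplot: total bill by meal time, split by sex
sns.boxplot(data=tips, x='time', y='total_bill', hue='sex', palette='viridis')
plt.title('Boxplot')
plt.show()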
# Violinplot
sns.violinplot(data=tips, x='day', y='total_bill', palette='viridis')
plt.title('Violinplot')
plt.show()
# Swarmplot
sns.swarmplot(data=tips, x='time', y='tip', hue='sex', dodge=True,
              palette='viridis', s=6)
plt.title('SwarmPlot')
plt.show()
# StripPlot
sns.stripplot(data=tips, x='tip', y='day', hue='size', s=25, alpha=0.2,
              jitter=False, marker='D', palette='viridis')
plt.title('StripPlot')
plt.show()
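5. Multiple Plots using FacetGrid
# g is a FacetGrid created earlier; its construction is missing from these notes.
# Draw a boxplot of the 'pulse' column in every facet of the grid.
g.map(sns.boxplot, 'pulse')
g.add_legend()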
Scatter plots for flipper length and body mass of Penguins from different islands
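A figure like the one captioned above could be produced with `FacetGrid`, which draws one panel per value of a chosen column. The following is a minimal sketch under stated assumptions: the data is randomly generated rather than loaded from the penguins dataset, the column names `island`, `flipper_length_mm`, and `body_mass_g` mirror that dataset, and `scatter_by_island` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic stand-in for the penguins data (values are made up).
rng = np.random.default_rng(0)
islands = np.repeat(["Biscoe", "Dream", "Torgersen"], 20)
flipper = rng.normal(200, 12, size=islands.size)
penguins_demo = pd.DataFrame({
    "island": islands,
    "flipper_length_mm": flipper,
    "body_mass_g": 50 * flipper - 5800 + rng.normal(0, 300, size=islands.size),
})

def scatter_by_island(data):
    """Draw one scatter plot per island, arranged in a single row."""
    g = sns.FacetGrid(data, col="island", height=3)
    # map_dataframe passes each island's subset of the data to scatterplot.
    g.map_dataframe(sns.scatterplot, x="flipper_length_mm", y="body_mass_g")
    g.set_axis_labels("Flipper Length (mm)", "Body Mass (g)")
    return g

g = scatter_by_island(penguins_demo)
```

Call `plt.show()` to display the grid. Passing `row=` as well as `col=` to `FacetGrid` would split the panels by a second categorical column.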
6. Joint Plots
A joint plot combines univariate and bivariate visualizations in one figure. The central plot
typically features a scatter plot or hexbin plot to represent the joint distribution of two
variables. Additional plots, such as histograms or Kernel Density Estimates (KDEs), are displayed
along the axes to show the individual distributions of each variable.
Use Cases:
7. KDE Plots
A KDE (Kernel Density Estimate) plot provides a smooth, continuous representation of the
probability density function for a continuous random variable. The y-axis represents the density
or likelihood of observing specific values, while the x-axis displays the variable's values.
Use Cases:
8. Pairplots
A pair plot is a visualization technique that helps explore relationships between multiple
variables in a dataset. It creates a grid of scatter plots where each variable is plotted against
every other variable, with diagonal entries displaying histograms or density plots to show the
distribution of values for each variable.
Use Cases:
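penguins = sns.load_dataset('penguins')  # Seaborn's example penguins dataset

# Simple Pairplot: corner=True draws only the lower triangle of the grid
sns.pairplot(data=penguins, corner=True)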
By adding hue to the plot, we can clearly distinguish key differences between the various
species of penguins.
9. Heatmaps
Heatmaps are visualizations that use color-coded cells to represent the values within a matrix
or table of data. In a heatmap, the rows and columns correspond to two different variables, and
the color intensity of each cell indicates the value or magnitude of the data point at their
intersection.
Use Cases:
Correlation analysis and visualizing pivot tables that aggregate data by rows and
columns.
Example: Visualizing the correlation between all the numerical columns in the mpg
dataset.
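mpg = sns.load_dataset('mpg')  # Seaborn's example mpg dataset
# num_cols is not defined in these notes; selecting every numeric column is an assumption
num_cols = mpg.select_dtypes(include='number').columns

fig = plt.figure(figsize=(12, 7))
# Correlation Heatmap: annot=True prints each coefficient inside its cell
sns.heatmap(data=mpg[num_cols].corr(),
            annot=True, cmap=sns.cubehelix_palette(as_cmap=True))
plt.title('Heatmap of Correlation matrix')
plt.show()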
10. Scatter Plots
Use Cases:
1. Relationship Analysis: Scatter plots help identify the relationship between two variables, such
as positive correlation (both increase together), negative correlation (one increases as the other
decreases), or no correlation.
Example: A scatter plot can show that the horsepower and weight of cars are positively
correlated.
# Simple Scatterplot
sns.scatterplot(data=mpg, x='weight', y='horsepower', s=150, alpha=0.7)
plt.title('Simple Scatterplot')
plt.show()
Outlier Detection: Scatter plots effectively highlight outliers, which are data points that significantly
deviate from the general trend or pattern.
Clustering and Group Identification: By analyzing the distribution of points, scatter plots can reveal
natural groupings or patterns among the variables.
Example: Comparing the horsepower and weight of cars manufactured in different countries.
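A comparison like this is usually made by mapping the grouping column to `hue`. The sketch below is a minimal illustration with assumptions stated up front: the data is synthetic, the column names `origin`, `weight`, and `horsepower` mirror the mpg dataset, and `scatter_by_origin` is a hypothetical helper.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic stand-in for the mpg data (values are made up).
rng = np.random.default_rng(5)
origin = np.repeat(["usa", "japan", "europe"], 25)
weight = np.repeat([3400, 2200, 2400], 25) + rng.normal(0, 400, 75)
mpg_demo = pd.DataFrame({
    "origin": origin,
    "weight": weight,
    "horsepower": 0.04 * weight - 20 + rng.normal(0, 12, 75),
})

def scatter_by_origin(data):
    """Scatter plot of horsepower against weight, colored by country of origin."""
    fig, ax = plt.subplots()
    sns.scatterplot(data=data, x="weight", y="horsepower", hue="origin",
                    s=150, alpha=0.7, ax=ax)
    ax.set_title("Scatterplot by Origin")
    return ax

ax = scatter_by_origin(mpg_demo)
```

Call `plt.show()` to display the figure. Points from the same origin share a color, so groups that occupy different regions of the plot stand out.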
Trend Analysis: Scatter plots can illustrate the progression or changes in variables over
time by plotting data points in chronological order, making it easier to identify trends or
shifts in behavior.
Model Validation: Scatter plots are useful for assessing a model's accuracy by
comparing predicted values against actual values, highlighting any deviations or
patterns in the model’s predictions.
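The model validation use case can be sketched as a predicted-versus-actual plot. Everything in this example is an assumption made for illustration: the data is synthetic, the "model" is a straight line fitted with `np.polyfit`, and `predicted_vs_actual` is a hypothetical helper.

```python
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic data and a simple least-squares line standing in for a real model.
rng = np.random.default_rng(4)
x = rng.uniform(0, 10, 100)
actual = 3 * x + 2 + rng.normal(0, 2, 100)
slope, intercept = np.polyfit(x, actual, 1)
predicted = slope * x + intercept

def predicted_vs_actual(actual, predicted):
    """Scatter predicted against actual values, with a y = x reference line."""
    fig, ax = plt.subplots()
    sns.scatterplot(x=actual, y=predicted, alpha=0.7, ax=ax)
    lo = min(actual.min(), predicted.min())
    hi = max(actual.max(), predicted.max())
    # Points on this dashed line are predicted exactly right.
    ax.plot([lo, hi], [lo, hi], linestyle="--", color="gray")
    ax.set_xlabel("Actual")
    ax.set_ylabel("Predicted")
    ax.set_title("Predicted vs Actual")
    return ax

ax = predicted_vs_actual(actual, predicted)
```

Call `plt.show()` to display the figure. Points far from the dashed line are large errors, and a systematic drift above or below the line suggests the model is biased over that range.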