HCMC University of Technology & Education
Faculty of Economics
PYTHON LANGUAGE
and AI application
VISUALIZATION WITH SEABORN
Mr Lê Ngọc Hiếu
HCMC, April 2022
What have we learned?
• Keras, TensorFlow, and scikit-learn are for
machine learning
• NumPy is for array-based data analysis and
high-performance computation
• SciPy is for advanced scientific computing
• pandas is for data analysis in general
• seaborn is for data visualization
Before class
• Understand Python and how to write code in
the Python language
• Understand and know how to use one of the
following:
– Jupyter Notebook (offline)
– Google Colaboratory (online in a browser;
requires a Google account)
– Visual Studio Code with the Python and
Jupyter extensions (offline)
Learning goals
• Understand seaborn and know how to use it
in Python code
• Use seaborn to visualize data at a
basic level
What are we going to learn?
1. Introduction to seaborn library
2. Installing seaborn
3. Plotting functions
4. Multi-plot grids
5. Plot aesthetics
6. Exercises with Churn Dataset
7. References
SEABORN
Introduction
1. Introduction
Seaborn is a Python data visualization
library based on matplotlib. It provides
a high-level interface for drawing
attractive and informative statistical
graphics.
To see the code or report a bug, please
visit the GitHub repository. General
support questions are best asked
on Stack Overflow or Discourse, which
have dedicated channels for seaborn.
• The version of seaborn covered here is
v0.11.1 (released December 2020)
• Seaborn was created by
Michael Waskom.
SEABORN
Install & How to use
2. Installing seaborn
Official releases of seaborn can be
installed from PyPI:
pip install seaborn
Supported Python versions: Python
3.6+
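After installation, a quick check in a notebook cell or script might look like the sketch below. The alias sns is a widely used convention rather than a requirement, and set_theme() is optional.

```python
import seaborn as sns

# Printing the version confirms that the package imports correctly.
print(sns.__version__)

# Optional: apply seaborn's default theme to all later matplotlib plots.
sns.set_theme()
```

If the import fails, the package was most likely installed into a different Python environment than the one running the notebook.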
SEABORN
Plotting functions
3. Plotting functions
• Visualizing statistical relationships
• Visualizing distributions of data
• Plotting with categorical data
• Visualizing regression models
3.1 Visualizing statistical relationships
Statistical analysis is a process of
understanding how variables in a dataset
relate to each other and how those
relationships depend on other variables.
Visualization can be a core component of this
process because, when data are visualized
properly, the human visual system can see
trends and patterns that indicate a
relationship.
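A minimal sketch of a relational plot is shown below. The data are synthetic and the column names (tenure, monthly_charges, contract) are invented for illustration; they are not taken from the slides.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line in a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data (assumption): monthly charges grow roughly with tenure.
rng = np.random.default_rng(0)
tenure = rng.integers(1, 73, size=200)
df = pd.DataFrame({
    "tenure": tenure,
    "monthly_charges": 20 + 0.8 * tenure + rng.normal(0, 10, size=200),
    "contract": rng.choice(["monthly", "yearly"], size=200),
})

# relplot() is figure-level; kind="scatter" is the default, kind="line" also works.
# hue adds a third variable so we can see whether the relationship depends on it.
g = sns.relplot(data=df, x="tenure", y="monthly_charges",
                hue="contract", kind="scatter")
```

In a Jupyter notebook the figure is displayed automatically after the cell runs.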
3.2 Visualizing distributions of data
An early step in any effort to analyze or model
data should be to understand how the
variables are distributed. Techniques for
distribution visualization can provide quick
answers to many important questions. What
range do the observations cover? What is their
central tendency? Are they heavily skewed in
one direction? Is there evidence for
bimodality? Are there significant outliers? Do
the answers to these questions vary across
subsets defined by other variables?
3.3 Plotting with categorical data
In seaborn, there are several different ways to
visualize a relationship involving categorical
data. Similar to the relationship between
relplot() and either scatterplot() or
lineplot(), there are two ways to make these
plots. There are a number of axes-level
functions for plotting categorical data in
different ways and a figure-level interface,
catplot(), that gives unified higher-level
access to them.
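The two routes described above can be compared side by side, as in this sketch. The data are synthetic and the column names are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data (assumption): charges for three contract types.
rng = np.random.default_rng(2)
df = pd.DataFrame({
    "contract": rng.choice(["monthly", "one year", "two year"], size=300),
    "monthly_charges": rng.normal(65, 20, size=300),
})

# Route 1: an axes-level function draws onto a matplotlib Axes that we supply.
fig, ax = plt.subplots()
sns.boxplot(data=df, x="contract", y="monthly_charges", ax=ax)

# Route 2: the figure-level catplot() creates its own figure and selects
# the underlying axes-level function through the kind argument.
g = sns.catplot(data=df, x="contract", y="monthly_charges", kind="box")
```

Changing kind to "violin", "strip", or "bar" switches the plot type without changing the rest of the call.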
3.4 Visualizing regression models
Many datasets contain multiple quantitative
variables, and the goal of an analysis is often
to relate those variables to each other. The
relational plots described earlier (relplot(),
scatterplot()) do this by showing the joint
distribution of two variables. It can be very
helpful, though, to use statistical models to
estimate a simple relationship between two
noisy sets of observations. The functions
discussed in this section (regplot() and
lmplot()) do so through the common framework
of linear regression.
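As a rough illustration, the sketch below fits and draws a linear regression on synthetic data. The true slope of 0.9 and the column names are assumptions chosen for the example; np.polyfit is used only to print the fitted coefficients, since regplot() draws the fit but does not return them.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic noisy linear data (assumption): true slope 0.9, intercept 20.
rng = np.random.default_rng(3)
x = rng.uniform(0, 72, 150)
y = 20 + 0.9 * x + rng.normal(0, 8, 150)
df = pd.DataFrame({"tenure": x, "monthly_charges": y})

# regplot() draws the scatter, the fitted line, and a confidence band.
fig, ax = plt.subplots()
sns.regplot(data=df, x="tenure", y="monthly_charges", ax=ax)

# Recover the fitted coefficients separately with an ordinary least-squares fit.
slope, intercept = np.polyfit(df["tenure"], df["monthly_charges"], deg=1)
print("slope:", slope, "intercept:", intercept)
```

lmplot() accepts the same x and y arguments and adds hue, col, and row for comparing fits across subsets.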
SEABORN
Multi-plot grids
4. Multi-plot grids
When exploring multi-dimensional data, a useful approach is to
draw multiple instances of the same plot on different subsets of
your dataset. This technique is sometimes called either “lattice” or
“trellis” plotting, and it is related to the idea of “small multiples”. It
allows a viewer to quickly extract a large amount of information
about a complex dataset. Matplotlib offers good support for making
figures with multiple axes; seaborn builds on top of this to directly
link the structure of the plot to the structure of your dataset.
The figure-level functions (relplot(), displot(), catplot(), lmplot())
are built on top of grid objects such as FacetGrid. In most cases, you
will want to work with those functions. They take care of some important
bookkeeping that synchronizes the multiple plots in each grid. Knowing
how the underlying objects work may still be useful for advanced
applications.
SEABORN
Plot aesthetics
5. Plot aesthetics
Choosing color palettes
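One possible way to choose a palette is sketched below. The palette name "colorblind" and the "whitegrid" style are just one reasonable choice among the built-in options, and the data are synthetic.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Build a palette of three colors from one of seaborn's named palettes.
palette = sns.color_palette("colorblind", n_colors=3)

# Make the style and palette the default for all later plots.
sns.set_theme(style="whitegrid", palette=palette)

# Synthetic data (assumption) to show the palette in use.
rng = np.random.default_rng(5)
df = pd.DataFrame({
    "x": rng.normal(size=90),
    "y": rng.normal(size=90),
    "group": np.repeat(["a", "b", "c"], 30),
})

# A palette can also be passed to a single plot instead of set globally.
fig, ax = plt.subplots()
sns.scatterplot(data=df, x="x", y="y", hue="group", palette="colorblind", ax=ax)
```

In a notebook, evaluating sns.color_palette("colorblind") on its own line shows the colors as swatches.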
SEABORN
Exercises
6. Exercises with Churn Dataset
• https://www.kaggle.com/blastchar/telco-customer-churn
References
1. https://seaborn.pydata.org/
2. https://www.kaggle.com/blastchar/telco-customer-churn