Chapter11_DataVisualization2

The document provides an overview of data preparation and visualization techniques using Matplotlib and Seaborn in Python. It covers various plotting methods, including line, scatter, bar, and histograms, as well as the differences between exploratory and explanatory data visualization. Additionally, it discusses the use of subplots and advanced features like pair plots and heatmaps for effective data analysis and presentation.

Uploaded by

Tuấn Đỗ Anh

DATA PREPARATION AND VISUALIZATION
Mathematical Economics Faculty

National Economics University


https://www.neu.edu.vn/
Chapter 11: Plotting and Visualization

Outline

1. Introduction To Matplotlib
• line
• scatter
• bar
• hist
• Subplots

2. Introduction To Seaborn
• Distribution: Hist, KDE
• Joint Plot
• Pair Plot
• Bar and Box Plot
• Facet Plot

Introduction
• At the heart of any data science workflow is data exploration. Most commonly, we explore data
by using the following:
• Statistical methods (measuring averages, measuring variability, …)
• Data visualization (transforming data into a visual form)
• The other central task is to help us communicate and explain the results we’ve found through
exploring data. That being said, we have two kinds of data visualization:
• Exploratory data visualization: we build graphs for ourselves to explore data and find
patterns
• Explanatory data visualization: we build graphs for others to communicate and explain the
patterns we’ve found through exploring data

Exploratory Data Visualization
Tell a story
Introduction to Matplotlib
Matplotlib is a comprehensive library for creating static, animated, and
interactive visualizations in Python.

Matplotlib makes easy things easy and hard things possible.


Introduction to Figures

• Figure: the top-level container for the whole plot (the blue zone in the diagram)
• Axes: an Artist attached to a Figure that contains a region for plotting data
• Axis: sets the scale and limits of an Axes and generates its ticks and tick labels
• Artist: everything visible on the Figure is an Artist
Parts of a Figure
General Matplotlib Tips

• Importing Matplotlib
• Setting Styles
We will use the plt.style.use directive to choose appropriate aesthetic styles for our figures.
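The two tips above can be sketched as follows. The mpl and plt aliases are the common convention, and the "classic" style name is only one possible choice; the Agg backend line is an assumption added so the sketch runs without a display.

```python
import matplotlib as mpl

# Assumption: a non-interactive backend so this runs without a display.
# Drop this line when working in a notebook or on a desktop.
mpl.use("Agg")

import matplotlib.pyplot as plt

# List a few of the styles shipped with Matplotlib
print(plt.style.available[:5])

# Select one style for all figures created afterwards
plt.style.use("classic")
```

Any name listed in plt.style.available can be passed to plt.style.use in the same way.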
Plotting from a script

• Create a file called myplot.py containing the following:

# ------- file: myplot.py ------
import matplotlib.pyplot as plt
import numpy as np

# Sample 100 evenly spaced points on [0, 10]
x = np.linspace(0, 10, 100)
plt.plot(x, np.sin(x))
plt.plot(x, np.cos(x))
plt.show()  # open a window displaying the figure

• Run this script from the command-line prompt:

$ python myplot.py

• One thing to be aware of: the plt.show() command should be used only once per
Python session, and is most often seen at the very end of the script.
Saving Figures to File

A figure can be saved to a file with the savefig() command. To confirm that the saved file contains what we think it contains, let's use the IPython Image object to display the contents of this file:
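A minimal sketch of saving a figure and checking the result. The file name my_figure.png is a hypothetical choice, and the IPython display step is left as a comment because it only makes sense inside a notebook.

```python
import os

import matplotlib as mpl

mpl.use("Agg")  # assumption: non-interactive backend so this runs without a display

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

fig = plt.figure()
plt.plot(x, np.sin(x), "-")
plt.plot(x, np.cos(x), "--")

# The file format is inferred from the extension
fig.savefig("my_figure.png")

# Check that the file was actually written
saved_ok = os.path.exists("my_figure.png")
print(saved_ok)

# In a notebook, the saved file can then be displayed with:
#   from IPython.display import Image
#   Image("my_figure.png")
```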
Two Interfaces for the Price of One (I)

• MATLAB-style interface
• Object-oriented interface
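As an illustration of the two interfaces, the sketch below draws the same pair of curves twice. The sine and cosine data are an assumed example, not taken from the slides.

```python
import matplotlib as mpl

mpl.use("Agg")  # assumption: non-interactive backend so this runs without a display

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

# MATLAB-style interface: pyplot keeps track of the "current" figure and axes
plt.figure()
plt.subplot(2, 1, 1)  # (rows, columns, panel number)
plt.plot(x, np.sin(x))
plt.subplot(2, 1, 2)
plt.plot(x, np.cos(x))
matlab_fig = plt.gcf()  # handle to the current figure

# Object-oriented interface: plotting functions are methods of explicit objects
oo_fig, axes = plt.subplots(2)
axes[0].plot(x, np.sin(x))
axes[1].plot(x, np.cos(x))
```

The stateful style is quick for simple plots; explicit Figure and Axes objects tend to be easier to manage once a figure has several panels.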

Two Interfaces for the Price of One (II)

• To create a graph using the OO interface, we use the plt.subplots() function, which generates
an empty plot and returns a tuple of two objects

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
print(type(fig))
print(type(ax))

<class 'matplotlib.figure.Figure'>
<class 'matplotlib.axes._subplots.AxesSubplot'>

Newer Matplotlib releases report the second type as <class 'matplotlib.axes._axes.Axes'>.

Two Interfaces for the Price of One (III)

• The matplotlib.figure.Figure object acts as a canvas on which we can add one or more plots
• The matplotlib.axes._subplots.AxesSubplot object is the actual plot
• In short, we have two objects:
• The Figure (the canvas)
• The Axes (the plot; don’t confuse with “axis”, which is the x- and y-axis of a plot)
• To create a bar plot, we use the Axes.bar() method and then call plt.show()
• The final code:
fig, ax = plt.subplots()
ax.bar(['A', 'B', 'C'], [2, 4, 16])
plt.show()
Simple Line Plots
Adjusting the Plot: Line Colors and Styles (I)
Adjusting the Plot: Line Colors and Styles (II)
• Similarly, the line style can be adjusted using the linestyle keyword:
• If you would like to be extremely terse, these linestyle and color codes can be combined into a single non-keyword argument to the plt.plot() function:
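Both forms can be sketched as below: first the linestyle keyword, then the terse combined format string. The straight-line data are an assumed example chosen only so that the lines do not overlap.

```python
import matplotlib as mpl

mpl.use("Agg")  # assumption: non-interactive backend so this runs without a display

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
plt.figure()

# Line style set through the linestyle keyword
plt.plot(x, x + 0, linestyle="solid")
plt.plot(x, x + 1, linestyle="dashed")
plt.plot(x, x + 2, linestyle="dashdot")
plt.plot(x, x + 3, linestyle="dotted")

# Line style and color combined into one non-keyword format string
plt.plot(x, x + 4, "-g")   # solid green
plt.plot(x, x + 5, "--c")  # dashed cyan
plt.plot(x, x + 6, "-.k")  # dash-dot black
plt.plot(x, x + 7, ":r")   # dotted red

lines = plt.gca().get_lines()
```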
Adjusting the Plot: Axes Limits

Matplotlib does a decent job of choosing default axes limits for your plot, but sometimes it's nice to have finer control. The most basic way to adjust axis limits is to use the plt.xlim() and plt.ylim() methods:

The plt.axis() method allows you to set the x and y limits with a single call, by passing a list which specifies [xmin, xmax, ymin, ymax]:
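A short sketch comparing the two ways of setting limits; the limit values themselves are arbitrary example numbers.

```python
import matplotlib as mpl

mpl.use("Agg")  # assumption: non-interactive backend so this runs without a display

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

# Option 1: set the x and y limits separately
plt.figure()
plt.plot(x, np.sin(x))
plt.xlim(-1, 11)
plt.ylim(-1.5, 1.5)
limits_separate = (plt.xlim(), plt.ylim())  # called without arguments, they return the limits

# Option 2: set all four limits in a single call
plt.figure()
plt.plot(x, np.sin(x))
plt.axis([-1, 11, -1.5, 1.5])  # [xmin, xmax, ymin, ymax]
limits_single = (plt.xlim(), plt.ylim())

print(limits_separate == limits_single)
```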
Labeling Plots

Titles and axis labels are the simplest such labels; there are methods that can be used to quickly set them:

When multiple lines are being shown within a single axes, it can be useful to create a plot legend that labels each line type. Again, Matplotlib has a built-in way of quickly creating such a legend. It is done via the plt.legend() method.
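A minimal sketch of a title, axis labels, and a legend together. The title and label strings are made up for illustration; the label keyword of plt.plot supplies the legend entries.

```python
import matplotlib as mpl

mpl.use("Agg")  # assumption: non-interactive backend so this runs without a display

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
plt.figure()

# The label keyword gives each line its legend entry
plt.plot(x, np.sin(x), "-g", label="sin(x)")
plt.plot(x, np.cos(x), ":b", label="cos(x)")

plt.title("A Sine and Cosine Curve")
plt.xlabel("x")
plt.ylabel("value")

legend = plt.legend()  # collects the labelled lines automatically
ax = plt.gca()
```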
Practice: Plotting with Object-oriented interface

• While most plt functions translate directly to ax methods (such as plt.plot() → ax.plot(), plt.legend() → ax.legend(), etc.), functions to set limits, labels, and titles are slightly modified:
• plt.xlabel() → ax.set_xlabel()
• plt.ylabel() → ax.set_ylabel()
• plt.xlim() → ax.set_xlim()
• plt.ylim() → ax.set_ylim()
• plt.title() → ax.set_title()
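The mapping listed above could be exercised as in this sketch, which rebuilds a labelled plot using only Axes methods. The data and label strings are assumed examples.

```python
import matplotlib as mpl

mpl.use("Agg")  # assumption: non-interactive backend so this runs without a display

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")  # plt.plot()   -> ax.plot()
ax.set_xlabel("x")                     # plt.xlabel() -> ax.set_xlabel()
ax.set_ylabel("sin(x)")                # plt.ylabel() -> ax.set_ylabel()
ax.set_xlim(0, 10)                     # plt.xlim()   -> ax.set_xlim()
ax.set_ylim(-2, 2)                     # plt.ylim()   -> ax.set_ylim()
ax.set_title("A Simple Plot")          # plt.title()  -> ax.set_title()
ax.legend()                            # plt.legend() -> ax.legend()
```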
Scatter Plots with plt.plot

In plt.plot, the marker character represents the type of symbol used for plotting.
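A small sketch of a scatter plot drawn with plt.plot, where the "o" marker character selects circular symbols. The sine data are an assumed example.

```python
import matplotlib as mpl

mpl.use("Agg")  # assumption: non-interactive backend so this runs without a display

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 30)
y = np.sin(x)

plt.figure()
# "o" asks for circle markers; with no line code given, no connecting line is drawn
(points,) = plt.plot(x, y, "o", color="black")
```

Other marker characters such as ".", "x", "+", "s", and "d" can be used in the same position.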
Simple Scatter Plots
For even more possibilities, these character codes can be used together with line and color codes to plot points along with a line connecting them:

Additional keyword arguments to plt.plot specify a wide range of properties of the lines and markers:
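Both ideas can be sketched together: a combined format string first, then explicit keyword arguments. The specific sizes and colors are arbitrary example values.

```python
import matplotlib as mpl

mpl.use("Agg")  # assumption: non-interactive backend so this runs without a display

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 30)
y = np.sin(x)

plt.figure()
# "-ok": solid line (-), circle markers (o), black (k)
(line,) = plt.plot(x, y, "-ok")

plt.figure()
# Keyword arguments control line and marker properties individually
(styled,) = plt.plot(
    x, y, "-p",
    color="gray",
    markersize=15,
    linewidth=4,
    markerfacecolor="white",
    markeredgecolor="gray",
    markeredgewidth=2,
)
plt.ylim(-1.2, 1.2)
```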
Scatter Plots with plt.scatter

A second, more powerful method of creating scatter plots is the plt.scatter function, which can be used very similarly to the plt.plot function:

The primary difference of plt.scatter from plt.plot is that it can be used to create scatter plots where the properties of each individual point (size, face color, edge color, etc.) can be individually controlled or mapped to data.
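A sketch of per-point properties with plt.scatter, using randomly generated data as a stand-in for a real dataset. The colormap and the size scaling are arbitrary choices.

```python
import matplotlib as mpl

mpl.use("Agg")  # assumption: non-interactive backend so this runs without a display

import matplotlib.pyplot as plt
import numpy as np

rng = np.random.RandomState(0)  # fixed seed so the example is reproducible
x = rng.randn(100)
y = rng.randn(100)
colors = rng.rand(100)          # one color value per point
sizes = 1000 * rng.rand(100)    # one marker area per point (in points squared)

plt.figure()
# c and s accept arrays, so every point gets its own color and size
points = plt.scatter(x, y, c=colors, s=sizes, alpha=0.3, cmap="viridis")
plt.colorbar()  # show the color scale
```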
Bar
• Vertical
• Horizontal
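A minimal sketch of the two orientations, assuming small made-up categories and values: Axes.bar draws vertical bars and Axes.barh draws horizontal ones.

```python
import matplotlib as mpl

mpl.use("Agg")  # assumption: non-interactive backend so this runs without a display

import matplotlib.pyplot as plt

labels = ["A", "B", "C"]
values = [2, 4, 16]

fig, (ax_v, ax_h) = plt.subplots(1, 2)

# Vertical bars: categories on the x-axis, values as bar heights
vertical = ax_v.bar(labels, values)

# Horizontal bars: categories on the y-axis, values as bar widths
horizontal = ax_h.barh(labels, values)
```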
Histograms
A simple histogram can be a great first step in understanding a dataset.

The hist() function has many options to tune both the calculation and the display.
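A sketch of a histogram with a few of those options set, using normally distributed random numbers as assumed sample data. bins affects the calculation, while alpha, histtype, and the colors affect only the display.

```python
import matplotlib as mpl

mpl.use("Agg")  # assumption: non-interactive backend so this runs without a display

import matplotlib.pyplot as plt
import numpy as np

rng = np.random.RandomState(0)
data = rng.randn(1000)

plt.figure()
# plt.hist returns the bin counts, the bin edges, and the drawn patches
counts, bin_edges, patches = plt.hist(
    data,
    bins=30,
    alpha=0.5,
    histtype="stepfilled",
    color="steelblue",
    edgecolor="none",
)
```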
Multiple Subplots
Sometimes it is helpful to compare different views of data side by side. To this end,
Matplotlib has the concept of subplots: groups of smaller axes that can exist together
within a single figure.
plt.axes: Subplots by Hand
The most basic method of creating an axes is to use the plt.axes function. It takes an optional list of four numbers in the figure coordinate system, each ranging from 0 to 1: [left, bottom, width, height].
plt.subplot: Simple Grids of Subplots
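A sketch of both approaches: an inset axes positioned by hand with plt.axes, then a 2 x 3 grid built one panel at a time with plt.subplot. The position values and grid size are arbitrary example choices.

```python
import matplotlib as mpl

mpl.use("Agg")  # assumption: non-interactive backend so this runs without a display

import matplotlib.pyplot as plt

# plt.axes: a standard axes plus a small inset in the upper-right corner
fig = plt.figure()
ax1 = plt.axes()                          # default axes filling the figure
ax2 = plt.axes([0.65, 0.65, 0.2, 0.2])    # [left, bottom, width, height]

# plt.subplot: panels are numbered from 1, left to right, top to bottom
grid_fig = plt.figure()
for i in range(1, 7):
    plt.subplot(2, 3, i)                  # (rows, columns, panel index)
    plt.text(0.5, 0.5, str((2, 3, i)), fontsize=18, ha="center")
```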
plt.subplots: The Whole Grid in One Go
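A sketch of creating the whole grid in a single call. The 2 x 3 shape and the shared-axis settings are example choices; plt.subplots returns the Figure and a NumPy array of Axes indexed by [row, column].

```python
import matplotlib as mpl

mpl.use("Agg")  # assumption: non-interactive backend so this runs without a display

import matplotlib.pyplot as plt

# sharex="col" and sharey="row" make panels share axes, removing inner tick labels
fig, ax = plt.subplots(2, 3, sharex="col", sharey="row")

# Unlike plt.subplot, the array of Axes uses zero-based [row, column] indexing
for i in range(2):
    for j in range(3):
        ax[i, j].text(0.5, 0.5, str((i, j)), fontsize=18, ha="center")
```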
Practice: Data Visualization - California Cities
Introduction to Seaborn
The Python visualization library Seaborn is based on matplotlib and provides a high-
level interface for drawing attractive statistical graphics. Make use of the following
aliases to import the libraries:

# import three necessary libraries
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

# to ignore the warnings
from warnings import filterwarnings
filterwarnings("ignore")

The basic steps to creating plots with Seaborn are:
1. Prepare some data
2. Control figure aesthetics
3. Plot with Seaborn
4. Further customize your plot
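The four steps above can be sketched as follows. The small DataFrame and its column names are hypothetical data invented for illustration, and the "whitegrid" style is just one of Seaborn's built-in options.

```python
import matplotlib as mpl

mpl.use("Agg")  # assumption: non-interactive backend so this runs without a display

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# 1. Prepare some data (hypothetical values)
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5, 6],
    "score": [52, 55, 61, 70, 74, 83],
})

# 2. Control figure aesthetics
sns.set_style("whitegrid")

# 3. Plot with Seaborn
fig, ax = plt.subplots()
sns.scatterplot(data=df, x="hours", y="score", ax=ax)

# 4. Further customize the plot through the Matplotlib Axes
ax.set_title("Score vs. hours studied")
ax.set_xlabel("Hours studied")
ax.set_ylabel("Exam score")
```

Because Seaborn draws onto Matplotlib Axes, step 4 uses the same ax.set_* methods as any other Matplotlib plot.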
Distribution: Hist, KDE

sns.kdeplot(pokemon_df.Attack);
Joint Plot
Pair plots
When you generalize joint plots to
datasets of larger dimensions, you
end up with pair plots. This is very
useful for exploring correlations
between multidimensional data, when
you'd like to plot all pairs of values
against each other.
Practice
Iris dataset
Facet Plots

• Faceted histograms
Sometimes the best way to
view data is via histograms of
subsets. Seaborn's
FacetGrid makes this
extremely simple. We'll take
a look at some data that
shows the amount that
restaurant staff receive in tips
based on various indicator
data:
Factor and Bar plots

• Factor plots (sns.factorplot in older Seaborn releases, renamed sns.catplot in version 0.9)
• Bar plots
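A sketch of both plot types on synthetic data with assumed column names. It uses sns.catplot, the current name for factor plots, with kind="box", and sns.barplot for the bar plot.

```python
import matplotlib as mpl

mpl.use("Agg")  # assumption: non-interactive backend so this runs without a display

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: 25 bills for each of four days
rng = np.random.RandomState(0)
days = ["Thur", "Fri", "Sat", "Sun"]
tips = pd.DataFrame({
    "day": np.repeat(days, 25),
    "sex": np.tile(["Female", "Male"], 50),
    "total_bill": rng.uniform(10, 50, 100),
})

# Factor (categorical) plot: box plots of total_bill per day, split by sex
box_grid = sns.catplot(data=tips, x="day", y="total_bill", hue="sex", kind="box")

# Bar plot: mean total_bill per day, with error bars estimated by Seaborn
fig, ax = plt.subplots()
sns.barplot(data=tips, x="day", y="total_bill", ax=ax)
```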


Heatmap
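A sketch of a heatmap applied to a correlation matrix, a common use of this plot type. The DataFrame is synthetic with made-up column names; the colormap and value range are example choices.

```python
import matplotlib as mpl

mpl.use("Agg")  # assumption: non-interactive backend so this runs without a display

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: b is built to correlate with a, c is independent noise
rng = np.random.RandomState(0)
a = rng.normal(size=200)
df = pd.DataFrame({
    "a": a,
    "b": 0.8 * a + 0.2 * rng.normal(size=200),
    "c": rng.normal(size=200),
})

# Pairwise Pearson correlations between the columns
corr = df.corr()

fig, ax = plt.subplots()
# annot=True writes each coefficient inside its cell
sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
```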
Practice: Visualizing Correlation
