Chapter11_DataVisualization2
VISUALIZATION
Mathematical Economics Faculty
Outline
Introduction
• At the heart of any data science workflow is data exploration. Most commonly, we explore data
by using the following:
• Statistical methods (measuring averages, measuring variability, …)
• Data visualization (transforming data into a visual form)
• The other central task is communicating and explaining the results we’ve found through
exploring data. Accordingly, there are two kinds of data visualization:
• Exploratory data visualization: we build graphs for ourselves to explore data and find
patterns
• Explanatory data visualization: we build graphs for others to communicate and explain the
patterns we’ve found through exploring data
Exploratory Data Visualization
Tell a story
Introduction to Matplotlib
Matplotlib is a comprehensive library for creating static, animated, and
interactive visualizations in Python.
$ python myplot.py
• One thing to be aware of: the plt.show() command should be used only once per
Python session, and is most often seen at the very end of the script.
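As a minimal sketch of such a script (the file name myplot.py comes from the command above; the sine and cosine data and the Agg backend line are assumptions added so the example runs anywhere), all plotting commands come first and plt.show() is called a single time at the end:

```python
# myplot.py: run from the command line with `python myplot.py`
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display; remove it to get a window
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)  # illustrative data, not taken from the slides

# All plotting commands are issued before the figure is shown.
plt.plot(x, np.sin(x), label="sin(x)")
plt.plot(x, np.cos(x), label="cos(x)")
plt.legend()

# plt.show() is called once, at the very end of the script.
plt.show()
```

With an interactive backend, plt.show() opens a window and blocks until it is closed. Under the Agg backend used here it displays nothing.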
Saving Figures to File
• To create a graph using the OO interface, we use the plt.subplots() function, which generates
an empty plot and returns a tuple of two objects
plt.subplots()
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
print(type(fig))
print(type(ax))
<class 'matplotlib.figure.Figure'>
<class 'matplotlib.axes._subplots.AxesSubplot'>
• The matplotlib.figure.Figure object acts as a canvas on which we can add one or more plots
• The matplotlib.axes._subplots.AxesSubplot object is the actual plot (recent Matplotlib releases report this type as matplotlib.axes._axes.Axes)
• In short, we have two objects:
• The Figure (the canvas)
• The Axes (the plot; not to be confused with an “axis”, which is the x-axis or y-axis of a plot)
• To create a bar plot, we use the Axes.bar() method and call plt.show()
• The final code:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.bar(['A', 'B', 'C'], [2, 4, 16])
plt.show()
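To illustrate the point that one Figure (the canvas) can hold more than one Axes (plot), here is a hedged sketch that places the bar plot next to a line plot of the same values. The 1-by-2 layout, the titles, the output file name, and the Agg backend line are assumptions for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# One Figure (canvas) holding two Axes (plots) side by side.
fig, axes = plt.subplots(1, 2, figsize=(8, 3))

# Left Axes: the bar plot from the slide.
axes[0].bar(['A', 'B', 'C'], [2, 4, 16])
axes[0].set_title("Bar plot")

# Right Axes: the same values drawn as a line.
axes[1].plot([1, 2, 3], [2, 4, 16])
axes[1].set_title("Line plot")

fig.savefig("two_axes.png")  # hypothetical output file name
```

Each Axes is configured through its own methods, while saving acts on the Figure as a whole.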
Simple Line Plots
Adjusting the Plot: Line Colors and Styles (I)
Adjusting the Plot: Line Colors and Styles (II)
Similarly, the line style can be adjusted
using the linestyle keyword:
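A short sketch of the linestyle keyword follows. The data, the vertical offsets, and the output file name are illustrative assumptions; the style names and short codes are standard Matplotlib values.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)  # illustrative data

plt.figure()
# Line styles given by name.
plt.plot(x, x + 0, linestyle="solid")
plt.plot(x, x + 1, linestyle="dashed")
plt.plot(x, x + 2, linestyle="dashdot")
plt.plot(x, x + 3, linestyle="dotted")
# The same styles have short codes: "-", "--", "-.", ":".
plt.plot(x, x + 4, linestyle="--", color="red")

ax = plt.gca()  # current Axes, kept for inspection
plt.savefig("linestyles.png")  # hypothetical output file name
```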
Matplotlib does a decent job of choosing default axes limits for your plot, but sometimes it's nice to have finer control. The most basic way to adjust axis limits is to use the plt.xlim() and plt.ylim() methods.
The plt.axis() method allows you to set the x and y limits with a single call, by passing a list which specifies [xmin, xmax, ymin, ymax].
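The two approaches to setting axis limits can be sketched as follows. The sine curve and the specific limit values are assumptions chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)  # illustrative data

# Option 1: set the x and y limits separately.
plt.figure()
plt.plot(x, np.sin(x))
plt.xlim(-1, 11)
plt.ylim(-1.5, 1.5)
limits_separate = plt.axis()  # with no arguments, returns (xmin, xmax, ymin, ymax)

# Option 2: set all four limits in a single call.
plt.figure()
plt.plot(x, np.sin(x))
plt.axis([-1, 11, -1.5, 1.5])  # [xmin, xmax, ymin, ymax]
limits_single = plt.axis()
```

Both figures end up with identical limits; plt.axis() is simply the more compact spelling.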
Labeling Plots
Titles and axis labels are the simplest such labels; there are methods that can be used to quickly set them.
When multiple lines are being shown within a single axes, it can be useful to create a plot legend that labels each line type. Again, Matplotlib has a built-in way of quickly creating such a legend. It is done via the plt.legend() method.
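A minimal sketch of labeling a plot: plt.title(), plt.xlabel(), and plt.ylabel() set the title and axis labels, and plt.legend() builds the legend from the label given to each line. The curves and label texts are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)  # illustrative data

plt.figure()
# The label of each line is what the legend will display.
plt.plot(x, np.sin(x), label="sin(x)")
plt.plot(x, np.cos(x), label="cos(x)")

# Title and axis labels.
plt.title("Sine and cosine")
plt.xlabel("x")
plt.ylabel("value")

# Build the legend from the line labels.
legend = plt.legend()
ax = plt.gca()  # current Axes, kept for inspection
```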
Practice: Plotting with Object-oriented interface
The character that represents the type of symbol used for plotting.
Simple Scatter Plots
For even more possibilities, these character codes can be used together with line and color codes to plot points along with a line connecting them.
Additional keyword arguments to plt.plot specify a wide range of properties of the lines and markers.
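Both ideas can be sketched with plt.plot. The first call combines a line code, a marker code, and a color code in one format string; the second sets line and marker properties through keyword arguments. The data and the particular property values are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 20)  # few points, so individual markers stay visible

plt.figure()
# Format string "-ok": solid line ("-"), circle markers ("o"), black ("k").
line_codes, = plt.plot(x, np.sin(x), "-ok")

# Keyword arguments control line and marker properties individually.
line_kwargs, = plt.plot(
    x, np.cos(x), "-p",            # solid line with pentagon markers
    color="gray",
    linewidth=3,
    markersize=12,
    markerfacecolor="white",
    markeredgecolor="gray",
    markeredgewidth=2,
)
```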
Scatter Plots with plt.scatter
import seaborn as sns
sns.kdeplot(pokemon_df["Attack"])  # pokemon_df: a DataFrame with an Attack column, loaded elsewhere
Joint Plot
Pair plots
When you generalize joint plots to
datasets of larger dimensions, you
end up with pair plots. This is very
useful for exploring correlations
between multidimensional data, when
you'd like to plot all pairs of values
against each other.
Practice
Iris dataset
Facet Plots
• Faceted histograms
Sometimes the best way to
view data is via histograms of
subsets. Seaborn's
FacetGrid makes this
extremely simple. We'll take
a look at some data that
shows the amount that
restaurant staff receive in tips
based on various indicator
data:
Factor and Bar plots