Unit 4 Data Visualization
Syllabus
Importing Matplotlib - Line plots - Scatter plots - visualizing errors - density and
contour plots - Histograms - legends - colors - subplots - text and annotation -
customization - three dimensional plotting - Geographic Data with Basemap -
Visualization with Seaborn.
• This window is a Matplotlib window, which allows us to see our graph, as well as
interact with it and navigate it.
• A grid can be added to a Matplotlib plot using the plt.grid() command. By default,
the grid is turned off. To turn on the grid use:
plt.grid(True)
• The basic calls are plt.grid(True) and plt.grid(False). Note that True and
False are capitalized and are not enclosed in quotes.
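• As a minimal sketch of the grid command (the data values are made up for illustration), the following draws a simple line plot with the grid turned on:

```python
import matplotlib.pyplot as plt

# Illustrative data, not taken from the text
x = [0, 1, 2, 3, 4]
y = [0, 1, 4, 9, 16]

fig, ax = plt.subplots()
ax.plot(x, y)
ax.grid(True)   # same effect as plt.grid(True) on the current axes
plt.show()
```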
Defining the Line Appearance and Working with Line Style
• Line styles help differentiate graphs by drawing the lines in various ways. Matplotlib
supports the following line styles.
• plt.plot() accepts an additional parameter, a format string, to control the colour and style of the plot.
plt.plot(xa, ya, 'g')
• This will make the line green. You can use any colour of red, green, blue, cyan,
magenta, yellow, white or black just by using the first character of the colour name in
lower case (use "k" for black, as "b" means blue).
• You can also alter the line style; for example, two dashes (--) make a dashed line.
This can be appended to the colour selector, like this:
plt.plot(xa, ya, 'r--')
• You can use "-" for a solid line (the default), "-." for dash-dot lines, or ":" for a dotted
line. Here is an example :
from matplotlib import pyplot as plt
import numpy as np
xa = np.linspace(0, 5, 20)
ya = xa**2
plt.plot(xa, ya, 'g')
ya = 3*xa
plt.plot(xa, ya, 'r--')
plt.show()
Output:
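• The format strings are a shorthand. As a sketch, the same appearance can also be requested with the color and linestyle keyword arguments of plot(); the offsets and labels below are chosen only for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

xa = np.linspace(0, 5, 20)

fig, ax = plt.subplots()
# One line per style; names and vertical offsets are illustrative
styles = {"solid": "-", "dashed": "--", "dash-dot": "-.", "dotted": ":"}
for offset, (name, style) in enumerate(styles.items()):
    ax.plot(xa, xa + offset, color="k", linestyle=style, label=name)
ax.legend()
plt.show()
```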
• Comparing plt.scatter() and plt.plot(): We can also produce the scatter plot shown
above using another function within matplotlib.pyplot. Matplotlib's plt.plot() is a
general-purpose plotting function that allows the user to create various line
or marker plots.
• We can achieve the same scatter plot as the one obtained in the section above with
the following call to plt.plot(), using the same data:
plt.plot(x, y, "o")
plt.show()
• In this case, we had to include the marker "o" as a third argument, as otherwise
plt.plot() would plot a line graph. The plot created with this code is identical to the
plot created earlier with plt.scatter().
• Here's a rule of thumb that we can use:
a) If we need a basic scatter plot, use plt.plot(), especially if we want to prioritize
performance.
b) If we want to customize our scatter plot by using more advanced plotting features,
use plt.scatter().
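• To illustrate point (b), here is a sketch of a plt.scatter()-style call in which every marker gets its own size and colour. The random data, the seed and the quantity mapped to colour are assumptions made for the example:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)      # fixed seed so the figure is reproducible
x = rng.uniform(0, 10, 30)
y = rng.uniform(0, 10, 30)
sizes = rng.uniform(20, 200, 30)    # marker area in points**2, one value per point
values = x + y                      # quantity mapped to colour (illustrative)

fig, ax = plt.subplots()
points = ax.scatter(x, y, s=sizes, c=values, cmap="viridis", alpha=0.7)
fig.colorbar(points, ax=ax, label="x + y")
plt.show()
```

Per-point sizes and colours like these are not available through the single marker argument of plt.plot(), which is why plt.scatter() is the better fit here.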
• Example: We can create a simple scatter plot in Python by passing x and y values
to plt.scatter():
# scatter_plotting.py
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
x = [2, 4, 6, 6, 9, 2, 7, 2, 6, 1, 8, 4, 5, 9, 1, 2, 3, 7, 5, 8, 1, 3]
y = [7, 8, 2, 4, 6, 4, 9, 5, 9, 3, 6, 7, 2, 4, 6, 7, 1, 9, 4, 3, 6, 9]
plt.scatter(x, y)
plt.show()
Output:
• When creating a contour plot, we can also specify the color map. There are different
classes of color maps. Matplotlib gives the following guidance :
a) Sequential: Change in lightness and often saturation of color incrementally, often
using a single hue; should be used for representing information that has ordering.
b) Diverging: Change in lightness and possibly saturation of two different colors that
meet in the middle at an unsaturated color; should be used when the information
being plotted has a critical middle value, such as topography or when the data
deviates around zero.
c) Cyclic : Change in lightness of two different colors that meet in the middle and
beginning/end at an unsaturated color; should be used for values that wrap around
at the endpoints, such as phase angle, wind direction, or time of day.
d) Qualitative: Often are miscellaneous colors; should be used to represent
information which does not have ordering or relationships.
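• As a sketch of choosing a colour map for a contour plot, the example below uses the diverging map "RdBu" for data that deviates around zero. The function plotted is an assumption chosen only because it takes both signs:

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative data: a product of sines that takes both signs
x = np.linspace(-np.pi, np.pi, 100)
y = np.linspace(-np.pi, np.pi, 100)
X, Y = np.meshgrid(x, y)
Z = np.sin(X) * np.sin(Y)

fig, ax = plt.subplots()
# Symmetric levels put the neutral middle of the diverging map at zero
levels = np.linspace(-1, 1, 11)
filled = ax.contourf(X, Y, Z, levels=levels, cmap="RdBu")
ax.contour(X, Y, Z, levels=[0], colors="k", linewidths=1)   # zero contour
fig.colorbar(filled, ax=ax)
plt.show()
```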
• This data has both positive and negative values, with zero representing a node for
the wave function. There are three important display options for contour plots: the
undisplaced shape key, the scale factor, and the contour scale.
a) The displaced shape option controls if and how the deformed model is shown in
comparison to the undeformed (original) geometry. The "Deformed shape only" option is the
default and provides no basis for comparison.
b) The "Deformed shape with undeformed edge" option overlays the contour plot on
an outline of the original model.
c) The "Deformed shape with undeformed model" option overlays the contour plot on
the original finite element model.
4.5 Histogram
• In a histogram, the data are grouped into ranges (e.g. 10 - 19, 20 - 29) and then
plotted as connected bars. Each bar represents a range of data. The width of each bar
is proportional to the width of each category, and the height is proportional to the
frequency or percentage of that category.
• It provides a visual interpretation of numerical data by showing the number of data
points that fall within a specified range of values called "bins".
• Fig. 5.5.1 shows a histogram.
• Histograms can display a large amount of data and the frequency of the data
values. The median and distribution of the data can be determined by a histogram. In
addition, it can show any outliers or gaps in the data.
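• The grouping into bins can be sketched without any plotting by using np.histogram, which returns the count for each bin. The marks and bin edges below are made up for illustration:

```python
import numpy as np

# Illustrative marks, not taken from the text
data = np.array([12, 15, 21, 22, 25, 29, 31, 34, 35, 38, 47])
edges = [10, 20, 30, 40, 50]   # four bins: [10, 20), [20, 30), [30, 40), [40, 50]

counts, bin_edges = np.histogram(data, bins=edges)
for low, high, n in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(low, "to", high, ":", n)
```

Each count becomes the height of one bar, and the distance between neighbouring edges becomes its width.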
• Matplotlib provides a dedicated function to compute and display histograms:
plt.hist()
• Code for creating histogram with randomized data :
import numpy as np
import matplotlib.pyplot as plt
x = 40* np.random.randn(50000)
plt.hist(x, 20, range=(-50, 50), histtype='stepfilled',
         align='mid', color='r', label='Test Data')
plt.legend()
plt.title('Histogram')
plt.show()
Legend
• Plot legends give meaning to a visualization, assigning labels to the various plot
elements. In maps, legends describe the pictorial language or symbology
of the map. In line graphs, legends explain the function or the values
underlying the different lines of the graph.
• Matplotlib has native support for legends. A legend can be placed inside or outside
the chart, and its position can be changed. The legend() method adds the legend to
the plot.
• To place the legend inside, simply call legend():
import matplotlib.pyplot as plt
import numpy as np
y = [2,4,6,8,10,12,14,16,18,20]
y2 = [10,11,12,13,14,15,16,17,18,19]
x = np.arange(10)
fig = plt.figure()
ax = plt.subplot(111)
ax.plot(x, y, label='y = numbers')
ax.plot(x, y2, label='y2 = other numbers')
plt.title('Legend inside')
ax.legend()
plt.show()
Output:
• If we add a label to the plot function, its value will be used as the label in the
legend. The legend function also accepts the parameter "loc", which defines the
location of the legend inside the axes:
# Assumes a local module polynomials.py that defines a Polynomial class
from polynomials import Polynomial
import numpy as np
import matplotlib.pyplot as plt
p = Polynomial(-0.8, 2.3, 0.5, 1, 0.2)
p_der = p.derivative()
fig, ax = plt.subplots()
X = np.linspace(-2, 3, 50, endpoint=True)
F = p(X)
F_derivative = p_der(X)
ax.plot(X, F, label="p")
ax.plot(X, F_derivative, label="derivative of p")
ax.legend(loc='upper left')
plt.show()
Output:
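• A legend can also sit outside the chart. One way to do this, sketched below, combines loc with the bbox_to_anchor parameter of legend(); the anchor position (1.02, 1.0) is an assumption chosen for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(10)

fig, ax = plt.subplots()
ax.plot(x, 2 * x + 2, label="y = 2x + 2")
ax.plot(x, x + 10, label="y2 = x + 10")

# Anchor the legend's upper-left corner just outside the right edge of the axes
ax.legend(loc="upper left", bbox_to_anchor=(1.02, 1.0))
fig.tight_layout()   # make room so the legend is not clipped
plt.show()
```

Here loc names the corner of the legend box that is pinned, and bbox_to_anchor gives the point, in axes coordinates, where that corner is placed.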
• There are at least three different ways to create plots (called axes) in Matplotlib. They
are: plt.axes(), figure.add_axes() and plt.subplots().
• plt.axes(): The most basic method of creating an axes is to use the plt.axes
function. It takes an optional argument of four numbers. These numbers
represent [left, bottom, width, height] in the figure coordinate system, which ranges
from 0 at the bottom left of the figure to 1 at the top right of the figure.
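• As a sketch of the figure coordinate system, the example below places a small inset axes in the upper right of a larger one. The four numbers are given in the order left, bottom, width, height; the specific values and the plotted curves are illustrative:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 200)

fig = plt.figure()
main_ax = plt.axes([0.1, 0.1, 0.8, 0.8])       # [left, bottom, width, height]
inset_ax = plt.axes([0.6, 0.6, 0.25, 0.25])    # starts at 60% across and 60% up

main_ax.plot(x, np.sin(x))
inset_ax.plot(x, np.cos(x), "r--")
plt.show()
```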
• Plot just one figure with (x,y) coordinates: plt.plot(x, y).
• By calling subplot(n,m,k), we subdivide the figure into n rows and m columns and
specify that plotting should be done on subplot number k. Subplots are
numbered row by row, from left to right.
import matplotlib.pyplot as plt
import numpy as np
from math import pi
plt.figure(figsize=(8, 4))  # set dimensions of the figure
x = np.linspace(0, 2*pi, 100)
for i in range(1, 7):
    plt.subplot(2, 3, i)  # create subplots on a grid with 2 rows and 3 columns
    plt.xticks([])  # set no ticks on x-axis
    plt.yticks([])  # set no ticks on y-axis
    plt.plot(np.sin(x), np.cos(i*x))
    plt.title('subplot' + '(2,3,' + str(i) + ')')
plt.show()
Output:
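• plt.subplots() creates the figure and the whole grid of axes in a single call. A minimal sketch follows; the curves, titles and the optional sharex/sharey settings are chosen for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)

# axes is a 2x3 NumPy array of Axes objects; sharex/sharey are optional
fig, axes = plt.subplots(2, 3, figsize=(8, 4), sharex=True, sharey=True)
for i, ax in enumerate(axes.flat, start=1):
    ax.plot(x, np.sin(i * x))
    ax.set_title(f"sin({i}x)")
fig.tight_layout()
plt.show()
```

Because every subplot is returned as an object, each one can be addressed directly (for example axes[0, 1]) instead of relying on the current subplot.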
Example (using the Plotly library rather than Matplotlib):
import plotly.graph_objects as go
fig=go.Figure()
fig.add_trace(go.Scatter(
x=[0,1,2,3,4,5,6,7,8],
y=[0,1,3,2,4,3,4,6,5]
))
fig.add_trace(go.Scatter(
x=[0,1,2,3,4,5,6,7,8],
y=[0,4,5,1,2,2,3,4,2]
))
fig.add_annotation(x=2,y=5,
text="Text annotation with arrow",
showarrow=True,
arrowhead=1)
fig.add_annotation(x=4,y=4,
text="Text annotation without arrow",
showarrow=False,
yshift = 10)
fig.update_layout(showlegend=False)
fig.show()
Output:
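• A comparable figure can be sketched in Matplotlib itself with ax.annotate() for the arrow and ax.text() for plain text. The data repeats the example above; the text positions are assumptions chosen so the labels do not overlap the lines:

```python
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4, 5, 6, 7, 8]
y1 = [0, 1, 3, 2, 4, 3, 4, 6, 5]
y2 = [0, 4, 5, 1, 2, 2, 3, 4, 2]

fig, ax = plt.subplots()
ax.plot(x, y1, marker="o")
ax.plot(x, y2, marker="o")

# Annotation with an arrow: xy is the point marked, xytext is where the text sits
arrow_note = ax.annotate("Text annotation with arrow",
                         xy=(2, 5), xytext=(3, 5.8),
                         arrowprops=dict(arrowstyle="->"))

# Plain text placed slightly above the point (4, 4), centred horizontally
plain_note = ax.text(4, 4.3, "Text annotation without arrow", ha="center")
plt.show()
```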
4.7 Customization
• A tick is a short line on an axis. For category axes, ticks separate each category. For
value axes, ticks mark the major divisions and show the exact point on an axis that
the axis label defines. Ticks are always the same color and line style as the axis.
• Ticks are the markers denoting data points on axes. Matplotlib's default tick
locators and formatters are designed to be generally sufficient in many common
situations. Position and labels of ticks can be explicitly mentioned to suit specific
requirements.
• Fig. 5.9.1 shows ticks.
• Ticks come in two types: major and minor.
a) Major ticks separate the axis into major units. On category axes, major ticks are
the only ticks available. On value axes, one major tick appears for every major axis
division.
b) Minor ticks subdivide the major tick units. They can only appear on value axes.
One minor tick appears for every minor axis division.
• By default, major ticks appear for value axes. xticks is a pyplot function that can be used
to get or to set the current tick locations and labels.
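• A short sketch of both uses of xticks follows. The sine curve and the tick labels are illustrative choices:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x))

# Set tick locations and labels in one call
plt.xticks([0, np.pi / 2, np.pi, 3 * np.pi / 2, 2 * np.pi],
           ["0", "π/2", "π", "3π/2", "2π"])

# Called with no arguments, xticks returns the current locations and labels
locations, labels = plt.xticks()
print(locations)
plt.show()
```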
• The following program creates a plot with both major and minor tick marks,
customized to be longer and thicker than the default, with the major tick marks pointing
into and out of the plot area.
import numpy as np
import matplotlib.pyplot as plt
# A selection of functions on rn abscissa points for 0 <= x < 1
rn = 100
rx = np.linspace(0, 1, rn, endpoint=False)
def tophat(rx):
    """Top hat function: y = 1 for x < 0.5, y=0 for x >= 0.5"""
    ry = np.ones(rn)
    ry[rx >= 0.5] = 0
    return ry
# A dictionary of functions to choose from
ry = {'half-sawtooth': lambda rx: rx.copy(),
      'top-hat': tophat,
      'sawtooth': lambda rx: 2*np.abs(rx-0.5)}
# Repeat the chosen function nrep times
nrep = 4
x = np.linspace(0, nrep, nrep*rn, endpoint=False)
y = np.tile(ry['top-hat'](rx), nrep)
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(x, y, 'k', lw=2)
# Add a bit of padding around the plotted line to aid visualization
ax.set_ylim(-0.1, 1.1)
ax.set_xlim(x[0]-0.5, x[-1]+0.5)
# Customize the tick marks and turn the grid on
ax.minorticks_on()
ax.tick_params(which='major', length=10, width=2, direction='inout')
ax.tick_params(which='minor', length=5, width=2, direction='in')
ax.grid(which='both')
plt.show()
Output:
Example :
import numpy as np
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(8, 8))
ax = plt.axes(projection='3d')
ax.grid()
t = np.arange(0, 10*np.pi, np.pi/50)
x = np.sin(t)
y = np.cos(t)
ax.plot3D(x, y, t)
ax.set_title('3D Parametric Plot')
# Set axes label
ax.set_xlabel('x', labelpad=20)
ax.set_ylabel('y', labelpad=20)
ax.set_zlabel('t', labelpad=20)
plt.show()
Output:
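• Besides parametric curves, a 3D axes can also draw surfaces. The sketch below uses plot_surface on a grid built with np.meshgrid; the surface function and colour map are assumptions chosen for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative surface z = sin(sqrt(x^2 + y^2))
x = np.linspace(-5, 5, 50)
y = np.linspace(-5, 5, 50)
X, Y = np.meshgrid(x, y)
Z = np.sin(np.sqrt(X**2 + Y**2))

fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(projection="3d")
surface = ax.plot_surface(X, Y, Z, cmap="viridis")
fig.colorbar(surface, ax=ax, shrink=0.6)
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_zlabel("z")
plt.show()
```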
4.9 Geographic Data with Basemap
• Basemap is a toolkit under the Python visualization library Matplotlib. Its main
function is to draw 2D maps, which are important for visualizing spatial data.
Basemap itself does not do any plotting, but provides the ability to transform
coordinates into one of 25 different map projections.
• Matplotlib is then used to plot contours, images, vectors, lines or points in the
transformed coordinates. Basemap includes the GSHHG (formerly GSHHS) coastline dataset, as well as
datasets from GMT for rivers, states and national boundaries.
• These datasets can be used to plot coastlines, rivers and political boundaries on a
map at several different resolutions. Basemap uses the Geometry Engine-Open
Source (GEOS) library under the hood to clip coastline and boundary features to the
desired map projection area. In addition, Basemap provides the ability to read
shapefiles.
• Historically, Basemap could not be installed using pip install basemap. If Anaconda is installed,
you can install Basemap using conda install basemap.
• Example plotting methods in Basemap:
a) contour(): Draw contour lines.
b) contourf(): Draw filled contours.
c) imshow(): Draw an image.
d) pcolor(): Draw a pseudocolor plot.
e) pcolormesh(): Draw a pseudocolor plot (faster version for regular meshes).
f) plot(): Draw lines and/or markers.
g) scatter(): Draw points with markers.
h) quiver(): Draw vectors (a vector map).
i) barbs(): Draw wind barbs.
j) drawgreatcircle(): Draw a great circle route.
• For example, if we wanted to show all the different types of endangered plants
within a region, we would use a base map showing roads, provincial and state
boundaries, waterways and elevation. Onto this base map, we could add layers that
show the location of different categories of endangered plants. One added layer could
be trees, another layer could be mosses and lichens, another layer could be grasses.
Basemap basic usage:
import warnings
warnings.filterwarnings('ignore')
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
m = Basemap()  # default settings; named m to avoid shadowing the built-in map()
m.drawcoastlines()
# plt.show()
plt.savefig('test.png')
Output:
4.10 Visualization with Seaborn
• Seaborn is a Python data visualization library based on Matplotlib. It provides a
high-level interface for drawing attractive and informative statistical graphics.
Seaborn is an open-source Python library.
• Seaborn helps you explore and understand your data. Its plotting functions operate
on dataframes and arrays containing whole datasets and internally perform the
necessary semantic mapping and statistical aggregation to produce informative plots.
• Its API is dataset-oriented and declarative, which lets you focus on what the different
elements of your plots mean, rather than on the details of how to draw them.
• Key features:
a) Seaborn is a statistical plotting library
b) It has beautiful default styles
c) It also is designed to work very well with Pandas dataframe objects.
Seaborn works easily with dataframes and the Pandas library. The graphs created
can also be customized easily.
• Functionality that seaborn offers:
a) A dataset-oriented API for examining relationships between multiple variables
b) Convenient views onto the overall structure of complex datasets
c) Specialized support for using categorical variables to show observations or
aggregate statistics
d) Options for visualizing univariate or bivariate distributions and for comparing
them between subsets of data
e) Automatic estimation and plotting of linear regression models for different kinds of
dependent variables
f) High-level abstractions for structuring multi-plot grids that let you easily build
complex visualizations
g) Concise control over matplotlib figure styling with several built-in themes
h) Tools for choosing color palettes that faithfully reveal patterns in your data.
Plot a Scatter Plot in Seaborn :
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
df = pd.read_csv('worldHappiness2016.csv')
# The y column name is truncated in the source, so it is left as a placeholder here
sns.scatterplot(data=df, x="Economy (GDP per Capita)", y=...)
plt.show()
Output:
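• For a version that runs without any external file, here is a sketch that builds a small DataFrame in memory. The column names and values are hypothetical, made up only to show how the hue parameter maps a categorical variable to colour:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Small hypothetical dataset, made up for illustration
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5, 6, 7, 8],
    "exam_score": [52, 55, 61, 64, 70, 74, 79, 85],
    "group": ["A", "B", "A", "B", "A", "B", "A", "B"],
})

sns.set_theme()   # apply seaborn's default styling
ax = sns.scatterplot(data=df, x="hours_studied", y="exam_score", hue="group")
ax.set_title("Exam score against hours studied")
plt.show()
```

The call names the columns and leaves the axis labels, colours and legend to Seaborn, which is the dataset-oriented style described in this section.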