Unit 2 Machine Learning

This document provides an overview of data visualization techniques using the Matplotlib library in Python, focusing on controlling line properties, creating multiple plots, adding text and annotations, styling plots, and generating box plots and heatmaps. It includes code examples for various functionalities such as customizing line charts, creating subplots, and visualizing data distributions. Additionally, it discusses the use of different styles and the creation of scatter plots with histograms.

Uploaded by

Nischal Ghimire

Unit 1

Data Mining and Data Visualization


Controlling Line Properties of Charts
Matplotlib is a data visualization library for Python. Its pyplot module is a
collection of functions for creating a variety of charts. A line chart can be
created simply by calling pyplot's plot() function.
import matplotlib.pyplot as plt
x=[1,2,3,4,5,6,7]
y=[3,5,7,9,11,13,15]
plt.plot(x,y)
plt.xlabel("x")
plt.ylabel("y")
plt.title("Line Chart Example")
plt.show()

Many properties of a line can be set, such as its color, width, and dash style. There are
essentially three ways of doing this: using keyword arguments, using setter methods, and
using the setp() function.
Using Keyword Arguments
Keyword arguments (or named arguments) are values that, when passed into a function,
are identified by specific parameter names. They are passed using key=value
syntax. We can use keyword arguments to override the default values of line
properties, as below. Major keyword arguments supported by the plot() function are
linewidth, color, linestyle, label, and alpha.
import matplotlib.pyplot as plt
x=[1,2,3,4,5,6,7]
y=[3,5,7,9,11,13,15]
plt.plot(x,y, linewidth=4, linestyle="--", color="red", label="y=2x+1")
plt.xlabel("x")
plt.ylabel("y")
plt.title("Line Chart Example")
plt.legend(loc='upper center')
plt.show()
Using Setter Methods
The plot() function returns a list of line objects. For example, line, = plt.plot(x, y)
unpacks a single line object, and line1, line2 = plt.plot(x1, y1, x2, y2) unpacks two line
objects. Using the setter methods of a line object, we can then set the property that
needs to change. Major setter methods of line objects are set_label(), set_linewidth(),
set_linestyle(), and set_color().

Example
import matplotlib.pyplot as plt
x=[1,2,3,4,5,6,7]
y=[3,5,7,9,11,13,15]
line,=plt.plot(x,y)
print(line)
line.set_label("y=2x+1")
line.set_linewidth(4)
line.set_linestyle("-")
line.set_color("green")
plt.xlabel("x")
plt.ylabel("y")
plt.title("Line Chart Example")
plt.legend(loc='upper center')
plt.show()
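
When plot() is given several x/y pairs, it returns one line object per pair, and each can be styled through its own setters. The following is a minimal sketch of that case; the second data series (y2) and the chosen colors are illustrative assumptions, not part of the original example.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5, 6, 7]
y1 = [3, 5, 7, 9, 11, 13, 15]   # y = 2x + 1
y2 = [1, 4, 9, 16, 25, 36, 49]  # y = x^2 (illustrative second series)

# Two x/y pairs produce two line objects, unpacked here by name
line1, line2 = plt.plot(x, y1, x, y2)

# Style each line independently through its setter methods
line1.set_label("y=2x+1")
line1.set_color("green")
line1.set_linewidth(3)
line2.set_label("y=x^2")
line2.set_color("red")
line2.set_linestyle("--")

plt.xlabel("x")
plt.ylabel("y")
plt.legend(loc="upper left")
# plt.show()  # uncomment to display the chart
```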

Using setp Command


The setp() function in the pyplot module of Matplotlib can also be used to set the
properties of line objects. We can use either Python keyword arguments or
string/value pairs to set the properties.
Example
import matplotlib.pyplot as plt
x=[1,2,3,4,5,6,7]
y=[3,5,7,9,11,13,15]
line,=plt.plot(x,y)
plt.setp(line,linewidth=4,linestyle="dashdot",label="y=2x+1",color="red")
plt.xlabel("x")
plt.ylabel("y")
plt.title("Line Chart Example")
plt.legend(loc='upper center')
plt.show()
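
The string/value form of setp() mentioned above can be sketched as follows. It reuses the same data as the example; the property values are chosen only for illustration.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5, 6, 7]
y = [3, 5, 7, 9, 11, 13, 15]
line, = plt.plot(x, y)

# String/value pairs: a property name followed by its value, repeated
plt.setp(line, "linewidth", 4, "linestyle", "dashdot", "color", "red")

# Properties can be read back with the matching getter methods
print(line.get_linewidth(), line.get_color())
# plt.show()  # uncomment to display the chart
```

Both forms set the same properties; the keyword form is usually easier to read, while the string/value form is convenient when property names are held in variables.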
Creating Multiple Plots
One very useful feature of Matplotlib is that it makes it easy to draw multiple plots
in one figure so that they can be compared with each other. We can achieve this using
the subplot() function, which adds a subplot to a grid of subplots within a single
figure. We specify the number of rows and columns in the grid, as well as the index of
the subplot to draw in.
plt.subplot(211)

A subplot value of 211 means that the grid has two rows and one column, and that the
first subplot (the top one) is selected. It is shorthand for plt.subplot(2, 1, 1).
Example
import matplotlib.pyplot as plt
import numpy as np
x=[0,0.52,1.04,1.57,2.09,2.62,3.14]
y=np.sin(x)
plt.subplot(211)
plt.plot(x,y,linestyle="dashed",linewidth=2,label="sin(x)")
plt.xlabel("x")
plt.ylabel("sin(x)")
plt.title("Sin(x) vs. Cos(x) Curve")
plt.legend(loc='upper left')
plt.subplot(212)
y=np.cos(x)
plt.plot(x,y,linestyle="dashed", color="red",linewidth=2,label="cos(x)")
plt.xlabel("x")
plt.ylabel("cos(x)")
plt.legend(loc='upper center')
plt.show()
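
The same two-panel figure can also be built with the related plt.subplots() function, which creates the figure and the whole grid of axes in one call. This is a sketch of that alternative, assuming a denser x grid (np.linspace) than the example above; the variable names are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, np.pi, 50)

# One figure holding a 2-row, 1-column grid of axes that share the x-axis
fig, (ax_sin, ax_cos) = plt.subplots(nrows=2, ncols=1, sharex=True)

ax_sin.plot(x, np.sin(x), linestyle="dashed", linewidth=2, label="sin(x)")
ax_sin.set_ylabel("sin(x)")
ax_sin.set_title("Sin(x) vs. Cos(x) Curve")
ax_sin.legend(loc="upper left")

ax_cos.plot(x, np.cos(x), linestyle="dashed", color="red", linewidth=2,
            label="cos(x)")
ax_cos.set_xlabel("x")
ax_cos.set_ylabel("cos(x)")
ax_cos.legend(loc="upper center")

fig.tight_layout()  # keep titles and labels from overlapping
# plt.show()  # uncomment to display the figure
```

Working with the returned axes objects avoids having to track which subplot is currently active.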

Playing With Text


The matplotlib.pyplot.text() function is used to add text inside the plot. It adds text at an
arbitrary location of the axes. It also supports mathematical expressions.
import matplotlib.pyplot as plt
import numpy as np

#Generate Data for Parabola


x = np.arange(-20, 21, 1)
y = 2*x**2
#adding text inside the plot
plt.text(-10,400 , 'Parabola Y =2x^2', fontsize = 20)
plt.plot(x, y, color='green')
plt.xlabel("x")
plt.ylabel("y=2x^2")
plt.show()

We can also add mathematical equations as text inside the plot by following LaTeX
syntax. This is done by enclosing the expression in $ symbols.
Example
import matplotlib.pyplot as plt
import numpy as np

#Generate data for the parabola
x = np.arange(-20, 21, 1)
y = 2*x**2

#adding text inside the plot


plt.text(-10,400 , 'Parabola $Y =2x^2$', fontsize = 20)
plt.plot(x, y, color='green')
plt.xlabel("x")
plt.ylabel("y=2x^2")
plt.show()
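
Expressions that use LaTeX commands contain backslashes, so they are usually written as raw strings. The sketch below assumes a different curve (a square root) purely to show a command such as \sqrt; it is not part of the original example.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)
y = np.sqrt(x)

plt.plot(x, y, color="green")
# The r prefix makes a raw string, so the backslash in \sqrt is kept as-is
label = plt.text(4, 1, r"$y = \sqrt{x}$", fontsize=20)
plt.xlabel("x")
plt.ylabel("y")
# plt.show()  # uncomment to display the chart
```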

We can also use the text() function to display text over the bars of a bar chart,
placing each label at a specific location relative to its bar.

Example
import matplotlib.pyplot as plt
x = ['A', 'B', 'C', 'D', 'E']
y = [1, 3, 2, 5, 4]
percentage = [10, 30, 20, 50, 40]
plt.figure(figsize=(3,4))
plt.bar(x, y)
for i in range(len(x)):
    plt.text(x[i], y[i], percentage[i])
plt.show()
The annotate() function in the pyplot module is used to annotate the point xy with the
specified text. To add a text annotation with an arrow to a Matplotlib chart, we need
to set at least the text, the coordinates of the point to be highlighted with an arrow
(xy), the coordinates of the text (xytext), and the properties of the arrow (arrowprops).
Example
import numpy as np
import matplotlib.pyplot as plt
x=np.arange(0,10,0.25)
y=np.sin(x)
plt.plot(x,y)
plt.annotate('Minimum',xy = (4.75, -1),xytext = (4.75, 0.2),
arrowprops = dict(facecolor = 'black',width=0.2),
horizontalalignment = 'center')

plt.show()
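
In the example above the coordinates of the minimum are typed in by hand. A small variation, sketched here with np.argmin, locates the smallest sampled value first and then annotates it; the text offset of 1.2 is an arbitrary choice.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(0, 10, 0.25)
y = np.sin(x)

# Locate the smallest sampled value instead of reading it off the chart
i_min = np.argmin(y)
x_min, y_min = x[i_min], y[i_min]

plt.plot(x, y)
note = plt.annotate("Minimum",
                    xy=(x_min, y_min),            # point the arrow targets
                    xytext=(x_min, y_min + 1.2),  # where the label is drawn
                    arrowprops=dict(facecolor="black", width=0.2),
                    horizontalalignment="center")
# plt.show()  # uncomment to display the chart
```

For this grid the smallest sample falls at x = 4.75, the same point used in the hand-written example.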

Styling Plots
Matplotlib ships with a number of predefined style sheets. Their names can be listed
through plt.style.available, which returns all style sheet names that can be passed as
an argument to plt.style.use().
Example
import matplotlib.pyplot as plt
ls=plt.style.available
print("Number of Styles:",len(ls))
print("List of Styles:",ls)

ggplot is a popular data visualization package in the R programming language. Its name
stands for "Grammar of Graphics plot". To apply ggplot styling to a plot created in
Matplotlib, we can use the following syntax:
plt.style.use('ggplot')

This style adds a light grey background with white gridlines and uses slightly larger
axis tick labels. Calling plt.style.use('ggplot') before plotting applies the style to
every plot created afterwards.
Example
from scipy import stats
import matplotlib.pyplot as plt
dist=stats.norm(loc=150,scale=20)
data=dist.rvs(size=1000)
plt.style.use('ggplot')
plt.hist(data,bins=100,color='blue')
plt.show()

The FiveThirtyEight style is another way of styling plots in matplotlib.pyplot. It is
based on the popular American blog FiveThirtyEight, which provides economic, sports,
and political analysis. The fivethirtyeight stylesheet in Matplotlib draws gridlines on
the plot area with bold x and y tick labels. The default colors of the bars in a bar
plot or the lines in a line chart are bright and distinguishable. The style is used as
shown below.
Example
import matplotlib.pyplot as plt
plt.style.use("fivethirtyeight")
a = [2, 3, 4, 3, 4, 5, 3]
b = [4, 5, 5, 7, 9, 8, 6]
plt.figure(figsize = (4,3))
plt.plot(a, marker='o',linewidth=1,color='blue')
plt.plot(b, marker='v',linewidth=1,color='red')
plt.show()

The dark_background stylesheet is a third popular style, based on dark mode.
Applying this stylesheet makes the plot background black and the tick and label colors
white, for contrast. In the foreground, bars and lines default to light colors so that
the plot stays readable.
Example
import matplotlib.pyplot as plt
plt.style.use("dark_background")
a = [1, 2, 3, 4, 5, 6, 7]
b = [1, 4, 9, 16, 25, 36, 49]
plt.figure(figsize = (4,3))
plt.plot(a, marker='o',linewidth=1,color='blue')
plt.plot(b, marker='v',linewidth=1,color='red')
plt.show()
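
plt.style.use() changes the style for everything plotted afterwards in the session. When only one figure should be styled, the plt.style.context() context manager can be used instead. The sketch below checks this by reading one setting (the axes background color) before, inside, and after the block; the variable names are illustrative.

```python
import matplotlib.pyplot as plt

face_before = plt.rcParams["axes.facecolor"]

# The style applies only inside the with-block
with plt.style.context("ggplot"):
    face_inside = plt.rcParams["axes.facecolor"]
    fig, ax = plt.subplots(figsize=(4, 3))
    ax.plot([2, 3, 4, 3, 4, 5, 3], marker="o")

face_after = plt.rcParams["axes.facecolor"]  # restored on exit
print(face_before, face_inside, face_after)
# plt.show()  # uncomment to display the figure
```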
Box Plots
A box plot is a way to visualize the distribution of data by using a box and some
vertical lines. It is also known as a box-and-whisker plot. The distribution is
summarized by five key values, which are as follows:
• Minimum (lower whisker limit): Q1 - 1.5*IQR
• 1st quartile (Q1): 25th percentile
• Median: 50th percentile
• 3rd quartile (Q3): 75th percentile
• Maximum (upper whisker limit): Q3 + 1.5*IQR

Here IQR represents the InterQuartile Range which starts from the first quartile (Q1) and
ends at the third quartile (Q3). Thus, IQR=Q3-Q1.
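
The quantities above can be computed directly with NumPy. This is a minimal sketch on a small made-up sample (the numbers are not from the original notes); values falling outside the two limits are the ones a box plot would mark as outliers.

```python
import numpy as np

# Small made-up sample with one obviously extreme value (50)
data = np.array([12, 15, 14, 10, 18, 20, 16, 13, 17, 50])

q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Values outside the limits are the outliers
outliers = data[(data < lower_limit) | (data > upper_limit)]

print("Q1:", q1, "Median:", median, "Q3:", q3, "IQR:", iqr)
print("Limits:", lower_limit, upper_limit)
print("Outliers:", outliers)
```

For this sample Q1 = 13.25 and Q3 = 17.75, so IQR = 4.5 and the limits are 6.5 and 24.5; the value 50 is the only outlier.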

In the box plot, points that fall outside the range from minimum to maximum are called
outliers. We can create the box plot of the data to determine the following:

• The number of outliers in a dataset
• Whether the data is skewed
• The range of the data

The range of the data from minimum to maximum is bounded by the whisker limits. In
Python, we use Matplotlib's pyplot module, which has a built-in function named
boxplot() that creates the box plot of any data set. Multiple boxes can be created
just by passing a list of data sets to boxplot().
Example
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
dist = stats.norm(100, 30)
data=dist.rvs(size=500)
plt.figure(figsize =(6, 4))
plt.boxplot(data)
plt.show()

Example 2
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
dist = stats.norm(100, 50)
data1=dist.rvs(size=500)
data2=dist.rvs(size=500)
data3=dist.rvs(size=500)
plt.figure(figsize =(6, 4))
plt.boxplot([data1,data2,data3])
plt.show()

Horizontal box plots can be created by setting vert=0 (or vert=False) when creating the
box plot. Boxes in the plot can be filled by setting patch_artist=True. The boxplot()
function returns a Python dictionary with keys such as boxes, whiskers, fliers, caps,
and medians. Each key maps to a list of artists, and we can change their properties by
calling the set() method on each one.
Example
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
dist = stats.norm(100, 50)
data=dist.rvs(size=500)
plt.figure(figsize =(6, 4))
bp=plt.boxplot(data,vert=0,patch_artist=True)
for b in bp['boxes']:
    b.set(color='blue',facecolor='cyan',linewidth=2)
for w in bp['whiskers']:
    w.set(linestyle='--',linewidth=1, color='green')
for f in bp['fliers']:
    f.set(marker='D', color='black',alpha=1)
for m in bp['medians']:
    m.set(color='yellow',linewidth=2)
for c in bp['caps']:
    c.set(color='red')
plt.show()
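
The dictionary returned by boxplot() can also be read, not only styled. As a sketch, the outlier points drawn for a vertical box can be taken from the 'fliers' entry and counted; the seeded NumPy generator below stands in for the scipy distribution used above and is an assumption made for repeatability.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # fixed seed, for repeatability
data = rng.normal(loc=100, scale=30, size=500)

bp = plt.boxplot(data)

# bp["fliers"] holds one line object per box; its y-data are the outlier values
outlier_values = bp["fliers"][0].get_ydata()
print("Number of outliers:", len(outlier_values))

# Cross-check against the 1.5*IQR rule computed directly
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
manual = data[(data < q1 - 1.5 * iqr) | (data > q3 + 1.5 * iqr)]
print("Counted by hand:", len(manual))
# plt.show()  # uncomment to display the chart
```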

Heatmaps
A heatmap (or heat map) is a graphical representation of data where values are depicted
by color. A simple heat map provides an immediate visual summary of information
across two axes, allowing users to quickly grasp the most important or relevant data
points. More elaborate heat maps allow the viewer to understand complex data sets. All
heat maps share one thing in common -- they use different colors or different shades of
the same color to represent different values and to communicate the relationships that
may exist between the variables plotted on the x-axis and y-axis. Usually, a darker color
or shade represents a higher or greater quantity of the value being represented in the heat
map. For instance, a heat map showing the rain distribution (range of values) of a city
grouped by month may use varying shades of red, yellow and blue. The months may be
mapped on the y axis and the rain ranges on the x axis. The lightest color (i.e., blue) would
represent the lower rainfall. In contrast, yellow and red would represent increasing
rainfall values, with red indicating the highest values.

With Matplotlib, we can create a heat map using the imshow() function. To create a
default heat map, we just need to pass an array of m×n dimensions, where the first
dimension defines the rows and the second the columns of the heat map. We can choose
different colors for the heat map using the cmap parameter, which takes a colormap
instance or a registered colormap name. Some of the possible values of cmap are:
'pink', 'spring', 'summer', 'autumn', 'winter', 'cool', 'Wistia', 'hot', and 'copper'.
Example
import numpy as np
import matplotlib.pyplot as plt
data = np.random.random(( 12 , 12 ))
plt.imshow( data,cmap='autumn')
plt.title( "2-D Heat Map" )
plt.show()

Heat maps usually provide a legend named color bar for better interpretation of the
colors of the cells. We can add a colorbar to the heatmap using plt.colorbar(). We can also
add the ticks and labels for our heatmap using xticks() and yticks() methods.
Example

import numpy as np
import matplotlib.pyplot as plt
teams = ["A", "B", "C", "D","E", "F", "G"]
year= ["2022", "2021", "2020", "2019", "2018", "2017", "2016"]
games_won = np.array([[82, 63, 83, 92, 70, 45, 64],
                      [86, 48, 72, 67, 46, 42, 71],
                      [76, 89, 45, 43, 51, 38, 53],
                      [54, 56, 78, 76, 72, 80, 65],
                      [67, 49, 91, 56, 68, 40, 87],
                      [45, 70, 53, 86, 59, 63, 97],
                      [97, 67, 62, 90, 67, 78, 39]])
plt.figure(figsize = (4,4))
plt.imshow(games_won,cmap='spring')
plt.colorbar()
plt.xticks(np.arange(len(teams)), labels=teams)
plt.yticks(np.arange(len(year)), labels=year)
plt.title("Games Won By Teams")
plt.show()
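
A colorbar gives the scale, but the exact value of each cell can also be written inside it with text(). The following sketch uses a small made-up 3×3 matrix so that the labels stay readable; the axes-object calls (ax.imshow, ax.text) are the object-oriented counterparts of the pyplot functions used above.

```python
import numpy as np
import matplotlib.pyplot as plt

# Small made-up matrix so every label stays readable
values = np.array([[82, 63, 83],
                   [86, 48, 72],
                   [76, 89, 45]])
rows = ["2022", "2021", "2020"]
cols = ["A", "B", "C"]

fig, ax = plt.subplots(figsize=(4, 4))
im = ax.imshow(values, cmap="spring")
fig.colorbar(im, ax=ax)
ax.set_xticks(np.arange(len(cols)))
ax.set_xticklabels(cols)
ax.set_yticks(np.arange(len(rows)))
ax.set_yticklabels(rows)

# imshow puts column j on the x-axis and row i on the y-axis
for i in range(values.shape[0]):
    for j in range(values.shape[1]):
        ax.text(j, i, str(values[i, j]), ha="center", va="center", color="black")
# plt.show()  # uncomment to display the heat map
```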

We can also use a heatmap to plot the correlation between the columns of a dataset. The
correlation coefficient measures the strength of the linear relation between each pair
of columns.
Example
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
df=pd.DataFrame({"x":[2,3,4,5,6],"y":[5,8,9,13,15],"z":[0,4,5,6,7]})
corr=df.corr(method='pearson')
plt.figure(figsize = (4,4))
plt.imshow(corr,cmap='spring')
plt.colorbar()
plt.xticks(np.arange(len(df.columns)), labels=df.columns,rotation=65)
plt.yticks(np.arange(len(df.columns)), labels=df.columns)
plt.show()
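
Each cell of the correlation heatmap is a Pearson coefficient. As a worked check, the coefficient between the x and y columns of the example can be computed by hand from its definition and compared with NumPy's np.corrcoef (used here instead of pandas only to keep the sketch short).

```python
import numpy as np

x = np.array([2, 3, 4, 5, 6])
y = np.array([5, 8, 9, 13, 15])

# Pearson r = sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2))
dx = x - x.mean()
dy = y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# np.corrcoef returns the full correlation matrix; [0, 1] is r between x and y
r_numpy = np.corrcoef(x, y)[0, 1]

print(round(r_manual, 4), round(r_numpy, 4))
```

Here the sums are 25, 10, and 64, so r = 25 / sqrt(640), roughly 0.988, which indicates a strong positive linear relation.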

Scatter Plots with Histograms


We can combine a simple scatter plot with a histogram for each axis. Such plots help us
see the distribution of the values along each axis. When a scatter plot contains a lot
of data points, overplotting can be an issue: overlapping points make it difficult to
interpret the data fully. Marginal histograms alongside the scatter plot help with
this. To make the simplest marginal plot, we pass the x and y variables to Seaborn's
jointplot() function.
Example
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn import datasets
df = datasets.load_iris()
df=df.data[:,0:2]
df=pd.DataFrame({'SepalLength': df[:,0],'SepalWidth': df[:,1]})
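sns.jointplot(x="SepalLength",y="SepalWidth",edgecolor="white",data=df)
# Title the whole figure and leave room for it above the top histogram
plt.suptitle("Scatter Plot with Histograms")
plt.subplots_adjust(top=0.93)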
plt.show()

The simplest plotting method, JointGrid.plot(), accepts a pair of functions: one for the
joint axes and one for both marginal axes. Some keyword arguments accepted by the
JointGrid constructor are listed below:
• height: Size of each side of the figure in inches (it will be square).
• ratio: Ratio of joint axes height to marginal axes height.
• space: Space between the joint and marginal axes.
Example
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn import datasets
df = datasets.load_iris()
df=df.data[:,0:2]
df=pd.DataFrame({'SepalLength': df[:,0],'SepalWidth': df[:,1]})
g = sns.JointGrid(data=df,
x="SepalLength",y="SepalWidth",height=4,ratio=2,space=0)
g.plot(sns.scatterplot, sns.histplot)
plt.show()
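
The same layout can be approximated with Matplotlib alone, which shows what the joint and marginal axes are. This is a sketch under stated assumptions: the data is randomly generated rather than the iris measurements, and the grid ratios and bin count are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # made-up data, fixed seed
x = rng.normal(size=300)
y = 0.5 * x + rng.normal(scale=0.5, size=300)

fig = plt.figure(figsize=(5, 5))
# 2x2 grid: large joint panel bottom-left, thin marginal panels above and right
gs = fig.add_gridspec(2, 2, width_ratios=(4, 1), height_ratios=(1, 4),
                      wspace=0.05, hspace=0.05)
ax_joint = fig.add_subplot(gs[1, 0])
ax_top = fig.add_subplot(gs[0, 0], sharex=ax_joint)
ax_right = fig.add_subplot(gs[1, 1], sharey=ax_joint)

ax_joint.scatter(x, y, edgecolor="white")
ax_top.hist(x, bins=20)
ax_right.hist(y, bins=20, orientation="horizontal")

# Hide the tick labels the marginal panels share with the joint panel
ax_top.tick_params(axis="x", labelbottom=False)
ax_right.tick_params(axis="y", labelleft=False)
ax_joint.set_xlabel("x")
ax_joint.set_ylabel("y")
# plt.show()  # uncomment to display the figure
```

The width and height ratios play the role of the ratio argument, and wspace/hspace the role of space.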
