Unit 2 Machine Learning
Many properties of a line can be set, such as its color, dash pattern, and width. There are
essentially three ways of doing this: using keyword arguments, using setter methods, and
using the setp() function.
Using Keyword Arguments
Keyword arguments (or named arguments) are values that, when passed into a function,
are identified by specific parameter names. They are passed using key=value syntax.
We can use keyword arguments to override the default values of line properties, as
shown below. Commonly used keyword arguments of the plot() function include linewidth,
color, linestyle, label, and alpha.
import matplotlib.pyplot as plt
x=[1,2,3,4,5,6,7]
y=[3,5,7,9,11,13,15]
plt.plot(x,y, linewidth=4, linestyle="--", color="red", label="y=2x+1")
plt.xlabel("x")
plt.ylabel("y")
plt.title("Line Chart Example")
plt.legend(loc='upper center')
plt.show()
Using Setter Methods
The plot() function returns a list of line objects. For example, line, = plt.plot(x, y)
unpacks a single line object, and line1, line2 = plt.plot(x1, y1, x2, y2) unpacks two line
objects. We can then call the setter methods of a line object to set the property we need.
Commonly used setter methods of line objects include set_label(), set_linewidth(),
set_linestyle(), and set_color().
Example
import matplotlib.pyplot as plt
x=[1,2,3,4,5,6,7]
y=[3,5,7,9,11,13,15]
line,=plt.plot(x,y)
print(line)
line.set_label("y=2x+1")
line.set_linewidth(4)
line.set_linestyle("-")
line.set_color("green")
plt.xlabel("x")
plt.ylabel("y")
plt.title("Line Chart Example")
plt.legend(loc='upper center')
plt.show()
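Using the setp() Function
The third way mentioned above, setp(), sets several properties in a single call. The following is a minimal sketch that reuses the data from the previous examples; the chosen property values are only illustrative.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5, 6, 7]
y = [3, 5, 7, 9, 11, 13, 15]

# plot() returns a list of line objects; keep the whole list this time
lines = plt.plot(x, y)

# setp() applies every keyword argument to each object in the list
plt.setp(lines, linewidth=4, linestyle="--", color="green", label="y=2x+1")

plt.xlabel("x")
plt.ylabel("y")
plt.title("Line Chart Example")
plt.legend(loc="upper center")
plt.show()
```

Because setp() accepts a list, the same call can style several lines at once, which is convenient when plot() draws more than one line.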
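A subplot value of 211 means that the figure is divided into a grid of two rows and one
column, and that the first cell of this grid is selected for plotting. Similarly, 212 selects
the second cell of the same grid.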
Example
import matplotlib.pyplot as plt
import numpy as np
x=[0,0.52,1.04,1.57,2.09,2.62,3.14]
y=np.sin(x)
plt.subplot(211)
plt.plot(x,y,linestyle="dashed",linewidth=2,label="sin(x)")
plt.xlabel("x")
plt.ylabel("sin(x)")
plt.title("Sin(x) vs. Cos(x) Curve")
plt.legend(loc='upper left')
plt.subplot(212)
y=np.cos(x)
plt.plot(x,y,linestyle="dashed", color="red",linewidth=2,label="cos(x)")
plt.xlabel("x")
plt.ylabel("cos(x)")
plt.legend(loc='upper center')
plt.show()
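The three-digit shorthand can also be written as three separate arguments (rows, columns, index). The sketch below, with made-up titles, creates the same two-row layout in this longer form.

```python
import matplotlib.pyplot as plt

fig = plt.figure()

# subplot(2, 1, 1) is the long form of subplot(211): 2 rows, 1 column, cell 1
top = plt.subplot(2, 1, 1)
top.set_title("First cell (211)")

# subplot(2, 1, 2) is the long form of subplot(212): same grid, cell 2
bottom = plt.subplot(2, 1, 2)
bottom.set_title("Second cell (212)")

plt.tight_layout()  # keep the titles from overlapping the neighbouring axes
plt.show()
```

The long form is required when any of the three numbers is greater than 9, since the shorthand only works with single digits.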
#Generate
x = np.arange(-20, 21, 1)
y = 2*x**2
We can use the text() function to place text at a specific location in a plot, for example
to display a value on top of each bar of a bar chart.
Example
import matplotlib.pyplot as plt
x = ['A', 'B', 'C', 'D', 'E']
y = [1, 3, 2, 5, 4]
percentage = [10, 30, 20, 50, 40]
plt.figure(figsize=(3,4))
plt.bar(x, y)
for i in range(len(x)):
    plt.text(x[i], y[i], percentage[i])
plt.show()
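By default the text starts at the given point, so the labels above sit to the right of each bar's center. A possible refinement, sketched below with the same data, is to pass the alignment keywords ha and va so that each label is centered just above its bar. The "%" suffix is an assumption about how the values should be shown.

```python
import matplotlib.pyplot as plt

x = ['A', 'B', 'C', 'D', 'E']
y = [1, 3, 2, 5, 4]
percentage = [10, 30, 20, 50, 40]

plt.figure(figsize=(3, 4))
plt.bar(x, y)

labels = []
for i in range(len(x)):
    # bars of a categorical axis are drawn at positions 0, 1, 2, ...
    label = plt.text(i, y[i], f"{percentage[i]}%", ha="center", va="bottom")
    labels.append(label)

plt.show()
```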
The annotate() function in the pyplot module of the matplotlib library annotates the
point xy with the specified text. To add an arrow annotation to a matplotlib chart we need
to set at least the text, the coordinates of the point to be highlighted (xy), the
coordinates of the text (xytext), and the properties of the arrow (arrowprops).
Example
import numpy as np
import matplotlib.pyplot as plt
x=np.arange(0,10,0.25)
y=np.sin(x)
plt.plot(x,y)
plt.annotate('Minimum', xy=(4.75, -1), xytext=(4.75, 0.2),
             arrowprops=dict(facecolor='black', width=0.2),
             horizontalalignment='center')
plt.show()
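In the example above the coordinates of the minimum are typed in by hand. As a sketch of an alternative, the point can be located with np.argmin() so that the annotation follows the data. The vertical offset of 1.2 for the text is an arbitrary choice.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(0, 10, 0.25)
y = np.sin(x)

# index of the smallest sampled value and its coordinates
i_min = np.argmin(y)
x_min, y_min = x[i_min], y[i_min]

plt.plot(x, y)
plt.annotate("Minimum", xy=(x_min, y_min), xytext=(x_min, y_min + 1.2),
             arrowprops=dict(facecolor="black", width=0.2),
             horizontalalignment="center")
plt.show()
```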
Styling Plots
Matplotlib provides a number of predefined style sheets. Their names are stored in
plt.style.available, which is a list of all the style sheet names that can be passed as
an argument to plt.style.use().
Example
import matplotlib.pyplot as plt
ls=plt.style.available
print("Number of Styles:",len(ls))
print("List of Styles:",ls)
The ggplot style adds a light grey background with white gridlines and uses slightly
larger axis tick labels. The statement plt.style.use('ggplot') can be used to apply ggplot
styling to any plot in Matplotlib.
Example
from scipy import stats
import matplotlib.pyplot as plt
dist=stats.norm(loc=150,scale=20)
data=dist.rvs(size=1000)
plt.style.use('ggplot')
plt.hist(data,bins=100,color='blue')
plt.show()
The dark_background stylesheet is another popular style, based on dark mode.
Applying this stylesheet makes the plot background black and the ticks white, for
contrast. In the foreground, the default colors of bars and lines are light shades that
keep the plot readable against the dark background.
Example
import matplotlib.pyplot as plt
plt.style.use("dark_background")
a = [1, 2, 3, 4, 5, 6, 7]
b = [1, 4, 9, 16, 25, 36, 49]
plt.figure(figsize = (4,3))
plt.plot(a, marker='o',linewidth=1,color='blue')
plt.plot(b, marker='v',linewidth=1,color='red')
plt.show()
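Note that plt.style.use() changes the style for every plot created afterwards. If a style is needed for one plot only, a possible approach is the plt.style.context() context manager, sketched below. The axes.facecolor setting is read here only to show that the previous style comes back after the block.

```python
import matplotlib.pyplot as plt

plt.style.use("default")                     # start from the default style
before = plt.rcParams["axes.facecolor"]

with plt.style.context("dark_background"):
    # the style applies only to plots created inside this block
    inside = plt.rcParams["axes.facecolor"]
    plt.figure(figsize=(4, 3))
    plt.plot([1, 4, 9, 16, 25], marker="o")
    plt.show()

after = plt.rcParams["axes.facecolor"]       # restored on leaving the block
print(before, inside, after)
```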
Box Plots
A box plot is a way to visualize the distribution of data by using a box and some
lines called whiskers. It is also known as a box-and-whisker plot. The distribution is
summarized by five key values, which are as follows:
Minimum (lower whisker limit): Q1-1.5*IQR
1st quartile (Q1): 25th percentile
Median: 50th percentile
3rd quartile (Q3): 75th percentile
Maximum (upper whisker limit): Q3+1.5*IQR
Here IQR represents the InterQuartile Range which starts from the first quartile (Q1) and
ends at the third quartile (Q3). Thus, IQR=Q3-Q1.
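The five values above can be computed directly with NumPy. The following is a small sketch on a made-up sample. It uses np.percentile() with its default interpolation, so other quartile conventions may give slightly different numbers.

```python
import numpy as np

# made-up sample with one unusually large value
data = np.array([2, 4, 4, 5, 6, 7, 8, 9, 10, 30])

q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1                       # IQR = Q3 - Q1

lower_limit = q1 - 1.5 * iqr        # "minimum" in the list above
upper_limit = q3 + 1.5 * iqr        # "maximum" in the list above

# points outside the two limits are the outliers
outliers = data[(data < lower_limit) | (data > upper_limit)]

print("Q1:", q1, "Median:", median, "Q3:", q3, "IQR:", iqr)
print("Limits:", lower_limit, upper_limit)
print("Outliers:", outliers)
```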
In the box plot, points that fall outside the range from minimum to maximum are called
outliers. The range of the data from minimum to maximum is called the whisker limit. In
Python, the pyplot module of matplotlib has a built-in function named boxplot() that
creates the box plot of any data set. Multiple boxes can be created by passing a list of
data sets to the boxplot() function.
Example
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
dist = stats.norm(100, 30)
data=dist.rvs(size=500)
plt.figure(figsize =(6, 4))
plt.boxplot(data)
plt.show()
Example 2
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
dist = stats.norm(100, 50)
data1=dist.rvs(size=500)
data2=dist.rvs(size=500)
data3=dist.rvs(size=500)
plt.figure(figsize =(6, 4))
plt.boxplot([data1,data2,data3])
plt.show()
Horizontal box plots can be created by setting vert=0 when creating the box plot. Boxes in
the plot can be filled by setting patch_artist=True. The boxplot() function returns a Python
dictionary with keys such as boxes, whiskers, fliers, caps, and medians. Each key maps to a
list of the corresponding plot objects, and we can change their properties by calling the
set() method of each object.
Example
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
dist = stats.norm(100, 50)
data=dist.rvs(size=500)
plt.figure(figsize =(6, 4))
bp=plt.boxplot(data,vert=0,patch_artist=True)
for b in bp['boxes']:
    b.set(color='blue',facecolor='cyan',linewidth=2)
for w in bp['whiskers']:
    w.set(linestyle='--',linewidth=1, color='green')
for f in bp['fliers']:
    f.set(marker='D', color='black',alpha=1)
for m in bp['medians']:
    m.set(color='yellow',linewidth=2)
for c in bp['caps']:
    c.set(color='red')
plt.show()
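To see what the dictionary returned by boxplot() contains, we can print its keys and the number of objects stored under each key. The sketch below uses three randomly generated groups. The seed and group sizes are arbitrary.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
groups = [rng.normal(100, 50, size=500) for _ in range(3)]

plt.figure(figsize=(6, 4))
bp = plt.boxplot(groups)

# each key maps to a list of plot objects
for key, artists in bp.items():
    print(key, len(artists))

plt.show()
```

With three groups there is one box and one median line per group, but two whiskers and two caps per group, which is why the loops in the previous example iterate over lists.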
Heatmaps
A heatmap (or heat map) is a graphical representation of data where values are depicted
by color. A simple heat map provides an immediate visual summary of information
across two axes, allowing users to quickly grasp the most important or relevant data
points. More elaborate heat maps allow the viewer to understand complex data sets. All
heat maps share one thing in common -- they use different colors or different shades of
the same color to represent different values and to communicate the relationships that
may exist between the variables plotted on the x-axis and y-axis. Usually, a darker color
or shade represents a higher or greater quantity of the value being represented in the heat
map. For instance, a heat map showing the rain distribution (range of values) of a city
grouped by month may use varying shades of red, yellow and blue. The months may be
mapped on the y axis and the rain ranges on the x axis. The lightest color (i.e., blue) would
represent the lower rainfall. In contrast, yellow and red would represent increasing
rainfall values, with red indicating the highest values.
With matplotlib we can create a heat map using the imshow() function. To create a
default heat map we just need to pass an array of m×n dimensions, where
the first dimension defines the rows and the second the columns of the heat map. We
can choose different colors for the heat map using the cmap parameter, which takes a
colormap instance or a registered colormap name. Some of the possible values of cmap are:
'pink', 'spring', 'summer', 'autumn', 'winter', 'cool', 'Wistia', 'hot', 'copper', etc.
Example
import numpy as np
import matplotlib.pyplot as plt
data = np.random.random(( 12 , 12 ))
plt.imshow( data,cmap='autumn')
plt.title( "2-D Heat Map" )
plt.show()
Heat maps usually provide a legend named color bar for better interpretation of the
colors of the cells. We can add a colorbar to the heatmap using plt.colorbar(). We can also
add the ticks and labels for our heatmap using xticks() and yticks() methods.
Example
import numpy as np
import matplotlib.pyplot as plt
teams = ["A", "B", "C", "D","E", "F", "G"]
year= ["2022", "2021", "2020", "2019", "2018", "2017", "2016"]
games_won = np.array([[82, 63, 83, 92, 70, 45, 64],
                      [86, 48, 72, 67, 46, 42, 71],
                      [76, 89, 45, 43, 51, 38, 53],
                      [54, 56, 78, 76, 72, 80, 65],
                      [67, 49, 91, 56, 68, 40, 87],
                      [45, 70, 53, 86, 59, 63, 97],
                      [97, 67, 62, 90, 67, 78, 39]])
plt.figure(figsize = (4,4))
plt.imshow(games_won,cmap='spring')
plt.colorbar()
plt.xticks(np.arange(len(teams)), labels=teams)
plt.yticks(np.arange(len(year)), labels=year)
plt.title( "Games Won By Teams" )
plt.show()
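When the exact numbers matter, the value of each cell can be written into the heat map with the text() function. The following sketch uses a small made-up array. It relies on the fact that imshow() draws the cell in row r and column c centered at x=c, y=r.

```python
import numpy as np
import matplotlib.pyplot as plt

values = np.array([[1, 2, 3],
                   [4, 5, 6]])      # made-up data

plt.figure(figsize=(4, 3))
plt.imshow(values, cmap="spring")

labels = []
for row in range(values.shape[0]):
    for col in range(values.shape[1]):
        # x is the column index, y is the row index
        label = plt.text(col, row, str(values[row, col]),
                         ha="center", va="center")
        labels.append(label)

plt.colorbar()
plt.show()
```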
Heat maps are also commonly used to visualize the correlation between the columns of a
dataset. In the example below we compute the Pearson correlation matrix of a DataFrame
and plot it as a heat map.
Example
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
df=pd.DataFrame({"x":[2,3,4,5,6],"y":[5,8,9,13,15],"z":[0,4,5,6,7]})
corr=df.corr(method='pearson')
plt.figure(figsize = (4,4))
plt.imshow(corr,cmap='spring')
plt.colorbar()
plt.xticks(np.arange(len(df.columns)), labels=df.columns,rotation=65)
plt.yticks(np.arange(len(df.columns)), labels=df.columns)
plt.show()
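To make clear what the cells of this heat map contain, the sketch below computes the Pearson correlation of the x and y columns by hand and compares it with the value returned by df.corr(). The variable names are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [2, 3, 4, 5, 6],
                   "y": [5, 8, 9, 13, 15],
                   "z": [0, 4, 5, 6, 7]})
corr = df.corr(method="pearson")

x = df["x"].to_numpy()
y = df["y"].to_numpy()

# Pearson r = sum((x - mean_x)(y - mean_y)) / sqrt(sum((x - mean_x)^2) * sum((y - mean_y)^2))
dx = x - x.mean()
dy = y - y.mean()
r_xy = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

print("By hand:  ", r_xy)
print("df.corr():", corr.loc["x", "y"])
```

The matrix is symmetric and has ones on its diagonal, since every column is perfectly correlated with itself.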
The simplest plotting method of a seaborn JointGrid, JointGrid.plot(), accepts a pair of
functions: one for the joint axes and one for both marginal axes. Some keyword arguments
accepted when creating the JointGrid are listed below:
height: Size of each side of the figure in inches (it will be square).
ratio: Ratio of joint axes height to marginal axes height.
space: Space between the joint and marginal axes.
Example
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn import datasets
df = datasets.load_iris()
df=df.data[:,0:2]
df=pd.DataFrame({'SepalLength': df[:,0],'SepalWidth': df[:,1]})
g = sns.JointGrid(data=df, x="SepalLength", y="SepalWidth",
                  height=4, ratio=2, space=0)
g.plot(sns.scatterplot, sns.histplot)
plt.show()