Iml Unit 2
DEVANSHI DAVE
2.1 Numpy:
What is NumPy?
In Python we have lists that serve the purpose of arrays, but they are slow to process.
NumPy aims to provide an array object that is up to 50x faster than traditional Python lists.
The array object in NumPy is called ndarray. It comes with many supporting functions that
make working with ndarray very easy.
Arrays are very frequently used in data science, where speed and resources are very
important.
NumPy arrays are stored at one continuous place in memory unlike lists, so processes can
access and manipulate them very efficiently.
This behavior is called locality of reference in computer science.
This is the main reason why NumPy is faster than lists. It is also optimized to work with
the latest CPU architectures.
Example:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
print(type(arr))
Output:
Introduction to Machine Learning (4350702) PROF. DEVANSHI DAVE
[1 2 3 4 5]
<class 'numpy.ndarray'>
type(): This built-in Python function returns the type of the object passed to it. In the code
above, it shows that arr is of type numpy.ndarray.
To create an ndarray, we can pass a list, tuple or any array-like object into the array() method,
and it will be converted into an ndarray:
import numpy as np
arr = np.array(42)
print(arr)
Output:
42
► 1-D Arrays
• An array that has 0-D arrays as its elements is called uni-dimensional or 1-D array.
• These are the most common and basic arrays.
• Example: Create a 1-D array containing the values 1,2,3,4,5
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
Output:
[1 2 3 4 5]
► 2-D Arrays
• An array that has 1-D arrays as its elements is called a 2-D array.
• These are often used to represent matrices or 2nd order tensors.
• NumPy provides the numpy.linalg submodule for linear algebra (matrix) operations.
• Example: Create a 2-D array containing two arrays with the values 1,2,3 and 4,5,6
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr)
Output:
[[1 2 3]
 [4 5 6]]
► 3-D arrays
• An array that has 2-D arrays (matrices) as its elements is called 3-D array.
• These are often used to represent a 3rd order tensor.
Example: Create a 3-D array with two 2-D arrays, both containing two arrays with the
values 1,2,3 and 4,5,6
import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(arr)
Output:
[[[1 2 3]
  [4 5 6]]

 [[1 2 3]
  [4 5 6]]]
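► Access Array Elements
• Array indexing starts at 0, so the second element of a 1-D array has index 1.
arr = np.array([1, 2, 3, 4])
print(arr[1])
Output:
2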
► Access 2-D Arrays
• To access elements from 2-D arrays we can use comma separated integers representing
the dimension and the index of the element.
• Think of 2-D arrays like a table with rows and columns, where the dimension represents
the row and the index represents the column.
• Example: Access the element on the first row, second column:
import numpy as np
Output:
2nd element on 1st dim: 2
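A minimal sketch of this lookup, assuming a 2 x 5 array holding the values 1 to 10 (the array itself is an assumption, chosen so that the result matches the output above):

```python
import numpy as np

# Assumed sample data: two rows of five values each
arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

# arr[row, column]: row 0 is the first row, column 1 is the second column
value = arr[0, 1]
print("2nd element on 1st dim:", value)
```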
► Access 3-D Arrays
• To access elements from 3-D arrays we can use comma separated integers
representing the dimensions and the index of the element.
• Example: Access the third element of the second array of the first array:
import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
print(arr[0, 1, 2])
Output:
6
► Negative Indexing
Output:
Last element from 2nd dim: 10
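Negative indexes count from the end of an axis, so -1 refers to the last element. A minimal sketch, assuming a 2 x 5 array with the values 1 to 10 (an assumption chosen to reproduce the output above):

```python
import numpy as np

# Assumed sample data: two rows of five values each
arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

# Row index 1 selects the second row; column index -1 selects its last element
last = arr[1, -1]
print("Last element from 2nd dim:", last)
```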
import numpy as np
# input arrays
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
Output:
array([[1, 2, 3],
       [4, 5, 6]])
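The output above is what you get when the two input arrays are stacked along a new first axis. A hedged sketch, using np.stack() and np.vstack() as two standard ways to obtain that result (which call the original example used is an assumption), with np.concatenate() shown for comparison:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Stack along a new first axis: the result has shape (2, 3)
stacked = np.stack((a, b), axis=0)

# vstack gives the same result for 1-D inputs
vstacked = np.vstack((a, b))

# concatenate joins along an existing axis: the result has shape (6,)
joined = np.concatenate((a, b))

print(stacked)
print(joined)
```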
► array_split()
• Splitting is the reverse operation of joining.
• Joining merges multiple arrays into one, and splitting breaks one array into multiple arrays.
• We use array_split() for splitting arrays. We pass it the array we want to split and
the number of splits.
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6])
newarr = np.array_split(arr, 3)
print(newarr)
Output:
[array([1, 2]), array([3, 4]), array([5, 6])]
import numpy as np
arr = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]])
newarr = np.array_split(arr, 3)
print(newarr)
Output:
[array([[1, 2],
       [3, 4]]), array([[5, 6],
       [7, 8]]), array([[ 9, 10],
       [11, 12]])]
• In addition, you can specify which axis you want to do the split around.
• The example below also returns three 2-D arrays, but they are split along axis=1, so each
piece keeps all six rows and takes one column.
Example: Split the 2-D array into three 2-D arrays along axis=1.
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15], [16, 17, 18]])
newarr = np.array_split(arr, 3, axis=1)
print(newarr)
Output:
[array([[ 1],
[ 4],
[ 7],
[10],
[13],
[16] ]), array([[ 2],
[ 5],
[ 8],
[11],
[14],
[17] ]), array([[ 3],
[ 6],
[ 9],
[12],
[15],
[18] ])]
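An alternative is hsplit(), which splits along axis=1 and gives the same result here:
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15], [16, 17, 18]])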
newarr = np.hsplit(arr, 3)
print(newarr)
Output:
[array([[ 1],
[ 4],
[ 7],
[10],
[13],
[16] ]), array([[ 2],
[ 5],
[ 8],
[11],
[14],
[17] ]), array([[ 3],
[ 6],
[ 9],
[12],
[15],
[18] ])]
Operation        Function     Operator
Addition         add()        +
Subtraction      subtract()   -
Multiplication   multiply()   *
Division         divide()     /
Exponentiation   power()      **
Modulus          mod()        %
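Each function in the table has an operator form that gives the same element-wise result. A small sketch, assuming two sample arrays a and b (the values are illustrative and not from the original notes):

```python
import numpy as np

# Illustrative input arrays (assumed values)
a = np.array([10, 20, 30])
b = np.array([3, 4, 5])

# Function form; each call is equivalent to the operator shown in the comment
added = np.add(a, b)             # a + b
subtracted = np.subtract(a, b)   # a - b
multiplied = np.multiply(a, b)   # a * b
divided = np.divide(a, b)        # a / b (true division, float result)
powered = np.power(a, b)         # a ** b
remainder = np.mod(a, b)         # a % b

print(added, subtracted, multiplied, divided, powered, remainder)
```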
Example 1:
import numpy as np
output:
Example 2:
import numpy as np
print("Add:")
print(np.add(1.0, 4.0))
print("Subtract:")
print(np.subtract(1.0, 4.0))
print("Multiply:")
print(np.multiply(1.0, 4.0))
print("Divide:")
print(np.divide(1.0, 4.0))
output:
Add:
5.0
Subtract:
-3.0
Multiply:
4.0
Divide:
0.25
Function Description
• numpy.mean() function
The numpy.mean() function is used to compute the arithmetic mean along the specified axis.
The mean is calculated over the flattened array by default, otherwise over the specified axis.
Syntax:
numpy.mean(a, axis=None, dtype=None, out=None, keepdims=<no value>)
Example:
import numpy as np
Arr = np.array([[10, 20, 30], [70, 80, 90]])
print("Array is:")
print(Arr)
output:
Array is:
[[10 20 30]
[70 80 90]]
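A short sketch of how the axis argument changes the result of numpy.mean(), using the same array as above (the variable names are illustrative):

```python
import numpy as np

Arr = np.array([[10, 20, 30], [70, 80, 90]])

# Mean of the flattened array (axis=None is the default)
mean_all = np.mean(Arr)

# axis=0 averages down each column, axis=1 averages across each row
mean_cols = np.mean(Arr, axis=0)
mean_rows = np.mean(Arr, axis=1)

print("Mean of all values:", mean_all)
print("Mean along axis=0:", mean_cols)
print("Mean along axis=1:", mean_rows)
```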
• numpy.median() function
The numpy.median() function is used to compute the median along the specified axis. The median
is calculated over the flattened array by default, otherwise over the specified axis.
Syntax
numpy.median(a, axis=None, out=None, overwrite_input=False, keepdims=False)
Parameters
axis Optional. Specify axis or axes along which the medians are
computed. The default is to compute the median of the flattened
array.
out Optional. Specify output array for the result. The default is
None. If provided, it must have the same shape as output.
overwrite_input Optional. If True, the input array will be modified. If
overwrite_input is True and a is not already an ndarray, an error
will be raised. Default is False.
keepdims Optional. If this is set to True, the reduced axes are left in the result
as dimensions with size one. With this option, the result will
broadcast correctly against the input array.
Example:
In the example below, median() function is used to calculate median of all values present in
the array. When axis parameter is provided, median is calculated over the specified axes.
import numpy as np
Arr = np.array([[10, 20, 500], [30, 40, 400], [100, 200, 300]])
print("Array is:")
print(Arr)
output:
Array is:
[[ 10 20 500]
[ 30 40 400]
[100 200 300]]
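A short sketch of numpy.median() on the same array, first over all values and then along each axis (the variable names are illustrative):

```python
import numpy as np

Arr = np.array([[10, 20, 500], [30, 40, 400], [100, 200, 300]])

# Median of the flattened array: the middle of the nine sorted values
median_all = np.median(Arr)

# axis=0 takes the median of each column, axis=1 of each row
median_cols = np.median(Arr, axis=0)
median_rows = np.median(Arr, axis=1)

print("Median of all values:", median_all)
print("Median along axis=0:", median_cols)
print("Median along axis=1:", median_rows)
```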
• numpy.average() function
The numpy.average() function is used to compute the weighted average along the specified axis.
The syntax for using this function is given below:
Syntax
numpy.average(a, axis=None, weights=None, returned=False)
Parameters
weights Optional. Specify an array of weights associated with the values in a. The
weights array can either be 1-D (in which case its length must be the size of a
along the given axis) or of the same shape as a. If weights=None, then all data
in a are assumed to have a weight equal to one.
returned Optional. Default is False. If True, the tuple (average, sum_of_weights) is
returned; otherwise only the average is returned.
Example:
In the example below, average() function is used to calculate average of all values present
in the array. When axis parameter is provided, averaging is performed over the specified
axes.
import numpy as np
Arr = np.array([[10, 20, 30], [70, 80, 90]])
print("Array is:")
print(Arr)
output:
Array is:
[[10 20 30]
[70 80 90]]
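A sketch of numpy.average() with and without weights on the same array. The weights [1, 1, 2] are an illustrative assumption, not taken from the original notes:

```python
import numpy as np

Arr = np.array([[10, 20, 30], [70, 80, 90]])

# Without weights, average() behaves like mean()
avg_all = np.average(Arr)

# Weighted average along each row; the third column counts twice (assumed weights)
weights = [1, 1, 2]
avg_rows = np.average(Arr, axis=1, weights=weights)

# returned=True also gives back the sum of the weights used for each result
avg_rows_again, weight_sums = np.average(Arr, axis=1, weights=weights, returned=True)

print("Average of all values:", avg_all)
print("Weighted average along axis=1:", avg_rows)
print("Sum of weights:", weight_sums)
```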
• numpy.std() function
The numpy.std() function is used to compute the standard deviation along the specified axis.
The standard deviation is defined as the square root of the average of the squared deviations
from the mean. Mathematically, it can be represented as:
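Written out for N values x_1, ..., x_N with mean \mu (the symbols are chosen here for illustration; this is the population form, which matches the definition above):

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left(x_i - \mu\right)^2}
```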
Syntax
numpy.std(a, axis=None, dtype=None, out=None, keepdims=<no value>)
Parameters
a Required. Specify the input array.
axis Optional. Specify axis or axes along which the standard deviation is
calculated. The default, axis=None, computes the standard deviation of the
flattened array.
dtype Optional. Specify the type to use in computing the standard deviation. For
arrays of integer type the default is float64, for arrays of float types it is the
same as the array type.
out Optional. Specify the output array in which to place the result. It must have
the same shape as the expected output.
keepdims Optional. If this is set to True, the reduced axes are left in the result as
dimensions with size one. With this option, the result will broadcast
correctly against the input array.
Example:
Here, std() function is used to calculate standard deviation of all values present in the array.
But, when axis parameter is provided, standard deviation is calculated over the specified
axes as shown in the example below.
import numpy as np
Arr = np.array([[10, 20, 30], [70, 80, 90]])
print("Array is:")
print(Arr)
output:
Array is:
[[10 20 30]
[70 80 90]]
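A short sketch of numpy.std() on the same array. It also recomputes the result by hand from the definition (square root of the average squared deviation from the mean) as a cross-check; the variable names are illustrative:

```python
import numpy as np

Arr = np.array([[10, 20, 30], [70, 80, 90]])

# Standard deviation of the flattened array
std_all = np.std(Arr)

# Same value computed directly from the definition
manual_std = np.sqrt(np.mean((Arr - np.mean(Arr)) ** 2))

# axis=0 works down each column, axis=1 across each row
std_cols = np.std(Arr, axis=0)
std_rows = np.std(Arr, axis=1)

print("Standard deviation of all values:", std_all)
print("Standard deviation along axis=0:", std_cols)
print("Standard deviation along axis=1:", std_rows)
```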
• numpy.var() function
The numpy.var() function is used to compute the variance along the specified axis. The variance is
a measure of the spread of a distribution. The variance is computed for the flattened array by
default, otherwise over the specified axis.
Syntax
numpy.var(a, axis=None, dtype=None, out=None, keepdims=<no value>)
Parameters
out Optional. Specify the output array in which to place the result. It must have the
same shape as the expected output.
keepdims Optional. If this is set to True, the reduced axes are left in the result as
dimensions with size one. With this option, the result will broadcast correctly
against the input array.
Example:
In the example below, var() function is used to calculate variance of all values present in the
array. When axis parameter is provided, variance is calculated over the specified axes.
import numpy as np
Arr = np.array([[10, 20, 30], [70, 80, 90]])
print("Array is:")
print(Arr)
output:
Array is:
[[10 20 30]
[70 80 90]]
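A short sketch of numpy.var() on the same array. The variance is the square of the standard deviation, which the last line checks (the variable names are illustrative):

```python
import numpy as np

Arr = np.array([[10, 20, 30], [70, 80, 90]])

# Variance of the flattened array
var_all = np.var(Arr)

# axis=0 works down each column, axis=1 across each row
var_cols = np.var(Arr, axis=0)
var_rows = np.var(Arr, axis=1)

print("Variance of all values:", var_all)
print("Variance along axis=0:", var_cols)
print("Variance along axis=1:", var_rows)

# Variance equals the squared standard deviation
same_as_std_squared = np.isclose(var_all, np.std(Arr) ** 2)
```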
• numpy.amax() function
The NumPy amax() function returns the maximum of an array or maximum along the specified axis.
Syntax:
numpy.amax(a, axis=None, out=None, keepdims=<no value>)
Parameters
axis Optional. Specify axis or axes along which to operate. The default, axis=None,
operation is performed on flattened array.
out Optional. Specify the output array in which to place the result. It must have the same
shape as the expected output.
keepdims Optional. If this is set to True, the reduced axes are left in the result as dimensions
with size one. With this option, the result will broadcast correctly against the input
array.
Example:
In the example below, amax() function is used to calculate maximum of an array. When axis
parameter is provided, maximum is calculated over the specified axes.
import numpy as np
Arr = np.array([[10, 20, 30], [70, 80, 90]])
print("Array is:")
print(Arr)
output:
Array is:
[[10 20 30]
[70 80 90]]
• numpy.amin() function
The NumPy amin() function returns the minimum of an array or minimum along the specified axis.
Syntax
numpy.amin(a, axis=None, out=None, keepdims=<no value>)
Parameters
axis Optional. Specify axis or axes along which to operate. The default,
axis=None, operation is performed on flattened array.
out Optional. Specify the output array in which to place the result. It must
have the same shape as the expected output.
keepdims Optional. If this is set to True, the reduced axes are left in the result as
dimensions with size one. With this option, the result will broadcast
correctly against the input array.
Example:
In the example below, amin() function is used to calculate minimum of an array. When axis
parameter is provided, minimum is calculated over the specified axes.
import numpy as np
Arr = np.array([[10, 20, 30], [70, 80, 90]])
print("Array is:")
print(Arr)
#minimum of all values
print("\nMinimum of all values:", np.amin(Arr))
output:
Array is:
[[10 20 30]
 [70 80 90]]

Minimum of all values: 10
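The axis behaviour of amax() and amin() can be sketched together on the same array (the variable names are illustrative):

```python
import numpy as np

Arr = np.array([[10, 20, 30], [70, 80, 90]])

# Largest and smallest value of the flattened array
max_all = np.amax(Arr)
min_all = np.amin(Arr)

# axis=0 compares down each column, axis=1 across each row
max_cols = np.amax(Arr, axis=0)
max_rows = np.amax(Arr, axis=1)
min_cols = np.amin(Arr, axis=0)
min_rows = np.amin(Arr, axis=1)

print("Maximum:", max_all, max_cols, max_rows)
print("Minimum:", min_all, min_cols, min_rows)
```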
• numpy.ptp() function
The NumPy ptp() function returns the range of values (maximum - minimum) of an array, or the
range of values along the specified axis.
The name of the function comes from the acronym for peak to peak.
Syntax:
numpy.ptp(a, axis=None, out=None, keepdims=<no value>)
Parameters
a Required. Specify the input array.
axis Optional. Specify axis or axes along which to operate. The default,
axis=None, operation is performed on flattened array.
out Optional. Specify the output array in which to place the result. It must have
the same shape as the expected output.
keepdims Optional. If this is set to True, the reduced axes are left in the result
as dimensions with size one. With this option, the result will
broadcast correctly against the input array.
Example
In the example below, ptp() function is used to calculate the range of values present in the
array. When axis parameter is provided, it is calculated over the specified axes.
import numpy as np
Arr = np.array([[10, 20, 30], [70, 80, 90]])
print("Array is:")
print(Arr)
#range of values
print("\nRange of values:", np.ptp(Arr))
output:
Array is:
[[10 20 30]
[70 80 90]]
Range of values: 80
• numpy.percentile() function
The NumPy percentile() function returns the q-th percentile of the array elements, or the q-th
percentile of the data along the specified axis.
Syntax:
numpy.percentile(a, q, axis=None, out=None, interpolation='linear', keepdims=False)
Parameters
interpolation Optional. Specify the interpolation method to use when the desired
percentile lies between two data points. It can take value from {'linear',
'lower', 'higher', 'midpoint', 'nearest'}
keepdims Optional. If this is set to True, the reduced axes are left in the result as
dimensions with size one. With this option, the result will broadcast
correctly against the input array.
Example:
In the example below, the percentile() function returns the requested percentiles of all values
present in the array. When the axis parameter is provided, they are calculated over the specified axis.
import numpy as np
Arr = np.array([[10, 20, 30], [40, 50, 60]])
print("Array is:")
print(Arr)
print()
#calculating 50th percentile point
print("50th percentile:", np.percentile(Arr, 50))
print()
#calculating (25, 50, 75) percentile points
print("[25, 50, 75] percentile:\n",
np.percentile(Arr, (25, 50, 75)))
print()
#calculating 50th percentile point along axis=0
print("50th percentile (axis=0):",
np.percentile(Arr, 50, axis=0))
output:
Array is:
[[10 20 30]
[40 50 60]]
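A sketch of the same percentile calls with the results stored in variables, so the values can be inspected. It relies on the default linear interpolation between data points (the variable names are illustrative):

```python
import numpy as np

Arr = np.array([[10, 20, 30], [40, 50, 60]])

# 50th percentile of the flattened array (the same as the median)
p50 = np.percentile(Arr, 50)

# Several percentiles at once
quartiles = np.percentile(Arr, (25, 50, 75))

# 50th percentile of each column
p50_cols = np.percentile(Arr, 50, axis=0)

print("50th percentile:", p50)
print("[25, 50, 75] percentile:", quartiles)
print("50th percentile (axis=0):", p50_cols)
```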
2.2 Pandas
• History of development
• In 2008, pandas development began at AQR Capital Management. By the end of 2009 it
had been open sourced, and is actively supported today by a community of like-minded
individuals around the world who contribute their valuable time and energy to help make
open source pandas possible.
• Since 2015, pandas has been a NumFOCUS sponsored project. This helps ensure the success of
development of pandas as a world-class open-source project.
• Timeline
2008: Development of pandas started
2009: pandas becomes open source
2012: First edition of Python for Data Analysis is published
2015: pandas becomes a NumFOCUS sponsored project
2018: First in-person core developer sprint
• Library Highlights
A fast and efficient DataFrame object for data manipulation with integrated indexing;
Tools for reading and writing data between in-memory data structures and different
formats: CSV and text files, Microsoft Excel, SQL databases, and the fast HDF5 format;
Intelligent data alignment and integrated handling of missing data: gain automatic label
based alignment in computations and easily manipulate messy data into an orderly form;
Intelligent label-based slicing, fancy indexing, and subsetting of large data sets;
Columns can be inserted and deleted from data structures for size mutability;
Highly optimized for performance, with critical code paths written in Cython or C.
Python with pandas is in use in a wide variety of academic and commercial domains,
including Finance, Neuroscience, Economics, Statistics, Advertising, Web Analytics,
and more.
Series: Series()
• The Pandas Series can be defined as a one-dimensional array that is capable of storing
various data types.
• We can easily convert a list, tuple, or dictionary into a Series using the Series() constructor. The
row labels of a Series are called the index.
• A Series cannot contain multiple columns. It has the following parameters:
data: It can be any list, dictionary, or scalar value.
index: The value of the index should be unique and hashable. It must be of the same length
as data. If we do not pass any index, the default np.arange(n) will be used.
dtype: It refers to the data type of series.
copy: It is used for copying the data.
► Creating a Series:
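We can create a Series in two ways: as an empty Series, or from inputs.
The example below creates an empty Series object that has no values. Its default dtype is
object in pandas 2.0 and later (older versions used float64).
Example:
import pandas as pd
x = pd.Series()
print(x)
output:
Series([], dtype: object)
A Series can be created from the following inputs:
o Array
o Dict
o Scalar value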
Before creating a Series from an array, we first import the numpy module and use its array() function.
If the data is an ndarray, then any index passed must be of the same length as the data.
If we do not pass an index, a default index of range(n) is used, where n is the length of the
array, i.e., [0, 1, 2, ..., len(array) - 1].
Example:
import pandas as pd
import numpy as np
info= np.array(['P','a','n','d','a','s'])
a= pd.Series(info)
print(a)
output:
0 P
1 a
2 n
3 d
4 a
5 s
dtype: object
We can also create a Series from a dict. If a dictionary is passed as input and the index is not
specified, the dictionary keys are used to construct the index (in insertion order in current
pandas; older versions sorted them). If an index is passed, the values corresponding to the labels
in the index are extracted from the dictionary.
Example:
import pandas as pd
import numpy as np
info = {'x': 0., 'y': 1., 'z': 2.}
a = pd.Series(info)
print(a)
Output:
x 0.0
y 1.0
z 2.0
dtype: float64
► Series Functions
There are some functions used in Series which are as follows:
Functions              Description
Pandas Series.map()    Map the values from two series that have a common column.
Pandas Series.std()    Calculate the standard deviation of the given set of numbers, DataFrame,
                       column, and rows.
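A hedged sketch of the two functions listed above. The sample values and variable names are illustrative assumptions. Note that Series.std() uses the sample standard deviation (ddof=1) by default, whereas numpy.std() defaults to ddof=0:

```python
import pandas as pd

# map(): look up each value of one Series in the index of another Series
codes = pd.Series(["a", "b", "a"])
lookup = pd.Series({"a": 1, "b": 2})
mapped = codes.map(lookup)
print(mapped)

# std(): sample standard deviation by default (divides by n - 1)
s = pd.Series([1, 2, 3, 4])
sample_std = s.std()

# Passing ddof=0 gives the population form that numpy.std() uses by default
population_std = s.std(ddof=0)
print(sample_std, population_std)
```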
Dataframes: DataFrames()
• Pandas DataFrame is a widely used data structure which works with a two-dimensional
array with labeled axes (rows and columns).
• DataFrame is defined as a standard way to store data that has two different indexes, i.e., row
index and column index. It consists of the following properties:
• The columns can be heterogeneous types like int, bool, and so on.
• It can be seen as a dictionary of Series structure where both the rows and columns are
indexed. It is denoted as "columns" in case of columns and "index" in case of rows.
data: It can take different forms, such as ndarray, Series, map, constants, lists, or arrays.
index: The default index np.arange(n) is used for the row labels if no index is passed.
columns: The default np.arange(n) is used for the column labels if no column labels are
passed.
[Figure: a DataFrame with the columns Regd.No, Name, and Percentage of Marks]
• Create a DataFrame
We can create a DataFrame from the following inputs:
o dict
o Lists
o NumPy ndarrays
o Series
Example:
Output
Empty DataFrame
Columns:[]
Index: []
Example:
import pandas as pd
# a list of strings
x = ['Python', 'Pandas']
# build a DataFrame from the list and display it
df = pd.DataFrame(x)
print(df)
Output
0
0 Python
1 Pandas
Lists Example:
Output
ID Department
0 101 B.Sc
1 102 B.Tech
2 103 M.Tech
Series: Example:
Output:
one two
a 1.0 1
b 2.0 2
c 3.0 3
d 4.0 4
e 5.0 5
f 6.0 6
g NaN 7
h NaN 8
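A minimal sketch of DataFrame creation that reproduces the kinds of output shown above. The input values are assumptions reconstructed from those outputs, and the variable names are illustrative:

```python
import pandas as pd

# Empty DataFrame
empty_df = pd.DataFrame()
print(empty_df)

# From a dict of lists: the keys become the column labels
info = {"ID": [101, 102, 103], "Department": ["B.Sc", "B.Tech", "M.Tech"]}
df_lists = pd.DataFrame(info)
print(df_lists)

# From a dict of Series: rows are aligned on the union of the two indexes,
# so labels missing from the shorter Series are filled with NaN
one = pd.Series([1, 2, 3, 4, 5, 6], index=list("abcdef"))
two = pd.Series([1, 2, 3, 4, 5, 6, 7, 8], index=list("abcdefgh"))
df_series = pd.DataFrame({"one": one, "two": two})
print(df_series)
```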
To read a CSV file as a pandas.DataFrame, use the pandas function read_csv() or read_table().
The two functions are nearly identical; the main difference is the default delimiter, which is a
comma for read_csv() and a tab for read_table().
The pandas function read_csv() reads in values, where the delimiter is a comma character.
You can export a file into a csv file in any modern office suite including Google Sheets.
Example:
name,age,state,point
Alice,24,NY,64
Bob,42,CA,92
Charlie,18,CA,70
Dave,68,TX,70
Ellen,24,CA,88
Frank,30,NY,57
# Load pandas
import pandas as pd
# Show dataframe
print(df)
output:
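A self-contained sketch of reading the sample data above. To avoid assuming a file name, it reads the CSV text from an in-memory buffer with io.StringIO; with a real file you would pass the file path to read_csv() instead:

```python
import io
import pandas as pd

# The sample CSV content from above, held in memory instead of a file
csv_text = """name,age,state,point
Alice,24,NY,64
Bob,42,CA,92
Charlie,18,CA,70
Dave,68,TX,70
Ellen,24,CA,88
Frank,30,NY,57
"""

# read_csv() splits on commas by default
df = pd.read_csv(io.StringIO(csv_text))
print(df)

# read_table() splits on tabs by default, so the separator must be given here
df_table = pd.read_table(io.StringIO(csv_text), sep=",")
```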
Empty cells can potentially give you a wrong result when you analyze data.
• Remove Rows
One way to deal with empty cells is to remove rows that contain empty cells.
This is usually OK, since data sets can be very big, and removing a few rows will not have a big
impact on the result.
import pandas as pd
df = pd.read_csv('data.csv')
new_df = df.dropna()
print(new_df.to_string())
• Removing Rows
Another way of handling wrong data is to remove the rows that contain wrong data.
This way you do not have to find out what to replace them with, and there is a good chance you do
not need them to do your analyses.
for x in df.index:
    if df.loc[x, "Duration"] > 120:
        df.drop(x, inplace = True)
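Both cleaning steps can be sketched on a small in-memory DataFrame. The data below is hypothetical (only the Duration column name and the 120 threshold come from the text above), and the boolean mask is an alternative to the row-by-row loop, not the notes' own method:

```python
import pandas as pd

# Hypothetical data: one implausible Duration (450) and two rows with empty cells
df = pd.DataFrame({
    "Duration": [60, 45, 450, None, 30],
    "Pulse": [110, 117, 103, 109, None],
})

# Step 1: remove every row that contains an empty cell
no_empty = df.dropna()

# Step 2: keep only the rows whose Duration is at most 120
cleaned = no_empty[no_empty["Duration"] <= 120]

print(cleaned.to_string())
```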
The duplicated() method returns a Series with True and False values that describe which rows in
the DataFrame are duplicated and not.
Use the subset parameter to specify which columns to include when looking for duplicates. By
default all columns are included.
By default, the first occurrence of two or more duplicates will be set to False.
Set the keep parameter to False to also set the first occurrence to True.
Syntax
dataframe.duplicated(subset, keep)
subset   column label(s)   Optional. A string, or a list, of the column names to include
                           when looking for duplicates. Default subset=None
                           (meaning no subset is specified, and all columns should be
                           included).
Example:
s = df.duplicated(subset=["name", "age"])
print(s)
output:
0 False
1 False
2 True
3 False
4 True
dtype: bool
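A runnable sketch of duplicated(). The rows are hypothetical, arranged so that the default call reproduces the True/False pattern shown above; the keep=False and drop_duplicates() calls are added for comparison:

```python
import pandas as pd

# Hypothetical data: rows 2 and 4 repeat rows 0 and 1
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Alice", "Charlie", "Bob"],
    "age": [24, 42, 24, 18, 42],
})

# Default keep='first': the first occurrence is marked False, later ones True
s = df.duplicated(subset=["name", "age"])
print(s)

# keep=False marks every member of a duplicate group as True
all_marked = df.duplicated(subset=["name", "age"], keep=False)

# drop_duplicates() removes the rows that duplicated() marks as True
unique_rows = df.drop_duplicates(subset=["name", "age"])
```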
We can use Pyplot, a submodule of the Matplotlib library to visualize the diagram on the screen.
Example:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('data.csv')
df.plot()
plt.show()
[Figure: line plot of the Duration, Pulse, Maxpulse, and Calories columns; x-axis from 0 to 160, y-axis from 0 to 1750]
2.3 Matplotlib
What is Matplotlib?
Matplotlib is a low level graph plotting library in python that serves as a visualization
utility.
Matplotlib is mostly written in Python; a few segments are written in C, Objective-C, and
JavaScript for platform compatibility.
Most of the Matplotlib utilities lie under the pyplot submodule, and are usually imported under
the plt alias:
Example:
plt.plot(xpoints, ypoints)
plt.show()
[Figure: line plot; x-axis from 0 to 6, y-axis from 0 to 250]
Example:
import numpy as np
import matplotlib.pyplot as plt
x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])
plt.plot(x, y)
plt.show()
[Figure: line plot of y against x; y-axis ticks from 240 to 320]
Grid: grid()
With Pyplot, you can use the grid() function to add grid lines to the plot.
Example:
Result:
[Figure: the line plot with grid lines; x-axis labelled Average Pulse]
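A minimal sketch of grid(), reusing the x and y values from the line plot example earlier in this section. The axis label texts and the Agg backend line are assumptions; the backend line only makes the sketch runnable without a display and can be removed to open a window:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for headless runs
import matplotlib.pyplot as plt
import numpy as np

x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])

plt.plot(x, y)
plt.xlabel("Average Pulse")    # assumed label text
plt.ylabel("Calorie Burnage")  # assumed label text

# Draw grid lines for both axes; plt.grid(axis="x") or axis="y" limits them to one
plt.grid()
plt.show()
```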
Bars: bar()
With Pyplot, you can use the bar() function to draw bar graphs:
Example:
Draw 4 bars:
plt.bar(x,y)
plt.show()
[Figure: bar chart with four bars labelled A, B, C, D]
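A self-contained sketch of the bar chart. The category labels match the figure above, but the bar heights are illustrative assumptions, and the Agg backend line is only there so the sketch runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for headless runs
import matplotlib.pyplot as plt
import numpy as np

x = np.array(["A", "B", "C", "D"])  # category labels on the x-axis
y = np.array([3, 8, 1, 10])         # assumed bar heights

# One bar per category; plt.barh(x, y) would draw horizontal bars instead
plt.bar(x, y)
plt.show()
```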
Histogram: hist()
A histogram is a graph showing frequency distributions.
Example: Say you ask for the height of 250 people, you might end up with a histogram like this:
The hist() function will use an array of numbers to create a histogram, the array is sent into the
function as an argument.
Example
plt.hist(x)
plt.show()
[Figure: histogram; x-axis from 140 to 190, y-axis from 0 to 50]
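A self-contained sketch of hist() for the 250-heights scenario. The data is randomly generated from a normal distribution with an assumed mean of 170 and standard deviation of 10, so the exact bar heights will differ from the figure; the Agg backend line is only there so the sketch runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for headless runs
import matplotlib.pyplot as plt
import numpy as np

# 250 simulated heights (assumed distribution: mean 170, standard deviation 10)
rng = np.random.default_rng(0)
heights = rng.normal(170, 10, 250)

# hist() sorts the values into bins (10 by default) and draws one bar per bin;
# it returns the bin counts, the bin edges, and the drawn bars
counts, edges, bars = plt.hist(heights)
plt.show()
```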
Subplot: subplot()
With the subplot() function you can draw multiple plots in one figure:
Result:
[Figure: two line plots drawn side by side in one figure]
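A minimal sketch of subplot() that draws two plots side by side. The data values are illustrative assumptions, and the Agg backend line is only there so the sketch runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for headless runs
import matplotlib.pyplot as plt
import numpy as np

x = np.array([0, 1, 2, 3])

# subplot(rows, columns, index): a grid of 1 row and 2 columns, first cell
plt.subplot(1, 2, 1)
plt.plot(x, np.array([3, 8, 1, 10]))      # assumed values

# Same grid, second cell
plt.subplot(1, 2, 2)
plt.plot(x, np.array([10, 20, 30, 40]))   # assumed values

plt.show()
```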
Pie chart: pie()
With Pyplot, you can use the pie() function to draw pie charts:
Example:
plt.pie(y)
plt.show()
Result:
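A self-contained sketch of pie(). The values in y are illustrative assumptions, and the Agg backend line is only there so the sketch runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for headless runs
import matplotlib.pyplot as plt
import numpy as np

# Assumed values; each wedge's share is its value divided by the sum of all values
y = np.array([35, 25, 25, 15])

plt.pie(y)
plt.show()
```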
Save the plotted images into pdf: savefig()
import pandas as pd
from matplotlib import pyplot as plt
d = {'Column 1': [i for i in range(10)], 'Column 2': [i * i for i in range(10)]}
df = pd.DataFrame(d)
df.plot(style=['o', 'rx'])
plt.savefig("myImagePDF.pdf", format="pdf", bbox_inches="tight")
plt.show()
output:
[Figure: Column 1 plotted as dots and Column 2 as red crosses against the index 0 to 9]
2.4 sklearn
What is Sklearn?
from sklearn.datasets import load_iris
iris = load_iris()
# comparing actual response values (y_test) with predicted response values (y_pred)
from sklearn import metrics
print("kNN model accuracy:", metrics.accuracy_score(y_test, y_pred))
Output:
kNN model accuracy: 0.983333333333
Predictions: ['versicolor', 'virginica']
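scikit-learn (imported as sklearn) is an open-source Python library for machine learning. The fragments above fit a typical k-nearest-neighbours workflow on the Iris dataset, which can be sketched end to end as follows. The split size, random_state, n_neighbors, and the two sample measurements are assumptions, so the printed accuracy and predictions may differ from the output above:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn import metrics

# Load the Iris dataset: 150 flowers, 4 features each, 3 species
iris = load_iris()
X, y = iris.data, iris.target

# Hold out 40% of the rows for testing (assumed split and seed)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=1
)

# Train a k-nearest-neighbours classifier (k=3 is an assumption)
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# comparing actual response values (y_test) with predicted response values (y_pred)
y_pred = knn.predict(X_test)
accuracy = metrics.accuracy_score(y_test, y_pred)
print("kNN model accuracy:", accuracy)

# Predict the species of two made-up flowers (assumed measurements)
sample = [[3, 5, 4, 2], [2, 3, 5, 4]]
pred_species = [str(iris.target_names[p]) for p in knn.predict(sample)]
print("Predictions:", pred_species)
```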