ML Journal

1) Assignment on Practice of NumPy Library

In [1]: import numpy as np

create numpy array containing the numbers from 1 to 10

In [2]: a=np.array([1,2,3,4,5,6,7,8,9,10])
print(a)

[ 1 2 3 4 5 6 7 8 9 10]

create a nested numpy array containing the numbers from 1 to 10

In [3]: a1=np.array([(1,2,3,5,6),[3,4,5,6,7]])
print(a1)

[[1 2 3 5 6]
[3 4 5 6 7]]

convert the python list to numpy array

In [4]: l=[1,2,3,4,5,6]
print("list",l)
arr=np.array(l)
print("array:",arr)

list [1, 2, 3, 4, 5, 6]
array: [1 2 3 4 5 6]

create 50 evenly spaced numbers between 1 and 10

In [5]: f=np.linspace(1,10,50)
print(f)

[ 1. 1.18367347 1.36734694 1.55102041 1.73469388 1.91836735
2.10204082 2.28571429 2.46938776 2.65306122 2.83673469 3.02040816
3.20408163 3.3877551 3.57142857 3.75510204 3.93877551 4.12244898
4.30612245 4.48979592 4.67346939 4.85714286 5.04081633 5.2244898
5.40816327 5.59183673 5.7755102 5.95918367 6.14285714 6.32653061
6.51020408 6.69387755 6.87755102 7.06122449 7.24489796 7.42857143
7.6122449 7.79591837 7.97959184 8.16326531 8.34693878 8.53061224
8.71428571 8.89795918 9.08163265 9.26530612 9.44897959 9.63265306
9.81632653 10. ]
create a 5 by 5 matrix which contains random samples from the standard normal distribution (note: the cell below actually draws from the uniform [0, 1) distribution; a standard-normal sketch follows its output)

In [6]: b=np.random.random((5,5))
print(b)
print("length :",len(b))
print("max :",np.max(b))
print("min:",np.min(b))

[[0.15440899 0.42294149 0.46214078 0.21596251 0.08636295]
[0.78784204 0.7245969 0.32918659 0.32918339 0.4520661 ]
[0.24292908 0.90863825 0.31756834 0.73344836 0.90501618]
[0.6991077 0.3173603 0.4995183 0.02597746 0.9563307 ]
[0.80797122 0.57241344 0.22372026 0.93136353 0.75959641]]
length : 5
max : 0.9563306963992193
min: 0.025977463053625804
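np.random.random samples the uniform [0, 1) distribution, as the values above (all between 0 and 1) show. For samples from the standard normal distribution, as the prompt asks, a minimal sketch:

In [ ]: b=np.random.randn(5,5) # standard normal: mean 0, standard deviation 1
print(b)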

create 20 random integer numbers between 1 and 100 as a numpy array

In [7]: c=np.random.randint(1,100,20)
print(c)

[50 43 72 85 95 79 33 35 40 18 72 49 85 36 63 42 37 56 37 92]

given the numpy array arr, reverse its elements and find its size

In [8]: arr=np.array([1,2,3,4,6,8,9])
print(arr)
#print(np.flip(arr))
print(np.flip(arr,0))
print("size is :",np.size(arr))

[1 2 3 4 6 8 9]
[9 8 6 4 3 2 1]
size is : 7

find the mean, median and standard deviation of the array

In [9]: array=np.array([1,2,4,5,6])
print("mean: ",np.mean(array))
print("median: ",np.median(array))
print("standard deviation :",np.std(array))

mean: 3.6
median: 4.0
standard deviation : 1.8547236990991407

create a 3 by 3 matrix with all values set to 1

In [10]: a=np.ones((3,3))
print(a)

[[1. 1. 1.]
[1. 1. 1.]
[1. 1. 1.]]

create a 3 by 3 matrix with all values set to 0

In [11]: b=np.zeros((3,3))
print(b)

[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]

given two numpy arrays arr1 and arr2, concatenate them horizontally and vertically

In [12]: arr1=np.array([(1,3,4,7),(3,5,6,7)])
arr2=np.array([[9,10,4,5],[7,9,0,5]])
print("flattened")
c=np.concatenate((arr1,arr2),axis=None) # axis=None flattens both arrays into one 1-D array
print(c)
print("horizontal")
c1=np.concatenate((arr1,arr2),axis=1) # axis=1 joins the 2-D arrays side by side (columns)
print(c1)

print("using hstack")
d=np.hstack((arr1,arr2)) #horizontal
print(d)

print("using vstack")
d1=np.vstack((arr1,arr2)) #vertical
print(d1)

flattened
[ 1 3 4 7 3 5 6 7 9 10 4 5 7 9 0 5]
horizontal
[[ 1 3 4 7 9 10 4 5]
[ 3 5 6 7 7 9 0 5]]
using hstack
[[ 1 3 4 7 9 10 4 5]
[ 3 5 6 7 7 9 0 5]]
using vstack
[[ 1 3 4 7]
[ 3 5 6 7]
[ 9 10 4 5]
[ 7 9 0 5]]

In [13]: arr1=np.array([1,3,4,7])
arr2=np.array([9,10,4,5])
print("horizontal")
c=np.concatenate((arr1,arr2),axis=None) # joins the 1-D arrays end to end
print(c)

#print("vertical")
#c1=np.concatenate((arr1,arr2),axis=1) # axis=1 raises an error for 1-D arrays
#print(c1)

horizontal
[ 1 3 4 7 9 10 4 5]

create numpy arrays containing all even and all odd numbers below 20

In [14]: print(np.arange(0,20,2))
print(np.arange(1,20,2))

[ 0 2 4 6 8 10 12 14 16 18]
[ 1 3 5 7 9 11 13 15 17 19]

perform element-wise multiplication of two array a and b

In [15]: a=np.array([1,2,3,4])
b=np.array([5,6,7,8])
#c=np.matmul(a,b) # for 1-D arrays matmul is the inner (dot) product, a scalar
#print(c)
#print(np.dot(a,b)) # dot product
print(np.multiply(a,b))

[ 5 12 21 32]

Reshape the numpy array into a 3 by 3 matrix

In [16]: a=np.array([1,3,4,56,7,8,9,9,1])
newa=a.reshape(3,3)
print(newa)

[[ 1 3 4]
[56 7 8]
[ 9 9 1]]

find the maximum and minimum values in the numpy array

In [17]: a=np.arange(0,30)
print(a)
print("max:", np.max(a))
print("min : ",np.min(a))

[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
24 25 26 27 28 29]
max: 29
min : 0

calculate the dot product of two numpy arrays x and y

In [18]: x=np.array([1,2,3,4])
y=np.array([5,6,7,8])
print(np.dot(x,y))

70

2) Assignment on Practice of Pandas Library
In [1]: import pandas as pd
import numpy as np

In [2]: from numpy.random import randn


np.random.seed(101)

In [3]: df=pd.DataFrame(randn(5,4),index='A B C D E'.split(),columns='W X Y Z'.split())


df

Out[3]: W X Y Z

A 2.706850 0.628133 0.907969 0.503826

B 0.651118 -0.319318 -0.848077 0.605965

C -2.018168 0.740122 0.528813 -0.589001

D 0.188695 -0.758872 -0.933237 0.955057

E 0.190794 1.978757 2.605967 0.683509

In [4]: df['W']

Out[4]: A 2.706850
B 0.651118
C -2.018168
D 0.188695
E 0.190794
Name: W, dtype: float64

In [5]: df[['W','Z']]

Out[5]: W Z

A 2.706850 0.503826

B 0.651118 0.605965

C -2.018168 -0.589001

D 0.188695 0.955057

E 0.190794 0.683509

In [6]: df.W

Out[6]: A 2.706850
B 0.651118
C -2.018168
D 0.188695
E 0.190794
Name: W, dtype: float64
In [7]: type(df['W'])

Out[7]: pandas.core.series.Series

In [8]: df['new']=df['W']+df['Y']
df

Out[8]: W X Y Z new

A 2.706850 0.628133 0.907969 0.503826 3.614819

B 0.651118 -0.319318 -0.848077 0.605965 -0.196959

C -2.018168 0.740122 0.528813 -0.589001 -1.489355

D 0.188695 -0.758872 -0.933237 0.955057 -0.744542

E 0.190794 1.978757 2.605967 0.683509 2.796762

In [9]: df.drop('new',axis=1)

Out[9]: W X Y Z

A 2.706850 0.628133 0.907969 0.503826

B 0.651118 -0.319318 -0.848077 0.605965

C -2.018168 0.740122 0.528813 -0.589001

D 0.188695 -0.758872 -0.933237 0.955057

E 0.190794 1.978757 2.605967 0.683509

In [10]: df

Out[10]: W X Y Z new

A 2.706850 0.628133 0.907969 0.503826 3.614819

B 0.651118 -0.319318 -0.848077 0.605965 -0.196959

C -2.018168 0.740122 0.528813 -0.589001 -1.489355

D 0.188695 -0.758872 -0.933237 0.955057 -0.744542

E 0.190794 1.978757 2.605967 0.683509 2.796762

In [11]: df.drop('new',axis=1,inplace=True)
df

Out[11]: W X Y Z

A 2.706850 0.628133 0.907969 0.503826

B 0.651118 -0.319318 -0.848077 0.605965

C -2.018168 0.740122 0.528813 -0.589001

D 0.188695 -0.758872 -0.933237 0.955057

E 0.190794 1.978757 2.605967 0.683509

In [12]: df.drop('E',axis=0)

Out[12]: W X Y Z

A 2.706850 0.628133 0.907969 0.503826

B 0.651118 -0.319318 -0.848077 0.605965

C -2.018168 0.740122 0.528813 -0.589001

D 0.188695 -0.758872 -0.933237 0.955057


In [13]: df.loc['A']

Out[13]: W 2.706850
X 0.628133
Y 0.907969
Z 0.503826
Name: A, dtype: float64

In [14]: df.iloc[2]

Out[14]: W -2.018168
X 0.740122
Y 0.528813
Z -0.589001
Name: C, dtype: float64

In [15]: df.loc['B','Y']

Out[15]: -0.8480769834036315

In [16]: df.loc[['A','B'],['W','Y']]

Out[16]: W Y

A 2.706850 0.907969

B 0.651118 -0.848077

In [17]: df

Out[17]: W X Y Z

A 2.706850 0.628133 0.907969 0.503826

B 0.651118 -0.319318 -0.848077 0.605965

C -2.018168 0.740122 0.528813 -0.589001

D 0.188695 -0.758872 -0.933237 0.955057

E 0.190794 1.978757 2.605967 0.683509

In [18]: df>0

Out[18]: W X Y Z

A True True True True

B True False False True

C False True True False

D True False False True

E True True True True

In [19]: df[df>0]

Out[19]: W X Y Z

A 2.706850 0.628133 0.907969 0.503826

B 0.651118 NaN NaN 0.605965

C NaN 0.740122 0.528813 NaN

D 0.188695 NaN NaN 0.955057

E 0.190794 1.978757 2.605967 0.683509

In [20]: df[df['W']>0]

Out[20]: W X Y Z

A 2.706850 0.628133 0.907969 0.503826

B 0.651118 -0.319318 -0.848077 0.605965

D 0.188695 -0.758872 -0.933237 0.955057

E 0.190794 1.978757 2.605967 0.683509

In [21]: df[df['W']>0]['Y']

Out[21]: A 0.907969
B -0.848077
D -0.933237
E 2.605967
Name: Y, dtype: float64

In [22]: df[df['W']>0][['Y','X']]

Out[22]: Y X

A 0.907969 0.628133

B -0.848077 -0.319318

D -0.933237 -0.758872

E 2.605967 1.978757

In [23]: df[(df['W']>0)&(df['Y']>1)]

Out[23]: W X Y Z

E 0.190794 1.978757 2.605967 0.683509

In [24]: df

Out[24]: W X Y Z

A 2.706850 0.628133 0.907969 0.503826

B 0.651118 -0.319318 -0.848077 0.605965

C -2.018168 0.740122 0.528813 -0.589001

D 0.188695 -0.758872 -0.933237 0.955057

E 0.190794 1.978757 2.605967 0.683509


In [25]: df.reset_index()

Out[25]: index W X Y Z

0 A 2.706850 0.628133 0.907969 0.503826

1 B 0.651118 -0.319318 -0.848077 0.605965

2 C -2.018168 0.740122 0.528813 -0.589001

3 D 0.188695 -0.758872 -0.933237 0.955057

4 E 0.190794 1.978757 2.605967 0.683509

In [26]: newind='CA NY WY OR CO'.split()

In [27]: df['States']=newind
df

Out[27]: W X Y Z States

A 2.706850 0.628133 0.907969 0.503826 CA

B 0.651118 -0.319318 -0.848077 0.605965 NY

C -2.018168 0.740122 0.528813 -0.589001 WY

D 0.188695 -0.758872 -0.933237 0.955057 OR

E 0.190794 1.978757 2.605967 0.683509 CO

In [28]: df.set_index('States')

Out[28]: W X Y Z

States

CA 2.706850 0.628133 0.907969 0.503826

NY 0.651118 -0.319318 -0.848077 0.605965

WY -2.018168 0.740122 0.528813 -0.589001

OR 0.188695 -0.758872 -0.933237 0.955057

CO 0.190794 1.978757 2.605967 0.683509

In [29]: df

Out[29]: W X Y Z States

A 2.706850 0.628133 0.907969 0.503826 CA

B 0.651118 -0.319318 -0.848077 0.605965 NY

C -2.018168 0.740122 0.528813 -0.589001 WY

D 0.188695 -0.758872 -0.933237 0.955057 OR

E 0.190794 1.978757 2.605967 0.683509 CO

In [30]: df.set_index('States',inplace=True)
df

Out[30]: W X Y Z

States

CA 2.706850 0.628133 0.907969 0.503826

NY 0.651118 -0.319318 -0.848077 0.605965

WY -2.018168 0.740122 0.528813 -0.589001

OR 0.188695 -0.758872 -0.933237 0.955057

CO 0.190794 1.978757 2.605967 0.683509

In [31]: outside=['G1','G1','G1','G2','G2','G2']
inside=[1,2,3,1,2,3]
hier_index=list(zip(outside,inside))
hier_index=pd.MultiIndex.from_tuples(hier_index)

In [32]: hier_index

Out[32]: MultiIndex([('G1', 1),


('G1', 2),
('G1', 3),
('G2', 1),
('G2', 2),
('G2', 3)],
)

In [33]: df=pd.DataFrame(np.random.randn(6,2),index=hier_index,columns=['A','B'])
df

Out[33]: A B

1 0.302665 1.693723

G1 2 -1.706086 -1.159119

3 -0.134841 0.390528

1 0.166905 0.184502

G2 2 0.807706 0.072960

3 0.638787 0.329646

In [34]: df.loc['G1']

Out[34]: A B

1 0.302665 1.693723

2 -1.706086 -1.159119

3 -0.134841 0.390528

In [35]: df.loc['G1'].loc[1]

Out[35]: A 0.302665
B 1.693723
Name: 1, dtype: float64
In [36]: df.index.names

Out[36]: FrozenList([None, None])

In [37]: df.index.names=['Group','Num']
df

Out[37]: A B

Group Num

1 0.302665 1.693723

G1 2 -1.706086 -1.159119

3 -0.134841 0.390528

1 0.166905 0.184502

G2 2 0.807706 0.072960

3 0.638787 0.329646

In [38]: df.xs('G1')

Out[38]: A B

Num

1 0.302665 1.693723

2 -1.706086 -1.159119

3 -0.134841 0.390528

In [40]: df.xs(('G1',1))

Out[40]: A 0.302665
B 1.693723
Name: (G1, 1), dtype: float64

In [41]: df.xs(1,level='Num')

Out[41]: A B

Group

G1 0.302665 1.693723

G2 0.166905 0.184502

3) Assignment on Find-S Algorithm. Apply it on the 'Enjoy Sport' dataset to find the specific hypothesis for it.
In [1]: import pandas as pd
import numpy as np

Loading Dataset
In [2]: data=pd.read_csv("tennis.csv")
print(data)

outlook temp humidity windy play


0 sunny hot high False no
1 sunny hot high True no
2 overcast hot high False yes
3 rainy mild high False yes
4 rainy cool normal False yes
5 rainy cool normal True no
6 overcast cool normal True yes
7 sunny mild high False no
8 sunny cool normal False yes
9 rainy mild normal False yes
10 sunny mild normal True yes
11 overcast mild high True yes
12 overcast hot normal False yes
13 rainy mild high True no

In [3]: d=np.array(data)[:,:-1]
print("the attributes are: ",d)

the attributes are: [['sunny' 'hot' 'high' False]
['sunny' 'hot' 'high' True]
['overcast' 'hot' 'high' False]
['rainy' 'mild' 'high' False]
['rainy' 'cool' 'normal' False]
['rainy' 'cool' 'normal' True]
['overcast' 'cool' 'normal' True]
['sunny' 'mild' 'high' False]
['sunny' 'cool' 'normal' False]
['rainy' 'mild' 'normal' False]
['sunny' 'mild' 'normal' True]
['overcast' 'mild' 'high' True]
['overcast' 'hot' 'normal' False]
['rainy' 'mild' 'high' True]]

In [4]: target= np.array(data)[:,-1]


print("the target is :",target)

the target is : ['no' 'no' 'yes' 'yes' 'yes' 'no' 'yes' 'no' 'yes' 'yes' 'yes' 'yes' 'yes' 'no']
Find-S Algorithm
In [5]: def train(d, t):
    specific_hypothesis = None  # initialize specific_hypothesis within the function

    # take the first positive example as the initial hypothesis
    for i, val in enumerate(t):
        if val == "yes":
            specific_hypothesis = d[i].copy()
            break

    if specific_hypothesis is None:
        return "No positive example found in the target"

    # generalize the hypothesis over all remaining positive examples
    for i, val in enumerate(d):
        if t[i] == "yes":
            for x in range(len(specific_hypothesis)):
                if val[x] != specific_hypothesis[x]:
                    specific_hypothesis[x] = '?'

    return specific_hypothesis

In [6]: print("the final hypothesis is :",train(d,target))

the final hypothesis is : ['?' '?' '?' '?']
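The hypothesis generalizes all the way to ['?' '?' '?' '?'] because the positive examples in this tennis data cover every value of every attribute, so Find-S drops each constraint in turn. On data that is consistent with a single conjunctive concept the algorithm keeps informative constraints; a minimal check of train() on a small hypothetical four-attribute sample:

In [ ]: d2=np.array([['sunny','warm','normal','strong'],
['sunny','warm','high','strong'],
['rainy','cold','high','strong'],
['sunny','warm','high','strong']])
t2=np.array(['yes','yes','no','yes'])
print(train(d2,t2)) # -> ['sunny' 'warm' '?' 'strong']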

4) Assignment on Candidate Elimination Algorithm. Apply it on the Enjoy Sport dataset to find the version space for it.
In [1]: import pandas as pd
import numpy as np

Loading Dataset
In [2]: data=pd.read_csv("tennis.csv")
print(data)

outlook temp humidity windy play


0 sunny hot high False no
1 sunny hot high True no
2 overcast hot high False yes
3 rainy mild high False yes
4 rainy cool normal False yes
5 rainy cool normal True no
6 overcast cool normal True yes
7 sunny mild high False no
8 sunny cool normal False yes
9 rainy mild normal False yes
10 sunny mild normal True yes
11 overcast mild high True yes
12 overcast hot normal False yes
13 rainy mild high True no

In [3]: d=np.array(data)[:,:-1]
print("the attributes are: ",d)

the attributes are: [['sunny' 'hot' 'high' False]
['sunny' 'hot' 'high' True]
['overcast' 'hot' 'high' False]
['rainy' 'mild' 'high' False]
['rainy' 'cool' 'normal' False]
['rainy' 'cool' 'normal' True]
['overcast' 'cool' 'normal' True]
['sunny' 'mild' 'high' False]
['sunny' 'cool' 'normal' False]
['rainy' 'mild' 'normal' False]
['sunny' 'mild' 'normal' True]
['overcast' 'mild' 'high' True]
['overcast' 'hot' 'normal' False]
['rainy' 'mild' 'high' True]]

In [4]: target= np.array(data)[:,-1]


print("the target is :",target)

the target is : ['no' 'no' 'yes' 'yes' 'yes' 'no' 'yes' 'no' 'yes' 'yes' 'yes' 'yes' 'yes' 'no']
Candidate Elimination Algorithm
In [5]: def learn(d, target):
    specific_h = d[0].copy()
    print("initialization of specific_h and general_h")
    print(specific_h)

    general_h = [["?" for i in range(len(specific_h))] for i in range(len(specific_h))]
    print(general_h)

    for i, h in enumerate(d):
        if target[i] == "yes":
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    specific_h[x] = '?'
                    general_h[x][x] = '?'

        if target[i] == "no":
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    general_h[x][x] = specific_h[x]
                else:
                    general_h[x][x] = '?'

        print(" steps of Candidate Elimination Algorithm", i+1)
        print(specific_h)
        print(general_h)

    # note: this sentinel has six '?' but each row of general_h has only four,
    # so it never matches and fully general rows are not pruned (see the final output)
    indices = [i for i, val in enumerate(general_h) if val == ['?', '?', '?', '?', '?', '?']]
    for i in indices:
        general_h.remove(['?', '?', '?', '?', '?', '?'])
    return specific_h, general_h

s_final, g_final = learn(d, target)
print("Final Specific_h:", s_final, sep="\n")
print("Final General_h:", g_final, sep="\n")

initialization of specific_h and general_h
['sunny' 'hot' 'high' False]
[['?', '?', '?', '?'], ['?', '?', '?', '?'], ['?', '?', '?', '?'], ['?', '?', '?',
'?']]
steps of Candidate Elimination Algorithm 1
['sunny' 'hot' 'high' False]
[['?', '?', '?', '?'], ['?', '?', '?', '?'], ['?', '?', '?', '?'], ['?', '?', '?',
'?']]
steps of Candidate Elimination Algorithm 2
['sunny' 'hot' 'high' False]
[['?', '?', '?', '?'], ['?', '?', '?', '?'], ['?', '?', '?', '?'], ['?', '?', '?',
False]]
steps of Candidate Elimination Algorithm 3
['?' 'hot' 'high' False]
[['?', '?', '?', '?'], ['?', '?', '?', '?'], ['?', '?', '?', '?'], ['?', '?', '?',
False]]
steps of Candidate Elimination Algorithm 4
['?' '?' 'high' False]
[['?', '?', '?', '?'], ['?', '?', '?', '?'], ['?', '?', '?', '?'], ['?', '?', '?',
False]]
steps of Candidate Elimination Algorithm 5
['?' '?' '?' False]
[['?', '?', '?', '?'], ['?', '?', '?', '?'], ['?', '?', '?', '?'], ['?', '?', '?',
False]]
steps of Candidate Elimination Algorithm 6
['?' '?' '?' False]
[['?', '?', '?', '?'], ['?', '?', '?', '?'], ['?', '?', '?', '?'], ['?', '?', '?',
False]]
steps of Candidate Elimination Algorithm 7
['?' '?' '?' '?']
[['?', '?', '?', '?'], ['?', '?', '?', '?'], ['?', '?', '?', '?'], ['?', '?', '?',
'?']]
steps of Candidate Elimination Algorithm 8
['?' '?' '?' '?']
[['?', '?', '?', '?'], ['?', '?', '?', '?'], ['?', '?', '?', '?'], ['?', '?', '?',
'?']]
steps of Candidate Elimination Algorithm 9
['?' '?' '?' '?']
[['?', '?', '?', '?'], ['?', '?', '?', '?'], ['?', '?', '?', '?'], ['?', '?', '?',
'?']]
steps of Candidate Elimination Algorithm 10
['?' '?' '?' '?']
[['?', '?', '?', '?'], ['?', '?', '?', '?'], ['?', '?', '?', '?'], ['?', '?', '?',
'?']]
steps of Candidate Elimination Algorithm 11
['?' '?' '?' '?']
[['?', '?', '?', '?'], ['?', '?', '?', '?'], ['?', '?', '?', '?'], ['?', '?', '?',
'?']]
steps of Candidate Elimination Algorithm 12
['?' '?' '?' '?']
[['?', '?', '?', '?'], ['?', '?', '?', '?'], ['?', '?', '?', '?'], ['?', '?', '?',
'?']]
steps of Candidate Elimination Algorithm 13
['?' '?' '?' '?']
[['?', '?', '?', '?'], ['?', '?', '?', '?'], ['?', '?', '?', '?'], ['?', '?', '?',
'?']]
steps of Candidate Elimination Algorithm 14
['?' '?' '?' '?']
[['?', '?', '?', '?'], ['?', '?', '?', '?'], ['?', '?', '?', '?'], ['?', '?', '?',
'?']]
Final Specific_h:
['?' '?' '?' '?']
Final General_h:
[['?', '?', '?', '?'], ['?', '?', '?', '?'], ['?', '?', '?', '?'], ['?', '?', '?',
'?']]

5) Assignment on Simple Regression. Build an application that can predict a salary based on years of experience using single-variable linear regression (use a dataset from Kaggle). Display the coefficient and intercept. Also display the MSE. Plot the model on the testing data.
In [1]: import pandas as pd

Loading Dataset
In [2]: data=pd.read_csv('salary_data.csv')

In [3]: data.head()

Out[3]: YearsExperience Salary

0 1.1 39343.0

1 1.3 46205.0

2 1.5 37731.0

3 2.0 43525.0

4 2.2 39891.0

In [4]: data.tail()

Out[4]: YearsExperience Salary

25 9.0 105582.0

26 9.5 116969.0

27 9.6 112635.0

28 10.3 122391.0

29 10.5 121872.0

In [5]: data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 2 columns):
# Column Non-Null Count Dtype

0 YearsExperience 30 non-null float64


1 Salary 30 non-null float64
dtypes: float64(2)
memory usage: 612.0 bytes

In [6]: data.describe()

Out[6]: YearsExperience Salary

count 30.000000 30.000000

mean 5.313333 76003.000000

std 2.837888 27414.429785

min 1.100000 37731.000000

25% 3.200000 56720.750000

50% 4.700000 65237.000000

75% 7.700000 100544.750000

max 10.500000 122391.000000

Train test split


In [7]: import sklearn
from sklearn.model_selection import train_test_split
train , test=train_test_split(data,test_size=0.3)

In [8]: x_train=train.drop('Salary',axis=1)
y_train=train['Salary']

In [9]: x_test=test.drop('Salary',axis=1)
y_test=test['Salary']

In [10]: x_test

Out[10]: YearsExperience

4 2.2

26 9.5

22 7.9

2 1.5

16 5.1

10 3.9

20 6.8

9 3.7

3 2.0

In [11]: y_test.head()

Out[11]: 4 39891.0
26 116969.0
22 101302.0
2 37731.0
16 66029.0
Name: Salary, dtype: float64
In [18]: from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from math import sqrt
import matplotlib.pyplot as plt

In [24]: model= LinearRegression()

In [25]: model.fit(x_train,y_train)

Out[25]: LinearRegression()

In [26]: pred=model.predict(x_test)
pred

Out[26]: array([ 48504.9952946 , 115083.17689995, 100490.69873987, 42120.78609957,
74953.86195974, 64009.50333968, 90458.37000482, 62185.44356967,
46680.93552459])

In [27]: error=sqrt(mean_squared_error(y_test,pred)) # root mean squared error (RMSE)
error

Out[27]: 4882.248392297978
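The value above is the root of the MSE (the RMSE). The assignment asks for the MSE itself, which can be printed with a minimal sketch reusing the imports above:

In [ ]: print("MSE:", mean_squared_error(y_test,pred)) # mean squared error on the test split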

In [28]: print("Coefficient (slope):", model.coef_[0])


print("Intercept:", model.intercept_)

Coefficient (slope): 9120.298850047055


Intercept: 28440.337824500202

In [31]: plt.scatter(x_test, y_test, color='green', label='Testing data')
plt.plot(x_train, model.predict(x_train), color='red', label='Regression line')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.title('Salary vs. Years of Experience (Testing data)')
plt.legend()
plt.show()

6) Assignment on Multiple Regression: Build an application that can predict the price of a house using multiple-variable linear regression (use the Housing dataset from Kaggle). Display all the coefficients and the MSE.
In [1]: import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Loading Dataset
In [2]: data=pd.read_csv("USA_housing.csv")

In [3]: data.head()

Out[3]:    Avg. Area Income  Avg. Area House Age  Avg. Area Number of Rooms  Avg. Area Number of Bedrooms  Area Population         Price                                            Address
0              79545.458574             5.682861                   7.009188                          4.09     23086.800503  1.059034e+06  208 Michael Ferry Apt. 674\nLaurabury, NE 3701...
1              79248.642455             6.002900                   6.730821                          3.09     40173.072174  1.505891e+06  188 Johnson Views Suite 079\nLake Kathleen, CA...
2              61287.067179             5.865890                   8.512727                          5.13     36882.159400  1.058988e+06  9127 Elizabeth Stravenue\nDanieltown, WI 06482...
3              63345.240046             7.188236                   5.586729                          3.26     34310.242831  1.260617e+06  USS Barnett\nFPO AP 44820
4              59982.197226             5.040555                   7.839388                          4.23     26354.109472  6.309435e+05  USNS Raymond\nFPO AE 0938

In [4]: data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 7 columns):
 #   Column                        Non-Null Count  Dtype
---  ------                        --------------  -----
 0   Avg. Area Income              5000 non-null   float64
 1   Avg. Area House Age           5000 non-null   float64
 2   Avg. Area Number of Rooms     5000 non-null   float64
 3   Avg. Area Number of Bedrooms  5000 non-null   float64
 4   Area Population               5000 non-null   float64
 5   Price                         5000 non-null   float64
 6   Address                       5000 non-null   object
dtypes: float64(6), object(1)
memory usage: 273.6+ KB

In [5]: data.describe()

Out[5]:    Avg. Area Income  Avg. Area House Age  Avg. Area Number of Rooms  Avg. Area Number of Bedrooms  Area Population         Price
count           5000.000000          5000.000000                5000.000000                   5000.000000      5000.000000  5.000000e+03
mean           68583.108984             5.977222                   6.987792                      3.981330     36163.516039  1.232073e+06
std            10657.991214             0.991456                   1.005833                      1.234137      9925.650114  3.531176e+05
min            17796.631190             2.644304                   3.236194                      2.000000       172.610686  1.593866e+04
25%            61480.562388             5.322283                   6.299250                      3.140000     29403.928702  9.975771e+05
50%            68804.286404             5.970429                   7.002902                      4.050000     36199.406689  1.232669e+06
75%            75783.338666             6.650808                   7.665871                      4.490000     42861.290769  1.471210e+06
max           107701.748378             9.519088                  10.759588                      6.500000     69621.713378  2.469066e+06

In [6]: data.columns

Out[6]: Index(['Avg. Area Income', 'Avg. Area House Age', 'Avg. Area Number of Rooms',
'Avg. Area Number of Bedrooms', 'Area Population', 'Price', 'Address'],
dtype='object')

In [7]: x = data[['Avg. Area Income','Avg. Area House Age','Avg. Area Number of Rooms','Avg. Area Number of Bedrooms','Area Population']]

In [8]: y=data['Price']

train test split


In [9]: from sklearn.model_selection import train_test_split
X_train,X_test,Y_train,Y_test=train_test_split(x,y,test_size=0.4)

In [10]: from sklearn.linear_model import LinearRegression


lm=LinearRegression()

In [11]: lm.fit(X_train,Y_train)

Out[11]: ▾ LinearRegression
LinearRegression()

In [12]: pred=lm.predict(X_test)

In [13]: pred

Out[13]: array([ 690599.91825437, 470492.34940979, 1610609.0114855 , ...,
996537.70617916, 1205219.91840424, 1306934.17865173])
In [14]: from sklearn import metrics

In [15]: print("MAE : ",metrics.mean_absolute_error(Y_test,pred))

MAE : 81550.25106016382

In [16]: print("MSE : ",metrics.mean_squared_error(Y_test,pred))

MSE : 10144286208.90111

In [17]: print("RMSE : ",np.sqrt(metrics.mean_squared_error(Y_test,pred)))

RMSE : 100718.84733703574

In [18]: coefficients = lm.coef_

print("Coefficients:")
for feature, coef in zip(x.columns, coefficients):
print(f"{feature}: {coef}")

Coefficients:
Avg. Area Income: 21.350766991704706
Avg. Area House Age: 167276.28633397297
Avg. Area Number of Rooms: 121482.62475230212
Avg. Area Number of Bedrooms: 1178.4271356234713
Area Population: 15.063238730521304

7) Assignment on Binary Classification: Build an application on the tennis dataset to decide whether to play, using a Decision Tree classifier. Do the required data preprocessing. Display the accuracy score, classification report & confusion matrix.
In [1]: import numpy as np
import pandas as pd
import warnings
import matplotlib as plt
import seaborn as sns

warnings.filterwarnings("ignore", category=FutureWarning, module="sklearn")


warnings.filterwarnings("ignore", category=UserWarning, module="sklearn")

loading dataset
In [2]: data=pd.read_csv("tennis.csv")
data

Out[2]: outlook temp humidity windy play

0 sunny hot high False no

1 sunny hot high True no

2 overcast hot high False yes

3 rainy mild high False yes

4 rainy cool normal False yes

5 rainy cool normal True no

6 overcast cool normal True yes

7 sunny mild high False no

8 sunny cool normal False yes

9 rainy mild normal False yes

10 sunny mild normal True yes

11 overcast mild high True yes

12 overcast hot normal False yes

13 rainy mild high True no

In [3]: data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14 entries, 0 to 13
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 outlook 14 non-null object
1 temp 14 non-null object
2 humidity 14 non-null object
3 windy 14 non-null bool
4 play 14 non-null object
dtypes: bool(1), object(4)
memory usage: 594.0+ bytes

In [4]: data.head()

Out[4]: outlook temp humidity windy play

0 sunny hot high False no

1 sunny hot high True no

2 overcast hot high False yes

3 rainy mild high False yes

4 rainy cool normal False yes

In [5]: outlook=data["outlook"].str.get_dummies(" ")
temp = data["temp"].str.get_dummies(" ")
humidity =data["humidity"].str.get_dummies(" ")
play = data["play"].str.get_dummies(" ")
windy = pd.get_dummies(data['windy'], drop_first=True) # windy is boolean, so str.get_dummies does not apply
windy

Out[5]: True

0 0

1 1

2 0

3 0

4 0

5 1

6 1

7 0

8 0

9 0

10 1

11 1

12 0

13 1

In [6]: data.head()
Out[6]: outlook temp humidity windy play

0 sunny hot high False no

1 sunny hot high True no

2 overcast hot high False yes

3 rainy mild high False yes


4 rainy cool normal False yes

In [7]: data.drop(["outlook",'temp',"humidity","windy","play"],axis=1, inplace=True)

In [8]: data=pd.concat([outlook,temp,humidity,windy,play] , axis=1)


data.head()
data

Out[8]: overcast rainy sunny cool hot mild high normal True no yes

0 0 0 1 0 1 0 1 0 0 1 0

1 0 0 1 0 1 0 1 0 1 1 0

2 1 0 0 0 1 0 1 0 0 0 1

3 0 1 0 0 0 1 1 0 0 0 1

4 0 1 0 1 0 0 0 1 0 0 1

5 0 1 0 1 0 0 0 1 1 1 0

6 1 0 0 1 0 0 0 1 1 0 1

7 0 0 1 0 0 1 1 0 0 1 0

8 0 0 1 1 0 0 0 1 0 0 1

9 0 1 0 0 0 1 0 1 0 0 1

10 0 0 1 0 0 1 0 1 1 0 1

11 1 0 0 0 0 1 1 0 1 0 1

12 1 0 0 0 1 0 0 1 0 0 1

13 0 1 0 0 0 1 1 0 1 1 0

In [9]: from sklearn.model_selection import train_test_split


x=data.drop(['yes','no'] , axis=1)
y=data['no']
x
x.columns = x.columns.astype(str)
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2,random_state=0)
X_train.columns = X_train.columns.astype(str)
X_test.columns = X_test.columns.astype(str)

In [10]: from sklearn.tree import DecisionTreeClassifier


dtc = DecisionTreeClassifier(criterion='entropy')

dtc.fit(X_train, y_train)
Out[10]: ▾ DecisionTreeClassifier
DecisionTreeClassifier(criterion='entropy')
In [11]: pred=dtc.predict(X_test)

In [12]: from sklearn.metrics import classification_report , confusion_matrix


print(classification_report(y_test,pred))
print((confusion_matrix(y_test,pred)))

precision recall f1-score support

0 1.00 0.33 0.50 3


1 0.00 0.00 0.00 0

accuracy 0.33 3
macro avg 0.50 0.17 0.25 3
weighted avg 1.00 0.33 0.50 3

[[1 2]
[0 0]]
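
The assignment also asks for the accuracy score explicitly; a minimal sketch using the predictions above:

In [ ]: from sklearn.metrics import accuracy_score
print("Accuracy:", accuracy_score(y_test, pred))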

8) Assignment on Binary Classification using Perceptron. Implement a Perceptron model. Use this model to classify whether a patient has cancer or not (use the Breast Cancer dataset from sklearn). Display the accuracy score, classification report and confusion matrix.
In [1]: import sklearn.datasets
import numpy as np

Loading dataset
In [2]: cancer=sklearn.datasets.load_breast_cancer()

In [3]: x=cancer.data
y=cancer.target
print(x.shape,y.shape)

(569, 30) (569,)

In [4]: import pandas as pd


data=pd.DataFrame(cancer.data,columns=cancer.feature_names)

In [5]: data['class']=cancer.target
data.head()

Out[5]:    mean radius  mean texture  mean perimeter  mean area  mean smoothness  mean compactness  mean concavity  mean concave points  mean symmetry  ...
0                17.99         10.38          122.80     1001.0          0.11840           0.27760          0.3001              0.14710         0.2419  ...
1                20.57         17.77          132.90     1326.0          0.08474           0.07864          0.0869              0.07017         0.1812  ...
2                19.69         21.25          130.00     1203.0          0.10960           0.15990          0.1974              0.12790         0.2069  ...
3                11.42         20.38           77.58      386.1          0.14250           0.28390          0.2414              0.10520         0.2597  ...
4                20.29         14.34          135.10     1297.0          0.10030           0.13280          0.1980              0.10430         0.1809  ...

5 rows × 31 columns

In [6]: print(data['class'].value_counts())

1 357
0 212
Name: class, dtype: int64

In [7]: print(cancer.target_names)

['malignant' 'benign']

In [8]: data.groupby('class').mean()

Out[8]:    mean radius  mean texture  mean perimeter   mean area  mean smoothness  mean compactness  mean concavity  mean concave points  ...
class
0            17.462830     21.604906      115.365377  978.376415         0.102898          0.145188        0.160775             0.087990  ...
1            12.146524     17.914762       78.075406  462.790196         0.092478          0.080085        0.046058             0.025717  ...

2 rows × 30 columns

Train-test Split
In [9]: from sklearn.model_selection import train_test_split
x=data.drop('class',axis=1)
y=data['class']

In [10]: type(x)

Out[10]: pandas.core.frame.DataFrame

In [11]: x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.1)
x_train=x_train.values
x_test=x_test.values

Perceptron Class

In [12]: from sklearn.metrics import accuracy_score
class Perceptron:
    def __init__(self):
        self.w = None   # weight vector
        self.b = None   # firing threshold

    def model(self, X):
        # fire (predict 1) if the weighted sum reaches the threshold
        return 1 if (np.dot(self.w, X) >= self.b) else 0

    def predict(self, X):
        Y = []
        for x in X:
            result = self.model(x)
            Y.append(result)
        return np.array(Y)

    def fit(self, X, Y, epochs=1, lr=1):
        self.w = np.ones(X.shape[1])
        self.b = 0

        accuracy = {}
        max_accuracy = 0

        wt_matrix = []

        for i in range(epochs):
            for x, y in zip(X, Y):
                y_pred = self.model(x)
                if y == 1 and y_pred == 0:        # false negative: push weights up
                    self.w = self.w + lr * x
                    self.b = self.b - lr * 1
                elif y == 0 and y_pred == 1:      # false positive: push weights down
                    self.w = self.w - lr * x
                    self.b = self.b + lr * 1
            wt_matrix.append(self.w)

            accuracy[i] = accuracy_score(self.predict(X), Y)
            if (accuracy[i] >= max_accuracy):
                # checkpoint the best weights seen so far
                max_accuracy = accuracy[i]
                chkptw = self.w
                chkptb = self.b

        # restore the checkpointed (best) weights before returning
        self.w = chkptw
        self.b = chkptb

        print(max_accuracy)

        import matplotlib.pyplot as plt
        plt.plot(accuracy.values())
        plt.ylim([0, 1])
        plt.show()
        return np.array(wt_matrix)

In [13]: percept=Perceptron()

In [14]: wt_matrix=percept.fit(x_train,y_train,10000,0.5)

0.947265625

In [15]: y_predict=percept.predict(x_test)

In [16]: from sklearn.metrics import classification_report


print(classification_report(y_test,y_predict))

precision recall f1-score support

0 0.96 1.00 0.98 24


1 1.00 0.97 0.98 33

accuracy 0.98 57
macro avg 0.98 0.98 0.98 57
weighted avg 0.98 0.98 0.98 57

In [17]: from sklearn.metrics import accuracy_score


accuracy = accuracy_score(y_test, y_predict)
print("Accuracy:", accuracy)

Accuracy: 0.9824561403508771

In [18]: from sklearn.metrics import confusion_matrix


conf=confusion_matrix(y_test, y_predict)
print("Confusion_Matrix : ",conf)

Confusion_Matrix : [[24 0]
[ 1 32]]

9) Assignment on Multiclass Classification using MLP (Multilayer Perceptron). Build an application to classify a given iris flower into its species using MLP (use the Iris dataset from Kaggle / sklearn). Display the accuracy score, classification report and confusion matrix.
In [1]: import pandas as pd
url="https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
names=['sepal-length','sepal-width','petal-length','petal-width','Class']
data=pd.read_csv(url,names=names)

In [2]: data.head()

Out[2]:
sepal-length sepal-width petal-length petal-width Class

0 5.1 3.5 1.4 0.2 Iris-setosa

1 4.9 3.0 1.4 0.2 Iris-setosa

2 4.7 3.2 1.3 0.2 Iris-setosa

3 4.6 3.1 1.5 0.2 Iris-setosa

4 5.0 3.6 1.4 0.2 Iris-setosa

In [3]: X=data.iloc[:,0:4]
y=data.select_dtypes(include=[object])
y.head()
X.head()

Out[3]:
sepal-length sepal-width petal-length petal-width

0 5.1 3.5 1.4 0.2

1 4.9 3.0 1.4 0.2

2 4.7 3.2 1.3 0.2

3 4.6 3.1 1.5 0.2

4 5.0 3.6 1.4 0.2

In [4]: y.Class.unique()

Out[4]: array(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'], dtype=object)

In [5]: from sklearn import preprocessing
le=preprocessing.LabelEncoder()
y=y.apply(le.fit_transform)
y

Out[5]:
Class

0 0

1 0

2 0

3 0

4 0

... ...

145 2

146 2

147 2

148 2

149 2

150 rows × 1 columns

In [6]: from sklearn.model_selection import train_test_split


X_train,X_test,Y_train,Y_test=train_test_split(X,y,test_size=0.3)
print("dimension of x-train:",X_train.shape,"X_test :",X_test.shape)
Y_test.head()

dimension of x-train: (105, 4) X_test : (45, 4)

Out[6]:
Class

40 0

144 2

129 2

82 1

59 1

In [7]: from sklearn.preprocessing import StandardScaler


scaler=StandardScaler()
scaler.fit(X_train)

X_train=scaler.transform(X_train)
X_test=scaler.transform(X_test)

In [8]: from sklearn.neural_network import MLPClassifier


mlp=MLPClassifier(hidden_layer_sizes=(10,10,10),max_iter=1000)
mlp.fit(X_train,Y_train.values.ravel())

Out[8]: ▾ MLPClassifier
MLPClassifier(hidden_layer_sizes=(10, 10, 10), max_iter=1000)
In [9]: pred=mlp.predict(X_test)

In [10]: from sklearn.metrics import classification_report , confusion_matrix


print(classification_report(Y_test,pred))
print((confusion_matrix(Y_test,pred)))

precision recall f1-score support

0 1.00 1.00 1.00 14


1 1.00 0.82 0.90 11
2 0.91 1.00 0.95 20

accuracy 0.96 45
macro avg 0.97 0.94 0.95 45
weighted avg 0.96 0.96 0.95 45

[[14 0 0]
[ 0 9 2]
[ 0 0 20]]
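
A standalone accuracy score, as the assignment asks, can be printed with a minimal sketch using the predictions above:

In [ ]: from sklearn.metrics import accuracy_score
print("Accuracy:", accuracy_score(Y_test, pred))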

10) Assignment on Regression using KNN. Build an application that can predict salary based on years of experience using KNN (use the salary dataset from Kaggle). Display the MSE.
In [1]: import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

loading dataset
In [2]: data=pd.read_csv("salary_data.csv")

In [3]: data.head()

Out[3]:
YearsExperience Salary

0 1.1 39343.0

1 1.3 46205.0

2 1.5 37731.0

3 2.0 43525.0

4 2.2 39891.0

In [4]: data.tail()

Out[4]:
YearsExperience Salary

25 9.0 105582.0
26 9.5 116969.0
27 9.6 112635.0
28 10.3 122391.0
29 10.5 121872.0

In [5]: data.describe()

Out[5]:
YearsExperience Salary
count 30.000000 30.000000
mean 5.313333 76003.000000
std 2.837888 27414.429785
min 1.100000 37731.000000
25% 3.200000 56720.750000
50% 4.700000 65237.000000
75% 7.700000 100544.750000
max 10.500000 122391.000000

Train test Split
In [6]: from sklearn.model_selection import train_test_split
train , test = train_test_split(data, test_size = 0.3)

In [7]: x_train = train.drop('Salary', axis=1)


y_train = train['Salary']

In [8]: x_test= train.drop('Salary', axis=1) # note: this reuses the *training* split, not the test split
y_test = train['Salary'] # so the RMSE below is measured on training data (see the held-out sketch after Out[14])

In [9]: x_test.head()

Out[9]:
YearsExperience

29 10.5

7 3.2

22 7.9

11 4.0

10 3.9

In [10]: y_test.head()

Out[10]: 29 121872.0
7 54445.0
22 101302.0
11 55794.0
10 63218.0
Name: Salary, dtype: float64

In [11]: from sklearn import neighbors


from sklearn.metrics import mean_squared_error
from math import sqrt
import matplotlib.pyplot as plt
%matplotlib inline

In [12]: model = neighbors.KNeighborsRegressor(n_neighbors = 3)


model.fit(x_train, y_train)
pred=model.predict(x_test)
error = sqrt(mean_squared_error(y_test,pred))

In [13]: pred

Out[13]: array([116615. , 58510.66666667, 103002. , 58697.66666667,
58697.66666667, 58697.66666667, 43024.33333333, 105438.33333333,
105438.33333333, 116615. , 97838.33333333, 58510.66666667,
48790.66666667, 43024.33333333, 57995.33333333, 70076. ,
58510.66666667, 76826.66666667, 86130.33333333, 86130.33333333,
58733.66666667])

In [14]: error

Out[14]: 4194.927232595636
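
Because x_test above came from the training split, this RMSE is optimistic. A sketch of the evaluation on the held-out test split with the same fitted model:

In [ ]: x_hold = test.drop('Salary', axis=1) # the split that was actually held out
y_hold = test['Salary']
print(sqrt(mean_squared_error(y_hold, model.predict(x_hold))))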
11) Assignment on Classification using KNN. Build an application to classify an iris flower into its species using KNN (use the Iris dataset from sklearn). Display the accuracy score, classification report & confusion matrix.
In [1]: import pandas as pd
import numpy as np
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt

Loading dataset
In [8]: data=load_iris()

In [5]: X=data.data

y=data.target
y

Out[5]: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

train test split


In [10]: from sklearn.model_selection import train_test_split
X_train,X_test ,y_train,y_test = train_test_split(X,y, test_size = 0.3)

In [11]: from sklearn.neighbors import KNeighborsClassifier


model=KNeighborsClassifier(n_neighbors=7)

In [12]: model.fit(X_train,y_train)

Out[12]: KNeighborsClassifier(n_neighbors=7)

In [13]: pred=model.predict(X_test)

In [15]: pred

Out[15]: array([1, 0, 0, 2, 2, 0, 0, 2, 0, 0, 2, 0, 0, 2, 2, 1, 0, 1, 2, 2, 2, 1,
0, 0, 1, 2, 1, 2, 0, 0, 1, 0, 2, 2, 2, 1, 2, 1, 2, 1, 1, 1, 0, 2,
0])

In [16]: from sklearn.metrics import accuracy_score


print("Accuracy :",accuracy_score(y_test,pred))

Accuracy : 0.9555555555555556

In [17]: from sklearn.metrics import confusion_matrix


print("Confusion matrix : ",confusion_matrix(y_test,pred))

Confusion matrix : [[16 0 0]


[ 0 12 2]
[ 0 0 15]]

In [18]: from sklearn.metrics import classification_report


print("Classification report : ",classification_report(y_test,pred))

Classification report :  precision recall f1-score support

0 1.00 1.00 1.00 16


1 1.00 0.86 0.92 14
2 0.88 1.00 0.94 15

accuracy 0.96 45
macro avg 0.96 0.95 0.95 45
weighted avg 0.96 0.96 0.96 45

12) Assignment on Naive Bayes Classifier. Build an application to classify a given text using a Naive Bayes classifier. Use data from sklearn. Display the accuracy score, classification report, confusion matrix.
In [1]: import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()

In [2]: from sklearn.datasets import fetch_20newsgroups

data = fetch_20newsgroups()
data.target_names

Out[2]: ['alt.atheism',
'comp.graphics',
'comp.os.ms-windows.misc',
'comp.sys.ibm.pc.hardware',
'comp.sys.mac.hardware',
'comp.windows.x',
'misc.forsale',
'rec.autos',
'rec.motorcycles',
'rec.sport.baseball',
'rec.sport.hockey',
'sci.crypt',
'sci.electronics',
'sci.med',
'sci.space',
'soc.religion.christian',
'talk.politics.guns',
'talk.politics.mideast',
'talk.politics.misc',
'talk.religion.misc']

In [3]: categories = ['talk.religion.misc', 'soc.religion.christian',


'sci.space', 'comp.graphics']
train = fetch_20newsgroups(subset='train', categories=categories)
test = fetch_20newsgroups(subset='test', categories=categories)

In [4]: print(train.data[5])

From: dmcgee@uluhe.soest.hawaii.edu (Don McGee)
Subject: Federal Hearing
Originator: dmcgee@uluhe
Organization: School of Ocean and Earth Science and Technology
Distribution: usa
Lines: 10

Fact or rumor....? Madalyn Murray O'Hare an atheist who eliminated the
use of the bible reading and prayer in public schools 15 years ago is now
going to appear before the FCC with a petition to stop the reading of the
Gospel on the airways of America. And she is also campaigning to remove
Christmas programs, songs, etc from the public schools. If it is true
then mail to Federal Communications Commission 1919 H Street Washington DC
20054 expressing your opposition to her request. Reference Petition number
2493.

In [5]: from sklearn.feature_extraction.text import TfidfVectorizer


from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

model = make_pipeline(TfidfVectorizer(), MultinomialNB())

In [6]: model.fit(train.data, train.target)


labels = model.predict(test.data)

In [7]: from sklearn.metrics import confusion_matrix
mat = confusion_matrix(test.target, labels)
sns.heatmap(mat.T, square=True, annot=True, fmt='d', cbar=False,
xticklabels=train.target_names, yticklabels=train.target_names)
plt.xlabel('true label')
plt.ylabel('predicted label');

In [8]: mat

Out[8]: array([[344, 13, 32, 0],
[ 6, 364, 24, 0],
[ 1, 5, 392, 0],
[ 4, 12, 187, 48]],
dtype=int64)
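
The assignment also asks for the accuracy score and classification report; a minimal sketch using the predicted labels above:

In [ ]: from sklearn.metrics import accuracy_score, classification_report
print("Accuracy:", accuracy_score(test.target, labels))
print(classification_report(test.target, labels, target_names=train.target_names))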

In [9]: def predict_category(s, train=train, model=model):


pred = model.predict([s])
return train.target_names[pred[0]]

In [10]: predict_category('sending a payload to the ISS')

Out[10]: 'sci.space'

In [11]: predict_category('discussing islam vs atheism')

Out[11]: 'soc.religion.christian'

In [12]: predict_category('determining the screen resolution')

Out[12]: 'comp.graphics'

13) Assignment on K-means Clustering. Apply K-means clustering on the Income dataset to form 3 clusters and display these clusters using a scatter graph.
In [1]: import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler
from matplotlib import pyplot as plt
%matplotlib inline
import warnings
warnings.filterwarnings("ignore", category=FutureWarning, module="sklearn")
warnings.filterwarnings("ignore", category=UserWarning, module="sklearn")

Loading dataset
In [2]: df=pd.read_csv("income.csv")
df.head()

Out[2]: Name Age Income($)

0 Rob 27 70000

1 Michael 29 90000
2 Mohan 29 61000

3 Ismail 28 60000

4 Kory 42 150000

In [3]: plt.scatter(df.Age,df['Income($)'])
plt.xlabel('Age')
plt.ylabel('Income($)')

Out[3]: Text(0, 0.5, 'Income($)')

In [4]: km=KMeans(n_clusters=3)
y_predicted=km.fit_predict(df[['Age','Income($)']])
y_predicted

Out[4]: array([0, 0, 2, 2, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 0, 0, 2])

In [5]: df['cluster']=y_predicted
df.head()

Out[5]: Name Age Income($) cluster


0 Rob 27 70000 0
1 Michael 29 90000 0
2 Mohan 29 61000 2
3 Ismail 28 60000 2
4 Kory 42 150000 1

In [6]: df1 = df[df.cluster==0]
df2 = df[df.cluster==1]
df3 = df[df.cluster==2]
plt.scatter(df1.Age,df1['Income($)'],color='green')
plt.scatter(df2.Age,df2['Income($)'],color='red')
plt.scatter(df3.Age,df3['Income($)'],color='black')
plt.scatter(km.cluster_centers_[:,0],km.cluster_centers_[:,1],color='purple',marker='*',label='centroid')
plt.xlabel('Age')
plt.ylabel('Income ($)')
plt.legend()

Out[6]: <matplotlib.legend.Legend at 0x15064105810>

In [7]: scaler=MinMaxScaler()
scaler.fit(df[['Income($)']])
df['Income($)']=scaler.transform(df[['Income($)']])

scaler.fit(df[["Age"]])
df['Age']=scaler.transform(df[['Age']])

In [8]: df.head()

Out[8]: Name Age Income($) cluster


0 Rob 0.058824 0.213675 0

1 Michael 0.176471 0.384615 0

2 Mohan 0.176471 0.136752 2

3 Ismail 0.117647 0.128205 2


4 Kory 0.941176 0.897436 1

In [9]: plt.scatter(df.Age,df['Income($)'])

Out[9]: <matplotlib.collections.PathCollection at 0x150641bae10>

In [10]: km=KMeans(n_clusters=3)
y_predicted=km.fit_predict(df[['Age','Income($)']])
y_predicted

Out[10]: array([0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
In [11]: df['cluster']=y_predicted

In [12]: df.head()

Out[12]: Name Age Income($) cluster


0 Rob 0.058824 0.213675 0

1 Michael 0.176471 0.384615 0

2 Mohan 0.176471 0.136752 0

3 Ismail 0.117647 0.128205 0


4 Kory 0.941176 0.897436 2

In [13]: km.cluster_centers_

Out[13]: array([[0.1372549 , 0.11633428],
[0.85294118, 0.2022792 ],
[0.72268908, 0.8974359 ]])
In [14]: df1 = df[df.cluster==0]
df2 = df[df.cluster==1]
df3 = df[df.cluster==2]
plt.scatter(df1.Age,df1['Income($)'],color='green')
plt.scatter(df2.Age,df2['Income($)'],color='red')
plt.scatter(df3.Age,df3['Income($)'],color='black')
plt.scatter(km.cluster_centers_[:,0],km.cluster_centers_[:,1],color='purple',marker='*',label='centroid')
plt.legend()

Out[14]: <matplotlib.legend.Legend at 0x15064202f90>

In [15]: sse = []
k_rng = range(1,10)
for k in k_rng:
km = KMeans(n_clusters=k)
km.fit(df[['Age','Income($)']])
sse.append(km.inertia_)

In [16]: sse

Out[16]: [5.434011511988178,
2.091136388699078,
0.4750783498553096,
0.3491047094419566,
0.2664030124668416,
0.21055478995472493,
0.16869711728567788,
0.13265419827245162,
0.10383752586603562]

In [17]: plt.xlabel('K')
plt.ylabel('sum of squared error')
plt.plot(k_rng,sse)

Out[17]: [<matplotlib.lines.Line2D at 0x1506427acd0>]

14) Assignment on Hierarchical Clustering. Apply it on Mall_Customers to form 5 clusters and display these clusters using a scatter graph; also display its dendrogram.
In [1]: import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

In [2]: data=pd.read_csv('Mall_Customers.csv')
data.head()

Out[2]: CustomerID Gender Age Annual Income (k$) Spending Score (1-100)

0 1 Male 19 15 39

1 2 Male 21 15 81

2 3 Female 20 16 6

3 4 Female 23 16 77

4 5 Female 31 17 40

In [3]: newdata=data.iloc[:,[3,4]].values

In [4]: import scipy.cluster.hierarchy as sch
dendrogram = sch.dendrogram(sch.linkage(newdata, method = 'ward'))
plt.title('Dendrogram')
plt.xlabel('Customers')
plt.ylabel('Euclidean distances')
plt.show()

In [5]: from sklearn.cluster import AgglomerativeClustering


Agg_hc=AgglomerativeClustering(n_clusters=5,affinity='euclidean',linkage='ward')
y_hc=Agg_hc.fit_predict(newdata)

C:\Users\HP\anaconda3\Lib\site-packages\sklearn\cluster\_agglomerative.py:1005: FutureWarning: Attribute `affinity` was deprecated in version 1.2 and will be removed in 1.4. Use `metric` instead
warnings.warn(
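
The warning points at the rename of the affinity parameter. On scikit-learn 1.2 or newer (an assumption about the installed version), an equivalent call without the warning would be:

In [ ]: Agg_hc = AgglomerativeClustering(n_clusters=5, metric='euclidean', linkage='ward') # metric replaces affinity
y_hc = Agg_hc.fit_predict(newdata)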

In [6]: plt.scatter(newdata[y_hc == 0, 0], newdata[y_hc == 0, 1], s = 100, c = 'red', label = 'Cluster 1')
plt.scatter(newdata[y_hc == 1, 0], newdata[y_hc == 1, 1], s = 100, c = 'blue', label = 'Cluster 2')
plt.scatter(newdata[y_hc == 2, 0], newdata[y_hc == 2, 1], s = 100, c = 'green', label = 'Cluster 3')
plt.scatter(newdata[y_hc == 3, 0], newdata[y_hc == 3, 1], s = 100, c = 'cyan', label = 'Cluster 4')
plt.scatter(newdata[y_hc == 4, 0], newdata[y_hc == 4, 1], s = 100, c = 'magenta', label = 'Cluster 5')
# plot title addition
plt.title('Clusters of customers')
# labelling the x-axis
plt.xlabel('Annual Income (k$)')
# label of the y-axis
plt.ylabel('Spending Score (1-100)')
# printing the legend
plt.legend()
# show the plot
plt.show()

15) Assignment on Dimensionality Reduction. Apply Principal Component Analysis (PCA) on the Iris dataset to reduce its dimensionality into 3 principal components. Display the data before and after reduction using scatter graphs.
In [1]: import pandas as pd
import matplotlib.pyplot as plt

df=pd.read_csv('Iris.csv')
df.head()

Out[1]:
Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species

0 1 5.1 3.5 1.4 0.2 Iris-setosa

1 2 4.9 3.0 1.4 0.2 Iris-setosa

2 3 4.7 3.2 1.3 0.2 Iris-setosa

3 4 4.6 3.1 1.5 0.2 Iris-setosa

4 5 5.0 3.6 1.4 0.2 Iris-setosa

In [2]: fig = plt.figure(figsize = (8,8))
sepal = fig.add_subplot(1,1,1)
sepal.set_xlabel('sepal_length', fontsize = 15)
sepal.set_ylabel('sepal_width', fontsize = 15)
sepal.set_title('Original Data', fontsize = 20)
targets = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']
colors = ['r', 'g', 'b']
for target, color in zip(targets,colors):
indicesToKeep = df['Species'] == target
sepal.scatter(df.loc[indicesToKeep, 'SepalLengthCm']
, df.loc[indicesToKeep, 'SepalWidthCm']
, c = color
, s = 50)
sepal.legend(targets)
sepal.grid()

In [3]: fig = plt.figure(figsize = (8,8))
petal = fig.add_subplot(1,1,1)
petal.set_xlabel('petal_length', fontsize = 15)
petal.set_ylabel('petal_width', fontsize = 15)
petal.set_title('Original Data', fontsize = 20)
targets = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']
colors = ['r', 'g', 'b']
for target, color in zip(targets,colors):
indicesToKeep = df['Species'] == target
petal.scatter(df.loc[indicesToKeep, 'PetalLengthCm']
, df.loc[indicesToKeep, 'PetalWidthCm']
, c = color
, s = 50)
petal.legend(targets)
petal.grid()

In [4]: from sklearn.preprocessing import StandardScaler
features = ['SepalLengthCm','SepalWidthCm','PetalLengthCm','PetalWidthCm']
# Separating out the features
x = df.loc[:, features].values
# Separating out the target
y = df.loc[:,['Species']].values
# Standardizing the features
x = StandardScaler().fit_transform(x)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0) # seed assumed; the original value was cut off

In [5]: from sklearn.decomposition import PCA


pca = PCA(n_components=3)
principalComponents = pca.fit_transform(x)
principalDf = pd.DataFrame(data = principalComponents
, columns = ['principal component 1', 'principal component 2', 'principal component 3'])

In [6]: finalDf = pd.concat([principalDf, df[['Species']]], axis = 1)


finalDf.head()

Out[6]:
principal component 1 principal component 2 principal component 3 Species

0 -2.264542 0.505704 -0.121943 Iris-setosa

1 -2.086426 -0.655405 -0.227251 Iris-setosa

2 -2.367950 -0.318477 0.051480 Iris-setosa

3 -2.304197 -0.575368 0.098860 Iris-setosa

4 -2.388777 0.674767 0.021428 Iris-setosa
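
How much of the original variance the three components retain can be checked on the fitted PCA object; a minimal sketch:

In [ ]: print(pca.explained_variance_ratio_) # fraction of the total variance captured by each component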

In [7]: import matplotlib.pyplot as plt
fig = plt.figure(figsize = (8,8))
ax = fig.add_subplot(1,1,1)
ax.set_xlabel('Principal Component 1', fontsize = 15)
ax.set_ylabel('Principal Component 2', fontsize = 15)
ax.set_title('2 component PCA', fontsize = 20)
targets = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']
colors = ['r', 'g', 'b']
for target, color in zip(targets,colors):
indicesToKeep = finalDf['Species'] == target
ax.scatter(finalDf.loc[indicesToKeep, 'principal component 1']
, finalDf.loc[indicesToKeep, 'principal component 2']
, c = color
, s = 50)
ax.legend(targets)
ax.grid()

In [8]: from sklearn.neural_network import MLPClassifier
mlp=MLPClassifier(hidden_layer_sizes=(10,10,10),max_iter=200)
mlp.fit(X_train,y_train.ravel())

C:\Users\HP\anaconda3\Lib\site-packages\sklearn\neural_network\_multilayer_perceptron.py:691: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet.
warnings.warn(

Out[8]: ▾ MLPClassifier
MLPClassifier(hidden_layer_sizes=(10, 10, 10))

In [9]: #using original data


model = MLPClassifier()
model.fit(X_train, y_train.ravel())
predictions = model.predict(X_test)

C:\Users\HP\anaconda3\Lib\site-packages\sklearn\neural_network\_multilayer_perceptron.py:691: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet.
warnings.warn(

In [10]: from sklearn.metrics import accuracy_score


accuracy_score(y_test, predictions)

Out[10]: 0.9777777777777777
