Machine Learning Laboratory 18CSL76: Institute of Technology and Management
MACHINE LEARNING
LABORATORY
18CSL76
BY
Prof. Chethana C
Department of CSE
BMSIT&M
Programs Implemented By
Bhavya Sheth
1BY15CS110
Seventh Semester 18CSL76- MACHINE LEARNING LABORATORY
LAB Experiments
Mrs.Chethana C, Assistant Professor, Dept. Of CSE ,BMSIT&M
PROGRAM 1: Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis
based on a given set of training data samples. Read the training data from a .CSV file.
Program
import csv

num_attributes = 6
a = []

print("\n The Given Training Data Set \n")
# Read every training example: attribute values followed by the Yes/No label
with open('enjoysport.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        a.append(row)
        print(row)

print("\n The initial value of hypothesis: ")
hypothesis = ['0'] * num_attributes
print(hypothesis)

# Start from the first training example (assumed to be a positive one)
for j in range(0, num_attributes):
    hypothesis[j] = a[0][j]

print("\n Find S: Finding a Maximally Specific Hypothesis\n")
for i in range(0, len(a)):
    if a[i][num_attributes] == 'Yes':
        # Generalise every attribute that disagrees with this positive example
        for j in range(0, num_attributes):
            if a[i][j] != hypothesis[j]:
                hypothesis[j] = '?'
    print(" For Training instance No:{0} the hypothesis is ".format(i), hypothesis)

print("\n The Maximally Specific Hypothesis for a given Training Examples :\n")
print(hypothesis)
INPUT: enjoysport.csv
OUTPUT
The initial value of hypothesis: ['0', '0', '0', '0', '0', '0']
For Training instance No: 3 the hypothesis is ['Sunny', 'Warm', '?', 'Strong', '?', '?']
The Maximally Specific Hypothesis for a given Training Examples: ['Sunny', 'Warm', '?', 'Strong', '?', '?']
Program 2: For a given set of training data examples stored in a .CSV file, implement and demonstrate the
Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent with the
training examples.
Summary
For all Positive Instances
1. Delete all members of G that do not match the positive example (they are inconsistent).
2. For each member s of S that does not match the positive example, replace s with its minimal generalization that does match it (let it be hypothesis h).
3. Delete all inconsistent hypotheses from S (check that h is consistent with the example d).
4. Delete any member of S that is more general than another member of S.
5. Delete any member of S for which no member of G is at least as general.
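The matching test and the minimal generalization used in the steps above can be sketched as follows. This is only an illustration under simple assumptions, not the lab program itself: the helper names `matches` and `min_generalize` are hypothetical, '?' accepts any value, and '0' stands for "no value seen yet" as in S0.

```python
def matches(h, x):
    """True if hypothesis h covers example x ('?' accepts any value)."""
    return all(hv == '?' or hv == xv for hv, xv in zip(h, x))

def min_generalize(s, x):
    """Smallest change to s that makes it cover the positive example x."""
    out = []
    for sv, xv in zip(s, x):
        if sv == '0':        # nothing seen yet: adopt the example's value
            out.append(xv)
        elif sv == xv:       # already agrees: keep the value
            out.append(sv)
        else:                # disagreement: generalise to '?'
            out.append('?')
    return out

s = ['0'] * 6
positives = [
    ['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'],
    ['Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same'],
]
for x in positives:
    if not matches(s, x):
        s = min_generalize(s, x)
print(s)  # ['Sunny', 'Warm', '?', 'Strong', 'Warm', 'Same']
```

After the two positive examples the result agrees with S2 in the sample output of this program.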
Program
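import csv

a = []
print("\n The Given Training Data Set \n")
# Reading step follows Program 1; the file name is taken from INPUT below
with open('enjoysport.csv', 'r') as csvfile:
    for row in csv.reader(csvfile):
        a.append(row)
        print(row)

num_attributes = len(a[0]) - 1
S = ['0'] * num_attributes
G = ['?'] * num_attributes
print("\n The most specific hypothesis S0 : [0,0,0,0,0,0]\n")
print(" \n The most general hypothesis G0 : [?,?,?,?,?,?]\n")

# Assumed initialisation: S starts as the first example,
# temp collects the specialised members of G
S = a[0][:num_attributes]
temp = []

for i in range(0, len(a)):
    if a[i][num_attributes] == 'Yes':
        # Generalise S just enough to cover the positive example
        for j in range(0, num_attributes):
            if a[i][j] != S[j]:
                S[j] = '?'
        # Drop members of G that no longer agree with S
        temp = [g for g in temp
                if all(g[j] == '?' or g[j] == S[j] for j in range(num_attributes))]
    if a[i][num_attributes] == 'No':
        # Specialise G on each attribute where S rules out the negative example
        for j in range(0, num_attributes):
            if S[j] != a[i][j] and S[j] != '?':
                G[j] = S[j]
                temp.append(G)
                print("Temp ", temp)
                G = ['?'] * num_attributes
    # Reporting step assumed from the OUTPUT listing below
    print(" For Training Example No :{0} the hypothesis is S{0} ".format(i + 1), S)
    print(" For Training Example No :{0} the hypothesis is G{0} ".format(i + 1), temp if temp else G)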
INPUT: enjoysport.csv
OUTPUT
For Training Example No :1 the hypothesis is S1 ['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same']
For Training Example No :1 the hypothesis is G1 ['?', '?', '?', '?', '?', '?']
For Training Example No :2 the hypothesis is S2 ['Sunny', 'Warm', '?', 'Strong', 'Warm', 'Same']
For Training Example No :2 the hypothesis is G2 ['?', '?', '?', '?', '?', '?']
For Training Example No :3 the hypothesis is S3 ['Sunny', 'Warm', '?', 'Strong', 'Warm', 'Same']
For Training Example No :3 the hypothesis is G3 [['Sunny', '?', '?', '?', '?', '?'], ['?', 'Warm', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', 'Same']]
For Training Example No :4 the hypothesis is S4 ['Sunny', 'Warm', '?', 'Strong', '?', '?']
For Training Example No :4 the hypothesis is G4 [['Sunny', '?', '?', '?', '?', '?'], ['?', 'Warm', '?', '?', '?', '?']]
Program 3: Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an
appropriate data set for building the decision tree and apply this knowledge to classify a new sample
Program
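import pandas as pd
from collections import Counter
import math

tennis = pd.read_csv('playtennis.csv')
print("\n Given Play Tennis Data Set:\n\n", tennis)

# Entropy of a list of class labels
def entropy(alist):
    c = Counter(x for x in alist)
    instances = len(alist)
    prob = [x / instances for x in c.values()]
    return sum([-p * math.log(p, 2) for p in prob])

# Information gain from splitting d on the attribute split
# (the def line is missing from the listing; name and signature are assumed)
def information_gain(d, split, target):
    splitting = d.groupby(split)
    n = len(d.index)
    # Per-branch entropy and the fraction of observations in that branch
    agent = splitting.agg({target: [entropy, lambda x: len(x) / n]})[target]
    agent.columns = ['Entropy', 'observations']
    newentropy = sum(agent['Entropy'] * agent['observations'])
    oldentropy = entropy(d[target])
    return oldentropy - newentropy

# Recursive tree builder (signature and else branch reconstructed
# from the call below and from the OUTPUT listing)
def id3(sub, target, a):
    count = Counter(x for x in sub[target])
    if len(count) == 1:
        # Only one class is left: return it as a leaf
        return next(iter(count))
    else:
        gain = [information_gain(sub, attr, target) for attr in a]
        print("Gain=", gain)
        best = a[gain.index(max(gain))]
        print("Best Attribute:", best)
        tree = {best: {}}
        remaining = [i for i in a if i != best]
        for val, subset in sub.groupby(best):
            tree[best][val] = id3(subset, target, remaining)
        return tree

names = list(tennis.columns)
print("List of Attributes:", names)
names.remove('PlayTennis')
print("Predicting Attributes:", names)
tree = id3(tennis, 'PlayTennis', names)
print("\n\nThe Resultant Decision Tree is :\n")
print(tree)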
INPUT: playtennis.csv
PlayTennis Outlook Temperature Humidity Wind
No Sunny Hot High Weak
No Sunny Hot High Strong
Yes Overcast Hot High Weak
Yes Rain Mild High Weak
Yes Rain Cool Normal Weak
No Rain Cool Normal Strong
Yes Overcast Cool Normal Strong
No Sunny Mild High Weak
Yes Sunny Cool Normal Weak
Yes Rain Mild Normal Weak
Yes Sunny Mild Normal Strong
Yes Overcast Mild High Strong
Yes Overcast Hot Normal Weak
No Rain Mild High Strong
OUTPUT
Given Play Tennis Data Set:
PlayTennis Outlook Temperature Humidity Wind
0 No Sunny Hot High Weak
1 No Sunny Hot High Strong
2 Yes Overcast Hot High Weak
3 Yes Rain Mild High Weak
4 Yes Rain Cool Normal Weak
5 No Rain Cool Normal Strong
6 Yes Overcast Cool Normal Strong
7 No Sunny Mild High Weak
8 Yes Sunny Cool Normal Weak
9 Yes Rain Mild Normal Weak
10 Yes Sunny Mild Normal Strong
11 Yes Overcast Mild High Strong
12 Yes Overcast Hot Normal Weak
13 No Rain Mild High Strong
List of Attributes: ['PlayTennis', 'Outlook', 'Temperature', 'Humidity', 'Wind']
Predicting Attributes: ['Outlook', 'Temperature', 'Humidity', 'Wind']
Gain= [0.2467498197744391, 0.029222565658954647, 0.15183550136234136, 0.04812703040826927]
Best Attribute: Outlook
Gain= [0.01997309402197489, 0.01997309402197489, 0.9709505944546686]
Best Attribute: Wind
Gain= [0.5709505944546686, 0.9709505944546686, 0.01997309402197489]
Best Attribute: Humidity
The Resultant Decision Tree is:
{'Outlook': {'Overcast': 'Yes', 'Rain': {'Wind': {'Strong': 'No', 'Weak': 'Yes'}}, 'Sunny': {'Humidity': {'High': 'No',
'Normal': 'Yes'}}}}
Program 4: Build an Artificial Neural Network by implementing the Backpropagation algorithm and test
the same using appropriate data sets.
Program
import math

def sigmoid(x):
    y = 1 / (1 + math.exp(-x))
    return y

## define inputs and target for xor gate
x1 = [0, 0, 1, 1]  # input1
x2 = [0, 1, 0, 1]  # input2
t = [0, 1, 1, 0]   # target

## Training Starts
while(train):
    for i in range(len(x1)):
        ## input for each perceptron of hidden layer
        z_in1 = b1 + x1[i]*w11 + x2[i]*w21
        z_in2 = b2 + x1[i]*w12 + x2[i]*w22
        ## computing activation function output
        z1 = round(sigmoid(z_in1), 4)
        z2 = round(sigmoid(z_in2), 4)

        b2 = round(b2 + del_2, 4)
        w21 = round(w21 + del_2*x2[i], 4)
        w22 = round(w22 + del_2*x2[i], 4)
    print("Iteration: ", iteration)
    print("w11 : %5.4f w12: %5.4f w21: %5.4f w22: %5.4f w13: %5.4f w23: %5.4f "
          % (w11, w12, w21, w22, w13, w23))
    print("Error: %5.3f" % del_k)
    iteration = iteration + 1
    if(iteration == 1000):
        train = False
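For reference, a complete backpropagation loop for the same XOR data can be sketched as below. It is a minimal, self-contained illustration rather than the listing above: the learning rate `alpha`, the random initial weights, the fixed seed, the epoch count and the list-based weight layout (`w`, `b`, `v`, `b_out`) are all assumptions made for this sketch. A network with only two hidden units can settle in a local minimum, so the error is not guaranteed to reach zero.

```python
import math
import random

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

random.seed(0)                 # fixed seed so that runs are repeatable
x1 = [0, 0, 1, 1]              # input1
x2 = [0, 1, 0, 1]              # input2
t = [0, 1, 1, 0]               # target (XOR)
alpha = 0.5                    # assumed learning rate

# w[j] holds the two input weights of hidden unit j, b[j] its bias;
# v[j] is the weight from hidden unit j to the output, b_out the output bias
w = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
b = [random.uniform(-1, 1) for _ in range(2)]
v = [random.uniform(-1, 1) for _ in range(2)]
b_out = random.uniform(-1, 1)

def forward(a, c):
    """Return the hidden activations and the network output for inputs a, c."""
    z = [sigmoid(b[j] + a * w[j][0] + c * w[j][1]) for j in range(2)]
    y = sigmoid(b_out + z[0] * v[0] + z[1] * v[1])
    return z, y

def total_error():
    """Half the sum of squared errors over the four patterns."""
    return sum(0.5 * (t[i] - forward(x1[i], x2[i])[1]) ** 2 for i in range(4))

initial_error = total_error()
for epoch in range(5000):
    for i in range(4):
        z, y = forward(x1[i], x2[i])
        del_k = (t[i] - y) * y * (1 - y)                               # output error term
        del_j = [del_k * v[j] * z[j] * (1 - z[j]) for j in range(2)]   # hidden error terms
        for j in range(2):
            v[j] += alpha * del_k * z[j]
            w[j][0] += alpha * del_j[j] * x1[i]
            w[j][1] += alpha * del_j[j] * x2[i]
            b[j] += alpha * del_j[j]
        b_out += alpha * del_k
final_error = total_error()
print("error before: %.4f  after: %.4f" % (initial_error, final_error))
```

Each hidden weight is updated with the error term of the hidden unit it feeds, multiplied by the input it carries.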
OUTPUT
Sample
Program 5: Write a program to implement the naïve Bayesian classifier for a sample training data set
stored as a .CSV file. Compute the accuracy of the classifier, considering few test data sets.
Program
import csv
import math
import random
import statistics

# Gaussian probability density of x for a feature with the given mean and standard deviation
def calculate_probability(x, mean, stdev):
    exponent = math.exp(-(math.pow(x - mean, 2) / (2 * math.pow(stdev, 2))))
    return (1 / (math.sqrt(2 * math.pi) * stdev)) * exponent

dataset = []
with open('lab5.csv') as csvfile:
    lines = csv.reader(csvfile)
    for row in lines:
        dataset.append([float(attr) for attr in row])
dataset_size = len(dataset)
print('Size of dataset is : ', dataset_size)

# 70% of the data is used for training (768 rows give 537, as in the OUTPUT below)
train_size = int(0.7 * dataset_size)
print('train_size', train_size)

# Split data: sampled rows go to training, the remaining rows form the test set
training_indexes = set(random.sample(range(dataset_size), train_size))
X_train = [dataset[i] for i in training_indexes]
X_test = [dataset[i] for i in range(dataset_size) if i not in training_indexes]

# Group the training samples by class label (last column)
classes = {}
for samples in X_train:
    last = int(samples[-1])
    if last not in classes:
        classes[last] = []
    classes[last].append(samples)

# Prediction step (assumed; this part is missing from the listing):
# choose the class with the largest prior times product of Gaussian likelihoods
X_prediction = []
for sample in X_test:
    scores = {}
    for label, rows in classes.items():
        score = len(rows) / train_size
        for j in range(len(sample) - 1):
            column = [r[j] for r in rows]
            score *= calculate_probability(sample[j], statistics.mean(column), statistics.stdev(column))
        scores[label] = score
    X_prediction.append(max(scores, key=scores.get))

# Find accuracy
correct = 0
for index, key in enumerate(X_test):
    if X_test[index][-1] == X_prediction[index]:
        correct += 1
print('Accuracy : ', correct / len(X_test) * 100)
OUTPUT
Size of dataset is : 768
train_size 537
Accuracy : 68.83116883116884
Program 6: Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier model
to perform this task. Built-in Java classes/API can be used to write the program. Calculate the accuracy,
precision, and recall for your data set.
Program
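import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

msg = pd.read_csv('lab6.txt', names=['message', 'label'])
print('The dimensions of the dataset', msg.shape)
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
X = msg.message
y = msg.labelnum
print(X)
print(y)

# Split and vectorise (these steps are missing from the listing; the default split is assumed)
xtrain, xtest, ytrain, ytest = train_test_split(X, y)
count_vect = CountVectorizer()
xtrain_dtm = count_vect.fit_transform(xtrain)
xtest_dtm = count_vect.transform(xtest)

# get_feature_names_out() replaces get_feature_names() in current scikit-learn
df = pd.DataFrame(xtrain_dtm.toarray(), columns=count_vect.get_feature_names_out())
print(df)

clf = MultinomialNB().fit(xtrain_dtm, ytrain)
predicted = clf.predict(xtest_dtm)
print("ytest")
print(ytest)
print("predicted")
print(predicted)

print('Accuracy metrics')
print('Accuracy of the classifier is', metrics.accuracy_score(ytest, predicted))
print('Confusion matrix')
labels = np.unique(ytest)  # concept labels present in the test set
a = metrics.confusion_matrix(ytest, predicted, labels=labels)
cm = pd.DataFrame(a, index=labels, columns=labels)
print(cm)
# Reporting step assumed from the OUTPUT below
print('Recall and Precision')
print('Recall', metrics.recall_score(ytest, predicted))
print('Precision', metrics.precision_score(ytest, predicted))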
INPUT:lab6.txt
OUTPUT
Unique Labels
['about', 'am', 'amazing', 'an', 'and', 'awesome', 'beers', 'best', 'can', 'dance', 'deal', 'do', 'enemy', 'feel', 'fun', 'good',
'great', 'have', 'he', 'holiday', 'house', 'is', 'juice', 'like', 'love', 'my', 'not', 'of', 'place', 'sick', 'sworn', 'taste', 'the',
'these', 'this', 'tired', 'to', 'today', 'tomorrow', 'very', 'view', 'we', 'went', 'what', 'will', 'with', 'work']
about am amazing an and awesome ... we went what will with work
0 0 0 0 1 0 1 ... 0 0 0 0 0 0
1 0 1 0 0 1 0 ... 0 0 0 0 0 0
2 0 0 0 0 0 0 ... 1 0 0 1 0 0
3 0 0 0 0 0 0 ... 0 0 0 0 0 0
4 0 0 0 0 0 0 ... 0 0 0 0 0 0
5 1 0 0 0 0 0 ... 0 0 0 0 0 0
6 0 0 0 0 0 0 ... 0 0 0 0 1 0
7 0 0 0 0 0 0 ... 0 0 1 0 0 0
8 0 0 0 0 0 0 ... 0 0 0 0 0 1
9 0 0 0 0 0 0 ... 0 0 0 0 0 0
10 0 0 1 1 0 0 ... 0 0 0 0 0 0
11 0 0 0 1 0 1 ... 0 0 1 0 0 0
12 0 0 0 0 0 0 ... 0 1 0 0 0 0
ytest
13 0
4 1
5 0
17 0
7 0
Name: labelnum, dtype: int64
predicted
[0 1 0 0 1]
Accuracy metrics
Accuracy of the classifier is 0.8
Confusion matrix
0 1
0 3 1
1 0 1
Recall and Precision
Recall 1.0
Precision 0.5
Program 7: Write a program to construct a Bayesian network considering medical data. Use this model to
demonstrate the diagnosis of heart patients using standard Heart Disease Data Set. You can use
Java/Python ML library classes/API.
Program
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import LabelEncoder

col = ['Age', 'Gender', 'FamilyHist', 'Diet', 'LifeStyle', 'Cholesterol', 'HeartDisease']
data = pd.read_csv('lab7.csv', names=col)
print(data)

# Encoding: replace each categorical column with integer codes
encoder = LabelEncoder()
for c in col:
    data[c] = encoder.fit_transform(data[c])

# Splitting data
X = data.iloc[:, 0:6]
y = data.iloc[:, -1]
# The split call is missing from the listing; test_size=0.2 is assumed
# (it yields 4 test rows out of 19, as in the OUTPUT below)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
print("X_test")
print(X_test)
print("y_test")
print(y_test)

# Prediction: create the naive Bayes classifier, fit it on the training data, then predict
clf = GaussianNB()
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print("y_pred")
print(y_pred)

labels = np.unique(y_test)
a = confusion_matrix(y_test, y_pred, labels=labels)
print('Confusion matrix')
cm = pd.DataFrame(a, index=labels, columns=labels)
print(cm)
INPUT: lab7.csv
SuperSeniorCitizen Male Yes Medium Sedetary High Yes
SuperSeniorCitizen Female Yes Medium Sedetary High Yes
SeniorCitizen Male No High Moderate BorderLine Yes
Teen Male Yes Medium Sedetary Normal No
Youth Female Yes High Athlete Normal No
MiddleAged Male Yes Medium Active High Yes
Teen Male Yes High Moderate High Yes
SuperSeniorCitizen Male Yes Medium Sedetary High Yes
Youth Female Yes High Athlete Normal No
SeniorCitizen Female No High Athlete Normal Yes
Teen Female No Medium Moderate High Yes
Teen Male Yes Medium Sedetary Normal No
MiddleAged Female No High Athlete High No
MiddleAged Male Yes Medium Active High Yes
Youth Female Yes High Athlete BorderLine No
SuperSeniorCitizen Male Yes High Athlete Normal Yes
SeniorCitizen Female No Medium Moderate BorderLine Yes
Youth Female Yes Medium Athlete BorderLine No
Teen Male Yes Medium Sedetary Normal No
OUTPUT
X_test
Age Gender FamilyHist Diet LifeStyle Cholesterol
0 2 1 1 1 3 1
3 3 1 1 1 3 2
5 0 1 1 1 0 1
10 3 0 0 1 2 1
y_test
0 1
3 0
5 1
10 1
Name: HeartDisease, dtype: int32
y_pred
[1 0 1 1]
Confusion matrix
0 1
0 1 0
1 0 3
Program 8: Apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for clustering
using k-Means algorithm. Compare the results of these two algorithms and comment on the quality of clustering.
You can add Java/Python ML library classes/API in the program.
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from sklearn.mixture import GaussianMixture
from sklearn.cluster import KMeans

data = pd.read_csv('lab8.csv')
print("Input Data and Shape")
print(data.shape)
print(data.head())

f1 = data['V1'].values
f2 = data['V2'].values
X = np.array(list(zip(f1, f2)))
print("X ", X)

print('Graph for whole dataset')
plt.scatter(f1, f2, c='black', s=7)
plt.show()

# EM: fit a 3-component Gaussian mixture and size each point by its membership confidence
gmm = GaussianMixture(n_components=3).fit(X)
labels = gmm.predict(X)
probs = gmm.predict_proba(X)
size = 10 * probs.max(1) ** 3
print('Graph using EM Algorithm')
# The listing stops at the line above; the plotting and k-means steps below are assumed
plt.scatter(X[:, 0], X[:, 1], c=labels, s=size, cmap='viridis')
plt.show()

kmeans = KMeans(n_clusters=3, n_init=10).fit(X)
print('Graph using K-Means Algorithm')
plt.scatter(X[:, 0], X[:, 1], c=kmeans.labels_, s=7, cmap='viridis')
plt.show()
INPUT: lab8.csv
15 17.63935 -3.21235
16 4.415292 22.81555
17 11.94122 8.122487
18 0.725853 1.806819
19 8.185273 28.1326
20 -5.77359 1.0248
OUTPUT
Program 9: Write a program to implement k-Nearest Neighbour algorithm to classify the iris data set. Print
both correct and wrong predictions. Java/Python ML library classes can be used for this problem.
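import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
x = iris.data
y = iris.target
print(x[:5], y[:5])

xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.4, random_state=1)
print(iris.data.shape)
print(len(xtrain))
print(len(ytest))

knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(xtrain, ytrain)
pred = knn.predict(xtest)
print("Predicted", pred)
print("Accuracy", metrics.accuracy_score(ytest, pred))
print(iris.target_names[2])

# Map class indices to names so correct and wrong predictions can be read side by side
ytestn = [iris.target_names[i] for i in ytest]
predn = [iris.target_names[i] for i in pred]
# Reporting step assumed from the OUTPUT table below
print(pd.DataFrame({'predicted': predn, 'Actual': ytestn}))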
OUTPUT
y sample [0 0 0 0 0]
Predicted [0 1 1 0 2 2 2 0 0 2 1 0 2 1 1 0 1 1 0 0 1 1 1 0 2 1 0 0 1 2 1 2 1 2 2 0 1
0 1 2 2 0 1 2 1 2 0 0 0 1 0 0 2 2 2 2 2 1 2 1]
Accuracy 0.9666666666666667
virginica
predicted Actual
0 setosa setosa
1 versicolor versicolor
2 versicolor versicolor
3 setosa setosa
4 virginica virginica
5 virginica versicolor
6 virginica virginica
7 setosa setosa
8 setosa setosa
9 virginica virginica
10 versicolor versicolor
11 setosa setosa
12 virginica virginica
13 versicolor versicolor
14 versicolor versicolor
15 setosa setosa
16 versicolor versicolor
17 versicolor versicolor
18 setosa setosa
19 setosa setosa
20 versicolor versicolor
21 versicolor versicolor
22 versicolor versicolor
23 setosa setosa
24 virginica virginica
25 versicolor versicolor
26 setosa setosa
27 setosa setosa
28 versicolor versicolor
29 virginica virginica
30 versicolor versicolor
31 virginica virginica
32 versicolor versicolor
33 virginica virginica
34 virginica virginica
35 setosa setosa
36 versicolor versicolor
37 setosa setosa
38 versicolor versicolor
39 virginica virginica
40 virginica virginica
41 setosa setosa
42 versicolor virginica
43 virginica virginica
44 versicolor versicolor
45 virginica virginica
46 setosa setosa
47 setosa setosa
48 setosa setosa
49 versicolor versicolor
50 setosa setosa
51 setosa setosa
52 virginica virginica
53 virginica virginica
54 virginica virginica
55 virginica virginica
56 virginica virginica
57 versicolor versicolor
58 virginica virginica
59 versicolor versicolor
Program 10: Implement the non-parametric Locally Weighted Regression algorithm in order to fit data
points. Select appropriate data set for your experiment and draw graphs.
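import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

tau = 0.5  # bandwidth of the weighting kernel
data = pd.read_csv("lab10.csv")
X_train = np.array(data.total_bill)
print(X_train)
X_train = X_train[:, np.newaxis]
print(len(X_train))
y_train = np.array(data.tip)

# The query points are not defined in the listing; an evenly spaced grid
# over the training range is assumed here
X_test = np.linspace(X_train.min(), X_train.max(), 500)[:, np.newaxis]

y_test = []
count = 0
for r in range(len(X_test)):
    # Gaussian weight of every training point relative to the query point
    wts = np.exp(-np.sum((X_train - X_test[r]) ** 2, axis=1) / (2 * tau ** 2))
    W = np.diag(wts)
    # Weighted least squares: (X^T W X)^-1 X^T W y
    factor1 = np.linalg.inv(X_train.T.dot(W).dot(X_train))
    parameters = factor1.dot(X_train.T).dot(W).dot(y_train)
    prediction = X_test[r].dot(parameters)
    y_test.append(prediction)
    count += 1
print(len(y_test))
y_test = np.array(y_test)

plt.plot(X_train.squeeze(), y_train, 'o')
plt.plot(X_test.squeeze(), y_test, 'o')  # fitted values at the query points (assumed)
plt.show()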
INPUT
OUTPUT
True Negative: the actual value was negative and the model predicted a negative value.
Precision is a useful metric when False Positives are a higher concern than False Negatives.
Recall is a useful metric when False Negatives are a higher concern than False Positives.
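As a worked example, precision, recall and accuracy can be computed directly from the four cells of a confusion matrix. The sketch below uses the counts from the Program 6 confusion matrix, treating label 1 as the positive class; the function name `precision_recall` is hypothetical.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# Counts read from the Program 6 confusion matrix (rows = actual, columns = predicted)
tn, fp, fn, tp = 3, 1, 0, 1
precision, recall = precision_recall(tp, fp, fn)
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(precision, recall, accuracy)  # 0.5 1.0 0.8
```

These values agree with the Recall, Precision and Accuracy lines in the Program 6 output.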
APPENDIX A
Program 3
Entropy(S) = -p+ log2(p+) - p- log2(p-)
Where,
p+ is the proportion of positive examples in S
p- is the proportion of negative examples in S
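A short worked example may help here. The entropy of a two-class sample S is -p+ log2(p+) - p- log2(p-); the sketch below evaluates it for the PlayTennis data of Program 3, which has 9 'Yes' and 5 'No' examples out of 14. The function signature is an assumption made for this illustration.

```python
import math

def entropy(p_pos, p_neg):
    """Entropy of a two-class sample; a zero proportion contributes nothing."""
    return -sum(p * math.log2(p) for p in (p_pos, p_neg) if p > 0)

# PlayTennis data from Program 3: 9 'Yes' and 5 'No' out of 14 examples
e = entropy(9 / 14, 5 / 14)
print(round(e, 3))  # 0.94
```

An evenly split sample gives entropy 1, and a sample with a single class gives entropy 0.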
Program 6
CountVectorizer methods
fit(raw_documents[, y]): Learn a vocabulary dictionary of all tokens in the raw documents.
fit_transform(raw_documents[, y]): Learn the vocabulary dictionary and return the document-term matrix. This is equivalent to fit followed by transform, but more efficiently implemented.
get_feature_names(): Array mapping from feature integer indices to feature name (named get_feature_names_out() in current scikit-learn releases).
Multinomial Naïve Bayes uses term frequency i.e. the number of times a given term appears
in a document. ... After normalization, term frequency can be used to compute maximum
likelihood estimates based on the training data to estimate the conditional probability.
Naive Bayes classifier for multinomial models
The multinomial Naive Bayes classifier is suitable for classification with discrete features
(e.g., word counts for text classification).
The multinomial distribution normally requires integer feature counts.
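The document-term matrix of word counts that the multinomial model works on can be illustrated without any library. The sketch below is a simplified stand-in, not CountVectorizer itself: it splits on whitespace and keeps every token, whereas CountVectorizer applies its own tokenisation rules. The two sample sentences are made up for the illustration.

```python
from collections import Counter

docs = ["I love this place", "I do not like this place"]
tokens = [d.lower().split() for d in docs]           # simple whitespace tokeniser
vocab = sorted({w for doc in tokens for w in doc})   # the learned vocabulary
# One row per document, one column per vocabulary word, cells hold term counts
matrix = [[Counter(doc)[w] for w in vocab] for doc in tokens]

print(vocab)
for row in matrix:
    print(row)
```

Each row is the integer feature vector of one document, which is the kind of input the multinomial model expects.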
Program 7
The sklearn.preprocessing.LabelEncoder().fit_transform(y) call fits the label encoder on a
single column, assigning a number to each distinct value, and transforms the values in that
column to those numbers.
Label encoding converts labels into numeric form so that they are machine-readable.
Machine learning algorithms can then operate on those labels directly.
It is an important pre-processing step for structured datasets in supervised learning.
It assigns a unique number (starting from 0) to each class of data.
For example, in a Height column, 0 might be the label for tall, 1 the label for medium and 2 the
label for short. LabelEncoder itself numbers the classes in sorted order.
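The idea can be sketched in a few lines of plain Python. This is an illustration of label encoding with classes numbered in sorted order, not the scikit-learn implementation; the function name `label_encode` and the sample values are hypothetical.

```python
def label_encode(values):
    """Map each distinct value to an integer, numbering classes in sorted order."""
    classes = sorted(set(values))
    index = {c: i for i, c in enumerate(classes)}
    return [index[v] for v in values], classes

codes, classes = label_encode(['tall', 'medium', 'short', 'tall'])
print(classes)  # ['medium', 'short', 'tall']
print(codes)    # [2, 0, 1, 2]
```

With sorted numbering, medium becomes 0, short becomes 1 and tall becomes 2.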
Program 8
What Is Clustering?
Clustering is a set of techniques used to partition data into groups, or clusters.
Clusters are loosely defined as groups of data objects that are more similar to other objects
in their cluster than they are to data objects in other clusters.
K-Means Algorithm
Conventional k-means requires only a few steps. The first step is to randomly
select k centroids, where k is equal to the number of clusters we choose.
Centroids are data points representing the center of a cluster.
The main element of the algorithm works by a two-step process called expectation-
maximization.
The expectation step assigns each data point to its nearest centroid.
Then, the maximization step computes the mean of all the points for each cluster and sets
the new centroid.
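The two steps described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the first k points are used as starting centroids instead of a random choice (to keep the result repeatable), the points are two-dimensional, and the function name `kmeans` and the sample points are made up for the example.

```python
def kmeans(points, k, iterations=10):
    centroids = points[:k]   # assumption: first k points as the start, not a random pick
    for _ in range(iterations):
        # Expectation step: assign each point to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            d = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[d.index(min(d))].append(p)
        # Maximization step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid)
        centroids = [
            (sum(q[0] for q in cl) / len(cl), sum(q[1] for q in cl) / len(cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters

points = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
centroids, clusters = kmeans(points, 2)
print(centroids)
print(clusters)
```

On these six points the assignments stop changing after the first pass, leaving one cluster near (1.33, 1.33) and one near (8.67, 8.67).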
Applications
The EM algorithm has many applications, including:
Dis-entangling superimposed signals,
Estimating Gaussian mixture models (GMMs),
Estimating hidden Markov models (HMMs),
Estimating parameters for compound Dirichlet distributions,
Finding optimal mixtures of fixed models.
Limitations
The EM algorithm can be very slow, even on the fastest computer.
It works best when you only have a small percentage of missing data and
the dimensionality of the data isn’t too big.
The higher the dimensionality, the slower the E-step; for data with larger dimensionality,
we may find the E-step runs extremely slow as the procedure approaches a local
maximum.
Program 9
KNeighborsClassifier
K nearest neighbors is a simple algorithm that stores all available cases and classifies new
cases based on a similarity measure (e.g., distance functions).
The K in the name of this classifier represents the k nearest neighbors, where k is an integer
value specified by the user.
Hence as the name suggests, this classifier implements learning based on the k nearest
neighbors. The choice of the value of k is dependent on data.
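A minimal sketch of the idea, assuming squared Euclidean distance as the similarity measure and a simple majority vote among the k nearest stored cases. The function name `knn_predict` and the tiny two-feature data set are made up for the illustration; this is not the KNeighborsClassifier implementation.

```python
from collections import Counter

def knn_predict(train, labels, query, k):
    """Classify query by majority vote among its k nearest training points."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(p, query)), lab)
        for p, lab in zip(train, labels)
    )
    votes = Counter(lab for _, lab in dists[:k])
    return votes.most_common(1)[0][0]

train = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ['setosa', 'setosa', 'virginica', 'virginica', 'virginica']
print(knn_predict(train, labels, (1.1, 0.9), k=3))  # setosa
print(knn_predict(train, labels, (5.0, 5.0), k=3))  # virginica
```

With k=3 the first query has two setosa neighbours and one virginica neighbour, so the vote goes to setosa.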
Program 10
BASICS
Strings
# Using single quotes
str1 = 'Hello Python'
print(str1)
# Using double quotes
str2 = "Hello Python"
print(str2)

# Given string
text = "BMS Institute of Technology"
# Index 0 to the end
print(text[0:])
# Index 1 to index 4
print(text[1:5])
# Index 2 to index 3
print(text[2:4])
# Index 0 to index 2
print(text[:3])
# Index 4 to index 6
print(text[4:7])
LIST
L1 = ["John", 102, "USA"]
L2 = [1, 2, 3, 4, 5, 6]
If we print the type of L1 and L2 using the type() function, each comes out as a list.
print(type(L1))
print(type(L2))
Output:
<class 'list'>
<class 'list'>
Characteristics of Lists
The list has the following characteristics:
Let's check the first statement, that lists are ordered.
a = [1, 2, "Peter", 4.50, "Ricky", 5, 6]
b = [1, 2, 5, "Peter", 4.50, "Ricky", 6]
print(a == b)
Output:
False
Both lists contain the same elements, but the second list holds 5 at a different index position, so the
order differs. Comparing the two lists therefore returns False.
Lists maintain the order of their elements for their lifetime. That is why a list is an ordered collection of objects.
Output:
True
list = [1,2,3,4,5,6,7]
print(list[0])
print(list[1])
print(list[2])
print(list[3])
print(list[0:6])
# With no start or end given, the slice runs from index 0 through the last element.
print(list[:])
print(list[2:5])
print(list[1:6:2])
list = [1,2,3,4,5]
print(list[-1])
print(list[-3:])
print(list[:-1])
print(list[-3:-1])
#updation
list = [1, 2, 3, 4, 5, 6]
print(list)
list[2] = 10
print(list)
# Adding multiple-element
print(list)
list[-1] = 25
print(list)
print(type(Student))
print(Student)
Creating a Dictionary
A dictionary is created from key-value pairs enclosed within curly brackets {}, with each key
separated from its value by a colon (:). The syntax is given below.
my_dict = {}
# Creating a Dictionary
# using the dict() method
dict1 = dict({1: 'Hello', 2: 'Hi', 3: 'Hey'})
print("\nCreate Dictionary by using the dict() method : ")
print(dict1)