ML1408-Machine Learning Lab Programs
Dataset: 2) Manufacturers
PROGRAM:
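import pandas as pd

# Load the training data; the last column holds the class label
data = pd.read_csv(r'C:/Users/Documents/data.csv')
X = data.iloc[:, :-1].values
y = data.iloc[:, -1].values

# Start from the first positive example as the most specific hypothesis
hypothesis = None
for i in range(len(X)):
    if y[i] == 'Yes':
        hypothesis = list(X[i])
        break

# Generalise every attribute that disagrees with another positive example
for i in range(len(X)):
    if y[i] == 'Yes':
        for j in range(len(X[i])):
            if X[i][j] != hypothesis[j]:
                hypothesis[j] = '?'

print(hypothesis)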
OUTPUT:
*NOTE: This program uses the label 'Yes' for the Enjoy Sport dataset (e.g. Sunny, Warm, Normal,
Strong, Warm, Same, Yes) and the shapes dataset (e.g. Big, Red, Circle, No). For the manufacturer
dataset (e.g. Japan, Honda, Blue, 1980, Economy, Positive), use the label ‘Positive’ instead of ‘Yes’.
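The label switch described in the note can be sketched by passing the positive label as a parameter. This is a minimal version of Find-S with the data typed in directly instead of read from a CSV. The function name `find_s`, the parameter `positive_label`, and the four rows (assumed to be the textbook Enjoy Sport table) are illustrative and not part of the original program.

```python
# Assumption: these four rows are the textbook Enjoy Sport table.
enjoy_sport = [
    (['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'], 'Yes'),
    (['Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same'], 'Yes'),
    (['Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change'], 'No'),
    (['Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change'], 'Yes'),
]

def find_s(examples, positive_label='Yes'):
    """Return the most specific hypothesis covering all positive examples."""
    hypothesis = None
    for attributes, label in examples:
        if label != positive_label:
            continue  # Find-S ignores negative examples
        if hypothesis is None:
            # The first positive example becomes the starting hypothesis
            hypothesis = list(attributes)
        else:
            # Replace every disagreeing attribute with '?'
            for j, value in enumerate(attributes):
                if hypothesis[j] != value:
                    hypothesis[j] = '?'
    return hypothesis

result = find_s(enjoy_sport)
print(result)
```

For the manufacturer dataset the same function would be called as `find_s(rows, positive_label='Positive')`, where `rows` holds that dataset's examples.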
2) CANDIDATE-ELIMINATION ALGORITHM
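training_examples = [
    # (attribute list, label) pairs go here; the rows are missing from this listing
]

def candidate_elimination(examples):
    if not examples:
        print('No training examples given')
        return
    # Assumption: the specific hypothesis starts from the first example,
    # because the listing does not show how it is initialised
    specific_hypothesis = list(examples[0][0])
    general_hypothesis = ['?','?','?','?','?','?']
    for x, y in examples:
        if y == 'Yes':
            # Positive example: generalise the attributes that do not match
            for i in range(len(x)):
                if x[i] != specific_hypothesis[i]:
                    specific_hypothesis[i] = '?'
            for i in range(len(x)):
                general_hypothesis[i] = x[i]
        else:
            # Negative example: update the general hypothesis attribute by attribute
            for i in range(len(x)):
                if x[i] != specific_hypothesis[i]:
                    general_hypothesis[i] = '?'
                else:
                    general_hypothesis[i] = specific_hypothesis[i]
    print('Specific hypothesis:', specific_hypothesis)
    print('General hypothesis:', general_hypothesis)

candidate_elimination(training_examples)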
OUTPUT:
*NOTE: Here the Enjoy Sport data is entered directly in the program. For the shapes dataset,
replace it with that dataset's rows and run the program again.
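For comparison, here is a minimal sketch of the textbook version-space form of candidate elimination, which keeps a specific boundary S and a list of general hypotheses G. It is not the program above. The helper `covers`, the parameter `positive_label`, and the four rows (assumed to be the textbook Enjoy Sport table) are illustrative. Attribute domains are taken from the values seen in the examples, and the sketch skips the step that drops S when it covers a negative example.

```python
# Assumption: these four rows are the textbook Enjoy Sport table.
training_examples = [
    (['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'], 'Yes'),
    (['Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same'], 'Yes'),
    (['Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change'], 'No'),
    (['Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change'], 'Yes'),
]

def covers(hypothesis, instance):
    """True if each hypothesis attribute is '?' or equals the instance's value."""
    return all(h == '?' or h == v for h, v in zip(hypothesis, instance))

def candidate_elimination(examples, positive_label='Yes'):
    n = len(examples[0][0])
    # Assumption: an attribute's domain is the set of values seen in the examples
    domains = [sorted({x[i] for x, _ in examples}) for i in range(n)]
    S = None               # specific boundary (one conjunctive hypothesis)
    G = [['?'] * n]        # general boundary
    for x, label in examples:
        if label == positive_label:
            # Drop general hypotheses that fail to cover the positive example
            G = [g for g in G if covers(g, x)]
            # Minimally generalise S so that it covers the example
            S = list(x) if S is None else [s if s == v else '?' for s, v in zip(S, x)]
        else:
            new_G = []
            for g in G:
                if not covers(g, x):
                    new_G.append(g)   # already excludes the negative example
                    continue
                # Minimally specialise g: fix one '?' to a value that differs from x
                for i in range(n):
                    if g[i] != '?':
                        continue
                    for value in domains[i]:
                        if value == x[i]:
                            continue
                        h = g[:i] + [value] + g[i + 1:]
                        # Keep h only if it is still at least as general as S
                        if (S is None or covers(h, S)) and h not in new_G:
                            new_G.append(h)
            G = new_G
    return S, G

S, G = candidate_elimination(training_examples)
print('S:', S)
print('G:', G)
```

On these rows the sketch ends with S = ['Sunny', 'Warm', '?', 'Strong', '?', '?'] and two general hypotheses, one fixing Sunny and one fixing Warm.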
PROGRAM:
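from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2)  # assumed split; not shown in the listing
clf = DecisionTreeClassifier(criterion='entropy')
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))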
OUTPUT:
Accuracy: 0.9333333333333333
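ID3/DECISION TREE CLASSIFIER: (when the dataset is given in the question paper)
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier
dataset = pd.read_csv('data.csv')  # placeholder path; the listing omits this line
x = dataset.iloc[:, [1,2,3]].values
y = dataset.iloc[:, 4].values
ct = ColumnTransformer([('encoder', OneHotEncoder(), list(range(x.shape[1])))], sparse_threshold=0)  # assumed: one-hot encode every feature column
x = np.array(ct.fit_transform(x))
#print(x)
#print(y)
X_train, X_test, y_train, y_test = train_test_split(x, y)  # split sizes are not given in the listing
clf = DecisionTreeClassifier(criterion='entropy')
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print(y_pred)
print("Accuracy:", accuracy_score(y_test, y_pred))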
OUTPUT:
Accuracy: 0.3333333333333333
*NOTE: The above program is for the covid infection dataset. For the playing tennis dataset,
change the line that reads x from the dataset so that it also includes the first column
(index 0). The modified line for the playing tennis dataset is:
x = dataset.iloc[:, [0,1,2,3]].values (the rest of the program runs unchanged)
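Both decision tree programs pass criterion='entropy', so splits are chosen by information gain. As a small worked example, the sketch below computes the entropy of the class column and the information gain of one attribute by hand. The two lists are assumed to be the Outlook and Play columns of the textbook 14-row playing tennis table, and the function names are illustrative.

```python
from collections import Counter
from math import log2

# Assumption: Outlook and Play columns of the textbook 14-row playing tennis table.
outlook = ['Sunny', 'Sunny', 'Overcast', 'Rain', 'Rain', 'Rain', 'Overcast',
           'Sunny', 'Sunny', 'Rain', 'Sunny', 'Overcast', 'Overcast', 'Rain']
play = ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes',
        'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on values."""
    total = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [label for x, label in zip(values, labels) if x == v]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

base_entropy = entropy(play)
gain_outlook = information_gain(outlook, play)
print(round(base_entropy, 3), round(gain_outlook, 3))
```

With 9 Yes and 5 No rows the entropy is about 0.940, and splitting on Outlook gives a gain of about 0.247.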
4) Artificial Neural Network by implementing the Back propagation algorithm.
PROGRAM:
X, y = make_classification(n_samples=1000)
mlp.fit(X_train, y_train)
print(f"Accuracy: {accuracy}")
OUTPUT:
Accuracy: 0.865
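The listing above shows only three lines of the program. A runnable version might look like the sketch below, assuming that mlp is scikit-learn's MLPClassifier (which is trained by backpropagation) and that 20% of the samples are held out for testing. The hidden layer size, max_iter and random_state values are illustrative, so the accuracy will not necessarily match the output above.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic two-class data, as in the listing
X, y = make_classification(n_samples=1000, random_state=0)

# Assumption: 80/20 train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Assumption: one hidden layer of 10 units
mlp = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000, random_state=0)
mlp.fit(X_train, y_train)

accuracy = accuracy_score(y_test, mlp.predict(X_test))
print(f"Accuracy: {accuracy}")
```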
5) Naïve Bayesian classifier (with iris dataset)
PROGRAM:
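from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2)  # 30 test rows, matching the output
gnb = GaussianNB()
gnb.fit(X_train, y_train)
y_pred = gnb.predict(X_test)
print(y_pred)
print("ACCURACY",gnb.score(X_test,y_test))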
OUTPUT:
[2 2 1 0 1 1 0 0 1 2 0 1 2 1 2 1 2 0 0 1 0 1 0 0 1 2 0 1 2 0]
ACCURACY 1.0
PROGRAM:
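import pandas as pd
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import OneHotEncoder
dataset = pd.read_csv('data.csv')  # placeholder path; the listing omits this line
x = dataset.iloc[:, [0,1,2,3]].values
y = dataset.iloc[:, 4].values
ct = ColumnTransformer([('encoder', OneHotEncoder(), list(range(x.shape[1])))], sparse_threshold=0)  # assumed: one-hot encode every feature column
x = np.array(ct.fit_transform(x))
#print(x)
x_train, x_test, y_train, y_test = train_test_split(x, y)  # split sizes are not given in the listing
gnb = GaussianNB()
gnb.fit(x_train, y_train)
y_pred = gnb.predict(x_test)
print(y_pred)
print('Accuracy=',gnb.score(x_test,y_test))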
OUTPUT:
Accuracy= 0.6
*NOTE: The above program is for the playing tennis dataset. For the stolen vehicle dataset,
change the lines that read x and y from the dataset: the fourth column (index 3) is removed
from x and becomes y, replacing the fifth column (index 4). The modified lines for the stolen
vehicle dataset are:
x = dataset.iloc[:, [0,1,2]].values
y = dataset.iloc[:, 3].values
DATASET:
message                                  labelnum
I love this sandwich                     pos
This is an amazing place                 pos
I feel very good about these beers       pos
This is my best work                     pos
What an awesome view                     pos
I do not like this restaurant            neg
I am tried of this stuff                 neg
I can't deal with this                   neg
He is my sworn enemy                     neg
My boss is horrible                      neg
This is an awesome place                 pos
I do not like the taste of this juice    neg
I love to dance                          pos
I am sick and tried of this place        neg
What a great holiday                     pos
That is a bad locality to stay           neg
We will have good fun tomorrow           pos
I went to my enemy's house today         neg
PROGRAM:
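import pandas as pd
import numpy as np
from sklearn import metrics
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
msg=pd.read_csv(r'C:\Users\rajd3\Desktop\6-Dataset1.csv',names=['message','label'])
msg['labelnum']=msg.label.map({'pos':1,'neg':0})
a=msg['message']
b=msg['labelnum']
# Skip the header row (index 0); note that this slice also drops the last row
X=a[1:-1]
y=b[1:-1]
xtrain,xtest,ytrain,ytest=train_test_split(X,y)
# Build the document-term matrix from the training messages only
cv = CountVectorizer()
xtrain_dtm = cv.fit_transform(xtrain)
xtest_dtm=cv.transform(xtest)
print(cv.get_feature_names_out())
clf = MultinomialNB()
clf.fit(xtrain_dtm,ytrain)
predicted = clf.predict(xtest_dtm)
print('Confusion matrix')
print(metrics.confusion_matrix(ytest,predicted))
OUTPUT: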
['am' 'amazing' 'an' 'and' 'awesome' 'bad' 'best' 'boss' 'dance' 'do'
'enemy' 'fun' 'good' 'great' 'have' 'he' 'holiday' 'horrible' 'is' 'like'
'stuff' 'sworn' 'that' 'this' 'to' 'tomorrow' 'tried' 'we' 'what' 'will'
'work']
Confusion matrix
[[1 1]
[0 3]]
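The confusion matrix above can be read off by hand. The sketch below assumes scikit-learn's usual layout (rows are true labels, columns are predicted labels, with neg=0 first and pos=1 second) and derives accuracy, precision and recall for the pos class from the four printed counts. The variable names are illustrative.

```python
# Confusion matrix copied from the output above.
# Layout assumed: rows = true labels, columns = predicted labels, order [neg, pos].
confusion = [[1, 1],
             [0, 3]]

tn, fp = confusion[0]   # true neg row: predicted neg, predicted pos
fn, tp = confusion[1]   # true pos row: predicted neg, predicted pos
total = tn + fp + fn + tp

accuracy = (tp + tn) / total    # share of all test messages classified correctly
precision = tp / (tp + fp)      # share of predicted pos that are really pos
recall = tp / (tp + fn)         # share of real pos that were found

print(accuracy, precision, recall)
```

For this matrix the values are 0.8, 0.75 and 1.0: four of the five test messages are classified correctly, and the single error is a neg message predicted as pos.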
7) Bayesian network
import pandas as pd
df = pd.read_csv('heart_disease.csv')
le = LabelEncoder()
# Encode every text column as integers
for column in df.columns:
    if df[column].dtype == 'object':
        df[column] = le.fit_transform(df[column])
model.fit(df, estimator=MaximumLikelihoodEstimator)
# Infer the posterior probabilities of the target variable using Variable Elimination
infer = VariableElimination(model)
nb = GaussianNB()
nb.fit(train_inputs, train_targets)
predictions = nb.predict(test_inputs)
print('Accuracy:', accuracy)
8) EM algorithm
PROGRAM:
gmm.fit(X)
labels = gmm.predict(X)
print('Means:', gmm.means_)
print('Covariances:', gmm.covariances_)
print('Weights:', gmm.weights_)
OUTPUT:
[-6.83120002 -6.75657544]
[ 4.61416263 1.93184055]]
[-0.01320113 0.95416819]]
[[ 0.77515889 -0.09007485]
[-0.09007485 1.03680033]]
[[ 1.12983272 0.0239471 ]
[ 0.0239471 0.93604854]]]
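The EM listing above omits the imports, the data and the model definition. A runnable version might look like the sketch below, assuming that gmm is scikit-learn's GaussianMixture (which is fitted with the EM algorithm) and that X is a two-dimensional, three-cluster sample from make_blobs. The sample size and random_state are illustrative, so the printed numbers will differ from the output above.

```python
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Assumption: synthetic 2-D data with three clusters
X, _ = make_blobs(n_samples=300, centers=3, n_features=2, random_state=0)

# Fit a three-component Gaussian mixture with EM
gmm = GaussianMixture(n_components=3, random_state=0)
gmm.fit(X)

# Assign each point to its most likely component
labels = gmm.predict(X)

print('Means:', gmm.means_)
print('Covariances:', gmm.covariances_)
print('Weights:', gmm.weights_)
```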
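from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2)  # assumed split; not shown in the listing
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
print('Accuracy:', accuracy_score(y_test, y_pred))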
OUTPUT:
Accuracy: 1.0
*NOTE: When entering the gender values in the CSV file, use M=1 and F=0.
PROGRAM:
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
# Define the dataset
dataset = pd.read_csv("data.csv")
X = dataset.iloc[:, [1,2]].values
y = dataset.iloc[:, 3].values
#print(X)
#print(y)
# Reshape the input data to have 2 dimensions
X = X.reshape(-1, 2)
# Create the KNN model
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X, y)
# Use the model to predict the sport for a new data point
new_data = np.array([[5,0]])
sport = model.predict(new_data)
print("Angelina used to play:", sport)
OUTPUT:
Angelina used to play: ['Cricket']
Accuracy: 0.5
*NOTE: M=1 and F=0 are used in the dataset, as is done in the theory exams. (The change must be
made in the dataset itself.)
PROGRAM:
diabetes = load_diabetes()
lwr.fit(X_train, y_train)
y_pred = lwr.predict(X_test)
OUTPUT:
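The listing above calls lwr.fit and lwr.predict without defining lwr, and scikit-learn has no built-in locally weighted regression estimator. One possible definition is sketched below with NumPy only. The class name LocallyWeightedRegression, the bandwidth parameter tau and the Gaussian kernel are assumptions, and the demo uses a synthetic noisy sine curve instead of the diabetes data.

```python
import numpy as np

class LocallyWeightedRegression:
    """Hypothetical helper: fits a separate weighted linear model per query point."""

    def __init__(self, tau=1.0):
        self.tau = tau  # kernel bandwidth: smaller values give a more local fit

    def fit(self, X, y):
        # LWR is a lazy learner: fit only stores the data (with a bias column)
        X = np.asarray(X, dtype=float)
        self.X_ = np.hstack([np.ones((X.shape[0], 1)), X])
        self.y_ = np.asarray(y, dtype=float)
        return self

    def _predict_one(self, x):
        # Gaussian weights based on squared distance to the query point
        dist2 = np.sum((self.X_[:, 1:] - x) ** 2, axis=1)
        w = np.exp(-dist2 / (2.0 * self.tau ** 2))
        # Weighted least squares: theta = (X^T W X)^+ X^T W y
        XtW = self.X_.T * w
        theta = np.linalg.pinv(XtW @ self.X_) @ (XtW @ self.y_)
        return np.concatenate(([1.0], x)) @ theta

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        return np.array([self._predict_one(x) for x in X])

# Synthetic demo data (not the diabetes data): a noisy sine curve
rng = np.random.default_rng(0)
X_train = np.linspace(0, 2 * np.pi, 100).reshape(-1, 1)
y_train = np.sin(X_train[:, 0]) + rng.normal(scale=0.1, size=100)
X_test = np.array([[1.0], [3.0], [5.0]])

lwr = LocallyWeightedRegression(tau=0.5)
lwr.fit(X_train, y_train)
y_pred = lwr.predict(X_test)
print(y_pred)
```

To try it on the diabetes data, X_train, y_train and X_test would come from a train/test split of load_diabetes(), and tau would need tuning for the scale of those features.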