ML1408-Machine Learning Lab Programs

The document outlines various machine learning algorithms, including the FIND-S algorithm, the Candidate-Elimination algorithm, the ID3/decision tree classifier, artificial neural networks, and Naïve Bayesian classifiers, with corresponding datasets and example code implementations. Each algorithm is demonstrated on datasets such as Enjoying Sports, Shapes, and Iris, along with its output and accuracy score. Notes explain how to modify the code for different datasets.

Uploaded by

sherwinanand

1) FIND-S ALGORITHM

Dataset: 1) Enjoying sports

Example  Sky    Temp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny  Warm  Normal    Strong  Warm   Same      Yes
2        Sunny  Warm  High      Strong  Warm   Same      Yes
3        Rainy  Cold  High      Strong  Warm   Change    No
4        Sunny  Warm  High      Strong  Cool   Change    Yes

Dataset: 2) Manufacturers

Dataset: 3) Shapes (size, color, shape, class label)

Example  Size   Color  Shape     Class Label
1        Big    Red    Circle    No
2        Small  Red    Triangle  No
3        Small  Red    Circle    Yes
4        Big    Blue   Circle    No
5        Small  Blue   Circle    Yes

PROGRAM:
import pandas as pd

# Read the dataset from a CSV file
data = pd.read_csv(r'C:/Users/Documents/data.csv')

# Extract the features (all columns but the last) and the target (last column)
X = data.iloc[:, :-1].values
y = data.iloc[:, -1].values

# Initialize the hypothesis with the first positive example
hypothesis = None
for i in range(len(X)):
    if y[i] == 'Yes':
        hypothesis = list(X[i])
        break

# Refine the hypothesis by checking all positive examples:
# any attribute that disagrees with the hypothesis is generalized to '?'
if hypothesis is not None:
    for i in range(len(X)):
        if y[i] == 'Yes':
            for j in range(len(X[i])):
                if X[i][j] != hypothesis[j]:
                    hypothesis[j] = '?'

# Print the final hypothesis (None if the data has no positive example)
print('The final hypothesis is:', hypothesis)

OUTPUT:

The final hypothesis is: ['sunny', 'warm', '?', 'strong', '?', '?']

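*NOTE: The program uses the word 'Yes' as the positive label, which suits the Enjoying sports dataset (e.g. sunny, warm, normal, strong, warm, same, yes) and the Shapes dataset (e.g. Big, Red, Circle, No). For the Manufacturers dataset (e.g. Japan, Honda, Blue, 1980, Economy, Positive), use the word 'Positive' instead of 'Yes'. The comparison is case-sensitive, so the label in the program must be spelled exactly as it appears in the CSV file.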
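As a minimal sketch of the note above, the same Find-S logic can be wrapped in a function that takes the positive label as a parameter, so no line of the program has to be edited when the dataset changes. The function name find_s, the positive_label parameter and the in-memory lists are assumptions made for illustration; the Manufacturers list holds only the single sample row quoted in the note.

```python
def find_s(examples, positive_label="Yes"):
    """Return the most specific hypothesis that covers every positive example."""
    hypothesis = None
    for attributes, label in examples:
        if label != positive_label:
            continue  # Find-S ignores negative examples
        if hypothesis is None:
            # The first positive example becomes the initial hypothesis
            hypothesis = list(attributes)
        else:
            # Generalize every attribute that disagrees to '?'
            hypothesis = [h if h == a else "?" for h, a in zip(hypothesis, attributes)]
    return hypothesis


# Shapes dataset from the table above
shapes = [
    (["Big", "Red", "Circle"], "No"),
    (["Small", "Red", "Triangle"], "No"),
    (["Small", "Red", "Circle"], "Yes"),
    (["Big", "Blue", "Circle"], "No"),
    (["Small", "Blue", "Circle"], "Yes"),
]

# Only the sample row quoted in the note (the full table is not available here)
manufacturers_sample = [
    (["Japan", "Honda", "Blue", "1980", "Economy"], "Positive"),
]

print(find_s(shapes))
print(find_s(manufacturers_sample, positive_label="Positive"))
```

For the Shapes table, the two positive rows agree on size and shape only, so the hypothesis keeps those two values and generalizes the color.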

2) CANDIDATE-ELIMINATION ALGORITHM

Dataset: 1) Enjoying sports

Example  Sky    Temp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny  Warm  Normal    Strong  Warm   Same      Yes
2        Sunny  Warm  High      Strong  Warm   Same      Yes
3        Rainy  Cold  High      Strong  Warm   Change    No
4        Sunny  Warm  High      Strong  Cool   Change    Yes

Dataset: 2) Shapes (size, color, shape, class label)

Example  Size   Color  Shape     Class Label
1        Big    Red    Circle    No
2        Small  Red    Triangle  No
3        Small  Red    Circle    Yes
4        Big    Blue   Circle    No
5        Small  Blue   Circle    Yes

PROGRAM:
# Define the training examples as (attribute values, label) pairs
training_examples = [
    (['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'], 'Yes'),
    (['Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same'], 'Yes'),
    (['Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change'], 'No'),
    (['Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change'], 'Yes'),
]

def candidate_elimination(examples):
    # Start from the first positive example and from the all-'?' hypothesis
    specific_hypothesis = ['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same']
    general_hypothesis = ['?', '?', '?', '?', '?', '?']

    for x, y in examples:
        if y == 'Yes':
            # Positive example: generalize the specific hypothesis
            for i in range(len(x)):
                if x[i] != specific_hypothesis[i]:
                    specific_hypothesis[i] = '?'
            for i in range(len(x)):
                if specific_hypothesis[i] == '?' and x[i] == general_hypothesis[i]:
                    general_hypothesis[i] = x[i]
        else:
            # Negative example: update the (single) general hypothesis
            for i in range(len(x)):
                if x[i] != specific_hypothesis[i]:
                    general_hypothesis[i] = '?'
                else:
                    general_hypothesis[i] = specific_hypothesis[i]

        # Show both hypotheses after each training example
        print(f'Specific hypothesis: {specific_hypothesis}')
        print(f'General hypothesis: {general_hypothesis}\n')

candidate_elimination(training_examples)
OUTPUT:

Specific hypothesis: ['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same']
General hypothesis: ['?', '?', '?', '?', '?', '?']

Specific hypothesis: ['Sunny', 'Warm', '?', 'Strong', 'Warm', 'Same']
General hypothesis: ['?', '?', '?', '?', '?', '?']

Specific hypothesis: ['Sunny', 'Warm', '?', 'Strong', 'Warm', 'Same']
General hypothesis: ['?', '?', '?', 'Strong', 'Warm', '?']

Specific hypothesis: ['Sunny', 'Warm', '?', 'Strong', '?', '?']
General hypothesis: ['?', '?', '?', 'Strong', 'Warm', '?']

*NOTE: The Enjoying sports data is entered directly in the program. For the Shapes dataset, replace the training examples and the two initial hypotheses with the corresponding Shapes values, then run the program again.
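The program above tracks a single general hypothesis. As a hedged sketch of the version-space form of candidate elimination, the code below keeps a most specific hypothesis S and a list G of maximally general hypotheses, and specializes G only as far as each negative example requires. The helper names (covers, more_general), the positive_label parameter and the choice to take each attribute's possible values from the training examples themselves are assumptions made for this illustration.

```python
def covers(h, x):
    """True if hypothesis h classifies instance x as positive."""
    return all(hv == "?" or hv == xv for hv, xv in zip(h, x))


def more_general(h1, h2):
    """True if h1 is at least as general as h2."""
    return all(a == "?" or a == b for a, b in zip(h1, h2))


def candidate_elimination(examples, positive_label="Yes"):
    n = len(examples[0][0])
    # Assumption: an attribute's domain is the set of values seen in the data
    domains = [sorted({x[i] for x, _ in examples}) for i in range(n)]
    S = None              # most specific boundary (None = covers nothing yet)
    G = [tuple("?" * n)]  # most general boundary
    for x, label in examples:
        x = tuple(x)
        if label == positive_label:
            # Drop general hypotheses that reject the positive, then generalize S
            G = [g for g in G if covers(g, x)]
            S = x if S is None else tuple(s if s == v else "?" for s, v in zip(S, x))
        else:
            candidates = []
            for g in G:
                if not covers(g, x):
                    candidates.append(g)  # already rejects the negative
                    continue
                # Minimal specializations: fix one '?' to a value other than x[i]
                for i in range(n):
                    if g[i] != "?":
                        continue
                    for value in domains[i]:
                        if value == x[i]:
                            continue
                        h = g[:i] + (value,) + g[i + 1:]
                        if S is None or more_general(h, S):
                            candidates.append(h)
            candidates = list(dict.fromkeys(candidates))  # de-duplicate, keep order
            # Keep only the maximally general candidates
            G = [g for g in candidates
                 if not any(h != g and more_general(h, g) for h in candidates)]
    return S, G


enjoy_sport = [
    (["Sunny", "Warm", "Normal", "Strong", "Warm", "Same"], "Yes"),
    (["Sunny", "Warm", "High", "Strong", "Warm", "Same"], "Yes"),
    (["Rainy", "Cold", "High", "Strong", "Warm", "Change"], "No"),
    (["Sunny", "Warm", "High", "Strong", "Cool", "Change"], "Yes"),
]
shapes = [
    (["Big", "Red", "Circle"], "No"),
    (["Small", "Red", "Triangle"], "No"),
    (["Small", "Red", "Circle"], "Yes"),
    (["Big", "Blue", "Circle"], "No"),
    (["Small", "Blue", "Circle"], "Yes"),
]

for name, data in [("Enjoying sports", enjoy_sport), ("Shapes", shapes)]:
    S, G = candidate_elimination(data)
    print(name, "S =", S, "G =", G)
```

Because both datasets are passed in as lists, switching from Enjoying sports to Shapes needs no change inside the function.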

3) ID3/DECISION TREE CLASSIFIER (with iris dataset)

PROGRAM:
from sklearn.tree import DecisionTreeClassifier

from sklearn.model_selection import train_test_split

from sklearn.datasets import load_iris

# Load the iris dataset

iris = load_iris()

# Split the data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.3)

# Create an instance of the DecisionTreeClassifier class

clf = DecisionTreeClassifier(criterion='entropy')

# Train the model using the training data

clf.fit(X_train, y_train)

# Predict the classes of the testing data

y_pred = clf.predict(X_test)

# Print the accuracy score of the model

print("Accuracy:", clf.score(X_test, y_test))

OUTPUT:

Accuracy: 0.9333333333333333
ID3/DECISION TREE CLASSIFIER:(When dataset is given in the question paper)

Datasets: 1) Covid infection problem

ID  Fever  Cough  Breathing Issues  Infected
1   NO     NO     NO                NO
2   YES    YES    YES               YES
3   YES    YES    NO                NO
4   YES    NO     YES               YES
5   YES    YES    YES               YES
6   NO     YES    NO                NO
7   YES    NO     YES               YES
8   YES    NO     YES               YES
9   NO     YES    YES               YES
10  YES    YES    NO                YES
11  NO     YES    NO                NO
12  NO     YES    YES               YES
13  NO     YES    YES               NO
14  YES    YES    NO                NO

Datasets: 2) Playing tennis in the given conditions

Outlook   Temperature  Humidity  Windy  PlayTennis
Sunny     Hot          High      FALSE  No
Sunny     Hot          High      TRUE   No
Overcast  Hot          High      FALSE  Yes
Rainy     Mild         High      FALSE  Yes
Rainy     Cool         Normal    FALSE  Yes
Rainy     Cool         Normal    TRUE   No
Overcast  Cool         Normal    TRUE   Yes
Sunny     Mild         High      FALSE  No
Sunny     Cool         Normal    FALSE  Yes
Rainy     Mild         Normal    FALSE  Yes
Sunny     Mild         Normal    TRUE   Yes
Overcast  Mild         High      TRUE   Yes
Overcast  Hot          Normal    FALSE  Yes
Rainy     Mild         High      TRUE   No
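ID3 chooses the attribute to split on by information gain, which is what criterion='entropy' selects in the programs of this section. As a small sketch, the code below computes the entropy of PlayTennis and the gain of each attribute for the tennis table above; the function names entropy and information_gain and the tuple layout of the rows are assumptions made for illustration.

```python
from collections import Counter
from math import log2

columns = ["Outlook", "Temperature", "Humidity", "Windy"]
# Rows of the tennis table; the last field is the PlayTennis label
tennis = [
    ("Sunny", "Hot", "High", False, "No"),
    ("Sunny", "Hot", "High", True, "No"),
    ("Overcast", "Hot", "High", False, "Yes"),
    ("Rainy", "Mild", "High", False, "Yes"),
    ("Rainy", "Cool", "Normal", False, "Yes"),
    ("Rainy", "Cool", "Normal", True, "No"),
    ("Overcast", "Cool", "Normal", True, "Yes"),
    ("Sunny", "Mild", "High", False, "No"),
    ("Sunny", "Cool", "Normal", False, "Yes"),
    ("Rainy", "Mild", "Normal", False, "Yes"),
    ("Sunny", "Mild", "Normal", True, "Yes"),
    ("Overcast", "Mild", "High", True, "Yes"),
    ("Overcast", "Hot", "Normal", False, "Yes"),
    ("Rainy", "Mild", "High", True, "No"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attr_index):
    """Entropy of the labels minus the weighted entropy after splitting on one attribute."""
    gain = entropy([r[-1] for r in rows])
    for value in {r[attr_index] for r in rows}:
        subset = [r[-1] for r in rows if r[attr_index] == value]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain


gains = {name: information_gain(tennis, i) for i, name in enumerate(columns)}
best = max(gains, key=gains.get)
for name, gain in gains.items():
    print(f"{name}: {gain:.4f}")
print("Root attribute:", best)
```

On this table Outlook has the largest gain (about 0.247 bits), so ID3 would place it at the root.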
PROGRAM:
import pandas as pd

import numpy as np

from sklearn.tree import DecisionTreeClassifier

from sklearn.model_selection import train_test_split

from sklearn.compose import ColumnTransformer

from sklearn.preprocessing import OneHotEncoder

dataset = pd.read_csv("data.csv")

x = dataset.iloc[:, [1,2,3]].values

y = dataset.iloc[:, 4].values

ct= ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [0,1,2])], remainder='passthrough')

x = np.array(ct.fit_transform(x))

#print(x)

#print(y)

X_train, X_test, y_train, y_test = train_test_split(x,y, test_size=0.2)

clf = DecisionTreeClassifier(criterion='entropy')

clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)

print(y_pred)

print("Accuracy:", clf.score(X_test, y_test))

OUTPUT:

['YES' 'YES' 'NO']

Accuracy: 0.3333333333333333

*NOTE: The above program is for the Covid infection dataset, whose first column (ID, index 0) is skipped. For the playing tennis dataset, change the line that reads x from the dataset so that it also includes column index 0. The modified line for the playing tennis dataset is:

x = dataset.iloc[:, [0,1,2,3]].values (the rest of the program runs unchanged after this line is changed)
4) Artificial Neural Network by implementing the Back propagation algorithm.

PROGRAM:

from sklearn.neural_network import MLPClassifier

from sklearn.datasets import make_classification

from sklearn.model_selection import train_test_split

from sklearn.metrics import accuracy_score

# Generate a random dataset

X, y = make_classification(n_samples=1000)

# Split the dataset into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Create a multi-layer perceptron classifier (hidden_layer_sizes is not set, so scikit-learn's default of one hidden layer with 100 neurons is used)

mlp = MLPClassifier(max_iter=1000, solver='sgd', random_state=42)

# Train the model using backpropagation

mlp.fit(X_train, y_train)

# Calculate the accuracy score of the model

accuracy = mlp.score(X_test, y_test)

print(f"Accuracy: {accuracy}")

OUTPUT:

Accuracy: 0.865
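The program above relies on MLPClassifier, which runs backpropagation internally. As a rough sketch of what one training loop looks like when written by hand, the code below trains a network with one hidden layer on the XOR problem using NumPy. The XOR data, the function name train_xor, the layer size, the learning rate and the seed are all assumptions chosen for illustration and are not part of the lab program.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_xor(n_hidden=4, lr=0.5, epochs=5000, seed=0):
    rng = np.random.default_rng(seed)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)

    # Randomly initialized weights, zero biases
    W1 = rng.normal(0, 1, (2, n_hidden))
    b1 = np.zeros((1, n_hidden))
    W2 = rng.normal(0, 1, (n_hidden, 1))
    b2 = np.zeros((1, 1))

    losses = []
    for _ in range(epochs):
        # Forward pass
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        losses.append(float(np.mean((out - y) ** 2)))

        # Backward pass: propagate the error through the sigmoid derivatives
        # (the constant factor of the squared-error gradient is folded into lr)
        d_out = (out - y) * out * (1 - out)
        d_h = (d_out @ W2.T) * h * (1 - h)

        # Gradient-descent updates
        W2 -= lr * h.T @ d_out
        b2 -= lr * d_out.sum(axis=0, keepdims=True)
        W1 -= lr * X.T @ d_h
        b1 -= lr * d_h.sum(axis=0, keepdims=True)
    return losses, out


losses, outputs = train_xor()
print("First loss:", losses[0])
print("Last loss:", losses[-1])
print("Network outputs:", outputs.ravel())
```

The forward pass, the error terms for each layer and the weight updates are the three steps that MLPClassifier repeats with solver='sgd'; how closely the outputs approach 0 and 1 depends on the assumed seed and learning rate.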
5) Naïve Bayesian classifier (with iris dataset)

PROGRAM:

from sklearn.naive_bayes import GaussianNB

from sklearn.datasets import load_iris

from sklearn.model_selection import train_test_split

# Load the iris dataset

iris = load_iris()

# Split the data into training and test sets

X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2)

# Train a Gaussian Naive Bayes classifier

gnb = GaussianNB()

gnb.fit(X_train, y_train)

# Test the classifier on the test data

y_pred = gnb.predict(X_test)

print(y_pred)

print("ACCURACY",gnb.score(X_test,y_test))

OUTPUT:

[2 2 1 0 1 1 0 0 1 2 0 1 2 1 2 1 2 0 0 1 0 1 0 0 1 2 0 1 2 0]

ACCURACY 1.0

Naïve Bayesian Classifier: (When dataset is given in the question paper)

Datasets: 1) Stolen Vehicles (cars)

Datasets: 2) Playing tennis in the given conditions

Outlook   Temperature  Humidity  Windy  PlayTennis
Sunny     Hot          High      FALSE  No
Sunny     Hot          High      TRUE   No
Overcast  Hot          High      FALSE  Yes
Rainy     Mild         High      FALSE  Yes
Rainy     Cool         Normal    FALSE  Yes
Rainy     Cool         Normal    TRUE   No
Overcast  Cool         Normal    TRUE   Yes
Sunny     Mild         High      FALSE  No
Sunny     Cool         Normal    FALSE  Yes
Rainy     Mild         Normal    FALSE  Yes
Sunny     Mild         Normal    TRUE   Yes
Overcast  Mild         High      TRUE   Yes
Overcast  Hot          Normal    FALSE  Yes
Rainy     Mild         High      TRUE   No

PROGRAM:
import pandas as pd

import numpy as np

from sklearn.naive_bayes import GaussianNB

from sklearn.compose import ColumnTransformer

from sklearn.preprocessing import OneHotEncoder

from sklearn.model_selection import train_test_split

dataset = pd.read_csv("data.csv")

x = dataset.iloc[:, [0,1,2,3]].values

y = dataset.iloc[:, 4].values

ct= ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [0,1,2])], remainder='passthrough')

x = np.array(ct.fit_transform(x))

#print(x)

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size =0.3, random_state =0)

gnb= GaussianNB()

gnb.fit(x_train, y_train)

y_pred = gnb.predict(x_test)
print(y_pred)

print('Accuracy=',gnb.score(x_test,y_test))

OUTPUT:

['No' 'Yes' 'No' 'Yes' 'Yes']

Accuracy= 0.6

*NOTE: The above program is for the playing tennis dataset. For the Stolen Vehicles dataset, change the lines that read x and y from the dataset: column index 3 is no longer a feature, and the class label moves from column index 4 to column index 3. The modified lines for the Stolen Vehicles dataset are:

x = dataset.iloc[:, [0,1,2]].values

y = dataset.iloc[:, 3].values

(the rest of the program runs unchanged after these lines are changed)
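The program above feeds one-hot encoded columns to GaussianNB. As an alternative sketch for purely categorical data such as the playing tennis table, scikit-learn's CategoricalNB can be combined with OrdinalEncoder, which mirrors the frequency counts used when the problem is solved by hand. The in-memory DataFrame, the query row (Sunny, Cool, High, TRUE) and the choice of alpha=1.0 (Laplace smoothing) are assumptions made for illustration.

```python
import pandas as pd
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

columns = ["Outlook", "Temperature", "Humidity", "Windy"]
rows = [
    ["Sunny", "Hot", "High", "FALSE", "No"],
    ["Sunny", "Hot", "High", "TRUE", "No"],
    ["Overcast", "Hot", "High", "FALSE", "Yes"],
    ["Rainy", "Mild", "High", "FALSE", "Yes"],
    ["Rainy", "Cool", "Normal", "FALSE", "Yes"],
    ["Rainy", "Cool", "Normal", "TRUE", "No"],
    ["Overcast", "Cool", "Normal", "TRUE", "Yes"],
    ["Sunny", "Mild", "High", "FALSE", "No"],
    ["Sunny", "Cool", "Normal", "FALSE", "Yes"],
    ["Rainy", "Mild", "Normal", "FALSE", "Yes"],
    ["Sunny", "Mild", "Normal", "TRUE", "Yes"],
    ["Overcast", "Mild", "High", "TRUE", "Yes"],
    ["Overcast", "Hot", "Normal", "FALSE", "Yes"],
    ["Rainy", "Mild", "High", "TRUE", "No"],
]
df = pd.DataFrame(rows, columns=columns + ["PlayTennis"])

# Map every category to an integer code, one column at a time
encoder = OrdinalEncoder()
X = encoder.fit_transform(df[columns])

# alpha=1.0 applies Laplace smoothing to the per-class category counts
model = CategoricalNB(alpha=1.0)
model.fit(X, df["PlayTennis"])

# Hypothetical query day, encoded with the same encoder
query = pd.DataFrame([["Sunny", "Cool", "High", "TRUE"]], columns=columns)
prediction = model.predict(encoder.transform(query))[0]
print("Prediction for the query day:", prediction)
```

All 14 rows are used for training here, so no accuracy is reported; the sketch only shows how a single new day would be classified.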

6) Naïve Bayesian classifier II

DATASET:

message                               labelnum
I love this sandwich                  pos
This is an amazing place              pos
I feel very good about these beers    pos
This is my best work                  pos
What an awesome view                  pos
I do not like this restaurant         neg
I am tried of this stuff              neg
I can't deal with this                neg
He is my sworn enemy                  neg
My boss is horrible                   neg
This is an awesome place              pos
I do not like the taste of this juice neg
I love to dance                       pos
I am sick and tried of this place     neg
What a great holiday                  pos
That is a bad locality to stay        neg
We will have good fun tomorrow        pos
I went to my enemy's house today      neg

PROGRAM:
import pandas as pd

from sklearn.model_selection import train_test_split

from sklearn.feature_extraction.text import CountVectorizer

from sklearn.naive_bayes import MultinomialNB

from sklearn import metrics

import numpy as np

msg=pd.read_csv(r'C:\Users\rajd3\Desktop\6-Dataset1.csv',names=['message','label'])

print('The dimensions of the dataset',msg.shape)

msg['labelnum']=msg.label.map({'pos':1,'neg':0})

a=msg['message']

b=msg['labelnum']

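# Skip row 0 (the CSV header row, read as data because names= is given); note that [1:-1] also drops the last message

X=a[1:-1]

y=b[1:-1]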

#splitting the dataset into train and test data

xtrain,xtest,ytrain,ytest=train_test_split(X,y)

print ('\n the total number of Training Data :',xtrain.shape)

print ('\n the total number of Test Data :',xtest.shape)

#output the words or Tokens in the text documents

cv = CountVectorizer()

xtrain_dtm = cv.fit_transform(xtrain)

xtest_dtm=cv.transform(xtest)

print('\n The words or Tokens in the text documents \n')

print(cv.get_feature_names_out())

clf = MultinomialNB()

clf.fit(xtrain_dtm,ytrain)

predicted = clf.predict(xtest_dtm)

#printing accuracy, Confusion matrix, Precision and Recall

print('\n Accuracy of the classifier is',metrics.accuracy_score(ytest,predicted))

print('\n Confusion matrix')

print(metrics.confusion_matrix(ytest,predicted))

print('\n The value of Precision', metrics.precision_score(ytest,predicted))

print('\n The value of Recall', metrics.recall_score(ytest,predicted))


OUTPUT:

The dimensions of the dataset (19, 2)

the total number of Training Data : (12,)

the total number of Test Data : (5,)

The words or Tokens in the text documents

['am' 'amazing' 'an' 'and' 'awesome' 'bad' 'best' 'boss' 'dance' 'do'

'enemy' 'fun' 'good' 'great' 'have' 'he' 'holiday' 'horrible' 'is' 'like'

'locality' 'love' 'my' 'not' 'of' 'place' 'restaurant' 'sick' 'stay'

'stuff' 'sworn' 'that' 'this' 'to' 'tomorrow' 'tried' 'we' 'what' 'will'

'work']

Accuracy of the classifier is 0.8

Confusion matrix

[[1 1]

[0 3]]

The value of Precision 0.75

The value of Recall 1.0

7) Bayesian network

PROGRAM:
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from pgmpy.estimators import MaximumLikelihoodEstimator
# Note: later pgmpy releases rename BayesianModel (for example to BayesianNetwork);
# adjust this import to match the installed version.
from pgmpy.models import BayesianModel
from pgmpy.inference import VariableElimination

# Load the heart disease dataset
df = pd.read_csv('heart_disease.csv')

# Encode the categorical variables using LabelEncoder
le = LabelEncoder()
for column in df.columns:
    if df[column].dtype == 'object':
        df[column] = le.fit_transform(df[column])

# Split the data into inputs and targets
inputs = df.iloc[:, :-1]
targets = df.iloc[:, -1]

# Split the data into training and testing sets
train_inputs, test_inputs, train_targets, test_targets = train_test_split(
    inputs, targets, test_size=0.2, random_state=42)

# Create a Bayesian network model using the pgmpy library
model = BayesianModel([('age', 'heartdisease'), ('sex', 'heartdisease'),
                       ('exang', 'heartdisease'), ('cp', 'heartdisease'),
                       ('heartdisease', 'restecg'), ('heartdisease', 'chol')])

# Learn the parameters of the model using Maximum Likelihood Estimation
model.fit(df, estimator=MaximumLikelihoodEstimator)

# Infer the posterior probabilities of the target variable using Variable Elimination
infer = VariableElimination(model)
posterior = infer.query(['heartdisease'],
                        evidence={'age': 28, 'sex': 1, 'exang': 1, 'cp': 2,
                                  'restecg': 1, 'chol': 200})
print('Probability of heart disease:', posterior.values[1])

# Train a Gaussian Naive Bayes classifier using the training data
nb = GaussianNB()
nb.fit(train_inputs, train_targets)

# Test the classifier using the testing data
predictions = nb.predict(test_inputs)

# Calculate the accuracy of the predictions
accuracy = accuracy_score(test_targets, predictions)
print('Accuracy:', accuracy)

8) EM algorithm

PROGRAM:

from sklearn.datasets import make_blobs

from sklearn.mixture import GaussianMixture

# Generate sample data

X, _ = make_blobs(n_samples=500, centers=3, random_state=42)

# Create a Gaussian Mixture Model with 3 components

gmm = GaussianMixture(n_components=3, random_state=42)

# Fit the GMM to the data using the EM algorithm

gmm.fit(X)

# Predict the cluster labels of the data

labels = gmm.predict(X)

# Print the parameters learned by the GMM

print('Means:', gmm.means_)

print('Covariances:', gmm.covariances_)

print('Weights:', gmm.weights_)

OUTPUT:

Means: [[-2.51336974 9.03492867]

[-6.83120002 -6.75657544]

[ 4.61416263 1.93184055]]

Covariances: [[[ 0.90129853 -0.01320113]

[-0.01320113 0.95416819]]

[[ 0.77515889 -0.09007485]

[-0.09007485 1.03680033]]

[[ 1.12983272 0.0239471 ]

[ 0.0239471 0.93604854]]]

Weights: [0.334 0.332 0.334]
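GaussianMixture hides the two alternating steps of EM. As a minimal sketch of those steps, the code below fits a two-component mixture to one-dimensional data with NumPy: the E-step computes each component's responsibility for every point, and the M-step re-estimates the means, variances and weights from those responsibilities. The function name em_gmm_1d, the synthetic data, the initialization at the minimum and maximum of the data and the fixed number of iterations are assumptions made for illustration.

```python
import numpy as np


def em_gmm_1d(x, n_iter=50):
    """Fit a two-component 1-D Gaussian mixture with plain EM."""
    # Initialization: means at the extremes, shared variance, equal weights
    mu = np.array([x.min(), x.max()], dtype=float)
    var = np.array([x.var(), x.var()], dtype=float)
    pi = np.array([0.5, 0.5])

    for _ in range(n_iter):
        # E-step: responsibility of each component for each point
        dens = pi * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        resp = dens / dens.sum(axis=1, keepdims=True)

        # M-step: re-estimate the parameters from the responsibilities
        nk = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        pi = nk / len(x)
    return mu, var, pi


# Synthetic data: 200 points around -5 and 300 points around +5 (assumed for the demo)
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-5, 1, 200), rng.normal(5, 1, 300)])

mu, var, pi = em_gmm_1d(x)
print("Means:", mu)
print("Variances:", var)
print("Weights:", pi)
```

The learned means, variances and weights play the same role as means_, covariances_ and weights_ in the program above.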


9) KNN (with iris dataset)

from sklearn.datasets import load_iris

from sklearn.neighbors import KNeighborsClassifier

from sklearn.model_selection import train_test_split

from sklearn.metrics import accuracy_score

# Load the iris dataset

iris = load_iris()

# Split the data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.3, random_state=42)

# Create a k-NN classifier with k=3

knn = KNeighborsClassifier(n_neighbors=3)

# Fit the classifier to the training data

knn.fit(X_train, y_train)

# Predict the classes of the test data

y_pred = knn.predict(X_test)

# Calculate the accuracy of the classifier

accuracy = accuracy_score(y_test, y_pred)

print('Accuracy:', accuracy)

OUTPUT:

Accuracy: 1.0

KNN (When dataset is given in the question paper)

Datasets: 1) Angelina sports dataset

Name      Age  Gender  Sport
Ajay      32   M       Football
Mark      40   M       Neither
Sara      16   F       Cricket
Zaira     34   F       Cricket
Sachin    55   M       Neither
Rahul     40   M       Cricket
Pooja     20   F       Neither
Smith     15   M       Cricket
Lakshmi   55   F       Football
Michael   15   M       Football
Angelina  5    F       ?

*NOTE: When entering the Gender values in the CSV file, encode M as 1 and F as 0.

PROGRAM:

import numpy as np

import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
# Define the dataset

dataset = pd.read_csv("data.csv")
X = dataset.iloc[:, [1,2]].values
y = dataset.iloc[:, 3].values
#print(X)

#print(y)
# Reshape the input data to have 2 dimensions
X = X.reshape(-1, 2)
# Create the KNN model

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X, y)
# Use the model to predict the sport for a new data point
new_data = np.array([[5,0]])
sport = model.predict(new_data)

print("Angelina used to play:",sport)


# Predict the classes of the test data
y_pred = model.predict(X)
# Calculate the accuracy of the classifier

accuracy = accuracy_score(y, y_pred)


print('Accuracy:', accuracy)

OUTPUT:
Angelina used to play: ['Cricket']
Accuracy: 0.5

*NOTE: Gender is encoded as M=1 and F=0 in the dataset, as is done in theory exams. This change must be made in the dataset file itself.
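To mirror the hand calculation done in theory exams, the following sketch computes the Euclidean distance from Angelina (age 5, gender 0) to every labelled row and takes a majority vote among the three nearest neighbours, without scikit-learn. The function name knn_predict and the tuple layout are assumptions made for illustration; the rows come from the table above with M=1 and F=0.

```python
from collections import Counter
from math import dist

# (name, age, gender, sport) with gender encoded as M=1, F=0
training = [
    ("Ajay", 32, 1, "Football"),
    ("Mark", 40, 1, "Neither"),
    ("Sara", 16, 0, "Cricket"),
    ("Zaira", 34, 0, "Cricket"),
    ("Sachin", 55, 1, "Neither"),
    ("Rahul", 40, 1, "Cricket"),
    ("Pooja", 20, 0, "Neither"),
    ("Smith", 15, 1, "Cricket"),
    ("Lakshmi", 55, 0, "Football"),
    ("Michael", 15, 1, "Football"),
]


def knn_predict(query, rows, k=3):
    """Return the majority class among the k rows closest to the query point."""
    # Rank the rows by Euclidean distance on (age, gender)
    ranked = sorted(rows, key=lambda r: dist(query, (r[1], r[2])))
    nearest = ranked[:k]
    votes = Counter(r[3] for r in nearest)
    return votes.most_common(1)[0][0], [r[0] for r in nearest]


sport, neighbours = knn_predict((5, 0), training)
print("Nearest neighbours:", neighbours)
print("Angelina used to play:", sport)
```

The three nearest rows are Smith, Michael and Sara; two of them play Cricket, which agrees with the prediction printed by the program above.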

10) Non-parametric Locally Weighted Regression algorithm

PROGRAM:

from sklearn.datasets import load_diabetes

from sklearn.neighbors import KNeighborsRegressor

from sklearn.model_selection import train_test_split

from sklearn.metrics import mean_squared_error

# Load the diabetes dataset

diabetes = load_diabetes()

# Split the data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(diabetes.data, diabetes.target, test_size=0.3, random_state=42)

# Create a KNeighborsRegressor with weights='distance' and n_neighbors=5

lwr = KNeighborsRegressor(weights='distance', n_neighbors=5)

# Fit the regressor to the training data

lwr.fit(X_train, y_train)

# Predict the target values of the test data

y_pred = lwr.predict(X_test)

# Calculate the mean squared error of the regressor

mse = mean_squared_error(y_test, y_pred)

print('Mean Squared Error:', mse)

OUTPUT:

Mean Squared Error: 3190.614716201732
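The program above approximates locally weighted regression with a distance-weighted k-nearest-neighbours regressor. As a hedged sketch of locally weighted linear regression itself, the code below fits a separate weighted least-squares line for each query point, with weights from a Gaussian kernel centred on that point. The function name lwr_predict, the bandwidth parameter tau and the synthetic data are assumptions made for illustration.

```python
import numpy as np


def lwr_predict(x_query, X, y, tau=0.5):
    """Predict y at x_query with a locally weighted linear fit.

    X has shape (n_samples, n_features); tau is the kernel bandwidth.
    """
    x_query = np.atleast_1d(np.asarray(x_query, dtype=float))
    # Gaussian kernel weights: training points near the query count more
    weights = np.exp(-np.sum((X - x_query) ** 2, axis=1) / (2 * tau ** 2))

    # Add an intercept column, then solve the weighted least-squares problem
    Xb = np.column_stack([np.ones(len(X)), X])
    sw = np.sqrt(weights)
    theta, *_ = np.linalg.lstsq(Xb * sw[:, None], y * sw, rcond=None)

    return float(np.concatenate([[1.0], x_query]) @ theta)


# Synthetic 1-D data (assumed for the demo): a noisy sine curve
rng = np.random.default_rng(0)
X_demo = np.linspace(0, 6, 200).reshape(-1, 1)
y_demo = np.sin(X_demo.ravel()) + rng.normal(0, 0.1, 200)

for q in (1.0, 3.0, 5.0):
    print(q, lwr_predict([q], X_demo, y_demo, tau=0.3))
```

A smaller tau makes the fit follow the data more closely, and a larger tau moves it toward ordinary linear regression.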
