ML Lab Programs (1-13)
INDEX
Input attributes are (from left to right) income, recreation, job, status, age-group, home-owner. Find the unconditional probability of 'golf' and the conditional probability of 'single' given 'medRisk' in the dataset.
6. Implement linear regression using python.
7. Implement Naïve Bayes theorem to classify the English text
8. Implement an algorithm to demonstrate the significance of genetic algorithm
9. Implement the finite words classification system using Back-propagation algorithm
10. Find-S Algorithm
11. Candidate Elimination Algorithm
12. K-Means Clustering Algorithm
13. Decision Tree Algorithm
Experiment: 1
1. The probability that it is Friday and that a student is absent is 3%. Since there are 5 school days in a week, the probability that it is Friday is 20%. What is the probability that a student is absent given that today is Friday? Apply Bayes' rule in python to get the result. (Ans: 15%)
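The computation the program below performs, directly from the definition of conditional probability:
P(Absent | Friday) = P(Friday and Absent) / P(Friday) = 0.03 / 0.20 = 0.15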
ALGORITHM:
Step 1: Read the joint probability that it is Friday and a student is absent, P(Friday and Absent) = 0.03.
Step 2: Read the probability that it is Friday, P(Friday) = 0.20.
Step 3: Compute the conditional probability P(Absent | Friday) = P(Friday and Absent) / P(Friday) and print it.
Step 4: End.
PROGRAM:
PFIA = 0.03  # P(Friday and Absent)
PF = 0.20    # P(Friday)
PABF = PFIA / PF
print("probability that a student is absent given that today is Friday using conditional probabilities=", PABF)
OUTPUT:
probability that a student is absent given that today is Friday using conditional probabilities= 0.15
Experiment: 2
Consider a dataset (Excel/CSV file) consisting of 14 input columns and 3 output columns (C1, C2, C3), as follows:
import pandas as pd
dataset = pd.read_csv("Sample_Dataset.csv", delimiter=',')
X = dataset[['AA','BB','CC','DD','EE','FF']].values
Y = dataset[['C1','C2','C3']].values
X1 = dataset[['AA','BB','CC']].values
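A quick sanity check on the extracted arrays (the expected shapes assume Sample_Dataset.csv contains the columns listed above):
print(X.shape)   # (number_of_rows, 6) -- the six selected input columns
print(Y.shape)   # (number_of_rows, 3) -- the output columns C1, C2, C3
print(X1.shape)  # (number_of_rows, 3) -- the reduced input set AA, BB, CC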
Experiment: 3
K-Nearest Neighbors, or KNN for short, is one of the simplest machine learning algorithms and is used in a wide array of industries. KNN is a non-parametric, lazy learning algorithm. When we say a technique is non-parametric, it means that it does not make any assumptions about the underlying data.
Pros:
Simple to implement and easy to understand
No explicit training phase (lazy learning), so new data can be added at any time
Naturally handles multi-class classification
Cons:
High memory requirement — all of the training data must be present in memory in order to calculate the closest K neighbours
Sensitive to irrelevant features
Sensitive to the scale of the data, since we're computing the distance to the closest K points (see the sketch after this list)
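A minimal sketch of the scale sensitivity noted in the last point, using scikit-learn's KNeighborsClassifier on the iris data (not part of the original listing; the split and K value are arbitrary choices):
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# raw features
knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
print("raw accuracy:", knn.score(X_test, y_test))

# standardized features (zero mean, unit variance)
scaler = StandardScaler().fit(X_train)
knn_scaled = KNeighborsClassifier(n_neighbors=3).fit(scaler.transform(X_train), y_train)
print("scaled accuracy:", knn_scaled.score(scaler.transform(X_test), y_test))
On iris the two accuracies are close because the features already share similar ranges, but on data with mixed units the gap can be large.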
ALGORITHM:
Step 1: Load the training and test data.
Step 2: Choose the value of K, the number of nearest neighbours to consult.
Step 3: To get the predicted class, iterate over all training data points:
i) Calculate the distance between the test data and each row of training data. Here we will use Euclidean distance as our distance metric, since it is the most popular method; other usable metrics are Chebyshev, cosine, etc.
ii) Sort the calculated distances in ascending order.
iii) Take the top K rows from the sorted array and get their labels.
iv) Return the predicted class: if regression, return the mean of the K labels; if classification, return the mode (most frequent class) of the K labels.
Step 4: End.
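For reference, the Euclidean distance used in step 3(i) between points p and q is
d(p, q) = sqrt(sum_i (p_i - q_i)^2)
which the program below computes with numpy.linalg.norm.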
PROGRAM:
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
data = iris.data
labels = iris.target

np.random.seed(42)
indices = np.random.permutation(len(data))
n_test_samples = 12  # hold out the last 12 shuffled samples for testing
learn_data = data[indices[:-n_test_samples]]
learn_labels = labels[indices[:-n_test_samples]]
test_data = data[indices[-n_test_samples:]]
test_labels = labels[indices[-n_test_samples:]]

# the first samples of our learn set
for i in range(5):
    print(f"{i:4d} {learn_data[i]} {learn_labels[i]:3}")
# the first samples of our test set
for i in range(5):
    print(f"{i:4d} {test_data[i]} {test_labels[i]:3}")
# The following code is only necessary to visualize the data of our learnset
import matplotlib.pyplot as plt
colours = ("r", "b", "y")
X = []
for iclass in range(3):
    X.append([[], [], []])
    for i in range(len(learn_data)):
        if learn_labels[i] == iclass:
            X[iclass][0].append(learn_data[i][0])
            X[iclass][1].append(learn_data[i][1])
            X[iclass][2].append(sum(learn_data[i][2:]))
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
for iclass in range(3):
    ax.scatter(X[iclass][0], X[iclass][1], X[iclass][2], c=colours[iclass])
plt.show()
#----------------------------------------------------
def distance(instance1, instance2):
    # Euclidean distance between two feature vectors
    return np.linalg.norm(np.subtract(instance1, instance2))

def get_neighbors(training_set, labels, test_instance, k, distance):
    """
    get_neighbors calculates a list of the k nearest neighbors of an instance 'test_instance'.
    The function returns a list of k 3-tuples. Each 3-tuple consists of (index, dist, label).
    """
    distances = []
    for index in range(len(training_set)):
        dist = distance(test_instance, training_set[index])
        distances.append((index, dist, labels[index]))
    distances.sort(key=lambda x: x[1])
    neighbors = distances[:k]
    return neighbors

for i in range(5):
    neighbors = get_neighbors(learn_data, learn_labels, test_data[i], 3, distance=distance)
    print("Index: ", i, '\n',
          "Testset Data: ", test_data[i], '\n',
          "Testset Label: ", test_labels[i], '\n',
          "Neighbors: ", neighbors, '\n')
Experiment: 4
4. Given the following data, which specify classifications for nine combinations of VAR1 and VAR2, predict a classification for a case where VAR1 = 0.906 and VAR2 = 0.606, using the result of k-means clustering with 3 means (i.e., 3 centroids).
ALGORITHM:
The idea of the K-Means algorithm is to find k centroid points; every point in the dataset belongs to the one of the k sets whose centroid is at minimum Euclidean distance.
Step 1: Create the X array with [var1, var2] as each element from the given input.
Step 2: Create the y array with the Class attribute from the given input.
Step 3: Train the KMeans model with n_clusters=3 by fitting it on X (K-Means is unsupervised, so y is not used for fitting).
Step 4: Predict the cluster for the given input.
Step 5: End
PROGRAM:
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1.713,1.586], [0.180,1.786], [0.353,1.240], [0.940,1.566], [1.486,0.759],
[1.266,1.106], [1.540,0.419], [0.459,1.799], [0.773,0.186]])
y = np.array([0,1,1,0,1,0,1,1,1])  # class labels, shown for reference; not used by KMeans
kmeans = KMeans(n_clusters=3).fit(X)
print(kmeans.predict([[0.906, 0.606]]))
OUTPUT:
Experiment: 5
5. The following training examples map descriptions of individuals onto high, medium and low credit-worthiness.
medium skiing design single twenties no -> highRisk
high golf trading married forties yes -> lowRisk
low speedway transport married thirties yes -> medRisk
medium football banking single thirties yes -> lowRisk
high flying media married fifties yes -> highRisk
low football security single twenties no -> medRisk
medium golf media single thirties yes -> medRisk
medium golf transport married forties yes -> lowRisk
high skiing banking single thirties yes -> highRisk
low golf unemployed married forties yes -> highRisk
Input attributes are (from left to right) income, recreation, job, status, age-group, home-owner. Find the unconditional probability of 'golf' and the conditional probability of 'single' given 'medRisk' in the dataset.
PROGRAM:
totalRecords=10
numberGolfRecreation=4
probGolf=numberGolfRecreation/totalRecords
print("Unconditional probability o f golf: ={}".format(probGolf))
#conditional probability of `single' given`medRisk'
# bayes Formula
#p(single|medRisk)=p(medRisk|single)p(single)/p(medRisk)
#p(medRisk|single)=p(medRisk ∩single)/p(single)
numberMedRiskSingle=2
numberMedRisk=3
probMedRiskSingle=numberMedRiskSingle/totalRecords
probMedRisk=numberMedRisk/totalRecords
conditionalProbability=(probMedRiskSingle/probMedRisk)
print("Conditional probability of single given medRisk: = {}".format(conditionalProbability))
OUTPUT:
Unconditional probability of golf: = 0.4
Conditional probability of single given medRisk: = 0.6666666666666667
Experiment: 6
6. Implement linear regression using python
ALGORITHM:
Step 1: Import the required libraries (numpy, matplotlib, sklearn).
Step 2: Generate (or load) the data points x and y.
Step 3: Create a LinearRegression model and fit it on (x, y).
Step 4: Predict y values for x using the fitted model.
Step 5: Evaluate the model (e.g. with the R2 score) and print the slope and intercept.
Step 6: End
PROGRAM:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

np.random.seed(0)
x = np.random.rand(100, 1)  # generate a 2-D array with 100 rows, each row containing 1 random number
y = 2 + 3 * x + np.random.rand(100, 1)

# model training
regression_model = LinearRegression()
regression_model.fit(x, y)
y_predicted = regression_model.predict(x)

# model evaluation
r2 = r2_score(y, y_predicted)

# printing values
print('Slope:', regression_model.coef_)
print('Intercept:', regression_model.intercept_)
print('R2 score:', r2)

# data points
plt.scatter(x, y, s=10)
# predicted values
plt.plot(x, y_predicted, color='r')
plt.show()
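As a cross-check (not part of the original listing), NumPy's least-squares polyfit should recover nearly the same slope and intercept from the same data:
slope, intercept = np.polyfit(x.flatten(), y.flatten(), 1)
print('polyfit slope:', slope, 'intercept:', intercept)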
OUTPUT:
Experiment 7
7. Implement Naive Bayes Theorem to Classify the English Text using python
The Naïve Bayes algorithm comprises two words, Naïve and Bayes, which can be described as:
Naïve: It is called Naïve because it assumes that the occurrence of a certain feature is independent of the occurrence of other features. For example, if a fruit is identified on the basis of colour, shape, and taste, then a red, spherical, and sweet fruit is recognized as an apple; each feature individually contributes to identifying it as an apple, without depending on the others.
Bayes: It is called Bayes because it depends on the principle of Bayes' Theorem.
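Bayes' Theorem itself, for reference:
P(class | features) = P(features | class) * P(class) / P(features)
The classifier picks the class that maximizes this posterior; the naïve independence assumption lets P(features | class) factor into a product over the individual features.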
The values 0, 1, 2, … encode the frequency of each word appearing in the initial text data. E.g. the first transformed row is [0 1 1 1 0 0 1 0 1] and the unique vocabulary is ['and', 'document', 'first', 'is', 'one', 'second', 'the', 'third', 'this'], which means that the words "document", "first", "is", "the" and "this" appeared 1 time each in the initial text string (i.e. 'This is the first document.').
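This transformation can be reproduced with scikit-learn's CountVectorizer; a minimal sketch using the standard scikit-learn example corpus (consistent with the vocabulary shown above):
from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    'This is the first document.',
    'This document is the second document.',
    'And this is the third one.',
    'Is this the first document?',
]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)
print(vectorizer.get_feature_names_out())
# ['and' 'document' 'first' 'is' 'one' 'second' 'the' 'third' 'this']
print(X.toarray()[0])  # [0 1 1 1 0 0 1 0 1] -- counts for 'This is the first document.'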
In our example, we will convert the collection of text documents (train and test sets) into a
matrix of token counts.
To implement that text transformation we will use the make_pipeline function. This will
internally transform the text data and then the model will be fitted using the transformed data.
Some popular applications of the Naïve Bayes algorithm are spam filtering, sentiment analysis, and classifying articles.
So to solve the text classification problem, we need to follow the below steps:
1. Load the training and test text data along with their category labels.
2. Convert the text documents into a matrix of token counts.
3. Fit a Multinomial Naïve Bayes model on the transformed training data (via make_pipeline).
4. Predict the categories of the test data and inspect the results.
Source Code
print("NAIVE BAYES ENGLISH TEST CLASSIFICATION")
print(np.array(test_data.target_names)[predicted_categories])
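The two print statements above are only the tail of the listing; a minimal, self-contained sketch of the whole program, assuming scikit-learn's 20 Newsgroups corpus as the English text to be classified:
import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# load the train and test splits of the corpus
train_data = fetch_20newsgroups(subset='train')
test_data = fetch_20newsgroups(subset='test')

# pipeline: text -> matrix of token counts -> multinomial Naive Bayes
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_data.data, train_data.target)
predicted_categories = model.predict(test_data.data)

print("NAIVE BAYES ENGLISH TEXT CLASSIFICATION")
print(np.array(test_data.target_names)[predicted_categories])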
Experiment 8
8. Implement an algorithm to demonstrate the significance of genetic algorithm
ALGORITHM:
1) Selection Operator: The idea is to give preference to individuals with good fitness scores and allow them to pass their genes on to successive generations.
2) Crossover Operator: This represents mating between individuals. Two individuals are selected using the selection operator, and crossover sites are chosen randomly; the genes at these sites are exchanged, creating a completely new individual (offspring).
3) Mutation Operator: The key idea is to insert random genes in the offspring to maintain diversity in the population and avoid premature convergence.
Given a target string, the goal is to produce the target string starting from a random string of the same length. In the following implementation, the following analogies are made:
Characters A-Z, a-z, 0-9 and other special symbols are considered as genes
A string generated by these characters is considered as a chromosome/solution/individual
The fitness score is the number of characters which differ from the characters in the target string at each index, so an individual with a lower fitness value is preferred. For example, with target "HELLO", the string "HELPO" has fitness 1.
Source Code
import random

# Number of individuals in each generation
POPULATION_SIZE = 100

# Valid genes
GENES = '''abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOP
QRSTUVWXYZ 1234567890, .-;:_!"#%&/()=?@${[]}'''

# Target string to be generated (any string over the gene set works)
TARGET = "I love GeeksforGeeks"
class Individual(object):
    ''' Class representing an individual in the population '''
    def __init__(self, chromosome):
        self.chromosome = chromosome
        self.fitness = self.cal_fitness()

    @classmethod
    def mutated_genes(self):
        ''' create a random gene for mutation '''
        global GENES
        gene = random.choice(GENES)
        return gene

    @classmethod
    def create_gnome(self):
        ''' create a chromosome, i.e. a string of genes '''
        global TARGET
        gnome_len = len(TARGET)
        return [self.mutated_genes() for _ in range(gnome_len)]

    def mate(self, par2):
        ''' perform mating and produce new offspring '''
        child_chromosome = []
        for gp1, gp2 in zip(self.chromosome, par2.chromosome):
            # random probability
            prob = random.random()
            if prob < 0.45:      # gene from parent 1
                child_chromosome.append(gp1)
            elif prob < 0.90:    # gene from parent 2
                child_chromosome.append(gp2)
            else:                # random gene (mutation), to maintain diversity
                child_chromosome.append(self.mutated_genes())
        return Individual(child_chromosome)

    def cal_fitness(self):
        ''' Calculate fitness score: the number of
        characters in the string which differ from the target string. '''
        global TARGET
        fitness = 0
        for gs, gt in zip(self.chromosome, TARGET):
            if gs != gt: fitness += 1
        return fitness
# Driver code
def main():
    global POPULATION_SIZE
    # current generation
    generation = 1
    found = False
    population = []
    # create the initial population
    for _ in range(POPULATION_SIZE):
        gnome = Individual.create_gnome()
        population.append(Individual(gnome))
    while not found:
        # sort the population in increasing order of fitness score
        population = sorted(population, key=lambda x: x.fitness)
        # fitness 0 means the target string has been reached
        if population[0].fitness <= 0:
            found = True
            break
        new_generation = []
        # elitism: the fittest 10% carries over to the next generation
        s = int((10 * POPULATION_SIZE) / 100)
        new_generation.extend(population[:s])
        # mate individuals from the fittest half to fill the remaining 90%
        s = int((90 * POPULATION_SIZE) / 100)
        for _ in range(s):
            parent1 = random.choice(population[:50])
            parent2 = random.choice(population[:50])
            new_generation.append(parent1.mate(parent2))
        population = new_generation
        print("Generation: {}\tString: {}\tFitness: {}".format(generation,
            "".join(population[0].chromosome), population[0].fitness))
        generation += 1
    print("Generation: {}\tString: {}\tFitness: {}".format(generation,
        "".join(population[0].chromosome), population[0].fitness))

if __name__ == '__main__':
    main()
OUTPUT:
Experiment 9
9. Implement an algorithm to demonstrate Back Propagation Algorithm in python
It is the most widely used algorithm for training artificial neural networks.
In the simplest scenario, the architecture of a neural network consists of some sequential layers,
where the layer numbered i is connected to the layer numbered i+1. The layers can be classified
into 3 classes:
1. Input
2. Hidden
3. Output
Usually, each neuron in the hidden layer uses an activation function like sigmoid or rectified
linear unit (ReLU). This helps to capture the non-linear relationship between the inputs and their
outputs.
The neurons in the output layer also use activation functions, e.g. sigmoid (used for the bounded regression target in the example below) or softmax (for multi-class classification).
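For reference, minimal NumPy definitions of these two activations (a sketch; the program below defines sigmoid the same way):
import numpy as np

def sigmoid(x):
    # squashes any real input into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    # subtracting the max is for numerical stability; the outputs sum to 1
    e = np.exp(z - np.max(z))
    return e / e.sum()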
To train a neural network, there are 2 passes (phases):
Forward
Backward
The forward and backward phases are repeated for a number of epochs. In each epoch, the following occurs:
1. The inputs are propagated from the input to the output layer.
2. The network error is calculated.
3. The error is propagated from the output layer to the input layer.
Knowing that there’s an error, what should we do? We should minimize it. To minimize network
error, we must change something in the network.
Remember that the only parameters we can change are the weights and biases. We can try
different weights and biases, and then test our network.
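Concretely, "changing something" means nudging each weight against its error gradient (gradient descent), which is exactly what the program below does:
w_new = w_old - learning_rate * dE/dw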
Disadvantages of Backpropagation:
Training can be slow to converge, the result is sensitive to the initial weights and to noisy data, and gradient descent may get stuck in a local minimum of the error.
Source Code:
import numpy
import matplotlib.pyplot as plt
def sigmoid(sop):
    return 1.0 / (1 + numpy.exp(-1 * sop))

def sigmoid_sop_deriv(sop):
    return sigmoid(sop) * (1.0 - sigmoid(sop))

def sop_w_deriv(x):
    return x

def error(predicted, target):
    # squared error
    return numpy.power(predicted - target, 2)

def error_predicted_deriv(predicted, target):
    # derivative of the squared error w.r.t. the prediction
    return 2 * (predicted - target)
x1=0.1
x2=0.4
target = 0.7
learning_rate = 0.01
w1=numpy.random.rand()
w2=numpy.random.rand()
predicted_output = []
network_error = []
old_err = 0
for k in range(80000):
    # Forward Pass
    y = w1*x1 + w2*x2
    predicted = sigmoid(y)
    err = error(predicted, target)
    predicted_output.append(predicted)
    network_error.append(err)
    # Backward Pass
    g1 = error_predicted_deriv(predicted, target)
    g2 = sigmoid_sop_deriv(y)
    g3w1 = sop_w_deriv(x1)
    g3w2 = sop_w_deriv(x2)
    gradw1 = g3w1*g2*g1
    gradw2 = g3w2*g2*g1
    # update the weights by gradient descent
    w1 = w1 - learning_rate*gradw1
    w2 = w2 - learning_rate*gradw2
    #print(predicted)
plt.figure()
plt.plot(network_error)
plt.title("Iteration Number vs Error")
plt.xlabel("Iteration Number")
plt.ylabel("Error")
plt.show()
plt.figure()
plt.plot(predicted_output)
plt.title("Iteration Number vs Prediction")
plt.xlabel("Iteration Number")
plt.ylabel("Prediction")
plt.show()
OUTPUT:
Experiment 10
10. Implementing Find-S algorithm using python
Training Database: the enjoysport.csv training instances read by the program below; the last column ('Yes'/'No') is the target concept.
Algorithm:
1. Initialize h to the most specific hypothesis in H.
2. For each positive training instance x, and for each attribute constraint in h: if the constraint is satisfied by x, do nothing; otherwise replace it with the next more general constraint that is satisfied by x.
3. Output the hypothesis h.
Hypothesis Construction
Source Code:
import csv
a = []
with open('enjoysport.csv', 'r') as csvfile:
    for row in csv.reader(csvfile):
        a.append(row)
print(a)
print("\n The total number of training instances are : ",len(a))
num_attribute = len(a[0])-1
print("\n The initial hypothesis is : ")
hypothesis = ['0']*num_attribute
print(hypothesis)
for i in range(0, len(a)):
    if a[i][num_attribute] == 'Yes':  # for each positive example only
        for j in range(0, num_attribute):
            if hypothesis[j] == '0' or hypothesis[j] == a[i][j]:
                hypothesis[j] = a[i][j]
            else:
                hypothesis[j] = '?'
    print("\n The hypothesis for the training instance {} is : \n".format(i+1), hypothesis)
print("\n The Maximally specific hypothesis for the training instance is ")
print(hypothesis)
OUTPUT:
Experiment 11
11. Implementing Candidate Elimination algorithm using python
Training Database: the same enjoysport.csv file used in Experiment 10.
Algorithm:
1. Initialize the specific boundary S to the first training instance and the general boundary G to the all-'?' hypotheses.
2. For each positive example, generalize S where it disagrees with the example, and relax the corresponding entries of G.
3. For each negative example, specialize G using the attribute values of S on which the example differs.
4. Output the final S and G.
Source Code:
import csv
with open("enjoysport.csv") as f:
csv_file=csv.reader(f)
data=list(csv_file)
print(data)
print("--------------------")
s=data[1][:-1] #extracting one row or instance or record
g=[['?' for i in range(len(s))] for j in range(len(s))]
print(s)
print("--------------------")
print(g)
print("--------------------")
for i in data:
if i[-1]=="Yes": # For each positive training record or instance
for j in range(len(s)):
if i[j]!=s[j]:
s[j]='?'
g[j][j]='?'
OUTPUT:
Experiment 12
12. Implementing the K-Means clustering algorithm using python
ALGORITHM:
Step 1: Import the required libraries (numpy, pandas, matplotlib, sklearn).
Step 2: Load the iris dataset and extract the feature matrix x and target y.
Step 3: Fit a KMeans model with a chosen number of clusters and predict the cluster labels.
Step 4: Compute the within-cluster error (inertia) for cluster counts 1 to 10.
Step 5: Plot the error against the number of clusters (elbow method) to pick a good k.
Step 6: End
PROGRAM:
#Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn import datasets
#Read DataSet
df = datasets.load_iris()
x = df.data
y = df.target
print(x)
print(y)
kmeans5 = KMeans(n_clusters=5)
y_kmeans5 = kmeans5.fit_predict(x)
print(y_kmeans5)
print(kmeans5.cluster_centers_)
Error = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i).fit(x)
    Error.append(kmeans.inertia_)
plt.plot(range(1, 11), Error)
plt.title('Elbow method')
plt.xlabel('No of clusters')
plt.ylabel('Error')
plt.show()
# the elbow suggests 3 clusters; fit the final model
kmeans3 = KMeans(n_clusters=3).fit(x)
print(kmeans3.cluster_centers_)
OUTPUT:
Experiment: 13
13. Implementing the Decision Tree algorithm using python
Reference: https://www.geeksforgeeks.org/decision-tree-implementation-python/
ALGORITHM:
1. Find the best attribute (using an impurity measure; see the formulas after this list) and place it at the root node of the tree.
2. Split the training set into subsets, such that each subset contains data with the same value for the chosen attribute.
3. Repeat steps 1 and 2 on each subset until leaf nodes are found in all branches of the tree.
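The "best attribute" in step 1 is chosen with an impurity measure; the program below tries both criteria supported by scikit-learn:
Gini index: Gini = 1 - sum_i (p_i)^2
Entropy: H = - sum_i p_i * log2(p_i)
where p_i is the proportion of examples of class i in the node; the split that most reduces impurity is placed at the node.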
PROGRAM:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report

# Function importing the dataset
def importdata():
    balance_data = pd.read_csv(
        'https://archive.ics.uci.edu/ml/machine-learning-databases/balance-scale/balance-scale.data',
        sep=',', header=None)
    # printing the dataset shape and first rows
    print("Dataset Length: ", len(balance_data))
    print("Dataset Shape: ", balance_data.shape)
    print("Dataset: ", balance_data.head())
    return balance_data

# Function to split the dataset
def splitdataset(balance_data):
    X = balance_data.values[:, 1:5]
    Y = balance_data.values[:, 0]
    X_train, X_test, y_train, y_test = train_test_split(
        X, Y, test_size=0.3, random_state=100)
    return X, Y, X_train, X_test, y_train, y_test

# Function to perform training with the Gini index
def train_using_gini(X_train, X_test, y_train):
    clf_gini = DecisionTreeClassifier(criterion="gini", random_state=100,
        max_depth=3, min_samples_leaf=5)
    # Performing training
    clf_gini.fit(X_train, y_train)
    return clf_gini

# Function to perform training with entropy
def train_using_entropy(X_train, X_test, y_train):
    clf_entropy = DecisionTreeClassifier(criterion="entropy", random_state=100,
        max_depth=3, min_samples_leaf=5)
    # Performing training
    clf_entropy.fit(X_train, y_train)
    return clf_entropy

# Function to make predictions
def prediction(X_test, clf_object):
    y_pred = clf_object.predict(X_test)
    print("Predicted values:")
    print(y_pred)
    return y_pred

# Function to calculate accuracy
def cal_accuracy(y_test, y_pred):
    print("Confusion Matrix: ", confusion_matrix(y_test, y_pred))
    print("Accuracy : ", accuracy_score(y_test, y_pred)*100)
    print("Report : ", classification_report(y_test, y_pred))

# Driver code
def main():
    # Building Phase
    data = importdata()
    X, Y, X_train, X_test, y_train, y_test = splitdataset(data)
    clf_gini = train_using_gini(X_train, X_test, y_train)
    clf_entropy = train_using_entropy(X_train, X_test, y_train)
    # Operational Phase
    print("Results Using Gini Index:")
    y_pred_gini = prediction(X_test, clf_gini)
    cal_accuracy(y_test, y_pred_gini)
    print("Results Using Entropy:")
    y_pred_entropy = prediction(X_test, clf_entropy)
    cal_accuracy(y_test, y_pred_entropy)

# Calling main function
if __name__ == "__main__":
    main()
OUTPUT:
Dataset Length: 625
Dataset Shape: (625, 5)
Dataset: 0 1 2 3 4
0 B 1 1 1 1
1 R 1 1 1 2
2 R 1 1 1 3
3 R 1 1 1 4
4 R 1 1 1 5
Results Using Gini Index:
Predicted values:
['R' 'L' 'R' 'R' 'R' 'L' 'R' 'L' 'L' 'L' 'R' 'L' 'L' 'L' 'R' 'L' 'R' 'L'
'L' 'R' 'L' 'R' 'L' 'L' 'R' 'L' 'L' 'L' 'R' 'L' 'L' 'L' 'R' 'L' 'L' 'L'
'L' 'R' 'L' 'L' 'R' 'L' 'R' 'L' 'R' 'R' 'L' 'L' 'R' 'L' 'R' 'R' 'L' 'R'
'R' 'L' 'R' 'R' 'L' 'L' 'R' 'R' 'L' 'L' 'L' 'L' 'L' 'R' 'R' 'L' 'L' 'R'
'R' 'L' 'R' 'L' 'R' 'R' 'R' 'L' 'R' 'L' 'L' 'L' 'L' 'R' 'R' 'L' 'R' 'L'
'R' 'R' 'L' 'L' 'L' 'R' 'R' 'L' 'L' 'L' 'R' 'L' 'R' 'R' 'R' 'R' 'R' 'R'
'R' 'L' 'R' 'L' 'R' 'R' 'L' 'R' 'R' 'R' 'R' 'R' 'L' 'R' 'L' 'L' 'L' 'L'
'L' 'L' 'L' 'R' 'R' 'R' 'R' 'L' 'R' 'R' 'R' 'L' 'L' 'R' 'L' 'R' 'L' 'R'
'L' 'L' 'R' 'L' 'L' 'R' 'L' 'R' 'L' 'R' 'R' 'R' 'L' 'R' 'R' 'R' 'R' 'R'
'L' 'L' 'R' 'R' 'R' 'R' 'L' 'R' 'R' 'R' 'L' 'R' 'L' 'L' 'L' 'L' 'R' 'R'
'L' 'R' 'R' 'L' 'L' 'R' 'R' 'R']
Confusion Matrix: [[ 0 6 7]
[ 0 67 18]
[ 0 19 71]]
Accuracy : 73.40425531914893
Report : precision recall f1-score support