ML LAB
KILAKARAI – 623806
(Approved by AICTE, Accredited by NAAC and NBA)
PG PRACTICAL EXAMINATIONS
JULY-2024
Student Name :
Register Number :
Name: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Class: . . . . . . . . . . . . . . . . . .
Register No:
Certified that this is the bonafide record of work done by the above student in
CP4252 – Machine Learning during the year 2023–2024.
EX.NO:1 LINEAR REGRESSION
AIM:
To implement the linear regression model and to experiment with different features in
building a model.
DEFINITION:
Let us consider a dataset where we have a value of the response y for every feature x.
The task is to find the line that best fits this scatter of points, so that we can predict
the response for any new feature value (i.e., a value of x not present in the dataset).
This line is called a regression line.
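In symbols: given n observations (x_i, y_i) with means x̄ and ȳ, least squares fits the line
y = b_0 + b_1·x by minimising the squared residuals, which yields the closed form implemented
in the program below:

b_1 = SS_xy / SS_xx = ( Σ x_i·y_i − n·x̄·ȳ ) / ( Σ x_i² − n·x̄² )
b_0 = ȳ − b_1·x̄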
PROCEDURE:
• Importing required libraries: pandas & numpy for data analysis and manipulation, and
seaborn & matplotlib for data visualization.
• Visualizing the variables in order to interpret business/domain inferences.
• Splitting the data into two sections so that the model is trained on one subset,
producing a fitted line.
• Rescaling the features: a method used to normalize the range of numerical variables
with varying magnitudes.
• Residual analysis of the training data tells us how the errors are distributed across the
model. A good residual analysis will show a mean centred around 0 (see the sketch below).
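As a supplement to these steps (the record's program below uses a closed-form fit on a small
array instead), here is a minimal sketch of the split/fit/residual-analysis workflow with
scikit-learn; the synthetic data is an assumption for illustration, not part of the record:

# Minimal sketch of the procedure above (synthetic data assumed for illustration)
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(100, 1))         # one numeric feature
y = 2.5 * x.ravel() + rng.normal(0, 1, 100)   # noisy linear response

# Split so the line is fitted on a training subset only
x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.7, random_state=42)
model = LinearRegression().fit(x_train, y_train)

# Residual analysis: for a good fit the residuals centre around 0
residuals = y_train - model.predict(x_train)
print("Mean of residuals:", residuals.mean())
print("Test R^2:", model.score(x_test, y_test))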
PROGRAM:
import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)
    # mean of x and y vector
    m_x = np.mean(x)
    m_y = np.mean(y)
    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x
    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x
    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plotting the actual points as scatter plot
    plt.scatter(x, y, color="m", marker="o", s=30)
    # predicted response vector
    y_pred = b[0] + b[1]*x
    # plotting the regression line
    plt.plot(x, y_pred, color="g")
    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')
    # function to show plot
    plt.show()

def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))
    # plotting regression line
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()
OUTPUT:
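(The scatter plot with the fitted green line is not reproduced in this record. For the data
above, the closed-form estimates printed by the program work out to b_0 ≈ 1.2364 and
b_1 ≈ 1.1697.)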
RESULT:
Thus the program to implement the linear regression model was executed
successfully.
EX.NO:2 BINARY CLASSIFICATION MODEL
AIM:
To write a program to implement the binary classification model using python.
PROCEDURE:
Step 1: Define explanatory and target variables
Step 2: Split the dataset into training and testing sets
Step 3: Normalize the data for numerical stability
Step 4: Fit a logistic regression model to the training data
Step 5: Make predictions on the testing data
Step 6: Calculate the accuracy score by comparing the actual and predicted values (a sketch of these steps follows).
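These six steps describe a logistic-regression workflow, while the record's program below
builds a perceptron from scratch. For reference, a minimal sketch of the six steps with
scikit-learn; the breast-cancer dataset is an assumed stand-in, since the record does not
name a dataset:

# Steps 1-6 above, sketched with scikit-learn (dataset assumed)
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Step 1: explanatory (X) and target (y) variables
X, y = load_breast_cancer(return_X_y=True)
# Step 2: split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Step 3: normalize for numerical stability
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Step 4: fit a logistic regression model on the training data
model = LogisticRegression().fit(X_train, y_train)
# Step 5: predict on the testing data
y_pred = model.predict(X_test)
# Step 6: compare actual and predicted values
print("Accuracy:", accuracy_score(y_test, y_pred))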
PROGRAM:
import numpy as np

class Perceptron(object):
    """ Perceptron Classifier

    Parameters
    rate : float
        Learning rate (ranging from 0.0 to 1.0)
    number_of_iterations : int
        Number of iterations over the input dataset.

    Attributes:
    weight_matrix : 1d-array
        Weights after fitting.
    errors_list : list
        Number of misclassifications in every epoch (one full training cycle on the training set).
    """

    def __init__(self, rate=0.01, number_of_iterations=100):
        self.rate = rate
        self.number_of_iterations = number_of_iterations

    def fit(self, X, y):
        """ Fit training data

        Parameters:
        X : array-like, shape = [number_of_samples, number_of_features]
            Training vectors.
        y : array-like, shape = [number_of_samples]
            Target values.

        Returns
        self : object
        """
        self.weight_matrix = np.zeros(1 + X.shape[1])
        self.errors_list = []
        for _ in range(self.number_of_iterations):
            errors = 0
            for xi, target in zip(X, y):
                update = self.rate * (target - self.predict(xi))
                self.weight_matrix[1:] += update * xi
                self.weight_matrix[0] += update
                errors += int(update != 0.0)
            self.errors_list.append(errors)
        return self

    def dot_product(self, X):
        """ Calculate the dot product """
        return (np.dot(X, self.weight_matrix[1:]) + self.weight_matrix[0])

    def predict(self, X):
        """ Predicting the label for the input data """
        return np.where(self.dot_product(X) >= 0.0, 1, 0)

if __name__ == '__main__':
    X = np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 1], [1, 0, 0], [1, 0, 1], [1, 1, 0]])
    y = np.array([0, 1, 1, 1, 1, 1, 1])
    p = Perceptron()
    p.fit(X, y)
    print("Predicting the output of [1, 1, 1] = {}".format(p.predict([1, 1, 1])))
OUTPUT:
Predicting the output of [1, 1, 1] = 1
RESULT:
Thus the program implementing the binary classification model was executed
successfully.
EX.NO:3 CLASSIFICATION WITH NEAREST NEIGHBOURS
AIM:
To write a program for the implementation of the k-nearest neighbor algorithm.
ALGORITHM:
Step 1 − For implementing any algorithm, we need a dataset. So during the first step of KNN, we
must load the training as well as the test data.
Step 2 − Next, we need to choose the value of K, i.e. the number of nearest data points to
consider. K can be any integer.
Step 3 − For each point in the test data, do the following −
• 3.1 − Calculate the distance between the test point and each row of the training data
using one of the distance metrics, namely Euclidean, Manhattan or Hamming distance.
Euclidean distance is the most commonly used (see the sketch after these steps).
• 3.2 − Sort the distances in ascending order.
• 3.3 − Choose the top K rows from the sorted array.
• 3.4 − Assign a class to the test point based on the most frequent class among these
rows.
Step 4 − End
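A minimal sketch of Steps 3.1–3.4 with NumPy; the toy data and query point are assumptions
for illustration:

# Manual KNN prediction for a single query point (toy data assumed)
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    # 3.1: Euclidean distance from the query to every training row
    distances = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    # 3.2 and 3.3: sort ascending and keep the indices of the top K rows
    nearest = np.argsort(distances)[:k]
    # 3.4: majority vote over the K nearest labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0], [6.0, 5.0], [7.0, 8.0]])
y_train = np.array([0, 0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([2.5, 3.0])))   # prints 0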
PROGRAM:
# Import necessary modules
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
# Loading data
irisData = load_iris()
# Create feature and target arrays
X = irisData.data
y = irisData.target
# Split into training and test set
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size = 0.2, random_state=42)
knn = KNeighborsClassifier(n_neighbors=7)
knn.fit(X_train, y_train)
# Predict on dataset which model has not seen before
print(knn.predict(X_test))
OUTPUT
[1 0 2 1 1 0 1 2 2 1 2 0 0 0 0 1 2 1 1 2 0 2 0 2 2 2 2 2 0 0]
PERFORMANCE:
# Loading data
irisData = load_iris()
knn = KNeighborsClassifier(n_neighbors=7)
knn.fit(X_train, y_train)
# Accuracy of the model on the unseen test data
print(knn.score(X_test, y_test))
OUTPUT:
0.9666666666666667
MODEL ACCURACY:
# Import necessary modules
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
import numpy as np
import matplotlib.pyplot as plt
irisData = load_iris()
# Create feature and target arrays
X = irisData.data
y = irisData.target
# Split into training and test set
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size = 0.2, random_state=42)
neighbors = np.arange(1, 9)
train_accuracy = np.empty(len(neighbors))
test_accuracy = np.empty(len(neighbors))
# Loop over K values
for i, k in enumerate(neighbors):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    # Compute accuracy on the training and testing sets
    train_accuracy[i] = knn.score(X_train, y_train)
    test_accuracy[i] = knn.score(X_test, y_test)
# Generate plot
plt.plot(neighbors, test_accuracy, label = 'Testing dataset Accuracy')
plt.plot(neighbors, train_accuracy, label = 'Training dataset Accuracy')
plt.legend()
plt.xlabel('n_neighbors')
plt.ylabel('Accuracy')
plt.show()
OUTPUT:
RESULT :
Thus the program for the implementation of the k-nearest neighbor algorithm was verified and
executed successfully.
EX.NO:4 EXPERIMENT WITH VALIDATION SET AND TEST SET
AIM:
To write an experiment with validation sets and test sets for the given dataset.
PROCEDURE:
Training Dataset
The sample of data used to fit the model. This is the actual dataset that we use to train the
model (the weights and biases in the case of a neural network). The model sees and learns from
this data.
Validation Dataset
The sample of data used to provide an unbiased evaluation of a model fit on the training dataset
while tuning model hyperparameters. The evaluation becomes more biased as skill on the
validation dataset is incorporated into the model configuration.
Test Dataset
The sample of data used to provide an unbiased evaluation of a final model fit on the training
dataset. The test dataset provides the gold standard used to evaluate the model. It is used
only once a model is completely trained (using the train and validation sets).
PROGRAM:
# Importing numpy & scikit-learn
import numpy as np
from sklearn.model_selection import train_test_split

# First example: a simple train/test split
x = np.arange(16).reshape((8, 2))
y = range(8)
x_train, x_test, y_train, y_test = train_test_split(x, y,
                                                    train_size=0.8,
                                                    random_state=42)
# Training set
print("Training set x: ", x_train)
print("Training set y: ", y_train)
# Testing set
print("Testing set x: ", x_test)
print("Testing set y: ", y_test)

# Second example: train/validation/test split
x = np.arange(24).reshape((8, 3))
y = range(8)
# Hold out 20% of the data as the test set
x_train, x_test, y_train, y_test = train_test_split(x, y,
                                                    test_size=0.2,
                                                    random_state=42)
# Training set (before the validation split)
print("Training set x: ", x_train)
print("Training set y: ", y_train)
# Carve a validation set out of the remaining training data
x_train, x_val, y_train, y_val = train_test_split(x_train, y_train,
                                                  test_size=0.25,
                                                  random_state=42)
# Testing set
print("Testing set x: ", x_test)
print("Testing set y: ", y_test)
print(" ")
# Validation set
print("Validation set x: ", x_val)
print("Validation set y: ", y_val)
OUTPUT:
Training set x: [[ 0 1]
[14 15]
[ 4 5]
[ 8 9]
[ 6 7]
[12 13]]
Training set y: [0, 7, 2, 4, 3, 6]
Testing set x: [[ 2 3]
[10 11]]
Testing set y: [1, 5]
Training set x: [[ 0 1 2]
[21 22 23]
[ 6 7 8]
[12 13 14]
[ 9 10 11]
[18 19 20]]
Training set y: [0, 7, 2, 4, 3, 6]
RESULT:
Thus the program for the implementation of an experiment with validation sets and test sets for
the given dataset was verified and executed successfully.
EX.NO:5 K-MEANS CLUSTERING
AIM:
To write a program for the implementation of K-means clustering on the given dataset.
PROCEDURE:
Step-1: Select the number K to decide the number of clusters.
Step-2: Select K random points as centroids. (They need not be points from the input dataset.)
Step-3: Assign each data point to their closest centroid, which will form the predefined K
clusters.
Step-4: Calculate the variance and place a new centroid of each cluster.
Step-5: Repeat the third step: reassign each datapoint to the new closest centroid of its
cluster.
Step-6: If any reassignment occurs, then go to step-4 else go to FINISH.
Step-7: The model is ready.
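Formally, these steps minimise the within-cluster sum of squared distances (reported by
scikit-learn as inertia_ and used for the elbow plot below):

J = Σ_{j=1..K} Σ_{x_i ∈ C_j} ‖x_i − μ_j‖²

where C_j is the j-th cluster and μ_j its centroid.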
PROGRAM
from sklearn.cluster import KMeans
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from matplotlib import pyplot as plt
from sklearn.datasets import load_iris
%matplotlib inline
iris = load_iris()
df = pd.DataFrame(iris.data,columns=iris.feature_names)
df.head()
OUTPUT:
df['flower'] = iris.target
df.head()
OUTPUT:
km = KMeans(n_clusters=3)
yp = km.fit_predict(df)
yp
OUTPUT:
df['cluster'] = yp
df.head(2)
OUTPUT:
df.cluster.unique()
OUTPUT:
df1 = df[df.cluster==0]
df2 = df[df.cluster==1]
df3 = df[df.cluster==2]
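The three per-cluster frames are presumably used to plot each cluster in its own colour; the
figure itself is not reproduced in the record, but a minimal sketch (assuming the petal
columns, which the record does not confirm) would be:

plt.scatter(df1['petal length (cm)'], df1['petal width (cm)'], color='green')
plt.scatter(df2['petal length (cm)'], df2['petal width (cm)'], color='red')
plt.scatter(df3['petal length (cm)'], df3['petal width (cm)'], color='black')
plt.xlabel('petal length (cm)')
plt.ylabel('petal width (cm)')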
# Elbow method: compute the SSE (inertia) for a range of K values
sse = []
k_rng = range(1, 10)
for k in k_rng:
    km = KMeans(n_clusters=k)
    km.fit(df)
    sse.append(km.inertia_)
plt.xlabel('K')
plt.ylabel('Sum of squared error')
plt.plot(k_rng, sse)
RESULT :
Thus the program for the implementation of K-means clustering on the given dataset was
verified and executed successfully.
EX.NO:6 NAIVE BAYES CLASSIFIER
AIM:
To implement a program for the Naïve Bayes model, which is based on Bayes' theorem:
P(H|E) = P(E|H)*P(H)/P(E)
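Here P(H|E) is the posterior probability of the hypothesis H given the evidence E, P(E|H) the
likelihood, P(H) the prior, and P(E) the evidence. In the program below this reduces to a
relative frequency: the probability of catching the train given a delay of min minutes is

P(catch | delay = min) = s / (s + m)

where s and m are the counts looked up in in_time_dict and too_late_dict.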
Step 4: Tying it all together
Finally, we tie all the steps together to form our own Naive Bayes classifier model (a library-based sketch follows).
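The preceding steps are not reproduced in this record; for comparison, a complete Naive Bayes
classifier can be assembled in a few lines with scikit-learn (the iris dataset here is an
assumed example, not part of the record):

# Library-based Naive Bayes for comparison (dataset assumed)
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = GaussianNB().fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))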
PROGRAM:
import matplotlib.pyplot as plt

# the tuples consist of (delay time of train1, number of times)
# NOTE: the in_time and too_late lists of tuples are not reproduced in this
# record; they must be defined above this point, e.g. in_time = [(minutes, count), ...]
X, Y = zip(*in_time)
X2, Y2 = zip(*too_late)

bar_width = 0.9
plt.bar(X, Y, bar_width, color="blue", alpha=0.75, label="in time")
bar_width = 0.8
plt.bar(X2, Y2, bar_width, color="red", alpha=0.75, label="too late")
plt.legend(loc='upper right')
plt.show()

in_time_dict = dict(in_time)
too_late_dict = dict(too_late)

def catch_the_train(min):
    s = in_time_dict.get(min, 0)
    if s == 0:
        return 0
    else:
        m = too_late_dict.get(min, 0)
        return s / (s + m)

# probability of catching the train for each delay value (produces the output below)
for minutes in range(-1, 13):
    print(minutes, catch_the_train(minutes))
OUTPUT:
-1 0
0 1.0
1 1.0
2 1.0
3 1.0
4 1.0
5 1.0
6 0.6
7 0.4375
8 0.25
9 0.15
10 0.14285714285714285
11 0.11764705882352941
12 0
RESULT:
Thus the program to implement the Naïve Bayes classifier has been verified and executed
successfully.