Machine Learning LAB
AIM:
To implement the linear regression model and to experiment with different features in
building a model.
DEFINITION:
Let us consider a dataset where we have a value of response y for every feature x:
Now, the task is to find the line that best fits the scatter plot of this data so that we can
predict the response for any new feature value (i.e., a value of x not present in the dataset). This
line is called a regression line.
PROCEDURE:
Importing required libraries like pandas & numpy for data analysis and manipulation and
seaborn & matplotlib for data visualization.
Visualizing the variables in order to interpret business/domain inferences.
Splitting the data into two subsets: a training set used to generate the trained (fitted) line,
and a test set used to evaluate it.
Rescaling the features: a method used to normalize the range of numerical
variables with varying degrees of magnitude.
Residual analysis of the training data shows how the errors are distributed around the
fitted line. In a good fit, the residuals are centred around a mean of 0.
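The procedure above can be sketched end to end with NumPy alone. The synthetic data, the 80/20 split and the min-max rescaling below are illustrative assumptions, not the lab's own dataset or settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumed for illustration): y = 3x + 2 plus Gaussian noise
x = rng.uniform(0, 100, size=200)
y = 3.0 * x + 2.0 + rng.normal(0, 5, size=200)

# Split: shuffle the indices, keep 80% for training and 20% for testing
idx = rng.permutation(len(x))
cut = int(0.8 * len(x))
train_idx, test_idx = idx[:cut], idx[cut:]

# Rescale: min-max scaling, with the range taken from the training data only
x_min, x_max = x[train_idx].min(), x[train_idx].max()

def scale(v):
    return (v - x_min) / (x_max - x_min)

x_train, y_train = scale(x[train_idx]), y[train_idx]
x_test, y_test = scale(x[test_idx]), y[test_idx]

# Fit a straight line by least squares (degree-1 polynomial)
slope, intercept = np.polyfit(x_train, y_train, 1)

# Residual analysis on the training data
residuals = y_train - (slope * x_train + intercept)
test_rmse = np.sqrt(np.mean((y_test - (slope * x_test + intercept)) ** 2))
print("mean of training residuals:", residuals.mean())
print("test RMSE:", test_rmse)
```

For a least-squares line with an intercept, the training residuals average to zero up to floating-point error, which is the property the residual analysis step checks.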
PROGRAM:
import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)
    # mean of x and y vector
    m_x = np.mean(x)
    m_y = np.mean(y)
    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x
    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x
    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plotting the actual points as scatter plot
    plt.scatter(x, y, color = "m",
                marker = "o", s = 30)
    # predicted response vector
    y_pred = b[0] + b[1]*x
    # plotting the regression line
    plt.plot(x, y_pred, color = "g")
    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')
    # function to show plot
    plt.show()

def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))
    # plotting regression line
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()
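As a sanity check, the closed-form coefficients can be compared with NumPy's own least-squares fit. The sketch below restates the same formulas in centred form inside a helper called `closed_form_fit` (a name chosen for this example only) and compares the result with `np.polyfit` on the data from the program.

```python
import numpy as np

def closed_form_fit(x, y):
    """Return (b_0, b_1) for the line y = b_0 + b_1 * x."""
    m_x, m_y = np.mean(x), np.mean(y)
    ss_xy = np.sum((x - m_x) * (y - m_y))  # cross-deviation
    ss_xx = np.sum((x - m_x) ** 2)         # deviation about x
    b_1 = ss_xy / ss_xx
    b_0 = m_y - b_1 * m_x
    return b_0, b_1

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

b_0, b_1 = closed_form_fit(x, y)
slope, intercept = np.polyfit(x, y, 1)  # polyfit returns the highest degree first

print(b_0, b_1)
print(np.isclose(b_0, intercept), np.isclose(b_1, slope))
```

The centred sums used here are algebraically equal to `np.sum(y*x) - n*m_y*m_x` and `np.sum(x*x) - n*m_x*m_x`, so both versions should give the same line.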
OUTPUT:
RESULT:
Thus the program to implement the linear regression model was written and executed
successfully.
EX.NO:2 BINARY CLASSIFICATION MODEL
AIM:
To write a program to implement a binary classification model using Python.
PROCEDURE:
Step 1: Define explanatory and target variables
Step 2: Split the dataset into training and testing sets
Step 3: Normalize the data for numerical stability
Step 4: Fit a logistic regression model to the training data
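A minimal sketch of the four steps, assuming synthetic two-class data and a plain gradient-descent logistic regression written in NumPy. The data, learning rate and iteration count are illustrative choices; the PROGRAM in this exercise trains a perceptron instead.

```python
import numpy as np

rng = np.random.default_rng(42)

# Step 1: explanatory variables X and binary target y (synthetic, assumed)
n = 200
X = np.vstack([rng.normal(-2.0, 1.0, size=(n // 2, 2)),
               rng.normal(+2.0, 1.0, size=(n // 2, 2))])
y = np.concatenate([np.zeros(n // 2), np.ones(n // 2)])

# Step 2: shuffle and split 80/20 into training and testing sets
idx = rng.permutation(n)
cut = int(0.8 * n)
X_train, X_test = X[idx[:cut]], X[idx[cut:]]
y_train, y_test = y[idx[:cut]], y[idx[cut:]]

# Step 3: standardize using statistics from the training set only
mean, std = X_train.mean(axis=0), X_train.std(axis=0)
X_train = (X_train - mean) / std
X_test = (X_test - mean) / std

# Step 4: fit logistic regression by batch gradient descent
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(X_train.shape[1])
b = 0.0
rate = 0.1
for _ in range(500):
    p = sigmoid(X_train @ w + b)  # predicted probabilities
    w -= rate * (X_train.T @ (p - y_train)) / len(y_train)
    b -= rate * np.mean(p - y_train)

pred = (sigmoid(X_test @ w + b) >= 0.5).astype(int)
accuracy = np.mean(pred == y_test)
print("test accuracy:", accuracy)
```

Taking the scaling statistics from the training set only keeps information about the test set out of the fitted model.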
PROGRAM:
import numpy as np

class Perceptron(object):
    """ Perceptron Classifier
    Parameters
    ------------
    rate : float
        Learning rate (ranging from 0.0 to 1.0)
    number_of_iterations : int
        Number of iterations over the input dataset.
    Attributes
    ------------
    weight_matrix : 1d-array
        Weights after fitting.
    errors_list : list
        Number of misclassifications in every epoch (one full training cycle on the training set)
    """
    def __init__(self, rate = 0.01, number_of_iterations = 100):
        self.rate = rate
        self.number_of_iterations = number_of_iterations

    def fit(self, X, y):
        """ Fit training data
        Parameters
        ------------
        X : array-like, shape = [number_of_samples, number_of_features]
            Training vectors.
        y : array-like, shape = [number_of_samples]
            Target values.
        Returns
        ------------
        self : object
        """
        # weight_matrix[0] is the bias; the remaining entries are feature weights
        self.weight_matrix = np.zeros(1 + X.shape[1])
        self.errors_list = []
        for _ in range(self.number_of_iterations):
            errors = 0
            for xi, target in zip(X, y):
                update = self.rate * (target - self.predict(xi))
                self.weight_matrix[1:] += update * xi
                self.weight_matrix[0] += update
                errors += int(update != 0.0)
            self.errors_list.append(errors)
        return self

    def dot_product(self, X):
        """ Calculate the dot product """
        return (np.dot(X, self.weight_matrix[1:]) + self.weight_matrix[0])

    def predict(self, X):
        """ Predicting the label for the input data """
        return np.where(self.dot_product(X) >= 0.0, 1, 0)

if __name__ == '__main__':
    X = np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 1], [1, 0, 0], [1, 0, 1], [1, 1, 0]])
    y = np.array([0, 1, 1, 1, 1, 1, 1])
    p = Perceptron()
    p.fit(X, y)
    print("Predicting the output of [1, 1, 1] = {}".format(p.predict([1, 1, 1])))
OUTPUT:
Predicting the output of [1, 1, 1] = 1
RESULT:
Thus the program for implementing the binary classification model was written and executed
successfully.
EX.NO:3 CLASSIFICATION WITH NEAREST NEIGHBOURS
AIM:
To write a program that implements the k-nearest neighbor (KNN) algorithm.
ALGORITHM:
Step 1 − Load the dataset. In the first step of KNN, we must load both the training data and
the test data.
Step 2 − Choose the value of K, i.e. the number of nearest data points to consider. K can be any
positive integer.
Step 3 − For each point in the test data, do the following −
3.1 − Calculate the distance between the test point and each row of the training data using
one of the following methods: Euclidean, Manhattan or Hamming distance. Euclidean is the
most commonly used.
3.2 − Sort the rows in ascending order of distance.
3.3 − Choose the top K rows from the sorted array.
3.4 − Assign a class to the test point based on the most frequent class among these
rows.
Step 4 − End
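The steps above can be sketched from scratch in a few lines of NumPy. The helper name `knn_predict` and the tiny two-cluster dataset are made up for this illustration; the PROGRAM below uses scikit-learn's `KNeighborsClassifier` instead.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    """Classify one query point by majority vote among its k nearest neighbours."""
    # Step 3.1: Euclidean distance from the query to every training row
    distances = np.sqrt(np.sum((X_train - x_query) ** 2, axis=1))
    # Steps 3.2 and 3.3: sort ascending and keep the indices of the k closest rows
    nearest = np.argsort(distances)[:k]
    # Step 3.4: most frequent class among those rows
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]

# Tiny made-up dataset: two clusters labelled 0 and 1
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],
                    [6.0, 6.0], [6.5, 7.0], [7.0, 6.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([1.2, 1.4]), k=3))  # near the first cluster
print(knn_predict(X_train, y_train, np.array([6.4, 6.2]), k=3))  # near the second cluster
```

An odd K is a common choice for two-class problems because it avoids tied votes.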
PROGRAM:
# Import necessary modules
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
# Loading data
irisData = load_iris()
# Create feature and target arrays
X = irisData.data
y = irisData.target
# Split into training and test set
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size = 0.2, random_state=42)
knn = KNeighborsClassifier(n_neighbors=7)
knn.fit(X_train, y_train)
# Predict on dataset which model has not seen before
print(knn.predict(X_test))
OUTPUT
[1 0 2 1 1 0 1 2 2 1 2 0 0 0 0 1 2 1 1 2 0 2 0 2 2 2 2 2 0 0]
PERFORMANCE
# Loading data
irisData = load_iris()
knn = KNeighborsClassifier(n_neighbors=7)
knn.fit(X_train, y_train)
# Calculate the accuracy of the model on the test set
print(knn.score(X_test, y_test))
OUTPUT:
0.9666666666666667
MODEL ACCURACY:
# Import necessary modules
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
import numpy as np
import matplotlib.pyplot as plt
irisData = load_iris()
# Create feature and target arrays
X = irisData.data
y = irisData.target
# Split into training and test set
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size = 0.2, random_state=42)
neighbors = np.arange(1, 9)
train_accuracy = np.empty(len(neighbors))
test_accuracy = np.empty(len(neighbors))
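# Loop over K values
for i, k in enumerate(neighbors):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    # Compute training and test data accuracy
    train_accuracy[i] = knn.score(X_train, y_train)
    test_accuracy[i] = knn.score(X_test, y_test)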
# Generate plot
plt.plot(neighbors, test_accuracy, label = 'Testing dataset Accuracy')
plt.plot(neighbors, train_accuracy, label = 'Training dataset Accuracy')
plt.legend()
plt.xlabel('n_neighbors')
plt.ylabel('Accuracy')
plt.show()
OUTPUT:
RESULT:
Thus the program for the implementation of the k-nearest neighbor algorithm was verified and
executed successfully.
EX.NO:4 EXPERIMENT WITH VALIDATION SET AND TEST SET
AIM:
To experiment with validation sets and test sets for the given dataset.
PROCEDURE:
Training Dataset
The sample of data used to fit the model. This is the actual dataset that we use to train the model
(weights and biases in the case of a neural network). The model sees and learns from this data.
Validation Dataset
The sample of data used to provide an unbiased evaluation of a model fit on the training dataset
while tuning model hyperparameters. The evaluation becomes more biased as skill on the
validation dataset is incorporated into the model configuration.
Test Dataset
The sample of data used to provide an unbiased evaluation of a final model fit on
the training dataset. The test dataset provides the gold standard used to evaluate the model. It is
only used once a model is completely trained (using the train and validation sets).
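A minimal sketch of a three-way split, assuming a 60/20/20 division and a single shuffle. The helper name `three_way_split` and the dummy arrays are hypothetical; the PROGRAM below relies on scikit-learn's `train_test_split`.

```python
import numpy as np

def three_way_split(X, y, val_fraction=0.2, test_fraction=0.2, seed=42):
    """Shuffle once, then cut the indices into train, validation and test parts."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(round(test_fraction * len(X)))
    n_val = int(round(val_fraction * len(X)))
    test_idx = idx[:n_test]
    val_idx = idx[n_test:n_test + n_val]
    train_idx = idx[n_test + n_val:]
    return ((X[train_idx], y[train_idx]),
            (X[val_idx], y[val_idx]),
            (X[test_idx], y[test_idx]))

# Dummy data: 20 samples with 2 features each
X = np.arange(40).reshape((20, 2))
y = np.arange(20)

(train_X, train_y), (val_X, val_y), (test_X, test_y) = three_way_split(X, y)
print(len(train_X), len(val_X), len(test_X))
```

Hyperparameters would be tuned against the validation part, and the test part would be scored only once, after the final model is chosen.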
PROGRAM:
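# Importing numpy & scikit-learn
import numpy as np
from sklearn.model_selection import train_test_split

# Dummy data: 8 samples with 2 features each, and 8 targets
x = np.arange(16).reshape((8, 2))
y = range(8)

# Hold out 20% of the samples for testing
x_train, x_test, y_train, y_test = train_test_split(
    x, y, train_size=0.8, random_state=42)

# Training set
print("Training set x: ", x_train)
print("Training set y: ", y_train)
# Testing set
print("Testing set x: ", x_test)
print("Testing set y: ", y_test)

# Three-way split: 8 samples with 3 features each.
# The split calls are missing from this listing; the two calls below are an
# assumed reconstruction that divides the held-out samples equally.
x = np.arange(24).reshape((8, 3))
x_train, x_rest, y_train, y_rest = train_test_split(
    x, y, train_size=0.8, random_state=42)
x_val, x_test, y_val, y_test = train_test_split(
    x_rest, y_rest, test_size=0.5, random_state=42)

# Training set
print("Training set x: ", x_train)
print("Training set y: ", y_train)
print(" ")
# Testing set
print("Testing set x: ", x_test)
print("Testing set y: ", y_test)
print(" ")
# Validation set
print("Validation set x: ", x_val)
print("Validation set y: ", y_val)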
OUTPUT:
Training set x: [[ 0 1]
[14 15]
[ 4 5]
[ 8 9]
[ 6 7]
[12 13]]
Training set y: [0, 7, 2, 4, 3, 6]
Testing set x: [[ 2 3]
[10 11]]
Testing set y: [1, 5]
Training set x: [[ 0 1 2]
[21 22 23]
[ 6 7 8]
[12 13 14]
[ 9 10 11]
[18 19 20]]
Training set y: [0, 7, 2, 4, 3, 6]
RESULT:
Thus the program for the implementation of an experiment with validation sets and test sets for
the given dataset was verified and executed successfully.
EX.NO:5 K-MEANS CLUSTERING
AIM:
To write a program that applies K-means clustering to the given dataset.
PROCEDURE:
Step-1: Select the number K to decide the number of clusters.
Step-2: Select K random points as the initial centroids. (They need not be points from the input
dataset.)
Step-3: Assign each data point to its closest centroid, which forms the K
clusters.
Step-4: Calculate the mean of the points in each cluster and place the new centroid of that
cluster there.
Step-5: Repeat step 3, i.e. reassign each data point to the new closest centroid.
Step-6: If any reassignment occurred, go to step 4; otherwise go to FINISH.
Step-7: The model is ready.
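The steps above can be sketched directly in NumPy. The function name `kmeans`, the choice of initial centroids from the data points, and the two synthetic blobs are assumptions made for this illustration; the PROGRAM below uses scikit-learn's `KMeans`.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain K-means: returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Step 2: pick K distinct data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # Steps 3 and 5: assign every point to its closest centroid
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = np.argmin(distances, axis=1)
        # Step 6: stop once no point changes cluster
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 4: move each centroid to the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels

# Two well-separated synthetic blobs of 50 points each
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
               rng.normal(5.0, 0.5, size=(50, 2))])

centroids, labels = kmeans(X, k=2)
print(np.round(centroids, 1))
```

The cluster numbers themselves are arbitrary: which blob ends up as cluster 0 depends on the random initial centroids.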
PROGRAM
from sklearn.cluster import KMeans
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from matplotlib import pyplot as plt
from sklearn.datasets import load_iris
%matplotlib inline
iris = load_iris()
df = pd.DataFrame(iris.data,columns=iris.feature_names)
df.head()
OUTPUT:
df['flower'] = iris.target
df.head()
OUTPUT:
km = KMeans(n_clusters=3)
yp = km.fit_predict(df)
yp
OUTPUT:
df['cluster'] = yp
df.head(2)
OUTPUT:
df.cluster.unique()
OUTPUT:
df1 = df[df.cluster==0]
df2 = df[df.cluster==1]
df3 = df[df.cluster==2]
OUTPUT:
sse = []
k_rng = range(1,10)
for k in k_rng:
    km = KMeans(n_clusters=k)
    km.fit(df)
    sse.append(km.inertia_)
plt.xlabel('K')
plt.ylabel('Sum of squared error')
plt.plot(k_rng, sse)
OUTPUT:
RESULT:
Thus the program for the implementation of K-means on the given dataset was verified and
executed successfully.
EX.NO:6 NAIVE BAYES CLASSIFIER
AIM:
To implement a program for the Naïve Bayes model.
P(H|E) = P(E|H)*P(H)/P(E)
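A small worked instance of the formula, using hypothetical probabilities chosen only for illustration. P(E) is expanded with the law of total probability over H and not-H.

```python
# Hypothetical numbers, chosen only to illustrate the formula
p_h = 0.3              # P(H): prior probability of the hypothesis
p_e_given_h = 0.8      # P(E|H): probability of the evidence if H is true
p_e_given_not_h = 0.1  # P(E|not H): probability of the evidence if H is false

# P(E) by the law of total probability
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)

# Bayes' theorem: P(H|E) = P(E|H) * P(H) / P(E)
p_h_given_e = p_e_given_h * p_h / p_e
print(round(p_h_given_e, 4))  # 0.24 / 0.31, about 0.7742
```

Observing the evidence raises the probability of H from the prior 0.3 to roughly 0.77 in this example.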
Step 4: Tying it all together
Finally, we tie all the steps together and form our own model of the Naive Bayes classifier.
PROGRAM:
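import matplotlib.pyplot as plt

# in_time and too_late are lists of tuples; their definitions are not included
# in this listing
# the tuples consist of (delay time of train1, number of times)
X, Y = zip(*in_time)
X2, Y2 = zip(*too_late)
bar_width = 0.9
plt.bar(X, Y, bar_width, color="blue", alpha=0.75, label="in time")
bar_width = 0.8
plt.bar(X2, Y2, bar_width, color="red", alpha=0.75, label="too late")
plt.legend(loc='upper right')
plt.show()

in_time_dict = dict(in_time)
too_late_dict = dict(too_late)

def catch_the_train(minutes):
    # fraction of the recorded cases with this delay that were in time
    s = in_time_dict.get(minutes, 0)
    if s == 0:
        return 0
    else:
        m = too_late_dict.get(minutes, 0)
        return s / (s + m)

# print the estimate for delays from -1 to 12 minutes
for minutes in range(-1, 13):
    print(minutes, catch_the_train(minutes))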
OUTPUT:
-1 0
0 1.0
1 1.0
2 1.0
3 1.0
4 1.0
5 1.0
6 0.6
7 0.4375
8 0.25
9 0.15
10 0.14285714285714285
11 0.11764705882352941
12 0
RESULT:
Thus the program to implement the Naïve Bayes classifier has been verified and executed
successfully.