
21ECE305J – Machine Learning Algorithms

A Laboratory Record

Submitted by

Register No.:
Name:

in partial fulfillment for the award of the degree of

BACHELOR OF TECHNOLOGY
IN

ELECTRONICS AND COMMUNICATION ENGINEERING

DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING

FACULTY OF ENGINEERING AND TECHNOLOGY
SRM INSTITUTE OF SCIENCE AND TECHNOLOGY
VADAPALANI CAMPUS, CHENNAI – 26

OCTOBER 2024

BONAFIDE CERTIFICATE
Register No: Date:

Certified that this laboratory manual is the bonafide record of _________, III Year, B.Tech Electronics and Communication Engineering, who carried out the laboratory work of Subject Code/Title: 21ECE305J – MACHINE LEARNING ALGORITHMS under my supervision in the academic year 2024-2025.

Date: Faculty-in-Charge Head of the Department

Submitted for University Examination held in _________ of


SRM INSTITUTE OF SCIENCE AND TECHNOLOGY

Date: Examiner -I Examiner -II

Index

Ex No Date Experiment Sign


1 Linear Regression

2 Logistic Regression

3 K-Fold

4 Support Vector Machine

5 K-Nearest Neighbor

6 K-Means Clustering

7 Hierarchical Clustering

8 Principal Component Analysis

EX NO: 1

Implementation of Linear Regression


DATE:

Aim: To perform linear regression on sample datasets.

Software Required: Google Colab, Python IDE, Jupyter Notebook


Program:

A) Linear regression using random data sample

import numpy as np

import matplotlib.pyplot as plt

np.random.seed(0)

m = 50 # creating 50 samples

X = np.linspace(0,10,m).reshape(m,1)

y = X + np.random.randn(m,1)

print(X)

len(X)

len(y)

plt.scatter(X,y)

from sklearn.linear_model import LinearRegression

model = LinearRegression()

model.fit(X,y)

print("Accuracy:", model.score(X,y))

Prediction = model.predict(X)

plt.scatter(X,y)

plt.plot(X,Prediction,'r')

from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# metrics must compare the true targets y with the predictions, not the inputs X
mae = mean_absolute_error(y, Prediction)

mse = mean_squared_error(y, Prediction)

rmse = np.sqrt(mse)

r_squared = r2_score(y, Prediction)

print("Mean Absolute Error (MAE)=", mae)

print("Mean Squared Error (MSE)=", mse)

print("R-squared (R-squ)=", r_squared)

print("Root Mean Squared Error (RMSE)=", rmse)


Output:
Accuracy: 0.8539729126652087

Mean Absolute Error (MAE)= 0.37381532213368635

Mean Squared Error (MSE)= 0.19259123079935717

R-squared (R-squ)= 0.977795363978427

Root Mean Squared Error (RMSE)= 0.4388521741991911
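Because the samples were generated as y = x + noise, a useful extra check is to read the fitted parameters directly from the model: the slope should come out near 1 and the intercept near 0. A minimal sketch, reusing the same seed and data generation as above:

import numpy as np
from sklearn.linear_model import LinearRegression

np.random.seed(0)
m = 50
X = np.linspace(0, 10, m).reshape(m, 1)
y = X + np.random.randn(m, 1)
model = LinearRegression().fit(X, y)
print("slope:", model.coef_[0][0])        # expected close to 1, since y = x + noise
print("intercept:", model.intercept_[0])  # expected close to 0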

B) Linear regression on salary dataset

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt

data=pd.read_csv("Salary_Data.csv")

print(data)

data=data.dropna() #Remove NAN whole row

print(data)

len(data)

x=data[["YearsExperience"]]

y=data[["Salary"]]

plt.scatter(x,y) #plotting experience & salary data

from sklearn.model_selection import train_test_split

x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2) #splitting training & testing data

print(len(y_train)) #training dataset (80%)

len(x_train)

print(len(x_test)) #testing dataset(20%)

len(x_test)

from sklearn.linear_model import LinearRegression

model = LinearRegression()

model.fit(x_train,y_train) #training experience & salary

predict=model.predict(x_test) #predicting salary by giving testset of experience

print(predict) #predicted value

y_test #original value

plt.scatter(x_test,y_test)

plt.plot(x_test,predict,'r')#plotting graph btw predicted and original value

from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

mae=mean_absolute_error(y_test, predict)

mse=mean_squared_error(y_test, predict)
rmse=np.sqrt(mse)

r2=r2_score(y_test, predict)

print("MAE:",mae)

print("MSE:",mse)

print("RMSE:",rmse)

print("R2:",r2)

Result: Thus linear regression on the sample datasets was evaluated successfully

Output:
MAE: 5950.500000000003

MSE: 47645779.63765114

RMSE: 6902.592240430485

R2: 0.9124538165906179
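Under the hood, LinearRegression solves a least-squares problem whose closed form is the normal equation theta = (X^T X)^(-1) X^T y. A minimal sketch comparing the two on synthetic data (synthetic because Salary_Data.csv is a local file; the coefficients 2.5 and 7 below are arbitrary assumptions, not from the dataset):

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(30, 1))        # stand-in "experience" values
y = 2.5 * x + 7 + rng.normal(size=(30, 1))  # stand-in "salary" with noise

Xb = np.hstack([np.ones_like(x), x])        # prepend a bias column of ones
theta = np.linalg.inv(Xb.T @ Xb) @ Xb.T @ y # normal equation: (X^T X)^-1 X^T y
print("normal equation (intercept, slope):", theta.ravel())

model = LinearRegression().fit(x, y)
print("scikit-learn    (intercept, slope):", model.intercept_[0], model.coef_[0][0])

The two results should agree to numerical precision, which is a handy way to connect the library call to the formula.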

EX NO: 2 Implementation of Logistic Regression

DATE:
Aim: To perform logistic regression and predict digits from the digits dataset

Software Required: Google Colab, Python IDE, Jupyter Notebook

Program:

from sklearn.datasets import load_digits

import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split

from sklearn.linear_model import LogisticRegression

digits=load_digits()

dir(digits)

len(digits.data[1])

plt.gray()

plt.matshow(digits.images[3])

digits.target[3]

x_train,x_test,y_train,y_test=train_test_split(digits.data,digits.target,test_size=0.2)

LGR=LogisticRegression(max_iter=30) # low iteration cap; scikit-learn may emit a ConvergenceWarning here

LGR.fit(x_train,y_train)

from sklearn.metrics import confusion_matrix

prediction=LGR.predict(x_test)

cm=confusion_matrix(y_test,prediction)

import seaborn as sn

plt.figure(figsize=(5,5))

sn.heatmap(cm,annot=True)

plt.xlabel('prediction')

plt.ylabel('Actual')

import numpy as np

correct_pred=np.trace(cm)

total_pred=np.sum(cm)
accuracy=correct_pred/total_pred

print(accuracy)

Result: Thus logistic regression for predicting digits from the digits dataset was implemented successfully

Output:
accuracy=0.9611111111111111
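As a sanity check beyond the confusion matrix, the trained classifier can be asked to label a single held-out image. A minimal sketch with a fixed random_state (so the exact numbers may differ from the run above):

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

digits = load_digits()
# fixed split for reproducibility; a generous max_iter avoids the ConvergenceWarning
x_train, x_test, y_train, y_test = train_test_split(digits.data, digits.target, test_size=0.2, random_state=0)
clf = LogisticRegression(max_iter=10000).fit(x_train, y_train)
print("predicted:", clf.predict(x_test[:1])[0], "actual:", y_test[0])
print("test accuracy:", clf.score(x_test, y_test))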

EX NO: 3 Implementation of K-Fold Cross-Validation

DATE:
Aim: To perform K-Fold cross-validation on sample datasets

Software Required: Google Colab, Python IDE, Jupyter Notebook

Program:

A) K-Fold sample program

from sklearn.model_selection import KFold

import numpy as np

data=np.arange(1,21)

data

k=5

kf=KFold(n_splits=5,shuffle=True,random_state=42)

for fold,(train_index,test_index) in enumerate(kf.split(data),1):
    # map positional indices to the underlying data values (data runs 1..20)
    train_index,test_index=data[train_index],data[test_index]
    print(f"Fold:{fold}")
    print(f"Train index:{train_index}")
    print(f"Test index:{test_index}")

Output:
Fold:1

Train index: [ 3 4 5 6 7 8 9 10 11 12 13 14 15 17 19 20]

Test index: [ 1 2 16 18]

Fold:2

Train index: [ 1 2 3 5 7 8 10 11 13 14 15 16 17 18 19 20]

Test index: [ 4 6 9 12]

Fold:3

Train index: [ 1 2 4 5 6 7 8 9 10 11 12 13 15 16 18 20]

Test index: [ 3 14 17 19]

Fold:4

Train index: [ 1 2 3 4 6 7 8 9 11 12 14 15 16 17 18 19]

Test index: [ 5 10 13 20]

Fold:5

Train index: [ 1 2 3 4 5 6 9 10 12 13 14 16 17 18 19 20]

Test index: [ 7 8 11 15]

B) K-Fold on Linear Regression Using Random Dataset


from sklearn.model_selection import KFold

from sklearn.linear_model import LinearRegression

from sklearn.metrics import mean_squared_error

import numpy as np

np.random.seed(42)

x=2*np.random.randn(100,1)

y=4+3*x+np.random.randn(100,1)

model=LinearRegression()

kf=KFold(n_splits=5,shuffle=True,random_state=42)

mse_scores=[]

fold_index=1

for train_index,test_index in kf.split(x):
    print(f"fold:{fold_index}")
    print(f"Train index:{train_index}")
    print(f"Test index:{test_index}")
    X_train,X_test=x[train_index],x[test_index]
    Y_train,Y_test=y[train_index],y[test_index]
    model.fit(X_train,Y_train)
    y_pred=model.predict(X_test)
    mse=mean_squared_error(Y_test,y_pred)
    mse_scores.append(mse)
    fold_index +=1

mean_mse=np.mean(mse_scores)

std_mse=np.std(mse_scores)

print(f"MSE SCORE:{mse_scores}")

print(f"MEAN MASE:{mean_mse}")

print(f"Standard deviation:{std_mse}")

Output:
MSE SCORE:[]

MEAN MSE:0.7479020958894542

Standard Deviation:0.0
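The per-fold MSE values above can also be obtained with cross_val_score and its scoring parameter. A minimal sketch (scikit-learn returns negated MSE under its higher-is-better convention, so the sign is flipped back):

from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LinearRegression
import numpy as np

np.random.seed(42)
x = 2 * np.random.randn(100, 1)
y = (4 + 3 * x + np.random.randn(100, 1)).ravel()  # 1-D targets, as cross_val_score prefers

kf = KFold(n_splits=5, shuffle=True, random_state=42)
neg_mse = cross_val_score(LinearRegression(), x, y, cv=kf, scoring='neg_mean_squared_error')
print("MSE per fold:", -neg_mse)
print("mean MSE:", (-neg_mse).mean())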

C) K-Fold on Logistic Regression Using Digits Dataset


from sklearn.model_selection import KFold, cross_val_score

from sklearn.linear_model import LogisticRegression

from sklearn.datasets import load_digits

from sklearn.metrics import accuracy_score

import numpy as np

digits = load_digits()

X = digits.data

Y = digits.target

model = LogisticRegression(max_iter=10000)

kf = KFold(n_splits=5, shuffle=True, random_state=42)

accuracy_scores = []

for train_index, test_index in kf.split(X):
    print(f"Train Index: {train_index}")
    print(f"Test Index: {test_index}")
    X_train, X_test = X[train_index], X[test_index]
    Y_train, Y_test = Y[train_index], Y[test_index]
    model.fit(X_train,Y_train)
    Y_pred=model.predict(X_test)
    # Calculate accuracy for this fold
    accuracy = accuracy_score(Y_test, Y_pred)
    accuracy_scores.append(accuracy)

mean_accuracy=np.mean(accuracy_scores)

std_accuracy=np.std(accuracy_scores)

print(f"Accuracy Scores for each fold: {accuracy_scores}")

print(f"Mean Accuracy: {mean_accuracy}")

print(f"Standard Deviation of Accuracy: {std_accuracy}")

Result: Thus K-Fold cross-validation on linear and logistic regression was evaluated successfully

Output:
Accuracy Scores for each fold: [0.9526462395543176]

Mean Accuracy: 0.9526462395543176

Standard Deviation of Accuracy: 0.0
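Note that the program above imports cross_val_score but never uses it; that helper condenses the entire manual fold loop into one call. A minimal equivalent sketch:

from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
import numpy as np

digits = load_digits()
kf = KFold(n_splits=5, shuffle=True, random_state=42)
# cross_val_score runs the fit/predict/score loop internally, returning one score per fold
scores = cross_val_score(LogisticRegression(max_iter=10000), digits.data, digits.target, cv=kf)
print("fold accuracies:", scores)
print("mean:", scores.mean(), "std:", scores.std())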

EX NO: 4 Implementation of Support Vector Machine (SVM)


DATE:

Aim: To perform SVM on Digits Dataset

Software Required: Google Colab, Python IDE, Jupyter Notebook

Program:

from sklearn.datasets import load_digits

import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split

from sklearn.svm import SVC

digits=load_digits()

dir(digits)

len(digits.data[1])

plt.gray()

plt.matshow(digits.images[3])

digits.target[3]

x_train,x_test,y_train,y_test=train_test_split(digits.data,digits.target,test_size=0.2)

svm=SVC()

svm.fit(x_train,y_train)

from sklearn.metrics import confusion_matrix

prediction=svm.predict(x_test)

cm=confusion_matrix(y_test,prediction)

import seaborn as sn

plt.figure(figsize=(5,5))

sn.heatmap(cm,annot=True)

plt.xlabel('prediction')

plt.ylabel('Actual')

import numpy as np
correct_pred=np.trace(cm)

total_pred=np.sum(cm)

accuracy=correct_pred/total_pred

print(accuracy)

Result: Thus SVM on the digits dataset has been implemented successfully

Output:
Accuracy=0.9805555555555555
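SVC defaults to an RBF kernel, which is what produced the accuracy above; the choice of kernel is a tunable hyperparameter. A minimal sketch comparing kernels on a fixed split (exact numbers will vary with the split):

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

digits = load_digits()
x_train, x_test, y_train, y_test = train_test_split(digits.data, digits.target, test_size=0.2, random_state=0)
# compare decision-boundary families; 'rbf' is the library default
for kernel in ['linear', 'poly', 'rbf']:
    svm = SVC(kernel=kernel).fit(x_train, y_train)
    print(kernel, "accuracy:", svm.score(x_test, y_test))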

EX NO: 5 Implementation of K-Nearest Neighbor (KNN) on the Titanic Dataset

DATE:

Aim: To perform KNN on seaborn's Titanic dataset

Software Required: Google Colab, Python IDE, Jupyter Notebook

Program:

import seaborn as sns

import pandas as pd

import matplotlib.pyplot as plt

from sklearn.neighbors import KNeighborsClassifier

import numpy as np

ship = sns.load_dataset('titanic')

ship.shape

ship = ship[['survived','pclass','sex','age']]

ship.dropna(axis=0,inplace=True)

ship['sex'].replace(['male','female'],[0,1],inplace=True)

ship.head()

knn = KNeighborsClassifier()

y = ship['survived']

X = ship.drop('survived',axis=1)

knn.fit(X,y)

knn.score(X,y)

def survivedPerson(knn,pclass=3,sex=1,age=30):
    x = np.array([pclass,sex,age]).reshape(1,3)
    print(knn.predict(x))

survivedPerson(knn)

Result: Thus KNN on the Titanic dataset has been implemented successfully

Output:
Accuracy = 0.8305322128851541

SurvivedPerson = [0]
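KNeighborsClassifier defaults to n_neighbors=5, and the 0.83 above is a training-set score, which is optimistic (k=1 typically scores near 1.0 on its own training data). A minimal sketch scanning k with the same preprocessing as the program above:

import seaborn as sns
from sklearn.neighbors import KNeighborsClassifier

# same features and cleaning as the program above
ship = sns.load_dataset('titanic')[['survived', 'pclass', 'sex', 'age']].dropna()
ship['sex'] = ship['sex'].map({'male': 0, 'female': 1})
X, y = ship.drop('survived', axis=1), ship['survived']
for k in [1, 3, 5, 7, 9]:
    knn = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    print("k =", k, "training score:", knn.score(X, y))

A fairer comparison would score each k on a held-out split, as in the earlier experiments.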

EX NO: 6 Implementation of KMeans on Make Blobs Dataset


DATE:

Aim: To perform K-Means clustering on a make_blobs dataset

Software Required: Google Colab, Python IDE, Jupyter Notebook

Program:

import numpy as np

import matplotlib.pyplot as plt

from sklearn.cluster import KMeans

from sklearn.datasets import make_blobs

# 1. Generate synthetic data (or use your own dataset)

X, y = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)

# 2. Apply KMeans Clustering

kmeans = KMeans(n_clusters=4, n_init=10) # explicit n_init keeps behaviour stable across scikit-learn versions

kmeans.fit(X)

# 3. Get the cluster labels and centroids

labels = kmeans.labels_

centroids = kmeans.cluster_centers_

# 4. Plot the clusters

plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis', s=50)

plt.scatter(centroids[:, 0], centroids[:, 1], c='red', s=200, marker='X')

plt.title('K-Means Clustering (K=4)')

plt.xlabel('Feature 1')

plt.ylabel('Feature 2')

plt.show()
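Here K=4 is known because the blobs were generated with four centers; on real data K has to be chosen. A common heuristic is the elbow method, sketched below: plot KMeans' inertia_ (the within-cluster sum of squared distances) against K and look for the bend.

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)
ks = range(1, 9)
# inertia drops sharply until K reaches the true number of clusters, then flattens
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_ for k in ks]
plt.plot(ks, inertias, 'o-')
plt.xlabel('K')
plt.ylabel('Inertia')
plt.title('Elbow Method')
plt.show()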

Result: KMeans on Make Blobs Dataset has been implemented successfully

Output:
EX NO: 7 Implementation of Hierarchical Clustering

DATE:

Aim: To perform Hierarchical Clustering on blobs dataset

Software Required: Google Colab, Python IDE, Jupyter Notebook

Program:

import numpy as np

import pandas as pd

from sklearn.datasets import make_blobs

from sklearn.cluster import AgglomerativeClustering

from scipy.cluster.hierarchy import dendrogram, linkage

import matplotlib.pyplot as plt

X, y = make_blobs(n_samples=200, centers=4, cluster_std=0.60, random_state=0)

model = AgglomerativeClustering(n_clusters=4,linkage='ward')

labels = model.fit_predict(X)

plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='rainbow')

plt.title('Hierarchical Clustering')

plt.show()

Z = linkage(X, method='ward')

plt.figure(figsize=(10, 7))

plt.title("Dendrogram for Hierarchical Clustering")

dendrogram(Z)

plt.show()
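The dendrogram and AgglomerativeClustering describe the same ward tree, so cutting the linkage at four clusters should recover essentially the same grouping. A minimal sketch using scipy's fcluster:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=4, cluster_std=0.60, random_state=0)
Z = linkage(X, method='ward')
labels = fcluster(Z, t=4, criterion='maxclust')  # cut the dendrogram into 4 flat clusters
print("cluster sizes:", np.bincount(labels)[1:])  # fcluster labels start at 1, so drop bin 0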

Result: Hierarchical Clustering on the Blobs Dataset has been implemented successfully

Output:
EX NO: 8 Implementation of Principal Component Analysis (PCA)
DATE:

Aim: To perform Principal Component Analysis on Wholesale Customer dataset.

Software Required: Google Colab, Python IDE, Jupyter Notebook

Program:

import pandas as pd

from sklearn.preprocessing import StandardScaler

from sklearn.decomposition import PCA

file_path = '/content/Wholesale customers data.csv'

data = pd.read_csv(file_path)

features = data.drop(columns=['Channel', 'Region'])

scaler = StandardScaler()

scaled_features = scaler.fit_transform(features)

pca = PCA(n_components=2)

pca_transformed = pca.fit_transform(scaled_features)

explained_variance = pca.explained_variance_ratio_

pca_df = pd.DataFrame(pca_transformed, columns=['PC1', 'PC2'])

pca_df['Channel'] = data['Channel']

print("PCA Results:")

print(pca_df.head())

print(f"PC1: {explained_variance[0]:.2f}, PC2: {explained_variance[1]:.2f}")

import matplotlib.pyplot as plt

plt.figure(figsize=(8, 6))

colors = {1: 'red', 2: 'blue'}

plt.scatter(pca_df['PC1'], pca_df['PC2'], c=pca_df['Channel'].map(colors), alpha=0.5)

plt.xlabel('Principal Component 1')

plt.ylabel('Principal Component 2')

plt.title('PCA of Wholesale Customers Data')

plt.show()
Result: Thus Principal Component Analysis on the Wholesale Customers dataset has been implemented successfully

Output:
PCA Results:

PC1 PC2 Channel

0 0.193291 -0.305100 2

1 0.434420 -0.328413 2

2 0.811143 0.815096 2

3 -0.778648 0.652754 1

4 0.166287 1.271434 2

Explained Variance:

PC1: 0.44, PC2: 0.28
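Here the first two components explain about 0.44 + 0.28 = 72% of the variance. The cumulative explained-variance curve is the usual way to pick n_components; a hedged sketch on synthetic data (synthetic because the wholesale CSV is a local file, so the numbers below are illustrative only):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# correlated 6-feature data standing in for the six spending columns of the CSV
data = rng.normal(size=(200, 6)) @ rng.normal(size=(6, 6))
scaled = StandardScaler().fit_transform(data)
pca = PCA().fit(scaled)  # keep all components so the full variance spectrum is visible
cumulative = np.cumsum(pca.explained_variance_ratio_)
print("cumulative explained variance:", np.round(cumulative, 2))
n_keep = int(np.argmax(cumulative >= 0.90)) + 1  # smallest count reaching 90%
print("components for 90% variance:", n_keep)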
