
FACULTY OF ENGINEERING AND TECHNOLOGY
BACHELOR OF TECHNOLOGY

Machine Learning Laboratory

(303105354)

VI SEMESTER

Computer Science & Engineering

Laboratory Manual

Session: 2024-25

PREFACE

It gives us immense pleasure to present the first edition of the Machine Learning Laboratory
Practical Book for the B.Tech. 6th Semester students of PARUL UNIVERSITY.

The ML theory and laboratory courses at PARUL UNIVERSITY, WAGHODIA,
VADODARA are designed in such a way that students develop a basic understanding
of the subject in the theory classes and then try their hands on the experiments to realize
the various implementations of problems learnt during the theoretical sessions. The main
objective of the ML laboratory course is: Learning ML through Experimentation. All
the experiments are designed to illustrate various problems in different areas of ML and
also to expose the students to various applications.

The objective of this ML Practical Book is to provide a comprehensive source for all the
experiments included in the ML laboratory course. It explains all the aspects related to
every experiment, such as the basic underlying concept and how to analyze a problem. It
also gives sufficient information on how to interpret and discuss the obtained results.

We acknowledge the authors and publishers of all the books which we have consulted
while developing this Practical Book. We hope this ML Practical Book will serve the
purpose for which it has been developed.


INSTRUCTIONS TO STUDENTS

1. The main objective of the ML laboratory is: Learning through Experimentation.
All the experiments are designed to illustrate various problems in different areas of
ML and also to expose the students to various problems and their uses.
2. Be prompt in arriving to the laboratory and always come well prepared for the practical.
3. Every student should have his/her individual copy of the ML Practical Book.
4. Every student has to prepare the notebooks specifically reserved for the ML
practical work.
5. Every student has to necessarily bring his/her ML Practical Book, ML Practical
Class Notebook and ML Practical Final Notebook.
6. Finally, find the output of the experiments along with the problem and note the results in the
ML Practical Notebook.
7. The grades for the ML practical course work will be awarded based on the student's
performance in the laboratory, regularity, recording of experiments in the ML
Practical Final Notebook, lab quiz, regular viva voce and the end term examination.


INDEX OF EXPERIMENTS
(Columns in the original register: Sr. No | Experiment Title | Page No From–To | Start Date | Completion Date | Sign | Marks/10; dates, signs and marks are filled in by the student.)

1. Dealing with Data using Numpy, Pandas, Statistics library
2. Data Analysis & Visualization on Diwali Sales Dataset
3. Implement linear regression and logistic regression
4. Implement the naïve Bayesian classifier for a sample training data set stored as a
   .CSV file. Compute the accuracy of the classifier, considering a few test data sets.
5. Assuming a set of documents that need to be classified, use the naïve Bayesian
   Classifier model to perform this task.
6. Decision tree-based ID3 algorithm
7. Write a program to implement the K-Nearest Neighbor algorithm to classify the
   iris data set
8. Apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same
   data set for clustering using k-Means algorithm.
9. Write a program to construct a Bayesian network considering medical data. Use
   this model to demonstrate the diagnosis of heart patients using standard Heart
   Disease Data Set.
10. Compare the various supervised learning algorithms by using an appropriate
    dataset. (Linear Regression, Support Vector Machine, Decision Tree)
11. Compare the various unsupervised learning algorithms by using the appropriate
    datasets. (K Means Clustering, K Mode)
12. Build an Artificial Neural Network by implementing the Backpropagation
    algorithm and test the same using appropriate data sets


Experiment 1
AIM: Dealing with Data using Numpy, Pandas, Statistics library

CODE:

import statistics as st
a=[20, 10, 50, 10, 21, 90]
s=st.mode(a)
print("Mode:",s)
m=st.mean(a)
print("Mean:",m)

import numpy as np
b=np.array([[10, 20, 30, 50, 10],[1, 2, 3, 5, 6]])
print(b)
print("Dim:",b.ndim)

b=np.array([[10, 20, 30, 50, 10],[1, 2, 3, 5, 6],[1,1,1,1,1]])


print(b)
print("Dim:",b.ndim)
print("Shape:",b.shape)
print("Size:",b.size)
print("Item Size:",b.itemsize)
print("Data type:",b.dtype)

c=np.empty((5,5))    # uninitialized 5x5 array (contents are arbitrary)
c
f = np.full((3, 3), 5)    # 3x3 array filled with the value 5
f
d=np.arange(0,100,2)    # even numbers from 0 to 98


d
e=np.reshape(d,(10,5))
t=np.reshape(e,(2,5,5))
print(t)
print("Last element:",d[10])
print("ELement upto 10:",d[:10])
print("Last 5 element:",d[-5:])
print("Alternative Numbers:",d[::2])

import pandas as pd
p_dict={'pid':[1,2,3,4,5],'value':[10,20,30,40,50]}
p_dict
grt=pd.DataFrame(p_dict)
grt
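
As a small optional extension (a sketch, not part of the original listing), the statistics library also offers median and standard deviation, and a pandas DataFrame can summarize itself; this reuses the list a and the DataFrame grt defined above:

# Extra descriptive statistics (sketch; reuses a and grt from above)
print("Median:", st.median(a))
print("Std Dev:", st.stdev(a))
print(grt.describe())          # count, mean, std, min, quartiles, max per column
print("Sum of value column:", grt['value'].sum())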

OUTPUT:


Experiment 2

AIM: Data Analysis & Visualization on Diwali Sales Dataset.

CODE:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load Dataset
df = pd.read_csv('diwali_sales.csv', encoding='utf-8')

# Display basic information


print(df.info())
print(df.head())

# Data Cleaning
df.dropna(inplace=True) # Remove missing values
df.drop_duplicates(inplace=True)

# Convert necessary columns to appropriate data types


df['Amount'] = pd.to_numeric(df['Amount'], errors='coerce')

# Exploratory Data Analysis (EDA)


plt.figure(figsize=(10, 5))
sns.countplot(x='Gender', data=df, palette='coolwarm')
plt.title('Sales by Gender')
plt.show()

plt.figure(figsize=(12, 6))
sns.barplot(x='State', y='Amount', data=df, estimator=sum, palette='viridis')
plt.xticks(rotation=90)
plt.title('Total Sales by State')
plt.show()

plt.figure(figsize=(10, 5))
sns.countplot(x='Age Group', data=df, palette='Set2')
plt.title('Sales by Age Group')
plt.show()

plt.figure(figsize=(12, 6))
sns.barplot(x='Category', y='Amount', data=df, estimator=sum, palette='magma')
plt.xticks(rotation=45)
plt.title('Sales by Product Category')


plt.show()

# Revenue Trends
df['Order Date'] = pd.to_datetime(df['Order Date'], errors='coerce')
df['Month'] = df['Order Date'].dt.month
plt.figure(figsize=(10, 5))
sns.lineplot(x='Month', y='Amount', data=df, estimator=sum, marker='o', color='b')
plt.title('Revenue Trend Over Months')
plt.show()

print("Analysis Complete.")

OUTPUT:


Experiment 3
AIM: Implement linear regression and logistic regression.

CODE:

3.1 Linear Regression


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

data = {
'Size': [1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400],
'Price': [300000, 310000, 360000, 350000, 390000, 400000, 420000, 430000, 480000,
470000]
}
#print(data)
# Convert to DataFrame
df = pd.DataFrame(data)
df

#plot the data


plt.scatter(df['Size'], df['Price'], color='blue')
plt.xlabel('Size (sq ft)')
plt.ylabel('Price ($)')
plt.title('House Prices vs Size')
plt.show()

from sklearn.model_selection import train_test_split


X = df[['Size']]
y = df['Price']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


#Train the Linear Regression Model

from sklearn.linear_model import LinearRegression


model = LinearRegression()
model.fit(X_train, y_train)
#Make Predictions
y_pred = model.predict(X_test)

#Evaluate the Model


from sklearn.metrics import mean_squared_error, r2_score
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)


print(f'Mean Squared Error: {mse}')


print(f'R^2 Score: {r2}')

#Visualize the Regression Line

plt.scatter(X_test, y_test, color='blue', label='Actual')
plt.plot(X_test, y_pred, color='red', linewidth=2, label='Predicted')
plt.xlabel('Size (sq ft)')
plt.ylabel('Price ($)')
plt.title('Actual vs Predicted Prices')
plt.legend()
plt.show()

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.datasets import fetch_california_housing

# Load the California housing dataset

housing = fetch_california_housing()
# Create a DataFrame
df = pd.DataFrame(housing.data, columns=housing.feature_names)
df['PRICE'] = housing.target

# Explore the data


print(df.head(15))
print(df.describe())
print(df.corr())

# Visualize the data

plt.scatter(df['AveRooms'], df['PRICE'])
plt.xlabel('Average number of rooms per household (AveRooms)')
plt.ylabel('House Price')
plt.title('House Price vs AveRooms')
plt.show()

# Split the data

X = df[['AveRooms']]
y = df['PRICE']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the model


model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)


# Evaluate the model


mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f'Mean Squared Error: {mse}')
print(f'R^2 Score: {r2}')

# Visualize the regression line


plt.scatter(X_test, y_test, color='blue', label='Actual')
plt.plot(X_test, y_pred, color='red', linewidth=2, label='Predicted')
plt.xlabel('Average number of rooms per household (AveRooms)')
plt.ylabel('House Price')
plt.title('Actual vs Predicted Prices')
plt.legend()
plt.show()

OUTPUT:


3.2 Logistic Regression

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target
y
y_binary = (y > np.median(y)).astype(int)
y_binary
X_train, X_test, y_train, y_test = train_test_split(X, y_binary, test_size=0.2,
random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)


model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: {:.2f}%".format(accuracy * 100))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))

OUTPUT:


Experiment 4

AIM: Implement the naïve Bayesian classifier for a sample training


data set stored as a .CSV file. Compute the accuracy of the classifier,
considering a few test data sets.

CODE:
# load the iris dataset
from sklearn.datasets import load_iris
iris = load_iris()

# store the feature matrix (X) and response vector (y)


X = iris.data
y = iris.target

# splitting X and y into training and testing sets


from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)

# training the model on training set


from sklearn.naive_bayes import GaussianNB
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# making predictions on the testing set


y_pred = gnb.predict(X_test)

# comparing actual response values (y_test) with predicted response values (y_pred)
from sklearn import metrics
print("Gaussian Naive Bayes model accuracy(in %):", metrics.accuracy_score(y_test,
y_pred)*100)
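
The AIM asks for a training set stored as a .CSV file; a minimal sketch of the same workflow reading from CSV is given below. The file name 'train_data.csv' and the assumption that the last column holds the class label are illustrative, not part of the original:

# Sketch: same classifier, reading from a hypothetical CSV file
import pandas as pd
df = pd.read_csv('train_data.csv')       # hypothetical file name
X_csv = df.iloc[:, :-1].values           # all columns except the last are features
y_csv = df.iloc[:, -1].values            # last column assumed to be the label
Xtr, Xte, ytr, yte = train_test_split(X_csv, y_csv, test_size=0.4, random_state=1)
gnb_csv = GaussianNB().fit(Xtr, ytr)
print("CSV model accuracy (in %):", metrics.accuracy_score(yte, gnb_csv.predict(Xte)) * 100)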

OUTPUT:


Experiment 5

AIM: Assuming a set of documents that need to be classified, use the


naïve Bayesian Classifier model to perform this task.

CODE:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn import metrics

# Load Dataset
msg = pd.read_csv('naivetext.csv', names=['message', 'label'], encoding='utf-8')

# Check dataset dimensions


print('The dimensions of the dataset:', msg.shape)

# Mapping labels to numerical values


msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})

# Handle missing values (if any)


msg.dropna(inplace=True)

# Features & Target


X = msg.message
y = msg.labelnum

# Splitting Data (Stratified to maintain label balance)


xtrain, xtest, ytrain, ytest = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

print('\nThe total number of Training Data:', ytrain.shape[0])


print('The total number of Test Data:', ytest.shape[0])

# Vectorizing Text Data


count_vect = CountVectorizer()
xtrain_dtm = count_vect.fit_transform(xtrain)
xtest_dtm = count_vect.transform(xtest)

print('\nThe words or tokens in the text documents:')


print(count_vect.get_feature_names_out())

# Convert Sparse Matrix to DataFrame (Optional)


df = pd.DataFrame(xtrain_dtm.toarray(), columns=count_vect.get_feature_names_out())


# Train Naive Bayes Model


clf = MultinomialNB().fit(xtrain_dtm, ytrain)

# Make Predictions
predicted = clf.predict(xtest_dtm)

# Model Evaluation
accuracy = metrics.accuracy_score(ytest, predicted)
precision = metrics.precision_score(ytest, predicted)
recall = metrics.recall_score(ytest, predicted)
f1 = metrics.f1_score(ytest, predicted)

print("\nModel Performance Metrics:")


print(f"Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1-Score: {f1:.4f}")

# Confusion Matrix
print('\nConfusion Matrix:')
print(metrics.confusion_matrix(ytest, predicted))
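
To classify a new, unseen document with the trained model (an optional sketch; the sample sentence is illustrative), transform it with the same vectorizer before predicting:

# Predict the label of a new document (sketch)
new_doc = ["I love this amazing place"]
new_dtm = count_vect.transform(new_doc)
print("Predicted label (1=pos, 0=neg):", clf.predict(new_dtm)[0])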

OUTPUT:


Experiment 6
AIM: Decision tree-based ID3 algorithm.

CODE

#Step-1: Import python libraries.


import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn import metrics

#Step-2: Import IRIS Dataset


dataset = datasets.load_iris()
X = dataset.data
y = dataset.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

#Step-3: Train the decision tree classifier (criterion='entropy' approximates ID3's information gain).


clf = DecisionTreeClassifier(criterion='entropy', random_state=1)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
accuracy = metrics.accuracy_score(y_test, y_pred)

#Step-4: Plot the Confusion Matrix.


confusion_matrix = metrics.confusion_matrix(y_test, y_pred)
print("Accuracy:", accuracy)
print("Confusion Matrix:")
labels = dataset.target_names
sns.heatmap(confusion_matrix, annot=True, fmt="d", xticklabels=labels,
yticklabels=labels, cmap="Blues", cbar=False)
plt.xlabel("Predicted")


plt.ylabel("True")
plt.title("Confusion Matrix")
plt.show()
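
Optionally (a sketch, not part of the original listing), the learned tree itself can be drawn with scikit-learn's plot_tree, reusing the fitted clf and the dataset loaded above:

# Visualize the fitted decision tree (sketch; reuses clf and dataset)
from sklearn.tree import plot_tree
plt.figure(figsize=(12, 8))
plot_tree(clf, feature_names=dataset.feature_names,
          class_names=list(dataset.target_names), filled=True)
plt.show()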

OUTPUT


Experiment 7
AIM: Write a program to implement the K-Nearest Neighbor algorithm
to classify the iris data set.

CODE:

import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
print(iris.feature_names)
print(iris.target_names)

df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['target'] = iris.target   # add the class label column
df.head()

print(iris.target)   # 150 labels: 0, 1, 2 (50 per class)

df[df.target==1].head()


df[df.target==2].head()

df['flower_name'] = df.target.apply(lambda x: iris.target_names[x])
df.head()

df[45:55]

df0 = df[:50]
df1 = df[50:100]
df2 = df[100:]

import matplotlib.pyplot as plt
%matplotlib inline

plt.xlabel('Sepal Length')
plt.ylabel('Sepal Width')
plt.scatter(df0['sepal length (cm)'], df0['sepal width (cm)'], color="green", marker='+')
plt.scatter(df1['sepal length (cm)'], df1['sepal width (cm)'], color="blue", marker='.')


plt.xlabel('Petal Length')
plt.ylabel('Petal Width')
plt.scatter(df0['petal length (cm)'], df0['petal width (cm)'], color="green", marker='+')
plt.scatter(df1['petal length (cm)'], df1['petal width (cm)'], color="blue", marker='.')

from sklearn.model_selection import train_test_split

X = df.drop(['target', 'flower_name'], axis='columns')
y = df.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
len(X_train)
len(X_test)

from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=10)
knn.fit(X_train, y_train)
knn.score(X_test, y_test)
knn.predict([[4.8, 3.0, 1.5, 0.3]])

from sklearn.metrics import confusion_matrix
y_pred = knn.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
cm
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sn

plt.figure(figsize=(7,5))
sn.heatmap(cm, annot=True)
plt.xlabel('Predicted')
plt.ylabel('Truth')

from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred))
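
A quick way to choose n_neighbors (an optional sketch, not in the original listing) is to score several values of k on the same split:

# Sketch: compare test accuracy for several k values (reuses the split above)
for k in range(1, 16):
    acc_k = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train).score(X_test, y_test)
    print(f"k={k:2d}  accuracy={acc_k:.3f}")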


Experiment 8
AIM: Apply EM algorithm to cluster a set of data stored in a
.CSV file. Use the same data set for clustering using k-Means
algorithm.
CODE:

from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

dataset = load_iris()
X = pd.DataFrame(dataset.data)
X.columns = ['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width']
y = pd.DataFrame(dataset.target)
y.columns = ['Targets']

plt.figure(figsize=(14,7))
colormap = np.array(['red', 'lime', 'black'])

plt.subplot(1,3,1)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y.Targets], s=40)
plt.title('Real')
plt.show()


K-PLOT

plt.subplot(1,3,2)
model = KMeans(n_clusters=3)
model.fit(X)
predY = np.choose(model.labels_, [0,1,2]).astype(np.int64)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[predY], s=40)
plt.title('KMeans')
plt.show()
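
The AIM also calls for the EM algorithm; scikit-learn's GaussianMixture implements EM for Gaussian mixtures. Below is a minimal sketch for the third subplot, reusing X and colormap from above:

# EM clustering via a Gaussian Mixture Model (sketch; reuses X and colormap)
from sklearn.mixture import GaussianMixture
gmm = GaussianMixture(n_components=3, random_state=42)
gmm_labels = gmm.fit_predict(X)
plt.subplot(1,3,3)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[gmm_labels], s=40)
plt.title('EM (GMM)')
plt.show()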


Experiment 9

AIM: Write a program to construct a Bayesian network


considering medical data. Use this model to demonstrate the
diagnosis of heart patients using standard Heart Disease Data
Set.

CODE:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

data = pd.read_csv('/content/heart_statlog_cleveland_hungary_final.csv')
data.shape

data.info()

data.head()


data.describe()

data.hist(figsize=(20,16))


corr_matrix = data.corr()
plt.figure(figsize=(20,10))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Matrix')
plt.show()

data.isnull().sum()

data.duplicated().sum()


data["target"].value_counts()

x_norm=data.drop(['target'],axis=1) y=data['target']
from sklearn.preprocessing import
StandardScaler scaler=StandardScaler()
x=scaler.fit_transform(x_norm)
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=42)
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import classification_report
nb=GaussianNB() nb.fit(x_train,y_train)

y_pred=nb.predict(x_test) print(classification_report(y_test,y_pred))

from sklearn.metrics import accuracy_score,confusion_matrix


acc=accuracy_score(y_test,y_pred)*100
print("Accuracy",acc)


nb_cm = confusion_matrix(y_test, y_pred)
sns.heatmap(nb_cm, annot=True, fmt='d', cmap='Blues')
plt.title('Confusion Matrix')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.show()


Experiment 10

AIM: Compare the various supervised learning algorithms by using an
appropriate dataset. (Linear Regression, Support Vector Machine,
Decision Tree)

CODE:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

data = pd.read_csv('/content/heart_statlog_cleveland_hungary_final.csv')
data["target"].value_counts()

x_norm = data.drop(['target'], axis=1)
y = data['target']

# Scale data
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
x = scaler.fit_transform(x_norm)

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

# Gaussian Naïve Bayes
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import classification_report
nb = GaussianNB()
nb.fit(x_train, y_train)


y_pred = nb.predict(x_test)
print(classification_report(y_test, y_pred))

from sklearn.metrics import accuracy_score, confusion_matrix
acc = accuracy_score(y_test, y_pred) * 100
print("Accuracy=", acc)

nb_cm = confusion_matrix(y_test, y_pred)
sns.heatmap(nb_cm, annot=True, fmt='d', cmap='Blues')
plt.title('Confusion Matrix')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.show()

# Logistic Regression
from sklearn.linear_model import LogisticRegression
lr = LogisticRegression()
lr.fit(x_train, y_train)
y_pred_lr = lr.predict(x_test)
print(classification_report(y_test, y_pred_lr))


acc_lr = accuracy_score(y_test, y_pred_lr) * 100
print("Accuracy=", acc_lr)

lr_cm = confusion_matrix(y_test, y_pred_lr)
sns.heatmap(lr_cm, annot=True, fmt='d', cmap='Blues')
plt.title('Confusion Matrix for Logistic Regression')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.show()

# Decision Tree classifier
from sklearn.tree import DecisionTreeClassifier
dt = DecisionTreeClassifier()
dt.fit(x_train, y_train)

# Make predictions on the test set
y_pred_dt = dt.predict(x_test)

# Evaluate the model
print(classification_report(y_test, y_pred_dt))
acc_dt = accuracy_score(y_test, y_pred_dt) * 100
print("Accuracy=", acc_dt)


dt_cm = confusion_matrix(y_test, y_pred_dt)
sns.heatmap(dt_cm, annot=True, fmt='d', cmap='Blues')
plt.title('Confusion Matrix for Decision Tree')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.show()

# KNN classifier
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=5)  # You can adjust the number of neighbors
knn.fit(x_train, y_train)

# Make predictions on the test set
y_pred_knn = knn.predict(x_test)

# Evaluate the model
print(classification_report(y_test, y_pred_knn))

acc_knn = accuracy_score(y_test, y_pred_knn) * 100
print("Accuracy=", acc_knn)

knn_cm = confusion_matrix(y_test, y_pred_knn)
sns.heatmap(knn_cm, annot=True, fmt='d', cmap='Blues')
plt.title('Confusion Matrix for KNN')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.show()


# SVM classifier
from sklearn.svm import SVC
svm = SVC(kernel='linear', random_state=42)  # You can change the kernel (e.g., 'rbf', 'poly')
svm.fit(x_train, y_train)

# Make predictions on the test set
y_pred_svm = svm.predict(x_test)

# Evaluate the model
print(classification_report(y_test, y_pred_svm))

acc_svm = accuracy_score(y_test, y_pred_svm) * 100
print("Accuracy=", acc_svm)

svm_cm = confusion_matrix(y_test, y_pred_svm)
sns.heatmap(svm_cm, annot=True, fmt='d', cmap='Blues')
plt.title('Confusion Matrix for SVM')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.show()


model_accuracies = {
    'Naive Bayes': acc,
    'Logistic Regression': acc_lr,
    'Decision Tree': acc_dt,
    'KNN': acc_knn,
    'SVM': acc_svm
}
best_model = max(model_accuracies, key=model_accuracies.get)
best_accuracy = model_accuracies[best_model]
print(f"The best performing model is {best_model} with an accuracy of {best_accuracy:.2f}%")

model_data = {
    'Model': ['Naive Bayes', 'Logistic Regression', 'Decision Tree', 'KNN', 'SVM'],
    'Accuracy': [acc, acc_lr, acc_dt, acc_knn, acc_svm]
}
model_comparison = pd.DataFrame(model_data)
model_comparison

colors = ['skyblue', 'lightcoral', 'lightgreen', 'gold', 'plum']
plt.figure(figsize=(10, 6))
plt.bar(model_comparison['Model'], model_comparison['Accuracy'], color=colors)
plt.xlabel("Model")
plt.ylabel("Accuracy (%)")
plt.title("Model Comparison")
plt.ylim(0, 100)  # Set y-axis limit to 100%
plt.show()
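
A single 80/20 split can make the comparison noisy; as an optional refinement (not in the original), cross-validation averages accuracy over several folds using the same scaled features:

# Sketch: 5-fold cross-validated accuracy for each classifier (reuses x, y above)
from sklearn.model_selection import cross_val_score
for name, estimator in [('Naive Bayes', GaussianNB()),
                        ('Logistic Regression', LogisticRegression()),
                        ('Decision Tree', DecisionTreeClassifier()),
                        ('KNN', KNeighborsClassifier(n_neighbors=5)),
                        ('SVM', SVC(kernel='linear'))]:
    scores = cross_val_score(estimator, x, y, cv=5)
    print(f"{name}: {scores.mean()*100:.2f}% (+/- {scores.std()*100:.2f})")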


Experiment 11
AIM: Compare the various unsupervised learning algorithms by using
the appropriate datasets. (K Means Clustering, K Mode)

CODE:

from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from scipy.optimize import linear_sum_assignment
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

# Load the iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Apply K-Means clustering
kmeans = KMeans(n_clusters=3, random_state=42)
kmeans_labels = kmeans.fit_predict(X)

# Function to match predicted cluster labels to true labels
def cluster_labels_to_true_labels(y_true, y_pred):
    D = np.zeros((np.max(y_pred) + 1, np.max(y_true) + 1), dtype=np.int32)
    for i in range(y_pred.size):
        D[y_pred[i], y_true[i]] += 1
    row_ind, col_ind = linear_sum_assignment(D.max() - D)
    return col_ind

# Function to compute evaluation metrics
def compute_metrics(y_true, y_pred):
    y_pred_matched = cluster_labels_to_true_labels(y_true, y_pred)[y_pred]
    accuracy = accuracy_score(y_true, y_pred_matched)
    precision = precision_score(y_true, y_pred_matched, average='weighted')
    recall = recall_score(y_true, y_pred_matched, average='weighted')
    f1 = f1_score(y_true, y_pred_matched, average='weighted')
    return accuracy, precision, recall, f1

# Compute metrics for K-Means clustering
kmeans_metrics = compute_metrics(y, kmeans_labels)

# Plot the results
df = pd.DataFrame(X, columns=iris.feature_names)
df['Cluster'] = kmeans_labels
df['True Label'] = y

sns.pairplot(df, hue='Cluster', palette='viridis')
plt.suptitle('K-Means Clustering')
plt.show()


# Print evaluation metrics
print(f"Accuracy: {kmeans_metrics[0]:.4f}")
print(f"Precision: {kmeans_metrics[1]:.4f}")
print(f"Recall: {kmeans_metrics[2]:.4f}")
print(f"F1 Score: {kmeans_metrics[3]:.4f}")


Experiment 12

AIM: Build an Artificial Neural Network by implementing the


Backpropagation algorithm and test the same using appropriate
data sets.

CODE:

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Load dataset
iris = load_iris()
X = iris.data # Features
y = iris.target.reshape(-1, 1) # Labels

# One-hot encoding of labels


encoder = OneHotEncoder(sparse_output=False)
y_one_hot = encoder.fit_transform(y)

# Standardize input data


scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y_one_hot, test_size=0.2,
random_state=42)

# Define network architecture


input_size = X_train.shape[1] # 4 input neurons (Iris features)
hidden_size = 5 # Hidden layer with 5 neurons
output_size = y_train.shape[1] # 3 output neurons (Iris classes)
learning_rate = 0.01
epochs = 1000

# Initialize weights and biases


np.random.seed(42)
W1 = np.random.randn(input_size, hidden_size)
b1 = np.zeros((1, hidden_size))
W2 = np.random.randn(hidden_size, output_size)
b2 = np.zeros((1, output_size))

# Activation function and its derivative


def sigmoid(x):
return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
return x * (1 - x)

# Training loop
for epoch in range(epochs):
# Forward propagation
Z1 = np.dot(X_train, W1) + b1
A1 = sigmoid(Z1)
Z2 = np.dot(A1, W2) + b2
A2 = sigmoid(Z2)

# Compute loss (Mean Squared Error)


loss = np.mean((y_train - A2) ** 2)

# Backpropagation
dA2 = (A2 - y_train) * sigmoid_derivative(A2)
dW2 = np.dot(A1.T, dA2)
db2 = np.sum(dA2, axis=0, keepdims=True)

dA1 = np.dot(dA2, W2.T) * sigmoid_derivative(A1)


dW1 = np.dot(X_train.T, dA1)
db1 = np.sum(dA1, axis=0, keepdims=True)

# Update weights and biases


W2 -= learning_rate * dW2
b2 -= learning_rate * db2
W1 -= learning_rate * dW1
b1 -= learning_rate * db1

# Print loss every 100 epochs


if epoch % 100 == 0:
print(f"Epoch {epoch}, Loss: {loss:.4f}")

# Testing the ANN


Z1_test = np.dot(X_test, W1) + b1
A1_test = sigmoid(Z1_test)
Z2_test = np.dot(A1_test, W2) + b2
A2_test = sigmoid(Z2_test)

# Convert predictions to class labels


y_pred = np.argmax(A2_test, axis=1)
y_actual = np.argmax(y_test, axis=1)

# Accuracy calculation
accuracy = np.mean(y_pred == y_actual) * 100
print(f"Test Accuracy: {accuracy:.2f}%")


OUTPUT:
