MACHINE LEARNING (303105354)
BACHELOR OF TECHNOLOGY
VI SEMESTER
Laboratory Manual
Session: 2024-25
COMPUTER SCIENCE AND ENGINEERING
FACULTY OF ENGINEERING & TECHNOLOGY
ENROLLMENT NO: 2203051050512
PREFACE
It gives us immense pleasure to present the first edition of the Machine Learning Laboratory
Practical Book for the B.Tech. 6th Semester students of PARUL UNIVERSITY.
The objective of this ML Practical Book is to provide a comprehensive source for all the
experiments included in the ML laboratory course. It explains all the aspects of every
experiment, such as the basic underlying concepts and how to analyse a problem. It also
gives sufficient information on how to interpret and discuss the obtained results.
We acknowledge the authors and publishers of all the books which we consulted while
developing this Practical Book. We hope this ML Practical Book will serve the purpose
for which it has been developed.
INSTRUCTIONS TO STUDENTS
1. The main objective of the ML laboratory is learning through experimentation. All the
experiments are designed to illustrate various problems in different areas of ML and
also to expose the students to various problems and their uses.
2. Be prompt in arriving at the laboratory and always come well prepared for the practical.
3. Every student should have his/her individual copy of the ML Practical Book.
4. Every student has to prepare the notebooks specifically reserved for the ML practical
work: the "ML Practical Book".
5. Every student has to necessarily bring his/her ML Practical Book, ML Practical
Class Notebook and ML Practical Final Notebook.
6. Finally, find the output of the experiments along with the problem and note the results
in the ML Practical Notebook.
7. The grades for the ML practical course work will be awarded based on the student's
performance in the laboratory, regularity, recording of experiments in the ML
Practical Final Notebook, lab quizzes, regular viva voce and the end-term examination.
Experiment 1
AIM: Dealing with data using the NumPy, Pandas and statistics libraries.
CODE:
import statistics as st
a=[20, 10, 50, 10, 21, 90]
s=st.mode(a)
print("Mode:",s)
m=st.mean(a)
print("Mean:",m)
import numpy as np
b=np.array([[10, 20, 30, 50, 10],[1, 2, 3, 5, 6]])
print(b)
print("Dim:",b.ndim)
c=np.empty((5,5))
c
f = np.full((3, 3), 5)
f
d=np.arange(0,100,2)
d
e=np.reshape(d,(10,5))
t=np.reshape(e,(2,5,5))
print(t)
print("Last element:",d[10])
print("ELement upto 10:",d[:10])
print("Last 5 element:",d[-5:])
print("Alternative Numbers:",d[::2])
import pandas as pd
p_dict={'pid':[1,2,3,4,5],'value':[10,20,30,40,50]}
p_dict
grt=pd.DataFrame(p_dict)
grt
OUTPUT:
Experiment 2
AIM: Exploratory analysis and visualization of a sales dataset using Pandas, Matplotlib and Seaborn.
CODE:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load Dataset
df = pd.read_csv('diwali_sales.csv', encoding='utf-8')
# Data Cleaning
df.dropna(inplace=True) # Remove missing values
df.drop_duplicates(inplace=True)
plt.figure(figsize=(12, 6))
sns.barplot(x='State', y='Amount', data=df, estimator=sum, palette='viridis')
plt.xticks(rotation=90)
plt.title('Total Sales by State')
plt.show()
plt.figure(figsize=(10, 5))
sns.countplot(x='Age Group', data=df, palette='Set2')
plt.title('Sales by Age Group')
plt.show()
plt.figure(figsize=(12, 6))
sns.barplot(x='Category', y='Amount', data=df, estimator=sum, palette='magma')
plt.xticks(rotation=45)
plt.title('Sales by Product Category')
plt.show()
# Revenue Trends
df['Order Date'] = pd.to_datetime(df['Order Date'], errors='coerce')
df['Month'] = df['Order Date'].dt.month
plt.figure(figsize=(10, 5))
sns.lineplot(x='Month', y='Amount', data=df, estimator=sum, marker='o', color='b')
plt.title('Revenue Trend Over Months')
plt.show()
print("Analysis Complete.")
OUTPUT:
Experiment 3
AIM: Implement linear regression and logistic regression.
CODE:
import pandas as pd

data = {
    'Size': [1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400],
    'Price': [300000, 310000, 360000, 350000, 390000, 400000, 420000, 430000, 480000, 470000]
}
# Convert to DataFrame
df = pd.DataFrame(data)
df
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Prepare features/target and fit the model (these steps were missing from
# the scan; reconstructed below)
X = df[['Size']]
y = df['Price']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)
print("MSE:", mean_squared_error(y_test, y_pred))
print("R2 score:", r2_score(y_test, y_pred))
OUTPUT:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target
y
y_binary = (y > np.median(y)).astype(int)
y_binary
X_train, X_test, y_train, y_test = train_test_split(X, y_binary, test_size=0.2,
random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: {:.2f}%".format(accuracy * 100))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))
OUTPUT:
Experiment 4
AIM: Implement the Naïve Bayes classifier on the Iris dataset.
CODE:
# load the iris dataset
from sklearn.datasets import load_iris
iris = load_iris()

# split into train and test sets, then fit a Gaussian Naive Bayes model
# (these steps were missing from the scan; reconstructed below)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.4, random_state=1)
gnb = GaussianNB()
gnb.fit(X_train, y_train)
y_pred = gnb.predict(X_test)

# comparing actual response values (y_test) with predicted response values (y_pred)
from sklearn import metrics
print("Gaussian Naive Bayes model accuracy(in %):", metrics.accuracy_score(y_test, y_pred)*100)
OUTPUT:
Experiment 5
AIM: Classify text documents using the Naïve Bayes classifier.
CODE:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn import metrics
# Load Dataset
msg = pd.read_csv('naivetext.csv', names=['message', 'label'], encoding='utf-8')
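The preprocessing and training steps between loading the data and the predictions below are missing from the scan. A minimal reconstruction, assuming the CSV's label column holds 'pos'/'neg' values:
# Map text labels to numbers ('pos'/'neg' values are an assumption)
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
X, y = msg.message, msg.labelnum

# Split, vectorize the messages into token counts, and train the classifier
xtrain, xtest, ytrain, ytest = train_test_split(X, y, random_state=42)
cv = CountVectorizer()
xtrain_dtm = cv.fit_transform(xtrain)
xtest_dtm = cv.transform(xtest)
clf = MultinomialNB()
clf.fit(xtrain_dtm, ytrain)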
# Make Predictions
predicted = clf.predict(xtest_dtm)
# Model Evaluation
accuracy = metrics.accuracy_score(ytest, predicted)
precision = metrics.precision_score(ytest, predicted)
recall = metrics.recall_score(ytest, predicted)
f1 = metrics.f1_score(ytest, predicted)
print('Accuracy:', accuracy)
print('Precision:', precision)
print('Recall:', recall)
print('F1-score:', f1)
# Confusion Matrix
print('\nConfusion Matrix:')
print(metrics.confusion_matrix(ytest, predicted))
OUTPUT:
Experiment 6
AIM: Implement the decision-tree-based ID3 algorithm.
CODE:
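The body of this program is missing from the scan. Below is a minimal sketch, assuming ID3 is approximated with scikit-learn's DecisionTreeClassifier using the entropy (information-gain) criterion on the Iris dataset; the surviving confusion-matrix lines below complete the plot.
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42)

# criterion='entropy' makes the tree split on information gain, as ID3 does
dt = DecisionTreeClassifier(criterion='entropy', random_state=42)
dt.fit(X_train, y_train)
y_pred = dt.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))

# Plot the confusion matrix (the surviving lines below finish this plot)
sns.heatmap(confusion_matrix(y_test, y_pred), annot=True, fmt='d', cmap='Blues')
plt.xlabel("Predicted")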
plt.ylabel("True")
plt.title("Confusion Matrix")
plt.show()
OUTPUT:
Experiment 7
AIM: Write a program to implement the K-Nearest Neighbor algorithm
to classify the iris data set.
CODE:
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
iris.feature_names
iris.target_names

df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['target'] = iris.target  # needed for the df.target filters below (missing in the scan)
df.head()

print(iris.target)
# Output: 150 labels, 0/1/2 for the three iris species
df[df.target==1].head()
df[df.target==2].head()
df['flower_name'] = df.target.apply(lambda x: iris.target_names[x])
df.head()
df[45:55]
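The remaining steps of this program (the KNN training itself) survive only as screenshots. A minimal sketch of the missing part, assuming k = 5 and an 80/20 train/test split:
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix, classification_report

# Separate features from labels
X = df.drop(['target', 'flower_name'], axis=1)
y = df.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# Fit and evaluate the K-Nearest Neighbor classifier
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print("Accuracy:", knn.score(X_test, y_test))

y_pred = knn.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))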
Experiment 8
AIM: Apply EM algorithm to cluster a set of data stored in a
.CSV file. Use the same data set for clustering using k-Means
algorithm.
CODE:
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

dataset = load_iris()
X = pd.DataFrame(dataset.data)
X.columns = ['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width']
y = pd.DataFrame(dataset.target)
y.columns = ['Targets']

plt.figure(figsize=(14,7))
colormap = np.array(['red', 'lime', 'black'])
plt.subplot(1,3,1)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y.Targets], s=40)
plt.title('Real')
plt.show()
# K-Means plot
plt.subplot(1,3,2)
model = KMeans(n_clusters=3)
model.fit(X)
predY = np.choose(model.labels_, [0, 1, 2]).astype(np.int64)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[predY], s=40)
plt.title('KMeans')
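The EM-based clustering called for in the AIM is missing from the scan. A minimal sketch using scikit-learn's GaussianMixture (which is fitted by expectation-maximization), following the same plotting pattern:
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Standardize the features before fitting the mixture model
scaler = StandardScaler()
xs = scaler.fit_transform(X)
gmm = GaussianMixture(n_components=3)
gmm_y = gmm.fit_predict(xs)

plt.subplot(1, 3, 3)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[gmm_y], s=40)
plt.title('GMM (EM)')
plt.show()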
Experiment 9
AIM: Exploratory analysis and Naïve Bayes classification of a heart-disease dataset.
CODE:
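The imports and the data-loading step are missing from the scan; presumably the same heart-disease CSV used in Experiment 10:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset (file name assumed from Experiment 10)
data = pd.read_csv('heart_statlog_cleveland_hungary_final.csv')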
data.info()
data.head()
data.describe()
data.hist(figsize=(20,16))
corr_matrix = data.corr()
plt.figure(figsize=(20,10))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Matrix')
plt.show()
data.isnull().sum()
data.duplicated().sum()
data["target"].value_counts()
x_norm=data.drop(['target'],axis=1) y=data['target']
from sklearn.preprocessing import
StandardScaler scaler=StandardScaler()
x=scaler.fit_transform(x_norm)
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=42)
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import classification_report
nb=GaussianNB() nb.fit(x_train,y_train)
y_pred=nb.predict(x_test) print(classification_report(y_test,y_pred))
from sklearn.metrics import confusion_matrix
nb_cm = confusion_matrix(y_test, y_pred)
sns.heatmap(nb_cm, annot=True, fmt='d', cmap='Blues')
plt.title('Confusion Matrix')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.show()
Experiment 10
AIM: Compare various supervised learning algorithms (Naïve Bayes, Logistic Regression, Decision Tree, KNN, SVM) using an appropriate dataset.
CODE:
import pandas as pd
data = pd.read_csv('/content/heart_statlog_cleveland_hungary_final.csv')
data["target"].value_counts()
x_norm = data.drop(['target'], axis=1)
y = data['target']

# scale data
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
x = scaler.fit_transform(x_norm)

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

# Gaussian Naive Bayes
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import classification_report
nb = GaussianNB()
nb.fit(x_train, y_train)
y_pred = nb.predict(x_test)
print(classification_report(y_test, y_pred))

#Logistic Regression
from sklearn.linear_model import LogisticRegression
lr = LogisticRegression()
lr.fit(x_train, y_train)
y_pred_lr = lr.predict(x_test)
print(classification_report(y_test, y_pred_lr))
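#Decision Tree (the lines defining the classifier are missing from the scan;
# a minimal assumed reconstruction so the surviving dt.fit line below runs)
from sklearn.tree import DecisionTreeClassifier
dt = DecisionTreeClassifier(random_state=42)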
dt.fit(x_train, y_train)
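# Evaluation of the tree (assumed; these lines are missing from the scan)
y_pred_dt = dt.predict(x_test)
print(classification_report(y_test, y_pred_dt))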
#KNN classifier
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=5) # You can adjust the number of neighbors
knn.fit(x_train, y_train)
#KNN classifier
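# (KNN evaluation assumed; the original lines are missing from the scan)
y_pred_knn = knn.predict(x_test)
print(classification_report(y_test, y_pred_knn))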
#SVM classifier
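# (SVM code assumed; the original lines are missing from the scan)
from sklearn.svm import SVC
svm = SVC()
svm.fit(x_train, y_train)
y_pred_svm = svm.predict(x_test)
print(classification_report(y_test, y_pred_svm))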
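# Collect accuracy scores for the comparison table below (assumed; the
# computing lines are missing from the scan)
from sklearn.metrics import accuracy_score
acc = accuracy_score(y_test, y_pred)
acc_lr = accuracy_score(y_test, y_pred_lr)
acc_dt = accuracy_score(y_test, y_pred_dt)
acc_knn = accuracy_score(y_test, y_pred_knn)
acc_svm = accuracy_score(y_test, y_pred_svm)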
model_data = {
'Model': ['Naive Bayes', 'Logistic Regression', 'Decision Tree', 'KNN', 'SVM'],
'Accuracy': [acc, acc_lr, acc_dt, acc_knn, acc_svm]
}
model_comparison = pd.DataFrame(model_data)
model_comparison
Experiment 11
AIM: Compare the various Unsupervised learning algorithm by using
the appropriate datasets. (K Means Clustering, K Mode)
CODE:
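The code for this experiment is missing from the scan. Below is a minimal sketch, assuming K-Means is run on the numeric Iris features and K-Modes (from the third-party kmodes package, pip install kmodes) on a small hypothetical categorical dataset:
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from kmodes.kmodes import KModes

# K-Means on the numeric Iris features
iris = load_iris()
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
kmeans_labels = kmeans.fit_predict(iris.data)
print("K-Means cluster sizes:", np.bincount(kmeans_labels))

# K-Modes on a small categorical toy dataset (hypothetical example data)
cat = pd.DataFrame({
    'color': ['red', 'red', 'blue', 'blue', 'green', 'green'],
    'shape': ['circle', 'circle', 'square', 'square', 'circle', 'square'],
})
kmodes = KModes(n_clusters=2, init='Huang', n_init=5, random_state=42)
kmodes_labels = kmodes.fit_predict(cat)
print("K-Modes labels:", kmodes_labels)
print("K-Modes centroids:\n", kmodes.cluster_centroids_)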
Experiment 12
AIM: Implement a feed-forward neural network with backpropagation from scratch using NumPy.
CODE:
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler
# Load dataset
iris = load_iris()
X = iris.data                     # Features
y = iris.target.reshape(-1, 1)    # Labels

# Scale features and one-hot encode labels (these lines were missing from
# the scan; X_scaled and y_one_hot are used below)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
encoder = OneHotEncoder(sparse_output=False)  # use sparse=False on scikit-learn < 1.2
y_one_hot = encoder.fit_transform(y)

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y_one_hot, test_size=0.2, random_state=42)
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    return x * (1 - x)
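The network initialization and hyper-parameters are missing from the scan. A minimal sketch, assuming one hidden layer of 8 units, a learning rate of 0.1 and 1000 epochs (all hypothetical values):
np.random.seed(42)
n_input = X_train.shape[1]     # 4 iris features
n_hidden = 8                   # assumed hidden-layer size
n_output = y_train.shape[1]    # 3 classes

W1 = np.random.randn(n_input, n_hidden) * 0.1
b1 = np.zeros((1, n_hidden))
W2 = np.random.randn(n_hidden, n_output) * 0.1
b2 = np.zeros((1, n_output))

lr = 0.1        # assumed learning rate
epochs = 1000   # assumed number of epochs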
# Training loop (the hidden-layer gradients and the weight updates were
# missing from the scan; reconstructed below, assuming plain gradient
# descent with the fixed learning rate lr)
for epoch in range(epochs):
    # Forward propagation
    Z1 = np.dot(X_train, W1) + b1
    A1 = sigmoid(Z1)
    Z2 = np.dot(A1, W2) + b2
    A2 = sigmoid(Z2)

    # Backpropagation
    dA2 = (A2 - y_train) * sigmoid_derivative(A2)
    dW2 = np.dot(A1.T, dA2)
    db2 = np.sum(dA2, axis=0, keepdims=True)
    dA1 = np.dot(dA2, W2.T) * sigmoid_derivative(A1)
    dW1 = np.dot(X_train.T, dA1)
    db1 = np.sum(dA1, axis=0, keepdims=True)

    # Gradient-descent weight updates (assumed)
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2
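The test-set forward pass that produces y_pred and y_actual is also missing; presumably a forward pass over X_test followed by an argmax over the output activations:
# Forward pass on the test set (assumed reconstruction)
Z1_t = np.dot(X_test, W1) + b1
A1_t = sigmoid(Z1_t)
A2_t = sigmoid(np.dot(A1_t, W2) + b2)
y_pred = np.argmax(A2_t, axis=1)
y_actual = np.argmax(y_test, axis=1)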
# Accuracy calculation
accuracy = np.mean(y_pred == y_actual) * 100
print(f"Test Accuracy: {accuracy:.2f}%")
OUTPUT: