
Faculty of Engineering & Technology

Machine Learning Laboratory (203105403)


B. Tech CSE 4th Year 7th Semester
PRACTICAL: 08

AIM: Apply EM algorithm to cluster a set of data stored in a .CSV file. Use
the same data set for clustering using k-Means algorithm.
from sklearn import datasets
from sklearn.cluster import KMeans
from matplotlib import pyplot as plt

# Load the sample data (the Iris measurements are assumed here as the data set)
x = datasets.load_iris().data

# Fit k-Means with three clusters on the first two features
cluster_3_model = KMeans(n_clusters=3)
cluster_labels = cluster_3_model.fit_predict(x[:, [0, 1]])

# Plot the points coloured by cluster and mark the centroids in red
plt.scatter(x[:, 0], x[:, 1], c=cluster_labels)
centroids = cluster_3_model.cluster_centers_
plt.scatter(centroids[:, 0], centroids[:, 1], c='red')
plt.show()


# Elbow method: compute the within-cluster sum of squares (WCSS) for several k values
k_values = [2, 3, 4, 5]
wcss_values = []

for k_value in k_values:
    model = KMeans(n_clusters=k_value)
    model.fit(x[:, [0, 1]])
    wcss_values.append(model.inertia_)

plt.plot(k_values, wcss_values)
plt.xlabel('k')
plt.ylabel('WCSS')
plt.show()
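The aim also asks for clustering the same data with the EM algorithm. scikit-learn's GaussianMixture fits a mixture of Gaussians by EM; the sketch below is illustrative (not part of the original listing) and reuses the same two features and three clusters as the k-Means run above.

# EM clustering of the same data using a Gaussian mixture model (illustrative sketch)
from sklearn.mixture import GaussianMixture

em_model = GaussianMixture(n_components=3, random_state=42)
em_labels = em_model.fit_predict(x[:, [0, 1]])

# Plot the EM cluster assignments and mark the component means in red
plt.scatter(x[:, 0], x[:, 1], c=em_labels)
plt.scatter(em_model.means_[:, 0], em_model.means_[:, 1], c='red')
plt.title('EM (Gaussian Mixture) Clustering')
plt.show()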

PRACTICAL: 09

AIM: Write a program to construct a Bayesian network considering medical data. Use this model to demonstrate the diagnosis of heart patients using the standard Heart Disease Data Set.
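The classifier listings below assume the Heart Disease data set has already been read into patient_data and split into features X and target Y, with the usual scikit-learn utilities in scope. A minimal setup sketch follows; the file name heart.csv and the target column name are assumptions, not part of the original listing.

# Assumed setup (sketch): imports and data loading for the heart-disease experiments
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# File and column names below are assumptions about the Heart Disease CSV
patient_data = pd.read_csv('heart.csv')
X = patient_data.drop('target', axis=1)
Y = patient_data['target']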


patient_data.hist(figsize=(20, 16))
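The notebook then compares standard classifiers on this data. For the Bayesian network asked for in the aim, a minimal sketch using pgmpy is given below; it is not part of the original listing, and the chosen edges, column names, and evidence values are illustrative assumptions.

# Illustrative Bayesian network for heart-disease diagnosis (pgmpy; structure is an assumption)
from pgmpy.models import BayesianNetwork
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination

# A simple hand-chosen structure over a few categorical columns of the data set
heart_bn = BayesianNetwork([
    ('sex', 'target'),
    ('cp', 'target'),
    ('fbs', 'target'),
    ('target', 'exang'),
])

# Learn the conditional probability tables from the data
heart_bn.fit(patient_data[['sex', 'cp', 'fbs', 'exang', 'target']],
             estimator=MaximumLikelihoodEstimator)

# Query the probability of heart disease given some evidence (values are assumptions)
infer = VariableElimination(heart_bn)
result = infer.query(variables=['target'], evidence={'sex': 1, 'fbs': 0})
print(result)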


➢ Logistic Regression

# 80% of the data will be used for training
# 20% of the data will be used for testing
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

# Define the parameter grid for Logistic Regression
logreg_param_grid = {
    "penalty": ["l1", "l2"],
    "C": np.logspace(-3, 3, 7),
    "solver": ["liblinear"],
}

# Create a Logistic Regression model
logreg_model = LogisticRegression(random_state=42)
# Perform grid search with cross-validation
logreg_grid_search = GridSearchCV(
logreg_model, logreg_param_grid, cv=5, scoring="accuracy"
)
logreg_grid_search.fit(X_train, Y_train)

# Get the best parameters


best_logreg_params = logreg_grid_search.best_params_
print("Best Hyperparameters for Logistic Regression:", best_logreg_params)

# Train a Logistic Regression model with the best parameters


best_logreg_model = LogisticRegression(random_state=42, **best_logreg_params)
best_logreg_model.fit(X_train, Y_train)

# Make predictions on the test set


logreg_predict = best_logreg_model.predict(X_test)

# Calculate accuracy on the test set


best_logreg_acc = accuracy_score(Y_test, logreg_predict)
print(
"Best Accuracy of Logistic Regression model:",
"{:.2f}%".format(best_logreg_acc * 100),
)

# Display classification report


print("\nClassification Report - Logistic Regression:")
lr_cr = classification_report(Y_test, logreg_predict)
print(lr_cr)
# Display confusion matrix
logreg_cm = confusion_matrix(Y_test, logreg_predict)

# Plot the confusion matrix


sns.heatmap(logreg_cm, annot=True, fmt="d", cmap="Blues")
plt.title("Confusion Matrix - Logistic Regression")
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.show()

Best Hyperparameters for Logistic Regression: {'C': 0.1, 'penalty': 'l1', 'solver': 'liblinear'}
Best Accuracy of Logistic Regression model: 85.29%

Classification Report - Logistic Regression:

              precision    recall  f1-score   support

           0       0.83      0.84      0.84       107
           1       0.87      0.86      0.87       131

    accuracy                           0.85       238
   macro avg       0.85      0.85      0.85       238
weighted avg       0.85      0.85      0.85       238


➢ Decision Tree

# Define the parameter grid


param_grid = {
"criterion": ["gini", "entropy"],
"max_depth": np.arange(1, 21),
"min_samples_split": [2, 5, 10],
"min_samples_leaf": [1, 2, 4],
"max_features": ["sqrt", "log2", None],
}

# Create a Decision Tree model


DT = DecisionTreeClassifier(random_state=0)

# Perform grid search with cross-validation


grid_search = GridSearchCV(DT, param_grid, cv=5, scoring="accuracy")
grid_search.fit(X_train, Y_train)

# Get the best parameters


best_params = grid_search.best_params_
print("Best Hyperparameters:", best_params)

# Train a Decision Tree with the best parameters


best_DT = DecisionTreeClassifier(random_state=0, **best_params)
best_DT.fit(X_train, Y_train)

# Make predictions on the test set


DT_predict = best_DT.predict(X_test)

# Calculate accuracy on the test set


max_dt_acc = accuracy_score(Y_test, DT_predict)
print(
    "Accuracy of Decision Tree with Best Parameters:",
    "{:.2f}%".format(max_dt_acc * 100),
)

# Display classification report


print("\nClassification Report:")
dt_cr = classification_report(Y_test, DT_predict)
print(dt_cr)

# Display confusion matrix


DT_cm = confusion_matrix(Y_test, DT_predict)

# Plot the confusion matrix


sns.heatmap(DT_cm, annot=True, fmt="d", cmap="Blues")
plt.title("Confusion Matrix")
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.show()

Best Hyperparameters: {'criterion': 'entropy', 'max_depth': 16, 'max_features': 'sqrt', 'min_samples_leaf': 1, 'min_samples_split': 2}
Accuracy of Decision Tree with Best Parameters: 86.97%

Classification Report:

              precision    recall  f1-score   support

           0       0.86      0.85      0.85       107
           1       0.88      0.89      0.88       131

    accuracy                           0.87       238
   macro avg       0.87      0.87      0.87       238
weighted avg       0.87      0.87      0.87       238

➢ Random Forest

# Define the parameter grid for Random Forest


rf_param_grid = {
"n_estimators": [10, 20, 50, 100],
"max_depth": [None, 5, 10, 15],
"min_samples_split": [2, 5, 10],
"min_samples_leaf": [1, 2, 4],
"max_features": ["auto", "sqrt", "log2"],
"random_state": [12],
}

# Create a Random Forest model


RF = RandomForestClassifier()

# Perform grid search with cross-validation


rf_grid_search = GridSearchCV(RF, rf_param_grid, cv=5, scoring="accuracy")
rf_grid_search.fit(X_train, Y_train)

# Get the best parameters


best_rf_params = rf_grid_search.best_params_
print("Best Hyperparameters for Random Forest:", best_rf_params)

# Train a Random Forest model with the best parameters


best_RF_model = RandomForestClassifier(**best_rf_params)
best_RF_model.fit(X_train, Y_train)

# Make predictions on the test set


RF_predict = best_RF_model.predict(X_test)

# Calculate accuracy on the test set


best_RF_acc = accuracy_score(Y_test, RF_predict)
print("Best Accuracy of Random Forest:", "{:.2f}%".format(best_RF_acc * 100))

# Display classification report


print("\nClassification Report - Random Forest:")
rf_cr = classification_report(Y_test, RF_predict)
print(rf_cr)

# Display confusion matrix


RF_cm = confusion_matrix(Y_test, RF_predict)

# Plot the confusion matrix


sns.heatmap(RF_cm, annot=True, fmt="d", cmap="Blues")
plt.title("Confusion Matrix - Random Forest")
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.show()

Best Hyperparameters for Random Forest: {'max_depth': 10, 'max_features': 'sqrt',
'min_samples_leaf': 1, 'min_samples_split': 2, 'n_estimators': 100, 'random_state': 12}
Best Accuracy of Random Forest: 94.54%

Classification Report - Random Forest:

              precision    recall  f1-score   support

           0       0.96      0.92      0.94       107
           1       0.93      0.97      0.95       131

    accuracy                           0.95       238
   macro avg       0.95      0.94      0.94       238
weighted avg       0.95      0.95      0.95       238

➢ KNN

# Define the parameter grid for K-Nearest Neighbors


knn_param_grid = {
"n_neighbors": np.arange(1, 21),
"weights": ["uniform", "distance"],
"algorithm": ["auto", "ball_tree", "kd_tree", "brute"],
}

# Create a K-Nearest Neighbors model


KNN = KNeighborsClassifier()

# Perform grid search with cross-validation


knn_grid_search = GridSearchCV(KNN, knn_param_grid, cv=5, scoring="accuracy")
knn_grid_search.fit(X_train, Y_train)

# Get the best parameters


best_knn_params = knn_grid_search.best_params_
print("Best Hyperparameters for K-Nearest Neighbors:", best_knn_params)

# Train a K-Nearest Neighbors model with the best parameters


best_KNN_model = KNeighborsClassifier(**best_knn_params)
best_KNN_model.fit(X_train, Y_train)

# Make predictions on the test set


KNN_predict = best_KNN_model.predict(X_test)

# Calculate accuracy on the test set

best_KNN_acc = accuracy_score(Y_test, KNN_predict)
print("Best Accuracy of K-Neighbors Classifier:", "{:.2f}%".format(best_KNN_acc * 100))

# Display classification report


print("\nClassification Report - K-Neighbors Classifier:")
knn_cr = classification_report(Y_test, KNN_predict)
print(knn_cr)

# Display confusion matrix


KNN_cm = confusion_matrix(Y_test, KNN_predict)

# Plot the confusion matrix


sns.heatmap(KNN_cm, annot=True, fmt="d", cmap="Blues")
plt.title("Confusion Matrix - K-Neighbors Classifier")
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.show()

Best Hyperparameters for K-Nearest Neighbors: {'algorithm': 'auto', 'n_neighbors': 19, 'weights': 'distance'}
Best Accuracy of K-Neighbors Classifier: 94.12%

Classification Report - K-Neighbors Classifier:

              precision    recall  f1-score   support

           0       0.93      0.94      0.94       107
           1       0.95      0.94      0.95       131

    accuracy                           0.94       238
   macro avg       0.94      0.94      0.94       238
weighted avg       0.94      0.94      0.94       238

➢ SVM

# Define the parameter grid for Support Vector Classifier


svm_param_grid = {
"C": [0.1, 1, 10, 100],
"gamma": ["scale", "auto", 0.001, 0.01, 0.1, 1, 10],
"kernel": ["rbf"],
}

# Create a Support Vector Classifier model
SVM = SVC()

# Perform grid search with cross-validation


svm_grid_search = GridSearchCV(SVM, svm_param_grid, cv=5, scoring="accuracy")
svm_grid_search.fit(X_train, Y_train)

# Get the best parameters


best_svm_params = svm_grid_search.best_params_
print("Best Hyperparameters for Support Vector Classifier:", best_svm_params)

# Train a Support Vector Classifier model with the best parameters


best_SVM_model = SVC(**best_svm_params)
best_SVM_model.fit(X_train, Y_train)

# Make predictions on the test set


SVM_predict = best_SVM_model.predict(X_test)

# Calculate accuracy on the test set


best_SVM_acc = accuracy_score(Y_test, SVM_predict)
print(
"Best Accuracy of Support Vector Classifier:", "{:.2f}%".format(best_SVM_acc * 100)
)

# Display classification report


print("\nClassification Report - Support Vector Classifier:")
svm_cr = classification_report(Y_test, SVM_predict)
print(svm_cr)

# Display confusion matrix


SVM_cm = confusion_matrix(Y_test, SVM_predict)

# Plot the confusion matrix


sns.heatmap(SVM_cm, annot=True, fmt="d", cmap="Blues")
plt.title("Confusion Matrix - Support Vector Classifier")
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.show()
Best Hyperparameters for Support Vector Classifier: {'C': 10, 'gamma': 0.1, 'kernel': 'rbf'}
Best Accuracy of Support Vector Classifier: 89.92%

Classification Report - Support Vector Classifier:

              precision    recall  f1-score   support

           0       0.87      0.91      0.89       107
           1       0.92      0.89      0.91       131

    accuracy                           0.90       238
   macro avg       0.90      0.90      0.90       238
weighted avg       0.90      0.90      0.90       238


➢ Naive Bayes

# Create a Naive Bayes model


NB = GaussianNB()
NB.fit(X_train, Y_train)
NB_predict = NB.predict(X_test)
NB_acc_score = accuracy_score(Y_test, NB_predict)

print("Accuracy of Naive Bayes model:", "{:.2f}%".format(NB_acc_score * 100))

print("\nClassification Report:")
nb_cr = classification_report(Y_test, NB_predict)
print(nb_cr)

NB_cm = confusion_matrix(Y_test, NB_predict)

sns.heatmap(NB_cm, annot=True, fmt="d", cmap="Blues")
plt.title("Confusion Matrix - Naive Bayes")
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.show()

Accuracy of Naive Bayes model: 85.71%

Classification Report:

              precision    recall  f1-score   support

           0       0.85      0.83      0.84       107
           1       0.86      0.88      0.87       131

    accuracy                           0.86       238
   macro avg       0.86      0.85      0.86       238
weighted avg       0.86      0.86      0.86       238


➢ Adaboost

# Define the parameter grid for AdaBoost


ab_param_grid = {
"n_estimators": [50, 100, 150],
"learning_rate": [0.01, 0.1, 1],
"random_state": [42],
}

# Create an AdaBoost model


adaboost_classifier = AdaBoostClassifier()

# Perform grid search with cross-validation


ab_grid_search = GridSearchCV(
adaboost_classifier, ab_param_grid, cv=5, scoring="accuracy"
)
ab_grid_search.fit(X_train, Y_train)

# Get the best parameters


best_ab_params = ab_grid_search.best_params_
print("Best Hyperparameters for AdaBoost:", best_ab_params)

# Train an AdaBoost model with the best parameters


best_adaboost_model = AdaBoostClassifier(**best_ab_params)
best_adaboost_model.fit(X_train, Y_train)

# Make predictions on the test set


adaboost_predict = best_adaboost_model.predict(X_test)

# Calculate accuracy on the test set


adaboost_acc_score = accuracy_score(Y_test, adaboost_predict)
print("Best Accuracy of AdaBoost model:", "{:.2f}%".format(adaboost_acc_score * 100))

# Display classification report


print("\nClassification Report - AdaBoost:")
ab_cr = classification_report(Y_test, adaboost_predict)
print(ab_cr)


# Display confusion matrix


adaboost_cm = confusion_matrix(Y_test, adaboost_predict)

# Plot the confusion matrix


sns.heatmap(adaboost_cm, annot=True, fmt="d", cmap="Blues")
plt.title("Confusion Matrix - AdaBoost")
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.show()

Best Hyperparameters for AdaBoost: {'learning_rate': 1, 'n_estimators': 50, 'random_state': 42}


Best Accuracy of AdaBoost model: 89.08%

Classification Report - AdaBoost:

              precision    recall  f1-score   support

           0       0.89      0.86      0.88       107
           1       0.89      0.92      0.90       131

    accuracy                           0.89       238
   macro avg       0.89      0.89      0.89       238
weighted avg       0.89      0.89      0.89       238

PRACTICAL: 10

AIM: Compare the various supervised learning algorithms by using an appropriate dataset.

➢ Comparison Table

# Create a dictionary to store the models and their accuracies


models_accuracy = {
"Logistic Regression": best_logreg_acc,
"Decision Tree": max_dt_acc,
"Random Forest": best_RF_acc,
"KNN": best_KNN_acc,
"SVM": best_SVM_acc,
"Naive Bayes": NB_acc_score,
"Adaboost": adaboost_acc_score,
}

# Find the model with the highest accuracy


best_model = max(models_accuracy, key=models_accuracy.get)
best_accuracy = models_accuracy[best_model]

# Create a DataFrame for comparison


comparison = pd.DataFrame(
{
"Model": list(models_accuracy.keys()),
"Accuracy": ["{:.2f}%".format(acc * 100) for acc in models_accuracy.values()],
}
)

print("Comparison Table:")
print(comparison)

# Print the name and accuracy of the best model


print("\nBest Model:")
print(f"{best_model}: {best_accuracy:.2%}")

Comparison Table:
                 Model Accuracy
0  Logistic Regression   85.29%
1        Decision Tree   86.97%
2        Random Forest   94.54%
3                  KNN   94.12%
4                  SVM   89.92%
5          Naive Bayes   85.71%
6             Adaboost   89.08%

Best Model:
Random Forest: 94.54%

➢ Comparison Bar Plot
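No code was captured under this heading in the source; a minimal sketch that plots the accuracies collected above as a bar chart is given below (figure size and palette are assumptions).

# Bar plot comparing the model accuracies from the dictionary above (sketch)
plt.figure(figsize=(12, 5))
sns.barplot(x=list(models_accuracy.keys()),
            y=[acc * 100 for acc in models_accuracy.values()],
            palette="Set2")
plt.title("Model Accuracy Comparison")
plt.xlabel("Model")
plt.ylabel("Accuracy (%)")
plt.show()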

➢ All Confusion Matrices

num_classifiers = 7
num_rows = (num_classifiers - 1) // 4 + 1
num_cols = min(num_classifiers, 4)

fig, axes = plt.subplots(
    nrows=num_rows, ncols=num_cols, figsize=(5 * num_cols, 5 * num_rows)
)
fig.suptitle("Confusion Matrices", fontsize=20)

classifiers = [
("Logistic Regression", logreg_cm, best_logreg_acc),
("Decision Tree", DT_cm, max_dt_acc),
("Random Forest", RF_cm, best_RF_acc),
("K-Neighbors", KNN_cm, best_KNN_acc),
("SVM", SVM_cm, best_SVM_acc),
("Naive Bayes", NB_cm, NB_acc_score),
("AdaBoost", adaboost_cm, adaboost_acc_score),
]
for (name, cm, acc_score), ax in zip(classifiers, axes.flatten()):
    sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", ax=ax)
    ax.set_title(f"{name} - Accuracy: {acc_score * 100:.2f}%")
    ax.set_xlabel("Predicted Label")
    ax.set_ylabel("True Label")

for i in range(num_classifiers, num_rows * num_cols):
    fig.delaxes(axes.flatten()[i])

plt.tight_layout(rect=[0, 0, 1, 0.96])
plt.show()


➢ Classification Report Comparison Bar Plots

# Get classification reports as dictionaries


lr_cr = classification_report(Y_test, logreg_predict, output_dict=True)
dt_cr = classification_report(Y_test, DT_predict, output_dict=True)
rf_cr = classification_report(Y_test, RF_predict, output_dict=True)
knn_cr = classification_report(Y_test, KNN_predict, output_dict=True)
svm_cr = classification_report(Y_test, SVM_predict, output_dict=True)
nb_cr = classification_report(Y_test, NB_predict, output_dict=True)
ab_cr = classification_report(Y_test, adaboost_predict, output_dict=True)

classification_reports = [lr_cr, dt_cr, rf_cr, knn_cr, svm_cr, nb_cr, ab_cr]

f1_scores = {}
recall_scores = {}
precision_scores = {}

# Store f1-score, recall, and precision scores in lists


for name, cr in zip(
    [
        "Logistic Regression",
        "Decision Tree",
        "Random Forest",
        "KNN",
        "SVM",
        "Naive Bayes",
        "AdaBoost",
    ],
    classification_reports,
):
    f1_scores[name] = [
        cr[label]["f1-score"] for label in cr.keys() if label.isnumeric()
    ]
    recall_scores[name] = [
        cr[label]["recall"] for label in cr.keys() if label.isnumeric()
    ]
    precision_scores[name] = [
        cr[label]["precision"] for label in cr.keys() if label.isnumeric()
    ]

# Create pandas dataframes from the lists


df_f1 = pd.DataFrame(
f1_scores, index=[str(i) for i in range(1, len(f1_scores["Decision Tree"]) + 1)]
)
df_recall = pd.DataFrame(
recall_scores,
index=[str(i) for i in range(1, len(recall_scores["Decision Tree"]) + 1)],
)
df_precision = pd.DataFrame(
precision_scores,
index=[str(i) for i in range(1, len(precision_scores["Decision Tree"]) + 1)],
)

# Plot F1-score comparison


plt.figure(figsize=(15, 5))
sns.barplot(data=df_f1, palette="Set2")
plt.title("F1-Score Comparison")
plt.xlabel("Models")
plt.ylabel("F1-Score")
plt.show()

# Plot recall comparison


plt.figure(figsize=(15, 5))
sns.barplot(data=df_recall, palette="Set2")
plt.title("Recall Comparison")
plt.xlabel("Models")
plt.ylabel("Recall")
plt.show()

# Plot precision comparison


plt.figure(figsize=(15, 5))
sns.barplot(data=df_precision, palette="Set2")
plt.title("Precision Comparison")
plt.xlabel("Models")
plt.ylabel("Precision")
plt.show()

PRACTICAL: 11
AIM: Compare the various unsupervised learning algorithms by using appropriate datasets.
Theory:

1. Partitioning Clustering (K-Means):


K-Means clustering assigns data points to clusters such that the sum of
the squared distances between data points and the cluster center is
minimized. It assumes that clusters are spherical and equally sized.
# K-Means Clustering with Iris dataset and calculations
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score
import matplotlib.pyplot as plt

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data

# Standardize the data


scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Reduce data to 2D using PCA for visualization
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

# Apply K-Means Clustering
kmeans = KMeans(n_clusters=3, random_state=42)
kmeans_labels = kmeans.fit_predict(X_pca)

# Calculate silhouette score


silhouette_avg = silhouette_score(X_pca, kmeans_labels)
print(f'K-Means Silhouette Score: {silhouette_avg:.3f}')

# Plot the clusters


plt.scatter(X_pca[:, 0], X_pca[:, 1], c=kmeans_labels, cmap='viridis', edgecolor='k')
plt.title('K-Means Clustering')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.grid(True)
plt.show()

K-Means Silhouette Score: 0.523

2. Hierarchical Clustering (Agglomerative):
Hierarchical clustering builds a tree-like structure (dendrogram) where each data point starts in its own cluster, and clusters are iteratively merged based on some linkage criterion (e.g., distance).

# Agglomerative Clustering with Iris dataset and calculations
from sklearn.cluster import AgglomerativeClustering
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn import datasets
from sklearn.metrics import silhouette_score
import matplotlib.pyplot as plt

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data

# Standardize the data


scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Reduce data to 2D using PCA for visualization


pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

# Apply Agglomerative Clustering


agglo = AgglomerativeClustering(n_clusters=3)
agglo_labels = agglo.fit_predict(X_pca)

# Calculate silhouette score


silhouette_avg = silhouette_score(X_pca, agglo_labels)
print(f'Agglomerative Clustering Silhouette Score: {silhouette_avg:.3f}')

# Plot the clusters


plt.scatter(X_pca[:, 0], X_pca[:, 1], c=agglo_labels, cmap='viridis', edgecolor='k')
plt.title('Agglomerative Clustering')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.grid(True)
plt.show()

Agglomerative Clustering Silhouette Score: 0.511

3. Density-Based Clustering (DBSCAN):
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) forms clusters by identifying dense regions of data points. Points in low-density regions are considered noise.

# DBSCAN Clustering with Iris dataset and calculations
from sklearn.cluster import DBSCAN
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn import datasets
from sklearn.metrics import silhouette_score
import matplotlib.pyplot as plt

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data

# Standardize the data


scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Reduce data to 2D using PCA for visualization
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

# Apply DBSCAN Clustering
dbscan = DBSCAN(eps=0.5, min_samples=5)
dbscan_labels = dbscan.fit_predict(X_pca)

# Calculate silhouette score (DBSCAN may produce -1 for noise, so we filter it out)
if len(set(dbscan_labels)) > 1:
    silhouette_avg = silhouette_score(X_pca[dbscan_labels != -1], dbscan_labels[dbscan_labels != -1])
    print(f'DBSCAN Silhouette Score: {silhouette_avg:.3f}')
else:
    print('DBSCAN did not form any clusters.')

# Plot the clusters
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=dbscan_labels, cmap='viridis', edgecolor='k')
plt.title('DBSCAN Clustering')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.grid(True)
plt.show()

4. Model-Based Clustering (Gaussian Mixture Models, GMM):
GMM assumes that the data is generated from a mixture of several Gaussian distributions. Each cluster is modeled by a multivariate Gaussian distribution, and the goal is to maximize the likelihood of the data given the parameters.

# Gaussian Mixture Model (GMM) Clustering with Iris dataset and calculations
from sklearn.mixture import GaussianMixture
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn import datasets
from sklearn.metrics import silhouette_score
import matplotlib.pyplot as plt

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data

# Standardize the data


scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Reduce data to 2D using PCA for visualization
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

# Apply GMM Clustering


gmm = GaussianMixture(n_components=3, random_state=42)
gmm_labels = gmm.fit_predict(X_pca)

# Calculate silhouette score


silhouette_avg = silhouette_score(X_pca, gmm_labels)
print(f'Gaussian Mixture Model Silhouette Score: {silhouette_avg:.3f}')

# Plot the clusters


plt.scatter(X_pca[:, 0], X_pca[:, 1], c=gmm_labels, cmap='viridis', edgecolor='k')

plt.title('Gaussian Mixture Model Clustering')


plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.grid(True)
plt.show()
Gaussian Mixture Model Silhouette Score: 0.468

5. Mean-Shift Clustering:
Mean-shift clustering iteratively shifts data points toward the mode (densest region) of the dataset, ultimately identifying clusters around these density peaks.

# Mean Shift Clustering with Iris dataset and calculations
from sklearn.cluster import MeanShift
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn import datasets
from sklearn.metrics import silhouette_score
import matplotlib.pyplot as plt

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data

# Standardize the data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Reduce data to 2D using PCA for visualization
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

# Apply Mean Shift Clustering
mean_shift = MeanShift()
mean_shift_labels = mean_shift.fit_predict(X_pca)

# Calculate silhouette score
silhouette_avg = silhouette_score(X_pca, mean_shift_labels)
print(f'Mean Shift Silhouette Score: {silhouette_avg:.3f}')

# Plot the clusters


plt.scatter(X_pca[:, 0], X_pca[:, 1], c=mean_shift_labels, cmap='viridis', edgecolor='k')
plt.title('Mean Shift Clustering')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.grid(True)
plt.show()

Mean Shift Silhouette Score: 0.615

6. Spectral Clustering:
Spectral clustering uses graph theory to partition data into clusters by using the eigenvalues (spectrum) of the similarity matrix of the data. It is well-suited for data that is not linearly separable.

# Spectral Clustering with Iris dataset and calculations
from sklearn.cluster import SpectralClustering
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn import datasets
from sklearn.metrics import silhouette_score
import matplotlib.pyplot as plt

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data

# Standardize the data


scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Reduce data to 2D using PCA for visualization
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

# Apply Spectral Clustering
spectral = SpectralClustering(n_clusters=3, affinity='nearest_neighbors', random_state=42)
spectral_labels = spectral.fit_predict(X_pca)

silhouette_avg = silhouette_score(X_pca, spectral_labels)
print(f'Spectral Clustering Silhouette Score: {silhouette_avg:.3f}')

# Plot the clusters


plt.scatter(X_pca[:, 0], X_pca[:, 1], c=spectral_labels, cmap='viridis', edgecolor='k')
plt.title('Spectral Clustering')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.grid(True)
plt.show()

Spectral Clustering Silhouette Score: 0.512
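To compare the algorithms side by side, the silhouette scores printed above can be collected into a single summary. The sketch below simply hard-codes the values reported in this practical; the DBSCAN score was not printed in the listing and is therefore omitted.

# Summary of the silhouette scores reported above (higher is better on this PCA-reduced data)
silhouette_summary = {
    'K-Means': 0.523,
    'Agglomerative': 0.511,
    'GMM': 0.468,
    'Mean Shift': 0.615,
    'Spectral': 0.512,
}

plt.figure(figsize=(8, 4))
plt.bar(list(silhouette_summary.keys()), list(silhouette_summary.values()), color='steelblue')
plt.ylabel('Silhouette Score')
plt.title('Unsupervised Algorithm Comparison (Iris, PCA-reduced)')
plt.show()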

PRACTICAL: 12
AIM: Build an Artificial Neural Network by implementing the
Backpropagation algorithm and test the same using appropriate data sets.
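The listing that follows builds the networks with Keras, which handles backpropagation internally. Since the aim asks to implement the algorithm itself, a small from-scratch sketch of backpropagation for one hidden layer is given first; the network size, learning rate, loss (mean squared error), and use of the Iris data are illustrative assumptions.

# Minimal backpropagation sketch (illustrative; one hidden layer, sigmoid activations, MSE loss)
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Data: Iris features, one-hot targets (Iris is assumed as the "appropriate data set")
iris = datasets.load_iris()
X = StandardScaler().fit_transform(iris.data)          # shape (150, 4)
Y = np.eye(3)[iris.target]                             # shape (150, 3), one-hot

rng = np.random.default_rng(0)
W1, b1 = rng.normal(scale=0.1, size=(4, 8)), np.zeros(8)   # hidden layer weights and biases
W2, b2 = rng.normal(scale=0.1, size=(8, 3)), np.zeros(3)   # output layer weights and biases
lr = 0.1

for _ in range(500):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: gradients of the MSE loss w.r.t. each layer's pre-activation
    d_out = (out - Y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    # Gradient descent update
    W2 -= lr * (h.T @ d_out) / len(X)
    b2 -= lr * d_out.mean(axis=0)
    W1 -= lr * (X.T @ d_h) / len(X)
    b1 -= lr * d_h.mean(axis=0)

accuracy = (out.argmax(axis=1) == iris.target).mean()
print(f"Training accuracy after 500 epochs: {accuracy:.2f}")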


# ANN for binary classification

import pandas as pd
from sklearn.preprocessing import LabelEncoder

bank_ds = pd.read_csv('data/bank.csv', delimiter=';')

le = LabelEncoder()

for col in ['job', 'marital', 'default', 'education', 'housing', 'loan', 'contact', 'month', 'poutcome', 'y']:
    bank_ds[col] = le.fit_transform(bank_ds[col])

bank_ds.head()

x = bank_ds.iloc[:,:-1].values
y = bank_ds.iloc[:,-1].values

# import numpy as np

# x.shape

# for i in range(x.shape[1]):
# x[:,i] = x[:,i]/np.max(x[:,i]

from sklearn.preprocessing import StandardScaler

x_std = StandardScaler().fit_transform(x)

x_std[0]

array([-1.05626965, 1.71680374, -0.24642938, -1.64475535, -0.1307588 ,
0.12107186, -1.14205138, -0.42475611, -0.72364152, 0.37405206,
1.48541444, -0.7118608 , -0.57682947, -0.4072183 , -0.32041282,
0.44441328])

# ANN

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model_bc = Sequential()
model_bc.add(Dense(20, activation='relu', input_shape=(16,)))
model_bc.add(Dense(5, activation='relu'))
model_bc.add(Dense(1, activation='sigmoid'))  # binary classification takes sigmoid as activation

model_bc.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Note: the model is fitted on the raw features x; the standardized x_std computed above is not used here
model_bc.fit(x, y, epochs=10, batch_size=128, validation_split=0.2)

Epoch 1/10
29/29 [==============================] - 1s 11ms/step - loss: 132.9066 - accuracy: 0.1773 - val_loss: 49.4584 -
val_accuracy: 0.4177
Epoch 2/10
29/29 [==============================] - 0s 5ms/step - loss: 9.2283 - accuracy: 0.7312 - val_loss: 0.9392 -
val_accuracy: 0.8674
Epoch 3/10
29/29 [==============================] - 0s 7ms/step - loss: 1.1626 - accuracy: 0.8722 - val_loss: 0.7767 -
val_accuracy: 0.8674
Epoch 4/10
29/29 [==============================] - 0s 8ms/step - loss: 0.8962 - accuracy: 0.8728 - val_loss: 0.6238 -
val_accuracy: 0.8674
Epoch 5/10
29/29 [==============================] - 0s 7ms/step - loss: 0.6996 - accuracy: 0.8744 - val_loss: 0.5429 -
val_accuracy: 0.8685
Epoch 6/10
29/29 [==============================] - 0s 8ms/step - loss: 0.5973 - accuracy: 0.8761 - val_loss: 0.5306 -
val_accuracy: 0.8685
Epoch 7/10
29/29 [==============================] - 0s 14ms/step - loss: 0.5731 - accuracy: 0.8769 - val_loss: 0.5318 -
val_accuracy: 0.8685
Epoch 8/10
29/29 [==============================] - 0s 16ms/step - loss: 0.5687 - accuracy: 0.8780 - val_loss: 0.5266 -
val_accuracy: 0.8685
Epoch 9/10
29/29 [==============================] - 0s 8ms/step - loss: 0.5616 - accuracy: 0.8791 - val_loss: 0.5201 -
val_accuracy: 0.8685
Epoch 10/10
29/29 [==============================] - 0s 8ms/step - loss: 0.5560 - accuracy: 0.8808 - val_loss: 0.5174 -
val_accuracy: 0.8729

# Prediction on the full data set, thresholded at 0.5
pred_proba = model_bc.predict(x)

pred = []
for proba in pred_proba:
    if proba >= 0.5:
        pred.append(1)
    else:
        pred.append(0)

142/142 [==============================] - 0s 3ms/step


from sklearn.metrics import classification_report

cls = classification_report(y, pred)


print(cls)

              precision    recall  f1-score   support

           0       0.88      0.99      0.94      4000
           1       0.04      0.00      0.00       521

    accuracy                           0.88      4521
   macro avg       0.46      0.50      0.47      4521
weighted avg       0.79      0.88      0.83      4521

from tensorflow.keras import losses


from tensorflow.keras import optimizers

# ANN for multiclass class

import numpy as np
from sklearn import datasets
from tensorflow.keras.utils import to_categorical

iris_ds = datasets.load_iris()

x = iris_ds.data
y = iris_ds.target

y_cat = to_categorical(y, num_classes=3) #one-hot encoding

model_mc = Sequential()
model_mc.add(Dense(10, activation='relu', input_shape=(4,)))
model_mc.add(Dense(5, activation='relu'))
model_mc.add(Dense(3, activation='softmax')) #multiclass classification take softmax as activation

model_mc.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])


model_mc.fit(x, y_cat, epochs=4, batch_size=64, validation_split=0.2)

Epoch 1/4
2/2 [==============================] - 1s 249ms/step - loss: 1.1049 - accuracy: 0.1667 - val_loss: 0.5357 -
val_accuracy: 1.0000
Epoch 2/4
2/2 [==============================] - 0s 58ms/step - loss: 1.0822 - accuracy: 0.1750 - val_loss: 0.5653 -
val_accuracy: 1.0000
Epoch 3/4
2/2 [==============================] - 0s 92ms/step - loss: 1.0599 - accuracy: 0.2250 - val_loss: 0.5945 -
val_accuracy: 1.0000
Epoch 4/4
2/2 [==============================] - 0s 107ms/step - loss: 1.0394 - accuracy: 0.3750 - val_loss: 0.6243 -
val_accuracy: 1.0000


x.shape
np.unique(y)

array([0, 1, 2])

y_pred_proba = model_mc.predict(x)

y_pred = []
for proba in y_pred_proba:
    y_pred.append(np.argmax(proba))

5/5 [==============================] - 0s 2ms/step

print(classification_report(y, y_pred))

              precision    recall  f1-score   support

           0       1.00      0.78      0.88        50
           1       0.00      0.00      0.00        50
           2       0.45      1.00      0.62        50

    accuracy                           0.59       150
   macro avg       0.48      0.59      0.50       150
weighted avg       0.48      0.59      0.50       150

C:\Users\HP\anaconda3\lib\site-packages\sklearn\metrics\_classification.py:1318: UndefinedMetricWarning:
Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division`
parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
C:\Users\HP\anaconda3\lib\site-packages\sklearn\metrics\_classification.py:1318: UndefinedMetricWarning:
Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division`
parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
C:\Users\HP\anaconda3\lib\site-packages\sklearn\metrics\_classification.py:1318: UndefinedMetricWarning:
Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division`
parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
