Assignment 5
Divide the data set into training and test sets. Compare the accuracy of the different classifiers
(Naive Bayes, K-Nearest Neighbors, Decision Tree) under the following situations:
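import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
# Load the Iris data set and keep features and labels separately
iris = load_iris()
X = iris.data
y = iris.target
# Build a DataFrame for inspection; species holds the integer class label
iris_df = pd.DataFrame(X, columns=iris.feature_names)
iris_df["species"] = y
print(iris_df.head(5))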
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  species
0                5.1               3.5                1.4               0.2        0
1                4.9               3.0                1.4               0.2        0
2                4.7               3.2                1.3               0.2        0
3                4.6               3.1                1.5               0.2        0
4                5.0               3.6                1.4               0.2        0
Splitting the data into training and test sets (75% training, 25% test)
# Define classifiers
nb_classifier = GaussianNB()
knn_classifier = KNeighborsClassifier()
dt_classifier = DecisionTreeClassifier()
# Split the data into training and test sets (75% training, 25% test)
X_train_a, X_test_a, y_train_a, y_test_a = train_test_split(
    X, y, test_size=0.25, random_state=42
)
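The comparison asked for above needs each classifier to be fitted on the training portion and scored on the held-out portion. A minimal sketch of that step follows, assuming the same 75/25 split. The `classifiers` dictionary and the `results_a` name are illustrative and not part of the original notebook, and `random_state=42` on the decision tree is an added assumption to make the run repeatable.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Same data and 75/25 split as in the cell above
X, y = load_iris(return_X_y=True)
X_train_a, X_test_a, y_train_a, y_test_a = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Illustrative container for the three models (name is an assumption)
classifiers = {
    "Naive Bayes": GaussianNB(),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
}

results_a = {}
for name, clf in classifiers.items():
    clf.fit(X_train_a, y_train_a)          # learn from the training portion only
    y_pred = clf.predict(X_test_a)         # predict labels for unseen samples
    results_a[name] = accuracy_score(y_test_a, y_pred)
    print(f"{name} Classifier Accuracy: {results_a[name]:.4f}")
```

With only 38 test samples, each misclassification moves the accuracy by about 2.6 percentage points, so small differences between the classifiers on a single split should be read with caution.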
Splitting the data into training and test sets (2/3rd training, 1/3rd test)
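No cell for this split appears in the notebook as exported, so the following is only a sketch of how it might look, assuming `test_size=1/3` and the same `random_state=42` used elsewhere. The `_b` suffix, the `classifiers` dictionary and `results_b` are hypothetical names chosen to mirror the 75/25 case.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# 2/3 of the 150 samples for training, 1/3 for testing
X_train_b, X_test_b, y_train_b, y_test_b = train_test_split(
    X, y, test_size=1 / 3, random_state=42
)
print("Training samples:", len(X_train_b), "Test samples:", len(X_test_b))

# Hypothetical container; random_state on the tree is an added assumption
classifiers = {
    "Naive Bayes": GaussianNB(),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
}

results_b = {}
for name, clf in classifiers.items():
    # score() returns the mean accuracy on the given test data
    results_b[name] = clf.fit(X_train_b, y_train_b).score(X_test_b, y_test_b)
    print(f"{name} Classifier Accuracy: {results_b[name]:.4f}")
```

Compared with the 75/25 split, this leaves fewer samples for training and more for testing, so the accuracy estimate is based on a larger test set but a slightly less informed model.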
# Hold-out method (80% training, 20% test)
X_train_holdout, X_test_holdout, y_train_holdout, y_test_holdout = train_test_split(
    X, y, test_size=0.2, random_state=42
)
# Cross-validation (5-fold, mean accuracy over the folds)
cv_scores_nb = cross_val_score(nb_classifier, X, y, cv=5)
cv_scores_knn = cross_val_score(knn_classifier, X, y, cv=5)
cv_scores_dt = cross_val_score(dt_classifier, X, y, cv=5)
print("\nCross-validation scores:")
print("Naive Bayes Classifier Accuracy:", cv_scores_nb.mean())
print("K-Nearest Neighbors Classifier Accuracy:",
cv_scores_knn.mean())
print("Decision Tree Classifier Accuracy:", cv_scores_dt.mean())
Cross-validation scores:
Naive Bayes Classifier Accuracy: 0.9533333333333334
K-Nearest Neighbors Classifier Accuracy: 0.9733333333333334
Decision Tree Classifier Accuracy: 0.9600000000000002
# Standardize features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
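Fitting the scaler on all of `X` lets the test samples influence the mean and standard deviation used for scaling. One way to avoid this, sketched below under the assumption that the scaled features are meant to be evaluated with the same 5-fold cross-validation, is to wrap the scaler and the classifier in a pipeline so the scaler is refitted on each training fold only. The names `scaled_models` and `cv_scaled` are illustrative and not from the original notebook.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Each pipeline standardizes the features, then applies the classifier.
# Inside cross_val_score the scaler is fitted on the training folds only.
scaled_models = {
    "Naive Bayes": make_pipeline(StandardScaler(), GaussianNB()),
    "K-Nearest Neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Decision Tree": make_pipeline(
        StandardScaler(), DecisionTreeClassifier(random_state=42)
    ),
}

cv_scaled = {}
for name, model in scaled_models.items():
    scores = cross_val_score(model, X, y, cv=5)
    cv_scaled[name] = scores.mean()
    print(f"{name} Classifier Accuracy (scaled):", cv_scaled[name])
```

K-Nearest Neighbors relies on distances between samples, so it is the classifier most likely to change with scaling. Gaussian Naive Bayes and decision trees are largely insensitive to a per-feature rescaling, so their scores would be expected to stay close to the unscaled ones.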