ML Lab Manual Completed
EX.NO:1
EXPLORATION OF REPOSITORY DATASETS AND TOOLS
DATE:
AIM:
To explore UCI and Kaggle repository datasets using tools such as WEKA, RapidMiner, and
Python's scikit-learn, following the steps below, including installation procedures for each tool.
2. Kaggle Datasets
Overview: A platform offering a variety of datasets and competitions for machine learning.
Steps:
Go to Kaggle Datasets.
To download datasets programmatically, you can use Kaggle's API.
Install the Kaggle API:
pip install kaggle
After installation, set up authentication by downloading your API token from your
Kaggle account:
1. Go to your Kaggle account settings.
2. Select "Create New API Token," which downloads a kaggle.json file.
3. Place the kaggle.json file in the ~/.kaggle/ directory.
Download Datasets via Kaggle API:
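A minimal example of the download command; the dataset slug below is a placeholder, so substitute the owner/dataset pair shown on the dataset's Kaggle page:
kaggle datasets download -d <owner>/<dataset-name>
unzip <dataset-name>.zip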
3. WEKA
Load a dataset (e.g., from UCI) by going to Open File in WEKA and selecting a file in
.arff or .csv format.
Use the Explorer panel to apply various ML algorithms like Decision Trees, Naive Bayes,
SVM, etc.
4. RapidMiner
Import datasets by going to the Repository tab and selecting the Import Data option.
Build ML workflows using a visual interface by dragging and dropping data
transformation and ML algorithm components.
5. Python's scikit-learn
Overview: A Python library for machine learning that supports various algorithms for
classification, regression, and clustering.
Installation:
1. First, make sure Python is installed on your system.
To install Python, download it from the Python website and follow the
instructions.
2. Install scikit-learn and other necessary libraries:
pip install numpy pandas scikit-learn matplotlib seaborn
Getting Started:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
# Load dataset
df = pd.read_csv('path_to_dataset.csv')
# Separate features and labels ('target' is an assumed column name; use your dataset's label column)
X = df.drop('target', axis=1)
y = df['target']
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train a random forest classifier
clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train)
# Make predictions
y_pred = clf.predict(X_test)
# Evaluate accuracy
print(f'Accuracy: {accuracy_score(y_test, y_pred)}')
Visualization: You can also visualize the results using libraries like matplotlib and seaborn.
import matplotlib.pyplot as plt
import seaborn as sns
# Confusion matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True)
plt.show()
3. Model Training & Evaluation: Try different ML models, tune hyperparameters, and evaluate
performance (see the sketch after this list).
4. Visualization: Visualize data, model performance, and results using tools like WEKA's built-in
visualization or Python libraries like matplotlib and seaborn.
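For the tuning step above, a minimal sketch using scikit-learn's GridSearchCV; the parameter grid is an assumed example, and the X_train/y_train names carry over from the earlier snippet:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
# search a small, assumed grid of random-forest hyperparameters with 5-fold cross-validation
param_grid = {'n_estimators': [50, 100, 200], 'max_depth': [None, 5, 10]}
grid = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.best_score_)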
RESULT:
Thus the exploration of repository datasets and tools for machine learning experiments was carried out successfully.
EX.NO:2
PERFORM DATA MANIPULATION AND DATA VISUALIZATION
DATE:
AIM:
To perform data manipulation using NumPy and Pandas, and data visualization using Matplotlib.
NumPy Operations
import numpy as np
# 1D array
arr_1d = np.arange(10)
# 2D array
arr_2d = np.arange(1, 10).reshape(3, 3)
# Element-wise operations
arr_1d_add = arr_1d + 5
arr_2d_mul = arr_2d * 2
# Accessing elements
third_element = arr_1d[2]
slice_arr_2d = arr_2d[:2, :2]
Pandas Operations
1. Create a DataFrame:
import pandas as pd
# Data
data = {'Name': ['John', 'Jane', 'Alice', 'Bob'],
'Age': [28, 34, 29, 40],
'Score': [85, 92, 88, 79]}
# DataFrame
df = pd.DataFrame(data)
print(df)
2. Selecting and Filtering:
# Selecting columns
name_score = df[['Name', 'Score']]
# Filtering rows
age_filter = df[df['Age'] > 30]
3. Descriptive Statistics:
Calculate the mean, median, and standard deviation for the 'Score' column.
mean_score = df['Score'].mean()
median_score = df['Score'].median()
std_score = df['Score'].std()
Matplotlib Visualization
1. Line Plot:
import matplotlib.pyplot as plt
# Data
x = np.linspace(0, 10, 100)
y = np.sin(x)
# Plot
plt.plot(x, y)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Sine Wave')
plt.show()
2. Bar Plot:
Create a bar plot using the Pandas DataFrame from Task 2, plotting 'Name' on the x-axis
and 'Score' on the y-axis.
# Bar plot
df.plot(kind='bar', x='Name', y='Score', color='blue')
# Show plot
plt.title('Scores of Students')
plt.show()
3. Scatter Plot:
Create a scatter plot to visualize the relationship between 'Age' and 'Score' in the Pandas
DataFrame.
# Scatter plot
plt.scatter(df['Age'], df['Score'])
plt.xlabel('Age')
plt.ylabel('Score')
plt.title('Age vs Score')
plt.show()
4. Histogram:
Generate a histogram for the 'Age' column from the Pandas DataFrame.
# Histogram
plt.hist(df['Age'], bins=5, color='green', edgecolor='black')
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.title('Histogram of Ages')
plt.show()
1. Simulated Dataset:
Generate a synthetic dataset using NumPy, simulating the heights and weights of 100
people.
Store the data in a Pandas DataFrame.
Plot a scatter plot of height vs weight.
# Simulate heights (cm) and weights (kg); the distribution parameters are assumed values
np.random.seed(42)
height = np.random.normal(170, 10, 100)
weight = np.random.normal(70, 12, 100)
# Create a DataFrame
data = {'Height': height, 'Weight': weight}
df_synthetic = pd.DataFrame(data)
# Scatter plot
plt.scatter(df_synthetic['Height'], df_synthetic['Weight'])
plt.xlabel('Height (cm)')
plt.ylabel('Weight (kg)')
plt.title('Height vs Weight')
plt.show()
RESULT:
Thus data manipulation using NumPy and Pandas, and data visualization using
Matplotlib, were performed successfully.
EX.NO:3
IMPLEMENTATION OF NAIVE BAYES CLASSIFIER
DATE:
AIM:
To diagnose heart patients and predict heart disease using the heart disease dataset with the
Naïve Bayes classifier algorithm.
ALGORITHM:
1. Read the training dataset and separate the records by class.
2. Calculate the mean and standard deviation of the predictor variables in each class.
3. Repeat: calculate the probability of fi using the Gaussian density equation in each class, until the
probabilities of all predictor variables (f1, f2, f3, ..., fn) have been calculated.
4. Multiply the feature probabilities to obtain the likelihood of each class.
5. Assign the class with the greatest likelihood.
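For reference, the Gaussian density used in step 3 is
P(fi | class) = (1 / (sqrt(2π) · σ)) · exp(−(fi − μ)² / (2σ²))
where μ and σ are the mean and standard deviation of fi computed for that class in step 2.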
PROGRAM:
NB_from_scratch.py
import csv
import numpy as np
import warnings
import random
import math
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, f1_score, roc_curve, auc
# convert txt file to csv (the raw data file name is an assumption; adjust to your download)
with open('processed.cleveland.data', 'r') as in_file, open('heartdisease.csv', 'w', newline='') as out_file:
    lines = [line.strip().split(',') for line in in_file]
    writer = csv.writer(out_file)
    writer.writerows(lines)
warnings.filterwarnings("ignore")
def mean(columnvalues):
    s = 0
    n = float(len(columnvalues))
    for i in range(len(columnvalues)):
        s = s + float(columnvalues[i])
    return s / n

def stdev(columnvalues):
    avg = mean(columnvalues)
    s = 0.0
    num = len(columnvalues)
    for i in range(num):
        s = s + pow(float(columnvalues[i]) - avg, 2)
    return math.sqrt(s / float(num - 1))

filename = 'heartdisease.csv'
with open(filename, 'r') as f:
    lines = csv.reader(f)
    dataset = list(lines)
trainsize = int(len(dataset) * 0.8)  # assumed 80/20 train/test split
for z in range(5):
    trainset = []
    testset = list(dataset)
    for i in range(trainsize):
        index = random.randrange(len(testset))
        trainset.append(testset.pop(index))
    # group the training rows by class label (last column)
    classlist = {}
    for i in range(len(trainset)):
        class_num = float(trainset[i][-1])
        row = trainset[i]
        if class_num not in classlist:
            classlist[class_num] = []
        classlist[class_num].append(row)
    # per-class summaries: (mean, stdev) of each of the 13 features
    class_data = {}
    for class_num in classlist:
        rows = classlist[class_num]
        class_datarow = [(mean(column), stdev(column)) for column in zip(*rows)]
        class_datarow = class_datarow[0:13]
        class_data[class_num] = class_datarow
    y_test = []
    for j in range(len(testset)):
        y_test.append(float(testset[j][-1]))
    y_pred = []
    for i in range(len(testset)):
        class_probability = {}
        max_prob = -1.0
        resultant_class = None
        for class_num in class_data:
            class_probability[class_num] = 1
            summaries = class_data[class_num]
            for j in range(len(summaries)):
                calculated_mean, calculated_dev = summaries[j]
                x = float(testset[i][j])
                if (calculated_dev != 0):
                    # Gaussian density of feature j under this class
                    exponent = math.exp(-(math.pow(x - calculated_mean, 2) /
                                          (2 * math.pow(calculated_dev, 2))))
                    probability = (1 / (math.sqrt(2 * math.pi) * calculated_dev)) * exponent
                    class_probability[class_num] *= probability
            if class_probability[class_num] > max_prob:
                max_prob = class_probability[class_num]
                resultant_class = class_num
        y_pred.append(resultant_class)
    # Getting Accuracy
    count = 0
    for i in range(len(testset)):
        if float(testset[i][-1]) == y_pred[i]:
            count += 1
    print("Accuracy:", (count / float(len(testset))) * 100)
cf_matrix = confusion_matrix(y_test, y_pred)
f_score = f1_score(y_test, y_pred, average='macro')
print("\n\n\n\nConfusion Matrix")
print(cf_matrix)
print("\n\n\n\nF1 Score")
print(f_score)
y1 = np.array(y_test)
y_pred1 = np.array(y_pred)
# one-hot encode true and predicted labels for the multi-class ROC curves
y2 = np.zeros(shape=(len(y1), 5))
y3 = np.zeros(shape=(len(y_pred1), 5))
for i in range(len(y1)):
    y2[i][int(y1[i])] = 1
for i in range(len(y_pred1)):
    y3[i][int(y_pred1[i])] = 1
n_classes = 5
fpr = dict()
tpr = dict()
roc_auc = dict()
for i in range(n_classes):
    fpr[i], tpr[i], _ = roc_curve(y2[:, i], y3[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])
# micro-average: pool all classes together
fpr["micro"], tpr["micro"], _ = roc_curve(y2.ravel(), y3.ravel())
roc_auc["micro"] = auc(fpr["micro"], tpr["micro"])
print("\n\n\n\nROC Curve")
lw = 2
# macro-average: interpolate and average the per-class curves
all_fpr = np.unique(np.concatenate([fpr[i] for i in range(n_classes)]))
mean_tpr = np.zeros_like(all_fpr)
for i in range(n_classes):
    mean_tpr += np.interp(all_fpr, fpr[i], tpr[i])
mean_tpr /= n_classes
fpr["macro"] = all_fpr
tpr["macro"] = mean_tpr
roc_auc["macro"] = auc(fpr["macro"], tpr["macro"])
plt.figure()
plt.plot(fpr["micro"], tpr["micro"],
         label='micro-average (area = {0:0.2f})'.format(roc_auc["micro"]),
         linestyle=':', linewidth=4)
plt.plot(fpr["macro"], tpr["macro"],
         label='macro-average (area = {0:0.2f})'.format(roc_auc["macro"]),
         linestyle=':', linewidth=4)
for i in range(n_classes):
    plt.plot(fpr[i], tpr[i], lw=lw,
             label='class {0} (area = {1:0.2f})'.format(i, roc_auc[i]))
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend(loc="lower right")
plt.savefig('Exp-8')
plt.show()
NB_from_Gaussian_Sklearn.py
import csv
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score, roc_curve, auc
# convert txt file to csv (raw file name assumed, as above)
with open('processed.cleveland.data', 'r') as in_file, open('heartdisease.csv', 'w', newline='') as out_file:
    lines = [line.strip().split(',') for line in in_file]
    writer = csv.writer(out_file)
    writer.writerows(lines)
df = pd.read_csv('heartdisease.csv', header=None)
# the first 13 columns are the predictors, the last column is the class label
training_x = df.iloc[:, 0:13]
training_y = df.iloc[:, 13:14]
# print(training_x)
# print(training_y)
x = np.array(training_x)
y = np.array(training_y)
for z in range(5):
    # fresh random 80/20 split on each of the 5 runs
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
    gnb = GaussianNB()
    gnb.fit(x_train, y_train.ravel())
    y_pred = gnb.predict(x_test)
    print("Accuracy:", accuracy_score(y_test, y_pred) * 100)
y1 = y_test.ravel()
y_pred1 = y_pred.ravel()
print("\n\n\n\nConfusion Matrix")
print(cf_matrix)
print("\n\n\n\nF1 Score")
print(f_score)
# one-hot encode true and predicted labels for the multi-class ROC curves
y2 = np.zeros(shape=(len(y1), 5))
y3 = np.zeros(shape=(len(y_pred1), 5))
for i in range(len(y1)):
    y2[i][int(y1[i])] = 1
for i in range(len(y_pred1)):
    y3[i][int(y_pred1[i])] = 1
n_classes = 5
fpr = dict()
tpr = dict()
roc_auc = dict()
for i in range(n_classes):
    fpr[i], tpr[i], _ = roc_curve(y2[:, i], y3[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])
# micro-average: pool all classes together
fpr["micro"], tpr["micro"], _ = roc_curve(y2.ravel(), y3.ravel())
roc_auc["micro"] = auc(fpr["micro"], tpr["micro"])
print("\n\n\n\nROC Curve")
lw = 2
# macro-average: interpolate and average the per-class curves
all_fpr = np.unique(np.concatenate([fpr[i] for i in range(n_classes)]))
mean_tpr = np.zeros_like(all_fpr)
for i in range(n_classes):
    mean_tpr += np.interp(all_fpr, fpr[i], tpr[i])
mean_tpr /= n_classes
fpr["macro"] = all_fpr
tpr["macro"] = mean_tpr
roc_auc["macro"] = auc(fpr["macro"], tpr["macro"])
plt.figure()
plt.plot(fpr["micro"], tpr["micro"],
         label='micro-average (area = {0:0.2f})'.format(roc_auc["micro"]),
         linestyle=':', linewidth=4)
plt.plot(fpr["macro"], tpr["macro"],
         label='macro-average (area = {0:0.2f})'.format(roc_auc["macro"]),
         linestyle=':', linewidth=4)
for i in range(n_classes):
    plt.plot(fpr[i], tpr[i], lw=lw,
             label='class {0} (area = {1:0.2f})'.format(i, roc_auc[i]))
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend(loc="lower right")
plt.show()
OUTPUT:
RESULT:
Thus diagnosing heart patients and predicting heart disease using the heart disease dataset with the
Naïve Bayes classifier algorithm was implemented successfully.
EX.NO:4
IMPLEMENTATION OF LINEAR MODELS
DATE:
AIM:
To implement linear models such as locally weighted linear regression and plot the necessary
graphs.
ALGORITHM:
1. Read the given data sample to X and the curve (linear or non-linear) to Y.
6. Prediction = x0 * β.
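For reference, locally weighted regression computes, for each query point x0, a weighted least-squares fit
β = (XᵀWX)⁻¹ XᵀWy,  prediction = x0 · β
where W is a diagonal matrix of weights that decay with distance from x0; the program below uses the tricube kernel w = (1 − |d|³)³ of LOWESS for these weights.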
PROGRAM:
import numpy as np
from math import ceil
from scipy import linalg

def lowess(x, y, f, iterations):
    # robust locally weighted regression (LOWESS)
    n = len(x)
    r = int(ceil(f * n))
    # distance to the r-th nearest neighbour of each point
    h = [np.sort(np.abs(x - x[i]))[r] for i in range(n)]
    w = np.clip(np.abs((x[:, None] - x[None, :]) / h), 0.0, 1.0)
    w = (1 - w ** 3) ** 3  # tricube weights
    yest = np.zeros(n)
    delta = np.ones(n)
    for iteration in range(iterations):
        for i in range(n):
            weights = delta * w[:, i]
            b = np.array([np.sum(weights * y), np.sum(weights * y * x)])
            A = np.array([[np.sum(weights), np.sum(weights * x)],
                          [np.sum(weights * x), np.sum(weights * x * x)]])
            beta = linalg.solve(A, b)
            yest[i] = beta[0] + beta[1] * x[i]
        # robustifying weights damp the influence of outliers
        residuals = y - yest
        s = np.median(np.abs(residuals))
        delta = np.clip(residuals / (6.0 * s), -1, 1)
        delta = (1 - delta ** 2) ** 2
    return yest
import math
import matplotlib.pyplot as plt
n = 100
x = np.linspace(0, 2 * math.pi, n)
y = np.sin(x) + 0.3 * np.random.randn(n)  # noisy sine curve (noise level assumed)
f = 0.25
iterations = 3
yest = lowess(x, y, f, iterations)
plt.plot(x, y, "r.")
plt.plot(x, yest, "b-")
plt.show()
OUTPUT:
RESULT:
Thus the implementation of linear models such as locally weighted linear regression, with the
necessary graphs plotted, was executed successfully.
EX.NO:5
IMPLEMENT MULTI-LAYER PERCEPTRON ALGORITHM
DATE:
AIM:
To implement the multi layer perceptron algorithm for the specified data.
ALGORITHM:
Step 1: Import the required modules.
Step 2: Download the dataset. TensorFlow allows us to read the MNIST dataset and we can load it
directly in the program as a train and test dataset.
Step 3: Build and compile a sequential model, train it, and evaluate it on the test set.
PROGRAM:
# importing modules
import tensorflow as tf
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Activation
import matplotlib.pyplot as plt
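Loading the dataset described in Step 2, a minimal sketch (normalising by the usual 0-255 pixel range):
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
# scale pixel values from 0-255 to 0-1
x_train, x_test = x_train / 255.0, x_test / 255.0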
OUTPUT:
model = Sequential([
    # flatten the 28x28 input images into a 784-length vector
    Flatten(input_shape=(28, 28)),
    # dense layer 1
    Dense(256, activation='sigmoid'),
    # dense layer 2
    Dense(128, activation='sigmoid'),
    # output layer
    Dense(10, activation='sigmoid'),
])
model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
# Train the model (epoch count and batch size are assumed values)
model.fit(x_train, y_train, epochs=10, batch_size=2000, validation_split=0.2)
OUTPUT:
results = model.evaluate(x_test, y_test, verbose = 0)
print('test loss, test acc:', results)
OUTPUT:
RESULT:
Thus the multi layer perceptron algorithm for the specified data has been executed and output is
verified successfully.
EX.NO:6
IMPLEMENTATION OF KNN ALGORITHM
DATE:
AIM:
To implement the K-Nearest Neighbors (KNN) algorithm for classifying a given dataset.
ALGORITHM:
1. Generate (or load) the dataset and split it into training and test sets.
2. Standardize the features.
3. Fit a KNN classifier with K=5 and make predictions on the test set.
4. Calculate the accuracy and, optionally, refit after PCA dimensionality reduction.
PROGRAM:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
from sklearn.decomposition import PCA
np.random.seed(42)
data_size = 1000
X = np.random.rand(data_size, 2)  # 2 features
y = (X[:, 0] + X[:, 1] > 1).astype(int)  # synthetic binary label (assumed rule)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Initialize KNN with K=5 (You can change this based on experimentation)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
for i in range(len(X_test)):
    # predict one sample at a time to mimic real-time classification
    pred = knn.predict(X_test[i].reshape(1, -1))
    print(f"Real-time Prediction for Data Point {i}: {pred[0]} (True Label: {y_test[i]})")
y_pred = knn.predict(X_test)
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")
knn.fit(X_train, y_train)
# Reduce the feature space with PCA before refitting
pca = PCA(n_components=2)
X_train_reduced = pca.fit_transform(X_train)
X_test_reduced = pca.transform(X_test)
knn.fit(X_train_reduced, y_train)
# Use multiple cores for parallel distance calculations
knn = KNeighborsClassifier(n_neighbors=5, n_jobs=-1)
knn.fit(X_train, y_train)
OUTPUT:
RESULT:
Thus the KNN algorithm for the given dataset is executed successfully.
EX.NO:7
IMPLEMENTATION OF SUPPORT VECTOR MACHINE
DATE:
AIM:
To create a machine learning model which classifies the Spam and Ham E-Mails from
a given dataset using the Support Vector Machine algorithm.
ALGORITHM:
1. Import the required libraries.
2. Read the given csv file which contains the emails which are both spam and ham.
3. Gather all the words given in that dataset and identify the stop words and their distribution.
4. Create an ML model using the Support Vector Classifier after splitting the dataset into training and
test set.
5. Display the accuracy and f1 score and print the confusion matrix for the classification of spam and
ham.
PROGRAM:
import pandas as pd
import numpy as np
import string
import os
import matplotlib.pyplot as plt
from matplotlib import pyplot
import seaborn as sns
from wordcloud import WordCloud, STOPWORDS
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn.metrics import (confusion_matrix, classification_report, f1_score, recall_score,
                             precision_score, precision_recall_curve, ConfusionMatrixDisplay)
class data_read_write(object):
    def __init__(self, file_link=None):
        self.file_link = file_link
    def read_csv_file(self, file_link):
        self.data_frame = pd.read_csv(file_link)
        return self.data_frame
    def write_to_csvfile(self, file_link):
        self.data_frame.to_csv(file_link, index=False)
        return
class generate_word_cloud(data_read_write):
    def __init__(self):
        pass
    def variance_column(self, data):
        return np.var(data)
    def word_cloud(self, data_frame_column, output_image_file):
        text = " ".join(msg for msg in data_frame_column)
        stopwords = set(STOPWORDS)
        stopwords.update(["subject"])
        wordcloud = WordCloud(stopwords=stopwords,
                              background_color="white").generate(text)
        plt.imshow(wordcloud, interpolation='bilinear')
        plt.axis("off")
        plt.savefig("Distribution.png")
        plt.show()
        wordcloud.to_file(output_image_file)
        return
class data_cleaning(data_read_write):
    def __init__(self):
        pass
    def message_cleaning(self, message):
        # remove punctuation, then drop stop words
        Test_punc_removed = [char for char in message if char not in string.punctuation]
        Test_punc_removed_join = ''.join(Test_punc_removed)
        final_join = [word for word in Test_punc_removed_join.split()
                      if word.lower() not in stopwords.words('english')]
        return final_join
    def apply_to_column(self, data_column_text):
        data_processed = data_column_text.apply(self.message_cleaning)
        return data_processed
class apply_embeddding_and_model(data_read_write):
    def __init__(self):
        pass
    def apply_count_vector(self, v_data_column):
        # clean_text already holds token lists, so the analyzer just passes them through
        vectorizer = CountVectorizer(min_df=1, analyzer=lambda tokens: tokens,
                                     preprocessor=None, stop_words=None)
        return vectorizer.fit_transform(v_data_column)
    def apply_svm(self, X, y):
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
        svm_cv = svm.SVC(kernel='linear', probability=True)
        svm_cv.fit(X_train, y_train)
        y_predict_test = svm_cv.predict(X_test)
        cm = confusion_matrix(y_test, y_predict_test)
        sns.heatmap(cm, annot=True)
        print(classification_report(y_test, y_predict_test))
        print("test set")
        print("F1 Score: {}".format(f1_score(y_test, y_predict_test)))
        print("Recall: {}".format(recall_score(y_test, y_predict_test)))
        print("Precision: {}".format(precision_score(y_test, y_predict_test)))
        # normalised confusion matrix plot
        class_names = ['ham', 'spam']
        title = "Normalized confusion matrix"
        disp = ConfusionMatrixDisplay.from_estimator(svm_cv, X_test, y_test,
                                                     display_labels=class_names,
                                                     cmap=plt.cm.Blues,
                                                     normalize='true')
        disp.ax_.set_title(title)
        print(title)
        print(disp.confusion_matrix)
        plt.savefig("SVM.png")
        plt.show()
        # precision-recall curve for the positive (spam) class
        lr_probs = svm_cv.predict_proba(X_test)
        lr_probs = lr_probs[:, 1]
        lr_precision, lr_recall, _ = precision_recall_curve(y_test, lr_probs)
        pyplot.plot(lr_recall, lr_precision, marker='.', label='SVM')
        pyplot.legend()
        pyplot.savefig("SVMMat.png")
        pyplot.show()
        return
data_obj = data_read_write("emails.csv")
data_frame = data_obj.read_csv_file("processed.csv")
data_frame.head()
data_frame.tail()
data_frame.describe()
data_frame.info()
data_frame.head()
data_frame.groupby('spam').describe()
data_frame['length'] = data_frame['text'].apply(len)
data_frame['length'].max()
sns.set(rc={'figure.figsize':(11.7,8.27)})
ham_messages_length = data_frame[data_frame['spam']==0]
spam_messages_length = data_frame[data_frame['spam']==1]
# distribution of message lengths per class (distplot assumed, matching the later plots)
sns.distplot(ham_messages_length['length'], label='Ham')
sns.distplot(spam_messages_length['length'], label='Spam')
plt.legend()
# number of words per message for each class
ham_words_length = [len(word_tokenize(msg)) for msg in
                    data_frame[data_frame['spam']==0].text.values]
spam_words_length = [len(word_tokenize(msg)) for msg in
                     data_frame[data_frame['spam']==1].text.values]
print(max(ham_words_length))
print(max(spam_words_length))
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.distplot(ham_words_length, label='Ham')
sns.distplot(spam_words_length, label='Spam')
plt.xlabel('Number of Words')
plt.legend()
plt.savefig("SVMGraph.png")
plt.show()
def mean_word_length(x):
    word_lengths = np.array([])
    for word in word_tokenize(x):
        word_lengths = np.append(word_lengths, len(word))
    return word_lengths.mean()
ham_meanword_length = data_frame[data_frame['spam']==0].text.apply(mean_word_length)
spam_meanword_length = data_frame[data_frame['spam']==1].text.apply(mean_word_length)
# distribution of mean word lengths per class (distplot assumed)
sns.distplot(ham_meanword_length, label='Ham')
sns.distplot(spam_meanword_length, label='Spam')
plt.legend()
plt.savefig("Graph.png")
plt.show()
stop_words = set(stopwords.words('english'))
def stop_words_ratio(x):
    num_total_words = 0
    num_stop_words = 0
    for word in word_tokenize(x):
        if word in stop_words:
            num_stop_words += 1
        num_total_words += 1
    return num_stop_words / num_total_words

ham_stopwords = data_frame[data_frame['spam'] == 0].text.apply(stop_words_ratio)
spam_stopwords = data_frame[data_frame['spam'] == 1].text.apply(stop_words_ratio)
sns.distplot(ham_stopwords, label='Ham')
sns.distplot(spam_stopwords, label='Spam')
plt.legend()
ham = data_frame[data_frame['spam']==0]
spam = data_frame[data_frame['spam']==1]
spam['length'].plot(bins=60, kind='hist')
ham['length'].plot(bins=60, kind='hist')
data_clean_obj = data_cleaning()
data_frame['clean_text'] = data_clean_obj.apply_to_column(data_frame['text'])
data_frame.head()
data_obj.data_frame.head()
data_obj.write_to_csvfile("processed_file.csv")
cv_object = apply_embeddding_and_model()
spamham_countvectorizer = cv_object.apply_count_vector(data_frame['clean_text'])
X = spamham_countvectorizer
label = data_frame['spam'].values
y = label
cv_object.apply_svm(X,y)
OUTPUT:
test set
F1 Score: 0.9776119402985075
Recall: 0.9739776951672863
Precision: 0.9812734082397003
[[0.99429875 0.00570125]
[0.0260223 0.9739777 ]]
OUTPUT:
RESULT:
Thus the program to create a machine learning model which classifies the Spam and Ham E-
Mails from a given dataset using Support Vector Machine algorithm has been successfully executed.
EX.NO:8
IMPLEMENTATION OF DECISION TREE
DATE:
AIM:
To implement the concept of decision trees with suitable dataset from real world problems.
ALGORITHM:
1. Import the required modules and read the Iris dataset.
2. Drop the Id column and separate the feature and target variables.
3. Split the dataset into training and test sets.
4. Train a Decision Tree classifier and evaluate its accuracy.
5. Visualize the tree using graphviz.
PROGRAM:
#Import Modules
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
#Create Dataframe
iris_df = pd.read_csv('/content/Iris.csv')
#Display First five rows of dataframe
iris_df.head(5)
#Drop Id Column
iris_df.drop("Id",axis=1,inplace=True)
#To check Number of rows and Columns
iris_df.shape
(150, 5)
#Create X and y variables
feature_cols = ['SepalLengthCm','SepalWidthCm','PetalLengthCm','PetalWidthCm']
X = iris_df.drop('Species', axis=1) # Features
y = iris_df['Species'] # Target variable
#Print X(Feature variable)
X.head()
#Split dataset into training set and test set (80/20; random_state is an assumed value)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
#Train the Decision Tree classifier
decision_tree = DecisionTreeClassifier()
decision_tree.fit(X_train, y_train)
y_test_pred = decision_tree.predict(X_test)
#Classification report
from sklearn.metrics import classification_report
print(classification_report(y_test, y_test_pred))
accuracy 1.00 30
macro avg 1.00 1.00 1.00 30
weighted avg 1.00 1.00 1.00 30
accuracy_dt = accuracy_score(y_test, y_test_pred)
print('Decision Tree Accuracy:', accuracy_dt)
Decision Tree Accuracy: 1.0
from sklearn.tree import export_graphviz
from io import StringIO
from IPython.display import Image
import pydotplus
dot_data = StringIO()
export_graphviz(decision_tree, out_file=dot_data,
filled=True, rounded=True,max_depth=3,
special_characters=True,feature_names = feature_cols,class_names=['Iris-setosa','Iris-
versicolor','Iris-virginica'])
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
graph.write_png('tree.png')
Image(graph.create_png())
OUTPUT:
RESULT:
Thus the concept of decision trees with a suitable dataset from real-world problems is
implemented successfully.
EX.NO:9
IMPLEMENTATION OF K MEANS CLUSTERING ALGORITHM
DATE:
AIM:
To implement the K-Means clustering algorithm for the given dataset.
ALGORITHM:
Step 1: Import the required libraries.
Step 2: Load the Iris dataset.
Step 3: Select the features for clustering.
Step 4: Initialize and fit the K-Means model.
Step 5: Evaluate the model.
Step 6: Visualize the results.
PROGRAM:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
iris = load_iris()
# use the first two features (sepal length and sepal width) for 2-D clustering
X = pd.DataFrame(iris.data, columns=iris.feature_names).iloc[:, :2]
# Initialize KMeans model with 3 clusters (since we know there are 3 classes in the Iris dataset)
kmeans = KMeans(n_clusters=3, random_state=0)
kmeans.fit(X)
# Step 5: Evaluate the Model
# Get cluster labels
labels = kmeans.labels_
# Get cluster centers
centers = kmeans.cluster_centers_
# Get inertia (sum of squared distances to nearest cluster center)
inertia = kmeans.inertia_
print("Cluster Centers:\n", centers)
print("Inertia:", inertia)
Cluster Centers:
[[5.77358491 2.69245283]
[6.81276596 3.07446809]
[5.006 3.428 ]]
Inertia: 37.0507021276596
# Step 6: Visualize the Results
plt.scatter(X.iloc[:, 0], X.iloc[:, 1], c=labels, cmap='viridis', marker='o', edgecolor='k')
plt.scatter(centers[:, 0], centers[:, 1], c='red', marker='x', s=100, label='Centers')
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Sepal Width (cm)')
plt.title('K-Means Clustering on Iris Dataset')
plt.legend()
plt.show()
OUTPUT:
RESULT:
Thus K-Means clustering algorithm for the given data set is executed successfully.
EX.NO:10
IMPLEMENTATION OF GENETIC OPERATORS AND Q-LEARNING
DATE:
AIM:
To implement genetic operators and Q-learning for the given data.
ALGORITHM:
GENETIC ALGORITHM:
1. Initialize a population of candidate solutions (here, K values for a KNN classifier).
2. Evaluate each candidate with a fitness function (test-set accuracy).
3. Select the fittest half, create children by crossover, and mutate a few individuals.
4. Repeat for a fixed number of generations and return the best individual.
Q-LEARNING ALGORITHM:
1. Initialization:
Initialize a Q-table: This is a table where rows represent the state (e.g., feature set), and
columns represent actions (e.g., predict 0 or 1).
Each state-action pair in the table holds a Q-value representing the expected reward of
taking that action in that state.
2. Action Selection:
For each new real-time data point, based on the current state, select an action (e.g.,
classify as 0 or 1).
Use an exploration-exploitation strategy like epsilon-greedy (randomly choose an action
with probability ϵ, or take the best-known action based on Q-values with probability 1−ϵ).
3. Q-value Update:
After taking an action and receiving feedback (reward), update the Q-value using the
Bellman Equation:
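Q(s, a) ← Q(s, a) + α [ r + γ · max over a' of Q(s', a') − Q(s, a) ]
where s is the current state, a the action taken, r the reward received, s' the next state, α the learning rate, and γ the discount factor.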
PROGRAM:
import random
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
# assumes X_train, X_test, y_train, y_test already exist (e.g. an earlier train/test split);
# the GA searches for the best K of a KNN classifier
def fitness_function(k):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    return accuracy_score(y_test, y_pred)
population_size = 10
population = [random.randint(1, 25) for _ in range(population_size)]
for generation in range(10):
    sorted_population = sorted(population, key=fitness_function, reverse=True)
    population = sorted_population[:population_size // 2]  # selection
    while len(population) < population_size:  # crossover
        parent1 = random.choice(population[:3])
        parent2 = random.choice(population[:3])
        population.append((parent1 + parent2) // 2)
    # Mutation: Randomly change some individuals (add some randomness to prevent local minima)
    for i in range(len(population)):
        if random.random() < 0.1:
            population[i] = random.randint(1, 25)
best_k = sorted(population, key=fitness_function, reverse=True)[0]
best_score = fitness_function(best_k)
print("Best K:", best_k, "Best score:", best_score)
OUTPUT:
import numpy as np
# Q-learning implementation
# assumes X_train, y_train already exist; the first feature is binned (and clipped) into discrete states
n_states = 10   # number of discrete states after binning
n_actions = 2   # actions: predict 0 or 1 for binary classification
# Initialize Q-table with zeros (n_states = number of states, n_actions = 0 or 1 for binary classification)
Q_table = np.zeros((n_states, n_actions))
alpha = 0.1     # learning rate
gamma = 0.9     # discount factor
epsilon = 0.1   # exploration rate
def get_state(x):
    # map a feature vector to a discrete state by binning its first feature
    return int(np.clip(x[0] * n_states, 0, n_states - 1))
def q_learning(X_train, y_train):
    for i in range(len(X_train)):
        state = get_state(X_train[i])
        # epsilon-greedy action selection
        if np.random.rand() < epsilon:
            action = np.random.choice(n_actions)
        else:
            action = np.argmax(Q_table[state])
        # reward: +1 for a correct classification, -1 otherwise
        reward = 1 if action == y_train[i] else -1
        # Observe the new state (same as state since we're using static data)
        new_state = state
        # Bellman update
        Q_table[state, action] += alpha * (reward + gamma * np.max(Q_table[new_state]) - Q_table[state, action])
    return Q_table
q_table = q_learning(X_train, y_train)
print("Final Q-table:")
print(q_table)
OUTPUT:
Final Q-table:
[[ 0.15 0.08]
[ 0.05 0.18]
[ 0.12 0.20]
[ 0.25 0.10]
[ 0.30 0.12]
[ 0.45 0.22]
[ 0.55 0.10]
[ 0.62 0.35]
[ 0.50 0.40]
[ 0.30 0.60]]
RESULT:
Thus the implementation of genetic operators and Q-learning for the given data is executed
successfully.
EX.NO:11
BUILD SUPERVISED AND UNSUPERVISED MODEL
DATE:
AIM:
To build a supervised and unsupervised model for an appropriate dataset.
ALGORITHM:
Step 1: Import libraries
Step 2: Load the Iris dataset
Step 3: Split the dataset into training and testing sets
Step 4: Train and evaluate a Decision Tree classifier (supervised model)
Step 5: Fit a K-Means clustering model (unsupervised model) and evaluate it with the silhouette score
PROGRAM:
import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
iris = datasets.load_iris()
X = iris.data # Features (sepal length, sepal width, petal length, petal width)
y = iris.target # Labels (species of iris flowers)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
classification_rep = classification_report(y_test, y_pred)
print(f"Accuracy: {accuracy}")
print("Confusion Matrix:")
print(conf_matrix)
print("Classification Report:")
print(classification_rep)
OUTPUT:
Accuracy: 1.0
Confusion Matrix:
[[10 0 0]
[ 0 9 1]
[ 0 0 10]]
Classification Report:
accuracy 0.95 30
macro avg 0.97 0.97 0.97 30
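The unsupervised model of Step 5, a minimal sketch (three clusters to match the Iris species; random_state is an assumed value):
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
# fit K-Means on the same feature matrix and evaluate cluster cohesion
kmeans = KMeans(n_clusters=3, random_state=42)
cluster_labels = kmeans.fit_predict(X)
print("Cluster Labels:", cluster_labels)
print(f"Silhouette Score: {silhouette_score(X, cluster_labels):.2f}")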
OUTPUT:
Cluster Labels: [0 1 1 0 0 1 1 0 1 2 2 2 0 0 0 1 2 2 0 1 1 0 2 1 2 0 1 2 0 2 2 2 1 1 0 1 2 1 1 0 2 1 0 2 0 1 1
2 1 0]
Silhouette Score: 0.61
RESULT:
Thus the building of supervised and unsupervised models for an appropriate dataset is
implemented successfully.