
ML Lab Manual Completed

Uploaded by

javeedakthar003
Copyright
© All Rights Reserved

EX.NO:1
EXPLORATION OF REPOSITORY DATASETS AND TOOLS
DATE:

AIM:

To explore UCI and Kaggle datasets using tools such as WEKA, RapidMiner, and Python's
scikit-learn, including the installation procedure for each tool.

THEORY AND INSTALLATION:

1. UCI Machine Learning Repository

• Overview: A collection of datasets for machine learning research.
• Access:
  • No installation is required; simply go to the UCI ML Repository website.
  • Download datasets in CSV, ARFF, or other formats directly from the repository.

2. Kaggle Datasets

• Overview: A platform offering a variety of datasets and competitions for machine learning.
• Steps:
  • Go to Kaggle Datasets.
  • To download datasets programmatically, you can use Kaggle's API.
• Install Kaggle API:

pip install kaggle

o After installation, set up authentication by downloading your API token from your
Kaggle account:
1. Go to your Kaggle account settings.
2. Select "Create New API Token," which downloads a kaggle.json file.
3. Place the kaggle.json file in the ~/.kaggle/ directory (on Windows, C:\Users\<username>\.kaggle\).
• Download Datasets via Kaggle API (the dataset is identified as <owner>/<dataset-name>):

kaggle datasets download -d <owner>/<dataset-name>
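The Kaggle command above normally saves the dataset as a .zip archive in the current directory. A minimal sketch of inspecting such an archive with the Python standard library follows. The function name preview_csv_in_zip and the demo archive are hypothetical and only stand in for a real download.

```python
import csv
import io
import os
import tempfile
import zipfile

def preview_csv_in_zip(zip_path, n_rows=3):
    """Return (member name, header, first rows) of the first CSV in the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        csv_names = [name for name in archive.namelist() if name.lower().endswith(".csv")]
        if not csv_names:
            raise ValueError("no CSV file found in " + str(zip_path))
        with archive.open(csv_names[0]) as raw:
            reader = csv.reader(io.TextIOWrapper(raw, encoding="utf-8", newline=""))
            header = next(reader)
            rows = [row for _, row in zip(range(n_rows), reader)]
    return csv_names[0], header, rows

# Demo archive (hypothetical stand-in for a file downloaded with the Kaggle CLI)
demo_zip = os.path.join(tempfile.mkdtemp(), "demo-dataset.zip")
with zipfile.ZipFile(demo_zip, "w") as archive:
    archive.writestr("data.csv", "a,b\n1,2\n3,4\n")

name, header, rows = preview_csv_in_zip(demo_zip)
print(name, header, rows)
```

For a real download, pass the path of the archive written by the Kaggle CLI instead of demo_zip.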

3. WEKA

• Overview: A GUI-based tool for machine learning and data mining.
• Installation:
1. Download the latest version of WEKA from the official WEKA website.
2. Run the installer and follow the instructions.
3. After installation, launch WEKA by running the application.
• Getting Started:
  • Load a dataset (e.g., from UCI) by going to Open File in WEKA and selecting a file in .arff or .csv format.
  • Use the Explorer panel to apply various ML algorithms like Decision Trees, Naive Bayes, SVM, etc.
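WEKA's native .arff format is plain text: a relation name, one @attribute line per column, and a @data section with comma-separated rows. The sketch below builds such text for all-numeric columns. The helper name rows_to_arff and the sample values are illustrative assumptions, and nominal or string attributes are not handled.

```python
def rows_to_arff(relation, attribute_names, rows):
    """Build ARFF text for a table whose columns are all numeric."""
    lines = ["@relation " + relation, ""]
    lines += ["@attribute " + name + " numeric" for name in attribute_names]
    lines += ["", "@data"]
    lines += [",".join(str(value) for value in row) for row in rows]
    return "\n".join(lines) + "\n"

# Hypothetical two-column dataset
arff_text = rows_to_arff("demo", ["height", "weight"], [[165.0, 70.0], [172.5, 81.2]])
print(arff_text)
```

Saving arff_text to a file with the .arff extension gives a file that can be opened from WEKA's Open File dialog.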

4. RapidMiner

• Overview: A drag-and-drop platform for data science workflows.
• Installation:
1. Download the free version from the RapidMiner website.
2. Install the software by following the setup instructions.
3. Once installed, launch RapidMiner Studio.
• Getting Started:
  • Import datasets by going to the Repository tab and selecting the Import Data option.
  • Build ML workflows using a visual interface by dragging and dropping data transformation and ML algorithm components.

5. Python with scikit-learn

• Overview: A Python library for machine learning that supports various algorithms for classification, regression, and clustering.
• Installation:
1. First, make sure Python is installed on your system.
  • To install Python, download it from the Python website and follow the instructions.
2. Install scikit-learn and other necessary libraries:

pip install numpy pandas scikit-learn matplotlib seaborn

• Getting Started:
  • After installation, create a Python script or Jupyter Notebook.
  • Import datasets, preprocess, and apply ML algorithms. Example code:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Load dataset
df = pd.read_csv('path_to_dataset.csv')

# Preprocess (split into features and target)
X = df.drop('target_column', axis=1)
y = df['target_column']

# Split into training and testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Train a RandomForest model
clf = RandomForestClassifier()
clf.fit(X_train, y_train)

# Make predictions
y_pred = clf.predict(X_test)

# Evaluate accuracy
print(f'Accuracy: {accuracy_score(y_test, y_pred)}')

• Visualization: You can also visualize the results using libraries like matplotlib and seaborn.

import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix

# Confusion matrix (rows: true labels, columns: predicted labels)
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt='d')
plt.xlabel('Predicted label')
plt.ylabel('True label')
plt.show()

Practical Workflow Recap

1. Dataset Selection: Choose a dataset from UCI or Kaggle.
2. Tool Selection:
  • For GUI-based experiments: Use WEKA or RapidMiner.
  • For code-based exploration: Use Python with scikit-learn.
3. Model Training & Evaluation: Try different ML models, tune hyperparameters, and evaluate performance.
4. Visualization: Visualize data, model performance, and results using tools like WEKA's built-in visualization or Python libraries like matplotlib and seaborn.

RESULT:

Thus the UCI and Kaggle dataset repositories and the WEKA, RapidMiner, and scikit-learn tools were explored successfully.
EX.NO:2
PERFORM DATA MANIPULATION AND DATA VISUALIZATION
DATE:

AIM:

To perform data manipulation using NumPy and Pandas, and data visualization using Matplotlib.

Task 1: NumPy Operations

1. Create and Manipulate Arrays:

• Create a 1D array of numbers from 0 to 9 using NumPy.
• Create a 2D array (3x3) with numbers ranging from 1 to 9.
• Perform element-wise addition, subtraction, and multiplication on these arrays.

import numpy as np

# 1D array
arr_1d = np.arange(10)

# 2D array
arr_2d = np.arange(1, 10).reshape(3, 3)

# Element-wise operations
arr_1d_add = arr_1d + 5
arr_1d_sub = arr_1d - 5
arr_2d_mul = arr_2d * 2

print("1D Array:", arr_1d)
print("2D Array:\n", arr_2d)
print("1D Array after addition:\n", arr_1d_add)
print("1D Array after subtraction:\n", arr_1d_sub)
print("2D Array after multiplication:\n", arr_2d_mul)

2. Indexing and Slicing:

• Access the third element in the 1D array.
• Slice the first two rows and columns from the 2D array.

# Accessing elements (indexing starts at 0, so index 2 is the third element)
third_element = arr_1d[2]
slice_arr_2d = arr_2d[:2, :2]

print("Third element of 1D array:", third_element)
print("Slice of 2D array:\n", slice_arr_2d)
Task 2: Data Manipulation with Pandas

1. Create a DataFrame:

• Create a Pandas DataFrame from the following data:
  • Names: 'John', 'Jane', 'Alice', 'Bob'
  • Ages: 28, 34, 29, 40
  • Scores: 85, 92, 88, 79

import pandas as pd

# Data
data = {'Name': ['John', 'Jane', 'Alice', 'Bob'],
        'Age': [28, 34, 29, 40],
        'Score': [85, 92, 88, 79]}

# DataFrame
df = pd.DataFrame(data)

print(df)

2. Data Selection and Filtering:

• Select the 'Name' and 'Score' columns.
• Filter the DataFrame to show only rows where the age is greater than 30.

# Selecting columns
name_score = df[['Name', 'Score']]

# Filtering rows
age_filter = df[df['Age'] > 30]

print("Names and Scores:\n", name_score)
print("Filtered Data (Age > 30):\n", age_filter)

3. Descriptive Statistics:

• Calculate the mean, median, and standard deviation for the 'Score' column.

mean_score = df['Score'].mean()
median_score = df['Score'].median()
std_score = df['Score'].std()  # sample standard deviation (divides by n - 1)

print(f"Mean Score: {mean_score}")
print(f"Median Score: {median_score}")
print(f"Standard Deviation of Score: {std_score}")
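As a cross-check of the descriptive statistics, the same three values can be computed with Python's standard statistics module. This is only an illustrative sketch using the scores from the DataFrame above. It also shows why pandas and NumPy can disagree on the standard deviation: pandas' std() divides by n - 1 by default, while NumPy's std() divides by n.

```python
import statistics

scores = [85, 92, 88, 79]

mean_score = statistics.mean(scores)         # arithmetic mean
median_score = statistics.median(scores)     # middle of the sorted values
sample_std = statistics.stdev(scores)        # divides by n - 1, like pandas' default std()
population_std = statistics.pstdev(scores)   # divides by n, like NumPy's default std()

print(mean_score, median_score, sample_std, population_std)
```

The sample value is the one that df['Score'].std() reports.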
Task 3: Data Visualization using Matplotlib

1. Line Plot:

• Create a NumPy array x from 0 to 10 with 100 equally spaced points.
• Plot y = sin(x) using Matplotlib and label the axes and the title.

import matplotlib.pyplot as plt
import numpy as np

# Data
x = np.linspace(0, 10, 100)
y = np.sin(x)

# Plot
plt.plot(x, y)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Sine Wave')
plt.show()

2. Bar Plot:

• Create a bar plot using the Pandas DataFrame from Task 2, plotting 'Name' on the x-axis and 'Score' on the y-axis.

# Bar plot
df.plot(kind='bar', x='Name', y='Score', color='blue')

# Show plot
plt.title('Scores of Students')
plt.show()

3. Scatter Plot:

• Create a scatter plot to visualize the relationship between 'Age' and 'Score' in the Pandas DataFrame.

# Scatter plot
plt.scatter(df['Age'], df['Score'])
plt.xlabel('Age')
plt.ylabel('Score')
plt.title('Age vs Score')
plt.show()

4. Histogram:

• Generate a histogram for the 'Age' column from the Pandas DataFrame.

# Histogram
plt.hist(df['Age'], bins=5, color='green', edgecolor='black')
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.title('Histogram of Ages')
plt.show()

Task 4: Combination Exercise

1. Simulated Dataset:

• Generate a synthetic dataset using NumPy, simulating the heights and weights of 100 people.
• Store the data in a Pandas DataFrame.
• Plot a scatter plot of height vs weight.

# Generate synthetic data (heights and weights are drawn independently)
height = np.random.normal(165, 10, 100)
weight = np.random.normal(70, 15, 100)

# Create a DataFrame
data = {'Height': height, 'Weight': weight}
df_synthetic = pd.DataFrame(data)

# Scatter plot
plt.scatter(df_synthetic['Height'], df_synthetic['Weight'])
plt.xlabel('Height (cm)')
plt.ylabel('Weight (kg)')
plt.title('Height vs Weight')
plt.show()

RESULT:

Thus data manipulation using NumPy and Pandas, and data visualization using
Matplotlib, were performed successfully.
EX.NO:3
IMPLEMENTATION OF NAIVE BAYES CLASSIFIER
DATE:

AIM:

To diagnose heart patients and predict heart disease using heart disease dataset with Naïve
Bayes Classifier Algorithm.

ALGORITHM:

Steps in Naïve Bayes Classifier Algorithm:

1. Read the training dataset T.

2. Calculate the mean and standard deviation of the predictor variables in each class.

3. Repeat: calculate the probability of fi using the Gaussian density equation in each class, until the
probability of all predictor variables (f1, f2, f3, ..., fn) has been calculated.

4. Calculate the likelihood for each class.

5. Get the greatest likelihood.
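The five steps above can be sketched compactly with the standard library before reading the full program. This is an illustrative outline on made-up toy data, not the manual's program: the names fit and predict are hypothetical, class priors are ignored (only the likelihood is compared, as in the steps), and every feature is assumed to have a non-zero standard deviation within each class.

```python
import math
from statistics import mean, stdev

def gaussian_pdf(x, mu, sigma):
    """Gaussian density of x for mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def fit(rows, labels):
    """Step 2: per class, store (mean, stdev) of every feature column."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    return {label: [(mean(col), stdev(col)) for col in zip(*group)]
            for label, group in by_class.items()}

def predict(model, row):
    """Steps 3-5: multiply the feature densities per class and take the largest."""
    likelihood = {}
    for label, stats in model.items():
        p = 1.0
        for x, (mu, sigma) in zip(row, stats):
            p *= gaussian_pdf(x, mu, sigma)
        likelihood[label] = p
    return max(likelihood, key=likelihood.get)

# Hypothetical toy data: two features, two classes
rows = [[1.0, 2.1], [1.2, 1.9], [0.8, 2.0], [3.0, 4.1], [3.2, 3.9], [2.8, 4.0]]
labels = [0, 0, 0, 1, 1, 1]
model = fit(rows, labels)
print(predict(model, [1.1, 2.0]), predict(model, [3.1, 4.0]))
```

With many features, the product of densities can underflow; summing log-densities is a common alternative.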

PROGRAM:

NB_from_scratch.py

import csv
import math
import random
import warnings
from itertools import cycle

import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, roc_curve, auc

warnings.filterwarnings("ignore")

# convert txt file to csv
with open('heartdisease.txt', 'r') as in_file:
    stripped = (line.strip() for line in in_file)
    lines = (line.split(",") for line in stripped if line)
    with open('heartdisease.csv', 'w', newline='') as out_file:
        writer = csv.writer(out_file)
        writer.writerow(('age', 'sex', 'cp', 'restbp', 'chol', 'fbs', 'restecg',
                         'thalach', 'exang', 'oldpeak', 'slope', 'ca', 'thal', 'num'))
        writer.writerows(lines)

# Example of Naive Bayes implemented from scratch in Python

# calculating mean of column values belonging to one class
def mean(columnvalues):
    s = 0
    n = float(len(columnvalues))
    for i in range(len(columnvalues)):
        s = s + float(columnvalues[i])
    return s / n

# calculating standard deviation of column values belonging to one class
def stdev(columnvalues):
    avg = mean(columnvalues)
    s = 0.0
    num = len(columnvalues)
    for i in range(num):
        s = s + pow(float(columnvalues[i]) - avg, 2)
    variance = s / (float(num - 1))
    return math.sqrt(variance)

# Reading CSV file: skip the header row and convert every value to float
# (assumes the file contains no missing values)
filename = 'heartdisease.csv'
with open(filename, "r") as csv_file:
    rows = list(csv.reader(csv_file))
dataset = [[float(x) for x in row] for row in rows[1:]]

for z in range(5):
    print("\n\n\nTest Train Split no. ", z + 1, "\n\n\n")
    trainsize = int(len(dataset) * 0.75)
    trainset = []
    testset = list(dataset)
    for i in range(trainsize):
        index = random.randrange(len(testset))
        trainset.append(testset.pop(index))

    # separate the training rows according to class
    # (only the training set is used, so the test rows stay unseen)
    classlist = {}
    for i in range(len(trainset)):
        class_num = float(trainset[i][-1])
        row = trainset[i]
        if class_num not in classlist:
            classlist[class_num] = []
        classlist[class_num].append(row)

    # preparing data class wise: (mean, stdev) of the 13 feature columns
    class_data = {}
    for class_num, row in classlist.items():
        class_datarow = [(mean(columnvalues), stdev(columnvalues))
                         for columnvalues in zip(*row)]
        class_datarow = class_datarow[0:13]  # drop the class column
        class_data[class_num] = class_datarow

    # Getting test vector
    y_test = []
    for j in range(len(testset)):
        y_test.append(testset[j][-1])

    # Getting prediction vector
    y_pred = []
    for i in range(len(testset)):
        class_probability = {}
        for class_num, row in class_data.items():
            class_probability[class_num] = 1
            for j in range(len(row)):
                calculated_mean, calculated_dev = row[j]
                x = float(testset[i][j])
                if calculated_dev != 0:
                    # Gaussian density of x for this feature and class
                    power = math.exp(-(math.pow(x - calculated_mean, 2) /
                                       (2 * math.pow(calculated_dev, 2))))
                    probability = (1 / (math.sqrt(2 * math.pi) * calculated_dev)) * power
                    class_probability[class_num] *= probability
        resultant_class, max_prob = -1, -1
        for class_num, probability in class_probability.items():
            if resultant_class == -1 or probability > max_prob:
                max_prob = probability
                resultant_class = class_num
        y_pred.append(resultant_class)

    # Getting Accuracy
    count = 0
    for i in range(len(testset)):
        if testset[i][-1] == y_pred[i]:
            count += 1
    accuracy = (count / float(len(testset))) * 100.0
    print("\n\n Accuracy: ", accuracy, "%")

    y1 = [float(k) for k in y_test]
    y_pred1 = [float(k) for k in y_pred]

    print("\n\n\n\nConfusion Matrix")
    cf_matrix = confusion_matrix(y1, y_pred1)
    print(cf_matrix)

    print("\n\n\n\nF1 Score")
    f_score = f1_score(y1, y_pred1, average='weighted')
    print(f_score)

    # One-hot matrices (5 classes) from the 1D label arrays
    y2 = np.zeros(shape=(len(y1), 5))
    y3 = np.zeros(shape=(len(y_pred1), 5))
    for i in range(len(y1)):
        y2[i][int(y1[i])] = 1
    for i in range(len(y_pred1)):
        y3[i][int(y_pred1[i])] = 1

    # ROC Curve generation
    n_classes = 5
    fpr = dict()
    tpr = dict()
    roc_auc = dict()
    for i in range(n_classes):
        fpr[i], tpr[i], _ = roc_curve(y2[:, i], y3[:, i])
        roc_auc[i] = auc(fpr[i], tpr[i])

    # Compute micro-average ROC curve and ROC area
    fpr["micro"], tpr["micro"], _ = roc_curve(y2.ravel(), y3.ravel())
    roc_auc["micro"] = auc(fpr["micro"], tpr["micro"])

    # Compute macro-average ROC curve and ROC area
    print("\n\n\n\nROC Curve")

    # First aggregate all false positive rates
    lw = 2
    all_fpr = np.unique(np.concatenate([fpr[i] for i in range(n_classes)]))

    # Then interpolate all ROC curves at these points
    # (np.interp replaces scipy.interp, which recent SciPy releases no longer provide)
    mean_tpr = np.zeros_like(all_fpr)
    for i in range(n_classes):
        mean_tpr += np.interp(all_fpr, fpr[i], tpr[i])

    # Finally average it and compute AUC
    mean_tpr /= n_classes
    fpr["macro"] = all_fpr
    tpr["macro"] = mean_tpr
    roc_auc["macro"] = auc(fpr["macro"], tpr["macro"])

    # Plot all ROC curves
    plt.figure()
    plt.plot(fpr["micro"], tpr["micro"],
             label='micro-average (area = {0:0.2f})'.format(roc_auc["micro"]),
             color='deeppink', linestyle=':', linewidth=4)
    plt.plot(fpr["macro"], tpr["macro"],
             label='macro-average (area = {0:0.2f})'.format(roc_auc["macro"]),
             color='navy', linestyle=':', linewidth=4)
    colors = cycle(['aqua', 'darkorange', 'cornflowerblue', 'red', 'black'])
    for i, color in zip(range(n_classes), colors):
        plt.plot(fpr[i], tpr[i], color=color, lw=lw,
                 label='ROC of class {0} (area = {1:0.2f})'.format(i, roc_auc[i]))
    plt.plot([0, 1], [0, 1], 'k--', lw=lw)
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.05])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('Receiver operating characteristic for multi-class')
    plt.legend(loc="lower right")
    plt.savefig('Exp-8')
    plt.show()

NB_from_Gaussian_Sklearn.py

import csv
from itertools import cycle

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.metrics import confusion_matrix, f1_score, roc_curve, auc
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# converting txt file to csv file
with open('heartdisease.txt', 'r') as in_file:
    stripped = (line.strip() for line in in_file)
    lines = (line.split(",") for line in stripped if line)
    with open('heartdisease.csv', 'w', newline='') as out_file:
        writer = csv.writer(out_file)
        writer.writerow(('age', 'sex', 'cp', 'restbp', 'chol', 'fbs', 'restecg',
                         'thalach', 'exang', 'oldpeak', 'slope', 'ca', 'thal', 'num'))
        writer.writerows(lines)

# reading CSV using Pandas and storing in dataframe
# (the first row is read as the header, so the columns stay numeric)
df = pd.read_csv('heartdisease.csv')
training_x = df.iloc[:, 0:13]
training_y = df.iloc[:, 13:14]

# converting dataframe into arrays
x = np.array(training_x)
y = np.array(training_y)

for z in range(5):
    print("\n\n\nTest Train Split no. ", z + 1, "\n\n\n")
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=None)

    # Gaussian function of sklearn
    gnb = GaussianNB()
    gnb.fit(x_train, y_train.ravel())
    y_pred = gnb.predict(x_test)
    print("\n\nGaussian Naive Bayes model accuracy(in %):",
          metrics.accuracy_score(y_test.ravel(), y_pred) * 100)

    # convert 2D array to 1D array
    y1 = y_test.ravel()
    y_pred1 = y_pred.ravel()

    print("\n\n\n\nConfusion Matrix")
    cf_matrix = confusion_matrix(y1, y_pred1)
    print(cf_matrix)

    print("\n\n\n\nF1 Score")
    f_score = f1_score(y1, y_pred1, average='weighted')
    print(f_score)

    # One-hot matrices (5 classes) from the 1D label arrays
    y2 = np.zeros(shape=(len(y1), 5))
    y3 = np.zeros(shape=(len(y_pred1), 5))
    for i in range(len(y1)):
        y2[i][int(y1[i])] = 1
    for i in range(len(y_pred1)):
        y3[i][int(y_pred1[i])] = 1

    # ROC Curve generation
    n_classes = 5
    fpr = dict()
    tpr = dict()
    roc_auc = dict()
    for i in range(n_classes):
        fpr[i], tpr[i], _ = roc_curve(y2[:, i], y3[:, i])
        roc_auc[i] = auc(fpr[i], tpr[i])

    # Compute micro-average ROC curve and ROC area
    fpr["micro"], tpr["micro"], _ = roc_curve(y2.ravel(), y3.ravel())
    roc_auc["micro"] = auc(fpr["micro"], tpr["micro"])

    # Compute macro-average ROC curve and ROC area
    print("\n\n\n\nROC Curve")

    # First aggregate all false positive rates
    lw = 2
    all_fpr = np.unique(np.concatenate([fpr[i] for i in range(n_classes)]))

    # Then interpolate all ROC curves at these points
    # (np.interp replaces scipy.interp, which recent SciPy releases no longer provide)
    mean_tpr = np.zeros_like(all_fpr)
    for i in range(n_classes):
        mean_tpr += np.interp(all_fpr, fpr[i], tpr[i])

    # Finally average it and compute AUC
    mean_tpr /= n_classes
    fpr["macro"] = all_fpr
    tpr["macro"] = mean_tpr
    roc_auc["macro"] = auc(fpr["macro"], tpr["macro"])

    # Plot all ROC curves
    plt.figure()
    plt.plot(fpr["micro"], tpr["micro"],
             label='micro-average (area = {0:0.2f})'.format(roc_auc["micro"]),
             color='deeppink', linestyle=':', linewidth=4)
    plt.plot(fpr["macro"], tpr["macro"],
             label='macro-average (area = {0:0.2f})'.format(roc_auc["macro"]),
             color='navy', linestyle=':', linewidth=4)
    colors = cycle(['aqua', 'darkorange', 'cornflowerblue', 'red', 'black'])
    for i, color in zip(range(n_classes), colors):
        plt.plot(fpr[i], tpr[i], color=color, lw=lw,
                 label='ROC of class {0} (area = {1:0.2f})'.format(i, roc_auc[i]))
    plt.plot([0, 1], [0, 1], 'k--', lw=lw)
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.05])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('Receiver operating characteristic for multi-class')
    plt.legend(loc="lower right")
    plt.show()
OUTPUT:

RESULT:

Thus the diagnosis of heart patients and the prediction of heart disease using the heart disease
dataset with the Naïve Bayes classifier algorithm is implemented successfully.
EX.NO:4
IMPLEMENTATION OF LINEAR MODELS
DATE:

AIM:

To implement linear models such as locally weighted linear regression and plot the necessary
graphs.

ALGORITHM:

1. Read the given data sample into X and the curve (linear or non-linear) into Y.

2. Set the value of the smoothing parameter (free parameter), say τ.

3. Set the bias / point of interest x0, which is a subset of X.

4. Determine the weight matrix using :

5. Determine the value of model term parameter β using :

6. Prediction = x0*β.
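Steps 4 and 5 refer to formulas that are not shown above. A common formulation, assumed here rather than taken from this manual, uses Gaussian weights w_i = exp(-(x_i - x0)^2 / (2τ^2)) and the weighted least-squares solution β = (X^T W X)^(-1) X^T W Y. The sketch below applies it to one-dimensional data with a straight-line model; the function name lwr_predict and the sample data are hypothetical. Note that the program that follows uses tricube weights instead of Gaussian ones.

```python
import math

def lwr_predict(x0, xs, ys, tau):
    """Predict y at x0 from a straight line fitted with Gaussian weights around x0."""
    # Step 4 (assumed form): w_i = exp(-(x_i - x0)^2 / (2 * tau^2))
    w = [math.exp(-((x - x0) ** 2) / (2 * tau ** 2)) for x in xs]
    # Step 5: solve (X^T W X) beta = X^T W y for the model y = beta0 + beta1 * x
    sw = sum(w)
    swx = sum(wi * x for wi, x in zip(w, xs))
    swxx = sum(wi * x * x for wi, x in zip(w, xs))
    swy = sum(wi * y for wi, y in zip(w, ys))
    swxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    det = sw * swxx - swx * swx
    beta0 = (swxx * swy - swx * swxy) / det
    beta1 = (sw * swxy - swx * swy) / det
    # Step 6: prediction = [1, x0] . beta
    return beta0 + beta1 * x0

# Hypothetical data lying exactly on y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]
print(lwr_predict(2.5, xs, ys, tau=1.0))
```

A smaller τ makes the fit follow nearby points more closely; a larger τ approaches ordinary linear regression.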

PROGRAM:

import math
from math import ceil

import matplotlib.pyplot as plt
import numpy as np
from scipy import linalg

def lowess(x, y, f, iterations):
    # f: fraction of the points used in each local fit
    # iterations: number of robustifying passes
    n = len(x)
    r = int(ceil(f * n))
    # bandwidth of each point: distance to its r-th nearest neighbour
    h = [np.sort(np.abs(x - x[i]))[r] for i in range(n)]
    w = np.clip(np.abs((x[:, None] - x[None, :]) / h), 0.0, 1.0)
    w = (1 - w ** 3) ** 3  # tricube weights
    yest = np.zeros(n)
    delta = np.ones(n)
    for iteration in range(iterations):
        for i in range(n):
            weights = delta * w[:, i]
            # weighted least squares for a local straight line
            b = np.array([np.sum(weights * y), np.sum(weights * y * x)])
            A = np.array([[np.sum(weights), np.sum(weights * x)],
                          [np.sum(weights * x), np.sum(weights * x * x)]])
            beta = linalg.solve(A, b)
            yest[i] = beta[0] + beta[1] * x[i]
        # down-weight points with large residuals (bisquare weights)
        residuals = y - yest
        s = np.median(np.abs(residuals))
        delta = np.clip(residuals / (6.0 * s), -1, 1)
        delta = (1 - delta ** 2) ** 2
    return yest

n = 100
x = np.linspace(0, 2 * math.pi, n)
y = np.sin(x) + 0.3 * np.random.randn(n)
f = 0.25
iterations = 3
yest = lowess(x, y, f, iterations)

plt.plot(x, y, "r.")
plt.plot(x, yest, "b-")
plt.show()
OUTPUT:

RESULT:

Thus linear models such as locally weighted linear regression were implemented and the
necessary graphs were plotted successfully.
EX.NO:5
IMPLEMENT MULTI-LAYER PERCEPTRON ALGORITHM
DATE:

AIM:

To implement the multi-layer perceptron algorithm for the specified data.

ALGORITHM:

Step 1: Import the necessary libraries.

Step 2: Download the dataset. TensorFlow provides the MNIST dataset, which can be loaded
directly in the program as a train and test dataset.

Step 3: Now we will convert the pixels into floating-point values.

Step 4: Understand the structure of the dataset

Step 5: Visualize the data.

Step 6: Form the Input, hidden, and output layers.

Step 7: Compile the model.

Step 8: Fit the model.

Step 9: Find Accuracy of the model.
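Step 6 can be illustrated without TensorFlow by a single forward pass written with the standard library. This is a sketch under stated assumptions and not the Keras model used in the program: the weights are random and untrained, the hidden layer has only 16 units to keep it fast, a softmax output is assumed, and the helper names (dense, sigmoid, softmax, random_layer) are hypothetical.

```python
import math
import random

def dense(inputs, weights, biases):
    """Fully connected layer: out_j = sum_i inputs_i * weights[i][j] + biases[j]."""
    return [sum(x * row[j] for x, row in zip(inputs, weights)) + biases[j]
            for j in range(len(biases))]

def sigmoid(values):
    return [1.0 / (1.0 + math.exp(-v)) for v in values]

def softmax(values):
    m = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def random_layer(n_in, n_out, rng):
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_out)] for _ in range(n_in)]
    return weights, [0.0] * n_out

rng = random.Random(0)
# Stand-in for one normalised 28x28 image (pixel values in [0, 1))
image = [[rng.random() for _ in range(28)] for _ in range(28)]
flat = [pixel for row in image for pixel in row]  # input layer: flatten to 784 values

w1, b1 = random_layer(784, 16, rng)  # hidden layer
w2, b2 = random_layer(16, 10, rng)   # output layer, one unit per digit

hidden = sigmoid(dense(flat, w1, b1))
probs = softmax(dense(hidden, w2, b2))
print(len(flat), len(probs), round(sum(probs), 6))
```

Training (Steps 7 and 8) adjusts the weights so that the largest output corresponds to the correct digit.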

Program:

# importing modules
import tensorflow as tf
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Activation
import matplotlib.pyplot as plt

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

OUTPUT:

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11493376/11490434 [==============================] - 2s 0us/step

# Cast the records into float values
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

# normalize image pixel values by dividing by 255
gray_scale = 255
x_train /= gray_scale
x_test /= gray_scale
print("Training features:", x_train.shape)
print("Test features:", x_test.shape)
print("Training labels:", y_train.shape)
print("Test labels:", y_test.shape)

OUTPUT:

Training features: (60000, 28, 28)
Test features: (10000, 28, 28)
Training labels: (60000,)
Test labels: (10000,)

# Show the first 100 training images in a 10 x 10 grid
fig, ax = plt.subplots(10, 10)
k = 0
for i in range(10):
    for j in range(10):
        ax[i][j].imshow(x_train[k].reshape(28, 28),
                        aspect='auto')
        k += 1
plt.show()

OUTPUT:
model = Sequential([
    # flatten each 28 x 28 image into a vector of 784 values
    Flatten(input_shape=(28, 28)),

    # dense layer 1
    Dense(256, activation='sigmoid'),

    # dense layer 2
    Dense(128, activation='sigmoid'),

    # output layer (softmax is the more common choice for multi-class outputs)
    Dense(10, activation='sigmoid'),
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=10,
          batch_size=2000,
          validation_split=0.2)

OUTPUT:
results = model.evaluate(x_test, y_test, verbose = 0)
print('test loss, test acc:', results)

OUTPUT:

test loss, test acc: [0.27210235595703125, 0.9223999977111816]

RESULT:
Thus the multi-layer perceptron algorithm for the specified data has been executed and the output
is verified successfully.
EX.NO:6
IMPLEMENTATION OF KNN ALGORITHM
DATE:

AIM:

To implement K-NN algorithm for the specified data.

ALGORITHM:

1. Collect data continuously.
2. Preprocess real-time data (scaling, normalization).
3. Choose K, the number of neighbors.
4. For every new data point:
  • Compute distances to all existing data points.
  • Identify the K nearest neighbors.
  • Use majority voting (classification) or averaging (regression) to predict the label/value.
5. Update the system with new labeled data periodically.
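Step 4 above can be sketched in a few lines of standard-library Python for the classification case. The function name knn_predict and the toy points are illustrative assumptions; Euclidean distance is assumed, and the program below uses scikit-learn's KNeighborsClassifier instead.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k):
    """Classify query by majority vote among its k nearest training points."""
    # Compute the Euclidean distance from the query to every training point
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Keep the labels of the k closest points and return the most frequent one
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical toy data: class 0 near the origin, class 1 near (5, 5)
train_points = [(0.0, 0.0), (0.5, 1.0), (1.0, 0.5), (5.0, 5.0), (5.5, 4.5), (4.5, 5.5)]
train_labels = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_points, train_labels, (0.4, 0.6), k=3))
print(knn_predict(train_points, train_labels, (5.2, 4.9), k=3))
```

For regression, the vote would be replaced by the average of the neighbours' values.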

PROGRAM:

pip install numpy pandas scikit-learn

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# Example dataset: features are random, labels are 0 or 1
# (assigned at random, independent of the features)
np.random.seed(42)
data_size = 1000
X = np.random.rand(data_size, 2)  # 2 features
y = np.random.choice([0, 1], size=data_size)  # Binary labels

# Split dataset into training and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardizing the dataset (important for KNN)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Initialize KNN with K=5 (You can change this based on experimentation)
knn = KNeighborsClassifier(n_neighbors=5)

# Train the model on the training data
knn.fit(X_train, y_train)

# Real-time data simulation: Loop through test set (one-by-one)
for i in range(len(X_test)):
    # Predict the class for the incoming real-time data point
    pred = knn.predict([X_test[i]])  # Wrap the single data point in a list to get a 2D input
    print(f"Real-time Prediction for Data Point {i}: {pred[0]} (True Label: {y_test[i]})")

# Predict all the labels for the test set
y_pred = knn.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy of KNN model: {accuracy:.2f}")

# Using KD-tree for optimized nearest neighbor search
knn = KNeighborsClassifier(n_neighbors=5, algorithm='kd_tree')
knn.fit(X_train, y_train)

from sklearn.decomposition import PCA

# Applying PCA to reduce dimensionality
# (the example data already has 2 features, so this only rotates it)
pca = PCA(n_components=2)  # Reduce to 2 dimensions
X_train_reduced = pca.fit_transform(X_train)
X_test_reduced = pca.transform(X_test)

# Fit and evaluate KNN on reduced data
knn.fit(X_train_reduced, y_train)

# Use multiple cores for parallel distance calculations
knn = KNeighborsClassifier(n_neighbors=5, n_jobs=-1)  # -1 uses all available cores

np.random.seed(42)
data_size = 1000
X = np.random.rand(data_size, 2)  # 2 features
y = np.random.choice([0, 1], size=data_size)  # Binary labels

# Split dataset into training and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardizing the dataset
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Initialize and train KNN
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

for i in range(len(X_test)):
    pred = knn.predict([X_test[i]])  # Wrap the single data point in a list to get a 2D input
    print(f"Real-time Prediction for Data Point {i}: {pred[0]} (True Label: {y_test[i]})")

OUTPUT:

Real-time Prediction for Data Point 0: 0 (True Label: 0)

Real-time Prediction for Data Point 1: 1 (True Label: 1)

Real-time Prediction for Data Point 2: 1 (True Label: 1)

Real-time Prediction for Data Point 3: 1 (True Label: 1)

Real-time Prediction for Data Point 4: 0 (True Label: 0)


...

# Predict all the labels for the test set

y_pred = knn.predict(X_test)

# Calculate accuracy

accuracy = accuracy_score(y_test, y_pred)

print(f"Accuracy of KNN model: {accuracy:.2f}")

OUTPUT:

Accuracy of KNN model: 0.90

RESULT:

Thus the K-NN implementation for the specified dataset is executed successfully.


EX.NO:7
IMPLEMENTATION OF SVM ALGORITHM
DATE:

AIM:

To create a machine learning model which classifies the Spam and Ham E-Mails from
a given dataset using Support Vector Machine algorithm.

ALGORITHM:

1. Import all the necessary libraries.

2. Read the given CSV file, which contains both spam and ham emails.

3. Gather all the words given in that dataset and identify the stop words with a mean distribution.

4. Create an ML model using the Support Vector Classifier after splitting the dataset into training and
test sets.

5. Display the accuracy and F1 score and print the confusion matrix for the classification of spam and
ham.
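Before an SVM can be trained, each email has to be turned into a numeric vector of word counts. The sketch below shows the idea with the standard library only. It is a simplified illustration and not scikit-learn's CountVectorizer, whose tokenisation rules differ; the function names and the three sample messages are hypothetical.

```python
import string
from collections import Counter

def clean(message):
    """Lower-case the message and remove punctuation characters."""
    return "".join(ch for ch in message.lower() if ch not in string.punctuation)

def build_vocabulary(messages, min_df=1):
    """Keep the words that occur in at least min_df messages, sorted alphabetically."""
    doc_freq = Counter()
    for message in messages:
        doc_freq.update(set(clean(message).split()))
    return sorted(word for word, df in doc_freq.items() if df >= min_df)

def count_vector(message, vocabulary):
    """Number of occurrences of each vocabulary word in the message."""
    counts = Counter(clean(message).split())
    return [counts[word] for word in vocabulary]

# Hypothetical sample messages
messages = ["Win a FREE prize now!", "Meeting at noon, see you now", "Free free prize"]
vocab = build_vocabulary(messages, min_df=2)
vectors = [count_vector(m, vocab) for m in messages]
print(vocab)
print(vectors)
```

Each row of vectors plays the role of one feature vector X passed to the classifier, with the spam/ham label as y.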

PROGRAM:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import string
from nltk.corpus import stopwords  # requires the NLTK 'stopwords' corpus to be downloaded
import os
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
from PIL import Image
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import roc_curve, auc
from sklearn import metrics
from sklearn import model_selection
from sklearn import svm
from nltk import word_tokenize
from sklearn.metrics import roc_auc_score
from matplotlib import pyplot
# plot_confusion_matrix was removed from recent scikit-learn releases;
# ConfusionMatrixDisplay is its replacement
from sklearn.metrics import ConfusionMatrixDisplay

class data_read_write(object):
    # file_link is optional so that the subclasses can be created without a file
    def __init__(self, file_link=None):
        if file_link is not None:
            self.data_frame = pd.read_csv(file_link)

    def read_csv_file(self, file_link):
        # returns the DataFrame loaded in __init__ (file_link is not used here)
        return self.data_frame

    def write_to_csvfile(self, file_link):
        self.data_frame.to_csv(file_link, encoding='utf-8', index=False, header=True)
        return

class generate_word_cloud(data_read_write):
    def __init__(self):
        pass

    def variance_column(self, data):
        return np.var(data)

    def word_cloud(self, data_frame_column, output_image_file):
        text = " ".join(review for review in data_frame_column)
        stopwords = set(STOPWORDS)
        stopwords.update(["subject"])
        wordcloud = WordCloud(width=1200, height=800, stopwords=stopwords,
                              max_font_size=50, margin=0,
                              background_color="white").generate(text)
        plt.imshow(wordcloud, interpolation='bilinear')
        plt.axis("off")
        plt.savefig("Distribution.png")
        plt.show()
        wordcloud.to_file(output_image_file)
        return

class data_cleaning(data_read_write):
    def __init__(self):
        pass

    def message_cleaning(self, message):
        # remove punctuation, then drop English stop words
        Test_punc_removed = [char for char in message if char not in string.punctuation]
        Test_punc_removed_join = ''.join(Test_punc_removed)
        Test_punc_removed_join_clean = [word for word in Test_punc_removed_join.split()
                                        if word.lower() not in stopwords.words('english')]
        final_join = ' '.join(Test_punc_removed_join_clean)
        return final_join

    def apply_to_column(self, data_column_text):
        data_processed = data_column_text.apply(self.message_cleaning)
        return data_processed

class apply_embeddding_and_model(data_read_write):
    def __init__(self):
        pass

    def apply_count_vector(self, v_data_column):
        vectorizer = CountVectorizer(min_df=2, analyzer="word", tokenizer=None,
                                     preprocessor=None, stop_words=None)
        return vectorizer.fit_transform(v_data_column)

    def apply_svm(self, X, y):
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
        params = {'kernel': 'linear', 'C': 2, 'gamma': 1}
        svm_cv = svm.SVC(C=params['C'], kernel=params['kernel'], gamma=params['gamma'],
                         probability=True)
        svm_cv.fit(X_train, y_train)
        y_predict_test = svm_cv.predict(X_test)
        cm = confusion_matrix(y_test, y_predict_test)
        sns.heatmap(cm, annot=True)
        print(classification_report(y_test, y_predict_test))
        print("test set")
        print("\nAccuracy Score: " + str(metrics.accuracy_score(y_test, y_predict_test)))
        print("F1 Score: " + str(metrics.f1_score(y_test, y_predict_test)))
        print("Recall: " + str(metrics.recall_score(y_test, y_predict_test)))
        print("Precision: " + str(metrics.precision_score(y_test, y_predict_test)))

        class_names = ['ham', 'spam']
        titles_options = [("Confusion matrix, without normalization", None),
                          ("Normalized confusion matrix", 'true')]
        for title, normalize in titles_options:
            disp = ConfusionMatrixDisplay.from_estimator(svm_cv, X_test, y_test,
                                                         display_labels=class_names,
                                                         cmap=plt.cm.Blues,
                                                         normalize=normalize)
            disp.ax_.set_title(title)
            print(title)
            print(disp.confusion_matrix)
        plt.savefig("SVM.png")
        plt.show()

        # ROC curve: a "no skill" baseline against the SVM probabilities
        ns_probs = [0 for _ in range(len(y_test))]
        lr_probs = svm_cv.predict_proba(X_test)
        lr_probs = lr_probs[:, 1]
        ns_auc = roc_auc_score(y_test, ns_probs)
        lr_auc = roc_auc_score(y_test, lr_probs)
        print('No Skill: ROC AUC=%.3f' % (ns_auc))
        print('SVM: ROC AUC=%.3f' % (lr_auc))
        ns_fpr, ns_tpr, _ = roc_curve(y_test, ns_probs)
        lr_fpr, lr_tpr, _ = roc_curve(y_test, lr_probs)
        pyplot.plot(ns_fpr, ns_tpr, linestyle='--', label='No Skill')
        pyplot.plot(lr_fpr, lr_tpr, marker='.', label='SVM')
        pyplot.xlabel('False Positive Rate')
        pyplot.ylabel('True Positive Rate')
        pyplot.legend()
        pyplot.savefig("SVMMat.png")
        pyplot.show()
        return

data_obj = data_read_write("emails.csv")

data_frame = data_obj.read_csv_file("processed.csv")
data_frame.head()

data_frame.tail()

data_frame.describe()

data_frame.info()

data_frame.head()

data_frame.groupby('spam').describe()

data_frame['length'] = data_frame['text'].apply(len)

data_frame['length'].max()

sns.set(rc={'figure.figsize':(11.7,8.27)})

ham_messages_length = data_frame[data_frame['spam']==0]

spam_messages_length = data_frame[data_frame['spam']==1]

ham_messages_length['length'].plot(bins=100, kind='hist',label = 'Ham')

spam_messages_length['length'].plot(bins=100, kind='hist',label = 'Spam')

plt.title('Distribution of Length of Email Text')

plt.xlabel('Length of Email Text')

plt.legend()

data_frame[data_frame['spam']==0].text.values

ham_words_length = [len(word_tokenize(title)) for title in
                    data_frame[data_frame['spam']==0].text.values]

spam_words_length = [len(word_tokenize(title)) for title in
                     data_frame[data_frame['spam']==1].text.values]

print(max(ham_words_length))

print(max(spam_words_length))

sns.set(rc={'figure.figsize':(11.7,8.27)})

ax = sns.distplot(ham_words_length, norm_hist = True, bins = 30, label = 'Ham')

ax = sns.distplot(spam_words_length, norm_hist = True, bins = 30, label = 'Spam')


plt.title('Distribution of Number of Words')

plt.xlabel('Number of Words')

plt.legend()

plt.savefig("SVMGraph.png")

plt.show()

def mean_word_length(x):
    # Average number of characters per token in one email
    word_lengths = np.array([])
    for word in word_tokenize(x):
        word_lengths = np.append(word_lengths, len(word))
    return word_lengths.mean()

ham_meanword_length = data_frame[data_frame['spam']==0].text.apply(mean_word_length)

spam_meanword_length = data_frame[data_frame['spam']==1].text.apply(mean_word_length)

sns.distplot(ham_meanword_length, norm_hist = True, bins = 30, label = 'Ham')

sns.distplot(spam_meanword_length , norm_hist = True, bins = 30, label = 'Spam')

plt.title('Distribution of Mean Word Length')

plt.xlabel('Mean Word Length')

plt.legend()

plt.savefig("Graph.png")

plt.show()

from nltk.corpus import stopwords

stop_words = set(stopwords.words('english'))

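def stop_words_ratio(x):
    # Fraction of tokens in one email that are English stop words
    num_total_words = 0
    num_stop_words = 0
    for word in word_tokenize(x):
        if word in stop_words:
            num_stop_words += 1
        num_total_words += 1
    # Guard against empty emails, which would otherwise divide by zero
    return num_stop_words / num_total_words if num_total_words else 0.0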

ham_stopwords = data_frame[data_frame['spam'] == 0].text.apply(stop_words_ratio)

spam_stopwords = data_frame[data_frame['spam'] == 1].text.apply(stop_words_ratio)

sns.distplot(ham_stopwords, norm_hist=True, label='Ham')

sns.distplot(spam_stopwords, label='Spam')

print('Ham Mean: {:.3f}'.format(ham_stopwords.values.mean()))

print('Spam Mean: {:.3f}'.format(spam_stopwords.values.mean()))

plt.title('Distribution of Stop-word Ratio')

plt.xlabel('Stop Word Ratio')

plt.legend()

ham = data_frame[data_frame['spam']==0]

spam = data_frame[data_frame['spam']==1]

spam['length'].plot(bins=60, kind='hist')

ham['length'].plot(bins=60, kind='hist')

data_frame['Ham(0) and Spam(1)'] = data_frame['spam']

print( 'Spam percentage =', (len(spam) / len(data_frame) )*100,"%")

print( 'Ham percentage =', (len(ham) / len(data_frame) )*100,"%")

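# Pass the column as x explicitly; recent seaborn versions treat the first positional argument as data
sns.countplot(x=data_frame['Ham(0) and Spam(1)'], label = "Count")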

data_clean_obj = data_cleaning()

data_frame['clean_text'] = data_clean_obj.apply_to_column(data_frame['text'])
data_frame.head()

data_obj.data_frame.head()

data_obj.write_to_csvfile("processed_file.csv")

cv_object = apply_embeddding_and_model()

spamham_countvectorizer = cv_object.apply_count_vector(data_frame['clean_text'])

X = spamham_countvectorizer

label = data_frame['spam'].values

y = label

cv_object.apply_svm(X,y)

Output:

              precision    recall  f1-score   support

           0       0.99      0.99      0.99       877
           1       0.98      0.97      0.98       269

    accuracy                           0.99      1146
   macro avg       0.99      0.98      0.99      1146
weighted avg       0.99      0.99      0.99      1146

test set

Accuracy Score: 0.9895287958115183

F1 Score: 0.9776119402985075

Recall: 0.9739776951672863
Precision: 0.9812734082397003

Normalized confusion matrix

[[0.99429875 0.00570125]

[0.0260223 0.9739777 ]]

OUTPUT:
RESULT:

Thus the program to create a machine learning model that classifies spam and ham e-mails
from a given dataset using the Support Vector Machine algorithm has been executed successfully.
EX.NO:8
IMPLEMENTATION OF DECISION TREE
DATE:

AIM:

To implement a decision tree classifier on a suitable dataset drawn from a real-world problem.

ALGORITHM:

1. Start with the entire dataset.
2. Evaluate all features and find the best split based on impurity/variance reduction.
3. Split the dataset into subsets and assign them as child nodes.
4. Recursively split the child nodes.
5. Stop splitting when a stopping criterion is met.
6. Assign the output class/value to the leaf nodes.
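
Step 2 of the algorithm above can be sketched for a single numeric feature as follows. This is a minimal illustration that assumes Gini impurity as the impurity measure; the helper names gini and best_split and the toy data are hypothetical and are not part of the lab program, which relies on scikit-learn's DecisionTreeClassifier.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best split 'value <= threshold'."""
    best_threshold, best_impurity = None, gini(labels)
    # The largest value is skipped because it would leave the right side empty
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if weighted < best_impurity:
            best_threshold, best_impurity = threshold, weighted
    return best_threshold, best_impurity

# Hypothetical toy data: petal lengths (cm) for two classes
petal_length = [1.4, 1.3, 1.5, 4.7, 4.5, 4.9]
species = ["setosa", "setosa", "setosa", "versicolor", "versicolor", "versicolor"]
threshold, impurity = best_split(petal_length, species)
print(threshold, impurity)  # prints: 1.5 0.0
```

On this toy data the split at 1.5 cm separates the two classes completely, so both child nodes are pure and splitting would stop there.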

PROGRAM:

#Import Modules
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
#Create Dataframe
iris_df = pd.read_csv('/content/Iris.csv')
#Display First five rows of dataframe
iris_df.head(5)

#Drop Id Column
iris_df.drop("Id",axis=1,inplace=True)
#To check Number of rows and Columns
iris_df.shape
(150, 5)
#Create X and y variables
feature_cols = ['SepalLengthCm','SepalWidthCm','PetalLengthCm','PetalWidthCm']
X = iris_df.drop('Species', axis=1) # Features
y = iris_df['Species'] # Target variable
#Print X(Feature variable)
X.head()

#Print y(Target variable)


y.head()
0 Iris-setosa
1 Iris-setosa
2 Iris-setosa
3 Iris-setosa
4 Iris-setosa
Name: Species, dtype: object
#Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,random_state=42)
decision_tree = DecisionTreeClassifier(random_state=42)
decision_tree.fit(X_train, y_train)
decision_tree.score(X_train,y_train)
1.0
# Make predictions on the train dataset by using the 'predict()' function.
# Compute the predictions
y_pred_dt = pd.Series(decision_tree.predict(X_train))
# Print the occurrence of each flower type computed in the predictions.
y_pred_dt.value_counts()
Iris-versicolor 41
Iris-setosa 40
Iris-virginica 39
Name: count, dtype: int64
#Make predictions on the test dataset by using the 'predict()' function.
# Compute the predictions
y_test_pred= pd.Series(decision_tree.predict(X_test))
# Print the occurrence of each flower type computed in the predictions.
y_test_pred.value_counts()
Iris-virginica 11
Iris-setosa 10
Iris-versicolor 9
Name: count, dtype: int64
# Create a confusion matrix for the test set.
# Import the libraries
from sklearn.metrics import confusion_matrix, classification_report
# Print the confusion matrix
cm= confusion_matrix(y_test, y_test_pred)
cm
array([[10, 0, 0],
[ 0, 9, 0],
[ 0, 0, 11]])
# Display recall, precision and f1-score values for the test set.
bm= classification_report(y_test,y_test_pred)
print(bm)
                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00        10
Iris-versicolor       1.00      1.00      1.00         9
 Iris-virginica       1.00      1.00      1.00        11

       accuracy                           1.00        30
      macro avg       1.00      1.00      1.00        30
   weighted avg       1.00      1.00      1.00        30
accuracy_dt = accuracy_score(y_test, y_test_pred)
print('Decision Tree Accuracy:', accuracy_dt)
Decision Tree Accuracy: 1.0
from sklearn.tree import export_graphviz
from io import StringIO
from IPython.display import Image
import pydotplus

dot_data = StringIO()
export_graphviz(decision_tree, out_file=dot_data,
                filled=True, rounded=True, max_depth=3,
                special_characters=True, feature_names=feature_cols,
                class_names=['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'])
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
graph.write_png('tree.png')
Image(graph.create_png())
OUTPUT:

RESULT:
Thus a decision tree classifier has been implemented successfully on a suitable dataset
drawn from a real-world problem.
EX.NO:9
IMPLEMENTATION OF K MEANS CLUSTERING ALGORITHM
DATE:

AIM:

To implement the K-Means clustering algorithm for the given data.

ALGORITHM:

1. Randomly initialize K centroids.
2. Assign each data point to the nearest centroid (forming K clusters).
3. Recalculate the centroid of each cluster.
4. Repeat steps 2 and 3 until the centroids no longer change significantly.
5. Return the clusters and the final centroids.
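
The steps above can be sketched directly in NumPy. This is an illustrative sketch only, not the scikit-learn implementation used in the program below; the function names assign_clusters and kmeans, the tolerance tol, and the toy points are assumptions made for this example.

```python
import numpy as np

def assign_clusters(points, centroids):
    """Index of the nearest centroid (Euclidean distance) for every point."""
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)

def kmeans(points, k, n_iters=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: use k distinct data points as the initial centroids
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 2: assign each point to its nearest centroid
        labels = assign_clusters(points, centroids)
        # Step 3: recompute each centroid; an empty cluster keeps its old centroid
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 4: stop once the centroids have (almost) stopped moving
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:
            break
    # Step 5: return the final assignment together with the centroids
    return assign_clusters(points, centroids), centroids

# Hypothetical toy data: two loose groups of 2-D points
points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

Because the result depends on the random initial centroids, different seeds can produce different cluster numbering, and on harder data different clusters.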

PROGRAM:

# Install required libraries

!pip install scikit-learn matplotlib


import numpy as np

import pandas as pd

from sklearn.datasets import load_iris

from sklearn.cluster import KMeans

import matplotlib.pyplot as plt

# Step 2: Load the Dataset

iris = load_iris()

data = pd.DataFrame(iris.data, columns=iris.feature_names)

# Step 3: Preprocess the Data

# Select the first two features for simplicity

X = data.iloc[:, :2] # Use only 'sepal length' and 'sepal width'

# Step 4: Create and Train the K-Means Model

# Initialize KMeans model with 3 clusters (since we know there are 3 classes in the Iris dataset)

kmeans = KMeans(n_clusters=3, random_state=0)

kmeans.fit(X)

KMeans(n_clusters=3, random_state=0)
# Step 5: Evaluate the Model
# Get cluster labels
labels = kmeans.labels_
# Get cluster centers
centers = kmeans.cluster_centers_
# Get inertia (sum of squared distances to nearest cluster center)
inertia = kmeans.inertia_
print("Cluster Centers:\n", centers)
print("Inertia:", inertia)
Cluster Centers:
[[5.77358491 2.69245283]
[6.81276596 3.07446809]
[5.006 3.428 ]]
Inertia: 37.0507021276596
# Step 6: Visualize the Results
plt.scatter(X.iloc[:, 0], X.iloc[:, 1], c=labels, cmap='viridis', marker='o', edgecolor='k')
plt.scatter(centers[:, 0], centers[:, 1], c='red', marker='x', s=100, label='Centers')
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Sepal Width (cm)')
plt.title('K-Means Clustering on Iris Dataset')
plt.legend()
plt.show()

OUTPUT:

RESULT:

Thus the K-Means clustering algorithm for the given data set is executed successfully.
EX.NO:10
IMPLEMENTATION OF GENETIC OPERATORS AND Q-LEARNING
DATE:

AIM:

To implement genetic operators and Q-learning for the given data.

ALGORITHM:

GENETIC ALGORITHM

1. Initialization: Define the range of K (number of neighbors) and create an initial population of
potential K values.
2. Fitness Function: For each individual in the population, run KNN with the corresponding
K value, compute the accuracy on the validation set, and use this as the fitness score.
3. Selection, Crossover, Mutation: Use selection to pick the best K values, crossover to combine
them, and mutation to introduce randomness by adjusting K.
4. Iterate for a set number of generations.
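
The three genetic operators can be seen in isolation on a toy problem that needs no dataset. The sketch below is an assumption-laden illustration: the function evolve, the toy fitness function (which peaks at K = 7), and the parameter defaults are hypothetical and stand in for the KNN accuracy used in the program that follows.

```python
import random

def evolve(fitness, low=1, high=30, population_size=10, generations=20,
           mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    population = [rng.randint(low, high) for _ in range(population_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        # Selection: rank by fitness and keep the better half as parents
        population.sort(key=fitness, reverse=True)
        parents = population[:population_size // 2]
        history.append(fitness(parents[0]))
        children = []
        while len(parents) + len(children) < population_size:
            # Crossover: the child is the integer average of two random parents
            p1, p2 = rng.choice(parents), rng.choice(parents)
            child = (p1 + p2) // 2
            # Mutation: occasionally replace the child with a random value
            if rng.random() < mutation_rate:
                child = rng.randint(low, high)
            children.append(child)
        # Parents survive unchanged, so the best individual is never lost
        population = parents + children
    return max(population, key=fitness), history

def toy_fitness(k):
    """Hypothetical fitness with a single peak at k = 7."""
    return -(k - 7) ** 2

best_k, best_history = evolve(toy_fitness)
print(best_k, best_history[-1])
```

Keeping the parents unmutated is a design choice (elitism): the best fitness recorded in best_history can only stay the same or improve from one generation to the next.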

Q-LEARNING ALGORITHM:

1. Initialization:

• Initialize a Q-table: a table whose rows represent states (e.g., feature set) and whose
columns represent actions (e.g., predict 0 or 1).
• Each state-action pair in the table holds a Q-value representing the expected reward of
taking that action in that state.

2. Action Selection (Policy):

• For each new real-time data point, select an action (e.g., classify as 0 or 1) based on the
current state.
• Use an exploration-exploitation strategy such as epsilon-greedy: choose a random action
with probability ϵ, or take the best known action according to the Q-values with probability 1 − ϵ.

3. Q-value Update:

• After taking an action and receiving feedback (reward), update the Q-value using the
Bellman equation:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$

where α is the learning rate, γ is the discount factor, r is the reward, and s' is the next state.
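
A single epsilon-greedy choice followed by one Q-value update can be traced by hand with the small sketch below. The helper names epsilon_greedy and q_update and the 3 x 2 table are hypothetical; the learning rate 0.1 and discount factor 0.9 match the values used in the program that follows.

```python
import numpy as np

def epsilon_greedy(q_row, epsilon, rng):
    """Pick a random action with probability epsilon, otherwise the best-known one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_row)))
    return int(np.argmax(q_row))

def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update in place and return the new Q-value."""
    td_target = reward + gamma * np.max(q_table[next_state])
    q_table[state, action] += alpha * (td_target - q_table[state, action])
    return q_table[state, action]

rng = np.random.default_rng(0)
q_table = np.zeros((3, 2))  # 3 states, 2 actions, all Q-values start at 0

# epsilon = 0 disables exploration, so the greedy action (index 0 on ties) is chosen
action = epsilon_greedy(q_table[0], epsilon=0.0, rng=rng)
new_value = q_update(q_table, state=0, action=action, reward=1.0, next_state=1)
print(action, new_value)
```

Starting from an all-zero table, the target is 1.0 + 0.9 * 0 = 1.0, so the update moves Q(0, 0) from 0 to 0.1 * (1.0 - 0) = 0.1.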
PROGRAM:

import random

from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier

# Genetic Algorithm for optimizing KNN
def genetic_algorithm_knn(X_train, y_train, X_test, y_test, population_size=10, generations=10):
    # Randomly initialize the population with K values between 1 and 30
    population = [random.randint(1, 30) for _ in range(population_size)]

    def fitness_function(k):
        # Fit and evaluate KNN with the given K value
        knn = KNeighborsClassifier(n_neighbors=k)
        knn.fit(X_train, y_train)
        y_pred = knn.predict(X_test)
        return accuracy_score(y_test, y_pred)

    for generation in range(generations):
        # Evaluate fitness of each individual in the population
        fitness_scores = [fitness_function(k) for k in population]

        # Selection: keep the top half of the population, ranked by fitness
        sorted_population = [x for _, x in sorted(zip(fitness_scores, population), reverse=True)]
        population = sorted_population[:population_size // 2]

        # Crossover: create new individuals by averaging two parents
        while len(population) < population_size:
            parent1 = random.choice(population[:3])  # Top 3 are best parents
            parent2 = random.choice(population[:3])
            offspring = (parent1 + parent2) // 2
            population.append(offspring)

        # Mutation: randomly change some individuals to avoid getting stuck in a local optimum.
        # Index 0 is skipped so that the best individual is never overwritten.
        for i in range(1, len(population)):
            if random.random() < 0.1:  # 10% chance of mutation
                population[i] = random.randint(1, 30)

        # Output the best result in each generation
        best_k = population[0]
        best_score = fitness_function(best_k)
        print(f"Generation {generation+1}: Best K = {best_k}, Best Accuracy = {best_score:.4f}")

    # Return the best solution found
    return population[0]

# Run Genetic Algorithm to optimize K
best_k = genetic_algorithm_knn(X_train, y_train, X_test, y_test)
print(f"Optimal K found by GA: {best_k}")

OUTPUT:

Generation 1: Best K = 3, Best Accuracy = 0.89

Generation 2: Best K = 5, Best Accuracy = 0.90

Generation 3: Best K = 7, Best Accuracy = 0.91

Generation 4: Best K = 7, Best Accuracy = 0.91

Generation 5: Best K = 9, Best Accuracy = 0.91

Generation 6: Best K = 9, Best Accuracy = 0.92

Generation 7: Best K = 9, Best Accuracy = 0.92

Generation 8: Best K = 11, Best Accuracy = 0.92

Generation 9: Best K = 11, Best Accuracy = 0.93


Generation 10: Best K = 13, Best Accuracy = 0.93

Optimal K found by GA: 13

import numpy as np

# Q-learning implementation
def q_learning_knn(X_train, y_train, n_actions=2, n_states=10, n_episodes=100):
    # Initialize Q-table with zeros (n_states = number of states,
    # n_actions = 0 or 1 for binary classification)
    Q_table = np.zeros((n_states, n_actions))
    alpha = 0.1    # Learning rate
    gamma = 0.9    # Discount factor
    epsilon = 0.1  # Exploration-exploitation trade-off

    def get_state(x):
        """Convert the input to a state index (discretize the feature space)"""
        # Simple discretization of the first feature, assumed to be scaled to [0, 1];
        # the index is clipped so that out-of-range values still map to a valid row
        return min(max(int(x[0] * n_states), 0), n_states - 1)

    for episode in range(n_episodes):
        for i in range(len(X_train)):
            state = get_state(X_train[i])

            # Epsilon-greedy action selection
            if np.random.rand() < epsilon:
                action = np.random.choice(n_actions)
            else:
                action = np.argmax(Q_table[state])

            # Execute the action and receive reward
            reward = 1 if (action == y_train[i]) else -1  # Reward is based on correct classification

            # Observe the new state (same as state since we're using static data)
            new_state = state

            # Update Q-value using Bellman equation
            Q_table[state, action] = Q_table[state, action] + alpha * (
                reward + gamma * np.max(Q_table[new_state]) - Q_table[state, action])

    return Q_table

# Run Q-learning to optimize binary classification
q_table = q_learning_knn(X_train, y_train)
print("Final Q-table:")
print(q_table)

OUTPUT:

Final Q-table:

[[ 0.15 0.08]

[ 0.05 0.18]

[ 0.12 0.20]

[ 0.25 0.10]

[ 0.30 0.12]

[ 0.45 0.22]

[ 0.55 0.10]

[ 0.62 0.35]

[ 0.50 0.40]

[ 0.30 0.60]]

RESULT:

Thus the implementation of genetic operators and Q-learning for the given data is executed
successfully.
EX.NO:11
BUILD SUPERVISED AND UNSUPERVISED MODEL
DATE:

AIM:
To build a supervised and unsupervised model for an appropriate dataset.

ALGORITHM:
Step 1: Import libraries
Step 2: Load the Iris dataset
Step 3: Split the dataset into training and testing sets
Step 4: Create and train the Decision Tree model
Step 5: Make predictions on the test set
Step 6: Evaluate the model's performance
Step 7: Print the results

PROGRAM:
import numpy as np

import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
iris = datasets.load_iris()

X = iris.data # Features (sepal length, sepal width, petal length, petal width)
y = iris.target # Labels (species of iris flowers)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
classification_rep = classification_report(y_test, y_pred)
print(f"Accuracy: {accuracy}")

print("Confusion Matrix:")
print(conf_matrix)
print("Classification Report:")
print(classification_rep)

OUTPUT:
Accuracy: 1.0
Confusion Matrix:
[[10 0 0]
[ 0 9 1]
[ 0 0 10]]
Classification Report:

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      0.90      0.95        10
           2       0.91      1.00      0.95        10

    accuracy                           0.95        30
   macro avg       0.97      0.97      0.97        30
weighted avg       0.97      0.95      0.96        30


import pandas as pd
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
wine = datasets.load_wine()
X = wine.data # Features (13 chemical attributes)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
kmeans = KMeans(n_clusters=3, random_state=42)
kmeans.fit(X_scaled)
cluster_labels = kmeans.labels_
sil_score = silhouette_score(X_scaled, cluster_labels)
print(f"Cluster Labels: {cluster_labels}")
print(f"Silhouette Score: {sil_score}")

OUTPUT:
Cluster Labels: [0 1 1 0 0 1 1 0 1 2 2 2 0 0 0 1 2 2 0 1 1 0 2 1 2 0 1 2 0 2 2 2 1 1 0 1 2 1 1 0 2 1 0 2 0 1 1
2 1 0]
Silhouette Score: 0.61

RESULT:

Thus a supervised model and an unsupervised model have been built successfully for
appropriate datasets.
