
ML Lab Manual Completed

Uploaded by

javeedakthar003



EX.NO:1
EXPLORATION OF REPOSITORY DATASETS AND TOOLS
DATE:

AIM:

To explore UCI and Kaggle datasets using tools such as WEKA, RapidMiner, and Python's scikit-learn,
including the installation procedure for each tool.

THEORY AND INSTALLATION:

1. UCI Machine Learning Repository

- Overview: A collection of datasets for machine learning research.

- Access:

  - No installation is required; simply go to the UCI ML Repository.

  - Download datasets in CSV, ARFF, or other formats directly from the repository.

2. Kaggle Datasets

- Overview: A platform offering a variety of datasets and competitions for machine learning.
- Steps:

  - Go to Kaggle Datasets.
  - To download datasets programmatically, use Kaggle's API.
  - Install the Kaggle API:

pip install kaggle

    - After installation, set up authentication by downloading your API token from your
      Kaggle account:
      1. Go to your Kaggle account settings.
      2. Select "Create New API Token," which downloads a kaggle.json file.
      3. Place the kaggle.json file in the ~/.kaggle/ directory.
  - Download datasets via the Kaggle API (a dataset is identified as owner/dataset-name):

kaggle datasets download -d <owner>/<dataset-name>

3. WEKA

- Overview: A GUI-based tool for machine learning and data mining.

- Installation:
  1. Download the latest version of WEKA from the official WEKA website.
  2. Run the installer and follow the instructions.
  3. After installation, launch WEKA by running the application.
- Getting Started:

  - Load a dataset (e.g., from UCI) by opening the Explorer, choosing Open file, and selecting a
    file in .arff or .csv format.
  - Use the Explorer panel to apply ML algorithms such as Decision Trees, Naive Bayes,
    SVM, etc.
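WEKA reads ARFF files natively, so it can help to convert a table into ARFF yourself. The sketch below assumes a small pandas DataFrame with numeric feature columns and one categorical class column. The helper `to_arff` is a hypothetical name; it is not part of WEKA or pandas. The helper writes the `@RELATION`, `@ATTRIBUTE` and `@DATA` sections, and SciPy's `scipy.io.arff.loadarff` is used only to check that the result parses.

```python
import io

import pandas as pd
from scipy.io import arff


def to_arff(df, relation, class_column):
    """Return an ARFF document (as a string) for a DataFrame.

    Numeric columns become NUMERIC attributes; the class column becomes a
    nominal attribute that lists its distinct values.
    """
    lines = [f"@RELATION {relation}", ""]
    for col in df.columns:
        if col == class_column:
            values = ",".join(str(v) for v in sorted(df[col].unique()))
            lines.append(f"@ATTRIBUTE {col} {{{values}}}")
        else:
            lines.append(f"@ATTRIBUTE {col} NUMERIC")
    lines += ["", "@DATA"]
    for row in df.itertuples(index=False):
        lines.append(",".join(str(v) for v in row))
    return "\n".join(lines) + "\n"


# Three illustrative rows in the style of the UCI Iris data
df = pd.DataFrame({
    "sepal_length": [5.1, 7.0, 6.3],
    "petal_length": [1.4, 4.7, 6.0],
    "species": ["setosa", "versicolor", "virginica"],
})
text = to_arff(df, "iris_sample", "species")
print(text)

# Save the file so it can be opened in the WEKA Explorer
with open("iris_sample.arff", "w") as f:
    f.write(text)

# Sanity check: parse the ARFF text back with SciPy
data, meta = arff.loadarff(io.StringIO(text))
print(meta.names())
```

Column names containing spaces would need quoting in ARFF. This sketch does not handle that case.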

4. RapidMiner

- Overview: A drag-and-drop platform for data science workflows.

- Installation:
  1. Download the free version from the RapidMiner website.
  2. Install the software by following the setup instructions.
  3. Once installed, launch RapidMiner Studio.
- Getting Started:

  - Import datasets by going to the Repository tab and selecting the Import Data option.
  - Build ML workflows in the visual interface by dragging and dropping data
    transformation and ML algorithm components.

5. Python with scikit-learn

- Overview: A Python library for machine learning that supports various algorithms for
  classification, regression, and clustering.
- Installation:
  1. First, make sure Python is installed on your system.
     - To install Python, download it from the Python website and follow the
       instructions.
  2. Install scikit-learn and other necessary libraries:

pip install numpy pandas scikit-learn matplotlib seaborn

- Getting Started:

  - After installation, create a Python script or Jupyter Notebook.

  - Import datasets, preprocess them, and apply ML algorithms. Example code:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Load dataset
df = pd.read_csv('path_to_dataset.csv')

# Preprocess (split into features and target; RandomForest needs numeric features)

X = df.drop('target_column', axis=1)
y = df['target_column']

# Split into training (80%) and testing (20%) sets; random_state makes the split reproducible

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a RandomForest model

clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train)

# Make predictions
y_pred = clf.predict(X_test)

# Evaluate accuracy
print(f'Accuracy: {accuracy_score(y_test, y_pred)}')

- Visualization: You can also visualize the results using libraries such as matplotlib and seaborn.

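import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix

# Confusion matrix of the test-set predictions (rows: true class, columns: predicted class)
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')  # fmt='d' prints whole-number counts
plt.xlabel('Predicted label')
plt.ylabel('True label')
plt.show()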
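The example above uses the placeholders 'path_to_dataset.csv' and 'target_column', so it cannot run as written. The variant below is self-contained. It swaps in scikit-learn's bundled Iris dataset, which is an illustrative assumption and not a dataset named in this exercise. It then follows the same split, train, predict and evaluate steps.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Iris ships with scikit-learn: 150 flowers, 4 numeric features, 3 classes
X, y = load_iris(return_X_y=True, as_frame=True)

# stratify=y keeps the class proportions equal in the training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print(f"Accuracy: {acc:.2f}")
print(cm)
```

The resulting `cm` can be passed to `sns.heatmap` in the same way as in the visualization snippet.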

Practical Workflow Recap

1. Dataset Selection: Choose a dataset from UCI or Kaggle.

2. Tool Selection:

   - For GUI-based experiments: Use WEKA or RapidMiner.

   - For code-based exploration: Use Python with scikit-learn.

3. Model Training & Evaluation: Try different ML models, tune hyperparameters, and evaluate
performance.
4. Visualization: Visualize data, model performance, and results using tools like WEKA's built-in
visualization or Python libraries like matplotlib and seaborn.
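Step 3 mentions tuning hyperparameters. One common way to do this in scikit-learn is an exhaustive grid search with cross-validation. The sketch below uses the bundled Iris data as a stand-in dataset. The parameter grid is an illustrative choice, not a recommendation.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Candidate hyperparameter values (illustrative, 2 x 2 = 4 combinations)
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 3],
}

# 5-fold cross-validation on the training set for every combination
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print(f"Best cross-validation accuracy: {search.best_score_:.2f}")
# The best model (refit on the whole training set) is scored once on the test set
print(f"Test accuracy: {search.score(X_test, y_test):.2f}")
```

The test set is kept out of the search so that the final score is not biased by the tuning.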

RESULT:

Thus the UCI and Kaggle datasets and the tools WEKA, RapidMiner, and Python's scikit-learn were installed and explored for machine learning experiments.
EX.NO:2
PERFORM DATA MANIPULATION AND DATA VISUALIZATION
DATE:

AIM:

To perform data manipulation using NumPy and Pandas, and data visualization using Matplotlib.

Task 1: NumPy Operations

1. Create and Manipulate Arrays:

- Create a 1D array of numbers from 0 to 9 using NumPy.

- Create a 2D array (3x3) with numbers ranging from 1 to 9.
- Perform element-wise addition, subtraction, and multiplication on these arrays.

import numpy as np

# 1D array
arr_1d = np.arange(10)

# 2D array
arr_2d = np.arange(1, 10).reshape(3, 3)

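# Element-wise operations
arr_1d_add = arr_1d + 5    # addition
arr_1d_sub = arr_1d - 1    # subtraction
arr_2d_mul = arr_2d * 2    # multiplication

print("1D Array:", arr_1d)
print("2D Array:\n", arr_2d)
print("1D Array after addition:\n", arr_1d_add)
print("1D Array after subtraction:\n", arr_1d_sub)
print("2D Array after multiplication:\n", arr_2d_mul)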

2. Indexing and Slicing:

- Access the third element in the 1D array.

- Slice the first two rows and columns from the 2D array.

# Accessing elements
third_element = arr_1d[2]
slice_arr_2d = arr_2d[:2, :2]

print("Third element of 1D array:", third_element)

print("Slice of 2D array:\n", slice_arr_2d)
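Basic slicing is only one way to index NumPy arrays. The sketch below recreates the same two arrays and shows boolean masks, negative indices, column slices, integer-array ("fancy") indexing, and step slices. It also shows that a basic slice is a view, so writing to the slice changes the original array.

```python
import numpy as np

arr_1d = np.arange(10)
arr_2d = np.arange(1, 10).reshape(3, 3)

evens = arr_1d[arr_1d % 2 == 0]   # boolean mask: [0 2 4 6 8]
last = arr_1d[-1]                 # negative index: 9
col_1 = arr_2d[:, 1]              # every row, second column: [2 5 8]
rows_0_2 = arr_2d[[0, 2]]         # fancy indexing, rows 0 and 2: [[1 2 3] [7 8 9]]
every_other = arr_1d[::2]         # step slice: [0 2 4 6 8]
reversed_arr = arr_1d[::-1]       # [9 8 7 6 5 4 3 2 1 0]

print(evens, last, col_1, rows_0_2, every_other, reversed_arr, sep="\n")

# A basic slice is a view: writing to it changes the original array
data = np.arange(5)
view = data[:3]
view[0] = 100                     # data is now [100 1 2 3 4]
safe = data[:3].copy()            # .copy() gives an independent array
print(data)
```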

Task 2: Data Manipulation with Pandas

1. Create a DataFrame:

- Create a Pandas DataFrame from the following data:

  - Names: 'John', 'Jane', 'Alice', 'Bob'

  - Ages: 28, 34, 29, 40
  - Scores: 85, 92, 88, 79

import pandas as pd

# Data
data = {'Name': ['John', 'Jane', 'Alice', 'Bob'],
        'Age': [28, 34, 29, 40],
        'Score': [85, 92, 88, 79]}

# DataFrame
df = pd.DataFrame(data)

print(df)

2. Data Selection and Filtering:

- Select the 'Name' and 'Score' columns.

- Filter the DataFrame to show only rows where the age is greater than 30.

# Selecting columns
name_score = df[['Name', 'Score']]

# Filtering rows
age_filter = df[df['Age'] > 30]

print("Names and Scores:\n", name_score)

print("Filtered Data (Age > 30):\n", age_filter)
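Selection and filtering can also be done in one step. The sketch below rebuilds the same DataFrame and shows three operations: `.loc` with a row mask plus a column list, the `query()` string syntax, and sorting.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jane', 'Alice', 'Bob'],
                   'Age': [28, 34, 29, 40],
                   'Score': [85, 92, 88, 79]})

# .loc takes a row selector (boolean mask) and a column selector (labels)
older = df.loc[df['Age'] > 30, ['Name', 'Score']]

# query() states the condition as a string; 'and' combines conditions
high_and_young = df.query('Age < 30 and Score > 86')

# Sort by score, highest first
ranked = df.sort_values('Score', ascending=False)

print(older, high_and_young, ranked, sep="\n\n")
```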

3. Descriptive Statistics:

- Calculate the mean, median, and standard deviation for the 'Score' column.

mean_score = df['Score'].mean()
median_score = df['Score'].median()
std_score = df['Score'].std()  # pandas uses the sample standard deviation (ddof=1)

print(f"Mean Score: {mean_score}")

print(f"Median Score: {median_score}")
print(f"Standard Deviation of Score: {std_score}")
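Working these statistics out by hand shows where the numbers come from. It also shows a common pitfall: pandas `std()` divides by n - 1 (ddof=1), while NumPy `np.std()` divides by n (ddof=0) by default.

```python
import numpy as np
import pandas as pd

scores = pd.Series([85, 92, 88, 79])

mean = scores.mean()      # (85 + 92 + 88 + 79) / 4 = 344 / 4 = 86.0
median = scores.median()  # sorted 79, 85, 88, 92 -> (85 + 88) / 2 = 86.5

# Deviations from the mean: -1, 6, 2, -7 -> squares 1, 36, 4, 49 -> sum 90
sample_std = scores.std()               # ddof=1: sqrt(90 / 3) = sqrt(30)
population_std = np.std(scores.values)  # ddof=0: sqrt(90 / 4) = sqrt(22.5)

print(mean, median, round(sample_std, 3), round(population_std, 3))
# 86.0 86.5 5.477 4.743
```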
Task 3: Data Visualization using Matplotlib

1. Line Plot:

- Create a NumPy array x from 0 to 10 with 100 equally spaced points.

- Plot y = sin(x) using Matplotlib and label the axes and the title.

import matplotlib.pyplot as plt

import numpy as np

# Data
x = np.linspace(0, 10, 100)
y = np.sin(x)

# Plot
plt.plot(x, y)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Sine Wave')
plt.show()

2. Bar Plot:

- Create a bar plot using the Pandas DataFrame from Task 2, plotting 'Name' on the x-axis
  and 'Score' on the y-axis.

# Bar plot
df.plot(kind='bar', x='Name', y='Score', color='blue')
plt.title('Scores of Students')
plt.ylabel('Score')

# Show plot
plt.show()

3. Scatter Plot:

- Create a scatter plot to visualize the relationship between 'Age' and 'Score' in the Pandas
  DataFrame.

# Scatter plot
plt.scatter(df['Age'], df['Score'])
plt.xlabel('Age')
plt.ylabel('Score')
plt.title('Age vs Score')
plt.show()

4. Histogram:

- Generate a histogram for the 'Age' column from the Pandas DataFrame.

# Histogram
plt.hist(df['Age'], bins=5, color='green', edgecolor='black')
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.title('Histogram of Ages')
plt.show()
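The three plots above can also be drawn as panels of one figure and saved to a file. Saving is useful when the script runs without a display. The sketch assumes the non-interactive "Agg" backend and an output file name, `student_plots.png`, chosen only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # render to files without opening a window
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jane', 'Alice', 'Bob'],
                   'Age': [28, 34, 29, 40],
                   'Score': [85, 92, 88, 79]})

# One figure with three panels side by side
fig, axes = plt.subplots(1, 3, figsize=(12, 4))

axes[0].bar(df['Name'], df['Score'], color='blue')
axes[0].set(title='Scores of Students', xlabel='Name', ylabel='Score')

axes[1].scatter(df['Age'], df['Score'])
axes[1].set(title='Age vs Score', xlabel='Age', ylabel='Score')

axes[2].hist(df['Age'], bins=5, color='green', edgecolor='black')
axes[2].set(title='Histogram of Ages', xlabel='Age', ylabel='Frequency')

fig.tight_layout()
fig.savefig('student_plots.png', dpi=100)
```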

Task 4: Combination Exercise

1. Simulated Dataset:

- Generate a synthetic dataset using NumPy, simulating the heights and weights of 100
  people.
- Store the data in a Pandas DataFrame.
- Plot a scatter plot of height vs weight.

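# Generate synthetic data (seeded so the results are reproducible)
np.random.seed(0)
height = np.random.normal(165, 10, 100)  # heights in cm: mean 165, std 10
weight = np.random.normal(70, 15, 100)   # weights in kg: mean 70, std 15 (independent of height)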

# Create a DataFrame
data = {'Height': height, 'Weight': weight}
df_synthetic = pd.DataFrame(data)

# Scatter plot
plt.scatter(df_synthetic['Height'], df_synthetic['Weight'])
plt.xlabel('Height (cm)')
plt.ylabel('Weight (kg)')
plt.title('Height vs Weight')
plt.show()
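Because height and weight are drawn independently above, their scatter plot shows no trend. To simulate the positive relationship seen in real people, both values can be drawn together from a bivariate normal distribution. The correlation of 0.7 and the other parameters below are illustrative assumptions, not measured data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Assumed parameters: height ~ 165 cm (sd 10), weight ~ 70 kg (sd 15), correlation 0.7
mean = [165, 70]
sd_h, sd_w, rho = 10, 15, 0.7
cov = [[sd_h ** 2, rho * sd_h * sd_w],
       [rho * sd_h * sd_w, sd_w ** 2]]

samples = rng.multivariate_normal(mean, cov, size=100)
df_corr = pd.DataFrame(samples, columns=['Height', 'Weight'])

print(df_corr.describe().round(1))
print("Sample correlation:", round(df_corr['Height'].corr(df_corr['Weight']), 2))
```

Plotting `df_corr` with the same scatter code should then show points spreading along an upward-sloping band.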

RESULT:

Thus data manipulation using NumPy and Pandas, and data visualization using
Matplotlib, were performed successfully.
EX.NO:3
IMPLEMENTATION OF NAIVE BAYES CLASSIFIER
DATE:

AIM:

To diagnose heart patients and predict heart disease from the heart disease dataset using the Naïve
Bayes classifier algorithm.

ALGORITHM:

Steps in Naïve Bayes Classifier Algorithm:

1. Read the training dataset T.

2. Calculate the mean and standard deviation of each predictor variable in each class.

3. For each class, calculate the probability of each predictor value fi using the Gaussian density
function. Repeat until the probabilities of all predictor variables (f1, f2, f3, ..., fn) have been calculated.

4. Calculate the likelihood of each class as the product of these probabilities.

5. Predict the class with the greatest likelihood.
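The steps above can be written compactly with NumPy. The sketch below makes two deliberate changes, and it is not the program that follows. First, it works with log-densities: summing the logs equals multiplying the densities in step 4, but the sum cannot underflow to zero. Second, it adds the class prior P(c), which the steps above leave out because they compare likelihoods only. scikit-learn's bundled Iris data stands in for the heart disease file.

```python
import numpy as np
from sklearn.datasets import load_iris


def fit_gaussian_nb(X, y):
    """Steps 1-2: per-class prior, feature means and standard deviations."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    stds = np.array([X[y == c].std(axis=0, ddof=1) for c in classes])
    return classes, priors, means, stds


def predict_gaussian_nb(model, X, eps=1e-9):
    """Steps 3-5: pick the class with the largest log prior + sum of log densities."""
    classes, priors, means, stds = model
    stds = stds + eps  # avoid division by zero for constant features
    # log N(x | mean, std) for every sample, class and feature: shape (n, k, d)
    diff = X[:, None, :] - means[None, :, :]
    log_dens = -0.5 * np.log(2 * np.pi) - np.log(stds) - diff ** 2 / (2 * stds ** 2)
    # The prior is an addition to the steps above, which use the likelihood only
    log_post = np.log(priors) + log_dens.sum(axis=2)
    return classes[np.argmax(log_post, axis=1)]


X, y = load_iris(return_X_y=True)
model = fit_gaussian_nb(X, y)
y_hat = predict_gaussian_nb(model, X)
print("Training accuracy:", np.mean(y_hat == y))
```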

PROGRAM:

NB_from_scratch.py

import csv
import math
import random
import warnings
from itertools import cycle

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, f1_score, roc_curve, auc

# scipy.interp was only a deprecated alias of np.interp and is missing from newer SciPy
# releases, so np.interp is used for the ROC interpolation below.

# Convert the comma-separated txt file to a CSV file with a header row
with open('heartdisease.txt', 'r') as in_file:
    stripped = (line.strip() for line in in_file)
    lines = (line.split(",") for line in stripped if line)
    with open('heartdisease.csv', 'w', newline='') as out_file:
        writer = csv.writer(out_file)
        writer.writerow(('age', 'sex', 'cp', 'restbp', 'chol', 'fbs', 'restecg',
                         'thalach', 'exang', 'oldpeak', 'slope', 'ca', 'thal', 'num'))
        writer.writerows(lines)

warnings.filterwarnings("ignore")

# Example of Naive Bayes implemented from scratch in Python

# Mean of the column values belonging to one class
def mean(columnvalues):
    s = 0
    n = float(len(columnvalues))
    for i in range(len(columnvalues)):
        s = s + float(columnvalues[i])
    return s / n

# Sample standard deviation of the column values belonging to one class
def stdev(columnvalues):
    avg = mean(columnvalues)
    s = 0.0
    num = len(columnvalues)
    for i in range(num):
        s = s + pow(float(columnvalues[i]) - avg, 2)
    variance = s / (float(num - 1))
    return math.sqrt(variance)

# Read the CSV file, skip the header row and convert every value to float.
# Rows containing '?' (how missing values are marked in the UCI heart disease files) are skipped.
with open('heartdisease.csv', 'r') as f:
    rows = list(csv.reader(f))
dataset = [[float(x) for x in row] for row in rows[1:] if row and '?' not in row]

for z in range(5):
    print("\n\n\nTest Train Split no. ", z + 1, "\n\n\n")

    # Random 75/25 train/test split
    trainsize = int(len(dataset) * 0.75)
    trainset = []
    testset = list(dataset)
    for i in range(trainsize):
        index = random.randrange(len(testset))
        trainset.append(testset.pop(index))

    # Separate the TRAINING rows according to class (last column);
    # using the whole dataset here would leak the test rows into the model
    classlist = {}
    for row in trainset:
        class_num = float(row[-1])
        if class_num not in classlist:
            classlist[class_num] = []
        classlist[class_num].append(row)

    # For each class: (mean, stdev) of each of the 13 predictor columns
    class_data = {}
    for class_num, rows_in_class in classlist.items():
        class_datarow = [(mean(columnvalues), stdev(columnvalues))
                         for columnvalues in zip(*rows_in_class)]
        class_data[class_num] = class_datarow[0:13]  # drop the class column itself

    # Test vector (true labels)
    y_test = [row[-1] for row in testset]

    # Prediction vector: product of Gaussian densities per class, keep the largest
    y_pred = []
    for i in range(len(testset)):
        class_probability = {}
        for class_num, row in class_data.items():
            class_probability[class_num] = 1
            for j in range(len(row)):
                calculated_mean, calculated_dev = row[j]
                x = float(testset[i][j])
                if calculated_dev != 0:
                    power = math.exp(-(math.pow(x - calculated_mean, 2) /
                                       (2 * math.pow(calculated_dev, 2))))
                    probability = (1 / (math.sqrt(2 * math.pi) * calculated_dev)) * power
                    class_probability[class_num] *= probability
        resultant_class, max_prob = -1, -1
        for class_num, probability in class_probability.items():
            if resultant_class == -1 or probability > max_prob:
                max_prob = probability
                resultant_class = class_num
        y_pred.append(resultant_class)

    # Accuracy
    count = 0
    for i in range(len(testset)):
        if testset[i][-1] == y_pred[i]:
            count += 1
    accuracy = (count / float(len(testset))) * 100.0
    print("\n\n Accuracy: ", accuracy, "%")

    y1 = [float(k) for k in y_test]
    y_pred1 = [float(k) for k in y_pred]

    print("\n\n\n\nConfusion Matrix")
    cf_matrix = confusion_matrix(y1, y_pred1)
    print(cf_matrix)

    print("\n\n\n\nF1 Score")
    f_score = f1_score(y1, y_pred1, average='weighted')
    print(f_score)

    # One-hot matrices from the 1D label arrays (classes 0-4)
    y2 = np.zeros(shape=(len(y1), 5))
    y3 = np.zeros(shape=(len(y_pred1), 5))
    for i in range(len(y1)):
        y2[i][int(y1[i])] = 1
    for i in range(len(y_pred1)):
        y3[i][int(y_pred1[i])] = 1

    # ROC curve generation (one-vs-rest, from the predicted labels)
    n_classes = 5
    fpr = dict()
    tpr = dict()
    roc_auc = dict()
    for i in range(n_classes):
        fpr[i], tpr[i], _ = roc_curve(y2[:, i], y3[:, i])
        roc_auc[i] = auc(fpr[i], tpr[i])

    # Micro-average ROC curve and ROC area
    fpr["micro"], tpr["micro"], _ = roc_curve(y2.ravel(), y3.ravel())
    roc_auc["micro"] = auc(fpr["micro"], tpr["micro"])

    # Macro-average ROC curve and ROC area
    print("\n\n\n\nROC Curve")
    lw = 2
    # First aggregate all false positive rates
    all_fpr = np.unique(np.concatenate([fpr[i] for i in range(n_classes)]))
    # Then interpolate all ROC curves at these points
    mean_tpr = np.zeros_like(all_fpr)
    for i in range(n_classes):
        mean_tpr += np.interp(all_fpr, fpr[i], tpr[i])
    # Finally average it and compute the AUC
    mean_tpr /= n_classes
    fpr["macro"] = all_fpr
    tpr["macro"] = mean_tpr
    roc_auc["macro"] = auc(fpr["macro"], tpr["macro"])

    # Plot all ROC curves
    plt.figure()
    plt.plot(fpr["micro"], tpr["micro"],
             label='micro-average (area = {0:0.2f})'.format(roc_auc["micro"]),
             color='deeppink', linestyle=':', linewidth=4)
    plt.plot(fpr["macro"], tpr["macro"],
             label='macro-average (area = {0:0.2f})'.format(roc_auc["macro"]),
             color='navy', linestyle=':', linewidth=4)
    colors = cycle(['aqua', 'darkorange', 'cornflowerblue', 'red', 'black'])
    for i, color in zip(range(n_classes), colors):
        plt.plot(fpr[i], tpr[i], color=color, lw=lw,
                 label='ROC of class {0} (area = {1:0.2f})'.format(i, roc_auc[i]))
    plt.plot([0, 1], [0, 1], 'k--', lw=lw)
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.05])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('Receiver operating characteristic for multi-class')
    plt.legend(loc="lower right")
    plt.savefig('Exp-8')
    plt.show()

NB_from_Gaussian_Sklearn.py

import csv
from itertools import cycle

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import metrics
from sklearn.metrics import confusion_matrix, f1_score, roc_curve, auc
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Convert the txt file to a CSV file with a header row
with open('heartdisease.txt', 'r') as in_file:
    stripped = (line.strip() for line in in_file)
    lines = (line.split(",") for line in stripped if line)
    with open('heartdisease.csv', 'w', newline='') as out_file:
        writer = csv.writer(out_file)
        writer.writerow(('age', 'sex', 'cp', 'restbp', 'chol', 'fbs', 'restecg',
                         'thalach', 'exang', 'oldpeak', 'slope', 'ca', 'thal', 'num'))
        writer.writerows(lines)

# Read the CSV using its header row so every column is parsed as a number;
# '?' marks a missing value in the UCI files, and such rows are dropped
df = pd.read_csv('heartdisease.csv', na_values='?').dropna()

# Columns 0-12 are the predictors, column 13 ('num') is the class (0-4)
x = df.iloc[:, 0:13].to_numpy()
y = df.iloc[:, 13].to_numpy()

for z in range(5):
    print("\n\n\nTest Train Split no. ", z + 1, "\n\n\n")
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=None)

    # Gaussian Naive Bayes from sklearn
    gnb = GaussianNB()
    gnb.fit(x_train, y_train)
    y_pred = gnb.predict(x_test)
    print("\n\nGaussian Naive Bayes model accuracy(in %):",
          metrics.accuracy_score(y_test, y_pred) * 100)

    y1 = y_test.ravel()
    y_pred1 = y_pred.ravel()

    print("\n\n\n\nConfusion Matrix")
    cf_matrix = confusion_matrix(y1, y_pred1)
    print(cf_matrix)

    print("\n\n\n\nF1 Score")
    f_score = f1_score(y1, y_pred1, average='weighted')
    print(f_score)

    # One-hot matrices from the 1D label arrays (classes 0-4)
    y2 = np.zeros(shape=(len(y1), 5))
    y3 = np.zeros(shape=(len(y_pred1), 5))
    for i in range(len(y1)):
        y2[i][int(y1[i])] = 1
    for i in range(len(y_pred1)):
        y3[i][int(y_pred1[i])] = 1

    # ROC curve generation (one-vs-rest, from the predicted labels)
    n_classes = 5
    fpr = dict()
    tpr = dict()
    roc_auc = dict()
    for i in range(n_classes):
        fpr[i], tpr[i], _ = roc_curve(y2[:, i], y3[:, i])
        roc_auc[i] = auc(fpr[i], tpr[i])

    # Micro-average ROC curve and ROC area
    fpr["micro"], tpr["micro"], _ = roc_curve(y2.ravel(), y3.ravel())
    roc_auc["micro"] = auc(fpr["micro"], tpr["micro"])

    # Macro-average ROC curve and ROC area
    print("\n\n\n\nROC Curve")
    lw = 2
    # First aggregate all false positive rates
    all_fpr = np.unique(np.concatenate([fpr[i] for i in range(n_classes)]))
    # Then interpolate all ROC curves at these points (np.interp replaces scipy.interp)
    mean_tpr = np.zeros_like(all_fpr)
    for i in range(n_classes):
        mean_tpr += np.interp(all_fpr, fpr[i], tpr[i])
    # Finally average it and compute the AUC
    mean_tpr /= n_classes
    fpr["macro"] = all_fpr
    tpr["macro"] = mean_tpr
    roc_auc["macro"] = auc(fpr["macro"], tpr["macro"])

    # Plot all ROC curves
    plt.figure()
    plt.plot(fpr["micro"], tpr["micro"],
             label='micro-average (area = {0:0.2f})'.format(roc_auc["micro"]),
             color='deeppink', linestyle=':', linewidth=4)
    plt.plot(fpr["macro"], tpr["macro"],
             label='macro-average (area = {0:0.2f})'.format(roc_auc["macro"]),
             color='navy', linestyle=':', linewidth=4)
    colors = cycle(['aqua', 'darkorange', 'cornflowerblue', 'red', 'black'])
    for i, color in zip(range(n_classes), colors):
        plt.plot(fpr[i], tpr[i], color=color, lw=lw,
                 label='ROC of class {0} (area = {1:0.2f})'.format(i, roc_auc[i]))
    plt.plot([0, 1], [0, 1], 'k--', lw=lw)
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.05])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('Receiver operating characteristic for multi-class')
    plt.legend(loc="lower right")
    plt.show()
OUTPUT:

RESULT:

Thus the diagnosis of heart patients and the prediction of heart disease from the heart disease dataset using the Naïve
Bayes classifier algorithm were implemented successfully.
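Both heart disease programs build their ROC curves from hard 0/1 class predictions. As a result, each per-class curve has only a single bend point. ROC analysis usually ranks samples by a score instead. The sketch below shows that variant with `GaussianNB.predict_proba` and one-vs-rest targets. It uses scikit-learn's bundled Iris data as a stand-in, because the heart disease file is not included here.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import label_binarize

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

gnb = GaussianNB().fit(X_train, y_train)
proba = gnb.predict_proba(X_test)  # shape (n_samples, n_classes), rows sum to 1

# One-vs-rest targets: column i is 1 where the true class is i
y_bin = label_binarize(y_test, classes=gnb.classes_)

# Per-class ROC curves from the probability scores
for i, c in enumerate(gnb.classes_):
    fpr, tpr, _ = roc_curve(y_bin[:, i], proba[:, i])
    print(f"class {c}: {len(fpr)} ROC points")

macro_auc = roc_auc_score(y_test, proba, multi_class="ovr", average="macro")
print(f"Macro-average one-vs-rest AUC: {macro_auc:.3f}")
```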
EX.NO:4
IMPLEMENTATION OF LINEAR MODELS
DATE:

AIM:

To implement linear models such as locally weighted linear regression and plot the necessary
graphs.

ALGORITHM:

1. Read the given data sample into X and the curve (linear or non-linear) into Y.

2. Set the value of the smoothing (free) parameter, say τ.

3. Set the bias / point of interest x0, which is a subset of X.

4. Determine the weight matrix W using:

$$ w(i,i) = \exp\left(-\frac{(x^{(i)} - x_0)^2}{2\tau^2}\right) $$

5. Determine the value of the model parameter β using:

$$ \hat{\beta} = (X^{T} W X)^{-1} X^{T} W Y $$

6. Prediction = x0 · β.
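The six steps above can be sketched directly with NumPy, using a Gaussian weight kernel with bandwidth τ. Note that the program that follows uses a different variant, robust LOWESS. That version weights points with a tricube kernel over a fraction f of the data and adds robustness iterations. The function name `lwr_predict` is illustrative only.

```python
import numpy as np


def lwr_predict(x0, X, Y, tau):
    """Locally weighted linear regression prediction at a single point x0.

    X: 1-D inputs, Y: 1-D targets, tau: bandwidth (smoothing parameter).
    """
    Xb = np.column_stack([np.ones_like(X), X])  # intercept column + input
    x0b = np.array([1.0, x0])
    # Step 4: diagonal weights w(i, i) = exp(-(x(i) - x0)^2 / (2 tau^2))
    W = np.diag(np.exp(-(X - x0) ** 2 / (2 * tau ** 2)))
    # Step 5: beta = (X^T W X)^(-1) X^T W Y, solved without forming the inverse
    beta = np.linalg.solve(Xb.T @ W @ Xb, Xb.T @ W @ Y)
    # Step 6: prediction = x0 . beta
    return x0b @ beta


# Noisy sine wave, as in the program that follows
rng = np.random.default_rng(0)
X = np.linspace(0, 2 * np.pi, 100)
Y = np.sin(X) + 0.3 * rng.standard_normal(100)

tau = 0.5
Y_hat = np.array([lwr_predict(x0, X, Y, tau) for x0 in X])
print(np.round(Y_hat[:5], 3))
```

A smaller τ follows the data more closely (with more noise). A larger τ approaches an ordinary straight-line fit.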

PROGRAM:

import math
from math import ceil

import numpy as np
import matplotlib.pyplot as plt
from scipy import linalg

def lowess(x, y, f, iterations):
    # Robust locally weighted regression (LOWESS) with a tricube kernel.
    # f: fraction of points used in each local fit; iterations: robustness passes
    n = len(x)
    r = int(ceil(f * n))
    # Bandwidth for each point: distance to its r-th nearest neighbour
    h = [np.sort(np.abs(x - x[i]))[r] for i in range(n)]
    # Tricube weights
    w = np.clip(np.abs((x[:, None] - x[None, :]) / h), 0.0, 1.0)
    w = (1 - w ** 3) ** 3
    yest = np.zeros(n)
    delta = np.ones(n)
    for iteration in range(iterations):
        for i in range(n):
            # Weighted least-squares fit of a straight line around x[i]
            weights = delta * w[:, i]
            b = np.array([np.sum(weights * y), np.sum(weights * y * x)])
            A = np.array([[np.sum(weights), np.sum(weights * x)],
                          [np.sum(weights * x), np.sum(weights * x * x)]])
            beta = linalg.solve(A, b)
            yest[i] = beta[0] + beta[1] * x[i]
        # Down-weight points with large residuals (bisquare robustness weights)
        residuals = y - yest
        s = np.median(np.abs(residuals))
        delta = np.clip(residuals / (6.0 * s), -1, 1)
        delta = (1 - delta ** 2) ** 2
    return yest

# Sample data: a noisy sine wave
n = 100
x = np.linspace(0, 2 * math.pi, n)
y = np.sin(x) + 0.3 * np.random.randn(n)
f = 0.25
iterations = 3
yest = lowess(x, y, f, iterations)

# Red dots: noisy data; blue line: LOWESS fit
plt.plot(x, y, "r.")
plt.plot(x, yest, "b-")
plt.show()
OUTPUT:

RESULT:

Thus linear models such as locally weighted linear regression were implemented and the
necessary graphs were plotted successfully.
EX.NO:5
IMPLEMENT MULTI-LAYER PERCEPTRON ALGORITHM
DATE:

AIM:

To implement the multi-layer perceptron (MLP) algorithm on the MNIST handwritten digit dataset.

ALGORITHM:

Step 1: Import the necessary libraries.

Step 2: Download the dataset. TensorFlow can load the MNIST dataset directly in the program as a
train set and a test set.

Step 3: Convert the pixel values into floating-point values and normalize them.

Step 4: Understand the structure of the dataset.

Step 5: Visualize the data.

Step 6: Form the input, hidden, and output layers.

Step 7: Compile the model.

Step 8: Fit the model.

Step 9: Find Accuracy of the model.

PROGRAM:

# importing modules
import tensorflow as tf
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Activation
import matplotlib.pyplot as plt

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# On the first run, Keras downloads mnist.npz from
# https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
# Cast the records into float values
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

# Normalize image pixel values to [0, 1] by dividing by 255
gray_scale = 255
x_train /= gray_scale
x_test /= gray_scale
print("Training feature matrix:", x_train.shape)
print("Test feature matrix:", x_test.shape)
print("Training target vector:", y_train.shape)
print("Test target vector:", y_test.shape)

OUTPUT:

Training feature matrix: (60000, 28, 28)
Test feature matrix: (10000, 28, 28)
Training target vector: (60000,)
Test target vector: (10000,)

# Show the first 100 training images in a 10 x 10 grid
fig, ax = plt.subplots(10, 10)
k = 0
for i in range(10):
    for j in range(10):
        ax[i][j].imshow(x_train[k].reshape(28, 28), aspect='auto')
        k += 1
plt.show()

OUTPUT:
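
model = Sequential([
    # flatten each 28 x 28 image into a 784-element vector
    Flatten(input_shape=(28, 28)),
    # dense layer 1
    Dense(256, activation='sigmoid'),
    # dense layer 2
    Dense(128, activation='sigmoid'),
    # output layer, one unit per digit class; softmax is the usual choice with
    # sparse_categorical_crossentropy (sigmoid is kept to match the recorded output)
    Dense(10, activation='sigmoid'),
])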

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=10,
          batch_size=2000,
          validation_split=0.2)

OUTPUT:

results = model.evaluate(x_test, y_test, verbose=0)
print('test loss, test acc:', results)

OUTPUT:

test loss, test acc: [0.27210235595703125, 0.9223999977111816]
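To see what the Keras model above computes, the sketch below repeats its layer arithmetic in plain NumPy. A Dense layer with n_in inputs and n_out units holds n_in × n_out weights plus n_out biases. This gives the parameter counts that Keras should report in `model.summary()` for the 784-256-128-10 network. The forward pass uses random weights in place of trained ones, so it illustrates only the shapes and the activations. It applies softmax at the output, which differs from the sigmoid kept in the program.

```python
import numpy as np

# Layer sizes of the model above: 784 inputs (28 x 28), 256, 128, 10 outputs
sizes = [28 * 28, 256, 128, 10]

# Weights + biases per Dense layer
params = [n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:])]
print(params, sum(params))  # [200960, 32896, 1290] 235146


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


# Random weights stand in for trained ones
rng = np.random.default_rng(0)
weights = [rng.normal(0, 0.05, (n_in, n_out)) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n_out) for n_out in sizes[1:]]

batch = rng.random((2, 28, 28))              # two fake 28 x 28 "images" in [0, 1)
h = batch.reshape(len(batch), -1)            # Flatten: (2, 784)
h = sigmoid(h @ weights[0] + biases[0])      # Dense(256, sigmoid)
h = sigmoid(h @ weights[1] + biases[1])      # Dense(128, sigmoid)
probs = softmax(h @ weights[2] + biases[2])  # Dense(10) with softmax
print(probs.shape, probs.sum(axis=1))        # each row sums to 1
```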

RESULT:
Thus the multi-layer perceptron algorithm was implemented on the MNIST dataset, and the output was
verified successfully.
EX.NO:6
IMPLEMENTATION OF KNN ALGORITHM
DATE:

AIM:

To implement K-NN algorithm for the specified data.

ALGORITHM:

1. Collect data continuously.

2. Preprocess real-time data (scaling, normalization).
3. Choose K, the number of neighbors.
4. For every new data point:
   - Compute the distances to all existing data points.
   - Identify the K nearest neighbors.
   - Use majority voting (classification) or averaging (regression) to predict the label/value.
5. Update the system with new labeled data periodically.
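Step 4 of the algorithm can be written from scratch in a few lines: compute Euclidean distances, take the K smallest, and let the neighbors' labels vote. The sketch below uses a tiny hypothetical two-group dataset. It treats a short list of points as the incoming "real-time" stream and then appends a newly labelled point, which corresponds to step 5.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_new, k=5):
    """Classify one point by majority vote among its k nearest neighbours."""
    # Euclidean distance from x_new to every stored training point
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Majority vote over the neighbours' labels
    return Counter(y_train[nearest]).most_common(1)[0][0]


# Tiny illustrative dataset: two well-separated groups
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y_train = np.array([0, 0, 0, 1, 1, 1])

# "Real-time" stream of new points
for point in [np.array([1.1, 0.9]), np.array([4.9, 5.0])]:
    print(point, "->", knn_predict(X_train, y_train, point, k=3))

# Step 5: store a newly labelled point so that later predictions can use it
X_train = np.vstack([X_train, [3.0, 3.0]])
y_train = np.append(y_train, 1)
```

Each prediction scans every stored point, which is the cost that KD-trees and parallel search (used in the program) try to reduce.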

PROGRAM:

pip install numpy pandas scikit-learn

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# Example dataset: Features are random, labels are 0 or 1

np.random.seed(42)

data_size = 1000

X = np.random.rand(data_size, 2) # 2 features

y = np.random.choice([0, 1], size=data_size) # Binary labels

# Split dataset into training and test set

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardizing the dataset (important for KNN)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)

X_test = scaler.transform(X_test)

# Initialize KNN with K=5 (You can change this based on experimentation)

knn = KNeighborsClassifier(n_neighbors=5)

# Train the model on the training data

knn.fit(X_train, y_train)

# Real-time data simulation: loop through the test set one point at a time
for i in range(len(X_test)):
    # Predict the class for the incoming real-time data point
    pred = knn.predict([X_test[i]])  # wrap the single point in a list to make it 2D
    print(f"Real-time Prediction for Data Point {i}: {pred[0]} (True Label: {y_test[i]})")

# Predict all the labels for the test set
y_pred = knn.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy of KNN model: {accuracy:.2f}")

# Optimization 1: KD-tree for faster nearest-neighbor search
knn = KNeighborsClassifier(n_neighbors=5, algorithm='kd_tree')
knn.fit(X_train, y_train)

from sklearn.decomposition import PCA

# Optimization 2: PCA to reduce dimensionality
# (with only 2 features here, nothing is actually removed; this matters for wide datasets)
pca = PCA(n_components=2)
X_train_reduced = pca.fit_transform(X_train)
X_test_reduced = pca.transform(X_test)

# Fit and evaluate KNN on the reduced data
knn.fit(X_train_reduced, y_train)
print(f"Accuracy on PCA-reduced data: {accuracy_score(y_test, knn.predict(X_test_reduced)):.2f}")

# Optimization 3: use multiple cores for parallel distance calculations
knn = KNeighborsClassifier(n_neighbors=5, n_jobs=-1)  # -1 uses all available cores

np.random.seed(42)

data_size = 1000

X = np.random.rand(data_size, 2) # 2 features

y = np.random.choice([0, 1], size=data_size) # Binary labels

# Split dataset into training and test set

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardizing the dataset

scaler = StandardScaler()

X_train = scaler.fit_transform(X_train)

X_test = scaler.transform(X_test)

# Initialize and train KNN

knn = KNeighborsClassifier(n_neighbors=5)

knn.fit(X_train, y_train)

for i in range(len(X_test)):
    pred = knn.predict([X_test[i]])  # wrap the single point in a list to make it 2D
    print(f"Real-time Prediction for Data Point {i}: {pred[0]} (True Label: {y_test[i]})")

OUTPUT:

Real-time Prediction for Data Point 0: 0 (True Label: 0)

Real-time Prediction for Data Point 1: 1 (True Label: 1)

Real-time Prediction for Data Point 2: 1 (True Label: 1)

Real-time Prediction for Data Point 3: 1 (True Label: 1)

Real-time Prediction for Data Point 4: 0 (True Label: 0)

...

# Predict all the labels for the test set

y_pred = knn.predict(X_test)

# Calculate accuracy

accuracy = accuracy_score(y_test, y_pred)

print(f"Accuracy of KNN model: {accuracy:.2f}")

OUTPUT:

Accuracy of KNN model: 0.90
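Step 3 of the algorithm leaves the choice of K open. One common approach is to compare the cross-validated accuracy of several K values. The sketch below assumes a labelled dataset is available, with scikit-learn's bundled Iris data standing in for it. Scaling sits inside a pipeline so that it is re-fitted on each training fold.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

scores = {}
for k in range(1, 16, 2):  # odd K avoids tied votes when there are two classes
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scores[k] = cross_val_score(model, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
for k, s in scores.items():
    print(f"K={k:2d}: mean CV accuracy {s:.3f}")
print("Best K:", best_k)
```

Very small K tends to overfit to noise, while very large K blurs the class boundaries. The cross-validation curve usually shows a plateau somewhere in between.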

RESULT:

Thus the K-NN algorithm was implemented for the specified dataset and executed successfully.

EX.NO:7
IMPLEMENTATION OF SVM ALGORITHM
DATE:

AIM:

To create a machine learning model that classifies spam and ham e-mails from a given dataset
using the Support Vector Machine algorithm.

ALGORITHM:

1. Import all the necessary libraries.

2. Read the given csv file which contains the emails which are both spam and ham.

3. Gather all the words in the dataset, remove punctuation and stop words, and examine the distributions of e-mail length, word count, and mean word length.

4. Create an ML model using the Support Vector Classifier after splitting the dataset into training and
test set.

5. Display the accuracy and f1 score and print the confusion matrix for the classification of spam and
ham.
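The core of steps 3-5 is a bag-of-words representation fed into a linear-kernel SVM. The sketch below shows that core on a handful of hypothetical toy e-mails, which are made up for illustration and are not the exercise's dataset. It wires the pieces together with a scikit-learn pipeline instead of the class-based design used in the program.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy e-mails (1 = spam, 0 = ham)
emails = [
    "win free money now",
    "claim your free prize now",
    "win a free lottery prize",
    "free money waiting claim today",
    "team meeting at noon tomorrow",
    "please review the report before the meeting",
    "lunch meeting with the team tomorrow",
    "project meeting notes attached",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Word counts per e-mail feeding a linear-kernel SVM
model = make_pipeline(CountVectorizer(), SVC(kernel="linear", C=1.0))
model.fit(emails, labels)

tests = ["free prize money", "team meeting tomorrow"]
print(dict(zip(tests, model.predict(tests))))
```

With real data, the same pipeline would be evaluated on a held-out test split, as the algorithm describes, rather than on hand-picked phrases.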

PROGRAM:

import os
import string

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib import pyplot
from PIL import Image
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator

# One-time NLTK setup: nltk.download('stopwords') and nltk.download('punkt')
# ('punkt_tab' on recent NLTK versions)
from nltk import word_tokenize
from nltk.corpus import stopwords

from sklearn import metrics, model_selection, svm
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import (classification_report, confusion_matrix, roc_auc_score,
                             roc_curve, auc, ConfusionMatrixDisplay)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# plot_confusion_matrix was removed in scikit-learn 1.2;
# ConfusionMatrixDisplay.from_estimator is used instead

class data_read_write(object):
    def __init__(self, file_link=None):
        # Read the CSV file when a path is given
        if file_link is not None:
            self.data_frame = pd.read_csv(file_link)

    def read_csv_file(self, file_link):
        # Returns the DataFrame loaded in __init__ (file_link is not used)
        return self.data_frame

    def write_to_csvfile(self, file_link):
        self.data_frame.to_csv(file_link, encoding='utf-8', index=False, header=True)


class generate_word_cloud(data_read_write):
    def __init__(self):
        pass

    def variance_column(self, data):
        return np.var(data)

    def word_cloud(self, data_frame_column, output_image_file):
        text = " ".join(review for review in data_frame_column)
        stopwords = set(STOPWORDS)
        stopwords.update(["subject"])
        wordcloud = WordCloud(width=1200, height=800, stopwords=stopwords,
                              max_font_size=50, margin=0,
                              background_color="white").generate(text)
        plt.imshow(wordcloud, interpolation='bilinear')
        plt.axis("off")
        plt.savefig("Distribution.png")
        plt.show()
        wordcloud.to_file(output_image_file)


class data_cleaning(data_read_write):
    def __init__(self):
        # English stop words, loaded once instead of once per word
        self.stop_words = set(stopwords.words('english'))

    def message_cleaning(self, message):
        # Remove punctuation, then drop stop words
        Test_punc_removed = [char for char in message if char not in string.punctuation]
        Test_punc_removed_join = ''.join(Test_punc_removed)
        Test_punc_removed_join_clean = [word for word in Test_punc_removed_join.split()
                                        if word.lower() not in self.stop_words]
        return ' '.join(Test_punc_removed_join_clean)

    def apply_to_column(self, data_column_text):
        return data_column_text.apply(self.message_cleaning)


class apply_embeddding_and_model(data_read_write):
    def __init__(self):
        pass

    def apply_count_vector(self, v_data_column):
        # Bag-of-words counts; a word must appear in at least 2 e-mails
        vectorizer = CountVectorizer(min_df=2, analyzer="word", tokenizer=None,
                                     preprocessor=None, stop_words=None)
        return vectorizer.fit_transform(v_data_column)

    def apply_svm(self, X, y):
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
        # gamma has no effect with a linear kernel
        params = {'kernel': 'linear', 'C': 2, 'gamma': 1}
        svm_cv = svm.SVC(C=params['C'], kernel=params['kernel'], gamma=params['gamma'],
                         probability=True)
        svm_cv.fit(X_train, y_train)
        y_predict_test = svm_cv.predict(X_test)
        cm = confusion_matrix(y_test, y_predict_test)
        sns.heatmap(cm, annot=True)
        print(classification_report(y_test, y_predict_test))
        print("test set")
        print("\nAccuracy Score: " + str(metrics.accuracy_score(y_test, y_predict_test)))
        print("F1 Score: " + str(metrics.f1_score(y_test, y_predict_test)))
        print("Recall: " + str(metrics.recall_score(y_test, y_predict_test)))
        print("Precision: " + str(metrics.precision_score(y_test, y_predict_test)))

        class_names = ['ham', 'spam']
        titles_options = [("Confusion matrix, without normalization", None),
                          ("Normalized confusion matrix", 'true')]
        for title, normalize in titles_options:
            disp = ConfusionMatrixDisplay.from_estimator(svm_cv, X_test, y_test,
                                                         display_labels=class_names,
                                                         cmap=plt.cm.Blues,
                                                         normalize=normalize)
            disp.ax_.set_title(title)
            print(title)
            print(disp.confusion_matrix)
        plt.savefig("SVM.png")
        plt.show()

        # ROC curve: "no skill" baseline vs the SVM's spam probabilities
        ns_probs = [0 for _ in range(len(y_test))]
        lr_probs = svm_cv.predict_proba(X_test)
        lr_probs = lr_probs[:, 1]
        ns_auc = roc_auc_score(y_test, ns_probs)
        lr_auc = roc_auc_score(y_test, lr_probs)
        print('No Skill: ROC AUC=%.3f' % (ns_auc))
        print('SVM: ROC AUC=%.3f' % (lr_auc))
        ns_fpr, ns_tpr, _ = roc_curve(y_test, ns_probs)
        lr_fpr, lr_tpr, _ = roc_curve(y_test, lr_probs)
        pyplot.plot(ns_fpr, ns_tpr, linestyle='--', label='No Skill')
        pyplot.plot(lr_fpr, lr_tpr, marker='.', label='SVM')
        pyplot.xlabel('False Positive Rate')
        pyplot.ylabel('True Positive Rate')
        pyplot.legend()
        pyplot.savefig("SVMMat.png")
        pyplot.show()

data_obj = data_read_write("emails.csv")

data_frame = data_obj.read_csv_file("processed.csv")
data_frame.head()

data_frame.tail()

data_frame.describe()

data_frame.info()

data_frame.head()

data_frame.groupby('spam').describe()

data_frame['length'] = data_frame['text'].apply(len)

data_frame['length'].max()

sns.set(rc={'figure.figsize':(11.7,8.27)})

ham_messages_length = data_frame[data_frame['spam']==0]

spam_messages_length = data_frame[data_frame['spam']==1]

ham_messages_length['length'].plot(bins=100, kind='hist',label = 'Ham')

spam_messages_length['length'].plot(bins=100, kind='hist',label = 'Spam')

plt.title('Distribution of Length of Email Text')

plt.xlabel('Length of Email Text')

plt.legend()

data_frame[data_frame['spam']==0].text.values

ham_words_length = [len(word_tokenize(text))
                    for text in data_frame[data_frame['spam']==0].text.values]
spam_words_length = [len(word_tokenize(text))
                     for text in data_frame[data_frame['spam']==1].text.values]

print(max(ham_words_length))
print(max(spam_words_length))

sns.set(rc={'figure.figsize':(11.7,8.27)})
# sns.distplot is deprecated; histplot with stat='density' and kde=True draws the same normalised histogram
ax = sns.histplot(x=ham_words_length, stat='density', kde=True, bins=30, label='Ham')
ax = sns.histplot(x=spam_words_length, stat='density', kde=True, bins=30, label='Spam')
plt.title('Distribution of Number of Words')
plt.xlabel('Number of Words')
plt.legend()
plt.savefig("SVMGraph.png")
plt.show()

def mean_word_length(x):
    # Average number of characters per token in one e-mail
    word_lengths = [len(word) for word in word_tokenize(x)]
    return np.mean(word_lengths)

ham_meanword_length = data_frame[data_frame['spam']==0].text.apply(mean_word_length)
spam_meanword_length = data_frame[data_frame['spam']==1].text.apply(mean_word_length)

sns.histplot(x=ham_meanword_length, stat='density', kde=True, bins=30, label='Ham')
sns.histplot(x=spam_meanword_length, stat='density', kde=True, bins=30, label='Spam')
plt.title('Distribution of Mean Word Length')
plt.xlabel('Mean Word Length')
plt.legend()
plt.savefig("Graph.png")
plt.show()

import nltk
from nltk.corpus import stopwords

nltk.download('stopwords')  # one-time download of the NLTK stop-word lists
stop_words = set(stopwords.words('english'))

def stop_words_ratio(x):
    # Fraction of tokens in the e-mail that are English stop words
    num_total_words = 0
    num_stop_words = 0
    for word in word_tokenize(x):
        if word.lower() in stop_words:  # the NLTK list is lower-case
            num_stop_words += 1
        num_total_words += 1
    return num_stop_words / num_total_words if num_total_words else 0.0

ham_stopwords = data_frame[data_frame['spam'] == 0].text.apply(stop_words_ratio)
spam_stopwords = data_frame[data_frame['spam'] == 1].text.apply(stop_words_ratio)

# Both classes are normalised the same way so the two curves are comparable
sns.histplot(x=ham_stopwords, stat='density', kde=True, label='Ham')
sns.histplot(x=spam_stopwords, stat='density', kde=True, label='Spam')
print('Ham Mean: {:.3f}'.format(ham_stopwords.values.mean()))
print('Spam Mean: {:.3f}'.format(spam_stopwords.values.mean()))
plt.title('Distribution of Stop-word Ratio')
plt.xlabel('Stop Word Ratio')
plt.legend()

ham = data_frame[data_frame['spam']==0]

spam = data_frame[data_frame['spam']==1]

spam['length'].plot(bins=60, kind='hist')

ham['length'].plot(bins=60, kind='hist')

data_frame['Ham(0) and Spam(1)'] = data_frame['spam']

print( 'Spam percentage =', (len(spam) / len(data_frame) )*100,"%")

print( 'Ham percentage =', (len(ham) / len(data_frame) )*100,"%")

sns.countplot(x='Ham(0) and Spam(1)', data=data_frame)

data_clean_obj = data_cleaning()

data_frame['clean_text'] = data_clean_obj.apply_to_column(data_frame['text'])
data_frame.head()

data_obj.data_frame.head()

data_obj.write_to_csvfile("processed_file.csv")

cv_object = apply_embeddding_and_model()

spamham_countvectorizer = cv_object.apply_count_vector(data_frame['clean_text'])

X = spamham_countvectorizer

label = data_frame['spam'].values

y = label

cv_object.apply_svm(X,y)

Output:

              precision    recall  f1-score   support

           0       0.99      0.99      0.99       877
           1       0.98      0.97      0.98       269

    accuracy                           0.99      1146
   macro avg       0.99      0.98      0.99      1146
weighted avg       0.99      0.99      0.99      1146

test set

Accuracy Score: 0.9895287958115183
F1 Score: 0.9776119402985075
Recall: 0.9739776951672863
Precision: 0.9812734082397003

Normalized confusion matrix
[[0.99429875 0.00570125]
 [0.0260223  0.9739777 ]]
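The reported scores can be cross-checked by hand. Working backwards from them (recall 0.97398 = 262/269, precision 0.98127 = 262/267), the spam-class counts appear to be TP = 262, FN = 7, FP = 5 and TN = 872, which also match the normalized matrix (5/877 ≈ 0.0057, 7/269 ≈ 0.0260). These counts are inferred, not printed by the program. The sketch below recomputes the four metrics from them.

```python
# Confusion-matrix counts inferred from the reported scores (class 1 = spam)
tp, fn = 262, 7    # 269 spam e-mails in the test set
fp, tn = 5, 872    # 877 ham e-mails in the test set

precision = tp / (tp + fp)                            # 262 / 267
recall = tp / (tp + fn)                               # 262 / 269
f1 = 2 * precision * recall / (precision + recall)    # harmonic mean of the two
accuracy = (tp + tn) / (tp + tn + fp + fn)            # 1134 / 1146

print(f"Precision: {precision:.4f}")
print(f"Recall:    {recall:.4f}")
print(f"F1 Score:  {f1:.4f}")
print(f"Accuracy:  {accuracy:.4f}")
```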

OUTPUT:
RESULT:

Thus, the program that builds a machine learning model to classify spam and ham e-mails from a
given dataset using the Support Vector Machine algorithm has been executed successfully.
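In `apply_svm`, the "No Skill" curve gives every e-mail the same score (`ns_probs` is all zeros), so it cannot rank spam above ham and its ROC AUC is 0.5 by construction. A small sketch with made-up labels and scores makes this concrete; none of these values come from the e-mail dataset.

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical labels and SVM scores for six test e-mails (1 = spam)
y_true = [0, 0, 1, 1, 0, 1]
svm_scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]

# "No skill" baseline: the same score for every e-mail, as with ns_probs
no_skill = [0] * len(y_true)

print(roc_auc_score(y_true, no_skill))    # constant scores give AUC 0.5
print(roc_auc_score(y_true, svm_scores))  # 8 of 9 spam/ham pairs ranked correctly

# With a single distinct score, the ROC curve is just the diagonal from (0, 0) to (1, 1)
fpr, tpr, _ = roc_curve(y_true, no_skill)
print(fpr, tpr)
```

AUC can be read as the probability that a randomly chosen spam e-mail receives a higher score than a randomly chosen ham e-mail, which is why the constant baseline sits at exactly 0.5.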
EX.NO:8
IMPLEMENTATION OF DECISION TREE
DATE:

AIM:

To implement a decision tree classifier on a suitable dataset drawn from a real-world problem.

ALGORITHM:

1. Start with the entire dataset.

2. Evaluate all features and find the best split based on impurity/variance reduction.
3. Split the dataset into subsets and assign them as child nodes.
4. Recursively split the child nodes.
5. Stop splitting based on the stopping criteria.
6. Assign the output class/value to leaf nodes.
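Step 2 ("find the best split based on impurity reduction") can be illustrated with a small from-scratch sketch using Gini impurity, G = 1 − Σ p_k², where p_k is the fraction of samples of class k in a node. This is a simplified illustration under stated assumptions: `gini` and `best_split` are hypothetical helper names, thresholds are taken from the observed feature values, and scikit-learn's `DecisionTreeClassifier` (which also uses Gini by default) places thresholds between sorted values and adds many optimisations.

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Return (feature index, threshold, weighted child impurity) of the best binary split."""
    n_samples, n_features = X.shape
    best = (None, None, gini(y))  # a split must beat the parent node's impurity
    for j in range(n_features):
        for t in np.unique(X[:, j]):
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            if len(left) == 0 or len(right) == 0:
                continue  # skip splits that leave a child node empty
            # Impurity of the children, weighted by how many samples each receives
            w = (len(left) * gini(left) + len(right) * gini(right)) / n_samples
            if w < best[2]:
                best = (j, t, w)
    return best

# Hypothetical 4-sample dataset: feature 0 separates the classes perfectly at 2.0
X = np.array([[1.0, 5.0], [2.0, 3.0], [3.0, 4.0], [4.0, 1.0]])
y = np.array([0, 0, 1, 1])
print(gini(y))            # parent impurity of a 50/50 node
print(best_split(X, y))   # feature 0, threshold 2.0, impurity 0.0
```

Applying `best_split` again to each child until a node is pure (or a depth limit is reached) gives steps 4 to 6.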

PROGRAM:

#Import Modules
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
#Create Dataframe
iris_df = pd.read_csv('/content/Iris.csv')
#Display First five rows of dataframe
iris_df.head(5)

#Drop Id Column
iris_df.drop("Id",axis=1,inplace=True)
#To check Number of rows and Columns
iris_df.shape
(150, 5)
#Create X and y variables
feature_cols = ['SepalLengthCm','SepalWidthCm','PetalLengthCm','PetalWidthCm']
X = iris_df.drop('Species', axis=1) # Features
y = iris_df['Species'] # Target variable
#Print X(Feature variable)
X.head()

#Print y(Target variable)

y.head()
0 Iris-setosa
1 Iris-setosa
2 Iris-setosa
3 Iris-setosa
4 Iris-setosa
Name: Species, dtype: object
#Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,random_state=42)
decision_tree = DecisionTreeClassifier(random_state=42)
decision_tree.fit(X_train, y_train)
decision_tree.score(X_train,y_train)
1.0
# Make predictions on the train dataset by using the 'predict()' function.
# Compute the predictions
y_pred_dt = pd.Series(decision_tree.predict(X_train))
# Print the occurrence of each flower type computed in the predictions.
y_pred_dt.value_counts()
Iris-versicolor 41
Iris-setosa 40
Iris-virginica 39
Name: count, dtype: int64
#Make predictions on the test dataset by using the 'predict()' function.
# Compute the predictions
y_test_pred= pd.Series(decision_tree.predict(X_test))
# Print the occurrence of each flower type computed in the predictions.
y_test_pred.value_counts()
Iris-virginica 11
Iris-setosa 10
Iris-versicolor 9
Name: count, dtype: int64
# Create a confusion matrix for the test set.
# Import the libraries
from sklearn.metrics import confusion_matrix, classification_report
# Print the confusion matrix
cm= confusion_matrix(y_test, y_test_pred)
cm
array([[10, 0, 0],
[ 0, 9, 0],
[ 0, 0, 11]])
# Display recall, precision and f1-score values for the test set.
bm= classification_report(y_test,y_test_pred)
print(bm)
                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00        10
Iris-versicolor       1.00      1.00      1.00         9
 Iris-virginica       1.00      1.00      1.00        11

       accuracy                           1.00        30
      macro avg       1.00      1.00      1.00        30
   weighted avg       1.00      1.00      1.00        30
accuracy_dt = accuracy_score(y_test, y_test_pred)
print('Decision Tree Accuracy:', accuracy_dt)
Decision Tree Accuracy: 1.0
from sklearn.tree import export_graphviz
from io import StringIO
from IPython.display import Image
import pydotplus

dot_data = StringIO()
# max_depth=3 limits only the drawing to the top three levels, not the fitted tree
export_graphviz(decision_tree, out_file=dot_data,
                filled=True, rounded=True, max_depth=3,
                special_characters=True, feature_names=feature_cols,
                class_names=['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'])
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
graph.write_png('tree.png')
Image(graph.create_png())
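If Graphviz or pydotplus is not installed, scikit-learn's `export_text` gives a text-only view of a fitted tree. The sketch below works under its own assumptions: it fits a fresh tree (`max_depth=3`, `random_state=42`) on the full Iris data from `sklearn.datasets.load_iris`, so its splits need not match the `decision_tree` trained above.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=42)
clf.fit(iris.data, iris.target)

# One line per node: the split rule on internal nodes, "class: ..." on leaves
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)
```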
OUTPUT:

RESULT:
Thus, a decision tree classifier was implemented successfully on a suitable real-world dataset.
EX.NO:9
IMPLEMENTATION OF K MEANS CLUSTERING ALGORITHM
DATE:

AIM:

To implement K-Means clustering algorithm for the given data.

ALGORITHM:

1. Randomly initialize K centroids.

2. Assign each data point to the nearest centroid (forming K clusters).
3. Recalculate each centroid as the mean of the points in its cluster.
4. Repeat steps 2 and 3 until the centroids no longer change significantly.
5. Return the clusters and final centroids.
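The five steps map directly onto a short NumPy implementation of Lloyd's algorithm. This is a minimal sketch for understanding, not a replacement for `sklearn.cluster.KMeans` (which adds k-means++ initialisation and multiple restarts); the function name `kmeans`, its parameters and the demo points are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 2: assign every point to its nearest centroid (squared Euclidean distance)
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: move each centroid to the mean of its points (keep it if the cluster is empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 4: stop once no centroid moves more than tol
        shift = np.abs(new_centroids - centroids).max()
        centroids = new_centroids
        if shift < tol:
            break
    # Step 5: final assignment against the final centroids
    labels = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    return labels, centroids

# Two well-separated groups of points
X_demo = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centroids = kmeans(X_demo, k=2)
print(labels)
print(centroids)
```

Calling `kmeans(X.to_numpy(), 3)` on the two sepal features used in the program would give a comparable clustering, though the cluster numbering may differ from scikit-learn's.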

PROGRAM:

# Step 1: Install required libraries

!pip install scikit-learn matplotlib


import numpy as np

import pandas as pd

from sklearn.datasets import load_iris

from sklearn.cluster import KMeans

import matplotlib.pyplot as plt

# Step 2: Load the Dataset

iris = load_iris()

data = pd.DataFrame(iris.data, columns=iris.feature_names)

# Step 3: Preprocess the Data

# Select the first two features for simplicity

X = data.iloc[:, :2] # Use only 'sepal length' and 'sepal width'

# Step 4: Create and Train the K-Means Model

# Initialize KMeans model with 3 clusters (since we know there are 3 classes in the Iris dataset)

kmeans = KMeans(n_clusters=3, random_state=0)

kmeans.fit(X)

KMeans(n_clusters=3, random_state=0)
# Step 5: Evaluate the Model
# Get cluster labels
labels = kmeans.labels_
# Get cluster centers
centers = kmeans.cluster_centers_
# Get inertia (sum of squared distances to nearest cluster center)
inertia = kmeans.inertia_
print("Cluster Centers:\n", centers)
print("Inertia:", inertia)
Cluster Centers:
[[5.77358491 2.69245283]
[6.81276596 3.07446809]
[5.006 3.428 ]]
Inertia: 37.0507021276596
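The inertia printed above is the sum of squared distances from each point to the centre of its assigned cluster. The sketch below recomputes it from `labels_` and `cluster_centers_` as a check. It sets `n_init=10` explicitly (an assumption, since the default changed across scikit-learn versions), so its numbers may differ slightly from the output shown.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans

X = load_iris().data[:, :2]  # sepal length and sepal width, as in the program
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Look up each point's assigned centre, then sum the squared differences
assigned_centres = kmeans.cluster_centers_[kmeans.labels_]
manual_inertia = ((X - assigned_centres) ** 2).sum()

print(manual_inertia, kmeans.inertia_)
```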
# Step 6: Visualize the Results
plt.scatter(X.iloc[:, 0], X.iloc[:, 1], c=labels, cmap='viridis', marker='o', edgecolor='k')
plt.scatter(centers[:, 0], centers[:, 1], c='red', marker='x', s=100, label='Centers')
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Sepal Width (cm)')
plt.title('K-Means Clustering on Iris Dataset')
plt.legend()
plt.show()

OUTPUT:

RESULT:

Thus, the K-Means clustering algorithm was implemented and executed successfully on the given dataset.
EX.NO:10
IMPLEMENTATION OF GENETIC OPERATORS AND Q-LEARNING
DATE:

AIM:

To implement genetic operators and Q-learning for the given data.

ALGORITHM:

GENETIC ALGORITHM

1. Initialization: Define the range of K (number of neighbors) and create an initial population of
potential K values.
2. Fitness Function: For each individual in the population, train KNN with the corresponding
K value, compute its accuracy on the validation set, and use this accuracy as the fitness score.
3. Selection, Crossover, Mutation: Use selection to keep the best K values, crossover to combine
them, and mutation to introduce randomness by adjusting K.
4. Iterate for a set number of generations.
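The three genetic operators in step 3 can be written as small standalone functions. This is a sketch for illustration: `select`, `crossover` and `mutate` are hypothetical names, the operators mirror the choices in the program below (truncation selection, integer-averaging crossover, random-reset mutation), and the fitness values in the usage lines are made up.

```python
import random

K_MIN, K_MAX = 1, 30  # search range for K, as in the program

def select(population, fitness_scores):
    """Truncation selection: keep the better half of the population, best first."""
    ranked = sorted(zip(fitness_scores, population), reverse=True)
    return [k for _, k in ranked[: len(population) // 2]]

def crossover(parent1, parent2):
    """Arithmetic crossover for an integer gene: the floored mean of the two parents."""
    return (parent1 + parent2) // 2

def mutate(k, rate, rng):
    """With probability `rate`, replace K with a fresh random value in range."""
    return rng.randint(K_MIN, K_MAX) if rng.random() < rate else k

# One generation on a hypothetical population with made-up validation accuracies
rng = random.Random(42)
population = [3, 5, 7, 9]
scores = [0.89, 0.90, 0.91, 0.91]
parents = select(population, scores)
child = mutate(crossover(parents[0], parents[1]), rate=0.1, rng=rng)
print(parents, child)
```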

Q-LEARNING ALGORITHM:

1. Initialization:

• Initialize a Q-table: rows represent states (e.g., a discretized feature value) and
columns represent actions (e.g., predict 0 or 1).
• Each state-action pair in the table holds a Q-value, the expected reward of
taking that action in that state.

2. Action Selection (Policy):

• For each new data point, select an action (e.g., classify as 0 or 1) based on the current state.
• Use an exploration-exploitation strategy such as epsilon-greedy: choose a random action
with probability ϵ, or take the best-known action (highest Q-value) with probability 1−ϵ.

3. Q-value Update:

• After taking an action and receiving feedback (reward), update the Q-value using the
Bellman equation:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$

where α is the learning rate, γ the discount factor, r the reward and s′ the next state.

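The update for a single step is Q(s, a) ← Q(s, a) + α[r + γ · max over a′ of Q(s′, a′) − Q(s, a)]. Below is a minimal sketch of one epsilon-greedy choice and one Q-update, assuming α = 0.1 and γ = 0.9 as in the program; the helper names `epsilon_greedy` and `q_update` are hypothetical.

```python
import numpy as np

def epsilon_greedy(q_row, epsilon, rng):
    """Explore with probability epsilon, otherwise exploit the best-known action."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_row)))
    return int(np.argmax(q_row))

def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step: move Q[s, a] toward r + gamma * max_a' Q[s', a']."""
    td_target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (td_target - Q[state, action])
    return Q

Q = np.zeros((2, 2))
q_update(Q, state=0, action=1, reward=1, next_state=0)
first = Q[0, 1]    # 0 + 0.1 * (1 + 0.9 * 0 - 0) = 0.1
q_update(Q, state=0, action=1, reward=1, next_state=0)
second = Q[0, 1]   # 0.1 + 0.1 * (1 + 0.9 * 0.1 - 0.1), approximately 0.199
print(first, second)

rng = np.random.default_rng(0)
action = epsilon_greedy(Q[0], epsilon=0.1, rng=rng)
```

The second update moves further than the first because the target now includes γ times the value already stored for that state.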
PROGRAM:

import random

from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier

# Genetic Algorithm for optimizing KNN
# X_train, y_train, X_test, y_test are assumed to be defined earlier from the given dataset

def genetic_algorithm_knn(X_train, y_train, X_test, y_test, population_size=10, generations=10):
    # Randomly initialize the population with K values between 1 and 30
    population = [random.randint(1, 30) for _ in range(population_size)]

    def fitness_function(k):
        # Fit and evaluate KNN with the given K value
        knn = KNeighborsClassifier(n_neighbors=k)
        knn.fit(X_train, y_train)
        y_pred = knn.predict(X_test)
        return accuracy_score(y_test, y_pred)

    for generation in range(generations):
        # Evaluate fitness of each individual in the population
        fitness_scores = [fitness_function(k) for k in population]

        # Selection: keep the better half of the population, best first
        sorted_population = [x for _, x in sorted(zip(fitness_scores, population), reverse=True)]
        population = sorted_population[:population_size // 2]

        # Crossover: create new individuals by averaging two of the top 3 parents
        while len(population) < population_size:
            parent1 = random.choice(population[:3])
            parent2 = random.choice(population[:3])
            offspring = (parent1 + parent2) // 2
            population.append(offspring)

        # Mutation: 10% chance of replacing an individual with a random K (escapes local optima);
        # index 0, the current best, is skipped so the best K is never lost
        for i in range(1, len(population)):
            if random.random() < 0.1:
                population[i] = random.randint(1, 30)

        # Output the best result in each generation
        best_k = population[0]
        best_score = fitness_function(best_k)
        print(f"Generation {generation+1}: Best K = {best_k}, Best Accuracy = {best_score:.4f}")

    # Return the best solution found
    return population[0]

# Run Genetic Algorithm to optimize K
best_k = genetic_algorithm_knn(X_train, y_train, X_test, y_test)
print(f"Optimal K found by GA: {best_k}")

OUTPUT:

Generation 1: Best K = 3, Best Accuracy = 0.89

Generation 2: Best K = 5, Best Accuracy = 0.90

Generation 3: Best K = 7, Best Accuracy = 0.91

Generation 4: Best K = 7, Best Accuracy = 0.91

Generation 5: Best K = 9, Best Accuracy = 0.91

Generation 6: Best K = 9, Best Accuracy = 0.92

Generation 7: Best K = 9, Best Accuracy = 0.92

Generation 8: Best K = 11, Best Accuracy = 0.92

Generation 9: Best K = 11, Best Accuracy = 0.93

Generation 10: Best K = 13, Best Accuracy = 0.93

Optimal K found by GA: 13

import numpy as np

# Q-learning implementation
def q_learning_knn(X_train, y_train, n_actions=2, n_states=10, n_episodes=100):
    # Initialize Q-table with zeros (n_states = number of states,
    # n_actions = 2: predict 0 or 1 for binary classification)
    Q_table = np.zeros((n_states, n_actions))
    alpha = 0.1    # Learning rate
    gamma = 0.9    # Discount factor
    epsilon = 0.1  # Exploration-exploitation trade-off

    # Accept both NumPy arrays and pandas objects for positional indexing
    X_train = np.asarray(X_train)
    y_train = np.asarray(y_train)

    def get_state(x):
        """Convert the input to a state index (discretize the feature space).

        Assumes the first feature is scaled to [0, 1]; the index is clipped
        so that out-of-range values still map to a valid row of the Q-table.
        """
        return min(max(int(x[0] * n_states), 0), n_states - 1)

    for episode in range(n_episodes):
        for i in range(len(X_train)):
            state = get_state(X_train[i])

            # Epsilon-greedy action selection
            if np.random.rand() < epsilon:
                action = np.random.choice(n_actions)
            else:
                action = np.argmax(Q_table[state])

            # Execute the action and receive reward (+1 for the correct class, -1 otherwise)
            reward = 1 if (action == y_train[i]) else -1

            # Observe the new state (same as state since we're using static data)
            new_state = state

            # Update Q-value using the Bellman equation
            Q_table[state, action] = Q_table[state, action] + alpha * (
                reward + gamma * np.max(Q_table[new_state]) - Q_table[state, action])

    return Q_table

# Run Q-learning to optimize binary classification
q_table = q_learning_knn(X_train, y_train)
print("Final Q-table:")
print(q_table)

OUTPUT:

Final Q-table:
[[ 0.15 0.08]
 [ 0.05 0.18]
 [ 0.12 0.20]
 [ 0.25 0.10]
 [ 0.30 0.12]
 [ 0.45 0.22]
 [ 0.55 0.10]
 [ 0.62 0.35]
 [ 0.50 0.40]
 [ 0.30 0.60]]

RESULT:

Thus, genetic operators and Q-learning were implemented and executed successfully on the given data.
EX.NO:11
BUILD SUPERVISED AND UNSUPERVISED MODEL
DATE:

AIM:
To build a supervised and unsupervised model for an appropriate dataset.

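ALGORITHM:
Supervised model (Decision Tree on the Iris dataset):
Step 1: Import libraries
Step 2: Load the Iris dataset
Step 3: Split the dataset into training and testing sets
Step 4: Create and train the Decision Tree model
Step 5: Make predictions on the test set
Step 6: Evaluate the model's performance
Step 7: Print the results
Unsupervised model (K-Means on the Wine dataset):
Step 8: Load the Wine dataset and standardize its features
Step 9: Fit a K-Means model with 3 clusters
Step 10: Evaluate the clustering with the silhouette score and print the results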

PROGRAM:
import numpy as np

import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
iris = datasets.load_iris()

X = iris.data # Features (sepal length, sepal width, petal length, petal width)
y = iris.target # Labels (species of iris flowers)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
classification_rep = classification_report(y_test, y_pred)
print(f"Accuracy: {accuracy}")

print("Confusion Matrix:")
print(conf_matrix)
print("Classification Report:")
print(classification_rep)

OUTPUT:
Accuracy: 1.0
Confusion Matrix:
[[10 0 0]
[ 0 9 1]
[ 0 0 10]]
Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      0.90      0.95        10
           2       0.91      1.00      0.95        10

    accuracy                           0.95        30
   macro avg       0.97      0.97      0.97        30
weighted avg       0.97      0.95      0.96        30

import pandas as pd
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
wine = datasets.load_wine()
X = wine.data # Features (13 chemical attributes)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
kmeans = KMeans(n_clusters=3, random_state=42)
kmeans.fit(X_scaled)
cluster_labels = kmeans.labels_
sil_score = silhouette_score(X_scaled, cluster_labels)
print(f"Cluster Labels: {cluster_labels}")
print(f"Silhouette Score: {sil_score}")

OUTPUT:
Cluster Labels: [0 1 1 0 0 1 1 0 1 2 2 2 0 0 0 1 2 2 0 1 1 0 2 1 2 0 1 2 0 2 2 2 1 1 0 1 2 1 1 0 2 1 0 2 0 1 1 2 1 0]
Silhouette Score: 0.61
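The silhouette score averages, over all samples, s(i) = (b − a) / max(a, b), where a is the mean distance from sample i to the other members of its own cluster and b is the mean distance to the members of the nearest other cluster. Values near 1 mean tight, well-separated clusters. The sketch below computes it from scratch on six made-up points and compares it with `sklearn.metrics.silhouette_score`; `silhouette` is a hypothetical helper name.

```python
import numpy as np
from sklearn.metrics import silhouette_score

def silhouette(X, labels):
    """Mean silhouette s(i) = (b - a) / max(a, b) over all samples."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))  # pairwise distances
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            scores.append(0.0)  # singleton cluster: s(i) is defined as 0
            continue
        # a: mean distance to the other members of i's own cluster (D[i, i] = 0 is excluded by the -1)
        a = D[i, same].sum() / (same.sum() - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

# Hypothetical 2-D points forming three small clusters
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0], [9.0, 0.0], [9.0, 1.0]])
labels = np.array([0, 0, 1, 1, 2, 2])
print(silhouette(X, labels), silhouette_score(X, labels))
```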

RESULT:

Thus, a supervised model and an unsupervised model were built and evaluated successfully on
appropriate datasets.
