
EX.NO: 3
DATE:

CLASSIFICATION WITH NEAREST NEIGHBORS

IN THIS QUESTION YOU WILL USE SCIKIT-LEARN'S KNN CLASSIFIER TO CLASSIFY REAL VS FAKE NEWS HEADLINES. THE AIM OF THIS QUESTION IS FOR YOU TO READ THE SCIKIT-LEARN API AND GET COMFORTABLE WITH TRAINING/VALIDATION SPLITS. USE CALIFORNIA HOUSING DATASET.

AIM:

To implement a program for classification with Nearest Neighbors using Scikit-learn's KNN classifier, and to evaluate the model's performance with training/validation splits and metrics.

ALGORITHM:

1. Start the program.
2. Import the necessary libraries and dataset.
3. Preprocess the dataset if needed.
4. Split the dataset into training and testing sets.
5. Train the K-Nearest Neighbors (KNN) model with different values of K.
6. Plot the accuracy scores for different values of K.
7. Choose the best K and evaluate the model (see the sketch after this list).
8. Print accuracy, confusion matrix, and classification report.
9. Plot the confusion matrix.
10. End the program.
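
Step 7 need not be done by eye: the best K can be read off the recorded validation accuracies. A minimal sketch, assuming the neighbors array and test_accuracy list built in the program below (the program itself hardcodes best_k = 5; this simply automates that choice):

# Pick the K with the highest validation accuracy
best_k = neighbors[np.argmax(test_accuracy)]
print(f"Best K by validation accuracy: {best_k}")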

PROGRAM:

# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

# Load Dataset
wine = load_wine()
X = wine.data
y = wine.target

# Split Data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Try different K values
neighbors = np.arange(1, 10)
train_accuracy = []
test_accuracy = []

for k in neighbors:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    train_accuracy.append(knn.score(X_train, y_train))
    test_accuracy.append(knn.score(X_test, y_test))

# Plot K vs Accuracy
plt.figure(figsize=(8, 5))
plt.plot(neighbors, train_accuracy, label="Train Accuracy", marker='o')
plt.plot(neighbors, test_accuracy, label="Test Accuracy", marker='s')
plt.xlabel("Number of Neighbors (K)")
plt.ylabel("Accuracy")
plt.title("KNN Accuracy for Different K Values")
plt.legend()
plt.grid(True)
plt.show()

# Final Model with Best K
best_k = 5
knn = KNeighborsClassifier(n_neighbors=best_k)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

# Evaluation
acc = accuracy_score(y_test, y_pred)
print(f"\nFinal Accuracy (K={best_k}): {acc:.4f}")

cm = confusion_matrix(y_test, y_pred)
print("\nConfusion Matrix:")
print(cm)

print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=wine.target_names))

# Plot Confusion Matrix
plt.figure(figsize=(6, 4))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=wine.target_names, yticklabels=wine.target_names)
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("Confusion Matrix")
plt.show()
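
Since KNN is distance-based, features with large numeric ranges can dominate the neighbor search, and the wine features span very different scales. A minimal variation, assuming the same split as above (StandardScaler and make_pipeline are standard scikit-learn utilities, not part of the original program):

from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Standardize features before KNN so no single feature dominates the distances
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=best_k))
scaled_knn.fit(X_train, y_train)
print(f"Scaled KNN accuracy: {scaled_knn.score(X_test, y_test):.4f}")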

SAMPLE INPUT OUTPUT:

RESULT:

Thus, the program for classification with Nearest Neighbors using Scikit-learn's KNN classifier, evaluating the model's performance with training/validation splits and metrics, was executed successfully.
EX NO: 4
DATE:

IN THIS EXERCISE, YOU'LL EXPERIMENT WITH VALIDATION SETS AND TEST SETS USING THE DATASET. SPLIT A TRAINING SET INTO A SMALLER TRAINING SET AND A VALIDATION SET. ANALYZE DELTAS BETWEEN TRAINING SET AND VALIDATION SET RESULTS. TEST THE TRAINED MODEL WITH A TEST SET TO DETERMINE WHETHER YOUR TRAINED MODEL IS OVERFITTING. DETECT AND FIX A COMMON TRAINING PROBLEM.

AIM:

To analyze the difference in performance between training and validation sets to determine if
the model is overfitting, and to visualize the results to detect and address this issue.

ALGORITHM:

1. Start the program.
2. Import necessary libraries and generate or load a classification dataset.
3. Split the dataset into training and testing sets.
4. Train a Decision Tree classifier with different depths.
5. Evaluate model accuracy on training and testing sets for each depth.
6. Record and compare the results.
7. Plot accuracy vs model complexity (tree depth).
8. Identify the presence of overfitting (see the sketch after this list).
9. End the program.
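
The deltas mentioned in the exercise can be computed directly from the two score lists. A minimal sketch, assuming the train_scores, test_scores, and depth_range produced by the program below (the 0.05 gap threshold is illustrative, not prescribed):

# Flag depths where the train/test gap suggests overfitting
deltas = [tr - te for tr, te in zip(train_scores, test_scores)]
for depth, delta in zip(depth_range, deltas):
    if delta > 0.05:  # illustrative threshold
        print(f"Depth={depth}: train/test gap {delta:.3f} suggests overfitting")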

PROGRAM:

# Step 1: Import Libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Step 2: Generate synthetic classification dataset
X, y = make_classification(n_samples=10000, n_features=20,
                           n_informative=5, n_redundant=15, random_state=1)

# Step 3: Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Step 4: Train and test models with different max_depth values
train_scores = []
test_scores = []
depth_range = range(1, 21)

for depth in depth_range:
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    model.fit(X_train, y_train)
    train_yhat = model.predict(X_train)
    test_yhat = model.predict(X_test)
    train_acc = accuracy_score(y_train, train_yhat)
    test_acc = accuracy_score(y_test, test_yhat)
    train_scores.append(train_acc)
    test_scores.append(test_acc)
    print(f"Depth={depth}, Train Acc={train_acc:.3f}, Test Acc={test_acc:.3f}")

# Step 5: Plot Training vs Testing Accuracy
plt.figure(figsize=(10, 6))
plt.plot(depth_range, train_scores, '-o', label='Training Accuracy')
plt.plot(depth_range, test_scores, '-o', label='Testing Accuracy')
plt.xlabel('Tree Depth')
plt.ylabel('Accuracy')
plt.title('Overfitting Detection: Training vs Testing Accuracy')
plt.legend()
plt.grid(True)
plt.show()
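
The common fix for the overfitting detected above is to cap the tree's complexity. A minimal sketch, assuming the lists built by the program above: retrain at the depth that maximized test accuracy, a simple form of pre-pruning.

# Retrain with the depth that generalized best
best_depth = depth_range[test_scores.index(max(test_scores))]
fixed_model = DecisionTreeClassifier(max_depth=best_depth, random_state=0)
fixed_model.fit(X_train, y_train)
print(f"Pruned tree (max_depth={best_depth}) test accuracy: "
      f"{accuracy_score(y_test, fixed_model.predict(X_test)):.3f}")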

SAMPLE INPUT OUTPUT:

RESULT:

Thus, the program to analyze the difference in performance between training and validation sets, determine whether the model is overfitting, and visualize the results to detect and address the issue was executed successfully.
EX NO: 5
DATE:

Implement the K-Means algorithm using the given dataset.

Aim:

To implement the K-Means Clustering Algorithm on the given biological dataset and group
the data points (species) based on their codon usage frequencies.

Algorithm: K-Means Clustering

1. Start.
2. Import the required libraries (pandas, sklearn, matplotlib, etc.).
3. Load the dataset containing codon usage frequencies and other features.
4. Select the relevant numerical features (e.g., UUU, UUC, UUA, UUG).
5. Normalize the features using StandardScaler for better clustering results.
6. Choose the number of clusters (e.g., k = 3).
7. Apply the K-Means clustering algorithm (see the sketch after this list):
   ■ Initialize centroids randomly.
   ■ Assign each point to the nearest centroid.
   ■ Update centroids as the mean of assigned points.
   ■ Repeat these steps until centroids do not change or max iterations are reached.
8. Assign the cluster labels to the dataset.
9. Display the final clustered data with the cluster number.
10. Optionally, visualize clusters using scatter plots.
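
Step 7's inner loop can be written out directly to show what K-Means does internally. A minimal NumPy sketch for illustration only; the program below uses scikit-learn's KMeans, which performs these steps for you:

import numpy as np

def kmeans_once(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize centroids by picking k distinct random points
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to the nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update each centroid as the mean of its assigned points
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop when centroids no longer change
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids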

PROGRAM:

# Step 1: Import required libraries
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
import seaborn as sns

# Step 2: Create sample data
data = {
    'Kingdom': ['vrl'] * 5 + ['pri'] * 2,
    'DNAType': [0] * 7,
    'SpeciesID': [100217, 100220, 100755, 100880, 100887, 9601, 9606],
    'Ncodons': [1995, 1474, 4862, 1915, 22831, 1097, 40662582],
    'SpeciesName': [
        'Epizootic haemotopoietic necrosis virus',
        'Bohle iridovirus',
        'Sweet potato leaf curl virus',
        'Northern cereal mosaic virus',
        'Soil-borne cereal mosaic virus',
        'Pongo pygmaeus abelii',
        'Homo sapiens'
    ],
    'UUU': [0.01654, 0.02714, 0.01974, 0.01775, 0.02816, 0.02552, 0.01757],
    'UUC': [0.01203, 0.01357, 0.0218, 0.02245, 0.01371, 0.03555, 0.02028],
    'UUA': [0.0005, 0.00068, 0.01357, 0.01619, 0.00767, 0.00547, 0.00767],
    'UUG': [0.00351, 0.00678, 0.01543, 0.00992, 0.03679, 0.01367, 0.01293]
}
df = pd.DataFrame(data)

# Step 3: Extract codon usage features
features = df[['UUU', 'UUC', 'UUA', 'UUG']]

# Step 4: Normalize codon frequencies
scaler = StandardScaler()
scaled_features = scaler.fit_transform(features)

# Step 5: Apply KMeans clustering
kmeans = KMeans(n_clusters=3, random_state=0)
df['Cluster'] = kmeans.fit_predict(scaled_features)

# Step 6: Display final output
pd.set_option('display.max_columns', None)
print("Final Output:\n")
print(df[['Kingdom', 'DNAType', 'SpeciesID', 'Ncodons', 'SpeciesName',
          'UUU', 'UUC', 'UUA', 'UUG', 'Cluster']])

# Step 7: Optional visualization
plt.figure(figsize=(8, 6))
sns.scatterplot(data=df, x='UUU', y='UUC', hue='Cluster', palette='Set1', s=100)
plt.title("Codon Usage Clustering using K-Means")
plt.xlabel("UUU Frequency")
plt.ylabel("UUC Frequency")
plt.grid(True)
plt.show()
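
Step 6 fixes k = 3 by hand; the elbow method is a common way to sanity-check that choice. A minimal sketch, assuming the scaled_features array from the program above (with only 7 samples here, k is kept small):

# Plot K-Means inertia for several k values and look for the "elbow"
inertias = []
k_values = range(1, 7)
for k in k_values:
    km = KMeans(n_clusters=k, n_init=10, random_state=0)
    inertias.append(km.fit(scaled_features).inertia_)

plt.plot(list(k_values), inertias, marker='o')
plt.xlabel("Number of clusters (k)")
plt.ylabel("Inertia (within-cluster sum of squares)")
plt.title("Elbow Method for Choosing k")
plt.show()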
SAMPLE INPUT OUTPUT:

RESULT:

Thus, the program to implement the K-Means Clustering Algorithm on the given biological dataset and group the data points (species) based on their codon usage frequencies was executed successfully.
