Utkarsh

The document outlines a process for analyzing two datasets using Python in Google Colab, specifically focusing on the Iris dataset and a movie dataset. It details steps for data loading, preprocessing, and applying a K-Nearest Neighbors (KNN) classifier to predict species in the Iris dataset, achieving a test accuracy of 95.65%. The analysis includes splitting the data into training, validation, and test sets, as well as determining the optimal number of neighbors (K) for the KNN model.

Uploaded by

bhaishaab175

29/01/2025, 19:19 29-01-2025 - Colab

from google.colab import drive

drive.mount('/content/gdrive')

Mounted at /content/gdrive

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
from sklearn import datasets

import matplotlib.pyplot as plt

# Drive was mounted at /content/gdrive above, so the path must start there
path = "/content/gdrive/MyDrive/dataset/exp 2/datairis.csv"

df = pd.read_csv(path)
df.head(10)

   Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
0   1            5.1           3.5            1.4           0.2  Iris-setosa
1   2            4.9           3.0            1.4           0.2  Iris-setosa
2   3            4.7           3.2            1.3           0.2  Iris-setosa
3   4            4.6           3.1            1.5           0.2  Iris-setosa
4   5            5.0           3.6            1.4           0.2  Iris-setosa
5   6            5.4           3.9            1.7           0.4  Iris-setosa
6   7            4.6           3.4            1.4           0.3  Iris-setosa
7   8            5.0           3.4            1.5           0.2  Iris-setosa
8   9            4.4           2.9            1.4           0.2  Iris-setosa
9  10            4.9           3.1            1.5           0.1  Iris-setosa


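# Drive was mounted at /content/gdrive above, so the path must start there
path2 = "/content/gdrive/MyDrive/dataset/exp 2/datasetmovies (1).csv"

# Use a separate name so the Iris DataFrame is not overwritten
movies_df = pd.read_csv(path2)
movies_df.head(10)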

   No. of action scene  No.of comedy scene  Class/Label/categories
0                  100                  15                  Action
1                   20                  95                  comedy
2                   90                   5                  Action
3                   10                  85                  Comedy
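
The movie table is loaded but never passed to a classifier. A minimal sketch of how KNN could be applied to it is shown here, using only the four rows printed above. The short column names, the choice of K=3, the query movie (80 action scenes, 10 comedy scenes), and the decision to treat "comedy" and "Comedy" as one class are all assumptions made for illustration, not part of the original notebook.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# The four rows printed in the movie table, typed in by hand.
# Column names are shortened, hypothetical stand-ins for the CSV headers.
movies = pd.DataFrame({
    "action_scenes": [100, 20, 90, 10],
    "comedy_scenes": [15, 95, 5, 85],
    "label": ["Action", "comedy", "Action", "Comedy"],
})

# Assumption: "comedy" and "Comedy" are the same class, so normalize the casing.
movies["label"] = movies["label"].str.capitalize()

X_movies = movies[["action_scenes", "comedy_scenes"]]
y_movies = movies["label"]

# K=3 is an arbitrary illustrative choice; with four rows it must stay below 5.
knn_movies = KNeighborsClassifier(n_neighbors=3)
knn_movies.fit(X_movies, y_movies)

# Hypothetical unseen movie: many action scenes, few comedy scenes.
new_movie = pd.DataFrame({"action_scenes": [80], "comedy_scenes": [10]})
predicted_label = knn_movies.predict(new_movie)[0]
print(predicted_label)  # Action (two of the three nearest rows are Action)
```

Without the casing fix, the classifier would treat "comedy" and "Comedy" as two separate classes.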


# Load the Iris dataset bundled with scikit-learn
# (the CSV copy read from Drive above is not used from here on)
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Split into training (70%), validation (~15%), and testing (~15%).
# The 45 held-out samples cannot be halved evenly, so they split 22/23.
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42, stratify=y_temp)

# Check data shapes


print(f"Train size: {X_train.shape}, Validation size: {X_val.shape}, Test size: {X_test.shape}")

Train size: (105, 4), Validation size: (22, 4), Test size: (23, 4)
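
Because both calls pass stratify, each split should hold the three species in roughly equal numbers. The sketch below repeats the same split and counts the classes per subset as a quick sanity check. It is an added illustration, not a cell from the original notebook.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Same two-stage split as in the notebook.
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, random_state=42, stratify=y_temp
)

# Count how many samples of each species landed in each subset.
for name, labels in [("train", y_train), ("validation", y_val), ("test", y_test)]:
    counts = np.bincount(labels, minlength=3)
    summary = {str(species): int(n) for species, n in zip(iris.target_names, counts)}
    print(f"{name}: {summary}")
```

The training set should come out at 35 samples per species. The validation and test sets cannot be perfectly balanced, since 22 and 23 are not divisible by 3.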

# Standardize features; fit the scaler on the training set only
# so that no validation or test statistics leak into training
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_val = scaler.transform(X_val)
X_test = scaler.transform(X_test)

best_k = 1
best_accuracy = 0

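# Try K = 1..20 and keep the K with the highest validation accuracy
for k in range(1, 21):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    val_preds = knn.predict(X_val)
    val_accuracy = accuracy_score(y_val, val_preds)

    # Strict ">" means the smallest K wins when several K values tie
    if val_accuracy > best_accuracy:
        best_accuracy = val_accuracy
        best_k = k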

print(f"Best K found: {best_k} with validation accuracy: {best_accuracy:.4f}")

Best K found: 1 with validation accuracy: 0.9091
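
With only 22 validation samples, each misclassification moves the accuracy by about 4.5 points, so several values of K are likely to tie and the winner can depend on the split. One possible alternative, sketched here as an assumption and not as the notebook's method, is to pick K by 5-fold cross-validation on the training set. The pipeline refits the scaler inside each fold, and the names cv_scores and best_k_cv are introduced only for this example.

```python
from sklearn import datasets
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Same 70% training portion as in the notebook.
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Mean 5-fold cross-validated accuracy for each candidate K.
cv_scores = {}
for k in range(1, 21):
    # The scaler lives inside the pipeline, so it is refit on each training fold.
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    cv_scores[k] = cross_val_score(model, X_train, y_train, cv=5).mean()

# max() returns the first maximum, so the smallest K wins ties here as well.
best_k_cv = max(cv_scores, key=cv_scores.get)
print(f"Best K by 5-fold CV: {best_k_cv} with mean accuracy: {cv_scores[best_k_cv]:.4f}")
```

The K chosen this way may differ from the K found on the single validation split.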

final_knn = KNeighborsClassifier(n_neighbors=best_k)
final_knn.fit(X_train, y_train)
test_preds = final_knn.predict(X_test)
test_accuracy = accuracy_score(y_test, test_preds)

print(f"Test accuracy using best K ({best_k}): {test_accuracy:.4f}")

Test accuracy using best K (1): 0.9565
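
A single accuracy figure does not show which species are confused with each other. The sketch below reruns the same pipeline with K=1 (the value reported above) and prints a confusion matrix and per-class report for the 23 test samples. It is an added illustration, and the variable names cm and report are assumptions.

```python
from sklearn import datasets
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Same two-stage split and scaling as in the notebook.
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, random_state=42, stratify=y_temp
)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# K=1 is taken from the "Best K found" output above.
final_knn = KNeighborsClassifier(n_neighbors=1)
final_knn.fit(X_train, y_train)
test_preds = final_knn.predict(X_test)

# Rows are true species, columns are predicted species.
cm = confusion_matrix(y_test, test_preds)
report = classification_report(y_test, test_preds, target_names=iris.target_names)
print(cm)
print(report)
```

Off-diagonal entries in the matrix identify which pairs of species account for the test errors.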
