DSASSign 4

The document outlines an assignment on applying the K-Nearest Neighbors (KNN) algorithm for statistical and predictive modeling using the Breast Cancer dataset in Python. It details the steps of dataset selection, data preprocessing, exploratory data analysis, KNN implementation, model training, evaluation, and optimization through hyperparameter tuning. The optimized KNN model achieved improved performance metrics, demonstrating the algorithm's relevance in classification tasks.

Uploaded by

Nasir khan


Assignment #24

Name: Nasir Khan Sayyad

Reg. No: FA19-BCS-212
Teacher: Ma'am Tariq Urooj

Title: Applying K-Nearest Neighbors (KNN) Algorithm for Statistical and Predictive Modeling in R or Python

1. Dataset Selection:
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Set random seed for reproducibility

np.random.seed(42)

# Task 1: Dataset Selection

# Load the Breast Cancer dataset from scikit-learn
data = load_breast_cancer()
X, y = data.data, data.target

2. Data Exploration and Preprocessing:

Code and Steps

# Task 2: Data Preprocessing

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Standardize the feature values

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
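
The fit/transform split above can be illustrated by hand. The following is a minimal sketch using NumPy only; the toy arrays and the names X_train_toy and X_test_toy are hypothetical and are not taken from the Breast Cancer dataset. It shows that the mean and standard deviation are learned from the training rows alone and then reused unchanged on the test rows.

```python
import numpy as np

# Toy data (hypothetical values, not from the Breast Cancer dataset)
X_train_toy = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])
X_test_toy = np.array([[2.0, 500.0]])

# "fit": learn per-feature mean and standard deviation from the training rows only
mean = X_train_toy.mean(axis=0)
std = X_train_toy.std(axis=0)  # population std (ddof=0), as StandardScaler uses

# "transform": apply the same training statistics to both splits
X_train_scaled = (X_train_toy - mean) / std
X_test_scaled = (X_test_toy - mean) / std

print(X_train_scaled.mean(axis=0))  # approximately 0 for each feature
print(X_train_scaled.std(axis=0))   # approximately 1 for each feature
print(X_test_scaled)                # test rows are not re-centred on their own mean
```

Scaling matters for KNN in particular because the algorithm compares raw distances, so a feature measured in hundreds would otherwise dominate one measured in single digits.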

3. Exploratory Data Analysis:

# Task 3: Exploratory Data Analysis

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

# Convert the feature matrix X into a pandas DataFrame

df = pd.DataFrame(X_train, columns=data.feature_names)

# Add the target variable to the DataFrame

df['target'] = y_train

# Calculate and plot the correlation matrix

corr_matrix = df.corr()
plt.figure(figsize=(10, 8))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Matrix')
plt.show()

# Plot histograms of the features

plt.figure(figsize=(12, 10))
for i, feature in enumerate(data.feature_names):
    plt.subplot(5, 6, i+1)
    sns.histplot(df[feature], kde=True)
    plt.title(feature)
plt.tight_layout()
plt.show()

# Plot box plots of the features by target variable

plt.figure(figsize=(12, 10))
for i, feature in enumerate(data.feature_names):
    plt.subplot(5, 6, i+1)
    sns.boxplot(x='target', y=feature, data=df)
    plt.title(feature)
plt.tight_layout()
plt.show()
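
Output: correlation heatmap, feature histograms, and box plots of each feature by target (figures not reproduced here).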

4. KNN Algorithm Implementation:

# Task 4: KNN Algorithm Implementation
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Create a KNN classifier

knn_model = KNeighborsClassifier(n_neighbors=5)
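
To make clear what the classifier does with n_neighbors=5, here is a minimal from-scratch sketch of the KNN decision rule. It is not scikit-learn's implementation: the function knn_predict and the toy arrays are hypothetical, and it assumes Euclidean distance with an unweighted majority vote, which matches the default settings of KNeighborsClassifier but not necessarily its tie-breaking.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=5):
    """Classify one query point by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Most common label among those neighbours
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]

# Toy, hypothetical data: two well-separated clusters
X_toy = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                  [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_toy, y_toy, np.array([0.3, 0.3]), k=3))  # 0
print(knn_predict(X_toy, y_toy, np.array([4.8, 5.0]), k=3))  # 1
```

There is no training step beyond storing the data; all the work happens at prediction time, when the distance from the query to every stored point is computed.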

5. Model Training and Evaluation:

# Task 5: Model Training and Evaluation

# Train the KNN model on the training dataset
knn_model.fit(X_train, y_train)

# Predict on the test set

y_pred = knn_model.predict(X_test)

# Evaluate the model's performance

accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print("\n\n")
print("KNN Model Performance:")
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1-score:", f1)
print("\n\n")

Output: the four printed metrics (screenshot not reproduced here).

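
The four metrics printed in Task 5 can all be derived from the counts of true/false positives and negatives. The sketch below computes them by hand on a small made-up example; the function binary_metrics and the toy label lists are hypothetical. Note that scikit-learn's precision_score, recall_score, and f1_score treat label 1 as the positive class by default, and in load_breast_cancer label 1 is "benign", so the scores above describe the benign class unless pos_label is changed.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall, f1) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f1

# Toy, hypothetical labels: 3 TP, 1 FN, 2 FP, 2 TN
y_true_toy = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred_toy = [1, 1, 1, 0, 0, 0, 1, 1]

acc, prec, rec, f1_toy = binary_metrics(y_true_toy, y_pred_toy)
print(acc, prec, rec, f1_toy)  # 0.625, 0.6, 0.75, and an F1 of about 0.667
```
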
6. Model Optimization and Hyperparameter Tuning:

# Task 6: Model Optimization and Hyperparameter Tuning

from sklearn.model_selection import GridSearchCV

# Define the parameter grid for grid search

param_grid = {'n_neighbors': [3, 5, 7, 9, 11]}

# Perform grid search to find the best K value

grid_search = GridSearchCV(knn_model, param_grid, cv=5)
grid_search.fit(X_train, y_train)

# Get the best K value and retrain the model

best_k = grid_search.best_params_['n_neighbors']
knn_model = KNeighborsClassifier(n_neighbors=best_k)
knn_model.fit(X_train, y_train)

# Predict on the test set with the optimized model

y_pred_optimized = knn_model.predict(X_test)

# Evaluate the optimized model's performance

accuracy_optimized = accuracy_score(y_test, y_pred_optimized)
precision_optimized = precision_score(y_test, y_pred_optimized)
recall_optimized = recall_score(y_test, y_pred_optimized)
f1_optimized = f1_score(y_test, y_pred_optimized)

print("\n\nAfter optimization")

print("Optimized KNN Model Performance:")

print("Accuracy:", accuracy_optimized)
print("Precision:", precision_optimized)
print("Recall:", recall_optimized)
print("F1-score:", f1_optimized)

Output: the optimized model's four printed metrics (screenshot not reproduced here).

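
One possible refinement of the grid search in Task 6, offered as a sketch rather than as the method used in this assignment: wrapping the scaler and the classifier in a scikit-learn Pipeline means the scaler is re-fitted inside each cross-validation fold, so the held-out fold never influences the scaling statistics. The variable names (X_raw, pipe, search, and so on) are chosen for illustration; the 70/30 split and the grid of K values follow the code above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Start from the unscaled data so that scaling happens inside the pipeline
X_raw, y_raw = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_raw, y_raw, test_size=0.3, random_state=42
)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Pipeline parameters are addressed as <step name>__<parameter name>
grid = {"knn__n_neighbors": [3, 5, 7, 9, 11]}

search = GridSearchCV(pipe, grid, cv=5)
search.fit(X_tr, y_tr)

print("Best K:", search.best_params_["knn__n_neighbors"])
print("Mean cross-validated accuracy:", search.best_score_)
print("Test accuracy:", search.score(X_te, y_te))
```

By default GridSearchCV refits the best configuration on the whole training set, so search can be used directly for prediction without retraining a separate model.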
7. Conclusion and Discussion

Summary:
• The KNN algorithm was applied to the Breast Cancer dataset using Python.
• The initial KNN model (K = 5) achieved a reported accuracy, precision, recall, and F1-score of 94.74%.
• Hyperparameter tuning using grid search over K in {3, 5, 7, 9, 11} with 5-fold cross-validation found the optimal value of K to be 3.
• The optimized KNN model achieved a reported accuracy, precision, recall, and F1-score of 95.74%.
• The Breast Cancer dataset contains 569 samples and 30 features, with class labels "malignant" and "benign".
• KNN is relevant and applicable to real-world classification tasks.
• Its strengths are simplicity, versatility, and the ability to handle a variety of datasets.
• Its limitations include the computational cost of prediction, sensitivity to parameter choices, and equal weighting of all features.
• Potential improvements include alternative distance metrics, feature selection, and dimensionality reduction.
• The optimized KNN model showed promising results in classifying breast cancer cases.
• Further research and experimentation could explore other techniques and algorithms for improved performance.
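
Some of the improvements listed in the summary can be tried with a small extension of the grid search. The following is a hedged sketch, not a result from this assignment: it adds PCA for dimensionality reduction, distance-weighted voting, and a choice between Manhattan (p=1) and Euclidean (p=2) distance. The value n_components=10 and all variable names are illustrative assumptions. Distance weighting changes how much each neighbour's vote counts; it does not reweight individual features.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X_all, y_all = load_breast_cancer(return_X_y=True)
X_fit, X_hold, y_fit, y_hold = train_test_split(
    X_all, y_all, test_size=0.3, random_state=42
)

improved = Pipeline([
    ("scaler", StandardScaler()),
    ("pca", PCA(n_components=10)),  # 10 components is an illustrative choice
    ("knn", KNeighborsClassifier()),
])

options = {
    "knn__n_neighbors": [3, 5, 7, 9, 11],
    "knn__weights": ["uniform", "distance"],  # equal votes vs. closer neighbours count more
    "knn__p": [1, 2],                         # Minkowski power: 1 = Manhattan, 2 = Euclidean
}

tuned = GridSearchCV(improved, options, cv=5)
tuned.fit(X_fit, y_fit)

print("Best settings:", tuned.best_params_)
print("Test accuracy:", tuned.score(X_hold, y_hold))
```

Whether any of these options beats the plain K = 3 model has to be checked by running the search; no outcome is assumed here.
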
All Source Code:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Wed Jul 5 18:39:26 2023

@author: nasir
"""

# Task 1: Dataset Selection

from sklearn.datasets import load_breast_cancer

# Load the breast cancer dataset

data = load_breast_cancer()

# Print a brief description of the dataset

print("Breast Cancer Dataset:")
print("Number of samples:", data.data.shape[0])
print("Number of features:", data.data.shape[1])
print("Class labels:", data.target_names)

# Task 2: Data Preprocessing

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Split the data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2,
random_state=42)

# Perform feature scaling

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Task 3: Exploratory Data Analysis

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

# Convert the feature matrix X into a pandas DataFrame

df = pd.DataFrame(X_train, columns=data.feature_names)

# Add the target variable to the DataFrame

df['target'] = y_train

# Calculate and plot the correlation matrix

corr_matrix = df.corr()
plt.figure(figsize=(10, 8))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Matrix')
plt.show()

# Plot histograms of the features

plt.figure(figsize=(12, 10))
for i, feature in enumerate(data.feature_names):
    plt.subplot(5, 6, i+1)
    sns.histplot(df[feature], kde=True)
    plt.title(feature)
plt.tight_layout()
plt.show()

# Plot box plots of the features by target variable

plt.figure(figsize=(12, 10))
for i, feature in enumerate(data.feature_names):
    plt.subplot(5, 6, i+1)
    sns.boxplot(x='target', y=feature, data=df)
    plt.title(feature)
plt.tight_layout()
plt.show()

# Task 4: KNN Algorithm Implementation

from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Create a KNN classifier

knn_model = KNeighborsClassifier(n_neighbors=5)

# Task 5: Model Training and Evaluation

# Train the KNN model on the training dataset
knn_model.fit(X_train, y_train)

# Predict on the test set

y_pred = knn_model.predict(X_test)
# Evaluate the model's performance
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print("\n\n")
print("KNN Model Performance:")
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1-score:", f1)
print("\n\n")

# Task 6: Model Optimization and Hyperparameter Tuning

from sklearn.model_selection import GridSearchCV

# Define the parameter grid for grid search

param_grid = {'n_neighbors': [3, 5, 7, 9, 11]}

# Perform grid search to find the best K value

grid_search = GridSearchCV(knn_model, param_grid, cv=5)
grid_search.fit(X_train, y_train)

# Get the best K value and retrain the model

best_k = grid_search.best_params_['n_neighbors']
knn_model = KNeighborsClassifier(n_neighbors=best_k)
knn_model.fit(X_train, y_train)

# Predict on the test set with the optimized model

y_pred_optimized = knn_model.predict(X_test)

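# Evaluate the optimized model's performance

accuracy_optimized = accuracy_score(y_test, y_pred_optimized)
precision_optimized = precision_score(y_test, y_pred_optimized)
recall_optimized = recall_score(y_test, y_pred_optimized)
f1_optimized = f1_score(y_test, y_pred_optimized)

print("\n\nAfter optimization")

print("Optimized KNN Model Performance:")

print("Accuracy:", accuracy_optimized)
print("Precision:", precision_optimized)
print("Recall:", recall_optimized)
print("F1-score:", f1_optimized)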
