Experiment 4
Aim:
To use the KNN (K-Nearest Neighbours) technique for regression and classification tasks in the
domain of healthcare.
Objective:
▪ To identify datasets where KNN may be suitable.
▪ To determine the optimum K value for both regression and classification.
▪ To build, train and test models using the chosen value of K.
Outcomes:
▪ To understand the working of KNN.
▪ To be able to implement KNN for regression and classification tasks on a given dataset.
▪ To be able to compare models and select the optimal one among them.
Theory:
K-nearest neighbours is a simple, yet powerful machine learning algorithm that is often used
for both classification and regression tasks. It operates on the assumption that similar data
points exist in close proximity to each other in feature space. The KNN algorithm works by
identifying the nearest 'k' data points to a given query point and making predictions based on
these neighbours. The value of 'k' is a key parameter in the algorithm, determining how many
neighbours should influence the prediction. One of the primary advantages of KNN is its
simplicity, as it requires no explicit model training. Instead, it stores the entire dataset and
defers decision-making until a query point is presented. This characteristic makes KNN a lazy
learner, as it does not generalize from the training data beforehand. However, this also makes
KNN computationally expensive during prediction, especially for large datasets, as it requires
scanning the entire dataset to find the nearest neighbours. Additionally, the algorithm is
sensitive to feature scaling: if features are not appropriately scaled, those with larger numeric
ranges dominate the distance computation and hence the predictions.
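To illustrate the effect of feature scaling on the distance computation described above, the following small sketch (using hypothetical two-feature points, not the experiment's data) compares raw and standardized Euclidean distances:
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: two features on very different numeric scales
X = np.array([[1.0, 200.0], [2.0, 180.0], [1.5, 900.0]])
query = np.array([[1.2, 210.0]])

# Without scaling, the second feature dominates the Euclidean distance
raw_dist = np.linalg.norm(X - query, axis=1)

# After standardization, both features contribute comparably
scaler = StandardScaler().fit(X)
scaled_dist = np.linalg.norm(scaler.transform(X) - scaler.transform(query), axis=1)

print(raw_dist)
print(scaled_dist)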
In regression tasks, KNN predicts continuous values based on the values of the 'k' nearest
neighbours. Instead of assigning a class, it computes the average or weighted average of the
neighbours' target values and returns that as the prediction. This method makes KNN highly
intuitive for regression tasks because it assumes that similar data points should have similar
outputs. A lower value of 'k' might result in predictions being overly sensitive to noise, as only
very close neighbours will affect the outcome, leading to higher variance. Conversely, larger
values of 'k' tend to smooth out predictions by including more neighbours, but they might
also lead to bias if the neighbours are too distant or irrelevant. One of the key challenges of
using KNN for regression is deciding the optimal value of 'k', as different datasets may require
different values. Despite these challenges, KNN regression is widely used due to its simplicity
and effectiveness, especially when the dataset is small and noise levels are manageable.
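As an illustration of the averaging idea described above, a minimal NumPy sketch of KNN regression (on hypothetical data, not the experiment's dataset) could look like this:
import numpy as np

def knn_regress(X_train, y_train, x_query, k=3):
    # Distances from the query point to every training point
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k nearest neighbours
    nearest = np.argsort(dists)[:k]
    # Prediction is the average of the neighbours' target values
    return y_train[nearest].mean()

# Hypothetical usage
X_train = np.array([[1.0], [2.0], [3.0], [10.0]])
y_train = np.array([1.1, 1.9, 3.2, 9.8])
print(knn_regress(X_train, y_train, np.array([2.5]), k=2))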
When used for classification, KNN predicts the class of a given query point based on the
majority class of its nearest neighbours. The algorithm checks the labels of the 'k' closest data
points and assigns the query point to the class that occurs most frequently among those
neighbours. If there is a tie, various strategies, such as distance weighting, can be applied to
break it. KNN classification works well for both binary and multi-class problems. Its
non-parametric nature means it makes no assumptions about the underlying distribution of
the data, making it flexible for different types of datasets.
However, its simplicity also introduces drawbacks, such as its sensitivity to outliers and
irrelevant features. Furthermore, as the size of the dataset grows, KNN classification can
become slower, since the algorithm must compute the distance from the query point to every
point in the dataset. Efficient implementation techniques, such as using KD-trees or Ball-trees,
can help alleviate these computational challenges, making KNN a versatile choice for many
classification problems.
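In scikit-learn, these tree-based neighbour searches can be requested through the estimator's algorithm parameter; a brief illustrative sketch is shown below:
from sklearn.neighbors import KNeighborsClassifier

# 'kd_tree' and 'ball_tree' build spatial index structures at fit time,
# speeding up neighbour search compared to a brute-force distance scan;
# 'auto' lets scikit-learn choose based on the data.
knn_kd = KNeighborsClassifier(n_neighbors=5, algorithm='kd_tree')
knn_ball = KNeighborsClassifier(n_neighbors=5, algorithm='ball_tree')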
Dataset Description:
For the classification task, the Breast Cancer Wisconsin dataset was used.
The task was to identify whether a patient's tumour is malignant or benign based on various
physical measurements obtained from testing the patient.
For the regression task, a Hospital Stay dataset was used.
KNN was used to predict the number of days an admitted patient will stay in a particular
hospital based on the severity of the patient's illness, the doctor concerned, the department
concerned, and similar attributes.
Code:
For KNN Classification-
Filling in the missing data in the dataset using the column mean (simple average)
from sklearn.impute import SimpleImputer

# Replace missing values in each feature column with that column's mean
imputer = SimpleImputer(strategy='mean')
X_imputed = imputer.fit_transform(data.drop(columns=['diagnosis']))
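The snippets that follow refer to X_train_scaled, X_test_scaled, y_train and y_test, which are not created in the listing above; a likely preparation step (assumed here, with an 80/20 split and standard scaling) would be:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Assumed split and scaling step; test_size and random_state are illustrative
y = data['diagnosis']
X_train, X_test, y_train, y_test = train_test_split(
    X_imputed, y, test_size=0.2, random_state=42)

# Scale features so that no single measurement dominates the distances
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)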
Training KNN classifier models for k values in a range and plotting the validation error
from sklearn.neighbors import KNeighborsClassifier
import matplotlib.pyplot as plt

errors = []
k_range = range(1, 16)
for k in k_range:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train_scaled, y_train)
    y_pred = knn.predict(X_test_scaled)
    # Error rate = 1 - accuracy on the held-out test set
    error = 1 - knn.score(X_test_scaled, y_test)
    errors.append(error)

plt.figure(figsize=(10, 6))
plt.plot(k_range, errors, marker='o', linestyle='-')
plt.xlabel('Number of Neighbors K')
plt.ylabel('Error Rate')
plt.title('Elbow Method for Optimal K (Error vs K)')
plt.xticks(k_range)
plt.grid(True)
plt.show()
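The optimal k can also be read off programmatically from the same error list; the short addition below is an assumed convenience, not part of the original listing:
import numpy as np

# k value with the lowest validation error in the range tried
optimal_k = k_range[int(np.argmin(errors))]
print(f'Lowest error {min(errors):.3f} at k = {optimal_k}')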
Using the optimal k value found above to finalize model training
from sklearn.metrics import confusion_matrix, classification_report

optimal_k = 9
knn = KNeighborsClassifier(n_neighbors=optimal_k)
knn.fit(X_train_scaled, y_train)
y_pred = knn.predict(X_test_scaled)
print(f'Optimal K: {optimal_k}')
print('Confusion Matrix:')
print(confusion_matrix(y_test, y_pred))
print('Classification Report:')
print(classification_report(y_test, y_pred))
Replacing the values of the 'Age' column with the midpoint of the range mentioned
def range_to_midpoint(age_range):
    # Convert a range string such as '21-30' to its midpoint
    start, end = age_range.split('-')
    return (int(start) + int(end)) / 2

df['Age'] = df['Age'].apply(range_to_midpoint)
Using one-hot encoding for all the categorical variables in the dataset
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

categorical_columns = ['Department', 'gender', 'Type of Admission',
                       'Severity of Illness', 'Insurance', 'Ward_Facility_Code',
                       'doctor_name', 'health_conditions']
numeric_columns = ['Available Extra Rooms in Hospital', 'staff_available', 'Age',
                   'Visitors with Patient', 'Admission_Deposit', 'Stay (in days)']

# One-hot encode the categorical columns and pass the numeric columns through
categorical_transformer = OneHotEncoder(sparse=False)
preprocessor = ColumnTransformer(
    transformers=[
        ('cat', categorical_transformer, categorical_columns)
    ],
    remainder='passthrough'
)

df = preprocessor.fit_transform(df)
df = pd.DataFrame(df, columns=(
    list(preprocessor.named_transformers_['cat'].get_feature_names_out(categorical_columns))
    + numeric_columns
))
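As with the classifier, the regression snippets below use X_train_scaled and related variables that are not created in the listing; an assumed preparation step, separating the 'Stay (in days)' target from the encoded features, could be:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Assumed target/feature split; test_size and random_state are illustrative
y = df['Stay (in days)']
X = df.drop(columns=['Stay (in days)'])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)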
Training KNN regressor models for k values in a range and plotting the R² score of each
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import r2_score
import matplotlib.pyplot as plt

neighbors = range(1, 16)
r2_scores = []
for k in neighbors:
    knn = KNeighborsRegressor(n_neighbors=k)
    knn.fit(X_train_scaled, y_train)
    y_pred = knn.predict(X_test_scaled)
    # R² on the held-out test set for this value of k
    r2 = r2_score(y_test, y_pred)
    r2_scores.append(r2)

plt.figure(figsize=(10, 6))
plt.plot(neighbors, r2_scores, marker='o', linestyle='-', color='b')
plt.title('R² Score vs. Number of Neighbors')
plt.xlabel('Number of Neighbors (k)')
plt.ylabel('R² Score')
plt.xticks(neighbors)
plt.grid(True)
plt.show()
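Analogously, the k with the highest R² could be located directly from the recorded scores; this is an assumed convenience step, not part of the original listing:
import numpy as np

# k value giving the highest R² on the test split
best_k = neighbors[int(np.argmax(r2_scores))]
print(f'Best R² {max(r2_scores):.4f} at k = {best_k}')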
Selecting the optimal k value for final training and printing the performance metrics of the final model
from sklearn.metrics import mean_squared_error

knn = KNeighborsRegressor(n_neighbors=9)
knn.fit(X_train_scaled, y_train)
y_pred = knn.predict(X_test_scaled)
mse = mean_squared_error(y_test, y_pred)
print(f'Mean Squared Error: {mse}')
r2 = r2_score(y_test, y_pred)
print(f'R² Score: {r2}')
Output:
For KNN Classifier-
Optimal K: 9
Confusion Matrix:
[[69  2]
 [ 2 41]]
Classification Report:
(per-class precision, recall, F1-score and support table as printed by classification_report)
Conclusion:
By performing this experiment, I was able to understand the concept of KNN and how it may
be used for regression and classification tasks. The following inferences can be drawn from
the experiment conducted:
▪ For KNN classification, plotting the validation error against different k values showed the
optimal k to be 9, with an error of 0.035; the accuracy of the optimal model was therefore
around 96%.
▪ For KNN regression, a similar plot (with R² on the y-axis) revealed the optimal k value as 9
(after which no further improvement was noted). The R² value of the final model was 0.8339,
while the MSE was 12.81.