Here's a Visualization of the K-Nearest Neighbors Algorithm

Uploaded by

akif barbaros dikmen

K-Nearest Neighbors
The k-nearest neighbors algorithm, also known as KNN or k-NN, is a non-parametric, supervised learning classifier, which uses proximity
to make classifications or predictions about the grouping of an individual data point.

Here's a visualization of the K-Nearest Neighbors algorithm.

In this case, we have data points of Class A and B. We want to predict what the question mark box (test data point) is. If we consider a k
value of 1 (1 nearest data point), we will obtain a prediction of Class A.

This shows why the value of k matters: a different k can pull in neighbors from another class and change the prediction. In short, the
K-Nearest Neighbors algorithm considers the k nearest neighbors (data points) when it predicts the class of a test point.
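
To make the voting idea concrete, here is a minimal from-scratch sketch in plain Python. The function name knn_predict and the toy points are hypothetical and chosen only for illustration; they are not the data behind the diagram. The sketch uses Euclidean distance and a simple majority vote.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest points and count their labels
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: one Class A point sits very close to the query,
# but the next closest points belong to Class B.
points = [(1.0, 1.0), (2.0, 2.0), (2.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
labels = ["A", "B", "B", "B", "A"]
query = (1.2, 1.0)

print(knn_predict(points, labels, query, k=1))  # nearest single point is Class A
print(knn_predict(points, labels, query, k=3))  # two of the three nearest are Class B
```

With this toy data the prediction flips from Class A at k=1 to Class B at k=3, which is the sensitivity to k described above.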

Importing required packages

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
%matplotlib inline

Let's download the telecom customer dataset and import it using the pandas read_csv() method.

Download Dataset

Understanding the Data

telecomData.csv :
Imagine a telecommunications provider has segmented its customer base by service usage patterns, categorizing the customers into
four groups. If demographic data can be used to predict group membership, the company can customize offers for individual prospective
customers. This is a classification problem: given a dataset with predefined labels, we need to build a model that predicts the
class of a new or unknown case. The example uses demographic data, such as region, age, and marital status, to predict usage
patterns. The target field, called custcat, has four possible values that correspond to the four customer groups: 1 Basic Service,
2 E-Service, 3 Plus Service, and 4 Total Service.

Our objective is to build a classifier that predicts the class of unknown cases. We will use a specific type of classification called
k-nearest neighbors.

Reading the data

df = pd.read_csv("telecomData.csv")

# take a look at the dataset

df.head()

   region  tenure  age  marital  address  income  ed  employ  retire  gender  reside  custcat
0       2      13   44        1        9    64.0   4       5     0.0       0       2        1
1       3      11   33        1        7   136.0   5       5     0.0       0       6        4
2       3      68   52        1       24   116.0   1      29     0.0       1       2        3
3       2      33   33        0       12    33.0   2       0     0.0       1       1        1
4       2      23   30        1        9    30.0   1       2     0.0       0       4        3

Data Exploration
Let's start with descriptive statistics of the data.

df.describe()

            region       tenure          age      marital      address       income           ed       employ       retire       gender       reside
count    1000.0000  1000.000000  1000.000000  1000.000000  1000.000000  1000.000000  1000.000000  1000.000000  1000.000000  1000.000000   1000.00000
mean        2.0220    35.526000    41.684000     0.495000    11.551000    77.535000     2.671000    10.987000     0.047000     0.517000      2.33100
std         0.8162    21.359812    12.558816     0.500225    10.086681   107.044165     1.222397    10.082087     0.211745     0.499961      1.43579
min         1.0000     1.000000    18.000000     0.000000     0.000000     9.000000     1.000000     0.000000     0.000000     0.000000      1.00000
25%         1.0000    17.000000    32.000000     0.000000     3.000000    29.000000     2.000000     3.000000     0.000000     0.000000      1.00000
50%         2.0000    34.000000    40.000000     0.000000     9.000000    47.000000     3.000000     8.000000     0.000000     1.000000      2.00000
75%         3.0000    54.000000    51.000000     1.000000    18.000000    83.000000     4.000000    17.000000     0.000000     1.000000      3.00000
max         3.0000    72.000000    77.000000     1.000000    55.000000  1668.000000     5.000000    47.000000     1.000000     1.000000      8.00000

Data Visualization and Analysis

Let's see how many customers of each class are in our dataset.

df['custcat'].value_counts()

3 281
1 266
4 236
2 217
Name: custcat, dtype: int64

There are 281 Plus Service, 266 Basic Service, 236 Total Service, and 217 E-Service customers.

You can easily explore your data using visualization techniques:

df.hist(column='income', bins=50)

array([[<AxesSubplot:title={'center':'income'}>]], dtype=object)

df.columns

Index(['region', 'tenure', 'age', 'marital', 'address', 'income', 'ed',
       'employ', 'retire', 'gender', 'reside', 'custcat'],
      dtype='object')

df.hist(column='tenure', bins=50)

array([[<AxesSubplot:title={'center':'tenure'}>]], dtype=object)

X = df[['region', 'tenure', 'age', 'marital', 'address', 'income', 'ed', 'employ', 'retire', 'gender', 'reside']]

y = df['custcat'].values

Normalization
Data standardization gives each feature zero mean and unit variance. It is good practice, especially for algorithms such as KNN that
are based on the distance between data points.

X = StandardScaler().fit(X).transform(X.astype(float))

type(X)

numpy.ndarray
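
As a rough illustration of what standardization computes for a single column, the sketch below applies z = (x - mean) / std by hand in plain Python. The helper name standardize is hypothetical, and the input is just the five age values visible in the df.head() output, so the numbers will not match the scaled values for the full 1000-row column. It assumes the population standard deviation (ddof=0), which is the convention StandardScaler uses.

```python
import statistics


def standardize(values):
    """Return z-scores: subtract the mean, then divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population std (ddof=0)
    return [(v - mean) / std for v in values]


# The 'age' values from the five rows shown by df.head() (illustration only)
ages = [44, 33, 52, 33, 30]
z_ages = standardize(ages)

print([round(z, 3) for z in z_ages])
# After standardization the mean is (numerically) 0 and the std is 1
print(round(statistics.fmean(z_ages), 6), round(statistics.pstdev(z_ages), 6))
```

Scaling matters here because the raw features live on very different ranges: in the summary statistics, income runs from 9 to 1668 while retire is only 0 or 1, so without scaling the distance would be dominated by income.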

Creating train and test dataset

A train/test split divides the dataset into training and testing sets that are mutually exclusive. You train the model on the
training set and evaluate it on the testing set. This gives a more realistic estimate of out-of-sample accuracy, because the
testing data was not used to train the model, and therefore a better picture of how well the model generalizes to new data.

We know the outcome of each data point in the testing set, which makes it well suited for evaluation. Since this data has not been
used to train the model, the model has no knowledge of these outcomes, so the evaluation is truly out-of-sample.

Let's split our dataset into train and test sets. Around 80% of the entire dataset will be used for training and 20% for testing.

X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.2, random_state=4)

print ('Train set:', X_train.shape, y_train.shape)
print ('Test set:', X_test.shape, y_test.shape)

Train set: (800, 11) (800,)

Test set: (200, 11) (200,)

Classification
k = 4
#Train Model and Predict
model = KNeighborsClassifier(n_neighbors = k).fit(X_train,y_train)
model

KNeighborsClassifier(n_neighbors=4)

y_pred = model.predict(X_test)

Evaluation
print("Train set Accuracy: ", accuracy_score(y_train, model.predict(X_train)))
print("Test set Accuracy: ", accuracy_score(y_test, y_pred))

Train set Accuracy: 0.45125

Test set Accuracy: 0.345
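
For reference, the accuracy reported above is simply the fraction of predictions that match the true labels. The sketch below shows that calculation in plain Python; the function name accuracy and the demo labels are hypothetical and are not taken from the dataset.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels, for illustration only
y_true_demo = [1, 3, 4, 2, 3, 1, 2, 4]
y_pred_demo = [1, 3, 2, 2, 1, 1, 4, 4]

print(accuracy(y_true_demo, y_pred_demo))  # 5 of 8 correct -> 0.625
```

For context, the largest class (custcat 3) makes up 281 of the 1000 rows, so a model that always predicted that class would score about 0.281 on the full dataset. The proportion in the 200-row test set may differ.
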
Exercise
Try fitting KNN with k=8 on the same dataset.


What about other K?

K in KNN is the number of nearest neighbors to examine, and it must be specified by the user. So how can we choose the right value
for K? The general approach is to reserve part of your data for testing the accuracy of the model. Start with k = 1, fit the model on
the training part, and calculate the prediction accuracy on all samples in the test set. Repeat this process while increasing k, and
see which k works best for your model.

We can calculate the accuracy of KNN for different values of k.

Ks = 20
mean_acc = np.zeros(Ks - 1)
std_acc = np.zeros(Ks - 1)

for n in range(1, Ks):
    # Train the model with n neighbors and predict on the test set
    model = KNeighborsClassifier(n_neighbors=n).fit(X_train, y_train)
    y_pred = model.predict(X_test)
    mean_acc[n - 1] = accuracy_score(y_test, y_pred)
    # Standard error of the accuracy estimate for this k
    std_acc[n - 1] = np.std(y_pred == y_test) / np.sqrt(y_pred.shape[0])

mean_acc

array([0.3  , 0.29 , 0.315, 0.32 , 0.315, 0.31 , 0.335, 0.325, 0.34 ,
       0.33 , 0.315, 0.34 , 0.33 , 0.315, 0.34 , 0.36 , 0.355, 0.35 ,
       0.345])
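
The std_acc values computed alongside mean_acc are standard errors of each accuracy estimate. For a vector of correct/incorrect flags with accuracy p, the population standard deviation is sqrt(p * (1 - p)), so dividing by sqrt(n) gives sqrt(p * (1 - p) / n). The sketch below works this out in plain Python for the highest accuracy in the array (0.36) on the 200-row test set; the function name accuracy_standard_error is hypothetical.

```python
import math


def accuracy_standard_error(accuracy, n_samples):
    """Standard error of an accuracy estimated from n_samples predictions."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_samples)


# Values from the text: accuracy 0.36 measured on a 200-row test set
se = accuracy_standard_error(0.36, 200)

print(round(se, 4))
print(round(0.36 - se, 4), round(0.36 + se, 4))  # the +/- 1 standard error band
```

The standard error comes out at roughly 0.034, which is about half of the whole spread of accuracies in the array (0.29 to 0.36). Differences between neighboring values of k should therefore be read with some caution.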

Plot the model accuracy for a different number of neighbors.

plt.plot(range(1,Ks),mean_acc,'g')
plt.fill_between(range(1,Ks),mean_acc - 1 * std_acc,mean_acc + 1 * std_acc, alpha=0.10)
plt.fill_between(range(1,Ks),mean_acc - 3 * std_acc,mean_acc + 3 * std_acc, alpha=0.10,color="green")
plt.legend(('Accuracy ', '+/- 1xstd','+/- 3xstd'))
plt.ylabel('Accuracy ')
plt.xlabel('Number of Neighbors (K)')
plt.tight_layout()
plt.show()

print( "The best accuracy was with", mean_acc.max(), "with k=", mean_acc.argmax()+1)

The best accuracy was with 0.36 with k= 16

Thank you

Author
Moazzam Ali

© MT Learners 2022. All rights reserved.
