8/6/2018 K Nearest Neighbor : Step by Step Tutorial

K NEAREST NEIGHBOR : STEP BY STEP TUTORIAL

Deepanshu Bhalla

In this article, we will cover how the K-nearest neighbor (KNN) algorithm works and how to run k-nearest neighbor in R. It is one of the most widely used algorithms for classification problems.

K-Nearest Neighbor Simplified

Introduction to K-Nearest Neighbor (KNN)

KNN is a non-parametric supervised learning technique in which we try to classify a data point into a given category with the help of a training set. In simple words, it stores the information of all training cases and classifies new cases based on similarity.

https://www.listendata.com/2017/12/k-nearest-neighbor-step-by-step-tutorial.html

Predictions are made for a new instance (x) by searching through the entire
training set for the K most similar cases (neighbors) and summarizing the
output variable for those K cases. In classification this is the mode (or most
common) class value.

How KNN algorithm works

Suppose we have the height, weight and T-shirt size of some customers, and we need to predict
the T-shirt size of a new customer given only their height and weight. The data, including
height, weight and T-shirt size, are shown below -

Height (in cms)   Weight (in kgs)   T Shirt Size
158               58                M
158               59                M
158               63                M
160               59                M
160               60                M
163               60                M
163               61                M
160               64                L
163               64                L
165               61                L
165               62                L
165               65                L
168               62                L
168               63                L
168               66                L
170               63                L
170               64                L
170               68                L

Step 1 : Calculate Similarity based on distance function

There are many distance functions but Euclidean is the most commonly used measure. It is
mainly used when data is continuous. Manhattan distance is also very common for
continuous variables.

Distance Functions

The idea behind using a distance measure is to quantify how far (how dissimilar) the new sample is
from each training case, and then to find the k closest customers to the new customer in terms of
height and weight.
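For reference, the standard definitions of the two distances named above are sketched below. The symbols x (the new case), y (a training case) and m (the number of predictors) are notation assumed here for illustration.

```latex
d_{\text{Euclidean}}(x, y) = \sqrt{\sum_{i=1}^{m} (x_i - y_i)^2}
\qquad
d_{\text{Manhattan}}(x, y) = \sum_{i=1}^{m} \lvert x_i - y_i \rvert
```

With height and weight as the only predictors, m = 2.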

A new customer named 'Monica' has a height of 161 cm and a weight of 61 kg.

The Euclidean distance between the first observation and the new observation (Monica) is as follows -

=SQRT((161-158)^2+(61-58)^2)

which is approximately 4.24.


Similarly, we calculate the distance between each training case and the new case, and then rank
the training cases by distance. The smallest distance is ranked 1 and that case is considered the
nearest neighbor.
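The distance and ranking step can be sketched in base R as follows. The vectors are typed in from the table above; the variable names (height, weight, size, knn_table) are chosen here for illustration and are not from the original tutorial.

```r
# Training data from the table above
height <- c(158, 158, 158, 160, 160, 163, 163, 160, 163,
            165, 165, 165, 168, 168, 168, 170, 170, 170)
weight <- c(58, 59, 63, 59, 60, 60, 61, 64, 64,
            61, 62, 65, 62, 63, 66, 63, 64, 68)
size <- c(rep("M", 7), rep("L", 11))

# New customer (Monica)
new_height <- 161
new_weight <- 61

# Euclidean distance from Monica to every training case
distance <- sqrt((height - new_height)^2 + (weight - new_weight)^2)

# Rank 1 = nearest; ties.method = "min" gives tied cases the same rank
knn_table <- data.frame(height, weight, size,
                        distance = round(distance, 2),
                        rank = rank(distance, ties.method = "min"))

# Show the training cases from nearest to farthest
knn_table[order(knn_table$rank), ]
```

Some cases are equally far from Monica, so they share a rank; how ties are broken is a design choice that can matter when a tie falls at the k-th position.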

Step 2 : Find K-Nearest Neighbors

Let k be 5. Then the algorithm searches for the 5 customers closest to Monica, i.e. most
similar to Monica in terms of attributes, and sees which categories those 5 customers were in. If
4 of them had ‘Medium T shirt size’ and 1 had ‘Large T shirt size’, then your best guess for
Monica is ‘Medium T shirt size’. See the calculation shown in the snapshot below -

Calculate KNN manually
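As a rough sketch of the voting step, the base R snippet below picks the 5 nearest training cases for Monica and takes the most common T-shirt size among them. The variable names are illustrative assumptions; the data are the same as in the table above.

```r
# Training data from the table above
height <- c(158, 158, 158, 160, 160, 163, 163, 160, 163,
            165, 165, 165, 168, 168, 168, 170, 170, 170)
weight <- c(58, 59, 63, 59, 60, 60, 61, 64, 64,
            61, 62, 65, 62, 63, 66, 63, 64, 68)
size <- c(rep("M", 7), rep("L", 11))

k <- 5

# Euclidean distance from Monica (161 cm, 61 kg) to every training case
distance <- sqrt((height - 161)^2 + (weight - 61)^2)

# Row indices of the k nearest training cases
nearest <- order(distance)[1:k]

# Majority vote among the k nearest neighbors
votes <- table(size[nearest])
prediction <- names(votes)[which.max(votes)]

votes
prediction
```

On this data the vote is 4 Medium to 1 Large, so the predicted size is "M", in line with the description above.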


In the graph below, the binary dependent variable (T-shirt size) is displayed in blue and orange:
'Medium T-shirt size' in blue and 'Large T-shirt size' in orange. The new customer is shown as a
yellow circle. Four blue highlighted data points and one orange highlighted data point are closest
to the yellow circle, so the prediction for the new case is the blue class, i.e. Medium T-shirt
size.

KNN: Visual Representation

Assumptions of KNN

1. Standardization

When the independent variables in the training data are measured in different units, it is important to
standardize them before calculating distances. For example, if one variable is height in cms and
the other is weight in kgs, then the variable with the larger numeric spread (here, height) will
have more influence on the distance calculation. To make them comparable, we need to standardize
them, which can be done by any of the following methods :


Standardization

After standardization, the 5th closest case changed, because height was dominating the distance
before standardization. Hence, it is important to standardize predictors before running the
K-nearest neighbor algorithm.

Knn after standardization
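The effect of standardization on the neighbor list can be checked with a small base R sketch. It assumes z-score standardization (subtract the training mean, divide by the training standard deviation); the original tutorial's exact method is not stated in the text, so the distances may differ from its snapshot.

```r
# Training data from the table above
height <- c(158, 158, 158, 160, 160, 163, 163, 160, 163,
            165, 165, 165, 168, 168, 168, 170, 170, 170)
weight <- c(58, 59, 63, 59, 60, 60, 61, 64, 64,
            61, 62, 65, 62, 63, 66, 63, 64, 68)

# z-score standardization using the training mean and standard deviation
z_height <- (height - mean(height)) / sd(height)
z_weight <- (weight - mean(weight)) / sd(weight)

# The new case (Monica) is transformed with the same training statistics
z_new_height <- (161 - mean(height)) / sd(height)
z_new_weight <- (61 - mean(weight)) / sd(weight)

# Distances before and after standardization
raw_dist <- sqrt((height - 161)^2 + (weight - 61)^2)
std_dist <- sqrt((z_height - z_new_height)^2 + (z_weight - z_new_weight)^2)

# Row indices of the 5 nearest neighbors under each distance
nn_raw <- order(raw_dist)[1:5]
nn_std <- order(std_dist)[1:5]
data.frame(rank = 1:5, raw = nn_raw, standardized = nn_std)
```

Comparing the two index columns shows which training cases enter or leave the neighbor set once both variables are on the same scale.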


2. Outlier

A low k value is sensitive to outliers, whereas a higher k value is more resilient to outliers
because it considers more voters when deciding the prediction.

Why KNN is non-parametric?

Non-parametric means not making any assumptions about the underlying data distribution.
Non-parametric methods do not have a fixed number of parameters in the model. Similarly, in KNN
the number of model parameters effectively grows with the training data set - you can imagine each
training case as a "parameter" in the model.

KNN vs. K-means

Many people confuse these two techniques, K-means and K-nearest neighbor. Some of the
differences are listed below -

1. K-means is an unsupervised learning technique (no dependent variable), whereas KNN
is a supervised learning algorithm (a dependent variable exists).
2. K-means is a clustering technique that tries to split data points into K clusters such
that the points in each cluster tend to be near each other, whereas K-nearest neighbor
determines the classification of a point by combining the classifications of its K
nearest points.

Can KNN be used for regression?

Yes, K-nearest neighbor can be used for regression. In other words, K-nearest
neighbor algorithm can be applied when dependent variable is continuous. In
this case, the predicted value is the average of the values of its k nearest
neighbors.
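A minimal sketch of KNN regression in base R is shown below. Purely for illustration, it treats weight as a continuous target and predicts it from height alone using the T-shirt data from earlier in the article; this setup is an assumption made here and is not part of the original example.

```r
# Height and weight columns from the T-shirt table
height <- c(158, 158, 158, 160, 160, 163, 163, 160, 163,
            165, 165, 165, 168, 168, 168, 170, 170, 170)
weight <- c(58, 59, 63, 59, 60, 60, 61, 64, 64,
            61, 62, 65, 62, 63, 66, 63, 64, 68)

k <- 3
new_height <- 161

# With a single predictor, the distance is the absolute difference in height
nearest <- order(abs(height - new_height))[1:k]

# The prediction is the average target value of the k nearest neighbors
predicted_weight <- mean(weight[nearest])
predicted_weight
```

The three nearest customers are the ones with a height of 160 cm (weights 59, 60 and 64), so the predicted weight is their average, 61.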


Pros and Cons of KNN

Pros

1. Easy to understand
2. No assumptions about data
3. Can be applied to both classification and regression
4. Works easily on multi-class problems

Cons

1. Memory intensive / computationally expensive
2. Sensitive to the scale of the data
3. Does not work well with a rare-event (skewed) target variable
4. Struggles when there is a high number of independent variables

For any given problem, a small value of k will lead to a large variance in
predictions. Alternatively, setting k to a large value may lead to a large model
bias.

How to handle categorical variables in KNN?

Create dummy variables out of a categorical variable and include them instead of the original
categorical variable. Unlike in regression, create k dummies instead of (k-1). For example, if a
categorical variable named "Department" has 5 unique levels / categories, we will create 5
dummy variables. Each dummy variable takes the value 1 for its own department and 0 otherwise.
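One way to build all k dummies in base R is model.matrix() with the intercept removed, as sketched below. The data frame and the five department names are hypothetical values invented for this illustration.

```r
# Hypothetical data: a "Department" variable with 5 unique levels
employees <- data.frame(
  Department = factor(c("HR", "Sales", "IT", "Finance", "Legal", "IT", "HR"))
)

# "- 1" drops the intercept so that every level gets its own 0/1 column
# (k dummies instead of the k - 1 used in regression)
dummies <- model.matrix(~ Department - 1, data = employees)
dummies
```

Each row has exactly one 1, in the column of its own department. These columns can then be used alongside the standardized numeric predictors when computing distances.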

How to find best K value?


Cross-validation is a smart way to find the optimal K value. It estimates the validation
error rate by holding out a subset of the training set from the model building process.

Cross-validation (say, 10-fold) involves randomly dividing the training set into
10 groups, or folds, of approximately equal size. In each round, 90% of the data is used to train
the model and the remaining 10% to validate it, and the misclassification rate is computed on that
10% validation data. This procedure is repeated 10 times, with a different fold treated as the
validation set each time. This results in 10 estimates of the validation error, which are then
averaged. The K with the lowest average validation error is chosen.
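The procedure described above can be sketched in base R without any packages. The snippet uses R's built-in iris data as a stand-in dataset, a small hand-written helper knn_predict(), and a candidate grid of k values; all three are assumptions made for illustration and are not part of the original tutorial. For brevity the predictors are scaled once on the full data, which a stricter setup would do inside each fold.

```r
# Minimal base-R KNN classifier (illustrative helper, not from any package)
knn_predict <- function(train_x, train_y, test_x, k) {
  apply(test_x, 1, function(row) {
    # Euclidean distance from this test case to every training case
    d <- sqrt(rowSums(sweep(train_x, 2, row)^2))
    # Majority vote among the k nearest training cases
    votes <- table(train_y[order(d)[1:k]])
    names(votes)[which.max(votes)]
  })
}

set.seed(101)
x <- scale(as.matrix(iris[, 1:4]))   # standardized predictors
y <- iris$Species                    # class labels

# Randomly assign each row to one of 10 folds of near-equal size
folds <- sample(rep(1:10, length.out = nrow(x)))
k_grid <- c(1, 3, 5, 7, 9)

# For each candidate k, average the misclassification rate over the 10 folds
cv_error <- sapply(k_grid, function(k) {
  fold_err <- sapply(1:10, function(f) {
    held_out <- folds == f
    pred <- knn_predict(x[!held_out, , drop = FALSE], y[!held_out],
                        x[held_out, , drop = FALSE], k)
    mean(pred != as.character(y[held_out]))
  })
  mean(fold_err)
})

best_k <- k_grid[which.min(cv_error)]
data.frame(k = k_grid, cv_error = cv_error)
best_k
```

The caret workflow later in the article automates the same idea, with repeated folds and ROC as the selection metric instead of the misclassification rate.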

K Nearest Neighbor in R

We are going to use historical data on past win/loss statistics and the corresponding
speeches. This dataset comprises 1524 observations on 14 variables. The dependent variable
is win/loss, where 1 indicates a win and 0 indicates a loss. The independent variables are:

1. Proportion of words in the speech showing

a. Optimism
b. Pessimism
c. the use of Past
d. the use of Present
e. the use of Future

2. Number of times he/she mentions his/her own party

3. Number of times he/she mentions his/her opposing parties

4. Some measures indicating the content of the speech, showing

a. Openness
b. Conscientiousness
c. Extraversion
d. Agreeableness
e. Neuroticism
f. Emotionality



Read Data

# Read data
data1 = read.csv("US Presidential Data.csv", header = TRUE)
View(data1)

We read the CSV file with the read.csv() function. The first argument is the path to the
CSV file. The second argument, header = TRUE (or T), indicates that the first row of the CSV
file contains the column names, while header = FALSE (or F) indicates that the data start on
the first line and the file has no header row. header = TRUE is the default for read.csv().

# load library
library(caret)
library(e1071)

# Transforming the dependent variable to a factor

data1$Win.Loss = as.factor(data1$Win.Loss)

Here we use the caret package to run knn. Since the dependent variable is numeric,
we need to convert it to a factor using as.factor().

#Partitioning the data into training and validation data

set.seed(101)
index = createDataPartition(data1$Win.Loss, p = 0.7, list = F )
train = data1[index,]
validation = data1[-index,]

In order to partition the data into training and validation sets we use createDataPartition()
function in caret.


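First we set the seed to 101 so that the results are reproducible. In createDataPartition(),
the first argument is the dependent variable and p denotes the proportion of data we want in the
training set; here we put 70% of the data in the training set and the rest in the validation set.
list = F means the indices are returned as a matrix rather than a list, so they can be used
directly to subset rows. Because the first argument is a factor, the random sampling is done
within each class, which keeps the win/loss proportions similar in both sets.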
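To make the idea of a class-wise (stratified) 70/30 split concrete, here is a rough base R sketch. It is not caret's implementation, only an illustration of the principle, and the outcome vector is hypothetical data invented for the example.

```r
set.seed(101)

# Hypothetical binary outcome: 60 wins (1) and 40 losses (0)
outcome <- factor(c(rep(1, 60), rep(0, 40)))

# Group the row indices by class
idx_by_class <- split(seq_along(outcome), outcome)

# Sample 70% of the row indices within each class, then combine them
train_idx <- sort(unlist(lapply(idx_by_class, function(i) {
  sample(i, size = round(0.7 * length(i)))
}), use.names = FALSE))

train_y <- outcome[train_idx]
valid_y <- outcome[-train_idx]

# Class counts in each part: the 60/40 mix is preserved in both
table(train_y)
table(valid_y)
```

Sampling within each class keeps a rare class from ending up almost entirely in one of the two sets, which a plain random split does not guarantee.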

# Explore data
dim(train)
dim(validation)
names(train)
head(train)
head(validation)

The dimensions of training and validation sets are checked via dim(). See first 6 rows of
training dataset -

Win.Loss Optimism Pessimism PastUsed FutureUsed PresentUsed OwnPartyCount

1 X1 0.10450450 0.05045045 0.4381443 0.4948454 0.06701031 2
3 X1 0.11257190 0.04930156 0.4159664 0.5168067 0.06722689 1
5 X1 0.10582640 0.05172414 0.3342618 0.5821727 0.08356546 3
7 X1 0.09838275 0.06401617 0.3240741 0.6018519 0.07407407 6
9 X1 0.10610734 0.04688464 0.3633540 0.5372671 0.09937888 2
10 X1 0.10066128 0.05951506 0.3554817 0.5382060 0.10631229 1
OppPartyCount NumericContent Extra Emoti Agree Consc Openn
1 2 0.001877543 4.041 4.049 3.469 2.450 2.548
3 1 0.002131163 3.463 4.039 3.284 2.159 2.465
5 4 0.002229220 4.658 4.023 3.283 2.415 2.836
7 4 0.002251985 3.727 4.108 3.357 2.128 2.231
9 5 0.002446440 4.119 4.396 3.661 2.572 2.599
10 2 0.002107436 3.800 4.501 3.624 2.117 2.154

By default, the levels of the dependent variable in this dataset are "0" and "1". Later, when we
request class probabilities, these levels are used as column names, so we need to convert them to
valid R variable names (here "X0" and "X1").

# Setting levels for both training and validation data

levels(train$Win.Loss) <- make.names(levels(factor(train$Win.Loss)))
levels(validation$Win.Loss) <-
make.names(levels(factor(validation$Win.Loss)))


Here we use repeated cross-validation via trainControl(). ‘number’ denotes the number of folds
and ‘repeats’ the number of times the k-fold cross-validation is repeated. In this case,
10-fold cross-validation is repeated 3 times.

# Setting up train controls

repeats = 3
numbers = 10
tunel = 10

set.seed(1234)
x = trainControl(method = "repeatedcv",
number = numbers,
repeats = repeats,
classProbs = TRUE,
summaryFunction = twoClassSummary)

Using the train() function we run our knn. Win.Loss is the dependent variable, and the full stop
after the tilde denotes that all the remaining variables are used as independent variables. In
‘data=’ we pass our training set, and ‘method=’ denotes which technique we want to deploy. Setting
preProcess to center and scale standardizes our independent variables:

center : subtract mean from values.

scale : divide values by standard deviation.

trControl takes our ‘x’, which was obtained via trainControl(). metric = "ROC" means the area
under the ROC curve is used to select the best model, and tuneLength is an integer giving the
number of candidate values of k to evaluate.

model1 <- train(Win.Loss~. , data = train, method = "knn",

preProcess = c("center","scale"),
trControl = x,
metric = "ROC",
tuneLength = tunel)


# Summary of model
model1
plot(model1)

k-Nearest Neighbors

1068 samples
13 predictor
2 classes: 'X0', 'X1'

Pre-processing: centered (13), scaled (13)

Resampling: Cross-Validated (10 fold, repeated 3 times)
Summary of sample sizes: 961, 962, 961, 962, 961, 962, ...
Resampling results across tuning parameters:

k ROC Sens Spec

5 0.8440407 0.6910182 0.8382051
7 0.8537506 0.6847658 0.8520513
9 0.8575183 0.6712350 0.8525796
11 0.8588422 0.6545296 0.8592152
13 0.8585478 0.6560976 0.8556333
15 0.8570397 0.6432249 0.8648329
17 0.8547545 0.6448509 0.8627894
19 0.8520574 0.6336043 0.8632867
21 0.8484632 0.6215447 0.8627894
23 0.8453320 0.6071622 0.8658664

ROC was used to select the optimal model using the largest value.
The final value used for the model was k = 11.

Cross Validation : Fine Tuning


Finally, to make predictions on our validation set, we use the predict() function, in which the
first argument is the fitted model and the second argument is the new data on which we
want predictions. Setting type = "prob" returns class probabilities rather than class labels.

# Validation
valid_pred <- predict(model1,validation, type = "prob")

#Storing Model Performance Scores

library(ROCR)
pred_val <-prediction(valid_pred[,2],validation$Win.Loss)

# Calculating Area under Curve (AUC)

perf_auc <- performance(pred_val,"auc")
perf_auc@y.values[[1]]   # the AUC as a single number

# Plot ROC curve (true positive rate vs. false positive rate)
perf_val <- performance(pred_val, "tpr", "fpr")
plot(perf_val, col = "green", lwd = 1.5)

#Calculating KS statistics
ks <- max(attr(perf_val, "y.values")[[1]] - (attr(perf_val, "x.values")[[1]]))
ks

The Area under curve (AUC) on validation dataset is 0.8642.

Special thanks to Ekta Aggarwal for her contribution in this article. She is a co-author of
this article. She is a Data Science enthusiast, currently in the final year of her post graduation
in statistics from Delhi University.


About Author:
Deepanshu founded ListenData with a simple objective - Make analytics easy to
understand and follow. He has over 7 years of experience in data science and
predictive modeling. During his tenure, he has worked with global clients in
various domains like banking, Telecom, HR and Health Insurance.




Copyright 2017 ListenData
