
Big-data Clinical Trial Column


Introduction to machine learning: k-nearest neighbors


Zhongheng Zhang

Department of Critical Care Medicine, Jinhua Municipal Central Hospital, Jinhua Hospital of Zhejiang University, Jinhua 321000, China
Correspondence to: Zhongheng Zhang, MMed. 351#, Mingyue Road, Jinhua 321000, China. Email: [email protected].

Author’s introduction: Zhongheng Zhang, MMed. Department of Critical Care Medicine, Jinhua Municipal Central Hospital, Jinhua Hospital of Zhejiang University. Dr. Zhongheng Zhang is a fellow physician at Jinhua Municipal Central Hospital. He graduated from the School of Medicine, Zhejiang University, in 2009 with a Master's degree. He has published more than 35 academic papers (Science Citation Index) that have been cited more than 200 times. He serves as a reviewer for 10 journals, including Journal of Cardiovascular Medicine, Hemodialysis International, Journal of Translational Medicine, Critical Care, International Journal of Clinical Practice, and Journal of Critical Care. His major research interests include hemodynamic monitoring in sepsis and septic shock, delirium, and outcome studies in critically ill patients. He is experienced in data management and statistical analysis using R and Stata, big-data exploration, systematic review and meta-analysis.


Abstract: Machine learning techniques have been widely used in many scientific fields, but their use in the medical literature is limited, partly because of technical difficulties. k-nearest neighbors (kNN) is a simple method of machine learning. This article introduces some basic ideas underlying the kNN algorithm and then focuses on how to perform kNN modeling with R. The dataset should be prepared before running the knn() function in R. After the outcome is predicted with the kNN algorithm, the diagnostic performance of the model should be checked. Average accuracy is the most widely used statistic to reflect the performance of the kNN algorithm. Factors such as the k value, the distance calculation and the choice of appropriate predictors all have a significant impact on model performance.

Keywords: Machine learning; R; k-nearest neighbors (kNN); class; average accuracy; kappa

Submitted Jan 25, 2016. Accepted for publication Feb 18, 2016.
doi: 10.21037/atm.2016.03.37
View this article at: http://dx.doi.org/10.21037/atm.2016.03.37


Introduction to k-nearest neighbor (kNN)

The kNN classifier assigns an unlabeled observation to the class of the most similar labeled examples. Characteristics of the observations are collected for both the training and the test datasets. For example, fruits, vegetables and grains can be distinguished by their crunchiness and sweetness (Figure 1). For the purpose of displaying them on a two-dimensional plot, only two characteristics are employed; in reality there can be any number of predictors, and the example can be extended to incorporate any number of characteristics. In general, fruits are sweeter than vegetables, and grains are neither crunchy nor sweet. Our task is to determine which category the sweet potato belongs to. In this example we choose the four nearest kinds of food: apple, green bean, lettuce, and corn. Because the vegetable class wins the most votes, the sweet potato is assigned to the class of vegetable. You can see that the key concept of kNN is easy to understand.

Figure 1 Illustration of how the k-nearest neighbors algorithm works (sweetness plotted against crunchiness for fruit, vegetable and grain).

There are two important concepts in the above example. One is the method used to calculate the distance between the sweet potato and the other kinds of food. By default, the knn() function employs the Euclidean distance, which can be calculated with the following equation (1,2):

D(p, q) = \sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2 + \cdots + (p_n - q_n)^2}   [1]

where p and q are the subjects to be compared, each with n characteristics. There are also other methods to calculate distance, such as the Manhattan distance (3,4).

Another concept is the parameter k, which decides how many neighbors will be chosen for the kNN algorithm. The appropriate choice of k has a significant impact on the diagnostic performance of the kNN algorithm. A large k reduces the impact of variance caused by random error, but runs the risk of ignoring small but important patterns. The key to choosing an appropriate k value is to strike a balance between overfitting and underfitting (5). Some authors suggest setting k equal to the square root of the number of observations in the training dataset (6).
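To make these two ideas concrete, here is a minimal R sketch, assuming two made-up observations p and q with two characteristics each; the object names are illustrative only.

> p <- c(sweetness=8, crunchiness=5)   # a hypothetical unlabeled item
> q <- c(sweetness=3, crunchiness=9)   # a hypothetical labeled neighbor
> sqrt(sum((p - q)^2))                 # Euclidean distance, as in equation [1]
> sum(abs(p - q))                      # Manhattan distance
> round(sqrt(150))                     # rule-of-thumb k for a training set of 150 observations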
Working example

For illustration of how kNN works, I created a dataset that has no actual meaning.

> set.seed(seed=888)
> df1 <- data.frame(x1=runif(200,0,100), x2=runif(200,0,100))
> df1 <- transform(df1, y=1+ifelse(100 - x1 - x2 + rnorm(200,sd=10) < 0, 0, ifelse(100 - 2*x2 + rnorm(200,sd=10) < 0, 1, 2)))
> df1$y <- as.factor(df1$y)
> df1$tag <- c(rep("train",150), rep("test",50))

The first line sets a seed to make the output reproducible. The second line creates a data frame named df1 that contains two variables, x1 and x2. I then add another categorical variable, y, which has three categories. However, y is created as a numeric variable, so I convert it into a factor using the as.factor() function. Finally, a tag variable is added to split the dataset into a training set and a test set.
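Before plotting, a quick cross-tabulation gives a minimal check of how the three classes are spread over the training and test subsets; the exact counts depend on the seed.

> table(df1$y, df1$tag)   # class counts within the training and test subsets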
Next we can examine the dataset by graphical presentation.

> library(ggplot2)
> qplot(x1, x2, data=df1, colour=y, shape=tag)

As you can see in Figure 2, the different categories are denoted by red, green and blue colors. The whole dataset is split in a 150:50 ratio into training and test datasets. Dots represent test data and triangles represent training data.

Figure 2 Visual presentation of the simulated working example (x2 plotted against x1). Classes 1, 2 and 3 are denoted by red, green and blue colors, respectively. Dots represent test data and triangles represent training data.

Performing kNN algorithm with R

The R package class contains a very useful function for the purpose of the kNN machine learning algorithm (7). Firstly, one needs to install and load the class package into the working space.

> install.packages("class")
> library(class)

Then we divide the original dataset into training and test datasets. Note that the training and test data frames contain only the predictor variables; the response variable is stored in separate vectors.

> train <- df1[1:150,1:2]
> train.label <- df1[1:150,3]
> test <- df1[151:200,1:2]
> test.label <- df1[151:200,3]

Up to now, the datasets are well prepared for kNN model building. Because kNN is a non-parametric algorithm, we will not obtain parameters for the model. The knn() function returns a factor vector containing the classifications of the test set. In the following code, I arbitrarily choose a k value of 6. The results are stored in the vector pred.

> pred <- knn(train=train, test=test, cl=train.label, k=6)

The results can be viewed by using the CrossTable() function in the gmodels package.

> install.packages("gmodels")
> library(gmodels)
> CrossTable(x = test.label, y = pred, prop.chisq=FALSE)

   Cell Contents
|-------------------------|
|                       N |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|

Total Observations in Table: 50

             | pred
  test.label |     1 |     2 |     3 | Row Total |
-------------|-------|-------|-------|-----------|
           1 |    29 |     0 |     0 |        29 |
             | 1.000 | 0.000 | 0.000 |     0.580 |
             | 0.935 | 0.000 | 0.000 |           |
             | 0.580 | 0.000 | 0.000 |           |
-------------|-------|-------|-------|-----------|
           2 |     2 |     6 |     2 |        10 |
             | 0.200 | 0.600 | 0.200 |     0.200 |
             | 0.065 | 0.857 | 0.167 |           |
             | 0.040 | 0.120 | 0.040 |           |
-------------|-------|-------|-------|-----------|
           3 |     0 |     1 |    10 |        11 |
             | 0.000 | 0.091 | 0.909 |     0.220 |
             | 0.000 | 0.143 | 0.833 |           |
             | 0.000 | 0.020 | 0.200 |           |
-------------|-------|-------|-------|-----------|
Column Total |    31 |     7 |    12 |        50 |
             | 0.620 | 0.140 | 0.240 |           |
-------------|-------|-------|-------|-----------|

Diagnostic performance of the model

The kNN algorithm assigns a category to observations in the test dataset by comparing them to the observations in the training dataset. Because we know the actual category of the observations in the test dataset, the performance of the kNN model can be evaluated. One of the most commonly used parameters is the average accuracy, which is defined by the following equation (8):

Average\ Accuracy = \frac{1}{l} \sum_{i=1}^{l} \frac{TP_i + TN_i}{TP_i + FN_i + FP_i + TN_i}   [2]

where TP is the true positive, TN is the true negative, FP is the false positive and FN is the false negative. The subscript i indicates the category, and l refers to the total number of categories.

> table <- CrossTable(x = test.label, y = pred, prop.chisq=TRUE)
> tp1 <- table$t[1,1]
> tp2 <- table$t[2,2]
> tp3 <- table$t[3,3]
> tn1 <- table$t[2,2]+table$t[2,3]+table$t[3,2]+table$t[3,3]
> tn2 <- table$t[1,1]+table$t[1,3]+table$t[3,1]+table$t[3,3]
> tn3 <- table$t[1,1]+table$t[1,2]+table$t[2,1]+table$t[2,2]
> fn1 <- table$t[1,2]+table$t[1,3]
> fn2 <- table$t[2,1]+table$t[2,3]
> fn3 <- table$t[3,1]+table$t[3,2]
> fp1 <- table$t[2,1]+table$t[3,1]
> fp2 <- table$t[1,2]+table$t[3,2]
> fp3 <- table$t[1,3]+table$t[2,3]
> accuracy <- (((tp1+tn1)/(tp1+fn1+fp1+tn1))+((tp2+tn2)/(tp2+fn2+fp2+tn2))+((tp3+tn3)/(tp3+fn3+fp3+tn3)))/3
> accuracy
[1] 0.9333333

The CrossTable() function returns the result of the cross tabulation of the predicted and observed classifications. The number in each cell can be used for the calculation of the four basic parameters: true positive (TP), true negative (TN), false negative (FN) and false positive (FP). The process is repeated for each category. Finally, the average accuracy is 0.93.
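The same average accuracy can be obtained with less typing by looping over the classes of the confusion matrix; a minimal sketch, assuming tab holds the 3×3 cross-tabulation (a separate name is used here to avoid masking the table() function):

> tab <- table(test.label, pred)
> acc <- sapply(1:3, function(i) {
   tp <- tab[i,i]           # true positives of class i
   fn <- sum(tab[i,-i])     # false negatives of class i
   fp <- sum(tab[-i,i])     # false positives of class i
   tn <- sum(tab[-i,-i])    # true negatives of class i
   (tp + tn)/(tp + fn + fp + tn)
 })
> mean(acc)                 # average accuracy across the three classes, as in equation [2]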
Sensitivity and specificity

Sensitivity is a measure of the proportion of positive observations that are correctly identified as positive. Specificity is a measure of the proportion of negative observations that are correctly identified as negative. They are commonly used to measure the diagnostic performance of a test (9). In the evaluation of a prediction model, they can be used to reflect the performance of the model. Imagine a perfectly fitted model that can predict outcomes with 100% accuracy: both sensitivity and specificity are 100%. In a multiclass situation, as in our example, sensitivity and specificity are calculated separately for each class. The equations are as follows:


Sen_i = \frac{TP_i}{TP_i + FN_i}   [3]

Sp_i = \frac{TN_i}{TN_i + FP_i}   [4]

where TP is the true positive, TN is the true negative, FP is the false positive and FN is the false negative. The subscript i indicates the category.

> sen1 <- tp1/(tp1+fn1)
> sp1 <- tn1/(tn1+fp1)
> sen1
[1] 1
> sp1
[1] 0.9047619
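The same two formulas can be applied to all three classes at once; a minimal sketch, again assuming tab holds the 3×3 cross-tabulation of observed and predicted classes:

> tab <- table(test.label, pred)
> diag(tab) / rowSums(tab)                                   # sensitivity of classes 1, 2 and 3, equation [3]
> sapply(1:3, function(i) sum(tab[-i,-i]) / sum(tab[-i,]))   # specificity of classes 1, 2 and 3, equation [4]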
Multiclass area under the curve (AUC)

A receiver operating characteristic (ROC) curve measures the performance of a classifier in correctly identifying positives and negatives. The AUC ranges between 0.5 and 1; an AUC of 0.5 indicates a random classifier that has no value. The multiclass AUC is well described by Hand and coworkers (10). The multiclass.roc() function in the pROC package is able to do the task.

> install.packages("pROC")
> library(pROC)
> multiclass.roc(response=test.label, predictor=as.ordered(pred))

Call:
multiclass.roc.default(response = test.label, predictor = as.ordered(pred))

Data: as.ordered(pred) with 3 levels of test.label: 1, 2, 3.
Multi-class area under the curve: 0.9212

As you can see from the output of the command, the multi-class AUC is 0.9212.

Kappa statistic

The kappa statistic is a measurement of the agreement for categorical items (11). Its typical use is in the assessment of inter-rater agreement. Here kappa can be used to assess the performance of the kNN algorithm. Kappa can be formally expressed by the following equation:

\kappa = \frac{P(A) - P(E)}{1 - P(E)}   [5]

where P(A) is the relative observed agreement among raters, and P(E) is the proportion of agreement expected between the classifier and the ground truth by chance. In our example, the tabulation of predicted and observed classes is as follows:

> table <- table(test.label, pred)
> table
          pred
test.label  1  2  3
         1 29  0  0
         2  2  6  2
         3  0  1 10

The relative observed agreement can be calculated as

P(A) = (29 + 6 + 10) / 50 = 0.9   [6]

The kNN algorithm predicts classes 1, 2 and 3 for 31, 7 and 12 of the test observations, so the probabilities that kNN says 1, 2 and 3 are 0.62, 0.14 and 0.24, respectively. Similarly, the probabilities that 1, 2 and 3 are observed are 0.58, 0.20 and 0.22, respectively. Then, the probabilities that both the classifier and the ground truth say 1, 2 and 3 by chance are 0.62×0.58=0.3596, 0.14×0.2=0.028 and 0.24×0.22=0.0528. The overall probability of random agreement is:

P(E) = 0.3596 + 0.028 + 0.0528 = 0.4404   [7]

and the kappa statistic is:

\kappa = \frac{P(A) - P(E)}{1 - P(E)} = \frac{0.9 - 0.4404}{1 - 0.4404} \approx 0.82   [8]
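The same arithmetic can be carried out programmatically from the confusion matrix; a minimal sketch reusing the table object stored above:

> p_a <- sum(diag(table)) / sum(table)                          # observed agreement, equation [6]
> p_e <- sum(rowSums(table) * colSums(table)) / sum(table)^2    # chance agreement, equation [7]
> (p_a - p_e) / (1 - p_e)                                       # kappa, approximately 0.82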
Fortunately, the calculation can be performed by the cohen.kappa() function in the psych package. I present the calculation process here for readers to better understand the concept of kappa.

> install.packages("psych")


> library(psych)
> cohen.kappa(x=cbind(test.label,pred))
Call: cohen.kappa1(x = x, w = w, n.obs = n.obs, alpha = alpha)

Cohen Kappa and Weighted Kappa correlation coefficients and confidence boundaries
                 lower estimate upper
unweighted kappa  0.68     0.82  0.96
weighted kappa    0.87     0.93  0.99

Number of subjects = 50

Tuning k for kNN

The parameter k is important in the kNN algorithm. In this last section I tune the k value and examine the change in the diagnostic accuracy of the kNN model. A custom-made R function is helpful for simplifying the calculation process. Here I write a function named "accuracyCal" to calculate a series of average accuracies. There is only one argument for the function: the maximum value of k you would like to examine. A for loop within the function calculates the accuracy repeatedly for k from 1 to N. When you run the function, the results may not be exactly the same each time. That is because the knn() function breaks ties at random. To explain: if we have four nearest neighbors and two are classified as A and two are classified as B, then either A or B is randomly chosen as the predicted result.

> accuracyCal <- function(N) {
   accuracy <- 1
   for (x in 1:N) {
     pred <- knn(train=train, test=test, cl=train.label, k=x)
     table <- table(test.label, pred)
     tp1 <- table[1,1]
     tp2 <- table[2,2]
     tp3 <- table[3,3]
     tn1 <- table[2,2]+table[2,3]+table[3,2]+table[3,3]
     tn2 <- table[1,1]+table[1,3]+table[3,1]+table[3,3]
     tn3 <- table[1,1]+table[1,2]+table[2,1]+table[2,2]
     fn1 <- table[1,2]+table[1,3]
     fn2 <- table[2,1]+table[2,3]
     fn3 <- table[3,1]+table[3,2]
     fp1 <- table[2,1]+table[3,1]
     fp2 <- table[1,2]+table[3,2]
     fp3 <- table[1,3]+table[2,3]
     accuracy <- c(accuracy, (((tp1+tn1)/(tp1+fn1+fp1+tn1))+((tp2+tn2)/(tp2+fn2+fp2+tn2))+((tp3+tn3)/(tp3+fn3+fp3+tn3)))/3)
   }
   return(accuracy[-1])
 }
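Once accuracyCal() has been defined, a minimal usage sketch shows how to pick the k with the highest average accuracy; because ties are broken at random, the winning k can differ slightly between runs.

> acc <- accuracyCal(150)   # average accuracy for k = 1 to 150
> which.max(acc)            # the k value with the highest average accuracy
> max(acc)                  # the average accuracy achieved at that k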
The following code creates a visual display of the results. An inset plot is created to better visualize how the accuracy changes within the k range between 5 and 20. The subplot() function contained in the TeachingDemos package is helpful for drawing such an inset. It is worthwhile to adjust the graph parameters to give the figure a better appearance (Figure 3). The figure shows that the average accuracy is highest at k=15. At a large k value (150, for example), all observations in the training dataset are included, and all observations in the test dataset are assigned to the class with the largest number of subjects in the training dataset. This is of course not the result we want.

Figure 3 Graphical presentation of average accuracy with different k values. The inset zooms in at the k range between 0 and 30.

> install.packages("TeachingDemos")


> library(TeachingDemos)
> qplot(seq(1:150), accuracyCal(150), xlab="k values", ylab="Average accuracy", geom = c("point", "smooth"))
> subplot(
   plot(seq(1:30), accuracyCal(30), col=2, xlab='', ylab='', cex.axis=0.8),
   x=grconvertX(c(0,0.75), from='npc'),
   y=grconvertY(c(0,0.45), from='npc'),
   type='fig', pars=list( mar=c(0,0,1.5,1.5)+0.1) )

Summary

The article introduces some basic ideas underlying the kNN algorithm. The dataset should be prepared before running the knn() function in R. After prediction of the outcome with the kNN algorithm, the diagnostic performance of the model should be checked. Average accuracy is the most widely used statistic to reflect the performance of the kNN algorithm. Factors such as the k value, the distance calculation and the choice of appropriate predictors all have a significant impact on the model performance.

Acknowledgements

None.

Footnote

Conflicts of Interest: The author has no conflicts of interest to declare.

References

1. Short RD, Fukunaga K. The optimal distance measure for nearest neighbor classification. IEEE Transactions on Information Theory 1981;27:622-7.
2. Weinberger KQ, Saul LK. Distance metric learning for large margin nearest neighbor classification. The Journal of Machine Learning Research 2009;10:207-44.
3. Cost S, Salzberg S. A weighted nearest neighbor algorithm for learning with symbolic features. Machine Learning 1993;10:57-78.
4. Breiman L. Random forests. Machine Learning 2001;45:5-32.
5. Zhang Z. Too much covariates in a multivariable model may cause the problem of overfitting. J Thorac Dis 2014;6:E196-7.
6. Lantz B. Machine learning with R. 2nd ed. Birmingham: Packt Publishing; 2015:1.
7. Venables WN, Ripley BD. Modern applied statistics with S-PLUS. 3rd ed. New York: Springer; 2001.
8. Hernandez-Torruco J, Canul-Reich J, Frausto-Solis J, et al. Towards a predictive model for Guillain-Barré syndrome. Conf Proc IEEE Eng Med Biol Soc 2015;2015:7234-7.
9. Linden A. Measuring diagnostic and predictive accuracy in disease management: an introduction to receiver operating characteristic (ROC) analysis. J Eval Clin Pract 2006;12:132-9.
10. Hand DJ, Till RJ. A simple generalisation of the area under the ROC curve for multiple class classification problems. Machine Learning 2001;45:171-86.
11. Thompson JR. Estimating equations for kappa statistics. Stat Med 2001;20:2895-906.

Cite this article as: Zhang Z. Introduction to machine learning: k-nearest neighbors. Ann Transl Med 2016;4(11):218. doi: 10.21037/atm.2016.03.37

