Decision Tree and KNN
Given a labelled training dataset, we follow the steps below to classify any new data point; a short Python sketch of these steps appears right after the list.
1. First, we choose the number k and a distance metric. You can take any distance metric, such as Euclidean, Minkowski, or Manhattan distance, for numerical attributes in the dataset. You can also specify your own distance metric if the dataset has categorical or mixed attributes.
2. For a new data point P, calculate its distance to all the existing data points.
3. Select the k-nearest data points, where k is a user-specified parameter.
4. Among the k-nearest neighbors, count the number of data points in each class. We do this so that we can pick the class label that holds the majority among the selected neighbors.
5. Assign the new data point to the class with the majority class label among the k-nearest neighbors.
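The steps above can be sketched in plain Python. This is only an illustrative sketch; the names euclidean and knn_classify are chosen for this example and Euclidean distance is used as the metric from step 1.

```python
from collections import Counter
import math

def euclidean(a, b):
    # Step 1: the distance metric (Euclidean here; Manhattan or Minkowski also work)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(training_points, labels, query, k=3, distance=euclidean):
    # Step 2: distance from the new point to every existing data point
    distances = [(distance(query, x), label) for x, label in zip(training_points, labels)]
    # Step 3: keep the k nearest points
    nearest = sorted(distances, key=lambda d: d[0])[:k]
    # Steps 4-5: count class labels among the neighbours and return the majority
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```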
Now that we have discussed the basic intuition and the algorithm for KNN classification, let us discuss a KNN classification
numerical example using a small dataset.
Minkowski distance
Euclidean distance can be generalised using the Minkowski norm, also known as the p-norm. The formula for the Minkowski distance between two points x and y with n attributes is:

d(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}

The formula differs from the Euclidean distance in that, instead of squaring the differences, we raise them to the power of p and take the p-th root of the sum. The biggest advantage of using such a distance metric is that we can change the value of p to obtain different distance metrics.
With p = 2, the Minkowski distance reduces to the Euclidean distance, and with p = 1 it becomes the Manhattan distance.
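As a quick illustration (the function name minkowski is just for this sketch), the same code yields both metrics simply by changing p:

```python
def minkowski(a, b, p=2):
    # p = 2 gives the Euclidean distance, p = 1 the Manhattan distance
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

print(minkowski((2, 10), (5, 7), p=2))  # Euclidean distance: ~4.24
print(minkowski((2, 10), (5, 7), p=1))  # Manhattan distance: 6.0
```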
Example
To solve a numerical example of the K-nearest neighbors (KNN) classification algorithm, we will use the following dataset.
Point Coordinates Class Label
A1 (2,10) C2
A2 (2, 6) C1
A3 (11,11) C3
A4 (6, 9) C2
A5 (6, 5) C1
A6 (1, 2) C1
A7 (5, 10) C2
A8 (4, 9) C2
A9 (10, 12) C3
A10 (7, 5) C1
A12 (4, 6) C1
A15 (3, 8) C2
For this, we will first specify the number of nearest neighbors, i.e. k. Let us take k to be 3. Suppose the new data point to classify is P = (5, 7). Now, we will find the distance of P to each data point in the dataset. For this KNN classification numerical example, we will use the Euclidean distance metric. The following table shows the Euclidean distance of P to each data point in the dataset.
Point Coordinates Distance from P = (5, 7)
A1 (2, 10) 4.24
A2 (2, 6) 3.16
A3 (11, 11) 7.21
A4 (6, 9) 2.24
A5 (6, 5) 2.24
A6 (1, 2) 6.40
A7 (5, 10) 3.00
A8 (4, 9) 2.24
A9 (10, 12) 7.07
A10 (7, 5) 2.83
A12 (4, 6) 1.41
A15 (3, 8) 2.24
The three smallest distances are those of A12 (1.41), A4 (2.24), and A5 (2.24); note that A4, A5, A8, and A15 are tied at 2.24, and we keep the first two in dataset order.
Now, points A12, A4, and A5 have the class labels C1, C2, and C1, respectively. Among these points, the majority class label is C1.
Therefore, we will specify the class label of point P = (5, 7) as C1. Hence, we have successfully used KNN classification to classify
point P according to the given dataset.
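Reusing the knn_classify sketch from earlier in this section, the same result can be checked in a few lines; the dictionaries below simply mirror the table above.

```python
points = {"A1": (2, 10), "A2": (2, 6), "A3": (11, 11), "A4": (6, 9),
          "A5": (6, 5), "A6": (1, 2), "A7": (5, 10), "A8": (4, 9),
          "A9": (10, 12), "A10": (7, 5), "A12": (4, 6), "A15": (3, 8)}
classes = {"A1": "C2", "A2": "C1", "A3": "C3", "A4": "C2", "A5": "C1",
           "A6": "C1", "A7": "C2", "A8": "C2", "A9": "C3", "A10": "C1",
           "A12": "C1", "A15": "C2"}

# Classify P = (5, 7) with k = 3; ties at equal distance keep dataset order.
prediction = knn_classify(list(points.values()),
                          [classes[name] for name in points],
                          (5, 7), k=3)
print(prediction)  # C1
```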
Q.2 We have data from a questionnaire survey (asking people's opinions) and from objective testing with two attributes (acid durability and strength) to classify whether a special paper tissue is good or not. Here are four training samples:
X1 = Acid Durability (seconds)  X2 = Strength (kg/square meter)  Y = Classification
7  7  Bad
7  4  Bad
3  4  Good
1  4  Good
Now the factory produces a new paper tissue that passes the laboratory test with X1 = 3 and X2 = 7. Without another expensive survey, can we guess what the classification of this new tissue is?
1. Determine the parameter K = number of nearest neighbors. Suppose we use K = 3.
2. Calculate the distance between the query instance and all the training samples.
The coordinate of the query instance is (3, 7). Instead of calculating the distance, we compute the squared distance, which is faster to calculate because it avoids the square root and does not change the ordering.
X1 = Acid Durability (seconds)  X2 = Strength (kg/square meter)  Square Distance to query instance (3, 7)
7  7  (7 - 3)^2 + (7 - 7)^2 = 16
7  4  (7 - 3)^2 + (4 - 7)^2 = 25
3  4  (3 - 3)^2 + (4 - 7)^2 = 9
1  4  (1 - 3)^2 + (4 - 7)^2 = 13
3. Sort the distances and determine the nearest neighbors based on the K-th minimum distance.
X1 = Acid Durability (seconds)  X2 = Strength (kg/square meter)  Square Distance to query instance (3, 7)  Rank of minimum distance  Is it included in 3-Nearest neighbors?
7  7  16  3  Yes
7  4  25  4  No
3  4  9   1  Yes
1  4  13  2  Yes
4. Gather the category (Y) of the nearest neighbors. Notice in the second row, last column, that the category (Y) is not included because the rank of this data point is greater than 3 (= K).
X1 = Acid Durability (seconds)  X2 = Strength (kg/square meter)  Square Distance to query instance (3, 7)  Rank of minimum distance  Is it included in 3-Nearest neighbors?  Y = Category of nearest neighbor
7  7  16  3  Yes  Bad
7  4  25  4  No   -
3  4  9   1  Yes  Good
1  4  13  2  Yes  Good
5. Use the simple majority of the categories of the nearest neighbors as the prediction for the query instance.
We have 2 Good and 1 Bad; since 2 > 1, we conclude that the new paper tissue that passes the laboratory test with X1 = 3 and X2 = 7 is classified as Good.
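A minimal Python check of this result, using squared distances exactly as in step 2 (the names samples and query are just for this sketch):

```python
from collections import Counter

samples = [((7, 7), "Bad"), ((7, 4), "Bad"), ((3, 4), "Good"), ((1, 4), "Good")]
query = (3, 7)
K = 3

# Squared distance is enough for ranking: taking the square root never changes the order.
sq_dist = [((x1 - query[0]) ** 2 + (x2 - query[1]) ** 2, y) for (x1, x2), y in samples]
nearest = sorted(sq_dist)[:K]          # [(9, 'Good'), (13, 'Good'), (16, 'Bad')]
print(Counter(y for _, y in nearest).most_common(1)[0][0])  # Good
```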
Decision tree:
Introduction
Decision Trees are a type of Supervised Machine Learning (that is, you explain what the input is and what the corresponding output is in the training data) where the data is continuously split according to a certain parameter. The tree can be explained by two entities, namely decision nodes and leaves. The leaves are the decisions or the final outcomes, and the decision nodes are where the data gets split.
• Entropy:
Entropy, also called Shannon entropy and denoted by H(S) for a finite set S, is the measure of the amount of uncertainty or randomness in the data.
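Concretely, if the members of S fall into classes with proportions p_i, the entropy used throughout the example below is:

H(S) = -\sum_{i} p_i \log_2 p_i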
• Information Gain:
Information gain measures the reduction in entropy achieved by splitting the data on a particular attribute; the attribute with the highest gain gives the most useful split.
Let’s understand this with the help of an example. Consider a piece of data collected over the
course of 14 days where the features are Outlook, Temperature, Humidity, Wind and the outcome
variable is whether Golf was played on the day. Now, our job is to build a predictive model which
takes in the above 4 parameters and predicts whether Golf will be played on the day. We'll build a decision tree to do this. The class counts over the 14 days are:
Yes No Total
9 5 14
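Plugging these counts into the entropy formula gives:

H(S) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} \approx 0.94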
Remember that the Entropy is 0 if all members belong to the same class, and 1 when half of them belong to one class and the other half belong to the other class, which is perfect randomness. Here it is 0.94, which means the distribution is fairly random. Now, the next step is to choose the attribute that gives us the highest possible Information Gain, and we will choose that attribute as the root node.
Let’s start with ‘Wind’
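One standard way to write the information gain being computed here (the exact notation in the original may differ) is:

IG(S, A) = H(S) - \sum_{x \in \text{values}(A)} P(x)\, H(S_x)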
where x ranges over the possible values of the attribute A, P(x) is the fraction of examples taking the value x, and S_x is the subset of examples with that value. Here, the attribute 'Wind' takes two possible values in the sample data, hence x = {Weak, Strong}. We'll therefore have to calculate H(S_Weak) and H(S_Strong).
Amongst all the 14 examples we have 8 places where the wind is weak and 6 where the wind is
Strong.
Now, out of the 8 Weak examples, 6 of them were 'Yes' for Play Golf and 2 of them were 'No' for 'Play Golf'. So, we have:

H(S_{\text{Weak}}) = -\frac{6}{8}\log_2\frac{6}{8} - \frac{2}{8}\log_2\frac{2}{8} \approx 0.811
Similarly, out of 6 Strong examples, we have 3 examples where the outcome was ‘Yes’ for Play
Golf and 3 where we had ‘No’ for Play Golf.
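The Strong subset splits evenly between the two classes, so:

H(S_{\text{Strong}}) = -\frac{3}{6}\log_2\frac{3}{6} - \frac{3}{6}\log_2\frac{3}{6} = 1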
Remember, here half of the items belong to one class while the other half belong to the other, hence we have perfect randomness. Now we have all the pieces required to calculate the Information Gain:
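Substituting the values computed above:

IG(S, \text{Wind}) = 0.94 - \frac{8}{14}(0.811) - \frac{6}{14}(1) \approx 0.048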
This is the Information Gain obtained by considering 'Wind' as the feature: 0.048. Now we must similarly calculate the Information Gain for all of the features.
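Assuming the standard 14-day golf (PlayTennis) dataset that this walkthrough follows, those calculations work out to approximately:

IG(S, \text{Outlook}) \approx 0.246,\quad IG(S, \text{Temperature}) \approx 0.029,\quad IG(S, \text{Humidity}) \approx 0.151,\quad IG(S, \text{Wind}) \approx 0.048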
We can clearly see that IG(S, Outlook) has the highest information gain of 0.246, hence we choose the Outlook attribute as the root node. At this point, the decision tree looks like this:
Here we observe that whenever the outlook is Overcast, Play Golf is always 'Yes'. This is no coincidence; the simple subtree results precisely because Outlook gives the highest information gain. Now, how do we proceed from this point? We can simply apply recursion; you might want to look at the algorithm steps described earlier. Now that we've used Outlook, we have three features remaining: Humidity, Temperature, and Wind. And we had three possible values of Outlook: Sunny, Overcast, and Rain. The Overcast node already ended up as the leaf node 'Yes', so we're left with two subtrees to compute: Sunny and Rain.
The table restricted to rows where the value of Outlook is Sunny looks like this:
As we can see, the highest Information Gain here is given by Humidity. Proceeding in the same way with the subset where Outlook is Rain will give us Wind as the attribute with the highest information gain. The final Decision Tree looks something like this.
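To make the splitting criterion concrete, here is a small Python sketch of entropy and information gain. The golf data below is the standard 14-day PlayTennis dataset this walkthrough appears to follow, so treat the exact rows (and the helper names entropy and information_gain) as assumptions of this sketch.

```python
import math
from collections import Counter

def entropy(labels):
    # H(S) = -sum(p_i * log2(p_i)) over the class proportions p_i
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, attribute_index):
    # IG(S, A) = H(S) - sum(|S_v| / |S| * H(S_v)) over the values v of attribute A
    total = len(labels)
    gain = entropy(labels)
    for value in set(row[attribute_index] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attribute_index] == value]
        gain -= (len(subset) / total) * entropy(subset)
    return gain

# Assumed standard PlayTennis data: (Outlook, Temperature, Humidity, Wind) -> Play Golf
rows = [("Sunny", "Hot", "High", "Weak"), ("Sunny", "Hot", "High", "Strong"),
        ("Overcast", "Hot", "High", "Weak"), ("Rain", "Mild", "High", "Weak"),
        ("Rain", "Cool", "Normal", "Weak"), ("Rain", "Cool", "Normal", "Strong"),
        ("Overcast", "Cool", "Normal", "Strong"), ("Sunny", "Mild", "High", "Weak"),
        ("Sunny", "Cool", "Normal", "Weak"), ("Rain", "Mild", "Normal", "Weak"),
        ("Sunny", "Mild", "Normal", "Strong"), ("Overcast", "Mild", "High", "Strong"),
        ("Overcast", "Hot", "Normal", "Weak"), ("Rain", "Mild", "High", "Strong")]
labels = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
          "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]

for i, name in enumerate(["Outlook", "Temperature", "Humidity", "Wind"]):
    print(name, round(information_gain(rows, labels, i), 3))
# Outlook ~0.246, Temperature ~0.029, Humidity ~0.151, Wind ~0.048
```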