Decision Tree KNN

K-Nearest Neighbor Classification Algorithm

KNN classification follows a simple algorithm, which works as follows.

The inputs to the algorithm are:

• The dataset with labeled data points.
• The number k, i.e. the number of nearest neighbors that we use to find the class of any new data instance.
• The new data point.

Using these inputs, we follow the steps below to classify a new data point; a short code sketch of the whole procedure follows the list.

1. First, we choose the number k and a distance metric. You can use any distance metric, such as the Euclidean, Minkowski, or Manhattan distance, for numerical attributes in the dataset. You can also specify your own distance metric if you have a dataset with categorical or mixed attributes.
2. For a new data point P, calculate its distance to all the existing data points.
3. Select the k nearest data points, where k is the user-specified parameter chosen in step 1.
4. Among the k nearest neighbors, count the number of data points in each class.
5. Assign the new data point to the class with the majority of the class labels among the k nearest neighbors.
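The following is a minimal Python sketch of this procedure, assuming purely numerical attributes and Euclidean distance; the function name knn_classify and the list-of-pairs data layout are illustrative choices rather than part of any particular library.

```python
from collections import Counter
import math

def knn_classify(dataset, new_point, k=3):
    """Classify new_point by a majority vote among its k nearest neighbors.

    dataset: list of (coordinates, class_label) pairs with numeric coordinates.
    """
    # Step 2: distance from the new point to every labeled point (Euclidean here)
    distances = []
    for coords, label in dataset:
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(coords, new_point)))
        distances.append((d, label))

    # Step 3: keep the k nearest points
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]

    # Steps 4-5: count the class labels among the neighbors and return the majority
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```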

Now that we have discussed the basic intuition and the algorithm for KNN classification, let us work through a KNN classification numerical example using a small dataset.

Minkowski distance

Euclidean distance can be generalised using the Minkowski norm, also known as the p-norm. The formula for the Minkowski distance between two points x and y with n attributes is:

d(x, y) = ( Σ_{i=1}^{n} |x_i − y_i|^p )^{1/p}

Here we can see that the formula differs from the Euclidean distance in that, instead of squaring each difference, we raise the absolute difference to the power p and take the p-th root of the sum. The biggest advantage of using such a distance metric is that we can change the value of p to obtain different types of distance metrics.

p=2

If we take the value of p as 2 then we get the Euclidean distance.

p=1

If we set p to 1 then we get a distance function known as the Manhattan distance.
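As a quick illustration, here is a small Python sketch of the Minkowski distance (the function name minkowski is only an illustrative choice); with p = 2 it reproduces the Euclidean distance and with p = 1 the Manhattan distance.

```python
def minkowski(x, y, p=2):
    """Minkowski (p-norm) distance between two equal-length numeric sequences."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

# p = 2 gives the Euclidean distance, p = 1 the Manhattan distance
print(minkowski((2, 10), (5, 7), p=2))  # 4.242... (Euclidean)
print(minkowski((2, 10), (5, 7), p=1))  # 6.0 (Manhattan)
```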

KNN Classification Numerical Example

To solve a numerical example of the K-nearest neighbor (KNN) classification algorithm, we will use the following dataset.
Point Coordinates Class Label

A1 (2,10) C2

A2 (2, 6) C1

A3 (11,11) C3

A4 (6, 9) C2

A5 (6, 5) C1

A6 (1, 2) C1

A7 (5, 10) C2

A8 (4, 9) C2

A9 (10, 12) C3

A10 (7, 5) C1

A11 (9, 11) C3

A12 (4, 6) C1

A13 (3, 10) C2

A15 (3, 8) C2

A16 (6, 11) C2


KNN classification dataset
In the above dataset, we have fifteen data points with three class labels. Now, suppose that we have to find the class label of the
point P= (5, 7).

For this, we will first specify the number of nearest neighbors, k. Let us take k to be 3. Next, we find the distance of P to each data point in the dataset. For this KNN classification numerical example, we will use the Euclidean distance metric. The following table shows the Euclidean distance of P to each data point in the dataset.

Point Coordinates Distance from P (5, 7)

A1 (2, 10) 4.24

A2 (2, 6) 3.16

A3 (11, 11) 7.21

A4 (6, 9) 2.23

A5 (6, 5) 2.23

A6 (1, 2) 6.40

A7 (5, 10) 3.0

A8 (4, 9) 2.23

A9 (10, 12) 7.07

A10 (7, 5) 2.82

A11 (9, 11) 5.65


A12 (4, 6) 1.41

A13 (3, 10) 3.60

A15 (3, 8) 2.23

A16 (6, 11) 4.12


Distance of each point in the dataset from the new point P
After finding the distance of each point in the dataset to P, we will sort the above points according to their distance from P (5,
7). After sorting, we get the following table.

Point Coordinates Distance from P (5, 7)

A12 (4, 6) 1.41

A4 (6, 9) 2.23

A5 (6, 5) 2.23

A8 (4, 9) 2.23

A15 (3, 8) 2.23

A10 (7, 5) 2.82

A7 (5, 10) 3

A2 (2, 6) 3.16

A13 (3, 10) 3.6

A16 (6, 11) 4.12

A1 (2, 10) 4.24

A11 (9, 11) 5.65

A6 (1, 2) 6.4

A9 (10, 12) 7.07

A3 (11, 11) 7.21


Points sorted according to their distance from P
As we have taken k = 3, we now consider the class labels of the three points nearest to P in order to classify it. In the above table, A12, A4, and A5 are the three closest neighbors of point P. (Note that A4, A5, A8, and A15 are all at the same distance from P; here the tie is broken by taking the points that appear first in the dataset.) Hence, we will use the class labels of points A12, A4, and A5 to decide the class label for P.

Now, points A12, A4, and A5 have the class labels C1, C2, and C1, respectively. Among these points, the majority class label is C1. Therefore, we assign the class label C1 to point P = (5, 7). Hence, we have successfully used KNN classification to classify point P according to the given dataset.
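As a sanity check, the short sketch below reproduces this result in Python under the same assumptions (Euclidean distance, k = 3, ties at equal distance kept in dataset order); the variable names are illustrative.

```python
from collections import Counter
import math

points = [
    ("A1", (2, 10), "C2"), ("A2", (2, 6), "C1"), ("A3", (11, 11), "C3"),
    ("A4", (6, 9), "C2"), ("A5", (6, 5), "C1"), ("A6", (1, 2), "C1"),
    ("A7", (5, 10), "C2"), ("A8", (4, 9), "C2"), ("A9", (10, 12), "C3"),
    ("A10", (7, 5), "C1"), ("A11", (9, 11), "C3"), ("A12", (4, 6), "C1"),
    ("A13", (3, 10), "C2"), ("A15", (3, 8), "C2"), ("A16", (6, 11), "C2"),
]
P = (5, 7)

# Distance of every point from P; the stable sort keeps ties in dataset order
dists = [(math.dist(xy, P), name, label) for name, xy, label in points]
dists.sort(key=lambda t: t[0])

k_nearest = dists[:3]                              # A12, A4, A5
votes = Counter(label for _, _, label in k_nearest)
print(votes.most_common(1)[0][0])                  # C1
```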

Q.2 We have data from a questionnaire survey (asking people's opinions) and from objective testing with two attributes (acid durability and strength) to classify whether a special paper tissue is good or not. Here are four training samples:

X1 = Acid Durability (seconds) X2 = Strength(kg/square meter) Y = Classification

7 7 Bad

7 4 Bad
3 4 Good

1 4 Good

Now the factory produces a new paper tissue that passes the laboratory test with X1 = 3 and X2 = 7. Without another expensive survey, can we guess what the classification of this new tissue is?

1. Determine parameter K = number of nearest neighbors

Suppose we use K = 3.

2. Calculate the distance between the query instance and all the training samples

The coordinates of the query instance are (3, 7). Instead of calculating the actual distance, we compute the squared distance, which is faster to calculate (no square root needed).

X1 = Acid Durability (seconds) | X2 = Strength (kg/square meter) | Squared distance to query instance (3, 7)
7 | 7 | (7 − 3)² + (7 − 7)² = 16
7 | 4 | (7 − 3)² + (4 − 7)² = 25
3 | 4 | (3 − 3)² + (4 − 7)² = 9
1 | 4 | (1 − 3)² + (4 − 7)² = 13

3. Sort the distances and determine the nearest neighbors based on the K-th minimum distance

X1 = Acid Durability (seconds) | X2 = Strength (kg/square meter) | Squared distance to query instance (3, 7) | Rank (minimum distance) | Included in 3-nearest neighbors?
7 | 7 | 16 | 3 | Yes
7 | 4 | 25 | 4 | No
3 | 4 | 9 | 1 | Yes
1 | 4 | 13 | 2 | Yes

4. Gather the category (Y) of the nearest neighbors. Notice in the second row that the category of this neighbor is not included, because its rank is greater than 3 (= K).

X1 = Acid Durability (seconds) | X2 = Strength (kg/square meter) | Squared distance to query instance (3, 7) | Rank (minimum distance) | Included in 3-nearest neighbors? | Y = Category of nearest neighbor
7 | 7 | 16 | 3 | Yes | Bad
7 | 4 | 25 | 4 | No | -
3 | 4 | 9 | 1 | Yes | Good
1 | 4 | 13 | 2 | Yes | Good

5. Use a simple majority of the categories of the nearest neighbors as the prediction for the query instance

We have 2 Good and 1 Bad; since 2 > 1, we conclude that the new paper tissue that passes the laboratory test with X1 = 3 and X2 = 7 belongs to the Good category.
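The sketch below reproduces this small calculation in Python, using squared distances exactly as above; the variable names are illustrative.

```python
from collections import Counter

# Training samples: (X1 = acid durability, X2 = strength, Y = class)
samples = [(7, 7, "Bad"), (7, 4, "Bad"), (3, 4, "Good"), (1, 4, "Good")]
query = (3, 7)
K = 3

# Squared distance to the query instance (no square root needed for ranking)
scored = sorted(
    ((x1 - query[0]) ** 2 + (x2 - query[1]) ** 2, y) for x1, x2, y in samples
)
print(scored)                        # [(9, 'Good'), (13, 'Good'), (16, 'Bad'), (25, 'Bad')]
votes = Counter(y for _, y in scored[:K])
print(votes.most_common(1)[0][0])    # Good
```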

Decision tree:
Introduction
Decision Trees are a type of supervised machine learning (that is, the training data specifies what the input is and what the corresponding output is) in which the data is repeatedly split according to a certain parameter. The tree can be explained by two entities, namely decision nodes and leaves. The leaves are the decisions or final outcomes, and the decision nodes are where the data is split.

• Entropy:

Entropy, also called Shannon entropy and denoted by H(S) for a finite set S, is a measure of the amount of uncertainty or randomness in the data:

H(S) = − Σ_c p(c) · log2 p(c)

where p(c) is the proportion of examples in S that belong to class c.

Information Gain:

Information gain measures how much the entropy decreases when we split S on an attribute x:

IG(S, x) = H(S) − H(S, x), with H(S, x) = Σ_v P(v) · H(S_v)

where v ranges over the values of attribute x, S_v is the subset of S for which x = v, and P(v) = |S_v| / |S|.

Let's understand this with the help of an example. Consider data collected over the course of 14 days, where the features are Outlook, Temperature, Humidity, and Wind, and the outcome variable is whether golf was played on the day. Our job is to build a predictive model which takes in the above four parameters and predicts whether golf will be played on the day. We'll build a decision tree:

Day Outlook Temperature Humidity Wind Play Golf


D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rain Mild High Weak Yes
D5 Rain Cool Normal Weak Yes
D6 Rain Cool Normal Strong No
D7 Overcast Cool Normal Strong Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rain Mild Normal Weak Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rain Mild High Strong No

Here, the following tasks are performed recursively (a minimal recursive sketch in Python follows the list):

1. Create a root node for the tree.
2. If all examples are positive, return the leaf node 'positive'.
3. Else if all examples are negative, return the leaf node 'negative'.
4. Calculate the entropy of the current state, H(S).
5. For each attribute x, calculate the entropy with respect to that attribute, denoted by H(S, x).
6. Select the attribute which has the maximum value of IG(S, x).
7. Remove the attribute that offers the highest IG from the set of attributes.
8. Repeat until we run out of attributes, or the decision tree consists only of leaf nodes.
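Below is a minimal, illustrative Python sketch of these steps for a dataset stored as a list of dictionaries; the function names entropy, info_gain, and build_tree are arbitrary choices (not from any library), and tie handling and edge cases are deliberately simplified.

```python
from collections import Counter
from math import log2

def entropy(rows, target="Play Golf"):
    """H(S): entropy of the class-label distribution in rows."""
    counts = Counter(r[target] for r in rows)
    total = len(rows)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def info_gain(rows, attr, target="Play Golf"):
    """IG(S, attr): H(S) minus the weighted entropy of the subsets split on attr."""
    total = len(rows)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == value]
        remainder += len(subset) / total * entropy(subset, target)
    return entropy(rows, target) - remainder

def build_tree(rows, attributes, target="Play Golf"):
    """Return a leaf label, or a nested dict {attribute: {value: subtree}}."""
    labels = {r[target] for r in rows}
    if len(labels) == 1:          # all examples share one label: leaf node
        return labels.pop()
    if not attributes:            # no attributes left: majority-vote leaf
        return Counter(r[target] for r in rows).most_common(1)[0][0]
    best = max(attributes, key=lambda a: info_gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    tree = {best: {}}
    for value in {r[best] for r in rows}:
        subset = [r for r in rows if r[best] == value]
        tree[best][value] = build_tree(subset, remaining, target)
    return tree
```

Applied to the 14-day table above, with the rows represented as dictionaries keyed by the column names, this sketch would place Outlook at the root, matching the information-gain calculation carried out below.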
Now, let's go ahead and grow the decision tree. The initial step is to calculate H(S), the Entropy
of the current state. In the above example, we can see in total there are 5 No’s and 9 Yes’s.

Yes No Total
9 5 14

Remember that the entropy is 0 if all members belong to the same class, and 1 when half of them belong to one class and the other half to the other class, i.e. perfect randomness. Here,

H(S) = −(9/14) · log2(9/14) − (5/14) · log2(5/14) ≈ 0.94,

which means the distribution is fairly random. Now, the next step is to choose the attribute that gives us the highest possible information gain, and that attribute becomes the root node. Let's start with 'Wind'.

To do this we compute

IG(S, Wind) = H(S) − Σ_x P(x) · H(S_x)

where x ranges over the possible values of the attribute. Here, the attribute 'Wind' takes two possible values in the sample data, hence x = {Weak, Strong}, so we'll have to calculate H(S_weak) and H(S_strong).

Amongst all 14 examples we have 8 places where the wind is Weak and 6 where the wind is Strong.

Wind = Weak Wind = Strong Total


8 6 14

Now, out of the 8 Weak examples, 6 of them were 'Yes' for Play Golf and 2 of them were 'No'. So we have

H(S_weak) = −(6/8) · log2(6/8) − (2/8) · log2(2/8) ≈ 0.811.

Similarly, out of the 6 Strong examples, we have 3 examples where the outcome was 'Yes' for Play Golf and 3 where we had 'No':

H(S_strong) = −(3/6) · log2(3/6) − (3/6) · log2(3/6) = 1.0.

Remember, here half the items belong to one class while the other half belong to the other, so we have perfect randomness. Now we have all the pieces required to calculate the information gain:

IG(S, Wind) = H(S) − (8/14) · H(S_weak) − (6/14) · H(S_strong) = 0.94 − (8/14) · 0.811 − (6/14) · 1.0 ≈ 0.048.

This tells us the information gain obtained by considering 'Wind' as the feature, which is 0.048. Now we must similarly calculate the information gain for all the features.
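As a quick check, a few lines of Python reproduce these numbers directly from the class counts; the helper name H below is an illustrative choice, and the same pattern applies to the other attributes.

```python
from math import log2

def H(*counts):
    """Entropy of a label distribution given the per-class counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)

H_S = H(9, 5)                         # 9 Yes, 5 No overall -> about 0.94
H_weak, H_strong = H(6, 2), H(3, 3)   # about 0.811 and exactly 1.0
ig_wind = H_S - (8 / 14) * H_weak - (6 / 14) * H_strong
print(round(H_S, 3), round(ig_wind, 3))   # 0.94 0.048
```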

Doing so for this dataset gives IG(S, Outlook) ≈ 0.246, IG(S, Temperature) ≈ 0.029, IG(S, Humidity) ≈ 0.152, and IG(S, Wind) ≈ 0.048. We can clearly see that IG(S, Outlook) has the highest information gain of 0.246, hence we choose the Outlook attribute as the root node. At this point, the decision tree has Outlook at the root with three branches: Sunny, Overcast, and Rain.

Here we observe that whenever the outlook is Overcast, Play Golf is always 'Yes'. This is no coincidence; the simple subtree results from the fact that the attribute Outlook gives the highest information gain. Now how do we proceed from this point? We simply apply recursion; you might want to look back at the algorithm steps described earlier. Now that we've used Outlook, three attributes remain: Humidity, Temperature, and Wind. We had three possible values of Outlook: Sunny, Overcast, and Rain. The Overcast branch already ends in the leaf node 'Yes', so we're left with two subtrees to compute: Sunny and Rain.

The table where the value of Outlook is Sunny looks like this:

Temperature Humidity Wind Play Golf


Hot High Weak No
Hot High Strong No
Mild High Weak No
Cool Normal Weak Yes
Mild Normal Strong Yes

In a similar fashion, we compute the information gain of Temperature, Humidity, and Wind within this Sunny subset. Splitting on Humidity separates the 'Yes' and 'No' rows perfectly, so the highest information gain here is given by Humidity. Proceeding in the same way with the table where the value of Outlook is Rain will give us Wind as the attribute with the highest information gain. The final decision tree therefore looks like this: Outlook is the root node; the Sunny branch splits on Humidity (High → No, Normal → Yes); the Overcast branch is the leaf 'Yes'; and the Rain branch splits on Wind (Weak → Yes, Strong → No).
