
Introduction to Artificial Intelligence

K-nearest Neighbor Algorithm


Eager learning.
Instance-based learning.
Advantages and disadvantages of Instance-based learning.
K-nearest neighbor algorithm.
Categorical Value and Continuous Value.
Similarity metrics.
Select K-nearest neighbor points.
Eager learning

• Eager learning is the most common machine-learning approach.
• It is a learning method in which the system tries to construct a general, input-independent target function during training.
• Training phase: the model of the system is built during this phase, aiming to find an estimated function f(x) by studying the whole training set <xᵢ, f(xᵢ)>, where xᵢ holds the attribute features and f(xᵢ) is the real category or classification.
• Testing phase: using the estimated function, all data elements in the test set are classified.

[Figure: scatter plot of baby birth weight vs. mother weight.]
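A minimal sketch of the idea in Python, using made-up mother-weight and birth-weight numbers (the quantities on the figure's axes): a single global model is fit once during training, and the testing phase only evaluates it.

```python
import numpy as np

# Training set <x_i, f(x_i)>: mother weight (kg) -> baby birth weight (kg).
# The numbers are invented for illustration.
X_train = np.array([55.0, 60.0, 70.0, 80.0, 90.0])
y_train = np.array([2.8, 3.0, 3.3, 3.6, 3.9])

# Training phase: build one general, input-independent model f_hat(x) = w*x + b.
w, b = np.polyfit(X_train, y_train, deg=1)

# Testing phase: every query reuses the same fitted function.
def f_hat(x):
    return w * x + b

print(f_hat(65.0))  # predicted birth weight for a 65 kg mother
```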
Instance-based Learning

• Instance-based learning is a supervised machine-learning method that builds the target function f(x) to classify new data elements.
• It compares the stored data points with the point whose class or value needs to be predicted.
• Training phase: all training examples <xᵢ, f(xᵢ)> are simply stored, where xᵢ holds the attribute features and f(xᵢ) is the real category or classification.
• Testing phase:
  • A sample or data element x_q is queried, e.g., [the mother's weight].
  • The nearest (most similar) elements to x_q in the training set are identified.
  • The target function f(x_q) is then estimated from them.

[Figure: scatter plot of baby birth weight vs. mother weight.]
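For contrast with the eager sketch above, a minimal lazy-learning sketch on the same kind of made-up data: training just stores the examples, and all the work happens at query time.

```python
# "Training phase": just store the pairs <x_i, f(x_i)>
# (mother weight -> fetal health label; values are invented).
training_set = [(55.0, "Healthy"), (60.0, "Healthy"),
                (80.0, "Not Healthy"), (90.0, "Not Healthy")]

def classify(x_q):
    # "Testing phase": find the stored example most similar to the query x_q
    nearest = min(training_set, key=lambda pair: abs(pair[0] - x_q))
    return nearest[1]  # return its known classification f(x_i)

print(classify(65.0))  # -> "Healthy" (the nearest stored point is 60.0)
```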
K-nearest Neighbor

K-nearest Neighbor:
The most basic instance-based method is the k-NEAREST NEIGHBOR algorithm.
This algorithm assumes all instances correspond to points in the n-dimensional attribute space.
• Example: if the goal of the algorithm is to classify data with only two attributes (x, y), and each element of the data belongs to one of two classes, ⬣ or ◼, then:
• The data are represented in a 2D space.
• The similarity is calculated between the point we want to classify and all the points in the training set.
• The most similar points are selected, and the most common classification among them is chosen.
Categorical Value

• The nearest neighbor algorithm is used to classify data into categorical or continuous values.
• When classifying data into one of a set of discrete categories, such as the student's result {successful, failing}, the algorithm does the following (see the sketch after this list):
1. Take the data element or instance x to be classified.
2. Find the nearest neighbor points to x in the training data; the number of neighbors k is an integer greater than zero, for example k = 5.
3. Determine the most common category C among the neighbor points.
4. Classify x with classification C.
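A sketch of the four steps above, assuming numeric attribute vectors and the Euclidean distance defined on a later slide; the student data in the example call are invented.

```python
import math
from collections import Counter

def euclidean(xi, xj):
    # distance between two attribute vectors (defined formally on a later slide)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))

def knn_classify(x, training_data, k=5):
    # training_data is a list of (attributes, category) pairs.
    # Step 2: find the k nearest neighbor points to x.
    neighbors = sorted(training_data, key=lambda pair: euclidean(pair[0], x))[:k]
    # Step 3: the most common category C among the neighbors.
    c = Counter(category for _, category in neighbors).most_common(1)[0][0]
    # Step 4: classify x as C.
    return c

# Invented example: student results {successful, failing} from two numeric features.
data = [((3.2, 80), "successful"), ((3.5, 90), "successful"),
        ((1.8, 40), "failing"), ((2.0, 55), "failing"), ((3.0, 75), "successful")]
print(knn_classify((2.9, 70), data, k=3))  # -> "successful"
```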
Continuous Value

• When classifying into a continuous or numeric value within a specific range, such as a student's grade average = 3.54, the algorithm does the following (a sketch follows the list):
1. Take the data element or instance x to be classified.
2. Find the nearest points to x in the training data.
3. Calculate the average value of the neighbor points.
4. Give x the mean value that was calculated.
• Choosing the similarity measure and the number of neighbor points is one of the most important factors for the success of the nearest neighbor algorithm.
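The continuous-value variant differs only in the last step: the neighbors' values are averaged instead of voted on. Again a sketch with invented data.

```python
import math

def knn_regress(x, training_data, k=5):
    # training_data is a list of (attributes, numeric value) pairs.
    dist = lambda xi: math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, x)))
    # Steps 1-2: find the k nearest points to x.
    neighbors = sorted(training_data, key=lambda pair: dist(pair[0]))[:k]
    # Steps 3-4: give x the average value of the neighbor points.
    vals = [value for _, value in neighbors]
    return sum(vals) / len(vals)

# Invented example: predict a student's grade average from two features.
data = [((1, 2), 3.1), ((2, 1), 3.4), ((8, 9), 2.0), ((9, 8), 1.8), ((2, 2), 3.6)]
print(knn_regress((1, 1), data, k=3))  # mean of the 3 nearest values: 3.3666...
```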
Similarity measure - Numerical Data

• The similarity between points with numeric attributes is computed with the standard Euclidean distance: the greater the distance, the lower the similarity between the items being compared.
• For example:
If data elements are described by a number of attributes:
x = <a₁, a₂, …, aₙ>
where a₁ is the first attribute of the data element, then to calculate the similarity or distance d between any two elements xᵢ and xⱼ we can use the Euclidean distance function:
xᵢ = <a₁, a₂, …, aₙ>
xⱼ = <b₁, b₂, …, bₙ>
d(xᵢ, xⱼ) = √((a₁ − b₁)² + (a₂ − b₂)² + ⋯ + (aₙ − bₙ)²)
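The formula translates directly to code; the check below uses the first training/query pair from the exercise on the next slide.

```python
import numpy as np

xi = np.array([25.0, 67.6])  # training point (mother age, mother weight)
xj = np.array([23.0, 60.0])  # query point
d = np.sqrt(np.sum((xi - xj) ** 2))  # sqrt((a1-b1)^2 + ... + (an-bn)^2)
print(round(float(d), 2))            # 7.86, identical to np.linalg.norm(xi - xj)
```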
Exercise 1: Classification of Data into Categories

Training group:

Mother Age   Mother Weight   Fetal Health
25           67.6            Healthy
33           88.9            Healthy
48           90              Not Healthy
60           50              Not Healthy
42           56              Healthy

Testing group:

Mother Age   Mother Weight   Fetal Health
23           60              ?

d(xᵢ, xⱼ) = √((a₁ − b₁)² + (a₂ − b₂)² + ⋯ + (aₙ − bₙ)²)

d(x₁, x_q) = √((25 − 23)² + (67.6 − 60)²) = 7.86
d(x₂, x_q) = √((33 − 23)² + (88.9 − 60)²) = 30.58
d(x₃, x_q) = √((48 − 23)² + (90 − 60)²) = 39.05
d(x₄, x_q) = √((60 − 23)² + (50 − 60)²) = 38.33
d(x₅, x_q) = √((42 − 23)² + (56 − 60)²) = 19.42
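The whole exercise can be reproduced in a few lines. The k = 3 majority vote at the end is an added illustration; the slide itself stops at the distances.

```python
import math
from collections import Counter

training = [((25, 67.6), "Healthy"),
            ((33, 88.9), "Healthy"),
            ((48, 90.0), "Not Healthy"),
            ((60, 50.0), "Not Healthy"),
            ((42, 56.0), "Healthy")]
x_q = (23, 60)  # the testing-group element (mother age, mother weight)

dist = lambda p: math.sqrt(sum((a - b) ** 2 for a, b in zip(p, x_q)))
for attrs, label in sorted(training, key=lambda pair: dist(pair[0])):
    print(attrs, label, round(dist(attrs), 2))  # same distances as above

k = 3
ranked = sorted(training, key=lambda pair: dist(pair[0]))
vote = Counter(label for _, label in ranked[:k]).most_common(1)[0][0]
print("k = 3 classification:", vote)  # the 3 nearest are all Healthy -> "Healthy"
```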
Similarity measure - Categorical Data

• To calculate the similarity between categorical attributes, the nominal attributes are converted to binary attributes (0, 1) using one of the transformation functions.
• For example, to calculate the distance between two elements xᵢ and xⱼ, we must calculate the distance between the attributes of the elements using the following function:
xᵢ = <a₁, a₂, …, aₙ>
xⱼ = <b₁, b₂, …, bₙ>
d_a₁(xᵢ, xⱼ) = 0 if a₁ = b₁, and 1 if a₁ ≠ b₁
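A sketch of this 0/1 attribute distance, summed over all attributes (a Hamming-style distance); the attribute values in the example are made up.

```python
def categorical_distance(xi, xj):
    # Each attribute contributes 0 if the values match and 1 if they differ.
    return sum(0 if a == b else 1 for a, b in zip(xi, xj))

# One of two attributes differs, so the distance is 1.
print(categorical_distance(("successful", "full-time"),
                           ("failing", "full-time")))  # -> 1
```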
Choose The Number of K-nearest neighbor points

• Suppose we want to classify a new element x_q using the nearest neighbor algorithm (which returns the most common class among the selected training elements), with the number of neighboring elements set to k = 1 or k = 5.

We notice here that:
Using the simple nearest neighbor algorithm with k = 1 (1-NN), the instance is classified as ⬣, but with k = 5 (5-NN) it is classified as ◼ (the sketch below reproduces this effect).
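A sketch reproducing the effect with made-up 2D points, where class A plays the role of ⬣ and class B the other class: the same query flips its label between k = 1 and k = 5.

```python
import math
from collections import Counter

def knn_classify(x, training_data, k):
    dist = lambda xi: math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, x)))
    neighbors = sorted(training_data, key=lambda pair: dist(pair[0]))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

# One class-A point sits closest to the query; four class-B points surround it.
data = [((0.1, 0.0), "A"), ((1.0, 0.0), "B"), ((0.0, 1.0), "B"),
        ((-1.0, 0.0), "B"), ((1.0, 1.0), "B")]
x_q = (0.0, 0.0)

print(knn_classify(x_q, data, k=1))  # "A": only the single nearest point decides
print(knn_classify(x_q, data, k=5))  # "B": the majority of the 5 neighbors wins
```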
Choose The Number of K-nearest neighbor points

What is the appropriate number of K-nearest neighbor points in this example?

We notice here that:
1. If k = 1 is chosen, the algorithm classifies the point based only on the single nearest point, regardless of the classification of the other similar points.
2. If k = 3 or k = 5 is chosen, the most common classification among the neighboring points is chosen.
• Usually it is better not to choose an even number, such as 2 or 4, because the algorithm cannot make a decision if there is a tie.
Thank You
