Lec08 Classification KNN ANN

The document discusses two classification methods in machine learning: Instance-Based Learning (kNN) and Artificial Neural Networks (ANNs). kNN classifies instances based on the majority class of their nearest neighbors, while ANNs utilize interconnected nodes to learn complex functions through weight adjustments. Key concepts include the lazy learning approach of kNN and the perceptron model in ANNs, highlighting their respective advantages and limitations.


Other Classification Methods

• Instance-Based Learning (kNN)


• Artificial Neural Networks

Instance-Based Learning (kNN)

Instance-Based Learning
• Instance-based learning methods simply store the training examples instead of
learning an explicit description of the target function.
– Generalizing from the examples is postponed until a new instance must be classified.
– When a new instance is encountered, its relationship to the stored examples is examined in
order to assign a target function value to the new instance.
• One such instance-based learning method is the k-nearest neighbor (kNN) method.
• Instance-based methods are referred to as lazy learning methods because they delay
processing until a new instance must be classified.
– Eager methods (decision tree, neural networks, …) generalize the training set to learn a
function.
• A key advantage of lazy learning is that instead of estimating the target function once
for the entire instance space, these methods can estimate it locally and differently for
each new instance to be classified.

k-Nearest Neighbor Learning (Classification)
• The k-Nearest Neighbor learning algorithm can be used to predict the values of
continuous-valued functions in addition to predicting the class values of discrete-valued
functions (classification).
• k-Nearest Neighbor Learning algorithm assumes all instances correspond to points
in the n-dimensional space Rn
• The nearest neighbors of an instance are defined in terms of Euclidean distance.
• The Euclidean distance between the instances xi = <xi1, …, xin> and
xj = <xj1, …, xjn> is:

  d(xi, xj) = sqrt( Σ_{r=1}^{n} (xir - xjr)² )

• For a given instance xq, Class(xq) is computed using the class values of k-nearest
neighbors of xq

k-Nearest Neighbor Classification
• Store all training examples <xi,Class(xi)>
• Calculate Class(xq) for a given instance xq using its k-nearest neighbors.
• Nearest neighbor: (k=1)
– Locate the nearest training example xn, and estimate Class(xq) as Class(xn).
• k-Nearest neighbor:
– Locate the k nearest training examples, and estimate Class(xq) using a majority vote among the
class values of the k nearest neighbors.
• The test example is classified based on the majority class of its nearest neighbors:

  Class(xq) = argmax_v Σ_{i=1}^{k} I(v = yi)

where v is a class label, yi is the class label of one of the k nearest neighbors, and I(·) is an
indicator function that returns the value 1 if its argument is true and 0 otherwise.
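The rule above maps directly onto a few lines of code. Below is a minimal Python sketch of majority-vote kNN (the helper names euclidean and knn_classify are invented for this illustration); it uses the training set from the worked example on the next slide.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length attribute vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_classify(training, xq, k=3):
    """Majority-vote kNN: training is a list of (attributes, class) pairs."""
    neighbors = sorted(training, key=lambda ex: euclidean(ex[0], xq))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Training set from the worked example on the next slide
training = [((1, 1), "no"), ((2, 1), "no"), ((3, 2), "yes"),
            ((7, 7), "yes"), ((8, 8), "yes")]
print(knn_classify(training, (3, 3), k=3))   # -> no
```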

k-Nearest Neighbor Classification - Example

3-Nearest Neighbor classification of instance <3,3>

Training examples and their squared Euclidean distances to <3,3>:

A   B   Class   Squared distance to <3,3>
1   1   no      8
2   1   no      5
3   2   yes     1
7   7   yes     32
8   8   yes     50

• The first three examples are the 3 nearest neighbors of instance <3,3>.
• Two of them are no and one of them is yes.
• Since the majority class among its neighbors is no, instance <3,3> is classified as no.

Distance Weighted kNN Classification
• In the majority-voting approach, every neighbor has the same impact on the
classification.
• We can instead weight the influence of each nearest neighbor xi according to its distance to
the instance xq.

• Using the distance-weighted voting scheme, the class label can be determined as:

  Class(xq) = argmax_v Σ_{i=1}^{k} wi · I(v = yi),   where wi = 1 / d(xq, xi)²

Distance Weighted kNN Classification- Example

Distance-weighted 3-Nearest Neighbor classification of instance <3,3>

Training examples and their squared Euclidean distances to <3,3>:

A   B   Class   Squared distance to <3,3>
1   1   no      8
2   1   no      5
3   2   yes     1
7   7   yes     32
8   8   yes     50

• The first three examples are the 3 nearest neighbors of instance <3,3>.
• Each neighbor is weighted by the reciprocal of its squared distance:
  Weight of no = 1/8 + 1/5 = 13/40      Weight of yes = 1/1 = 1
• Since 1 > 13/40, instance <3,3> is classified as yes.
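A minimal sketch of the distance-weighted variant, assuming the weight wi = 1/d(xq, xi)² used in the example above (the helper name weighted_knn_classify is invented for this illustration):

```python
from collections import defaultdict

def weighted_knn_classify(training, xq, k=3):
    """Distance-weighted kNN with weights w_i = 1 / d(x_q, x_i)^2."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    neighbors = sorted(training, key=lambda ex: sq_dist(ex[0], xq))[:k]
    votes = defaultdict(float)
    for attrs, label in neighbors:
        votes[label] += 1.0 / sq_dist(attrs, xq)   # assumes xq is not a stored point
    return max(votes, key=votes.get)

training = [((1, 1), "no"), ((2, 1), "no"), ((3, 2), "yes"),
            ((7, 7), "yes"), ((8, 8), "yes")]
print(weighted_knn_classify(training, (3, 3)))   # no: 1/8 + 1/5, yes: 1/1 -> yes
```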

k-Nearest Neighbor Algorithm
Consider the following set of training examples:

• Using 3-nearest neighbor algorithm, find the target classification for the instance
<A=5,B=5>. Show your work.
• Using distance weighted 3-nearest neighbor algorithm, find the target classification
for the instance <A=5,B=5>. Show your work.

k-Nearest Neighbor Algorithm

Distance of <5,5>
sqrt(32)
sqrt(10) * nearest
sqrt(5) * neighbors
sqrt(1) *
sqrt(32)

3-nearest neighbor: 2 yes, 1 no → YES

Weighted 3-nearest neighbor:

Weight of yes: 1/10 + 1/5 = 3/10
Weight of no:  1/1 = 1

Since 1 > 3/10 → NO


k-Nearest Neighbor Classification - Issues

• Choosing the value of k:
– If k is too small, the classifier is sensitive to noise points.
– If k is too large, the neighborhood may include points from other classes.

• Scaling issues:
– Attributes may have to be scaled to prevent
distance measures from being dominated by
one of the attributes
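One common remedy, shown as a sketch below, is min-max scaling of each attribute to [0, 1] before computing distances (a design choice, not the only option; z-score standardization is another):

```python
def min_max_scale(rows):
    """Rescale each attribute to [0, 1] so no attribute dominates the distance."""
    cols = list(zip(*rows))
    lows  = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [tuple((v - lo) / (hi - lo) if hi > lo else 0.0
                  for v, lo, hi in zip(row, lows, highs))
            for row in rows]

# e.g. an income attribute in the thousands would otherwise swamp an age attribute
print(min_max_scale([(25, 40000), (52, 93000), (31, 57000)]))
```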

Artificial Neural Networks

Artificial Neural Networks
• Artificial neural networks (ANNs) provide a general, practical method for learning
real-valued, discrete-valued, and vector-valued functions from examples.
• Algorithms such as BACKPROPAGATION use gradient descent to tune network
parameters to best fit a training set of input-output pairs.
• The study of artificial neural networks (ANNs) has been inspired in part by the
observation that biological learning systems are built of very complex webs of
interconnected neurons.
• Artificial neural networks are built out of a densely interconnected set of simple units,
where each unit takes a number of real-valued inputs (possibly the outputs of other
units) and produces a single real-valued output (which may become the input to many
other units).

Properties of Artificial Neural Networks
• A large number of very simple, neuron-like processing elements called units,
• A large number of weighted, directed connections between pairs of units
– Weights may be positive or negative real values
• Local processing in that each unit computes a function based on the outputs of a
limited number of other units in the network
• Each unit computes a simple function of its input values, which are the weighted
outputs from other units.
– If there are n inputs to a unit, then the unit's output, or activation, is defined by
a = g((w1 * x1) + (w2 * x2) + ... + (wn * xn)).
– Each unit computes a (simple) function g of the linear combination of its inputs.
• Learning by tuning the connection weights

Artificial Neural Networks (ANN)

• The model is an assembly of inter-connected nodes and weighted links.
• The output node sums its input values according to the weights of its links.
• The weighted sum is compared against some threshold t.

Perceptron

[Perceptron diagram: inputs x0 = 1, x1, …, xn enter with weights w0, w1, …, wn; a summation unit computes Σ_{i=0}^{n} wi·xi, and the result is thresholded to produce the output o]

o(x1, …, xn) = 1 if Σ_{i=0}^{n} wi·xi > 0, and -1 otherwise

Perceptron
• Perceptron is a Linear Threshold Unit (LTU).
• A perceptron takes a vector of real-valued inputs, calculates a linear combination of
these inputs, then outputs 1 if the result is greater than some threshold and -1
otherwise.
• Given inputs x1 through xn, the output o(x1, …, xn) computed by the perceptron is:

  o(x1, …, xn) = 1 if w0 + w1·x1 + w2·x2 + … + wn·xn > 0, and -1 otherwise

where each wi is a real-valued constant, or weight, that determines the contribution of input
xi to the perceptron output.
• The quantity (-w0) is a threshold that the weighted combination of inputs must
surpass in order for the perceptron to output 1.
– To simplify notation, we imagine an additional constant input x0 = 1
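A minimal sketch of this linear threshold unit in Python (the function name perceptron_output is invented here); it uses the 1/-1 output convention of this slide, with the OR weights given later in the slides as a quick check:

```python
def perceptron_output(weights, inputs):
    """Linear threshold unit: weights = [w0, w1, ..., wn], inputs = [x1, ..., xn].
    An implicit constant input x0 = 1 pairs with the bias weight w0."""
    s = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1 if s > 0 else -1

# OR with the weights from the later example (w0 = -0.5, w1 = 0.7, w2 = 0.7);
# output -1 plays the role of the "0" class used on that slide.
for x, y in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, y, perceptron_output([-0.5, 0.7, 0.7], [x, y]))
```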

Perceptron

• Learning a perceptron involves choosing values for weights w0, …,wn.


• A perceptron represents a hyperplane decision surface in the
n-dimensional space of instances.
• The perceptron outputs 1 for instances lying on one side of the hyperplane
and outputs -1 for instances lying on the other side.
• Some sets of positive and negative examples cannot be separated by any
hyperplane.
– Those that can be separated are called linearly separable sets of
examples.
• A single perceptron can be used to represent many boolean functions.
– AND, OR, NAND, NOR are representable by a perceptron
– XOR is not representable by a single perceptron.

Representational Power of Perceptrons

[Figure: two scatter plots in the (x1, x2) plane. Left: positive and negative examples separable by a line — representable by a perceptron. Right: positive and negative examples arranged diagonally (XOR-like) — NOT representable by a perceptron.]

Perceptron - Example

Give a perceptron to represent the OR function.

X   Y   Out
0   0   0
0   1   1
1   0   1
1   1   1

Perceptron - Example

Give a perceptron to represent the OR function.

X   Y   Out
0   0   0
0   1   1
1   0   1
1   1   1

[Perceptron: a constant input 1 with weight w0, input X with weight w1, and input Y with weight w2 feed the output Out]

Out is 1 if w0 + w1*X + w2*Y > 0; 0 otherwise.

What will be the weights?

Perceptron - Example
Give a perceptron to represent the OR function.

X   Y   Out
0   0   0
0   1   1
1   0   1
1   1   1

[Perceptron: a constant input 1 with weight w0, input X with weight w1, and input Y with weight w2 feed the output Out]

Out is 1 if w0 + w1*X + w2*Y > 0; 0 otherwise.

w0 = -0.5    w1 = 0.7    w2 = 0.7
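A quick check (a sketch only) that these weights reproduce the OR truth table under the 0/1 output convention of this slide:

```python
def ltu(w0, w1, w2, x, y):
    return 1 if w0 + w1 * x + w2 * y > 0 else 0

# Every row of the OR truth table is reproduced by w0 = -0.5, w1 = 0.7, w2 = 0.7
for (x, y), expected in {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}.items():
    assert ltu(-0.5, 0.7, 0.7, x, y) == expected
```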

Perceptron - Example
Give a linear threshold unit (a perceptron) that implements
the following function by giving its weight values.

Perceptron - Example
Give a linear threshold unit (a perceptron) that implements
the following function by giving its weight values.

[Perceptron: a constant input 1 with weight w0, input x with weight w1, and input y with weight w2 feed the output Out]

Out is 1 if w0 + w1*x + w2*y > 0; 0 otherwise.


What will be the weights?

Perceptron - Example
Give a linear threshold unit (a perceptron) that implements
the following function by giving its weight values.

[Perceptron: a constant input 1 with weight w0, input x with weight w1, and input y with weight w2 feed the output Out]

Out is 1 if w0 + w1*x + w2*y > 0; 0 otherwise.

w0 = 1    w1 = -0.6    w2 = -0.6

Perceptron - Example
Give a linear threshold unit (a perceptron) that implements
the following function by giving its weight values.

Perceptron - Example
Give a linear threshold unit (a perceptron) that implements
the following function by giving its weight values.

[Perceptron: a constant input 1 with weight w0 and inputs x, y, z with weights w1, w2, w3 feed the output Out]

Out is 1 if w0 + w1*x + w2*y + w3*z > 0; 0 otherwise.


What will be the weights?

Perceptron - Example
Give a linear threshold unit (a perceptron) that implements
the following function by giving its weight values.

[Perceptron: a constant input 1 with weight w0 and inputs x, y, z with weights w1, w2, w3 feed the output Out]

Out is 1 if w0 + w1*x + w2*y + w3*z > 0; 0 otherwise.

w0 = -0.5    w1 = 0.6    w2 = 0.3    w3 = 0.3
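The target function itself appeared only as a figure and is not reproduced here; the sketch below simply enumerates what the given weights compute over 0/1 inputs so the result can be checked against the intended truth table (the helper name ltu3 is invented):

```python
from itertools import product

def ltu3(w0, w1, w2, w3, x, y, z):
    return 1 if w0 + w1 * x + w2 * y + w3 * z > 0 else 0

# Enumerate the boolean function implemented by w0=-0.5, w1=0.6, w2=0.3, w3=0.3
for x, y, z in product((0, 1), repeat=3):
    print(x, y, z, "->", ltu3(-0.5, 0.6, 0.3, 0.3, x, y, z))
```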

Perceptron Training Rule
• One way to learn an acceptable weight vector is to begin with random weights, then iteratively
apply the perceptron to each training example, modifying the perceptron weights
whenever it misclassifies an example.
– If a training example is classified correctly, the weights are not updated.

• This process is repeated, iterating through the training examples as many times as
needed until the perceptron classifies all training examples correctly.
– Each pass through all of the training examples is called one epoch

• Weights are modified at each step according to perceptron training rule

Perceptron Training Rule
• Weights are modified at each step according to the perceptron training rule:

  wi ← wi + Δwi
  Δwi = η (t - o) xi

  where t is the target value, o is the perceptron output, and η is a small constant (e.g. 0.1)
  called the learning rate.

• If the output is correct (t = o), the weights wi are not changed.

• If the output is incorrect (t ≠ o), the weights wi are changed such that
the output of the perceptron for the new weights is closer to t.
• The algorithm converges to the correct classification
– if the training data is linearly separable,
– and η is sufficiently small.
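A minimal sketch of the training rule (function names invented for this illustration); it assumes targets in {1, -1} and updates weights only on misclassified examples, stopping after an epoch with no mistakes:

```python
import random

def train_perceptron(examples, eta=0.1, max_epochs=100):
    """Perceptron training rule: examples are ([x1, ..., xn], target) pairs with
    targets in {1, -1}. Returns weights [w0, w1, ..., wn] (w0 is the bias)."""
    n = len(examples[0][0])
    w = [random.uniform(-0.05, 0.05) for _ in range(n + 1)]
    for _ in range(max_epochs):                      # one pass = one epoch
        misclassified = False
        for x, t in examples:
            xs = [1.0] + list(x)                     # constant input x0 = 1
            o = 1 if sum(wi * xi for wi, xi in zip(w, xs)) > 0 else -1
            if o != t:                               # update only on mistakes
                misclassified = True
                w = [wi + eta * (t - o) * xi for wi, xi in zip(w, xs)]
        if not misclassified:                        # all examples classified correctly
            break
    return w

# OR function: linearly separable, so the rule converges
examples = [([0, 0], -1), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
print(train_perceptron(examples))
```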

General Structure of ANN

Multi-Layer Networks
• Single perceptron can only express linear decision surfaces.
• Multilayer networks are capable of expressing a rich variety of nonlinear decision
surfaces.

[Figure: a feed-forward network with an input layer, a hidden layer, and an output layer]

Multi-Layer Networks with Linear Units
Ex. XOR
• Multiple layers of cascaded linear units still produce only linear functions.

[Two-layer network for XOR: the inputs x1 and x2 feed an OR unit and an AND unit; the outputs of these two units feed a final threshold unit.]

OR unit:     0.5*x1 + 0.5*x2 - 0.25 > 0      (w0 = -0.25, w1 = 0.5, w2 = 0.5)
AND unit:    0.5*x1 + 0.5*x2 - 0.75 > 0      (w0 = -0.75, w1 = 0.5, w2 = 0.5)
XOR output:  0.5*OR - 0.5*AND - 0.25 > 0     (w0 = -0.25, weight 0.5 on the OR output, -0.5 on the AND output)
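A sketch of this two-layer XOR network with the weights above, using 0/1 step units (the function names are invented for this illustration):

```python
def step(s):
    return 1 if s > 0 else 0

def xor_net(x1, x2):
    or_out  = step(0.5 * x1 + 0.5 * x2 - 0.25)          # OR unit
    and_out = step(0.5 * x1 + 0.5 * x2 - 0.75)          # AND unit
    return step(0.5 * or_out - 0.5 * and_out - 0.25)    # OR and not AND = XOR

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_net(x1, x2))   # 0 0->0, 0 1->1, 1 0->1, 1 1->0
```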
Multi-Layer Networks with Linear Units - Example
• Give an artificial neural network for the following function. Make clear the structure
of your ANN and the used weights.

Multi-Layer Networks with Linear Units - Example
Give an artificial neural network for the following function. Make clear the structure of
your ANN and the used weights.

Multi-Layer Networks with Linear Units - Example

Unit1: w10 = -1     w11 = 0.4    w12 = 0.4
Unit2: w20 = 1      w21 = -0.3   w22 = -0.3
Unit3: w30 = -0.6   w31 = 0.5    w32 = 0.5   (AND function)

x   y   unit1   unit2   unit3   Classification
1   1   0       1       0       0
1   2   1       1       1       1
2   1   1       1       1       1
2   2   1       0       0       0
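A small sketch that recomputes the table above from the unit weights, assuming each unit is a 0/1 threshold unit:

```python
def step(s):
    return 1 if s > 0 else 0

def network(x, y):
    u1 = step(-1.0 + 0.4 * x + 0.4 * y)          # Unit1
    u2 = step( 1.0 - 0.3 * x - 0.3 * y)          # Unit2
    u3 = step(-0.6 + 0.5 * u1 + 0.5 * u2)        # Unit3 (AND of u1 and u2)
    return u1, u2, u3

for x, y in [(1, 1), (1, 2), (2, 1), (2, 2)]:
    print(x, y, network(x, y))   # matches the rows of the table
```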
Multi-Layer Networks with Non-Linear Units
• Multiple layers of cascaded linear units still produce only linear functions.

• We prefer networks capable of representing highly nonlinear functions.

• What we need is a unit whose output is a nonlinear function of its inputs, but whose
output is also a differentiable function of its inputs.

• One solution is the sigmoid unit, a unit very much like a perceptron, but based on a
smoothed, differentiable threshold function.
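A minimal sketch of such a sigmoid unit (the function name sigmoid_unit is invented); its output is a smooth, differentiable function of the weighted input sum:

```python
import math

def sigmoid_unit(weights, inputs):
    """Like a linear threshold unit, but the hard threshold is replaced by the
    smooth logistic function; weights = [w0, w1, ..., wn], inputs = [x1, ..., xn]."""
    net = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1.0 / (1.0 + math.exp(-net))   # output lies in (0, 1)

print(sigmoid_unit([-0.5, 0.7, 0.7], [1, 1]))   # about 0.71
```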

Algorithm for Learning ANN
• Initialize the weights (w0, w1, …, wk)
• Adjust the weights in such a way that the output of ANN is consistent with class
labels of training examples

– Objective function:

  E = Σ_i [ Yi - f(wi, Xi) ]²

– Find the weights wi’s that minimize the above objective function using
backpropagation algorithm
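Full backpropagation for multi-layer networks is beyond this slide, but the sketch below shows gradient descent on the squared-error objective for a single sigmoid unit, which is the building block backpropagation generalizes (all names here are invented for the illustration):

```python
import math, random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def train_sigmoid_unit(examples, eta=0.5, epochs=1000):
    """Gradient descent on E = sum_i (y_i - o_i)^2 for one sigmoid unit.
    examples are ([x1, ..., xn], target) pairs with targets in [0, 1]."""
    n = len(examples[0][0])
    w = [random.uniform(-0.05, 0.05) for _ in range(n + 1)]
    for _ in range(epochs):
        for x, y in examples:
            xs = [1.0] + list(x)                    # constant input x0 = 1
            o = sigmoid(sum(wi * xi for wi, xi in zip(w, xs)))
            delta = (y - o) * o * (1 - o)           # gradient of the error term w.r.t. net input
            w = [wi + eta * delta * xi for wi, xi in zip(w, xs)]
    return w

# Learn the OR function with a single sigmoid unit
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w = train_sigmoid_unit(data)
print([round(sigmoid(w[0] + w[1] * a + w[2] * b), 2)
       for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])
```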

Units of ANN
• Perceptron (Linear Threshold Unit)
• Linear Unit produces continuous output o (not just –1,1)
o = w0 + w1 x1 + … + w n xn

• Sigmoid Unit

