Lec08 Classification KNN ANN
Instance-Based Learning (kNN)
Instance-Based Learning
• Instance-based learning methods simply store the training examples instead of
learning an explicit description of the target function.
– Generalizing the examples is postponed until a new instance must be classified.
– When a new instance is encountered, its relationship to the stored examples is examined in
order to assign a target function value for the new instance.
• One well-known instance-based learning method is the k-nearest neighbor (kNN) method.
• Instance-based methods are referred to as lazy learning methods because they delay
processing until a new instance must be classified.
– Eager methods (decision trees, neural networks, …) generalize the training set to learn a
function.
• A key advantage of lazy learning is that instead of estimating the target function once
for the entire instance space, these methods can estimate it locally and differently for
each new instance to be classified.
k-Nearest Neighbor Learning (Classification)
• The k-nearest neighbor learning algorithm can be used for the prediction of values of
continuous-valued functions in addition to the prediction of class values of discrete-valued
functions (classification).
• The k-nearest neighbor learning algorithm assumes all instances correspond to points
in the n-dimensional space R^n.
• The nearest neighbors of an instance are defined in terms of Euclidean distance.
• The Euclidean distance between the instances xi = <xi1,…,xin> and
xj = <xj1,…,xjn> is:

$$d(x_i, x_j) = \sqrt{\sum_{r=1}^{n} (x_{ir} - x_{jr})^2}$$
• For a given instance xq, Class(xq) is computed using the class values of k-nearest
neighbors of xq
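As a concrete illustration, here is a minimal sketch of the distance computation and neighbor lookup (not from the lecture; the function names are illustrative):

```python
import math

def euclidean_distance(xi, xj):
    """d(xi, xj) = sqrt(sum over r of (xir - xjr)^2) for two n-dimensional points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))

def k_nearest(training_examples, xq, k):
    """Return the k stored <x, Class(x)> pairs closest to the query instance xq."""
    return sorted(training_examples, key=lambda ex: euclidean_distance(ex[0], xq))[:k]
```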
k-Nearest Neighbor Classification
• Store all training examples <xi,Class(xi)>
• Calculate Class(xq) for a given instance xq using its k-nearest neighbors.
• Nearest neighbor: (k=1)
– Locate the nearest training example xn, and estimate Class(xq) as Class(xn).
• k-Nearest neighbor:
– Locate the k nearest training examples, and estimate Class(xq) by a majority vote among the
class values of the k nearest neighbors.
• The test example is classified based on the majority class of its nearest neighbors:

$$Class(x_q) = \arg\max_{v} \sum_{i=1}^{k} I(v = y_i)$$

where v is a class label, y_i is the class label of one of the k nearest neighbors, and I(·) is an
indicator function that returns the value 1 if its argument is true and 0 otherwise.
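A minimal sketch of the complete majority-vote classifier, reusing the helpers from the previous snippet (the toy data set is hypothetical, included only to make the example runnable):

```python
from collections import Counter

def knn_classify(training_examples, xq, k):
    """Predict Class(xq) as the majority class among the k nearest neighbors."""
    neighbors = k_nearest(training_examples, xq, k)
    votes = Counter(cls for _, cls in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical two-class data set, purely for illustration.
train = [((1, 1), "+"), ((2, 1), "+"), ((5, 5), "-"), ((6, 5), "-")]
print(knn_classify(train, (2, 2), k=3))  # -> "+"
```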
k-Nearest Neighbor Classification - Example
Distance Weighted kNN Classification
• In the majority voting approach, every neighbor has the same impact on the
classification
• We can weight the influence of each nearest neighbor xi according to its distance to
instance xq.
• Using the distance-weighted voting scheme, the class label can be determined as:

$$Class(x_q) = \arg\max_{v} \sum_{i=1}^{k} w_i \, I(v = y_i)$$

– A common choice is to weight each vote by the inverse square of the distance: w_i = 1/d(x_q, x_i)^2
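A sketch of the weighted vote, reusing the earlier helpers and assuming the inverse-squared-distance weighting above (treat w_i = 1/d^2 as an assumption; the slide image may use a different scheme):

```python
from collections import defaultdict

def weighted_knn_classify(training_examples, xq, k):
    """Predict Class(xq), weighting each neighbor's vote by 1/d^2 (assumed scheme)."""
    scores = defaultdict(float)
    for x, cls in k_nearest(training_examples, xq, k):
        d = euclidean_distance(x, xq)
        scores[cls] += 1.0 / (d ** 2) if d > 0 else float("inf")  # exact match dominates
    return max(scores, key=scores.get)
```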
Distance Weighted kNN Classification - Example
k-Nearest Neighbor Algorithm
Consider the following set of training examples:
• Using 3-nearest neighbor algorithm, find the target classification for the instance
<A=5,B=5>. Show your work.
• Using distance weighted 3-nearest neighbor algorithm, find the target classification
for the instance <A=5,B=5>. Show your work.
k-Nearest Neighbor Algorithm
Distance of each training example to <5,5>:
– √32
– √10 *
– √5 *
– √1 *
– √32
(* marks the 3 nearest neighbors)
• Scaling issues:
– Attributes may have to be scaled to prevent
distance measures from being dominated by
one of the attributes
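For instance, min-max normalization is one common way to rescale every attribute to [0, 1] before computing distances (a generic sketch, not taken from the slides):

```python
def min_max_normalize(points):
    """Rescale each attribute to [0, 1] so no single attribute dominates the distance."""
    dims = range(len(points[0]))
    lo = [min(p[d] for p in points) for d in dims]
    hi = [max(p[d] for p in points) for d in dims]
    return [tuple((p[d] - lo[d]) / (hi[d] - lo[d]) if hi[d] > lo[d] else 0.0 for d in dims)
            for p in points]
```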
Artificial Neural Networks
Artificial Neural Networks
• Artificial neural networks (ANNs) provide a general, practical method for learning
real-valued, discrete-valued, and vector-valued functions from examples.
• Algorithms such as BACKPROPAGATION use gradient descent to tune network
parameters to best fit a training set of input-output pairs.
• The study of artificial neural networks (ANNs) has been inspired in part by the
observation that biological learning systems are built of very complex webs of
interconnected neurons.
• Artificial neural networks are built out of a densely interconnected set of simple units,
where each unit takes a number of real-valued inputs (possibly the outputs of other
units) and produces a single real-valued output (which may become the input to many
other units).
Properties of Artificial Neural Networks
• A large number of very simple, neuron-like processing elements called units,
• A large number of weighted, directed connections between pairs of units
– Weights may be positive or negative real values
• Local processing in that each unit computes a function based on the outputs of a
limited number of other units in the network
• Each unit computes a simple function of its input values, which are the weighted
outputs from other units.
– If there are n inputs to a unit, then the unit's output, or activation, is defined by
a = g((w1 * x1) + (w2 * x2) + ... + (wn * xn)).
– Each unit computes a (simple) function g of the linear combination of its inputs.
• Learning by tuning the connection weights
Artificial Neural Networks (ANN)
Perceptron
[Diagram: a perceptron with inputs x1, …, xn weighted by w1, …, wn, plus a fixed input x0 = 1 weighted by w0; the unit sums $\sum_{i=0}^{n} w_i x_i$ and thresholds the sum to produce the output o.]

$$o(x_1, \ldots, x_n) = \begin{cases} 1 & \text{if } \sum_{i=0}^{n} w_i x_i > 0 \\ -1 & \text{otherwise} \end{cases}$$
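A direct transcription of this threshold rule into code (a minimal sketch; the prepended 1 plays the role of the fixed input x0):

```python
def perceptron_output(weights, x):
    """LTU output: 1 if w0 + w1*x1 + ... + wn*xn > 0, else -1."""
    inputs = [1.0] + list(x)  # x0 = 1 supplies the bias term w0
    s = sum(w * xi for w, xi in zip(weights, inputs))
    return 1 if s > 0 else -1
```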
Perceptron
• Perceptron is a Linear Threshold Unit (LTU).
• A perceptron takes a vector of real-valued inputs, calculates a linear combination of
these inputs, then outputs 1 if the result is greater than some threshold and -1
otherwise.
• Given inputs x1 through xn, the output o(x1, …, xn) computed by the perceptron is the
threshold function shown on the previous slide.
Representational Power of Perceptrons
[Figure: two 2-D plots of positive (+) and negative (−) examples in the (x1, x2) plane. Left: the classes can be separated by a straight line, so the set is representable by a perceptron. Right: an XOR-like arrangement that no straight line separates, so the set is NOT representable by a perceptron.]
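The right-hand case can also be verified algebraically; a short worked argument (standard, not from the slides) for XOR over x1, x2 ∈ {0, 1}:

```latex
% Suppose weights w_0, w_1, w_2 realized XOR. The four cases require:
\begin{aligned}
(0,0) \mapsto -1 &: \; w_0 \le 0 \\
(1,0) \mapsto +1 &: \; w_0 + w_1 > 0 \\
(0,1) \mapsto +1 &: \; w_0 + w_2 > 0 \\
(1,1) \mapsto -1 &: \; w_0 + w_1 + w_2 \le 0
\end{aligned}
% Adding the middle two inequalities gives 2w_0 + w_1 + w_2 > 0; subtracting the
% fourth gives w_0 > 0, contradicting the first. Hence no such weights exist.
```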
Perceptron - Example
Give a perceptron to represent the OR function.

X  Y  Out
0  0  0
0  1  1
1  0  1
1  1  1

[Diagram: a perceptron with a fixed input 1 (weight w0) and inputs X (weight w1) and Y (weight w2), producing Out.]
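The slide leaves the weights as an exercise; one choice that works (an illustrative answer, not necessarily the lecture's intended one) is w0 = -0.5, w1 = 1, w2 = 1, verified below against the truth table using the perceptron_output helper defined earlier (output 1 is read as Out = 1, output -1 as Out = 0):

```python
w = [-0.5, 1.0, 1.0]  # [w0, w1, w2], an assumed solution for OR
for x, y, out in [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)]:
    o = perceptron_output(w, (x, y))
    assert (o == 1) == (out == 1), (x, y)
print("w0=-0.5, w1=1, w2=1 realizes OR")
```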
Perceptron - Example
Give a linear threshold unit (a perceptron) that implements
the following function by giving its weight values.

[The function's truth table appears in the slide image.]

[Diagram: a perceptron with a fixed input 1 (weight w0) and inputs x (weight w1) and y (weight w2), producing Out.]

w0 = 1, w1 = -0.6, w2 = -0.6
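The target truth table itself is in the slide image, but the given weights can be checked directly; this snippet (reusing perceptron_output from earlier) prints the function those weights actually compute:

```python
w = [1.0, -0.6, -0.6]  # [w0, w1, w2] as given on the slide
for x in (0, 1):
    for y in (0, 1):
        print(x, y, perceptron_output(w, (x, y)))
# -> (0,0): 1   (0,1): 1   (1,0): 1   (1,1): -1
```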
Perceptron - Example
Give a linear threshold unit (a perceptron) that implements
the following function by giving its weight values.

[The function's truth table appears in the slide image.]

[Diagram: a perceptron with a fixed input 1 (weight w0) and inputs x (weight w1), y (weight w2), and z (weight w3), producing Out.]
Perceptron Training Rule
• One way to learn an acceptable weight vector is to begin with random weights, then iteratively
apply the perceptron to each training example, modifying the perceptron weights
whenever it misclassifies an example.
– If the training example is classified correctly, the weights are not updated.
• This process is repeated, iterating through the training examples as many times as
needed, until the perceptron classifies all training examples correctly.
– Each pass through all of the training examples is called one epoch.
Perceptron Training Rule
• Weights are modified at each step according to the perceptron training rule:

$$w_i \leftarrow w_i + \Delta w_i, \qquad \Delta w_i = \eta \, (t - o) \, x_i$$

where t is the target value, o is the perceptron output, and η is a small constant (e.g., 0.1) called the learning rate.
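A compact sketch of this rule as an epoch loop (it reuses the perceptron_output helper from earlier and assumes targets t ∈ {1, -1}):

```python
def train_perceptron(examples, eta=0.1, max_epochs=100):
    """Perceptron training rule: w_i <- w_i + eta * (t - o) * x_i."""
    n = len(examples[0][0])
    w = [0.0] * (n + 1)  # w[0] is the bias weight w0
    for _ in range(max_epochs):
        converged = True
        for x, t in examples:
            o = perceptron_output(w, x)
            if o != t:  # update weights only on misclassified examples
                converged = False
                for i, xi in enumerate([1.0] + list(x)):
                    w[i] += eta * (t - o) * xi
        if converged:  # one full epoch with no mistakes
            break
    return w
```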
General Structure of ANN
Multi-Layer Networks
• A single perceptron can only express linear decision surfaces.
• Multilayer networks are capable of expressing a rich variety of nonlinear decision
surfaces.
[Diagram: a multilayer feedforward network with an input layer, a hidden layer, and an output layer.]
Multi-Layer Networks with Linear Units
• Multiple layers of cascaded linear units still produce only linear functions.
[Figure: the XOR example over inputs x1 and x2, which no linear decision surface can represent.]
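The claim follows from a one-line composition argument (standard linear algebra, sketched here for reference):

```latex
% Two cascaded layers of linear units with weight matrices W_1 and W_2:
o(x) = W_2 (W_1 x) = (W_2 W_1)\, x = W x \quad \text{with } W = W_2 W_1,
% so the whole network is still a single linear map; stacking further
% linear layers never leaves the class of linear functions.
```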
Multi-Layer Networks with Linear Units - Example
• Give an artificial neural network for the following function. Make clear the structure
of your ANN and the weights used.
Sigmoid Unit
• What we need is a unit whose output is a nonlinear function of its inputs, but whose
output is also a differentiable function of its inputs.
• One solution is the sigmoid unit, a unit very much like a perceptron, but based on a
smoothed, differentiable threshold function.
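For reference, the standard sigmoid (logistic) function and the derivative identity that makes it convenient for gradient-based learning (standard definitions, not reproduced from the slide image):

```latex
\sigma(y) = \frac{1}{1 + e^{-y}}, \qquad
\frac{d\sigma(y)}{dy} = \sigma(y)\,\bigl(1 - \sigma(y)\bigr),
\qquad o = \sigma(w_0 + w_1 x_1 + \dots + w_n x_n)
```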
Algorithm for Learning ANN
• Initialize the weights (w0, w1, …, wk)
• Adjust the weights in such a way that the output of the ANN is consistent with the class
labels of the training examples
– Objective function:

$$E = \sum_i \left( Y_i - f(w_i, X_i) \right)^2$$

– Find the weights $w_i$ that minimize the above objective function, using the
backpropagation algorithm
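For a single sigmoid unit, the gradient of this squared error has a closed form, and one gradient-descent step looks like the sketch below (a simplified single-unit special case of backpropagation; names are illustrative, and each x is assumed to carry a leading 1 for the bias):

```python
import math

def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))

def gradient_step(w, examples, eta=0.1):
    """One gradient-descent step on E = sum_i (t_i - o_i)^2 for a sigmoid unit."""
    grad = [0.0] * len(w)
    for x, t in examples:  # x includes a leading 1 for the bias weight
        o = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        for i, xi in enumerate(x):
            grad[i] += -2.0 * (t - o) * o * (1.0 - o) * xi  # dE/dw_i
    return [wi - eta * g for wi, g in zip(w, grad)]
```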
Units of ANN
• Perceptron (Linear Threshold Unit)
• Linear Unit: produces a continuous output o (not just -1, 1)
o = w0 + w1 x1 + … + wn xn
• Sigmoid Unit: o = σ(w0 + w1 x1 + … + wn xn), where σ is the sigmoid function shown earlier