UNIT 3 - Part - 2
Backpropagation, SVM, kNN
21CSE355T – Data Mining and Analytics
Unit 3 - Part-2
Classification by backpropagation
“What is backpropagation?” Backpropagation is a neural network learning algorithm.
• The backpropagation algorithm performs learning on a multilayer feed-forward neural network. It
iteratively learns a set of weights for prediction of the class label of tuples.
• A multilayer feed-forward neural network consists of an input layer, one or more hidden layers,
and an output layer.
• Each layer is made up of units. The inputs to the network correspond to the attributes measured
for each training tuple.
• The inputs are fed simultaneously into the units making up the input layer.
• These inputs pass through the input layer and are then weighted and fed simultaneously to a
second layer of “neuronlike” units, known as a hidden layer.
• The outputs of the hidden layer units can be input to another hidden layer, and so on. The
number of hidden layers is arbitrary, although in practice, usually only one is used.
• The weighted outputs of the last hidden layer are input to units making up the output layer,
which emits the network’s prediction for given tuples.
Network Topology
• Neural networks can be used for both classification (to predict the class label of a
given tuple) and numeric prediction (to predict a continuous-valued output).
• For classification, one output unit may be used to represent two classes (where the value 1 represents one class, and the value 0 represents the other).
• If there are more than two classes, then one output unit per class is used (one strategy used in multiclass classification).
• There are no clear rules as to the “best” number of hidden layer units.
• Network design is a trial-and-error process and may affect the accuracy of the
resulting trained network.
• The initial values of the weights may also affect the resulting accuracy.
• Once a network has been trained and its accuracy is not considered acceptable, it
is common to repeat the training process with a different network topology or a
different set of initial weights.
How does backpropagation work?
• Backpropagation learns by iteratively processing a data set of training tuples,
comparing the network’s prediction for each tuple with the actual known target
value.
• The target value may be the known class label of the training tuple (for
classification problems) or a continuous value (for numeric prediction).
• For each training tuple, the weights are modified so as to minimize the mean-
squared error between the network’s prediction and the actual target value.
• These modifications are made in the “backwards” direction (i.e., from the output
layer) through each hidden layer down to the first hidden layer (hence the name
backpropagation).
• Although it is not guaranteed, in general the weights will eventually converge,
and the learning process stops.
Backpropagation algorithm
Backpropagation
• Initialize the weights: The weights in the network are initialized to small random numbers (e.g., ranging from −1.0 to 1.0, or −0.5 to 0.5).
• Each unit has a bias associated with it, as explained later. The biases are similarly initialized to small random
numbers.
• Each training tuple, X, is processed by the following steps.
• Propagate the inputs forward: First, the training tuple is fed to the network’s input layer. The inputs pass through the input units, unchanged. That is, for an input unit, j, its output, Oj, is equal to its input value, Ij.
• Next, the net input and output of each unit in the hidden and output layers are computed. The net input to
a unit in the hidden or output layers is computed as a linear combination of its inputs.
• To help illustrate this point, a hidden layer or output layer unit is shown in the following figure.
Backpropagation
• Each such unit has a number of inputs to it that are, in fact, the outputs of the
units connected to it in the previous layer.
• Each connection has a weight. To compute the net input to the unit, each input
connected to the unit is multiplied by its corresponding weight, and this is
summed.
• Given a unit, j in a hidden or output layer, the net input, Ij, to unit j is
Ij = Σi (wij · Oi) + θj
• where wij is the weight of the connection from unit i in the previous layer to unit j; Oi is the output of unit i from the previous layer; and θj is the bias of the unit.
• Each unit then applies an activation function to its net input; with the logistic (sigmoid) function, the output of unit j is Oj = 1 / (1 + e^(−Ij)).
• The bias acts as a threshold in that it serves to vary the activity of the unit.
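To make these two computations concrete, here is a minimal Python sketch of a single unit’s net input and logistic output; the weights, previous-layer outputs, and bias below are illustrative values, not taken from the slides.

```python
import math

def unit_output(prev_outputs, weights, bias):
    # Net input: I_j = sum_i(w_ij * O_i) + theta_j
    net_input = sum(w * o for w, o in zip(weights, prev_outputs)) + bias
    # Logistic (sigmoid) activation: O_j = 1 / (1 + e^(-I_j))
    return 1.0 / (1.0 + math.exp(-net_input))

# Hypothetical illustrative values:
prev_outputs = [1.0, 0.0, 1.0]   # O_i, outputs of the previous layer
weights = [0.2, 0.4, -0.5]       # w_ij, weights into unit j
bias = -0.4                      # theta_j
print(unit_output(prev_outputs, weights, bias))  # net input -0.7 -> output ~0.332
```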
Backpropagation
• Backpropagate the error: The error is propagated backward by updating the weights and biases to reflect the error of the network’s prediction. For a unit j in the output layer, the error Errj is computed by
Errj = Oj (1 − Oj) (Tj − Oj)
where Tj is the known target value of the given training tuple. For a unit j in a hidden layer, the error is based on the weighted errors of the units in the next layer:
Errj = Oj (1 − Oj) Σk (Errk · wjk)
• The weights and biases are updated to reflect the propagated errors: each weight is updated by wij = wij + (l)·Errj·Oi and each bias by θj = θj + (l)·Errj, where l is the learning rate.
• Terminating condition:
• Training stops when
• The changes in all weights in the previous epoch were so small as to be below some specified threshold, or
• The percentage of tuples misclassified in the previous epoch is below some threshold, or
• A prespecified number of epochs has expired.
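As a rough illustration of the full loop, the sketch below trains a tiny 2-2-1 network with the forward pass, error backpropagation, and update rules given above; the network shape, learning rate, training data, and epoch count are all illustrative assumptions, not values from the slides.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_epoch(data, w_hid, b_hid, w_out, b_out, lr=0.5):
    """One epoch of backpropagation for a 2-2-1 feed-forward network.

    data: list of ((x1, x2), target) pairs; w_hid[j][i] is the weight
    from input i to hidden unit j; w_out[j] connects hidden unit j to
    the single output unit.
    """
    for x, target in data:
        # Propagate the inputs forward through hidden and output layers.
        h = [sigmoid(sum(w * xi for w, xi in zip(w_hid[j], x)) + b_hid[j])
             for j in range(2)]
        o = sigmoid(sum(w * hj for w, hj in zip(w_out, h)) + b_out)

        # Backpropagate the error (errors computed before any weight changes).
        err_o = o * (1 - o) * (target - o)             # Err_j for the output unit
        err_h = [h[j] * (1 - h[j]) * err_o * w_out[j]  # Err_j for hidden units
                 for j in range(2)]

        # Update weights and biases: w += lr * Err_j * O_i, bias += lr * Err_j.
        for j in range(2):
            w_out[j] += lr * err_o * h[j]
            for i in range(2):
                w_hid[j][i] += lr * err_h[j] * x[i]
            b_hid[j] += lr * err_h[j]
        b_out += lr * err_o
    return w_hid, b_hid, w_out, b_out

# Hypothetical XOR-style data; weights start as small random numbers.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
random.seed(1)
w_hid = [[random.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(2)]
b_hid = [random.uniform(-0.5, 0.5) for _ in range(2)]
w_out = [random.uniform(-0.5, 0.5) for _ in range(2)]
b_out = random.uniform(-0.5, 0.5)
for _ in range(5000):  # terminating condition: a prespecified number of epochs
    w_hid, b_hid, w_out, b_out = train_epoch(data, w_hid, b_hid, w_out, b_out)
```

Here the weights are updated after each tuple is presented (case updating), matching the per-tuple description above.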
Sample calculations for learning by the backpropagation algorithm
This example shows the calculations for backpropagation, given the first training tuple, X.
Classify an unknown tuple using a trained network
k-Nearest Neighbors (kNN)
• kNN classifies an unknown tuple by letting its k closest training examples “vote” on the class label, where closeness is measured with a distance function. The traditional choice is Euclidean distance:
dist(p, q) = √((p1 − q1)² + (p2 − q2)² + … + (pn − qn)²)
• The distance formula involves comparing the values of each feature. For example, to calculate the distance between the tomato (sweetness = 6, crunchiness = 4) and the green bean (sweetness = 3, crunchiness = 7), we can use the formula as follows:
dist(tomato, green bean) = √((6 − 3)² + (4 − 7)²) = √18 ≈ 4.2
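A minimal sketch of this calculation in Python, using the feature values from the example above (the function name is ours):

```python
import math

def euclidean_distance(p, q):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

tomato = (6, 4)       # (sweetness, crunchiness)
green_bean = (3, 7)
print(euclidean_distance(tomato, green_bean))  # sqrt(18) ~ 4.24
```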
Choosing appropriate k value
• Deciding how many neighbors to use for kNN determines how well the model will generalize to future data.
• The balance between overfitting and underfitting the training data is a problem known as the bias-variance tradeoff.
• Choosing a large k reduces the impact of variance caused by noisy data, but can bias the learner such that it runs the risk of ignoring small but important patterns.
Choosing appropriate k value
• In practice, choosing k depends on the difficulty of the
concept to be learned and the number of records in the
training data.
• Typically, k is set somewhere between 3 and 10. One common
practice is to set k equal to the square root of the number of
training examples.
• In the classifier, we might set k = 4, because there were 15
example ingredients in the training data and the square root
of 15 is 3.87.
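As a quick sanity check of this rule of thumb (the variable names are ours):

```python
import math

n_training_examples = 15
k = round(math.sqrt(n_training_examples))  # sqrt(15) ~ 3.87, rounded to 4
print(k)  # -> 4
```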
Min-Max Normalisation
• The traditional method of rescaling features for kNN is min-max normalization.
• This process transforms a feature such that all of its values fall in a range between 0 and 1. The formula for normalizing a feature is as follows. Essentially, the formula subtracts the minimum of feature X from each value and divides by the range of X:
Xnew = (X − min(X)) / (max(X) − min(X))
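A minimal sketch of this rescaling; the feature values are made up for illustration:

```python
def min_max_normalize(values):
    """Rescale a feature to [0, 1]: X_new = (X - min(X)) / (max(X) - min(X))."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical sweetness scores:
print(min_max_normalize([1, 3, 6, 8, 10]))  # [0.0, 0.22, 0.56, 0.78, 1.0] (approx.)
```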
Examples of lazy learning algorithms:
• K Nearest Neighbors
• Local Regression
• Lazy Naive Bayes
kNN algorithm
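Putting the pieces together, a minimal distance-then-vote sketch of the kNN algorithm; the training data and query point are illustrative assumptions.

```python
import math
from collections import Counter

def knn_classify(query, training_data, k):
    """Classify `query` by majority vote among its k nearest training examples.

    training_data: list of (feature_vector, class_label) pairs.
    """
    # Rank training examples by Euclidean distance to the query point.
    by_distance = sorted(
        training_data,
        key=lambda example: math.sqrt(
            sum((p - q) ** 2 for p, q in zip(example[0], query))),
    )
    # Majority vote among the k closest neighbors.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical (sweetness, crunchiness) examples:
training = [((6, 4), "fruit"), ((3, 7), "vegetable"),
            ((8, 5), "fruit"), ((2, 8), "vegetable"),
            ((7, 3), "fruit")]
print(knn_classify((6, 5), training, k=3))  # -> "fruit"
```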
END OF UNIT-III