PERCEPTRONS
One type of ANN system is based on a unit called a perceptron.
A perceptron takes a vector of real-valued inputs, calculates a linear combination of these inputs,
then outputs 1 if the result is greater than some threshold and -1 otherwise.
More precisely, given inputs x1 through xn, the output o(x1, ..., xn) computed by the perceptron is

o(x1, ..., xn) = 1 if w0 + w1x1 + w2x2 + ... + wnxn > 0, and -1 otherwise,

where each wi is a real-valued constant, or weight, that determines the contribution of input xi to
the perceptron output. Notice that the quantity (-w0) is a threshold that the weighted combination of
inputs w1x1 + ... + wnxn must surpass in order for the perceptron to output a 1.
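As a concrete illustration, the threshold rule above can be sketched in plain Python. The weight values used in the example (w0 = -0.8, w1 = w2 = 0.5) are one standard choice for representing the boolean AND function over inputs drawn from {-1, 1}; they are an assumption for illustration.

```python
def perceptron_output(weights, inputs):
    """Return 1 if w0 + w1*x1 + ... + wn*xn > 0, else -1.

    weights[0] is the bias weight w0; weights[1:] pair with the inputs.
    """
    total = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1 if total > 0 else -1

# Weights implementing AND over {-1, 1} inputs: fires only when both inputs are 1
and_weights = [-0.8, 0.5, 0.5]
print(perceptron_output(and_weights, [1, 1]))    # -0.8 + 0.5 + 0.5 = 0.2 > 0, so output is 1
print(perceptron_output(and_weights, [1, -1]))   # -0.8 + 0.5 - 0.5 = -0.8, so output is -1
```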
Representational Power of Perceptrons
We can view the perceptron as representing a hyperplane decision surface in the n-dimensional
space of instances (i.e., points). The perceptron outputs a 1 for instances lying on one side of the
hyperplane and outputs a -1 for instances lying on the other side.
A single perceptron can be used to represent many boolean functions.
Figure: The decision surface represented by a two-input perceptron, with inputs x1 and x2.
Positive examples are indicated by "+", negative by "-".
(a) A set of training examples and the decision surface of a perceptron that classifies them correctly.
(b) A set of training examples that is not linearly separable (i.e., that cannot be correctly classified
by any straight line).
Gradient Descent and the Delta Rule
Although the perceptron rule finds a successful weight vector when the training examples are
linearly separable, it can fail to converge if the examples are not linearly separable.
A second training rule, called the delta rule, is designed to overcome this difficulty.
If the training examples are not linearly separable, the delta rule converges toward a best-fit
approximation to the target concept. The key idea behind the delta rule is to use gradient descent to
search the hypothesis space of possible weight vectors to find the weights that best fit the training
examples.
This rule is important because gradient descent provides the basis for the back propagation
algorithm , which can learn networks with many interconnected units.
How the Weights Are Changed
The weights are initialized to small random values. Training then iterates over the examples;
whenever the actual output o differs from the target output t, each weight wi is updated by the
delta rule:

wi <- wi + n(t - o)xi

where
n = learning rate
t = target output
o = actual output
xi = input associated with weight wi
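A minimal sketch of this training procedure for a single linear unit, using the notation above (n, t, o, xi). The toy dataset, drawn from the noise-free linear target t = 2x - 1, and the learning-rate and epoch settings are assumptions for illustration.

```python
import random

def train_delta_rule(examples, eta=0.05, epochs=500):
    """Incremental delta-rule training of a single linear unit."""
    n_inputs = len(examples[0][0])
    # Step 1: start from small random weights; w[0] is the bias weight w0
    w = [random.uniform(-0.1, 0.1) for _ in range(n_inputs + 1)]
    for _ in range(epochs):
        for x, t in examples:
            # Linear unit output: o = w0 + w1*x1 + ... + wn*xn
            o = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
            # Delta rule update: wi <- wi + eta * (t - o) * xi
            w[0] += eta * (t - o)
            for i, xi in enumerate(x, start=1):
                w[i] += eta * (t - o) * xi
    return w

# Examples drawn from the target function t = 2*x - 1
data = [([0.0], -1.0), ([1.0], 1.0), ([2.0], 3.0), ([0.5], 0.0)]
w = train_delta_rule(data, eta=0.1)  # converges toward w0 = -1, w1 = 2
```

Because the data here are noise-free and linear, the weights converge to the exact target; on non-separable or noisy data the same rule converges toward a best-fit approximation, as the text describes.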
Multilayer Neural Network:
• A multilayer perceptron is a feedforward neural network with one or more hidden layers.
• The network consists of an input layer of source neurons, at least one hidden layer of computational
neurons, and an output layer of computational neurons.
• The input signals are propagated in a forward direction on a layer-by-layer basis.
• Neurons in the hidden layer cannot be observed through input/output behavior of the network.
• There is no obvious way to know what the desired output of the hidden layer should be.
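The layer-by-layer forward propagation described above can be sketched for a small network with two inputs, one hidden layer of two sigmoid neurons, and one sigmoid output neuron. The particular weight values are illustrative assumptions, not trained values.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, hidden_w, output_w):
    # Each weight row is [bias, w1, w2, ...]; signals propagate
    # input layer -> hidden layer -> output layer.
    h = [sigmoid(row[0] + sum(w * xi for w, xi in zip(row[1:], x)))
         for row in hidden_w]
    o = [sigmoid(row[0] + sum(w * hi for w, hi in zip(row[1:], h)))
         for row in output_w]
    return h, o

hidden_w = [[-0.5, 1.0, 1.0], [0.5, -1.0, -1.0]]   # two hidden neurons
output_w = [[0.0, 1.0, -1.0]]                       # one output neuron
h, o = forward([1.0, 0.0], hidden_w, output_w)
```

Note that from the input/output behavior alone we only observe `o`; the hidden activations `h` are internal, which is exactly why there is no obvious "desired output" for the hidden layer.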
Backpropagation
Backpropagation is the essence of neural network training. It is the method of fine-tuning the weights of a
neural network based on the error rate obtained in the previous epoch (i.e., iteration). Proper tuning of the
weights allows you to reduce error rates and make the model reliable by increasing its generalization.
Backpropagation is short for "backward propagation of errors." It is a standard method of training
artificial neural networks, and it works by calculating the gradient of a loss function with respect
to all the weights in the network.
How Backpropagation Algorithm Works
The backpropagation algorithm computes the gradient of the loss function with respect to a single
weight by the chain rule. It efficiently computes one layer at a time, unlike a naive direct computation.
It computes the gradient, but it does not define how the gradient is used. It generalizes the
computation in the delta rule.
1. Inputs X arrive through the preconnected path.
2. The inputs are modeled using real weights W. The weights are usually selected at random.
3. Calculate the output of every neuron, from the input layer, through the hidden layers, to the
output layer.
4. Calculate the error in the outputs:
   Error = actual output - desired output
5. Travel back from the output layer to the hidden layers, adjusting the weights so that the error
is decreased.
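The five steps above can be sketched for a tiny network with two inputs, two sigmoid hidden units, and one sigmoid output unit, trained here on the boolean AND function. The dataset, learning rate, seed, and epoch count are illustrative assumptions.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, wh, wo):
    # Steps 1-3: forward pass, layer by layer (each row is [bias, w1, w2])
    h = [sigmoid(w[0] + w[1] * x[0] + w[2] * x[1]) for w in wh]
    return sigmoid(wo[0] + wo[1] * h[0] + wo[2] * h[1])

def train(examples, eta=0.5, epochs=5000, seed=1):
    rnd = random.Random(seed)
    # Step 2: weights selected at random
    wh = [[rnd.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(2)]
    wo = [rnd.uniform(-0.5, 0.5) for _ in range(3)]
    for _ in range(epochs):
        for x, t in examples:
            # Steps 1-3: forward pass
            h = [sigmoid(w[0] + w[1] * x[0] + w[2] * x[1]) for w in wh]
            o = sigmoid(wo[0] + wo[1] * h[0] + wo[2] * h[1])
            # Step 4: output error term, scaled by the sigmoid derivative
            delta_o = (t - o) * o * (1 - o)
            # Step 5: travel back -- hidden error terms come from the output error
            delta_h = [h[j] * (1 - h[j]) * wo[j + 1] * delta_o for j in range(2)]
            wo[0] += eta * delta_o
            for j in range(2):
                wo[j + 1] += eta * delta_o * h[j]
                wh[j][0] += eta * delta_h[j]
                wh[j][1] += eta * delta_h[j] * x[0]
                wh[j][2] += eta * delta_h[j] * x[1]
    return wh, wo

and_examples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
wh, wo = train(and_examples)
```

The hidden-unit error terms `delta_h` are derived entirely from the output error, which is the "travel back" of step 5 and resolves the problem noted earlier that the hidden layer has no directly observable desired output.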
VISUALIZING THE HYPOTHESIS SPACE
To understand the gradient descent algorithm, it is helpful to visualize the entire hypothesis space of
possible weight vectors and their associated E values.
Here the axes w0 and w1 represent possible values for the two weights of a simple linear unit.
The w0, w1 plane therefore represents the entire hypothesis space. The vertical axis indicates the error E
relative to some fixed set of training examples. The error surface shown in the figure thus summarizes
the desirability of every weight vector in the hypothesis space (we desire a hypothesis with minimum
error).
Given the way in which we chose to define E, for linear units this error surface must always be parabolic
with a single global minimum. The specific parabola will depend, of course, on the particular
set of training examples.
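The standard error definition for linear units, consistent with this discussion (the exact notation below is filled in, since the document's own equation is not shown), is half the sum of squared errors over the training set D:

```latex
E(\vec{w}) \equiv \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2,
\qquad
\frac{\partial E}{\partial w_i} = \sum_{d \in D} (t_d - o_d)(-x_{id}),
\qquad
\Delta w_i = \eta \sum_{d \in D} (t_d - o_d)\, x_{id}
```

Because the output o_d of a linear unit is linear in the weights, E is quadratic in (w0, w1), which is why the error surface is parabolic with a single global minimum; the negated gradient gives the direction of steepest descent used by the training rule.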
Figure: Error of different hypotheses. For a linear unit with two weights, the hypothesis space H is
the w0, w1 plane. The vertical axis indicates the error of the corresponding weight vector hypothesis,
relative to a fixed set of training examples. The arrow shows the negated gradient at one particular
point, indicating the direction in the w0, w1 plane producing steepest descent along the error surface.