
Announcements

• Homework 5 due today, October 30


• Book Review due today, October 30
• Lab 3 due Thursday, November 1
• Homework 6 due Tuesday, November 6
• Current Event
• Kay - today
• Chelsea - Thursday, November 1



Neural Networks

Lecture 12
Artificial Neural Networks
• Artificial neural networks (ANNs) provide
a practical method for learning
• real-valued functions
• discrete-valued functions
• vector-valued functions
• Robust to errors in training data
• Successfully applied to such problems as
• interpreting visual scenes
• speech recognition
• learning robot control strategies



Biological Neurons

• The human brain is made up of billions of simple
processing units – neurons.

• Inputs are received on dendrites, and if the input levels
are over a threshold, the neuron fires, passing a signal
through the axon to the synapse which then connects to
another neuron.
Neural Network Representation
• ALVINN uses a learned ANN to steer an
autonomous vehicle driving at normal
speeds on public highways
• Input to network: 30x32 grid of pixel intensities
obtained from a forward-pointed camera
mounted on the vehicle
• Output: direction in which the vehicle is steered
• Trained to mimic observed steering commands
of a human driving the vehicle for
approximately 5 minutes



ALVINN



Appropriate problems
• ANN learning is well suited to problems in which the training
data is noisy, complex sensor data (e.g., inputs from
cameras or microphones)
• Can also be used for problems with symbolic
representations
• Most appropriate for problems where
• Instances have many attribute-value pairs
• Target function output may be discrete-valued, real-valued, or a
vector of several real- or discrete-valued attributes
• Training examples may contain errors
• Long training times are acceptable
• Fast evaluation of the learned target function may be required
• The ability for humans to understand the learned target function is
not important



Artificial Neurons (1)

• Artificial neurons are based on biological neurons.


• Each neuron in the network receives one or more inputs.
• An activation function is applied to the inputs, which
determines the output of the neuron – the activation level.

• The charts on the right show three typical activation
functions.
Artificial Neurons (2)

• A typical activation function works as follows:


n  1 for X  t
X   wi xi Y 
i 1 0 for X  t
• Each node i has a weight, wi associated with it. The
input to node i is xi.
• t is the threshold.
• So if the weighted sum of the inputs to the neuron is
above the threshold, then the neuron fires.
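As a rough illustration (not from the slides), such a threshold unit could be coded as below; the weights, inputs, and threshold are arbitrary example values.

```python
def threshold_unit(weights, inputs, t):
    """Fires (returns 1) when X = sum_i w_i * x_i is at least the threshold t."""
    X = sum(w * x for w, x in zip(weights, inputs))
    return 1 if X >= t else 0

# Illustrative values only: X = 0.5*1 + (-0.3)*0 + 0.8*1 = 1.3 >= 1.0, so the unit fires.
print(threshold_unit([0.5, -0.3, 0.8], [1, 0, 1], t=1.0))   # prints 1
```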



Perceptrons
• A perceptron is a single neuron that classifies a set
of inputs into one of two categories (usually 1 or
-1).
• If the inputs are in the form of a grid, a perceptron
can be used to recognize visual images of shapes.
• The perceptron usually uses a step function, which
returns 1 if the weighted sum of inputs exceeds a
threshold, and 0 otherwise.



Training Perceptrons
• Learning involves choosing values for the weights
• The perceptron is trained as follows:
• First, inputs are given random weights (usually
between –0.5 and 0.5).
• An item of training data is presented. If the perceptron
misclassifies it, the weights are modified according to
the following rule:

    w_i \leftarrow w_i + a \, x_i \, (t - o)

• where t is the target output for the training example, o is the output
generated by the perceptron, and a is the learning rate, between 0 and
1 (usually small, such as 0.1)
• Cycle through the training examples until all examples are
classified successfully
• Each cycle is known as an epoch
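A minimal Python sketch of this training procedure; the OR training set, the constant bias input, and the zero threshold are illustrative assumptions rather than values from the slides.

```python
import random

def train_perceptron(examples, n_inputs, a=0.1, thresh=0.0, max_epochs=100):
    """Perceptron training rule: w_i <- w_i + a * x_i * (t - o) on misclassified examples."""
    # Start with small random weights, as described above.
    w = [random.uniform(-0.5, 0.5) for _ in range(n_inputs)]
    for epoch in range(max_epochs):
        errors = 0
        for x, target in examples:
            # Step function: 1 if the weighted sum reaches the threshold, 0 otherwise.
            o = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= thresh else 0
            if o != target:
                errors += 1
                w = [wi + a * xi * (target - o) for wi, xi in zip(w, x)]
        if errors == 0:          # every example classified correctly this epoch
            break
    return w

# OR function (linearly separable); the first component is a constant bias input of 1.
or_examples = [((1, 0, 0), 0), ((1, 0, 1), 1), ((1, 1, 0), 1), ((1, 1, 1), 1)]
weights = train_perceptron(or_examples, n_inputs=3)
```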



Bias of Perceptrons
• Perceptrons can only classify linearly separable
functions.
• The first of the following graphs shows a linearly
separable function (OR).
• The second is not linearly separable (Exclusive-
OR).



Convergence
• The perceptron training rule converges only when the
training examples are linearly separable and the learning
constant a is sufficiently small
• Another approach uses the delta rule and gradient
descent (a sketch follows this list)
• Same basic rule for finding update value
• Changes
• Do not incorporate the threshold in the output value
(unthresholded perceptron)
• Wait to update weight until cycle is complete
• Converges asymptotically toward the minimum error
hypothesis, possibly requiring unbounded time, but
converges regardless of whether the training data are
linearly separable
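
A sketch of the batch delta-rule variant described in the list above, assuming a linear (unthresholded) unit and a once-per-cycle weight update; the function name and learning-rate value are mine.

```python
def delta_rule_epoch(w, examples, a=0.1):
    """One batch epoch of the delta rule for an unthresholded (linear) unit.

    Updates are accumulated over the whole cycle and applied only at the end,
    as described in the list above.
    """
    delta = [0.0] * len(w)
    for x, target in examples:
        o = sum(w_i * x_i for w_i, x_i in zip(w, x))   # unthresholded output
        for i, x_i in enumerate(x):
            delta[i] += a * (target - o) * x_i
    return [w_i + d_i for w_i, d_i in zip(w, delta)]
```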



Multilayer Neural Networks
• Multilayer neural networks can classify a range of
functions, including ones that are not linearly separable.
• Each input layer neuron
connects to all neurons in
the hidden layer.
• The neurons in the hidden
layer connect to all neurons
in the output layer.
A feed-forward network
Speech Recognition ANN



Sigmoid Unit

• \sigma(x) is the sigmoid function:

    \sigma(x) = \frac{1}{1 + e^{-x}}

• Nice property: differentiable, with

    \frac{\partial \sigma(x)}{\partial x} = \sigma(x) \, (1 - \sigma(x))
• Derive gradient descent rules to train
• One sigmoid unit - node
• Multilayer networks of sigmoid units
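
A small sketch of the sigmoid and the derivative property above.

```python
import math

def sigmoid(x):
    """sigma(x) = 1 / (1 + e^(-x))"""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    """d(sigma)/dx = sigma(x) * (1 - sigma(x)), computable from the unit's output alone."""
    s = sigmoid(x)
    return s * (1.0 - s)
```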
Backpropagation

• Multilayer neural networks learn in the same way as
perceptrons.
• However, there are many more weights, and it is
important to assign credit (or blame) correctly when
changing weights.
• E sums the errors over all of the network output units

 1
E ( w)    (t kd  okd ) 2
2 dD koutputs



Backpropagation Algorithm
• Create a feed-forward network with n_in inputs, n_hidden
hidden units, and n_out output units.
• Initialize all network weights to small random numbers
• Until termination condition is met, Do
• For each <x,t> in training examples, Do
Propagate the input forward through the network:
1. Input the instance x to the network and compute the output o_u of
every unit u in the network
Propagate the errors backward through the network:
2. For each network output unit k, calculate its error term δ_k:
    \delta_k = o_k (1 - o_k) (t_k - o_k)
3. For each hidden unit h, calculate its error term δ_h:
    \delta_h = o_h (1 - o_h) \sum_{k \in outputs} w_{kh} \, \delta_k

4. Update each network weight w_ji:
    w_{ji} \leftarrow w_{ji} + \Delta w_{ji}
   where
    \Delta w_{ji} = \eta \, \delta_j \, x_{ji}
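
A minimal Python sketch of one stochastic update of the algorithm above for a network with a single hidden layer of sigmoid units; the list-of-lists weight layout and the bias handling (a constant input of 1 appended to each layer) are my own assumptions, not specified on the slide.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w_hidden, w_out):
    """Forward pass; each weight row ends with a bias weight applied to a constant input of 1."""
    h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w_hidden]
    o = [sigmoid(sum(w * v for w, v in zip(row, h + [1.0]))) for row in w_out]
    return h, o

def backprop_update(x, t, w_hidden, w_out, eta=0.1):
    """One backpropagation update for a single training example <x, t>."""
    h, o = forward(x, w_hidden, w_out)
    # Step 2: error term for each output unit k.
    delta_out = [o_k * (1 - o_k) * (t_k - o_k) for o_k, t_k in zip(o, t)]
    # Step 3: error term for each hidden unit j.
    delta_hid = [h_j * (1 - h_j) * sum(w_out[k][j] * delta_out[k] for k in range(len(o)))
                 for j, h_j in enumerate(h)]
    # Step 4: w_ji <- w_ji + eta * delta_j * x_ji (the bias input x_ji is the constant 1).
    for k, row in enumerate(w_out):
        for j, inp in enumerate(h + [1.0]):
            row[j] += eta * delta_out[k] * inp
    for j, row in enumerate(w_hidden):
        for i, inp in enumerate(x + [1.0]):
            row[i] += eta * delta_hid[j] * inp
```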
Example: Learning AND
Network: inputs a, b, c; hidden units d, e; output unit f.

Initial Weights:
  w_da = .2    w_ea = -.5    w_fd = .4
  w_db = .1    w_eb = .3     w_fe = -.2
  w_dc = -.1   w_ec = -.2    w_f0 = -.1
  w_d0 = .1    w_e0 = 0

Training Data:
  AND(1,0,1) = 0
  AND(1,1,1) = 1

Alpha = 0.1
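
Using the backprop_update sketch above, the slide's initial weights and training data could be plugged in as follows; only the numbers come from the slide, the data layout is an assumption.

```python
# Hidden rows: [w_da, w_db, w_dc, w_d0] for unit d, then the same for unit e;
# the last entry of each row is the bias weight (w_d0, w_e0).
w_hidden = [[0.2, 0.1, -0.1, 0.1],
            [-0.5, 0.3, -0.2, 0.0]]
# Output row for unit f: [w_fd, w_fe, w_f0].
w_out = [[0.4, -0.2, -0.1]]

training_data = [([1, 0, 1], [0]),   # AND(1,0,1) = 0
                 ([1, 1, 1], [1])]   # AND(1,1,1) = 1

for x, t in training_data:
    backprop_update(x, t, w_hidden, w_out, eta=0.1)
```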



Hidden Layer representation

Target Function: map each of eight one-hot input patterns to an identical
output pattern (an 8 × 3 × 8 network learning the identity function).
Can this be learned?



Yes
Input          Hidden Values      Output
10000000 → .89 .04 .08 → 10000000
01000000 → .15 .99 .99 → 01000000
00100000 → .01 .97 .27 → 00100000
00010000 → .99 .97 .71 → 00010000
00001000 → .03 .05 .02 → 00001000
00000100 → .01 .11 .88 → 00000100
00000010 → .80 .01 .98 → 00000010
00000001 → .60 .94 .01 → 00000001
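
A sketch of how such a hidden encoding might be reproduced with the forward and backprop_update functions sketched earlier; the iteration count, learning rate, and initial weight range are illustrative assumptions.

```python
import random

# Eight one-hot patterns; the target is the input itself (the identity function).
patterns = [[1.0 if i == j else 0.0 for i in range(8)] for j in range(8)]

# 8 inputs (+ bias) -> 3 hidden units -> 8 outputs; each row ends with a bias weight.
w_hidden = [[random.uniform(-0.1, 0.1) for _ in range(9)] for _ in range(3)]
w_out = [[random.uniform(-0.1, 0.1) for _ in range(4)] for _ in range(8)]

for _ in range(5000):                      # illustrative number of iterations
    for p in patterns:
        backprop_update(p, p, w_hidden, w_out, eta=0.3)

for p in patterns:
    h, o = forward(p, w_hidden, w_out)     # h is the learned three-value hidden encoding
```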
Plots of Squared Error



Hidden Unit
(.15 .99 .99)



Evolving weights



Momentum
• One of many variations
• Modify the update rule by making the
weight update on the nth iteration depend
partially on the update that occurred in the
(n-1)th iteration
    \Delta w_{ji}(n) = \eta \, \delta_j \, x_{ji} + \alpha \, \Delta w_{ji}(n-1)
• Minimizes error over training examples
• Speeds up training since it can take 1000s
of iterations
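
A sketch of the momentum rule above as a drop-in change to the weight update; the function name and the value alpha = 0.9 are illustrative assumptions.

```python
def momentum_update(w, grad_step, prev_step, alpha=0.9):
    """Momentum: Delta w_ji(n) = eta * delta_j * x_ji + alpha * Delta w_ji(n-1).

    `grad_step` holds this iteration's plain updates eta * delta_j * x_ji and
    `prev_step` holds the updates applied on iteration n-1; alpha = 0.9 is an
    illustrative choice, not a value from the slides.
    """
    step = [g + alpha * p for g, p in zip(grad_step, prev_step)]
    new_w = [w_i + s for w_i, s in zip(w, step)]
    return new_w, step   # keep `step` as prev_step for the next iteration
```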
When to stop training
• Continue until error falls below some predefined
threshold
• Bad choice because Backpropagation is susceptible to
overfitting
• Won't be able to generalize as well over unseen data



Cross Validation
• Common approach to avoid overfitting
• Reserve part of the training data for testing
• m examples are partitioned into k disjoint subsets
• Run the procedure k times
• Each time a different one of these subsets is used as
validation
• Determine the number of iterations that yield the
best performance
• The mean of the best iteration counts is then used to train
on all m examples (see the sketch below)
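
A sketch of this k-fold procedure; train_for and error_on are hypothetical caller-supplied hooks standing in for the actual training and evaluation routines.

```python
def choose_num_iterations(examples, k, max_iters, train_for, error_on):
    """Pick a training length by k-fold cross validation, as described above.

    `train_for(data, n)` and `error_on(model, data)` are hypothetical hooks:
    the first trains a network on `data` for n iterations, the second returns
    its error on a data set.
    """
    fold_size = len(examples) // k
    best_per_fold = []
    for fold in range(k):
        # Each time, a different subset is held out as the validation set.
        validation = examples[fold * fold_size:(fold + 1) * fold_size]
        training = examples[:fold * fold_size] + examples[(fold + 1) * fold_size:]
        # Iteration count with the lowest validation error for this fold.
        best = min(range(1, max_iters + 1),
                   key=lambda n: error_on(train_for(training, n), validation))
        best_per_fold.append(best)
    # Train on all m examples for the mean of the per-fold best iteration counts.
    return round(sum(best_per_fold) / k)
```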



Neural Nets for Face Recognition



Hidden Unit Weights

(output units: left, straight, right, up)

