Artificial Neural Networks (ANN)
• Topics
• Perceptron Model to Neural Networks
• Activation Functions
• Cost Functions
• Feed Forward Networks
• Backpropagation
Perceptron Model
• A perceptron takes inputs (x1, x2), passes them through a function f(X), and produces an output y.
[Diagram: inputs x1, x2 → f(X) → output y]
• If f(X) is just a sum, then y=x1+x2
• adjust some parameter in order to “learn”
• add an adjustable weight w to each input
• y = x1w1 + x2w2
• update the weights to affect y
• what if an x is zero? Then its weight w won't change anything!
• add in a bias term b to the inputs
• y = (x1w1 + b) + (x2w2 + b)
• expand this to a generalization:
  y = (x1*w1 + b) + (x2*w2 + b) + … + (xn*wn + b)
• Recall
  • x*w + b
  • w implies how much weight or strength to give to the incoming input
  • b is an offset value, making x*w have to reach a certain threshold before it has an effect
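A minimal sketch of this generalized perceptron in NumPy (the function name and the example numbers are illustrative, not from the slides):

import numpy as np

def perceptron(x, w, b):
    # Generalized perceptron: y = sum over i of (x_i * w_i + b)
    return np.sum(x * w + b)

# Illustrative values: two inputs, adjustable weights, and a bias
x = np.array([0.0, 2.0])    # x1 = 0, so w1 alone could not influence y
w = np.array([0.5, -1.0])
b = 0.3
print(perceptron(x, w, b))  # (0*0.5 + 0.3) + (2*-1.0 + 0.3) = -1.4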
Neural Networks
• The output z = wx + b can be passed through an activation function before it leaves the neuron.
• The simplest choice is a very "strong" function: the output cuts sharply from 0 to 1.
[Plot: output (0 to 1) vs. z = wx + b, with a hard step]
Deep Learning
• A smoother alternative is the sigmoid function, which squashes the output into the range 0 to 1.
[Plot: sigmoid output (0 to 1) vs. z = wx + b]
Deep Learning
[Plots: further activation functions vs. z = wx + b, including one with output ranging from -1 to 1]
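A small sketch of these activation functions in NumPy; naming the last two plots tanh and ReLU is an assumption, since the slides only show the output ranges:

import numpy as np

def step(z):
    # Hard 0/1 cutoff: the very "strong" activation
    return np.where(z >= 0, 1.0, 0.0)

def sigmoid(z):
    # Smooth squashing of z into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-5, 5, 11)   # a range of z = w*x + b values
print(step(z))               # jumps from 0 to 1
print(sigmoid(z))            # smooth transition from ~0 to ~1
print(np.tanh(z))            # assumed: output between -1 and 1
print(np.maximum(0.0, z))    # assumed: ReLU, 0 for negative z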
Multi-Class Activation Functions
[Diagram: hidden layers feeding one output neuron per class — Class One, Class Two, …, Class N]
• Non-Exclusive Classes

                 Labels     A  B  C
  Data Point 1   A,B        1  1  0
  Data Point 2   A          1  0  0
  Data Point 3   C,B        0  1  1
  ...            ...        …  …  …
  Data Point N   B          0  1  0
Deep Learning
• Non-exclusive classes: use the sigmoid function on each output neuron.
• Each neuron will output a value between 0 and 1, indicating the probability of having that class assigned to it.
Multiclass Classification
[Diagram: hidden layers feeding sigmoid outputs between 0 and 1, e.g. Class Two = 0.2, …, Class N = 0.3]
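A rough sketch of non-exclusive (multi-label) outputs with sigmoid activations; the logits and the 0.5 threshold are made-up illustrative values:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative final-layer values z = w*a + b for classes A, B, C
logits = np.array([2.0, 1.5, -3.0])
probs = sigmoid(logits)       # one value between 0 and 1 per class
assigned = probs >= 0.5       # non-exclusive: several classes may be assigned
print(probs)                  # approx. [0.88 0.82 0.05]
print(assigned)               # [ True  True False] -> classes A and B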
Deep Learning
• First question: how far off are the network's predictions?
• We need to take the estimated outputs of the network and then compare them to the real values of the label.
• The cost function (often referred to as a loss function) must be an average over the data so it can output a single value.
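For example, a quadratic (mean squared error) cost averages the squared differences between predictions and labels into a single number; this sketch is one common choice, not necessarily the exact cost used later in the slides:

import numpy as np

def mse_cost(y_true, y_pred):
    # Compare estimated outputs to the real labels and average to one value
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([1.0, 0.0, 1.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.8, 0.4])
print(mse_cost(y_true, y_pred))   # (0.01 + 0.04 + 0.04 + 0.36) / 4 = 0.1125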
Terminology
• Cost function C(w)
[Plot: C(w) vs. w, with the minimum at wmin]
Deep Learning
• The real cost function will be very complex!
• n-dimensional: one dimension per weight
• use gradient descent to solve this problem
Gradient Descent
• Calculate the slope of C(w) at a point.
• Move in the downward direction of the slope.
• Repeat until the slope converges to zero, indicating a minimum.
[Plot: C(w) vs. w, stepping down the curve toward wmin]
• We could have changed our step size to find the next point!
• Smaller step sizes take longer to find the minimum.
• Larger steps are faster, but we risk overshooting the minimum!
• This step size is known as the learning rate.
Deep Learning
• With many weights, we compute the gradient of the cost: ∇C(w1, w2, ..., wn)
• Each weight is then updated by stepping against the gradient, scaled by the learning rate.
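A compact sketch of gradient descent on a one-dimensional cost, assuming a simple quadratic C(w) = (w - 3)^2 so the slope has a closed form; the starting point and learning rate are made-up values:

def grad_C(w):
    # Slope of C(w) = (w - 3)^2 at the current point
    return 2.0 * (w - 3.0)

w = 0.0                 # initial guess
learning_rate = 0.1     # the step size
for _ in range(50):
    # Move in the downward direction of the slope
    w = w - learning_rate * grad_C(w)
print(w)                # converges toward wmin = 3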
Deep Learning
• For classification problems, we often use the cross entropy loss function.
• The assumption is that your model predicts a probability distribution p(y=i) for each class i = 1, 2, …, C.
• For a binary classification this results in:
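In standard form, the cross entropy over C classes is $-\sum_{i=1}^{C} y_i \log p(y=i)$; for binary classification, with p the predicted probability of the positive class and y the true label (0 or 1), this reduces to

  $L(y, p) = -\big( y \log p + (1 - y) \log (1 - p) \big)$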
Backpropagation
[Diagram: the last two layers of the network, labeled L-1 and L]
!"
• Partial derivative :
!#
• How quickly the cost changes when we change the weights
&
• 𝑤$% : weight for the connection from the 𝑘'( neuron in 𝑙 − 1 layer to the 𝑗'(
neuron in 𝑙'( layer
Backpropagation
• Activation $a^{l}_{j}$ of the $j^{th}$ neuron in the $l^{th}$ layer is related to the activations in the $(l-1)^{th}$ layer by the following equation:
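In this notation (with $\sigma$ the activation function and $b^{l}_{j}$ the bias of that neuron), the standard form of this relation is

  $a^{l}_{j} = \sigma\left( \sum_{k} w^{l}_{jk} \, a^{l-1}_{k} + b^{l}_{j} \right)$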
Assumptions about the cost function
• First assumption: the cost can be written as an average over the costs of individual training examples.
• Second assumption: the cost can be written as a function of the outputs from the neural network.
Four Fundamental Equations
Equation 1: Error in the output layer
Equation 2: Error in terms of error in the next layer
Equation 3: Rate of change of the cost w.r.t. any bias in the network
Equation 4: Rate of change of the cost w.r.t. any weight in the network
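In the notation above, with $\delta^{l}_{j}$ denoting the error of the $j^{th}$ neuron in layer $l$ and $z^{l}_{j}$ its weighted input, the standard forms of these four equations are:

  (1)  $\delta^{L}_{j} = \frac{\partial C}{\partial a^{L}_{j}} \, \sigma'(z^{L}_{j})$
  (2)  $\delta^{l}_{j} = \sum_{k} w^{l+1}_{kj} \, \delta^{l+1}_{k} \, \sigma'(z^{l}_{j})$
  (3)  $\frac{\partial C}{\partial b^{l}_{j}} = \delta^{l}_{j}$
  (4)  $\frac{\partial C}{\partial w^{l}_{jk}} = a^{l-1}_{k} \, \delta^{l}_{j}$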
Learning Process
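As a rough sketch of the learning process for a single sigmoid output layer (made-up data; in this simple case the backpropagated gradient reduces to a - y):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up dataset: two inputs per data point, binary label
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
y = np.array([1.0, 0.0, 1.0, 0.0])

w = np.zeros(2)
b = 0.0
learning_rate = 0.5

for _ in range(1000):
    # Forward pass: z = Xw + b, activation a = sigmoid(z)
    z = X @ w + b
    a = sigmoid(z)
    # Backward pass: with sigmoid output and cross entropy cost, dC/dz = a - y
    dz = (a - y) / len(y)
    grad_w = X.T @ dz        # rate of change of the cost w.r.t. each weight
    grad_b = dz.sum()        # rate of change of the cost w.r.t. the bias
    # Gradient descent: step both parameters down the slope
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(sigmoid(X @ w + b))    # predictions approach the labels [1, 0, 1, 0]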