Week 2 - Intro to Neural Nets
Neural Net Structure

[Figure: a neural network maps an Input (a feature vector) to an Output (a label).]
Basic Neuron Visualization

[Figure, built up across several slides: a single neuron. Data from the previous layer arrives as inputs $x_1, x_2, x_3$ along edges with weights $w_1, w_2, w_3$, together with a constant input $1$ whose weight is the bias $b$. Some form of computation transforms the inputs: the neuron computes the net input

$$z = x_1 w_1 + x_2 w_2 + x_3 w_3 + b,$$

passes it through an activation function $f$, and outputs the transformed value $f(z)$.]
In Vector Notation

$$z = b + \sum_{i=1}^{m} x_i w_i$$

where $z$ is the "net input" and $b$ is the "bias term".
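For readers who want to see this in code, here is the net input as a NumPy dot product (a minimal sketch; the example values are illustrative, not from the slides):

```python
import numpy as np

def net_input(x, w, b):
    """Net input z = b + sum_{i=1}^{m} x_i * w_i, written as a dot product."""
    return b + np.dot(x, w)

# Illustrative values (not from the slides):
x = np.array([1.0, 2.0, 3.0])
w = np.array([0.1, 0.2, 0.3])
print(net_input(x, w, b=0.5))  # 0.5 + 0.1 + 0.4 + 0.9 = 1.9
```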
Relation to Logistic Regression

When we choose:

$$f(z) = \frac{1}{1 + e^{-z}}, \qquad z = b + \sum_{i=1}^{m} x_i w_i = x_1 w_1 + x_2 w_2 + \cdots + x_m w_m + b$$

the single neuron computes exactly the logistic regression model.
Relation to Logistic Regression

This is called the "sigmoid" function:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
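The sigmoid translates directly into code (a minimal sketch; for large negative $z$, np.exp(-z) can overflow, so library implementations typically use a numerically stable variant):

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))  # 0.5
print(sigmoid(2.6))  # ~0.93 (the worked example later in the deck)
```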
Nice Property of Sigmoid Function

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

Quotient rule: $\displaystyle \frac{d}{dx}\left[\frac{f(x)}{g(x)}\right] = \frac{f'(x)\,g(x) - f(x)\,g'(x)}{g(x)^2}$

$$\sigma'(z) = \frac{0 - (-e^{-z})}{(1 + e^{-z})^2} = \frac{e^{-z}}{(1 + e^{-z})^2}
= \frac{(1 + e^{-z}) - 1}{(1 + e^{-z})^2}
= \frac{1 + e^{-z}}{(1 + e^{-z})^2} - \frac{1}{(1 + e^{-z})^2}$$

$$= \frac{1}{1 + e^{-z}} - \frac{1}{(1 + e^{-z})^2}
= \frac{1}{1 + e^{-z}}\left(1 - \frac{1}{1 + e^{-z}}\right)
= \sigma(z)\,\bigl(1 - \sigma(z)\bigr)$$
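A quick numerical sanity check of the identity $\sigma'(z) = \sigma(z)(1 - \sigma(z))$, comparing it against a centered finite difference (a sketch; the step size $h$ is an arbitrary choice):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    """Derivative via the identity derived above: sigma'(z) = sigma(z)(1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

h = 1e-6  # finite-difference step (arbitrary small value)
for z in [-2.0, 0.0, 2.6]:
    numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2.0 * h)
    print(f"z={z}: identity={sigmoid_prime(z):.6f}, finite diff={numeric:.6f}")
```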
Example Neuron Computation

[Figure, built up across several slides: the same neuron with concrete values. Inputs $x_1 = 0.9$, $x_2 = 0.2$, $x_3 = 0.3$; weights $w_1 = 2$, $w_2 = 3$, $w_3 = -1$; bias $b = 0.5$; sigmoid activation.]

$$z = 0.9(2) + 0.2(3) + 0.3(-1) + 0.5 = 2.6$$

$$f(z) = f(2.6) = \frac{1}{1 + e^{-2.6}} \approx 0.93$$

The neuron would output the value 0.93.
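The same worked example in code (a minimal sketch reproducing the numbers above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.9, 0.2, 0.3])   # inputs
w = np.array([2.0, 3.0, -1.0])  # weights
b = 0.5                          # bias

z = np.dot(x, w) + b
print(z)           # 2.6
print(sigmoid(z))  # ~0.93, the neuron's output
```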
Why Neural Nets?

Why not just use a single neuron? Why do we need a larger network?

A single neuron (like logistic regression) only permits a linear decision boundary. Most real-world problems are considerably more complicated!
Feedforward Neural Network

[Figure: a fully connected feedforward network. Inputs $x_1, x_2, x_3$ pass through two hidden layers of sigmoid ($\sigma$) units to an output layer that produces $\hat{y}_1, \hat{y}_2, \hat{y}_3$.]
Weights

[Figure: the same network; the edges connecting consecutive layers are the weights.]
Input Layer

[Figure: the input layer holds the features $x_1, x_2, x_3$.]
Hidden Layers

[Figure: the layers of $\sigma$ units between the input and output layers are the hidden layers.]
Output Layer

[Figure: the final layer produces the outputs $\hat{y}_1, \hat{y}_2, \hat{y}_3$.]
Weights (represented by matrices)

[Figure: the weights between consecutive layers are collected into the matrices $W^{(1)}, W^{(2)}, W^{(3)}$.]
Net Input (sum of weighted inputs, before activation function)

[Figure: the net inputs to each layer are the vectors $z^{(2)}, z^{(3)}, z^{(4)}$.]
Activations (output of neurons to next layer)

[Figure: the activations of each layer are $a^{(1)}, a^{(2)}, a^{(3)}, a^{(4)}$, where $a^{(1)} = x$ is the input and $a^{(4)} = \hat{y}$ is the output.]
Matrix representation of computation

$x = [x_1, x_2, x_3]$  (and $x = a^{(1)}$)

$W^{(1)}$ is a $3 \times 4$ matrix; $z^{(2)}$ and $a^{(2)}$ are 4-vectors.

$$z^{(2)} = x W^{(1)}$$
$$a^{(2)} = \sigma(z^{(2)})$$

where $\sigma$ is applied element-wise.
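The first layer's computation in NumPy (a minimal sketch; the weight values are random placeholders, and bias terms are omitted to match the slide's equations):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

x = np.array([0.9, 0.2, 0.3])     # a^(1): the input row vector (length 3)
W1 = rng.standard_normal((3, 4))  # W^(1): a 3x4 weight matrix (random placeholder)

z2 = x @ W1       # z^(2) = x W^(1), a 4-vector
a2 = sigmoid(z2)  # a^(2) = sigma(z^(2)), applied element-wise
print(z2.shape, a2.shape)  # (4,) (4,)
```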
Continuing the Computation

Applying the same pattern layer by layer: $z^{(3)} = a^{(2)} W^{(2)}$, $a^{(3)} = \sigma(z^{(3)})$, $z^{(4)} = a^{(3)} W^{(3)}$, $\hat{y} = a^{(4)} = \sigma(z^{(4)})$.

For a single training instance (data point):
Input: vector $x$ (a row vector of length 3)
Output: vector $\hat{y}$ (a row vector of length 3)
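The full forward pass for one instance, under the layer shapes implied by the slides (3 inputs, two hidden layers of 4 units, 3 outputs); the random weights are illustrative, and biases are again omitted to match the slide's equations:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(a, weights):
    """Forward pass: repeatedly apply a <- sigmoid(a @ W), one step per layer."""
    for W in weights:
        a = sigmoid(a @ W)
    return a

rng = np.random.default_rng(0)
weights = [rng.standard_normal((3, 4)),   # W^(1)
           rng.standard_normal((4, 4)),   # W^(2)
           rng.standard_normal((4, 3))]   # W^(3)

x = np.array([0.9, 0.2, 0.3])  # one instance (row vector of length 3)
y_hat = forward(x, weights)
print(y_hat.shape)  # (3,) -- one prediction
```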
Multiple data points

In practice, we do these computations for many data points at the same time, by "stacking" the rows into a matrix. But the equations look the same!

Input: matrix $X$ (an $n \times 3$ matrix) (each row a single instance)
Output: matrix $\hat{Y}$ (an $n \times 3$ matrix) (each row a single prediction)
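Because matrix multiplication already handles stacked rows, the forward function from the previous sketch works on a batch without any changes:

```python
# Continuing the previous sketch: stack n = 5 instances as the rows of a matrix.
X = rng.standard_normal((5, 3))  # an n x 3 input matrix (illustrative random data)
Y_hat = forward(X, weights)      # the equations (and code) look the same
print(Y_hat.shape)               # (5, 3) -- one prediction per row
```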
Now we know how feedforward NNs do their computations.
Next, we will learn how to adjust the weights to learn from data.