Week 2 - Intro to Neural Nets

Motivation for Neural Nets

• Use biology as inspiration for a mathematical model
• Get signals from previous neurons
• Generate signals (or not) according to inputs
• Pass signals on to next neurons
• By layering many neurons, we can create a complex model

2
Neural Net Structure

[Figure: a network mapping an Input (Feature Vector) to an Output (Label)]

• Can think of it as a complicated computation engine
• We will "train it" using our training data
• Then (hopefully) it will give good answers on new data

3
Basic Neuron Visualization

[Figure: a single neuron, drawn as a node containing an activation function]

4
Basic Neuron Visualization

[Figure: data from the previous layer flows into the neuron's activation function]

5
Basic Neuron Visualization
[Figure: inside the neuron, some form of computation (the activation function) transforms the inputs]

6
Basic Neuron Visualization

[Figure: the neuron outputs the transformed data to the next layer]

7
Basic Neuron Visualization
[Figure: inputs x1, x2, x3 feed into the neuron, each connection carrying a weight (e.g., w2)]

8
Basic Neuron Visualization
[Figure: in addition to the weighted inputs x1, x2, x3, a constant input of 1 feeds into the neuron (the bias)]

9
Basic Neuron Visualization
[Figure: the neuron computes the net input z = x1w1 + x2w2 + x3w3 + b and outputs f(z), where f is the activation function]

10
In Vector Notation
z = "net input"             z = b + \sum_{i=1}^{m} x_i w_i
b = "bias term"
f = activation function     z = b + x^T w
a = output to next layer    a = f(z)
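As a concrete illustration (not from the slides), here is a minimal NumPy sketch of this computation; x, w, b, and f are just the quantities defined above, with made-up example values.

```python
import numpy as np

def neuron(x, w, b, f):
    """Single neuron: net input z = b + x.w, then activation a = f(z)."""
    z = b + np.dot(x, w)        # z = b + sum_i x_i * w_i
    return f(z)

# Example with arbitrary numbers and the identity activation
a = neuron(np.array([1.0, 2.0]), np.array([0.5, -0.25]), 0.1, lambda z: z)
print(a)  # 0.1 + 1.0*0.5 + 2.0*(-0.25) = 0.1
```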

11
Relation to Logistic Regression
When we choose:  f(z) = 1 / (1 + e^{-z})

and  z = b + \sum_{i=1}^{m} x_i w_i = x_1 w_1 + x_2 w_2 + ... + x_m w_m + b

then a neuron is simply a "unit" of logistic regression!

weights ↔ coefficients        inputs ↔ variables
bias term ↔ constant term

12
Relation to Logistic Regression
This is called the "sigmoid" function:  σ(z) = 1 / (1 + e^{-z})

13
Nice Property of Sigmoid Function
σ(z) = 1 / (1 + e^{-z})

Quotient rule:  d/dx [ f(x) / g(x) ] = ( f'(x) g(x) - f(x) g'(x) ) / g(x)^2

σ'(z) = ( 0 - (-e^{-z}) ) / (1 + e^{-z})^2
      = e^{-z} / (1 + e^{-z})^2
      = ( (1 + e^{-z}) - 1 ) / (1 + e^{-z})^2
      = 1 / (1 + e^{-z}) - 1 / (1 + e^{-z})^2
      = ( 1 / (1 + e^{-z}) ) * ( 1 - 1 / (1 + e^{-z}) )

σ'(z) = σ(z)(1 - σ(z))        This will be helpful!
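A quick numerical sanity check of this identity (a sketch, not from the slides): compare σ(z)(1 − σ(z)) against a central-difference estimate of the derivative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-5.0, 5.0, 11)
analytic = sigmoid(z) * (1.0 - sigmoid(z))                     # sigma'(z) = sigma(z)(1 - sigma(z))
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2.0 * eps)  # central difference
print(np.allclose(analytic, numeric, atol=1e-8))               # True
```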

14
Example Neuron Computation
[Figure: a neuron with inputs x1, x2, x3, net input z = x1w1 + x2w2 + x3w3 + b, and a sigmoid activation function producing f(z)]

15
Example Neuron Computation
[Figure: the same neuron with concrete inputs x1 = .9, x2 = .2, x3 = .3; net input z = x1w1 + x2w2 + x3w3 + b; sigmoid activation]

16
Example Neuron Computation
[Figure: plugging in the values: z = .9(2) + .2(3) + .3(-1) + .5 = 2.6; sigmoid activation f(z)]

17
Example Neuron Computation
[Figure: z = .9(2) + .2(3) + .3(-1) + .5 = 2.6; applying the sigmoid: f(z) = f(2.6) = 1/(1 + exp(-2.6)) ≈ .93]

18
Example Neuron Computation
[Figure: z = .9(2) + .2(3) + .3(-1) + .5 = 2.6; f(z) = f(2.6) = 1/(1 + exp(-2.6)) ≈ .93]

The neuron would output the value .93
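This worked example is easy to reproduce directly; here is a small sketch using the numbers from the slide.

```python
import numpy as np

x = np.array([0.9, 0.2, 0.3])    # inputs
w = np.array([2.0, 3.0, -1.0])   # weights
b = 0.5                          # bias term

z = b + np.dot(x, w)             # .9(2) + .2(3) + .3(-1) + .5 = 2.6
a = 1.0 / (1.0 + np.exp(-z))     # sigmoid(2.6) is about 0.93
print(round(z, 2), round(a, 2))  # 2.6 0.93
```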

19
Why Neural Nets?
• Why not just use a single neuron? Why do we need a larger network?
• A single neuron (like logistic regression) only permits a linear decision boundary.
• Most real-world problems are considerably more complicated!

20
Feedforward Neural Network
[Figure: a feedforward network with inputs x1, x2, x3, two hidden layers of four sigmoid (σ) units each, and outputs y1, y2, y3]

21
Weights
[Figure: the same network, highlighting the weights on the connections between layers]

22
Input Layer
[Figure: the same network, highlighting the input layer (x1, x2, x3)]

23
Hidden Layers
[Figure: the same network, highlighting the two hidden layers of σ units]

24
Output Layer
[Figure: the same network, highlighting the output layer (y1, y2, y3)]

25
Weights (represented by matrices)
[Figure: the weights between consecutive layers collected into matrices W(1), W(2), W(3)]

26
Net Input (sum of weighted inputs, before activation function)
[Figure: the net inputs to the second, third, and fourth layers labeled z(2), z(3), z(4)]

27
Activations (output of neurons to next layer)
[Figure: the activations of each layer labeled a(1) (the input x), a(2), a(3), and a(4) (the output)]

28
Matrix representation of computation
[Figure: the input layer (x1, x2, x3) connected through W(1) to the first hidden layer of four σ units]

x = (x1, x2, x3)        (x = a(1))
W(1) is a 3x4 matrix
z(2) = x W(1)           z(2) is a 4-vector
a(2) = σ(z(2))          a(2) is a 4-vector
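To make these shapes concrete, here is a minimal NumPy sketch; the entries of W(1) are arbitrary placeholders, since the slides do not specify any values.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.array([[0.9, 0.2, 0.3]])   # a(1) = x, a row vector of length 3 (shape (1, 3))
W1 = rng.normal(size=(3, 4))      # W(1) is a 3x4 matrix

z2 = x @ W1                       # z(2) = x W(1), a 4-vector (shape (1, 4))
a2 = 1.0 / (1.0 + np.exp(-z2))    # a(2) = sigmoid(z(2)), also a 4-vector
print(z2.shape, a2.shape)         # (1, 4) (1, 4)
```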

29
Continuing the Computation
For a single training instance (data point):
Input: vector x (a row vector of length 3)
Output: vector y (a row vector of length 3)

z(2) = x W(1)        a(2) = σ(z(2))
z(3) = a(2) W(2)     a(3) = σ(z(3))
z(4) = a(3) W(3)     y = softmax(z(4))
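A minimal end-to-end sketch of this forward pass for one instance, assuming four units per hidden layer and randomly initialized (untrained) weights; the slides do not specify these values, and bias terms are omitted here just as in the equations above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())          # subtract the max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 4))         # W(1): input (3) -> hidden layer 1 (4)
W2 = rng.normal(size=(4, 4))         # W(2): hidden layer 1 (4) -> hidden layer 2 (4)
W3 = rng.normal(size=(4, 3))         # W(3): hidden layer 2 (4) -> output (3)

x = np.array([0.9, 0.2, 0.3])        # a(1) = x

z2 = x @ W1;  a2 = sigmoid(z2)       # z(2) = x W(1),    a(2) = sigmoid(z(2))
z3 = a2 @ W2; a3 = sigmoid(z3)       # z(3) = a(2) W(2), a(3) = sigmoid(z(3))
z4 = a3 @ W3; y_hat = softmax(z4)    # z(4) = a(3) W(3), y = softmax(z(4))
print(y_hat, y_hat.sum())            # three probabilities that sum to 1
```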

30
Multiple data points
In practice, we do these computations for many data points at the same time,
by "stacking" the rows into a matrix. But the equations look the same!
Input: matrix x (an n×3 matrix) (each row a single instance)
Output: matrix y (an n×3 matrix) (each row a single prediction)

z(2) = x W(1)        a(2) = σ(z(2))
z(3) = a(2) W(2)     a(3) = σ(z(3))
z(4) = a(3) W(3)     y = softmax(z(4))
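The batched version is the same sketch with the input stacked into an n×3 matrix; the matrix products handle all rows at once, and softmax is simply applied row-wise (again with placeholder weights not taken from the slides).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, W2, W3 = rng.normal(size=(3, 4)), rng.normal(size=(4, 4)), rng.normal(size=(4, 3))

X = np.array([[0.9, 0.2, 0.3],
              [0.1, 0.7, 0.4]])                  # n = 2 instances, one per row

A2 = sigmoid(X @ W1)                              # shape (n, 4)
A3 = sigmoid(A2 @ W2)                             # shape (n, 4)
Z4 = A3 @ W3                                      # shape (n, 3)
E = np.exp(Z4 - Z4.max(axis=1, keepdims=True))    # row-wise softmax
Y_hat = E / E.sum(axis=1, keepdims=True)
print(Y_hat.shape, Y_hat.sum(axis=1))             # (2, 3), each row sums to 1
```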

31
Now we know how feedforward NNs do computations.
Next, we will learn how to adjust the weights to learn from data.

32
