
Convex Optimization
Lecture 16 - Softmax Regression and Neural Networks

Instructor: Yuanzhang Xiao

University of Hawaii at Manoa

Fall 2017


Today’s Lecture

1 Softmax Regression

2 Neural Networks


Outline

1 Softmax Regression

2 Neural Networks


Softmax Regression
extend logistic regression to multi-class classification

training data $\big(a^{(i)}, b^{(i)}\big)$, $i = 1, \ldots, m$

labels with $K$ values: $b^{(i)} \in \{1, \ldots, K\}$

hypothesis:

$$h_x(a) = \begin{bmatrix} P(b = 1 \mid a;\, x) \\ \vdots \\ P(b = K \mid a;\, x) \end{bmatrix} = \frac{1}{\sum_{k=1}^{K} \exp\big(x^{(k)T} a\big)} \begin{bmatrix} \exp\big(x^{(1)T} a\big) \\ \vdots \\ \exp\big(x^{(K)T} a\big) \end{bmatrix}$$

parameters to learn:

$$x = \big[ x^{(1)}, \ldots, x^{(K)} \big] \in \mathbb{R}^{n \times K}$$
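As an illustration (not from the slides), a minimal NumPy sketch of this hypothesis; the function and variable names are my own:

```python
import numpy as np

def softmax_hypothesis(X, A):
    """Class probabilities h_x(a^(i)) for every training example.

    X : (n, K) parameter matrix, one column x^(k) per class.
    A : (m, n) feature matrix, one row a^(i) per example.
    Returns an (m, K) matrix whose i-th row is h_x(a^(i)).
    """
    scores = A @ X                                        # entries x^(k)T a^(i)
    scores = scores - scores.max(axis=1, keepdims=True)   # for numerical stability
    exps = np.exp(scores)
    return exps / exps.sum(axis=1, keepdims=True)         # normalize each row
```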

Maximum Log-Likelihood Estimator


given training data $\big(a^{(i)}, b^{(i)}\big)_{i=1}^{m}$, the probability of this sequence is

$$\prod_{i=1}^{m} \prod_{k=1}^{K} \left[ \frac{\exp\big(x^{(k)T} a^{(i)}\big)}{\sum_{j=1}^{K} \exp\big(x^{(j)T} a^{(i)}\big)} \right]^{1\{b^{(i)} = k\}}$$

log-likelihood is

$$\ell(x) = \sum_{i=1}^{m} \sum_{k=1}^{K} 1\{b^{(i)} = k\} \cdot \log \frac{\exp\big(x^{(k)T} a^{(i)}\big)}{\sum_{j=1}^{K} \exp\big(x^{(j)T} a^{(i)}\big)}$$

gradient is

$$\frac{\partial \ell(x)}{\partial x^{(k)}} = \sum_{i=1}^{m} a^{(i)} \cdot \left( 1\{b^{(i)} = k\} - \frac{\exp\big(x^{(k)T} a^{(i)}\big)}{\sum_{j=1}^{K} \exp\big(x^{(j)T} a^{(i)}\big)} \right)$$
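A NumPy sketch of the log-likelihood and its gradient, assuming labels are stored 0-based as {0, ..., K-1} (the slides use {1, ..., K}); names are illustrative, not from the lecture:

```python
import numpy as np

def softmax_loglik_and_grad(X, A, b):
    """Log-likelihood l(x) and its gradient for softmax regression.

    X : (n, K) parameters, A : (m, n) features,
    b : (m,) integer labels, stored 0-based as {0, ..., K-1}.
    Returns (loglik, grad); column k of grad is dl/dx^(k).
    """
    K = X.shape[1]
    scores = A @ X
    scores = scores - scores.max(axis=1, keepdims=True)
    P = np.exp(scores)
    P = P / P.sum(axis=1, keepdims=True)   # P[i, k] = softmax probability of class k
    onehot = np.eye(K)[b]                  # indicator 1{b^(i) = k} as an (m, K) matrix
    loglik = np.sum(onehot * np.log(P))
    grad = A.T @ (onehot - P)              # sum_i a^(i) (1{b^(i)=k} - P[i, k])
    return loglik, grad
```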


Outline

1 Softmax Regression

2 Neural Networks


Neural Networks – A Single Neuron


fit a training example $(x, y)$ with a neuron:

• input: $x$ and an intercept term $+1$
• output: $h_{w,b}(x) = f(w^T x + b)$
• $f$ is the activation function

some choices of activation function:

• sigmoid: $f(z) = \frac{1}{1 + e^{-z}}$ (as in logistic regression)
• tanh: $f(z) = \tanh(z) = \frac{e^z - e^{-z}}{e^z + e^{-z}}$
• rectified linear: $f(z) = \max\{0, z\}$ (used in deep neural networks)
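For reference, the three activation functions written out in NumPy (a sketch, not part of the lecture):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # as in logistic regression

def tanh(z):
    return np.tanh(z)                 # (e^z - e^-z) / (e^z + e^-z), a rescaled sigmoid

def relu(z):
    return np.maximum(0.0, z)         # rectified linear, max{0, z}
```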

Neural Networks – Activation Functions


illustration of activation functions

• tanh: rescaled sigmoid


• rectified linear: unbounded

Neural Networks – Basic Architecture


neural network: network of neurons
a three-layer neural network:

• input layer: the leftmost layer


• output layer: the rightmost layer
• hidden layers: the layers in the middle

Neural Networks – Parameters to Learn


a three-layer neural network:

number of layers $n_\ell = 3$

weight of the link from unit $j$ in layer $\ell$ to unit $i$ in layer $\ell + 1$: $W_{ij}^{(\ell)}$

parameters to learn: $\big(W^{(1)}, b^{(1)}, \ldots, W^{(n_\ell - 1)}, b^{(n_\ell - 1)}\big)$



Neural Networks – Example

the activation (i.e., output) of unit $i$ at layer $\ell$: $a_i^{(\ell)}$

computation:

$$a_1^{(2)} = f\big(W_{11}^{(1)} x_1 + W_{12}^{(1)} x_2 + W_{13}^{(1)} x_3 + b_1^{(1)}\big)$$
$$a_2^{(2)} = f\big(W_{21}^{(1)} x_1 + W_{22}^{(1)} x_2 + W_{23}^{(1)} x_3 + b_2^{(1)}\big)$$
$$a_3^{(2)} = f\big(W_{31}^{(1)} x_1 + W_{32}^{(1)} x_2 + W_{33}^{(1)} x_3 + b_3^{(1)}\big)$$
$$h_{W,b}(x) = a_1^{(3)} = f\big(W_{11}^{(2)} a_1^{(2)} + W_{12}^{(2)} a_2^{(2)} + W_{13}^{(2)} a_3^{(2)} + b_1^{(2)}\big)$$

Neural Networks – Compact Representation


a three-layer neural network:

define weighted sum of inputs to unit $i$ in layer $\ell$ as

$$z_i^{(\ell)} = \sum_{j=1}^{n} W_{ij}^{(\ell-1)} a_j^{(\ell-1)} + b_i^{(\ell-1)},$$

where

$$a_j^{(\ell-1)} = f\big(z_j^{(\ell-1)}\big)$$

Neural Networks – Compact Representation

a three-layer neural network:

compact representation: (forward propagation)

$$z^{(\ell+1)} = W^{(\ell)} a^{(\ell)} + b^{(\ell)}$$
$$a^{(\ell+1)} = f\big(z^{(\ell+1)}\big)$$
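A minimal NumPy sketch of this forward propagation, assuming `Ws` and `bs` hold the matrices $W^{(\ell)}$ and vectors $b^{(\ell)}$ in order (names are my own, not from the slides):

```python
import numpy as np

def forward(x, Ws, bs, f=np.tanh):
    """Forward propagation: z^(l+1) = W^(l) a^(l) + b^(l), a^(l+1) = f(z^(l+1)).

    Ws, bs : lists of the weight matrices W^(l) and bias vectors b^(l), in order.
    Returns the lists of all z^(l) and a^(l) (both are reused by backpropagation).
    """
    a = np.asarray(x)
    zs, activations = [], [a]
    for W, b in zip(Ws, bs):
        z = W @ a + b          # weighted sum of inputs to the next layer
        a = f(z)               # activation of the next layer
        zs.append(z)
        activations.append(a)
    return zs, activations
```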


Neural Networks – Extensions

may have different architectures (i.e., network topology)


• different numbers $s_\ell$ of units in each layer $\ell$
• different connectivity

may have loops

may have multiple output units


Neural Networks – Optimization

minimize the prediction error while penalizing large weights (weight decay):

$$J(W, b) = \left[ \frac{1}{m} \sum_{i=1}^{m} J\big(W, b; x^{(i)}, y^{(i)}\big) \right] + \frac{\lambda}{2} \sum_{\ell=1}^{n_\ell - 1} \sum_{j=1}^{s_\ell} \sum_{i=1}^{s_{\ell+1}} \Big( W_{ij}^{(\ell)} \Big)^{2}$$

where $J(W, b; x^{(i)}, y^{(i)})$ is the prediction error of sample $i$

$$J\big(W, b; x^{(i)}, y^{(i)}\big) = \frac{1}{2} \Big\| h_{W,b}\big(x^{(i)}\big) - y^{(i)} \Big\|^{2}$$

characteristics:
• nonconvex – gradient descent used in practice
• initialization: small random values near 0 (but not all zeros)
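A sketch of this cost function in NumPy (illustrative only; the squared-error loss and the weight decay term follow the formula above, and `Ws`, `bs`, `xs`, `ys`, `lam` are assumed names):

```python
import numpy as np

def cost(Ws, bs, xs, ys, lam, f=np.tanh):
    """J(W, b): average squared prediction error plus the weight decay penalty."""
    m = len(xs)
    err = 0.0
    for x, y in zip(xs, ys):
        a = np.asarray(x)
        for W, b in zip(Ws, bs):            # forward propagation for this sample
            a = f(W @ a + b)
        err += 0.5 * np.sum((a - y) ** 2)   # J(W, b; x, y)
    penalty = 0.5 * lam * sum(np.sum(W ** 2) for W in Ws)
    return err / m + penalty
```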


Neural Networks – Calculating Gradients

need to compute gradients:


 
$$\frac{\partial J(W, b)}{\partial W_{ij}^{(\ell)}} = \left[ \frac{1}{m} \sum_{i=1}^{m} \frac{\partial J\big(W, b; x^{(i)}, y^{(i)}\big)}{\partial W_{ij}^{(\ell)}} \right] + \lambda W_{ij}^{(\ell)}$$

$$\frac{\partial J(W, b)}{\partial b_i^{(\ell)}} = \frac{1}{m} \sum_{i=1}^{m} \frac{\partial J\big(W, b; x^{(i)}, y^{(i)}\big)}{\partial b_i^{(\ell)}}$$

backpropagation to compute $\dfrac{\partial J\big(W, b; x^{(i)}, y^{(i)}\big)}{\partial W_{ij}^{(\ell)}}$ and $\dfrac{\partial J\big(W, b; x^{(i)}, y^{(i)}\big)}{\partial b_i^{(\ell)}}$
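For concreteness, a sketch (not from the slides) of assembling the full gradient from the per-sample gradients; `sample_grad` stands for any routine that returns the per-sample derivatives, e.g. the backpropagation procedure on the following slides:

```python
import numpy as np

def full_gradient(Ws, bs, xs, ys, lam, sample_grad):
    """Assemble dJ/dW^(l) and dJ/db^(l) from per-sample gradients.

    sample_grad(x, y, Ws, bs) must return (dWs, dbs), the gradients of
    J(W, b; x, y) for a single sample, e.g. computed by backpropagation.
    """
    m = len(xs)
    gWs = [np.zeros_like(W) for W in Ws]
    gbs = [np.zeros_like(b) for b in bs]
    for x, y in zip(xs, ys):
        dWs, dbs = sample_grad(x, y, Ws, bs)
        for l in range(len(Ws)):
            gWs[l] += dWs[l] / m            # (1/m) * sum of per-sample terms
            gbs[l] += dbs[l] / m
    for l in range(len(Ws)):
        gWs[l] += lam * Ws[l]               # weight decay term lambda * W^(l)
    return gWs, gbs
```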


Neural Networks – Backpropagation


for the output layer: (the superscript of sample index removed)

$$\frac{\partial J(W, b; x, y)}{\partial W_{ij}^{(n_\ell - 1)}}
= \frac{\partial}{\partial W_{ij}^{(n_\ell - 1)}}\, \frac{1}{2} \Bigg( y_i - f\bigg( \underbrace{\sum_{j=1}^{n} W_{ij}^{(n_\ell - 1)} a_j^{(n_\ell - 1)} + b_i^{(n_\ell - 1)}}_{=\, z_i^{(n_\ell)},\ \ f(z_i^{(n_\ell)}) = a_i^{(n_\ell)}} \bigg) \Bigg)^{2}$$

$$= \big( y_i - a_i^{(n_\ell)} \big) \cdot \Big( -f'\big(z_i^{(n_\ell)}\big) \Big) \cdot \frac{\partial z_i^{(n_\ell)}}{\partial W_{ij}^{(n_\ell - 1)}}$$

$$= \underbrace{ -\big( y_i - a_i^{(n_\ell)} \big) \cdot f'\big(z_i^{(n_\ell)}\big) }_{\triangleq\, \delta_i^{(n_\ell)}} \cdot\, a_j^{(n_\ell - 1)}$$

Neural Networks – Backpropagation


for the middle layer $n_\ell - 1$: (the superscript of sample index removed)

$$\frac{\partial J(W, b; x, y)}{\partial W_{ij}^{(n_\ell - 2)}}
= \frac{\partial}{\partial W_{ij}^{(n_\ell - 2)}} \left\{ \frac{1}{2} \sum_{k=1}^{s_{n_\ell}} \Big[ y_k - f\big(z_k^{(n_\ell)}\big) \Big]^{2} \right\}$$

$$= \sum_{k=1}^{s_{n_\ell}} \big( y_k - a_k^{(n_\ell)} \big) \cdot \Big( -f'\big(z_k^{(n_\ell)}\big) \Big) \cdot \frac{\partial z_k^{(n_\ell)}}{\partial a_i^{(n_\ell - 1)}} \cdot \frac{\partial a_i^{(n_\ell - 1)}}{\partial z_i^{(n_\ell - 1)}} \cdot \frac{\partial z_i^{(n_\ell - 1)}}{\partial W_{ij}^{(n_\ell - 2)}}$$

$$= \underbrace{ \sum_{k=1}^{s_{n_\ell}} \delta_k^{(n_\ell)} \cdot W_{ki}^{(n_\ell - 1)} \cdot f'\big(z_i^{(n_\ell - 1)}\big) }_{\triangleq\, \delta_i^{(n_\ell - 1)}} \cdot\, a_j^{(n_\ell - 2)}$$


Neural Networks – Backpropagation


backpropagation:

• a forward propagation to determine all the $a_i^{(\ell)}$, $z_i^{(\ell)}$
• for the output layer, set
$$\delta_i^{(n_\ell)} = -\big( y_i - a_i^{(n_\ell)} \big) \cdot f'\big(z_i^{(n_\ell)}\big)$$
• for middle layers $\ell = n_\ell - 1, \ldots, 2$ and each node $i$ in layer $\ell$, set
$$\delta_i^{(\ell)} = \left( \sum_{j=1}^{s_{\ell+1}} W_{ji}^{(\ell)} \delta_j^{(\ell+1)} \right) f'\big(z_i^{(\ell)}\big)$$
• compute gradients
$$\frac{\partial J(W, b; x, y)}{\partial W_{ij}^{(\ell)}} = a_j^{(\ell)} \delta_i^{(\ell+1)}$$
$$\frac{\partial J(W, b; x, y)}{\partial b_i^{(\ell)}} = \delta_i^{(\ell+1)}$$
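Putting the four steps together, a minimal NumPy sketch of backpropagation for the squared-error cost above; layer indexing is 0-based here, so `Ws[l]` plays the role of $W^{(\ell+1)}$ in the slides' numbering, and the tanh derivative is used as an example $f'$:

```python
import numpy as np

def backprop(x, y, Ws, bs, f=np.tanh, fprime=lambda z: 1.0 - np.tanh(z) ** 2):
    """Gradients of J(W, b; x, y) = 0.5 * ||h_{W,b}(x) - y||^2 via the delta recursion."""
    # forward propagation: store all z^(l) and a^(l)
    a = np.asarray(x)
    zs, acts = [], [a]
    for W, b in zip(Ws, bs):
        z = W @ a + b
        a = f(z)
        zs.append(z)
        acts.append(a)
    # output layer: delta^(n_l) = -(y - a^(n_l)) * f'(z^(n_l))
    delta = -(np.asarray(y) - acts[-1]) * fprime(zs[-1])
    dWs = [None] * len(Ws)
    dbs = [None] * len(bs)
    for l in reversed(range(len(Ws))):      # Ws[l] corresponds to W^(l+1) in the slides
        dWs[l] = np.outer(delta, acts[l])   # dJ/dW^(l+1) = delta^(l+2) * a^(l+1)^T
        dbs[l] = delta                      # dJ/db^(l+1) = delta^(l+2)
        if l > 0:
            # middle layers: delta^(l+1) = (W^(l+1)^T delta^(l+2)) * f'(z^(l+1))
            delta = (Ws[l].T @ delta) * fprime(zs[l - 1])
    return dWs, dbs
```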