
Convex Optimization
Lecture 16 - Softmax Regression and Neural Networks

Instructor: Yuanzhang Xiao

University of Hawaii at Manoa

Fall 2017


Today’s Lecture

1 Softmax Regression

2 Neural Networks


Outline

1 Softmax Regression

2 Neural Networks


Softmax Regression
extend logistic regression to multi-class classification

training data $\big(a^{(i)}, b^{(i)}\big)$, $i = 1, \ldots, m$

labels with $K$ values: $b^{(i)} \in \{1, \ldots, K\}$

hypothesis:

$$h_x(a) = \begin{bmatrix} P(b = 1 \mid a;\, x) \\ \vdots \\ P(b = K \mid a;\, x) \end{bmatrix} = \frac{1}{\sum_{k=1}^{K} \exp\big(x^{(k)T} a\big)} \begin{bmatrix} \exp\big(x^{(1)T} a\big) \\ \vdots \\ \exp\big(x^{(K)T} a\big) \end{bmatrix}$$

parameters to learn:

$$x = \big[ x^{(1)}, \ldots, x^{(K)} \big] \in \mathbb{R}^{n \times K}$$
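As an illustration (not from the slides), a minimal NumPy sketch of this hypothesis; the function and variable names are my own:

```python
import numpy as np

def softmax_hypothesis(X, A):
    """Class probabilities h_x(a^(i)) for every training example.

    X : (n, K) parameter matrix, one column x^(k) per class.
    A : (m, n) feature matrix, one row a^(i) per example.
    Returns an (m, K) matrix whose i-th row is h_x(a^(i)).
    """
    scores = A @ X                                        # entries x^(k)T a^(i)
    scores = scores - scores.max(axis=1, keepdims=True)   # for numerical stability
    exps = np.exp(scores)
    return exps / exps.sum(axis=1, keepdims=True)         # normalize each row
```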

Maximum Log-Likelihood Estimator


given training data $\big(a^{(i)}, b^{(i)}\big)_{i=1}^{m}$, the probability of this sequence is

$$\prod_{i=1}^{m} \prod_{k=1}^{K} \left[ \frac{\exp\big(x^{(k)T} a^{(i)}\big)}{\sum_{j=1}^{K} \exp\big(x^{(j)T} a^{(i)}\big)} \right]^{1\{b^{(i)} = k\}}$$

log-likelihood is

$$\ell(x) = \sum_{i=1}^{m} \sum_{k=1}^{K} 1\{b^{(i)} = k\} \cdot \log \frac{\exp\big(x^{(k)T} a^{(i)}\big)}{\sum_{j=1}^{K} \exp\big(x^{(j)T} a^{(i)}\big)}$$

gradient is

$$\frac{\partial \ell(x)}{\partial x^{(k)}} = \sum_{i=1}^{m} a^{(i)} \cdot \left( 1\{b^{(i)} = k\} - \frac{\exp\big(x^{(k)T} a^{(i)}\big)}{\sum_{j=1}^{K} \exp\big(x^{(j)T} a^{(i)}\big)} \right)$$
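A NumPy sketch of the log-likelihood and its gradient, assuming labels are stored 0-based as {0, ..., K-1} (the slides use {1, ..., K}); names are illustrative, not from the lecture:

```python
import numpy as np

def softmax_loglik_and_grad(X, A, b):
    """Log-likelihood l(x) and its gradient for softmax regression.

    X : (n, K) parameters, A : (m, n) features,
    b : (m,) integer labels, stored 0-based as {0, ..., K-1}.
    Returns (loglik, grad); column k of grad is dl/dx^(k).
    """
    K = X.shape[1]
    scores = A @ X
    scores = scores - scores.max(axis=1, keepdims=True)
    P = np.exp(scores)
    P = P / P.sum(axis=1, keepdims=True)   # P[i, k] = softmax probability of class k
    onehot = np.eye(K)[b]                  # indicator 1{b^(i) = k} as an (m, K) matrix
    loglik = np.sum(onehot * np.log(P))
    grad = A.T @ (onehot - P)              # sum_i a^(i) (1{b^(i)=k} - P[i, k])
    return loglik, grad
```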


Outline

1 Softmax Regression

2 Neural Networks


Neural Networks – A Single Neuron


fit a training example $(x, y)$ with a neuron:

• input: $x$ and an intercept term $+1$
• output: $h_{w,b}(x) = f(w^T x + b)$
• $f$ is the activation function

some choices of activation function:

• sigmoid: $f(z) = \frac{1}{1 + e^{-z}}$ (as in logistic regression)
• tanh: $f(z) = \tanh(z) = \frac{e^z - e^{-z}}{e^z + e^{-z}}$
• rectified linear: $f(z) = \max\{0, z\}$ (used in deep neural networks)
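For reference, the three activation functions written out in NumPy (a sketch, not part of the lecture):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # as in logistic regression

def tanh(z):
    return np.tanh(z)                 # (e^z - e^-z) / (e^z + e^-z), a rescaled sigmoid

def relu(z):
    return np.maximum(0.0, z)         # rectified linear, max{0, z}
```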

Neural Networks – Activation Functions


illustration of activation functions

• tanh: rescaled sigmoid


• rectified linear: unbounded

Neural Networks – Basic Architecture


neural network: network of neurons
a three-layer neural network:

• input layer: the leftmost layer


• output layer: the rightmost layer
• hidden layers: the layers in the middle

Neural Networks – Parameters to Learn


a three-layer neural network:

number of layers $n_\ell = 3$

weight of the link from unit $j$ in layer $\ell$ to unit $i$ in layer $\ell + 1$: $W_{ij}^{(\ell)}$

parameters to learn: $\big(W^{(1)}, b^{(1)}, \ldots, W^{(n_\ell - 1)}, b^{(n_\ell - 1)}\big)$



Neural Networks – Example

the activation (i.e., output) of unit $i$ at layer $\ell$: $a_i^{(\ell)}$

computation:

$$a_1^{(2)} = f\big(W_{11}^{(1)} x_1 + W_{12}^{(1)} x_2 + W_{13}^{(1)} x_3 + b_1^{(1)}\big)$$
$$a_2^{(2)} = f\big(W_{21}^{(1)} x_1 + W_{22}^{(1)} x_2 + W_{23}^{(1)} x_3 + b_2^{(1)}\big)$$
$$a_3^{(2)} = f\big(W_{31}^{(1)} x_1 + W_{32}^{(1)} x_2 + W_{33}^{(1)} x_3 + b_3^{(1)}\big)$$
$$h_{W,b}(x) = a_1^{(3)} = f\big(W_{11}^{(2)} a_1^{(2)} + W_{12}^{(2)} a_2^{(2)} + W_{13}^{(2)} a_3^{(2)} + b_1^{(2)}\big)$$

Neural Networks – Compact Representation


a three-layer neural network:

define weighted sum of inputs to unit $i$ in layer $\ell$ as

$$z_i^{(\ell)} = \sum_{j=1}^{n} W_{ij}^{(\ell-1)} a_j^{(\ell-1)} + b_i^{(\ell-1)},$$

where

$$a_j^{(\ell-1)} = f\big(z_j^{(\ell-1)}\big)$$

Neural Networks – Compact Representation

a three-layer neural network:

compact representation: (forward propagation)

$$z^{(\ell+1)} = W^{(\ell)} a^{(\ell)} + b^{(\ell)}$$
$$a^{(\ell+1)} = f\big(z^{(\ell+1)}\big)$$
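A minimal NumPy sketch of this forward propagation, assuming `Ws` and `bs` hold the matrices $W^{(\ell)}$ and vectors $b^{(\ell)}$ in order (names are my own, not from the slides):

```python
import numpy as np

def forward(x, Ws, bs, f=np.tanh):
    """Forward propagation: z^(l+1) = W^(l) a^(l) + b^(l), a^(l+1) = f(z^(l+1)).

    Ws, bs : lists of the weight matrices W^(l) and bias vectors b^(l), in order.
    Returns the lists of all z^(l) and a^(l) (both are reused by backpropagation).
    """
    a = np.asarray(x)
    zs, activations = [], [a]
    for W, b in zip(Ws, bs):
        z = W @ a + b          # weighted sum of inputs to the next layer
        a = f(z)               # activation of the next layer
        zs.append(z)
        activations.append(a)
    return zs, activations
```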


Neural Networks – Extensions

may have different architectures (i.e., network topology)


• different numbers $s_\ell$ of units in each layer $\ell$
• different connectivity

may have loops

may have multiple output units


Neural Networks – Optimization

minimize the prediction error while penalizing large weights (weight decay):

$$J(W, b) = \left[ \frac{1}{m} \sum_{i=1}^{m} J\big(W, b; x^{(i)}, y^{(i)}\big) \right] + \frac{\lambda}{2} \sum_{\ell=1}^{n_\ell - 1} \sum_{j=1}^{s_\ell} \sum_{i=1}^{s_{\ell+1}} \Big( W_{ij}^{(\ell)} \Big)^{2}$$

where $J(W, b; x^{(i)}, y^{(i)})$ is the prediction error of sample $i$

$$J\big(W, b; x^{(i)}, y^{(i)}\big) = \frac{1}{2} \Big\| h_{W,b}\big(x^{(i)}\big) - y^{(i)} \Big\|^{2}$$

characteristics:
• nonconvex – gradient descent used in practice
• initialization: small random values near 0 (but not all zeros)
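A sketch of this cost function in NumPy (illustrative only; the squared-error loss and the weight decay term follow the formula above, and `Ws`, `bs`, `xs`, `ys`, `lam` are assumed names):

```python
import numpy as np

def cost(Ws, bs, xs, ys, lam, f=np.tanh):
    """J(W, b): average squared prediction error plus the weight decay penalty."""
    m = len(xs)
    err = 0.0
    for x, y in zip(xs, ys):
        a = np.asarray(x)
        for W, b in zip(Ws, bs):            # forward propagation for this sample
            a = f(W @ a + b)
        err += 0.5 * np.sum((a - y) ** 2)   # J(W, b; x, y)
    penalty = 0.5 * lam * sum(np.sum(W ** 2) for W in Ws)
    return err / m + penalty
```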


Neural Networks – Calculating Gradients

need to compute gradients:


 
$$\frac{\partial J(W, b)}{\partial W_{ij}^{(\ell)}} = \left[ \frac{1}{m} \sum_{i=1}^{m} \frac{\partial J\big(W, b; x^{(i)}, y^{(i)}\big)}{\partial W_{ij}^{(\ell)}} \right] + \lambda W_{ij}^{(\ell)}$$

$$\frac{\partial J(W, b)}{\partial b_i^{(\ell)}} = \frac{1}{m} \sum_{i=1}^{m} \frac{\partial J\big(W, b; x^{(i)}, y^{(i)}\big)}{\partial b_i^{(\ell)}}$$

backpropagation to compute $\dfrac{\partial J\big(W, b; x^{(i)}, y^{(i)}\big)}{\partial W_{ij}^{(\ell)}}$ and $\dfrac{\partial J\big(W, b; x^{(i)}, y^{(i)}\big)}{\partial b_i^{(\ell)}}$
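For concreteness, a sketch (not from the slides) of assembling the full gradient from the per-sample gradients; `sample_grad` stands for any routine that returns the per-sample derivatives, e.g. the backpropagation procedure on the following slides:

```python
import numpy as np

def full_gradient(Ws, bs, xs, ys, lam, sample_grad):
    """Assemble dJ/dW^(l) and dJ/db^(l) from per-sample gradients.

    sample_grad(x, y, Ws, bs) must return (dWs, dbs), the gradients of
    J(W, b; x, y) for a single sample, e.g. computed by backpropagation.
    """
    m = len(xs)
    gWs = [np.zeros_like(W) for W in Ws]
    gbs = [np.zeros_like(b) for b in bs]
    for x, y in zip(xs, ys):
        dWs, dbs = sample_grad(x, y, Ws, bs)
        for l in range(len(Ws)):
            gWs[l] += dWs[l] / m            # (1/m) * sum of per-sample terms
            gbs[l] += dbs[l] / m
    for l in range(len(Ws)):
        gWs[l] += lam * Ws[l]               # weight decay term lambda * W^(l)
    return gWs, gbs
```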


Neural Networks – Backpropagation


for the output layer: (the superscript of sample index removed)

$$\frac{\partial J(W, b; x, y)}{\partial W_{ij}^{(n_\ell - 1)}}
= \frac{\partial}{\partial W_{ij}^{(n_\ell - 1)}}\, \frac{1}{2} \Bigg( y_i - f\bigg( \underbrace{\sum_{j=1}^{n} W_{ij}^{(n_\ell - 1)} a_j^{(n_\ell - 1)} + b_i^{(n_\ell - 1)}}_{=\, z_i^{(n_\ell)},\ \ f(z_i^{(n_\ell)}) = a_i^{(n_\ell)}} \bigg) \Bigg)^{2}$$

$$= \big( y_i - a_i^{(n_\ell)} \big) \cdot \Big( -f'\big(z_i^{(n_\ell)}\big) \Big) \cdot \frac{\partial z_i^{(n_\ell)}}{\partial W_{ij}^{(n_\ell - 1)}}$$

$$= \underbrace{ -\big( y_i - a_i^{(n_\ell)} \big) \cdot f'\big(z_i^{(n_\ell)}\big) }_{\triangleq\, \delta_i^{(n_\ell)}} \cdot\, a_j^{(n_\ell - 1)}$$

Neural Networks – Backpropagation


for the middle layer $n_\ell - 1$: (the superscript of sample index removed)

$$\frac{\partial J(W, b; x, y)}{\partial W_{ij}^{(n_\ell - 2)}}
= \frac{\partial}{\partial W_{ij}^{(n_\ell - 2)}} \left\{ \frac{1}{2} \sum_{k=1}^{s_{n_\ell}} \Big[ y_k - f\big(z_k^{(n_\ell)}\big) \Big]^{2} \right\}$$

$$= \sum_{k=1}^{s_{n_\ell}} \big( y_k - a_k^{(n_\ell)} \big) \cdot \Big( -f'\big(z_k^{(n_\ell)}\big) \Big) \cdot \frac{\partial z_k^{(n_\ell)}}{\partial a_i^{(n_\ell - 1)}} \cdot \frac{\partial a_i^{(n_\ell - 1)}}{\partial z_i^{(n_\ell - 1)}} \cdot \frac{\partial z_i^{(n_\ell - 1)}}{\partial W_{ij}^{(n_\ell - 2)}}$$

$$= \underbrace{ \sum_{k=1}^{s_{n_\ell}} \delta_k^{(n_\ell)} \cdot W_{ki}^{(n_\ell - 1)} \cdot f'\big(z_i^{(n_\ell - 1)}\big) }_{\triangleq\, \delta_i^{(n_\ell - 1)}} \cdot\, a_j^{(n_\ell - 2)}$$


Neural Networks – Backpropagation


backpropagation:

• a forward propagation to determine all the $a_i^{(\ell)}$, $z_i^{(\ell)}$
• for the output layer, set
$$\delta_i^{(n_\ell)} = -\big( y_i - a_i^{(n_\ell)} \big) \cdot f'\big(z_i^{(n_\ell)}\big)$$
• for middle layers $\ell = n_\ell - 1, \ldots, 2$ and each node $i$ in layer $\ell$, set
$$\delta_i^{(\ell)} = \left( \sum_{j=1}^{s_{\ell+1}} W_{ji}^{(\ell)} \delta_j^{(\ell+1)} \right) f'\big(z_i^{(\ell)}\big)$$
• compute gradients
$$\frac{\partial J(W, b; x, y)}{\partial W_{ij}^{(\ell)}} = a_j^{(\ell)} \delta_i^{(\ell+1)}$$
$$\frac{\partial J(W, b; x, y)}{\partial b_i^{(\ell)}} = \delta_i^{(\ell+1)}$$
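Putting the four steps together, a minimal NumPy sketch of backpropagation for the squared-error cost above; layer indexing is 0-based here, so `Ws[l]` plays the role of $W^{(\ell+1)}$ in the slides' numbering, and the tanh derivative is used as an example $f'$:

```python
import numpy as np

def backprop(x, y, Ws, bs, f=np.tanh, fprime=lambda z: 1.0 - np.tanh(z) ** 2):
    """Gradients of J(W, b; x, y) = 0.5 * ||h_{W,b}(x) - y||^2 via the delta recursion."""
    # forward propagation: store all z^(l) and a^(l)
    a = np.asarray(x)
    zs, acts = [], [a]
    for W, b in zip(Ws, bs):
        z = W @ a + b
        a = f(z)
        zs.append(z)
        acts.append(a)
    # output layer: delta^(n_l) = -(y - a^(n_l)) * f'(z^(n_l))
    delta = -(np.asarray(y) - acts[-1]) * fprime(zs[-1])
    dWs = [None] * len(Ws)
    dbs = [None] * len(bs)
    for l in reversed(range(len(Ws))):      # Ws[l] corresponds to W^(l+1) in the slides
        dWs[l] = np.outer(delta, acts[l])   # dJ/dW^(l+1) = delta^(l+2) * a^(l+1)^T
        dbs[l] = delta                      # dJ/db^(l+1) = delta^(l+2)
        if l > 0:
            # middle layers: delta^(l+1) = (W^(l+1)^T delta^(l+2)) * f'(z^(l+1))
            delta = (Ws[l].T @ delta) * fprime(zs[l - 1])
    return dWs, dbs
```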