
4 DL Deep Neural Nets

The document discusses deep neural networks, focusing on their structure, including two-layer networks and hyperparameters like network depth and width. It emphasizes the complexity of deep networks compared to shallow ones and the importance of hyperparameter optimization in training. Additionally, it explores the representation of functions by neural networks based on chosen hyperparameters.




Deep neural networks
• Networks with more than one hidden layer
• Intuition becomes more difficult!



Deep neural networks
• Two-layer neural network
• Hyperparameters
• Notation change and general case
• Shallow vs. deep networks



Two-layer network

[Figure: a two-layer network, from http://udlbook.com]

Two-layer network as one equation

Still just a mathematical equation 
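The slide's equation itself is rendered as a figure; as a hedged reconstruction in the notation of http://udlbook.com (1 input, three hidden units per layer, 1 output; the symbol names are an assumption):

$$h_d = a[\theta_{d0} + \theta_{d1}x], \qquad d = 1, 2, 3$$
$$h'_d = a[\psi_{d0} + \psi_{d1}h_1 + \psi_{d2}h_2 + \psi_{d3}h_3], \qquad d = 1, 2, 3$$
$$y' = \phi'_0 + \phi'_1 h'_1 + \phi'_2 h'_2 + \phi'_3 h'_3$$

Substituting the first two lines into the third writes the whole network as one equation: a composition of linear maps and activations.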


Remember shallow net with 2 outputs?
• 1 input, 4 hidden units, 2 outputs



Figures from http://udlbook.com
Networks as composing functions

Consider the pre-activations at the second layer's hidden units


At this point, it’s a one-layer network with three outputs

Figures from http://udlbook.com
[Figure sequence: the outputs of the first network become the inputs to a second network; composing two shallow networks in this way yields a two-layer (deep) network]
Shallow network with 1 output …

[Diagram, built up in stages: the input x feeds three "bias + weight" units; each passes through an activation (e.g., ReLU) to give the hidden units h₁, h₂, h₃; a final "bias + weighted sum" combines them into the output y, a piecewise linear function of x.]


Shallow network with 3 outputs …

[Diagram, built up in stages: the input x feeds the same three hidden units h₁, h₂, h₃; three separate "bias + weighted sum" units then combine the hidden units into the outputs y₁, y₂, y₃, each a piecewise linear function of x.]


Two-layer network with 1 output …

[Diagram, built up in stages: the input x feeds the first hidden layer h₁, h₂, h₃ (bias + weight, then an activation such as ReLU); three "bias + weighted sum" units followed by activations produce the second hidden layer h′₁, h′₂, h′₃; a final bias + weighted sum gives the output y′. The hidden units are piecewise linear functions of x, and so is y′.]

Questions to consider: What changes with 2 outputs? With 3 layers? With 2 inputs? (A numpy sketch of the forward pass follows.)
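A minimal numpy sketch of this two-layer forward pass (all parameter values and names here are arbitrary placeholders, not from the slides):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

# Two-layer network: 1 input, 3 hidden units per layer, 1 output.
rng = np.random.default_rng(0)
theta_b, theta_w = rng.normal(size=3), rng.normal(size=3)   # layer-1 biases + weights
psi_b, psi_W = rng.normal(size=3), rng.normal(size=(3, 3))  # layer-2 biases + weights
phi_b, phi_w = rng.normal(), rng.normal(size=3)             # output bias + weights

def two_layer(x):
    h = relu(theta_b + theta_w * x)    # first hidden layer h1..h3
    h_prime = relu(psi_b + psi_W @ h)  # second hidden layer h'1..h'3
    return phi_b + phi_w @ h_prime     # output y', piecewise linear in x

print(two_layer(0.5))
```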
Deep neural networks
• Two-layer neural network
• Hyperparameters
• Notation change and general case
• Shallow vs. deep networks



Hyperparameters
• K layers = depth of the network
• D_k hidden units per layer = width of the network

Are these learned in training?

• No: these are called hyperparameters, chosen before training the network
• We can retrain with different hyperparameters: this is hyperparameter optimization, or hyperparameter search (a sketch follows)
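A minimal sketch of such a search (train_and_validate is a hypothetical helper; any routine that trains a network of depth K and width D and returns its validation loss would do):

```python
import itertools

def hyperparameter_search(train_and_validate, depths=(2, 4, 8), widths=(16, 32, 64)):
    # Grid search over depth K and width D; keep the best validation loss.
    best, best_loss = None, float("inf")
    for K, D in itertools.product(depths, widths):
        loss = train_and_validate(K, D)
        if loss < best_loss:
            best, best_loss = (K, D), loss
    return best, best_loss
```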



Hyperparameters
• For fixed hyperparameters (e.g., K = 2 layers with D_k = 3 hidden units in each):
  • the model describes a family of functions
  • the parameters determine the particular function
• Hence, when we also consider the hyperparameters:

Neural networks represent a family of families of functions relating input to output.

Consider a deep neural network with 5 inputs, 2 outputs, and 20 hidden layers of 30 hidden units each.
What is the depth of this network? What is the width?

How many parameters are in that network (5 inputs, 2 outputs, 20 hidden layers of 30 hidden units each)?
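One way to check the arithmetic (a sketch; it assumes fully connected layers with one bias per unit):

```python
def count_params(n_in, hidden_widths, n_out):
    # Each layer transition contributes (fan_in * fan_out) weights + fan_out biases.
    sizes = [n_in] + list(hidden_widths) + [n_out]
    return sum(sizes[i] * sizes[i + 1] + sizes[i + 1] for i in range(len(sizes) - 1))

# 5 inputs, 20 hidden layers of 30 units, 2 outputs:
print(count_params(5, [30] * 20, 2))  # 17912 = 180 + 19*930 + 62
```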

Deep neural networks
• Two-layer neural network
• Hyperparameters
• Notation change and general case
• Shallow vs. deep networks



Notation change #1



Notation change #2



Notation change #3
Bias vector β, weight matrix Ω





General equations for deep network

Still just a mathematical equation 
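The equations on this slide are rendered as a figure; reconstructed in the bias-vector/weight-matrix notation above (following http://udlbook.com), a network with K hidden layers computes:

$$\mathbf{h}_1 = a[\boldsymbol{\beta}_0 + \boldsymbol{\Omega}_0\mathbf{x}]$$
$$\mathbf{h}_{k+1} = a[\boldsymbol{\beta}_k + \boldsymbol{\Omega}_k\mathbf{h}_k], \qquad k = 1, \ldots, K-1$$
$$\mathbf{y} = \boldsymbol{\beta}_K + \boldsymbol{\Omega}_K\mathbf{h}_K$$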


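The same equations as a loop (a sketch; the parameter lists are placeholders to be filled by training):

```python
import numpy as np

def forward(x, betas, Omegas, act=lambda z: np.maximum(0.0, z)):
    # betas[k], Omegas[k] are the bias vector and weight matrix of layer k;
    # the last pair is the linear output layer y = beta_K + Omega_K h_K.
    h = x
    for beta, Omega in zip(betas[:-1], Omegas[:-1]):
        h = act(beta + Omega @ h)
    return betas[-1] + Omegas[-1] @ h
```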
Example

Figures from http://udlbook.com
For a deep network with 4 inputs, two hidden layers of 10 and 8 hidden units, and 3 outputs, what are the sizes of each weight matrix Ω and bias vector β?
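A quick shape check (a sketch, using the convention that Ω_k maps the width of layer k to the width of layer k+1):

```python
import numpy as np

widths = [4, 10, 8, 3]  # input, hidden layer 1, hidden layer 2, output
for k in range(len(widths) - 1):
    Omega = np.zeros((widths[k + 1], widths[k]))  # Omega_0: 10x4, Omega_1: 8x10, Omega_2: 3x8
    beta = np.zeros(widths[k + 1])                # beta_0: 10, beta_1: 8, beta_2: 3
    print(f"Omega_{k}: {Omega.shape}, beta_{k}: {beta.shape}")
```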

Deep neural networks
• Two-layer neural network
• Hyperparameters
• Notation change and general case
• Shallow vs. deep networks



Shallow vs. deep networks
The best results are achieved by deep networks with many layers.
• 50 to 1000 layers for most applications
• Best results in:
  • Computer vision
  • Natural language processing
  • Graph neural networks
  • Generative models
  • Reinforcement learning
All use deep networks. But why?



1. Ability to approximate different functions?
Both shallow and deep networks obey the universal approximation theorem.

Argument: one hidden layer is enough; a deep network could arrange its remaining layers to compute the identity function.



2. Number of linear regions per parameter

  5 layers, 10 hidden units per layer: 471 parameters, 161,501 linear regions
  5 layers, 50 hidden units per layer: 10,801 parameters, >10^40 linear regions

Figures from http://udlbook.com


2. Number of linear regions per parameter
For a fixed parameter budget, deeper networks produce more linear regions than shallower ones.

• But there are dependencies between the regions
• Perhaps real-world functions contain similar symmetries? Unknown.



3. Depth efficiency
• Some functions require a shallow network with exponentially more hidden units than a deep network to achieve an equivalent approximation.

This is the depth efficiency of deep networks.

• But do the real-world functions we want to approximate have this property? Unknown.



4. Large structured networks
• Think about images as input: there might be ~1M pixels
• Fully connected networks are not practical
• We need different parts of the image to be processed similarly:
  • there is no point in independently learning to recognize the same object at every possible position in the image
• Solution: process local image regions in parallel, with weights that operate locally and are shared across the image
• This leads to convolutional networks
• Gradually integrating information from across the whole image requires multiple layers
5. Fitting and generalization
• Fitting deep models seems to be easier up to about 20 layers.
• Beyond that, fitting becomes harder as more hidden layers are added.
• Generalization is better in deep networks.


Figures from http://udlbook.com
With the same number of parameters, a deeper network generally has ……… linear regions compared to a shallow one.
☐ fewer
☐ equal
☐ more

Depth efficiency means …….
☐ Adding more hidden layers to a deep net can achieve an equivalent approximation of a shallow one.
☐ A shallow network might need exponentially more hidden units to achieve an equivalent approximation of a deep net.
Where are we going?
• We have defined families of very flexible networks that map multiple inputs to multiple outputs
• Now we need to train them:
  • How to choose loss functions
  • How to find minima of the loss function
  • How to do this in particular for deep networks
• Then we need to test them
