
Deep Neural Networks

Arles Rodríguez
[email protected]

Facultad de Ciencias
Departamento de Matemáticas
Universidad Nacional de Colombia
Deep Neural Networks
[Diagram: networks of increasing depth over inputs x1, x2, x3 with output ŷ = a: a 1-layer NN, a 1-hidden-layer NN, a 2-hidden-layer NN, and a 3-hidden-layer NN.]


Deep learning networks
• How deep a neural network is depends on its number of hidden layers.
• There are problems that shallow networks cannot solve but deep networks can.
• It may be hard to predict in advance how deep a network you will need.
• It is reasonable to start with a logistic regression model and increase the number of layers.
Notation
• This is a 4-layer network (L = 4) with 3 hidden layers; the layers are numbered 0 (the input) through 4 (the output).
• n^[l] denotes the number of units in layer l; in the diagram the hidden layers have n^[1] = 5, n^[2] = 5, n^[3] = 3 units, and the input has n^[0] = n_x = 3.
• a^[l] = g^[l](z^[l]) is the activation in layer l, with a^[0] = x and a^[L] = ŷ.
• W^[l], b^[l] are the weights and bias in layer l.
Forward propagation in deep network
Given a single training sample x:

Layer 1:  z^[1] = W^[1] x + b^[1]
Layer 2:  z^[2] = W^[2] a^[1] + b^[2]
…
Layer 4:  z^[4] = W^[4] a^[3] + b^[4]
Forward propagation in deep network
Given a single training sample x, with a^[0] = x, do you see any pattern?

Layer 1:  z^[1] = W^[1] a^[0] + b^[1]
Layer 2:  z^[2] = W^[2] a^[1] + b^[2]
…
Layer 4:  z^[4] = W^[4] a^[3] + b^[4]

In general, for a given layer l:

z^[l] = W^[l] a^[l-1] + b^[l]
Vectorizing…
In general, for a given layer l:

Single sample:   z^[l] = W^[l] a^[l-1] + b^[l]
All m samples:   Z^[l] = W^[l] A^[l-1] + b^[l]

where the samples are stacked as columns:

Z^[l] = [ z^[l](1)  z^[l](2)  …  z^[l](m) ]
A^[l] = [ a^[l](1)  a^[l](2)  …  a^[l](m) ]

Notation: z^[l](i) is the output of layer l for sample i. In A^[l], each row corresponds to a unit in layer l and each column to a sample.
Forward propagation loop
Applying Z^[l] = W^[l] A^[l-1] + b^[l] layer by layer (layers 0 through 4):

Z^[1] = W^[1] A^[0] + b^[1]
Z^[2] = W^[2] A^[1] + b^[2]
Z^[3] = W^[3] A^[2] + b^[3]
Z^[4] = W^[4] A^[3] + b^[4]

Forward propagation can be implemented as a for loop:

for l = 1 to L:
    Z^[l] = W^[l] A^[l-1] + b^[l]
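A minimal NumPy sketch of this loop, assuming the weights, biases, and per-layer activation functions are already given as Python lists (the names forward_propagation, weights, biases, and activations are illustrative, not from the slides):

import numpy as np

def relu(Z):
    # ReLU activation, applied element-wise
    return np.maximum(0, Z)

def sigmoid(Z):
    # Sigmoid activation, applied element-wise
    return 1.0 / (1.0 + np.exp(-Z))

def forward_propagation(X, weights, biases, activations):
    """Compute Z[l] = W[l] A[l-1] + b[l], A[l] = g[l](Z[l]) for l = 1..L."""
    A = X                              # A[0] = X, shape (n[0], m)
    for W, b, g in zip(weights, biases, activations):
        Z = W @ A + b                  # b has shape (n[l], 1) and is broadcast over the m columns
        A = g(Z)
    return A                           # A[L] = Y_hat

Each iteration consumes A^[l-1] and produces A^[l], so only the current activation is needed for the forward pass itself; the caches required for backpropagation appear later.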
Getting matrix dimensions right
Example network (inputs x1, x2, output ŷ):
L = 5,  n^[0] = 2,  n^[1] = 3,  n^[2] = 5,  n^[3] = 4,  n^[4] = 2,  n^[5] = 1

What are the sizes of z^[l], W^[l], b^[l]?

z^[1] = W^[1] a^[0] + b^[1]
Shapes: (3,1) = (3,2)(2,1) + (3,1),  i.e. (n^[1],1) = (n^[1],n^[0])(n^[0],1) + (n^[1],1)

z^[2] = W^[2] a^[1] + b^[2]
Shapes: (5,1) = (5,3)(3,1) + (5,1),  i.e. (n^[2],1) = (n^[2],n^[1])(n^[1],1) + (n^[2],1)
Getting matrix dimensions right
L = 5,  n^[0] = 2,  n^[1] = 3,  n^[2] = 5,  n^[3] = 4,  n^[4] = 2,  n^[5] = 1

z^[1] = W^[1] a^[0] + b^[1]
Shapes: (3,1) = (3,2)(2,1) + (3,1),  i.e. (n^[1],1) = (n^[1],n^[0])(n^[0],1) + (n^[1],1)

z^[2] = W^[2] a^[1] + b^[2]
Shapes: (5,1) = (5,3)(3,1) + (5,1),  i.e. (n^[2],1) = (n^[2],n^[1])(n^[1],1) + (n^[2],1)

z^[3] = W^[3] a^[2] + b^[3]
Shapes: (4,1) = (4,5)(5,1) + (4,1),  i.e. (n^[3],1) = (n^[3],n^[2])(n^[2],1) + (n^[3],1)

z^[4] = W^[4] a^[3] + b^[4]
Shapes: (2,1) = (2,4)(4,1) + (2,1),  i.e. (n^[4],1) = (n^[4],n^[3])(n^[3],1) + (n^[4],1)

z^[5] = W^[5] a^[4] + b^[5]
Shapes: (1,1) = (1,2)(2,1) + (1,1),  i.e. (n^[5],1) = (n^[5],n^[4])(n^[4],1) + (n^[5],1)
Getting matrix dimensions right
General case (shapes), which the layer-by-layer sizes above all follow:

W^[l]: (n^[l], n^[l-1])      dW^[l]: (n^[l], n^[l-1])
b^[l]: (n^[l], 1)            db^[l]: (n^[l], 1)
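As a sketch of these shape rules, the hypothetical helper initialize_parameters below builds W^[l] and b^[l] for the 5-layer example above (layer sizes 2, 3, 5, 4, 2, 1); the small random scale 0.01 is an illustrative choice:

import numpy as np

def initialize_parameters(layer_dims):
    """layer_dims = [n[0], n[1], ..., n[L]]; returns W[l]: (n[l], n[l-1]) and b[l]: (n[l], 1)."""
    params = {}
    for l in range(1, len(layer_dims)):
        params[f"W{l}"] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
        params[f"b{l}"] = np.zeros((layer_dims[l], 1))
    return params

# The 5-layer example network above: n[0..5] = 2, 3, 5, 4, 2, 1
params = initialize_parameters([2, 3, 5, 4, 2, 1])
assert params["W3"].shape == (4, 5) and params["b3"].shape == (4, 1)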
Getting matrix dimensions right
What about the shapes when processing multiple samples at a time?
• The dimensions of W^[l] and b^[l] stay the same.
• The dimensions of z^[l] and a^[l] will change.
Getting matrix dimensions right
z^[1] = W^[1] x + b^[1]
For one sample, the sizes are: (n^[1],1) = (n^[1],n^[0])(n^[0],1) + (n^[1],1)

Vectorized:

Z^[1] = [ z^[1](1)  z^[1](2)  …  z^[1](m) ]
Z^[1] = W^[1] X + b^[1]

For multiple samples, the sizes are: (n^[1],m) = (n^[1],n^[0])(n^[0],m) + (n^[1],1)
b^[1], of shape (n^[1],1), is broadcast by element-wise sum across the m columns.

• Dimensions of z and a change:
  z^[l], a^[l]: (n^[l], 1)
  Z^[l], A^[l]: (n^[l], m)
  dZ^[l], dA^[l]: (n^[l], m)
  For layer 0: X = A^[0]: (n^[0], m)
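A small NumPy check of this broadcasting behaviour; the sizes n^[0] = 2, n^[1] = 3, and m = 4 are arbitrary illustration values:

import numpy as np

n0, n1, m = 2, 3, 4                 # illustrative sizes: n[0] = 2, n[1] = 3, m = 4 samples
X  = np.random.randn(n0, m)         # A[0] = X: (n[0], m), one column per sample
W1 = np.random.randn(n1, n0)        # W[1]: (n[1], n[0])
b1 = np.zeros((n1, 1))              # b[1]: (n[1], 1)

Z1 = W1 @ X + b1                    # b1 is broadcast (element-wise sum) across the m columns
print(Z1.shape)                     # (3, 4), i.e. (n[1], m)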
Why deep representations?
[Diagram: a face-recognition network going from simple to complex features; the first layers detect edges and borders, intermediate layers detect parts of faces, and later layers compose parts of faces together into whole faces.]

Lee, H., Grosse, R., Ranganath, R., & Ng, A. Y. (2009). Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. Proceedings of the 26th International Conference on Machine Learning, ICML 2009, 609–616. https://doi.org/10.1145/1553374.1553453
When is deep better than shallow
• There are functions that you can compute with a small L-layer deep neural network, but that shallower networks require exponentially more hidden units to compute.

[Diagram: computing a function of x1 … x8 with a tree-structured deep network of depth O(log n) that combines inputs pairwise, versus a shallow network whose single hidden layer must enumerate combinations of the input.]
When is deep better than shallow
• A hierarchical network can approximate a higher-degree polynomial via composition.

[Diagram from Mhaskar et al. (2016): a shallow network with a single hidden layer over x1 … x8 versus a hierarchical network; both approximate the target function arbitrarily well, but the hierarchical network does so with only 11 layers and 39 units, far fewer units than the shallow one.]

Mhaskar, H., Liao, Q., & Poggio, T. (2016). Learning Functions: When Is Deep Better Than Shallow, (045), 1–12. Retrieved from http://arxiv.org/abs/1603.00988
Practical advice
• Start with logistic regression.
• Start with 1–2 layers.
• Increase the number of layers iteratively.
Forward and backward functions
for l = 1 to L, with parameters W^[l], b^[l]:

Forward propagation: takes a^[l-1], outputs a^[l]
    z^[l] = W^[l] a^[l-1] + b^[l]
    cache: z^[l]

Backward propagation: takes da^[l] and the cache z^[l], outputs da^[l-1], dW^[l], db^[l]
Forward and backward functions

Forward pass:
x = a^[0] → (W^[1], b^[1]) → a^[1] → (W^[2], b^[2]) → a^[2] → … → (W^[L], b^[L]) → a^[L] = ŷ
caching z^[l] at each layer.

Backward pass:
da^[L] → … → da^[2] → da^[1], using the caches and producing dW^[l], db^[l] at each layer.

Parameter update:
W^[l] = W^[l] − α dW^[l]
b^[l] = b^[l] − α db^[l]
Forward and backward implementation
Having A^[0] = X, for l = 1 to L, with parameters W^[l], b^[l]:

Forward propagation: input A^[l-1], output A^[l], cache Z^[l]

  Single sample                        Vectorized
  z^[l] = W^[l] a^[l-1] + b^[l]        Z^[l] = W^[l] A^[l-1] + b^[l]
                                       A^[l] = g^[l](Z^[l])

Backward propagation: input dA^[l] and the cache, outputs dA^[l-1], dW^[l], db^[l]

  Single sample                        Vectorized
                                       dZ^[L] = A^[L] − Y
  dz^[l] = da^[l] * g^[l]'(z^[l])      dZ^[l] = dA^[l] * g^[l]'(Z^[l])
  dW^[l] = dz^[l] a^[l-1]T             dW^[l] = (1/m) dZ^[l] A^[l-1]T
  db^[l] = dz^[l]                      db^[l] = (1/m) np.sum(dZ^[l], axis=1, keepdims=True)
  da^[l-1] = W^[l]T dz^[l]             dA^[l-1] = W^[l]T dZ^[l]
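A hedged NumPy sketch of the vectorized per-layer formulas above; the function names linear_activation_forward / linear_activation_backward and the explicit g_prime argument are illustrative, and the output-layer shortcut dZ^[L] = A^[L] − Y would be applied separately at the last layer:

import numpy as np

def linear_activation_forward(A_prev, W, b, g):
    """Forward step: Z[l] = W[l] A[l-1] + b[l], A[l] = g[l](Z[l]); cache what backward needs."""
    Z = W @ A_prev + b
    A = g(Z)
    cache = (A_prev, W, Z)
    return A, cache

def linear_activation_backward(dA, cache, g_prime):
    """Backward step: dZ[l], dW[l], db[l], dA[l-1] as in the vectorized formulas above."""
    A_prev, W, Z = cache
    m = A_prev.shape[1]
    dZ = dA * g_prime(Z)                          # dZ[l] = dA[l] * g[l]'(Z[l])
    dW = (dZ @ A_prev.T) / m                      # dW[l] = (1/m) dZ[l] A[l-1]^T
    db = np.sum(dZ, axis=1, keepdims=True) / m    # db[l] = (1/m) sum of dZ[l] over samples
    dA_prev = W.T @ dZ                            # dA[l-1] = W[l]^T dZ[l]
    return dA_prev, dW, db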
Example
x → Layer 1 (ReLU) → Layer 2 (ReLU) → Layer 3 (sigmoid) → ŷ → L(ŷ, y)

Forward: compute and cache z^[1], z^[2], z^[3].
Backward: start from

  da^[L] = −y/a + (1 − y)/(1 − a)

then propagate da^[2], da^[1] backwards, obtaining dW^[3], db^[3], dW^[2], db^[2], dW^[1], db^[1].
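A compact sketch of this 3-layer example (ReLU, ReLU, sigmoid) with one gradient-descent step; the layer sizes, data, and learning rate are made-up illustration values, and the output layer uses the shortcut dZ^[3] = A^[3] − Y, which is equivalent to combining da^[L] above with the sigmoid derivative:

import numpy as np

np.random.seed(0)
n = [2, 4, 3, 1]                                   # n[0..3]: input, two hidden layers, output
m = 5                                              # number of samples
X = np.random.randn(n[0], m)
Y = (np.random.rand(1, m) > 0.5).astype(float)

W = [None] + [np.random.randn(n[l], n[l - 1]) * 0.01 for l in range(1, 4)]
b = [None] + [np.zeros((n[l], 1)) for l in range(1, 4)]

relu = lambda Z: np.maximum(0, Z)
sigmoid = lambda Z: 1.0 / (1.0 + np.exp(-Z))

# Forward pass: ReLU, ReLU, sigmoid (caching Z1, Z2, Z3)
Z1 = W[1] @ X + b[1];  A1 = relu(Z1)
Z2 = W[2] @ A1 + b[2]; A2 = relu(Z2)
Z3 = W[3] @ A2 + b[3]; A3 = sigmoid(Z3)            # A3 = Y_hat

# Backward pass (sigmoid output with cross-entropy loss => dZ3 = A3 - Y)
dZ3 = A3 - Y
dW3 = dZ3 @ A2.T / m; db3 = dZ3.sum(axis=1, keepdims=True) / m
dA2 = W[3].T @ dZ3
dZ2 = dA2 * (Z2 > 0)                               # ReLU'(Z) = 1 where Z > 0
dW2 = dZ2 @ A1.T / m; db2 = dZ2.sum(axis=1, keepdims=True) / m
dA1 = W[2].T @ dZ2
dZ1 = dA1 * (Z1 > 0)
dW1 = dZ1 @ X.T / m;  db1 = dZ1.sum(axis=1, keepdims=True) / m

# Gradient-descent update: W[l] = W[l] - alpha * dW[l], b[l] = b[l] - alpha * db[l]
alpha = 0.01
W[3] -= alpha * dW3; b[3] -= alpha * db3
W[2] -= alpha * dW2; b[2] -= alpha * db2
W[1] -= alpha * dW1; b[1] -= alpha * db1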
Parameters and hyperparameters
• Parameters: W^[l], b^[l].
• Hyperparameters (they control W and b):
  – Learning rate α
  – Number of iterations
  – Number of hidden layers
  – Number of hidden units per layer
  – Choice of activation functions
  – Others (later): momentum, minibatch size, regularization parameters, …
Applied deep learning process
• It is iterative and empirical: idea → code → experiment → idea → …
• It requires trying out values for the hyperparameters.
• The best settings depend on the application: vision, NLP, online advertising, web search, recommendation.
• Follow a systematic process to explore hyperparameters.
References
• Ng, A. (2022). Deep Learning Specialization. https://www.deeplearning.ai/courses/deep-learning-specialization/
• Lalin, J. (2021, December 10). Feedforward neural networks in depth, part 1: Forward and backward propagations. I, Deep Learning. Retrieved February 28, 2023, from https://jonaslalin.com/2021/12/10/feedforward-neural-networks-part-1/
• Lalin, J. (2021, December 21). Feedforward neural networks in depth, part 2: Activation functions. I, Deep Learning. Retrieved February 28, 2023, from https://jonaslalin.com/2021/12/21/feedforward-neural-networks-part-2/
• Lalin, J. (2021, December 22). Feedforward neural networks in depth, part 3: Cost functions. I, Deep Learning. Retrieved February 28, 2023, from https://jonaslalin.com/2021/12/22/feedforward-neural-networks-part-3/
• Mhaskar, H., Liao, Q., & Poggio, T. (2016). Learning Functions: When Is Deep Better Than Shallow, (045), 1–12. Retrieved from http://arxiv.org/abs/1603.00988
• Lee, H., Grosse, R., Ranganath, R., & Ng, A. Y. (2009). Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. Proceedings of the 26th International Conference on Machine Learning, ICML 2009, 609–616. https://doi.org/10.1145/1553374.1553453
Thank you!
