
Deep Neural Networks

Arles Rodríguez
[email protected]

Facultad de Ciencias
Departamento de Matemáticas
Universidad Nacional de Colombia
Deep Neural Networks
[Diagram: networks of increasing depth over inputs x1, x2, x3 with output ŷ = a: a 1-layer NN, a 1-hidden-layer NN, a 2-hidden-layer NN, and a 3-hidden-layer NN.]


Deep learning networks
• How deep a neural network is depends on its number of hidden layers.
• There are problems that shallow networks cannot solve but deep networks can.
• It may be hard to predict in advance how deep a network you will need.
• It is reasonable to start with a logistic regression model and increase the number of layers.
Notation
• This is a 4-layer network (L = 4) with 3 hidden layers; the layers are numbered 0 (the input) through 4 (the output).
• n^[l] denotes the number of units in layer l; in the diagram the hidden layers have n^[1] = 5, n^[2] = 5, n^[3] = 3 units, and the input has n^[0] = n_x = 3.
• a^[l] = g^[l](z^[l]) is the activation in layer l, with a^[0] = x and a^[L] = ŷ.
• W^[l], b^[l] are the weights and bias in layer l.
Forward propagation in deep network
Given a single training sample x:

Layer 1:  z^[1] = W^[1] x + b^[1]
Layer 2:  z^[2] = W^[2] a^[1] + b^[2]
…
Layer 4:  z^[4] = W^[4] a^[3] + b^[4]
Forward propagation in deep network
Given a single training sample x, with a^[0] = x, do you see any pattern?

Layer 1:  z^[1] = W^[1] a^[0] + b^[1]
Layer 2:  z^[2] = W^[2] a^[1] + b^[2]
…
Layer 4:  z^[4] = W^[4] a^[3] + b^[4]

In general, for a given layer l:

z^[l] = W^[l] a^[l-1] + b^[l]
Vectorizing…
In general, for a given layer l:

Single sample:   z^[l] = W^[l] a^[l-1] + b^[l]
All m samples:   Z^[l] = W^[l] A^[l-1] + b^[l]

where the samples are stacked as columns:

Z^[l] = [ z^[l](1)  z^[l](2)  …  z^[l](m) ]
A^[l] = [ a^[l](1)  a^[l](2)  …  a^[l](m) ]

Notation: z^[l](i) is the output of layer l for sample i. In A^[l], each row corresponds to a unit in layer l and each column to a sample.
Forward propagation loop
Applying Z^[l] = W^[l] A^[l-1] + b^[l] layer by layer (layers 0 through 4):

Z^[1] = W^[1] A^[0] + b^[1]
Z^[2] = W^[2] A^[1] + b^[2]
Z^[3] = W^[3] A^[2] + b^[3]
Z^[4] = W^[4] A^[3] + b^[4]

Forward propagation can be implemented as a for loop:

for l = 1 to L:
    Z^[l] = W^[l] A^[l-1] + b^[l]
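A minimal NumPy sketch of this loop, assuming the weights, biases, and per-layer activation functions are already given as Python lists (the names forward_propagation, weights, biases, and activations are illustrative, not from the slides):

import numpy as np

def relu(Z):
    # ReLU activation, applied element-wise
    return np.maximum(0, Z)

def sigmoid(Z):
    # Sigmoid activation, applied element-wise
    return 1.0 / (1.0 + np.exp(-Z))

def forward_propagation(X, weights, biases, activations):
    """Compute Z[l] = W[l] A[l-1] + b[l], A[l] = g[l](Z[l]) for l = 1..L."""
    A = X                              # A[0] = X, shape (n[0], m)
    for W, b, g in zip(weights, biases, activations):
        Z = W @ A + b                  # b has shape (n[l], 1) and is broadcast over the m columns
        A = g(Z)
    return A                           # A[L] = Y_hat

Each iteration consumes A^[l-1] and produces A^[l], so only the current activation is needed for the forward pass itself; the caches required for backpropagation appear later.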
Getting matrix dimensions right
Example network (inputs x1, x2, output ŷ):
L = 5,  n^[0] = 2,  n^[1] = 3,  n^[2] = 5,  n^[3] = 4,  n^[4] = 2,  n^[5] = 1

What are the sizes of z^[l], W^[l], b^[l]?

z^[1] = W^[1] a^[0] + b^[1]
Shapes: (3,1) = (3,2)(2,1) + (3,1),  i.e. (n^[1],1) = (n^[1],n^[0])(n^[0],1) + (n^[1],1)

z^[2] = W^[2] a^[1] + b^[2]
Shapes: (5,1) = (5,3)(3,1) + (5,1),  i.e. (n^[2],1) = (n^[2],n^[1])(n^[1],1) + (n^[2],1)
Getting matrix dimensions right
L = 5,  n^[0] = 2,  n^[1] = 3,  n^[2] = 5,  n^[3] = 4,  n^[4] = 2,  n^[5] = 1

z^[1] = W^[1] a^[0] + b^[1]
Shapes: (3,1) = (3,2)(2,1) + (3,1),  i.e. (n^[1],1) = (n^[1],n^[0])(n^[0],1) + (n^[1],1)

z^[2] = W^[2] a^[1] + b^[2]
Shapes: (5,1) = (5,3)(3,1) + (5,1),  i.e. (n^[2],1) = (n^[2],n^[1])(n^[1],1) + (n^[2],1)

z^[3] = W^[3] a^[2] + b^[3]
Shapes: (4,1) = (4,5)(5,1) + (4,1),  i.e. (n^[3],1) = (n^[3],n^[2])(n^[2],1) + (n^[3],1)

z^[4] = W^[4] a^[3] + b^[4]
Shapes: (2,1) = (2,4)(4,1) + (2,1),  i.e. (n^[4],1) = (n^[4],n^[3])(n^[3],1) + (n^[4],1)

z^[5] = W^[5] a^[4] + b^[5]
Shapes: (1,1) = (1,2)(2,1) + (1,1),  i.e. (n^[5],1) = (n^[5],n^[4])(n^[4],1) + (n^[5],1)
Getting matrix dimensions right
General case (shapes), which the layer-by-layer sizes above all follow:

W^[l]: (n^[l], n^[l-1])      dW^[l]: (n^[l], n^[l-1])
b^[l]: (n^[l], 1)            db^[l]: (n^[l], 1)
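As a sketch of these shape rules, the hypothetical helper initialize_parameters below builds W^[l] and b^[l] for the 5-layer example above (layer sizes 2, 3, 5, 4, 2, 1); the small random scale 0.01 is an illustrative choice:

import numpy as np

def initialize_parameters(layer_dims):
    """layer_dims = [n[0], n[1], ..., n[L]]; returns W[l]: (n[l], n[l-1]) and b[l]: (n[l], 1)."""
    params = {}
    for l in range(1, len(layer_dims)):
        params[f"W{l}"] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
        params[f"b{l}"] = np.zeros((layer_dims[l], 1))
    return params

# The 5-layer example network above: n[0..5] = 2, 3, 5, 4, 2, 1
params = initialize_parameters([2, 3, 5, 4, 2, 1])
assert params["W3"].shape == (4, 5) and params["b3"].shape == (4, 1)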
Getting matrix dimensions right
What about the shapes when processing multiple samples at a time?
• The dimensions of W^[l] and b^[l] stay the same.
• The dimensions of z^[l] and a^[l] will change.
Getting matrix dimensions right
z^[1] = W^[1] x + b^[1]
For one sample, the sizes are: (n^[1],1) = (n^[1],n^[0])(n^[0],1) + (n^[1],1)

Vectorized:

Z^[1] = [ z^[1](1)  z^[1](2)  …  z^[1](m) ]
Z^[1] = W^[1] X + b^[1]

For multiple samples, the sizes are: (n^[1],m) = (n^[1],n^[0])(n^[0],m) + (n^[1],1)
b^[1], of shape (n^[1],1), is broadcast by element-wise sum across the m columns.

• Dimensions of z and a change:
  z^[l], a^[l]: (n^[l], 1)
  Z^[l], A^[l]: (n^[l], m)
  dZ^[l], dA^[l]: (n^[l], m)
  For layer 0: X = A^[0]: (n^[0], m)
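A small NumPy check of this broadcasting behaviour; the sizes n^[0] = 2, n^[1] = 3, and m = 4 are arbitrary illustration values:

import numpy as np

n0, n1, m = 2, 3, 4                 # illustrative sizes: n[0] = 2, n[1] = 3, m = 4 samples
X  = np.random.randn(n0, m)         # A[0] = X: (n[0], m), one column per sample
W1 = np.random.randn(n1, n0)        # W[1]: (n[1], n[0])
b1 = np.zeros((n1, 1))              # b[1]: (n[1], 1)

Z1 = W1 @ X + b1                    # b1 is broadcast (element-wise sum) across the m columns
print(Z1.shape)                     # (3, 4), i.e. (n[1], m)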
Why deep representations?
[Diagram: a face-recognition network going from simple to complex features; the first layers detect edges and borders, intermediate layers detect parts of faces, and later layers compose parts of faces together into whole faces.]

Lee, H., Grosse, R., Ranganath, R., & Ng, A. Y. (2009). Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. Proceedings of the 26th International Conference on Machine Learning, ICML 2009, 609–616. https://doi.org/10.1145/1553374.1553453
When is deep better than shallow
• There are functions that you can compute with a small L-layer deep neural network, but that shallower networks require exponentially more hidden units to compute.

[Diagram: computing a function of x1 … x8 with a tree-structured deep network of depth O(log n) that combines inputs pairwise, versus a shallow network whose single hidden layer must enumerate combinations of the input.]
When is deep better than shallow
• A hierarchical network can approximate a higher-degree polynomial via composition.

[Diagram from Mhaskar et al. (2016): a shallow network with a single hidden layer over x1 … x8 versus a hierarchical network; both approximate the target function arbitrarily well, but the hierarchical network does so with only 11 layers and 39 units, far fewer units than the shallow one.]

Mhaskar, H., Liao, Q., & Poggio, T. (2016). Learning Functions: When Is Deep Better Than Shallow, (045), 1–12. Retrieved from http://arxiv.org/abs/1603.00988
Practical advice
• Start with logistic regression.
• Start with 1–2 layers.
• Increase the number of layers iteratively.
Forward and backward functions
for l = 1 to L, with parameters W^[l], b^[l]:

Forward propagation: takes a^[l-1], outputs a^[l]
    z^[l] = W^[l] a^[l-1] + b^[l]
    cache: z^[l]

Backward propagation: takes da^[l] and the cache z^[l], outputs da^[l-1], dW^[l], db^[l]
Forward and backward functions

Forward pass:
x = a^[0] → (W^[1], b^[1]) → a^[1] → (W^[2], b^[2]) → a^[2] → … → (W^[L], b^[L]) → a^[L] = ŷ
caching z^[l] at each layer.

Backward pass:
da^[L] → … → da^[2] → da^[1], using the caches and producing dW^[l], db^[l] at each layer.

Parameter update:
W^[l] = W^[l] − α dW^[l]
b^[l] = b^[l] − α db^[l]
Forward and backward implementation
Having A^[0] = X, for l = 1 to L, with parameters W^[l], b^[l]:

Forward propagation: input A^[l-1], output A^[l], cache Z^[l]

  Single sample                        Vectorized
  z^[l] = W^[l] a^[l-1] + b^[l]        Z^[l] = W^[l] A^[l-1] + b^[l]
                                       A^[l] = g^[l](Z^[l])

Backward propagation: input dA^[l] and the cache, outputs dA^[l-1], dW^[l], db^[l]

  Single sample                        Vectorized
                                       dZ^[L] = A^[L] − Y
  dz^[l] = da^[l] * g^[l]'(z^[l])      dZ^[l] = dA^[l] * g^[l]'(Z^[l])
  dW^[l] = dz^[l] a^[l-1]T             dW^[l] = (1/m) dZ^[l] A^[l-1]T
  db^[l] = dz^[l]                      db^[l] = (1/m) np.sum(dZ^[l], axis=1, keepdims=True)
  da^[l-1] = W^[l]T dz^[l]             dA^[l-1] = W^[l]T dZ^[l]
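A hedged NumPy sketch of the vectorized per-layer formulas above; the function names linear_activation_forward / linear_activation_backward and the explicit g_prime argument are illustrative, and the output-layer shortcut dZ^[L] = A^[L] − Y would be applied separately at the last layer:

import numpy as np

def linear_activation_forward(A_prev, W, b, g):
    """Forward step: Z[l] = W[l] A[l-1] + b[l], A[l] = g[l](Z[l]); cache what backward needs."""
    Z = W @ A_prev + b
    A = g(Z)
    cache = (A_prev, W, Z)
    return A, cache

def linear_activation_backward(dA, cache, g_prime):
    """Backward step: dZ[l], dW[l], db[l], dA[l-1] as in the vectorized formulas above."""
    A_prev, W, Z = cache
    m = A_prev.shape[1]
    dZ = dA * g_prime(Z)                          # dZ[l] = dA[l] * g[l]'(Z[l])
    dW = (dZ @ A_prev.T) / m                      # dW[l] = (1/m) dZ[l] A[l-1]^T
    db = np.sum(dZ, axis=1, keepdims=True) / m    # db[l] = (1/m) sum of dZ[l] over samples
    dA_prev = W.T @ dZ                            # dA[l-1] = W[l]^T dZ[l]
    return dA_prev, dW, db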
Example
x → Layer 1 (ReLU) → Layer 2 (ReLU) → Layer 3 (sigmoid) → ŷ → L(ŷ, y)

Forward: compute and cache z^[1], z^[2], z^[3].
Backward: start from

  da^[L] = −y/a + (1 − y)/(1 − a)

then propagate da^[2], da^[1] backwards, obtaining dW^[3], db^[3], dW^[2], db^[2], dW^[1], db^[1].
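A compact sketch of this 3-layer example (ReLU, ReLU, sigmoid) with one gradient-descent step; the layer sizes, data, and learning rate are made-up illustration values, and the output layer uses the shortcut dZ^[3] = A^[3] − Y, which is equivalent to combining da^[L] above with the sigmoid derivative:

import numpy as np

np.random.seed(0)
n = [2, 4, 3, 1]                                   # n[0..3]: input, two hidden layers, output
m = 5                                              # number of samples
X = np.random.randn(n[0], m)
Y = (np.random.rand(1, m) > 0.5).astype(float)

W = [None] + [np.random.randn(n[l], n[l - 1]) * 0.01 for l in range(1, 4)]
b = [None] + [np.zeros((n[l], 1)) for l in range(1, 4)]

relu = lambda Z: np.maximum(0, Z)
sigmoid = lambda Z: 1.0 / (1.0 + np.exp(-Z))

# Forward pass: ReLU, ReLU, sigmoid (caching Z1, Z2, Z3)
Z1 = W[1] @ X + b[1];  A1 = relu(Z1)
Z2 = W[2] @ A1 + b[2]; A2 = relu(Z2)
Z3 = W[3] @ A2 + b[3]; A3 = sigmoid(Z3)            # A3 = Y_hat

# Backward pass (sigmoid output with cross-entropy loss => dZ3 = A3 - Y)
dZ3 = A3 - Y
dW3 = dZ3 @ A2.T / m; db3 = dZ3.sum(axis=1, keepdims=True) / m
dA2 = W[3].T @ dZ3
dZ2 = dA2 * (Z2 > 0)                               # ReLU'(Z) = 1 where Z > 0
dW2 = dZ2 @ A1.T / m; db2 = dZ2.sum(axis=1, keepdims=True) / m
dA1 = W[2].T @ dZ2
dZ1 = dA1 * (Z1 > 0)
dW1 = dZ1 @ X.T / m;  db1 = dZ1.sum(axis=1, keepdims=True) / m

# Gradient-descent update: W[l] = W[l] - alpha * dW[l], b[l] = b[l] - alpha * db[l]
alpha = 0.01
W[3] -= alpha * dW3; b[3] -= alpha * db3
W[2] -= alpha * dW2; b[2] -= alpha * db2
W[1] -= alpha * dW1; b[1] -= alpha * db1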
Parameters and hyperparameters
• Parameters: W^[l], b^[l].
• Hyperparameters (they control W and b):
  – Learning rate α
  – Number of iterations
  – Number of hidden layers
  – Number of hidden units per layer
  – Choice of activation functions
  – Others (later): momentum, minibatch size, regularization parameters, …
Applied deep learning process
• It is iterative and empirical: idea → code → experiment → idea → …
• It requires trying out values for the hyperparameters.
• The best settings depend on the application: vision, NLP, online advertising, web search, recommendation.
• Follow a systematic process to explore hyperparameters.
References
• Ng, A. (2022). Deep Learning Specialization. https://www.deeplearning.ai/courses/deep-learning-specialization/
• Lalin, J. (2021, December 10). Feedforward neural networks in depth, part 1: Forward and backward propagations. I, Deep Learning. Retrieved February 28, 2023, from https://jonaslalin.com/2021/12/10/feedforward-neural-networks-part-1/
• Lalin, J. (2021, December 21). Feedforward neural networks in depth, part 2: Activation functions. I, Deep Learning. Retrieved February 28, 2023, from https://jonaslalin.com/2021/12/21/feedforward-neural-networks-part-2/
• Lalin, J. (2021, December 22). Feedforward neural networks in depth, part 3: Cost functions. I, Deep Learning. Retrieved February 28, 2023, from https://jonaslalin.com/2021/12/22/feedforward-neural-networks-part-3/
• Mhaskar, H., Liao, Q., & Poggio, T. (2016). Learning Functions: When Is Deep Better Than Shallow, (045), 1–12. Retrieved from http://arxiv.org/abs/1603.00988
• Lee, H., Grosse, R., Ranganath, R., & Ng, A. Y. (2009). Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. Proceedings of the 26th International Conference on Machine Learning, ICML 2009, 609–616. https://doi.org/10.1145/1553374.1553453
Thank you!
