Demystifying Deep Learning
AGENDA
Quick refresher on Gradient Descent and Probabilistic Perspectives
Differentiation Methods and Autodiff
Computational Graphs
Multi-layer Perceptrons – “Traditional Way”
Deep Networks – “Computational Graph” Architecture
Publicly available Jupyter Notebook – Build your own TensorFlow!
SIMPLE REGRESSION
Hypothesis Function:
Cost Function:
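For reference, the standard linear-regression forms of these two functions are:

    h_θ(x) = θ₀ + θ₁x₁ + … + θₙxₙ = θᵀx
    J(θ) = (1/2) Σ_{i=1..m} ( h_θ(x^(i)) − y^(i) )²

where m is the number of training examples.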
GRADIENT DESCENT
Model training:
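Training means repeatedly nudging θ downhill on the cost surface. The batch gradient descent update, with learning rate α, is:

    θ_j := θ_j − α · ∂J(θ)/∂θ_j = θ_j − α · Σ_{i=1..m} ( h_θ(x^(i)) − y^(i) ) · x_j^(i)

applied to every parameter θ_j and repeated until convergence.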
PROBABILISTIC INTERPRETATION
Let us assume that the target variables and the inputs are related via the equation

    y^(i) = θᵀx^(i) + ε^(i),

where the error term ε^(i) captures either unmodeled effects or random noise. Let us further
assume that the error terms ε^(i) are distributed IID according to a Gaussian distribution
with mean zero and some variance σ², i.e. ε^(i) ~ N(0, σ²).
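Equivalently, the density of each target given its input is Gaussian:

    p( y^(i) | x^(i); θ ) = (1/√(2πσ²)) · exp( −( y^(i) − θᵀx^(i) )² / (2σ²) )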
PROBABILISTIC INTERPRETATION
The probability of the data is given by p( y⃗ | X; θ ). This quantity is typically viewed as a
function of y⃗ (and perhaps X), for a fixed value of θ. When we wish to explicitly view it as a
function of θ, we will instead call it the likelihood function:

    L(θ) = p( y⃗ | X; θ )

The principle of maximum likelihood says that we should choose θ so as to make the data as
probable as possible, i.e., we should choose θ to maximize L(θ).
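Under the IID assumption the likelihood factors over the m training examples:

    L(θ) = Π_{i=1..m} p( y^(i) | x^(i); θ )
         = Π_{i=1..m} (1/√(2πσ²)) · exp( −( y^(i) − θᵀx^(i) )² / (2σ²) )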
PROBABILISTIC INTERPRETATION
The derivation is simpler if we instead maximize the log likelihood ℓ(θ) = log L(θ):
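    ℓ(θ) = Σ_{i=1..m} log [ (1/√(2πσ²)) · exp( −( y^(i) − θᵀx^(i) )² / (2σ²) ) ]
         = m · log(1/√(2πσ²)) − (1/σ²) · (1/2) Σ_{i=1..m} ( y^(i) − θᵀx^(i) )²

The first term does not depend on θ, so only the squared-error term matters when we maximize over θ.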
Hence, maximizing ℓ(θ) gives the same answer as minimizing

    (1/2) Σ_{i=1..m} ( y^(i) − θᵀx^(i) )²,

which is exactly our least-squares cost J(θ). Least-squares regression therefore corresponds
to finding the maximum likelihood estimate of θ under the Gaussian noise assumption.
DIFFERENTIATION METHODS - AUTODIFF
In mathematics and computer algebra, automatic differentiation (AD), also called
algorithmic differentiation or computational differentiation, is a set of techniques
to numerically evaluate the derivative of a function specified by a computer
program.
Backpropagation refers to the whole process of training an artificial neural network
using multiple backpropagation steps, each of which computes gradients and uses
them to perform a Gradient Descent step. In contrast, autodiff is simply a
technique for computing gradients efficiently, and it happens to be the one used by
backpropagation.
TensorFlow uses automatic differentiation, and more specifically reverse-mode automatic
differentiation.
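As a quick illustration, here is a minimal sketch using the TensorFlow 2.x eager API (the toy function is an illustrative assumption, not from the original slides):

    import tensorflow as tf

    x = tf.Variable(3.0)

    # Operations are recorded on the "tape" so TensorFlow can replay them
    # backwards (reverse mode) to obtain gradients.
    with tf.GradientTape() as tape:
        y = x * x + 2.0 * x        # y = x^2 + 2x

    dy_dx = tape.gradient(y, x)    # analytically 2x + 2 = 8 at x = 3
    print(dy_dx.numpy())           # -> 8.0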
NUMERICAL DIFFERENTIATION
The simplest solution is to compute an approximation of the derivatives, numerically.
Recall the following derivative equations:
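The relevant definition is the limit definition of the derivative together with its finite-difference approximation for a small step h:

    f′(x) = lim_{h→0} ( f(x + h) − f(x) ) / h  ≈  ( f(x + h) − f(x) ) / h      for small h > 0

and, for a function of several variables, e.g. ∂f/∂x ≈ ( f(x + h, y) − f(x, y) ) / h.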
NUMERICAL DIFFERENTIATION
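A minimal sketch of numerical differentiation in Python (the test function is an illustrative assumption):

    def numerical_derivative(f, x, h=1e-6):
        """Central-difference approximation of df/dx."""
        return (f(x + h) - f(x - h)) / (2 * h)

    f = lambda x: x**2 + 2*x                 # exact derivative: 2x + 2
    print(numerical_derivative(f, 3.0))      # ~8.0

This is trivial to implement but only approximate, and it costs extra function evaluations for every parameter we differentiate with respect to, which is what makes autodiff attractive for large models.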
COMPUTATIONAL GRAPHS
A computational graph is a directed graph where the nodes correspond to
operations or variables. Variables can feed their value into operations, and
operations can feed their output into other operations. This way, every node in the
graph defines a function of the variables.
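A minimal sketch of this idea in Python (class names are illustrative; the deepideas tutorial linked at the end develops a fuller version):

    class Variable:
        """A leaf node that simply holds a value."""
        def __init__(self, value):
            self.value = value
        def evaluate(self):
            return self.value

    class Add:
        """An operation node: its value is a function of its input nodes."""
        def __init__(self, x, y):
            self.x, self.y = x, y
        def evaluate(self):
            return self.x.evaluate() + self.y.evaluate()

    class Multiply:
        def __init__(self, x, y):
            self.x, self.y = x, y
        def evaluate(self):
            return self.x.evaluate() * self.y.evaluate()

    # Every node defines a function of the variables, e.g. f = (x + y) * y
    x, y = Variable(2), Variable(3)
    f = Multiply(Add(x, y), y)
    print(f.evaluate())    # 15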
COMPUTATIONAL GRAPHS AND DERIVATIVES
Consider the following computational graph, built from the expression e = (a + b) * (b + 1) with intermediate nodes c = a + b and d = b + 1:
We can evaluate the expression by setting the input variables to certain values and
computing nodes up through the graph. For example, let’s set a=2 and b=1:
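Tracing the values forward through the graph, in plain Python:

    # Forward pass: evaluate each node from the inputs upward
    a, b = 2, 1
    c = a + b     # c = 3
    d = b + 1     # d = 2
    e = c * d     # e = 6
    print(e)      # 6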
COMPUTATIONAL GRAPHS AND DERIVATIVES
To understand derivatives in a computational graph, the key is to understand
derivatives on the edges. If a directly affects c, we want to know how it affects c: if a
changes a little bit, by what factor does c change? This is the partial derivative of c with respect to a.
To evaluate the partial derivatives in this graph, we need the sum rule and the product rule:
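For the addition and multiplication nodes in this graph, these rules give:

    ∂(a + b)/∂a = ∂a/∂a + ∂b/∂a = 1          (sum rule; b does not depend on a)
    ∂(u · v)/∂u = u · ∂v/∂u + v · ∂u/∂u = v  (product rule; v does not depend on u)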
COMPUTATIONAL GRAPHS AND DERIVATIVES
Below, the graph has the derivative on each edge labelled.
What if we want to understand how nodes that aren't directly connected affect each
other? Let's consider how e is affected by a. If we change a at a speed of 1, c also
changes at a speed of 1. In turn, c changing at a speed of 1 causes e to change at a
speed of 2. So e changes at a rate of 1 · 2 = 2 with respect to a.
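Concretely, with a = 2 and b = 1 (so c = 3 and d = 2), the per-edge derivatives follow from the sum and product rules:

    ∂c/∂a = 1,  ∂c/∂b = 1,  ∂d/∂b = 1,  ∂e/∂c = d = 2,  ∂e/∂d = c = 3

and chaining along the path a → c → e gives ∂e/∂a = (∂e/∂c) · (∂c/∂a) = 2 · 1 = 2.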
COMPUTATIONAL GRAPHS AND DERIVATIVES
The general rule is to sum over all possible paths from one node to the other,
multiplying together the derivatives on each edge of the path. For example, the
derivative of e with respect to b is obtained by summing over the two paths
b → c → e and b → d → e:
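    ∂e/∂b = (∂e/∂c) · (∂c/∂b) + (∂e/∂d) · (∂d/∂b) = 2 · 1 + 3 · 1 = 5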
FACTORING PATHS
The problem with just “summing over the paths” is that it’s very easy to get a
combinatorial explosion in the number of possible paths.
Factoring:
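One standard way to illustrate factoring (the concrete edge labels below are an illustrative assumption): suppose X feeds a node Y through three edges with derivatives α, β, γ, and Y feeds Z through three edges with derivatives δ, ε, ζ. Naively summing over paths from X to Z adds nine terms, while factoring groups the paths at Y:

    ∂Z/∂X = αδ + αε + αζ + βδ + βε + βζ + γδ + γε + γζ = (α + β + γ)(δ + ε + ζ)

Forward-mode and reverse-mode differentiation are algorithms for computing this factored sum efficiently.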
FORWARD AND REVERSE MODE DIFFERENTIATION
FORWARD MODE DIFFERENTIATION
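Forward-mode differentiation starts at one input and applies the operator ∂/∂b to every node as we move forward through the graph. For the example graph this yields:

    ∂b/∂b = 1,  ∂c/∂b = 1,  ∂d/∂b = 1,  ∂e/∂b = 2 · 1 + 3 · 1 = 5

One forward sweep gives the derivative of every node with respect to a single chosen input.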
REVERSE MODE DIFFERENTIATION
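Reverse-mode differentiation instead starts at an output and applies ∂e/∂ to every node moving backwards through the graph, so a single backward sweep yields the derivative of e with respect to every input. That is exactly the shape of neural network training: one loss, many parameters. A minimal sketch of scalar reverse-mode autodiff in Python (class and method names are illustrative, not TensorFlow's internals), applied to the example graph:

    class Var:
        """A scalar node in the computational graph."""
        def __init__(self, value, parents=()):
            self.value = value
            self.parents = parents   # pairs of (parent_node, local_gradient)
            self.grad = 0.0

        def __add__(self, other):
            other = other if isinstance(other, Var) else Var(other)
            # sum rule: d(x + y)/dx = 1, d(x + y)/dy = 1
            return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

        def __mul__(self, other):
            other = other if isinstance(other, Var) else Var(other)
            # product rule: d(x * y)/dx = y, d(x * y)/dy = x
            return Var(self.value * other.value,
                       [(self, other.value), (other, self.value)])

        def backward(self, upstream=1.0):
            # Push the upstream derivative back along each edge, summing the
            # contributions over all paths (chain rule / sum over paths).
            self.grad += upstream
            for parent, local_grad in self.parents:
                parent.backward(upstream * local_grad)

    a, b = Var(2.0), Var(1.0)
    c = a + b              # c = 3
    d = b + 1              # d = 2
    e = c * d              # e = 6
    e.backward()           # one backward sweep computes all gradients
    print(e.value, a.grad, b.grad)   # 6.0 2.0 5.0

The recursive backward() revisits a shared node once per path, which is fine for this tiny graph; a production implementation would process nodes in reverse topological order so each is visited exactly once.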
http://www.deepideas.net/deep-learning-from-scratch-i-computational-graphs/