
DEMYSTIFYING DEEP LEARNING Dr V Vella

AGENDA
Quick refresher on Gradient Descent and Probabilistic Perspectives
Differentiation Methods and Autodiff
Computational Graphs
Multi-layer Perceptrons – “Traditional Way”
Deep Networks – “Computational Graph” Architecture
Publicly available Jupyter Notebook – Build your own Tensorflow!
SIMPLE REGRESSION
Hypothesis Function:

Cost Function:
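The hypothesis and cost expressions appear only as images on the slides; for reference, the standard linear-regression forms (the slide's exact notation may differ) are

$$h_\theta(x) = \theta_0 + \theta_1 x_1 + \dots + \theta_n x_n = \theta^T x, \qquad J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2.$$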
GRADIENT DESCENT
Model training:
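The update rule itself is not reproduced in this extract; the standard batch gradient-descent step, assuming a learning rate α, is

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta),$$

applied simultaneously to every parameter θⱼ and repeated until convergence.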
PROBABILISTIC INTERPRETATION
Let us assume that the target variables and the inputs are related via the equation:

where the error term ε captures either unmodeled effects or random noise. Let us further assume that the error terms are distributed IID according to a Gaussian distribution with mean zero and some variance σ².
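The relating equation is shown only as an image on the slide; the usual formulation consistent with this description is

$$y^{(i)} = \theta^T x^{(i)} + \epsilon^{(i)}, \qquad \epsilon^{(i)} \sim \mathcal{N}(0, \sigma^2) \ \text{i.i.d.}$$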
PROBABILISTIC INTERPRETATION
The probability of the data is given by
This quantity is typically viewed as a function of y (and perhaps X), for a fixed value of θ. When we wish to explicitly view it as a function of θ, we will instead call it the likelihood function:

The principle of maximum likelihood says that we should choose θ so as to make the data as probable as possible, i.e. we should choose θ to maximize L(θ).
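The probability expression is not reproduced in this extract; under the Gaussian noise assumption above, the standard likelihood is

$$L(\theta) = \prod_{i=1}^{m} p\!\left(y^{(i)} \mid x^{(i)}; \theta\right) = \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{\left(y^{(i)} - \theta^T x^{(i)}\right)^2}{2\sigma^2} \right).$$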
PROBABILISTIC INTERPRETATION
The derivation is simpler if we instead maximize the log likelihood ℓ(θ):

Hence, maximizing ℓ(θ) gives the same answer as minimizing the least-squares cost: least-squares regression corresponds to finding the maximum likelihood estimate of θ.
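The log-likelihood on the slide is not reproduced; written out from the likelihood above it is

$$\ell(\theta) = \log L(\theta) = m \log \frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{2\sigma^2} \sum_{i=1}^{m} \left( y^{(i)} - \theta^T x^{(i)} \right)^2,$$

so maximizing ℓ(θ) is the same as minimizing ∑ᵢ (y⁽ⁱ⁾ − θᵀx⁽ⁱ⁾)², i.e. the least-squares cost J(θ).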
DIFFERENTIATION METHODS - AUTODIFF
In mathematics and computer algebra, automatic differentiation (AD), also called
algorithmic differentiation or computational differentiation, is a set of techniques
to numerically evaluate the derivative of a function specified by a computer
program.
Backpropagation refers to the whole process of training an artificial neural network using multiple backpropagation steps, each of which computes gradients and uses them to perform a Gradient Descent step. In contrast, autodiff is simply a technique for computing gradients efficiently, and it happens to be used by backpropagation.
Tensorflow uses automatic differentiation, more specifically reverse-mode automatic differentiation.
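As a quick illustration, here is a minimal sketch of reverse-mode autodiff using TensorFlow 2's GradientTape; the function y = x² + 2x is an arbitrary example, not taken from the slides.

```python
import tensorflow as tf

x = tf.Variable(3.0)

with tf.GradientTape() as tape:
    # Forward pass: TensorFlow records every operation applied to x on the tape.
    y = x * x + 2.0 * x          # y = x^2 + 2x

# Backward (reverse-mode) pass: the tape is replayed in reverse to get dy/dx = 2x + 2.
dy_dx = tape.gradient(y, x)
print(dy_dx.numpy())             # 8.0 at x = 3.0
```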
NUMERICAL DIFFERENTIATION
The simplest solution is to compute an approximation of the derivatives, numerically.
Recall the following derivative equations:
NUMERICAL DIFFERENTIATION
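As a minimal sketch (the function f and the step size h below are illustrative choices, not from the slides), numerical differentiation approximates the derivative with a finite difference:

```python
def numerical_derivative(f, x, h=1e-5):
    # Central difference: (f(x + h) - f(x - h)) / (2h) approximates f'(x).
    return (f(x + h) - f(x - h)) / (2.0 * h)

f = lambda x: x ** 2 + 2.0 * x
print(numerical_derivative(f, 3.0))   # ~8.0, matching the analytic derivative 2x + 2
```

It is simple, but it needs extra evaluations of f for every input variable and is only an approximation, which is why autodiff is preferred for large networks.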
COMPUTATIONAL GRAPHS
A computational graph is a directed graph where the nodes correspond to
operations or variables. Variables can feed their value into operations, and
operations can feed their output into other operations. This way, every node in the
graph defines a function of the variables.
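A minimal sketch of such a graph in Python (class and variable names here are illustrative, not from the slides): variables are leaf nodes, and operations recursively evaluate their inputs.

```python
class Var:
    """A leaf node holding a value."""
    def __init__(self, value):
        self.value = value
    def forward(self):
        return self.value

class Add:
    """An operation node whose output is the sum of its two input nodes."""
    def __init__(self, a, b):
        self.a, self.b = a, b
    def forward(self):
        return self.a.forward() + self.b.forward()

class Mul:
    """An operation node whose output is the product of its two input nodes."""
    def __init__(self, a, b):
        self.a, self.b = a, b
    def forward(self):
        return self.a.forward() * self.b.forward()

# Every node defines a function of the variables, e.g. (x + y) * y:
x, y = Var(3.0), Var(2.0)
out = Mul(Add(x, y), y)
print(out.forward())   # 10.0
```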
COMPUTATIONAL GRAPHS AND DERIVATIVES
Consider the following computational graphs:
COMPUTATIONAL GRAPHS AND DERIVATIVES
We can evaluate the expression by setting the input variables to certain values and
computing nodes up through the graph. For example, let’s set a=2 and b=1:
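The graph itself appears only as a figure on the slides. The numbers used later (e changing twice as fast as c, and two paths from b to e) are consistent with the classic example e = (a + b) × (b + 1); under that assumption, setting a = 2 and b = 1 gives

$$c = a + b = 3, \qquad d = b + 1 = 2, \qquad e = c \cdot d = 6.$$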
COMPUTATIONAL GRAPHS AND DERIVATIVES
If one wants to understand derivatives in a computational graph, the key is to understand derivatives on the edges. If a directly affects c, then we want to know how it affects c: if a changes a little bit, by what factor does c change?

We call this the partial derivative of c with respect to a.

To evaluate the partial derivatives in this graph, we need the sum rule and the product rule:
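The rules themselves are not reproduced in this extract; in the form needed for this graph they are simply

$$\frac{\partial}{\partial a}(a + b) = 1, \qquad \frac{\partial}{\partial u}(uv) = v, \qquad \frac{\partial}{\partial v}(uv) = u.$$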
COMPUTATIONAL GRAPHS AND DERIVATIVES
Below, the graph has the derivative on each edge labelled.

What if we want to understand how nodes that aren't directly connected affect each other? Let's consider how e is affected by a. If we change a at a speed of 1, c also changes at a speed of 1. In turn, c changing at a speed of 1 causes e to change at a speed of 2. So e changes at a rate of 1 × 2 = 2 with respect to a.
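Written as the chain rule along the single path from a to e (assuming the example graph sketched earlier):

$$\frac{\partial e}{\partial a} = \frac{\partial e}{\partial c} \cdot \frac{\partial c}{\partial a} = 2 \cdot 1 = 2.$$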
COMPUTATIONAL GRAPHS AND DERIVATIVES
The general rule is to sum over all possible
paths from one node to the other, multiplying
the derivatives on each edge of the path
together. For example, to get the derivative
of e with respect to b we get:
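The sum is not reproduced in this extract; assuming the same example graph, b reaches e through two paths (via c and via d), so

$$\frac{\partial e}{\partial b} = \frac{\partial e}{\partial c}\frac{\partial c}{\partial b} + \frac{\partial e}{\partial d}\frac{\partial d}{\partial b} = 2 \cdot 1 + 3 \cdot 1 = 5.$$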
FACTORING PATHS
The problem with just “summing over the paths” is that it’s very easy to get a
combinatorial explosion in the number of possible paths.

Factoring:
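The factored expression is shown only on the slide; a standard illustration (following Olah's notation, which these slides appear to draw on) is a graph where X feeds Y through three edges with derivatives α, β, γ and Y feeds Z through three edges with derivatives δ, ε, ζ. Instead of summing all 3 × 3 = 9 paths,

$$\frac{\partial Z}{\partial X} = \alpha\delta + \alpha\epsilon + \alpha\zeta + \beta\delta + \dots = (\alpha + \beta + \gamma)(\delta + \epsilon + \zeta).$$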
FORWARD AND REVERSE MODE DIFFERENTIATION
FORWARD MODE DIFFERENTIATION
REVERSE MODE DIFFERENTIATION

Approach used by modern ML frameworks like Theano and Tensorflow.
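In the same notation (a brief summary, since the slide bodies are not reproduced here): forward-mode differentiation applies the operator ∂/∂X at every node, tracking how one input affects every node; reverse-mode applies ∂Z/∂ at every node, tracking how every node affects one output Z. With a single scalar loss and many parameters, one reverse pass yields all parameter gradients, which is why it is the mode of choice for training neural networks.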


COMPUTATIONAL GRAPH
SIGMOID FUNCTION
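The formulas on these slides are the standard ones; for reference,

$$\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \sigma'(x) = \sigma(x)\,\bigl(1 - \sigma(x)\bigr),$$

so the local gradient of a sigmoid node can be computed directly from its forward output.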
COMPUTATIONAL GRAPH - VECTORIZED
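A minimal sketch of a vectorized forward pass over a batch (shapes and names below are illustrative, not from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))       # a batch of 4 examples with 3 features each
W = rng.normal(size=(3, 2))       # weights mapping 3 inputs to 2 units
b = np.zeros(2)

Z = X @ W + b                     # the affine node of the graph, shape (4, 2)
A = sigmoid(Z)                    # elementwise non-linearity, shape (4, 2)
print(A.shape)
```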
PATTERNS IN BACKWARD FLOW
SOFTMAX REGRESSION
SOFTMAX FUNCTION
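The definition is not reproduced in this extract; the standard softmax, which maps a vector of K class scores to a probability distribution, is

$$\text{softmax}(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}, \qquad j = 1, \dots, K.$$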
LOG-LOSS
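The loss formula is shown only on the slides; the usual cross-entropy (log-loss) over m examples and K classes, with one-hot targets y and predicted probabilities ŷ, is

$$J = -\frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{K} y^{(i)}_j \log \hat{y}^{(i)}_j.$$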
FINDING WEIGHTS USING GRADIENT DESCENT
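A minimal sketch of finding the weights of a softmax classifier with batch gradient descent (the synthetic data, learning rate and iteration count are illustrative, not from the slides):

```python
import numpy as np

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)       # subtract max for numerical stability
    e = np.exp(Z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                   # 100 examples, 3 features
y = rng.integers(0, 2, size=100)                # 2 classes
Y = np.eye(2)[y]                                # one-hot targets

W, b, lr = np.zeros((3, 2)), np.zeros(2), 0.1

for step in range(200):
    P = softmax(X @ W + b)                      # forward pass: class probabilities
    G = (P - Y) / X.shape[0]                    # gradient of mean cross-entropy w.r.t. logits
    W -= lr * (X.T @ G)                         # gradient-descent update for the weights
    b -= lr * G.sum(axis=0)                     # and for the bias
```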
NN MATH – THE “TRADITIONAL WAY”
NNS – COMPUTATIONAL GRAPH APPROACH
PROCESSING INPUT
DERIVATIVES AT EACH NODE
STATE AFTER BACKWARD PASS
DERIVATIVE COMPUTATION COMPLETE
GRADIENTS FOR PARAMETER UPDATES
PARAMETER UPDATE
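Putting the walkthrough together, here is a minimal sketch of one forward pass, one backward pass and one parameter update for a single-hidden-layer network (sigmoid hidden layer, softmax output, cross-entropy loss); all names, sizes and values are illustrative, not from the slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)
    e = np.exp(Z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))                     # batch of 8 examples, 4 features
Y = np.eye(3)[rng.integers(0, 3, size=8)]       # one-hot targets, 3 classes

W1, b1 = rng.normal(scale=0.1, size=(4, 5)), np.zeros(5)
W2, b2 = rng.normal(scale=0.1, size=(5, 3)), np.zeros(3)
lr = 0.1

# Forward pass: evaluate the graph node by node.
A1 = sigmoid(X @ W1 + b1)
P  = softmax(A1 @ W2 + b2)

# Backward pass: push gradients from the loss back along each edge.
m = X.shape[0]
dZ2 = (P - Y) / m                               # d(loss)/d(output logits)
dW2, db2 = A1.T @ dZ2, dZ2.sum(axis=0)
dA1 = dZ2 @ W2.T
dZ1 = dA1 * A1 * (1.0 - A1)                     # sigmoid local gradient
dW1, db1 = X.T @ dZ1, dZ1.sum(axis=0)

# Parameter update: one gradient-descent step on every weight and bias.
W1 -= lr * dW1; b1 -= lr * db1
W2 -= lr * dW2; b2 -= lr * db2
```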
BUILDING YOUR OWN TENSORFLOW!
Follow and work through this public Jupyter notebook.
At this point you should have all the necessary background to understand every step.
Recommended for everyone who wants a good hands-on practical exercise to further understand the architecture of deep NN libraries like Tensorflow.

http://www.deepideas.net/deep-learning-from-scratch-i-computational-graphs/
