006-Multiple Layers DNN
Understand How NN Work
Neural Networks:
Understand matrix operations, loss calculations, and partial
derivatives.
Why do they work? The magic of calculus - leveraging the
chain rule.
Constructing a Neural Network:
Building Block Approach: Compose a neural network from individual mathematical functions used as units.
A neural network isn't magic—it's a sequence of mathematical
operations built step by step.
Concept of a Model:
Model Representation: A computational graph encapsulating a
mathematical function.
Purpose: Accurately map inputs to corresponding outputs.
Understand How NN Work
Forward Pass:
What is it? The initial journey of data through the model.
Objective: Calculate predictions based on current weights and
biases.
Loss Calculation:
Significance: A metric to determine how 'off' our predictions are
from actual values.
Backward Propagation:
Leverage quantities from the forward pass and apply the chain rule
to deduce each parameter's impact on the loss.
Understand How NN Work
Parameter Update:
Why? To improve and refine our model's predictive capability.
How? Adjust weights and biases based on computed gradients
to minimize future loss.
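As a minimal illustration of these four steps, here is a hedged sketch with a single weight and bias (the numbers and variable names are ours, purely for illustration):

# Toy data point and current parameters (illustrative values)
x, y = 2.0, 5.0
w, b = 0.5, 0.0
lr = 0.01

# 1) Forward pass: prediction with the current weight and bias
y_pred = w * x + b

# 2) Loss calculation: squared error between prediction and target
loss = (y_pred - y) ** 2

# 3) Backward propagation: chain rule gives each parameter's impact on the loss
dloss_dypred = 2 * (y_pred - y)     # dL/dy_pred
dloss_dw = dloss_dypred * x         # since dy_pred/dw = x
dloss_db = dloss_dypred * 1.0       # since dy_pred/db = 1

# 4) Parameter update: step against the gradient to reduce future loss
w -= lr * dloss_dw
b -= lr * dloss_db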
Understand How NN Work
Beginning with Basics:
Description: We started with a model that used only linear
operations to transform the features into the target.
Implication: Even with optimal fitting, such a model can only
capture linear relationships.
Introducing Non-linearity:
Process: Linear operations ➡ Nonlinear function (sigmoid) ➡
Further linear operations.
Benefit: The model now has the capability to grasp the intrinsic
nonlinear associations between inputs and outputs.
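A hedged NumPy sketch of this composition (the shapes and variable names are assumptions, not the slides' code):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative shapes: 4 samples, 3 input features, 5 hidden units, 1 output
X = np.random.randn(4, 3)
W1, b1 = np.random.randn(3, 5), np.zeros((1, 5))
W2, b2 = np.random.randn(5, 1), np.zeros((1, 1))

M1 = X @ W1 + b1      # linear operations
O1 = sigmoid(M1)      # nonlinear function (sigmoid)
P  = O1 @ W2 + b2     # further linear operations -> predictions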
Understand How NN Work
Complex Interactions:
Highlight: The model isn't just about direct feature-to-target
relationships.
Advantage: It captures underlying patterns, recognizing
relationships between combinations of input features and the
target.
Deep Learning Models?
What is the connection between models like these and deep
learning models?
Definition: Deep learning models are represented by a series of
operations that include at least two nonconsecutive nonlinear
functions.
First note that since deep learning models are just a series of
operations, the process of training them is in fact identical to the
process we’ve been using for the simpler models we’ve already
seen.
After all, what allows this training process to work is
differentiability: as long as the individual operations making up
the function are differentiable, the whole function will be
differentiable, and we’ll be able to train it using the same four-
step training procedure just described.
Manually Coding
So far our approach to actually training these models has been to
compute these derivatives by manually coding the forward and
backward passes and then multiplying the appropriate quantities
together to get the derivatives.
For the simple neural network model, this required 17 steps.
Because we’re describing the model at such a low level, it isn’t
immediately clear how we could add more complexity to this
model (or what exactly that would mean), or even make a
simple change such as swapping out the sigmoid function for a
different nonlinear function.
To be able to build arbitrarily “deep” and otherwise “complex”
deep learning models, we’ll have to think about where in these 17
steps we can create reusable components, at a higher level than
individual operations, that we can swap in and out to build
different models.
Manually Coding
To guide us in the right direction as far as which abstractions to
create, we’ll try to map the operations we’ve been using to
traditional descriptions of neural networks as being made up of
“layers,” “neurons,” and so on.
As our first step, we’ll have to create an abstraction to represent
the individual operations we’ve been working with so far, instead
of continuing to code the same matrix multiplication and bias
addition over and over again.
Operations
We know at a high level, based on the way we’ve used such
functions in our models, that an Operation should have forward
and backward methods, each of which receives an ndarray as
input and outputs an ndarray.
Some operations, such as matrix multiplication, also seem to have
another special kind of input, itself an ndarray: the parameters.
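One way to capture this is a small base class. The sketch below is only an assumed shape for such an abstraction (the class and method names are ours): forward and backward each take and return an ndarray, and a ParamOperation subclass additionally holds a parameter ndarray.

import numpy as np
from numpy import ndarray

class Operation:
    # One reusable operation (node) in the computational graph.
    def forward(self, input_: ndarray) -> ndarray:
        self.input_ = input_            # stored for use in the backward pass
        self.output = self._output()
        return self.output

    def backward(self, output_grad: ndarray) -> ndarray:
        self.input_grad = self._input_grad(output_grad)
        return self.input_grad

    def _output(self) -> ndarray:
        raise NotImplementedError

    def _input_grad(self, output_grad: ndarray) -> ndarray:
        raise NotImplementedError

class ParamOperation(Operation):
    # An Operation whose output also depends on a parameter ndarray.
    def __init__(self, param: ndarray):
        self.param = param

    def backward(self, output_grad: ndarray) -> ndarray:
        self.input_grad = self._input_grad(output_grad)
        self.param_grad = self._param_grad(output_grad)
        return self.input_grad

    def _param_grad(self, output_grad: ndarray) -> ndarray:
        raise NotImplementedError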
Operations
A few important restrictions:
The shape of the output gradient ndarray must match the shape
of the output.
The shape of the input gradient that the Operation sends
backward during the backward pass must match the shape of the
Operation’s input.
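These restrictions can be checked directly inside the backward pass; a hedged refinement of the backward method assumed in the sketch above:

# Inside the Operation class sketched above:
def backward(self, output_grad: ndarray) -> ndarray:
    # The shape of the output gradient must match the shape of the output...
    assert self.output.shape == output_grad.shape
    self.input_grad = self._input_grad(output_grad)
    # ...and the input gradient sent backward must match the shape of the input.
    assert self.input_.shape == self.input_grad.shape
    return self.input_grad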
Building Blocks of NN: Layers
In terms of Operations, layers are a series of linear operations
followed by a nonlinear operation.
In our previous network we had five operations in total: two linear
operations (a weight multiplication and the addition of a bias
term), followed by the sigmoid function, and then two more linear
operations.
We would say that the first three operations, up to and including
the nonlinear one, would constitute the first layer, and the last two
operations would constitute the second layer.
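In code, the grouping might look like the following (WeightMultiply, BiasAdd, and Sigmoid are hypothetical Operation names; concrete versions are sketched with the specific Operations later):

# First layer: weight multiplication, bias addition, then the sigmoid
layer_1_ops = [WeightMultiply(W1), BiasAdd(B1), Sigmoid()]
# Second layer: the remaining two linear operations
layer_2_ops = [WeightMultiply(W2), BiasAdd(B2)]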
Building Blocks of NN: Layers
In addition, we say that the input itself represents a special kind
of layer called the input layer (in terms of numbering the layers,
this layer doesn’t count, so that we can think of it as the “zeroth”
layer).
The last layer, similarly, is called the output layer.
The middle layer—the “first one,” according to our numbering—
also has an important name: it is called a hidden layer, since it is
the only layer whose values we don’t typically see explicitly
during the course of training.
Building Blocks of NN: Layers
The output layer is an important exception to this definition of
layers, in that it does not have to have a nonlinear operation
applied to it.
This is simply because we often want the values that come out
of this layer to range between negative infinity and infinity
(or at least between 0 and infinity), whereas nonlinear functions
typically “squash down” their input to some subset of that range
relevant to the particular problem we’re trying to solve (for
example, the sigmoid function squashes its input down to
between 0 and 1).
Building Blocks of NN: Layers
The more common way to represent neural networks is in terms
of layers, as shown in the figure.
Specific Operations
What specific Operations do we need to implement for the
models in the prior chapter to work?
Based on our experience of implementing that neural network
step by step, we know there are three kinds:
1) The matrix multiplication of the input with the matrix of
parameters
2) The addition of a bias term
3) The sigmoid activation function
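A hedged sketch of these three Operations, built on the Operation and ParamOperation base classes assumed earlier (the exact class names are ours):

import numpy as np
from numpy import ndarray
# Builds on the Operation / ParamOperation base classes sketched earlier.

class WeightMultiply(ParamOperation):
    # Matrix multiplication of the input with the parameter matrix.
    def _output(self) -> ndarray:
        return self.input_ @ self.param

    def _input_grad(self, output_grad: ndarray) -> ndarray:
        return output_grad @ self.param.T

    def _param_grad(self, output_grad: ndarray) -> ndarray:
        return self.input_.T @ output_grad

class BiasAdd(ParamOperation):
    # Addition of a bias term, broadcast across the batch.
    def _output(self) -> ndarray:
        return self.input_ + self.param

    def _input_grad(self, output_grad: ndarray) -> ndarray:
        return output_grad

    def _param_grad(self, output_grad: ndarray) -> ndarray:
        return output_grad.sum(axis=0, keepdims=True)

class Sigmoid(Operation):
    # Sigmoid activation applied elementwise.
    def _output(self) -> ndarray:
        return 1.0 / (1.0 + np.exp(-self.input_))

    def _input_grad(self, output_grad: ndarray) -> ndarray:
        return output_grad * self.output * (1.0 - self.output)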
Single layer Code (tape.gradient)
Define Necessary Functions:
Activation function: ReLU, sigmoid, etc.
Derivative of ReLU, sigmoid, etc.
Loss function: Mean Squared Error (MSE)
Derivative of MSE
Implement Forward and Backward Propagation & Training
Single layer Code (tape.gradient)
Define Necessary Functions:
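A hedged TensorFlow sketch of such definitions (when training with tape.gradient the hand-written derivatives are not strictly needed, but they are included here for comparison with manual backpropagation):

import tensorflow as tf

def relu(x):
    return tf.maximum(x, 0.0)

def relu_derivative(x):
    return tf.cast(x > 0.0, tf.float32)

def sigmoid(x):
    return 1.0 / (1.0 + tf.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def mse(y_true, y_pred):
    # Mean Squared Error
    return tf.reduce_mean(tf.square(y_true - y_pred))

def mse_derivative(y_true, y_pred):
    # Gradient of MSE with respect to the predictions
    return 2.0 * (y_pred - y_true) / tf.cast(tf.size(y_true), tf.float32)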
Single layer Code (tape.gradient)
Implement Forward and Backward Propagation & Training
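A hedged sketch of single-layer training with tf.GradientTape (the data, shapes, and hyperparameters are illustrative assumptions):

import tensorflow as tf

# Illustrative data: 100 samples, 3 features, 1 target
X = tf.random.normal((100, 3))
y = tf.random.normal((100, 1))

# A single layer: weight matrix and bias as trainable variables
W = tf.Variable(tf.random.normal((3, 1)))
b = tf.Variable(tf.zeros((1,)))
lr = 0.01

for epoch in range(100):
    with tf.GradientTape() as tape:
        y_pred = tf.sigmoid(X @ W + b)                  # forward pass
        loss = tf.reduce_mean(tf.square(y - y_pred))    # MSE loss
    dW, db = tape.gradient(loss, [W, b])                # backward pass
    W.assign_sub(lr * dW)                               # parameter update
    b.assign_sub(lr * db)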
Single layer Code (tape.gradient)
Implement Forward and Backward Propagation & Training with Class
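A hedged sketch of the same idea wrapped in a class (the class name and interface are assumptions):

import tensorflow as tf

class SingleLayer:
    def __init__(self, input_dim, output_dim, lr=0.01):
        self.W = tf.Variable(tf.random.normal((input_dim, output_dim)))
        self.b = tf.Variable(tf.zeros((output_dim,)))
        self.lr = lr

    def forward(self, X):
        return tf.sigmoid(X @ self.W + self.b)

    def train_step(self, X, y):
        with tf.GradientTape() as tape:
            loss = tf.reduce_mean(tf.square(y - self.forward(X)))
        dW, db = tape.gradient(loss, [self.W, self.b])
        self.W.assign_sub(self.lr * dW)
        self.b.assign_sub(self.lr * db)
        return loss

# Usage (illustrative)
model = SingleLayer(3, 1)
X = tf.random.normal((100, 3))
y = tf.random.normal((100, 1))
for epoch in range(100):
    model.train_step(X, y)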
Two layers Code (tape.gradient)
Define Necessary Functions:
Activation function: ReLU, sigmoid, etc.
Derivative of ReLU, sigmoid, etc.
Loss function: Mean Squared Error (MSE)
Derivative of MSE
Implement Forward and Backward Propagation & Training
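A hedged two-layer sketch with tf.GradientTape (the hidden size and other details are assumptions):

import tensorflow as tf

X = tf.random.normal((100, 3))
y = tf.random.normal((100, 1))

# Layer 1: 3 -> 8 with a sigmoid; Layer 2: 8 -> 1 linear output
W1 = tf.Variable(tf.random.normal((3, 8)))
b1 = tf.Variable(tf.zeros((8,)))
W2 = tf.Variable(tf.random.normal((8, 1)))
b2 = tf.Variable(tf.zeros((1,)))
params = [W1, b1, W2, b2]
lr = 0.01

for epoch in range(200):
    with tf.GradientTape() as tape:
        hidden = tf.sigmoid(X @ W1 + b1)                # hidden layer
        y_pred = hidden @ W2 + b2                       # output layer
        loss = tf.reduce_mean(tf.square(y - y_pred))    # MSE loss
    grads = tape.gradient(loss, params)
    for p, g in zip(params, grads):
        p.assign_sub(lr * g)                            # gradient descent update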
Multiple layers DNN
Define Necessary Functions:
Activation function: ReLU, sigmoid, etc.
Derivative of ReLU, sigmoid, etc.
Loss function: Mean Squared Error (MSE)
Derivative of MSE
FullyConnectedLayer Class:
def __init__(self, input_dim, output_dim, activation,
activation_derivative)
def forward(self, X)
def backward
Chain Multiple Layers: class NeuralNetwork:
def __init__(self)
def forward(self, X):
def backward(self, dL):
Implement Forward and Backward Propagation & Training
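The outline above maps onto something like the following hedged NumPy sketch with manual backpropagation (layer sizes, initialization, and the training loop are assumptions, not the slides' exact code):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def mse_derivative(y_true, y_pred):
    return 2.0 * (y_pred - y_true) / y_true.size

class FullyConnectedLayer:
    def __init__(self, input_dim, output_dim, activation, activation_derivative):
        self.W = 0.1 * np.random.randn(input_dim, output_dim)
        self.b = np.zeros((1, output_dim))
        self.activation = activation
        self.activation_derivative = activation_derivative

    def forward(self, X):
        self.X = X
        self.Z = X @ self.W + self.b              # linear part
        return self.activation(self.Z)            # nonlinear part

    def backward(self, dL, lr=0.01):
        dZ = dL * self.activation_derivative(self.Z)
        dW = self.X.T @ dZ                        # gradient w.r.t. weights
        db = dZ.sum(axis=0, keepdims=True)        # gradient w.r.t. bias
        dX = dZ @ self.W.T                        # gradient sent to the previous layer
        self.W -= lr * dW                         # parameter update
        self.b -= lr * db
        return dX

class NeuralNetwork:
    def __init__(self):
        # Chain multiple layers; the sizes here are illustrative
        self.layers = [
            FullyConnectedLayer(3, 8, sigmoid, sigmoid_derivative),
            FullyConnectedLayer(8, 1, sigmoid, sigmoid_derivative),
        ]

    def forward(self, X):
        for layer in self.layers:
            X = layer.forward(X)
        return X

    def backward(self, dL):
        for layer in reversed(self.layers):
            dL = layer.backward(dL)

# Training loop (illustrative)
X = np.random.randn(100, 3)
y = np.random.rand(100, 1)
net = NeuralNetwork()
for epoch in range(500):
    y_pred = net.forward(X)
    loss = mse(y, y_pred)
    net.backward(mse_derivative(y, y_pred))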
Multiple layers DNN
The description of a neural network in terms of Layers:
Thank You