
Unit 2

Deep Neural
Networks

Oscar Contreras Carrasco


UNIVALLE
2024
Contents

General definitions

Feed-forward neural network architecture

Activation functions

Neural network training

Optimization methods

Special layers
Definitions

At this point, we have learned about linear models and their characteristics, so we are ready to delve into neural networks.

First of all, we will provide some motivation regarding the current state of the art in neural networks.

Let’s get started
General structure of a deep neural network

A deep neural network is composed of several hidden layers, each of which contains several units.

We can depict a deep neural network in this way:

[Diagram: a feed-forward network with inputs x1, x2, x3, hidden units z1, z2, z3, and a single output z]
Definitions

Deep neural networks have several layers in their configuration

The type of neural network we have been describing thus far is also
known as Multilayer Perceptron

Each neuron is fully connected to every unit in the adjacent layers, so this architecture is also known as a fully connected neural network or dense network.

Because of the vast number of parameters involved, these models
tend to overfit quite easily.
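To make the last point concrete, here is a minimal Python sketch (the layer sizes are made up purely for illustration) that counts the weights and biases of a small fully connected network:

```python
# Hypothetical layer sizes, chosen only for illustration.
layer_sizes = [784, 512, 256, 10]  # input, two hidden layers, output

total_params = 0
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    weights = n_in * n_out   # one weight per connection (fully connected)
    biases = n_out           # one bias per unit
    total_params += weights + biases

print(total_params)  # 535,818 parameters for this small example
```

Even this modest network has over half a million parameters, which helps explain why dense networks overfit so easily.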
More about activation functions

Each unit in a neural network has an activation function attached to
it.

The choice of activation function depends on factors such as:

Location in the neural network (hidden layer vs. output layer)

Let’s now look further into activation functions
Most widely used activation functions
Sigmoid

The Sigmoid function is used for making binary predictions on a dataset.

It is given by:

σ(x) = 1 / (1 + e^(−x))

And its derivative is:

σ'(x) = σ(x) (1 − σ(x))

This function’s range is (0, 1)
Hyperbolic tangent (tanh)

The tanh function is a popular activation function for neurons in
hidden layers

It is given by:

tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))

And its derivative is:

tanh'(x) = 1 − tanh²(x)

This function’s range is (−1, 1)
SoftPlus

As an alternative to tanh, the Softplus function can be used for neurons in hidden layers.

It is given by:

f(x) = ln(1 + e^x)

And its derivative is:

f'(x) = 1 / (1 + e^(−x)) = σ(x), i.e. the Sigmoid function
ReLU (Rectified Linear Unit)

The ReLU function is used as an activation for neurons located in
hidden layers

It is given by:

f(x) = max(0, x)

And its derivative is:

f'(x) = 1 if x > 0, and 0 if x < 0 (conventionally taken as 0 at x = 0)
Leaky ReLU (Leaky Rectified Linear Unit)

In addition, the Leaky ReLU function can be used to avoid the dying-ReLU problem, in which units get stuck with zero output and zero gradient.

It is given by:

f(x) = x if x > 0, and αx otherwise, with a small slope α (e.g. 0.01)

And its derivative is:

f'(x) = 1 if x > 0, and α otherwise
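As a quick reference, here is a minimal NumPy sketch of the activation functions above and their derivatives (the Leaky ReLU slope of 0.01 is an assumed default, not taken from the slides):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def d_tanh(x):
    return 1.0 - np.tanh(x) ** 2         # tanh itself is np.tanh

def softplus(x):
    return np.log1p(np.exp(x))

def d_softplus(x):
    return sigmoid(x)                     # the derivative of Softplus is the Sigmoid

def relu(x):
    return np.maximum(0.0, x)

def d_relu(x):
    return np.where(x > 0, 1.0, 0.0)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def d_leaky_relu(x, alpha=0.01):
    return np.where(x > 0, 1.0, alpha)
```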
Exercise

Find the derivatives of the following functions:
Forward propagation

Forward propagation is the process of passing data from the input layer through to the output of a neural network.

It involves evaluating the logits and activation functions of every neuron in the network.

Let us consider a popular example: Derive the forward propagation equations
for the XOR neural network given by:
Forward propagation

XOR forward propagation
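The network diagram is not reproduced above, so the following is a minimal NumPy sketch of XOR forward propagation under an assumed 2-2-1 architecture (two inputs, two hidden sigmoid units, one sigmoid output); the weights are random placeholders rather than values from the slides:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Assumed 2-2-1 architecture with illustrative (random) parameters.
W1 = np.random.randn(2, 2)   # input-to-hidden weights
b1 = np.zeros((1, 2))        # hidden biases
W2 = np.random.randn(2, 1)   # hidden-to-output weights
b2 = np.zeros((1, 1))        # output bias

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # the four XOR inputs

def forward(X):
    z1 = X @ W1 + b1         # hidden-layer logits
    a1 = sigmoid(z1)         # hidden-layer activations
    z2 = a1 @ W2 + b2        # output logit
    a2 = sigmoid(z2)         # predicted probability
    return z1, a1, z2, a2

_, _, _, y_hat = forward(X)
print(y_hat.ravel())
```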
Backpropagation

Backpropagation is the procedure used to train a neural network.

It involves updating the values of all of its parameters.

Backpropagation requires us to calculate the derivatives of the error function with respect to the parameters of every layer.

Let’s now analyze the XOR case in more detail
Backpropagation
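Continuing the forward-propagation sketch above (same variables), and assuming a cross-entropy loss with sigmoid activations, since those choices are not stated on the slides, a minimal backpropagation loop for the XOR network could look like this:

```python
y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR targets
alpha = 0.5                                      # assumed learning rate
N = X.shape[0]

for step in range(5000):
    z1, a1, z2, a2 = forward(X)

    # Output layer: with a sigmoid output and cross-entropy loss,
    # the error term simplifies to (prediction - target).
    dz2 = (a2 - y) / N
    dW2 = a1.T @ dz2
    db2 = dz2.sum(axis=0, keepdims=True)

    # Hidden layer: propagate the error back through W2 and the sigmoid.
    dz1 = (dz2 @ W2.T) * a1 * (1.0 - a1)
    dW1 = X.T @ dz1
    db1 = dz1.sum(axis=0, keepdims=True)

    # Gradient-descent parameter updates.
    W2 -= alpha * dW2; b2 -= alpha * db2
    W1 -= alpha * dW1; b1 -= alpha * db1
```

Note that the bias gradients are column sums of the error terms, which is where the N × 1 vector of ones u appears in the matrix form of backpropagation.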
Forward propagation (Matrix form)
Backward propagation (Matrix form)

Where u is an N × 1 vector of ones


More on optimization methods

Gradient descent is not an efficient optimization method, especially
when dealing with high volumes of data.

We will now cover some optimization methods of interest we can
use in different scenarios.
Vanilla Gradient Descent

All methods we will describe are derived from Gradient Descent.

Gradient Descent takes all data from the dataset and uses it to
update the parameters.

We used it extensively when working with the linear models covered previously.

Its main expression is:

θ ← θ − α ∇θ J(θ)

Where α is the learning rate and J(θ) is the cost function evaluated over the whole dataset.
Minibatch Gradient Descent

The main difference between Gradient Descent and Minibatch Gradient Descent is that the latter uses small batches of data to update the parameters, instead of loading the whole dataset into memory.

It is defined by:

θ ← θ − α ∇θ J(θ; x(i:i+Nb), y(i:i+Nb))

Where Nb is the size of each batch.

Batch sizes are typically powers of 2, for instance 16, 32, 64, etc.
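A minimal sketch of a minibatch gradient-descent training loop; compute_gradients is a hypothetical user-supplied function returning the gradient of the loss over one batch, and the default values of alpha and batch_size are assumptions:

```python
import numpy as np

def minibatch_gradient_descent(params, X, y, compute_gradients,
                               alpha=0.01, batch_size=32, epochs=10):
    """Update `params` with minibatches of size `batch_size`."""
    N = X.shape[0]
    for epoch in range(epochs):
        perm = np.random.permutation(N)          # shuffle once per epoch
        for start in range(0, N, batch_size):
            idx = perm[start:start + batch_size]
            grads = compute_gradients(params, X[idx], y[idx])
            params = params - alpha * grads      # one gradient-descent step
    return params
```

Setting batch_size equal to the dataset size recovers vanilla Gradient Descent, while batch_size = 1 corresponds to Stochastic Gradient Descent.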
Stochastic Gradient Descent

Instead of visiting the whole dataset or parts of it, we can randomly sample a point (or a small batch of points) from our dataset and then use it to update the parameters.

This is exactly what Stochastic Gradient Descent does.

Stochastic Gradient Descent can be defined as:

θ ← θ − α ∇θ J(θ; x(i), y(i))

Where (x(i), y(i)) is a randomly sampled training example.
Comparison of Gradient Descent variants
AdaGrad

It is an optimization method that adapts the learning rate by summing up the squared gradients up to the current iteration.

Therefore, the main formula is:

G_t = G_(t−1) + g_t²
θ_(t+1) = θ_t − α / √(G_t + ε) · g_t

Where θ is the set of parameters of the neural network, g_t is the gradient at iteration t, and ε is a small constant that prevents division by zero.
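A minimal sketch of a single AdaGrad update, with assumed default values for the learning rate and ε:

```python
import numpy as np

def adagrad_update(theta, grad, G, alpha=0.01, eps=1e-8):
    """One AdaGrad step: accumulate squared gradients in G and use them
    to scale the learning rate for each parameter individually."""
    G = G + grad ** 2                            # running sum of squared gradients
    theta = theta - alpha * grad / np.sqrt(G + eps)
    return theta, G
```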
RMSProp

Adadelta, which we covered before, and RMSProp were developed independently; RMSProp is essentially the same idea, but with a predefined decay rate for the running average of squared gradients:

E[g²]_t = 0.9 E[g²]_(t−1) + 0.1 g_t²
θ_(t+1) = θ_t − α / √(E[g²]_t + ε) · g_t

The learning rate still has to be chosen carefully; a value of 0.001 is considered a good default.
Adaptive Moment Estimation (Adam)

Besides storing a decaying average of past squared gradients, Adam also keeps a decaying average of past gradients, given by:

m_t = β1 · m_(t−1) + (1 − β1) · g_t
v_t = β2 · v_(t−1) + (1 − β2) · g_t²

Where m_t and v_t are the estimates of the first and second moments of the gradients (mean and uncentered variance). These are typically initialized as zero vectors. To counteract the bias towards zero, the following bias-corrected estimates are defined:

m̂_t = m_t / (1 − β1^t)
v̂_t = v_t / (1 − β2^t)

The resulting parameter update is θ_(t+1) = θ_t − α · m̂_t / (√v̂_t + ε).
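A minimal sketch of a single Adam update, using the commonly quoted defaults β1 = 0.9, β2 = 0.999 and ε = 1e-8 (assumed here, not taken from the slides):

```python
import numpy as np

def adam_update(theta, grad, m, v, t, alpha=0.001,
                beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step; `t` is the iteration counter starting at 1."""
    m = beta1 * m + (1 - beta1) * grad           # decaying average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2      # decaying average of squared gradients
    m_hat = m / (1 - beta1 ** t)                 # bias correction, first moment
    v_hat = v / (1 - beta2 ** t)                 # bias correction, second moment
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```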
Comparison of optimization methods
Miscellaneous characteristics: Dropout
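As a brief illustration, here is a minimal sketch of inverted dropout (keep_prob = 0.8 is just an example value): during training, activations are randomly zeroed and the survivors are rescaled so that no rescaling is needed at inference time.

```python
import numpy as np

def dropout_forward(a, keep_prob=0.8, training=True):
    """Inverted dropout: zero each activation with probability 1 - keep_prob
    during training and rescale the survivors by 1 / keep_prob."""
    if not training:
        return a                                   # dropout is disabled at inference
    mask = np.random.rand(*a.shape) < keep_prob    # Bernoulli keep mask
    return a * mask / keep_prob
```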
Miscellaneous characteristics: Batch normalization
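Similarly, a minimal sketch of the batch normalization forward pass; the running statistics needed at inference time are omitted to keep it short:

```python
import numpy as np

def batch_norm_forward(z, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch, then scale and shift with the
    learnable parameters gamma and beta."""
    mu = z.mean(axis=0)                    # per-feature batch mean
    var = z.var(axis=0)                    # per-feature batch variance
    z_hat = (z - mu) / np.sqrt(var + eps)
    return gamma * z_hat + beta
```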
