6COM1044 Deep Learning 1

Neural Networks and Deep Learning 1

Dr. Shabnam N. Kadir

University of Hertfordshire

March 16, 2022

References

https://www.deeplearningbook.org/
https://machinelearningmastery.com/inspirational-applications-deep-learning/
https://d2l.ai/

Introduction to TensorFlow

https://www.deeplearningbook.org/
We shall be using Python and TensorFlow for implementation
Jupyter notebooks
https://colab.research.google.com/notebooks/welcome.ipynb
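As a quick orientation, here is a minimal sketch (not part of the original slides) of the kind of TensorFlow code we will run in a Jupyter or Colab notebook; the tensor values are placeholders chosen only for illustration.

import tensorflow as tf

print(tf.__version__)               # confirm TensorFlow is available, e.g. inside Colab

# A toy computation: TensorFlow builds and evaluates tensor operations for us.
x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
w = tf.constant([[0.5], [-0.5]])
y = tf.matmul(x, w)                 # matrix multiplication, the core operation of a neural-network layer
print(y.numpy())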

Machine Learning Algorithms

An ML algorithm is an algorithm that is able to learn from data.


Mitchell (1997): "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."
ML enables us to tackle tasks that are too difficult to solve
with fixed programs written and designed by humans.

What is deep learning?

MIT notes by A. Amini 2019

Recall: Neurons and the Perceptron

Deep Learning

Deep learning is an ML technique that employs deep neural networks.
A deep neural network is a multi-layered neural network that contains two or more hidden layers.
The weights of this neural network need to be adjusted so that a loss/cost/error function is minimised.

Feed-forward Neural Network Architecture

No feedback connections: the outputs of the network are never fed back into it.
Important application: object recognition. Convolutional neural networks (CNNs) (next week's lecture) are a specialized type of FNN inspired by the visual system of the brain.
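For concreteness, a minimal Keras sketch of a feed-forward network with two hidden layers; the input size, layer widths, activations and optimiser below are illustrative assumptions, not part of the slides.

import tensorflow as tf
from tensorflow.keras import layers

# A feed-forward network (no feedback connections) with two hidden layers,
# i.e. "deep" in the sense used on the previous slide.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),            # e.g. a flattened 28x28 image (assumed input size)
    layers.Dense(128, activation="relu"),    # hidden layer 1
    layers.Dense(64, activation="relu"),     # hidden layer 2
    layers.Dense(10, activation="softmax"),  # output layer for a 10-class problem
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()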
Deep Neural Networks

Why now?

The algorithms used to train deep neural networks have been around for decades!

Why now?

Universal Approximation Theorem

Theorem (Hornik 1989; Cybenko 1989):
"A feedforward neural network with a single hidden layer is sufficient to approximate, to arbitrary precision, any continuous function." (Hornik, K., et al., "Multilayer feedforward networks are universal approximators", 1989)

Universal Approximation Theorem:

The universal approximation theorem means that regardless of what function we are trying to learn, we know that a large MLP will be able to represent this function.
We are not guaranteed, however, that the training algorithm will be able to learn that function. Even if the MLP is able to represent the function, learning can fail for two different reasons.
First, the optimization algorithm used for training may not be able to find the value of the parameters that corresponds to the desired function.
Second, the training algorithm might choose the wrong function as a result of overfitting.
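As an illustration only (a sketch, not from the slides): a network with a single hidden layer can be trained to approximate a simple continuous function such as sin(x). Whether training actually reaches a good approximation depends on the optimiser and the data, exactly as the caveats above warn; the layer width, epochs and function here are arbitrary choices.

import numpy as np
import tensorflow as tf

# Toy demonstration: approximate sin(x) on [-pi, pi] with one hidden layer.
x = np.linspace(-np.pi, np.pi, 1000).reshape(-1, 1).astype("float32")
y = np.sin(x)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(50, activation="tanh"),   # single hidden layer
    tf.keras.layers.Dense(1),                       # linear output
])
model.compile(optimizer="adam", loss="mse")
model.fit(x, y, epochs=200, verbose=0)              # training may or may not reach the desired accuracy
print(model.evaluate(x, y, verbose=0))               # final mean squared error on the training points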

The unreasonable effectiveness of deep learning

Quality of a neural network

Expressibility: What class of functions can the neural network express?
Efficiency: How many resources (neurons, parameters, etc.) does the neural network require to approximate a given function?
Learnability: How rapidly can the neural network learn good parameters for approximating a function?

The unreasonable effectiveness of deep learning

To express the same function, deeper neural networks often require exponentially fewer neurons than shallow networks.

The unreasonable effectiveness of deep learning

Activation Functions

We cannot use a step function as the activation function, as we did for the single perceptron.
The composite function produced by the interconnected perceptrons with a discontinuous activation function will also be discontinuous, as will the loss function. A differentiable activation function makes the function computed by a neural network differentiable.
Linear: g(h) = h

Why we need non-linear activation functions

Choice of activation function

The original theorems were first stated in terms of units with activation functions that saturate for both very negative and very positive arguments, e.g. the sigmoid.
Universal approximation theorems have also been proved for a wider class of activation functions, which includes the rectified linear unit (ReLU) (Leshno et al. 1993).

Leshno et al. 1993

Activation Functions

Prior to the introduction of rectified linear units, most neural networks used the logistic sigmoid activation function or the hyperbolic tangent:
Sigmoid: g(h) = \frac{1}{1 + e^{-h}}
Tanh: g(h) = \frac{e^{h} - e^{-h}}{e^{h} + e^{-h}}
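A small NumPy sketch of these activation functions (illustrative only; the test values are arbitrary):

import numpy as np

def linear(h):
    return h                              # identity / linear activation

def sigmoid(h):
    return 1.0 / (1.0 + np.exp(-h))       # logistic sigmoid, saturates at 0 and 1

def tanh(h):
    return np.tanh(h)                     # equals (e^h - e^-h) / (e^h + e^-h)

def relu(h):
    return np.maximum(0.0, h)             # rectified linear unit

h = np.array([-2.0, 0.0, 2.0])
print(sigmoid(h), tanh(h), relu(h))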

Choice of Activation Functions

Softmax

Softmax: y_i = \frac{e^{h_i}}{\sum_j e^{h_j}}
Often used in the final layer of a neural network-based classifier.
https://towardsdatascience.com/cross-entropy-loss-function-f38c4ec8643e
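A NumPy sketch of the softmax; subtracting the maximum before exponentiating is a common numerical-stability trick (an implementation detail, not stated on the slide):

import numpy as np

def softmax(h):
    # Subtract the max so that exp() cannot overflow; mathematically the result is unchanged.
    e = np.exp(h - np.max(h))
    return e / np.sum(e)

print(softmax(np.array([2.0, 1.0, 0.1])))   # outputs sum to 1 and can be read as class probabilities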

Example of a Neural network

Composition of functions in Deep Neural Network

Loss/Cost/Error functions
Target output t_i, actual output from network y_i. There are p training examples.
The sum of squared error (sometimes called J or L):
E = \frac{1}{2} \sum_{i=1}^{p} (t_i - y_i)^2
Cross entropy:
E = -\sum_{i=1}^{p} \left( t_i \log(y_i) + (1 - t_i) \log(1 - y_i) \right)
(The cross entropy of two discrete probability distributions p and q is:
H(p, q) = -\sum_{x \in X} p(x) \log q(x).)
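The same quantities written as a NumPy sketch; the targets t and outputs y below are made-up example values:

import numpy as np

t = np.array([1.0, 0.0, 1.0])     # target outputs t_i (example values)
y = np.array([0.9, 0.2, 0.7])     # network outputs y_i (example values)

sse = 0.5 * np.sum((t - y) ** 2)                           # sum of squared error
ce = -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))      # (binary) cross entropy
print(sse, ce)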

Example: Cross-entropy loss function

Picture from https://towardsdatascience.com/cross-entropy-loss-function-f38c4ec8643e

Backpropagation

The backpropagation algorithm looks for the minimum of the loss function in weight space using the method of gradient descent.
The combination of weights that minimizes the loss function is considered to be a solution of the learning problem.
Since this method requires computation of the gradient of the loss function at each iteration step, the loss function needs to be continuous and differentiable.

Backpropagation

The algorithm can be decomposed into the following steps after the initialization of weights (e.g. randomly):
1. Feed-forward computation
2. Backpropagation to the output layer
3. Backpropagation to each hidden layer
4. Weight updates
The algorithm is stopped when the value of the error function has become sufficiently small.
As always, at the very least, divide your data into a test set and a training set (additionally a validation set).
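A compact NumPy sketch of these four steps for a network with one hidden layer; the data, layer sizes and learning rate are illustrative assumptions, and a squared-error loss with sigmoid units is used for simplicity.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                    # toy inputs
t = (X[:, :1] * X[:, 1:] > 0).astype(float)      # toy binary targets

# Initialise weights to small random values, biases to zero.
W1 = rng.normal(scale=0.1, size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(scale=0.1, size=(8, 1)); b2 = np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
eta = 0.5                                        # learning rate (illustrative)

for epoch in range(2000):
    # 1. Feed-forward computation
    a1 = sigmoid(X @ W1 + b1)
    y = sigmoid(a1 @ W2 + b2)
    # 2. Backpropagation to the output layer (squared-error loss, sigmoid output)
    delta2 = (y - t) * y * (1 - y)
    # 3. Backpropagation to the hidden layer
    delta1 = (delta2 @ W2.T) * a1 * (1 - a1)
    # 4. Weight updates (gradient descent)
    W2 -= eta * a1.T @ delta2 / len(X); b2 -= eta * delta2.mean(axis=0)
    W1 -= eta * X.T @ delta1 / len(X); b1 -= eta * delta1.mean(axis=0)

print(((y > 0.5) == t).mean())                   # training accuracy of the toy network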

Chain Rule

If g is differentiable at x and f is differentiable at g(x), then the derivative of the composition can be found using the Chain Rule:
\frac{d}{dx}\,[(f \circ g)(x)] \equiv \frac{d}{dx}\,[f(g(x))] = f'(g(x))\, g'(x).
If y = f(u) and u = g(x), then y = f(g(x)), and the chain rule can be written:
\frac{dy}{dx} = \frac{dy}{du} \frac{du}{dx}
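Automatic differentiation in TensorFlow applies exactly this rule; a small sketch (the composed function chosen here is arbitrary):

import tensorflow as tf

x = tf.Variable(2.0)
with tf.GradientTape() as tape:
    u = tf.square(x)     # u = g(x) = x^2
    y = tf.sin(u)        # y = f(u) = sin(u)

# dy/dx = dy/du * du/dx = cos(x^2) * 2x, computed for us by the tape.
print(tape.gradient(y, x).numpy())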

Backpropagation: Chain Rule

Input sum of neuron k in layer l:
z_k^l = \sum_j w_{kj}^l a_j^{l-1} + b_k^l
Activation function:
a_k^l = f(z_k^l),
where f could be \sigma.
z_m^{l+1} = \sum_k w_{mk}^{l+1} a_k^l + b_m^{l+1}
Now apply the Chain Rule, a lot, to compute
\frac{\partial E}{\partial w_{kj}^l}

Generalised delta rule
Let f be a transfer function, i.e. O_i^p = f(u_i), where u_i = \sum_j w_{ij} x_j. Then
\Delta w_{ij}(t) = -\eta \frac{\partial E}{\partial w_{ij}}
= -\eta \frac{\partial}{\partial w_{ij}} \left[ \frac{1}{2} \sum_{i,p} \Big( y_i^p - f\Big(\sum_j w_{ij} x_j\Big) \Big)^2 \right]
= \eta \sum_p (y_i^p - O_i^p)\, f'(u_i)\, x_j,
where f'(u_i) = \frac{df}{du_i} is the derivative of f(u_i) with respect to u_i. If we update weights example by example,
\Delta w_{ij}(t) = \eta (y_i^p - O_i^p)\, f'(u_i)\, x_j = \eta \delta_i^p x_j,
where \delta_i^p = (y_i^p - O_i^p) f'(u_i). This is known as the generalised delta rule. The need for f' is a key reason why we need continuous transfer functions.
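A per-example NumPy sketch of this update for a single sigmoid unit; the input values, target and learning rate are illustrative assumptions:

import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

x = np.array([0.5, -1.0, 2.0])    # one training example (made-up values)
y_target = 1.0                    # target output y_i^p
w = np.zeros(3)                   # weights w_ij
eta = 0.1                         # learning rate

u = w @ x                         # u_i = sum_j w_ij x_j
o = sigmoid(u)                    # O_i^p = f(u_i)
f_prime = o * (1 - o)             # derivative of the sigmoid at u
delta = (y_target - o) * f_prime  # delta_i^p
w += eta * delta * x              # generalised delta rule: w_ij += eta * delta_i^p * x_j
print(w)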
Gradient descent

From https://www.deeplearningbook.org/

Gradient descent

From https://www.deeplearningbook.org/

Gradient descent

The non-linearity of activation functions causes most interesting loss functions to become non-convex.

Gradient descent

From https://www.deeplearningbook.org/

Adaptive Learning Rules

Learning rates are no longer fixed.
They can be adjusted according to factors such as:
1. The size of the gradient
2. The size of particular weights
3. How fast learning is occurring
4. etc.
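In Keras this is usually handled by choosing an optimiser with an adaptive rule, such as Adam; a minimal sketch (the tiny model and learning rate here are placeholders):

import tensorflow as tf

model = tf.keras.Sequential([tf.keras.Input(shape=(4,)), tf.keras.layers.Dense(1)])

# Plain SGD uses a fixed learning rate; Adam adapts the effective step size per weight
# using running estimates of the gradient's first and second moments.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3), loss="mse")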

Batches, Epochs

Batch size is a hyperparameter which defines the number of data samples to work through before updating weights.
Batch Gradient Descent: Batch size = Size of training set
Stochastic Gradient Descent (SGD): Batch size = 1
Mini-batch Gradient Descent: 1 < Batch size < Size of training set
An epoch has passed when all data samples in the training set have been fed to the neural network during the training process.
https://machinelearningmastery.com/difference-between-a-batch-and-an-epoch/
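In Keras, the batch size and the number of epochs are passed to fit(); a sketch with made-up data and an assumed model shape:

import numpy as np
import tensorflow as tf

X = np.random.rand(1000, 20).astype("float32")   # 1000 training samples (made up)
y = np.random.randint(0, 2, size=(1000, 1))      # binary labels (made up)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="sgd", loss="binary_crossentropy")

# batch_size=32  -> mini-batch gradient descent
# batch_size=1   -> stochastic gradient descent
# batch_size=1000 (the whole training set) -> batch gradient descent
# One epoch = one full pass over all 1000 samples.
model.fit(X, y, batch_size=32, epochs=5, verbose=0)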

SGD and non-convexity

Convex optimization algorithms with global convergence guarantees are used to train logistic regression or SVMs.
Stochastic gradient descent (SGD) applied to non-convex loss functions has no such convergence guarantee and is sensitive to the values of the initial parameters.
SGD is only guaranteed to converge to a local minimum.
This may not be as bad as it sounds (overfitting).

Goodfellow 2017

SGD initialization of weights

For feedforward neural networks, it is important to initialize all weights to small random values.
The biases may be initialized to zero or to small positive values.
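In Keras this corresponds to the layer initialisers; a sketch in which the layer width and the scale 0.01 are illustrative choices:

import tensorflow as tf

layer = tf.keras.layers.Dense(
    64,
    kernel_initializer=tf.keras.initializers.RandomNormal(stddev=0.01),  # small random weights
    bias_initializer="zeros",                                            # biases start at zero
)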

SGD sequential vs batch modes

The sequential mode of training is also known as on-line, pattern, or stochastic mode. In this mode, weights are updated after the presentation of each example.
The term "stochastic" comes from the fact that the gradient based on a single training sample is a "stochastic approximation" of the "true" cost gradient.
In batch mode, weights are updated only after the complete presentation of all examples in the training set, i.e., only after each sweep or epoch. Here your error function is typically the total sum of the errors obtained for each example in the dataset. Impractical for a very large dataset.

Regularization

Regularization, in the context of machine learning, refers to the process of modifying a learning algorithm so as to prevent overfitting.

Dropout regularization
Randomly drop units (along with their connections) from the neural network during training.
Dropout is a training strategy which ignores a proportion (e.g. half) of the hidden neurons, randomly, when training the weights (not updating their weights) and setting their activation to zero.
A different set of neurons is dropped on each iteration.
(Figure from Srivastava et al. 2014)
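In Keras, dropout is a layer inserted between the Dense layers; the rate 0.5 below mirrors the "e.g. half" on the slide, while the rest of the architecture is an illustrative sketch:

import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),     # randomly zero ~half of these activations, training time only
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),     # a different random subset is dropped on every training step
    layers.Dense(10, activation="softmax"),
])
# Dropout is automatically disabled at inference time; all units are used for prediction.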
Benefits of Dropout regularization

Forces networks not to rely on any one node (discourages memorization).
Robustness: randomly ignoring nodes prevents excessive inter-dependencies from emerging between nodes (i.e. nodes do not learn functions which rely on specific input values from another node); this allows the network to learn a more robust relationship.
Akin to a brain losing a few neurons but still being able to do a task.
Implementing dropout has much the same effect as taking the average from a committee of networks; however, the cost is significantly less in both time and storage required.

Regularization: Early stopping

MIT notes by A. Amini 2019
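In Keras, early stopping is available as a callback that monitors the validation loss; a sketch in which the patience value is an assumption and model, X_train and y_train are hypothetical placeholders:

import tensorflow as tf

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",          # watch the validation loss
    patience=5,                  # stop if it has not improved for 5 consecutive epochs
    restore_best_weights=True,   # roll back to the best weights seen so far
)

# Usage (model, X_train and y_train are placeholders, not defined here):
# model.fit(X_train, y_train, validation_split=0.2, epochs=100, callbacks=[early_stop])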
