Deep Learning - Part-1

Machine Learning

ABDELA AHMED, PhD


© University of Gondar, 2022
[email protected]
Contents…

3. Deep Learning Algorithms


 ANN
 CNN
 DBM
 DBN
 Autoencoder
 LSTM

2
Outline

ANN
 Overview of DL
 Biological Neurons
 Neural Network Layer
 Gradient Descent
 Training NN

3
What is deep learning?
Overview of DL
 Deep Learning is a growing trend in general data analysis and has been
termed one of the 10 breakthrough technologies

 Deep learning has had a long and rich history, but has gone by many names
reflecting different philosophical viewpoints, and has waxed and waned in
popularity.

 Broadly speaking, there have been three waves of development of deep learning:
i. Deep learning known as cybernetics in the 1940s–1960s.
 development of theories of biological learning and
implementations of the first models such as the perceptron
allowing the training of a single neuron.
ii. Deep learning known as connectionism in the 1980s–1990s
 The central idea in connectionism is that a large number of simple
computational units can achieve intelligent behavior when
networked together
iii. The current resurgence under the name deep learning beginning in
2006
5
Overview of DL

 Deep learning has become more useful as the amount of available training data has increased.

 Deep learning models have grown in size over time as computer infrastructure (both hardware and software) for deep learning has improved.

 Deep learning has solved increasingly complicated applications with increasing accuracy over time.

6
Overview of DL

What exactly is Deep Learning?

 Deep Learning is a neural network with several layers of nodes between input and output.

 Deep-learning methods are representation-learning methods with multiple levels of representation, obtained by composing simple but non-linear modules that each transform the representation at one level (starting with the raw input) into a representation at a higher, slightly more abstract level.

 Representation learning is a set of methods that allows a machine to be fed with raw data and to automatically discover the representations needed for detection or classification.
Overview of DL

 Let’s be inspired by nature but not too much !!


 For airplanes, we developed aerodynamics and compressible fluid
dynamics.
 We figured that feathers and wing flapping weren't crucial
 What is the equivalent of aerodynamics for understanding
intelligence?
 Computational models of biological learning, i.e. models of how
learning happens or could happen in the brain
 Artificial Neural networks (ANNs) are algorithms that try to
mimic the information fusion of the brain.
 Most importantly, it is now the basis for most of the DL
algorithms
 As a result, one of the names that deep learning has gone by is
artificial neural networks (ANNs)

8
Neurons in the Brain
 The human brain is composed of about 10 billion neurons, which are interconnected with each other.
 Neurons communicate by sending electrical impulses to one another.
 A neuron receives inputs from other neurons, carries out some computation, and sends its output to other neurons via electrical impulses.

 A biological neuron consists of three main components:
 Dendrites: the input-signal channels, where the strength of the connections to the nucleus is affected by weights.
 Cell Body: where the computation on input signals and weights generates output signals, which are delivered to other neurons.
 Axon: transmits output signals to the other neurons that are connected to it.
Biological vs Artificial Neurons

 The ANN uses a very simplified mathematical model of what a biological neuron does.
 ANNs are comprised of several interconnected computational units (neurons) arranged in layers.
 The basic operating unit in a neural network is a neuron-like node: it takes input from other nodes and sends output to others.
 Each neuron is a computational unit that takes inputs x1, x2, x3, …, xn and outputs y = f(z), where f is the activation function (e.g. binary threshold, Sigmoid, Softmax, ReLU, and others).
 Each connection link is associated with a weight that determines the strength of the interconnection.

10
Model of an artificial Neuron: Perceptron
z = w1·x1 + w2·x2 + w3·x3 + b

 The neuron receives the weighted sum as input and calculates the output as a function of that input.
 Example: compute the output of the following perceptron, which uses a sigmoid activation function, a bias value of 0.5, inputs (2, 3, -1), and weights (0.9, 0.2, 0.3):

  weighted sum = 0.9×2 + 0.2×3 + 0.3×(-1) = 2.1
  z = 2.1 + 0.5 = 2.6
  y = σ(z) = 1 / (1 + e^(-2.6)) ≈ 0.93
11
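As a quick check, here is a minimal Python (NumPy) sketch of this computation, using the example's inputs, weights, and bias (the function name is illustrative):

```python
import numpy as np

def perceptron(x, w, b):
    """Weighted sum of the inputs followed by a sigmoid activation."""
    z = np.dot(w, x) + b              # z = w1*x1 + w2*x2 + w3*x3 + b
    return 1.0 / (1.0 + np.exp(-z))   # sigmoid activation

x = np.array([2.0, 3.0, -1.0])        # inputs from the example
w = np.array([0.9, 0.2, 0.3])         # weights
b = 0.5                               # bias

print(perceptron(x, w, b))            # ~0.93, matching the slide
```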
Logistic Regression vs Perceptron

 What is the hypothesis function of linear regression?
 hθ(x) = θᵀx, where θ = [θ0, θ1, …, θm], x = [x0, x1, …, xm], and x0 = 1 to account for the intercept term.

 Viewed as a single unit with inputs x0 = 1, x1, x2, x3:
  hθ = θᵀx = θ0·x0 + θ1·x1 + θ2·x2 + θ3·x3
     = θ0 + θ1·x1 + θ2·x2 + θ3·x3
12
Logistic Regression vs Perceptron

 What is the hypothesis function of logistic regression?
 hθ(x) = σ(θᵀx), where θ = [θ0, θ1, …, θm], x = [x0, x1, …, xm], x0 = 1 to account for the intercept term, and σ(z) = 1 / (1 + e^(-z)).

 Viewed as a single unit with inputs x0 = 1, x1, x2, x3, which first computes z = θᵀx and then a = σ(z):
  hθ = a = σ(z) = σ(θᵀx) = σ(θ0·x0 + θ1·x1 + θ2·x2 + θ3·x3)
     = σ(θ0 + θ1·x1 + θ2·x2 + θ3·x3)
     = 1 / (1 + e^(-(θ0 + θ1·x1 + θ2·x2 + θ3·x3)))
13
Logistic Regression vs Perceptron

 Technically, logistic regression is a neural network with only 1 neuron.

[Diagram: the logistic regression unit redrawn as a single neuron with inputs x0, …, x3 and weights θ0, …, θ3, computing z = θᵀx and a = σ(z) = hθ.]
14
Logistic Regression vs Perceptron

 Technically, logistic regression is a neural network with only 1 neuron.

 Using the notation of the neural-network literature: x0 = 1 is dropped and its weight is denoted as the bias, θ0 = w0 = b; the remaining weights are θ = w = [w1, w2, w3] (w0 is not part of this vector here); and the output is hθ = ŷ.

15
Logistic Regression vs Perceptron

 Technically, logistic regression is a neural network with only 1 neuron.

 With inputs x1, x2, x3, weights w1, w2, w3, and bias b, the neuron computes z = wᵀx + b and a = σ(z):
  ŷ = a = σ(z) = σ(wᵀx + b)
    = σ(w1·x1 + w2·x2 + w3·x3 + b)
    = 1 / (1 + e^(-(w1·x1 + w2·x2 + w3·x3 + b)))

16
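A minimal NumPy sketch of this single-neuron view of logistic regression; the input values and weights below are illustrative placeholders, not taken from the slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    """Single neuron / logistic regression: y_hat = a = sigmoid(w.T x + b)."""
    z = np.dot(w, x) + b
    return sigmoid(z)

x = np.array([0.5, -1.2, 3.0])   # illustrative inputs
w = np.array([0.4, 0.1, -0.2])   # illustrative weights
b = 0.1                          # bias
print(neuron(x, w, b))
```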
Neural Network Architectures

 A neural network is highly structured and comes in layers: the first layer is the input layer, the last layer is the output layer, and all layers in between are referred to as hidden layers.
 The input layer accepts the inputs and forwards them for further processing throughout the network.
 The input units are equivalent to the feature vector considered for the classification task.
 The output layer produces the prediction for the input instance.
17
Neural Network Architectures
 Left: A 2-layer Neural Network
(one hidden layer of 7 neurons (or
units) and one output layer with 4
neurons), and five inputs.

 Right: A 5-layer neural network


with five inputs, four hidden
layers of 7 neurons each and one
output layer.

 Why not just use a single neuron? Why do we need a larger network?
 A single neuron (like logistic regression) only permits a linear
decision boundary
 Most real world problems are considerably more complicated

18
Topologies of an ANN

[Diagram: example topologies — completely connected, feedforward (directed, acyclic), and recurrent (feedback connections).]
 Feedforward versus recurrent networks
 Feedforward: no loops, input → hidden layers → output
 Recurrent: uses feedback (positive or negative); a network with feedback, where some of its inputs are connected to some of its outputs (discrete time).
 For regular neural networks, the most common layer type is the fully-connected
layer in which neurons between two adjacent layers are fully pairwise
connected, but neurons within a single layer share no connections
 The above feed forward neural network is an example of Neural Network
topologies that use a stack of fully-connected layers
Multi Layer Perceptron (MLP)

 An artificial neural network structure where the flow of


information processing is in only one direction is called Feed
Forward Neural Network.
 One of the most popular FFNN models is the multi-layer perceptron (MLP).
 The MLP architecture has been applied to various problems, including disease diagnosis, function approximation, pattern classification, fault identification, and manufacturing processes.
20
MLP- 11 neurons, 3 layers: Notations
 We can construct a neural network with as many layers, and
neurons in any layer, as needed
 Notations
 x = input, b = bias term
 w = weights
 z = net input
 f = activation function
 a = output to next layer

[Diagram: Input Layer, Layer 1, Layer 2, Layer 3]
21
MLP- 11 neurons, 3 layers : Notations
 We can construct a neural network with as many layers, and
neurons in any layer, as needed
 Notations
 x = input, b = bias term
 w = weights
 z = net input
 f = activation function
 a = output to next layer

22
MLP- 4 neurons, 2 layers : Notations
 We can construct a neural network with as many layers, and
neurons in any layer, as needed
 Notations
 x = input, b = bias term
 w = weights
 z = net input- sum of weighted inputs
 f = activation function
 a = output to next layer

23
MLP- 4 neurons, 2 layers : Notations
 We can construct a neural network with as many layers, and
neurons in any layer, as needed
 Notations
 x = input, b = bias term
 w = weights
 z = net input- sum of weighted inputs
 f = activation function
 a = activation - output to next layer

24
MLP Matrix representations - Example
 We can construct a neural network with as many layers, and
neurons in any layer, as needed

[Diagram: Input Layer (or Layer 0) and Layer 1]

25
MLP Matrix representations - Example

Inputs and weights:

  x1  x2  x3 |  w14   w15   w24   w25   w34   w35 |  w46   w56
   1   0   1 |  0.2  -0.3   0.4   0.1  -0.5   0.2 | -0.3  -0.2

 Biases added to the hidden neurons (4, 5) and the output neuron (6):

   b4    b5    b6
 -0.4   0.2   0.1
MLP Matrix representations - Example

Net Input and Output Calculation (sigmoid activation):

 Unit j | Net input zj                                | Output Oj
 -------|---------------------------------------------|----------------------------
   4    | 0.2 + 0 - 0.5 - 0.4 = -0.7                  | 1 / (1 + e^(0.7))  = 0.332
   5    | -0.3 + 0 + 0.2 + 0.2 = 0.1                  | 1 / (1 + e^(-0.1)) = 0.525
   6    | (-0.3)(0.332) + (-0.2)(0.525) + 0.1 = -0.105| 1 / (1 + e^(0.105)) = 0.475
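A short NumPy sketch that reproduces the table above with the sigmoid activation (variable names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 0.0, 1.0])               # x1, x2, x3

# Hidden units 4 and 5: each row holds [w1j, w2j, w3j]
W_hidden = np.array([[0.2, 0.4, -0.5],      # weights into unit 4
                     [-0.3, 0.1, 0.2]])     # weights into unit 5
b_hidden = np.array([-0.4, 0.2])            # b4, b5

z_hidden = W_hidden @ x + b_hidden          # [-0.7, 0.1]
o_hidden = sigmoid(z_hidden)                # [0.332, 0.525]

# Output unit 6
w_out = np.array([-0.3, -0.2])              # w46, w56
b_out = 0.1
z_out = w_out @ o_hidden + b_out            # ~ -0.105
o_out = sigmoid(z_out)                      # ~ 0.474 (0.475 in the table)

print(o_hidden, z_out, o_out)
```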
MLP : Matrix representation
 We can construct a network of neurons (i.e., a
neural network) with as many layers, and neurons
in any layer, as needed

[Diagram: inputs x0, x1, x2, x3 (Input Layer, or Layer 0) feed four neurons in Layer 1; neuron i computes z_i^[1] and its activation a_i^[1]. The superscript [1] indicates that the activation a is in layer 1.]

32
MLP Matrix representations
 We can construct a network of neurons (i.e., a neural network) with as many layers, and neurons in any layer, as needed

[Diagram: the same network extended with Layer 2 — a single neuron that takes the Layer-1 activations a_1^[1] … a_4^[1] as inputs and computes z_1^[2] and a_1^[2].]

37
MLP Matrix representations
 We can construct a network of neurons (i.e., a
neural network) with as many layers, and neurons
in any layer, as needed

[Diagram: inputs x0 … x3 → hidden layer with 4 neurons (Layer 1) → output layer (Layer 2) producing ŷ.]

By convention, this neural network is said to have 2 layers (and not 3), since the input layer is typically not counted!
ANN Layer
 We can construct a network of neurons (i.e., a
neural network) with as many layers, and neurons
in any layer, as needed

[Diagram: the same 2-layer network — input layer (Layer 0), hidden layer with 4 neurons (Layer 1), and output layer (Layer 2) producing ŷ.]

Also, the more layers we add, the deeper the neural network becomes, giving rise to the concept of deep learning!

39
MLP Matrix representations
 We can construct a network of neurons (i.e., a
neural network) with as many layers, and neurons
in any layer, as needed

[Diagram: the same 2-layer network.]

Interestingly, neural networks learn their own features!

40
MLP Matrix representations
 We can construct a network of neurons (i.e., a
neural network) with as many layers, and neurons
in any layer, as needed

[Diagram: the output neuron viewed on its own, taking the hidden activations as inputs.]

This looks like logistic regression, but with features that were learnt (i.e., a_1^[1], a_2^[1], a_3^[1], a_4^[1]) and NOT engineered by us (i.e., x1, x2, and x3).

41
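A minimal NumPy sketch of this 2-layer network in matrix form; the weights are random placeholders rather than trained values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

x = rng.normal(size=3)            # inputs x1..x3 (x0 = 1 is folded into the biases)

W1 = rng.normal(size=(4, 3))      # Layer 1: 4 hidden neurons
b1 = np.zeros(4)
W2 = rng.normal(size=(1, 4))      # Layer 2: 1 output neuron
b2 = np.zeros(1)

z1 = W1 @ x + b1                  # z^[1]
a1 = sigmoid(z1)                  # a^[1] — the learned features
z2 = W2 @ a1 + b2                 # z^[2]
y_hat = sigmoid(z2)               # a^[2] = ŷ

print(y_hat)
```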
Outline

ANN
 Overview of DL
 Biological Neurons
 Neural Network Layer
 Training ANN
 Backprop.

42
ANN
 A mathematical model composed of
a large number of simple, highly
interconnected processing elements.
 The building blocks of ANN are the
neurons

 A neuron consists of:

1. A set of links, describing the neuron inputs, with weights w1, w2, …, wm.

2. An adder function (linear combiner) for computing the weighted sum of the inputs (real numbers):

  z = wᵀx + b

3. An activation function, for limiting the amplitude of the neuron output:

  a = f(z)
43
Commonly Used Activation Functions
 Every activation function (or non-linearity) takes a single number and
performs a certain fixed mathematical operation on it. There are several
activation functions you may encounter in practice:
1. Sigmoid
 The sigmoid non-linearity has the mathematical form σ(z) = 1 / (1 + e^(-z))
and is shown in the image below on the left
 Sigmoid non-linearity squashes real numbers to range between [0,1]

2. Hyperbolic Tangent Function (tanh)
 The tanh non-linearity is shown in the image above on the right.
 It squashes a real-valued number to the range [-1, 1].
 tanh(z) = sinh(z) / cosh(z) = (e^(2z) − 1) / (e^(2z) + 1)
44
Commonly Used Activation Functions
3. Rectified Linear Unit (ReLU)
 The ReLU has become very popular in the last few years.
 It computes the linear weighted sum of the inputs, and the output is a non-linear function of the total input computed using the max operation:
 f(z) = max(0, z), i.e. f(z) = 0 for z < 0 and f(z) = z for z ≥ 0

4. Leaky ReLU
 Acts like ReLU, but allows negative outcomes:
 f(z) = αz for z < 0 and f(z) = z for z ≥ 0 (α is a small slope)

45
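The four activation functions above, sketched in NumPy (the value of alpha for Leaky ReLU is an assumed small constant):

```python
import numpy as np

def sigmoid(z):
    """Squashes real numbers into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    """Squashes real numbers into the range (-1, 1)."""
    return np.tanh(z)                 # = (e^(2z) - 1) / (e^(2z) + 1)

def relu(z):
    """f(z) = max(0, z)."""
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    """Like ReLU, but allows small negative outputs (alpha is the negative slope)."""
    return np.where(z < 0, alpha * z, z)

z = np.array([-2.0, -0.5, 0.0, 1.5])
print(sigmoid(z), tanh(z), relu(z), leaky_relu(z))
```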
Perceptron Limitations
 A single neuron (like logistic regression) only permits a linear
decision boundary.
Linearly Separable Problem Linearly Inseparable Problems

 For a problem that is not linearly separable:
 Would it help if we use more layers of neurons?
 What could be the learning rule for each neuron?
 Solution: more than one layer of perceptrons together with the backpropagation learning algorithm can learn any Boolean function.
 The capacity of the network increases with more hidden units and more hidden layers.
46
Multilayer Perceptron
 In Multilayer perceptron, there may be one or more hidden layer(s) which
are called hidden since they are not observed from the outside.

 Each layer may have a different number of nodes and a different activation function:
 Commonly, the same activation function is used within one layer.
 Typically,
 a ReLU/tanh activation function is used in the hidden units, and
 Sigmoid/Softmax or linear activation functions are used in the output units, depending on the problem.
47
Sizing Neural Networks…
 The two metrics that people commonly use to measure the size
of neural networks are the number of neurons, or more
commonly the number of parameters. Working with the two
example networks in the above picture:

 The above ANN has


 4 + 2 = 6 neurons (not counting the inputs),
 [3 x 4] + [4 x 2] = 20 weights and 4 + 2 = 6 biases, for a total
of 26 learnable parameters. 48
Sizing Neural Networks…

Similarly, the above network has


 4 + 4 + 1 = 9 neurons,
 [3 x 4] + [4 x 4] + [4 x 1] = 12 + 16 + 4 = 32 weights and
 4 + 4 + 1 = 9 biases,
 for a total of 41 learnable parameters.
 To give you some context, modern Convolutional Networks contain on the order of 100 million parameters and are usually made up of approximately 10-20 layers (hence deep learning).
49
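A small Python helper, as a sketch, that counts parameters the same way for any list of fully connected layer sizes; it reproduces the 26 and 41 above:

```python
def count_parameters(layer_sizes):
    """Weights and biases of a fully connected network.

    layer_sizes includes the input size, e.g. [3, 4, 2] means
    3 inputs, a hidden layer of 4 neurons, and an output layer of 2 neurons.
    """
    weights = sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
    biases = sum(layer_sizes[1:])
    return weights + biases

print(count_parameters([3, 4, 2]))      # 20 weights + 6 biases = 26
print(count_parameters([3, 4, 4, 1]))   # 32 weights + 9 biases = 41
```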
Training ANN
 How have we trained before (Logistic regression)?
1. Specify how to compute the output given the input x and parameters w and b (define the hypothesis function (model)):
  hθ(x) = g(θᵀx) = 1 / (1 + e^(-θᵀx))

2. Specify the loss and the cost:
  L(hθ(x), y) = -y log(hθ(x)) - (1 - y) log(1 - hθ(x))
  J(θ) = cost(hθ(x)) = (1/m) Σᵢ₌₁ᵐ L(hθ(xᵢ), yᵢ)

3. Train on data to minimize J(θ) using gradient descent:
  Start off with some guesses for θ0, …, θm
  Repeat until convergence {
    θj := θj - α ∂J(θ)/∂θj
  }
50
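A minimal sketch of these three steps for logistic regression with batch gradient descent, assuming NumPy and a small synthetic dataset (all names and values are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic data: X gets a leading column of ones for the intercept term.
rng = np.random.default_rng(0)
X = np.hstack([np.ones((100, 1)), rng.normal(size=(100, 2))])
y = (X[:, 1] + X[:, 2] > 0).astype(float)

theta = np.zeros(3)                  # initial guesses for theta_0..theta_m
alpha = 0.1                          # learning rate
m = len(y)

for _ in range(1000):                # repeat (approximately) until convergence
    h = np.clip(sigmoid(X @ theta), 1e-12, 1 - 1e-12)      # step 1: hypothesis
    J = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))   # step 2: cost
    grad = X.T @ (h - y) / m         # dJ/dtheta for the cross-entropy cost
    theta -= alpha * grad            # step 3: gradient descent update

print(theta, J)
```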
Training ANN: Intuition
 The process of learning the parameters (weights and biases) so
as to optimize its “performance” or minimize the cost function
 The process of training ANN
 Put in training inputs, get the output (Make Prediction)
 Compare the output to correct answers, and calculate the
loss function J, which measures our error
 Adjust the weight accordingly and repeat the process

A dataset:
 Fields           class
 1.4  2.7  1.9    0
 3.8  3.4  3.2    0
 6.4  2.8  1.7    1
 4.1  0.1  0.2    0
 etc …
51
Training ANN: Intuition
Training the neural network
Fields class
1.4 2.7 1.9 0
3.8 3.4 3.2 0
6.4 2.8 1.7 1
4.1 0.1 0.2 0
etc …

52
Training ANN: Intuition
Training data
Fields class
1.4 2.7 1.9 0
3.8 3.4 3.2 0
6.4 2.8 1.7 1
4.1 0.1 0.2 0 Step-1 : Initialise with random weights
etc …

53
Training ANN: Intuition
Training data
Fields class
1.4 2.7 1.9 0
3.8 3.4 3.2 0
6.4 2.8 1.7 1 Step-2 : Make prediction
4.1 0.1 0.2 0
etc …
Feed the inputs (1.4, 2.7, 1.9) into the network.

54
Training ANN: Intuition
Training data
Fields class
1.4 2.7 1.9 0
3.8 3.4 3.2 0
6.4 2.8 1.7 1 Step-2 : Make prediction
4.1 0.1 0.2 0
etc …
Inputs (1.4, 2.7, 1.9) → predicted output 0.8.

55
Training ANN: Intuition
Training data
Fields class
1.4 2.7 1.9 0
3.8 3.4 3.2 0
6.4 2.8 1.7 1
Step-3: Compare prediction Vs. target,
4.1 0.1 0.2 0
for the 1st training dataset
etc …

Inputs (1.4, 2.7, 1.9) → prediction 0.8, target 0:
 L1(hθ(x), y) = -y log(hθ(x)) - (1 - y) log(1 - hθ(x)) = 2.32


56
Training ANN: Intuition
Training data
Fields class
1.4 2.7 1.9 0
3.8 3.4 3.2 0
6.4 2.8 1.7 1
4.1 0.1 0.2 0 Compare prediction Vs. target
etc … for the 2nd training example

Inputs (6.4, 2.8, 1.7) → prediction 0.9, target 1:
 L2(hθ(x), y) = -y log(hθ(x)) - (1 - y) log(1 - hθ(x)) = 0.152


57
Training ANN: Intuition
Training data
Fields class
1.4 2.7 1.9 0
3.8 3.4 3.2 0
6.4 2.8 1.7 1
4.1 0.1 0.2 0 Compare prediction Vs. target
etc … for the 3rd training example

Inputs (6.4, 2.8, 1.7) → prediction 0.5, target 0:
 L3(hθ(x), y) = -y log(hθ(x)) - (1 - y) log(1 - hθ(x)) = 1


58
Training ANN: Intuition…
Training data
Fields class
1.4 2.7 1.9 0
3.8 3.4 3.2 0
6.4 2.8 1.7 1
4.1 0.1 0.2 0 Compare prediction Vs. target
etc … for the 4th training example

Inputs (6.4, 2.8, 1.7) → prediction 0.5, target 0:
 L4(hθ(x), y) = -y log(hθ(x)) - (1 - y) log(1 - hθ(x)) = 1


J(θ) = cost(hθ(x)) = (1/m) Σᵢ₌₁ᵐ L(hθ(xᵢ), yᵢ) = (1/4)(2.32 + 0.152 + 1 + 1) ≈ 1.12
59
Training ANN: Intuition…

Training data
Fields class
1.4 2.7 1.9 0
3.8 3.4 3.2 0
6.4 2.8 1.7 1 Step-4: Adjust weights and biases
4.1 0.1 0.2 0 based on the error (Backprop)
etc …
Inputs (1.4, 2.7, 1.9) → prediction 0.8, target 0; overall cost = 1.12

Repeat this thousands, maybe millions of times – each time


taking a random training instance, and making slight
weight adjustments
Algorithms for weight adjustment are designed to make changes that will reduce the error.
60
Training ANNs (Implementations)

 We need to first perform a forward pass


(forward propagation )
 calculate outputs given input pattern x.

 Then, we update weights with a backward


pass (Backward propagation)
 update weights by calculating delta

61
Forward propagation (aka “Inference”)
 Make prediction about that data and Calculate the cost function,
where the nodes of the output layer are probabilities that the
sample is of a certain class.

Step 1 (Making prediction): Specify how to compute the output


given input x and parameters w and b (define the model).
 Think of ANN as a function F: X --> Y involving many weights Wk
 Calculate the different values at each layer to ultimately get the
predicted output values
62
Forward propagation (aka “Inference”)
 Make prediction about that data and Calculate the cost function,
where the nodes of the output layer are probabilities that the
sample is of a certain class.

Step 2: Specify the loss and cost function


  J(Θ) = (1/m) Σᵢ₌₁ᵐ L(f_{w,b}(xᵢ), yᵢ)

 Squared loss (regression)
 Cross-entropy loss (classification)
63
Forward propagation (aka “Inference”)
 Make prediction about that data and Calculate the cost function,
where the nodes of the output layer are probabilities that the
sample is of a certain class.

Step 2: Specify the loss and cost function


  J(Θ) = (1/m) Σᵢ₌₁ᵐ L(f_{w,b}(xᵢ), yᵢ)
 Squared loss (regression)
 Cross-entropy loss (classification):
  L(F(x), y) = -y log(F(x)) - (1 - y) log(1 - F(x))
64


Backward propagation (aka “Backprop.”)

 For each training instance the backpropagation algorithm first makes a


prediction (forward pass), measures the error, then goes through each
layer in reverse to measure the error contribution from each connection
(reverse pass), and finally slightly tweaks the connection weights to
reduce the error (Gradient Descent step).
 The backpropagation algorithm is used to compute the partial derivatives, using calculus, to update each of the weights in the right direction.
 Compute the error derivatives in each hidden layer from the error derivatives in the layer above.
65
Backward propagation (aka “Backprop.”)

 Step 3: Minimize the cost function by adjusting the parameters:
  W_new = W_old - lr ∗ ∂J/∂W
  b_new = b_old - lr ∗ ∂J/∂b
 Start with an initial guess w0 and update the guess in each stage, moving along the search direction.
66
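A minimal sketch of one forward and backward pass for a single sigmoid neuron with cross-entropy loss (the setting used in these slides); it relies on the standard result that for this loss dL/dz = a - y. The input values and weights are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One training instance (illustrative values)
x = np.array([1.4, 2.7, 1.9])
y = 0.0

w = np.array([0.1, -0.2, 0.05])
b = 0.0
lr = 0.1

# Forward pass
z = w @ x + b
a = sigmoid(z)                 # prediction ŷ

# Backward pass: for cross-entropy loss with a sigmoid output,
# dL/dz = a - y, so dL/dw = (a - y) * x and dL/db = (a - y).
dz = a - y
dw = dz * x
db = dz

# Gradient descent step
w_new = w - lr * dw
b_new = b - lr * db
print(a, w_new, b_new)
```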
Backprop: Example

[A worked backpropagation example is shown graphically on the original slides 67-70.]
Convolutional Neural Network
ConvNet
CNN

71
Outline

CNN
 Motivation
 Building Blocks
 Convolution Settings
 State-of-the-art architectures
 Transfer Learning

 In 1995, Yann LeCun and Yoshua Bengio introduced the concept of


convolutional neural networks (CNNs or ConvNets).
 CNNs are a special kind of multi-layer neural networks, designed to recognize
visual patterns directly from pixel images with minimal preprocessing
 ConvNets have shown excellent performance in many computer vision and
machine learning problems.
72
Motivation – Image Data
 Demand Prediction ANN: an ANN that looks at a t-shirt product and tries to predict whether the product will be a top seller or not.
 You have collected data on different T-shirts, as well as which ones are top sellers.

73
Motivation – Image Data
 Recognizing Images Using an ANN: an ANN that looks at the image of a person and tries to predict the identity of that person (face recognition).
 You have collected images of different persons, as well as which ones are Yohannes.

[Example image: 1000 × 1000 pixels.]

74
Motivation – Image Data

1,000,000 values (features)


75
Motivation – Image Data

 A single fully connected layer would require:


1,000,000 × 1,000,000 = 1,000,000,000,000 weights
 For color images, it would typically contain
(1,000 × 1,000 × 3)² = 9,000,000,000,000 weights
 Fully connected image networks would require a vast number of
parameters.
 Variance would be too high---very high likelihood of overfitting
76
Motivation – Image Data
 Important structures in image data
 Topology of pixels (spatial locality):
 pixels have a natural topology; their spatial arrangement is meaningful.
 The set of pixels we will have to take into consideration to find a cat will be near one another in the image.
 However, the structure of the ANN treats all inputs interchangeably
- No relationship between the individual inputs
- Just an ordered set of variables
- We want to incorporate domain knowledge into the architecture
of a neural network
For example, we won’t have to consider some combination of pixels in the four
corners of the image, in order to see if they encode catness.

77
Motivation – Image Data
 Important structures in image data
 Topology of pixels (Spatial locality)---
 Translation invariance
 The pattern of pixels that characterizes a cat is the same no matter
where in the image the cat occurs

For example : Cats don’t look different if they’re


on the left or the right side of the image.

78
Motivation – Image Data
 Important structures in image data
 Topology of pixels (Spatial locality)---
 Translation invariance
 Scale invariance
 Issues of lighting and contrast
 Knowledge of human visual system
 Features need to be “built up”.
 Edges  Shapes  relation between shapes

79
Motivation – Image Data
 The motivation behind the CNN is that different layers can learn
certain intermediate features
 Features need to be “built up”.
 Edges  Shapes  relation between shapes
 Identifying Textures
 CAT = [Two eyes in certain relation to one another] + [cat fur texture]
=> Eyes = dark circle (pupil) inside another circle
=> Circle = particular combination of edge detectors
=> Fur = edges in certain pattern
 Addressing invariant problem
 Save computation time
 Significantly diminish the amount of training data

80
Outline

CNN
 Motivation
 Building Blocks
 Convolution Settings
 State-of-the-art architectures
 Transfer Learning

81
Typical CNN Architecture
 Typical CNN architectures look like

 [(CONV+ReLU)*N + POOL?]*M + (FC+ReLU)*K + SOFTMAX


where N is usually up to ~5, M is large, 0 <= K <= 2.
 However, recent advances such as ResNet/GoogLeNet challenge this
paradigm
 Layers used to build ConvNets:
 a stacked sequence of layers. 3 main types
 Convolutional Layer, Pooling Layer, and Fully-Connected Layer
82
Typical CNN Architecture

CONV: Convolutional kernel layer


RELU: Activation function
POOL: Dimension reduction layer
FC: Fully connected layer

 For the above architecture, what is the value of N, M, and K


 N = 2 [(CONV+ReLU)*N + POOL?]*M
 M = 3 + (FC+ReLU)*K + SOFTMAX
 K= ?
Convolutional Layer
 The most essential component of any CNN architecture
 Performs feature extraction using a stack of convolution operation
and activation function
 Convolve the filter with the image.
 “Slide” over the image spatially, computing dot products between the filter values and the image values at each step, and aggregating the outputs to produce a new image.
 This process of applying the filter to the image to create a new image is called “convolution.”

 Kernels = Filters = Feature


detectors = receptive field
 Feature maps
 Padding
 Stride = Step Size
Convolutional Layer : Kernel
 Kernels (Filters): a grid of weights overlaid on an image, centered on one pixel, with each weight multiplied by the pixel underneath it

 Used for traditional image


processing techniques :
 Blur, Sharpen, Edges, etc.
Convolutional Layer: convolution operation - Example

 Kernel size: 3 x 3
 Image size: 5 x 5
 Calculate the output: slide the kernel over the image and take the dot product at each position.

[The worked example is shown graphically on the original slides; the first output values computed are 51, 60, 20, 31, …, -2.]
Convolutional Layer:
Stride

Filter 1:           Filter 2:
  1 -1 -1            -1  1 -1
 -1  1 -1            -1  1 -1
 -1 -1  1            -1  1 -1

6 x 6 image:
 1 0 0 0 0 1
 0 1 0 0 1 0
 0 0 1 1 0 0
 1 0 0 0 1 0
 0 1 0 0 1 0
 0 0 1 0 1 0
Convolutional Layer:
Stride

With Filter 1 and stride = 1: take the dot product of the filter with each 3 x 3 patch of the 6 x 6 image. The first two positions give 3 and -1.
Convolutional Layer:
Stride
With Filter 1, if stride = 2, the filter jumps two pixels at a time: the first row of outputs is 3 and -3.
Convolutional Layer:
Stride
Filter 1, stride = 1, applied to the whole 6 x 6 image gives a 4 x 4 feature map:

  3 -1 -3 -1
 -3  1  0 -3
 -3 -3  0  1
  3 -2 -2 -1
Convolutional Layer:
Stride
Repeat this for each filter. Filter 2 with stride = 1 gives a second 4 x 4 feature map:

 -1 -1 -1 -1
 -1 -1 -2  1
 -1 -1 -2  1
 -1  0 -4  3

Two 4 x 4 feature maps, forming a 2 x 4 x 4 matrix.
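A small NumPy sketch of this sliding dot product; with stride 1 it reproduces the 4 x 4 feature map for Filter 1, and with stride 2 the smaller map:

```python
import numpy as np

def convolve2d(image, kernel, stride=1):
    """Slide the kernel over the image and take dot products (no padding)."""
    n, f = image.shape[0], kernel.shape[0]
    out = (n - f) // stride + 1
    result = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            patch = image[i*stride:i*stride+f, j*stride:j*stride+f]
            result[i, j] = np.sum(patch * kernel)
    return result

image = np.array([[1,0,0,0,0,1],
                  [0,1,0,0,1,0],
                  [0,0,1,1,0,0],
                  [1,0,0,0,1,0],
                  [0,1,0,0,1,0],
                  [0,0,1,0,1,0]])

filter1 = np.array([[ 1,-1,-1],
                    [-1, 1,-1],
                    [-1,-1, 1]])

print(convolve2d(image, filter1, stride=1))   # 4x4 feature map, top-left value = 3
print(convolve2d(image, filter1, stride=2))   # 2x2 feature map
```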
Convolutional Layer: Stride

 7x7 input (spatially), assume a 3x3 filter applied with stride 1.
 Determine the output image size.
 => 5x5 output
Convolutional Layer: Stride

 7x7 input (spatially), assume a 3x3 filter applied with stride 2.
 Determine the output image size.
 => 3x3 output!

 Stride
 The step size as the kernel moves across the image.
 When the stride is greater than 1, it scales down the output dimension.
Convolutional Layer: Stride

 7x7 input (spatially), assume a 3x3 filter applied with stride 3?
 Doesn't fit! Cannot apply a 3x3 filter to a 7x7 input with stride 3.

 Using kernels directly, there will be an edge effect:
 pixels near the edge will not be used as center pixels, since there are not enough surrounding pixels.
Convolutional Layer: Stride

Can you find the formula for the output image size, given the input image size (NxN), kernel size (FxF), and stride s?

Output size: (N - F) / s + 1

e.g. N = 7, F = 3:
 stride 1 => (7 - 3)/1 + 1 = 5
 stride 2 => (7 - 3)/2 + 1 = 3
 stride 3 => (7 - 3)/3 + 1 = 2.33 :\
Convolutional Layer:
Padding



Padding :
 Adding extra pixels around the frame, so pixels from the original image
become center pixels as the kernel moves across the image
 In practice: Common to zero pad the border
Convolutional Layer:
Padding

Input 5x5, 3x3 filter applied with stride 1: what is the output?
(Recall, without padding: (N - F) / stride + 1)
=> 3x3 output!
Convolutional Layer:
Padding

Input 5x5, 3x3 filter applied with stride 1, now zero padded with a 1-pixel border: what is the output?
=> 5x5 output!
Convolutional Layer:
Padding
e.g. input 7x7, 3x3 filter applied with stride 1, pad with a 1-pixel border => what is the output?
 7x7 output!

 In general, with padding p and stride s, the output image size is:
  (N - F + 2p) / s + 1

 In general, it is common to see CONV layers with stride 1, filters of size FxF, and zero-padding with (F-1)/2 (this will preserve the size spatially):
 e.g. F = 3 => zero pad with 1
    F = 5 => zero pad with 2
    F = 7 => zero pad with 3
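A small Python helper, as a sketch, implementing the output-size formula with stride and padding:

```python
def conv_output_size(n, f, stride=1, pad=0):
    """Output spatial size of a convolution: (N - F + 2p) / s + 1."""
    size = (n - f + 2 * pad) / stride + 1
    if not size.is_integer():
        raise ValueError(f"{f}x{f} filter with stride {stride} and pad {pad} "
                         f"does not fit an {n}x{n} input")
    return int(size)

print(conv_output_size(7, 3, stride=1))          # 5
print(conv_output_size(7, 3, stride=2))          # 3
print(conv_output_size(7, 3, stride=1, pad=1))   # 7 (padding preserves the size)
print(conv_output_size(32, 5, stride=1, pad=2))  # 32, as in the 32x32x3 example later
```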
Convolutional Layer:
convolution for 3D images

 In images we have multiple numbers associated with each pixel location.


 These numbers are referred to as channels. Example RGB image : 3 channel
 The number of channels is referred to as the depth.
 The kernel will have a depth of the same size as the number of input channels
 Example : a 3 x 3 kernel on an RGB image, there will be 3x3x3= 27 weights
 27 multiplications added together to get one centered pixel

 1 number: the result of taking a dot product between the filter and a small 5x5x3 chunk of the image
Convolutional Layer:
convolution for 3D images

[Diagram: for a color (3-channel) image, each 3x3 filter also has 3 channels; Filter 1 and Filter 2 are slid over all channels of the 6x6x3 input.]
Convolutional Layer:
convolution for 3D images

Example: input volume 32x32x3, 6 5x5 filters with stride 1, pad 2.
What will be the output volume size?
 (32 - 5 + 2*2)/1 + 1 = 32 spatially, so 32x32x6


Convolutional Layer: Summary

 We can think of kernels as local feature detectors

 Primary idea behind convolutional neural network:


 Let the neural network learn which kernels are most useful
 Use the same set of kernels across the entire image (translation invariance)
 Reduce the number of parameters
Common settings:
K = (powers of 2, e.g. 32, 64, 128, 512)
- F = 3, S = 1, P = 1
- F = 5, S = 1, P = 2
- F = 5, S = 2, P = ? (whatever fits)
- F = 1, S = 1, P = 0



Typical CNN Architecture
 Typical CNN architectures look like

 [(CONV+ReLU)*N + POOL?]*M + (FC+ReLU)*K + SOFTMAX


where N is usually up to ~5, M is large, 0 <= K <= 2.
117
Typical CNN Architecture

CONV: Convolutional kernel layer


RELU: Activation function
POOL: Dimension reduction layer
FC: Fully connected layer
[Pipeline: input image → Convolution → Max Pooling → … (can repeat many times) → Flattened → Fully Connected feedforward network → outputs (cat, dog, ……).]
Pooling Layer
 Reduce the image size by mapping a patch of a pixel to a single value
 Shrinks the dimensions of the image
 Doesn’t have parameters, though there are different type of pooling
operation
 Max-pool: for each distinct patch, represent it by the maximum
 Average-pool: for each distinct patch, represent it by the average

 Example: 2x2 maxpool and avgpool, reducing the image size from 4x4 to 2x2


Pooling Layer

The two 4 x 4 feature maps produced by Filter 1 and Filter 2:

 Filter 1:              Filter 2:
  3 -1 -3 -1            -1 -1 -1 -1
 -3  1  0 -3            -1 -1 -2  1
 -3 -3  0  1            -1 -1 -2  1
  3 -2 -2 -1            -1  0 -4  3
Pooling Layer

6 x 6 image → Conv → 4 x 4 feature maps → Max Pooling → a new image, but smaller (2 x 2 per filter):

 Filter 1 max-pooled:   Filter 2 max-pooled:
  3 0                    -1 1
  3 1                     0 3
Each filter is a channel: the result is a new image, smaller than the original, whose number of channels is the number of filters.

Convolution and Max Pooling can be repeated many times.


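A small NumPy sketch of 2 x 2 max pooling; applied to the Filter 1 feature map it reproduces the 2 x 2 result above:

```python
import numpy as np

def max_pool(feature_map, size=2):
    """Max pooling: keep the maximum of each non-overlapping size x size patch."""
    n = feature_map.shape[0] // size
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            patch = feature_map[i*size:(i+1)*size, j*size:(j+1)*size]
            out[i, j] = patch.max()
    return out

fmap = np.array([[ 3,-1,-3,-1],
                 [-3, 1, 0,-3],
                 [-3,-3, 0, 1],
                 [ 3,-2,-2,-1]])      # Filter 1 feature map from the example

print(max_pool(fmap))                 # [[3, 0], [3, 1]]
```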
Typical CNN Architecture
 Typical CNN architectures look like

 [(CONV+ReLU)*N + POOL?]*M + (FC+ReLU)*K + SOFTMAX


where N is usually up to ~5, M is large, 0 <= K <= 2.
124
The whole CNN
[Pipeline: Convolution → Max Pooling → a new image → Convolution → Max Pooling → a new image → Flattened → Fully Connected feedforward network → outputs (cat, dog, ……).]
Flattening

The 2 x 2 x 2 output of the last pooling stage (the maps [3 0; 3 1] and [-1 1; 0 3]) is flattened into a single vector, which is then fed into a fully connected feedforward network.
CNN in Keras

Only the network structure and the input format are modified (vector -> 3-D tensor).

 Input_shape = (28, 28, 1): 28 x 28 pixels; 1 channel for black/white, 3 for RGB.
 The first Convolution layer has 25 3x3 filters, followed by Max Pooling, then another Convolution and Max Pooling.
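A minimal Keras sketch consistent with this description (25 3x3 filters on a 28 x 28 x 1 input, each convolution followed by max pooling); the second convolution's filter count and the dense-layer sizes are illustrative assumptions, not taken from the slides:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    # 28 x 28 pixels, 1 channel (black/white)
    layers.Conv2D(25, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(50, (3, 3), activation="relu"),   # filter count assumed
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(100, activation="relu"),           # size assumed
    layers.Dense(10, activation="softmax"),         # e.g. 10 output classes
])
model.summary()
```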
Outline

CNN
 Motivation
 Building Blocks
 Convolution Settings
 State-of-the-art architectures
 Transfer Learning

128
LeNet-5
[LeCun et al., 1998]

Conv filters were 5x5, applied at stride 1
Subsampling (pooling) layers were 2x2, applied at stride 2
i.e. the architecture is [CONV-POOL-CONV-POOL-CONV-FC]
Tested on MNIST
AlexNet



AlexNet

Input: 227x227x3 images

First layer (CONV1): 96 11x11 filters applied at stride 4


=>
Q: what is the output volume size? Hint: (227-11)/4+1 = 55



AlexNet

Input: 227x227x3 images

First layer (CONV1): 96 11x11 filters applied at stride 4


=>
Output volume [55x55x96]

Q: What is the total number of parameters in this layer?



AlexNet

Input: 227x227x3 images

First layer (CONV1): 96 11x11 filters applied at stride 4


=>
Output volume [55x55x96]
Parameters: (11*11*3)*96 = 35K



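The same arithmetic as a small Python check; the bias count (one per filter) is shown separately, since the slide quotes weights only:

```python
# AlexNet CONV1: 96 filters of size 11x11x3 applied at stride 4 to a 227x227x3 input
out_size = (227 - 11) // 4 + 1      # 55  -> output volume 55x55x96
weights = 11 * 11 * 3 * 96          # 34,848 (~35K, as on the slide)
biases = 96                         # one bias per filter (not counted above)
print(out_size, weights, weights + biases)
```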
AlexNet

Input: 227x227x3 images After CONV1: 55x55x96

Second layer (POOL1): 3x3 filters applied at stride 2

Q: what is the output volume size? Hint: (55-3)/2+1 = 27



AlexNet

Input: 227x227x3 images After CONV1: 55x55x96

Second layer (POOL1): 3x3 filters applied at stride 2


Output volume: 27x27x96

Q: what is the number of parameters in this layer?



VGGNet

Only 3x3 CONV stride 1, pad 1


and 2x2 MAX POOL stride 2

best model

11.2% top 5 error in ILSVRC 2013



INPUT: [224x224x3] memory: 224*224*3=150K params: 0

CONV3-64: [224x224x64] memory: 224*224*64=3.2Mparams: (3*3*3)*64 = 1,728


CONV3-64: [224x224x64] memory: 224*224*64=3.2Mparams: (3*3*64)*64 = 36,864
POOL2: [112x112x64] memory: 112*112*64=800Kparams: 0
CONV3-128: [112x112x128] memory: 112*112*128=1.6Mparams: (3*3*64)*128 = 73,728
CONV3-128: [112x112x128] memory: 112*112*128=1.6Mparams: (3*3*128)*128 = 147,456
POOL2: [56x56x128] memory: 56*56*128=400Kparams: 0
CONV3-256: [56x56x256] memory: 56*56*256=800K params: (3*3*128)*256 = 294,912
CONV3-256: [56x56x256] memory: 56*56*256=800K params: (3*3*256)*256 = 589,824
CONV3-256: [56x56x256] memory: 56*56*256=800K params: (3*3*256)*256 = 589,824
POOL2: [28x28x256] memory: 28*28*256=200K params: 0
CONV3-512: [28x28x512] memory: 28*28*512=400K params: (3*3*256)*512 = 1,179,648
CONV3-512: [28x28x512] memory: 28*28*512=400K params: (3*3*512)*512 = 2,359,296
CONV3-512: [28x28x512] memory: 28*28*512=400K params: (3*3*512)*512 = 2,359,296
POOL2: [14x14x512] memory: 14*14*512=100K params: 0
CONV3-512: [14x14x512] memory: 14*14*512=100K params: (3*3*512)*512 = 2,359,296
CONV3-512: [14x14x512] memory: 14*14*512=100K params: (3*3*512)*512 = 2,359,296
CONV3-512: [14x14x512] memory: 14*14*512=100K params: (3*3*512)*512 = 2,359,296
POOL2: [7x7x512] memory: 7*7*512=25K params: 0
FC: [1x1x4096] memory: 4096 params: 7*7*512*4096 = 102,760,448
FC: [1x1x4096] memory: 4096 params: 4096*4096 = 16,777,216
FC: [1x1x1000] memory: 1000 params: 4096*1000 = 4,096,000

TOTAL memory: 24M * 4 bytes ~= 93MB / image (only forward! ~*2 for bwd)
TOTAL params: 138M parameters
GoogLeNet
[Szegedy et al., 2014]

Inception module
ILSVRC 2014 winner (6.7% top 5 error)

Fun features:
Only 5 million params! (Removes FC layers completely)
Compared to AlexNet:
27 Jan 2016Fei-Fei Li & Andrej to AlexNet:
Karpathy & Justin Johnson
- 12X less params
- 2x more compute
- 6.67% (vs. 15.4%)
ResNet [He et al., 2015]

224x224x3 input, spatial dimension only 56x56!



ResNet [He et al., 2015]



ResNet [He et al., 2015]

- Batch Normalization after every CONV layer


- Xavier/2 initialization from He et al.
- SGD + Momentum (0.9)
- Learning rate: 0.1, divided by 10 when validation error plateaus
- Mini-batch size 256
- Weight decay of 1e-5
- No dropout used



ResNet [He et al., 2015]

(this trick is also used in GoogLeNet)



Thank You

145
