5_neural networks

The document discusses neural networks, particularly their application in spam detection, where they learn to classify emails as spam or non-spam. It explains the structure of neural networks, starting from logistic regression as a single neuron model to more complex networks with multiple layers. The document emphasizes the ability of neural networks to learn features automatically rather than relying on engineered inputs.


Neural Networks

CSIT375/975 AI for Cybersecurity


SCIT University of Wollongong

Disclaimer: The presentation materials come from various sources. For further information, check the references section.
Outline
• Towards Neural Networks
• Model Representation

Example Application: Spam Detection with NN
• Application: separate spam emails from a set of non-spam emails
• Given a large collection of example emails, each labeled "spam" or "ham"
• Learn to predict the labels of new, future emails
• How: use a neural network to distinguish between spam and ham emails
Towards Neural Networks

• Technically, logistic regression is a neural network with only 1 neuron: the neuron computes z = θᵀx and outputs the activation a = σ(z) = h_θ(x)

• Using the notation of the neural-network literature: θ = w = [w1, w2, w3]ᵀ (w0 is not part of this vector here), θ0 = w0 = b (denoted as the bias), and h_θ = ŷ

• With this notation, the neuron computes:

  z = wᵀx + b
  a = σ(z) = σ(wᵀx + b) = σ(w1·x1 + w2·x2 + w3·x3 + b)
           = 1 / (1 + e^−(w1·x1 + w2·x2 + w3·x3 + b))
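The single-neuron computation above can be sketched in a few lines of NumPy. This is a minimal illustration; the particular weights, bias, and input below are made-up values, not taken from the slides:

```python
import numpy as np

def sigmoid(z):
    """Logistic function: sigma(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(w, b, x):
    """One neuron: pre-activation z = w^T x + b, activation a = sigma(z)."""
    z = np.dot(w, x) + b
    return sigmoid(z)

# Made-up example: 3 inputs, 3 weights, 1 bias
w = np.array([0.5, -0.25, 0.1])
b = 0.2
x = np.array([1.0, 2.0, 3.0])
print(neuron(w, b, x))  # sigma(0.5 - 0.5 + 0.3 + 0.2) = sigma(0.5)
```

With w0 folded into the separate bias term b, this is exactly the logistic-regression hypothesis written in neural-network notation.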
Neural Networks

• We can construct a network of neurons (i.e., a neural network) with as many layers, and as many neurons in any layer, as needed

• Example: an input layer (or layer 0) with inputs x1, x2, x3 (plus the bias input x0), a hidden layer (layer 1) with 4 neurons computing z^[1]_j and a^[1]_j for j = 1, …, 4, and an output layer (layer 2) with 1 neuron computing z^[2]_1 and a^[2]_1 = ŷ

• The superscript in brackets, as in a^[1], indicates that this activation, a, is in layer 1

• By convention, this neural network is said to have 2 layers (and not 3), since the input layer is typically not counted!

• Also, the more layers we add, the deeper the neural network becomes, giving rise to the concept of deep learning!

• Interestingly, neural networks learn their own features! The output layer looks like logistic regression, but with features that were learnt (i.e., a^[1]_1, a^[1]_2, a^[1]_3, a^[1]_4) and NOT engineered by us (i.e., x1, x2, and x3)
Vectorizing Input and All Variables

• To help develop an efficient learning algorithm, let us vectorize (represent in vectors & matrices) the input and all the variables involved

• The input vector: x = [x1; x2; x3]

• The layer-1 weights, where w^[1]_ij is the weight from input i to neuron j of layer 1:

  w^[1] = [ w^[1]_11  w^[1]_12  w^[1]_13  w^[1]_14 ]
          [ w^[1]_21  w^[1]_22  w^[1]_23  w^[1]_24 ]
          [ w^[1]_31  w^[1]_32  w^[1]_33  w^[1]_34 ]

  Dimension of w^[1] = (3, 4); hence, dimension of w^[1]ᵀ = (4, 3)

• The layer-1 biases, pre-activations, and activations:

  b^[1] = [b^[1]_1; b^[1]_2; b^[1]_3; b^[1]_4]
  z^[1] = [z^[1]_1; z^[1]_2; z^[1]_3; z^[1]_4]
  a^[1] = [a^[1]_1; a^[1]_2; a^[1]_3; a^[1]_4]

• Layer 1 then computes:

  z^[1] = w^[1]ᵀ x + b^[1]
        = [ w^[1]_11 x1 + w^[1]_21 x2 + w^[1]_31 x3 + b^[1]_1 ]
          [ w^[1]_12 x1 + w^[1]_22 x2 + w^[1]_32 x3 + b^[1]_2 ]
          [ w^[1]_13 x1 + w^[1]_23 x2 + w^[1]_33 x3 + b^[1]_3 ]
          [ w^[1]_14 x1 + w^[1]_24 x2 + w^[1]_34 x3 + b^[1]_4 ]

  a^[1] = σ(z^[1]), applied element-wise: a^[1]_j = σ(z^[1]_j) for j = 1, …, 4
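As a quick sanity check on the layer-1 shapes, the matrix product above can be sketched in NumPy (placeholder zero values; only the dimensions matter here):

```python
import numpy as np

w1 = np.zeros((3, 4))   # dimension of w[1] = (3, 4)
b1 = np.zeros((4, 1))   # dimension of b[1] = (4, 1)
x = np.zeros((3, 1))    # input x = [x1; x2; x3]

z1 = w1.T @ x + b1      # (4, 3) @ (3, 1) + (4, 1)
print(z1.shape)         # (4, 1): one pre-activation per hidden neuron
```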
Vectorizing Input and All Variables

• The layer-2 (output) weights:

  w^[2] = [w^[2]_11; w^[2]_21; w^[2]_31; w^[2]_41]

  Dimension of w^[2] = (4, 1); hence, dimension of w^[2]ᵀ = (1, 4)

• The layer-2 bias, pre-activation, and activation (each of dimension (1, 1)):

  b^[2] = [b^[2]_1],  z^[2] = [z^[2]_1],  a^[2] = [a^[2]_1]

• Layer 2 then computes:

  z^[2] = w^[2]ᵀ a^[1] + b^[2]
        = w^[2]_11 a^[1]_1 + w^[2]_21 a^[1]_2 + w^[2]_31 a^[1]_3 + w^[2]_41 a^[1]_4 + b^[2]_1

  a^[2] = σ(z^[2]), i.e., a^[2]_1 = σ(z^[2]_1) = ŷ

• In summary, if we denote the input x as a^[0], the network computes:

  z^[1] = w^[1]ᵀ a^[0] + b^[1]
  a^[1] = σ(z^[1])
  z^[2] = w^[2]ᵀ a^[1] + b^[2]
  a^[2] = σ(z^[2])
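These four equations map directly onto NumPy operations. A minimal sketch, where the shapes follow the slides (w1 is 3×4, b1 is 4×1, w2 is 4×1, b2 is 1×1) but the random values are purely illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(w1, b1, w2, b2, x):
    """Forward pass for ONE example x of shape (3, 1)."""
    a0 = x                   # x = a[0]
    z1 = w1.T @ a0 + b1      # (4, 3) @ (3, 1) + (4, 1) -> (4, 1)
    a1 = sigmoid(z1)
    z2 = w2.T @ a1 + b2      # (1, 4) @ (4, 1) + (1, 1) -> (1, 1)
    a2 = sigmoid(z2)         # y-hat
    return a2

rng = np.random.default_rng(0)
w1, b1 = rng.normal(size=(3, 4)), rng.normal(size=(4, 1))
w2, b2 = rng.normal(size=(4, 1)), rng.normal(size=(1, 1))
x = rng.normal(size=(3, 1))

y_hat = forward(w1, b1, w2, b2, x)
print(y_hat.shape)  # (1, 1): a single prediction between 0 and 1
```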
Vectorizing Input and All Variables

• But, the equations above assume only 1 training example! How can we account for all the training examples?

• Assuming n examples, where the superscript (i) refers to the i-th example in the training dataset, we can loop:

  for i = 1 to n:
      z^[1](i) = w^[1]ᵀ a^[0](i) + b^[1]
      a^[1](i) = σ(z^[1](i))
      z^[2](i) = w^[2]ᵀ a^[1](i) + b^[2]
      a^[2](i) = σ(z^[2](i))

• But, loops in general slow down programs; hence, it is better to further vectorize the implementation in order to avoid any loop, whenever possible
Vectorizing Input and All Variables

• To this end, we can simply stack all the x (or a^[0]) vectors, z vectors, and a vectors of every layer into matrices, one column per training example:

  X = A^[0] = [ x^(1)  x^(2)  …  x^(n) ]          (column i is the i-th training example; dimension (3, n))
  Z^[1] = [ z^[1](1)  z^[1](2)  …  z^[1](n) ]     (dimension (4, n)), and similarly A^[1]
  Z^[2] = [ z^[2](1)  z^[2](2)  …  z^[2](n) ]     (dimension (1, n)), and similarly A^[2]

• Before vectorization:

  for i = 1 to n:
      z^[1](i) = w^[1]ᵀ a^[0](i) + b^[1]
      a^[1](i) = σ(z^[1](i))
      z^[2](i) = w^[2]ᵀ a^[1](i) + b^[2]
      a^[2](i) = σ(z^[2](i))

• After vectorization (no explicit loop!):

  Z^[1] = w^[1]ᵀ A^[0] + b^[1]
  A^[1] = σ(Z^[1])
  Z^[2] = w^[2]ᵀ A^[1] + b^[2]
  A^[2] = σ(Z^[2])
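The "after vectorization" equations above translate one-to-one into matrix operations. The following sketch (illustrative shapes and random data, with n = 5 examples as columns) checks that the vectorized pass produces the same predictions as the per-example loop:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
n = 5                                   # number of training examples
w1, b1 = rng.normal(size=(3, 4)), rng.normal(size=(4, 1))
w2, b2 = rng.normal(size=(4, 1)), rng.normal(size=(1, 1))
A0 = rng.normal(size=(3, n))            # X = A[0]: one column per example

# After vectorization: no explicit loop over examples
Z1 = w1.T @ A0 + b1                     # (4, n); b1 broadcasts across columns
A1 = sigmoid(Z1)
Z2 = w2.T @ A1 + b2                     # (1, n)
A2 = sigmoid(Z2)

# Before vectorization: loop over the examples, column by column
looped = np.hstack([
    sigmoid(w2.T @ sigmoid(w1.T @ A0[:, [i]] + b1) + b2) for i in range(n)
])
print(np.allclose(A2, looped))  # True: both give the same predictions
```

Note that adding the column vectors b^[1] and b^[2] to the (4, n) and (1, n) matrices relies on broadcasting, which replicates the bias across all n columns.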
References

• Machine Learning and Security: Protecting Systems with Data and Algorithms, Clarence Chio and David Freeman
• Deep Learning, Mohammad Hammoud, CMU Qatar
