5_neural networks

The document discusses neural networks, particularly their application in spam detection, where they learn to classify emails as spam or non-spam. It explains the structure of neural networks, starting from logistic regression as a single neuron model to more complex networks with multiple layers. The document emphasizes the ability of neural networks to learn features automatically rather than relying on engineered inputs.

Uploaded by

rhzx3519


Neural Networks

CSIT375/975 AI for Cybersecurity

SCIT University of Wollongong

Disclaimer: The presentation materials come from various sources. For further information, check the references section.
Outline
• Towards Neural Networks
• Model Representation

Example application
Spam Detection with NN
• Application: separate spam emails from a set of non-spam emails
• Given a large collection of example emails, each labeled “spam” or “ham”
• Learn to predict labels of new, future emails
• How: use neural networks to distinguish between spam and ham emails

Towards Neural Networks

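• Technically, logistic regression is a neural network with only 1 neuron

[Figure: a single neuron. Inputs $x_0, x_1, x_2, x_3$ are weighted by $\theta_0, \theta_1, \theta_2, \theta_3$; the neuron computes $z = \theta^T x$ and $a = \sigma(z)$, and outputs $h_\theta$.]
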
• Using the notation of the neural network literature: $\theta = \mathbf{w} = [w_1, w_2, w_3]^T$ ($w_0$ is not part of this vector here), $h_\theta = \hat{y}$, and $\theta_0 = w_0 = b$. The term $b$ is denoted as the bias, and its input $x_0$ is fixed at 1.

• With this notation, the single neuron computes:

$$z = \mathbf{w}^T\mathbf{x} + b$$

$$\hat{y} = a = \sigma(z) = \sigma(\mathbf{w}^T\mathbf{x} + b) = \sigma\left(\begin{bmatrix} w_1 & w_2 & w_3 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} + b\right) = \sigma(w_1 x_1 + w_2 x_2 + w_3 x_3 + b) = \frac{1}{1 + e^{-(w_1 x_1 + w_2 x_2 + w_3 x_3 + b)}}$$

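The single-neuron formula above can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the slides: the function names `sigmoid` and `neuron` and the numeric values are assumptions chosen so that the result is easy to check by hand.

```python
import numpy as np

def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    """Single neuron: z = w^T x + b, then y_hat = a = sigma(z)."""
    z = np.dot(w, x) + b
    return sigmoid(z)

# Illustrative values only (not from the slides).
x = np.array([1.0, 0.0, 2.0])    # features x1, x2, x3
w = np.array([0.5, -1.0, 0.25])  # weights w1, w2, w3
b = -1.0                         # bias

# z = 0.5*1 + (-1.0)*0 + 0.25*2 - 1 = 0, and sigma(0) = 0.5
y_hat = neuron(x, w, b)
print(y_hat)
```

In a spam setting, `y_hat` would be read as the predicted probability that the email is spam, exactly as in logistic regression.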
Neural Networks

• We can construct a network of neurons (i.e., a neural network) with as many layers, and neurons in any layer, as needed

[Figure: the input layer (layer 0) with $x_0, x_1, x_2, x_3$ feeds layer 1, which has four neurons; neuron $j$ computes $z_j^{[1]}$ and $a_j^{[1]}$ for $j = 1, \dots, 4$.]

• The superscript $[1]$ in $a_1^{[1]}$ indicates that this activation, $a$, is in layer 1.

[Figure: the same network extended with layer 2, a single neuron that computes $z_1^{[2]}$ and $a_1^{[2]}$.]

• By convention, this neural network is said to have 2 layers (and not 3), since the input layer is typically not counted. Layer 1 is the hidden layer with 4 neurons, and layer 2 is the output layer, which produces $\hat{y}$.

• Also, the more layers we add, the deeper the neural network becomes, giving rise to the concept of deep learning!

• Interestingly, neural networks learn their own features!

• Layer 2 on its own looks like logistic regression, but with features that were learnt (i.e., $a_1^{[1]}, a_2^{[1]}, a_3^{[1]}, a_4^{[1]}$) and NOT engineered by us (i.e., $x_1$, $x_2$, and $x_3$).

Vectorizing Input and All Variables

• To help develop an efficient learning algorithm, let us vectorize (represent in vectors and matrices) the input and the variables involved

• Input vector:

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$$

• Layer-1 weights, where $w_{ij}^{[1]}$ is the weight on the connection from input $x_i$ to neuron $j$ of layer 1 (the figure labels the four weights leaving $x_1$ as $w_{11}^{[1]}, w_{12}^{[1]}, w_{13}^{[1]}, w_{14}^{[1]}$):

$$\mathbf{w}^{[1]} = \begin{bmatrix} w_{11}^{[1]} & w_{12}^{[1]} & w_{13}^{[1]} & w_{14}^{[1]} \\ w_{21}^{[1]} & w_{22}^{[1]} & w_{23}^{[1]} & w_{24}^{[1]} \\ w_{31}^{[1]} & w_{32}^{[1]} & w_{33}^{[1]} & w_{34}^{[1]} \end{bmatrix}$$

• Dimension of $\mathbf{w}^{[1]}$ = (3, 4)

• Transposed layer-1 weights:

$$\mathbf{w}^{[1]T} = \begin{bmatrix} w_{11}^{[1]} & w_{21}^{[1]} & w_{31}^{[1]} \\ w_{12}^{[1]} & w_{22}^{[1]} & w_{32}^{[1]} \\ w_{13}^{[1]} & w_{23}^{[1]} & w_{33}^{[1]} \\ w_{14}^{[1]} & w_{24}^{[1]} & w_{34}^{[1]} \end{bmatrix}$$

• Dimension of $\mathbf{w}^{[1]T}$ = (4, 3)

• Layer-1 biases, pre-activations, and activations (one entry per neuron):

$$\mathbf{b}^{[1]} = \begin{bmatrix} b_1^{[1]} \\ b_2^{[1]} \\ b_3^{[1]} \\ b_4^{[1]} \end{bmatrix}, \qquad \mathbf{z}^{[1]} = \begin{bmatrix} z_1^{[1]} \\ z_2^{[1]} \\ z_3^{[1]} \\ z_4^{[1]} \end{bmatrix}, \qquad \mathbf{a}^{[1]} = \begin{bmatrix} a_1^{[1]} \\ a_2^{[1]} \\ a_3^{[1]} \\ a_4^{[1]} \end{bmatrix}$$

• Layer-1 computation:

$$\mathbf{z}^{[1]} = \mathbf{w}^{[1]T}\mathbf{x} + \mathbf{b}^{[1]} = \begin{bmatrix} w_{11}^{[1]} & w_{21}^{[1]} & w_{31}^{[1]} \\ w_{12}^{[1]} & w_{22}^{[1]} & w_{32}^{[1]} \\ w_{13}^{[1]} & w_{23}^{[1]} & w_{33}^{[1]} \\ w_{14}^{[1]} & w_{24}^{[1]} & w_{34}^{[1]} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} + \begin{bmatrix} b_1^{[1]} \\ b_2^{[1]} \\ b_3^{[1]} \\ b_4^{[1]} \end{bmatrix} = \begin{bmatrix} w_{11}^{[1]} x_1 + w_{21}^{[1]} x_2 + w_{31}^{[1]} x_3 + b_1^{[1]} \\ w_{12}^{[1]} x_1 + w_{22}^{[1]} x_2 + w_{32}^{[1]} x_3 + b_2^{[1]} \\ w_{13}^{[1]} x_1 + w_{23}^{[1]} x_2 + w_{33}^{[1]} x_3 + b_3^{[1]} \\ w_{14}^{[1]} x_1 + w_{24}^{[1]} x_2 + w_{34}^{[1]} x_3 + b_4^{[1]} \end{bmatrix}$$

$$\mathbf{a}^{[1]} = \sigma(\mathbf{z}^{[1]})$$

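The layer-1 computation $\mathbf{z}^{[1]} = \mathbf{w}^{[1]T}\mathbf{x} + \mathbf{b}^{[1]}$ can be checked numerically. The sketch below assumes NumPy, random parameter values, and the variable names `W1`, `b1`, `x` (none of which come from the slides); it compares the matrix form against the neuron-by-neuron sums.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)       # arbitrary seed, for repeatability
W1 = rng.normal(size=(3, 4))         # w^[1]: row i = input x_i, column j = neuron j
b1 = rng.normal(size=(4, 1))         # b^[1]: one bias per neuron
x = np.array([[1.0], [2.0], [3.0]])  # column vector x, shape (3, 1)

# Matrix form: (4, 3) @ (3, 1) + (4, 1) -> (4, 1)
z1 = W1.T @ x + b1
a1 = sigmoid(z1)

# Same quantity written out neuron by neuron, as in the expanded equation.
z1_manual = np.array([
    [W1[0, j] * x[0, 0] + W1[1, j] * x[1, 0] + W1[2, j] * x[2, 0] + b1[j, 0]]
    for j in range(4)
])

print(np.allclose(z1, z1_manual))  # the two forms agree
print(z1.shape, a1.shape)
```

Storing `W1` with shape (3, 4) and transposing at use mirrors the slide notation; many libraries store the already-transposed (4, 3) matrix instead, which is a layout choice rather than a different model.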
• Layer-2 weights, one per layer-1 activation:

$$\mathbf{w}^{[2]} = \begin{bmatrix} w_{11}^{[2]} \\ w_{21}^{[2]} \\ w_{31}^{[2]} \\ w_{41}^{[2]} \end{bmatrix}, \qquad \mathbf{w}^{[2]T} = \begin{bmatrix} w_{11}^{[2]} & w_{21}^{[2]} & w_{31}^{[2]} & w_{41}^{[2]} \end{bmatrix}$$

• Dimension of $\mathbf{w}^{[2]}$ = (4, 1)
• Dimension of $\mathbf{w}^{[2]T}$ = (1, 4)

• Layer 2 has a single neuron, so its bias, pre-activation, and activation each have one entry:

$$\mathbf{b}^{[2]} = b_1^{[2]}, \qquad \mathbf{z}^{[2]} = z_1^{[2]}, \qquad \mathbf{a}^{[2]} = a_1^{[2]}$$

• Layer-2 computation (the input to layer 2 is $\mathbf{a}^{[1]}$, not $\mathbf{x}$):

$$\mathbf{z}^{[2]} = \mathbf{w}^{[2]T}\mathbf{a}^{[1]} + \mathbf{b}^{[2]} = \begin{bmatrix} w_{11}^{[2]} & w_{21}^{[2]} & w_{31}^{[2]} & w_{41}^{[2]} \end{bmatrix} \begin{bmatrix} a_1^{[1]} \\ a_2^{[1]} \\ a_3^{[1]} \\ a_4^{[1]} \end{bmatrix} + b_1^{[2]} = w_{11}^{[2]} a_1^{[1]} + w_{21}^{[2]} a_2^{[1]} + w_{31}^{[2]} a_3^{[1]} + w_{41}^{[2]} a_4^{[1]} + b_1^{[2]}$$

$$\mathbf{a}^{[2]} = \sigma(\mathbf{z}^{[2]}), \quad \text{i.e.,} \quad a_1^{[2]} = \sigma(z_1^{[2]})$$

• Putting both layers together, with the input written as $\mathbf{x} = \mathbf{a}^{[0]}$:

$$\mathbf{z}^{[1]} = \mathbf{w}^{[1]T}\mathbf{a}^{[0]} + \mathbf{b}^{[1]}$$
$$\mathbf{a}^{[1]} = \sigma(\mathbf{z}^{[1]})$$
$$\mathbf{z}^{[2]} = \mathbf{w}^{[2]T}\mathbf{a}^{[1]} + \mathbf{b}^{[2]}$$
$$\mathbf{a}^{[2]} = \sigma(\mathbf{z}^{[2]})$$

• But, this assumes only 1 training example!

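The four equations for a single example translate almost line by line into code. The following is a minimal sketch assuming NumPy; the function name `forward_single` and the parameter values are hypothetical, picked so that the output can be verified by hand.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def forward_single(x, W1, b1, W2, b2):
    """Forward pass of the 3-4-1 network for one example x of shape (3, 1)."""
    a0 = x                  # x = a^[0]
    z1 = W1.T @ a0 + b1     # (4, 3) @ (3, 1) + (4, 1) -> (4, 1)
    a1 = sigmoid(z1)
    z2 = W2.T @ a1 + b2     # (1, 4) @ (4, 1) + (1, 1) -> (1, 1)
    a2 = sigmoid(z2)
    return a2

# Hand-checkable, illustrative parameters (not from the slides).
W1 = np.zeros((3, 4))       # all z^[1] entries become 0, so a^[1] = 0.5 each
b1 = np.zeros((4, 1))
W2 = np.ones((4, 1))        # z^[2] = 4 * 0.5 + b2
b2 = np.array([[-2.0]])     # ... = 2 - 2 = 0, so a^[2] = sigma(0) = 0.5
x = np.array([[0.2], [0.4], [0.6]])

y_hat = forward_single(x, W1, b1, W2, b2)
print(y_hat)
```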
• How can we account for all the training examples?

• Assuming $n$ examples, where the superscript $(i)$ refers to the $i$-th example in the training dataset:

for $i = 1$ to $n$:
    $\mathbf{z}^{[1](i)} = \mathbf{w}^{[1]T}\mathbf{a}^{[0](i)} + \mathbf{b}^{[1]}$
    $\mathbf{a}^{[1](i)} = \sigma(\mathbf{z}^{[1](i)})$
    $\mathbf{z}^{[2](i)} = \mathbf{w}^{[2]T}\mathbf{a}^{[1](i)} + \mathbf{b}^{[2]}$
    $\mathbf{a}^{[2](i)} = \sigma(\mathbf{z}^{[2](i)})$

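The per-example loop can be written directly, as sketched below. This assumes NumPy and, as a layout choice of this example, that the $n$ examples are stored as the columns of an array `X` of shape (3, n); the names `forward_loop` and `X` and the random values are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def forward_loop(X, W1, b1, W2, b2):
    """Run the 3-4-1 network on each column of X, one example at a time."""
    n = X.shape[1]
    A2 = np.zeros((1, n))
    for i in range(n):
        a0 = X[:, [i]]              # i-th example as a (3, 1) column
        z1 = W1.T @ a0 + b1
        a1 = sigmoid(z1)
        z2 = W2.T @ a1 + b2
        A2[:, [i]] = sigmoid(z2)    # store the prediction for example i
    return A2

rng = np.random.default_rng(1)      # arbitrary seed, illustrative values
X = rng.normal(size=(3, 5))         # n = 5 examples
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=(4, 1))
W2, b2 = rng.normal(size=(4, 1)), rng.normal(size=(1, 1))

predictions = forward_loop(X, W1, b1, W2, b2)
print(predictions.shape)
```

The Python-level loop runs once per example, which is the cost that vectorization aims to remove.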
• But, loops in general slow down programs; hence, it is better to further vectorize the implementation in order to avoid any loop, whenever possible.

• To this end, we can simply stack all $\mathbf{x}$ vectors (or $\mathbf{a}^{[0]}$ vectors), $\mathbf{z}$ vectors, and $\mathbf{a}$ vectors of every layer into separate matrices!

• Assuming $n$ examples, stack the input vectors as columns:

$$X = \begin{bmatrix} x_1^{(1)} & x_1^{(2)} & \cdots & x_1^{(n)} \\ x_2^{(1)} & x_2^{(2)} & \cdots & x_2^{(n)} \\ x_3^{(1)} & x_3^{(2)} & \cdots & x_3^{(n)} \end{bmatrix}$$

• The first column represents the first example in the training dataset, the second column the second example, and the $n$-th column the $n$-th example.

• If we denote $\mathbf{x}$ as $\mathbf{a}^{[0]}$:

$$X = A^{[0]} = \begin{bmatrix} a_1^{[0](1)} & a_1^{[0](2)} & \cdots & a_1^{[0](n)} \\ a_2^{[0](1)} & a_2^{[0](2)} & \cdots & a_2^{[0](n)} \\ a_3^{[0](1)} & a_3^{[0](2)} & \cdots & a_3^{[0](n)} \end{bmatrix}$$

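Building $X = A^{[0]}$ amounts to placing each example in its own column. A small sketch, assuming NumPy and made-up feature values (the list `examples` is hypothetical):

```python
import numpy as np

# Three illustrative examples, each with features x1, x2, x3.
examples = [
    np.array([1.0, 2.0, 3.0]),   # x^(1)
    np.array([4.0, 5.0, 6.0]),   # x^(2)
    np.array([7.0, 8.0, 9.0]),   # x^(3)
]

# Stack the 1-D vectors as columns: shape (3 features, n examples).
X = np.column_stack(examples)
A0 = X                           # same matrix under the a^[0] naming

print(X.shape)
print(X[:, 0])                   # first column is the first example
```

Note that many datasets are stored with one example per row, in which case the transpose `X.T` gives the column layout used here.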
• Likewise, stack the layer-1 vectors of all $n$ examples as columns:

$$Z^{[1]} = \begin{bmatrix} z_1^{[1](1)} & z_1^{[1](2)} & \cdots & z_1^{[1](n)} \\ z_2^{[1](1)} & z_2^{[1](2)} & \cdots & z_2^{[1](n)} \\ z_3^{[1](1)} & z_3^{[1](2)} & \cdots & z_3^{[1](n)} \\ z_4^{[1](1)} & z_4^{[1](2)} & \cdots & z_4^{[1](n)} \end{bmatrix}, \qquad A^{[1]} = \begin{bmatrix} a_1^{[1](1)} & a_1^{[1](2)} & \cdots & a_1^{[1](n)} \\ a_2^{[1](1)} & a_2^{[1](2)} & \cdots & a_2^{[1](n)} \\ a_3^{[1](1)} & a_3^{[1](2)} & \cdots & a_3^{[1](n)} \\ a_4^{[1](1)} & a_4^{[1](2)} & \cdots & a_4^{[1](n)} \end{bmatrix}$$

• And the layer-2 values, which form a single row because layer 2 has one neuron:

$$Z^{[2]} = \begin{bmatrix} z_1^{[2](1)} & z_1^{[2](2)} & \cdots & z_1^{[2](n)} \end{bmatrix}, \qquad A^{[2]} = \begin{bmatrix} a_1^{[2](1)} & a_1^{[2](2)} & \cdots & a_1^{[2](n)} \end{bmatrix}$$

• Before vectorization:

for $i = 1$ to $n$:
    $\mathbf{z}^{[1](i)} = \mathbf{w}^{[1]T}\mathbf{a}^{[0](i)} + \mathbf{b}^{[1]}$
    $\mathbf{a}^{[1](i)} = \sigma(\mathbf{z}^{[1](i)})$
    $\mathbf{z}^{[2](i)} = \mathbf{w}^{[2]T}\mathbf{a}^{[1](i)} + \mathbf{b}^{[2]}$
    $\mathbf{a}^{[2](i)} = \sigma(\mathbf{z}^{[2](i)})$

• After vectorization (no explicit loop!):

$$Z^{[1]} = \mathbf{w}^{[1]T}A^{[0]} + \mathbf{b}^{[1]}$$
$$A^{[1]} = \sigma(Z^{[1]})$$
$$Z^{[2]} = \mathbf{w}^{[2]T}A^{[1]} + \mathbf{b}^{[2]}$$
$$A^{[2]} = \sigma(Z^{[2]})$$

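The vectorized equations can be sketched in NumPy and compared against the per-example computation. The function name `forward_vectorized` and the random values are assumptions of this example. Adding the (4, 1) bias to a (4, n) matrix relies on NumPy broadcasting, which copies the bias across columns; the slides simply write $+\,\mathbf{b}^{[1]}$.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def forward_vectorized(A0, W1, b1, W2, b2):
    """Forward pass for all n examples at once; A0 has shape (3, n)."""
    Z1 = W1.T @ A0 + b1     # (4, 3) @ (3, n) + (4, 1) broadcast -> (4, n)
    A1 = sigmoid(Z1)
    Z2 = W2.T @ A1 + b2     # (1, 4) @ (4, n) + (1, 1) broadcast -> (1, n)
    A2 = sigmoid(Z2)
    return A2

rng = np.random.default_rng(2)      # arbitrary seed, illustrative values
A0 = rng.normal(size=(3, 6))        # n = 6 examples as columns
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=(4, 1))
W2, b2 = rng.normal(size=(4, 1)), rng.normal(size=(1, 1))

A2 = forward_vectorized(A0, W1, b1, W2, b2)

# Cross-check each column against the single-example equations.
for i in range(A0.shape[1]):
    a1 = sigmoid(W1.T @ A0[:, [i]] + b1)
    a2 = sigmoid(W2.T @ a1 + b2)
    assert np.allclose(A2[:, [i]], a2)

print(A2.shape)
```

Both versions compute the same numbers; the vectorized one hands the whole batch to optimized matrix routines instead of iterating in Python.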
References

• Machine Learning and Security: Protecting Systems with Data and Algorithms, Clarence Chio, David Freeman
• Deep learning, Mohammad Hammoud, CMU Qatar
