5_neural networks
Example application
Spam Detection with NN
• Application: separate spam emails from a set of
non-spam emails
• Given a large collection of example emails, each labeled
“spam” or “ham”
• Learn to predict labels of new, future emails
Towards Neural Networks
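[Figure: a single neuron. The inputs, weighted by $\theta$, are combined as $z = \theta^T x$; the neuron then outputs $a = \sigma(z)$, which is the hypothesis $h_\theta$.]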
Using the notation of the neural network literature, $\theta = w = [w_1, w_2, w_3]^T$ ($w_0$ is not part of this vector here), $h_\theta = \hat{y}$, and $\theta_0 = w_0 = b$.
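To see why the two notations agree, the bias can be split off from the dot product. The short derivation below is a sketch that assumes the usual logistic-regression convention $x_0 = 1$, which this slide does not state explicitly; in the last line, $x$ denotes $[x_1, x_2, x_3]^T$ without $x_0$.

```latex
\begin{aligned}
z = \theta^{T} x &= \theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 \\
                 &= b + w_1 x_1 + w_2 x_2 + w_3 x_3
                    \qquad (x_0 = 1,\ \theta_0 = b,\ \theta_i = w_i) \\
                 &= w^{T} x + b
\end{aligned}
```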
$$\hat{y} = a = \sigma(z) = \sigma(w_1 x_1 + w_2 x_2 + w_3 x_3 + b) = \frac{1}{1 + e^{-(w_1 x_1 + w_2 x_2 + w_3 x_3 + b)}}$$
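The neuron above can be sketched in a few lines of Python. This is only an illustration: the function names and the input, weight, and bias values are hypothetical and were chosen so that the arithmetic is easy to follow.

```python
import math

def sigmoid(z):
    """Logistic function: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(x, w, b):
    """Single neuron: z = w1*x1 + w2*x2 + w3*x3 + b, then y_hat = a = sigmoid(z)."""
    z = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
    return sigmoid(z)

# Hypothetical values, chosen only for illustration.
x = [1.0, 0.0, 2.0]     # x1, x2, x3
w = [0.5, -1.0, 0.25]   # w1, w2, w3
b = -1.0

# z = 0.5*1 + (-1)*0 + 0.25*2 - 1 = 0, so y_hat = sigmoid(0) = 0.5
y_hat = neuron(x, w, b)
```

Because the sigmoid maps any real z into the interval (0, 1), the output can be read as a probability, for example the probability that an email is spam.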
Neural Networks
[Figure: the inputs $x_0, x_1, x_2, x_3$ form the input layer (Layer 0); each unit $j$ of Layer 1 computes $z_j^{[1]}$ and $a_j^{[1]}$.]
[Figure: the network extended with Layer 2, a single unit that takes the Layer 1 activations and computes $z_1^{[2]}$ and $a_1^{[2]}$.]
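The figure can be read as a unit-by-unit computation: every Layer 1 unit applies the single-neuron rule to the inputs, and the Layer 2 unit applies it to the Layer 1 activations. The sketch below spells this out with plain loops. The layer sizes (3 inputs, 4 units in Layer 1, 1 unit in Layer 2), the helper names, and all parameter values are assumptions made for illustration, with each bias kept as a separate term b.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def unit(inputs, weights, bias):
    """One unit: weighted sum z, then activation a = sigmoid(z)."""
    z = sum(w * v for w, v in zip(weights, inputs)) + bias
    return sigmoid(z)

def forward(x, layer1, layer2):
    """layer1: list of (weights, bias) pairs, one per Layer 1 unit.
    layer2: a single (weights, bias) pair for the Layer 2 unit."""
    a1 = [unit(x, w, b) for (w, b) in layer1]  # a_j^[1] for each unit j
    w2, b2 = layer2
    a2 = unit(a1, w2, b2)                      # a_1^[2]
    return a1, a2

# Hypothetical parameters: 3 inputs, 4 units in Layer 1, 1 unit in Layer 2.
x = [1.0, 2.0, 3.0]
layer1 = [([0.0, 0.0, 0.0], 0.0)] * 4   # all-zero weights, so every a_j^[1] = 0.5
layer2 = ([1.0, 1.0, -1.0, -1.0], 0.0)  # z_1^[2] = 0.5 + 0.5 - 0.5 - 0.5 = 0

a1, y_hat = forward(x, layer1, layer2)
```

Writing the loops out makes the repetition visible: every unit performs the same weighted sum followed by the same activation, which is what makes a matrix formulation possible.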
• By convention, this neural network is said to have 2 layers (and not 3), since the input layer is typically not counted.
• The more layers we add, the deeper the neural network becomes, giving rise to the concept of deep learning.
• Interestingly, neural networks learn their own features: the Layer 1 activations serve as the inputs to Layer 2, and the weights that produce them are learned from data rather than designed by hand.
[Figure: the same network, with Layer 2 marked as the output layer producing $\hat{y}$.]
Vectorizing Input and All Variables
$a^{[1]} = \sigma(z^{[1]})$
$a^{[2]} = \sigma(z^{[2]})$
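A vectorized forward pass for one example might look like the NumPy sketch below. It assumes the standard affine form $z^{[l]} = W^{[l]} a^{[l-1]} + b^{[l]}$ with $a^{[0]} = x$, which the surviving slide text does not show; the names W1, b1, W2, b2, the layer sizes, and the random values are likewise hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Vectorized forward pass for one example x of shape (n_x, 1)."""
    z1 = W1 @ x + b1     # shape (n_1, 1): all Layer 1 units at once
    a1 = sigmoid(z1)     # a^[1] = sigma(z^[1]), applied element-wise
    z2 = W2 @ a1 + b2    # shape (1, 1): the single Layer 2 unit
    a2 = sigmoid(z2)     # a^[2] = sigma(z^[2])
    return a1, a2

rng = np.random.default_rng(0)
n_x, n_1 = 3, 4                        # hypothetical layer sizes
W1, b1 = rng.normal(size=(n_1, n_x)), np.zeros((n_1, 1))
W2, b2 = rng.normal(size=(1, n_1)), np.zeros((1, 1))
x = np.array([[1.0], [2.0], [3.0]])    # a^[0] = x

a1, y_hat = forward(x, W1, b1, W2, b2)
```

Each row of W1 holds the weights of one Layer 1 unit, so a single matrix product replaces the per-unit loop.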
But this assumes only 1 training example. How can we account for all the training examples?
$x = a^{[0]}$
$Z^{[2]} = \begin{bmatrix} z_1^{[2](1)} & z_1^{[2](2)} & \dots & z_1^{[2](m)} \end{bmatrix}$
$A^{[2]} = \begin{bmatrix} a_1^{[2](1)} & a_1^{[2](2)} & \dots & a_1^{[2](m)} \end{bmatrix}$
Here the superscript $(i)$ indexes the training example, so each matrix has one column per example.
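Stacking the examples as columns lets one pair of matrix products process the whole training set. The sketch below assumes an input matrix X of shape (n_x, m) with one column per example and the affine form $Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}$; the function name, the sizes, and the random values are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_batch(X, W1, b1, W2, b2):
    """X has shape (n_x, m): one column per training example."""
    Z1 = W1 @ X + b1     # (n_1, m); b1 of shape (n_1, 1) broadcasts across columns
    A1 = sigmoid(Z1)
    Z2 = W2 @ A1 + b2    # (1, m): one entry per example
    A2 = sigmoid(Z2)
    return Z2, A2

rng = np.random.default_rng(0)
n_x, n_1, m = 3, 4, 5    # hypothetical sizes
X = rng.normal(size=(n_x, m))
W1, b1 = rng.normal(size=(n_1, n_x)), rng.normal(size=(n_1, 1))
W2, b2 = rng.normal(size=(1, n_1)), rng.normal(size=(1, 1))

Z2, A2 = forward_batch(X, W1, b1, W2, b2)

# Column i of A2 should match the result of passing example i alone.
i = 2
_, a2_i = forward_batch(X[:, [i]], W1, b1, W2, b2)
same = np.allclose(A2[:, [i]], a2_i)
```

The final check illustrates that the batched computation is the single-example computation repeated column by column, with no explicit loop over the examples.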
References
• Machine Learning and Security: Protecting Systems with Data and Algorithms, Clarence Chio, David Freeman
• Deep learning, Mohammad Hammoud, CMU Qatar