Neural Networks (Optional)
Deep Learning

What is a Neural Network?

Housing Price Prediction
[Plot: house price as a function of the size of the house]

Housing Price Prediction
Input features: size ($x_1$), #bedrooms ($x_2$), zip code ($x_3$), wealth ($x_4$); output $y$: price
Introduction to Neural Networks

Why is Deep Learning taking off?
Scale drives deep learning progress
[Plot: performance versus amount of data, for increasingly large neural networks]

Scale drives deep learning progress
• Data
• Computation
• Algorithms
Iteration cycle: Idea → Code → Experiment → Idea
Basics of Neural Network Programming

Binary Classification

Binary Classification
[Image example: the Red, Green, and Blue pixel-intensity values of an input image are unrolled into a feature vector $x$]

Notation
Basics of Neural Network Programming

Logistic Regression
Basics of Neural Network Programming

Logistic Regression Cost Function

Logistic Regression cost function
$\hat{y} = \sigma(w^T x + b)$, where $\sigma(z) = \frac{1}{1 + e^{-z}}$
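A minimal numpy sketch of this prediction step, assuming $x$ and $w$ are column vectors of the same length (the function names are illustrative, not from the slides):

import numpy as np

def sigmoid(z):
    # works for scalars and numpy arrays alike
    return 1.0 / (1.0 + np.exp(-z))

def predict(w, b, x):
    # y_hat = sigma(w^T x + b)
    return sigmoid(np.dot(w.T, x) + b)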
Basics of Neural Network Programming

Gradient Descent

Gradient Descent
Recap: $\hat{y} = \sigma(w^T x + b)$, $\sigma(z) = \frac{1}{1 + e^{-z}}$
$J(w, b) = \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}(\hat{y}^{(i)}, y^{(i)}) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log \hat{y}^{(i)} + (1 - y^{(i)}) \log(1 - \hat{y}^{(i)}) \right]$
[Plot: the convex cost surface $J(w, b)$ over the parameters $w$ and $b$]
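A numpy sketch of the cost $J(w, b)$ and of one descent step (shape assumptions, not stated on the slide: X is (n_x, m) with one example per column, Y is (1, m), w is (n_x, 1), b is a scalar):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(w, b, X, Y):
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)   # predictions for all m examples, shape (1, m)
    return -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m

# One gradient descent step, once dw = dJ/dw and db = dJ/db are known:
#   w = w - alpha * dw
#   b = b - alpha * db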
Gradient Descent
Basics of Neural Network Programming

Derivatives

Intuition about derivatives
$f(a) = 3a$
[Plot: the straight line $f(a) = 3a$ against $a$; its slope, and hence its derivative, is 3 everywhere]
Basics of Neural Network Programming

More derivative examples

Intuition about derivatives
$f(a) = a^2$
[Plot: the curve $f(a) = a^2$ against $a$; its slope $\frac{df}{da} = 2a$ changes with $a$]

More derivative examples
Basics of Neural Network Programming

Computation Graph
Basics of Neural Network Programming

Derivatives with a Computation Graph

Computing derivatives
$a = 5,\; b = 3,\; c = 2$
$u = bc = 6, \quad v = a + u = 11, \quad J = 3v = 33$
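A sketch of one forward pass through this graph and the backward (chain-rule) pass that recovers the derivatives, using the same numbers as the slide:

a, b, c = 5.0, 3.0, 2.0

# forward pass
u = b * c            # u = 6
v = a + u            # v = 11
J = 3 * v            # J = 33

# backward pass, right to left
dJ_dv = 3.0          # J = 3v
dJ_du = dJ_dv * 1.0  # v = a + u  =>  dv/du = 1
dJ_da = dJ_dv * 1.0  # dv/da = 1
dJ_db = dJ_du * c    # u = b*c   =>  du/db = c
dJ_dc = dJ_du * b    # du/dc = b

print(J, dJ_da, dJ_db, dJ_dc)   # 33.0 3.0 6.0 9.0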
Basics of Neural Network Programming

Logistic Regression Gradient Descent

Logistic regression recap
$z = w^T x + b$
$\hat{y} = a = \sigma(z)$
$\mathcal{L}(a, y) = -\big(y \log(a) + (1 - y) \log(1 - a)\big)$
Logistic regression derivatives
[Computation graph: inputs $x_1, x_2$ with parameters $w_1, w_2, b$]
$z = w_1 x_1 + w_2 x_2 + b \;\rightarrow\; a = \sigma(z) \;\rightarrow\; \mathcal{L}(a, y)$
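A sketch of the backward pass for a single example of this two-feature model, using the key result $dz = a - y$ (function and variable names are illustrative):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def single_example_grads(w1, w2, b, x1, x2, y):
    z = w1 * x1 + w2 * x2 + b
    a = sigmoid(z)
    dz = a - y       # dL/dz
    dw1 = x1 * dz    # dL/dw1
    dw2 = x2 * dz    # dL/dw2
    db = dz          # dL/db
    return dw1, dw2, db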
Basics of Neural Network Programming

Gradient descent on m examples

Logistic regression on m examples
Basics of Neural Network Programming

Vectorization

What is vectorization?
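A small timing sketch in the spirit of the lecture demo, comparing an explicit for-loop against np.dot for $w^T x$ (the array size is illustrative):

import time
import numpy as np

n = 1_000_000
w = np.random.rand(n)
x = np.random.rand(n)

tic = time.time()
c = np.dot(w, x)                 # vectorized
print("Vectorized:", 1000 * (time.time() - tic), "ms")

tic = time.time()
c = 0.0
for i in range(n):               # explicit for-loop
    c += w[i] * x[i]
print("For loop:  ", 1000 * (time.time() - tic), "ms")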
Basics of Neural Network Programming

More vectorization examples

Neural network programming guideline
Whenever possible, avoid explicit for-loops.
Vectors and matrix valued functions
Say you need to apply the exponential operation on every element of a matrix/vector $v = (v_1, \ldots, v_n)^T$:

import math
import numpy as np

u = np.zeros((n, 1))
for i in range(n):
    u[i] = math.exp(v[i])
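The vectorized alternative replaces the whole loop with one elementwise call (a sketch; the vector here is illustrative):

import numpy as np

v = np.random.rand(1000, 1)
u = np.exp(v)     # elementwise exponential, no explicit loop
# other elementwise ops work the same way: np.log(v), np.abs(v), np.maximum(v, 0), v**2, 1/v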
Logistic regression derivatives
J = 0; dw1 = 0; dw2 = 0; db = 0
for i = 1 to m:
    z(i) = w^T x(i) + b
    a(i) = σ(z(i))
    J += -[ y(i) log a(i) + (1 - y(i)) log(1 - a(i)) ]
    dz(i) = a(i) - y(i)
    dw1 += x1(i) dz(i)
    dw2 += x2(i) dz(i)
    db  += dz(i)
J = J/m; dw1 = dw1/m; dw2 = dw2/m; db = db/m
Basics of Neural Network Programming

Vectorizing Logistic Regression

Vectorizing Logistic Regression
$z^{(1)} = w^T x^{(1)} + b, \quad z^{(2)} = w^T x^{(2)} + b, \quad z^{(3)} = w^T x^{(3)} + b$
$a^{(1)} = \sigma(z^{(1)}), \quad a^{(2)} = \sigma(z^{(2)}), \quad a^{(3)} = \sigma(z^{(3)})$
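Stacking the examples as columns of a matrix X computes every z(i) and a(i) in one shot. A sketch with illustrative sizes:

import numpy as np

n_x, m = 4, 10                   # features, examples
X = np.random.randn(n_x, m)      # one example per column
w = np.random.randn(n_x, 1)
b = 0.0

Z = np.dot(w.T, X) + b           # shape (1, m): all m values of z at once
A = 1.0 / (1.0 + np.exp(-Z))     # shape (1, m): all m activations at once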
Basics of Neural Network Programming

Vectorizing Logistic Regression's Gradient Computation

Vectorizing Logistic Regression
Implementing Logistic Regression
J = 0; dw1 = 0; dw2 = 0; db = 0
for i = 1 to m:
    z(i) = w^T x(i) + b
    a(i) = σ(z(i))
    J += -[ y(i) log a(i) + (1 - y(i)) log(1 - a(i)) ]
    dz(i) = a(i) - y(i)
    dw1 += x1(i) dz(i)
    dw2 += x2(i) dz(i)
    db  += dz(i)
J = J/m; dw1 = dw1/m; dw2 = dw2/m; db = db/m
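The same iteration with both for-loops removed; a sketch with illustrative data (alpha is the learning rate):

import numpy as np

n_x, m, alpha = 4, 100, 0.01
X = np.random.randn(n_x, m)                     # one example per column
Y = (np.random.rand(1, m) > 0.5).astype(float)  # labels in {0, 1}
w, b = np.zeros((n_x, 1)), 0.0

Z = np.dot(w.T, X) + b
A = 1.0 / (1.0 + np.exp(-Z))
dZ = A - Y
dw = np.dot(X, dZ.T) / m
db = np.sum(dZ) / m
w = w - alpha * dw
b = b - alpha * db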
Basics of Neural Network Programming

Broadcasting in Python

Broadcasting example
Calories from Carbs, Proteins, Fats in 100g of different foods:

            Apples     Beef     Eggs   Potatoes
Carb          56.0      0.0      4.4       68.0
Protein        1.2    104.0     52.0        8.0
Fat            1.8    135.0     99.0        0.9

cal = A.sum(axis=0)
percentage = 100 * A / cal.reshape(1, 4)
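The same example end to end, showing how the (3, 4) matrix is divided by a (1, 4) row of column sums via broadcasting (the reshape is redundant here since cal already broadcasts, but it documents the intended shape):

import numpy as np

A = np.array([[56.0,   0.0,  4.4, 68.0],
              [ 1.2, 104.0, 52.0,  8.0],
              [ 1.8, 135.0, 99.0,  0.9]])

cal = A.sum(axis=0)                        # per-food calorie totals, shape (4,)
percentage = 100 * A / cal.reshape(1, 4)   # (3, 4) / (1, 4) broadcasts row-wise
print(percentage)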
Broadcasting example
$\begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix} + 100 = \begin{bmatrix} 101 \\ 102 \\ 103 \\ 104 \end{bmatrix}$
A note on python/numpy vectors

Python Demo

Python / numpy vectors
import numpy as np
a = np.random.randn(5)        # rank-1 array, shape (5,): ambiguous, avoid
a = np.random.randn(5, 1)     # column vector, shape (5, 1)
a = np.random.randn(1, 5)     # row vector, shape (1, 5)
assert a.shape == (5, 1)      # use assertions to pin down the shape you expect
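If a rank-1 array does slip through, a reshape restores an unambiguous shape; a small sketch:

import numpy as np

a = np.random.randn(5)     # rank-1 array, shape (5,)
a = a.reshape((5, 1))      # now an explicit column vector
assert a.shape == (5, 1)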
One hidden layer Neural Network

Neural Networks Overview

What is a Neural Network?
[Diagram: a single logistic unit with inputs $x_1, x_2, x_3$ and parameters $w, b$ producing $\hat{y}$, and a two-layer network with the same inputs producing $\hat{y}$]
With parameters $W^{[1]}, b^{[1]}, W^{[2]}, b^{[2]}$:
$z^{[1]} = W^{[1]} x + b^{[1]} \;\rightarrow\; a^{[1]} = \sigma(z^{[1]}) \;\rightarrow\; z^{[2]} = W^{[2]} a^{[1]} + b^{[2]} \;\rightarrow\; a^{[2]} = \sigma(z^{[2]}) \;\rightarrow\; \mathcal{L}(a^{[2]}, y)$
One hidden layer Neural Network

Neural Network Representation

Neural Network Representation
[Diagram: input layer $x_1, x_2, x_3$, one hidden layer, output $\hat{y}$]
One hidden layer Neural Network

Computing a Neural Network's Output

Neural Network Representation
[Diagram: a single sigmoid unit takes $x_1, x_2, x_3$ and computes $z = w^T x + b$, then $a = \sigma(z) = \hat{y}$]
$z = w^T x + b$
$a = \sigma(z)$

Neural Network Representation
[Diagram: each hidden unit of the network repeats the same two-step computation on the inputs $x_1, x_2, x_3$]
$z = w^T x + b$
$a = \sigma(z)$
Neural Network Representation
[Diagram: inputs $x_1, x_2, x_3$ feed four hidden units $a^{[1]}_1, \ldots, a^{[1]}_4$, which feed the output $\hat{y}$]
$z^{[1]}_1 = w^{[1]T}_1 x + b^{[1]}_1, \quad a^{[1]}_1 = \sigma(z^{[1]}_1)$
$z^{[1]}_2 = w^{[1]T}_2 x + b^{[1]}_2, \quad a^{[1]}_2 = \sigma(z^{[1]}_2)$
$z^{[1]}_3 = w^{[1]T}_3 x + b^{[1]}_3, \quad a^{[1]}_3 = \sigma(z^{[1]}_3)$
$z^{[1]}_4 = w^{[1]T}_4 x + b^{[1]}_4, \quad a^{[1]}_4 = \sigma(z^{[1]}_4)$
Neural Network Representation learning
[Diagram: inputs $x_1, x_2, x_3$, hidden activations $a^{[1]}_1, \ldots, a^{[1]}_4$, output $\hat{y}$]
Given input $x$:
$z^{[1]} = W^{[1]} x + b^{[1]}$
$a^{[1]} = \sigma(z^{[1]})$
$z^{[2]} = W^{[2]} a^{[1]} + b^{[2]}$
$a^{[2]} = \sigma(z^{[2]})$
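A sketch of these four lines for a single example, with illustrative layer sizes (3 inputs, 4 hidden units, 1 output):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_x, n_h = 3, 4
x  = np.random.randn(n_x, 1)
W1 = np.random.randn(n_h, n_x) * 0.01
b1 = np.zeros((n_h, 1))
W2 = np.random.randn(1, n_h) * 0.01
b2 = np.zeros((1, 1))

z1 = np.dot(W1, x) + b1      # shape (n_h, 1)
a1 = sigmoid(z1)
z2 = np.dot(W2, a1) + b2     # shape (1, 1)
a2 = sigmoid(z2)             # y_hat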
One hidden layer Neural Network

Vectorizing across multiple examples

Vectorizing across multiple examples
[Diagram: inputs $x_1, x_2, x_3$, one hidden layer, output $\hat{y}$]
$z^{[1]} = W^{[1]} x + b^{[1]}$
$a^{[1]} = \sigma(z^{[1]})$
$z^{[2]} = W^{[2]} a^{[1]} + b^{[2]}$
$a^{[2]} = \sigma(z^{[2]})$
Vectorizing across multiple examples
for i = 1 to m:
    $z^{[1](i)} = W^{[1]} x^{(i)} + b^{[1]}$
    $a^{[1](i)} = \sigma(z^{[1](i)})$
    $z^{[2](i)} = W^{[2]} a^{[1](i)} + b^{[2]}$
    $a^{[2](i)} = \sigma(z^{[2](i)})$
One hidden layer Neural Network

Explanation for vectorized implementation

Justification for vectorized implementation
Recap of vectorizing across multiple examples
for i = 1 to m:
    $z^{[1](i)} = W^{[1]} x^{(i)} + b^{[1]}$
    $a^{[1](i)} = \sigma(z^{[1](i)})$
    $z^{[2](i)} = W^{[2]} a^{[1](i)} + b^{[2]}$
    $a^{[2](i)} = \sigma(z^{[2](i)})$

Stacking examples as columns, $X = [x^{(1)} \; x^{(2)} \; \cdots \; x^{(m)}]$ and $A^{[1]} = [a^{[1](1)} \; a^{[1](2)} \; \cdots \; a^{[1](m)}]$:
$Z^{[1]} = W^{[1]} X + b^{[1]}$
$A^{[1]} = \sigma(Z^{[1]})$
$Z^{[2]} = W^{[2]} A^{[1]} + b^{[2]}$
$A^{[2]} = \sigma(Z^{[2]})$
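A sketch of the vectorized forward pass as a function over the whole training set ($b^{[1]}$ and $b^{[2]}$ broadcast across the m columns, as in the broadcasting section):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, b1, W2, b2):
    Z1 = np.dot(W1, X) + b1      # shape (n_h, m)
    A1 = sigmoid(Z1)
    Z2 = np.dot(W2, A1) + b2     # shape (1, m)
    A2 = sigmoid(Z2)             # one prediction per column
    return A1, A2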
One hidden layer Neural Network

Activation functions

Activation functions
[Diagram: inputs $x_1, x_2, x_3$, one hidden layer, output $\hat{y}$]
Given $x$:
$z^{[1]} = W^{[1]} x + b^{[1]}$
$a^{[1]} = \sigma(z^{[1]})$
$z^{[2]} = W^{[2]} a^{[1]} + b^{[2]}$
$a^{[2]} = \sigma(z^{[2]})$
Pros and cons of activation functions
[Plots: sigmoid, tanh, ReLU, and Leaky ReLU curves, activation $a$ versus input $z$]
sigmoid: $a = \frac{1}{1 + e^{-z}}$
tanh: $a = \tanh(z)$
ReLU: $a = \max(0, z)$
Leaky ReLU: $a = \max(0.01z, z)$
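The four activations as numpy one-liners (a sketch; the 0.01 leak slope is the conventional default):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    return np.tanh(z)

def relu(z):
    return np.maximum(0, z)

def leaky_relu(z, slope=0.01):
    return np.maximum(slope * z, z)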
One hidden layer Neural Network

Why do you need non-linear activation functions?

Activation function
[Diagram: inputs $x_1, x_2, x_3$, one hidden layer, output $\hat{y}$]
Given $x$:
$z^{[1]} = W^{[1]} x + b^{[1]}$
$a^{[1]} = g^{[1]}(z^{[1]})$
$z^{[2]} = W^{[2]} a^{[1]} + b^{[2]}$
$a^{[2]} = g^{[2]}(z^{[2]})$
(If $g^{[1]}$ and $g^{[2]}$ were linear, the two layers would collapse into a single linear function of $x$, so the hidden layer would add nothing.)
One hidden layer Neural Network

Derivatives of activation functions

Sigmoid activation function
[Plot: sigmoid curve, $a$ versus $z$]
$g(z) = \frac{1}{1 + e^{-z}}$
$g'(z) = g(z)\,(1 - g(z))$
Tanh activation function
[Plot: tanh curve, $a$ versus $z$]
$g(z) = \tanh(z)$
$g'(z) = 1 - \tanh(z)^2$
ReLU and Leaky ReLU
[Plots: ReLU and Leaky ReLU curves, $a$ versus $z$]
ReLU: $g(z) = \max(0, z)$, with $g'(z) = 0$ for $z < 0$ and $1$ for $z > 0$
Leaky ReLU: $g(z) = \max(0.01z, z)$, with $g'(z) = 0.01$ for $z < 0$ and $1$ for $z > 0$
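The same derivatives in numpy form, as they are typically used in backprop (a sketch; the value at exactly z = 0 is chosen arbitrarily):

import numpy as np

def sigmoid_prime(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1 - s)

def tanh_prime(z):
    return 1.0 - np.tanh(z) ** 2

def relu_prime(z):
    return (z > 0).astype(float)

def leaky_relu_prime(z, slope=0.01):
    return np.where(z > 0, 1.0, slope)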
One hidden layer Neural Network

Formulas for computing derivatives
One hidden layer Neural Network

Backpropagation intuition (Optional)

Computing gradients
Logistic regression
[Computation graph: $(x, w, b) \rightarrow z = w^T x + b \rightarrow a = \sigma(z) \rightarrow \mathcal{L}(a, y)$]
Neural network gradients
[Computation graph with parameters $W^{[1]}, b^{[1]}, W^{[2]}, b^{[2]}$]
$z^{[1]} = W^{[1]} x + b^{[1]} \;\rightarrow\; a^{[1]} = \sigma(z^{[1]}) \;\rightarrow\; z^{[2]} = W^{[2]} a^{[1]} + b^{[2]} \;\rightarrow\; a^{[2]} = \sigma(z^{[2]}) \;\rightarrow\; \mathcal{L}(a^{[2]}, y)$
Summary of gradient descent
$dz^{[2]} = a^{[2]} - y$
$dW^{[2]} = dz^{[2]} a^{[1]T}$
$db^{[2]} = dz^{[2]}$
$dz^{[1]} = W^{[2]T} dz^{[2]} \ast g^{[1]\prime}(z^{[1]})$
$dW^{[1]} = dz^{[1]} x^T$
$db^{[1]} = dz^{[1]}$
Summary of gradient descent

Single example:
$dz^{[2]} = a^{[2]} - y$
$dW^{[2]} = dz^{[2]} a^{[1]T}$
$db^{[2]} = dz^{[2]}$
$dz^{[1]} = W^{[2]T} dz^{[2]} \ast g^{[1]\prime}(z^{[1]})$
$dW^{[1]} = dz^{[1]} x^T$
$db^{[1]} = dz^{[1]}$

Vectorized over m examples:
$dZ^{[2]} = A^{[2]} - Y$
$dW^{[2]} = \frac{1}{m} dZ^{[2]} A^{[1]T}$
$db^{[2]} = \frac{1}{m}\,\mathrm{np.sum}(dZ^{[2]}, \mathrm{axis}{=}1, \mathrm{keepdims}{=}\mathrm{True})$
$dZ^{[1]} = W^{[2]T} dZ^{[2]} \ast g^{[1]\prime}(Z^{[1]})$
$dW^{[1]} = \frac{1}{m} dZ^{[1]} X^T$
$db^{[1]} = \frac{1}{m}\,\mathrm{np.sum}(dZ^{[1]}, \mathrm{axis}{=}1, \mathrm{keepdims}{=}\mathrm{True})$
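A sketch of the vectorized backward pass as a function, assuming the hidden layer uses tanh so that $g^{[1]\prime}(Z^{[1]}) = 1 - A^{[1]2}$ (A1 and A2 come from the forward pass; Y holds the labels with shape (1, m)):

import numpy as np

def backward(X, Y, A1, A2, W2):
    m = X.shape[1]
    dZ2 = A2 - Y                                     # (1, m)
    dW2 = np.dot(dZ2, A1.T) / m                      # (1, n_h)
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m     # (1, 1)
    dZ1 = np.dot(W2.T, dZ2) * (1 - A1 ** 2)          # (n_h, m)
    dW1 = np.dot(dZ1, X.T) / m                       # (n_h, n_x)
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m     # (n_h, 1)
    return dW1, db1, dW2, db2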
One hidden layer Neural Network

Random Initialization

What happens if you initialize weights to zero?
[Diagram: inputs $x_1, x_2$, hidden units $a^{[1]}_1, a^{[1]}_2$, output unit $a^{[2]}_1 = \hat{y}$]
(With all weights initialized to zero, both hidden units compute the same function and receive identical gradient updates, so they stay identical; the symmetry is never broken.)
Random initialization
[Diagram: inputs $x_1, x_2$, hidden units $a^{[1]}_1, a^{[1]}_2$, output $\hat{y}$]
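A sketch of random initialization for this small network (2 inputs, 2 hidden units; the 0.01 factor keeps sigmoid/tanh units out of their flat, slow-learning regions, and zero biases are fine):

import numpy as np

n_x, n_h = 2, 2
W1 = np.random.randn(n_h, n_x) * 0.01
b1 = np.zeros((n_h, 1))
W2 = np.random.randn(1, n_h) * 0.01
b2 = np.zeros((1, 1))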
Deep Neural Networks

Deep L-layer Neural network

What is a deep neural network?
Deep Neural Networks

Forward Propagation in a Deep Network

Forward propagation in a deep network
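The general rule for layer $l$ is $z^{[l]} = W^{[l]} a^{[l-1]} + b^{[l]}$, $a^{[l]} = g^{[l]}(z^{[l]})$, with $a^{[0]} = X$. A sketch of that loop, assuming a dict params holding W1, b1, ..., WL, bL, ReLU hidden layers, and a sigmoid output layer:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0, z)

def deep_forward(X, params, L):
    A = X                                   # a[0] = X, shape (n_x, m)
    for l in range(1, L + 1):
        W, b = params["W" + str(l)], params["b" + str(l)]
        Z = np.dot(W, A) + b
        A = sigmoid(Z) if l == L else relu(Z)
    return A                                # a[L] = y_hat, shape (1, m)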
Deep Neural Networks

Backward propagation for layer l
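A sketch of the backward step for a single layer $l$, following the same pattern as the layer-2 formulas above generalized to any layer (Z and A_prev come from the forward pass; g_prime is the derivative of layer l's activation):

import numpy as np

def backward_layer(dA, Z, A_prev, W, g_prime, m):
    dZ = dA * g_prime(Z)                       # elementwise
    dW = np.dot(dZ, A_prev.T) / m
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = np.dot(W.T, dZ)                  # gradient passed back to layer l-1
    return dA_prev, dW, db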
Summary