Neural Networks (Optional)

The document provides an introduction to deep learning and neural networks, explaining key concepts such as binary classification, logistic regression, and gradient descent. It emphasizes the importance of data, computation, and algorithms in driving deep learning progress. Additionally, it covers programming guidelines for neural networks, including vectorization and broadcasting in Python.


Introduction to Deep Learning

What is a Neural Network?
deeplearning.ai

Housing Price Prediction
[Plot: housing price as a function of the size of the house]

Housing Price Prediction
Input features: size ($x_1$), #bedrooms ($x_2$), zip code ($x_3$), wealth ($x_4$); output: price ($y$).
Introduction to
Neural Networks

Why is Deep Learning taking off?
deeplearning.ai
Andrew Ng
Scale drives deep learning progress
[Plot: performance vs. amount of data]
Andrew Ng
Scale drives deep learning progress

Drivers of progress:
• Data
• Computation
• Algorithms

Iteration cycle: Idea → Code → Experiment → Idea

Andrew Ng
Basics of Neural
Network Programming

Binary Classification
deeplearning.ai
Binary Classification

1 (cat) vs 0 (non-cat)
[Image: the Red, Green, and Blue channel pixel-intensity matrices of the input image]

Andrew Ng
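The slides work with such an image by unrolling its pixel values into a single feature-vector column x; a minimal numpy sketch of that step (the 64x64 image size is an illustrative assumption):

import numpy as np

image = np.random.rand(64, 64, 3)   # stand-in for a 64x64 RGB image (red, green, blue channels)
x = image.reshape(-1, 1)            # unroll all pixel intensities into one column vector
print(x.shape)                      # (12288, 1), i.e. 64 * 64 * 3 features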
Notation

Andrew Ng
Basics of Neural
Network Programming

Logistic Regression
deeplearning.ai
Logistic Regression

Andrew Ng
Basics of Neural
Network Programming

Logistic Regression Cost Function
deeplearning.ai
Logistic Regression cost function
$\hat{y} = \sigma(w^T x + b)$, where $\sigma(z) = \frac{1}{1 + e^{-z}}$

Given $(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})$, want $\hat{y}^{(i)} \approx y^{(i)}$.

Loss (error) function: $\mathcal{L}(\hat{y}, y) = -\big(y \log \hat{y} + (1 - y) \log(1 - \hat{y})\big)$

Andrew Ng
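A small numpy sketch of these definitions (the values of w, b, x, y are made up for illustration):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def loss(y_hat, y):
    # L(y_hat, y) = -( y*log(y_hat) + (1 - y)*log(1 - y_hat) )
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

w = np.array([[0.5], [-0.3]])        # shape (n_x, 1)
b = 0.1
x = np.array([[1.0], [2.0]])         # one example, shape (n_x, 1)
y = 1

y_hat = sigmoid(np.dot(w.T, x) + b)  # y_hat = sigma(w^T x + b)
print(y_hat.item(), loss(y_hat, y).item())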
Basics of Neural
Network Programming

Gradient Descent
deeplearning.ai
Gradient Descent
Recap: $\hat{y} = \sigma(w^T x + b)$, $\sigma(z) = \frac{1}{1 + e^{-z}}$

$J(w, b) = \frac{1}{m}\sum_{i=1}^{m} \mathcal{L}(\hat{y}^{(i)}, y^{(i)}) = -\frac{1}{m}\sum_{i=1}^{m} \big[\, y^{(i)} \log \hat{y}^{(i)} + (1 - y^{(i)}) \log(1 - \hat{y}^{(i)}) \,\big]$

Want to find $w, b$ that minimize $J(w, b)$.

[Plot: the convex cost surface $J(w, b)$ over $w$ and $b$]

Andrew Ng
Gradient Descent
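The content of this slide does not survive the text extraction; as a reminder (standard gradient descent, not transcribed from the slide), the algorithm repeatedly applies

$w := w - \alpha \frac{\partial J(w, b)}{\partial w}, \qquad b := b - \alpha \frac{\partial J(w, b)}{\partial b}$

where $\alpha$ is the learning rate.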

Andrew Ng
Basics of Neural
Network Programming

Derivatives
deeplearning.ai
Intuition about derivatives
$f(a) = 3a$. The slope is the same everywhere: $\frac{df(a)}{da} = 3$.
[Plot: the straight line $f(a) = 3a$]
Andrew Ng
Basics of Neural
Network Programming

More Derivative Examples
deeplearning.ai
Intuition about derivatives
$f(a) = a^2$. Here the slope changes with $a$: $\frac{df(a)}{da} = 2a$.
[Plot: the curve $f(a) = a^2$]
Andrew Ng
More derivative examples

Andrew Ng
Basics of Neural
Network Programming

Computation Graph
deeplearning.ai
Computation Graph

Running example (worked through on the next slides): $J(a, b, c) = 3(a + bc)$, computed in three steps: $u = bc$, $v = a + u$, $J = 3v$.
Andrew Ng
Basics of Neural
Network Programming

Derivatives with a Computation Graph
deeplearning.ai
Computing derivatives
With $a = 5$, $b = 3$, $c = 2$:
$u = bc = 6$, $v = a + u = 11$, $J = 3v = 33$
Andrew Ng
Computing derivatives (continued)
Working backwards through the graph with the chain rule: $\frac{dJ}{dv} = 3$, $\frac{dJ}{da} = \frac{dJ}{dv}\frac{dv}{da} = 3$, $\frac{dJ}{du} = 3$, $\frac{dJ}{db} = \frac{dJ}{du}\frac{du}{db} = 3c = 6$, $\frac{dJ}{dc} = \frac{dJ}{du}\frac{du}{dc} = 3b = 9$.
Andrew Ng
Basics of Neural
Network Programming

Logistic Regression Gradient Descent
deeplearning.ai
Logistic regression recap

$z = w^T x + b$
$\hat{y} = a = \sigma(z)$
$\mathcal{L}(a, y) = -\big(y \log(a) + (1 - y) \log(1 - a)\big)$

Andrew Ng
Logistic regression derivatives
Computation graph: $x_1, x_2, w_1, w_2, b \;\rightarrow\; z = w_1 x_1 + w_2 x_2 + b \;\rightarrow\; a = \sigma(z) \;\rightarrow\; \mathcal{L}(a, y)$

Andrew Ng
Basics of Neural
Network Programming

Gradient descent on m examples
deeplearning.ai
Logistic regression on m examples

Andrew Ng
Basics of Neural
Network Programming

Vectorization
deeplearning.ai
What is vectorization?
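This part of the course is mostly a live demo comparing a vectorized dot product against an explicit loop; a minimal sketch of that comparison (timings vary by machine):

import time
import numpy as np

a = np.random.rand(1000000)
b = np.random.rand(1000000)

tic = time.time()
c = np.dot(a, b)                     # vectorized dot product
toc = time.time()
print("Vectorized:", 1000 * (toc - tic), "ms")

c = 0.0
tic = time.time()
for i in range(1000000):             # explicit for-loop version
    c += a[i] * b[i]
toc = time.time()
print("For loop:  ", 1000 * (toc - tic), "ms")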

Andrew Ng
Basics of Neural
Network Programming

More Vectorization Examples
deeplearning.ai
Neural network programming guideline
Whenever possible, avoid explicit for-loops.
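For instance, computing u = Av with nested loops versus a single library call (a sketch with small illustrative sizes):

import numpy as np

A = np.random.rand(3, 3)
v = np.random.rand(3)

u = np.zeros(3)
for i in range(3):                   # explicit double loop
    for j in range(3):
        u[i] += A[i, j] * v[j]

u_vec = np.dot(A, v)                 # vectorized equivalent, one call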

Andrew Ng
Vectors and matrix valued functions
Say you need to apply the exponential operation on every element of a
matrix/vector.

$v = [v_1, \ldots, v_n]^T \;\longrightarrow\; u = [e^{v_1}, \ldots, e^{v_n}]^T$

import math
import numpy as np

u = np.zeros((n, 1))
for i in range(n):
    u[i] = math.exp(v[i])
Andrew Ng
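The vectorized equivalent this slide builds toward is a single elementwise call; numpy provides similar elementwise functions such as np.log, np.abs, and np.maximum:

u = np.exp(v)    # applies e^(v_i) to every element of v at once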
Logistic regression derivatives
J = 0, dw1 = 0, dw2 = 0, db = 0
for i = 1 to m:
    z^(i) = w^T x^(i) + b
    a^(i) = sigma(z^(i))
    J += -[ y^(i) log(a^(i)) + (1 - y^(i)) log(1 - a^(i)) ]
    dz^(i) = a^(i) - y^(i)
    dw1 += x1^(i) dz^(i)
    dw2 += x2^(i) dz^(i)
    db += dz^(i)
J = J/m, dw1 = dw1/m, dw2 = dw2/m, db = db/m

Andrew Ng
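A first vectorization step replaces the separate dw1, dw2 accumulators with one vector dw; a sketch assuming X has shape (n_x, m) with examples as columns, Y has shape (1, m), and w has shape (n_x, 1):

import numpy as np

dw = np.zeros((n_x, 1))              # one vector instead of dw1, dw2, ..., dw_{n_x}
db = 0.0
for i in range(m):
    x_i = X[:, i:i+1]                # i-th example as a column, shape (n_x, 1)
    a_i = 1 / (1 + np.exp(-(np.dot(w.T, x_i) + b)))
    dz_i = a_i - Y[0, i]
    dw += x_i * dz_i                 # one vector update replaces the per-feature loop
    db += dz_i.item()
dw /= m
db /= m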
Basics of Neural
Network Programming

Vectorizing Logistic Regression
deeplearning.ai
Vectorizing Logistic Regression
$z^{(1)} = w^T x^{(1)} + b$, $\quad z^{(2)} = w^T x^{(2)} + b$, $\quad z^{(3)} = w^T x^{(3)} + b$
$a^{(1)} = \sigma(z^{(1)})$, $\quad a^{(2)} = \sigma(z^{(2)})$, $\quad a^{(3)} = \sigma(z^{(3)})$

Andrew Ng
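With the m examples stacked as columns of a matrix X of shape (n_x, m), all of these forward computations collapse to two lines; a sketch:

Z = np.dot(w.T, X) + b    # shape (1, m); the scalar b is broadcast to every column
A = 1 / (1 + np.exp(-Z))  # elementwise sigmoid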
Basics of Neural
Network Programming

Vectorizing Logistic Regression's Gradient Computation
deeplearning.ai
Vectorizing Logistic Regression

Andrew Ng
Implementing Logistic Regression
J = 0, dw1 = 0, dw2 = 0, db = 0
for i = 1 to m:
    z^(i) = w^T x^(i) + b
    a^(i) = sigma(z^(i))
    J += -[ y^(i) log(a^(i)) + (1 - y^(i)) log(1 - a^(i)) ]
    dz^(i) = a^(i) - y^(i)
    dw1 += x1^(i) dz^(i)
    dw2 += x2^(i) dz^(i)
    db += dz^(i)
J = J/m, dw1 = dw1/m, dw2 = dw2/m
db = db/m
Andrew Ng
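For reference, one fully vectorized gradient-descent iteration that this loop reduces to (a sketch; X is (n_x, m), Y is (1, m), w is (n_x, 1), and alpha is the learning rate):

Z = np.dot(w.T, X) + b
A = 1 / (1 + np.exp(-Z))
dZ = A - Y
dw = (1 / m) * np.dot(X, dZ.T)
db = (1 / m) * np.sum(dZ)
w = w - alpha * dw
b = b - alpha * db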
Basics of Neural
Network Programming

Broadcasting in Python
deeplearning.ai
Broadcasting example
Calories from Carbs, Proteins, Fats in 100g of different foods:
           Apples    Beef    Eggs  Potatoes
Carb         56.0     0.0     4.4      68.0
Protein       1.2   104.0    52.0       8.0
Fat           1.8   135.0    99.0       0.9

cal = A.sum(axis=0)                        # column sums: total calories per food
percentage = 100 * A / cal.reshape(1, 4)   # (3,4) matrix divided by (1,4) row via broadcasting
Broadcasting example
[1; 2; 3; 4] + 100 = [101; 102; 103; 104]                          # (4,1) + scalar

[1 2 3]                       [101 202 303]
[4 5 6] + [100 200 300]  =    [104 205 306]                        # (2,3) + (1,3)

[1 2 3]   [100]               [101 102 103]
[4 5 6] + [200]          =    [204 205 206]                        # (2,3) + (2,1)
General Principle
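The general principle: when an (m, n) matrix is combined elementwise (+, -, *, /) with a (1, n) row or an (m, 1) column, the smaller operand is copied along the missing dimension before the operation is applied. A quick numpy check of the examples above:

import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])                     # shape (2, 3)
print(A + 100)                                # scalar broadcast to every entry
print(A + np.array([[100, 200, 300]]))        # (1, 3) row copied down the rows
print(A + np.array([[100], [200]]))           # (2, 1) column copied across the columns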
Basics of Neural
Network Programming

A Note on Python/NumPy Vectors
deeplearning.ai
Python Demo

Andrew Ng
Python / numpy vectors

import numpy as np

a = np.random.randn(5)        # rank-1 array, shape (5,): avoid this for neural-network code
a = np.random.randn(1, 5)     # row vector, shape (1, 5)
a = np.random.randn(5, 1)     # column vector, shape (5, 1)

assert(a.shape == (5, 1))     # cheap sanity check that a is a column vector

Andrew Ng
One hidden layer
Neural Network

Neural Networks Overview
deeplearning.ai
What is a Neural Network?
[Diagram: a network with inputs $x_1, x_2, x_3$, one hidden layer, and output $\hat{y}$]

Logistic regression as a computation graph:
$x, w, b \;\rightarrow\; z = w^T x + b \;\rightarrow\; a = \sigma(z) \;\rightarrow\; \mathcal{L}(a, y)$

The neural network stacks two such steps:
$x, W^{[1]}, b^{[1]} \;\rightarrow\; z^{[1]} = W^{[1]} x + b^{[1]} \;\rightarrow\; a^{[1]} = \sigma(z^{[1]}) \;\rightarrow\; z^{[2]} = W^{[2]} a^{[1]} + b^{[2]} \;\rightarrow\; a^{[2]} = \sigma(z^{[2]}) \;\rightarrow\; \mathcal{L}(a^{[2]}, y)$

Andrew Ng
One hidden layer
Neural Network

Neural Network Representation
deeplearning.ai
Neural Network Representation

[Diagram: a 2-layer neural network: input layer $x_1, x_2, x_3$, one hidden layer, and a single output unit producing $\hat{y}$]

Andrew Ng
One hidden layer
Neural Network

Computing a Neural Network's Output
deeplearning.ai
Neural Network Representation

[Diagram: a single sigmoid unit: inputs $x_1, x_2, x_3$ feed $z = w^T x + b$, then $a = \sigma(z)$, giving $a = \hat{y}$]

$z = w^T x + b$
$a = \sigma(z)$

Andrew Ng
Neural Network Representation

[Diagram: each hidden unit of the network performs the same two steps, $z = w^T x + b$ and $a = \sigma(z)$, on the inputs $x_1, x_2, x_3$]
Andrew Ng
Neural Network Representation
For the four hidden units of layer 1:
$z_1^{[1]} = w_1^{[1]T} x + b_1^{[1]}, \quad a_1^{[1]} = \sigma(z_1^{[1]})$
$z_2^{[1]} = w_2^{[1]T} x + b_2^{[1]}, \quad a_2^{[1]} = \sigma(z_2^{[1]})$
$z_3^{[1]} = w_3^{[1]T} x + b_3^{[1]}, \quad a_3^{[1]} = \sigma(z_3^{[1]})$
$z_4^{[1]} = w_4^{[1]T} x + b_4^{[1]}, \quad a_4^{[1]} = \sigma(z_4^{[1]})$

Andrew Ng
Neural Network Representation learning
Given input $x$:
$z^{[1]} = W^{[1]} x + b^{[1]}$
$a^{[1]} = \sigma(z^{[1]})$
$z^{[2]} = W^{[2]} a^{[1]} + b^{[2]}$
$a^{[2]} = \sigma(z^{[2]})$

Andrew Ng
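A numpy sketch of these four lines for a single example (the shapes follow the 3-input, 4-hidden-unit network of the previous slides; the random values are placeholders):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x = np.random.randn(3, 1)                          # one example
W1, b1 = np.random.randn(4, 3), np.zeros((4, 1))   # layer 1 parameters
W2, b2 = np.random.randn(1, 4), np.zeros((1, 1))   # layer 2 parameters

z1 = np.dot(W1, x) + b1      # shape (4, 1)
a1 = sigmoid(z1)
z2 = np.dot(W2, a1) + b2     # shape (1, 1)
a2 = sigmoid(z2)             # y_hat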
One hidden layer
Neural Network

Vectorizing across multiple examples
deeplearning.ai
Vectorizing across multiple examples
For a single example:
$z^{[1]} = W^{[1]} x + b^{[1]}, \quad a^{[1]} = \sigma(z^{[1]})$
$z^{[2]} = W^{[2]} a^{[1]} + b^{[2]}, \quad a^{[2]} = \sigma(z^{[2]})$

Andrew Ng
Vectorizing across multiple examples
for i = 1 to m:
    $z^{[1](i)} = W^{[1]} x^{(i)} + b^{[1]}$
    $a^{[1](i)} = \sigma(z^{[1](i)})$
    $z^{[2](i)} = W^{[2]} a^{[1](i)} + b^{[2]}$
    $a^{[2](i)} = \sigma(z^{[2](i)})$

Andrew Ng
One hidden layer
Neural Network

Explanation for Vectorized Implementation
deeplearning.ai
Justification for vectorized implementation

Andrew Ng
Recap of vectorizing across multiple examples
for i = 1 to m:
    $z^{[1](i)} = W^{[1]} x^{(i)} + b^{[1]}$
    $a^{[1](i)} = \sigma(z^{[1](i)})$
    $z^{[2](i)} = W^{[2]} a^{[1](i)} + b^{[2]}$
    $a^{[2](i)} = \sigma(z^{[2](i)})$

Stacking the examples as columns, $X = [x^{(1)}\; x^{(2)}\; \cdots\; x^{(m)}]$ and $A^{[1]} = [a^{[1](1)}\; a^{[1](2)}\; \cdots\; a^{[1](m)}]$, the loop becomes:
$Z^{[1]} = W^{[1]} X + b^{[1]}$
$A^{[1]} = \sigma(Z^{[1]})$
$Z^{[2]} = W^{[2]} A^{[1]} + b^{[2]}$
$A^{[2]} = \sigma(Z^{[2]})$
Andrew Ng
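The same computation over all m examples at once, sketched in numpy (X stacks the examples as columns; broadcasting adds the bias columns):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

m = 5
X = np.random.randn(3, m)                          # m examples as columns
W1, b1 = np.random.randn(4, 3), np.zeros((4, 1))
W2, b2 = np.random.randn(1, 4), np.zeros((1, 1))

Z1 = np.dot(W1, X) + b1      # (4, m); b1 is broadcast across the m columns
A1 = sigmoid(Z1)
Z2 = np.dot(W2, A1) + b2     # (1, m)
A2 = sigmoid(Z2)             # one prediction per column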
One hidden layer
Neural Network

Activation functions
deeplearning.ai
Activation functions
Given x:
$z^{[1]} = W^{[1]} x + b^{[1]}$
$a^{[1]} = \sigma(z^{[1]})$
$z^{[2]} = W^{[2]} a^{[1]} + b^{[2]}$
$a^{[2]} = \sigma(z^{[2]})$
Andrew Ng
Pros and cons of activation functions
sigmoid: $a = \frac{1}{1 + e^{-z}}$
[Plots: the activation curves a = g(z) being compared, each plotted against z]
Andrew Ng
One hidden layer
Neural Network

Why do you need non-linear activation functions?
deeplearning.ai
Activation function
Given x:
$z^{[1]} = W^{[1]} x + b^{[1]}$
$a^{[1]} = g^{[1]}(z^{[1]})$
$z^{[2]} = W^{[2]} a^{[1]} + b^{[2]}$
$a^{[2]} = g^{[2]}(z^{[2]})$

If $g^{[1]}$ and $g^{[2]}$ are linear (identity) activations, $a^{[2]}$ reduces to a linear function of $x$, so the hidden layer adds nothing; this is why a non-linear activation is needed.
Andrew Ng
One hidden layer
Neural Network

Derivatives of activation functions
deeplearning.ai
Sigmoid activation function

$g(z) = \frac{1}{1 + e^{-z}}, \qquad g'(z) = g(z)\,\big(1 - g(z)\big)$
[Plot: the sigmoid curve a = g(z)]

Andrew Ng
Tanh activation function
$g(z) = \tanh(z), \qquad g'(z) = 1 - \big(\tanh(z)\big)^2$
[Plot: the tanh curve a = g(z)]
Andrew Ng
ReLU and Leaky ReLU
ReLU: $g(z) = \max(0, z)$        Leaky ReLU: $g(z) = \max(0.01z, z)$
[Plots: ReLU and leaky ReLU curves a = g(z)]
Andrew Ng
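A small numpy sketch of these activations and the derivatives quoted above (the 0.01 slope in leaky ReLU is a common convention):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def dsigmoid(z):
    s = sigmoid(z)
    return s * (1 - s)                  # g'(z) = g(z) (1 - g(z))

def dtanh(z):
    return 1 - np.tanh(z) ** 2          # g'(z) = 1 - tanh(z)^2

def relu(z):
    return np.maximum(0, z)

def drelu(z):
    return (z > 0).astype(float)        # 0 for z < 0, 1 for z > 0

def leaky_relu(z, slope=0.01):
    return np.maximum(slope * z, z)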
One hidden layer
Neural Network

Gradient descent for neural networks
deeplearning.ai
Gradient descent for neural networks

Andrew Ng
Formulas for computing derivatives

Andrew Ng
One hidden layer
Neural Network

Backpropagation intuition (Optional)
deeplearning.ai
Computing gradients
Logistic regression
$x, w, b \;\rightarrow\; z = w^T x + b \;\rightarrow\; a = \sigma(z) \;\rightarrow\; \mathcal{L}(a, y)$

Andrew Ng
Neural network gradients
$x, W^{[1]}, b^{[1]} \;\rightarrow\; z^{[1]} = W^{[1]} x + b^{[1]} \;\rightarrow\; a^{[1]} = \sigma(z^{[1]}) \;\rightarrow\; z^{[2]} = W^{[2]} a^{[1]} + b^{[2]} \;\rightarrow\; a^{[2]} = \sigma(z^{[2]}) \;\rightarrow\; \mathcal{L}(a^{[2]}, y)$

Andrew Ng
Summary of gradient descent
$dz^{[2]} = a^{[2]} - y$
$dW^{[2]} = dz^{[2]} \, a^{[1]T}$
$db^{[2]} = dz^{[2]}$
$dz^{[1]} = W^{[2]T} dz^{[2]} * g^{[1]\prime}(z^{[1]})$   (elementwise product)
$dW^{[1]} = dz^{[1]} x^T$
$db^{[1]} = dz^{[1]}$
Andrew Ng
Summary of gradient descent
Per-example form:
$dz^{[2]} = a^{[2]} - y$
$dW^{[2]} = dz^{[2]} \, a^{[1]T}$
$db^{[2]} = dz^{[2]}$
$dz^{[1]} = W^{[2]T} dz^{[2]} * g^{[1]\prime}(z^{[1]})$
$dW^{[1]} = dz^{[1]} x^T$
$db^{[1]} = dz^{[1]}$

Vectorized over m examples:
$dZ^{[2]} = A^{[2]} - Y$
$dW^{[2]} = \frac{1}{m} dZ^{[2]} A^{[1]T}$
$db^{[2]} = \frac{1}{m}$ np.sum(dZ[2], axis=1, keepdims=True)
$dZ^{[1]} = W^{[2]T} dZ^{[2]} * g^{[1]\prime}(Z^{[1]})$
$dW^{[1]} = \frac{1}{m} dZ^{[1]} X^T$
$db^{[1]} = \frac{1}{m}$ np.sum(dZ[1], axis=1, keepdims=True)
Andrew Ng
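Putting the forward pass and these gradients together, one full gradient-descent iteration for the 1-hidden-layer network might look as follows (a sketch; the sizes, the tanh hidden activation, and the learning rate alpha are illustrative assumptions):

import numpy as np

np.random.seed(1)
n_x, n_h, m, alpha = 3, 4, 5, 0.01
X = np.random.randn(n_x, m)
Y = (np.random.rand(1, m) > 0.5).astype(float)
W1, b1 = np.random.randn(n_h, n_x) * 0.01, np.zeros((n_h, 1))
W2, b2 = np.random.randn(1, n_h) * 0.01, np.zeros((1, 1))

# forward pass
Z1 = np.dot(W1, X) + b1
A1 = np.tanh(Z1)                                  # g[1] = tanh (assumption)
Z2 = np.dot(W2, A1) + b2
A2 = 1 / (1 + np.exp(-Z2))                        # sigmoid output

# backward pass (the vectorized formulas above)
dZ2 = A2 - Y
dW2 = (1 / m) * np.dot(dZ2, A1.T)
db2 = (1 / m) * np.sum(dZ2, axis=1, keepdims=True)
dZ1 = np.dot(W2.T, dZ2) * (1 - A1 ** 2)           # g[1]'(Z1) = 1 - tanh(Z1)^2
dW1 = (1 / m) * np.dot(dZ1, X.T)
db1 = (1 / m) * np.sum(dZ1, axis=1, keepdims=True)

# parameter update
W1 -= alpha * dW1; b1 -= alpha * db1
W2 -= alpha * dW2; b2 -= alpha * db2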
One hidden layer
Neural Network

Random Initialization
deeplearning.ai
What happens if you initialize weights to
zero?
[Diagram: a network with inputs $x_1, x_2$, two hidden units $a_1^{[1]}, a_2^{[1]}$, and output unit $a_1^{[2]} = \hat{y}$. With all weights initialized to zero, both hidden units compute the same function and receive identical gradients, so they stay identical after every update.]
Andrew Ng
Random initialization
[Diagram: the same network, with the weights instead initialized to small random values]
Andrew Ng
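A minimal sketch of random initialization for the 2-input, 2-hidden-unit network above (the 0.01 scaling is a common choice that keeps z small so sigmoid/tanh do not start out saturated):

import numpy as np

n_x, n_h = 2, 2                            # sizes from the example network
W1 = np.random.randn(n_h, n_x) * 0.01      # small random values break the symmetry
b1 = np.zeros((n_h, 1))                    # zeros are fine for the biases
W2 = np.random.randn(1, n_h) * 0.01
b2 = np.zeros((1, 1))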
Deep Neural
Networks

Deep L-layer Neural Network
deeplearning.ai
What is a deep neural network?

[Diagrams of networks of increasing depth: logistic regression, 1 hidden layer, 2 hidden layers, 5 hidden layers]

Andrew Ng
Deep neural network notation

$L$ = number of layers; $n^{[l]}$ = number of units in layer $l$; $a^{[l]}$ = activations in layer $l$; $W^{[l]}, b^{[l]}$ = parameters used to compute $z^{[l]}$; $x = a^{[0]}$ and $\hat{y} = a^{[L]}$.

Andrew Ng
Deep Neural
Networks

Forward Propagation in a Deep Network
deeplearning.ai
Forward propagation in a deep network

Andrew Ng
Deep Neural
Networks

Forward and backward propagation
deeplearning.ai
Forward propagation for layer l

Input $a^{[l-1]}$; output $a^{[l]}$ (cache $z^{[l]}$):
$z^{[l]} = W^{[l]} a^{[l-1]} + b^{[l]}, \quad a^{[l]} = g^{[l]}(z^{[l]})$
Vectorized: $Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}, \quad A^{[l]} = g^{[l]}(Z^{[l]})$

Andrew Ng
Backward propagation for layer l

Input $da^{[l]}$; output $da^{[l-1]}, dW^{[l]}, db^{[l]}$:
$dZ^{[l]} = dA^{[l]} * g^{[l]\prime}(Z^{[l]})$
$dW^{[l]} = \frac{1}{m} dZ^{[l]} A^{[l-1]T}$
$db^{[l]} = \frac{1}{m}$ np.sum(dZ[l], axis=1, keepdims=True)
$dA^{[l-1]} = W^{[l]T} dZ^{[l]}$

Andrew Ng
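A generic forward-pass loop over L layers, sketched in numpy (storing parameters in a dict and using ReLU hidden layers with a sigmoid output are illustrative assumptions):

import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward(X, params, L):
    # params holds "W1", "b1", ..., "W<L>", "b<L>"
    A = X                                        # A[0] = X
    caches = []
    for l in range(1, L + 1):
        Z = np.dot(params["W" + str(l)], A) + params["b" + str(l)]
        A = sigmoid(Z) if l == L else relu(Z)    # sigmoid on the output layer only
        caches.append((Z, A))                    # kept for the backward pass
    return A, caches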
Summary

Andrew Ng
