
AI for Medicine

Lecture 21:
Deep Learning – Part II

April 12, 2021


Mohammad Hammoud
Carnegie Mellon University in Qatar
Today…
• Last Monday’s Session:
• Deep Learning – Part I

• Today’s Session:
• Deep Learning – Part II

• Announcements:
• Assignment 3 is due on Wednesday April 14 by midnight
• Quiz II is on April 19
Outline

• Deep Learning Overview
• Computation Graph
• Gradient Descent
• Vectorization
The Flow of Computations in Neural Networks
• The flow of computations in a neural network goes in two ways:
1. Left-to-right: This is referred to as forward propagation, which results in computing the output of the network
2. Right-to-left: This is referred to as backward propagation, which results in computing the gradients (or derivatives) of the parameters in the network

• The intuition behind this two-way flow of computations can be explained through the concept of "computation graphs"
• What is a computation graph?
What is a Computation Graph?
• Let us assume we want to compute the following function:

J(a, b, c) = 3(a + bc)

• The function can be broken into three simpler steps, each becoming a node in a computation graph:

u = bc        v = a + u        J = 3v

• With the inputs a = 2, b = 4, and c = 3, the graph computes u = 12, then v = 14, then J = 42
Forward Propagation
• To evaluate the function J(a, b, c) = 3(a + bc), we traverse the computation graph from left to right:

u = bc = 4 × 3 = 12
v = a + u = 2 + 12 = 14
J = 3v = 3 × 14 = 42

Forward propagation allows computing J
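
• Below is a minimal Python sketch of this forward pass (not code from the lecture; the function name forward and the variable names simply mirror the slide's notation):

def forward(a, b, c):
    u = b * c   # first node:  u = bc
    v = a + u   # second node: v = a + u
    J = 3 * v   # third node:  J = 3v
    return u, v, J

u, v, J = forward(a=2, b=4, c=3)
print(u, v, J)  # prints: 12 14 42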
Backward Propagation
• Let us now compute the derivatives of the variables through the computation graph, moving from right to left
• We start with dJ/dv, the derivative of J with respect to v
• If we change v a little bit, how would J change? Nudging v from 14 to 14.001 moves J from 42 to 42.003
• Hence:

dJ/dv = 3

• To compute the derivative of J with respect to v, we went back to v, nudged it, and measured the corresponding resultant increase in J
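
• The same "nudge" can be checked numerically; a small sketch, where the step size eps = 0.001 matches the slide's nudge:

def J_of_v(v):
    return 3 * v  # the last node of the graph

eps = 0.001
v = 14.0
dJ_dv = (J_of_v(v + eps) - J_of_v(v)) / eps  # (42.003 - 42) / 0.001
print(dJ_dv)  # approximately 3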
Backward Propagation
• Next, dJ/da, the derivative of J with respect to a
• If we change a a little bit, how would J change? Nudging a from 2 to 2.001 moves v from 14 to 14.001, which in turn moves J from 42 to 42.003
• Hence:

dJ/da = dJ/dv × dv/da = 3 × 1 = 3

• The change in a caused a change in v, and the change in v caused a change in J; multiplying the two local changes together is denoted as the chain rule in calculus
• In essence, to compute the derivative of J with respect to a, we had to go back to a, nudge it a little bit, and measure the corresponding resultant increase in v
• Then, we had to go back to v, nudge it a little bit, and measure the corresponding resultant increase in J
• Then, we multiplied the changes together (i.e., we applied the chain rule!)
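
• The two nudges can be chained in code as well; a sketch (eps and the variable names are illustrative, not from the lecture):

eps = 0.001
a, u = 2.0, 12.0

v0 = a + u                              # v before the nudge: 14
v1 = (a + eps) + u                      # v after nudging a:  14.001
dv_da = (v1 - v0) / eps                 # local slope dv/da = 1

dJ_dv = (3 * v1 - 3 * v0) / (v1 - v0)   # local slope dJ/dv = 3

print(dJ_dv * dv_da)                    # chain rule: 3 x 1 = 3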
Backward Propagation
• Similarly, dJ/du, the derivative of J with respect to u
• If we change u a little bit, how would J change? Nudging u from 12 to 12.001 moves v from 14 to 14.001, and J from 42 to 42.003
• Hence:

dJ/du = dJ/dv × dv/du = 3 × 1 = 3

• Same as before, we had to go back to v and then to u in order to compute the derivative of J with respect to u
Backward Propagation
• Now, dJ/db, the derivative of J with respect to b
• If we change b a little bit, how would J change? Nudging b from 4 to 4.001 moves u from 12 to 12.003, v from 14 to 14.003, and J from 42 to 42.009
• Hence:

dJ/db = dJ/dv × dv/du × du/db = 3 × 1 × 3 = 9
Backward Propagation
• Finally, dJ/dc, the derivative of J with respect to c
• If we change c a little bit, how would J change? Nudging c from 3 to 3.001 moves u from 12 to 12.004, v from 14 to 14.004, and J from 42 to 42.012
• Hence:

dJ/dc = dJ/dv × dv/du × du/dc = 3 × 1 × 4 = 12
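
• Putting the five derivatives together, here is a hand-written backward pass for J = 3(a + bc); a sketch that follows the slides' notation, not code from the lecture:

def forward(a, b, c):
    u = b * c
    v = a + u
    J = 3 * v
    return u, v, J

def backward(b, c):
    dJ_dv = 3.0            # J = 3v
    dJ_da = dJ_dv * 1.0    # v = a + u  =>  dv/da = 1
    dJ_du = dJ_dv * 1.0    # v = a + u  =>  dv/du = 1
    dJ_db = dJ_du * c      # u = bc     =>  du/db = c
    dJ_dc = dJ_du * b      # u = bc     =>  du/dc = b
    return dJ_dv, dJ_da, dJ_du, dJ_db, dJ_dc

u, v, J = forward(2, 4, 3)
print(backward(b=4, c=3))  # (3.0, 3.0, 3.0, 9.0, 12.0)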
Outline

• Deep Learning Overview
• Computation Graph
• Gradient Descent
• Vectorization
The Computation Graph of Logistic Regression
• Let us translate logistic regression (which is a neural network with only 1 neuron) into a computation graph:

z = wᵀx + b  →  a = σ(z)  →  ℒ(a, y)

• The inputs x₁, x₂, and x₃ are weighted by w₁, w₂, and w₃ and shifted by the bias b to produce z
• Where x = [x₁, x₂, x₃]ᵀ, w = [w₁, w₂, w₃]ᵀ, σ is the sigmoid function, ŷ = a = σ(z), and ℒ(a, y) is the cost (or loss) function
Forward Propagation
• The loss function can be computed by moving from left to right through the graph:

z = wᵀx + b  →  a = σ(z)  →  ℒ(a, y)
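
• A minimal NumPy sketch of this forward pass; the feature, weight, and label values below are made up for illustration, and only the formulas come from the slide:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 2.0])   # hypothetical inputs x1, x2, x3
w = np.array([0.1, 0.4, -0.3])   # hypothetical weights w1, w2, w3
b = 0.2                          # hypothetical bias
y = 1.0                          # true label

z = np.dot(w, x) + b                              # z = w^T x + b
a = sigmoid(z)                                    # a = sigma(z), the prediction y-hat
loss = -y * np.log(a) - (1 - y) * np.log(1 - a)   # L(a, y)
print(z, a, loss)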
Backward Propagation
• The derivatives can be computed by moving from right to left through the graph
• We start with ∂ℒ/∂a, the partial derivative of ℒ with respect to a:

∂ℒ/∂a = ∂/∂a (−y log(a) − (1 − y) log(1 − a)) = −y/a + (1 − y)/(1 − a)
Backward Propagation
• Next, ∂ℒ/∂z, the partial derivative of ℒ with respect to z; by the chain rule:

∂ℒ/∂z = ∂ℒ/∂a × ∂a/∂z = (−y/a + (1 − y)/(1 − a)) × a(1 − a) = a − y

• Here ∂a/∂z = a(1 − a) is the derivative of the sigmoid function
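
• A quick numerical check that the product above really collapses to a − y (the test values of a and y are arbitrary):

a, y = 0.3, 1.0
dL_da = -y / a + (1 - y) / (1 - a)  # from the previous slide
da_dz = a * (1 - a)                 # sigmoid derivative
print(dL_da * da_dz, a - y)         # both approximately -0.7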
Backward Propagation
• Next, ∂ℒ/∂b, the partial derivative of ℒ with respect to b; since ∂z/∂b = 1:

∂ℒ/∂b = ∂ℒ/∂a × ∂a/∂z × ∂z/∂b = (a − y) × 1 = a − y
Backward Propagation
• Finally, ∂ℒ/∂w, the partial derivative of ℒ with respect to w; since ∂z/∂w = x:

∂ℒ/∂w = ∂ℒ/∂a × ∂a/∂z × ∂z/∂w = (a − y) x
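
• Putting the pieces together, a sketch of the full backward pass for this one-neuron network, verified against a finite-difference estimate; the data values are the same hypothetical ones as before, and only the gradient formulas are the slides':

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, b, x, y):
    a = sigmoid(np.dot(w, x) + b)
    return -y * np.log(a) - (1 - y) * np.log(1 - a)

x = np.array([0.5, -1.2, 2.0])
w = np.array([0.1, 0.4, -0.3])
b, y = 0.2, 1.0

a = sigmoid(np.dot(w, x) + b)
dz = a - y     # dL/dz = a - y
db = dz        # dL/db = (a - y) * 1
dw = dz * x    # dL/dw = (a - y) x

eps = 1e-6     # finite-difference check on b
db_numeric = (loss(w, b + eps, x, y) - loss(w, b - eps, x, y)) / (2 * eps)
print(db, db_numeric)  # the analytic and numeric values agree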
Next Monday’s Lecture…

• Deep Learning Overview
• Computation Graph
• Gradient Descent
• Vectorization

Continue…
