This document discusses backpropagation, which is an algorithm for efficiently computing gradients in neural networks. It motivates backpropagation by explaining that neural network training involves minimizing a loss function over parameters. It then covers numerical gradient estimation versus analytical gradients using the chain rule. The key steps of backpropagation are outlined as identifying intermediate functions during forward propagation, computing local gradients using the chain rule, and combining gradients to get the full gradient. Matrix calculus rules for derivatives with respect to vectors and matrices are presented. Tips for implementing backpropagation include writing the computation graph, tracking error signals, computing the loss derivative, and enforcing shape rules on gradients.


Backpropagation

TA: Yi Wen

April 17, 2020


CS231n Discussion Section

Slide credits: Barak Oshri, Vincent Chen, Nish Khandwala, Yi Wen


Agenda
● Motivation
● Backprop Tips & Tricks
● Matrix calculus primer
Motivation
Recall: the optimization objective is to minimize the loss.

Goal: how should we tweak the parameters to decrease the loss?


Agenda
● Motivation
● Backprop Tips & Tricks
● Matrix calculus primer
A Simple Example
Loss

Goal: Tweak the parameters to minimize loss

=> minimize a multivariable function in parameter space


A Simple Example

=> minimize a multivariable function

Plotted on WolframAlpha
Approach #1: Random Search
Intuition: take random steps in the function's domain (parameter space) and keep whichever lowers the loss.
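A minimal sketch of this approach; the loss function, step size, and iteration count below are illustrative, not part of the slides:

```python
import numpy as np

def random_search(loss, theta, step=1e-3, iters=1000):
    """Perturb the parameters randomly; keep a step only if it lowers the loss."""
    best = loss(theta)
    for _ in range(iters):
        candidate = theta + step * np.random.randn(*theta.shape)
        value = loss(candidate)
        if value < best:
            theta, best = candidate, value
    return theta

# Usage on a toy quadratic loss:
theta0 = np.random.randn(2)
theta_star = random_search(lambda t: np.sum(t ** 2), theta0)
```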
Approach #2: Numerical Gradient
Intuition: the rate of change of the function with respect to a variable, measured over a small surrounding region.

Finite Differences:
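A minimal numpy sketch of the finite-difference estimate, assuming a scalar-valued function f; the central-difference form (f(x+h) - f(x-h)) / (2h) and the step size h are the usual choices:

```python
import numpy as np

def numerical_gradient(f, x, h=1e-5):
    """Estimate df/dx at x with central finite differences: (f(x+h) - f(x-h)) / (2h)."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        old = x.flat[i]
        x.flat[i] = old + h
        f_plus = f(x)            # f evaluated with x_i nudged up
        x.flat[i] = old - h
        f_minus = f(x)           # f evaluated with x_i nudged down
        x.flat[i] = old          # restore the original value
        grad.flat[i] = (f_plus - f_minus) / (2 * h)
    return grad
```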
Approach #3: Analytical Gradient
Recall: partial derivative by limit definition
Recall: chain rule

Intuition: upstream gradient values propagate backwards, so we can reuse them!
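A tiny numeric sketch of this reuse, with an illustrative compound function f(x, y, z) = (x + y) * z (not necessarily the example on the slide):

```python
# Illustrative: f(x, y, z) = (x + y) * z, evaluated at a single point.
x, y, z = -2.0, 5.0, -4.0

# Forward pass: record intermediate values.
q = x + y            # q = 3
f = q * z            # f = -12

# Backward pass: local gradients combined with upstream gradients.
df_dq = z            # local gradient of f w.r.t. q
df_dz = q            # local gradient of f w.r.t. z
df_dx = df_dq * 1.0  # chain rule: reuse the upstream value df/dq
df_dy = df_dq * 1.0  # same upstream value reused again

print(df_dx, df_dy, df_dz)   # -4.0 -4.0 3.0
```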


Gradient

“direction and rate of fastest increase”

Numerical Gradient vs Analytical Gradient
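In practice the two are combined in a gradient check: compute the analytical gradient, estimate the numerical one, and compare their relative error. A minimal sketch with an illustrative function:

```python
import numpy as np

def f(x):
    return np.sum(x ** 3)            # illustrative scalar function

def analytic_grad(x):
    return 3 * x ** 2                # its known analytical gradient

x = np.random.randn(5)
h = 1e-5
numeric = np.zeros_like(x)
for i in range(x.size):
    xp, xm = x.copy(), x.copy()
    xp[i] += h
    xm[i] -= h
    numeric[i] = (f(xp) - f(xm)) / (2 * h)   # central finite difference

analytic = analytic_grad(x)
rel_error = np.abs(analytic - numeric) / np.maximum(1e-8, np.abs(analytic) + np.abs(numeric))
print(rel_error.max())               # tiny (~1e-9) when the analytical gradient is correct
```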


What about Autograd?
● Deep learning frameworks can automatically perform backprop!
● Problems related to the underlying gradients might still surface when debugging your models.

“Yes You Should Understand Backprop”

https://medium.com/@karpathy/yes-you-should-understand-backprop-e2f06eab496b
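For concreteness, a minimal sketch of what a framework's autograd does for you, assuming PyTorch; the shapes and loss below are illustrative. You still want to be able to derive the gradients by hand when W.grad looks wrong:

```python
import torch

# Forward pass: operations on tensors with requires_grad build a computation graph.
W = torch.randn(3, 2, requires_grad=True)
x = torch.randn(2)
y = torch.tensor([1.0, 0.0, -1.0])
loss = ((W @ x - y) ** 2).sum()

# Backward pass: autograd applies the chain rule through the recorded graph.
loss.backward()
print(W.grad.shape)   # torch.Size([3, 2]), same shape as W
```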
Problem Statement: Backpropagation

Given a function f of inputs x, labels y, and parameters 𝜃,
compute the gradient of the loss with respect to 𝜃.
Problem Statement: Backpropagation
An algorithm for computing the gradient of a compound function as a series of
local, intermediate gradients:

1. Identify intermediate functions (forward prop)
2. Compute local gradients (chain rule)
3. Combine with the upstream error signal to get the full gradient

Forward:  input x, parameters W, b  →  local(x, W, b) => y  (output)
Backward: upstream gradient dy  →  dx, dW, db <= grad_local(dy, x, W, b)
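A minimal numpy sketch of this modular interface, using an affine layer y = xW + b as an illustrative example; the class and method names are assumptions, not the assignment's API:

```python
import numpy as np

class Affine:
    """local(x, W, b) => y  and  grad_local(dy, x, W, b) => dx, dW, db"""
    def __init__(self, W, b):
        self.W, self.b = W, b

    def forward(self, x):
        self.x = x                     # cache the input for the backward pass
        return x @ self.W + self.b     # y = xW + b

    def backward(self, dy):
        dx = dy @ self.W.T             # gradient passed back to the input
        dW = self.x.T @ dy             # gradient w.r.t. the weights
        db = dy.sum(axis=0)            # gradient w.r.t. the bias
        return dx, dW, db
```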
Modularity: Previous Example

Compound function

Intermediate Variables
(forward propagation)
Modularity: 2-Layer Neural Network

Compound function

Intermediate Variables
(forward propagation)

=> squared Euclidean distance between the network output and the target
Intermediate Variables (forward propagation): f(x; W, b) = Wx + b

(↑ lecture note) Input: one feature vector
(← here) Input: a batch of data (matrix)

1. intermediate functions
2. local gradients
3. full gradients

Intermediate Variables (forward propagation)  |  Intermediate Gradients (backward propagation)
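Putting the three steps together, a minimal numpy sketch of a 2-layer network with a squared-Euclidean-distance loss; the layer sizes, the ReLU nonlinearity, and the variable names are illustrative assumptions:

```python
import numpy as np

N, D, H, M = 4, 3, 5, 2                 # batch, input, hidden, output sizes
x = np.random.randn(N, D)
y = np.random.randn(N, M)               # targets
W1, b1 = np.random.randn(D, H), np.zeros(H)
W2, b2 = np.random.randn(H, M), np.zeros(M)

# 1. Intermediate functions (forward propagation)
a = x @ W1 + b1                         # first affine layer
h = np.maximum(0, a)                    # ReLU nonlinearity (assumed)
s = h @ W2 + b2                         # second affine layer (scores)
loss = 0.5 * np.sum((s - y) ** 2)       # squared Euclidean distance

# 2. Local gradients combined with upstream error signals (backward propagation)
ds = s - y                              # dloss/ds
dW2 = h.T @ ds                          # dloss/dW2, shape (H, M) like W2
db2 = ds.sum(axis=0)
dh = ds @ W2.T                          # upstream gradient for the hidden layer
da = dh * (a > 0)                       # ReLU gate: pass gradient only where a > 0

# 3. Full gradients w.r.t. the remaining parameters
dW1 = x.T @ da                          # shape (D, H) like W1
db1 = da.sum(axis=0)
```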
Agenda
● Motivation
● Backprop Tips & Tricks
● Matrix calculus primer
Derivative w.r.t. Vector

Scalar-by-Vector

Vector-by-Vector
Derivative w.r.t. Vector: Chain Rule
1. intermediate functions
2. local gradients
3. full gradients

?
Derivative w.r.t. Vector: Takeaway
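In the vector-by-vector case each local gradient is a Jacobian matrix and the chain rule becomes a product of Jacobians. A small numeric sketch with illustrative functions, materializing the Jacobians explicitly only to make the shapes visible (in practice they are rarely formed):

```python
import numpy as np

# Illustrative composition: x -> z = W x -> y = relu(z), with scalar loss L = sum(y).
D, M = 3, 4
W = np.random.randn(M, D)
x = np.random.randn(D)

z = W @ x                                 # intermediate function
y = np.maximum(0, z)                      # elementwise nonlinearity
L = y.sum()

# Local Jacobians (vector-by-vector derivatives):
dy_dz = np.diag((z > 0).astype(float))    # (M, M) diagonal Jacobian of relu
dz_dx = W                                 # (M, D) Jacobian of z = Wx

# Chain rule: dL/dx = dL/dy · dy/dz · dz/dx
dL_dy = np.ones(M)
dL_dx = dL_dy @ dy_dz @ dz_dx             # shape (D,), matches x
print(dL_dx.shape)
```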
Derivative w.r.t. Matrix

Scalar-by-Matrix

Vector-by-Matrix ?
Derivative w.r.t. Matrix: Dimension Balancing

When you take scalar-by-matrix gradients, the gradient has the shape of the denominator.

● Dimension balancing is the “cheap” but effective approach to gradient calculations in most practical settings.
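A minimal sketch of dimension balancing for y = Wx with a scalar loss L (shapes illustrative): since dL/dW must have the shape of W and dL/dx the shape of x, the outer product and the transposed multiply are the only combinations that fit:

```python
import numpy as np

D, M = 3, 4
W = np.random.randn(M, D)        # (M, D)
x = np.random.randn(D)           # (D,)
y = W @ x                        # (M,)
dL_dy = np.random.randn(M)       # upstream gradient, same shape as y

# dL/dW must have shape (M, D), the shape of the denominator W.
# The only way to build an (M, D) array from dL_dy (M,) and x (D,) is an outer product:
dL_dW = np.outer(dL_dy, x)       # (M, D)

# dL/dx must have shape (D,); the only consistent combination is W^T dL_dy:
dL_dx = W.T @ dL_dy              # (D,)
assert dL_dW.shape == W.shape and dL_dx.shape == x.shape
```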
Derivative w.r.t. Matrix: Takeaway
1. intermediate functions
2. local gradients
3. full gradients

Intermediate Variables (forward propagation)  |  Intermediate Gradients (backward propagation)
Backprop Menu for Success

1. Write down variable graph
2. Keep track of error signals
3. Compute derivative of loss function
4. Enforce shape rule on error signals, especially when deriving over a linear transformation
Vector-by-vector

?
Matrix multiplication [Backprop]

? ?
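A minimal numpy sketch of the backward pass through a matrix multiplication Y = XW (shapes illustrative); both gradients follow from dimension balancing:

```python
import numpy as np

N, D, M = 4, 3, 5
X = np.random.randn(N, D)
W = np.random.randn(D, M)

Y = X @ W                    # forward: (N, M)
dY = np.random.randn(N, M)   # upstream error signal, same shape as Y

dX = dY @ W.T                # (N, D), matches X
dW = X.T @ dY                # (D, M), matches W
assert dX.shape == X.shape and dW.shape == W.shape
```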
Elementwise function [Backprop]
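And a minimal sketch for an elementwise function, using sigmoid as an illustrative example; the backward pass multiplies the upstream gradient elementwise by the local derivative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

X = np.random.randn(4, 3)
Y = sigmoid(X)                 # forward: applied elementwise

dY = np.random.randn(4, 3)     # upstream error signal, same shape as Y
dX = dY * Y * (1.0 - Y)        # local derivative of sigmoid is y(1 - y), applied elementwise
assert dX.shape == X.shape
```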
