Unit 3: AIML


Historical Trends in Deep Learning

• Deep learning has had a long and rich history, but has gone by many names reflecting different
philosophical viewpoints, and has waxed and waned in popularity.

• Deep learning has become more useful as the amount of available training data has increased.

• Deep learning models have grown in size over time as computer hardware and software
infrastructure for deep learning has improved.

• Deep learning has solved increasingly complicated applications with increasing accuracy over time.

Deep learning was known as cybernetics in the 1940s–1960s and as connectionism in the 1980s–1990s; the current resurgence under the name deep learning began in 2006.

The first wave started with cybernetics in the 1940s–1960s, with the development of theories of
biological learning (McCulloch and Pitts, 1943; Hebb, 1949) and implementations of the first
models such as the perceptron (Rosenblatt, 1958) allowing the training of a single neuron.

The second wave started with the connectionist approach of the 1980–1995 period, with back-
propagation (Rumelhart et al., 1986a) to train a neural network with one or two hidden layers.

The current and third wave, deep learning, started around 2006 (Hinton et al., 2006; Bengio et
al., 2007; Ranzato et al., 2007a).

Back-Propagation and Other Differentiation Algorithms

The back-propagation algorithm (Rumelhart et al., 1986a), often simply called backprop, allows
information from the cost to flow backward through the network in order to compute the gradient.

Strictly speaking, back-propagation refers only to the method for computing the gradient; another
algorithm, such as stochastic gradient descent, uses this gradient to perform learning.

We will describe how to compute the gradient ∇x f(x, y) for an arbitrary function f, where x is a
set of variables whose derivatives are desired, and y is an additional set of variables that are
inputs to the function but whose derivatives are not required.

In learning algorithms, the gradient we most often require is the gradient of the cost function
with respect to the parameters, ∇θ J(θ).
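To make this concrete, here is a small sketch (my own illustration, not from the text) that computes ∇θ J(θ) for a mean-squared-error cost of a linear model; the arrays X, y and the parameter vector theta are made-up placeholders.

```python
import numpy as np

# Hypothetical data and parameters, used only for illustration.
X = np.random.randn(100, 3)   # minibatch of inputs
y = np.random.randn(100)      # targets
theta = np.zeros(3)           # model parameters

def J(theta):
    """Mean-squared-error cost of the linear model X @ theta."""
    return np.mean((X @ theta - y) ** 2)

def grad_J(theta):
    """Analytic gradient of J with respect to theta: (2/m) X^T (X theta - y)."""
    m = X.shape[0]
    return (2.0 / m) * X.T @ (X @ theta - y)

# A learning algorithm such as gradient descent then uses this gradient.
theta = theta - 0.1 * grad_J(theta)
```

This separation mirrors the point above: grad_J only computes the gradient, while the update line is the learning algorithm that uses it.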
Computational Graphs

To describe the back-propagation algorithm more precisely, it is helpful to have a more precise
computational graph language.

We use each node in the graph to indicate a variable. The variable may be a scalar, vector, matrix,
tensor, or even a variable of another type.

Examples of computational graphs:

(a) The graph using the × operation to compute z = xy.

(b) The graph for the logistic regression prediction ŷ = σ(xᵀw + b). Some of the intermediate
expressions do not have names in the algebraic expression but need names in the graph. We simply
name the i-th such variable u(i).

(c) The computational graph for the expression H = max{0, XW + b}, which computes a design
matrix of rectified linear unit activations H given a design matrix containing a minibatch of
inputs X.
(d) Examples (a)–(c) applied at most one operation to each variable, but it is possible to apply
more than one operation. Here we show a computational graph that applies more than one operation
to the weights w of a linear regression model. The weights are used to make both the prediction ŷ
and the weight decay penalty λ Σᵢ wᵢ².
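As a rough sketch of the idea (my own illustration; the Node class and multiply helper below are hypothetical, not from the text), the graph of example (a) can be represented by nodes that record their value, the operation that produced them, and their parent nodes:

```python
class Node:
    """A variable in a computational graph: its value, the operation that
    produced it, and the parent nodes feeding that operation."""
    def __init__(self, value=None, op=None, inputs=()):
        self.value = value
        self.op = op          # None for leaf variables such as x and y
        self.inputs = inputs  # parent Nodes

def multiply(a, b):
    """Apply the multiplication operation, creating a new node in the graph."""
    return Node(value=a.value * b.value, op="multiply", inputs=(a, b))

# Graph (a): z = x * y
x = Node(value=2.0)
y = Node(value=3.0)
z = multiply(x, y)

print(z.value)                              # 6.0
print(z.op, [p.value for p in z.inputs])    # multiply [2.0, 3.0]
```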

Chain Rule of Calculus

Back-propagation is an algorithm that computes the chain rule, with a specific order of
operations that is highly efficient.

Let x be a real number, and let f and g both be functions mapping from a real number to a real
number. Suppose that y = g(x) and z = f (g(x)) = f (y).

Then the chain rule states that dz/dx = (dz/dy) · (dy/dx).
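As a quick numeric check of this rule (my own example, with f = sin and g(x) = x² chosen arbitrarily), the chain-rule value cos(g(x)) · 2x agrees with a finite-difference estimate:

```python
import math

def g(x): return x ** 2
def f(y): return math.sin(y)

x = 1.3
y = g(x)

# Chain rule: dz/dx = (dz/dy) * (dy/dx) = cos(y) * 2x
dz_dx_chain = math.cos(y) * 2 * x

# Finite-difference estimate of dz/dx for comparison
eps = 1e-6
dz_dx_numeric = (f(g(x + eps)) - f(g(x - eps))) / (2 * eps)

print(dz_dx_chain, dz_dx_numeric)  # the two values agree closely
```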


The back-propagation algorithm is very simple. To compute the gradient of some scalar z with
respect to one of its ancestors x in the graph, we begin by observing that the gradient with
respect to z itself is given by dz/dz = 1.

We can then compute the gradient with respect to each parent of z in the graph by multiplying
the current gradient by the Jacobian of the operation that produced z.

We continue multiplying by Jacobians traveling backwards through the graph in this way until
we reach x.
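A small hand-worked sketch of this backward sweep (my own illustration; the network z = sum(relu(Wx)) is an arbitrary example, not from the text): the gradient starts at dz/dz = 1 and is multiplied by the Jacobian of each operation while travelling back to x.

```python
import numpy as np

W = np.array([[1.0, -2.0],
              [0.5,  3.0]])
x = np.array([1.0, 2.0])

# Forward pass through the graph
y = W @ x                # linear transformation
h = np.maximum(y, 0.0)   # rectified linear units
z = h.sum()              # scalar output

# Backward pass: begin with dz/dz = 1, then multiply by each Jacobian in turn.
grad_z = 1.0
grad_h = grad_z * np.ones_like(h)   # Jacobian of sum is a row of ones
grad_y = grad_h * (y > 0)           # Jacobian of relu is diagonal with 0/1 entries
grad_x = W.T @ grad_y               # Jacobian of y = Wx with respect to x is W

print(grad_x)  # gradient of z with respect to x
```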

More formally, each node in the graph G corresponds to a variable. To achieve maximum
generality, we describe this variable as being a tensor V.

We assume that each variable V is associated with the following subroutines:

• get_operation(V): This returns the operation that computes V, represented by the edges coming
into V in the computational graph. For example, there may be a Python or C++ class representing
the matrix multiplication operation. Suppose we have a variable that is created by matrix
multiplication, C = AB. Then get_operation(C) returns a pointer to an instance of the
corresponding class.

• get_consumers(V, G): This returns the list of variables that are children of V in the
computational graph G.

• get_inputs(V, G): This returns the list of variables that are parents of V in the computational
graph G (a small sketch of these three subroutines follows this list).
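A minimal sketch of how these three subroutines could look on a toy graph structure (the Variable class and the list-based graph G below are my own placeholders, not a particular library's API):

```python
class Variable:
    """A node in the computational graph."""
    def __init__(self, name, op=None, inputs=()):
        self.name = name
        self.op = op                # operation that computed this variable (None for leaves)
        self.inputs = list(inputs)  # parent Variables

def get_operation(V):
    """Return the operation that computes V (the edges coming into V)."""
    return V.op

def get_inputs(V, G):
    """Return the list of parents of V in graph G."""
    return V.inputs

def get_consumers(V, G):
    """Return the list of children of V in graph G, i.e. variables that use V as an input."""
    return [node for node in G if V in node.inputs]

# Example: C = AB created by a matrix multiplication operation.
A = Variable("A")
B = Variable("B")
C = Variable("C", op="matmul", inputs=(A, B))
G = [A, B, C]

print(get_operation(C))                        # 'matmul'
print([v.name for v in get_inputs(C, G)])      # ['A', 'B']
print([v.name for v in get_consumers(A, G)])   # ['C']
```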

For example, we might use a matrix multiplication operation to create a variable C = AB.

Suppose that the gradient of a scalar z with respect to C is given by G.

The matrix multiplication operation is responsible for defining two back-propagation rules, one
for each of its input arguments.

If we call the bprop method to request the gradient with respect to A, given that the gradient on
the output is G, then the bprop method of the matrix multiplication operation must state that the
gradient with respect to A is given by GBᵀ.

Likewise, if we call the bprop method to request the gradient with respect to B, then the matrix
multiplication operation is responsible for implementing the bprop method and specifying that the
desired gradient is given by AᵀG.
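These two rules can be written directly in NumPy; the short sketch below (my own illustration) implements them exactly as stated above, using G for the gradient on the output C = AB:

```python
import numpy as np

def matmul_bprop_wrt_A(A, B, G):
    """Gradient of a scalar z with respect to A, given G = dz/dC for C = A @ B: G @ B.T"""
    return G @ B.T

def matmul_bprop_wrt_B(A, B, G):
    """Gradient of a scalar z with respect to B, given G = dz/dC: A.T @ G"""
    return A.T @ G

A = np.random.randn(2, 3)
B = np.random.randn(3, 4)
# Take z = sum(C), so the gradient on the output C is a matrix of ones.
G = np.ones((2, 4))

print(matmul_bprop_wrt_A(A, B, G))  # gradient of sum(A @ B) with respect to A
print(matmul_bprop_wrt_B(A, B, G))  # gradient of sum(A @ B) with respect to B
```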

The back-propagation algorithm itself does not need to know any differentiation rules. It only
needs to call each operation's bprop rules with the right arguments. Formally, op.bprop(inputs,
X, G) must return Σᵢ (∇X op.f(inputs)ᵢ) Gᵢ, which is just an implementation of the chain rule,
where i runs over the entries of the operation's output and Gᵢ is the gradient on the i-th output
entry.

Computing a gradient in a graph with n nodes will never execute more than O(n²) operations or
store the output of more than O(n²) operations.

The back-propagation algorithm adds one Jacobian-vector product, which should be expressed
with O(1) nodes, per edge in the original graph. Because the computational graph is a directed
acyclic graph, it has at most O(n²) edges.

Most neural network cost functions are roughly chain-structured, causing back-propagation to
have O(n) cost.
