Unit 3

The lecture discusses the limitations of linear classifiers, particularly in relation to non-linearly separable functions like XOR, and introduces multilayer perceptrons as a solution. It explains the architecture of neural networks, including the use of activation functions and the concept of feature learning. Finally, it outlines the backpropagation algorithm as a method for learning in neural networks, emphasizing its role in computing gradients for optimization.


CS 4995 Lecture 2:

Multilayer Perceptrons & Backpropagation

Richard Zemel

Richard Zemel CS 4995 Lecture 2: Multilayer Perceptrons & Backpropagation 1 / 45


Limits of Linear Classification

Single neurons (linear classifiers) are very limited in expressive power.


XOR is a classic example of a function that’s not linearly separable.

There’s an elegant proof using convexity.

Richard Zemel CS 4995 Lecture 2: Multilayer Perceptrons & Backpropagation 2 / 45


Limits of Linear Classification
Convex Sets

A set S is convex if any line segment connecting points in S lies entirely within S. Mathematically,

x1, x2 ∈ S  =⇒  λx1 + (1 − λ)x2 ∈ S  for 0 ≤ λ ≤ 1.

A simple inductive argument shows that for x1, . . . , xN ∈ S, weighted averages, or convex combinations, lie within the set:

λ1 x1 + · · · + λN xN ∈ S  for λi > 0,  λ1 + · · · + λN = 1.

Richard Zemel CS 4995 Lecture 2: Multilayer Perceptrons & Backpropagation 3 / 45


Limits of Linear Classification

Showing that XOR is not linearly separable


Half-spaces are obviously convex.
Suppose there were some feasible hypothesis. If the positive examples (0, 1) and (1, 0) are in the positive half-space, then the line segment connecting them must be as well.
Similarly, the line segment connecting the negative examples (0, 0) and (1, 1) must lie within the negative half-space.

But the two segments intersect at (0.5, 0.5), which can't lie in both half-spaces. Contradiction!

Richard Zemel CS 4995 Lecture 2: Multilayer Perceptrons & Backpropagation 4 / 45
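
As a quick sanity check on the argument (this snippet is an illustrative addition, not from the slides), one can search a coarse grid of linear classifiers and confirm that none of them labels all four XOR points correctly. A finite grid is of course not a proof; the convexity argument above is.

```python
# Grid-search sanity check: no linear classifier y = 1[w1*x1 + w2*x2 + b > 0]
# on this grid labels all four XOR points correctly.
import itertools

import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t = np.array([0, 1, 1, 0])  # XOR targets

grid = np.linspace(-2, 2, 41)
separable = False
for w1, w2, b in itertools.product(grid, grid, grid):
    y = (w1 * X[:, 0] + w2 * X[:, 1] + b > 0).astype(int)
    if np.array_equal(y, t):
        separable = True
        break

print("found a separating linear classifier:", separable)  # False
```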


Limits of Linear Classification
A more troubling example

Suppose we just use the pixels as the features. Can a binary threshold unit discriminate between different patterns that have the same number of on pixels?
Not if the patterns can translate with wrap-around!

[Figure: several translated copies of pattern A and pattern B. These images represent 16-dimensional vectors. White = 0, black = 1.]

We want to distinguish patterns A and B in all possible translations (with wrap-around).
Translation invariance is commonly desired in vision!

Suppose there's a feasible solution. The average of all translations of A is the vector (0.25, 0.25, . . . , 0.25). Therefore, this point must be classified as A.
Similarly, the average of all translations of B is also (0.25, 0.25, . . . , 0.25). Therefore, it must be classified as B. Contradiction!

Credit: Geoffrey Hinton


Richard Zemel CS 4995 Lecture 2: Multilayer Perceptrons & Backpropagation 5 / 45
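
The averaging step is easy to verify numerically. The actual patterns A and B from the figure aren't reproduced in this transcript, so the sketch below uses arbitrary 16-pixel patterns with four "on" pixels each; the average over all wrap-around translations comes out to the same constant vector regardless of the pattern.

```python
# The average of a binary pattern over all circular shifts is constant
# (here 4 on-pixels / 16 positions = 0.25 everywhere).
import numpy as np

def average_over_translations(pattern):
    """Average a 1-D binary pattern over all circular shifts (wrap-around)."""
    shifts = [np.roll(pattern, k) for k in range(len(pattern))]
    return np.mean(shifts, axis=0)

A = np.zeros(16); A[[0, 1, 2, 3]] = 1      # hypothetical pattern A
B = np.zeros(16); B[[0, 2, 5, 11]] = 1     # hypothetical pattern B

print(average_over_translations(A))  # [0.25 0.25 ... 0.25]
print(average_over_translations(B))  # [0.25 0.25 ... 0.25]
```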
Limits of Linear Classification
Sometimes we can overcome this limitation using feature maps, just like for linear regression. E.g., for XOR:

ψ(x) = [x1, x2, x1 x2]⊤

x1  x2  ψ1(x)  ψ2(x)  ψ3(x)  t
0   0   0      0      0      0
0   1   0      1      0      1
1   0   1      0      0      1
1   1   1      1      1      0

This is linearly separable. (Try it!)

Not a general solution: it can be hard to pick good basis functions.
Instead, we'll use neural nets to learn nonlinear hypotheses directly.
Richard Zemel CS 4995 Lecture 2: Multilayer Perceptrons & Backpropagation 6 / 45
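
To "try it": here is one concrete choice of weights in the feature space that computes XOR. The specific values (1, 1, −2) and bias −0.5 are my own illustration, not from the slide.

```python
# A linear classifier in the feature space psi(x) = (x1, x2, x1*x2) that
# computes XOR. The weights below are just one choice that works.
import numpy as np

def psi(x1, x2):
    return np.array([x1, x2, x1 * x2])

w = np.array([1.0, 1.0, -2.0])
b = -0.5

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    y = int(w @ psi(x1, x2) + b > 0)
    print(x1, x2, "->", y)   # 0, 1, 1, 0: XOR
```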
Multilayer Perceptrons

We can connect lots of units together into a directed acyclic graph.
This gives a feed-forward neural network. That's in contrast to recurrent neural networks, which can have cycles. (We'll talk about those later.)
Typically, units are grouped together into layers.

Richard Zemel CS 4995 Lecture 2: Multilayer Perceptrons & Backpropagation 7 / 45


Multilayer Perceptrons

Each layer connects N input units to M output units.


In the simplest case, all input units are connected to all output units. We call this
a fully connected layer. We’ll consider other layer types later.
Note: the inputs and outputs for a layer are distinct from the inputs and outputs
to the network.

Recall from softmax regression: this means we need an M × N weight matrix.
The output units are a function of the input units:

y = f(x) = φ(Wx + b)

A multilayer network consisting of fully connected layers is called a multilayer perceptron. Despite the name, it has nothing to do with perceptrons!

Richard Zemel CS 4995 Lecture 2: Multilayer Perceptrons & Backpropagation 8 / 45
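
A minimal sketch of such a layer, assuming a logistic activation purely as an example (the class name and initialization scale are my own):

```python
# Minimal fully connected layer: y = phi(Wx + b), with W of shape (M, N).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class FullyConnectedLayer:
    def __init__(self, n_in, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W = 0.1 * rng.standard_normal((n_out, n_in))  # M x N weight matrix
        self.b = np.zeros(n_out)

    def __call__(self, x):
        return sigmoid(self.W @ x + self.b)

layer = FullyConnectedLayer(n_in=4, n_out=3)
print(layer(np.ones(4)).shape)  # (3,)
```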


Multilayer Perceptrons

Some activation functions:

Linear:  y = z
Rectified Linear Unit (ReLU):  y = max(0, z)
Soft ReLU:  y = log(1 + e^z)

Richard Zemel CS 4995 Lecture 2: Multilayer Perceptrons & Backpropagation 9 / 45


Multilayer Perceptrons

Some activation functions:

Hard Threshold:  y = 1 if z > 0,  y = 0 if z ≤ 0
Logistic:  y = 1 / (1 + e^{−z})
Hyperbolic Tangent (tanh):  y = (e^z − e^{−z}) / (e^z + e^{−z})

Richard Zemel CS 4995 Lecture 2: Multilayer Perceptrons & Backpropagation 10 / 45
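
The six activation functions from the two slides above, written out as NumPy functions (a straightforward transcription; the function names are my own):

```python
# Activation functions from the slides, as NumPy functions.
import numpy as np

def linear(z):          return z
def relu(z):            return np.maximum(0, z)
def soft_relu(z):       return np.log1p(np.exp(z))        # log(1 + e^z)
def hard_threshold(z):  return (z > 0).astype(float)      # 1 if z > 0 else 0
def logistic(z):        return 1.0 / (1.0 + np.exp(-z))
def tanh(z):            return np.tanh(z)                 # (e^z - e^-z)/(e^z + e^-z)

z = np.linspace(-3, 3, 7)
print(relu(z))
```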


Multilayer Perceptrons

Designing a network to compute XOR:

Assume hard threshold activation function

Richard Zemel CS 4995 Lecture 2: Multilayer Perceptrons & Backpropagation 11 / 45


Multilayer Perceptrons

Richard Zemel CS 4995 Lecture 2: Multilayer Perceptrons & Backpropagation 12 / 45
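
The network shown on this slide appears only as a figure and isn't reproduced in this transcript. Below is one standard two-layer hard-threshold construction for XOR; this is an assumption on my part and not necessarily the exact weights in the figure.

```python
# One standard hard-threshold network for XOR:
#   h1 = step(x1 + x2 - 0.5)   # "at least one input on"  (OR)
#   h2 = step(x1 + x2 - 1.5)   # "both inputs on"         (AND)
#   y  = step(h1 - h2 - 0.5)   # OR and not AND = XOR
def step(z):
    return 1 if z > 0 else 0

def xor_net(x1, x2):
    h1 = step(x1 + x2 - 0.5)
    h2 = step(x1 + x2 - 1.5)
    return step(h1 - h2 - 0.5)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, "->", xor_net(x1, x2))   # 0, 1, 1, 0
```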


Multilayer Perceptrons

Each layer computes a function, so the network computes a composition of functions:

h(1) = f (1)(x)
h(2) = f (2)(h(1))
...
y = f (L)(h(L−1))

Or more simply:

y = f (L) ◦ · · · ◦ f (1)(x).

Neural nets provide modularity: we can implement each layer's computations as a black box.
Richard Zemel CS 4995 Lecture 2: Multilayer Perceptrons & Backpropagation 13 / 45
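
As a tiny illustration of this modularity (the layer functions below are arbitrary stand-ins for φ(Wx + b), and the `compose` helper is my own):

```python
# The network as a composition of per-layer functions, y = f3(f2(f1(x))).
from functools import reduce

import numpy as np

f1 = lambda x: np.tanh(x)
f2 = lambda h: np.maximum(0, h - 0.1)
f3 = lambda h: h.sum()

def compose(*fs):
    """Compose functions left to right: compose(f1, f2, f3)(x) = f3(f2(f1(x)))."""
    return reduce(lambda f, g: lambda x: g(f(x)), fs)

network = compose(f1, f2, f3)
print(network(np.array([0.5, -1.0, 2.0])))
```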
Feature Learning
Neural nets can be viewed as a way of learning features:

The goal:

Richard Zemel CS 4995 Lecture 2: Multilayer Perceptrons & Backpropagation 14 / 45


Feature Learning

Input representation of a digit: a 784-dimensional vector.

Richard Zemel CS 4995 Lecture 2: Multilayer Perceptrons & Backpropagation 15 / 45


Feature Learning

Each first-layer hidden unit computes σ(wi⊤ x).

Here is one of the weight vectors wi (also called a feature).
It's reshaped into an image, with gray = 0, white = +, black = −.
To compute wi⊤ x, multiply the corresponding pixels of wi and x, and sum the results.

Richard Zemel CS 4995 Lecture 2: Multilayer Perceptrons & Backpropagation 16 / 45


Feature Learning

There are 256 first-level features total. Here are some of them.

Richard Zemel CS 4995 Lecture 2: Multilayer Perceptrons & Backpropagation 17 / 45


Expressive Power

We’ve seen that there are some functions that linear classifiers can’t
represent. Are deep networks any better?
Any sequence of linear layers can be equivalently represented with a single linear layer:

y = W(3) W(2) W(1) x = W′ x,   where W′ ≜ W(3) W(2) W(1).

Deep linear networks are no more expressive than linear regression!


Linear layers do have their uses — stay tuned!

Richard Zemel CS 4995 Lecture 2: Multilayer Perceptrons & Backpropagation 18 / 45
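
A quick numerical confirmation of the collapse (the random matrices and shapes below are illustrative):

```python
# A stack of linear layers is itself linear: W3 @ W2 @ W1 collapses into a
# single matrix W_prime, so the deep linear net and the one-layer linear
# model compute exactly the same function.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((5, 4))
W2 = rng.standard_normal((3, 5))
W3 = rng.standard_normal((2, 3))
x = rng.standard_normal(4)

deep = W3 @ (W2 @ (W1 @ x))
W_prime = W3 @ W2 @ W1
print(np.allclose(deep, W_prime @ x))  # True
```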


Expressive Power

Multilayer feed-forward neural nets with nonlinear activation functions are universal approximators: they can approximate any function arbitrarily well.
This has been shown for various activation functions (thresholds, logistic, ReLU, etc.).
Even though ReLU is “almost” linear, it’s nonlinear enough!

Richard Zemel CS 4995 Lecture 2: Multilayer Perceptrons & Backpropagation 19 / 45


Expressive Power
Universality for binary inputs and targets:
Hard threshold hidden units, linear output
Strategy: 2^D hidden units, each of which responds to one particular input configuration.

Only requires one hidden layer, though it needs to be extremely wide!


Richard Zemel CS 4995 Lecture 2: Multilayer Perceptrons & Backpropagation 20 / 45
Expressive Power

What about the logistic activation function?


You can approximate a hard threshold by scaling up the weights and biases:

[Figure: plots of y = σ(x) and y = σ(5x); the latter is much closer to a step function.]
This is good: logistic units are differentiable, so we can tune them
with gradient descent. (Stay tuned!)

Richard Zemel CS 4995 Lecture 2: Multilayer Perceptrons & Backpropagation 21 / 45
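
Numerically, σ(kz) approaches the hard threshold as the scale k grows (the test points and scales below are arbitrary):

```python
# As we scale up weights and biases, the logistic sigma(k*z) approaches the
# hard threshold.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-1.0, -0.1, 0.1, 1.0])
for k in [1, 5, 50]:
    print(k, np.round(sigmoid(k * z), 3))
# At k = 50 the values are already very close to the step function [0, 0, 1, 1].
```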


Expressive Power

Limits of universality
You may need to represent an exponentially large network.
If you can learn any function, you’ll just overfit.
Really, we desire a compact representation!
We’ve derived units which compute the functions AND, OR, and
NOT. Therefore, any Boolean circuit can be translated into a
feed-forward neural net.
This suggests you might be able to learn compact representations of
some complicated functions

Richard Zemel CS 4995 Lecture 2: Multilayer Perceptrons & Backpropagation 22 / 45


Overview

We’ve seen that multilayer neural networks are powerful. But how can
we actually learn them?
Backpropagation is the central algorithm in this course.
It’s an algorithm for computing gradients.
Really it’s an instance of reverse mode automatic differentiation, which
is much more broadly applicable than just neural nets.
This is “just” a clever and efficient use of the Chain Rule for derivatives.
We’ll see how to implement an automatic differentiation system next
week.

Richard Zemel CS 4995 Lecture 2: Multilayer Perceptrons & Backpropagation 23 / 45


Recap: Gradient Descent
Recall: gradient descent moves opposite the gradient (the direction of
steepest descent)

Weight space for a multilayer neural net: one coordinate for each weight or
bias of the network, in all the layers
Conceptually, not any different from what we’ve seen so far — just higher
dimensional and harder to visualize!
We want to compute the cost gradient dJ/dw, which is the vector of
partial derivatives.
This is the average of dL/dw over all the training examples, so in this
lecture we focus on computing dL/dw.
Richard Zemel CS 4995 Lecture 2: Multilayer Perceptrons & Backpropagation 24 / 45
Univariate Chain Rule

We’ve already been using the univariate Chain Rule.


Recall: if f (x) and x(t) are univariate functions, then

d/dt f (x(t)) = (df/dx) (dx/dt).

Richard Zemel CS 4995 Lecture 2: Multilayer Perceptrons & Backpropagation 25 / 45


Univariate Chain Rule

Recall: Univariate logistic least squares model

z = wx + b
y = σ(z)
L = (1/2) (y − t)²

Let’s compute the loss derivatives.

Richard Zemel CS 4995 Lecture 2: Multilayer Perceptrons & Backpropagation 26 / 45


Univariate Chain Rule

How you would have done it in calculus class


L = (1/2) (σ(wx + b) − t)²

∂L/∂w = ∂/∂w [ (1/2) (σ(wx + b) − t)² ]
      = (1/2) ∂/∂w (σ(wx + b) − t)²
      = (σ(wx + b) − t) ∂/∂w (σ(wx + b) − t)
      = (σ(wx + b) − t) σ′(wx + b) ∂/∂w (wx + b)
      = (σ(wx + b) − t) σ′(wx + b) x

∂L/∂b = ∂/∂b [ (1/2) (σ(wx + b) − t)² ]
      = (1/2) ∂/∂b (σ(wx + b) − t)²
      = (σ(wx + b) − t) ∂/∂b (σ(wx + b) − t)
      = (σ(wx + b) − t) σ′(wx + b) ∂/∂b (wx + b)
      = (σ(wx + b) − t) σ′(wx + b)

What are the disadvantages of this approach?

Richard Zemel CS 4995 Lecture 2: Multilayer Perceptrons & Backpropagation 27 / 45


Univariate Chain Rule

A more structured way to do it

Computing the loss:

z = wx + b
y = σ(z)
L = (1/2) (y − t)²

Computing the derivatives:

dL/dy = y − t
dL/dz = (dL/dy) σ′(z)
∂L/∂w = (dL/dz) x
∂L/∂b = dL/dz

Remember, the goal isn't to obtain closed-form solutions, but to be able to write a program that efficiently computes the derivatives.

Richard Zemel CS 4995 Lecture 2: Multilayer Perceptrons & Backpropagation 28 / 45


Univariate Chain Rule

We can diagram out the computations using a computation graph.


The nodes represent all the inputs and computed quantities, and the
edges represent which nodes are computed directly as a function of
which other nodes.

Richard Zemel CS 4995 Lecture 2: Multilayer Perceptrons & Backpropagation 29 / 45


Univariate Chain Rule

A slightly more convenient notation:

Use ȳ to denote the derivative dL/dy, sometimes called the error signal.
This emphasizes that the error signals are just values our program is computing (rather than a mathematical operation).
This is not a standard notation, but I couldn't find another one that I liked.

Computing the loss:

z = wx + b
y = σ(z)
L = (1/2) (y − t)²

Computing the derivatives:

ȳ = y − t
z̄ = ȳ σ′(z)
w̄ = z̄ x
b̄ = z̄

Richard Zemel CS 4995 Lecture 2: Multilayer Perceptrons & Backpropagation 30 / 45
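
These equations translate directly into a program. The sketch below is a straightforward transcription; variable names ending in `_bar` stand for the barred error signals, and the test inputs are arbitrary.

```python
# Forward and backward pass for univariate logistic least squares,
# following the error-signal equations on the slide.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_backward(w, b, x, t):
    # Forward pass
    z = w * x + b
    y = sigmoid(z)
    L = 0.5 * (y - t) ** 2
    # Backward pass
    y_bar = y - t
    z_bar = y_bar * y * (1 - y)   # sigma'(z) = sigma(z)(1 - sigma(z)) = y(1 - y)
    w_bar = z_bar * x
    b_bar = z_bar
    return L, w_bar, b_bar

L, w_bar, b_bar = forward_backward(w=0.3, b=-0.2, x=1.5, t=1.0)
print(L, w_bar, b_bar)
```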


Multivariate Chain Rule
Problem: what if the computation graph has fan-out > 1?
This requires the multivariate Chain Rule!

L2-regularized regression:

z = wx + b
y = σ(z)
L = (1/2) (y − t)²
R = (1/2) w²
Lreg = L + λR

Multiclass logistic regression:

zℓ = Σj wℓj xj + bℓ
yk = e^{zk} / Σℓ e^{zℓ}
L = − Σk tk log yk
Richard Zemel CS 4995 Lecture 2: Multilayer Perceptrons & Backpropagation 31 / 45
Multivariate Chain Rule
Suppose we have a function f (x, y) and functions x(t) and y(t). (All the variables here are scalar-valued.) Then

d/dt f (x(t), y(t)) = (∂f/∂x) (dx/dt) + (∂f/∂y) (dy/dt)

Example:

f (x, y) = y + e^{xy}
x(t) = cos t
y(t) = t²

Plug in to Chain Rule:

df/dt = (∂f/∂x) (dx/dt) + (∂f/∂y) (dy/dt)
      = (y e^{xy}) · (− sin t) + (1 + x e^{xy}) · 2t

Richard Zemel CS 4995 Lecture 2: Multilayer Perceptrons & Backpropagation 32 / 45
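
The analytic derivative can be sanity-checked against a finite difference (the test point t = 1.0 below is arbitrary):

```python
# Check df/dt = (y*e^{xy})(-sin t) + (1 + x*e^{xy})(2t) against a centered
# finite difference at an arbitrary point.
import numpy as np

def f_of_t(t):
    x, y = np.cos(t), t ** 2
    return y + np.exp(x * y)

def dfdt_analytic(t):
    x, y = np.cos(t), t ** 2
    return y * np.exp(x * y) * (-np.sin(t)) + (1 + x * np.exp(x * y)) * 2 * t

t0, eps = 1.0, 1e-5
numeric = (f_of_t(t0 + eps) - f_of_t(t0 - eps)) / (2 * eps)
print(dfdt_analytic(t0), numeric)   # the two should agree to several decimals
```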


Multivariable Chain Rule

In the context of backpropagation:

In our notation:
t̄ = x̄ (dx/dt) + ȳ (dy/dt)

Richard Zemel CS 4995 Lecture 2: Multilayer Perceptrons & Backpropagation 33 / 45


Backpropagation

Full backpropagation algorithm:


Let v1 , . . . , vN be a topological ordering of the computation graph
(i.e. parents come before children.)
vN denotes the variable we’re trying to compute derivatives of (e.g. loss).

Richard Zemel CS 4995 Lecture 2: Multilayer Perceptrons & Backpropagation 34 / 45
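
The algorithm itself appears as a figure on this slide and isn't reproduced in this transcript. The sketch below paraphrases its standard message-passing form; the small `Node` class is my own scaffolding, not part of the lecture.

```python
# Sketch of backpropagation over a topologically ordered computation graph
# v1, ..., vN, where vN is the loss.

class Node:
    def __init__(self, name, parents=(), fn=None, local_grads=None):
        self.name = name
        self.parents = list(parents)
        self.fn = fn                    # computes value from parent values (None for leaves)
        self.local_grads = local_grads  # returns dv/dparent for each parent
        self.value = None
        self.bar = 0.0

def backprop(nodes, inputs):
    # Forward pass: nodes are in topological order; leaves read from `inputs`.
    for v in nodes:
        if v.fn is None:
            v.value = inputs[v.name]
        else:
            v.value = v.fn(*[p.value for p in v.parents])
    # Backward pass: each v_bar accumulates sum over children of child_bar * d(child)/dv.
    for v in nodes:
        v.bar = 0.0
    nodes[-1].bar = 1.0                 # d(loss)/d(loss) = 1
    for v in reversed(nodes):
        if v.fn is None:
            continue
        grads = v.local_grads(*[p.value for p in v.parents])
        for p, g in zip(v.parents, grads):
            p.bar += v.bar * g
    return {v.name: v.bar for v in nodes}

# Example: L = (w*x - t)^2 with x = 2, w = 3, t = 1  ->  dL/dw = 2*(wx - t)*x = 20
x = Node("x"); w = Node("w"); t = Node("t")
z = Node("z", [w, x], fn=lambda w_, x_: w_ * x_,
         local_grads=lambda w_, x_: (x_, w_))
L = Node("L", [z, t], fn=lambda z_, t_: (z_ - t_) ** 2,
         local_grads=lambda z_, t_: (2 * (z_ - t_), -2 * (z_ - t_)))
print(backprop([x, w, t, z, L], {"x": 2.0, "w": 3.0, "t": 1.0})["w"])  # 20.0
```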


Backpropagation
Example: univariate logistic least squares regression

Forward pass:

z = wx + b
y = σ(z)
L = (1/2) (y − t)²
R = (1/2) w²
Lreg = L + λR

Backward pass:

L̄reg = 1
R̄ = L̄reg (dLreg/dR) = L̄reg λ
L̄ = L̄reg (dLreg/dL) = L̄reg
ȳ = L̄ (dL/dy) = L̄ (y − t)
z̄ = ȳ (dy/dz) = ȳ σ′(z)
w̄ = z̄ (∂z/∂w) + R̄ (dR/dw) = z̄ x + R̄ w
b̄ = z̄ (∂z/∂b) = z̄

Richard Zemel CS 4995 Lecture 2: Multilayer Perceptrons & Backpropagation 35 / 45
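
A direct transcription of this forward/backward pass into code, with a finite-difference check on w̄ (the value of λ and the inputs below are arbitrary):

```python
# Forward and backward pass for L2-regularized univariate logistic least
# squares, following the slide's equations.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(w, b, x, t, lam):
    z = w * x + b
    y = sigmoid(z)
    L = 0.5 * (y - t) ** 2
    R = 0.5 * w ** 2
    return L + lam * R

def backward(w, b, x, t, lam):
    z = w * x + b
    y = sigmoid(z)
    L_reg_bar = 1.0
    R_bar = L_reg_bar * lam
    L_bar = L_reg_bar
    y_bar = L_bar * (y - t)
    z_bar = y_bar * y * (1 - y)          # sigma'(z) = y(1 - y)
    w_bar = z_bar * x + R_bar * w
    b_bar = z_bar
    return w_bar, b_bar

w, b, x, t, lam = 0.3, -0.2, 1.5, 1.0, 0.1
eps = 1e-6
fd = (forward(w + eps, b, x, t, lam) - forward(w - eps, b, x, t, lam)) / (2 * eps)
print(backward(w, b, x, t, lam)[0], fd)   # the two should agree closely
```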


Backpropagation
Multilayer Perceptron (multiple outputs):

Forward pass:

zi = Σj wij(1) xj + bi(1)
hi = σ(zi)
yk = Σi wki(2) hi + bk(2)
L = (1/2) Σk (yk − tk)²

Backward pass:

L̄ = 1
ȳk = L̄ (yk − tk)
w̄ki(2) = ȳk hi
b̄k(2) = ȳk
h̄i = Σk ȳk wki(2)
z̄i = h̄i σ′(zi)
w̄ij(1) = z̄i xj
b̄i(1) = z̄i

Richard Zemel CS 4995 Lecture 2: Multilayer Perceptrons & Backpropagation 36 / 45


Vector Form

Computation graphs showing individual units are cumbersome.


As you might have guessed, we typically draw graphs over the
vectorized variables.

We pass messages back analogous to the ones for scalar-valued nodes.

Richard Zemel CS 4995 Lecture 2: Multilayer Perceptrons & Backpropagation 37 / 45


Vector Form
Consider this computation graph:

Backprop rules:

z̄j = Σk ȳk (∂yk/∂zj),   or in vector form:   z̄ = (∂y/∂z)⊤ ȳ,

where ∂y/∂z is the Jacobian matrix:

∂y/∂z =
  [ ∂y1/∂z1  · · ·  ∂y1/∂zn ]
  [    ⋮       ⋱       ⋮    ]
  [ ∂ym/∂z1  · · ·  ∂ym/∂zn ]

Richard Zemel CS 4995 Lecture 2: Multilayer Perceptrons & Backpropagation 38 / 45


Vector Form

Examples:

Matrix-vector product:

z = Wx,   ∂z/∂x = W,   x̄ = W⊤ z̄

Elementwise operations:

y = exp(z),   ∂y/∂z = diag(exp(z))  (a diagonal matrix with entries exp(z1), . . . , exp(zD)),   z̄ = exp(z) ◦ ȳ

Note: we never explicitly construct the Jacobian. It's usually simpler and more efficient to compute the vector-Jacobian product (VJP) directly.

Richard Zemel CS 4995 Lecture 2: Multilayer Perceptrons & Backpropagation 39 / 45
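
The two VJP rules above in code. For the elementwise case, the explicit (diagonal) Jacobian is built only to verify the shortcut; in practice it is never formed.

```python
# Vector-Jacobian products for the two examples above.
import numpy as np

rng = np.random.default_rng(0)

# Matrix-vector product: z = Wx  =>  x_bar = W^T z_bar
W = rng.standard_normal((3, 4))
x = rng.standard_normal(4)
z_bar = rng.standard_normal(3)
x_bar = W.T @ z_bar
print(x_bar.shape)  # (4,) -- same shape as x

# Elementwise exp: y = exp(z)  =>  z_bar = exp(z) * y_bar
z = rng.standard_normal(5)
y_bar = rng.standard_normal(5)
z_bar_fast = np.exp(z) * y_bar                # no Jacobian constructed
J = np.diag(np.exp(z))                        # explicit Jacobian, only for checking
print(np.allclose(z_bar_fast, J.T @ y_bar))   # True
```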


Vector Form

Full backpropagation algorithm (vector form):


Let v1 , . . . , vN be a topological ordering of the computation graph
(i.e. parents come before children.)
vN denotes the variable we’re trying to compute derivatives of (e.g. loss).
It’s a scalar, which we can treat as a 1-D vector.

Richard Zemel CS 4995 Lecture 2: Multilayer Perceptrons & Backpropagation 40 / 45


Vector Form

MLP example in vectorized form:

Forward pass:

z = W(1) x + b(1)
h = σ(z)
y = W(2) h + b(2)
L = (1/2) ‖t − y‖²

Backward pass:

L̄ = 1
ȳ = L̄ (y − t)
W̄(2) = ȳ h⊤
b̄(2) = ȳ
h̄ = W(2)⊤ ȳ
z̄ = h̄ ◦ σ′(z)
W̄(1) = z̄ x⊤
b̄(1) = z̄
Richard Zemel CS 4995 Lecture 2: Multilayer Perceptrons & Backpropagation 41 / 45
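
The vectorized equations map almost line-for-line onto NumPy. This is a sketch; the shapes, function name, and the finite-difference check are my own additions.

```python
# Vectorized forward/backward pass for the two-layer MLP above.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_loss_and_grads(W1, b1, W2, b2, x, t):
    # Forward pass
    z = W1 @ x + b1
    h = sigmoid(z)
    y = W2 @ h + b2
    L = 0.5 * np.sum((t - y) ** 2)
    # Backward pass
    y_bar = y - t
    W2_bar = np.outer(y_bar, h)
    b2_bar = y_bar
    h_bar = W2.T @ y_bar
    z_bar = h_bar * h * (1 - h)      # sigma'(z) = h(1 - h)
    W1_bar = np.outer(z_bar, x)
    b1_bar = z_bar
    return L, (W1_bar, b1_bar, W2_bar, b2_bar)

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((4, 3)), np.zeros(4)
W2, b2 = rng.standard_normal((2, 4)), np.zeros(2)
x, t = rng.standard_normal(3), rng.standard_normal(2)

L, grads = mlp_loss_and_grads(W1, b1, W2, b2, x, t)

# Finite-difference check on one weight, W1[0, 0]:
eps = 1e-6
Wp, Wm = W1.copy(), W1.copy()
Wp[0, 0] += eps; Wm[0, 0] -= eps
fd = (mlp_loss_and_grads(Wp, b1, W2, b2, x, t)[0]
      - mlp_loss_and_grads(Wm, b1, W2, b2, x, t)[0]) / (2 * eps)
print(grads[0][0, 0], fd)   # should agree closely
```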
Computational Cost
Computational cost of forward pass: one add-multiply operation per weight:

zi = Σj wij(1) xj + bi(1)

Computational cost of backward pass: two add-multiply operations per weight:

w̄ki(2) = ȳk hi
h̄i = Σk ȳk wki(2)

Rule of thumb: the backward pass is about as expensive as two forward passes.
For a multilayer perceptron, this means the cost is linear in the number of layers, quadratic in the number of units per layer.
Richard Zemel CS 4995 Lecture 2: Multilayer Perceptrons & Backpropagation 42 / 45
Closing Thoughts

Backprop is used to train the overwhelming majority of neural nets today.


Even optimization algorithms much fancier than gradient descent
(e.g. second-order methods) use backprop to compute the gradients.
Despite its practical success, backprop is believed to be neurally implausible.
No evidence for biological signals analogous to error derivatives.
All the biologically plausible alternatives we know about learn much
more slowly (on computers).
So how on earth does the brain learn?

Richard Zemel CS 4995 Lecture 2: Multilayer Perceptrons & Backpropagation 43 / 45


Closing Thoughts

The psychological profiling [of a programmer] is mostly the ability to shift levels of abstraction, from low level to high level. To see something in the small and to see something in the large.

– Don Knuth
By now, we’ve seen three different ways of looking at gradients:
Geometric: visualization of gradient in weight space
Algebraic: mechanics of computing the derivatives
Implementational: efficient implementation on the computer
When thinking about neural nets, it’s important to be able to shift
between these different perspectives!

Richard Zemel CS 4995 Lecture 2: Multilayer Perceptrons & Backpropagation 44 / 45
