Deep Learning As A Building Block in Probabilistic Models

Pierre-Alexandre Mattei
http://pamattei.github.io/
@pamattei
But actually, what is deep learning?

A deep neural network is essentially a composition F = fL ◦ · · · ◦ f1 of simple parametrised layers (affine functions interleaved with nonlinear activations).

The derivatives of F with respect to the tunable parameters can be computed using the chain rule via the backpropagation algorithm.
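As a concrete illustration (a minimal sketch assuming PyTorch; nothing here comes from the slides), automatic differentiation applies exactly this chain rule:

```python
import torch

# Backpropagation at toy scale: autograd applies the reverse-mode chain rule
# through the composition tanh(W x + b) to get gradients w.r.t. the
# tunable parameters W and b.
W = torch.randn(3, 2, requires_grad=True)
b = torch.zeros(3, requires_grad=True)
x = torch.randn(2)

out = torch.tanh(W @ x + b).sum()   # a scalar function of the parameters
out.backward()                      # backpropagation = chain rule through the graph

print(W.grad.shape, b.grad.shape)   # torch.Size([3, 2]) torch.Size([3])
```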
A glimpse at the zoology of layers

The simplest kind of affine layer is called a fully connected layer:

fl (x) = Wl x + bl ,

where Wl and bl are tunable parameters.

(Figure: two common activation functions, the hyperbolic tangent and the rectified linear unit (ReLU).)
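For instance, a fully connected layer and these two activations might look as follows in PyTorch (the dimensions are arbitrary illustrative choices):

```python
import torch
import torch.nn as nn

# A fully connected layer f_l(x) = W_l x + b_l mapping R^4 to R^3,
# followed by the two activations shown above.
fc = nn.Linear(in_features=4, out_features=3)   # W_l and b_l are its tunable parameters

x = torch.randn(4)
h = fc(x)
print(torch.tanh(h))   # hyperbolic tangent
print(torch.relu(h))   # rectified linear unit (ReLU)
```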
Why is it convenient to compose affine functions?

• Classical universal approximation results show that a network with a single, sufficiently wide hidden layer can approximate any continuous function arbitrarily well on a compact set.

• There are similar results for very thin but arbitrarily deep networks (Lin & Jegelka, NeurIPS 2018).

• Some prior knowledge can be distilled into the architecture (i.e. the type of affine functions/activations) of the network. For example, convolutional neural networks (CNNs, LeCun et al., NeurIPS 1990) leverage the fact that local information plays an important role in images/sound/sequence data. In that case, the affine functions are convolution operators with some learnt filters.
• Often, this prior knowledge can be based on known symmetries, leading to deep architectures that are equivariant or invariant to the action of some group (see e.g. the work of Taco Cohen or Stéphane Mallat). This is useful when dealing with images, sound, molecules... (a tiny equivariance check in code follows after this list).
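To make the equivariance idea concrete, here is a small PyTorch check (a sketch assuming circular padding, for which translation equivariance holds exactly; this example is not from the slides):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A 1D convolution with circular padding: an affine map whose weights are a
# learnt filter, exactly equivariant to circular translations of its input.
conv = nn.Conv1d(1, 1, kernel_size=3, padding=1, padding_mode="circular", bias=False)

x = torch.randn(1, 1, 16)                        # a toy 1D signal
shift = lambda t: torch.roll(t, shifts=3, dims=-1)

# Convolving a shifted signal gives the shifted convolution of the signal.
print(torch.allclose(conv(shift(x)), shift(conv(x)), atol=1e-6))  # True
```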
A simple example: nonlinear regression with a multilayer perceptron (MLP)

We want to perform regression on a data set D = ((x1 , y1 ), ..., (xn , yn )).

We can model the regression function using a multilayer perceptron (MLP): two fully connected layers with a hyperbolic tangent in-between:

y ≈ Fθ (x) = W1 tanh(W0 x + b0 ) + b1 .

A natural way to find the parameters θ (a.k.a. the weights) of the MLP Fθ is to minimise the mean squared error:

MSE(θ) = (1/n) ∑_{i=1}^{n} (yi − Fθ (xi ))² .
∀i ≤ n, yi ≈ Fθ (xi ) = W1 tanh(W0 xi + b0 ) + b1 ,

(Figure: the data together with the regression function Fθ learnt by the MLP.)
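A minimal training sketch of this MLP regression in PyTorch (the toy sinusoidal data, hidden width 32, Adam optimiser and number of steps are illustrative assumptions, not taken from the slides):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical 1D toy data (x_i, y_i); any regression data set would do.
x = torch.linspace(-10, 10, 200).unsqueeze(1)
y = torch.sin(x / 3) + 0.1 * torch.randn_like(x)

# The MLP of the slide: two fully connected layers with a tanh in-between,
# F_theta(x) = W1 tanh(W0 x + b0) + b1.
F = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))

optimiser = torch.optim.Adam(F.parameters(), lr=1e-2)
for _ in range(2000):
    optimiser.zero_grad()
    mse = ((y - F(x)) ** 2).mean()   # MSE(theta) = (1/n) sum_i (y_i - F_theta(x_i))^2
    mse.backward()                   # gradients via backpropagation
    optimiser.step()

print(mse.item())                    # final training MSE
```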
What’s "kind of wrong" with the approach we just saw?
This was of just minimising the MSE allows us to do predictions, but we cannot assess
the uncertainty of our predictions.
8
What’s "kind of wrong" with the approach we just saw?
This was of just minimising the MSE allows us to do predictions, but we cannot assess
the uncertainty of our predictions.
8
What’s "kind of wrong" with the approach we just saw?
This was of just minimising the MSE allows us to do predictions, but we cannot assess
the uncertainty of our predictions.
So what do we want?
8
What’s "kind of wrong" with the approach we just saw?
This was of just minimising the MSE allows us to do predictions, but we cannot assess
the uncertainty of our predictions.
So what do we want?
8
What’s "kind of wrong" with the approach we just saw?
This was of just minimising the MSE allows us to do predictions, but we cannot assess
the uncertainty of our predictions.
So what do we want?
To do that, we need to have a probabilistic model of our data, hence the need for
generative models.
8
What’s a generative model?

Let’s start with some data D. For example, in the regression case with p-dimensional continuous features,

D = ((x1 , y1 ), ..., (xn , yn )) ∈ (Rp × R)n .

In the binary classification case,

D = ((x1 , y1 ), ..., (xn , yn )) ∈ (Rp × {0, 1})n .

In the unsupervised case, the data usually looks like D = (x1 , ..., xn ) ∈ (Rp )n .

We call (x1 , ..., xn ) the features and (y1 , ..., yn ) the labels. The features are usually stored in an n × p matrix called the design matrix.

A generative model is then simply a probabilistic model of the data: a distribution p(D) meant to describe how D was generated.
Generative models for supervised learning: General assumptions

Although we’ll mostly focus on the unsupervised case in my lectures, let us begin with the (arguably simpler) supervised case D = ((x1 , y1 ), ..., (xn , yn )). It could be either a regression or a classification task, for example.

Most of the time, it makes sense to build generative models that assume that the observations are independent. This leads to

p(D) = p((x1 , y1 ), ..., (xn , yn )) = ∏_{i=1}^{n} p(xi , yi ).

Usually, we also further assume that the data are identically distributed. This means that all the (xi , yi ) will follow the same distribution, which we may denote p(x, y).

When these two assumptions are met, we say that the data are independent and identically distributed (i.i.d.). This is super useful in practice because, rather than having to find a distribution p((x1 , y1 ), ..., (xn , yn )) over a very large space (whose dimension grows linearly with n), we just have to find a much lower-dimensional distribution p(x, y).
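Spelling out one consequence that will be used later (a small worked identity, not stated on the slide): under these assumptions, taking the log of the joint density turns the product into a sum,

```latex
\log p(D) = \log \prod_{i=1}^{n} p(x_i, y_i) = \sum_{i=1}^{n} \log p(x_i, y_i).
```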
Generative models for supervised learning: Do we really have to be fully generative?

Being fully generative would mean modelling the whole joint distribution p(x, y) of features and labels.

But if we mainly want to do (probabilistic) predictions, knowing p(y|x) is enough. It’s exactly this conditional distribution that will give us statements like "the probability that this patient x has this kind of cancer is 56%".

Based on these insights, there are two main approaches for building p(x, y):

• The fully generative (or model-based) approach posits a joint distribution p(x, y) (often by specifying both p(y) and p(x|y)).

• The discriminative (or conditional) approach just specifies p(y|x) and completely ignores p(x).
Generative models for supervised learning: Discriminative vs fully generative

One of the wines the bad guys counterfeited was from the Barolo region. According to Wikipedia, those wines have "pronounced tannins and acidity", and "moderate to high alcohol levels (Minimum 13%)". This would help a trained human recognise them, but could we train an algorithm to learn those characteristics?
Generative vs Discriminative: a concrete example

(Figure: scatter plot of acidity versus alcohol for the wines, coloured by class: Barolo vs. Other.)
Generative vs Discriminative: a concrete example

The generative way would use the formula p(x, y) = p(y)p(x|y) and model the class-conditional distributions p(x|y) using a continuous bivariate distribution (e.g. 2D Gaussians).

Here is what we obtain using the R package Mclust (Scrucca, Fop, Murphy, and Raftery, R Journal, 2016).

(Figure: the fitted Gaussian class-conditional densities plotted over acidity versus alcohol.)
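To make the generative recipe concrete, here is a rough Python sketch of the same idea (not the Mclust model from the slide: just empirical class priors and one Gaussian per class, fitted by moment estimates on made-up numbers):

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_generative(X, y):
    """Fit p(x, y) = p(y) p(x|y) with one 2D Gaussian per class."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (np.mean(y == c),            # class prior p(y = c)
                     Xc.mean(axis=0),            # class-conditional mean
                     np.cov(Xc, rowvar=False))   # class-conditional covariance
    return params

def predict_proba(params, x):
    """Bayes' rule: p(y = c | x) is proportional to p(y = c) p(x | y = c)."""
    joint = {c: prior * multivariate_normal.pdf(x, mean=mu, cov=cov)
             for c, (prior, mu, cov) in params.items()}
    total = sum(joint.values())
    return {c: v / total for c, v in joint.items()}

# Hypothetical usage with made-up (alcohol, acidity) measurements:
X = np.array([[13.5, 100.], [14.0, 110.], [13.8, 95.],
              [11.5, 70.],  [12.0, 80.],  [12.5, 72.]])
y = np.array([1, 1, 1, 0, 0, 0])   # 1 = Barolo, 0 = Other
print(predict_proba(fit_generative(X, y), np.array([13.8, 105.])))
```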
Generative vs Discriminative: a concrete example

The discriminative way would only model p(y|x). Since there are only 2 classes, this means that p(y|x) will be a Bernoulli random variable whose parameter π(x) ∈ [0, 1] is a function of the features.

(Figure: scatter plot of acidity versus alcohol, coloured by class: Barolo vs. Other.)

Key idea: since we have an unknown function to estimate (namely π), we can model it using a neural net.
Generative vs Discriminative: last words

We’ll focus now on the discriminative approach using neural nets, because it is simpler. For more on the differences and links between the generative and discriminative schools, a wonderful reference is Tom Minka’s short note on the subject: Discriminative models, not discriminative training.²

Our discriminative model for binary classification is therefore

p(y|x) = B(y|π(x)),

where B(·|θ) denotes the density of a Bernoulli distribution with parameter θ ∈ [0, 1]. The key idea is then to model the function x 7→ π(x) using a neural net.

² https://tminka.github.io/papers/minka-discriminative.pdf
How to model π

Our discriminative model for binary classification is

p(y|x) = B(y|π(x)),

and we wish to model π using a neural net. But what kind of neural net?

The only really important constraint of the problem is that we need to have π(x) ∈ [0, 1] for all x. Can a neural net guarantee that?

Yes! By using a function that only outputs stuff in [0, 1] as the output layer. For example the logistic sigmoid function σ : a 7→ 1/(1 + exp(−a)).

(Figure: the logistic sigmoid function σ(x).)
How to model π

We can simply define

π(x) = σ(fθ (x)),

where σ is the sigmoid function and fθ : Rp −→ R is any neural network (whose weights are stored in a vector θ) that takes the features as input and returns an unconstrained real number.

We have a lot of flexibility to choose fθ . In particular, if the features x1 , ..., xn are images, we could use a CNN. In the case of time series, we could use a recurrent neural net. In the case of sets, we could use a Deep Sets architecture (Zaheer et al., NeurIPS 2017).

In our simple example, we could just reuse a small MLP:

fθ (x) = W1 tanh(W0 x + b0 ) + b1 .

Since the function π and the model p(y|x) now depend on some parameters θ, we’ll denote them by πθ and pθ (y|x) from now on.
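A corresponding sketch of πθ in PyTorch (the input dimension p = 2, the hidden width and the made-up wine features are illustrative assumptions, not taken from the slides):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# f_theta: a small MLP mapping R^p to an unconstrained real number
# (p = 2 here, e.g. alcohol and acidity; the hidden width 16 is arbitrary).
p = 2
f_theta = nn.Sequential(nn.Linear(p, 16), nn.Tanh(), nn.Linear(16, 1))

def pi_theta(x):
    """pi_theta(x) = sigmoid(f_theta(x)), guaranteed to lie in (0, 1)."""
    return torch.sigmoid(f_theta(x))

x = torch.tensor([[13.8, 105.0]])   # one hypothetical wine
print(pi_theta(x))                  # the model's probability that this wine is a Barolo
```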
How to find θ

There are many ways to find good parameter values for a generative model. One could use Bayesian inference, score matching, the method of moments, adversarial training...

Let us focus on one of the most traditional ways: maximum likelihood. The idea is to find a θ̂ that maximises the log-likelihood function log pθ (D).

We’ll call ℓ(θ) this log-likelihood (in fact, we’ll call any function that is equal to log pθ (D) up to an additive constant the log-likelihood).
How to find θ: from ML to XENT

We have

ℓ(θ) = ∑_{i=1}^{n} log pθ (yi |xi ) = ∑_{i=1}^{n} log[ π(xi )^yi (1 − π(xi ))^(1−yi) ],

which leads to

ℓ(θ) = ∑_{i=1}^{n} [yi ln π(xi ) + (1 − yi ) ln(1 − π(xi ))] .

We want to maximise this function, which is equivalent to minimising its opposite, which is called the cross-entropy loss.

The cross-entropy loss is the most commonly used loss for neural networks, and is a way of doing maximum likelihood without necessarily saying it.
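A quick numerical check of this equivalence, sketched in PyTorch with made-up probabilities and labels (F.binary_cross_entropy averages over the n observations, which only rescales the objective by 1/n):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical toy values: predicted probabilities pi(x_i) and binary labels y_i.
pi = torch.rand(5) * 0.9 + 0.05          # outputs of a sigmoid, kept away from 0 and 1
y = torch.randint(0, 2, (5,)).float()    # labels in {0, 1}

# Negative Bernoulli log-likelihood, written exactly as on the slide (averaged over i).
nll = -(y * torch.log(pi) + (1 - y) * torch.log(1 - pi)).mean()

# PyTorch's binary cross-entropy loss gives the same number.
xent = F.binary_cross_entropy(pi, y)

print(torch.allclose(nll, xent))  # True
```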