Deep Learning As A Building Block in Probabilistic Models
Part II
Pierre-Alexandre Mattei
http://pamattei.github.io/
@pamattei
Overview of talk
Supervised learning with uncertainty: general goal
The goal of this lecture is to train predictive machine learning models that can
produce uncertainty assessments.
So what do we want?
To do that, we need a probabilistic model of our data, hence the need for
generative models, which can be either fully generative or discriminative.
What’s a generative model?
Let’s start with some data D. For example, in the regression case with p-dimensional
continuous features,
D = ((x_1, y_1), ..., (x_n, y_n)) ∈ (R^p × R)^n.
In the binary classification case, D = ((x_1, y_1), ..., (x_n, y_n)) ∈ (R^p × {0, 1})^n.
In the unsupervised case, the data usually looks like D = (x_1, ..., x_n) ∈ (R^p)^n.
We call (x_1, ..., x_n) the features and (y_1, ..., y_n) the labels. The features are usually stored
in an n × p matrix called the design matrix.
Generative models for supervised learning: General assumptions
Although we’ll mostly focus on the unsupervised case in my lectures, let us begin with
the (arguably simpler) supervised case D = ((x_1, y_1), ..., (x_n, y_n)). It could be either a
regression or a classification task, for example.
Most of the time, it makes sense to build generative models that assume that the
observations are independent. This leads to
p(D) = p((x_1, y_1), ..., (x_n, y_n)) = \prod_{i=1}^n p(x_i, y_i).
Usually, we also further assume that the data are identically distributed. This means
that all the (x_i, y_i) follow the same distribution, which we may denote p(x, y).
When these two assumptions are met, we say that the data are independent and
identically distributed (i.i.d.). This is super useful in practice because, rather than
having to find a distribution p((x_1, y_1), ..., (x_n, y_n)) over a very large space (whose
dimension grows linearly with n), we just have to find a much lower-dimensional
distribution p(x, y).
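To make the factorisation concrete, here is a tiny numerical illustration (my own toy example, not from the slides): for i.i.d. data, the joint log-density is simply the sum of the per-observation log-densities.

```python
# Toy illustration (assumed example): under the i.i.d. assumption, log p(D) is a
# sum of per-observation terms, here for a small standard-Gaussian sample.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(size=10)        # a small i.i.d. sample, standing in for D
per_obs = norm.logpdf(x)       # log p(x_i) for each observation
print(per_obs.sum())           # log p(D) = sum_i log p(x_i)
```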
Generative models for supervised learning: Do we really have
to be fully generative?
But if we mainly want to make (probabilistic) predictions, knowing p(y|x) is enough. It’s exactly
this conditional distribution that will give us statements like "the probability that this
patient x has this kind of cancer is 56%".
Based on these insights, there are two main approaches for modelling the data:
• The fully generative (or model-based) approach posits a joint distribution p(x, y)
(often by specifying both p(y) and p(x|y)).
• The discriminative (or conditional) approach just specifies p(y|x) and completely
ignores p(x).
Generative models for supervised learning: Discriminative vs
fully generative
One of the wines the bad guys counterfeited was from the Barolo region. According to
Wikipedia, those wines have "pronounced tannins and acidity", and "moderate to high
alcohol levels (Minimum 13%)". This would help a trained human recognise them, but
could we train an algorithm to learn those characteristics?
Generative vs Discriminative: a concrete example
[Figure: scatter plot of acidity vs. alcohol, with Barolo wines highlighted against the other wines]
Generative vs Discriminative: a concrete example
The generative way would use the formula p(x, y) = p(y)p(x|y) and model the class-
conditional distributions p(x|y) using a continuous bivariate distribution (e.g. 2D Gaussians).
[Figure: scatter plot of acidity vs. alcohol for the Barolo class]
Here is what we obtain using the R package Mclust (Scrucca, Fop, Murphy, and Raftery,
R Journal, 2016).
[Figure: fitted 2D Gaussian contours, fixed acidity vs. alcohol]
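As a complement to the Mclust fit, here is a minimal sketch of the same fully generative idea in Python (an illustration on assumed synthetic data, not the lecture's code): fit one 2D Gaussian per class by maximum likelihood, then recover p(y|x) through Bayes' rule.

```python
# Fully generative route (synthetic illustration): one Gaussian per class, Bayes' rule for p(y|x).
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
# hypothetical (alcohol, acidity) measurements: class 1 plays the role of "Barolo"
X = np.vstack([rng.normal([12.0, 90.0], [0.6, 12.0], size=(60, 2)),
               rng.normal([13.8, 75.0], [0.4, 10.0], size=(60, 2))])
y = np.repeat([0, 1], 60)

# fit p(y = k) and p(x | y = k) = N(mean_k, cov_k) for each class
params = {}
for k in (0, 1):
    Xk = X[y == k]
    params[k] = (len(Xk) / len(X), Xk.mean(axis=0), np.cov(Xk, rowvar=False))

def proba_barolo(x_new):
    # p(y = 1 | x) = p(y = 1) p(x | y = 1) / sum_k p(y = k) p(x | y = k)
    joint = [prior * multivariate_normal.pdf(x_new, mean, cov)
             for prior, mean, cov in params.values()]
    return joint[1] / (joint[0] + joint[1])

print(proba_barolo(np.array([13.5, 78.0])))   # hypothetical new wine
```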
Generative vs Discriminative: a concrete example
The discriminative way would only model p(y|x). Since there are only 2 classes, this
means that y|x will follow a Bernoulli distribution whose parameter π(x) ∈ [0, 1] is
a function of the features.
[Figure: scatter plot of acidity vs. alcohol, Barolo vs. Other wines]
Key idea
Since we have an unknown function x ↦ π(x) to estimate, why not model it with a neural net?
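For comparison, a minimal sketch of the discriminative route (again an illustration on assumed synthetic data, not the lecture's code): model π(x) = p(y = 1|x) directly, here with a plain logistic regression on the two features.

```python
# Discriminative route (synthetic illustration): model p(y = 1 | x) directly.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([12.0, 90.0], [0.6, 12.0], size=(60, 2)),   # "Other" wines
               rng.normal([13.8, 75.0], [0.4, 10.0], size=(60, 2))])  # "Barolo"-like wines
y = np.repeat([0, 1], 60)

clf = LogisticRegression().fit(X, y)       # pi(x) = sigmoid(w^T x + b)
x_new = np.array([[13.5, 78.0]])           # hypothetical new wine
print(clf.predict_proba(x_new)[0, 1])      # estimated p(y = Barolo | x_new)
```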
Overview of talk
How to create a deep discriminative model?
We’ll focus now on the discriminative approach using neural nets, because it is
simpler. For more on the differences and links between the generative and discriminative
schools, a wonderful reference is Tom Minka’s short note on the subject: Discriminative
models, not discriminative training².
Our discriminative model for binary classification is
p(y|x) = B(y|π(x)),
where B(·|θ) denotes the density of a Bernoulli distribution with parameter θ ∈ [0, 1]. The
key idea is then to model the function x ↦ π(x) using a neural net.
² https://tminka.github.io/papers/minka-discriminative.pdf
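For reference, the Bernoulli density B(·|θ) used above is the standard one (spelled out here since the slide does not write it explicitly):

```latex
% Standard Bernoulli pmf, so that the model reads explicitly:
\mathcal{B}(y \mid \theta) = \theta^{y} (1 - \theta)^{1 - y}, \qquad y \in \{0, 1\},
\quad \text{hence} \quad
p(y \mid x) = \pi(x)^{y} \bigl(1 - \pi(x)\bigr)^{1 - y}.
```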
How to model π
Our discriminative model for binary classification is
p(y|x) = B(y|π(x)),
and we wish to model π using a neural net. But what kind of neural net?
The only really important constraint of the problem is that we need to have
π(x) ∈ [0, 1] for all x. Can we enforce this with a neural net?
Yes! By using a function whose outputs lie in [0, 1] as the output layer. For example, the
logistic sigmoid function σ : a ↦ 1/(1 + exp(−a)).
[Figure: plot of the logistic sigmoid function σ]
How to model π
We therefore set
π(x) = σ(fθ(x)),
where σ is the sigmoid function and fθ : R^p → R is any neural network (whose weights
are stored in a vector θ) that takes the features as input and returns an unconstrained real
number.
We have a lot of flexibility to choose fθ. In particular, if the features x_1, ..., x_n are images,
we could use a CNN. In the case of time series, we could use a recurrent neural net. In
the case of sets, we could use a Deep Sets architecture (Zaheer et al., NeurIPS 2017).
For example, a simple multilayer perceptron with one hidden layer would be
fθ(x) = W_1 tanh(W_0 x + b_0) + b_1.
Since the function π and the model p(y|x) now depend on some parameters θ, we’ll
denote them by πθ and pθ(y|x) from now on.
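A minimal PyTorch sketch of this construction (an illustration with hypothetical layer sizes): a one-hidden-layer MLP with tanh activations plays the role of fθ, and composing it with a sigmoid gives πθ(x) ∈ [0, 1].

```python
# pi_theta(x) = sigmoid(f_theta(x)), with f_theta a one-hidden-layer tanh MLP.
import torch
import torch.nn as nn

class BinaryClassifier(nn.Module):
    def __init__(self, p, hidden=32):
        super().__init__()
        self.f_theta = nn.Sequential(
            nn.Linear(p, hidden),   # W_0 x + b_0
            nn.Tanh(),
            nn.Linear(hidden, 1),   # W_1 h + b_1, an unconstrained real number
        )

    def forward(self, x):
        # pi_theta(x) = sigma(f_theta(x)) lies in [0, 1]
        return torch.sigmoid(self.f_theta(x)).squeeze(-1)

model = BinaryClassifier(p=2)
x = torch.randn(5, 2)            # 5 observations with p = 2 features
print(model(x))                  # estimated probabilities p_theta(y = 1 | x)
```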
How to find θ
There are many ways to find good parameter values for a generative model. One could
use Bayesian inference, score matching, the method of moments, adversarial training...
Let us focus on one of the most traditional ways: maximum likelihood. The idea is to find
a θ̂ that maximises the log-likelihood function log pθ(D).
We’ll also write ℓ(θ) for the log-likelihood (in fact, we’ll call any function that is equal to
log pθ(D) up to an additive constant the log-likelihood).
How to find θ: from ML to XENT
We have
ℓ(θ) = Σ_{i=1}^n log pθ(y_i|x_i) = Σ_{i=1}^n log[ πθ(x_i)^{y_i} (1 − πθ(x_i))^{1−y_i} ],
which leads to
ℓ(θ) = Σ_{i=1}^n [ y_i log πθ(x_i) + (1 − y_i) log(1 − πθ(x_i)) ].
We will want to maximise this function, which is equivalent to minimising its opposite,
which is called the cross-entropy loss.
The cross-entropy loss is the most commonly used loss for neural networks, and is a way
of doing maximum likelihood without necessarily saying it.
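A quick numerical sanity check (illustration only): PyTorch's built-in binary cross-entropy on logits is exactly the negative Bernoulli log-likelihood, averaged over the observations.

```python
# Binary cross-entropy == negative mean Bernoulli log-likelihood.
import torch
import torch.nn.functional as F

logits = torch.randn(8)                  # f_theta(x_i): unconstrained real outputs
y = torch.randint(0, 2, (8,)).float()    # binary labels y_i in {0, 1}
pi = torch.sigmoid(logits)               # pi_theta(x_i)

# negative mean log-likelihood, written by hand
nll = -(y * torch.log(pi) + (1 - y) * torch.log(1 - pi)).mean()
# the built-in (numerically stabler) cross-entropy on logits
xent = F.binary_cross_entropy_with_logits(logits, y)
print(torch.allclose(nll, xent))         # True (up to floating-point error)
```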
Multiclass classification
When there are K classes, the discriminative model becomes
p(y|x) = Cat(y|π(x)).
The output of the neural net π(x) is no longer a single probability but a vector of
proportions of dimension K: π(x) = (π_1(x), ..., π_K(x)) ∈ [0, 1]^K, with π_1(x) + ... + π_K(x) = 1.
How can we enforce that the outputs of our neural net sum to one?
Multiclass classification with the softmax
A simple way to make sure that the outputs of a neural net are indeed in [0, 1] and sum to
one is to use a softmax as the last layer. The model is then
pθ(y|x) = Cat(y|πθ(x)), with
πθ(x) = softmax(fθ(x)).
The function fθ can be modelled by any kind of neural network with input space R^p (the
data space) and output space R^K. The unknown weights of the network are denoted by θ.
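A minimal sketch of the softmax construction (an illustration with arbitrary sizes), recalling that softmax(z)_k = exp(z_k) / Σ_j exp(z_j), so the K outputs are nonnegative and sum to one.

```python
# pi_theta(x) = softmax(f_theta(x)): a valid vector of class probabilities.
import torch
import torch.nn as nn

p, K = 4, 3                                  # feature dimension and number of classes
f_theta = nn.Sequential(nn.Linear(p, 32), nn.Tanh(), nn.Linear(32, K))

x = torch.randn(5, p)                        # 5 observations
pi = torch.softmax(f_theta(x), dim=-1)       # shape (5, K), entries in [0, 1]
print(pi.sum(dim=-1))                        # each row sums to one
```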
How to find θ: from ML to XENT (continued)
The density of a categorical distribution can be written
Cat(y|π) = π_1^{y_1} · · · π_K^{y_K},
where y is a one-hot encoding of the label. This can be used to rewrite the likelihood:
ℓ(θ) = Σ_{i=1}^n Σ_{k=1}^K y_{ik} log πθ(x_i)_k.
The opposite of this quantity is often called the cross-entropy loss. So minimising the
cross-entropy is equivalent to maximising the likelihood of a discriminative model.
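And the multiclass analogue of the previous check (illustration only): PyTorch's cross-entropy on logits equals the negative mean of Σ_k y_{ik} log πθ(x_i)_k computed with one-hot labels.

```python
# Multiclass cross-entropy == negative mean categorical log-likelihood.
import torch
import torch.nn.functional as F

n, K = 6, 3
logits = torch.randn(n, K)                   # f_theta(x_i) for each observation
labels = torch.randint(0, K, (n,))           # integer class labels
y_onehot = F.one_hot(labels, K).float()      # one-hot encodings y_i

log_pi = torch.log_softmax(logits, dim=-1)   # log pi_theta(x_i)_k
manual = -(y_onehot * log_pi).sum(dim=-1).mean()
builtin = F.cross_entropy(logits, labels)    # averages over the n observations by default
print(torch.allclose(manual, builtin))       # True (up to floating-point error)
```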
Overview of talk
What is regression again?
In regression the labels y_i are continuous. Which distribution should we use for p(y|x)?
Often, we simply use a Gaussian! Indeed, the most famous regression model is the
Gaussian linear regression:
y = Xβ + µ1_n + ε, with ε ∼ N(0, σ²I_n).
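Written observation-wise in the p(y|x) notation used so far, the same model reads (an equivalent restatement, not an additional assumption):

```latex
p(y_i \mid x_i) = \mathcal{N}\!\left(y_i \,\middle|\, x_i^\top \beta + \mu,\; \sigma^2\right),
\qquad i = 1, \dots, n.
```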
Deep regression
Recall the Gaussian linear regression y = Xβ + µ1_n + ε, with ε ∼ N(0, σ²I_n), in which the
conditional mean of y is a linear function of the features. The deep regression model
simply replaces this linear function with a neural net fθ, keeping the Gaussian noise:
pθ(y|x) = N(y | fθ(x), σ²).
Deep regression
We have to learn θ and σ. Do you see a way to generalise this model further by adding
another deep learning touch?
Rather than assuming that σ is constant, we could model it too using a neural net σθ(x)!
Why on earth would that be a good idea?
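A minimal training sketch for this homoscedastic deep regression model (my own illustration, assuming pθ(y|x) = N(y | fθ(x), σ²)): θ and σ are learned jointly by minimising the Gaussian negative log-likelihood, with σ parameterised through its logarithm so that it stays positive.

```python
# Homoscedastic deep regression: learn theta and sigma by maximum likelihood.
import math
import torch
import torch.nn as nn

p = 3
f_theta = nn.Sequential(nn.Linear(p, 32), nn.Tanh(), nn.Linear(32, 1))
log_sigma = nn.Parameter(torch.zeros(()))          # learnable scalar, sigma = exp(log_sigma) > 0
opt = torch.optim.Adam(list(f_theta.parameters()) + [log_sigma], lr=1e-2)

x = torch.randn(128, p)                            # toy data
y = x[:, :1] ** 2 + 0.1 * torch.randn(128, 1)

for _ in range(200):
    mean, sigma = f_theta(x), log_sigma.exp()
    # -log N(y | mean, sigma^2), averaged over the observations
    nll = (0.5 * ((y - mean) / sigma) ** 2 + log_sigma + 0.5 * math.log(2 * math.pi)).mean()
    opt.zero_grad()
    nll.backward()
    opt.step()

print(float(log_sigma.exp()))                      # learned noise standard deviation
```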
Deep heteroscedastic regression
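A minimal sketch of the heteroscedastic variant suggested by the previous question (an illustration, not the lecture's code): the network now outputs both a mean and a log standard deviation for each input, and the same Gaussian negative log-likelihood serves as the loss.

```python
# Heteroscedastic deep regression: p_theta(y|x) = N(y | mu_theta(x), sigma_theta(x)^2).
import math
import torch
import torch.nn as nn

class HeteroscedasticNet(nn.Module):
    def __init__(self, p, hidden=32):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(p, hidden), nn.Tanh())
        self.mean_head = nn.Linear(hidden, 1)       # mu_theta(x)
        self.log_sigma_head = nn.Linear(hidden, 1)  # log sigma_theta(x), unconstrained

    def forward(self, x):
        h = self.trunk(x)
        return self.mean_head(h), self.log_sigma_head(h)

def gaussian_nll(y, mean, log_sigma):
    # negative log N(y | mean, exp(log_sigma)^2), averaged over the batch
    return (0.5 * ((y - mean) / log_sigma.exp()) ** 2
            + log_sigma + 0.5 * math.log(2 * math.pi)).mean()

net = HeteroscedasticNet(p=3)
x, y = torch.randn(16, 3), torch.randn(16, 1)
mean, log_sigma = net(x)
loss = gaussian_nll(y, mean, log_sigma)   # minimise this with any optimiser
```

Letting σ depend on x allows the model to report larger predictive uncertainty for some inputs than for others, which is exactly the kind of uncertainty assessment this lecture is after.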