Lecture 5

Recap

• The Bayes classifier is optimal for minimizing risk; risk
minimization is a very good objective.
• Given the class conditional densities, we can derive the
Bayes classifier for any loss function.
• There are other ways (other than through the loss function) to
trade off different errors; for example, the Neyman–Pearson (NP) classifier.
• The ROC curve also allows for such a trade-off.


Receiver Operating Characteristic (ROC)

• Consider a one-dimensional feature space and a 2-class
problem, with a classifier h(X) = 0 if X < τ.
• Consider equal priors, Gaussian class conditional
densities with equal variance, and 0-1 loss. Now let us
write the probability of error as a function of τ.


Receiver Operating Characteristic (ROC)

P[error] = 0.5 ∫_{−∞}^{τ} f1(X) dX + 0.5 ∫_{τ}^{∞} f0(X) dX
         = 0.5 Φ((τ − µ1)/σ) + 0.5 (1 − Φ((τ − µ0)/σ))

• As we vary τ we trade one kind of error against another.
In the Bayes classifier, the loss function determines the
‘exchange rate’. (A numerical illustration follows.)
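
The expression above is easy to evaluate numerically. The following is a
minimal Python sketch (not part of the lecture) that assumes illustrative
values µ0 = 0, µ1 = 2, σ = 1 and equal priors, and tabulates P[error] as τ
is varied.

    # Error probability as a function of the threshold tau, for the
    # two-Gaussian, equal-prior setting above (values are illustrative).
    import numpy as np
    from scipy.stats import norm

    mu0, mu1, sigma = 0.0, 2.0, 1.0

    def p_error(tau):
        # 0.5 * P[X < tau | class 1]  +  0.5 * P[X > tau | class 0]
        return 0.5 * norm.cdf((tau - mu1) / sigma) + \
               0.5 * (1 - norm.cdf((tau - mu0) / sigma))

    for tau in np.linspace(-2, 4, 13):
        print(f"tau = {tau:5.2f}   P[error] = {p_error(tau):.4f}")
    # The minimum occurs at tau = (mu0 + mu1)/2 = 1 in this symmetric case.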


ROC curve

• The receiver operating characteristic (ROC) curve is
one way to conveniently visualize and exploit this
trade-off.
• For a two-class classifier there are four possible
outcomes of a classification decision – two are correct
decisions and two are errors.
• Let ei denote the probability of wrongly assigning class i,
i = 0, 1.



ROC curve

Then we have

e0 = P[X ≤ τ | X ∈ class-1]   (a miss)
e1 = P[X > τ | X ∈ class-0]   (false alarm)
1 − e0 = P[X > τ | X ∈ class-1]   (correct detection)
1 − e1 = P[X ≤ τ | X ∈ class-0]   (correct rejection)

• For fixed class conditional densities, if we vary τ the
point (e1, 1 − e0) moves on a smooth curve in ℜ².
• This is traditionally called the ROC curve. (The choice of
coordinates is arbitrary.)


• For any fixed τ we can estimate e0 and e1 from the
training data.
• Hence, by varying τ we can trace out the ROC curve and decide which
may be the best operating point (as sketched below).
• This can be done for any threshold-based classifier,
irrespective of the class conditional densities.
• When the class conditional densities are Gaussian
with equal variance, we can use this procedure to
estimate the Bayes error also.
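
A minimal sketch (not part of the lecture) of this procedure on hypothetical
one-dimensional training data: sweep τ and estimate the false-alarm and
detection probabilities as empirical fractions.

    # Empirical ROC points: classify X >= tau as class 1, X < tau as class 0.
    import numpy as np

    rng = np.random.default_rng(0)
    # Hypothetical training data: 500 samples per class, unit-variance Gaussians.
    x0 = rng.normal(0.0, 1.0, 500)   # class 0
    x1 = rng.normal(2.0, 1.0, 500)   # class 1

    for tau in np.linspace(-1, 3, 9):
        e1 = np.mean(x0 >= tau)        # P[X > tau | class 0], false alarm
        detect = np.mean(x1 >= tau)    # P[X > tau | class 1], i.e. 1 - e0
        print(f"tau = {tau:4.1f}   (e1, 1 - e0) = ({e1:.3f}, {detect:.3f})")
    # Plotting detect against e1 over a fine grid of tau gives the empirical ROC.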


• From our earlier error integral we get

(τ − µ0)/σ = Φ⁻¹(1 − e1) = a, say
(τ − µ1)/σ = Φ⁻¹(1 − (1 − e0)) = b, say

• Then |a − b| = |µ1 − µ0|/σ = d, the discriminability.
• Knowing e1 and (1 − e0), we can get d and hence the
Bayes error. For our given τ we can also get the
actual error probability. We can tweak τ to match the
Bayes error. (A sketch of this calculation follows.)
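
A minimal sketch (not part of the lecture) of this calculation. It assumes
equal priors and equal-variance Gaussian class conditionals, in which case
the Bayes-optimal threshold is the midpoint of the means and the Bayes error
works out to Φ(−d/2); the operating-point numbers below are hypothetical.

    # Recover d from one operating point (e1, 1 - e0), then the Bayes error.
    from scipy.stats import norm

    e1 = 0.16        # hypothetical estimated false-alarm rate at the current tau
    detect = 0.84    # hypothetical estimated detection rate (1 - e0)

    a = norm.ppf(1 - e1)        # (tau - mu0) / sigma
    b = norm.ppf(1 - detect)    # (tau - mu1) / sigma
    d = abs(a - b)              # |mu1 - mu0| / sigma, the discriminability
    bayes_error = norm.cdf(-d / 2)

    print(f"d = {d:.3f}, estimated Bayes error = {bayes_error:.4f}")
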
• We can, in general, use the ROC curve in
multidimensional cases also. Consider, for example,
h(X) = sgn(WᵀX + w0).
We can use the ROC curve to fix w0 after learning W.


Implementing Bayes Classifier

• We need the class conditional densities and the prior
probabilities.
• The prior probabilities can be estimated as the fraction of
examples from each class (see the sketch below).
• Since the examples are iid and the class labels of the
examples are known, we have some iid samples from
each class conditional distribution.
• The problem: Given {x1, x2, · · ·, xn} drawn iid
according to some distribution, estimate the
probability distribution / density.
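
A minimal sketch (not part of the lecture) of the first two steps on
hypothetical labeled data: estimate the priors as class fractions and collect
the per-class iid samples that will be used for density estimation.

    # Priors as class fractions, and per-class samples for density estimation.
    import numpy as np

    # Hypothetical labeled training data.
    X = np.array([1.2, -0.3, 2.5, 0.1, 1.9, -1.0, 2.2, 0.4])
    y = np.array([1,    0,   1,   0,   1,    0,   1,   0])

    classes, counts = np.unique(y, return_counts=True)
    priors = counts / len(y)                     # P(class i) as a fraction of examples
    samples = {c: X[y == c] for c in classes}    # iid samples per class

    print(dict(zip(classes, priors)))
    print(samples)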


Estimating densities

• Two main approaches: parametric and
non-parametric.
• Parametric: We assume we have iid realizations of a
random variable X whose distribution is known except
for the values of a parameter vector. We estimate the
parameters of the density using the samples available.
• In the non-parametric approach we do not assume a form
for the density. It is often modelled as a convex
combination of some densities built using the samples.


Estimating parameters of a density

• Denote the density by f(x | θ) where θ is a parameter
vector.
• For example, let θ = (θ1, θ2) and

f(x | θ) = (1 / √(2π θ2)) exp( −(x − θ1)² / (2θ2) )

f(x | θ) is normal, with the mean and variance constituting
the parameter vector.
• Now estimation of the density is the same as estimation of a
parameter vector.


Notation

• Let X denote a random variable with density f(x | θ).
(We use the same notation even when X is a random vector.)
• An (iid) sample of size n consists of n iid realizations of
X.
• x = (x1, · · ·, xn)ᵀ – the sample, or the data.
We sometimes use D to denote the data.
• It can be thought of as a realization of (X1, · · ·, Xn)ᵀ
where the Xi are iid with density f(x | θ).


• A statistic is a function of the data, e.g., g(x1, · · ·, xn).
• An estimator is such a statistic: θ̂(x1, · · ·, xn).
• When we need to remember the sample size, we
write θ̂n.
• For example,

θ̂n = (1/n) Σ_{i=1}^{n} xi,

the well-known sample mean.


• There can be different estimators that are intuitively
reasonable.
• Let X be Poisson with parameter λ. Then the sample
mean as well as the sample variance seem to be
reasonable estimators for λ.
• Let X be normal with mean µ and variance unity.
Both the sample mean and the sample median seem good
choices.
• How does one choose among estimators? (A small simulation
below illustrates that the choice matters.)
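
A small simulation (not part of the lecture) comparing the sample mean and
the sample median as estimators of the mean of an N(µ, 1) density; the sample
size, number of trials and µ are arbitrary.

    # Both estimators are sensible, but their mean square errors differ.
    import numpy as np

    rng = np.random.default_rng(0)
    mu, n, trials = 1.0, 50, 20000

    samples = rng.normal(mu, 1.0, size=(trials, n))
    means = samples.mean(axis=1)
    medians = np.median(samples, axis=1)

    print("MSE of sample mean  :", np.mean((means - mu) ** 2))    # close to 1/n = 0.02
    print("MSE of sample median:", np.mean((medians - mu) ** 2))  # larger, roughly (pi/2)/n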


• We need ‘good’ estimators.
• We need some criteria for ‘goodness’, and also methods
to obtain such estimators.
• In this course, we will consider two methods:
maximum likelihood and Bayesian estimation.
• To begin with, here is a simple introduction to some general
issues in estimation.


• An estimator θ̂ of a parameter (vector) θ is said to be
unbiased if E[θ̂] = θ.
• θ̂ is a function of the data. Hence the expectation is
with respect to the joint density of (X1, · · ·, Xn), the iid
random variables.
• Since Xi ∼ f(x | θ), the expectation above depends on the
value of θ. So we write

Eθ[θ̂] = θ


• Thus θ̂ is an unbiased estimator if, for every density in the
class of densities we are interested in (i.e., every
value of the parameter in the parameter space), the
expected value of the estimator is the true parameter
value.


• Let f(x | θ) be normal with mean θ and variance unity.
Let θ̂n = (1/n) Σ_{i=1}^{n} xi.
• Then E[θ̂n] = θ for all n, because E[Xi] = θ.
• The sample mean is an unbiased estimator of the actual
mean.
• Let θ̂′(x1, · · ·, xn) = 0.5(x1 + x2).
• This is also an unbiased estimator.
• So is θ̂′′ = x1.
• Unbiasedness alone is not enough.

• One possibility: We can say θ̂ is better than θ̂′ if, ∀θ,

Pθ[−a ≤ (θ̂ − θ) ≤ b] ≥ Pθ[−a ≤ (θ̂′ − θ) ≤ b]   ∀a, b > 0

(for any fixed sample size).
• Difficult to get such estimators.


• A weaker method is: θ̂ is better than θ̂′ if

Eθ[(θ̂ − θ)²] ≤ Eθ[(θ̂′ − θ)²]   ∀θ

• The mean square error of an estimator is defined by

MSEθ(θ̂) = Eθ[(θ̂ − θ)²]


• Lemma:

MSEθ(θ̂) = Vθ(θ̂) + [Bθ(θ̂)]²

where Vθ(θ̂) is the variance, given by

Vθ(θ̂) = Eθ[(θ̂ − Eθ[θ̂])²]

and Bθ(θ̂) is the bias, given by

Bθ(θ̂) = Eθ[θ̂] − θ

• For unbiased estimators the variance is the mean
square error (because the bias is zero).

• Proof:

MSE(θ̂) = E[(θ̂ − θ)²]
        = E[{(θ̂ − E[θ̂]) + (E[θ̂] − θ)}²]
        = E[(θ̂ − E[θ̂])²] + (E[θ̂] − θ)² + 2E[(θ̂ − E[θ̂])(E[θ̂] − θ)]
        = V(θ̂) + [B(θ̂)]² + 2(E[θ̂] − θ) E[θ̂ − E[θ̂]]
        = V(θ̂) + [B(θ̂)]²

since E[θ̂ − E[θ̂]] = 0.


• For unbiased estimators, low variance implies low
MSE.
• Earlier example: When θ̂n is the sample mean,

Vθ(θ̂n) = σ²/n

For θ̂n′ = 0.5(x1 + x2),

Vθ(θ̂n′) = σ²/2

• Hence θ̂n is better than θ̂n′ (see the check below).
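
A quick empirical check (not part of the lecture) of these variances,
assuming standard normal data with σ = 1 and n = 20; all three estimators
are unbiased, but their variances differ widely.

    # Compare the variances of three unbiased estimators of the mean.
    import numpy as np

    rng = np.random.default_rng(0)
    theta, sigma, n, trials = 0.0, 1.0, 20, 50000

    x = rng.normal(theta, sigma, size=(trials, n))
    sample_mean = x.mean(axis=1)              # theta_hat_n
    two_point   = 0.5 * (x[:, 0] + x[:, 1])   # theta_hat'
    first_only  = x[:, 0]                     # theta_hat''

    for name, est in [("sample mean", sample_mean),
                      ("0.5(x1+x2)", two_point),
                      ("x1 alone", first_only)]:
        print(f"{name:12s}  mean = {est.mean():+.4f}   variance = {est.var():.4f}")
    # Expect variances close to sigma^2/n = 0.05, sigma^2/2 = 0.5 and sigma^2 = 1.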

• So, unbiased estimators with low mean square error
are good.
• For a given family of density functions, θ̂ is said to be a
uniformly minimum variance unbiased estimator (UMVUE) if
1. θ̂ is unbiased, and
2. MSEθ(θ̂n) ≤ MSEθ(θ̂n′) ∀n, θ,
for all θ̂′ that are unbiased estimators for θ.
• If we can get a UMVUE, then it is the ‘best’ estimator.
• In many cases, it is difficult to get a UMVUE.


• So far, we have been looking at figures of merit of estimators
at (all) fixed sample sizes.
• We can also think of asymptotic properties.
• An estimator θ̂ is said to be consistent for θ if

θ̂n → θ in probability, ∀θ

• For example, the sample mean is a consistent
estimator of the population mean (the expectation of the
random variable), by the law of large numbers.


• A consistent estimator need not be unbiased.
• Let θ be the mean and let

θ̂n = (1/(n + 1)) Σ_{i=1}^{n} xi

• This is not an unbiased estimator.
• But we have the following.


E[(θ̂n − θ)²] = E[ ( (1/(n+1)) Σ_{i=1}^{n} (xi − θ) − θ/(n+1) )² ]

             = (1/(n+1)²) n σ² + (1/(n+1)²) θ² − (2θ/(n+1)²) E[ Σ_i (xi − θ) ]

             = (n/(n+1)²) σ² + (1/(n+1)²) θ²


• Thus, E[(θ̂n − θ)²] → 0 as n → ∞.
• Hence, θ̂n is consistent (though it is biased); a small
numerical check is sketched below.
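
A small numerical check (not part of the lecture) that this biased estimator
still converges, assuming N(θ, 1) data with θ = 2.

    # Bias and MSE of (1/(n+1)) * sum(x_i) as the sample size grows.
    import numpy as np

    rng = np.random.default_rng(0)
    theta, trials = 2.0, 10000

    for n in [10, 100, 1000]:
        x = rng.normal(theta, 1.0, size=(trials, n))
        est = x.sum(axis=1) / (n + 1)
        bias = est.mean() - theta
        mse = np.mean((est - theta) ** 2)
        print(f"n = {n:5d}   bias = {bias:+.5f}   MSE = {mse:.5f}")
    # Both the bias (-theta/(n+1)) and the MSE shrink to zero as n grows.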


• Maximum Likelihood (ML) estimation is a general
procedure for obtaining consistent estimators.
• It is a parametric method.
• We estimate parameters of a density based on iid
samples.
• For most densities, ML estimates are consistent.



Maximum likelihood estimation

• Let x = {x1, x2, · · ·, xn} be the samples.
• The likelihood function is defined by

L(x, θ) = ∏_{j=1}^{n} f(xj | θ)

• If the samples are from a discrete random variable, f is
taken to be the mass function. If the samples are from a
continuous random variable, then f is the density
function.


Maximum likelihood estimation

• We essentially look at the likelihood function as a
function of θ, with the xj being known values (as given
by the data).
• To emphasize this we write it as L(θ, x) or L(θ | x) or
L(θ | D).
Recall that we also denote the data samples by D.


Maximum likelihood estimation contd..

• The maximum likelihood (ML) estimate of θ is the
value that (globally) maximizes the likelihood function.
• θ∗ is the MLE for θ if

L(θ∗ | x) ≥ L(θ | x)   ∀θ

• Finding the MLE is an optimization problem.


• For convenience in optimization we often take the log
likelihood, given by

l(θ | x) = log L(θ | x) = Σ_{j=1}^{n} log f(xj | θ)

• Now the ML estimate would be the maximizer of the log
likelihood.
• For many densities we can analytically solve for the
maximizer.
• In general we can use numerical optimization
techniques, as in the sketch below.
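
A minimal sketch (not part of the lecture) of the numerical route: minimize
the negative log likelihood with a general-purpose optimizer. A Gamma density
is used purely as an illustration, since its shape parameter has no
closed-form ML estimate.

    # Numerical MLE by minimizing the negative log likelihood.
    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import gamma

    rng = np.random.default_rng(0)
    data = rng.gamma(shape=2.0, scale=1.5, size=500)   # hypothetical iid samples

    def neg_log_likelihood(theta):
        shape, scale = theta
        if shape <= 0 or scale <= 0:
            return np.inf
        return -np.sum(gamma.logpdf(data, a=shape, scale=scale))

    result = minimize(neg_log_likelihood, x0=[1.0, 1.0], method="Nelder-Mead")
    print("numerical MLE (shape, scale):", result.x)   # close to (2.0, 1.5) for large n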

Example

• Consider the one-dimensional case.
Let f(x | θ) ∼ N(µ, σ²) with θ1 = µ and θ2 = σ.

f(x | θ) = (1/(θ2 √(2π))) exp( −(x − θ1)² / (2θ2²) )

• Now the likelihood is given by

L(θ | x) = ∏_{j=1}^{n} (1/(θ2 √(2π))) exp( −(xj − θ1)² / (2θ2²) )


Example

• Hence the log likelihood would be

l(θ | x) = Σ_{j=1}^{n} [ −log(θ2) − 0.5 log(2π) − (xj − θ1)²/(2θ2²) ]

         = −n log(θ2) − 0.5 n log(2π) − Σ_{j=1}^{n} (xj − θ1)²/(2θ2²)

• To maximize the log likelihood we equate the partial
derivatives to zero.


• This gives

∂l/∂θ1 = (1/θ2²) Σ_{j=1}^{n} (xj − θ1) = 0

∂l/∂θ2 = −n/θ2 + (1/θ2³) Σ_{j=1}^{n} (xj − θ1)² = 0


• Solving these, we get

θ̂1 = (1/n) Σ_{j=1}^{n} xj

θ̂2² = (1/n) Σ_{j=1}^{n} (xj − θ̂1)²

• These are the ML estimates of the mean and variance of
a normal density (see the sketch below).
• The ML estimate of the variance is not unbiased.
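
A minimal sketch (not part of the lecture) computing these estimates on
simulated data; it also shows the bias of the 1/n variance estimate, whose
expectation is (n−1)/n times the true variance.

    # Gaussian ML estimates on simulated data, and the bias of the variance MLE.
    import numpy as np

    rng = np.random.default_rng(0)
    mu, var, n, trials = 5.0, 4.0, 10, 50000

    x = rng.normal(mu, np.sqrt(var), size=(trials, n))
    mu_hat = x.mean(axis=1)                                 # theta_hat_1
    var_hat = np.mean((x - mu_hat[:, None]) ** 2, axis=1)   # the 1/n variance estimate

    print("E[mu_hat ] ≈", mu_hat.mean())    # close to 5.0
    print("E[var_hat] ≈", var_hat.mean())   # close to (n-1)/n * 4.0 = 3.6, not 4.0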

Example: discrete case

• Let X have a Bernoulli distribution. That is, X takes
values 0 and 1 with probability (1 − p) and p,
respectively.
• Then f(x | p) = p^x (1 − p)^(1−x), x ∈ {0, 1}.
• The mass function has only one parameter, namely p.
• Note that we must have 0 ≤ p ≤ 1.


• The likelihood function is

L(p | x) = ∏_{j=1}^{n} p^{xj} (1 − p)^{1−xj} = p^{n x̄} (1 − p)^{n − n x̄}

where x̄ = (1/n) Σ_{j=1}^{n} xj is the sample mean.
• The log likelihood is given by

l(p | x) = n x̄ log p + n(1 − x̄) log(1 − p)


• Differentiating the log likelihood with respect to p and
equating to zero, we get

n x̄ / p = n(1 − x̄) / (1 − p)

which implies

p̂ = x̄ = (1/n) Σ_{j=1}^{n} xj

• This is the ML estimate of the parameter p of a
Bernoulli random variable.
• The sample mean is the ML estimator (see the sketch below).
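
A minimal sketch (not part of the lecture) of this estimate on simulated
Bernoulli data; the true p is arbitrary, and the estimate settles towards it
as n grows, illustrating consistency.

    # Bernoulli MLE: the sample mean of 0/1 data.
    import numpy as np

    rng = np.random.default_rng(0)
    p_true = 0.3   # hypothetical true parameter

    for n in [10, 100, 1000, 100000]:
        x = rng.binomial(1, p_true, size=n)   # iid Bernoulli(p) samples
        p_hat = x.mean()                      # the ML estimate derived above
        print(f"n = {n:6d}   p_hat = {p_hat:.4f}")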

To Summarize

• To implement the Bayes classifier, we need to estimate
densities.
• Parametric methods assume that the form of the density is
known.
• An estimate (of a parameter) is a function of the (iid) data.
• An estimate is unbiased if its expectation is the true
value.
• The MSE of an unbiased estimator is its variance.
• A UMVUE is a good estimator to have.


• Consistent estimators converge to the true value in
probability as the sample size goes to infinity.
• Maximum likelihood estimation is a general procedure
that can find consistent estimators.
• The MLE is the maximizer of the likelihood function.
• Often, one maximizes the log likelihood.
• For many standard densities we can obtain the MLE
analytically.
