
Pattern Classification

05. Density Estimation

AbdElMoniem Bayoumi, PhD

Spring 2023
Recap: Gaussian Densities
• Assume a multi-dimensional Gaussian density for each $P(X \mid C_i)$

• Features may be independent (or conditionally independent), i.e., independent Gaussians

• Features may be dependent in other cases
Recap: Applying Bayes Rule
• One way to apply Bayes' rule in practical situations:

– Obtain the training set $X(1), X(2), \dots, X(M)$

– Assume a multi-dimensional Gaussian density for each class, i.e., $P(X \mid C_i)$

– To obtain the form of each density we need $\mu_i$ and $\Sigma_i$ for each class $i$ → estimate them from the training set

– Estimate the a priori probabilities $P(C_i)$ from the training set, i.e., according to the frequency of each class

– Plug the obtained estimates into Bayes' rule to obtain the classification rule
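A minimal sketch of these steps in Python, assuming NumPy and SciPy are available; the helper names (`fit_gaussian_bayes`, `predict`) are illustrative and not part of the lecture:

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_gaussian_bayes(X_train, y_train):
    """Estimate mu_i, Sigma_i, and the prior P(C_i) for each class from the training set."""
    params = {}
    for c in np.unique(y_train):
        Xc = X_train[y_train == c]                 # training points belonging to class c
        params[c] = (Xc.mean(axis=0),              # mu_i
                     np.cov(Xc, rowvar=False),     # Sigma_i
                     len(Xc) / len(X_train))       # P(C_i) from class frequencies
    return params

def predict(params, x):
    """Bayes' rule: pick the class maximizing P(x | C_i) * P(C_i); the common denominator P(x) is dropped."""
    scores = {c: multivariate_normal.pdf(x, mean=mu, cov=cov) * prior
              for c, (mu, cov, prior) in params.items()}
    return max(scores, key=scores.get)
```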
Density Estimation
• In Bayes' rule, the probability densities have to be estimated

• One way is to assume that they are multivariate Gaussian and estimate the $\mu$ and $\Sigma$ of these distributions

• Alternatively, estimate the densities directly from the data
Histogram Analysis
$$\hat{p}(x) = \frac{m}{M \cdot (\text{size of bin})}$$

• m is the number of data points falling within a given range, i.e., a histogram bin

• M is the total number of points (that belong to the same class)

• Size of bin: the width of the histogram bin
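A minimal NumPy sketch of this estimate (the helper name `histogram_density` is illustrative; it assumes fixed-width bins with an edge at 0):

```python
import numpy as np

def histogram_density(data, bin_size, x):
    """p_hat(x) = m / (M * bin_size), where m counts the data points
    falling in the bin that contains x."""
    M = len(data)
    left_edge = np.floor(x / bin_size) * bin_size            # left edge of the bin containing x
    m = np.sum((data >= left_edge) & (data < left_edge + bin_size))
    return m / (M * bin_size)

rng = np.random.default_rng(0)
data = rng.normal(size=1000)                                  # sample from a standard Gaussian
print(histogram_density(data, bin_size=0.5, x=0.2))           # roughly 0.4 (the true density near 0)
```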
Histogram Analysis
• Consider a 1-D example:

– m is the number of data points within the given range, e.g., $2 < x \le 3$

$$\hat{p}(2 < x \le 3) = \frac{m}{M \cdot (\text{size of bin})}$$

[Figure: histogram estimate $\hat{p}(x)$ next to the true density $p(x)$ from which the data was originally generated, both over $0 \le x \le 6$]
Histogram Analysis
$$\int_{x - \Delta/2}^{\,x + \Delta/2} p(x)\, dx \approx \Delta \cdot p(x)$$

• Probability(generated point $\in \left(x - \tfrac{\Delta}{2},\, x + \tfrac{\Delta}{2}\right)$) $\approx \Delta \cdot p(x) \equiv z$

[Figure: density $p(x)$ with a single bin of width $\Delta$ highlighted; bin size $\equiv \Delta$]
Histogram Analysis
$$\int_{x - \Delta/2}^{\,x + \Delta/2} p(x)\, dx \approx \Delta \cdot p(x)$$

• Probability(generated point $\in \left(X - \tfrac{\Delta}{2},\, X + \tfrac{\Delta}{2}\right)$) $\approx \Delta \cdot p(x) \equiv z$

• Assume we draw M points according to $p(x)$ → binomial distribution

• The number of points falling in the bin follows a binomial distribution with probability $z$
Histogram Analysis
$$P(k \text{ points fall in the bin out of } M \text{ points}) = \binom{M}{k} z^{k} (1 - z)^{M - k}$$

$$E[\#\text{ points in bin}] = M \cdot z = M \cdot p(x) \cdot \Delta$$

• Example: flip a coin 10 times

$$P(8 \text{ Heads}) = \binom{10}{8} p^{8} (1 - p)^{10 - 8}$$

$$E[\#\text{ Heads}] = p \cdot M = 0.5 \times 10 = 5$$

where $p \equiv$ probability of heads
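A quick numeric check of the coin example, using only the Python standard library:

```python
from math import comb

M, k, p = 10, 8, 0.5
prob_8_heads = comb(M, k) * p**k * (1 - p)**(M - k)   # binomial probability of exactly 8 heads
expected_heads = p * M                                # expected number of heads
print(prob_8_heads)                                   # ≈ 0.044
print(expected_heads)                                 # 5.0
```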
Histogram Analysis
$$E[\#\text{ points in bin}] = M \cdot z = M \cdot p(x) \cdot \Delta$$

• If k points fall in the histogram range, then

$$k \approx M \cdot p(x) \cdot \Delta$$

• Then, the estimate of $p(x)$ is:

$$\hat{p}(x) = \frac{k}{M \Delta} \qquad \text{Recall:}\quad \hat{p}(x) = \frac{m}{M \cdot (\text{size of bin})}$$
Histogram Analysis
• A weak method of estimation

• The density estimates are discontinuous, even though the true densities are assumed to be smooth

[Figure: discontinuous histogram estimate $\hat{p}(x)$ next to the smooth true density $p(x)$, both over $0 \le x \le 6$]
Naïve Estimator
• Instead of partitioning X, i.e., the feature space, into a number of prespecified ranges, we perform a similar range analysis for every X

[Figure: window of width h centered at X, spanning $X - \tfrac{h}{2}$ to $X + \tfrac{h}{2}$]

$$\hat{P}(X) = \frac{\#\text{points falling in } \left(X - \tfrac{h}{2},\, X + \tfrac{h}{2}\right)}{M h}$$
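A minimal NumPy sketch of the naïve estimator (the helper name is hypothetical):

```python
import numpy as np

def naive_estimate(data, h, x):
    """P_hat(x): fraction of points falling in (x - h/2, x + h/2), divided by h."""
    in_window = np.abs(data - x) < h / 2.0          # points inside the window centered at x
    return np.sum(in_window) / (len(data) * h)
```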
Naïve Estimator
• Drawbacks:

– Discontinuity of the density estimates

– All data points inside the window are weighted equally, regardless of their distance to the estimation point X
Kernel Density Estimator

• a.k.a. Parzen window density estimator

• Choose a bump function

[Figure: a single bump function]

• Summation of bump functions gives the estimate of the density

[Figure: density estimate formed by summing bump functions]
Kernel Density Estimator
• Choose the bump function as a Gaussian with standard deviation (bandwidth) h:

$$\phi_h(x) = \frac{e^{-x^2/(2h^2)}}{\sqrt{2\pi}\, h}$$

[Figure: Gaussian bump $\phi_h(x)$; h is the bandwidth]
Kernel Density Estimator
• Choose the bump function as a Gaussian with standard deviation (bandwidth) h:

$$\phi_h(x) = \frac{e^{-x^2/(2h^2)}}{\sqrt{2\pi}\, h}$$

• X(m) are the points generated from the density P(X) that we want to estimate

[Figure: bump function $\phi_h(X - X(m))$ centered at the data point X(m)]
Kernel Density Estimator

[Figure: a function $f(x)$ and its shifted copy $f(x - x_1)$ centered at $x_1$]
Kernel Density Estimator
• Summation of bump functions:

$$\hat{P}(X) = \frac{1}{M} \sum_{m=1}^{M} \phi_h\big(X - X(m)\big)$$

where the summation runs over the M generated points, and $\hat{P}(X)$ is the estimate of the density
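A minimal NumPy sketch of this estimator with the Gaussian bump above (the function name is illustrative; it is not SciPy's `gaussian_kde`):

```python
import numpy as np

def parzen_gaussian(data, h, x):
    """P_hat(x) = (1/M) * sum_m phi_h(x - X(m)), with a Gaussian bump of bandwidth h."""
    bumps = np.exp(-((x - data) ** 2) / (2.0 * h ** 2)) / (np.sqrt(2.0 * np.pi) * h)
    return bumps.mean()                          # average of the M bump values

rng = np.random.default_rng(0)
data = rng.normal(size=500)
print(parzen_gaussian(data, h=0.3, x=0.0))       # roughly 0.4 (true standard-normal density at 0)
```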
Kernel Density Estimator

[Figure: a true density (levels 0.25 and 0.75) compared with its kernel density estimate]
Kernel Density Estimator
• $\phi_h$ does not have to be Gaussian:

$$\phi_h(x) = \frac{1}{h}\, g\!\left(\frac{x}{h}\right)$$

where $g(\cdot)$ is any suitable bump function that integrates to 1:

$$\int_{-\infty}^{\infty} g(x)\, dx = 1$$

e.g.

$$g(x) = \frac{e^{-x^2/2}}{\sqrt{2\pi}} \;\rightarrow\; \phi_h(x) = \frac{e^{-x^2/(2h^2)}}{\sqrt{2\pi}\, h}$$
Kernel Density Estimator
• $\phi_h$ does not have to be Gaussian:

$$\phi_h(x) = \frac{1}{h}\, g\!\left(\frac{x}{h}\right)$$

where $g(\cdot)$ is any suitable bump function that integrates to 1:

$$\int_{-\infty}^{\infty} g(x)\, dx = 1$$

[Figure: a bump $f(x)$ supported on $[-1, 1]$ and its stretched version $f(x/h)$ supported on $[-h, h]$]
Kernel Density Estimator
• The naïve estimator is equivalent to a Parzen window estimator with:

$$g(x) = \begin{cases} 1, & -\tfrac{1}{2} \le x < \tfrac{1}{2} \\ 0, & \text{otherwise} \end{cases}$$

• In this case:

$$\hat{P}(X) = \frac{1}{Mh} \sum_{m=1}^{M} g\!\left(\frac{X - X(m)}{h}\right)$$

[Figure: box kernel $g\big((X - X(m))/h\big)$ of width h centered at the data point X(m), spanning $X(m) - \tfrac{h}{2}$ to $X(m) + \tfrac{h}{2}$]
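For comparison with the Gaussian sketch above, swapping in the box kernel reproduces the naïve estimator (again a hypothetical helper, not the lecture's code):

```python
import numpy as np

def parzen_box(data, h, x):
    """P_hat(x) = (1/(M*h)) * sum_m g((x - X(m))/h), with g(u) = 1 for -1/2 <= u < 1/2, else 0."""
    u = (x - data) / h
    g = ((u >= -0.5) & (u < 0.5)).astype(float)   # box kernel evaluated at each data point
    return g.sum() / (len(data) * h)
```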
Kernel Density Estimator
1-D form:

$$\hat{P}(X) = \frac{1}{M} \sum_{m=1}^{M} \phi_h\big(X - X(m)\big) = \frac{1}{Mh} \sum_{m=1}^{M} g\!\left(\frac{X - X(m)}{h}\right)$$

$$\int_{-\infty}^{\infty} g(x)\, dx = 1 \quad\Rightarrow\quad \int_{-\infty}^{\infty} \hat{P}(x)\, dx = 1$$
Kernel Density Estimator
Multi-dimensional form:

$$\hat{P}(X) = \frac{1}{M h^{N}} \sum_{m=1}^{M} g\!\left(\frac{X - X(m)}{h}\right)$$

$$\int_{-\infty}^{\infty} g(X)\, dX = 1$$

• For example, a multi-dimensional independent Gaussian density:

$$g(X) = \frac{e^{-\frac{1}{2}\sum_{i=1}^{N} x_i^2}}{(2\pi)^{N/2}}$$

(with a diagonal bandwidth matrix, each dimension can use its own bandwidth)
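A sketch of the multi-dimensional form with the independent Gaussian kernel and a diagonal bandwidth matrix; the per-dimension bandwidths h_i (and the product of the h_i replacing h^N in the normalization) are an assumption, not spelled out on the slide:

```python
import numpy as np

def parzen_multidim(data, h, x):
    """Multi-dimensional Parzen estimate with an independent Gaussian kernel.
    data: (M, N) samples, h: (N,) per-dimension bandwidths, x: (N,) query point."""
    M, N = data.shape
    u = (x - data) / h                                             # scale each dimension by its bandwidth
    g = np.exp(-0.5 * np.sum(u ** 2, axis=1)) / (2.0 * np.pi) ** (N / 2)
    return g.sum() / (M * np.prod(h))
```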
How to choose h?

[Figure: density estimates obtained with a large h and with a small h]
How to choose h?
• Too small h → a bumpy, non-smooth estimate

• Too large h → the estimate could be so smooth that essential details of the density are lost or smoothed out
How to choose h?

[Figure: the true density compared with an estimate using too small an h and an estimate using too large an h]
Optimal h
• The optimal H (diagonal bandwidth matrix) can be approximated as:

$$H_i = \left(\frac{4}{(N+2)\, M}\right)^{\frac{1}{N+4}} \sigma_i \qquad \text{(normal reference rule)}$$

where $\sigma_i = \sqrt{\hat{\Sigma}_{i,i}}$

• $\hat{\Sigma}$ is the estimated covariance matrix, i.e.,

$$\hat{\Sigma} = \frac{1}{M} \sum_{m=1}^{M} \big(X(m) - \hat{\mu}\big)\big(X(m) - \hat{\mu}\big)^{T}$$

$$h_{opt} = \frac{1}{N} \sum_{i=1}^{N} H_i$$

$\hat{\Sigma}_{i,i} \equiv$ $i$-th diagonal element of $\hat{\Sigma}$; $N \equiv$ number of dimensions

For a multivariate normal kernel & diagonal bandwidth matrix

Bowman, A.W., and Azzalini, A. (1997), Applied Smoothing Techniques for Data Analysis, London: Oxford University Press [page 32].
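A small NumPy sketch of this rule (illustrative helper name; `bias=True` matches the 1/M covariance estimate on the slide):

```python
import numpy as np

def normal_reference_bandwidths(data):
    """H_i = (4 / ((N + 2) * M))**(1 / (N + 4)) * sigma_i for each dimension,
    and h_opt as their average, following the normal reference rule."""
    M, N = data.shape
    sigma = np.sqrt(np.diag(np.cov(data, rowvar=False, bias=True)))  # sigma_i = sqrt of diagonal of Sigma_hat
    H = (4.0 / ((N + 2) * M)) ** (1.0 / (N + 4)) * sigma
    return H, H.mean()                                               # h_opt = (1/N) * sum_i H_i
```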
Acknowledgment
• These slides have been created based on the lecture notes of Prof. Dr. Amir Atiya
