05 Density Estimation
Spring 2023
Recap: Gaussian Densities
• Assume a multi-dimensional Gaussian density for each $P(X \mid C_i)$
Recap: Applying Bayes Rule
• One way to apply Bayes' rule in practical situations:
Density Estimation
• To apply Bayes' rule, the probability densities have to be estimated from data
Histogram Analysis
$$\hat{p}(x) = \frac{m}{M \cdot (\text{size of bin})}$$
Histogram Analysis
• Consider 1-D example:
– $m$ is the number of data points within the given range, e.g., $2 < x \le 3$

$$\hat{p}(2 < x \le 3) = \frac{m}{M \cdot (\text{size of bin})}$$

[Figure: histogram estimate $\hat{p}(x)$ next to the true density $p(x)$ over $x \in [0, 6]$; the data was originally generated from this density]
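The bin estimate above can be sketched numerically (a minimal illustration; the data points and the `histogram_density` helper are made up for this example):

```python
import numpy as np

def histogram_density(data, bin_edges):
    """Estimate p(x) on each bin as m / (M * bin_width),
    where m counts the points falling in the bin."""
    data = np.asarray(data)
    M = len(data)
    counts, edges = np.histogram(data, bins=bin_edges)
    widths = np.diff(edges)
    return counts / (M * widths)  # one density value per bin

# Example: M = 8 points, bins of width 1 on [0, 6]
data = [0.5, 1.2, 2.3, 2.7, 2.9, 3.4, 4.1, 5.8]
p_hat = histogram_density(data, bin_edges=np.arange(0, 7))
# Bin [2, 3) holds 3 of the 8 points -> p_hat = 3 / (8 * 1) = 0.375
```

Because every point falls in some bin and bin widths multiply the counts, the estimate integrates to one over the covered range.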
Histogram Analysis
$$\int_{x-\Delta/2}^{x+\Delta/2} p(t)\,dt \approx \Delta \cdot p(x)$$

• Probability(generated point $\in \left(x - \tfrac{\Delta}{2},\, x + \tfrac{\Delta}{2}\right)$) $\approx \Delta \cdot p(x) \equiv z$

[Figure: density $p$ with one bin highlighted; bin size $\equiv \Delta$]
Histogram Analysis
$$P(k \text{ points falling in BIN out of } M \text{ points}) = \binom{M}{k} z^k (1-z)^{M-k}$$

$$E[\#\text{points in BIN}] = M \cdot z = M \cdot p(x) \cdot \Delta$$
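The binomial count and its expectation can be checked with a quick simulation (a sketch; the values of $M$, $z$, and the number of trials are arbitrary choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

M = 1000       # points generated per experiment
z = 0.2        # probability that one generated point lands in the bin
trials = 5000  # number of repeated experiments

# The number of points falling in the bin is Binomial(M, z)
counts = rng.binomial(M, z, size=trials)

expected = M * z           # theory: E[k] = M * z = 200
empirical = counts.mean()  # average count over the trials, close to 200
```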
Histogram Analysis
• Weak method of estimation: the estimate is discontinuous at bin edges and depends on the chosen bin origin and width

[Figure: two histogram estimates of the same data over $x \in [0, 6]$ with different bin placements]
Naïve Estimator
• Instead of partitioning the feature space $X$ into a number of prespecified ranges, we perform a similar range analysis around every query point $X$:

$$\hat{P}(X) = \frac{\#\text{points falling in } \left[X - \frac{h}{2},\, X + \frac{h}{2}\right)}{M h}$$

[Figure: window of width $h$ centered at $X$, from $X - \tfrac{h}{2}$ to $X + \tfrac{h}{2}$]
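A minimal sketch of the naïve estimator (the data points and the `naive_estimator` helper are illustrative, not from the slides):

```python
import numpy as np

def naive_estimator(x, data, h):
    """p_hat(x) = #{points in [x - h/2, x + h/2)} / (M * h)."""
    data = np.asarray(data)
    M = len(data)
    in_window = (data >= x - h / 2) & (data < x + h / 2)
    return in_window.sum() / (M * h)

data = [0.5, 1.2, 2.3, 2.7, 2.9, 3.4]
# Window [2.0, 3.0) around x = 2.5 with h = 1 holds 3 of the 6 points
p_hat = naive_estimator(2.5, data, h=1.0)  # 3 / (6 * 1) = 0.5
```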
Naïve Estimator
• Drawbacks: the estimate is discontinuous, with jumps at the points $X^{(m)} \pm \frac{h}{2}$, and every point inside the window gets the same weight regardless of its distance from $X$
Kernel Density Estimator
[Figure: a smooth bump function placed over the data]
Kernel Density Estimator
• Choose the bump function as a Gaussian with standard deviation (bandwidth) $h$:

$$\phi_h(x) = \frac{e^{-x^2 / 2h^2}}{\sqrt{2\pi}\, h}$$

[Figure: Gaussian bump $\phi_h(x)$ with bandwidth $h$]
Kernel Density Estimator
• The bump $\phi_h\!\left(X - X^{(m)}\right)$ is the same Gaussian centered at the data point $X^{(m)}$

[Figure: bump function $\phi_h\!\left(X - X^{(m)}\right)$ centered at $X^{(m)}$]
Kernel Density Estimator
[Figure: a function $f(x)$ and its shifted copy $f(x - x_1)$ centered at $x_1$]
Kernel Density Estimator
• Summation of bump functions:

$$\hat{P}(X) = \frac{1}{M} \sum_{m=1}^{M} \phi_h\!\left(X - X^{(m)}\right)$$

$\hat{P}(X)$ is the estimate of the density; the sum runs over the $M$ generated points
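The sum-of-bumps estimator can be sketched as follows (illustrative data and helper name; the bump is the Gaussian $\phi_h$ defined above):

```python
import numpy as np

def gaussian_kde_at(x, data, h):
    """P_hat(x) = (1/M) * sum_m phi_h(x - X^(m)),
    with phi_h a Gaussian of standard deviation h."""
    data = np.asarray(data)
    diffs = x - data  # x - X^(m) for every data point
    bumps = np.exp(-diffs**2 / (2 * h**2)) / (np.sqrt(2 * np.pi) * h)
    return bumps.mean()  # average of the M bumps

data = [1.0, 1.5, 2.0, 3.5]
x_grid = np.linspace(-2, 7, 901)
p_hat = np.array([gaussian_kde_at(x, data, h=0.5) for x in x_grid])

# The estimate integrates to ~1 over a wide enough range
area = float(p_hat.sum() * (x_grid[1] - x_grid[0]))
```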
Kernel Density Estimator
[Figure: true density compared with the kernel density estimate; vertical axis marks at 0.25 and 0.75]
Kernel Density Estimator
• $\phi_h$ does not have to be Gaussian; any kernel $g$ can be rescaled to bandwidth $h$:

$$\phi_h(x) = \frac{1}{h}\, g\!\left(\frac{x}{h}\right)$$

[Figure: kernel $g$ supported on $[-1, 1]$ and its rescaled version on $[-h, h]$]
Kernel Density Estimator
• The naïve estimator is equivalent to a Parzen window estimator with:

$$g(x) = \begin{cases} 1, & -\frac{1}{2} \le x < \frac{1}{2} \\ 0, & \text{otherwise} \end{cases}$$

• In this case:

$$\hat{P}(X) = \frac{1}{Mh} \sum_{m=1}^{M} g\!\left(\frac{X - X^{(m)}}{h}\right)$$

[Figure: box kernel of width $h$ centered at $X^{(m)}$, from $X^{(m)} - \tfrac{h}{2}$ to $X^{(m)} + \tfrac{h}{2}$]
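The equivalence can be checked numerically (a sketch with made-up data; the query points are chosen away from window boundaries, where the two half-open interval conventions could differ):

```python
import numpy as np

def box_g(u):
    """Uniform kernel: 1 on [-1/2, 1/2), 0 otherwise."""
    return ((u >= -0.5) & (u < 0.5)).astype(float)

def parzen_box(x, data, h):
    """P_hat(x) = (1/(M h)) * sum_m g((x - X^(m)) / h)."""
    data = np.asarray(data)
    return box_g((x - data) / h).sum() / (len(data) * h)

def naive_estimator(x, data, h):
    """#{points in a width-h window around x} / (M h)."""
    data = np.asarray(data)
    m = ((data >= x - h / 2) & (data < x + h / 2)).sum()
    return m / (len(data) * h)

data = [0.5, 1.2, 2.3, 2.7, 2.9, 3.4]
# The two estimators agree at every (off-boundary) query point
same = all(parzen_box(x, data, 1.0) == naive_estimator(x, data, 1.0)
           for x in np.arange(0.05, 4.0, 0.1))
```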
Kernel Density Estimator
1-D form:

$$\hat{P}(X) = \frac{1}{M} \sum_{m=1}^{M} \phi_h\!\left(X - X^{(m)}\right) = \frac{1}{Mh} \sum_{m=1}^{M} g\!\left(\frac{X - X^{(m)}}{h}\right)$$

$$\int_{-\infty}^{\infty} g(x)\, dx = 1 \quad \Rightarrow \quad \int_{-\infty}^{\infty} \hat{P}(x)\, dx = 1$$
Kernel Density Estimator
Multi-dimensional form ($N$ dimensions):

$$\hat{P}(X) = \frac{1}{M h^N} \sum_{m=1}^{M} g\!\left(\frac{X - X^{(m)}}{h}\right)$$

$$\int_{-\infty}^{\infty} g(X)\, dX = 1$$

[Figure: estimates with a large $h$ (smooth) and a small $h$ (bumpy)]
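A sketch of the multi-dimensional form with a product-Gaussian kernel $g$ (the data and grid are illustrative; the normalization $\int \hat{P}\, dX = 1$ is checked by a Riemann sum):

```python
import numpy as np

def kde_multidim(x, data, h):
    """P_hat(X) = 1/(M h^N) * sum_m g((X - X^(m)) / h), where g is the
    N-dimensional standard Gaussian, which integrates to 1."""
    data = np.asarray(data)                 # shape (M, N)
    M, N = data.shape
    u = (x - data) / h                      # scaled offsets, shape (M, N)
    g = np.exp(-0.5 * np.sum(u**2, axis=1)) / (2 * np.pi) ** (N / 2)
    return g.sum() / (M * h**N)

data = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])  # M = 3, N = 2
xs = np.arange(-3.0, 5.0, 0.1)

# Riemann sum of P_hat over the grid approximates its integral
vals = [kde_multidim(np.array([a, b]), data, 0.5) for a in xs for b in xs]
area = sum(vals) * 0.1 * 0.1
```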
How to choose h?
• Too small $h$ → bumpy or non-smooth estimate; too large $h$ → oversmoothed estimate
How to choose h?
[Figure: true density compared with estimates using too small an $h$ (noisy) and too large an $h$ (oversmoothed)]
Optimal h
• The optimal $H$ (diagonal bandwidth matrix) can be approximated as:

$$H_i = \left(\frac{4}{(N+2)\, M}\right)^{\frac{1}{N+4}} \sigma_i \qquad \text{(normal reference rule)}$$

where $\sigma_i = \sqrt{\hat{\Sigma}_{i,i}}$, the square root of the $i$-th diagonal entry of the sample covariance $\hat{\Sigma}$.

Bowman, A.W., and Azzalini, A. (1997), Applied Smoothing Techniques for Data Analysis, London: Oxford University Press [page 32].
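The normal reference rule can be sketched directly (assuming $\sigma_i$ is the per-dimension sample standard deviation; the data here is synthetic and the helper name is made up):

```python
import numpy as np

def normal_reference_bandwidth(data):
    """H_i = (4 / ((N + 2) * M))**(1 / (N + 4)) * sigma_i, where sigma_i
    is the sqrt of the i-th diagonal entry of the sample covariance."""
    data = np.asarray(data)  # shape (M, N)
    M, N = data.shape
    sigma = np.sqrt(np.diag(np.cov(data, rowvar=False)))
    factor = (4.0 / ((N + 2) * M)) ** (1.0 / (N + 4))
    return factor * sigma

rng = np.random.default_rng(42)
data = rng.normal(size=(500, 2))       # M = 500 points in N = 2 dims
H = normal_reference_bandwidth(data)   # one bandwidth per dimension
```

For standard-normal data with $M = 500$ and $N = 2$, the factor is $(4/2000)^{1/6} \approx 0.355$, so each bandwidth comes out near 0.35.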
Acknowledgment
• These slides were created based on the lecture notes of Prof. Dr. Amir Atiya