KDE - Direct Plug-In Method
$$h_{opt} = \left[ \frac{R(K)}{\mu_2^2(K)\, R(f^{(2)})\, n} \right]^{1/5}$$

$$AMISE(h_{opt}) = \frac{5}{4} \left[ \mu_2^2(K)\, R(K)^4\, R(f^{(2)}) \right]^{1/5} n^{-4/5}$$
Where:
- $R(K)$ and $\mu_2^2(K)$ are known constants determined by the choice of kernel function (e.g., Gaussian).
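As an illustration, the kernel constants and the bandwidth formula can be evaluated numerically. The sketch below assumes a Gaussian kernel; the helper names (`integrate`, `h_opt`) and the integration limits are illustrative choices, not part of the method.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def integrate(func, lo=-10.0, hi=10.0, steps=20000):
    """Composite trapezoidal rule on [lo, hi]."""
    width = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    for i in range(1, steps):
        total += func(lo + i * width)
    return total * width

# R(K): integral of K(u)^2; mu_2(K): integral of u^2 K(u).
roughness_k = integrate(lambda u: gaussian_kernel(u) ** 2)
mu2_k = integrate(lambda u: u * u * gaussian_kernel(u))

def h_opt(roughness_k, mu2_k, roughness_f2, n):
    """AMISE-optimal bandwidth, given R(K), mu_2(K), R(f'') and sample size n."""
    return (roughness_k / (mu2_k ** 2 * roughness_f2 * n)) ** 0.2

# If f were a standard normal density, R(f'') would be 3 / (8 sqrt(pi)).
roughness_f2 = 3.0 / (8.0 * math.sqrt(math.pi))
print(roughness_k, mu2_k, h_opt(roughness_k, mu2_k, roughness_f2, n=100))
```

For the Gaussian kernel the two constants have the closed forms $R(K) = 1/(2\sqrt{\pi})$ and $\mu_2(K) = 1$, so in practice no numerical integration is needed.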
The main problem in using the formula above in practice is that we don't know the value of $R(f^{(2)})$: the integral of the squared second derivative of the underlying probability density function $f(x)$, which we are trying to estimate.
How can we overcome this problem? We can estimate $R(f^{(2)})$ using the KDE itself.
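Before we go any further, we need to introduce a neat math trick. Assuming that, for $r \geq 0$, $f^{(r)}(\pm\infty) \to 0$, integration by parts gives the following relation:

$$R(f^{(r)}) = \int [f^{(r)}(x)]^2\, dx = (-1)^r \int f^{(2r)}(x)\, f(x)\, dx$$

So, if we apply it to $R(f^{(2)})$:

$$R(f^{(2)}) = \int f^{(4)}(x)\, f(x)\, dx$$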
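For reference, the case $r = 2$ can be worked out with two integrations by parts. This is a sketch that assumes $f$ is four times differentiable and that $f$ and its derivatives vanish at $\pm\infty$, so every boundary term drops out.

```latex
\begin{aligned}
R(f^{(2)}) &= \int f^{(2)}(x)\, f^{(2)}(x)\, dx \\
           &= \Big[ f^{(2)}(x)\, f^{(1)}(x) \Big]_{-\infty}^{\infty}
              - \int f^{(3)}(x)\, f^{(1)}(x)\, dx \\
           &= - \Big[ f^{(3)}(x)\, f(x) \Big]_{-\infty}^{\infty}
              + \int f^{(4)}(x)\, f(x)\, dx \\
           &= \int f^{(4)}(x)\, f(x)\, dx
\end{aligned}
```

Each integration by parts moves one derivative from one factor to the other and flips the sign, which is where the factor $(-1)^r$ in the general relation comes from.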
Remember that $f(x)$ is the probability density function, so $R(f^{(2)})$ is an expectation, $\psi_4 = E[f^{(4)}(X)]$, which we can estimate with a sample average:

$$\hat{\psi}_4 = \frac{1}{n} \sum_{i=1}^{n} \hat{f}^{(4)}(x_i)$$
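The two ideas above can be checked numerically on a density whose derivatives are known. The sketch below assumes a standard normal density, for which $\phi^{(2)}(x) = (x^2 - 1)\phi(x)$ and $\phi^{(4)}(x) = (x^4 - 6x^2 + 3)\phi(x)$. It uses the true fourth derivative in the sample average, which is only possible in a check like this one; the helper names and the sample size are illustrative.

```python
import math
import random

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def phi_d2(x):
    """Second derivative of the standard normal density."""
    return (x * x - 1.0) * phi(x)

def phi_d4(x):
    """Fourth derivative of the standard normal density."""
    return (x ** 4 - 6.0 * x * x + 3.0) * phi(x)

def integrate(func, lo=-10.0, hi=10.0, steps=20000):
    """Composite trapezoidal rule on [lo, hi]."""
    width = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    for i in range(1, steps):
        total += func(lo + i * width)
    return total * width

# Left side: integral of the squared second derivative.
roughness_f2 = integrate(lambda x: phi_d2(x) ** 2)
# Right side: integral of the fourth derivative times the density.
psi_4 = integrate(lambda x: phi_d4(x) * phi(x))

# The same quantity as an expectation, approximated by a sample average.
rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(20000)]
psi_4_sample_mean = sum(phi_d4(x) for x in sample) / len(sample)

print(roughness_f2, psi_4, psi_4_sample_mean)
```

Both integrals should agree with the closed form $3/(8\sqrt{\pi})$, and the sample average should approach the same value as the sample grows.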
ψr Estimation
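Using a KDE estimator with kernel function $L(\cdot)$ and bandwidth $g$, the estimate $\hat{f}(x; g)$ is defined as follows:

$$\hat{f}(x; g) = \frac{1}{ng} \sum_{i=1}^{n} L\left(\frac{x - X_i}{g}\right)$$

Its derivatives follow by differentiating the kernel:

$$\hat{f}^{(1)}(x; g) = \frac{1}{ng^2} \sum_{i=1}^{n} L^{(1)}\left(\frac{x - X_i}{g}\right)$$

...

$$\hat{f}^{(r)}(x; g) = \frac{1}{ng^{r+1}} \sum_{i=1}^{n} L^{(r)}\left(\frac{x - X_i}{g}\right)$$

Averaging $\hat{f}^{(r)}$ over the sample points gives the estimator of $\psi_r$:

$$\hat{\psi}_r = \frac{1}{n^2 g^{r+1}} \sum_{i=1}^{n} \sum_{j=1}^{n} L^{(r)}\left(\frac{X_i - X_j}{g}\right)$$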
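A minimal sketch of the double-sum estimator, assuming a Gaussian kernel for $L$ so that its derivatives can be written as $L^{(r)}(u) = (-1)^r He_r(u)\, \phi(u)$, where $He_r$ is the probabilists' Hermite polynomial. The function names `gaussian_derivative` and `psi_hat`, the sample, and the bandwidth value are illustrative only.

```python
import math
import random

def gaussian_derivative(u, order):
    """order-th derivative of the standard normal density at u."""
    # Hermite recurrence: He_{m+1}(u) = u * He_m(u) - m * He_{m-1}(u).
    h_prev, h_curr = 1.0, (u if order > 0 else 1.0)
    for m in range(1, order):
        h_prev, h_curr = h_curr, u * h_curr - m * h_prev
    density = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    return (-1) ** order * h_curr * density

def psi_hat(data, order, g):
    """Kernel estimate of psi_order with bandwidth g (O(n^2) double sum)."""
    n = len(data)
    total = 0.0
    for xi in data:
        for xj in data:
            total += gaussian_derivative((xi - xj) / g, order)
    return total / (n * n * g ** (order + 1))

# Usage on a small simulated sample; g = 0.6 is an arbitrary choice here.
rng = random.Random(1)
sample = [rng.gauss(0.0, 1.0) for _ in range(100)]
print(psi_hat(sample, 4, 0.6))
```

The double sum includes the diagonal terms $i = j$, each contributing $L^{(r)}(0)$. This is the source of the $L^{(r)}(0)$ term in the bias of the estimator.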
The next question is which kernel function $L(\cdot)$ and bandwidth $g$ to use. Are they the same ones we use for the final KDE? Not necessarily, because the goal here is to minimize the error in the $\hat{\psi}_r$ estimate, not the error in the density estimate.
Under certain regularity assumptions (Wand and Jones (1995)), the asymptotic bias and variance of $\hat{\psi}_r$ are obtained, so we can compute the asymptotic mean squared error (AMSE):

$$AMSE[\hat{\psi}_r(g)] = \left[ \frac{L^{(r)}(0)}{n g^{r+1}} + \frac{\mu_2(L)\, \psi_{r+2}\, g^2}{2} \right]^2 + \frac{2 R(L^{(r)})\, \psi_0}{n^2 g^{2r+1}} + \frac{4}{n} \left[ \int [f^{(r)}(x)]^2 f(x)\, dx - \psi_r^2 \right]$$
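For a kernel $L$ of order $k$, the AMSE-optimal bandwidth is:

$$g_{AMSE} = \left[ -\frac{k!\, L^{(r)}(0)}{\mu_k(L)\, \hat{\psi}_{r+k}\, n} \right]^{1/(r+k+1)}$$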
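The bandwidth formula above translates directly into a small function. This is a sketch: the function name `g_amse` and its argument names are illustrative, and the example values assume a Gaussian kernel ($k = 2$, $\mu_2 = 1$, $L^{(4)}(0) = 3/\sqrt{2\pi}$) together with the value of $\psi_6$ for a standard normal density.

```python
import math

def g_amse(r, k, deriv_at_zero, mu_k, psi_next, n):
    """AMSE-optimal bandwidth for estimating psi_r with a kernel of order k.

    deriv_at_zero is L^(r)(0), mu_k is the k-th kernel moment, and
    psi_next is psi_{r+k} (or an estimate of it).
    """
    ratio = -math.factorial(k) * deriv_at_zero / (mu_k * psi_next * n)
    if ratio <= 0.0:
        # The formula needs L^(r)(0) and mu_k * psi_{r+k} to have opposite signs.
        raise ValueError("bandwidth is not a positive real number")
    return ratio ** (1.0 / (r + k + 1))

# Example: bandwidth for estimating psi_4 from n = 100 observations.
deriv4_at_zero = 3.0 / math.sqrt(2.0 * math.pi)   # phi^(4)(0)
psi_6 = -15.0 / (16.0 * math.sqrt(math.pi))       # standard normal density
g = g_amse(r=4, k=2, deriv_at_zero=deriv4_at_zero, mu_k=1.0, psi_next=psi_6, n=100)
print(g)
```

The sign check matters in practice: for even $r$ and a Gaussian kernel, $L^{(r)}(0)$ and $\psi_{r+2}$ normally have opposite signs, but a poor estimate of $\psi_{r+2}$ can violate this.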
Note that a symmetric, non-negative kernel function has a positive $k$-th moment (i.e., $\mu_k(L) > 0$) for even $k$ (i.e., $k \in \{0, 2, 4, 6, 8, ...\}$).
- For k = 0 (Silverman method), we start with a known distribution (e.g., Gaussian), calculate $\hat{\psi}_4$ (analytically), and compute the optimal bandwidth.
- For k = 2 (Direct plug-in method, two stages), we start with a known distribution (e.g., Gaussian). Then:
  o Calculate $\hat{\psi}_8$ (analytically).
  o Stage 1:
    ▪ Compute the optimal bandwidth value for estimating $\hat{\psi}_6$.
    ▪ Calculate $\hat{\psi}_6$.
  o Stage 2:
    ▪ Compute the optimal bandwidth value for estimating $\hat{\psi}_4$.
    ▪ Calculate $\hat{\psi}_4$.
  o Calculate the optimal KDE bandwidth.
- For k = 4 (Direct plug-in method, four stages), we start with a known distribution (e.g., Gaussian). Then:
  o Calculate $\hat{\psi}_{12}$ (analytically).
  o Stage 1:
    ▪ Compute the optimal bandwidth value for estimating $\hat{\psi}_{10}$.
    ▪ Calculate $\hat{\psi}_{10}$.
  o Stage 2:
    ▪ Compute the optimal bandwidth value for estimating $\hat{\psi}_8$.
    ▪ Calculate $\hat{\psi}_8$.
  o Stage 3:
    ▪ Compute the optimal bandwidth value for estimating $\hat{\psi}_6$.
    ▪ Calculate $\hat{\psi}_6$.
  o Stage 4:
    ▪ Compute the optimal bandwidth value for estimating $\hat{\psi}_4$.
    ▪ Calculate $\hat{\psi}_4$.
  o Calculate the optimal KDE bandwidth.
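The pattern in the list above can be sketched in a few lines. This sketch assumes that the value of k in the list counts the estimation stages, so the analytic starting point is $\psi_{2k+4}$, and it uses the standard closed form of $\psi_r$ for a normal density with standard deviation $\sigma$, $\psi_r = (-1)^{r/2}\, r! / [(2\sigma)^{r+1} (r/2)! \sqrt{\pi}]$, for even $r$. The function names are illustrative.

```python
import math

def psi_normal_scale(r, sigma):
    """psi_r of a normal density with standard deviation sigma (r even)."""
    if r % 2 != 0:
        raise ValueError("r must be even")
    half = r // 2
    return ((-1) ** half * math.factorial(r)
            / ((2.0 * sigma) ** (r + 1) * math.factorial(half) * math.sqrt(math.pi)))

def plug_in_schedule(stages):
    """List the quantities computed, in order, for a given number of stages."""
    start = 2 * stages + 4
    plan = [f"psi_{start}: analytic value under the reference distribution"]
    for order in range(start - 2, 2, -2):
        plan.append(f"psi_{order}: kernel estimate, bandwidth from psi_{order + 2}")
    plan.append("h: KDE bandwidth from psi_4")
    return plan

for stages in (0, 2, 4):
    print(stages, plug_in_schedule(stages))
print(psi_normal_scale(8, 1.0))
```

With zero stages, $\psi_4$ itself comes from the reference distribution; each additional stage pushes the parametric assumption two derivatives further away from the quantity that the final bandwidth depends on.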
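2. Assume a Gaussian underlying distribution (i.e., $f(x) = \phi(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$), then calculate (analytically) $\hat{\psi}_8$, where $\hat{\sigma}$ is the standard deviation estimated from the sample:

$$\hat{\psi}_8 = \int_{-\infty}^{\infty} \phi^{(8)}(x)\, \phi(x)\, dx = \frac{105}{32 \sqrt{\pi}\, \hat{\sigma}^9}$$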
3. Calculate the optimal bandwidth for $\hat{\psi}_6$, $g_1$:

$$g_1 = \left[ -\frac{2 K^{(6)}(0)}{\mu_2(K)\, \hat{\psi}_8\, n} \right]^{1/9}$$
4. Estimate $\hat{\psi}_6$:

$$\hat{\psi}_6 = \frac{1}{n^2 g_1^7} \sum_{i=1}^{n} \sum_{j=1}^{n} K^{(6)}\left(\frac{X_i - X_j}{g_1}\right)$$
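5. Calculate the optimal bandwidth for $\hat{\psi}_4$, $g_2$:

$$g_2 = \left[ -\frac{2 K^{(4)}(0)}{\mu_2(K)\, \hat{\psi}_6\, n} \right]^{1/7}$$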
6. Estimate $\hat{\psi}_4$:

$$\hat{\psi}_4 = \frac{1}{n^2 g_2^5} \sum_{i=1}^{n} \sum_{j=1}^{n} K^{(4)}\left(\frac{X_i - X_j}{g_2}\right)$$
7. Calculate the optimal KDE bandwidth:

$$h_{DPI} = \left[ \frac{R(K)}{\mu_2^2(K)\, \hat{\psi}_4\, n} \right]^{1/5}$$
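The numbered steps above can be put together in a short, self-contained sketch. It assumes a Gaussian kernel $K$ (so $R(K) = 1/(2\sqrt{\pi})$, $\mu_2(K) = 1$, and $K^{(r)}(u) = (-1)^r He_r(u)\, \phi(u)$) and uses the sample standard deviation as the scale estimate $\hat{\sigma}$; both choices, and all function names, are assumptions made for illustration. The double sums are $O(n^2)$, so this is meant for small samples only.

```python
import math
import random

def gaussian_derivative(u, order):
    """order-th derivative of the standard normal density at u."""
    h_prev, h_curr = 1.0, (u if order > 0 else 1.0)
    for m in range(1, order):
        h_prev, h_curr = h_curr, u * h_curr - m * h_prev   # Hermite recurrence
    return (-1) ** order * h_curr * math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def psi_hat(data, order, g):
    """Kernel estimate of psi_order with bandwidth g."""
    n = len(data)
    total = sum(gaussian_derivative((xi - xj) / g, order)
                for xi in data for xj in data)
    return total / (n * n * g ** (order + 1))

def dpi_bandwidth(data):
    """Two-stage direct plug-in bandwidth for a Gaussian-kernel KDE."""
    n = len(data)
    mean = sum(data) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    roughness_k = 1.0 / (2.0 * math.sqrt(math.pi))   # R(K)
    mu2_k = 1.0                                      # mu_2(K)

    # Step 2: analytic psi_8 under a normal reference density.
    psi_8 = 105.0 / (32.0 * math.sqrt(math.pi) * sigma ** 9)
    # Steps 3 and 4: bandwidth g1, then the kernel estimate of psi_6.
    g1 = (-2.0 * gaussian_derivative(0.0, 6) / (mu2_k * psi_8 * n)) ** (1.0 / 9.0)
    psi_6 = psi_hat(data, 6, g1)
    if psi_6 >= 0.0:
        raise ValueError("psi_6 estimate is not negative; g2 is undefined")
    # Steps 5 and 6: bandwidth g2, then the kernel estimate of psi_4.
    g2 = (-2.0 * gaussian_derivative(0.0, 4) / (mu2_k * psi_6 * n)) ** (1.0 / 7.0)
    psi_4 = psi_hat(data, 4, g2)
    if psi_4 <= 0.0:
        raise ValueError("psi_4 estimate is not positive; h is undefined")
    # Step 7: final KDE bandwidth.
    return (roughness_k / (mu2_k ** 2 * psi_4 * n)) ** 0.2

rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(200)]
h_dpi = dpi_bandwidth(sample)
print(h_dpi)
```

Because every quantity in the chain scales with the spread of the data, multiplying the sample by a constant multiplies the resulting bandwidth by the same constant.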
The Sheather and Jones direct plug-in method is popular in practice for a broad set of cases and performs well for smooth densities, at least in simulation studies.
References
- Silverman, B.W. (1986). Density Estimation for Statistics and Data Analysis. Chapman & Hall/CRC, London.
- Zucchini, W. (2003). Applied Smoothing Techniques, Part 1: Kernel Density Estimation.
- Park, B.U. and Marron, J.S. (1990). Comparison of Data-Driven Bandwidth Selectors. Journal of the American Statistical Association, Vol. 85, No. 409, pp. 66-72.
- Sheather, S.J. and Jones, M.C. (1991). A Reliable Data-Based Bandwidth Selection Method for Kernel Density Estimation. Journal of the Royal Statistical Society, Series B, Vol. 53, pp. 683-690.