
Type: Technical Note

KDE – Direct Plug-in Method


A quick recap: In the KDE Optimization Primer, we derived the formula for the optimal bandwidth and
the minimum AMISE, as follows:

$$h_{opt} = \left[ \frac{R(K)}{\mu_2^2(K)\, R(f^{(2)})\, n} \right]^{1/5}$$

$$\mathrm{AMISE}(h_{opt}) = \frac{5}{4} \left[ \frac{R(f^{(2)})\, R(K)^4\, \mu_2^2(K)}{n^4} \right]^{1/5}$$
Where:

- $R(f^{(2)})$ is defined as follows:

$$R(f^{(2)}) = \int_{-\infty}^{\infty} \left[ f^{(2)}(x) \right]^2 dx$$

- $R(K)$ and $\mu_2^2(K)$ are known constant quantities determined by the selection of the kernel function (e.g., Gaussian).
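
For example, for the Gaussian kernel $K(u) = \phi(u) = \frac{1}{\sqrt{2\pi}} e^{-u^2/2}$, these constants are:

$$R(K) = \int_{-\infty}^{\infty} \phi^2(u)\, du = \frac{1}{2\sqrt{\pi}}, \qquad \mu_2(K) = \int_{-\infty}^{\infty} u^2 \phi(u)\, du = 1$$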
The main problem in using the formula above in practice is that we don't know the value of $R(f^{(2)})$: the integral of the squared second derivative of the underlying probability density function $f(x)$, which we are trying to estimate.

How can we overcome this problem? We can estimate $R(f^{(2)})$ using the KDE itself.

Before we go any further, we need to introduce a neat math trick. Assuming that, for any $r \geq 0$, $f^{(r)}(\pm\infty) \to 0$, the following relation holds (by repeated integration by parts):

$$\int_{-\infty}^{\infty} \left[ f^{(s)}(x) \right]^2 dx = (-1)^s \int_{-\infty}^{\infty} f^{(2s)}(x)\, f(x)\, dx$$
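
For instance, a single integration-by-parts step, with the boundary term vanishing under the assumption above, reads:

$$\int_{-\infty}^{\infty} f^{(s)}(x)\, f^{(s)}(x)\, dx = \Big[ f^{(s)}(x)\, f^{(s-1)}(x) \Big]_{-\infty}^{\infty} - \int_{-\infty}^{\infty} f^{(s+1)}(x)\, f^{(s-1)}(x)\, dx = -\int_{-\infty}^{\infty} f^{(s+1)}(x)\, f^{(s-1)}(x)\, dx$$

Repeating the step $s$ times shifts all the derivatives onto one factor and accumulates the $(-1)^s$ sign.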

So, if we are to apply it to $R(f^{(2)})$ (i.e., $s = 2$):

$$R(f^{(2)}) = \int_{-\infty}^{\infty} \left[ f^{(2)}(x) \right]^2 dx = \int_{-\infty}^{\infty} f^{(4)}(x)\, f(x)\, dx$$

Remember that $f(x)$ is the probability density function, so we can express $R(f^{(2)})$ as follows:

$$R(f^{(2)}) = \mathrm{E}\!\left[ f^{(4)}(X) \right] = \psi_4$$



In effect, we converted $R(f^{(2)})$ into the expectation of the 4th derivative of $f(x)$, which can be estimated (i.e., $\hat\psi_4$) non-parametrically using the sample data:

$$\hat\psi_4 = \frac{1}{n} \sum_{i=1}^{n} \hat f^{(4)}(X_i)$$

Where $\hat f^{(4)}(\cdot)$ is a data-driven estimator of the fourth derivative of $f(x)$.

$\psi_r$ Estimation
Using a KDE estimator with kernel function $L(\cdot)$ and bandwidth $g$, the estimator $\hat f(x; g)$ and its derivatives are defined as follows:

$$\hat f(x; g) = \frac{1}{ng} \sum_{i=1}^{n} L\!\left( \frac{x - X_i}{g} \right)$$

$$\hat f^{(1)}(x; g) = \frac{1}{ng^2} \sum_{i=1}^{n} L^{(1)}\!\left( \frac{x - X_i}{g} \right)$$

$$\vdots$$

$$\hat f^{(r)}(x; g) = \frac{1}{ng^{r+1}} \sum_{i=1}^{n} L^{(r)}\!\left( \frac{x - X_i}{g} \right)$$

Plugging $\hat f^{(r)}(x; g)$ into the sample average above, the estimator $\hat\psi_r$ of the expected $r$-th derivative is expressed as follows:

$$\hat\psi_r = \frac{1}{n^2 g^{r+1}} \sum_{i=1}^{n} \sum_{j=1}^{n} L^{(r)}\!\left( \frac{X_i - X_j}{g} \right)$$
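
To make this concrete, here is a minimal sketch of the double-sum estimator, assuming a Gaussian kernel $L = \phi$ so that $L^{(r)}(u) = (-1)^r \mathrm{He}_r(u)\,\phi(u)$, where $\mathrm{He}_r$ is the probabilists' Hermite polynomial. The names `gauss_deriv` and `psi_hat` are illustrative, not from any library, and the choice of $L$ and $g$ is discussed next:

```python
import numpy as np
from scipy.stats import norm
from numpy.polynomial.hermite_e import hermeval

def gauss_deriv(u, r):
    # r-th derivative of the standard normal pdf: phi^(r)(u) = (-1)^r He_r(u) phi(u)
    c = np.zeros(r + 1)
    c[r] = 1.0                                   # coefficients selecting He_r
    return ((-1) ** r) * hermeval(u, c) * norm.pdf(u)

def psi_hat(x, r, g):
    # Double-sum estimator: psi_r_hat = (n^2 g^(r+1))^-1 * sum_i sum_j phi^(r)((X_i - X_j)/g)
    x = np.asarray(x, dtype=float)
    n = x.size
    u = (x[:, None] - x[None, :]) / g            # all pairwise differences (X_i - X_j)/g
    return gauss_deriv(u, r).sum() / (n ** 2 * g ** (r + 1))
```

The double sum costs $O(n^2)$ kernel evaluations; practical implementations often use binned approximations for large samples.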

The next question is: what are the kernel function $L(\cdot)$ and the bandwidth $g$? Are they the same ones we use for the final KDE? Not necessarily, as our goal here is to minimize the error in the $\hat\psi_r$ estimate.

Under certain regularity assumptions (Wand and Jones (1995)), the asymptotic bias and variance of $\hat\psi_r$ can be obtained, so we can compute the asymptotic mean squared error (AMSE):

$$\mathrm{AMSE}\!\left[ \hat\psi_r(g) \right] = \left[ \frac{L^{(r)}(0)}{n g^{r+1}} + \frac{\mu_2(L)\, \psi_{r+2}\, g^2}{2} \right]^2 + \frac{2\, \psi_0\, R(L^{(r)})}{n^2 g^{2r+1}} + \frac{4}{n} \left[ \int_{-\infty}^{\infty} \left[ f^{(r)}(x) \right]^2 f(x)\, dx - \psi_r^2 \right]$$

And the AMSE-optimal bandwidth is:

$$g_{\mathrm{AMSE}} = \left[ \frac{-k!\, L^{(r)}(0)}{\mu_k(L)\, \hat\psi_{r+k}\, n} \right]^{\frac{1}{r+k+1}}$$



Where:

- $k$ is the number of stages, provided $\mu_k(L) \neq 0$.

Note that a symmetric kernel function has a positive $k$-th moment (i.e., $\mu_k(L) > 0$) for even $k$ (i.e., $k \in \{0, 2, 4, 6, 8, \ldots\}$).

- For $k = 0$ (Silverman method), we start with a known distribution (e.g., Gaussian), calculate $\hat\psi_4$ (analytically), and compute the optimal bandwidth.
- For $k = 2$ (Direct plug-in method), we start with a known distribution (e.g., Gaussian). Then:
  o Calculate $\hat\psi_8$ (analytically).
  o Stage 1:
    ▪ Compute the optimal bandwidth value for estimating $\hat\psi_6$.
    ▪ Calculate $\hat\psi_6$.
  o Stage 2:
    ▪ Compute the optimal bandwidth value for estimating $\hat\psi_4$.
    ▪ Calculate $\hat\psi_4$.
  o Calculate the optimal KDE bandwidth.
- For $k = 4$ (Direct plug-in method), we start with a known distribution (e.g., Gaussian). Then:
  o Calculate $\hat\psi_{12}$ (analytically).
  o Stage 1:
    ▪ Compute the optimal bandwidth value for estimating $\hat\psi_{10}$.
    ▪ Calculate $\hat\psi_{10}$.
  o Stage 2:
    ▪ Compute the optimal bandwidth value for estimating $\hat\psi_8$.
    ▪ Calculate $\hat\psi_8$.
  o Stage 3:
    ▪ Compute the optimal bandwidth value for estimating $\hat\psi_6$.
    ▪ Calculate $\hat\psi_6$.
  o Stage 4:
    ▪ Compute the optimal bandwidth value for estimating $\hat\psi_4$.
    ▪ Calculate $\hat\psi_4$.
  o Calculate the optimal KDE bandwidth.
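
As a concrete illustration of the $g_{\mathrm{AMSE}}$ formula above (used at each stage of the procedures just listed), here is a minimal sketch for a Gaussian kernel $L = \phi$, whose derivatives at zero follow from the probabilists' Hermite polynomials and whose even moments are $\mu_k(\phi) = (k-1)!!$. The helper name `g_amse` is illustrative only, not part of any library:

```python
import math
import numpy as np
from scipy.stats import norm
from numpy.polynomial.hermite_e import hermeval

def g_amse(r, k, psi_rk, n):
    # Pilot bandwidth for estimating psi_r, given an estimate psi_rk of psi_{r+k},
    # assuming a Gaussian kernel L = phi (an illustrative sketch).
    c = np.zeros(r + 1)
    c[r] = 1.0                                               # coefficients selecting He_r
    L_r_0 = ((-1) ** r) * hermeval(0.0, c) * norm.pdf(0.0)   # L^(r)(0) = phi^(r)(0)
    mu_k = math.prod(range(1, k, 2))                         # mu_k(phi) = (k - 1)!! for even k
    return (-math.factorial(k) * L_r_0 / (mu_k * psi_rk * n)) ** (1.0 / (r + k + 1))
```

For the two-stage case described below, step 3 of the direct plug-in algorithm corresponds to `g_amse(6, 2, psi8, n)` and step 5 to `g_amse(4, 2, psi6, n)`.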



Typically, two stages ($k = 2$) are considered a good trade-off between bias (which is mitigated as $k$ increases) and variance (which increases with $k$).

Direct Plug-in Method (Sheather & Jones)


This is the method proposed by Sheather and Jones (1991), where they consider $L = K$ and $k = 2$, yielding what we call the Direct Plug-In (DPI). The algorithm is:

1. Using the sample data, calculate $\hat\sigma = \min(s, \hat\sigma_{\mathrm{IQR}})$, where $s$ is the sample standard deviation and $\hat\sigma_{\mathrm{IQR}}$ is the interquartile-range-based scale estimate (typically the sample IQR divided by 1.349, the IQR of a standard normal).

2. Assume a Gaussian underlying distribution (i.e., $f(x) = \phi(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)$), then calculate (analytically) $\hat\psi_8$ using $\hat\sigma$ from step 1:

$$\hat\psi_8 = \int_{-\infty}^{\infty} \phi^{(8)}(x)\, \phi(x)\, dx = \frac{105}{32\sqrt{\pi}\, \hat\sigma^9}$$
3. Calculate the optimal bandwidth $g_1$ for estimating $\hat\psi_6$:

$$g_1 = \left[ \frac{-2 K^{(6)}(0)}{\mu_2(K)\, \hat\psi_8\, n} \right]^{\frac{1}{9}}$$

4. Estimate $\hat\psi_6$:

$$\hat\psi_6 = \frac{1}{n^2 g_1^7} \sum_{i=1}^{n} \sum_{j=1}^{n} K^{(6)}\!\left( \frac{X_i - X_j}{g_1} \right)$$

5. Calculate the optimal bandwidth $g_2$ for estimating $\hat\psi_4$:

$$g_2 = \left[ \frac{-2 K^{(4)}(0)}{\mu_2(K)\, \hat\psi_6\, n} \right]^{\frac{1}{7}}$$
6. Estimate $\hat\psi_4$:

$$\hat\psi_4 = \frac{1}{n^2 g_2^5} \sum_{i=1}^{n} \sum_{j=1}^{n} K^{(4)}\!\left( \frac{X_i - X_j}{g_2} \right)$$

7. Now, using $\hat\psi_4$ as an estimate for $R(f^{(2)})$:

$$h_{\mathrm{DPI}} = \left[ \frac{R(K)}{\mu_2^2(K)\, \hat\psi_4\, n} \right]^{\frac{1}{5}}$$
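
To tie the seven steps together, here is a minimal, self-contained sketch of the algorithm for the Gaussian kernel ($K = L = \phi$, so $\mu_2(K) = 1$ and $R(K) = 1/(2\sqrt{\pi})$). The function names (`dpi_bandwidth`, `gauss_deriv`, `psi_hat`) are illustrative, not part of any particular library, and $\hat\sigma_{\mathrm{IQR}}$ is taken as IQR/1.349:

```python
import numpy as np
from scipy.stats import norm, iqr
from numpy.polynomial.hermite_e import hermeval

def gauss_deriv(u, r):
    # phi^(r)(u) = (-1)^r He_r(u) phi(u), He_r = probabilists' Hermite polynomial
    c = np.zeros(r + 1)
    c[r] = 1.0
    return ((-1) ** r) * hermeval(u, c) * norm.pdf(u)

def psi_hat(x, r, g):
    # Steps 4 and 6: double-sum estimator of psi_r with pilot bandwidth g
    n = x.size
    u = (x[:, None] - x[None, :]) / g
    return gauss_deriv(u, r).sum() / (n ** 2 * g ** (r + 1))

def dpi_bandwidth(data):
    x = np.asarray(data, dtype=float)
    n = x.size
    # Step 1: robust scale estimate
    sigma = min(x.std(ddof=1), iqr(x) / 1.349)
    # Step 2: normal-reference value of psi_8
    psi8 = 105.0 / (32.0 * np.sqrt(np.pi) * sigma ** 9)
    # Steps 3-4: pilot bandwidth g1 (mu_2(phi) = 1), then estimate psi_6
    g1 = (-2.0 * gauss_deriv(0.0, 6) / (psi8 * n)) ** (1.0 / 9.0)
    psi6 = psi_hat(x, 6, g1)
    # Steps 5-6: pilot bandwidth g2, then estimate psi_4
    g2 = (-2.0 * gauss_deriv(0.0, 4) / (psi6 * n)) ** (1.0 / 7.0)
    psi4 = psi_hat(x, 4, g2)
    # Step 7: plug psi_4 into the AMISE-optimal bandwidth, with R(phi) = 1/(2 sqrt(pi))
    return (1.0 / (2.0 * np.sqrt(np.pi) * psi4 * n)) ** (1.0 / 5.0)
```

For a Gaussian sample, `dpi_bandwidth(np.random.normal(size=500))` should land close to the normal-reference bandwidth $(4/3)^{1/5}\,\hat\sigma\, n^{-1/5} \approx 1.06\,\hat\sigma\, n^{-1/5}$.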



Conclusion
In this paper, we assumed that the kernel function K is not only symmetric but also has four (4) continuous derivatives. This assumption excludes the use of many kernel functions (e.g., uniform, triangular, bi-weight, tri-weight), but fortunately the Gaussian kernel meets the conditions and is often used with the DPI method.

The Sheather and Jones Direct Plug-in method is popular in practice for a broad set of cases, yielding good performance for smooth densities, at least in simulation.

References
- Silverman, B.W. (1986). Density Estimation for Statistics and Data Analysis. Chapman & Hall/CRC, London.
- Zucchini, W. (2003). Applied Smoothing Techniques, Part 1: Kernel Density Estimation.
- Park, B.U. and Marron, J.S. (1990). Comparison of Data-Driven Bandwidth Selectors. Journal of the American Statistical Association, Vol. 85, No. 409, pp. 66-72.
- Sheather, S.J. and Jones, M.C. (1991). A Reliable Data-Based Bandwidth Selection Method for Kernel Density Estimation. Journal of the Royal Statistical Society, Series B, 53:683-690.
- Wand, M.P. and Jones, M.C. (1995). Kernel Smoothing. Chapman & Hall, London.

