Histogram: Nonparametric Kernel Density Estimation

The histogram is a nonparametric density estimate constructed by dividing the range into bins and counting the number of observations in each bin. It assigns each observation in a bin an equal density estimate equal to the relative frequency of observations in that bin divided by the bin width. The histogram depends on the choice of binwidth and origin.

Uploaded by

sabeeh iqbal


Histogram

Härdle, Müller, Sperlich, Werwatz, 1995, Nonparametric and Semiparametric Models, An Introduction

Nonparametric kernel density estimation

Tine Buch-Kromann

Nonparametric kernel density estimation Histogram

Construction

$X_1, \dots, X_n$ i.i.d. random variables with (unknown) density $f$.

Aim: Estimate the density and display it graphically.

Construction:
- Divide the range into bins

$$B_j = [x_0 + (j-1)h,\; x_0 + jh), \quad j \in \mathbb{Z},$$

  with origin $x_0$ and binwidth $h$.
- Count the observations in each $B_j$ (=: $n_j$).
- Normalize to 1: $f_j = \frac{n_j}{nh}$ (relative frequencies, divided by $h$).
- Draw bars with height $f_j$ for bin $B_j$.
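The construction steps can be sketched in Python with NumPy. This is a minimal illustration rather than part of the original slides: the function name `histogram_heights`, the simulated sample, and the choice $h = 0.5$ are assumptions made for the example.

```python
import numpy as np

def histogram_heights(data, h, x0=0.0):
    """Return bin indices j, left bin edges and heights f_j = n_j / (n h)."""
    data = np.asarray(data, dtype=float)
    n = data.size
    # Bin index j such that x0 + (j-1)h <= X_i < x0 + jh
    j = np.floor((data - x0) / h).astype(int) + 1
    # n_j for every non-empty bin
    bins, counts = np.unique(j, return_counts=True)
    left_edges = x0 + (bins - 1) * h
    heights = counts / (n * h)
    return bins, left_edges, heights

rng = np.random.default_rng(0)
sample = rng.normal(size=500)          # illustrative data only
bins, left_edges, heights = histogram_heights(sample, h=0.5)

# The bars have total area 1, as a density estimate should.
total_area = float(np.sum(heights * 0.5))
```

Only non-empty bins are returned; every other bin has height 0.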


Formula

Formula of the histogram:

$$\hat f_h(x) = \frac{1}{nh} \sum_{i=1}^{n} \sum_{j} \mathbf{1}(X_i \in B_j)\, \mathbf{1}(x \in B_j)$$

Note: Denote by $m_j$ the center of the bin $B_j$. The histogram assigns each $x$ in $B_j = [m_j - \frac{h}{2},\, m_j + \frac{h}{2})$ the same estimate, $\hat f_h(m_j)$, for $f$.
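For a fixed $x$ only the bin containing $x$ contributes to the sum over $j$, so the formula reduces to counting the observations that share a bin with $x$. A small sketch under that reading follows; the function name `hist_density` and the toy data are hypothetical.

```python
import numpy as np

def hist_density(x, data, h, x0=0.0):
    """Evaluate the histogram estimate at a single point x."""
    data = np.asarray(data, dtype=float)
    # Index j of the bin B_j = [x0 + (j-1)h, x0 + jh) that contains x
    j = np.floor((x - x0) / h) + 1
    lower, upper = x0 + (j - 1) * h, x0 + j * h
    in_bin = (data >= lower) & (data < upper)   # 1(X_i in B_j)
    return in_bin.sum() / (data.size * h)

data = np.array([0.1, 0.2, 1.4, 1.6, 1.9, 2.5])   # toy sample
value_edge = hist_density(1.0, data, h=1.0)    # left edge of B_2 = [1, 2)
value_centre = hist_density(1.5, data, h=1.0)  # bin centre m_2
value_other = hist_density(0.5, data, h=1.0)   # a point in B_1 = [0, 1)
```

Both points in $B_2$ receive the same value, which is the step-function behaviour described in the note.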


Derivation
Motivation of the histogram:
The probability that an observation $X$ falls into the bin $B_j = [m_j - \frac{h}{2},\, m_j + \frac{h}{2})$ is

$$P(X \in B_j) = \int_{B_j} f(u)\,du \approx f(m_j) \cdot h$$

Approximate this probability by the relative frequency of observations in the interval:

$$P(X \in B_j) \approx \frac{1}{n}\, \#\{X_i \in B_j\}$$

Combining the two approximations, we get

$$\hat f_h(m_j) = \frac{1}{nh}\, \#\{X_i \in B_j\}$$
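The first approximation, $P(X \in B_j) \approx f(m_j) \cdot h$, can be checked numerically. The sketch below assumes a standard normal density and the bin center $m_j = 0.5$; both are illustrative choices that do not come from the slides.

```python
import math

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bin_probability_error(m_j, h):
    """Absolute gap between the exact P(X in B_j) and f(m_j) * h."""
    exact = normal_cdf(m_j + h / 2) - normal_cdf(m_j - h / 2)
    return abs(exact - normal_pdf(m_j) * h)

# The gap shrinks quickly as the binwidth h gets smaller.
errors = {h: bin_probability_error(0.5, h) for h in (1.0, 0.5, 0.1)}
```

The shrinking gap is what justifies treating $f$ as roughly constant within a narrow bin.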


Binwidth
The histogram $\hat f_h(m_j)$ depends on the binwidth $h$ and the origin $x_0$.

The effect of the choice of binwidth is displayed in four histograms (figure not reproduced).


Statistical properties (Asymptotic)
Statistical properties of the histogram as an estimator of the unknown density.

Let $X_1, \dots, X_n \sim f$. We have

$$\hat f_h(x) = \frac{1}{nh} \sum_{i=1}^{n} \sum_{j} \mathbf{1}(X_i \in B_j)\, \mathbf{1}(x \in B_j)$$

Consistency:
Is $\hat f_h(x)$ a consistent estimator of $f(x)$, i.e. $\hat f_h(x) \xrightarrow{P} f(x)$?

Suppose the origin $x_0 = 0$. We want to estimate the density at $x \in B_j = [(j-1)h,\, jh)$:

$$\hat f_h(x) = \frac{1}{nh} \sum_{i=1}^{n} \mathbf{1}(X_i \in B_j)$$


Bias and Variance

Bias

$$E[\hat f_h(x) - f(x)] \approx f'(m_j) \cdot (m_j - x)$$

Note: The bias grows in magnitude with the slope $f'(m_j)$, and the bias is 0 if $x = m_j$.

Variance

$$V[\hat f_h(x)] \approx \frac{1}{nh} f(x)$$

Note: The variance is proportional to $f(x)$ and decreases when $nh$ increases.

Bias increases when $h$ increases and variance decreases when $h$ increases, i.e. we have to find a compromise between bias and variance to find an optimal $h$.
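A small Monte Carlo experiment can make the two approximations concrete. The sketch assumes standard normal data, $n = 200$, $h = 0.2$, origin $x_0 = 0$ and the evaluation point $x = 0.55$ in the bin $[0.4, 0.6)$; all of these values are illustrative assumptions, and the simulated numbers are only expected to be close to the approximations.

```python
import numpy as np

rng = np.random.default_rng(1)
n, h, reps = 200, 0.2, 5000
x = 0.55                    # evaluation point inside B_j = [0.4, 0.6)
lower, upper = 0.4, 0.6
m_j = 0.5                   # bin centre

def pdf(u):
    """Standard normal density f."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def pdf_prime(u):
    """Derivative f' of the standard normal density."""
    return -u * pdf(u)

# reps independent samples of size n, one histogram value per sample
samples = rng.normal(size=(reps, n))
counts = ((samples >= lower) & (samples < upper)).sum(axis=1)
estimates = counts / (n * h)

mc_bias = estimates.mean() - pdf(x)        # simulated bias at x
mc_var = estimates.var()                   # simulated variance at x
approx_bias = pdf_prime(m_j) * (m_j - x)   # f'(m_j) (m_j - x)
approx_var = pdf(x) / (n * h)              # f(x) / (n h)
```

Rerunning with a larger `h` should push the simulated bias up and the simulated variance down, which is the trade-off stated above.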


Mean Square Error (MSE)

Mean Square Error

$$\begin{aligned}
\mathrm{MSE}[\hat f_h(x)] &= E\big[(\hat f_h(x) - f(x))^2\big] \\
&= \text{Variance} + \text{Bias}^2 \quad \text{(general result)} \\
&\approx \frac{1}{nh} f(x) + \big[f'(m_j)\big]^2 (m_j - x)^2
\end{aligned}$$

Note: The histogram converges in mean square to $f(x)$ if $h \to 0$ and $nh \to \infty$. That means more and more observations and a smaller and smaller binwidth, but the binwidth must not shrink too fast.

Convergence in mean square implies convergence in probability: $\hat f_h(x)$ is a consistent estimator of $f(x)$.


Bias, variance and MSE for a histogram

Squared bias: Thin solid line.

Variance: Dashed line.
MSE: Thick line.


Mean Integrated Squared Error (MISE)
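MSE measures the accuracy of $\hat f_h(x)$ as an estimator of $f$ at a single point. But we want a global quality measure: MISE

$$\begin{aligned}
\mathrm{MISE}(\hat f_h) &= E\left[\int_{-\infty}^{\infty} \big(\hat f_h(x) - f(x)\big)^2\,dx\right] \\
&= \int_{-\infty}^{\infty} E\left[\big(\hat f_h(x) - f(x)\big)^2\right] dx \\
&= \int_{-\infty}^{\infty} \mathrm{MSE}\big[\hat f_h(x)\big]\,dx \\
&\;\;\vdots \\
&\approx \frac{1}{nh} + \frac{h^2}{12}\,\|f'\|_2^2 \\
&= \mathrm{AMISE}(\hat f_h)
\end{aligned}$$

where $\|f'\|_2^2 = \int_{-\infty}^{\infty} f'(x)^2\,dx$.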


Optimal Binwidth

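Criterion for selecting an optimal binwidth $h$:

Select $h$ that minimizes AMISE.

$$\frac{\partial\, \mathrm{AMISE}(\hat f_h)}{\partial h} = -\frac{1}{nh^2} + \frac{1}{6}\, h\, \|f'\|_2^2 = 0$$

Hence

$$h_0 = \left(\frac{6}{n\,\|f'\|_2^2}\right)^{1/3} \sim n^{-1/3}$$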
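The closed-form minimizer can be compared with a brute-force search over a grid of binwidths. In this sketch the value `roughness = 0.5` standing in for $\|f'\|_2^2$ is an arbitrary illustrative number, and the function names are hypothetical.

```python
import numpy as np

def amise(h, n, roughness):
    """AMISE = 1/(n h) + h^2/12 * ||f'||_2^2."""
    return 1.0 / (n * h) + h**2 / 12.0 * roughness

def optimal_binwidth(n, roughness):
    """Closed-form minimizer h0 = (6 / (n ||f'||_2^2))^(1/3)."""
    return (6.0 / (n * roughness)) ** (1.0 / 3.0)

n = 1000
roughness = 0.5                       # assumed value of ||f'||_2^2

grid = np.linspace(0.01, 2.0, 20000)  # candidate binwidths
h_grid = float(grid[np.argmin(amise(grid, n, roughness))])
h_closed = optimal_binwidth(n, roughness)

# n^(-1/3) rate: eight times more data halves the optimal binwidth
ratio = optimal_binwidth(8 * n, roughness) / h_closed
```

The grid minimizer and the formula should agree up to the grid spacing.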


Rule-of-thumb binwidth

Problem: $f$ is unknown, so we cannot calculate $\|f'\|_2^2$!

Solution: Assume that $f$ follows a specific distribution, e.g. the standard normal distribution. Then:

$$\|f'\|_2^2 = \frac{1}{4\sqrt{\pi}}$$

Therefore we get a rule-of-thumb binwidth:

$$h_0 = \left(\frac{6}{n\,\frac{1}{4\sqrt{\pi}}}\right)^{1/3} = \left(\frac{24\sqrt{\pi}}{n}\right)^{1/3} \approx 3.5\, n^{-1/3}$$
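Both numbers in the rule of thumb can be verified numerically. The sketch below integrates $f'(x)^2$ for the standard normal density with a simple Riemann sum and evaluates the constant in front of $n^{-1/3}$; the function name `rule_of_thumb_binwidth` and the integration grid are assumptions made for the illustration.

```python
import numpy as np

def rule_of_thumb_binwidth(n):
    """h0 = (6 / (n ||f'||_2^2))^(1/3) with ||f'||_2^2 = 1 / (4 sqrt(pi))."""
    roughness = 1.0 / (4.0 * np.sqrt(np.pi))
    return (6.0 / (n * roughness)) ** (1.0 / 3.0)

# Numerical check of ||f'||_2^2 for the standard normal density
u = np.linspace(-10.0, 10.0, 200001)
f_prime = -u * np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
du = u[1] - u[0]
roughness_numeric = float(np.sum(f_prime**2) * du)   # Riemann sum

constant = rule_of_thumb_binwidth(1)      # factor in front of n^(-1/3)
h_for_500 = rule_of_thumb_binwidth(500)   # binwidth for n = 500
```

The constant evaluates to roughly 3.49, which the slides round to 3.5. Note that the rule as stated presumes unit variance; for normal data with standard deviation $\sigma$ the same calculation gives $3.5\,\sigma\,n^{-1/3}$.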


Origin

The histogram depends on the origin


Drawbacks of the histogram

- Constant over interval (step function)
- Results depend on origin
- Binwidth choice
- Slow rate of convergence

Solution to the dependence on the origin $x_0$: Averaged Shifted Histogram (ASH)


Averaged shifted histogram (idea)
ASH is obtained by averaging over histograms corresponding to different origins.

It seems to correspond to a smaller binwidth than the histogram from which it is constructed. But it is not an ordinary histogram with a smaller binwidth.
Averaged shifted histogram

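Histogram with origin $x_0 = 0$ and bins

$$B_j = [(j-1)h,\; jh), \quad j \in \mathbb{Z}$$

Generate $M - 1$ new bin grids by shifting each $B_j$ by the amount $kh/M$ to the right:

$$B_{jk} = \left[\left(j - 1 + \frac{k}{M}\right)h,\; \left(j + \frac{k}{M}\right)h\right), \quad k \in \{1, \dots, M-1\}$$

Calculate a histogram for each bin grid:

$$\hat f_{h,k}(x) = \frac{1}{nh} \sum_{i=1}^{n} \left\{ \sum_{j} \mathbf{1}(X_i \in B_{jk})\, \mathbf{1}(x \in B_{jk}) \right\}$$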


Averaged shifted histogram

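Compute an average over these estimates, where $k = 0$ denotes the original grid ($B_{j0} = B_j$):

$$\begin{aligned}
\hat f_h(x) &= \frac{1}{M} \sum_{k=0}^{M-1} \frac{1}{nh} \sum_{i=1}^{n} \left\{ \sum_{j} \mathbf{1}(X_i \in B_{jk})\, \mathbf{1}(x \in B_{jk}) \right\} \\
&= \frac{1}{n} \sum_{i=1}^{n} \left\{ \frac{1}{Mh} \sum_{k=0}^{M-1} \sum_{j} \mathbf{1}(X_i \in B_{jk})\, \mathbf{1}(x \in B_{jk}) \right\}
\end{aligned}$$

Note: As $M \to \infty$, the ASH no longer depends on the origin, and the step function becomes a continuous function.

This is the motivation for kernel density estimation.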
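A direct, unoptimized sketch of the averaged shifted histogram evaluates $M$ ordinary histograms with origins $0, h/M, \dots, (M-1)h/M$ at the point $x$ and averages them. The function names and toy data below are hypothetical; this is not taken from the slides and makes no attempt at efficiency.

```python
import numpy as np

def hist_density(x, data, h, x0):
    """Ordinary histogram estimate at x for a grid with origin x0."""
    j = np.floor((x - x0) / h)
    lower = x0 + j * h                      # left edge of the bin holding x
    in_bin = (data >= lower) & (data < lower + h)
    return in_bin.sum() / (data.size * h)

def ash_density(x, data, h, M):
    """Average of M histograms with origins 0, h/M, ..., (M-1)h/M."""
    data = np.asarray(data, dtype=float)
    shifts = np.arange(M) * h / M           # k h / M for k = 0, ..., M-1
    return float(np.mean([hist_density(x, data, h, s) for s in shifts]))

data = np.array([0.1, 0.2, 1.4, 1.6, 1.9, 2.5])   # toy sample
plain = ash_density(1.5, data, h=1.0, M=1)     # M = 1: ordinary histogram
averaged = ash_density(1.5, data, h=1.0, M=2)  # grids shifted by 0 and 0.5
```

With $M = 2$ the point $x = 1.5$ lies in $[1, 2)$ on the original grid (3 observations) and in $[1.5, 2.5)$ on the shifted grid (2 observations), so the estimate is the mean of $3/6$ and $2/6$.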


Summary (1)
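- The formula of the histogram with binwidth $h$ and origin $x_0$:

$$\hat f_h(x) = \frac{1}{nh} \sum_{i=1}^{n} \sum_{j} \mathbf{1}(X_i \in B_j)\, \mathbf{1}(x \in B_j)$$

  where $B_j = [x_0 + (j-1)h,\; x_0 + jh)$ and $j \in \mathbb{Z}$.

- Bias

$$E[\hat f_h(x) - f(x)] \approx f'(m_j) \cdot (m_j - x)$$

- Variance

$$V[\hat f_h(x)] \approx \frac{1}{nh} f(x)$$

- The asymptotic MISE

$$\mathrm{AMISE} = \frac{1}{nh} + \frac{h^2}{12}\,\|f'\|_2^2$$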
Summary (2)

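- The optimal binwidth $h_0$ that minimizes AMISE

$$h_0 = \left(\frac{6}{n\,\|f'\|_2^2}\right)^{1/3} \sim n^{-1/3}$$

- The optimal binwidth $h_0$ that minimizes AMISE for $N(0,1)$ (rule of thumb)

$$h_0 \approx 3.5\, n^{-1/3}$$

- The averaged shifted histogram (ASH)

$$\hat f_h(x) = \frac{1}{n} \sum_{i=1}^{n} \left\{ \frac{1}{Mh} \sum_{k=0}^{M-1} \sum_{j} \mathbf{1}(X_i \in B_{jk})\, \mathbf{1}(x \in B_{jk}) \right\}$$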
