Histogram: Nonparametric Kernel Density Estimation

The histogram is a nonparametric density estimate constructed by dividing the range of the data into bins and counting the number of observations in each bin. Every point in a bin receives the same density estimate: the relative frequency of observations in that bin divided by the bin width. The histogram depends on the choice of binwidth and origin.

Histogram

Härdle, Müller, Sperlich, Werwatz, 1995, Nonparametric and Semiparametric Models, An Introduction

Nonparametric kernel density estimation

Tine Buch-Kromann

Nonparametric kernel density estimation Histogram


Construction

Let $X_1, \dots, X_n$ be i.i.d. random variables with (unknown) density $f$.


Aim: Estimate the density and display it graphically.

Construction:
- Divide the range into bins
  $$B_j = [x_0 + (j-1)h,\ x_0 + jh), \quad j \in \mathbb{Z},$$
  with origin $x_0$ and binwidth $h$.
- Count the observations in each $B_j$ (denote the count by $n_j$).
- Normalize to 1: $f_j = \frac{n_j}{nh}$ (relative frequencies, divided by $h$).
- Draw bars with height $f_j$ for bin $B_j$.
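The four construction steps can be sketched in Python as below. This is a minimal illustration using only the standard library; the function name `histogram_heights` and the toy data are hypothetical, and the bin index follows the convention $B_j = [x_0 + (j-1)h, x_0 + jh)$ used above.

```python
import math
from collections import Counter


def histogram_heights(data, x0, h):
    """Return {j: f_j} with f_j = n_j / (n h) for the bins
    B_j = [x0 + (j - 1) h, x0 + j h) that contain at least one observation."""
    n = len(data)
    # Observation x lies in B_j exactly when j = floor((x - x0) / h) + 1.
    counts = Counter(math.floor((x - x0) / h) + 1 for x in data)
    # Normalize the counts n_j so that the bars have total area 1.
    return {j: n_j / (n * h) for j, n_j in sorted(counts.items())}


data = [0.1, 0.4, 0.45, 1.2, 1.3, 2.7]   # toy sample (hypothetical)
heights = histogram_heights(data, x0=0.0, h=0.5)
# Bar heights: B_1 -> 1.0, B_3 -> 2/3, B_6 -> 1/3; empty bins have height 0.
print(heights)
```

Since $\sum_j f_j h = \sum_j n_j / n = 1$, the bars always enclose a total area of 1.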

Formula

Formula of the histogram:

$$\hat f_h(x) = \frac{1}{nh}\sum_{i=1}^{n}\sum_{j} \mathbf{1}(X_i \in B_j)\,\mathbf{1}(x \in B_j)$$

Note: Denote by $m_j$ the center of the bin $B_j$. The histogram assigns each $x$ in $B_j = [m_j - \frac{h}{2},\ m_j + \frac{h}{2})$ the same estimate for $f$, namely $\hat f_h(m_j)$.
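The formula can be evaluated pointwise as sketched below (a hypothetical helper `hist_estimate`, standard library only). For a fixed $x$ only one bin contains $x$, so the double sum collapses to a single bin count, and every $x$ in that bin gets the value at the bin centre.

```python
import math


def hist_estimate(x, data, x0, h):
    """Evaluate f_hat_h(x) = 1/(nh) * sum_i sum_j 1(X_i in B_j) 1(x in B_j).

    For a fixed x only one bin B_j contains x, so the double sum reduces
    to the number of observations in that bin."""
    j = math.floor((x - x0) / h) + 1           # index of the bin containing x
    lo, hi = x0 + (j - 1) * h, x0 + j * h      # B_j = [lo, hi)
    n_j = sum(1 for xi in data if lo <= xi < hi)
    return n_j / (len(data) * h)


data = [0.1, 0.4, 0.45, 1.2, 1.3, 2.7]    # toy sample (hypothetical)
h, x0 = 0.5, 0.0
m1 = x0 + h / 2                            # centre m_1 of B_1 = [0, 0.5)
# Every x in B_1 receives the same estimate as the bin centre m_1.
print([hist_estimate(x, data, x0, h) for x in (0.0, 0.2, m1, 0.49)])
```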

Derivation
Motivation of the histogram:
The probability that an observation $X$ falls into the bin $B_j = [m_j - \frac{h}{2},\ m_j + \frac{h}{2})$ is

$$P(X \in B_j) = \int_{B_j} f(u)\,du \approx f(m_j)\cdot h.$$

Approximate this probability by the relative frequency of observations in the interval:

$$P(X \in B_j) \approx \frac{1}{n}\,\#\{X_i \in B_j\}.$$

Combining the two, we get

$$\hat f_h(m_j) = \frac{1}{nh}\,\#\{X_i \in B_j\}.$$
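Both approximations in the derivation can be checked numerically. The sketch below assumes, purely for illustration, that $f$ is the standard normal density; it compares the exact bin probability with $f(m_j)h$, and the simulated $\frac{1}{nh}\#\{X_i \in B_j\}$ with $f(m_j)$. The bin centre, binwidth, sample size and seed are arbitrary choices.

```python
import math
import random


def phi(z):
    """Standard normal density (assumed true f for this illustration)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


m, h, n = 0.5, 0.1, 100_000                # bin centre m_j, binwidth, sample size
lo, hi = m - h / 2, m + h / 2              # B_j = [m_j - h/2, m_j + h/2)

exact_prob = Phi(hi) - Phi(lo)             # P(X in B_j) = integral of f over B_j
approx_prob = phi(m) * h                   # f(m_j) * h

rng = random.Random(42)                    # fixed seed for reproducibility
sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
count = sum(1 for x in sample if lo <= x < hi)
f_hat = count / (n * h)                    # (1 / nh) * #{X_i in B_j}

print(f"P(X in B_j) = {exact_prob:.5f}, f(m_j)h = {approx_prob:.5f}")
print(f"f_hat(m_j) = {f_hat:.4f}, f(m_j) = {phi(m):.4f}")
```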

Binwidth
The histogram $\hat f_h(m_j)$ depends on the binwidth $h$ and the origin $x_0$.

The effect of the choice of binwidth is displayed in the four histograms:

[Figure: four histograms with different binwidths]
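As a rough numerical stand-in for the four histograms, the sketch below bins one simulated sample (assumed standard normal, arbitrary seed) with four illustrative binwidths and reports how many bins are non-empty and how tall the tallest bar is. Smaller $h$ spreads the sample over many narrow bars; larger $h$ merges it into a few wide ones.

```python
import math
import random
from collections import Counter


def histogram_heights(data, x0, h):
    """Bar heights f_j = n_j / (n h) for the non-empty bins B_j."""
    counts = Counter(math.floor((x - x0) / h) + 1 for x in data)
    return {j: c / (len(data) * h) for j, c in counts.items()}


rng = random.Random(1)                         # fixed seed for reproducibility
data = [rng.gauss(0.0, 1.0) for _ in range(500)]

results = {}
for h in (0.1, 0.3, 0.6, 1.0):                 # four illustrative binwidths
    heights = histogram_heights(data, x0=0.0, h=h)
    results[h] = heights
    print(f"h = {h}: {len(heights)} non-empty bins, "
          f"tallest bar {max(heights.values()):.3f}")
```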

Statistical properties (Asymptotic)
Statistical properties of the histogram as an estimator of the
unknown density.

Let $X_1, \dots, X_n \sim f$. We have

$$\hat f_h(x) = \frac{1}{nh}\sum_{i=1}^{n}\sum_{j} \mathbf{1}(X_i \in B_j)\,\mathbf{1}(x \in B_j)$$

Consistency: Is $\hat f_h(x)$ a consistent estimator of $f(x)$, i.e. does $\hat f_h(x) \xrightarrow{P} f(x)$?

Suppose the origin is $x_0 = 0$. We want to estimate the density at $x \in B_j = [(j-1)h,\ jh)$:

$$\hat f_h(x) = \frac{1}{nh}\sum_{i=1}^{n} \mathbf{1}(X_i \in B_j)$$

Bias and Variance

Bias
$$E[\hat f_h(x) - f(x)] \approx f'(m_j)\,(m_j - x)$$
Note: The absolute bias increases with the slope $|f'(m_j)|$ and with the distance between $x$ and the bin center $m_j$; the approximate bias is 0 if $x = m_j$.

Variance
$$V[\hat f_h(x)] \approx \frac{1}{nh}\,f(x)$$
Note: The variance is proportional to $f(x)$ and decreases when $nh$ increases.

The bias increases when $h$ increases, while the variance decreases when $h$ increases. Hence we have to find a compromise between bias and variance to find an optimal $h$.
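Because $nh\,\hat f_h(x)$ is the number of observations in the bin containing $x$, it is Binomial$(n, p_j)$ with $p_j = P(X \in B_j)$. The exact bias $p_j/h - f(x)$ and exact variance $p_j(1-p_j)/(nh^2)$ are therefore available in closed form. The sketch below compares them with the approximations above, assuming (for illustration only) a standard normal $f$; the point $x$, binwidth and sample size are arbitrary.

```python
import math


def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def dphi(z):
    """Derivative f'(z) of the standard normal density."""
    return -z * phi(z)


def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


x, h, n, x0 = 0.05, 0.2, 1000, 0.0
j = math.floor((x - x0) / h) + 1
lo, hi = x0 + (j - 1) * h, x0 + j * h          # B_j = [0, 0.2)
m = (lo + hi) / 2                              # bin centre m_j
p = Phi(hi) - Phi(lo)                          # p_j = P(X in B_j)

exact_bias = p / h - phi(x)                    # E[f_hat(x)] - f(x)
approx_bias = dphi(m) * (m - x)                # f'(m_j) (m_j - x)
exact_var = p * (1 - p) / (n * h * h)          # Var[Binomial(n, p)] / (nh)^2
approx_var = phi(x) / (n * h)                  # f(x) / (nh)

print(f"bias: exact {exact_bias:.5f}, approx {approx_bias:.5f}")
print(f"variance: exact {exact_var:.6f}, approx {approx_var:.6f}")
```

The variance approximation essentially drops the factor $(1 - p_j)$, which tends to 1 as $h \to 0$.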

Mean Square Error (MSE)

Mean Square Error

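$$
\begin{aligned}
\mathrm{MSE}[\hat f_h(x)] &= E\left[\left(\hat f_h(x) - f(x)\right)^2\right] \\
&= \text{Variance} + \text{Bias}^2 \quad \text{(general result)} \\
&\approx \frac{1}{nh}\,f(x) + [f'(m_j)]^2\,(m_j - x)^2
\end{aligned}
$$
Note: The histogram converges in mean square to $f(x)$ if $h \to 0$ and $nh \to \infty$. That is, we need more and more observations and a smaller and smaller binwidth, but the binwidth must not shrink too fast (so that $nh$ still grows).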

Convergence in mean square implies convergence in probability, so $\hat f_h(x)$ is a consistent estimator of $f(x)$.
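Since $nh\,\hat f_h(x)$ is Binomial$(n, p_j)$ with $p_j = P(X \in B_j)$, the exact MSE at a point can be computed and followed as $n$ grows. The sketch below assumes a standard normal $f$ and, as one illustrative choice satisfying $h \to 0$ and $nh \to \infty$, takes $h = n^{-1/3}$; the MSE then shrinks towards 0, in line with consistency.

```python
import math


def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def exact_mse(x, n, h, x0=0.0):
    """Exact MSE of the histogram at x when the data are N(0, 1).

    n h f_hat(x) ~ Binomial(n, p_j), so MSE = variance + bias^2 exactly."""
    j = math.floor((x - x0) / h) + 1
    p = Phi(x0 + j * h) - Phi(x0 + (j - 1) * h)
    bias = p / h - phi(x)
    var = p * (1.0 - p) / (n * h * h)
    return var + bias ** 2


mses = []
for n in (10**2, 10**4, 10**6, 10**8):
    h = n ** (-1.0 / 3.0)       # h -> 0 while nh = n^(2/3) -> infinity
    mses.append(exact_mse(0.3, n, h))
    print(f"n = {n:>9}, h = {h:.4f}, MSE = {mses[-1]:.2e}")
```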

Bias, variance and MSE for a histogram

[Figure: squared bias (thin solid line), variance (dashed line) and MSE (thick line)]

Mean Integrated Squared Error (MISE)
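MSE measures the accuracy of $\hat f_h(x)$ as an estimator of $f$ at a single point, but we want a global quality measure, the MISE:

$$
\begin{aligned}
\mathrm{MISE}(\hat f_h) &= E\left[\int_{-\infty}^{\infty} \left(\hat f_h(x) - f(x)\right)^2 dx\right] \\
&= \int_{-\infty}^{\infty} E\left[\left(\hat f_h(x) - f(x)\right)^2\right] dx \\
&= \int_{-\infty}^{\infty} \mathrm{MSE}\left[\hat f_h(x)\right] dx \\
&\;\;\vdots \\
&\approx \frac{1}{nh} + \frac{h^2}{12}\,\|f'\|_2^2 = \mathrm{AMISE}(\hat f_h)
\end{aligned}
$$

where $\|f'\|_2^2 = \int_{-\infty}^{\infty} f'(x)^2\,dx$.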
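The elided steps can be filled in as follows. This is a sketch that assumes $f$ is smooth enough for the pointwise bias and variance approximations to hold within each bin. Integrating the variance term gives $1/(nh)$ because $f$ integrates to 1; integrating the squared bias over one bin uses $\int_{B_j}(m_j - x)^2\,dx = h^3/12$.

```latex
\begin{aligned}
\int_{-\infty}^{\infty} \frac{1}{nh}\, f(x)\,dx &= \frac{1}{nh}, \\
\int_{-\infty}^{\infty} \mathrm{Bias}^2\,dx
  &\approx \sum_j [f'(m_j)]^2 \int_{m_j - h/2}^{m_j + h/2} (m_j - x)^2\,dx
   = \sum_j [f'(m_j)]^2\, \frac{h^3}{12} \\
  &= \frac{h^2}{12} \sum_j [f'(m_j)]^2\, h
   \;\approx\; \frac{h^2}{12} \int_{-\infty}^{\infty} f'(x)^2\,dx
   = \frac{h^2}{12}\,\|f'\|_2^2 .
\end{aligned}
```

Adding the two terms gives the AMISE. The last step treats $\sum_j [f'(m_j)]^2 h$ as a Riemann sum, which is accurate when $h$ is small.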

Optimal Binwidth

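Criterion for selecting an optimal binwidth $h$: select the $h$ that minimizes AMISE.

$$\frac{\partial\,\mathrm{AMISE}(\hat f_h)}{\partial h} = -\frac{1}{nh^2} + \frac{1}{6}\,h\,\|f'\|_2^2 = 0$$

Hence

$$h_0 = \left(\frac{6}{n\,\|f'\|_2^2}\right)^{1/3} \sim n^{-1/3}.$$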
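A quick numerical check of the closed-form $h_0$: the sketch evaluates $\mathrm{AMISE}(h) = \frac{1}{nh} + \frac{h^2}{12}\|f'\|_2^2$ on a fine grid of $h$ values and confirms that the grid minimizer agrees with $h_0$. The values of $n$ and $\|f'\|_2^2$ (called `r` here) are arbitrary inputs chosen for illustration.

```python
def amise(h, n, r):
    """AMISE(h) = 1/(nh) + h^2/12 * r, where r = ||f'||_2^2."""
    return 1.0 / (n * h) + h * h / 12.0 * r


def optimal_binwidth(n, r):
    """Closed-form minimizer h_0 = (6 / (n r))^(1/3)."""
    return (6.0 / (n * r)) ** (1.0 / 3.0)


n, r = 1000, 0.141                       # illustrative sample size and ||f'||^2
h0 = optimal_binwidth(n, r)

grid = [k / 10000 for k in range(1, 20001)]     # h in (0, 2] with step 1e-4
h_grid = min(grid, key=lambda h: amise(h, n, r))
print(f"closed form h0 = {h0:.4f}, grid minimizer = {h_grid:.4f}")
```

Multiplying $n$ by 8 halves $h_0$, which is the $n^{-1/3}$ rate in action.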

Rule-of-thumb binwidth

Problem: $f$ is unknown, so we cannot calculate $\|f'\|_2^2$!

Solution: Assume that $f$ follows a specific distribution, e.g. the standard normal distribution. Then

$$\|f'\|_2^2 = \frac{1}{4\sqrt{\pi}}.$$

Therefore we get a rule-of-thumb binwidth:

$$h_0 = \left(\frac{6}{n \cdot \frac{1}{4\sqrt{\pi}}}\right)^{1/3} \approx 3.5\,n^{-1/3}$$
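The constant 3.5 can be reproduced directly. The sketch below computes $\|f'\|_2^2$ for the standard normal density by a midpoint-rule integration (the range $[-10, 10]$ and the step are arbitrary choices), compares it with $1/(4\sqrt{\pi})$, and evaluates $(6 \cdot 4\sqrt{\pi})^{1/3}$. The helper `rule_of_thumb_binwidth` is a hypothetical name.

```python
import math


def dphi(z):
    """f'(z) for the standard normal density f."""
    return -z * math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


# Midpoint rule for ||f'||_2^2 = integral of f'(x)^2 over [-10, 10]
# (the tails beyond |x| = 10 are negligible).
step = 1e-3
num = sum(dphi(-10.0 + (k + 0.5) * step) ** 2 for k in range(20000)) * step
closed_form = 1.0 / (4.0 * math.sqrt(math.pi))

const = (6.0 / closed_form) ** (1.0 / 3.0)     # h0 = const * n^(-1/3)
print(f"||f'||^2: numeric {num:.6f}, 1/(4 sqrt(pi)) = {closed_form:.6f}")
print(f"rule-of-thumb constant = {const:.4f}")


def rule_of_thumb_binwidth(n):
    """Rule-of-thumb binwidth for standard normal data: (6 / (n ||f'||^2))^(1/3)."""
    return const * n ** (-1.0 / 3.0)
```

The exact constant is $(24\sqrt{\pi})^{1/3} \approx 3.49$, which the slide rounds to 3.5.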

Origin

The histogram depends on the origin $x_0$: the same data and binwidth can give differently shaped histograms for different origins.
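To see this numerically, the sketch below evaluates the histogram at one fixed point for three origins $x_0$, keeping the data and binwidth fixed. The toy data, the point $x = 0.42$ and the origins are hypothetical.

```python
import math


def hist_estimate(x, data, x0, h):
    """Histogram estimate at x for bins B_j = [x0 + (j-1)h, x0 + jh)."""
    j = math.floor((x - x0) / h) + 1
    lo, hi = x0 + (j - 1) * h, x0 + j * h
    return sum(1 for xi in data if lo <= xi < hi) / (len(data) * h)


data = [0.1, 0.4, 0.45, 1.2, 1.3, 2.7]     # toy sample (hypothetical)
x, h = 0.42, 0.5
estimates = {x0: hist_estimate(x, data, x0, h) for x0 in (0.0, 0.25, 0.42)}
# Same data, same binwidth, same x: the estimate changes with the origin
# (1, 2/3 and 1/3 for the three origins above).
print(estimates)
```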

Drawbacks of the histogram

- Constant over each interval (step function)
- Results depend on the origin
- Binwidth choice
- Slow rate of convergence

Solution to the dependence on the origin $x_0$: the averaged shifted histogram (ASH).

Averaged shifted histogram (idea)
The ASH is obtained by averaging over histograms corresponding to different origins.

It seems to correspond to a smaller binwidth than the histogram from which it is constructed, but it is not an ordinary histogram with a smaller binwidth.
Averaged shifted histogram

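Histogram with origin $x_0 = 0$ and bins

$$B_j = [(j-1)h,\ jh), \quad j \in \mathbb{Z}.$$

Generate $M - 1$ new bin grids by shifting each $B_j$ by the amount $kh/M$ to the right:

$$B_{jk} = \left[\left(j - 1 + \frac{k}{M}\right)h,\ \left(j + \frac{k}{M}\right)h\right), \quad k \in \{1, \dots, M-1\}.$$

The original grid corresponds to $k = 0$, i.e. $B_{j0} = B_j$.

Calculate a histogram for each bin grid:

$$\hat f_{h,k}(x) = \frac{1}{nh}\sum_{i=1}^{n}\sum_{j} \mathbf{1}(X_i \in B_{jk})\,\mathbf{1}(x \in B_{jk})$$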
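The shifted bin grids and the per-grid histograms $\hat f_{h,k}$ can be sketched as below, with origin $x_0 = 0$ as on this slide. The function name `shifted_hist` and the toy data are hypothetical; $k = 0$ reproduces the ordinary histogram.

```python
import math


def shifted_hist(x, data, h, M, k):
    """f_{h,k}(x): histogram on the grid B_jk = [(j-1+k/M)h, (j+k/M)h).

    Shifting the grid by kh/M is the same as using origin x0 = kh/M."""
    shift = k * h / M
    j = math.floor((x - shift) / h) + 1        # index of B_jk containing x
    lo, hi = (j - 1) * h + shift, j * h + shift
    n_in_bin = sum(1 for xi in data if lo <= xi < hi)
    return n_in_bin / (len(data) * h)


data = [0.1, 0.4, 0.45, 1.2, 1.3, 2.7]     # toy sample (hypothetical)
h, M, x = 0.5, 4, 0.42
values = [shifted_hist(x, data, h, M, k) for k in range(M)]
print(values)   # one estimate of f(0.42) per bin grid k = 0, ..., M-1
```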

Averaged shifted histogram

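Compute an average over these estimates:

$$
\begin{aligned}
\hat f_h(x) &= \frac{1}{M}\sum_{k=0}^{M-1}\left[\frac{1}{nh}\sum_{i=1}^{n}\sum_{j} \mathbf{1}(X_i \in B_{jk})\,\mathbf{1}(x \in B_{jk})\right] \\
&= \frac{1}{n}\sum_{i=1}^{n}\left[\frac{1}{Mh}\sum_{k=0}^{M-1}\sum_{j} \mathbf{1}(X_i \in B_{jk})\,\mathbf{1}(x \in B_{jk})\right]
\end{aligned}
$$

Note: As $M \to \infty$, the ASH no longer depends on the origin, i.e. the step function becomes a continuous function.

This motivates kernel density estimation.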
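A compact ASH sketch follows, assuming $x_0 = 0$ and hypothetical function names and data. `ash` implements the second line of the formula (each observation counts once for every grid $k$ in which it shares a bin with $x$), and `ash_by_averaging` implements the first line (the mean of the $M$ shifted histograms). Because all grid boundaries are multiples of $h/M$, the ASH is constant on each interval $[l h/M, (l+1) h/M)$, so a midpoint sum over those intervals gives its exact integral, which should be 1.

```python
import math


def ash(x, data, h, M):
    """Averaged shifted histogram at x (origin 0), in the form
    1/n * sum_i [ 1/(Mh) * sum_k sum_j 1(X_i in B_jk) 1(x in B_jk) ]."""
    total = 0
    for xi in data:
        for k in range(M):
            shift = k * h / M
            # X_i and x share a bin of grid k iff their bin indices agree.
            if math.floor((xi - shift) / h) == math.floor((x - shift) / h):
                total += 1
    return total / (len(data) * M * h)


def ash_by_averaging(x, data, h, M):
    """First form: average of the M shifted histograms f_{h,k}(x)."""
    def f_hk(k):
        shift = k * h / M
        j = math.floor((x - shift) / h)
        count = sum(1 for xi in data if math.floor((xi - shift) / h) == j)
        return count / (len(data) * h)
    return sum(f_hk(k) for k in range(M)) / M


data = [0.1, 0.4, 0.45, 1.2, 1.3, 2.7]     # toy sample (hypothetical)
h, M = 0.5, 4
delta = h / M                               # ASH is constant on [l*delta, (l+1)*delta)
grid = [(l + 0.5) * delta for l in range(-8, 32)]   # midpoints covering [-1, 4)
area = sum(ash(x, data, h, M) for x in grid) * delta
print(f"ASH at 0.42: {ash(0.42, data, h, M):.4f}, total area: {area:.6f}")
```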

Summary (1)
- The formula of the histogram with binwidth $h$ and origin $x_0$:
  $$\hat f_h(x) = \frac{1}{nh}\sum_{i=1}^{n}\sum_{j} \mathbf{1}(X_i \in B_j)\,\mathbf{1}(x \in B_j),$$
  where $B_j = [x_0 + (j-1)h,\ x_0 + jh)$ and $j \in \mathbb{Z}$.
- Bias
  $$E[\hat f_h(x) - f(x)] \approx f'(m_j)\,(m_j - x)$$
- Variance
  $$V[\hat f_h(x)] \approx \frac{1}{nh}\,f(x)$$
- The asymptotic MISE
  $$\mathrm{AMISE} = \frac{1}{nh} + \frac{h^2}{12}\,\|f'\|_2^2$$
Summary (2)

- The optimal binwidth $h_0$ that minimizes AMISE
  $$h_0 = \left(\frac{6}{n\,\|f'\|_2^2}\right)^{1/3} \sim n^{-1/3}$$
- The optimal binwidth $h_0$ that minimizes AMISE for $N(0,1)$ (rule of thumb)
  $$h_0 \approx 3.5\,n^{-1/3}$$
- The averaged shifted histogram (ASH)
  $$\hat f_h(x) = \frac{1}{n}\sum_{i=1}^{n}\left[\frac{1}{Mh}\sum_{k=0}^{M-1}\sum_{j} \mathbf{1}(X_i \in B_{jk})\,\mathbf{1}(x \in B_{jk})\right]$$
