Normal Statistics Estimation
Let us consider a Matlab example based on the dataset of body temperature measurements
of 130 individuals from the article [1]. The dataset can be downloaded from the journal’s
website. This dataset was derived from the article [2]. First of all, if we use ’dfittool’ to fit a
normal distribution to this data we get a pretty good approximation, see figure 5.1.
Figure 5.1: Fitting a body temperature dataset. (a) Histogram of the data and p.d.f. of fitted
normal distribution; (b) Empirical c.d.f. and c.d.f. of fitted normal distribution.
The tool also outputs the following MLE estimates µ̂ and σ̂ of the parameters µ, σ of the normal distribution:

Parameter   Estimate    Std. Err.
mu          98.2492     0.0643044
sigma       0.733183    0.0457347
Also, if our dataset vector is named 'normtemp' then using the Matlab function 'normfit' by typing '[mu,sigma,muint,sigmaint]=normfit(normtemp)' outputs the following:
The last two intervals here are 95% confidence intervals for the parameters µ and σ. This means that not only are we able to estimate the parameters of the normal distribution using MLE, but we can also guarantee with confidence 95% that the 'true' unknown parameters of the distribution belong to these confidence intervals. How this is done is the topic of this lecture. Notice that the conventional 'normal' temperature 98.6 does not fall into the estimated 95% confidence interval [98.122, 98.376].
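For readers without Matlab at hand, the same estimates and intervals can be sketched in Python with numpy/scipy. Since the dataset itself is not reproduced in these notes, a simulated sample with parameters close to the fitted values stands in for 'normtemp'; the variable names mirror normfit's outputs, and with the real data vector the same calls apply unchanged.

```python
import numpy as np
from scipy import stats

# Stand-in for the body-temperature data (not reproduced in the notes):
# a simulated sample with mean and std close to the fitted values.
rng = np.random.default_rng(0)
normtemp = rng.normal(98.25, 0.73, size=130)

n = len(normtemp)
mu_hat = normtemp.mean()
sigma_hat = normtemp.std(ddof=1)  # normfit reports the unbiased estimate

# 95% confidence intervals in the style of normfit:
# for mu, based on the t-distribution with n-1 degrees of freedom;
# for sigma, based on the chi-squared distribution with n-1 degrees of freedom.
t = stats.t.ppf(0.975, n - 1)
muint = (mu_hat - t * sigma_hat / np.sqrt(n),
         mu_hat + t * sigma_hat / np.sqrt(n))
chi2_lo, chi2_hi = stats.chi2.ppf([0.025, 0.975], n - 1)
sigmaint = (sigma_hat * np.sqrt((n - 1) / chi2_hi),
            sigma_hat * np.sqrt((n - 1) / chi2_lo))

print(mu_hat, sigma_hat, muint, sigmaint)
```

The interval constructions used here are exactly the ones derived in the rest of this lecture.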
Distribution of the estimates of parameters of normal distribution.
Let us consider a sample

X₁, ..., Xₙ ∼ N(µ, σ²)

from a normal distribution with mean µ and variance σ². The MLE gave us the following estimates of µ and σ²: µ̂ = X̄ and σ̂² = X̄² − (X̄)², where X̄² denotes the average of X₁², ..., Xₙ². The question is: how close are these estimates to the actual values of the unknown parameters µ and σ²? By the LLN we know that these estimates converge to µ and σ²,

X̄ → µ,  X̄² − (X̄)² → σ²  as n → ∞,

but we will try to describe precisely how close X̄ and X̄² − (X̄)² are to µ and σ². We will start by studying the following question:

What is the joint distribution of (X̄, X̄² − (X̄)²) when X₁, ..., Xₙ are i.i.d. from N(0, 1)?

A similar question for a sample from a general normal distribution N(µ, σ²) can be reduced to this one by renormalizing Zᵢ = (Xᵢ − µ)/σ. We will need the following definition.
Definition. If X₁, ..., Xₙ are i.i.d. standard normal then the distribution of

X₁² + ... + Xₙ²

is called the χ²ₙ-distribution (chi-squared) with n degrees of freedom.
Let

Y = (Y₁, ..., Yₙ)ᵀ = V X,

where X = (X₁, ..., Xₙ)ᵀ and V is a matrix whose first row is

v₁ = (1/√n, ..., 1/√n),
and let the remaining rows be any vectors such that the matrix V defines an orthogonal transformation. This can be done since the length of the first row vector is |v₁| = 1, and we can
simply choose the rows v2 , . . . , vn to be any orthogonal basis in the hyperplane orthogonal
to vector v1 .
Let us discuss some properties of this particular transformation. First of all, we showed
above that Y1 , . . . , Yn are also i.i.d. standard normal. Because of the particular choice of the
first row v1 in V, the first r.v.
Y₁ = (1/√n) X₁ + ... + (1/√n) Xₙ = √n X̄

and, therefore,

X̄ = (1/√n) Y₁.   (5.0.1)
Next, n times the sample variance can be written as

n(X̄² − (X̄)²) = X₁² + ... + Xₙ² − ((1/√n)(X₁ + ... + Xₙ))²
             = X₁² + ... + Xₙ² − Y₁²
             = Y₂² + ... + Yₙ²,   (5.0.2)

where in the last step we used that an orthogonal transformation preserves the length of a vector, X₁² + ... + Xₙ² = Y₁² + ... + Yₙ². Equations (5.0.1) and (5.0.2) show that the sample mean and the sample variance are independent, since Y₁ and (Y₂, ..., Yₙ) are independent; √n X̄ = Y₁ has standard normal distribution and n(X̄² − (X̄)²) has χ²ₙ₋₁-distribution since Y₂, ..., Yₙ are independent standard normal.
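This claim is easy to check by simulation. The following Monte Carlo sketch in Python (numpy assumed; not part of the original notes) draws many samples of size n from N(0, 1) and compares the empirical moments of √n X̄ and n(X̄² − (X̄)²) with those of N(0, 1) and χ²ₙ₋₁:

```python
import numpy as np

# Monte Carlo check: for X1,...,Xn i.i.d. N(0,1), sqrt(n)*Xbar is N(0,1),
# n*(mean of Xi^2 minus Xbar^2) is chi^2 with n-1 degrees of freedom,
# and the two are independent.
rng = np.random.default_rng(1)
n, trials = 5, 100_000
X = rng.normal(size=(trials, n))

Z = np.sqrt(n) * X.mean(axis=1)                        # sqrt(n)*Xbar, one per trial
S = n * ((X ** 2).mean(axis=1) - X.mean(axis=1) ** 2)  # n * sample variance

print(Z.mean(), Z.var())             # close to 0 and 1 (standard normal)
print(S.mean(), S.var())             # close to n-1 = 4 and 2(n-1) = 8 (chi^2_4)
print(np.corrcoef(Z ** 2, S)[0, 1])  # close to 0, consistent with independence
```

The empirical mean and variance of S match the χ²₄ values n − 1 = 4 and 2(n − 1) = 8, and the near-zero correlation between Z² and S is consistent with (though weaker than) independence.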
Let us write down the implications of this result for a general normal distribution:
X₁, ..., Xₙ ∼ N(µ, σ²).
In this case, we know that

Z₁ = (X₁ − µ)/σ, ..., Zₙ = (Xₙ − µ)/σ ∼ N(0, 1)

are independent standard normal. The result above applied to Z₁, ..., Zₙ gives that

√n Z̄ = √n · (1/n) Σᵢ₌₁ⁿ (Xᵢ − µ)/σ = √n (X̄ − µ)/σ ∼ N(0, 1)
and

n(Z̄² − (Z̄)²) = n ( (1/n) Σᵢ₌₁ⁿ ((Xᵢ − µ)/σ)² − ((1/n) Σᵢ₌₁ⁿ (Xᵢ − µ)/σ)² )
             = n (X̄² − (X̄)²)/σ² ∼ χ²ₙ₋₁.
We proved that the MLE µ̂ = X̄ and σ̂² = X̄² − (X̄)² are independent and

√n (µ̂ − µ)/σ ∼ N(0, 1),   n σ̂²/σ² ∼ χ²ₙ₋₁.
This definition means that we can guarantee with probability/confidence α that our unknown parameter lies within the interval [S₁, S₂]. We will now show how, in the case of a normal distribution N(µ, σ²), we can construct confidence intervals for the unknown µ and σ².
Let us recall that in the last lecture we proved that if X₁, ..., Xₙ ∼ N(µ, σ²) then

A = √n (µ̂ − µ)/σ ∼ N(0, 1)  and  B = n σ̂²/σ² ∼ χ²ₙ₋₁

and the random variables A and B are independent. If we recall the definition of the χ²-distribution, this means that we can represent A and B as

A = Y₁,  B = Y₂² + ... + Yₙ²

for some Y₁, ..., Yₙ i.i.d. standard normal.
Figure 5.2: P.d.f. of the χ²ₙ₋₁-distribution, with points c₁ and c₂ cutting off an area of (1 − α)/2 in each tail.
First, let us consider the p.d.f. of the χ²ₙ₋₁-distribution (see figure 5.2) and choose points c₁ and c₂ so that the area in each tail is (1 − α)/2. Then the area between c₁ and c₂ is α, which means that

P(c₁ ≤ B ≤ c₂) = α.

Therefore, we can 'guarantee' with probability α that

c₁ ≤ n σ̂²/σ² ≤ c₂.
Solving this for σ² gives

n σ̂²/c₂ ≤ σ² ≤ n σ̂²/c₁.

This precisely means that the interval

[n σ̂²/c₂, n σ̂²/c₁]

is an α-confidence interval for the unknown variance σ².
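As a sketch of how this interval is computed in practice, the following Python function (scipy assumed; the function name is ours, not standard) replaces the χ²ₙ₋₁ table lookup for c₁ and c₂ with scipy's quantile function:

```python
import numpy as np
from scipy import stats

def var_confidence_interval(x, conf=0.95):
    """Confidence interval for sigma^2 of a normal sample, using the MLE
    sigma_hat^2 = mean(x^2) - mean(x)^2 and n*sigma_hat^2/sigma^2 ~ chi^2_{n-1}."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    sigma2_hat = np.mean(x ** 2) - np.mean(x) ** 2
    # c1, c2 cut off an area of (1-conf)/2 in each tail of chi^2_{n-1}
    c1, c2 = stats.chi2.ppf([(1 - conf) / 2, (1 + conf) / 2], n - 1)
    return n * sigma2_hat / c2, n * sigma2_hat / c1

# illustrative use on a simulated N(0, 2^2) sample
rng = np.random.default_rng(2)
lo, hi = var_confidence_interval(rng.normal(0, 2, size=50))
print(lo, hi)  # an interval that should cover sigma^2 = 4 about 95% of the time
```

Note that the lower endpoint uses the larger quantile c₂ and the upper endpoint the smaller quantile c₁, because σ² appears in the denominator of B.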
Definition. If Y₀, Y₁, ..., Yₙ are i.i.d. standard normal then the distribution of the random variable

Y₀ / √((1/n)(Y₁² + ... + Yₙ²))

is called the (Student) tₙ-distribution with n degrees of freedom.
We will find the p.d.f. of this distribution in the following lectures, together with the p.d.f. of the χ²-distribution and some others. At this point we only note that this distribution does not depend on any parameters besides the degrees of freedom n and, therefore, it can be tabulated.
Consider the following expression:

A / √((1/(n−1)) B) = Y₁ / √((1/(n−1))(Y₂² + ... + Yₙ²)) ∼ tₙ₋₁

which, by definition, has tₙ₋₁-distribution with n − 1 degrees of freedom. On the other hand,
A / √((1/(n−1)) B) = √n (µ̂ − µ)/σ · 1/√((1/(n−1)) · n σ̂²/σ²) = √(n−1) (µ̂ − µ)/σ̂.
If we now look at the p.d.f. of the tₙ₋₁-distribution (see figure 5.3) and choose the constants −c and c so that the area in each tail is (1 − α)/2 (the constant is the same on each side because the distribution is symmetric), we get that with probability α,

−c ≤ √(n−1) (µ̂ − µ)/σ̂ ≤ c

and, solving this for µ, we get the confidence interval

µ̂ − c σ̂/√(n−1) ≤ µ ≤ µ̂ + c σ̂/√(n−1).
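This interval for µ can likewise be computed with scipy's t-quantile in place of a table (a Python sketch, not part of the original notes; the function name is ours):

```python
import numpy as np
from scipy import stats

def mean_confidence_interval(x, conf=0.95):
    """Confidence interval mu_hat -/+ c*sigma_hat/sqrt(n-1) for the mean of a
    normal sample, with c the (1+conf)/2 quantile of the t_{n-1} distribution
    and sigma_hat the MLE of sigma."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    mu_hat = np.mean(x)
    sigma_hat = np.sqrt(np.mean(x ** 2) - mu_hat ** 2)  # MLE of sigma
    c = stats.t.ppf((1 + conf) / 2, n - 1)
    half = c * sigma_hat / np.sqrt(n - 1)
    return mu_hat - half, mu_hat + half

# illustrative use on a simulated N(10, 1) sample
rng = np.random.default_rng(3)
lo, hi = mean_confidence_interval(rng.normal(10, 1, size=30))
print(lo, hi)  # an interval that should cover mu = 10 about 95% of the time
```

Note that σ̂/√(n−1) with the MLE σ̂ equals the more familiar s/√n with the unbiased sample standard deviation s, so this agrees with the textbook t-interval.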
Example. Consider the following sample of size n = 10:

0.86, 1.53, 1.57, 1.81, 0.99, 1.09, 1.29, 1.78, 1.29, 1.58.
Figure 5.3: Tails of the tₙ₋₁-distribution: points −c and c cut off an area of (1 − α)/2 in each tail.
Let us choose confidence level α = 95% = 0.95. We have to find c₁, c₂ and c as explained above. Using the table for the t₉-distribution we need to find c such that

t₉((−∞, c)) = 0.975,

which gives us c = 2.262. To find c₁ and c₂ we have to use the χ²₉-distribution table so that
These confidence intervals may not look impressive but the sample size is very small here,
n = 10.
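The quantile lookups and both intervals for this example can be reproduced programmatically; in the Python sketch below (scipy assumed, not part of the original notes), scipy's ppf functions substitute for the printed t₉ and χ²₉ tables:

```python
import numpy as np
from scipy import stats

# the sample of size n = 10 from the example above
data = np.array([0.86, 1.53, 1.57, 1.81, 0.99, 1.09, 1.29, 1.78, 1.29, 1.58])
n = len(data)

mu_hat = data.mean()
sigma2_hat = np.mean(data ** 2) - mu_hat ** 2  # MLE of sigma^2

# quantiles replacing the t_9 and chi^2_9 tables at confidence level 0.95
c = stats.t.ppf(0.975, n - 1)                   # approx 2.262
c1, c2 = stats.chi2.ppf([0.025, 0.975], n - 1)

half = c * np.sqrt(sigma2_hat) / np.sqrt(n - 1)
mu_interval = (mu_hat - half, mu_hat + half)                    # CI for mu
sigma2_interval = (n * sigma2_hat / c2, n * sigma2_hat / c1)    # CI for sigma^2
print(mu_interval, sigma2_interval)
```

The value of c printed here matches the table value 2.262 quoted above.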
References.

[1] Allen L. Shoemaker (1996), "What's Normal? - Temperature, Gender, and Heart Rate". Journal of Statistics Education, v. 4, n. 2.

[2] Mackowiak, P. A., Wasserman, S. S., and Levine, M. M. (1992), "A Critical Appraisal of 98.6 Degrees F, the Upper Limit of the Normal Body Temperature, and Other Legacies of Carl Reinhold August Wunderlich". Journal of the American Medical Association, 268, 1578-1580.