Chapter 4
Contents
I Basic concepts.
I Point estimation.
I Goodness of fit.
I Distribution of the sample mean.
I Confidence interval for the mean.
Basic concepts
I Objective: To study the characteristics of interest of a population
through the information obtained from a sample.
I We identify the concept of population with that of the random
variable X which is the object of study.
I The law or distribution of the population is the distribution of
the random variable of interest, X. This distribution may be
totally or partially unknown. (For example, we may know it is normal
but not know the values of µ and σ, or we may know it is Binomial
with p unknown.)
I The unknown values that determine the distribution of the
population are the parameters. These are FIXED, NONRANDOM
values of the population.
I The main goal of statistical inference is to infer the values of the
population parameters, or certain characteristics of a random
variable such as the population mean, from the information
given by a sample.
Sampling
[Figure: sampling takes the SAMPLE DATA from the population (population parameter); inference goes from the sample parameter back to the population parameter.]
Example of sampling and inference
I We select a s.r.s. of size 7 given by:
3.8 9.5 4.8 1.6 0.2 0.8 1.5
I The sample mean of these values is $\bar{x}$ = 3.171. Taking the
population mean to be 4, the relative error (relative bias) is
(4 − 3.171)/4 = 0.207.
I If we add new elements to the previous s.r.s., the sample mean will
change. As the sample size grows, the sample mean tends to get
closer to the population mean.
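The arithmetic of this example can be checked with a few lines of Python. This is only a sketch: the sample values are the seven listed above, and the population mean of 4 is taken from the relative-error formula, not computed from the population itself.

```python
from statistics import mean

# Sample values copied from the example (s.r.s. of size 7).
sample = [3.8, 9.5, 4.8, 1.6, 0.2, 0.8, 1.5]

# Assumption: the population mean is 4, the value that appears in the
# relative-error formula of the example.
population_mean = 4.0

sample_mean = mean(sample)
relative_error = (population_mean - sample_mean) / population_mean

print(f"sample mean    = {sample_mean:.3f}")     # 3.171
print(f"relative error = {relative_error:.3f}")  # 0.207
```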
[Figure: sample mean as a function of the sample size, for sample sizes 7 to 24.]
Example of sampling and inference
I On the other hand, if we select another s.r.s. of size 7, we obtain:
[Figure: histogram; horizontal axis from 1 to 8, vertical axis from 0 to 20000.]
Example of sampling and inference
I Next, we compare the histograms with all the possible values of the
sample mean for samples of size 7 and 17:
[Figure: two histograms. Left panel: SAMPLE MEAN DISTRIBUTION - SAMPLES OF SIZE 7. Right panel: SAMPLE MEAN DISTRIBUTION - SAMPLES OF SIZE 17. Horizontal axes run from 1 to 8.]
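The comparison between samples of size 7 and 17 can be imitated by simulation. The sketch below does not enumerate all possible samples as the slides do. It draws many random samples from a stand-in population (an exponential distribution with mean 4, which is an assumption made only for illustration) and compares the spread of the resulting sample means. The function name `simulate_sample_means` is hypothetical.

```python
import random
from statistics import mean, pstdev


def simulate_sample_means(n, replications, population_mean, rng):
    """Return `replications` sample means, each from an s.r.s. of size n.

    Assumption: the population is exponential with the given mean. It is
    a stand-in chosen for illustration, not the population of the slides.
    """
    return [
        mean(rng.expovariate(1.0 / population_mean) for _ in range(n))
        for _ in range(replications)
    ]


rng = random.Random(0)  # fixed seed so the run is reproducible
means_7 = simulate_sample_means(7, 5000, 4.0, rng)
means_17 = simulate_sample_means(17, 5000, 4.0, rng)

# Both sets of sample means centre on the population mean, but the
# means of the larger samples are less spread out.
spread_7 = pstdev(means_7)
spread_17 = pstdev(means_17)
print(f"n = 7 : centre {mean(means_7):.2f}, spread {spread_7:.2f}")
print(f"n = 17: centre {mean(means_17):.2f}, spread {spread_17:.2f}")
```

Under this assumed population the spreads should be near 4/√7 and 4/√17, which is the 1/√n shrinkage that the two histograms show.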
Simple random sampling
Conclusions
\[
V[\bar{X}] = V\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right]
           = \frac{1}{n^2}\sum_{i=1}^{n} V[X_i]
           = \frac{1}{n^2}\sum_{i=1}^{n} V[X]
           = \frac{V[X]}{n}
\]
\[
\hat{p} = \bar{X}
\]
\[
E[\hat{p}] = p \quad \text{and} \quad V[\hat{p}] = \frac{p(1-p)}{n}
\]
so that if the sample size is very large, $\hat{p}$ is very likely to be close to p.
Example
1 0 0 1 1 0 1 0 1 0
\[
P(Y = y) = \binom{m}{y} p^{y} (1-p)^{m-y}, \quad \text{for } y = 0, 1, \ldots, m.
\]
I Recall that Y ∼ B(m, p) means that Y is the sum of m
independent Bernoulli r.v. with parameter p, i.e., there is
a s.r.s. X1, ..., Xm of a Bernoulli(p) variable such that Y = X1 + ··· + Xm.
I Consequently, E [Y ] = mp and V [Y ] = mp (1 − p).
I Let us see how to estimate the proportion p in two different
situations.
Binomial distribution
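\[
\hat{p} = \frac{\bar{Y}}{m} = \frac{Y_1 + \cdots + Y_n}{n \times m}
\]
I Besides, due to the properties of the sample mean:
\[
E[\hat{p}] = E\left[\frac{\bar{Y}}{m}\right] = \frac{mp}{m} = p
\]
and
\[
V[\hat{p}] = V\left[\frac{\bar{Y}}{m}\right] = \frac{1}{m^2} V[\bar{Y}] = \frac{p(1-p)}{n \times m}
\]
so if the number of replications of the Bernoulli experiment, m, and/or
the sample size, n, is very large, $\hat{p}$ should be very close to p.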
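A small simulation can make the two properties of the estimator concrete. The sketch below assumes hypothetical values n = 5, m = 10 and p = 0.3 (none of them come from the slides), repeatedly computes the estimator as the mean of n Binomial(m, p) values divided by m, and compares the empirical mean and variance with p and p(1 − p)/(n × m). The helper `estimate_p` is a name introduced here for illustration.

```python
import random
from statistics import mean, pvariance


def estimate_p(n, m, p, rng):
    """Draw Y_1, ..., Y_n ~ B(m, p) and return p_hat = Y_bar / m."""
    # Each Binomial value is built as a sum of m Bernoulli(p) trials.
    ys = [sum(rng.random() < p for _ in range(m)) for _ in range(n)]
    return mean(ys) / m


# Hypothetical settings, chosen only for illustration.
n, m, p = 5, 10, 0.3
rng = random.Random(1)  # fixed seed so the run is reproducible

estimates = [estimate_p(n, m, p, rng) for _ in range(20000)]

empirical_mean = mean(estimates)
empirical_var = pvariance(estimates)
theoretical_var = p * (1 - p) / (n * m)

print(f"mean of p_hat     = {empirical_mean:.4f} (p = {p})")
print(f"variance of p_hat = {empirical_var:.5f} (formula: {theoretical_var:.5f})")
```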
Example
I Assume next that Pablo takes a second sample from the variable
X =“Vote for Pablo”, obtaining:
0 0 0 0 1 0 0 1 0 0
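The estimate changes from sample to sample, which a short computation makes visible. The sketch below assumes that the ten 0/1 values listed in the earlier example are Pablo's first sample of the same variable, and it adds a plug-in variance estimate (replacing p by its estimate in p(1 − p)/n), which is an illustration and not a step taken in the slides.

```python
# Assumption: the first list is the sample from the earlier example of
# the same variable; the second list is the sample shown just above.
first_sample = [1, 0, 0, 1, 1, 0, 1, 0, 1, 0]
second_sample = [0, 0, 0, 0, 1, 0, 0, 1, 0, 0]


def estimate_proportion(sample):
    """Return (p_hat, plug-in variance) for a sample of 0/1 values."""
    n = len(sample)
    p_hat = sum(sample) / n            # sample mean of the 0/1 values
    var_hat = p_hat * (1 - p_hat) / n  # p(1 - p)/n with p replaced by p_hat
    return p_hat, var_hat


p1, v1 = estimate_proportion(first_sample)
p2, v2 = estimate_proportion(second_sample)

print(f"first sample : p_hat = {p1:.2f}, estimated variance = {v1:.3f}")
print(f"second sample: p_hat = {p2:.2f}, estimated variance = {v2:.3f}")
```

The two estimates are 0.50 and 0.20: the parameter p is fixed, but the estimator is a random variable whose value depends on the sample drawn.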
\[
P(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}, \quad \text{for } x = 0, 1, 2, \ldots
\]
Note that E [X ] = λ and V [X ] = λ.
I Now, for a s.r.s. of size n of the r.v. X , say X1 , . . . , Xn , and since
E [X ] = λ, it is possible to estimate λ as follows:
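\[
\hat{\lambda} = \bar{X}
\]
I Moreover:
\[
E[\hat{\lambda}] = E[\bar{X}] = \lambda \quad \text{and} \quad V[\hat{\lambda}] = V[\bar{X}] = \frac{\lambda}{n}
\]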
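The mean and variance of the Poisson estimator can also be checked by simulation. The sketch below assumes hypothetical values λ = 3 and n = 10, and it draws Poisson values with Knuth's multiplication method because the Python standard library has no built-in Poisson sampler in older versions. The helper `poisson_draw` is a name introduced here.

```python
import math
import random
from statistics import mean, pvariance


def poisson_draw(lam, rng):
    """Draw one Poisson(lam) value with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    # Multiply uniforms until the product drops to exp(-lam) or below;
    # the number of extra factors needed is Poisson distributed.
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


# Hypothetical settings, chosen only for illustration.
lam, n = 3.0, 10
rng = random.Random(2)  # fixed seed so the run is reproducible

# Each estimate is the sample mean of an s.r.s. of size n.
estimates = [
    mean(poisson_draw(lam, rng) for _ in range(n)) for _ in range(20000)
]

empirical_mean = mean(estimates)
empirical_var = pvariance(estimates)

print(f"mean of lambda_hat     = {empirical_mean:.3f} (lambda = {lam})")
print(f"variance of lambda_hat = {empirical_var:.3f} (lambda/n = {lam / n})")
```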
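\[
E[X] = \frac{1}{4} \times 1 + \frac{1}{4} \times 2 + \frac{1}{4} \times 3 + \frac{1}{4} \times 4 = 2.5
\]
\[
V[X] = E[X^2] - E[X]^2
     = \frac{1}{4} \times 1^2 + \frac{1}{4} \times 2^2 + \frac{1}{4} \times 3^2 + \frac{1}{4} \times 4^2 - 2.5^2 = 1.25
\]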
Example
I Therefore:
\[
\begin{aligned}
P(2.4 < \bar{X} < 2.6)
  &= P\left(\frac{2.4 - 2.5}{0.1} < \frac{\bar{X} - 2.5}{0.1} < \frac{2.6 - 2.5}{0.1}\right) \\
  &= P(-1 < Z < 1) = P(Z < 1) - P(Z < -1) \\
  &= P(Z < 1) - (1 - P(Z < 1)) \\
  &= 2 \times P(Z < 1) - 1 = 2 \times 0.8413 - 1 = 0.6826.
\end{aligned}
\]
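The numbers in this example can be reproduced with Python's standard library. The sketch below reads the calculation above as X taking the values 1, 2, 3, 4 with probability 1/4 each, and it assumes a sample size of n = 125. That value is not stated in the text; it is inferred from the 0.1 used to standardise, since √(1.25/125) = 0.1.

```python
from math import sqrt
from statistics import NormalDist

# X takes the values 1, 2, 3, 4 with probability 1/4 each.
values = [1, 2, 3, 4]
prob = 1 / 4

mu = sum(prob * x for x in values)                 # E[X] = 2.5
var = sum(prob * x ** 2 for x in values) - mu ** 2  # V[X] = 1.25

# Assumption: n = 125, inferred from the standard deviation 0.1 of the
# sample mean used in the example (sqrt(1.25 / 125) = 0.1).
n = 125
se = sqrt(var / n)

# Normal approximation to the distribution of the sample mean.
sample_mean_dist = NormalDist(mu, se)
approx = sample_mean_dist.cdf(2.6) - sample_mean_dist.cdf(2.4)

print(f"E[X] = {mu}, V[X] = {var}, sd of sample mean = {se:.3f}")
print(f"P(2.4 < sample mean < 2.6) = {approx:.4f}")
```

The computed probability is about 0.6827. The 0.6826 in the example comes from rounding P(Z < 1) to 0.8413 in the table.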
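and, thus:
\[
P\left(\bar{X} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha
\]
I Therefore, a confidence interval for µ at confidence level 1 − α is
given by:
\[
\left(\bar{x} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right)
\]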
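The interval is straightforward to compute once σ is known. The sketch below is one possible implementation: the function name `z_confidence_interval` and the input values (sample mean 5.0, σ = 1.0, n = 25) are hypothetical and serve only to show the calculation.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, alpha=0.05):
    """Confidence interval for the mean at level 1 - alpha, sigma known."""
    # z_{alpha/2}: the standard normal quantile leaving alpha/2 in the
    # upper tail.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


# Hypothetical inputs, chosen only for illustration.
lower, upper = z_confidence_interval(5.0, 1.0, 25, alpha=0.05)
print(f"95% confidence interval: ({lower:.3f}, {upper:.3f})")
```

With these inputs the interval is roughly (4.608, 5.392). Quadrupling n halves the width, since the half-width is proportional to 1/√n.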
Confidence intervals
5.29 3.66 5.71 6.62 4.30 5.85 6.25 3.40 3.55 5.57
4.60 5.69 5.81 5.71 6.29 5.66 6.19 3.79 4.98 4.84
where $\hat{p} = \bar{x}$.
Example