
Statistics I

Chapter 5: Introduction to statistical inference


Chapter 5. Introduction to statistical inference

Contents

I Basic concepts.
I Point estimation.
I Goodness of fit.
I Distribution of the sample mean.
I Confidence interval for the mean.
Basic concepts
I Objective: To study the characteristics of interest of a population
through the information obtained from a sample.
I We identify the concept of population with that of the random
variable X which is the object of study.
I The law or the distribution of the population is the distribution of
the random variable of interest, X. Such a distribution may be
totally or partially unknown. (For example, we know it is normal,
but we don't know the values of µ and σ, or we know it is Binomial
with p unknown).
I The unknown values that determine the distribution of the
population are the parameters. These are FIXED, NONRANDOM
values of the population.
I The main goal of statistical inference is to infer the values of the
population parameters, or certain characteristics of the random
variable such as the population mean, based on the information
given by a sample.
Sampling

I Sample: finite subset of a population. The number of elements in


the sample is the sample size.
I Most of the time it is impossible to work with all the elements of a
population:
I The elements may exist conceptually but not in reality (the
population of defective items produced by a machine during its
lifetime).
I It may be impossible for economic reasons to study the whole
population.
I It may be impossible due to time constraints to work with the whole
population and the population may change with time (electoral
surveys).
I The study implies the destruction of the element (the study of the
average lifetime of some light bulbs, the study of the service life of a
semiconductor. . . ).
Simple random sample

I A simple random sample should be used when the probability
distribution of the elements of the population is homogeneous with
respect to the variable of interest, that is, whenever we have no
additional information about the population prior to the sampling.
I A simple random sample must be chosen so that:
I each element of the population has the same probability of being
selected (this ensures the sample is representative of the whole
population),
I sampling must be with replacement, so that the population remains
unchanged across selections and the draws are i.i.d. (if the
population size N is large with respect to the sample size n,
sampling without replacement gives approximately the same result).
Simple random sample

I Let X be a random variable with distribution F . A simple random


sample of size n is a set of n r.v. X1, ..., Xn such that:
I All X1, ..., Xn have the same distribution F (Xi ∼ F, ∀i).
I X1, ..., Xn are independent.

I Each specific set of values x1, ..., xn of a s.r.s. is called the observed sample.


I A statistic is a real function of the s.r.s. X1 , . . . , Xn . Therefore, a
statistic is a random variable (unlike a parameter which is a fixed
number, inherent to a population).
Use of a simple random sample

I Assume that the value of E[X] for a r.v. X is unknown. To
estimate the value of E[X], it is usual to use a s.r.s. to obtain the
sample mean:

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$

Note that $\bar{X}$ is a random variable.
I Now, for an observed sample x1, ..., xn, we obtain a numeric value
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$.
I Note that $\bar{X} \neq \bar{x}$: the first is a random variable, the second a number.
I We will see later why $\bar{X}$ is a good estimator of E[X]. First, let's see
an example.
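A minimal Python sketch of this distinction (NumPy assumed; the exponential population, the seed and the sample size are illustrative choices, not from the slides): each new draw of a sample of size 7 yields a different observed value $\bar{x}$ of the random variable $\bar{X}$.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # seed fixed only for reproducibility

# Hypothetical population: exponential service times with mean 4 minutes.
sample_a = rng.exponential(scale=4.0, size=7)
sample_b = rng.exponential(scale=4.0, size=7)

# Each observed sample gives a different numeric value of the sample mean,
# which is the sense in which X-bar (before observing) is a random variable.
print(sample_a.mean(), sample_b.mean())
```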
Example of sampling and inference
I Suppose the population has 24 members and the random variable of
interest is X = "Time to complete a medical service".
I The population values (in minutes) are:

5.1 1 0.9 3.8 10.2 2.1 9.5 4.5


1 2.2 1.5 4.8 1.6 8.8 4.3 1
9 5.1 0.2 2.3 0.8 7.8 7.7 1.5

I Consequently, the population mean is E[X] ≈ 4.


[Figure: bar charts of the population data (the 24 values above) and of a
sample of size 7 (9.5, 4.5, 3.8, 1.6, 1.5, 0.8, 0.2). Sampling produces the
sample; inference estimates the population parameter from the sample statistic.]
Example of sampling and inference
I We select a s.r.s. of size 7 given by:
3.8 9.5 4.8 1.6 0.2 0.8 1.5
I The sample mean of these values is $\bar{x} = 3.171$. Then, the relative
error is (4 − 3.171)/4 = 0.207.
I If we add new elements to the previous s.r.s., the sample mean will
change. As the figure and the simulation below show, increasing the
sample size makes the sample mean converge to the population mean.

[Figure: sample mean for sample sizes 7 through 24; the values fluctuate
between 3.1 and 4.6 and settle near the population mean as n grows.]
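The convergence in the figure can be reproduced with a short simulation (Python/NumPy; the seed is arbitrary) that draws with-replacement samples of increasing size from the 24 population values:

```python
import numpy as np

# The 24 population values (service times, in minutes) from the slides.
population = np.array([5.1, 1, 0.9, 3.8, 10.2, 2.1, 9.5, 4.5,
                       1, 2.2, 1.5, 4.8, 1.6, 8.8, 4.3, 1,
                       9, 5.1, 0.2, 2.3, 0.8, 7.8, 7.7, 1.5])

rng = np.random.default_rng(seed=1)
for n in (7, 12, 17, 24):
    # Simple random sampling is with replacement, as required above.
    sample = rng.choice(population, size=n, replace=True)
    print(n, round(sample.mean(), 2))
# As n grows, the printed sample means cluster around E[X] (about 4).
```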
Example of sampling and inference
I On the other hand, if we select another s.r.s. of size 7, we obtain:

5.1 1 0.9 3.8 10.2 2.1 9.5

with sample mean $\bar{x} = 4.657$. A histogram of all the possible
values of the sample mean for samples of size 7 is the following:
[Figure: histogram of all possible values of the sample mean for samples of
size 7, roughly bell-shaped and centered near the population mean.]
Example of sampling and inference

I Next, we compare the histograms with all the possible values of the
sample mean for samples of size 7 and 17:

[Figure: histograms of the sample mean for samples of size 7 and of size 17;
the size-17 distribution is narrower and more concentrated around the
population mean.]
Simple random sampling

Conclusions

I A simple random sample of size n of a r.v. X is a set of independent
r.v. with the same distribution as that of X:

$$\{X_i\}_{i=1}^{n} \ \text{i.i.d.}$$

I The sample mean, $\bar{X}$, is a random variable. In general, statistics are
random variables that depend on the random selection of the
members of the sample.
Sample mean

I We compute the expectation and variance of the r.v. $\bar{X}$ to show the
reasons that make $\bar{X}$ a good estimator of E[X].
I For that, we use two properties of the expectation and variance of
sums of random variables.
I Let X1, ..., Xn be a s.r.s. of a r.v. X with expectation E[X] and
variance V[X]. Then:

$$E[a_1 X_1 + \cdots + a_n X_n] = a_1 E[X_1] + \cdots + a_n E[X_n]$$

and, by the independence of the Xi,

$$V[a_1 X_1 + \cdots + a_n X_n] = a_1^2 V[X_1] + \cdots + a_n^2 V[X_n],$$

for any set of real numbers a1, ..., an.


Sample mean

I With the previous properties, we obtain (for a s.r.s. X1, ..., Xn):

$$E[\bar{X}] = E\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right] = \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \frac{1}{n}\sum_{i=1}^{n} E[X] = E[X]$$

and

$$V[\bar{X}] = V\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right] = \frac{1}{n^2}\sum_{i=1}^{n} V[X_i] = \frac{1}{n^2}\sum_{i=1}^{n} V[X] = \frac{V[X]}{n}$$

I Therefore, the expected value of $\bar{X}$ is E[X]. We say that $\bar{X}$ is an
unbiased estimator of E[X].
I Moreover, as $V[\bar{X}] = V[X]/n$, the estimation error may be
reduced by increasing the sample size n.
I These properties justify the use of $\bar{X}$ as an estimator of E[X], as the
simulation below illustrates.
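A Monte Carlo sketch of these two properties (Python/NumPy; the uniform population and all constants are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n, replications = 10, 100_000

# Many s.r.s. of size n from a population with known moments:
# uniform on (0, 1), for which E[X] = 0.5 and V[X] = 1/12.
samples = rng.uniform(0.0, 1.0, size=(replications, n))
means = samples.mean(axis=1)          # one sample mean per replication

print(means.mean())   # close to E[X] = 0.5: the estimator is unbiased
print(means.var())    # close to V[X]/n = (1/12)/10, about 0.00833
```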
Bernoulli distribution

I The previous results allow us to obtain statistics that can be used to


estimate the parameters of the distributions introduced in Chapter 4.
I First, let X be a random variable with a Bernoulli distribution with
parameter p, X ∼ Ber(p). Hence,

$$X = \begin{cases} 1 & \text{with probability } p \\ 0 & \text{with probability } 1-p \end{cases}$$

I Note that E[X] = p and V[X] = p(1 − p).


Bernoulli distribution

I Assume that we have a s.r.s. of size n of the r.v. X . The goal is to


estimate the value of the parameter p based on the s.r.s. X1 , . . . , Xn .
I Since p, the success probability, is the expected value of X, it is
possible to estimate p as follows:

$$\hat{p} = \bar{X}$$

I Moreover, from the previous results, we have that:

$$E[\hat{p}] = p \quad \text{and} \quad V[\hat{p}] = \frac{p(1-p)}{n}$$

so that if the sample size is very large, $\hat{p}$ should be very close to p.
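A quick numerical check (Python/NumPy; the true value p = 0.3 and the sample sizes are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(seed=7)
p_true = 0.3

for n in (10, 100, 10_000):
    x = rng.binomial(1, p_true, size=n)   # n Bernoulli(p) trials
    print(n, x.mean())                    # p-hat approaches 0.3 as n grows
```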
Example

I Pablo wants to be the Mayor of a small village. As he would like to


know his chances, Pablo decides to make a small survey to estimate
the proportion of voters of the village that supports him.
I Assume the r.v. X =“Vote for Pablo” that takes two values, 1, if
the selected person supports Pablo, and 0, otherwise.
I We take a s.r.s. of size 10 obtaining the values:

1 0 0 1 1 0 1 0 1 0

I Consequently, the estimated proportion is $\hat{p} = 5/10 = 0.5$.
Binomial distribution

I Next, let Y be a random variable with a Binomial distribution of
parameters m and p, Y ∼ B(m, p). Therefore,

$$P(Y = y) = \binom{m}{y} p^{y} (1-p)^{m-y}$$

for y = 0, 1, ..., m.
I Remember that Y ∼ B (m, p) means that Y is the sum of m
independent r.v. with a Bernoulli distribution with parameter p, i.e.,
a s.r.s. X1 , . . . , Xm such that Y = X1 + · · · + Xm .
I Consequently, E [Y ] = mp and V [Y ] = mp (1 − p).
I Let us see how to estimate the proportion p in two different
situations.
Binomial distribution

I First, if we only have one observation of Y, say Y1, then it is possible
to estimate p as follows:

$$\hat{p} = \frac{Y_1}{m} = \frac{X_1 + \cdots + X_m}{m}$$

which coincides with the sample mean of the s.r.s. X1, ..., Xm of the
Bernoulli variable X.
I Therefore, this estimator satisfies the same properties as the
estimator based on the Bernoulli distribution, i.e.:

$$E[\hat{p}] = p \quad \text{and} \quad V[\hat{p}] = \frac{p(1-p)}{m}$$

so if the number of replications of the Bernoulli experiment, m, is very
large, $\hat{p}$ should be very close to p.
Example

I In the previous example, if we define the variable Y = "Number of
voters of Pablo in a sample of size 10", for the obtained sample we
have that Y1 = 5.
I In this case, the estimated proportion is again $\hat{p} = 5/10 = 0.5$.
Binomial distribution

I Second, if we have a s.r.s. of size n of the r.v. Y, say Y1, ..., Yn
(note that n is the sample size and m is the number of replications of
the Bernoulli experiment), then it is possible to estimate p as follows:

$$\hat{p} = \frac{\bar{Y}}{m} = \frac{Y_1 + \cdots + Y_n}{n \times m}$$

I Besides, due to the properties of the sample mean:

$$E[\hat{p}] = E\left[\frac{\bar{Y}}{m}\right] = \frac{mp}{m} = p$$

and

$$V[\hat{p}] = V\left[\frac{\bar{Y}}{m}\right] = \frac{1}{m^2} V[\bar{Y}] = \frac{p(1-p)}{n \times m}$$

so if the number of replications of the Bernoulli experiment, m, and/or
the sample size, n, is/are very large, $\hat{p}$ should be very close to p.
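A minimal sketch of this estimator (Python/NumPy; the values of m, p and n are illustrative):

```python
import numpy as np

rng = np.random.default_rng(seed=3)
m, p_true, n = 10, 0.5, 50     # m Bernoulli replications per observation

y = rng.binomial(m, p_true, size=n)   # Y_1, ..., Y_n ~ B(m, p)
p_hat = y.mean() / m                  # (Y_1 + ... + Y_n) / (n * m)
print(p_hat)                          # close to p = 0.5
```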
Example

I Assume next that Pablo takes a second sample from the variable
X =“Vote for Pablo”, obtaining:

0 0 0 0 1 0 0 1 0 0

I In this case, the estimated proportion is 0.2.


I The value of the variable Y =“Number of voters in a sample of size
10” is therefore Y2 = 2.
I If we look for the estimate of the proportion that takes into account
the values Y1 = 5 and Y2 = 2, we obtain:

$$\hat{p} = \frac{5+2}{10 \times 2} = \frac{7}{20} = 0.35$$
Poisson distribution
I Let X be a random variable with a Poisson distribution of parameter
λ, X ∼ P(λ). Therefore,

$$P(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}, \quad \text{for } x = 0, 1, 2, \ldots$$
Note that E [X ] = λ and V [X ] = λ.
I Now, for a s.r.s. of size n of the r.v. X, say X1, ..., Xn, and since
E[X] = λ, it is possible to estimate λ as follows:

$$\hat{\lambda} = \bar{X}$$

I Moreover:

$$E[\hat{\lambda}] = E[\bar{X}] = \lambda \quad \text{and} \quad V[\hat{\lambda}] = V[\bar{X}] = \frac{\lambda}{n}$$

so if the sample size, n, is very large, $\hat{\lambda}$ should be close to λ.
Example

I We want to estimate the expected value of the r.v. X =“number of


people that arrive at an ATM on Wednesdays at 11:00 am”. Assume
that X follows a Poisson distribution of parameter λ.
I We count the value of the variable X on 100 consecutive
Wednesdays, obtaining the values x1 = 5, x2 = 7, ..., x100 = 3 with
sample mean 6. Consequently, $\hat{\lambda} = 6$.
Exponential distribution

I Next, let Y be a random variable with an exponential distribution of


parameter λ, Y ∼ E (λ). Therefore,

f (y ) = λe −λy , for y > 0

Note that E [Y ] = 1/λ and V [Y ] = 1/λ2 .


I Now, for a s.r.s. of size n of the r.v. Y, say Y1, ..., Yn, it is possible
to estimate λ as follows:

$$\hat{\lambda} = \frac{1}{\bar{Y}}$$
Example

I In the previous example, we also take 100 values of the r.v.


Y =“waiting time in minutes between arrivals at the ATM on
Wednesdays at 11:00 am”, given by
y1 = 9.5, y2 = 7.3, . . . , y100 = 15.2.
I Note that the variable Y has an exponential distribution because we
have assumed that X has a Poisson distribution.
I The sample mean is 11 minutes, so the estimate of λ is
$\hat{\lambda} = 1/11 = 0.0909$.
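Both estimators can be checked with a short simulation (Python/NumPy; the true parameter values mirror the two ATM examples but are otherwise hypothetical):

```python
import numpy as np

rng = np.random.default_rng(seed=11)

# Poisson counts: lambda-hat is the sample mean of the counts.
counts = rng.poisson(lam=6.0, size=100)        # true lambda = 6
print(counts.mean())                           # lambda-hat, close to 6

# Exponential waiting times: E[Y] = 1/lambda, so lambda-hat = 1 / Y-bar.
waits = rng.exponential(scale=11.0, size=100)  # true mean 11 minutes
print(1 / waits.mean())                        # close to 1/11 = 0.0909
```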
Normal distribution

I Finally, let X be a random variable with a normal or Gaussian
distribution of parameters µ and σ, X ∼ N(µ, σ). Therefore,

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2\sigma^{2}}(x-\mu)^{2}\right), \quad \text{for } -\infty < x < \infty$$

Note that E[X] = µ and $V[X] = \sigma^{2}$.


I Now, for a s.r.s. of size n of the r.v. X, say X1, ..., Xn, it is possible
to estimate µ and σ as follows:

$$\hat{\mu} = \bar{X} \quad \text{and} \quad \hat{\sigma} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^{2}}$$
Normal distribution
I Due to the properties of the sample mean:

$$E[\hat{\mu}] = E[\bar{X}] = \mu \quad \text{and} \quad V[\hat{\mu}] = V[\bar{X}] = \frac{\sigma^{2}}{n}$$

so if n is very large, $\hat{\mu}$ should be close to µ.
I The analysis of the standard deviation is more complex, but it is
possible to show that:

$$E[\hat{\sigma}^{2}] = \frac{n-1}{n}\,\sigma^{2}$$

i.e., $E[\hat{\sigma}^{2}]$ is not $\sigma^{2}$, although $E[\hat{\sigma}^{2}]$ tends to $\sigma^{2}$ as n tends to ∞.
I For this reason, sometimes it is better to use the estimator called
quasi-variance, given by:

$$s^{2} = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^{2}$$

Note that $E[s^{2}] = \sigma^{2}$. Therefore, a reasonable estimator of the
standard deviation is the quasi-standard deviation, $s = \sqrt{s^{2}}$. The
simulation below compares the two variance estimators.
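The bias of $\hat{\sigma}^{2}$ and the unbiasedness of $s^{2}$ can be verified numerically (Python/NumPy; in NumPy, ddof=0 divides by n and ddof=1 by n − 1, so ddof=1 gives the quasi-variance):

```python
import numpy as np

rng = np.random.default_rng(seed=5)
n, sigma2 = 5, 4.0   # a small n makes the bias clearly visible

samples = rng.normal(0.0, np.sqrt(sigma2), size=(200_000, n))
var_biased = samples.var(axis=1, ddof=0)   # divides by n
var_quasi = samples.var(axis=1, ddof=1)    # divides by n - 1 (quasi-variance)

print(var_biased.mean())   # close to (n-1)/n * sigma^2 = 3.2
print(var_quasi.mean())    # close to sigma^2 = 4.0
```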
Example

I Assume that the monthly returns of a certain financial asset follow a


normal distribution. We want to estimate the parameters of this
normal distribution.
I We have n = 46 values of the monthly returns of the asset (in
percentages).
I The sample mean, $\bar{x} = 1.03$, is an estimate of the population mean,
µ.
I On the other hand, the sample standard deviation, $\hat{\sigma} = 4.16$, is an
estimate of the population standard deviation, σ. An alternative
estimate is the quasi-standard deviation, s = 4.25.
Goodness of fit

I As we have seen in some of the examples, sometimes we have
assumed that the data come from a given distribution. In real
applications, this assumption should be carefully justified.
I Goodness-of-fit methods are useful procedures for doing this.
I Here, we focus on two graphical methods to assess goodness of fit.
Histogram with estimated density function
I The first one compares the data histogram with the density function
with estimated parameters. If the assumption on the distribution is
true, then the density function should be close to the histogram.
I For instance, the next plot shows the data histogram corresponding
to 200 returns of a certain asset, compared with the normal density
function with parameters estimated from the sample ($\hat{\mu} = 0.83$
and $\hat{\sigma} = 4.12$).
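A sketch of how such a plot can be produced (Python with NumPy, SciPy and Matplotlib; the 200 returns are not listed in the slides, so simulated stand-in data are used):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(seed=2)
returns = rng.normal(0.83, 4.12, size=200)   # stand-in for the 200 returns

mu_hat, sigma_hat = returns.mean(), returns.std()   # estimated parameters

plt.hist(returns, bins=20, density=True)     # density=True: total area 1
grid = np.linspace(returns.min(), returns.max(), 200)
plt.plot(grid, stats.norm.pdf(grid, mu_hat, sigma_hat))  # fitted density
plt.show()
```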
QQ-plot
I The second graphic is called the QQ-plot and shows the sample
quantiles versus the corresponding distribution quantiles with
parameters estimated from the sample.
I If the data have actually been generated by the considered
distribution, then the points in the graphic will lie approximately on
a straight line.
I If the corresponding distribution function is continuous and
increasing, the pth quantile (0 < p < 1), denoted by qp, is obtained
by inverting the distribution function. Therefore, if we look for qp
such that F(qp) = p, then qp = F⁻¹(p).
I If the corresponding distribution function is discrete or piecewise
constant, qp = min{x : F(x) ≥ p}.
I The sample pth quantile, Qp, is obtained as follows: (1) the data
x1, ..., xn are sorted in increasing order, x(1), ..., x(n); (2) then, Qp is
given by Qp = x([np]). A sketch of this construction follows.
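A sketch of this construction (Python with NumPy/SciPy; it uses the plotting positions p = (i − 0.5)/n, a standard variant of the recipe above):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=4)
x = rng.normal(0.83, 4.12, size=200)    # hypothetical returns
x_sorted = np.sort(x)                   # x_(1), ..., x_(n): sample quantiles

n = len(x)
p = (np.arange(1, n + 1) - 0.5) / n     # plotting positions

# Theoretical quantiles F^{-1}(p) of the normal with estimated parameters.
theor_q = stats.norm.ppf(p, x.mean(), x.std())

# Under a good fit the points (theor_q, x_sorted) lie near a straight line.
print(np.corrcoef(theor_q, x_sorted)[0, 1])   # close to 1 for normal data
```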
QQ-plot
I For example, the QQ-plot of the 200 returns of the asset compares
the sample quantiles with the quantiles of a normal distribution with
parameters estimated from the sample, $\hat{\mu} = 0.83$ and $\hat{\sigma} = 4.12$.
I The plot suggests that the normal distribution may be appropriate
for the data.
The distribution of the sample mean

I Previous slides have shown how to estimate the parameters of
several distributions using the properties of the sample mean.
I Next, we derive the distribution of the sample mean, which will be
useful to obtain confidence intervals.
The distribution of the sample mean

I If X has a normal distribution N(µ, σ) and X1, ..., Xn is a s.r.s. of
X, then:

$$\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right) \implies \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$

I If X has expectation E[X] and variance V[X] but does not have a
normal distribution, then the Central Limit Theorem (CLT) states
that, if X1, ..., Xn is a s.r.s. of X with n big enough (n ≥ 30,
approximately), then:

$$\bar{X} \longrightarrow N\left(E[X], \sqrt{\frac{V[X]}{n}}\right) \implies \frac{\bar{X}-E[X]}{\sqrt{V[X]/n}} \longrightarrow N(0, 1)$$
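A small simulation illustrating the CLT (Python/NumPy; the exponential population is an arbitrary non-normal choice with E[X] = 1 and V[X] = 1):

```python
import numpy as np

rng = np.random.default_rng(seed=8)
n, reps = 50, 100_000

# Sample means of many s.r.s. from a clearly non-normal population.
means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)

# Standardize with E[X] = 1 and sqrt(V[X]/n); the result is close to N(0,1).
z = (means - 1.0) / np.sqrt(1.0 / n)
print(z.mean(), z.std())            # close to 0 and 1
print(np.mean(np.abs(z) < 1.96))    # close to 0.95, as for a standard normal
```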
Example

I If X1, ..., Xn is a s.r.s. of X with a Ber(p) distribution, for large n:

$$\bar{X} \longrightarrow N\left(p, \sqrt{\frac{p(1-p)}{n}}\right) \implies \frac{\bar{X}-p}{\sqrt{p(1-p)/n}} \longrightarrow N(0, 1)$$
Example

I Let X be a discrete r.v. with probability function:

$$P(X = x) = \begin{cases} 1/4 & \text{if } x = 1, 2, 3, 4 \\ 0 & \text{otherwise} \end{cases}$$

I A s.r.s. of size n = 125 is taken from X. Compute the probability
that the sample mean will be between 2.4 and 2.6.
I We first compute E[X] and V[X] as follows:

$$E[X] = \frac{1}{4} \times 1 + \frac{1}{4} \times 2 + \frac{1}{4} \times 3 + \frac{1}{4} \times 4 = 2.5$$

$$V[X] = E[X^{2}] - E[X]^{2} = \frac{1}{4} \times 1^{2} + \frac{1}{4} \times 2^{2} + \frac{1}{4} \times 3^{2} + \frac{1}{4} \times 4^{2} - 2.5^{2} = 1.25$$
Example

I From the CLT:

$$\bar{X} \longrightarrow N\left(2.5, \sqrt{\frac{1.25}{125}}\right) = N(2.5, 0.1)$$

I Therefore:

$$P(2.4 < \bar{X} < 2.6) = P\left(\frac{2.4-2.5}{0.1} < \frac{\bar{X}-2.5}{0.1} < \frac{2.6-2.5}{0.1}\right)$$
$$= P(-1 < Z < 1) = P(Z < 1) - P(Z < -1) = P(Z < 1) - (1 - P(Z < 1))$$
$$= 2 \times P(Z < 1) - 1 = 2 \times 0.8413 - 1 = 0.6826.$$

I Note that this is an approximate probability as we are using an


approximate distribution.
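The approximation can be checked by simulation (Python/NumPy; the number of replications is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(seed=9)
reps, n = 200_000, 125

# Many s.r.s. of size 125 from the uniform distribution on {1, 2, 3, 4}.
samples = rng.integers(1, 5, size=(reps, n))   # upper bound is exclusive
means = samples.mean(axis=1)

print(np.mean((means > 2.4) & (means < 2.6)))  # close to the CLT value 0.6826
```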
Confidence intervals

I Instead of a point estimator, it could be more informative to have an


interval of plausible values for the true value of the unknown
parameter.
I For instance, given a sample, we would like to know an interval of
values containing the true value of the population mean, µ, with
total certainty. Unfortunately, this is not possible.
I Instead, we consider a way to obtain intervals such that
100(1 − α)% of the intervals constructed in this way contain
the true value of the population mean, µ. Here, 1 − α is the
confidence level and the interval obtained is a confidence interval.
Confidence intervals

I Assume that X1 , . . . , Xn is a s.r.s. of a r.v. X with a N (µ, σ)


distribution, where σ is known.
 
I We know that $\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$, or, equivalently, $\frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \sim N(0,1)$.
Then:

$$P\left(-z_{\alpha/2} < \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) = 1 - \alpha$$

and, thus:

$$P\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$

I Therefore, a confidence interval for µ at confidence level 1 − α is
given by:

$$\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$
Confidence intervals

I 100 samples of size n = 50 have been generated from a N(−2, 1)
distribution, and a confidence interval at level 90% has been
constructed from each sample. Approximately 90% of the intervals
include the true value µ = −2, as the simulation below reproduces.
[Figure: frequentist interpretation of the confidence interval.]
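A sketch reproducing this experiment (Python with NumPy/SciPy; the seed is arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=6)
mu, sigma, n, level = -2.0, 1.0, 50, 0.90
z = stats.norm.ppf(1 - (1 - level) / 2)    # z_{alpha/2}, about 1.645

covered = 0
for _ in range(100):
    x = rng.normal(mu, sigma, size=n)
    half = z * sigma / np.sqrt(n)
    if x.mean() - half < mu < x.mean() + half:
        covered += 1
print(covered)   # about 90 of the 100 intervals contain mu = -2
```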
Example

I Let us suppose that the returns of the company SEGURA.SL follow


a normal distribution with mean µ euros and variance σ² = 1. A
simple random sample, of size n = 20, of the returns is taken and
the following values are obtained:

5.29 3.66 5.71 6.62 4.30 5.85 6.25 3.40 3.55 5.57
4.60 5.69 5.81 5.71 6.29 5.66 6.19 3.79 4.98 4.84

I The sample mean of the 20 returns is $\bar{x} = 5.188$. Therefore, the
confidence interval at confidence level 90% for the mean return of
this company is:

$$\left(5.188 - 1.645\,\frac{1}{\sqrt{20}},\ 5.188 + 1.645\,\frac{1}{\sqrt{20}}\right) = (4.820, 5.556)$$
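The interval can be reproduced numerically (Python/NumPy):

```python
import numpy as np

returns = np.array([5.29, 3.66, 5.71, 6.62, 4.30, 5.85, 6.25, 3.40, 3.55, 5.57,
                    4.60, 5.69, 5.81, 5.71, 6.29, 5.66, 6.19, 3.79, 4.98, 4.84])

half = 1.645 * 1.0 / np.sqrt(len(returns))   # z_{alpha/2} * sigma / sqrt(n)
print(returns.mean())                        # 5.188
print(returns.mean() - half, returns.mean() + half)   # about (4.820, 5.556)
```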
Confidence intervals

I What happens if the standard deviation is unknown or if the random
variable is not normal?
I When the sample size, n, is large, the CLT tells us that the
distribution of $\bar{X}$ is approximately normal, regardless of the data
distribution.
I Therefore, if the data are not normally distributed, for large sample
sizes we can use the following confidence interval for the population
mean, µ:

$$\left(\bar{x} - z_{\alpha/2}\frac{\hat{\sigma}}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\hat{\sigma}}{\sqrt{n}}\right)$$

where $\hat{\sigma}$ is the sample standard deviation.
Confidence intervals

I Let X1, ..., Xn be a s.r.s. of a r.v. X with a Ber(p) distribution.
Then, $\bar{X}$ is a r.v. that estimates the success proportion p based on n
Bernoulli trials.
I Due to the CLT:

$$\bar{X} \sim N\left(p, \sqrt{\frac{p(1-p)}{n}}\right)$$

I The confidence interval for the proportion p is as follows:

$$\left(\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right)$$

where $\hat{p} = \bar{x}$.
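A small helper implementing this interval (Python with NumPy/SciPy; the function name is ours, not from the slides):

```python
import numpy as np
from scipy import stats

def ci_proportion(p_hat, n, level=0.95):
    """Large-sample (CLT-based) confidence interval for a proportion."""
    z = stats.norm.ppf(1 - (1 - level) / 2)    # z_{alpha/2}
    half = z * np.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

print(ci_proportion(0.4, 100))   # about (0.3040, 0.4960); see example below
```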
Example

I Let's go back to the example on the estimation of the proportion
of a Bernoulli distribution. Pablo finally makes a survey with
n = 100 interviews and obtains the estimate $\hat{p} = 0.4$.
I The confidence interval at level 95% for the proportion p is:

$$\left(0.4 - 1.96\sqrt{\frac{0.4(1-0.4)}{100}},\ 0.4 + 1.96\sqrt{\frac{0.4(1-0.4)}{100}}\right) = (0.3040, 0.4960)$$
