
Statistics I

Chapter 5: Introduction to statistical inference


Chapter 5. Introduction to statistical inference

Contents

I Basic concepts.
I Point estimation.
I Goodness of fit.
I Distribution of the sample mean.
I Confidence interval for the mean.
Basic concepts
I Objective: To study the characteristics of interest of a population
through the information obtained from a sample.
I We identify the concept of population with that of the random
variable X which is the object of study.
I The law or the distribution of the population is the distribution of
the random variable of interest, X. Such a distribution may be
totally or partially unknown. (For example, we know it is normal,
but we don't know the values of µ and σ, or we know it is Binomial
with p unknown).
I The unknown values that determine the distribution of the
population are the parameters. These are FIXED, NONRANDOM
values of the population.
I The main goal of statistical inference is to infer the values of the
population parameters, or certain characteristics of the random
variable such as the population mean, based on the information
given by a sample.
Sampling

I Sample: finite subset of a population. The number of elements in


the sample is the sample size.
I Most of the time it is impossible to work with all the elements of a
population:
I The elements may exist conceptually but not in reality (the
population of defective items produced by a machine during its
lifetime).
I It may be impossible for economic reasons to study the whole
population.
I It may be impossible due to time constraints to work with the whole
population and the population may change with time (electoral
surveys).
I The study implies the destruction of the element (the study of the
average lifetime of some light bulbs, the study of the service life of a
semiconductor. . . ).
Simple random sample

I A simple random sample should be used when the probability
distribution of the elements of the population is homogeneous with
respect to the variable of interest, that is, whenever we have no
additional information about the population prior to the sampling.
I A simple random sample must be chosen so that:
I each element of the population has the same probability of being
selected (this ensures the sample is representative of the whole
population),
I sampling must be with replacement, so that the population remains
unchanged across selections and the draws are i.i.d. (if the
population size N is large with respect to the sample size n,
sampling without replacement gives approximately the same result).
Simple random sample

I Let X be a random variable with distribution F . A simple random


sample of size n is a set of n r.v. X1, ..., Xn such that:
I All X1, ..., Xn have the same distribution F (Xi ∼ F, ∀i).
I X1, ..., Xn are independent.

I Each specific set of values x1, ..., xn of a s.r.s. is called the observed sample.


I A statistic is a real function of the s.r.s. X1 , . . . , Xn . Therefore, a
statistic is a random variable (unlike a parameter which is a fixed
number, inherent to a population).
Use of a simple random sample

I Assume that the value of E[X] for a r.v. X is unknown. To
estimate the value of E[X], it is usual to use a s.r.s. to obtain the
sample mean:

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$

Note that $\bar{X}$ is a random variable.
I Now, for an observed sample x1, ..., xn, we obtain a numeric value
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$.
I Note that $\bar{X} \neq \bar{x}$: the first is a random variable, the second a number.
I We will see later why $\bar{X}$ is a good estimator of E[X]. First, let's see
an example.
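A minimal Python sketch of this distinction (NumPy assumed; the exponential population, the seed and the sample size are illustrative choices, not from the slides): each new draw of a sample of size 7 yields a different observed value $\bar{x}$ of the random variable $\bar{X}$.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # seed fixed only for reproducibility

# Hypothetical population: exponential service times with mean 4 minutes.
sample_a = rng.exponential(scale=4.0, size=7)
sample_b = rng.exponential(scale=4.0, size=7)

# Each observed sample gives a different numeric value of the sample mean,
# which is the sense in which X-bar (before observing) is a random variable.
print(sample_a.mean(), sample_b.mean())
```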
Example of sampling and inference
I Suppose the population has 24 members and the random variable of
interest is X = "Time to complete a medical service".
I The population values (in minutes) are:

5.1 1 0.9 3.8 10.2 2.1 9.5 4.5


1 2.2 1.5 4.8 1.6 8.8 4.3 1
9 5.1 0.2 2.3 0.8 7.8 7.7 1.5

I Consequently, the population mean is E[X] ≈ 4.


[Figure: bar charts of the population data (the 24 values above) and of a
sample of size 7 (9.5, 4.5, 3.8, 1.6, 1.5, 0.8, 0.2). Sampling produces the
sample; inference estimates the population parameter from the sample statistic.]
Example of sampling and inference
I We select a s.r.s. of size 7 given by:
3.8 9.5 4.8 1.6 0.2 0.8 1.5
I The sample mean of these values is $\bar{x} = 3.171$. Then, the relative
error is (4 − 3.171)/4 = 0.207.
I If we add new elements to the previous s.r.s., the sample mean will
change. As the figure and the simulation below show, increasing the
sample size makes the sample mean converge to the population mean.

[Figure: sample mean for sample sizes 7 through 24; the values fluctuate
between 3.1 and 4.6 and settle near the population mean as n grows.]
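The convergence in the figure can be reproduced with a short simulation (Python/NumPy; the seed is arbitrary) that draws with-replacement samples of increasing size from the 24 population values:

```python
import numpy as np

# The 24 population values (service times, in minutes) from the slides.
population = np.array([5.1, 1, 0.9, 3.8, 10.2, 2.1, 9.5, 4.5,
                       1, 2.2, 1.5, 4.8, 1.6, 8.8, 4.3, 1,
                       9, 5.1, 0.2, 2.3, 0.8, 7.8, 7.7, 1.5])

rng = np.random.default_rng(seed=1)
for n in (7, 12, 17, 24):
    # Simple random sampling is with replacement, as required above.
    sample = rng.choice(population, size=n, replace=True)
    print(n, round(sample.mean(), 2))
# As n grows, the printed sample means cluster around E[X] (about 4).
```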
Example of sampling and inference
I On the other hand, if we select another s.r.s. of size 7, we obtain:

5.1 1 0.9 3.8 10.2 2.1 9.5

with sample mean $\bar{x} = 4.657$. A histogram of all the possible
values of the sample mean for samples of size 7 is the following:
[Figure: histogram of all possible values of the sample mean for samples of
size 7, roughly bell-shaped and centered near the population mean.]
Example of sampling and inference

I Next, we compare the histograms with all the possible values of the
sample mean for samples of size 7 and 17:

[Figure: histograms of the sample mean for samples of size 7 and of size 17;
the size-17 distribution is narrower and more concentrated around the
population mean.]
Simple random sampling

Conclusions

I A simple random sample of size n of a r.v. X is a set of independent
r.v. with the same distribution as that of X:

$$\{X_i\}_{i=1}^{n} \ \text{i.i.d.}$$

I The sample mean, $\bar{X}$, is a random variable. In general, statistics are
random variables that depend on the random selection of the
members of the sample.
Sample mean

I We compute the expectation and variance of the r.v. $\bar{X}$ to show the
reasons that make $\bar{X}$ a good estimator of E[X].
I For that, we use two properties of the expectation and variance of
sums of random variables.
I Let X1, ..., Xn be a s.r.s. of a r.v. X with expectation E[X] and
variance V[X]. Then:

$$E[a_1 X_1 + \cdots + a_n X_n] = a_1 E[X_1] + \cdots + a_n E[X_n]$$

and, by the independence of the Xi,

$$V[a_1 X_1 + \cdots + a_n X_n] = a_1^2 V[X_1] + \cdots + a_n^2 V[X_n],$$

for any set of real numbers a1, ..., an.


Sample mean

I With the previous properties, we obtain (for a s.r.s. X1, ..., Xn):

$$E[\bar{X}] = E\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right] = \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \frac{1}{n}\sum_{i=1}^{n} E[X] = E[X]$$

and

$$V[\bar{X}] = V\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right] = \frac{1}{n^2}\sum_{i=1}^{n} V[X_i] = \frac{1}{n^2}\sum_{i=1}^{n} V[X] = \frac{V[X]}{n}$$

I Therefore, the expected value of $\bar{X}$ is E[X]. We say that $\bar{X}$ is an
unbiased estimator of E[X].
I Moreover, as $V[\bar{X}] = V[X]/n$, the estimation error may be
reduced by increasing the sample size n.
I These properties justify the use of $\bar{X}$ as an estimator of E[X], as the
simulation below illustrates.
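A Monte Carlo sketch of these two properties (Python/NumPy; the uniform population and all constants are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n, replications = 10, 100_000

# Many s.r.s. of size n from a population with known moments:
# uniform on (0, 1), for which E[X] = 0.5 and V[X] = 1/12.
samples = rng.uniform(0.0, 1.0, size=(replications, n))
means = samples.mean(axis=1)          # one sample mean per replication

print(means.mean())   # close to E[X] = 0.5: the estimator is unbiased
print(means.var())    # close to V[X]/n = (1/12)/10, about 0.00833
```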
Bernoulli distribution

I The previous results allow us to obtain statistics that can be used to


estimate the parameters of the distributions introduced in Chapter 4.
I First, let X be a random variable with a Bernoulli distribution with
parameter p, X ∼ Ber(p). Hence,

$$X = \begin{cases} 1 & \text{with probability } p \\ 0 & \text{with probability } 1-p \end{cases}$$

I Note that E[X] = p and V[X] = p(1 − p).


Bernoulli distribution

I Assume that we have a s.r.s. of size n of the r.v. X . The goal is to


estimate the value of the parameter p based on the s.r.s. X1 , . . . , Xn .
I Since p, the success probability, is the expected value of X, it is
possible to estimate p as follows:

$$\hat{p} = \bar{X}$$

I Moreover, from the previous results, we have that:

$$E[\hat{p}] = p \quad \text{and} \quad V[\hat{p}] = \frac{p(1-p)}{n}$$

so that if the sample size is very large, $\hat{p}$ should be very close to p.
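A quick numerical check (Python/NumPy; the true value p = 0.3 and the sample sizes are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(seed=7)
p_true = 0.3

for n in (10, 100, 10_000):
    x = rng.binomial(1, p_true, size=n)   # n Bernoulli(p) trials
    print(n, x.mean())                    # p-hat approaches 0.3 as n grows
```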
Example

I Pablo wants to be the Mayor of a small village. As he would like to


know his chances, Pablo decides to make a small survey to estimate
the proportion of voters of the village that supports him.
I Assume the r.v. X =“Vote for Pablo” that takes two values, 1, if
the selected person supports Pablo, and 0, otherwise.
I We take a s.r.s. of size 10 obtaining the values:

1 0 0 1 1 0 1 0 1 0

I Consequently, the estimated proportion is $\hat{p} = 5/10 = 0.5$.
Binomial distribution

I Next, let Y be a random variable with a Binomial distribution of
parameters m and p, Y ∼ B(m, p). Therefore,

$$P(Y = y) = \binom{m}{y} p^{y} (1-p)^{m-y}$$

for y = 0, 1, ..., m.
I Remember that Y ∼ B (m, p) means that Y is the sum of m
independent r.v. with a Bernoulli distribution with parameter p, i.e.,
a s.r.s. X1 , . . . , Xm such that Y = X1 + · · · + Xm .
I Consequently, E [Y ] = mp and V [Y ] = mp (1 − p).
I Let us see how to estimate the proportion p in two different
situations.
Binomial distribution

I First, if we only have one observation of Y, say Y1, then it is possible
to estimate p as follows:

$$\hat{p} = \frac{Y_1}{m} = \frac{X_1 + \cdots + X_m}{m}$$

which coincides with the sample mean of the s.r.s. X1, ..., Xm of the
Bernoulli variable X.
I Therefore, this estimator satisfies the same properties as the
estimator based on the Bernoulli distribution, i.e.:

$$E[\hat{p}] = p \quad \text{and} \quad V[\hat{p}] = \frac{p(1-p)}{m}$$

so if the number of replications of the Bernoulli experiment, m, is very
large, $\hat{p}$ should be very close to p.
Example

I In the previous example, if we define the variable Y = "Number of
voters of Pablo in a sample of size 10", for the obtained sample we
have that Y1 = 5.
I In this case, the estimated proportion is again $\hat{p} = 5/10 = 0.5$.
Binomial distribution

I Second, if we have a s.r.s. of size n of the r.v. Y, say Y1, ..., Yn
(note that n is the sample size and m is the number of replications of
the Bernoulli experiment), then it is possible to estimate p as follows:

$$\hat{p} = \frac{\bar{Y}}{m} = \frac{Y_1 + \cdots + Y_n}{n \times m}$$

I Besides, due to the properties of the sample mean:

$$E[\hat{p}] = E\left[\frac{\bar{Y}}{m}\right] = \frac{mp}{m} = p$$

and

$$V[\hat{p}] = V\left[\frac{\bar{Y}}{m}\right] = \frac{1}{m^2} V[\bar{Y}] = \frac{p(1-p)}{n \times m}$$

so if the number of replications of the Bernoulli experiment, m, and/or
the sample size, n, is/are very large, $\hat{p}$ should be very close to p.
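A minimal sketch of this estimator (Python/NumPy; the values of m, p and n are illustrative):

```python
import numpy as np

rng = np.random.default_rng(seed=3)
m, p_true, n = 10, 0.5, 50     # m Bernoulli replications per observation

y = rng.binomial(m, p_true, size=n)   # Y_1, ..., Y_n ~ B(m, p)
p_hat = y.mean() / m                  # (Y_1 + ... + Y_n) / (n * m)
print(p_hat)                          # close to p = 0.5
```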
Example

I Assume next that Pablo takes a second sample from the variable
X =“Vote for Pablo”, obtaining:

0 0 0 0 1 0 0 1 0 0

I In this case, the estimated proportion is 0.2.


I The value of the variable Y =“Number of voters in a sample of size
10” is therefore Y2 = 2.
I If we look for the estimate of the proportion that takes into account
the values Y1 = 5 and Y2 = 2, we obtain:

$$\hat{p} = \frac{5+2}{10 \times 2} = \frac{7}{20} = 0.35$$
Poisson distribution
I Let X be a random variable with a Poisson distribution of parameter
λ, X ∼ P(λ). Therefore,

$$P(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}, \quad \text{for } x = 0, 1, 2, \ldots$$
Note that E [X ] = λ and V [X ] = λ.
I Now, for a s.r.s. of size n of the r.v. X, say X1, ..., Xn, and since
E[X] = λ, it is possible to estimate λ as follows:

$$\hat{\lambda} = \bar{X}$$

I Moreover:

$$E[\hat{\lambda}] = E[\bar{X}] = \lambda \quad \text{and} \quad V[\hat{\lambda}] = V[\bar{X}] = \frac{\lambda}{n}$$

so if the sample size, n, is very large, $\hat{\lambda}$ should be close to λ.
Example

I We want to estimate the expected value of the r.v. X =“number of


people that arrive at an ATM on Wednesdays at 11:00 am”. Assume
that X follows a Poisson distribution of parameter λ.
I We count the value of the variable X on 100 consecutive
Wednesdays, obtaining the values x1 = 5, x2 = 7, ..., x100 = 3 with
sample mean 6. Consequently, $\hat{\lambda} = 6$.
Exponential distribution

I Next, let Y be a random variable with an exponential distribution of


parameter λ, Y ∼ E (λ). Therefore,

f (y ) = λe −λy , for y > 0

Note that E [Y ] = 1/λ and V [Y ] = 1/λ2 .


I Now, for a s.r.s. of size n of the r.v. Y, say Y1, ..., Yn, it is possible
to estimate λ as follows:

$$\hat{\lambda} = \frac{1}{\bar{Y}}$$
Example

I In the previous example, we also take 100 values of the r.v.


Y =“waiting time in minutes between arrivals at the ATM on
Wednesdays at 11:00 am”, given by
y1 = 9.5, y2 = 7.3, . . . , y100 = 15.2.
I Note that the variable Y has an exponential distribution because we
have assumed that X has a Poisson distribution.
I The sample mean is 11 minutes, so the estimate of λ is
$\hat{\lambda} = 1/11 = 0.0909$.
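Both estimators can be checked with a short simulation (Python/NumPy; the true parameter values mirror the two ATM examples but are otherwise hypothetical):

```python
import numpy as np

rng = np.random.default_rng(seed=11)

# Poisson counts: lambda-hat is the sample mean of the counts.
counts = rng.poisson(lam=6.0, size=100)        # true lambda = 6
print(counts.mean())                           # lambda-hat, close to 6

# Exponential waiting times: E[Y] = 1/lambda, so lambda-hat = 1 / Y-bar.
waits = rng.exponential(scale=11.0, size=100)  # true mean 11 minutes
print(1 / waits.mean())                        # close to 1/11 = 0.0909
```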
Normal distribution

I Finally, let X be a random variable with a normal or Gaussian
distribution of parameters µ and σ, X ∼ N(µ, σ). Therefore,

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2\sigma^{2}}(x-\mu)^{2}\right), \quad \text{for } -\infty < x < \infty$$

Note that E[X] = µ and $V[X] = \sigma^{2}$.


I Now, for a s.r.s. of size n of the r.v. X, say X1, ..., Xn, it is possible
to estimate µ and σ as follows:

$$\hat{\mu} = \bar{X} \quad \text{and} \quad \hat{\sigma} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^{2}}$$
Normal distribution
I Due to the properties of the sample mean:

$$E[\hat{\mu}] = E[\bar{X}] = \mu \quad \text{and} \quad V[\hat{\mu}] = V[\bar{X}] = \frac{\sigma^{2}}{n}$$

so if n is very large, $\hat{\mu}$ should be close to µ.
I The analysis of the standard deviation is more complex, but it is
possible to show that:

$$E[\hat{\sigma}^{2}] = \frac{n-1}{n}\,\sigma^{2}$$

i.e., $E[\hat{\sigma}^{2}]$ is not $\sigma^{2}$, although $E[\hat{\sigma}^{2}]$ tends to $\sigma^{2}$ as n tends to ∞.
I For this reason, sometimes it is better to use the estimator called
quasi-variance, given by:

$$s^{2} = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^{2}$$

Note that $E[s^{2}] = \sigma^{2}$. Therefore, a reasonable estimator of the
standard deviation is the quasi-standard deviation, $s = \sqrt{s^{2}}$. The
simulation below compares the two variance estimators.
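The bias of $\hat{\sigma}^{2}$ and the unbiasedness of $s^{2}$ can be verified numerically (Python/NumPy; in NumPy, ddof=0 divides by n and ddof=1 by n − 1, so ddof=1 gives the quasi-variance):

```python
import numpy as np

rng = np.random.default_rng(seed=5)
n, sigma2 = 5, 4.0   # a small n makes the bias clearly visible

samples = rng.normal(0.0, np.sqrt(sigma2), size=(200_000, n))
var_biased = samples.var(axis=1, ddof=0)   # divides by n
var_quasi = samples.var(axis=1, ddof=1)    # divides by n - 1 (quasi-variance)

print(var_biased.mean())   # close to (n-1)/n * sigma^2 = 3.2
print(var_quasi.mean())    # close to sigma^2 = 4.0
```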
Example

I Assume that the monthly returns of a certain financial asset follow a


normal distribution. We want to estimate the parameters of this
normal distribution.
I We have n = 46 values of the monthly returns of the asset (in
percentages).
I The sample mean, $\bar{x} = 1.03$, is an estimate of the population mean,
µ.
I On the other hand, the sample standard deviation, $\hat{\sigma} = 4.16$, is an
estimate of the population standard deviation, σ. An alternative
estimate is the quasi-standard deviation, s = 4.25.
Goodness of fit

I As we have seen in some of the examples, sometimes we have
assumed that the data come from a given distribution. In real
applications, this assumption should be carefully justified.
I Goodness-of-fit methods are useful procedures for doing this.
I Here, we focus on two graphical methods to assess goodness of fit.
Histogram with estimated density function
I The first one compares the data histogram with the density function
with estimated parameters. If the assumption on the distribution is
true, then the density function should be close to the histogram.
I For instance, the next plot shows the data histogram corresponding
to 200 returns of a certain asset, compared with the normal density
function with parameters estimated from the sample ($\hat{\mu} = 0.83$
and $\hat{\sigma} = 4.12$).
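A sketch of how such a plot can be produced (Python with NumPy, SciPy and Matplotlib; the 200 returns are not listed in the slides, so simulated stand-in data are used):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(seed=2)
returns = rng.normal(0.83, 4.12, size=200)   # stand-in for the 200 returns

mu_hat, sigma_hat = returns.mean(), returns.std()   # estimated parameters

plt.hist(returns, bins=20, density=True)     # density=True: total area 1
grid = np.linspace(returns.min(), returns.max(), 200)
plt.plot(grid, stats.norm.pdf(grid, mu_hat, sigma_hat))  # fitted density
plt.show()
```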
QQ-plot
I The second graphic is called the QQ-plot and shows the sample
quantiles versus the corresponding distribution quantiles with
parameters estimated from the sample.
I If the data have actually been generated by the considered
distribution, then the points in the graphic will lie approximately on
a straight line.
I If the corresponding distribution function is continuous and
increasing, the pth quantile (0 < p < 1), denoted by qp, is obtained
by inverting the distribution function. Therefore, if we look for qp
such that F(qp) = p, then qp = F⁻¹(p).
I If the corresponding distribution function is discrete or piecewise
constant, qp = min{x : F(x) ≥ p}.
I The sample pth quantile, Qp, is obtained as follows: (1) the data
x1, ..., xn are sorted in increasing order, x(1), ..., x(n); (2) then, Qp is
given by Qp = x([np]). A sketch of this construction follows.
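A sketch of this construction (Python with NumPy/SciPy; it uses the plotting positions p = (i − 0.5)/n, a standard variant of the recipe above):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=4)
x = rng.normal(0.83, 4.12, size=200)    # hypothetical returns
x_sorted = np.sort(x)                   # x_(1), ..., x_(n): sample quantiles

n = len(x)
p = (np.arange(1, n + 1) - 0.5) / n     # plotting positions

# Theoretical quantiles F^{-1}(p) of the normal with estimated parameters.
theor_q = stats.norm.ppf(p, x.mean(), x.std())

# Under a good fit the points (theor_q, x_sorted) lie near a straight line.
print(np.corrcoef(theor_q, x_sorted)[0, 1])   # close to 1 for normal data
```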
QQ-plot
I For example, the QQ-plot of the 200 returns of the asset compares
the sample quantiles with the quantiles of a normal distribution with
parameters estimated from the sample, $\hat{\mu} = 0.83$ and $\hat{\sigma} = 4.12$.
I The plot suggests that the normal distribution may be appropriate
for the data.
The distribution of the sample mean

I Previous slides have shown how to estimate the parameters of
several distributions using the properties of the sample mean.
I Next, we derive the distribution of the sample mean, which will be
useful to obtain confidence intervals.
The distribution of the sample mean

I If X has a normal distribution N(µ, σ) and X1, ..., Xn is a s.r.s. of
X, then:

$$\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right) \implies \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$

I If X has expectation E[X] and variance V[X] but does not have a
normal distribution, then the Central Limit Theorem (CLT) states
that, if X1, ..., Xn is a s.r.s. of X with n big enough (n ≥ 30,
approximately), then:

$$\bar{X} \longrightarrow N\left(E[X], \sqrt{\frac{V[X]}{n}}\right) \implies \frac{\bar{X}-E[X]}{\sqrt{V[X]/n}} \longrightarrow N(0, 1)$$
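A small simulation illustrating the CLT (Python/NumPy; the exponential population is an arbitrary non-normal choice with E[X] = 1 and V[X] = 1):

```python
import numpy as np

rng = np.random.default_rng(seed=8)
n, reps = 50, 100_000

# Sample means of many s.r.s. from a clearly non-normal population.
means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)

# Standardize with E[X] = 1 and sqrt(V[X]/n); the result is close to N(0,1).
z = (means - 1.0) / np.sqrt(1.0 / n)
print(z.mean(), z.std())            # close to 0 and 1
print(np.mean(np.abs(z) < 1.96))    # close to 0.95, as for a standard normal
```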
Example

I If X1, ..., Xn is a s.r.s. of X with a Ber(p) distribution, for large n:

$$\bar{X} \longrightarrow N\left(p, \sqrt{\frac{p(1-p)}{n}}\right) \implies \frac{\bar{X}-p}{\sqrt{p(1-p)/n}} \longrightarrow N(0, 1)$$
Example

I Let X be a discrete r.v. with probability function:

$$P(X = x) = \begin{cases} 1/4 & \text{if } x = 1, 2, 3, 4 \\ 0 & \text{otherwise} \end{cases}$$

I A s.r.s. of size n = 125 is taken from X. Compute the probability
that the sample mean will be between 2.4 and 2.6.
I We first compute E[X] and V[X] as follows:

$$E[X] = \frac{1}{4} \times 1 + \frac{1}{4} \times 2 + \frac{1}{4} \times 3 + \frac{1}{4} \times 4 = 2.5$$

$$V[X] = E[X^{2}] - E[X]^{2} = \frac{1}{4} \times 1^{2} + \frac{1}{4} \times 2^{2} + \frac{1}{4} \times 3^{2} + \frac{1}{4} \times 4^{2} - 2.5^{2} = 1.25$$
Example

I From the CLT:

$$\bar{X} \longrightarrow N\left(2.5, \sqrt{\frac{1.25}{125}}\right) = N(2.5, 0.1)$$

I Therefore:

$$P(2.4 < \bar{X} < 2.6) = P\left(\frac{2.4-2.5}{0.1} < \frac{\bar{X}-2.5}{0.1} < \frac{2.6-2.5}{0.1}\right)$$
$$= P(-1 < Z < 1) = P(Z < 1) - P(Z < -1) = P(Z < 1) - (1 - P(Z < 1))$$
$$= 2 \times P(Z < 1) - 1 = 2 \times 0.8413 - 1 = 0.6826.$$

I Note that this is an approximate probability as we are using an


approximate distribution.
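The approximation can be checked by simulation (Python/NumPy; the number of replications is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(seed=9)
reps, n = 200_000, 125

# Many s.r.s. of size 125 from the uniform distribution on {1, 2, 3, 4}.
samples = rng.integers(1, 5, size=(reps, n))   # upper bound is exclusive
means = samples.mean(axis=1)

print(np.mean((means > 2.4) & (means < 2.6)))  # close to the CLT value 0.6826
```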
Confidence intervals

I Instead of a point estimator, it could be more informative to have an


interval of plausible values for the true value of the unknown
parameter.
I For instance, given a sample, we would like to know an interval of
values containing the true value of the population mean, µ, with
total certainty. Unfortunately, this is not possible.
I Instead, we consider a way to obtain intervals such that
100(1 − α)% of the intervals constructed in this way contain
the true value of the population mean, µ. Here, 1 − α is the
confidence level and the interval obtained is a confidence interval.
Confidence intervals

I Assume that X1 , . . . , Xn is a s.r.s. of a r.v. X with a N (µ, σ)


distribution, where σ is known.
 
I We know that $\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$, or, equivalently, $\frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \sim N(0,1)$.
Then:

$$P\left(-z_{\alpha/2} < \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) = 1 - \alpha$$

and, thus:

$$P\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$

I Therefore, a confidence interval for µ at confidence level 1 − α is
given by:

$$\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$
Confidence intervals

I 100 samples of size n = 50 have been generated from a N(−2, 1)
distribution, and a confidence interval at level 90% has been
constructed from each sample. Approximately 90% of the intervals
include the true value µ = −2, as the simulation below reproduces.
[Figure: frequentist interpretation of the confidence interval.]
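A sketch reproducing this experiment (Python with NumPy/SciPy; the seed is arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=6)
mu, sigma, n, level = -2.0, 1.0, 50, 0.90
z = stats.norm.ppf(1 - (1 - level) / 2)    # z_{alpha/2}, about 1.645

covered = 0
for _ in range(100):
    x = rng.normal(mu, sigma, size=n)
    half = z * sigma / np.sqrt(n)
    if x.mean() - half < mu < x.mean() + half:
        covered += 1
print(covered)   # about 90 of the 100 intervals contain mu = -2
```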
Example

I Let us suppose that the returns of the company SEGURA.SL follow


a normal distribution with mean µ euros and variance σ² = 1. A
simple random sample, of size n = 20, of the returns is taken and
the following values are obtained:

5.29 3.66 5.71 6.62 4.30 5.85 6.25 3.40 3.55 5.57
4.60 5.69 5.81 5.71 6.29 5.66 6.19 3.79 4.98 4.84

I The sample mean of the 20 returns is $\bar{x} = 5.188$. Therefore, the
confidence interval at confidence level 90% for the mean return of
this company is:

$$\left(5.188 - 1.645\,\frac{1}{\sqrt{20}},\ 5.188 + 1.645\,\frac{1}{\sqrt{20}}\right) = (4.820, 5.556)$$
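The interval can be reproduced numerically (Python/NumPy):

```python
import numpy as np

returns = np.array([5.29, 3.66, 5.71, 6.62, 4.30, 5.85, 6.25, 3.40, 3.55, 5.57,
                    4.60, 5.69, 5.81, 5.71, 6.29, 5.66, 6.19, 3.79, 4.98, 4.84])

half = 1.645 * 1.0 / np.sqrt(len(returns))   # z_{alpha/2} * sigma / sqrt(n)
print(returns.mean())                        # 5.188
print(returns.mean() - half, returns.mean() + half)   # about (4.820, 5.556)
```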
Confidence intervals

I What happens if the standard deviation is unknown or if the random
variable is not normal?
I When the sample size, n, is large, the CLT tells us that the
distribution of $\bar{X}$ is approximately normal, regardless of the data
distribution.
I Therefore, if the data are not normally distributed, for large sample
sizes we can use the following confidence interval for the population
mean, µ:

$$\left(\bar{x} - z_{\alpha/2}\frac{\hat{\sigma}}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\hat{\sigma}}{\sqrt{n}}\right)$$

where $\hat{\sigma}$ is the sample standard deviation.
Confidence intervals

I Let X1, ..., Xn be a s.r.s. of a r.v. X with a Ber(p) distribution.
Then, $\bar{X}$ is a r.v. that estimates the success proportion p based on n
Bernoulli trials.
I Due to the CLT:

$$\bar{X} \sim N\left(p, \sqrt{\frac{p(1-p)}{n}}\right)$$

I The confidence interval for the proportion p is as follows:

$$\left(\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right)$$

where $\hat{p} = \bar{x}$.
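A small helper implementing this interval (Python with NumPy/SciPy; the function name is ours, not from the slides):

```python
import numpy as np
from scipy import stats

def ci_proportion(p_hat, n, level=0.95):
    """Large-sample (CLT-based) confidence interval for a proportion."""
    z = stats.norm.ppf(1 - (1 - level) / 2)    # z_{alpha/2}
    half = z * np.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

print(ci_proportion(0.4, 100))   # about (0.3040, 0.4960); see example below
```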
Example

I Let's go back to the example on the estimation of the proportion
of a Bernoulli distribution. Pablo finally makes a survey with
n = 100 interviews and obtains the estimate $\hat{p} = 0.4$.
I The confidence interval at level 95% for the proportion p is:

$$\left(0.4 - 1.96\sqrt{\frac{0.4(1-0.4)}{100}},\ 0.4 + 1.96\sqrt{\frac{0.4(1-0.4)}{100}}\right) = (0.3040, 0.4960)$$
