Lecture 5

Uploaded by

priyasivags
Statistics for Economics

Lecture 5

Sampling and
Sampling Distributions
Reading: Newbold et al, Statistics for
Business and Economics - Chapter 6

Confidence Intervals
Reading: Chapter 7.1 - 7.3

Lecture Goals
After completing this lecture, you should be able to:
• Describe a simple random sample and why sampling is important
• Explain the difference between descriptive and inferential statistics
• Define the concept of a sampling distribution
• Determine the mean and standard deviation for the sampling distribution of the sample mean, X̄
• Describe the Central Limit Theorem and its importance
• Determine the mean and standard deviation for the sampling distribution of the sample proportion, p̂

Lecture Goals (2)
After completing this lecture, you should be able to:
• Understand the difference between the bias and the efficiency of point estimators
• Distinguish between a sample mean and a confidence interval estimate for the population mean
• Construct and interpret a confidence interval estimate for a single population mean using both the Z and t distributions

Introduction
• Descriptive statistics
  ◦ Collecting, presenting, and describing data
• Inferential statistics
  ◦ Drawing conclusions and/or making decisions concerning a population based only on sample data

Inferential Statistics
• Making statements about a population by examining sample results
[Figure: sample statistics (known) → inference → population parameters (unknown, but can be estimated from sample evidence)]

Inferential Statistics
Drawing conclusions and/or making decisions concerning a population based on sample results.
• Estimation
  ◦ e.g., estimate the population mean weight using the sample mean weight
• Hypothesis Testing
  ◦ e.g., use sample evidence to test the claim (hypothesis) that the population mean weight is 65 kg

Sampling from a Population
• A population is the set of all items or individuals of interest
  ◦ Examples: all likely voters in the next election; all parts produced today; all sales receipts for November
• A sample is a subset of the population
  ◦ Examples: 1000 voters selected at random for interview; a few parts selected for destructive testing; random receipts selected for audit

Population vs. Sample
[Figure: a population and a sample drawn from it]

Why Sample?
• Less time consuming than a census
• Less costly to administer than a census
• The full population of data may not be available, especially when dealing with time series data
• It is possible to obtain statistical results of sufficiently high precision based on samples

Simple Random Sample
• In simple random sampling, every object in the population has the same probability of being selected
  ◦ Objects are selected independently
  ◦ Samples can be obtained from a table of random numbers or computer random number generators
• A simple random sample is the ideal against which other sampling methods are compared

Sampling Distributions
• A sampling distribution is the probability distribution of all the possible values a statistic, such as the sample mean, can take across samples of the same size drawn from the same population.
• The next slide gives a visual example of the sampling distribution for the sample mean.

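The definition above can be made concrete by listing every possible sample. The sketch below is only an illustration: the four-value population and the sample size of 2 are assumptions, not data from the lecture. It enumerates all equally likely samples drawn with replacement and tabulates the resulting sample means.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Hypothetical toy population (an assumption for illustration only).
population = [1, 2, 3, 4]
n = 2  # sample size

# Every equally likely ordered sample of size n, drawn with replacement,
# and the sample mean of each one.
sample_means = [Fraction(sum(s), n) for s in product(population, repeat=n)]

# The sampling distribution: each possible sample mean with its probability.
counts = Counter(sample_means)
sampling_distribution = {
    mean: Fraction(count, len(sample_means))
    for mean, count in sorted(counts.items())
}

for mean, prob in sampling_distribution.items():
    print(f"P(sample mean = {float(mean):.1f}) = {prob}")

# The mean of the sampling distribution equals the population mean.
expected_mean = sum(mean * prob for mean, prob in sampling_distribution.items())
population_mean = Fraction(sum(population), len(population))
print("E[sample mean] == population mean:", expected_mean == population_mean)
```

Exact fractions are used so that the equality between the mean of the sampling distribution and the population mean holds without rounding error.
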
Sampling distribution of sample means
(recall from lecture 4)
[Figure: a population with mean µ = 0 and any distribution; Samples 1 to 4 yield sample means X̄1, X̄2, X̄3, X̄4; the sampling distribution of sample means tends towards a normal distribution as n → ∞]

Recap: Sample Mean
• Let X1, X2, ..., Xn represent a random sample from a population
• The sample mean value of these observations is defined as
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
$$E[\bar{X}] = \mu$$

Standard Error = Standard Deviation of Sample Mean
• Different samples of the same size from the same population will yield different sample means, X̄1, X̄2, X̄3, etc.
• A measure of the variability of the sample mean from sample to sample is given by the Standard Error of the Mean (the standard deviation of the sample mean):
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
• Note that the standard error of the mean decreases as the sample size n increases, or as the population standard deviation σ falls.

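As a rough numerical check of the formula, the sketch below evaluates σ/√n for several sample sizes. The function name `standard_error` and the value σ = 10 are assumptions chosen for illustration.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size n must be positive")
    return sigma / math.sqrt(n)


sigma = 10.0  # hypothetical population standard deviation
for n in (4, 16, 64, 256):
    print(f"n = {n:3d}  standard error = {standard_error(sigma, n):.3f}")
```

Because n enters through a square root, quadrupling the sample size only halves the standard error.
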
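If the Population is not Normal
(continued)
Sampling distribution properties:
• Central tendency: $E[\bar{X}] = \mu$
• Variation: $\sigma_{\bar{X}} = \sigma/\sqrt{n}$
• The sampling distribution of X̄ becomes normal as n → ∞
[Figure: a non-normal population distribution with mean μ; below it, sampling distributions of x̄ centred on E[X̄] = μ, narrower for the larger sample size and wider for the smaller sample size]
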
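Sampling Distribution of X̄
Properties
(continued)
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
• As n increases or σ decreases, $\sigma_{\bar{X}}$ decreases
[Figure: two sampling distributions centred on E[X̄] = μ; the narrower one corresponds to a larger sample size and smaller population standard deviation, the wider one to a smaller sample size and larger population standard deviation]
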
Central Limit Theorem
• Even if the population is not normal,
• ...sample means from the population will be approximately normal as long as the sample size is large enough.

Central Limit Theorem (CLT)
As the sample size gets larger and tends towards infinity, the sampling distribution of x̄ tends towards a normal distribution, regardless of how the population is distributed. The population can be very far from normal, and x̄ will still be approximately normally distributed as n → ∞.
[Figure: the sampling distribution of x̄ becoming more bell-shaped as n increases]

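A small simulation can illustrate the theorem. The sketch below is an illustration under assumptions of my own choosing, not part of the lecture: an exponential population with mean 1 and standard deviation 1 (strongly skewed), a sample size of 50, and 5000 repeated samples.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible


def sample_mean(n: int) -> float:
    """Mean of n draws from an exponential population (mean 1, sd 1)."""
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))


n = 50
means = [sample_mean(n) for _ in range(5000)]

mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)
theoretical_se = 1.0 / math.sqrt(n)  # sigma / sqrt(n) with sigma = 1

print(f"mean of sample means: {mean_of_means:.3f} (population mean 1.000)")
print(f"sd of sample means:   {sd_of_means:.3f} (sigma/sqrt(n) = {theoretical_se:.3f})")
```

The simulated sample means should cluster around the population mean with a spread close to σ/√n, even though the population itself is far from normal.
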
Standard Normal Distribution for the Sample Means
• Z-value for the sampling distribution of X̄:
$$Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
where: X̄ = sample mean
μ = population mean
$\sigma_{\bar{X}}$ = standard error of the mean
Z is a standardized normal random variable with a mean of 0 and a variance of 1

Central Limit Theorem
(continued)
• Let X1, X2, ..., Xn be a set of n independent random variables having identical distributions with mean µ and variance σ², and let X̄ be the mean of these random variables.
• As n becomes large, the central limit theorem states that the distribution of
$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}}$$
approaches the standard normal distribution

How Large is Large Enough?
• For most distributions, n > 25 will give a sampling distribution of X̄ that is nearly normal, even if the underlying population of X is not normal.

If the Population is Normal
• If a population is normal with mean μ and standard deviation σ, the sampling distribution of X̄ is also normally distributed with
$$E[\bar{X}] = \mu \quad\text{and}\quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
for any sample size n

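Sampling Distribution Properties
$$E[\bar{X}] = \mu \qquad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
[Figure: a normal population distribution centred on μ, above a narrower normal sampling distribution of x̄ centred on E(x̄) = μ]
• Notice that for any sample size n > 1, the standard deviation of the sample mean is smaller than the standard deviation of X
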
Example
• Suppose a large population has mean μ = 8 and standard deviation σ = 3. Suppose a random sample of size n = 36 is selected.
• What is the probability that the sample mean is between 7.8 and 8.2?

Example
(continued)
Solution:
• Even if the population is not normally distributed (we are not told, so assume it is not), the central limit theorem can be used (n > 25)
• ... so the sampling distribution of x̄ is approximately normal
• ... with mean E(x̄) = 8
• ... and standard deviation
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{36}} = 0.5$$

Example
(continued)
Solution (continued):
$$P(7.8 < \bar{X} < 8.2) = P\left(\frac{7.8 - 8}{3/\sqrt{36}} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \frac{8.2 - 8}{3/\sqrt{36}}\right) = P(-0.4 < Z < 0.4) = 0.3108$$
[Figure: population distribution (shape unknown, μ = 8) → sample → sampling distribution of x̄ (μ_x̄ = 8, interval 7.8 to 8.2) → standardize → standard normal distribution (μ_z = 0, interval -0.4 to 0.4, area .1554 + .1554)]

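The same probability can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library in place of the printed Z table; the variable names are my own.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 8.0, 3.0, 36

# Standard error of the mean: sigma / sqrt(n) = 3 / 6 = 0.5
se = sigma / sqrt(n)

# By the CLT, the sample mean is approximately Normal(mu, se).
sampling_dist = NormalDist(mu, se)
prob = sampling_dist.cdf(8.2) - sampling_dist.cdf(7.8)

print(f"standard error = {se}")
print(f"P(7.8 < sample mean < 8.2) = {prob:.4f}")
```

Working directly with the sampling distribution Normal(8, 0.5) is equivalent to standardizing to Z first, so the result should agree with the table value to four decimal places.
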
Acceptance Intervals
• Determine a range where 100(1 - α)% of the sample means are likely to occur, given a population mean and variance.
  ◦ By the Central Limit Theorem, we know that the distribution of X̄ is approximately normal if n is large enough, with mean μ and standard deviation $\sigma_{\bar{X}}$
  ◦ Let $z_{\alpha/2}$ be the z-value that leaves an area of α/2 in the upper tail of the standard normal distribution. The interval from $-z_{\alpha/2}$ to $z_{\alpha/2}$ then encloses Z with probability 1 - α.
  ◦ Then $\mu \pm z_{\alpha/2}\,\sigma_{\bar{X}}$ is the interval that includes X̄ with probability 1 - α.
[Figure: standard normal curve with area 1 - α between $-z_{\alpha/2}$ and $z_{\alpha/2}$ and area α/2 in each tail]

Acceptance Intervals - Example
• Example question: suppose you know that the variable X has a population mean of 50 and a standard deviation of 10. Given a sample size of 20, create an interval where 95% of the sample means are likely to occur.
• Answer: we apply the formula $\mu \pm z_{\alpha/2}\,\sigma_{\bar{X}}$ to work out this interval, but first we need to find $z_{\alpha/2}$ and $\sigma_{\bar{X}}$:
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{20}} = 2.236$$
• 1 - α = 0.95, so α = 0.05 and α/2 = 0.025

Acceptance Intervals - Example (continued)
• The Z table in the appendix of your textbook gives P(Z < z).
• But we want the z value with P(Z > z) = 0.025, so that 2.5% of the area is in the upper tail. In other words, we want $z_{\alpha/2} = z_{0.05/2} = z_{0.025}$.
• Using the Z table, we look up the z value for which P(Z < z) = 0.975. As per the Z table, $z_{0.025} = 1.96$.
• So now we have: $\mu \pm z_{\alpha/2}\,\sigma_{\bar{X}} = 50 \pm (1.96 \times 2.236) = (45.62,\ 54.38)$
• So the interval where 95% of the sample means are likely to occur is between 45.62 and 54.38.

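The table lookup and the interval can be reproduced in a few lines. This is a sketch under the slide's assumptions (known μ and σ, normal sampling distribution); the helper name `acceptance_interval` is hypothetical, and `NormalDist().inv_cdf` stands in for reading the Z table backwards.

```python
from math import sqrt
from statistics import NormalDist


def acceptance_interval(mu, sigma, n, confidence=0.95):
    """Interval mu +/- z_{alpha/2} * sigma / sqrt(n) for the sample mean."""
    alpha = 1 - confidence
    # z value with P(Z < z) = 1 - alpha/2, i.e. alpha/2 in the upper tail.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma / sqrt(n)
    return mu - z * se, mu + z * se


low, high = acceptance_interval(mu=50, sigma=10, n=20)
print(f"95% of sample means fall between {low:.2f} and {high:.2f}")
```

The computed z value is about 1.95996 rather than the rounded 1.96, so the endpoints match the slide's 45.62 and 54.38 after rounding to two decimals.
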
6.3 Sampling Distributions of Sample Proportions
Sampling distributions:
• Sampling distributions of sample means
• Sampling distributions of sample proportions

Sampling Distributions of Sample Proportions
P = the proportion of the population having some characteristic
• The sample proportion (p̂) provides an estimate of P:
$$\hat{p} = \frac{X}{n} = \frac{\text{number of items in the sample having the characteristic of interest}}{\text{sample size}}$$
• 0 ≤ p̂ ≤ 1
• The count X has a binomial distribution, and the distribution of p̂ = X/n can be approximated by a normal distribution when nP(1 - P) > 5

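Sampling Distribution of p̂
• Normal approximation:
[Figure: sampling distribution, with probability P(p̂) on the vertical axis (0 to .3) and p̂ on the horizontal axis (0 to 1)]
Properties:
$$E(\hat{p}) = P \quad\text{and}\quad \sigma_{\hat{p}} = \sqrt{\frac{P(1-P)}{n}}$$
(where P = population proportion)
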
Z-Value for Proportions
Standardize p̂ to a Z value with the formula:
$$Z = \frac{\hat{p} - P}{\sigma_{\hat{p}}} = \frac{\hat{p} - P}{\sqrt{\dfrac{P(1-P)}{n}}}$$
where the distribution of Z is a good approximation to the standard normal distribution if nP(1 - P) > 5

Example
• If the true proportion of voters who support Proposition A is P = 0.4, what is the probability that a sample of size 200 yields a sample proportion between 0.40 and 0.45?
• i.e.: if P = 0.4 and n = 200, what is P(0.40 ≤ p̂ ≤ 0.45)?

Example
(continued)
• If P = 0.4 and n = 200, what is P(0.40 ≤ p̂ ≤ 0.45)?
Find $\sigma_{\hat{p}}$:
$$\sigma_{\hat{p}} = \sqrt{\frac{P(1-P)}{n}} = \sqrt{\frac{.4(1-.4)}{200}} = .03464$$
Convert to standard normal:
$$P(.40 \le \hat{p} \le .45) = P\left(\frac{.40 - .40}{.03464} \le Z \le \frac{.45 - .40}{.03464}\right) = P(0 \le Z \le 1.44)$$

Example
(continued)
• If P = 0.4 and n = 200, what is P(0.40 ≤ p̂ ≤ 0.45)?
Use the standard normal table: P(0 ≤ Z ≤ 1.44) = .4251
[Figure: sampling distribution of p̂ with the area between .40 and .45 shaded → standardize → standardized normal distribution with area .4251 between Z = 0 and Z = 1.44]

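The proportion example can be verified the same way. The sketch below is a check under the slide's normal approximation, with variable names of my own; it does not round z to two decimals as the table lookup does.

```python
from math import sqrt
from statistics import NormalDist

P, n = 0.4, 200

# The normal approximation is considered adequate when n*P*(1-P) > 5.
approximation_ok = n * P * (1 - P) > 5

# Standard error of the sample proportion.
se = sqrt(P * (1 - P) / n)

# Standardize the endpoints 0.40 and 0.45.
z_lower = (0.40 - P) / se
z_upper = (0.45 - P) / se

std_normal = NormalDist()
prob = std_normal.cdf(z_upper) - std_normal.cdf(z_lower)

print(f"n*P*(1-P) = {n * P * (1 - P):.0f}, approximation ok: {approximation_ok}")
print(f"standard error = {se:.5f}, z_upper = {z_upper:.4f}")
print(f"P(0.40 <= p_hat <= 0.45) = {prob:.4f}")
```

Using the unrounded z of about 1.4434 gives a probability slightly above the table value of .4251, which is based on z = 1.44; the difference is only rounding.
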
Confidence Intervals
Key Point of the Lecture:
• When constructing confidence intervals for the population mean, μ:
  ◦ When the population variance σ² is known, use the Z table
  ◦ When the population variance σ² is unknown, use the t table

Properties of Point Estimators
• An estimator of a population parameter is a random variable that depends on sample information. For example, the sample mean is an estimator of the population mean.
• The value of the estimator provides an estimate of the unknown parameter of interest, e.g. the sample mean value is an estimate of the population mean value.
• The estimator is a random variable because it takes different values in different samples, whereas a specific value of the estimator is called an estimate of the population parameter.

Point and Interval Estimates
• A point estimate is a single number.
• A confidence interval provides additional information about the precision or variability of the estimate, or equivalently, about our confidence in it.
[Figure: a line running from the lower confidence limit through the point estimate to the upper confidence limit; the distance between the limits is the width of the confidence interval]

Revision: Point Estimates
We can estimate a population parameter with a sample statistic (a point estimate):

                Population parameter    Sample statistic
Mean            μ                       x̄
Proportion      P                       p̂

Unbiasedness
• A point estimator θ̂ is said to be an unbiased estimator of the parameter θ if its expected value is equal to that parameter:
$$E(\hat{\theta}) = \theta$$
• Examples:
  ◦ The sample mean x̄ is an unbiased estimator of μ
  ◦ The sample variance s² is an unbiased estimator of σ²
  ◦ The sample proportion p̂ is an unbiased estimator of P

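Unbiasedness
(continued)
• θ̂1 is an unbiased estimator and θ̂2 is a biased estimator of the population parameter θ
[Figure: sampling distributions of θ̂1 and θ̂2; the distribution of θ̂1 is centred on θ, the distribution of θ̂2 is not]
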
Bias
• Let θ̂ be an estimator of θ
• The bias in θ̂ is defined as the difference between its mean and θ:
$$Bias(\hat{\theta}) = E(\hat{\theta}) - \theta$$
• The bias of an unbiased estimator is 0

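To see bias in a case that can be computed exactly, the sketch below compares two variance estimators over every possible sample from a tiny population. The three-value population, the sample size of 2, and the function name `variance` are assumptions made purely for illustration.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy population (an assumption for illustration only).
population = [1, 2, 3]
n = 2  # sample size, drawn with replacement

mu = Fraction(sum(population), len(population))
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)


def variance(sample, divisor):
    """Sum of squared deviations from the sample mean, over `divisor`."""
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample) / divisor


# All equally likely ordered samples of size n.
samples = list(product(population, repeat=n))

# Expected value of each estimator, averaged over every possible sample.
expected_s2 = sum(variance(s, n - 1) for s in samples) / len(samples)
expected_biased = sum(variance(s, n) for s in samples) / len(samples)

# Bias = E(estimator) - true parameter value.
bias_s2 = expected_s2 - sigma2
bias_biased = expected_biased - sigma2

print(f"population variance            = {sigma2}")
print(f"E[s^2] with divisor n - 1      = {expected_s2}  (bias {bias_s2})")
print(f"E[estimator] with divisor n    = {expected_biased}  (bias {bias_biased})")
```

In this toy case the divisor n - 1 gives zero bias, while dividing by n underestimates the population variance on average.
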
Most Efficient Estimator
• Suppose there are several unbiased estimators of θ
• The most efficient estimator, or minimum variance unbiased estimator, of θ is the unbiased estimator with the smallest variance
• Let θ̂1 and θ̂2 be two unbiased estimators of θ, based on the same number of sample observations. Then,
  ◦ θ̂1 is said to be more efficient than θ̂2 if $Var(\hat{\theta}_1) < Var(\hat{\theta}_2)$
  ◦ The relative efficiency of θ̂1 with respect to θ̂2 is the ratio of their variances:
$$\text{Relative Efficiency} = \frac{Var(\hat{\theta}_2)}{Var(\hat{\theta}_1)}$$

Confidence Interval Estimation
• How much uncertainty is associated with a point estimate of a population parameter using a sample of data?
• An interval estimate tells us about the expected variability in our point estimate in different samples, so it informs us about how precise our estimate is, and thus how confident we are about our point estimate.
• Such interval estimates are called confidence interval estimates.
• The most common confidence interval is the confidence interval for the population mean.

Confidence Interval Estimate
• An interval gives a range of values which:
  ◦ Takes into consideration variation in sample statistics from sample to sample
  ◦ Is nevertheless based on observations from one sample
  ◦ Gives information about how precise our estimate is, or how close our estimate can be expected to be to the unknown population parameter
  ◦ Is stated in terms of a level of confidence
  ◦ Any given finite interval will never yield 100% confidence.

Confidence Interval and Confidence Level
• If P(a < θ < b) = 1 - α, then the interval from a to b is called a 100(1 - α)% confidence interval of θ, where α is the probability that an interval constructed this way does not contain θ.
• The quantity 100(1 - α)% is called the confidence level of the interval
  ◦ α is between 0 and 1
  ◦ In repeated samples of the population, the true value of the parameter θ would be contained in 100(1 - α)% of intervals calculated this way.
  ◦ The confidence interval calculated in this manner is written as a < θ < b with 100(1 - α)% confidence

Estimation Process
[Figure: a random sample with mean X̄ = 50 is drawn from a population whose mean μ is unknown, leading to the statement "I am 95% confident that μ is between 40 & 60."]

Confidence Level, (1 - α)
(continued)
• Suppose confidence level = 95%
• Also written (1 - α) = 0.95
• A relative frequency interpretation:
  ◦ From repeated samples, 95% of all the confidence intervals that can be constructed from samples of size n will contain the unknown true parameter
• A specific interval either will contain or will not contain the true parameter

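The relative frequency interpretation can be illustrated by simulation. The sketch below is an illustration under assumed values (a normal population with μ = 50 and σ = 10, samples of size 25, 2000 repetitions, σ treated as known); none of these numbers come from the lecture.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population and sample size (assumptions for illustration).
mu, sigma, n = 50.0, 10.0, 25

# 95% interval with known sigma: sample mean +/- z_{0.025} * sigma / sqrt(n)
z = NormalDist().inv_cdf(0.975)
margin = z * sigma / sqrt(n)


def interval_covers_mu() -> bool:
    """Draw one sample, build its 95% interval, and check whether mu is inside."""
    xbar = fmean(random.gauss(mu, sigma) for _ in range(n))
    return xbar - margin < mu < xbar + margin


trials = 2000
coverage = sum(interval_covers_mu() for _ in range(trials)) / trials
print(f"share of intervals containing mu: {coverage:.3f}")
```

Each individual interval either contains μ or it does not; the 95% describes the long-run share of such intervals that do, which the simulated coverage should approximate.
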
General Formula
• The general form for all confidence intervals is:
$$\hat{\theta} \pm ME$$
Point Estimate ± Margin of Error
• The value of the margin of error depends on the desired level of confidence

Confidence Intervals
Confidence intervals for:
• Population mean
  ◦ σ² known
  ◦ σ² unknown
• Population proportion
• Population variance
(From normally distributed populations)

7.2 Confidence Interval Estimation for the Mean (σ² Known)
• Assumptions
  ◦ Population variance σ² is known
  ◦ Population is normally distributed
  ◦ If population is not normal, use large sample
• Confidence interval estimate:
$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
(where $z_{\alpha/2}$ is the standard normal distribution value for a probability of α/2 in each tail)

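Confidence Limits
• The confidence interval is
$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
• The endpoints of the interval are
$$UCL = \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \quad \text{(upper confidence limit)}$$
$$LCL = \bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \quad \text{(lower confidence limit)}$$
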
Margin of Error
• The confidence interval,
$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
• can also be written as $\bar{x} \pm ME$, where ME is called the margin of error:
$$ME = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
• The interval width, w, is equal to twice the margin of error

Reducing the Margin of Error
$$ME = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
The margin of error can be reduced if
• the population standard deviation can be reduced (σ↓)
• the sample size is increased (n↑)
• the confidence level is decreased, (1 - α)↓

Finding $z_{\alpha/2}$
• Consider a 95% confidence interval: 1 - α = .95, so α/2 = .025 in each tail
[Figure: standard normal curve with area .025 in each tail; Z units: z = -1.96, 0, z = 1.96; X units: lower confidence limit, point estimate, upper confidence limit]
• Find $z_{.025} = 1.96$ from the standard normal distribution table

Common Levels of Confidence
• Commonly used confidence levels are 90%, 95%, and 99%

Confidence Level    Confidence Coefficient, 1 - α    z_{α/2} value
90%                 .90                              1.645
95%                 .95                              1.96
99%                 .99                              2.58

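The table values can be reproduced from the inverse of the standard normal CDF. The sketch below uses `statistics.NormalDist` from the standard library; the helper name `z_critical` is hypothetical.

```python
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """z value leaving alpha/2 in each tail, where alpha = 1 - confidence."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```

To three decimals the 99% value is 2.576, which the table above rounds to 2.58.
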
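Intervals and Level of Confidence
[Figure: sampling distribution of the mean, centred on μ_x̄ = μ, with area 1 - α in the middle and α/2 in each tail; below it, confidence intervals built around different sample means x̄1, x̄2, ...]
• Intervals extend from
$$LCL = \bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \quad\text{to}\quad UCL = \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
• 100(1 - α)% of intervals constructed contain μ; 100(α)% do not.
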
Example
• A sample of 11 circuits from a large normal population has a mean resistance of 2.20 ohms. We know from past testing that the population standard deviation is 0.35 ohms.
• Determine a 95% confidence interval for the true mean resistance of the population.

Example
(continued)
• A sample of 11 circuits from a large normal population has a mean resistance of 2.20 ohms. We know from past testing that the population standard deviation is 0.35 ohms.
• Solution:
$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 2.20 \pm 1.96\,(0.35/\sqrt{11}) = 2.20 \pm 0.2068$$
$$1.9932 < \mu < 2.4068$$

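The circuit example can be checked with a short function. This is a minimal sketch assuming a known population standard deviation and a normal population, as on the slide; the name `z_confidence_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(xbar, sigma, n, confidence=0.95):
    """Confidence interval for the mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / sqrt(n)  # margin of error, ME
    return xbar - margin, xbar + margin


# Values from the circuit example: n = 11, sample mean 2.20, sigma = 0.35.
lcl, ucl = z_confidence_interval(xbar=2.20, sigma=0.35, n=11)
print(f"95% CI for the mean resistance: {lcl:.4f} to {ucl:.4f} ohms")
```

The endpoints agree with the slide's 1.9932 and 2.4068 to four decimals, even though the code uses the unrounded z value rather than 1.96.
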
Interpretation
• We are 95% confident that the true mean resistance is between 1.9932 and 2.4068 ohms
• Although the true mean may or may not be in this interval, 95% of intervals formed in this manner will contain the true mean

7.3 Confidence Interval Estimation for the Mean (σ² Unknown)
Confidence intervals for:
• Population mean
  ◦ σ² known
  ◦ σ² unknown
• Population proportion
• Population variance
(From normally distributed populations)

Student’s t Distribution
• Consider a random sample of n observations
  ◦ with mean x̄ and standard deviation s
  ◦ from a normally distributed population with mean μ
• Then the variable
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
follows the Student’s t distribution with (n - 1) degrees of freedom

Student’s t Distribution
• The t is a family of distributions
• The t value depends on degrees of freedom (d.f.)
  ◦ Number of observations that are free to vary after the sample mean has been calculated
d.f. = n - 1

Student’s t Distribution
Note: t → Z as n increases
• t-distributions are bell-shaped and symmetric, but have ‘fatter’ tails than the normal
[Figure: density curves centred on 0 for t (df = 5), t (df = 13), and the standard normal (t with df = ∞); the tails get thinner as df increases]

Student’s t Table

        Upper Tail Area
df      .10      .05      .025
1       3.078    6.314    12.706
2       1.886    2.920    4.303
3       1.638    2.353    3.182

Example: let n = 3, so df = n - 1 = 2. With α = .10, α/2 = .05, and the table gives t = 2.920.
The body of the table contains t values, not probabilities.
[Figure: t distribution with area α/2 = .05 in the upper tail beyond t = 2.920]

t distribution values
With comparison to the Z value

Confidence Level    t (10 d.f.)    t (20 d.f.)    t (30 d.f.)    Z
.80                 1.372          1.325          1.310          1.282
.90                 1.812          1.725          1.697          1.645
.95                 2.228          2.086          2.042          1.960
.99                 3.169          2.845          2.750          2.576

Note: t → Z as n increases

Confidence Interval Estimation for the Mean (σ² Unknown)
• If the population standard deviation σ is unknown, we can substitute the sample standard deviation, s
• This introduces extra uncertainty, since s is variable from sample to sample
• So we use the t distribution instead of the normal distribution

Confidence Interval Estimation for the Mean (σ² Unknown)
(continued)
• Assumptions
  ◦ Population standard deviation is unknown
  ◦ Population is normally distributed
  ◦ If population is not normal, use large sample
• Use Student’s t Distribution
• Confidence Interval Estimate:
$$\bar{x} \pm t_{n-1,\alpha/2}\frac{s}{\sqrt{n}}$$
where $t_{n-1,\alpha/2}$ is the critical value of the t distribution with n - 1 d.f. and an area of α/2 in each tail:
$$P(t_{n-1} > t_{n-1,\alpha/2}) = \alpha/2$$

Margin of Error
• The confidence interval,
$$\bar{x} \pm t_{n-1,\alpha/2}\frac{s}{\sqrt{n}}$$
• can also be written as $\bar{x} \pm ME$, where ME is called the margin of error:
$$ME = t_{n-1,\alpha/2}\frac{s}{\sqrt{n}}$$

Example
A random sample of n = 25 has x̄ = 50 and s = 8. Form a 95% confidence interval for μ.
• d.f. = n - 1 = 24, so $t_{n-1,\alpha/2} = t_{24,.025} = 2.0639$
The confidence interval is
$$\bar{x} \pm t_{n-1,\alpha/2}\frac{s}{\sqrt{n}} = 50 \pm (2.0639)\frac{8}{\sqrt{25}}$$
$$46.698 < \mu < 53.302$$

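The arithmetic of this example can be sketched as follows. The Python standard library has no t distribution, so the critical value 2.0639 is taken from the slide (as if read from the t table) rather than computed; the function name `t_confidence_interval` is hypothetical.

```python
from math import sqrt


def t_confidence_interval(xbar, s, n, t_crit):
    """Confidence interval for the mean when sigma is unknown.

    t_crit is the critical value t_{n-1, alpha/2}, supplied by the caller
    (for example, from a printed t table).
    """
    margin = t_crit * s / sqrt(n)  # margin of error, ME
    return xbar - margin, xbar + margin


# Critical value for 24 d.f. and alpha/2 = .025, as quoted in the example.
T_24_025 = 2.0639

lcl, ucl = t_confidence_interval(xbar=50, s=8, n=25, t_crit=T_24_025)
print(f"95% CI for the mean: {lcl:.3f} to {ucl:.3f}")
```

Passing the critical value in as an argument keeps the function usable for any degrees of freedom and confidence level, provided the matching table value is looked up first.
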
Lecture Summary
• Introduced sampling distributions
• Described the sampling distribution of sample means
  ◦ For normal populations
  ◦ Using the Central Limit Theorem
• Described the sampling distribution of sample proportions
• Calculated probabilities using sampling distributions
• Formulated confidence intervals for the population mean from a sample of data, using Z or t distribution critical values
