
Chapter 8 Estimation


College of Health and Medical Science

Department of Epidemiology and Biostatistics

Statistical Estimation Techniques


Hamdi Fekredin (BSc, MPH)

October, 2024

Estimation techniques
Learning objectives
Upon completion of the session, students will be able to:
Identify the different estimation techniques in one-sample situations
Estimate the sample size for a cross-sectional study
Introduction
In the real world, the values of population parameters are fixed
and usually not known.
Instead, we must try to say something about the way in which
a variable is distributed using the information contained in a
sample of observations.
The process of drawing conclusions about an entire population
based on the data in a sample is known as statistical inference.
Two broad categories: Estimation and Hypothesis testing.

Estimation
Is concerned with estimating the values of specific population
parameters based on sample statistics.
Is about using information in a sample to make estimates of the
characteristics (parameters) of the source population.

Examples: a sample survey reveals:
 The proportion of smokers in a population group aged 15 to 24.
 The mean systolic blood pressure (SBP) in the sampled population.

The next question is what we can infer about the characteristics of the population from which the sample was drawn.
Estimation, Estimator & Estimate
♣ Estimation is the computation of a statistic from sample data,
often yielding a value that is an approximation (guess) of its
target, an unknown true population parameter value.

♣ The statistic itself is called an estimator and can be of two types: point or interval.

♣ The value or values that the estimator assumes are called estimates.
Two methods of estimation are commonly used:
point estimation and interval estimation

Point estimation involves the calculation of a single number to estimate the population parameter.
Interval estimation specifies a range of reasonable values for the parameter.
Point versus Interval Estimators
♣ An estimator that represents a "single best guess" is called a
point estimator.

♣ When the estimate is of the form of a "range of plausible values", it is called an interval estimator.

 Thus,
 A point estimate is of the form: [ value ],
 Whereas an interval estimate is of the form: [ lower limit, upper limit ]
The sample mean (x̄) is an unbiased estimator of the population mean (µ).
Estimating the Sampling Error

 Any estimate derived from a sample is subject to sampling error.
 This comes from the fact that only a part of the population was observed, instead of the whole.
 A different sample could have produced different results.
 The amount of variation that exists among the estimates from the different possible samples is the sampling error.
 The set of sample means in repeated random samples of size n from a given population has variance σ²/n.
 The standard deviation of this set of sample means is σ/√n and is referred to as the standard error of the mean (SEM) or the standard error.
 The SEM is estimated by s/√n if σ is unknown.
 The sampling error depends on the sample size (n) and the variability of individual sample points (σ).
 As n increases, the sample mean (x̄) and the sample variance s² approach the values of the true population parameters, µ and σ², respectively.
Example
 Suppose that the mean ± sd of DBP on 20 old males is 78.5 ± 10.3 mm
Hg.

1. What is our best estimate of µ ?

Our best estimate of µ is 78.5

2. What is the SEM ?


The sem of this estimate is 10.3/√20 = 2.3

3. Compare the SEM with the sd.


The sem (2.3) is much smaller than sd (10.3).
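The SEM arithmetic in this example can be checked with a short sketch. The function name `standard_error` is only an illustrative choice, not something defined in the lecture.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean: sample sd divided by sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return sd / math.sqrt(n)


# DBP example from the slide: sd = 10.3 mm Hg, n = 20
sem = standard_error(10.3, 20)
print(f"SEM = {sem:.1f} mm Hg")
```

Because the divisor is √n, quadrupling the sample size halves the SEM, while the sd itself does not shrink with n.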
1. Point Estimate
 A single numerical value used to estimate the corresponding
population parameter.
Sample statistics are estimators of population parameters:

Sample statistic                      Population parameter
Sample mean, x̄                        µ
Sample variance, s²                   σ²
Sample proportion, p                  P or π
Sample odds ratio, OŔ                 OR
Sample relative risk, RŔ              RR
Sample correlation coefficient, r     ρ
2. Interval Estimation
 Interval estimation specifies a range of reasonable values for
the population parameter based on a point estimate.
 A confidence interval is a particular type of interval estimator.

Confidence Intervals
 Give a plausible range of values likely to include the “true” (population) value with a given confidence level.
 An interval estimate provides more information about a population characteristic than does a point estimate.
 CIs also give information about the precision of an estimate.
 When sampling variability is high, the CI will be wide to reflect the uncertainty of the observation.
 Wider CIs indicate less certainty.
 CIs can also answer the question of whether or not an association exists (analogous to p-values).
 Narrow CIs reflect a large sample size, low variability, or both.
General Formula:
The general formula for all CIs is:

point estimate ± (measure of how confident we want to be) × (standard error)

 Point estimate: the value of the statistic in the sample (e.g., mean, proportion).
 Measure of confidence: taken from a Z table or a t table, depending on the sampling distribution of the statistic.
A confidence interval has 3 components:

1) A point estimate (e.g. the sample mean)

2) The standard error of the point estimate ( e.g. SEM =σ/√ n )

3) A confidence coefficient (conf. coeff)


Lower limit = Point Estimate - (Critical Value, i.e. confidence coefficient) x (Standard Error)
Upper limit = Point Estimate + (Critical Value, i.e. confidence coefficient) x (Standard Error)
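The lower and upper limit formulas can be written as one small helper. This is a minimal sketch: the function name and the numbers in the usage line (point estimate 100, critical value 1.96, standard error 5) are hypothetical and chosen only for illustration.

```python
def confidence_interval(point_estimate: float,
                        critical_value: float,
                        standard_error: float) -> tuple[float, float]:
    """Return (lower, upper) = point estimate -/+ critical value x standard error."""
    margin = critical_value * standard_error
    return point_estimate - margin, point_estimate + margin


# Hypothetical numbers, for illustration only
lower, upper = confidence_interval(100.0, 1.96, 5.0)
print(f"({lower:.1f}, {upper:.1f})")
```

The same helper works for means and proportions. Only the way the standard error and the critical value are obtained changes.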
Confidence Level
 Confidence Level:

 The confidence with which the interval will contain the unknown population parameter
 A percentage (less than 100%)

Example: 95%
 Also written (1 - α) = .95

Definition of 95% CI
1. Probabilistic interpretation:
 If all possible random samples of a given sample size were obtained
and if each were used to obtain its own CI, then 95% of all such CIs
would contain the unknown population parameter; the remaining 5%
would not.

2. Practical interpretation
 When sampling is from a normally distributed population with known standard deviation, we are 100(1-α)% [e.g., 95%] confident that the single computed interval contains the unknown population parameter.
Estimation for Single Population

1. CI for a Population Mean (normally distributed)

A. Known variance (large sample size)

Consider the task of computing a CI estimate of μ for a population distribution that is normal with σ known.
 Available are data from a random sample of size n.
Assumptions
 Population standard deviation (σ) is known
 Population is normally distributed
 If the population is not normal, use a large sample

A 100(1-α)% C.I. for µ is:

x̄ ± z_{α/2} × σ/√n

 α is to be chosen by the researcher; the most common values of α are 0.1, 0.05 and 0.01.
3. Commonly used CLs are 90%, 95%, and 99%

Finding the Critical Value
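The critical value for a two-sided CI at a given confidence level is the standard normal quantile at 1 - α/2. As a hedged sketch, it can be computed with Python's standard-library `statistics.NormalDist` instead of a Z table. The helper name `z_critical` is an assumption made for this example.

```python
from statistics import NormalDist


def z_critical(confidence_level: float) -> float:
    """Two-sided critical value z_{alpha/2} for a standard normal distribution."""
    if not 0.0 < confidence_level < 1.0:
        raise ValueError("confidence_level must be strictly between 0 and 1")
    alpha = 1.0 - confidence_level
    # Quantile that leaves alpha/2 in the upper tail
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence level: z = {z_critical(level):.3f}")
```

For the commonly used levels this gives approximately 1.645, 1.960 and 2.576, the values usually read from a Z table.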

Margin of Error
(Precision of the estimate)

Factors Affecting Margin of Error

The length of the CI for the mean (twice the margin of error) is determined by n, s, and α.
As n increases, the length of the CI decreases.
As s increases, the length of the CI increases.
As the confidence level increases (α decreases), the length of the CI increases.
Example:
1. Waiting times (in hours) at a particular hospital are believed to be approximately normally distributed with a variance of 2.25 hr².

a. A sample of 20 outpatients revealed a mean waiting time of 1.52 hours. Construct the 95% CI for the estimate of the population mean.

b. Suppose that the mean of 1.52 hours had resulted from a sample of 32 patients. Find the 95% CI.

c. What effect does a larger sample size have on the CI?


a. 1.52 ± 1.96 × √(2.25/20) = 1.52 ± 1.96(0.33) = 1.52 ± 0.65 = (0.87, 2.17)

 We are 95% confident that the true mean waiting time is between 0.87 and 2.17 hrs.
 95% of the intervals formed in this manner will contain the true mean.
b. 1.52 ± 1.96 × √(2.25/32) = 1.52 ± 1.96(0.27) = 1.52 ± 0.53 = (0.99, 2.05)

c. A larger sample size makes the CI narrower (more precise).
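The waiting-time example can be reproduced with a short sketch, assuming a known variance of 2.25 and the usual z = 1.96 for 95% confidence. The function name is illustrative. The slide rounds the standard error to two decimals before multiplying, so its limits differ from unrounded arithmetic in the second decimal place.

```python
import math


def z_interval_known_variance(mean: float, variance: float, n: int,
                              z: float = 1.96) -> tuple[float, float]:
    """CI for a mean when the population variance is known."""
    se = math.sqrt(variance / n)   # standard error of the mean
    margin = z * se                # margin of error
    return mean - margin, mean + margin


# Parts (a) and (b): same mean and variance, different sample sizes
for n in (20, 32):
    low, high = z_interval_known_variance(1.52, 2.25, n)
    print(f"n = {n}: ({low:.2f}, {high:.2f}), width = {high - low:.2f}")
```

Without intermediate rounding the intervals are roughly (0.86, 2.18) for n = 20 and (1.00, 2.04) for n = 32. The interval for n = 32 is visibly narrower, which is the point of part (c).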

 When constructing CIs, it has been assumed that the standard deviation of the underlying population, σ, is known.
 What if σ is not known?
 In this case, the population SD can be replaced by the sample SD if the sample size is large enough (n > 30). With a large sample size, we assume a normal distribution for the sample mean.
 Example: A sample of 35 patients were, on average, 17.2 minutes late for appointments, with an SD of 8 minutes. What is the 90% CI for µ? Ans: (14.98, 19.42).
 Since the sample size is fairly large (>30) and the population SD is unknown, we assume the distribution of the sample mean to be normal based on the CLT and use the sample SD in place of the population σ.
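The stated answer can be verified with a brief sketch that uses the sample SD in place of σ and takes the 90% critical value from the standard normal distribution. The function name `large_sample_ci` is an assumption made for this example.

```python
import math
from statistics import NormalDist


def large_sample_ci(mean: float, sample_sd: float, n: int,
                    confidence: float = 0.90) -> tuple[float, float]:
    """Large-sample CI for a mean, with the sample SD standing in for sigma."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    margin = z * sample_sd / math.sqrt(n)
    return mean - margin, mean + margin


# 35 patients, mean lateness 17.2 minutes, SD 8 minutes
low, high = large_sample_ci(17.2, 8.0, 35, confidence=0.90)
print(f"90% CI: ({low:.2f}, {high:.2f})")
```

The result agrees with the slide's answer of (14.98, 19.42) to two decimals.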

B. Unknown variance
(small sample size, n ≤ 30)
 What if the σ for the underlying population is unknown and the sample size is small?

 As an alternative we use Student’s t distribution.
Student’s t Distribution
 The t is a family of continuous probability distributions

 Bell Shaped

 Symmetric about zero (the mean)

 Flatter than the Normal(0,1). This means:
The variability of a t is greater than that of a Z that is Normal(0,1).
Thus, there is more area under the tails and less at the center.
Because variability is greater, resulting confidence intervals will be wider.
• Note: t approaches z as n increases
Student’s t Table

t distribution values
 With comparison to the Z value

Example

 Standard error =

 t-value at 90% CI at 19 df =1.729
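The data for this example did not survive on the slide, so the sketch below borrows the DBP summary from the earlier example (mean 78.5, sd 10.3, n = 20, hence 19 df) purely as an illustration. That pairing is an assumption, not the slide's own calculation. The critical value 1.729 is the tabled t value quoted above, and the function name is hypothetical.

```python
import math


def t_interval(mean: float, sample_sd: float, n: int,
               t_critical: float) -> tuple[float, float]:
    """CI for a mean with unknown sigma; t_critical is read from a t table at n - 1 df."""
    se = sample_sd / math.sqrt(n)   # estimated standard error, s / sqrt(n)
    margin = t_critical * se
    return mean - margin, mean + margin


# Assumed illustration: DBP summary from the earlier example, t = 1.729 at 19 df
low, high = t_interval(78.5, 10.3, 20, t_critical=1.729)
print(f"90% CI: ({low:.1f}, {high:.1f})")
```

Using z = 1.645 instead of t = 1.729 would give a slightly narrower interval, which illustrates why t-based intervals are wider for small samples.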

2. CIs for population proportion, p

The CI is based on three elements:
Point estimate
SE of the point estimate
Confidence coefficient
Lower limit = Point Estimate - (Critical Value) x (Standard
Error of Estimate)
Upper limit = Point Estimate + (Critical Value) x (Standard
Error of Estimate)

Hence,

p̂ ± 1.96 × √( p̂(1 − p̂)/n )

is an approximate 95% CI for the true proportion p, where p̂ is the sample proportion.
Example 1
 A random sample of 100 people shows that 25 are left-handed. Form a 95% CI for the true proportion of left-handers.

Interpretation

Example
 It was found that 28.1% of 153 cervical-cancer cases had never had a Pap smear prior to the time of diagnosis. Calculate a 95% CI for the percentage of cervical-cancer cases who never had a Pap test.
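Both proportion examples (25 left-handers out of 100, and 28.1% of 153 cervical-cancer cases) can be worked with the normal-approximation interval p̂ ± z × √(p̂(1 − p̂)/n). This is a minimal sketch, assuming z = 1.96 for 95% confidence. The function name `proportion_ci` is illustrative.

```python
import math


def proportion_ci(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation CI for a population proportion."""
    se = math.sqrt(p_hat * (1.0 - p_hat) / n)   # standard error of p_hat
    margin = z * se
    return p_hat - margin, p_hat + margin


examples = [
    ("left-handed", 25 / 100, 100),
    ("never had a Pap smear", 0.281, 153),
]
for label, p_hat, n in examples:
    low, high = proportion_ci(p_hat, n)
    print(f"{label}: ({low:.1%}, {high:.1%})")
```

Under these assumptions the intervals are roughly 16.5% to 33.5% for left-handedness and 21.0% to 35.2% for cases with no prior Pap smear.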

Sample size Determination
Too small a sample size:
May fail to detect an important effect
Estimates of effect may be too imprecise (wide CIs)
Too large a sample size:
May result in wasted resources.
As a rough guide, making generalizations about an entire population typically calls for a total sample size of about 200-400.
Confidence interval approach
 Given the confidence interval
mean (proportion) ± z_{α/2} × s.e
 Hence the absolute precision (margin of error), denoted by d, is given as
d = z_{α/2} × s.e
 where s.e is the standard error of the estimator of the parameter of interest.
Steps to determine sample size:
1. Specify tolerable error (i.e., desired precision and confidence level via d and α)
2. Identify the appropriate equation relating tolerable error (d, α) to sample size (n)
3. Estimate unknown quantities in the equation
4. Solve for n
5. Evaluate (and return to the first step if necessary)

The sample size calculation should relate to the study’s outcome variable.
Estimating a single population
mean/proportion

Examples
1. A survey is being planned to determine what proportion of families in a certain area are medically indigent. Previous studies suggest that the proportion is 0.35. A 95% confidence interval is desired with d = 5%. What size sample of families should be selected?
2. Suppose that you are interested in the proportion of infants who are breastfed beyond 18 months of age in a rural area. In a similar area, the proportion (p) of breastfed infants was found to be 0.20. What sample size is required to estimate the true proportion within ±3 percentage points with 95% confidence? Let p = 0.20, d = 0.03, α = 5%.
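Both questions can be answered with the single-proportion formula n = z² p(1 − p)/d², obtained by solving d = z × s.e for n. This is a hedged sketch, assuming z = 1.96 for 95% confidence. The function name is illustrative, and rounding up to the next whole unit is a common convention rather than something the slide states.

```python
import math


def sample_size_proportion(p: float, d: float, z: float = 1.96) -> float:
    """Unrounded sample size for estimating a proportion within +/- d."""
    return (z ** 2) * p * (1.0 - p) / (d ** 2)


examples = [
    ("medically indigent families", 0.35, 0.05),
    ("infants breastfed beyond 18 months", 0.20, 0.03),
]
for label, p, d in examples:
    n = sample_size_proportion(p, d)
    print(f"{label}: n = {n:.1f}, rounded up to {math.ceil(n)}")
```

Under these assumptions the first survey needs about 350 families and the second about 683 infants.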

Example
3. Suppose that for a certain group of cancer patients, we are interested in estimating the mean age at diagnosis. We would like a 95% CI with a margin of error of 2 years.

If the population variance is 124 years² (SD ≈ 11.1 years), how large should our sample be?

n = (1.96² × 124) / 2² ≈ 119
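The arithmetic in this example plugs 124 in as σ² (the variance) in n = z² σ²/d². The sketch below follows that reading, with z = 1.96 and d = 2. The function name is an assumption made for illustration.

```python
def sample_size_mean(variance: float, d: float, z: float = 1.96) -> float:
    """Unrounded sample size for estimating a mean within +/- d, given the variance."""
    return (z ** 2) * variance / (d ** 2)


n = sample_size_mean(124.0, 2.0)
print(f"n = {n:.2f}")
```

The unrounded value is about 119.09, which the slide reports as 119. Many texts round sample sizes up, which would give 120.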

Suppose there is no prior information about the proportion (p) who breastfeed.

For a fixed absolute precision (d), the required sample size increases as p increases from 0 to 0.5, and then decreases in the same way as the prevalence approaches 1. Taking p = 0.5 therefore gives the largest (most conservative) sample size.
 An estimate of p is not always available.
 However, the formula may also be used for sample size calculation based on various assumptions for the values of p.
P = 0.1 ⇒ n = (1.96)²(0.1)(0.9)/(0.05)² = 138
P = 0.2 ⇒ n = (1.96)²(0.2)(0.8)/(0.05)² = 246
P = 0.3 ⇒ n = (1.96)²(0.3)(0.7)/(0.05)² = 323
P = 0.5 ⇒ n = (1.96)²(0.5)(0.5)/(0.05)² = 384
P = 0.7 ⇒ n = (1.96)²(0.7)(0.3)/(0.05)² = 323
P = 0.8 ⇒ n = (1.96)²(0.8)(0.2)/(0.05)² = 246
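The table can be regenerated with a few lines, which also makes the symmetry around p = 0.5 easy to see. This sketch assumes d = 0.05 and z = 1.96 as in the table, and rounds to the nearest integer to match the values shown. The function name is illustrative.

```python
def sample_size_proportion(p: float, d: float = 0.05, z: float = 1.96) -> float:
    """Unrounded sample size for estimating a proportion within +/- d."""
    return (z ** 2) * p * (1.0 - p) / (d ** 2)


assumed_p = (0.1, 0.2, 0.3, 0.5, 0.7, 0.8)
sizes = [round(sample_size_proportion(p)) for p in assumed_p]
for p, n in zip(assumed_p, sizes):
    print(f"p = {p:.1f}: n = {n}")
```

Because p(1 − p) is largest at p = 0.5, that choice yields the maximum n for a given d, which is why it is the usual fallback when no estimate of p is available.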
Some Considerations

Using design effect
 The loss of effectiveness from using cluster sampling instead of simple random sampling is the design effect.
 The design effect is the ratio of the actual variance, under the sampling method actually used, to the variance computed under the assumption of simple random sampling.
Using design effect cont.…
 When simple or systematic random sampling is used, the design effect is one.
 When cluster sampling is used, the design effect is commonly taken to be two.
 When multistage sampling is used, the design effect is commonly taken to be equal to the number of stages.
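In practice the design effect is usually applied by multiplying the sample size computed under simple random sampling by it. The slides do not spell this step out, so the sketch below rests on that standard convention. The function name and the starting size of 350 are hypothetical illustrations.

```python
import math


def adjust_for_design_effect(n_srs: float, design_effect: float) -> int:
    """Inflate a simple-random-sampling sample size by the design effect."""
    if design_effect < 1:
        raise ValueError("design effect is assumed to be at least 1 here")
    return math.ceil(n_srs * design_effect)


# Hypothetical: 350 units needed under simple random sampling
n_srs = 350
for label, deff in [("simple random", 1), ("cluster", 2), ("three-stage", 3)]:
    print(f"{label} sampling: n = {adjust_for_design_effect(n_srs, deff)}")
```

With the rule-of-thumb values above, a cluster design doubles the required sample and a three-stage design triples it. A design effect estimated from previous survey data is preferable when one is available.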
Quiz

1. List and explain at least two types of probability sampling


2. State the central limit theorem
3. Differentiate between point and interval estimation
