
Estimation

Uploaded by

Filimon Cheneke
Statistical Estimation

Desta M. (MPH)

Objective
• To know methods and principles of drawing
conclusions about a larger group (or population)
based on samples taken from that population

Introduction
• Descriptive statistics help investigators describe and summarize data.

• Probability and sampling distribution concepts are needed to evaluate data using statistical methods.

• Inferential statistics are the statistical methods used to draw conclusions from a sample and make inferences about the entire population.

• The two primary methods for making inferences are estimation and hypothesis testing.

Statistical Estimation
• Estimation is the process of determining a likely value for a variable in the survey population, based on information collected from the sample.

• Estimation is the use of sample statistics to estimate population parameters.

• For example, a sample survey could be used to produce any of the following statistics:
– estimates of the proportion of smokers among all people aged 15 to 24 in the population;
– the mean level of a certain enzyme among healthy men.

Parameter Estimations
• Population parameter: a numerical characteristic (usually unknown) of the distribution of the variable of interest in a population, such as the mean μ or the proportion π.

• Sample statistic: an estimate of the population parameter obtained from a sample.

Example:
• A sample survey revealed:
― Proportion of smokers among a certain group of
population aged 15 to 24.
― Mean of SBP among sampled population
― Prevalence of HIV among people involved in the
study

→ The next question is what we can predict about the characteristics of the population from which the sample was drawn.

Types of Estimates
• Point estimation: a single numerical value is used to estimate the corresponding population parameter.
– x̄ is an estimator of the population mean μ.
– s is an estimator of the population standard deviation σ.
– p is an estimator of the population proportion π.

• The point estimate always lies within the interval estimate.

Point Estimation …
• From a single sample we can calculate a sample statistic to estimate a single parameter (a point estimate).
• The point estimate for the population mean µ is

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

• The point estimate for the population proportion π is given by

$p = \frac{x}{n}$

• where x is the total number of successes (events).

Mean … Example
• A SRS of 16 apparently healthy subjects yielded the following values of urine excreted (mg per day):
0.007, 0.03, 0.025, 0.008, 0.03, 0.038, 0.007,
0.005, 0.032, 0.04, 0.009, 0.014, 0.011, 0.022,
0.009, 0.008

• Compute the point estimate of the population mean.
• If x_1, x_2, ..., x_n are the n observed values, then

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{0.295}{16} \approx 0.01844$

Proportion … Example
• In a survey of 300 automobile drivers in one city, 123 reported that they wear seat belts regularly.

• Estimate the seat belt use rate in the city.

• Answer: p = 123/300 = 0.41 = 41%

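The two point estimates above can be checked with a few lines of Python. This is only a sketch using the standard library; the variable names (`urine_values`, `belted`, `drivers`) are illustrative and are not part of the original slides.

```python
from statistics import mean

# The 16 observed values from the mean example (mg per day).
urine_values = [
    0.007, 0.03, 0.025, 0.008, 0.03, 0.038, 0.007, 0.005,
    0.032, 0.04, 0.009, 0.014, 0.011, 0.022, 0.009, 0.008,
]

# Point estimate of the population mean: sum of the values divided by n.
x_bar = mean(urine_values)

# Point estimate of the population proportion: successes divided by n.
belted, drivers = 123, 300
p_hat = belted / drivers

print(f"n = {len(urine_values)}, sample mean = {x_bar:.5f}")
print(f"sample proportion = {p_hat:.2f}")
```

Each result is a single number, which is exactly why it is called a point estimate: on its own it says nothing about how far it may be from the population value.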
Interval estimation
• Interval estimation is a statement that a population parameter has a value lying between two specified limits.

• The value of the sample statistic varies from sample to sample, so a single point estimate of the parameter is generally not sufficient on its own.

• We need to take into account the sample-to-sample variation of the statistic.
• A confidence interval defines an interval within which the true population parameter is likely to fall (interval estimate).

Interval estimation …

• Interval estimate (confidence interval): consists of two numbers, a lower limit and an upper limit, which serve as the bounding values within which the parameter is expected to lie with a certain degree of confidence.
• An interval estimate:
– takes into consideration the variation in sample statistics from sample to sample
– provides a range of values based on observations from one sample
– gives information about closeness to the unknown population parameter
– is stated in terms of probability
– is never 100% sure

Interval estimation …
• Two questions to put bounds on our point
estimates to reflect our level of confidence
– How wide does the bracket have to be?
– What is our tolerance of error
(variability, not mistake)?
• Scientists usually accept a 5% chance that the
range will not include the true population value
– The range or interval is called 95%
confidence interval
• However, 90% and 99% confidence intervals are sometimes used.

Factors Affecting Interval Width
• Level of confidence

• A 90% CI is narrower than a 95% CI, since we are only 90% certain that the interval includes the population parameter.

• A 99% CI is wider than a 95% CI; the extra width means that we can be more certain that the interval will contain the population parameter.

Factors Affecting Interval Width …

• To obtain higher confidence from the same sample, we must be willing to accept a larger margin of error (a wider interval).

• For a given confidence level (e.g. 90%, 95%, 99%), the width of the confidence interval depends on the standard error of the estimate, which in turn depends on:

A. Sample size: the larger the sample size, the narrower the confidence interval and the more precise our estimate.

Factors Affecting Interval Width …
• You can make the precision as high as you want by taking a large enough sample.
• The margin of error decreases as √n increases.

B. Standard deviation: the more variation among the individual values, the wider the confidence interval and the less precise the estimate.
• As the sample size increases, the standard error (SD/√n) decreases; the SD itself does not shrink with sample size.
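The three factors above (confidence level, sample size, standard deviation) can be seen numerically in a small sketch. It assumes a normal-theory margin of error z × SD/√n; the helper `margin_of_error` and the numbers fed to it are made up for illustration and do not come from the slides.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(sd, n, conf=0.95):
    """Half-width z * sd / sqrt(n) of a normal-theory interval for a mean."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # two-sided critical value
    return z * sd / sqrt(n)

# Higher confidence from the same sample gives a wider interval.
for conf in (0.90, 0.95, 0.99):
    print(f"conf = {conf:.2f}, n = 100, sd = 20 -> margin = {margin_of_error(20, 100, conf):.2f}")

# Larger samples give a narrower interval (quadrupling n halves the margin).
for n in (25, 100, 400):
    print(f"conf = 0.95, n = {n}, sd = 20 -> margin = {margin_of_error(20, n):.2f}")

# More variation among individual values gives a wider interval.
for sd in (10, 20, 40):
    print(f"conf = 0.95, n = 100, sd = {sd} -> margin = {margin_of_error(sd, 100):.2f}")
```

Because the margin scales with 1/√n, halving the margin of error requires roughly four times as many observations.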
Confidence intervals for:

• A single population mean

• A single population proportion

1) C.I. for a population mean
(normally distributed)
A) Known variance (large sample size)

• A 100(1−α)% C.I. for μ when σ is known (sampling from a normal population, or a large sample) is

$\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$

• α is chosen by the researcher; the most common values of α are 0.05, 0.01, 0.001 and 0.1.

• The 95% confidence interval is interpreted as follows: under the conditions assumed for the underlying distribution, you are 95% confident that the interval contains the true population mean μ.
Example
• A physical therapist wished to estimate, with 99% confidence, the mean maximal strength of a particular muscle in a certain group of individuals.

• He assumes that strength scores are approximately normally distributed with a variance of 144.
• A sample of 15 subjects who participated in the experiment yielded a mean of 84.3.

• Solution:
• α = 0.01 ⇒ z_{α/2} = 2.58, and σ = √144 = 12

$84.3 \pm 2.58 \times \frac{12}{\sqrt{15}} = 84.3 \pm 8.0$

⇒ We are 99% confident that the population mean is between 76.3 and 92.3.

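The muscle-strength interval can be reproduced with a short sketch. The function name `z_interval` is hypothetical, and the sample size is taken as n = 15, the value that reproduces the stated limits of 76.3 and 92.3.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, conf=0.95):
    """Confidence interval for a mean when the population SD sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 2.58 for 99%
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

# Variance 144 means sigma = 12; n = 15 is an assumption (see the note above).
lower, upper = z_interval(x_bar=84.3, sigma=sqrt(144), n=15, conf=0.99)
print(f"99% CI: ({lower:.1f}, {upper:.1f})")
```

The critical value comes from the standard normal quantile function rather than a printed table, so it is slightly more precise than the rounded 2.58 used by hand.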
Example
• Data on systolic blood pressure from 199 patients give a mean value of 125.8 mmHg. Assume that the standard deviation for this patient population is known to be 20 mmHg.

• Construct a 95 percent confidence interval for the population mean.

• Solution
• α = 0.05 ⇒ z_{α/2} = 1.96

$125.8 \pm 1.96 \times \frac{20}{\sqrt{199}}$

• The 95% CI is (123.0, 128.6) mmHg.

• We are 95% sure that the average systolic blood pressure for similar patients is between 123.0 and 128.6 mmHg.

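What "95% sure" means can be illustrated by simulation: if the study were repeated many times, about 95% of the intervals built this way would contain the true mean. The sketch below assumes a hypothetical true mean of 125 mmHg (the real value is unknown), uses the σ = 20 and n = 199 from the example, and fixes a random seed so that it is repeatable.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(1)

TRUE_MEAN = 125.0   # hypothetical population mean, chosen only for the simulation
SIGMA = 20.0        # known population SD, as in the example
N = 199             # sample size, as in the example
REPS = 2000         # number of simulated studies

z = NormalDist().inv_cdf(0.975)   # about 1.96 for a 95% interval
margin = z * SIGMA / sqrt(N)

covered = 0
for _ in range(REPS):
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    x_bar = sum(sample) / N
    # Count the studies whose interval contains the true mean.
    if x_bar - margin <= TRUE_MEAN <= x_bar + margin:
        covered += 1

coverage = covered / REPS
print(f"margin of error = {margin:.2f} mmHg")
print(f"share of intervals containing the true mean = {coverage:.3f}")
```

Any single interval either contains the true mean or it does not; the 95% describes the long-run behaviour of the procedure, not one particular interval.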
B) Unknown variance (small sample size, n < 30)
• A 100(1−α)% C.I. for μ is

$\bar{x} \pm t_{\alpha/2,\, n-1} \frac{s}{\sqrt{n}}$

• The t distribution density curve is bell-shaped and symmetrical about zero.
• There is a different curve for each df (i.e. sample size), and for very large df the curve is very close to Z.
The Z-test is applied when:
• The distribution is normal and the population standard deviation σ is known, or

• The sample size n is large (n ≥ 30) and σ is unknown (taking s as the estimator of σ).

But what happens when n < 30 and σ is unknown?

• We use a t distribution, which depends on the number of degrees of freedom (df).

• The distribution is symmetrical, bell-shaped and similar to the normal, but more spread out.

• The sample standard deviation is used as an estimate of σ (the unknown standard deviation of the population) and is a logical substitute.

• For large sample sizes (n ≥ 30), the t and Z curves are so close together that it does not much matter which you use.

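How quickly t approaches Z can be checked numerically. Python's standard library has no t quantile function, so the sketch below integrates the t density with Simpson's rule and finds the critical value by bisection; the helpers `t_pdf`, `t_cdf` and `t_critical` are illustrative, and a statistics package would normally be used instead.

```python
import math
from statistics import NormalDist

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t) for t >= 0, using Simpson's rule on [0, t] (steps is even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(conf, df):
    """Two-sided critical value t_{alpha/2, df}, found by bisection."""
    target = 1 - (1 - conf) / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:   # widen the bracket until it contains the answer
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z_95 = NormalDist().inv_cdf(0.975)
for df in (5, 9, 29, 100):
    print(f"df = {df:3d}: t = {t_critical(0.95, df):.3f}   (z = {z_95:.3f})")
```

The t critical value is always larger than z, which is the "more spread out" property noted above, and the gap narrows as df grows.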
Degrees of Freedom
• It is defined as the number of values which are free to vary after imposing a certain restriction on your data.

Example: If 3 scores have a mean of 10, how many of the scores can be freely chosen?

Solution: The first and the second scores could be chosen freely (e.g., 8 and 12, 9 and 5, 7 and 15, etc.).

But the third score is then fixed (10, 16 and 8, respectively), because the three scores must sum to 30.

• Hence, there are two degrees of freedom.
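The degrees-of-freedom example can be expressed as a tiny sketch: once the mean is fixed, the last score is fully determined by the others. The function name `last_score` is made up for this illustration.

```python
def last_score(free_scores, n, target_mean):
    """Return the one remaining score forced by the constraint on the mean."""
    # n scores with the given mean must add up to n * target_mean.
    return n * target_mean - sum(free_scores)

# The three freely chosen pairs from the example, with n = 3 and mean = 10.
for first, second in [(8, 12), (9, 5), (7, 15)]:
    third = last_score([first, second], n=3, target_mean=10)
    print(f"scores {first} and {second} force the third score to be {third}")
```

Two values are free and one is forced, which is the n − 1 = 2 degrees of freedom used with the t distribution for a single sample mean.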
• Example
• In a study of preeclampsia, Kaminski and Rechberger
found the mean systolic blood pressure of 10 healthy,
nonpregnant women to be 119 with a standard
deviation of 2.1.
A. What is the estimated standard error of the mean?

B. Construct the 99% confidence interval for the mean of the population from which the 10 subjects may be presumed to be a random sample.

C. What is the precision of the estimate?

D. What assumptions are necessary for the validity of the confidence interval you constructed?
C. Precision = 3.250 × 0.66 ≈ 2.16 (computed with the unrounded standard error, 2.1/√10 ≈ 0.664)
D. The population is normally distributed, and the 10 subjects represent a random sample from this population.

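Parts A to C of the preeclampsia example can be worked through in a few lines. This is a sketch under the example's stated figures (n = 10, mean 119, SD 2.1); the critical value t = 3.250 for 9 degrees of freedom is taken from the precision calculation above rather than computed.

```python
from math import sqrt

n, x_bar, s = 10, 119.0, 2.1
t_crit = 3.250   # t value for 99% confidence with df = n - 1 = 9, as used on the slide

# A. Estimated standard error of the mean.
se = s / sqrt(n)

# C. Precision (margin of error) of the estimate.
precision = t_crit * se

# B. 99% confidence interval for the population mean.
ci = (x_bar - precision, x_bar + precision)

print(f"A. standard error = {se:.3f}")
print(f"B. 99% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"C. precision = {precision:.2f}")
```

Using the unrounded standard error avoids the small discrepancy that appears when 3.250 is multiplied by the rounded value 0.66.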
2) C.I. for a population proportion
(large sample size)

• A 100(1−α)% C.I. for π is

$p \pm z_{\alpha/2} \sqrt{\frac{p(1-p)}{n}}$

• Example
• A research study obtained data regarding sexual
behavior from a sample of unmarried men and
women between the ages of 20 and 44 residing
in geographic areas characterized by high rates
of sexually transmitted diseases and admission to
drug programs. Forty percent of 1229
respondents reported that they never used a
condom.

• Construct a 95 percent confidence interval for the population proportion never using a condom.

Example

• In a survey of 300 automobile drivers in one city, 123 reported that they wear seat belts regularly. Estimate the seat belt use rate in the city and the 95% confidence interval for the true population proportion.

• Answer: p = 123/300 = 0.41 = 41%, n = 300

95% CI = $p \pm z \sqrt{\frac{p(1-p)}{n}} = 0.41 \pm 1.96 \sqrt{\frac{0.41 \times 0.59}{300}}$ = (0.35, 0.47)

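Both proportion examples above (condom use and seat belts) can be computed with one helper. This is a sketch of the large-sample interval p ± z√(p(1−p)/n); the function name `proportion_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(p_hat, n, conf=0.95):
    """Large-sample confidence interval for a population proportion."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.96 for 95%
    se = sqrt(p_hat * (1 - p_hat) / n)            # standard error of p_hat
    return p_hat - z * se, p_hat + z * se

# Forty percent of 1229 respondents never used a condom.
condom = proportion_ci(0.40, 1229)

# 123 of 300 drivers wear seat belts regularly.
seat_belt = proportion_ci(123 / 300, 300)

print(f"never used a condom: 95% CI = ({condom[0]:.3f}, {condom[1]:.3f})")
print(f"seat belt use:       95% CI = ({seat_belt[0]:.2f}, {seat_belt[1]:.2f})")
```

The condom interval is much narrower than the seat belt interval even though the two proportions are similar, because it rests on about four times as many respondents.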
