Inferential Estimation

Here are the steps to construct a 95% confidence interval for the population mean waiting time based on this sample: 1) The point estimate of the population mean (μ) is the sample mean (x̄), which is 17.2 minutes. 2) We do not know the population standard deviation (σ), so we will use the sample standard deviation (s) and a t-distribution instead of the normal. 3) With a sample size of n=35, the t-value for a 95% CI is approximately 2. 4) The formula for the 95% CI is: x̄ ± t(α/2, n-1) * (s/√n)

Uploaded by

Abrham Belay

Estimation

Ephrem Mannekulih (BSc, MSc)


Biostatistics & Health Informatics
Statistics vs. Parameters
• Sample statistic – any summary measure calculated from the sample
o E.g., the mean vitamin D level in a sample of 100 study subjects is 63 nmol/L
• Population parameter – the true value in the entire population
o E.g., the true mean vitamin D level in the entire population is 62 nmol/L
Statistical inference
• It is the process of drawing conclusions about a population parameter based on sample statistics.
• There are usually two methods of inference:
1. Estimation
2. Hypothesis Testing
Estimation
• The procedure of estimating the values of specific population parameters based on sample statistics.
• The objective is to determine the approximate value of a population parameter
• Estimator: the statistical formula or equation by which a statistic is computed from the sample and used to infer the population parameter
• Estimate: the value or values computed by the estimator
Properties of good estimates
1. Unbiased
o An estimate whose expected value is equal to the parameter [E(T) = θ].
2. Consistency
o The difference between the estimate and the parameter tends to shrink as the sample size grows larger.
3. Relatively efficient
o Of two unbiased estimates of a parameter, the one with the smaller variance is said to be relatively efficient
Methods of Estimation
• There are two common methods of estimation
1. Point estimation
2. Interval estimation
• Point estimation involves the calculation of a single sample statistic to estimate the population parameter
• Interval estimation specifies a range of reasonable values for the parameter, a range that is likely to include the parameter being estimated
Point Estimate
• A point estimate is a single numerical value of a sample statistic used to estimate the corresponding population parameter
o A point estimate is of the form: [ Value ]
Sample statistic           Corresponding population parameter
X̄ (sample mean)            µ
S² (sample variance)       σ²
S (sample SD)              σ
p̂ (sample proportion)      P
Interval Estimate
• Because of sampling variation, a point estimate is not expected to be exactly equal to the population parameter
• It is more meaningful and safer to estimate using an interval, which is more likely to include the true population parameter
• Interval estimation specifies a range of reasonable values for the population parameter based on a point estimate.
o An interval estimate is of the form: [ lower limit, upper limit ]
Point vs. Interval Estimate
• A point estimate is a single value used to estimate the true population parameter
• An interval estimate is a range of reasonable values likely to include the true population parameter
Confidence Intervals (CI)
• A confidence interval is a particular type of interval estimator.
• It gives a plausible range of values that is likely to include the true population parameter with a given confidence level.
• A confidence interval provides more information about a population parameter than a point estimate does
…Confidence Intervals
A confidence interval gives information about:
1. The level of confidence in estimating the population parameter
2. The precision of the estimate
• How much uncertainty is associated with a point estimate of a population parameter
• When sampling variability is high, the CI will be wider, reflecting less certainty about the estimate
…Confidence Intervals
3. Whether or not an association exists (analogous to p-values…)
o E.g., if the CI of an odds ratio includes the value 1, we cannot be confident that exposure is associated with disease.
4. Whether an intervention/treatment is beneficial or harmful.
o E.g., when the outcome is a favourable one (such as recovery), an odds ratio for the intervention/treatment greater than 1 suggests benefit, and one between 0 and 1 suggests harm
…Confidence Intervals
• A CI in general:
o Takes into consideration variation in sample statistics from sample to sample
o Is based on observations from a single sample
o Gives information about closeness to the unknown population parameter
Constructing CI
• There are 3 important elements in constructing a CI:
1. Point estimate
2. SE of the point estimate
3. Confidence coefficient
General Formula: The general formula for all CIs is:
point estimate ± (measure of how confident we want to be) × (standard error)
o Point estimate: the value of the statistic in a sample (e.g., mean, odds ratio, etc.)
o Measure of how confident we want to be: from a Z table or a t table, depending on the sampling distribution of the statistic
o Standard error: the standard error of the statistic
Cont.…
• Lower limit = Point Estimate - (Critical Value) x (Standard Error)
• Upper limit = Point Estimate + (Critical Value) x (Standard Error)
o A wide interval suggests imprecise estimation.
o A narrow CI reflects a large sample size, low variability, or both.
Finding the Critical Value
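
For Z-based intervals, the critical value can be computed instead of being read from a table. The following is a minimal Python sketch, assuming a two-sided interval with alpha = 1 - confidence level; the helper name `z_critical` is illustrative and does not come from the slides. Critical values of t also depend on the degrees of freedom and are still taken from a t table here, because the Python standard library has no t quantile function.

```python
from statistics import NormalDist

def z_critical(confidence_level):
    """Two-sided standard normal critical value Z_(1 - alpha/2)."""
    alpha = 1 - confidence_level
    return NormalDist().inv_cdf(1 - alpha / 2)

# Commonly used confidence levels
for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence level -> Z = {z_critical(level):.3f}")
```

The three values printed correspond to the familiar table entries of about 1.645, 1.96 and 2.576.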
Confidence Level
• Confidence level
o The degree of confidence that the interval will contain the unknown population parameter
• Usually a percentage less than 100%
o Example: 95%
• Also written (1 - α) = 0.95
Interpreting Confidence Intervals
1. Probabilistic interpretation:
• In repeated sampling from a normally distributed population with a known standard deviation, 100(1-α) percent [e.g., 95%] of all intervals constructed as Estimate ± $Z_{1-\alpha/2}$ × (SE) will, in the long run, include the true population parameter
• That is, approximately 95 percent of the intervals constructed as Estimate ± $Z_{1-\alpha/2}$ × (SE) will include the true population parameter
Cont.…
2. Practical interpretation
• When sampling is from a normally distributed population with a known standard deviation, we are 100(1-α) percent [e.g., 95%] confident that the single computed interval contains the unknown population parameter.
[Figure: simulated 95% confidence intervals, keeping only the intervals that missed the true mean]
o Panel 1: parent population = normal; n = 5; known standard deviation, Z-scores; limits = 1.96 × known SD; 3 misses = 6% error rate
o Panel 2: parent population = Bernoulli (0.25); n = 100; sample p used; limits = 1.96 × estimated SE; single miss = 2% error rate
• For a 95% confidence interval, you can be 95% confident that you captured the true population value.
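
The repeated-sampling interpretation can be checked with a small simulation. The sketch below is only an illustration: the population mean (62), standard deviation (10), number of repetitions and random seed are hypothetical values chosen for the demonstration, and `coverage_rate` is a made-up helper name. It draws many samples of size 5 from a normal population with known σ, builds a Z-based 95% CI from each, and counts how often the interval contains the true mean.

```python
import math
import random
from statistics import NormalDist

def coverage_rate(mu, sigma, n, confidence_level, repetitions, seed=1):
    """Share of Z-based CIs (known sigma) that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z * sigma / math.sqrt(n)  # same width for every sample
    hits = 0
    for _ in range(repetitions):
        sample_mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if sample_mean - margin <= mu <= sample_mean + margin:
            hits += 1
    return hits / repetitions

rate = coverage_rate(mu=62.0, sigma=10.0, n=5,
                     confidence_level=0.95, repetitions=5000)
print(f"Share of intervals containing the true mean: {rate:.3f}")
```

With enough repetitions the share should settle close to 0.95, while any single interval either contains the true mean or misses it.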
Margin of Error (Precision of the estimate)
• Margin of error (e): the value added to or subtracted from the point estimate to form the CI
• It quantifies the variability around the point estimate
• Example: margin of error for estimating a population mean with σ known: $e = Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$
Factors Affecting Margin of Error
• The margin of error is determined by n, σ, and α.
o As n increases, the margin of error decreases
o As σ increases, the margin of error increases
o As the confidence level increases (α decreases), the margin of error increases.
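
The three effects listed above can be seen numerically. This is a minimal sketch for the case of a mean with σ known, where the margin of error is Z × σ/√n; the function name `margin_of_error` and the input values are illustrative assumptions, not figures from the slides.

```python
import math
from statistics import NormalDist

def margin_of_error(sigma, n, confidence_level):
    """Margin of error for a mean when sigma is known: Z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return z * sigma / math.sqrt(n)

base = margin_of_error(sigma=1.5, n=20, confidence_level=0.95)
larger_n = margin_of_error(sigma=1.5, n=80, confidence_level=0.95)
larger_sigma = margin_of_error(sigma=3.0, n=20, confidence_level=0.95)
higher_level = margin_of_error(sigma=1.5, n=20, confidence_level=0.99)

print(f"baseline:            {base:.3f}")
print(f"n increased to 80:   {larger_n:.3f}")      # smaller margin
print(f"sigma doubled:       {larger_sigma:.3f}")  # larger margin
print(f"99% instead of 95%:  {higher_level:.3f}")  # larger margin
```

Because n enters through a square root, the sample size has to be quadrupled to halve the margin of error.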
CI Estimation for Single Population
1. CI for a Single Population Mean: When σ known
A. Assumptions
• Sample is randomly drawn from a normally distributed population
• A 100(1-α)% C.I. for μ is: $\bar{X} \pm Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$
B. Assumptions
• Sample is randomly drawn from a not normally distributed population
• Sample is large (n ≥ 30)
• A 100(1-α)% C.I. for μ is: $\bar{X} \pm Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$
Cont.…
• The point estimate of μ is the sample mean $\bar{x}$
• The standard error of $\bar{x}$ is $\frac{\sigma}{\sqrt{n}}$
• Commonly used CLs are 90%, 95%, and 99%
Example:
• Waiting times (in hours) at a particular hospital are believed to be approximately normally distributed with a variance of 2.25 hr².
A. A sample of 20 outpatients revealed a mean waiting time of 1.52 hours. Construct the 95% CI for the estimate of the population mean.
B. Suppose that the mean of 1.52 hours had resulted from a sample of 32 patients. Find the 95% CI.
C. What effect does a larger sample size have on the CI?
Answer
A. $1.52 \pm 1.96\sqrt{\frac{2.25}{20}} = 1.52 \pm 1.96(0.335) = 1.52 \pm 0.66 = (0.86,\ 2.18)$
• We are 95% confident that the true mean waiting time is between 0.86 and 2.18 hrs.
• Although the true mean may or may not be in this interval, 95% of the intervals formed in this manner will contain the true mean.
• An incorrect interpretation is that there is a 95% probability that this interval contains the true population mean.
Cont.…
B. $1.52 \pm 1.96\sqrt{\frac{2.25}{32}} = 1.52 \pm 1.96(0.265) = 1.52 \pm 0.52 = (1.00,\ 2.04)$
C. A larger sample size makes the CI narrower (more precise).
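
The waiting-time calculation can be reproduced with a short Python sketch, given as an illustration under the example's assumptions (known variance of 2.25 hr², sample mean of 1.52 hours). The function name `z_interval` is hypothetical. Unrounded results may differ from the hand calculation in the last digit, since the hand calculation rounds the standard error first.

```python
import math
from statistics import NormalDist

def z_interval(sample_mean, sigma, n, confidence_level=0.95):
    """CI for a population mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

sigma = math.sqrt(2.25)               # known variance of 2.25 hr^2
ci_a = z_interval(1.52, sigma, n=20)  # part A
ci_b = z_interval(1.52, sigma, n=32)  # part B

print(f"A (n=20): ({ci_a[0]:.3f}, {ci_a[1]:.3f})")
print(f"B (n=32): ({ci_b[0]:.3f}, {ci_b[1]:.3f})")
print(f"Width A = {ci_a[1] - ci_a[0]:.3f}, width B = {ci_b[1] - ci_b[0]:.3f}")
```

The second interval is narrower than the first, which is the point made in part C.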
Example 2
• In a study of patient flow through the offices of general practitioners, it was found that a sample of 35 patients was 17.2 minutes late for appointments, on average. Previous research had shown the standard deviation (σ) to be about 8 minutes. The population distribution was felt to be non-normal. What is the 90 percent confidence interval for μ, the true mean amount of time late for appointments?
Answer
• Since the population standard deviation is known, the sample is randomly drawn from a not normally distributed population, and the sample is large enough (n ≥ 30):
o We use the Z-distribution to construct the CI
o $Z_{\alpha/2}$ at α = 10% is 1.645 and $\sigma_{\bar{x}} = \frac{8}{\sqrt{35}} = 1.3522$
o 90% CI: $17.2 \pm 1.645(1.3522) = 17.2 \pm 2.22 = (14.98,\ 19.42)$
…….CI for a Single Population Mean: When σ unknown
C. Assumptions
• Sample is randomly drawn from a normally distributed population
• A 100(1-α)% C.I. for μ is: $\bar{X} \pm t_{\alpha/2}\,\frac{S}{\sqrt{n}}$
D. Assumptions
• Sample is randomly drawn from a not normally distributed population
• The sample is large (n ≥ 30)
• A 100(1-α)% C.I. for μ is: $\bar{X} \pm t_{\alpha/2}\,\frac{S}{\sqrt{n}}$
Student’s t Distribution
• The t-distribution is a family of distributions that are bell shaped and symmetric about a mean of zero
• Introduced in 1908 by William Gosset, who published under the pseudonym "Student"
• The shape of the distribution depends on the sample size n
• Indexed by a parameter referred to as the degrees of freedom (df) of the distribution
Student’s t vs. Z Distribution
• Flatter than the normal distribution
o The variability of a t-distribution is greater than that of a Z-distribution
o Thus, there is more area under the tails and less at the center
o Because the variability is greater, the resulting confidence intervals will be wider.
• Note: t approaches Z as n increases
What happens as sample gets larger?
• As the sample gets larger, the t-distribution looks more like the Z-distribution with mean = 0 and variance = 1.
• Z and t values become almost identical, so CIs are almost identical
[Figure: T-distribution and Standard Normal Z distribution. Density curves of the Z distribution and of T with 60 d.f.; x-axis: Value (-5 to 5); y-axis: density (0.0 to 0.4)]
Degrees of Freedom (df)
• df = number of observations that are free to vary after the sample mean has been calculated
• df = n - 1
Student’s t Table
[Table: t distribution values, with comparison to the Z value]
Example
• For a random sample of 20 patients, the duration of cardiac bypass has a mean of X̄ = 267 minutes and a variance of S² = 36,700 min². Assume that the sample is drawn from a normally distributed population with unknown variance. Construct a 90% CI to estimate the unknown population mean.
Answer
• Since the population variance is unknown and the population is normally distributed, we use the t-distribution to construct the CI
• Standard error = $\frac{S}{\sqrt{n}} = \frac{\sqrt{36{,}700}}{\sqrt{20}} = 42.84$ minutes
• t-value at 90% CL with 19 df = 1.729
• 90% CI = 267 ± 1.729 × 42.84 = [192.93, 341.06]
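
The same steps can be sketched in Python. This is only an illustration: the t value of 1.729 is taken from the t table as in the answer above (the standard library has no t quantile function), and `t_interval` is a hypothetical helper name.

```python
import math

def t_interval(sample_mean, sample_variance, n, t_value):
    """CI for a mean when sigma is unknown; t_value comes from a t table (n - 1 df)."""
    se = math.sqrt(sample_variance / n)  # S / sqrt(n)
    margin = t_value * se
    return sample_mean - margin, sample_mean + margin, se

# Cardiac bypass example: n = 20, mean = 267 min, variance = 36,700 min^2
lower, upper, se = t_interval(267, 36_700, n=20, t_value=1.729)
print(f"SE = {se:.2f} minutes")
print(f"90% CI = [{lower:.2f}, {upper:.2f}]")
```

The limits agree with the hand calculation up to rounding in the second decimal place.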


2. CIs for single population proportion, p
Assumptions
• The sample is randomly drawn from the population
• The conditions for the binomial distribution are satisfied
• The distribution of the sample proportion (p̂) is approximately normal if the sample size is large
o (np ≥ 5 and nq ≥ 5)
…….CIs for single population proportion, p
• A sample proportion is used as the point estimator of the population proportion
• An interval estimate for the population proportion (P) can be calculated as:
• Estimate ± (reliability coefficient) × (standard error of the estimator)
Cont.…
• Standard error of the proportion is $\sqrt{\frac{P(1-P)}{n}}$
• We estimate the standard error of proportion data as $S_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
Cont.…
• The confidence interval for the population proportion is calculated as: $\hat{p} \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
• Where:
o Z is the standard normal value for the level of confidence
o p̂ is the sample proportion
o n is the sample size
Cont.…
• Lower limit = Point Estimate - (Critical Value) x (Standard Error of Estimate)
• Upper limit = Point Estimate + (Critical Value) x (Standard Error of Estimate)
• An approximate 95% CI for the true proportion p is: $\hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
Example 1
• A random sample of 100 people shows that 25 are left-handed. Construct a 95% CI for the true proportion of left-handers.
• Answer:
o p̂ = 25/100 = 0.25
o $S_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{0.25(0.75)/100} = 0.0433$
o 0.25 ± 1.96(0.0433)
o [0.1651, 0.3349]
Interpretation
• We are 95% confident that the true percentage of left-handers in the population is between 16.51% and 33.49%
• Although this interval may or may not contain the true population proportion, 95% of the intervals formed in this manner from random samples of 100 will contain the true population proportion
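
The left-handedness example can be reproduced with a short Python sketch. It is an illustration of the large-sample (Z-based) interval described above; the function name `proportion_interval` is hypothetical, and the sample-size check simply encodes the np ≥ 5 and nq ≥ 5 rule of thumb from the assumptions.

```python
import math
from statistics import NormalDist

def proportion_interval(successes, n, confidence_level=0.95):
    """Large-sample (Z-based) CI for a single population proportion."""
    p_hat = successes / n
    # Rule of thumb for the normal approximation: n*p >= 5 and n*q >= 5
    if n * p_hat < 5 or n * (1 - p_hat) < 5:
        raise ValueError("sample too small for the normal approximation")
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

# 25 left-handers in a random sample of 100 people
lower, upper = proportion_interval(successes=25, n=100)
print(f"95% CI: [{lower:.4f}, {upper:.4f}]")
```

The same function can be applied to other large-sample proportion problems by changing the counts and the confidence level.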
Example 2
• It was found that 28.1% of 153 cervical-cancer cases had never had a Pap smear prior to the time of the case’s diagnosis. Calculate a 95% CI for the percentage of cervical-cancer cases who never had a Pap test.
CI Estimation for Two Populations
3. CI for the difference between two population means: When both σ1 and σ2 are known
A. Assumptions
• Samples are randomly and independently drawn
• Populations are normally distributed
• The CI for μ1 - μ2 is: $(\bar{X}_1 - \bar{X}_2) \pm Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
B. Assumptions
• Samples are randomly and independently drawn
• Both populations are normally distributed
• Both n1 and n2 are ≥ 30
• The CI for μ1 - μ2 is: $(\bar{X}_1 - \bar{X}_2) \pm Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
……….CI for the difference between population means
• The standard error of $\bar{X}_1 - \bar{X}_2$ is $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
• The point estimate for the difference is $\bar{X}_1 - \bar{X}_2$
Example 1
• Researchers are interested in the difference between serum uric acid levels in patients with and without Down’s syndrome. Accordingly, a sample of 12 individuals with Down’s syndrome yielded a mean of 4.5 mg/100ml and σ1² = 1. A sample of 15 normal individuals of the same age and sex were found to have a mean of 3.4 mg/100ml and σ2² = 1.5. If it is reasonable to assume that the two populations of values are normally distributed, calculate the 95% CI for μ1 - μ2.
• Answer
o SE = $\sqrt{\frac{1}{12} + \frac{1.5}{15}}$ = 0.4282, 95% CI = 1.1 ± 1.96 (0.4282) = (0.26, 1.94)
o We are 95% confident that the true difference between the two population means is between 0.26 and 1.94.
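
The uric acid example can be checked with a short Python sketch. It assumes, as the example does, that both population variances are known and the samples are independent; the function name `two_mean_z_interval` is illustrative only.

```python
import math
from statistics import NormalDist

def two_mean_z_interval(mean1, var1, n1, mean2, var2, n2, confidence_level=0.95):
    """CI for mu1 - mu2 when both population variances are known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    se = math.sqrt(var1 / n1 + var2 / n2)
    diff = mean1 - mean2
    return diff - z * se, diff + z * se, se

# Down's syndrome group vs. normal group (serum uric acid, mg/100ml)
lower, upper, se = two_mean_z_interval(4.5, 1.0, 12, 3.4, 1.5, 15)
print(f"SE = {se:.4f}")
print(f"95% CI = ({lower:.2f}, {upper:.2f})")
```

Since the interval does not include 0, the data are consistent with a real difference between the two population means.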
………CI for the difference between
population means
C. Assumptions
• Samples are randomly and independently drawn
• Both σ1 and σ2 are unknown
• The two sampled populations are normally distributed or the condition (n1 & n2 ≥ 30) is fulfilled
• In these cases there are two situations:
1. When population variances are equal
2. When population variances are not equal
……….CI for the difference between population means
1. The populations have equal variances
o We use the two sample standard deviations and pool them to estimate σ
o The test statistic is a t value with (n1 + n2 – 2) degrees of freedom
o The pooled estimate ($S_p^2$) is the weighted average of the two sample variances.
Cont.…
• The pooled standard deviation is $S_p = \sqrt{\frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1+n_2-2}}$
• The standard error of the estimate is given by: $SE = S_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}$

Cont.…
Example
• A study was conducted to compare the serum iron levels of children with cystic fibrosis to those of healthy children. Serum iron levels were measured for random samples of n1 = 9 healthy children and n2 = 13 children with cystic fibrosis. For the 9 healthy children, X̄1 = 18.9 μmol/l and s1 = 5.9 μmol/l. For the 13 children with cystic fibrosis, X̄2 = 11.9 μmol/l and s2 = 6.3 μmol/l. The two underlying populations of serum iron levels are independent and normally distributed. Assume that the two populations have equal variance.
Cont.…
• Construct the 95% CI for μ1 - μ2.
o The pooled sample variance is: $S_p^2 = \frac{(9-1)(5.9)^2 + (13-1)(6.3)^2}{9+13-2} = 37.74$
Cont.…
• The t-value at 95% CL with 20 df is 2.086, and the 95% CI for μ1 - μ2 is $(18.9-11.9) \pm 2.086\sqrt{37.74\left(\frac{1}{9}+\frac{1}{13}\right)} = 7.0 \pm 5.56 = (1.44,\ 12.56)$
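
The pooled-variance calculation for the serum iron example can be sketched in Python as follows. This is an illustration under the example's equal-variance assumption; the t value of 2.086 (20 df) is taken from the t table as in the slides, and `pooled_t_interval` is a hypothetical helper name.

```python
import math

def pooled_t_interval(mean1, s1, n1, mean2, s2, n2, t_value):
    """CI for mu1 - mu2 assuming equal population variances.

    t_value is read from a t table with n1 + n2 - 2 degrees of freedom.
    """
    df = n1 + n2 - 2
    # Weighted average of the two sample variances
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return diff - t_value * se, diff + t_value * se, pooled_var

# Healthy children (n1 = 9) vs. children with cystic fibrosis (n2 = 13)
lower, upper, pooled_var = pooled_t_interval(18.9, 5.9, 9, 11.9, 6.3, 13, t_value=2.086)
print(f"Pooled variance = {pooled_var:.2f}")
print(f"95% CI = ({lower:.2f}, {upper:.2f})")
```

Each sample variance is weighted by its degrees of freedom (n - 1), so the larger sample contributes more to the pooled estimate.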
……….CI for the difference between population
means
2. The populations have unequal variances
• The confidence interval for μ1 − μ2 is: $(\bar{X}_1-\bar{X}_2) \pm t'_{\alpha/2}\sqrt{\frac{S_1^2}{n_1}+\frac{S_2^2}{n_2}}$
• Where: $t'_{\alpha/2} = \frac{w_1 t_1 + w_2 t_2}{w_1 + w_2}$
Cont.…
• Where: $w_1 = S_1^2/n_1$, $w_2 = S_2^2/n_2$, $t_1$ is the t value with $n_1 - 1$ df and $t_2$ is the t value with $n_2 - 1$ df
Example
• Among the 18 subjects with schizophrenia, the mean number of treatment days was 4.7 with a standard deviation of 9.3. In the bipolar disorder treatment group of 10 subjects, the mean number of psychiatric disorder treatment days was 8.8 with a standard deviation of 11.5. We assume that the two populations of number of psychiatric disorder days are approximately normally distributed. Now let us assume, however, that the two population variances are not equal. We wish to construct a 95 percent confidence interval for the difference between the means of the two populations represented by the samples.
Cont.….
• With 17 df & 1 - 0.05/2 = 0.975, t1 = 2.1098
• With 9 df & 1 - 0.05/2 = 0.975, t2 = 2.2622
• 95% CI: $(4.7 - 8.8) \pm 2.2216\sqrt{\frac{9.3^2}{18}+\frac{11.5^2}{10}} = -4.1 \pm 9.43 = (-13.53,\ 5.33)$
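
The unequal-variance calculation can be sketched in Python. This is a sketch that assumes the weighted reliability coefficient t' = (w1·t1 + w2·t2)/(w1 + w2), with w = s²/n, which is what the two tabulated t values given above suggest; the slides' own formula was not recoverable, and `unequal_variance_interval` is a hypothetical helper name.

```python
import math

def unequal_variance_interval(mean1, s1, n1, t1, mean2, s2, n2, t2):
    """CI for mu1 - mu2 with unequal variances, using a weighted t value.

    t1 and t2 are table values with n1 - 1 and n2 - 1 degrees of freedom.
    Assumption: t' is the average of t1 and t2 weighted by s^2 / n.
    """
    w1 = s1**2 / n1
    w2 = s2**2 / n2
    t_prime = (w1 * t1 + w2 * t2) / (w1 + w2)
    se = math.sqrt(w1 + w2)
    diff = mean1 - mean2
    return diff - t_prime * se, diff + t_prime * se, t_prime

# Schizophrenia group (n = 18) vs. bipolar disorder group (n = 10)
lower, upper, t_prime = unequal_variance_interval(
    4.7, 9.3, 18, 2.1098,
    8.8, 11.5, 10, 2.2622,
)
print(f"t' = {t_prime:.4f}")
print(f"95% CI = ({lower:.2f}, {upper:.2f})")
```

Under this assumption the interval includes 0, so the data do not rule out equal population means.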
Reading Assignment
• CI for the difference between paired population means.
4. CI for difference of two population proportions
• Assumptions:
o The samples are randomly drawn from the populations
o The conditions for the binomial distribution are satisfied.
o n1p1 ≥ 5; n1(1 − p1) ≥ 5
o n2p2 ≥ 5; n2(1 − p2) ≥ 5
• The point estimate for the difference:
o p̂1 - p̂2
……CI for two population proportions
• SE of the difference = $\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1}+\frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$
• The confidence interval for p1 – p2 is: $(\hat{p}_1-\hat{p}_2) \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1}+\frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$


The following formula is also equally used
 An approximate 95% confidence interval takes the form
Example
• In a clinical trial for a new drug to treat hypertension, n1 = 50 patients were randomly assigned to receive the new drug, and n2 = 50 patients to receive a placebo. 34 of the patients receiving the drug showed improvement, while 15 of those receiving placebo showed improvement.
• Compute a 95% CI estimate for the difference between the proportions improved.
Cont.…
• p̂1 = 34/50 = 0.68, p̂2 = 15/50 = 0.30
• The point estimate for the difference is p̂1 - p̂2 = 0.68 − 0.30 = 0.38
• SE of the difference = $\sqrt{\frac{0.68(0.32)}{50}+\frac{0.30(0.70)}{50}} = 0.0925$
• The 95% CI:
o Lower Limit = 0.38 – (1.96)(0.0925) = 0.20
o Upper Limit = 0.38 + (1.96)(0.0925) = 0.56
• 95% CI = (0.20, 0.56)
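
The hypertension trial calculation can be reproduced with a short Python sketch. It uses the unpooled standard error that matches the value of 0.0925 in the answer; the function name `two_proportion_interval` is illustrative only.

```python
import math
from statistics import NormalDist

def two_proportion_interval(x1, n1, x2, n2, confidence_level=0.95):
    """Large-sample CI for p1 - p2 using the unpooled standard error."""
    p1 = x1 / n1
    p2 = x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se, se

# 34 of 50 improved on the new drug, 15 of 50 improved on placebo
lower, upper, se = two_proportion_interval(34, 50, 15, 50)
print(f"SE = {se:.4f}")
print(f"95% CI = ({lower:.2f}, {upper:.2f})")
```

The whole interval lies above 0, which is consistent with a higher improvement rate in the drug group.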


Google for more!!!
