
Estimation & Sample Size Determination

This document discusses statistical estimation and sample size determination. It defines key terms related to estimation such as point estimates, standard error, confidence level, and margin of error. It explains that the goal of conducting surveys is to make inferences about unknown population parameters based on a sample. Estimation techniques depend on whether the variable is continuous or dichotomous, and the number of samples or groups. Point estimates provide a single value for the population parameter while confidence intervals provide a range of values that have a given probability of containing the true parameter.

Uploaded by

dawit tesfa
Copyright © All Rights Reserved

Statistical Estimation and Sample Size Determination


By: Amare M (MPH/Biostatistics)
[email protected]

January 2024

Estimation 1
Objectives
By the end of this part, learners will be able to:

• Know the methods and principles of drawing conclusions about a larger group (or population) based on samples taken from that population

• Know how to make statistical decisions based on various tests

• Define point estimate, standard error, confidence level, and margin of error

• Compare and contrast standard error and margin of error

• Compute and interpret confidence intervals for means and proportions

• Differentiate independent and matched (paired) samples

• Compute confidence intervals for the difference in means and in proportions in independent samples

Revision
• Descriptive statistics:
o Data collection
o Data organization and presentation
o Data summarization

• Probability:
o Concepts, and
o Distributions

• Inferential statistics: performing
o Estimation
o Hypothesis testing
o Determining relationships among variables, and
o Making predictions

Statistical inference

• Making inferences about unknown population parameters based on sample statistics

• Used when we want to draw conclusions about a population from the data obtained from a sample

• There are two broad areas of statistical inference:
o Estimation
o Hypothesis testing

Statistical inference

• In estimation, sample statistics are used to generate estimates of unknown population parameters

• In hypothesis testing, a specific statement, or hypothesis, is made about a population parameter, and

• Sample statistics are used to assess whether the data are consistent with that hypothesis

Statistical inference

Population parameters:
• Fixed, but unknown, values
• Estimated from the sample statistics

Sample statistics:
• Known values, computed from the data
• Vary from sample to sample

The concept of statistical inference
[Figure: diagram linking the population and its parameters, a random sample drawn from it, and the statistic computed from that sample]
Statistical Estimation

What is the goal of conducting a survey?

• The goal of conducting surveys is to obtain information about a particular population.

• Once the sample has been selected and the information collected, there still remains the task of linking the information gathered from the sample back to the overall population.

Estimation …

• Estimation is the process of determining a likely value for a population parameter, such as:
o The true population mean or
o The true population proportion,
based on a random sample

Estimation …
• In practice, we select a sample from the population and use sample statistics:
o The sample mean or
o The sample proportion, to estimate the unknown parameter
• The sample should be representative of the population, with participants selected at random from the population
• Because different samples can produce different results, it is necessary to quantify the precision of an estimate, i.e., how much estimates vary from sample to sample

Estimation …
• The techniques for estimation and other procedures in statistical inference depend on:
o The appropriate classification of the outcome/dependent variable (the key study variable) as continuous or dichotomous
o The number of comparison groups in the investigation
o E.g., with two comparison groups, whether the groups are:
– Independent (e.g., men vs. women)
– Dependent (matched/paired)

• These issues dictate the appropriate estimation technique

Estimation techniques

Number of samples (groups) | Outcome variable | Parameter to be estimated
One sample | Continuous | Mean
Two independent samples | Continuous | Difference in means
Two dependent (matched/paired) samples | Continuous | Mean difference
One sample | Dichotomous | Proportion (prevalence, cumulative incidence)
Two independent samples | Dichotomous | Difference or ratio (attributable risk, relative risk, odds ratio)
Estimation …

• There are two types of estimates that can be produced for any population parameter:

– A point estimate and

– A confidence interval estimate

Point estimate

• A point estimate for a population parameter is a single-valued estimate of that parameter, e.g., a sample mean or a sample proportion

• The point estimate always lies within the interval estimate

• The point estimates for the population mean and proportion are the sample mean and sample proportion, respectively.

• These are our best single-valued estimates of the unknown population parameters

Point estimate …
• The sample mean is an unbiased estimator of the population
mean

• The same holds true for the sample proportion with regard to
estimating the population proportion

Desirable properties of estimators
• Unbiasedness
o The expected value of the estimator equals the population parameter
o Unbiasedness is an average, or long-run, property
o Any systematic deviation of the estimator from the population parameter is called bias

• Efficiency
o An estimator is efficient if it has a relatively small variance

• Consistency
o The probability that the estimator is close to the parameter it estimates increases as the sample size increases

• Sufficiency
o A sufficient estimator contains all the information in the data about the parameter it estimates
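A small simulation can make unbiasedness concrete. The sketch below is an illustration under assumed settings, not part of the original slides: it draws many samples from a hypothetical normal population with variance 4 and averages two variance estimators, one that divides the sum of squared deviations by n and one that divides by n - 1. The function names are hypothetical. The divide-by-(n - 1) version should average close to 4; the divide-by-n version should average close to (n - 1)/n × 4, a bias that shrinks as n grows.

```python
import random

def variance_estimators(sample):
    """Return (divide-by-n, divide-by-(n-1)) estimates of the variance."""
    n = len(sample)
    mean = sum(sample) / n
    ss = sum((x - mean) ** 2 for x in sample)   # sum of squared deviations
    return ss / n, ss / (n - 1)

def average_variance_estimates(n, reps=10000, sigma=2.0, seed=1):
    """Average both estimators over `reps` samples of size n from N(0, sigma^2)."""
    rng = random.Random(seed)                    # fixed seed: reproducible run
    total_n, total_n1 = 0.0, 0.0
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        by_n, by_n1 = variance_estimators(sample)
        total_n += by_n
        total_n1 += by_n1
    return total_n / reps, total_n1 / reps

for n in (5, 50):
    biased, unbiased = average_variance_estimates(n)
    print(f"n={n:>2}: divide by n -> {biased:.3f}, divide by n-1 -> {unbiased:.3f} (true variance 4)")
```

With n = 5 the divide-by-n average should sit near 3.2, while with n = 50 both averages should be close to 4.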
Confidence interval estimate

• A confidence interval (CI) estimate is a range of values for a population parameter with a level of confidence attached (e.g., 95% confidence that the interval contains the unknown parameter)

• E.g., a CI for a mean or a CI for a proportion

• The level of confidence plays a role similar to a probability: it is the proportion of intervals, constructed the same way over repeated samples, that would contain the true parameter

• The CI starts with the point estimate and builds in what is called a margin of error

Margin of error

• The margin of error incorporates the confidence level (e.g., 90%, 95%, or 99%, chosen by the investigator) and the sampling variability, or standard error, of the point estimate

Confidence interval estimate …

• A confidence interval is a range of values that is likely to cover the true population parameter, and its general form is:

Point estimate ± margin of error

• The point estimate is determined first

• CI estimate = point estimate ± margin of error

• CI estimate = sample mean ($\bar{X}$) ± margin of error

• CI estimate = sample proportion ($\hat{p}$) ± margin of error

Confidence interval estimate …

• A level of confidence is selected that reflects the likelihood that the CI contains the true, unknown parameter

• Usually, confidence levels of 90%, 95%, and 99% are chosen, although theoretically any confidence level between 0% and 100% can be selected

A 95% confidence level estimate of mean

• Means that, if we repeatedly drew samples and computed a CI from each, 95% of those intervals would contain the true population mean

• To construct it, we rely on the Central Limit Theorem, which states that for large samples the distribution of the sample means is approximately normal with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$

• We use the Central Limit Theorem to derive the margin of error

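Confidence interval estimate …

• For the standard normal distribution, the following is a true statement:

$P(-1.96 < Z < 1.96) = 0.95$

i.e., there is a 95% chance that a standard normal variable (Z) will fall between -1.96 and 1.96

• The Central Limit Theorem states that for large samples, $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is approximately standard normal

• If we make this substitution, the following statement is true:

$P\left(-1.96 < \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = 0.95$

Confidence interval estimate …

• Rearranging the inequality to isolate $\mu$ gives:

$P\left(\bar{X} - 1.96\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\dfrac{\sigma}{\sqrt{n}}\right) = 0.95$
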
Confidence interval estimate …
• The 95% CI for the population mean is the interval in the last probability statement and is given by

$\bar{X} \pm 1.96\dfrac{\sigma}{\sqrt{n}}$

• The margin of error is

$1.96\dfrac{\sigma}{\sqrt{n}}$

• Here 1.96 reflects the fact that a 95% confidence level is selected, and $\sigma/\sqrt{n}$ is the standard error (the standard deviation of the point estimate, $\bar{X}$)
• The general form of a CI can be rewritten as follows:

point estimate ± z × SE(point estimate)

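The repeated-sampling meaning of "95% confidence" can be checked by simulation. The sketch below is a hypothetical illustration, not from the slides: it assumes a normal population with known mean 120 and known σ = 20 (made-up values), draws many samples of size 50, builds $\bar{X} \pm z\,\sigma/\sqrt{n}$ for each, and counts how often the interval covers the true mean. It uses only the Python standard library (`statistics.NormalDist` for the z value).

```python
import math
import random
from statistics import NormalDist

def ci_coverage(mu=120.0, sigma=20.0, n=50, conf=0.95, reps=5000, seed=7):
    """Fraction of simulated CIs (xbar ± z*sigma/sqrt(n)) that contain the true mean."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # 1.96 when conf = 0.95
    se = sigma / math.sqrt(n)                      # sigma treated as known
    rng = random.Random(seed)                      # fixed seed: reproducible run
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - z * se <= mu <= xbar + z * se:
            hits += 1
    return hits / reps

print(f"95% CIs: estimated coverage {ci_coverage():.3f}")
print(f"90% CIs: estimated coverage {ci_coverage(conf=0.90):.3f}")
```

The estimated coverage should land close to the nominal level; any single interval either contains µ or it does not.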
Confidence intervals for one sample, continuous outcome

• We wish to estimate the mean of a continuous outcome variable in a single population.
• For example, we wish to estimate the mean systolic blood pressure, body mass index (BMI), total cholesterol level, or white blood cell count in a single population.
• Example 1: In a sample of 3539 study participants attending the seventh examination of the Framingham Offspring Study, 3534 provided systolic blood pressure information. The summary statistics for systolic blood pressure show a mean of 127.3 and a standard deviation of 19.0. Generate a 95% CI for systolic blood pressure using the data collected in the Framingham Offspring Study.

Confidence intervals for one sample, continuous outcome …

• Because the sample size is large, we use the following formula:

$\bar{X} \pm z\dfrac{s}{\sqrt{n}}$

The z value for 95% confidence is z = 1.96. Substituting the sample statistics and the z value for 95% confidence, we have

$127.3 \pm 1.96\dfrac{19.0}{\sqrt{3534}} = 127.3 \pm 0.63$

Therefore, the 95% CI is (126.7, 127.9).
• The margin of error is very small here because of the large sample size (narrow CI and precise estimate).
• We are 95% confident that the true mean is between 126.7 and 127.9.

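The Example 1 arithmetic can be reproduced in a few lines. This is a minimal sketch; the function name `z_ci_mean` is hypothetical, and the z value comes from Python's standard-library `statistics.NormalDist` rather than a printed z-table.

```python
import math
from statistics import NormalDist

def z_ci_mean(xbar, s, n, conf=0.95):
    """Large-sample CI for a population mean: xbar ± z * s / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # ≈ 1.96 for 95% confidence
    margin = z * s / math.sqrt(n)                  # margin of error
    return xbar - margin, xbar + margin, margin

# Example 1: n = 3534, mean = 127.3, SD = 19.0
low, high, moe = z_ci_mean(127.3, 19.0, 3534)
print(f"margin of error = {moe:.2f}, 95% CI = ({low:.1f}, {high:.1f})")
```

This prints a margin of error of 0.63 and the interval (126.7, 127.9), matching the slide.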
Confidence intervals for one sample, continuous outcome …

• Example 2: From Example 1, a subsample of n = 10 participants who attended the seventh examination of the Framingham Offspring Study has systolic blood pressure with a mean of 121.2 and a standard deviation of 11.1. Compute the 95% CI for the true mean systolic blood pressure using these data.
• Because the sample size is small, we must now use the CI formula that involves t rather than z:

$\bar{X} \pm t\dfrac{s}{\sqrt{n}}$

• The t value for 95% confidence with df = 9 is t = 2.262. Substituting the sample statistics and the t value for 95% confidence, we have

$121.2 \pm 2.262\dfrac{11.1}{\sqrt{10}} = 121.2 \pm 7.94$, so the 95% CI is (113.3, 129.1)

Confidence intervals for one sample, continuous outcome …

Confidence interval for µ

Sample size | CI estimate | z or t value
n ≥ 30 | $\bar{X} \pm z\dfrac{s}{\sqrt{n}}$ | Find z in the z-table
n < 30 | $\bar{X} \pm t\dfrac{s}{\sqrt{n}}$ | Find t in the t-table, df = n - 1

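The decision rule in the table (z when n ≥ 30, t with df = n - 1 otherwise) can be sketched as below. This is an illustration, not a full implementation: `T_TABLE_95` is a hypothetical helper holding only three two-sided 95% t values copied from a standard t-table, so other degrees of freedom would need a full table or a statistics library.

```python
import math
from statistics import NormalDist

# Two-sided 95% t critical values from a standard t-table (df -> t).
# Hypothetical, deliberately tiny helper table: extend it for other df.
T_TABLE_95 = {9: 2.262, 14: 2.145, 29: 2.045}

def ci_mean(xbar, s, n):
    """95% CI for a mean: z if n >= 30, otherwise t with df = n - 1."""
    if n >= 30:
        crit = NormalDist().inv_cdf(0.975)   # z ≈ 1.96
    else:
        crit = T_TABLE_95[n - 1]             # KeyError if df is not in the table
    margin = crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin

low, high = ci_mean(121.2, 11.1, 10)   # Example 2: n = 10, so t with df = 9
print(f"95% CI = ({low:.1f}, {high:.1f})")
```

For Example 2 this gives (113.3, 129.1); calling `ci_mean(127.3, 19.0, 3534)` takes the z branch and reproduces Example 1.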
Confidence intervals for one sample, dichotomous outcome

• There are many applications where the outcome of interest is dichotomous.
• The parameter of interest is the unknown population proportion, denoted p.
• Suppose we wish to estimate the proportion of people with diabetes in a population, or the proportion of people with hypertension or obesity.
• Hypertension and obesity are defined by specific levels of blood pressure and BMI, respectively.
• When the outcome of interest is dichotomous, we record for each member of the sample whether or not they have the characteristic of interest.

Confidence intervals for one sample, dichotomous outcome…

• The sample size is denoted by n, and we let x denote the number of successes in the sample.
• The specific response that is considered a success is defined by the investigator.
• For example, if we wish to estimate the proportion of people with diabetes in a population, we consider a diagnosis of diabetes (the outcome of interest) a success and lack of diagnosis a failure.
• In this example, x represents the number of people with a diagnosis of diabetes in the sample.
• The sample proportion is denoted $\hat{p}$ and is computed as the ratio of the number of successes in the sample to the sample size, $\hat{p} = x/n$.

Confidence intervals for one sample, dichotomous outcome…

• The CI for the population proportion takes the same form as the CI for the population mean (i.e., point estimate ± margin of error).
• The point estimate for the population proportion is the sample proportion.
• The margin of error is the product of the z value for the desired confidence level (e.g., z = 1.96 for 95% confidence) and the standard error of the point estimate, $\sqrt{\hat{p}(1-\hat{p})/n}$:

$\hat{p} \pm z\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$

• The preceding formula is appropriate for large samples, defined as at least five successes ($n\hat{p}$) and at least five failures ($n(1-\hat{p})$) in the sample.

Confidence intervals for one sample, dichotomous outcome…

• If there are fewer than five successes or failures, then alternative procedures, called exact methods, must be used to estimate the population proportion.
• Example 3: In a study of 3539 participants who attended the seventh examination of the Offspring in the Framingham Heart Study, one characteristic measured was treatment with antihypertensive medication. A total of 1219 participants were on treatment and 2313 were not. If we call treatment a success, then x = 1219 and n = 3532. The sample proportion is

$\hat{p} = \dfrac{1219}{3532} = 0.345$

• Thus, a point estimate for the population proportion is 0.345, or 34.5%.

Confidence intervals for one sample, dichotomous outcome…

• Our best estimate of the proportion of participants in the population on treatment for hypertension is 0.345.
• Suppose we now wish to generate a 95% CI. To use the preceding formula, we need to satisfy the sample size criterion; specifically, we need at least five successes and five failures.
• Here we more than satisfy that requirement, so the CI formula can be used:

$0.345 \pm 1.96\sqrt{\dfrac{0.345(1-0.345)}{3532}} = 0.345 \pm 0.016$

• Thus, we are 95% confident that the true proportion of persons on antihypertensive medication in the population is between 0.329 and 0.361, or between 32.9% and 36.1%
• Specific applications of estimation for a single population
with a dichotomous outcome involve estimating prevalence
and cumulative incidence.

• Confidence interval for p

Sample size (successes and failures) | CI estimate | z statistic
$n\hat{p} \ge 5$ and $n(1-\hat{p}) \ge 5$ | $\hat{p} \pm z\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$ | Find z in the z-table

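The table's rule and the Example 3 interval can be combined in one short sketch. The function name `proportion_ci` is hypothetical; it raises an error when the large-sample condition fails, because the text calls for exact methods in that case and those are not implemented here.

```python
import math
from statistics import NormalDist

def proportion_ci(x, n, conf=0.95):
    """Large-sample CI for a proportion: p_hat ± z * sqrt(p_hat(1 - p_hat)/n)."""
    p_hat = x / n
    # Large-sample rule: at least 5 successes and at least 5 failures.
    if n * p_hat < 5 or n * (1 - p_hat) < 5:
        raise ValueError("fewer than 5 successes or failures: use an exact method")
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, p_hat - margin, p_hat + margin

# Example 3: 1219 of 3532 participants on antihypertensive treatment
p_hat, low, high = proportion_ci(1219, 3532)
print(f"p-hat = {p_hat:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

This prints p-hat = 0.345 and the interval (0.329, 0.361), as on the slide.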
Confidence intervals for two independent samples, continuous outcome

• There are many applications where it is of interest to compare two groups with respect to their mean scores on a continuous outcome.
• For example, the comparison groups might be men versus women, patients assigned to an experimental treatment versus a placebo in a clinical trial, or patients with a history of cardiovascular disease versus patients free of cardiovascular disease.
• We can compare mean systolic blood pressures in men versus women, or mean BMI or total cholesterol levels in patients assigned to experimental treatment versus placebo.
Confidence intervals for two independent samples, continuous outcome…

• A key feature here is that the two comparison groups are independent, or physically separate.
• The two groups might be determined by a particular attribute or characteristic (e.g., sex, diagnosis of cardiovascular disease) or might be set up by the investigator (e.g., participants assigned to receive an experimental drug or placebo).

Confidence intervals for two independent samples, continuous outcome…

• Similar to the approach we used to estimate the mean of a continuous variable in a single population, we first compute descriptive statistics on each of the two samples.
• Specifically, we compute the sample size, mean, and standard deviation in each sample.
• We denote these summary statistics as $n_1$, $\bar{X}_1$, and $s_1$ for Sample 1 and $n_2$, $\bar{X}_2$, and $s_2$ for Sample 2.
• The designation of Sample 1 and Sample 2 is essentially arbitrary.
• In a clinical trial setting, the convention is to call the experimental treatment Group 1 and the control treatment Group 2.
• However, when comparing groups such as men and women, either group can be 1 or 2.
• The interpretation of the CI estimate depends on how the samples are assigned, but this affects only the interpretation, not the computations.

Confidence intervals for two independent samples, continuous outcome…

• In the two-independent-samples application with a continuous outcome, the parameter of interest is the difference in population means, $\mu_1 - \mu_2$.
• The point estimate for the difference in population means is the difference in sample means, $\bar{X}_1 - \bar{X}_2$.
• The form of the CI is again point estimate ± margin of error, and the margin of error incorporates a value from either the z or t distribution reflecting the selected confidence level and the standard error of the point estimate.
• The use of z or t depends on whether the sample sizes are large or small. (The specific guidelines follow.)
• The standard error of the point estimate incorporates the variability in the outcome of interest in each of the comparison groups.
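Confidence intervals for two independent samples, continuous outcome…

• The table below contains the formulas for CIs for the difference in population means.

Sample sizes | CI estimate | z or t value
$n_1 \ge 30$ and $n_2 \ge 30$ | $(\bar{X}_1 - \bar{X}_2) \pm z\,S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$ | Find z in the z-table
$n_1 < 30$ or $n_2 < 30$ | $(\bar{X}_1 - \bar{X}_2) \pm t\,S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$ | Find t in the t-table, df = $n_1 + n_2 - 2$
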
Confidence intervals for two independent samples, continuous outcome…

• The formulas in the above Table assume equal variability in the two populations (i.e., that the population variances are equal, $\sigma_1^2 = \sigma_2^2$).
• This means that the outcome is equally variable in each of the comparison populations.
• For analysis, we have samples from each of the comparison populations. If the sample variances are similar, then the assumption about variability in the populations is reasonable.
• As a guideline, if the ratio of the sample variances, $s_1^2/s_2^2$, is between 0.5 and 2 (i.e., if one variance is no more than double the other), then the formulas in the above Table are appropriate.
• If the ratio of the sample variances is greater than 2 or less than 0.5, then alternative formulas must be used to account for the heterogeneity in variances.
Confidence intervals for two independent samples, continuous outcome…

• $\bar{X}_1$ and $\bar{X}_2$ are the means of the outcome in the independent samples, z or t are values from the z or t distributions reflecting the desired confidence level, and $S_p\sqrt{1/n_1 + 1/n_2}$ is the standard error of the point estimate, $\bar{X}_1 - \bar{X}_2$.
• Here, $S_p$ is the pooled estimate of the common standard deviation (again, assuming that the variances in the populations are similar), computed as the square root of a weighted average of the sample variances:

$S_p = \sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$

Confidence intervals for two independent samples, continuous outcome…

• Because we are assuming equal variances in the groups, we pool the information on variability (sample variances) to generate an estimate of the population variability.
• Example: Suppose we want to compare mean systolic blood pressures in men versus women using a 95% CI, based on summary statistics for men and women attending the seventh examination of the Framingham Offspring Study.

Systolic blood pressure by sex:
Group | n | $\bar{X}$ | s
Men | 1623 | 128.2 | 17.5
Women | 1911 | 126.5 | 20.1

Confidence intervals for two independent samples, continuous outcome…

• We have large samples (more than 30) of both men and women, and therefore we use the CI formula from the above Table with z as opposed to t.
• However, before implementing the formula, we first check whether the assumption of equality of the population variances is reasonable.
• The guideline suggests investigating the ratio of the sample variances, $s_1^2/s_2^2$.
• Suppose we call men Group 1 and women Group 2. (Again, this is arbitrary. It only needs to be noted when interpreting the results.)
• The ratio of the sample variances is $17.5^2/20.1^2 = 0.76$, which falls between 0.5 and 2, suggesting that the assumption of equality of the population variances is reasonable.

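Confidence intervals for two independent samples, continuous outcome…

• The appropriate CI formula for the difference in mean systolic blood pressures between men and women is:

$(\bar{X}_1 - \bar{X}_2) \pm z\,S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$

• The pooled standard deviation is

$S_p = \sqrt{\dfrac{(1623 - 1)(17.5)^2 + (1911 - 1)(20.1)^2}{1623 + 1911 - 2}} = \sqrt{359.12} = 18.95 \approx 19.0$
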
Confidence intervals for two independent samples, continuous outcome…

• Notice that the pooled estimate of the common standard deviation, $S_p$, falls between the standard deviations in the comparison groups (i.e., 17.5 and 20.1).
• $S_p$ is slightly closer in value to the standard deviation in the women (20.1) because there are slightly more women in the sample.
• Recall that $S_p$ is based on a weighted average of the variances in the comparison groups, with weights $n_1 - 1$ and $n_2 - 1$ that reflect the respective sample sizes.

Confidence intervals for two independent samples, continuous outcome…

• The 95% CI for the difference in mean systolic blood pressures is:

$(128.2 - 126.5) \pm 1.96 \times 19.0\sqrt{\dfrac{1}{1623} + \dfrac{1}{1911}} = 1.7 \pm 1.26$, i.e., (0.44, 2.96)

• The CI is interpreted as follows: We are 95% confident that the difference in mean systolic blood pressures between men and women is between 0.44 and 2.96 units.
• Our best estimate of the difference, the point estimate, is 1.7 units.
• The standard error of the difference is 0.641, and the margin of error is 1.26 units.

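The pooled-variance calculation for the men versus women example can be sketched as follows. The function names are hypothetical; the sketch checks the 0.5 to 2 variance-ratio guideline, pools the variances with weights $n_1 - 1$ and $n_2 - 1$, and uses z because both samples are large.

```python
import math
from statistics import NormalDist

def pooled_sd(n1, s1, n2, s2):
    """Pooled SD: square root of the df-weighted average of the sample variances."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

def two_sample_z_ci(n1, mean1, s1, n2, mean2, s2, conf=0.95):
    """Large-sample CI for mu1 - mu2 assuming equal population variances."""
    ratio = s1**2 / s2**2
    if not 0.5 <= ratio <= 2:
        raise ValueError("variance ratio outside 0.5-2: pooled formula not appropriate")
    sp = pooled_sd(n1, s1, n2, s2)
    se = sp * math.sqrt(1 / n1 + 1 / n2)           # SE of the difference in means
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    diff = mean1 - mean2
    return sp, se, diff - z * se, diff + z * se

# Men = Group 1, women = Group 2
sp, se, low, high = two_sample_z_ci(1623, 128.2, 17.5, 1911, 126.5, 20.1)
print(f"Sp = {sp:.2f}, SE = {se:.3f}, 95% CI = ({low:.2f}, {high:.2f})")
```

Carried at full precision this prints Sp = 18.95, SE = 0.640, and (0.45, 2.95); the slide rounds Sp to 19.0 first, which gives SE = 0.641 and (0.44, 2.96).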
Confidence intervals for two independent samples, continuous outcome…

• Note that when we generate estimates for a population parameter in a single sample (e.g., the population mean µ or population proportion p), the resulting CI provides a range of likely values for that parameter.
• In contrast, when there are two independent samples and the goal is to compare means or proportions, the resulting CI does not provide a range of values for the parameters in the comparison populations; instead, it provides a range of values for the difference.

Confidence intervals for two independent samples, continuous outcome…

• In this example, we estimate that the difference in mean systolic blood pressures is between 0.44 and 2.96 units, with men having the higher values.
• The last aspect of the interpretation is based on the fact that the CI is positive and that we called men Group 1 and women Group 2.
• Had we designated the groups the other way (i.e., women as Group 1 and men as Group 2), the CI would have been -2.96 to -0.44, suggesting that women have lower systolic blood pressures (anywhere from 0.44 to 2.96 units lower than men).

Confidence intervals for matched samples, continuous outcome

• Here the two comparison groups are dependent (or matched, or paired).
• One such scenario involves a single sample of participants in which each participant is measured twice, possibly before and after an intervention or under two experimental conditions (e.g., in a crossover trial).
• The goal of the analysis is to compare the mean score measured before the intervention with the mean score measured afterward.

Confidence intervals for matched samples, continuous outcome…

• Another scenario is one in which matched samples are analyzed.
• For example, we might be interested in the difference in an outcome between twins or siblings.
• Again, we have two samples and the goal is to compare the two means; however, the samples are related, or dependent.
• In the first scenario, before and after measurements are taken in the same individual.
• In the second scenario, measures are taken in pairs of individuals from the same family.
• When the samples are dependent, we cannot use the techniques described in the previous section to compare means.
• Because the samples are dependent, statistical techniques that account for the dependency must be used.
• The technique here focuses on difference scores (e.g., the difference between measures taken before versus after the intervention, or the difference between measures taken in twins or siblings).

Confidence intervals for matched samples, continuous outcome…

• In estimation and other statistical inference applications, it is critically important to identify the unit of analysis correctly. Units of analysis are independent entities.
• In one-sample and two-independent-samples applications, participants are the units of analysis.
• In the two-dependent-samples application, the pair is the unit of analysis, not the individual measurement; the number of measurements is twice the number of pairs (units).
Confidence intervals for matched samples, continuous outcome…

• The parameter of interest is the mean difference, $\mu_d$.
• Again, the first step is to compute descriptive statistics on the sample data.
• Specifically, descriptive statistics are computed on difference scores.
• We compute the sample size (which, in this case, is the number of distinct participants or distinct pairs) and the mean and standard deviation of the difference scores.
• We denote these summary statistics as n, $\bar{X}_d$, and $s_d$, respectively.
• The appropriate formula for the CI for the mean difference depends on the sample size.
• The formulas are shown below and are identical to those we presented for estimating the mean of a single sample, except that here we focus on difference scores.

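Confidence intervals for matched samples, continuous outcome…

Sample size | CI estimate | z or t value
n ≥ 30 | $\bar{X}_d \pm z\dfrac{s_d}{\sqrt{n}}$ | Find z in the z-table
n < 30 | $\bar{X}_d \pm t\dfrac{s_d}{\sqrt{n}}$ | Find t in the t-table, df = n - 1

• Here n is the number of participants or pairs, $\bar{X}_d$ and $s_d$ are the mean and standard deviation of the difference scores (where differences are computed on each participant or between members of a matched pair), and z or t are the values from the z or t distributions reflecting the desired confidence level.
• $s_d/\sqrt{n}$ is the standard error of the point estimate, $\bar{X}_d$.
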
Confidence intervals for matched samples, continuous outcome…

• Example: The data in the table below are systolic blood pressures measured at the sixth and seventh examinations in a subsample of n = 15 randomly selected participants.
• The first column contains unique identification numbers, assigned only to distinguish individual participants.
• Suppose we want to compare systolic blood pressures between examinations (i.e., changes over four years). Because the data in the two samples (Examination 6 and Examination 7) are matched, we compute difference scores.

Confidence intervals for matched samples, continuous outcome…

• The difference scores can be computed by subtracting the blood pressure measured at Examination 7 from that measured at Examination 6, or vice versa.
• If we subtract the blood pressure measured at Examination 6 from that measured at Examination 7, then positive differences represent increases over time and negative differences represent decreases over time.
• The table contains the difference scores for each participant.

[Table: Confidence Intervals for µd]

Confidence intervals for matched samples, continuous outcome…

• Notice that several participants' systolic blood pressures decreased over four years (e.g., Participant 1's blood pressure decreased by 27 units, from 168 to 141), whereas others increased (e.g., Participant 2's blood pressure increased by 8 units, from 111 to 119).
• We now estimate the mean difference in blood pressures over four years.
• This is similar to a one-sample problem with a continuous outcome in the previous section, except that here we focus on the difference scores.
• In this sample we have n=15,

Confidence intervals for matched samples, continuous outcome…

• We now use these descriptive statistics to compute a 95% CI for the mean difference in systolic blood pressure in the population.
• Because the sample size is small (n = 15), we use the following formula, with t = 2.145 for df = 14:

$\bar{X}_d \pm t\dfrac{s_d}{\sqrt{n}}$

Confidence intervals for matched samples, continuous outcome…

• We are 95% confident that the mean difference in systolic blood pressures between Examination 6 and Examination 7 (approximately four years apart) is between -12.4 and 1.8.
• The null (or no effect) value of the CI for the mean difference is 0.
• Therefore, based on the 95% CI, we cannot conclude that there is a statistically significant difference in blood pressures over time, because the CI for the mean difference includes 0.

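The matched-samples procedure (compute difference scores, then apply the one-sample t formula to them) can be sketched as below. The 15 Framingham readings are not reproduced in the text, so this sketch uses five made-up pairs of blood pressures, labeled hypothetical; `t_crit = 2.776` is the two-sided 95% t-table value for df = 4. Differences are taken as later minus earlier, so positive values mean increases, as in the text.

```python
import math

def paired_t_ci(before, after, t_crit):
    """CI for the mean difference (after - before); t_crit from a t-table, df = n - 1."""
    diffs = [a - b for b, a in zip(before, after)]   # one difference per pair
    n = len(diffs)                                   # unit of analysis: the pair
    mean_d = sum(diffs) / n
    sd_d = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))
    margin = t_crit * sd_d / math.sqrt(n)
    return mean_d, sd_d, mean_d - margin, mean_d + margin

# Hypothetical systolic blood pressures for 5 people (NOT the Framingham subsample)
exam6 = [150, 122, 138, 127, 160]
exam7 = [142, 125, 131, 128, 151]
mean_d, sd_d, low, high = paired_t_ci(exam6, exam7, t_crit=2.776)   # df = 4
print(f"mean difference = {mean_d:.1f}, s_d = {sd_d:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

The interpretation follows the same logic as in the text: if the interval includes 0, the data do not show a statistically significant mean change.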
Confidence intervals for two independent samples, dichotomous outcome
• It is very common to compare two groups in terms of the presence or absence of a
particular characteristic or attribute.
• There are many instances in which the outcome variable is dichotomous (e.g.,
prevalent cardiovascular disease or diabetes, current smoking status, incident
coronary heart disease, cancer remission, successful device implant).
• Similar to the applications described for continuous outcomes, we focus here on
the case where there are two comparison groups that are independent or
physically separate and the outcome is dichotomous.
• The two groups might be determined by a particular attribute or characteristic of
the participant (e.g., sex, age less than 65 versus age 65 and older) or might be set
up by the investigator (e.g., participants assigned to receive an experimental drug
or a placebo, a pharmacological versus a surgical treatment).
• When the outcome is dichotomous, the analysis involves comparing the
proportions of successes between the two groups.

Proportions in two independent groups
• Three measures are used to compare proportions in two independent groups:
o Risk difference: computed by taking the difference in proportions between comparison groups; it is similar to the estimate of the difference in means for a continuous outcome described earlier
o Relative risk: computed by taking the ratio of proportions
o Odds ratio: computed by taking the ratio of the odds of success in the comparison groups

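The three measures listed above can be computed directly from group counts. This sketch is illustrative: the function name `measures_of_association` is hypothetical, and as input it uses the smoking and prevalent CVD counts (81 of 744 current smokers, 298 of 3055 nonsmokers) from the Framingham Offspring example in this section. It returns point estimates only; no confidence intervals for the ratio measures are computed here.

```python
def measures_of_association(x1, n1, x2, n2):
    """Point estimates comparing Group 1 (exposed) with Group 2 (unexposed)."""
    p1, p2 = x1 / n1, x2 / n2                         # sample proportions
    risk_difference = p1 - p2
    relative_risk = p1 / p2                           # ratio of proportions
    odds_ratio = (p1 / (1 - p1)) / (p2 / (1 - p2))    # ratio of odds of success
    return risk_difference, relative_risk, odds_ratio

# Current smokers = Group 1, nonsmokers = Group 2 (prevalent CVD)
rd, rr, or_ = measures_of_association(81, 744, 298, 3055)
print(f"RD = {rd:.4f}, RR = {rr:.3f}, OR = {or_:.3f}")
```

The null value is 0 for the risk difference and 1 for the relative risk and odds ratio.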
Confidence Intervals for the Risk Difference
The risk difference (RD) is similar to the difference in means when the outcome is continuous. The parameter of interest is the difference in proportions in the population, RD = $p_1 - p_2$. The point estimate is the difference in sample proportions, $\widehat{RD} = \hat{p}_1 - \hat{p}_2$. The sample proportions are computed by taking the ratio of the number of successes (x) to the sample size (n) in each group, $\hat{p}_1 = x_1/n_1$ and $\hat{p}_2 = x_2/n_2$, respectively. The formula for the CI for the difference in proportions, or the RD, is given in the Table below.

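• Confidence interval for ($p_1 - p_2$)

$(\hat{p}_1 - \hat{p}_2) \pm z\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$
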
• The formula in the above Table is appropriate for large samples, defined as at least five successes ($n\hat{p}$) and at least five failures ($n(1-\hat{p})$) in each sample. If there are fewer than five successes or five failures in either comparison group, then alternative procedures called exact methods must be used to estimate the difference in population proportions.

• Example: Data were measured in participants who attended the fifth examination of the Offspring in the Framingham Heart Study. A total of n = 3799 participants attended the fifth examination, and the Table below contains data on prevalent CVD among participants who were and were not currently smoking cigarettes at the time of the fifth examination of the Framingham Offspring Study.

• Prevalent CVD in Smokers and Nonsmokers

Group | n | With prevalent CVD
Nonsmokers | 3055 | 298
Current smokers | 744 | 81
Total | 3799 | 379

• The outcome is prevalent CVD, and the two comparison groups are defined by current smoking status. The point estimate of prevalent CVD among nonsmokers is 298/3055 = 0.0975, and the point estimate of prevalent CVD among current smokers is 81/744 = 0.1089. When constructing CIs for the RD, the convention is to call the exposed or treated group Group 1 and the unexposed or untreated group Group 2. Here smoking status defines the comparison groups, so we call the current smokers Group 1 and the nonsmokers Group 2. A CI for the difference in prevalent CVD (or RD) between smokers and nonsmokers is given below.
• In this example, we have more than enough successes (cases of prevalent CVD) and failures (persons free of CVD) in each comparison group.

• Confidence interval for ($p_1 - p_2$)

$(0.1089 - 0.0975) \pm 1.96\sqrt{\dfrac{0.1089(1-0.1089)}{744} + \dfrac{0.0975(1-0.0975)}{3055}} = 0.0114 \pm 0.0247$, i.e., (-0.0133, 0.0361)

• We are 95% confident that the difference in the proportions of smokers and nonsmokers with prevalent CVD is between -0.0133 and 0.0361.
• The null, or no difference, value for the RD is 0. Because the 95% CI includes 0, we cannot conclude that there is a statistically significant difference in prevalent CVD between smokers and nonsmokers.

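The risk difference interval for the smoking example can be reproduced with the sketch below. The function name `risk_difference_ci` is hypothetical; it applies the five-successes/five-failures check to each group before using the large-sample formula.

```python
import math
from statistics import NormalDist

def risk_difference_ci(x1, n1, x2, n2, conf=0.95):
    """Large-sample CI for p1 - p2 using the unpooled standard error."""
    for x, n in ((x1, n1), (x2, n2)):
        if x < 5 or n - x < 5:
            raise ValueError("fewer than 5 successes or failures in a group: use an exact method")
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    rd = p1 - p2
    return rd, rd - z * se, rd + z * se

# Current smokers = Group 1 (81 of 744), nonsmokers = Group 2 (298 of 3055)
rd, low, high = risk_difference_ci(81, 744, 298, 3055)
print(f"RD = {rd:.4f}, 95% CI = ({low:.4f}, {high:.4f})")
```

Working from unrounded proportions gives (-0.0134, 0.0361); the slide's -0.0133 comes from rounding the proportions to four decimals first. Either way the interval includes 0.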
• Example: A randomized trial is conducted to evaluate the effectiveness of
a newly developed pain reliever designed to reduce pain in patients
following joint replacement surgery. The trial compares the new pain
reliever to the pain reliever currently used (called the standard of care). A
total of 100 patients undergoing joint replacement surgery agree to
participate in the trial. Patients are randomly assigned to receive either
the new pain reliever or the standard pain reliever following surgery. The
patients are blind to the treatment assignment. Before receiving the
assigned treatment, patients are asked to rate their pain on a scale of 0 to
10, with higher scores indicative of more pain. Each patient is then given
the assigned treatment and after 30 minutes is again asked to rate his or
her pain on the same scale. The primary outcome is a reduction in pain of
3 or more scale points (defined by clinicians as a clinically meaningful
reduction). The data shown in the Table below were observed in the trial.

• A point estimate for the difference in proportions of patients reporting a clinically meaningful reduction in pain between treatment groups is 0.46 - 0.22 = 0.24. (Notice that we call the experimental or new treatment Group 1 and the standard Group 2.)
• There is a 24 percentage point increase in patients reporting a meaningful reduction in pain with the new pain reliever as compared to the standard pain reliever.
• We now construct a 95% CI for the difference in proportions.
• The sample sizes in each comparison group are adequate (i.e., each treatment group has at least five successes, patients reporting reduction in pain, and at least five failures); therefore, the formula for the CI is

$(\hat{p}_1 - \hat{p}_2) \pm z\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$, with $\hat{p}_1 = 0.46$ and $\hat{p}_2 = 0.22$

• We are 95% confident that the difference in proportions of patients reporting a meaningful reduction in pain, comparing the new and standard pain relievers, is between 0.06 and 0.42.
• Our best estimate is an increase of 24 percentage points with the new pain reliever.
• Because the 95% CI does not contain 0 (the null value), we can conclude that there is a statistically significant difference between the pain relievers, in this case in favor of the new pain reliever.

Sample size determination

Thanks!
