7 Estimation
– hypothesis testing
– making predictions.
Testing
Sampling distributions
Sampling distributions of
• Means and
• Proportions
Statistical analysis of data begins with description of quantities, their
relationships, and structure.
A quantity is anything that can be counted, ranked, or measured.
It is essential to examine the distribution of the variable for
skewness (tails),
kurtosis (peaked or flat distribution), spread (range of the values),
and
outliers (data values separated from the rest of the data).
Information about each of these characteristics helps determine which
statistical analyses to choose and how their results can be accurately
explained and interpreted.
10/17/2023 6
We have two facts that are key to statistical inference.
Population parameters are fixed values, but they are usually unknown.
Sample statistics are known values for any given sample, but vary
from sample to sample taken from the same population.
Sampling distributions
If the sample statistic is the sample mean, then the distribution is
the sampling distribution of sample means.
In practice we do not take repeated samples from a
population
Properties of sampling distribution
The central limit theorem
An illustration showing how a sample size determines the
shape of the sampling distribution
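The effect of sample size on the shape of the sampling distribution can be explored with a small simulation. The sketch below is only an illustration under stated assumptions: the population (an exponential distribution with mean 1), the sample sizes, the number of repetitions, and the helper names `sample_means` and `skewness` are all hypothetical choices, not taken from the slides.

```python
import random
import statistics

def sample_means(n, reps=2000, seed=0):
    """Draw `reps` samples of size n from a skewed population and return their means."""
    rng = random.Random(seed)
    # Assumed population: exponential with mean 1 and standard deviation 1 (right-skewed).
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

def skewness(values):
    """Third standardized moment; close to 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

print("n, mean of sample means, SD of sample means, skewness")
for n in (2, 10, 30, 100):
    means = sample_means(n)
    print(n,
          round(statistics.fmean(means), 3),
          round(statistics.stdev(means), 3),
          round(skewness(means), 3))
```

As n grows, the mean of the sample means should stay near the population mean, their standard deviation should shrink roughly like 1/√n, and the skewness should move toward 0, which is the behaviour the central limit theorem describes.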
If the population itself is normally distributed, with mean = μ
and standard deviation = σ, the sample means will have a
normal distribution for any sample size n.
[Figure panels: Population distribution; Sample means distribution]
If the sample statistic is a proportion, then provided n is large
the sample proportions will be approximately normally distributed with
mean p and standard deviation √(p(1 − p)/n), called the
standard error of the proportion.
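The standard error of a sample proportion is conventionally computed as √(p(1 − p)/n). A minimal sketch of that calculation follows; the function name `standard_error_proportion` and the values p = 0.3, n = 100 are hypothetical and chosen only for illustration.

```python
import math

def standard_error_proportion(p, n):
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(p * (1 - p) / n)

# Illustrative values only (not from the slides).
print(round(standard_error_proportion(0.3, 100), 4))
```

Quadrupling n halves the standard error, since n enters under the square root.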
The mean and standard error
Example:
The weights of one-year-old children in a certain region are distributed
with a mean weight of 8 kg and a standard deviation of 0.7 kg. Samples of
38 children are randomly selected from the population, and the mean of
each sample is determined.
a. Find the mean and standard error of the mean of the sampling
distribution.
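Part (a) can be worked through numerically. The sketch below assumes the usual results for the sampling distribution of the mean: its mean equals the population mean, and its standard error is σ/√n. The variable names are illustrative.

```python
import math

mu = 8.0      # population mean weight in kg, from the example
sigma = 0.7   # population standard deviation in kg, from the example
n = 38        # size of each sample, from the example

mean_of_sample_means = mu                 # sampling distribution is centred on mu
standard_error = sigma / math.sqrt(n)     # sigma / sqrt(n)

print(f"mean of the sampling distribution: {mean_of_sample_means} kg")
print(f"standard error of the mean: {standard_error:.3f} kg")
```

With these inputs the standard error comes out at about 0.114 kg.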
Interpreting the Central Limit Theorem
Important points about assumptions of normal distribution
The sample size is large (i.e. greater than 30) so that we can
use the Central Limit Theorem (CLT).
We use the Student t-distribution (t-statistic)
provided that the following three
conditions are satisfied:
the sample is from a normally distributed population,
the population standard deviation is unknown and is estimated from the sample, and
the sample size is small, i.e. less than 30. (Note that we can also
use the t-test even if n > 30 if we want to be more conservative!)
The concept of statistical inference
[Diagram: a random sample is drawn from the population; a statistic computed from the sample gives an estimate of the population parameter (statistical estimation).]
Point estimation
point estimates
Estimation
Desirable properties of estimators include:
• Unbiasedness
– the expected value of the estimator equals the population parameter
– Unbiasedness is an average or long-run property
– Any systematic deviation of the estimator from the population
parameter is called bias
• Efficiency
– An estimator is efficient if it has a relatively small variance
• Consistency
– the probability of the estimator being close to the parameter it
estimates increases as the sample size increases
• Sufficiency
– contains all the information in the data about the
parameter it estimates.
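Unbiasedness as a long-run property can be illustrated by simulation. The sketch below compares two estimators of a population variance: one dividing by n and one dividing by n − 1. The normal population with standard deviation 2, the sample size of 5, the number of repetitions, and the function name `average_estimates` are all assumptions made for illustration.

```python
import random
import statistics

def average_estimates(n=5, reps=20000, seed=1):
    """Average two variance estimators over many samples from a N(0, 2^2) population."""
    rng = random.Random(seed)
    divide_by_n = []
    divide_by_n_minus_1 = []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 2.0) for _ in range(n)]
        divide_by_n.append(statistics.pvariance(sample))         # divisor n
        divide_by_n_minus_1.append(statistics.variance(sample))  # divisor n - 1
    return statistics.fmean(divide_by_n), statistics.fmean(divide_by_n_minus_1)

biased, unbiased = average_estimates()
print("true variance:            4.0")
print("average with divisor n:  ", round(biased, 2))
print("average with divisor n-1:", round(unbiased, 2))
```

Over many samples the n − 1 version should average close to the true variance of 4, while the divisor-n version should settle near 4 × (n − 1)/n = 3.2, a systematic deviation, i.e. bias.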
Interval estimation
• A confidence interval (CI) estimate is a range of
values for a population parameter with a level of
confidence attached (e.g., 95% confidence that the
interval contains the unknown parameter).
• The level of confidence is similar to a probability.
• The CI starts with the point estimate and builds in
what is called a margin of error.
• The margin of error incorporates the confidence level
(e.g., 90% or 95%, which is chosen by the
investigator) and the sampling variability, i.e. the
standard error of the point estimate.
Interval estimation….
• A CI is a range of values that is likely to cover
the true population parameter, and
• Its general form is point estimate ± margin of
error.
• The point estimate is determined first.
• The point estimates for the population mean
and proportion are the sample mean and
sample proportion, respectively.
• These are our best single-valued estimates of
the unknown population parameters.
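The general form, point estimate ± margin of error, can be sketched for a mean and for a proportion. This is a large-sample sketch that uses a z critical value; the function names and the numbers passed to them are hypothetical and not taken from the slides.

```python
import math
from statistics import NormalDist

def z_critical(confidence=0.95):
    """Two-sided standard normal critical value, e.g. about 1.96 for 95%."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def mean_confidence_interval(sample_mean, sd, n, confidence=0.95):
    """Large-sample CI for a mean: sample mean ± z * sd / sqrt(n)."""
    margin = z_critical(confidence) * sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

def proportion_confidence_interval(p_hat, n, confidence=0.95):
    """Large-sample CI for a proportion: p_hat ± z * sqrt(p_hat * (1 - p_hat) / n)."""
    margin = z_critical(confidence) * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Illustrative inputs only.
print(mean_confidence_interval(50.0, 10.0, 100))
print(proportion_confidence_interval(0.4, 200))
```

In both functions the point estimate sits at the centre of the interval, and the margin of error is the product of a critical value (set by the confidence level) and the standard error.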
Interval estimation….
• Specifically, the t values for CIs are larger for smaller samples, resulting in
larger margins of error (i.e., there is more imprecision with small samples).
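The link between sample size, the t value, and the margin of error can be seen in a short sketch. The critical values below are two-sided 95% values copied from a standard t table; the sample sizes and the standard deviation of 10 are hypothetical.

```python
import math

# Two-sided 95% critical values of Student's t, keyed by sample size n (df = n - 1).
T_95 = {5: 2.776, 10: 2.262, 30: 2.045, 100: 1.984}
Z_95 = 1.960  # limiting value for very large samples

def margin_of_error(n, sd):
    """Margin of error for a mean: t * sd / sqrt(n), using the tabulated t for this n."""
    return T_95[n] * sd / math.sqrt(n)

print("n, t value, margin of error (sd = 10)")
for n in sorted(T_95):
    print(n, T_95[n], round(margin_of_error(n, sd=10.0), 2))
```

Every tabulated t value exceeds 1.960, and the gap is widest for the smallest samples, so small samples are penalised twice: a larger critical value and a larger standard error.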
Confidence Level
Interpretation of CIs in general
• Suppose we want to estimate a population mean using a 95%
confidence level.
• If we take 100 different samples (in practice, we take only one)
and for each sample we compute a 95% CI, in theory 95 out of the
100 CIs will contain the true mean value (μ).
• This leaves 5 of 100 CIs that will not include the true mean value.
• In practice, we select one random sample and generate one CI.
• This interval may or may not contain the true mean; the observed
interval may overestimate μ or underestimate μ.
• The 95% CI is the likely range of the true, unknown parameter.
• It is important to note that a CI does not reflect the variability in
the unknown parameter but instead provides a range of values
that are likely to include the unknown parameter.
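The repeated-sampling interpretation can be checked by simulation. The sketch below draws many samples from a population whose mean is known, builds a 95% CI from each, and counts how often the interval covers the true mean. The population values, sample size, number of repetitions, and the function name `coverage` are assumptions, and σ is treated as known to keep the example short.

```python
import math
import random
from statistics import NormalDist, fmean

def coverage(mu=100.0, sigma=15.0, n=25, reps=2000, confidence=0.95, seed=42):
    """Fraction of simulated confidence intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / math.sqrt(n)   # sigma assumed known in this sketch
    hits = 0
    for _ in range(reps):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / reps

print("proportion of intervals covering the true mean:", coverage())
```

The proportion should land close to 0.95. Any single interval either covers μ or it does not; the 95% describes the procedure over many samples.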
Example
In a study of preeclampsia, the mean systolic blood pressure of 10 healthy,
non-pregnant women was found to be 119 with a standard deviation of 2.1.
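The slide does not state what is to be computed. Assuming a 95% confidence interval for the mean is wanted, and because n = 10 is small, a sketch using the tabulated t value for 9 degrees of freedom (2.262) would look like this.

```python
import math

n = 10               # sample size, from the example
sample_mean = 119.0  # mean systolic blood pressure, from the example
sd = 2.1             # sample standard deviation, from the example
t_crit = 2.262       # two-sided 95% t value for df = n - 1 = 9 (assumed confidence level)

margin = t_crit * sd / math.sqrt(n)
lower, upper = sample_mean - margin, sample_mean + margin

print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

Under these assumptions the interval is roughly 117.50 to 120.50.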
If n ≥ 30, find the critical value z in the z table.
• For men
• For women
= (-9.5 - 2.262 × 8.64/3.162, -9.5 + 2.262 × 8.64/3.162)
= (-15.68, -3.32)
The mean difference before and after taking the drug is statistically
significant because the confidence interval does not contain zero.
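The arithmetic of this interval can be checked with a short sketch. It uses the quantities that appear in the calculation: a mean difference of −9.5, a standard deviation of the differences of 8.64, and t = 2.262. The sample size n = 10 is inferred from 3.162 ≈ √10 and is an assumption.

```python
import math

mean_diff = -9.5   # mean of the paired differences, from the calculation
sd_diff = 8.64     # standard deviation of the differences, from the calculation
n = 10             # assumed: 3.162 is approximately sqrt(10)
t_crit = 2.262     # two-sided 95% t value for df = 9

margin = t_crit * sd_diff / math.sqrt(n)
lower, upper = mean_diff - margin, mean_diff + margin
contains_zero = lower <= 0 <= upper

print(f"95% CI for the mean difference: ({lower:.2f}, {upper:.2f})")
print("interval contains zero:", contains_zero)
```

Both limits come out negative (about −15.68 and −3.32), so with these inputs the interval lies entirely below zero.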
Example
• A crossover trial is conducted to evaluate the effectiveness
of a new drug designed to reduce the symptoms of
depression in adults over 65 years of age following a stroke.
Symptoms of depression are measured on a scale of 0 to 100,
with higher scores indicative of more frequent and severe
symptoms of depression. Patients who suffered a stroke are
eligible for the trial. The trial is run as a crossover trial where
each patient receives both the new drug and a placebo.
Patients are blinded to the treatment assignment, and the
order of treatments (e.g., placebo and then new drug or new
drug and then placebo) is randomly assigned. After each
treatment, depressive symptoms are measured in each
patient. The difference in depressive symptoms is measured
in each patient by subtracting the depressive symptom score
after taking the placebo from the depressive symptom score
after taking the new drug. A total of 100 participants
completed the trial and the data are summarized in Table
Summary statistics on differences in depressive symptoms