
Lec - 7 & 8 (Statistical Estimation)

The document outlines statistical estimation concepts, focusing on sampling distribution theory, statistical inference, and estimation methods. It differentiates between point and interval estimation, detailing how to compute confidence intervals for population means and proportions. Additionally, it emphasizes the importance of sample size calculation in obtaining representative samples for accurate statistical analysis.

Uploaded by

Begidu Yilma

Arba-Minch University

College of Medicine and Health sciences


School of Public Health

Statistical Estimation

By: Etenesh K. (BSc, MPH (Epidemiology & Biostatistics))

02/09/2025 1
Learning Objectives

At the end of this session the student will be able to:
• Know sampling distribution theory
• Describe statistical inference and estimation
• Differentiate between point and interval estimation
• Compute appropriate confidence intervals for population means and proportions and interpret the findings
• Describe methods of sample size calculation

What is a sampling distribution?
• A sampling distribution is the distribution of all possible values of a statistic computed
from samples of the same size randomly selected from the
same population.
• To make an inference (e.g. an estimate) about a
parameter from a sample statistic, one has to know, or make
some assumptions about, the distribution of the sample
statistic.

Cont..
• Due to random variation, different samples from the same population
will have different sample means.
• If we repeatedly take samples of the same size n from a population, the
means of the samples form the sampling distribution of the mean for samples of size n.
E.g. Take a sample of size n from a population of size N and calculate a statistic, e.g. the mean.
• Take another sample (same size) and calculate its mean.
• Repeat many times.
• Do you expect all the sample means to be the same? No.

Cont..
• Sampling variability: the value of any statistic (mean or
proportion) varies in repeated random sampling.
• Sample means vary, but less than the individual observations do.
• In practice we do not take repeated samples from a population, i.e.
we do not observe sampling distributions empirically, but it is
necessary to know their properties in order to draw statistical
inferences.
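Sampling variability can be illustrated with a small simulation. The sketch below is only an illustration: the population of ages, the sample size of 25 and the 1,000 repeats are assumed values, not figures from the lecture.

```python
import random
import statistics

random.seed(1)  # fixed seed so the illustration is repeatable

# Hypothetical population: ages of 10,000 people (made-up values)
population = [random.randint(15, 64) for _ in range(10_000)]

n = 25           # assumed sample size
repeats = 1_000  # assumed number of repeated samples

# Draw repeated random samples of the same size and record each sample mean
sample_means = [
    statistics.mean(random.sample(population, n)) for _ in range(repeats)
]

pop_sd = statistics.pstdev(population)        # spread of individual ages
sd_of_means = statistics.stdev(sample_means)  # spread of the sample means

print(f"SD of individual ages: {pop_sd:.2f}")
print(f"SD of sample means:    {sd_of_means:.2f}")
```

The sample means differ from one sample to the next, but their spread is much smaller than the spread of the individual ages.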

Cont..
• When sampling from a discrete, finite population, a sampling
distribution can be constructed.
• However, this construction is difficult with a large population
and impossible with an infinite population.
• We consider sample statistics as random variables.
For example:
• The age of an individual is a random variable.
• Similarly, the mean age of a sample is a random variable.

Cont..
• One may generate the sampling distribution of means as follows:
1. Obtain a sample of n observations selected completely at
random from a large population.
– Determine their mean and then replace the observations in the
population.
2. Obtain another random sample of n observations from the
population, determine their mean and again replace the
observations.

Cont..
3. Repeat the sampling procedure until all possible
different samples have been drawn.
• For each sample, calculate the sample value of interest
(the statistic), such as the sample mean or proportion.

Cont..

4. The result is a series of means of samples of size n.
• If each mean in the series is now treated as an individual
observation and arrayed in a frequency distribution, one
obtains the sampling distribution of means of samples of
size n.
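For a very small population the four steps can be carried out exactly. The following is a minimal sketch under assumed values: the five-value population and n = 2 are hypothetical, chosen only so that every possible sample (drawn with replacement) can be listed.

```python
from collections import Counter
from itertools import product
from math import sqrt
import statistics

population = [2, 4, 6, 8, 10]  # hypothetical toy population (N = 5)
n = 2                          # sample size

# Every possible ordered sample of size n drawn with replacement (N**n samples)
samples = list(product(population, repeat=n))
means = [sum(s) / n for s in samples]

# Frequency distribution of the sample means = sampling distribution of the mean
sampling_distribution = Counter(means)

mu = statistics.mean(population)           # population mean
sigma = statistics.pstdev(population)      # population standard deviation
mean_of_means = statistics.mean(means)     # mean of the sampling distribution
standard_error = statistics.pstdev(means)  # SD of the sampling distribution

for value in sorted(sampling_distribution):
    print(value, sampling_distribution[value])
print("mean of sample means:", mean_of_means, "population mean:", mu)
print("SD of sample means:", standard_error, "sigma/sqrt(n):", sigma / sqrt(n))
```

In this toy case the mean of the sample means equals the population mean, and their standard deviation equals σ/√n.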

Properties of sampling distribution
• 1. The mean of the sampling distribution of $\bar{x}$ is the same as the population mean
($\mu_{\bar{x}} = \mu$).
• 2. The standard deviation of the sampling distribution of $\bar{x}$ is equal to the
population standard deviation divided by the square root of the sample size
($\sigma_{\bar{x}} = \sigma/\sqrt{n}$). It is called the standard error.
• 3. If the original population is (approximately) normal, the sampling distribution
is (approximately) normal even at small sample sizes.
• If the original population is non-normal, the sampling distribution will be
approximately normal by the central limit theorem, provided n is large enough (a common rule of thumb is n > 30).
Cont..
• When sample sizes are large, the sampling distribution of the mean generated
by repeated random sampling with replacement is approximately
normal regardless of the shape of the population
distribution, provided the population variance is finite (central limit theorem).
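The central limit theorem can be checked informally by simulation. The sketch below assumes a strongly right-skewed population (exponential with mean 1) and uses sample skewness as a rough measure of non-normality. The population, the sample sizes 2 and 50, and the 5,000 repeats are all illustrative assumptions.

```python
import random
import statistics

random.seed(7)  # fixed seed so the illustration is repeatable

def skewness(values):
    """Sample skewness: about 0 for a symmetric (e.g. normal) distribution."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def simulate_means(n, repeats=5_000):
    """Means of `repeats` samples of size n from an exponential population."""
    return [
        sum(random.expovariate(1.0) for _ in range(n)) / n
        for _ in range(repeats)
    ]

means_small = simulate_means(n=2)   # small samples
means_large = simulate_means(n=50)  # large samples

skew_small = skewness(means_small)
skew_large = skewness(means_large)

print(f"skewness of sample means, n = 2:  {skew_small:.2f}")
print(f"skewness of sample means, n = 50: {skew_large:.2f}")
```

With the larger sample size the skewness of the sample means is much closer to zero, i.e. their distribution is closer to normal even though the population is not.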

Cont..
• The beauty of the CLT is that it allows us to make probability
statements about $\bar{x}$ without regard for the distribution of X, provided n
is large.
• Since $\bar{x}$ is approximately $N(\mu, \sigma^2/n)$, we can standardize to obtain
$Z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
and use standard normal tables to find the probability that $\bar{x}$ lies in
any particular interval.
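As a hedged illustration of this standardization, the sketch below computes the probability that a sample mean exceeds a given value. The numbers (μ = 120, σ = 15, n = 36, threshold 125) are hypothetical, and Python's `statistics.NormalDist` stands in for the standard normal table.

```python
from math import sqrt
from statistics import NormalDist

mu = 120.0     # assumed population mean
sigma = 15.0   # assumed population standard deviation
n = 36         # assumed sample size
x_bar = 125.0  # value of the sample mean we ask about

standard_error = sigma / sqrt(n)   # sigma / sqrt(n) = 2.5
z = (x_bar - mu) / standard_error  # standardized value = 2.0

# P(sample mean > 125) = P(Z > 2.0), read from the standard normal distribution
p_greater = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}, P(sample mean > {x_bar}) = {p_greater:.4f}")
```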

Note
• The standard deviation represents the variability in the
individual data.
• The standard error represents the variability in the sample
estimates, i.e. it measures how much the sample statistic varies
from sample to sample.

Inferential statistics
• Descriptive statistics help investigators to describe and
summarize data.
• Probability and sampling distribution concepts are needed to
evaluate data using statistical methods.
• Without probability and sampling distribution theory:
• we could not make statements about populations without
studying everyone in the population;
• studying everyone in a population is an undesirable and often
impossible task.
Cont..
• Statistical inference is the procedure by which we reach a conclusion
about a population on the basis of the
information contained in a sample that has been
drawn from that population.
• The two primary methods for making inferences
are estimation and hypothesis testing.

Statistical Estimation

• Estimation is the process of determining a likely value
for a population parameter based on information
collected from the sample.
• It is the use of sample statistics to estimate population
parameters.
• Researchers are usually interested in estimates
of many quantities, such as totals, averages and proportions.
E.g. an estimate of the proportion of smokers among all
people aged 15 to 24 in the population.

Cont..

Types of Estimation
1. Point Estimation

2. Interval Estimation

1. Point Estimation

Cont..
• From a single sample we can calculate a sample
statistic to estimate a single parameter (a point
estimate).
• The point estimate for the population mean µ is the sample mean, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.
• The point estimate for the population proportion π is the sample proportion, $p = x/n$,
• where x is the total number of successes (events) and n is the sample size.
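The two point estimates can be computed directly from sample data. The values below (eight ages, and 12 smokers out of 80 respondents) are made up purely for illustration.

```python
import statistics

# Hypothetical sample of ages (made-up values)
ages = [23, 31, 19, 45, 28, 37, 22, 30]
x_bar = statistics.mean(ages)  # point estimate of the population mean

# Hypothetical survey: x smokers among n respondents
x, n = 12, 80
p_hat = x / n                  # point estimate of the population proportion

print(f"sample mean = {x_bar}, sample proportion = {p_hat}")
```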
Cont..
• The problem is that two different samples are very likely to
result in different sample means, and thus there is some degree
of uncertainty involved.
• A point estimate does not provide any information about the
inherent variability of the estimator; we do not know how
close $\bar{x}$ is to μ in any given situation.

Properties of a Good Estimator
a. Unbiasedness
• A sample statistic whose mean (expected value) is equal to the
population parameter it estimates is unbiased.
• The sample mean is an unbiased estimator of the population mean μ; the sample median is unbiased for μ when the population distribution is symmetric.

b. Minimum variance
• An estimator which has the minimum standard error
is a good estimator.
• For a symmetric (approximately normal) distribution the mean has the
smaller standard error.
• If the distribution is heavily skewed, the median may have the
smaller standard error.
Cont..
c. Consistency
 As sample size increases, variation of the
estimator from the true population value
decreases

2. Interval estimation
• Interval estimation is a statement that a
population parameter has a value lying between
two specified limits.
• An interval estimate provides more information
about a population characteristic than a point
estimate.
• The value of the sample statistic varies from
sample to sample, so reporting only a
single value as the estimate of the parameter is not
generally adequate.
Cont..
• We need to take into account the sample-to-sample variation of
the statistic.
• A confidence interval defines an interval within which the
true population parameter is likely to fall (interval estimate).

Cont..
 Interval estimate (Confidence interval) -
consists of two numbers, a lower limit
and an upper limit which serve as the
bounding values within which the
parameter is expected to lie with a certain
degree of confidence.

Cont..
• A CI in general:
 Takes into consideration variation in
sample statistics from sample to sample
 Based on observation from one sample
 Gives information about closeness to
unknown population parameters
 Stated in terms of level of confidence
 Never 100% sure

Cont..
• Confidence level: the degree of confidence that the interval
will contain the unknown population parameter.
• It is a percentage (less than 100%).
• Most commonly 95% confidence intervals are
calculated; however, 90% and 99% confidence intervals
are sometimes used.
• As the confidence level increases we obtain a wider
confidence interval.
e.g. A 90% CI is narrower than a 95% CI, and a 99% CI is wider than a
95% CI.
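The effect of the confidence level on interval width can be seen from the normal multipliers alone. In the sketch below the standard error of 2.0 is a hypothetical value, and `z_multiplier` is an illustrative helper name rather than anything defined in the lecture.

```python
from statistics import NormalDist

def z_multiplier(confidence):
    """Two-sided standard normal multiplier z_(alpha/2) for a confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

standard_error = 2.0  # hypothetical standard error of an estimate

for level in (0.90, 0.95, 0.99):
    z = z_multiplier(level)
    width = 2 * z * standard_error  # full width of the interval
    print(f"{level:.0%} CI: z = {z:.3f}, width = {width:.2f}")
```

The multipliers are roughly 1.645, 1.960 and 2.576, so for the same data the 99% interval is the widest and the 90% interval the narrowest.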
Cont..

A (1-α)100% confidence interval for an unknown population mean
and population proportion is given as follows:

$\left[\bar{x} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\right]$ for estimating a mean
(if σ is unknown, it can be estimated by the sample standard deviation s)

$\left[p - z_{\alpha/2}\sqrt{\dfrac{p(1-p)}{n}},\ p + z_{\alpha/2}\sqrt{\dfrac{p(1-p)}{n}}\right]$ for estimating a proportion

Cont..
Interpretation:

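• We are 100(1-α)% [e.g., 95%]
confident that the single computed
interval contains the unknown
population parameter.
• That is, if the sampling were repeated many times, about 100(1-α)% of the intervals computed this way would contain the parameter.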

Cont..
• For a given confidence level (i.e. 90%, 95%, 99%) the
width of the confidence interval depends on
the standard error of the estimate, which in turn
depends on the:
1. Sample size: the larger the sample size, the narrower
the confidence interval and the more precise our estimate.
• Lack of precision means that in repeated sampling the values
of the sample statistic are spread out or scattered.

• You can make the precision as high as you want by
taking a large enough sample.
• The margin of error decreases as √n increases.
2. Standard deviation: the more the variation
among the individual values, the wider the
confidence interval and the less precise the
estimate.
• As sample size increases, the standard error decreases; the standard deviation of the individual values does not shrink with sample size.

Cont..

Confidence Intervals for


• A single population mean
• A single population proportion

1) C.I. for a single population mean (normally distributed)

Known variance (large sample size)


• A 100(1-α)% C.I. for μ is $\bar{x} \pm z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$
• α is chosen by the researcher; the most common values of α are
0.05, 0.01, 0.001 and 0.1.

Example
• A physical therapist wished to estimate, with 99% confidence,
the mean maximal strength of a particular muscle in a certain
group of individuals.
• He assumes that strength scores are approximately normally
distributed with a variance of 144.
• A sample of 15 subjects who participated in the experiment
yielded a mean of 84.3.

Solution:
σ = √144 = 12, n = 15, and for 99% confidence z = 2.58.
84.3 ± 2.58 × (12/√15) = 84.3 ± 8.0
⇒ We are 99% confident that the population mean is between
76.3 and 92.3.
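The same interval can be reproduced numerically. This is a minimal sketch assuming a known population variance, as in the example. The function name `mean_ci_known_sigma` is illustrative, and the multiplier is taken from `statistics.NormalDist` rather than a printed table (about 2.576 instead of the rounded 2.58).

```python
from math import sqrt
from statistics import NormalDist

def mean_ci_known_sigma(x_bar, sigma, n, confidence):
    """Confidence interval for a mean when the population SD is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)  # z * standard error
    return x_bar - margin, x_bar + margin

# Values from the physical therapist example: variance 144, n = 15, mean 84.3
lower, upper = mean_ci_known_sigma(x_bar=84.3, sigma=sqrt(144), n=15, confidence=0.99)
print(f"99% CI: {lower:.1f} to {upper:.1f}")
```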

E.g. 2. A random sample of 100 cancer patients
treated with a new drug has a mean survival time of
46.9 months.
• If the SD of the population is 43.3 months, find a
95% confidence interval for the population mean.
Solution: 46.9 ± (1.96) × (43.3/√100) = 46.9 ±
8.5 = (38.4 to 55.4 months)
• Hence, we are 95% confident that the limits (38.4,
55.4) contain the mean survival time in the
population from which the sample arose.
The Z-test (Z-based interval) is applied when:
• the distribution is normal and the population standard deviation σ is known, or
• the sample size n is large (n ≥ 30), even
with unknown σ (taking s as the estimator of σ).

3) C.I. for a population proportion (large sample size)

Cont..

p = 123/300 = 0.41 is a point estimate of π.
α = 0.05 ⇒ $z_{0.025}$ = 1.96
0.41 ± 1.96 × √(0.41 × 0.59/300) = 0.41 ± 0.056
We are 95% confident that the population proportion (π) lies
between about 0.35 and 0.47.
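The arithmetic behind this interval can be sketched as follows, using the counts given above (123 events among 300) and the large-sample normal approximation. The helper name `proportion_ci` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Large-sample (normal approximation) confidence interval for a proportion."""
    p = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p * (1 - p) / n)  # z * standard error of p
    return p - margin, p + margin

lower, upper = proportion_ci(123, 300)
print(f"95% CI: {lower:.3f} to {upper:.3f}")
```

To three decimals the limits come out near 0.354 and 0.466.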

Exercise

1. An epidemiologist is worried about the ever-increasing trend of
malaria in a certain locality and wants to estimate the
proportion of persons infected in the peak malaria transmission
period.
He takes a random sample of 150 persons in that locality
during the peak transmission period and finds that 60 of them
are positive for malaria.

Cont..
Find: a) 95%
b) 90%
c) 99% confidence intervals for the proportion of
infected people in the whole locality during the
peak malaria transmission period.

Cont..
Solution:
Sample proportion = 60/150 = 0.4; standard error = √(0.4 × 0.6/150) = 0.04
a) A 95% C.I. for the population proportion (the proportion of
infected people in the whole locality) = 0.4 ± 1.96 (0.04)
= (0.4 ± 0.078) = (0.322, 0.478).
b) A 90% C.I. = 0.4 ± 1.64 (0.04) = (0.4 ± 0.066) = (0.334, 0.466).
c) A 99% C.I. = 0.4 ± 2.57 (0.04) = (0.4 ± 0.103) = (0.297, 0.503).
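The three intervals for the malaria exercise can be checked with a short script. It is a sketch using the normal approximation and multipliers from `statistics.NormalDist`, so the limits may differ in the third decimal from values based on rounded table entries such as 1.64 or 2.57.

```python
from math import sqrt
from statistics import NormalDist

x, n = 60, 150              # positives and sample size from the exercise
p = x / n                   # sample proportion = 0.4
se = sqrt(p * (1 - p) / n)  # standard error = 0.04

intervals = {}
for confidence in (0.95, 0.90, 0.99):
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * se
    intervals[confidence] = (p - margin, p + margin)
    lower, upper = intervals[confidence]
    print(f"{confidence:.0%} CI: {lower:.3f} to {upper:.3f}")
```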

Sample size determination
• How many subjects should be sampled from the larger
population to have a representative sample?
If too many…
• Shortage of resources
– Data collection
– Analysis
• Waste of resources

Con…
If too few…
• May fail to detect an important effect
• Estimates of effect may be too imprecise (wide CI’s)

Con…
Why is it important to consider sample size?
• In studies concerned with estimating some characteristic of a
population (e.g. the prevalence of asthmatic children), sample
size calculations are important to ensure that estimates are
obtained with required precision or confidence.

Con…
• In planning any investigation we must decide how
many people need to be studied in order to answer
the study objectives.
• In studies concerned with detecting an effect
(e.g. a difference between two treatments, or the risk
of a diagnosis when a certain risk factor is present
versus absent),

Cont..
– Sample size calculations are important to ensure that
if an effect deemed to be clinically or biologically
important exists,
– Then, there is a high chance of it being detected
– i.e. that the analysis will be statistically significant.

Cont..
Sample size determination depends on the:
• objective of the study;
• design of the study;
• variability (dispersion) of the population;
• accuracy of the measurements to be made;
• degree of precision required for generalization;
• degree of confidence with which to conclude;
• availability of resources.
Incorrect sample size will lead to:

• Wrong conclusions

• Poor quality research (Errors)

– Error can be minimized by increasing the sample size

• Waste of resources and loss of money

• Ethical problems

• Delay in completion

Sample size determination
• Given the confidence interval
mean (or proportion) $\pm\ z_{\alpha/2} \times \text{s.e}$
• Hence the absolute precision, denoted by d, is given as
$d = z_{\alpha/2} \times \text{s.e}$
• where s.e is the standard error of the estimator of the
parameter of interest.
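Estimating a single population mean
• Setting $d = z_{\alpha/2}\,\sigma/\sqrt{n}$ and solving for n gives $n = \dfrac{z_{\alpha/2}^2\,\sigma^2}{d^2}$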

Sample size for single population proportion
• If the study is to be conducted on a single population, then we
need to know the following:
1. What is the probability of the event occurring?
2. How much error is tolerable, i.e. how much precision do
we need?
3. How confident do we need to be that the true population
value falls within the confidence interval?

Single population proportion
• Let p denote the proportion of successes; then $n = \dfrac{z_{\alpha/2}^2\, p(1-p)}{d^2}$

Cont..
Where:
• n is the minimum sample size
• p is an estimate of the prevalence (proportion) in the population
(if it is unknown we use 50%)
• d is the margin of sampling error tolerated
• $z_{\alpha/2}$ is the standard normal value at the (1-α)100%
confidence level; α is most often 5%

Point to be considered

Example

1. A hospital administrator wishes to know what proportion of
discharged patients is unhappy with the care received during
hospitalization. If a 95% confidence interval is desired to estimate
the proportion within a 5% margin of error, how large a sample
should be drawn?
Since p is unknown, take p = 0.5:
$n = \dfrac{z^2 p(1-p)}{d^2} = \dfrac{(1.96)^2 (0.5 \times 0.5)}{(0.05)^2} = 384.2$
≈ 385 patients
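The calculation for the hospital example can be sketched in a few lines. The function name `sample_size_proportion` is illustrative, and the table value z = 1.96 is used to match the worked figures. The result is always rounded up, because rounding down would give slightly less precision than required.

```python
from math import ceil

def sample_size_proportion(p, d, z=1.96):
    """Minimum n to estimate a proportion p within margin d (large population)."""
    n = z**2 * p * (1 - p) / d**2
    return ceil(n)  # round up to the next whole subject

n_patients = sample_size_proportion(p=0.5, d=0.05)
print(n_patients)
```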

Exercise
• A researcher wishes to estimate the mean CD4 count in a
defined community. From preliminary contact he thinks this
mean is about 400 cells/mm³ with a standard deviation of 40
cells/mm³. If he is willing to tolerate a sampling error of up to 5
cells/mm³ in his estimate, how many subjects should be included
in his study?

Con..

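• If the population size is assumed to be very large, the
required sample size (at 95% confidence, z = 1.96) would be:
$n = \dfrac{(1.96)^2 (40)^2}{(5)^2} = \dfrac{3.8416 \times 1600}{25} = 245.8624 \approx 246$
• If the population size is, say, N = 2000, applying a finite population correction such as $n/(1 + n/N)$ gives a required
sample size of 219 persons.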
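Both figures in the CD4 exercise can be reproduced with the sketch below. The slides do not state which finite population correction was used. The form n0 / (1 + n0/N) is an assumption here, chosen because it is a common version and it reproduces the stated result. The function names are illustrative.

```python
from math import ceil

def sample_size_mean(sigma, d, z=1.96):
    """Unrounded n = z^2 * sigma^2 / d^2 for a very large population."""
    return z**2 * sigma**2 / d**2

def finite_population_correction(n0, N):
    """Assumed correction for a finite population of size N: n0 / (1 + n0/N)."""
    return n0 / (1 + n0 / N)

n0 = sample_size_mean(sigma=40, d=5)  # about 245.86
n_large = ceil(n0)                    # very large population
n_finite = ceil(finite_population_correction(n0, N=2000))

print(n_large, n_finite)
```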
Reading assignment

Confidence Intervals for


• Difference of population mean
• Difference of population proportion

Thank you!!!

