
5.Sampling and Sampling Distributions

Sampling is the process of selecting a portion of a population to represent the entire group, which is crucial for making generalizations in biostatistical analysis. It is essential that the sample is unbiased and representative to ensure valid conclusions can be drawn about the population. Various sampling methods exist, including probability and non-probability sampling, each with its own advantages and limitations.

Sampling

 The process of selecting a portion of the population to
represent the entire population.
 Sampling individuals from a population into a sample is a
critically important step in any biostatistical analysis, because
we are making generalizations about the population based on
that sample.
 When selecting a sample from a population, it is important that
the sample is representative of the population, i.e., the
sample should be similar to the population with respect to key
characteristics.

Sampling Vs. Statistics
• Statistical Inference:
 Predict and forecast values of population parameters...
 Test hypotheses about values of population parameters...
 Make decisions...
...on the basis of sample statistics derived from limited and
incomplete sample information.
 Make generalizations about the characteristics of a population
on the basis of observations of a sample, a part of that population.

 To make generalizations about the characteristics of a population, an unbiased, representative
sample should be drawn at random from the entire population.
[Figure: a population of males and females. An unbiased, representative sample is drawn at random from the entire population; a biased, unrepresentative sample is drawn only from people with high income status.]
Sampling …
 Therefore,
 A main concern in sampling:
 Ensure that the sample represents the population, and
 The findings can be generalized.
 While selecting a SAMPLE, there are basic questions:
1. What is the group of people (STUDY POPULATION) from
which we want to draw a sample?
2. How many people do we need in our sample?
3. How will these people be selected?

 What do we mean by “a representative sample”?
 It is an explicit or implicit objective of most studies in health care
which ‘count’ something or other (quantitative studies), to offer
conclusions that are generalizable.
 This means that the findings of a study apply to situations other than
that of the cases in the study.
 To give a hypothetical example, Smith and Jones’ (1997) study of
consultation rates in primary care which was based on data from five
practices in differing geographic settings (urban, suburban, rural)
finds higher rates in the urban environment.
 When they wrote it up for publication, Smith and Jones used statistics
to claim their findings could be generalised: the differences applied
not just to these five practices, but to all practices in the country.

What do we mean by “a representative sample”?...
 For such a claim to be valid (the study to possess ‘external validity’),
the authors must convince us that their sample was not biased (that it
was representative). Other criteria must also be met (for instance,
that the design was both appropriate and carried out correctly: the
study’s ‘internal validity’ and ‘reliability’).
 However, it is the representativeness of a sample which allows the researcher to
generalise the findings to the wider population.
 If a study has an unrepresentative or biased sample, then it may
still have internal validity and reliability, but it will not be
generalizable (will not possess external validity). Consequently the
results of the study will be applicable only to the group under study.
 Such studies are, by themselves, of little use, and for example in
the case of drug trials, this could be dangerous if their findings
were generalised.
What do we mean by “a representative sample”?...
 It is essential to a study’s design (assuming that study wants to
generalise and is not simply descriptive of one setting) that sampling
is taken seriously.
 However, there is a second issue which must be addressed in
sampling, sample size. Generalisations from data to wider
population depend upon a kind of statistic which tests inferences
or hypotheses.
 Example 1: the t-test can be used to test the hypothesis that there
is a difference between two populations, based on a sample from
each. Suppose we select 100 males and 100 females and measure their BMI. We
find a difference in our samples, and wish to argue that the difference
found is not an accident (due to chance), but reflects an actual
difference in the wider populations from which the samples were
drawn. We use a t-test to see whether we can make this claim validly.
What do we mean by “a representative sample”?...
 Example 2: studies have shown that the prevalence of obesity
is inversely related to educational attainment (i.e., persons with
higher levels of education are less likely to be obese).
 Consequently, if we were to select a sample from a population
in order to estimate the overall prevalence of obesity, we
would want the educational level of the sample to be similar
to that of the overall population in order to avoid an over- or
underestimate of the prevalence of obesity.

 Why do we need to select a sample?
 In some circumstances, it is not necessary to select a sample. If the
subjects of your study are very rare, for instance a disease occurring
only once in 100 000 children, then you might decide to study every
case you can find.
 More usually, however, selecting a portion of the population to represent
the entire population is necessary for the following reasons:
 Feasibility: Sampling may be the only feasible method of
collecting information.
 Reduced cost: Sampling reduces demands on resources such
as finance, personnel, and material.
 Greater accuracy: Sampling may lead to more accurate
data collection.
 Greater speed: Data can be collected and summarized more
quickly.
 Sampling enables us to estimate the characteristics of a
population by directly observing a portion of the population.


The hierarchy of sampling
Study subjects
The actual participants in the study

Sample
Subjects who are selected

Sampling Frame
The list of potential subjects from which the sample is drawn

Source population
The population from whom the study subjects would be obtained

Target population
The population to whom the results would be generalized

Selection methods of sampling units
1. Sampling without replacement: a unit selected from
the population is not returned to that population
before the next draw. For a population of size N we can
form $\binom{N}{n} = \frac{N!}{n!\,(N-n)!}$ different samples of size n.
2. Sampling with replacement: in this type of selection, each
selected unit is returned to the population
before the next draw. Here, from a population of size N, we
can construct $N^n$ samples of size n.
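As a quick check of the two counts above, the following sketch enumerates every sample for a small illustrative population of four values (chosen only for this example) and compares the totals with the formulas. It assumes that samples drawn without replacement are counted as unordered, while samples drawn with replacement are counted as ordered.

```python
from itertools import combinations, product
from math import comb

population = [18, 20, 22, 24]  # illustrative population, N = 4
N, n = len(population), 2

# Without replacement (order ignored): C(N, n) possible samples
without_repl = list(combinations(population, n))
# With replacement (order kept): N**n possible samples
with_repl = list(product(population, repeat=n))

print(len(without_repl), comb(N, n))  # 6 6
print(len(with_repl), N ** n)         # 16 16
```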
Error in sampling
 When we take a sample, our results will not exactly
equal the results for the whole population.
 Two types of errors:
 Sampling error (random error)
 Non-sampling error (bias)

Sampling error
 The value of the characteristic measured in a sample
differs from that of the total population.
 Because a sample is a subset of a larger group.
 This type of error, arising from the sampling process, is
called sampling error.
 Can’t be avoided or totally eliminated.
 Minimized by increasing the size of the sample.
 When n = N, sampling error = 0

Non sampling error (bias)
 Systematic error in the design or conduct of a sampling
procedure.
 Results in distortion of the sample and study results.
 More serious type of error
 Multi-factorial causes
 Selection bias,
 Information bias.
 Observational error
 Respondent errors
 Errors in editing and tabulation of data

Sampling Methods
 Two broad divisions:
A. Probability sampling methods
B. Non-probability sampling methods

A. Probability sampling
 Every sampling unit has a known and non‐zero
probability of selection into the sample.
 Because study samples are randomly selected
and their probability of inclusion can be calculated,
 Reliable estimates can be produced and
 Generalization can be made about the population.
 The method chosen depends on a number of factors, such as
 The available sampling frame,
 How spread out the population is,
 How costly it is to survey members of the population
 How users will analyse the data

Most common probability sampling methods
1. Simple random sampling
2. Systematic random sampling
3. Stratified random sampling
4. Cluster sampling
5. Multi-stage sampling
6. Sampling with probability proportional to size (PPS)

1. Simple random sampling
 The required number of individuals are selected at
random from the sampling frame, a list or a database of
all individuals in the population
 Each member (sampling unit) of the population has
an equal chance of being included in the sample.

 SRS has certain limitations:


 Requires a sampling frame which is not always possible.
 Minority subgroups of interest may not be selected.
 Difficult if the reference population is dispersed.
To use a SRS method:
 Make a numbered list of all sampling units in the
population
 Each unit should be numbered from 1 to N (where N is the
size of the population)
 Select the required number using any one of the sampling
procedures.
 The randomness of the sample is ensured by:
 Use of “lottery” methods
 Table of random numbers or
 Computer programs
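The "computer programs" option can be sketched as below. This is a minimal illustration, assuming the sampling frame is simply the unit numbers 1 to N; the function name `simple_random_sample`, the frame size and the seed are hypothetical choices, not part of the original material.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units from the frame without replacement, each with equal chance."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

frame = list(range(1, 101))   # units numbered 1..N, here N = 100 (hypothetical)
sample = simple_random_sample(frame, 10, seed=42)
print(sorted(sample))
```

Fixing the seed only makes the draw repeatable for checking; in practice it would be left unset or recorded.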

2. Systematic random sampling
 Sometimes called interval sampling
 Selection of individuals from the sampling frame
systematically rather than randomly
 Individuals are taken at regular intervals down the list
(Every Kth individual is chosen from the sampling frame)
 The starting point is chosen at random

Steps in systematic random sampling
1. Number the units on your frame from 1 to N (where
N is the total population size).
2. Determine the sampling interval (K) by dividing the
number of units in the population by the desired sample
size: $K = N/n$.
3. Select a number between one and K at random. This
number is called random start and would be the first
number included in your sample.
4. Select every Kth unit after that first number

Example:
 To select n=20 from N=100, sampling interval K=N/n=100 ÷ 20 = 5
 You will need to select one unit out of every five units to end up
with a total of 20 units in your sample.
 Select a number between 1 and 5 from a table of random
numbers or by simple random sampling
 If you choose 4, the fourth unit on your frame would be the
first unit included in your sample;
 The sample might consist of the following units to make up a
sample of 20: 4 (the random start), 9, 14, 19, 24..., 99 (up to N,
which is 100 in this case).
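The worked example above can be reproduced with a short sketch. The function name `systematic_sample` is hypothetical, and the code assumes N is an exact multiple of n (otherwise the integer interval K = N // n is only approximate).

```python
import random

def systematic_sample(N, n, start=None, seed=None):
    """Return 1-based unit numbers taken at interval K = N // n from a random start."""
    K = N // n                                      # sampling interval
    if start is None:
        start = random.Random(seed).randint(1, K)   # random start between 1 and K
    return [start + i * K for i in range(n)]

units = systematic_sample(N=100, n=20, start=4)
print(units[:5], units[-1])  # [4, 9, 14, 19, 24] 99
```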

Systematic sampling…
 Note: Systematic sampling should not be used when a cyclic
repetition is inherent in the sampling frame.
 Advantage:
 Easier to perform
 Requires less time than SRS
 Works well when the population from which the sample is to be
drawn is homogeneously distributed
 Disadvantage:
 Patterns/periodicity in the sampling frame can bias the sample

3. Stratified random sampling
 It is done when the population is known to have heterogeneity
with regard to some factors and those factors are used for
stratification.
 Using stratified sampling, the population is divided into
homogeneous, mutually exclusive groups called strata.
 A population can be stratified by any variable that is available for
all units prior to sampling (e.g., age, sex, province of residence,
income, etc.).

Stratified random sampling…
 A separate sample is drawn independently from each
stratum.
 Any of the sampling methods mentioned in this section
(and others that exist) can be used to sample within
each stratum.
 Stratified sampling ensures an adequate sample size
for sub‐groups in the population of interest.
 When a population is stratified, each stratum becomes
an independent population and you will need to
decide the sample size for each stratum.

Stratified random sampling…
 Why do we need to create strata?
 There are many reasons; the main one is that it can make the sampling strategy more efficient.
 You need a larger sample to get a more accurate estimation of a
characteristic that varies greatly from one unit to the other than
for a characteristic that does not.
 For example, if every person in a population had the same
salary, then a sample of one individual would be enough to
get a precise estimate of the average salary

Stratified random sampling…
 Equal allocation:
 Allocate equal sample size to each stratum
 Proportionate allocation: $n_j = \frac{n}{N} N_j$
 $n_j$ is sample size of the jth stratum
 $N_j$ is population size of the jth stratum
 n = n1+ n2+ ...+ nk is the total sample size
 N = N1+ N2+ ...+ Nk is the total population size

Stratified random sampling…
 Example:
 Village A B C D Total
 HHs 100 150 120 130 500
 S. size ? ? ? ? 60

 Advantage
 The representativeness of the sample is improved.
 Disadvantage
 Sampling frame for the entire population has to be prepared
separately for each stratum.
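For the village example above, proportionate allocation can be worked through as follows. This is a sketch under the assumption that each stratum's share is rounded to the nearest whole household; the function name `proportional_allocation` is hypothetical.

```python
def proportional_allocation(stratum_sizes, n):
    """Return the unrounded n_j = n * N_j / N for each stratum."""
    N = sum(stratum_sizes.values())
    return {name: n * N_j / N for name, N_j in stratum_sizes.items()}

households = {"A": 100, "B": 150, "C": 120, "D": 130}   # from the example table
exact = proportional_allocation(households, n=60)
rounded = {name: round(value) for name, value in exact.items()}

print(exact)
print(rounded, sum(rounded.values()))
```

Here the unrounded shares are 12, 18, 14.4 and 15.6, which round to 12, 18, 14 and 16 and still add up to 60. Simple rounding does not always preserve the total, so in other cases one stratum may need adjusting by hand.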

4. Cluster sampling
 Sometimes it is too expensive to carry out SRS
 Population may be large and scattered.
 Complete list of the study population unavailable
 Population consists of many natural groups (clusters)
 Travel costs can become expensive if interviewers have to
survey people from one end of the country to the other.
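 Cluster sampling is the most widely used method for reducing cost
(administrative convenience)
 The clusters should be similar to one another, each ideally as varied
internally as the population, unlike stratified sampling, where the strata
differ from one another but are homogeneous within.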

Steps in cluster sampling
 Cluster sampling divides the population into groups or clusters.
 A number of clusters are selected randomly to represent the total
population, and then all units within selected clusters are included in
the sample.
 This differs from stratified sampling, where some units are selected
from each group.
 No units from non-selected clusters are included in the sample, they
are represented by those from selected clusters.
 Example:
 In a school based study, we assume students of the same school to be
homogeneous.
 We can select randomly sections and include all students of the
selected sections only

Cluster sampling…
 Advantages:
 Cost and time reduction
 It creates 'pockets' of sampled units instead of spreading the
sample over the whole territory.
 Sometimes a list of all units in the population is not available,
while a list of all clusters is either available or easy to create.
 Disadvantages:
 Creates a loss of efficiency when compared with SRS.
 It is usually better to survey a large number of small clusters
instead of a small number of large clusters.
 This is because neighboring units tend to be more alike, resulting in a
sample that does not represent the whole spectrum of opinions or
situations present in the overall population (Design Effect).

5. Multi-stage sampling
 Similar to the cluster sampling, except that it involves picking a
sample from within each chosen cluster, rather than including all
units in the cluster.
 The selected clusters in the primary cluster sample are themselves
sampled, rather than fully studied
 This type of sampling requires at least two stages.
 The primary sampling unit (PSU) is the sampling unit in the first
sampling stage.
 The secondary sampling unit (SSU) is the sampling unit in the
second sampling stage, etc.
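The difference between one-stage cluster sampling (all units in the chosen clusters) and two-stage sampling (a sample within each chosen cluster) can be sketched as below. The function `cluster_sample`, the school sections and the student labels are hypothetical and serve only as an illustration.

```python
import random

def cluster_sample(clusters, n_clusters, n_per_cluster=None, seed=None):
    """Pick clusters at random, then take all units (one-stage cluster sampling)
    or a simple random sample of n_per_cluster units from each (two-stage)."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)    # stage 1: select PSUs
    sample = {}
    for name in chosen:
        units = clusters[name]
        if n_per_cluster is None:
            sample[name] = list(units)                   # include every unit
        else:
            k = min(n_per_cluster, len(units))
            sample[name] = rng.sample(units, k)          # stage 2: select SSUs
    return sample

# Hypothetical frame: 6 school sections with 30 students each
sections = {f"section_{i}": [f"s{i}_{j}" for j in range(30)] for i in range(1, 7)}
one_stage = cluster_sample(sections, n_clusters=2, seed=1)
two_stage = cluster_sample(sections, n_clusters=2, n_per_cluster=10, seed=1)

print({k: len(v) for k, v in one_stage.items()})
print({k: len(v) for k, v in two_stage.items()})
```

Note that only a list of clusters is needed at the first stage; a list of units is needed only for the clusters that were selected.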

Multi-stage sampling…
 Advantage:
 You do not need to have a list of all units in the
population.
 Saves a great amount of time and effort by
not having to create a list of all the units in a
population.
 Commonly used with cluster sampling
 Multi‐Stage Cluster Sampling
 Disadvantage
 Sampling error is increased compared with a SRS

B. Non-probability sampling
 In non‐probability sampling, every item has an unknown chance of
being selected.
 There is an assumption that there is an even distribution of a
characteristic of interest within the population.
 In non‐probability sampling, since elements are chosen
subjectively, there is no way to estimate the probability of any one
element being included in the sample.
 It may lead to unrepresentative samples and/or results are
unconvincing
 Reliability can’t be ensured
 Inappropriate if the aim is to measure variables and generalize
findings obtained from a sample to the population.

Non probability sampling …
 Despite these drawbacks, non‐probability sampling
methods can be useful when descriptive comments
about the sample itself are desired.
 Secondly, they are quick, inexpensive and convenient.

 There are also other circumstances, such as in some research settings,
when it is unfeasible or impractical to conduct
probability sampling.

The most common types of non‐probability sampling
1. Convenience or haphazard sampling
2. Volunteer sampling
3. Judgment sampling
4. Quota sampling
5. Snowball sampling technique

1. Convenience or haphazard sampling
 Convenience sampling is sometimes referred to as haphazard or
accidental sampling.
 The major reason for using it is administrative convenience: the sample is
chosen for ease of access
 It is not normally representative of the target population
because sample units are only selected if they can be
accessed easily and conveniently.
 It can be used when time and resources are too short, but that
advantage is greatly offset by the presence of bias.
 Although useful applications of the technique are limited, it can
deliver accurate results when the population is homogeneous.

Convenience or haphazard sampling…
 For example, a scientist could use this method to
determine whether a lake is polluted or not.
 Assuming that the lake water is well‐mixed, any
sample would yield similar information.
 A scientist could safely draw water anywhere on the
lake without bothering about whether or not the sample
is representative

2. Volunteer sampling
 Occurs when people volunteer to be involved in the study.
 In psychological experiments or pharmaceutical trials (e.g.,
drug testing), for example, it would be difficult and unethical
to enlist random participants from the general public.
 In these instances, the sample is taken from a group of
volunteers.
 Sometimes, the researcher offers payment to attract
respondents.
 Sampling voluntary participants as opposed to the general
population may introduce strong biases.
 The majority does not volunteer, resulting in large selection
bias.
3. Judgment sampling
 This approach is used when a sample is taken based on
certain judgments about the overall population.
 The underlying assumption is that the investigator will select
units that are characteristic of the population.
 The critical issue here is objectivity: how much can judgment be
relied upon to arrive at a typical sample?
 Researchers often use this method in exploratory studies like
pre‐testing of questionnaires and focus groups.
 They also prefer to use this method in laboratory settings
where the choice of experimental subjects (i.e., animal,
human) reflects the investigator's pre‐existing beliefs about the
population

4. Quota sampling
 The most common sampling method in market research about
the views on products
 A proper design may have been used to determine what
numbers are needed in each of the quotas
 A sample of 50 men and 50 women
 Selection of individuals is done until the required total in
each group (quota) is obtained.
 Quota sampling is an effective sampling method when
information is urgently required and can be conducted without
sampling frames.
 In many cases where the population has no suitable frame, quota
sampling may be the only appropriate sampling method.

Quota sampling…
 The main argument against quota sampling is that it
does not meet the basic requirement of randomness.

 Some units may have no chance of selection or the


chance of selection may be unknown.

 Therefore, the sample may be biased

5. Snowball sampling
 A technique for selecting a sample where existing
study subjects recruit future subjects from among their
acquaintances.
 Thus the sample group appears to grow like a rolling
snowball.
 Useful for sampling people who are difficult to contact

Snowball sampling …
 This sampling technique is often used in hidden populations
which are difficult for researchers to access; example
populations would be drug users, commercial sex workers (CSWs), homeless or street children,
etc.
 Because sample members are not selected from a sampling frame,
snowball samples are subject to numerous biases. For example,
people who have many friends are more likely to be recruited into the
sample

Sampling distribution

 The mean of a representative sample provides an estimate of
the unknown population mean, but intuitively we know that if
we took multiple samples from the same population, the
estimates would vary from one another.
 We could, in fact, sample over and over from the same
population and compute a mean for each of the samples. In
essence, all these sample means constitute yet another
"population," and we could graphically display the frequency
distribution of the sample means.
 This is referred to as the sampling distribution of
the sample means.

Main types of sampling distributions
I. Distribution of the sample mean
II. Distribution of the sample proportion
III. Distribution of the difference between two means
IV. Distribution of the difference between two proportions

I. Sampling distribution of sample mean
 The sampling distributions illustrate three fundamental properties:
1. The mean:
 The mean of all possible estimates obtained from samples of identical
size is equal to the true population mean.
2. The standard deviation(SD):
 The SD of the sampling distribution decreases as the sample size
increases.
 The SD of a sampling distribution takes a special name, standard
error, often indicated by the letters SE.
3. The shape:
 The shape of the sampling distribution is approximately normal
when the sample size is large.
 This property is known as the Central Limit Theorem.
 It is the most important of all the three properties
Sampling distribution of sample mean…

 The probability distribution of the sample mean is called the sampling


distribution of the sample mean.
 Suppose we have a population of size N=4, constituting the ages
of four outpatients.
x, Age (years): 18, 20, 22, 24

Sampling distribution of sample mean…
 Now consider all possible samples of size n=2.
 There are 4*4 = 16 different but equally likely samples of size 2 that can be drawn
(with replacement) from the population of the four ages 18, 20, 22 and 24.
 Each of these samples has a sample mean. For example, the mean of the
sample (20,22) is 21, and the mean of the sample (18,22) is 20.
[Tables: the 16 samples of size 2 from the population, and the 16 corresponding sample means]

Sampling distribution of sample mean…
 Summary measures of this sampling distribution:
1. Calculate the mean of the sample means by adding the individual 16
sample means & dividing the sum by 16.
2. Also calculate the SD of the sample means.
3. Finally compare with the original population results.
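The three steps above can be carried out directly for the four-outpatient example. The sketch below enumerates all 16 samples of size 2 drawn with replacement and compares the mean and SD of the sample means with the population values; population formulas (division by the number of values) are assumed throughout.

```python
from itertools import product
from statistics import pstdev

ages = [18, 20, 22, 24]                      # population from the example, N = 4
samples = list(product(ages, repeat=2))      # all 16 samples of size 2, with replacement
sample_means = [sum(s) / len(s) for s in samples]

pop_mean, pop_sd = sum(ages) / len(ages), pstdev(ages)
sm_mean, sm_sd = sum(sample_means) / len(sample_means), pstdev(sample_means)

print(len(samples))                          # 16
print(pop_mean, round(pop_sd, 3))            # 21.0 2.236
print(sm_mean, round(sm_sd, 3))              # 21.0 1.581
print(round(pop_sd / 2 ** 0.5, 3))           # 1.581, i.e. sigma / sqrt(n)
```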

Comparing the population with its sampling distribution

Comparing the population with its sampling distribution…
 We note that the mean of the sampling distribution of $\bar{x}$
has the same value as the mean of the original population.
 However, the variance is not equal to the original
population variance; it is equal to the population
variance divided by the sample size used to obtain the
sampling distribution: $\sigma_{\bar{x}}^2 = \sigma^2 / n$.
 E.g. $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{2.236}{1.414} = 1.58$
 Which is equal to the standard deviation of the sample means, not to the original population standard deviation (2.236).
Sampling distribution of sample mean…
 The square root of the sampling distribution variance is called
standard error (SE) of the mean or, simply, standard error:
$SE = \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
 OR, the standard deviation of any sample statistic is called its SE.
 SE is determined by both the sample size and the degree of
variability among the individual observations
 SD quantifies the amount of variability among individuals in a
population, while
 SE quantifies the variability among means of repeated samples
drawn from that population
 The SE is always smaller than the SD (except when n = 1)
Sampling with Vs without replacement
 The foregoing sampling distribution of sample means
was based on the assumption that sampling is either
with replacement or the samples are drawn from infinite
populations.
 Sampling with replacement is difficult under practical
conditions
 Necessary to sample from finite population
 Sampling with replacement
Population size = N
Sample size = n
1st draw = N
2nd draw = N
3rd draw = N
nth draw = N
The # of possible samples = $N^n$
 Sampling without replacement
Population size = N
Sample size = n
1st draw = N
2nd draw = N-1
3rd draw = N-2
nth draw = (N-n+1)
 The # of possible samples = ${}^{N}C_{n} = \frac{N!}{n!\,(N-n)!}$
 The mean of the sampling distribution is equal
to the population mean.
 The variance of the sampling distribution is:
$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n} \cdot \frac{N-n}{N-1}$
 Finite population correction, (N-n)/(N-1)
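The finite population correction can be checked on the four-outpatient population used earlier (ages 18, 20, 22, 24). The sketch below enumerates every sample of size 2 drawn without replacement and compares the variance of the sample means with the formula; it assumes unordered samples and population (not sample) variances.

```python
from itertools import combinations
from statistics import pvariance

ages = [18, 20, 22, 24]           # small population, N = 4
N, n = len(ages), 2

# All C(N, n) = 6 samples drawn without replacement
means = [sum(s) / n for s in combinations(ages, n)]
observed = pvariance(means)

# Formula: (sigma^2 / n) * (N - n) / (N - 1)
sigma2 = pvariance(ages)
predicted = (sigma2 / n) * (N - n) / (N - 1)

print(round(observed, 4), round(predicted, 4))  # 1.6667 1.6667
```

Without the correction the formula would give 5 / 2 = 2.5, which is the value for sampling with replacement.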
 Calculating sampling error
 Sampling error: The difference between a value (a statistic) computed
from a sample and the corresponding value (a parameter) computed
from a population
 Example (for the mean): sampling error = $\bar{x} - \mu$
 Example
 If the population mean is μ = 98.6 degrees and a sample of n = 5
temperatures yields a sample mean of $\bar{x}$ = 99.2 degrees, then the
sampling error is: $\bar{x} - \mu$ = 99.2 – 98.6 = 0.6 degrees.
 The sampling error may be +ve or -ve ($\bar{x}$ may be > or < μ)
 The expected sampling error decreases as the sample size increases

 Central Limit Theorem (CLT)
 The CLT states that if you have a population with mean μ and
standard deviation σ and take sufficiently large random
samples from the population with replacement, then the
distribution of the sample means will be approximately
normally distributed.
 This will hold true regardless of whether the source population
is normal or skewed, provided the sample size is sufficiently
large (usually n > 30). If the population is normal, then the
theorem holds true even for samples smaller than 30.
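A small simulation can illustrate the statement above for a skewed source population. This is only a sketch: the exponential population (mean 1, SD 1), the sample size of 40, the 5000 repetitions and the seed are all arbitrary assumptions made for illustration.

```python
import random
from statistics import mean, pstdev

rng = random.Random(0)                 # fixed seed so the run is repeatable
n, n_samples = 40, 5000                # sample size and number of repeated samples

# Skewed source population: exponential with mean 1 and SD 1
sample_means = [mean([rng.expovariate(1.0) for _ in range(n)])
                for _ in range(n_samples)]

print(round(mean(sample_means), 2))    # should be close to the population mean, 1
print(round(pstdev(sample_means), 3))  # should be close to 1 / sqrt(40), about 0.158

# Rough normality check: share of sample means within 1.96 SE of the mean
se = 1 / n ** 0.5
within = sum(abs(m - 1) < 1.96 * se for m in sample_means) / n_samples
print(round(within, 3))                # about 0.95 if the distribution is near normal
```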

Central Limit Theorem (CLT)…
 In fact, this also holds true even if the population is binomial,
provided that min(np, n(1-p))> 5, where n is the sample size
and p is the probability of success in the population.
 This means that we can use the normal probability model to
quantify uncertainty when making inferences about a
population mean based on the sample mean.
 For the random samples we take from the population, we
can compute the mean of the sample means: $\mu_{\bar{x}} = \mu$
 Similarly, we can compute the standard deviation of the
sample means: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

Central Limit Theorem (CLT)…
 In order for the result of the CLT to hold true,
 The sample must be sufficiently large (n > 30). Again,
there are two exceptions to this. If the population is
normal, then the result holds for samples of any size
(i.e., the sampling distribution of the sample means
will be approximately normal even for samples of size
less than 30). If the population is binomial, the
criterion is instead min(np, n(1-p)) > 5.

 Central Limit Theorem with a Normal Population:
 The figure below illustrates a normally distributed characteristic, X, in a
population in which the population mean is 75 with a standard deviation 8.

Central Limit Theorem with a Normal Population…
 If we take simple random samples (with replacement) of size n=10 from the
population and compute the mean for each of the samples, the distribution
of sample means should be approximately normal according to the CLT.
 Note that the sample size (n=10) is less than 30, but the source population is
normally distributed, so this is not a problem.
 The distribution of the sample means is illustrated below. Note that the
horizontal axis is different from the previous illustration, and that the range is
narrower.

Central Limit Theorem with a Normal Population…
 The mean of the sample means is 75 and the standard
deviation of the sample means is 2.5, with the standard
deviation of the sample means computed as follows:
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{8}{\sqrt{10}} \approx 2.5$
 If we were to take samples of n=5 instead of n=10, we would
get a similar distribution, but the variation among the sample
means would be larger. In fact, when we did this we got a
mean of the sample means = 75 and a standard deviation of the sample means = 3.6.

 Central Limit Theorem with a Dichotomous Outcome:
 Now suppose we measure a characteristic, X, in a population and
that this characteristic is dichotomous (e.g., success of a medical
procedure: yes or no) with 30% of the population classified as a
success (i.e., p=0.30) as shown below.

Central Limit Theorem with a Dichotomous Outcome…
 The CLT applies even to binomial populations like this provided that the minimum
of np and n(1-p) is at least 5, where "n" refers to the sample size, and "p" is the
probability of "success" on any given trial.
 In this case, we will take samples of n=20 with replacement, so min(np, n(1-
p)) = min(20(0.3), 20(0.7)) = min(6, 14) = 6. Therefore, the criterion is met.
 The population mean and standard deviation for a binomial distribution are given:
 Mean binomial probability:

 Standard deviation:

Central Limit Theorem with a Dichotomous Outcome…
 The distribution of sample means based on samples of size n=20 is shown below.

 The mean of the sample means is:

 And, the standard deviation of the sample means is:

Central Limit Theorem with a Dichotomous Outcome…
 Now, instead of taking samples of n=20, suppose we take simple random samples
(with replacement) of size n=10.
 Note that in this scenario we do not meet the sample size requirement for the
Central Limit Theorem (i.e., min(np, n(1-p)) = min(10(0.3), 10(0.7)) = min(3,
7) = 3).
 The distribution of sample means based on samples of size n=10 is shown
on the right, and you can see that it is not quite normally distributed.
 The sample size must be larger in order for the distribution to approach
normality.
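The sample size criterion for a dichotomous outcome can be checked for both scenarios with a few lines. The helper name `normal_approx_ok` is hypothetical, and the sketch assumes the "at least 5" form of the rule.

```python
def normal_approx_ok(n, p, threshold=5):
    """Return (min(np, n(1-p)), whether that value reaches the threshold)."""
    smallest = min(n * p, n * (1 - p))
    return smallest, smallest >= threshold

for n in (20, 10):
    smallest, ok = normal_approx_ok(n, p=0.30)
    print(n, round(smallest, 2), ok)
# 20 6.0 True
# 10 3.0 False
```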

 Application of the Central Limit Theorem:
Example-1:
Data from the Framingham Heart Study found that subjects over age 50 had a
mean HDL of 54 and a standard deviation of 17.
Suppose a physician has 40 patients over age 50 and wants to determine the
probability that the mean HDL cholesterol for this sample of 40 patients is 60 mg/dl or
more (i.e., low risk).
Probability questions about a sample mean can be addressed with the Central
Limit Theorem, as long as the sample size is sufficiently large. In this case n=40, so
the sample mean is likely to be approximately normally distributed,
so we can compute the probability of HDL>60 by using the standard normal
distribution table.

Application of the CLT Example-1…
 The population mean is 54, but the question is what is the probability that the
sample mean will be >60?
 In general, we have the standardization formula: $Z = \frac{x - \mu}{\sigma}$
 While, the standard deviation of the sample mean is: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
 Then, the formula to standardize a sample mean will be: $Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
And in this case: $Z = \frac{60 - 54}{17 / \sqrt{40}} = \frac{6}{2.69} \approx 2.23$
 P(Z > 2.23) can be looked up in the standard normal distribution table: the table gives
P(Z < 2.23) = 0.9871, so we compute it as P(Z > 2.23) = 1 - 0.9871 = 0.0129.
Therefore, the probability that the mean HDL in these 40 patients will exceed 60 is 1.29%.
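The table lookup can be cross-checked numerically. The sketch below uses the figures given in the example and Python's standard normal distribution in place of the printed table, so the result is unrounded and may differ slightly in the last digit from a table-based answer.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 54, 17, 40            # values given in the example
se = sigma / sqrt(n)                 # standard error of the mean
z = (60 - mu) / se                   # standardized value of a sample mean of 60
p_upper = 1 - NormalDist().cdf(z)    # P(sample mean > 60)

print(round(se, 2), round(z, 2), round(p_upper, 3))  # 2.69 2.23 0.013
```

So the probability is roughly 1.3%.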

Application of the Central Limit Theorem…
Example-2:
Suppose we want to estimate the mean LDL cholesterol in the population
of adults 65 years of age and older. We know from studies of adults under
age 65 that the standard deviation is 13, and we will assume that the
variability in LDL in adults 65 years of age and older is the same.
We will select a sample of n=100 participants > 65 years of age, and we
will use the mean of the sample as an estimate of the population mean.
We want our estimate to be precise, specifically we want it to be within 3
units of the true mean LDL value.

Application of the CLT Example-2…
What is the probability that our estimate (i.e., the sample mean) will be within 3 units of the
true mean? We think of this question as P(μ - 3 < sample mean < μ + 3). Because this is a
probability about a sample mean, we will use the Central Limit Theorem.
With a sample of size n=100 we clearly satisfy the sample size criterion so we can use the
Central Limit Theorem and the standard normal distribution table.
The previous questions focused on specific values of the sample mean (e.g., 50 or 60) and
we converted those to Z scores and used the standard normal distribution table to find the
probabilities.
Here the values of interest are μ - 3 and μ + 3. The solution can be set up as follows:
$P(\mu - 3 < \bar{x} < \mu + 3) = P\left(\frac{-3}{13/\sqrt{100}} < Z < \frac{3}{13/\sqrt{100}}\right) = P(-2.31 < Z < 2.31)$
 From the standard normal distribution table P(Z < 2.31) = 0.9896, and P(Z < -2.31) = 0.0104. The
range between these two = P(-2.31 < Z < 2.31) = 0.9896 - 0.0104 = 0.9792.
 Therefore, there is a 97.92% probability that the sample mean, based on a sample of size
n=100, will be within 3 units of the true population mean.
 This is a very powerful statement, because it means that for this question looking only at 100
individuals aged 65 or older gives us a very precise estimate of the population mean.
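The same calculation can be sketched in code, which also makes it easy to try other sample sizes or margins. The values are those given in the example; the variable names are illustrative, and the standard normal distribution from the standard library stands in for the printed table.

```python
from math import sqrt
from statistics import NormalDist

sigma, n, margin = 13, 100, 3        # values given in the example
se = sigma / sqrt(n)                 # 13 / 10 = 1.3
z = margin / se
std_normal = NormalDist()
p_within = std_normal.cdf(z) - std_normal.cdf(-z)   # P(-z < Z < z)

print(round(se, 2), round(z, 2), round(p_within, 3))  # 1.3 2.31 0.979
```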
Application of the Central Limit Theorem…
 Example 3
 Given: μ = 50, σ = 16, n = 64
Find: P($\bar{x}$ > 53)
Solution
1. Write the given information, μ=50, σ=16, n=64
2. Sketch a normal curve
3. Convert $\bar{x}$ to a z score: $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{53 - 50}{16 / \sqrt{64}} = \frac{3}{2} = 1.5$
4. Find the appropriate value(s) in the Table
The area of the SND above a value of z = 1.5 is
0.0668. The probability P(z > 1.5) = 0.0668
5. Complete the answer:
The probability that $\bar{x}$ is greater than 53 is 0.0668.
Application of the Central Limit Theorem…
 Example 4
 Suppose a population has mean μ = 8 and standard deviation σ = 3.
 Suppose a random sample of size n = 36 is selected.
 What is the probability that the sample mean is between 7.8 and 8.2?
Solution:
 Even if the population is not normally distributed, the central limit
theorem can be used (n > 30)
 … so the sampling distribution of $\bar{x}$ is approximately normal
 … with mean $\mu_{\bar{x}} = 8$
 … and standard deviation $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{36}} = 0.5$
Application of the Central Limit Theorem Example 4…
$P(7.8 < \bar{x} < 8.2) = P\left(\frac{7.8 - 8}{3/\sqrt{36}} < \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} < \frac{8.2 - 8}{3/\sqrt{36}}\right) = P(-0.4 < z < 0.4) = 0.3108$

[Figure: the population distribution (mean μ = 8), the sampling distribution of $\bar{x}$ (mean $\mu_{\bar{x}}$ = 8, with 7.8 and 8.2 marked), and the standard normal distribution (mean $\mu_z$ = 0, with z = -0.4 and 0.4 marked; area = .1554 + .1554).]
