Sampling and Sample Size: Dr. Keerti Jain, NIIT University, Neemrana

The document discusses sampling and sample size. It defines population as the entire set of items or people of interest, while a sample is a subset of the population. The key points are: - Samples are taken due to limited time/money to study entire populations - Probability sampling allows statistical inference about populations from representative samples - Sample size depends on desired confidence level, margin of error, and population variability - Larger samples are needed when variability is unknown or to compare two groups on an outcome

Uploaded by

Tushar Goel

SAMPLING AND SAMPLE SIZE

Dr. Keerti Jain,
NIIT University, Neemrana
Dr. Keerti Jain, NIIT University Neemrana 07/22/20

POPULATION AND SAMPLE

Population:
A set that includes all measurements of interest to the researcher (the collection of all responses, measurements, or counts that are of interest).

Sample:
A subset of the population.

POPULATION DEFINITION

• A population can be defined as including all people or items with the characteristic one wishes to understand.
• Because there is very rarely enough time or money to gather information from everyone or everything in a population, the goal becomes finding a representative sample (or subset) of that population.
• The population from which the sample is drawn may not be the same as the population about which we actually want information. Often there is large but not complete overlap between these two groups, for example because of sampling-frame issues.

EXAMPLE

• We might study rats in order to get a better understanding of human health, or we might study records from people born in 2008 in order to make predictions about people born in 2009.

SAMPLING

A sample is “a smaller (but hopefully representative) collection of units from a population used to determine truths about that population” (Field, 2005).

WHY SAMPLING?

• What is your population of interest?
• To whom do you want to generalize your results?
    • All doctors
    • School children
    • Indians
    • Women aged 15-45 years
    • Other
• Can you sample the entire population?

WHY SAMPLING?

• Lower cost
• Less field time
• But less accuracy
• When it’s impossible to study the whole population

WHEN MIGHT YOU SAMPLE THE ENTIRE POPULATION?

• When your population is very small
• When you have extensive resources
• When you don’t expect a very high response

TERMINOLOGY

Target Population:
The population to be studied, to which the investigator wants to generalize the results
Sampling Unit:
The smallest unit from which a sample can be selected
Study Population:
The part of the target population from which the investigator collects the sample
Sampling frame:
List of all the sampling units from which the sample is drawn
Sampling scheme:
Method of selecting sampling units from the sampling frame

SAMPLING

[Diagram relating the target population, study population, sample frame, and sample]

SAMPLING BREAKDOWN

EXAMPLE OF SAMPLING FRAME

The sampling frame is the list from which the potential respondents are drawn
• Registrar’s office
• Class rosters

IMPORTANCE OF SAMPLING FRAME

• In the most straightforward case, such as the sentencing of a batch of material from production (acceptance sampling by lots), it is possible to identify and measure every single item in the population and to include any one of them in our sample. However, in the more general case this is not possible.
• There is no way to identify all rats in the set of all rats. Where voting is not compulsory, there is no way to identify, in advance of a forthcoming election, which people will actually vote.
• As a remedy, we seek a sampling frame which has the property that we can identify every single element and include any of them in our sample.
• The sampling frame must be representative of the population.

FACTORS INFLUENCING SAMPLE REPRESENTATIVENESS

• Sampling procedure
• Sample size
• Participation (response)

SAMPLING PROCESS

The sampling process comprises several stages:

• Defining the population of concern
• Specifying a sampling frame, a set of items or events possible to measure
• Specifying a sampling method for selecting items or events from the frame
• Determining the sample size
• Implementing the sampling plan
• Sampling and data collecting
• Reviewing the sampling process

TYPES OF SAMPLING TECHNIQUES

• Non-Probability Sampling
• Probability Sampling

NON-PROBABILITY SAMPLING

• Probability of being chosen is unknown
• Cheaper, but results cannot be generalised
• Potential for bias

PROBABILITY SAMPLING

• Random sampling
• Each subject has a known probability of being selected
• Allows application of statistical sampling theory to results to:
    • Generalise
    • Test hypotheses

TYPES OF NON-PROBABILITY SAMPLE

• Convenience sample
• Purposive sample
• Judgmental sampling
• Quota sampling
• Snowball sampling
• Panel sampling

TYPES OF PROBABILITY SAMPLING

• Simple random sample
• Systematic random sample
• Stratified random sample
• Multistage sample
• Multiphase sample
• Cluster sample
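
The first three probability schemes listed above can be illustrated with a minimal Python sketch. The function names, the 100-student roster, and the odd/even strata are all hypothetical choices made for illustration; the slides do not prescribe any implementation.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; every unit is equally likely."""
    rng = random.Random(seed)
    return rng.sample(frame, n)


def systematic_sample(frame, n, seed=None):
    """Pick a random start, then take every k-th unit (k = len(frame) // n)."""
    rng = random.Random(seed)
    k = len(frame) // n  # sampling interval; assumes n <= len(frame)
    start = rng.randrange(k)
    return [frame[start + i * k] for i in range(n)]


def stratified_sample(frame, stratum_of, fraction, seed=None):
    """Group units by stratum, then draw the same fraction from each group."""
    rng = random.Random(seed)
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for units in strata.values():
        size = max(1, round(len(units) * fraction))
        sample.extend(rng.sample(units, size))
    return sample


# Hypothetical sampling frame: a class roster of 100 student IDs.
roster = list(range(1, 101))
print(simple_random_sample(roster, 10, seed=1))
print(systematic_sample(roster, 10, seed=1))
print(stratified_sample(roster, lambda sid: "odd" if sid % 2 else "even", 0.1, seed=1))
```

In each case the selection probability of every unit is known in advance, which is what separates these schemes from the non-probability samples on the previous slide.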

Errors in Sample

Systematic error (or bias)
• Inaccurate response (information bias)
• Selection bias

Sampling error (random error)

TYPE 1 ERROR

• The probability of finding a difference between our sample and the population when there really isn’t one
• This probability is known as α (the “type 1 error” rate)
• Usually set at 5% (or 0.05)

TYPE 2 ERROR

• The probability of not finding a difference that actually exists between our sample and the population
• This probability is known as β (the “type 2 error” rate)
• Power is (1 - β) and is usually set at 80%
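
The conventional values of α = 0.05 and power = 80% correspond to standard normal quantiles. The sketch below uses Python's standard `statistics.NormalDist` to recover them; the helper names are hypothetical. The power quantile is typically used in hypothesis-test-based sample size formulas, which do not appear in the text of these slides, so it is shown only for orientation.

```python
from statistics import NormalDist


def z_for_confidence(confidence):
    """Two-sided critical value of the standard normal distribution."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def z_for_power(power):
    """One-sided standard normal quantile for power = 1 - beta."""
    return NormalDist().inv_cdf(power)


print(round(z_for_confidence(0.95), 2))  # 1.96
print(round(z_for_power(0.80), 2))       # 0.84
```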

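SAMPLE SIZE FOR ESTIMATING POPULATION MEAN

$$ n = \left(\frac{Z\sigma}{E}\right)^{2} $$

where Z is the standard normal value for the chosen confidence level (1.96 for 95%)
σ is the standard deviation of the outcome
E is the desired margin of error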

EXAMPLE 1

• An investigator wants to estimate the mean systolic blood pressure in children with congenital heart disease who are between the ages of 3 and 5. How many children should be enrolled in the study? The investigator plans on using a 95% confidence interval (so Z=1.96) and wants a margin of error of 5 units. The standard deviation of systolic blood pressure is unknown, but the investigator conducts a literature search and finds that the standard deviation of systolic blood pressures in children with other cardiac defects is between 15 and 20. To estimate the sample size, we consider the larger standard deviation in order to obtain the most conservative (largest) sample size.

SOLUTION

In order to ensure that the 95% confidence interval estimate of the mean systolic blood pressure in children between the ages of 3 and 5 with congenital heart disease is within 5 units of the true mean, a sample of size 62 is needed.
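
The figure of 62 can be reproduced with a short sketch, assuming the standard formula n = (Zσ/E)² for estimating a mean and rounding up to a whole child. The function name is hypothetical; the inputs Z = 1.96, σ = 20, and E = 5 come from Example 1.

```python
import math


def sample_size_for_mean(sigma, margin, z=1.96):
    """n = (z * sigma / margin)^2, rounded up to a whole subject."""
    return math.ceil((z * sigma / margin) ** 2)


# Conservative choice from Example 1: the larger standard deviation, 20.
print(sample_size_for_mean(sigma=20, margin=5))  # 62

# The smaller standard deviation from the literature search, for comparison.
print(sample_size_for_mean(sigma=15, margin=5))  # 35
```

The comparison shows why the larger standard deviation is the conservative choice: the required sample size grows with the square of σ.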

EXAMPLE 2

SAMPLE SIZES FOR TWO INDEPENDENT SAMPLES

EXAMPLE 3

EXAMPLE 4

An investigator wants to compare two diet programs in children who are obese. One diet is a low fat diet, and the other is a low carbohydrate diet. The plan is to enroll children and weigh them at the start of the study. Each child will then be randomly assigned to either the low fat or the low carbohydrate diet. Each child will follow the assigned diet for 8 weeks, at which time they will again be weighed. The number of pounds lost will be computed for each child. Based on data reported from diet trials in adults, the investigator expects that 20% of all children will not complete the study. A 95% confidence interval will be estimated to quantify the difference in weight lost between the two diets and the investigator would like the margin of error to be no more than 3 pounds. How many children should be recruited into the study?

SOLUTION

Samples of size n1=56 and n2=56 will ensure that the 95% confidence interval for the difference in weight lost between diets will have a margin of error of no more than 3 pounds. Again, these sample sizes refer to the numbers of children with complete data. Because 20% of children are expected to drop out, about 112 / 0.80 = 140 children (70 per diet) should be recruited to end up with 56 per group.
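
A hedged sketch of the two steps behind this example follows. It assumes the usual formula for two independent samples with a common standard deviation, n_i = 2(Zσ/E)², and then inflates the complete-data total for attrition. The text of the slides does not state the standard deviation used, so the σ = 10 below is purely hypothetical; the attrition step uses the 56 + 56 and 20% given in the example. Function names are also hypothetical.

```python
import math


def per_group_size_for_mean_difference(sigma, margin, z=1.96):
    """n_i = 2 * (z * sigma / margin)^2 per group, rounded up."""
    return math.ceil(2 * (z * sigma / margin) ** 2)


def recruits_needed(n_complete, dropout_rate):
    """Inflate a complete-data sample size for expected attrition."""
    # The small tolerance guards against floating-point noise before rounding up.
    return math.ceil(n_complete / (1.0 - dropout_rate) - 1e-9)


# Hypothetical standard deviation of 10 pounds (not taken from the slides).
print(per_group_size_for_mean_difference(sigma=10, margin=3))  # 86

# Attrition adjustment with the numbers given in Example 4.
print(recruits_needed(56 + 56, 0.20))  # 140
```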

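SAMPLE SIZE FOR ONE SAMPLE, DICHOTOMOUS OUTCOME (PROPORTION)

$$ n = p(1-p)\left(\frac{Z}{E}\right)^{2} $$

where p is the anticipated proportion
Z is the standard normal value for the chosen confidence level (1.96 for 95%)
E is the sampling error or tolerable margin of error, that is, the largest acceptable difference between the population proportion and the sample proportion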

EXAMPLE 5

It was desired to estimate the proportion of anemic children in a certain preparatory school. In a similar study at another school, a proportion of 30% was detected. Compute the minimal sample size required at a confidence level of 95%, accepting a difference of up to 4% from the true population proportion.

SOLUTION
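
The computation for Example 5 can be sketched as follows, assuming the standard one-sample formula n = p(1 - p)(Z/E)² with p = 0.30, E = 0.04, and Z = 1.96, and rounding up to a whole child. The function name is hypothetical, and some texts round 504.21 to the nearest integer instead of rounding up.

```python
import math


def sample_size_for_proportion(p, margin, z=1.96):
    """n = p * (1 - p) * (z / margin)^2, rounded up to a whole subject."""
    return math.ceil(p * (1 - p) * (z / margin) ** 2)


# 0.30 * 0.70 * (1.96 / 0.04)^2 = 0.21 * 2401 = 504.21, rounded up.
print(sample_size_for_proportion(p=0.30, margin=0.04))  # 505
```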

EXAMPLE 6

An investigator wants to estimate the proportion of freshmen at his university who currently smoke cigarettes (i.e., the prevalence of smoking). How many freshmen should be involved in the study to ensure that a 95% confidence interval estimate of the proportion of freshmen who smoke is within 5% of the true proportion?

SOLUTION

In order to ensure that the 95% confidence interval estimate of the proportion of freshmen who smoke is within 5% of the true proportion, a sample of size 385 is needed.
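
Example 6 gives no prior estimate of the smoking prevalence. The figure of 385 is consistent with assuming p = 0.5, which maximises p(1 - p) and therefore gives the most conservative sample size. The sketch below, with a hypothetical function name and assuming the standard formula n = p(1 - p)(Z/E)², tabulates the required size for several values of p at E = 0.05.

```python
import math


def sample_size_for_proportion(p, margin, z=1.96):
    """n = p * (1 - p) * (z / margin)^2, rounded up to a whole subject."""
    return math.ceil(p * (1 - p) * (z / margin) ** 2)


# Required sample size for a 5% margin of error at several assumed proportions.
sizes = {p: sample_size_for_proportion(p, margin=0.05) for p in (0.1, 0.3, 0.5, 0.7, 0.9)}
for p, n in sizes.items():
    print(p, n)

# The largest requirement occurs at p = 0.5.
print(max(sizes, key=sizes.get), max(sizes.values()))  # 0.5 385
```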

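SAMPLE SIZES FOR TWO SAMPLES, DICHOTOMOUS OUTCOME (PROPORTIONS)

$$ n_i = \left[p_1(1-p_1) + p_2(1-p_2)\right]\left(\frac{Z}{E}\right)^{2} $$

where n_i is the sample size required in each group
p_1 and p_2 are the anticipated proportions in the two groups
Z is the standard normal value for the chosen confidence level (1.96 for 95%)
E is the sampling error or tolerable margin of error for the difference between the sample proportions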
EXAMPLE 7

EXAMPLE 8

• An investigator wants to estimate the impact of smoking during pregnancy on premature delivery. Normal pregnancies last approximately 40 weeks and premature deliveries are those that occur before 37 weeks. The 2005 National Vital Statistics report indicates that approximately 12% of infants are born prematurely in the United States. The investigator plans to collect data through medical record review and to generate a 95% confidence interval for the difference in proportions of infants born prematurely to women who smoked during pregnancy as compared to those who did not. How many women should be enrolled in the study to ensure that the 95% confidence interval for the difference in proportions has a margin of error of no more than 4%?

SOLUTION

The sample sizes (i.e., numbers of women who smoked and did not smoke during pregnancy) can be computed using the two-sample formula for proportions. National data suggest that 12% of infants are born prematurely. We will use that estimate for both groups in the sample size computation.

Samples of size n1=508 women who smoked during pregnancy and n2=508 women who did not smoke during pregnancy will ensure that the 95% confidence interval for the difference in proportions who deliver prematurely will have a margin of error of no more than 4%.
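
The figure of 508 per group can be reproduced with a short sketch, assuming the standard formula n_i = [p1(1 - p1) + p2(1 - p2)](Z/E)² and rounding up. The function name is hypothetical; the inputs p1 = p2 = 0.12, E = 0.04, and Z = 1.96 come from Example 8.

```python
import math


def per_group_size_for_proportion_difference(p1, p2, margin, z=1.96):
    """n_i = [p1(1 - p1) + p2(1 - p2)] * (z / margin)^2, rounded up."""
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil(variance_sum * (z / margin) ** 2)


# 2 * 0.12 * 0.88 * (1.96 / 0.04)^2 = 0.2112 * 2401 = 507.09, rounded up.
n_per_group = per_group_size_for_proportion_difference(0.12, 0.12, margin=0.04)
print(n_per_group)      # 508
print(2 * n_per_group)  # 1016 women in total
```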
