Sampling Note
INDICATOR SURVEYS
SAMPLING METHODOLOGY
August 29th, 2009
available at: www.enterprisesurveys.org
07/21/09 2
INTRODUCTION
1. The World Bank's Enterprise Surveys (ES) collect data from key manufacturing and service
sectors in every region of the world. The Surveys use standardized survey instruments and a uniform
sampling methodology to minimize measurement error and to yield data that are comparable across
the world's economies. Most importantly, the Enterprise Surveys are designed to provide panel data
sets. Because panel data are one of the best ways to pinpoint which changes in the business
environment affect firm-level productivity over time, and how, the Enterprise Survey team has made
panel data a top priority.
2. The use of properly designed survey instruments and a uniform sampling methodology
provides a solid foundation for recommendations that stem from this analysis. The World Bank's
Enterprise Survey aims to achieve the following objectives:
- provide statistically significant investment climate indicators that are comparable across
  countries;
- assess the constraints to private sector growth and job creation;
- build a panel of establishment-level data that will make it possible to track changes in the
  business environment over time, thus allowing impact assessments of reforms; and
- stimulate dialogue on reform opportunities.
3. This note provides information on the sampling methodology for Enterprise Surveys and
Indicator Surveys, which are lighter versions of the former. Two complementary documents, the
Questionnaire Manual and the Technical Note on Weight Computation for Enterprise and Indicator
Surveys, complete the documentation. The Questionnaire Manual provides a detailed explanation of
the questions contained in the questionnaire and how the questionnaire should be implemented. The
weights note provides a technical explanation of how weights and adjustments are computed.
1. SAMPLING METHODOLOGY
4. The sampling methodology of the World Bank's Enterprise Survey generates sample sizes
appropriate to achieve two main objectives: first, to benchmark the investment climate of individual
economies across the world and, second, to conduct firm performance analyses focusing mainly on
how investment climate constraints affect productivity and job creation in selected sectors.
5. To achieve both objectives the sampling methodology:
- generates a sample representative of the whole non-agricultural private economy that
  substantiates assertions about this part of the economy, not only about the
  manufacturing sector. The overall sample should include, in addition to selected
  manufacturing industries, services industries and other relevant sectors of the economy;
  and
- generates large enough sample sizes for selected industries to conduct statistically robust
  analyses with a minimum precision of 7.5% for 90% confidence intervals[1] about:
  i. Estimates of population proportions (percentages), at the industry level; and
  ii. Estimates of the mean of log of sales at the industry level.
[1] A 7.5% precision of an estimate in a 90% confidence interval means that the population
parameter is within the 7.5% range of the observed sample estimate, except in 10% of the cases.
1.1. Stratification
6. The population of industries to be included in the Enterprise Surveys and Indicator Surveys,
the Universe of the study, includes the following list (according to ISIC, revision 3.1): all
manufacturing sectors (group D), construction (group F), services (groups G and H), transport,
storage, and communications (group I), and subsector 72 (from Group K). Also, to limit the surveys
to the formal economy, the sample frame for each country should include only establishments with
five (5) or more employees. Fully government-owned firms are excluded, as the Universe is defined
as the non-agricultural private sector.
7. Enterprise Surveys and Indicator Surveys are stratified following 3 criteria: sector of activity,
firm size, and geographical location. Stratification by firm size divides the population of firms into 3
strata: small firms (5-19 employees), medium firms (20-99 employees), and large firms (100 or more
employees). Geographical distribution is defined to reflect the distribution of the non-agricultural
economic activity of the country; for most countries this implies including the main urban centers or
regions of the country. Around the world, most of the non-agricultural activity is clustered around
the main centers of population.
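The size stratification in paragraph 7 can be sketched as a small lookup. This is a minimal illustration, not part of the survey tooling: the function name `size_stratum` and the choice to return `None` for establishments below the five-employee threshold are assumptions made here.

```python
from typing import Optional


def size_stratum(employees: int) -> Optional[str]:
    """Return the firm-size stratum for an establishment.

    Thresholds follow the note: small (5-19), medium (20-99), large (100+).
    Returning None for fewer than 5 employees is an illustrative choice for
    establishments that fall outside the sample frame.
    """
    if employees < 5:
        return None  # below the formal-economy cutoff, not in the frame
    if employees <= 19:
        return "small"
    if employees <= 99:
        return "medium"
    return "large"


if __name__ == "__main__":
    for count in (3, 5, 19, 20, 99, 100, 2500):
        print(count, size_stratum(count))
```

The same pattern would extend to the sector and location criteria, which depend on country-specific frame data and are therefore not sketched here.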
8. The stratification by sector of activity depends on the size of the economy as measured by
Gross National Income (GNI). As described in Table 1, very small economies (GNI below $15 billion
as of 2008) are surveyed using the Indicator Survey and are stratified into 2 groups:
manufacturing and the rest of the non-agricultural economy, with 75 interviews allocated to each group.
For small economies (GNI between $15 billion and $100 billion), the universe of industries is
stratified into manufacturing, retail, and the rest of the non-agricultural economy. Medium-size
economies single out the 4 most important manufacturing industries of the country; the remaining
manufacturing industries are grouped together into an additional stratum, while retail and the rest of
the non-agricultural economy compose the final 2 strata. Finally, for large
economies, 6 manufacturing sectors are selected as strata, while the remaining manufacturing industries
are grouped into a residual sector; retail and the rest of the non-agricultural economy complete the
overall sample.
Table 1 Stratification by Sector for Different Economy Sizes
Size        GNI as of 2008     # of manuf.   # of services   Rest of the          Total sample
                               industries    industries      non-agric. economy   size
Very small  <$15 billion       1             ---------- together ----------      150
Small       $15-100 billion    1             1               1                    360
Medium      $100-500 billion   5             1               1                    1000
Large       >$500 billion      7             1               1                    1320
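Table 1 can be read as a lookup from GNI to a sector design. The sketch below encodes only the figures in the table; the names `SectorDesign` and `sector_design` are hypothetical, and the treatment of economies sitting exactly on a boundary ($15, $100, or $500 billion) is an assumption, since the table does not say which side they fall on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SectorDesign:
    size_class: str            # economy-size label used in Table 1
    manufacturing_strata: int  # number of manufacturing strata
    total_sample: int          # total sample size from Table 1


def sector_design(gni_billion_usd: float) -> SectorDesign:
    """Map 2008 GNI (in billions of US dollars) to the Table 1 design.

    Assumption: values exactly at 100 or 500 are assigned to the smaller
    class, and exactly 15 to the 'small' class.
    """
    if gni_billion_usd < 15:
        return SectorDesign("very small", 1, 150)
    if gni_billion_usd <= 100:
        return SectorDesign("small", 1, 360)
    if gni_billion_usd <= 500:
        return SectorDesign("medium", 5, 1000)
    return SectorDesign("large", 7, 1320)


if __name__ == "__main__":
    for gni in (10, 50, 300, 800):
        print(gni, sector_design(gni))
```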
9. The sampling plan as described above is designed to have a representative services industry
for all countries: retail trade (ISIC 52). For all economy-sizes inferences can be made for
manufacturing and the rest of the non-agricultural economy. For small economies inferences can be
made for manufacturing, retail and the rest of the non-agricultural economy. For medium and large
size economies, in addition, inferences can be made for selected manufacturing industries.
10. To keep comparability with previous surveys and across countries, two (2) manufacturing
industries will be selected in all medium and large countries: manufacture of food products and
beverages (ISIC 15), and manufacture of wearing apparel and fur (ISIC 18). Additional industries are
chosen at the two-digit ISIC level depending on the characteristics of the economy as summarized in
three variables: contribution to value added, employment, and number of firms. The final choice
of industries will be made on a country-by-country basis, trying to keep similar industries across
countries to facilitate cross-country comparability.
2. SAMPLE SIZE
11. Overall sample sizes for both Enterprise Surveys and Indicator Surveys are determined by
the degree of stratification of the sample. The overall sample size depends on the decision of the
sample size for each level of stratification. In all ES and IS the objectives of stratification are to
allow an acceptable level of precision for estimates: first, within the different size levels (small,
medium, and large); second, at the different levels of regional stratification; and third, for the
different sectors of stratification (which, as explained before, are chosen depending on the size of
the economy).
12. Given that both the Enterprise Survey and the Indicator Survey include more than 100
indicators, the computation of the minimum sample size required is complicated, since it depends on
the variance of each indicator. However, many of the indicators computed from the survey are
proportions, such as the percentage of firms that engage in X activity or choose Y action. In this case the
computation of the sample size is simplified by the fact that the variance of a proportion is bounded.
Assuming the maximum variance (attained at a proportion of 0.5), the minimum level of precision is guaranteed.
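The bound on the variance of a proportion follows from a one-line identity. The step below is standard algebra added for illustration, not taken from the note; P denotes the population proportion and Q = 1 - P.

```latex
\operatorname{Var}(\hat{p}) \;\propto\; PQ \;=\; P(1-P)
\;=\; \tfrac{1}{4} - \left(P - \tfrac{1}{2}\right)^{2} \;\le\; \tfrac{1}{4},
\qquad \text{with equality at } P = \tfrac{1}{2}.
```

A sample sized for P = 0.5 is therefore at least as large as the sample needed for any other value of P at the same precision.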
13. Table 2 exhibits minimum sample sizes for different population sizes for estimates of
proportions with 5% and 7.5% precision in 90% confidence intervals, assuming maximum variance.[2]
With 5% precision, the minimum sample size tends to 270 as population size
increases; with 7.5% precision, it tends to 120. Note that if the population size of an
industry falls below 1,500, the required sample size for proportions may be reduced considerably
(see Figures 1 and 2). Although a 5% precision would be most desirable, a precision of 7.5% is in line
with the budget constraints for the Enterprise Survey work around the world. Consequently, an
operational sample size of 120 per stratum was selected.
[2] $$n = \left[ \frac{1}{N} + \frac{N-1}{N}\,\frac{1}{PQ}\left(\frac{k}{z_{1-\alpha/2}}\right)^{2} \right]^{-1}$$
where $N$ = population size, $P$ = population proportion, $Q = 1-P$,
$k$ = desired level of precision, and
$z_{1-\alpha/2}$ is the value of the standard normal coordinate for a desired level of confidence, $1-\alpha$.
Table 2 - Sample Sizes Required with 5% and 7.5% Precision and 90% Confidence
Population size   Sample size (5%)   Sample size (7.5%)
50 42 36
100 73 55
200 115 75
300 143 86
400 162 93
500 176 97
600 187 100
700 195 103
800 202 105
900 208 106
1000 213 107
1250 223 110
1500 229 111
1750 234 113
2000 238 113
2500 244 115
3000 248 116
5000 257 117
10000 263 119
50000 269 120
100000 270 120
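The entries of Table 2 can be reproduced with a few lines of code. The sketch below assumes the finite-population formula n = [1/N + ((N-1)/N)(1/PQ)(k/z)^2]^(-1) with P = 0.5 and a two-sided 90% confidence level; rounding to the nearest integer is an assumption that matches the tabulated values checked here. The function name `sample_size_proportion` is illustrative.

```python
from statistics import NormalDist


def sample_size_proportion(N: int, k: float,
                           confidence: float = 0.90,
                           P: float = 0.5) -> int:
    """Minimum sample size for estimating a proportion in a finite population.

    N: population size; k: desired precision (e.g. 0.075);
    P: assumed population proportion (0.5 gives the maximum variance).
    """
    # Two-sided critical value: z_{1 - alpha/2}, about 1.645 for 90% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    Q = 1 - P
    n = 1 / (1 / N + (N - 1) / N * (1 / (P * Q)) * (k / z) ** 2)
    return round(n)  # assumption: nearest-integer rounding


if __name__ == "__main__":
    for N in (50, 100, 1000, 100000):
        print(N, sample_size_proportion(N, 0.05), sample_size_proportion(N, 0.075))
```

As N grows, the 1/N term vanishes and the result approaches the limits quoted in the text: about 270 at 5% precision and about 120 at 7.5%.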
Figure 1: Optimal sample size, 5% precision, 90% confidence interval
[Figure: sample size (vertical axis, 0 to 300) plotted against population size (horizontal axis, 100 to 100,000)]
Figure 2: Optimal sample size, 7.5% precision, 90% confidence interval
[Figure: sample size (vertical axis, 0 to 140) plotted against population size (horizontal axis, 0 to 6,000)]
14. The survey also includes several quantitative variables which are unbounded such as time
to obtain a permit, number of employees, or sales. For practical purposes, the most important
quantitative variable in the survey was chosen: total sales. The minimum sample size for this variable
was determined on the basis of realized information from previous surveys. Because sales have a
largely skewed distribution, the required sample size for inferences about its mean is typically too
large.[3] However, it is standard practice to work with sales in log form, which greatly reduces its
variability.
The minimum sample size required for a 7.5% precision on estimates of log of sales was computed
for each stratum. This was compared to the minimum sample size for proportions under highest
variance. For most strata, the minimum sample size for proportions was larger than the one required
for the log of sales. Table 3 illustrates the sample sizes required for different industries in a
medium-size economy, Ukraine, using actual universe numbers of firms in manufacturing, services, and the
rest of the economy.
Table 3 Example of Sample Sizes Required by Sector
Sector            N        Min. sample size     Min. sample size      Coef. of
                           for prop. (7.5%)     for log sales (5%)    variation
Food manuf.       4,184    117                  22                    0.143461
Garment manuf.    1,389    111                  20                    0.13629
Machinery & eq.   2,257    114                  18                    0.129497
Other manufac.    15,574   119                  21                    0.139216
Retail            10,297   119                  24                    0.149958
Residual          50,971   120                  20                    0.135673
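The log-sales column of Table 3 can be checked numerically. The sketch below assumes the formula n = [1/N + (k / (CV z))^2]^(-1), which is a reconstruction from the note's footnote on the mean of log sales and not a confirmed specification; with 5% precision and 90% confidence it reproduces the six tabulated values. The name `sample_size_mean` and the nearest-integer rounding are illustrative choices.

```python
from statistics import NormalDist


def sample_size_mean(N: int, cv: float, k: float = 0.05,
                     confidence: float = 0.90) -> int:
    """Minimum sample size for the mean of a variable with coefficient of
    variation cv, at relative precision k, in a population of size N.

    Assumed formula (reconstructed): n = [1/N + (k / (cv * z))^2]^(-1)
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = 1 / (1 / N + (k / (cv * z)) ** 2)
    return round(n)  # assumption: nearest-integer rounding


if __name__ == "__main__":
    # (sector, N, coefficient of variation of log sales) taken from Table 3
    rows = [
        ("Food manuf.", 4184, 0.143461),
        ("Garment manuf.", 1389, 0.13629),
        ("Machinery & eq.", 2257, 0.129497),
        ("Other manufac.", 15574, 0.139216),
        ("Retail", 10297, 0.149958),
        ("Residual", 50971, 0.135673),
    ]
    for sector, N, cv in rows:
        print(sector, sample_size_mean(N, cv))
```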
15. As Table 3 shows, the minimum sample size required for proportions to guarantee a
7.5% precision in most cases also covers the minimum sample size required for inferences about the
mean of log sales at the more demanding precision level of 5%. In general, this result holds for
large universe numbers as long as the coefficient of variation of the quantitative variables is less than
0.5. Checking on the existing survey information for around 50 countries, the coefficient of variation
of log of sales for all industries is typically below 0.5 in all countries.
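The 0.5 threshold can be motivated with the large-population limits of the two sample-size formulas. The comparison below is a sketch added for illustration; it assumes both estimates are held to the same precision k and that N is large enough for the finite-population terms to be negligible.

```latex
n_{\text{prop}} \approx \frac{z_{1-\alpha/2}^{2}\,PQ}{k^{2}} \le \frac{z_{1-\alpha/2}^{2}}{4k^{2}},
\qquad
n_{\text{mean}} \approx \frac{z_{1-\alpha/2}^{2}\,CV^{2}}{k^{2}}
\quad\Longrightarrow\quad
n_{\text{mean}} \le \max_{P}\, n_{\text{prop}} \iff CV \le \tfrac{1}{2}.
```

If the two estimates use different precisions, k_p for proportions and k_m for the mean, the same algebra gives the condition CV <= k_m / (2 k_p), which is 1/3 for k_p = 7.5% and k_m = 5%. The coefficients of variation in Table 3 (roughly 0.13 to 0.15) satisfy both conditions.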
16. Note that by setting minimum sample sizes of 120 firms, the overall level of precision
for the universe is likely much better than 7.5%. For example, with overall sample sizes of 837, 593, and
438 for the countries included in the table, the overall precision levels on inferences about
proportions for the whole economy, provided that observations are properly weighted, are 2.84%,
3.37%, and 3.71%, respectively.
2.1. Additional Levels of Stratification and Sample Selection
[3] $$n = \left[ \frac{1}{N} + \left(\frac{k}{CV_{y}\, z_{1-\alpha/2}}\right)^{2} \right]^{-1}$$