
2 Complex Sampling Concepts

2.1 Introduction
A common theme in alcohol abuse research is that data are usually obtained from a multi-stage or
complex sample design. An example of a typical complex sampling scheme is:

o Stratify the geographical area under study according to census geography and census socio-
economic variables.
o Form meaningful clusters of population elements, called primary sampling units (PSUs), for
example schools, in each stratum.
o Draw a predetermined number of PSUs from each stratum, using sampling with probability
proportional to size (PPS).
o Do one or more stages of subsampling within each PSU.
o Draw a simple random sample of ultimate sampling units (USUs) at the last stage.

The main advantages of a complex sample (CS) in comparison with a simple random sample (SRS)
are:
o CS does not require a complete sampling frame of the population elements.
o CS is more economical and practical.
o CS guarantees a representative sample of the population.
o CS makes a step-by-step design of the sample possible.

The main disadvantage of CS is that it is generally less efficient than SRS, i.e., it yields estimates of
lower precision for a fixed sample size.

In the application of CS, the design effect (deff) and sampling weights play an important role. The
design effect is defined as

deff = (Variance of an estimate under complex sampling) / (Variance of an estimate under SRS).

The design effect (deff) provides a rough and ready method of estimating the variance of survey
statistics and of adjusting the output of standard statistical software packages for the complex
sampling design. This aspect of deff derives from its assumed portability. See Kish (1965) for a
discussion of design effects.

The deff is used not only to produce estimates of variance, but also to adjust the output of standard
analyses. For example, the practitioner may utilize standard statistical software packages to
conduct a regression analysis of a hypothesized linear relationship between survey variables, or to
formulate a multi-way table and conduct a χ² test of independence between survey variables. The
output of standard statistical software packages gives wrong answers for such problems (because

the underlying assumptions of the methods are not satisfied for complex survey designs). A first-
order correction may be obtained by dividing the corresponding test statistic by the estimated deff.
See Rao & Scott (1981) and Skinner, Holt, & Smith (1989).
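
As a brief illustration of this first-order correction (our own sketch, not part of the original
methodology), the following Python fragment divides a standard Pearson chi-square statistic by an
assumed design effect before referring it to the chi-square distribution. The table entries and the
value deff = 1.8 are hypothetical.

import numpy as np
from scipy.stats import chi2, chi2_contingency

# Hypothetical 2 x 3 table of survey counts (illustrative numbers only).
table = np.array([[120,  80,  40],
                  [ 90, 110,  60]])

X2, _, df, _ = chi2_contingency(table, correction=False)  # naive SRS-based statistic

deff = 1.8            # assumed average design effect for these variables
X2_adj = X2 / deff    # first-order correction
p_adj = chi2.sf(X2_adj, df)

print(f"naive X2 = {X2:.2f}, adjusted X2 = {X2_adj:.2f}, df = {df}, p = {p_adj:.4f}")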

In this chapter we start with some known results for the sum of random variables and for multiple
linear regression to illustrate adjustments that must be made to accommodate complex sampling
properly. In Sections 2.2 and 2.3 we provide a brief summary of important concepts in complex
sampling and in Sections 2.4 to 2.7 discuss how these concepts are currently applied to fit
regression models to survey data. We will show that standard software packages for regression
analysis allow for a weight variable, but do not yield the correct standard error estimates and
measures of fit.

2.2 Indicator variables and t-estimators

Consider a finite population of identifiable units U = {u1 , u2 ,..., u N } where the size N of the
population is assumed known. The inclusion of a given element ui in a sample s is a random event
indicated by the random variable I i (sample membership indicator of element i), i = 1, 2,..., N
defined as

Ii = 1 if ui ∈ s, and Ii = 0 otherwise.

The probability that ui will be included in the sample, denoted by π i , is

π i = P(ui ∈ s) = P( I i = 1).

The probability that both ui and u j will be included in the sample, denoted by π ij , is

πij = P(ui ∈ s and uj ∈ s) = P(Ii ⋅ Ij = 1),

by definition (see e.g. Traat, Meister, & Söstra, 2001). Therefore, E(Ii) = πi, Var(Ii) = πi(1 − πi),
and Cov(Ii, Ij) = πij − πi πj.
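
The following small simulation (our own illustration) checks these moments empirically for simple
random sampling without replacement, where πi = n/N for every unit.

import numpy as np

rng = np.random.default_rng(1)
N, n, reps = 20, 5, 20_000
pi = n / N                              # inclusion probability under SRS

I = np.zeros((reps, N))
for r in range(reps):
    s = rng.choice(N, size=n, replace=False)
    I[r, s] = 1                         # I_i = 1 if unit i is in the sample

print("mean of I_1:", I[:, 0].mean(), "vs pi =", pi)
print("var  of I_1:", I[:, 0].var(),  "vs pi(1 - pi) =", pi * (1 - pi))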

The selected sample s = {u(1), u(2), ..., u(n)} is an unordered set of population units, where n denotes
the sample size. A sampling weight wi for the i-th USU is usually calculated as 1/πi, where πi
denotes the inclusion probability. If π1 = π2 = ... = πN, the sample is called a self-weighting sample.
Sometimes wi is called the base weight.

Let y N and z N be N ×1 vectors of finite population values with typical elements y j and z j , j =
1, 2, …, N respectively. Denote the values drawn from a multi-stage sample of size n by y s and
z s , where

y s = ( y(1) , y(2) ,..., y( j ) ,..., y( n ) )' ,


z s = ( z(1) , z(2) ,..., z( j ) ,..., z( n ) )' .

Here z( j ) denotes the j-th sample element, z( j ) ∈ z s .

Consider the population totals t1 = ∑_{i=1}^{N} yi, t2 = ∑_{i=1}^{N} yi², t3 = ∑_{i=1}^{N} zi,
t4 = ∑_{i=1}^{N} zi², and t5 = ∑_{i=1}^{N} zi yi. Each population total tj can be estimated by the
corresponding π-estimator t̂jπ (Horvitz & Thompson, 1952). For example,

t̂1π = ∑_{i=1}^{n} y(i)/πi ,   t̂5π = ∑_{i=1}^{n} z(i) y(i)/πi ,

with both sums taken over the sample.

Each estimated total can be written as a linear function of the sample membership indicators
I i , i = 1, 2,..., N . For example,

t̂1π = ∑_{i=1}^{N} Ii yi/πi .

This estimator (Horvitz-Thompson) is an unbiased estimator of t1 since

E(t̂1π) = ∑_{i=1}^{N} E(Ii) yi/πi = ∑_{i=1}^{N} yi = t1 .



Use of Cov( I i , I j ) = π ij − π iπ j gives

Var(t̂1π) = ∑_{i=1}^{N} ∑_{j=1}^{N} ( πij/(πi πj) − 1 ) yi yj .

Hence,

V̂ar(t̂1π) = ∑_{i=1}^{n} ∑_{j=1}^{n} (1/πij)( πij/(πi πj) − 1 ) y(i) y(j) .

This simple example shows that the expected value and variance of the sum are rather different
under complex sampling than for the corresponding case of simple random sampling.
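
To make the contrast concrete, the sketch below (our own example, assuming simple random
sampling without replacement so that πi = n/N and πij = n(n − 1)/(N(N − 1)) are known) computes
the Horvitz-Thompson estimate t̂1π and the variance estimator given above for an artificial
population.

import numpy as np

rng = np.random.default_rng(7)
N, n = 500, 50
y_pop = rng.gamma(shape=2.0, scale=10.0, size=N)    # artificial finite population
t1 = y_pop.sum()                                    # true total

s = rng.choice(N, size=n, replace=False)
y = y_pop[s]

pi_i = np.full(n, n / N)
pi_ij = np.full((n, n), n * (n - 1) / (N * (N - 1)))
np.fill_diagonal(pi_ij, pi_i)                       # pi_ii = pi_i

t1_ht = np.sum(y / pi_i)                            # Horvitz-Thompson estimate of t1

# Var-hat(t1_ht) = sum_i sum_j (1/pi_ij)(pi_ij/(pi_i pi_j) - 1) y_(i) y_(j)
w = (1.0 / pi_ij) * (pi_ij / np.outer(pi_i, pi_i) - 1.0)
var_ht = float(y @ w @ y)

print(f"true total {t1:.1f}, HT estimate {t1_ht:.1f}, estimated s.e. {np.sqrt(var_ht):.1f}")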

Standard methods are available for the point estimation of sample functions of the population
totals, such as means, ratios, and differences of ratios. These methods are based (see e.g. Särndal,
et al., 1992) on the following result. Given that a population parameter θ can be expressed as a
function of several population totals, i.e. θ = f(t1, t2, ..., tq), then an estimator θ̂ of θ is obtained
from θ̂ = f(t̂1π, ..., t̂jπ, ..., t̂qπ), where t̂jπ is the corresponding π-estimator of tj. Additionally,

consistent estimators of the sample variances of the estimators are available and have been
implemented in various programs for the analysis of survey data (cf. Section 2.2). One method of
estimating the variance of θ̂ when θ is a nonlinear function of the totals is a first-order Taylor
approximation of f(t̂1π, ..., t̂jπ, ..., t̂qπ) (see e.g. Wolter, 1985).
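
As an example of the linearization approach (our own sketch, not taken from the text), consider the
ratio θ = ty/tx of two population totals. The partial derivatives are 1/tx and −ty/tx², so the linearized
variable is ui = (yi − θxi)/tx, and under SRS without replacement the variance of θ̂ can be estimated
from the sample variance of the ui:

import numpy as np

rng = np.random.default_rng(3)
N, n = 2000, 200
x_pop = rng.uniform(1, 10, N)
y_pop = 2.5 * x_pop + rng.normal(0, 2, N)           # artificial population values

s = rng.choice(N, size=n, replace=False)
x, y, w = x_pop[s], y_pop[s], N / n                 # base weights 1/pi_i = N/n

t_x_hat, t_y_hat = w * x.sum(), w * y.sum()         # pi-estimators of the totals
theta_hat = t_y_hat / t_x_hat                       # estimated ratio

u = (y - theta_hat * x) / t_x_hat                   # linearized variable (plug-in)
var_theta = (N ** 2) * (1 - n / N) * u.var(ddof=1) / n   # SRSWOR variance of the u-total

print(f"theta_hat = {theta_hat:.3f}, linearized s.e. = {np.sqrt(var_theta):.4f}")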

To estimate the variance of a survey estimator in the case of a single-stage sampling design, there
are typically two alternatives: (1) the variance estimator based upon pps sampling with
replacement, and (2) the Yates-Grundy estimator of variance (Yates & Grundy, 1953; Biyani,
1980) for pps without replacement sampling. Many survey practitioners will find the first estimator
to be a satisfactory approximation to the variance given the actual survey design. For instances in
which it is important to reflect the without-replacement sampling design (e.g., a non-negligible
sampling fraction) and where it is feasible to calculate the joint inclusion probabilities (e.g., Durbin’s
two-per-stratum design; see Durbin, 1967, and Shapiro & Bateman, 1978), you would have the
opportunity to specify the Yates-Grundy estimator. Applied to a multi-stage sampling design, these
estimators usually provide a very good approximation to the total variance.



2.3 Additional weight adjustments

It was previously mentioned that the sampling weight wi is usually calculated as wi = 1/ π i , the so-
called base weight.

In a practical application each weight wi usually undergoes additional adjustments,


o such as wi = (1/πi)(1/R̂i) (nonresponse-adjusted weight), where R̂i is the response rate in the
cell containing the i-th unit;
o wi = (1/πi)(1/R̂i) Fi (post-stratification weight), where Fi = Tk/T̂k, and where Tk denotes the total
number of units belonging to the k-th cell, k = 1, 2, …, K, of a contingency table formed
from a set of categorical variables. For example, consider the variables age (5 categories),
gender (2 categories), and ethnic group (3 categories). In this case K = 5 × 2 × 3 = 30. From
the sample one can establish, for each of these variables, which category (e.g. male/female)
is assigned to the i-th USU, and hence determine the cell number for that specific
combination. The estimated total T̂k is obtained by summing the nonresponse-adjusted weights
wi = (1/πi)(1/R̂i) over the sample elements that fall in cell k.

It is evident that non-response and post stratification adjusted weights will have an impact on the
estimation of population totals and functions of population totals.
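
A small numerical sketch (our own, with made-up response rates and known cell totals Tk) of these
two adjustments is given below.

import numpy as np

pi = np.array([0.02, 0.02, 0.05, 0.05, 0.05])    # inclusion probabilities of 5 respondents
cell = np.array([0, 0, 1, 1, 1])                 # weighting cell of each respondent
resp_rate = {0: 0.8, 1: 0.5}                     # estimated response rate R-hat per cell
T_known = {0: 130.0, 1: 70.0}                    # known population totals T_k per cell

w_base = 1.0 / pi                                # base weights
w_nr = np.array([w_base[i] / resp_rate[cell[i]] for i in range(len(pi))])

# Post-stratification factor F_k = T_k / T-hat_k, where T-hat_k sums the
# nonresponse-adjusted weights over the sample elements in cell k.
T_hat = {k: w_nr[cell == k].sum() for k in T_known}
F = np.array([T_known[cell[i]] / T_hat[cell[i]] for i in range(len(pi))])
w_ps = w_nr * F                                  # final post-stratified weights

print(np.round(w_ps, 1))
print({k: round(w_ps[cell == k].sum(), 1) for k in T_known})   # matches T_known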

2.4 Linear regression


In this section we show the effect of sampling on the estimates and variability of regression
coefficients. Again, the results are rather different from those that can be derived under the more
familiar case of simple random sampling.

Suppose YN is an N × p matrix defined as YN' = (y1 , y 2 ,..., y N ) , where the elements of y i are
values of p variables of interest.



2.4.1 Example 1

Let yij denote a typical element of y i , where yij is the number of occasions alcohol was consumed
by student i (i = 1, 2,..., N ) in the prior 30 days, and where j = 1 denotes grade 8, j = 2 denotes
grade 9, …, j = 5 denotes grade 12.

2.4.2 Example 2

Let the subscript i denote the student i (i = 1, 2,..., N ) . Suppose yi1 equals the number of times this
student was under the influence of alcohol in the prior year; yi 2 is a language score, and yi 3 is a
math score.

Example 1 above describes a longitudinal study, often referred to in the literature as a repeated
measurements study, since measurements are made on the same individual on successive
occasions. Note that, in general, measurement occasions are not necessarily equally spaced over
time.

Example 2 describes a typical cross-sectional study. Note, however, that this study may have been
carried out in 1998, and subsequently repeated in 2000 and 2002. It is evident that the finite
populations U1998 , U 2000 , and U 2002 will overlap if, for example, 8th to 12th graders in the state of
Texas are defined as the population elements. Hence, the samples s1998 , s2000 , and s2002 may also
have overlapping units. A cross-sectional study, repeated over time, is often referred to as a panel
study, but the data are usually treated statistically as multiple-group data. In this example
the year of study defines the group. A typical multiple-group application is to test for differences in
the means of latent variables under the assumption of factor invariance.

Consider the case p = 1 (univariate regression) so that YN = y N , an N-dimensional vector. Suppose


further that X N is an N × r matrix defined as X'N = (x1 , x 2 ,..., x N ) , where the elements of xi are
values of r auxiliary variables, for example xi1 = gender, xi 2 = socio-economic status, and xi 3 = age.

The finite population regression coefficient vector β is a function of y and is defined as

β = ( X'N X N ) −1 X'N y N . (2.1)



Under a so-called design-based approach, β is an obvious choice for the parameter of interest
when regression is based on sample survey data. In the estimation of β we assume an underlying
homoscedastic model

E (y N | X N ) = X N β; Cov(y N | X N ) = σ 2 I N . (2.2)

Let X s denote an n × r matrix of rows of X N selected according to some sampling design s.

The ordinary least squares (OLS) estimator β = ( X's X s ) −1 X's y s of β is not, in general, a design-
consistent estimator of β .

An equivalent expression for (2.1) is

β = T−1t (2.3)

where T = ∑_{α=1}^{N} xα x'α and t = ∑_{α=1}^{N} xα yα .

Let tij and ti 0 denote typical elements of T and t respectively, then

tij = ∑_{α=1}^{N} xiα xjα  and  ti0 = ∑_{α=1}^{N} xiα yα .

Each total (cf. Section 2.2) can be estimated by its unbiased π-estimator. For example,

t̂ij,π = ∑_{α=1}^{n} x(iα) x(jα)/πα ,

t̂i0,π = ∑_{α=1}^{n} x(iα) y(α)/πα ,   i, j = 1, 2, ..., r .
α =1

In matrix notation,

T̂ = ∑_{α=1}^{n} x(α) x'(α)/πα = X's Ws Xs ,   (2.4)

and

t̂ = ∑_{α=1}^{n} x(α) y(α)/πα = X's Ws ys ,   (2.5)

where x(α) denotes the α-th column of X's .

This yields the design-weighted estimator

βW = T̂⁻¹ t̂ = ( X's Ws Xs )⁻¹ X's Ws ys ,   (2.6)

which is a design-consistent estimator of β .

The weighting matrix is defined as Ws = diag(w1, w2, ..., wn), where wi = 1/πi denotes the
sampling weight for the i-th USU. In the case of a self-weighting design, i.e.
πi = n/N, i = 1, 2, ..., N, the estimators β and βW are identical.
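
The following sketch (our own simulation; the inclusion probabilities are generated artificially)
computes the design-weighted estimator (2.6) and compares it with ordinary least squares.

import numpy as np

rng = np.random.default_rng(11)
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept plus one predictor
beta = np.array([1.0, 2.0])
y = X @ beta + rng.normal(scale=1.0, size=n)

pi = rng.uniform(0.05, 0.5, size=n)      # assumed inclusion probabilities of the USUs
W = np.diag(1.0 / pi)                    # W_s = diag(w_1, ..., w_n) with w_i = 1/pi_i

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)             # (X'X)^{-1} X'y
beta_w   = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)     # (X'WX)^{-1} X'Wy, cf. (2.6)

print("OLS estimate     :", np.round(beta_ols, 3))
print("weighted estimate:", np.round(beta_w, 3))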

2.5 Standard error estimation

For most sample designs used in practice, the sampling variance of β W cannot be estimated using
standard computer packages, and a variance estimating technique has to be used.
The basic methods available (see e.g. Rao, 1975) are:

(a) Linearization or Taylor expansion methods (Wolter, 1985; Binder, 1983):


Suppose that the parameter θ is a nonlinear function f(t1, t2, ..., tq) of population totals; then θ
is consistently estimated by θ̂ = f(t̂1,π, t̂2,π, ..., t̂q,π). By Taylor linearization it follows that

f(t̂1,π, t̂2,π, ..., t̂q,π) ≈ f(t1, t2, ..., tq) + ∑_{j=1}^{q} aj ( t̂j,π − tj ) ,

where

aj = ∂f(t̂1,π, ..., t̂q,π)/∂t̂j,π evaluated at t̂1,π = t1, ..., t̂q,π = tq .



From (2.3) it follows that β is a nonlinear function of r(r + 1)/2 + r population totals, with
corresponding estimator βW as defined in (2.6).

Using a first-order Taylor approximation, it can be shown that (cf. (2.3))

βW ≈ β + T⁻¹( t̂ − T̂ β ) .   (2.7)

From (2.7) it follows that

Cov(βW) ≈ T⁻¹ V T⁻¹ .

An approximate expression for T⁻¹ V T⁻¹ is T̂⁻¹ V̂ T̂⁻¹. Typical elements of V and V̂ are given in,
for example, Särndal et al. (1992, page 194).

(b) Balanced repeated replication (McCarthy, 1969): Statistics based on half-samples, which are
selected so as to ensure an orthogonal balanced set, are computed and the empirical covariances
of these statistics are used as the appropriate estimator.

In longitudinal studies an increase in precision is obtained if allowance is made for the fact that
units sampled over time are correlated. Replication methods provide a simple means for
incorporating this correlation.

(c) Jackknife (Miller, 1974): The sample is first split into subsamples, each of which reflects the
original complex design. Statistics based on the sample data without one of the subsamples are
computed and the empirical covariances of these statistics serve as covariance estimators. A
more detailed account is given in Wolter (1985), and a small illustrative sketch of the delete-one-
subsample idea is given after this list.

(d) Bootstrap (Efron, 1981, 1982; Kovar, Rao & Wu, 1988): The sample data is used to construct
an artificial population U* which is assumed to mimic the real, but unknown, population U.
The original design is used to draw a series of K samples (with replacement) from U*. For each
“bootstrap” sample i, an estimate θ̂*i of the population parameter θ is computed, and
subsequently θ̂ and var(θ̂) are estimated from θ̂*1, θ̂*2, ..., θ̂*K.
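
The following sketch (our own, for a single stratum and a delete-one-PSU jackknife) illustrates the
replication idea referred to in (c): drop one PSU at a time, rescale the weights of the remaining
PSUs, recompute the statistic, and use the spread of the replicates as a variance estimate.

import numpy as np

def jackknife_var(psu_ids, w, y, stat):
    # stat(w, y) -> scalar estimate computed from weights w and data y
    psus = np.unique(psu_ids)
    n_psu = len(psus)
    theta_full = stat(w, y)
    reps = []
    for p in psus:
        keep = psu_ids != p
        w_rep = w[keep] * n_psu / (n_psu - 1)          # rescale the remaining PSUs
        reps.append(stat(w_rep, y[keep]))
    reps = np.array(reps)
    return theta_full, (n_psu - 1) / n_psu * np.sum((reps - theta_full) ** 2)

# example: a weighted mean from 10 PSUs with 8 USUs each (artificial data)
rng = np.random.default_rng(5)
psu_ids = np.repeat(np.arange(10), 8)
y = rng.normal(5, 2, size=80) + np.repeat(rng.normal(0, 1, 10), 8)   # cluster effect
w = np.full(80, 25.0)
est, var = jackknife_var(psu_ids, w, y, lambda w, y: np.sum(w * y) / np.sum(w))
print(f"weighted mean {est:.3f}, jackknife s.e. {np.sqrt(var):.3f}")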

While standard statistical software packages do not in general deal with the complex sample
design situation, several special-purpose programs for covariance estimation have been
developed for use with complex sample designs. Lepkowski and Bowles (1996) give a list of eight
software packages that are available for use by the general survey analyst. The eight catalogued in
their paper are CENVAR, CLUSTERS, EpiInfo, PC CARP, STATA, SUDAAN, VPLX and WesVar. The
first six programs use the Taylor series expansion for variance estimation and the remaining
programs use replication methods.

Theoretical comparisons of the different methods of covariance estimation by Krewski & Rao
(1981) and empirical comparisons by Kish & Frankel (1974) and by Richards & Freeman (1980)
indicate their performance is very similar in many cases.

2.6 Heteroscedastic model


The data may exhibit heteroscedasticity, in which the variance of the response changes with the
values of a predictor. For example, the variance of income across individuals is systematically
higher for higher-income individuals.

In the case of a heteroscedastic model

E(yN | XN) = XN β ;   Cov(yN | XN) = σ² V , (2.8)

the finite population parameter

β* = ( X'N V⁻¹ XN )⁻¹ X'N V⁻¹ yN (2.9)

is a more suitable parameter for inference.

If V is diagonal and the inclusion probabilities are proportional to the variances, then βW (cf. (2.6))
coincides with the weighted least squares estimator

β* = ( X's Vs⁻¹ Xs )⁻¹ X's Vs⁻¹ ys , (2.10)

where Vs is the appropriate n × n submatrix of V .

Standard programs compute the OLS estimator, β, and can often also compute the generalized least
squares estimator, β*, together with unbiased estimators of their model-variances σ̂²( X's Xs )⁻¹ under
(2.2) and σ̂²( X's Vs⁻¹ Xs )⁻¹ under (2.8) respectively. The design-weighted estimator, βW, can also be
obtained by the weighted regression options of standard statistical packages (e.g. LISREL or SPSS)
by using the weights 1/πi. Alternatively, βW can be obtained by unweighted regression on the
transformed variables yi/√πi and xi/√πi. Nathan (1988), however, has pointed out that the

reported variances and covariances will be incorrect. This implies that the standard significance
tests (e.g. F-tests) will be invalid and can result in misleading conclusions.

The programs that use weighted regression, with weights 1/πi, report the estimator of the
variance-covariance matrix as σ̂²( X's Ws Xs )⁻¹. The model-variance of βW is, however,

σ²( X's Ws Xs )⁻¹ ( X's Ws Vs Ws Xs )( X's Ws Xs )⁻¹ ,

which simplifies to the reported σ̂²( X's Ws Xs )⁻¹ under the homoscedastic model (2.2) (V = I) only
for self-weighting designs, and under the heteroscedastic model (2.8) only if V is diagonal and the
inclusion probabilities are proportional to the variances.
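
The practical consequence can be seen in the following simulation sketch (ours, not from the text):
for a design that is neither self-weighting nor has πi proportional to the variances, the matrix
implied by a weighted-regression routine differs from the model-variance given above. The
population values, variances and inclusion probabilities are all artificial, and σ² is taken as known
for simplicity.

import numpy as np

rng = np.random.default_rng(2)
n = 500
X = np.column_stack([np.ones(n), rng.uniform(0, 4, n)])
beta = np.array([1.0, 0.5])
v = rng.uniform(0.5, 3.0, n)                  # diagonal of V (heteroscedastic variances)
sigma2 = 1.0                                  # treated as known here for simplicity
y = X @ beta + rng.normal(scale=np.sqrt(sigma2 * v))

pi = rng.uniform(0.05, 0.5, n)                # inclusion probabilities, unrelated to v
W = np.diag(1.0 / pi)

XtWX_inv = np.linalg.inv(X.T @ W @ X)
beta_w = XtWX_inv @ X.T @ W @ y

reported = sigma2 * XtWX_inv                  # what the weighted-regression output implies
model_var = sigma2 * XtWX_inv @ (X.T @ W @ np.diag(v) @ W @ X) @ XtWX_inv

print("reported s.e.      :", np.round(np.sqrt(np.diag(reported)), 4))
print("model-variance s.e.:", np.round(np.sqrt(np.diag(model_var)), 4))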

2.7 Covariance matrix of vector of totals

2.7.1 Introduction

In this section formulae for the estimation of the covariance matrix of a vector of totals are given
for single-stage, two-stage, and three-stage sampling designs.

For a multi-stage sampling design we assume the following general sampling methods at each
stage:

o First stage: random sampling with replacement (WR), random sampling without
replacement and equal probability of selection (WOR), and random sampling without
replacement and unequal probabilities (UWOR).
o Second stage: if the first stage is not WR, then WR, WOR, or UWOR.
o Third stage: if second stage is not WR, then WR, WOR, or systematic.

From the above it follows that all specifications other than weights are ignored for subsequent
stages if a multi-stage sample contains a WR, or an approximation to WR, stage.

Overall weights for each ultimate sampling unit can be obtained as a product of weights for
corresponding units computed in each sampling stage.



2.7.2 Notation

N : Total number of elements in the population


n : Total number of elements in the sample
H : Number of strata
nh : Sampled number of primary sampling units (PSU) per stratum
mhi : Number of elements in the i-th sampled PSU in stratum h, i = 1, ..., nh
whij : Overall sampling weight for the j-th element in the i-th sampled PSU in stratum h
yhij : Values of the vector y for the j-th element in the i-th sampled PSU in stratum h
yT : Population total for the vector of variables y

2.7.3 Total covariances

To simplify the expressions for the estimated covariance matrix of a vector of totals, let

z hij = whij y hij (2.11)

where the index h denotes a stratum within a given sampling stage, i denotes the i-th sampled unit
within stratum h in the same sampling stage, and j indexes the final-stage units contained within hi.

Let

zhi = ∑_{j=1}^{mhi} zhij ,   (2.12)

z̄h = (1/nh) ∑_{i=1}^{nh} zhi ,   (2.13)

and

S²h(y) = (1/(nh − 1)) ∑_{i=1}^{nh} ( zhi − z̄h )( zhi − z̄h )' .   (2.14)



Single stage sample

The covariance of the total for vector y in a single-stage sample is estimated by:

V̂(yT) = V̂1(yT) = ∑_{h=1}^{H} Uh(yT) ,   (2.15)

where Uh(yT) is an estimated contribution from stratum h = 1, …, H and depends on the sampling
method used:

o For WR, Uh(yT) = nh S²h(y) ,
o For simple random sampling, Uh(yT) = (1 − fh) nh S²h(y) ,
o For sampling WOR with unequal probabilities,

Uh(yT) = ∑_{i=1}^{nh} ∑_{j>i} ( πhi πhj / πhij − 1 ) ( zhi − zhj )( zhi − zhj )' .

In this variance estimator, πhi and πhj are the inclusion probabilities for units i and j in stratum h,
and πhij is the joint inclusion probability for the same units (Yates & Grundy, 1953; Sen, 1953). In
some situations this last form may yield a negative estimate, in which case it is treated as undefined.

Currently, for each stratum h containing a single element, the covariance contribution Uh(yT) is
set to zero. An alternative procedure is to collapse strata. Presently, we leave it to the discretion of
the user to collapse strata prior to any further statistical analysis.
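
As a concrete illustration of (2.15) for the WR case (our own sketch, with artificial strata, PSUs and
weights), the function below forms the weighted PSU totals zhi and accumulates Uh = nh S²h(y) over
strata; single-PSU strata contribute zero, as noted above.

import numpy as np

def strat_wr_total_cov(strata, psu, w, Y):
    # Y is (n_obs, p); returns the estimated total and its covariance matrix.
    Z = w[:, None] * Y                                # z_hij = w_hij * y_hij
    total = Z.sum(axis=0)
    p = Y.shape[1]
    V = np.zeros((p, p))
    for h in np.unique(strata):
        in_h = strata == h
        # z_hi: sum of z_hij over the elements of each sampled PSU in stratum h
        z_hi = np.array([Z[in_h & (psu == i)].sum(axis=0)
                         for i in np.unique(psu[in_h])])
        n_h = len(z_hi)
        if n_h > 1:                                   # single-PSU strata contribute zero
            d = z_hi - z_hi.mean(axis=0)
            V += n_h * (d.T @ d) / (n_h - 1)          # U_h = n_h * S_h^2(y)
    return total, V

# toy data: 2 strata, 3 PSUs each, 4 elements per PSU, p = 2 variables
rng = np.random.default_rng(4)
strata = np.repeat([0, 1], 12)
psu = np.repeat(np.arange(6), 4)
w = np.full(24, 10.0)
Y = rng.normal(size=(24, 2)) + strata[:, None]
t_hat, V_hat = strat_wr_total_cov(strata, psu, w, Y)
print("estimated totals:", np.round(t_hat, 1))
print("standard errors :", np.round(np.sqrt(np.diag(V_hat)), 2))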

Two-stage sample

When two-stage sampling is used and sampling WOR is applied in the first stage, the following
estimate of the covariance of the total for vector y may be used:

V̂(yT) = V̂2(yT) = V̂1(yT) + ∑_{h=1}^{H} ∑_{i=1}^{nh} πhi ∑_{k=1}^{Khi} Uhik(yT) .   (2.16)

o Here π hi represents the first stage inclusion probability for the primary sampling unit i from
stratum h.



o If simple random sampling is used, the inclusion probability is equal to the sampling rate
f h for stratum h.
o The number of second stage strata in the primary sampling unit i within the first stage
stratum h is denoted by K hi .

o Uhik(yT) is the covariance contribution from the second-stage stratum k within the primary
sampling unit hi. It depends on the sampling method used in the second stage (see formulae
above).

Three-stage sample

For a three-stage sample where first stage sampling is done without replacement, and simple
random sampling is applied in the second stage, the following estimate of the covariance of the
total for vector y may be used:

V̂(yT) = V̂2(yT) + ∑_{h=1}^{H} ∑_{i=1}^{nh} πhi ∑_{k=1}^{Khi} fhik ∑_{j=1}^{nhik} ∑_{l=1}^{Lhikj} Uhikjl(yT) ,   (2.17)

where
o f hik represents the sampling rate for the secondary sampling units in the second-stage
stratum hik,
o Lhikj indicates the number of third-stage strata in the secondary sampling unit hikj, and

o Uhikjl(yT) denotes the covariance contribution from the third-stage stratum l, which is
contained in the secondary sampling unit hikj. Again, this depends on the third-stage
sampling method (see formulae above).

2.8 Approximate covariance matrix of estimators


In this section we provide a general procedure for the estimation of the approximate covariance
matrix of estimators. The results derived are based on Binder (1983) and use a first-order Taylor
linearization.

Assume that L is the likelihood function or any other appropriate function of the vector γ of
unknown parameters, and that an estimate γ̂ of γ is obtained by solving the set of simultaneous
equations



∂ ln L / ∂γ |γ=γ̂ = 0   (2.18)

In general, no closed-form solution to the set of equations (2.18) exists, and therefore parameter
estimates are obtained iteratively using the Fisher scoring algorithm, for example,

γ̂(t+1) = γ̂(t) + In⁻¹( γ̂(t) ) g( γ̂(t) ) ,   (2.19)

where γ̂(t) denotes the parameter values at iteration t, t = 1, 2, …; g(·) denotes the gradient vector;
and In(·) denotes the information matrix. In other words,

g(γ) = ∂ ln L / ∂γ   (2.20)

and

In(γ) = −E[ ∂² ln L / ∂γ ∂γ' ] .   (2.21)

Denote the contribution to the gradient vector of each final-stage element by ghij, where h denotes
the stratum and i the i-th PSU within this stratum. The index j denotes a typical final-stage element
contained within the PSU hi. Then

[ g(γ) ]r = ∑_{h=1}^{H} ∑_{i=1}^{nh} ∑_{j=1}^{mhi} [ ghij(γ) ]r .   (2.22)

From (2.18), (2.20), and (2.22) it follows that γ̂ is the solution to the set of equations

ŵ(γ̂) = ∑_{h=1}^{H} ∑_{i=1}^{nh} ∑_{j=1}^{mhi} ghij(γ̂) = 0 .   (2.23)

Using a first-order Taylor expansion of ŵ(γ̂) at γ̂ = γ, it follows that

0 = ŵ(γ̂) ≈ ŵ(γ) + [ ∂ŵ(γ)/∂γ' ] ( γ̂ − γ ) .   (2.24)



Taking variances on both sides, it further follows that

Cov( ŵ(γ̂) ) ≈ [ ∂ŵ(γ)/∂γ' ] Cov( γ̂ ) [ ∂ŵ(γ)/∂γ' ]' .   (2.25)

Thus, provided that (cf. (2.23)) ∂² ln L / ∂γ ∂γ' = ∂g(γ)/∂γ' is a non-singular matrix,

Cov( γ̂ ) ≈ [ ∂² ln L / ∂γ ∂γ' ]⁻¹ Cov( ŵ(γ̂) ) [ ∂² ln L / ∂γ ∂γ' ]⁻¹ ,

where E[ ∂² ln L / ∂γ ∂γ' ] = −In(γ) .

Therefore, an approximate expression for the asymptotic covariance matrix of γ̂ is given by

Cov( γ̂ ) ≈ In⁻¹(γ) G In⁻¹(γ) ,   (2.26)

where G = Cov( ŵ(γ̂) ).

Using results derived by Fuller (1975) (see also Section 2.7), it follows that, under single stage
sampling with replacement (WR) or without replacement (WOR),

G = ∑_{h=1}^{H} [ nh(1 − fh)/(nh − 1) ] ∑_{i=1}^{nh} ( thi. − t̄h.. )( thi. − t̄h.. )'   (2.27)

where:
o nh = ∑_{j=1}^{nhi} mhij , with mhij the number of cases with identical response patterns within stratum
h, cluster i, and USU j. If fhij = 1 for all h, then mhij = 1 for all h, i and j.
o fh = nh / Nh , the sampling rate for stratum h.
o thij = ghij(γ̂), where ghij(γ) is the hij-th contribution to the gradient vector g(γ) as
defined by (2.22).
o thi. = ∑_{j=1}^{mhi} thij .
o th.. = ∑_{i=1}^{nh} thi.  and  t̄h.. = (1/nh) ∑_{i=1}^{nh} thi. .

Currently, we assume a zero contribution to G for strata that contain a single PSU (cluster).
Alternatively, the collapsing of strata or PSUs is left to the user's discretion (see Section 2.7.3).
Additionally, if there is no variable to define clusters, the observations within each stratum are
treated as being the primary sampling units.
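
A compact sketch of this sandwich estimator (our own, for the WR case with fh = 0; the information
matrix and the per-element score contributions are placeholders that would come from the fitted
model) is given below.

import numpy as np

def sandwich_cov(info, scores, strata, psu):
    # info: (q, q) information matrix I_n(gamma); scores: (n_obs, q) per-element gradients g_hij
    q = scores.shape[1]
    G = np.zeros((q, q))
    for h in np.unique(strata):
        in_h = strata == h
        t_hi = np.array([scores[in_h & (psu == i)].sum(axis=0)
                         for i in np.unique(psu[in_h])])    # PSU totals t_hi.
        n_h = len(t_hi)
        if n_h > 1:                                         # single-PSU strata contribute zero
            d = t_hi - t_hi.mean(axis=0)                    # deviations from t-bar_h..
            G += n_h / (n_h - 1) * (d.T @ d)                # stratum contribution to G, cf. (2.27)
    I_inv = np.linalg.inv(info)
    return I_inv @ G @ I_inv                                # Cov(gamma-hat), cf. (2.26)

# usage sketch (hypothetical): info and scores would be produced by the model fit,
# e.g. a design-weighted logistic regression.
# cov_gamma = sandwich_cov(info, scores, strata, psu)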



2.9 References
Binder, D.A. (1983). On the variances of asymptotically normal estimators from complex surveys,
International Statistical Review, 51, 279-292.

Binder, D.A. & Hidiroglou, M.A. (1988). Sampling in time. In: P.R. Krishnaiah & C.R. Rao (Eds.).
Handbook of Statistics, Vol. 6. Amsterdam: North-Holland, pp. 187-211.

Biyani, S.H. (1980). On variance estimator in unequal probability sampling, Proceedings of the
Survey Research Methods, American Statistical Association, 634-637.

Durbin, J. (1967). Design of Multi-Stage Surveys for the Estimation of Sampling Errors, Applied
Statistics, XVI, 152-164.

Efron, B. (1981). Nonparametric standard errors and confidence intervals, Canadian Journal of
Statistics, 9, 139-172.

Efron, B. (1982). The jackknife, the bootstrap and other resampling plans, CBMS-NSF Regional
Conf. Series in Applied Mathematics, no. 38.

Fuller, W.A. (1975). Regression Analysis for Sample Survey. Sankhya, Series C, 37, 117-132.

Horvitz, D.G. & Thompson, D.J. (1952). A generalization of sampling without replacement from a
finite universe. Journal of the American Statistical Association, 47, 663-685.

Kish, L. (1965). Survey Sampling. New York: John Wiley.

Kish, L., & Frankel, M.R. (1974). Inference from Complex Samples, Journal of the Royal Statistical
Society, Series B, 36, 1-37.

Kovar, J., Rao, J.N.K., & Wu, C.F.J. (1988). Bootstrap and other methods to measure errors in
survey estimates, Canadian Journal of Statistics, 16 (Supplement), 25-45.

Krewski, D., & Rao, J.N.K. (1981). Inference from stratified samples: properties of the
linearization, jackknife and balanced repeated replication methods, Annals of Statistics, 9, 1010-
1019.



Lepkowski, J., & Bowles, J. (1996). Sampling error software for personal computers, Survey
Statistician, 35, 10-17.

McCarthy, P.J. (1969). Pseudo-replication: Half samples, International Statistical Review, 37, 239-264.

Miller, R.G. (1974). The jackknife: A review, Biometrika, 61, 1-15.

Rao, J.N.K. (1975). Unbiased variance estimation for multistage designs, Sankhya, C37, 133-139.

Rao, J.N.K., & Scott, A.J. (1981). The Analysis of Categorical Data from Complex Sample
Surveys: Chi-Squared Tests for Goodness of Fit and Independence in Two-Way Tables, Journal of
the American Statistical Association, 76, 221-230.

Richards, V., & Freeman, D.H. (1980). A comparison of replicated and pseudo-replicated
covariance matrix estimators for the analysis of contingency tables, Proceedings of the Survey
Research Methods, American Statistical Association, 209-211.

Särndal, C.E., Swensson, B., & Wretman, J. (1992). Model assisted survey sampling. New York:
Springer.

Sen, A.R. (1953). On the estimate of the variance in sampling with varying probabilities. Journal
of the Indian Society of Agricultural Statistics, 5, 55-77.

Shapiro, G.M., & Bateman, D.V. (1978). A better alternative to the collapsed stratum variance
estimate, Proceedings of the Survey Research Methods, American Statistical Association, 451-456.

Skinner, C.J., Holt, D., & Smith, T.M.F. (1989). Analysis of Complex Surveys. Chichester: Wiley.

Traat, I., Meister, K., & Söstra, K. (2001). Statistical inference in sampling theory, Theory of
stochastic processes, Vol. 7(23), no. 1-2, 301-316.

Wolter, K.M. (1985). Introduction to Variance Estimation. New York: Springer-Verlag.

Yates, F., & Grundy, P.M. (1953). Selection without replacement from within strata with
probability proportional to size, Journal of the Royal Statistical Society, Series B, 15, 253-261.
