
Cheat Sheet: Optimal Stratification

This document provides a cheat sheet on how to use the SamplingStrata R package to optimize stratification for sampling surveys. It describes three methods for stratification - atomic, continuous, and spatial - depending on whether the stratification variables are categorical, continuous, or have spatial correlation. For each method it outlines the steps to define the sampling frame, set precision constraints, build/optimize strata, evaluate the solution, and select the sample. It also provides an example using data on Swiss municipalities.

Uploaded by

Ari Clecius

SamplingStrata :: CHEAT SHEET

To install the latest available release:


library(devtools)
install_github("barcaroli/SamplingStrata")

Optimal stratification

Given a sampling frame, SamplingStrata lets you optimize its stratification when designing a sampling survey, given precision constraints on target estimates.

Three different methods

The optimization can be run with one of three methods, chosen as follows:
A. if the stratification variables are categorical (or can be reduced to categorical), the method is "atomic";
B. if the stratification variables are continuous, the method is "continuous";
C. if the stratification variables are continuous and there is spatial correlation among units in the sampling frame, the required method is "spatial".

A. Method "atomic"

Different steps:
1. define the sampling frame;
2. set precision constraints;
3. build atomic strata;
4. run optimization;
5. perform evaluation;
6. select the sample.

Sampling frame

library(SamplingStrata)
# data on 2896 Swiss municipalities
data("swissmunicipalities")
swissmunicipalities$id <- c(1:nrow(swissmunicipalities))
frame <- buildFrameDF(
  df = swissmunicipalities,
  id = "id",
  domainvalue = "REG",
  X = c("POPTOT","HApoly"),          # stratification variables
  Y = c("Surfacesbois", "Airind"))   # target variables

Precision constraints

ndom <- length(unique(frame$domainvalue))
cv <- as.data.frame(list(
  DOM = rep("DOM1",ndom),
  CV1 = rep(0.10,ndom),              # 10% of maximum expected CV
  CV2 = rep(0.10,ndom),
  domainvalue = c(1:ndom)))

Atomic strata

strata <- buildStrataDF(frame)

Optimization

solution <- optimStrata(method="atomic",
  framesamp = frame,
  errors = cv,
  iter = 50,                         # number of iterations
  pops = 10)                         # number of solutions per iteration

Evaluation

outstrata <- solution$aggr_strata
framenew <- solution$framenew
eval <- evalSolution(framenew,outstrata)
eval$coeff_var

Sample selection

s <- selectSample(framenew,outstrata)
head(s)

B. Method "continuous"

Same steps as for "atomic", except that building atomic strata is not necessary.
Frame definition and precision constraints are set in the same way as in method "atomic".
One additional step determines the most promising number of strata by k-means clustering.

Kmeans clustering

kmean <- KmeansSolution2(frame=frame,
  errors=cv,
  maxclusters = 10)
nstrat <- tapply(kmean$suggestions,
  kmean$domainvalue,
  FUN=function(x) length(unique(x)))
sugg <- prepareSuggestion(
  kmean = kmean,
  frame = frame,
  nstrat = nstrat)

[Figure: suggested number of strata (8) for domain 4]

Optimization

solution <- optimStrata(
  method = "continuous",
  framesamp = frame,
  errors = cv,
  nStrata = nstrat,
  iter = 50,
  pops = 10,
  suggestions = sugg)                # suggestion prepared by kmeans clustering

Evaluation

framenew <- solution$framenew
outstrata <- solution$aggr_strata
ss <- summaryStrata(framenew,outstrata)
head(ss)

# visualization of strata by pairs of X's
plotStrata2d(framenew,
  outstrata,
  domain = 6,
  vars = c("X1","X2"),
  labels = c("POPTOT", "HApoly"))

eval <- evalSolution(framenew,outstrata)
eval$coeff_var

Sample selection

s <- selectSample(framenew,outstrata)
head(s)
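The precision constraints used on this page are upper bounds on the expected coefficient of variation (CV) of each target estimate. As a rough illustration of what such a CV means, the sketch below computes the expected CV of an estimated total under stratified simple random sampling without replacement, using the textbook variance formula. It is base R only and is not the package's internal code; the strata sizes, means, standard deviations, allocation and the helper name expected_cv are all hypothetical.

```r
# Hypothetical strata (illustrative numbers, not taken from swissmunicipalities):
# N = population size, M = mean of Y, S = standard deviation of Y, n = sample size
strata_toy <- data.frame(
  N = c(1000, 500, 100),
  M = c(10, 50, 200),
  S = c(5, 20, 80),
  n = c(50, 50, 50)
)

# Expected CV of the estimated total under stratified SRS without replacement:
# Var(T) = sum_h N_h^2 * (1 - n_h / N_h) * S_h^2 / n_h,  CV = sqrt(Var(T)) / T
expected_cv <- function(st) {
  total <- sum(st$N * st$M)
  var_total <- sum(st$N^2 * (1 - st$n / st$N) * st$S^2 / st$n)
  sqrt(var_total) / total
}

cv_y <- expected_cv(strata_toy)
cv_y <= 0.10   # TRUE when the allocation satisfies a 10% constraint
```

With a constraint such as CV1 = 0.10, an allocation is acceptable when the expected CV computed this way does not exceed 0.10 in every domain.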

CC BY SA Giulio Barcaroli • [email protected] Learn more at https://barcaroli.github.io/SamplingStrata/ • package version 1.5 • Updated: 2020-01
C. Method "spatial"

In cases where units in the sampling frame are geo-referenced and there is spatial correlation among them, it is possible to apply the "spatial" method in the optimization of the frame stratification.

Different steps:
1. perform a preliminary spatial analysis and fit spatial models on target variables;
2. define the sampling frame and add predicted values, prediction errors and coordinates;
3. set precision constraints;
4. run optimization;
5. select the sample.

Spatial analysis

We make use of the "Meuse river" datasets, which report measured concentrations of four metals.

library(sp)
# locations (155 observed points)
data("meuse")
# grid of points (3103)
data("meuse.grid")
meuse.grid$id <- c(1:nrow(meuse.grid))
coordinates(meuse)<-c('x','y')
coordinates(meuse.grid)<-c('x','y')

[Figures: grid of Meuse river; sample of observed values]

# analysis and fitting
library(gstat)
library(automap)
v <- variogram(lead~dist+soil,data=meuse)
fit.vgm.lead <- autofitVariogram(
  lead ~dist+soil,meuse,model="Exp")
plot(v, fit.vgm.lead$var_model)

# prediction
lead.kr <- krige(lead~dist+soil,
  meuse, meuse.grid,
  model=fit.vgm.lead$var_model)
lead.pred <- ifelse(lead.kr[1]$var1.pred<0,
  0,lead.kr[1]$var1.pred)
lead.var <- ifelse(lead.kr[2]$var1.var < 0,
  0,lead.kr[2]$var1.var)

Sampling frame

df <- as.data.frame(list(
  dom=rep(1,nrow(meuse.grid)),
  lead.pred=lead.pred,
  lead.var=lead.var,
  lon=meuse.grid$x,
  lat=meuse.grid$y,
  id=c(1:nrow(meuse.grid))))
frame <- buildFrameSpatial(df=df,
  id="id",
  X=c("lead.pred"),
  Y=c("lead.pred"),
  variance=c("lead.var"),
  lon="lon",
  lat="lat",
  domainvalue = "dom")

Precision constraints

cv2 <- as.data.frame(list(
  DOM=rep("DOM1",1),
  CV1=rep(0.05,1),
  domainvalue=c(1:1) ))

Optimization

solution <- optimStrata(method="spatial",
  errors=cv2, framesamp=frame, iter=25,
  nStrata=5, fitting=1, kappa=1,
  range=fit.vgm.lead$var_model$range[2])

framenew <- solution$framenew
outstrata <- solution$aggr_strata
frameres <- SpatialPixelsDataFrame(
  points=framenew[c("LON","LAT")],
  data=framenew)
frameres$LABEL <- as.factor(frameres$LABEL)
spplot(frameres,c("LABEL"),
  col.regions=bpy.colors(5))

[Figure: optimal stratification of meuse.grid]

Use of models

Usually, sampling frames do not contain the values of the target variables, only those of co-variates. In order to calculate correctly the variance of target variables in strata, we can make use of models.
When applying methods "atomic" and "continuous", it is possible to declare linear or log-linear models linking each target variable to one co-variate available in the sampling frame.

Consider the case of the "swissmunicipalities" dataset. Suppose that for all units we only have values for POPTOT and HApoly, while the values for Surfacesbois and Airind are available only for a subset of 500 units.
We fit the following models:

k <- sample(c(1:2896),500)
s <- swissmunicipalities[k,]
Airind_POPTOT <- lm(Airind~POPTOT, data=s)
Bois_HApoly <- lm(Surfacesbois~HApoly,data=s)

For both models we calculate heteroscedasticity indexes and variance:

airind <- computeGamma(Airind_POPTOT$residuals,
  s$POPTOT,nbins = 14)
airind
# gamma sigma r.square
# 0.59235109 0.06794055 0.87070106
bois <- computeGamma(Bois_HApoly$residuals,
  s$HApoly,nbins = 14)
bois
# gamma sigma r.square
# 0.8547931 0.4483606 0.9732122

We can now instantiate the values in the "model" dataframe:

model <- NULL
model$beta[1] <- Airind_POPTOT$coefficients[2]
model$sig2[1] <- airind[2]^2
model$type[1] <- "linear"
model$gamma[1] <- airind[1]
model$beta[2] <- Bois_HApoly$coefficients[2]
model$sig2[2] <- bois[2]^2
model$type[2] <- "linear"
model$gamma[2] <- bois[1]
model <- as.data.frame(model)
model
# beta sig2 type gamma
# 0.01109583 0.1708807 linear 0.4703953
# 0.26068155 0.2010272 linear 0.8547931

Sampling frame

frame <- buildFrameDF(
  df=swissmunicipalities,
  id="id",
  X=c("POPTOT","HApoly"),            # co-variates as both X's and Y's
  Y=c("POPTOT","HApoly"),
  domainvalue = "REG")
frame$airind <- swissmunicipalities$Airind
frame$surfacesbois <- swissmunicipalities$Surfacesbois

Optimization

With the same precision constraints of 10% for both target variables we run the optimization step:

solution <- optimStrata(
  method = "continuous",
  errors = cv,
  framesamp = frame,
  model = model,                     # "model" dataframe previously defined
  nStrata = rep(5,7),
  iter = 50,
  pops = 10)

Evaluation

framenew <- solution$framenew
outstrata <- solution$aggr_strata
framenew$Y3 <- framenew$AIRIND
framenew$Y4 <- framenew$SURFACESBOIS
val <- evalSolution(framenew,outstrata)
val$coeff_var
# CV1 CV2 CV3 CV4 dom
# 0.0107 0.0706 0.0316 0.0603 DOM1
# 0.0073 0.0364 0.0220 0.0426 DOM2
# 0.0062 0.0252 0.0253 0.0332 DOM3
# 0.0071 0.0328 0.0303 0.0572 DOM4
# 0.0055 0.0646 0.0171 0.0541 DOM5
# 0.0037 0.0745 0.0173 0.0606 DOM6
# 0.0036 0.0753 0.0145 0.0541 DOM7

Notice that both the CVs of the co-variates (CV1 and CV2) and the CVs of the real target variables (CV3 and CV4) comply with the 10% precision constraints.
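
The compliance statement can be checked mechanically. The following is a minimal sketch in base R, assuming the CV values are typed in by hand from the output printed in this sheet; the object names cv_table, target_cv and worst are illustrative and not part of the package.

```r
# CV values copied from the val$coeff_var output printed in this sheet
cv_table <- data.frame(
  CV1 = c(0.0107, 0.0073, 0.0062, 0.0071, 0.0055, 0.0037, 0.0036),
  CV2 = c(0.0706, 0.0364, 0.0252, 0.0328, 0.0646, 0.0745, 0.0753),
  CV3 = c(0.0316, 0.0220, 0.0253, 0.0303, 0.0171, 0.0173, 0.0145),
  CV4 = c(0.0603, 0.0426, 0.0332, 0.0572, 0.0541, 0.0606, 0.0541),
  dom = paste0("DOM", 1:7)
)

target_cv <- 0.10                      # the 10% precision constraint
cv_cols <- c("CV1", "CV2", "CV3", "CV4")

# Logical matrix: TRUE where a domain meets the constraint for that variable
compliant <- cv_table[, cv_cols] <= target_cv
all_compliant <- all(compliant)

# Largest CV observed for each variable across the seven domains
worst <- sapply(cv_table[, cv_cols], max)
worst
```

Looking at the per-variable maximum is a quick way to see how much slack remains against the constraint in the worst domain.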
