Cheat Sheet: Optimal Stratification

This document provides a cheat sheet on how to use the SamplingStrata R package to optimize stratification for sampling surveys. It describes three methods for stratification - atomic, continuous, and spatial - depending on whether the stratification variables are categorical, continuous, or have spatial correlation. For each method it outlines the steps to define the sampling frame, set precision constraints, build/optimize strata, evaluate the solution, and select the sample. It also provides an example using data on Swiss municipalities.

Uploaded by

Ari Clecius

SamplingStrata :: CHEAT SHEET

To install the latest available release:

library(devtools)
install_github("barcaroli/SamplingStrata")

Optimal stratification

Given a sampling frame, SamplingStrata optimizes its stratification when designing a sample survey, subject to precision constraints on the target estimates.

Three different methods

The optimization can be run with one of three methods, chosen as follows:
A. if the stratification variables are categorical (or have been reduced to categorical), the method is "atomic";
B. if the stratification variables are continuous, the method is "continuous";
C. if the stratification variables are continuous and there is spatial correlation among the units in the sampling frame, the method is "spatial".

A. Method "atomic"

Different steps:
1. define the sampling frame;
2. set precision constraints;
3. build atomic strata;
4. run optimization;
5. perform evaluation;
6. select the sample.

Sampling frame

Data on 2896 Swiss municipalities.

library(SamplingStrata)
data("swissmunicipalities")
swissmunicipalities$id <- c(1:nrow(swissmunicipalities))
frame <- buildFrameDF(
  df = swissmunicipalities,
  id = "id",
  domainvalue = "REG",
  X = c("POPTOT","HApoly"),          # stratification variables
  Y = c("Surfacesbois", "Airind"))   # target variables

Precision constraints

ndom <- length(unique(frame$domainvalue))
cv <- as.data.frame(list(
  DOM = rep("DOM1",ndom),
  CV1 = rep(0.10,ndom),              # 10% maximum expected CV
  CV2 = rep(0.10,ndom),
  domainvalue = c(1:ndom)))

Atomic strata

strata <- buildStrataDF(frame)

Optimization

solution <- optimStrata(method = "atomic",
  framesamp = frame,
  errors = cv,
  iter = 50,                         # number of iterations
  pops = 10)                         # number of solutions per iteration

Evaluation

outstrata <- solution$aggr_strata
framenew <- solution$framenew
eval <- evalSolution(framenew,outstrata)
eval$coeff_var

Sample selection

s <- selectSample(framenew,outstrata)
head(s)

B. Method "continuous"

Same steps as in method "atomic", except that building the atomic strata is not necessary. The frame definition and the precision constraints are set in the same way as in method "atomic". One additional step determines the most promising number of strata by k-means clustering.

Kmeans clustering

kmean <- KmeansSolution2(frame = frame,
  errors = cv,
  maxclusters = 10)
nstrat <- tapply(kmean$suggestions,
  kmean$domainvalue,
  FUN = function(x) length(unique(x)))
sugg <- prepareSuggestion(
  kmean = kmean,
  frame = frame,
  nstrat = nstrat)

(Figure: suggested number of strata (8) for domain 4)

Optimization

solution <- optimStrata(
  method = "continuous",
  framesamp = frame,
  errors = cv,
  nStrata = nstrat,
  iter = 50,
  pops = 10,
  suggestions = sugg)                # suggestion prepared by kmeans clustering

Evaluation

framenew <- solution$framenew
outstrata <- solution$aggr_strata
ss <- summaryStrata(framenew,outstrata)
head(ss)

# visualization of strata by couples of X's
plotStrata2d(framenew,
  outstrata,
  domain = 6,
  vars = c("X1","X2"),
  labels = c("POPTOT", "HApoly"))

eval <- evalSolution(framenew,outstrata)
eval$coeff_var

Sample selection

s <- selectSample(framenew,outstrata)
head(s)
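The precision constraints above are maximum expected coefficients of variation (CV) of the target estimates. As a rough illustration of what such a constraint measures, the base R sketch below computes the expected CV of an estimated total under stratified simple random sampling without replacement, using the standard textbook variance formula. The function name expected_cv and all the stratum figures are hypothetical and made up for this example; this is not the code SamplingStrata runs internally.

```r
# Expected CV of the estimated total of Y under stratified simple random
# sampling without replacement (standard textbook formula).
# N: stratum population sizes, S: stratum standard deviations of Y,
# M: stratum means of Y, n: stratum sample sizes.
expected_cv <- function(N, S, M, n) {
  var_total <- sum(N^2 * (1 - n / N) * S^2 / n)  # variance of the estimated total
  sqrt(var_total) / sum(N * M)                   # divided by the population total
}

# Hypothetical strata, for illustration only
N <- c(1000, 500, 100)
S <- c(10, 40, 200)
M <- c(50, 200, 1000)
n <- c(20, 30, 40)

cv_hat <- expected_cv(N, S, M, n)
meets_constraint <- cv_hat <= 0.10   # compare with a 10% precision constraint
```

A stratification and allocation is acceptable for a target variable when this quantity stays at or below the CV set for it in the constraints data frame.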

CC BY SA Giulio Barcaroli • [email protected] • Learn more at https://barcaroli.github.io/SamplingStrata/ • package version 1.5 • Updated: 2020-01
C. Method "spatial"

In cases where the units in the sampling frame are geo-referenced and there is spatial correlation among them, it is possible to apply the "spatial" method in the optimization of the frame stratification.

Different steps:
1. perform a preliminary spatial analysis and fit spatial models on target variables;
2. define the sampling frame and add predicted values, prediction errors and coordinates;
3. set precision constraints;
4. run optimization;
5. select the sample.

Spatial analysis

We make use of the «Meuse river» datasets, which report measured concentrations of 4 metals.

library(sp)
# locations (155 observed points)
data("meuse")
# grid of points (3103)
data("meuse.grid")
meuse.grid$id <- c(1:nrow(meuse.grid))
coordinates(meuse) <- c('x','y')
coordinates(meuse.grid) <- c('x','y')

(Figures: grid of the Meuse river; sample of observed values)

Analysis and fitting

library(gstat)
library(automap)
v <- variogram(lead~dist+soil, data=meuse)
fit.vgm.lead <- autofitVariogram(
  lead~dist+soil, meuse, model="Exp")
plot(v, fit.vgm.lead$var_model)

Prediction

lead.kr <- krige(lead~dist+soil,
  meuse, meuse.grid,
  model=fit.vgm.lead$var_model)
# negative predictions and variances are set to zero
lead.pred <- ifelse(lead.kr[1]$var1.pred < 0,
  0, lead.kr[1]$var1.pred)
lead.var <- ifelse(lead.kr[2]$var1.var < 0,
  0, lead.kr[2]$var1.var)

Sampling frame

# x and y are read with coordinates(): after coordinates<- they are
# no longer columns of the data slot
df <- as.data.frame(list(
  dom=rep(1,nrow(meuse.grid)),
  lead.pred=lead.pred,
  lead.var=lead.var,
  lon=coordinates(meuse.grid)[,"x"],
  lat=coordinates(meuse.grid)[,"y"],
  id=c(1:nrow(meuse.grid))))
frame <- buildFrameSpatial(df=df,
  id="id",
  X=c("lead.pred"),
  Y=c("lead.pred"),
  variance=c("lead.var"),
  lon="lon",
  lat="lat",
  domainvalue="dom")

Precision constraints

cv2 <- as.data.frame(list(
  DOM=rep("DOM1",1),
  CV1=rep(0.05,1),
  domainvalue=c(1:1)))

Optimization

solution <- optimStrata(method="spatial",
  errors=cv2, framesamp=frame, iter=25,
  nStrata=5, fitting=1, kappa=1,
  range=fit.vgm.lead$var_model$range[2])

framenew <- solution$framenew
outstrata <- solution$aggr_strata
frameres <- SpatialPixelsDataFrame(
  points=framenew[c("LON","LAT")],
  data=framenew)
frameres$LABEL <- as.factor(frameres$LABEL)
spplot(frameres, c("LABEL"),
  col.regions=bpy.colors(5))

(Figure: optimal stratification of meuse.grid)

Use of models

Usually the sampling frame does not contain the values of the target variables, but only those of co-variates. In order to calculate correctly the variance of the target variables in the strata, we can make use of models. When applying methods "atomic" and "continuous", it is possible to declare linear or log-linear models linking each target variable to one co-variate available in the sampling frame.

Consider the case of the "swissmunicipalities" dataset. Suppose that for all units we only have values for POPTOT and HApoly, while the values for Surfacesbois and Airind are available only on a subset (500) of them.

We fit the following models:

k <- sample(c(1:2896),500)
s <- swissmunicipalities[k,]
Airind_POPTOT <- lm(Airind~POPTOT, data=s)
Bois_HApoly <- lm(Surfacesbois~HApoly, data=s)

For both models we calculate heteroscedasticity indexes and variance:

airind <- computeGamma(Airind_POPTOT$residuals,
  s$POPTOT, nbins = 14)
airind
#      gamma      sigma   r.square
# 0.59235109 0.06794055 0.87070106
bois <- computeGamma(Bois_HApoly$residuals,
  s$HApoly, nbins = 14)
bois
#     gamma     sigma  r.square
# 0.8547931 0.4483606 0.9732122

We can now instantiate the values in the "model" dataframe:

model <- NULL
model$beta[1] <- Airind_POPTOT$coefficients[2]
model$sig2[1] <- airind[2]^2
model$type[1] <- "linear"
model$gamma[1] <- airind[1]
model$beta[2] <- Bois_HApoly$coefficients[2]
model$sig2[2] <- bois[2]^2
model$type[2] <- "linear"
model$gamma[2] <- bois[1]
model <- as.data.frame(model)
model
#       beta      sig2   type     gamma
# 0.01109583 0.1708807 linear 0.4703953
# 0.26068155 0.2010272 linear 0.8547931

Sampling frame

frame <- buildFrameDF(
  df=swissmunicipalities,
  id="id",
  X=c("POPTOT","HApoly"),     # co-variates used as both X's and Y's
  Y=c("POPTOT","HApoly"),
  domainvalue="REG")
frame$airind <- swissmunicipalities$Airind
frame$surfacesbois <- swissmunicipalities$Surfacesbois

Optimization

With the same precision constraints of 10% for both target variables we run the optimization step:

solution <- optimStrata(
  method = "continuous",
  errors = cv,
  framesamp = frame,
  model = model,              # "model" dataframe previously defined
  nStrata = rep(5,7),
  iter = 50,
  pops = 10)

Evaluation

framenew <- solution$framenew
outstrata <- solution$aggr_strata
framenew$Y3 <- framenew$AIRIND
framenew$Y4 <- framenew$SURFACESBOIS
val <- evalSolution(framenew,outstrata)
val$coeff_var
#    CV1    CV2    CV3    CV4  dom
# 0.0107 0.0706 0.0316 0.0603 DOM1
# 0.0073 0.0364 0.0220 0.0426 DOM2
# 0.0062 0.0252 0.0253 0.0332 DOM3
# 0.0071 0.0328 0.0303 0.0572 DOM4
# 0.0055 0.0646 0.0171 0.0541 DOM5
# 0.0037 0.0745 0.0173 0.0606 DOM6
# 0.0036 0.0753 0.0145 0.0541 DOM7

Notice that both the CVs of the co-variates (CV1 and CV2) and the CVs of the real target variables (CV3 and CV4) are compliant with the 10% precision constraints.
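To give an intuition for the heteroscedasticity index used in the models above, here is a minimal base R sketch on simulated data. It assumes that the residual standard deviation grows as sigma * x^gamma, bins the residuals by the co-variate, and regresses the log of the within-bin residual spread on the log of the within-bin mean of x. This only illustrates the idea under that assumption: it is not the implementation of computeGamma, and every name and value in it (gamma_true, sigma_true, the simulated x and y) is hypothetical.

```r
set.seed(1)

# Simulated data: y is linear in x, with error sd = sigma_true * x^gamma_true
n <- 5000
gamma_true <- 0.7
sigma_true <- 0.5
x <- runif(n, 10, 100)
y <- 2 * x + rnorm(n, sd = sigma_true * x^gamma_true)

# Residuals of the linear model
res <- residuals(lm(y ~ x))

# Split the units into 14 bins of roughly equal size along x
nbins <- 14
breaks <- quantile(x, probs = seq(0, 1, length.out = nbins + 1))
bin <- cut(x, breaks = breaks, include.lowest = TRUE)

# Mean of x and standard deviation of the residuals within each bin
mean_x <- as.numeric(tapply(x, bin, mean))
sd_res <- as.numeric(tapply(res, bin, sd))

# log(sd) = log(sigma) + gamma * log(x): the slope estimates gamma,
# the exponential of the intercept estimates sigma
fit <- lm(log(sd_res) ~ log(mean_x))
gamma_hat <- unname(coef(fit)[2])
sigma_hat <- unname(exp(coef(fit)[1]))
```

A gamma_hat close to 0 would indicate residuals of roughly constant spread, while larger values indicate a spread that grows with the co-variate.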
