
Introduction to Probabilistic Sampling

• In survey sampling we specify a population whose data values are unknown but are regarded as fixed, not random. The observed sample, however, is random because it depends on the random selection of individuals from this fixed population.

• Properties of a sampling method

1. Every individual in the population must have a known and nonzero probability of belonging to the sample (πi > 0 for individual i), and πi must be known for every individual who ends up in the sample.

2. Every pair of individuals in the population must have a known and nonzero probability of belonging to the sample together (πij > 0 for the pair of individuals (i, j)), and πij must be known for every pair that ends up in the sample.

Sampling weights I

• If we take a simple random sample of 3500 people from Neverland (with total population 35 million)

then any person in Neverland has a chance of being sampled equal to πi = 3500/35000000 = 1/10000

for every i.

• Then, each of the people we sample represents 10000 Neverland inhabitants.

• If 100 people in our sample are unemployed, we would then expect 100 × 10000 = 1 million unemployed in Neverland.

• An individual sampled with a sampling probability of πi represents 1/πi individuals in the population.

This value is called the sampling weight.
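
A minimal R sketch of this weighting-up logic, using the Neverland figures from above (variable names are ours):

N <- 35000000
n <- 3500
pi.i <- n/N        # inclusion probability: 1/10000 for every individual
w <- 1/pi.i        # sampling weight: each respondent represents w people

# 100 unemployed respondents weight up to an estimated total
100 * w            # 1,000,000 unemployed in Neverland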

Sampling weights II

• Example: Measure the income on a sample of one individual from a population of N individuals, where πi might be different for each individual.

• The estimate ($\hat{T}_{income}$) of the total income of the population ($T_{income}$) would be the income for that individual multiplied by the sampling weight:

$$\hat{T}_{income} = \frac{1}{\pi_i} \times income_i$$

• This is not a good estimate, since it is based on only one person, but it will be unbiased: the expected value of the estimate will equal the true population total:

$$E\left[\hat{T}_{income}\right] = \sum_{i=1}^{N} \frac{1}{\pi_i} \, income_i \cdot \pi_i = \sum_{i=1}^{N} income_i = T_{income}$$

The Horvitz-Thompson estimator

• The so-called Horvitz-Thompson estimator of the population total is the foundation for many complex analyses.

• If Xi is a measurement of variable X on person i, we write

$$\tilde{X}_i = \frac{1}{\pi_i} X_i$$

• Given a sample of size n, the Horvitz-Thompson estimator $\hat{T}_X$ for the population total $T_X$ of X is

$$\hat{T}_X = \sum_{i=1}^{n} \frac{1}{\pi_i} X_i = \sum_{i=1}^{n} \tilde{X}_i$$
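
As a quick illustration, a minimal R sketch of this estimator (the helper ht.total and the toy data are made up for this example):

# Horvitz-Thompson total: observations divided by their inclusion
# probabilities (equivalently, a weighted sum with weights 1/pi)
ht.total <- function(x, pi) sum(x/pi)

# Toy data: three sampled values with unequal inclusion probabilities
x <- c(100, 250, 80)
pi <- c(0.01, 0.05, 0.01)
ht.total(x, pi)    # 100/0.01 + 250/0.05 + 80/0.01 = 23000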

• The variance estimator is

$$\widehat{Var}\left(\hat{T}_X\right) = \sum_{i,j} \left( \frac{X_i X_j}{\pi_i \pi_j} - \frac{X_i X_j}{\pi_{ij}} \right)$$

• The formula applies to any design, however complicated, where πi and πij are known for the sampled observations.

• The formula depends on the pairwise sampling probabilities πij, not just on the sampling weights: the correlations in the sampling design thus enter the computations.

• See formal definitions and properties in Lohr (2007) p. 240–244.
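
To make the double sum concrete, here is a small R sketch (the function ht.var is ours, not from a package) evaluating this estimator for an SRS without replacement, where πi = n/N and πij = n(n − 1)/(N(N − 1)) for i ≠ j:

# Horvitz-Thompson variance estimator, specialized to an SRS
ht.var <- function(x, N) {
  n <- length(x)
  pi.i <- n/N                       # pi_i, equal for all units under SRS
  pi.ij <- n*(n - 1)/(N*(N - 1))    # pi_ij for i != j; pi_ii = pi_i
  v <- 0
  for (i in 1:n) {
    for (j in 1:n) {
      p <- if (i == j) pi.i else pi.ij
      v <- v + x[i]*x[j]/(pi.i*pi.i) - x[i]*x[j]/p
    }
  }
  v
}

# For an SRS this reduces to the familiar N^2 * (1 - n/N) * s^2/n:
x <- c(1, 4, 7, 8); N <- 8
ht.var(x, N)
N^2 * (1 - length(x)/N) * var(x)/length(x)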

Simple Random Sampling

• Simple random sampling (SRS) provides a natural starting point for a discussion of probability sampling

methods. It is the simplest method and it underlies many of the more complex methods.

• Notations: Sample size is given by n and the population size by N .

• Formally defined: Simple random sampling is a sampling scheme with the property that any of the

possible subsets of n distinct elements, from the population of N elements, is equally likely to be the

chosen sample.

• Every element in the population has the same probability of being selected for the sample, and the joint

probabilities of sets of elements being selected are equal.

EXAMPLE:

• Suppose that a survey is to be conducted in a high school to find out about the students’ leisure habits.

A list of the school’s 1872 students is available, with the list being ordered by the students’ identification

numbers.

• Suppose that an SRS of n = 250 is required for the survey. How can this sample be drawn?

– By a lottery method: an urn. Although conceptually simple, this method is cumbersome to execute

and it depends on the assumption that the representative discs (one for each student) have been

thoroughly mixed: it is seldom used.

– By means of a table of random numbers. This is useful if the selection must be made by hand, but it is a tedious task, requiring a large selection of random numbers, most of which are nonproductive.

• There are two options:

– Simple random sampling with replacement: an element can be selected more than once.

sample(1:1872, size=250, replace=TRUE)

– Simple random sampling without replacement: the sample must contain n distinct elements.

sample(1:1872, size=250, replace=FALSE)

• Sampling without replacement gives more precise estimators than sampling with replacement

• Now, assume that we have responses from all those sampled (there are no problems of non-response)

• Next step: Summarize the individual responses to provide estimates of characteristics of interest for

the population.

For instance: average number of hours of television viewing per day and the proportion of students

currently reading a novel.

See a visual demonstration about SRS:

library(animation)

sample.simple(nrow=10, ncol=10, size=15, p.col=c("blue", "red"), p.cex = c(1,3))

NOTATIONS:

• Capital letters are used for population values and parameters, and lower-case letters for sample values

and estimators.

• Y1 , Y2 , . . . YN denote the values of the variable y (e.g., hours of television viewing) for the N elements

in the population.

• y1 , y2 , . . . yn are the values for the n sampled elements.

• In general, the value of variable y for the i-th element in the population is Yi (i = 1, 2, . . . N ), and that

for the i-th element in the sample is yi (i = 1, 2, . . . n).

• The population mean is given by

$$\bar{Y} = \sum_{i=1}^{N} \frac{Y_i}{N}$$

• and the sample mean by

$$\bar{y} = \sum_{i=1}^{n} \frac{y_i}{n}$$

• In survey sampling the population variance is defined as

$$\sigma^2 = \sum_{i=1}^{N} \frac{\left(Y_i - \bar{Y}\right)^2}{N}$$

• and the sample variance as

$$s^2 = \sum_{i=1}^{n} \frac{(y_i - \bar{y})^2}{n-1}$$

Suppose that we wish to estimate the mean number of hours of television viewing per day for all the

students in the school: Ȳ

Question: how good is the sample mean ȳ as an estimator of Ȳ?

On average, the estimator must be very close to Ȳ over repeated applications of the sampling method.

• Observe that the term estimate is used for a specific value, while estimator is used for the rule or procedure used for obtaining the estimate.

• In the example we obtain an estimate of 2.20 hours of television viewing (computed by substituting the values obtained from the sampled students into the estimator).

NOTE:

Statistical theory provides a means of evaluating estimators but NOT estimates.

• Properties of sample estimators are derived theoretically by considering the pattern of results that would

be generated by repeating the sampling procedure an infinite number of times.

• Example: suppose that drawing an SRS of 250 students from the 1872 students, and then calculating the sample mean for each sample, were carried out an infinite number of times (with replacement).

• The resulting set of sample means would have a distribution, known as the sampling distribution of the

mean.

• If the sample size is not too small (n ≈ 20 is sufficient), the distribution of the sample means approximates the normal distribution, and the mean of this distribution is the population mean, Ȳ.

• Since the mean of the individual sample estimates ȳ over an infinite number of samples equals Ȳ, the sample mean is said to be an unbiased estimator of Ȳ.

• Although the sampling distribution of ȳ is centered on Ȳ , any one estimate will differ from Ȳ .

• To avoid confusion with the standard deviation of the element values, standard deviations of sampling

distributions are known as standard errors.

• The variance of the sample mean ȳ0 of an SRS of size n (not directly usable in practice, since σ² is unknown) is given by

$$Var(\bar{y}_0) = \frac{N-n}{N-1} \cdot \frac{\sigma^2}{n} = \frac{N-n}{N-1} \cdot \frac{1}{n \cdot N} \sum_{i=1}^{N} \left(Y_i - \bar{Y}\right)^2$$
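
A quick simulation sketch of this idea (the population here is hypothetical; only the shape of the argument matters, not the numbers):

# A made-up population of 1872 daily viewing times
set.seed(1)
Y <- rgamma(1872, shape=4, rate=2)

# Draw many SRSs of size 250 and record each sample mean
ybars <- replicate(5000, mean(sample(Y, size=250)))

mean(ybars)    # close to mean(Y): the estimator is unbiased
mean(Y)
hist(ybars)    # roughly normal: the sampling distribution of the mean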

• See example 2.1, p. 29 (artificial example) from Lohr (2006)

Suppose we have a population with eight elements (e.g. an almost extinct species of animal...) and we know the weight yi for each of the N = 8 units of the whole population. We want to know the total weight of the population.

Animal i:  1  2  3  4  5  6  7  8
yi:        1  2  4  4  7  7  7  8

• We take a sample of size 4. How many samples of size 4 can be drawn without replacement from this population?

• There are $\binom{8}{4} = 70$ possible samples of size 4 that can be drawn without replacement from this population.

• We define P(S) = 1/70 for each distinct subset of size 4 from the population.

# Consider a vector of ’observations’

y <- c(1,2,4,4,7,7,7,8)

# Consider all possible samples of size 4. They are placed into a matrix with 70 rows.

allguys <- NULL

for(i in 1:5){

for (j in (i+1):6) {

for (k in (j+1):7){

for(m in (k+1):8) {

allguys <- rbind(allguys, c(i,j,k,m)) }}}}

dim(allguys)

# Observe how indices change in each row

allguys

# Another option, using the combinat package

library(combinat)

cuales <- combn((1:8),4)

t(cuales)

# To estimate the whole population weight, we take two times the sum
# of the y values for each possible sample of indices
# (the expansion factor is N/n = 8/4 = 2).

alltotal <- apply(allguys, 1, function(i){2*sum(y[i])})

# It produces a vector of length 70: all possible
# estimates of the total based on the 70
# equally likely samples that could be drawn.

table(alltotal)

# Each of these counts divided by 70 gives the probability of the
# corresponding estimate of the total.

# Observe that these two values coincide (the estimator is unbiased):

print(c("Mean:",mean(alltotal)),quote=F)

print(c("Sum:",sum(y)),quote=F)

• The formula for $Var(\bar{y}_0)$ depends on $\frac{N-n}{N-1}$ and on the sample size n.

• The term $\frac{N-n}{N-1}$ reflects the fact that the survey population is finite in size and that sampling is conducted without replacement.

• With an infinite population, or if sampling were conducted with replacement, the term is not included and the expression reduces to the familiar form $\sigma^2/n$.

• The term indicates the gains of sampling without replacement over sampling with replacement.

• In many practical situations the populations are large and, even though the samples may also be large, the sampling fractions are small.

• In large populations, the difference between sampling with and without replacement is not important:

even if the sample is drawn with replacement, the chance of selecting an element more than once is

slight.

• If the sampling fraction n/N is small, $\frac{N-n}{N-1}$ is close to 1 and has a negligible effect on the standard error.

• The correction factor is commonly neglected (i.e., treated as 1) when the sampling fraction (n/N) is less than 1 in 20, or even 1 in 10.

• The larger the sample size n is, the smaller is V ar(ȳ0 ).

• For large populations it is the sample size that is dominant in determining the precision of survey results.

• A sample of size 2000 drawn from a country with a population of 200 million yields about as precise results as a sample of the same size drawn from a small city of 40000, assuming the element variances in the two populations are the same (a quick R check appears after the formulas below).

• The element variance in the population σ² is unknown in a practical application. Denote

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2$$

As (see Scheaffer et al. (1990) in Appendix)

$$E(s^2) = \frac{N}{N-1} \sigma^2,$$

then

$$\widehat{Var}(\bar{y}_0) = \frac{N-n}{N} \cdot \frac{s^2}{n} = \left(1 - \frac{n}{N}\right) \cdot \frac{s^2}{n}$$
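
As referenced above, a quick R check of the two-populations claim (assuming, say, σ² = 1; the helper se is ours):

se <- function(n, N, sigma2=1) sqrt((N - n)/(N - 1) * sigma2/n)
se(2000, 200000000)   # country of 200 million
se(2000, 40000)       # city of 40000: slightly smaller, but very close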

• The factor $\left(1 - \frac{n}{N}\right)$ is called the finite population correction (fpc), where n/N is the sampling fraction.

• It is easy to determine a confidence interval for the population mean by applying the standard Central Limit Theorem.

• Example: suppose that the mean hours watching television per day for the 250 sampled students is ȳ0 = 2.192 hours, with an element variance of s² = 1.008. Then a 95% confidence interval for Ȳ is

$$2.192 \pm 1.96 \sqrt{\left(1 - \frac{250}{1872}\right) \frac{1.008}{250}} = 2.192 \pm 0.116$$

• That is, we are 95% confident that the interval from 2.076 to 2.308 contains the population mean.

• See some functions, programmed in R, to illustrate the Central Limit Theorem and to compute confidence intervals.

# Using R to illustrate the Central Limit Theorem (from Venables)

N <- 10000
graphics.off()
par(mfrow = c(1,2), pty = "s")

for(k in 1:20) {
  # Standardized means of k uniforms: mean 0, variance 1
  m <- (rowMeans(matrix(runif(N*k), N, k)) - 0.5)*sqrt(12*k)
  hist(m, breaks = "FD", xlim = c(-4,4), main = k,
       prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon")
  pu <- par("usr")[1:2]
  x <- seq(pu[1], pu[2], len = 500)
  lines(x, dnorm(x), col = "red")
  qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col = "blue")
  abline(0, 1, col = "red")
  Sys.sleep(1)
}

# Using the TeachingDemo library

library(TeachingDemos)

X11()

clt.examp()

X11()

clt.examp(5)

X11()

clt.examp(30)

X11()

clt.examp(50)

srs.mu <- function(y,N,cuacua=0.95) {

# Estimate the confidence interval for the mean

# y is the sample values

# N is the population’s size

# cuacua is the desired confidence level (e.g. 0.95)

n <- length(y)

ybar <- mean(y)

z <- qnorm(1-(1-cuacua)/2)

s2 <- var(y)

var.ybar <- ((N-n)/N) * s2/n

se.ybar <- sqrt(var.ybar)

B1 <- ybar + z*se.ybar

B2 <- ybar - z*se.ybar

list(B2,B1)}
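
For instance, applied to a simulated sample (any numeric vector works):

set.seed(2)
y <- rnorm(250, mean=2.192, sd=1)   # a fake sample of 250 viewing times
srs.mu(y, N=1872, cuacua=0.95)      # lower and upper 95% bounds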

Sample size for estimating population means

• How large must a sample be? Observations cost money, time, and effort.

• The number of observations needed to estimate a population mean µ with a bound on the error of estimation of magnitude ε is obtained by solving the following equation for n:

$$2\sqrt{Var(\bar{y})} = 2\sqrt{\frac{s^2}{n} \left(\frac{N-n}{N}\right)} = \varepsilon$$

(as $z_{0.025} = 1.96$, we approximate this value by 2 for 95% confidence).

• Hence, the sample size required to estimate µ with a bound on the error of estimation ε is

$$n = \frac{N \cdot s^2}{\frac{N}{4} \varepsilon^2 + s^2}$$

• Note that s² must be estimated beforehand by some other argument (see the example below, where the range is used).

EXAMPLE

• The average amount of money µ for a hospital's accounts receivable must be estimated. Although no prior data are available to estimate the population variance σ², it is known that most accounts lie within a €100 range. There are N = 1000 open accounts. Find the sample size needed to estimate µ with a bound on the error of estimation ε = €3.

• First we estimate the population variance. Since the range is often approximately equal to four or six standard deviations (4·s or 6·s), depending on the normality of the data (see Chebyshev's inequality), then

$$s \approx \frac{range}{4} = \frac{100}{4} = 25$$

so s² ≈ 625, and

$$n = \frac{N \cdot s^2}{\frac{N}{4} \varepsilon^2 + s^2} = \frac{1000 \cdot 625}{\frac{1000}{4} \cdot 9 + 625} = 217.39 \approx 217 \text{ or } 218 \text{ observations}$$

See a function to calculate sample sizes programmed in R:

n.mu <- function(N,s2,epsilon) {
  # n to estimate the population mean
  D <- (epsilon^2)/4
  n <- (N*s2)/((N*D)+s2)
  n.round <- round(n)
  cbind(n.round)
}

# Application
n.mu(N=1000,s2=625,epsilon=3)

• Many sample surveys are concerned with a population total, e.g. when analyzing total accounts.

• The population total (the sum of all observations in the population) is denoted by the symbol τ. Hence

$$N\mu = \tau$$

• The estimator of τ is expected to be N times the estimator of µ, so:

– Estimator of the population total

$$\hat{\tau} = N\bar{y} = \frac{N \sum_{i=1}^{n} y_i}{n}$$

– Estimated variance of τ̂

$$\widehat{Var}(\hat{\tau}) = \widehat{Var}(N\bar{y}) = N^2 \left(1 - \frac{n}{N}\right) \frac{s^2}{n},$$

where $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2$
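
In the same spirit as srs.mu above, a minimal sketch for the total (the function name srs.tau is ours):

srs.tau <- function(y, N) {
  # Estimate the population total and its standard error from an SRS
  n <- length(y)
  tau.hat <- N * mean(y)
  var.tau <- N^2 * (1 - n/N) * var(y)/n
  list(total=tau.hat, se=sqrt(var.tau))
}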

Systematic Sampling
• The method of systematic sampling reduces the effort required for the sample selection.

• Systematic sampling is easy to apply, involving simply taking every k-th element after a random start.

• Example: suppose that a sample of 250 students is required from a school with 2000 students. The

sampling fraction is 250/2000, or 1 in 8.

• A systematic sample of the required size would then be obtained by taking a random number between 1 and 8 to determine the first student in the sample, and taking every eighth student thereafter.

• If the random number were 5, the selected students would be the fifth, thirteenth, twenty-first, and so on, down the list (see the short R sketch after this list).

• When the sampling interval is not an integer, we can round it to an integer, with a resultant change in the sample size.
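
The short R sketch referred to above: systematic selection is just an arithmetic sequence (here for a list of 2000 and interval k = 8):

start <- sample(1:8, 1)            # random start between 1 and 8
chosen <- seq(start, 2000, by=8)   # every eighth student thereafter
length(chosen)                     # always 250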

• Example: If the fraction is 250/1872, or 1 in 7.488, a 1 in 7 sample would produce a sample of 267 or 268, while a 1 in 8 sample would produce a sample of 234.

• Another solution is to round the interval down, start with an element selected at random from the N elements in the population, and proceed until the desired sample size has been achieved. The list is then treated as circular, with the last listing followed by the first (a sketch of this variant appears after this list).

• Like SRS, systematic sampling gives each element in the population the same chance of being selected

for the sample.

• It differs, however, from SRS in that the probabilities of different sets of elements being included in the

sample are not all equal.

• In systematic sampling the sample mean is a reasonable estimator of the population mean. However, the unequal probabilities of sets of elements mean that the SRS standard error formulae are not directly applicable.
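
The circular variant mentioned above can be sketched as follows (our own helper, not from a package):

circular.systematic <- function(n, N) {
  k <- floor(N/n)            # interval rounded down
  start <- sample(1:N, 1)    # random start anywhere in the list
  # The list is treated as circular: positions wrap modulo N
  ((start + k*(0:(n - 1)) - 1) %% N) + 1
}

circular.systematic(250, 1872)   # exactly 250 students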

• In order to estimate the standard error of estimators based on systematic samples, it is sometimes reasonable to assume that the list is approximately randomly ordered, in which case the sample can be treated as if it were a simple random sample.

• Lists arranged in alphabetical order may often be reasonably treated in this way.

• Systematic sampling performs badly when the list is ordered in cycles of values of the survey variables

and when the sampling interval coincides with a multiple of the length of the cycle.

• Systematic sampling is widely used in practice without excessive concern for the damaging effects of

undetected cycles in the ordering of the list.

• See a visual demonstration in:

library(animation)

sample.system()

• See a function to perform systematic sampling programmed in R:

systematic.sample <- function(n, N, initial=F){
  k <- floor(N/n)
  if(initial==F){
    initial <- sample(1:k,1)
  }
  cat("Interval=", k, " Starting value=", initial, "\n")
  # Put the origin at the value 'initial'
  shift <- (1:N) - initial
  # Keep the positions that are multiples of k away from 'initial'
  # (equivalent to finding a zero remainder: shift %% k == 0)
  guy <- (1:N)[(shift %% k) == 0]
  return(guy)
}
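
For example, for the school list above:

set.seed(3)
systematic.sample(n=250, N=1872)   # a 1-in-7 sample of 267 or 268 students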

Sampling with probabilities proportional to size

• Sometimes it is advantageous to select sampling units with different (non-uniform) probabilities.

• The method is called sampling with probabilities proportional to size, or pps sampling.

• For a sample y1, y2, . . . , yn from a population of size N, let πi be the probability that yi appears in the sample.

• The pps estimator of µ only produces smaller variances than a standard SRS if the probabilities πi are approximately proportional to the size of the yi under investigation.

• In this case, the estimator of the population mean µ is

$$\hat{\mu}_{pps} = \frac{1}{Nn} \sum_{i=1}^{n} \frac{y_i}{\pi_i}$$

and the estimated variance of µ̂ is

$$\widehat{Var}(\hat{\mu}_{pps}) = \frac{1}{N^2 n(n-1)} \sum_{i=1}^{n} \left( \frac{y_i}{\pi_i} - N \cdot \hat{\mu}_{pps} \right)^2$$

• The best practical way to choose the probabilities πi is to make them proportional to a known measurement that is highly correlated with yi.

• See the library pps of R from:

http://cran.r-project.org/web/packages/pps/index.html

Example (from Scheaffer et al., p. 80):

An investigator wishes to estimate the average number of defects per keyboard on keyboards of elec-

tronic components manufactured for installation in computers. The keyboards contain varying numbers of

components, and the investigator feels that the number of defects should be positively correlated with the

number of components on a keyboard.

Thus, pps sampling is used, with the probability of selecting any one keyboard for the sample being proportional to the number of components on that keyboard. A sample of n = 4 keyboards is to be selected

from the N = 10 keyboards of one day’s production. The number of components on the 10 keyboards are,

respectively: 10, 12, 22, 8, 16, 24, 9, 10, 8, 31.

After the sampling was completed, the number of defects found on the four keyboards were, respectively,

1, 3, 2 and 1. Estimate the average number of defects per keyboard, and place a bound on the error of

estimation.

data <- 1:10
N <- length(data)
n <- 4

weights <- c(10, 12, 22, 8, 16, 24, 9, 10, 8, 31)
probs <- weights/sum(weights)
who <- sample((1:N), n, prob=probs, replace=FALSE)

# In Scheaffer et al. the numbers of defects for the four sampled
# boards were as follows (note that 'who' is drawn at random here,
# so the numerical result varies from run to run)
yi <- c(1, 3, 2, 1)

m.pps <- (1/(N*n))*sum(yi/probs[who])
m.pps

var.pps <- (1/((N^2)*n*(n-1)))*sum(((yi/probs[who]) - (N*m.pps))^2)
err <- 2*sqrt(var.pps)
cat("interval 95% -> [", m.pps-err, ";" , m.pps+err, "]","\n")

Consider an SRS based on the survey library:

http://faculty.washington.edu/tlumley/survey/

We consider this example:

# Artificial Data

mydata <- rbind(matrix(rep("nc",165),165,1,byrow=TRUE),

matrix(rep("sc",70),70,1,byrow=TRUE))

mydata <- cbind.data.frame(mydata,c(rep(1,100),rep(2,50),rep(3,15),

rep(1,30),rep(2,40)),100*runif(235))

names(mydata) <- c("state","region","income")

N <- dim(mydata)[[1]]

n <- 50

# Export data to a file with Stata format

library(foreign)

write.dta(mydata,"C:/QM/mydata.dta")

# Selection of a sample
srs_rows <- sample(N,n)
srs <- mydata[srs_rows,]

library(survey)

srs$popsize <- N
dsrs <- svydesign(id=~1, fpc=~popsize, data=srs)
summary(dsrs)

svytotal(~income, dsrs, na.rm=TRUE)
svymean(~income, dsrs, na.rm=TRUE)
svyvar(~income, dsrs, na.rm=TRUE)
svyquantile(~income, quantile=c(0.25,0.5,0.75), design=dsrs, na.rm=TRUE, ci=TRUE)

par(mfrow=c(2,1))
svyhist(~income, dsrs, main="Survey weighted", col="red")
hist(mydata$income, main="Population", xlab="income", col="yellow", prob=TRUE)

Consider an SRS programmed in Stata:

* Read the previous artificial data

use C:\QM\mydata.dta

count

* Fix the seed of the randomization

set seed 666

* Take a sample equal to 20% of the population

sample 20

count

* Compute weights and the factor of population correction

gen pw = 235/47

gen fpc = 235

* Set the sampling design

svyset [pweight=pw], fpc(fpc)

svydescribe

* Compute several statistics

svy: mean income

svy: total income

svy linearized : tabulate region

svy linearized : tabulate state

svy: tabulate region state, row se ci format(%7.4f)

* Compute a box-plot

graph box income [pweight=pw]
