
Note 3 - Part 1 Simple Random Sampling

This document discusses simple random sampling. Simple random sampling involves selecting a sample from a population so that every possible sample of a given size has an equal chance of being selected. The key aspects are:
• Every element in the population has an equal chance of being selected for the sample.
• Every possible sample of the chosen size is equally likely, so no group of elements is favored over another.
• Common uses are to estimate population means, totals, and proportions from sample data.
• Samples can be selected using random number tables or computer programs to ensure randomness.

Simple Random Sampling

Introduction
• The objective of a sample survey is to make an inference about population parameters from information contained in a
sample.
• Two factors affect the quantity of information contained in the sample and hence the precision of our inference-making
procedure.
• The first is the size of the sample selected from the population.
• The second is the amount of variation in the data; variation can frequently be controlled by the method of selecting the
sample.
• The procedure for selecting the sample is called the sample survey design.
• For a fixed sample size n, we will consider various designs, or sampling procedures, for obtaining the n observations in the
sample.
• Because observations cost money, a design that provides a precise estimator of the parameter for a fixed sample size yields a
savings in cost to the experimenter.
• The basic design, or sampling technique, called simple random sampling is discussed in this chapter.

DEFINITION

If a sample of size n is drawn from a population of size N such that every possible sample of size n has the same chance of
being selected, the sampling procedure is called simple random sampling. The sample thus obtained is called a simple random
sample.
• It is a consequence of this definition that all individual elements in a population have the same chance, n/N, of being selected. When sampling without replacement, the selections are not strictly independent: once a given element is known to be in the sample, the chance that any other element is also included is (n - 1)/(N - 1) rather than n/N. This dependence is negligible when n is small relative to N.
• We will use simple random sampling to obtain estimators for population means, totals, and proportions.

• Consider the following problem. A federal auditor is to examine the accounts for a city hospital. The hospital records
obtained from a computer data file show a particular accounts receivable total, and the auditor must verify this total. If there
are 28,000 open accounts in the hospital, the auditor cannot afford the time to examine every patient record to obtain a total
accounts receivable figure. Hence, the auditor must choose some sampling scheme for obtaining a representative sample of
patient records. After examining the patient accounts in the sample, the auditor can then estimate the accounts receivable
total for the entire hospital. If the computer figure lies within a specified range of the auditor’s estimate, the computer figure
is accepted as valid. Otherwise, more hospital records must be examined for possible discrepancies between the computer
figure and the sample data.
• Suppose that all N = 28,000 patient records are recorded in a computer file and a sample size n = 100 is to be drawn. The
sample is called a simple random sample if every possible sample of n = 100 records has the same chance of being selected.

How to Draw a Simple Random Sample
• To draw a simple random sample from the population of interest is not as simple as it may first appear. How can we draw a
sample from a population in such a way that every possible sample of size n has the same chance of being selected?
• Simple random samples can be selected by using tables of random numbers.
• A random number table is a set of integers generated so that in the long run the table will contain all ten integers (0, 1, . . . ,
9) in approximately equal proportions, with no trends in the pattern in which the digits are generated. Thus, if one number is
selected from a random point in the table, it is equally likely to be any of the digits 0 through 9.

• Choosing numbers from the table is similar to drawing numbers out of a hat containing those numbers on thoroughly mixed
pieces of paper.
• Suppose we want a simple random sample of three people to be selected from seven.
• We could number the people from 1 to 7, put slips of paper containing these numbers (one number on each slip) into a hat,
mix them, and draw out three, without replacing the drawn numbers.
• Similarly, we could drop a pencil point on a random starting point in a random number table.

• Suppose the point falls on the 15th line of column 9 and we decide to use the rightmost digit (a 5, in this case).
• This procedure is like drawing a 5 from the hat.
• We may now proceed in any direction to obtain the remaining numbers in the sample.
• Suppose we decide before starting to proceed down the page.
• The number immediately below the 5 is a 2, so our second sampled person is number 2.
• Proceeding, we next come to an 8, but there are only seven people in our population; hence, the 8 must be ignored.
• Two more 5s then appear, but both must be ignored because person 5 has already been selected. (The 5 has been removed from the hat.)
• Finally, we come to a 1, and our sample of three is completed with persons numbered 5, 2, and 1.
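
The walk down the table described above can be sketched in Python. This is a minimal illustration rather than the table procedure itself: the function name `draw_from_digits` is hypothetical, and the digit stream 5, 2, 8, 5, 5, 1 is simply the sequence of digits read in the text.

```python
def draw_from_digits(digits, population_size, sample_size):
    """Select labels 1..population_size from a stream of random digits.

    Digits outside the population (including 0) and digits already
    drawn are skipped, mirroring the 'hat' analogy in the text.
    """
    sample = []
    for d in digits:
        if d < 1 or d > population_size:
            continue  # no person carries this number
        if d in sample:
            continue  # this slip has already been removed from the hat
        sample.append(d)
        if len(sample) == sample_size:
            return sample
    raise ValueError("digit stream exhausted before the sample was complete")


# Digits read down the column in the text: 5, 2, 8, 5, 5, 1
print(draw_from_digits([5, 2, 8, 5, 5, 1], population_size=7, sample_size=3))
```

With the digits from the text, the sketch returns persons 5, 2, and 1, skipping the 8 and the two repeated 5s.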

• Note that any starting point can be used and we can move in any predetermined direction.
• If more than one sample is to be used in any problem, each should have its own unique starting point.
• Many computer programs, such as MINITAB, can be used to generate random numbers. A more realistic illustration is given
in Example 1.

EXAMPLE 1 For simplicity, assume there are N = 1000 patient records from which a simple random sample of n = 20 is to be
drawn. We know that a simple random sample will be obtained if every possible sample of n = 20 records has the same chance
of being selected. The digits in the random number table, and in any other table of random numbers, are generated to satisfy the
conditions of simple random sampling. Determine which records are to be included in a sample of size n = 20.

SOLUTION We can think of the accounts as being numbers 001, 002, . . . , 999, 000. That is, we have 1000 three-digit numbers,
where 001 represents the first patient record, 999 the 999th patient record, and 000 the 1000th.


• Refer to the random number table and use the first column; if we drop the last two digits of each number, we see that the first three-digit number formed is 104, the second is 223, the third is 241, and so on.
• Continuing until we have 20 distinct three-digit numbers, we obtain the numbers shown in the next table.
• If the records are actually numbered, we merely choose the records with the corresponding numbers, and these records represent a simple random sample of n = 20 from N = 1000.
• If the patient accounts are not numbered, we can refer to a list of the accounts and count from the 1st to the 10th, 23rd, 70th, and so on, until the desired numbers are reached.
• If a random number occurs twice, the second occurrence is omitted, and another number is selected as its replacement.
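
Where a computer is available, the same selection can be sketched with Python's standard library. This is an illustrative sketch only: the seed value and the helper name `label_to_record` are assumptions, not part of the original example. `random.sample` draws without replacement, so no record can be chosen twice.

```python
import random


def label_to_record(label):
    """Map a three-digit label to a record number: '001' -> 1, ..., '000' -> 1000."""
    value = int(label)
    return 1000 if value == 0 else value


N, n = 1000, 20
rng = random.Random(2024)  # arbitrary seed, fixed only so the run is repeatable

# Draw n distinct record numbers from 1..N without replacement.
records = sorted(rng.sample(range(1, N + 1), n))
print(records)

# The labelling convention from the solution: 000 stands for the 1000th record.
print(label_to_record("104"), label_to_record("000"))
```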

Estimation of a Population Mean and Total
• One way to make inferences is to estimate certain population parameters by using the sample information.
• The objective of a sample survey is often to estimate a population mean, denoted by 𝜇, or a population total, denoted by 𝜏.
• Thus, the auditor in Example 1 might be interested in the mean dollar value for the accounts receivable or the total dollar
amount in these accounts. Hence, we consider the estimation of the two population parameters, 𝜇 and 𝜏, in this section.
• Suppose that a simple random sample of n accounts is drawn, and we are to estimate the mean value per account for the
total population of hospital records.
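• Intuitively, we employ the sample average to estimate μ:

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$

• In simple random sampling each element has inclusion probability $\pi_i = n/N$, and the unbiased estimator of the population total, τ, is given by

$$\hat{\tau} = \sum_{i=1}^{n} \frac{y_i}{\pi_i} = \frac{N}{n}\sum_{i=1}^{n} y_i = N\bar{y}$$

• Because the population mean is related to the total by the equation $\tau/N = \mu$, the sample mean will be an unbiased estimator of the population mean. That is,

$$\hat{\mu} = \frac{\hat{\tau}}{N} = \bar{y}, \qquad E(\bar{y}) = \mu$$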

• Of course, a single value of ȳ tells us very little about the population mean μ, unless we are able to evaluate the goodness of our estimator.
• Hence, in addition to estimating μ, we would like to place a bound on the error of estimation. To accomplish this we need the variance of the estimator; for a simple random sample chosen without replacement from a population of size N,

$$V(\bar{y}) = \frac{\sigma^2}{n}\left(\frac{N-n}{N-1}\right)$$

Proof

• The variance of the estimator ȳ is the same as that given in an introductory course except that it is multiplied by a correction factor to adjust for sampling from a finite population.
• The correction factor takes into account the fact that an estimate based on a sample of n = 10 from a population of N = 20 items contains more information about the population than a sample of n = 10 from a population of N = 20,000.
• Returning to the example in Section 3 in which samples of size n = 2 were selected from the population {1, 2, 3, 4}, we can now demonstrate the properties of the sample mean described in these formulas. The table shows the six possible samples of size 2 and the related sample statistics.
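
The six-sample enumeration described above can be reproduced with a short Python sketch. It assumes the estimated variance of the sample mean takes the form (1 - n/N)·s²/n; the variable names are illustrative.

```python
from itertools import combinations
from statistics import mean, pvariance, variance

population = [1, 2, 3, 4]
N, n = len(population), 2

mu = mean(population)            # population mean
sigma2 = pvariance(population)   # population variance (divisor N)

samples = list(combinations(population, n))  # all equally likely samples
ybars = [mean(s) for s in samples]
# Estimated variance of the sample mean for each sample: (1 - n/N) * s^2 / n
vhats = [(1 - n / N) * variance(s) / n for s in samples]

e_ybar = mean(ybars)                               # E(ybar)
v_ybar = mean((yb - e_ybar) ** 2 for yb in ybars)  # V(ybar)
formula = (sigma2 / n) * (N - n) / (N - 1)         # theoretical V(ybar)

for s, yb, vh in zip(samples, ybars, vhats):
    print(s, yb, vh)
print(mu, e_ybar, v_ybar, formula, mean(vhats))
```

The last line lets the reader compare the mean of the sample means with μ, the directly computed variance with the formula, and the average of the estimated variances with the true variance.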

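• If a single observation y is selected at random from this population, then y can take on any of the four possible values, each with probability 1/4. Thus,

$$\mu = E(y) = \sum y\,p(y) = \frac{1 + 2 + 3 + 4}{4} = 2.5$$

$$\sigma^2 = V(y) = \sum (y - \mu)^2 p(y) = \frac{(1-2.5)^2 + (2-2.5)^2 + (3-2.5)^2 + (4-2.5)^2}{4} = \frac{5}{4}$$

Because each of the six sample means can occur with probability 1/6, we can compute $E(\bar{y})$ and $V(\bar{y})$. From our definition of expected value,

$$E(\bar{y}) = \frac{1.5 + 2.0 + 2.5 + 2.5 + 3.0 + 3.5}{6} = 2.5 = \mu$$

$$V(\bar{y}) = \sum (\bar{y} - \mu)^2 p(\bar{y}) = \frac{(-1)^2 + (-0.5)^2 + 0^2 + 0^2 + (0.5)^2 + 1^2}{6} = \frac{5}{12}$$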


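Thus, we have demonstrated that

$$E(\bar{y}) = \mu \qquad \text{and} \qquad V(\bar{y}) = \frac{\sigma^2}{n}\left(\frac{N-n}{N-1}\right) = \frac{5/4}{2}\left(\frac{4-2}{4-1}\right) = \frac{5}{12}$$

and that $\hat{V}(\bar{y})$ is an unbiased estimator of $V(\bar{y})$. The key results of this section are summarized next.

• Estimator of the population mean μ: $\hat{\mu} = \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$
• Estimated variance of ȳ: $\hat{V}(\bar{y}) = \left(1 - \frac{n}{N}\right)\frac{s^2}{n}$, where $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2$
• Bound on the error of estimation: $2\sqrt{\hat{V}(\bar{y})}$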


• The quantity 1 – n/N is called the finite population correction (fpc).


• Note that this correction factor differs slightly from the one encountered in the true variance of ȳ.
• When n remains small relative to the population size N, the fpc is close to unity.
• Practically speaking, the fpc can be ignored if 1 – n/N ≥ 0.95, or equivalently, n ≤ (1/20)N.
• In that case, the estimated variance of ȳ is the more familiar quantity s²/n.
• In many cases, the population size is not clearly defined or is unknown.
• Suppose very small laboratory specimens are selected from a large bulk tank of raw sugar in order to measure pure sugar
content. How N will be determined is unclear, but it can generally be assumed to be quite large. Hence, the fpc can be
ignored.
• If a sample of voters is selected from the population of a state, to obtain a precise N for that point in time is generally
impossible. Again, N is assumed large and the fpc is ignored.
• Some texts present the fpc as (N - n)/N; we prefer 1 – n/N because it highlights the role of the sampling fraction n/N. The
sampling fraction is often denoted by f = n/N, in which case the fpc can be represented as 1 - f.
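
As a small illustration of the rule of thumb above, the following Python sketch computes the fpc and checks whether it could reasonably be ignored. The function names are hypothetical; the 0.95 default is the threshold quoted in the text.

```python
def fpc(n, N):
    """Finite population correction 1 - f, where f = n/N is the sampling fraction."""
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    return 1 - n / N


def fpc_negligible(n, N, threshold=0.95):
    """Rule of thumb from the text: ignore the fpc when 1 - n/N >= threshold."""
    return fpc(n, N) >= threshold


print(fpc(10, 20), fpc_negligible(10, 20))        # half the population sampled
print(fpc(10, 20000), fpc_negligible(10, 20000))  # tiny sampling fraction
```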
• In theory, if a two-standard deviation bound on the error (often called a margin of error) is subtracted from and added to the
sample mean, the resulting confidence interval has approximately a 95% chance of capturing the population mean within its
boundaries.
• This result is built on a theory that requires the sample mean in question to have approximately a normal distribution. To illustrate how this works, we return to the brain weight data. The figure on slide 34 of Note 2 shows that, for data on the original scale, the sampling distribution for the mean of samples of size 5 is highly skewed.
• The figure on slide 36 of Note 2 shows that, for data on the logarithmic scale, the sampling distribution of sample means looks quite normal.

• How is this behavior of sampling distributions reflected in the performance of confidence intervals? Figure shows 50
confidence intervals constructed from random samples of size 5 with a two-standard deviation bound on the error using the
original population data for brain weights. Only 28 of the intervals (56%) cover the true population mean of 394.5; many of
the intervals are too short and lie too far to the left.


• Using the same method on the log-transformed data results in the intervals in the figure below. Here, 48 of the calculated intervals (96%) cover the population mean of 2.98.
• Quite a difference! Not only that, the intervals in this figure are also more uniform in length.
• The message to be learned is that the results of this section will not work well unless there is reasonable assurance that the
sample means being studied have sampling distributions that are not too far from normal.
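
The coverage comparison described above can be explored with a small simulation. The brain weight data are not reproduced in these notes, so the sketch below substitutes a synthetic, strongly right-skewed population (an assumption made purely for illustration); the population size, seed, and number of replications are likewise arbitrary, and the coverage figures it prints will not match the 56% and 96% quoted above.

```python
import math
import random
from statistics import mean, variance


def coverage(population, n, reps, rng):
    """Fraction of two-standard-deviation intervals that cover the population mean."""
    N = len(population)
    mu = mean(population)
    hits = 0
    for _ in range(reps):
        sample = rng.sample(population, n)  # simple random sample, no replacement
        ybar = mean(sample)
        bound = 2 * math.sqrt((1 - n / N) * variance(sample) / n)
        if ybar - bound <= mu <= ybar + bound:
            hits += 1
    return hits / reps


rng = random.Random(1)
# Synthetic stand-in for skewed data: log-normal values (NOT the brain weight data).
raw = [rng.lognormvariate(0, 2) for _ in range(60)]
logged = [math.log(v) for v in raw]

cov_raw = coverage(raw, n=5, reps=2000, rng=rng)
cov_log = coverage(logged, n=5, reps=2000, rng=rng)
print(f"original scale: {cov_raw:.2f}, log scale: {cov_log:.2f}")
```

The design choice here is to reuse one interval routine on both scales, so that any difference in coverage comes from the shape of the population rather than from the method.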


EXAMPLE Refer to the hospital audit in Example 1 and suppose that a random sample of n = 200 accounts is selected from the
total of N = 1000. The sample mean of the accounts is found to be ȳ = $94.22, and the sample variance is s² = 445.21.
Estimate μ, the average due for all 1000 hospital accounts, and place a bound on the error of estimation.
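
The calculation for this example can be sketched as follows. The function name is illustrative, and the sketch assumes the estimated variance (1 - n/N)·s²/n with a two-standard-deviation bound on the error.

```python
import math


def estimate_mean(ybar, s2, n, N):
    """Return the estimate of the population mean and a two-standard-deviation bound."""
    v_hat = (1 - n / N) * s2 / n  # estimated variance of the sample mean, with fpc
    return ybar, 2 * math.sqrt(v_hat)


mu_hat, bound = estimate_mean(ybar=94.22, s2=445.21, n=200, N=1000)
print(f"estimate = {mu_hat:.2f}, bound = {bound:.2f}")
```

Under these assumptions the estimate is $94.22 with a bound of about $2.67. Here n/N = 0.2, so the fpc is not negligible.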


EXAMPLE A simple random sample of n = 9 hospital records is drawn to estimate the average amount of money due on N = 484
open accounts. The sample values for these nine records are listed in Table. Estimate μ, the average amount outstanding, and
place a bound on your error of estimation.

Try


As we have already seen, many sample surveys are conducted to obtain information about a population total. The federal auditor in Example 1 would probably be interested in verifying the computer figure for the total accounts receivable (in dollars) for the N = 1000 open accounts. The population total is denoted by the symbol τ. Because τ = Nμ, the estimator of τ is N times the estimator of μ, that is, $\hat{\tau} = N\bar{y}$. It is also true that the margin of error for estimating a total is N times the margin of error for estimating the mean:

$$2\sqrt{\hat{V}(N\bar{y})} = 2\sqrt{N^2\left(1 - \frac{n}{N}\right)\frac{s^2}{n}}$$

EXAMPLE An industrial firm is concerned about the time spent each week by scientists on certain trivial tasks. The time-log
sheets of a simple random sample of n = 50 employees show the average amount of time spent on these tasks is 10.31 hours,
with a sample variance of 2.25. The company employs N = 750 scientists. Estimate the total number of worker-hours lost each
week on trivial tasks and place a bound on the error of estimation.
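
A possible working of this example is sketched below. It relies on the statement that the estimator of the total, and its margin of error, are N times those for the mean; the function name is hypothetical.

```python
import math


def estimate_total(ybar, s2, n, N):
    """Return N * ybar and a two-standard-deviation bound for the population total."""
    v_hat_mean = (1 - n / N) * s2 / n      # estimated variance of the sample mean
    bound = 2 * N * math.sqrt(v_hat_mean)  # N times the bound for the mean
    return N * ybar, bound


tau_hat, bound = estimate_total(ybar=10.31, s2=2.25, n=50, N=750)
print(f"estimated total = {tau_hat:.1f} hours, bound = {bound:.1f} hours")
```

With the figures from the example this gives an estimated total of about 7732.5 hours per week, with a bound of roughly 307.4 hours.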

Selecting the Sample Size for Estimating Population Means and Totals

• At some point in the design of the survey, someone must make a decision about the size of the sample to be selected from the population. So far, we have discussed a sampling procedure (simple random sampling) but have said nothing about the number of observations to be included in the sample.
• The number of observations needed to estimate a population mean μ with a bound on the error of estimation of magnitude B is found by setting two standard deviations of the estimator, ȳ, equal to B and solving this expression for n. That is, we must solve

$$2\sqrt{V(\bar{y})} = 2\sqrt{\frac{\sigma^2}{n}\left(\frac{N-n}{N-1}\right)} = B$$

for n, which gives

$$n = \frac{N\sigma^2}{(N-1)D + \sigma^2}, \qquad D = \frac{B^2}{4}$$

Solving for n in a practical situation presents a problem because the population variance is unknown. Because a sample variance is frequently available from prior experimentation, we can obtain an approximate sample size by replacing the population variance with the sample variance in the above equation. We illustrate a method for guessing a value of σ² when very little prior information is available. If N is large, as it usually is, then (N - 1) can be replaced by N in the denominator of the above equation.

EXAMPLE The average amount of money μ for a hospital’s accounts receivable must be estimated. Although no prior data are available to estimate the population variance, it is known that most accounts lie within a $100 range. There are N = 1000 open accounts. Find the sample size needed to estimate μ with a bound on the error of estimation B = $3.
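
One way to work this example is sketched below. It assumes the sample-size formula n = Nσ²/((N - 1)D + σ²) with D = B²/4, obtained by setting two standard deviations of ȳ equal to B, and it guesses σ as one-fourth of the range. That rule of thumb is an assumption here rather than something stated in the example.

```python
import math


def sample_size_for_mean(sigma2, N, B):
    """Smallest n whose two-standard-deviation bound on the mean is at most B."""
    D = B ** 2 / 4
    n = N * sigma2 / ((N - 1) * D + sigma2)
    return math.ceil(n)


sigma_guess = 100 / 4  # range / 4: rough guess when most values span about 4 sigma
n_needed = sample_size_for_mean(sigma2=sigma_guess ** 2, N=1000, B=3)
print(n_needed)
```

Under these assumptions the formula gives about 217.6, which rounds up to a sample of 218 accounts.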


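Likewise, we can determine the number of observations needed to estimate a population total τ, with a bound on the error of estimation of magnitude B. The required sample size is found by setting two standard deviations of the estimator equal to B and solving this expression for n. That is, we must solve

$$2\sqrt{V(N\bar{y})} = 2N\sqrt{\frac{\sigma^2}{n}\left(\frac{N-n}{N-1}\right)} = B$$

for n, which gives

$$n = \frac{N\sigma^2}{(N-1)D + \sigma^2}, \qquad D = \frac{B^2}{4N^2}$$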

EXAMPLE An investigator is interested in estimating the total weight gain in 4 weeks for N = 1000 chicks fed on a new ration. Obviously, to weigh each bird would be time consuming and tedious. Therefore, determine the number of chicks to be sampled in this study in order to estimate τ with a bound on the error of estimation equal to 1000 grams. Many similar studies on chick nutrition have been run in the past. Using data from these studies, the investigator found that σ², the population variance, was approximately equal to 36.00 grams². Determine the required sample size.
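
A sketch of the calculation for this example follows. It assumes the same sample-size formula as for the mean, n = Nσ²/((N - 1)D + σ²), but with D = B²/(4N²) because the bound now applies to the total; the function name is hypothetical.

```python
import math


def sample_size_for_total(sigma2, N, B):
    """Smallest n whose two-standard-deviation bound on the total is at most B."""
    D = B ** 2 / (4 * N ** 2)
    n = N * sigma2 / ((N - 1) * D + sigma2)
    return math.ceil(n)


print(sample_size_for_total(sigma2=36.0, N=1000, B=1000))
```

Under these assumptions the formula gives about 125.98, so 126 chicks would be sampled.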

Estimation of a Population Proportion
• The investigator conducting a sample survey is frequently interested in estimating the proportion of the population that
possesses a specified characteristic.
• For example, a congressional leader investigating the merits of an 18-year-old voting age may want to estimate the
proportion of the potential voters in the district between the ages of 18 and 21.
• A marketing research group may be interested in the proportion of the total sales market in diet preparations that is
attributable to a particular product.
• That is, what percentage of sales is accounted for by a particular product?
• You will recognize that all these examples exhibit a characteristic of the binomial experiment—that is, an observation either
does belong or does not belong to the category of interest.
• For example, we can estimate the proportion of eligible voters in a particular district by examining population census data for
several of the boundaries within the district.
• An estimate of the proportion of voters between 18 and 21 years of age for the entire district will be the fraction of potential voters in the sampled areas who fall into this age range.

• We denote the population proportion and its estimator by the symbols p and $\hat{p}$, respectively.
• The properties of $\hat{p}$ for simple random sampling parallel those of the sample mean ȳ if the response measurements are defined as follows.
• Let yi = 0 if the ith element sampled does not possess the specified characteristic and yi = 1 if it does.
• Then the total number of elements in a sample of size n possessing a specified characteristic is

$$\sum_{i=1}^{n} y_i$$

• If we draw a simple random sample of size n, the sample proportion $\hat{p}$ is the fraction of the elements in the sample that possess the characteristic of interest. For example, the estimate $\hat{p}$ of the proportion of eligible voters between the ages of 18 and 21 in a certain district is

$$\hat{p} = \frac{\sum_{i=1}^{n} y_i}{n} = \bar{y}$$

In other words, $\hat{p}$ is the average of the 0 and 1 values from the sample. Similarly, we can think of the population proportion as the average of the 0 and 1 values for the entire population (i.e., p = μ).

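• An unbiased estimate of the population variance is $\frac{n}{n-1}\hat{p}\hat{q}$, where $\hat{q} = 1 - \hat{p}$.
• With 0 and 1 responses, $s^2 = \frac{n}{n-1}\hat{p}\hat{q}$, so the estimated variance of $\hat{p}$ is $\hat{V}(\hat{p}) = \left(1 - \frac{n}{N}\right)\frac{\hat{p}\hat{q}}{n-1}$.
• The commonly used estimator is indeed slightly biased, but simpler in construction.
• The bias in the commonly used estimator is usually very small, so use of the simpler formulation has its understandable appeal, but we’ve chosen to use the unbiased statistic.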

EXAMPLE A simple random sample of n = 100 college seniors was selected to estimate (1) the fraction of N = 300 seniors going on to graduate school and (2) the fraction of students that have held part-time jobs during college. Let yi and xi (i = 1, 2, . . . , 100) denote the responses of the ith student sampled. We will set yi = 0 if the ith student does not plan to attend graduate school and yi = 1 if he or she does. Similarly, let xi = 0 if he or she has not held a part-time job sometime during college and xi = 1 if he or she has. Using the sample data presented in the accompanying table, estimate p1, the proportion of seniors planning to attend graduate school, and p2, the proportion of seniors who have had a part-time job sometime during their college careers (summers included).
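
The accompanying table is not reproduced in these notes, so the sketch below uses a hypothetical count of "yes" responses purely to show the calculation; only n = 100 and N = 300 come from the example. It assumes the estimated variance (1 - n/N)·p̂q̂/(n - 1) and a two-standard-deviation bound, and the function name is illustrative.

```python
import math


def estimate_proportion(successes, n, N):
    """Return the sample proportion and a two-standard-deviation bound on the error."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    v_hat = (1 - n / N) * p_hat * q_hat / (n - 1)  # unbiased form, divisor n - 1
    return p_hat, 2 * math.sqrt(v_hat)


# 30 'yes' answers is a made-up figure, NOT taken from the example's table.
p_hat, bound = estimate_proportion(successes=30, n=100, N=300)
print(f"p_hat = {p_hat:.2f}, bound = {bound:.3f}")
```

The same function would be applied once to the yi totals and once to the xi totals from the table to obtain the two requested estimates.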
