Sample Survey Merged
• provide an overview of the basic concept of sample survey, the objectives of a sample survey, the relative
suitability and application of complete and sample enumeration, and the terminology used in sample
survey
• know the common sampling designs and understand when it is appropriate to use each design.
• know how to estimate parameters in different sampling designs and understand how to
compare sampling designs or estimators.
Objective of this course
The course is designed to help learners understand the basic concept of sample survey and its
objectives and applications, become familiar with the terminology used in sample survey, learn random and
non-random sampling, and know the most common statistical designs for sampling. On completing this
course, learners should be able to prepare suitable sampling designs and recognize the
design used in an application, estimate population parameters and analyze the
data under different sampling designs, compare sampling designs or
estimators, and know what properties make a better design or estimator.
Learning outcomes of this course
After successfully completing this course, learners/students will be able to
• explain the basic concept of sample survey and the relative advantages and disadvantages of complete and
sample enumeration
• understand the basic principles of sample survey, the objectives of a sample survey, and the applications of
sample survey and complete enumeration
• be familiar with the necessary definitions and important terms used in sample survey
• be acquainted with various methods and tools of data collection
• know probability sampling and non-probability sampling
• be familiar with various methods of non-probability sampling
• know the most common designs for probability sampling, such as simple random sampling,
stratified random sampling and systematic sampling, with their applications and limitations
• know how to draw a sample, how to estimate parameters and how to analyze the data under various
sampling designs
• know how to compare various sampling designs
• use supplementary information in special methods of estimation such as ratio estimation,
product method of estimation and regression estimation
Contents:
Introduction: Basic concept of sample survey, relative advantages, disadvantages and suitability of
complete and sample enumeration, uses of sample survey, role of sampling theory, requirements of a
good sample design, units, population, sampling units, and sampling frame-related problems, basic
principles of sample survey, pilot survey, random or probability sampling and non-random or purposive
sampling, quota sample, polls, mixed sample.
Population values and estimates in sample survey: Bias and its effect, precision and accuracy of
estimates, different types of errors associated with sampling and complete enumeration, various
methods of data collection, questionnaire and schedule.
Simple random sampling (SRS): Advantages and disadvantages, drawing samples with and
without replacement, estimates and standard errors, simple random sampling for proportions:
estimates and standard errors, determination of sample size for specified precision, introduction to
other probability sampling schemes.
Stratified random sampling (StRS): Reasons for stratification, stratified random sampling:
estimates and standard errors, allocation of samples to strata: proportional allocation, Neyman allocation
(optimum without cost) and optimum allocation (with cost), stratified sampling for proportions,
determination of sample size, estimation of gain due to stratification, construction of strata,
method of collapsed strata, post-stratification, deep stratification, comparison with one-way
stratification.
Systematic Sampling: Uses, limitations, estimates, bias, standard error and efficiency, comparison
with simple random sampling, systematic sampling for populations with linear trend, methods for
dealing with populations in random order, populations with linear trend and populations with periodic
variation, circular systematic sampling.
Texts
1. Lohr, S. L. (2005): Sampling: Design and Analysis.
2. Cochran, W. G. (1977): Sampling Techniques, 3rd edition, Wiley Eastern, New Delhi.
References
1. Islam, M. N. (2014): An Introduction to Sampling Methods, 4th edition, Mullick and Brothers,
Dhaka.
2. Levy, P. and Lemeshow, S. (1999): Sampling of Populations: Methods and Applications,
Wiley, New York.
3. Mukhopadhyay, P. (2000): Theory and Methods of Survey Sampling, Prentice-Hall of India (P)
Limited, New Delhi.
4. Yates, F.: Sampling Methods for Censuses and Surveys.
5. Raj, D. and Chandhok, P. (1998): Sample Survey Theory, Narosa Publishing House, New
Delhi.
6. Sukhatme, P. V. and Sukhatme, B. V. (1984): Sampling Theory of Surveys with Applications, 2nd
edition, Asia Publishing House, London.
7. Singh, D. and Chaudhary, F. S. (1986): Theory and Analysis of Sample Survey Designs,
Wiley, New York.
Introduction to Sample Survey
Relative advantages and disadvantages of census and sample survey
tradition. It is also possible to understand the societal history through job titles and arrangements
for the destitute and sick.
As governments assumed responsibility for schooling and welfare, large government departments
made extensive use of census data. Actuarial estimates could be made to project populations and
plan for provision in local government and regions. It was also possible for central government to
allocate funding on the basis of census data. Even into the mid-twentieth century, census data were
only directly accessible to large government departments. However, the arrival of computers meant that
tabulations could be used directly by university researchers, large businesses and local government
offices. They could use the detail of the data to answer new questions and add to local and
specialist knowledge.
Now, census data are published in a wide variety of formats to be accessible to business, all levels
of governance, media, students and teachers, charities and researchers, and any citizen who is
interested. Data can be represented visually or analyzed in complex statistical models, to show the
difference between certain areas, or to understand the association between different personal
characteristics. Census data offer a unique insight into small areas and small demographic groups
which sample data would be unable to capture with precision.
Some uses of survey methods
Samples are taken to provide statistical data on an extensive range of subjects for
both research and administrative purposes. The following examples illustrate the
importance of sampling in real life:
a) In an opinion poll, a relatively small number of persons are interviewed, and their opinions on
current issues are solicited in order to discover the attitude of the community as a whole.
b) Marketing and advertising agencies conduct countless inquiries to determine customers’
expectations, attitudes, buying habits, or shopping patterns. This information is useful to
manufacturers of goods for sales promotion.
c) Large lots of manufactured products are accepted or rejected by purchasing departments
in business or government following inspection of a relatively small number of items drawn
from these lots.
d) At border stations, customs officers enforce the laws by checking the effects of only a
small number of travelers crossing the border.
e) A department store wishes to examine whether it is losing or gaining customers by
drawing a sample from its list of credit card holders, selecting every tenth name.
f) Auditors often judge the extent to which proper accounting procedures have been
followed by examining a small number of transactions, selected from the large number of
transactions taking place within a specified period of time.
g) The Ministry of Health and Family Welfare might be interested in knowing the level of
knowledge among the adult population of Dhaka city about the dangers of environmental
pollution.
Some other examples of uses of survey methods
Government agencies: Information collected from surveys conducted by government agencies has a
huge influence upon the ways our lives are regulated. Some examples are: labour force surveys for
monitoring the extent of unemployment, and the Consumer Price Index (CPI) which is based on a
survey of prices. (The weightings used to combine the prices are obtained by a survey of patterns of
expenditure.)
Acceptance sampling: Many manufacturers sample from batches of components and raw materials
being brought in. If the sample is not up to specified standards the batch will be sent back.
Accounting data in auditing: Accounting auditors cannot check all the accounts of a company in
detail. Instead they sample invoices or accounts and check just these carefully.
Economic forecasts: Business confidence is a very important ingredient in determining whether the
economy grows or contracts (a recession). Surveys of business opinion play an important part in
economic forecasts.
Ratings for TV/radio audiences: These are based upon the viewing or listening habits of a sample
of people. They determine the price of advertising and thus the income to be spent on
programming.
Sociological research: Investigations are carried out into the way we live, the way society is
organized, and the use of local and national facilities (e.g. national parks). This information can be
a basis for government policy decisions.
Tax collection: In many countries a sample of people has their tax returns audited in detail. This
tends to be in addition to a regular rotation in which everyone is audited every 5 years, say.
Useful terms in sampling
Population
A population is the collection or aggregate of all elements or items of interest in a particular study
about which we wish to make an inference.
In other words, a population is a complete set of items being studied in an inference procedure. A
population includes all of the elements under study. A population should always be defined in
terms of its content, units, coverage and time of occurrence.
In research terminology, the ‘population’ can be explained as a comprehensive group of individuals,
objects, institutions and so forth that share a common characteristic of interest to the
researcher.
Example: All college students constitute a population if the researcher is interested in studying
college students’ socio-economic background, their opinions, or any other issue concerning them.
Similarly, all patients, all students, all hospitals of Bangladesh, and all private banks in Dhaka city are
examples of populations.
Target Population
A target population is the entire group about which information is desired and conclusion is made.
The target population is the population you are interested in for your study, that is, the population to
which you want your study findings to be generalized.
Study population
The study population is the subpopulation of the target population from which you actually draw your
sample. That is, the population which we actually sample is the study
population.
It is also called the sampled population, survey population or accessible population.
Sample
Any part of a population is called a sample. A sample may or may not be representative; however,
a sample is desired to be representative for further statistical operations.
Let us see the following figure.
For example, suppose in a study on diabetes mellitus, we have drawn a sample of 200 diabetic
patients from BIRDEM. Then these 200 patients are the sample, all patients of BIRDEM are study
population and all diabetic patients are the target population for which the study findings will be
generalized.
Sampling Unit
A sampling unit or simply a unit is a well-defined, distinct and identifiable element or group of
elements on which observation is made. Each element in a population is a sampling unit.
Sample Size
Sample size refers to the number of units contained in a sample. It is usually denoted by n.
Population Size
Population size is the number of units which constitute the population. It is usually denoted by N.
Survey
Survey is a general term that refers to the collection of data by means of interviews, questionnaires
or observation.
Census Survey
A census survey, or simply a census, is an investigation or a count of all the population elements.
Sample Survey
A sample survey is a study involving a subset (or sample) of individuals or objects selected from a
larger population by accepted statistical methods.
Sampling
Sampling is a statistical procedure of drawing a small number of elements from a population (also
called the universe) in order to estimate population parameters and draw conclusions regarding the population.
Sample Design
Sample design or sampling design refers to the plans and methods to be followed in selecting a
sample from the target population, together with the estimation technique, i.e. the formulas for computing the
sample statistics.
Survey Design
Survey design is the process of preparing a complete plan of operations to be followed in
conducting a survey and disseminating its intended results.
Sampling Frame
A sampling frame is a complete list of units or groups of units in the population to be sampled. That
is, it is a complete list of everyone or everything we want to study. In other words, a sampling frame
is a complete list of sampling units.
Qualities of a sampling frame
An ideal sampling frame will have the following qualities:
• all units can be found – their contact information, map location or other relevant
information is present
• the frame has additional information about the units that allows the use of more advanced
sampling designs
• no elements from outside the population of interest are present in the frame
The ordering of the questions is important as it brings logic and flow to the interview. Normally the
respondent is eased into the task with relatively straightforward questions while the more difficult
or sensitive ones are left until they are warmed up. Questions on brand awareness are asked first
unprompted and then they are prompted.
Step 6 – Finalize the layout of the questionnaire
The questionnaire now needs to be fully formatted with clear instructions to the interviewer,
including a powerful introduction, routings and probes. There needs to be enough space to write in
answers, and the response codes need to be well separated from each other so there is no danger of
circling the wrong one.
Step 7 – Pretest and revise
The final step is to test the questionnaire. It usually isn’t necessary to carry out more than 10 to 20
interviews in a pilot because the aim is to make sure that it works, and not to obtain pilot results. In
theory the questionnaire should be piloted using the interviewing method that will be used in the
field (over the phone if telephone interviews are to be used; self-completed if it will be a
self-completion questionnaire). Time and money can preclude a proper pilot, so at the very least it
should be tested on one or two colleagues for sense, flow and clarity of instructions. The whole
purpose of the test is to find out if changes are needed so that final revisions can be made. When
carrying out the pilot it is best to run through the questionnaire with the guinea pig respondent and
then go back over the questions and ask for each one, “what was going through your mind when
you were asked this question?”. Questionnaire design is one of the hardest and yet one of the most
important parts of the market research process. Given the same objectives, two researchers would
probably never design the same questionnaire.
Statistic and Parameter
Definition of Statistic
A statistic is a characteristic of a sample, that is, of a small part of the population. It is a
descriptive statistical measure and a function of the sample observations. The common use of a statistic is
to estimate a particular population parameter.
From a given population it is possible to draw many samples, and the value of the statistic
varies from sample to sample.
Definition of Parameter
A fixed characteristic of a population, obtained from all the elements of the population, is termed
a parameter. It is a numerical value that does not change, since it is computed from every member of the
population. It indicates the true value, which would be obtained if a census were
conducted.
Key Differences Between Statistic and Parameter
The difference between statistic and parameter can be drawn clearly on the following grounds:
1. A statistic is a characteristic of a small part of the population, i.e. a statistic is a
characteristic of a sample. A parameter is a characteristic of a population. The parameter
is a fixed measure which describes the target population.
2. The statistic is a variable and known number which depends on the sample drawn from the population,
while the parameter is a fixed and unknown numerical value.
3. Statistical notations are different for population parameters and sample statistics.
Different symbols are used to denote statistics and parameters, as Table 1 shows some notations.
Table 1: Comparison of some useful sample statistics and population parameters
Measure              Sample statistic   Population parameter
Mean                 x̄                  μ
Standard deviation   s                  σ
Variance             s²                 σ²
Inferential statistics enables you to make an educated guess about a population parameter based on
a statistic computed from a sample randomly drawn from that population (see Figure 1).
Figure 1: Illustration of the relationship between population & sample and parameter & statistic
For example, say you want to know the mean income of freelancers—a parameter of a population.
You draw a random sample of 100 freelancers and determine that their mean income is Tk. 45,500
per month. You conclude that the population mean income μ is likely to be close to Tk. 45,500 as
well. This example is one of statistical inference.
Estimation, Estimator and Estimate
What is an Estimator?
An estimator is a statistic used for the purpose of estimating an unknown parameter. An estimator
is a function of the data in a sample. You can also think of an estimator as the rule that creates an
estimate. Common estimators are the sample mean and sample variance which are used to estimate
the unknown population mean and variance.
What is an Estimate?
An estimate is the numerical value of the estimator when it is actually computed using data from a
specific sample.
What is Estimation?
Estimation is the process by which the numerical values of unknown population parameters are inferred
from sample data.
What is the difference between an estimator and an estimate?
1. An estimator is a function of a sample of data to be drawn randomly from a population
whereas an estimate is the numerical value of the estimator computed from sample data.
2. An estimator is a random variable and an estimate is a number (that is the computed value
of the estimator).
Referring to the example above, we can see the population parameter, estimator and estimate in the
following table:
Population parameter   Estimator   Estimate
μ                      x̄           Tk. 45,500
Bias of an estimator
Suppose we are trying to estimate the parameter $\theta$ using an estimator $\hat{\theta}$ (that is, some function of
the observed data). Then the bias of $\hat{\theta}$ is defined to be
$$\operatorname{Bias}(\hat{\theta}) = E(\hat{\theta}) - \theta.$$
In words, this is "the expected value of the estimator minus the true value $\theta$." This may be
rewritten as
$$\operatorname{Bias}(\hat{\theta}) = E(\hat{\theta} - \theta),$$
which reads "the expected value of the difference between the estimator and the true value"
(the expected value of $\theta$ is precisely $\theta$, since $\theta$ is a constant).
In particular, if the bias is zero, the estimator is called unbiased. Then we have
$$0 = E(\hat{\theta}) - \theta \quad\Rightarrow\quad E(\hat{\theta}) = \theta \qquad (1)$$
Equation (1) gives the condition for unbiasedness of an estimator.
Example: Suppose we have 3 children in a family whose ages are 1, 3 and 5. We want to select
two of them.
In this particular instance, we say that we have a population of size 3 (i.e. N=3) from which a
sample of size 2 (i.e. n=2) is to be selected without replacement. To select these two children, there
will be altogether 3 possible samples, each of 2 children. The accompanying table displays all
possible samples of size 2 together with their sample means.
Table: All possible samples of size 2 without replacement
Sample No.   Sample (ages)   Sample mean x̄
1            1, 3            2
2            1, 5            3
3            3, 5            4
The population mean and the average of the sample means are
$$\mu = \frac{1 + 3 + 5}{3} = 3, \qquad E(\bar{x}) = \frac{2 + 3 + 4}{3} = 3.$$
That is,
$$E(\bar{x}) = \mu,$$
so $\bar{x}$ is an unbiased estimator of $\mu$.
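The enumeration in the table can be reproduced mechanically. The sketch below uses only the Python standard library; the ages 1, 3, 5 and n = 2 come from the example, while the function name `sampling_distribution_of_mean` is our own illustrative choice. Exact fractions are used so that the bias comes out as exactly zero.

```python
from fractions import Fraction
from itertools import combinations

ages = [1, 3, 5]   # population of N = 3 children (from the example)
n = 2              # sample size, drawn without replacement

def sampling_distribution_of_mean(population, n):
    """Return the sample mean of every possible SRSWOR sample of size n."""
    return [Fraction(sum(s), n) for s in combinations(population, n)]

sample_means = sampling_distribution_of_mean(ages, n)
mu = Fraction(sum(ages), len(ages))                     # population mean μ
expected_xbar = sum(sample_means) / len(sample_means)   # E(x̄): all samples equally likely
bias = expected_xbar - mu                               # Bias(x̄) = E(x̄) − μ

print("sample means:", [int(m) for m in sample_means])
print("E(x̄) =", expected_xbar, " μ =", mu, " bias =", bias)
```

Because every sample of size 2 is equally likely under SRSWOR, the plain average of the sample means is exactly E(x̄).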
Mean squared error
The mean squared error (MSE) or mean squared deviation (MSD) of an estimator measures the
average of the squares of the errors—that is, the average squared difference between the estimated
values and the actual value. The squaring is done so negative values do not cancel positive values.
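Suppose we are trying to estimate the parameter $\theta$ using an estimator $\hat{\theta}$ (that is, some function of
the observed data). Then the mean squared error of $\hat{\theta}$ is defined to be
$$\operatorname{MSE}(\hat{\theta}) = E(\hat{\theta} - \theta)^2$$
or, equivalently,
$$\operatorname{MSE}(\hat{\theta}) = \operatorname{Var}(\hat{\theta}) + \left[\operatorname{Bias}(\hat{\theta})\right]^2.$$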
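As a numerical check of the definition, the sketch below computes the exact MSE of x̄ for the three-children example (ages 1, 3, 5; n = 2 without replacement) and verifies MSE = Var + Bias². For contrast it also uses the sample maximum as a deliberately biased estimator of μ; that second estimator is our own illustration and is not proposed in the text.

```python
from fractions import Fraction
from itertools import combinations

ages = [1, 3, 5]
n = 2
mu = Fraction(sum(ages), len(ages))   # true parameter θ = μ = 3

def moments(estimator):
    """Exact E, Var, Bias and MSE of an estimator over all equally likely SRSWOR samples."""
    values = [Fraction(estimator(s)) for s in combinations(ages, n)]
    k = len(values)
    e = sum(values) / k
    var = sum((v - e) ** 2 for v in values) / k
    mse = sum((v - mu) ** 2 for v in values) / k   # E(θ̂ − θ)²
    return e, var, e - mu, mse

# Sample mean: unbiased, so its MSE equals its variance
e_mean, var_mean, bias_mean, mse_mean = moments(lambda s: Fraction(sum(s), len(s)))
# Sample maximum: biased upward (illustrative only)
e_max, var_max, bias_max, mse_max = moments(max)

for name, var, bias, mse in [("mean", var_mean, bias_mean, mse_mean),
                             ("max", var_max, bias_max, mse_max)]:
    print(f"{name}: Var={var}, Bias={bias}, MSE={mse}, Var+Bias^2={var + bias ** 2}")
```

For x̄ the MSE is 2/3 with zero bias; for the sample maximum the variance is smaller (8/9) but the squared bias (16/9) pushes the MSE up to 8/3.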
In a field study, because of the time and cost involved, generally only a section of the population is studied.
These respondents are known as the sample and are representative of the general population or
universe. A sample design is a definite plan for obtaining a sample from a population. It refers to
the technique or the procedure for obtaining a sample from a given population.
The following are the characteristics of a good sample design:
1. Sample design should ensure a representative sample: A researcher selects a relatively small
number for a sample from an entire population. This sample needs to closely match all the
characteristics of the entire population. If the sample used in an experiment is a representative
sample, then it will help generalize the results from a small group to large universe being studied.
2. Focus on objectives: The sampling method and sample size must be selected depending upon
the research objectives.
3. Proper selection of sample unit: The sample unit must be appropriate. As per the objectives, the
universe, which comprises the units, is defined first. Sometimes the universe comprises
elements, and each element can be further divided into units.
4. Sample design should have small sampling error: Sampling error is the error caused by
taking a small sample instead of the whole population for study. Sampling error refers to the
discrepancy that may result from judging all on the basis of a small number. Sampling error is
reduced by selecting a large sample and by using efficient sample design and estimation strategies.
5. Sample design should be economically viable: Studies have a limited budget called the
research budget. The sampling should be done in such a way that it is within the research budget
and not too expensive to be replicated.
6. Sample design should have marginal systematic bias: Systematic bias results from errors in
the sampling procedures which cannot be reduced or eliminated by increasing the sample size. The
best bet for researchers is to detect the causes and correct them.
7. Results obtained from the sample should be generalizable and applicable to the whole
universe: The sampling design should be created so that the sample covers the
whole universe of the study and is not limited to a part of it.
Basic Principles of Sampling
Theory of sampling is based on the following principles or laws:
• Law of Statistical Regularity – This law comes from the mathematical theory of probability.
According to King, "Law of Statistical Regularity says that a moderately large number of the items
chosen at random from the large group are almost sure on the average to possess the features of the
large group."
According to this law the units of the sample must be selected at random.
• Law of Inertia of Large Numbers – According to this law, the other things being equal – the
larger the size of the sample; the more accurate the results are likely to be.
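The Law of Inertia of Large Numbers can be illustrated by simulation. The sketch below draws repeated simple random samples without replacement from a hypothetical population of 1,000 values and reports the average absolute error of the sample mean for several sample sizes. The population, the seed and the number of repetitions are arbitrary assumptions chosen only for illustration.

```python
import random
from statistics import mean

rng = random.Random(2024)                              # fixed seed for reproducibility
population = [rng.gauss(50, 10) for _ in range(1000)]  # hypothetical population
mu = mean(population)                                  # true population mean

def avg_abs_error(n, reps=2000):
    """Average |x̄ − μ| over `reps` simple random samples of size n (without replacement)."""
    return mean(abs(mean(rng.sample(population, n)) - mu) for _ in range(reps))

errors = {n: avg_abs_error(n) for n in (5, 20, 100)}
for n, err in errors.items():
    print(f"n = {n:3d}: average absolute error = {err:.3f}")
```

Other things being equal, the average error shrinks as n grows, which is exactly what the law asserts.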
Types of Sampling: Sampling with Replacement and Sampling without Replacement
Sampling without Replacement
In sampling without replacement, the unit drawn is not returned to the population before subsequent
drawings. Unlike in sampling with replacement, the probability of drawing any particular remaining unit
therefore increases at each successive draw (from 1/N on the first draw to 1/(N−1) on the second, and so on).
Example: Suppose we have 3 members in a family to whom we assign serial numbers 1, 2 and 3.
We need to select two of them for an interview. In this particular instance, we say that we have a
population of size 3 (i.e. N=3) from which a sample of size 2 (i.e. n=2) is to be selected. To select
these two members without replacement, there will be altogether 3 possible samples each of 2
members. The accompanying table displays all possible samples of size 2 without replacement:
Table: Samples of size 2 without replacement
Sample No.   Sample
1            (1,2)
2            (1,3)
3            (2,3)

Table: Samples of size 2 with replacement
Sample No.   Sample
1            (1,1)
2            (1,2)
3            (1,3)
4            (2,1)
5            (2,2)
6            (2,3)
7            (3,1)
8            (3,2)
9            (3,3)
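Both tables can be generated mechanically. The sketch below uses Python's `itertools.combinations` (unordered samples without replacement) and `itertools.product` (ordered samples with replacement) for the family of N = 3 members and n = 2.

```python
from itertools import combinations, product

members = [1, 2, 3]   # serial numbers of the N = 3 family members
n = 2

without_replacement = list(combinations(members, n))   # unordered, no repeated member
with_replacement = list(product(members, repeat=n))     # ordered, repeats allowed

print(len(without_replacement), without_replacement)
print(len(with_replacement), with_replacement)
```

The counts, 3 and 9, are the values of NCn and N^n for N = 3 and n = 2.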
In judgment sampling, individuals are selected who are considered to be most representative of the
population as a whole. It is called judgment sampling because the choice of the individual units depends
entirely on the sampler, who, on his own judgment, decides the sample to be selected that conforms
to some criteria. For example, in a study of labour problems, you may decide to talk only with those who have
experienced discrimination while they were in a job.
iv) Quota Sampling
Quota sampling is a non-probability sampling, in which the interviewers are told to contact and
interview a certain number of individuals from certain sub-groups or strata of the population to
make up the total sample. The formation of the strata is usually based on such characteristics as sex,
age, social status, region of residence. These characteristics which are used to form strata, are
termed ‘quota control’. The technique is widely used by market researchers, political opinion
seekers and many others to avoid the cost problems of interviewing the individuals.
v) Snowball Sampling
Snowball sampling is the colorful name for a technique of building up a list or a sample of a special
population. Some recent authors have referred to snowball sampling as chain-referral or network
sampling.
Snowball sampling is conducted in stages. In the first stage, a few persons possessing the requisite
characteristic are identified and interviewed. These persons are used as informants to identify
others who qualify for inclusion in the sample. The second stage involves interviewing these
persons, who in turn identify further persons to be interviewed in the third stage, and so on. For example, consider the selection of
beggars, for which no frame is available. This can best be done by asking an initial group of beggars
to supply the names of other beggars they come across.
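Snowball sampling proceeds wave by wave, much like a breadth-first search through a referral network. The sketch below is purely illustrative: the referral network `referrals` and the person labels are invented, and real fieldwork would involve interviewing and screening at every step.

```python
from collections import deque

# Hypothetical referral network: each person names others with the same characteristic.
referrals = {
    "A": ["C", "D"],
    "B": ["D", "E"],
    "C": ["F"],
    "D": ["F", "G"],
    "E": [],
    "F": ["H"],
    "G": [],
    "H": [],
}

def snowball_sample(seeds, max_stages):
    """Return {person: stage} for everyone reached within max_stages waves of referral."""
    stage_of = {s: 1 for s in seeds}        # stage 1: the initial respondents
    queue = deque(seeds)
    while queue:
        person = queue.popleft()
        if stage_of[person] == max_stages:
            continue                         # the last wave is not asked for referrals
        for referred in referrals.get(person, []):
            if referred not in stage_of:     # interview each person only once
                stage_of[referred] = stage_of[person] + 1
                queue.append(referred)
    return stage_of

sample = snowball_sample(["A", "B"], max_stages=3)
print(sample)
```

With two initial respondents and three stages, person H is never reached, which shows how the final sample depends on who starts the chain.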
Types of Probability Sampling
The different types of probability sampling are the following:
a) Simple Random Sampling;
b) Stratified Sampling;
c) Systematic Sampling;
d) Cluster Sampling.
a) Lottery method
The following steps are followed in drawing a simple random sample by lottery method.
i. First, prepare a sampling frame giving id number for each unit in the population.
ii. Selecting a simple random sample can be accomplished with the aid of the traditional lottery
method.
iii. Decide on the random number table to be used.
iv. Choose a random number with the same number of digits as N from any point in the random number table.
v. If this random number is less than or equal to N, it is your first selected unit.
vi. Move on to the next random number not exceeding N, vertically, horizontally or in any
other direction systematically, and choose your second unit.
vii. If at any stage of your selection, the random number chosen exceeds N, discard it and
choose the next random number.
viii. If, further, any random number is repeated, it must also be discarded and be replaced by a
fresh random number appearing next.
ix. The process stops once you arrive at your desired sample size.
Example: Draw a simple random sample of size 5 from a population comprising 150 units.
The random numbers are as follows:
277 130 802 108 541 603 497 786 666 440
414 945 416 502 413 258 061 608 809 195
493 063 609 923 779 381 396 840 474 433
642 668 724 210 953 407 582 895 154 121
Solution: Here n=5 and N=150. Assign serial numbers 001, 002, …, 150 to the 150 units in the
population. Since 150 is a three-digit number, we merely read three-digit random numbers
presented in the random number table. Suppose we start from the leftmost digit of first row of the
random number table and proceed downward until we achieve a sample of 5.
Note that we choose only those numbers, which lie in the range 001-150. Any number lying outside
this range is omitted, since they do not correspond to any unit in the population. The process stops
once we arrive at five numbers. Note that the selected numbers are 130, 108, 61, 63 and 121. These
numbers are marked with square brackets below. All these numbers are distinct. If a random number occurs
twice, the second occurrence is omitted, and another number is selected as its replacement.
The random numbers selected are shown as follows:
277 [130] 802 [108] 541 603 497 786 666 440
414 945 416 502 413 258 [061] 608 809 195
493 [063] 609 923 779 381 396 840 474 433
642 668 724 210 953 407 582 895 154 [121]
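The selection rule in steps i to ix (accept numbers from 001 to N, discard numbers above N and repeats, stop at n) can be sketched in Python as follows. The random number table is the one given in the example; the function name `select_by_rejection` is our own. Leading zeros are dropped from the table entries because they do not change the value.

```python
TABLE = [
    [277, 130, 802, 108, 541, 603, 497, 786, 666, 440],
    [414, 945, 416, 502, 413, 258, 61, 608, 809, 195],
    [493, 63, 609, 923, 779, 381, 396, 840, 474, 433],
    [642, 668, 724, 210, 953, 407, 582, 895, 154, 121],
]

def select_by_rejection(numbers, N, n):
    """Accept random numbers in 1..N, skipping out-of-range values and repeats, until n are chosen."""
    chosen = []
    for r in numbers:
        if 1 <= r <= N and r not in chosen:
            chosen.append(r)
            if len(chosen) == n:
                break
    return chosen

by_rows = [r for row in TABLE for r in row]            # read along rows
by_columns = [r for col in zip(*TABLE) for r in col]   # read downward, column by column

print(select_by_rejection(by_rows, N=150, n=5))
print(select_by_rejection(by_columns, N=150, n=5))
```

Reading along rows or downward gives the same five units here (in a different order), because only five entries of this table lie in the range 001 to 150.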
2. Remainder method
This is another method of using random numbers. This procedure has the advantage of a lower
rejection rate in the selection process. The procedure is illustrated with the following example.
Example: Draw a simple random sample of size 5 from a population comprising 150 units.
The random numbers are as follows:
277 130 802 108 541 603 497 786 666 440
414 945 416 502 413 258 061 608 809 195
493 063 609 923 779 381 396 840 474 433
642 668 724 210 953 407 582 895 154 121
Solution: The population from which sample of 5 has to be chosen, contains 150 units. Suppose we
start from the leftmost digit of first row of the random number table and proceed along rows.
For selecting a unit from 001-150, follow the steps below:
i. Take the first three-digit random number, which is 277. Since it exceeds 150, it cannot be used directly.
ii. Divide 277 by 150. The remainder is 127, so the unit labeled 127 in the population is your
first selected unit.
iii. To select the second unit, take the next random number. This number is 130, which is
less than 150, so we choose it directly as our second unit in the sample.
iv. The next random number is 802, which leaves a remainder of 52 when divided by 150.
The unit corresponding to this number is our third selected unit.
v. Continuing this process, we arrive at the next two units: 108 (used directly) and 91 (the remainder of 541 divided by 150).
vi. The units thus chosen are 127, 130, 52, 108 and 91.
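The remainder method can be sketched as below. Two details are not spelled out in the steps above and are stated here as assumptions, following the usual practice: a remainder of 0 is taken to mean unit N, and three-digit numbers above the largest three-digit multiple of N (900 when N = 150), together with 000, are rejected so that every unit keeps the same chance of selection. Repeated units are discarded, as in sampling without replacement.

```python
def remainder_method(numbers, N, n, digits=3):
    """Select n distinct units from 1..N using remainders of random numbers divided by N."""
    limit = (10 ** digits - 1) // N * N   # largest multiple of N with at most `digits` digits
    chosen = []
    for r in numbers:
        if r == 0 or r > limit:
            continue                      # reject so that all units stay equally likely
        unit = r % N or N                 # remainder 0 corresponds to unit N
        if unit not in chosen:            # without replacement: skip repeats
            chosen.append(unit)
            if len(chosen) == n:
                break
    return chosen

first_row = [277, 130, 802, 108, 541, 603, 497, 786, 666, 440]
print(remainder_method(first_row, N=150, n=5))
```

Numbers not exceeding N (such as 130 and 108) are their own remainders, so the rule "use them directly" is a special case of the same formula.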
Selecting a simple random sample may also be accomplished with the aid of computer software or
a scientific calculator.
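For instance, Python's standard `random` module draws a simple random sample without replacement with `random.sample` and with replacement with `random.choices`. The seed below is arbitrary and only makes the draw reproducible; the population of 150 serial numbers matches the earlier examples.

```python
import random

rng = random.Random(42)        # arbitrary seed so the draw can be reproduced
N, n = 150, 5
units = range(1, N + 1)        # serial numbers 001 to 150

srswor = sorted(rng.sample(units, n))   # without replacement: n distinct units
srswr = rng.choices(units, k=n)         # with replacement: repeats are possible

print("SRSWOR:", srswor)
print("SRSWR: ", srswr)
```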
Example: The following table presents population data on the salaries of 30 employees. Select a simple
random sample of size 10 using the random number table and estimate the average salary.
ID   Salary   ID   Salary
1    30       16   26
2    40       17   15
3    25       18   17
4    35       19   18
5    40       20   20
6    25       21   40
7    40       22   18
8    25       23   25
9    25       24   43
10   15       25   50
11   20       26   15
12   22       27   35
13   30       28   65
14   22       29   24
15   40       30   55
Solution:
We have the following random numbers:
277 130 802 108 541 603 497 786 666 440
414 945 416 502 413 258 061 608 809 195
493 063 609 923 779 381 396 840 474 433
642 668 724 210 953 407 582 895 154 121
Using the remainder method, we start from the 1st row and 1st column and proceed along the row. The selected
IDs with their salaries are presented in the table.
Sl      Selected ID   Salary (x_i)
1       ……            ……
…
10      ……            ……
Total   -             $\sum_{i=1}^{n} x_i$ = ……
The calculation of the average is given below.
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \text{Tk. } \ldots$$
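As a check on the hand calculation, the sketch below applies the remainder method to the salary data under stated assumptions: three-digit numbers are read along rows from the first row and column (as in the earlier examples), numbers above 990 (the largest three-digit multiple of 30) are rejected, a remainder of 0 denotes ID 30, and repeated IDs are discarded. Salaries are in the units of the table.

```python
# Salaries of the 30 employees by ID, in the units of the table above
salary = {
    1: 30, 2: 40, 3: 25, 4: 35, 5: 40, 6: 25, 7: 40, 8: 25, 9: 25, 10: 15,
    11: 20, 12: 22, 13: 30, 14: 22, 15: 40, 16: 26, 17: 15, 18: 17, 19: 18, 20: 20,
    21: 40, 22: 18, 23: 25, 24: 43, 25: 50, 26: 15, 27: 35, 28: 65, 29: 24, 30: 55,
}

# Random number table from the example (leading zeros dropped: 061 -> 61)
TABLE = [
    [277, 130, 802, 108, 541, 603, 497, 786, 666, 440],
    [414, 945, 416, 502, 413, 258, 61, 608, 809, 195],
    [493, 63, 609, 923, 779, 381, 396, 840, 474, 433],
    [642, 668, 724, 210, 953, 407, 582, 895, 154, 121],
]

def remainder_method(numbers, N, n, digits=3):
    """Select n distinct IDs from 1..N by the remainder method (assumptions as stated above)."""
    limit = (10 ** digits - 1) // N * N   # 990 when N = 30
    chosen = []
    for r in numbers:
        if r == 0 or r > limit:
            continue
        unit = r % N or N                 # remainder 0 corresponds to ID N
        if unit not in chosen:            # discard repeated IDs
            chosen.append(unit)
            if len(chosen) == n:
                break
    return chosen

along_rows = [r for row in TABLE for r in row]
ids = remainder_method(along_rows, N=30, n=10)
total = sum(salary[i] for i in ids)   # sum of x_i over the sample
xbar = total / len(ids)               # estimated average salary
print("Selected IDs:", ids)
print("Total =", total, " x̄ =", xbar)
```

Under these assumptions the IDs drawn are 7, 10, 22, 18, 1, 3, 17, 6, 20 and 24 (the number 666 would give ID 6 a second time and is discarded), so the total is 248 and x̄ = 24.8.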
Example: The following table presents population data on the gender distribution of 30 employees.
Select a simple random sample of size 10 using the random number table and estimate the proportions of
male and female employees.
ID Gender ID Gender
1 Male 16 Male
2 Female 17 Male
3 Male 18 Female
4 Male 19 Male
5 Female 20 Male
6 Male 21 Female
7 Female 22 Male
8 Male 23 Female
9 Female 24 Male
10 Female 25 Male
11 Male 26 Male
12 Male 27 Female
13 Male 28 Male
14 Female 29 Female
15 Female 30 Female
Solution:
We have the following random numbers:
277 130 802 108 541 603 497 786 666 440
414 945 416 502 413 258 061 608 809 195
493 063 609 923 779 381 396 840 474 433
642 668 724 210 953 407 582 895 154 121
Using the remainder method, we start from the first row and first column and proceed along the column. The selected IDs with their gender are presented in the table.
Sl Selected ID Gender
1
2
3
4
5
6
7
8
9
10
The following table presents the proportions of male and female employees.
Gender Number Proportion (%)
Male
Female
Total
Comment: The estimated proportions of male and female employees are …..% and……%
respectively.
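A minimal sketch of this example in Python, reading the random number table down the columns as the solution describes. As before, the remainder-method rules are assumptions not stated in the notes (sampling without replacement with repeats skipped, remainder 0 standing for unit 30, and 000 and numbers above 990 rejected), and `remainder_select` is a hypothetical helper name.

```python
# Gender of employees with ID 1..30 (M = Male, F = Female), from the table above.
gender = "MFMMFMFMFFMMMFF" + "MMFMMFMFMMMFMFF"

# Random number table from the solution (061 and 063 written as 61 and 63).
random_table = [
    [277, 130, 802, 108, 541, 603, 497, 786, 666, 440],
    [414, 945, 416, 502, 413, 258, 61, 608, 809, 195],
    [493, 63, 609, 923, 779, 381, 396, 840, 474, 433],
    [642, 668, 724, 210, 953, 407, 582, 895, 154, 121],
]


def remainder_select(numbers, N, n, digits=3):
    """Remainder method (assumed rules): reject 0 and numbers above the
    largest multiple of N, map remainder 0 to unit N, skip repeats."""
    limit = ((10 ** digits - 1) // N) * N
    chosen = []
    for r in numbers:
        if 0 < r <= limit:
            unit = r % N or N
            if unit not in chosen:
                chosen.append(unit)
                if len(chosen) == n:
                    break
    return chosen


# Read down each column, one column after another.
col_wise = [row[c] for c in range(len(random_table[0])) for row in random_table]
ids = remainder_select(col_wise, N=30, n=10)
males = sum(1 for i in ids if gender[i - 1] == "M")
p_male = males / len(ids)          # sample proportion of males
p_female = 1 - p_male              # sample proportion of females
print("Selected IDs:", ids)
print(f"Male: {p_male:.0%}, Female: {p_female:.0%}")
```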
Properties of simple random sampling
1. In sampling with replacement, the probability of selecting any specified unit $u_i$ from a population of $N$ units on any draw is
$$P(u_i) = \frac{1}{N}; \quad i = 1, 2, 3, \ldots, N$$
2. In sampling without replacement, the probability that a specified unit $u_i$ from a population of $N$ units is selected on the $r$th draw (having not been selected on any previous draw) is equal to the probability of selecting it on the first draw, namely $\frac{1}{N}$.
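Proof:
Let $P_r$ = probability that a specified unit $u_i$ is selected on the $r$th draw and was not selected on any previous draw.
Then obviously,
$$P_1 = P(u_i \text{ is selected on the 1st draw}) = \frac{1}{N}$$
$$P_2 = P(u_i \text{ is not selected on the 1st draw}) \cdot P(u_i \text{ is selected on the 2nd draw} \mid \text{not on the 1st}) = \frac{N-1}{N} \times \frac{1}{N-1} = \frac{1}{N}$$
$$P_3 = P(u_i \text{ not on the 1st draw}) \cdot P(u_i \text{ not on the 2nd draw} \mid \text{not on the 1st}) \cdot P(u_i \text{ on the 3rd draw} \mid \text{not on the 1st or 2nd}) = \frac{N-1}{N} \times \frac{N-2}{N-1} \times \frac{1}{N-2} = \frac{1}{N}$$
Thus, for the $r$th draw,
$$P_r = \frac{N-1}{N} \times \frac{N-2}{N-1} \times \frac{N-3}{N-2} \times \cdots \times \frac{N-(r-1)}{N-(r-2)} \times \frac{1}{N-(r-1)} = \frac{1}{N}$$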
Thus, it is evident (from properties 1 and 2) that the probability of selecting a specified unit $u_i$ of the population on any draw is equal to the probability of selecting it on the first draw, $\frac{1}{N}$, irrespective of whether the units are drawn with or without replacement.
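Property 2 can also be checked by brute force for a small population: in sampling without replacement every ordered sequence of $n$ distinct units is equally likely, so $P_r$ is the share of those sequences that have the specified unit in position $r$. A minimal sketch using exact fractions; the function name `prob_unit_at_draw` and the values N = 5, n = 3 are illustrative choices.

```python
from fractions import Fraction
from itertools import permutations


def prob_unit_at_draw(N, n, unit, r):
    """Exact probability that `unit` is selected at draw r in sampling
    without replacement, found by listing every equally likely ordered
    sample of size n (practical only for small N)."""
    ordered = list(permutations(range(1, N + 1), n))
    hits = sum(1 for s in ordered if s[r - 1] == unit)
    return Fraction(hits, len(ordered))


N, n = 5, 3
for r in range(1, n + 1):
    print(f"P_{r} =", prob_unit_at_draw(N, n, unit=2, r=r))   # 1/5 at every draw
```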
3. For sampling without replacement, the number of possible combinations of $n$ elements formed from $N$ elements is ${}^{N}C_{n}$, while for sampling with replacement the number of possible (ordered) samples is $N^n$, which are not all different.
4. In simple random sampling with replacement there are $N^n$ possible ordered samples, and each of them has the same probability of being selected, namely $\frac{1}{N^n}$.
Proof:
At the first draw, the probability that any given unit out of $N$ units is selected is $\frac{1}{N}$, which remains the same at every draw since the sampling is with replacement. Moreover, in sampling with replacement each draw is independent of the others. Hence the probability of any particular ordered sample of $n$ units is $\left(\frac{1}{N}\right)^n = \frac{1}{N^n}$.
5. In simple random sampling without replacement there are ${}^{N}C_{n}$ distinct samples, and each possible combination of $n$ different units out of $N$ has the same probability of being selected, namely $\frac{1}{{}^{N}C_{n}}$.
Proof:
At the 1st draw, the probability that one of the $n$ specified units is selected is $\frac{n}{N}$.
At the 2nd draw, the probability that one of the remaining $(n-1)$ specified units is selected is $\frac{n-1}{N-1}$.
At the 3rd draw, the probability that one of the remaining $(n-2)$ specified units is selected is $\frac{n-2}{N-2}$.
⋮
At the $n$th draw, the probability that the remaining 1 unit is selected is $\frac{1}{N-(n-1)}$.
Hence the probability that all $n$ specified units are selected in $n$ draws is
$$\frac{n}{N} \cdot \frac{n-1}{N-1} \cdot \frac{n-2}{N-2} \cdots \frac{1}{N-(n-1)} = \frac{n!\,(N-n)!}{N!} = \frac{1}{{}^{N}C_{n}}$$
Therefore, each combination of $n$ units has the same probability of being selected, which is $\frac{1}{{}^{N}C_{n}}$.
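Properties 3 to 5 can be checked by enumeration for a small case. A minimal sketch; the values N = 6 and n = 3 are chosen only for illustration, and `math.comb` requires Python 3.8 or later.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb, factorial

N, n = 6, 3
units = range(1, N + 1)

# Property 3: number of possible samples.
wor = list(combinations(units, n))      # unordered samples, no repeats
wr = list(product(units, repeat=n))     # ordered samples, repeats allowed
print(len(wor), comb(N, n), len(wr), N ** n)

# Property 4: each ordered with-replacement sample has probability (1/N)^n.
p_wr = Fraction(1, N) ** n

# Property 5: n/N * (n-1)/(N-1) * ... * 1/(N-(n-1)) = n!(N-n)!/N! = 1/C(N, n).
p_wor = Fraction(1)
for k in range(n):
    p_wor *= Fraction(n - k, N - k)
print(p_wr, p_wor, Fraction(factorial(n) * factorial(N - n), factorial(N)))
```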
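Advantages of simple random sampling
It is the ideal method of probability sampling in the sense that every unit, and every possible sample of size n, has the same chance of being selected.
It is highly representative when the population units are not very heterogeneous.
Estimates are easy to calculate.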
Estimation of population parameters in simple random sampling
A frequent objective of a sample survey is to estimate the population mean, population total, population variance, ratio of two totals, etc., with a view to drawing inferences about a population from the information contained in a sample.
For example, we might be interested in the mean wage (in Taka) paid to the employees or in the total wage bill in Taka. Hence, we consider the estimation of two population parameters here, viz. the mean and the total.
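Notations and formulae
The notations and formulae used for the mean, total and variance of the population and of the sample are summarized in the following table.

| | Population | Sample |
|---|---|---|
| Size | $N$ | $n$ |
| Values of some variable | $y_1, y_2, \ldots, y_N$ | $y_1, y_2, \ldots, y_n$ |
| Total | $Y = \sum_{i=1}^{N} y_i = y_1 + y_2 + \cdots + y_N$ | $y = \sum_{i=1}^{n} y_i = y_1 + y_2 + \cdots + y_n$ |
| Mean | $\bar{Y} = \frac{\sum_{i=1}^{N} y_i}{N} = \frac{y_1 + y_2 + \cdots + y_N}{N}$ | $\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} = \frac{y_1 + y_2 + \cdots + y_n}{n}$ |
| Variance | $\sigma^2 = \frac{\sum_{i=1}^{N}(y_i - \bar{Y})^2}{N}$ | no sample notation used |
| Modified definition of variance | $S^2 = \frac{\sum_{i=1}^{N}(y_i - \bar{Y})^2}{N-1}$ | $s^2 = \frac{\sum_{i=1}^{n}(y_i - \bar{y})^2}{n-1}$ |

| Parameter | Estimator |
|---|---|
| Population mean $\bar{Y}$ | $\bar{y}$ |
| Population total $Y$ | $\hat{Y} = N\bar{y}$ |
| Population variance $S^2$ | $s^2$ (in sampling without replacement) |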
Estimation of parameters
We are interested in finding estimators of three types of parameters: the mean, the total and the variance. The expected value of each estimator and its variance will also be obtained. These results are presented in the order that is most convenient for understanding.
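b) The sample mean $\bar{y}$ of a simple random sample of size $n$ is an unbiased estimator of the population mean $\bar{Y}$.
Symbolically,
$$E(\bar{y}) = \bar{Y}$$
Proof:
By definition,
$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} \quad \text{and} \quad \bar{Y} = \frac{\sum_{i=1}^{N} y_i}{N}$$
Taking expectation,
$$E(\bar{y}) = E\left(\frac{1}{n}\sum_{i=1}^{n} y_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(y_i) \qquad (1)$$
Now, by definition,
$$E(y_i) = \sum_{i=1}^{N} y_i P(y_i)$$
We now need to evaluate $P(y_i)$, the probability that $y_i$, the $i$th unit of the population, is selected at the $r$th draw. By the property of simple random sampling, this probability is $\frac{1}{N}$. Hence
$$E(y_i) = \sum_{i=1}^{N} y_i P(y_i) = \frac{1}{N}\sum_{i=1}^{N} y_i = \bar{Y}$$
Substituting this in (1),
$$E(\bar{y}) = \frac{1}{n}\sum_{i=1}^{n} \bar{Y} = \frac{1}{n} \cdot n\bar{Y} = \bar{Y}$$
(proved)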
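Similarly, $\hat{Y} = N\bar{y}$ is an unbiased estimator of the population total $Y$, i.e. $E(\hat{Y}) = Y$.
Proof:
$$E(\hat{Y}) = E(N\bar{y}) = NE(\bar{y}) = N\bar{Y} \quad [\because E(\bar{y}) = \bar{Y}]$$
$$= Y$$
(proved)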
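The unbiasedness of $\bar{y}$ and $\hat{Y}$ can be confirmed by averaging over every possible sample. The sketch below treats the salary table from the earlier example as the population and uses n = 3, an illustrative choice; with exact fractions, the average of all ${}^{30}C_{3}$ sample means equals the population mean exactly, and the average of all $N\bar{y}$ values equals the population total.

```python
from fractions import Fraction
from itertools import combinations

# Salaries of the 30 employees from the earlier example, used as the population.
salary = [30, 40, 25, 35, 40, 25, 40, 25, 25, 15, 20, 22, 30, 22, 40,
          26, 15, 17, 18, 20, 40, 18, 25, 43, 50, 15, 35, 65, 24, 55]
N, n = len(salary), 3

Ybar = Fraction(sum(salary), N)                   # population mean
samples = list(combinations(salary, n))           # all C(N, n) equally likely samples
E_ybar = sum(Fraction(sum(s), n) for s in samples) / len(samples)
E_Yhat = sum(N * Fraction(sum(s), n) for s in samples) / len(samples)
print(E_ybar == Ybar, E_Yhat == N * Ybar)
```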
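Variance of $\bar{y}$ (sampling without replacement)
Let $y = \sum_{i=1}^{n} y_i$ denote the sample total, so that $\bar{y} = y/n$. Then
$$V(\bar{y}) = V\left(\frac{y}{n}\right) = \frac{1}{n^2}V(y) \qquad (1)$$
By definition,
$$V(y) = E\left[y - E(y)\right]^2 \qquad (2)$$
Now,
$$E(y) = E\left(\sum_{i=1}^{n} y_i\right) = \sum_{i=1}^{n} E(y_i) = n\sum_{i=1}^{N} y_i P(y_i) = n\sum_{i=1}^{N} y_i \cdot \frac{1}{N} = n\bar{Y} \qquad \left[\because P(y_i) = \frac{1}{N} \text{ by the property of SRS}\right]$$
$$\therefore E(y) = n\bar{Y} \qquad (3)$$
Substituting (3) in (2),
$$V(y) = E\left(y - n\bar{Y}\right)^2 = E\left(\sum_{i=1}^{n} y_i - \sum_{i=1}^{n} \bar{Y}\right)^2 = E\left[\sum_{i=1}^{n} (y_i - \bar{Y})\right]^2$$
$$= E\left[(y_1 - \bar{Y}) + (y_2 - \bar{Y}) + \cdots + (y_n - \bar{Y})\right]^2$$
$$= E\left[(y_1 - \bar{Y})^2 + (y_2 - \bar{Y})^2 + \cdots + (y_n - \bar{Y})^2\right] + \sum_{i \neq j}^{n} E(y_i - \bar{Y})(y_j - \bar{Y})$$
$$= n\sigma^2 + \sum_{i \neq j}^{n} E(y_i - \bar{Y})(y_j - \bar{Y}) \qquad (4)$$
since $E(y_i - \bar{Y})^2 = \sigma^2$ for every draw.
From the second part of (4),
$$E(y_i - \bar{Y})(y_j - \bar{Y}) = E(y_iy_j - y_i\bar{Y} - y_j\bar{Y} + \bar{Y}^2)$$
$$= E(y_iy_j) - \bar{Y}E(y_i) - \bar{Y}E(y_j) + \bar{Y}^2$$
$$= E(y_iy_j) - \bar{Y}^2 - \bar{Y}^2 + \bar{Y}^2$$
$$= E(y_iy_j) - \bar{Y}^2$$
In sampling without replacement, two different draws $i \neq j$ yield each of the $N(N-1)$ ordered pairs of distinct population units with equal probability $\frac{1}{N(N-1)}$, so
$$= \frac{\sum_{i \neq j}^{N} y_iy_j}{N(N-1)} - \left(\frac{\sum_{i=1}^{N} y_i}{N}\right)^2$$
$$= \frac{1}{N}\left[\frac{\left(\sum_{i=1}^{N} y_i\right)^2 - \sum_{i=1}^{N} y_i^2}{N-1} - \frac{\left(\sum_{i=1}^{N} y_i\right)^2}{N}\right]$$
$$= \frac{1}{N}\left[\frac{\left(\sum_{i=1}^{N} y_i\right)^2}{N-1} - \frac{\sum_{i=1}^{N} y_i^2}{N-1} - \frac{\left(\sum_{i=1}^{N} y_i\right)^2}{N}\right]$$
$$= \frac{1}{N}\left[\left(\sum_{i=1}^{N} y_i\right)^2\left(\frac{1}{N-1} - \frac{1}{N}\right) - \frac{\sum_{i=1}^{N} y_i^2}{N-1}\right]$$
$$= \frac{1}{N}\left[\left(\sum_{i=1}^{N} y_i\right)^2 \frac{N-N+1}{N(N-1)} - \frac{\sum_{i=1}^{N} y_i^2}{N-1}\right]$$
$$= \frac{1}{N}\left[\frac{\left(\sum_{i=1}^{N} y_i\right)^2}{N(N-1)} - \frac{\sum_{i=1}^{N} y_i^2}{N-1}\right]$$
$$= \frac{-1}{N(N-1)}\left[\sum_{i=1}^{N} y_i^2 - \frac{\left(\sum_{i=1}^{N} y_i\right)^2}{N}\right]$$
$$= \frac{-1}{N(N-1)}\sum_{i=1}^{N}(y_i - \bar{Y})^2$$
$$= \frac{-1}{N-1} \cdot \frac{\sum_{i=1}^{N}(y_i - \bar{Y})^2}{N}$$
$$= -\frac{\sigma^2}{N-1}$$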
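Now the second part of (4) is a sum over the $n(n-1)$ ordered pairs $(i, j)$ with $i \neq j$, each contributing $-\frac{\sigma^2}{N-1}$:
$$\sum_{i \neq j}^{n} E(y_i - \bar{Y})(y_j - \bar{Y}) = -n(n-1)\frac{\sigma^2}{N-1} \qquad (7)$$
Substituting (7) in (4),
$$V(y) = n\sigma^2 - n(n-1)\frac{\sigma^2}{N-1} = n\sigma^2\,\frac{(N-1)-(n-1)}{N-1} = n\sigma^2\,\frac{N-n}{N-1}$$
and hence, from (1),
$$V(\bar{y}) = \frac{1}{n^2}V(y) = \frac{\sigma^2}{n} \cdot \frac{N-n}{N-1}$$
Since $\sigma^2 = \frac{N-1}{N}S^2$, this can also be written as $V(\bar{y}) = \frac{S^2}{n} \cdot \frac{N-n}{N}$.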
Thus, for sampling without replacement we have two forms of $V(\bar{y})$: in terms of $S^2$ and in terms of $\sigma^2$.
$V(\bar{y})$ in terms of $S^2$:
$$V(\bar{y}) = \frac{S^2}{n} \cdot \frac{N-n}{N} = \frac{S^2}{n}\left(1 - \frac{n}{N}\right) = \frac{S^2}{n}(1-f)$$
where $f = \frac{n}{N}$ is called the sampling fraction.
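$V(\bar{y})$ in terms of $\sigma^2$:
$$V(\bar{y}) = \frac{\sigma^2}{n} \cdot \frac{N-n}{N-1}$$
If $N$ is large compared to $n$, we can write $\frac{N-n}{N-1} \approx \frac{N-n}{N} = 1 - f$. Hence,
$$V(\bar{y}) \approx \frac{\sigma^2}{n}(1-f)$$
$$S.E.(\bar{y}) = S\sqrt{\frac{1-f}{n}}$$
$$S.E.(\bar{y}) \approx \sqrt{\frac{\sigma^2}{n}(1-f)}$$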
Note: Here we use $i \neq j$ because the sampling is without replacement.
Relation between $V(\bar{y})$ for sampling with replacement and $V(\bar{y})$ for sampling without replacement
We have
for sampling with replacement: $V(\bar{y}) = \frac{\sigma^2}{n}$
for sampling without replacement: $V(\bar{y}) = \frac{\sigma^2}{n} \cdot \frac{N-n}{N-1}$
Thus, $V(\bar{y})$ in sampling without replacement is $\frac{N-n}{N-1}$ times its value in sampling with replacement.
Provided that $N$ is large compared with $n$, $\frac{N-n}{N-1} \approx 1 - \frac{n}{N} = 1 - f$, where $f = \frac{n}{N}$; and $\frac{N-n}{N-1}$ is less than 1 for any $n$ such that $1 < n < N$.
Therefore, $V(\bar{y})$ in sampling without replacement is less than $V(\bar{y})$ in sampling with replacement.
That is,
$$\frac{\sigma^2}{n} \cdot \frac{N-n}{N-1} < \frac{\sigma^2}{n} \quad \text{for any } n \text{ such that } 1 < n < N.$$
[What is meant by the finite population correction and the sampling fraction? What happens to $V(\bar{y})$ when the sampling fraction is small?]
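The relation between the two variances can be verified exactly for the salary population of the earlier example (N = 30) with an illustrative sample size n = 3. This minimal sketch lists all $N^n$ ordered with-replacement samples and all ${}^{N}C_{n}$ without-replacement samples, computes $V(\bar{y})$ directly from them, and compares the results with $\sigma^2/n$ and $\frac{\sigma^2}{n}\cdot\frac{N-n}{N-1}$. The helper name `var_of_sample_mean` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

salary = [30, 40, 25, 35, 40, 25, 40, 25, 25, 15, 20, 22, 30, 22, 40,
          26, 15, 17, 18, 20, 40, 18, 25, 43, 50, 15, 35, 65, 24, 55]
N, n = len(salary), 3
Ybar = Fraction(sum(salary), N)
sigma2 = sum((y - Ybar) ** 2 for y in salary) / N      # population variance, divisor N


def var_of_sample_mean(samples):
    """Exact V(ybar) when every listed sample is equally likely (E(ybar) = Ybar)."""
    means = [Fraction(sum(s), n) for s in samples]
    return sum((m - Ybar) ** 2 for m in means) / len(means)


v_wr = var_of_sample_mean(product(salary, repeat=n))    # N**n ordered samples
v_wor = var_of_sample_mean(combinations(salary, n))     # C(N, n) samples
print(v_wr == sigma2 / n)
print(v_wor == sigma2 / n * Fraction(N - n, N - 1))
print(v_wor < v_wr)
```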
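Expected value of $s^2$
By definition,
$$s^2 = \frac{\sum_{i=1}^{n}(y_i - \bar{y})^2}{n-1} \quad \text{and} \quad S^2 = \frac{\sum_{i=1}^{N}(y_i - \bar{Y})^2}{N-1}$$
Therefore,
$$E(s^2) = \frac{1}{n-1}E\left[\sum_{i=1}^{n}(y_i - \bar{y})^2\right]$$
$$= \frac{1}{n-1}E\left[\sum_{i=1}^{n}\left\{(y_i - \bar{Y}) - (\bar{y} - \bar{Y})\right\}^2\right]$$
$$= \frac{1}{n-1}E\left[\sum_{i=1}^{n}(y_i - \bar{Y})^2 - n(\bar{y} - \bar{Y})^2\right] \quad \text{(see the note below)}$$
$$= \frac{1}{n-1}\left[\sum_{i=1}^{n}E(y_i - \bar{Y})^2 - nE(\bar{y} - \bar{Y})^2\right]$$
$$= \frac{1}{n-1}\left[n\sigma^2 - nV(\bar{y})\right] \qquad (1)$$
For sampling with replacement, $V(\bar{y}) = \frac{\sigma^2}{n}$, so (1) becomes
$$E(s^2) = \frac{1}{n-1} \cdot n\sigma^2\left(1 - \frac{1}{n}\right) = \sigma^2$$
Thus, for sampling with replacement, $s^2$ is an unbiased estimator of $\sigma^2$.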
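Note:
$$\sum_{i=1}^{n}\left\{(y_i - \bar{Y}) - (\bar{y} - \bar{Y})\right\}^2 = \sum_{i=1}^{n}(y_i - \bar{Y})^2 + \sum_{i=1}^{n}(\bar{y} - \bar{Y})^2 - 2\sum_{i=1}^{n}(y_i - \bar{Y})(\bar{y} - \bar{Y})$$
$$= \sum_{i=1}^{n}(y_i - \bar{Y})^2 + n(\bar{y} - \bar{Y})^2 - 2(\bar{y} - \bar{Y})\sum_{i=1}^{n}(y_i - \bar{Y})$$
$$= \sum_{i=1}^{n}(y_i - \bar{Y})^2 + n(\bar{y} - \bar{Y})^2 - 2(\bar{y} - \bar{Y})(n\bar{y} - n\bar{Y})$$
$$= \sum_{i=1}^{n}(y_i - \bar{Y})^2 + n(\bar{y} - \bar{Y})^2 - 2(\bar{y} - \bar{Y})\left[n(\bar{y} - \bar{Y})\right]$$
$$= \sum_{i=1}^{n}(y_i - \bar{Y})^2 + n(\bar{y} - \bar{Y})^2 - 2n(\bar{y} - \bar{Y})^2$$
$$= \sum_{i=1}^{n}(y_i - \bar{Y})^2 - n(\bar{y} - \bar{Y})^2$$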
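For sampling without replacement, the same identity (1) in the derivation of $E(s^2)$ yields the result $E(s^2) = S^2$ listed in the summary table. As a short worked step (not spelled out in the derivation above), substitute the without-replacement variance $V(\bar{y}) = \frac{\sigma^2}{n}\cdot\frac{N-n}{N-1}$ into (1) and use $\sigma^2 = \frac{N-1}{N}S^2$:

```latex
E(s^2) = \frac{1}{n-1}\left[n\sigma^2 - n\cdot\frac{\sigma^2}{n}\cdot\frac{N-n}{N-1}\right]
       = \frac{\sigma^2}{n-1}\cdot\frac{n(N-1) - (N-n)}{N-1}
       = \frac{\sigma^2}{n-1}\cdot\frac{N(n-1)}{N-1}
       = \frac{N\sigma^2}{N-1}
       = S^2
```

The key algebra is $n(N-1) - (N-n) = nN - N = N(n-1)$, which cancels the factor $n-1$.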
Use of $s^2$
$s^2$ is used as an estimator of the population variance.
$s^2$ is used to estimate $V(\bar{y})$ and $V(\hat{Y})$.
$s^2$ is used to estimate $S.E.(\bar{y})$ and $S.E.(\hat{Y})$.
$s^2$ is used to construct $(1-\alpha)100\%$ confidence intervals for $\bar{Y}$ and $Y$.
In both cases, sampling with and without replacement, $V(\bar{y})$ involves the unknown population parameters $\sigma^2$ and $S^2$, respectively. Therefore $V(\bar{y})$ cannot be obtained from a sample; it must be estimated. Hence, to get an estimator of $V(\bar{y})$ in both cases, we have to estimate $\sigma^2$ and $S^2$.
Since $s^2$ is used as an estimator of $\sigma^2$ and $S^2$ in sampling with and without replacement, respectively, we obtain the estimator of $V(\bar{y})$ in both cases by replacing $\sigma^2$ and $S^2$, respectively, by $s^2$. The estimator of $V(\hat{Y})$ is obtained in the same way.
Using the formulae for $V(\bar{y})$ and $V(\hat{Y})$, we get the following unbiased estimators of $V(\bar{y})$ and $V(\hat{Y})$.
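$$v(\bar{y}) = \hat{\sigma}_{\bar{y}}^2 = s_{\bar{y}}^2 = \frac{s^2}{n} \quad \text{for sampling with replacement}$$
$$v(\bar{y}) = \hat{\sigma}_{\bar{y}}^2 = s_{\bar{y}}^2 = \frac{s^2}{n}(1-f) \quad \text{for sampling without replacement}$$
$$v(\hat{Y}) = \hat{\sigma}_{\hat{Y}}^2 = s_{\hat{Y}}^2 = \frac{N^2s^2}{n} \quad \text{for sampling with replacement}$$
$$v(\hat{Y}) = \hat{\sigma}_{\hat{Y}}^2 = s_{\hat{Y}}^2 = \frac{N^2s^2}{n}(1-f) \quad \text{for sampling without replacement}$$
$$\hat{\sigma}_{\bar{y}} = s_{\bar{y}} = \frac{s}{\sqrt{n}} \quad \text{for sampling with replacement}$$
$$\hat{\sigma}_{\bar{y}} = s_{\bar{y}} = s\sqrt{\frac{1-f}{n}} \quad \text{for sampling without replacement}$$
$$\hat{\sigma}_{\hat{Y}} = s_{\hat{Y}} = \frac{Ns}{\sqrt{n}} \quad \text{for sampling with replacement}$$
$$\hat{\sigma}_{\hat{Y}} = s_{\hat{Y}} = Ns\sqrt{\frac{1-f}{n}} \quad \text{for sampling without replacement}$$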
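The estimator formulas above can be collected into one small function. This is a sketch: the function name `srs_estimates` and the sample values in the usage line are hypothetical, chosen only to illustrate the arithmetic.

```python
import math


def srs_estimates(sample, N, with_replacement=False):
    """Estimates of the mean and total, with estimated variances and
    standard errors, from a simple random sample `sample` of n values
    drawn from a population of N units."""
    n = len(sample)
    ybar = sum(sample) / n
    s2 = sum((y - ybar) ** 2 for y in sample) / (n - 1)
    factor = 1.0 if with_replacement else 1 - n / N   # (1 - f) for WOR
    v_ybar = factor * s2 / n                          # v(ybar)
    return {
        "ybar": ybar,
        "Y_hat": N * ybar,                            # Y_hat = N * ybar
        "v_ybar": v_ybar,
        "se_ybar": math.sqrt(v_ybar),
        "v_Y_hat": N ** 2 * v_ybar,                   # v(Y_hat) = N^2 v(ybar)
        "se_Y_hat": N * math.sqrt(v_ybar),
    }


# Hypothetical sample of n = 4 values from a population of N = 8 units.
print(srs_estimates([2, 4, 6, 8], N=8))
print(srs_estimates([2, 4, 6, 8], N=8, with_replacement=True))
```

A single flag switches between the two cases because they differ only by the factor $(1-f)$.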
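Table: Estimation of parameters in SRS at a glance (WOR: sampling without replacement; WR: sampling with replacement; $f = n/N$)

| | Population mean | Population total | Population variance (WOR) | Population variance (WR) |
|---|---|---|---|---|
| Population parameter | $\bar{Y}$ | $Y$ | $S^2$ | $\sigma^2$ |
| Estimator | $\bar{y}$ | $\hat{Y} = N\bar{y}$ | $s^2$ | $s^2$ |
| Expected value of estimator | $E(\bar{y}) = \bar{Y}$ | $E(\hat{Y}) = Y$ | $E(s^2) = S^2$ | $E(s^2) = \sigma^2$ |
| Standard error of estimator | $S.E.(\bar{y}) = \sigma_{\bar{y}} = S\sqrt{\frac{1-f}{n}}$ (WOR); $= \frac{\sigma}{\sqrt{n}}$ (WR) | $S.E.(\hat{Y}) = \sigma_{\hat{Y}} = NS\sqrt{\frac{1-f}{n}}$ (WOR); $= \frac{N\sigma}{\sqrt{n}}$ (WR) | - | - |
| Estimator of variance of estimator | $v(\bar{y}) = \hat{\sigma}_{\bar{y}}^2 = s_{\bar{y}}^2 = \frac{s^2}{n}(1-f)$ (WOR); $= \frac{s^2}{n}$ (WR) | $v(\hat{Y}) = \hat{\sigma}_{\hat{Y}}^2 = s_{\hat{Y}}^2 = \frac{N^2s^2}{n}(1-f)$ (WOR); $= \frac{N^2s^2}{n}$ (WR) | - | - |
| Estimator of standard error of estimator | $\hat{\sigma}_{\bar{y}} = s_{\bar{y}} = s\sqrt{\frac{1-f}{n}}$ (WOR); $= \frac{s}{\sqrt{n}}$ (WR) | $\hat{\sigma}_{\hat{Y}} = s_{\hat{Y}} = Ns\sqrt{\frac{1-f}{n}}$ (WOR); $= \frac{Ns}{\sqrt{n}}$ (WR) | - | - |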
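Theorem: If a simple random sample of $n$ units is drawn without replacement from a population of $N$ units, then
(a) the covariance between the sample means $\bar{x}$ and $\bar{y}$ is
$$\sigma_{\bar{x}\bar{y}} = \mathrm{Cov}(\bar{x}, \bar{y}) = \frac{S_{xy}}{n} \cdot \frac{N-n}{N} = \frac{\sigma_{xy}}{n} \cdot \frac{N-n}{N-1}$$
(b) the correlation coefficient between $\bar{x}$ and $\bar{y}$ equals the population correlation coefficient between $x$ and $y$, i.e. $\rho_{\bar{x}\bar{y}} = \rho_{xy}$,
where
$$\sigma_{\bar{x}\bar{y}} = \mathrm{Cov}(\bar{x}, \bar{y}) = E(\bar{x} - \bar{X})(\bar{y} - \bar{Y})$$
$$\sigma_{xy} = \mathrm{Cov}(x, y) = E(x_i - \bar{X})(y_i - \bar{Y}) = \frac{\sum_{i=1}^{N}(x_i - \bar{X})(y_i - \bar{Y})}{N}$$
$$S_{xy} = \frac{\sum_{i=1}^{N}(x_i - \bar{X})(y_i - \bar{Y})}{N-1} = \frac{N}{N-1}\sigma_{xy}$$
$$S_x^2 = \frac{\sum_{i=1}^{N}(x_i - \bar{X})^2}{N-1}$$
$$S_y^2 = \frac{\sum_{i=1}^{N}(y_i - \bar{Y})^2}{N-1}$$
$$\rho_{\bar{x}\bar{y}} = \frac{\mathrm{Cov}(\bar{x}, \bar{y})}{\sqrt{V(\bar{x})V(\bar{y})}}$$
$$\rho_{xy} = \frac{\mathrm{Cov}(x, y)}{\sqrt{V(x)V(y)}} = \frac{S_{xy}}{S_xS_y}$$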
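Proof (a):
Let $u_i = x_i + y_i$, so that $\bar{u} = \bar{x} + \bar{y}$ and $\bar{U} = \bar{X} + \bar{Y}$ ………(1)
Applying the without-replacement variance formula to the variable $u$,
$$V(\bar{u}) = E(\bar{u} - \bar{U})^2 = \frac{S_u^2}{n} \cdot \frac{N-n}{N} \qquad (2)$$
$$\Rightarrow E(\bar{u} - \bar{U})^2 = \frac{N-n}{Nn} \cdot \frac{\sum_{i=1}^{N}(u_i - \bar{U})^2}{N-1} \qquad (3)$$
Using (1) and (2),
$$E(\bar{u} - \bar{U})^2 = E\left[(\bar{x} + \bar{y}) - (\bar{X} + \bar{Y})\right]^2 = E\left[(\bar{x} - \bar{X}) + (\bar{y} - \bar{Y})\right]^2$$
$$= E(\bar{x} - \bar{X})^2 + E(\bar{y} - \bar{Y})^2 + 2E(\bar{x} - \bar{X})(\bar{y} - \bar{Y})$$
$$= V(\bar{x}) + V(\bar{y}) + 2\,\mathrm{Cov}(\bar{x}, \bar{y})$$
$$= \frac{S_x^2}{n} \cdot \frac{N-n}{N} + \frac{S_y^2}{n} \cdot \frac{N-n}{N} + 2\,\mathrm{Cov}(\bar{x}, \bar{y})$$
$$= \frac{N-n}{Nn}S_x^2 + \frac{N-n}{Nn}S_y^2 + 2\,\mathrm{Cov}(\bar{x}, \bar{y})$$
$$\Rightarrow E(\bar{u} - \bar{U})^2 = \frac{N-n}{Nn}\left(S_x^2 + S_y^2\right) + 2\,\mathrm{Cov}(\bar{x}, \bar{y}) \qquad (4)$$
Now the second factor on the RHS of (3) can be expressed as
$$\frac{\sum_{i=1}^{N}(u_i - \bar{U})^2}{N-1} = \frac{\sum_{i=1}^{N}\left[(x_i + y_i) - (\bar{X} + \bar{Y})\right]^2}{N-1} = \frac{\sum_{i=1}^{N}\left[(x_i - \bar{X}) + (y_i - \bar{Y})\right]^2}{N-1}$$
$$= \frac{\sum_{i=1}^{N}(x_i - \bar{X})^2}{N-1} + \frac{\sum_{i=1}^{N}(y_i - \bar{Y})^2}{N-1} + \frac{2\sum_{i=1}^{N}(x_i - \bar{X})(y_i - \bar{Y})}{N-1}$$
$$\Rightarrow \frac{\sum_{i=1}^{N}(u_i - \bar{U})^2}{N-1} = S_x^2 + S_y^2 + 2S_{xy} \qquad (5)$$
Substituting (5) in (3),
$$E(\bar{u} - \bar{U})^2 = \frac{N-n}{Nn}\left(S_x^2 + S_y^2 + 2S_{xy}\right)$$
Equating this with (4) gives $2\,\mathrm{Cov}(\bar{x}, \bar{y}) = 2\,\frac{N-n}{Nn}S_{xy}$, that is,
$$\mathrm{Cov}(\bar{x}, \bar{y}) = \frac{S_{xy}}{n} \cdot \frac{N-n}{N} = \frac{\sigma_{xy}}{n} \cdot \frac{N-n}{N-1}$$
since $S_{xy} = \frac{N}{N-1}\sigma_{xy}$.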
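Proof (b):
By definition,
$$\rho_{\bar{x}\bar{y}} = \frac{\mathrm{Cov}(\bar{x}, \bar{y})}{\sqrt{V(\bar{x})V(\bar{y})}} = \frac{\frac{N-n}{Nn}S_{xy}}{\sqrt{\frac{N-n}{Nn}S_x^2 \cdot \frac{N-n}{Nn}S_y^2}} = \frac{S_{xy}}{S_xS_y} = \rho_{xy}$$
(proved)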
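Both parts of the theorem can be checked by enumerating every sample from a small population. The six $(x, y)$ pairs below are hypothetical values for illustration only; exact fractions avoid rounding, and part (b) is compared through squared correlations so the check stays in exact arithmetic (the signs agree because $\mathrm{Cov}(\bar{x}, \bar{y})$ is a positive multiple of $S_{xy}$).

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical bivariate population of N = 6 units (x_i, y_i).
x = [2, 4, 5, 7, 9, 3]
y = [1, 3, 2, 6, 8, 4]
N, n = len(x), 3

Xbar, Ybar = Fraction(sum(x), N), Fraction(sum(y), N)
S_xy = sum((a - Xbar) * (b - Ybar) for a, b in zip(x, y)) / (N - 1)
S2_x = sum((a - Xbar) ** 2 for a in x) / (N - 1)
S2_y = sum((b - Ybar) ** 2 for b in y) / (N - 1)

# Exact Cov(xbar, ybar), V(xbar), V(ybar) over all C(N, n) equally likely samples.
cov = var_x = var_y = Fraction(0)
samples = list(combinations(range(N), n))
for idx in samples:
    dx = Fraction(sum(x[i] for i in idx), n) - Xbar
    dy = Fraction(sum(y[i] for i in idx), n) - Ybar
    cov += dx * dy
    var_x += dx * dx
    var_y += dy * dy
cov, var_x, var_y = (t / len(samples) for t in (cov, var_x, var_y))

fpc = Fraction(N - n, N * n)                  # (N - n) / (N n)
print(cov == fpc * S_xy)                      # part (a)
print(cov ** 2 / (var_x * var_y) == S_xy ** 2 / (S2_x * S2_y))   # part (b), squared
```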
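Corollary:
Estimators of $\mathrm{Cov}(\bar{x}, \bar{y})$ and $\rho_{\bar{x}\bar{y}}$ are
$$\hat{\sigma}_{\bar{x}\bar{y}} = \frac{N-n}{Nn}s_{xy} \quad \text{and} \quad r_{\bar{x}\bar{y}} = \frac{s_{xy}}{s_xs_y} = r_{xy}$$
respectively, where
$$s_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{n-1}$$
$$s_x^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$$
$$s_y^2 = \frac{\sum_{i=1}^{n}(y_i - \bar{y})^2}{n-1}$$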
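A sketch of the corollary's estimators computed from a sample of pairs; the function name `cov_and_corr_estimates` and the three sample pairs are hypothetical.

```python
import math


def cov_and_corr_estimates(xs, ys, N):
    """Estimated Cov(xbar, ybar) = (N - n)/(N n) * s_xy and the sample
    correlation r = s_xy / (s_x s_y), for a WOR sample of n pairs."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    s_xy = sum((a - xbar) * (b - ybar) for a, b in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((a - xbar) ** 2 for a in xs) / (n - 1))
    s_y = math.sqrt(sum((b - ybar) ** 2 for b in ys) / (n - 1))
    cov_hat = (N - n) / (N * n) * s_xy
    return cov_hat, s_xy / (s_x * s_y)


# Hypothetical sample of n = 3 pairs from a population of N = 10 units.
print(cov_and_corr_estimates([1, 2, 3], [2, 4, 6], N=10))
```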
Exercise 1: The following table presents population data on the salaries of 30 employees. Select a simple random sample of size 10 using a random number table and estimate the average salary. Also estimate the standard error of the sample mean.
Solution:
Hints:
We have the following random numbers:
277 130 802 108 541 603 497 786 666 440 413 258 161
414 945 416 502 413 258 061 608 809 195 609 923 779
493 063 609 923 779 381 396 840 474 433 953 407 582
642 668 724 210 953 407 582 895 154 121 108 541 603
Using the remainder method, we start from the first row and first column and proceed along the row. The selected IDs with their salaries are presented in the table.
Total $\sum_{i=1}^{n} x_i$ = …
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \text{Tk } \ldots$$
Comment: The estimated average salary of 30 employees is Taka…..
Estimation of standard error of sample mean
We have
$$s^2 = \frac{\sum_{i=1}^{n}(y_i - \bar{y})^2}{n-1} = \frac{\sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}}{n-1} = \ldots$$
Now,
$$\hat{\sigma}_{\bar{y}} = s_{\bar{y}} = s\sqrt{\frac{1-f}{n}} = \ldots$$
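The computational form of $s^2$ used in the hint avoids computing each deviation from the mean. A sketch that checks it agrees with the definition and then computes the estimated standard error $s\sqrt{(1-f)/n}$; the function names and the sample values are hypothetical.

```python
import math
from fractions import Fraction


def s2_definition(ys):
    """s^2 = sum((y_i - ybar)^2) / (n - 1), computed exactly."""
    n = len(ys)
    ybar = Fraction(sum(ys)) / n
    return sum((Fraction(y) - ybar) ** 2 for y in ys) / (n - 1)


def s2_shortcut(ys):
    """s^2 = [sum(y_i^2) - (sum y_i)^2 / n] / (n - 1), computed exactly."""
    n = len(ys)
    return (sum(Fraction(y) ** 2 for y in ys) - Fraction(sum(ys)) ** 2 / n) / (n - 1)


def se_mean_wor(ys, N):
    """Estimated standard error s * sqrt((1 - f)/n), with f = n/N."""
    n = len(ys)
    return math.sqrt(float(s2_shortcut(ys)) * (1 - n / N) / n)


ys = [2, 4, 6, 8]   # hypothetical sample values
print(s2_definition(ys), s2_shortcut(ys), se_mean_wor(ys, N=8))
```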
Exercise 2: The following table presents population data on the gender of 30 employees. Select a simple random sample of size 10 using a random number table and estimate the proportions of male and female employees. Also estimate the standard errors of the sample proportions (for both male and female).
ID Gender ID Gender
1 Male 16 Male
2 Female 17 Male
3 Male 18 Female
4 Male 19 Male
5 Female 20 Male
6 Male 21 Female
7 Female 22 Male
8 Male 23 Female
9 Female 24 Male
10 Female 25 Male
11 Male 26 Male
12 Male 27 Female
13 Male 28 Male
14 Female 29 Female
15 Female 30 Female
Solution:
Hints:
We have the following random numbers:
277 130 802 108 541 603 497 786 666 440 413 258 161
414 945 416 502 413 258 061 608 809 195 609 923 779
493 063 609 923 779 381 396 840 474 433 953 407 582
642 668 724 210 953 407 582 895 154 121 108 541 603
Using the remainder method, we start from the first row and first column and proceed along the column. The selected IDs with their gender are presented in the table.
Sl Selected ID Gender
1
2
3
4
5
6
7
8
9
10
The following table presents the proportions of male and female employees.
Table: Distribution of employees by gender
Comment: The estimated proportions of male and female employees are …..% and……%
respectively.
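Standard error of proportion (male)
Let $p$ be the sample proportion of male employees and $q = 1 - p$ the sample proportion of female employees. Then
$$v(p) = \hat{\sigma}_p^2 = s_p^2 = \frac{N-n}{N} \cdot \frac{pq}{n-1} = \ldots, \qquad s_p = \sqrt{v(p)} = \ldots$$
Standard error of proportion (female)
$$v(q) = \hat{\sigma}_q^2 = s_q^2 = \frac{N-n}{N} \cdot \frac{qp}{n-1} = \ldots, \qquad s_q = \sqrt{v(q)} = \ldots$$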
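A sketch of the variance and standard error formula for a sample proportion; `proportion_se` is a hypothetical name and p = 0.6 in the usage line is a hypothetical value. Because $pq$ is symmetric in $p$ and $q$, the estimated standard errors of the male and female proportions are equal.

```python
import math


def proportion_se(p, n, N):
    """Estimated standard error of a sample proportion p under SRSWOR:
    s_p = sqrt[(N - n)/N * p*q/(n - 1)], with q = 1 - p."""
    q = 1 - p
    v = (N - n) / N * p * q / (n - 1)   # v(p), the estimated variance
    return math.sqrt(v)


# Hypothetical value p = 0.6 for illustration, with n = 10 and N = 30.
print(proportion_se(0.6, n=10, N=30), proportion_se(0.4, n=10, N=30))
```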
Exercise 3:
The following table presents sample data collected in a household survey conducted in a village of 500 households.
i) Estimate the average household size and estimate its standard error.
ii) Estimate the total number of household members in the village and estimate its standard
error.
2 5 4 12 3 6 22 5 4
3 6 3 13 6 3 23 7 3
4 7 2 14 4 5 24 5 6
5 3 5 15 4 4 25 6 5
6 5 3 16 5 5 26 4 4
7 4 4 17 6 6 27 4 6
8 6 5 18 7 3 28 6 5
9 5 4 19 4 3 29 3 6
10 4 3 20 6 4 30 6 3