Sample Survey Merged

The document outlines the course STA 2102: Introduction to Sample Survey, which aims to provide students with a comprehensive understanding of sample surveys, including concepts, objectives, and various sampling designs. It covers essential topics such as the advantages and disadvantages of census versus sample surveys, methods of data collection, and statistical techniques for estimating population parameters. Upon completion, students will be equipped to apply sampling techniques effectively and analyze data from various sampling designs.

STA 2102: Introduction to Sample Survey Credits: 2

(Examination 35, Continuous Assessment 15)


Number of Lectures: minimum 26 (hours)
Duration of Examination: 2 hours
3 questions to be answered from 5
Aim of this course

• provide an overview of the basic concepts of sample surveys, the objectives of a sample survey, the
relative suitability and application of complete and sample enumeration, and the terminology used
in sample surveys
• know the common sampling designs and understand when it is appropriate to use each design
• know how to estimate parameters in different sampling designs and understand how to
compare sampling designs or estimators
Objective of this course
The course is designed to help learners understand the basic concepts of sample surveys, their
objectives and applications, become familiar with the terminology used in sample surveys, learn
random and non-random sampling, and know the most common statistical designs for sampling. On
completing this course, learners can prepare suitable sampling designs, recognize the design used
in an application, estimate population parameters and analyze the data in different sampling
designs. They also understand how to compare sampling designs or estimators, and know what
properties make a better design or estimator.
Learning outcomes of this course
After completion of this course successfully, the learners/students would be able to

• explain the basic concepts of sample surveys and the relative advantages and disadvantages of
complete and sample enumeration
• understand the basic principles of sample surveys, the objectives of a sample survey, and the
application of sample surveys and complete enumeration
• be familiar with necessary definitions and important terms used in sample surveys
• be acquainted with various methods and tools of data collection
• know probability sampling and non-probability sampling
• be familiar with various methods of non-probability sampling
• know the most common designs for probability sampling such as simple random sampling,
stratified random sampling and systematic sampling with their applications and limitations
• know how to draw a sample, how to estimate parameters and how to analyze the data in various
sampling designs
• know how to compare various sampling designs
• use supplementary information for some special methods of estimation such as ratio estimation,
product method of estimation and regression estimation

Contents:
Introduction: Basic concept of sample survey, relative advantages, disadvantages and suitability of
complete and sample enumeration, uses of sample survey, role of sampling theory, requirements of a
good sample design, units, population, sampling units, and sampling frame-related problems, basic
principles of sample survey, pilot survey, random or probability sampling and non-random or purposive
sampling, quota sample, polls, mixed sample.
Population values and estimates in sample survey: Bias and its effect, precision and accuracy of
estimates, different types of errors associated with sampling and complete enumeration, various
methods of data collection, questionnaire and schedule.
Simple random sampling (SRS): Advantages and disadvantages, drawing samples in with and
without replacement cases, estimates and standard errors, simple random sampling for proportion-
estimate and standard error, determination of sample size for specified precision, introduction to
other probability sampling schemes.
Stratified random sampling (StRS): Reasons for stratification, stratified random sampling-
estimates, standard errors, allocation of samples to strata-proportional allocation, Neyman allocation
(optimum without cost), and optimum allocation (with cost), stratified sampling for proportions,
determination of sample size, estimation of gain due to stratification, the construction of strata,
methods of collapsed strata, post-stratification, deep stratification, comparing with one-way
stratification.
Systematic Sampling: Use, limitation, estimates, bias, standard error and efficiency, comparison
with simple random sampling, systematic sampling for populations with linear trend, methods for
dealing with population in random order, population with linear trend and population with periodic
variation, circular systematic sampling.
Texts
1. Lohr, S. L. (2005): Sampling: Design and Analysis.
2. Cochran, W. G. (1977): Sampling Techniques, 3rd edition, Wiley Eastern, New Delhi.
References
1. Islam, M. N. (2014): An Introduction to Sampling Methods, 4th edition, Mullick and Brothers,
Dhaka.
2. Levy, P. and Lemeshow, S. (1999): Sampling of Populations: Methods and Applications,
Wiley, New York.
3. Mukhopadhyay, P. (2000): Theory and Methods of Survey Sampling, Prentice-Hall of India (P)
Limited, New Delhi.
4. Yates, F.: Sampling Methods for Censuses and Surveys.
5. Raj, D. and Chandhok, P. (1998): Sample Survey Theory, Narosa Publishing House, New
Delhi.
6. Sukhatme, P. V. and Sukhatme, B. V. (1984): Sampling Theory of Surveys with Applications, 2nd
edition, Asia Publishing House, London.
7. Singh, Daroga and Chaudhary, F. S. (1986): Theory and Analysis of Sample Survey Designs,
Wiley, New York.

Introduction to Sample Survey

Basic concepts about sampling


Basic concepts about sampling mainly include useful terminologies, real life examples, basic
principles of sampling, basic types of sampling etc.
How do we study a population?
A population may be studied using one of two approaches: conducting a census survey (simply
census), or a sample survey.
It is important to note that whether a census or a sample survey is used, both provide information
which can be used to draw conclusions about the whole population.

What is a census (complete enumeration)?


A census is the procedure of systematically acquiring and recording information about all the
members of a given population. Simply, a census is a study of every unit in a population. It is
known as a complete enumeration, which means a complete count.
The term is used mostly in connection with national population and housing censuses; other
common censuses include agriculture, business and traffic censuses. In Bangladesh, the population
census is conducted by the Bangladesh Bureau of Statistics (BBS) every 10 years.
What is a sample survey (partial enumeration)?
A sample is a subset of units in a population, selected to represent all units in a population of
interest. A sample survey is a study involving a subset (or sample) of individuals or objects selected
from a larger population by accepted statistical methods.
A sample survey is a partial enumeration because it is a count from part of the population.
Information from the sampled units is used to estimate the characteristics for the entire population
of interest.
When to use a census or a sample survey?
Once a population has been identified a decision needs to be made about whether taking a census or
selecting a sample will be the more suitable option. There are advantages and disadvantages to
using a census or sample to study a population:

Relative advantages and disadvantages of census and sample survey

Advantages of a census

• provides a true measure of the population (no sampling error)
• benchmark data may be obtained for future studies
• detailed information about small sub-groups within the population is more likely to be available

Disadvantages of a census

• higher costs, both in staff and monetary terms, than for a sample
• generally takes longer to collect, process, and release data than from a sample
• generally produces many non-sampling errors
• may be difficult to enumerate all units of the population within the available time

Advantages of a sample

• reduces cost, both in monetary terms and staffing requirements
• reduces time needed to collect and process the data and produce results as it requires a smaller
scale of operation
• if good sampling techniques are used, the results can be very representative of the actual
population
• generally reduces non-sampling errors

Disadvantages of a sample

• data may not be representative of the total population, particularly where the sample size is
small and/or sampling technique is inaccurate
• often not suitable for producing benchmark data
• as data are collected from a subset of units and inferences made about the whole population, the
data are subject to sampling error
• decreased number of units will reduce the detailed information available about sub-groups within
a population

Some uses of census data


By the beginning of the twentieth century, censuses were recording households and some
indications of their employment. In some countries, census archives are released for public
examination after many decades, allowing genealogists to track the ancestry of interested people.
Archives provide a substantial historical record which may challenge established notions of
tradition. It is also possible to understand the societal history through job titles and arrangements
for the destitute and sick.
As governments assumed responsibility for schooling and welfare, large government departments
made extensive use of census data. Actuarial estimates could be made to project populations and
plan for provision in local government and regions. It was also possible for central government to
allocate funding on the basis of census data. Even into the mid twentieth century, census data was
only directly accessible to large government departments. However, computers meant that
tabulations could be used directly by university researchers, large businesses and local government
offices. They could use the detail of the data to answer new questions and add to local and
specialist knowledge.
Now, census data are published in a wide variety of formats to be accessible to business, all levels
of governance, media, students and teachers, charities and researchers, and any citizen who is
interested. Data can be represented visually or analyzed in complex statistical models, to show the
difference between certain areas, or to understand the association between different personal
characteristics. Census data offer a unique insight into small areas and small demographic groups
which sample data would be unable to capture with precision.
Some uses of survey methods
Samples are taken to provide statistical data on an extensive range of subjects for
both research and administrative purposes. The following examples illustrate the
importance of sampling in real life:
a) In an opinion poll, a relatively small number of persons are interviewed, and their opinions on
current issues are solicited in order to discover the attitude of the community as a whole.
b) Marketing and advertising agencies conduct countless inquiries to determine customers’
expectations, attitudes, buying habits, or shopping patterns. This information is useful to the
manufacturers of goods for sales promotion.
c) Large lots of manufactured products are accepted or rejected by purchasing departments
in business or government following inspection of a relatively small number of items drawn
from these lots.
d) At border stations, customs officers enforce the laws by checking the effects of only a
small number of travelers crossing the border.
e) A department store wishes to examine whether it is losing or gaining customers by
drawing a sample from its list of credit card holders by selecting every tenth name.
f) Auditors often judge the extent to which the proper accounting procedures have been
followed by examining a small number of transactions, selected from a large number of
such transactions taking place within a specified period of time.
g) The Ministry of Health and Family Welfare might be interested to know the status of
knowledge among the adult population in Dhaka city on the danger of environmental
pollution.

Some other examples of uses of survey methods
Government agencies: Information collected from surveys conducted by government agencies has a
huge influence upon the ways our lives are regulated. Some examples are: labour force surveys for
monitoring the extent of unemployment, and the Consumer Price Index (CPI) which is based on a
survey of prices. (The weightings used to combine the prices are obtained by a survey of patterns of
expenditure.)
Acceptance sampling: Many manufacturers sample from batches of components and raw materials
being brought in. If the sample is not up to specified standards the batch will be sent back.
Accounting data in auditing: Accounting auditors cannot check all the accounts of a company in
detail. Instead they sample invoices or accounts and check just these carefully.
Economic forecasts: Business confidence is a very important ingredient in determining whether the
economy grows or contracts (a recession). Surveys of business opinion play an important part in
economic forecasts.
Ratings for TV/radio audiences: These are based upon the viewing or listening habits of a sample
of people. They determine the price of advertising and thus the income to be spent on
programming.
Sociological research: Investigations are carried out into the way we live, the way society is
organized, and the use of local and national facilities (e.g. national parks). This information can be
a basis for government policy decisions.
Tax collection: In many countries a sample of people has their tax returns audited in detail. This
tends to be in addition to a regular rotation in which everyone is audited every 5 years, say.

Useful terms in sampling
Population
A population is the collection or aggregate of all elements or items of interest in a particular study
about which we wish to make an inference.
In other words, a population is a complete set of items being studied in an inference procedure. A
population includes all of the elements under study. A population should always be defined in
terms of its content, units, coverage and time of occurrence.
In research terminology, a ‘population’ can be explained as a comprehensive group of individuals,
objects, institutions and so forth that share a common characteristic of interest to a
researcher.
Example: All college students constitute a population if the researcher is interested in studying
college students' socio-economic background, their opinions, or any other
issue.
Similarly, all patients, all students, all hospitals of Bangladesh and all private banks in Dhaka city
are some examples of populations.
Target Population
A target population is the entire group about which information is desired and conclusions are drawn.
The target population is the population of interest in your study. This is the population to which
you want your study findings to be generalized.
Study population
The study population is the subpopulation of the target population from which the sample is
actually drawn. That is, the population which we actually sample is the study
population.
It is also called the sampled population, survey population or accessible population.
Sample
Any part of a population is called a sample. A sample may be representative or not. A sample,
however, should be representative for further statistical operations.
Let us see the following figure.

For example, suppose in a study on diabetes mellitus, we have drawn a sample of 200 diabetic
patients from BIRDEM. Then these 200 patients are the sample, all patients of BIRDEM are study
population and all diabetic patients are the target population for which the study findings will be
generalized.
Sampling Unit
A sampling unit or simply a unit is a well-defined, distinct and identifiable element or group of
elements on which observation is made. Each element in a population is a sampling unit.
Sample Size
Sample size refers to the number of units contained in a sample. It is usually denoted by n.
Population Size
Population size is the number of units which constitute the population. It is usually denoted by N.
Survey
Survey is a general term that refers to the collection of data by means of interviews, questionnaires
or observation.
Census Survey
A census survey, or simply a census, is an investigation or a count of all the population elements.
Sample Survey
A sample survey is a study involving a subset (or sample) of individuals or objects selected from a
larger population by accepted statistical methods.
Sampling
Sampling is a statistical procedure of drawing a small number of elements from a population (also
called universe) to estimate population parameters and draw conclusions regarding the population.
Sample Design

Sample design or sampling design refers to the plans and methods to be followed in selecting a
sample from the target population, together with the estimation technique, that is, the formulas for
computing the sample statistics.
Survey Design
Survey design is the process of preparing a complete plan of operations to be followed in
conducting a survey and disseminating its intended results.
Sampling Frame
A sampling frame is a complete list of units or groups of units in the population to be sampled. That
is, it is a complete list of everyone or everything we want to study. In other words, a sampling frame
is a complete list of sampling units.
Qualities of a sampling frame
An ideal sampling frame will have the following qualities:

• all units have a logical, numerical identifier

• all units can be found – their contact information, map location or other relevant
information is present

• the frame is organized in a logical, systematic fashion

• the frame has additional information about the units that allow the use of more advanced
sampling frames

• every element of the population of interest is present in the frame

• every element of the population is present only once in the frame

• no elements from outside the population of interest are present in the frame

• the data is 'up-to-date'
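
Several of these qualities (every element present, each element present only once, no outside elements) can be checked mechanically when a reference list of the population is available. The following is a minimal sketch under that assumption; the function name check_frame and the household identifiers are hypothetical and used only for illustration. In practice a full reference list is rarely available, so such checks are usually made against partial or external sources.

```python
from collections import Counter


def check_frame(frame_ids, population_ids):
    """Report basic coverage problems in a list sampling frame."""
    counts = Counter(frame_ids)
    population = set(population_ids)
    return {
        # units listed more than once (violates "present only once")
        "duplicates": sorted(u for u, c in counts.items() if c > 1),
        # population units missing from the frame (undercoverage)
        "missing": sorted(population - set(counts)),
        # frame entries that are not in the population (overcoverage)
        "extraneous": sorted(set(counts) - population),
    }


# Hypothetical household identifiers, for illustration only.
population_ids = [101, 102, 103, 104, 105]
frame_ids = [101, 102, 102, 104, 999]

report = check_frame(frame_ids, population_ids)
print(report)
```

An ideal frame would give three empty lists; here unit 102 is duplicated, units 103 and 105 are missing, and 999 does not belong to the population.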


Types of sampling frame
Two types of sampling frames are used in sampling problems:
a) List sampling frame
b) Area sampling frame
List sampling frame
A list sampling frame is a complete list of well-defined reporting units. This list should contain
relevant information about individual units, which will enable efficient sampling. List of
households or addresses of housing units may be the examples of a list sampling frame.
Area sampling frame
An area frame is a collection of well-defined land units that is used to draw survey samples.
Common land units composing an area frame include states, provinces, counties, zip code areas, or
blocks. An area frame could be a list, map, aerial photograph, satellite image, or any other
collection of land units. Area frames play an important part in area probability samples, multi-stage
samples, cluster samples, and multiple frame samples. They are often used when a list of ultimate
sampling units does not exist, other frames have coverage problems, a geographically clustered
sample is desired, or a geographic area is the ultimate sampling unit.
Questionnaire and Interview Schedule
A questionnaire is a research instrument consisting of a series of questions and other prompts for
the purpose of gathering information from respondents. Although they are often designed
for statistical analysis of the responses, this is not always the case. The questionnaire was invented
by the Statistical Society of London in 1838. A copy of the instrument is published in the Journal of
the Statistical Society, Volume 1, Issue 1, 1838, pages 5–13.
Difference between questionnaire and structured interview schedule
The questionnaire or structured interview schedule is the data collection technique most commonly
used by social surveys. When respondents fill in the instrument on their own without the help of an
interviewer, as is the case in a postal survey, the research instrument is called a questionnaire.
When interviewers are present, asking the questions and helping the respondent, as in face-to-face
interviews or a telephone survey, the research instrument is known as a structured interview
schedule or simply schedule. The design and way the questionnaire is administered depend on the
type of survey.
Key principles of effective questionnaire design
There are seven steps in the design of a questionnaire:
Step 1 – Decide what information is required
The starting point is for the researcher to refer to the proposal and brief and make a listing of all the
objectives and what information is required in order that they are achieved. Accordingly a list of
indicators or variables is to be prepared.
Step 2 – Make a rough listing of the questions
A list is now made of all the questions that could go into the questionnaire. The aim at this stage is
to be as comprehensive as possible in the listing and not to worry about the phrasing of the
questions. That comes next.
Step 3 – Refine the question phrasing
The questions must now be developed close to the point where they make sense and will generate
the right answers. Tips on how to write good questions are provided later in this chapter.
Step 4 – Develop the response format
Every question needs a response. This could be a pre-coded list of answers or it could be open
ended to collect verbatim comments. Consideration of the responses is just as important as getting
the questions right. In fact, considering the answers will help get the questions right.
Step 5 – Put the questions into an appropriate sequence

The ordering of the questions is important as it brings logic and flow to the interview. Normally the
respondent is eased into the task with relatively straightforward questions while the more difficult
or sensitive ones are left until they are warmed up. Questions on brand awareness are asked first
unprompted and then they are prompted.
Step 6 – Finalize the layout of the questionnaire
The questionnaire now needs to be fully formatted with clear instructions to the interviewer,
including a powerful introduction, routings and probes. There needs to be enough space to write in
answers and the response codes need to be well separated from each other so there is no danger of
circling the wrong one.
Step 7 – Pretest and revise
The final step is to test the questionnaire. It usually isn’t necessary to carry out more than 10 to 20
interviews in a pilot because the aim is to make sure that it works, and not to obtain pilot results. In
theory the questionnaire should be piloted using the interviewing method that will be used in the
field (over the phone if telephone interviews are to be used; self completed if it will be a self
completion questionnaire). Time and money can preclude a proper pilot so at the very least it
should be tested on one or two colleagues for sense, flow and clarity of instructions. The whole
purpose of the test is to find out if changes are needed so that final revisions can be made. When
carrying out the pilot it is best to run through the questionnaire with the guinea pig respondent and
then go back over the questions and ask for each one, “what was going through your mind when
you were asked this question?”. Questionnaire design is one of the hardest and yet one of the most
important parts of the market research process. Given the same objectives, two researchers would
probably never design the same questionnaire.
Statistic and Parameter
Definition of Statistic
A statistic is a characteristic of a sample obtained from a small part of the population. It is a
descriptive statistical measure and function of sample observations. The common use of statistic is
to estimate a particular population parameter.
From the given population, it is possible to draw multiple samples, and the result (statistic)
obtained from different samples will vary, which depends on the samples.
Definition of Parameter
A fixed characteristic of a population obtained from all the elements of the population is termed as
the parameter. It is a numerical value that remains unchanged, as every member of the population is
surveyed to know the parameter. It indicates true value, which is obtained after the census is
conducted.
Key Differences Between Statistic and Parameter
The difference between statistic and parameter can be drawn clearly on the following grounds:

1. A statistic is a characteristic of a small part of the population, i.e. a statistic is a
characteristic of a sample. A parameter is a characteristic of a population. The parameter
is a fixed measure which describes the target population.
2. A statistic is a variable, known number that depends on the sample drawn from the population,
while a parameter is a fixed and unknown numerical value.
3. Statistical notations are different for population parameters and sample statistics.
Different symbols are used to denote statistics and parameters; Table 1 shows some common notations.
Table 1: Comparison of some useful sample statistics and population parameters

                     Sample statistic   Population parameter

Mean                 $\bar{x}$          $\mu$

Standard deviation   $s$                $\sigma$

Variance             $s^2$              $\sigma^2$

Inferential statistics enables you to make an educated guess about a population parameter based on
a statistic computed from a sample randomly drawn from that population (see Figure 1).
Figure 1: Illustration of the relationship between population & sample and parameter & statistic

For example, say you want to know the mean income of freelancers—a parameter of a population.
You draw a random sample of 100 freelancers and determine that their mean income is Tk. 45,500
per month. You conclude that the population mean income μ is likely to be close to Tk. 45,500 as
well. This example is one of statistical inference.
Estimation, Estimator and Estimate
What is an Estimator?
An estimator is a statistic used for the purpose of estimating an unknown parameter. An estimator
is a function of the data in a sample. You can also think of an estimator as the rule that creates an
estimate. Common estimators are the sample mean and sample variance which are used to estimate
the unknown population mean and variance.

What is an Estimate?
An estimate is the numerical value of the estimator when it is actually computed using data from a
specific sample.
What is Estimation?
Estimation is the process by which the numerical values of unknown population parameters are
inferred from sample data.
What is the difference between an estimator and an estimate?
1. An estimator is a function of a sample of data to be drawn randomly from a population
whereas an estimate is the numerical value of the estimator computed from sample data.
2. An estimator is a random variable and an estimate is a number (that is the computed value
of the estimator).
Referring to the freelancer income example above, the population parameter, estimator and estimate
are shown in the following table:

Population parameter   Estimator    Estimate

$\mu$                  $\bar{x}$    45,500
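
The distinction can be made concrete in code: the estimator is a rule (a function), and the estimate is the number that rule returns for one particular sample. The sketch below is only illustrative; the five income figures are hypothetical values chosen so that their mean equals the Tk. 45,500 used in the example.

```python
from statistics import mean


def sample_mean(sample):
    """Estimator: a rule applied to whatever sample happens to be drawn."""
    return mean(sample)


# Hypothetical monthly incomes (Tk.) of five sampled freelancers.
incomes = [42000, 47000, 45500, 44000, 49000]

# Estimate: the single number the estimator produces for this sample.
estimate = sample_mean(incomes)
print(estimate)
```

A different sample passed to the same function would generally give a different estimate, which is why the estimator is treated as a random variable.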

Bias of an estimator

Suppose we are trying to estimate the parameter $\theta$ using an estimator $\hat{\theta}$ (that is,
some function of the observed data). Then the bias of $\hat{\theta}$ is defined to be

$\mathrm{Bias}(\hat{\theta}) = E(\hat{\theta}) - \theta.$

In words, this is "the expected value of the estimator minus the true value $\theta$." This may be
rewritten as

$E(\hat{\theta} - \theta),$
which reads "the expected value of the difference between the estimator and the true value"
(the expected value of the constant $\theta$ is precisely $\theta$).
In particular, if the bias is zero, the estimator is called unbiased. Then we have

$0 = E(\hat{\theta}) - \theta$

$\Rightarrow E(\hat{\theta}) = \theta \quad \ldots (1)$
Equation (1) gives the condition of unbiasedness of an estimator.
Example: Suppose we have 3 children in a family whose ages are 1, 3 and 5. We want to select
two of them.
In this particular instance, we say that we have a population of size 3 (i.e. N=3) from which a
sample of size 2 (i.e. n=2) is to be selected without replacement. To select these two children, there
will be altogether 3 possible samples each of 2 children. The accompanying table displays all
possible samples of size 2.
Table: All possible samples of size 2 without replacement

Sample number   Ages of selected children   Sample mean $\bar{x}$

1               1, 3                        2

2               1, 5                        3

3               3, 5                        4

Population mean $\mu = 3$

Average of $\bar{x} = 3$
That is

$E(\bar{x}) = \mu$

So $\bar{x}$ is an unbiased estimator of $\mu$.
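
The table can be reproduced by enumerating every possible sample. The sketch below assumes simple random sampling without replacement, so each of the three samples is equally likely and the expectation of the sample mean is the plain average of the three sample means. The variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

ages = [1, 3, 5]   # population from the example (N = 3)
n = 2              # sample size

population_mean = Fraction(sum(ages), len(ages))

# All possible samples of size n drawn without replacement.
samples = list(combinations(ages, n))
sample_means = [Fraction(sum(s), n) for s in samples]

# Each sample is equally likely, so E(x-bar) is the average of the sample means.
expected_sample_mean = sum(sample_means) / len(sample_means)

for s, m in zip(samples, sample_means):
    print(s, m)
print(population_mean, expected_sample_mean)
```

Exact fractions are used so that the equality between the expected sample mean and the population mean is not blurred by floating-point rounding.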
Mean squared error
The mean squared error (MSE) or mean squared deviation (MSD) of an estimator measures the
average of the squares of the errors—that is, the average squared difference between the estimated
values and the actual value. The squaring is done so negative values do not cancel positive values.

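Suppose we are trying to estimate the parameter $\theta$ using an estimator $\hat{\theta}$ (that is,
some function of the observed data). Then the mean squared error of $\hat{\theta}$ is defined to be

$\mathrm{MSE}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2]$
or

$\mathrm{MSE}(\hat{\theta}) = V(\hat{\theta}) + [\mathrm{Bias}(\hat{\theta})]^2$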
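
The two expressions for the MSE can be checked numerically on the three-children population with ages 1, 3 and 5. As an assumption made purely for illustration, the sketch uses a deliberately biased estimator of the population mean, the sample minimum, which is not an estimator proposed in these notes. All samples of size 2 are taken as equally likely.

```python
from fractions import Fraction
from itertools import combinations

ages = [1, 3, 5]
mu = Fraction(sum(ages), len(ages))   # true parameter (population mean)

# Hypothetical, deliberately biased estimator of mu: the sample minimum.
estimates = [Fraction(min(s)) for s in combinations(ages, 2)]
k = len(estimates)

expectation = sum(estimates) / k
bias = expectation - mu
variance = sum((e - expectation) ** 2 for e in estimates) / k
mse = sum((e - mu) ** 2 for e in estimates) / k   # definition: E[(estimate - mu)^2]

print(bias, variance, mse)
print(mse == variance + bias ** 2)
```

Here the bias is -4/3, the variance is 8/9 and the MSE is 8/3, which equals 8/9 + 16/9, as the decomposition requires.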


Precision
Precision of an estimator is a measure of how close an estimator is expected to be to the true value
of a parameter. It is usually expressed in terms of imprecision and related to the standard error of
the estimator. That is,

$\text{Precision of } \hat{\theta} = \dfrac{1}{\text{Standard error of } \hat{\theta}}$
Less precision is reflected by a larger standard error.
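
As a rough illustration, the standard error of the sample mean can be computed directly from its sampling distribution in the three-children population (ages 1, 3 and 5, samples of size 2 without replacement), and the precision taken as its reciprocal. This is a sketch under the assumption that all three samples are equally likely.

```python
from itertools import combinations
from math import sqrt

ages = [1, 3, 5]

# Sampling distribution of the sample mean: one value per possible sample.
means = [sum(s) / 2 for s in combinations(ages, 2)]

center = sum(means) / len(means)
variance = sum((m - center) ** 2 for m in means) / len(means)

standard_error = sqrt(variance)   # standard deviation of the estimator
precision = 1 / standard_error    # larger standard error means lower precision

print(round(standard_error, 4), round(precision, 4))
```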
Characteristics of a Good Sample Design

In a field study due to time and cost involved, generally, only a section of the population is studied.
These respondents are known as the sample and are representative of the general population or
universe. A sample design is a definite plan for obtaining a sample from a population. It refers to
the technique or the procedure for obtaining a sample from a given population.
Following are the characteristics of good sample design:
1. Sample design should ensure a representative sample: A researcher selects a relatively small
number for a sample from an entire population. This sample needs to closely match all the
characteristics of the entire population. If the sample used in an experiment is a representative
sample, then it will help generalize the results from a small group to large universe being studied.
2. Focus on objectives: The sampling method and sample size must be selected depending upon
the research objectives.
3. Proper selection of sample unit: The sample unit must be appropriate. In line with the objective,
the universe, which comprises the units, is defined first. Sometimes the universe comprises
elements, and each element can be further divided into units.
4. Sample design should have small sampling error: Sampling error is the error caused by
taking a small sample instead of the whole population for study. Sampling error refers to the
discrepancy that may result from judging all on the basis of a small number. Sampling error is
reduced by selecting a large sample and by using efficient sample design and estimation strategies.
5. Sample design should be economically viable: Studies have a limited budget called the
research budget. The sampling should be done in such a way that it is within the research budget
and not too expensive to be replicated.
6. Sample design should have marginal systematic bias: Systematic bias results from errors in
the sampling procedures which cannot be reduced or eliminated by increasing the sample size. The
best bet for researchers is to detect the causes and correct them.
7. Results obtained from the sample should be generalizable and applicable to the whole
universe: The sampling design should be created keeping in mind that the sample covers the
whole universe of the study and is not limited to a part.
Basic Principles of Sampling
Theory of sampling is based on the following principles or laws:
• Law of Statistical Regularity – This law comes from the mathematical theory of probability.
According to King, "Law of Statistical Regularity says that a moderately large number of the items
chosen at random from the large group are almost sure on the average to possess the features of the
large group."
According to this law the units of the sample must be selected at random.
• Law of Inertia of Large Numbers – According to this law, the other things being equal – the
larger the size of the sample; the more accurate the results are likely to be.
Types of Sampling: Sampling with Replacement and Sampling without Replacement
Sampling without Replacement

In sampling without replacement, the unit drawn is not returned to the population in subsequent
drawings. Unlike sampling with replacement, the probability of drawing any remaining unit in
successive selection will be increased.
Example: Suppose we have 3 members in a family to whom we assign serial numbers 1, 2 and 3.
We need to select two of them for an interview. In this particular instance, we say that we have a
population of size 3 (i.e. N=3) from which a sample of size 2 (i.e. n=2) is to be selected. To select
these two members without replacement, there will be altogether 3 possible samples each of 2
members. The accompanying table displays all possible samples of size 2 without replacement:
Table: Samples of size 2 without replacement

Sample Number Serial Numbers

1 (1,2)

2 (1,3)

3 (2,3)

Sampling with Replacement


If the sample is taken with replacement from a population, finite or infinite, the unit drawn is
returned to the population and the number of units available for future drawing is not affected and
consequently the probability of drawing any remaining unit in successive selections will remain
unaltered. Sampling with replacement is sometimes referred to as an unrestricted sampling.
Example: Refer to the example above. If sampling is done with replacement, there will be 9
possible samples each of size 2. The following table displays all possible samples of size 2 with
replacement:
Table: Samples of size 2 with replacement

Sample Number Serial Numbers

1 (1,1)

2 (1,2)

3 (1,3)

4 (2,1)

5 (2,2)

6 (2,3)

7 (3,1)

8 (3,2)

9 (3,3)
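
The two tables above can be reproduced mechanically. The following is a minimal sketch using Python's standard `itertools` module; the variable names are illustrative only. Unordered selections without replacement correspond to `combinations`, and ordered selections with replacement correspond to `product`.

```python
from itertools import combinations, product

units = [1, 2, 3]   # serial numbers of the family members (N = 3)
n = 2               # sample size

# Without replacement, order ignored: 3 possible samples
without_replacement = list(combinations(units, n))

# With replacement, order kept: 3^2 = 9 possible samples
with_replacement = list(product(units, repeat=n))

print(without_replacement)
print(with_replacement)
```

The first list matches the three samples of the without-replacement table, and the second list matches the nine rows of the with-replacement table in the same order.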

Types of Sampling: Non-probability sampling and Probability sampling


There are two types of sampling:
i) Non-probability sampling and
ii) Probability sampling
Non-probability Sampling
Non-probability sampling is a non-random and subjective method of sampling where the selection
of the units depends on the personal judgment of the sampler.
Probability Sampling
Probability or random sampling is a method where a sample is selected according to the rules of
probability. In probability sampling every element of the population has a certain probability (not
necessarily equal) of being selected in the sample.
Non-probability Sampling
There are some non-probability sampling methods such as
i) Convenience Sampling; ii) Accidental Sampling;
iii) Judgment Sampling; iv) Quota Sampling;
v) Snowball Sampling;
i) Convenience Sampling
Non-probability samples that are unrestricted are known as convenience samples. Researchers or
field workers have the freedom to choose whomever they find; thus, the name convenience. The
convenience sample may consist of respondents living in an easily accessible locality.
Undoubtedly, it is the simplest and least reliable form of non-probability sampling. The primary
virtue is its low cost.
ii) Accidental Sampling
Accidental sampling is one in which the cases selected are whatever happens to
be available at the moment. In such sampling, individuals are selected as they appear in a process. If it is
decided that only diabetic patients, or patients with abdominal pain, will be chosen from a queue in
front of a hospital counter, the resulting sample falls under the accidental sampling procedure.
iii) Judgment Sampling

In judgment sampling, individuals are selected who are considered to be most representative of the
population as a whole. It is a judgment sampling because choice of the individual units depends
entirely on the sampler, who, on his own judgment, decides the sample to be selected that conforms
to some criteria. In a study of labor problems, for example, you may decide to talk only with those who
experienced discrimination while they were employed.
iv) Quota Sampling
Quota sampling is a non-probability sampling, in which the interviewers are told to contact and
interview a certain number of individuals from certain sub-groups or strata of the population to
make up the total sample. The formation of the strata is usually based on such characteristics as sex,
age, social status, region of residence. These characteristics which are used to form strata, are
termed ‘quota control’. The technique is widely used by market researchers, political opinion
seekers and many others to avoid the cost problems of interviewing the individuals.
v) Snowball Sampling
Snowball sampling is the colorful name for a technique of building up a list or a sample of a special
population. Some recent authors have referred to snowball sampling as chain referral or network
sampling.
Snowball sampling is conducted in stages. In the first stage, a few persons possessing the requisite
characteristic are identified and interviewed. These persons are used as informants to identify
others who qualify for inclusion in the sample. The second stage involves interviewing these
persons, who in turn identify others who can be interviewed in the third stage, and so on. For example, consider the selection of
beggars for which no frame is available. This can be best done by asking an initial group of beggars
to supply the names of other beggars they come across.

Types of Probability Sampling
The different types of probability sampling are the following:
a) Simple Random Sampling;
b) Stratified Sampling;
c) Systematic Sampling;
d) Cluster Sampling.

Simple Random Sampling (SRS)


Simple Random Sampling is a method for selecting sample elements from a population which has
the following properties:
• The population consists of N objects.
• The sample consists of n objects.
• All possible samples of n objects are equally likely to occur, i.e. all possible samples of n
objects have equal probability of being selected.
Methods of selecting simple random sample
Selection of a simple random sample is accomplished with the aid of any of the following methods.

a) Lottery method
The following steps are followed in drawing a simple random sample by lottery method.
i. First, prepare a sampling frame giving id number for each unit in the population.
ii. Selection of a simple random sample is then accomplished with the aid of the traditional lottery
method.

b) Method using random numbers


The method of using random numbers may be carried out in either of two ways.

1. Method of using random numbers directly


The following 9-step procedure may be followed in drawing a simple random sample of n units
using random numbers from a population of N units.
i. First, prepare a sampling frame giving id number for each unit in the population.
ii. Assign serial numbers to the units in the population from 1 through N.

iii. Decide on the random number table to be used.
iv. Choose a random number with as many digits as N from any point in the random number table.
v. If this random number is less than or equal to N, this is your first selected unit.
vi. Move on to the next random number not exceeding N, vertically, horizontally or in any
other direction systematically, and choose your second unit.
vii. If at any stage of your selection, the random number chosen exceeds N, discard it and
choose the next random number.
viii. If, further, any random number is repeated, it must also be discarded and be replaced by a
fresh random number appearing next.
ix. The process stops once you arrive at your desired sample size.

Example: Draw a simple random sample of size 5 from a population comprising 150 units.
The random numbers are as follows:
277 130 802 108 541 603 497 786 666 440
414 945 416 502 413 258 061 608 809 195
493 063 609 923 779 381 396 840 474 433
642 668 724 210 953 407 582 895 154 121

Solution: Here n=5 and N=150. Assign serial numbers 001, 002, …, 150 to the 150 units in the
population. Since 150 is a three-digit number, we merely read three-digit random numbers
presented in the random number table. Suppose we start from the leftmost digit of first row of the
random number table and proceed downward until we achieve a sample of 5.
Note that we choose only those numbers, which lie in the range 001-150. Any number lying outside
this range is omitted, since they do not correspond to any unit in the population. The process stops
once we arrive at five numbers. Note that the selected numbers are 130, 108, 061, 063 and 121. These
numbers are marked with square brackets in the table below. All these numbers are distinct. If a random number occurs
twice, the second occurrence is omitted, and another number is selected as its replacement.
The random numbers selected are shown as follows:
277 [130] 802 [108] 541 603 497 786 666 440
414 945 416 502 413 258 [061] 608 809 195
493 [063] 609 923 779 381 396 840 474 433
642 668 724 210 953 407 582 895 154 [121]
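
The direct method can be sketched in a few lines of Python. This is an illustration under stated assumptions, not part of the original notes: the function name `select_direct` is hypothetical, the table is read row by row from the top, and leading zeros are dropped because Python integers cannot be written as `061`.

```python
def select_direct(random_numbers, N, n):
    """Keep the first n distinct random numbers that lie in the range 1..N."""
    selected = []
    for r in random_numbers:
        # Discard numbers outside 1..N and numbers already chosen
        if 1 <= r <= N and r not in selected:
            selected.append(r)
        if len(selected) == n:
            break
    return selected

table = [
    [277, 130, 802, 108, 541, 603, 497, 786, 666, 440],
    [414, 945, 416, 502, 413, 258, 61, 608, 809, 195],
    [493, 63, 609, 923, 779, 381, 396, 840, 474, 433],
    [642, 668, 724, 210, 953, 407, 582, 895, 154, 121],
]
row_wise = [r for row in table for r in row]   # read the rows from top to bottom

sample = select_direct(row_wise, N=150, n=5)
print(sample)
```

With this reading order the sketch returns the same five units as the worked example. Note how many random numbers are rejected: only 5 of the 40 entries fall in the range 001-150.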

2. Remainder method
This is another method of using random numbers. This procedure has the advantage of a lower
rejection rate in the selection process. The procedure is illustrated with the following example.

Example: Draw a simple random sample of size 5 from a population comprising 150 units.
The random numbers are as follows:
277 130 802 108 541 603 497 786 666 440
414 945 416 502 413 258 061 608 809 195
493 063 609 923 779 381 396 840 474 433
642 668 724 210 953 407 582 895 154 121

Solution: The population from which sample of 5 has to be chosen, contains 150 units. Suppose we
start from the leftmost digit of first row of the random number table and proceed along rows.
For selecting a unit from 001-150, follow the steps below:
i. Choose a three-digit random number from the table. The first number is 277.
ii. Divide 277 by 150. The remainder is 127. The unit labeled 127 in the population is your
first selected unit.
iii. To select the second unit, choose the next random number. This number is 130, which is
less than 150. We directly choose this number as our second unit in the sample.
iv. The next random number is 802, which results in a remainder of 52 when divided by 150.
The unit corresponding to this number is our third selected unit.
v. Continuing this process, we arrive at the next two numbers. These are 108 and 91.
vi. The random numbers thus chosen are 127, 130, 52, 108 and 91.
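
The remainder method above can be sketched as follows. The function name `select_remainder` is hypothetical, and two conventions are assumptions that the notes do not state: a remainder of 0 is mapped to unit N, and a repeated unit is discarded.

```python
def select_remainder(random_numbers, N, n):
    """Map each random number to a unit 1..N by its remainder on division by N."""
    selected = []
    for r in random_numbers:
        unit = r % N or N          # assumption: remainder 0 stands for unit N
        if unit not in selected:   # assumption: repeated units are discarded
            selected.append(unit)
        if len(selected) == n:
            break
    return selected

first_row = [277, 130, 802, 108, 541, 603, 497, 786, 666, 440]
print(select_remainder(first_row, N=150, n=5))
```

The sketch reproduces the units 127, 130, 52, 108 and 91 from the worked example. One caveat, offered as a hedged remark: with three-digit random numbers and N = 150, remainders are equally likely only if random numbers above 900 (the largest multiple of 150 not exceeding 999) are rejected, so a careful implementation would likely add that check.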

Selection of a simple random sample may also be accomplished with the aid of computer software or
a scientific calculator.

Example: The following table presents population data on the salaries of 30 employees. Select a simple
random sample of size 10 using the random number table and estimate the average salary.

ID Salary (thousand Taka) ID Salary (thousand Taka)

1 30 16 26

2 40 17 15

3 25 18 17

4 35 19 18

5 40 20 20

6 25 21 40

7 40 22 18

8 25 23 25

9 25 24 43

10 15 25 50

11 20 26 15

12 22 27 35

13 30 28 65

14 22 29 24

15 40 30 55

The random numbers are as follows:


277 130 802 108 541 603 497 786 666 440
414 945 416 502 413 258 061 608 809 195
493 063 609 923 779 381 396 840 474 433
642 668 724 210 953 407 582 895 154 121

Solution:
We have the following random numbers:
277 130 802 108 541 603 497 786 666 440
414 945 416 502 413 258 061 608 809 195
493 063 609 923 779 381 396 840 474 433
642 668 724 210 953 407 582 895 154 121
Using remainder method we start from 1st row and 1st column and proceed along row. The selected
IDs with their salaries are presented in the table.

Sl    Selected ID    Salary (thousand Taka) $x_i$

10

Total    -    $\sum_{i=1}^{n} x_i =$

The calculation of average is given below.
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \text{Tk. } .....$$

Comment: The estimated average salary of 30 employees is Taka…..
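
One way to carry out this example programmatically is sketched below. It is an illustration under assumptions, not the official solution: the random numbers are read along the rows, a remainder of 0 is taken to mean ID 30, and a repeated ID is discarded so that the sample is drawn without replacement.

```python
# Salaries in thousand Taka; index 0 holds ID 1, index 29 holds ID 30
salaries = [30, 40, 25, 35, 40, 25, 40, 25, 25, 15,
            20, 22, 30, 22, 40, 26, 15, 17, 18, 20,
            40, 18, 25, 43, 50, 15, 35, 65, 24, 55]

# First two rows of the random number table (leading zeros dropped)
random_numbers = [277, 130, 802, 108, 541, 603, 497, 786, 666, 440,
                  414, 945, 416, 502, 413, 258, 61, 608, 809, 195]

N, n = len(salaries), 10
selected_ids = []
for r in random_numbers:
    unit = r % N or N              # assumption: remainder 0 stands for ID 30
    if unit not in selected_ids:   # assumption: repeated IDs are discarded
        selected_ids.append(unit)
    if len(selected_ids) == n:
        break

sample = [salaries[i - 1] for i in selected_ids]
estimate = sum(sample) / n         # sample mean, the estimator of the average salary
print(selected_ids, sample, estimate)
```

A different starting point or reading direction in the random number table would give a different sample and therefore a different estimate.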

Example: The following table presents population data on the gender distribution of 30 employees.
Select a simple random sample of size 10 using the random number table and estimate the proportions
of male and female employees.

ID Gender ID Gender

1 Male 16 Male

2 Female 17 Male

3 Male 18 Female

4 Male 19 Male

5 Female 20 Male

6 Male 21 Female

7 Female 22 Male

8 Male 23 Female

9 Female 24 Male

10 Female 25 Male

11 Male 26 Male

12 Male 27 Female

13 Male 28 Male

14 Female 29 Female

15 Female 30 Female

Solution:
We have the following random numbers:
277 130 802 108 541 603 497 786 666 440
414 945 416 502 413 258 061 608 809 195
493 063 609 923 779 381 396 840 474 433
642 668 724 210 953 407 582 895 154 121
Using remainder method we start from 1st row and 1st column and proceed along column. The
selected IDs with their gender are presented in the table.

Sl Selected ID Gender

10

The following table is prepared to present the proportions of male and female employees.

Gender of employee Tally Number Percentage

Male

Female

Total

Comment: The estimated proportions of male and female employees are …..% and……%
respectively.
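
The tally for this example can be sketched in Python as follows. The conventions are assumptions rather than part of the notes: the random numbers are read down the columns, a remainder of 0 would stand for ID 30, and repeated IDs are discarded.

```python
from collections import Counter

# Gender of IDs 1..30 in order (M = male, F = female)
genders = "MFMMFMFMFFMMMFFMMFMMFMFMMMFMFF"

# Random numbers read down the first three columns (leading zeros dropped)
column_wise = [277, 414, 493, 642, 130, 945, 63, 668, 802, 416, 609, 724]

N, n = len(genders), 10
ids = []
for r in column_wise:
    unit = r % N or N      # assumption: remainder 0 stands for ID 30
    if unit not in ids:    # assumption: repeated IDs are discarded
        ids.append(unit)
    if len(ids) == n:
        break

counts = Counter(genders[i - 1] for i in ids)
proportions = {g: counts[g] / n for g in "MF"}
print(ids, dict(counts), proportions)
```

The dictionary `proportions` holds the estimated shares of male and female employees for this particular reading of the table.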
Properties of simple random sampling

1. In sampling with replacement, the probability of selecting any specified unit $u_i$ from a population
of $N$ units in any draw is
$$P(u_i) = \frac{1}{N}; \quad i = 1, 2, 3, \ldots, N$$
2. In sampling without replacement, the probability that a specified unit $u_i$ from a population of $N$
units will be selected on any draw, such that it was not selected on previous draws, is equal to the
probability of selecting it on the first draw, and it equals $\frac{1}{N}$.

Proof:
Let

$P_r$ = probability that a specified unit $u_i$ is selected on the $r$th draw, such that it was not selected on
previous draws.
Then obviously,
$$P_1 = P(u_i \text{ is selected on 1st draw}) = \frac{1}{N}.$$
$$P_2 = P(u_i \text{ is not selected on 1st draw}) \cdot P(u_i \text{ is selected on 2nd draw}) = \frac{N-1}{N} \times \frac{1}{N-1} = \frac{1}{N}.$$
$$P_3 = P(u_i \text{ is not selected on 1st draw}) \cdot P(u_i \text{ is not selected on 2nd draw}) \cdot P(u_i \text{ is selected on 3rd draw}) = \frac{N-1}{N} \times \frac{N-2}{N-1} \times \frac{1}{N-2} = \frac{1}{N}.$$
Thus, for the $r$th draw
$$P_r = \frac{N-1}{N} \times \frac{N-2}{N-1} \times \frac{N-3}{N-2} \times \cdots \times \frac{N-(r-1)}{N-(r-2)} \times \frac{1}{N-(r-1)} = \frac{1}{N}.$$
Thus, it is evident (from properties 1 and 2) that the probability of selecting a specified unit $u_i$ of the
population on any draw is equal to the probability of selecting it on the first draw (which equals
$\frac{1}{N}$) irrespective of whether the units are drawn with replacement or without replacement.
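
Property 2 can be checked exactly for a small case by listing every equally likely ordered sequence of draws without replacement. The sketch below uses hypothetical values N = 5 and n = 3 and an arbitrarily chosen target unit.

```python
from fractions import Fraction
from itertools import permutations

N, n = 5, 3
target = 2   # the specified unit (arbitrary choice for illustration)

# Every ordered sequence of n draws without replacement is equally likely
orderings = list(permutations(range(1, N + 1), n))

probs = []
for r in range(n):
    # Count the sequences in which the target unit appears at draw r + 1
    hits = sum(1 for draw in orderings if draw[r] == target)
    probs.append(Fraction(hits, len(orderings)))

print(probs)
```

Each entry of `probs` equals 1/N, in agreement with the proof.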

3. For sampling without replacement the possible number of different combinations of $n$ elements
formed from $N$ elements is ${}^{N}C_{n}$, while for sampling with replacement the possible number of
combinations is $N^n$, which are not all different.
4. In simple random sampling with replacement there are $N^n$ distinct samples and each possible
combination of $n$ different units out of $N$ has the same probability of being selected, and it equals
$\frac{1}{N^n}$.

Proof:
At the first draw, the probability that any unit out of $N$ units will be selected is $\frac{1}{N}$, which remains the same
at any draw as the sampling is with replacement. Moreover, in sampling with replacement each
draw is independent of the others.

So, the probability of selecting $n$ units is
$$\frac{1}{N} \cdot \frac{1}{N} \cdot \frac{1}{N} \cdots \frac{1}{N} = \frac{1}{N^n}$$
Therefore, each combination of $n$ units has the same probability of being selected, which is $\frac{1}{N^n}$.

5. In simple random sampling without replacement there are ${}^{N}C_{n}$ distinct samples and each possible
combination of $n$ different units out of $N$ has the same probability of being selected, and it equals
$\frac{1}{{}^{N}C_{n}}$.

Proof:
At the 1st draw, the probability that one of the $n$ specified units will be selected is $\frac{n}{N}$.
At the 2nd draw, the probability that one of the remaining $(n-1)$ specified units will be selected is $\frac{n-1}{N-1}$.
At the 3rd draw, the probability that one of the remaining $(n-2)$ specified units will be selected is $\frac{n-2}{N-2}$.
.
.
.
At the $n$th draw, the probability that the remaining 1 unit will be selected is $\frac{1}{N-(n-1)}$.

Hence the probability that all $n$ specified units are selected in $n$ draws is
$$\frac{n}{N} \cdot \frac{n-1}{N-1} \cdot \frac{n-2}{N-2} \cdots \frac{1}{N-(n-1)} = \frac{n!\,(N-n)!}{N!} = \frac{1}{{}^{N}C_{n}}.$$

Therefore, each combination of $n$ units has the same probability of being selected, which is $\frac{1}{{}^{N}C_{n}}$.
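
The product in the proof of property 5 can be evaluated exactly and compared with the reciprocal of the number of combinations. A minimal sketch follows; the function name `prob_specified_sample` is hypothetical and `math.comb` supplies the binomial coefficient.

```python
from fractions import Fraction
from math import comb

def prob_specified_sample(N, n):
    """Multiply n/N * (n-1)/(N-1) * ... * 1/(N-n+1) exactly."""
    p = Fraction(1)
    for k in range(n):
        p *= Fraction(n - k, N - k)
    return p

# Population of 3 and sample of 2, as in the family example: 1 chance in 3
print(prob_specified_sample(3, 2))

# Population of 30 and sample of 10, as in the salary example
print(prob_specified_sample(30, 10) == Fraction(1, comb(30, 10)))
```

For sampling with replacement the analogous probability of one particular ordered sample is simply `Fraction(1, N**n)`.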

Advantages of simple random sample
• This is the ideal method of sampling.
• It is highly representative if the subjects are not very heterogeneous.
• Estimates are easy to calculate.

Disadvantages of simple random sample


• It needs a sampling frame, which is often difficult to obtain. If the sampling frame is large, this
method is impracticable.
• Minority subgroups of interest in the population may not be present in the sample in sufficient
numbers for study.
• It is potentially less economical to achieve.

Estimation of population parameters in simple random sampling
The frequent objective of a sample survey is to estimate the population mean, population total,
population variance, ratio of two totals etc. with a view to draw inferences about a population from
information contained in a sample.
For example, we might be interested in the mean Taka value for the wage paid to the employees or
the total amount in taka. Hence, we consider estimation of the two population parameters here viz.
the mean and the total.
Notations and formulae
The notations and formulae used for the mean, total and variance in case of sample and population
are summarized in the following table.

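Item | Population | Sample
Size | $N$ | $n$
Values of some variable | $y_1, y_2, \ldots, y_N$ | $y_1, y_2, \ldots, y_n$
Total | $Y = \sum_{i=1}^{N} y_i = y_1 + y_2 + \ldots + y_N$ | $y = \sum_{i=1}^{n} y_i = y_1 + y_2 + \ldots + y_n$
Mean | $\bar{Y} = \frac{\sum_{i=1}^{N} y_i}{N} = \frac{y_1 + y_2 + \ldots + y_N}{N}$ | $\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} = \frac{y_1 + y_2 + \ldots + y_n}{n}$
Variance | $\sigma^2 = \frac{\sum_{i=1}^{N} (y_i - \bar{Y})^2}{N}$ | no sample notation used
Modified definition of variance | $S^2 = \frac{\sum_{i=1}^{N} (y_i - \bar{Y})^2}{N-1}$ | $s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}$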

Estimators of mean, total and variance

Population parameter | Estimator
Population mean $\bar{Y}$ | $\bar{y}$
Population total $Y$ | $\hat{Y} = N\bar{y}$
Population variance $\sigma^2$ | $s^2$ (in sampling with replacement)
Population variance $S^2$ | $s^2$ (in sampling without replacement)

We have to obtain the following

Unbiasedness of estimator | Variance of estimator | Estimator of variance of estimator
$E(\bar{y})$ | $V(\bar{y})$ | $v(\bar{y})$
$E(\hat{Y})$ | $V(\hat{Y})$ | $v(\hat{Y})$
$E(s^2)$ | $V(s^2)$ | $v(s^2)$

Estimation of parameters
We are interested in finding the estimators of three types of parameters: mean, total and variance.
The expected value of each estimator and its variance will also be obtained. These are
discussed in a sequence convenient for understanding.

Estimator of population mean


a) Sample mean $\bar{y}$ is used as an estimator of population mean $\bar{Y}$.

b) The sample mean $\bar{y}$ for a simple random sample of size $n$ is an unbiased estimator of population
mean $\bar{Y}$.
Symbolically,
$$E(\bar{y}) = \bar{Y}.$$

Proof:
By definition
$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$
and
$$\bar{Y} = \frac{\sum_{i=1}^{N} y_i}{N}$$

Taking expectation
$$E(\bar{y}) = E\left[\frac{1}{n}\sum_{i=1}^{n} y_i\right] = \frac{1}{n}\sum_{i=1}^{n} E(y_i) \qquad (1)$$

Now by definition,
$$E(y_i) = \sum_{i=1}^{N} y_i P(y_i)$$

We now need to evaluate $P(y_i)$, the probability that $y_i$, the $i$th unit of the population, is selected
at the $r$th draw. By the property of simple random sampling this probability is $\frac{1}{N}$.
Hence
$$E(y_i) = \sum_{i=1}^{N} y_i P(y_i) = \frac{1}{N}\sum_{i=1}^{N} y_i = \bar{Y}$$

Substituting this in eq (1)
$$E(\bar{y}) = \frac{1}{n}\sum_{i=1}^{n} E(y_i) = \frac{1}{n}\sum_{i=1}^{n} \bar{Y} = \frac{1}{n}\, n\bar{Y} = \bar{Y}.$$

This completes the proof.
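
The unbiasedness of the sample mean can be confirmed numerically by averaging the sample mean over every possible sample. The sketch below uses a hypothetical population of five values (the first five salaries from the earlier table) and exact fractions.

```python
from fractions import Fraction
from itertools import combinations

population = [30, 40, 25, 35, 40]   # hypothetical small population
N, n = len(population), 3
Y_bar = Fraction(sum(population), N)   # population mean

# All equally likely samples of size n drawn without replacement
samples = list(combinations(population, n))
means = [Fraction(sum(s), n) for s in samples]

# Expected value of the sample mean over all possible samples
expected = sum(means) / len(means)
print(Y_bar, expected)
```

Individual sample means differ from the population mean, but their average over all samples coincides with it.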

Estimator of population total

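a) $\hat{Y} = N\bar{y}$ is used as an estimator of population total $Y$.

Note: Sample total $y$ is not used as an estimator of population total $Y$.

b) $\hat{Y} = N\bar{y}$ is an unbiased estimator of population total $Y$.

Symbolically,
$$E(\hat{Y}) = Y.$$

Proof:
$$\begin{aligned}
E(\hat{Y}) &= E(N\bar{y}) \\
&= N E(\bar{y}) \\
&= N\bar{Y} \qquad [\because E(\bar{y}) = \bar{Y}] \\
&= Y.
\end{aligned}$$
(proved)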

Variance and standard error of estimators


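a) Variance and standard error of $\bar{y}$
b) Variance and standard error of $\hat{Y}$

a) Variance and standard error of $\bar{y}$

Variance of $\bar{y}$
$$V(\bar{y}) = V\left(\frac{y}{n}\right)$$
$$V(\bar{y}) = \frac{1}{n^2} V(y) \qquad (1)$$
where $y$ is the sample total. By definition
$$V(y) = E\big[y - E(y)\big]^2 \qquad (2)$$

Now,
$$\begin{aligned}
E(y) &= E\left(\sum_{i=1}^{n} y_i\right) \\
&= \sum_{i=1}^{n} E(y_i) \\
&= n \sum_{i=1}^{N} y_i P(y_i) \\
&= n \sum_{i=1}^{N} y_i \frac{1}{N} \qquad [\because P(y_i) = \tfrac{1}{N}, \text{ by the property of SRS}] \\
&= n\bar{Y}
\end{aligned}$$
$$\therefore E(y) = n\bar{Y} \qquad (3)$$
Substituting (3) in (2),
$$\begin{aligned}
V(y) &= E\big[y - n\bar{Y}\big]^2 \\
&= E\left[\sum_{i=1}^{n} y_i - \sum_{i=1}^{n} \bar{Y}\right]^2 \\
&= E\left[\sum_{i=1}^{n} (y_i - \bar{Y})\right]^2 \\
&= E\big[(y_1 - \bar{Y}) + (y_2 - \bar{Y}) + \ldots + (y_n - \bar{Y})\big]^2 \\
&= E\big[(y_1 - \bar{Y})^2 + (y_2 - \bar{Y})^2 + \ldots + (y_n - \bar{Y})^2\big] + E\left[\sum_{i \neq j}^{n} (y_i - \bar{Y})(y_j - \bar{Y})\right] \\
&= E\left[\sum_{i=1}^{n} (y_i - \bar{Y})^2\right] + E\left[\sum_{i \neq j}^{n} (y_i - \bar{Y})(y_j - \bar{Y})\right] \\
&= \sum_{i=1}^{n} E(y_i - \bar{Y})^2 + \sum_{i \neq j}^{n} E(y_i - \bar{Y})(y_j - \bar{Y}) \\
&= n\sigma^2 + \sum_{i \neq j}^{n} E(y_i - \bar{Y})(y_j - \bar{Y})
\end{aligned}$$
$$\therefore V(y) = n\sigma^2 + \sum_{i \neq j}^{n} E(y_i - \bar{Y})(y_j - \bar{Y}) \qquad (4)$$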

Case 1: sampling with replacement


For sampling with replacement, we have
$$E(y_i - \bar{Y})(y_j - \bar{Y}) = 0$$
[Since this term is the covariance between $y_i$ and $y_j$ and in sampling with replacement, $y_i$ and $y_j$
are independent, hence this term vanishes.]

Therefore, from (4) we have for sampling with replacement
$$V(y) = n\sigma^2 \qquad (5)$$

Substituting (5) in (1) we have for sampling with replacement
$$V(\bar{y}) = \frac{1}{n^2}\, n\sigma^2$$
$$\Rightarrow V(\bar{y}) = \frac{\sigma^2}{n} \qquad (6)$$
Therefore, for sampling with replacement
$$V(\bar{y}) = \frac{\sigma^2}{n}$$
$$S.E.(\bar{y}) = \frac{\sigma}{\sqrt{n}}$$

Case 2: sampling without replacement

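From the second part of (4),
$$\begin{aligned}
E(y_i - \bar{Y})(y_j - \bar{Y})
&= E(y_i y_j - y_i \bar{Y} - y_j \bar{Y} + \bar{Y}^2) \\
&= E(y_i y_j) - \bar{Y}E(y_i) - \bar{Y}E(y_j) + \bar{Y}^2 \\
&= E(y_i y_j) - \bar{Y}^2 - \bar{Y}^2 + \bar{Y}^2 \\
&= E(y_i y_j) - \bar{Y}^2 \\
&= \frac{\sum_{i \neq j}^{N} y_i y_j}{N(N-1)} - \left(\frac{\sum_{i=1}^{N} y_i}{N}\right)^2 \\
&= \frac{1}{N}\left[\frac{\left(\sum_{i=1}^{N} y_i\right)^2 - \sum_{i=1}^{N} y_i^2}{N-1} - \frac{\left(\sum_{i=1}^{N} y_i\right)^2}{N}\right] \\
&= \frac{1}{N}\left[\frac{\left(\sum_{i=1}^{N} y_i\right)^2}{N-1} - \frac{\sum_{i=1}^{N} y_i^2}{N-1} - \frac{\left(\sum_{i=1}^{N} y_i\right)^2}{N}\right] \\
&= \frac{1}{N}\left[\left(\sum_{i=1}^{N} y_i\right)^2 \left(\frac{1}{N-1} - \frac{1}{N}\right) - \frac{\sum_{i=1}^{N} y_i^2}{N-1}\right] \\
&= \frac{1}{N}\left[\left(\sum_{i=1}^{N} y_i\right)^2 \frac{N - N + 1}{N(N-1)} - \frac{\sum_{i=1}^{N} y_i^2}{N-1}\right] \\
&= \frac{1}{N}\left[\frac{\left(\sum_{i=1}^{N} y_i\right)^2}{N(N-1)} - \frac{\sum_{i=1}^{N} y_i^2}{N-1}\right] \\
&= \frac{-1}{N(N-1)}\left[\sum_{i=1}^{N} y_i^2 - \frac{\left(\sum_{i=1}^{N} y_i\right)^2}{N}\right] \\
&= \frac{-1}{N(N-1)}\sum_{i=1}^{N} (y_i - \bar{Y})^2 \\
&= \frac{-1}{N-1} \cdot \frac{\sum_{i=1}^{N} (y_i - \bar{Y})^2}{N} \\
&= -\frac{\sigma^2}{N-1}
\end{aligned}$$
Now the second part of (4) is
$$\begin{aligned}
\sum_{i \neq j}^{n} E(y_i - \bar{Y})(y_j - \bar{Y}) &= n(n-1)\,E(y_i - \bar{Y})(y_j - \bar{Y}) \\
&= -n(n-1) \cdot \frac{\sigma^2}{N-1}
\end{aligned}$$
$$\therefore \sum_{i \neq j}^{n} E(y_i - \bar{Y})(y_j - \bar{Y}) = -n(n-1) \cdot \frac{\sigma^2}{N-1} \qquad (7)$$

Substituting (7) in (4),
$$\begin{aligned}
V(y) &= n\sigma^2 - n(n-1) \cdot \frac{\sigma^2}{N-1} \\
&= n\sigma^2\left[1 - \frac{n-1}{N-1}\right] \\
&= \frac{N-n}{N-1} \cdot n\sigma^2 \qquad (8) \\
&= \frac{N-n}{N} \cdot nS^2 \qquad (9) \qquad \left[\because \sigma^2 = \frac{N-1}{N} S^2\right]
\end{aligned}$$

Substituting (8) in (1),
$$V(\bar{y}) = \frac{1}{n^2} \cdot \frac{N-n}{N-1} \cdot n\sigma^2$$
$$\Rightarrow V(\bar{y}) = \frac{\sigma^2}{n} \cdot \frac{N-n}{N-1} \qquad (10)$$

Substituting (9) in (1),
$$V(\bar{y}) = \frac{1}{n^2} \cdot \frac{N-n}{N} \cdot nS^2$$
$$\Rightarrow V(\bar{y}) = \frac{S^2}{n} \cdot \frac{N-n}{N} \qquad (11)$$

Thus, for sampling without replacement we have two forms of $V(\bar{y})$: in terms of $S^2$ and in terms of
$\sigma^2$.
$V(\bar{y})$ in terms of $S^2$:
$$V(\bar{y}) = \frac{S^2}{n} \cdot \frac{N-n}{N} = \frac{S^2}{n}\left(1 - \frac{n}{N}\right) = \frac{S^2}{n}(1 - f)$$
where $f = \frac{n}{N}$; $f$ is called the sampling fraction.
$V(\bar{y})$ in terms of $\sigma^2$:
$$V(\bar{y}) = \frac{\sigma^2}{n} \cdot \frac{N-n}{N-1}$$
If $N$ is large compared to $n$, we can write $\frac{N-n}{N-1} \approx \frac{N-n}{N} = 1 - f$.

Hence,
$$V(\bar{y}) \approx \frac{\sigma^2}{n}(1 - f)$$

Therefore, for sampling without replacement
$$V(\bar{y}) = \frac{S^2}{n}(1 - f)$$
$$S.E.(\bar{y}) = S\sqrt{\frac{1-f}{n}}$$

In terms of $\sigma^2$, for large $N$,
$$V(\bar{y}) \approx \frac{\sigma^2}{n}(1 - f)$$
$$S.E.(\bar{y}) \approx \sqrt{\frac{\sigma^2}{n}(1 - f)}$$

Note: Here we use $i \neq j$ as the sampling is without replacement. There are ${}^{N}P_{2} = N(N-1)$ terms in the sum $\sum_{i \neq j}^{N}$.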
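
Both variance formulas can be verified exactly on a small population by enumerating every possible sample. The following sketch uses a hypothetical five-unit population and n = 2; the helper name `variance_of_mean` is illustrative only.

```python
from fractions import Fraction
from itertools import combinations, product

population = [30, 40, 25, 35, 40]   # hypothetical small population
N, n = len(population), 2
Y_bar = Fraction(sum(population), N)
sum_sq_dev = sum((y - Y_bar) ** 2 for y in population)
sigma2 = sum_sq_dev / N        # variance with divisor N
S2 = sum_sq_dev / (N - 1)      # modified variance with divisor N - 1

def variance_of_mean(samples):
    """Exact variance of the sample mean over equally likely samples."""
    means = [Fraction(sum(s), n) for s in samples]
    centre = sum(means) / len(means)
    return sum((m - centre) ** 2 for m in means) / len(means)

v_wr = variance_of_mean(list(product(population, repeat=n)))   # with replacement
v_wor = variance_of_mean(list(combinations(population, n)))    # without replacement
f = Fraction(n, N)                                             # sampling fraction

print(v_wr, sigma2 / n)
print(v_wor, S2 / n * (1 - f), sigma2 / n * Fraction(N - n, N - 1))
```

For this population the with-replacement variance is 17 and the without-replacement variance is 51/4, and each agrees with its formula.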

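b) Variance and standard error of $\hat{Y}$

At first, we have to find $V(\bar{y})$ for both cases.

For sampling with replacement
$$\begin{aligned}
V(\hat{Y}) &= V(N\bar{y}) \\
&= N^2 V(\bar{y}) \\
&= \frac{N^2 \sigma^2}{n}
\end{aligned}$$
For sampling without replacement,
$$\begin{aligned}
V(\hat{Y}) &= V(N\bar{y}) \\
&= N^2 V(\bar{y}) \\
&= N^2 \cdot \frac{S^2}{n} \cdot \frac{N-n}{N} \\
&= N(N-n)\frac{S^2}{n}
\end{aligned}$$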

Relation between $V(\bar{y})$ for sampling with replacement and $V(\bar{y})$ for sampling without
replacement

We have
for sampling with replacement $V(\bar{y}) = \frac{\sigma^2}{n}$
for sampling without replacement $V(\bar{y}) = \frac{\sigma^2}{n} \cdot \frac{N-n}{N-1}$

Thus, $V(\bar{y})$ in sampling without replacement is $\frac{N-n}{N-1}$ times its value in sampling with replacement.
Provided that $N$ is large compared with $n$, $\frac{N-n}{N-1} \approx 1 - \frac{n}{N} = 1 - f$, where $f = \frac{n}{N}$,
and $\frac{N-n}{N-1}$ is less than 1 for any $n$ such that $1 < n < N$.
Therefore, $V(\bar{y})$ in sampling without replacement is less than the $V(\bar{y})$ in sampling with
replacement.
That is
$$\frac{\sigma^2}{n} \cdot \frac{N-n}{N-1} < \frac{\sigma^2}{n} \quad \text{for any } n \text{ such that } 1 < n < N.$$

Finite population correction


For a random sample of size $n$ from an infinite population, it is well known that the variance of the
mean is $\frac{\sigma^2}{n}$. The only change in this result when the population is finite is the introduction of the
factor $1 - f$. The factor $1 - f$ is a correction factor for the finite size of the population and is called the
finite population correction (fpc), while $f = \frac{n}{N}$ is called the sampling fraction. The sampling fraction
$f = \frac{n}{N}$ is small when either the sample is small or the population is large. In either case, the factor
$1 - f$ approaches 1 and can be ignored. In such cases, $V(\bar{y})$ does not depend on $N$ (i.e. the size
of the population has no direct effect on the standard error of the sample mean), and there is little or
no practical difference between the two methods. In practice the fpc can be ignored whenever the
sampling fraction does not exceed 5% and for many purposes even if it is as high as 10%.
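
The size of the correction is easy to tabulate. The sketch below fixes a hypothetical sample size of 100 and varies the population size; `fpc` is an illustrative helper, and its square root is the factor by which the standard error is multiplied.

```python
from math import sqrt

def fpc(n, N):
    """Finite population correction 1 - f, where f = n / N."""
    return 1 - n / N

n = 100   # hypothetical sample size
for N in (200, 1000, 2000, 20000, 200000):
    f = n / N
    # Columns: population size, sampling fraction, fpc, factor applied to the standard error
    print(N, round(f, 4), round(fpc(n, N), 4), round(sqrt(fpc(n, N)), 4))
```

At a sampling fraction of 5% (N = 2000) the standard error shrinks by a factor of about 0.975, which illustrates why the correction is commonly ignored at that level.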

[What do you mean by finite population correction and sampling fraction? What happens to $V(\bar{y})$ in
case of a small sampling fraction?]

Estimator of population variance


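a) $s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}$ is used as an estimator of population variance.
b) $s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}$ is an unbiased estimator of population variance.
Proof:
By definition
$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}$$
and
$$S^2 = \frac{\sum_{i=1}^{N} (y_i - \bar{Y})^2}{N-1}$$
Therefore,
$$\begin{aligned}
E(s^2) &= \frac{1}{n-1} E\left[\sum_{i=1}^{n} (y_i - \bar{y})^2\right] \\
&= \frac{1}{n-1} E\left[\sum_{i=1}^{n} \{(y_i - \bar{Y}) - (\bar{y} - \bar{Y})\}^2\right] \\
&= \frac{1}{n-1} E\left[\sum_{i=1}^{n} (y_i - \bar{Y})^2 - n(\bar{y} - \bar{Y})^2\right] \\
&= \frac{1}{n-1} \left[\sum_{i=1}^{n} E(y_i - \bar{Y})^2 - nE(\bar{y} - \bar{Y})^2\right] \\
&= \frac{1}{n-1} \left[n\sigma^2 - nV(\bar{y})\right] \qquad (1)
\end{aligned}$$

Case 1: For sampling with replacement
$$\begin{aligned}
E(s^2) &= \frac{1}{n-1} \left[n\sigma^2 - nV(\bar{y})\right] \qquad [\text{by } (1)] \\
&= \frac{1}{n-1} \left[n\sigma^2 - n \cdot \frac{\sigma^2}{n}\right] \qquad [\text{by } (6)] \\
&= \frac{1}{n-1} \cdot n\sigma^2 \left(1 - \frac{1}{n}\right) \\
&= \sigma^2
\end{aligned}$$
Thus, for sampling with replacement
$s^2$ is an unbiased estimator of $\sigma^2$.

Case 2: For sampling without replacement
$$\begin{aligned}
E(s^2) &= \frac{1}{n-1} \left[n\sigma^2 - nV(\bar{y})\right] \qquad [\text{by } (1)] \\
&= \frac{1}{n-1} \left[n\sigma^2 - n \cdot \frac{\sigma^2}{n} \cdot \frac{N-n}{N-1}\right] \qquad [\text{by } (10)] \\
&= \frac{N}{N-1}\sigma^2 \\
&= S^2
\end{aligned}$$
Thus, for sampling without replacement
$s^2$ is an unbiased estimator of $S^2$.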
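
Both cases can be confirmed by exact enumeration on a small population. In the sketch below the five-unit population is hypothetical, and the helper names `s2` and `expectation` are illustrative only.

```python
from fractions import Fraction
from itertools import combinations, product

population = [30, 40, 25, 35, 40]   # hypothetical small population
N, n = len(population), 3
Y_bar = Fraction(sum(population), N)
sum_sq_dev = sum((y - Y_bar) ** 2 for y in population)
sigma2, S2 = sum_sq_dev / N, sum_sq_dev / (N - 1)

def s2(sample):
    """Sample variance with divisor n - 1, computed exactly."""
    m = Fraction(sum(sample), len(sample))
    return sum((y - m) ** 2 for y in sample) / (len(sample) - 1)

def expectation(samples):
    """Average of s^2 over a list of equally likely samples."""
    values = [s2(s) for s in samples]
    return sum(values) / len(values)

e_wor = expectation(list(combinations(population, n)))     # without replacement
e_wr = expectation(list(product(population, repeat=n)))    # with replacement
print(e_wor, S2)
print(e_wr, sigma2)
```

The average of the sample variance equals the modified population variance when sampling without replacement, and the divisor-N variance when sampling with replacement.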

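Note:
$$\begin{aligned}
\sum_{i=1}^{n} \{(y_i - \bar{Y}) - (\bar{y} - \bar{Y})\}^2
&= \sum_{i=1}^{n} (y_i - \bar{Y})^2 + \sum_{i=1}^{n} (\bar{y} - \bar{Y})^2 - 2\sum_{i=1}^{n} (y_i - \bar{Y})(\bar{y} - \bar{Y}) \\
&= \sum_{i=1}^{n} (y_i - \bar{Y})^2 + n(\bar{y} - \bar{Y})^2 - 2(\bar{y} - \bar{Y})\sum_{i=1}^{n} (y_i - \bar{Y}) \\
&= \sum_{i=1}^{n} (y_i - \bar{Y})^2 + n(\bar{y} - \bar{Y})^2 - 2(\bar{y} - \bar{Y})\left[\sum_{i=1}^{n} y_i - \sum_{i=1}^{n} \bar{Y}\right] \\
&= \sum_{i=1}^{n} (y_i - \bar{Y})^2 + n(\bar{y} - \bar{Y})^2 - 2(\bar{y} - \bar{Y})(n\bar{y} - n\bar{Y}) \\
&= \sum_{i=1}^{n} (y_i - \bar{Y})^2 + n(\bar{y} - \bar{Y})^2 - 2(\bar{y} - \bar{Y})[n(\bar{y} - \bar{Y})] \\
&= \sum_{i=1}^{n} (y_i - \bar{Y})^2 + n(\bar{y} - \bar{Y})^2 - 2n(\bar{y} - \bar{Y})^2 \\
&= \sum_{i=1}^{n} (y_i - \bar{Y})^2 - n(\bar{y} - \bar{Y})^2
\end{aligned}$$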

Use of $s^2$
$s^2$ is used as an estimator of population variance.
$s^2$ is used to estimate $V(\bar{y})$ and $V(\hat{Y})$.
$s^2$ is used to estimate $S.E.(\bar{y})$ and $S.E.(\hat{Y})$.
$s^2$ is used to find the $100(1 - \alpha)\%$ confidence interval for $\bar{Y}$ and $Y$.

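Estimators of $V(\bar{y})$ and $V(\hat{Y})$

We know
$$V(\bar{y}) = \frac{\sigma^2}{n} \quad \text{for sampling with replacement}$$
$$V(\bar{y}) = \frac{S^2}{n}(1 - f) \quad \text{for sampling without replacement}$$

In both the cases of sampling with and without replacement, $V(\bar{y})$ involves unknown population
parameters $\sigma^2$ and $S^2$ respectively. Therefore $V(\bar{y})$ cannot be obtained from a sample, rather it
should be estimated. Hence to get the estimator of $V(\bar{y})$ in both cases, we have to estimate $\sigma^2$ and
$S^2$.
Since $s^2$ is used as an estimator of $\sigma^2$ and $S^2$ in cases of sampling with and without replacement
respectively, we obtain the estimator of $V(\bar{y})$ in both cases by replacing $\sigma^2$ and $S^2$ respectively by
$s^2$. Also, we obtain the estimator of $V(\hat{Y})$ in both cases accordingly.

Let us use the following notations.

For $V(\bar{y})$ and $V(\hat{Y})$:
$$V(\bar{y}) = \sigma^2_{\bar{y}} \quad \text{and} \quad V(\hat{Y}) = \sigma^2_{\hat{Y}}$$

For estimators of $V(\bar{y})$ and $V(\hat{Y})$:
$$v(\bar{y}) = \hat{\sigma}^2_{\bar{y}} = s^2_{\bar{y}} \quad \text{and} \quad v(\hat{Y}) = \hat{\sigma}^2_{\hat{Y}} = s^2_{\hat{Y}}$$

Using the formulae of $V(\bar{y})$ and $V(\hat{Y})$, we get unbiased estimators of $V(\bar{y})$ and $V(\hat{Y})$ as follows.
$$v(\bar{y}) = \hat{\sigma}^2_{\bar{y}} = s^2_{\bar{y}} = \frac{s^2}{n} \quad \text{for sampling with replacement}$$
$$v(\bar{y}) = \hat{\sigma}^2_{\bar{y}} = s^2_{\bar{y}} = \frac{s^2}{n}(1 - f) \quad \text{for sampling without replacement}$$
$$v(\hat{Y}) = \hat{\sigma}^2_{\hat{Y}} = s^2_{\hat{Y}} = \frac{N^2 s^2}{n} \quad \text{for sampling with replacement}$$
$$v(\hat{Y}) = \hat{\sigma}^2_{\hat{Y}} = s^2_{\hat{Y}} = \frac{N^2 s^2}{n}(1 - f) \quad \text{for sampling without replacement}$$

Estimators of standard errors of $\bar{y}$ and $\hat{Y}$
$$\hat{\sigma}_{\bar{y}} = s_{\bar{y}} = \frac{s}{\sqrt{n}} \quad \text{for sampling with replacement}$$
$$\hat{\sigma}_{\bar{y}} = s_{\bar{y}} = s\sqrt{\frac{1-f}{n}} \quad \text{for sampling without replacement}$$
$$\hat{\sigma}_{\hat{Y}} = s_{\hat{Y}} = \frac{Ns}{\sqrt{n}} \quad \text{for sampling with replacement}$$
$$\hat{\sigma}_{\hat{Y}} = s_{\hat{Y}} = Ns\sqrt{\frac{1-f}{n}} \quad \text{for sampling without replacement}$$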
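
The estimators above translate directly into a small helper. This is a sketch under assumptions: the function name `estimate_mean_and_total`, its `with_replacement` flag and the five sample values are all hypothetical, and `statistics.variance` is used because it applies the divisor n - 1.

```python
from math import sqrt
from statistics import mean, variance

def estimate_mean_and_total(sample, N, with_replacement=False):
    """Point estimates and estimated standard errors for the mean and the total."""
    n = len(sample)
    y_bar = mean(sample)
    s2 = variance(sample)                       # sample variance, divisor n - 1
    f = 0.0 if with_replacement else n / N      # sampling fraction (no fpc with replacement)
    se_mean = sqrt((1 - f) * s2 / n)
    return {
        "mean": y_bar,
        "se_mean": se_mean,
        "total": N * y_bar,
        "se_total": N * se_mean,
    }

sample = [12, 15, 11, 18, 14]   # hypothetical sample values
result = estimate_mean_and_total(sample, N=50)
print(result)
```

Passing `with_replacement=True` drops the finite population correction and therefore gives slightly larger standard errors for the same data.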

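Table: Estimation of parameters in SRS at a glance

Population parameter | Population mean $\bar{Y}$ | Population mean $\bar{Y}$ | Population total $Y$ | Population total $Y$ | Population variance $S^2$ | Population variance $\sigma^2$
Type of sampling | without replacement | with replacement | without replacement | with replacement | without replacement | with replacement
Estimator | $\bar{y}$ | $\bar{y}$ | $\hat{Y} = N\bar{y}$ | $\hat{Y} = N\bar{y}$ | $s^2$ | $s^2$
Expected value of estimator | $E(\bar{y}) = \bar{Y}$ | $E(\bar{y}) = \bar{Y}$ | $E(\hat{Y}) = Y$ | $E(\hat{Y}) = Y$ | $E(s^2) = S^2$ | $E(s^2) = \sigma^2$
Variance of estimator | $V(\bar{y}) = \sigma^2_{\bar{y}} = \frac{S^2}{n}(1 - f)$ | $V(\bar{y}) = \sigma^2_{\bar{y}} = \frac{\sigma^2}{n}$ | $V(\hat{Y}) = \sigma^2_{\hat{Y}} = \frac{N^2 S^2}{n}(1 - f)$ | $V(\hat{Y}) = \sigma^2_{\hat{Y}} = \frac{N^2 \sigma^2}{n}$ | - | -
Standard error of estimator | $S.E.(\bar{y}) = \sigma_{\bar{y}} = S\sqrt{\frac{1-f}{n}}$ | $S.E.(\bar{y}) = \sigma_{\bar{y}} = \frac{\sigma}{\sqrt{n}}$ | $S.E.(\hat{Y}) = \sigma_{\hat{Y}} = NS\sqrt{\frac{1-f}{n}}$ | $S.E.(\hat{Y}) = \sigma_{\hat{Y}} = \frac{N\sigma}{\sqrt{n}}$ | - | -
Estimator of variance of estimator | $v(\bar{y}) = \hat{\sigma}^2_{\bar{y}} = s^2_{\bar{y}} = \frac{s^2}{n}(1 - f)$ | $v(\bar{y}) = \hat{\sigma}^2_{\bar{y}} = s^2_{\bar{y}} = \frac{s^2}{n}$ | $v(\hat{Y}) = \hat{\sigma}^2_{\hat{Y}} = s^2_{\hat{Y}} = \frac{N^2 s^2}{n}(1 - f)$ | $v(\hat{Y}) = \hat{\sigma}^2_{\hat{Y}} = s^2_{\hat{Y}} = \frac{N^2 s^2}{n}$ | - | -
Estimator of standard error of estimator | $\hat{\sigma}_{\bar{y}} = s_{\bar{y}} = s\sqrt{\frac{1-f}{n}}$ | $\hat{\sigma}_{\bar{y}} = s_{\bar{y}} = \frac{s}{\sqrt{n}}$ | $\hat{\sigma}_{\hat{Y}} = s_{\hat{Y}} = Ns\sqrt{\frac{1-f}{n}}$ | $\hat{\sigma}_{\hat{Y}} = s_{\hat{Y}} = \frac{Ns}{\sqrt{n}}$ | - | -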

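Theorem: The covariance between $\bar{x}$ and $\bar{y}$ in a simple random sample of size $n$ units from a
population of $N$ units without replacement is given by
$$\text{(a)} \quad \sigma_{\bar{x}\bar{y}} = Cov(\bar{x}, \bar{y}) = \frac{S_{xy}}{n} \cdot \frac{N-n}{N} = \frac{\sigma_{xy}}{n} \cdot \frac{N-n}{N-1}$$

and the correlation coefficient between $\bar{x}$ and $\bar{y}$ is given by
$$\text{(b)} \quad \rho_{\bar{x}\bar{y}} = \frac{S_{xy}}{S_x S_y} = \rho_{xy}$$

where
$$\sigma_{\bar{x}\bar{y}} = Cov(\bar{x}, \bar{y}) = E(\bar{x} - \bar{X})(\bar{y} - \bar{Y})$$
$$\sigma_{xy} = Cov(x, y) = E(x_i - \bar{X})(y_i - \bar{Y}) = \frac{\sum_{i=1}^{N} (x_i - \bar{X})(y_i - \bar{Y})}{N}$$
$$S_{xy} = \frac{\sum_{i=1}^{N} (x_i - \bar{X})(y_i - \bar{Y})}{N-1} = \frac{N}{N-1}\sigma_{xy}$$
$$S_x^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{X})^2}{N-1}$$
$$S_y^2 = \frac{\sum_{i=1}^{N} (y_i - \bar{Y})^2}{N-1}$$
$$\rho_{\bar{x}\bar{y}} = \frac{Cov(\bar{x}, \bar{y})}{\sqrt{V(\bar{x})V(\bar{y})}}$$
$$\rho_{xy} = \frac{Cov(x, y)}{\sqrt{V(x)V(y)}} = \frac{S_{xy}}{S_x S_y}$$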

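Proof (a):
Let $u_i = x_i + y_i$, so that $\bar{u} = \bar{x} + \bar{y}$ ………. (1)

The corresponding population mean of $u_i$ is $\bar{U} = \bar{X} + \bar{Y}$ ………. (2)

Using the formula of $V(\bar{y})$ we have
$$V(\bar{u}) = E(\bar{u} - \bar{U})^2 = \frac{S_u^2}{n} \cdot \frac{N-n}{N}$$
$$\Rightarrow E(\bar{u} - \bar{U})^2 = \frac{N-n}{Nn} \cdot \frac{\sum_{i=1}^{N} (u_i - \bar{U})^2}{N-1} \qquad (3)$$

Using (1) and (2)
$$\begin{aligned}
E(\bar{u} - \bar{U})^2 &= E[(\bar{x} + \bar{y}) - (\bar{X} + \bar{Y})]^2 \\
&= E[(\bar{x} - \bar{X}) + (\bar{y} - \bar{Y})]^2 \\
&= E(\bar{x} - \bar{X})^2 + E(\bar{y} - \bar{Y})^2 + 2E(\bar{x} - \bar{X})(\bar{y} - \bar{Y}) \\
&= V(\bar{x}) + V(\bar{y}) + 2Cov(\bar{x}, \bar{y}) \\
&= \frac{S_x^2}{n} \cdot \frac{N-n}{N} + \frac{S_y^2}{n} \cdot \frac{N-n}{N} + 2Cov(\bar{x}, \bar{y}) \\
&= \frac{N-n}{Nn} S_x^2 + \frac{N-n}{Nn} S_y^2 + 2Cov(\bar{x}, \bar{y})
\end{aligned}$$
$$\Rightarrow E(\bar{u} - \bar{U})^2 = \frac{N-n}{Nn}\left[S_x^2 + S_y^2\right] + 2Cov(\bar{x}, \bar{y}) \qquad (4)$$
Now the second factor on the RHS of (3) can be expressed as
$$\begin{aligned}
\frac{\sum_{i=1}^{N} (u_i - \bar{U})^2}{N-1} &= \frac{\sum_{i=1}^{N} [(x_i + y_i) - (\bar{X} + \bar{Y})]^2}{N-1} \\
&= \frac{\sum_{i=1}^{N} [(x_i - \bar{X}) + (y_i - \bar{Y})]^2}{N-1} \\
&= \frac{\sum_{i=1}^{N} (x_i - \bar{X})^2}{N-1} + \frac{\sum_{i=1}^{N} (y_i - \bar{Y})^2}{N-1} + \frac{2\sum_{i=1}^{N} (x_i - \bar{X})(y_i - \bar{Y})}{N-1}
\end{aligned}$$
$$\Rightarrow \frac{\sum_{i=1}^{N} (u_i - \bar{U})^2}{N-1} = S_x^2 + S_y^2 + 2S_{xy} \qquad (5)$$

Using (3) and (5)
$$E(\bar{u} - \bar{U})^2 = \frac{N-n}{Nn}\left[S_x^2 + S_y^2 + 2S_{xy}\right] \qquad (6)$$
Equating (4) and (6)
$$\frac{N-n}{Nn}\left[S_x^2 + S_y^2\right] + 2Cov(\bar{x}, \bar{y}) = \frac{N-n}{Nn}\left[S_x^2 + S_y^2 + 2S_{xy}\right]$$
$$\Rightarrow \frac{N-n}{Nn}\left[S_x^2 + S_y^2\right] + 2Cov(\bar{x}, \bar{y}) = \frac{N-n}{Nn}\left[S_x^2 + S_y^2\right] + 2\,\frac{N-n}{Nn} S_{xy}$$
$$\Rightarrow Cov(\bar{x}, \bar{y}) = \frac{N-n}{Nn} S_{xy}$$
$$\Rightarrow Cov(\bar{x}, \bar{y}) = \frac{S_{xy}}{n} \cdot \frac{N-n}{N} = \frac{\sigma_{xy}}{n} \cdot \frac{N-n}{N-1}$$
(proved)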
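
Part (a) can be checked exactly by enumerating every sample of a small bivariate population. In the sketch below the paired values `x` and `y` are hypothetical and the sample size is n = 2.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical paired population values (x_i, y_i), i = 1..5
x = [1, 2, 3, 4, 5]
y = [30, 40, 25, 35, 40]
N, n = len(x), 2
X_bar, Y_bar = Fraction(sum(x), N), Fraction(sum(y), N)

# Population covariance with divisor N - 1
S_xy = sum((x[i] - X_bar) * (y[i] - Y_bar) for i in range(N)) / (N - 1)

# Sample means of x and y for every sample drawn without replacement
samples = list(combinations(range(N), n))
x_means = [Fraction(sum(x[i] for i in s), n) for s in samples]
y_means = [Fraction(sum(y[i] for i in s), n) for s in samples]

# Exact covariance between the two sample means over all samples
cov_means = sum((a - X_bar) * (b - Y_bar)
                for a, b in zip(x_means, y_means)) / len(samples)

print(cov_means, S_xy / n * Fraction(N - n, N))
```

The enumerated covariance of the sample means coincides with the value given by the theorem.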

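Proof (b):
By definition
$$\rho_{\bar{x}\bar{y}} = \frac{Cov(\bar{x}, \bar{y})}{\sqrt{V(\bar{x})V(\bar{y})}} = \frac{\frac{N-n}{Nn} S_{xy}}{\sqrt{\frac{N-n}{Nn} S_x^2 \cdot \frac{N-n}{Nn} S_y^2}} = \frac{S_{xy}}{S_x S_y} = \rho_{xy}$$
(proved)

Corollary:
Estimators of $Cov(\bar{x}, \bar{y})$ and $\rho_{\bar{x}\bar{y}}$ are
$$\hat{\sigma}_{\bar{x}\bar{y}} = \frac{N-n}{Nn} s_{xy} \quad \text{and} \quad r_{\bar{x}\bar{y}} = \frac{s_{xy}}{s_x s_y} = r_{xy} \quad \text{respectively,}$$

where
$$s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n-1}$$
$$s_x^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$$
$$s_y^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}$$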

Exercise 1: The following table presents population data on the salaries of 30 employees. Select a
simple random sample of size 10 using the random number table and estimate the average salary.
Also estimate the standard error of the sample mean.

ID Salary (‘000 Taka) ID Salary (‘000 Taka)


1 30 16 26
2 40 17 15
3 25 18 17
4 35 19 18
5 40 20 20
6 25 21 40
7 40 22 18
8 25 23 25
9 25 24 43
10 15 25 50
11 20 26 15
12 22 27 35
13 30 28 65
14 22 29 24
15 40 30 55

The random numbers are as follows:


277 130 802 108 541 603 497 786 666 440 413 258 161
414 945 416 502 413 258 061 608 809 195 609 923 779
493 063 609 923 779 381 396 840 474 433 953 407 582
642 668 724 210 953 407 582 895 154 121 108 541 603

Solution:
Hints:
We have the following random numbers:
277 130 802 108 541 603 497 786 666 440 413 258 161
414 945 416 502 413 258 061 608 809 195 609 923 779
493 063 609 923 779 381 396 840 474 433 953 407 582
642 668 724 210 953 407 582 895 154 121 108 541 603
Using the remainder method, we start from the 1st row and 1st column and proceed along the row. The
selected IDs with their salaries are presented in the table.

Sl       Selected ID     Salary ('000 Taka), $x_i$
1
2
3
4
5
6
7
8
9
10
Total    -               $\sum_{i=1}^{n} x_i =$


The calculation of the average is given below.

$$
\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \text{Tk. } .....
$$
Comment: The estimated average salary of 30 employees is Taka…..
Estimation of standard error of sample mean
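We have, writing $y_i$ for the salary of the $i$-th sampled employee,

$$
s^2 = \frac{\sum_{i=1}^{n}(y_i-\bar{y})^2}{n-1}
= \frac{\sum_{i=1}^{n} y_i^2 - \dfrac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}}{n-1} = ..........
$$

Now,

$$
\hat{\sigma}_{\bar{y}} = s_{\bar{y}} = \sqrt{\frac{1-f}{n}}\; s = \ldots
$$

where $f = n/N$ is the sampling fraction.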
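
The steps of Exercise 1 can be sketched in code. The conventions used for the remainder method here are assumptions, since the exercise does not spell them out: a random number is divided by N = 30 and the remainder is the selected ID, a remainder of 0 is read as ID 30, numbers equal to 000 or above 990 are rejected so that every ID stays equally likely, and repeated IDs are skipped because sampling is without replacement. The function name `remainder_sample` is hypothetical.

```python
from math import sqrt

# Salaries ('000 Taka) for IDs 1..30, copied from the exercise table.
salary = [30, 40, 25, 35, 40, 25, 40, 25, 25, 15,
          20, 22, 30, 22, 40, 26, 15, 17, 18, 20,
          40, 18, 25, 43, 50, 15, 35, 65, 24, 55]

# First row of the random number table, read along the row.
random_numbers = [277, 130, 802, 108, 541, 603, 497, 786, 666, 440, 413, 258, 161]

def remainder_sample(numbers, N, n):
    """Select n distinct IDs from 1..N by the remainder method."""
    limit = (999 // N) * N        # assumed rejection limit (990 when N = 30)
    ids = []
    for r in numbers:
        if r == 0 or r > limit:   # assumed: reject to keep IDs equally likely
            continue
        unit = r % N or N         # assumed: remainder 0 is read as ID N
        if unit not in ids:       # without replacement: skip repeated IDs
            ids.append(unit)
        if len(ids) == n:
            break
    return ids

N, n = len(salary), 10
ids = remainder_sample(random_numbers, N, n)
sample = [salary[i - 1] for i in ids]

# Sample mean, sample variance (divisor n - 1) and estimated standard error.
y_bar = sum(sample) / n
s2 = sum((y - y_bar) ** 2 for y in sample) / (n - 1)
f = n / N
se = sqrt((1 - f) / n * s2)

print("Selected IDs:", ids)
print("Sample mean:", y_bar, "Estimated SE:", round(se, 3))
```

A different convention for remainders or rejected numbers would generally select a different sample, so the printed values depend on the assumptions stated above.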

Exercise 2: The following table presents population data on the gender of 30 employees.
Select a simple random sample of size 10 using the random number table and estimate the
proportions of male and female employees. Also estimate the standard errors of the sample proportions
(for both male and female).

ID Gender ID Gender
1 Male 16 Male
2 Female 17 Male
3 Male 18 Female
4 Male 19 Male
5 Female 20 Male
6 Male 21 Female
7 Female 22 Male
8 Male 23 Female
9 Female 24 Male
10 Female 25 Male
11 Male 26 Male
12 Male 27 Female
13 Male 28 Male
14 Female 29 Female
15 Female 30 Female

Solution:
Hints:
We have the following random numbers:
277 130 802 108 541 603 497 786 666 440 413 258 161
414 945 416 502 413 258 061 608 809 195 609 923 779
493 063 609 923 779 381 396 840 474 433 953 407 582
642 668 724 210 953 407 582 895 154 121 108 541 603
Using the remainder method, we start from the 1st row and 1st column and proceed along the column. The
selected IDs with their gender are presented in the table.

Sl Selected ID Gender
1
2
3
4
5
6
7
8
9
10

The following table is prepared to present the proportions of male and female employees.
Table: Distribution of employees by gender

Gender of employee Tally Number Percentage


Male
Female
Total

Comment: The estimated proportions of male and female employees are …..% and……%
respectively.
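Standard error of proportion (male)

$$
v(p) = \hat{\sigma}_p^2 = s_p^2 = \frac{N-n}{N}\cdot\frac{pq}{n-1} = …………………
$$

where $p$ is the sample proportion of male employees and $q = 1-p$.

So, its estimated standard error is $\sqrt{v(p)} =$

Standard error of proportion (female)

$$
v(p) = \hat{\sigma}_p^2 = s_p^2 = \frac{N-n}{N}\cdot\frac{pq}{n-1} = …………………
$$

where $p$ is now the sample proportion of female employees and $q = 1-p$.

So, its estimated standard error is $\sqrt{v(p)} =$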
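
The steps of Exercise 2 can be sketched in the same way. As before, the remainder-method conventions are assumptions (remainder 0 read as ID 30, numbers equal to 000 or above 990 rejected, repeated IDs skipped), and the helper name `proportion_se` is hypothetical. The random number table is read down the columns, as stated in the hints.

```python
from math import sqrt

# Gender for IDs 1..30, copied from the exercise table.
gender = ["Male", "Female", "Male", "Male", "Female",
          "Male", "Female", "Male", "Female", "Female",
          "Male", "Male", "Male", "Female", "Female",
          "Male", "Male", "Female", "Male", "Male",
          "Female", "Male", "Female", "Male", "Male",
          "Male", "Female", "Male", "Female", "Female"]

# Random number table (leading zeros dropped, e.g. 061 -> 61).
table = [
    [277, 130, 802, 108, 541, 603, 497, 786, 666, 440, 413, 258, 161],
    [414, 945, 416, 502, 413, 258, 61, 608, 809, 195, 609, 923, 779],
    [493, 63, 609, 923, 779, 381, 396, 840, 474, 433, 953, 407, 582],
    [642, 668, 724, 210, 953, 407, 582, 895, 154, 121, 108, 541, 603],
]

N, n = len(gender), 10

# Read the table column by column, top to bottom.
column_order = [row[c] for c in range(len(table[0])) for row in table]

ids = []
for r in column_order:
    if r == 0 or r > (999 // N) * N:   # assumed rejection rule
        continue
    unit = r % N or N                  # assumed: remainder 0 is read as ID N
    if unit not in ids:                # without replacement
        ids.append(unit)
    if len(ids) == n:
        break

def proportion_se(p, N, n):
    """Estimated standard error of a sample proportion under SRSWOR."""
    return sqrt((N - n) / N * p * (1 - p) / (n - 1))

p_male = sum(gender[i - 1] == "Male" for i in ids) / n
p_female = 1 - p_male
se_male = proportion_se(p_male, N, n)
se_female = proportion_se(p_female, N, n)

print("Selected IDs:", ids)
print("p_male:", p_male, "p_female:", round(p_female, 2))
print("SE (male):", round(se_male, 4), "SE (female):", round(se_female, 4))
```

Because the formula depends on p only through the product pq, the two estimated standard errors are always equal when there are just two categories.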

Exercise 3:
The following table presents sample data collected in a household survey. The survey was
conducted in a village of 500 households.
i) Estimate the average household size and its standard error.
ii) Estimate the total number of household members in the village and its standard
error.

ID   HH size   freq      ID   HH size   freq      ID   HH size   freq
 1      3        5       11      5        6       21      6        5
 2      5        4       12      3        6       22      5        4
 3      6        3       13      6        3       23      7        3
 4      7        2       14      4        5       24      5        6
 5      3        5       15      4        4       25      6        5
 6      5        3       16      5        5       26      4        4
 7      4        4       17      6        6       27      4        6
 8      6        5       18      7        3       28      6        5
 9      5        4       19      4        3       29      3        6
10      4        3       20      6        4       30      6        3
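
One possible way to organise the calculations for Exercise 3 is sketched below. It rests on an assumption that should be checked against the intended reading of the table: the 30 IDs are treated as a simple random sample (without replacement) of n = 30 households from the N = 500 in the village, each listed with its household size, and the `freq` column is left unused because the exercise does not say what it represents.

```python
from math import sqrt

# Household sizes for sample IDs 1..30, copied from the table.
# ASSUMPTION: each ID is one sampled household; the `freq` column is ignored
# because its meaning is not stated in the exercise.
hh_size = [3, 5, 6, 7, 3, 5, 4, 6, 5, 4,
           5, 3, 6, 4, 4, 5, 6, 7, 4, 6,
           6, 5, 7, 5, 6, 4, 4, 6, 3, 6]

N = 500            # households in the village
n = len(hh_size)   # sampled households

# i) Average household size and its estimated standard error.
y_bar = sum(hh_size) / n
s2 = sum((y - y_bar) ** 2 for y in hh_size) / (n - 1)
se_mean = sqrt((1 - n / N) / n * s2)

# ii) Estimated total number of household members and its standard error.
total_hat = N * y_bar
se_total = N * se_mean

print("Mean:", y_bar, "SE:", round(se_mean, 3))
print("Total:", total_hat, "SE:", round(se_total, 1))
```

If `freq` is meant as a frequency weight for each row, the sample size and the sums would have to be weighted accordingly, and the figures printed here would not apply.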
