
Sampling Basics

This document provides an introduction to sampling methodologies, defining key concepts such as population, sample, and sampling methods. It discusses the importance of selecting a representative sample and outlines various sampling techniques, including probability and non-probability methods. The document also highlights core issues in sampling, such as sample size determination and the significance of a sampling frame.

Lesson 1 | Sampling Basics

This introductory article depicts the basics of sampling methodologies. There are 10 topics to start.
Last updated: 22OCT2020
Keywords: Sampling
Reading time: 60 minutes
Authors: Imran Khan, Taposhy Tabassum Shantona
Difficulty Level: Easy
Reviewer: Shamimul Islam

Sampling: Sampling is the process of selecting individual observations from the accessible part of
a statistical population in order

o to derive parameter estimates from the selected portion, and
o to make inferences about characteristics of the entire population, such as the mean, variance,
or proportion of some variable/attribute.

Sampling is not merely picking a portion of the population. Instead, it is the science and art of
controlling and measuring the reliability of useful statistical information through the theory of probability.

Consider the three examples below.

1. What percentage of university students think that on-campus politics should be banned in
Bangladesh? To answer this question, we have to talk to actual students to learn their
perceptions of student politics. But reaching every university student in Bangladesh is not
feasible in terms of time, cost, and manpower. So, we interview a selection of students to
learn their views and then try to make inferences about all university students’ perceptions
of student politics.

This is an example of sampling. The students we interview form the “sample”, while all university
students are our “population”. Every individual student of our sample is a “sampling unit”.
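
The idea of estimating a population share from a sample can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 10,000 students and the 35% support rate are invented figures, not data from any survey.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 students, 1 = supports a ban, 0 = does not.
# The 35% support rate is an invented figure for illustration only.
N = 10_000
population = [1] * 3_500 + [0] * 6_500

# Draw 200 sampling units at random, without replacement.
n = 200
sample = random.sample(population, n)

true_share = sum(population) / N       # normally unknown to the researcher
estimated_share = sum(sample) / n      # what the sample tells us

print(f"Population share: {true_share:.3f}")
print(f"Sample estimate:  {estimated_share:.3f}")
```

In practice the population share is unknown; the sample estimate is used to make an inference about it.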

2. When we check the taste of a soup or curry being cooked, we taste a little sample to check whether
salt, spices, and other ingredients are in the right proportion.
3. To diagnose a patient's disease, we take a blood sample from the patient.

Sampling is something we do almost every day.

1. Population
Population is the collection of all items in which a researcher is interested. In other words, a statistical
population is defined as the totality of all individuals that have some characteristics of interest.

There are 2 types of population.

Target population – The complete collection of all observations we want to study. It is important to
define the target population of a study, although it is often a difficult task to do.
Sample population – The collection of all possible observation units that might be chosen in a sample.
This is the population from which the sample is taken. It is also known as survey population.

Think about the student politics example presented before. There are many types of students in a
university: full-time students, part-time students, and distance learners (remote students). Say that,
for the interviews, we decide to exclude the remote students as they are not usually physically present
on campus (some foreign universities have such distance learners, though this is not very common in
Bangladesh). So, while all students are part of our target population, our sample population will not
include the remote students. In other words, since we will not interview any remote students, they are
not part of our sample population even though they are part of our target population.

Target Population vs. Sample Population

Definition
   Target population: Complete collection of observations.
   Sample population: More restricted set of the population from which the sample is chosen.
Extra notes
   Target population: Although difficult, it is important to define the target population for a research.
   Sample population: The part of the population affected by ‘non-response’ and ‘non-coverage’ is not part of the sample population. Sample population is also known as survey population.

2. Methods of Sampling

Sampling methods: There are 2 types of sampling methods; viz., i) Probability sampling; ii) Non-
probability sampling.

Probability sampling methods vs. Non-probability sampling methods

Definition
   Probability sampling: When each element in the population has a known, non-zero probability of being included in the sample, the method is called probability sampling.
   Non-probability sampling: When some elements in the population have no chance of being selected in the sample, or the selection probabilities are unknown, the method is called non-probability sampling.
Merit / limitation
   Probability sampling: Provides a valid estimate of sampling error, and results in efficient parameter estimation and more precise inference-making.
   Non-probability sampling: The reliability of sampling results cannot be determined in terms of probability.
Important sampling methods of the category
   Probability sampling: 1. Simple random sampling, 2. Stratified sampling, 3. Systematic sampling, 4. Cluster sampling, 5. Multistage sampling, 6. Multiphase sampling.
   Non-probability sampling: 1. Judgement sampling, 2. Quota sampling, 3. Convenience sampling, 4. Snowball sampling.
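
As a rough illustration of three of the probability methods named above, the sketch below draws a simple random sample, a systematic sample, and a proportionally allocated stratified sample. The frame of 100 unit IDs, the two strata "A" and "B", and the sample size of 10 are all assumptions made up for this example.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical frame: 100 unit IDs; the first 40 form stratum "A", the rest "B".
frame = list(range(1, 101))
strata = {"A": frame[:40], "B": frame[40:]}
n = 10

# 1. Simple random sampling: every subset of size n is equally likely.
srs = random.sample(frame, n)

# 2. Systematic sampling: random start, then every k-th unit in the frame.
k = len(frame) // n
start = random.randrange(k)
systematic = frame[start::k]

# 3. Stratified sampling with proportional allocation:
#    each stratum contributes in proportion to its size in the frame.
stratified = []
for name, units in strata.items():
    n_h = round(n * len(units) / len(frame))
    stratified.extend(random.sample(units, n_h))

print("Simple random:", sorted(srs))
print("Systematic:   ", systematic)
print("Stratified:   ", sorted(stratified))
```

In all three cases every unit in the frame has a known, non-zero chance of selection, which is what makes them probability methods.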
5 core issues regarding sampling
Every research study faces 5 issues regarding sampling. These are:

1. Determining the sample size.
2. Determining the sampling method, i.e., the sample selection process.
3. Determining what characteristics should be estimated and how.
4. Determining the reliability of the estimates.
5. Determining how generalizations should be made from the calculated estimates.

3. Technical terms: Representative Sample, Parameter vs. Statistic

Population size: The number of observation units that constitutes the population is known as
population size.

Sample size: The number of observation units that constitute the sample is known as sample size.

Sampling Unit: Every element of a sample is called a sampling unit. In other words, a sampling unit is a
well-defined element or group of elements on which we make observations. For example, if we select 10
students to represent a batch of 100 students, then each of those 10 students is a sampling unit.

Sample: A sample is a collection of sampling units that one desires to study. A sample is expected to be
representative of the total population so that the inferences on population characteristics turn out to be
precise.

Any subset of a population is a sample; but if the sample is not representative, it is not a statistical one.

Random sample: If the sampling units in a sample are selected purely by a chance mechanism, then the
sample is known as a random sample. The selection chances of individual sampling units need not be
equal, but they should be known. By definition, random sampling is free from selection bias.

Representative sample: If a subset of a population reflects, in the same proportions, the
characteristics of the target population under examination, then it is called a representative sample.
Such samples yield the best estimates since they accurately reflect the subgroups of the population.

Example. A researcher wants to find the gender-wise income of the people in an area. She asks two
of her students to perform sampling, telling them that there are 100 income-generating people in
total, 40 of whom are female.
Each student interviews 10 people. Student 1 has some male friends living in that area, so he
interviews all of them and only 1 working female. Student 2 chooses his interviewees at random and
ends up interviewing 4 females.

Compare the 2 samples.

Sample of student 1 vs. Sample of student 2

Randomness of sample
   Student 1: No, student 1 had selection bias as he interviewed his friends.
   Student 2: Yes, he selected all interviewees by chance.
Representativeness of sample
   Student 1: No, student 1 interviewed only 1 woman, while 40% of income-generators were female.
   Student 2: Yes, given a 60%-40% male-female split, student 2 interviewed 6 males and 4 females.
Bias
   Student 1: Yes, student 1 was biased in selecting his known friends for interview.
   Student 2: He had no bias as he picked them at random.

As an ending note, the sample by student 2 is both random and representative.
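
The representativeness check in this example can be written out as a small calculation. The sketch below compares each sample's gender shares with the population's 60%-40% split; the function name `max_share_gap` is a hypothetical helper introduced only for this illustration.

```python
# Population composition from the example: 100 people, 40 of them female.
population_share = {"male": 0.60, "female": 0.40}

# Each student interviewed 10 people.
samples = {
    "student 1": {"male": 9, "female": 1},
    "student 2": {"male": 6, "female": 4},
}

def max_share_gap(counts, target):
    """Largest absolute difference between sample share and population share."""
    total = sum(counts.values())
    return max(abs(counts[group] / total - target[group]) for group in target)

for name, counts in samples.items():
    gap = max_share_gap(counts, population_share)
    print(f"{name}: largest gap from population share = {gap:.2f}")
```

A gap near zero, as for student 2, indicates that the sample mirrors the population on this characteristic; it says nothing about other characteristics that were not checked.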

4. Sampling Frame
A frame is a list/group and a selection procedure. A Sampling Frame is a complete list of units of the
population to be sampled, organized in such a way that

i. every unit in the list occurs once and only once, and

ii. no unit is excluded from the list, and

iii. every unit has a non-zero probability of being included in the sample.

In other words, a complete and up-to-date list of the sampling units in the accessible portion of the
population is called a sampling frame. It may include individuals, households, institutions, geographic
areas, etc.
Example. A research team wants to learn about the education quality of government primary schools
in a particular area. The area has a total of 100 government schools. When they plan to visit the area,
they learn that part of the area has been hit by a monsoon flood and so the schools in that part (20 in
total) are closed. They also try to communicate with the schools before planning a visit but cannot
contact 5 schools.

They visit a total of 25 schools, and from 20 schools they get useful information about the ‘quality of
education’, which is their actual research objective. Now look at the following terms.

Total Population (all units of interest): 100 schools
   Units that cannot be sampled: 25 schools (20 for non-coverage, 5 for non-response)
Sample Population (accessible part of population, aka sampling frame): 75 schools
   Units that are not sampled: 50 schools
Gross Sample (units that are sampled): 25 schools
   Units that do not provide useful data: 5 schools
Nett Sample (units providing important data): 20 schools
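
The arithmetic behind these figures can be traced step by step. The short sketch below uses only the numbers given in the school example; the variable names are chosen here for readability.

```python
# Figures from the school example.
total_population = 100   # all government schools in the area
non_coverage = 20        # closed because of the flood
non_response = 5         # could not be contacted
gross_sample = 25        # schools actually visited
no_useful_data = 5       # visited, but gave no useful information

# Sample population (sampling frame): units that can actually be sampled.
sample_population = total_population - non_coverage - non_response

# Units in the frame that were not selected.
not_sampled = sample_population - gross_sample

# Nett sample: visited schools that provided useful data.
nett_sample = gross_sample - no_useful_data

print(sample_population, not_sampled, nett_sample)  # 75 50 20
```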

5. Survey, Census, Sample Survey

5. Merits of Sampling? Random Sampling? Sampling vs Census comparison (NI page 12, sec 1.8)
8. Estimator, Estimation, Bias

5.1 Limitations of sampling

6. 2 Types of Design (check)

Survey design, sample design (NI book , page 9, and lecture sheet 1 page 7)

7. steps of a sample survey design

Infographic with example

8. Sample size determination

One of the first questions researchers have to deal with is how big a sample they require.

Extra Notes

Conclusion

What we learnt
From: Muhammad Imran Khan <[email protected]>
Date: Tue, 6 Jul 2021 at 20:27
Subject: ASDSNC01 - few data terms
To: Md. Mizanur Rahman <[email protected]>

Statistics is a subject that deals with collection, organization, presentation, analysis, and
interpretation of data. “Statistics” as defined by the American Statistical Association (ASA) “is
the science of learning from data, and of measuring, controlling and communicating
uncertainty.”
The core theme of Statistics is data, and so, there exists no statistics without data. The few terms
below are the very basics to know.
Data: Observed values of a variable.
Variable: Characteristics that differ from individual to individual, or element to element.
Element: Entities or items on which data are collected.
Observation: The set of measurements collected for a particular element / individual.
Have a look at the table below.

Student   ID     Course 1 Grade   CGPA
Imran     3012   A+               4
Irfan     3055   A                3.8
Ihsan     3021   A                3.8
Ikram     3025   B+               3.5

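Here we have 4 students as 4 elements. Each element has 3 variables (ID, Grade, CGPA). Following the
definition above, the set of 3 measurements for one student is one observation, so the table holds
4 observations and 3 × 4 = 12 individual data values.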
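
For readers who prefer to see the table as a data structure, here is a minimal Python sketch that stores each student as one record and counts elements, variables, and individual recorded values. The dictionary layout is an assumption made for illustration; the values are those of the table.

```python
# Each dictionary is one element (student) with its recorded values.
records = [
    {"Student": "Imran", "ID": 3012, "Course 1 Grade": "A+", "CGPA": 4.0},
    {"Student": "Irfan", "ID": 3055, "Course 1 Grade": "A",  "CGPA": 3.8},
    {"Student": "Ihsan", "ID": 3021, "Course 1 Grade": "A",  "CGPA": 3.8},
    {"Student": "Ikram", "ID": 3025, "Course 1 Grade": "B+", "CGPA": 3.5},
]

# The student name identifies the element; the other columns are variables.
variables = ["ID", "Course 1 Grade", "CGPA"]

n_elements = len(records)               # 4 students
n_variables = len(variables)            # 3 variables per student
n_values = n_elements * n_variables     # 12 individual recorded values

print(n_elements, n_variables, n_values)  # 4 3 12
```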
---------- Forwarded message ---------
From: Muhammad Imran Khan <[email protected]>
Date: Tue, 6 Jul 2021 at 22:27
Subject: ASDSNC 01 measurement level
To: Md. Mizanur Rahman <[email protected]>

Measurement Levels
We can also describe data as either qualitative or quantitative. With qualitative data there is
no measurable meaning to the “difference” in numbers. For example, one basketball player
is assigned the number 20 and another player has the number 10. We cannot conclude that
the first player plays twice as well as the second player. However, with quantitative data
there is a measurable meaning to the difference in numbers. When one student scores 90 on
an exam and another student scores 45, the difference is measurable and meaningful.
Qualitative data include nominal and ordinal levels of measurement. Quantitative
data include interval and ratio levels of measurement.
Nominal and ordinal levels of measurement refer to data obtained from categorical
questions. Responses to questions on gender, country of citizenship, political affiliation,
and ownership of a mobile phone are nominal. Nominal data are considered the lowest or
weakest type of data, since numerical identification is chosen strictly for convenience and
does not imply ranking of responses.
The values of nominal variables are words that describe the categories or classes of
responses. The values of the gender variable are male and female; the values of Do you
own a car? are yes and no. We arbitrarily assign a code or number to each response. However,
this number has no meaning other than for categorizing. For example, we could
code gender responses or yes/no responses as follows:
1 = Male; 2 = Female
1 = Yes; 2 = No
Ordinal data indicate the rank ordering of items, and similar to nominal data the values
are words that describe responses. Some examples of ordinal data and possible codes
are as follows:
1. Product quality rating (1: poor; 2: average; 3: good)
2. Satisfaction rating with your current Internet provider (1: very dissatisfied; 2: moderately
dissatisfied; 3: no opinion; 4: moderately satisfied; 5: very satisfied)
3. Consumer preference among three different types of soft drink (1: most preferred;
2: second choice; 3: third choice)
In these examples the responses are ordinal, or put into a rank order, but there is
no measurable meaning to the “difference” between responses. That is, the difference between
your first and second choices may not be the same as the difference between your
second and third choices.
Interval and ratio levels of measurement refer to data obtained from numerical variables,
and meaning is given to the difference between measurements. An interval scale indicates
rank and distance from an arbitrary zero measured in unit intervals. That is, data
are provided relative to an arbitrarily determined benchmark. Temperature is a classic
example of this level of measurement, with arbitrarily determined benchmarks generally
based on either Fahrenheit or Celsius degrees. Suppose that it is 80°F in Orlando, Florida,
and only 20°F in St. Paul, Minnesota. We can conclude that the difference in temperature
is 60°, but we cannot say that it is four times as warm in Orlando as it is in St. Paul. The
year is another example of an interval level of measurement, with benchmarks based most
commonly on the Gregorian calendar.

Ratio data indicate both rank and distance from a natural zero, with ratios of two
measures having meaning. A person who weighs 200 pounds is twice the weight of a
person who weighs 100 pounds; a person who is 40 years old is twice the age of someone
who is 20 years old.
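
The difference between interval and ratio scales can be checked numerically. The sketch below, offered only as an illustration, converts the two temperatures from the example to Celsius and the two weights to kilograms: the temperature ratio changes with the choice of scale, while the weight ratio does not.

```python
def fahrenheit_to_celsius(f):
    """Convert a Fahrenheit temperature to Celsius."""
    return (f - 32) * 5 / 9

# Interval scale: the zero point is arbitrary, so ratios are not meaningful.
orlando_f, st_paul_f = 80, 20
orlando_c = fahrenheit_to_celsius(orlando_f)
st_paul_c = fahrenheit_to_celsius(st_paul_f)

ratio_f = orlando_f / st_paul_f   # ratio on the Fahrenheit scale
ratio_c = orlando_c / st_paul_c   # ratio on the Celsius scale (different!)

# Ratio scale: the zero point is natural, so ratios survive a change of unit.
LB_TO_KG = 0.45359237
heavy_lb, light_lb = 200, 100
ratio_lb = heavy_lb / light_lb
ratio_kg = (heavy_lb * LB_TO_KG) / (light_lb * LB_TO_KG)

print(f"Temperature ratio: {ratio_f:.1f} in F, {ratio_c:.1f} in C")
print(f"Weight ratio:      {ratio_lb:.1f} in lb, {ratio_kg:.1f} in kg")
```

The "four times as warm" statement fails because it depends on which arbitrary zero is used, whereas "twice the weight" holds in any unit.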
After collecting data, we first need to classify responses as categorical or numerical or by
measurement scale. Next, we assign an arbitrary ID or code number to each response. Some
graphs are appropriate for categorical variables, and others are used for numerical variables.
Note that data files usually contain “missing values.” For example, respondents to a
questionnaire may choose not to answer certain questions about gender, age, income, or
some other sensitive topic. Missing values require a special code in the data entry stage.
Unless missing values are properly handled, it is possible to obtain erroneous output.
Statistical software packages handle missing values in different ways.
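
To see how an unhandled missing-value code can produce erroneous output, consider the small sketch below. The code `-99` and the list of ages are hypothetical, chosen only to make the effect visible.

```python
from statistics import mean

MISSING = -99  # hypothetical special code entered for "no answer"

# Hypothetical ages from a questionnaire; two respondents did not answer.
ages = [23, 31, MISSING, 45, 27, MISSING]

# Naive analysis: the missing code is treated as if it were a real age.
naive_mean = mean(ages)

# Proper handling: drop the coded missing values before computing.
valid_ages = [a for a in ages if a != MISSING]
clean_mean = mean(valid_ages)

print(f"Naive mean (wrong): {naive_mean}")
print(f"Mean of valid ages: {clean_mean}")
```

Dropping missing values is only one of several possible treatments; the point here is simply that the code must not enter the calculation as a number.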
