Sampling Basics
This introductory article depicts the basics of sampling methodologies. There are 10 topics to start.
Last updated: 22OCT2020
Keywords: Sampling
60 minutes read
Authors: Imran Khan, Taposhy Tabassum Shantona
Difficulty Level: Easy
Reviewer: Shamimul Islam
Sampling: Sampling is the process of selecting individual observations from the accessible part of a
statistical population with a motive -
Sampling is not merely picking a portion of the population. Instead, it is the science and art of
controlling and measuring the reliability of useful statistical information through the theory of probability.
1. What percentage of university students think that on-campus politics should be banned in
Bangladesh? To answer this question, we have to talk to actual students to learn their
perceptions of student politics. But it is not possible to reach every university student in
Bangladesh; doing so is not feasible in terms of time, cost, or manpower. So, we interview a
selection of students to learn their views and then try to make
inferences about all university students’ perceptions of student politics.
This is an example of sampling. The students we interview form the “sample”, while all university
students are our “population”. Every individual student of our sample is a “sampling unit”.
2. When we check the taste of a soup or curry being cooked, we taste a small sample to check whether
the salt, spices, and other ingredients are in the right proportion.
3. To diagnose a patient’s illness, we take a blood sample from the patient.
1. Population
Population is the collection of all items in which a researcher is interested. In other words, a statistical
population is defined as the totality of all individuals that have some characteristics of interest.
Target population – The complete collection of all observations we want to study. It is important to
define the target population of a study, although it is often a difficult task to do.
Sample population – The collection of all possible observation units that might be chosen in a sample.
This is the population from which the sample is taken. It is also known as survey population.
Think about the student politics example presented earlier. There are many types of students in a
university: full-time and part-time students, and distance learners (remote students). Say that, for the
interviews, we decide to exclude the remote students, as they are not usually physically present on
campus (some foreign universities have such distance learners, though this is not very common in
Bangladesh). So, while all students are part of our target population, our sample population will not
include the remote students. In other words, since we will not interview any remote students, they are
part of our target population but not part of our sample population.
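The distinction between the two populations can be sketched in a few lines of Python. The student records and the "mode" field below are hypothetical, invented purely for illustration.

```python
# Hypothetical student records; names and study modes are invented for illustration.
students = [
    {"name": "A", "mode": "full-time"},
    {"name": "B", "mode": "part-time"},
    {"name": "C", "mode": "remote"},
    {"name": "D", "mode": "full-time"},
    {"name": "E", "mode": "remote"},
]

# Target population: every student we want to draw conclusions about.
target_population = students

# Sample population: only the students who can actually be chosen for an interview.
sample_population = [s for s in students if s["mode"] != "remote"]

print(len(target_population))  # 5
print(len(sample_population))  # 3
```

Any conclusion drawn from interviews then strictly describes the 3 reachable students' group, which is why the gap between the two populations should be stated when reporting results.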
2. Methods of Sampling
Sampling methods: There are two types of sampling methods: i) probability sampling and ii) non-probability
sampling.
Population size: The number of observation units that constitute the population is known as the
population size.
Sample size: The number of observation units that constitute the sample is known as the sample size.
Sampling Unit: Every element of a sample is called a sampling unit. In other words, a sampling unit is a
well-defined element or group of elements on which we make observations. For example, if we select 10
students to represent a batch of 100 students, then each of those 10 students is a sampling unit.
Sample: A sample is a collection of sampling units that one desires to study. A sample is expected to be
representative of the total population so that the inferences on population characteristics turn out to be
precise.
Any subset of a population is a sample; but if the sample is not representative, it is not a statistical one.
Random sample: If the sampling units in a sample are selected purely by a chance mechanism, then the
sample is known as a random sample. The selection chances of individual sampling units need not be equal, but
they should be known. By definition, random sampling is free from selection bias.
Example. A researcher wants to find the gender-wise income of the people in an area. She asks two
of her students to perform the sampling, telling them that there are 100 income-generating
people in total, 40 of whom are female.
Both students interview 10 people. Student 1 has some male friends living in that area, so
he interviews all of them; he interviews only 1 working female. Student 2 chooses his interviewees at
random; he interviews 4 females.
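Student 2's approach can be imitated with a short simulation. This is a minimal sketch, assuming the 100 income-generating people are available as a list with their gender recorded; the list, the seed, and the variable names are illustrative and not part of the original example.

```python
import random

# Assumed list: 100 income-generating people, 40 of them female (as in the example).
people = ["F"] * 40 + ["M"] * 60

rng = random.Random(2020)  # fixed seed only so that the run is repeatable

# Simple random sample without replacement: every person has the same chance.
sample = rng.sample(people, 10)
females_in_sample = sample.count("F")

# Repeat the sampling many times to see what chance selection gives on average.
draws = [rng.sample(people, 10).count("F") for _ in range(10_000)]
average_females = sum(draws) / len(draws)

print(females_in_sample)
print(round(average_females, 2))  # close to 4, i.e. 40% of 10
```

A single random sample of 10 need not contain exactly 4 females, but the selection is not tilted toward either gender, unlike a sample built from the interviewer's own friends.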
4. Sampling Frame
A frame is a list/group and a selection procedure. A Sampling Frame is a complete list of units of the
population to be sampled and organized in such a way that
i. every unit in the list occurs once and only once, and
iii. every unit has a non-zero probability of being included in the sample.
In other words, a complete and up-to-date list of the sampling units that make up the accessible portion of the
population is called a sampling frame. It may include individuals, households, institutions, geographic areas,
etc.
Example. A research team wants to learn about the education quality of the government primary schools
of a particular area. The area has a total of 100 government schools. When they plan to visit the area,
they learn that part of the area has been hit by a monsoon flood, so the schools in that part (20 in
total) are closed. They also try contacting the schools before the visit but cannot reach 5
schools.
They visit a total of 25 schools, and from 20 schools they get useful information about the ‘quality of
education’, which is their actual research objective. Now look at the following terms.
5. Merits of Sampling? Random Sampling? Sampling vs Census comparison (NI page 12, sec 1.8)
8. Estimator, Estimation, Bias
Survey design, sample design (NI book, page 9, and lecture sheet 1 page 7)
One of the first questions researchers have to deal with is how big a sample they require.
Extra Notes
Conclusion
What we learnt
From: Muhammad Imran Khan <[email protected]>
Date: Tue, 6 Jul 2021 at 20:27
Subject: ASDSNC01 - few data terms
To: Md. Mizanur Rahman <[email protected]>
Statistics is a subject that deals with collection, organization, presentation, analysis, and
interpretation of data. “Statistics” as defined by the American Statistical Association (ASA) “is
the science of learning from data, and of measuring, controlling and communicating
uncertainty.”
The core theme of Statistics is data, and so there is no statistics without data. The few terms
below are the very basics to know.
Data: Observed values of a variable.
Variable: Characteristics that differ from individual to individual, or element to element.
Element: Entities or issues on which data are collected.
Observation: The set of measurements collected for a particular element / individual.
Have a look at the table below.
Here we have 4 students as 4 elements. Each element has 3 variables (Id, Grade, CGPA). Each student’s set of
3 measurements is one observation, so we have 4 observations and 3 × 4 = 12 data values.
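The relationship between elements, variables, and recorded values can be sketched in Python. The records below are hypothetical stand-ins with invented values, not the article's table.

```python
# Hypothetical records: the values are invented for illustration only.
records = [
    {"Id": 101, "Grade": "A", "CGPA": 3.80},
    {"Id": 102, "Grade": "B", "CGPA": 3.10},
    {"Id": 103, "Grade": "A", "CGPA": 3.65},
    {"Id": 104, "Grade": "C", "CGPA": 2.40},
]

variables = list(records[0].keys())       # the characteristics being recorded
n_elements = len(records)                 # one row per student (element)
n_variables = len(variables)
n_values = sum(len(row) for row in records)  # every recorded value in the table

# Each row holds the set of measurements collected for one element.
first_row = records[0]

# The data for a single variable are its observed values across all elements.
cgpa_data = [row["CGPA"] for row in records]

print(n_elements, n_variables, n_values)  # 4 3 12
print(variables)
```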
---------- Forwarded message ---------
From: Muhammad Imran Khan <[email protected]>
Date: Tue, 6 Jul 2021 at 22:27
Subject: ASDSNC 01 measurement level
To: Md. Mizanur Rahman <[email protected]>
Measurement Levels
We can also describe data as either qualitative or quantitative. With qualitative data there is
no measurable meaning to the “difference” in numbers. For example, one basketball player
is assigned the number 20 and another player has the number 10. We cannot conclude that
the first player plays twice as well as the second player. However, with quantitative data
there is a measurable meaning to the difference in numbers. When one student scores 90 on
an exam and another student scores 45, the difference is measurable and meaningful.
Qualitative data include nominal and ordinal levels of measurement. Quantitative
data include interval and ratio levels of measurement.
Nominal and ordinal levels of measurement refer to data obtained from categorical
questions. Responses to questions on gender, country of citizenship, political affiliation,
and ownership of a mobile phone are nominal. Nominal data are considered the lowest or
weakest type of data, since numerical identification is chosen strictly for convenience and
does not imply ranking of responses.
The values of nominal variables are words that describe the categories or classes of
responses. The values of the gender variable are male and female; the values of “Do you
own a car?” are yes and no. We arbitrarily assign a code or number to each response. However,
this number has no meaning other than for categorizing. For example, we could
code gender responses or yes/no responses as follows:
1 = Male; 2 = Female
1 = Yes; 2 = No
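That such codes carry no numerical meaning can be checked with a small sketch. The responses below are hypothetical, and the second coding is simply an assumed alternative that is just as valid as the first.

```python
from collections import Counter

# Hypothetical survey responses, invented for illustration.
responses = ["Male", "Female", "Female", "Male", "Female"]

coding_a = {"Male": 1, "Female": 2}
coding_b = {"Male": 2, "Female": 1}  # an equally valid, arbitrary coding

codes_a = [coding_a[r] for r in responses]
codes_b = [coding_b[r] for r in responses]

# The "average code" changes with the arbitrary coding, so it carries no meaning.
mean_a = sum(codes_a) / len(codes_a)
mean_b = sum(codes_b) / len(codes_b)

# Category counts do not depend on which coding was chosen.
counts = Counter(responses)

print(mean_a, mean_b)
print(counts["Male"], counts["Female"])  # 2 3
```

Counts and percentages per category are therefore the natural summaries for nominal data, while arithmetic on the codes is not.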
Ordinal data indicate the rank ordering of items, and, as with nominal data, the values
are words that describe responses. Some examples of ordinal data and possible codes
are as follows:
1. Product quality rating (1: poor; 2: average; 3: good)
2. Satisfaction rating with your current Internet provider (1: very dissatisfied; 2: moderately
dissatisfied; 3: no opinion; 4: moderately satisfied; 5: very satisfied)
3. Consumer preference among three different types of soft drink (1: most preferred;
2: second choice; 3: third choice)
In these examples the responses are ordinal, or put into a rank order, but there is
no measurable meaning to the “difference” between responses. That is, the difference between
your first and second choices may not be the same as the difference between your
second and third choices.
Interval and ratio levels of measurement refer to data obtained from numerical variables,
and meaning is given to the difference between measurements. An interval scale indicates
rank and distance from an arbitrary zero measured in unit intervals. That is, data
are provided relative to an arbitrarily determined benchmark. Temperature is a classic
example of this level of measurement, with arbitrarily determined benchmarks generally
based on either Fahrenheit or Celsius degrees. Suppose that it is 80°F in Orlando, Florida,
and only 20°F in St. Paul, Minnesota. We can conclude that the difference in temperature
is 60°, but we cannot say that it is four times as warm in Orlando as it is in St. Paul. The
year is another example of an interval level of measurement, with benchmarks based most
commonly on the Gregorian calendar.
Ratio data indicate both rank and distance from a natural zero, with ratios of two
measures having meaning. A person who weighs 200 pounds is twice the weight of a
person who weighs 100 pounds; a person who is 40 years old is twice the age of someone
who is 20 years old.
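The difference between interval and ratio scales can be checked numerically. The sketch below reuses the temperatures and weights from the text; the conversion helper and the variable names are illustrative.

```python
def fahrenheit_to_celsius(f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

# Interval scale: the zero point is arbitrary, so ratios depend on the unit.
orlando_f, st_paul_f = 80.0, 20.0
orlando_c = fahrenheit_to_celsius(orlando_f)
st_paul_c = fahrenheit_to_celsius(st_paul_f)

ratio_f = orlando_f / st_paul_f  # 4.0 in Fahrenheit
ratio_c = orlando_c / st_paul_c  # about -4.0 in Celsius: the "ratio" is not stable

# Ratio scale: the zero point is natural, so ratios survive a change of unit.
LB_TO_KG = 0.45359237
heavy_lb, light_lb = 200.0, 100.0
ratio_lb = heavy_lb / light_lb
ratio_kg = (heavy_lb * LB_TO_KG) / (light_lb * LB_TO_KG)

print(ratio_f, round(ratio_c, 2))
print(ratio_lb, round(ratio_kg, 2))
```

The temperature ratio changes from 4 to about -4 when the unit changes, which is why "four times as warm" has no meaning, whereas "twice the weight" holds in pounds and kilograms alike.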
After collecting data, we first need to classify responses as categorical or numerical or by
measurement scale. Next, we assign an arbitrary ID or code number to each response. Some
graphs are appropriate for categorical variables, and others are used for numerical variables.
Note that data files usually contain “missing values.” For example, respondents to a
questionnaire may choose not to answer certain questions about gender, age, income, or
some other sensitive topic. Missing values require a special code in the data entry stage.
Unless missing values are properly handled, it is possible to obtain erroneous output.
Statistical software packages handle missing values in different ways.
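The effect of a mishandled missing-value code can be seen in a short sketch. The code -99 and the ages below are assumptions made for illustration; real data files and software packages use their own conventions.

```python
import statistics

MISSING = -99  # assumed missing-value code; any agreed sentinel would do

# Hypothetical ages from a questionnaire; -99 marks respondents who did not answer.
ages = [23, 31, MISSING, 45, MISSING, 28]

# Erroneous: the sentinel is treated as if it were a real age.
naive_mean = statistics.mean(ages)

# Handled properly: drop the missing entries before computing the statistic.
valid_ages = [a for a in ages if a != MISSING]
correct_mean = statistics.mean(valid_ages)

print(round(naive_mean, 2))
print(correct_mean)  # 31.75
```

The naive mean comes out negative, an impossible age, which shows how an unhandled code silently corrupts the output.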