Lecture Note 02 - Statistical Data and Its Collection Process
Variable
A variable is a characteristic or attribute that can assume different values. Examples include age,
sex, family size, and income.
Data
Data are the values (measurements or observations) that the variables can assume. For example,
the values of income of the population, the values of age of the population, etc.
A collection of data values forms a data set. Each value in the data set is called a data value
or a datum.
Qualitative variables are variables that can be placed into distinct categories, according to some
characteristic or attribute.
For example, if subjects are classified according to gender (male or female), then the variable
gender is qualitative. Other examples of qualitative variables are religious preference and
geographic locations.
Quantitative variables are numerical and can be ordered or ranked.
For example, the variable age is numerical, and people can be ranked in order according to
the value of their ages. Other examples of quantitative variables are heights, weights, and body
temperatures.
Quantitative variables can be further classified into two groups: discrete and continuous.
Discrete variables assume values that can be counted. Examples of discrete variables are the
number of children in a family and the number of students in a classroom.
Continuous variables can assume an infinite number of values between any two specific values.
They are obtained by measuring. They often include fractions and decimals.
For example, temperature is a continuous variable, since the variable can assume an infinite
number of values between any two given temperatures.
Prepared by: Suman Biswas, Lecturer, Dept. of Statistics, Islamic University, Kushtia-7003 1
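The classification of variables can be summarized as follows: a variable is either qualitative or
quantitative, and a quantitative variable is either discrete or continuous.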
In addition to being classified as qualitative or quantitative, variables can be classified by how they
are categorized, counted, or measured. Data can be classified according to levels of measurement.
The level of measurement of the data dictates the calculations that can be done to summarize and
present the data. It will also determine the statistical tests that should be performed.
There are four levels of measurement: nominal, ordinal, interval, and ratio. The lowest, or the
most primitive, measurement is the nominal level. The highest, or the level that gives us the
most information about the observation, is the ratio level of measurement.
For example, from exam evaluations, students might be ranked as excellent, average, or
poor. Floats in a homecoming parade might be ranked as first place, second place, etc.
In summary, the properties of the ordinal level of data are:
1. Data classifications are represented by sets of labels or names (high, medium, low) that have
relative values.
2. Because of the relative values, the data classified can be ranked or ordered.
The properties of the interval level of data are:
1. Data classifications are ordered according to the amount of the characteristic they possess.
2. Equal differences in the characteristic are represented by equal differences in the
measurements.
Ratio-Level Data
The ratio level of measurement possesses all the characteristics of interval measurement, and there
exists a true zero. In addition, true ratios exist when the same variable is measured on two different
members of the population. It is the highest level of measurement.
Examples of ratio scales are those used to measure height, weight, area, and number of phone calls
received. Ratio scales have differences between units (1 inch, 1 pound, etc.) and a true zero. In
addition, the ratio scale contains a true ratio between values.
In summary, the properties of the ratio level of data are:
1. Data classifications are ordered according to the amount of the characteristic they possess.
2. Equal differences in the characteristic are represented by equal differences in the numbers
assigned to the classifications.
3. The zero point is the absence of the characteristic and the ratio between two numbers is
meaningful.
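The difference between the interval and ratio levels can be checked with a little arithmetic. The sketch below is only an illustration and is not part of the original note: it assumes temperature in degrees Celsius as an interval-level variable (its zero is arbitrary) and height as a ratio-level variable, with made-up values. The helper names c_to_f and cm_to_inch are hypothetical.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def cm_to_inch(cm):
    """Convert centimetres to inches."""
    return cm / 2.54


# Interval level (illustrative values): the zero point is arbitrary,
# so the ratio of two temperatures depends on the unit used.
temp_ratio_c = 20 / 10                    # 2.0 in Celsius
temp_ratio_f = c_to_f(20) / c_to_f(10)    # 68 / 50 = 1.36 in Fahrenheit

# Ratio level (illustrative values): zero means "no height",
# so the ratio of two heights is the same in any unit.
height_ratio_cm = 180 / 90
height_ratio_in = cm_to_inch(180) / cm_to_inch(90)

print("Temperature ratio:", temp_ratio_c, "vs", round(temp_ratio_f, 2))
print("Height ratio:", height_ratio_cm, "vs", round(height_ratio_in, 2))
```

Because the temperature ratio changes with the unit, "twice as hot" is not a meaningful statement on the Celsius scale, whereas "twice as tall" is meaningful for height.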
Figure: Summary of the Characteristics for Levels of Measurement
Statistical data may be broadly categorized as primary data and secondary data. Primary data come
mainly from direct field operations, which may be either a census or a separately designed survey.
Secondary data, on the other hand, are usually obtained from already published or unpublished
documents rather than from first-hand field investigations. The primary data collected by one
agency or organization thus constitute secondary data in the hands of other agencies.
The Bangladesh Bureau of Statistics (BBS), for example, conducts occasional surveys on topics such
as health, economics, and morbidity. In the Bureau's hands, such data are primary data. The Bureau
compiles and analyzes them and prepares periodic reports on these issues. If other interested
groups use these data for their own purposes, the BBS data become secondary data for them.
Primary data are data that did not exist before the research. They are collected at the time of the
research by the researchers themselves. Sources of primary data can be referred to as primary
research methods.
Sources of Secondary Data
As opposed to primary data, secondary data are data that already exist at the time of the research.
There are many popular sources of secondary data used in sociology. Some examples include:
Official statistics, for example from:
  o Bangladesh Bureau of Statistics (BBS)
  o National Institute of Population Research and Training (NIPORT)
Documents (such as historical documents or government reports)
Newspapers
Recorded music, films, and other artwork
In statistics, we are interested in obtaining information about a total collection of elements, which
we will refer to as the population. The population is often too large for us to examine each of its
members. In such cases, we try to learn about the population by choosing and then examining a
subgroup of its elements. This subgroup of a population is called a sample.
Population
The entire set of individuals or objects of interest, or the measurements obtained from all
individuals or objects of interest. For example, all the students at Islamic University, Kushtia
form the population when we study the total number of students, their ages, heights, and so on.
Sample
A representative portion, or part, of the population of interest. For example, a representative
group of students at Islamic University, Kushtia forms a sample when we study the students'
average age, height, and so on.
Reasons to Sample
When studying characteristics of a population, there are many practical reasons why we prefer to
select portions or samples of a population to observe and measure. Here are some of the reasons
for sampling:
In order for the sample to be informative about the total population, it must be, in some sense,
representative of that population. In practice, a given sample generally cannot be considered to be
representative of a population unless that sample has been chosen in a random manner.
To obtain samples that are unbiased, i.e., that give each subject in the population an equal
chance of being selected, we usually use four basic methods of sampling: random, systematic,
stratified, and cluster sampling.
To illustrate simple random sampling and selection, suppose a population consists of 56 students
in the 2nd year of Dept. of Pharmacy, IU. A sample of 20 students is to be selected from that
population. One way of ensuring that every student in the population has the same chance of
being chosen is to first write the name of each student on a small slip of paper and deposit all the
slips in a box. After they have been thoroughly mixed, the first selection is made by drawing a slip
out of the box without looking at it. This process is repeated until the sample of 20 students is
chosen.
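The slips-in-a-box procedure can be imitated on a computer. The following is a minimal sketch, assuming the 56 students are represented by placeholder labels ("Student 1" to "Student 56") rather than real names. The fixed seed is also an assumption, used only so that the same sample is drawn on every run.

```python
import random

# Placeholder labels standing in for the 56 students (hypothetical names).
population = [f"Student {i}" for i in range(1, 57)]

# A seeded generator makes the draw reproducible; drop the seed for a fresh draw.
rng = random.Random(2024)

# Draw 20 students without replacement, each with the same chance of selection.
sample = rng.sample(population, k=20)

print(len(sample), "students selected:")
print(sample)
```

Here random.sample draws without replacement, which matches drawing slips from the box and not putting them back.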
In systematic sampling, every kth member of the population is selected. First, k is calculated as
the population size divided by the sample size; random sampling is then used to select the first
member.
For example, suppose there were 2000 subjects in the population and a sample of 50 subjects was
needed. Since 2000/50 = 40, k = 40, and every 40th subject would be selected; however, the
first subject (numbered between 1 and 40) would be selected at random.
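The example above can be sketched in a few lines. This is an illustration only: the function name systematic_sample is hypothetical, subjects are assumed to be numbered 1 to 2000, and the population size is assumed to be an exact multiple of the sample size.

```python
import random


def systematic_sample(population_size, sample_size, rng=None):
    """Return the subject numbers chosen by systematic sampling.

    Assumes subjects are numbered 1..population_size and that
    population_size is an exact multiple of sample_size.
    """
    rng = rng or random.Random()
    k = population_size // sample_size   # sampling interval, e.g. 2000 // 50 = 40
    start = rng.randint(1, k)            # random starting subject between 1 and k
    return [start + i * k for i in range(sample_size)]


# Seeded only so the illustration is reproducible.
selected = systematic_sample(2000, 50, random.Random(1))
print("First five selected subjects:", selected[:5])
print("Number selected:", len(selected))
```

Only the starting point is random; once it is fixed, every later subject follows at a distance of k = 40.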
A sampling technique in which the population is divided into subgroups, called strata, and a
sample is then randomly selected from each stratum is called stratified random sampling.
For example, suppose the chairman of the Dept. of Pharmacy wants to learn how students in two
consecutive years feel about a certain issue. Furthermore, the chairman wishes to see whether the
opinions of the first-year students differ from those of the second-year students. The chairman
will randomly select students from each group to use in the sample.
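A minimal sketch of this idea follows. The two strata are the first-year and second-year students. The group sizes (60 and 56), the student labels, the choice of 10 students per stratum, and the function name stratified_sample are all assumptions made for illustration.

```python
import random


def stratified_sample(strata, n_per_stratum, rng=None):
    """Draw a simple random sample of n_per_stratum members from every stratum."""
    rng = rng or random.Random()
    return {
        name: rng.sample(members, n_per_stratum)
        for name, members in strata.items()
    }


# Hypothetical strata: labels and sizes are illustrative only.
strata = {
    "first_year": [f"F{i}" for i in range(1, 61)],
    "second_year": [f"S{i}" for i in range(1, 57)],
}

# Seeded only so the illustration is reproducible.
sample = stratified_sample(strata, 10, random.Random(7))
for year, students in sample.items():
    print(year, students)
```

Sampling within each stratum guarantees that both years are represented, so their opinions can be compared.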
Cluster Sampling
In cluster sampling, the population is divided into clusters using naturally occurring geographic or
other boundaries. Then, clusters are randomly selected, and a sample is collected by randomly
selecting from each cluster.
Suppose a researcher wishes to survey apartment dwellers in a large city. If there are 10 apartment
buildings in the city, the researcher can select at random 2 buildings from the 10 and interview all
the residents of these buildings. Cluster sampling is used when the population is large or when it
involves subjects residing in a large geographic area.
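The apartment example can be sketched as follows. The building names, the five residents per building, and the seed are assumptions made for illustration. As in the example, whole buildings (clusters) are chosen at random and every resident of a chosen building is interviewed.

```python
import random

# Hypothetical city: 10 buildings (clusters) with 5 residents each.
buildings = {
    f"Building {b}": [f"B{b}-Resident {r}" for r in range(1, 6)]
    for b in range(1, 11)
}

# Seeded only so the illustration is reproducible.
rng = random.Random(3)

# Step 1: randomly select 2 of the 10 clusters.
chosen = rng.sample(sorted(buildings), k=2)

# Step 2: interview all residents of the selected clusters.
interviewed = [resident for b in chosen for resident in buildings[b]]

print("Selected buildings:", chosen)
print("Residents interviewed:", len(interviewed))
```

Unlike stratified sampling, only the selected clusters contribute to the sample; the other eight buildings are not visited at all.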