Basic Statistics Notes 2
Scale of Measurement
Property    Nominal  Ordinal  Interval  Ratio
Order       No       Yes      Yes       Yes
Difference  No       No       Yes       Yes
Ratio       No       No       No        Yes
• Examples of variables and their categorization as continuous or
discrete, quantitative or qualitative, and their respective scales of
measurement.
Variable                   Discrete/Continuous  Quantitative/Qualitative  Scale of Measurement
Gender                     Discrete             Qualitative               Nominal
Body type                  Discrete             Qualitative               Nominal
Time in seconds            Continuous           Quantitative              Ratio
Temperature (F)            Continuous           Quantitative              Interval
Letter grade (A, B, C, D)  Discrete             Qualitative               Ordinal
Aspects to be considered for collection of data
There are various methods of data collection. As such the
researcher must judiciously select the method/methods
for his own study, keeping in view the following factors:
1. Nature, scope and object of enquiry
• This constitutes the most important factor affecting the
choice of a particular method.
• The method selected should be such that it suits the
type of enquiry that is to be conducted by the
researcher.
• This factor is also important in deciding whether the data
already available (secondary data) are to be used or the
data not yet available (primary data) are to be collected.
2. Availability of funds
• Availability of funds for the research project determines to
a large extent the method to be used for the
collection of data.
• When funds at the disposal of the researcher are very
limited, he will have to select a comparatively cheaper
method which may not be as efficient and effective as
some other costly method.
• Finance, in fact, is a big constraint in practice and the
researcher has to act within this limitation.
3. Time factor
• Availability of time has also to be taken into account in
deciding a particular method of data collection.
• Some methods take relatively more time, whereas with
others the data can be collected in a comparatively shorter
duration.
• The time at the disposal of the researcher, thus, affects the
selection of the method by which the data are to be
collected.
4. Precision required
• Precision required is yet another important factor to be
considered at the time of selecting the method of
collection of data.
2. Systematic Sampling
• In some instances the most practical way of sampling is to select
every 15th name on a list, every 10th house on one side of a street
and so on.
• Sampling of this type is known as systematic sampling.
• An element of randomness is usually introduced into this kind
of sampling by using random numbers to pick up the unit with
which to start.
• This procedure is useful when a sampling frame is available in the
form of a list.
• In such a design the selection process starts by picking some
random point in the list and then every nth element is selected.
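The procedure described above can be sketched in Python roughly as follows. This is a minimal illustration, not a prescribed method: the function name `systematic_sample`, the list of 150 names, and the interval of 15 are all assumptions made for the example.

```python
import random

def systematic_sample(items, k, rng=None):
    """Pick a random start among the first k items, then take every k-th item."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    rng = rng or random.Random()
    start = rng.randrange(k)   # random starting index in 0..k-1
    return items[start::k]     # every k-th element from that start

# Hypothetical sampling frame: a list of 150 names, sampled every 15th name.
names = [f"name_{i:03d}" for i in range(1, 151)]
sample = systematic_sample(names, k=15, rng=random.Random(0))
print(len(sample))  # 10
```

Only the starting point is random; once it is fixed, the rest of the sample is determined by the interval.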
3 Stratified Sampling
• If the population from which a sample is to be drawn does not constitute a
homogeneous group, then the stratified sampling technique is applied so as to
obtain a representative sample.
• In this technique, the population is stratified into a number of non-overlapping
subpopulations or strata and sample items are selected from
each stratum. Example: regions in Tanzania are natural strata; the
characteristics of Dar es Salaam are quite different from those of Dodoma.
• If the selection of items from each stratum is based on simple random sampling,
the entire procedure, first stratification and then simple random sampling, is
known as stratified random sampling.
• In broader terms, stratified sampling consists of the following
steps:
(a) The entire population of sampling units is divided into distinct
subpopulations, called strata.
(b) Within each stratum a separate sample is selected from all the
sampling units composing that stratum.
(c) From the sample obtained in each stratum, observations are
made and several statistics are calculated e.g. sample mean.
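Steps (a) to (c) above can be sketched in Python as shown here. The household records, the sample size of 5 per stratum, and the helper name `stratified_sample` are hypothetical and serve only to illustrate the idea.

```python
import random
from collections import defaultdict
from statistics import mean

def stratified_sample(units, stratum_of, n_per_stratum, rng=None):
    """Simple random sample of up to n_per_stratum units from each stratum."""
    rng = rng or random.Random()
    strata = defaultdict(list)
    for unit in units:                       # step (a): divide into strata
        strata[stratum_of(unit)].append(unit)
    return {                                 # step (b): sample within each stratum
        name: rng.sample(members, min(n_per_stratum, len(members)))
        for name, members in strata.items()
    }

# Hypothetical data: (region, household income) pairs.
households = [("Dar es Salaam", 900 + i) for i in range(50)] + \
             [("Dodoma", 400 + i) for i in range(30)]

samples = stratified_sample(households, lambda h: h[0], 5, random.Random(0))

# Step (c): compute a statistic (here the sample mean) per stratum.
stratum_means = {
    region: mean(income for _, income in rows)
    for region, rows in samples.items()
}
print(stratum_means)
```

Equal allocation per stratum is used here for simplicity; allocating in proportion to stratum size is another common choice.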
4 Cluster Sampling
• Cluster sampling involves grouping the population and then
selecting whole groups, or clusters, rather than individual elements
for inclusion in the sample. The clusters should be similar to one another.
• Example
• Suppose some departmental store wishes to sample its credit card
holders. It has issued its cards to 15,000 customers.
• The sample size is to be kept say 450. For cluster sampling this list
of 15,000 card holders could be formed into 100 clusters of 150
card holders each (called blocks).
• Three clusters might then be selected for the sample randomly.
• The sample size must often be larger than that of a simple random
sample to ensure the same level of accuracy, because in cluster
sampling the procedural potential for order bias and other sources of
error is usually accentuated.
• The clustering approach can, however, make the sampling
procedure relatively easier and increase the efficiency of field
work, especially in the case of personal interviews.
• In cluster sampling, a cluster, i.e., a group of population elements,
constitutes the sampling unit, instead of a single element of the
population.
• Cluster elements
• Elements within a cluster should ideally be as heterogeneous as
possible, but there should be homogeneity between
cluster means.
• Each cluster should be a small scale representation of the total
population.
• The clusters should be mutually exclusive and collectively
exhaustive.
• In cluster sampling only the selected clusters are studied; every
element of a selected cluster is included, with no further sampling
within the cluster.
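The credit card example above (15,000 card holders, 100 blocks of 150, three blocks chosen at random) can be sketched in Python as follows. Forming blocks from consecutive positions in the list is an assumption made for illustration; the notes do not say how the blocks are formed.

```python
import random

def cluster_sample(units, cluster_size, n_clusters, rng=None):
    """Split units into consecutive blocks, pick whole blocks at random."""
    rng = rng or random.Random()
    clusters = [units[i:i + cluster_size]
                for i in range(0, len(units), cluster_size)]
    chosen = rng.sample(clusters, n_clusters)   # select clusters, not elements
    return [unit for cluster in chosen for unit in cluster]

card_holders = list(range(1, 15001))            # hypothetical card numbers 1..15000
sample = cluster_sample(card_holders, cluster_size=150, n_clusters=3,
                        rng=random.Random(0))
print(len(sample))  # 450
```

Every card holder in a chosen block enters the sample, which is what distinguishes this from sampling individual card holders.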
5 Multi-stage Sampling
• This is a further development of the principle of cluster sampling.
Example
• Suppose we want to investigate the working efficiency of
nationalized banks in India and we want to take a sample of a few
banks for this purpose.
• The first stage is to select large primary sampling units, such as states
in a country.
• Then we may select certain districts and interview all banks in the
chosen districts.
• Thus, this would represent a two-stage sampling design with the
ultimate sampling units being clusters of districts.
• If, instead of taking a census of all banks within the selected
districts, we select certain towns and interview all banks in the
chosen towns, this would represent a three-stage sampling design.
• If instead of taking a census of all banks within the selected towns,
we randomly sample banks from each selected town, then it is a
case of using a four-stage sampling plan.
• Therefore, if we select randomly at all stages, we will have what is
known as a "multi-stage random sampling design".
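The bank example above can be sketched in Python with a nested sampling frame. Everything in this sketch is hypothetical: the frame of 4 states, 3 districts per state, 3 towns per district and 5 banks per town, the number drawn at each stage, and the function names. When the list of stage sizes runs out, all remaining units are taken (a census), as in the two-stage and three-stage designs described above.

```python
import random

def _flatten(frame):
    """Return every ultimate unit below this point of the frame."""
    if isinstance(frame, dict):
        return [u for sub in frame.values() for u in _flatten(sub)]
    return list(frame)

def multistage_sample(frame, n_per_stage, rng=None):
    """Draw n_per_stage[0] units at this stage, then recurse into each."""
    rng = rng or random.Random()
    if not n_per_stage:                      # no stages left: take a census
        return _flatten(frame)
    n, rest = n_per_stage[0], n_per_stage[1:]
    if isinstance(frame, dict):
        keys = rng.sample(sorted(frame), min(n, len(frame)))
        return [u for k in keys for u in multistage_sample(frame[k], rest, rng)]
    return rng.sample(frame, min(n, len(frame)))   # final stage: sample banks

# Hypothetical frame: state -> district -> town -> list of banks.
frame = {
    f"state{s}": {
        f"district{s}.{d}": {
            f"town{s}.{d}.{t}": [f"bank{s}.{d}.{t}.{b}" for b in range(5)]
            for t in range(3)
        }
        for d in range(3)
    }
    for s in range(4)
}

two_stage = multistage_sample(frame, [2, 2], random.Random(0))         # states, districts
four_stage = multistage_sample(frame, [2, 2, 2, 2], random.Random(0))  # down to banks
print(len(two_stage), len(four_stage))  # 60 16
```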
ii) Non-probability sampling
• The elements are chosen arbitrarily; there is no way to estimate
the probability of any one element being included in the sample.
• Also, no assurance is given that each item has a chance of being
included, making it impossible either to estimate sampling
variability or to identify possible bias.
• In straightforward terms, a sampling method is a non-probabilistic
sampling method if it is not a probability-based sampling method.
Some types of non-probability sampling are
• Convenience sampling
Is a method of drawing a sample by selecting
people because of the ease of their volunteering, or
selecting units because of their availability or easy access (e.g.,
recruiting patients as they arrive at a medical facility for otherwise
scheduled appointments).
• Purposive sampling
Also commonly called judgmental sampling, is a technique in which selection is
based on knowledge of the population and the purpose of the
study. The subjects are selected because of some characteristic.
• Snowball sampling (chain sampling, chain-referral
sampling, referral sampling)
Is a non-probability sampling technique where existing study
subjects recruit future subjects from among their acquaintances.
Researchers use this sampling method if the subjects sought for the study are
very rare or are limited to a very small subgroup of the population.
After observing the initial subject, the researcher asks that subject
for help in identifying people with a similar trait of
interest.
• Quota sampling
Is a non-probability sampling technique wherein the
assembled sample has the same proportions of individuals as the
entire population with respect to known characteristics, traits or
focused phenomenon.
Drawbacks of non-probability sampling
• Reliability cannot be measured in non-probability sampling; the
only way to address data quality is to compare some of the survey
results with available information about the population.
• Still, there is no assurance that the estimates will meet an
acceptable level of error.
Advantages of non-probability sampling
• Despite these drawbacks, non-probability sampling methods can
be useful when descriptive comments about the sample itself are
desired.
• Secondly, they are quick, inexpensive and convenient.
• There are also other circumstances, such as in applied social
research, when it is unfeasible or impractical to conduct
probability sampling.
• Statistics Canada uses probability sampling for almost all of its
surveys, but uses non-probability sampling for questionnaire
testing and some preliminary studies during the development
stage of a survey.
• Computer generated random numbers: Using Microsoft Excel to generate
random numbers.
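As an alternative to a spreadsheet, computer-generated random numbers can also be drawn with Python's standard `random` module. The sketch below is only an illustration: the frame of 500 numbered units, the sample size of 20, and the seed are assumptions.

```python
import random

rng = random.Random(2024)   # fixed seed only so that the run is repeatable
population_size = 500       # hypothetical sampling frame numbered 1..500

# Draw 20 distinct unit numbers, i.e. a simple random sample without replacement.
chosen = sorted(rng.sample(range(1, population_size + 1), k=20))
print(chosen)
```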
TOPIC 4: CLASSIFICATION AND TABULATION OF DATA
Discussion:
• Frequency distributions
• Range and class intervals
• Appropriate choice of class intervals
• Open classes at ends
• Guidelines for constructing tables
1 Class-Intervals
• It refers to the numerical width of any class in a particular
distribution.
• Numerical characteristics refer to quantitative phenomena which
can be measured in some statistical units.
• Data relating to income, production, age, weight, etc. come under
this category.
• For instance, persons whose incomes, say, are within USD 201 to
USD 400 can form one group, those whose incomes are within
USD 401 to USD 600 can form another group and so on.
• In this way the entire data may be divided into a number of groups
or classes or what are usually called, class-intervals.
1.1. Class Limits
• Each group or class-interval, thus, has an upper limit as well as a
lower limit, which are known as class limits.
1.2. Class Magnitude/Size
• The difference between the two class limits is known as class
magnitude or class size.
NOTE
• We may have classes with equal class magnitudes or with unequal
class magnitudes.
• The number of items which fall in a given class is known as the
frequency of the given class.
2. Frequency Distribution
• All the classes or groups, with their respective frequencies taken
together and put in the form of a table, are described as a grouped
frequency distribution or simply a frequency distribution.
NOTE
• For nominal and ordinal data, frequency distributions are often
used as a summary.
• Tables make it easier to see how the data are distributed.
EXAMPLE
• A study was conducted to assess the characteristics of a group of
234 smokers by collecting data on gender and other variables.
Gender      Frequency (f)  Relative Frequency
Male (1)    110            0.47
Female (2)  124            0.53
Total (N)   234            1
• $\text{Relative frequency} = \dfrac{\text{Frequency } (f)}{\text{Total } (N)}$
• Relative frequency should sum to 1.
• It can also be converted into a percentage by multiplying it by 100%.
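The relative frequencies in the gender example above can be checked with a few lines of Python. The function name `relative_frequencies` is an assumption for this sketch; the counts are those given in the table.

```python
def relative_frequencies(freqs):
    """Divide each class frequency f by the total number of observations N."""
    total = sum(freqs.values())
    return {label: f / total for label, f in freqs.items()}

gender = {"Male": 110, "Female": 124}   # frequencies from the example, N = 234
rel = relative_frequencies(gender)

for label, r in rel.items():
    print(f"{label}: {r:.2f} ({r:.0%})")
# Male: 0.47 (47%)
# Female: 0.53 (53%)
```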
3 Range
• The simplest measure of dispersion is the range, which is the
difference between the maximum value and the minimum value of
data.
How to Determine the Number of Class Intervals?
Classification according to class-intervals usually involves the following three main problems:
• How many classes should there be?
• What should be their magnitudes?
• How to determine the frequency of each class?
NOTE
• There can be no specific answer with regard to the number of
classes. The decision about this calls for skills and experience of
the researcher.
• With regard to the second part of the question, we can say that, to
the extent possible, class-intervals should be of equal magnitudes,
but in some cases unequal magnitudes may result in better
classification.
• Hence the researcher's objective judgement plays an important
part in this connection.
• Some statisticians adopt the following formula, suggested by H.A.
Sturges, for determining the size of the class interval:
$$i = \frac{R}{1 + 3.3 \log N}$$
where
• $i$ - size of class interval.
• $R$ - Range (i.e., difference between the values of the largest item
and smallest item among the given items).
• $N$ - Number of items to be grouped.
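Sturges' rule is commonly written as i = R / (1 + 3.3 log N), with the logarithm taken to base 10. A minimal Python sketch of the range and of this rule is given below; the function names and the values R = 29 and N = 40 are hypothetical, chosen only to show the arithmetic.

```python
import math

def data_range(values):
    """Range: largest item minus smallest item."""
    return max(values) - min(values)

def sturges_class_width(r, n):
    """Class-interval size i = R / (1 + 3.3 * log10(N))."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return r / (1 + 3.3 * math.log10(n))

# Hypothetical case: 40 items whose smallest value is 10 and largest is 39.
r = data_range([10, 39])
width = sturges_class_width(r, 40)
print(round(width, 2))      # 4.61
print(math.ceil(width))     # 5, a convenient rounded class size
```

The rule gives only a starting point; as noted above, the final choice of class size still rests on the researcher's judgement.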
Class limits  Class boundaries  Class midpoint  Frequency
10 – 14       9.5 – 14.5        12              5
15 – 19       14.5 – 19.5       17              11
20 – 24       19.5 – 24.5       22              12
25 – 29       24.5 – 29.5       27              7
30 – 34       29.5 – 34.5       32              3
35 – 39       34.5 – 39.5       37              2
Total                                           40
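Reading the four columns of the table above as class limits, class boundaries, class midpoint and frequency (an interpretation, since the extracted table carries no column headings), the second and third columns can be derived from the first. The sketch below assumes values recorded to the nearest whole unit, so each boundary lies half a unit beyond the limit; the function name `class_table` is hypothetical.

```python
def class_table(limits, unit=1):
    """Return (lower boundary, upper boundary, midpoint) per class."""
    half = unit / 2
    return [(lo - half, hi + half, (lo + hi) / 2) for lo, hi in limits]

limits = [(10, 14), (15, 19), (20, 24), (25, 29), (30, 34), (35, 39)]
freqs = [5, 11, 12, 7, 3, 2]            # frequencies from the table

for (lo, hi), (lb, ub, mid), f in zip(limits, class_table(limits), freqs):
    print(f"{lo}-{hi}  {lb}-{ub}  {mid:g}  {f}")
print("Total", sum(freqs))              # Total 40
```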
5. Guidelines for Constructing Tables
• Keep them simple.
• Limit the number of variables.
• All tables should be self-explanatory (Include clear title telling
what, when and where, clearly label the rows and columns, and
state clearly the unit of measurements used).
• Explain codes and abbreviations in the foot-note.
• If data is not original, indicate the source in foot-note.
TOPIC 5: GRAPHICAL PRESENTATION OF DATA