
Basic Statistics Notes 2

The document outlines a course on basic statistics, detailing its schedule, content, and key concepts such as descriptive and inferential statistics. It covers the meaning, scope, uses, and potential abuses of statistics, along with various statistical fields and types of variables. The course aims to provide a comprehensive understanding of statistical methods and their applications in different scientific domains.


ST 111: Basic Statistics

Tuesday 10:00 – 12:00, edu theatre 4


Thursday 12:00 – 14:00, edu theatre 4

By: Mr. Luvanda


Course content
1. Meaning and Scope of Statistics
2. Data Collection and Data Handling
3. Sampling Techniques
4. Classification and Tabulation of Data
5. Tabular and Graphical Presentation of Data
6. Summary Measures
7. Relationship between two Variables
Topic 1: Meaning and Scope of Statistics
1.1 Introduction
• At the outset, it may be noted that the word ‘statistics’ is
used rather curiously in two senses: plural and singular. In the
plural sense, statistics refers to a set of figures or data. In the
singular sense, statistics refers to the whole body of tools
that are used to collect data, organise and interpret them
and, finally, to draw conclusions from them.
Definitions:
• Statistics is concerned with scientific methods for
collecting, organising, summarising, presenting and
analyzing data as well as drawing valid conclusions and
making reasonable decisions on the basis of such analysis.
• Statistics can also be defined as the collection, presentation
and interpretation of numerical data (Croxton and Cowden).
• Statistics is a branch of mathematics used to summarize,
analyze, and interpret a group of numbers or observations.
1.2 Branches (types) of statistics
• There are two main branches (types) of statistics namely
descriptive statistics and inferential statistics.
(a) Descriptive statistics: deals with collecting,
summarizing, and simplifying data. They are statistics that
summarize observations.
• Descriptive statistics are procedures used to summarize,
organize, and make sense of a set of scores or
observations. Descriptive statistics are typically presented
graphically, in tabular form (in tables), or as summary
statistics (single values).
• Descriptive statistics are used to present quantitative
descriptions in a manageable form. For example, consider
the scourge of University of Dodoma students, the Grade
Point Average (GPA): a single number that describes
the general performance of a student across a potentially
wide range of course experiences.
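As a sketch of how one summary statistic condenses many observations, the snippet below computes a credit-weighted GPA. The grade-point scale (A = 5 down to F = 0), the `gpa` helper and the course list are hypothetical assumptions for illustration, not the University of Dodoma's official grading rules.

```python
# Hypothetical grade-point scale (an assumption, not an official policy).
GRADE_POINTS = {"A": 5, "B+": 4, "B": 3, "C": 2, "D": 1, "F": 0}


def gpa(results):
    """Return the credit-weighted mean grade point.

    results: list of (grade, credits) pairs, one per course.
    """
    total_credits = sum(credits for _, credits in results)
    if total_credits == 0:
        raise ValueError("no credits recorded")
    weighted = sum(GRADE_POINTS[grade] * credits for grade, credits in results)
    return weighted / total_credits


# Hypothetical semester: four courses with different credit weights.
semester = [("A", 12), ("B+", 10), ("B", 8), ("C", 10)]
print(round(gpa(semester), 2))  # 3.6
```

Forty credits of coursework are reduced to one number; that reduction is exactly what makes the GPA a descriptive statistic.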
(b) Inferential statistics: are procedures that allow
researchers to infer or generalize observations made with samples
to the larger population from which they were selected.
• It consists of methods that are used for drawing inferences, or
making broad generalizations, about a totality of observations
on the basis of knowledge about a part of that totality.
• The totality of observations about which an inference may be
drawn, or a generalization made, is called a population or a
universe.
• The part of totality, which is observed for data collection and
analysis to gain knowledge about the population, is called a
sample.
• Differences between descriptive and inferential statistics:
• Inferential statistics, also known as inductive statistics,
goes beyond describing a given problem situation by
means of collecting, summarizing, and meaningfully
presenting the related data.
• Note: inferential statistics are used to help the
researcher infer how well statistics in a sample reflect
parameters in a population.
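The distinction between a population parameter and a sample statistic can be sketched in a few lines. The population below (the numbers 1 to 1,000 standing in for student scores) and the sample size of 100 are hypothetical choices made only for illustration.

```python
import random
import statistics

# Hypothetical population: 1,000 student scores, here simply 1..1000.
population = list(range(1, 1001))
mu = statistics.mean(population)  # population parameter (500.5)

rng = random.Random(42)  # fixed seed so the run is repeatable
sample = rng.sample(population, 100)  # simple random sample, no replacement
x_bar = statistics.mean(sample)  # sample statistic used to estimate mu

print(f"parameter mu = {mu}, statistic x_bar = {x_bar:.1f}")
```

In practice only `sample` would be observed; inferential statistics asks how well `x_bar` reflects the unknown `mu`.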
1.3 Scope of Statistics (the range of subjects covered)
In this course we will specifically examine the scope of
statistics as applied in the natural, physical and social sciences.
• In the social sciences, statistical measurement is used
to study human behaviour in its social environment. Social
statistics is used to evaluate a subset of data obtained
about a group of people, or to observe and statistically
analyse a set of data that relates to people and their
behaviours. Example: evaluating the quality
of services available to a group or organization.
• In physical and natural sciences, modern methods of
statistical and computational analysis offer solutions to
dilemmas confronting researchers in these fields of
science. Example: the analysis of large multivariate data sets
requires modern statistical methods to extract the best
results from the study.
• Statistics and modern science: in medical science, the
statistical tools for collecting, presenting and analysing
observed facts relating to the causes and incidence of diseases,
and to the results of applying various drugs and medicines,
are of great importance.
1.4 Uses and Abuses of statistics
(A) Uses
• Statistics is used to present facts in a definite form.
• Statistics is used to simplify complex masses of data.
• Statistics is very widely used to make comparisons that
support accurate decisions.
• It compares the results of one year with those of other years,
helping to find the reasons for changes and to anticipate the
effects of such changes in future.
• Statistics is used in the formation of policies in most
organizations.
• Statistics widens individual experience and knowledge.
• Statistics is used to test the laws of other sciences.
• Statistics also helps in formulating and testing
hypotheses and developing new theories.
• Statistics is very commonly used to make predictions in
many business and non-business organizations.
(B) Abuses
Statistics are sometimes presented in ways that are designed to be
misleading. Some abuses are unintentional, but others are
deliberate.
i) Bad samples
A major source of deceptive statistics is the use of inappropriate
methods to collect data.
• Self-selected survey – (voluntary response sample) is one in which
the respondents themselves decide whether to be included. In
such surveys, people with strong opinions are more likely to
participate, so the obtained responses are not necessarily
representative of the whole population.
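A small calculation shows why self-selection distorts results. The group shares and response rates below are hypothetical numbers, and `respondent_share` is an illustrative helper, not a standard function: if 20% of students strongly oppose a policy and they are six times as likely to respond as everyone else, they make up about 60% of the respondents.

```python
def respondent_share(group_shares, response_rates, group):
    """Share of *respondents* who come from `group`.

    group_shares: fraction of the population in each group.
    response_rates: probability that a member of each group chooses to respond.
    """
    responders = {g: group_shares[g] * response_rates[g] for g in group_shares}
    return responders[group] / sum(responders.values())


# Hypothetical figures: 20% strongly oppose and 60% of them respond;
# the other 80% respond at only 10%.
shares = {"strongly_oppose": 0.20, "others": 0.80}
rates = {"strongly_oppose": 0.60, "others": 0.10}
print(respondent_share(shares, rates, "strongly_oppose"))  # about 0.6
```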
ii) Small samples
It can be very misleading to make broad conclusions or
inferences based on samples that are far too small.
Example: a Children’s Defense Fund publication on
children out of school in Tanzania reported that, among
secondary school students suspended in one region, 67%
were suspended at least three times. However, that statistic
was based on only three students, and the media reports
failed to mention that!
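To see how little three students can tell us, the sketch below computes an approximate 95% interval for the underlying proportion. The notes do not cover interval estimation; the Wilson score interval used here is one standard choice, swapped in only to illustrate the point, and the comparison sample of 300 students is hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# 2 of 3 students: the reported 67%.
low, high = wilson_interval(2, 3)
print(f"n = 3:   {low:.0%} to {high:.0%}")

# The same 67% from a (hypothetical) sample of 300 students.
low300, high300 = wilson_interval(200, 300)
print(f"n = 300: {low300:.0%} to {high300:.0%}")
```

With n = 3 the interval runs from roughly 21% to 94%, so the "67%" is compatible with almost any true rate; with n = 300 it narrows to about ten percentage points.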
iii) Loaded questions
Survey questions can be worded to elicit a desired response. A
“loaded” item uses emotionally charged words, that is, words that have a
strong negative or positive connotation.
Example: the University of Dodoma should stop wasting student
activities funds on elitist organizations such as fraternities and
sororities.
A. Strongly agree B. Agree C. Neutral D. Disagree E. Strongly
disagree
This is a loaded item because of the use of the phrases “wasting
funds” and “elitist organizations.” It would be improved by making
the wording more neutral.
iv) Misleading graphs
Many visual graphs (especially bar charts and pie charts)
exaggerate or hide the true meaning of the data. Using
different increments or not following the area principle is a
key way to exaggerate, so be careful when reading charts!

Figure 1: Uneven Bar Chart.
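One common way a bar chart exaggerates is by starting the value axis above zero. The sketch below, using hypothetical values of 50 and 52 and an illustrative `apparent_ratio` helper, compares how much taller the second bar looks with and without a truncated axis.

```python
def apparent_ratio(a, b, axis_start=0.0):
    """Ratio of drawn bar heights for values a and b when the axis starts at axis_start."""
    return (b - axis_start) / (a - axis_start)


# Hypothetical values: 50 vs 52, a 4% difference.
print(apparent_ratio(50, 52))                 # 1.04 with an honest axis starting at zero
print(apparent_ratio(50, 52, axis_start=48))  # 2.0: the second bar looks twice as tall
```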


v) Pictographs
Using pictures and three-dimensional objects as bars in
histograms and bar charts can be misleading because, again, they
may not follow the area principle.
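The arithmetic behind the area principle is simple. If a pictograph shows a doubled value by doubling every side of an icon, the area grows 4 times and a 3-D object's volume grows 8 times. The `perceived_scale` helper below is an illustrative name, not a library function.

```python
def perceived_scale(value_ratio, dimensions):
    """Factor by which an icon's drawn size grows when each of its
    `dimensions` sides is scaled by value_ratio.

    dimensions = 1: a plain bar (only the height grows)
    dimensions = 2: a flat picture (height and width grow, so area)
    dimensions = 3: a 3-D object (height, width and depth grow, so volume)
    """
    return value_ratio ** dimensions


# A quantity that doubles (value_ratio = 2):
for dims, label in [(1, "bar"), (2, "picture"), (3, "3-D object")]:
    print(f"{label}: looks {perceived_scale(2, dims)}x bigger")
```

Only the plain bar (dimensions = 1) keeps the drawn size proportional to the value.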
vi) Distorted percentages
Be careful with interpretation of percentages:
• An ad from Continental Airlines referring to lost baggage said:
“… an area where we’ve already improved 100% in the last
six months.” The New York Times took this to mean there was
no lost baggage, which Continental Airlines has never
achieved.
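Read literally as a percentage reduction, "improved 100%" can only mean the count fell to zero. The lost-bag counts below are hypothetical, chosen only to show the calculation.

```python
def percent_reduction(before, after):
    """Percentage reduction from `before` to `after` (positive = improvement)."""
    return (before - after) / before * 100


# Hypothetical lost-bag counts per 10,000 passengers.
print(percent_reduction(8, 4))  # 50.0: halving the losses is a 50% improvement
print(percent_reduction(8, 0))  # 100.0: a 100% reduction means no lost bags at all
```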
Importance of Statistics:
Statistics has many applications, including:
• In industries, especially quality control departments
• In environmental management, e.g. measuring the extent of
environmental pollution
• In business, e.g. planning of operations
• In health, e.g. development of certain drugs
Limitations of Statistics
• Statistics deals with quantitative information only.
• It deals with aggregates of facts, not with individual
data items.
• Statistical data are only approximately accurate (not 100% precise).
• Statistical data can easily be misused and should therefore
be used by experts.
1.5 Introduction to some statistical fields
(A) Descriptive statistics
The primary purpose of collecting data is to give meaning to
a statistical story, to uncover some new fact about our
world, and — last but certainly not least —to make a point,
no matter how outlandish. But what do we do when we
have too much data? One important purpose of statistics is
to describe large amounts of data in a way that is
understandable, useful, and, if need be, convincing. This is
called descriptive statistics.
(B) Probability theories
• The study of probability originates from the analysis
of certain games of chance, and it has found applications
in most branches of science and engineering. In this
chapter the basic concepts of probability theory are
presented.
• Probability is defined as the chance that an event will
happen or the likelihood that an event will happen.
(C) Sample space and events
A. Random Experiments:
• In the study of probability, any process of observation is
referred to as an experiment. The results of an observation
are called the outcomes of the experiment. An experiment
is called a random experiment if its outcome cannot be
predicted. Typical examples of a random experiment are
the roll of a die, the toss of a coin, drawing a card from a
deck, or selecting a message signal for transmission from
several messages.
B. Sample Space
• The set of all possible outcomes of a random experiment is
called the sample space (or universal set), and it is
denoted by S. An element in S is called a sample point. Each
outcome of a random experiment corresponds to a sample
point.

• EXAMPLE 1.1 Find the sample space for the experiment of


tossing a coin (a) once and (b) twice.
• (a) There are two possible outcomes, heads or tails. Thus
S = {H, T}
where H and T represent head and tail, respectively.
• (b) There are four possible outcomes. They are pairs of
heads and tails. Thus
S = {HH, HT, TH, TT}
• EXAMPLE 1.2 Find the sample space for the experiment of
tossing a coin repeatedly and of counting the number of
tosses required until the first head appears.
• Clearly all possible outcomes for this experiment are the
terms of the sequence 1, 2, 3… . Thus
S = {1, 2, 3, …………}
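Both examples can be reproduced by enumeration and simulation. In the sketch below, `coin_sample_space` lists the finite sample spaces of Example 1.1. Because the sample space of Example 1.2 is infinite, `tosses_until_first_head` instead simulates single runs of that experiment. Both helper names are illustrative.

```python
import itertools
import random


def coin_sample_space(tosses):
    """List every outcome of tossing a coin `tosses` times, e.g. 'HT'."""
    return ["".join(seq) for seq in itertools.product("HT", repeat=tosses)]


def tosses_until_first_head(rng):
    """Simulate one run of Example 1.2 and return the number of tosses."""
    count = 1
    while rng.choice("HT") != "H":
        count += 1
    return count


print(coin_sample_space(1))  # ['H', 'T']
print(coin_sample_space(2))  # ['HH', 'HT', 'TH', 'TT']

rng = random.Random(0)  # fixed seed so the run is repeatable
runs = [tosses_until_first_head(rng) for _ in range(10)]
print(runs)  # every value is a sample point of S = {1, 2, 3, ...}
```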
C. Events
• An event is a set of outcomes of an experiment to which a
probability is assigned. It is a subset of a sample space.
Example: when a single card is drawn from a standard deck of
52 cards (no jokers), the sample space is a 52-element set,
as each individual card is a possible outcome.
• An event, on the other hand, is any subset of the sample
space, including a single element, the empty set and the sample
space itself. Other events are proper subsets of the
sample space that contain multiple elements.
So, for example, potential events include:
• "Red and black at the same time without being a joker" (0
elements),
• "The 5 of hearts" (1 element),
• "A king" (4 elements),
• "A face card" (12 elements),
• "A spade" (13 elements),
• "A face card or a red suit" (32 elements),
• "A card" (52 elements).
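Because an event is just a subset of the sample space, the element counts listed above can be checked by building the 52-card deck and filtering it. The representation of a card as a (rank, suit) pair and the helper names are illustrative choices.

```python
import itertools

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUIT_COLOUR = {"hearts": "red", "diamonds": "red", "clubs": "black", "spades": "black"}

# Sample space: one (rank, suit) pair per card, 52 in all.
S = set(itertools.product(RANKS, SUIT_COLOUR))


def event(predicate):
    """An event is the subset of S whose cards satisfy predicate."""
    return {card for card in S if predicate(card)}


def is_face(card):
    return card[0] in {"J", "Q", "K"}


def is_red(card):
    return SUIT_COLOUR[card[1]] == "red"


events = {
    "red and black at the same time": event(lambda c: is_red(c) and not is_red(c)),
    "the 5 of hearts": event(lambda c: c == ("5", "hearts")),
    "a king": event(lambda c: c[0] == "K"),
    "a face card": event(is_face),
    "a spade": event(lambda c: c[1] == "spades"),
    "a face card or a red suit": event(lambda c: is_face(c) or is_red(c)),
    "a card": event(lambda c: True),
}
for name, subset in events.items():
    print(f"{name}: {len(subset)} elements")
```

The "face card or red suit" count follows the same logic as the list: 12 face cards plus 26 red cards, minus the 6 red face cards counted twice, gives 32.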
Types of events
• Independent events: these are two or more events for which the outcome
of one does not affect or depend on the other. They are events that are not
dependent on what occurred previously.
Example: each toss of a fair coin is an independent event.
• Dependent events: two events are said to be dependent if the occurrence
of one event affects the probability of the other event.
Example: drawing two cards from a deck without replacement; the chance
that the second card is a king depends on whether the first card was a king.
• Mutually exclusive events: two or more events are said to
be mutually exclusive if they cannot occur together.
• Two sets A and B are called disjoint or mutually exclusive if they contain no
common element.
• Examples:
i) in tossing a coin, the occurrences of a head and a tail are
mutually exclusive
ii) in rolling a die, the occurrences of 1 and 6 are
mutually exclusive
iii) in drawing a playing card, the occurrences of a heart and a black card are
mutually exclusive
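The three types of events can be checked by exact counting on a standard deck, using Python's `fractions.Fraction` to keep probabilities exact. The card representation and the `prob` helper are illustrative assumptions. The test for independence used here, P(A and B) = P(A) × P(B), is the standard product rule, which the notes do not state explicitly.

```python
import itertools
from fractions import Fraction

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = list(itertools.product(RANKS, SUITS))  # 52 (rank, suit) cards


def prob(event):
    """Classical probability of an event (a set of cards) for one draw."""
    return Fraction(len(event), len(DECK))


hearts = {c for c in DECK if c[1] == "hearts"}
black = {c for c in DECK if c[1] in ("clubs", "spades")}
kings = {c for c in DECK if c[0] == "K"}

# Mutually exclusive: no card is both a heart and black.
print(prob(hearts & black))  # 0

# Independent: P(king and heart) equals P(king) * P(heart) = 1/13 * 1/4.
print(prob(kings & hearts) == prob(kings) * prob(hearts))  # True

# Dependent: two ordered draws without replacement.
pairs = list(itertools.permutations(DECK, 2))
first_king = [p for p in pairs if p[0][0] == "K"]
first_other = [p for p in pairs if p[0][0] != "K"]
p_given_king = Fraction(sum(p[1][0] == "K" for p in first_king), len(first_king))
p_given_other = Fraction(sum(p[1][0] == "K" for p in first_other), len(first_other))
print(p_given_king, p_given_other)  # 1/17 4/51
```

The chance of a second king changes with the first draw (1/17 versus 4/51), which is what makes the two draws dependent.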
• Definition of probability
i) Classical definition
ii) Relative frequency definition
• The classical definition, given by James Bernoulli and
Pierre-Simon Laplace, states that the probability of an event is the
ratio of the number of cases favourable to it to the
number of all possible cases, when nothing leads us to
expect that any one of these cases should occur more
than any other.
• $P(A) = \dfrac{\text{Number of cases favourable to } A}{\text{Total number of possible cases}}$
Or
• If there are “n” total equally likely and mutually exclusive
possibilities and “m” of them are favourable to the
occurrence of an event A, then the probability of the event
A is defined as
• $P(A) = \dfrac{\text{Number of favourable cases}}{\text{Total number of cases}} = \dfrac{m}{n}$
• The relative frequency definition of probability states that
if an experiment is repeated a large number of times, say
“t”, under uniform conditions and the event A occurs “f”
times, then the probability of the event A is defined by the
relative frequency, $f/t$, and symbolically expressed as
• $P(A) = \lim_{t \to \infty} \dfrac{f}{t}$
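The two definitions can be compared for the event "a six" on a fair die. The classical value is m/n = 1/6. The relative frequency f/t is estimated here by simulation with Python's pseudo-random generator, which stands in for physical die rolls (an assumption of the sketch). As t grows, f/t settles near 1/6.

```python
import random
from fractions import Fraction

# Classical definition: m favourable cases out of n equally likely cases.
faces = [1, 2, 3, 4, 5, 6]
m = sum(1 for face in faces if face == 6)  # favourable cases for "a six"
n = len(faces)
classical = Fraction(m, n)  # 1/6

# Relative frequency definition: f occurrences of A in t repetitions.
rng = random.Random(2024)  # fixed seed so the run is repeatable
rel_freq = {}
for t in (100, 10_000, 1_000_000):
    f = sum(1 for _ in range(t) if rng.randint(1, 6) == 6)
    rel_freq[t] = f / t
    print(f"t = {t:>9,}: f/t = {rel_freq[t]:.4f} (classical {float(classical):.4f})")
```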

D. Axioms of probability:
• Let S be a finite sample space and A be an event in S. Then
in the axiomatic definition, the probability P(A) of the
event A is a real number assigned to A which satisfies the
following three axioms:
Axiom 1: $0 \le P(A) \le 1$
Axiom 2: $P(S) = 1$
Axiom 3: $P(A \cup B) = P(A) + P(B)$, if $A \cap B = \varnothing$
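For a finite sample space the axioms can be checked exhaustively. The sketch below assumes a fair die, assigns each event its classical probability, and tests all three axioms over every one of the 64 subsets of S. Axiom 3 is checked for every pair of disjoint events.

```python
from fractions import Fraction
from itertools import combinations

S = frozenset(range(1, 7))  # sample space of one roll of a fair die


def P(event):
    """Classical probability measure on S (each outcome has weight 1/6)."""
    return Fraction(len(event & S), len(S))


# Every event, i.e. every subset of S (2**6 = 64 of them).
events = [frozenset(c) for r in range(len(S) + 1) for c in combinations(S, r)]

axiom1 = all(0 <= P(A) <= 1 for A in events)
axiom2 = P(S) == 1
axiom3 = all(P(A | B) == P(A) + P(B) for A in events for B in events if not A & B)
print(axiom1, axiom2, axiom3)  # True True True
```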
(D) Biostatistics
• Biostatistics is the branch of applied statistics directed
toward applications in the health sciences and biology.
Biostatistics is sometimes distinguished from the field of
biometry based upon whether applications are in the
health sciences (bio-statistics) or in broader biology
(biometry: e.g. Agriculture, ecology, wildlife biology)
• Other branches of applied statistics; psychometrics,
econometrics, chemo-metrics, astro-statistics, environ-
metrics, etc.
Why biostatistics? What’s the difference?
• Because some statistical methods are more heavily used in
health applications than elsewhere, e.g. survival analysis and
longitudinal data analysis
• Because examples are drawn from health sciences
i) Makes subject more appealing to those interested in
health
ii) Illustrates how to apply methodology to similar
problems encountered in real life
Variables and Attributes
• A variable is any characteristic that varies or changes with the
members of a population.
• In statistics, a variable has two defining characteristics: A
variable is an attribute that describes a person, place, thing,
or idea. The value of the variable can "vary" from one entity
to another.
For example:
• The university students’ year of study is a potential
variable which could have the attributes/values of freshers
(first year students), sophomores (second year students) and
finalists (third year students).
Qualitative vs. Quantitative Variables: Variables can be classified
as qualitative (aka, categorical) or quantitative (aka, numeric).
• Qualitative variables: describe characteristics that cannot be
measured numerically. Examples; nationality, gender, hair color,
and so on take on values that are names or labels. Also the color
of a ball (e.g., red, green, blue) or the breed of a dog (e.g., collie,
shepherd, and terrier) would be examples of qualitative or
categorical variables.
• Quantitative variables: are numeric and they represent a
measurable quantity. For example, when we speak of the
population of a city, we are talking about the number of people in
the city - a measurable attribute of the city. Therefore, population
would be a quantitative variable.
Discrete vs. Continuous Variables
• Quantitative variables can be further classified
as discrete or continuous. If a variable can take on any value
between its minimum value and its maximum value, it is called a
continuous variable; otherwise, it is called a discrete variable.
• A discrete variable is measured in whole units or categories that
are not distributed along a continuum. For example, the number of
brothers and sisters you have and your family’s socioeconomic
class (working class, middle class, upper class) are examples of
discrete variables.
• A continuous variable is measured along a continuum and can
take values at any place beyond the decimal point. Continuous
variables can be measured in whole units or fractional units.
Example: height, weight, temperature in degrees Celsius.
Some examples will clarify the difference between discrete
and continuous variables.
• Suppose the fire department mandates that all fire fighters
must weigh between 150 and 250 pounds. The weight of a
fire fighter would be an example of a continuous variable;
since a fire fighter's weight could take on any value
between 150 and 250 pounds.
• Suppose we flip a coin repeatedly and count the number of heads. The
number of heads could be any integer value between 0 and
plus infinity. However, it could not be just any real number between
0 and plus infinity. We could not, for example, get 2.3
heads. Therefore, the number of heads must be a discrete
variable.
Independent v/s dependent variable
• An independent variable (IV) is the variable that is manipulated in
an experiment. Its levels are set by the researcher and do not
depend on the other variables being observed in the experiment
(hence “independent”). It is the “presumed cause.”
• The dependent variable (DV) is the variable that is believed to
change in the presence of the independent variable.
Attributes
• An attribute is a category of a variable. An attribute is a
characteristic of an object (person, thing). Attributes are closely
related to variables in the sense that a variable is a logical set of
attributes. Example: the attributes of the variable sex are male and
female.
TOPIC 2: Data Collection and Data Handling
• Data can be defined as a collected set of information, facts and
statistics for reference or analysis. Statistical data, on the other
hand, refer to those aspects of a problem situation that can be
measured, quantified, counted, or classified.
Any object, subject, phenomenon or activity that generates
data through this process is termed a variable.
In other words, a variable is one that shows a degree of variability
when successive measurements are recorded.
• Data (plural) are measurements or observations that are typically
numeric.
• A datum (singular) is a single measurement or observation, usually
referred to as a score or raw score.
Types and Sources of Data:
Types of Data
• In statistics, data are classified into two broad categories:
quantitative data and qualitative data. This classification is
based on the kind of characteristics that are measured.
• Quantitative data: is based on the measurement of
quantity or amount. It is applicable to phenomena that can
be expressed in terms of quantity.
• Qualitative data: is concerned with qualitative
phenomena, i.e. phenomena relating to or involving quality
or kind.
Sources of Data
• Data sources are of two types: primary and
secondary. The two can be defined as follows:
(i) Primary data: are those which are collected afresh and
for the first time from the primary source, and thus happen
to be original in character. They are those data which do not
already exist in any form, and thus have to be collected for
the first time from the primary source(s). By their very
nature, these data require fresh and first-time collection
covering the whole population or a sample drawn from it.
(ii) Secondary data: these already exist in some form,
published or unpublished, in an identifiable secondary
source. They are generally available from published
source(s), though not necessarily in the form actually
required. Hence secondary data may require some processing
(although not always) before being used for the
required purpose.
Assignment 2 (a) Give five sources of secondary data available in
Tanzania.
(b) Outline the data forms available in the sources given
at (a) above.
Univariate vs. Bivariate Data
Statistical data are often classified according to the number
of variables being studied.
• Univariate data. When we conduct a study that looks at
only one variable, we say that we are working with
univariate data. Suppose, for example, that we conducted a
survey to estimate the average weight of high school
students. Since we are only working with one variable
(weight), we would be working with univariate data.
• Bivariate data. When we conduct a study that examines
the relationship between two variables, we are working
with bivariate data. Suppose we conducted a study to see if
there were a relationship between the height and weight
of high school students. Since we are working with two
variables (height and weight), we would be working with
bivariate data.

• Data mining: The practice of examining large pre-existing


databases in order to generate new information.
Scales of Measurement
Scales of measurement refer to how the properties of numbers can change
with different uses.
Types of Scales of Measurement
(a) Nominal scales are measurements where a number is assigned to
represent something or someone.
• Numbers on a nominal scale identify something or someone; they provide
no additional information.
• Common examples of nominal numbers include ZIP codes, license plate
numbers, credit card numbers, country codes, telephone numbers, and
Social Security numbers. These numbers simply identify locations, vehicles,
or individuals and nothing more. One credit card number, for example, is
not greater than another; it is simply different.
NOTE: Nominal values represent something or someone. They often
reflect coded data in behavioural science.
Coding refers to the procedure of converting a nominal value to a
numeric value.
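Coding can be sketched as a simple lookup table. The region names and code numbers below are a hypothetical coding scheme. The point is that the resulting numbers only identify categories. Counting how often each category occurs is meaningful, but arithmetic such as averaging the codes is not.

```python
from collections import Counter

# Hypothetical coding scheme for a nominal variable (region of residence).
REGION_CODES = {"Dodoma": 1, "Arusha": 2, "Mwanza": 3, "Mbeya": 4}

responses = ["Mwanza", "Dodoma", "Dodoma", "Mbeya"]
coded = [REGION_CODES[region] for region in responses]
print(coded)  # [3, 1, 1, 4]

# The codes only identify categories: counting them is meaningful,
# but arithmetic such as averaging the codes is not.
print(Counter(responses).most_common(1))  # [('Dodoma', 2)]
```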
(b) Ordinal scales are measurements where values convey order or rank
alone.
• An ordinal scale of measurement is one that conveys order alone. This
scale indicates that some value is greater or less than another value.
• Examples of ordinal scales include finishing order in a competition,
education level, and rankings. These scales only indicate that one value is
greater or less than another, so differences between ranks do not have
meaning.
(c) Interval scales are measurements where the values have
no true zero and the distances between values are equal.
An interval scale can therefore be understood readily
through two defining principles: equidistant scales and
no true zero.
• Equidistant scales are those values whose intervals are
distributed in equal units.
• A true zero describes values where the value 0 truly
indicates nothing. Values on an interval scale do not have a
true zero.
(d) Ratio scales are measurements where a set of values
has a true zero and is equidistant.
• Ratio scales are similar to interval scales in that scores are
distributed in equal units. Yet, unlike interval scales, a
distribution of scores on a ratio scale has a true zero. This
is an ideal scale in behavioural research because any
mathematical operation can be performed on the values
that are measured. Common examples of ratio scales
include counts and measures of length, height, weight,
and time.
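Temperature illustrates the interval/ratio distinction. Celsius, like the Fahrenheit example in the table that follows, has no true zero. Kelvin, whose zero is absolute zero, is a ratio scale. The sketch below shows that ratios of Celsius readings are not meaningful, while differences are meaningful on both scales. `celsius_to_kelvin` is an illustrative helper.

```python
def celsius_to_kelvin(celsius):
    """Convert an interval-scale Celsius reading to the ratio-scale Kelvin."""
    return celsius + 273.15


a_c, b_c = 10.0, 20.0

# Dividing interval-scale values suggests a misleading "twice as hot".
print(b_c / a_c)  # 2.0
# On the Kelvin scale, whose zero is a true zero, the ratio is meaningful.
print(celsius_to_kelvin(b_c) / celsius_to_kelvin(a_c))  # about 1.035
# Differences are meaningful on both scales (10 degrees, up to float rounding).
print(b_c - a_c, celsius_to_kelvin(b_c) - celsius_to_kelvin(a_c))
```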
• Properties of scales of measurement

              Scale of Measurement
Property      Nominal   Ordinal   Interval   Ratio
Order         No        Yes       Yes        Yes
Difference    No        No        Yes        Yes
Ratio         No        No        No         Yes
• Examples of variables and their categorization as continuous or
discrete, quantitative or qualitative, and their respective scales of
measurement.

Variable                    Discrete/Continuous   Quantitative/Qualitative   Scale of Measurement
Gender                      Discrete              Qualitative                Nominal
Body type                   Discrete              Qualitative                Nominal
Time in seconds             Continuous            Quantitative               Ratio
Temperature (°F)            Continuous            Quantitative               Interval
Letter grade (A, B, C, D)   Discrete              Qualitative                Ordinal
Aspects to be considered for collection of data
There are various methods of data collection. As such the
researcher must judiciously select the method/methods
for his own study, keeping in view the following factors:
1. Nature, scope and object of enquiry
• This constitutes the most important factor affecting the
choice of a particular method.
• The method selected should be such that it suits the
type of enquiry that is to be conducted by the
researcher.
• This factor is also important in deciding whether the data
already available (secondary data) are to be used or the
data not yet available (primary data) are to be collected.
2. Availability of funds
• Availability of funds for the research project determines to
a large extent the method to be used for the
collection of data.
• When funds at the disposal of the researcher are very
limited, he will have to select a comparatively cheaper
method which may not be as efficient and effective as
some other costly method.
• Finance, in fact, is a big constraint in practice and the
researcher has to act within this limitation.
3. Time factor
• Availability of time has also to be taken into account in
deciding a particular method of data collection.
• Some methods take relatively more time, whereas with
others the data can be collected in a comparatively shorter
duration.
• The time at the disposal of the researcher, thus, affects the
selection of the method by which the data are to be
collected.
4. Precision required
• Precision required is yet another important factor to be
considered at the time of selecting the method of
collection of data.

Methods of Data Collection


• There are several methods of collecting primary data,
particularly in surveys and descriptive researches.
• Some of them are observation method, interview method
and through questionnaires.
1. Observation method
• The observation method is the most commonly used
method, especially in studies relating to the behavioural
sciences.
• In a way we all observe things around us, but this sort of
observation is not scientific observation.
• Observation becomes a scientific tool and the method of
data collection for the researcher when it serves a
formulated research purpose, that is systematically
planned and recorded and is subjected to checks and
controls on validity and reliability.
• Under the observation method, the information is sought
by way of the investigator’s own direct observation, without
asking the respondent.
Advantages
• The main advantage of this method is that subjective bias
is eliminated, if observation is done accurately.
• Secondly, the information obtained under this method
relates to what is currently happening. It is not
complicated by either the past behaviour or future
intentions or attitudes.
• Thirdly, this method is independent of respondents’
willingness to respond and as such is relatively less
demanding of active cooperation on the part of
respondents than the interview or the questionnaire
method.
• This method is particularly suitable in studies which deal
with subjects (i.e., respondents) who are not capable of
giving verbal reports of their feelings for one reason or
the other.
Limitations
However, observation method has various limitations.
• Firstly, it is an expensive method.
• Secondly, the information provided by this method is very
limited.
• Thirdly, sometimes unforeseen factors may interfere with
the observational task.
• At times, the fact that some people are rarely accessible to
direct observation creates an obstacle to collecting data
effectively by this method.
2. Interview Method
• The interview method of collecting data involves
presentation of oral-verbal stimuli and reply in terms of
oral-verbal responses.
• This method can be used through personal interviews
and, if possible, through telephone interviews.
2.1. Personal interviews
• The personal interview method requires a person known as
the interviewer to ask questions, generally in face-to-face
contact with the other person or persons.
• This sort of interview may be in the form of direct
personal investigation or it may be indirect oral
investigation.
Direct personal investigation
• In the case of direct personal investigation the interviewer
has to collect the information personally from the sources
concerned.
• He has to be on the spot and has to meet people from
whom data have to be collected.
• This method is particularly suitable for intensive
investigations.
Indirect oral investigation
• But in certain cases it may not be possible or worthwhile to
contact directly the persons concerned or on account of the
extensive scope of enquiry, the direct personal investigation
technique may not be used.
• In such cases an indirect oral examination can be conducted under
which the interviewer has to cross-examine other persons who are
supposed to have knowledge about the problem under
investigation, and the information obtained is recorded.
• Most of the commissions and committees appointed by
government to carry on investigations make use of this method.
Advantages
• More information and that too in greater depth can be obtained.
• Interviewer by his own skill can overcome the resistance, if any, of
the respondents; the interview method can be made to yield an
almost perfect sample of the general population.
• There is greater flexibility under this method, as the opportunity
to restructure questions is always there, especially in the case of
unstructured interviews.
• Observation method can as well be applied to recording verbal
answers to various questions.
• Personal information can as well be obtained easily under this
method.
• Samples can be controlled more effectively as there arises no
difficulty of the missing returns; non-response generally remains
very low.
• The interviewer can usually control which person(s) will answer
the questions. This is not possible in mailed questionnaire
approach. If so desired, group discussions may also be held.
• The language of the interview can be adapted to the ability or
educational level of the person interviewed and as such
misinterpretations concerning questions can be avoided.
• The interviewer can collect supplementary information about the
respondent’s personal characteristics and environment, which is
often of great value in interpreting results.
Weaknesses
• It is a very expensive method, especially when a large and widely
spread geographical sample is taken.
• There is the possibility of bias on the part of the interviewer as well
as the respondent, and the headache of supervising and controlling
interviewers.
• Certain types of respondents such as important officials or
executives or people in high income groups may not be easily
approachable under this method and to that extent the data may
prove inadequate.
• This method is relatively more time-consuming, especially when
the sample is large and re-calls upon the respondents are
necessary.
• The presence of the interviewer on the spot may over-stimulate
the respondent, sometimes even to the extent that he may give
imaginary information just to make the interview interesting.
• Under the interview method, the organization required for
selecting, training and supervising the field staff is more complex
and faces formidable problems.
• Effective interview presupposes proper rapport with respondents
that would facilitate free and frank responses. This is often a very
difficult requirement.
2.2. Telephone interviews
• This method of collecting information consists of contacting
respondents by telephone.
• It is not a very widely used method, but plays important part in
industrial surveys, particularly in developed regions.
Advantages
• It is more flexible than the mailing method.
• It is faster than other methods i.e., a quick way of obtaining
information.
• It is cheaper than personal interviewing method; here the cost per
response is relatively low.
• Recall is easy; call-backs are simple and economical.
• The response rate is higher than with the mailing method;
non-response is generally very low.
• Replies can be recorded without causing embarrassment to
respondents.
• Interviewer can explain requirements more easily.
• No field staff is required.
• Representative and wider distribution of sample is possible.
Disadvantages
• Little time is given to respondents for considered answers;
interview period is not likely to exceed five minutes in most cases.
• Surveys are restricted to respondents who have telephone
facilities.
• Extensive geographical coverage may get restricted by cost
considerations.
• It is not suitable for intensive surveys where comprehensive
answers are required to various questions.
• Possibility of the bias of the interviewer is relatively more.
• Questions have to be short and to the point; probes are difficult
to handle.
3. Questionnaire method
• A questionnaire consists of a number of questions printed or
typed in a definite order on a form or set of forms.
• The questionnaire is mailed to respondents who are expected to
read and understand the questions and write down the reply in
the space meant for the purpose in the questionnaire itself.
• The respondents have to answer the questions on their own.
• This method of data collection is quite popular, particularly in case
of big enquiries.
• It is used by private individuals, research workers, private and
public organizations and even by governments.
Advantages
• Costs are low even when the universe is large and widely spread
geographically.
• It is free from the bias of the interviewer; answers are in the
respondents' own words.
• Respondents have adequate time to give well thought out
answers.
• Respondents, who are not easily approachable, can also be
reached conveniently.
• Large samples can be made use of and thus the results can be
made more dependable and reliable.
Disadvantages
• The rate of return of duly filled-in questionnaires is low.
• It can be used only when respondents are educated and
cooperative.
• Control over the questionnaire may be lost once it is sent.
• There is inbuilt inflexibility because of the difficulty of amending
the approach once questionnaires have been dispatched.
TOPIC 3: SAMPLING
1 Introduction
• Sampling may be defined as the selection of some part of an
aggregate on the basis of which an inference about the aggregate is
made.
• In other words, it is the process of obtaining information about an
entire population by examining only a part of it.
• In most of the research work and surveys, the usual approach
happens to be to make generalizations or to draw inferences based
on samples about the parameters of population from which the
samples are taken.
• All this is done on the assumption that the sample data will
enable us to estimate the population parameters.
• The items so selected constitute what is technically called a
sample, their selection process or technique is called sample
design and the survey conducted on the basis of sample is
described as sample survey.
2. Need (Reasons) for Sampling
• Sampling can save time and money. A sample study is usually less
expensive than a census study and produces results at a relatively
faster speed.
• Sampling may enable more accurate measurements, since a sample
study is generally conducted by trained and experienced
investigators.
• Sampling remains the only way when the population contains
infinitely many members.
• Sampling remains the only choice when a test involves the
destruction of the item under study.
• Sampling usually enables us to estimate the sampling errors and
thus assists in obtaining information concerning some
characteristic of the population.

3 Some Fundamental Definitions


• Before we talk about details and uses of sampling, it seems
appropriate that we should be familiar with some fundamental
definitions concerning sampling concepts and principles.
3.1 Sampling frame
• The elementary units, or groups or clusters of such units, may
form the basis of the sampling process, in which case they are
called sampling units.
• A list containing all such sampling units is known as sampling
frame.
• Thus sampling frame consists of a list of items from which the
sample is to be drawn.
3.2 Sampling design
• A sample design is a definite plan for obtaining a sample from the
sampling frame.
• It refers to the technique or procedure the researcher would
adopt in selecting some sampling units from which inferences
about the population are drawn.
3.3 Sampling error
• Sample surveys involve the study of only a small portion of the
population, and as such there will naturally be a certain amount
of inaccuracy in the information collected.
• This inaccuracy may be termed sampling error.
• In other words, sampling errors are the deviations of sample
estimates from the true population values that arise because only
a sample, not the whole population, is examined.
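A small simulation can make the idea of sampling error concrete. The Python sketch below is illustrative only: it assumes a hypothetical population made up of the numbers 1 to 100, uses the sample mean as the estimate, and the function name `sampling_error` is invented for this example. It draws one simple random sample and reports how far the sample mean falls from the population mean.

```python
import random
import statistics


def sampling_error(population, n, seed=None):
    """Return (sample mean - population mean) for one simple random sample of size n."""
    rng = random.Random(seed)              # seeded generator so results can be reproduced
    sample = rng.sample(population, n)     # draw n items without replacement
    return statistics.mean(sample) - statistics.mean(population)


# Hypothetical population of 100 measurements.
population = list(range(1, 101))

for n in (5, 20, 80):
    print(n, round(sampling_error(population, n, seed=1), 2))

# Examining the whole population (a census) leaves no sampling error.
print(sampling_error(population, len(population), seed=1))
```

Re-running with different seeds shows that the error changes from sample to sample; it typically shrinks as the sample size grows.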
Types of Sampling
There are two main types of sampling, namely probability sampling
and non-probability sampling.
i) Probability sampling
• Every item has a known, non-zero chance of being included in the
sample.
• The assumption is that there is an even distribution of
characteristics within the population. This is what makes a
researcher believe that any sample would be representative and,
because of that, that the results will be accurate.
• Randomization is a feature of selection process rather than an
assumption about the structure of the population.
Types of probability sampling
• Simple random sampling
• Systematic sampling
• Stratified sampling
• Cluster sampling
• Multi-stage sampling
1 Simple random sampling
• This type of sampling is also known as chance or probability
sampling, in which each and every item in the population has an
equal chance of inclusion in the sample.
• In addition, each of the possible samples, in the case of a finite
universe, has the same probability of being selected.
Example
Statement of Problem
• Suppose that we have a population of 85 college students and want to form
a simple random sample of size 11 to survey about some issues on campus.
We begin by assigning numbers to each of our students. Since there are a
total of 85 students, and 85 is a two digit number, every individual in the
population is assigned a two digit number beginning 01, 02, 03, . . . 83, 84,
85.
Use of the Table
• We will use a table of random numbers to determine which of the 85
students should be chosen in our sample. We blindly start at any place in
our table and write the random digits in groups of two. Beginning at the fifth
digit of the first line we have:
23 44 92 72 75 19 82 88 29 39 81 82 88
• The first eleven distinct numbers in the range from 01 to 85 are selected
from the list. In this first line, the numbers that qualify are:
23, 44, 72, 75, 19, 82, 29, 39, 81
• At this point there are a few things to note about this particular example of
the process of selecting a simple random sample. The number 92 was
omitted because this number is greater than the total number of students in
our population. We omit the final two numbers in the list, 82 and 88. This is
because 82 is already included in our sample and 88 is out of the range. We
only have nine individuals in our sample. To obtain another subject it is
necessary to continue to the next row of the table. This line begins:
29 39 81 82 86 04 21
• The numbers 29, 39, 81 and 82 have already been included in our sample,
and 86 is out of range. So the first two-digit numbers that fit in our range
and do not repeat a number already selected are 04 and 21.
Conclusion of the Problem
• The final step is to contact students who have been identified with the
following numbers:
23, 44, 72, 75, 19, 82, 29, 39, 81, 04, 21
• A well-constructed survey can be administered to this group of students
and the results tabulated.
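The table-reading rule used in this example (read two-digit groups in order, skip numbers outside 01 to 85, skip repeats, stop at eleven) can be written as a short Python sketch. The function name `select_from_table` is hypothetical, and the table rows are simply the two lines of digits quoted in the example, already split into pairs.

```python
def select_from_table(rows, population_size, sample_size):
    """Pick a simple random sample using two-digit groups read from a random number table.

    Numbers outside 1..population_size, and numbers already chosen, are skipped.
    """
    chosen = []
    for row in rows:
        for group in row.split():
            number = int(group)                      # "04" becomes 4
            if 1 <= number <= population_size and number not in chosen:
                chosen.append(number)
                if len(chosen) == sample_size:
                    return chosen
    raise ValueError("ran out of table rows before the sample was complete")


# The two lines of random digits used in the example, already split into pairs.
table_rows = [
    "23 44 92 72 75 19 82 88 29 39 81 82 88",
    "29 39 81 82 86 04 21",
]

print(select_from_table(table_rows, population_size=85, sample_size=11))
# → [23, 44, 72, 75, 19, 82, 29, 39, 81, 4, 21]
```

The result matches the student numbers identified by hand.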
NOTE
• It is easy to draw random samples from finite populations with the aid of
random number tables only when lists are available and items are readily
numbered.
• But in some situations it is often impossible to proceed in the way we have
narrated above. For example, if we want to estimate the mean height of
trees in a forest, it would not be possible to number the trees, and choose
random numbers to select a random sample.
• In such situations what we should do is to select some trees for
the sample haphazardly without aim or purpose, and should treat
the sample as a random sample for study purposes.
• In the case of an infinite population, the selection of each item in
a random sample is controlled by the same probability, and
successive selections are independent of one another.

2. Systematic Sampling
• In some instances the most practical way of sampling is to select
every 15th name on a list, every 10th house on one side of a street
and so on.
• Sampling of this type is known as systematic sampling.
• An element of randomness is usually introduced into this kind
of sampling by using random numbers to pick the unit with
which to start.
• This procedure is useful when the sampling frame is available in
the form of a list.
• In such a design the selection process starts by picking some
random point in the list, and then every nth element is selected.
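A minimal Python sketch of systematic sampling is given below. It assumes the sampling interval is k = N // n (list size divided by sample size) and that the random start is chosen within the first interval; the function name `systematic_sample` and the list of 100 houses are hypothetical.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Select n units: a random start within the first interval, then every k-th unit."""
    k = len(frame) // n                       # sampling interval, e.g. 100 // 10 = 10
    if k < 1:
        raise ValueError("sample size cannot exceed the size of the frame")
    start = random.Random(seed).randrange(k)  # random starting position 0..k-1
    return frame[start::k][:n]                # every k-th unit from the start


houses = [f"House {i}" for i in range(1, 101)]   # hypothetical list of 100 houses
print(systematic_sample(houses, 10, seed=4))
```

Only the starting point is random; once it is fixed, the rest of the sample is determined by the interval k.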
3 Stratified Sampling
• If the population from which a sample is to be drawn does not constitute a
homogeneous group, then stratified sampling technique is applied so as to
obtain a representative sample.
• In this technique, the population is divided into a number of non-
overlapping subpopulations or strata, and sample items are selected from
each stratum. Example: the regions of Tanzania are natural strata; the
characteristics of Dar es Salaam are quite different from those of Dodoma.
• If the items from each stratum are selected by simple random sampling,
the entire procedure, first stratification and then simple random sampling, is
known as stratified random sampling.
• In broader terms, stratified sampling consists of the following
steps:
(a) The entire population of sampling units is divided into distinct
subpopulations, called strata.
(b) Within each stratum a separate sample is selected from all the
sampling units composing that stratum.
(c) From the sample obtained in each stratum, observations are
made and several statistics are calculated e.g. sample mean.
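Steps (a) to (c) can be sketched in Python as follows. This is an illustration under stated assumptions: the strata and their values (household sizes for Dar es Salaam and Dodoma) are hypothetical, and the number of units taken from each stratum is simply supplied by the user, since the notes do not fix an allocation rule. The function name `stratified_random_sample` is invented for the example.

```python
import random
import statistics


def stratified_random_sample(strata, sizes, seed=None):
    """Draw a separate simple random sample of sizes[name] units from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}


# (a) Hypothetical household sizes, already divided into two regions (the strata).
strata = {
    "Dar es Salaam": [3, 4, 2, 5, 6, 3, 4, 2, 3, 5, 4, 3],
    "Dodoma": [6, 7, 5, 8, 6, 7, 5, 6],
}

# (b) A separate random sample within each stratum.
sample = stratified_random_sample(strata, {"Dar es Salaam": 4, "Dodoma": 3}, seed=2)

# (c) A statistic, here the sample mean, computed within each stratum.
for region, values in sample.items():
    print(region, values, statistics.mean(values))
```

A common choice, not prescribed by the notes, is proportional allocation, where each stratum's sample size is proportional to its share of the population.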
4 Cluster Sampling
• Cluster sampling involves grouping the population and then
selecting whole groups (clusters), rather than individual elements,
for inclusion in the sample.
• Example
• Suppose some departmental store wishes to sample its credit card
holders. It has issued its cards to 15,000 customers.
• The sample size is to be kept at, say, 450. For cluster sampling this list
of 15,000 card holders could be formed into 100 clusters of 150
card holders each (called blocks).
• Three clusters might then be selected for the sample randomly.
• The sample size must often be larger than that of a simple random
sample to ensure the same level of accuracy, because in cluster
sampling the potential for order bias and other sources of
error is usually accentuated.
• The clustering approach can, however, make the sampling
procedure relatively easier and increase the efficiency of field
work, especially in the case of personal interviews.
• In cluster sampling, cluster, i.e., a group of population elements,
constitutes the sampling unit, instead of a single element of the
population.
• Cluster elements
• Elements within a cluster should ideally be as heterogeneous as
possible, but there should be homogeneity between
cluster means.
• Each cluster should be a small scale representation of the total
population.
• The clusters should be mutually exclusive and collectively
exhaustive.
• In cluster sampling, all elements of the selected clusters are
studied; there is no further sampling within a cluster.
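The credit card example can be sketched in Python. The sketch assumes, for illustration, that the 100 blocks are formed from consecutive positions in the list of card holders, and the card numbers 1 to 15,000 and the function name `cluster_sample` are hypothetical.

```python
import random


def cluster_sample(frame, cluster_size, n_clusters, seed=None):
    """Split the frame into blocks of cluster_size and keep every unit of n_clusters random blocks."""
    clusters = [frame[i:i + cluster_size] for i in range(0, len(frame), cluster_size)]
    chosen = random.Random(seed).sample(clusters, n_clusters)   # random choice of whole blocks
    return [unit for cluster in chosen for unit in cluster]      # no sampling inside a block


card_holders = list(range(1, 15_001))   # hypothetical card numbers 1..15,000
sample = cluster_sample(card_holders, cluster_size=150, n_clusters=3, seed=5)
print(len(sample))   # → 450
```

The random choice is made only among blocks; every card holder in a chosen block enters the sample.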
5 Multi-stage Sampling
• This is a further development of the principle of cluster sampling.
Example
• Suppose we want to investigate the working efficiency of
nationalized banks in India and we want to take a sample of few
banks for this purpose.
• The first stage is to select large primary sampling units, such as states
in a country.
• Then we may select certain districts and interview all banks in the
chosen districts.
• Thus, this would represent a two-stage sampling design with the
ultimate sampling units being clusters of districts.
• If, instead of taking a census of all banks within the selected
districts, we select certain towns and interview all banks in the
chosen towns, this would represent a three-stage sampling design.
• If instead of taking a census of all banks within the selected towns,
we randomly sample banks from each selected town, then it is a
case of using a four-stage sampling plan.
• Therefore, if we select randomly at all stages, we will have what is
known as a "multi-stage random sampling design".
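A four-stage random design (states, then districts, then towns, then banks) can be sketched in Python as a recursive selection over a nested frame. The frame below, with its state, district, town and bank names, is entirely hypothetical, as is the function name `multistage_sample`; the number drawn at each stage is capped at the number of units available.

```python
import random


def multistage_sample(frame, sizes, rng):
    """Sample at every stage of a nested frame, outermost stage first.

    frame: nested dicts (state -> district -> town) ending in lists of banks.
    sizes: number of units to draw at each stage, e.g. [states, districts, towns, banks].
    """
    k = min(sizes[0], len(frame))
    if isinstance(frame, list):                 # last stage: sample the banks themselves
        return rng.sample(frame, k)
    result = []
    for name in rng.sample(sorted(frame), k):   # random selection at this stage
        result.extend(multistage_sample(frame[name], sizes[1:], rng))
    return result


frame = {
    "State A": {
        "District A1": {"Town A1a": ["Bank 1", "Bank 2", "Bank 3"], "Town A1b": ["Bank 4", "Bank 5"]},
        "District A2": {"Town A2a": ["Bank 6", "Bank 7"]},
    },
    "State B": {
        "District B1": {"Town B1a": ["Bank 8", "Bank 9"], "Town B1b": ["Bank 10"]},
    },
}

sample = multistage_sample(frame, [2, 1, 1, 2], random.Random(6))
print(sample)
```

Replacing the last stage's random draw with "take every bank" would turn this into the three-stage design with a census of banks in the chosen towns.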
ii) Non-probability sampling
• The elements are chosen arbitrarily, so there is no way to estimate
the probability of any one element being included in the sample.
• Also, no assurance is given that each item has a chance of being
included, making it impossible either to estimate sampling
variability or to identify possible bias.
• In straightforward terms, a sampling method is a non-probabilistic
sampling method if it is not a probability-based sampling method.
Some types of non-probability sampling are:
• Convenience sampling
It is a method of drawing data by selecting people because they
volunteer readily, or selecting units because they are available or
easy to access (e.g., recruiting patients as they arrive at a medical
facility for otherwise scheduled appointments).
• Purposive sampling
Also commonly called judgmental sampling, it is a method in which
selection is based on knowledge of the population and the purpose
of the study. The subjects are selected because of some characteristic.
• Snowball sampling (chain sampling, chain-referral
sampling, referral sampling)
It is a non-probability sampling technique in which existing study
subjects recruit future subjects from among their acquaintances.
Researchers use this sampling method if the sample for the study is
very rare or is limited to a very small subgroup of the population.
After observing the initial subject, the researcher asks for assistance
from the subject to help identify people with a similar trait of
interest.
• Quota sampling
It is a non-probability sampling technique in which the
assembled sample has the same proportions of individuals as the
entire population with respect to known characteristics, traits or
focused phenomena.
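Quota sampling can be sketched in Python under assumptions that go beyond the notes: quotas are set in proportion to known population counts (rounded down), and within each group people are simply accepted in the order they turn up, which is what keeps the method non-probabilistic. The population counts, names and the function `quota_sample` are all hypothetical.

```python
def quota_sample(arrivals, population_counts, n):
    """Fill quotas proportional to population_counts from people in order of arrival."""
    total = sum(population_counts.values())
    # Quota for each group: its share of the population times n (rounded down).
    quotas = {g: n * count // total for g, count in population_counts.items()}
    filled = {g: [] for g in quotas}
    for person, group in arrivals:
        if group in filled and len(filled[group]) < quotas[group]:
            filled[group].append(person)      # accepted: quota not yet met
        if all(len(filled[g]) == quotas[g] for g in quotas):
            break                             # every quota is met
    return filled


population_counts = {"female": 300, "male": 200}   # hypothetical known population make-up
arrivals = [("Asha", "female"), ("Baraka", "male"), ("Neema", "female"),
            ("Juma", "male"), ("Zawadi", "female"), ("Halima", "female"), ("Musa", "male")]
print(quota_sample(arrivals, population_counts, n=5))
# → {'female': ['Asha', 'Neema', 'Zawadi'], 'male': ['Baraka', 'Juma']}
```

The sample matches the population's 3:2 split, yet who gets in depends only on arrival order, so no inclusion probabilities can be stated.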
Drawbacks of non-probability sampling
• Reliability cannot be measured in non-probability sampling; the
only way to address data quality is to compare some of the survey
results with available information about the population.
• Still, there is no assurance that the estimates will meet an
acceptable level of error.
Advantages of non-probability sampling
• Despite these drawbacks, non-probability sampling methods can
be useful when descriptive comments about the sample itself are
desired.
• Secondly, they are quick, inexpensive and convenient.
• There are also other circumstances, such as in applied social
research, when it is unfeasible or impractical to conduct
probability sampling.
• Statistics Canada uses probability sampling for almost all of its
surveys, but uses non-probability sampling for questionnaire
testing and some preliminary studies during the development
stage of a survey.
• Computer generated random numbers: Using Microsoft Excel to generate
random numbers.
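As a comparable sketch in Python (an alternative to the Excel approach mentioned, not what the notes use), the standard library's `random.sample` draws numbers without replacement, so repeats and out-of-range values never need to be skipped the way they are with a printed table. The function name `random_sample_numbers` is hypothetical; the 85 students and sample of 11 come from the simple random sampling example.

```python
import random


def random_sample_numbers(population_size, sample_size, seed=None):
    """Draw sample_size distinct ID numbers from 1..population_size, all equally likely."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, population_size + 1), sample_size))


# The 85-student, sample-of-11 setting from the simple random sampling example.
print(random_sample_numbers(85, 11, seed=2024))
```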
TOPIC 4: CLASSIFICATION AND TABULATION OF DATA
Discussion:
• Frequency distributions
• Range and class intervals
• Appropriate choice of class intervals
• Open classes at ends
• Guidelines for constructing tables
1 Class-Intervals
• It refers to the numerical width of any class in a particular
distribution.
• Numerical characteristics refer to quantitative phenomena which
can be measured through some statistical units.
• Data relating to income, production, age, weight, etc. come under
this category.
• For instance, persons whose incomes, say, are within USD 201 to
USD 400 can form one group, those whose incomes are within
USD 401 to USD 600 can form another group and so on.
• In this way the entire data may be divided into a number of groups
or classes or what are usually called, class-intervals.
1.1. Class Limits
• Each class-interval thus has an upper limit as well as a lower
limit, which are known as class limits.
1.2. Class Magnitude/Size
• The difference between the two class limits is known as class
magnitude or class size.
NOTE
• We may have classes with equal class magnitudes or with unequal
class magnitudes.
• The number of items which fall in a given class is known as the
frequency of the given class.
2. Frequency Distribution
• All the classes or groups, with their respective frequencies taken
together and put in the form of a table, are described as a grouped
frequency distribution or simply a frequency distribution.
NOTE
• For nominal and ordinal data, frequency distributions are often
used as a summary.
• Tables make it easier to see how the data are distributed.
EXAMPLE
• A study was conducted to assess the characteristics of a group of
234 smokers by collecting data on gender and other variables.
Gender        Frequency (f)   Relative Frequency
Male (1)      110             0.47
Female (2)    124             0.53
Total (N)     234             1
• Relative frequency = $\dfrac{f}{N}$, where f is the frequency of a category and N is the total number of observations.
• Relative frequencies should sum to 1.
• They can also be converted into percentages by multiplying by 100.
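The relative frequency calculation for the smoker example can be checked with a short Python sketch. The function name `relative_frequencies` is hypothetical; the counts 110 and 124 are the ones in the table.

```python
def relative_frequencies(freqs):
    """Divide each class frequency f by the total N."""
    total = sum(freqs.values())
    return {label: f / total for label, f in freqs.items()}


gender_counts = {"Male": 110, "Female": 124}   # frequencies from the smoker example
for label, rf in relative_frequencies(gender_counts).items():
    print(f"{label}: {rf:.2f} ({rf * 100:.1f}%)")
# → Male: 0.47 (47.0%)
# → Female: 0.53 (53.0%)
```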

3 Range
• The simplest measure of dispersion is the range, which is the
difference between the maximum value and the minimum value of
data.
How to Determine the Number of Class Intervals?
Determining class intervals usually involves the following three main problems:
• How many classes should there be?
• What should be their magnitudes?
• How to determine the frequency of each class?
NOTE
• There can be no specific answer with regard to the number of
classes. The decision about this calls for skills and experience of
the researcher.
• With regard to the second part of the question, we can say that, to
the extent possible, class-intervals should be of equal magnitudes,
but in some cases unequal magnitudes may result in better
classification.
• Hence the researcher's objective judgement plays an important
part in this connection.
• Some statisticians adopt the following formula, suggested by H.A.
Sturges, for determining the size of class interval:
$$ i = \frac{R}{1 + 3.3 \log_{10} N} $$
where
• i - size of class interval.
• R - Range (i.e., difference between the values of the largest item
and smallest item among the given items).
• N - Number of items to be grouped.
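The range and Sturges' formula can be evaluated with a Python sketch. The data (the 100 values 1 to 100, so R = 99 and N = 100) are hypothetical, and the function names `data_range` and `sturges_class_width` are invented for illustration.

```python
import math


def data_range(values):
    """Range R: largest value minus smallest value."""
    return max(values) - min(values)


def sturges_class_width(values):
    """Class-interval size i = R / (1 + 3.3 * log10(N))."""
    n = len(values)
    return data_range(values) / (1 + 3.3 * math.log10(n))


# Hypothetical data: 100 items running from 1 to 100, so R = 99 and N = 100.
values = list(range(1, 101))
print(data_range(values))                      # → 99
print(round(sturges_class_width(values), 2))   # → 13.03
```

Here the formula gives 99 / 7.6; in practice the result is usually rounded to a convenient width before the classes are drawn up.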

• With regard to the third part of the question, determination of the
frequency of each class can be done either by tally sheets or by
mechanical aids.
• Under the technique of tally sheet, the class-groups are written on
a sheet of paper (commonly known as the tally sheet) and for each
item a stroke (usually a small vertical line) is marked against the
class group in which it falls.
• The general practice is that after every four small vertical lines in a
class group, the fifth line for an item falling in the same group is
drawn as a horizontal line through the four, and the resulting
bundle represents five items.
• All this facilitates the counting of items in each one of the class
groups.
• Example
Frequency Distribution Table
Income Group (TZ Shillings)   Tally Mark            Frequency
Below 500,000                 IIII IIII III         13
500,001 – 1,000,000           IIII IIII IIII IIII   20
1,000,001 – 2,000,000         IIII IIII II          12
2,000,001 – 3,000,000         IIII IIII IIII III    18
3,000,001 – 4,000,000         IIII III              8
4,000,001 and above           IIII II               7
Total                                               78
(Each IIII block in the Tally Mark column stands for a completed group of five: four vertical strokes crossed by a fifth.)
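A tally sheet is a manual way of counting items into classes; the same counting can be sketched in Python. The class labels and upper limits follow the income table, but the sketch makes one assumption the table leaves open: an income of exactly 500,000 is placed in the first class. The incomes listed and the function names are hypothetical.

```python
from collections import Counter

# Income classes from the income table: (label, upper limit in TZ Shillings).
# An upper limit of None marks the open-ended top class.
CLASSES = [
    ("Below 500,000", 500_000),
    ("500,001 - 1,000,000", 1_000_000),
    ("1,000,001 - 2,000,000", 2_000_000),
    ("2,000,001 - 3,000,000", 3_000_000),
    ("3,000,001 - 4,000,000", 4_000_000),
    ("4,000,001 and above", None),
]


def income_class(income):
    """Return the label of the first class whose upper limit is not exceeded."""
    for label, upper in CLASSES:
        if upper is None or income <= upper:
            return label


def frequency_table(incomes):
    """Count items per class, the job a tally sheet does by hand."""
    counts = Counter(income_class(x) for x in incomes)
    return [(label, counts[label]) for label, _ in CLASSES]


incomes = [350_000, 750_000, 1_200_000, 480_000, 4_500_000, 2_750_000]   # hypothetical data
for label, f in frequency_table(incomes):
    print(f"{label:<24}{f}")
```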
4 Open Classes
• It should be kept in mind that in case one or two or very few items
have very high or very low values, one may use what are known as
open-ended intervals in the overall frequency distribution.
• Such intervals may be expressed as "below/under TZShs 500,000" or
"TZShs 4,000,001 and above/over".
• Such intervals are generally not desirable, but often cannot be
avoided.
• The researcher must always remain conscious of this fact while
deciding the issue of the total number of class intervals in which
the data are to be classified.
NOTE
• Class limits may generally be stated in any of the following forms:
1. Exclusive type class intervals
• They are usually stated as follows:
10 – 20
20 – 30
30 – 40
40 – 50
• Under the exclusive type class intervals, the items whose values
are equal to the upper limit of a class are grouped in the next
higher class.
• For example, an item whose value is exactly 30 would be put in 30
− 40 class interval and not in 20 − 30 class interval.
• In simple words, we can say that under exclusive type class
intervals, the upper limit of a class interval is excluded and items
with values less than the upper limit (but not less than the lower
limit) are put in the given class interval.
2. Inclusive type class intervals
• In inclusive type class intervals the upper limit of a class interval is also
included in the concerning class interval.
• Thus, an item whose value is 20 will be put in 11 − 20 class interval.
• The stated upper limit of the class interval 11 − 20 is 20 but the real limit is
20.99999 and as such 11 − 20 class interval really means 11 and under 21.
• When the phenomenon under consideration happens to be a discrete one
(i.e., can be measured and stated only in integers), then we should adopt
inclusive type classification.
• But when the phenomenon happens to be a continuous one capable of
being measured in fractions as well, we can use exclusive type class
intervals.
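The two conventions differ only in how a value equal to a stated upper limit is handled. The Python sketch below takes 10 – 20, 20 – 30, 30 – 40, 40 – 50 as exclusive classes and 11 – 20, 21 – 30, 31 – 40, 41 – 50 as inclusive classes. It assumes integer-stated limits with a gap of 1, so that an inclusive class 11 – 20 is treated as "11 and under 21"; the function names are hypothetical.

```python
def exclusive_class(value, classes):
    """Exclusive type: lower <= value < upper, so a value equal to an upper limit goes to the next class."""
    for lower, upper in classes:
        if lower <= value < upper:
            return (lower, upper)
    return None


def inclusive_class(value, classes):
    """Inclusive type: the stated upper limit belongs to the class; 11 - 20 means 11 and under 21."""
    for lower, upper in classes:
        if lower <= value < upper + 1:   # assumes limits are stated in whole units
            return (lower, upper)
    return None


EXCLUSIVE = [(10, 20), (20, 30), (30, 40), (40, 50)]
INCLUSIVE = [(11, 20), (21, 30), (31, 40), (41, 50)]

print(exclusive_class(30, EXCLUSIVE))   # → (30, 40)
print(inclusive_class(20, INCLUSIVE))   # → (11, 20)
```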
DEFINITIONS:
Mid-point
• The value of the interval which lies midway between the lower
and the upper limits of a class.
• It is obtained by summing the lower and upper class boundaries
and dividing by 2.
True limits (class boundary)
• They are the limits that make an interval of a continuous variable
continuous in both directions.
• They are used to close the gaps between adjacent class intervals.
• Example
Frequency Distribution Table
Class Interval (Hours)   Class Boundary   Mid-point   Frequency
10 – 14                  9.5 – 14.5       12          5
15 – 19                  14.5 – 19.5      17          11
20 – 24                  19.5 – 24.5      22          12
25 – 29                  24.5 – 29.5      27          7
30 – 34                  29.5 – 34.5      32          3
35 – 39                  34.5 – 39.5      37          2
Total                                                 40
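The class boundaries and mid-points in the hours table can be reproduced with a Python sketch. It assumes the stated limits are whole numbers with a gap of 1 between classes, so half of that gap (0.5) is subtracted from each lower limit and added to each upper limit; the mid-point is then the sum of the two boundaries divided by 2. The function name `boundaries_and_midpoints` is hypothetical.

```python
def boundaries_and_midpoints(classes, gap=1):
    """Turn stated class limits into true class boundaries and mid-points.

    gap is the distance between one class's upper limit and the next class's
    lower limit (1 for 10 - 14, 15 - 19, ...); half of it is added on each side.
    """
    half = gap / 2
    rows = []
    for lower, upper in classes:
        lower_b, upper_b = lower - half, upper + half
        rows.append(((lower_b, upper_b), (lower_b + upper_b) / 2))
    return rows


hours = [(10, 14), (15, 19), (20, 24), (25, 29), (30, 34), (35, 39)]
for (lb, ub), mid in boundaries_and_midpoints(hours):
    print(f"{lb} - {ub}  mid-point {mid}")
```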
5. Guidelines for Constructing Tables
• Keep them simple.
• Limit the number of variables.
• All tables should be self-explanatory (Include clear title telling
what, when and where, clearly label the rows and columns, and
state clearly the unit of measurements used).
• Explain codes and abbreviations in the foot-note.
• If data is not original, indicate the source in foot-note.
TOPIC 5: GRAPHICAL PRESENTATION OF DATA
