Advanced Quantitative
Nominal level
Data that can only be classified into categories and cannot be arranged in any meaningful order. Examples: gender, religion.
Ordinal Level
Measurement on an ordinal scale involves putting individuals into an order, ranking them
according to some criterion. Example: excellent, good, and poor. During a taste test of three
soft drinks, Coca Cola was ranked number 1, Sprite number 2, and Fanta number 3.
Interval level
The interval level is similar to the ordinal level, with the additional property that meaningful
differences between data values can be determined. For example, temperature in degrees Celsius.
Ratio level
The interval level with an inherent zero starting point. Differences and ratios are meaningful
for this level of measurement. Examples: Money and heights of basketball players.
Unit 2: Population and Sample
Population
Population is a collection of all possible
individuals, objects, or measurements of interest
A collection of items of interest in research
A complete set of things
A group that you wish to generalize your
research to
Sample
Sample is a portion of the population of interest
A subset, or part, of the population of interest
Its size is smaller than the size of the population
Representative: An accurate reflection of the
population (A primary problem in statistics)
Population (N) Sample (n)
An example of a population: all the people in
Mettu town
An example of a sample: 100 people randomly
selected from the Mettu population
Sampling Techniques
• A sampling technique is the procedure by which the
sample units are selected from the population. There are
two types of sampling techniques/methods:
1. Probability sampling
2. Non-probability sampling
Characteristics of probability sampling
1. Each item has a known likelihood of being included in the
sample
2. Each item in the population has an equal chance
of being chosen
3. A probability sample is likely to be representative of the
population
Characteristics of Non-probability Sampling:
The following are the main characteristics of a
non-probability sample:
The population is not clearly defined in non-
probability sampling
The probability of selecting any individual is
not known
A non-probability sample is distribution free
The observations of a non-probability sample are
not used for generalization purposes
Non-parametric or non-inferential statistics are
used with non-probability samples
There is a risk in drawing conclusions from a
non-probability sample.
Classification of sampling techniques
There are several probability sampling methods:
1. Simple random sampling
2. Systematic sampling
3. Stratified sampling
4. Multi-stage sampling
5. Cluster sampling
Simple Random Sample
• In this sampling method each item or person in the population
has the same chance of being included in the sample
• Most widely used type of sampling
Methods of Randomization
The following are main methods of randomization:
(a) Lottery method of randomization.
(b) Tossing a coin (Head or tail) method.
(c) Throwing a dice.
(d) Blindfolded method.
(e) Random number tables (Tippett's Table of Random Numbers).
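The lottery method above can be sketched in a few lines of Python using the standard library's `random` module; the numbered frame of 500 households and the seed are illustrative assumptions:

```python
import random

# Hypothetical sampling frame: 500 numbered households (an assumed example).
population = list(range(1, 501))

random.seed(42)  # fixed seed so the draw can be reproduced
sample = random.sample(population, k=30)  # every household has an equal chance

print(len(sample))       # 30 units drawn
print(len(set(sample)))  # 30 -- sampling is without replacement
```

Because `random.sample` draws without replacement, no household appears twice in the sample.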
Systematic Sampling
Systematic sampling is an improvement over simple
random sampling. This method requires complete
information about the population: there should be a
systematically arranged list of all the individuals in the
population. We then decide the size of the sample.
Let sample size = n
and population size = N
We select every (N/n)-th individual from the list, and thus we
have the desired size of sample, which is known as a systematic
sample. For this technique of sampling, the population should
therefore be arranged in some systematic way.
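The procedure just described (interval k = N/n with a random start in the first interval) can be sketched as follows; the ordered frame of 100 units is an illustrative assumption:

```python
import random

def systematic_sample(frame, n):
    """Select every k-th unit, k = N // n, after a random start in the first interval."""
    N = len(frame)
    k = N // n                   # sampling interval N/n
    start = random.randrange(k)  # random start between 0 and k-1
    return frame[start::k][:n]

random.seed(3)
frame = list(range(1, 101))      # hypothetical systematically arranged list of 100 units
sample = systematic_sample(frame, 10)
print(sample)  # 10 units, each k = 10 positions apart
```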
Stratified Sampling
When employing this technique, the researcher divides the
population into strata on the basis of some characteristic and,
from each of these smaller homogeneous groups (strata),
draws at random a predetermined number of units
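A common way to fix the "predetermined number of units" per stratum is proportional allocation, n_h = n × N_h / N. A minimal sketch, with strata sizes assumed for illustration:

```python
import random

# Assumed strata sizes (e.g. students by faculty) and a total sample of 60.
strata = {"science": 300, "arts": 200, "business": 100}
total_n = 60
N = sum(strata.values())

random.seed(7)
sample = {}
for name, size in strata.items():
    n_h = round(total_n * size / N)           # proportional allocation n_h = n * N_h / N
    units = [f"{name}-{i}" for i in range(1, size + 1)]
    sample[name] = random.sample(units, n_h)  # simple random draw within each stratum

print({name: len(drawn) for name, drawn in sample.items()})
# {'science': 30, 'arts': 20, 'business': 10}
```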
Cluster Sampling
In Cluster sampling the sample units contain groups of
elements (clusters) instead of individual members or
items in the population
Cluster sampling is an example of two-stage sampling
or multistage sampling: in the first stage a sample of
clusters is chosen, while in the second stage samples of
respondents within those clusters are selected.
Multi-Stage Sampling
This sample is more comprehensive and
representative of the population.
Multistage sampling is a complex form of cluster
sampling
Non-probability Sampling Techniques
Convenience sampling
Judgement/Purposive sampling
Quota sampling
Snowball sampling
Volunteer sampling
Convenience sampling is that in which the study units that happen to
be available at the time of data collection are selected for purposes of
convenience. Most clinic-based studies use this method.
In this method, the decision maker selects a sample from the
population in a manner that is relatively easy and convenient.
Judgement/Purposive Sampling
This involves the selection of a group from the population on the
basis of available information, thought to be representative of the
population. For instance, the selection of focus group discussion
participants and key informant interviewees.
Quota Sampling
The population is classified into several categories: on the basis of
judgement or assumption or the previous knowledge, the proportion
of population falling into each category is decided. Thereafter a
quota of cases to be drawn is fixed and the observer is allowed to
sample as he likes
Snowball Sampling
In this technique initial respondents are selected by
probability methods, and then additional respondents are
selected from information provided by the initial
respondents. This method is used to locate members of
rare populations by referrals
Volunteer sampling
• A common method of volunteer sampling is phone-in
sampling, used mainly by television and radio stations to
gauge public opinion on current affairs issues such as
preferred political party.
Types of Errors in Sampling
• Samples in behavioural research may not be representative
and can suffer from two types of errors:
(1)Random error
(2) Systematic error
Sample Size Determination
It refers to the number of sampling units selected from the
population for investigation
One of the first questions the researcher typically asks
concerns the number of subjects that need to be
included in the sample
Determining the size of the sample is a crucial problem
for research scholars
There is no single rule that can be used to determine
sample size
Generally 95 to 99 percent confidence intervals are
acceptable, i.e. 5 to 1 percent error
The chances are 95 in 100 that a sample mean in repeated
sampling will fall within the interval M ± 1.96σM
Using Formulas to Calculate a Sample Size
You can determine your sample size by the following formula, if the total population size is greater
than 10,000.
n = z²pq / e²
Where, n = the desired sample size
z = the standard normal deviate, usually set at 1.96, which corresponds to the 95 percent
confidence interval.
p = the proportion in the target population estimated to have the particular characteristic. If there is
no reasonable estimate, then we can use 0.5
q = 1 − p
e = the margin of error – the maximum error you are willing to tolerate.
Example: If the proportion of a target population with a certain characteristic is 0.5, the
z-statistic is 1.96, and we desire a 0.05 margin of error, what will be the sample size,
provided that the entire population is greater than 10,000?
n = z²pq / e² = (1.96)²(0.5)(0.5) / (0.05)² = 384
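The worked example can be verified directly:

```python
# n = z^2 * p * q / e^2 with the values from the example above.
z, p, e = 1.96, 0.5, 0.05
q = 1 - p
n = z**2 * p * q / e**2
print(round(n))  # 384, as in the worked example
```

Note that the exact value is 384.16; some texts round up to the next whole subject, giving 385.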
Unit 3. Data Collection Methods and Analysis
Source of data
It is important for a researcher to know the sources of the
data required for his or her study
Data are nothing but the information needed for the study
There are two sources of information or data
Primary data
Secondary data
Primary data mean the data collected for the first time: the
researcher collects them directly. Primary data may be
obtained by applying the following methods.
Household questionnaire survey
Key informants interviews
Focus group discussions
Personal observation
Instrumental measurement
Secondary data mean the data that have already
been collected and used earlier by somebody or some
agency
Published and unpublished articles
Books, online sources, government reports
Climatic data (rainfall and temperature)
Demographic data
Both the sources of information have their merits and
demerits
The selection of a particular source of information
depends upon
a) Purpose and scope of enquiry
b) Availability of time
c) Availability of finance
d) Accuracy required
e) Statistical units to be used
Data collection tools/Instruments
There are various data collection tools. These are:
Questionnaires
key informant interviews
Focus Groups
Direct/participant observation
Document analysis
Instrument measurement
Questionnaire
What is questionnaire?
A list of questions properly selected, arranged, and
pertaining to the investigation
The questionnaire is the medium of communication
between the investigator and the respondent
The respondent is the person who fills in the questionnaire
or supplies the required information
There are three basic types of questionnaire: closed-ended,
open-ended, or a combination of both.
Closed-ended questionnaires
• Closed-ended questionnaires are probably the type with
which you are most familiar
• As these questionnaires follow a set format, and as most
can be scanned straight into a computer for ease of
analysis, greater numbers can be produced
Open-ended questionnaires
• Open-ended questionnaires are used in qualitative
research. The questionnaire does not contain boxes to tick,
but instead leaves a blank section for the respondent to
write in an answer.
Combination of both
• Many researchers tend to use a combination of both open
and closed questions. Many questionnaires begin with a
series of closed questions, with boxes to tick or scales to
rank, and then finish with a section of open questions for
more detailed response.
Interviews
Three types of interview are used in social
research: unstructured, semi-structured and
structured interviews.
Unstructured interviews
• Unstructured or in-depth interviews are sometimes
called life history interviews, because they are used to
build up a detailed picture of the interviewee's life and
experiences
• In this type of interview, the researcher attempts
to achieve a holistic understanding of the
interviewees’ point of view or situation
• In unstructured interviews researchers need to
remain alert, recognising important information
and probing for more detail
Semi-structured interviews
Semi-structured interviewing is perhaps the most common type of interview used in
qualitative social research. In this type of interview, the researcher wants to know
specific information which can be compared and contrasted with information gained
in other interviews. For this type of interview, the researcher produces an interview
schedule. This may be a list of specific questions or a list of topics to be discussed
Structured interviews
Structured interviews are used in quantitative research and can be conducted face-to-face
or over the telephone, sometimes with the aid of laptop computers.
Structured interviews are used frequently in market research
Focus group discussions
• Focus groups may be called discussion groups or group interviews. A number of
people are asked to come together in a group to discuss a certain issue.
• The discussion is led by a moderator or facilitator who introduces the topic, asks
specific questions, controls digressions and stops break-away conversations
• Focus groups are held with a number of people to obtain a group opinion.
• Focus groups are run by a moderator who asks questions and makes sure the
discussion does not digress.
• The number of participants should range between 6 and 12 individuals
Participant observation
Observation is a method that employs vision as its main means
of data collection
It implies the use of the eyes rather than the ears and the voice
Field observation takes place in a natural setting
• Participant observation is used when a researcher wants to
immerse herself in a specific culture to gain a deeper
understanding.
• There are two main ways in which researchers observe: direct
observation and participant observation
• Participant observation is popular amongst anthropologists and
sociologists who wish to study and understand another
community, culture or context
• Participant observation can be covert or overt
In covert participant observation the researcher enters
organisations and participates in their activities without anyone
knowing that research is being conducted
Overt participant observation, where everyone knows who the
researcher is and what she is doing, can be a valuable
and rewarding method for qualitative inquiry.
Methods of Data Analysis
In this section:
how to organize,
analyze and
interpret collected data,
The details of the statistical techniques
and rationale for using such techniques
should be also described in the research
Software packages employed in the
research
Reliability and validity are important concepts
in research
Validity of research is about the degree to which the
research findings are true.
There are three different types of validity
Measurement validity: The degree to which
measures (e.g. questions on a questionnaire)
successfully indicate concepts.
Internal validity: The extent to which causal
statements are supported by the study.
External validity: The extent to which findings can
be generalized to populations or to other settings.
Reliability is the degree to which an assessment tool
produces stable and consistent results. Reliability is
the extent to which an experiment, test, or any
measuring procedure yields the same result on
repeated trials
Unit 4: Descriptive Statistics
The Concept of Statistics
• Statistics is the science of collecting, organizing,
presenting, analyzing, and interpreting data to assist in
making more effective decisions and to draw meaningful
inferences from data that lead to improved decisions
• The study of statistics is usually divided into two categories:
descriptive statistics and inferential statistics.
Descriptive statistics: the method of organizing,
summarizing, and presenting data in an informative way.
Inferential statistics: is also called statistical inference or
inductive statistics. Our main concern regarding inferential
statistics is finding something about a population based on a
sample taken from that population.
Inferential statistics comes in two forms – estimation and
hypothesis testing.
Measures of central tendency
Central tendency is a single value that summarizes a set of data. It locates the
center of the values. The most common measures of central tendency widely
used in statistical analysis are:
i) Arithmetic mean
ii) Median
iii) Mode
i). Arithmetic mean
What is mean?
It is the simplest but most useful measure of central tendency
• The mean is the statistical name for what is commonly called the average.
• Arithmetic mean, usually abbreviated to 'mean', is the 'average' in common
use. It is found by totaling the values in a data set and dividing by the number
of items.
ii). Median
The median is the middle value of a data set after the values have been arranged in order.
Example
The weights of seven grade 9 students are 45, 50, 55, 48, 56, 49, and 47. Find the median.
Step 1. Arrange the data in order: 45, 47, 48, 49, 50, 55, 56
Step 2. Select the middle value: 49 is the median.
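Both the mean and the median of the student weights can be checked with Python's `statistics` module:

```python
import statistics

weights = [45, 50, 55, 48, 56, 49, 47]  # the seven grade 9 students

print(sorted(weights))             # [45, 47, 48, 49, 50, 55, 56]
print(statistics.median(weights))  # 49 -- the middle (4th) value
print(statistics.mean(weights))    # 50 -- the arithmetic mean, for comparison
```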
The Mode
The mode is the value that appears most frequently
• A set of data can have more than one mode.
Example: The exam scores for six students are 81, 93, 84, 75, 81, 87. Since the score 81 occurs
most often, the modal score is 81.
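The mode can be found with `collections.Counter`, which also surfaces data sets with more than one mode:

```python
from collections import Counter

scores = [81, 93, 84, 75, 81, 87]
counts = Counter(scores)
top = max(counts.values())
modes = [score for score, c in counts.items() if c == top]
print(modes)  # [81] -- 81 appears twice, more often than any other score
```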
Measures of dispersion are concerned with the distribution of values
around the mean in a data set. The most commonly used measures of dispersion
are:
Range
Standard deviation
Variance
Coefficient of variation
Range. For ungrouped data, the range is the difference between the
highest and lowest values in a set of data. Range = Highest Value -Lowest
Value
Standard Deviation
Standard deviation, also known as root mean squared deviation, expresses the average amount
of variation on either side of the mean.
Population Standard Deviation: σ = √[ Σ (xᵢ − x̄)² / N ], where the sum runs over all N values.
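Range, population variance, and population standard deviation for a small ungrouped data set (reusing the student weights from the central tendency example) can be computed directly:

```python
import math

data = [45, 50, 55, 48, 56, 49, 47]
N = len(data)
mean = sum(data) / N

data_range = max(data) - min(data)                 # Range = highest value - lowest value
variance = sum((x - mean) ** 2 for x in data) / N  # population variance
std_dev = math.sqrt(variance)                      # population standard deviation

print(data_range)         # 11
print(round(std_dev, 2))  # 3.78
```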
Examples:
An analysis of the monthly wages paid (in Birr) to workers in two firms A and B belonging to the same industry gives the
following results:
Firm A: mean wage X̄A = 52.5, standard deviation SA = 10
Firm B: mean wage X̄B = 47.5, standard deviation SB = 11
Calculate the coefficient of variation for both firms.
Solutions:
C.V.A = (SA / X̄A) × 100 = (10 / 52.5) × 100 = 19.05%
C.V.B = (SB / X̄B) × 100 = (11 / 47.5) × 100 = 23.16%
Since C.V.B > C.V.A, wages in firm B are more variable than in firm A.
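Because the coefficient of variation, C.V. = (S / mean) × 100, is unit-free, the two firms can be compared in a short loop:

```python
# (mean wage, standard deviation) for each firm, from the example above.
firms = {"A": (52.5, 10), "B": (47.5, 11)}

for name, (mean, sd) in firms.items():
    cv = sd / mean * 100
    print(f"Firm {name}: C.V. = {cv:.2f}%")
# Firm A: C.V. = 19.05%
# Firm B: C.V. = 23.16%
```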
Unit 5: Inferential statistics
Correlation techniques
Correlation analysis aims to measure the degree to which two parametric variables can vary together
by means of a single index
Two variables are associated if one variable is related to the other.
Independent variable (X)
Dependent variable (Y)
• Dependent Variable: The variable that is being predicted or estimated,
• Independent Variable: The variable that provides the basis for estimation. It is the predictor variable.
When only two variables (dependent and independent) are involved the correlation is simple correlation
Measures of correlation are employed to explore three points, namely:
presence or absence of correlation, that is, whether or not there is a correlation between the
variables in question;
direction of correlation, that is, if there is a correlation, whether it is Positive or negative;
Strength of correlation, that is, whether an existing correlation is strong or weak
Relationships between variables have both direction and strength.
• Direction
positive,
negative,
indeterminate
• Strength the magnitude of the association
Very Strong
Strong
Weak
No relationship
The existence, direction, and strength of correlation are demonstrated in
the coefficient of correlation.
A zero correlation indicates that there is no correlation between the
variables.
The sign in front of the coefficient indicates whether the variables change
in the same direction (positive correlation) or in opposite direction
(negative correlation). This quantitative expression of dependent and
independent variables is known as Coefficient of correlation. The most
commonly used coefficient of correlation in parametric statistic is
Pearson product-moment coefficient of correlation, r.
It is the most common measure of association scaled on an interval level
The Coefficient of Correlation, r
• The Coefficient of Correlation (r) is a measure of the strength of the
relationship between two variables.
• It requires interval or ratio scaled data (variables)
• It can range from – 1 to 1
• Values of -1 or 1 indicate perfect correlation.
• Values close to 0.0 indicate weak correlation
• Zero value indicates no relationship
• Negative values indicate an inverse relationship and positive values
indicate a direct relationship
If the coefficient has a value:
• Under 0.20: very weak correlation
• 0.21 – 0.40: weak
• 0.41 – 0.70: moderate
• 0.71 – 0.90: strong
• Above 0.90: very strong
The formula is:
r = Σxy / √[(Σx²)(Σy²)]
• Where:
x = X − X̄
y = Y − Ȳ
Example 1: The following data are the amounts of yield farmers produce using different
amounts of fertilizer. The amounts of fertilizer applied are 10, 12, 14, 15, 16, 17, 18, 20,
21, and 25 kg ha-1, and the corresponding yields are 0.5, 0.8, 1, 1.1, 1.2, 1.3, 1.3, 1.4, 1.4,
and 1.5 Mg ha-1. Is there a relationship between the variables? Indicate the direction and
strength of the relationship.
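Applying the deviation formula r = Σxy / √[(Σx²)(Σy²)] to these data:

```python
import math

X = [10, 12, 14, 15, 16, 17, 18, 20, 21, 25]          # fertilizer, kg per ha
Y = [0.5, 0.8, 1, 1.1, 1.2, 1.3, 1.3, 1.4, 1.4, 1.5]  # yield, Mg per ha

mx, my = sum(X) / len(X), sum(Y) / len(Y)
sxy = sum((a - mx) * (b - my) for a, b in zip(X, Y))  # sum of xy deviations
sxx = sum((a - mx) ** 2 for a in X)                   # sum of x^2 deviations
syy = sum((b - my) ** 2 for b in Y)                   # sum of y^2 deviations

r = sxy / math.sqrt(sxx * syy)
print(round(r, 3))  # 0.928 -- a very strong positive correlation
```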
The regression line takes the form Ŷ = aX + b
Where,
b is the intercept,
a is the slope of the line, and
Ŷ is the expected value of Y (called Y hat) for a given value of X.
This is normally called the estimating equation
The values of a and b are estimates, and when values of X are
substituted into the equation, its solution provides estimates of Ŷ for
the given values of X.
For example, the regression of grain yield (Y, Mg ha-1) on applied N-fertilizer (X, kg ha-1)
in the following table is explained hereafter
Example 1: Using the data given above, find the regression line equation, draw
the graph, and interpret the result.
Fertilizer-N (kg ha-1): 10, 12, 14, 15, 16, 17, 18, 20, 21, 25
Yield (Mg ha-1): 0.5, 0.8, 1, 1.1, 1.2, 1.3, 1.3, 1.4, 1.4, 1.5

[Figure: scatter plot of grain yield (Mg ha-1) against N fertilizer (kg ha-1)]
[Figure: scatter plot of grain yield (Mg ha-1) against N fertilizer (kg ha-1) with the
fitted regression line y = 0.065x + 0.062, R² = 0.8609]
The sign of a expresses the direction of the relationship between Y and X;
when a is positive, it implies that an increase in X is accompanied by an increase in
Y and
When a is negative, Y decreases as X increases.
The regression coefficient, a, is the rate of change (constant) of yield and represents
the slope of the linear function between the two variables.
In this example, a is equal to 0.065. In other words, the grain yield increases by 0.065
Mg ha-1 for each 1 kg increase in N-fertilizer.
The constant b is the value of Y when X is equal to zero. This constant is called the Y-
intercept, and represents the point at which the linear function passes through the Y-
axis.
In this example, when N-fertilizer is equal to 0, grain yield equals 0.062.
Remember that a indicates the rate of change in the dependent variable (Y) given one
unit change in X
a and b are obtained by the following equations:
a = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)²
b = Ȳ − aX̄
For example, the regression equation for Fertilizer (X) and Yield (Y) can be represented by:
Ŷ = 0.065X + 0.062
• Using this equation we can extrapolate or interpolate (estimate) Y values (yields) for
some rates of applied N-fertilizer.
• Extrapolation is the estimation of the dependent variable by using the values beyond
the given data set.
• Interpolation is the estimation of the dependent variable by using the values between
the given data set.
For example, in our data there is no yield recorded for farmers applying less than 10 kg
N-fertilizer ha-1. The yield estimate for a farmer applying 4 kg N-fertilizer ha-1 is
therefore an extrapolation:
Ŷ = 0.065(4) + 0.062 = 0.322 Mg ha-1
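The least-squares fit and the extrapolated estimate can be reproduced in a few lines; the slope and intercept match the fitted line y = 0.065x + 0.062 reported earlier:

```python
# Least-squares fit of yield (Y, Mg/ha) on fertilizer (X, kg/ha): Y-hat = aX + b.
X = [10, 12, 14, 15, 16, 17, 18, 20, 21, 25]
Y = [0.5, 0.8, 1, 1.1, 1.2, 1.3, 1.3, 1.4, 1.4, 1.5]

mx, my = sum(X) / len(X), sum(Y) / len(Y)
a = sum((x - mx) * (y - my) for x, y in zip(X, Y)) / sum((x - mx) ** 2 for x in X)
b = my - a * mx  # intercept: value of Y-hat when X = 0

print(round(a, 3), round(b, 3))  # 0.065 0.062
print(round(a * 4 + b, 2))       # 0.32 -- extrapolated yield at 4 kg N/ha
```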
Parametric and non parametric tests
• Parametric tests are statistical tests for population parameters such as means,
variances, and proportions about populations from which the samples were
selected. One assumption is that these populations are normally distributed.
• Statistical tests such as z test, t test, F test and ANOVA are typical examples
of parametric tests. For instance, for comparing mean we use t-test.
Comparing between groups (independent-t test), comparing measurements
within the same subject (paired t-test).
What is a non parametric test?
• Statistical analyses which do not depend upon knowledge of the distribution
and parameters of the population are called non-parametric statistics or
distribution free statistics.
Example of non parametric statistics
• Popular Nonparametric Tests
• Sign Test
• Wilcoxon Rank Sum Test
• Wilcoxon Signed Rank Test
• Kruskal Wallis H-Test
• Kolmogorov-Smirnov Test
• Friedman’s Fr-Test
The differences between Parametric and non parametric tests
In hypothesis testing, the null hypothesis Ho states a specific value of a population
parameter and the alternative H1 contradicts it, for example:
Ho: μ = μ0, σ² = σ0², p = p0
H1: μ ≠ μ0, σ² ≠ σ0², p ≠ p0
Two kinds of error are possible: rejecting Ho when it is true (Type I error) and
failing to reject Ho when it is false (Type II error).
Steps in Hypothesis Testing
Step 1: State the Null hypothesis (Ho) and the Alternate hypothesis
(H1)
The first step is to state the hypothesis being tested
Step 2: Select the Level of Significance
Level of Significance: The probability of rejecting the null hypothesis when it
is actually true. The level of significance is designated α. It is sometimes
called the level of risk. There is no one level of significance that is applied to
all tests. A decision is made to use the .05 level (often stated as the 5
percent level), the .01 level, the .10 level, or any other level between 0 and 1.
Step 3: Compute the Test Statistic
There are many test statistics. The z test, t test, F test, and χ² (chi-square) are only a few
Step 4: Formulate the Decision Rule
The decision rule states the conditions when Ho is rejected
Step 5: Make a Decision
This step involves the decision concerning the null hypothesis
Step 6: Interpretation of the decision
There is sufficient evidence at the .05 level to allow us to reject the null
hypothesis
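The six steps can be walked through on a small hypothetical example, a two-tailed one-sample z test; all the numbers here are assumed for illustration:

```python
import math

# Step 1: state Ho: mu = 50 versus H1: mu != 50 (hypothetical data).
mu0, xbar, sigma, n = 50, 52, 8, 64

# Step 2: select the .05 level of significance; two-tailed critical value 1.96.
critical = 1.96

# Step 3: compute the test statistic z = (xbar - mu0) / (sigma / sqrt(n)).
z = (xbar - mu0) / (sigma / math.sqrt(n))

# Steps 4-5: decision rule -- reject Ho if |z| > 1.96 -- and make the decision.
decision = "reject Ho" if abs(z) > critical else "fail to reject Ho"

# Step 6: interpret the result.
print(round(z, 2), decision)  # 2.0 reject Ho
```

Since |z| = 2.0 exceeds 1.96, there is sufficient evidence at the .05 level to reject Ho in this hypothetical case.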