Advanced Quantitative

The document provides an overview of quantitative research design in geographic studies, detailing the use of mathematical models and statistical techniques to analyze geographical data. It covers concepts such as population and sampling methods, data collection techniques, and the importance of reliability and validity in research. Additionally, it distinguishes between primary and secondary data sources, and outlines various sampling techniques, including probability and non-probability methods.

Unit 1: Quantitative Research Design in Geographic Studies

1.1. General concept of quantitative techniques/methods


 Quantitative techniques refer to the use of mathematical models,
theorems, and proofs to understand geographical forms and relations
 Quantitative methods are the techniques that apply mathematical
models, theorems, proofs, and statistics to analyze data
1.2 The quantitative paradigm in Geographic Studies
 Quantitative geography developed during the 1950s and 1960s.
 Quantitative geography is the application of quantitative methods
(mathematics, statistics, proofs, etc.) to geographic problems and
issues.
 It consists of the following activities:
 The analysis of numerical spatial data;
 The development of spatial theory; and
 The construction and testing of mathematical models of spatial
processes
 The use of quantitative methods enables geographers to treat a
large amount of data and a large number of variables in an
objective manner
1.3 Quantitative data sources and techniques of data
acquisition
 Household questionnaire survey
 Surveys can be done by using a variety of methods
a. Telephone surveys
b. Mailed questionnaires
c. Personal interview surveys
 Measurement of spatial, bioclimatic, economic and demographic data
 Geographers may collect data in the field through surveying or from
secondary sources, such as censuses, statistical surveys, maps and
photographs
Variables
 A variable is defined as any characteristic (attribute) of an
element of a population that can be measured in some
form.
 A variable is any characteristic of an individual.
 A variable can take different values for different
individuals.
Types of variables
Qualitative variables: are non-numeric variables and
cannot be measured numerically; examples include gender,
religious affiliation, place of birth, and eye color.
Quantitative variables: are numerical variables and can
be measured.
Quantitative variables can further be classified as either
discrete or continuous
Discrete variables are usually obtained by counting. There
are a finite or countable number of choices available with
discrete data. You cannot have 46.7 pupils in the classroom.
Measurement Level
There are four levels of measurement.
Nominal level
Data that can only be classified into categories and cannot be arranged in an ordering scheme. Examples: eye color, gender, religious affiliation.
Ordinal Level
Measurement on an ordinal scale involves putting individuals into an order, ranking them according to some criterion. Example: excellent, good, and poor. During a taste test of three soft drinks, Coca Cola was ranked number 1, Sprite was ranked number 2, and Fanta was ranked number 3.
Interval level
The interval level is similar to the ordinal level, with the additional property that meaningful amounts of differences between data values can be determined. For example, temperature can be measured in degrees centigrade to any desired number of decimal places.
Ratio level
The interval level with an inherent zero starting point. Differences and ratios are meaningful at this level of measurement. Examples: money and heights of basketball players.
Unit 2: Population and Sample
Population
 Population is a collection of all possible
individuals, objects, or measurements of interest
 A collection of items of interest in research
 A complete set of things
 A group that you wish to generalize your
research to
Sample
 Sample is a portion of the population of interest
 A subset or part, of the population of interest
 Its size is smaller than the size of the population
Representative: An accurate reflection of the
population (A primary problem in statistics)
Population (N) Sample (n)
 An example of a population: all the people in
Mettu town
 An example of a sample: 100 people randomly
selected from the Mettu population
Sampling Techniques
• It simply indicates the way the required samples are
selected. There are two types of sampling
techniques/methods:
1. Probability sampling
2. Non-probability sampling
Characteristics of probability sampling:
 Each item has a known likelihood of being included in the
sample
 Each item in the population has an equal chance
of being chosen
 A probability sample is likely to be representative of the
population
Characteristics of Non-probability Sampling:
The following are the main characteristics of a
non-probability sample:
 The population is not precisely defined in non-probability
sampling
 There is no known probability of selecting any
individual
 A non-probability sample has a free (unknown) distribution
 The observations of a non-probability sample are
not used for generalization purposes
 Non-parametric or non-inferential statistics are
used with non-probability samples
 Since no generalization is attempted, there is little risk in
drawing conclusions from a non-probability sample.
Classification of sampling techniques
There are several probability sampling methods:
1. Simple random sampling
2. Systematic sampling
3. Stratified sampling
4. Multi-stage sampling
5. Cluster sampling
Simple Random Sample
• In this sampling method each item or person in the population
has the same chance of being included in the sample
• Most widely used type of sampling
Methods of Randomization
The following are main methods of randomization:
(a) Lottery method of randomization.
(b) Tossing a coin (Head or tail) method.
(c) Throwing a dice.
(d) Blind folded method.
(e) Random number tables (Tippett's table of random numbers).
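As a software sketch of simple random sampling (the frame of 500 household IDs is hypothetical, chosen only for illustration):

```python
import random

random.seed(42)  # fix the seed so the draw is reproducible

# Hypothetical sampling frame: 500 household IDs
frame = list(range(1, 501))

# Simple random sample of 30: each household has the same
# chance of being included, drawn without replacement
sample = random.sample(frame, k=30)

print(len(sample), len(set(sample)))  # 30 30 -- no duplicates
```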
Systematic Sampling
Systematic sampling is an improvement over simple
random sampling. This method requires complete
information about the population: there should be a list of
all the individuals of the population arranged in some
systematic way. We then decide the size of the sample.
Let sample size = n
and population size = N
We select every (N/n)th individual from the list, and thus we
obtain the desired sample size; this is known as a systematic
sample. For this technique, therefore, the population should be
arranged in some systematic way.
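The every-(N/n)th selection rule described above can be sketched as follows (the frame of 200 individuals is hypothetical):

```python
import random

def systematic_sample(population, n):
    """Select every (N/n)th individual after a random start."""
    N = len(population)
    k = N // n                    # sampling interval N/n
    start = random.randrange(k)   # random start within the first interval
    return [population[start + i * k] for i in range(n)]

random.seed(1)
population = list(range(1, 201))  # hypothetical frame, systematically arranged
sample = systematic_sample(population, 20)
print(sample[:3])  # first three selected individuals, 10 apart
```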
Stratified Sampling
When employing this technique, the researcher divides the
population into strata on the basis of some characteristic and
from each of these smaller homogeneous groups (strata)
draws at random a predetermined number of units
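Proportional stratified sampling, where each homogeneous stratum contributes units in proportion to its size, might look like this (the urban/rural strata and their sizes are invented for illustration):

```python
import random

random.seed(7)

# Hypothetical strata of households
strata = {
    "urban": [f"U{i}" for i in range(300)],
    "rural": [f"R{i}" for i in range(700)],
}
total = sum(len(frame) for frame in strata.values())
n = 50  # overall sample size

sample = []
for name, frame in strata.items():
    n_h = round(n * len(frame) / total)  # proportional allocation
    sample.extend(random.sample(frame, n_h))

print(len(sample))  # 50 = 15 urban + 35 rural
```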
Cluster Sampling
 In cluster sampling the sample units are groups of
elements (clusters) rather than individual members or
items of the population
 Cluster sampling is an example of two-stage (or
multistage) sampling: in the first stage a sample of
clusters is chosen, while in the second stage a sample of
respondents within those clusters is selected.
Multi-Stage Sampling
 This sample is more comprehensive and
representative of the population.
 Multistage sampling is a complex form of cluster
sampling
Non-probability Sampling Techniques
 Convenience sampling
 Judgement/Purposive sampling
 Quota sampling
 Snowball sampling
 Volunteer sampling
Convenience sampling is that in which the study units that happen to
be available at the time of data collection are selected for purposes of
convenience. Most clinic based studies use this method.
 In this method, the decision maker selects a sample from the
population in a manner that is relatively easy and convenient.
Judgement/Purposive Sampling
 This involves the selection of a group from the population on the
basis of available information thought to be representative of the
population. For instance, the selection of focus group discussion
participants and key informant interviewees
Quota Sampling
 The population is classified into several categories; on the basis of
judgement, assumption or previous knowledge, the proportion of the
population falling into each category is decided. Thereafter a
quota of cases to be drawn from each category is fixed, and the
observer is allowed to sample as he likes within that quota
Snowball Sampling
 In this technique initial respondents are selected by
probability methods, and then additional respondents are
selected from information provided by the initial
respondents. This method is used to locate members of
rare populations by referrals
Volunteer sampling
• A common method of volunteer sampling is phone-in
sampling, used mainly by television and radio stations to
gauge public opinion on current affairs issues such as
preferred political party.
Types of Errors in Sampling
• The samples of behavioural research are not representative
and suffer from two types of errors:
(1)Random error
(2) Systematic error
Sample Size Determination
 It refers to the number of sampling units selected from the
population for investigation
 One of the first questions the researcher typically asks
concerns the number of subjects that need to be included
in the sample
 Determining the size of the sample is a crucial problem for
research scholars
 There is no single rule that can be used to determine
sample size
 Generally 95 to 99 percent confidence levels are
acceptable, i.e. 5 to 1 percent error
 The chances are 95 in 100 that a sample mean in repeated
sampling will fall within the interval M ± 1.96σM
Using Formulas to Calculate a Sample Size
You can determine your sample size by the following formula, if the total population size is greater than 10,000:
n = z²pq / e²
Where:
n = the desired sample size
z = the standard normal deviate, usually set at 1.96, which corresponds to the 95 percent confidence level
p = the proportion in the target population estimated to have a particular characteristic. If there is no reasonable estimate, we can use 0.5
q = 1 − p
e = the margin of error, the maximum error you are willing to tolerate
Example: If the proportion of a target population with a certain characteristic is 0.5, the z-statistic is 1.96, and we desire a 0.05 margin of error, what will be the sample size, provided that the entire population is greater than 10,000?
n = z²pq / e² = (1.96)²(0.5)(0.5) / (0.05)² ≈ 384
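The worked example can be reproduced directly; the function name below is just for illustration:

```python
def sample_size(p=0.5, z=1.96, e=0.05):
    """n = z^2 * p * q / e^2, for populations larger than 10,000."""
    q = 1 - p
    return z * z * p * q / (e * e)

n = sample_size(p=0.5, z=1.96, e=0.05)
print(round(n))  # 384, as in the example (384.16 before rounding)
```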
Unit 3. Data Collection Methods and Analysis
Source of data
 It is important for a researcher to know the sources of the data
required for his different purposes
 Data are nothing but the information needed for the study
There are two sources of information or data:
 Primary data
 Secondary data
Primary data mean the data collected for the first time; the
researcher collects them directly. Primary data may be
obtained by applying the following methods.
 Household questionnaire survey
 Key informants interviews
 Focus group discussions
 Personal observation
 Instrumental measurement
Secondary data mean the data that have already
been collected and used earlier by somebody or some
agency
 Published and unpublished articles
 Books, online sources, government reports
 Climatic data (rainfall and temperature)
 Demographic data
Both the sources of information have their merits and
demerits
The selection of a particular source of information
depends upon
a) Purpose and scope of enquiry
b) Availability of time
c) Availability of finance
d) Accuracy required
e) Statistical units to be used
Data collection tools/Instruments
There are various data collection tools. These are:
 Questionnaires
 key informant interviews
 Focus Groups
 Direct/participant observation
 Document analysis
 Instrument measurement

Questionnaire
What is questionnaire?
 A list of questions properly selected and
arranged, pertaining to the investigation
 The questionnaire is the medium of communication
between the investigator and the respondent
 A respondent is a person who fills in the questionnaire
or supplies the required information
There are three basic types of questionnaire: closed-ended,
open-ended, or a combination of both.
Closed-ended questionnaires
• Closed-ended questionnaires are probably the type with
which you are most familiar
• As these questionnaires follow a set format, and as most
can be scanned straight into a computer for ease of
analysis, greater numbers can be produced
Open-ended questionnaires
• Open-ended questionnaires are used in qualitative
research. The questionnaire does not contain boxes to tick,
but instead leaves a blank section for the respondent to
write in an answer.
Combination of both
• Many researchers tend to use a combination of both open
and closed questions. Many questionnaires begin with a
series of closed questions, with boxes to tick or scales to
rank, and then finish with a section of open questions for
more detailed response.
Interviews
 Three types of interview are used in social
research: unstructured, semi-structured, and
structured interviews.
Unstructured interviews
• Unstructured or in-depth interviews are sometimes
called life history interviews
• In this type of interview, the researcher attempts
to achieve a holistic understanding of the
interviewees' point of view or situation
• In unstructured interviews researchers need to
remain alert, recognising important information
and probing for more detail
Semi-structured interviews
Semi-structured interviewing is perhaps the most common type of interview used in
qualitative social research. In this type of interview, the researcher wants to know
specific information which can be compared and contrasted with information gained
in other interviews. For this type of interview, the researcher produces an interview
schedule. This may be a list of specific questions or a list of topics to be discussed
Structured interviews
Structured interviews are used in quantitative research and can be conducted face-
to-face or over the telephone, sometimes with the aid of lap-top computers.
Structured interviews are used frequently in market research
Focus group discussions
• Focus groups may be called discussion groups or group interviews. A number of
people are asked to come together in a group to discuss a certain issue.
• The discussion is led by a moderator or facilitator who introduces the topic, asks
specific questions, controls digressions and stops break-away conversations
• Focus groups are held with a number of people to obtain a group opinion.
• Focus groups are run by a moderator who asks questions and makes sure the
discussion does not digress.
• Number of participants should be range between 6-12 individuals
Participant observation
 Observation is a method that employs vision as its main means
of data collection
 It implies the use of the eyes rather than the ears and the voice
 Field observation takes place in a natural setting
• Participant observation is used when a researcher wants to
immerse herself in a specific culture to gain a deeper
understanding.
• There are two main ways in which researchers observe: direct
observation and participant observation
• Participant observation is popular amongst anthropologists and
sociologists who wish to study and understand another
community, culture or context
• Participant observation can be covert or overt
Covert participant observation is when the researcher enters
organisations and participates in their activities without anyone
knowing that they are conducting research
Overt participant observation, where everyone knows who the
researcher is and what she is doing, however, can be a valuable
and rewarding method for qualitative inquiry.
Methods of Data Analysis
This section covers:
 how to organize, analyze and interpret collected data;
 the statistical techniques used, and the rationale for using
them, which should be described in the research; and
 the software packages employed in the research
Reliability and validity are important concepts
in research
Validity of research is about the degree to which the
research findings are true.
There are three different types of validity
Measurement validity: The degree to which
measures (e.g. questions on a questionnaire)
successfully indicate concepts.
Internal validity: The extent to which causal
statements are supported by the study.
External validity: The extent to which findings can
be generalized to populations or to other settings.
Reliability is the degree to which an assessment tool
produces stable and consistent results. Reliability is
the extent to which an experiment, test, or any
measuring procedure yields the same result on
repeated trials
Unit 4: Descriptive Statistics
The Concept of Statistics
• Statistics is the science of collecting, organizing,
presenting, analyzing, and interpreting data to assist in
making more effective decisions and to draw meaningful
inferences from data that lead to improved decisions
• The study of statistics is usually divided into two categories:
descriptive statistics and inferential statistics.
Descriptive statistics: is the method of organizing,
summarizing, and presenting data in an informative way.
Inferential statistics: is also called statistical inference or
inductive statistics. Our main concern regarding inferential
statistics is finding something about a population based on a
sample taken from that population.
Inferential statistics comes in two forms: estimation and
hypothesis testing.
Measures of central tendency
Central tendency is a single value that summarizes a set of data. It locates the
center of the values. The most common measures of central tendency widely
used in statistical analysis are:
i) Arithmetic mean
ii) Median
iii) Mode
i). Arithmetic mean
What is the mean?
 It is the simplest but most useful measure of central tendency
• The mean is the statistical name for what is commonly called the average.
• The arithmetic mean, usually abbreviated to 'mean', is the 'average' in common
use. It is found by totaling the values in a data set and dividing by the number
of items.
Expressed as a formula it is:
x̄ = Σx / n
Where:
x̄ (x-bar) = the mean
Σ (sigma) = the sum of
x = the values of the variable
n = the number of items in the set
Example
Find the mean of the following values: 6.2, 9.3, 4.8, 7.2, 5.5
Mean = (6.2 + 9.3 + 4.8 + 7.2 + 5.5) / 5 = 33.0 / 5 = 6.6
Median
 It is the midpoint of the values after they have been ordered from the smallest to the largest,
or the largest to the smallest. The median is the halfway point in a data set
 The median is the score or value of the central item which divides the series into two
equal parts

Example
The weights of seven grade 9 students are 45, 50, 55, 48, 56, 49, and 47. Find the median.
Step 1. Arrange the data in order: 45, 47, 48, 49, 50, 55, 56
Step 2. Select the middle value: 45, 47, 48, [49], 50, 55, 56. The median is 49.
The Mode
The mode is the value that appears most frequently
• A set of data can have more than one mode.
Example: The exam scores for six students are 81, 93, 84, 75, 81, 87. Since the score 81 occurs
most often, the modal score is 81.
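All three measures can be checked against the document's examples with Python's standard library:

```python
from statistics import mean, median, mode

# Mean example
print(round(mean([6.2, 9.3, 4.8, 7.2, 5.5]), 1))  # 6.6

# Median example: weights of seven grade 9 students
print(median([45, 50, 55, 48, 56, 49, 47]))       # 49

# Mode example: 81 occurs most often
print(mode([81, 93, 84, 75, 81, 87]))             # 81
```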
Measures of dispersion are concerned with the distribution of values
around the mean of the data. The most commonly used measures of dispersion
are:
 Range
 Standard deviation
 Variance
 Coefficient of variation
Range. For ungrouped data, the range is the difference between the
highest and lowest values in a set of data. Range = Highest Value − Lowest
Value
Standard Deviation
Standard deviation, also known as root mean squared deviation, expresses the average amount
of variation on either side of the mean.
Population standard deviation: σ = √( Σ(xi − μ)² / N )
Sample standard deviation: s = √( Σ(xi − x̄)² / (n − 1) )
Coefficient of Variation
C.V. = (S / x̄) × 100
Examples:
An analysis of the monthly wages paid (in Birr) to workers in two firms A and B belonging to the same industry gives the
following results:

Value        Firm A   Firm B
Mean wage    52.5     47.5
Median wage  50.5     45.5
Variance     100      121

Calculate the coefficient of variation for both firms.
Solution: S_A = √100 = 10 and S_B = √121 = 11
C.V._A = (S_A / X̄_A) × 100 = (10 / 52.5) × 100 = 19.05%
C.V._B = (S_B / X̄_B) × 100 = (11 / 47.5) × 100 = 23.16%
Since C.V._B > C.V._A, the wages in firm B are more variable than those in firm A.
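The firm comparison can be verified in code, using the variances given in the table:

```python
import math

def coeff_of_variation(mean_wage, variance):
    """C.V. = (standard deviation / mean) * 100"""
    return math.sqrt(variance) / mean_wage * 100

cv_a = coeff_of_variation(52.5, 100)  # firm A: S = 10
cv_b = coeff_of_variation(47.5, 121)  # firm B: S = 11

print(round(cv_a, 2), round(cv_b, 2))  # 19.05 23.16
print(cv_b > cv_a)  # True: firm B's wages are more variable
```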
Unit 5: Inferential statistics
Correlation techniques
Correlation analysis aims to measure the degree to which two parametric variables can vary together
by means of a single index
Two variables can have association if one variable has relation to another one.
 Independent variable (X)
 Dependent variable (Y)
• Dependent Variable: The variable that is being predicted or estimated,
• Independent Variable: The variable that provides the basis for estimation. It is the predictor variable.
When only two variables (dependent and independent) are involved, the correlation is called simple correlation
Measures of correlation are employed to explore three points, namely:
 presence or absence of correlation, that is, whether or not there is a correlation between the
variables in question;
 direction of correlation, that is, if there is a correlation, whether it is Positive or negative;
 Strength of correlation, that is, whether an existing correlation is strong or weak
Relationships between variables have both direction and strength.
• Direction
 positive,
 negative,
 indeterminate
• Strength the magnitude of the association
 Very Strong
 Strong
 Weak
 No relationship
 The direction and strength of a correlation are expressed in
the coefficient of correlation.
 A zero correlation indicates that there is no correlation between the
variables.
 The sign in front of the coefficient indicates whether the variables change
in the same direction (positive correlation) or in opposite directions
(negative correlation). This quantitative expression of the relationship
between dependent and independent variables is known as the coefficient
of correlation. The most commonly used coefficient of correlation in
parametric statistics is the Pearson product-moment coefficient of correlation, r.
 It is the most common measure of association scaled at an interval level
The Coefficient of Correlation, r
• The Coefficient of Correlation (r) is a measure of the strength of the
relationship between two variables.
• It requires interval or ratio scaled data (variables)
• It can range from – 1 to 1
• Values of -1 or 1 indicate perfect correlation.
• Values close to 0.0 indicate weak correlation
• Zero value indicates no relationship
• Negative values indicate an inverse relationship and positive values
indicate a direct relationship
If the coefficient has a value:
• under 0.20, it indicates very weak correlation
• 0.20 - 0.40 = weak
• 0.41 - 0.70 = moderate
• 0.71 - 0.90 = strong
• above 0.90 = very strong
The formula is:
r = Σxy / √( (Σx²)(Σy²) )
Where:
x = X − X̄
y = Y − Ȳ
Example 1: The following data are the amounts of yield farmers obtain by applying different
amounts of fertilizer. Fertilizer application (X): 10, 12, 14, 15, 16, 17, 18, 20, 21, 25 kg ha-1;
yield per hectare (Y): 0.5, 0.8, 1, 1.1, 1.2, 1.3, 1.3, 1.4, 1.4, 1.5 Mg ha-1. Is there a
relationship between the variables? Indicate the direction and strength of the relationship.

Fertilizer-N (kg ha-1) (X)   Yield (Mg ha-1) (Y)
10                           0.5
12                           0.8
14                           1.0
15                           1.1
16                           1.2
17                           1.3
18                           1.3
20                           1.4
21                           1.4
25                           1.5
There is a relationship between yield and fertilizer application: it is positive and very
strong (r = 0.93), which means that as the farmer applies a greater amount of fertilizer
he will obtain a larger amount of yield from a given plot of land.
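Using the deviation-score formula r = Σxy / √(Σx²Σy²), the coefficient for the fertilizer–yield data can be computed as:

```python
import math

X = [10, 12, 14, 15, 16, 17, 18, 20, 21, 25]          # fertilizer, kg ha-1
Y = [0.5, 0.8, 1, 1.1, 1.2, 1.3, 1.3, 1.4, 1.4, 1.5]  # yield, Mg ha-1

n = len(X)
mx, my = sum(X) / n, sum(Y) / n
x = [xi - mx for xi in X]  # deviations x = X - Xbar
y = [yi - my for yi in Y]  # deviations y = Y - Ybar

r = sum(a * b for a, b in zip(x, y)) / math.sqrt(
    sum(a * a for a in x) * sum(b * b for b in y))
print(round(r, 2))  # 0.93 -- positive and very strong
```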
Coefficient of Determination
If we wish to know the proportion of variance in Y explained by X, we
can calculate the coefficient of determination by squaring the
correlation coefficient (r2). The coefficient of determination (r2)
measures that part of the total variance of Y that is accounted for by
knowing the value of X. In the example showing the correlation
between yield and application of chemical fertilizer, r = 0.93.
Therefore, (r2) = 0.86.

About 86% of the variance in grain yield can be explained by the
variance in applied chemical fertilizer. The remaining variation, i.e.
14%, is due to other factors like soil fertility, rainfall availability, etc.
Regression Analysis
Regression is a method that allows us to make predictions about the
value of one variable (Y), if another (X) is known. Predictions are made
by means of the regression line, the definition of which is given by the
formula containing the intercept and the slope of the line. The
regression formula is as follows:
The regression equation: Y = a + bX, where:
Y and X are variables
a and b are constants
The constant a stands for the value of Y when X = 0, and represents the
Y-intercept (a is the estimated Y value when X = 0).
The constant b is the slope of the line: the average change in Y for
each change of one unit in X
The correlation coefficient estimates the degree of closeness of the
linear relationship between two variables. But, the most interesting
questions about these variables are:
 How much does one variable change for a given change in the
other?
 How accurately can the value of one variable be predicted from
the knowledge of the other?
These questions can be answered with the aid of regression
analysis.
Linear regression analysis can be expressed by the linear algebraic model:
Ŷ = a + bX
Where:
a is the intercept,
b is the slope of the line, and
Ŷ (called "Y hat") is the expected value of Y for a given value of X.
This is normally called the estimating equation.
The values of a and b are estimates, and when values of X are
substituted into the equation, its solution provides estimates of Y
for the given values of X.
For example, the regression of grain yield (Y, Mg ha-1) on applied N-fertilizer (X, kg ha-1)
in the following table is explained hereafter.
Example 1: Using the above data, find the regression line equation, draw the graph, and
interpret the result.

Fertilizer-N (kg ha-1)  10   12   14   15   16   17   18   20   21   25
Yield (Mg ha-1)         0.5  0.8  1.0  1.1  1.2  1.3  1.3  1.4  1.4  1.5
[Figure: scatter plot of grain yield (Mg ha-1) against applied N-fertilizer (kg ha-1)]
[Figure: the same scatter plot with the fitted regression line y = 0.065x + 0.062, R² = 0.8609]
The sign of b expresses the direction of the relationship between Y and X:
when b is positive, it implies that an increase in X is accompanied by an increase in
Y, and
when b is negative, Y decreases as X increases.
The regression coefficient, b, is the rate of change (constant) of yield and represents
the slope of the linear function between the two variables.
In this example, b is equal to 0.065. In other words, the grain yield increases 0.065
Mg ha-1 for each one-kg increase in N-fertilizer.
The constant a is the value of Y when X is equal to zero. This constant is called the Y-
intercept, and represents the point at which the linear function passes through the Y-
axis.
In this example, when N-fertilizer is equal to 0, grain yield equals 0.062.
Remember that b indicates the rate of change in the dependent variable (Y) given one
unit change in X
a and b are obtained by the following equations. For the fertilizer (X) and yield (Y) data,
n = 10, ΣX = 168, ΣY = 11.5, ΣXY = 204.7, and ΣX² = 3000:

b = [nΣXY − (ΣX)(ΣY)] / [nΣX² − (ΣX)²] = [10(204.7) − (168)(11.5)] / [10(3000) − (168)²] = 115 / 1776 = 0.065

a = [(ΣY)(ΣX²) − (ΣX)(ΣXY)] / [nΣX² − (ΣX)²] = [(11.5)(3000) − (168)(204.7)] / 1776 = 110.4 / 1776 = 0.062

(equivalently, a = Ȳ − bX̄ = 1.15 − 0.065 × 16.8 ≈ 0.062)

• Thus, the equation is:

Ŷ = 0.062 + 0.065X

• Using this equation we can extrapolate or interpolate (estimate) Y values (yields) for
given rates of applied N-fertilizer.

• Extrapolation is the estimation of the dependent variable using values beyond
the given data set.

• Interpolation is the estimation of the dependent variable using values between
the given data set.

For example, there is no observed yield for farmers applying less than 10 kg
N-fertilizer ha-1. The estimated yield for a farmer applying 4 kg N-fertilizer ha-1
would be: Ŷ = 0.062 + 0.065 × 4 = 0.322 Mg ha-1.
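The least-squares computation above, and the 4 kg estimate, can be reproduced as a short script:

```python
X = [10, 12, 14, 15, 16, 17, 18, 20, 21, 25]          # fertilizer, kg ha-1
Y = [0.5, 0.8, 1, 1.1, 1.2, 1.3, 1.3, 1.4, 1.4, 1.5]  # yield, Mg ha-1
n = len(X)

sx, sy = sum(X), sum(Y)                 # 168, 11.5
sxy = sum(a * b for a, b in zip(X, Y))  # 204.7
sxx = sum(a * a for a in X)             # 3000

b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)  # slope = 115 / 1776
a = sy / n - b * sx / n                        # intercept = Ybar - b * Xbar

print(round(b, 3), round(a, 3))  # 0.065 0.062
print(round(a + b * 4, 2))       # estimated yield at 4 kg: 0.32 Mg ha-1
```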
Parametric and non parametric tests
• Parametric tests are statistical tests concerning population parameters such as means,
variances, and proportions of the populations from which the samples were
selected. One assumption is that these populations are normally distributed.
• Statistical tests such as the z-test, t-test, F-test and ANOVA are typical examples
of parametric tests. For instance, for comparing means we use the t-test:
comparing between groups (independent t-test) or comparing measurements
within the same subject (paired t-test).
What is a non parametric test?
• Statistical analyses which do not depend upon knowledge of the distribution
and parameters of the population are called non-parametric statistics or
distribution free statistics.
Example of non parametric statistics
• Popular Nonparametric Tests
• Sign Test
• Wilcoxon Rank Sum Test
• Wilcoxon Signed Rank Test
• Kruskal Wallis H-Test
• Kolmogorov-Smirnov Test
• Friedman’s Fr-Test
The differences between parametric and non-parametric tests:

Parametric test                                        | Non-parametric test
Information about the population is completely known   | No information about the population is available
Specific assumptions are made regarding the            | No assumptions are made regarding the population
population, e.g. a normal distribution                 | (distribution-free)
The null hypothesis concerns a parameter of the        | The null hypothesis is free of population
population distribution                                | parameters
The test statistic is based on the distribution        | The test statistic is arbitrary
Applicable only to variables                           | Applicable to both variables and attributes
No parametric test exists for nominal scale data       | Non-parametric tests exist for nominal and
                                                       | ordinal scale data
A parametric test is powerful where it exists          | Not as powerful as a parametric test
Unit 6: The techniques of hypothesis testing
What is a Hypothesis?
Hypothesis: A statement about the value of a population parameter
developed for the purpose of testing.
• It is also said to be an intelligent guess
• It is a conjecture about the population
There are two types of statistical hypothesis for each situation: the null
hypothesis and the alternative hypothesis.
The null hypothesis symbolized by Ho is a statistical hypothesis that
states that there is no difference between a parameter and a specific
value or there is no difference between two parameters.
The alternative hypothesis, symbolized by H1, is a statistical hypothesis
that states that there is a difference between a parameter and a specific
value, or that there is a difference between two parameters.
What is Hypothesis Testing?
Hypothesis testing: A procedure based on sample evidence and
probability theory used to determine whether the hypothesis is a
reasonable statement and should not be rejected or is unreasonable and
should be rejected.
Hypothesis testing is a form of statistical inference that uses data from a sample to draw
conclusions about a population parameter or a population probability distribution
The hypothesis-testing procedure involves using sample data to
determine whether or not Ho can be rejected. If Ho is rejected, the
statistical conclusion is that the alternative hypothesis H1 is true
By rejecting a true null hypothesis, we commit a Type I error. The
probability of committing a Type I error is α (alpha). The probability of
committing the other type of error, called a Type II error, is designated
by the Greek letter β (beta).
• Type I Error: Rejecting the null hypothesis when it was actually true.
• Type II Error: Accepting the null hypothesis when it was actually false.
• Test statistic: A value, determined from sample information, used to
determine whether or not to reject the null hypothesis.
The following table summarizes the decisions the researcher could make
and the possible consequences.

            Ho is true         Ho is false
Reject Ho   Type I error       Correct decision
Accept Ho   Correct decision   Type II error


Test statistic: A value, determined from sample information,
used to determine whether to reject the null hypothesis.
When testing for the population mean from a large sample and
the population standard deviation is known, the test statistic is
given by:
z = (x̄ − μ) / (σ / √n)
Hypothesis   Symbols     Parameters
Ho           =, ≤, ≥     μ, σ², π, ρ
H1           ≠, <, >     μ, σ², π, ρ
Steps in Hypothesis Testing
Step 1: State the null hypothesis (Ho) and the alternate hypothesis
(H1)
 The first step is to state the hypothesis being tested
Step 2: Select the level of significance
 Level of significance: the probability of rejecting the null hypothesis when it
is actually true. The level of significance is designated α. It is sometimes
called the level of risk. There is no single level of significance that is applied to
all tests. A decision is made to use the .05 level (often stated as the 5
percent level), the .01 level, the .10 level, or any other level between 0 and 1.
Step 3: Compute the test statistic
 There are many test statistics; the z-test, t-test, F and χ² are only a few
Step 4: Formulate the decision rule
 The decision rule states the conditions under which Ho is rejected
Step 5: Make a decision
 This step involves the decision concerning the null hypothesis
Step 6: Interpret the decision
 For example: there is sufficient evidence at the .05 level to allow us to reject
the null hypothesis
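The six steps can be illustrated with a z-test for a population mean; all the numbers (hypothesized mean 50, sample mean 51.5, σ = 5, n = 64) are invented for the sketch:

```python
import math

# Step 1: Ho: mu = 50 versus H1: mu != 50 (two-tailed)
mu0, xbar, sigma, n = 50, 51.5, 5, 64   # hypothetical figures

# Step 2: level of significance alpha = .05 -> critical value 1.96
critical = 1.96

# Step 3: compute the test statistic z = (xbar - mu0) / (sigma / sqrt(n))
z = (xbar - mu0) / (sigma / math.sqrt(n))
print(z)  # 2.4

# Step 4: decision rule -- reject Ho if |z| > 1.96
# Step 5: make the decision
reject = abs(z) > critical
print(reject)  # True

# Step 6: there is sufficient evidence at the .05 level to reject Ho
```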
