SCR 314 Social Statistics Lecture Notes 2021

This document discusses social statistics and the process of collecting and analyzing social data. It defines social statistics as using statistical methods to study human behavior. It also defines statistics and discusses why statistics are studied. The document then covers planning data collection, tools for data collectors, principles of data collection, and recruiting and training data collectors. The overall purpose is to outline the process of gathering and evaluating social data using statistical methods.

Uploaded by

Maxwel
Copyright
© © All Rights Reserved

SCR 314 SOCIAL STATISTICS LECTURE NOTES

TOPIC 1: STATISTICS AND SOCIAL RESEARCH


What is Social Statistics?
Social statistics is the use of statistical measurement systems to study human behavior in a
social environment. This is mainly done through sampling a particular group of people,
evaluating a particular subset of data obtained about a group of people, or by observation and
statistical analysis of a set of data that relates to people and their behaviors.
What is statistics then?
Statistics, like many other sciences, is a developing discipline. It is not static; it has developed
gradually over the last few centuries and has been defined in different ways at different times.
Some of the definitions are reproduced here:
1. According to Wikipedia, the free encyclopedia, it is the study of the collection, organization,
analysis, and interpretation of data. It deals with all aspects of this, including the planning of
data collection in terms of the design of surveys and experiments.
2. Houghton Mifflin (2005) defines it as the branch of mathematics that deals with the
collection, organization, analysis, and interpretation of numerical data. Statistics is
especially useful in drawing general conclusions about a set of data from a sample of the
data.
3. It is a branch of applied mathematics concerned with the collection and interpretation of
quantitative data and the use of probability theory to estimate population parameters.
4. According to the World English dictionary (2011), statistics is the science that deals with the
collection, classification, analysis, and interpretation of numerical facts or data, and that, by
use of mathematical theories of probability, imposes order and regularity on aggregates of
more or less disparate elements.
5. In the modern definition “statistics are the numerical statement of facts capable of analysis
and interpretation and the science of statistics is the study of the principles and the methods
applied in collecting, presenting, analysis and interpreting the numerical data in any field of
inquiry.”
Why study statistics?
Statistics plays a vital role in every field of human activity. It is important in
determining the existing position of a particular phenomenon in a country, for example per capita income,
unemployment, population growth rate, housing, schooling, medical facilities, crime rate and living
standards. In particular, statistics holds a central position in almost every field,
such as Industry, Commerce, Trade, Physics, Chemistry, Economics, Mathematics, Biology, Botany,
Psychology, Astronomy, Criminology and Social Work. In essence, the application of statistics is
very wide. Statistics are important for many reasons. For example, statistics may be useful in:
1. Evaluating the quality of services available to a particular group or organization
2. Analyzing the behaviors of groups of people in their environment and in special situations
3. Determining the wants or needs of people through statistical sampling
4. Providing simple yet instant information on the matter at hand
5. Aiding research and studies in different fields such as
economics, social sciences, business, medicine and many others
6. Providing a vivid presentation of collected and organized data through the use of figures,
charts, diagrams and graphs
7. Providing more critical analyses of information to enable decision making, law making
or policy formulation
8. Making decisions that affect our daily lives and hence
our personal welfare
Examples of application of statistics in various fields:
Statistics in School
• May be used to see how students are performing collectively in their studies.
• Gives information about changes in the school's population for planning and allocation of
resources.
• Helps in processing evaluations and surveys intended to improve the school's
system.
• Helps determine the relationship between educational performance and other factors such as
socioeconomic background, gender, and region.
Statistics in Social Science
• Helps provide the government with more information about its citizens for planning and
allocation of resources.
• Statistical results may initiate social reforms that improve the standard of living.
• Aids in identifying which problems or matters to prioritize.
Statistics in Sports
• Gives a vivid summary of the events in a game with the help of well-tabulated scores and
other parameters.
Statistics in Science
• Endangered species of wildlife can be protected through regulations and laws
developed using statistics.
• Epidemics and diseases are monitored with the aid of statistics.
• Helps in the evaluation of medical practices and the effectiveness of drugs.
Statistics in Criminology
Question:
Discuss the place of statistics in Criminology and Social Work
a) Criminology
• To determine the predominant crimes in society
• To establish the perpetrators of the various crimes in society
• To establish the victims of various crimes
• To formulate laws and policies based on statistics
In summary, there are at least three reasons for studying statistics:
(1) Data are everywhere,
(2) Statistical techniques are used to make many decisions that affect our lives, and
(3) No matter what your career, you will make professional decisions that involve data. An
understanding of statistical methods will help you make these decisions more effectively

TOPIC 2: PLANNING, CODING, GROUPING AND PROCESSING DATA
• Data collection plan refers to a document that defines all details concerning data
collection, including how much and what type of data is required, when and how it
should be collected.
• Some common data collection instruments used in the Social Sciences include
questionnaires, interview schedules, observation schedules, focus group discussions and
document analysis.
Purpose of data collection
According to Kombo, D. K. et al. (2006), data are collected for the following reasons:
• To stimulate new ideas by identifying areas related to the research topic
• To create awareness and improvement by highlighting the situation as it is
• To influence legislative policies and regulations
• To provide justification for an existing programme
• To evaluate the responsiveness and effectiveness of the study
• To promote decision making and resource allocation based on solid evidence
Why should we plan for data collection?
• It helps to ensure that the data gathered contain real information, useful to the
improvement effort.
• It prevents errors that commonly occur in the data collection process.
• It saves time and money that otherwise might be spent on repeated or failed attempts to
collect useful data.
Preparation for data collection
The success of any study depends on how well one plans for the actual data collection. Most important are
the data collectors in the field, who are in charge of gathering and recording accurate and reliable
data. Their preparation therefore cannot be taken for granted. In preparation for data collection
one needs to:
• plan the data collection visits;
• prepare the data collection forms needed for field visits;
• prepare information materials and tools for data collectors; and
• arrange for regular communication.

Steps for an effective data collection plan:

o Define the sample population


o Reflect on the research design
o Ensure research instruments are ready and in order
o Define the data or information to be collected or sought
o Request permission from the relevant authority, e.g. the National Council of Science and
Technology
o Pretest the instruments using a pilot study

Tools and information for data collectors

• A list of data collection teams and their contact details
• A schedule of visits to survey sites
• The contact details of the sites to be visited
• Copies of letter(s) of endorsement and introduction
• Relevant handouts or instruction sheets
• Pens (pencils should not be used to record data), a clipboard and other supplies
• A field notebook to record any significant events
• Field allowance for local expenses
• Get advice from affected people
Principles of data collection

• Explore specific needs, e.g. the vulnerability of various groups, to ensure that their benefits are
considered
• Seek accurate information on each event or opinion
• Watch for information that may have been misunderstood
• Identify neglected groups to ensure that their benefits are considered
• Understand changes and trends affecting the society
• Prepare for unexpected events that may happen
• Examine the effects of the study on the overall society
• Keep in mind how the information will be used
Recruiting and training data collectors

The people selected should have the following characteristics:

• Are respected and trusted by the respondents

• Are good listeners

• Have good interpersonal skills that set the respondents at ease

• Understand and are sensitive to the issues to be discussed

• Respond quickly to training and demonstrate that they will follow instructions and
apply the protocol

Dealing with study participants

• Ensuring study participants are treated properly

• Researchers and their assistants should be open and feel responsible for the needs of
participants

• Try to put yourself in the participant's shoes and try to think what the participant would
find unpleasant

• The researcher needs to ensure there is a relaxed atmosphere

• The researcher should provide concrete information prior to/during recruitment, including
what exactly is expected from participants

• The researcher should pass on any complaint from a participant immediately to the
project leader.

Ethical issues in data collection

• The researcher must justify the research via an analysis of the balance of costs and benefits

• Maintain confidentiality at all times

• Researchers are responsible for their own work and contribution

• Researchers should obtain consent from the subjects, who should give information
voluntarily

• Researchers should be open and honest with other researchers and research subjects

• The subjects' physical and psychological well-being should be protected

• Researchers must fully explain the research in advance and 'debrief' subjects afterwards

Choosing Data Processing Tools

• Use electronic data processing machines in studies involving a large number of cases or
complex analysis procedure.

• In deciding to use computers for statistical analysis consider:

• Number of cases.

• Number of variables under study.

• Complexity of statistical analysis.

• Number of analyses to be done.

• Availability of suitable computer programs.

• Availability of consultants familiar with the programs to be used.

Coding data

• Done to permit quantitative analysis.


• Data is converted to numeric codes representing attributes or measurement of variables.

• As much information as possible should be included at this stage to avoid losing details
that would otherwise be omitted.
• Understand the coding scheme and apply it consistently.
• Code categories in the instruments should be exhaustive and mutually exclusive: only one code is assigned to each
response category.
• Choice of coding procedure done according to the level of indicator or type of data
(numeric or categorical) you have.
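
The coding step described above can be sketched in Python. This is only an illustration under stated assumptions: the codebooks, the missing-value code 99 and the function name `code_response` are hypothetical and are not taken from the course material.

```python
# Hypothetical codebooks: each response category gets exactly one numeric code.
GENDER_CODES = {"male": 1, "female": 2}
AGREEMENT_CODES = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}
MISSING = 99  # assumed code for blank or unrecognised answers


def code_response(answer, codebook):
    """Return the numeric code for a raw answer, or MISSING if it is not in the codebook."""
    if answer is None:
        return MISSING
    return codebook.get(answer.strip().lower(), MISSING)


# Made-up questionnaire answers, used only to show the conversion.
raw_responses = [
    {"gender": "Female", "opinion": "Agree"},
    {"gender": "male", "opinion": "Strongly disagree"},
    {"gender": "Male", "opinion": ""},
]

coded = [
    (code_response(r["gender"], GENDER_CODES), code_response(r["opinion"], AGREEMENT_CODES))
    for r in raw_responses
]
print(coded)
```

Keeping an explicit code for missing answers means that no response is silently dropped, which is one way of keeping the code categories exhaustive.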

TOPIC 3: SAMPLING AND SAMPLING TECHNIQUES
Research involves studying a particular phenomenon to establish its position/status. The phenomenon the
researcher is interested in is called the target population.
What is a target population?
1. The members of a real or hypothetical set of people, events or objects to which the researcher
wishes to generalize the results of the study
2. The set of people or objects the researcher intends to reach or question
3. The population of individuals which we are interested in describing and about which we wish to make statistical
inferences
4. The collection of all individuals, families, groups, organizations, and events that we are
interested in finding out about. For example, all undergraduate students in MMUST
However, in research it is usually not possible to study the whole population, as is done in a census, because of
the cost, time and logistics involved. Social scientists therefore mostly opt to study a portion of the
target population with a view to making inferences about the whole target population. To do so they employ the concept of
sampling.
What is sampling?
1. The act, process, or technique of selecting a representative part of a population for the purpose of
determining parameters or characteristics of the whole population
2. The process of selecting a sub-set of people, events, cases or objects from a set in order to draw
conclusions about the entire set
3. Statistical method of obtaining representative data or observations from a group (lot, batch,
population, or universe).

4. The act, process, or technique of selecting an appropriate sample

5. The process of taking any portion of a population or universe as representative of that
population or universe.

What one needs to do before sampling?


1. Have a well-defined population
2. Have an adequately chosen sampling frame
3. Have a well-defined sampling unit
Sampling frame
A set of information or procedure used to identify a sample population for statistical treatment. A
sampling frame includes a numerical identifier for each individual, plus other identifying information
about characteristics of the individuals, to aid in analysis and allow for division into further frames for
more in-depth analysis.
The actual list of sampling units from which the sample, or some stage of the sample, is selected. It is
simply a list of the study population.
Sampling unit
That element or set of elements considered for selection in some stage of sampling.
A single section selected to research and gather statistics of the whole. For example, when studying a
group of college students, a single student could be a sampling unit.
Sample Size
The number of elements in the obtained sample
Why sample?
• Gathering data on a sample is less time consuming
• Gathering data on a sample is less costly
• Sampling is often the only practical method of data collection
• Sampling is often the only practical method of data analysis
• Sampling permits one to infer results from the sample to the population
• Sampling enables one to conduct research that would otherwise not be feasible
• A smaller data set is easier to supervise and check, which can improve the accuracy and quality of the data.

The end product in the sampling process is a sample. Samples are used in statistical testing when
population sizes are too large for the test to include all possible members or observations. A sample
should represent the whole population and not reflect bias toward a specific attribute.
What is a sample?
1. A subset containing the characteristics of a larger population
2. A set of individuals or items selected from a population for analysis to yield estimates of, or to
test hypotheses about, parameters of the whole population.
3. A sample is a smaller, manageable version of a larger group.
4. A portion, piece, or segment that is representative of a whole
5. A portion of the members of a set of people, events or objects which the researcher wishes to use
to generalize the results of the findings
6. A sample is some part of a larger body specially selected to represent the whole.
Types of samples
1. A biased sample
A sample in which the items are not selected as a result of probability. Items are selected because
they share some property.
2. A random sample
A sample whose selection is not biased but subject to probability, where each member of the
population has a chance of being selected.
Sampling techniques
Are strategies or methods used in selecting a sample from the target population.
There are mainly two types of sampling techniques in research namely probability and non-probability
sampling.
1. Probability sampling
Is a method of sampling that utilizes some form of random selection. In order to have a random selection
method, you must set up some process or procedure that assures that the different units in your population
have equal probabilities of being chosen. Humans have long practiced various forms of random selection,
such as picking a name out of a hat, or choosing the short straw. These days, we tend to use computers as
the mechanism for generating random numbers as the basis for random selection
Probability sampling methods
a) Simple Random Sampling

• A sampling scheme in which every possible subset of the population of the required size is
equally likely to be chosen as the sample.
• One way of selecting the sample is by means of a table of random numbers. SRS can be
with or without replacement.
b) Systematic Sampling/Interval Random Sampling

• Each element in the population has the same chance of being selected for the
sample.
• Every Kth person is selected, starting with a person randomly selected from among the first K
persons.
• This method is referred to as a systematic sample with a random start.
c) Stratified Sampling
• The population is classified into strata and separate samples are selected from
each stratum.
• The ultimate function of stratification is to organize the population into homogeneous
subsets and to select the appropriate number of elements from each.
Reasons for Stratification

• To increase sample efficiency (i.e. lower sampling variance)
• To ensure that certain key subgroups have a sufficient sample size
• The creation of strata permits the use of different sample designs for different portions of the
population.
Methods involved
Proportionate Stratified
This is where the stratum sample sizes are made proportional to the stratum population sizes, i.e. a
uniform sampling fraction is used.
Disproportionate Stratified
This is where a non-uniform sampling fraction is used.
d) Cluster Sampling
This is where all the elements in selected clusters are included in the sample.
Usually the sampling unit contains more than one population element.
2. Non-Probability Sampling
This is where the probability of inclusion in the sample is unknown.
Types of Nonprobability Sampling
a) Availability/Accidental Sampling
This is where the first available appropriate cases are used.
b) Quota Sampling
Selects quotas to represent sub-populations.
c) Purposive/Judgmental Sampling
Selecting sample on the basis of knowledge of the research problem to allow selection of
"typical" persons for inclusion in the sample.
d) Snowball Sampling
This is where researchers solicit help from respondents in identifying the population under study.
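
The probability sampling methods described earlier in this topic (simple random, systematic and proportionate stratified sampling) can be sketched with Python's standard `random` module. The sampling frame of 100 student IDs, the two strata, the seed and the function names are all assumptions made for illustration; this is a sketch, not a prescribed procedure.

```python
import random


def simple_random_sample(population, n, rng):
    """Draw n units without replacement; every subset of size n is equally likely."""
    return rng.sample(population, n)


def systematic_sample(population, k, rng):
    """Take every k-th unit after a random start among the first k units."""
    start = rng.randrange(k)
    return population[start::k]


def proportionate_stratified_sample(strata, fraction, rng):
    """Apply the same (uniform) sampling fraction to every stratum."""
    sample = {}
    for name, units in strata.items():
        n = round(len(units) * fraction)
        sample[name] = rng.sample(units, n)
    return sample


rng = random.Random(314)  # fixed seed so the sketch is repeatable

# Hypothetical sampling frame: 100 student IDs.
students = [f"S{i:03d}" for i in range(1, 101)]

srs = simple_random_sample(students, 10, rng)
systematic = systematic_sample(students, 10, rng)

# Hypothetical strata: 60 first-year and 40 second-year students.
strata = {"year1": students[:60], "year2": students[60:]}
stratified = proportionate_stratified_sample(strata, 0.10, rng)

print(srs)
print(systematic)
print({name: len(units) for name, units in stratified.items()})
```

With a sampling fraction of 0.10, the stratified sample takes 6 units from the stratum of 60 and 4 from the stratum of 40, so the sample mirrors the population proportions.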

TOPIC 4: CLASSIFICATION AND TABULATION OF DATA
1. Classifications of data
Data classification is the categorization of data for its most effective and efficient use.
a) According to Nature
Data can either be:
i) Quantitative data
The information obtained from numerical variables, e.g. age, bills, etc.
ii) Qualitative Data
It is the information obtained from variables in the form of categories, characteristics, names or labels or
alphanumeric variables (e.g. birthdays, gender etc.)
b) According to Source
i) Primary data
First- hand information obtained from autobiography, financial statement
ii) Secondary data
This is second-hand information obtained from biography, weather forecast, newspapers etc.
c) According to Measurement
i) Discrete data
Are countable numerical observations which assume whole numbers only
- has an equal whole number interval
- obtained through counting (e.g. corporate stocks, etc.)
ii) Continuous data
Measurable observations that assume both whole and decimals or fractions
-obtained through measuring (e.g. bank deposits, volume of liquid etc.)
d) According to arrangement
i) Ungrouped data
Is raw data which has been obtained from the field and in its natural form with no specific arrangement
ii) Grouped Data
Organized set of data arranged in a particular form as either tallying, simple frequency table or grouped
table

Tabulation of data

It is cumbersome to study or interpret large data without grouping it, even if it is arranged sequentially.
For this, the data are usually organized into groups called classes and presented in a table which gives the
frequency in each group. Such a frequency table gives a better overall view of the distribution of data and
enables a person to rapidly comprehend important characteristics of the data.
The process of placing classified data into tabular form is known as tabulation. A table is a systematic
arrangement of statistical data in rows and columns. Rows are horizontal arrangements whereas columns
are vertical arrangements. It may be simple, double or complex depending upon the type of classification.
Simple frequency tables
It is a table containing raw data arranged in ascending or descending order, indicating the
frequency of each value of the variable.
Example
A survey was taken on Maple Avenue. In each of 20 homes, people were asked how many cars were
registered to their households. The results were recorded as follows:
1, 2, 1, 0, 3, 4, 0, 1, 1, 1, 2, 2, 3, 2, 3, 2, 1, 4, 0, 0
Arrange the data in a simple frequency table

Number of cars (x)    Tally       Frequency (f)
0                     ||||        4
1                     ||||| |     6
2                     |||||       5
3                     |||         3
4                     ||          2
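
As a check on the table above, the same simple frequency table can be produced with `collections.Counter` from the Python standard library. The variable names are illustrative; the data are the 20 survey values given in the example.

```python
from collections import Counter

# Number of cars registered to each of the 20 households (from the example).
cars = [1, 2, 1, 0, 3, 4, 0, 1, 1, 1, 2, 2, 3, 2, 3, 2, 1, 4, 0, 0]

# Counter tallies how many times each value occurs.
frequency = Counter(cars)

print("Number of cars (x)  Frequency (f)")
for x in sorted(frequency):
    print(f"{x:<19} {frequency[x]}")
print(f"{'Total':<19} {sum(frequency.values())}")
```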

Grouped frequency tables


It is a table which groups data into classes using class intervals.
A class interval is the number of values in a given class, e.g. 2 – 5 is a class containing the values 2, 3, 4
and 5. The total number of values is 4, hence the class interval is 4.
Example
Thirty AA batteries were tested to determine how long they would last. The results, to the nearest minute,
were recorded as follows:
423, 369, 387, 411, 393, 394, 371, 377, 389, 409, 392, 408, 431, 401, 363, 391, 405, 382, 400, 381, 399,
415, 428, 422, 396, 372, 410, 419, 386, 390
The lowest value is 363 and the highest is 431.
Using the given data and a class interval of 10, the interval for the first class is 360 to 369 and includes
363 (the lowest value). Remember, there should always be enough class intervals so that the highest value
is included.
The completed frequency distribution table should look like this:

Battery life, minutes (x)    Tally        Frequency (f)
360–369                      ||           2
370–379                      |||          3
380–389                      |||||        5
390–399                      ||||| ||     7
400–409                      |||||        5
410–419                      ||||         4
420–429                      |||          3
430–439                      |               1
Total                                     30
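
The grouping in the table above can be reproduced with a short Python sketch. It assumes, as in the example, a class interval of 10 and a first class whose lower limit is the lowest value rounded down to a multiple of 10; the variable names are illustrative.

```python
# Battery life in minutes for the 30 batteries (from the example).
battery_life = [
    423, 369, 387, 411, 393, 394, 371, 377, 389, 409,
    392, 408, 431, 401, 363, 391, 405, 382, 400, 381,
    399, 415, 428, 422, 396, 372, 410, 419, 386, 390,
]

width = 10  # class interval used in the example

# Lower limit of the first class: lowest value rounded down to a multiple of the width.
lower = (min(battery_life) // width) * width

grouped = {}
while lower <= max(battery_life):
    upper = lower + width - 1
    # Count the observations that fall inside the class, limits included.
    grouped[(lower, upper)] = sum(lower <= x <= upper for x in battery_life)
    lower += width

for (lo, hi), f in grouped.items():
    print(f"{lo}-{hi}: {f}")
print("Total:", sum(grouped.values()))
```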

6.0 Probability

Probability is a branch of mathematics that deals with calculating the likelihood of a given event's
occurrence and can be expressed either as a fraction, decimal or percentage. The probability of an event
ranges between 0 and 1. An event with a probability of 1 is considered a certainty: for example, the
probability of dying. An event with a probability of 0 is considered an impossibility: for example, the
probability of being God.
In general, for any event A, the minimum value of P(A) is 0 and the maximum value of P(A) is 1. Therefore
the probability of any event satisfies 0 ≤ P(A) ≤ 1.
Definition of terms
a) Sample Space
It is the set of all possible elementary events or outcomes for the experiment.
Examples
i. For the experiment of throwing a coin, S = {H, T} where "S" represents the sample space, "H"
the elementary event of getting a head and "T" the elementary event of getting a tail.
ii. For the experiment of rolling a dice, S = {1, 2, 3, 4, 5, 6}, where "S" represents the sample space,
"1" the elementary event of the number 1 appearing on the top of the dice, "2" the elementary
event of the number 2 appearing on the top of the dice, ... and "6" the elementary event of the
number 6 appearing on the top of the dice.
iii. For the experiment of tossing 3 coins, S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
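iv. For the experiment of rolling 2 dice, S = {(1, 1), (1, 2), ..., (6, 6)}, the 36 ordered pairs of faces
v. For the experiment of throwing a dice and a coin, S = {1H, 1T, 2H, 2T, ..., 6H, 6T}, a total of 12 outcomes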
b) Event
It is any subset of the sample space or an event is one or more outcomes of an experiment.
Examples
i. In the experiment of rolling a dice, whose sample space is Ω = {1, 2, 3, 4, 5, 6}
a. Event of getting an even number (say Event "A"), would be represented as A = {2, 4, 6}
b. Event of getting a prime number (say Event "H"), would be represented as H = {2, 3, 5}
c. Event of getting a multiple of 3 (say Event "F"), would be represented as F = {3, 6}
A, H and F are all Subsets of Ω
ii. In the experiment of tossing 3 coins whose sample space is S = {HHH, HHT, HTH, HTT, THH,
THT, TTH, TTT}
a. Event of getting two heads (say Event "C"), would be represented as C = {HHT, HTH,
THH}
b. Event of getting all three of the same kind (say Event "G"), would be represented as G =
{HHH, TTT}. C and G are Subsets of S.
c) An outcome
An outcome is the result of a single trial of an experiment.
Example
The possible outcomes on landing on yellow, blue, green or red ball
The outcome of getting a 1, 2, 3, 4, 5 or 6 in throwing a dice
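
The sample spaces and events defined above can be enumerated with `itertools.product`. This is a minimal sketch; the variable names are illustrative and the representation of outcomes (strings for coins, tuples for combined experiments) is an assumption made for convenience.

```python
from itertools import product

coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]

# Sample spaces: every possible elementary outcome of each experiment.
three_coins = {"".join(outcome) for outcome in product(coin, repeat=3)}
two_dice = set(product(die, repeat=2))
die_and_coin = set(product(die, coin))

# Events are subsets of a sample space.
even = {x for x in die if x % 2 == 0}                        # event A
two_heads = {o for o in three_coins if o.count("H") == 2}    # event C
same_kind = {o for o in three_coins if len(set(o)) == 1}     # event G

print(sorted(three_coins))
print(len(two_dice), len(die_and_coin))
print(sorted(even), sorted(two_heads), sorted(same_kind))
```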

Probability of an event
When all outcomes are equally likely, the probability of event A is the number of ways event A can occur divided by the total number
of possible outcomes:
P(A) = (number of ways event A can occur) / (total number of possible outcomes)

Example
Experiment 1: A spinner has 4 equal sectors colored yellow, blue, green and red. After
spinning the spinner, what is the probability of landing on each color?

Outcomes: The possible outcomes of this experiment are yellow, blue, green, and red.

Probabilities:
P(yellow) = (# of ways to land on yellow) / (total # of colors) = 1/4
P(blue) = (# of ways to land on blue) / (total # of colors) = 1/4
P(green) = (# of ways to land on green) / (total # of colors) = 1/4
P(red) = (# of ways to land on red) / (total # of colors) = 1/4

Experiment 2: A single 6-sided die is rolled. What is the probability of each outcome? What is
the probability of rolling an even number? or rolling an odd number?

Outcomes: The possible outcomes of this experiment are 1, 2, 3, 4, 5 and 6.

Probabilities:
P(1) = (# of ways to roll a 1) / (total # of sides) = 1/6
P(2) = (# of ways to roll a 2) / (total # of sides) = 1/6
P(even) = (# of ways to roll an even number) / (total # of sides) = 3/6 = 1/2
P(odd) = (# of ways to roll an odd number) / (total # of sides) = 3/6 = 1/2
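
The two experiments above can be checked with exact fractions. The sketch below assumes equally likely outcomes; the helper name `probability` is illustrative.

```python
from fractions import Fraction


def probability(event, sample_space):
    """Favourable outcomes divided by all outcomes, assuming each outcome is equally likely."""
    favourable = set(event) & set(sample_space)
    return Fraction(len(favourable), len(sample_space))


spinner = {"yellow", "blue", "green", "red"}
die = {1, 2, 3, 4, 5, 6}

p_yellow = probability({"yellow"}, spinner)
p_one = probability({1}, die)
p_even = probability({2, 4, 6}, die)
p_odd = probability({1, 3, 5}, die)

print(p_yellow, p_one, p_even, p_odd)
```

Using `Fraction` keeps results such as 3/6 in lowest terms (1/2) without rounding.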

Mutually Exclusive Event


Definition: Two events are mutually exclusive if the occurrence of an event A bars the occurrence of
another event B. i.e. they cannot occur at the same time (i.e., they have no outcomes in
common).

Example:
Landing on a 1 when a dice is rolled excludes landing on a 2, 3, 4, 5, or 6.
Failing excludes passing and vice versa.

In general, if two events A and B are mutually exclusive, then the probability that A or B occurs is given by:
P (A or B) = P (A) + P (B)
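
The addition rule can be verified on a single roll of a die. The sketch below assumes a fair six-sided die; the second half is an added contrast (not from the notes) showing that the simple sum overstates the probability when the events share an outcome, in which case P(A or B) = P(A) + P(B) - P(A and B).

```python
from fractions import Fraction

die = {1, 2, 3, 4, 5, 6}


def p(event):
    """Probability of an event on one roll of a fair die."""
    return Fraction(len(event & die), len(die))


# Mutually exclusive events: rolling a 1 and rolling a 2 share no outcome.
a, b = {1}, {2}
exclusive_ok = (a & b == set()) and p(a | b) == p(a) + p(b)

# Not mutually exclusive: "even" and "prime" share the outcome 2.
even, prime = {2, 4, 6}, {2, 3, 5}
p_union = p(even | prime)
p_general = p(even) + p(prime) - p(even & prime)

print(exclusive_ok, p_union, p_general)
```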

Independent Events

Definition: Two events, A and B, are independent if the fact that A occurs does not affect the
probability of B occurring.
Some examples of independent events are:
• Landing on heads after tossing a coin AND rolling a 5 on a single 6-sided die.
• Choosing a marble from a jar AND landing on heads after tossing a coin.
• Choosing a 3 from a deck of cards, replacing it, AND then choosing an ace as the second card.
• Rolling a 4 on a single 6-sided die, AND then rolling a 1 on a second roll of the die.

In general, if two events A and B are independent, then the probability that both occur is given by:
P (A and B) = P (A) * P (B)
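
The multiplication rule can be checked by listing every outcome of tossing a coin and rolling a die together. The sketch assumes a fair coin and a fair die, so that all 12 combined outcomes are equally likely; the names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Combined sample space: (coin face, die face), 2 x 6 = 12 outcomes.
space = set(product("HT", range(1, 7)))


def p(event):
    """Probability of an event in the combined coin-and-die experiment."""
    return Fraction(len(event), len(space))


heads = {o for o in space if o[0] == "H"}   # event A: coin lands heads
five = {o for o in space if o[1] == 5}      # event B: die shows 5

p_both = p(heads & five)
print(p_both, p(heads) * p(five))
```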

WORKED EXAMPLES OF NON-PARAMETRIC TESTS

1. CONTINGENCY TABLES/CHI SQUARE STATISTICS

A contingency table is a table that shows the relationship between two categorical variables. The
chi-square statistic measures how far the observed counts in the table depart from the counts that
would be expected if the two variables were unrelated. All else equal, the greater the chi-
square statistic, the stronger the evidence of a relationship. The chi-square statistic is usually reported at the
bottom of a contingency table. The probability associated with the chi-square statistic is
the probability of obtaining a statistic at least this large if there were in fact no relationship
between these same two variables in the population from which you drew your sample.

What is a Chi Square?

It is a statistical technique which compares the tallies or counts of categorical
responses between two (or more) independent groups.
When do we use a Chi Square Test?
When we examine the relationship between two categorical variables
Assumptions of Chi Square
The statistics generated by the computer for chi-square are only valid if the data
meet the following conditions:
a) Both the independent and dependent variables are categorical.
b) Researchers used a random sample to collect data.
c) Researchers had an adequate sample size. Generally the sample size should be at least 100.
d) The expected number of respondents in each cell should be at least 5. If not, you can use Fisher's
Exact test or other tests.
e) The categories of each variable must be mutually exclusive.
f) Data should be in a contingency table.
Characteristics of the chi square statistic
It is relatively easy to interpret a chi-square statistic if you know three things:
• First – all else equal, the greater the chi-square statistic, the stronger the evidence of a relationship between
the dependent and independent variables.
• Second – the lower the probability associated with a chi-square statistic, the stronger the
evidence of a relationship between the dependent and independent variables.
• Third – if the probability is .05 or less, you can generalize from a random sample to the
population and claim that the two variables are associated in the population.
Question 1
A public opinion poll surveyed a simple random sample of voters to establish whether there is a
relationship between gender and voting preference. Respondents were classified by gender (male
or female) and by voting preference (Republican, Democrat, or Independent). Results are shown
below.
                Voting Preferences
Gender      Republican    Democrat    Independent
Male        200           150         50
Female      250           300         50

a) Giving reasons, which is the best statistical test for analysing the data?
b) State and explain any three assumptions of the test statistic identified in a above
c) State the null hypothesis of the study.
d) At 0.05 significance level, does the data support the null hypothesis?
e) Report your findings.
                Voting Preferences
Gender      Republican    Democrat    Independent    Total
Male        200           150         50             400
Female      250           300         50             600
Total       450           450         100            1000

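Here O is the observed count in each cell and E is the expected count, where
E = (row total x column total) / grand total. For example, for Male Republicans E = (400 x 450) / 1000 = 180.

O       E       O-E     (O-E)^2     (O-E)^2/E
200     180      20     400          2.2222
150     180     -30     900          5.0000
50      40       10     100          2.5000
250     270     -20     400          1.4815
300     270      30     900          3.3333
50      60      -10     100          1.6667
Total                               16.2037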

The chi square calculated is 16.2037


The degrees of freedom (df) for the chi square are given by df = (c-1)(r-1) = (3-1)(2-1) = 2 x 1 = 2, where c is the number of columns and r is the number of rows.
At df = 2 and significance level = 0.05, the critical chi square is 5.991 from the table of critical values of the chi square.
Since the calculated chi square (16.2037) is greater than the critical chi square (5.991), we reject the null hypothesis that voting preference is independent of gender. The data indicate that voting preference and gender are associated.
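The calculation for Question 1 can be checked with a short script. This is a minimal sketch in plain Python, not part of the original notes; the function name `chi_square` and the constant `CRITICAL_05_DF2` are illustrative choices, and the critical value is simply copied from the table of critical values (df = 2, 0.05).

```python
# Observed counts from Question 1.
# Rows: Male, Female. Columns: Republican, Democrat, Independent.
observed = [
    [200, 150, 50],
    [250, 300, 50],
]


def chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            # Expected count = (row total x column total) / grand total
            e = row_totals[i] * col_totals[j] / grand_total
            stat += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df


stat, df = chi_square(observed)
CRITICAL_05_DF2 = 5.991  # copied from the table of critical values (df = 2, 0.05)
print(f"chi-square = {stat:.4f}, df = {df}, reject H0: {stat > CRITICAL_05_DF2}")
```

The same function should work for any r x c table of counts, provided the appropriate critical value is read from the table for the resulting degrees of freedom.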
Question 2
A researcher sought to establish the relationship between stream and perception in mathematics.
The results are tabulated below.
Perception in mathematics
Stream Excellent Average Poor
3A 88.7 60.2 40.1
3B 82.6 64.2 37.3
3C 85.6 66.4 42.8
a) Giving reasons, which is the best statistical test for analysing the data?
b) State the null hypothesis to be tested.
c) At 0.05 significance level, does the data support the null hypothesis?
d) Report your findings.
Critical Values of Chi square

Level of Significance
df 0.20 0.10 0.05 0.02 0.01 0.001
1 1.642 2.706 3.841 5.412 6.635 10.828
2 3.219 4.605 5.991 7.824 9.210 13.816
3 4.642 6.251 7.815 9.837 11.345 16.266
4 5.989 7.779 9.488 11.668 13.277 18.467
5 7.289 9.236 11.070 13.388 15.086 20.515
6 8.558 10.645 12.592 15.033 16.812 22.458
7 9.803 12.017 14.067 16.622 18.475 24.322
8 11.030 13.362 15.507 18.168 20.090 26.124
9 12.242 14.684 16.919 19.679 21.666 27.877
10 13.442 15.987 18.307 21.161 23.209 29.588
11 14.631 17.275 19.675 22.618 24.725 31.264
12 15.812 18.549 21.026 24.054 26.217 32.909
13 16.985 19.812 22.362 25.472 27.688 34.528

2. SPEARMAN’S RANK CORRELATION
The Spearman rank-order correlation coefficient, also referred to as the Spearman correlation coefficient or Spearman's rho, is a nonparametric measure of the strength and direction of the association between two variables measured on at least an ordinal scale. It is denoted by the symbol rs (or the Greek letter ρ, pronounced rho). The test is used either for ordinal variables or for continuous data that have failed the assumptions necessary for conducting Pearson's product-moment correlation. For example, you could use a Spearman's correlation to understand whether there is an association between exam performance and time spent revising, or whether there is an association between depression and length of unemployment.
Assumptions of Spearman’s Rank Correlation
1. Your two variables should be measured on an ordinal, interval or ratio scale.
2. There needs to be a monotonic relationship between the two variables. A monotonic
relationship exists when either the variables increase in value together, or as one variable
value increases, the other variable value decreases. Whilst there are a number of ways to
check whether a monotonic relationship exists between your two variables, we suggest
creating a scatterplot using SPSS Statistics, where you can plot one variable against the other,
and then visually inspect the scatterplot to check for monotonicity. The following graphs
illustrate monotonic functions:

[Figure: three scatterplot panels titled "Monotonically increasing", "Monotonically decreasing" and "Not monotonic"]


• Monotonically increasing: as the x variable increases the y variable never decreases;
• Monotonically decreasing: as the x variable increases the y variable never increases;
• Not monotonic: as the x variable increases the y variable sometimes decreases and sometimes increases.
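The three definitions above can be expressed as a small check on paired data. This is only an illustrative sketch: the function name `monotonic_direction` is hypothetical, and it applies the definitions strictly, whereas real data usually contain noise and are better judged from a scatterplot as suggested above.

```python
def monotonic_direction(x, y):
    """Classify paired data using the strict definitions of monotonicity."""
    # Order the pairs by x, then look at how y changes from one point to the next.
    ys = [v for _, v in sorted(zip(x, y))]
    steps = [b - a for a, b in zip(ys, ys[1:])]
    if all(s >= 0 for s in steps):
        return "increasing"      # y never decreases as x increases
    if all(s <= 0 for s in steps):
        return "decreasing"      # y never increases as x increases
    return "not monotonic"       # y sometimes rises and sometimes falls


print(monotonic_direction([1, 2, 3, 4], [1, 1, 2, 5]))
print(monotonic_direction([1, 2, 3], [3, 2, 2]))
print(monotonic_direction([1, 2, 3], [1, 3, 2]))
```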
Spearman’s correlation coefficient
Spearman's correlation coefficient is a statistical measure of the strength of a monotonic relationship between paired data. In a sample it is denoted by rs and is by design constrained as follows: -1 ≤ rs ≤ 1
Its interpretation is similar to that of Pearson's, e.g. the closer rs is to ±1, the stronger the monotonic relationship. Correlation is an effect size, so we can verbally describe the strength of the correlation using the following guide for the absolute value of rs:
.00-.19 "very weak"; .20-.39 "weak"; .40-.59 "moderate"; .60-.79 "strong"; .80-1.0 "very strong"
Formula for calculating Spearman’s correlation coefficient

The coefficient can be calculated using the following formula:

$$ r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} $$

Where: Σd² is the sum of the squared differences between the pairs of ranks, and n is the number of pairs.

The advantages of this coefficient are that, if calculation is to be done by hand, it is easier to
calculate, and can be used for any data that can be ranked - which includes quantitative data.

Procedure for calculating the Spearman’s correlation coefficient

1. Create a table from your data.
2. Rank the two data sets. Ranking is achieved by giving the ranking '1' to the biggest number in a column, '2' to the second biggest value and so on. The smallest value in the column will get the last (numerically largest) rank. This should be done for both sets of measurements. Tied scores are given the mean (average) rank.
3. Find the difference in the ranks (d): this is the difference between the ranks of the two values on each row of the table.
4. Square the differences (d²) to remove negative values and then sum them (Σd²).
5. Substitute Σd² and n into the formula $r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$ to calculate the Spearman's correlation coefficient.
Example
A researcher sought to establish whether the price of a bottle of water decreases as distance from the Contemporary Art Museum (CAM) increases. The results are tabulated below.
Convenience Store Distance from CAM (m) Price of 50cl bottle (€)
1 50 1.80
2 175 1.20
3 270 2.00
4 375 1.00
5 425 1.00
6 580 1.20
7 710 0.80
8 790 0.60
9 890 1.00
10 980 0.85

a) Which is the best test statistic suitable to analyse the data and why?
b) State the assumptions of the identified test statistic
c) State the null hypothesis to be tested
There is no significant relationship between the price of a convenience item and distance
from the Contemporary Art Museum.
d) Does the data support the null hypothesis?
Calculate the value of Spearman's correlation coefficient using steps 1 to 4 above. Here d = rank distance - rank price.

Store  Distance from CAM (m)  Rank distance  Price of 50cl bottle (€)  Rank price  d  d²
1  50  10  1.80  2  8  64
2  175  9  1.20  3.5  5.5  30.25
3  270  8  2.00  1  7  49
4  375  7  1.00  6  1  1
5  425  6  1.00  6  0  0
6  580  5  1.20  3.5  1.5  2.25
7  710  4  0.80  9  -5  25
8  790  3  0.60  10  -7  49
9  890  2  1.00  6  -4  16
10  980  1  0.85  8  -7  49
Σd² = 285.5

• Calculate the coefficient (rs) using the formula:

$$ r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} $$

• The answer will always be between 1.0 (a perfect positive correlation) and -1.0 (a perfect negative correlation).
• Now put all the values into the formula.
• Find Σd² by adding up all the values in the d² column. In our example this is 285.5. Multiplying this by 6 gives 1713.
• Now for the bottom line of the equation. The value n is the number of sites at which you took measurements; in our example this is 10. Substituting into n(n² - 1) we get 10(10² - 1) = 10(100 - 1) = 10 x 99 = 990.
• We now have rs = 1 - (1713/990) = 1 - 1.7303 = -0.7303.
What does this rs value of -0.73 mean?
The closer rs is to +1 or -1, the stronger the likely correlation. A perfect positive correlation is +1 and a perfect negative correlation is -1. The rs value of -0.7303 suggests a fairly strong negative relationship.
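The ranking, the Σd² total and the coefficient for this example can be reproduced with a short script. This is a sketch in plain Python rather than the notes' own method of working; the helper names `rank_descending`, `spearman_rs` and `strength` are illustrative, and the strength labels are taken from the verbal guide given earlier for the absolute value of rs.

```python
distance = [50, 175, 270, 375, 425, 580, 710, 790, 890, 980]
price = [1.80, 1.20, 2.00, 1.00, 1.00, 1.20, 0.80, 0.60, 1.00, 0.85]


def rank_descending(values):
    """Rank 1 = largest value; tied values share the mean of their ranks."""
    ordered = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1   # first position occupied by this value
        count = ordered.count(v)       # how many values are tied with it
        ranks.append(first + (count - 1) / 2)
    return ranks


def spearman_rs(x, y):
    """Return (rs, sum of d^2) using rs = 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    rx, ry = rank_descending(x), rank_descending(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    n = len(x)
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1)), sum_d2


def strength(r):
    """Verbal label for |rs|, following the guide in the notes."""
    a = abs(r)
    if a < 0.20:
        return "very weak"
    if a < 0.40:
        return "weak"
    if a < 0.60:
        return "moderate"
    if a < 0.80:
        return "strong"
    return "very strong"


rs, sum_d2 = spearman_rs(distance, price)
print(f"sum of d^2 = {sum_d2}, rs = {rs:.4f} ({strength(rs)})")
```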

A further technique is now required to test the significance of the relationship. We do so by comparing the calculated Spearman rank correlation coefficient with the critical value, which is read from the table of critical values at the point where the level of significance (one- or two-tailed, depending on how Ho is stated) meets the degrees of freedom (df).
The degrees of freedom for Spearman's rank correlation are given by the number of pairs in your sample minus 2, i.e. df = n - 2. In the example, df = 10 - 2 = 8. Our level of significance is 0.05 on a two-tailed test. In the table of critical values below, the value at the meeting point of the 0.05 level of significance and df = 8 is 0.738.
Critical values for Spearman's rank correlation coefficient
df Two-sided
(n-2) .10 .05 .01
5 .900 -- --
6 .829 .886 --
7 .714 .786 .929
8 .643 .738 .881
9 .600 .700 .833
10 .564 .648 .794
11 .536 .618 .818
12 .497 .591 .780
13 .475 .566 .745
14 .457 .545 .716
15 .441 .525 .689
16 .425 .507 .666
17 .412 .490 .645
18 .399 .476 .625
19 .388 .462 .608
20 .377 .450 .591
21 .368 .438 .576
22 .359 .428 .562
23 .351 .418 .549
24 .343 .409 .537
25 .336 .400 .526
26 .329 .392 .515
27 .323 .385 .505
28 .317 .377 .496
29 .311 .370 .487
30 .305 .364 .478

Since the absolute value of our calculated Spearman's rank correlation coefficient (0.7303) is less than the critical value (0.738), we fail to reject the null hypothesis: the relationship between price and distance from the museum is not statistically significant at the 0.05 level.
Example 2
Two doctors assessed the condition of eight patients suffering from particular symptoms. To do
this they ranked the patients from 1 (best) to 8 (worst): the results are tabulated.
Patient Doctor A Doctor B
1 4 5
2 1 3
3 3 1
4 2 2
5 6 6
6 5 4
7 8 7

Is the ranking significant?

Parametric tests make one or more assumptions about the properties of the population. The most common assumptions are that the population is normally distributed and that the data are measured on an equal-interval scale.

3. THE INDEPENDENT-SAMPLES T-TEST (OR INDEPENDENT T-TEST)


Introduction
The independent-samples t-test (or independent t-test, for short) compares the means between
two unrelated groups on the same continuous, dependent variable. For example, you could use an
independent t-test to understand whether first year graduate salaries differed based on gender
(i.e., your dependent variable would be "first year graduate salaries" and your independent
variable would be "gender", which has two groups: "male" and "female"). Alternately, you could
use an independent t-test to understand whether there is a difference in test anxiety based on
educational level (i.e., your dependent variable would be "test anxiety" and your independent variable would be "educational level", which has two groups: "undergraduates" and "postgraduates").
Assumptions of independent sample t-test
When you choose to analyse your data using an independent t-test, part of the process involves checking that the data can actually be analysed using an independent t-test. You need to do this because it is only appropriate to use an independent t-test if your data "passes" the six assumptions that are required for an independent t-test to give you a valid result:
1. Your dependent variable should be measured on a continuous scale (i.e., it is measured at
the interval or ratio level). Examples of variables that meet this criterion include revision
time (measured in hours), intelligence (measured using IQ score), exam performance
(measured from 0 to 100), weight (measured in kg), and so forth.
2. Your independent variable should consist of two categorical, independent groups.
Example independent variables that meet this criterion include gender (2 groups: male or
female), employment status (2 groups: employed or unemployed), smoker (2 groups: yes or
no), and so forth.
3. You should have independence of observations, which means that there is no relationship
between the observations in each group or between the groups themselves. For example,
there must be different participants in each group with no participant being in more than one
group.
4. There should be no significant outliers. Outliers are simply single data points within your
data that do not follow the usual pattern. The problem with outliers is that they can have a
negative effect on the independent t-test, reducing the validity of your results.
5. Your dependent variable should be approximately normally distributed for each group
of the independent variable. We talk about the independent t-test only requiring
approximately normal data because it is quite "robust" to violations of normality, meaning
that this assumption can be a little violated and still provide valid results. You can test for
normality using the Shapiro-Wilk test of normality.
6. There needs to be homogeneity of variances. You can test this assumption using Levene's test for homogeneity of variances.
The independent t-test, as we have already mentioned, is used when we wish to test the statistical significance of a possible difference between the means of two groups on some dependent variable, where the two groups are independent of one another. The formula for the independent sample t-test is:

$$ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left(\dfrac{SS_1 + SS_2}{n_1 + n_2 - 2}\right)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} $$

where
$\bar{X}_1$ is the mean for group 1,
$\bar{X}_2$ is the mean for group 2,
$SS_1$ is the sum of squares for group 1,
$SS_2$ is the sum of squares for group 2,
n1 is the number of subjects in group 1, and
n2 is the number of subjects in group 2.
The sum of squares is a new way of looking at variance. It gives us an indication of how spread out the scores in a sample are. The t-value is the difference between the two means divided by the standard error of that difference, which is built from the two sums of squares and takes the degrees of freedom into consideration. The sums of squares are:

$$ SS_1 = \sum X_1^2 - \frac{\left(\sum X_1\right)^2}{n_1} \quad \text{and} \quad SS_2 = \sum X_2^2 - \frac{\left(\sum X_2\right)^2}{n_2} $$

We also need to know the degrees of freedom for the independent t-test, which are:

$$ df = n_1 + n_2 - 2 $$
Example problem using the independent t-test


Job satisfaction as a function of work schedule was investigated in two different factories. In the
first factory the employees are on a fixed shift system while in the second factory the workers
have a rotating shift system. The results are indicated in the table below.

Fixed Shift 79 83 68 59 81 76 80 74 58 49 68
Rotating Shift 63 71 46 57 53 46 57 76 52 68 73

a) Which test statistic would be most suitable to analyse the above data and why
b) Explain three assumptions of the test above
c) State the null hypothesis to be tested
d) Using the scores above determine if there is a significant difference in job satisfaction
between the two groups of workers
X1 X2 X1² X2²
79 63 6241 3969
83 71 6889 5041
68 46 4624 2116
59 57 3481 3249
81 53 6561 2809
76 46 5776 2116
80 57 6400 3249
74 76 5476 5776
58 52 3364 2704
49 68 2401 4624
68 73 4624 5329
Totals: 775 662 55837 40982

We can use the totals from this worksheet and the number of subjects in each group to calculate the sum of squares for group 1, the sum of squares for group 2, the mean for group 1, the mean for group 2, and the value for the independent t.

$$ SS_1 = 55837 - \frac{775^2}{11} = 1234.73, \qquad SS_2 = 40982 - \frac{662^2}{11} = 1141.64 $$

$$ \bar{X}_1 = \frac{775}{11} = 70.45, \qquad \bar{X}_2 = \frac{662}{11} = 60.18 $$

$$ t = \frac{70.45 - 60.18}{\sqrt{\left(\dfrac{1234.73 + 1141.64}{20}\right)\left(\dfrac{1}{11} + \dfrac{1}{11}\right)}} = \frac{10.27}{4.65} = 2.209 $$

Therefore our calculated t value is 2.209. We need to compare this with the critical t from statistical tables.
The degrees of freedom for the t-test are given by: df = n1 + n2 - 2 = 11 + 11 - 2 = 20. The significance level is 0.05.
To find the critical value of t, we use the statistical tables for the t-test with an alpha level of 0.05 and a two-tailed test. Look for the column of the table under .05 for level of significance for two-tailed tests, read down the column until you are level with 20 in the df column, and you will find the critical value of t, which is 2.086.

Finally, compare the calculated t value (2.209) with the critical t value (2.086). On a two-tailed test, our result is significant if the calculated t value is less than or equal to -2.086 or greater than or equal to 2.086.
Since our calculated value of t (2.209) is greater than the critical value of t (2.086), we reject the null hypothesis and accept the alternative hypothesis.
Therefore, there is a significant difference in job satisfaction between the two groups of workers, as shown by the t-test: t(20) = 2.209, p < 0.05 (two-tailed).
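The job satisfaction example can be recomputed as follows. This is a sketch that assumes the pooled-variance form of the independent t-test, t = (mean1 - mean2) / sqrt(((SS1 + SS2)/(n1 + n2 - 2))(1/n1 + 1/n2)), with SS = ΣX² - (ΣX)²/n; the function names are illustrative. At full precision the script gives t of about 2.210, which agrees with the reported 2.209 apart from what is presumably rounding of intermediate values in the hand calculation.

```python
from math import sqrt

fixed = [79, 83, 68, 59, 81, 76, 80, 74, 58, 49, 68]
rotating = [63, 71, 46, 57, 53, 46, 57, 76, 52, 68, 73]


def sum_of_squares(xs):
    """SS = sum(X^2) - (sum X)^2 / n."""
    return sum(x * x for x in xs) - sum(xs) ** 2 / len(xs)


def independent_t(g1, g2):
    """Return (t, df) for two independent groups using the pooled variance."""
    n1, n2 = len(g1), len(g2)
    mean1, mean2 = sum(g1) / n1, sum(g2) / n2
    df = n1 + n2 - 2
    pooled_variance = (sum_of_squares(g1) + sum_of_squares(g2)) / df
    standard_error = sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


t, df = independent_t(fixed, rotating)
CRITICAL_T = 2.086  # copied from the t table: df = 20, 0.05, two-tailed
print(f"t({df}) = {t:.3f}, significant: {abs(t) >= CRITICAL_T}")
```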

Exercise
A researcher sought to establish whether two types of music, type-I and type-II, had different effects upon the ability of college students to perform a series of mental tasks requiring concentration. The researcher picked a fairly homogeneous subject pool of 30 college students and randomly sorted them into two groups, A and B, of sizes Na=15 and Nb=15. (It is not essential for this procedure that the two samples be of the same size.) He then had the members of each group, one at a time, perform a series of 40 mental tasks while one or the other of the music types was playing in the background. For the members of group A it was music of type-I, while for those of group B it was music of type-II. The following table shows how many of the 40 components of the series each subject was able to complete.

Group A (music of type-I): 26 21 22 26 19 22 26 25 24 21 23 23 18 29 22
Group B (music of type-II): 18 23 21 20 20 29 20 16 20 26 21 25 17 18 19

Do two types of music, type-I and type-II, have different effects upon the ability of college
students to perform a series of mental tasks requiring concentration?

4. PAIRED SAMPLES T-TESTS

The dependent t-test (called the paired-samples t-test in SPSS) compares the means between two
related groups on the same continuous, dependent variable. For example, you could use a
dependent t-test to understand whether there was a difference in smokers' daily cigarette
consumption before and after a 6 week hypnotherapy programme (i.e., your dependent variable
would be "daily cigarette consumption", and your two related groups would be the cigarette
consumption values "before" and "after" the hypnotherapy programme).

Assumptions of the paired samples t-test
1. Your dependent variable should be measured on a continuous scale (i.e., it is measured at the interval or ratio level).
2. Your independent variable should consist of two categorical, "related groups" or
"matched pairs". "Related groups" indicates that the same subjects are present in both
groups. The reason that it is possible to have the same subjects in each group is because each
subject has been measured on two occasions on the same dependent variable. For example,
you might have measured 10 individuals' performance in a spelling test (the dependent
variable) before and after they underwent a new form of computerized teaching method to
improve spelling. You would like to know if the computer training improved their spelling
performance. The first related group consists of the subjects at the beginning of (prior to) the
computerized spelling training and the second related group consists of the same subjects, but
now at the end of the computerized training.
3. There should be no significant outliers in the differences between the two related groups. Outliers are simply single data points within your data that do not follow the usual pattern.
4. The distribution of the differences in the dependent variable between the two related
groups should be approximately normally distributed.

The formula for the dependent t is:

$$ t = \frac{\sum D}{\sqrt{\dfrac{n\sum D^2 - \left(\sum D\right)^2}{n - 1}}} $$

Where D is the difference between pairs of scores (D = X2 - X1) and n is the number of pairs.
Notice that we subtract the first score from the paired second score, so that in a pre-test/post-test design the pre-test (X1) is subtracted from the post-test (X2). The degrees of freedom for the dependent t-test are df = n - 1, where n is the number of pairs of subjects in the study.

Example problem using the dependent t-test


The Beck Depression Scale (pre-test) was administered to ten adolescents undergoing anger
management therapy. After four weeks of therapy the same scale was administered again (post-
test). The results are tabulated below.
Pre-Test (X1) 14 6 4 15 3 3 6 5 6 3
Post-Test (X2) 0 0 3 20 0 0 1 1 1 0

a) What is the appropriate test statistic to analyse this data, and why?
In this problem we are comparing pre-test and post-test scores for the same group of subjects. The dependent variable is measured on a ratio scale while the independent variable is categorical (pre-test and post-test). This is an appropriate situation for the dependent t-test.
b) What are the three basic assumptions of the test statistic in (a) above?
c) State the null hypothesis to be tested

Note: Our problem stated that the therapy would decrease the depression score. Therefore our
alternative hypothesis states that mu1 (the pre-test score) will be greater than mu2 (the post-test
score).
d) Does the anger management therapy significantly reduce the scores on the depression scale?
The pre-test and post-test scores, as well as D and D², are shown in the following table.
Pre-Test (X1)  Post-Test (X2)  D = (X2 - X1)  D²
14 0 -14 196
6 0 -6 36
4 3 -1 1
15 20 5 25
3 0 -3 9
3 0 -3 9
6 1 -5 25
5 1 -4 16
6 1 -5 25
3 0 -3 9
Totals: ΣD = -39, ΣD² = 351

1. Calculate the t-value using the formula:

$$ t = \frac{\sum D}{\sqrt{\dfrac{n\sum D^2 - \left(\sum D\right)^2}{n - 1}}} = \frac{-39}{\sqrt{\dfrac{10(351) - (-39)^2}{10 - 1}}} = \frac{-39}{\sqrt{221}} = -2.623 $$

2. Calculate the degrees of freedom of the t-test: df = n - 1 = 10 - 1 = 9. We will need the degrees of freedom when we look up the critical value of t.
3. Set the alpha level. As usual we set our alpha level at .05, so we have 5 chances in 100 of making a type I error.
4. Write the decision rule for rejecting the null hypothesis: reject H0 if t <= -1.833. To write the decision rule we need the critical value of t for df = 9, with an alpha level of .05 and a one-tailed test, which is 1.833. Our result is significant if the calculated t value is less than or equal to -1.833. Why are we looking for a negative value of t? We expect the post-test to be less than the pre-test, and the dependent t is calculated by subtracting the pre-test from the post-test, so if the post-test is actually less than the pre-test, post-test minus pre-test will be a negative quantity.
5. Write a summary statement based on the decision: reject H0, p < .05, one-tailed. Since our calculated value of t (-2.623) is less than -1.833, we reject the null hypothesis and accept the alternative hypothesis.
6. Write a statement of results in standard English: the anger management therapy did significantly reduce the depression scores for the adolescents.
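The dependent t for this example can be verified with a few lines of Python. This is a sketch assuming the difference-score form t = ΣD / sqrt((nΣD² - (ΣD)²)/(n - 1)) with D = post-test minus pre-test; the function name `dependent_t` is illustrative and the critical value is copied from the t table (df = 9, .05, one-tailed).

```python
from math import sqrt

pre = [14, 6, 4, 15, 3, 3, 6, 5, 6, 3]
post = [0, 0, 3, 20, 0, 0, 1, 1, 1, 0]


def dependent_t(x1, x2):
    """Return (t, df) for paired scores, with D = X2 - X1."""
    d = [b - a for a, b in zip(x1, x2)]
    n = len(d)
    sum_d = sum(d)
    sum_d2 = sum(v * v for v in d)
    t = sum_d / sqrt((n * sum_d2 - sum_d ** 2) / (n - 1))
    return t, n - 1


t, df = dependent_t(pre, post)
CRITICAL_T = 1.833  # copied from the t table: df = 9, .05, one-tailed
# One-tailed decision rule: reject H0 if t <= -critical value
print(f"t({df}) = {t:.3f}, reject H0: {t <= -CRITICAL_T}")
```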

Exercise
Consider the following study in which standing and supine systolic blood pressures were
compared. This study was performed on twelve subjects. Their blood pressures were measured in
both positions.

Standing 132 146 135 141 139 162 128 137 145 151 131 143
Supine 136 145 140 147 142 160 137 136 149 158 120 150

Does the data suggest that there is no difference between the mean blood pressures in the two
populations?

5. THE ANALYSIS OF VARIANCE (ANOVA)
Analysis of variance, like the t-test, tests the hypothesis that the means of the groups are equal. One of the most important differences is that a t-test distinguishes only TWO groups, whereas analysis of variance usually compares three or more groups. For example, you can use ANOVA to determine whether the means of a number of groups are equal.
Do students in Boys' only, Girls' only and Co-educational schools, on average, perform equally on a self-efficacy questionnaire? Or suppose you distinguish between age groups: young (less than 25 years), middle (25 to 45 years) and adult (over 45 years). Do all age groups, on average, spend an equal amount of time on Facebook? ANOVA answers these kinds of questions.
Assumptions of ANOVA

1. Interval data. ANOVA assumes an interval-level dependent variable. With Likert scales and other ordinal dependents, the nonparametric Kruskal-Wallis test is preferred.
2. Homogeneity of variances. The dependent variable should have the same variance in each
category of the independent variable. When there is more than one independent, there must
be homogeneity of variances in the cells formed by the independent categorical variables.
The reason for this assumption is that the denominator of the F-ratio is the within-group
mean square, which is the average of group variances taking group sizes into account. When
groups differ widely in variances, this average is a poor summary measure. However,
ANOVA is robust for small and even moderate departures from homogeneity of variance.
Still, a rule of thumb is that the ratio of largest to smallest group variances should be
3.0 or less. Levene's test of homogeneity of variance is computed by SPSS to test the ANOVA assumption that each group (category) of the independent(s) has the same variance. If the Levene statistic is significant at the .05 level or better, the researcher rejects the null hypothesis that the groups have equal variances.
3. Random sampling. For purposes of significance testing, the subjects in each group are
randomly sampled.
4. Multivariate normality. For purposes of significance testing, variables follow multivariate
normal distributions. The dependent variable is normally distributed in each category of the
independent variable(s). ANOVA is robust even for moderate departures from multivariate
normality.

5. Equal or similar sample sizes. The groups formed by the categories of the independent(s)
should be equal or similar in sample size. The more the groups are similar in size, the more
robust ANOVA will be with respect to violations of the assumptions of normality and
homogeneity of variance.
ANOVA compares the variance between the groups with the variance within the groups. Dividing the former by the latter gives the F-statistic, so the term F-test is also used for ANOVA.
Example 1
A manager wishes to determine whether the mean times required to complete a certain task differ
for the three levels of employee training. He randomly selected 6 employees with each of the
three levels of training (Beginner, Intermediate and Advanced). The data is summarized below.
Beginner: 56, 63, 52, 61, 65, 67
Intermediate: 41, 44, 51, 43, 41, 55
Advanced: 55, 45, 45, 60, 49, 53
a) Which is the best test statistic to analyse the data and why?
ANOVA, because we want to compare the means of more than two groups.
b) State three assumptions of the test statistic
• The sample must be randomly selected
• All of the standard deviations are the same (as a rule of thumb, no standard deviation is more than twice any other)
• All of the populations are normally distributed
• The dependent variable should be interval or ratio
• There are more than two independent categorical groups
c) State the null hypothesis
There is no statistically significant difference in the mean times required to complete a certain task for the three levels of employee training.
d) Does the data support the hypothesis?
d) Does the data support the hypothesis?
ANOVA= F-ratio = Between Column Variance (BCV)/Within Column Variance (WCV)

Beginner Intermediate Advanced
56 41 55
63 44 45
52 51 45
61 43 60
65 41 49
67 55 53
Means: 60.6667 45.8333 51.1667

a) Between Column Variance (BCV)

$$ BCV = \frac{\sum N d^2}{n - 1} $$

N = total number of elements in a given group/category
n = number of groups; in this case we have 3 groups (beginner, intermediate and advanced)
d = group mean minus the grand mean
N Mean d d² N×d²
6 60.6667 8.1111 65.7901 394.7407
6 45.8333 -6.7222 45.1883 271.1296
6 51.1667 -1.3889 1.9290 11.5741
Grand mean = 52.5556; Σ(N×d²) = 677.4444
Grand mean = (mean of beginner + mean of intermediate + mean of advanced)/3
= (60.6667 + 45.8333 + 51.1667)/3 = 52.5556
Between column variance (BCV) = Σ(N×d²)/(n-1) = 677.4444/(3-1) = 338.7222
b) Within Column Variance (WCV)
Get the variance for each of the independent groups
Beg d d2 Inter d d^2 Adv d d^2
56 -4.6667 21.778 41 -4.8333 23.3611 55 3.8333 14.69444
63 2.3333 5.4444 44 -1.8333 3.36111 45 -6.1667 38.02778
52 -8.6667 75.111 51 5.1667 26.6944 45 -6.1667 38.02778
61 0.3333 0.1111 43 -2.8333 8.02778 60 8.8333 78.02778
65 4.3333 18.778 41 -4.8333 23.3611 49 -2.1667 4.694444
67 6.3333 40.111 55 9.1667 84.0278 53 1.8333 3.361111
60.6667 161.3333 45.8333 168.8333 51.1667 176.8333

The last row of the table gives each group mean followed by that group's sum of squared deviations (Σd²).
Variance for beginners = Σd²/(N-1) = 161.3333/(6-1) = 32.2667
Variance for intermediate = Σd²/(N-1) = 168.8333/(6-1) = 33.7667
Variance for advanced = Σd²/(N-1) = 176.8333/(6-1) = 35.3667
Within column variance (WCV) = sum of the variances of the groups/number of groups (the groups here are of equal size)
= (32.2667 + 33.7667 + 35.3667)/3 = 101.4/3 = 33.8
ANOVA = F-ratio = Between Column Variance (BCV)/Within Column Variance (WCV)
= 338.7222/33.8 = 10.021
The critical (tabulated) F is read from the F table at the intersection of the degrees of freedom of the numerator and the degrees of freedom of the denominator.
Degrees of freedom of the numerator = n-1 = 3-1 = 2, where n is the number of groups
Degrees of freedom of the denominator = (N1-1)+(N2-1)+(N3-1) = (6-1)+(6-1)+(6-1) = 5+5+5 = 15, where N1, N2 and N3 are the number of elements in groups 1, 2 and 3 respectively
Critical F-ratio = 3.68, which is where 2 and 15 intersect in the F table
Ho: There is no difference in the mean times required to complete a certain task by levels of training
We now compare the calculated F with the critical F to decide whether to reject the null hypothesis.
Since the calculated F-ratio (10.02) is greater than the critical F-ratio (3.68), we reject the null hypothesis that there is no difference in the mean times required to complete a certain task by levels of training. The results indicate a statistically significant difference.
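The F-ratio for the training example can be reproduced with the sketch below. It is written in plain Python with illustrative names (`one_way_f`, `CRITICAL_F`), and it computes the within-group variance as the pooled sum of squared deviations divided by its degrees of freedom, which equals the simple average of the group variances used above only because the three groups are the same size.

```python
groups = {
    "Beginner": [56, 63, 52, 61, 65, 67],
    "Intermediate": [41, 44, 51, 43, 41, 55],
    "Advanced": [55, 45, 45, 60, 49, 53],
}


def one_way_f(samples):
    """Return (F, df_between, df_within) for a list of independent groups."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n_total
    means = [sum(s) / len(s) for s in samples]
    # Between-groups sum of squares: N * (group mean - grand mean)^2 per group
    ss_between = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    # Within-groups sum of squares: squared deviations from each group's own mean
    ss_within = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    df_between = k - 1
    df_within = n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


f_ratio, df_num, df_den = one_way_f(list(groups.values()))
CRITICAL_F = 3.6823  # copied from the F table: numerator df 2, denominator df 15
print(f"F({df_num}, {df_den}) = {f_ratio:.3f}, reject H0: {f_ratio > CRITICAL_F}")
```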

F distribution critical values for P = 0.05 (columns: numerator DF; rows: denominator DF)

DF 1 2 3 4 5 7 10 15 20 30 60 120 500 1000

1 161.45 199.50 215.71 224.58 230.16 236.77 241.88 245.95 248.01 250.10 252.20 253.25 254.06 254.19

2 18.513 19.000 19.164 19.247 19.296 19.353 19.396 19.429 19.446 19.462 19.479 19.487 19.494 19.495

3 10.128 9.5522 9.2766 9.1172 9.0135 8.8867 8.7855 8.7028 8.6602 8.6165 8.5720 8.5493 8.5320 8.5292

4 7.7086 6.9443 6.5915 6.3882 6.2560 6.0942 5.9644 5.8579 5.8026 5.7458 5.6877 5.6580 5.6352 5.6317

5 6.6078 5.7862 5.4095 5.1922 5.0504 4.8759 4.7351 4.6187 4.5582 4.4958 4.4314 4.3985 4.3731 4.3691

7 5.5914 4.7375 4.3469 4.1202 3.9715 3.7871 3.6366 3.5108 3.4445 3.3758 3.3043 3.2675 3.2388 3.2344

10 4.9645 4.1028 3.7082 3.4780 3.3259 3.1354 2.9782 2.8450 2.7741 2.6996 2.6210 2.5801 2.5482 2.5430

15 4.5431 3.6823 3.2874 3.0556 2.9013 2.7066 2.5437 2.4035 2.3275 2.2467 2.1601 2.1141 2.0776 2.0718
20 4.3512 3.4928 3.0983 2.8660 2.7109 2.5140 2.3479 2.2032 2.1241 2.0391 1.9463 1.8962 1.8563 1.8498
30 4.1709 3.3159 2.9223 2.6896 2.5336 2.3343 2.1646 2.0149 1.9317 1.8408 1.7396 1.6835 1.6376 1.6300
60 4.0012 3.1505 2.7581 2.5252 2.3683 2.1666 1.9927 1.8365 1.7480 1.6492 1.5343 1.4672 1.4093 1.3994
120 3.9201 3.0718 2.6802 2.4473 2.2898 2.0868 1.9104 1.7505 1.6587 1.5544 1.4289 1.3519 1.2804 1.2674
500 3.8601 3.0137 2.6227 2.3898 2.2320 2.0278 1.8496 1.6864 1.5917 1.4820 1.3455 1.2552 1.1586 1.1378
1000 3.8508 3.0047 2.6137 2.3808 2.2230 2.0187 1.8402 1.6765 1.5811 1.4705 1.3318 1.2385 1.1342 1.1096
