BIO206 Biostatistics
GUIDE
BIO206
STATISTICS FOR AGRICULTURE AND BIOLOGICAL
SCIENCES
Lagos Office
National Open University of Nigeria
14/16 Ahmadu Bello Way
Victoria Island
Lagos
e-mail:
URL: www.nou.edu.ng
Published by
National Open University of Nigeria
Printed 2009. Reprinted 2017
Reviewed: 2023
ISBN: 978-978-058-945-5
Bio206 Statistics For Agriculture And Biological Sciences
Course Guide:
Introduction
Course Competencies
This course guide provides a general overview of the course. The course
material is divided into sections to help learners understand and
assimilate its contents. The guide also explains how to approach the
Tutor-Marked Assignments (TMAs), which form part of the overall
assessment at the end of the course.
Course Objectives
Successful completion of this course entails studying the course guide,
the reference textbooks and the other materials provided by the National
Open University of Nigeria. The course guide is divided into sections,
and each section has self-assessment exercises. Practising these
exercises will positively influence your academic performance in the
course. The course is expected to take a minimum of 8 weeks to
complete.
Study Units
The modules of this course are arranged in accordance with the course
objectives, as follows:
Presentation Schedule
Assignment                  Marks
TMA 1-4                     Four TMAs; the best three marks of the four
                            count at 10% each = 30% of course marks
End of course examination   70% of overall course marks
Total                       100% of course marks
Assessment
Online Facilitation
Course Information
Ice Breaker
CONTENTS
1.1: Introduction
1.2: Intended Learning Outcomes
1.3: Concept of Biostatistics
1.3.1: Types of Statistics
1.3.2: Usefulness of statistics
1.3.3: Terminologies
1.3.4: Processing of Statistics Data
1.3.5: Presentation of Data
1.3.6: Accuracy of Measurement
1.3.7: Rounding of Figure
1.3.8: Limitations of Statistics
1.4: Summary
1.5: References/Further Readings/Web Sources
1.6: Possible Answers to Self-Assessment Exercises
1.1 Introduction
1.3.3: Terminologies
i. Population/Universe is defined as the entire collection of
measurements about which the statistician wishes to draw
conclusions. If, for example, one wishes to draw conclusions about
the heights of students of the National Open University of Nigeria,
then the population under consideration comprises the heights of
all students in the University.
ii. A sample is a subset of the measurements in the population of
interest, that is, a subgroup of the population selected for study.
Since populations of interest are generally so large that it is
not feasible to obtain all the measurements desired, the
statistician normally resorts to measuring a sample. When a sample
is chosen at random from a population, it is said to be an unbiased
sample; that is, the sample is, for the most part, representative
of the population. If a sample is selected incorrectly, so that
some type of systematic error is made in the selection of the
subjects, it may be a biased sample. The sample must therefore be
random in order to make valid inferences about the population. The
importance of samples is as follows:
Self-Assessment Exercises
1. Variables that are assigned integers are called?
2. The measurements and observations that variables can assume are called?
1.4: Summary
https://www.youtube.com/watch?v=a23NhTCyxUY
https://www.youtube.com/watch?v=_e4mwlqCQrc
1- Discrete variables
2- Data
Unit Structure
2.1: Introduction
2.2: Intended Learning Outcomes
2.3: Frequency Preparation
2.3.1: Raw data & Arrays
2.3.2: Graphical Presentation
2.3.3: Frequency table
2.4: Summary
2.5: References/Further Readings/Web Sources
2.6: Possible Answers to Self-Assessment Exercises
2.1: Introduction
Very often, large amounts of data are encountered in research. The best
way to present these data is by the use of frequency tables, which help
to summarize the data and make them more meaningful. This involves
listing all observed values of the variable under consideration and
stating how many times each value is observed (i.e., the frequency).
Measuring or counting gives rise to raw data. Raw data are difficult to
comprehend because they lack organization and summarization, which
makes them hard to interpret. Thus, the raw data have to be put in some
order through classification and tabulation so as to reduce their volume
and heterogeneity. To describe situations, draw conclusions or make
inferences about events, the researcher must organize the data in some
meaningful way. The most convenient method of organizing data is to
construct a frequency distribution.
Collected data that have not been organized numerically are called
RAW DATA; that is, they are data recorded in the original form in which
they were sampled. An example is the set of masses of 100 male students
obtained from an alphabetical listing of University records. An
arrangement of raw data in ascending or descending order of magnitude is
called an ARRAY.
Pie Charts
A pie chart is more commonly used to display percentages, although it
can be used to display frequencies or relative frequencies. The whole pie
(or circle) represents the total sample or population. It is defined as a circle
divided into portions that represent the relative frequencies or percentages
of a population or a sample belonging to different categories.
Bar Graphs
A graph made of bars whose heights represent the frequencies of
respective categories is called a bar graph. The bar graphs for relative
frequency and percentage distributions can be drawn simply by marking
the relative frequencies or percentages, instead of the frequencies, on the
vertical axis. Sometimes a bar graph is constructed by marking the
categories on the vertical axis and the frequencies on the horizontal axis.
Polygons
A polygon is another device that can be used to present quantitative data
in graphic form. A polygon with relative frequencies marked on the
vertical axis is called a relative frequency polygon. Similarly, a polygon
with percentages marked on the vertical axis is called a percentage
polygon.
Arrange the numbers 17, 45, 38, 27, 6, 58, 48, 32, 19, 22, 34 in an array
and determine the range.
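As a rough illustration of the exercise above, the following Python sketch arranges the numbers into an array and computes the range (largest observation minus smallest). The variable names are illustrative assumptions and are not part of the course text.

```python
# Raw data, exactly as listed in the exercise above
raw_data = [17, 45, 38, 27, 6, 58, 48, 32, 19, 22, 34]

# An array is the raw data arranged in ascending or descending order
ascending = sorted(raw_data)
descending = sorted(raw_data, reverse=True)

# Range = largest observation - smallest observation
data_range = ascending[-1] - ascending[0]

print("Ascending array: ", ascending)
print("Descending array:", descending)
print("Range:", data_range)
```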
78 72 70 74 76 75 75 79 75 74
75 70 73 75 70 74 76 74 75 74
78 74 75 74 73 74 71 72 71 79
Construct a frequency distribution table.
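One possible way to check a hand-built table is sketched below in Python. It counts how often each distinct value occurs in the 30 observations listed above and prints the frequency and percentage for each value. The layout of the printed table is an assumption made for illustration only.

```python
from collections import Counter

# The 30 observations listed in the exercise above
observations = [
    78, 72, 70, 74, 76, 75, 75, 79, 75, 74,
    75, 70, 73, 75, 70, 74, 76, 74, 75, 74,
    78, 74, 75, 74, 73, 74, 71, 72, 71, 79,
]

# Frequency = number of times each distinct value is observed
frequency = Counter(observations)
total = sum(frequency.values())

print("Value  Frequency  Percent")
for value in sorted(frequency):
    count = frequency[value]
    print(f"{value:>5}  {count:>9}  {100 * count / total:>7.1f}")
print(f"Total  {total:>9}  {100.0:>7.1f}")
```

Because the range of the data is small, each distinct value is used as its own class here; wider class intervals would be needed for more spread-out data.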
https://www.toppr.com/guides/maths/statistics/frequency-distribution/
https://www.youtube.com/watch?v=amLYLq73RvE
https://www.statisticshowto.com/probability-and-statistics/statistics-definitions/cumulative-frequency-distribution/
1. Frequency
2. Categorical frequency
3. Construction of frequency
4. constructing a frequency distribution
5. 30
6. 27.5
7. 25
8. 17.5
9. 26mm
10. 5
11. 216mm
12. 100
13. 19.63mm
14. 16mm
15. 18mm
Unit Structure
3.1: Introduction
3.2: Intended Learning Outcomes
3.3: Probability distribution
3.3.1: Normal distribution
3.3.2: Poisson distribution
3.3.3: Binomial distribution
3.4: Summary
3.5: References/Further Readings/Web Sources
3.6: Possible Answers to Self-Assessment Exercises
3.1: Introduction
showed its use in statistics. The normal distribution is defined by just two
parameters, the mean and the standard deviation. It is concerned with
results obtained by taking measurements on a continuous random variable
(i.e., the quantified value of a random event) such as weight or yield.
The normal distribution is a particular pattern of variation of numbers
around the mean. It is symmetrical (hence we express the standard
deviation as ±), and the frequency of individual values falls off equally
on both sides of the mean. It so happens that the curve given by this
probability distribution approximates very closely to a mathematical
curve. This curve is called the Normal curve.
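The symmetry described above can be illustrated with a short Python sketch. The course text does not state the equation of the Normal curve, so the standard density f(x) = exp(-(x - mean)^2 / (2 sd^2)) / (sd * sqrt(2 * pi)) is assumed here, and the mean and standard deviation used are hypothetical values chosen only for illustration.

```python
import math

def normal_pdf(x, mean, sd):
    """Height of the normal curve at x for a given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

# Hypothetical parameters (e.g. a yield in kg); not taken from the course text
mean, sd = 50.0, 5.0

# Symmetry: points equally far from the mean have the same height
left = normal_pdf(mean - 7.0, mean, sd)
right = normal_pdf(mean + 7.0, mean, sd)

# The curve is highest at the mean and falls off in both directions
peak = normal_pdf(mean, mean, sd)

print(left, right, peak)
```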
In checking for normality, it is important to know whether experimental
data are an approximate fit to a normal distribution. This is easily
checked with large samples: there should be roughly equal numbers of
observations on either side of the mean. Things are more difficult when
we have only a few observations. In experiments, it is not uncommon to
have no more than three data points per treatment. However, even here we
can get clues. If the distribution is normal, there should be no
relationship between the magnitude of the mean and its standard deviation.
iii. You can also randomly select a number of squares, if the area is
too large.
iv. The probability of X occurrences in an interval of time, volume,
area etc. for a variable where λ (lambda) is the mean number of
occurrences per unit (time, volume, area etc.) is given by:
$P(X, \lambda) = \dfrac{e^{-\lambda}\lambda^{X}}{X!}$
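A minimal Python sketch of the Poisson probability follows, using the formula e^(-λ) λ^X / X! that also appears in the answers to this unit's self-assessment exercises. The value of λ (a mean of 2 occurrences per square) is a hypothetical figure chosen only for illustration.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) when lam is the mean number of occurrences per unit."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

lam = 2.0  # hypothetical mean count per square; not from the course text

# Probability of observing 0, 1, 2, ... 5 occurrences in one square
for x in range(6):
    print(x, round(poisson_pmf(x, lam), 4))
```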
i. Each trial can have only two outcomes, or outcomes that can be
reduced to two outcomes, i.e., success or failure. The two outcomes
cannot occur simultaneously.
ii. There must be a fixed number of trials.
iii. The outcomes of each trial must be independent of each other.
iv. The probability of a success must remain the same for each
trial.
The binomial probability formula is given by
$P(X) = \dfrac{n!}{(n - X)!\,X!}\, p^{X} q^{n - X}$
where n is the number of trials, X is the number of successes, p is the
probability of success in a single trial and q = 1 - p is the probability
of failure.
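The binomial probability can be sketched in Python as below, assuming the standard form C(n, X) p^X q^(n - X) with q = 1 - p. The numbers n = 10 and p = 0.2 are borrowed from the bird-trapping question in this unit's self-assessment exercises (one fire finch in five trapped, 10 birds selected); the function name is an illustrative assumption.

```python
import math

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    q = 1.0 - p  # probability of failure in a single trial
    return math.comb(n, x) * p ** x * q ** (n - x)

n, p = 10, 0.2  # 10 birds selected; probability of success 1/5

# Full distribution of the number of successes among the 10 birds
distribution = [binomial_pmf(x, n, p) for x in range(n + 1)]
for x, prob in enumerate(distribution):
    print(x, round(prob, 4))
```

The probabilities over all possible values of X add up to 1, which is a quick sanity check on any hand calculation.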
Self-Assessment Exercises
1. The term used to describe how probable it is that a random variable takes
each value in a range of such values is?
2. The other name for normal distribution is?
3. A particular pattern of variation of numbers around the mean is?
4. Normal distribution is easily checked with ....... samples.
5. A unimodal symmetrical curve is a property of?
6. State the formula for the Poisson distribution.
7. The discrete probability distribution that is useful when n is large and p is
small is?
8. A survey on birds showed that one out of five fire finches was trapped using
a mist net in a given season. If 10 birds were selected at random, what type of
probability distribution is applicable here?
9. A survey on birds showed that one out of five fire finches was trapped using
a mist net in a given season. If 10 birds were selected at random, find the
numerical probability of success.
10. A survey on birds showed that one out of five fire finches was trapped using
a mist net in a given season. If 10 birds were selected at random, find the
numerical probability of failure.
3.4: Summary
This unit introduced probability distributions and discussed three
types (the normal, Poisson and binomial distributions), with worked
examples and appropriate exercises.
London.
Daniel, W.W. (1995). Biostatistics: A Foundation for Analysis in the
Health Sciences. Sixth Edition. John Wiley and Sons Incorporated,
USA.
https://www.khanacademy.org/math/precalculus/x9e81a4f98389efdf:prob-comb/x9e81a4f98389efdf:probability-distributions-introduction/v/discrete-probability-distribution
https://www.youtube.com/watch?v=CfZa1daLjwo
https://www.coursera.org/lecture/statistics-international-business/probability-and-probability-distributions-an-introduction-ASvPO
1. Probability distribution
2. Gaussian distribution
3. Normal distribution, Gaussian distribution
4. Large
5. Normal curve
6. $P(x, \lambda) = \dfrac{e^{-\lambda}\lambda^{x}}{x!}$
7. Poisson distribution
8. Binomial
9. 0.2
10. 0.8
Unit Structure
4.1: Introduction
4.2: Intended Learning Outcomes
4.3: Methods of Estimation and Sampling
4.4: Summary
4.5: References/Further Readings/Web Sources
4.6: Possible Answers to Self-Assessment Exercises
4.1: Introduction
This unit discusses estimation and its application in the life sciences.
In order to obtain unbiased samples, several sampling methods have been
developed. The most common methods are random, systematic, stratified,
and clustered sampling.
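The four sampling methods named above can be sketched in Python as follows. Everything in the sketch is an illustrative assumption: the population of 100 numbered plots, the four equal-sized fields used as strata and clusters, the sample sizes and the fixed random seed are all hypothetical.

```python
import random

random.seed(1)  # fixed seed so the sketch gives the same result each run

# Hypothetical population: 100 numbered plots grouped into 4 fields of 25
population = list(range(1, 101))
fields = {f: population[(f - 1) * 25 : f * 25] for f in range(1, 5)}

# Random sampling: every unit has an equal chance of being selected
random_sample = random.sample(population, 10)

# Systematic sampling: a random start, then every k-th unit
k = len(population) // 10
start = random.randrange(k)
systematic_sample = population[start::k]

# Stratified sampling: a random sample drawn from within every stratum
stratified_sample = [
    unit for units in fields.values() for unit in random.sample(units, 3)
]

# Clustered sampling: whole clusters chosen at random, all their units taken
chosen_fields = random.sample(sorted(fields), 2)
cluster_sample = [unit for f in chosen_fields for unit in fields[f]]

print(random_sample, systematic_sample, stratified_sample, chosen_fields, sep="\n")
```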
Self-Assessment Exercises 1
1. Any statistic used to estimate a parameter is called?
2. A single number used to estimate a population parameter is called?
3. An unbiased sample is when a sample is?
4. The most common methods used to obtain unbiased samples are?
5. A sample in which every member has the same chance as any other of being
selected is?
4.4: Summary
https://www.youtube.com/watch?v=pqEPtona94A
https://www.youtube.com/watch?v=5P57_sobsnk
https://www.coursera.org/lecture/statistical-inference-for-estimation-in-data-science/estimators-and-sampling-distributions-2Zfz7
1. Estimator
2. Point estimate
3. Chosen at random from a population
4. Random, Systematic, Stratified, Clustered
5. Random
Unit Structure
5.1: Introduction
5.2: Intended Learning Outcomes
5.3: Hypotheses Formulation
5.3.1: Importance of Hypotheses
5.3.2: Sources of Hypotheses
5.3.3: Types of Hypotheses
5.3.4: Formulating Hypotheses
5.3.5: Characteristics of Good Hypotheses
5.3.6: Errors in Hypotheses
5.4: Experimental Design
5.5: Summary
5.6: References/Further Readings/Web Sources
5.7: Possible Answers to Self-Assessment Exercises
5.1: Introduction
This unit discusses hypothesis testing and its application in the life
sciences. To understand the concept of a hypothesis, there is a need to
understand the steps in the scientific method as they relate to the
conduct of research. Conducting research in Biology and other natural
sciences involves the following steps:
Null Hypothesis
Designated by H0 or HN and pronounced "H-oh" or "H-null". The null
hypothesis represents a theory that has been put forward, either because
it is believed to be true or because it is to be used as a basis for
argument, but which has not been proved. An incorrect decision about it
can have serious consequences. It assumes that non-sampling errors such
as bias are absent from the measurement and that any difference is solely
due to chance. The null hypothesis is always about the current state of
affairs. It is stated statistically as:
H0: X1 = X2
where H0 is the null hypothesis and X1 and X2 are the sample means.
For example, a researcher may be interested in finding out the influence
of feeding on the weights of children according to parental type
(single versus dual parenting). The null hypothesis in this case will be
that there is no significant influence of feeding on the weights of the
children from single parents and dual parents.
Alternative Hypothesis
Designated by H1 or Ha. The alternative hypothesis is a statement of
what a hypothesis test is set up to establish. It is the opposite of the
null hypothesis and is only reached if H0 is rejected. Frequently, the
alternative is the conclusion the researcher actually desires. The
alternative hypothesis is sometimes arbitrary. It is denoted statistically
as:
Ha: X1 < X2
where Ha is the alternative hypothesis and X1 and X2 are the sample means.
In the example given above, the alternative hypothesis will be stated as:
there is a significant influence of feeding on the weights of the children
from single and dual parents, since one category of children is expected
to have a different weight from the other. It should be noted that the
occurrence of a highly improbable difference does not strictly disprove
the null hypothesis; however, because a difference of that size is very
unlikely to arise by chance alone, the null hypothesis is rejected.
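The reasoning that a difference is "solely due to chance" under the null hypothesis can be made concrete with a small Python sketch. Note the assumptions: the weights below are hypothetical, and the exact permutation test used here is a swapped-in illustration, not a method described in the course text. It asks how often a difference at least as large as the observed one would arise if the group labels were assigned purely by chance.

```python
from itertools import combinations
from statistics import mean

# Hypothetical weights in kg; these are NOT data from the course text
single_parent = [14.2, 15.1, 13.8, 14.9, 15.4]
dual_parent = [15.0, 16.2, 15.8, 16.5, 15.9]

observed = mean(single_parent) - mean(dual_parent)

# Under H0 the labels are interchangeable, so every way of splitting the
# 10 weights into two groups of 5 is equally likely.
pooled = single_parent + dual_parent
n = len(single_parent)
count_extreme = 0
total = 0
for idx in combinations(range(len(pooled)), n):
    group1 = [pooled[i] for i in idx]
    group2 = [pooled[i] for i in range(len(pooled)) if i not in idx]
    diff = mean(group1) - mean(group2)
    # Count splits whose difference is at least as large as the observed one
    if abs(diff) >= abs(observed) - 1e-12:
        count_extreme += 1
    total += 1

p_value = count_extreme / total
alpha = 0.05  # assumed significance level
reject_h0 = p_value < alpha

print(f"observed difference = {observed:.2f}, p-value = {p_value:.4f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

The comparison here is two-sided (it uses the absolute difference); a one-sided alternative such as Ha: X1 < X2 would count only differences in that direction.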
Working Hypothesis
The working or trial hypothesis is provisionally adopted to explain the
relationship between some observed facts and to guide a researcher in the
investigation of a problem. A statement constitutes a trial or working
hypothesis, which is to be tested and then confirmed, modified or even
abandoned as the investigation proceeds.
Self-Assessment Exercises - 1
1. The hypothesis that is often used in a research study is?
2. The degree of difference between sample means and the population mean is?
3. How many probabilities occur when you reject the null hypothesis?
Components
a. Control group: a change in the independent variable could result in
a change in the dependent variable, but a change in the dependent
variable could also be the result of a random variable which may
find its way into the experiment. To ensure this does not
happen, a controlled experiment is run:
   i. The control group is an additional experimental run or treatment.
   ii. It is a separate experiment, set up like the others with exactly
   the same conditions.
   iii. The only difference is in the test variable.
   iv. In an experiment, a control is a treatment which is included to
   provide a reference set of data which can be compared with data
   obtained from the experimental treatments. For example, an
   investigation into the effect of increased copper in the water on
   fish growth would have as a control a group of fish cultured in
   copper-free growth medium. The effect of copper can be determined
   by comparing the growth rates of fish in growth medium with various
   levels of copper with the control. For this comparison to be valid,
   it is critical that no variable other than the independent
   variable differs between the control and experimental groups.
b. Variables: in designing an experiment it is important to consider
all the factors that can change in some way. These are called
variables. There are three types of variables in an experimental
design:
   a) Independent variable: also referred to as the manipulated
   variable; it is the variable which is deliberately altered in some
   way by the person carrying out the investigation.
   b) Dependent variable: also referred to as the responding
   variable; it is the variable which is measured or directly observed
   by the experimenter.
   c) Fixed/controlled variables: the major types of fixed/controlled
   variables that need to be controlled are:
   Factors affecting the organism: if organisms are used in the
   experiments, then many of the variables relating to them will need
   to be controlled. If the investigation uses a small sample size,
   then all due care needs to be taken to ensure that the
   control group and experimental group are matched as closely as
   possible with regard to things such as the number of organisms,
   their age, size, sex and any other relevant characteristics. However,
   if the sample size is large then these variables often average
   themselves out.
   Environmental factors: in all experiments the environmental
   factors will need to be controlled as much as possible. This is much
   easier to achieve if the investigation is carried out in a laboratory
   environment but not as easy in a field situation. For instance, in the
Good sampling design should take into account both of these and should:
a. Relate to the objectives of the investigation
b. Be practical and achievable
c. Be cost effective in terms of equipment and labour
d. Provide estimates of population parameters that are truly
representative and unbiased
An ideal representative sample should be:
a) Taken at random so that every member of the population of data
has an equal chance of selection
b) Large enough to give sufficient precision
c) Unbiased by the sampling procedures or equipment
It is very important in sampling procedures to take into account relevant
factors such as: location, habitat, time, age, sex, physiological condition
and disease status. These also need to be noted in the design as otherwise
a wrong interpretation may arise from the result.
Errors in Experiments
There are many potential sources of error when designing and carrying
out experiments. Error can arise from:
a. The design of the experiment
b. The measurement and sampling of data
Measurement and sampling error can arise for a number of reasons:
a) Instrument error: calibration of the instrument has not been carried
out or is faulty; consequently, accuracy and precision are affected.
b) Personal error: the observer makes inaccurate observations. This
type of error can be overcome by taking an average measurement,
especially if data are collected by two or more independent
observers.
c) Sampling errors: these arise because of the size or nature
of the sample used. The sample may be too small or not
random enough. Replication of the experiment also reduces errors.
A B A
A E B
D D E
C C C
F F E
F B D
Advantages of CRD
i. The design is very flexible and can be used for any number of
treatments.
ii. The statistical analysis is comparatively easy and straightforward.
iii. It is unaffected by observations that are missing for any treatment
because of some purely random, accidental reason.
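The layout shown above has six treatments (A to F), each appearing three times across 18 plots. Assuming that CRD stands for the completely randomized design named in the answers to Self-Assessment Exercises 2, a layout of this kind could be generated with the following Python sketch. The fixed seed and the 6-by-3 display are illustrative assumptions, so the printed arrangement will not match the table above.

```python
import random
from collections import Counter

random.seed(42)  # fixed seed so the sketch gives the same layout each run

treatments = ["A", "B", "C", "D", "E", "F"]
replicates = 3

# One entry per experimental unit, then shuffle so that every unit is
# equally likely to receive any treatment
units = treatments * replicates
random.shuffle(units)

# Display the 18 units as 6 rows of 3 plots
layout = [units[i:i + 3] for i in range(0, len(units), 3)]
for row in layout:
    print(" ".join(row))

# Each treatment should still appear exactly `replicates` times
counts = Counter(units)
```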
Disadvantage of CRD
Self-Assessment Exercises - 2
1. The term used to describe the ability to estimate the effect of the
treatment so that valid conclusions can be drawn is?
2. The type of experimental design that can be applied when the same
experimental material is used on different experimental units is called?
3. The experimental design in which the total area is divided into blocks and all
treatments are arranged within each block in a random order is?
5.5: Summary
This Unit discussed the essence of hypotheses and their various types,
together with their importance, sources, characteristics and methods of
formulation. It also examined the components and types of experimental
designs and their appropriate uses. From the text, students should be
able to design an experiment, taking into account its different
components as well as the statistical tools that may be required.
https://www.scribbr.com/methodology/experimental-design/
https://study.com/academy/lesson/formulating-the-research-hypothesis-and-null-hypothesis.html
Self-Assessment Exercises - 1
1. Null hypothesis
2. Significance difference
3. Five
Self-Assessment Exercises - 2
1. Sensitivity
2. Completely randomized design
3. Randomized block design
Glossary
Alternate hypothesis: the opposite of the null hypothesis. It is the
conclusion when the null hypothesis is rejected.
Bar chart or Bar graph: a chart or graph used with nominal
characteristics to display the numbers or percentages of
observations with the characteristics of interest.
Bell-shaped distribution: a term used to describe the shape of the
normal (Gaussian) distribution
Bias: A systematic error
Biostatistics: The application of research study design and
statistical analysis to the life sciences.
Categorical variable: A variable having only certain possible
values for which there is no logical ordering of the values. Also
called a nominal, polytomous, discrete categorical variable or
factor.
Class limits: The subdivisions of a numerical characteristic (or the
widths of the classes) when it is displayed in a frequency table or
graph.
Continuous variable: A variable that can take on any value within a
range of possible values.
Distribution: The values of a characteristic or variable along with
the frequency of their occurrence. Distributions may be based on
empirical observations or may be theoretical probability
distributions (e.g., normal, binomial, chi-square).
Estimate: A statistical estimate of a parameter based on the data.
Estimation: The process of using information from a sample to
draw conclusions about the values of parameters in a population.
Frequency polygon: A line graph connecting the midpoints of the
tops of the columns of a histogram. It is useful in comparing two
frequency distributions.
Frequency table: A table showing the number or percentage of
observations occurring at different values (or ranges of values) of
a characteristic or variable.
Histogram: A graph of a frequency distribution of numerical
observations.
Hypothesis test: An approach to statistical inference resulting in a
decision to reject or not to reject the null hypothesis.
Modal class: The interval (generally from a frequency table or
histogram) that contains the highest frequency of observations.
Null hypothesis: Customarily, but not necessarily, a hypothesis of no
effect.
Percentage polygon: A line graph connecting the midpoints of the
tops of the columns of a histogram based on percentages instead of
counts. It is useful in comparing two or more sets of observations
when the frequencies in each group are not equal.
4 white flowers and 5 blue flowers, determine the probability that it is not
a red flower.
12. A flower is drawn at random from a garden containing 6 red flowers,
4 white flowers and 5 blue flowers, determine the probability that it is a
red or white flower?
13. A special probability distribution that describes the distribution of
probabilities when there are only two possible outcomes for each trial
experiment is?
14. The sampling technique that increases precision is?
15. How many advantages exist between cluster sampling and others?
16. The sampling technique that is more current is?
17. The sampling technique that involves taking an item as a sample
from a large population at regular interval is?
18. The procedure in which a large population is given a questionnaire to
determine those who meet the qualifications for a study is called?
19. The type of error that occurs when one rejects the null hypothesis
when it is true is?
20. The experimental design in which the number of rows, columns
and treatments are equal and each treatment occurs just once in each row
and column is termed?
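The flower-drawing questions above can be checked with a short Python sketch that uses exact fractions. The garden composition (6 red, 4 white and 5 blue flowers) comes from the questions; the helper function name is an illustrative assumption.

```python
from fractions import Fraction

# Garden composition as stated in the questions above
garden = {"red": 6, "white": 4, "blue": 5}
total = sum(garden.values())

def probability(*colours):
    """Probability that a randomly drawn flower has one of the given colours."""
    return Fraction(sum(garden[c] for c in colours), total)

# Complement rule: P(not red) = 1 - P(red)
p_not_red = 1 - probability("red")

# Mutually exclusive events: P(red or white) = P(red) + P(white)
p_red_or_white = probability("red", "white")

print(p_not_red, round(float(p_not_red), 2))
print(p_red_or_white, round(float(p_red_or_white), 2))
```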
Answers
1. Biostatistics
2. Experimental design
3. Qualitative variables
4. Discrete variable
5. Continuous variable
6. Since the data are categorical, the blood groups: A, B, O and AB
can be used as the classes for the distribution.
Class   Tally          Frequency   Percent
A       ////,////,//   12          30
B       ////,////,/    11          27.5
O       ////,////      10          25
AB      ////,//        7           17.5
TOTAL                  40          100
Therefore, it can be concluded that in the sample more students have type
A blood group because its frequency is the highest.
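The percentage column of the table can be reproduced with a few lines of Python, as sketched below. The frequencies are those given in the table; the variable names are illustrative assumptions.

```python
# Frequencies of each blood group, taken from the table above
frequencies = {"A": 12, "B": 11, "O": 10, "AB": 7}
total = sum(frequencies.values())

# Percent = 100 x class frequency / total frequency
percentages = {group: 100 * f / total for group, f in frequencies.items()}

# The class with the highest frequency
modal_group = max(frequencies, key=frequencies.get)

for group, f in frequencies.items():
    print(f"{group:<3} {f:>3} {percentages[group]:>6.1f}")
print("Total", total, "Most frequent:", modal_group)
```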
72 72 73 73 73 73 73 74 74 74
74 74 74 74 74 74 74 75 75 75
75 75 75 75 75 75 75 76 76 76
76 76 76 77 78 78 78 78 79 79
Find the range of the data: Highest value – lowest value (79 – 68 =11).
Since the range of the data is small, classes of single data values can be
used.
[Figure: chart of Frequency (vertical axis) against Wing Length (horizontal axis, 68 to 79)]
8. 0.4
9. 0.27
10. 0.33
11. 0.6
12. 0.67
13. Binomial
14. Stratified sampling
15. Three
16. Cluster
17. Systematic, Skip
18. Double sampling
19. Type I error
20. Latin square design
Unit Structure
1.1: Introduction
1.2: Intended Learning Outcomes
1.3: Measure of Central Tendency
1.3.1: Characteristics
1.3.2: Types
1.4: Summary
1.5: References/Further Readings/Web Sources
1.6: Possible Answers to Self-Assessment Exercises
1.1: Introduction
1.3.1: Characteristics
1.3.2: Types
Arithmetic Mean/Average
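If mean is mentioned, it implies arithmetic mean, as the other means are
identified by their full names. The mean can be defined as the sum of the
observed values of a set divided by the number of observations in the set.
If $X_1, X_2, \ldots, X_N$ are N observed values, the mean is
$\text{Mean} = \dfrac{\sum FX}{\sum F}$
where F is the frequency with which the value X occurs (F = 1 for every
value when the data are ungrouped, so that $\sum F = N$).
For the weighted mean, in case k variate values $X_1, X_2, \ldots, X_k$ have
known weights $W_1, W_2, \ldots, W_k$ respectively, the weighted mean is
$\mu = \dfrac{W_1X_1 + W_2X_2 + \cdots + W_kX_k}{W_1 + W_2 + \cdots + W_k} = \dfrac{\sum WX}{\sum W} = \dfrac{1}{W}\sum_{i=1}^{k} W_iX_i$, where $W = \sum_{i=1}^{k} W_i$.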
Weighted mean is commonly used in the construction of index numbers.
Properties
i) The algebraic sum of the deviations of a set of numbers from their
arithmetic mean is zero.
ii) The sum of the squares of the deviations of a set of numbers from
any number a is a minimum if and only if a is equal to the mean.
iii) If $F_1$ numbers have mean $Y_1$, $F_2$ numbers have mean $Y_2$, ...,
and $F_k$ numbers have mean $Y_k$, then the mean of all the numbers is
$\dfrac{F_1Y_1 + F_2Y_2 + \cdots + F_kY_k}{F_1 + F_2 + \cdots + F_k}$
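The mean formula and the first two properties can be checked numerically with the Python sketch below. It uses the fingerling weights from this unit's self-assessment exercise; the variable and function names are illustrative assumptions.

```python
from collections import Counter

# Fingerling weights (g) from this unit's self-assessment exercise
weights = [3, 5, 2, 6, 5, 9, 5, 2, 8, 6]

# Mean = sum(F * X) / sum(F), where F is the frequency of each value X
freq = Counter(weights)
sum_fx = sum(f * x for x, f in freq.items())
sum_f = sum(freq.values())
mean = sum_fx / sum_f

# Property i: deviations from the mean add up to zero
# (allowing for tiny floating-point rounding)
deviation_sum = sum(x - mean for x in weights)

# Property ii: the sum of squared deviations is smallest about the mean
def sum_of_squares(a):
    return sum((x - a) ** 2 for x in weights)

print("Mean:", mean)
print("Sum of deviations:", round(deviation_sum, 10))
print(sum_of_squares(mean), sum_of_squares(mean + 0.5), sum_of_squares(mean - 0.5))
```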
Median:
It has been pointed out that the mean cannot be calculated whenever there
is a frequency distribution with open-end intervals. Also, the mean is to
a great extent affected by the extreme values of the set of observations.
Hence, in such cases, there has been a search for some better measure of
central tendency. The median is the value of the variable which divides
the data into two equal halves; in an ordered series of data, the median
is the observation lying in the middle of the series, with half of the
observations below it and the remaining half above it. The median for a
set of observations can easily be found after arranging them in either
descending or ascending order.
Let $X_1, X_2, \ldots, X_N$ be N ordered observations. There are two
possibilities:
a) N is an odd number, say N = 2p + 1, where p is an integer. In this case
the (p+1)th observation is the median value.
b) N is an even number, say N = 2p. In this case the average of the pth
and (p+1)th observations is the median value.
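The two cases above translate directly into the short Python sketch below. The function is an illustrative assumption; the two data sets are taken from exercises elsewhere in this course material (the eleven numbers from the array exercise and the ten fingerling weights).

```python
def median(values):
    """Median of ungrouped data, following the odd/even rule above."""
    ordered = sorted(values)
    n = len(ordered)
    p = n // 2
    if n % 2 == 1:
        # N = 2p + 1: the (p+1)th observation, which is index p from zero
        return ordered[p]
    # N = 2p: the average of the pth and (p+1)th observations
    return (ordered[p - 1] + ordered[p]) / 2

odd_case = median([17, 45, 38, 27, 6, 58, 48, 32, 19, 22, 34])  # N = 11
even_case = median([3, 5, 2, 6, 5, 9, 5, 2, 8, 6])               # N = 10

print(odd_case, even_case)
```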
The median for grouped data: if the data are given with class interval as:
Mode
The mode is the variate value which occurs most frequently in a set of
values. In the case of a discrete distribution one can find
Self-Assessment Exercises
1. The measures of the diameter of a plant leaf were 38.8, 40.9, 39.2, 39.7, 40.2,
39.5, 40.3, 39.2, 39.8 and 40.6 mm. Find the number of measures.
2. The measures of the diameter of a plant leaf were 38.8, 40.9, 39.2, 39.7, 40.2,
39.5, 40.3, 39.2, 39.8 and 40.6 mm. Find the total (sum) of the measures.
3. The measures of the diameter of a plant leaf were 38.8, 40.9, 39.2, 39.7, 40.2,
39.5, 40.3, 39.2, 39.8 and 40.6 mm. Find the arithmetic mean.
4. Find the mean of the set of figures obtained from measuring the weight, to the
nearest gram, of fingerlings: 3, 5, 2, 6, 5, 9, 5, 2, 8, 6
1.4: Summary
https://statisticsbyjim.com/basics/measures-central-tendency-mean-median-mode/
https://www.thoughtco.com/measures-of-central-tendency-3026706
https://study.com/academy/lesson/central-tendency-measures-definition-examples.html
1. 10
2. 398.2 mm
3. 39.8 mm
4. 5.1
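The answers above can be checked with a short Python sketch. The helper name `arithmetic_mean` is introduced here only for illustration; the data are the values given in the exercises.

```python
def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


leaf_diameters = [38.8, 40.9, 39.2, 39.7, 40.2, 39.5, 40.3, 39.2, 39.8, 40.6]
fingerling_weights = [3, 5, 2, 6, 5, 9, 5, 2, 8, 6]

n_measurements = len(leaf_diameters)        # exercise 1
total = sum(leaf_diameters)                 # exercise 2
leaf_mean = arithmetic_mean(leaf_diameters)           # exercise 3
fingerling_mean = arithmetic_mean(fingerling_weights)  # exercise 4
print(n_measurements, round(total, 1), round(leaf_mean, 2), fingerling_mean)
```

The unrounded leaf mean is 39.82 mm, which the answer key reports to one decimal place.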
2.1: Introduction
2.3.1: Purposes
Properties
i) It should be based on all values of a series.
ii) It should not be susceptible to fluctuation of sampling.
iii) It should be rigidly defined, i.e., each investigator should arrive at
the same value for the same set of data.
iv) It should be capable of further algebraic treatment.
v) It is preferable that the unit of measurement of dispersion should
be the same as the unit of measurement of observations.
vi) It should be calculable with reasonable ease, i.e., the formula
should be such that it does not complicate the computation of a
measure of dispersion.
vii) It should be least affected by extreme values.
2.3.2: Types
Range
Range is defined as the difference between the largest and the smallest
observation in a set
Range (R) = L – S
Where L = largest observation
S = Smallest observation
A relative measure known as coefficient of range is given as
Coefficient of Range = (𝐿 − 𝑆)/(𝐿 + 𝑆)
The smaller the range or coefficient of range, the less dispersed (more
consistent) the data.
Properties
a) It is the simplest measure and can easily be understood.
b) Besides the above merit, it hardly satisfies any property of a good
measure of dispersion e.g., it is based on two extreme values only,
ignoring the others.
c) It is not liable to further algebraic treatment.
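The two range formulas can be illustrated with a minimal Python sketch. The function names are hypothetical, and the leaf-diameter values from Unit 1 are reused here only as example data.

```python
def data_range(values):
    """Range (R) = L - S, largest minus smallest observation."""
    return max(values) - min(values)


def coefficient_of_range(values):
    """Coefficient of Range = (L - S) / (L + S)."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)


leaf_diameters = [38.8, 40.9, 39.2, 39.7, 40.2, 39.5, 40.3, 39.2, 39.8, 40.6]
r = data_range(leaf_diameters)
cr = coefficient_of_range(leaf_diameters)
print(round(r, 1), round(cr, 4))
```

Only the two extreme values (40.9 and 38.8) enter the calculation, which is the weakness noted in property b).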
Properties
a) It has mostly removed the lacunae which are present in the
measures of dispersion given before it.
b) The main demerit is that its unit is the square of the unit of
measurement of the variate values, e.g., if the variable X is measured
in cm, the unit of variance is cm².
c) The variance gives more weight to the extreme values as compared
to those which are near to mean value, because the difference is
squared in variance
Standard deviation
The positive square root of the variance is called the standard deviation
(S.D.): S.D. = √(σ²) for a population, or √(S²) for a sample. In simple
words, the standard deviation describes the average amount of variation
on either side of the mean. It has the same unit as the measurements.
Properties
a) It is considered to be the best measure of dispersion and is used
widely.
b) There is however, one difficulty with it. If the unit of measurement
of variables of two series is not the same, then their variability
cannot be compared by comparing the values of standard
deviation.
Properties
a) It is one of the most widely used measures of dispersion because
of its virtues.
b) A series with a smaller C.V. than another series is more
consistent, i.e., it has less variability.
c) For field experiment C.V is generally reported. If C.V is low, it
indicates more reliability of experimental findings.
Standard Error
It tells how close the sample means are to the population mean. S.E. =
square root of the variance divided by the sample size (n), i.e.,
S.E. = √(S²/n) = S/√n. The unit of the standard error is the same as the
unit of the individual measurements.
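The measures discussed in this Unit can be computed together in a short Python sketch. Two assumptions are made here and are not taken from the course text: the variance uses the sample (n − 1) denominator, as provided by `statistics.variance`, and the coefficient of variation is taken as the standard deviation expressed as a percentage of the mean. The fingerling weights from Unit 1 serve as example data.

```python
import math
import statistics


def dispersion_summary(values):
    """Return common measures of dispersion for one series of data."""
    n = len(values)
    mean = statistics.mean(values)
    variance = statistics.variance(values)   # assumption: sample variance (n - 1)
    sd = math.sqrt(variance)                 # same unit as the observations
    return {
        "range": max(values) - min(values),
        "variance": variance,                # unit is squared
        "sd": sd,
        "cv_percent": sd / mean * 100,       # assumption: C.V. = S.D./mean x 100
        "se": math.sqrt(variance / n),       # S.E. = sqrt(S^2 / n)
    }


fingerling_weights = [3, 5, 2, 6, 5, 9, 5, 2, 8, 6]
summary = dispersion_summary(fingerling_weights)
for name, value in summary.items():
    print(name, round(value, 4))
```

Because the C.V. is unit-free, it can be compared across series measured in different units, which the standard deviation alone does not allow.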
Which of the following are methods under measures of dispersion?
a. Standard deviation
b. Mean deviation
c. Range
d. All of the above
Which of the following are characteristics of a good measure of
dispersion?
a. It should be easy to calculate
b. It should be based on all the observations within a series
c. It should not be affected by the fluctuations within the sampling
d. All of the above
Self-Assessment Exercises
1. If all the observations within a series are multiplied by five,
then ________
a. The new standard deviation would be decreased by five
b. The new standard deviation would be increased by five
c. The new standard deviation would be half of the previous
standard deviation
d. The new standard deviation would be multiplied by five
2. The coefficient of variation is a percentage expression for
__________.
a. Standard deviation
b. Quartile deviation
c. Mean deviation
d. None of the above
3. While calculating the standard deviation, the deviations are
only taken from ______
a. The mode value of a series
b. The median value of a series
c. The quartile value of a series
d. The mean value of a series
4. ____________ and ____________ are types of measures of
dispersion.
a. Nominal, Real
b. Nominal, Relative
c. Real, Relative
d. Absolute, Relative
2.4: Summary
The measures that determine the closeness of values to the centre were
considered in this Unit. The different types of measures of dispersion
were considered and treated in this Unit.
Measures of dispersion
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3198538/#:~:text=Standard%20deviation%20(SD)%20is%20the,by%20the%20number%20of%20observations.
https://www.toppr.com/guides/business-mathematics-and-statistics/measures-of-central-tendency-and-dispersion/measure-of-dispersion/
https://www.youtube.com/watch?v=YvGeUSeQGYU
https://www.youtube.com/watch?v=dAwRlYhEWOs
1. d
2. a
3. d
4. d
Unit Structure
3.1: Introduction
3.2: Intended Learning Outcomes
3.3: Student’s t test
3.3.1: One Sample t-test
3.3.2: Independent Sample t-test
3.3.3: Paired Sample t-test
3.4: Summary
3.5: References/Further Readings/Web Sources
3.6: Possible Answers to Self-Assessment Exercises
3.1: Introduction
The paired samples t-test, sometimes called the dependent samples t-test,
is used to determine whether the change in means between two paired
observations is statistically significant. In this test, the same subjects
are measured at two time points or observed by two different methods. To
apply this test, paired variables (e.g., pre- and post-observations of the
same subjects) are used; the paired variables should be continuous and
normally distributed. The mean and SD of the paired differences and the
sample size (i.e., the number of pairs) are then used to calculate the
significance level.
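As a rough sketch of how the mean and SD of the paired differences and the number of pairs combine, the t statistic can be computed as the mean difference divided by its standard error. The function name and the before/after values below are hypothetical and serve only as an illustration; the resulting t is compared with the t-table at n − 1 degrees of freedom.

```python
import math
import statistics


def paired_t(before, after):
    """Paired-samples t statistic and its degrees of freedom."""
    if len(before) != len(after):
        raise ValueError("each subject needs one value in both series")
    differences = [a - b for a, b in zip(after, before)]
    n = len(differences)                      # number of pairs
    mean_d = statistics.mean(differences)     # mean of paired differences
    sd_d = statistics.stdev(differences)      # SD of paired differences (n - 1)
    t = mean_d / (sd_d / math.sqrt(n))        # mean difference / standard error
    return t, n - 1


# Hypothetical measurements on the same four subjects at two time points
before = [10, 12, 9, 11]
after = [12, 14, 10, 13]
t_stat, df = paired_t(before, after)
print(round(t_stat, 3), df)
```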
Self-Assessment Exercises
3.4: Summary
This Unit explained the simplest method of comparing two sets of
variables. An introduction to the Unit was provided, as well as the
different types of the test. The rationale for the application of each
type was also provided.
Student's t-distribution
https://www.investopedia.com/terms/t/tdistribution.asp#:~:text=The%20T%20distribution%2C%20also%20known,distributions%2C%20hence%20the%20fatter%20tails.
https://www.statisticshowto.com/probability-and-statistics/t-distribution/
https://www.youtube.com/watch?v=32CuxWdOlow
https://study.com/academy/lesson/student-t-distribution-definition-example-quiz.html
1. t-distribution
2. z-test
Unit Structure
4.1: Introduction
4.2: Intended Learning Outcomes
4.3: Chi-Square
4.3.1: Properties of Chi-square
4.3.2: Chi-Square testing
4.4: Summary
4.5: References/Further Readings/Web Sources
4.6: Possible Answers to Self-Assessment Exercises
4.1: Introduction
4.3: Chi-Square
Chi-square (χ²) is a general method for testing compatibility, based on
a measure of the extent to which the observed and expected frequencies
agree. Chi-square is also referred to as a test for homogeneity,
randomness, association, independence and goodness of fit. The
assumptions for the chi-square goodness-of-fit test are:
The data are obtained from a random sample.
The expected frequency for each category must be 5 or more.
Self-Assessment Exercises
1. In an experiment to test the effectiveness of three different traps for catching
birds, the number of birds captured in each trap design over the study period
was recorded as follows:
Design Observed Frequency
A 10
B 27
C 15
Total 52
2. The offspring of a certain cross gave the following colours: Red, Black or
white in the ratio 9:3:4. Assuming the experiment gave 74, 32, and 38
offspring respectively in those categories, is the theory substantiated?
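A goodness-of-fit calculation of the kind asked for in exercise 2 can be sketched in Python as follows. The function name is hypothetical; the statistic is the usual sum of (observed − expected)²/expected, with expected frequencies obtained by sharing the total in the theoretical ratio.

```python
def chi_square_goodness_of_fit(observed, ratio):
    """Chi-square statistic for observed counts against a theoretical ratio."""
    total = sum(observed)
    expected = [total * part / sum(ratio) for part in ratio]
    if min(expected) < 5:
        raise ValueError("each expected frequency should be 5 or more")
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1                 # number of categories minus one
    return chi2, df, expected


# Exercise 2: red, black and white offspring against a 9:3:4 ratio
chi2, df, expected = chi_square_goodness_of_fit([74, 32, 38], [9, 3, 4])
print(expected, round(chi2, 4), df)
```

The calculated value is then compared with the chi-square table at the stated degrees of freedom (5.99 at the 5% level for 2 degrees of freedom); a calculated value below the table value means the observed counts are compatible with the theoretical ratio.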
4.4: Summary
The Unit explained another statistical tool used to compare two variables.
The significance, methods and application of chi-square were discussed in
this Unit.
https://study.com/academy/lesson/contingency-table-statistics-probability-examples.html
https://www.youtube.com/watch?v=9KIQC9Npndg
Unit Structure
5.1: Introduction
5.2: Intended Learning Outcomes
5.3: Analysis of Variance
5.3.1: Assumptions of ANOVA
5.3.2: Mechanism of Calculation
5.4: Summary
5.5: References/Further Reading/Web Sources
5.6: Possible Answers to Self-Assessment Exercises
5.1: Introduction
Specifically, a data set should meet the following criteria before being
subjected to ANOVA:
Parametric data: A parametric ANOVA, the topic of this Unit,
requires parametric data (ratio or interval measures). There are
non-parametric, one-factor versions of ANOVA for non-parametric
ordinal (ranked) data, specifically the Kruskal-Wallis test for
independent groups and the Friedman test for repeated measures
analysis.
Self-Assessment Exercises
1. The table below shows the number of seeds of five varieties of garden
egg at three levels of indole-acetic acid (IAA).
IAA\varieties A B C D E
I 3 5 10 7 8
II 2 4 7 4 5
III 4 5 8 6 7
5.4: Summary
This Unit looked at another test tool used to compare two variables. The
single factor analysis of variance was considered here.
https://www.statisticssolutions.com/analysis-of-covariance-ancova/
https://www.investopedia.com/terms/a/anova.asp
https://www.youtube.com/watch?v=ZSwjaIUPBRg
IAA\varieties A B C D E Total
I 3 5 10 7 8 33
II 2 4 7 4 5 22
III 4 5 8 6 7 30
Total 9 14 25 17 20 GT=85
Then calculate:
The calculated F-values are compared with the F-distribution table, using
their respective degrees of freedoms.
SOURCE DF SS MS F
Block 2 12.9 6.45 13.58**
Varieties 4 48.6 12.15 25.58**
Error 8 3.8 0.475
Total 14 65.3
** indicates that the values are highly significant.
Conclusion: Since the F-values are highly significant, we reject the null
hypothesis. It means that the three levels of IAA have effect on the seed
number of the five varieties of garden egg.
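The sums of squares in the table above can be reproduced with a short Python sketch. It assumes the usual correction-factor method for a two-way layout without replication (IAA levels as blocks, varieties as treatments); the function name is hypothetical. Values computed this way may differ slightly from those in the table because of rounding in intermediate steps.

```python
def two_way_anova(table):
    """table[i][j] is the observation for block (row) i and variety (column) j."""
    rows, cols = len(table), len(table[0])
    n = rows * cols
    grand_total = sum(sum(row) for row in table)
    cf = grand_total ** 2 / n                       # correction factor, GT^2 / N
    ss_total = sum(x * x for row in table for x in row) - cf
    ss_blocks = sum(sum(row) ** 2 for row in table) / cols - cf
    col_totals = [sum(row[j] for row in table) for j in range(cols)]
    ss_varieties = sum(t * t for t in col_totals) / rows - cf
    ss_error = ss_total - ss_blocks - ss_varieties
    df_blocks, df_varieties = rows - 1, cols - 1
    df_error = df_blocks * df_varieties
    ms_error = ss_error / df_error
    return {
        "ss_total": ss_total,
        "ss_blocks": ss_blocks,
        "ss_varieties": ss_varieties,
        "ss_error": ss_error,
        "df_error": df_error,
        "f_blocks": (ss_blocks / df_blocks) / ms_error,
        "f_varieties": (ss_varieties / df_varieties) / ms_error,
    }


seeds = [
    [3, 5, 10, 7, 8],   # IAA level I
    [2, 4, 7, 4, 5],    # IAA level II
    [4, 5, 8, 6, 7],    # IAA level III
]
anova = two_way_anova(seeds)
for name, value in anova.items():
    print(name, round(value, 3))
```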
Varieties DF = 18 – 15 = 3
Conclusion:
The observed variance ratio of 7.12 is greater than the table values at
both 5% (3.29) and 1% (5.42). That means there is a highly significant
difference among the varieties. Therefore, we reject the null hypothesis
that the varieties are the same.
Glossary
ANOVA: Analysis of variance usually refers to an analysis of a
continuous dependent variable where all the predictor variables are
categorical.
Chi-square distribution: the distribution used to analyze counts in
frequency tables.
descriptive statistics: Statistics, such as the mean, the standard
deviation, the proportion, and the rate, used to describe attributes
of a set of data.
goodness of fit: Assessment of the agreement of the data with
either a hypothesized pattern (e.g., independence of row and
column factors in a contingency table or the form of a regression
relationship) or a hypothesized distribution (e.g., comparing a
histogram with expected frequencies from the normal
distribution).
mean (X̅): The most common measure of central tendency,
denoted by μ in the population and by X̅ in the sample. In a sample,
the mean is the sum of the X values divided by the number n in the
sample (ΣX/n).
median: Value such that half of the observations’ values are less
than and half are greater than that value.
P-value: The probability of getting a result (e.g., t or χ2 statistics)
as or more extreme than the observed statistic had H0 been true.
significance level: A preset value of α against which P-values are
judged in order to reject H0
standard deviation: A measure of the variability (spread) of
measurements across subjects.
standard error: The standard deviation of a statistical estimator.
two-sided test: A test that is non-directional and that leads to a two-
sided P-value.
variance: A measure of the spread or variability of a distribution,
equaling the average value of the squared difference between
measurements and the population mean measurement.
a. Negative
b. Zero
c. Larger than the variance
d. None of the above
6. The average of squared deviations from the arithmetic mean is
known as ___________.
a. Quartile deviation
b. Standard deviation
c. Variance
d. None of the above
7. Which of the following is not a characteristic of a good measure
of dispersion?
a. It should be rigidly defined
b. It should be based on extreme values
c. It should be capable of further mathematical treatment and
statistical analysis
d. None of the above
8. Which of the following cannot be calculated for open-ended
distributions?
a. Standard deviation
b. Mean deviation
c. Range
d. None of the above
9. The statistical tool for the mean of a population used when the
population is normally or approximately normally distributed is?
10. Two plant extracts are claimed to be effective in curing stomach
ulcer were tested on patients. The patients’ reactions to treatment were
recorded in the table below:
EFFICACY
                  Income
Class of degree   High   Medium   Low
First             22     10       10
Second            10     13       7
Third             20     6        6
Pass              5      9        15
Determine the relationship between the class of degree and their income.
Answers
1. 5
2. 5
3. 49.8
4. 49.5
5. Answer: a
6. Answer: c
7. Answer: d
8. Answer: b
9. t-test
10. Our null hypothesis (Ho): The two plant extracts have the same
effect on the patients. First calculate the expected frequencies.
                  Helped   Harmed   No effect   Total
A. senegalensis   62       84       24          170
B. monandra       34       44       22          100
Total             96       128      46          270
                  Income
Class of degree   High   Medium   Low   Total
First             22     10       10    42
Second            10     13       7     30
Third             20     6        6     32
Pass              5      9        15    29
Total             57     38       38    133
Conclusion:
Since the calculated value of 19.65 is higher than the table value of
12.59, the relationship between the class of degree and the income of
the A.B.U Zaria graduates in Abuja is significant. Therefore, we reject
our null hypothesis (Ho) and accept our alternative hypothesis (H1).
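The chi-square calculation for a contingency table such as the class of degree by income table can be sketched in Python. The function name is hypothetical; each expected frequency is taken as (row total × column total)/grand total, and the degrees of freedom as (rows − 1)(columns − 1). Carrying full precision gives a value slightly different from a hand calculation with rounded expected frequencies.

```python
def chi_square_independence(observed):
    """Chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (obs - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


# Rows: First, Second, Third, Pass; columns: High, Medium, Low income
degree_by_income = [
    [22, 10, 10],
    [10, 13, 7],
    [20, 6, 6],
    [5, 9, 15],
]
chi2_value, chi2_df = chi_square_independence(degree_by_income)
print(round(chi2_value, 2), chi2_df)
```

With 6 degrees of freedom the result is compared with the table value of 12.59 quoted in the conclusion.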
12. B
13. A
14. B
15. C
16. A
1.1: Introduction
1.2: Intended Learning Outcomes
1.3: Simple Linear Regression
1.4: Summary
1.5: References/Further Reading/Web Sources
1.6: Possible Answers to Self-Assessment Exercises
1.1: Introduction
Where a = intercept on the Y axis
b = slope/gradient/regression coefficient.
\[ b = \frac{\sum XY - \frac{\sum X \sum Y}{n}}{\sum X^2 - \frac{(\sum X)^2}{n}} \]
\[ a = \bar{y} - b\bar{x} \]
The line that best fits the points in a scatter diagram is called?
In a straight-line equation ‘Y = M + bX’, the ‘b’ denotes?
Self-Assessment Exercises
A scientist was interested in the acute effect of neem leaf dust (mg/l)
on African catfish during a 4-day experimental period and obtained the
following result.
Concentration of neem leaf dust (mg/l)   0.00  1.00   2.00   3.00   4.00   5.00   6.00
Cumulative mortality (%)                 0.00  30.00  40.00  56.70  63.60  76.70  90.00
Use the data to calculate the regression coefficient (b) and the
intercept (a).
1.4: Summary
https://www.jmp.com/en_sg/statistics-knowledge-portal/what-is-regression.html
https://www.youtube.com/watch?v=owI7zxCqNY0
https://www.youtube.com/watch?v=GhrxgbQnEEU
1.
\[ b = \frac{\sum XY - \frac{\sum X \sum Y}{n}}{\sum X^2 - \frac{(\sum X)^2}{n}} \]
X     Y       XY       X²
0     0       0        0
1     30      30       1
2     40      80       4
3     56.7    170.1    9
4     63.6    254.4    16
5     76.7    383.5    25
6     90      540      36
Σ 21  357.0   1458.0   91
[Figure: scatter plot with fitted regression line y = 13.821x + 9.5357
(R² = 0.9662); x-axis: neem powder (mg/L)]
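The slope and intercept of the fitted line can be reproduced from the column sums with a short Python sketch. The function name is hypothetical; the formulas are the ones given for b and a in this Unit.

```python
def linear_regression(x, y):
    """Least-squares intercept (a) and slope (b) for y = a + b*x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    a = sum_y / n - b * sum_x / n          # a = mean of y minus b times mean of x
    return a, b


concentration = [0, 1, 2, 3, 4, 5, 6]                 # neem leaf dust (mg/l)
mortality = [0, 30, 40, 56.7, 63.6, 76.7, 90]         # cumulative mortality (%)
a, b = linear_regression(concentration, mortality)
print(round(a, 4), round(b, 3))
```

The fitted line can then be used to estimate the expected mortality at any concentration within the range studied.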
Unit Structure
2.1: Introduction
2.2: Intended Learning Outcomes
2.3: Simple Linear Correlation
2.4: Summary
2.5: References/Further Reading/Web Sources
2.6: Possible Answers to Self-Assessment Exercises
2.1: Introduction
−1 (strong negative) ← 0 (no linear relationship) → +1 (strong positive)
Generally, simple correlation and simple linear regression may be:
SPURIOUS CORRELATION
When interpreting correlation, r, it is important to realize that, there may
be no direct connection at all between highly correlated variables. When
Self-Assessment Exercises
1. A scientist was interested in finding out the acute effect of neem leaf dust (mg/l)
on African catfish during the 4 days experimental period and obtained the
following result.
Concentration of neem leaf dust (mg/l)   0.00  1.00   2.00   3.00   4.00   5.00   6.00
Cumulative mortality (%)                 0.00  30.00  40.00  56.70  63.60  76.70  90.00
Use the data to calculate correlation coefficient
2. The following data of dissolved oxygen values was recorded for two stations of a
River. Use the Spearman’s correlation to determine any relationship between the values
obtained for the two stations.
Station 1 Station 2
7.4 10.4
7.6 10.8
7.9 11.1
7.2 10.2
7.4 10.3
7.1 10.2
7.4 10.7
7.2 10.5
7.8 10.8
7.7 11.2
7.8 10.6
8.3 11.4
7.4 8.6
2.4: Summary
https://www.youtube.com/watch?v=wHatBwHLrnA
https://www.youtube.com/watch?v=aztcS-3MwH0
1.
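X     Y       XY       X²    Y²
0     0       0        0     0
1     30      30       1     900
2     40      80       4     1600
3     56.7    170.1    9     3214.89
4     63.6    254.4    16    4044.96
5     76.7    383.5    25    5882.89
6     90      540      36    8100
Σ 21  357.0   1458.0   91    23742.74
n = 7
\[ r = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sqrt{\left(\sum X^2 - \frac{(\sum X)^2}{n}\right)\left(\sum Y^2 - \frac{(\sum Y)^2}{n}\right)}} \]
\[ \begin{aligned}
r &= \frac{1458 - \frac{21 \times 357}{7}}{\sqrt{\left(91 - \frac{(21)^2}{7}\right)\left(23742.74 - \frac{(357)^2}{7}\right)}} \\
  &= \frac{1458 - \frac{7497}{7}}{\sqrt{\left(91 - \frac{441}{7}\right)\left(23742.74 - \frac{127449}{7}\right)}} \\
  &= \frac{1458 - 1071}{\sqrt{(91 - 63)(23742.74 - 18207)}} \\
  &= \frac{387}{\sqrt{(28)(5535.74)}} \\
  &= \frac{387}{\sqrt{155000.72}} \\
  &= \frac{387}{393.70}
\end{aligned} \]
r = 0.9830; this implies that the variables are strongly positively correlated
r² = 0.9830²
r² = 0.9662
Interpretation: r² = 96.62%, i.e., about 96.6% of the variation in
cumulative mortality is explained by its linear relationship with the
concentration of neem leaf dust; with n = 7 the correlation is
significant (p<0.05)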
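The correlation coefficient for this exercise can be recomputed directly from the raw data with a short Python sketch, which avoids errors in the column totals. The function name is hypothetical; the formula is the product-moment formula used in the working for this answer.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = math.sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator


concentration = [0, 1, 2, 3, 4, 5, 6]                 # neem leaf dust (mg/l)
mortality = [0, 30, 40, 56.7, 63.6, 76.7, 90]         # cumulative mortality (%)
r = pearson_r(concentration, mortality)
r_squared = r ** 2      # proportion of variation in Y explained by X
print(round(r, 4), round(r_squared, 4))
```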
2.
Station 1 Station 2 Rank 1 Rank 2 d d²
7.4(4) 10.4(5) 5.5 5 0.5 0.25
7.6(8) 10.8(9) 8 9.5 -1.5 2.25
7.9 11.1(11) 12 11 1 1
7.2(3) 10.2(2) 2.5 2.5 0 0
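\[ r_s = 1 - \frac{6 \sum d^2}{n^3 - n} \]
\[ \begin{aligned}
r_s &= 1 - \frac{6(69)}{13^3 - 13} \\
    &= 1 - \frac{414}{2197 - 13} \\
    &= 1 - \frac{414}{2184} \\
    &= 1 - 0.1896 \\
    &= 0.810
\end{aligned} \]
The values at the two stations are positively correlated.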
rs² = 0.810 × 0.810 = 0.656 = 65.6%; the correlation is significant (p<0.05)
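The ranking and the Spearman calculation for the two stations can be sketched in Python. Tied values receive the average of the ranks they occupy, as in the table above; the function names are hypothetical, and the simple formula 1 − 6Σd²/(n³ − n) is used as in the worked answer.

```python
def average_ranks(values):
    """Rank values from smallest (1) to largest; ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1        # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman rank correlation and the sum of squared rank differences."""
    n = len(x)
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n ** 3 - n), sum_d2


station_1 = [7.4, 7.6, 7.9, 7.2, 7.4, 7.1, 7.4, 7.2, 7.8, 7.7, 7.8, 8.3, 7.4]
station_2 = [10.4, 10.8, 11.1, 10.2, 10.3, 10.2, 10.7, 10.5, 10.8, 11.2, 10.6, 11.4, 8.6]
rs, sum_d2 = spearman_rs(station_1, station_2)
print(sum_d2, round(rs, 3))
```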
3.1: Introduction
3.2: Intended Learning Outcomes
3.3: Non-Parametric tests
3.3.1: The sign test
3.3.2: Wilcoxon Signed Rank test
3.3.3: Mann-Whitney test
3.3.4: Kruskal-Wallis Rank test
3.4: Summary
3.5: References/Further Readings/Web Sources
3.6: Possible Answers to Self-Assessment Exercises
3.1: Introduction
There are five advantages that non parametric methods have over
parametric methods.
a. They can be used to test population when the variable is not
normally distributed.
b. They can be used when the data are nominal or ordinal.
c. They can be used to test hypotheses that do not involve population
parameters.
d. In most cases, computation is easier than in parametric methods.
e. They are easier to understand.
The simplest non-parametric test is the sign test for single samples.
It is used to test the value of a median for a specific sample. In using
Sign test, you:
a. Hypothesize the specific value for the median of a population.
b. Select a sample of data and compare each value with the
conjectured median.
c. Assign plus sign if the data value is above the conjectured
median.
d. Assign minus sign if the data value is below the conjectured
median.
e. Assign zero (0) if it is the same as the conjectured median.
f. Compare the number of plus and minus signs and ignore the
zeros
g. If the null hypothesis (Ho) is true, the number of plus
signs should be approximately equal to the number of minus
signs.
h. But if the Ho is not true, there will be disproportionate
number of plus or minus signs.
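The steps above can be sketched in Python. The counting of plus and minus signs follows the steps directly; the two-sided probability from the binomial distribution with p = 0.5 is an addition made here to show how the comparison of signs can be judged, and the function name and sample values are hypothetical.

```python
from math import comb


def sign_test(sample, conjectured_median):
    """Count plus/minus signs and return a two-sided binomial P-value."""
    plus = sum(1 for value in sample if value > conjectured_median)
    minus = sum(1 for value in sample if value < conjectured_median)
    n = plus + minus                       # zeros (ties) are ignored
    k = min(plus, minus)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    p_value = min(1.0, 2 * lower_tail)
    return plus, minus, p_value


# Hypothetical sample tested against a conjectured median of 14
balanced = sign_test([12, 15, 11, 18, 14, 14, 20, 9], 14)
# Hypothetical sample in which every value lies above the conjectured median
one_sided = sign_test([5, 6, 7, 8, 9], 4)
print(balanced, one_sided)
```

In the first sample the plus and minus signs are equal in number, as expected when Ho is true; in the second all the signs are plus, which is the disproportionate pattern described in step h.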
How many advantages do non-parametric test methods have over
parametric methods?
Which group of tests can be used when data are nominal or ordinal?
What are parametric tests known to compare?
Self-Assessment Exercises
3.4: Summary
https://www.youtube.com/watch?v=ftnOBcXtBEQ
https://www.statisticshowto.com/mann-whitney-u-test/
https://www.youtube.com/watch?v=fEobVCV2TJE
https://www.statisticshowto.com/probability-and-statistics/statistics-definitions/kruskal-wallis/
https://www.youtube.com/watch?v=q1D4Di1KWLc
2.
Male   Female   Rank male   Rank female
7.6    6.9      12          6
7.4    6.8      11          5
7.3    6.6      10          3
7.2    6.5      9           2
7.1    6.4      8           1
7.0    -        7           -
6.7    -        4           -
n1 = 7   n2 = 5   R1 = 61   R2 = 17
Decide whether to accept or reject the null hypothesis and verify the
accuracy of the result.
U = n1n2 + n1(n1 + 1)/2 – R1
U = 7 × 5 + 7(7 + 1)/2 – 61
U = 35 + 7(8)/2 – 61
U = 35 + 56/2 – 61
U = 35 + 28 – 61
U = 63 – 61
U = 2
U1 = n1n2 – U
U1 = 7 × 5 – 2
U1 = 35 – 2
U1 = 33
Check of accuracy: R1 + R2 = N(N + 1)/2
61 + 17 = 12(12 + 1)/2
78 = 12(13)/2
78 = 156/2
78 = 78
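The ranking, the two U values and the accuracy check can be reproduced with a short Python sketch. The function name is hypothetical; both samples are ranked together from smallest to largest, and tied values (none occur here) would share the average rank.

```python
def mann_whitney_u(sample_1, sample_2):
    """Return R1, R2, U and U1 for two independent samples."""
    combined = sorted(sample_1 + sample_2)

    def rank(value):
        first = combined.index(value) + 1      # lowest 1-based position
        ties = combined.count(value)
        return first + (ties - 1) / 2          # average rank for tied values

    r1 = sum(rank(v) for v in sample_1)
    r2 = sum(rank(v) for v in sample_2)
    n1, n2 = len(sample_1), len(sample_2)
    u = n1 * n2 + n1 * (n1 + 1) / 2 - r1       # U = n1n2 + n1(n1 + 1)/2 - R1
    u1 = n1 * n2 - u                           # U1 = n1n2 - U
    return r1, r2, u, u1


male = [7.6, 7.4, 7.3, 7.2, 7.1, 7.0, 6.7]
female = [6.9, 6.8, 6.6, 6.5, 6.4]
r1, r2, u, u1 = mann_whitney_u(male, female)
n_total = len(male) + len(female)
check = n_total * (n_total + 1) / 2            # R1 + R2 should equal N(N + 1)/2
print(r1, r2, u, u1, check)
```

The smaller of U and U1 is the value compared with the Mann-Whitney table.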
Unit Structure
4.1: Introduction
4.2: Intended Learning Outcomes
4.3: Ecological Indices
4.3.1: Species Richness
4.3.2: Diversity index
4.3.3: Species Evenness
4.3.4: Species Dominance
4.4: Summary
4.5: References/Further Readings/Web Sources
4.6: Possible Answers to Self-Assessment Exercises
4.1: Introduction
Self-Assessment Exercises
1. The following data were recorded of the abundance of Bacteria in three stations
of Nile River. Using the data provided, determine all the ecological statistics in
each station.
Species Station 1 Station 2 Station 3
E. coli 21 19 20
Salmonella typhi 8 23 0
Shigella sp 18 7 23
Pseudomonas aureus 17 16 0
Streptococcus sp. 20 24 23
Staphylococcus sp 10 0 12
Klebsilla sp. 14 13 14
4.4: Summary
The Unit dealt with some ecological statistical tools; five ecological
indices were considered and treated.
https://www.youtube.com/watch?v=GEsGTzOedXw
https://www.youtube.com/watch?v=w9TvlB4hf7k
https://www.youtube.com/watch?v=ghhZClDRK_g
https://www.youtube.com/watch?v=OBfpdM9SJIc
Margalef’s index (d) = (6 − 1)/ln(102)
d = 5/4.6250
d = 1.0811
Station 3
Margalef’s index (d) = (5 − 1)/ln(92)
d = 4/4.5218
d = 0.8846
Menhinick’s index (D) = S/√N
Station 1
D = 7/√108
D = 7/10.3923
D = 0.6736
Station 2
D = 6/√102
D = 6/10.0995
D = 0.5941
Station 3
D = 5/√92
D = 5/9.5917
D = 0.5213
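Both richness indices can be computed for the three stations with a short Python sketch. It assumes Margalef's index is (S − 1)/ln N and Menhinick's index is S/√N, where S is the number of species actually present (count above zero) and N the total number of individuals; the function names are hypothetical.

```python
import math


def margalef_index(counts):
    """Margalef's index d = (S - 1) / ln(N)."""
    species = sum(1 for c in counts if c > 0)   # S: species present
    individuals = sum(counts)                   # N: total individuals
    return (species - 1) / math.log(individuals)


def menhinick_index(counts):
    """Menhinick's index D = S / sqrt(N)."""
    species = sum(1 for c in counts if c > 0)
    individuals = sum(counts)
    return species / math.sqrt(individuals)


# Species order: E. coli, S. typhi, Shigella, Pseudomonas, Streptococcus,
# Staphylococcus, Klebsiella (data from the Self-Assessment Exercise)
stations = {
    "Station 1": [21, 8, 18, 17, 20, 10, 14],
    "Station 2": [19, 23, 7, 16, 24, 0, 13],
    "Station 3": [20, 0, 23, 0, 23, 12, 14],
}
results = {
    name: (margalef_index(counts), menhinick_index(counts))
    for name, counts in stations.items()
}
for name, (d, big_d) in results.items():
    print(name, round(d, 4), round(big_d, 4))
```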
H = 89.0412/108
H = 0.8244
Station 2
Species              Fi   LogFi    FiLogFi
E. coli              19   1.2787   24.2953
Salmonella typhi     23   1.3617   31.3191
Shigella sp          7    0.8450   5.9150
Pseudomonas aureus   16   1.2041   19.2656
Streptococcus sp.    24   1.3802   33.1248
Staphylococcus sp    0    0        0
Klebsilla sp.        13   1.1139   14.4807
N = 102, S = 6, Σ FiLogFi = 128.4005
H = (N Log N − Σ FiLogFi)/N
H = (204.8772 − 128.4005)/102
H = 76.4767/102
H = 0.7497
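The diversity index H for each station can be computed with a short Python sketch. It assumes the form H = (N log N − Σ Fi log Fi)/N with base-10 logarithms, which is the form that reproduces the Station 1 value of 0.8244 and the Station 2 value of 0.7497 used in the later t-test; the function name is hypothetical. Species with a count of zero contribute nothing to the sum.

```python
import math


def shannon_h(counts):
    """Diversity index H = (N log N - sum(Fi log Fi)) / N, base-10 logs."""
    present = [c for c in counts if c > 0]       # skip absent species
    n = sum(present)
    sum_f_log_f = sum(c * math.log10(c) for c in present)
    return (n * math.log10(n) - sum_f_log_f) / n


station_1 = [21, 8, 18, 17, 20, 10, 14]
station_2 = [19, 23, 7, 16, 24, 0, 13]
h1 = shannon_h(station_1)
h2 = shannon_h(station_2)
print(round(h1, 4), round(h2, 4))
```

Small differences in the fourth decimal place from the hand calculation arise because the tables round each logarithm to four decimal places.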
Station 1
Species              Fi   LogFi    FiLogFi   (FiLogFi)²   (FiLogFi)²/Fi
E. coli              21   1.3222   27.7662   770.9618     36.7124
Salmonella typhi     8    0.9030   7.2240    52.1861      6.5232
Shigella sp          18   1.2552   22.5936   510.4700     28.3594
Pseudomonas aureus   17   1.2304   20.9168   437.5125     25.7360
Streptococcus sp.    20   1.3010   26.0200   677.0404     33.8520
Staphylococcus sp    10   1.0000   10.0000   100.0000     10.0000
Station 2
Species              Fi   LogFi    FiLogFi   (FiLogFi)²   (FiLogFi)²/Fi
E. coli              19   1.2787   24.2953   590.2616     31.0664
Salmonella typhi     23   1.3617   31.3191   980.8860     42.6472
Shigella sp          7    0.8450   5.9150    34.9872      4.9981
Pseudomonas aureus   16   1.2041   19.2656   371.1633     23.1977
Streptococcus sp.    24   1.3802   33.1248   1097.2523    45.7188
Staphylococcus sp    0    0        0         0.0000       0
Klebsilla sp.        13   1.1139   14.4807   209.6907     16.1300
Total                N=102         128.4005               163.7582
S=6
\[ \sum F_i \log^2 F_i = 163.7582 \]
\[ S^2_H = \text{Variance of } H = \frac{\sum F_i \log^2 F_i - \frac{(\sum F_i \log F_i)^2}{N}}{N^2} \]
\[ \begin{aligned}
S^2_H &= \frac{163.7582 - \frac{(128.4005)^2}{102}}{102^2} \\
      &= \frac{163.7582 - \frac{16486.6884}{102}}{10404} \\
      &= \frac{163.7582 - 161.6342}{10404} \\
      &= \frac{2.1240}{10404}
\end{aligned} \]
S²H = Variance of H = 0.00020415
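\[ t = \frac{H_1 - H_2}{\sqrt{S^2_{H_1} + S^2_{H_2}}} \]
\[ \text{where } S^2_H = \text{Variance of } H = \frac{\sum F_i \log^2 F_i - \frac{(\sum F_i \log F_i)^2}{N}}{N^2} \quad \text{and} \quad F_i \log^2 F_i = \frac{(F_i \log F_i)^2}{F_i} \]
\[ \begin{aligned}
t &= \frac{0.8244 - 0.7497}{\sqrt{0.000147 + 0.00020415}} \\
  &= \frac{0.0747}{\sqrt{0.00035115}} \\
  &= \frac{0.0747}{0.018739}
\end{aligned} \]
t = 3.9863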
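Degrees of freedom are calculated using the equation
\[ df = \frac{(S^2_{H_1} + S^2_{H_2})^2}{\frac{(S^2_{H_1})^2}{N_1} + \frac{(S^2_{H_2})^2}{N_2}} \]
\[ \begin{aligned}
df &= \frac{(0.000147 + 0.00020415)^2}{\frac{(0.000147)^2}{108} + \frac{(0.00020415)^2}{102}} \\
   &= \frac{(0.00035115)^2}{\frac{2.1609 \times 10^{-8}}{108} + \frac{4.1677 \times 10^{-8}}{102}} \\
   &= \frac{1.2331 \times 10^{-7}}{2.0008 \times 10^{-10} + 4.0860 \times 10^{-10}} \\
   &= \frac{1.2331 \times 10^{-7}}{6.0868 \times 10^{-10}}
\end{aligned} \]
df = 202.58 ≈ 203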
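The comparison of two diversity indices can be checked with a short Python sketch. It uses the sum of the two variances under the square root, the form used in Hutcheson's t-test, which also matches the numerator of the degrees-of-freedom equation. The function name is hypothetical; the inputs are the H values, variances and totals quoted for Stations 1 and 2.

```python
import math


def diversity_t_test(h1, var1, n1, h2, var2, n2):
    """t statistic and degrees of freedom for comparing two H values."""
    t = (h1 - h2) / math.sqrt(var1 + var2)
    df = (var1 + var2) ** 2 / (var1 ** 2 / n1 + var2 ** 2 / n2)
    return t, df


t_value, df_value = diversity_t_test(
    h1=0.8244, var1=0.000147, n1=108,      # Station 1
    h2=0.7497, var2=0.00020415, n2=102,    # Station 2
)
print(round(t_value, 4), round(df_value, 2))
```

Working in floating point avoids the misplaced decimal points that are easy to introduce when squaring very small variances by hand.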
Evenness
E = H/Hmax = H/Log S
Station 1
E = 0.8244/Log 7
E = 0.8244/0.8450
E = 0.9756
Station 2
E = 0.7497/Log 6
E = 0.7497/0.7781
E = 0.9635
Station 3 ???
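Evenness follows directly from H and the number of species present, so it can be sketched as a one-line Python function. The function name is hypothetical, and base-10 logarithms are assumed to match the H values used in this Unit.

```python
import math


def evenness(h, species):
    """E = H / Hmax, where Hmax = log10(S)."""
    return h / math.log10(species)


e1 = evenness(0.8244, 7)    # Station 1: H = 0.8244, S = 7
e2 = evenness(0.7497, 6)    # Station 2: H = 0.7497, S = 6
print(round(e1, 4), round(e2, 4))
```

The same function applies to any station once its H value and species count are known.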
Station 1
Species ni ni-1 ni(ni-1)
E. coli 21 20 420
Salmonella typhi 8 7 56
Shigella sp 18 17 306
Pseudomonas aureus 17 16 272
Streptococcus sp. 20 19 380
Staphylococcus sp 10 9 90
Klebsilla sp. 14 13 182
N = 108, S = 7, Σ ni(ni − 1) = 1706
D = Σ ni(ni − 1)/(N(N − 1))
D = 1706/(108(108 − 1))
D = 1706/(108 × 107)
D = 1706/11556
D = 0.1476
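The dominance calculation for Station 1 can be reproduced with a short Python sketch of D = Σ ni(ni − 1)/(N(N − 1)). The function name is hypothetical.

```python
def simpson_dominance(counts):
    """Dominance index D = sum(ni(ni - 1)) / (N(N - 1))."""
    n = sum(counts)
    return sum(c * (c - 1) for c in counts) / (n * (n - 1))


station_1 = [21, 8, 18, 17, 20, 10, 14]
numerator = sum(c * (c - 1) for c in station_1)
d_station_1 = simpson_dominance(station_1)
print(numerator, round(d_station_1, 4))
```

Species with a count of zero add nothing to the numerator, so the same function can be applied unchanged to Stations 2 and 3.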
Unit Structure
5.1: Introduction
5.2: Intended Learning Outcomes
5.3: Statistical Software
5.4: Summary
5.5: References/Further Readings/Web Sources
5.6: Possible Answers to Self-Assessment Exercises
5.1: Introduction
The absence of adequate tools has been, for many years, an obstacle to
the development of Multivariate and Multidimensional analyses as they
were studied only in a theoretical context. Multidimensional analysis may
be defined as a group of techniques that have the aim to visualize, classify
and interpret the data. It tries to underline the latent structure of the data,
removing the redundant information. Multidimensional Statistical
analysis includes: Principal Component Analysis, Correspondence
Analysis, Discriminant Analysis, Canonical Correlation Analysis and
Cluster Analysis.
Simple Correspondence Analysis is one of the best-known tools for
qualitative data. It studies the relationships between the modalities of
two qualitative variables. Multiple Correspondence Analysis is used when
there are more than two qualitative variables, so that Simple
Correspondence Analysis is not possible, and the relationships between
the characters are studied. Discriminant analysis is used to verify
whether a prior classification is confirmed after using the explicative
variables, i.e., it classifies a new observation into one of the groups.
Cluster analysis is a group of techniques that aim to classify
observations or individuals into clusters. The observations in each
cluster must be similar and the clusters must be well separated.
Self-Assessment Exercises
1. What is SPSS?
2. Multiple Correspondence Analysis is used when there are more
than……..qualitative variables
5.4: Summary
London.
Daniel, W.W. (1995). Biostatistics: a foundation for Analysis in
Health sciences. Sixth Edition. John Wiley and sons Incorporated.
USA.
Helmut F. van Emden.(2008). Statistics for Terrified Biologists.
Blackwell Publishing Limited. USA.
https://www.xlstat.com/en/
https://www.youtube.com/watch?v=4wrtykLDdus
https://www.graphpad.com/series/getting-started/
https://www.youtube.com/watch?v=M0Sl-3eu974
2. two
Glossary
Data set: A collection of related, discrete items of data that may
be accessed individually or collectively, or managed as a single,
holistic entity. Data sets are generally organized into some formal
structure, often in a tabular format
nonparametric tests: A test that makes minimal assumptions about
the distribution of the data or about certain parameters of a
statistical model.
Pearson’s correlation coefficient (r): this is used for any bivariate
populations which are normally distributed.
regression to the mean: Tendency for a variable that has an extreme
value on its first measurement to have a more typical value on its
second measurement.
Spearman’s rank correlation coefficient = rs: this is non parametric
rank correlation
Species richness: The number of species within a region. (A term
commonly used as a measure of species diversity, but technically
only one aspect of diversity.)
Statistical computing: The collection and interpretation of data
aimed at uncovering patterns and trends. It may be used in
scenarios such as gathering research interpretations, statistical
modeling or designing surveys and studies, and advanced business
intelligence. R is a programming language that's highly compatible
with statistical computing.
The Shannon evenness index, abbreviated as SEI, provides
information on area composition and richness. It covers the
number of different land cover types (m) observed along the
straight line and their relative abundances (Pi). It is calculated by
dividing the Shannon diversity index by its maximum (h (m)).
Therefore, it varies between 0 and 1 and is relatively easy to
interpret.
B, B, B+, B-, C, C, C+, C-, D+, D. Test the null hypothesis that the
students performed equally well in the course under both lecturers
and prove the accuracy of the result.
9. A hydrobiologist studying the effluent characteristics of flow
station at different locations collected five effluent samples from
three flow stations for the determination of total hydrocarbon
(THC) concentrations (mg/l). The result of the analysis are given
below.
Location A   Location B   Location C
14.6 8.4 6.9
12.1 5.0 7.3
9.6 5.5 5.8
8.2 6.6 4.1
10.2 6.3 5.1
Pond B    7.69  7.70  7.71  7.73  7.74  7.74  7.78  7.81  7.82  -
Pond C    7.74  7.75  7.77  7.78  7.80  7.81  7.81  -     -     -
14. Consider the scores of 22 students in a Biostatistics II class, each
taught by one of two lecturers, A and B, who all took the same examination.
Lecturer A: A, A, A, B-, B-, B+, C-, C-, D
Lecturer B: A, A, A, B, B, B+, B-, C, C, C+, C-, D+, D
Test the null hypothesis that the students performed equally well in the
course under both lecturers and prove the accuracy of the result.
15. Which measurement of species diversity does not take into
consideration the number of individuals within a species or population?
16. What types of statistics can MINITAB be used for?
Answers
1. Simple linear regression
2. Controlled/Manipulated
3. Simple
4. Negative simple relationship
5.
[Figure: plot of cumulative mortality (%) against conc. of neem powder (mg/L)]
$$ b = \frac{\sum XY - \dfrac{\sum X \sum Y}{n}}{\sum X^{2} - \dfrac{(\sum X)^{2}}{n}} $$
X Y XY X²
0 0 0 0
2 20 40 4
4 38 152 16
6 48 288 36
8 60 480 64
10 72 720 100
12 82 984 144
14 90 1260 196
16 98 1568 256
18 100 1800 324
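The slope formula can be checked numerically. The short Python sketch below is only an illustration (Python is not part of the course material, and the variable names are our own). It rebuilds the column totals from the X and Y columns of the table and substitutes them into the expression for b.

```python
# Illustrative sketch: slope b of the regression of Y on X, using column totals.
# X = conc. of neem powder (mg/L), Y = cumulative mortality (%), copied from the table.
x = [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
y = [0, 20, 38, 48, 60, 72, 82, 90, 98, 100]

n = len(x)
sum_x = sum(x)
sum_y = sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)

numerator = sum_xy - sum_x * sum_y / n     # sum(XY) - sum(X) * sum(Y) / n
denominator = sum_x2 - sum_x ** 2 / n      # sum(X^2) - (sum X)^2 / n
b = numerator / denominator

print(f"numerator = {numerator}, denominator = {denominator}, b = {b:.4f}")
```

Printing the numerator and denominator separately makes it easy to compare each part with a hand calculation before dividing.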
6.
X      Y      XY      X²      Y²
0      0      0       0       0
2      20     40      4       400
4      38     152     16      1444
6      48     288     36      2304
8      60     480     64      3600
10     72     720     100     5184
12     82     984     144     6724
14     90     1260    196     8100
16     98     1568    256     9604
18     100    1800    324     10000
∑ 90   ∑ 608  ∑ 7292  ∑ 1140  ∑ 47360
n = 10
$$ r = \frac{\sum XY - \dfrac{(\sum X)(\sum Y)}{n}}{\sqrt{\left(\sum X^{2} - \dfrac{(\sum X)^{2}}{n}\right)\left(\sum Y^{2} - \dfrac{(\sum Y)^{2}}{n}\right)}} $$
$$ r = \frac{7292 - \dfrac{(90)(608)}{10}}{\sqrt{\left(1140 - \dfrac{(90)^{2}}{10}\right)\left(47360 - \dfrac{(608)^{2}}{10}\right)}} $$
$$ r = \frac{7292 - \dfrac{54720}{10}}{\sqrt{\left(1140 - \dfrac{8100}{10}\right)\left(47360 - \dfrac{369664}{10}\right)}} $$
$$ r = \frac{7292 - 5472}{\sqrt{(1140 - 810)(47360 - 36966.4)}} $$
$$ r = \frac{1820}{\sqrt{(330)(10393.6)}} $$
$$ r = \frac{1820}{\sqrt{3429888}} $$
$$ r = \frac{1820}{1851.9957} $$
r = 0.9827
r² = 0.9827²
r² = 0.9657
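As a cross-check on the hand calculation of r and r², the same steps can be written as a short Python sketch. It is an illustration only; the variable names are our own, and only the standard math module is used.

```python
import math

# X and Y columns copied from the table for answer 6
x = [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
y = [0, 20, 38, 48, 60, 72, 82, 90, 98, 100]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)
sum_y2 = sum(yi ** 2 for yi in y)

# Numerator and the two bracketed terms under the square root
numerator = sum_xy - sum_x * sum_y / n
x_term = sum_x2 - sum_x ** 2 / n
y_term = sum_y2 - sum_y ** 2 / n

r = numerator / math.sqrt(x_term * y_term)
r_squared = r ** 2

print(f"r = {r:.4f}, r^2 = {r_squared:.4f}")
```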
7.
Station 1   Station 2   Rank 1   Rank 2   d      d²
3.56        3.23        6        7.5      -1.5   2.25
3.67        3.76        7        13       -6     36
3.98        3.09        10.5     3        7.5    56.25
3.80        3.02        9        2        7      49
3.76        3.42        8        11       -3     9
3.98        3.23        10.5     7.5      3      9
2.56        3.67        1        12       -11    121
3.23        3.24        4        9        -5     25
3.52        3.34        5        10       -5     25
4.32        2.17        13       1        12     144
2.67        3.11        2        4        -2     4
4.21        3.12        12       5        7      49
3.12        3.13        3        6        -3     9
                                          ∑d² =  538.5
$$ r_s = 1 - \frac{6 \sum d^{2}}{n^{3} - n} $$
$$ r_s = 1 - \frac{6(538.5)}{13^{3} - 13} $$
$$ r_s = 1 - \frac{3231}{2197 - 13} $$
$$ r_s = 1 - \frac{3231}{2184} $$
rs = 1 - 1.4794
rs = -0.4794
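The ranking and the rank-difference formula used in answer 7 can be reproduced with the Python sketch below. It is an illustration only: the helper average_ranks is our own, and the sketch follows the same simple formula as the worked answer (tied values share the mean of their ranks, with no further adjustment for ties).

```python
# Station readings copied from the table for answer 7
station_1 = [3.56, 3.67, 3.98, 3.80, 3.76, 3.98, 2.56, 3.23, 3.52, 4.32, 2.67, 4.21, 3.12]
station_2 = [3.23, 3.76, 3.09, 3.02, 3.42, 3.23, 3.67, 3.24, 3.34, 2.17, 3.11, 3.12, 3.13]

def average_ranks(values):
    """Rank from smallest (1) to largest; tied values share the mean of their ranks."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]

rank_1 = average_ranks(station_1)
rank_2 = average_ranks(station_2)

# Sum of squared rank differences, then rs = 1 - 6 * sum(d^2) / (n^3 - n)
sum_d2 = sum((r1 - r2) ** 2 for r1, r2 in zip(rank_1, rank_2))
n = len(station_1)
rs = 1 - 6 * sum_d2 / (n ** 3 - n)

print(f"sum of d^2 = {sum_d2}, rs = {rs:.4f}")
```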
8.
(The number in brackets is the position of each grade in the pooled ordering.)
LA        LB         Rank LA   Rank LB
A (1)     A (6)      3.5       3.5
A (2)     A (5)      3.5       3.5
A (3)     A (4)      3.5       3.5
B- (11)   B (10)     12        9.5
B- (12)   B (9)      12        9.5
B+ (7)    B+ (8)     7.5       7.5
C- (18)   B- (13)    19        12
C- (19)   C (15)     19        16
C (17)    C (16)     16        16
D (22)    C+ (14)    22.5      14
-         C- (20)    -         19
-         D+ (21)    -         21
-         D (23)     -         22.5
n1 = 10, n2 = 13, R1 = 118.5, R2 = 157.5
$$ U = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1 $$
U = (10)(13) + 10(10 + 1)/2 - 118.5
U = 130 + 10(11)/2 - 118.5
U = 130 + 10(5.5) - 118.5
U = 130 + 55 - 118.5
U = 185 - 118.5
U = 66.5
U1 = n1n2 - U
U1 = (10)(13) - 66.5
U1 = 130 - 66.5
U1 = 63.5
Check:
$$ R_1 + R_2 = \frac{N(N + 1)}{2} $$
118.5 + 157.5 = 23(23 + 1)/2
276 = 23(24)/2
276 = 552/2
276 = 276
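The Mann-Whitney calculation for letter grades can also be sketched in Python. This is an illustration under stated assumptions: the list GRADE_ORDER (best grade first) and the helper average_ranks are our own, and the grades are ranked from best (rank 1) to worst, as in the answer table.

```python
# Assumed ordering of the letter grades, best to worst (not stated explicitly in the text)
GRADE_ORDER = ["A", "B+", "B", "B-", "C+", "C", "C-", "D+", "D"]

lecturer_a = ["A", "A", "A", "B-", "B-", "B+", "C-", "C-", "C", "D"]
lecturer_b = ["A", "A", "A", "B", "B", "B+", "B-", "C", "C", "C+", "C-", "D+", "D"]

def average_ranks(values):
    """Rank from smallest (1) to largest; tied values share the mean of their ranks."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]

# Convert grades to positions in GRADE_ORDER so that the best grade ranks first
scores = [GRADE_ORDER.index(g) for g in lecturer_a + lecturer_b]
ranks = average_ranks(scores)

n1, n2 = len(lecturer_a), len(lecturer_b)
r1 = sum(ranks[:n1])          # rank sum for Lecturer A
r2 = sum(ranks[n1:])          # rank sum for Lecturer B

u = n1 * n2 + n1 * (n1 + 1) / 2 - r1
u1 = n1 * n2 - u              # the complementary statistic, written U1 in the text

# Accuracy check: the two rank sums must add up to N(N + 1)/2
big_n = n1 + n2
ranks_add_up = (r1 + r2 == big_n * (big_n + 1) / 2)

print(f"R1 = {r1}, R2 = {r2}, U = {u}, U1 = {u1}, check passed: {ranks_add_up}")
```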
9.
Location A   Location B   Location C   Rank A   Rank B   Rank C
14.6         8.4          6.9          15       11       8
12.1         5.0          7.3          14       2        9
9.6          5.5          5.8          12       4        5
8.2          6.6          4.1          10       7        1
10.2         6.3          5.1          13       6        3
R1 = 64, R2 = 30, R3 = 26
$$ H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^{2}}{n_i} - 3(N+1) $$
$$ H = \frac{12}{15(15+1)} \left( \frac{64^{2}}{5} + \frac{30^{2}}{5} + \frac{26^{2}}{5} \right) - 3(15+1) $$
$$ H = \frac{12}{15(16)} \left( \frac{4096}{5} + \frac{900}{5} + \frac{676}{5} \right) - 3(16) $$
$$ H = \frac{12}{240}(819.2 + 180 + 135.2) - 48 $$
H = 0.05(1134.4) - 48
H = 56.72 - 48
H = 8.72
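The Kruskal-Wallis steps for the THC data (pool the observations, rank them, sum the ranks per location, then apply the H formula) can be sketched in Python as follows. The sketch is an illustration only; the helper average_ranks and the variable names are our own.

```python
# THC concentrations (mg/l) copied from the table for answer 9
location_a = [14.6, 12.1, 9.6, 8.2, 10.2]
location_b = [8.4, 5.0, 5.5, 6.6, 6.3]
location_c = [6.9, 7.3, 5.8, 4.1, 5.1]
groups = [location_a, location_b, location_c]

def average_ranks(values):
    """Rank from smallest (1) to largest; tied values share the mean of their ranks."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]

# Rank all observations together, then split the ranks back into their groups
pooled = [value for group in groups for value in group]
ranks = average_ranks(pooled)

rank_sums = []
start = 0
for group in groups:
    rank_sums.append(sum(ranks[start:start + len(group)]))
    start += len(group)

big_n = len(pooled)
h = (12 / (big_n * (big_n + 1))
     * sum(r ** 2 / len(g) for r, g in zip(rank_sums, groups))
     - 3 * (big_n + 1))

print(f"rank sums = {rank_sums}, H = {h:.2f}")
```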
10.
$$ H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^{2}}{n_i} - 3(N+1) $$
$$ H = \frac{12}{31(31+1)} \left( \frac{93^{2}}{8} + \frac{95^{2}}{8} + \frac{144^{2}}{7} + \frac{164^{2}}{8} \right) - 3(31+1) $$
$$ H = \frac{12}{31(32)} \left( \frac{8649}{8} + \frac{9025}{8} + \frac{20736}{7} + \frac{26896}{8} \right) - 3(32) $$
$$ H = \frac{12}{992}(1081.125 + 1128.125 + 2962.286 + 3362) - 96 $$
$$ H = \frac{12}{992}(8533.536) - 96 $$
H = 103.228 - 96
H = 7.228
Groups (m)          1     2     3     4      5     6      7
Tied ranks          3.5   6     10    13.5   20    23.5   26.5
                    3.5   6     10    13.5   20    23.5   26.5
                          6     10    13.5   20           26.5
                                      13.5                26.5
No. of tied ranks   2     3     3     4      3     2      4
$$ H_c = \frac{H}{C} $$
$$ C = \text{correction factor} = 1 - \frac{\sum T}{N^{3} - N} $$
$$ \sum T = \sum_{i=1}^{m} (t_i^{3} - t_i) $$
Where:
t_i = number of tied ranks in group i
m = number of groups of tied ranks
df = k - 1
∑T = (2³ - 2) + (3³ - 3) + (3³ - 3) + (4³ - 4) + (3³ - 3) + (2³ - 2) + (4³ - 4) = 204
C = 1 - 204/(31³ - 31) = 1 - 204/29760 = 0.993145
Hc = 7.228/0.993145
Hc = 7.278
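The correction for ties can be checked with the Python sketch below, which starts from the rank sums, group sizes and tie counts given in answer 10. It is an illustration only and the variable names are our own. Because the sketch keeps 12/(N(N+1)) at full precision, its output can differ in the second decimal place from a hand calculation that rounds intermediate values.

```python
# Values copied from the worked answer: rank sums, group sizes, and sizes of the tie groups
rank_sums = [93, 95, 144, 164]
group_sizes = [8, 8, 7, 8]
tie_counts = [2, 3, 3, 4, 3, 2, 4]

big_n = sum(group_sizes)

# Uncorrected Kruskal-Wallis statistic
h = (12 / (big_n * (big_n + 1))
     * sum(r ** 2 / n for r, n in zip(rank_sums, group_sizes))
     - 3 * (big_n + 1))

# Correction factor C = 1 - sum(t^3 - t) / (N^3 - N), then Hc = H / C
sum_t = sum(t ** 3 - t for t in tie_counts)
c = 1 - sum_t / (big_n ** 3 - big_n)
h_c = h / c

print(f"H = {h:.3f}, sum T = {sum_t}, C = {c:.6f}, Hc = {h_c:.3f}")
```

Since C is always less than or equal to 1, the corrected statistic Hc is never smaller than H.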
11.
Drug A (X1)   Drug B (X2)   di = X1 - X2   Rank of |di|   Signed rank
54.2          80.3          -26.1          8              -8
60.4          99.3          -38.9          12             -12
80.5          50.5          30             11             11
49.5          75.5          -26            7              -7
33.2          60.2          -27            9              -9
35.5          105.1         -69.6          13             -13
20.3          25.4          -5.1           2              -2
29.1          19.5          9.6            3              3
40.8          30.1          10.7           4              4
33.2          34.4          -1.2           1              -1
28.9          46.3          -17.4          6              -6
33.2          49.4          -16.2          5              -5
32.1          60.1          -28            10             -10
Wilcoxon: T' = m(n + 1) - T
where m = number of ranks with the less frequent sign; T = sum of the ranks
with the less frequent sign; n = total number of paired observations
The positive sign is the less frequent one (ranks 11, 3 and 4), so m = 3,
T = 11 + 3 + 4 = 18 and n = 13.
T' = 3(13 + 1) - 18
T' = 3(14) - 18
T' = 42 - 18
T' = 24
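The signed-rank steps in answer 11 can be reproduced with the Python sketch below. It is an illustration only: the helper average_ranks and the variable names (including t_prime for the second statistic) are our own, and the sketch assumes that no difference is exactly zero, which holds for these data.

```python
# Paired observations copied from the table for answer 11
drug_a = [54.2, 60.4, 80.5, 49.5, 33.2, 35.5, 20.3, 29.1, 40.8, 33.2, 28.9, 33.2, 32.1]
drug_b = [80.3, 99.3, 50.5, 75.5, 60.2, 105.1, 25.4, 19.5, 30.1, 34.4, 46.3, 49.4, 60.1]

def average_ranks(values):
    """Rank from smallest (1) to largest; tied values share the mean of their ranks."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]

# Differences X1 - X2, ranked on their absolute size
differences = [a - b for a, b in zip(drug_a, drug_b)]
abs_ranks = average_ranks([abs(d) for d in differences])

# Collect the ranks belonging to each sign (assumes no zero differences)
positive = [r for d, r in zip(differences, abs_ranks) if d > 0]
negative = [r for d, r in zip(differences, abs_ranks) if d < 0]
less_frequent = positive if len(positive) <= len(negative) else negative

m = len(less_frequent)        # number of ranks with the less frequent sign
t = sum(less_frequent)        # sum of those ranks
n = len(differences)          # number of pairs
t_prime = m * (n + 1) - t

print(f"m = {m}, T = {t}, T' = {t_prime}")
```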
12.
Male   Female   Rank male   Rank female
7.6    6.9      18          10.5
7.4    6.8      17          9
7.3    6.6      16          6.5
7.2    6.5      14.5        5
7.1    6.4      13          4
7.0    6.2      12          3
6.7    6.9      8           10.5
8.2    7.2      19          14.5
5.6    6.6      1           6.5
5.8    -        2           -
8.9    -        20          -
n1 = 11, n2 = 9, R1 = 140.5, R2 = 69.5
$$ U = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1 $$
U = (11)(9) + 11(11 + 1)/2 - 140.5
U = 99 + 11(12)/2 - 140.5
U = 99 + 11(6) - 140.5
U = 24.5
U1 = n1n2 - U
U1 = 99 - 24.5
U1 = 74.5
Check:
$$ R_1 + R_2 = \frac{N(N + 1)}{2} $$
140.5 + 69.5 = 20(20 + 1)/2
210 = 210
13.
Pond A   Pond B   Pond C   Rank A   Rank B   Rank C
7.68     7.69     7.74     1        2        11*
7.70     7.70     7.75     3.5*     3.5*     13
7.72     7.71     7.77     6        5        15
7.73     7.73     7.78     8*       8*       17*
7.73     7.74     7.80     8*       11*      19.5*
7.76     7.74     7.81     14       11*      22.5*
7.78     7.78     7.81     17*      17*      22.5*
7.80     7.81     -        19.5*    22.5*    -
7.81     7.82     -        22.5*    25       -
8.23     -        -        26       -        -
(* denotes a tied rank)
R1 = 125.5, R2 = 105, R3 = 120.5
$$ H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^{2}}{n_i} - 3(N+1) $$
Where n_i = number of observations in sample (group) i
N = ∑ n_i (total number of observations in all k groups)
R_i = sum of the ranks of the n_i observations in group i
14.
LA    LB    Rank LA   Rank LB
A     A     3.5       3.5
A     A     3.5       3.5
A     A     3.5       3.5
B-    B     12        9.5
B-    B     12        9.5
B+    B+    7.5       7.5
C-    B-    18        12
C-    C     18        15.5
D     C     21.5      15.5
-     C+    -         14
-     C-    -         18
-     D+    -         20
-     D     -         21.5
n1 = 9, n2 = 13, R1 = 99.5, R2 = 153.5
$$ U = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1 $$
U = (9)(13) + 9(9 + 1)/2 - 99.5
U = 117 + 9(10)/2 - 99.5
U = 117 + 9(5) - 99.5
U = 62.5
U1 = n1n2 - U
U1 = (9)(13) - 62.5
U1 = 117 - 62.5
U1 = 54.5
Check:
$$ R_1 + R_2 = \frac{N(N + 1)}{2} $$
99.5 + 153.5 = 22(22 + 1)/2
253 = 22(23)/2
253 = 253
15. Species richness
16. MINITAB can be used for various types of statistical analysis, ranging
from editing and manipulating data to basic statistics, arithmetic,
regression, ANOVA, non-parametric tests, exploratory data analysis, etc.