
Biostatistics 2023

The document discusses different statistical concepts such as populations, samples, parameters, and statistics. It also covers topics like measures of central tendency (mean, median, mode) and variability (range, standard deviation, variance). Finally, it examines experimental designs including completely randomized designs and randomized complete block designs.

Biostatistics

First Part
Biotechnology
Level 3

Prof. Dr Sedhom Asaad


Statistical Terms
Population

Population refers to the entire group you
are studying, such as a certain
demographic. A subset of the
population is called a sample.
Sample

A sample is a collection of units
from a population.
Parameter

A numerical property of a
population, such as its mean.
Statistic

A numerical characteristic of the sample;
a statistic estimates the corresponding
population parameter.
Mean (Average)
$$\text{Mean} = \frac{4 + 5 + 6 + 7 + 5}{5} = \frac{27}{5} = 5.4$$

If you add a constant (say 3) to each number in this list, what will
happen to the mean?

4, 5, 6, 7, 5
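The question above can be checked numerically. The following is a minimal sketch, assuming the constant is added to every number in the list; the variable names are illustrative only.

```python
from statistics import mean

values = [4, 5, 6, 7, 5]

# Assumption: "adding a constant" means adding 3 to every entry of the list.
shifted = [x + 3 for x in values]

original_mean = mean(values)
shifted_mean = mean(shifted)

# The mean moves by exactly the constant that was added.
print(original_mean, shifted_mean)
```

Under that reading, the mean shifts by the same constant, while the spread of the values is unchanged.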
Median

"Middle value" of a list. The smallest number such
that at least half the numbers in the list are no
greater than it. If the list has an odd number of
entries, the median is the middle entry in the list
after sorting the list into increasing order. If the list
has an even number of entries, the median is the
smaller of the two middle numbers after sorting.
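The definition above (taking the smaller of the two middle numbers for an even-length list) corresponds to what Python's standard library calls the low median. A short sketch follows; the even-length list is a made-up example, not data from these slides.

```python
from statistics import median_low

odd_list = [4, 5, 6, 7, 5]   # list used earlier for the mean
even_list = [8, 10, 15, 7]   # hypothetical list, for illustration only

# Sorted odd list:  4 5 5 6 7  -> middle entry is 5
# Sorted even list: 7 8 10 15  -> smaller of the two middle numbers is 8
odd_median = median_low(odd_list)
even_median = median_low(even_list)
print(odd_median, even_median)
```

Many textbooks instead average the two middle numbers for an even-length list; that convention would give 9 for the second list.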
Mode

For lists, the mode is a most common
(frequent) value.

What is the mode of this list?

3, 4, 5, 3, 6, 4, 3, 5, 3
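One way to answer the question is to count how often each value occurs. This sketch uses `collections.Counter`; the variable names are illustrative.

```python
from collections import Counter

values = [3, 4, 5, 3, 6, 4, 3, 5, 3]

# Count occurrences of each value and take the most frequent one.
counts = Counter(values)
mode_value, mode_count = counts.most_common(1)[0]
print(mode_value, mode_count)
```

Here the value 3 occurs four times, more often than any other value in the list.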
2- Measures of Variability
Range.
The range of a set of numbers is the largest
value in the set minus the smallest value in
the set. Note that as a statistical term, the
range is a single number, not a range of
numbers.
What is the range of these numbers?
8, 10, 15, 7, 9, 10, 15, 3, 15, 21
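The range of the list above can be computed directly from its largest and smallest values, as in this small sketch.

```python
values = [8, 10, 15, 7, 9, 10, 15, 3, 15, 21]

# Range as a statistical term: a single number, largest minus smallest.
value_range = max(values) - min(values)
print(value_range)
```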
Standard Deviation:
$$S = \sqrt{\frac{\sum X^2 - \frac{\left(\sum X\right)^2}{n}}{n - 1}}$$
Calculate standard deviation for the following
sample:
4, 5, 6, 7, 6
Variance, sample variance (S²)

The variance of a list is the square of the
standard deviation of the list. For a sample, it is
the sum of the squares of the deviations of
the numbers in the list from their mean, divided by n − 1.
What is the variance of the previous sample?
$$S^2 = (1.14)^2 \approx 1.30$$
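The standard deviation and variance of the sample 4, 5, 6, 7, 6 can be reproduced with the computational formula (sum of squares minus the squared sum over n, divided by n − 1). This is a sketch for checking the hand calculation; `statistics.stdev` is used only as a cross-check.

```python
from math import sqrt
from statistics import stdev

sample = [4, 5, 6, 7, 6]
n = len(sample)

sum_x = sum(sample)                    # sum of X
sum_x2 = sum(x * x for x in sample)    # sum of X squared

# Sample variance with the n - 1 divisor, then its square root.
s2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)
s = sqrt(s2)

print(round(s, 2), round(s2, 2))
```

Squaring the rounded value 1.14 gives a slightly smaller number than the unrounded variance, which is why carrying extra decimals through the calculation is safer.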
Simple Experiments
What are ways to eliminate sources of
variability?
• Randomization
– objects or individuals are randomly assigned (by
chance) to an experimental group. Using
randomization is the most reliable method of
creating homogeneous treatment groups, without
involving any potential biases or judgments.
• Replication
– Replication, the repetition of an experiment on a large group
of subjects, is required. If a treatment is truly
effective, the long-term averaging effect of
replication will reflect its experimental worth.
Completely Randomized Design
(one-factor design)

• Experimental units are relatively
homogeneous.
• Experiment will use very few replicates.
• Treatments are assigned to
experimental units at random.
• Each treatment is replicated the same
number of times (balanced design).
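The Completely Randomized Single-Factor Experiment

We may describe the observations by the linear statistical
model:

$$Y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \quad i = 1, 2, \dots, a; \; j = 1, 2, \dots, n$$

where $\mu$ is the overall mean, $\tau_i$ is the effect of the $i$th treatment, and $\varepsilon_{ij}$ is the random error.

The model could be written as

$$Y_{ij} = \mu_i + \varepsilon_{ij}, \quad \mu_i = \mu + \tau_i$$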
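The Completely Randomized Single-Factor Experiment

The Analysis of Variance

We wish to test the hypotheses (where $\tau_i$ is the effect of the $i$th of $a$ treatments):

$$H_0: \tau_1 = \tau_2 = \dots = \tau_a = 0$$
$$H_1: \tau_i \neq 0 \text{ for at least one } i$$

The analysis of variance partitions the total
variability into two parts.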
Completely Randomized Design

Worked Example:

Suppose we would like to evaluate four wheat
varieties, i.e., (A) Sids1, (B) Sids2, (C) Sids3 and (D)
Sids4, in homogeneous soil using four replications. What is
the layout of this experiment, and what are the sources
of variation and degrees of freedom in this case?
How to randomize?
Flip a coin or draw numbers out of a bag

Use a random number table; Table B in your book

Use a statistical software package or program.
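As an illustration of the software option, the sketch below randomizes a completely randomized design by shuffling a list that contains each treatment once per replication. The function name `randomize_crd` and the `seed` argument are assumptions made for this example, not part of any particular statistics package.

```python
import random


def randomize_crd(treatments, replications, seed=None):
    """Return a random ordering of plots for a completely randomized design."""
    rng = random.Random(seed)  # a seed makes the layout reproducible
    # Every treatment appears once per replication before shuffling.
    plots = [t for t in treatments for _ in range(replications)]
    rng.shuffle(plots)
    return plots


# Four varieties (A-D) with four replications give 16 plots.
layout = randomize_crd("ABCD", 4, seed=1)
for row in range(4):
    print(" ".join(layout[row * 4:(row + 1) * 4]))
```

Because the whole list is shuffled at once, a treatment may land anywhere in the field, which is appropriate only when the experimental units are homogeneous.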


Completely Randomized Design

Worked Example:

Randomization of the treatments:

A A C C

B D B D

A C B A

B D D C
Partitioning of the total sum of squares for
the completely randomized design
ANOVA Table
Completely Randomized Design

Worked Example:

Sources of variation and degrees of freedom (ANOVA Table):

S.O.V                     d.f    S.S    M.S    F (Calculated)
Treatments (Varieties)      3
Error                      12
Total                      15
Completely Randomized Design

Worked Example:

Analysis of variance of the experiment

A 5 A 3 C 7 C 8

B 7 D 4 B 6 D 5
A 4 C 8 B 8 A 5

B 7 D 4 D 5 C 9
Completely Randomized Design

Worked Example:

Analysis of variance of the experiment

          A     B     C     D
          5     7     7     4
          3     6     8     5
          4     8     8     4
          5     7     9     5
Total    17    28    32    18
Grand total = 95
Completely Randomized Design

Worked Example:

Analysis of variance of the experiment

$$\text{Correction Factor} = \frac{(95)^2}{16} = 564.06$$
$$\text{T.S.S.} = (5^2 + 3^2 + 4^2 + 5^2 + \dots + 5^2) - 564.06 = 613 - 564.06 = 48.94$$
$$\text{t.S.S.} = \frac{17^2 + 28^2 + 32^2 + 18^2}{4} - 564.06 = 41.19$$

$$\text{Experimental Error} = 48.94 - 41.19 = 7.75$$
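The sums of squares above can be reproduced in a few lines. This is a sketch using the worked-example yields; the variable names are illustrative, and the F value is computed from unrounded mean squares, so it can differ slightly from a hand calculation that rounds intermediate values.

```python
# Yields per variety from the worked example (four replications each).
data = {
    "A": [5, 3, 4, 5],
    "B": [7, 6, 8, 7],
    "C": [7, 8, 8, 9],
    "D": [4, 5, 4, 5],
}

all_values = [y for ys in data.values() for y in ys]
n_total = len(all_values)

# Correction factor: (grand total)^2 / number of observations.
correction = sum(all_values) ** 2 / n_total

total_ss = sum(y * y for y in all_values) - correction
treatment_ss = sum(sum(ys) ** 2 / len(ys) for ys in data.values()) - correction
error_ss = total_ss - treatment_ss

treatment_df = len(data) - 1
error_df = n_total - len(data)

treatment_ms = treatment_ss / treatment_df
error_ms = error_ss / error_df
f_value = treatment_ms / error_ms

print(round(total_ss, 2), round(treatment_ss, 2), round(error_ss, 2))
print(round(treatment_ms, 2), round(error_ms, 2), round(f_value, 2))
```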


Completely Randomized Design

Worked Example:
Sources of variation and degrees of freedom (ANOVA Table):

S.O.V                     d.f     S.S      M.S       F (Cal.)
Treatments (Varieties)      3    41.19    13.73**    21.12
Error                      12     7.75     0.65
Total                      15    48.94     3.26

To test the null hypothesis: compare the F (cal.) value with the F (tab. =
2.25) value…
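The Completely Randomized Single-Factor Experiment

Multiple Comparisons Following the ANOVA

The least significant difference (LSD) is

$$\text{LSD} = t_{\alpha,\, df_E} \sqrt{\frac{2\, MS_E}{n}}$$

where $t_{\alpha,\, df_E}$ is the tabulated t value at significance level $\alpha$ with the error degrees of freedom, $MS_E$ is the error mean square, and $n$ is the number of replications.

If the sample sizes are different in each treatment:

$$\text{LSD} = t_{\alpha,\, df_E} \sqrt{MS_E \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}$$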
Completely Randomized Design
Worked Example:
Sources of variation and degrees of freedom (ANOVA Table):

S.O.V                     d.f     S.S      M.S      F (Cal.)
Treatments (Varieties)      3    41.19    13.73     21.12
Error                      12     7.75     0.65
Total                      15    48.94

To compare between means, calculate the LSD (Least
Significant Difference) as follows:
$$\text{LSD} = t_{(0.05),\,12} \times \sqrt{\frac{2 \times \text{EMS}}{4}} = 2.06 \times \sqrt{\frac{2 \times 0.65}{4}} = 2.06 \times 0.57 \approx 1.17$$
Then arrange the studied means in ascending or descending
order and compare the difference between each pair of means with the LSD value.
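The LSD calculation can be sketched as a small helper. The function name `lsd` is an assumption for illustration; the t value (2.06), error mean square (0.65), and number of replications (4) are the figures quoted in this worked example.

```python
from math import sqrt


def lsd(t_value, error_ms, replications):
    """Least significant difference for equally replicated treatments."""
    # Standard error of a difference between two means: sqrt(2 * EMS / r).
    return t_value * sqrt(2 * error_ms / replications)


# Inputs taken from the worked example; t_value is the tabulated figure quoted there.
crd_lsd = lsd(t_value=2.06, error_ms=0.65, replications=4)
print(round(crd_lsd, 2))
```

Note that the quantity under the square root is 2 × EMS divided by the number of replications, so the exponent is 0.5 (a square root).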
Randomized (Complete) Block Design

• The RCB is the standard design for
‘agricultural’ experiments. The field is divided
into blocks to account for any variation in the
field. Treatments are then assigned at random
to the subjects in each block, once in each block.
Randomized (Complete) Block Design

• Treatments are assigned at random within blocks of
adjacent subjects, each treatment once per block.

• The number of blocks is the number of replications.

• Any treatment can be adjacent to any other
treatment, but not to the same treatment within the
block.

• Used to control variation in an experiment by
accounting for spatial effects.
Randomized Complete Block Designs

The randomized block design is an extension of
the paired t-test to situations where the factor of
interest has more than two levels.
Randomized Complete Block Designs

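The appropriate linear statistical model:

$$Y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij}, \quad i = 1, 2, \dots, a; \; j = 1, 2, \dots, b$$

where $\mu$ is the overall mean, $\tau_i$ is the effect of the $i$th treatment, $\beta_j$ is the effect of the $j$th block, and $\varepsilon_{ij}$ is the random error.

We assume
• treatments and blocks are initially fixed effects
• blocks do not interact with treatments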
Randomized Complete Block Design

Worked Example:

Suppose we would like to evaluate the four wheat
varieties, i.e., (A) Sids1, (B) Sids2, (C) Sids3 and (D)
Sids4, in heterogeneous soil using four replications.
What is the layout of this experiment, and what are the
sources of variation of this experiment? Analyze the
variance and test the null hypothesis, then find out
the best variety.
Blocking Example

Moisture gradient

B A A C

A B B D

C C D A

D D B C

Treatment effects confounded with moisture effect!

Blocking Example

A C D B
Moisture gradient

B C A D

C B D A

D A B C

Block effect now removes moisture effect, fair comparisons among treatments.
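A blocked layout like the one above can be generated by shuffling the treatments separately inside each block. This is a minimal sketch; the function name `randomize_rcbd` and the `seed` argument are assumptions for illustration.

```python
import random


def randomize_rcbd(treatments, n_blocks, seed=None):
    """Return one independently shuffled copy of the treatments per block."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        block = list(treatments)
        rng.shuffle(block)      # randomize only within the block
        layout.append(block)
    return layout


# Four varieties in four blocks laid across the moisture gradient.
rcbd_layout = randomize_rcbd("ABCD", 4, seed=2)
for number, block in enumerate(rcbd_layout, start=1):
    print(f"R{number}", " ".join(block))
```

Each block contains every treatment exactly once, so differences between blocks (here, moisture) affect all treatments equally.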
Randomized Complete Block Designs

ANOVA Table

Randomized Complete Block Design

Worked Example:

Sources of variation and degrees of freedom (ANOVA Table):

S.O.V                     d.f    S.S    M.S    F (Calculated)
Replication                 3
Treatments (Varieties)      3
Error                       9
Total                      15
Randomized Complete Block Design
Worked Example:

Analysis of variance of the experiment

R1    A 5    C 7    D 4    B 7
R2    B 6    C 8    A 3    D 5
R3    C 8    B 8    D 4    A 4
R4    D 5    A 5    B 7    C 9
Randomized Complete Block Design
Worked Example:
Analysis of variance of the experiment

         A     B     C     D    Sum
R1       5     7     7     4     23
R2       3     6     8     5     22
R3       4     8     8     4     24
R4       5     7     9     5     26
Sum     17    28    32    18     95
Randomized Complete Block Design
Worked Example:

Analysis of variance of the experiment

$$\text{Correction Factor} = \frac{(95)^2}{16} = 564.06$$
$$\text{T.S.S.} = (5^2 + 3^2 + 4^2 + 5^2 + \dots + 5^2) - 564.06 = 613 - 564.06 = 48.94$$
$$\text{r.S.S.} = \frac{23^2 + 22^2 + 24^2 + 26^2}{4} - 564.06 = 2.19$$
$$\text{t.S.S.} = \frac{17^2 + 28^2 + 32^2 + 18^2}{4} - 564.06 = 41.19$$
$$\text{Experimental Error} = 48.94 - 2.19 - 41.19 = 5.56$$
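The same partition can be checked in code. This sketch stores the worked-example yields as a blocks-by-varieties grid; names are illustrative, and the F value uses unrounded mean squares, so it may differ in the second decimal from a hand calculation with rounded values.

```python
# Rows are blocks R1..R4; columns are varieties A, B, C, D (worked-example data).
yields = [
    [5, 7, 7, 4],
    [3, 6, 8, 5],
    [4, 8, 8, 4],
    [5, 7, 9, 5],
]
n_blocks = len(yields)
n_treatments = len(yields[0])
n_total = n_blocks * n_treatments

grand_total = sum(sum(row) for row in yields)
correction = grand_total ** 2 / n_total

total_ss = sum(y * y for row in yields for y in row) - correction
block_ss = sum(sum(row) ** 2 for row in yields) / n_treatments - correction
treatment_ss = sum(sum(col) ** 2 for col in zip(*yields)) / n_blocks - correction

# The block sum of squares is removed from the error term.
error_ss = total_ss - block_ss - treatment_ss
error_df = (n_blocks - 1) * (n_treatments - 1)

treatment_ms = treatment_ss / (n_treatments - 1)
error_ms = error_ss / error_df
f_value = treatment_ms / error_ms

print(round(block_ss, 2), round(treatment_ss, 2), round(error_ss, 2))
print(round(error_ms, 2), round(f_value, 2))
```

Compared with the completely randomized analysis of the same data, the error sum of squares is smaller because the replication (block) variation has been separated out, at the cost of three error degrees of freedom.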
Randomized Complete Block Design
Worked Example:
Sources of variation and degrees of freedom (ANOVA Table):

S.O.V                     d.f     S.S      M.S       F (Cal.)
Replication                 3     2.19     0.73
Treatments (Varieties)      3    41.19    13.73**    22.15
Error                       9     5.56     0.62
Total                      15    48.94

To test the null hypothesis: compare the F (cal.) value with the F (tab. =
2.05) value…
Randomized Complete Block Design
Worked Example:

To compare between means, calculate the LSD (Least
Significant Difference) as follows:
$$\text{LSD} = t_{(0.05),\,9} \times \sqrt{\frac{2 \times \text{EMS}}{4}} = 2.02 \times \sqrt{\frac{2 \times 0.62}{4}} = 2.02 \times 0.5568 \approx 1.12$$
Then arrange the studied means in ascending or descending
order and compare the difference between each pair of means with the LSD value.
Randomized Complete Block Design

Worked Example:

Analysis of variance of the experiment

          A      B      C      D     Sum
R1        5      7      7      4      23
R2        3      6      8      5      22
R3        4      8      8      4      24
R4        5      7      9      5      26
Sum      17     28     32     18      95
Mean    4.25   7.00   8.00   4.50
Randomized Complete Block Design
Worked Example:

Compare between means using the LSD value:

          A      B      C      D
Mean    4.25   7.00   8.00   4.50

First arrange the means in descending order, then compare
each difference with the LSD value.

          C      B      D      A
Mean    8.00   7.00   4.50   4.25

It is clear that the best two varieties are C and B, because
they have the highest yields with no significant difference
between them.
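The pairwise comparison can be sketched as follows, assuming the LSD value of 1.12 quoted in this worked example. Two means are treated as significantly different when their difference exceeds the LSD; the dictionary and variable names are illustrative.

```python
from itertools import combinations

means = {"A": 4.25, "B": 7.00, "C": 8.00, "D": 4.50}
lsd_value = 1.12  # LSD quoted in the worked example

# Rank varieties from highest to lowest mean.
ranked = sorted(means, key=means.get, reverse=True)

# Compare every pair of means against the LSD.
significant = {}
for first, second in combinations(ranked, 2):
    difference = means[first] - means[second]
    significant[(first, second)] = difference > lsd_value
    print(first, second, round(difference, 2), significant[(first, second)])
```

With these figures, C and B do not differ significantly from each other, D and A do not differ from each other, and both C and B exceed D and A by more than the LSD.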
Randomized Complete Block Design
Question:
Suppose we would like to evaluate the four wheat varieties, i.e.,
(A) Sids1, (B) Sids2, (C) Sids3 and (D) Sids4 in heterogenous soil
using four replications and the varieties means were: 4.25, 7.00,
8.00 and 4.50 kg/ plot for A, B, C and D varieties, respectively.
1- What is the layout of this experiment?
2- Complete the ANOVA table and test null hypothesis.
3- What is the best variety?
S.O.V                     d.f     S.S      M.S     F (Cal.)   F (Tab.)
Replication                                0.73
Treatments (Varieties)                                          2.05
Error
Total                            48.94
Randomized Complete Block Design
Question:
Suppose we would like to evaluate the four wheat varieties, i.e., (A) Sids1, (B)
Sids2, (C) Sids3 and (D) Sids4 in heterogenous soil using four replications and
the varieties means were: 4.25, 7.00, 8.00 and 4.50 kg/ plot for A, B, C and D
varieties, respectively.
1- What is the layout of this experiment?
2- Complete the ANOVA table and test null hypothesis.
3- What is the best variety?
S.O.V                     d.f     S.S      M.S     F (Cal.)   F (Tab.)
Replication                 3              0.73
Treatments (Varieties)      3                                   2.05
Error                       9
Total                      15    48.94
Randomized Complete Block Design
Question:
Suppose we would like to evaluate the four wheat varieties, i.e., (A) Sids1, (B)
Sids2, (C) Sids3 and (D) Sids4 in heterogenous soil and the varieties means
were: 4.25, 7.00, 8.00 and 4.50 kg/ plot for A, B, C and D varieties, respectively.
1- What is the layout of this experiment?
2- Complete the ANOVA table and test null hypothesis.
3- What is the best variety?
S.O.V                     d.f     S.S      M.S     F (Cal.)   F (Tab.)
Replication                                0.73
Treatments (Varieties)                                          2.05
Error                       9
Total                            48.94
Randomized Complete Block Design
Question:
Suppose we would like to evaluate the four wheat varieties, i.e., (A) Sids1, (B)
Sids2, (C) Sids3 and (D) Sids4 in heterogenous soil using four replications and
the varieties means were: 4.25, 7.00, 8.00 and 4.50 kg/ plot for A, B, C and D
varieties, respectively.
1- What is the layout of this experiment?
2- Complete the ANOVA table and test null hypothesis.
3- What is the best variety?
S.O.V                     d.f     S.S      M.S     F (Cal.)   F (Tab.)
Replication                 3     2.19     0.73
Treatments (Varieties)      3                                   2.05
Error                       9
Total                      15    48.94
Randomized Complete Block Design
Question:
The varieties means were: 4.25, 7.00, 8.00 and 4.50 kg/
plot for A, B, C and D, respectively.

S.O.V                     d.f     S.S      M.S     F (Cal.)   F (Tab.)
Replication                 3     2.19     0.73
Treatments (Varieties)      3                                   2.05
Error                       9
Total                      15    48.94
Randomized Complete Block Design
Worked Example:

Analysis of variance of the experiment

$$\text{Correction Factor} = \frac{(95)^2}{16} = 564.06$$

$$\text{t.S.S.} = \frac{17^2 + 28^2 + 32^2 + 18^2}{4} - 564.06 = 41.19$$

$$\text{Experimental Error} = 48.94 - 2.19 - 41.19 = 5.56$$


Randomized Complete Block Design
Question:
Suppose we would like to evaluate the four wheat varieties, i.e., (A) Sids1, (B)
Sids2, (C) Sids3 and (D) Sids4 in heterogenous soil using four replications and
the varieties means were: 4.25, 7.00, 8.00 and 4.50 kg/ plot for A, B, C and D
varieties, respectively.
1- What is the layout of this experiment?
2- Complete the ANOVA table and test null hypothesis.
3- What is the best variety?
S.O.V                     d.f     S.S      M.S     F (Cal.)   F (Tab.)
Replication                 3     2.19     0.73
Treatments (Varieties)      3    41.19                          2.05
Error                       9     5.56
Total                      15    48.94
Randomized Complete Block Design
Question:
Suppose we would like to evaluate the four wheat varieties, i.e., (A) Sids1, (B)
Sids2, (C) Sids3 and (D) Sids4 in heterogenous soil using four replications and
the varieties means were: 4.25, 7.00, 8.00 and 4.50 kg/ plot for A, B, C and D
varieties, respectively.
1- What is the layout of this experiment?
2- Complete the ANOVA table and test null hypothesis.
3- What is the best variety?
S.O.V                     d.f     S.S      M.S     F (Cal.)   F (Tab.)
Replication                 3     2.19     0.73
Treatments (Varieties)      3    41.19    13.73     22.15       2.05
Error                       9     5.56     0.62
Total                      15    48.94
Advantages and Disadvantages
Advantages of a Blocked Design
• Controls a single extraneous source of variation and removes its
effect from the estimate of experimental error.
• Allows more flexibility in experimental layout.
• Allows more flexibility in experimental implementation and
administration.

Disadvantages of a Blocked Design


• Generally unsuited when there is a large number of treatments
because of possible loss of within block homogeneity.
• Serious problem with the analysis if a block factor by treatment
interaction effect actually exists and no replication within blocks
has been included. (solution: use replication within blocks when
possible).
Latin Square Design

• This design is used when soil heterogeneity
exists in two directions. The number of treatments
is equal to the number of columns and rows.
Treatments are arranged randomly so that each treatment
appears once in each column and once in each row.
Latin square design
In most cases this gives a rather weak test
if analyzed as a Latin square
(i.e., column and row taken as
factors in an incomplete three-way
ANOVA).
Latin Square Design

Worked Example:

Suppose we would like to evaluate the four wheat
varieties, i.e., (A) Sids1, (B) Sids2, (C) Sids3 and (D)
Sids4, in two-directional heterogeneous soil. What is
the layout of this experiment, and what are the sources
of variation of this experiment? Analyze the
variance and test the null hypothesis, then find out
the best variety.
Latin Square Design
Worked Example:
Randomization :

        Column 1   Column 2   Column 3   Column 4
Row1       A          D          C          B
Row2       B          C          A          D
Row3       C          B          D          A
Row4       D          A          B          C
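The defining property of this layout (each treatment once per row and once per column) can be checked programmatically, and a random layout can be produced by permuting the rows and columns of a cyclic square. Both helper names below are assumptions for illustration, and this simple method does not sample uniformly from all possible Latin squares.

```python
import random


def is_latin_square(square):
    """True if every symbol occurs exactly once in each row and each column."""
    size = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == size and set(row) == symbols for row in square)
    cols_ok = all(set(col) == symbols for col in zip(*square))
    return len(symbols) == size and rows_ok and cols_ok


def random_latin_square(treatments, seed=None):
    """Shuffle the rows and columns of a cyclic Latin square."""
    rng = random.Random(seed)
    k = len(treatments)
    square = [[treatments[(r + c) % k] for c in range(k)] for r in range(k)]
    rng.shuffle(square)                      # permute rows
    col_order = list(range(k))
    rng.shuffle(col_order)                   # permute columns
    return [[row[c] for c in col_order] for row in square]


# Layout shown in the worked example.
slide_layout = [list("ADCB"), list("BCAD"), list("CBDA"), list("DABC")]
print(is_latin_square(slide_layout))

new_layout = random_latin_square("ABCD", seed=3)
for row in new_layout:
    print(" ".join(row))
```

Permuting whole rows and whole columns preserves the Latin property, which is why the generated layout always passes the check.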
Latin Square Design
Worked Example:
Randomization :

        Column 1   Column 2   Column 3   Column 4
Row1      A 5        D 4        C 7        B 7
Row2      B 6        C 8        A 3        D 5
Row3      C 8        B 8        D 4        A 4
Row4      D 5        A 5        B 7        C 9

Treatments Table 2-dimensioned Table


Latin Square Design
Worked Example:

Sources of variation and degrees of freedom (ANOVA Table):

S.O.V                     d.f    S.S    M.S    F (Calculated)
Columns                     3
Rows                        3
Treatments (Varieties)      3
Error                       6
Total                      15
Latin Square Design
Worked Example:
Analysis of variance of the experiment
        Column 1   Column 2   Column 3   Column 4   Sum
Row1       5          4          7          7        23
Row2       6          8          3          5        22
Row3       8          8          4          4        24
Row4       5          5          7          9        26
Sum       24         25         21         25        95

Raw Data
Latin Square Design
Worked Example:
Analysis of variance of the experiment

         A     B     C     D
         5     7     7     4
         3     6     8     5
         4     8     8     4
         5     7     9     5
Sum     17    28    32    18     95

Raw Data
Latin Square Design
Worked Example:
Analysis of variance of the experiment
$$\text{Correction Factor} = \frac{(95)^2}{16} = 564.06$$
$$\text{T.S.S.} = (5^2 + 3^2 + 4^2 + 5^2 + \dots + 5^2) - 564.06 = 613 - 564.06 = 48.94$$
$$\text{c.S.S.} = \frac{24^2 + 25^2 + 21^2 + 25^2}{4} - 564.06 = 2.69$$
$$\text{r.S.S.} = \frac{23^2 + 22^2 + 24^2 + 26^2}{4} - 564.06 = 2.19$$
$$\text{t.S.S.} = \frac{17^2 + 28^2 + 32^2 + 18^2}{4} - 564.06 = 41.19$$
$$\text{Error} = 48.94 - 2.69 - 2.19 - 41.19 = 2.87$$
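The Latin square partition can be reproduced from the field layout and the yields of the worked example. This is a sketch with illustrative names; the F value is computed from unrounded mean squares and may differ slightly from a hand calculation that rounds intermediate values.

```python
# Field layout (treatment letters) and yields, row by row, from the worked example.
layout = ["ADCB", "BCAD", "CBDA", "DABC"]
yields = [
    [5, 4, 7, 7],
    [6, 8, 3, 5],
    [8, 8, 4, 4],
    [5, 5, 7, 9],
]
k = len(layout)  # number of rows = columns = treatments

correction = sum(sum(row) for row in yields) ** 2 / (k * k)

# Accumulate the total yield of each treatment from its position in the field.
treatment_totals = {}
for letters, row in zip(layout, yields):
    for letter, y in zip(letters, row):
        treatment_totals[letter] = treatment_totals.get(letter, 0) + y

total_ss = sum(y * y for row in yields for y in row) - correction
row_ss = sum(sum(row) ** 2 for row in yields) / k - correction
column_ss = sum(sum(col) ** 2 for col in zip(*yields)) / k - correction
treatment_ss = sum(t ** 2 for t in treatment_totals.values()) / k - correction
error_ss = total_ss - row_ss - column_ss - treatment_ss

error_df = (k - 1) * (k - 2)
f_value = (treatment_ss / (k - 1)) / (error_ss / error_df)

print(round(column_ss, 2), round(row_ss, 2), round(treatment_ss, 2), round(error_ss, 2))
print(round(f_value, 2))
```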
Latin Square Design
Worked Example:

Analysis of variance Table:

S.O.V                     d.f     S.S      M.S     F (Calculated)
Columns                     3     2.69     0.90
Rows                        3     2.19     0.73
Treatments (Varieties)      3    41.19    13.73     28.60
Error                       6     2.87     0.48
Total                      15    48.94
Latin Square Design
Question:
Suppose we would like to evaluate the four wheat varieties, i.e., (A) Sids1, (B)
Sids2, (C) Sids3 and (D) Sids4 in Latin square design and the varieties means
were: 4.25, 7.00, 8.00 and 4.50 kg/ plot for A, B, C and D varieties, respectively.
1- What is the layout of this experiment?
2- Complete the ANOVA table and test null hypothesis.
3- What is the best variety?

S.O.V                     d.f     S.S      M.S     F (Cal.)   F (Tab.)
Columns                                    0.90
Rows                              2.19
Treatments (Varieties)                              28.60
Error
Total                            48.94
Factorial Experiments
Factorial Experiments

What is a Factorial Design?


A factorial experimental design is used to investigate
the effect of two or more independent variables on
one dependent variable.
Interaction:
- Interaction between factors means that the
effect produced by a change in a factor on
the response depends on the level of the
other factor(s).
- Two independent variables interact if the
effect of one of the variables differs
depending on the level of the other
variable.
Factorial Experiments

Worked example:
Suppose a researcher wants to evaluate two maize
hybrids (S.C. 10, and S.C. 30k 8) under three nitrogen
levels (40, 80, 120 kg N/ fed), in RCBD with four
replications. Show how to arrange these treatments in
the experimental plots, and what are the sources of
variation and degrees of freedom of this experiment?
Factorial Experiments

Worked example:
The number of treatments of this experiment is the
combination of two hybrids and three nitrogen levels
as follows:

Hybrid            H1                H2
N level       N1   N2   N3      N1   N2   N3
Treatment     A    B    C       D    E    F
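The six treatment combinations can be enumerated as the Cartesian product of the two factors. This sketch uses the labels from the slide (H1, H2 and N1 to N3); the dictionary name is illustrative.

```python
from itertools import product
from string import ascii_uppercase

hybrids = ["H1", "H2"]
nitrogen_levels = ["N1", "N2", "N3"]

# Every hybrid is combined with every nitrogen level, labelled A, B, C, ...
treatments = dict(zip(ascii_uppercase, product(hybrids, nitrogen_levels)))

for label, (hybrid, nitrogen) in treatments.items():
    print(label, hybrid, nitrogen)
```

The number of treatments is the product of the numbers of levels, here 2 × 3 = 6.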
Factorial Experiments
Worked example:
Randomization:
Because the soil is heterogeneous, it should be
divided into four replications, and the six treatments
must be arranged at random inside each replication as
follows:
Randomization

A C D B E F
Moisture gradient

B C F E A D

C B E F D A

D A B C E F

Block effect now removes moisture effect, fair comparisons among treatments.
Factorial Experiments
Worked example:
Source of variation and degrees of freedom:
It must be as follows:
S.O.V d.f
Replication 3
Hybrids (A) 1
N Levels (B) 2
Interaction (AB) 2
Error 15
Total 23
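The degrees of freedom in this table follow from the numbers of levels and replications. The helper below is a sketch for a two-factor factorial arranged in an RCBD; the function name and dictionary keys are assumptions for illustration.

```python
def factorial_rcbd_df(levels_a, levels_b, replications):
    """Degrees of freedom for a two-factor factorial in an RCBD."""
    df = {
        "Replication": replications - 1,
        "A": levels_a - 1,
        "B": levels_b - 1,
        "AB": (levels_a - 1) * (levels_b - 1),
        "Total": levels_a * levels_b * replications - 1,
    }
    # Error takes whatever remains after the other sources are removed.
    df["Error"] = df["Total"] - sum(df[k] for k in ("Replication", "A", "B", "AB"))
    return df


# Two hybrids, three nitrogen levels, four replications.
df_table = factorial_rcbd_df(levels_a=2, levels_b=3, replications=4)
for source, value in df_table.items():
    print(source, value)
```

Equivalently, the error degrees of freedom equal (treatments − 1) × (replications − 1) = 5 × 3.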
Definitions

Experiment design -- An experiment design is a plan
for collecting and analyzing data.
Experimental unit -- The single individual (person,
animal, plant, soil plot, etc.) to which a treatment is
assigned.
Factor
The entity whose effect on the response is investigated in the
experiment; the explanatory variable. A factor is a variable
over which you have direct control in an experiment. Some
examples are time, temperature, and pressure.
Level
The setting of a factor used in the experiment; a specific
value for the factor. For example, 6 hrs. is a level for time.
Definitions
Treatment
The levels of a factor in a single-factor experiment are also
referred to as treatments. In experiments with many
factors, a combination of the levels of the factors is
referred to as a treatment; that is, a specific experimental
condition applied to the units.

Experimental Error (also known as chance error)

If an experiment is replicated in every way except with new
experimental units, then one would expect slightly different results
each time.
One of the objectives of experimental design is to reduce the
size of experimental error as much as possible for a fixed number
of experimental units.
Definitions
Example (1): A consumer group wants to test
cake pans to see which works the best (bakes
evenly). It will test aluminum, glass, and
plastic pans in both gas and electric ovens.
Experimental units are:
Factors are:
Levels are:
Response variable is:
Number of treatments are:
Definitions
Example (1): A consumer group wants to test
cake pans to see which works the best (bakes
evenly). It will test aluminum, glass, and plastic
pans in both gas and electric ovens.
Experimental units are: Cake batter
Factors are: Two factors- type of pan & type of oven
Levels are: Type of pan has 3 levels (aluminum, glass, &
plastic) & type of oven has 2 levels (electric & gas).
Response variable is: How evenly the cake bakes.
Number of treatments are: 6
Special Terminology : Design of Experiments
• Response variable
– Measured output value
• Factors
– Input variables that can be changed
• Levels
– Specific values of factors (inputs)
• Continuous or discrete
• Replication
– Completely re-run experiment with same input levels
– Used to determine impact of measurement error
• Interaction
– Effect of one input factor depends on level of another input
factor
Thank you
