Basic Statistics for Research

Kenneth M. Y. Leung

Why do we
need statistics?

Statistics
Derived from the Latin for state - governmental
data collection and analysis.
Study of data (branch of mathematics dealing
with numerical facts i.e. data).

The analysis and interpretation of data with a

view toward objective evaluation of the
reliability of the conclusions based on the
data.
Three major types: Descriptive, Inferential and
Predictive Statistics

Variation - Why statistical

methods are needed
http://www.youtube.com/watch?v=fsRYkRqQqgg&feature=related
By UCMSCI

3 Major Types of Stats

Descriptive statistics (i.e., data distribution,
central tendency and data dispersion)
Inferential statistics (i.e., hypothesis
testing)
Predictive statistics (i.e., modelling)

Descriptive Stats

http://www.censtatd.gov.hk

Inferential Stats Hypothesis Testing

From observation to scientific questioning:

Why do females generally live longer than males in
human and other mammals?
Setting hypothesis (theory) for testing:
Hypothesis: Metabolic rate of males is faster than that
of females, leading to shorter life span in males.
Hypothesis: Males consume more food than females,
leading to a higher chance of exposure to toxic
substances.

Inferential Stats

A Hypothesis
A statement relating to an observation that
may be true but for which a proof (or
disproof) has not been found.
The results of a well-designed experiment
may lead to the proof or disproof of a
hypothesis (i.e. acceptance or rejection of the
corresponding null hypothesis).

Inferential Stats

For example, Heights of male vs. female at age of 30.

Our observations: male H > female H; it may be
linked to genetics, consumption and exercise etc.
Is that true for the hypothesis (HA): male H > female H?
A corresponding Null hypothesis (Ho): male H ≤ female H
Scenario I:

Randomly select 1 person from each sex.

Male: 170
Female: 175
Then, Female H> Male H. Why?

Scenario II: Randomly select 3 persons from each sex.

Male: 171, 163, 168
Female: 160, 172, 173
What is your conclusion then?

Inferential Stats

Samples

Sub-samples
Population

Inferential Stats

[Figure: probability density (0.00 to 0.10) against height (140 to 190 cm). After taking 100 random samples, the two distributions are uncovered.]

Inferential Stats

Important Take-Home Messages:

(1) Sample size is very important and will affect your

conclusion.
(2) Measurement results vary among samples (or
subjects); this is variation or uncertainty.
(3) Variation can be due to measurement errors
(random or systematic errors) and variation
inherent within samples; e.g., at age 30, female
height varies between 148 and 189 cm. Why?
(4) Therefore, we always deal with distributions of
data rather than a single point of measurement
or event.

How many samples are needed?

[Figure: mean values plotted against sample size, approaching the true mean; the minimum sample size is marked. *Assuming data follow the normal distribution]

Determine the minimum sample size by plotting the running means

Stabilization of mean and SD

Inferential Stats

Which one do you prefer?

Zimmer 2001

Inferential Stats

We can infer if the observed preference frequencies are identical to

the hypothetical preference frequencies (e.g. 1:2:10:11:3:2:1) using
a Chi-square test.
Chi-square: $\chi^2 = \sum_i (O_i - E_i)^2 / E_i$
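
As a minimal sketch of the chi-square statistic, the snippet below tests a set of observed counts against the hypothetical 1:2:10:11:3:2:1 ratio. The observed counts and the function name are made up for illustration; they are not from the slides. The critical value 12.592 is the standard tabulated chi-square value for alpha = 0.05 and 6 degrees of freedom.

```python
# Hypothetical preference ratio taken from the slide text.
expected_ratio = [1, 2, 10, 11, 3, 2, 1]
# Observed counts are invented for illustration only (n = 30).
observed = [2, 3, 9, 10, 4, 1, 1]


def chi_square_statistic(observed_counts, ratio):
    """Return sum((O - E)^2 / E), with E scaled from the ratio to the sample total."""
    total = sum(observed_counts)
    ratio_sum = sum(ratio)
    expected = [total * r / ratio_sum for r in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed_counts, expected))


chi2 = chi_square_statistic(observed, expected_ratio)
df = len(observed) - 1          # number of classes minus one
CRITICAL_0_05_DF6 = 12.592      # tabulated chi-square, alpha = 0.05, df = 6

print(f"chi-square = {chi2:.3f}, df = {df}")
print("reject Ho" if chi2 > CRITICAL_0_05_DF6 else "do not reject Ho")
```

With counts this small, several expected frequencies fall below 5, so the example only illustrates the arithmetic; a real test would need a larger sample or pooled classes.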

Zimmer 2001

Inferential Stats Hypothesis Testing

How can we test the following hypotheses?

Ho 1: The water sample A is cleaner than the water
sample B in terms of E. coli count.
Ho 2: Water quality in Site A is better than Site B in
terms of E. coli count during the swimming
season.

Ho 3: Water quality in Site A is better than Site B in

terms of E. coli count at all times.

Predictive Stats

b: Sullivan's method

c: A regression model

Predictive Stats

Source: Hong Kong Observatory

Basic Descriptive Statistics

Measurement Theory
Environmental scientists use measurements
routinely in Lab or field work by assigning
numbers or groups (classes).
Mathematical operations may be applied to
the data, e.g. predicting fish mass by their
length through an established regression
Different levels of measurements:
nominal, ordinal, interval scale, ratio

[Figure: examples of Nominal, Ordinal and Scale measurements; axis ticks from 0.5 to 4.0]

How to Describe the Data Distribution?

Central tendency
Mean for normally distributed data
Median for non-normally distributed data

Dispersal pattern
Standard deviation for normally distributed
data

Range and/or Quartiles for non-normally

distributed data

Measurements (data)
-> Descriptive statistics
-> Normality Check: frequency histogram (skewness & kurtosis), probability plot, K-S test
   YES -> Mean, SD, SEM, 95% confidence interval
          -> Check the Homogeneity of Variance
             YES -> Parametric Tests: Student's t tests for 2 samples; ANOVA for >2 samples; post hoc tests for multiple comparison of means
             otherwise -> Data transformation
   otherwise -> Data transformation
             -> Median, range, Q1 and Q3
             -> Non-Parametric Test(s)
                For 2 samples: Mann-Whitney
                For 2 paired samples: Wilcoxon
                For >2 samples: Kruskal-Wallis, Scheirer-Ray-Hare

Ball-Balls Flowchart

Measurements of Central Tendency

[Figure: data points on a 0.5 to 4.0 mm axis with the mean marked]

Mean = Sum of values/n = $\sum x_i / n$

e.g. length of 8 fish larvae at day 3 after hatching:
0.6, 0.7, 1.2, 1.5, 1.7, 2.0, 2.2, 2.5 mm
mean length = (0.6+0.7+1.2+1.5+1.7+2.0+2.2+2.5)/8
= 1.55 mm

[Figure: the 0.5 to 4.0 mm axis with the mean and median marked]

Median, Percentiles and Quartiles

Order = (n+1)/2
e.g. 0.6, 0.7, 1.2, 1.5, 1.7, 2.0, 2.2, 2.5 mm
order: 1, 2, 3, 4, 5, 6, 7, 8
order = (8+1)/2 = 4.5
Median = 50th percentile = (1.5 + 1.7)/2 = 1.6 mm
order for Q1 = 25th percentile = (8+1)/4 = 2.25
then Q1 = 0.7 + (1.2 - 0.7)/4 = 0.825 mm
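
The worked values above can be reproduced with Python's standard library. This is a sketch, assuming the (n+1)-based position rule used on the slide corresponds to the "exclusive" method of `statistics.quantiles`; the variable names are illustrative.

```python
import statistics

# Lengths (mm) of 8 fish larvae, from the slide.
lengths_mm = [0.6, 0.7, 1.2, 1.5, 1.7, 2.0, 2.2, 2.5]

mean_length = statistics.mean(lengths_mm)
median_length = statistics.median(lengths_mm)

# The "exclusive" method places quartile i at position i * (n + 1) / 4
# and interpolates linearly between neighbouring ordered values.
q1, q2, q3 = statistics.quantiles(lengths_mm, n=4, method="exclusive")

print(f"mean = {mean_length:.2f} mm, median = {median_length:.2f} mm")
print(f"Q1 = {q1:.3f} mm, Q3 = {q3:.3f} mm")
```

Other software may use a different interpolation rule for quartiles, so small differences in Q1 and Q3 between packages are expected.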

[Figures: two further panels with the mean and median marked on the 0.5 to 4.0 mm axis]

Median is often used with mean.

Mean is, however, used much more frequently.
Median is a better measure of central tendency for data
with skewed distribution or outliers.

Other Measures of Central Tendency

Range midpoint or midrange = (Max value + Min value)/2
not a good estimate of the mean and seldom used

Geometric mean = $\sqrt[n]{x_1 x_2 x_3 x_4 \cdots x_n}$
= 10^[mean of log10(xi)]
Only for positive ratio scale data
If data are not all equal, geometric mean < arithmetic mean

Used in averaging ratios where it is desired to give each ratio
equal weight
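
A small sketch of the two equivalent geometric-mean formulas follows. The three ratios are hypothetical values chosen for illustration, not data from the slides.

```python
import math
import statistics

# Hypothetical positive ratios (illustration only).
ratios = [0.5, 2.0, 4.0]

# Formula 1: n-th root of the product of the values.
geo_mean_root = math.prod(ratios) ** (1 / len(ratios))

# Formula 2: 10 raised to the mean of the log10-transformed values.
geo_mean_log = 10 ** statistics.mean(math.log10(x) for x in ratios)

arith_mean = statistics.mean(ratios)

print(f"geometric mean = {geo_mean_log:.4f}, arithmetic mean = {arith_mean:.4f}")
```

Because the ratios are not all equal, the geometric mean comes out smaller than the arithmetic mean, as stated above.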

Measurements of Dispersion

Range
e.g. length of 8 fish larvae at day 3 after hatching:
0.6, 0.7, 1.2, 1.5, 1.7, 2.0, 2.2, 2.5 mm
Range = 2.5 - 0.6 = 1.9 mm (or say from 0.6 to 2.5 mm)

Percentile and quartiles

Population Standard Deviation (σ)

Averaged measurement of deviation from mean: $x_i - \bar{x}$
e.g. five rainfall measurements, whose mean is 7

Rainfall (mm)    xi - x̄          (xi - x̄)²
12               12 - 7 = 5      25
0                0 - 7 = -7      49
2                2 - 7 = -5      25
5                5 - 7 = -2      4
16               16 - 7 = 9      81
                                 Sum = 184

Population variance: $\sigma^2 = \sum (x_i - \bar{x})^2 / n = 184/5 = 36.8$

Population SD: $\sigma = \sqrt{\sum (x_i - \bar{x})^2 / n} = 6.1$
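
The rainfall table can be checked step by step with a few lines of Python. This is only a sketch of the arithmetic shown above; the variable names are illustrative.

```python
import math

# Five rainfall measurements (mm), from the slide.
rainfall_mm = [12, 0, 2, 5, 16]

n = len(rainfall_mm)
mean = sum(rainfall_mm) / n

# Deviation of each value from the mean, then squared.
deviations = [x - mean for x in rainfall_mm]
squared_deviations = [d ** 2 for d in deviations]
sum_of_squares = sum(squared_deviations)

# Population formulas divide by n (not n - 1).
population_variance = sum_of_squares / n
population_sd = math.sqrt(population_variance)

print(f"mean = {mean}, SS = {sum_of_squares}")
print(f"variance = {population_variance:.1f}, SD = {population_sd:.1f}")
```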

Sample SD (s)
$s = \sqrt{\sum (x_i - \bar{x})^2 / (n - 1)}$
$s = \sqrt{[\sum x_i^2 - (\sum x_i)^2 / n] / (n - 1)}$

Two modifications:
by dividing $\sum (x_i - \bar{x})^2$ by (n - 1) rather than n, gives a better
unbiased estimate of σ (however, when n increases,
difference between s and σ declines rapidly)
the sum of squared (SS) deviations can be calculated as
$\sum x_i^2 - (\sum x_i)^2 / n$

Sample SD (s)
e.g. five rainfall measurements, whose mean is 7.0

Rainfall (mm)    xi²
12               144
0                0
2                4
5                25
16               256
Σxi = 35         Σ(xi²) = 429
(Σxi)² = 1225

$s^2 = [\sum x_i^2 - (\sum x_i)^2 / n] / (n - 1) = [429 - (1225/5)] / (5 - 1) = 46.0$ mm²
$s = \sqrt{46.0} = 6.782 = 6.8$ mm
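
The shortcut ("machine") formula for the sum of squares can be compared against the library routine. A minimal sketch using the same rainfall data; names are illustrative.

```python
import math
import statistics

# Five rainfall measurements (mm), from the slide.
rainfall_mm = [12, 0, 2, 5, 16]
n = len(rainfall_mm)

sum_x = sum(rainfall_mm)                   # sum of values
sum_x2 = sum(x ** 2 for x in rainfall_mm)  # sum of squared values

# Shortcut formula: SS = sum(x^2) - (sum(x))^2 / n
ss = sum_x2 - sum_x ** 2 / n

sample_variance = ss / (n - 1)   # divide by n - 1 for a sample
sample_sd = math.sqrt(sample_variance)

# statistics.stdev also uses the n - 1 denominator.
library_sd = statistics.stdev(rainfall_mm)

print(f"SS = {ss}, s^2 = {sample_variance}, s = {sample_sd:.3f}")
```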

Basic Experimental Design for

Environmental Research
1. Setting environmental questions into statistical
questions [e.g. spatial and temporal variations]
2. Setting hypotheses and then statistical null hypotheses
4. Statistical consideration (treatment groups, sample size,
true replication, confounding factors etc.)

5. Sampling design (independent, random, samples)

6. Data collection & measurement (Quality Control and
Quality Assurance Procedures)
7. Data analysis
Too few data: cannot obtain reliable conclusions
Too many data: extra effort (time and money) in
data collection

Generalized scheme of logical components of a research programme (Underwood 1997)

Start here
Observations: patterns in space or time
Models: explanations or theories
Hypothesis: predictions based on model
Null Hypothesis: logical opposite to hypothesis
Experiment: critical test of null hypothesis
   Retain Ho: refute hypothesis and model
   Reject Ho: support hypothesis and model
Interpretation
Don't end here

Example: weapon size versus body size as a predictor of winning fights in Carcinus maenas.
Reference: Sneddon et al. 1997, in Behav. Ecol. Sociobiol. 41: 237-242

Randomized Sampling
Lucky Draw Concept
To randomly select 30 out of 200 sampling stations in
Hong Kong waters, you may perform a lucky draw.
So, the chance for selecting each one of them for each
time of drawing would be more or less equal (unbiased).
It can be done with or without replacement.
Sampling with Transects and a Random Number Table
Randomly lay down the transects based on random nos.
Randomly take samples along each transect.
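
The lucky-draw idea can be sketched with Python's `random` module. The station labels (S1 to S200) and the fixed seed are assumptions for illustration; a seed is used only so the draw can be repeated.

```python
import random

# Hypothetical labels for 200 sampling stations.
stations = [f"S{i}" for i in range(1, 201)]

# Fixed seed (an assumption) so the same draw can be reproduced later.
rng = random.Random(2024)

# Without replacement: each station can be selected at most once.
selected = rng.sample(stations, k=30)

# With replacement: the same station may be drawn more than once.
selected_with_replacement = rng.choices(stations, k=30)

print(sorted(selected, key=lambda s: int(s[1:])))
```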

Randomized Sampling
Spatial Comparison Clustered Random
Sampling
Randomly take
e.g. 10 samples
from each
randomly
selected site

S8 S9 S10 S11 S12 S13 S14

S15 S16 S17 S18 S19 S20 S21

Temporal Comparison
Wet Season vs. Dry Season
Randomly select sampling days within each
season (assuming each day is independent of
other days) covering both neap and spring tides.
Transitional period should not be selected to
ensure independence of the two seasons.

Study Sites (HK map)

Spatial

Temporal

Stratified (Random) Sampling

The population is first divided into a number of parts or
'strata' according to some characteristic, chosen to be
related to the major variables being studied.
Water samples from three different water depths (1 m from the
surface, mid-depth, 1 m above seabed).

Water samples from a point source of pollution using a transect

(set away from the source to open sea) with fixed sampling
intervals (e.g. 1, 5, 10, 20, 50, 100, 500, 1000, 2000 m).
Sediment samples from the high (2 m of Chart Datum), mid (1 m
CD) and low intertidal zones (0.5 m CD).
Sediment and water samples from different beneficial uses in
Hong Kong waters.

Precision and Accuracy

[Figure: four panels]
Neither precise nor accurate
Moderately precise and accurate
Highly precise but not accurate
Highly precise and accurate

Quality Control & Quality Assurance

e.g. Total phosphate measurements for a water sample
Step 1: Pipette 1 ml sample to a cuvette
Step 2: Pipette 0.5 ml colour reagent
Step 3: Reaction for 15 minutes
[Figure: Abs plotted against Conc.]

Precision can be estimated using procedure replicates.

Accuracy can be checked with certified standard reference solutions.

QC & QA: Control Chart

[Figure: control chart; Lead 0.065 ± 0.007]

The measured mean value can be compared with the certified mean
value using a one-sample t-test.
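
A one-sample t statistic for such an accuracy check might be computed as below. The certified value 0.065 is taken from the control-chart slide; the five replicate measurements are hypothetical, and 2.776 is the standard tabulated two-tailed critical t for alpha = 0.05 with 4 degrees of freedom.

```python
import math
import statistics

certified_mean = 0.065  # certified value for lead, from the slide

# Hypothetical replicate measurements of the reference solution.
measured = [0.061, 0.068, 0.066, 0.063, 0.070]

n = len(measured)
sample_mean = statistics.mean(measured)
sample_sd = statistics.stdev(measured)      # n - 1 denominator
standard_error = sample_sd / math.sqrt(n)   # SEM

# One-sample t: distance of the measured mean from the certified mean, in SEM units.
t_observed = (sample_mean - certified_mean) / standard_error
df = n - 1
T_CRITICAL = 2.776  # two-tailed, alpha = 0.05, df = 4

if abs(t_observed) > T_CRITICAL:
    print(f"t = {t_observed:.3f}: measured mean differs from the certified value")
else:
    print(f"t = {t_observed:.3f}: no significant difference from the certified value")
```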

Why is it so important to use the mean of the

means in the experimental design?

Central Limit Theorem

The mean will remain the same if a
mean of the means is used instead of
taking a simple mean but the SD of the
means will be substantially smaller
than the original sample SD.
For each water body, 50 samples are
taken. It is advantageous if they are
grouped into 5 groups of 10 samples to
compute the mean of the means. This
will increase the power for subsequent
comparison with other sites.
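
The claim about the mean of the means can be illustrated with a small simulation. This is a sketch under stated assumptions: the 50 values are simulated from a normal distribution with an arbitrary mean (100) and SD (15), and the seed is fixed only for repeatability.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed, an assumption for repeatability

# 50 simulated samples from one water body (hypothetical mean and SD).
samples = [rng.gauss(100, 15) for _ in range(50)]

# Group into 5 groups of 10 and take the mean of each group.
groups = [samples[i:i + 10] for i in range(0, 50, 10)]
group_means = [statistics.mean(g) for g in groups]

simple_mean = statistics.mean(samples)
mean_of_means = statistics.mean(group_means)

sd_of_samples = statistics.stdev(samples)
sd_of_means = statistics.stdev(group_means)

print(f"simple mean = {simple_mean:.2f}, mean of means = {mean_of_means:.2f}")
print(f"SD of samples = {sd_of_samples:.2f}, SD of group means = {sd_of_means:.2f}")
```

With equal group sizes the two means coincide, while the SD of the group means is roughly the sample SD divided by the square root of the group size.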

True Replication vs. Pseudo- Replication

Control

Treatment A

Treatment B

Will it be correct to say that there are four replicates

per group? If not, why?

True Replication vs. Pseudo- Replication

Control

Treatment A

Treatment B

With the same replication

arrangement as those in
the Control.

Mean 1

Mean 2

Mean 3

Will it be correct to say that there are three replicates per

group? If yes, why?

A Bathing Beach

Storm drainage
outfall

Wave breaker

Sea

How can we obtain a statistically sound

figure of E. coli count for this bathing
beach?

True Replication vs. Pseudo- Replication

Site A

Site B

Site C

With the same replication

arrangement as those in
the Site A.

Five replicates per group and each replicate with three

procedure replicates to ascertain the measurement precision.

True Replication vs. Pseudo- Replication

Site A

3 Sub-sites

Three replicated sub-sites per site, each sub-site with three
replicate samples and each sample with three procedure
replicates to ascertain the measurement precision.

Inferential Statistics

Frequency Distribution

Sediment grain sizes

e.g. The particle sizes (µm) of 37 grains from a sample of sediment from an estuary

8.2  5.3  5.2  5.5  4.3  4.2
6.3  6.8  6.4  8.1  6.3
7.0  6.8  7.2  7.2  7.1
5.3  5.4  6.3  5.5  6.0
5.1  4.5  4.2  4.3  5.1
5.8  4.3  5.7  4.4  4.1
4.8  3.8  3.8  4.1  4.0  4.0

Define convenient classes (equal width) and class intervals, e.g. 1 µm

Frequency Distribution

e.g. A frequency distribution table for the size of particles collected from the estuary

Particle size (µm)     Frequency
3.0 to under 4.0       2
4.0 to under 5.0       12
5.0 to under 6.0       10
6.0 to under 7.0       7
7.0 to under 8.0       4
8.0 to under 9.0       2
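
The frequency table can be rebuilt from the 37 raw values by assigning each grain to a class of width 1. A minimal sketch; the variable names are illustrative.

```python
from collections import Counter

# Particle sizes of the 37 grains, from the slide.
sizes = [
    8.2, 5.3, 5.2, 5.5, 4.3, 4.2,
    6.3, 6.8, 6.4, 8.1, 6.3,
    7.0, 6.8, 7.2, 7.2, 7.1,
    5.3, 5.4, 6.3, 5.5, 6.0,
    5.1, 4.5, 4.2, 4.3, 5.1,
    5.8, 4.3, 5.7, 4.4, 4.1,
    4.8, 3.8, 3.8, 4.1, 4.0, 4.0,
]

# int() truncates positive values, so 4.0 to under 5.0 falls in class 4.
class_counts = Counter(int(x) for x in sizes)

frequency_table = [(lower, lower + 1, class_counts[lower]) for lower in range(3, 9)]

for lower, upper, count in frequency_table:
    print(f"{lower}.0 to under {upper}.0: {count}")
```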

Frequency Histogram

[Figure: frequency histogram of particle size (µm) with classes 3 to <4, 4 to <5, 5 to <6, 6 to <7, 7 to <8 and 8 to <9]

e.g. A frequency distribution for the size of particles collected from the estuary

e.g. A frequency distribution of height of 30-year-old people (n = 52: 30 females & 22 males)

Why bimodal-like?

[Figure: frequency (0 to 14) against height (cm) in classes >149-153, >153-157, >157-161, >161-165, >165-169, >169-173, >173-177, >177-181 and >181-185]

The Normal Curve

$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right]$

Parameters μ and σ determine the position of the curve on the x-axis and its shape.

Normal curve was first expressed on paper (for astronomy) by A. de Moivre in 1733.

It was not until the 1950s that it was applied to environmental problems.
(P.S. non-parametric statistics were developed in the 20th century)

[Figure: probability density (0.00 to 0.10) against height (140 to 190 cm) for male and female]
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right]$

[Figure: probability density (0.00 to 0.50) against X for N(10,1), N(20,1), N(20,2) and N(10,3)]

Normal distribution N(μ,σ)

Probability density function: the area under the
curve is equal to 1.

The Standard Normal Curve with a Mean = 0

(Pentecost 1999)

μ = 0, σ = 1 and with the total area under the curve = 1

units along x-axis are measured in σ units
Figures: (a) for ±1σ, area = 0.6826 (68.26%); (b) for ±2σ,
95.44%; (c) the shaded area = 100% - 95.44%

Inferential statistics - testing the null hypothesis

Alternatively, we can state the null hypothesis
as that a random observation of Z will lie
outside the limit -1.96 or +1.96.
There are 2 possibilities:
Either we have chosen an unlikely value
of Z, or our hypothesis is incorrect.
Conventionally, when performing a
significance test, we make the rule that if
Z values lie outside the range ±1.96, then the null hypothesis is rejected and
the Z value is termed significant at the 5% level, or α = 0.05 (or p < 0.05); ±1.96 is the critical value of the statistic.
For Z = ±2.58, the value is termed significant at the 1% level.
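
The areas under the standard normal curve and the two critical Z values can be checked with `statistics.NormalDist` from the standard library. This is an illustrative sketch; note that the exact areas (0.6827 and 0.9545) differ in the last digit from the table-rounded 0.6826 and 95.44% quoted on the slide.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Area within one and two standard deviations of the mean.
area_within_1sd = standard_normal.cdf(1) - standard_normal.cdf(-1)
area_within_2sd = standard_normal.cdf(2) - standard_normal.cdf(-2)
area_in_tails_2sd = 1 - area_within_2sd

# Two-tailed critical values: alpha is split equally between the two tails.
z_critical_5pct = standard_normal.inv_cdf(1 - 0.05 / 2)
z_critical_1pct = standard_normal.inv_cdf(1 - 0.01 / 2)

print(f"within 1 SD: {area_within_1sd:.4f}, within 2 SD: {area_within_2sd:.4f}")
print(f"critical Z (5%): {z_critical_5pct:.2f}, critical Z (1%): {z_critical_1pct:.2f}")
```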

Accept Ho

Reject Ho

Statistical Errors in Hypothesis Testing

Consider court judgements where the accused is
presumed innocent until proved guilty beyond
reasonable doubt (i.e. Ho = innocent).

                             If the accused is           If the accused is
                             truly innocent (Ho is true)  truly guilty (Ho is false)
Court's decision: Guilty     Wrong judgement             Correct judgement
Court's decision: Innocent   Correct judgement           Wrong judgement

Statistical Errors in Hypothesis Testing

Similar to court judgements, in testing a null
hypothesis in statistics, we also suffer from
the similar kind of errors:

                     If Ho is true     If Ho is false
If Ho is rejected    Type I error      No error
If Ho is accepted    No error          Type II error

Statistical Errors in Hypothesis Testing

For example, Ho: The average ammonia concentrations are similar
between the suspected polluted Site A and the reference clean
Site B, i.e. μA = μB
If Ho is indeed a true statement about a statistical population,
it will be concluded (erroneously) to be false 5% of time (in case
α = 0.05).
Rejection of Ho when it is in fact true is a Type I error (also
called an α error).
If Ho is indeed false, our test may occasionally not detect this
fact, and we accept the Ho.
Acceptance of Ho when it is in fact false is a Type II error
(also called a β error).
Minimization of Type II error is vitally essential for environmental management.

Power of a Statistical Test

Power is defined as 1-β.

β is the probability of committing a Type II error.

Power (1-β) is the probability of rejecting
the null hypothesis when it is in fact false
and should be rejected.
Probability of Type I error is specified as α.
But β is a value that we neither specify nor
know.

Power of a Statistical Test

However, for a given sample size n, α value is
related inversely to β value.
Lower p of committing a Type I error is
associated with higher p of committing a Type II
error.
The only way to reduce both types of error
simultaneously is to increase n.

For a given α, a large n will result in statistical
test with greater power (1 - β).
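
The trade-off between α, β and n can be sketched numerically. The function below is a rough normal (z) approximation for a two-sample, two-tailed comparison with equal group sizes; it is an assumption made for illustration, not the exact calculation performed by G*Power, and the difference (1.0) and SD (2.0) are hypothetical.

```python
from statistics import NormalDist


def approximate_power(delta, sigma, n, alpha=0.05):
    """Approximate power of a two-tailed, two-sample z test with n per group.

    Normal approximation only; the negligible probability of rejecting in
    the wrong tail is ignored.
    """
    z = NormalDist()
    z_critical = z.inv_cdf(1 - alpha / 2)
    standard_error = sigma * (2 / n) ** 0.5  # SE of the difference of two means
    return z.cdf(abs(delta) / standard_error - z_critical)


# Hypothetical difference and SD; power rises as n increases.
sample_sizes = (5, 10, 20, 40, 80)
power_by_n = {n: approximate_power(delta=1.0, sigma=2.0, n=n) for n in sample_sizes}

# For a fixed n, a stricter alpha lowers power (raises beta).
power_alpha_05 = approximate_power(delta=1.0, sigma=2.0, n=40, alpha=0.05)
power_alpha_01 = approximate_power(delta=1.0, sigma=2.0, n=40, alpha=0.01)

for n, power in power_by_n.items():
    print(f"n = {n:3d} per group: power = {power:.3f}")
```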

What is next?
1. Group Discussion on the Experimental
Design for a Case Study

2. Introduction to Two Classes of Basic

Statistical Techniques:
(1) correlation based methods and
(2) group comparison methods
3. Power Analysis

Power Analysis with G*Power

A. Comparing Two Samples
Independent Samples t test
B. Comparing More than 2 Samples
Analysis of Variance (ANOVA)

G*Power 3 Free Software

http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/

Mr. Student = Mr. William Sealey Gosset (1876-1937)

Photo source: http://www-groups.dcs.st-and.ac.uk/~history/PictDisplay/Gosset.html

The difference between two sample

means with limited data
If n<30, the above method gives an unreliable estimate of z
This problem was solved by Student who introduced the t-test
early in the 20th century
Similar to z-test, but instead of referring to z, a value of t is
required (Table B3 in Zar)
df = 2n - 2 for n1 = n2
For all degrees of freedom
below infinity, the curve
appears leptokurtic
compared with the normal
distribution, and this
property becomes
extreme at small degrees
of freedom.
Figure source: Pentecost 1999

Comparison of 2 Independent Samples

[Figures: four panels comparing the mean measured unit of two samples (A and B); error bar = 95% C.I.]

Power and sample size for Student's t test

We can estimate the minimum sample size to use to
achieve desired test characteristics:

$n \ge \frac{2 s_p^2}{\delta^2} (t_{\alpha,\nu} + t_{\beta(1),\nu})^2$
where δ is the smallest population difference we wish
to detect: δ = μ1 - μ2
Required sample size depends on δ, population
variance (σ²), α, and power (1-β)
If we want to detect a very small δ, we need a larger
sample.

If the variability within samples is great, a large n is
required. The results of a pilot study or previous study
of this type would provide such information.

Estimation of minimum detectable difference

$n \ge \frac{2 s_p^2}{\delta^2} (t_{\alpha,\nu} + t_{\beta(1),\nu})^2$
The above equation can be rearranged to
ask how small a population difference (δ)
is detectable with a given sample size:

$\delta \ge \sqrt{\frac{2 s_p^2}{n}} (t_{\alpha,\nu} + t_{\beta(1),\nu})$
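
Both forms of the equation can be tried out with a short script. This sketch substitutes standard normal quantiles for the t values (an assumption: the exact method iterates because t depends on the degrees of freedom, so the result here slightly underestimates n). The pooled variance (4.0), target difference (2.0) and power (0.80) are hypothetical.

```python
import math
from statistics import NormalDist

z = NormalDist()


def quantile_sum(alpha, power):
    """z stand-in for t(alpha, two-tailed) + t(beta, one-tailed)."""
    return z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)


def minimum_sample_size(pooled_variance, delta, alpha=0.05, power=0.80):
    """Smallest n per group satisfying n >= (2 sp^2 / delta^2) * (sum)^2."""
    n = (2 * pooled_variance / delta ** 2) * quantile_sum(alpha, power) ** 2
    return math.ceil(n)


def minimum_detectable_difference(pooled_variance, n, alpha=0.05, power=0.80):
    """Rearranged form: delta >= sqrt(2 sp^2 / n) * (sum)."""
    return math.sqrt(2 * pooled_variance / n) * quantile_sum(alpha, power)


n_required = minimum_sample_size(pooled_variance=4.0, delta=2.0)
delta_min = minimum_detectable_difference(pooled_variance=4.0, n=n_required)

print(f"n per group >= {n_required}")
print(f"with n = {n_required}, detectable difference >= {delta_min:.3f}")
```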

www.myspace.com/mtkchronicles

Some Notes about Effect Size

If aliens were to land on earth, how
long would it take for them to
realise that, on average, human
males are taller than females?
The answer relates to the effect
size (ES) of the difference in height
between men and women.
The larger the ES, the quicker they
would suspect that men are taller.

Cohen (1992) suggested that 0.2 is
indicative of a small ES, 0.5 a
medium ES and 0.8 a large ES.
http://spss.wikia.com/wiki/Sample_Size,_Effect_Size,_and_Power

A Student's t Test with na = nb

Example 1

e.g. The chemical oxygen demand (COD) is measured at two
industrial effluent outfalls, a and b, as part of consent procedure.
Test the null hypothesis: Ho: μa = μb while HA: μa ≠ μb

          a        b
          3.48     3.89
          2.99     3.19
          3.32     2.80
          4.17     4.31
          3.78     3.42
          4.00     3.41
          3.20     3.55
          4.40     2.40
          3.85     2.99
          4.52     3.08
          3.09     3.31
          3.62     4.52
mean      3.701    3.406
s²        0.257    0.366

SS = sum of squares = s² × ν
$s_p^2 = (SS_1 + SS_2)/(\nu_1 + \nu_2) = [(0.257 \times 11) + (0.366 \times 11)]/(11 + 11) = 0.312$
$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{s_p^2/n_1 + s_p^2/n_2} = \sqrt{(0.312/12) \times 2} = 0.228$
$t = (\bar{X}_1 - \bar{X}_2) / s_{\bar{X}_1 - \bar{X}_2} = (3.701 - 3.406) / 0.228 = 1.294$
df = 2n - 2 = 22
t (α = 0.05, df = 22, 2-tailed) = 2.074 > t observed = 1.294, p > 0.05

The calculated t-value < the critical t value.

Thus, accept Ho.

Need to check Power.

Remember to always check the homogeneity of variance before running the t test.
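
The Example 1 calculation can be reproduced from the raw COD values. This is a sketch of the pooled-variance t statistic; the function name is illustrative, and the critical value 2.074 is the one quoted on the slide. Working from unrounded intermediate values gives t of about 1.30 rather than the 1.294 obtained from rounded figures. The last line adds Cohen's d as an optional effect-size estimate, which is not part of the original example.

```python
import math
import statistics

# COD measurements at outfalls a and b, from the slide.
a = [3.48, 2.99, 3.32, 4.17, 3.78, 4.00, 3.20, 4.40, 3.85, 4.52, 3.09, 3.62]
b = [3.89, 3.19, 2.80, 4.31, 3.42, 3.41, 3.55, 2.40, 2.99, 3.08, 3.31, 4.52]


def pooled_t(x, y):
    """Return (t, df, pooled variance) for two independent samples."""
    n1, n2 = len(x), len(y)
    mean1, mean2 = statistics.mean(x), statistics.mean(y)
    ss1 = sum((v - mean1) ** 2 for v in x)   # sum of squares, sample 1
    ss2 = sum((v - mean2) ** 2 for v in y)   # sum of squares, sample 2
    df = (n1 - 1) + (n2 - 1)
    pooled_variance = (ss1 + ss2) / df
    se_diff = math.sqrt(pooled_variance / n1 + pooled_variance / n2)
    return (mean1 - mean2) / se_diff, df, pooled_variance


t_observed, df, sp2 = pooled_t(a, b)
T_CRITICAL = 2.074  # alpha = 0.05, df = 22, two-tailed (value from the slide)

# Optional effect size (Cohen's d): mean difference in pooled-SD units.
cohen_d = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sp2)

print(f"t = {t_observed:.3f}, df = {df}, pooled variance = {sp2:.3f}")
print("reject Ho" if abs(t_observed) > T_CRITICAL else "do not reject Ho")
print(f"Cohen's d = {cohen_d:.2f}")
```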

Example 1

N = 2 x 48 = 96

Growth of 8-month-old non-transgenic and transgenic tilapia

was determined by measuring the body mass (wet weight).
Since transgenic fish cloned with growth hormone (GH) related
gene OPAFPcsGH are known to grow faster in other fish
species (Rahman et al. 2001), it is hypothesized that HA:
μtransgenic > μnon-transgenic while the null hypothesis is given as Ho:
μtransgenic ≤ μnon-transgenic

Example 2

Ho: μtransgenic ≤ μnon-transgenic

HA: μtransgenic > μnon-transgenic
Given that mass (g) of tilapia are normally distributed.

          transgenic    non-transgenic
          700           305
          680           280
          500           275
          510           250
          670           490
          670           275
          620           275
          650           300
mean      625.0         306.25
s²        6028.6        5798.2

Example 2

$s_p^2 = (SS_1 + SS_2)/(\nu_1 + \nu_2) = 5913.4$

$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{s_p^2/n_1 + s_p^2/n_2} = 38.45$
$t = (\bar{X}_1 - \bar{X}_2) / s_{\bar{X}_1 - \bar{X}_2} = 8.29$
df = 2n - 2 = 14

t (α = 0.05, df = 14, 1-tailed) = 1.761 << 8.29; p < 0.001

The t-value is greater than the critical t value.

Thus, reject Ho.

If we are going to repeat this

study, can we reduce the
sample size? How many?
Remember to always check the homogeneity of variance before running the t test.

Example 2

N=2x3=6

Example 3

Comparison of [PBDEs] in tissues of

transplanted mussels collected from 6 sites
along an anticipated pollution gradient
Higher [PBDEs] are expected in samples from
polluted sites than from clean
sites
Ha: unequal means
Ho: equal means

[PBDEs] in mussels from various sites

(ng/g)
P1

4.25

3.50

7.20

4.00

0.50

2.50

3.45

3.80

6.50

5.50

2.50

4.75

4.70

4.00

2.20

2.25

2.30

5.60

1.01

2.20

1.70

3.00

3.30

3.20

6.00

3.50

6.00

5.00

4.50

Example 3

Comparison of [PBDEs] in tissues of transplanted mussels collected
from 6 sites along an anticipated pollution gradient

ANOVA
Source of Variation    SS          df    MS          F           P-value
Between Groups         9.465417    5     1.893083    0.650981    0.663547
Within Groups          69.79308    24    2.908045
Total                  79.2585     29

common SD = √2.908 = 1.705299

[Figure: [PBDEs] in mussels (ng/g), 0.00 to 8.00, plotted by Sites]

Example 3

N = 6 x 21 = 126

Example 4

2-Way ANOVA: Effects of dietary PCBs

and sex on heart rate in birds

Source of variance     SS        DF    MS = SS/DF    F        F critical, 0.05(1), 1, 16    P
Total                  1827.7    19
Cells                  1461.3    3
PCB                    1386.1    1     1386.10       60.53    4.49                          < 0.001
Sex                    70.31     1     70.31         3.07     4.49                          > 0.05
PCB x Sex              4.900     1     4.90          0.21     4.49                          > 0.05
Within cells (error)   366.4     16    22.90

[Figure: heart rate (beat/min), 0 to 45, of female and male birds in the Control and PCB treated groups]

There was a significant effect
of chemical treatment on the
heart rate in the birds (P
< 0.001).
There was no interaction
between sex and PCB
treatment while the sex effect
was not significant (likely due
to inadequate power).
N = 2 x 2 x 4
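
The F ratios in the two-way ANOVA table follow directly from the sums of squares and degrees of freedom. A minimal sketch using the values from the table; the dictionary layout is illustrative.

```python
# Sums of squares and degrees of freedom from the ANOVA table.
effects = {
    "PCB": {"ss": 1386.1, "df": 1},
    "Sex": {"ss": 70.31, "df": 1},
    "PCB x Sex": {"ss": 4.900, "df": 1},
}
error = {"ss": 366.4, "df": 16}

F_CRITICAL = 4.49  # F critical, 0.05(1), 1, 16 (value from the table)

# Mean square = SS / DF; each F is the effect MS over the error MS.
ms_error = error["ss"] / error["df"]
f_ratios = {
    name: (term["ss"] / term["df"]) / ms_error for name, term in effects.items()
}
significant = {name: f > F_CRITICAL for name, f in f_ratios.items()}

for name, f in f_ratios.items():
    verdict = "significant" if significant[name] else "not significant"
    print(f"{name}: F = {f:.2f} ({verdict} at the 5% level)")
```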

Example 4

For the sex effect

Variance for sex = 70.31
Error variance = 22.90
N should be 2 x 2 x 7 = 28

[Flowchart repeated: Measurements (data) -> Descriptive statistics -> Normality Check -> Parametric Tests or Non-Parametric Test(s)]

Due to shortcomings of inferential stats

Alternatives to Hypothesis
testing exist

There are problems in the conventional hypothesis testing:

http://www.youtube.com/watch?v=ez4DgdurRPg

YouTube - Bayes' Formula

http://www.youtube.com/watch?v=pPTLK5hFGnQ&feature=channel
By bionicturtledotcom

YouTube - Bayes' Theorem Part 2

http://www.youtube.com/watch?v=bcALcVmLva8&feature=related
By westofvideo

A Simple Example
1000 People
   10 Exposed: 8 Sick, 2 Fine
   990 Non-Exposed: 95 Sick, 895 Fine

What is the chance that a sick person had eaten scallops (i.e. was exposed)?

Probability = 8 exposed with illness/(total of 103 with illness)
= 0.078

A Probability Diagram Bayesian Approach

1000 People (P = 1)
   Exposed (P = 0.010): Sick P = 0.800, Fine P = 0.200
   Non-Exposed (P = 0.990): Sick P = 0.096, Fine P = 0.904

What is the chance that a sick person had eaten scallops (i.e. was exposed)?

$P(\text{Exposed} \mid \text{Sick}) = \frac{P(\text{Exposed}) \, P(\text{Sick} \mid \text{Exposed})}{P(\text{Sick})} = \frac{(0.010)(0.800)}{(0.010)(0.800) + (0.990)(0.096)} = 0.078$
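
The natural-frequency count and Bayes' formula give the same answer, which can be verified in a few lines. A sketch using the counts from the example; variable names are illustrative.

```python
# Counts from the 1000-person example.
exposed_sick, exposed_fine = 8, 2
unexposed_sick, unexposed_fine = 95, 895
total = exposed_sick + exposed_fine + unexposed_sick + unexposed_fine

# Natural-frequency route: exposed-and-sick over everyone who is sick.
posterior_by_counts = exposed_sick / (exposed_sick + unexposed_sick)

# Bayes' formula route: P(Exposed | Sick) = P(Exposed) P(Sick | Exposed) / P(Sick).
p_exposed = (exposed_sick + exposed_fine) / total
p_unexposed = 1 - p_exposed
p_sick_given_exposed = exposed_sick / (exposed_sick + exposed_fine)
p_sick_given_unexposed = unexposed_sick / (unexposed_sick + unexposed_fine)
p_sick = p_exposed * p_sick_given_exposed + p_unexposed * p_sick_given_unexposed
posterior_by_bayes = p_exposed * p_sick_given_exposed / p_sick

print(f"P(Exposed | Sick) = {posterior_by_bayes:.3f}")
```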

This figure illustrates how

the natural frequency
approach can lead to
these same inferences
using the p(Pfiesteria)
estimate of 0.205. From
the figure, the likelihood
ratio can be calculated.
Mike Newman, et al. 2007.
Coastal and estuarine
ecological risk
assessment: the need for
a more formal approach
to stressor identification.
Hydrobiologia 577: 31-40.
Credit: M.C. Newman

Example: Fishkills
Large Fish Kill: Yes 0.081 (810), No 0.919 (9190)

                          Large fish kill (810)    No large fish kill (9190)
High Pfiesteria: Yes      0.520 (421 cases)        0.205 (1884 cases)
High Pfiesteria: No       0.480 (389 cases)        0.795 (7306 cases)
Low Oxygen: Yes           0.220 (178 cases)        0.095 (873 cases)
Low Oxygen: No            0.780 (632 cases)        0.905 (8317 cases)

(421 cases of large fish kills with high Pfiesteria levels) / (1884 cases of no large fish kills with high Pfiesteria levels) = 0.22346

(178 cases of large fish kills with low dissolved oxygen concentrations) / (873 cases of no large fish kills with low dissolved oxygen concentrations) = 0.20389

Likelihood Ratio = 0.22346 / 0.20389 = 1.096

p(Fish Kill | Pfiesteria) / p(Fish Kill | Low DO) = 1.095

Credit: M.C. Newman
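
The likelihood-ratio arithmetic can be checked from the natural-frequency counts. A minimal sketch that reproduces the two ratios and their quotient as they appear in the figure; variable names are illustrative.

```python
# Natural-frequency counts from the fish-kill figure.
kills_with_pfiesteria = 421
no_kills_with_pfiesteria = 1884
kills_with_low_do = 178
no_kills_with_low_do = 873

# Ratio of kill to no-kill cases under each candidate stressor.
ratio_pfiesteria = kills_with_pfiesteria / no_kills_with_pfiesteria
ratio_low_do = kills_with_low_do / no_kills_with_low_do

# A ratio near 1 means the data favour neither stressor strongly.
likelihood_ratio = ratio_pfiesteria / ratio_low_do

print(f"Pfiesteria: {ratio_pfiesteria:.5f}, low DO: {ratio_low_do:.5f}")
print(f"likelihood ratio = {likelihood_ratio:.3f}")
```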

[Figure: network diagram for English sole (Pleuronectes vetulus) from Puget Sound; Marine Environmental Research 45: 47-67 (1998). Nodes: Urbanization; Sediment concentrations (inorganics; PAHs; organochlorines: DDTs, chlordane); Stomach concentrations (inorganics; PAHs; organochlorines); Fish liver tissue concentrations (inorganics; PAHs; organochlorines); Fish liver lesions; Fish sex; Fish age; Fish mortality]

Credit: M.C. Newman

Software Exists for More Complex Situations

Credit: M.C. Newman

Supplemental Readings
Aven, T. & J.T. Kvaløy, 2002. Implementing the Bayesian paradigm in risk analysis.
Reliability Engineering and System Safety 78: 195-201.
Bacon, P.J., J.D. Cain & D.C. Howard, 2002. Belief network models of land manager
decisions and land use change. Journal of Environmental Management 65: 1-23.
Belousek, D.W., 2004. Scientific consensus and public policy: the case of Pfiesteria.
Journal Philosophy, Science & Law 4: 1-33.
Borsuk, M.E., 2004. Predictive assessment of fish health and fish kills in the Neuse
River estuary using elicited expert judgment, Human and Ecological Assessment
10: 415-434.
Borsuk, M.E., D. Higdon, C.A. Stow & K.H. Reckhow, 2001. A Bayesian hierarchical
model to predict benthic oxygen demand from organic matter loading in estuaries
and coastal zones. Ecological Modelling 143: 165-181.
Garbolino, P. and F. Taroni. 2002. Evaluation of scientific evidence using Bayesian
networks. Forensic Sci Intern. 125:149-155.
Newman, M.C. and D. Evans. 2002. Causal inference in risk assessments: Cognitive
idols or Bayesian theory? In: Coastal and Estuarine Risk Assessment. CRC Press
LLC, Boca Raton, FL, pp. 73-96.
Newman, M.C., Zhao, Y., and J.F. Carriger. 2007. Coastal and estuarine ecological
risk assessment: the need for a more formal approach to stressor identification.
Hydrobiologia 577: 31-40.
Uusitalo, L. 2007. Advantages and challenges of Bayesian networks in environmental
modeling. Ecol. Modelling 203:312-318.

Credit: M.C. Newman

YouTube - Bayes' Theorem

Introduction
http://www.youtube.com/watch?v=0NGmrwu_BkY&feature=related
By westofvideo

Error Type (Type I & II)

http://www.youtube.com/watch?v=taEmnrTxuzo&feature=related
By bionicturtledotcom

No ratings yet
World Geography IQ Trivia Quiz
31 pages
Lesson 5 - Quantitative Analysis and Interpretation of Data
No ratings yet
Lesson 5 - Quantitative Analysis and Interpretation of Data
78 pages
Improve English Skills (Writing, Listening, Reading, Speaking)
No ratings yet
Improve English Skills (Writing, Listening, Reading, Speaking)
4 pages
Communication
No ratings yet
Communication
114 pages
Communication
No ratings yet
Communication
114 pages
Summary Biometry
No ratings yet
Summary Biometry
51 pages
Spring Semester, 2020-2021
No ratings yet
Spring Semester, 2020-2021
40 pages
Week 01 Introduction
No ratings yet
Week 01 Introduction
33 pages
Applied Statistical Methods (ASM) : "The True Logic of This World Is in The Calculus of Probabilities"
No ratings yet
Applied Statistical Methods (ASM) : "The True Logic of This World Is in The Calculus of Probabilities"
90 pages
PHD THESIS Enterprise Resource Planning Systems
100% (1)
PHD THESIS Enterprise Resource Planning Systems
289 pages
Pie Chart
No ratings yet
Pie Chart
7 pages
SF SF SF
No ratings yet
SF SF SF
18 pages
Math G6 Q2 Mod7 v2
No ratings yet
Math G6 Q2 Mod7 v2
22 pages
Lecture 7
No ratings yet
Lecture 7
7 pages
Philippine History With Politics and Governance
92% (13)
Philippine History With Politics and Governance
5 pages
Mathematics As A Tool (Descriptive Statistics) (Midterm Period) Overview: This Module Tackles Mathematics As Applied To Different Areas Such As Data
No ratings yet
Mathematics As A Tool (Descriptive Statistics) (Midterm Period) Overview: This Module Tackles Mathematics As Applied To Different Areas Such As Data
33 pages
Introduction To Statistics: February 21, 2006
No ratings yet
Introduction To Statistics: February 21, 2006
34 pages
Likert Scale Response Options MWCC
No ratings yet
Likert Scale Response Options MWCC
5 pages
College Entrance Exam Reviewer (Day 5) : Brought To You by
No ratings yet
College Entrance Exam Reviewer (Day 5) : Brought To You by
16 pages
Chapter 1&2 Exercise Ce Statistic
No ratings yet
Chapter 1&2 Exercise Ce Statistic
19 pages
SALMAN ALAM SHAH - Definitions of Statistics
No ratings yet
SALMAN ALAM SHAH - Definitions of Statistics
16 pages
Exercises CE Statistic
No ratings yet
Exercises CE Statistic
12 pages
Stat Review Lecture (Complete)
No ratings yet
Stat Review Lecture (Complete)
18 pages
FINAL EXAM IN E-WPS Office
No ratings yet
FINAL EXAM IN E-WPS Office
12 pages
Faseeh Stats Project
No ratings yet
Faseeh Stats Project
10 pages
2: Methodology:: Name: Hassan Raza Roll:20-10755 Math-107-Project Section: C
No ratings yet
2: Methodology:: Name: Hassan Raza Roll:20-10755 Math-107-Project Section: C
10 pages
CHAPTER 3 The Acad Use
No ratings yet
CHAPTER 3 The Acad Use
5 pages
Legal Research and Thesis Writing
No ratings yet
Legal Research and Thesis Writing
5 pages
Excel Basics
100% (1)
Excel Basics
13 pages
A Performance Evaluation Framework of Construction
No ratings yet
A Performance Evaluation Framework of Construction
23 pages
Statistics: Organize Understand
No ratings yet
Statistics: Organize Understand
9 pages
Your Salary
No ratings yet
Your Salary
1 page