Data Analysis

This document discusses analyzing data from environmental samples. It notes that when seeing a measurement like "0.6 parts per trillion" of dioxin in fish, important questions to ask are what the number represents, why we care about the measurement, and what is known about the data quality and sampling. The document discusses how to interpret concentration units and examines an example data set of dioxin measurements in fish, looking at how assumptions about non-detect or "less than" values can affect summary statistics calculated from the data.

Uploaded by

Kwame Panyin
Data Analysis

Dr. Doug McLaughlin


November 18, 2013

The goal of this lecture is to encourage deeper thinking about data or values we see or derive ourselves.

What's In A Number?

The concentration of dioxin in fish is 0.6 parts per trillion.

Important questions to ask:
  About the objective (why do we care?)
  About the value (what does this number represent?)
  About the data (what else should we know about the data set it comes from? Is data quality acceptable?)

Some Good Questions To Ask About the Value

  What does 0.6 represent? Mean? Median? Geometric mean?
  From a representative sample?
  How sure are we (confidence limits)?
  Is it changing (trends over time)?
  Data assumptions (e.g., not skewed, no values below detection limits)?

Concentration Units

Concentration   SI Prefix Name   Parts per     Factor (Decimal Notation)   Factor (Scientific Notation, g/L)
1 g/L           --               thousand      1                           10^0
1 mg/L          milli            million       0.001                       10^-3
1 µg/L          micro            billion       0.000001                    10^-6
1 ng/L          nano             trillion      0.000000001                 10^-9
1 pg/L          pico             quadrillion   0.000000000001              10^-12
1 fg/L          femto            quintillion   0.000000000000001           10^-15
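
The prefix factors in the table can be applied mechanically. The following is a minimal Python sketch and not part of the original lecture; the dictionary and function names are illustrative assumptions, and "ug" stands in for the micro prefix. The parts-per labels follow the table and hold for dilute solutions in water, where 1 L weighs about 1 kg.

```python
# Factor of each unit relative to 1 g/L, taken from the Concentration Units table.
# All names below are illustrative, not from the lecture.
FACTOR_G_PER_L = {
    "g/L": 1e0,
    "mg/L": 1e-3,
    "ug/L": 1e-6,
    "ng/L": 1e-9,
    "pg/L": 1e-12,
    "fg/L": 1e-15,
}

# "Parts per" label for each unit, as listed in the same table.
PARTS_PER = {
    "g/L": "thousand",
    "mg/L": "million",
    "ug/L": "billion",
    "ng/L": "trillion",
    "pg/L": "quadrillion",
    "fg/L": "quintillion",
}


def convert(value, from_unit, to_unit):
    """Convert a concentration between two units listed in the table."""
    return value * FACTOR_G_PER_L[from_unit] / FACTOR_G_PER_L[to_unit]


def describe(value, unit):
    """Express a concentration in the table's parts-per wording."""
    return f"{value:g} {unit} is about {value:g} parts per {PARTS_PER[unit]}"


print(describe(0.6, "ng/L"))
print(convert(0.6, "ng/L", "pg/L"), "pg/L")
```

Writing the unit out next to the value answers part of the "what does this number represent" question before any statistics are involved.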

USEPA Data Quality Objectives Process:
A 7 Step Framework for Data Collection, Data Analysis, and Decision-Making

  How hazardous is consuming fish from a river?
  Compare chemical concentrations in fish tissue with health guidance.
  Determine chemical concentrations in representative fish tissue.

USEPA Data Quality Objectives Process:
A Framework for Data Collection, Data Analysis, and Decision-Making (continued)

  Most commonly caught fish species from River x.
  Measure contaminant concentrations in whole body tissue samples.
  Compare to health guidance values.

USEPA Data Quality Objectives Process:
A Framework for Data Collection, Data Analysis, and Decision-Making

  How certain must the estimated mean be, i.e., how small must the confidence interval on the estimated mean be?
  How many samples are needed? What analytical method should be used?
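
The link between "how small must the confidence interval be" and "how many samples are needed" can be sketched with the usual normal-approximation formula, half-width = z * S.D. / sqrt(n). This is a minimal Python sketch under stated assumptions, not the USEPA procedure: the function names are illustrative, the planning standard deviation of 0.54 ppt is used only as an example value, and a z-based interval is assumed where a t-based interval would be somewhat wider for small n.

```python
import math
from statistics import NormalDist


def ci_half_width(sd, n, confidence=0.95):
    """Half-width of a normal-approximation confidence interval on a mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sd / math.sqrt(n)


def samples_needed(sd, half_width, confidence=0.95):
    """Smallest n whose interval half-width is at most `half_width`."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z * sd / half_width) ** 2)


# Illustrative planning values (assumed): S.D. of 0.54 ppt, 37 samples.
print(ci_half_width(0.54, 37))
print(samples_needed(0.54, 0.15))
```

Because n grows with the square of S.D. divided by the half-width, halving the acceptable interval width roughly quadruples the number of samples.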

Example Data Set

37 dioxin concentration measurements from fish collected downstream of a pulp and paper mill

Year   Dioxin (ppt)
0      1.8
0      1.7
0      2.6
0      0.84
0      1.1
3      1.4
3      0.26
3      1.1
7      0.63
7      0.4
7      0.28
7      0.79
7      0.4
7      0.31
7      0.2
7      0.51
7      0.44
10     <0.25
10     0.79
10     <0.39
10     <0.23
10     <0.25
10     0.37
10     0.37
10     0.27
10     0.30
10     0.26
13     0.12
13     0.24
13     0.24
13     0.29
13     0.36
13     0.38
13     0.33
13     0.24
13     0.38
13     <0.12
Data Summary Statistics

Assumes "less than" values are equal to the detection limit

Parameter                 Value
n                         37
Mean                      0.57
Median                    0.37
Variance                  0.29
S.D.                      0.54
Coef. Var. (S.D./Mean)    95%
Min.                      0.12
25th percentile           0.26
75th percentile           0.71
Max.                      2.6

Example Data Set

Assumes "less than" values are equal to the detection limit

Understanding Data Distributions

Examples of Normal and Lognormal Distributions

[Figure: two distribution plots of Density versus X. Panel titles: "Normal, Mean=0.57, StDev=0.29" and "Lognormal, Loc=0.57, Scale=0.57, Thresh=0".]
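
The two curves named in the plot titles can be evaluated directly. The following Python sketch is not from the lecture. It assumes that "Loc" and "Scale" are the mean and standard deviation of the natural log of X and that "Thresh" is a shift added to X; the function names are illustrative.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def lognormal_pdf(x, loc, scale, thresh=0.0):
    """Density of a lognormal distribution; zero at or below the threshold."""
    if x <= thresh:
        return 0.0
    z = (math.log(x - thresh) - loc) / scale
    return math.exp(-0.5 * z * z) / ((x - thresh) * scale * math.sqrt(2 * math.pi))


def lognormal_median(loc, scale, thresh=0.0):
    return thresh + math.exp(loc)


def lognormal_mean(loc, scale, thresh=0.0):
    return thresh + math.exp(loc + scale ** 2 / 2)


# Parameters from the plot titles.
print(normal_pdf(-0.1, 0.57, 0.29))     # positive density at a negative X
print(lognormal_pdf(-0.1, 0.57, 0.57))  # zero density at a negative X
print(lognormal_median(0.57, 0.57), lognormal_mean(0.57, 0.57))
```

Two properties matter for concentration data: the normal curve assigns some density to negative values, which a concentration cannot take, and the lognormal curve is right-skewed, so its mean lies above its median.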

Making Assumptions About Censored Values (Nondetects)

Assume/substitute specific values for NDs
  0, 1/2 the detection limit, and the full detection limit are common substitutions
  Convenient, but can lead to incorrect conclusions in certain cases.
  Hard to predict when problems will arise

Are there alternatives? Yes. One example is the Kaplan-Meier procedure for estimating a mean.
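
A Kaplan-Meier mean for "less than" data can be sketched as follows. This is illustrative Python, not the lecture's own calculation. It uses one common approach: subtract every value from a constant so that "less than" values become right-censored, apply the standard Kaplan-Meier product-limit estimate, integrate the resulting curve, and convert back. The area is integrated only out to the smallest reported value or detection limit, which is an assumed convention for the tail; all names are hypothetical.

```python
def km_mean_left_censored(values, censored):
    """Kaplan-Meier estimate of the mean when some values are '<DL'.

    values:   detected value, or the detection limit for a nondetect
    censored: True where the entry is a nondetect
    """
    flip = max(values)
    # Flipped data are right-censored; at ties, detects sort before nondetects.
    obs = sorted((flip - v, c) for v, c in zip(values, censored))
    at_risk = len(obs)
    surv = 1.0      # Kaplan-Meier survival of the flipped data
    prev_t = 0.0
    area = 0.0      # area under the survival curve = mean of flipped data
    i = 0
    while i < len(obs):
        t = obs[i][0]
        j = i
        detects = 0
        while j < len(obs) and obs[j][0] == t:
            detects += 0 if obs[j][1] else 1
            j += 1
        area += surv * (t - prev_t)
        surv *= 1.0 - detects / at_risk
        at_risk -= j - i
        prev_t = t
        i = j
    return flip - area


# Example data set: 32 detected values and 5 detection limits, in ppt.
DETECTS = [
    1.8, 1.7, 2.6, 0.84, 1.1, 1.4, 0.26, 1.1, 0.63, 0.4, 0.28, 0.79, 0.4,
    0.31, 0.2, 0.51, 0.44, 0.79, 0.37, 0.37, 0.27, 0.30, 0.26, 0.12, 0.24,
    0.24, 0.29, 0.36, 0.38, 0.33, 0.24, 0.38,
]
NONDETECT_DLS = [0.25, 0.39, 0.23, 0.25, 0.12]

values = DETECTS + NONDETECT_DLS
flags = [False] * len(DETECTS) + [True] * len(NONDETECT_DLS)
print(km_mean_left_censored(values, flags))
```

With no nondetects the function reduces to the ordinary arithmetic mean. With nondetects, each "less than" value is in effect spread over the detected values below its detection limit instead of being pinned to a single substituted number.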

Effect of "Less Than" Substitution Assumption

DL = detection limit
S.D. = standard deviation
Coef. Var. = coefficient of variation

Parameter                 ND=DL    ND=0     ND=1/2 DL
n                         37       37       37
Mean                      0.57     0.53     0.55
Median                    0.37     0.36     0.36
Variance                  0.29     0.32     0.30
S.D.                      0.54     0.56     0.55
Coef. Var. (S.D./Mean)    95%      106%     100%
Min.                      0.12     0        0.06
25th percentile           0.26     0.24     0.24
75th percentile           0.71     0.71     0.71
Max.                      2.6      2.6      2.6
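
The substitution comparison can be recomputed from the example data set. The sketch below is illustrative Python, not the software used for the slides. It assumes sample (n-1) variance and standard deviation and the (n+1)p percentile rule, which is the default "exclusive" method of `statistics.quantiles`; the function and variable names are hypothetical.

```python
from statistics import mean, median, quantiles, stdev, variance

# 32 detected values (ppt) and the 5 detection limits reported as "less than".
DETECTS = [
    1.8, 1.7, 2.6, 0.84, 1.1, 1.4, 0.26, 1.1, 0.63, 0.4, 0.28, 0.79, 0.4,
    0.31, 0.2, 0.51, 0.44, 0.79, 0.37, 0.37, 0.27, 0.30, 0.26, 0.12, 0.24,
    0.24, 0.29, 0.36, 0.38, 0.33, 0.24, 0.38,
]
NONDETECT_DLS = [0.25, 0.39, 0.23, 0.25, 0.12]


def summarize(nd_fraction):
    """Summary statistics with each nondetect set to nd_fraction * DL."""
    data = DETECTS + [dl * nd_fraction for dl in NONDETECT_DLS]
    q25, _, q75 = quantiles(data, n=4)  # default method is "exclusive"
    return {
        "n": len(data),
        "mean": mean(data),
        "median": median(data),
        "variance": variance(data),
        "sd": stdev(data),
        "cv": stdev(data) / mean(data),
        "min": min(data),
        "p25": q25,
        "p75": q75,
        "max": max(data),
    }


for label, fraction in [("ND=DL", 1.0), ("ND=0", 0.0), ("ND=1/2 DL", 0.5)]:
    s = summarize(fraction)
    print(f"{label:<10} mean={s['mean']:.2f} sd={s['sd']:.2f} "
          f"cv={s['cv']:.0%} median={s['median']:.2f}")
```

With 5 nondetects out of 37 values, the mean shifts by a few hundredths of a ppt between substitutions while the 75th percentile and maximum do not move, because the substituted values all sit in the lower tail.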

Example Data Set

37 dioxin concentration measurements from fish collected downstream of a pulp and paper mill

Sample No.   Dioxin (ppt)
1            1.8
2            1.7
3            2.6
4            0.84
5            1.1
6            1.4
7            0.26
8            1.1
9            0.63
10           0.4
11           0.28
12           0.79
13           0.4
14           0.31
15           0.2
16           0.51
17           0.44
18           <0.25
19           0.79
20           <0.39
21           <0.23
22           <0.25
23           0.37
24           0.37
25           0.27
26           0.30
27           0.26
28           0.12
29           0.24
30           0.24
31           0.29
32           0.36
33           0.38
34           0.33
35           0.24
36           0.38
37           <0.12
