Lecture 3 2014 Statistical Data Treatment and Evaluation

This document discusses statistical methods used by experimentalists to evaluate data quality and sharpen judgments about measurements. It covers calculating confidence intervals to define a probable range for a population mean based on replicate results. It also discusses determining the number of replicates needed to achieve a given confidence interval width and estimating the probability that experimental means or a mean and true value differ. Additional topics include hypothesis testing, defining null hypotheses, significance levels, and using statistical tests to detect systematic errors or biases in analytical methods.


Statistical Data Treatment and Evaluation
Lecture 3

Experimentalists use statistical calculations to sharpen their judgments concerning the quality of experimental measurements. These applications include:

Defining a numerical interval around the mean of a set of replicate analytical results within which the population mean can be expected to lie with a certain probability. This interval is called the confidence interval (CI).

Determining the number of replicate measurements required to ensure, at a given probability, that an experimental mean falls within a certain confidence interval.

Estimating the probability that (a) an experimental mean and a true value or (b) two experimental means are different.

Determining at a given probability level whether the precision of two sets of measurements differs.

Comparing the means of more than two samples to determine whether differences in the means are real or the result of random error. This process is known as analysis of variance (ANOVA).

Deciding whether what appears to be an outlier in a set of replicate measurements is the result of a gross error or is a legitimate result.

Confidence Intervals

Significance level and Confidence level

Significance level
The probability that a result is outside the confidence interval is often called the significance level. When expressed as a fraction, the significance level is often given the symbol α.

Confidence level
The confidence level is the probability value 1 − α associated with a confidence interval, where α is the level of significance. It can also be expressed as a percentage and is sometimes called the confidence coefficient.

The confidence level (CL) is related to α on a percentage basis by
CL = (1 − α) × 100%

Areas under a Gaussian curve for various values of z

We may assume that 90 times out of 100 the true mean, μ, will be within ±1.64σ of any measurement that we make.
Here the confidence level is 90% and the confidence interval is ±zσ = ±1.64σ.

Confidence interval when s is a good approximation of σ

Single measurement:
CI for μ = x ± zσ
n measurements:
CI for μ = x̄ ± zσ/√n

Keep in mind at all times that confidence intervals based on the above equations apply only in the absence of bias and only if we can assume that s ≈ σ.

Confidence level for various values of z:

Confidence Level, %     z
50.0                    0.67
68.0                    1.00
80.0                    1.28
90.0                    1.64
95.0                    1.96
95.4                    2.00
99.0                    2.58
99.7                    3.00
99.9                    3.29

EXAMPLE A

Determine the 80% and 95% confidence intervals for (a) 1108 mg/L glucose and (b) the mean value (1100.3 mg/L) for month 1 in the example. Assume that in each part, s = 19 is a good estimate of σ.

Ans:
(a) z = 1.28 and 1.96 for the 80% and 95% confidence levels:
80% CI = 1108 ± 1.28 × 19 = 1108 ± 24.3 mg/L
95% CI = 1108 ± 1.96 × 19 = 1108 ± 37.2 mg/L

(b) For the 7 measurements:
80% CI = 1100.3 ± 1.28 × 19/√7 = 1100.3 ± 9.2 mg/L
95% CI = 1100.3 ± 1.96 × 19/√7 = 1100.3 ± 14.1 mg/L
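
The arithmetic above is easy to script. A minimal Python sketch (an illustration added here, not part of the original lecture) that reproduces both parts of Example A:

```python
# z-based confidence intervals for Example A (values from the slide).
from math import sqrt

s = 19.0           # good estimate of sigma, mg/L
x_single = 1108.0  # single glucose measurement, mg/L
x_mean, n = 1100.3, 7

for cl, z in [(0.80, 1.28), (0.95, 1.96)]:
    half_single = z * s          # single measurement: x +/- z*sigma
    half_mean = z * s / sqrt(n)  # mean of n: xbar +/- z*sigma/sqrt(n)
    print(f"{cl:.0%}: {x_single} +/- {half_single:.1f} mg/L; "
          f"{x_mean} +/- {half_mean:.1f} mg/L")
```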

How many replicate measurements in month 1 in Example A are needed to decrease the 95% confidence interval to 1100.3 ± 10.0 mg/L of glucose?

Rearranging the half-width zσ/√N = 10.0 for N gives
N = (zσ/10.0)² = (1.96 × 19/10.0)² = 13.9

We thus conclude that 14 measurements are needed to provide a slightly better than 95% chance that the population mean will lie within ±10 mg/L of glucose of the experimental mean.
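
The same rearrangement in code form (again an illustrative sketch; math.ceil does the rounding up):

```python
# Replicates needed so the 95% CI half-width shrinks to 10 mg/L:
# rearranging half_width = z*sigma/sqrt(N) gives N = (z*sigma/half_width)^2.
from math import ceil

z, sigma, half_width = 1.96, 19.0, 10.0
n_required = ceil((z * sigma / half_width) ** 2)  # (3.724)^2 = 13.9 -> 14
print(n_required)  # 14
```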

Finding the confidence interval when σ is unknown

CI for μ = x̄ ± ts/√N
N.B. t → z as the number of degrees of freedom becomes infinite.

Example of calculating a confidence interval

Consider measurement of dissolved Ti in a standard seawater (NASS-3):
Data: 1.34, 1.15, 1.28, 1.18, 1.33, 1.65, 1.48 nM
df = n − 1 = 7 − 1 = 6
x̄ = 1.34 nM
s = 0.17 nM

95% confidence interval:
t(df=6, 95%) = 2.447
CI95 = x̄ ± ts/√n = 1.34 ± 0.16 nM

50% confidence interval:
t(df=6, 50%) = 0.718
CI50 = 1.34 ± 0.05 nM
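
A Python sketch of this t-based interval (illustrative; assumes scipy is available for the t quantiles):

```python
# t-based confidence intervals for the NASS-3 Ti data.
import numpy as np
from scipy import stats

data = np.array([1.34, 1.15, 1.28, 1.18, 1.33, 1.65, 1.48])  # nM
n = len(data)
xbar, s = data.mean(), data.std(ddof=1)  # ddof=1 -> sample std. dev.

for cl in (0.95, 0.50):
    t = stats.t.ppf(1 - (1 - cl) / 2, df=n - 1)  # two-sided critical t
    half = t * s / np.sqrt(n)
    print(f"{cl:.0%} CI: {xbar:.2f} +/- {half:.2f} nM")  # 0.16 and 0.05 nM
```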

Values of t for Various Levels of Probability

Degrees of
Freedom     80%    90%    95%    99%    99.9%
1           3.08   6.31   12.7   63.7   637
2           1.89   2.92   4.30   9.92   31.6
3           1.64   2.35   3.18   5.84   12.9
4           1.53   2.13   2.78   4.60   8.61
5           1.48   2.02   2.57   4.03   6.87
6           1.44   1.94   2.45   3.71   5.96
7           1.42   1.90   2.36   3.50   5.41
8           1.40   1.86   2.31   3.36   5.04
9           1.38   1.83   2.26   3.25   4.78
10          1.37   1.81   2.23   3.17   4.59
15          1.34   1.75   2.13   2.95   4.07
20          1.32   1.73   2.09   2.84   3.85
40          1.30   1.68   2.02   2.70   3.55
60          1.30   1.67   2.00   2.62   3.46
∞           1.28   1.64   1.96   2.58   3.29

Statistical Aids to Hypothesis Testing

Experimental results seldom agree exactly with those predicted from a theoretical model.
Scientists and engineers frequently must judge whether a numerical difference is a result of the random errors inevitable in all measurements or a result of systematic errors. Certain statistical tests are useful in sharpening these judgments.
Tests of this kind make use of a null hypothesis, which assumes that the numerical quantities being compared are the same.

Null hypothesis

In statistics, a null hypothesis is a hypothesis that is presumed true until statistical evidence in the form of a hypothesis test indicates otherwise. It is a hypothesis that two or more populations are identical.

The purpose of hypothesis testing is to test the viability of the null hypothesis in the light of experimental data. Depending on the data, the null hypothesis either will or will not be rejected as a viable possibility.

The null hypothesis is often the reverse of what the experimenter actually believes; it is put forward to allow the data to contradict it.

Hypothesis testing

Hypothesis testing is a method of inferential statistics.
An experimenter starts with a hypothesis about a population parameter, called the null hypothesis.
Data are then collected and the viability of the null hypothesis is determined in light of the data.
If the data are very different from what would be expected under the assumption that the null hypothesis is true, then the null hypothesis is rejected.
If the data are not greatly at variance with what would be expected under the assumption that the null hypothesis is true, then the null hypothesis is not rejected.
Failure to reject the null hypothesis is not the same thing as accepting the null hypothesis.

Significance level

The probability of a false rejection of the null hypothesis in a statistical test. Also called the level of significance.
The significance level of a test is the probability that the test statistic will reject the null hypothesis when the hypothesis is true.
In hypothesis testing, the significance level is the criterion used for rejecting the null hypothesis.

The significance level is used in hypothesis testing as follows:
Firstly, the difference between the results of the experiment and the null hypothesis is determined.
Then, assuming the null hypothesis is true, the probability of a difference that large is computed.
Finally, this probability is compared to the significance level.

Significance level contd

If the probability is less than or equal to the significance level, then the null hypothesis is rejected and the outcome is said to be statistically significant.
Traditionally, experimenters have used either the .05 level (sometimes called the 5% level) or the .01 level (the 1% level).
The lower the significance level, the more the data must diverge from the null hypothesis to be significant.
Therefore, the .01 level is more conservative than the .05 level.
At the 95% confidence level, the significance level is 5%.

Rejection regions for the 95% confidence level:
(a) Two-tailed test for Ha: μ ≠ μ0. Note the critical value of z is 1.96.
(b) One-tailed test for Ha: μ > μ0. Here, the critical value of zcrit is 1.64, so that 95% of the area is to the left of zcrit and 5% of the area is to the right.
(c) One-tailed test for Ha: μ < μ0. Here the critical value is again 1.64, so that 5% of the area lies to the left of zcrit.
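
The critical values quoted above come straight from the standard normal quantile function; a quick check in Python (illustrative, scipy assumed):

```python
# Critical z values for two- and one-tailed tests at alpha = 0.05.
from scipy import stats

alpha = 0.05
z_two = stats.norm.ppf(1 - alpha / 2)  # 1.96: reject if |z| > z_two
z_one = stats.norm.ppf(1 - alpha)      # 1.64: reject in one tail only
print(round(z_two, 2), round(z_one, 2))
```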

Comparing an Experimental Mean with the True Value

A common way of testing for bias in an analytical method is to use the method to analyze a sample whose composition is accurately known.
Bias in an analytical method is illustrated by the two curves on the next slide, which show the frequency distribution of replicate results in the analysis of identical samples by two analytical methods.
Method A has no bias, so the population mean μA is the true value xt.
Method B has a systematic error, or bias, that is given by
bias = μB − xt = μB − μA
Bias affects all the data in the set in the same way, and it can be either positive or negative.

Comparing an experimental mean with a known value contd

A common way of testing for bias in an analytical method is to use the method to analyse a sample whose composition is accurately known.
Biases are normally caused by systematic errors.

1. State the null hypothesis: H0: μ = xt
2. Carry out the statistical test: reject H0 if
|x̄ − xt| > ts/√N

If a good estimate of σ is available, the above equation can be modified by replacing t with z and s with σ.

Detection of Systematic Error (Bias)

A standard material known to contain 38.9% Hg was analysed by atomic absorption spectroscopy. The results were 38.9%, 37.4% and 37.1%. At the 95% confidence level, is there any evidence for a systematic error in the method?

x̄ = Σxi/N = 113.4/3 = 37.8%
x̄ − xt = 37.8 − 38.9 = −1.1%
s = √(Σ(xi − x̄)²/(N − 1)) = √((1.1² + 0.4² + 0.7²)/2) = √(1.86/2) = 0.96%

Assume the null hypothesis (no bias). Only reject it if
|x̄ − xt| > ts/√N
From the table, t(df=2, 95%) = 4.30, s (calculated above) = 0.96% and N = 3:
ts/√N = 4.30 × 0.96/√3 = 2.4%

Since |x̄ − xt| = 1.1% < 2.4%, the null hypothesis is maintained, and there is no evidence for systematic error at the 95% confidence level.

Comparing a measured result with a known value: example

Dissolved Fe analysis verified using NASS-3 seawater SRM.
Certified value = 5.85 nM
Experimental results: 5.76 ± 0.17 nM (n = 10)

tcalc = |known value − x̄|/s × √n = |5.85 − 5.76|/0.17 × √10 = 1.674

Compare to ttable at df = 10 − 1 = 9 and the 95% CL:
ttable(df=9, 95% CL) = 2.262

If |tcalc| < ttable, results are not significantly different at the 95% CL.
If |tcalc| ≥ ttable, results are significantly different at the 95% CL.
For this example, |tcalc| (1.674) < ttable (2.262), so the experimental result is not significantly different from the certified value at the 95% CL.
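
A sketch of the same comparison in Python (illustrative; only the summary statistics from the slide are available, so tcalc is built from them directly):

```python
# Mean vs. known value: dissolved Fe in NASS-3.
from math import sqrt
from scipy import stats

mu0, xbar, s, n = 5.85, 5.76, 0.17, 10  # certified value, mean, s, n
t_calc = abs(mu0 - xbar) / s * sqrt(n)  # 1.674
t_table = stats.t.ppf(0.975, df=n - 1)  # 2.262 (df = 9, 95% CL)
print(t_calc < t_table)  # True -> no significant difference
```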

Comparison of two experimental means

Here, the chemist has to judge whether a difference in the means of two sets of identical analyses is real and constitutes evidence that the samples are different, or whether the discrepancy is simply a consequence of random errors in the two sets.

Again, if a good estimate of σ is available, the t-test equation on the next slide can be modified by replacing t with z and s with σ.

Comparing replicate measurements or comparing means of two sets of data

Example: Given the same sample analyzed by two different methods, do the two methods give the same result?

tcalc = (x̄1 − x̄2)/spooled × √(n1·n2/(n1 + n2))

spooled = √((s1²(n1 − 1) + s2²(n2 − 1))/(n1 + n2 − 2))

We will compare tcalc to the tabulated value of t at the appropriate df and CL.
df = n1 + n2 − 2 for this test.

N.B.

We compare our test value of t with the critical value obtained from the table for the particular confidence level desired.

If the absolute value of the test statistic is smaller than the critical value, the null hypothesis is accepted and no significant difference between the means has been demonstrated. If tcalculated < ttable (95%), the difference is not significant.

A test value of t greater than the critical value of t indicates that there is a significant difference between the means. If tcalculated > ttable (95%), the difference is significant.

Comparing replicate measurements or comparing means of two sets of data: example

Determination of nickel in sewage sludge using two different methods.

Method 1: Atomic absorption spectroscopy
Data: 3.91, 4.02, 3.86, 3.99 mg/g
x̄1 = 3.94 mg/g
s1 = 0.073 mg/g
n1 = 4

Method 2: Spectrophotometry
Data: 3.52, 3.77, 3.49, 3.59 mg/g
x̄2 = 3.59 mg/g
s2 = 0.12 mg/g
n2 = 4

Comparing replicate measurements or comparing means of two sets of data: example

spooled = √((s1²(n1 − 1) + s2²(n2 − 1))/(n1 + n2 − 2))
        = √(((0.073)²(4 − 1) + (0.12)²(4 − 1))/(4 + 4 − 2)) = 0.0993

tcalc = (x̄1 − x̄2)/spooled × √(n1·n2/(n1 + n2))
      = (3.94 − 3.59)/0.0993 × √((4 × 4)/(4 + 4)) = 5.0

Compare to ttable at df = 4 + 4 − 2 = 6 and the 95% CL:
ttable(df=6, 95% CL) = 2.447
If |tcalc| < ttable, the means are not significantly different at the 95% CL.
If |tcalc| ≥ ttable, the means are significantly different at the 95% CL.
Since |tcalc| (5.0) > ttable (2.447), the results from the two methods are significantly different at the 95% CL.
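
scipy's two-sample t-test with equal_var=True implements exactly this pooled test; an illustrative check on the raw nickel data (it returns t ≈ 4.9 rather than 5.0 because it carries the unrounded standard deviations):

```python
# Pooled t-test on the nickel data (equal variances assumed).
import numpy as np
from scipy import stats

method1 = np.array([3.91, 4.02, 3.86, 3.99])  # AAS, mg/g
method2 = np.array([3.52, 3.77, 3.49, 3.59])  # spectrophotometry, mg/g

t_stat, p_value = stats.ttest_ind(method1, method2, equal_var=True)
print(t_stat, p_value)  # t ~ 4.9, p < 0.05
print(p_value < 0.05)   # True -> means differ at the 95% CL
```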

Comparing replicate measurements or comparing means of two sets of data

Please note: there is an important assumption associated with this t-test.
It is assumed that the standard deviations (i.e., the precision) of the two sets of data being compared are not significantly different.

How do you test to see if the two std. devs. are different?
How do you compare two sets of data whose std. devs. are significantly different?

The F test: comparison of precision

The F test is used to compare the precision of two sets of data.
The sets do not necessarily have to be obtained from the same sample, as long as the samples are sufficiently alike that the sources of random error can be assumed to be the same.
The F test is designed to indicate whether there is a significant difference between two methods based on their standard deviations. F is defined in terms of the variances of the two methods:
F = s1²/s2² = V1/V2
where s1² > s2². There are two different degrees of freedom, one for the numerator and one for the denominator. If the calculated F value exceeds the tabulated F value at the selected confidence level, then there is a significant difference between the variances of the two methods.

The F test: comparison of precision

The F test provides insight into either of two questions:
1. Is method A more precise than method B?
2. Is there a difference between the precisions of the two methods?

For the first of these applications, the variance of the supposedly more precise procedure is always placed in the denominator.
For the second of these applications, the larger variance appears in the numerator.

Critical Values of F at the 5% Probability Level (95% confidence level)

Degrees of Freedom         Degrees of Freedom (Numerator)
(Denominator)    2      3      4      5      6      10     12     20     ∞
2                19.00  19.16  19.25  19.30  19.33  19.40  19.41  19.45  19.50
3                9.55   9.28   9.12   9.01   8.94   8.79   8.74   8.66   8.53
4                6.94   6.59   6.39   6.26   6.16   5.96   5.91   5.80   5.63
5                5.79   5.41   5.19   5.05   4.95   4.74   4.68   4.56   4.36
6                5.14   4.76   4.53   4.39   4.28   4.06   4.00   3.87   3.67
10               4.10   3.71   3.48   3.33   3.22   2.98   2.91   2.77   2.54
12               3.89   3.49   3.26   3.11   3.00   2.75   2.69   2.54   2.30
20               3.49   3.10   2.87   2.71   2.60   2.35   2.28   2.12   1.84
∞                3.00   2.60   2.37   2.21   2.10   1.83   1.75   1.57   1.00

F-test to compare standard deviations

From the previous example, let s1 = 0.12 and s2 = 0.073:

Fcalc = s1²/s2² = (0.12)²/(0.073)² = 2.70

Note: keep 2 or 3 decimal places to compare with Ftable.
Compare Fcalc to Ftable at df = (n1 − 1, n2 − 1) = (3, 3) and the 95% CL:
Ftable(df=3,3; 95% CL) = 9.28
If Fcalc < Ftable, the std. devs. are not significantly different at the 95% CL.
If Fcalc ≥ Ftable, the std. devs. are significantly different at the 95% CL.
Since Fcalc (2.70) < Ftable (9.28), the std. devs. of the two sets of data are not significantly different at the 95% CL. (The precisions are similar.)
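
The same F-test in Python (illustrative; scipy supplies the critical value that the table above tabulates):

```python
# F-test on the nickel std. devs. (larger variance in the numerator).
from scipy import stats

s1, s2 = 0.12, 0.073
f_calc = s1**2 / s2**2                     # 2.70
f_table = stats.f.ppf(0.95, dfn=3, dfd=3)  # 9.28 at the 95% CL
print(f_calc < f_table)  # True -> precisions not significantly different
```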

Comparing replicate measurements or comparing means of two sets of data: revisited

The use of the t-test for comparing means was justified for the previous example because we showed that the standard deviations of the two sets of data were not significantly different.
If the F-test shows that the std. devs. of two sets of data are significantly different and you need to compare the means, use a different version of the t-test.

Comparing replicate measurements or comparing means from two sets of data when std. devs. are significantly different

tcalc = (x̄1 − x̄2)/√(s1²/n1 + s2²/n2)

df = (s1²/n1 + s2²/n2)² / ((s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1))
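
scipy implements this version, including the degrees-of-freedom formula above, via equal_var=False; an illustrative call (the nickel data are reused here purely for shape, since they actually passed the F-test):

```python
# Welch's t-test: unequal variances allowed.
import numpy as np
from scipy import stats

set1 = np.array([3.91, 4.02, 3.86, 3.99])
set2 = np.array([3.52, 3.77, 3.49, 3.59])

t_stat, p_value = stats.ttest_ind(set1, set2, equal_var=False)
print(t_stat, p_value)
```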

Flowchart for comparing means of two sets of data or replicate measurements

Use the F-test to see if the std. devs. of the two sets of data are significantly different or not:
If the std. devs. are significantly different, use the 2nd version of the t-test (the beastly version).
If the std. devs. are not significantly different, use the 1st version of the t-test (see the previous, fully worked-out example).

One last comment on the F-test

Note that the F-test can be used simply to test whether or not two sets of data have statistically similar precisions. It can be used to answer a question such as: do method one and method two provide similar precisions for the analysis of the same analyte?

Errors in Hypothesis Testing

The choice of a rejection region for the null hypothesis is made so that we can readily understand the errors involved.

Type I error:
The error that results from rejecting H0 when it is true: an unusual result occurred that put our test statistic z or t into the rejection region.
Ex: at the 95% confidence level, there is a 5% chance that we will reject the null hypothesis even though it is true.
The significance level α gives the frequency of rejecting H0 when it is true.

Type II error:
We accept H0 when it is false.
The probability of a type II error is given the symbol β.

Errors in Hypothesis Testing contd

Making α smaller (0.01 instead of 0.05) would appear to minimize the type I error rate. Decreasing the type I error rate, however, increases the type II error rate because they are inversely related.
If a type I error is much more likely to have serious consequences than a type II error, it is reasonable to choose a small value of α.
On the other hand, in some situations a type II error would be quite serious, and so a larger value of α is employed to keep the type II error rate under control.
As a general rule of thumb, the largest α that is tolerable for the situation should be used. This ensures the smallest type II error while keeping the type I error within acceptable limits.
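
A small simulation makes the trade-off concrete (an illustrative sketch; the bias of 1σ and n = 5 are arbitrary choices, and a z-test with known σ is used for simplicity):

```python
# Simulated type I / type II error rates for a two-tailed z-test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, sigma, bias, trials = 5, 1.0, 1.0, 20_000

for alpha in (0.05, 0.01):
    z_crit = stats.norm.ppf(1 - alpha / 2)
    # z statistics for samples with no bias (H0 true) and with a real bias
    z_h0 = rng.normal(0.0, sigma, (trials, n)).mean(axis=1) * np.sqrt(n) / sigma
    z_h1 = rng.normal(bias, sigma, (trials, n)).mean(axis=1) * np.sqrt(n) / sigma
    type1 = np.mean(np.abs(z_h0) > z_crit)   # ~alpha by construction
    type2 = np.mean(np.abs(z_h1) <= z_crit)  # grows as alpha shrinks
    print(f"alpha={alpha}: type I ~{type1:.3f}, type II ~{type2:.3f}")
```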

Next Class

Statistical Data Treatment and Evaluation contd

Q Test

Laboratory Management

Good Laboratory Practices


Laboratory Accreditation
Management Systems
Method Validation
Method Verification
Control Charts
Proficiency Testing
Etc.
