CH7 - Statistical Data Treatment and Evaluation
A type I error occurs when we reject the null hypothesis that two quantities are the same when they are in fact statistically identical.
A type II error occurs when we accept that the two quantities are the same when they are not statistically identical.
Statistical data treatment has several common applications, which are described in the sections that follow.
The size of the confidence interval, which is computed from the sample standard deviation, depends on how well the sample standard deviation s estimates the population standard deviation σ.
Finding the confidence interval when σ is known or s is a good estimate of σ
In each of a series of five normal error curves, the relative frequency is plotted as a function of the quantity z. The shaded areas in each plot lie between the values of −z and +z indicated to the left and right of the curves. The numbers within the shaded areas are the percentage of the total area under the curve that is included within these values of z.
The confidence level (CL) is the probability that the true mean lies within a certain
interval and is often expressed as a percentage.
The probability that a result is outside the confidence interval is often
called the significance level.
CI for μ = x̄ ± zσ/√N
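As a small sketch of this formula, a 95% confidence interval with a known σ can be computed as follows (the replicate values and σ are assumed for illustration):

```python
import math

# Hypothetical data: five replicate measurements (values assumed for illustration).
x = [3.15, 3.21, 3.18, 3.30, 3.25]
N = len(x)
mean = sum(x) / N

sigma = 0.05   # assumed known population standard deviation
z = 1.96       # z value for the 95% confidence level

# CI for mu = x_bar +/- z*sigma/sqrt(N)
half_width = z * sigma / math.sqrt(N)
print(f"95% CI: {mean:.3f} ± {half_width:.3f}")
```

Note that the half-width shrinks as √N grows: quadrupling the number of replicates halves the interval.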
Tests of this kind use a null hypothesis, which assumes that the
numerical quantities being compared are the same.
Other significance levels, such as 0.01 (1%) or 0.001 (0.1%), may also
be adopted, depending on the certainty desired in the judgment.
Comparing an Experimental Mean with a Known Value
In many cases the mean of a data set needs to be compared with a
known value.
This is called a two-tailed test since rejection can occur for results in
either tail of the distribution.
For the 95% confidence level, the probability that z exceeds zcrit is 0.025
in each tail or 0.05 total.
If instead our alternative hypothesis is Ha: μ > μ0, the test is said to be a
one-tailed test. In this case, we can reject H0 only when z ≥ zcrit.
Rejection regions for the 95% confidence level.
(a) Two-tailed test for Ha: μ ≠ μ0. Note the critical value of z is 1.96.
(b) One-tailed test for Ha: μ > μ0. The critical value of z is 1.64 so that 95% of the area is to the left of zcrit and 5% of the area is to the right.
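The one- and two-tailed decisions can be sketched in a few lines; the sample mean, known value, σ, and N below are hypothetical:

```python
import math

# Hypothetical example: does a measured mean differ from a known value mu0?
x_bar, mu0 = 3.26, 3.19   # sample mean and accepted value (assumed)
sigma, N = 0.08, 5        # known population std dev and number of replicates

z = (x_bar - mu0) / (sigma / math.sqrt(N))

two_tailed_reject = abs(z) > 1.96   # Ha: mu != mu0 at the 95% level
one_tailed_reject = z > 1.64        # Ha: mu > mu0 at the 95% level
print(f"z = {z:.2f}")
print(f"two-tailed reject: {two_tailed_reject}, one-tailed reject: {one_tailed_reject}")
```

With these assumed numbers z falls just below 1.96 but above 1.64, so the same data lead to rejection in the one-tailed test but not in the two-tailed test.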
In other cases, the results are used to determine whether two analytical
methods give the same values or whether two analysts using the same
methods obtain the same means.
Typically, both sets contain only a few results, and we must use the t test.
The alternative hypothesis is Ha: μ1 ≠ μ2, and the test is a two-tailed test.
To get a better estimate of σ than is given by s1 or s2 (both estimates of the
population standard deviation) alone, we use the pooled standard
deviation.
The standard deviation of the mean of analyst 1 is s_m1 = s1/√N1.
The variance of the mean of analyst 1 is s²_m1 = s1²/N1.
The variance of the mean of analyst 2 is s²_m2 = s2²/N2.
The variance of the difference between the means is s²_d = s²_m1 + s²_m2.
The standard deviation of the difference between the means is
s_d = √(s1²/N1 + s2²/N2)
A further assumption is that the pooled standard deviation s_pooled is a better estimate of σ than s1 or s2, so that
s_d = √(s²_pooled/N1 + s²_pooled/N2) = s_pooled √((N1 + N2)/(N1·N2))
The test statistic t is now found from
t = (x̄1 − x̄2) / [s_pooled √((N1 + N2)/(N1·N2))]
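A minimal sketch of the pooled t test, using hypothetical results from two analysts:

```python
import math

# Hypothetical results from two analysts (values assumed for illustration).
a1 = [14.2, 14.5, 14.3, 14.6]
a2 = [14.8, 14.9, 14.7, 15.0, 14.9]

def mean(xs):
    return sum(xs) / len(xs)

def ss(xs):
    # Sum of squared deviations from the mean.
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)

N1, N2 = len(a1), len(a2)
# Pooled standard deviation: combines both estimates of sigma,
# with N1 + N2 - 2 degrees of freedom.
s_pooled = math.sqrt((ss(a1) + ss(a2)) / (N1 + N2 - 2))
# t = (x1_bar - x2_bar) / (s_pooled * sqrt((N1 + N2)/(N1*N2)))
t = (mean(a1) - mean(a2)) / (s_pooled * math.sqrt((N1 + N2) / (N1 * N2)))
print(f"s_pooled = {s_pooled:.4f}, t = {t:.2f}")
```

The computed |t| would then be compared with the critical t for N1 + N2 − 2 = 7 degrees of freedom at the chosen confidence level.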
If there is good reason to believe that the standard deviations of the two
data sets differ, the two-sample t test must be used.
However, the significance level for this t test is only approximate, and
the number of degrees of freedom is more difficult to calculate.
Paired Data
The paired t test uses the same type of procedure as the normal t test
except that we analyze pairs of data and compute the differences, d.
The test statistic is t = d̄√N / s_d, where d̄ = Σdᵢ/N is the average difference and s_d is the standard deviation of the differences.
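A short sketch of the paired t test, with hypothetical paired results from two methods applied to the same six samples:

```python
import math

# Hypothetical paired results: two methods applied to the same six samples.
method_A = [10.2, 9.8, 11.4, 10.0, 10.8, 11.1]
method_B = [10.5, 10.1, 11.5, 10.4, 11.0, 11.3]

# Work with the per-sample differences, not the raw values.
d = [a - b for a, b in zip(method_A, method_B)]
N = len(d)
d_bar = sum(d) / N                                    # mean difference
s_d = math.sqrt(sum((x - d_bar) ** 2 for x in d) / (N - 1))

# Paired t statistic: t = d_bar * sqrt(N) / s_d, with N - 1 df.
t = d_bar * math.sqrt(N) / s_d
print(f"d_bar = {d_bar:.3f}, t = {t:.2f}")
```

Pairing removes the sample-to-sample variation, so |t| here tests only whether the two methods differ systematically.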
The F test is also used in comparing more than two means and in linear
regression analysis.
The F test is based on the null hypothesis that the two population
variances under consideration are equal, H0: σ1² = σ2².
The test statistic F, which is defined as the ratio of the two sample
variances (F = s1²/s2²), is calculated and compared with the critical value
of F at the desired significance level.
The null hypothesis is rejected if the test statistic differs too much from
unity.
The F test can be used in either a one-tailed mode or in a two-tailed
mode.
For a one-tailed test we test the alternative hypothesis that one variance
is greater than the other.
For the two-tailed mode, the larger variance always appears in the
numerator. This arbitrary placement makes the outcome of the test less
certain; thus, the uncertainty level of the F values doubles from 5% to 10%.
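A minimal illustration of the F ratio; the two variances and the critical value below are assumed for illustration:

```python
# Hypothetical variance comparison for two methods (values assumed).
s1_sq = 0.0225   # variance of method 1 (the larger variance)
s2_sq = 0.0081   # variance of method 2

# The larger variance goes in the numerator, so F >= 1 by construction.
F = s1_sq / s2_sq

F_crit = 4.53   # assumed critical value; look it up for the actual df and level
reject = F > F_crit
print(f"F = {F:.2f}, reject H0: {reject}")
```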
7C Analysis of variance
The methods used for multiple comparisons fall under the general
category of analysis of variance, or ANOVA.
The alternative hypothesis is Ha: at least two of the μᵢ's are different.
For the calcium example, there are five levels corresponding to analyst 1, analyst
2, analyst 3, analyst 4, and analyst 5.
The grand average x̿ is given by
x̿ = (N1/N)x̄1 + (N2/N)x̄2 + (N3/N)x̄3 + ... + (N_I/N)x̄_I
The grand average can also be found by summing all the data values and
dividing by the total number of measurements N.
To calculate the variance ratio needed in the F test, it is necessary to obtain
several other quantities called sums of squares:
1. The sum of the squares due to the factor SSF is
SSF = N1(x̄1 − x̿)² + N2(x̄2 − x̿)² + N3(x̄3 − x̿)² + ... + N_I(x̄_I − x̿)²
2. The total sum of the squares SST is obtained as the sum of SSF and SSE:
SST = SSF + SSE
The total sum of the squares SST has N - 1 degrees of freedom.
Just as SST is the sum of SSF and SSE, the total number of degrees of freedom N - 1
can be decomposed into degrees of freedom associated with SSF and SSE.
Since there are I groups being compared, SSF has I - 1 degrees of freedom.
This leaves N - I degrees of freedom for SSE. Or,
SST = SSF + SSE
(N - 1) = (I - 1) + (N - I)
Thus, if the null hypothesis is true, the two mean squares should be nearly
identical. If the factor effect is significant, MSF is greater than MSE.
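The sums of squares, mean squares, and F ratio can be sketched as follows; the three groups of data are hypothetical:

```python
# Hypothetical one-way ANOVA with I = 3 analysts, 3 replicates each.
groups = [
    [10.3, 10.5, 10.4],
    [10.7, 10.9, 10.8],
    [10.2, 10.3, 10.4],
]

N = sum(len(g) for g in groups)       # total number of measurements
I = len(groups)                       # number of groups (factor levels)
grand = sum(sum(g) for g in groups) / N
means = [sum(g) / len(g) for g in groups]

# SSF: between-group sum of squares; SSE: within-group sum of squares.
SSF = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
SSE = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
SST = SSF + SSE

MSF = SSF / (I - 1)   # mean square for the factor, I - 1 df
MSE = SSE / (N - I)   # mean square for error, N - I df
F = MSF / MSE
print(f"SSF={SSF:.4f}, SSE={SSE:.4f}, F={F:.2f}")
```

A large F (MSF much greater than MSE) indicates that the between-group spread is too big to be explained by the within-group scatter alone.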
The least significant difference (LSD) is given by
LSD = t √(2·MSE/N_g)
where MSE is the mean square for error and the value of t has N – I
degrees of freedom.
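A small numeric sketch of the LSD formula; the MSE, group size, and t value below are assumed:

```python
import math

# Hypothetical follow-up to an ANOVA (values assumed for illustration).
MSE = 0.01    # mean square for error from the ANOVA table
Ng = 3        # number of replicates per group (equal group sizes assumed)
t = 2.45      # approximate t value for N - I = 6 df at the 95% level

# LSD = t * sqrt(2 * MSE / Ng)
LSD = t * math.sqrt(2 * MSE / Ng)
print(f"LSD = {LSD:.3f}")
```

Any pair of group means that differ by more than the LSD would then be judged significantly different.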
7D Detection of gross errors
An outlier is a result that is quite different from the others in the data set.
The choice of criterion for the rejection of a suspected result has its perils. If
the standard is too strict so that it is quite difficult to reject a questionable result,
there is a risk of retaining a spurious value that has an inordinate effect on the
mean.
If we set a lenient limit and make the rejection of a result easy, we are likely to
discard a value that rightfully belongs in the set, thus introducing bias to the data.
2. If possible, estimate the precision that can be reasonably expected from the
procedure to be sure that the outlying result actually is questionable.
3. Repeat the analysis if sufficient sample and time are available. Agreement
between the newly acquired data and those of the original set that appear to be
valid will lend weight to the notion that the outlying result should be rejected.
4. If more data cannot be secured, apply the Q test to the existing set to see if the
doubtful result should be retained or rejected on statistical grounds.
5. If the Q test indicates retention, consider reporting the median of the set
rather than the mean. The median has the great virtue of allowing inclusion of all
data in a set without undue influence from an outlying value. In addition, the
median of a normally distributed set containing three measurements provides a
better estimate of the correct value than the mean of the set after the outlying
value has been discarded.
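Steps 4 and 5 can be sketched as follows; the data set is hypothetical, and the critical Q value used (0.710 for N = 5 at the 95% confidence level) should be checked against a Q-test table:

```python
# Hypothetical Q-test sketch for a suspected outlier.
data = [55.95, 56.00, 56.04, 56.08, 56.23]
xs = sorted(data)

# Test the most extreme value; here the largest one is the suspect.
gap = xs[-1] - xs[-2]    # |suspect - nearest neighbor|
rng = xs[-1] - xs[0]     # spread of the whole set
Q = gap / rng

Q_crit = 0.710           # critical Q for N = 5 at the 95% level (check a table)
reject = Q > Q_crit
print(f"Q = {Q:.3f}, reject outlier: {reject}")

# If the Q test indicates retention, consider reporting the median,
# which is not unduly influenced by the outlying value.
median = xs[len(xs) // 2]
print(f"median = {median}")
```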