Statistics in Analytical Chemistry-Part 2: Instructor: Nguyen Thao Trang
Lecture 2
Statistics in Analytical Chemistry- Part 2
• Sampling
Hypothesis test
• Experimental results seldom agree exactly with those
predicted from a theoretical model:
– Scientists and engineers must frequently judge whether a numerical
difference is the result of random errors or of systematic errors. Certain
statistical tests are useful in sharpening these judgments.
– Tests of this kind use a null hypothesis, which assumes that the
numerical quantities being compared are not different.
Hypothesis test
• Comparison of an experimental mean with a known
value (true or predicted value).
– A large number of measurements or known σ.
– A small number of measurements or unknown σ.
Comparing an Experimental Mean with a Known Value
• Test procedure:
– Step 1: Formulation of an appropriate test statistic:
• z statistic: a large number of measurements or known σ.
• t statistic: small numbers of measurements with unknown σ.
• If not sure: use t statistic.
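The two test statistics named in Step 1 can be sketched as follows. This is a minimal illustration that assumes the standard definitions z = (x̄ − μ0)/(σ/√N) and t = (x̄ − μ0)/(s/√N); the replicate values, the known value, and σ = 0.2 are hypothetical and are not taken from the lecture.

```python
import math
import statistics


def z_statistic(mean, mu0, sigma, n):
    """z = (mean - mu0) / (sigma / sqrt(n)); for known sigma or a large n."""
    return (mean - mu0) / (sigma / math.sqrt(n))


def t_statistic(values, mu0):
    """t = (mean - mu0) / (s / sqrt(n)); s is estimated from the data."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return (mean - mu0) / (s / math.sqrt(n))


# Hypothetical replicate results and known value, for illustration only.
replicates = [5.2, 5.0, 5.4, 5.1, 5.3]
known_value = 5.0

t = t_statistic(replicates, known_value)
z = z_statistic(statistics.mean(replicates), known_value, sigma=0.2, n=len(replicates))

print(f"t = {t:.3f} with {len(replicates) - 1} degrees of freedom")
print(f"z = {z:.3f} (assuming sigma = 0.2 is known)")
```

The computed statistic is then compared with the critical value from the z or t table at the chosen confidence level.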
Comparing an Experimental Mean with a Known Value
– Obtain tcrit from Table 7.3 with (N1 + N2 − 2) degrees of freedom.
– Compare t with tcrit:
• If |t| < tcrit: null hypothesis is accepted → no significant difference between the means
• If |t| > tcrit: null hypothesis is rejected → significant difference between the means
Comparison of Two Experimental Means
• t test for differences in the means:
– Example: 2 barrels of wine were analyzed for their alcohol content to
determine whether they were from different sources. On the basis of
6 analyses, the average content of the 1st barrel was 12.61% ethanol. 4
analyses of the 2nd barrel gave a mean of 12.53% alcohol. The 10
analyses yielded a pooled standard deviation, s_pooled, of 0.070%. Do the data
indicate a difference between the wines?
– Null hypothesis H0: μ1 = μ2, and alternative hypothesis Ha: μ1 ≠ μ2.
– The test statistic t: $t = \frac{\bar{x}_1 - \bar{x}_2}{s_{pooled}\sqrt{\frac{N_1 + N_2}{N_1 N_2}}}$
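The wine example can be worked numerically as below. This is a sketch that assumes the usual pooled two-sample statistic, t = (x̄1 − x̄2) / (s_pooled · √((N1 + N2)/(N1·N2))). The critical value 2.31 (two-tailed, 95%, 8 degrees of freedom) is read from a standard t table and is an assumption, since the slide's table is not reproduced here.

```python
import math


def pooled_t(mean1, n1, mean2, n2, s_pooled):
    """Two-sample t statistic using a pooled standard deviation."""
    return (mean1 - mean2) / (s_pooled * math.sqrt((n1 + n2) / (n1 * n2)))


# Values stated in the example: 6 and 4 analyses, s_pooled = 0.070 %.
t_wine = pooled_t(12.61, 6, 12.53, 4, 0.070)
dof = 6 + 4 - 2

# Assumption: two-tailed 95 % critical value for 8 degrees of freedom,
# taken from a standard t table.
T_CRIT_95_8DOF = 2.31

significant = abs(t_wine) > T_CRIT_95_8DOF
print(f"t = {t_wine:.2f}, degrees of freedom = {dof}")
print("significant difference" if significant else "no significant difference at 95 %")
```

With these numbers t is about 1.77, which is below the assumed critical value, so the null hypothesis would be accepted at the 95% confidence level.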
Comparison of Two Experimental Means
• Paired data:
– Use of pairs of measurements on the same sample to minimize
sources of variability that are not of interest.
– The paired t test uses the same type of procedure as the normal t test
except that pairs of data are analyzed.
– Null hypothesis is H0: μd = △0, where △0 is a specific value of the
difference to be tested, often zero.
– Alternative hypothesis: μd ≠ △0 ; μd <△0 or μd >△0
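– The test statistic t: $t = \frac{\bar{d} - \Delta_0}{s_d / \sqrt{N}}$
• Where $\bar{d}$ is the average difference, $\bar{d} = \frac{\sum d_i}{N}$; d_i: difference in each data pair
• s_d is the standard deviation of the differences: $s_d = \sqrt{\frac{\sum_{i=1}^{N} d_i^2 - \frac{\left(\sum_{i=1}^{N} d_i\right)^2}{N}}{N - 1}}$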
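A short sketch of the paired t calculation follows, using the definitions of the average difference and of s_d given above. The six differences are hypothetical values chosen for illustration; they are not claimed to be the lecture's data.

```python
import math


def paired_t(differences, delta0=0.0):
    """Paired t statistic: (d_bar - delta0) / (s_d / sqrt(N))."""
    n = len(differences)
    d_bar = sum(differences) / n
    sum_sq = sum(d * d for d in differences)
    # s_d from the computational form: sqrt((sum d^2 - (sum d)^2 / N) / (N - 1))
    s_d = math.sqrt((sum_sq - sum(differences) ** 2 / n) / (n - 1))
    return (d_bar - delta0) / (s_d / math.sqrt(n))


# Hypothetical differences (method A minus method B) for six paired samples.
diffs = [16, 9, 25, 5, 22, 11]
t_paired = paired_t(diffs)

print(f"t = {t_paired:.3f} with {len(diffs) - 1} degrees of freedom")
```

Pairing removes sample-to-sample variability from the comparison, because only the within-pair differences enter the statistic.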
Comparison of Two Experimental Means
• Paired data:
– Example: A new automated procedure for determining glucose in
serum (Method A) is to be compared with the established method
(Method B). Both methods are performed on serum from the same 6
patients to eliminate patient-to-patient variability. Do the following
results confirm a difference in the two methods at the 95% confidence level?
– Since t > tcrit = 2.57 (at the 95% confidence level and 5 degrees of freedom) → reject the
null hypothesis and conclude that the 2 methods give different results.
Comparison of Precision: F test
• F test: can be used when
– Comparing the variances (or standard deviations) of two populations
under the provision that the populations follow the normal (Gaussian)
distribution.
– Comparing more than two means and in linear regression analysis.
Comparison of Precision: F test
• Critical values of F at the 0.05 significance level are shown:
– Two degrees of freedom: one associated with the numerator and the
other with the denominator.
– Can be used in either a one-tailed mode or a two-tailed mode.
Comparison of Precision: F test
• Example: A standard method for the determination of CO level in
gaseous mixtures is known from many hundreds of measurements to have
a standard deviation s of 0.21 ppm CO. A modification of the method
yields a value for s of 0.15 ppm CO for a pooled data set with 12 degrees
of freedom. A 2nd modification, also based on 12 degrees of freedom, has
an s of 0.12 ppm CO. Is either modification significantly more precise than
the original?
– Null hypothesis H0: $\sigma_{std}^2 = \sigma^2$ (where $\sigma_{std}^2$ is the variance of the
standard method and $\sigma^2$ is the variance of the modified method).
Alternative hypothesis is one-tailed, Ha: $\sigma^2 < \sigma_{std}^2$
– The variances of the modifications are placed in the denominator:
• Calculate test statistic F for 1st and 2nd modifications: $F_1 = s_{std}^2 / s_1^2$ and $F_2 = s_{std}^2 / s_2^2$
– F2 > Fcrit: reject the null hypothesis. The 2nd method does appear to
give better precision at the 95% confidence level.
• With Fcrit = 2.69: since F < 2.69, we must accept H0 and conclude that
the two methods give equivalent precision.
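The F values for this example can be checked with a few lines of arithmetic. This is a sketch under two assumptions: the one-tailed critical value 2.30 (numerator degrees of freedom treated as infinite because the standard method rests on many hundreds of measurements, denominator 12) comes from a standard F table, and the final ratio of the two modifications' variances may be the comparison that the Fcrit = 2.69 bullet refers to.

```python
# Standard deviations stated in the example (ppm CO).
s_std, s_mod1, s_mod2 = 0.21, 0.15, 0.12

# Variances of the modifications go in the denominator.
f1 = s_std**2 / s_mod1**2
f2 = s_std**2 / s_mod2**2

# Assumption: one-tailed F critical value at the 0.05 level for
# (infinite, 12) degrees of freedom, from a standard F table.
F_CRIT_INF_12 = 2.30

print(f"F1 = {f1:.2f} -> {'reject' if f1 > F_CRIT_INF_12 else 'accept'} H0")
print(f"F2 = {f2:.2f} -> {'reject' if f2 > F_CRIT_INF_12 else 'accept'} H0")

# Assumed reading of the last bullet: comparing the two modifications with
# each other (12 and 12 degrees of freedom, Fcrit = 2.69 as given on the slide).
f_mods = s_mod1**2 / s_mod2**2
F_CRIT_12_12 = 2.69
print(f"F(mod1/mod2) = {f_mods:.2f} -> {'reject' if f_mods > F_CRIT_12_12 else 'accept'} H0")
```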
Detection of gross errors: Q test
• Q test is used to decide whether a suspected result should be
retained or rejected:
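• Calculate Q: $Q = \frac{|x_q - x_n|}{w}$, where x_q is the suspected result, x_n is its nearest neighbor, and w is the spread of the entire data set.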
Detection of gross errors: Q test
• Example: The analysis of a calcite sample yielded CaO percentages of
55.95, 56.00, 56.04, 56.08, and 56.23. The last value appears anomalous;
should it be retained or rejected at the 95% confidence level?
– The difference between 56.23 and 56.08 is 0.15%. The spread (56.23 −
55.95) is 0.28%. Thus: Q = 0.15 / 0.28 = 0.54
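The Q calculation for the calcite data can be sketched as below, taking Q as the gap between the suspect value and its nearest neighbor divided by the spread of the whole set. The critical value 0.710 (5 observations, 95% confidence) is an assumption taken from a standard Q table, since the lecture's table is not reproduced here.

```python
def q_statistic(values, suspect="high"):
    """Q = |suspect - nearest neighbor| / spread, for the highest or lowest value."""
    ordered = sorted(values)
    spread = ordered[-1] - ordered[0]
    if suspect == "high":
        gap = ordered[-1] - ordered[-2]
    else:
        gap = ordered[1] - ordered[0]
    return gap / spread


cao_percent = [55.95, 56.00, 56.04, 56.08, 56.23]
q = q_statistic(cao_percent, suspect="high")

# Assumption: critical Q for N = 5 at 95 % confidence, from a standard table.
Q_CRIT_95_N5 = 0.710

print(f"Q = {q:.2f}")
print("reject 56.23" if q > Q_CRIT_95_N5 else "retain 56.23")
```

With the assumed critical value, Q is smaller than Qcrit, so the suspect result would be retained.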
Standardization and calibration
• Calibration:
– Determines the relationship between the analytical
response and the analyte concentration.
– Procedure:
• A series of external standards containing the analyte in
known concentrations is prepared.
• Calibration is accomplished by obtaining the response signal
(absorbance, peak height, peak area) as a function of the known
analyte concentration.
• A calibration curve is prepared by plotting the data or by fitting
them to a suitable mathematical equation.
The least-squares method
• Assumptions:
1. A linear relationship actually exists between
the measured response y and the standard
analyte concentration x, described by
equation y = mx + b → regression model.
The least-squares method
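• The least-squares method minimizes the sum of the squares of the
residuals, SSresid: $SS_{resid} = \sum_{i=1}^{N} [y_i - (b + m x_i)]^2$
Slope: $m = S_{xy} / S_{xx}$, where $S_{xx} = \sum (x_i - \bar{x})^2$ and $S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})$
Intercept: $b = \bar{y} - m\bar{x}$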
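A minimal sketch of the least-squares slope and intercept follows, assuming the standard closed-form solution m = S_xy / S_xx and b = ȳ − m·x̄. The calibration data are hypothetical and serve only to show the arithmetic.

```python
def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line y = m*x + b."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    m = s_xy / s_xx
    b = y_bar - m * x_bar
    return m, b


# Hypothetical standards: concentration (x) and instrument response (y).
conc = [0.0, 1.0, 2.0, 3.0, 4.0]
signal = [0.1, 2.1, 3.9, 6.1, 7.9]

slope, intercept = fit_line(conc, signal)
print(f"slope m = {slope:.3f}, intercept b = {intercept:.3f}")
```

An unknown concentration is then obtained from a measured response y as x = (y − b)/m.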
The least-squares method
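• Total sum of the squares, SStot, is defined as: $SS_{tot} = S_{yy} = \sum (y_i - \bar{y})^2$
• Coefficient of determination: $R^2 = 1 - \frac{SS_{resid}}{SS_{tot}}$
– The closer R2 is to unity, the better the linear model explains the y
variations.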
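The coefficient of determination can be computed from the two sums of squares, assuming the usual definition R² = 1 − SSresid/SStot with SStot = Σ(y_i − ȳ)². The data and the fitted slope and intercept below are hypothetical.

```python
def r_squared(x, y, m, b):
    """R^2 = 1 - SS_resid / SS_tot for the line y = m*x + b."""
    y_bar = sum(y) / len(y)
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    ss_resid = sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))
    return 1.0 - ss_resid / ss_tot


# Hypothetical calibration data with a fitted line y = 1.96*x + 0.10.
conc = [0.0, 1.0, 2.0, 3.0, 4.0]
signal = [0.1, 2.1, 3.9, 6.1, 7.9]

r2 = r_squared(conc, signal, m=1.96, b=0.10)
print(f"R^2 = {r2:.4f}")
```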
Transformed variables
• The least-squares method can be applied to nonlinear models by
converting them into a simple linear model, as shown in Table
8.3:
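As one illustration of such a transformation, the sketch below linearizes an exponential model y = b·e^(mx) by taking logarithms, so that ln y = ln b + m·x can be fitted as a straight line. The choice of the exponential model is an assumption (Table 8.3 is not reproduced here), and the data are synthetic, generated from m = 0.5 and b = 2.

```python
import math


def fit_line(x, y):
    """Least-squares slope and intercept for y = m*x + b."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    m = s_xy / s_xx
    return m, y_bar - m * x_bar


# Synthetic data from y = 2 * exp(0.5 * x), for illustration only.
x = [0.0, 1.0, 2.0, 3.0]
y = [2.0 * math.exp(0.5 * xi) for xi in x]

# Transform y -> ln(y), fit a straight line, then transform the intercept back.
m, ln_b = fit_line(x, [math.log(yi) for yi in y])
b = math.exp(ln_b)

print(f"m = {m:.3f}, b = {b:.3f}")
```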
Using Excel
• Calculation of slope and intercept:
Using Excel
• Plotting a graph and the least-squares fit
Create a chart using the built-in
Chart Wizard of Excel
Sampling
• Sampling is one of the most important operations in a
chemical analysis.
• Chemical analyses use only a small fraction of the available
sample. The fractions that are collected for
analysis must be representative of the bulk material.
Sampling
• Sampling is the process by which a sample population is
reduced in size to an amount of homogeneous material that can
be conveniently handled in the lab and whose composition is
representative of the population (unbiased estimate of
population mean).
• Ex. Determining the average lead concentration in 100 coins →
Population: 100 coins
- Each coin is a sampling unit or an increment
- Gross sample: 5 coins → the collection of individual sampling
units or increments
- Lab sample: the gross sample is reduced in size and made
homogeneous
Sampling: uncertainties
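• Total error s_o combines the sampling error and the method error. For random, independent uncertainties the variances add: $s_o^2 = s_{samp}^2 + s_m^2$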
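The way the two contributions combine can be sketched numerically. This assumes the sampling and method uncertainties are random and independent, so that their variances add: s_o² = s_samp² + s_m². The numerical values are hypothetical.

```python
import math


def overall_std(s_samp, s_method):
    """Overall standard deviation when sampling and method variances add."""
    return math.sqrt(s_samp**2 + s_method**2)


# Hypothetical sampling standard deviation, with a shrinking method error.
s_samp = 0.30
for s_m in (0.30, 0.10, 0.03):
    s_o = overall_std(s_samp, s_m)
    print(f"s_m = {s_m:.2f} -> s_o = {s_o:.3f}")
```

Once the method standard deviation falls to about one third of the sampling standard deviation, further improvement of the method barely changes s_o.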
Sampling: gross sample
• To minimize sampling errors, the gross sample must be of
an appropriate size.
- Too small → may differ from the population;
- Too large → costly and time-consuming!
Assuming two types of particles A (containing fixed conc. of analyte) and B
(without analyte), p is probability of randomly drawing A. If we collect a
sample containing n particles, the expected number of particles containing
analyte, nA is:
$n_A = np$
The standard deviation for the sampling:
$s_{samp} = \sqrt{np(1 - p)}$
The relative standard deviation for the sampling:
$s_{samp}^{rel} = \frac{\sqrt{np(1 - p)}}{np}$
→ $n = \frac{1 - p}{p} \times \frac{1}{(s_{samp}^{rel})^2}$
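The last relation gives the number of particles needed for a target relative sampling standard deviation, n = (1 − p)/p × 1/(s_rel)². A short sketch follows; the values p = 0.2 and a 1% relative standard deviation are hypothetical.

```python
def particles_needed(p, rel_sd):
    """n = (1 - p) / p * 1 / rel_sd**2 for the two-particle (binomial) model."""
    return (1 - p) / p / rel_sd**2


# Hypothetical case: 20 % of particles contain analyte, target 1 % relative sd.
n = particles_needed(p=0.2, rel_sd=0.01)
print(f"particles required: {n:.0f}")
```

Because n scales with 1/(s_rel)², halving the target relative standard deviation quadruples the number of particles that must be collected.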
Sampling: laboratory sample
• How many laboratory samples should be taken?
• If the measurement uncertainty sm has been reduced to less
than 1/3 ssamp → ssamp will limit the analysis precision.
• If the sampling standard deviation ssamp is known:
$\mu = \bar{x} \pm \frac{t \, s_{samp}}{\sqrt{n_{samp}}}$
Sampling: laboratory sample
• The relative standard deviation $\sigma_r$ at a given confidence
level:
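The number of laboratory samples can be estimated by rearranging the confidence-interval expression μ = x̄ ± t·s_samp/√n for n. The sketch below assumes both the sampling standard deviation and the tolerable uncertainty are expressed relative to the mean, giving n = (t · s_samp,rel / σ_r)². The function name and all numerical values are hypothetical.

```python
import math


def samples_needed(t, s_samp_rel, sigma_r):
    """Smallest whole n satisfying n >= (t * s_samp_rel / sigma_r)**2."""
    return math.ceil((t * s_samp_rel / sigma_r) ** 2)


# Hypothetical case: 5 % relative sampling sd, 2 % tolerable relative uncertainty.
# t = 1.96 is the large-sample (z) value at 95 % confidence, used as a first guess.
n_lab = samples_needed(t=1.96, s_samp_rel=0.05, sigma_r=0.02)
print(f"laboratory samples required: {n_lab}")
```

Since t itself depends on the degrees of freedom, in practice the estimate is refined by repeating the calculation with the t value for n − 1 degrees of freedom until n stops changing.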