
Statistics in Analytical Chemistry-Part 2: Instructor: Nguyen Thao Trang

This document provides an overview of statistical analysis techniques used in analytical chemistry, including hypothesis testing, comparing experimental means to known values, and comparing two experimental means. Hypothesis tests involve forming a null hypothesis and alternative hypothesis to determine if experimental results differ significantly from predicted values. Statistical tests like the z-test and t-test are used to evaluate experimental data depending on factors like sample size and known/unknown variability. Paired data analysis also uses a t-test to minimize non-relevant sources of variability between measurements.

Uploaded by

Leo Pis

Analytical Chemistry

Lecture 2
Statistics in Analytical Chemistry- Part 2

Instructor: Nguyen Thao Trang


Outline
• Hypothesis test

• Detection of gross errors

• Standardization and calibration

• Sampling

Hypothesis test
• Experimental results seldom agree exactly with those
predicted from a theoretical model:
– Scientists and engineers frequently must judge whether a numerical
difference is the result of random errors or of systematic errors. Certain
statistical tests are useful in sharpening these judgments.
– Tests of this kind use a null hypothesis, which assumes that the
numerical quantities being compared are not different.

• Specific examples of hypothesis tests:


– Compare the mean of an experimental data set with what is believed to be the true value;
– Compare the mean to a predicted or cutoff (threshold) value;
– Compare the means or the standard deviations from two or more sets
of data.
Hypothesis test
• Comparison of an experimental mean with a known
value (true or predicted value).
– A large number of measurements or known σ.
– A small number of measurements or unknown σ.

• Comparison between two experimental means.


– t test for differences of the means.
– t test for paired data.

• Comparison of precision: F test

Comparing an Experimental Mean with a Known Value

• A statistical hypothesis test is used to draw conclusions about
the population mean μ and its closeness to the known value
μ0.

• A known value (μ0):
– The true or accepted value based on prior knowledge or experience.
– Predicted from theory.
– A threshold value for making decisions about the presence or absence
of a constituent.
Comparing an Experimental Mean with a Known Value

• Two contradictory hypotheses:
1. Null hypothesis H0: μ = μ0
2. Alternative hypothesis Ha, which takes one of the following forms:
– Ha: μ ≠ μ0 (two-tailed test)
– Ha: μ > μ0 or Ha: μ < μ0 (one-tailed test)
H0 is rejected in favor of Ha when the data differ significantly from μ0.

• Example: determining whether the concentration of lead in an
industrial wastewater discharge exceeds the maximum
permissible amount of 0.05 ppm:
– H0: μ = 0.05 ppm
– Ha: μ > 0.05 ppm
Comparing an Experimental Mean with a Known Value

• Test procedure:
– Step 1: Formulation of an appropriate test statistic:
• z statistic: a large number of measurements or known σ.
• t statistic: small numbers of measurements with unknown σ.
• If not sure: use t statistic.

– Step 2: Identification of a rejection region:
• The null hypothesis is rejected if the test statistic lies within the
rejection region.
Comparing an Experimental Mean with a Known Value

• A large number of measurements (or known σ) – z test
statistic:
– State the null hypothesis H0: μ = μ0
– Form the test statistic: $z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{N}}$
– State the alternative hypothesis, Ha, and determine the rejection
region:
• For Ha: μ ≠ μ0, reject H0 if z ≥ zcrit or if z ≤ –zcrit
• For Ha: μ > μ0, reject H0 if z ≥ zcrit
• For Ha: μ < μ0, reject H0 if z ≤ –zcrit
– zcrit: critical value of z listed in Table 7.1 (Chapter 2 - p. 37) at
different values of confidence level.
Comparing an Experimental Mean with a Known Value

• A large number of measurements (or known σ) – z test
statistic:
– For Ha: μ ≠ μ0, reject H0 if z ≥ zcrit or if z ≤ –zcrit → reject for either a
positive or a negative value of z whose magnitude exceeds the critical
value → two-tailed test
• At the 95% confidence level: zcrit = 1.96
Comparing an Experimental Mean with a Known Value

• A large number of measurements (or known σ) – z test
statistic:
– For Ha: μ > μ0, reject H0 if z ≥ zcrit → reject for a positive value of z
that exceeds the critical value → one-tailed test.
– For Ha: μ < μ0, reject H0 if z ≤ –zcrit → reject for a negative value of z
whose magnitude exceeds the critical value → one-tailed test.
• At the 95% confidence level: zcrit = 1.64
Comparing an Experimental Mean with a Known Value

• A large number of measurements (or known σ) – z test
statistic:
– Example: A class of 30 students determined the activation energy of a
chemical reaction to be 27.7 ± 5.2 kcal/mol. Are the data in agreement
with the literature value of 30.8 kcal/mol at (1) the 95% confidence
level and (2) the 99% confidence level?
• With N = 30, s should be a good estimate of σ. The null
hypothesis is H0: μ = 30.8 kcal/mol; the alternative hypothesis is
Ha: μ ≠ 30.8 kcal/mol.
• Calculate z: $z = \dfrac{27.7 - 30.8}{5.2/\sqrt{30}} = -3.26$
• Look up zcrit:
zcrit = 1.96 for the 95% confidence level
zcrit = 2.58 for the 99% confidence level
• Since z (= –3.26) ≤ –1.96, we reject the null hypothesis at the
95% confidence level. Since z ≤ –2.58 as well, we also reject it at the
99% confidence level.
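The activation-energy example can be checked numerically. The sketch below is only an illustration: the function name `z_statistic` and the dictionary `Z_CRIT` are made up here, and the critical values are simply the two-tailed values quoted on the slide. The unrounded statistic is about –3.265, which the slide reports as –3.26.

```python
import math

def z_statistic(mean, mu0, sigma, n):
    """Return z = (mean - mu0) / (sigma / sqrt(n))."""
    return (mean - mu0) / (sigma / math.sqrt(n))

# Values from the activation-energy example; s is used as an estimate of sigma.
z = z_statistic(mean=27.7, mu0=30.8, sigma=5.2, n=30)

# Two-tailed critical values quoted on the slide (hypothetical lookup table).
Z_CRIT = {"95%": 1.96, "99%": 2.58}

# Two-tailed rule: reject H0 when |z| >= zcrit.
decisions = {level: abs(z) >= crit for level, crit in Z_CRIT.items()}

print(f"z = {z:.3f}")
for level, reject in decisions.items():
    print(f"{level}: {'reject' if reject else 'retain'} H0")
```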
Comparing an Experimental Mean with a Known Value

• A small number of measurements (or unknown σ) – t test
statistic:
– State the null hypothesis H0: μ = μ0
– Form the test statistic: $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{N}}$
– State the alternative hypothesis, Ha, and determine the rejection
region:
• For Ha: μ ≠ μ0, reject H0 if t ≥ tcrit or if t ≤ –tcrit
• For Ha: μ > μ0, reject H0 if t ≥ tcrit
• For Ha: μ < μ0, reject H0 if t ≤ –tcrit
– tcrit: critical value of t listed in Table 7.3 (Chapter 3 - p. 44) at
different values of confidence level, with N – 1 degrees of freedom.
Comparing an Experimental Mean with a Known Value

• A small number of measurements (or unknown σ) – t test
statistic:
– Example: A new procedure for the rapid determination of the
percentage of sulfur in kerosenes was tested on a sample known from
its method of preparation to contain 0.123% (μ0 = 0.123%) S. The
results were % S = 0.112, 0.118, 0.115, and 0.119. Do the data indicate
that there is a bias in the method at the 95% confidence level?
• The null hypothesis is H0: μ = 0.123% S, and the alternative
hypothesis is Ha: μ ≠ 0.123% S.
• Calculate t with x̄ = 0.116% S and s = 0.0032% S: $t = \dfrac{0.116 - 0.123}{0.0032/\sqrt{4}} = -4.375$
• Look up Table 7.3: at the 95% confidence level and 3 degrees of
freedom, tcrit = 3.18
• Calculated t (–4.375) < –tcrit (–3.18) → a significant difference at the
95% confidence level and thus bias in the method.
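As a rough check of the sulfur example, the following sketch recomputes the mean, the sample standard deviation, and t from the four results using only the Python standard library. The variable names are illustrative, and tcrit is taken from the slide rather than computed. Because s is not rounded to 0.0032 here, the computed t comes out near –4.43 rather than –4.375; the conclusion is unchanged.

```python
import math
import statistics

results = [0.112, 0.118, 0.115, 0.119]  # % S, from the example
mu0 = 0.123                             # known value, % S
t_crit = 3.18                           # 95%, 3 degrees of freedom (slide value)

mean = statistics.mean(results)
s = statistics.stdev(results)           # sample standard deviation (N - 1 in the denominator)
t = (mean - mu0) / (s / math.sqrt(len(results)))

# Two-tailed rule: reject H0 when |t| >= tcrit.
reject_h0 = abs(t) >= t_crit

print(f"mean = {mean:.4f}, s = {s:.4f}, t = {t:.3f}, reject H0: {reject_h0}")
```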
Comparison of Two Experimental Means
• t test for differences in the means:
– Null hypothesis: the 2 means are identical and any difference is the
result of random errors: H0: μ1 = μ2
– Alternative hypothesis: Ha: μ1 ≠ μ2
– The test statistic t is calculated by: $t = \dfrac{\bar{x}_1 - \bar{x}_2}{s_{pooled}\sqrt{\dfrac{N_1 + N_2}{N_1 N_2}}}$
• x̄1 and x̄2 are the means of set 1 and set 2.
• spooled is the pooled estimate of σ (Chapter 2 - p. 30).
• N1 and N2 are the numbers of results in set 1 and set 2.
– Obtain tcrit from Table 7.3 with (N1 + N2 – 2) degrees of freedom.
– Compare |t| with tcrit:
• If |t| < tcrit: null hypothesis is accepted → no significant difference between the means
• If |t| > tcrit: null hypothesis is rejected → significant difference between the means
Comparison of Two Experimental Means
• t test for differences in the means:
– Example: 2 barrels of wine were analyzed for their alcohol content to
determine whether they were from different sources. On the basis of
6 analyses, the average content of the 1st barrel was 12.61% ethanol. 4
analyses of the 2nd barrel gave a mean of 12.53% alcohol. The 10
analyses yielded spooled of 0.070%. Do the data indicate a difference
between the wines?
– Null hypothesis H0: μ1 = μ2, and alternative hypothesis Ha: μ1 ≠ μ2.
– The test statistic t: $t = \dfrac{12.61 - 12.53}{0.070\sqrt{\dfrac{6 + 4}{6 \times 4}}} = 1.771$
– tcrit at the 95% confidence level (degrees of freedom: 10 – 2 = 8) = 2.31
– As 1.771 < 2.31 → null hypothesis is accepted: no significant difference in the
alcohol content between the 2 barrels.
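The wine example can be reproduced with a few lines of code. This is a minimal sketch, assuming spooled is already known (as in the example); the helper name `pooled_t` is hypothetical and tcrit is the slide value.

```python
import math

def pooled_t(mean1, mean2, s_pooled, n1, n2):
    """t statistic for the difference of two means using a pooled standard deviation."""
    return (mean1 - mean2) / (s_pooled * math.sqrt((n1 + n2) / (n1 * n2)))

# Data from the wine example.
t = pooled_t(mean1=12.61, mean2=12.53, s_pooled=0.070, n1=6, n2=4)
t_crit = 2.31  # 95% confidence level, 8 degrees of freedom (slide value)

# Two-tailed comparison on the magnitude of t.
significant = abs(t) > t_crit

print(f"t = {t:.3f}, significant difference: {significant}")
```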

Comparison of Two Experimental Means
• Paired data:
– Use of pairs of measurements on the same sample to minimize
sources of variability that are not of interest.
– The paired t test uses the same type of procedure as the normal t test
except that pairs of data are analyzed.
– Null hypothesis is H0: μd = △0, where △0 is a specific value of the
difference to be tested, often zero.
– Alternative hypothesis: μd ≠ △0 ; μd <△0 or μd >△0
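– The test statistic t: $t = \dfrac{\bar{d} - \Delta_0}{s_d/\sqrt{N}}$
• Where $\bar{d}$ is the average difference, $\bar{d} = \dfrac{\sum d_i}{N}$; di: difference in each data pair
• sd is the standard deviation of the differences: $s_d = \sqrt{\dfrac{\sum_{i=1}^{N} d_i^2 - \dfrac{\left(\sum_{i=1}^{N} d_i\right)^2}{N}}{N - 1}}$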
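A minimal sketch of the paired t statistic, written directly from the formulas for the mean difference and sd. The function name `paired_t` and the two data lists are hypothetical; they are made-up numbers for illustration and are not data from this lecture.

```python
import math

def paired_t(a, b, delta0=0.0):
    """Paired t statistic: t = (d_bar - delta0) / (s_d / sqrt(N))."""
    d = [x - y for x, y in zip(a, b)]          # difference in each data pair
    n = len(d)
    d_bar = sum(d) / n                          # average difference
    sum_d2 = sum(di ** 2 for di in d)
    # Standard deviation of the differences (computational form of the formula).
    s_d = math.sqrt((sum_d2 - sum(d) ** 2 / n) / (n - 1))
    return (d_bar - delta0) / (s_d / math.sqrt(n))

# Hypothetical paired readings on the same five samples (illustration only).
method_a = [10.2, 11.5, 9.8, 12.1, 10.9]
method_b = [10.0, 11.1, 9.9, 11.8, 10.5]

t = paired_t(method_a, method_b)
print(f"t = {t:.3f} with {len(method_a) - 1} degrees of freedom")
```

The resulting t is then compared with tcrit for N – 1 degrees of freedom, exactly as in the ordinary t test.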
Comparison of Two Experimental Means
• Paired data:
– Example: A new automated procedure for determining glucose in
serum (Method A) is to be compared with the established method
(Method B). Both methods are performed on serum from the same 6
patients to eliminate patient-to-patient variability. Do the following
results confirm a difference between the two methods at the 95% confidence level?

– Hypotheses: If μd is the true average difference between the 2 methods,
null hypothesis H0: μd = 0, alternative hypothesis Ha: μd ≠ 0.
– Test statistic t:

– Since t > tcrit = 2.57 (at the 95% confidence level and 5 degrees of freedom) → reject the
null hypothesis and conclude that the 2 methods give different results.
Comparison of Precision: F test
• F test: can be used when
– Comparing the variances (or standard deviations) of two populations,
provided that the populations follow the normal (Gaussian)
distribution.
– Comparing more than two means and in linear regression analysis.

• F test for comparison of the variances:
– Null hypothesis H0: $\sigma_1^2 = \sigma_2^2$;
– Alternative hypothesis Ha: $\sigma_1^2 \neq \sigma_2^2$ (two-tailed test) or $\sigma_1^2 > \sigma_2^2$ (one-tailed test).
– Calculate the test statistic F: $F = \dfrac{s_1^2}{s_2^2}$ (place the larger variance in the numerator).
– Compare F with Fcrit at the desired significance level.
Comparison of Precision: F test
• Critical values of F at the 0.05 significance level are shown:

– Two degrees of freedom: one associated with the numerator and the
other with the denominator.
– Can be used in either a one-tailed mode or a two-tailed mode.
Comparison of Precision: F test
• Example: A standard method for the determination of CO level in
gaseous mixtures is known from many hundreds of measurements to have
a standard deviation s of 0.21 ppm CO. A modification of the method
yields a value for s of 0.15 ppm CO for a pooled data set with 12 degrees
of freedom. A 2nd modification, also based on 12 degrees of freedom, has
an s of 0.12 ppm CO. Is either modification significantly more precise than
the original?
– Null hypothesis H0: $\sigma_{std}^2 = \sigma^2$ (where $\sigma_{std}^2$ is the variance of the
standard method and $\sigma^2$ is the variance of the modified method).
Alternative hypothesis is one-tailed, Ha: $\sigma^2 < \sigma_{std}^2$
– The variances of the modifications are placed in the denominator:
• Calculate the test statistic F for the 1st and 2nd modifications: $F_1 = \dfrac{s_{std}^2}{s_1^2} = \dfrac{(0.21)^2}{(0.15)^2} = 1.96$ and $F_2 = \dfrac{s_{std}^2}{s_2^2} = \dfrac{(0.21)^2}{(0.12)^2} = 3.06$
• sstd is a good estimate of σ, so the number of degrees of
freedom for the numerator can be taken as infinite; at the 95%
confidence level, Fcrit = 2.30.
Comparison of Precision: F test
• Example:
– F1 < Fcrit: accept the null hypothesis. There is no significant improvement in
precision.
– F2 > Fcrit: reject the null hypothesis. The 2nd modification does appear to
give better precision at the 95% confidence level.
– Comparison between the 2 modifications:
• Null hypothesis: $\sigma_1^2 = \sigma_2^2$;
• Calculate the test statistic F: $F = \dfrac{s_1^2}{s_2^2} = \dfrac{(0.15)^2}{(0.12)^2} = 1.56$
• With Fcrit = 2.69. Since F < 2.69, we must accept H0 and conclude that
the two modifications give equivalent precision.
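The three F ratios in the CO example follow directly from the standard deviations given in the problem. The sketch below recomputes them; the helper `f_statistic` is a hypothetical name, and both Fcrit values are copied from the slides rather than computed.

```python
def f_statistic(s_numerator, s_denominator):
    """F = s_numerator^2 / s_denominator^2 (ratio of variances)."""
    return s_numerator ** 2 / s_denominator ** 2

s_std, s_mod1, s_mod2 = 0.21, 0.15, 0.12   # ppm CO, from the example

f1 = f_statistic(s_std, s_mod1)    # standard method vs. 1st modification
f2 = f_statistic(s_std, s_mod2)    # standard method vs. 2nd modification
f12 = f_statistic(s_mod1, s_mod2)  # 1st vs. 2nd modification

F_CRIT_VS_STD = 2.30   # slide value: numerator df taken as infinite, denominator df = 12
F_CRIT_BETWEEN = 2.69  # slide value used for comparing the two modifications

print(f"F1 = {f1:.2f}, more precise than standard: {f1 > F_CRIT_VS_STD}")
print(f"F2 = {f2:.2f}, more precise than standard: {f2 > F_CRIT_VS_STD}")
print(f"F(1 vs 2) = {f12:.2f}, different precision: {f12 > F_CRIT_BETWEEN}")
```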

Detection of gross errors: Q test
• The Q test is used to decide whether a suspected result should be
retained or rejected:

• Calculate Q: $Q = \dfrac{|x_q - x_n|}{w}$
where xq is the questionable result, xn is its
nearest neighbor, and w is the spread of
the entire set.

• Compare Q with the critical values
Qcrit in Table 7-5:
if Q > Qcrit, the questionable result
can be rejected with the indicated
degree of confidence.
Detection of gross errors: Q test
• Example: The analysis of a calcite sample yielded CaO percentages of
55.95, 56.00, 56.04, 56.08, and 56.23. The last value appears anomalous;
should it be retained or rejected at the 95% confidence level?

– The difference between 56.23 and 56.08 is 0.15%. The spread (56.23 –
55.95) is 0.28%. Thus: $Q = \dfrac{0.15}{0.28} = 0.54$

– For 5 measurements, Qcrit at the 95% confidence level is 0.71. Because
0.54 < 0.71, we must retain the outlier at the 95% confidence level.
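The calcite example can be written as a small function. This is a sketch only: `q_statistic` is a hypothetical helper that assumes the suspect value is the smallest or largest result in the set, and Qcrit is the value quoted on the slide.

```python
def q_statistic(values, suspect):
    """Q = |x_q - x_n| / w for a suspect value at either end of the data set."""
    ordered = sorted(values)
    if suspect not in (ordered[0], ordered[-1]):
        raise ValueError("the suspect value must be the smallest or largest result")
    spread = ordered[-1] - ordered[0]                     # w: spread of the entire set
    nearest = ordered[1] if suspect == ordered[0] else ordered[-2]
    return abs(suspect - nearest) / spread

cao = [55.95, 56.00, 56.04, 56.08, 56.23]  # % CaO, from the example
q = q_statistic(cao, suspect=56.23)
q_crit = 0.71                              # 5 measurements, 95% confidence level (slide value)

reject_outlier = q > q_crit
print(f"Q = {q:.2f}, reject suspect value: {reject_outlier}")
```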

Standardization and calibration
• Calibration:
– Determines the relationship between the analytical
response and the analyte concentration.

– Usually accomplished by the use of chemical standards.

– Standards comparison methods:

• Direct comparison: compare a property of the analyte with a
standard such that the property being tested matches or nearly
matches that of the standard.

• Titration procedure: the analyte reacts with a standardized
reagent (the titrant) in a reaction of known stoichiometry.
External standard calibration
• External standards:
– Prepared separately from the sample.

– Used to calibrate instruments and procedures when there are no
interference effects from matrix components in the analyte solution.

– Procedure:
• A series of such external standards containing the analyte in
known concentrations is prepared.
• Calibration is accomplished by obtaining the response signal
(absorbance, peak height, peak area) as a function of the known
analyte concentration.
• A calibration curve is prepared by plotting the data or by fitting
them to a suitable mathematical equation.

The least-squares method
• Assumptions:
1. A linear relationship actually exists between
the measured response y and the standard
analyte concentration x, described by the
equation y = mx + b → regression model.

2. Any deviation of the individual points from
the straight line arises from error in the
measurement.

• The vertical deviation of
each point from the
straight line is called a
residual.
The least-squares method
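• The least-squares method finds the slope m and intercept b that minimize
the sum of the squares of the residuals, SSresid: $SS_{resid} = \sum_{i=1}^{N} [y_i - (b + m x_i)]^2$

$S_{xx} = \sum (x_i - \bar{x})^2$, $S_{yy} = \sum (y_i - \bar{y})^2$, $S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})$
where xi and yi are individual pairs of
data for x and y,
N is the number of data pairs, and
x̄ and ȳ are the average values of x and y.

Slope: $m = \dfrac{S_{xy}}{S_{xx}}$

Intercept: $b = \bar{y} - m\bar{x}$

Standard deviation about the regression: $s_r = \sqrt{\dfrac{S_{yy} - m^2 S_{xx}}{N - 2}}$

Standard deviation of the slope: $s_m = \sqrt{\dfrac{s_r^2}{S_{xx}}}$

Standard deviation of the intercept: $s_b = s_r \sqrt{\dfrac{\sum x_i^2}{N \sum x_i^2 - \left(\sum x_i\right)^2}}$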
The least-squares method
• Total sum of the squares, SStot, is defined as: $SS_{tot} = \sum (y_i - \bar{y})^2$

• Coefficient of determination (R2): measures the fraction of the
observed variation in y that is explained by the linear
relationship: $R^2 = 1 - \dfrac{SS_{resid}}{SS_{tot}}$

– The closer R2 is to unity, the better the linear model explains the y
variations.
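To make the least-squares quantities concrete, the sketch below fits a straight line to a small calibration set and reports the slope, intercept, their standard deviations, and R2. It assumes the usual textbook definitions of the sums of squares, and the calibration data are hypothetical numbers chosen for illustration, not data from this lecture.

```python
import math

def linear_fit(x, y):
    """Least-squares line y = m*x + b with standard deviations and R^2."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)          # equals SStot
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    m = s_xy / s_xx                                     # slope
    b = y_bar - m * x_bar                               # intercept
    ss_resid = sum((yi - (b + m * xi)) ** 2 for xi, yi in zip(x, y))
    s_r = math.sqrt(ss_resid / (n - 2))                 # std. dev. about the regression
    s_m = s_r / math.sqrt(s_xx)                         # std. dev. of the slope
    sum_x2 = sum(xi ** 2 for xi in x)
    s_b = s_r * math.sqrt(sum_x2 / (n * sum_x2 - sum(x) ** 2))  # std. dev. of the intercept
    r2 = 1 - ss_resid / s_yy                            # coefficient of determination
    return {"m": m, "b": b, "s_r": s_r, "s_m": s_m, "s_b": s_b, "r2": r2}

# Hypothetical calibration data: concentration (x) vs. instrument response (y).
conc = [0.0, 2.0, 4.0, 6.0, 8.0]
signal = [0.01, 0.21, 0.39, 0.62, 0.80]

fit = linear_fit(conc, signal)
for name, value in fit.items():
    print(f"{name} = {value:.4f}")
```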

Transformed variables
• The least-squares method can be applied to nonlinear models by
converting them into a simple linear model, as shown in Table
8.3:
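As one illustration of a transformed variable (Table 8.3 itself is not reproduced here, so the choice of model is an assumption), an exponential model y = b·e^(mx) becomes linear after taking logarithms: ln y = ln b + m·x. The sketch below fits a line to (x, ln y) for synthetic, noise-free data and recovers m and b; the helper `fit_line` is a hypothetical name.

```python
import math

def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line through (x, y)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar

# Synthetic data generated from y = 2.0 * exp(0.5 * x) (illustration only).
x = [0.0, 1.0, 2.0, 3.0]
y = [2.0 * math.exp(0.5 * xi) for xi in x]

# Transform y -> ln(y), fit a straight line, then transform the intercept back.
m, ln_b = fit_line(x, [math.log(yi) for yi in y])
b = math.exp(ln_b)

print(f"m = {m:.3f}, b = {b:.3f}")
```

With real, noisy data the transformation also changes how the errors are weighted, so the recovered parameters are estimates rather than exact values.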

Using Excel
• Calculation of slope and intercept:

Using Excel
• Plotting a graph and the least-squares fit:
– Create a chart using the built-in
Chart Wizard of Excel.
– Right-click on any data point
and then click on Add
Trendline.
Sampling
• Sampling is one of the most important operations in a
chemical analysis.
• Chemical analyses use only a small fraction of the available
sample. The fractions of the samples that are collected for
analysis must be representative of the bulk material.
Sampling

Sample size          Type of analysis
> 0.1 g              Macro
0.01–0.1 g           Semimicro
0.0001–0.01 g        Micro
< 0.0001 g           Ultramicro

Analyte level        Type of constituent
1%–100%              Major
0.01% (100 ppm)–1%   Minor
1 ppb–100 ppm        Trace
< 1 ppb              Ultratrace
Sampling
• Sampling is the process by which a sample population is
reduced in size to an amount of homogeneous material that can
be conveniently handled in the lab and whose composition is
representative of the population (unbiased estimate of
population mean).
• Example: determining the average lead concentration in 100 coins →
Population: 100 coins
- Each coin is a sampling unit or an increment
- Gross sample: 5 coins → the collection of individual sampling
units or increments
- Lab sample: the gross sample is reduced in size and made
homogeneous
Sampling: uncertainties
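• The overall variance is the sum of the sampling variance and the method variance:

$s_o^2 = s_{samp}^2 + s_m^2$, so $s_o = \sqrt{s_{samp}^2 + s_m^2}$

Note: When sm < ssamp/3, there is no point in trying to improve the
measurement precision, because the sampling error dominates so.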
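The one-third rule in the note can be seen numerically. In the sketch below (the function name and the value ssamp = 0.30 are hypothetical), the method error is set to exactly one third of the sampling error and then to zero; the overall standard deviation improves by only about 5%.

```python
import math

def overall_sd(s_samp, s_m):
    """Overall standard deviation: sqrt(s_samp^2 + s_m^2)."""
    return math.hypot(s_samp, s_m)

s_samp = 0.30                               # hypothetical sampling standard deviation

at_one_third = overall_sd(s_samp, s_samp / 3)  # method error = 1/3 of sampling error
perfect_method = overall_sd(s_samp, 0.0)       # method error removed entirely

# Fractional reduction in the overall standard deviation from a perfect method.
improvement = 1 - perfect_method / at_one_third

print(f"s_o with s_m = s_samp/3: {at_one_third:.4f}")
print(f"s_o with s_m = 0:        {perfect_method:.4f}")
print(f"improvement: {improvement:.1%}")
```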
• In designing a sampling plan, the following points should be
considered:
– the number of samples to be taken
– the size of the sample
– whether individual samples should be analyzed or a sample composed
of two or more increments (a composite) should be prepared
Sampling: gross sample
• To minimize sampling errors, the sample must be collected at
an appropriate size.
- Too small → different from the population;
- Too large → time-consuming and costly!
Assuming two types of particles A (containing fixed conc. of analyte) and B
(without analyte), p is probability of randomly drawing A. If we collect a
sample containing n particles, the expected number of particles containing
analyte, nA is:
𝒏𝑨 = 𝒏𝒑
The standard deviation for the sampling:
$s_{samp} = \sqrt{np(1 - p)}$
The relative standard deviation for the sampling:
$s_{samp}^{rel} = \dfrac{\sqrt{np(1 - p)}}{np}$

→ $n = \dfrac{1 - p}{p} \times \dfrac{1}{\left(s_{samp}^{rel}\right)^2}$
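The last relation gives the number of particles needed for a target relative sampling standard deviation. A minimal sketch, with hypothetical function names and illustrative values p = 0.5 and a 5% relative standard deviation; it also checks the result by substituting n back into the relative standard deviation formula.

```python
import math

def particles_needed(p, rel_sd):
    """n = (1 - p) / p * 1 / rel_sd^2 for a two-particle (A/B) population."""
    return (1 - p) / (p * rel_sd ** 2)

def relative_sampling_sd(n, p):
    """Relative standard deviation sqrt(n*p*(1 - p)) / (n*p)."""
    return math.sqrt(n * p * (1 - p)) / (n * p)

p = 0.5          # hypothetical probability of drawing a type A particle
target = 0.05    # hypothetical target relative standard deviation (5%)

n = particles_needed(p, target)
check = relative_sampling_sd(n, p)   # should reproduce the target

print(f"particles needed: about {n:.0f}")
print(f"relative sd at that n: {check:.3f}")
```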
Sampling: laboratory sample
• How many laboratory samples should be taken?
• If the measurement uncertainty sm has been reduced to less
than 1/3 ssamp → ssamp will limit the analysis precision.
• If the sampling standard deviation ssamp is known:
$\mu = \bar{X} \pm \dfrac{t\, s_{samp}}{\sqrt{n_{samp}}}$

• The number of samples will be determined by:
$n_{samp} = \dfrac{t^2 s_{samp}^2}{(\bar{X} - \mu)^2}$
Note: t depends on nsamp, so the equation is solved by iteration!
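Because t depends on nsamp, the equation has no closed-form solution. One simple way to handle this, shown below as a sketch rather than as the lecture's procedure, is to search for the smallest n that satisfies n ≥ t²s²/e², where e = |X̄ – μ| is the acceptable error and t is taken for n – 1 degrees of freedom. The table `T95` holds standard two-tailed 95% t values (the lecture's own table is not reproduced here); the function name and the example inputs are hypothetical.

```python
# Two-tailed 95% critical t values indexed by degrees of freedom (standard table values).
T95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 6: 2.447,
       7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228, 11: 2.201, 12: 2.179,
       13: 2.160, 14: 2.145, 15: 2.131, 16: 2.120, 17: 2.110, 18: 2.101,
       19: 2.093, 20: 2.086}

def samples_needed(s_samp, max_error):
    """Smallest n with n >= (t * s_samp / max_error)^2, using t for n - 1 degrees of freedom."""
    for n in range(2, max(T95) + 2):
        t = T95[n - 1]
        if n >= (t * s_samp / max_error) ** 2:
            return n
    raise ValueError("more samples are needed than this small t table covers")

# Hypothetical inputs: sampling standard deviation 0.5 and acceptable error 0.3
# (same units as the measurement).
n_samp = samples_needed(s_samp=0.5, max_error=0.3)
print(f"samples needed: {n_samp}")
```

A direct search avoids the oscillation that a naive fixed-point iteration can show when successive estimates of n alternate between two neighboring values.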
Sampling: laboratory sample
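• If the sampling standard deviation σs is known: confidence interval for μ = $\bar{x} \pm \dfrac{z \sigma_s}{\sqrt{N}}$

• If only the estimate ss of the sampling standard deviation is available: confidence interval for μ = $\bar{x} \pm \dfrac{t s_s}{\sqrt{N}}$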
Sampling: laboratory sample
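• The relative standard deviation σr that is acceptable at a given confidence
level: $\sigma_r = \dfrac{t s_s}{\bar{x} \sqrt{N}}$

• The number of samples N: $N = \dfrac{t^2 s_s^2}{\bar{x}^2 \sigma_r^2}$

• Note: t depends on N, so the equation is solved by iteration!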