Statistical Testing For Treatment of Data I - Module 2
Introduction
Let me begin my lecture by recalling some essential information you already know about
this topic. Let me ask you a few questions before I start my discussion.
Why are experiments necessary in subjects like chemistry? What activities did you carry out
when you performed your experiment? What did you obtain when you performed your
experiment? How did you manipulate the results of your experiment? What did you then do with
these manipulated results?
You are all aware that chemistry, being an experimental science, involves
laboratory activities designed to explain scientific theories. In your general chemistry laboratory,
you did not only measure the value of a scientific property but you also gathered data as part of
the experimentation. Perhaps the bigger challenge that you encountered was how to
manipulate or treat the data systematically in order to find the value of the property that was
being measured.
Statistics will teach you how to manipulate or treat experimental data
systematically. As a scientific study, statistics is used not only in treating experimental data but
also extensively in almost all fields of study. One common situation that every
one of you might have already encountered is finding the simple mean, or average, of a set of
values. This is a situation that we usually experience not only in the laboratory, particularly in the
treatment of data, but also in everyday activities.
This module covers the introductory concepts of statistics. The topics that I included in
this module are prerequisites for statistical testing and treatment of data. Some of these topics
might have already been discussed in high school mathematics. But because of their extensive
use in the statistical treatment of data in introductory analytical chemistry, it is necessary that we
review these topics.
Time allotment: 3 hours
Before we proceed, let us first check your prior knowledge and skills on the
topics included in this module. Please take the test below. For multiple-choice
questions, just encircle the letter you think is the correct answer. Otherwise,
provide the correct answer to the question. You have 30 minutes to do this.
1. This is the value that occurs with the highest frequency in a data set.
a. mean b. median
c. mode d. standard deviation
2. This is the value of the middle term in a data set that has been ranked in
increasing order.
a. mean b. median
c. mode d. standard deviation
Check your answer against the KEY found at the end of this module.
How did you perform in the test? If your score is in the range 8-10, you are equipped to succeed
in your study in this module. If you scored 6 or 7, you have the potential of doing far better. If
you scored 4 or 5, you probably need to focus more on the reading materials and the
mathematical procedure as we go along. But if your score is lower than 4, you need to review
your past lessons of the subject. I advise you to retake the test until you get a score of 4 or better.
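The central value of a data set can be expressed in three ways (mean, median, and mode), and
the use of each depends on how one wants to describe and interpret the data.
The mean is the sum of the individual values divided by the number of values:
$$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum x_i}{n}$$
where $x_1, x_2, x_3, \ldots, x_n$ are the individual values, $n$ is the number of values, and
$\sum x_i$ is the sum of the values of x.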
1.1.2 Median
Another central value, which is less commonly used, is the median. It is the middle
numerical value in a set of values arranged in order of magnitude.
Example 1-1
Find the median of the five values 20.4, 20.6, 20.1, 20.7, and 20.0.
Rules:
Example 1-2
Rearranging,
$$\text{median} = \frac{20.4 + 20.6}{2} = 20.5$$
1.1.3 Mode
The mode of a data set, which is not so commonly used in analytical chemistry, is the value
that is most frequently repeated in the data set.
Example 1-3
The mode of the data 20.2, 20.1, 20.0, 20.1, 20.4, 20.0, 20.1, and 20.7 is 20.1
because it appears most frequently in the data, three times.
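The three central values can be checked quickly with a short script. The sketch below uses Python's standard-library `statistics` module on the data of Examples 1-1 and 1-3; the variable names are illustrative and are not part of the module.

```python
import statistics

# Data taken from Example 1-1 (five values) and Example 1-3 (eight values)
example_1_1 = [20.4, 20.6, 20.1, 20.7, 20.0]
example_1_3 = [20.2, 20.1, 20.0, 20.1, 20.4, 20.0, 20.1, 20.7]

mean_1_1 = statistics.mean(example_1_1)      # sum of the values divided by n
median_1_1 = statistics.median(example_1_1)  # middle value after sorting
mode_1_3 = statistics.mode(example_1_3)      # most frequently repeated value

print(f"mean of Example 1-1   = {mean_1_1:.2f}")
print(f"median of Example 1-1 = {median_1_1}")
print(f"mode of Example 1-3   = {mode_1_3}")
```

Note that `statistics.median` sorts the values first and, for an even number of values, averages the two middle ones, which is the same procedure used in Example 1-2.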
Example 1-4
Calculate the mean and the median for each of the following sets of data:
Solution
$$\text{median} = \frac{6.37 + 6.41}{2} = 6.39$$
The terms precision and accuracy are often used when dealing with the uncertainties of
measured values. Precision is a measure of how closely individual measurements agree with one
another while accuracy refers to how closely individual measurements agree with the correct, or
“true,” value. The dart analogy in Figure 1-1 illustrates the difference between these two
concepts.
$$\text{relative error} = \frac{\text{error}}{\mu}$$
Parts per hundred (pph) or percent error is the relative error multiplied by 100.
Parts per thousand (ppt) error is the relative error multiplied by 1000, and so on.
pph and ppt used as expressions of relative error should not be confused with, or
used interchangeably with, percent concentration (mass %, volume %, and mole
%) and ppt concentration. Although the mathematical reasoning and the procedure for
calculating them may be similar in some respects, percent concentration and ppt
in that sense are expressions of concentration, not of error.
Example 1-5
Calculate the absolute error, percent error, and parts per thousand error for the
mean of the following data set.
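$$\%\ \text{error} = \frac{\text{error}}{\mu} \times 100 = \frac{0.05\ \text{mg}}{8.27\ \text{mg}} \times 100 = 0.6$$
$$\text{ppt error} = \frac{\text{error}}{\mu} \times 1000 = \frac{0.05\ \text{mg}}{8.27\ \text{mg}} \times 1000 = 6$$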
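The same arithmetic can be written as a small function. This is a minimal sketch that reuses only the two numbers shown in the solution above (an error of 0.05 mg and a true value of 8.27 mg); the function name `relative_error` and its `scale` parameter are assumptions made for illustration.

```python
def relative_error(error, true_value, scale=1):
    """Relative error, optionally scaled (100 for percent, 1000 for ppt)."""
    return error / true_value * scale

error_mg = 0.05        # absolute error from the solution of Example 1-5
true_value_mg = 8.27   # accepted (true) value, mu

percent_error = relative_error(error_mg, true_value_mg, 100)
ppt_error = relative_error(error_mg, true_value_mg, 1000)

print(f"% error   = {percent_error:.1f}")
print(f"ppt error = {ppt_error:.0f}")
```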
The average error is calculated like the average value or arithmetic mean except
that the individual errors rather than the individual values are used.
Precision, a term often mistakenly used in place of accuracy, refers to the
agreement between values in a set of data. The fact that the values of replicate
measurements all agree well does not necessarily mean that they are close to the true
value. There are several common ways to express the precision of data, as shown in the
following:
$$d = \frac{\sum |x_i - \bar{x}|}{n}$$
where $x_i$ = observation
n = number of observations
$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$
where $x_i$ = observation
n = number of observations
Example 1-6
A student in quantitative analysis obtained the following results for the determination of
isooctane in gasoline using gas chromatography.
Determination number    Percent isooctane
1                       3.83
2                       3.97
3                       3.94
4                       3.88
5                       3.94
6                       3.90
Solution:
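You can calculate the standard deviation of the given data set using the formula
$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$
The formula indicates that the mean is necessary, so you need to calculate the mean first.
$$\bar{x} = \frac{3.83 + 3.97 + 3.94 + 3.88 + 3.94 + 3.90}{6} = \frac{23.46}{6} = 3.91\%$$
$x_i$     $|x_i - \bar{x}|$     $(x_i - \bar{x})^2$
3.83      0.08                  0.0064
3.97      0.06                  0.0036
3.94      0.03                  0.0009
3.88      0.03                  0.0009
3.94      0.03                  0.0009
3.90      0.01                  0.0001
Sum       0.24                  0.0128
$$s = \sqrt{\frac{0.0128}{6 - 1}} = 0.051\%$$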
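The calculation in Example 1-6 can be reproduced step by step in code. The sketch below follows the two formulas given earlier (average deviation and standard deviation) on the six isooctane results; the variable names are illustrative.

```python
import math

# Percent isooctane results from Example 1-6
percent_isooctane = [3.83, 3.97, 3.94, 3.88, 3.94, 3.90]

n = len(percent_isooctane)
mean = sum(percent_isooctane) / n

# Average deviation: mean of the absolute deviations from the mean
average_deviation = sum(abs(x - mean) for x in percent_isooctane) / n

# Standard deviation: square root of the sum of squared deviations over (n - 1)
sum_of_squares = sum((x - mean) ** 2 for x in percent_isooctane)
std_dev = math.sqrt(sum_of_squares / (n - 1))

print(f"mean              = {mean:.2f} %")
print(f"average deviation = {average_deviation:.2f} %")
print(f"sum of squares    = {sum_of_squares:.4f}")
print(f"s                 = {std_dev:.3f} %")
```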
$$\text{relative average deviation} = \frac{d}{\bar{x}}$$
$$\text{relative standard deviation} = \frac{s}{\bar{x}}$$
Range
The range is the absolute difference between the largest and smallest values in
the data set.
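As an illustration of these relative measures and of the range, the sketch below applies them to the isooctane data of Example 1-6. Scaling the relative average deviation to parts per thousand and the relative standard deviation to percent is an assumption made here for readability; the definitions above give only the unscaled ratios.

```python
import statistics

# Percent isooctane results from Example 1-6
percent_isooctane = [3.83, 3.97, 3.94, 3.88, 3.94, 3.90]

n = len(percent_isooctane)
mean = statistics.mean(percent_isooctane)
s = statistics.stdev(percent_isooctane)  # sample standard deviation (n - 1)
d = sum(abs(x - mean) for x in percent_isooctane) / n

relative_average_deviation_ppt = d / mean * 1000   # assumed scaling: ppt
relative_standard_deviation_pct = s / mean * 100   # assumed scaling: percent
data_range = max(percent_isooctane) - min(percent_isooctane)

print(f"relative average deviation  = {relative_average_deviation_ppt:.1f} ppt")
print(f"relative standard deviation = {relative_standard_deviation_pct:.2f} %")
print(f"range                       = {data_range:.2f} %")
```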
The equation for computing a pooled standard deviation from several sets of data,
as given by Holler and Crouch (2014), takes the form
$$s_{pooled} = \sqrt{\frac{\sum_{i=1}^{N_1}(x_i - \bar{x}_1)^2 + \sum_{j=1}^{N_2}(x_j - \bar{x}_2)^2 + \sum_{k=1}^{N_3}(x_k - \bar{x}_3)^2 + \cdots}{N_1 + N_2 + N_3 + \cdots - N_t}}$$
where $N_1$ is the number of results in set 1, $N_2$ is the number in set 2, and so forth. The
term $N_t$ is the total number of data sets pooled.
Glucose levels are routinely monitored in patients suffering from diabetes. The glucose
concentrations in a patient with mildly elevated glucose were determined at different
months through a spectrophotometric analytical method. The patient was placed on a low
sugar diet to reduce the glucose levels. The frequency of monitoring varies every month
as shown below and the days of the month when monitoring is conducted are chosen
randomly. The following results were obtained during a study to determine the
effectiveness of the low-sugar diet. Calculate a pooled estimate of the standard deviation
for the method.
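Time      Glucose concentration, mg/L          Mean glucose, mg/L   Sum of squares of deviations from mean   Standard deviation
Month 4   799, 745, 750, 774, 777, 800, 758    771.9                2950.86                                  22.2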
Solution
For the first month, the sum of squares in the next-to-last column was calculated as follows:
$$\text{Sum of squares} = \cdots + (1100 - 1100.3)^2 = 1687.43$$
$$s_{pooled} = \sqrt{\frac{6907.89}{24 - 4}} = 18.58 \approx 19\ \text{mg/L}$$
Note that this pooled value is a better estimate of σ than any of the individual s values in
the last column. Note also that one degree of freedom is lost for each of the four sets.
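The pooling step can be sketched in code as follows. Only the numbers that appear in this example are used: the Month 4 readings, the total sum of squares (6907.89), the total number of results (24), and the number of data sets (4). The function names are hypothetical helpers, not part of any library.

```python
import math

def sum_of_squares(values):
    """Sum of squared deviations of the values from their own mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)

def pooled_std_dev(total_sum_of_squares, total_results, number_of_sets):
    """Pooled s: one degree of freedom is lost for each data set."""
    return math.sqrt(total_sum_of_squares / (total_results - number_of_sets))

# Month 4 glucose readings (mg/L) from the table in this example
month_4 = [799, 745, 750, 774, 777, 800, 758]
month_4_ss = sum_of_squares(month_4)

# Totals quoted in the solution: all four months, 24 results
s_pooled = pooled_std_dev(6907.89, 24, 4)

print(f"Month 4 sum of squares = {month_4_ss:.2f}")
print(f"s_pooled               = {s_pooled:.2f} mg/L")
```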
Another essential element of experimentation is how you report data that is consistent
with the rules set in various standards. Look closely at the following topics that I am going to
explain, and make sure you understand the rules and procedures that I emphasize.
A numerical result is worthless to users of the data unless they know something about its
quality. Therefore, it is always essential to indicate your best estimate of the reliability of your
data. According to Holler and Crouch (2014), a much less satisfactory but more common
indicator of the quality of the data is significant figure convention.
All digits of a measured quantity, including the uncertain one, are called
significant figures. A measured mass reported as 2.2 g has two significant figures,
whereas one reported as 4.8405 g has five significant figures. The greater the number of
significant figures, the greater the certainty implied for the measurement.
Example 1-8
A sample that has a mass of about 25 g is placed on a balance that has a precision of ±
0.001 g. How many significant figures should be reported for this measurement?
Answer: five, as in the measurement 24.995 g or 25.005 g, the uncertainty being in the
third decimal place
4. Zeros at the end of a number are significant if a decimal point is written in the
number.
0.0500 g (3 sf) 5.0 cm (2 sf)
5. Zeros at the end of a number without a decimal point may or may not be significant.
Exponential notation can be used to indicate whether end zeros are significant.
For example, a mass of 20,700 mg can be written to show three, four, or five
significant figures depending on how the measurement is obtained, such as the
accuracy of the instrument used:
$2.07 \times 10^{4}$ mg (3 sf)
$2.070 \times 10^{4}$ mg (4 sf)
$2.0700 \times 10^{4}$ mg (5 sf)
(The exponential term $10^{4}$ does not add to the number of significant figures.)
Example 1-9
How many significant figures are in each of the following measurements: (a) 3.549 g,
(b) $4.5 \times 10^{-3}$ m, (c) 0.00146 mL?
Answers: (a) four, (b) two, (c) three
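The counting rules can be expressed as a short function that works on the number as written (a string), since trailing zeros are lost once a value is converted to a float. This is a rough sketch: the function name is hypothetical, and treating trailing zeros in a number written without a decimal point as not significant is an assumption, because rule 5 says such zeros are ambiguous.

```python
def count_sig_figs(text):
    """Count significant figures in a number written as a string.

    Accepts plain decimals ("0.00146") and E-notation ("4.5e-3").
    Assumption: trailing zeros without a decimal point are NOT counted.
    """
    mantissa = text.lower().split("e")[0].lstrip("+-")
    digits = mantissa.replace(".", "").lstrip("0")  # leading zeros never count
    if "." not in mantissa:
        digits = digits.rstrip("0")                 # ambiguous end zeros dropped
    return len(digits)

for written in ["3.549", "4.5e-3", "0.00146", "0.0500", "5.0"]:
    print(written, "->", count_sig_figs(written))
```

Writing the exponential term as `e-3` is only a convenience for the parser; as noted above, the exponential term does not add to the number of significant figures.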
The purpose of dimensional analysis is to obtain the desired unit from a given unit. To
do so, the given quantity is multiplied by a conversion factor. A conversion factor is a
ratio of two equal quantities expressed in different units. The unit of the quantity in the
numerator is the desired unit, and the unit in the denominator is the same as the given unit.
Conversion tables found in textbooks and general references supply the conversion factors.
Here are some pointers that you need to follow in carrying out dimensional analysis.
What do you need on top? (This is the desired unit in a single-step conversion.)
What do you know? (This refers to the conversion factor to be used that can be obtained
from conversion tables.)
How do you get there? (This is the mathematical manipulation to show how the desired
unit is obtained out of the given unit. Often, the desired unit cannot
be obtained from the given unit by using only one conversion
factor. This calls for the use of more than one conversion factor to get the
desired unit.)
Note: Aside from conversion tables, you may also obtain a conversion factor from the
relationships between quantities given or cited in the situation or in the problem.
Example 1-10
An individual with a high cholesterol level has 232 mg cholesterol per 100.0 mL of
his blood. How many grams of cholesterol are in his blood if he has a total blood volume
of 5.2 L?
Solution
What are given: $\frac{232\ \text{mg cholesterol}}{100.0\ \text{mL}}$ ; 5.2 L of blood ; 1000 mg = 1 g
What is required: grams (g) of cholesterol
What I know: 1 L = 1000 mL, 1g = 1000mg
What do I need on top: grams (g)
What do I need at the bottom: none
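$$5.2\ \text{L blood} \times \frac{1000\ \text{mL}}{1\ \text{L}} \times \frac{232\ \text{mg}}{100.0\ \text{mL}} \times \frac{1\ \text{g}}{1000\ \text{mg}} = 12\ \text{g}$$
Note: In this example, only two (2) significant figures are included in the final
answer since only multiplication and division are involved, in which case the number of
significant figures in the final answer must agree with that of the quantity with the least
number of significant figures (here, 5.2 L).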
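The chain of conversion factors in Example 1-10 can be written out directly in code. This is only a sketch of the arithmetic; the variable names are illustrative, and the final formatting to two significant figures mirrors the note above.

```python
# Given quantities from Example 1-10
blood_volume_L = 5.2                  # total blood volume (2 significant figures)
cholesterol_mg_per_mL = 232 / 100.0   # 232 mg cholesterol per 100.0 mL of blood

# Conversion factors (exact numbers)
mL_per_L = 1000
g_per_mg = 1 / 1000

# L -> mL -> mg -> g: each factor cancels the unit that precedes it
grams_cholesterol = blood_volume_L * mL_per_L * cholesterol_mg_per_mL * g_per_mg

# Report with two significant figures, limited by 5.2 L
print(f"{grams_cholesterol:.2g} g of cholesterol")
```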
There are two kinds of numbers that are encountered in scientific work: exact
numbers are those whose values are known exactly, while inexact numbers are those
whose values have some uncertainty. Exact numbers have defined values or can result
from counting objects. Inexact numbers are obtained from measurements whose
uncertainties may be caused by the inherent limitations of the equipment and by human
differences.
Example 1-11
Indicate whether each number is exact or inexact: (a) the mass of a 32-oz can of coffee;
(b) the volume of blood in a capillary tube; (c) the number of inches in a mile; (d) the
average height of the students in the class; and (e) the number of pages in your book.
Answers: (a), (b), and (d) are inexact while (c) and (e) are exact.
Example 1-12
The volume of a van container used to deliver frozen fish is 35.00 m$^3$. What is the
volume in liters?
Example 1-13
The density of a certain substance is $1.945 \times 10^{3}$ kg/m$^3$. What is its density in g/mL?
Solution
What is given: density equal to $1.945 \times 10^{3}$ kg/m$^3$
What is required: density in g/mL
What you know: 1000 g = 1 kg; 1000 mL = 1 L; 1000 L = 1 m$^3$
What you need on top: grams (g)
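One way to chain the conversion factors listed under "What you know" is sketched below. The arrangement is an illustration worked out from those three relationships, not necessarily the layout used in the original solution. Because the conversion factors are exact numbers, the answer keeps the four significant figures of the given density.

```latex
$$1.945 \times 10^{3}\ \frac{\text{kg}}{\text{m}^3}
  \times \frac{1000\ \text{g}}{1\ \text{kg}}
  \times \frac{1\ \text{m}^3}{1000\ \text{L}}
  \times \frac{1\ \text{L}}{1000\ \text{mL}}
  = 1.945\ \frac{\text{g}}{\text{mL}}$$
```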
The rule is applied only once, to the first digit following the last retained digit. Under
no circumstances should the rounding off be done sequentially. For example, 9.1547
rounded to three significant figures is 9.15 because 4 is less than 5. You
should not round off the 7, making the number 9.155, and then round off the 5, making it
9.16.
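The difference between rounding once and rounding sequentially can be demonstrated with Python's `decimal` module. This is a sketch only: the helper `round_places` is hypothetical, and the use of `ROUND_HALF_UP` (a trailing 5 rounds up) is an assumption chosen to reproduce the 9.155 to 9.16 step described above.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_places(value, places):
    """Round to a fixed number of decimal places (assumed rule: half rounds up)."""
    quantum = Decimal(1).scaleb(-places)   # e.g. places=2 -> 0.01
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)

# Correct: apply the rule once, looking only at the digit after the last retained one
single_step = round_places("9.1547", 2)

# Incorrect: rounding 9.1547 -> 9.155 -> 9.16 in two steps
sequential = round_places(round_places("9.1547", 3), 2)

print("single step:", single_step)
print("sequential: ", sequential)
```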
Example 1-14
Perform the following operations. Report the answer with the correct number of
significant figures.
Solutions
(a) 14.6481 (4 decimal places)
+ 17.347 (3 decimal places)
+ 44.31 (2 decimal places, most uncertain)
76.3051 → 76.31
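The addition rule (keep as many decimal places as the term with the fewest) can be sketched as follows. The terms are passed as strings so that their written decimal places are preserved; the function name and the `ROUND_HALF_UP` rounding mode are assumptions for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

def add_with_decimal_places(terms):
    """Add measured values and round to the fewest decimal places among them."""
    values = [Decimal(t) for t in terms]
    # as_tuple().exponent is -4 for "14.6481", -3 for "17.347", -2 for "44.31"
    fewest_places = min(-v.as_tuple().exponent for v in values)
    total = sum(values, Decimal(0))
    quantum = Decimal(1).scaleb(-fewest_places)
    return total.quantize(quantum, rounding=ROUND_HALF_UP)

result = add_with_decimal_places(["14.6481", "17.347", "44.31"])
print(result)
```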
Example 1-15
Perform the following operations. Report the answer with the correct number of
significant figures.
Solutions
(c) 34.60
x 2.46287
85.215302 → 85.22
Example 1-16
Compute the answer to the following expression using the correct number of significant
figures.
$$\frac{18.1 \times 0.219}{2.7} + 12.045$$
Solution
The result of the multiplication/division should contain two significant figures (same as
the value 2.7). This rounded-off result is then added to 12.045, with the answer rounded
off according to the rules of addition.
$$\frac{18.1 \times 0.219}{2.7} + 12.045$$
= 1.468111… + 12.045 (Only the digits of the first addend up to the tenths place
are significant.)
1.468 . . . + 12.045 = 13.513 → 13.5
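The mixed calculation in Example 1-16 can be traced in code by keeping the unrounded quotient, recording where its last significant digit falls, and rounding only the final sum. This is a sketch under stated assumptions: the helper name is hypothetical, and `ROUND_HALF_UP` is assumed as the rounding mode.

```python
from decimal import Decimal, ROUND_HALF_UP

def decimal_places_of_last_sig_digit(value, sig_figs):
    """Decimal place of the last significant digit, e.g. 1.468 with 2 sf -> 1."""
    return sig_figs - 1 - value.adjusted()

# Multiplication/division: limited to 2 significant figures by 2.7
quotient = Decimal("18.1") * Decimal("0.219") / Decimal("2.7")
quotient_places = decimal_places_of_last_sig_digit(quotient, 2)

# Addition: the sum keeps the fewest decimal places of the two addends
addend = Decimal("12.045")
addend_places = -addend.as_tuple().exponent
places = min(quotient_places, addend_places)

total = quotient + addend
result = total.quantize(Decimal(1).scaleb(-places), rounding=ROUND_HALF_UP)
print(result)
```

Carrying the unrounded quotient through the addition, as the solution above does, avoids rounding twice.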
Exercise: What is the answer, with the correct number of significant figures, to the
following arithmetic expressions?
$$\frac{20.3 \times 0.1533}{104.228}$$
You can now assess whether you understand the mathematical and statistical
procedures that you just read. Work on the assignments given on the following page as
exercises to reinforce what you learned from my discussions.
9. (a) What is the length of the pencil in the following figures if the scale reads in
centimeters? How many significant figures are in the measurement? (b) An oven
thermometer with a circular scale reading degrees Fahrenheit is shown. What temperature
does the scale indicate? How many significant figures are in the measurement? (c) The
analytical balance shown can read up to 0.1 mg. Which number is uncertain in the
reading? What place value is this?
10. Indicate the number of significant figures in each of the following measured quantities:
(a) 0.0234 cm$^2$ (b) 5.500 mm (c) $5.404 \times 10^{2}$ km (d) 430.98 (e) 204.080
11. Carry out the following operations, and express the answer with the appropriate number of
significant figures:
(a) 340.55 – (3216.6/2.6) (b) $(5.03 \times 10^{-4})(3.8765)$
(c) (0.0045 x 20,000.0) + (2813 x 12) (d) 863 x [1255 – (3.45 x 108)]
12. Perform the following conversions:
(a) 0.076 L to mL (b) 1.55 kg/m$^3$ to g/L (c) 5.850 lb/ft$^3$ to g/mL
13. (a) The recommended adult dose of Elixophillin®, a drug used to treat asthma, is 6 mg/kg of
body mass. Calculate the dose in milligrams for a 150-lb person if 1 kg = 2.205 lb. (b) A
pound of coffee beans yields 50 cups of coffee (4 cups = 1 qt). How many millilitres of
coffee can be obtained from 1 g of coffee beans?
In the latter part of the module, you learned about the difference between exact and
inexact numbers. You also learned that exact numbers are not governed by the rules on counting
and operations involving significant figures because they have an infinite number of significant
figures. You also learned how to report measured data with the correct number of
significant figures based on the accuracy of the instrument. Further, you have now gained techniques
for performing dimensional analysis to convert a given unit into a desired unit.
Brown, T. L., et al. (2012). Chemistry the central science. 12th ed. Illinois: Pearson Education,
Inc.
Hargis, L.G. (1988). Analytical chemistry principles and techniques. New Jersey: Prentice-Hall,
Inc.
Holler, F. J. and Crouch, S. R. (2014). Skoog and West’s fundamentals of analytical chemistry.
9th ed. USA: Brooks/Cole CENGAGE Learning Inc.
Mann, P. S. (2011). Introductory statistics. New Jersey: John Wiley and Sons, Inc.
End of Module 1