1 - Graphing and Statistical Analysis
OBJECTIVE
In this experiment, you will learn how to statistically analyze data with Excel.
INTRODUCTION
Much of the work done in the laboratory involves measuring chemical properties, such as
the mass of a sample. In many of the laboratory investigations you will do this year, a
primary purpose will be to find the mathematical relationship between two variables. For
example, you might want to know how the absorbance of a compound varies with
concentration. We may also want to propose or test a theoretical relationship between
controlled and measured data. Because these are all measured quantities, each carries some
degree of experimental error. We will discuss the sources of error for each lab, as this is
part of understanding the particular measurement technique. We will want to test the data
to see whether the error is systematic or random, and we need tests to check whether our
mathematical relationships are valid. Excel has many statistical tools to help us make
those determinations.
A single physical property such as the mass of a sample can be measured multiple times to
estimate the “true” value. Measurements have two properties: accuracy and precision.
Accuracy is how close the measured value is to the “true” value. Precision is a measure of
the reproducibility of the measurement. You have probably heard of surveys where they
quote a number (how many people intend to vote for a particular candidate for President,
for example) and a measure of how well that number is known (the error). Accuracy is
harder to assess, but precision can be easily calculated.
The mean or average gives a systematic best estimate of the true value from a series of
measurements. We normally use the arithmetic mean:
$$\bar{x} = \frac{\sum_i x_i}{n}$$
There are other types of means, but this will be the one we use most often. The function
for this in Excel is AVERAGE(). The second feature we use is the standard deviation. There
are a variety of ways to calculate standard deviation, depending on the nature of the data
set. The one we will use is:
$$s = \sqrt{\frac{\sum_i (x_i - \bar{x})^2}{n-1}}$$
The Excel function for this is STDEV.S. The key is the (n-1) portion, as this designates
that this is the standard deviation of a sample of a population. The population is all the
possible measurements of a property (which is infinite in many cases). The sample is the
set of measurements we actually make. The variance of the data is the square of the
standard deviation, s². The coefficient of variation (CV), or relative standard deviation
(RSD), is
$$\mathrm{CV} = \mathrm{RSD} = \frac{100 \times s}{\bar{x}}$$
Variance and RSD will be important later when we discuss the propagation of error.
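As a cross-check on the Excel functions, the same quantities can be sketched in Python with the standard library. The five masses below are hypothetical values chosen only for illustration; they are not data from this handout.

```python
import statistics

# Hypothetical replicate masses in grams (illustrative only, not handout data)
masses = [10.01, 10.03, 9.98, 10.02, 10.00]

mean = statistics.mean(masses)          # same role as Excel AVERAGE()
s = statistics.stdev(masses)            # sample standard deviation, divides by n - 1 (STDEV.S)
variance = statistics.variance(masses)  # sample variance, equal to s squared
rsd = 100 * s / mean                    # CV = RSD, expressed in percent

print(f"mean     = {mean:.3f} g")
print(f"s        = {s:.4f} g")
print(f"variance = {variance:.2e} g^2")
print(f"RSD      = {rsd:.2f} %")
```

Note that `statistics.pstdev` would divide by n instead of n - 1, which corresponds to the population standard deviation rather than the sample value used here.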
Standard deviation gives the spread of values, but it does not tell us anything about the
distribution of the measurements. We can determine this, but we need a lot more
measurements.
If we consider the data from Miller and Miller for the concentration of nitrate, we have 50
replicates (repeated measurements). The mean is 0.500 µg/mL, the standard deviation is
0.0165 µg/mL. We can visually analyze this data via a histogram, which is a plot of the
frequency of a value occurring vs the value. You can do this by using the HISTOGRAM
function in the Data Analysis Toolpak. First you create a list of bins into which you sort
your data. In the case of the nitrate concentration data, there are values ranging from 0.47
µg/mL to 0.53 µg/mL. Start the HISTOGRAM function, enter the data you want to sort in
the Input Range and the range of bin values in the Bin Range. Choose the location where
you want the results to go (the Output Range) and check the Chart Output box.
This will generate the following graph:
[Figure: Histogram of the nitrate data. Horizontal axis: Nitrate Ion Concentration in
µg/mL (bins 0.46 to 0.53, plus “More”); vertical axis: Frequency.]
Note that the data is roughly symmetrical around the mean (but not perfectly
symmetrical).
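The binning step can be sketched in Python as follows. This assumes the usual behavior of the Excel Histogram tool, where a value is counted in the first bin whose boundary is greater than or equal to it, and anything above the last boundary goes to "More". The function name and the data list are hypothetical; the 50 nitrate values themselves are not reproduced in this handout.

```python
from bisect import bisect_left

def excel_style_histogram(data, bins):
    """Count values per bin: bin b holds values <= b and greater than the
    previous bin boundary; values above the last boundary go to 'More'."""
    edges = sorted(bins)
    counts = [0] * (len(edges) + 1)          # final slot is the "More" bin
    for x in data:
        counts[bisect_left(edges, x)] += 1   # index of first edge >= x
    return counts

# Bin boundaries matching the chart axis; the data values are hypothetical
bins = [0.46, 0.47, 0.48, 0.49, 0.50, 0.51, 0.52, 0.53]
data = [0.47, 0.48, 0.49, 0.49, 0.50, 0.50, 0.50, 0.51, 0.51, 0.52, 0.53, 0.54]

counts = excel_style_histogram(data, bins)
labels = [f"{b:.2f}" for b in bins] + ["More"]
for label, count in zip(labels, counts):
    print(f"{label:>5}  {'*' * count}")      # crude text version of the chart
```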
Gaussian Distributions
In theory, a measurement could have any value. The nitrate data is to two significant
figures because of the way it was measured (a standard balance, not an analytical one). A
continuous curve would better follow the population from which the sample was taken.
The formula for a normal or Gaussian distribution is:
$$y = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$$
where x is the measured value, y is the frequency with which it occurs, µ is the
population mean, and σ is the population standard deviation. The curve is symmetrical
around µ and gets wider as σ gets larger. The key things to remember about a Gaussian
distribution are:
Approximately 68% of the population values lie within ±σ of the mean
Approximately 95% of the population values lie within ±2σ of the mean
Approximately 99.7% of the population values lie within ±3σ of the mean
For our nitrate data 33 of the 50 results (66%) lie between 0.483 and 0.517, which is
decent agreement with a Gaussian model. Other types of data may obey other distributions,
but the Gaussian is most common.
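The 68%, 95%, and 99.7% figures can be checked numerically. The sketch below evaluates the Gaussian formula and uses the error function from Python's standard library to get the fraction of the population within ±kσ of the mean. The helper names are illustrative; the mean and standard deviation are the nitrate values quoted above, used here as stand-ins for µ and σ.

```python
import math

def gaussian(x, mu, sigma):
    """Height of the normal curve at x for mean mu and standard deviation sigma."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def fraction_within(k):
    """Fraction of a Gaussian population lying within +/- k sigma of the mean."""
    return math.erf(k / math.sqrt(2))

mu, sigma = 0.500, 0.0165   # nitrate mean and standard deviation in ug/mL

for k in (1, 2, 3):
    low, high = mu - k * sigma, mu + k * sigma
    print(f"+/-{k} sigma: {low:.4f} to {high:.4f} ug/mL, "
          f"{100 * fraction_within(k):.1f}% of the population")

print(f"curve height at the mean: {gaussian(mu, mu, sigma):.2f}")
```

The observed 33 of 50 results (66%) within one standard deviation can be compared directly with the first line of output.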
In reality, we’re not going to make 50 measurements, so how do we get a good analysis?
Normally, we make 5 measurements and use the mean of this sample as an estimate of the
true value, µ. The measure of the error of the sample mean is the standard error of the
mean (SEM), which can be calculated from the standard deviation:
$$\mathrm{SEM} = \frac{\sigma}{\sqrt{n}} \approx \frac{s}{\sqrt{n}}$$
The t-distribution allows us to estimate the range within which the true value lies at a
given confidence level, using the formula
$$\bar{x} \pm t_{n-1}\frac{s}{\sqrt{n}}$$
The t value can be looked up in standard tables for a given confidence level (typically
95% or 99%) and n−1 degrees of freedom.
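A minimal sketch of the confidence-interval calculation for five measurements is shown below. The measurements are hypothetical, and the t value (2.776 for 4 degrees of freedom at 95% confidence, two-tailed) is taken from a standard t table rather than computed.

```python
import math
import statistics

# Hypothetical set of five replicate measurements in ug/mL (illustrative only)
values = [0.51, 0.49, 0.50, 0.52, 0.48]

n = len(values)
mean = statistics.mean(values)
s = statistics.stdev(values)        # sample standard deviation (n - 1)
sem = s / math.sqrt(n)              # standard error of the mean

t_95 = 2.776                        # table value: 95% confidence, n - 1 = 4 degrees of freedom
half_width = t_95 * sem

print(f"mean = {mean:.3f}, s = {s:.4f}, SEM = {sem:.4f}")
print(f"95% confidence interval: {mean:.3f} +/- {half_width:.3f} ug/mL")
```

For a different number of measurements or a different confidence level, the t value has to be replaced with the matching table entry.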
General Method
1. Select data to be plotted. In this case select cells A2 through A6 and cells F2 through
F6. (hold the “Ctrl” key and click on each cell)
2. Click on the “Insert” tab at the top of the window.
3. Select a “Scatter” tab.
4. When the drop down menu appears, select the graph in the top left corner, “Scatter
with only Markers”.
5. After the chart appears it will add additional tool bar options and should bring you
right into Chart Tools.
6. Select the “Layout” tab.
7. Select the “Axis Titles” drop down menu.
8. Add both a horizontal and vertical axis label. (for the vertical, choose “Rotated
Title”)
9. Click on the chart where it says “Axis Title” and rename it “Concentration (M)”.
10. Change the vertical title to say “Absorbance”.
11. Click on the drop down menu “Chart Title”, and add an “Above Chart” title.
12. Change the chart name to the compound being shown.
13. Right click on the vertical scale.
14. Select “Format Axis”
15. Under “Axis Options”, change Minimum to “Fixed” and then set it just below the value of
your first point.
16. Right click on the horizontal scale.
17. Repeat steps 14 and 15.
18. Click on the “Legend” drop down menu, and then select “None”.
19. Select the “Trendline” drop down menu.
20. Select “Linear Trendline”.
21. Again select the “Trendline” drop down menu.
22. Select “More Trendline Options”.
23. Check the boxes at the bottom which show “Display Equation on chart”, and “Display
R-squared value on chart”.
24. (Optional), to get full page chart, Select the “Design” tab under “Chart Tools”. Click
on “Move Chart Location” on the right of the tool bars. Choose the “New sheet” radio
button and name the chart as appropriate.
1. Have your data arranged in two columns (they do not have to be adjacent).
2. Choose Data, Data Analysis, Regression
3. Click on the Input Y Range Button
4. Click and drag to choose the Y data
5. Press Enter
6. Click on the Input X Range Button
7. Click and drag to choose the X data
8. Check the Output Range radio button
9. Select a cell for the upper left corner of the ANOVA data
10. Check the Residual Plots and Line Fit Plots boxes
11. Press OK.
12. In the coefficients table of the output, the slope is labelled “X Variable 1” and the
y-intercept is labelled “Intercept”.
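The slope, intercept, and R-squared values that Excel reports come from an ordinary least-squares fit. The following is a sketch of that calculation in plain Python; the function name and the concentration and absorbance values are hypothetical and serve only to show the arithmetic.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept.
    Returns (slope, intercept, r_squared)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    r_squared = sxy ** 2 / (sxx * syy)   # square of the correlation coefficient
    return slope, intercept, r_squared

# Hypothetical calibration data (illustrative only, not from this handout)
concentration = [0.10, 0.20, 0.30, 0.40, 0.50]   # M
absorbance = [0.12, 0.23, 0.35, 0.46, 0.58]

slope, intercept, r_squared = linear_fit(concentration, absorbance)
print(f"slope = {slope:.4f}, intercept = {intercept:.4f}, R^2 = {r_squared:.5f}")
```

The Excel worksheet functions SLOPE(), INTERCEPT(), and RSQ() give the same three numbers without running the full Regression tool.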
Practice Problems
1. In Excel, name column A Temperature(K), and name column B Volume(L). Enter the
following data in those two columns:
Temperature(K) Volume(L)
200 16
220 18
240 19
260 21
280 23
Create a graph plotting Volume vs. Temperature. Calculate the slope and the intercept.
2. Construct a Beer’s Law plot including a line fit plot and regression plot for the following
data. Calculate the extinction coefficient, correlation coefficient and intercept.
Inverse relationships
1. The volume of 0.005 mol of a gas at 273 K was measured at seven different pressures.
n = 0.005, R = 0.0821, T = 273
Pressure   Volume      1/V
1.25       0.089653    11.15409
2.33       0.048097    20.79123
2.67       0.041972    23.82514
3.11       0.036034    27.75138
3.78       0.029647    33.72997
4.03       0.027808    35.96079
5.33       0.021026    47.56105
Plot pressure vs. volume and note the shape of the line that results. Now plot pressure vs
1/V, and note the shape of the line that results. The slope of the line should be nRT.
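The expected slope can be checked against the tabulated data with a short calculation. This sketch assumes the pressures are in atm and the volumes in L (consistent with R = 0.0821), and it fits a line forced through the origin, which is an illustrative simplification; an Excel trendline will also report a small intercept unless it is set to zero.

```python
# Values of n, R, T and the pressure/volume columns are taken from the table above.
n, R, T = 0.005, 0.0821, 273

pressure = [1.25, 2.33, 2.67, 3.11, 3.78, 4.03, 5.33]
volume = [0.089653, 0.048097, 0.041972, 0.036034, 0.029647, 0.027808, 0.021026]

inv_volume = [1 / v for v in volume]

# Least-squares slope of P vs 1/V for a line through the origin: sum(xy) / sum(x^2)
slope = (sum(p * iv for p, iv in zip(pressure, inv_volume))
         / sum(iv ** 2 for iv in inv_volume))

print(f"fitted slope = {slope:.5f}")
print(f"nRT          = {n * R * T:.5f}")
```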
Prepare a graph of the reciprocal of the NO2 concentration vs time. Calculate the slope.
Logarithmic relationships
Table 1 below shows experimentally determined Tm values for a particular DNA
molecule as a function of log[Na+]. Using this data, construct a plot of Tm vs log[Na+] and
determine the slope of the resultant linear fit.
Table 1.
Tm(K) log[Na+]
343.7 -1.301
346.5 -1.125
348.7 -0.9393
352.0 -0.6989
354.4 -0.5229
1/Tm            ln[DNA]
2.877 × 10⁻³    -9.459
2.868 × 10⁻³    -8.940
2.861 × 10⁻³    -8.236
2.851 × 10⁻³    -7.340
2.842 × 10⁻³    -6.502
slope = -1.15 × 10⁻⁵; y-intercept = 2.77 × 10⁻³; ΔH° = -1.72 × 10⁵ cal/mol; ΔS° = 4.78 ×
10² cal/(K·mol).
3. We can study the decomposition of C2H4O into CH4 and CO by graphing the natural log of
the concentration of C2H4O vs time in seconds.
Convert the time into seconds, and convert concentration into ln(concentration). Graph
ln(concentration) vs time and report the slope.
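The two conversions and the slope calculation can be sketched as follows. The times and concentrations are hypothetical placeholders (the experimental data are not reproduced here), and the times are assumed to be recorded in minutes.

```python
import math

# Hypothetical data (illustrative only): time in minutes, concentration in mol/L
time_min = [0, 10, 20, 30]
conc = [0.100, 0.080, 0.064, 0.0512]

time_s = [t * 60 for t in time_min]        # convert minutes to seconds
ln_conc = [math.log(c) for c in conc]      # natural log of concentration

# Least-squares slope of ln(concentration) vs time in seconds
t_mean = sum(time_s) / len(time_s)
y_mean = sum(ln_conc) / len(ln_conc)
slope = (sum((t - t_mean) * (y - y_mean) for t, y in zip(time_s, ln_conc))
         / sum((t - t_mean) ** 2 for t in time_s))

print(f"slope = {slope:.3e} per second")
```

In Excel the same transformation uses the LN() function in a new column before plotting.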
Using Excel 2013
Turning on the Analysis Toolpak
(Tip: The column width can be adjusted by placing the cursor on the line between
the two columns at the top. It will change to a vertical line with two arrows pointed
left and right. Double left click and the column will resize to the widest cell.)
6. Place cursor over box on bottom right of B1 (cursor will change to skinny plus)
7. Holding down the left mouse button, drag to include cells C1, D1, and E1
8. Now input your data for concentration and your four absorbances of each. Do NOT
include unknown data.
9. Enter “Avg. Abs.” in cell F1
10. In cell F2, type “=”.
11. Above A1, there should be a bar with a down arrow next to it. Click the down arrow
and highlight “AVERAGE”.
12. In the screen that comes up, click the box with the red arrow in it that is next to
13. Highlight cells B2, C2, D2, and E2, and then press enter.
14. Press the “OK” box.
15. The average should appear.
16. Select F2 so the cell is highlighted with a black box.
17. Place cursor over box on bottom right of F2 (cursor will change to skinny plus).
18. Holding down the left mouse button, drag to include cells F3 to F6
19. F2 through F6 will contain the average of the four scans.
20. Enter “Std. Dev.” in cell G1
21. In cell G2 type “=”
22. Above A1, there should be a bar with a down arrow next to it. Click the down arrow
and highlight “STDEV”.
23. Repeat Steps 12 through 19 to give the standard deviation of the four absorbances for
each concentration in cells G2 through G6.