1 - Graphing and Statistical Analysis

Uploaded by

Dan

Excel for Data Analysis

OBJECTIVE
In this experiment, you will learn how to statistically analyze data with Excel.

INTRODUCTION
Much of the work done in the laboratory involves measuring chemical properties. One
example is the mass of a sample. In many laboratory investigations you will do this year, a
primary purpose will be to find the mathematical relationship between two variables. For
example, you might want to know how the absorbance of a compound varies with
concentration. Finally, we may want to propose or test a theoretical relationship between
controlled and measured data. As these are all measured quantities, there is some degree of
experimental error. We will discuss the sources of error for each lab, as this is part of
understanding the particular measurement technique. We will want to test the data to see
if the error is systematic or random. We also need tests to check whether our mathematical
relationships are valid. Excel has many statistical tools to help us make those
determinations.

Chemometrics is a developing field of chemistry involving the use of mathematical methods
to solve chemical problems, particularly those involving large data sets. A related field is
that of bioinformatics, which is the same idea applied to biological problems. We will talk
about that in class next semester. Essentially, we will be learning elementary chemometric
techniques.

Single Physical or Chemical Properties - Mean and Standard Deviation

A single physical property such as the mass of a sample can be measured multiple times to
estimate the “true” value. Measurements will have two properties – accuracy and precision.
Accuracy is how close the measured value is to the “true” value. Precision is a measure of
the reproducibility of the measurement. You have probably heard of surveys where they
quote a number (how many people intend to vote for a particular candidate for President,
for example) and a measure of how well that number is known (the error). Accuracy is
harder to assess, but precision can be easily calculated.

The mean or average is a way to systematically determine the most accurate value from a
series of measurements. We normally use the arithmetic mean.

$$\bar{x} = \frac{\sum_i x_i}{n}$$

There are other types of means, but this will be the one we use most often. The
function for this in Excel is AVERAGE(). The second feature we use is the standard
deviation. There are a variety of ways to calculate standard deviation, depending on the
nature of the data set. The one we will use is:



$$s = \sqrt{\frac{\sum_i (x_i - \bar{x})^2}{n - 1}}$$

The Excel function for this is STDEV.S. The key is the (n-1) portion, as this designates
that this is the standard deviation of a sample of a population.
The population is all the possible measurements of a property (which is infinite in many
cases). The sample is the set of measurements we actually make. The variance of the data is
the square of the standard deviation, s². The coefficient of variation or relative standard
deviation is

$$CV = RSD = \frac{100 \times s}{\bar{x}}$$

Variance and RSD will be important later when we discuss the propagation of error.
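
As a cross-check outside Excel, the same four quantities can be computed in a few lines of Python. This is only a sketch: the replicate masses are made-up illustrative values, and the variable names are our own.

```python
import statistics

# Hypothetical replicate masses in grams (illustrative values, not from this handout).
masses = [10.02, 10.05, 9.98, 10.01, 10.04]

mean = statistics.mean(masses)          # counterpart of Excel's AVERAGE()
s = statistics.stdev(masses)            # sample standard deviation, divides by n - 1 like STDEV.S
variance = statistics.variance(masses)  # sample variance, equal to s squared
rsd = 100 * s / mean                    # CV = RSD, expressed in percent

print(f"mean     = {mean:.3f} g")
print(f"s        = {s:.4f} g")
print(f"variance = {variance:.6f} g^2")
print(f"RSD      = {rsd:.2f} %")
```

Note that statistics.pstdev() would divide by n instead of n - 1, which corresponds to treating the data as the whole population rather than a sample.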

Distribution of Repeated Measurements

Standard deviation gives the spread of values, but it does not tell us anything about the
distribution of the measurements. We can determine this, but we need a lot more
measurements.

If we consider the data from Miller and Miller for the concentration of nitrate, we have 50
replicates (repeated measurements). The mean is 0.500 µg/mL, the standard deviation is
0.0165 µg/mL. We can visually analyze this data via a histogram, which is a plot of the
frequency of a value occurring vs the value. You can do this by using the Histogram
tool in the Data Analysis Toolpak. First you create a list of bins into which you sort
your data. In the case of the nitrate concentration data, there are values ranging from 0.47
µg/mL to 0.53 µg/mL. Start the Histogram tool, and input the data you want to sort in
the Input Range, and the range of bin values into the Bin Range. Choose the location where
you want the results to go (the Output Range) and check the Chart Output box.
This will generate the following graph:

[Figure: Histogram. Frequency vs. nitrate ion concentration in µg/mL, with bins from 0.46 to 0.53 µg/mL plus a “More” bin.]

Note that the data is roughly symmetrical around the mean (but not perfectly
symmetrical).
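
The binning step can be sketched in Python as follows. The handout does not list the 50 nitrate values, so the data below are hypothetical, and the function name is our own. The sketch assumes the usual behaviour of the Histogram tool, in which each bin value is an upper limit and anything above the last bin is counted under “More”.

```python
from bisect import bisect_left

def histogram_counts(values, bins):
    """Count values per bin, treating each bin value as an inclusive upper limit.

    The returned list has one extra entry at the end for values above the
    last bin (the "More" row).
    """
    counts = [0] * (len(bins) + 1)
    for v in values:
        # bisect_left gives the index of the first bin limit that is >= v
        counts[bisect_left(bins, v)] += 1
    return counts

# Bin limits taken from the axis of the histogram in the text (µg/mL).
bins = [0.46, 0.47, 0.48, 0.49, 0.50, 0.51, 0.52, 0.53]

# Hypothetical measurements (µg/mL), NOT the actual nitrate data set.
values = [0.47, 0.48, 0.49, 0.49, 0.50, 0.50, 0.50, 0.51, 0.51, 0.52, 0.53]

counts = histogram_counts(values, bins)
for limit, count in zip(bins + ["More"], counts):
    print(limit, count)
```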

A digression into statistics – populations vs samples


The set of all possible measurements is the population. The mean of the population is
given the symbol µ and, if there are no systematic errors, is the true value of the nitrate ion
concentration. The mean of the sample, x̄, is an estimate of µ. The population has a
standard deviation, σ. The sample standard deviation, s, gives us an estimate of σ.

Gaussian Distributions
In theory, a measurement could have any value. The nitrate data is to two significant
figures because of the way it was measured (a standard balance, not an analytical one). A
continuous curve would better follow the population from which the sample was taken.
The formula for a normal or Gaussian distribution is:

$$y = \frac{1}{\sigma \sqrt{2\pi}} \exp\left[ -\frac{(x - \mu)^2}{2\sigma^2} \right]$$

where x is the measured value and y is the frequency with which it occurs. The curve is
symmetrical around µ and gets wider as σ gets larger. The key
things to remember about a Gaussian distribution are:
Approximately 68% of the population values lie within ±σ of the mean
Approximately 95% of the population values lie within ±2σ of the mean
Approximately 99.7% of the population values lie within ±3σ of the mean
For our nitrate data 33 of the 50 results (66%) lie between 0.483 and 0.517, which is
decent agreement with a Gaussian model. Other types of data may obey other distributions,
but the Gaussian is most common.
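
The three percentages above can be checked numerically. The sketch below uses the error function from Python's math module; the function names are our own, and the only data taken from the text are the nitrate mean, standard deviation, and the 33-out-of-50 count.

```python
import math

def gaussian(x, mu, sigma):
    """Gaussian curve y for a measured value x, population mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def fraction_within(k):
    """Fraction of a Gaussian population lying within +/- k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within +/-{k} sigma: {100 * fraction_within(k):.1f} %")

# Nitrate example from the text: 33 of 50 results fall within one standard deviation.
observed = 33 / 50
print(f"observed {100 * observed:.0f} % vs expected {100 * fraction_within(1):.0f} %")

# Height of the curve at the mean for the nitrate values quoted in the text.
print(gaussian(0.500, 0.500, 0.0165))
```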
In reality, we’re not going to make 50 measurements, so how do we get a good analysis?
Normally, we make 5 measurements and use the mean of this sample as an estimate of the
true value, µ. The measure of the error of the sample mean is the standard error of the
mean (SEM), which can be calculated from the standard deviation:

$$SEM = \frac{\sigma}{\sqrt{n}}$$

In practice, σ is estimated by the sample standard deviation, s. The t-distribution allows
us to estimate the range within which we can place the true value with a given
confidence, by using the formula

$$\bar{x} \pm t_{n-1} \frac{s}{\sqrt{n}}$$

The t value for n-1 degrees of freedom can be looked up in standard tables for a given
confidence level (typically 95% or 99%).
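
A minimal sketch of this calculation for five replicates is shown below. The five results are hypothetical, and the two t values are the usual two-sided table entries for 4 degrees of freedom; for a different number of measurements the table must be consulted again.

```python
import math
import statistics

# Two-sided t values for n - 1 = 4 degrees of freedom, copied from a standard t table.
T_TABLE_DF4 = {0.95: 2.776, 0.99: 4.604}

# Hypothetical set of five replicate results in µg/mL (not data from this handout).
results = [0.51, 0.49, 0.50, 0.52, 0.48]

n = len(results)
mean = statistics.mean(results)
s = statistics.stdev(results)   # sample standard deviation (n - 1)
sem = s / math.sqrt(n)          # standard error of the mean, with s standing in for sigma

for level, t in T_TABLE_DF4.items():
    half_width = t * sem
    print(f"{100 * level:.0f}% confidence: {mean:.3f} +/- {half_width:.3f} µg/mL")

half_width_95 = T_TABLE_DF4[0.95] * sem
```

The 99% interval is wider than the 95% interval because a larger t value is needed to cover more of the distribution.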

How to Graph in Excel:

General Method

1. Select data to be plotted. In this case select cells A2 through A6 and cells F2 through
F6. (hold the “Ctrl” key and click on each cell)
2. Click on the “Insert” tab at the top of the window.
3. Select the “Scatter” chart type.
4. When the drop down menu appears, select the graph in the top left corner, “Scatter
with only Markers”.
5. After the chart appears it will add additional tool bar options and should bring you
right into Chart Tools.
6. Select the “Layout” tab.
7. Select the “Axis Titles” drop down menu.
8. Add both a horizontal and vertical axis label. (for the vertical, choose “Rotated
Title”)
9. Click on the chart where it says “Axis Title” and rename it “Concentration (M)”.
10. Change the vertical title to say “Absorbance”.
11. Click on the drop down menu “Chart Title”, and add an “Above Chart” title.
12. Change the chart name to the compound being shown.
13. Right click on the vertical scale.
14. Select “Format Axis”
15. Under “Axis Options”, change Minimum to “Fixed” and then enter a value just below
that of your first point.
16. Right click on the horizontal scale.
17. Repeat steps 14 and 15.
18. Click on the “Legend” drop down menu, and then select “None”.
19. Select the “Trendline” drop down menu.
20. Select “Linear Trendline”.
21. Again select the “Trendline” drop down menu.
22. Select “More Trendline Options”.
23. Check the boxes at the bottom which show “Display Equation on chart”, and “Display
R-squared value on chart”.
24. (Optional) To get a full-page chart, select the “Design” tab under “Chart Tools”. Click
on “Move Chart Location” on the right of the tool bar. Choose the “New sheet” radio
button and name the chart as appropriate.

Graphing via the Regression Tool

1. Have your data arranged in two columns (they do not have to be adjacent).
2. Choose Data, Data Analysis, Regression
3. Click on the Input Y Range Button
4. Click and drag to choose the Y data
5. Press Enter
6. Click on the Input X Range Button
7. Click and drag to choose the X data
8. Check the Output Range radio button
9. Select a cell for the upper left corner of the ANOVA data
10. Check the Residual Plots and Line Fit Plots boxes
11. Press OK.
12. The slope will be labelled “X Variable 1”. The y-intercept will be labelled “Intercept”.
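
For readers who want to see what the trendline and the Regression tool are calculating, here is a minimal least-squares sketch in Python. It is not Excel's implementation, only the standard formulas for a straight-line fit; the data are hypothetical and the function name is our own.

```python
def linear_fit(xs, ys):
    """Least-squares line y = slope * x + intercept, with R squared and residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual = measured y minus the y predicted by the fitted line
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared, residuals

# Hypothetical X and Y columns (illustrative values only).
x_data = [1.0, 2.0, 3.0, 4.0, 5.0]
y_data = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept, r_squared, residuals = linear_fit(x_data, y_data)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, R^2 = {r_squared:.4f}")
print("residuals:", [round(r, 3) for r in residuals])
```

Residuals that scatter randomly around zero suggest the straight-line model is reasonable; a curved pattern suggests a different relationship.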

Practice Problems

I. Linear Relationships and the Slope of the Line

1. In Excel, name column A Temperature(K), and name column B Volume(L). Enter the
following data in those two columns:
Temperature(K) Volume(L)
200 16
220 18
240 19
260 21
280 23
Create a graph plotting Volume vs. Temperature. Calculate the slope and the intercept.
2. Construct a Beer’s Law plot including a line fit plot and regression plot for the following
data. Calculate the extinction coefficient, correlation coefficient and intercept.

Concentration (M) Absorbance


0.2 0.27
0.3 0.41
0.4 0.55
0.5 0.69
3. In my Ph.D. research, I synthesized the new complex ion [Ru(bpy)2(NO)(OH2)]2+. The
data below was obtained in water at a wavelength of 322 nm. Construct a Beer’s Law
plot including a line fit plot and regression plot for the following data. Calculate the
extinction coefficient, correlation coefficient and intercept.

Concentration (M) Absorbance


5.25 × 10⁻⁵ 0.782
4.20 × 10⁻⁵ 0.615
3.36 × 10⁻⁵ 0.491
2.69 × 10⁻⁵ 0.393
2.15 × 10⁻⁵ 0.317

Inverse relationships

1. The volume of 0.005 mol of a gas at 273 K was measured at seven different pressures.
n = 0.005 mol, R = 0.0821 L atm/(mol K), T = 273 K

Pressure (atm)   Volume (L)   1/V (1/L)
1.25             0.089653     11.15409
2.33             0.048097     20.79123
2.67             0.041972     23.82514
3.11             0.036034     27.75138
3.78             0.029647     33.72997
4.03             0.027808     35.96079
5.33             0.021026     47.56105

Plot pressure vs. volume and note the shape of the line that results. Now plot pressure vs
1/V, and note the shape of the line that results. The slope of the line should be nRT.
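
The claim about the slope can be checked with a short Python sketch. The pressures, volumes, and constants are those in the table above; the fitting function is our own helper using the standard least-squares formulas, not something Excel exposes.

```python
def linear_fit(xs, ys):
    """Least-squares slope and intercept of y against x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

n_mol, R, T = 0.005, 0.0821, 273   # constants from the table
pressure = [1.25, 2.33, 2.67, 3.11, 3.78, 4.03, 5.33]
volume = [0.089653, 0.048097, 0.041972, 0.036034, 0.029647, 0.027808, 0.021026]

# Linearize the inverse relationship: P = nRT * (1/V)
inverse_volume = [1 / v for v in volume]
slope, intercept = linear_fit(inverse_volume, pressure)

print(f"fitted slope = {slope:.5f}")
print(f"nRT          = {n_mol * R * T:.5f}")
print(f"intercept    = {intercept:.5f}")
```

Plotting P against V gives a curve, while P against 1/V gives a straight line through (approximately) the origin, which is why the reciprocal is used.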

2. We can study the decomposition of NO2 into NO and O2 by plotting 1/[NO2] vs t.

NO2(g) → NO(g) + ½O2(g)
Given the following data:
t (s) [NO2]
0 0.0831
4.2 0.0666
7.9 0.0567
11.4 0.0497
15.0 0.0441

Prepare a graph of the reciprocal of the NO2 concentration vs time. Calculate the slope.
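
A sketch of the same calculation in Python is given below, using the data from the table. The helper function is our own. If the reaction is second order in NO2 (an assumption, since the handout only asks for the slope), the integrated rate law 1/[NO2] = 1/[NO2]0 + kt makes the slope an estimate of the rate constant k.

```python
def linear_fit(xs, ys):
    """Least-squares slope and intercept of y against x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

time_s = [0, 4.2, 7.9, 11.4, 15.0]
no2 = [0.0831, 0.0666, 0.0567, 0.0497, 0.0441]

# Reciprocal of each concentration, the quantity plotted on the y axis
reciprocal = [1 / c for c in no2]
slope, intercept = linear_fit(time_s, reciprocal)

print(f"slope = {slope:.4f} (reciprocal concentration units per second)")
print(f"intercept = {intercept:.3f}, compare with 1/[NO2] at t = 0: {reciprocal[0]:.3f}")
```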

Logarithmic relationships

1. The biological macromolecule DNA exists as a right-handed double stranded structure,
i.e. duplex, under physiological conditions. The double helix is stabilized by hydrogen
bonds between the constituent strands. At temperatures above physiological
conditions, the hydrogen bonds holding the two strands together will break, and the
DNA will become single stranded. This is called “melting” since it is an equilibrium
controlled phase transition, much like the melting of ice. The temperature at the
midpoint of the double stranded to single stranded transition is referred to as the melting
temperature, or Tm. It is an indirect measure of the stability of the DNA duplex. The
magnitude of the Tm depends upon, among other things, the molar concentration of Na+,
and in some cases, the molar concentration of DNA. In particular, the Tm of a DNA
molecule is linearly related to the log of the Na+ concentration.

Table 1 below shows real experimentally determined Tm values for a particular DNA
molecule as a function of log[Na+]. Using this data, construct a plot of Tm vs log[Na+] and
determine the slope of the resultant linear fit.

Table 1.
Tm(K) log[Na+]
343.7 -1.301
346.5 -1.125
348.7 -0.9393
352.0 -0.6989
354.4 -0.5229

Answer: slope = 13.56
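
The quoted slope can be reproduced with a short least-squares calculation in Python. The data are those of Table 1; the helper function is our own sketch of the standard formulas.

```python
def linear_fit(xs, ys):
    """Least-squares slope and intercept of y against x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

log_na = [-1.301, -1.125, -0.9393, -0.6989, -0.5229]   # log[Na+]
tm_kelvin = [343.7, 346.5, 348.7, 352.0, 354.4]        # Tm in K

slope, intercept = linear_fit(log_na, tm_kelvin)
print(f"slope = {slope:.2f} K per unit of log[Na+]")
print(f"intercept = {intercept:.1f} K")
```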


2. For very short pieces of DNA, called oligomers, the Tm is also dependent on the natural
log of the concentration of the DNA as indicated by the following equation:

$$\frac{1}{T_m} = \left( \frac{R}{\Delta H^\circ} \right) \ln[\mathrm{DNA}] + \frac{\Delta S^\circ}{\Delta H^\circ}$$

where R is the gas constant (1.987 cal/K mol), ΔHº is the
standard enthalpy change for the double stranded to single stranded transition and ΔSº
is the standard entropy change. Given that the above equation is that of a straight line,
y=mx+b, where y = 1/Tm and x = ln[DNA], the enthalpy change can be determined from
the slope (i.e. m = R/ΔHº) and the entropy change can be determined from the y intercept
(i.e. b = ΔSº/ΔHº).

Table 2 gives real experimentally determined values of 1/Tm as a function of ln[DNA].
Construct a plot of 1/Tm vs ln[DNA] and determine the slope and y intercept. Once
obtained, calculate ΔHº and ΔSº.

Table 2.
1/Tm (1/K) ln[DNA]
2.877 × 10⁻³ -9.459
2.868 × 10⁻³ -8.940
2.861 × 10⁻³ -8.236
2.851 × 10⁻³ -7.340
2.842 × 10⁻³ -6.502

Answer: slope = -1.15 × 10⁻⁵; y intercept = 2.77 × 10⁻³; ΔHº = -1.72 × 10⁵ cal/mol;
ΔSº = -4.78 × 10² cal/K mol.
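
The chain from fit to thermodynamic quantities can be sketched in Python as follows. The data are the 1/Tm and ln[DNA] values tabulated above and R is the value given in the text; the helper function and variable names are our own. Following m = R/ΔHº and b = ΔSº/ΔHº literally, ΔHº = R/m and ΔSº = b × ΔHº, so a negative slope with a positive intercept gives negative values for both.

```python
def linear_fit(xs, ys):
    """Least-squares slope and intercept of y against x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

R = 1.987  # gas constant in cal/(K mol), as given in the text

ln_dna = [-9.459, -8.940, -8.236, -7.340, -6.502]
inv_tm = [2.877e-3, 2.868e-3, 2.861e-3, 2.851e-3, 2.842e-3]   # 1/Tm in 1/K

slope, intercept = linear_fit(ln_dna, inv_tm)
delta_h = R / slope              # from m = R / delta H
delta_s = intercept * delta_h    # from b = delta S / delta H

print(f"slope     = {slope:.3e}")
print(f"intercept = {intercept:.3e}")
print(f"delta H   = {delta_h:.3e} cal/mol")
print(f"delta S   = {delta_s:.3e} cal/(K mol)")
```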
3. We can study the decomposition of C2H4O into CH4 and CO by graphing the natural log of
the concentration of C2H4O vs time in seconds.

C2H4O(g) → CH4(g) + CO(g)

Given the data:


t (min) [C2H4O]
0 0.0860
50 0.0465
72 0.0355
93 0.0274
130 0.0174

convert the time into seconds, and convert concentration into ln(concentration). Graph
ln(concentration) vs time and report the slope.
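
The two conversions and the fit can be sketched in Python as shown below, using the data from the table. The helper function is our own. If the decomposition is first order (an assumption, since the handout only asks for the slope), the integrated rate law ln[C2H4O] = ln[C2H4O]0 - kt means the slope is -k.

```python
import math

def linear_fit(xs, ys):
    """Least-squares slope and intercept of y against x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

time_min = [0, 50, 72, 93, 130]
conc = [0.0860, 0.0465, 0.0355, 0.0274, 0.0174]

time_s = [60 * t for t in time_min]      # minutes to seconds
ln_conc = [math.log(c) for c in conc]    # natural log, like Excel's LN()

slope, intercept = linear_fit(time_s, ln_conc)
print(f"slope = {slope:.3e} per second")
print(f"intercept = {intercept:.3f}, compare with ln of the first point: {ln_conc[0]:.3f}")
```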

Using Excel 2013

Turning on the Analysis Toolpak

1. Choose File then Options
2. Find the choice Add-ins
3. Select Analysis Toolpak
4. Click on Go at the bottom
5. A menu should come up asking to Manage Add-ins. Make sure Analysis Toolpak and
Solver are checked. Click OK.
6. To check to see if it worked, open a workbook, and click on Data. You should see in
the upper right hand corner two new options labeled Data Analysis and Solver.

How to calculate average and standard deviation

1. Right click on the bottom of the page where it says “Sheet 1”
2. Choose Rename
3. Rename with the complex being entered
4. In A1, enter “Concentration (M)”
5. In B1 enter “Absorbance”. Do not worry that part of A1 has been covered up.

(Tip: The column width can be adjusted by placing the cursor on the line between
the two columns at the top. It will change to a vertical line with two arrows pointed
left and right. Double left click and the column will resize to the widest cell.)

6. Place cursor over box on bottom right of B1 (cursor will change to skinny plus)
7. Holding down the left mouse button, drag to include cells C1, D1, and E1
8. Now input your data for concentration and your four absorbances of each. Do NOT
include unknown data.
9. Enter “Avg. Abs.” in cell F1
10. In cell F2, type “=”.
11. Above A1, there should be a bar with a down arrow next to it. Click the down arrow
and highlight “AVERAGE”.
12. In the screen that comes up, click the box with the red arrow in it that is next to
13. Highlight cells B2, C2, D2, and E2, and then press enter.
14. Press the “OK” box.
15. The average should appear.
16. Select F2 so the cell is highlighted with a black box.
17. Place cursor over box on bottom right of F2 (cursor will change to skinny plus).
18. Holding down the left mouse button, drag to include cells F3 to F6
19. F2 through F6 will contain the average of the four scans.
20. Enter “Std. Dev.” in cell G1
21. In cell G2 type “=”
22. Above A1, there should be a bar with a down arrow next to it. Click the down arrow
and highlight “STDEV”.
23. Repeat Steps 12 through 19 to give the standard deviation of the four scans in
cells G2 through G6.
