Regression Using Excel
Application Note
Nonlinear regression analysis of data using a spreadsheet
BY ANGUS M. BROWN
OCTOBER 2001
$$SS = \sum_{i=1}^{n} [y - y_{fit}]^2 \qquad (1)$$
where y is the data point, yfit is the value of the curve
at point y, and SS is the sum of the squared differences
over all n data points. The procedure requires the user
to provide initial estimates of the parameter values, from
which the first iteration calculates an initial SS value.
The second iteration involves changing the parameter
values by a small amount and recalculating the SS. This
process is repeated many times to ensure that changes in
the parameter values result in the smallest possible value
of SS. SOLVER employs the generalized reduced gradient
(GRG) method of iteration.²
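The iteration described above can be sketched in a few lines of Python. This is only an illustration of the idea of nudging parameters and keeping changes that lower SS; it uses a simple step-halving search, not the GRG method that SOLVER employs. The function names (sum_of_squares, refine) and the straight-line test data are assumptions made for the example and do not come from the article.

```python
def sum_of_squares(y, yfit):
    """Equation 1: sum of squared differences between data and curve."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, yfit))


def refine(model, x, y, params, step=1.0, tol=1e-9, max_iter=10000):
    """Nudge each parameter up and down by `step`, keep any change that
    lowers SS, and halve the step when no nudge helps.
    A simple pattern search, NOT the GRG method used by SOLVER."""
    params = list(params)
    best = sum_of_squares(y, [model(xi, *params) for xi in x])
    for _ in range(max_iter):
        improved = False
        for i in range(len(params)):
            for delta in (step, -step):
                trial = params[:]
                trial[i] += delta
                ss = sum_of_squares(y, [model(xi, *trial) for xi in x])
                if ss < best:
                    params, best, improved = trial, ss, True
        if not improved:
            step /= 2.0
            if step < tol:
                break
    return params, best


# Hypothetical straight-line data (y = 2x + 1), used only to exercise the loop.
def line(x, a, b):
    return a * x + b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
fitted, ss = refine(line, xs, ys, [0.0, 0.0])
print("parameters:", fitted, "SS:", ss)
```

As in the spreadsheet procedure, the outcome depends on the starting estimates: a search of this kind only ever moves downhill from wherever it begins.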
The following example illustrates how to use the
spreadsheet program's SOLVER function to fit data
with user-input nonlinear functions. The example used
is the Boltzmann equation, but any nonlinear function
can be used simply by substituting the relevant equation.
$$y = \frac{1}{1 + \exp\left(\frac{V - x}{Slope}\right)} \qquad (2)$$
where y is the dependent variable, x is the independent
variable (Voltage), and V and Slope are the parameter
values. V is the half-activation voltage, and Slope
describes the slope at the point V and indicates the
steepness of the curve. This paper does not address the
critical issue of which functions are suitable to describe
individual data, but this topic is discussed in detail
elsewhere.³,⁴
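For readers who want to check values outside the spreadsheet, Equation 2 can be written as a short Python function. This is a minimal sketch: the function name boltzmann and the sample x values are assumptions chosen for illustration, and the parameter values are simply the initial estimates quoted later in step 4.

```python
import math


def boltzmann(x, v, slope):
    """Equation 2: y = 1 / (1 + exp((V - x) / Slope))."""
    return 1.0 / (1.0 + math.exp((v - x) / slope))


# Illustrative parameter values only (the initial estimates from step 4).
V, SLOPE = 80.0, 30.0

# Evaluate the curve below, at, and above the half-activation voltage.
for x in (V - 2 * SLOPE, V, V + 2 * SLOPE):
    print(f"x = {x:6.1f}  y = {boltzmann(x, V, SLOPE):.3f}")
```

At x = V the exponent is zero, so y is exactly 0.5, which is why V is called the half-activation voltage. The gradient of the curve at that point is 1/(4 × Slope), so a smaller Slope gives a steeper curve.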
Configuring the spreadsheet for nonlinear regression
1. Input onto a spreadsheet the raw data in two
columns, the x column containing the independent
variable, and the y column containing the dependent
variable. This is illustrated as Columns A and B of
Figure 1a.
2. Graph the data contained in cells A2 to B20.
3. Enter labels in cells G1 to G8 to describe the
contents of the adjacent cells. In cell G1 enter V, which
will describe the parameter in cell H1. For cell H1,
select the Insert menu and choose Name, then Define.
Name the cell V. Similarly, for cells G2 to G8, enter
Slope, Mean of y, df, SE of y, R2, Critical t and CI,
respectively. Name cells H2 to H8 Slope, Mean_of_y,
df, SE_of_y, RSQ, Critical_t and CI, respectively.
4. Insert initial estimates of the parameters V and
Slope into cells H1 and H2, respectively. Approximate
estimates are 80 and 30, respectively.
5. In Column C (Boltzmann) enter the equation
describing the Boltzmann function. This has been
rearranged from Equation 2 into a form that the program
recognizes: =(1/(1+EXP((V-A2)/Slope))), where V and
Slope refer to the parameter values in cells H1 and H2.
6. Copy the equation from cell C2 down to and
including C20.
7. The mean of the y values is calculated by entering
the following formula in H3: =AVERAGE(B2:B20).
8. The degrees of freedom (df) is defined as the number
of data points minus the number of parameters in the
function. It is calculated by entering the following
formula in H4: =COUNT(B2:B20)-COUNT(H1:H2).
9. The standard error of the y values is calculated by
entering this formula in H5:
=SQRT(SUM((B2:B20-C2:C20)^2)/df).
However, because this formula must be expressed
as an array formula, press Ctrl+Shift+Enter. This
encloses the whole formula within a pair of curly
brackets ({}), denoting it as an array formula.
10. The R² value, the correlation index or coefficient
of determination, is calculated by entering the
following formula in H6 and expressing it as an array
formula as described above:
=1-SUM((B2:B20-C2:C20)^2)/SUM((B2:B20-Mean_of_y)^2).
11. In order for the confidence interval of the fit to
be calculated, the critical t value at the 95% confidence
level (a significance level of 0.05) is calculated by
entering the following formula in H7: =TINV(0.05,df).
The confidence interval is calculated by entering the
following formula in H8: =Critical_t*SE_of_y. Enter the
following formula in D2: =C2+CI, and copy it down to D20.
Similarly, enter =C2-CI in E2 and copy down to E20. This
calculates the upper and lower confidence limits (95%)
of the fit.
Figure 1 Spreadsheet template for nonlinear regression. a) Formulae used in the curve fitting procedure. The (x, y) data are entered into Columns A and B, respectively, with Column C used to generate the fit based on the parameters in Cells H1 and H2. Columns D and E calculate the 95% confidence interval around the fit. b) The solution of the fit calculated by SOLVER.
12. The SE of the y values, R² and CI are automatically
calculated: 0.122, 0.895, and 0.257, respectively.
13. Figure 1a illustrates the spreadsheet template
with the formulas used in the fitting protocol displayed.
14. Graph Columns C, D and E versus Column A
such that they are displayed as continuous lines on the
graph (shown in Figure 2a). The curve based on the
initial estimates (blue line) is clearly a poor fit to the
data, and its confidence limits (red lines) are wide.
15. Open the SOLVER function, which can be
found under the Tools menu.
16. In the Set Target Cell box, enter RSQ.
17. Set the Equal To option to Max. SOLVER attempts
to maximize the value of R².
18. In the By Changing Cells box, enter V, Slope.
19. Choose Solve to perform the fit. The program
will iteratively cycle through the fitting routine,
changing the parameter values of V and Slope until
the largest value of R² is calculated. These changes
will be displayed on the spreadsheet template, as
illustrated in cells H1 and H2 of Figure 1b. The optimal
values of V and Slope are 99.366 and 24.388,
respectively, and the maximal value of R² is 0.997. The
blue line in Figure 2b illustrates the best fit, which is
clearly an improvement over the fit provided by the
initial parameter values. Additionally, the confidence
intervals (red lines) around the fit have been reduced.
Conclusion
The procedure described in this paper allows the user
to carry out nonlinear regression analysis of data within
an Excel spreadsheet without the need for specialist
curve fitting programs. The procedure involves manually
entering data and graphing it. The curve fitting
procedure utilizing the SOLVER function is then
performed and the resulting curve fit is overlaid on the
data. In addition, the R² value, an index of goodness of
fit, and the 95% confidence intervals are calculated and
displayed.
Once the spreadsheet template has been set up, it
can be repeatedly used for new sets of data. If a new
function is used to describe data, it is entered manually
in Column C and the appropriate parameters are
designated on the worksheet. It is important that the
function be entered in the correct format, since it is very
easy to make mistakes when converting a formula into
the single-line format that the program recognizes. Care
should also be taken when entering the initial parameter
estimates, because the iteration procedure may proceed
in the wrong direction and never find a solution if
inappropriate values are entered.
The spreadsheet template described in this paper is
available for download from the author's Web site at
http://faculty.washington.edu/ambrown/.
References
1. Brown AM. A step-by-step guide to non-linear regression
analysis of experimental data using a Microsoft Excel spreadsheet.
Comp Prog Meth Biomed 2001; 65:191-200.
2. Smith S, Lasdon L. Solving large sparse nonlinear programs
using GRG. ORSA Journal on Computing 1992; 4:2-15.
3. Johnson ML. Why, when, and how biochemists should use least
squares. Anal Biochem 1992; 206:215-25.
4. Dempster J. Computer Analysis of Electrophysiological Signals.
London: Academic Press, 1993; 104-32.
Figure 2 Boltzmann fit. a) This graph displays the experimental data points (filled circles), the fit based on the initial parameter estimates (blue line), and the 95% confidence intervals (red line) around the fit. b) The fit as calculated by SOLVER.
Dr. Brown is Assistant Professor, Department of Neurology,
Box 356465, University of Washington School of Medicine,
Seattle, WA 98195-6465, U.S.A.; tel.: 206-616-8278; fax:
206-685-8100; e-mail: [email protected].
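As a cross-check on the worksheet formulas in steps 7 to 11, the same statistics can be computed in Python. This is a sketch under stated assumptions, not the author's template: the function names, the 19 synthetic data points and their parameter values are hypothetical, and the critical t value is typed in as a constant (2.110, the two-tailed value for a significance level of 0.05 and 17 degrees of freedom) rather than computed, because the Python standard library has no equivalent of TINV.

```python
import math


def boltzmann(x, v, slope):
    """Equation 2, as entered in Column C."""
    return 1.0 / (1.0 + math.exp((v - x) / slope))


def fit_statistics(x, y, v, slope, critical_t, n_params=2):
    """Mirror worksheet cells H3 to H8 and Columns C to E."""
    fit = [boltzmann(xi, v, slope) for xi in x]            # Column C
    ss = sum((yi - fi) ** 2 for yi, fi in zip(y, fit))     # residual SS
    mean_y = sum(y) / len(y)                               # H3: Mean_of_y
    df = len(y) - n_params                                 # H4: df
    se = math.sqrt(ss / df)                                # H5: SE_of_y
    rsq = 1.0 - ss / sum((yi - mean_y) ** 2 for yi in y)   # H6: RSQ
    ci = critical_t * se                                   # H8: CI
    return {
        "df": df, "se": se, "rsq": rsq, "ci": ci,
        "upper": [fi + ci for fi in fit],                  # Column D
        "lower": [fi - ci for fi in fit],                  # Column E
    }


# Hypothetical data: 19 points generated from a Boltzmann curve (V = 100,
# Slope = 25) with a small alternating offset standing in for noise.
xs = [10.0 * i for i in range(19)]
ys = [boltzmann(xi, 100.0, 25.0) + (0.01 if i % 2 else -0.01)
      for i, xi in enumerate(xs)]

T_CRIT = 2.110  # assumed constant: two-tailed critical t, alpha 0.05, 17 df

initial = fit_statistics(xs, ys, 80.0, 30.0, T_CRIT)    # rough starting guess
refined = fit_statistics(xs, ys, 100.0, 25.0, T_CRIT)   # generating values
for label, s in (("initial", initial), ("refined", refined)):
    print(f"{label}: df={s['df']} SE={s['se']:.4f} "
          f"R2={s['rsq']:.4f} CI={s['ci']:.4f}")
```

Because the denominator of R² depends only on the data, maximizing RSQ in steps 16 and 17 is equivalent to minimizing the SS of Equation 1. The standard error, and with it the confidence band, shrinks as the fit improves.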