Surge
INTRODUCTION
One of the “new” families of functions that are being introduced in the early
years of the mathematics curriculum is the surge function, which is treated
at the calculus level in [2, 3] and at the precalculus level in [1]. The surge
function, whose graph is shown in Figure 1, has the form
$f(x) = A x^p e^{-bx}$,
where $p > 0$ and $b > 0$, which is equivalent to $f(x) = A x^p c^x$, $0 < c < 1$.
Surge functions are used to model a variety of real-world applications, such
as the response to an initial dose of a drug (the level of the medication in
the bloodstream rises relatively rapidly to a peak and thereafter decays as
the drug is washed out of the body by the kidneys) or the body’s response
to an infection. Surge functions are also used to model the results of an
advertising campaign that initially causes a fast increase in sales, but which
then slowly diminish. From a modeling point of view, the initial surge is
PRIMUS March 2006 Volume XVI Number 1
accounted for by the power function term $x^p$ and the subsequent slow decay
is accounted for by the decaying exponential term $e^{-bx}$ or $c^x$.
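As a quick numerical check of the two equivalent forms, the following Python sketch evaluates both and locates the peak. The parameter values A = 100, p = 1.5, and b = 0.5 are illustrative assumptions, not values from the article; the peak location x = p/b follows from setting f'(x) = 0.

```python
import math

def surge(x, A, p, b):
    """Surge function f(x) = A * x**p * exp(-b*x), with p > 0 and b > 0."""
    return A * x**p * math.exp(-b * x)

def surge_base_c(x, A, p, c):
    """Equivalent form f(x) = A * x**p * c**x, with 0 < c < 1."""
    return A * x**p * c**x

# Illustrative (hypothetical) parameter values, not taken from the article.
A, p, b = 100.0, 1.5, 0.5
c = math.exp(-b)      # the two forms agree when c = e^(-b)

x_peak = p / b        # f'(x) = 0 at x = p/b, where the surge peaks
for x in (0.5, 1.0, x_peak, 6.0, 12.0):
    print(f"x = {x:5.2f}   f(x) = {surge(x, A, p, b):8.3f}")
```

The power term dominates for small x and produces the rise; the exponential term dominates for large x and produces the decay.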
Gordon Fitting Surge Functions to Data
In this article, we look at this issue again and consider in detail an approach
that has the advantage of giving reasonably accurate results with a readily
available tool, as well as the approach of applying the least squares criterion
directly with the assistance of a CAS package.
$C(t) = A\,t^{1.44} e^{-1.2t}$.
tion (4) shown in Figure 3 is 53,459.9. We will use this value for comparison
in our subsequent calculations.
for the constant term and for the three variables X1 , X2 , and X3 are then
the coefficients of the desired polynomial.
We now apply a similar approach to fitting a surge function to a set of
(x, y) data, and will then apply the procedure to the data on the concentration
levels of Viagra. The surge function we seek has the form
$y = A x^p e^{-bx}$,
so taking logarithms of both sides gives
$\log y = \log A + p \log x - (b \log e)\,x$,
and so log y is a linear function of x and log x. Thus, we can set Y = log y,
X1 = x, and X2 = log x and apply multivariate linear regression to an
extended table of values that also includes a column of log x values and a
column of log y values. In order to take logs of the t and the C values,
we need to avoid the obvious starting point where t = 0 and C = 0; we
do this by making a very minor change in the values of the two variables
at that point and use t = 0.05 instead of 0 and C = 10 instead of 0. We
presume that the researchers at Pfizer did the same in producing the graph
in Figure 2 on their website; otherwise, it is far more natural to use (0, 0) as
the starting point. For the Viagra data, we then have the extended Table 2.
C      t       log t        log C
10     0.05    -1.30103     1
50     0.4     -0.39794     1.69897
320    0.6     -0.22185     2.50515
440    1.2     0.079181     2.643453
410    1.8     0.255273     2.612784
350    2.1     0.322219     2.544068
250    3       0.477121     2.39794
170    4       0.60206      2.230449
80     6       0.778151     1.90309
50     8       0.90309      1.69897
30     10      1            1.477121
20     12      1.079181     1.30103
12     18      1.255273     1.079181
6      24      1.380211     0.778151
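The two logarithm columns can be generated directly from the (t, C) values. The Python sketch below does this with base-10 logarithms, which is the base the tabulated values correspond to; the variable names are illustrative.

```python
import math

# (t, C) pairs from Table 2, with the starting point already
# shifted from (0, 0) to (0.05, 10) so that the logs are defined.
data = [(0.05, 10), (0.4, 50), (0.6, 320), (1.2, 440), (1.8, 410),
        (2.1, 350), (3, 250), (4, 170), (6, 80), (8, 50),
        (10, 30), (12, 20), (18, 12), (24, 6)]

# Extended table: one row (C, t, log t, log C) per observation.
table = [(C, t, math.log10(t), math.log10(C)) for t, C in data]

print(f"{'C':>5} {'t':>6} {'log t':>10} {'log C':>10}")
for C, t, log_t, log_C in table:
    print(f"{C:>5} {t:>6} {log_t:>10.6f} {log_C:>10.6f}")
```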
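When we “hit” this set of transformed data with the multivariate regression
features of Excel, we get the linear regression equation
$\log C = 2.3190 - 0.1242t + 0.7613 \log t$.
Undoing the logarithm gives
\begin{align*}
10^{\log C} = C &= 10^{2.3190 - 0.1242t + 0.7613 \log t} \\
&= 10^{2.3190} \cdot 10^{-0.1242t} \cdot 10^{0.7613 \log t} \\
&= 208.45\,(10^{-0.1242})^t\,10^{\log t^{0.7613}} \\
&= 208.45\,(0.7513)^t\,t^{0.7613}.
\end{align*}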
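The same back-transformation can be written as a small Python helper. This is a sketch rather than part of the original procedure; the function name surge_parameters is hypothetical, and it assumes the regression was run with base-10 logarithms, as in Table 2.

```python
import math

def surge_parameters(b0, b1, b2):
    """Convert log10 C = b0 + b1*t + b2*log10(t) into C = A * t**p * exp(-b*t)."""
    A = 10.0 ** b0              # constant term becomes the multiplier A
    c = 10.0 ** b1              # 10**(b1*t) = (10**b1)**t = c**t
    b = -b1 * math.log(10.0)    # c**t = exp(-b*t), so b = -ln(c)
    p = b2                      # 10**(b2*log10(t)) = t**b2
    return A, p, b, c

# Regression coefficients quoted in the text.
A, p, b, c = surge_parameters(2.3190, -0.1242, 0.7613)
print(f"A = {A:.2f}, p = {p:.4f}, c = {c:.4f}, b = {b:.4f}")
```

Working from the four-digit coefficients, b comes out at about 0.286, in line with the value used in the later comparison of the fits.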
the fit is extremely poor. Yet, the logic leading up to this result seems rea-
sonable in the sense that the multivariate regression process produces the
best-fit plane to the transformed data and therefore leads us to expect a
much better fit. Let’s see what has gone wrong.
In Figure 6, we show the plot of the points (log t, log C). Notice that,
other than the left-most point (−1.3, 1), the remaining points are clustered
relatively tightly and mostly display a clear pattern. This suggests that the
results we get for the regression equation might be very sensitive to small
changes in the values of the coordinates at the left-most point in the sense
that this point may have a disproportionate effect on the coefficients in the
regression equation.
Moreover, the left-most point is what we estimated to avoid the problem
with taking logs of 0. And, because it involves a value of t very close to 0
(so that log t is negative and changing rapidly), a relatively minor change
in the value of t would likely result in a major change in the value of
log t. In addition, when you look back at
Figure 2 (the Pfizer website graph), it is evident that this initial point is
the one for which it is hardest to estimate an accurate value.
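To put a number on this sensitivity, the following Python sketch compares the change in log t produced by the same 0.05 shift in t at two places on the time axis. The comparison point t = 1.20 is taken from the data; shifting it to 1.25 is purely an illustrative assumption.

```python
import math

# The two candidate estimates for the starting time discussed in the text.
t_first, t_second = 0.05, 0.10
shift = math.log10(t_second) - math.log10(t_first)   # equals log10(2)

# For comparison: the same 0.05 change in t applied at a later data point.
later_shift = math.log10(1.25) - math.log10(1.20)

print(f"log t moves by {shift:.4f} when t goes from 0.05 to 0.10")
print(f"log t moves by {later_shift:.4f} when t goes from 1.20 to 1.25")
```

Near t = 0 the shift in log t is more than ten times larger than for the same change in t at t = 1.2, which is why the estimated starting point has so much leverage on the regression plane.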
Let’s see just how much of an effect we get by trying a slightly different
estimate for the value of t for this point. Instead of using t = 0.05, suppose
we try t = 0.10 and maintain the value C = 10. The resulting surge
function is $C(t) = 619.44\,t^{0.8236} e^{-0.2924t}$. The value for the coefficient has
changed dramatically from A = 208.45; the power in the power function
term has changed a bit from p = 0.7613 to p = 0.8236; and the multiple
in the exponential term has changed fairly minimally from b = 0.2859 to
b = 0.2924. However, the corresponding value of the sum of the squares
is now 700,741.1, which is almost four times as large as the previous value
of 192,812.4 and the resulting surge function is a far poorer fit to this data.
More significantly, a relatively small change in the estimate of the point
The associated value for the sum of the squares is 153,650.7. This is a
substantial improvement over the two preceding surge functions using mul-
tivariate regression with estimates of the point near the origin. However, it
is still considerably larger than the value of 53,459.9 we initially obtained
using the calculus argument, let alone the value of 25,815.1 that resulted
from the Mathematica routine for minimizing the sum of the squares.
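The sum of the squares used in these comparisons is straightforward to compute for any candidate surge function. The Python sketch below evaluates it for the first regression-based function, C(t) = 208.45 t^0.7613 e^(−0.2859t). It assumes the Table 2 data with the starting point taken as (0.05, 10); which starting point was used for the quoted sums is not stated, so the result need not reproduce the quoted value exactly, particularly since the coefficients are rounded.

```python
import math

def surge(t, A, p, b):
    """Surge function C(t) = A * t**p * exp(-b*t)."""
    return A * t**p * math.exp(-b * t)

def sum_of_squares(data, A, p, b):
    """Sum of squared differences between the observed C and the model."""
    return sum((C - surge(t, A, p, b)) ** 2 for t, C in data)

# (t, C) pairs from Table 2 (assumption: starting point (0.05, 10)).
data = [(0.05, 10), (0.4, 50), (0.6, 320), (1.2, 440), (1.8, 410),
        (2.1, 350), (3, 250), (4, 170), (6, 80), (8, 50),
        (10, 30), (12, 20), (18, 12), (24, 6)]

sse = sum_of_squares(data, 208.45, 0.7613, 0.2859)
print(f"sum of squares = {sse:,.1f}")
```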
Incidentally, there is another statistical measure used to assess how well
a multivariate linear function fits a set of data; it is the coefficient of multiple
determination and is denoted by R. It is the extension of the correlation
coefficient r to multivariate data. For the three functions we have created
using multivariate regression, the corresponding values are R = 0.8768,
R = 0.8581, and R = 0.8988, respectively. While all three values are
statistically significant, the fact that they are all quite close to one another
means that we should not make a definitive call on which of the three surge
functions is the best fit based solely on the value of R.
This still leaves one rather perplexing question. How can all three of
these surge functions based on multivariate linear regression be so much
poorer fits than the one based on calculus, let alone the one obtained us-
ing the computer search method? After all, multivariate linear regression
is supposed to produce the best fit! The key is that it does produce the
best linear fit to the set of transformed (t, log t, log C) data. If all we did
was to stop there, we would indeed have the best possible fit. However,
we started with (t, C) data and, in the process of transforming it via log-
arithms, we stretched the data values in a non-linear way. After we got
the corresponding multivariate linear regression equation, we undid the
original transformation, which entails another non-linear stretch, but this
time the inverse transformation is applied to the function, not to the data.
So, although the three regression planes we obtained were the best linear
fits to the three different sets of transformed data, the corresponding surge
functions are not necessarily the best, or even extremely good, fits to the
original data. They may be good fits if the original data fall very closely
into a surge function pattern; however, if the data are not extremely close to
such a pattern, the resulting function based on multivariate regression may
be a surprisingly poor fit.
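The effect can be seen in a stripped-down version of the problem. In the Python sketch below, p and b are held fixed at the regression values and only A is fitted, once on the log-transformed data and once on the original data. Holding p and b fixed is a simplifying assumption made here so that both minimizations have closed-form answers; it is not the procedure used in the article.

```python
import math

# (t, C) pairs from Table 2 (starting point taken as (0.05, 10)).
data = [(0.05, 10), (0.4, 50), (0.6, 320), (1.2, 440), (1.8, 410),
        (2.1, 350), (3, 250), (4, 170), (6, 80), (8, 50),
        (10, 30), (12, 20), (18, 12), (24, 6)]

p, b = 0.7613, 0.2859            # held fixed (simplifying assumption)
g = [t**p * math.exp(-b * t) for t, _ in data]   # model shape without A
C = [c for _, c in data]

# Best A for the transformed data: least squares on log C - log g = log A,
# whose solution is the mean of the left-hand side.
log_A = sum(math.log10(ci) - math.log10(gi) for ci, gi in zip(C, g)) / len(C)
A_log = 10.0 ** log_A

# Best A for the original data: minimizing sum (C - A*g)^2 over A
# gives A = sum(C*g) / sum(g*g).
A_direct = sum(ci * gi for ci, gi in zip(C, g)) / sum(gi * gi for gi in g)

def sse(A):
    """Sum of squares in the original (t, C) scale."""
    return sum((ci - A * gi) ** 2 for ci, gi in zip(C, g))

print(f"log-space fit:  A = {A_log:8.2f}, sum of squares = {sse(A_log):12,.1f}")
print(f"direct fit:     A = {A_direct:8.2f}, sum of squares = {sse(A_direct):12,.1f}")
```

The value of A that is optimal for the transformed data is not the one that minimizes the sum of the squares for the original data, even in this one-parameter setting.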
A comparable situation arises with the curve fitting routines in calculators
and in Excel; rather than directly fitting an exponential, logarithmic,
or power function to a set of data, these routines transform the data (using either
a semi-log plot or a log-log plot), find the regression line for the transformed
data, and then undo the transformation algebraically. One thereby
obtains the best possible line for the transformed data, but in
undoing the transformation, a nonlinear stretch takes place and the resulting
function is not necessarily the best fit within that family of functions.
(the darker curve) spikes to a considerably higher level than either the data
or the surge function do; however, it dies out more slowly than the surge
function and so is a better fit to the data after about t = 12 hours.
Figure 7. The rational function (darker curve) vs. the surge function.
CONCLUSIONS
APPENDIX A:
PERFORMING DIRECT CURVE FITTING
IN MATHEMATICA
APPENDIX B:
PERFORMING MULTIVARIATE
REGRESSION IN EXCEL
With Excel’s Analysis ToolPak installed, enter the data values for the dependent
variable C in Column A, say, and those for the independent variable
t in Column B and then create lists of values for log t in Column C and
log C in Column D, as is done in Table 2. Then click on Tools, followed
by Data Analysis, and finally Regression and OK. This will bring up the
Excel dialog box shown in Figure 9. In this dialog, the first box asks you
to Input Y Range; the Y-values are the values for the desired dependent
variable, here log C, which are in Column D. The second box asks you to
Input X Range; the X-values for t and log t are in Columns B and C. Next,
select the first option, Output Range, under Output options; this will give
the cells in which all the regression analysis output will appear. Select a
collection of cells that are empty, say from A20 to I47.
When you click on OK, Excel will perform the complete regression anal-
ysis and print the results in the cells that were indicated. Sample results
are shown in Figure 10. Of all the output results, the only ones that are of
significance to this discussion are the values for the regression coefficients
in Rows 36-38 and possibly the value for the coefficient of multiple deter-
mination R in Row 23. In particular, the constant coefficient is 2.428, the
coefficient of the first independent variable t is -0.0920 and the coefficient
of the second independent variable log t is 0.2251, leading to the regression
equation Y = 2.428 − 0.0920X1 + 0.2251X2, which is equivalent to
log C = 2.428 − 0.0920t + 0.2251 log t.
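For readers without Excel, the same kind of regression can be sketched in plain Python by forming and solving the normal equations. This is an illustrative alternative, not a description of how Excel computes its output; the function names are hypothetical. It is run here on the Table 2 values, so its output need not match the sample results in Figure 10, which may come from a different version of the data.

```python
import math

def solve_linear_system(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def multiple_regression(rows, y):
    """Least-squares b0, b1, ... for y = b0 + b1*x1 + b2*x2 + ...

    Each entry of rows holds the (x1, x2, ...) values of one observation.
    """
    X = [[1.0] + list(r) for r in rows]      # prepend the constant column
    k = len(X[0])
    XtX = [[sum(xr[i] * xr[j] for xr in X) for j in range(k)] for i in range(k)]
    Xty = [sum(xr[i] * yi for xr, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(XtX, Xty)

# (t, C) pairs from Table 2.
data = [(0.05, 10), (0.4, 50), (0.6, 320), (1.2, 440), (1.8, 410),
        (2.1, 350), (3, 250), (4, 170), (6, 80), (8, 50),
        (10, 30), (12, 20), (18, 12), (24, 6)]

rows = [(t, math.log10(t)) for t, _ in data]   # X range: t and log t
y = [math.log10(C) for _, C in data]           # Y range: log C

b0, b1, b2 = multiple_regression(rows, y)
print(f"log C = {b0:.4f} + ({b1:.4f}) t + ({b2:.4f}) log t")
```

Solving the normal equations directly is adequate for a small, well-scaled problem like this one; for larger or nearly collinear data, a dedicated least-squares routine would be the safer choice.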
BIOGRAPHICAL SKETCH
Sheldon Gordon is Professor of Mathematics at Farmingdale State Univer-
sity of New York. He is a member of a number of national committees
involved in undergraduate mathematics education and is leading a national
initiative to refocus the courses below calculus. He is the principal author
of Functioning in the Real World and a co-author of the texts developed
under the Harvard Calculus Consortium.