Prism 6 Curve Fitting Guide
GraphPad Software Inc.
www.graphpad.com
Table of Contents

Foreword

1 Understanding mathematical models
    What is a model?
    Three example models
    The problem with choosing models automatically
    Advice: How to understand a model
2 Principles of linear regression
    The goal of linear regression
    How linear regression works
    Comparing linear regression to correlation
    Comparing linear regression to nonlinear regression
    Advice: Look at the graph
    Advice: Avoid Scatchard, Lineweaver-Burk and similar transforms
3 Getting started with nonlinear regression
    Distinguishing nonlinear regression from other kinds of regression
    The goal of nonlinear regression
    The six steps of nonlinear regression
    Preparing data for nonlinear regression
    Don't fit a model to smoothed data
    Reparameterizing an equation can help
4 Weighted nonlinear regression
    The need for unequal weighting in nonlinear regression
    Math theory of weighting
    Don't use weighted regression with normalized data
    What are the consequences of choosing the wrong weighting method
5 The many uses of global nonlinear regression
    What is global nonlinear regression?
    The uses of global nonlinear regression
    Using global regression to fit incomplete datasets
    Fitting models where the parameters are defined by multiple data sets
    Advice: Don't use global regression if datasets use different units
6 Comparing fits of nonlinear models
    Questions that can be answered by comparing models
    Approaches to comparing models
    How the F test works to compare models
    How the AICc computations work
7 Outlier elimination and robust nonlinear regression
    When to use automatic outlier removal
    When to avoid automatic outlier removal
    Outliers aren't always 'bad' points
    The ROUT method of identifying outliers
    Robust nonlinear regression
8 How nonlinear regression works
    Why minimize the sum-of-squares?
    How nonlinear regression works
    Nonlinear regression with unequal weights
    How standard errors and confidence intervals are computed
    How confidence and prediction bands are computed
    Replicates
    How dependency is calculated
    How confidence and prediction bands are computed
    Who developed nonlinear regression?

1 What's new in Prism 6 (regression)?
2 Linear regression with Prism
    How to: Linear regression
        Finding the best-fit slope and intercept
        Interpolating from a linear standard curve
        Advice: When to fit a line with nonlinear regression
    Results of linear regression
        Slope and intercept
        r2, a measure of goodness-of-fit of linear regression
        Is the slope significantly different than zero?
        Comparing slopes and intercepts
        Runs test following linear regression
        Analysis checklist: Linear regression
        Graphing tips: Linear regression
        Questions and answers
    Deming regression
        Key concepts: Deming regression
        How to: Deming regression
        Q&A: Deming regression
        Analysis checklist: Deming regression
3 Interpolating from a standard curve
    Key concept: Interpolating
    How to interpolate
    Example: Interpolating from a sigmoidal standard curve
    Equations used for interpolating
    The results of interpolation
    Interpolating with replicates in side-by-side subcolumns
    Interpolating several data sets at once
    When X values are logarithms
    Analysis checklist: Interpolating
    Reasons for blank (missing) results
    Q&A: Interpolating
    How Prism interpolates
    Standard Addition Method
4 Nonlinear regression tutorials
    Example: Fitting an enzyme kinetics curve
    Example: Comparing two enzyme kinetics models
    Example: Automatic outlier elimination (exponential decay)
    Example: Global nonlinear regression (dose-response curves)
    Example: Ambiguous fit (dose-response)
5 Nonlinear regression with Prism
    How to fit a model with Prism
    Which choices are essential?
6 Interpreting nonlinear regression results
    Interpreting results: Nonlinear regression
        Standard errors and confidence intervals of parameters
        Normality tests of residuals
        R squared
        Sum-of-squares
        Why Prism doesn't report the chi-square of the fit
        Runs test
        Replicates test
        Dependency of each parameter
        Covariance matrix
        Confidence and prediction bands
        Hougaard's measure of skewness
        Could the fit be a local minimum?
        Outliers
        Troubleshooting nonlinear regression
        Why results in Prism 5 and 6 can differ from Prism 4
    Interpreting results: Comparing models
        Interpreting comparison of models
        Interpreting the extra sum-of-squares F test
        Interpreting AIC model comparison
        How Prism compares models when outliers are eliminated
        Interpreting the adjusted R2
    Analysis checklists: Nonlinear regression
        Analysis checklist: Fitting a model
        Analysis checklist: Comparing nonlinear fits
        Analysis checklist: Interpolating from a standard curve
    Error messages from nonlinear regression
        "Bad initial values"
        "Interrupted"
        "Not converged"
        "Ambiguous"
        "Hit constraint"
        "Don't fit"
        "Too few points"
        "Perfect fit"
        "Impossible weights"
        "Equation not defined"
        "Can't calculate"
7 Models (equations) built-in to Prism
    Dose-response - Key concepts
        What are dose-response curves?
        The EC50
        Confidence intervals of the EC50
        Hill slope
        Choosing a dose-response equation
        Pros and cons of normalizing the data
        Converting concentration to log(concentration)
        The term "logistic"
        50% of what? Relative vs absolute IC50.
        Fitting the absolute IC50
        Incomplete dose-response curves
        Troubleshooting fits of dose-response curves
    Dose-response - Stimulation
        Equation: log(agonist) vs. response
        Equation: log(agonist) vs. response -- Variable slope
        Equation: log(agonist) vs. normalized response
        Equation: log(agonist) vs. normalized response -- Variable slope
    Dose-response - Inhibition
        Equation: log(inhibitor) vs. response
        Equation: log(inhibitor) vs. response -- Variable slope
        Equation: log(inhibitor) vs. normalized response
        Equation: log(inhibitor) vs. normalized response -- Variable slope
    Dose-response -- Special
        Asymmetrical (five parameter)
        Equation: Biphasic dose-response
        Equation: Bell-shaped dose-response
        Equation: Operational model - Depletion
        Equation: Operational model - Partial agonist
        Equation: Gaddum/Schild EC50 shift
        Equation: EC50 shift
        Equation: Allosteric EC50 shift
        Equation: ECanything
    Receptor binding - Key concepts
        Law of mass action
        Nonspecific binding
        Ligand depletion
        The radioactivity web calculator
    Receptor binding - Saturation binding
        Key concepts: Saturation binding
        Equation: One site -- Total binding
        Equation: One site -- Fit total and nonspecific binding
        Equation: One site -- Total, accounting for ligand depletion
        Equation: One site -- Specific binding
        Equation: One site -- Specific binding with Hill slope
        Binding potential
        Equation: Two sites -- Specific binding only
        Equation: Two sites -- Fit total and nonspecific binding
        Equation: One site with allosteric modulator
    Receptor binding - Competitive binding
        Key concepts: Competitive binding
        Equation: One site - Fit Ki
        Equation: One site - Fit logIC50
        Equation: Cumulative Gaussian distribution
        Equation: Lorentzian
    Sine waves
        Standard sine wave
        Damped sine wave
        Sinc wave
    Classic equations from prior versions of Prism
        Equation: One site binding (hyperbola)
        Equation: Two site binding
        Equation: Sigmoidal dose-response
        Equation: Sigmoidal dose-response (variable slope)
        Equation: One site competition
        Equation: Two site competition
        Equation: Boltzmann sigmoid
        Equation: One phase exponential decay
        Equation: Two phase exponential decay
        Equation: One phase exponential association
        Equation: Two phase exponential association
        Equation: Exponential growth
        Equation: Power series
        Equation: Sine wave
        Equation: Gaussian distribution
8 Entering a user-defined model into Prism
    Overview: User-defined equations
    How to: Enter a new equation
    How to: Clone an equation
    How to: Manage your list of equations
    Syntax of user-defined equations
    Multiline models
    Limitations when entering equations
    Entering a differential equation
    Entering an implicit equation
    Available functions for user-defined equations
    Fitting different segments of the data to different models
    Fitting different models to different data sets
    Column constants
    Defining equation with two (or more) independent variables
    Reparameterizing an equation
    Rules for initial values
    Default constraints
    Reporting transforms of parameters
9 Plotting a function
    How to: Plot a function
    Plotting t, z, F or chi-square distributions
    Plotting a binomial or Poisson distribution
10 Fitting a curve without a model
    Spline and Lowess curves
    Using nonlinear regression with an empirical model

Index
1.1 Understanding mathematical models
1.1.1 What is a model?
The whole point of nonlinear regression is to fit a model to your data. So that
raises the question: What is a model?
A mathematical model is a description of a physical, chemical or biological state
or process. Using a model can help you think about chemical and physiological
processes or mechanisms, so you can design better experiments and
comprehend the results. When you fit a model to your data, you obtain best-fit
values that you can interpret in the context of the model.
    A mathematical model is neither a hypothesis nor a theory. Unlike scientific
    hypotheses, a model is not verifiable directly by an experiment. For all
    models are both true and false.... The validation of a model is not that it is
    "true" but that it generates good testable hypotheses relevant to important
    problems.
    R. Levins, Am. Scientist 54:421-31, 1966
Your goal in using a model is not necessarily to describe your system perfectly.
A perfect model may have too many parameters to be useful. Rather, your
goal is to find as simple a model as possible that comes close to describing your
system. You want a model to be simple enough so you can fit the model to
data, but complicated enough to fit your data well and give you parameters
that help you understand the system, reach valid scientific conclusions, and
design new experiments.
1.1.2 Three example models

Reality check
Mathematically, the equation works for any value of X. However, the results
only make sense with certain values.
- Negative X values are meaningless, as concentrations cannot be negative.
- The model may fail at high concentrations of substance where the reaction is no longer limited by the concentration of substance.
- The model may also fail at high concentrations if the solution becomes so dark (the optical density is so high) that little light reaches the detector. At that point, the noise of the instrument may exceed the signal.
It is not unusual for a model to work only for a certain range of values. You just
have to be aware of the limitations, and not try to use the model outside of its
useful range.
Exponential decay
Exponential equations are used whenever the rate at which something happens is proportional to the amount which is left. Examples include ligands dissociating from receptors and the decay of radioactive isotopes. In the form of a differential equation, the rate of change of Y is dY/dX = -k*Y.
Converting the differential equation into a model that defines Y at various times
requires some calculus. There is only one function whose derivative is
proportional to Y, the exponential function. Integrate both sides of the equation
to obtain a new exponential equation that defines Y as a function of X (time),
the rate constant k, and the value of Y at time zero, Y0.
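For reference, a minimal sketch of that integration (using the same symbols, with k the rate constant and Y0 the value of Y at time zero):

    \frac{dY}{dX} = -kY \;\Rightarrow\; \int \frac{dY}{Y} = -k \int dX \;\Rightarrow\; \ln Y = -kX + \ln Y_{0} \;\Rightarrow\; Y = Y_{0}\,e^{-kX}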
Equilibrium binding
When a ligand interacts with a receptor, or when a substrate interacts with an
enzyme, the binding follows the law of mass action.
1.1.3 The problem with choosing models automatically
1.1.4 Advice: How to understand a model
Tip 1. Make sure you know the meaning and units of X and Y
For this example, Y is enzyme activity, which can be expressed in various units depending on the enzyme. X is the substrate concentration in molar, micromolar, or some other unit of concentration.
1.2 Principles of linear regression
1.2.1 The goal of linear regression
[Figure: a straight line plotted on X and Y axes, annotated with its Y intercept and its slope (ΔY/ΔX).]
The slope quantifies the steepness of the line. It equals the change in Y for each
unit change in X. It is expressed in the units of the Y axis divided by the units of
the X axis. If the slope is positive, Y increases as X increases. If the slope is
negative, Y decreases as X increases.
The Y intercept is the Y value of the line when X equals zero. It defines the
elevation of the line.
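Written as an equation, with the two best-fit parameters just described, the line is:

    Y = intercept + slope*X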
Correlation and linear regression are not the same. Review the differences.
1.2.2 How linear regression works
1.2.3 Comparing linear regression to correlation
Assumptions
With linear regression, the X values can be measured or can be a variable controlled by the
experimenter. The X values are not assumed to be sampled from a Gaussian distribution.
The distances of the points from the best-fit line are assumed to follow a Gaussian distribution, with the SD of the scatter not related to the X or Y values.
The correlation coefficient itself is simply a way to describe how two variables vary together,
so it can be computed and interpreted for any two variables. Further inferences, however,
require an additional assumption -- that both X and Y are measured (are interval or ratio
variables), and both are sampled from Gaussian distributions. This is called a bivariate
Gaussian distribution. If those assumptions are true, then you can interpret the confidence
interval of r and the P value testing the null hypothesis that there really is no correlation
between the two variables (and any correlation you observed is a consequence of random
sampling).
1.2.4 Comparing linear regression to nonlinear regression
Using nonlinear regression to analyze data is only slightly more difficult than
using linear regression. Your choice of linear or nonlinear regression should be
based on the model you are fitting. Do not use linear regression just to avoid
using nonlinear regression. Avoid transformations such as Scatchard or Lineweaver-Burk transforms whose only goal is to linearize your data.
1.2.5 Advice: Look at the graph
1.2.6 Advice: Avoid Scatchard, Lineweaver-Burk and similar transforms
Before nonlinear regression was readily available, the best way to analyze
nonlinear data was to transform the data to create a linear graph, and then
analyze the transformed data with linear regression. Examples include
Lineweaver-Burk plots of enzyme kinetic data, Scatchard plots of binding data,
and logarithmic plots of kinetic data.
These methods are outdated and should not be used to analyze data.
The problem with these methods is that the transformation distorts the
experimental error. Linear regression assumes that the scatter of points around
the line follows a Gaussian distribution and that the standard deviation is the
same at every value of X. These assumptions are rarely true after transforming
data. Furthermore, some transformations alter the relationship between X and
Y. For example, in a Scatchard plot the value of X (bound) is used to calculate Y
(bound/free), and this violates the assumption of linear regression that all
uncertainty is in Y while X is known precisely. It doesn't make sense to minimize
the sum of squares of the vertical distances of points from the line, if the same
experimental error appears in both X and Y directions.
Since the assumptions of linear regression are violated, the values derived from
the slope and intercept of the regression line are not the most accurate
determinations of the variables in the model. Considering all the time and effort
you put into collecting data, you want to use the best possible technique for
analyzing your data. Nonlinear regression produces the most accurate results.
The figure below shows the problem of transforming data. The left panel shows
data that follows a rectangular hyperbola (binding isotherm). The right panel is a
Scatchard plot of the same data. The solid curve on the left was determined by
nonlinear regression. The solid line on the right shows how that same curve
would look after a Scatchard transformation. The dotted line shows the linear
regression fit of the transformed data. Scatchard plots can be used to
determine the receptor number (Bmax, determined as the X-intercept of the
linear regression line) and dissociation constant (Kd, determined as the negative
reciprocal of the slope). Since the Scatchard transformation amplified and
distorted the scatter, the linear regression fit does not yield the most accurate
values for Bmax and Kd.
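As an illustration of this point, here is a minimal sketch in Python (invented parameter values and simulated scatter, not the guide's data; it assumes NumPy and SciPy are installed) comparing a direct nonlinear fit of the binding isotherm with a linear fit of the Scatchard-transformed values:

    import numpy as np
    from scipy.optimize import curve_fit

    rng = np.random.default_rng(0)
    Bmax_true, Kd_true = 100.0, 5.0                    # hypothetical "true" values
    free = np.linspace(0.5, 40, 12)                    # free ligand concentrations
    bound = Bmax_true * free / (Kd_true + free) + rng.normal(0, 3, free.size)  # Gaussian scatter in Y only

    # Nonlinear regression on the untransformed binding isotherm
    popt, _ = curve_fit(lambda x, bmax, kd: bmax * x / (kd + x), free, bound, p0=[80.0, 3.0])

    # Scatchard transform (bound/free vs. bound), then linear regression: the outdated approach
    slope, intercept = np.polyfit(bound, bound / free, 1)
    Bmax_scatchard = -intercept / slope                # X-intercept of the Scatchard line
    Kd_scatchard = -1.0 / slope                        # negative reciprocal of the slope

    print("nonlinear fit:", popt)
    print("Scatchard fit:", Bmax_scatchard, Kd_scatchard)

On repeated simulations the direct nonlinear fit tends to recover Bmax and Kd more accurately, because the transform spreads the experimental error into both axes, as described above.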
Don't use linear regression just to avoid using nonlinear regression. Fitting
curves with nonlinear regression is not difficult.
Although it is usually inappropriate to analyze transformed data, it is often
helpful to display data after a linear transformation. Many people find it easier to
visually interpret transformed data. This makes sense because the human eye
and brain evolved to detect edges (lines), not to detect rectangular
hyperbolas or exponential decay curves. Even if you analyze your data with
nonlinear regression, it may make sense to display the results of a linear
transformation.
1.3 Getting started with nonlinear regression
1.3.1 Distinguishing nonlinear regression from other kinds of regression
Polynomial regression
A polynomial model has this form: Y = A + BX + CX^2 + DX^3 ...
Like linear regression, it is possible to fit polynomial models without fussing with initial values. For this reason, some programs (e.g., Excel) can perform polynomial regression, but not nonlinear regression. And some programs have separate modules for fitting data with polynomial and nonlinear regression. Prism fits polynomial models using the same analysis it uses to fit nonlinear models. Polynomial equations are available within Prism's nonlinear regression analysis.
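As a quick illustration of why no initial values are needed (invented numbers, not from the guide; assumes NumPy), the best-fit coefficients of a polynomial have a direct least-squares solution:

    import numpy as np

    # Invented example data
    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([1.2, 2.9, 9.1, 20.8, 38.5, 63.0])

    # Fit Y = A + B*X + C*X^2 + D*X^3 by ordinary least squares, no initial values required
    D, C, B, A = np.polyfit(x, y, 3)   # polyfit returns coefficients from the highest power down
    print(A, B, C, D)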
Multiple regression
A multiple regression model has more than one independent (X) variable. As with linear and nonlinear regression, the dependent (Y) variable is a measurement.
GraphPad Prism 6 does not perform multiple regression. But by using column constants, you can effectively fit models with two independent variables in some circumstances.
Logistic regression
A logistic regression model is used when the outcome, the dependent (Y)
variable, has only two possible values. Did the person get the disease or not?
Did the student graduate or not? There can be one or several independent
variables. These independent variables can be a variable like age or blood
pressure, or have discrete values to encode which treatment each subject
received.
GraphPad Prism 6 does not perform logistic regression.
1.3.2 The goal of nonlinear regression
perhaps to draw a graph with a smooth curve. If this is your goal, you can
assess it purely by looking at the graph of data and curve. There is no need to
learn much theory. Jump right to an explanation of interpolation with Prism.
Least-squares regression
The most common assumption is that data points are randomly scattered
around an ideal curve (or line) with the scatter following a Gaussian distribution.
If you accept this assumption, then the goal of regression is to adjust the
model's parameters to find the curve that minimizes the sum of the squares of
the vertical distances of the points from the curve.
Why minimize the sum of the squares of the distances? Why not simply
minimize the sum of the actual distances?
If the random scatter follows a Gaussian distribution, it is far more likely to have two medium-sized deviations (say 5 units each) than to have one small deviation (1 unit) and one large one (9 units). A procedure that minimized the sum of the absolute values of the distances would have no preference between a curve that was 5 units away from two points and one that was 1 unit away from one point and 9 units from another. The sum of the distances (more precisely, the sum of the absolute values of the distances) is 10 units in each case. A procedure that minimizes the sum of the squares of the distances prefers to be 5 units away from two points (sum-of-squares = 50) rather than 1 unit away from one point and 9 units away from another (sum-of-squares = 82). If the scatter is Gaussian (or nearly so), the curve determined by minimizing the sum-of-squares is most likely to be correct.
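A minimal sketch of that arithmetic in Python, using the two hypothetical sets of residuals from the example above:

    # Hypothetical residuals from the example above
    even = [5, 5]      # a curve that is 5 units from each of two points
    uneven = [1, 9]    # a curve that is 1 unit from one point and 9 units from another

    sum_abs = sum(abs(r) for r in even), sum(abs(r) for r in uneven)
    sum_sq = sum(r**2 for r in even), sum(r**2 for r in uneven)

    print(sum_abs)   # (10, 10): absolute distances cannot tell the two curves apart
    print(sum_sq)    # (50, 82): least squares prefers the balanced fit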
1.3.3 The six steps of nonlinear regression
values are fairly far from the correct values. You'll get the same best-fit curve
no matter what initial values you use, unless the initial values are extremely far
from correct. Initial values matter more when your data have a lot of scatter or
your model has many variables.
Step 4. If you are fitting two or more data sets at once, decide whether to
share any parameters
If you enter data into two or more data set columns, Prism will fit them all in
one analysis. But each fit will be independent of the others unless you specify
that one or more parameters are shared. When you share parameters, the
analysis is called a global nonlinear regression 41 .
The figure shows the number of hurricanes over time. The left panel shows the
number of hurricanes in each year, which jumps around a lot. To make it easier
to spot trends, the right panel shows a rolling average. The value plotted for
each year is the average of the number of hurricanes for that year plus the
prior eight years. This smoothing lets you see a clear trend.
But there is a problem. These are not real data. Instead, the values plotted in
the left panel were chosen randomly (from a Poisson distribution, with a mean
of 10). There is no pattern. Each value was randomly generated without regard
to the previous (or later) values.
Creating the running average creates the impression of trends by ensuring that
any large random swing to a high or low value is amplified, while the year-to-year
variability is muted.
You should not fit a model to the rolling-average data with linear or nonlinear
regression, or compute a correlation coefficient. Any such results would be
invalid and misleading. The problem is that regression assumes that each
value is independent of the others, but the rolling averages are not at all
independent of each other. Rather, each value is included in the computation of
several of its neighbors.
This example is adapted from Briggs (2008).
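If you want to see this effect for yourself, the following sketch (Python with NumPy; the seed and window length are arbitrary) generates independent Poisson counts with a mean of 10 and computes a nine-year rolling average. The smoothed series will often appear to trend even though the underlying values are pure noise:

    import numpy as np

    rng = np.random.default_rng(0)
    counts = rng.poisson(lam=10, size=100)   # independent yearly counts, no real trend

    # Average of each year plus the prior eight years (nine-year window)
    window = 9
    rolling = np.convolve(counts, np.ones(window) / window, mode="valid")

    print(counts[:10])
    print(np.round(rolling[:10], 2))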
What is reparameterizing?
There are two forms of the model used to fit the sigmoidal enzyme kinetics
data to a standard model:
Y=Vmax*X^h/(Khalf^h + X^h)
Y=Vmax*X^h/(Kprime + X^h)
The two equations are equivalent. They both fit Vmax (the maximum activity
extrapolated to very high concentrations of substrate) and h (Hill slope,
describing the steepness of the curve). But one model fits Khalf (the
concentration needed to obtain a velocity half of maximal) and the other fits
Kprime (a more abstract measure of substrate action).
Which model is best? The two are equivalent, with Kprime equal to Khalf^h, so
they will generate exactly the same curve.
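The equivalence is easy to verify numerically. In this sketch (Python with NumPy; the parameter values are arbitrary), the two forms produce identical curves once Kprime is set equal to Khalf^h:

    import numpy as np

    Vmax, h, Khalf = 100.0, 2.0, 3.0
    Kprime = Khalf ** h

    X = np.linspace(0.1, 20.0, 50)
    Y_khalf = Vmax * X**h / (Khalf**h + X**h)
    Y_kprime = Vmax * X**h / (Kprime + X**h)

    print(np.allclose(Y_khalf, Y_kprime))   # True: same curve, different parameterization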
Clearly the distribution of Khalf is quite symmetrical, and looks Gaussian. The
skewness is close to zero, as expected for a symmetrical distribution. In
contrast, the distribution of Kprime is quite skewed. Note that a few of the
simulations had best-fit values of Kprime greater than 100. The skewness value
(4.89) confirms what is obvious by inspection -- the distribution is far from
symmetrical.
"99% CI"
5.0%
1.0%
8.8%
4.8%
5.1%
1.0%
These results show that Khalf is well behaved, as expected given its symmetry
(see above). The 95% confidence interval is expected to miss the true value in
5.0% of the simulations. In fact, it happened 5.1% of the time. Similarly, the
99% CI is expected to miss the true value in 1.0% of the simulations, which is
exactly what happened. In contrast, Kprime is less well behaved. The intervals
computed to be 95% confidence intervals were not wide enough, so they missed the
true value in 8.8% of the simulations. The 99% intervals were similarly too narrow,
and missed the true value in 4.8% of the simulations. Thus the
confidence intervals computed to be 99% intervals actually turned out to be
roughly 95% intervals.
These simulations show the advantage of choosing the equation that fits Khalf,
rather than the one that fits Kprime. Khalf has a symmetrical distribution so the
confidence intervals computed from these fits can be interpreted at face value.
One way to choose is to match other text books and papers, so your results
can easily be compared to others. Another approach is to choose the form that
fits the way you think. If you prefer to think graphically, choose Khalf. If you
think mechanistically, choose Kprime.
But the choice is more than a matter of convenience and convention. The rest
of this article explains how the choice of model determines the accuracy of the
confidence intervals.
Both graphs above show dose-response curves, with the response measured as ten
replicate values at each dose. In the left graph, the standard deviation of
those replicates is consistent. It is about the same all the way along the curve.
In the right graph, the standard deviation of the replicates is related to the value
of Y. As the curve goes up, variation among replicates increases.
These data are simulated. In both cases, the scatter among replicates is
sampled from a Gaussian distribution. In the graph on the left, the SD of that
Gaussian distribution is the same for all doses. On the right, the SD is a
constant fraction of the mean Y value. When a response is twice as high as
another response, the standard deviation among replicates is also twice as
large. In other words, the coefficient of variation (CV) is constant.
What happens if you fit a model to the data on the right without taking into
account the fact that the scatter increases as Y increases? Consider two doses
that give responses differing by a factor of two. The average
distance of the replicates from the true curve will be twice as large for the
higher response. Since regression minimizes the sum of the squares of those
distances, those points will be expected to contribute four times as much to the
sum-of-squares as the points with the smaller average Y value. In other words,
the set of replicates whose average Y value is twice that of another set will be
given four times as much weight. This means, essentially, that the curve fitting
procedure will work harder to bring the curve near these points, and relatively
ignore the points with lower Y values. You'd need to have four times as many
replicates in the lower set to equalize the contribution to the sum-of-squares.
The goal of weighting is for points anywhere on the curve to contribute equally
to the sum-of-squares. Of course random factors will give some points more
scatter than others. But the goal of weighting is to make those differences be
entirely random, not related to the value of Y. The term weight is a bit
misleading, since the goal is to remove that extra weight from the points with
high Y values. The goal is really to unweight.
Prism offers six choices on the Weights tab 160 of nonlinear regression.
points will therefore contribute more to the sum-of-squares, and thus dominate
the calculations.
The solution is not to minimize the sum-of-squares, but rather to minimize the
weighted sum-of-squares. In other words, minimize the relative distances of
the Y values of the data (Ydata) from the Y values of the curve (Ycurve). When
you choose relative weighting, nonlinear regression minimizes this quantity:
Sum of [(Ydata - Ycurve)/Ycurve]^2  =  Sum of [(Ydata - Ycurve)^2 / Ycurve^2]
The left side is easiest to understand. For each point, compute how far it is (in
the Y direction) from the curve, divide that value by the Y value of the curve,
and square that ratio. Add up that squared ratio for all points. The right side is
equivalent. It squares the numerator and denominator separately, and then
computes the ratio of those two squared values. This is the way most
mathematical statisticians think about weighting, so relative weighting is also
called weighting by 1/Y^2.
Poisson weighting
Weighting by 1/Y is a compromise between minimizing the actual distance
squared and minimizing the relative distance squared. One situation where 1/Y
weighting is appropriate is when the Y values follow a Poisson distribution. This
would be the case when Y values are radioactive counts and most of the
scatter is due to counting error. With the Poisson distribution, the standard
error of a value equals the square root of that value. Therefore you divide the
distance between the data and the curve by the square root of the value, and
then square that result. The equation below shows the quantity that Prism
minimizes, and shows why it is called weighting by 1/Y (but note that Prism
actually weights by the absolute value of Y):
Sum of [(Ydata - Ycurve)/sqrt(Ycurve)]^2  =  Sum of [(Ydata - Ycurve)^2 / Ycurve]
General weighting
The first three equations below show how absolute (no weighting), Poisson and
relative weighting are related; general weighting covers them all. Note that taking
anything to the zero power results in 1.0, so the first equation simply minimizes the
sum of the squared distances:
Absolute (no weighting):   Sum of [(Ydata - Ycurve)^2 / Ycurve^0]
Poisson weighting (1/Y):   Sum of [(Ydata - Ycurve)^2 / Ycurve^1]
Relative weighting (1/Y^2): Sum of [(Ydata - Ycurve)^2 / Ycurve^2]
General weighting (1/Y^K):  Sum of [(Ydata - Ycurve)^2 / Ycurve^K]
The last equation shows general weighting. You enter K so you can
customize the weighting scheme to fit your data. Generally this choice is used
with values of K between 1.0 and 2.0. Reference 1 below uses this approach.
If you want to experimentally determine the best value of K, you can do so:
1.Collect data with lots (over a dozen; maybe several dozen) replicates at
many points along the curve.
2.Plot the data the usual way to make sure the data seem correct.
3.Create a second graph that ignores the X values (time or concentration...).
Instead, in this new graph, X is the logarithm of mean of the replicate Y
values for each point on the first graph, and Y on this new graph is the
logarithm of variance (square of the standard deviation) among the replicate
values. You can use either natural logarithms or logarithms base 10. It
doesn't matter, so long as both logarithms use the same base.
4.Fit a straight line to this graph using linear regression. Since the assumption
of a Gaussian variation around this line is dubious, use nonlinear regression
and choose a robust fit.
5.The slope of this regression line is K. If K is close to 0.0, then the SD does
not vary with Y so no weighting is needed. If K is close to 2.0, then the
standard deviation is proportional to Y, and relative weighting is appropriate.
If K is close to 1.0, choose Poisson weighting. If K has some other value, use
general weighting and enter this value of K as a constant.
Note that if Ycurve is negative, Prism actually takes the absolute value of
Ycurve to the K power.
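The procedure above can be sketched in a few lines of code. This illustration (Python with NumPy and SciPy, which are not part of Prism; the replicate values are made up) estimates K as the slope of log(variance) against log(mean) across the replicate groups. For simplicity it uses an ordinary straight-line fit rather than the robust fit suggested in step 4:

    import numpy as np
    from scipy import stats

    # Hypothetical replicate Y values collected at several points along the curve
    replicate_groups = [
        [9.8, 10.5, 10.1, 9.6, 10.9, 10.2, 9.9, 10.4, 10.7, 9.5, 10.0, 10.3, 9.7],
        [48.0, 53.0, 50.0, 47.0, 55.0, 52.0, 49.0, 51.0, 54.0, 46.0, 50.0, 52.0, 48.0],
        [195.0, 210.0, 205.0, 190.0, 220.0, 200.0, 198.0, 215.0, 208.0, 192.0, 203.0, 207.0],
    ]

    log_mean = [np.log10(np.mean(g)) for g in replicate_groups]
    log_var = [np.log10(np.var(g, ddof=1)) for g in replicate_groups]

    # The slope estimates K: near 0 means no weighting is needed, near 1 suggests
    # Poisson weighting, near 2 suggests relative weighting.
    result = stats.linregress(log_mean, log_var)
    print(round(result.slope, 2))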
This method is most useful when you have formatted the data table for entry
of SD values, but then entered values that you had calculated elsewhere based
on understanding how the scatter (or errors) arise in your experimental
system. The "SD" values you enter should be computed weighting factors, not
the actual SD of the data.
If you enter the actual SD into the SD subcolumn, or enter replicate values so
Prism computes the SD, then Prism will use these actual SD values as weighting
factors. This is way less useful than it sounds. With small to moderate sample
sizes, the SD will jump around a lot by chance, and it is not appropriate to use
these random SD values for weighting. Weighting should be by predicted SD not
actual SD (which is subject to random factors).
Of course, weighting by SD is impossible if any of the entered SD values are
zero. It also is impossible if Prism is computing the SD from replicates and there
is only one replicate, or if all replicates are identical (so their SD equals zero).
Reference
1. LM Levasseur, H Faessel, HK Slocum, and WR Greco, Implications for
Clinical Pharmacodynamic Studies of the Statistical Characterization of an In
Vitro Antiproliferation Assay, J. Pharmacokinetics and Biopharmaceutics, 26:
717-733, 1998.
Word file with equations
These data fit nicely using relative weighting. This minimizes the sum of the
squares of the relative distances between the points and the curve. In other
words, it minimizes:
Sum of [(Ydata - Ycurve)/Ycurve]^2
I ran 10,000 simulations, and found that in every case the fit worked well and
gave a reasonable answer (with the EC50 within the range of the data). No
surprise so far. The data were fit assuming a model that exactly matched
the method used to simulate the data, and those fits worked well.
I answered that question with simulations. Of 1,000 simulated data sets, 223
could not be fit at all. Moreover, 60 simulated data sets gave nonsense results,
with the EC50 outside of the range of the data. The remaining 72% of the
simulations seemed ok, but the confidence intervals were very wide in some.
Bottom line
The whole idea of weighted nonlinear regression is to match the weighting
scheme used by the regression to the variation in the actual data. If you
normalize the data, none of the usual weighting schemes will work well.
If you really want to show your data on a normalized axis running from 0% to
100%, you can do so. First fit the model to the actual data using an
appropriate weighting scheme. Then normalize both the data and the curve.
Details and link to Prism file
Simulations
I picked a very simple model -- a straight line. I simulated the data so the SD of
scatter at any point along the line is proportional to the Y value of that point.
The graph below shows a simulated data set. You can clearly see that the
scatter among replicates increases as the line goes up.
The line was fit to the data by "nonlinear" regression. Prism does not offer
differential weighting as part of its linear regression analysis, but "nonlinear"
regression can fit a straight line with many options not available in the linear
regression analysis 82 .
The red line used the default choice -- no weighting; minimize sum of squares.
The blue line used relative weighting. This choice is appropriate when you expect
the SD of replicate residuals to be proportional to Y. The two lines are not
identical.
I simulated 5000 such data sets using the Monte Carlo analysis of Prism 6.
Each of the 5000 simulated data sets was fit with both unweighted and
weighted (relative weighting) regression. I recorded both the best-fit value of
the slope and its standard error (SE) for both analyses of each of the 5000
data sets.
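The simulation itself is easy to sketch outside Prism. This example (Python with NumPy and SciPy; the true slope, the 20% CV and the seed are arbitrary choices) simulates straight-line data whose SD is proportional to Y and fits it with and without relative weighting:

    import numpy as np
    from scipy.optimize import curve_fit

    def line(x, slope, intercept):
        return slope * x + intercept

    rng = np.random.default_rng(1)
    x = np.tile(np.arange(1.0, 11.0), 3)                   # ten X values, three replicates each
    y_true = line(x, 1.0, 2.0)                             # true slope = 1.0
    y = y_true * (1 + 0.2 * rng.standard_normal(x.size))   # SD proportional to Y (20% CV)

    # Unweighted fit (ordinary least squares)
    p_unweighted, _ = curve_fit(line, x, y)

    # Relative weighting: curve_fit minimizes sum(((y - f(x)) / sigma)^2), so passing
    # sigma proportional to Y approximates weighting by 1/Y^2
    p_weighted, _ = curve_fit(line, x, y, sigma=y, absolute_sigma=False)

    print(round(p_unweighted[0], 3), round(p_weighted[0], 3))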
Since these are simulated data, we know the true population slope (it is 1.0).
So we can ask for each simulation whether the reported 95% confidence
interval included the true value. For the relative weighting simulations, the 95%
confidence intervals included the true value in 95.3% of the simulated data sets.
When those same data sets were analyzed without weighting, only 92.6% of
the "95%" confidence intervals included the true value.
Summary
In summary, when we chose the wrong weighting scheme in this example:
The best-fit value of the slope was less precise.
The SE of the slope was larger, and thus the confidence intervals were wider.
Even though the 95% confidence intervals were wider, they weren't wide
enough. The "95%" confidence interval only included the true value in fewer
than 95% of the simulations.
This is just one example. With other examples, the choice of weighting method
matters less. But with still other examples, it may matter more. It is worth
trying to choose the appropriate weighting scheme.
Download the Prism file for this example.
The first two uses of global fitting do not require writing special models. The
third use requires that you write a model for this purpose.
While the curves nicely fit the data points, the confidence intervals are quite
wide. We really haven't determined the EC50 with sufficient precision to make
useful conclusions. The problem is that the control data (squares) don't really
define the bottom plateau of the curve, and the treated data (circles) don't
really define the top plateau of the curve. Since the data don't define the
minimum and maximum responses very well, the data also don't define very
clearly the point half-way between the minimum and maximum responses.
Accordingly, the confidence intervals for each EC50 extend over more than an
order of magnitude. The whole point of the experiment was to determine the
two EC50 values, but there is an unacceptable amount of uncertainty in the
best-fit values of the EC50.
The problem is solved by sharing parameters. For this example, share the
parameters that define the top and bottom plateaus and the slope. But don't
share the EC50 value, since the EC50 values for control and treated data are
clearly distinct.
Here are the results.
The graph of the curves looks only slightly different. But now the program finds
the best-fit parameters with great confidence. The 95% confidence intervals for
the EC50 values span about a factor of two (compared to a factor of ten or
more when no parameters were shared).
The control data define the top of the curve pretty well, but not the bottom.
The treated data define the bottom of the curve pretty well, but not the top. By
fitting both data sets at once, sharing some parameters, both EC50 values
were determined with reasonable certainty.
Fitting models where the parameters are defined by multiple data sets
Global fitting is most useful when the parameters you care most about are
not defined by any one data set, but rather by the relationship between two
data sets.
Sample data
Choose the XY sample data set: Binding -- Saturation binding to total and
nonspecific.
Fit the data using nonlinear regression, open the "Binding -- Saturation" list of
equations, and choose "One site -- total and nonspecific". You'll see the fit
below.
The third line is preceded by <A> so only applies to the first data set (column
A, total binding). It defines the Y values in that data set to equal the sum of specific
and nonspecific binding.
The fourth line is preceded by <B> so only applies to the second data set, and
defines those Y values to equal nonspecific binding.
The equation is defined with the constraint that the parameters NS and
background are shared between the two data sets. That way, Prism finds one
best-fit value for NS and background, based on fitting both data sets. Since
Bmax and Kd are only used in fitting the first dataset, it wouldn't be meaningful
to share these parameters.
The parameters you care about (Bmax and Kd) cannot be determined precisely
by fitting just one dataset. But fitting a model that defines both data sets (and
their relationship) while sharing the parameter NS between the datasets, lets
Prism get the most information possible from the data.
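The same idea can be illustrated outside Prism. In this rough sketch (Python with SciPy; the data values and starting estimates are hypothetical), both data sets are combined into a single least-squares problem, so NS and Background are shared while Bmax and Kd are informed only by the total-binding data:

    import numpy as np
    from scipy.optimize import least_squares

    X = np.array([1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0])   # concentration
    Y_total = np.array([21.0, 34.0, 56.0, 88.0, 129.0, 183.0, 263.0, 400.0])
    Y_nonspecific = np.array([7.0, 9.0, 13.0, 21.0, 37.0, 69.0, 133.0, 261.0])

    def residuals(params):
        bmax, kd, ns, background = params
        total_curve = bmax * X / (kd + X) + ns * X + background   # data set A (total)
        ns_curve = ns * X + background                            # data set B (nonspecific)
        # One residual vector spanning both data sets: NS and Background are shared
        return np.concatenate([Y_total - total_curve, Y_nonspecific - ns_curve])

    fit = least_squares(residuals, x0=[100.0, 5.0, 1.0, 1.0])
    print(dict(zip(["Bmax", "Kd", "NS", "Background"], np.round(fit.x, 2))))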
For each data set, which of two equations (models) fits best?
Compare the fit of two models, taking into account differences in the number of
parameters to be fit. Most often, you will want to compare two related
equations. Comparing the fits of two unrelated equations is rarely helpful.
Example: Compare a one-phase exponential decay with a two-phase
exponential decay.
Do the best-fit values of selected unshared parameters differ between data sets?
Compare the fit when the selected parameter(s) are shared among all datasets
with the fit when those parameter(s) are fit individually to each dataset.
If you pick one parameter, you are asking whether the best-fit value of that
one parameter differs among datasets.
If you pick all the parameters, you are asking whether a single curve adequately
fits all the data points, or if you get a better fit with individual curves for each
dataset.
Example: Fit a family of dose-response curves and compare the fit when the
slope factor (Hill slope) is shared with the fit when each curve is fit individually.
This is a way to test whether the curves are parallel.
For each dataset, does the best-fit value of a parameter differ from a theoretical
value?
You may have theoretical reasons to believe that a parameter will have a
certain value (often 0.0, 100, or 1.0). Compare the fit when the parameter is
constrained to that value with the unconstrained fit.
Example: Test if a Hill Slope differs from 1.0 (a standard value).
Does one curve adequately fit all data sets?
This choice compares the fits of separate curves to each data set with the fit of
a single curve fit to all the data sets. It asks whether there is evidence that the
treatments did anything to shift the curves.
This choice is new in Prism 6 and is identical to choosing "Do the best-fit values
of selected unshared parameters differ between data sets?" and then selecting
all the parameters.
The null hypothesis is that the simpler model (the one with fewer parameters)
is correct. The improvement of the more complicated model is quantified as the
difference in sum-of-squares. You expect some improvement just by chance,
and the amount you expect by chance is determined by the number of data
points and the number of parameters in each model. The F test compares the
difference in sum-of-squares with the difference you would expect by chance.
The result is expressed as the F ratio, from which a P value is calculated.
The P value answers this question:
If the null hypothesis is really correct, in what fraction of experiments (the
size of yours) will the difference in sum-of-squares be as large as you
observed, or even larger?
If the P value is small, conclude that the simple model (the null hypothesis) is
wrong, and accept the more complicated model. Usually the threshold P value
is set at its traditional value of 0.05. If the P value is less than 0.05, then you
reject the simpler (null) model and conclude that the more complicated model
fits significantly better.
Information theory approach: Akaike's criterion (AIC)
This alternative approach is based on information theory, and does not use the
traditional hypothesis testing statistical paradigm. Therefore it does not
generate a P value, does not reach conclusions about statistical significance,
and does not reject any model.
The method determines how well the data supports each model, taking into
account both the goodness-of-fit (sum-of-squares) and the number of
parameters in the model. The results are expressed as the probability that each
model is correct, with the probabilities summing to 100%. If one model is much
more likely to be correct than the other (say, 1% vs. 99%), you will want to
choose it. If the difference in likelihood is not very big (say, 40% vs. 60%), you
will know that either model might be correct, so will want to collect more data.
How the calculations work 51 .
In one fit, the model is separately fit to each data set, and the goodness-of-fit
is quantified with a sum-of-squares. The sum of these sum-of-square values
quantifies the goodness of fit of the family of curves fit to all the data sets.
The other fit is a global fit to all the data sets at once, sharing specified
parameters. If you ask Prism whether one curve adequately fits all data sets,
then it shares all the parameters.
These two fits are nested (the second is a simpler case of the first, with fewer
parameters to fit) so the sums-of-squares (actually the sum of sum of squares
for the first fits) can be compared using either the F test 50 or Akaike's method
51 .
SS1 is the sum-of-squares for the simpler model (which will be higher) and SS2
is the sum-of-squares of the more complicated model.
If the more complicated model is correct, then you expect the relative increase
in sum-of-squares (going from the complicated to the simple model) to be greater than
the relative increase in degrees of freedom:
(SS1 - SS2)/SS2  >  (DF1 - DF2)/DF2
The F ratio quantifies the relationship between the relative increase in sum-of-squares
and the relative increase in degrees of freedom:
F = [(SS1 - SS2)/SS2] / [(DF1 - DF2)/DF2]
F ratios are always associated with a certain number of degrees of freedom for
the numerator and a certain number of degrees of freedom for the
denominator. This F ratio has DF1-DF2 degrees of freedom for the numerator,
and DF2 degrees of freedom for the denominator.
If the simpler model is correct you expect to get an F ratio near 1.0. If the ratio
is much greater than 1.0, there are two possibilities:
The more complicated model is correct.
The simpler model is correct, but random scatter led the more complicated
model to fit better. The P value tells you how rare this coincidence would be.
The P value answers this question:
If model 1 is really correct, what is the chance that you would randomly
obtain data that fits model 2 so much better?
If the P value is low, conclude that model 2 is significantly better than model 1.
Otherwise, conclude that there is no compelling evidence supporting model 2,
so accept the simpler model (model 1).
ΔAIC = N*ln(SS2/SS1) + 2*(DF1 - DF2)
The equation now makes intuitive sense. Like the F test, it balances the change
in goodness-of-fit as assessed by sum-of-squares with the change in the
number of degrees of freedom (due to differences in the number of parameters
to be fit). Since model 1 is the simpler model, it will almost always fit worse, so
SS1 will be greater than SS2. Since the logarithm of a fraction is always
negative, the first term will be negative. Model 1 has fewer parameters and so
has more degrees of freedom, making the last term positive. If the net result is
negative, that means that the difference in sum-of-squares is more than
expected based on the difference in number of parameters, so you conclude
that the more complicated model is more likely.
Prism reports the difference between the two AICc values as the AICc of the
simpler model minus the AICc of the more complicated model. When the more
complicated (more parameters) model has the lower AICc and so is preferred,
Prism reports the difference of AICc as a positive number. When the simpler
model has the lower AICc and so is preferred, Prism reports the difference of
AICc as a negative number.
The equation above helps you get a sense of how AIC works, balancing the
change in goodness-of-fit against the difference in the number of parameters. But you
don't have to use that equation. Just look at the individual AIC values, and
choose the model with the smallest AIC value. That model is most likely to be
correct.
Prism actually doesn't report the AIC, but rather the AICc. That value includes a
correction for low sample size. The equation is a bit more complicated, and is
more accurate with small sample size. With larger sample sizes, the AIC and
AICc are almost the same.
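For reference, the standard small-sample correction (not specific to Prism) is:
AICc = AIC + 2*K*(K+1)/(N - K - 1)
where N is the number of data points and K counts the estimated parameters (for least-squares fits, the model parameters plus one).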
Note that these calculations are based on information theory, and do not use
the traditional hypothesis testing statistical paradigm. Therefore there is no P
value, no conclusion about statistical significance, and no rejection of a
model.
From the difference in AICc values, Prism calculates and reports the probability
that each model is correct, with the probabilities summing to 100%. If one
model is much more likely to be correct than the other (say, 1% vs. 99%), you
will want to choose it. If the difference in likelihood is not very big (say, 40% vs.
60%), you will know that either model might be correct, so will want to collect
more data.
Note that this method simply compares the fits of the two models you chose.
It is possible that a third model, one you didn't choose, fits far better than
either model you chose.
Outlier elimination is misleading when you are fitting the wrong model
[Figure: two dose-response graphs (Response vs. Dose, 10^-9 to 10^-3 on a log axis). Left panel: the data fit to a standard dose-response model, with one point marked as an outlier. Right panel: the same data fit to a different model.]
The left panel above shows the data fit to a dose-response curve 256 . In this
figure, one of the points is a significant outlier. But this interpretation assumes
that you've chosen the correct model. The right panel shows the data fit to an
alternative bell-shaped dose-response model 271 , where high doses evoke a
smaller response than does a moderate dose. The data fit this model very well,
with no outliers detected (or even suspected).
This example points out that outlier elimination is only appropriate when you
are sure that you are fitting the correct model.
[Figure: two dose-response graphs (Response vs. Dose). Left panel: one point marked as an outlier. Right panel: the same data shown as two separate experiments, each fit with its own curve.]
The left panel above shows data fit to a dose-response model with one point (in
the upper right) detected as an outlier. The right panel shows that the data
really come from two different experiments. Both the lower and upper plateaus
of the second experiment (shown with upward pointing triangles) are higher
than those in the first experiment (downward pointing triangles). Because these
are two different experiments, the assumption of independence was violated in
the analysis in the left panel. When we fit each experimental run separately, no
outliers are detected.
[Figure: two dose-response graphs (Response vs. Dose). Left panel: an unweighted fit marks four points as outliers. Right panel: the same data fit with relative weighting; no outliers are marked.]
The left panel above shows data fit to a dose-response model. Four outliers
were identified (two are almost superimposed). But note that the values with
larger responses (Y values) also, on average, are further from the curve. This
makes least-squares regression inappropriate. To account for the fact that the
SD of the residuals is proportional to the height of the curve, we need to use
weighted regression 30 . The right panel shows the same data fit to the same
dose-response model, but minimizing the sum of the squares of the distances of the
points from the curve, each divided by the height of the curve (relative weighting).
Now no outliers are identified. Using the wrong weighting method created false
outliers.
Definition of an 'outlier'
The term 'outlier' is defined fairly vaguely, but refers to a value that is far from
the others. In Prism's nonlinear regression, an outlier is a point that is far from
the best-fit curve defined by robust regression.
Of course, there is some possibility that an outlier really comes from the same
Gaussian population as the others, and just happens to be very high or low.
You can set the value of Q 160 to control how aggressively Prism defines
outliers.
In quality control analyses, an outlier can tell you about a process that is out of
control. You wouldn't want to delete outliers without first figuring out why the
value is far from the others. The outlier might be telling you something
important.
2.The residuals of the robust fit are analyzed to identify any outliers. This step
uses a new outlier test adapted from the False Discovery Rate approach of
testing for multiple comparisons.
3.Remove the outliers, and perform ordinary least-squares regression on the
remaining data.
Prism then identifies the outliers, eliminates them, and fits the remaining points.
The outliers are shown in a separate table, and the number of outliers is
tabulated on the main results table.
The mathematical details are explained in reference 1. This value is set in the
Weights tab 160 of the Nonlinear regression dialog.
If you set Q to a higher value, the threshold for defining outliers is less strict.
This means that Prism will have more power to detect outliers, but also will
falsely detect 'outliers' more often.
If you set Q to a lower value, the threshold for defining outliers is stricter. This
means that Prism will have less power to detect real outliers, but also will have a
smaller chance of falsely defining a point to be an outlier.
Unless you have a strong reason to choose otherwise, we recommend sticking
with the default value of 1%. Our simulations have shown that if all the scatter
is Gaussian, Prism will falsely find one or more outliers in about 2-3% of
experiments. This does not mean that a few percent of all values are declared
to be outliers, but rather that one or more outliers will be detected in a few
percent of experiments. If there really are outliers present in the data, Prism will
detect them with a False Discovery Rate less than 1%.
1. Motulsky HM and Brown RE, Detecting outliers when fitting data with
nonlinear regression -- a new method based on robust nonlinear regression
and the false discovery rate, BMC Bioinformatics 2006, 7:123.
[Figure: t distributions for various degrees of freedom compared with the Gaussian distribution.]
The widest distribution in that figure, the t distribution for df=1, is also known as
the Lorentzian distribution or Cauchy distribution. The Lorentzian distribution
has wide tails, so outliers are fairly common and therefore have little impact on
the fit.
We adapted the Marquardt 62 nonlinear regression algorithm to accommodate
the assumption of a Lorentzian (rather than Gaussian) distribution of residuals,
1. Press WH, Teukolsky SA, Vetterling WT, Flannery BP: Numerical Recipes in C: The Art
of Scientific Computing. New York, NY: Cambridge University Press; 1988.
2. Motulsky HM and Brown RE, Detecting outliers when fitting data with nonlinear regression
-- a new method based on robust nonlinear regression and the false discovery rate, BMC
Bioinformatics 2006, 7:123. Download as pdf.
You want to fit a binding curve to determine Bmax and Kd using the equation Y = Bmax*X/(Kd + X).
How can you find the values of Bmax and Kd that fit the data best? You can
generate an infinite number of curves by varying Bmax and Kd. For each of the
generated curves, you can compute the sum-of-squares to assess how well
that curve fits the data. The following graph illustrates the situation.
Each point on the surface corresponds to one possible curve. The goal of
nonlinear regression is to find the values of Bmax and Kd that make the
sum-of-squares as small as possible (to find the bottom of the valley).
The method of linear descent follows a very simple strategy. Starting from the
initial values, try increasing each parameter a small amount. If the sum-of-squares goes down, continue. If the sum-of-squares goes up, go back and
decrease the value of the parameter instead. You've taken a step down the
surface. Repeat many times. Each step will usually reduce the sum-of-squares.
If the sum-of-squares goes up instead, the step must have been so large that
you went past the bottom and back up the other side. If this happens, go back
and take a smaller step. After repeating these steps many times, you will reach
the bottom.
The Gauss-Newton method is a bit harder to understand. As with the method
of linear descent, start by computing how much the sum-of-squares changes
when you make a small change in the value of each parameter. This tells you
the slope of the sum-of-squares surface at the point defined by the initial
values. If the equation really is linear, this is enough information to determine
the shape of the entire sum-of-squares surface, and thus calculate the best-fit
values of Bmax and Kd in one step. With a linear equation, knowing the slope at
one point tells you everything you need to know about the surface, and you
can find the minimum in one step. With nonlinear equations, the Gauss-Newton
method won't find the best-fit values in one step, but that step usually
improves the fit. After repeating many iterations, you reach the bottom.
This method of linear descent tends to work well for early iterations, but works
slowly when it gets close to the best-fit values (and the surface is nearly flat).
In contrast, the Gauss-Newton method tends to work badly in early iterations,
but works very well in later iterations. The two methods are blended in the
method of Marquardt (also called the Levenberg-Marquardt method). It uses
the method of linear descent in early iterations and then gradually switches to
the Gauss-Newton approach.
Prism, like most programs, uses the Marquardt method for performing
nonlinear regression.
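To see what such an algorithm does in practice (outside Prism), here is a minimal sketch using SciPy, whose curve_fit function uses the Levenberg-Marquardt method by default for unconstrained fits. The data and initial values are made up:

    import numpy as np
    from scipy.optimize import curve_fit

    def one_site(x, bmax, kd):
        return bmax * x / (kd + x)

    X = np.array([0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 32.0])
    Y = np.array([12.0, 22.0, 37.0, 55.0, 72.0, 84.0, 91.0])

    # p0 supplies the initial values; the algorithm iteratively adjusts Bmax and Kd
    # to reduce the sum of squared residuals until it reaches the bottom of the valley.
    params, covariance = curve_fit(one_site, X, Y, p0=[100.0, 5.0])
    print(np.round(params, 2))   # best-fit Bmax and Kd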
If you choose both unequal weighting and automatic outlier removal, Prism first
fits using robust regression (ignoring your weighting choice), and then uses the
weighting factors in identifying the outliers, as explained in reference 1.
Reference
1. Motulsky HM and Brown RE, Detecting outliers when fitting data with nonlinear regression
-- a new method based on robust nonlinear regression and the false discovery rate, BMC
Bioinformatics 2006, 7:123. Download as pdf.
Each interval extends from BestFit(Pi) - t*SE(Pi) to BestFit(Pi) + t*SE(Pi),
where BestFit(Pi) is the best-fit value for the i-th parameter, SE(Pi) is its standard
error, and t is the value from the t distribution for 95% confidence and DF degrees of freedom.
How Prism computes the confidence interval for the difference or ratio of
two parameters
See this document.
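In outline (assuming the usual formulation; c is a quantity computed at each X from the curve's sensitivity to the parameters and their covariances), the two bands have this form:
Confidence band:  Ycurve +/- sqrt(c) * sqrt(SS/DF) * CriticalT
Prediction band:  Ycurve +/- sqrt(c+1) * sqrt(SS/DF) * CriticalT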
In both these equations, the value of c (defined above) depends on the value of
X, so the confidence and prediction bands are not a constant distance from the
curve. The value of SS is the sum-of-squares for the fit, and DF is the number
of degrees of freedom (number of data points minus number of parameters).
CriticalT is a constant from the t distribution based on the amount of confidence
you want and the number of degrees of freedom. For 95% limits and a fairly
large df, this value is close to 1.96. If DF is small, this value is higher.
Replicates
If you entered data with replicates (in side-by-side subcolumns), Prism gives
you two choices for how to deal with the replicates.
Consider each replicate as an individual point.
Fit the means of each set of replicates.
The rest of this page explains how to decide. When in doubt, choose to fit
individual replicates. The other choice is only rarely useful.
Independent replicates
In most experiments, it is fair to consider each replicate to be an independent
data point. Each particular replicate is subject to random factors, which may
increase or decrease its value. Each random factor affects individual replicates,
and no random factor affects the replicates as a group. In any kind of
biochemical experiment, where each value comes from a test tube or plate
well, the replicates are almost certain to be independent.
When your replicates are independent, Prism will treat each replicate as a
separate point. If there are four replicates at one X value and two at another,
the four replicates will automatically get twice the weight, since the program
considers them to be four separate data points.
If you ask Prism to fit the mean values, rather than individual replicates, you
won't get valid standard errors and confidence intervals. If you have different
numbers of replicates at different X values, you will lose the extra weight that
the points with more replicates deserve, and so will get incorrect best-fit values.
What is dependency?
When the model has two or more parameters, as is almost always the case,
the parameters can be intertwined.
What does it mean for parameters to be intertwined? After fitting a model,
change the value of one parameter but leave the others alone. The curve
moves away from the points. Now, try to bring the curve back so it is close to
the points by changing the other parameter(s). If you can bring the curve closer
to the points, the parameters are intertwined. If you can bring the curve back
to its original position, then the parameters are redundant.
Prism can quantify the relationships between parameters by reporting the
correlation matrix or reporting dependency.
Interpreting dependency
You can interpret dependency 197 without knowing much about how it is
calculated. Read on if you are interested in knowing how the value is
computed.
Time    Signal
1.0     39.814
2.0     32.269
3.0     29.431
4.0     27.481
5.0     26.086
6.0     25.757
7.0     24.932
8.0     23.928
9.0     22.415
10.0    22.548
11.0    21.900
12.0    20.527
13.0    20.695
14.0    20.105
15.0    19.516
16.0    19.640
17.0    19.346
18.0    18.927
19.0    18.857
20.0    17.652
We will focus on the rate constant, K. The best-fit value is 0.2149 sec^-1, which
corresponds to a half-life of 3.225 seconds. Its SE is 0.0248 sec^-1, which
corresponds to a 95% confidence interval of 0.1625 to 0.2674 sec^-1.
It is clear that the three parameters are not entirely independent. If you forced
K to have a higher value (faster decay), the curve would get further from the
points. But you could compensate a bit by starting the curve at a higher value
and ending at a lower one (increase Span and decrease Plateau). The SE values
of the parameters depend on one another.
Fix Span and Plateau to their best fit values, and ask Prism to fit only the rate
constant K. This will not change the best fit value, of course, since we fixed
Span and Plateau to their best-fit values. But the SE of K is lower now, equal to
0.008605. This makes sense. Changing the value of K has a bigger impact on
goodness-of-fit (sum-of-squares) when you fix the Span and Plateau than it
does when you allow the values of Span and Plateau to also change to
compensate for the change in K.
The lower value of the SE of K when you fix the other parameters tells you that
the uncertainty in K is dependent on the other parameters. We want to quantify
this by computing the dependency.
Before we can compare the two SE values, we have to correct for a minor
problem. When computing the SE, the program divides by the square root of
the number of degrees of freedom (df). For each fit, df equals the number of
data points minus the number of parameters fit by the regression. For the full
fit, df therefore equals 20 (number of data points) minus 3 (number of
parameters) or 17. When we held the values of Plateau and Span constant,
there was only one parameter, so df=19. Because the df are not equal, the
two SE values are not quite comparable. The SE when other parameters were
fixed is artificially low. This is easy to fix. Multiply the SE reported when two of
the parameters were constrained by the square root of 19/17. This corrected
SE equals 0.00910.
Now we can compute the dependency. It equals 1.0 minus the square of the
ratio of the two (corrected) SE values. So the dependency for this example
equals 1.0 - (0.0091/0.0248)^2, or 0.866. Essentially, this means that 86.6% of
the variance in K is due to its interaction with the other parameters.
Each parameter has a distinct dependency (unless there are only two
parameters). The dependency of Span is 0.613 and the dependency of Plateau
is 0.813.
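The arithmetic is easy to reproduce. A minimal sketch (Python, using the SE values quoted above):

    import math

    se_full = 0.0248      # SE of K from the full three-parameter fit (df = 17)
    se_fixed = 0.008605   # SE of K with Span and Plateau fixed (df = 19)

    # Correct for the different degrees of freedom before comparing the two SEs
    se_fixed_corrected = se_fixed * math.sqrt(19.0 / 17.0)

    dependency = 1.0 - (se_fixed_corrected / se_full) ** 2
    print(round(se_fixed_corrected, 5), round(dependency, 3))   # about 0.00910 and 0.866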
2. Enter data
If you chose sample data, you'll see these values:
If you enter Y values for several data sets (column A, B and C), Prism will
report regression results for X vs. YA, for X vs. YB, and for X vs. YC. It can
also test whether the slopes (and intercepts) differ significantly.
If the different data sets don't share the same X values, use different rows for
different data sets like this:
3. Analysis choices
Click Analyze, and then choose linear regression from the list of XY analyses.
Force the line to go through a specified point (such as the origin)?
If you choose regression, you may force the line to go through a particular
point such as the origin. In this case, Prism will determine only the best-fit
slope, as the intercept will be fixed. Use this option when scientific theory tells
you that the line must go through a particular point (usually the origin, X=0,
Y=0) and you only want to know the slope. This situation arises rarely.
Use common sense when making your decision. For example, consider a
protein assay. You measure optical density (Y) for several known
concentrations of protein in order to create a standard curve. You then want to
interpolate unknown protein concentrations from that standard curve. When
performing the assay, you adjusted the spectrophotometer so that it reads
zero with zero protein. Therefore you might be tempted to force the regression
line through the origin. But this constraint may result in a line that doesn't fit the
data very well. Since you really care that the line fits the standards very well
near the unknowns, you will probably get a better fit by not constraining the
line.
If in doubt, you should let Prism find the best-fit line without any constraints.
Fit linear regression to individual replicates or to means?
If you collected replicate Y values at every value of X, there are two ways to
calculate linear regression. You can treat each replicate as a separate point, or
you can average the replicate Y values, to determine the mean Y value at each
X, and do the linear regression calculations using the means.
You should consider each replicate a separate point when the sources of
experimental error are the same for each data point. If one value happens to
be a bit high, there is no reason to expect the other replicates to be high as
well. The errors are independent.
Average the replicates and treat the mean as a single value when the replicates
are not independent. For example, the replicates would not be independent if
they represent triplicate measurements from the same animal, with a different
animal used at each value of X (dose). If one animal happens to respond more
than the others, that will affect all the replicates. The replicates are not
independent.
Test departure from linearity with runs test
See Runs test 90 .
2. Enter data
Enter the unknowns below the standards on the same table. Enter Y values
with no X values in those rows (example below), or X values with no Y values in
those rows. Optionally enter row titles to label those unknowns.
4. Analysis choices
Click Analyze, and choose Interpolate a Standard Curve from the list of XY analyses.
Alternatively, you can click the Interpolate a standard curve button right on
top of the Analyze button.
Choose Line in the list of Standard curves to interpolate. Unless you have a
good reason, you can leave the other choices on that dialog set to their default
values.
The results will appear in several pages, including one with the interpolated values, which
will be in the same units as your original data.
The second page of results tabulates the best-fit values of the parameters and
much more. For this example, we aren't too interested in these results.
Advantages of fitting a line with nonlinear regression
Use a robust 59 fitting method.
Run a normality test 185 on the residuals.
Compare the scatter of points from the line with the scatter among
replicates with a replicates test 194 .
Report the best-fit values with 90% confidence limits (or any others).
Prism's linear regression analysis only reports 95% CI. Nonlinear regression
lets you choose the confidence level you want.
Report the results of interpolation 105 from the line/curve along with 95%
confidence intervals of the predicted values. Prism's linear regression
analysis does not include those confidence intervals.
With linear regression, the SE of the slope is always reported with the slope
as a plus/minus value. With nonlinear regression, the SE values are a
separate block of results that can be copied and pasted elsewhere.
Use global nonlinear regression 41 to fit one line to several data sets. Or
share the intercept or slope among several data sets, while fitting the other
parameter individually to each data set.
Run a Monte Carlo analysis.
When you enter data with multiple replicates at each X value, Prism's
nonlinear regression can perform the replicates test 194 to ask whether the
data deviate systematically from the straight line model. Prism does not
offer the replicates test with linear regression.
Test whether the slope (or intercept) significantly differs from some
proposed value. For example, test whether the slope differs from a
hypothetical value of 1.0, or whether an intercept differs significantly from
0.0.
Situations where Prism can give different results with linear vs. nonlinear
regression
There are two situations where linear regression will give different results than
fitting a straight line with nonlinear regression:
If you enter the Y values as Mean, SD (or SEM) and N. In this case, Prism's
linear regression analysis fits the means only, ignoring the scatter and sample
size. In contrast, Prism's nonlinear regression gives you a choice (in the
Weights tab 160 ) of fitting just the mean, or of accounting for scatter and
sample size. With the latter choice, which is the default, the results will be
identical to what they would have been had you entered the raw data. If you
want to account for the SD among replicates, use nonlinear regression.
If you enter replicate Y values and choose the runs test. With linear
regression, Prism averages the replicates and computes the runs test on
those mean Y values. With nonlinear regression, Prism won't compute the
runs test when you enter replicate Y values. Instead, it offers the replicates
test 194 .
At the bottom of the results page, the slope and intercept are reported again in
the form of the equation that defines the best-fit line. You can copy this
equation and paste onto a graph, or into a manuscript.
The meaning of r2
The value r2 is a fraction between 0.0 and 1.0, and has no units. An r2 value of
0.0 means that knowing X does not help you predict Y. There is no linear
relationship between X and Y, and the best-fit line is a horizontal line going
through the mean of all Y values. When r2 equals 1.0, all points lie exactly on a
straight line with no scatter. Knowing X lets you predict Y perfectly.
How r2 is computed
This figure demonstrates how Prism computes r2 .
The left panel shows the best-fit linear regression line. This line minimizes the
sum-of-squares of the vertical distances of the points from the line. Those
vertical distances are also shown on the left panel of the figure. In this example,
the sum of squares of those distances (SSreg) equals 0.86. Its units are the
units of the Y values, squared.
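Assuming the usual definition, SSreg is then compared with SStot, the sum of the squares of the distances of the points from a horizontal line through the mean of all Y values (the null-hypothesis fit):
r2 = (SStot - SSreg) / SStot = 1 - SSreg/SStot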
Interpreting the P value when Prism fits both slope and intercept
Prism reports the P value testing the null hypothesis that the overall slope is
zero. The P value answers this question:
If there were no linear relationship between X and Y overall, what is the
probability that randomly selected points would result in a regression line as
far from horizontal (or further) than you observed?
Equivalently:
If there were no linear relationship between X and Y overall, what is the
probability that randomly selected points would result in an R2 value as high
(or higher) as you observed?
The P value is calculated from an F test, and Prism also reports the value of F
and its degrees of freedom. You would get exactly the same P value from the t
ratio computed by dividing the slope by its standard error.
green line), the best fit value of the slope is 0.00. The P value answers the
question: If the true slope is zero, what is the chance that the slope will be
further from zero than the observed slope due only to random sampling? Since
the observed slope is zero, there is almost a 100% chance of obtaining a slope
that is further from zero than observed! So the P value is greater than 0.99, as
high as a P value can be. Some people are confused and think the P value
should be small purely because the points form a pattern. Not so. The P value,
from conventional linear regression fitting both slope and intercept, will be small
only when the points form a linear pattern that is not horizontal.
The results are very different when you fit linear regression with the constraint
that the line has to go through the origin (blue line). To make the line go
through the origin and also go near the points, the best-fit line has a slope far
from zero. Since the line is far from horizontal, the P value is tiny. Given the
constraint that the line must go through the origin (X=0, Y=0; lower-left of the
graph), the data are quite convincing that the best-fit line is far from horizontal,
so it makes sense that the P value is tiny.
Constraining a line to go through the origin (or some other point) can be very
useful in some situations. Usually this option is used to fit calibration curves
used for interpolation, in which case the P value is not useful. If you force the
line through the origin, be very wary when interpreting the P value. It is rarely
useful, and easy to misinterpret.
Download the Prism file.
Prism compares slopes of two or more regression lines if you check the option:
"Test whether the slopes and intercepts are significantly different".
Comparing slopes
Prism compares slopes first. It calculates a P value (two-tailed) testing the null
hypothesis that the slopes are all identical (the lines are parallel). The P value
answers this question:
If the slopes really were identical, what is the chance that randomly selected
data points would have slopes as different (or more different) than you
observed?
Comparing intercepts
If the slopes are significantly different, there is no point comparing intercepts. If
the slopes are indistinguishable, the lines could be parallel with distinct
intercepts. Or the lines could be identical, with the same slopes and intercepts.
Prism calculates a second P value testing the null hypothesis that the lines are
identical. If this P value is low, conclude that the lines are not identical (they are
distinct but parallel). If this second P value is high, there is no compelling
evidence that the lines are different.
The runs test determines whether your data differ significantly from a straight
line. Prism can only calculate the runs test if you entered the X values in order.
A run is a series of consecutive points that are either all above or all below the
regression line. In other words, a run is a consecutive series of points whose
residuals are either all positive or all negative.
If the data points are randomly distributed above and below the regression line,
it is possible to calculate the expected number of runs. If there are Na points
above the curve and Nb points below the curve, the number of runs you expect
to see equals [(2NaNb)/(Na+Nb)]+1. If you observe fewer runs than expected,
it may be a coincidence of random sampling or it may mean that your data
deviate systematically from a straight line. The P value from the runs test
answers this question:
If the data really follow a straight line, and you performed many experiments
like this one, what fraction of the time would you obtain as few (or fewer)
runs as observed in this experiment?
If the runs test reports a low P value, conclude that the data do not really
follow a straight line, and consider using nonlinear regression to fit a curve.
The P values are always one-tailed, asking about the probability of observing as
few runs as observed, or fewer. If you observe more runs than expected,
the P value will be higher than 0.50.
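A minimal sketch of that calculation (Python; the residuals listed are hypothetical):

    def expected_runs(n_above, n_below):
        # Expected number of runs if points fall randomly above and below the line
        return (2.0 * n_above * n_below) / (n_above + n_below) + 1

    def count_runs(residuals):
        signs = [r > 0 for r in residuals]
        return 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)

    residuals = [0.4, 0.1, -0.2, -0.5, -0.1, 0.3, 0.6, 0.2, -0.3, -0.4]
    n_above = sum(r > 0 for r in residuals)
    n_below = sum(r < 0 for r in residuals)
    print(count_runs(residuals), round(expected_runs(n_above, n_below), 1))   # 4 observed, 6.0 expected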
Is the scatter of data around the line Gaussian (at least approximately)?
Linear regression analysis assumes that the scatter of data around the best-fit line is
Gaussian.
high or low X values tend to be further from the best-fit line. The assumption that the
standard deviation is the same everywhere is termed homoscedasticity. (If the scatter
goes up as Y goes up, you need to perform a weighted regression. Prism can't do this
via the linear regression analysis. Instead, use nonlinear regression but choose to fit to
a straight-line model.)
Confidence bands
Two confidence bands surrounding the best-fit line define the confidence interval
of the best-fit line.
The dashed confidence bands are curved. This does not mean that the
confidence band includes the possibility of curves as well as straight lines.
Rather, the curved lines are the boundaries of all possible straight lines. The
figure below shows four possible linear regression lines (solid) that lie within the
confidence band (dashed).
Given the assumptions of linear regression, you can be 95% confident that the
two curved confidence bands enclose the true best-fit linear regression line,
leaving a 5% chance that the true line is outside those boundaries.
Many data points will be outside the 95% confidence bands. The confidence
bands are 95% sure to contain the best-fit regression line. This is not the same
as saying it will contain 95% of the data points.
Prediction bands
Prism can also plot the 95% prediction bands. The prediction bands are further
from the best-fit line than the confidence bands, a lot further if you have many
data points. The 95% prediction band is the area in which you expect 95% of all
data points to fall. In contrast, the 95% confidence band is the area that has a
95% chance of containing the true regression line. This graph shows both
prediction and confidence intervals (the curves defining the prediction intervals
are further from the regression line).
Residuals
If you check an option on the linear regression dialog, Prism will create a results
table with residuals, which are the vertical distances of each point from the
regression line. The X values in the residual table are identical to the X values
you entered. The Y values are the residuals. A residual with a positive value
means that the point is above the line; a residual with a negative value means
the point is below the line.
When Prism creates the table of residuals, it also automatically makes a new
graph containing the residuals and nothing else. You can treat the residuals table
like any other table, and do additional analyses or make additional graphs.
If the assumptions of linear regression have been met, the residuals will be
randomly scattered above and below the line at Y=0. The scatter should not
vary with X. You also should not see large clusters of adjacent points that are
all above or all below the Y=0 line.
See an example
Follow the Y=0 baseline from left to right. The region between the 95%
confidence bands for the best fit line (blue curves) is the 95% CI of the X
intercept. You can see that this confidence interval (between the two
outermost dotted lines) is not symmetrical around the X intercept (the middle
dotted line).
This asymmetry will be very noticeable if you only have a few points with lots
of scatter, and will be almost unnoticeable with lots of points with little
scatter.
GraphPad Prism reports the 95% confidence interval of the X intercept if you
check an option on the Linear regression parameters dialog.
Because the uncertainty is not symmetrical, it rarely makes sense to report a
standard error of the X-intercept. It is much better to report both ends of the
95% confidence interval, which Prism reports. If you really want to compute
a single standard error for the X intercept, you can do so by choosing
nonlinear regression, and fitting this user-defined equation to the data:
Y = slope*(X-Xintercept)
Prism will report the best-fit value of the X intercept along with a SE and 95%
confidence interval. Since this confidence interval will be computed from the
SE value it will be symmetrical around the X intercept, and so won't be as
accurate as the asymmetrical interval reported by linear regression.
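Outside of Prism, you can reproduce that refit with any nonlinear least-squares routine. The sketch below is not Prism's code; it uses Python with scipy.optimize.curve_fit and invented example data simply to show the idea.

import numpy as np
from scipy.optimize import curve_fit

# Invented calibration data; substitute your own X and Y values.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([-1.8, 0.1, 2.2, 3.9, 6.1])

def line(x, slope, x_intercept):
    # Same parameterization as the user-defined equation above.
    return slope * (x - x_intercept)

popt, pcov = curve_fit(line, x, y, p0=[1.0, 1.0])
se = np.sqrt(np.diag(pcov))  # asymptotic standard errors
print("X intercept = %.3f, SE = %.3f" % (popt[1], se[1]))

As the text notes, the symmetrical interval computed from this SE is less accurate than the asymmetrical interval that linear regression reports.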
How to test whether the slope from linear regression differs from 1.0 or some other value?
Prism tests whether the best-fit slope of the linear regression line differs
significantly from zero. But you need to use extra steps to test whether the
slope differs from some other value.
Since Prism's linear regression analysis can't answer the question, use
nonlinear regression instead. Nonlinear regression can fit a straight line and
offers many advantages 82 .
In the nonlinear regression parameters dialog, go to the Compare tab, check
the option to ask whether the best-fit value of a parameter differs from a
theoretical value, and then choose Slope and enter the theoretical value (1.0).
Can Prism do weighted linear regression?
Prism's linear regression analysis cannot handle differential weighting.
However, Prism's nonlinear regression analysis can do weighted regression,
and you can fit a straight line using "nonlinear" regression 82 .
How can Prism fit linear regression with the slope constrained to equal 1.0 (or some other
value)?
Prism's linear regression analysis cannot constrain the slope to a particular
value. However, Prism's nonlinear regression analysis can constrain any
parameter to any value, and you can fit a straight line using "nonlinear"
regression 82 .
How to fit one linear regression line to multiple data sets?
Prism usually fits one line through each data set. But you can get it to fit one
line through all the data. Use the nonlinear regression analysis to fit the line 82
. Then go to the Constrain tab, and share both the slope and the intercept.
"Sharing" means that Prism fits one best-fit value for all the data sets, rather
than one for each data set. Since you share all the parameters, only one line
is fit to all the data.
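If you want to check that result outside of Prism, note that sharing both the slope and the intercept (with equal weighting) is equivalent to pooling all the points and fitting a single line. A minimal Python sketch with invented data:

import numpy as np

# Invented example: two data sets measured at the same X values.
x  = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y1 = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
y2 = np.array([1.8, 4.2, 5.9, 7.9, 10.3])

# Sharing slope and intercept across data sets = one fit to the pooled points.
x_all = np.concatenate([x, x])
y_all = np.concatenate([y1, y2])
slope, intercept = np.polyfit(x_all, y_all, 1)
print(slope, intercept)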
Why doesn't Prism report R2 for linear regression when I force the line through the origin
(or some other point)?
The problem is that when you constrain a line to go through a point, there
would be two ways to compute R2 :
Compare the fit of the best-fit line with the fit of a horizontal line at the
mean Y value. But that null hypothesis (horizontal line through the Ymean)
1*(Ymax-Ymin)/(Xmax-Xmin).
Define Xcross and Ycross to both be shared parameters, so Prism will find
one global best-fit value for those parameters that applies to both data sets.
Do not share the slope parameter, as you want Prism to fit separate slopes
for each data set. Getting these sharing settings correct is essential.
Prism will fit four parameters:
Slope for the first data set
Slope for the second data set
Xcross (one value shared between both data sets)
Ycross (one value shared between both data sets)
2.2.3
Deming regression
2.2.3.1
Standard linear regression 13 assumes that you know the X values perfectly,
and all the uncertainty is in Y. It minimizes the sum of squares of the vertical
distance of the points from the line.
If both X and Y variables are subject to error, fit linear regression using a
method known as Deming, or Model II, regression.
If your goal is to compare two analysis methods, consider using a Bland-Altman
plot instead.
2.2.3.2
2. Enter data
If you enter Y values for several data sets (columns A, B and C), Prism will
report regression results for X vs. YA, for X vs. YB, and for X vs. YC.
3. Analysis choices
Click Analyze and choose Deming regression from the list of XY analyses.
Most of the dialog is self-explanatory. The choices on top are the hardest to
understand, but also are the most important, as the calculations depend on
knowing the relative magnitude of the errors in X and Y.
If you assessed each variable (or method) with duplicate measurements, compute the
difference (di) between the two measurements for each of the N samples, and calculate:

SDerror = sqrt( Σ di² / 2N )
Compute the SD of the error for each method or variable (X and Y), enter the
two SDerror values into the Deming regression analysis dialog, and Prism will fit
the line for you. If the X variable has a much smaller SD than the Y value, the
results will be almost identical to standard linear regression.
If you try to compare Prism's results with those of another program or book,
you may need to deal with the variable λ (lambda), which is computed from the
two SD values you enter:

λ = (SD of X error / SD of Y error)²

Prism requires you to enter individual SD values, but uses these values only to
calculate λ, which is then used in the Deming regression calculations. If you
know λ, but not the individual SD values, enter the square root of λ as the SD
of the X values, and enter 1.0 as the SD of the Y error. The calculations will be
correct, since Prism uses those two values only to compute λ.
Reference
1. PW Strike, Statistical Methods in Laboratory Medicine, ISBN:0750613459. Equation
8.15.
2.2.3.3
If you assessed each method or variable with duplicate measurements, you can
compute the SD of the error from the differences (di) between the duplicates,
using this equation separately for each method:

SDerror = sqrt( Σ di² / 2N )
It is easy to derive that equation. To compute a variance, you find the average
squared deviation of each value from its mean. Within a pair, each value differs
from the pair mean by half the difference, di/2. Square that deviation (di²/4) and
include both values of the pair, so each pair contributes di²/2. Add up those
contributions for all N pairs and divide by N, and you have the variance. Take the
square root to get the SD.
Why N, rather than N-1? I suspect it should be N-1, but that won't matter
much. It will make the SD a bit smaller than it should be, but that will be true for
both the X and Y SD, so the ratio will hardly be changed.
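As a check on the arithmetic, here is a minimal Python sketch of that calculation, using invented duplicate measurements for one method:

import numpy as np

# Invented duplicate measurements of N samples by a single method.
first  = np.array([10.1, 5.3, 7.8, 12.2, 9.0])
second = np.array([ 9.7, 5.9, 7.5, 11.8, 9.4])

d = first - second                               # difference within each pair
sd_error = np.sqrt(np.sum(d**2) / (2 * len(d)))  # SDerror = sqrt(sum(di^2)/2N)
print(sd_error)

Repeat the same calculation for the other variable (or method), and enter the two SD values into the Deming regression dialog.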
2.3
2.3.1
Notes:
This graph shows the response going up as the concentration gets larger. In
many cases, the response goes down as the concentration gets larger.
Interpolation works in either case.
Sigmoid-shaped dose-response curves are commonly used with many
assays. If the X values used in the analysis are the logarithm of
concentration, then so are the interpolated values. You need to take the
antilogarithm of these values to return to concentration units.
This example shows the most common use of interpolating: finding an X
value (concentration) for a given Y value. Prism can also interpolate the other
way, finding a Y value for an entered X value.
2.3.2
How to interpolate
Enter the standards (known X values and measured Y values) at the top of the data
table, and enter the unknown values right below those standards. In most cases, as shown below,
you'll enter Y values and leave X blank. Prism can also interpolate the other
way. In this case, enter X values and leave Y blank.
Finally, if the scatter among replicates is larger when the Y values are larger,
consider using relative weighting 160 .
Note that this simplified analysis to interpolate is simply a subset of the
nonlinear regression analysis, with fewer choices on the dialog. If you want to
select options that aren't available, click the "More..." button at the bottom of
the dialog to convert the analysis to the full nonlinear regression analysis, with
many more options (six tabs worth). Once you switch to nonlinear regression
analysis, it is not possible to switch back (except by starting over).
Nonlinear regression
Nonlinear regression offers the interpolation option at the bottom of the first
(Fit) tab of the dialog. This is a good choice if you want to use options not
available on the simpler interpolation analysis.
Linear regression 79
Spline
Prism can also interpolate from a standard curve in the spline 447 analysis. Splines go
through every point. The advantage of a spline is that you don't have to choose a model to
fit to your data. The disadvantage is that the curve may wiggle too much.
2.3.3
The first seven rows contain the standard curve, in duplicate. Below that are
three unknown values. These have Y values that you measured, but no X.
The goal of this analysis is to interpolate the corresponding X values
(concentrations) for these unknowns. Note that three of the four unknowns
are labeled, so you can later match up the results with the labels.
Why are X values negative? Because in this example, the X values are the
logarithm of concentrations expressed in molar. So a concentration of 1
micromolar (10^-6 Molar) would be entered as -6.
The graph Prism makes automatically is fairly complete. You can customize the
symbols, colors, axis labels, etc. You can also choose to plot the individual
duplicates rather than plot the means. Since the unknowns have no X value,
they are not included on the graph.
5. Choose a model
Choose the equation: Sigmoidal, 4PL, X is log(Concentration). Note that 4PL
means four parameter logistic, which is another name for this kind of equation.
For this example, leave all the other settings to their default values.
Click OK to see the curves superimposed on the graph.
The second page is the table of results for the overall curve fit. It tabulates the
best-fit values of the parameters and much more. For this example, we aren't
too interested in these results.
In the Transform dialog, check the option to transform X values and choose
the transform X=10^X.
Now the X column is in molar concentration units. Note that the column title
hasn't changed. Prism isn't smart enough to adjust the column titles when you
transform data.
Click and edit the column title to "Concentration (M)".
2.3.4
Asymmetric Sigmoidal, 5PL, X is log(concentration) 267
Biphasic 269
Straight line 371
Semilog line -- X is log, Y is linear 376
Hyperbola (X is concentration) 297
Second order polynomial (quadratic) 384
Third order polynomial (cubic) 384
2.3.5
To view this table, click the interpolated results page of the results. In some
cases you may first need to click the + in front of the results main page to see
all the pages within.
2.3.6
Place computed unknown values into the X column, with replicate values
stacked
In this case, Prism creates two pages of results.
The interpolation is the same for both tables. Prism interpolates each replicate
value individually.
The first page, Interpolated Mean X Values, shows the mean of the Y values
you entered and the mean of the interpolated X values. The X column shows
the concentrations (or logarithm of concentrations) to match the data you
entered.
The second page, Interpolated X replicates, shows individual replicate values for
the interpolated results. The X column maintains the same meaning as it does in
the data table you entered. Doing this with only a single X column requires that
the replicates be stacked.
Place computed unknown values into a Y data set, maintaining the side-by-side arrangement of replicates.
When you make this choice, the interpolated values appear in the Y column,
and the X column is blank. You can view the interpolated values, with their
name (if you entered row titles). But the values you entered are not shown.
The first page shows the mean values. The second page shows the interpolated
values in side-by-side replicates, to match the format in which you entered them. Even
though these values are concentrations (or logarithm of concentrations), note
that the results are in the Y column.
2.3.7
Why can't Prism handle the kind of data shown in the right panel above?
Because then it might get mixed up about which values are unknowns to be
interpolated, and which are simply adjacent to values left blank because they
are missing.
2.3.8
5. From the list of transforms, choose Y=10^Y or X=10^X. These choices are
about one third of the way down the long list of transforms.
6. The results sheets will have the interpolated values as concentrations, rather
than logs.
Why 10^Y or 10^X? Those are the antilogarithm functions (10 raised to the
power Y or X). Review logarithms and antilogarithms.
2.3.9
Established assay
If the assay is well established, then you know you are fitting the right model
and know what kind of results to expect. In this case, evaluating a fit is pretty
easy.
Does the curve go near the points?
Is the R2 'too low' compared to prior runs of this assay?
If so, look for outliers, or use Prism's automatic outlier detection.
New assay
With a new assay, you also have to wonder about whether you picked an
appropriate model.
Does the curve go near the points?
Look at the graph. Does it look like the curve goes near the points?
Are the confidence bands too wide?
How wide is too wide? The confidence bands show you how precise
interpolations will be. Draw a horizontal line somewhere along the curve, and
look at the two places where that line intersects the confidence bands. This will
be the confidence interval for the interpolation.
Does the scatter of points around the best-fit curve follow a Gaussian
distribution?
Least squares regression is based on the assumption that the scatter of points
around the curve follows a Gaussian distribution. Prism offers three normality
tests (in the Diagnostics tab) that can test this assumption (we recommend the
D'Agostino test). If the P value for a normality test is low, you conclude that
the scatter is not Gaussian.
Could outliers be impacting your results?
Nonlinear regression is based on the assumption that the scatter of data
around the ideal curve follows a Gaussian distribution. This assumption leads to
the goal of minimizing the sum of squares of distances of the curve from the
points. The presence of one or a few outliers (points much further from the
curve than the rest) can overwhelm the least-squares calculations and lead to
misleading results.
You can spot outliers by examining a graph (so long as you plot individual
replicates, and not mean and error bar). But outliers can also be detected
automatically, using Prism's outlier detection option.
If you enter unknown Y values and expect Prism to interpolate an X value, then the unknown Y values
must be entered in the same units as the Y values for the standards.
To understand how Prism handles this situation, you need to know how it interpolates. It
starts at the minimum X value and works its way to the largest X value. It stops when it
finds the Y value you entered. If it doesn't find a Y value matching the one you entered,
it then looks at X values lower than the minimum shown on the graph and to X values
larger than the maximum. If there are two points on the curve that have the Y value you
entered as an unknown, Prism will find the lower of the two corresponding X values.
If you ask Prism to compute X for a given Y, it is more complicated. Prism does
not try to solve the equation algebraically but rather does the interpolation
numerically. The results are accurate to at least 6-7 decimal places.
Prism decides on the range of X values to consider. To allow for extrapolation a
bit beyond the range of the data, Prism creates an interpolation/extrapolation
range that includes the range of the data, and extends in each direction by a
distance equal to half the difference between Xmax and Xmin. There are two
special cases. When all the data are positive (or zero), that range is clipped to
exclude negative numbers. Similarly, when the data are all negative (or zero),
the interpolation/extrapolation range is clipped to exclude positive numbers.
Prism then divides that interpolation/extrapolation range into 1000 line
segments.
For each value to be interpolated or extrapolated, Prism first tries to interpolate
within the range of the X values of the data.
1. It starts with the lowest X value (scans from left to right on the graph). If
more than one line segment includes the Y value, Prism only finds the first
(lowest X value).
2. Prism then interpolates within that line segment to determine X as accurately
as possible. In most cases, it does this by binary bisection. It divides the
segment in half and figures out which contains Y. Then it divides that half in
half again. And again. This continues until X is determined as accurately as
possible given the numerical precision of the computer. In rare cases, it is
possible that Y is not monotonic within the range determined in step 1. In this
case, Prism uses linear interpolation rather than binary bisection.
If Prism cannot find, within the X range of the data, an X value whose Y value
matches the value you entered, it will try to extrapolate a value to the
extended range mentioned above. It first looks at X values below the minimum
X value in the data, and then at X values above the maximum X value in the
data.
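The sketch below mimics that procedure in Python (it is not Prism's code): the fitted curve is evaluated over 1000 segments spanning the extended range, the segments are scanned from low X to high X, and the X value is located within the first segment that brackets the target Y. For simplicity it interpolates linearly within the segment rather than bisecting.

import numpy as np

def interpolate_x(xcurve, ycurve, y_target):
    # Scan segments left to right; return X on the first segment whose
    # Y range brackets y_target. Returns None if no segment matches.
    for i in range(len(xcurve) - 1):
        y0, y1 = ycurve[i], ycurve[i + 1]
        if (y0 - y_target) * (y1 - y_target) <= 0 and y0 != y1:
            frac = (y_target - y0) / (y1 - y0)
            return xcurve[i] + frac * (xcurve[i + 1] - xcurve[i])
    return None

# Invented fitted curve, evaluated over the data range (0 to 10) extended
# by half the range in each direction, divided into 1000 segments.
xmin, xmax = 0.0, 10.0
pad = (xmax - xmin) / 2
xcurve = np.linspace(xmin - pad, xmax + pad, 1001)
ycurve = 100.0 / (1.0 + np.exp(-(xcurve - 5.0)))   # example sigmoidal curve
print(interpolate_x(xcurve, ycurve, 25.0))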
When Prism interpolates a value (the X value is in the range of the data), it will
also compute a confidence interval if you request it to do so. It does this by
determining where the two confidence bands intersect the Y value you entered.
When Prism extrapolates a value from the curve, it will not compute a
confidence interval for that extrapolated value.
To measure the concentration of a substance in unknown samples, the usual approach
is to perform the assay at a number of known concentrations, fit a line or curve, and
interpolate the unknown values.
But there is a problem with interpolating from a standard curve. The results can
be incorrect when the unknown samples are contaminated with other
substances that alter the assay. This is known as the matrix effect problem.
The Standard Addition Method is a way to bypass this problem. Add various
known concentrations (including zero) of known substance to a constant
amount of the unknown, and perform the assay. This ensures that all the
samples have the same amount of unknown, including any substances that
interfere with the assay.
Fit the data with linear regression. The Standard Addition Method only works
when the assay measurement (Y) is proportional to the concentration of
substance you are measuring.
The dotted line in the figure below shows how the assay results (usually optical
density) vary as you change the concentration of the substance you are
assaying. The circles and solid line show the assay results in the presence of an
unknown concentration of the substance you are assaying. This pushes the line
up and to the left.
The value you want to know is how far the solid line is shifted to the left of the
dotted line. There is an easy way to find out: Extrapolate the line down to Y=0.
One of the parameters that Prism reports is the X intercept, which will be
negative. Take the absolute value, and that is the concentration of the
unknown substance. The confidence interval for the X intercept gives you the
confidence interval for the concentration of the unknown. Simply multiply both
confidence limits by -1.
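The arithmetic is simple enough to check with a few lines of code. This Python sketch (with invented standard-addition data, not a Prism file) fits the line and converts the X intercept to the unknown concentration:

import numpy as np

# Invented data: concentration of standard added (X) and assay signal (Y),
# each measured after spiking into a constant amount of the unknown.
added  = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
signal = np.array([0.42, 0.61, 0.83, 1.01, 1.22])

slope, intercept = np.polyfit(added, signal, 1)
x_intercept = -intercept / slope        # where the fitted line crosses Y = 0
print("Unknown concentration =", abs(x_intercept))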
To plot the data in Prism, you'll want to extend the linear regression line to start
at an X value equal to the X intercept (a choice in the Linear regression
parameters dialog). You may also want to move the origin to the lower left, a
choice on the first tab of the Format Axis dialog.
2.4
The data are in triplicate. Some values are missing, and Prism always handles
these just fine.
Since this is the first time you are viewing the graph, Prism will pop up the
Change Graph Type dialog. Select the fifth choice, to plot individual replicates
rather than mean and error bars.
The graph Prism makes automatically is fairly complete. You can customize the
symbols, colors, axis labels, position of legend, etc.
5. Choose a model
On the Fit tab of the nonlinear regression dialog, open the equation folder,
Enzyme Kinetics - Substrate vs. Velocity. Then choose the Michaelis-Menten
equation. 335
For this example, leave all the other settings to their default values.
Click OK to see the curves superimposed on the graph.
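For readers who want to see what the dialog is doing behind the scenes, an equivalent fit can be sketched in Python with scipy.optimize.curve_fit. The data below are invented, and this is not Prism's implementation.

import numpy as np
from scipy.optimize import curve_fit

def michaelis_menten(s, vmax, km):
    # Y = Vmax*X/(Km + X), the equation chosen in the dialog above.
    return vmax * s / (km + s)

# Invented substrate concentrations and enzyme velocities.
s = np.array([1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0])
v = np.array([9.5, 17.0, 28.1, 41.0, 52.3, 59.8, 64.1])

popt, pcov = curve_fit(michaelis_menten, s, v, p0=[70.0, 10.0])
print("Vmax = %.1f, Km = %.1f" % tuple(popt))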
The goal of nonlinear regression is to find the best-fit values of the parameters.
These are reported at the top of the table. You can't really interpret the best-fit
values without knowing how precise they are, and this is reported both as
standard errors and confidence intervals 183 .
Go to the Diagnostics tab 165 , and check the option to perform the replicates
test 194 . Note that you can also check an option to make your settings here
become the default for future fits.
The P value is small (0.013). This means that the scatter of the data from the
curve is greater than you'd expect from the variation among triplicates. This
suggests that you might want to consider fitting an alternative model, which we
do in the next example 131 .
2.4.2
Choose: For each data set, which of two equations (models) fits best?
There are two ways to compare models. For this example, choose the extra
sum-of-squares F test 50 .
For the second equation, choose "Allosteric sigmoidal enzyme kinetics" 342 , also
in the Enzyme kinetics section.
The P value of the replicates test 194 is high, which means the scatter of points
around the curve is consistent with variability of replicates from each other.
The parameter H equals 2.0, with a 95% confidence interval ranging from 1.5
to 2.5. A value of 2.0 suggests that this enzyme might be a dimer. When H
equals 1.0, the allosteric model is identical to the Michaelis-Menten model.
Further interpretation must, of course, be in the context of what is known
about this enzyme from prior work. Statistics is only part of analyzing scientific
data.
2.4.3
Prism identified the outlier, and plotted it in red, overlaid on top of the data
graph. After identifying the outlier, Prism fit the remaining data points as if the
outlier wasn't present. Before accepting the results, think about why the point
was an outlier. Remember, not all outliers are "bad" points 56 .
Double click on the graph to bring up the Format Graph dialog. Go to the
second tab. You can see that this graph now has three data sets, the data, the
curve fit, and the outliers. Read more about graphing outliers 176 .
2.4.4
The X values are the logarithm of the concentration of agonist. The Y values are
responses, in duplicate, in two conditions.
5. Choose a model
On the Fit tab of the nonlinear regression dialog, open the panel of inhibitory
dose-response models and choose: log(inhibitor) vs. response -- variable
slope.
For now, leave all the other settings to their default values.
Click OK to see the curves superimposed on the graph.
The control results are labeled ambiguous. This means that Prism is unable to
find a unique curve through the data. Lots of other sets of parameter values
would lead to curves that fit just as well. You can see which parameters are
ambiguous by looking at the 95% confidence intervals. Instead of reporting an
interval, Prism reports 'very wide' for the Bottom and logEC50.
The data do not define a bottom plateau for the control (circles) data set, so its
best-fit value is ambiguous. The EC50 is the concentration that gives a
response half way between the bottom and top plateaus of the curve. If the
bottom is ambiguous, so is the EC50.
The treated curve is not labeled 'ambiguous', but the confidence intervals are
wider than you'd like.
One solution is to assume that the top and bottom plateaus, and the slope, are the same under
control and treated conditions. In other words, you assume that the treatment
shifts the EC50 but doesn't change the basal response, the maximum
response, or the Hill slope.
Return to the nonlinear regression dialog by clicking the button in the upper left
of the results table.
Go to the constraints tab and choose to share the value of Bottom, Top, and
HillSlope. When you share these parameters, Prism fits the data sets globally to
find one best-fit value for Bottom, Top and HillSlope (for both data sets) and
separate best-fit values for the logEC50.
The fit is no longer labeled 'ambiguous' and the confidence intervals are much
tighter.
Prism will now fit the data two ways. The first is the same as before, fitting a
separate IC50 for each data set. The second fit shares all the parameters. In
this case, three parameters were already shared but one wasn't. So in this
second fit, all four parameters are shared, so Prism fits one curve through all
the data, ignoring which treatment group they are in. It compares the sum-of-squares (really the sum of the sum-of-squares, since there are two data sets fit in
each case), using the extra sum-of-squares F test. The results are shown at
the top of the results sheet.
The P value is tiny, so we reject the null hypothesis that the two IC50 values
are identical in the population, and instead conclude that the two IC50 values
are different.
This equation was designed to do exactly what is needed for this example. Read
about how this equation was set up 279 , so you can construct your own
equations when necessary.
2.4.5
5. Choose a model
On the Fit tab of the nonlinear regression dialog, open the panel of Stimulatory
dose-response equations and choose: log(agonist) vs. response -- Variable
slope. 256
For this example, leave all the other settings to their default values.
Click OK to see the curves superimposed on the graph.
Note the word 'ambiguous' at the top of the results. This means that Prism was
unable to find a unique fit to these data. Lots of sets of parameter values would
lead to curves that fit just as well. Learn more about ambiguous fits 228 .
Prism does not report confidence intervals for the logEC50 or the Bottom of
the curve, but instead simply says the intervals are 'very wide'. That tells you it
was impossible to fit those parameters precisely.
While the Y values of the data run from about 100 to about 600, the best-fit value
for the bottom of the curve is -1067. Furthermore, the best-fit value of the
logEC50 is outside the range of the data.
Even though the curve is close to the points (the R2 is 0.9669), the best-fit
parameter values are useless.
The data define the top plateau, the mid point and the steepness. But the data
simply don't define the bottom of the curve. In fact, the best-fit value of Bottom
that Prism reports is very, very far from the data.
These data were calculated so the basal nonspecific response was subtracted
away. This means that you know that the response (Y) at very low
concentrations of agonist (very low values of X) has to be zero. Prism needs to
know this to fit the data sensibly.
You don't have to do the fit over again. Instead click the button in the upper left
corner of the results table to return to the nonlinear regression dialog.
Go to the Constrain tab, and choose to constrain the parameter Bottom to have a
constant value which you set to 0.0.
2.5
2.5.1
Prism automatically chooses initial values for each parameter, and these do not have to
be very accurate. If you are having problems estimating initial values, set aside
your data and simulate curves using the model. Change the variables one at a
time, and see how they influence the shape of the curve. Once you have a
better feel for how the parameters influence the curve, you might find it easier
to estimate initial values.
When fitting a simple model to clean data, it won't matter much if the initial
values are fairly far from the correct values. You'll get the same best-fit curve
no matter what initial values you use, unless the initial values are extremely far
from correct. Initial values matter more when your data have a lot of scatter or
your model has many variables.
Step 5. If you are fitting two or more data sets at once, decide whether to
share any parameters
If you enter data into two or more data set columns, Prism will fit them all in
one analysis. But each fit will be independent of the others unless you specify
that one or more parameters are shared. When you share parameters, the
analysis is called a global nonlinear regression 41 .
Step 6. Choose other options on the remaining tabs of the dialog, including the
Range 164 , Output 164 , and Diagnostics 165 tabs.
2.5.2
2.5.3
2.5.3.1
Fit tab
Choose an equation
Selecting an equation is the most important step in fitting a curve. The choice
cannot be automated 12 .
If your goal is to fit a model in order to understand your data and make
comparisons, then choosing a model is a scientific decision that you must make
with care. If your goal is to interpolate unknowns from a standard curve, then it
matters less which equation you pick, so long as it ends up creating a smooth
curve through your data.
Some tips:
Prism provides a long list of equations that you can choose. But if none of these
meets your needs, don't be afraid to create or clone an equation 409 .
Part of choosing a model is choosing constraints 158 . Don't skip that step. For
example, if you choose a sigmoidal dose response model, you must decide
whether you wish Prism to find the best fit value of the bottom plateau based
on the trend of the data. The alternative is to constrain the bottom plateau to
equal zero, if you have subtracted off a baseline, or some other value
(defined by controls). A computer can't make these decisions for you.
Choosing which constraints to apply to your model is a fundamental decision
in data analysis that can have a huge impact on the results.
If you are fitting several data sets at once, part of choosing a model is
deciding which parameters you want to share between data sets. When you
share a parameter (a choice on the Constrain tab 158 ), Prism finds one best-fit
value for the parameter that applies to all the data sets. Read more 41 about
shared parameters (global fitting).
Fitting method
Prism offers three methods to fit curves.
Interpolate
Check this option in order to interpolate the concentration of unknown samples
from the best-fit curve. Learn more. 107
With this option, Prism will report the Y value for any X values you enter, and
the X value for any Y values you enter (including extrapolating in each direction
a distance equal to half the length of the X axis).
2.5.3.2
Compare tab
When fitting biological data with regression, your main objective is often to
discriminate between different models, to ask if an experimental intervention
changed a parameter, or to ask if the best-fit value of a parameter differs
significantly from a theoretical value. Learn more about these four kinds of
comparisons 46 . Your choice, of course, has to be based on your experimental
goals.
Prism can perform the comparison using two alternative methods 48 : the extra
sum-of-squares F test 50 , and using Akaike's information criteria 51 . Use these
guidelines to choose:
In most cases, the two models will be 'nested'. This means that one model is
a simpler case of the other. For example, a one-phase exponential model is a
simpler case of a two-phase exponential model. Either the F test or the AICc
method may be used with nested models. The choice is usually a matter of
personal preference and tradition. Basic scientists in pharmacology and
physiology tend to use the F test. Scientists in fields like ecology and
population biology tend to use AICc.
If the models are not nested, then the F test is not valid so you should
choose AICc. Note that Prism does not enforce this. It will calculate the F test
even if the models are not nested, but the results won't be useful.
Both these methods only make sense when the models being compared have
different numbers of parameters, and so have different numbers of degrees of
freedom. If you want to compare two models with the same number of
parameters, there is no need to use either the F test or AIC. Simply choose the
model that fits the data the best with the smallest sum-of-squares.
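For reference, here is a minimal Python sketch of both comparisons, starting from the sum-of-squares and degrees of freedom of two fits (invented numbers). The F test is the standard extra sum-of-squares test; the AICc expression is a common least-squares form, and Prism's exact expression may differ slightly (for example in how parameters are counted), so treat this as an illustration of the idea rather than a replication of Prism's numbers.

import numpy as np
from scipy import stats

def extra_ss_f_test(ss1, df1, ss2, df2):
    # Model 1 is the simpler (nested) model: fewer parameters, more df.
    f = ((ss1 - ss2) / (df1 - df2)) / (ss2 / df2)
    return f, stats.f.sf(f, df1 - df2, df2)

def aicc(ss, n, k):
    # A common AICc form for a least-squares fit with n points and k parameters.
    return n * np.log(ss / n) + 2 * k + 2.0 * k * (k + 1) / (n - k - 1)

# Invented results: 20 points fit to a 2-parameter and a 3-parameter model.
print(extra_ss_f_test(ss1=150.0, df1=18, ss2=100.0, df2=17))
print(aicc(150.0, 20, 2) - aicc(100.0, 20, 3))  # negative favors the simpler model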
2.5.3.3
Constrain tab
Constraining the values of one or more parameters can prevent the nonlinear regression
process from being led astray. With huge numbers of data points, you might
see a noticeable speeding up of the fitting process.
The constraint helped nonlinear regression choose from several local
minima. Nonlinear regression works by changing parameter values step by
step until no small change affects the sum-of-squares (which quantifies
goodness-of-fit). With some models, there can be two sets of parameter
values that lead to local minima in sum-of-squares. Applying a constraint
can ensure that nonlinear regression finds the minimum with scientifically
relevant values, and ignores another minimum that fits the curve well but
using parameter values that make no scientific sense (i.e. negative
concentrations).
The constraint prevents nonlinear regression from finding a minimum sum-of-squares. Instead, the best the program can do (while obeying the
constraint) is set the parameter to the limit of the constrained range. Prism
then reports that the fit 'hit constraint'.
In the first case, the constraint is harmless but useless.
In the next two cases, the constraint helps the nonlinear regression reach
sensible results. Essentially, the constraint can give the nonlinear regression
process some scientific judgment about which parameter values are simply
impossible. These cases are really what constraints are for.
In the last case, when the fit ends with a parameter set to one end of its
constraint, the results are tricky to interpret. 231
The Constrain tab is also where you can choose to share some parameters between data sets. For each shared parameter, Prism finds
one (global) best-fit value that applies to all the data sets. For each non-shared
parameter, the program finds a separate (local) best-fit value for each data set.
Global fitting is a very useful tool in two situations:
The parameter(s) you care about are determined from the relationship
between several data sets. Learn more. 43
Each dataset is incomplete, but the entire family of datasets defines the
parameters. See example. 42
2.5.3.4
Weights tab
ROUT coefficient
If you ask Prism to automatically exclude outliers (a choice on the Fit tab) or to
count (but not remove from the analysis) outliers (a choice on the Diagnostics
tab), set the ROUT coefficient to determine how aggressively Prism defines
outliers.
We recommend using a value of 1%. Our simulations have shown that if all the
scatter is Gaussian, Prism will falsely find one or more outliers in about 2-3% of
experiments. If there really are outliers present in the data, Prism will detect
them with a False Discovery Rate less than 1%. See reference 1.
If you set Q to a higher value, the threshold for defining outliers is less strict.
This means that Prism will have more power to detect outliers, but also will
falsely detect 'outliers' more often. If you set Q to a lower value, the threshold
for defining outliers is stricter. This means that Prism will have a less power to
detect real outliers, but also have a smaller chance of falsely defining a point to
be an outlier.
If you set Q to 0, Prism will fit the data using ordinary nonlinear regression
without outlier identification.
Unequal weighting
It is often useful to differentially weight the data points. Learn why. 30
Weight by 1/X or 1/X2. These choices are used rarely. Only choose these
weighting schemes when it is the standard in your field, such as a linear fit of a
bioassay.
Weight by 1/SD2. If you enter replicate Y values at each X (say triplicates), it
is tempting to weight points by the scatter of the replicates, giving a point less
weight when the triplicates are far apart so the standard deviation (SD) is high.
But unless you have lots of replicates, this doesn't help much. The triplicates
constituting one mean could be far apart by chance, yet that mean may be as
accurate as the others. Weighting needs to be based on systematic changes in
scatter. The choice to weight by 1/SD2 is most useful when you want to use a
weighting scheme not available in Prism. In this case, enter data as mean and
SD, but enter as "SD" weighting values that you computed elsewhere for that
point. In other words, the values you enter in the SD subcolumn are not
actually standard deviations, but are weighting factors computed elsewhere.
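If you want to reproduce a weighted fit outside of Prism, most least-squares routines accept per-point weights. In this Python sketch (invented data, not Prism's code), curve_fit weights each point by 1/sigma^2, so passing the entered SD values (or any weighting factors computed elsewhere) as sigma gives a 1/SD^2-style weighted fit.

import numpy as np
from scipy.optimize import curve_fit

def line(x, slope, intercept):
    return slope * x + intercept

# Invented data with a per-point SD (or weighting factor) for each Y value.
x  = np.array([1.0, 2.0, 4.0, 8.0, 16.0, 32.0])
y  = np.array([2.2, 4.1, 8.9, 15.2, 33.0, 61.0])
sd = np.array([0.2, 0.3, 0.6, 1.1, 2.4, 4.5])

popt, pcov = curve_fit(line, x, y, sigma=sd, absolute_sigma=False)
print(popt)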
Notes on weighting
If you have normalized your data, weighting rarely makes sense.
Simulations 65 can show you how much difference it makes if you choose the
wrong weighting scheme.
If you choose unequal weighting, Prism takes this into account when plotting
residuals 179 .
Prism accounts for weighting when it computes R2 187 .
Learn about the math of weighting 31 .
Replicates
Reference
1. Motulsky HM and Brown RE, Detecting outliers when fitting data with
nonlinear regression - a new method based on robust nonlinear regression
and the false discovery rate. BMC Bioinformatics 2006, 7:123.
2.5.3.5
Each equation built into Prism includes rules to compute initial values. These rules
use the range of the X and Y values to come
up with initial values, which become the original automatic initial values. You
can change the rules 433 for user-defined equations, and can clone built-in
equations to make them user-defined. The new rules will be invoked when you
next choose this equation for a new analysis. It won't change the initial values
for the analysis you are working on.
2.5.3.6
Range tab
Table of XY coordinates
Check this option if you want to see a table of XY coordinates of line segments
defining the curve.
In prior versions of Prism, this table was critical to plotting the curve and
interpolating unknowns. But Prism 5 can graph the curve and interpolate
unknowns from the curve, without creating this table. Check the option to
create this table only if you want to copy (or export) the table into another
program. In most cases, you will have no need to view this table, and so should
leave the option unchecked.
2.5.3.7
Output tab
Summary table
When analyzing several data sets, the results table is rather lengthy. To display
key results on a summary table, check the option box to create a summary
table and select the variable you wish to summarize. Prism creates a summary
table (as an additional results view) that shows the best-fit value of that
parameter for each data set, and graphs this table.
Depending on your choices in the dialog, this may be a bar graph or an XY
graph. It shows the best-fit value of a selected parameter for each data set on
the table. In some cases, you may analyze the summary table with linear or
nonlinear regression. For example, the summary graph may show the best-fit
value of a rate constant as a function of concentration (obtained from the
column titles of the original data). You can fit a line or curve to that graph.
When Prism compares the fits of two equations, it shows only the results for
the second equation. Since this may not be helpful, we suggest that you only
create a summary table when you are fitting a single model.
Additional output
Prism offers two options here (dose ratios, and Ki from IC50). These are
available only to be compatible with earlier versions of Prism (so Prism can
open files made with older versions). They are almost always unavailable,
becoming available only if you select certain equations from the classic
equations list.
Since Prism 5, you can calculate a Gaddum/Schild EC50 shift 277 directly, without
need to separately compute dose ratios. Similarly, you can fit competitive
binding curves directly to determine the Ki for one 312 or two 317 sites, without a
separate Cheng-Prusoff calculation of Ki from IC50(1).
2.5.3.8
Diagnostics tab
Nonlinear regression works iteratively, and begins with initial values 162 for each
parameter. Check "don't fit the curve" to see the curve generated by your initial
values. If the curve is far from the data, go back to the initial parameters tab
and enter better values for the initial values. Repeat until the curve is near the
points. Then go back to the Diagnostics tab and check "Fit the curve". This is a
simple way to diagnose, and fix, problems caused by poor initial values.
If your goal is to find the best-fit value of the parameters, you will also want to
know how precise those estimates are. We suggest that you report
confidence intervals, as inspecting the confidence intervals of best-fit
parameters is an essential part of evaluating any nonlinear fit. Standard errors
are intermediate values used to compute the confidence intervals, but are not
very useful by themselves. Include standard errors in the output to compare
Prism's results to those of another program that doesn't report confidence
intervals.
Choose to report confidence intervals as a range or separate blocks of lower
and upper confidence limits (useful if you want to paste the results into another
program).
Two graphs of Response vs. log(Dose): one showing the 95% confidence bands, the other showing the 95% prediction bands.
The 95% confidence bands enclose the area that you can be 95% sure
contains the true curve. It gives you a visual sense of how well your data define
the best-fit curve.
The 95% prediction bands enclose the area that you expect to enclose 95%
of future data points. This includes both the uncertainty in the true position of
the curve (enclosed by the confidence bands), and also accounts for scatter of
data around the curve. Therefore, prediction bands are always wider than
confidence bands. When you have lots of data points, the discrepancy is huge.
Learn more about confidence and prediction bands. 200
You will probably want to ask Prism to report R2 187 , simply because it is
standard to do so, even though knowing R2 doesn't really help you interpret the
results. Reporting the sum-of-squares and sy.x will only be useful if you want to
compare Prism's results to those of another program, or you want to do
additional calculations by hand.
The adjusted R2 190 takes into account the number of parameters fit to the
data, so has a lower value than R2 (unless you fit only one parameter, in which
case R2 and adjusted R2 are identical).
Prism also lets you choose to report the AICc 51 . This would be useful only if
you separately fit the same data to three or more models. You can then use
the AICc to choose between them. But note that it only makes sense to
compare AICc between fits, when the only difference is the model you chose. If
the data or weighting are not identical between fits, then the comparison of
AICc values would be meaningless.
Normality tests
Does the curve follow the trend of the data? Or does the curve systematically
deviate from the trend of the data? Prism offers two tests that answer these
questions.
If you have entered replicate Y values, choose the replicates test 194 to find out
if the points are 'too far' from the curve (compared to the scatter among
replicates). If the P value is small, conclude that the curve does not come close
enough to the data.
The runs test 193 is available if you entered single Y values (no replicates) or
chose to fit only the means rather than individual replicates (weighting tab). A
'run' is a series of consecutive points on the same side of the curve. If there are
too few runs, it means the curve is not following the trend of the data.
If you choose a residual plot 179 , Prism creates a new graph. The X axis is the
same as the graph of the data, while the Y axis plots the distance of each point
from the curve (the residuals). Points with positive residuals are above the
curve; points with negative residuals are below the curve. Viewing a residual
plot can help you assess whether the distribution of residuals is random above
and below the curve.
If you chose the option in the Fit tab 155 to exclude outliers from the
calculations, then this option to simply count outliers (in the Diagnostics tab) is
not available.
Nonlinear regression is an iterative process. It starts with initial values 162 of the
parameters, and then repeatedly changes those values to increase the
goodness-of-fit. Regression stops when changing the values of the parameters
makes a trivial change in the goodness of fit.
Prism lets you define the convergence criteria in three ways. The medium
choice is default, and will work fine in most cases. With this choice, nonlinear
regression ends when five iterations in a row change the sum-of-squares by
less than 0.0001%. If you are having trouble getting a reasonable fit, you might
want to try the stricter definition of convergence: five iterations in a row
change the sum-of-squares by less than 0.00000001%. It won't help very
often, but is worth a try. The only reason not to always use the strictest choice
is that it takes longer for the calculations to complete. That won't matter with
small data sets, but will matter with large data sets or when you run scripts to
analyze many data tables.
If you are fitting huge data sets, you can speed up the fit by using the 'quick'
definition of convergence: Two iterations in a row change by less than 0.01%.
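The rule is easy to express in code. This Python sketch (not Prism's implementation) checks whether the last several iterations each changed the sum-of-squares by less than a chosen fraction:

def has_converged(ss_history, tolerance=1e-6, in_a_row=5):
    # tolerance is a fraction: 1e-6 corresponds to 0.0001%.
    if len(ss_history) <= in_a_row:
        return False
    recent = ss_history[-(in_a_row + 1):]
    changes = [abs(recent[i + 1] - recent[i]) / recent[i] for i in range(in_a_row)]
    return all(c < tolerance for c in changes)

# Example: sum-of-squares recorded after each iteration of a fit.
history = [120.0, 80.0, 60.1, 60.0004, 60.0004, 60.0004, 60.0004, 60.0004, 60.0004]
print(has_converged(history))   # True: the last five changes are tiny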
2.5.4
Advice on graphing curves 171
Graphing outliers 176
Graphing residuals 179
2.5.4.1
The 95% confidence bands enclose the area that you can be 95% sure
contains the true curve. It gives you a visual sense of how well your data define
the best-fit curve.
The 95% prediction bands enclose the area that you expect to enclose 95%
of future data points. This includes both the uncertainty in the true position of
the curve (enclosed by the confidence bands), and also accounts for scatter of
data around the curve.
Choose in the Diagnostics tab 165 of the Nonlinear regression parameters dialog.
Learn how they are calculated 67 .
Prediction bands are always wider than confidence bands. When you have lots
of data points, the discrepancy is huge.
Two graphs of Response vs. log(Dose): one showing the 95% confidence bands, the other showing the 95% prediction bands.
Prism does not automatically write equations on the graph. There are two
approaches you can use to do so.
Microsoft Office comes with an equation editor. If Microsoft Equation Editor is
installed on your system, you will see a quick-access button on your toolbar
when you are on either a graph or a layout sheet.
176
You can create either a generic equation, or an equation with the best-fit values
of the parameters substituted for the variable names.
If the button doesn't appear on your toolbar, create the equation inside of
Microsoft Word or some other program, and then copy and paste.
Another approach is to simply show the parameter values from the curve of
best fit in tabular form. Copy selected results and then paste onto a graph. The
table will be linked, so its values will change if you edit or replace the data. This
doesn't show the form of the equation (exponential decay, in this example) but
gives you the results of this particular experiment, which may be more
important.
The example below shows use of the equation editor to write both a generic
and specific equation (for the control data set) and an embedded table showing
the results for both data sets.
2.5.4.4
Graphing outliers
When Prism identifies one or more outliers, you can decide whether
these should simply be plotted or also ignored by the curve fitting process.
Choose to count the outliers on the Diagnostics tab 165 , and to eliminate outliers
on the Fit tab 59 of the nonlinear regression dialog.
You can adjust the value of Q 160 , which determines how aggressively Prism defines
outliers.
The example above shows a graph where each replicate is plotted individually.
The example below shows a graph which plots mean and SD error bars. The
error bars are computed from all the data, including any outliers. The outliers
are then superimposed on the graph as well.
Note that Prism does not remove the outlier from the regular data set. When
you plot mean and error bar, the outlier is included in the calculation. When you
plot individual data points, the outlier is still plotted with the full dataset. Then
Prism superimposes the outlier(s) as a separate dataset plotted in front.
Look back at the first example, showing each replicate. You can see that the
curve goes over the data points. If you want the curve to go under the data
points, click the Reverse/Flip/Rotate button.
The results, above, are not exactly what you want. Indeed, the relative front-to-back order of data and curve has reversed, so the curve is now behind the
points. But the relative front-to-back order of the full dataset and the outlier
dataset is also reversed, so the outlier is now behind the regular data point, so
is invisible. This demonstrates that Prism does not really change the color of the
data points of outliers. Rather, it plots the outliers twice -- once as part of the
full data set and again as part of the outlier dataset superimposed on the graph.
To get the effect you want here, go to the Data on Graph tab of the Format
Graph dialog, and fine-tune the back-to-front order of data sets (with the
outlier(s) and curve considered to be data sets).
2.5.4.5
Residual plot
You can see that the data points are not randomly distributed above and below the
curve. There are clusters of points at early and late times that are below the
curve, and a cluster of points at middle time points that are above the curve.
This is much easier to see on the graph of the residuals in the inset. The data
are not randomly scattered above and below the X-axis.
2.6
2.6.1
2.6.1.1
The 95% confidence bands enclose the area that you can be 95% sure
contains the true curve. It gives you a visual sense of how well your data define
the best-fit curve. It is closely related to the 95% prediction bands , which
enclose the area that you expect to enclose 95% of future data points. This
includes both the uncertainty in the true position of the curve (enclosed by the
confidence bands), and also accounts for scatter of data around the curve.
Therefore, prediction bands are always wider than confidence bands.
2.6.1.2
A large P value means that your data are consistent with the assumptions of
regression (but certainly does not prove that the model is correct). With small
numbers of data points, normality tests have little power to detect modest
deviations from a Gaussian distribution.
2.6.1.3
R squared
Meaning of R2
Key points about R2
The value R2 quantifies goodness of fit.
It is a fraction between 0.0 and 1.0, and has no units. Higher values indicate
that the model fits the data better.
When R2 equals 0.0, the best-fit curve fits the data no better than a
horizontal line going through the mean of all Y values. In this case, knowing X
does not help you predict Y.
When R2 =1.0, all points lie exactly on the curve with no scatter. If you know
X you can calculate Y exactly.
You can think of R2 as the fraction of the total variance of Y that is explained
by the model (equation). With experimental data (and a sensible model) you
will always obtain results between 0.0 and 1.0.
There is really no general rule of thumb about what values of R2 are high,
adequate or low. If you repeat an experiment many times, you will know
what values of R2 to expect, and can investigate further when R2 is much
lower than the expected value.
By tradition, statisticians use uppercase (R2) for the results of nonlinear and
multiple regression and lowercase (r2) for the results of linear regression, but
this is a distinction without a difference.
Don't overemphasize R2
A common mistake is to use R2 as the main criteria for whether a fit is
reasonable. A high R2 tells you that the curve came very close to the points.
That doesn't mean the fit is "good" in other ways. The best-fit values of the
parameters may have values that make no sense (for example, negative rate
constants) or the confidence intervals may be very wide. The fit may be
ambiguous. You need to look at all the results to evaluate a fit, not just the R2 .
R2 can be negative!
Appearances can be deceptive. R2 is not really the square of anything. R2 is
computed as 1.0 minus the ratio SSres/SStot, so if SSres is larger than SStot,
R2 will be negative. While it is surprising to see something called "squared"
have a negative value, it is not impossible (since R2 is not actually the square of R).
How can this happen? SSres is the sum of the squares of the vertical distances
of the points from the best-fit curve (or line). SStot is the sum of the squares
of the vertical distances of the points from a horizontal line drawn at the mean
Y value. SSres will exceed SStot when the best-fit line or curve fits the data
even worse than does a horizontal line.
R2 will be negative when the best-fit line or curve does an awful job of fitting the
data. This can only happen when you fit a poorly chosen model (perhaps by
mistake), or you apply constraints to the model that don't make any sense
(perhaps you entered a positive number when you intended to enter a negative
number). For example, if you constrain the Hill slope of a dose-response curve
to be greater than 1.0, but the curve actually goes downhill (so the Hill slope is
negative), you might end up with a negative R2 value and nonsense values for
the parameters.
Below is a simple example. The blue line is the fit of a straight line constrained
to intercept the Y axis at Y=150 when X=0. SSres is the sum of the squares of
the distances of the red points from this blue line. SStot is the sum of the
squares of the distances of the red points from the green horizontal line. Since
SSres is much larger than SStot, the R2 (for the fit of the blue line) is negative.
If R2 is negative, check that you picked an appropriate model, and set any
constraints correctly.
Adjusted R2
The R2 quantifies how well a model fits the data. When you compare models,
the one with more parameters can bend and twist more to come nearer the
points, and so almost always has a higher R2 . This is a bit misleading.
The adjusted R2 accounts for the number of parameters fit by the regression,
and so can be compared between models with different numbers of
parameters. The equations for the regular and adjusted R2 are compared below
(SSresiduals is the sum-of-squares of the discrepancies between the Y values of
the curve and the data; SStotal is the sum-of-squares of the differences
between the overall Y mean and each Y value; n is the number of data points,
and K is the number of parameters fit).
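The equation panel from the printed guide is not reproduced here. As a sketch, the standard least-squares definitions of both quantities, using the symbols defined above, are:

def r_squared(ss_residuals, ss_total):
    return 1.0 - ss_residuals / ss_total

def adjusted_r_squared(ss_residuals, ss_total, n, k):
    # Each sum-of-squares is divided by its degrees of freedom before
    # forming the ratio, which penalizes models with more parameters.
    return 1.0 - (ss_residuals / (n - k)) / (ss_total / (n - 1))

print(r_squared(25.0, 400.0))                   # 0.9375
print(adjusted_r_squared(25.0, 400.0, 12, 4))   # about 0.914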
2.6.1.4
Sum-of-squares
Nonlinear regression finds the curve that minimizes the sum of squares of the
distances of the points from the curve. So Prism reports that sum-of-squares
value. This is useful if you want to compare Prism with another program, or
compare two fits manually. Otherwise, the value is not very helpful.
If you choose to differentially weight your data, Prism reports both the absolute
and the weighted sum-of-squares.
If you chose robust regression, Prism computes a different value we call the
Robust Standard Deviation of the Residuals (RSDR). The goal here is to
compute a robust standard deviation, without being influenced by outliers. In a
Gaussian distribution, 68.27% of values lie within one standard deviation of the
mean. We therefore calculate this value, which we call P68. It turns out that this
value underestimates the SD a bit, so the RSDR is computed by multiplying the
P68 by N/DF, where N is the number of points fit and DF is the number of
degrees of freedom (equal to N -K, where K is the number of parameters fit).
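Following that description, a minimal Python sketch of the RSDR calculation (with invented residuals; see the Motulsky and Brown reference for the full method) would be:

import numpy as np

def rsdr(residuals, n_params):
    # 68.27th percentile of the absolute residuals, scaled by N/DF as described above.
    n = len(residuals)
    p68 = np.percentile(np.abs(residuals), 68.27)
    return p68 * n / (n - n_params)

residuals = np.array([0.2, -0.5, 1.1, -0.3, 0.7, -0.9, 12.0, 0.4])  # one obvious outlier
print(rsdr(residuals, n_params=2))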
2.6.1.5
Chi-square is the sum of the square of the ratio of the distance of a point from
the curve divided by the predicted standard deviation at that value of X. Note
that the denominator is the predicted standard deviation, not the actual
standard deviation computed in this particular experiment.
If you know that the SD is the same for all values of X, this simplifies to the
sum-of-squares of the residuals divided by the square of that SD.
The standard deviation value must be computed from lots of data so the SD is
very accurate. Or, better, the SD can come from theory.
If you assume that replicates are scattered according to a Gaussian distribution
with the SD you entered, and that you fit the data to the correct model, then
the value of chi-square computed from that equation will follow a known chi-square distribution. This distribution depends on the number of degrees of
freedom, which equals the number of data points minus the number of
parameters. Knowing the value of chi-square and the number of degrees of
freedom, a P value can be computed.
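Although Prism does not report this chi-square test (for the reasons given below), the calculation itself is straightforward. A Python sketch with invented numbers:

import numpy as np
from scipy import stats

# Invented observed values, curve predictions, and the SD predicted
# (from theory or from a large body of prior data) at each X.
y_obs   = np.array([10.2, 8.1, 6.3, 5.1, 4.2])
y_curve = np.array([10.0, 8.0, 6.4, 5.2, 4.1])
sd_pred = np.full(5, 0.15)

chi2 = np.sum(((y_obs - y_curve) / sd_pred) ** 2)
df = len(y_obs) - 2                 # two parameters fit in this invented example
print(chi2, stats.chi2.sf(chi2, df))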
How can you interpret a small P value? If you are quite sure the scatter really is
Gaussian, and that predicted SD is correct, a small P value tells you that your
model is not right -- that the curve really doesn't follow the data very well. You
should seek a better model.
But often a low P value just tells you that you don't predict the SD as well as
you thought you would. It is hard to determine the SD values precisely, so hard
to interpret the chi-square value. For this reason, Prism doesn't attempt the
chi-square computation. We fear it would be more misleading than helpful.
Alternatives
Several approaches have been devised to answer the question of whether the
SS is too high:
The value of sum-of-squares can be used to compute R2 . This value is
computed by comparing the sum-of-squares (a measure of scatter of points
around the curve) with the total variation in Y values (ignoring X, ignoring the
model). What values of R2 do you expect? How low a value is too low? You
can't really answer that in general, as the answer depends on your
experimental system.
If you have collected replicate Y values at each value of X, you can compare
the SS with a value predicted from the scatter among replicates. Prism calls
this the replicates test 194 . This is very useful, but only if you have collected
replicate Y measurements at each X.
You can propose alternative models, and compare their fit to the data.
Summary
Chi-square compares the actual discrepancies between the data and the curve
with the expected discrepancies (assuming you selected the right model) based
on the known SD among replicates. If the discrepancy is high, then you have
some evidence that you've picked the wrong model. The advantage of the chi-square calculation is that it tests the appropriateness of a single model, without
having to propose an alternative model and without having to have replicate
values. The disadvantage is that the calculation depends on knowing the SD
values with sufficient precision, which is often not the case.
Recommended alternatives are to compare the fits of two models 46 , or use
the replicates test 194 .
2.6.1.6
Runs test
The runs test is most useful if you entered single Y values (no replicates), or chose to fit only the means rather than individual replicates (weighting tab). If
you entered and analyzed replicate data, use the replicates test 194 instead.
A run is a series of consecutive points that are either all above or all below the
regression curve. Another way of saying this is that a run is a consecutive
series of points whose residuals are either all positive or all negative. After fitting
a curve, Prism counts the actual number of runs and calculates the predicted
number of runs (based on number of data points). The runs test compares
these two values.
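A rough Python sketch of those two counts (this is an illustration, not Prism's algorithm; the expected-runs formula is the standard runs-test expression and is an assumption here):

import numpy as np

# Illustrative runs count (not Prism's code).
def runs_counts(residuals):
    signs = np.sign(residuals)
    signs = signs[signs != 0]                             # drop points exactly on the curve
    observed = 1 + int(np.sum(signs[1:] != signs[:-1]))   # runs = sign changes + 1
    n_pos = int(np.sum(signs > 0))
    n_neg = int(np.sum(signs < 0))
    expected = 1 + 2.0 * n_pos * n_neg / (n_pos + n_neg)  # standard expected number of runs
    return observed, expected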
2.6.1.7
Replicates test
How can you tell whether the curve is 'too far' from the points? In most cases, you can't really answer that question (except by referring to other similar
experiments). But if you have collected replicate Y values at each X, then you
can ask whether the average distance of the points from the curve is 'too far'
compared to the scatter among replicates.
If you have entered replicate Y values, choose the replicates test to find out if
the points are 'too far' from the curve (compared to the scatter among
replicates). If the P value is small, conclude that the curve does not come close
enough to the data.
Example
The response at the last two doses dips down a bit. Is this coincidence? Or
evidence of a biphasic response?
One way to approach this problem is to specify an alternative model, and then
compare the sums-of-squares of the two fits. In this example, it may not be
clear which biphasic model to use as the alternative model. And there probably
is no point in doing serious investigation of a biphasic model in this example,
without first collecting data at higher doses.
Since replicate values were obtained at each dose, the scatter among those
replicates lets us assess whether the curve is too far from the points.
After checking the option (on the Diagnostics tab) to perform the replicates
test, Prism reports these results:
The value in the first row quantifies the scatter of replicates, essentially pooling
the standard deviations at all X values into one value. This value is based only
on variation among replicates. It can be computed without any curve fitting. If
the replicates vary wildly, this will be a high value. If the replicates are all very
consistent, this value will be low.
The value in the second row quantifies how close the curve comes to the mean
of the replicates. If the curve comes close to the mean of the replicates, the
value will be low. If the curve is far from some of those means, the value will be
high.
The third row (F) is the square of the ratio of those two SD values.
If the model is correct, and all the scatter is Gaussian random variation around
the model, then the two SD values will probably be similar, so F should be near
1. In this example, F is greater than 1 because the SD for lack of fit is so much
greater than the SD of replicates. The P value answers this question:
If the model was chosen correctly and all scatter is Gaussian, what is the chance of
finding an F ratio so much greater than 1.0?
A small P value is evidence that the data actually follow a model that is different
than the model that you chose. In this example, this suggests that maybe
some sort of biphasic dose-response model is correct -- that the dip of those
last few points is not just coincidence.
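The same logic can be sketched outside Prism as a classic pure-error/lack-of-fit F test; this illustrates the idea and is not necessarily Prism's exact calculation:

import numpy as np
from scipy import stats

def replicates_test(x, y, y_curve, n_params):
    """x, y: the data; y_curve: the fitted curve's value at each x; n_params: K."""
    x, y, y_curve = map(np.asarray, (x, y, y_curve))
    ss_residual = np.sum((y - y_curve) ** 2)
    # 'SD replicates': scatter of replicates around their own mean at each X
    ss_pure, df_pure = 0.0, 0
    for xi in np.unique(x):
        yi = y[x == xi]
        ss_pure += np.sum((yi - yi.mean()) ** 2)
        df_pure += len(yi) - 1
    # 'SD lack of fit': how far the curve is from the replicate means
    ss_lof = ss_residual - ss_pure
    df_lof = (len(y) - n_params) - df_pure
    F = (ss_lof / df_lof) / (ss_pure / df_pure)
    p = stats.f.sf(F, df_lof, df_pure)   # chance of an F this large if the model is right
    return F, p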
Intertwined parameters
When your model has two or more parameters, as is almost always the case,
the parameters can be intertwined.
What does it mean for parameters to be intertwined? After fitting a model,
change the value of one parameter but leave the others alone. This will move
the curve away from the points. Now change the other parameter(s) in an
attempt to move the curve closer to the data points. If you can bring the
curve closer to the points, the parameters are intertwined. If you can bring the
curve back to its original position, then the parameters are completely
redundant.
Prism can quantify the relationships between parameters in two ways. If you
are in doubt, we suggest that you focus on the dependency values and not
bother with the covariance matrix 199 .
Dependency
What is dependency and how do I ask Prism to compute it?
Dependency is reported for each parameter, and quantifies the degree to which
that parameter is intertwined with others. Check a check box on the
Diagnostics tab of nonlinear regression to view dependencies for each
parameter.
Interpreting dependency
The value of dependency always ranges from 0.0 to 1.0.
A dependency of 0.0 is an ideal case when the parameters are entirely
independent (mathematicians would say orthogonal). In this case the increase
in sum-of-squares caused by changing the value of one parameter cannot be
reduced at all by also changing the values of other parameters. This is a very
rare case.
A dependency of 1.0 means the parameters are redundant. After changing the
value of one parameter, you can change the values of other parameters to
reconstruct exactly the same curve. If any dependency is greater than 0.9999,
GraphPad labels the fit 'ambiguous 228 '.
With experimental data, of course, the value will almost always lie between
these extremes. Clearly a low dependency value is better. But how high is too
high? Obviously, any rule-of-thumb is arbitrary. But dependency values up to
0.90 and even 0.95 are not uncommon, and are not really a sign that anything
is wrong.
A dependency greater than 0.99 is really high, and suggests that something is
wrong. This means that you can create essentially the same curve, over the
range of X values for which you collected data, with multiple sets of parameter
values. Your data simply do not define all the parameters in your model. If your
dependency is really high, ask yourself these questions:
Can you fit to a simpler model?
2.6.1.9
Covariance matrix
What is the covariance matrix and how do I ask Prism to compute it?
The normalized covariance is reported for each pair of parameters, and
quantifies the degree to which those two parameters are intertwined. Check a
check box on the Diagnostics tab of nonlinear regression to view this
covariance matrix.
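Although the exact definition is not given in this excerpt, a normalized covariance of this kind is usually the covariance scaled by the two standard errors, in other words the correlation between the two parameter estimates:

NormCov(i, j) = Cov(i, j) / (SE_i * SE_j)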
(Graphs: the same dose-response data, Response vs. log(Dose), plotted twice: once with 95% confidence bands and once with 95% prediction bands around the best-fit curve.)
The 95% confidence bands enclose the area that you can be 95% sure
contains the true curve. They give you a visual sense of how well your data define
the best-fit curve. They are closely related to the 95% prediction bands, which
enclose the area that you expect to enclose 95% of future data points. This
includes both the uncertainty in the true position of the curve (enclosed by the
confidence bands), and also accounts for scatter of data around the curve.
Therefore, prediction bands are always wider than confidence bands.
1. Check the option (on the Diagnostics tab) to plot "90%" confidence (or prediction) bands. When you plot in only one direction, this is really a 95% confidence band.
2. On the Format Graph dialog, choose the data set that defines the regression curve and make sure that error bars are turned on with the "---" style.
3. Choose to plot those error bands in one direction.
If the best-fit value of a parameter hits a constraint 231 , the fit is unlikely to be
useful. Prism does not plot confidence or prediction bands, because they
would almost certainly be misleading.
If the results of nonlinear regression are ambiguous 228 , the confidence or
prediction bands would be super wide, maybe infinitely wide. They would not
be useful, so Prism does not plot them.
If you choose robust nonlinear regression, Prism does not compute
confidence or prediction bands, as it cannot compute standard errors or
confidence intervals of the parameters.
The fit is perfect 233 . If the sum-of-squares is 0.0 and R2 is 1.0, it is not
possible to compute or interpret confidence or prediction bands.
If the fit is interrupted 226 , Prism does not plot confidence or prediction bands.
The prediction bands extend a further distance above and below the curve,
equal to:
= sqrt(c+1)*sqrt(SS/DF)*CriticalT(Confidence%, DF)
In both these equations, the value of c (defined above) depends on the value of
X, so the confidence and prediction bands are not a constant distance from the
curve. The value of SS is the sum-of-squares for the fit, and DF is the number
of degrees of freedom (number of data points minus number of parameters).
CriticalT is a constant from the t distribution based on the amount of confidence
you want and the number of degrees of freedom. For 95% limits, and a fairly
large df, this value is close to 1.96. If DF is small, this value is higher.
Read a more mathematical explanation.
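A sketch of how those distances could be evaluated outside Prism; using sqrt(c) for the confidence band (versus sqrt(c+1) for the prediction band) is assumed from the surrounding text, and c must be supplied from the fit:

import numpy as np
from scipy import stats

def band_halfwidths(c, ss, df, confidence=0.95):
    """Half-widths of the confidence and prediction bands at one X value."""
    t_crit = stats.t.ppf(1 - (1 - confidence) / 2, df)    # CriticalT(Confidence%, DF)
    confidence_band = np.sqrt(c) * np.sqrt(ss / df) * t_crit
    prediction_band = np.sqrt(c + 1) * np.sqrt(ss / df) * t_crit
    return confidence_band, prediction_band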
(Table fragment: an absolute value of Hougaard's skewness between 0.10 and 0.25 is rated Adequate; the table's remaining rows cover the ranges 0.25 to 1.00 and greater than 1.00.)
Note that these values are for the absolute value of the Hougaard's measure.
For the simulated data set for the example above, Hougaard's skewness is
0.09 for Khalf and 1.83 for Kprime. This one value (no simulations needed) tells
you to choose the form of the model that fits Khalf rather than the form that
fits Kprime.
Notes
Hougaard's measure of skewness is measured for each parameter in the
equation (omitting parameters fixed to constant values).
Prism does not compute Hougaard's skewness if you chose unequal
weighting, or a robust fit, because the method is not defined for these
situations.
The values depend on the equation, the number of data points, the spacing of
the X values, and the values of the parameters.
Hougaard's measure of skewness has no units.
References
1. P. Hougaard. The appropriateness of the asymptotic distribution in a
nonlinear regression model in relation to curvature. Journal of the Royal
Statistical Society. Series B (Methodological) (1985) pp. 103-114
2. David A. Ratkowsky, Nonlinear Regression Modeling: A Unified Practical
Approach (Statistics: a Series of Textbooks and Monographs). ISBN: 0824719077
3. SAS documentation about Hougaard's measure.
2.6.1.12 Could the fit be a local minimum?
a deeper valley may lie over a ridge that you are unaware of. In nonlinear regression, large changes in
parameter values might decrease the sum-of-squares.
This problem (called finding a local minimum) is intrinsic to nonlinear regression,
no matter what program you use. You will rarely encounter a local minimum if
your data have little scatter, you collected data over an appropriate range of X
values, and you have chosen an appropriate equation.
If you chose to exclude outliers (on the Fit tab 155 ), then the outliers are
ignored by the fit, but are still included on the graph 176 .
In both cases, the aggressiveness of the outlier hunt is determined by the ROUT coefficient Q 160 .
To fit the data with the outlier ignored, go back to the nonlinear regression
dialog and choose "Automatic outlier elimination" (on the Fit tab).
2.6.1.14 Troubleshooting nonlinear regression
Prism 5 and 6 use the same algorithms, so should always report identical
results.
However, Prism 4 used slightly different algorithms, so curve fitting results can
differ from results with Prism 5 or 6 in these cases:
If your fit is labeled "Ambiguous" by Prism 5 or 6, you know that some of the
parameters are not determined precisely. Prism 4 presented a full set of
results in this case, but the results are not useful when the fit is ambiguous.
If you chose no weighting, check the sum-of-squares from the two
programs. The goal of regression is to minimize that sum of squares, so see
which version of Prism found a fit with the smaller sum-of-squares. Prism 5
and 6 have a few improvements in the fitting algorithm, so occasionally it can
find a better fit than did Prism 4. The differences, if any, are usually trivial.
If you chose to weight by the Y values (or the Y values squared), Prism 5 and
6 handle weighting differently than did Prism 4. Prism now weights by the Y
value of the curve, while Prism 4 (and earlier releases) weighted by the Y
value of the data. Weighting by the Y value of the curve is better, so the
results of Prism 5 and 6 are more correct. Since the weighting is computed
differently, you can't directly compare the weighted sum-of-square values
reported by the two versions of Prism.
When you compare two models, Prism now does an extra step. If one of the
models is ambiguous, then Prism chooses the other model, without doing the
F test or AIC comparison.
Prism now offers more rules for defining initial parameter values. If your
equation uses one of these new rules, Prism 4 might not be able to find a
reasonable fit until you tweak those initial values. In particular, Prism now has
smarter rules for fitting sigmoidal log(dose) vs. response curves.
If you entered data as mean, SD (or SEM) and N, then Prism 4 (by default)
fits the means and weights by the sample size (N). This is one of the two
options on the weighting tab (the other option is to fit means only, ignoring
N). Prism is now smarter by default (although you can choose to just fit the
means and ignore N and SD values). It accounts not only for sample size N
but also for the SD (or SEM) values you enter. With Prism 5 or 6 (but not
Prism 4), you'll get exactly the same results from data entered as mean, SD
and N as you would have by entering raw data. Prism 4 only accounts for
differences in N, but not SD. The best-fit values of the parameters, and thus
the appearance of the curve, are the same with Prism 4 and 5. But Prism 5 and 6 also account for the SD values you entered, so the standard errors and confidence intervals of the parameters can differ from those reported by Prism 4.
2.6.2
2.6.2.1
Both methods for comparing models only make sense when the models have different numbers of parameters, and so different numbers of degrees of freedom. If you want to compare two models with the same number of parameters, there is no need to use either the F test or AIC. Simply choose the model that fits the data the best with the smallest sum-of-squares.
2.6.2.2
Prism names the null and alternative hypotheses, and reports the P value. You
set the threshold P value in the Compare tab of the nonlinear regression dialog.
If the P value is less than that threshold, Prism chooses (and plots) the
alternative (more complicated) model. It also reports the value of F and the
numbers of degrees of freedom, but these will be useful only if you want to
compare Prism's results with those of another program or hand calculations.
2.6.2.3
This alternative approach is based on information theory, and does not use the
traditional hypothesis testing statistical paradigm. Therefore it does not
generate a P value, does not reach conclusions about statistical significance,
and does not reject any model.
The method determines how well the data supports each model, taking into
account both the goodness-of-fit (sum-of-squares) and the number of
parameters in the model. The results are expressed as the probability that each
model is correct, with the probabilities summing to 100%. If one model is much
more likely to be correct than the other (say, 1% vs. 99%), you will want to
choose it. If the difference in likelihood is not very big (say, 40% vs. 60%), you
will know that either model might be correct, so will want to collect more data.
Of course, these probabilities are meaningful only in the context of comparing
those two models. It is possible a third model you didn't test fits much better
so is much more likely to be correct.
Prism names the null and alternative hypotheses, and reports the likelihood that
each is correct. It also reports the difference between the AICc values (as the
AICc of the simple model minus the AICc of the more complicated model), but
this will be useful only if you want to compare Prism's results with those of
another program or hand calculations. Prism chooses and plots the model that
is more likely to be correct, even if the difference in likelihoods is small.
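For two models, the reported probabilities presumably follow from the AICc difference in the standard way:

Delta = AICc(simple) - AICc(complex)

P(complex) = e^(0.5*Delta) / (1 + e^(0.5*Delta)),    P(simple) = 1 - P(complex)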
2.6.2.4
Comparing models only makes sense when both models 'see' exactly the same set of data. That makes it tricky to combine outlier elimination with model comparison.
The equations above show how the adjusted R2 is computed. The sum-of-squares of the residuals from the regression line or curve has n-K degrees of freedom, where n is the number of data points and K is the number of parameters fit by the regression. The total sum-of-squares is the sum of the squares of the distances from a horizontal line through the mean of all Y values. Since it only has one parameter (the mean), its degrees of freedom equal n-1.
2.6.3
2.6.3.1
Curve
Does the graph look sensible?
Your first step should be to inspect a graph of the data with superimposed
curve. Most problems can be spotted that way.
Does the runs or replicate test tell you that the curve deviates
systematically from the data?
The runs 193 and replicates 194 tests are used to determine whether the curve
follows the trend of your data. The runs test is used when you have single Y
values at each X. It asks if data points are clustered on either side of the curve
rather than being randomly scattered above and below the curve. The
replicates test is used when you have replicate Y values at each X. It asks if the
points are 'too far' from the curve compared to the scatter among replicates.
If either the runs test or the replicates test yields a low P value, you can
conclude that the curve doesn't really describe the data very well. You may
have picked the wrong model, or applied invalid constraints.
Parameters
Are the best-fit parameter values plausible?
When evaluating the parameter values reported by nonlinear regression, check
that the results are scientifically plausible. Prism doesn't 'know' what the
parameters mean, so can report best-fit values of the parameters that make
no scientific sense. For example, make sure that parameters don't have
impossible values (rate constants simply cannot be negative). Check that EC50
values are within the range of your data. Check that maximum plateaus aren't
too much higher than your highest data point.
If the best-fit values are not scientifically sensible, then the results won't be
useful. Consider constraining the parameters to a sensible range, and trying
again.
How precise are the best-fit parameter values?
You don't just want to know what the best-fit value is for each parameter. You
also want to know how certain that value is. Therefore an essential part of
evaluating results from nonlinear regression is to inspect the 95% confidence
intervals for each parameter.
If all the assumptions of nonlinear regression are true, there is a 95% chance
that the interval contains the true value of the parameter. If the confidence
interval is reasonably narrow, you've accomplished what you wanted to do
found the best fit value of the parameter with reasonable certainty. If the
confidence interval is really wide, then you've got a problem. The parameter
could have a wide range of values. You haven't nailed it down. How wide is 'too
wide' depends on the scientific context of your work.
Are the confidence bands 'too wide'?
Confidence bands visually show you how precisely the parameters have been
determined. Choose to plot confidence bands by checking an option on the Fit
tab of the nonlinear regression dialog. If all the assumptions of nonlinear
regression have been met, then there is a 95% chance that the true curve falls
between these bands. This gives you a visual sense of how well your data
define the model.
Residuals
Does the residual plot look good?
A residual plot shows the relationship between the X values of your data and
the distance of the point from the curve (the residuals). If the assumptions of
the regression are met, the residual plot 179 should look bland, with no trends
apparent.
Does the scatter of points around the best-fit curve follow a Gaussian
distribution?
Least squares regression is based on the assumption that the scatter of points
around the curve follows a Gaussian distribution. Prism offers three normality
tests (in the Diagnostics tab) that can test this assumption (we recommend the
D'Agostino test). If the P value for a normality test is low, you conclude that
the scatter is not Gaussian.
Could outliers be impacting your results?
The presence of one or a few outliers (points much further from the curve than
the rest) can overwhelm the least-squares calculations and lead to misleading
results.
You can spot outliers by examining a graph (so long as you plot individual
replicates, and not mean and error bar). But outliers can also be detected
automatically. GraphPad has developed a new method for identifying outliers
we call the ROUT method. Check the option on the diagnostics tab to count the
outliers, but leave them in the calculations. Or check the option on the Fit tab to
exclude outliers from the calculations.
Models
Would another model be more appropriate?
Nonlinear regression finds parameters that make a model fit the data as closely
as possible (given some assumptions). It does not automatically ask whether
another model might work better.
Even though a model fits your data well, it may not be the best, or most
correct, model. You should always be alert to the possibility that a different
model might work better. In some cases, you can't distinguish between models
without collecting data over a wider range of X. In other cases, you would need
to collect data under different experimental conditions. This is how science
moves forward. You consider alternative explanations (models) for your data,
and then design experiments to distinguish between them.
If you chose to share parameters among data sets, are those
datasets expressed in the same units?
Global nonlinear regression (any fit where one or more parameter is shared
among data sets) minimizes the sum (over all datasets) of the sum (over all
data points) of the squared distance between data point and curve. This only
makes sense if the Y values for all the datasets are expressed in the same
units.
Goodness of fit
Is the R2 'too low' compared to prior runs of this experiment?
While many people look at R2 first, it really doesn't help you understand the
results very well. It only helps if you are repeating an experiment you have run
many times before. If so, then you know what value of R2 to expect. If the R2
is much lower than expected, something went wrong. One possibility is the
presence of outliers.
Are the values of sum-of-squares and sy.x 'too low' compared to
prior runs of this experiment?
These values are related to the R2, and inspecting the results can only be useful
when you have done similar experiments in the past so know what values to
expect.
2.6.3.2
Finally, Prism reports the 'preferred' model. You should understand how Prism
decides which model is 'preferred' as you may 'prefer' the other model.
If you chose the extra sum-of-squares F test, then Prism computes a P value
that answers this question:
If the null hypothesis is really correct, in what fraction of experiments (the
size of yours) will the difference in sum-of-squares be as large as you
observed or larger?
In the Compare tab, you also tell Prism which P value to use as the cut off (the
default is 0.05). If the P value is lower than that value, Prism chooses the more
complicated model. Otherwise it chooses the simpler model.
If you chose Akaike's method, Prism chooses the model that is more likely to
be correct. But you should look at the two probabilities. If they are similar in
value, then the evidence is not persuasive and both models fit pretty well.
Also check whether either fit is reported as ambiguous 228 or perfect 233 .
2.6.3.3
Established assay
If the assay is well established, then you know you are fitting the right model
and know what kind of results to expect. In this case, evaluating a fit is pretty
easy.
Does the curve go near the points?
Is the R2 'too low' compared to prior runs of this assay?
If so, look for outliers, or use Prism's automatic outlier detection.
Are the confidence bands too wide?
The confidence bands 171 let you see how accurate interpolations will be, so we
suggest always plotting prediction bands when your goal is to interpolate from
the curve. If your are running an established assay, you know how wide you
expect the prediction bands to be.
New assay
With a new assay, you also have to wonder about whether you picked an
appropriate model.
Does the curve go near the points?
Look at the graph. Does it look like the curve goes near the points?
Are the prediction bands too wide?
How wide is too wide? The prediction bands show you how precise
interpolations will be. Draw a horizontal line somewhere along the curve, and
look at the two places where that line intercepts the prediction bands. This will
be the confidence interval for the interpolation.
Does the scatter of points around the best-fit curve follow a Gaussian
distribution?
Least squares regression is based on the assumption that the scatter of points
around the curve follows a Gaussian distribution. Prism offers three normality
tests (in the Diagnostics tab) that can test this assumption (we recommend the
D'Agostino test). If the P value for a normality test is low, you conclude that the scatter is not Gaussian.
2.6.4
"Interrupted" 226
"Not converged" 227
"Ambiguous" 228
"Hit constraint" 231
"Don't fit" 232
"Perfect fit" 233
"Impossible weights" 233
Checklist
Did you enter or choose the right equation?
Did you enter sensible values for initial values?
Is the range of X values invalid? Focus on the first and last X value (and on
X=0 if included in the range). Can the equation be evaluated at those X values
with the initial values you entered?
2.6.4.2
"Interrupted"
Checklist
If the maximum number of iterations was set to a low value, set it to a higher
value and try again. If you have lots of data points and lots of parameters,
nonlinear regression can sometimes require hundreds of iterations.
If the maximum number of iterations was already set to a high value, you can
try a still higher value, but most likely Prism is still not going to be able to find a
best-fit curve. Things to check:
Did you enter the right model?
Does the curve defined by your initial values come near your data? Check
the option on the diagnostics tab 165 to plot that curve.
If you entered constraints, were they entered correctly?
If you didn't enter any constraints, consider whether you can constrain
one or more parameters to a constant value. For example, in a dose-response curve, can you constrain the bottom plateau to be zero?
Can you share a parameter over all the data sets (global fitting)?
2.6.4.3
"Not converged"
Checklist
Did you enter the right model?
Does the curve defined by your initial values come near your data? Check
the option on the diagnostics tab 165 to plot that curve.
If you entered constraints, were they entered correctly?
If you didn't enter any constraints, consider whether you can constrain
one or more parameters to a constant value. For example, in a dose-response curve, can you constrain the bottom plateau to be zero?
Can you share a parameter over all the data sets (global fitting)?
2.6.4.4
"Ambiguous"
Does it matter?
If your goal is to interpolate unknowns from a standard curve, you won't care
that the parameter values are 'ambiguous'. So long as the curve goes through
the points, and doesn't wiggle too much, the interpolations will be useful.
If your goal is to learn about your data by inspecting the values of the
parameters, then you've got a real problem. At least one of the parameters
has a best-fit value that you should not rely upon.
The data above show a fit of a dose-response curve to a set of data that don't
define a top plateau. Since the top plateau was not constrained to a constant
value, Prism reports the fit to be ambiguous.
The data above fit fine to a standard dose-response curve. But if you try to fit it
to a biphasic dose-response curve, Prism reports that the results are
ambiguous. The data follow a standard dose-response model just fine, with no
evidence of a second component. So fitting to a biphasic model -- with two
EC50 values -- is ambiguous.
Model has redundant parameters
The simplest example would be fitting this model: Y= A + B + C*X. This model
describes a straight line with a slope equal to C and a Y intercept equal to the
sum of A and B. But there is nothing in the data to fit A and B individually. There
are an infinite number of pairs of values of A and B that lead to the same sum,
so the same Y intercept. If Prism attempts to fit this model, it will conclude that
the fit is ambiguous.
Prism cannot tell the difference between this case and the previous one. But the
two cases are distinct. The model in this example has redundant parameters. It
doesn't matter what the data look like, or how many data points you collect,
fitting this model will always result in 'ambiguous' results, because two of the
parameters are intertwined. In the previous example, the fit was ambiguous
with the example data set, but would not have been ambiguous with other data
sets.
Checklist
Can you constrain any parameters to a constant value?
"Hit constraint"
Checklist
Did you enter the constraint correctly? Did you mix up "<" and ">"?
"Don't fit"
The top of the Diagnostics tab offers the choice to plot the curve defined by the
initial values of the parameters.
When you make this choice, the top of the results will show "Don't fit".
This is a very useful option to use when diagnosing problem fits. But you won't
learn anything by looking at the results page. Instead, look at the graph. If the
curve goes near the points, then the initial values are OK. If the curve is far
from the points, or follows a different shape altogether, then you know that
there is a problem with the choice of model or initial values.
If you checked this option by mistake, go back to the Diagnostics tab and
check the alternative choice ("Fit the curve.").
2.6.4.8
"Perfect fit"
Prism reports "perfect fit' when the curve goes through every point. The sumof-squares is 0.0, and R2 is 1.00.
If you are testing nonlinear regression with made up values, add some random
scatter to make a better example.
If you are fitting actual data, and the fit is perfect, you either got very lucky or
have very few data points. It is not possible to compute the standard errors
and confidence intervals of the parameters when the fit is perfect, nor is it
possible to compare models.
2.6.4.9
"Impossible weights"
"Can't calculate"
If you ask Prism to use either the F test or the AICc method to compare two
models, it will report "Can't calculate" if the two models have the same number
of parameters. This means it can't calculate the comparison between fits, not
that it can't fit the two models to the data.
Both these methods to compare models only make sense when the models
being compared have different numbers of parameters, and so have different
numbers of degrees of freedom. These methods are alternative methods of
dealing with the tradeoff between how well the model fits and how complicated
it is (assessed by the number of parameters).
If you want to compare two models with the same number of parameters,
there is no need to use either the F test or AIC. Simply choose the model that
fits the data the best with the smallest sum-of-squares.
2.7
Built-in equation families include dose-response models, lines, exponentials, polynomials, Gaussian curves, and sine waves.
2.7.1.1
The EC50
(Graph: a sigmoidal dose-response curve, Percent of Control vs. [Agonist, M] on a log scale from 10-9 to 10-3 M, with the EC50 marked at the response halfway between baseline and maximum.)
Don't over interpret the EC50. It is simply the concentration of agonist required
to provoke a response halfway between the baseline and maximum responses.
Because the EC50 defines the location of the dose-response curve for a
particular drug, it is the most commonly used measure of an agonist's potency.
However, the EC50 is usually not the same as the Kd for the binding of agonist
to its receptor -- it is not a direct measure of drug affinity.
The pEC50
The pEC50 is defined as the negative logarithm of the EC50. If the EC50 equals
1 micromolar (10-6 molar), the log EC50 is -6 and the pEC50 is 6. There is no
particular advantage to expressing potency this way, but it is customary in
some fields.
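In symbols:

pEC50 = -log10(EC50);    EC50 = 1 micromolar = 10^-6 M, so pEC50 = 6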
The IC50
In many experiments, you vary the concentration of an inhibitor. With more
inhibitor, the response decreases, so the dose-response curve goes downhill.
With such experiments, the midpoint is often called the IC50 ("I" for inhibition)
rather than the EC50 ("E" for effective). This is purely a difference in which
abbreviation is used, with no fundamental difference.
ECanything
A simple rearrangement of the equation lets you fit the EC80 (or EC90, or ECanything) directly.
2.7.1.3
(Graph: sample data and the best-fit sigmoidal curve, Response vs. log[Dose]. The nonlinear regression results were:)
Best-fit values: BOTTOM = 351.9, TOP = 1481, LOGEC50 = -6.059, EC50 = 8.7250e-007
Std. Error: BOTTOM = 101.4, TOP = 99.29, LOGEC50 = 0.2717
95% Confidence Intervals: BOTTOM = 128.7 to 575.1, TOP = 1263 to 1700, LOGEC50 = -6.657 to -5.461, EC50 = 2.2020e-007 to 3.4570e-006
The sample data above were fit to a dose-response curve with a Hill slope of 1.
The best-fit value for logEC50 is -6.059. Converting to the EC50 is no problem: simply take the antilog, which is 0.87 µM.
The standard error of the logEC50 is 0.2717. It is used to calculate a 95%
confidence interval, which ranges from -6.657 to -5.461. Take the antilog of both of those limits to express the 95% confidence interval of the EC50 in concentration units: about 2.2e-007 to 3.5e-006 M.
2.7.1.4
Hill slope
(Graph: a dose-response curve with a standard Hill slope of 1.0, Percent Response vs. Log[Agonist]; an 81-fold increase in agonist concentration is needed to go from 10% to 90% of the maximal response.)
Since the linkage between agonist binding and response can be very complex,
any shape is possible. It seems surprising, therefore, that so many dose-response curves have shapes almost identical to receptor binding curves, even
when we know there are multiple steps between binding and measured
response. It turns out that no matter how many steps intervene between
agonist binding and response, the dose-response curve will have the usual
steepness so long as each messenger binds to a single binding site according to
the law of mass action.
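This is where the 81-fold figure comes from. Under the Hill equation with slope H, the dose that gives a fractional response f is:

[A]_f = EC50 * ( f / (1 - f) )^(1/H)

so [A]_90 / [A]_10 = ( (0.9/0.1) / (0.1/0.9) )^(1/H) = 81^(1/H), which equals 81 when H = 1.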
(Graph: dose-response curves, Percent Response vs. Log[Agonist], drawn with Hill slopes of 0.5, 1.0, and 1.5.)
2.7.1.5
Stimulation or inhibition?
Prism offers one set of dose-response equations for stimulation and another
set for inhibition. The inhibitory equations are set up to run downhill. The only
difference is that the inhibitory equations fit the IC50 ("I" for inhibition) while the stimulatory equations fit the EC50 ("E" for effective).
Normalized or not?
If your data have been normalized so the curve runs from Y= 0 to Y=100, you
may wish to choose a normalized model. These models don't fit the bottom
and top plateaus, but rather force the bottom plateau to equal 0 and the top
plateau to equal 100. Only choose a 'normalized response' equation when you
have determined the values that define 0 and 100 very precisely. Just because
the data have been normalized doesn't mean you have to constrain the curve in that way.
The dose-response model has four parameters: the bottom plateau, the top
plateau, the EC50, and the slope factor (which is often constrained to a
standard value).
The main goal of fitting the dose-response curve in many situations is to
determine the best-fit value of the EC50, which is the concentration that
provokes a response halfway between the top and bottom plateaus. If those
plateaus are not well defined, the EC50 will be very uncertain. Think of it this
way: If you have not defined "100" and "0" very precisely, you also have not
defined "50" precisely, and therefore cannot determine the EC50 precisely.
One way to solve the problem is to constrain the Top or Bottom, or both, to
control values. Another alternative is to normalize your data so responses run
from 0 to 100, and then choose a "normalized response" model. These models
don't fit the bottom and top plateaus, but rather force the bottom plateau to
equal 0 and the top plateau to equal 100. Only choose a 'normalized response'
equation when you have determined the values that define 0 and 100 very
precisely.
Prism makes it easy to normalize the data so the values run from 0% to 100%. Simply click Analyze, choose the Normalize analysis, and choose how 0% and 100% are defined. When fitting a dose-response curve, you can fit either the
raw data or normalized data.
Notes:
It is not necessary to normalize before fitting dose-response data. In many
cases, it is better to show the actual data.
You can only plot several different dose-response curves on one graph using
one axis when they are comparable. If the different experiments measured
different variables, normalizing puts them into comparable units. This can be
useful.
Whether or not you choose to normalize your data, you still need to choose
how to fit the data. Do you want Prism to find best-fit values for the Top and
Bottom plateaus? Or do you want those plateaus to be determined by control
data?
If you normalize your data, you can choose one of the normalized dose-response equations. These constrain the curve to run from 0% to 100%.
This kind of constraint only makes sense when 0% and 100% are defined by
good control data. If the definitions of 0% and 100% are ambiguous, then so
is the definition of "50%", and thus the EC50 is also ambiguous.
Just because you chose to normalize your data doesn't mean you must
constrain the curve to run from 0 to 100%. You may prefer to have Prism fit
those two plateaus.
If you don't normalize your data, you can use the Constrain tab to fix Top
and Bottom to values determined from control experiments. So the decision
to constrain Top and Bottom is quite distinct from the decision to normalize
your data before fitting.
It is possible, and can be reasonable, to fix one of those parameters (Top or
Bottom) to a constant value but not the other.
If you normalize, don't also choose to differentially weight the data 35 .
2.7.1.7
2.7.1.8
The term logistic has three meanings which have little relationship to each other (1).
In this model, the rate of population growth at any time is proportional to the population at that time (Nt). But population growth slows down as it reaches the maximum, so is
also proportional to (Nmax - Nt). So the rate of change of population is
proportional to Nt(Nmax - Nt).
Integrate that differential equation, and the result is called a logistic equation. It
defines a sigmoidal shaped curve that defines the population at any time. The
model has three parameters: the starting population, the maximum population,
and the time it takes to reach half-maximal. Sometimes it is modified to add a
fourth parameter to define the steepness of the curve.
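One common way to write this model (using k for the proportionality constant and N0 for the starting population):

dN/dt = k * Nt * (Nmax - Nt)

Nt = Nmax / ( 1 + ((Nmax - N0)/N0) * e^(-k*Nmax*t) )

The time to reach the half-maximal population follows from these parameters.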
This curve was used by demographers in the past, but actually doesn't do a
very good job of describing the growth of human populations. It is still used to
model the growth of tumors, and to model the fraction of a population that
uses a new product (like a mobile phone).
This model is also used for autocatalytic reactions, where the product of the
reaction is also a catalyst for that reaction. With this kind of reaction, the rate
of product accumulation is proportional to the concentration of product already
produced times the concentration of remaining substrate. This has the same
mathematical form as the population growth model. The graph is identical to
the one above, except the Y-axis would be the concentration of the product
produced by the enzyme reaction (instead of population).
Because the predicted outcome is a probability, which is between 0 and 1, the logistic regression model actually predicts the natural
logarithm of the odds. The function that computes the natural logarithm of the
odds from a fraction is called the logit function (pronounced with a long O and a
soft G), so regression used to predict the logit of a probability from multiple
independent variables is called logistic regression.
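In symbols, with p the predicted probability and X1, X2, ... the independent variables:

logit(p) = ln( p / (1 - p) ) = b0 + b1*X1 + b2*X2 + ...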
References
1. J.S. Cramer. The origins and development of the logit model. Chapter 9 of
Logit models from economics and other fields, Cambridge University Press,
2003
2. N. Bindslev, Hill in Hell (pdf), Chapter 10 of Drug-Acceptor Interactions,
ISBN: 978-91-977071-0-7
2.7.1.9
The green symbols show measurements made with controls. The ones on the
left (Blank) have no inhibitor, so define "100%". The ones on the right are in
the presence of a maximal concentration of a standard inhibitor, so define
"0%". The data of the experimental dose-response curve (red dots) extend all
the way between the two control values.
When fitting this curve, you need to decide how to fit the top plateau of the
curve. You have three choices:
Fit the data only, ignoring the Blank control values.
Average the Blank control values, and set the parameter Top to be a constant
value equal to the mean of the blanks.
Enter the blank values as if they were part of the dose-response curve.
Simply enter a low dose, perhaps 10-10 or 10-11. You can't enter zero,
because zero is not defined on a log scale.
The results will be very similar with any of these methods, because the data
form a complete dose-response curve with a clear top plateau that is
indistinguishable from the blank. I prefer the third method, as it analyzes all the
data, but that is not a strong preference.
Similarly, there are three ways to deal with the bottom plateau: Fit the data
only, set Bottom to be a constant equal to the average of the NS controls,
and put the NS controls into the fit as if they were a very high concentration of
inhibitor.
That is the ideal situation. There is no ambiguity about what IC50 means.
Clearly, a single value cannot summarize such a curve. You'd need at least two
values, one to quantify the middle of the curve (the drug's potency) and one to
quantify how low it gets (the drug's maximum effect).
The graph above shows two definitions of the IC50.
The relative IC50 is by far the most common definition, and the adjective
relative is usually omitted. It is the concentration required to bring the curve
down to point half way between the top and bottom plateaus of the curve.
The NS values are totally ignored with this definition of IC50. This definition is
the one upon which classical pharmacological analysis of agonist and antagonist
interactions is based. With appropriate consideration of the biological system
and concentrations of interacting ligands, estimated Kd values can often be
derived from the IC50 value defined this way (not so for the "so-called absolute
IC50" mentioned below).
The concentration that provokes a response halfway between the Blank and
the NS value is sometimes called the absolute IC50. The horizontal dotted
lines show how 100% and 0% are defined, which then defines 50%. This term
is not entirely standard. Since this value does not quantify the potency of a
drug, the authors of the International Union of Pharmacology Committee on
Receptor Nomenclature (1) think that the concept of absolute IC50 (and that
term) is not useful (R. Neubig, personal communication). I agree.
The concept (but not the term "absolute IC50") is used to quantify drugs that
slow cell growth. The abbreviation GI50 is used for what we call here the
absolute IC50. It is also used by the Environmental Protection Agency (EPA) in
evaluating endocrine disrupters (Appendix A). That document uses the term
IC50 to refer to the absolute IC50, and the term EC50 to refer to the relative
IC50. It doesn't use the terms relative and absolute.
If you really want to use the absolute IC50, the next page has instructions for
fitting a curve to find it 251 .
Reference
1. R. R. Neubig et al. International Union of Pharmacology Committee on
Receptor Nomenclature and Drug Classification. XXXVIII. Update on terms and
symbols in quantitative pharmacology. Pharmacol Rev (2003) vol. 55 (4) pp.
597-606
Download the Prism file used to create all the graphs in this article.
Fifty=(Top+Baseline)/2
Y= Bottom + (Top-Bottom)/(1+10^((LogIC50-X)*HillSlope + log((Top-Bottom)/(Fifty-Bottom)-1)))
Note the distinction between the parameter Bottom and Baseline. Bottom is the
Y value of the bottom plateau of the curve itself. Baseline is the Y value that
defines 0% -- maximal inhibition by a standard drug. You'll definitely want to
constrain Baseline to be a constant value based on controls. You may also
want to constrain Top.
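As a quick numerical check of that equation outside Prism (Prism's log() is base 10, so np.log10 is used; the parameter values below are made up for illustration):

import numpy as np

def absolute_ic50_model(X, Bottom, Top, Baseline, LogIC50, HillSlope):
    # Same equation as above; Fifty is halfway between Top and Baseline (the controls).
    Fifty = (Top + Baseline) / 2.0
    return Bottom + (Top - Bottom) / (
        1 + 10 ** ((LogIC50 - X) * HillSlope
                   + np.log10((Top - Bottom) / (Fifty - Bottom) - 1)))

# With Top=100, Baseline=0, and Bottom=20, the 'absolute' 50% level is 50, and the
# model returns that response (up to rounding) at X = LogIC50.
print(absolute_ic50_model(X=-6.0, Bottom=20.0, Top=100.0, Baseline=0.0,
                          LogIC50=-6.0, HillSlope=-1.0))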
Download the Prism file that fits that equation to make the graph shown above.
When fitting data to that equation, don't forget to constrain Baseline and Top to
appropriate values determined by controls. Additionally, this file contains
another graph where the data are already normalized to run from 0 to 100%.
These data are fit to a simpler equation where Baseline is set to equal zero, and
Top is set to equal 100. These are hard wired into the equation, so you don't
have to remember to constrain those two parameters to constant values.
Any attempt to determine an IC50 by fitting a curve to the data in the graph
above will be useless. Prism might, or might not, be able to fit a dose-response
curve to the data. But if the curve fits, the value of the IC50 is likely to be
meaningless and have a very wide confidence interval. The data simply don't
really define a top plateau (which would define 100) and certainly don't
provide even a hint of a bottom plateau (which would define 0). If data haven't
defined 100 or 0, then 50 is undefined too, as is the IC50.
If you also have control values that define 100 and 0, then the curve can be
easily fit. The curve below was created by fitting a dose response curve, but
constraining the Top plateau to be a constant value equal to the mean of the
Blanks values, and the Bottom plateau equal to the mean of the NS values.
Note that the blank and NS values are shown in green.
The value of the IC50 fit this way only makes sense if you assume that higher
concentrations of the inhibitor would eventually inhibit down to the NS values.
That is an assumption that can't be tested with the data at hand.
The distinction between relative and absolute IC50 248 doesn't really apply to
these data. Because the data don't define a bottom plateau, the IC50 can only
be defined relative to the NS control values.
2.7.1.12 Troubleshooting fits of dose-response curves
Is the bottom plateau defined by the data? If not, can it be constrained? If the bottom
plateau is not defined either by the data or by a constraint, then any fit of a dose-response
curve is unlikely to be useful.
Were the data normalized? To what values? If the data were normalized, consider
constraining the top and bottom plateaus to be 0 and 1 (or 100) in the Constraint tab.
Hill slope fixed? A Hill slope of 1.0 or -1.0 is commonly seen in many systems, but not all.
The four-parameter model assumes symmetry. The usual equations to fit dose-response curves have
four parameters (top, bottom, EC50, Hill Slope), and define symmetrical curves. You can
choose an equation that adds a fifth parameter to fit asymmetrical curves 267 .
2.7.2
Dose-response - Stimulation
2.7.2.1
Introduction
Many log(dose) vs. response curves follow the familiar symmetrical sigmoidal
shape. The goal is to determine the EC50 of the agonist - the concentration
that provokes a response half way between the basal (Bottom) response and
the maximal (Top) response.
This model assumes that the dose response curve has a standard slope, equal
to a Hill slope (or slope factor) of 1.0. This is the slope expected when a ligand
binds to a receptor following the law of mass action, and is the slope expected
of a dose-response curve when the second messenger created by receptor
stimulation binds to its receptor by the law of mass action. If you don't have
many data points, consider using the standard slope model. If you have lots of
data points, pick the variable slope model to determine the Hill slope from the
data.
This equation is sometimes called a three parameter dose-response curve. If
you also fit the Hill slope, then it is a four parameter equation.
Step by step
Create an XY data table. Enter the logarithm of the concentration of the agonist
into X. Enter response into Y in any convenient units. Enter one data set into
column A, and use columns B, C... for different treatments, if needed.
If you prefer to enter concentrations, rather than the logarithm of concentrations, use Prism to transform the X values to logs 244 .
From the data table, click Analyze, choose nonlinear regression, choose the
panel of equations "Dose-response curves - Stimulation" and then choose the
equation log(Agonist) vs. response.
If you have subtracted off any basal response, consider constraining Bottom to
a constant value of 0.
Model
Y=Bottom + (Top-Bottom)/(1+10^((LogEC50-X)))
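If you want to reproduce this kind of fit outside Prism, a minimal SciPy sketch of the same model looks like this (the data values are invented for illustration):

import numpy as np
from scipy.optimize import curve_fit

def agonist_response(X, Bottom, Top, LogEC50):
    # log(Agonist) vs. response with a standard Hill slope of 1.0
    return Bottom + (Top - Bottom) / (1 + 10 ** (LogEC50 - X))

log_dose = np.array([-9.0, -8.0, -7.0, -6.0, -5.0, -4.0, -3.0])
response = np.array([12.0, 15.0, 32.0, 78.0, 125.0, 141.0, 144.0])   # made-up data

params, _ = curve_fit(agonist_response, log_dose, response, p0=[10.0, 150.0, -6.0])
Bottom, Top, LogEC50 = params
print(Bottom, Top, LogEC50, 10 ** LogEC50)   # EC50 in the same units as the doses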
2.7.2.2
Introduction
Many log(dose) response curves follow the familiar symmetrical sigmoidal
shape. The goal is to determine the EC50 of the agonist - the concentration
that provokes a response half way between the basal (Bottom) response and
the maximal (Top) response.
Many dose-response curves have a standard slope of 1.0. This model does not
assume a standard slope but rather fits the Hill Slope from the data, and so is
called a Variable slope model. This is preferable when you have plenty of data
points. It is also called a four-parameter dose-response curve, or four-parameter
Step by step
Create an XY data table. Enter the logarithm of the concentration of the agonist
into X. Enter response into Y in any convenient units. Enter one data set into
column A, and use columns B, C... for different treatments, if needed.
If you prefer to enter concentrations, rather than the logarithm of
concentrations, use Prism to transform the X values to logs 244 .
From the data table, click Analyze, choose nonlinear regression, choose the
panel of equations "Dose-response curves - Stimulation" and then choose the
equation "log(Agonist) vs. response -- Variable slope".
Consider constraining the parameter HillSlope to its standard values of 1.0. This
is especially useful if you don't have many data points, and therefore cannot fit
the slope very well.
If you have subtracted off any basal response, consider constraining Bottom to
a constant value of 0.
Model
Y=Bottom + (Top-Bottom)/(1+10^((LogEC50-X)*HillSlope))
2.7.2.3
Introduction
Many log(dose) vs. response curves follow the familiar symmetrical sigmoidal
shape.
If you have good control data, it can make sense to normalize the response
to run between 0% and 100%. This model assumes that the data have been
normalized, so forces the curve to run from 0% to 100%. The goal is to
determine the EC50 of the agonist - the concentration that provokes a
response equal to 50%.
It only makes sense to fit a normalized model when you are sure you have
defined 0% and 100% quite accurately. If your data define a complete
sigmoidal curve, it is best to fit the entire curve and let Prism fit the Top and
Bottom plateaus 256 . If your data don't form a full sigmoidal curve, but you can
define the bottom and top by solid control data, then fitting to a normalized
model is preferable.
This model assumes that the dose response curve has a standard slope, equal
to a Hill slope (or slope factor) of 1.0. This is the slope expected when a ligand
binds to a receptor following the law of mass action, and is the slope expected
of a dose-response curve when the second (and third...) messengers created
by receptor stimulation binds to its receptor by the law of mass action. If you
don't have many data points, consider using the standard slope model. If you
have lots of data points, pick the variable slope model to determine the Hill
slope from the data.
Step by step
Create an XY data table. Enter the logarithm of the concentration of the agonist
into X. Enter response into Y in any convenient units. Enter one data set into
column A, and use columns B, C... for different treatments, if needed.
If you prefer to enter concentrations, rather than the logarithm of concentrations, use Prism to transform the X values to logs 244 .
From the data table, click Analyze, choose nonlinear regression, choose the
panel of equations "Dose-response curves - Stimulation" and then choose the
equation "log(Agonist) vs. normalized response".
Model
Y=100/(1+10^((LogEC50-X)))
2.7.2.4
Introduction
Many log(dose) response curves follow the familiar symmetrical sigmoidal
shape.
If you have good control data, it can make sense to normalize the response to
run between 0% and 100%. This model assumes that the data have been
normalized, so forces the curve to run from 0% to 100%. The goal is to
determine the EC50 of the agonist - the concentration that provokes a
response equal to 50%.
It only makes sense to fit a normalized model when you are sure you have
defined 0% and 100% quite accurately. If your data define a complete
sigmoidal curve, it is best to fit the entire curve and let Prism fit the Top and
Bottom plateaus 256 . If your data don't form a full sigmoidal curve, but you can
define the bottom and top by solid control data, then fitting to a normalized
model is preferable.
Many dose-response curves have a standard slope of 1.0. This model does not
assume a standard slope but rather fits the Hill Slope from the data, and so is
called a Variable slope model. This is preferable when you have plenty of data
points.
Step by step
Create an XY data table. Enter the logarithm of the concentration of the agonist
into X. Enter response into Y in any convenient units. Enter one data set into
column A, and use columns B, C... for different treatments, if needed.
If you prefer to enter concentrations, rather than the logarithm of
concentrations, use Prism to transform the X values to logs 244 .
From the data table, click Analyze, choose nonlinear regression, choose the
panel of equations "Dose-response curves - Stimulation" and then choose the
equation "log(Agonist) vs. normalized response -- Variable slope".
Model
Y=100/(1+10^((LogEC50-X)*HillSlope))
2.7.3
Dose-response - Inhibition
2.7.3.1
Introduction
Many log(inhibitor) vs. response curves follow the familiar symmetrical
sigmoidal shape. The goal is to determine the IC50 of the inhibitor - the
concentration that provokes a response half way between the maximal (Top)
response and the maximally inhibited (Bottom) response.
This model assumes that the dose-response curve has a standard slope,
equal to a Hill slope (or slope factor) of -1.0. This is the slope expected when a
ligand binds to a receptor following the law of mass action, and is the slope
expected of a dose-response curve when the second messenger created by
receptor stimulation binds to its receptor by the law of mass action. If you
don't have many data points, consider using the standard slope model. If you
have lots of data points, pick the variable slope model to determine the Hill
slope from the data.
Step by step
Create an XY data table. Enter the logarithm of the concentration of the
inhibitor into X. Enter response into Y in any convenient units. Enter one data
set into column A, and use columns B, C... for different treatments, if needed.
If you prefer to enter concentrations, rather than the logarithm of
concentrations, use Prism to transform the X values to logs 244 .
From the data table, click Analyze, choose nonlinear regression, choose the
panel of equations "Dose-response curves - Inhibition" and then choose the
equation "log(inhibitor) vs. response".
If you have subtracted off any basal response, consider constraining Bottom to
a constant value of 0.
Model
Y=Bottom + (Top-Bottom)/(1+10^((X-LogIC50)))
2.7.3.2
Introduction
Many log(inhibitor) vs. response curves follow the familiar symmetrical
sigmoidal shape. The goal is to determine the IC50 of the inhibitor - the
concentration that provokes a response half way between the maximal (Top)
response and the maximally inhibited (Bottom) response.
Step by step
Create an XY data table. Enter the logarithm of the concentration of the
inhibitor into X. Enter response into Y in any convenient units. Enter one data
set into column A, and use columns B, C... for different treatments, if needed.
If you prefer to enter concentrations, rather than the logarithm of
concentrations, use Prism to transform the X values to logs 244 .
From the data table, click Analyze, choose nonlinear regression, choose the
panel of equations "Dose-response curves - Inhibition" and then choose the
equation "log(inhibitor) vs. response -- Variable slope".
If you have subtracted off any basal response, consider constraining Bottom to
a constant value of 0.
Model
Y=Bottom + (Top-Bottom)/(1+10^((LogIC50-X)*HillSlope))
2.7.3.3
Introduction
Many log(inhibitor) vs. response curves follow the familiar symmetrical
sigmoidal shape.
If you have good control data, it can make sense to normalize the response
to run between 0% and 100%. This model assumes that the data have been
normalized, so forces the curve to run from 100% down to 0%. The goal is to
determine the IC50 of the inhibitor - the concentration that provokes a
response equal to 50%.
It only makes sense to fit a normalized model when you are sure you have
defined 0% and 100% quite accurately. If your data define a complete
sigmoidal curve, it is best to fit the entire curve and let Prism fit the Top and
Bottom plateaus 261 . If your data don't form a full sigmoidal curve, but you can
define the bottom and top by solid control data, then fitting to a normalized
model is preferable.
This model assumes that the dose response curve has a standard slope, equal
to a Hill slope (or slope factor) of -1.0. This is the slope expected when a ligand
binds to a receptor following the law of mass action, and is the slope expected
of a dose-response curve when the second messenger created by receptor
stimulation binds to its receptor by the law of mass action. If you don't have
many data points, consider using the standard slope model. If you have lots of
data points, pick the variable slope model to determine the Hill slope from the
data.
Step by step
Create an XY data table. Enter the logarithm of the concentration of the
inhibitor into X. Enter response into Y in any convenient units. Enter one data
set into column A, and use columns B, C... for different treatments, if needed.
If you prefer to enter concentrations, rather than the logarithm of
concentrations, use Prism to transform the X values to logs 244 .
From the data table, click Analyze, choose nonlinear regression, choose the
panel of equations "Dose-response curves - Inhibition" and then choose the
equation "log(inhibitor) vs. normalized response".
Model
Y=100/(1+10^((X-LogIC50)))
2.7.3.4
Introduction
Many log(inhibitor) vs. response curves follow the familiar symmetrical
sigmoidal shape.
If you have good control data, it can make sense to normalize the response
to run between 0% and 100%. This model assumes that the data have been
normalized, so forces the curve to run from 100% down to 0%. The goal is to
determine the IC50 of the inhibitor - the concentration that provokes a
response equal to 50%.
It only makes sense to fit a normalized model when you are sure you have
defined 0% and 100% quite accurately. If your data define a complete
sigmoidal curve, it is best to fit the entire curve and let Prism fit the Top and
Bottom plateaus 261 . If your data don't form a full sigmoidal curve, but you can
define the bottom and top by solid control data, then fitting to a normalized
model is preferable.
Many inhibitory dose-response curves have a standard slope of -1.0. This
model does not assume a standard slope but rather fits the Hill Slope from the
data, and so is called a Variable slope model. This is preferable when you have
plenty of data points.
Step by step
Create an XY data table. Enter the logarithm of the concentration of the
inhibitor into X. Enter response into Y in any convenient units. Enter one data
set into column A, and use columns B, C... for different treatments, if needed.
If you prefer to enter concentrations, rather than the logarithm of
concentrations, use Prism to transform the X values to logs 244 .
From the data table, click Analyze, choose nonlinear regression, choose the
panel of equations "Dose-response curves - Inhibition" and then choose the
equation "log(inhibitor) vs. normalized response -- variable slope".
Model
Y=100/(1+10^((LogIC50-X)*HillSlope))
give a response nowhere near "50". Prism reports both the IC50 and its log.
HillSlope describes the steepness of the family of curves. A HillSlope of -1.0 is
standard, and you should consider constraining the Hill Slope to a constant
value of -1.0. A Hill slope more negative than -1 (say -2) is steeper.
2.7.4
Dose-response -- Special
2.7.4.1
Introduction
The standard dose-response curve is sometimes called the four-parameter
logistic equation. It fits the bottom and top plateaus of the curve, the EC50,
and the slope factor (Hill slope). This curve is symmetrical around its midpoint.
To extend the model to handle curves that are not symmetrical, the Richards
equation adds an additional parameter, S, which quantifies the asymmetry. This
equation is sometimes referred to as a five-parameter logistic equation,
abbreviated 5PL.
Step by step
Create an XY data table. Enter the logarithm of the concentration of the agonist
into X. Enter response into Y in any convenient units.
From the data table, click Analyze, choose nonlinear regression, and choose the
panel of equations: Dose-Response -- Special. Then choose Asymmetrical
(five parameter).
Consider constraining the Hill Slope to a constant value of 1.0 (stimulation) or -1.0 (inhibition).
Model
LogXb = LogEC50 + (1/HillSlope)*Log((2^(1/S))-1)
Numerator = Top - Bottom
Denominator = (1+10^((LogXb-X)*HillSlope))^S
Y = Bottom + (Numerator/Denominator)
[Figure: asymmetrical (five-parameter) dose-response curves, Response (0 to 100) plotted against Log [X] from -10 to -3, for asymmetry parameter S = 0.5, 1 and 2.]
Notes
The inflection point is called LogXb. It is not the same as the logEC50. Using the
built-in equation, Prism does not fit logXb, but you can do so using this
equation:
LogXb = LogEC50 + (1/HillSlope)*Log(2^(1/S) - 1)
You can rewrite the equation so Prism fits the logXb rather than the logEC50.
Clone 409 the built-in equation, and edit the copy. Remove the first line [LogXb =
LogEC50 + (1/HillSlope)*Log((2^(1/S))-1)] from the equation. Then
establish a rule for the initial value 433 of the logXb to be 1 * (Value of X at
Ymid).
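As a quick numeric illustration of the relationship above (with arbitrary example values for LogEC50, HillSlope and S):

import math

log_ec50, hill_slope, s = -6.0, 1.0, 0.5            # example values only
log_xb = log_ec50 + (1.0 / hill_slope) * math.log10(2.0 ** (1.0 / s) - 1.0)
print(log_xb)                                        # inflection point, LogXb
# When S = 1 the curve is symmetrical and LogXb equals LogEC50,
# because log10(2^(1/1) - 1) = log10(1) = 0.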
Reference
Introduction
A common deviation from the standard monotonic sigmoid shape is the
biphasic sigmoid shape.
Step by step
Create an XY data table. Enter the logarithm of the concentration of the agonist
into X. Enter response into Y in any convenient units.
From the data table, click Analyze, choose nonlinear regression, and choose the
panel of equations: Dose-Response -- Special. Then choose Biphasic dose-response.
Model
Span=Top-Bottom
Section1=Span*Frac/(1+10^((LogEC50_1-X)*nH1))
Section2=Span* (1-Frac)/(1+10^((LogEC50_2-X)*nH2))
Y=Bottom + Section1 +Section2
2.7.4.3
Introduction
Some drugs may cause an inhibitory response at low concentrations, and a
stimulatory response at high concentrations, or vice-versa. The net result is a
bell-shaped dose-response curve.
The model explained here is the sum of two dose-response curves, one that
stimulates and one that inhibits. You will need lots of data to determine all the
parameters without ambiguity, so this model will rarely be useful for data
analysis. But it might be useful as a way to draw a smooth curve through the
data.
Step by step
Create an XY data table. Enter the logarithm of the concentration of the agonist
into X. Enter response into Y in any convenient units.
From the data table, click Analyze, choose nonlinear regression, and choose the
panel of equations: Dose-Response -- Special. Then choose Bell-shaped dose-response.
Consider constraining nH1 and nH2 to constant values of 1.0 (stimulation) and
-1 (inhibition).
Model
Span1=Plateau1-Dip
Span2=Plateau2-Dip
Section1=Span1/(1+10^((LogEC50_1-X)*nH1))
Section2=Span2/(1+10^((X-LogEC50_2)*nH2))
Y=Dip+Section1+Section2
2.7.4.4
Step by step
Create an XY data table. Enter the logarithm of the concentration of the agonist
ligand into X. Enter response into Y in any convenient units. Enter data with a
full agonist and no receptor depletion into column A. Enter data collected after
receptor depletion into column B. Repeat, if you have data with different levels
of receptor depletion for column C, D, E, ... You don't have to know the degree
to which the receptors are depleted, and don't have to enter any values in the
column titles (although they are useful as labels).
From the data table, click Analyze, choose nonlinear regression, and choose the
panel of equations: Dose-Response -- Special. Then choose Operational Model
- Depletion.
If you have subtracted off any basal response, consider constraining the
parameter Basal to a constant value of zero.
Also consider constraining the transducer slope n to a constant value of 1.0.
When set to 1.0, all dose-response curves are constrained to have Hill slopes of
1.0, which is observed commonly.
Model
operate= (((10^logKA)+(10^X))/(10^(logtau+X)))^n
Y=Basal + (Effectmax-Basal)/(1+operate)
When the transducer slope n equals 1.0, all the dose-response curves will have
Hill slopes of 1.0. If n does not equal 1.0, the Hill slope does not equal either 1.0 or n.
Notes
Since Tau measures efficacy, Prism fits a different value of tau for each data
set. Receptor depletion reduces the value of tau. The other parameters are fit
globally, to find one best-fit value for all the data sets.
Reference
Step by step
Create an XY data table. Enter the logarithm of the concentration of the agonist
ligand into X. Enter response into Y in any convenient units. Enter data with the
full agonist into column A. Enter data collected with a partial agonist into column
B. Repeat, if you have data with different partial agonists, for column C, D,
E, ..., each with a different partial agonist.
From the data table, click Analyze, choose nonlinear regression, and choose the
panel of equations: Dose-Response -- Special. Then choose Operational Model
- Partial agonists.
If you have subtracted off any basal response, consider constraining the
parameter Basal to a constant value of zero.
Also consider constraining the transducer slope n to a constant value of 1.0.
When set to 1.0, all dose-response curves are constrained to have Hill slopes of
1.0, which is observed commonly. If n is not 1.0, the Hill slopes will not be 1.0,
but the Hill slopes will not equal exactly n.
Model
operate= (((10^logKA)+(10^X))/(10^(logtau+X)))^n
<A> Y = Basal + (Effectmax-Basal)/(1+10^((LogEC50-X)*n))
<~A> Y = Basal + (Effectmax-Basal)/(1+operate)
The second line is preceded with <A> which means it only applies to the first
data set. It fits a variable slope dose-response curve. The third line is preceded
with <~A> which means it applies to all data set except the first. It fits the
operational model to determine the affinity (KA) of the partial agonist.
Introduction
A competitive inhibitor competes for agonist binding to a receptor, and shifts
the dose-response curve to the right without changing the maximum response.
By fitting all the curves globally, you can determine the affinity of the
competitive inhibitor.
Step by step
Create an XY data table. Enter the logarithm of the concentration of the agonist
ligand into X. Enter response into Y in any convenient units. Enter data with no
inhibitor into column A. Enter data collected with a constant concentration of
inhibitor into column B. Repeat, if you have data, for column C, D, E, ..., each
with a different concentration of inhibitor. Enter the inhibitor concentration (in
molar so 1nM is entered as '1e-9') into the column titles. Don't forget to enter
'0' as the column title for data set A.
From the data table, click Analyze, choose nonlinear regression, and choose the
panel of equations: Dose-Response -- Special. Then choose Gaddum/Schild
EC50 shift.
Model
EC50=10^LogEC50
Antag=1+(B/(10^(-1*pA2)))^SchildSlope
LogEC=Log(EC50*Antag)
Y=Bottom + (Top-Bottom)/(1+10^((LogEC-X)*HillSlope))
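To see how the model converts the antagonist concentration into a shift of the agonist curve, here is a small sketch of the Gaddum/Schild arithmetic with made-up values for LogEC50, pA2 and SchildSlope:

import math

log_ec50 = -7.0       # control LogEC50 of the agonist (hypothetical)
pA2 = 8.0             # hypothetical
schild_slope = 1.0    # 1.0 for simple competitive antagonism
B = 1e-7              # antagonist concentration, molar (hypothetical)

antag = 1.0 + (B / (10.0 ** (-pA2))) ** schild_slope     # fold shift of the EC50
log_ec = math.log10((10.0 ** log_ec50) * antag)          # LogEC50 in the presence of B
print(antag, log_ec)                                     # 11-fold shift, about -5.96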
1. Colquhoun, D. Why the Schild method is better than Schild realised. Trends
Pharmacol Sci (2007) vol. 28 (12) pp. 608-14
2.7.4.7
Introduction
A competitive inhibitor competes for agonist binding to a receptor, and shifts
the dose-response curve to the right without changing the maximum response.
This model fits the two dose response curves and determines the fold shift.
Step by step
Create an XY data table. Enter the logarithm of the concentration of the agonist
ligand into X. Enter response into Y in any convenient units. Enter data with no
inhibitor into column A. Enter data collected with a constant concentration of
inhibitor into column B.
From the data table, click Analyze, choose nonlinear regression, and choose the
panel of equations: Dose-Response -- Special. Then choose Dose shift.
If you have subtracted off any basal signal, constrain the parameter Bottom to
a constant value of zero.
Model
<A>LogEC=LogEC50Control
<~A>LogEC=LogEC50Control + log(EC50Ratio)
Y=Bottom + (Top-Bottom)/(1+10^((LogEC-X)*HillSlope))
Notes
If you have several concentrations of antagonist, use a different model that
will directly fit the Schild model 277 and determine the pA2.
2.7.4.8
Introduction
An allosteric modulator can reduce or enhance agonist binding. This model fits
entire dose-response curves determined in the absence and presence of a
modulator. The goal is to learn the affinity of the modulator for binding to its
site, and also determine the value of alpha, the ternary complex constant that
quantifies the degree to which binding of the modulator alters the affinity of the
radioligand for the receptor site.
Step by step
Create an XY data table. Enter the logarithm of the concentration of the agonist
ligand into X. Enter response into Y in any convenient units. Enter data with no
modulator into column A. Enter data collected with a constant concentration of
modulator into column B. Repeat, if you have data, for column C, D, E, ..., each
with a different concentration of modulator. Enter the modulator concentration
(in molar so 1nM is entered as '1e-9') into the column titles. Don't forget to
enter '0' as the column title for data set A.
From the data table, click Analyze, choose nonlinear regression, and choose the
panel of equations: Dose-Response -- Special. Then choose Allosteric EC50
shift.
You do not need to constrain any parameters to constant values.
Model
EC50=10^LogEC50
KB=10^LogKB
alpha=10^Logalpha
Antag=(1+B/KB)/(1+alpha*B/KB)
LogEC=Log(EC50*Antag)
Y=Bottom+(Top-Bottom)/(1+10^((LogEC-X)*HillSlope))
EC50 is the concentration of agonist that gives half maximal response in the
absence of modulator.
Kb is the equilibrium dissociation constant (Molar) of modulator binding to its
allosteric site. It is in the same molar units used to enter the modulator
concentration into column titles on the data table.
Alpha is the ternary complex constant. When alpha=1.0, the modulator won't
alter binding. If alpha is less than 1.0, then the modulator reduces ligand binding.
If alpha is greater than 1.0, then the modulator increases binding. In the
example shown above, alpha equals 0.01 so the modulator greatly decreases
binding.
Top and Bottom are plateaus in the units of the Y axis.
Notes
This model is designed to analyze data when the modulator works via an
allosteric site. Since the agonist and modulator are acting via different sites,
it is incorrect to refer to the modulator as a competitor.
The model is written to fit the logarithm of alpha, rather than alpha itself.
This is because alpha is asymmetrical: All values from 0 to 1 mean that the
modulator decreases binding, while all values from 1 to infinity mean that
the modulator enhances binding. On a log scale, its values are more
symmetrical, so the confidence interval computed on a log scale (as Prism
does) is more accurate. Prism reports both alpha and log(alpha).
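A small sketch of the shift factor defined by the model above may help; it only evaluates Antag = (1 + B/KB)/(1 + alpha*B/KB) for made-up values of KB, alpha and the modulator concentration B:

def allosteric_shift(B, kb, alpha):
    # factor by which the agonist EC50 is multiplied in the presence of modulator B
    return (1.0 + B / kb) / (1.0 + alpha * B / kb)

kb = 1e-7         # modulator Kb, molar (hypothetical)
alpha = 0.01      # alpha < 1: modulator reduces agonist binding
for B in (0.0, 1e-8, 1e-7, 1e-6):
    print(B, allosteric_shift(B, kb, alpha))   # shift grows toward 1/alpha as B increases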
Equation: ECanything
Introduction
Many log(dose) response curves follow the familiar symmetrical sigmoidal
shape. The usual goal is to determine the EC50 of the agonist - the
concentration that provokes a response half way between the basal (Bottom)
response and the maximal (Top) response. But you can determine any spot
along the curve, say an EC80 or EC90.
Many dose-response curves have a standard slope of 1.0. This model does not
assume a standard slope but rather fits the Hill Slope from the data. Hence the
name Variable slope model. This is preferable when you have plenty of data
points.
Step by step
Create an XY data table. Enter the logarithm of the concentration of the agonist
into X. Enter response into Y in any convenient units. Enter one data set into
column A, and use columns B, C... for different treatments, if needed.
From the data table, click Analyze, choose nonlinear regression, and choose the
panel of equations: Dose-Response -- Special. Then choose "log(Agonist) vs.
response -- Find ECanything".
You must constrain the parameter F to have a constant value between 0 and
100. Set F to 80 if you want to fit the EC80. If you constrain F to equal 50,
ECF is simply the EC50.
Model
logEC50=logECF - (1/HillSlope)*log(F/(100-F))
Y=Bottom + (Top-Bottom)/(1+10^((LogEC50-X)*HillSlope))
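The first line of the model is just an algebraic conversion between ECF and EC50. A short numeric sketch, using arbitrary example values:

import math

def log_ec50_from_log_ecf(log_ecf, F, hill_slope):
    # logEC50 = logECF - (1/HillSlope)*log10(F/(100 - F))
    return log_ecf - (1.0 / hill_slope) * math.log10(F / (100.0 - F))

log_ec80 = -5.0      # hypothetical fitted logECF with F constrained to 80
print(log_ec50_from_log_ecf(log_ec80, F=80.0, hill_slope=1.0))   # about -5.60
print(log_ec50_from_log_ecf(-6.0, F=50.0, hill_slope=1.0))       # with F=50, logEC50 = logECF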
ICanything
This equation can also fit inhibitory data where the curve goes downhill rather
than uphill. The best-fit value of the Hill Slope will be negative in this case. The
result will always be reported as ECf. Let's say you set F=80. Then the ECf for
inhibitory data would be the concentration (X value) required to bring the curve
down to 80%. If you want the concentration that brings the curve down by
80%, to 20%, then you'd need to set F equal to 20.
2.7.5
2.7.5.1
Binding occurs when ligand and receptor collide due to diffusion, and when the
collision has the correct orientation and enough energy. The rate of association
is:
Number of binding events per unit of time = [Ligand]·[Receptor]·kon.
Once binding has occurred, the ligand and receptor remain bound together for a
random amount of time. The probability of dissociation is the same at every
instant of time. The receptor doesn't "know" how long it has been bound to the
ligand. The rate of dissociation is:
Number of dissociation events per unit time = [ligand·receptor]·koff.
After dissociation, the ligand and receptor are the same as they were before
binding. If either the ligand or receptor is chemically modified, then the binding
does not follow the law of mass action.
Equilibrium is reached when the rate at which new ligand·receptor complexes
are formed equals the rate at which the ligand·receptor complexes dissociate.
At equilibrium:
[Ligand]·[Receptor]·kon = [ligand·receptor]·koff
Meaning of Kd
Rearrange that equation to define the equilibrium dissociation constant Kd:
Kd = koff/kon = [Ligand]·[Receptor]/[ligand·receptor]
Name     Units
kon      M-1 min-1
koff     min-1
Kd       M
Fractional occupancy
The law of mass action predicts the fractional receptor occupancy at equilibrium
as a function of ligand concentration. Fractional occupancy is the fraction of all
receptors that are bound to ligand:
Fractional occupancy = [ligand·receptor]/([Receptor] + [ligand·receptor])
This equation is not useful, because you don't know the concentration of
unoccupied receptor, [Receptor]. A bit of algebra creates a useful equation:
Fractional occupancy = [Ligand]/([Ligand] + Kd)
This equation assumes equilibrium. To make sense of it, think about a few
different values for [Ligand].
[Ligand]    Fractional Occupancy
0           0%
1·Kd        50%
4·Kd        80%
9·Kd        90%
99·Kd       99%
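A quick check of the table, using the equation fractional occupancy = [Ligand]/([Ligand] + Kd):

Kd = 1.0                                      # arbitrary units; only the ratio to Kd matters
for multiple in (0.0, 1.0, 4.0, 9.0, 99.0):
    ligand = multiple * Kd
    print(multiple, ligand / (ligand + Kd))   # 0, 0.50, 0.80, 0.90, 0.99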
Assumptions
Although termed a "law", the law of mass action is simply a model that can be
used to explain some experimental data. Because it is so simple, the model is
not useful in all situations. The model assumes:
All receptors are equally accessible to ligands.
Receptors are either free or bound to ligand. It doesn't allow for more than
one affinity state, or states of partial binding.
Binding does not alter the ligand or receptor.
Binding is reversible.
Despite its simplicity, the law of mass action has proven to be very useful in
describing many aspects of receptor pharmacology and physiology.
2.7.5.2
Nonspecific binding
is higher.
Ideally, you should get the same results defining nonspecific binding with a
range of concentrations of several drugs, and you should test this when
possible. In many assay systems, nonspecific binding is only 10-20% of the
total radioligand binding. If the nonspecific binding makes up more than half of
the total binding, you will find it hard to get quality data. If your system has a
lot of nonspecific binding, try different kinds of filters, a larger volume of
washing buffer, warmer washing buffer, or a different radioligand.
2.7.5.3
Ligand depletion
In many experimental situations, you can assume that a very small fraction of
the ligand binds to receptors (or to nonspecific sites). In these situations, you
can also assume that the free concentration of ligand is approximately equal to
the concentration you added. This assumption vastly simplifies the analysis of
binding experiments, and the standard analysis methods depend on this
assumption.
In other situations, a large fraction of the ligand binds to the receptors (or binds
nonspecifically). This means that the concentration of ligand free in the solution
does not equal the concentration you added. The discrepancy is not the same
in all tubes or at all times. The free ligand concentration is depleted by binding.
Many investigators use this rule of thumb: If less than 10% of the ligand binds,
don't worry about ligand depletion; if more than 10% of the ligand binds, you
have three choices:
Change the experimental conditions. Increase the reaction volume without
changing the amount of tissue. The problem with this approach is that it
requires more radioligand, which is usually very expensive.
Measure the free concentration of ligand in every tube. This is possible if you
use centrifugation or equilibrium dialysis, but is quite difficult if you use
vacuum filtration to remove free radioligand.
Use analysis techniques that adjust for the difference between the
concentration of added ligand and the concentration of free ligand. Prism
includes such models for analyzing saturation 295 and competition 318 data.
These special analyses only work with radioactive ligands, so the
assessment of added ligand and bound ligand are in the same counts-per-minute units. These methods don't work with fluorescent ligands.
2.7.5.4
[Table of radioactivity calculations: isotope decay, concentration of stock, dilution of stock, cpm to sites/cell, and cpm to nM.]
2.7.6
2.7.6.1
There are three ways to deal with nonspecific binding:
Subtract off the nonspecific, and analyze only the specific binding 297 .
Analyze the total binding only 292 , inferring the amount of nonspecific binding
from the shape of the total binding curve.
Globally fit total and nonspecific binding together 293 .
We recommend the third approach (global fitting of total and nonspecific). The
problem with fitting specific binding is that you have to make some
assumptions in order to subtract nonspecific from total, and the resulting values
that you fit aren't really data. When possible, we suggest that you fit the data
you actually collect, and avoid creating derived data sets (specific binding, in this
case).
Fitting total binding only requires less data, so saves experimental time and
money. But most people feel uncomfortable defining nonspecific binding purely
from the shape of the binding curve.
2.7.6.2
Introduction
You don't have to measure nonspecific binding directly. Instead, you can
determine Bmax and Kd by fitting only total binding by assuming that the
amount of nonspecific binding is proportional to the concentration of
radioligand.
Step by step
Create an XY data table. Enter radioligand concentration into X, and total
binding into Y. If you have several experimental conditions, place the first into
column A, the second into column B, etc.
Use any convenient units for X and Y. The Kd will be reported in the same units
as X, and the Bmax will be reported in the same units as Y.
From the table of total binding, click Analyze, choose nonlinear regression,
choose the panel of Saturation Binding equations, and choose One site -- Total.
Consider constraining the parameter Background to a constant value of zero.
This is the measured 'binding' when there is no radioligand added, so
represents the counter background, if there is any.
Model
Y=Bmax*X/(Kd+X) + NS*X + Background
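If you want to reproduce this fit outside Prism, a minimal SciPy sketch looks like this. All data values are invented; Bmax, Kd, NS and Background are the fitted parameters.

import numpy as np
from scipy.optimize import curve_fit

def one_site_total(x, bmax, kd, ns, background):
    # Y = Bmax*X/(Kd + X) + NS*X + Background
    return bmax * x / (kd + x) + ns * x + background

conc = np.array([0.5, 1, 2, 4, 8, 16, 32], dtype=float)                   # nM, hypothetical
total = np.array([225, 365, 580, 860, 1170, 1490, 1860], dtype=float)     # cpm, hypothetical

params, _ = curve_fit(one_site_total, conc, total, p0=[1500.0, 4.0, 10.0, 0.0])
print(dict(zip(["Bmax", "Kd", "NS", "Background"], params)))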
Notes
This analysis assumes that only a small fraction of radioligand binds, which
means that the concentration you added is virtually identical to the free
concentration. If you can't make this assumption, use an alternative analysis 295 .
2.7.6.3
Introduction
In a saturation binding experiment, you vary the concentration of radioligand
and measure binding. The goal is to determine the Kd (ligand concentration that
binds to half the receptor sites at equilibrium) and Bmax (maximum number of
binding sites).
The ligand binds not only to receptor sites, but also to nonspecific sites. There
are three approaches to dealing with nonspecific binding.
Subtract off the nonspecific, and analyze only the specific binding 297 .
Analyze the total binding only, inferring the amount of nonspecific binding
from the shape of the total binding curve. Learn more 292 .
Globally analyze the total and nonspecific binding at one time. This is the
best approach, and the details are explained below.
Step by step
Create an XY data table. Enter radioligand concentration into X, total binding
into Y, and nonspecific binding into column B.
Use any convenient units for X. The Kd will be reported in those same
concentration units. Use the same units for total and nonspecific binding. The
Bmax will be reported in those same units.
Alternatively choose the sample data set: Binding - Saturation binding to total
and nonspecific.
From the data table, click Analyze, choose nonlinear regression, choose the
panel of Saturation Binding equations, and choose One site -- Total and
nonspecific binding.
Consider constraining the parameter Background to a constant value of zero.
This is the measured 'binding' when there is no radioligand added, so
represents the counter background, if there is any.
Model
specific=Bmax*X/(X+Kd)
nonspecific=NS*X + Background
<A>Y=specific+nonspecific
<B>Y=nonspecific
The <A> and <B> syntax means that the third line is only used for data set A
(total binding) while the fourth line is used only for data set B (nonspecific).
The parameters NS and Background are shared 41 between the two data sets.
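Outside Prism, the same global fit can be sketched by stacking the two data sets and using a flag to choose the right model line for each point; NS and Background then appear only once, so they are shared. All numbers below are invented for illustration.

import numpy as np
from scipy.optimize import curve_fit

conc = np.array([1, 2, 4, 8, 16, 32], dtype=float)                     # nM, hypothetical
total = np.array([350, 590, 920, 1330, 1850, 2620], dtype=float)       # data set A, hypothetical
nonspecific = np.array([45, 85, 165, 330, 650, 1290], dtype=float)     # data set B, hypothetical

x_all = np.concatenate([conc, conc])
y_all = np.concatenate([total, nonspecific])
is_total = np.concatenate([np.ones_like(conc), np.zeros_like(conc)])   # 1 = total, 0 = nonspecific

def global_model(x, bmax, kd, ns, background, flag=is_total):
    specific = bmax * x / (x + kd)
    nonspec = ns * x + background
    return np.where(flag == 1, specific + nonspec, nonspec)

params, _ = curve_fit(global_model, x_all, y_all, p0=[1000.0, 2.0, 20.0, 0.0])
print(dict(zip(["Bmax", "Kd", "NS", "Background"], params)))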
2.7.6.4
Introduction
You don't have to measure nonspecific binding directly. Instead, you can
determine Bmax and Kd by fitting only total binding by assuming that the
amount of nonspecific binding is proportional to the concentration of
radioligand.
If only a small fraction of radioligand binds, you can use a simpler model 292 .
This equation allows for a substantial fraction of the added ligand to bind. This
only works with radioactive ligands, so the assessment of added ligand and
bound ligand are in the same counts-per-minute units. This method doesn't
work with fluorescent ligands.
Step by step
Create an XY data table. Enter radioligand concentration into X, and total
binding into Y. If you have several experimental conditions, place the first into
column A, the second into column B, etc.
Enter both X and Y in CPM units. This is essential for the analysis to work.
From the table of total binding, click Analyze, choose nonlinear regression,
choose the panel of Saturation Binding equations, and choose One site -- Total,
accounting for ligand depletion.
You must constrain two parameters to constant values based on your
experimental design:
SpAct is the specific radioactivity in cpm/fmol
Vol is the reaction volume in ml
Model
KdCPM=KdnM * Vol * 1000 * SpecAct
; (nm/L * mL * 0.001 L/ml * 1000000 fmol/nmol * cpm/fmol)
a=-1-NS
b=KdCPM + NS*KdCPM + X + 2*X*NS + Bmax
c=-1*X*(NS*KdCPM + X*NS+Bmax)
Y=(-b+sqrt(b*b-4*a*c) )/(2*a) ;Y is in cpm
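The quadratic above can be evaluated directly to see what fraction of the added radioligand ends up bound. This is only a sketch; the Kd, Bmax, NS, specific activity and volume below are invented example values.

import math

def one_site_total_depletion(x_cpm, kd_nm, bmax_cpm, ns, spec_act, vol_ml):
    kd_cpm = kd_nm * vol_ml * 1000.0 * spec_act      # convert Kd from nM to cpm in this volume
    a = -1.0 - ns
    b = kd_cpm + ns * kd_cpm + x_cpm + 2.0 * x_cpm * ns + bmax_cpm
    c = -1.0 * x_cpm * (ns * kd_cpm + x_cpm * ns + bmax_cpm)
    return (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)   # Y, bound cpm

# Example: Kd = 2 nM, Bmax = 5000 cpm, NS = 0.05, SpecAct = 100 cpm/fmol, Vol = 0.2 ml
for added in (1000.0, 5000.0, 20000.0, 80000.0):
    bound = one_site_total_depletion(added, 2.0, 5000.0, 0.05, 100.0, 0.2)
    print(added, round(bound, 1), round(100 * bound / added, 1))   # added, bound, percent bound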
Notes
This analysis accounts for the fact that a large fraction of the added radioligand
binds to the receptors. If you are able to assume that only a small fraction of
radioligand binds, which means that the concentration you added is virtually
identical to the free concentration, use an alternative analysis 292 .
Reference
This equation came from S. Swillens (Molecular Pharmacology, 47: 1197-1203, 1995)
2.7.6.5
Introduction
In a saturation binding experiment, you vary the concentration of radioligand
and measure binding. The goal is to determine the Kd (ligand concentration that
binds to half the receptor sites at equilibrium) and Bmax (maximum number of
binding sites).
The ligand binds not only to receptor sites, but also to nonspecific sites. There
are three approaches to dealing with nonspecific binding.
Subtract off the nonspecific, and analyze only the specific binding. Read on
for this approach.
Analyze the total binding only, inferring the amount of nonspecific binding
from the shape of the total binding curve. Learn more 292 .
Globally analyze the total and nonspecific binding at one time. Learn more.
293
Step by step
Create an XY data table. Enter radioligand concentration into X, and specific
binding into Y. If you have several experimental conditions, place the first into
column A, the second into column B, etc.
An alternative approach would be to enter total binding into column A, and
nonspecific into column B. Then use the Remove Baseline analysis to subtract
column B from column A, creating a new results table with the specific binding.
From the table of specific binding, click Analyze, choose nonlinear regression,
choose the panel of Saturation Binding equations, and choose One site specific
binding.
Model
Y = Bmax*X/(Kd + X)
If you create a Scatchard plot, use it only to display your data. The human
retina and visual cortex evolved to detect edges (straight lines), not rectangular
hyperbolas, and so it can help to display data this way. Scatchard plots are
often shown as insets to the saturation binding curves. They are especially
useful when you want to show a change in Bmax or Kd.
Don't use the slope and intercept of a linear regression line to determine values
for Bmax and Kd. If you do this, you won't get the most accurate values for
Bmax and Kd. The problem is that the transformation distorts the experimental
error, so the data on the Scatchard plot do not obey the assumptions of linear
regression. Use nonlinear regression to obtain the most accurate values of Kd
and Bmax.
To create a Scatchard plot from your specific binding data, use Prism's
Transform analysis, and choose the Scatchard transform from the panel of
biochemistry and pharmacology transforms.
To create a Scatchard line corresponding to the nonlinear regression fit, follow
these steps:
1. Create a new XY data table, with no subcolumns.
2. Into row 1 enter X=0, Y=Bmax/Kd (previously determined by nonlinear regression). You need to do the calculation manually, and enter a number.
3. Into row 2 enter X=Bmax and Y=0. Again enter the number into the X column, not the text 'Bmax'.
4. Note the name of this data table. Perhaps rename it to something appropriate.
5. Go to the Scatchard graph.
6. Drag the new table from the navigator and drop onto the graph.
7. Double-click on one of the new symbols for that data set to bring up the Format Graph dialog.
Notes
This is not the best way to determine Bmax and Kd. It is better to globally
fit total and nonspecific binding 293 , without subtracting to compute specific
binding.
When making a Scatchard plot, you have to choose what units you want to
use for the Y-axis. Some investigators express both free ligand and specific
binding in cpm so the ratio bound/free is a unitless fraction. While this is
easy to interpret (it is the fraction of radioligand bound to receptors), an
alternative is to express specific binding in sites/cell or fmol/mg protein, and
to express the free radioligand concentration in nM. While this makes the Y-axis hard to interpret visually, it provides correct units for the slope (which
equals -1/Kd).
2.7.6.6
Introduction
In a saturation binding experiment, you vary the concentration of radioligand
and measure binding. The goal is to determine the Kd (ligand concentration that
binds to half the receptor sites at equilibrium) and Bmax (maximum number of
binding sites).
This equation assumes you have subtracted off the nonspecific, and are only
analyzing specific binding.
This equation fits a Hill slope. If you assume the Hill slope is 1.0 (for mass
action binding of a monomer to one site) use a simpler equation 297 .
Step by step
Create an XY data table. Enter radioligand concentration into X, and specific
binding into Y. If you have several experimental conditions, place the first into
column A, the second into column B, etc.
An alternative approach would be to enter total binding into column A, and
nonspecific into column B. Then use the Remove Baseline analysis to subtract
column B from column A, creating a new results table with the specific binding.
From the table of specific binding, click Analyze, choose nonlinear regression,
choose the panel of Saturation Binding equations, and choose One site specific
binding with Hill Slope.
Model
Y=Bmax*X^h/(Kd^h + X^h)
2.7.6.7
Binding potential
The figure above shows a saturation binding equation fit to specific binding data.
But all the concentrations are relatively low (compared to the Kd of binding) so
the data almost form a straight line. The saturation binding model fits the data
fine (solid curve), with a narrow 95% confidence band around the curve
(dashed lines). But look at the confidence interval for Bmax and Kd (shown in
the box)! They are super wide, even descending into negative (impossible)
values. With these data, the curve fit the data fine, but neither parameter in the
model (Bmax and Kd) was fit with a reasonable confidence interval. This is no
surprise. The data don't show even a hint of plateauing, so the data simply
don't define the Bmax and Kd at all.
The problem is that the data are consistent with a system with a huge number
of low affinity receptors (high Bmax; high Kd) or a smaller number of high
affinity receptors (low Bmax; low Kd). The figure below shows two ways to
visualize this. The graph on the left shows the same fit as the one above, but
with the best-fit curve and its confidence bands extended out to larger
concentrations. While the confidence band is tight near the points, it is super
wide as it goes beyond concentrations with data. The graph on the right shows
two different fits, with the Kd constrained to equal 200 nM or 500 nM. The
curves are very different, yet both go near the data.
Another way to look at the problem with these data is that the Bmax and Kd
are correlated. Prism can report the covariance matrix as part of its nonlinear
regression results. With only two parameters, there is only one value in that
"matrix". The covariance between Kd and Bmax (which can range from 0.0 to
1.0) is 0.9993. Prism can also report the dependency of each parameter
(which also can range from 0.0 to 1.0). With only two parameters, both have
the same dependency, which is 0.9986 for this example. This is not quite high
enough for Prism to declare the results ambiguous, but the threshold for that
designation (dependency > 0.9999) is arbitrary.
It is tempting to give up at this point and say that nothing can be determined
without more data at higher concentrations. But in some systems, especially
those using PET scanning to detect receptors, data like this are typical. It is
impossible to use higher concentrations of ligand.
In this example, Prism finds that the Binding Potential is 242.1, with a
confidence interval ranging from 183.4 to 300.7. That confidence interval is
reasonably narrow, so the result is quite useful.
Reference
1. Innis et al. Consensus nomenclature for in vivo imaging of reversibly binding
radioligands. Journal of Cerebral Blood Flow & Metabolism (2007) vol. 27 (9)
pp. 1533-1539
Download the Prism file for this example.
2.7.6.8
Introduction
In a saturation binding experiment, you vary the concentration of radioligand
and measure binding. The goal is to determine the Kd (ligand concentration that
binds to half the receptor sites at equilibrium) and Bmax (maximum number of
binding sites) of both kinds of receptors.
The ligand binds not only to receptor sites, but also to nonspecific sites. There
are three approaches to dealing with nonspecific binding.
Subtract off the nonspecific, and analyze only the specific binding. Read on
for this approach.
Analyze the total binding only, inferring the amount of nonspecific binding
from the shape of the total binding curve. This approach doesn't work well
when there are two classes of receptors.
Globally analyze the total and nonspecific binding at one time. Learn more.
307
Step by step
Create an XY data table. Enter radioligand concentration into X, and specific
binding into Y. If you have several experimental conditions, place the first into
column A, the second into column B, etc.
An alternative approach would be to enter total binding into column A, and
nonspecific into column B. Then use the Remove Baseline analysis to subtract
column B from column A, creating a new results table with the specific binding.
From the table of specific binding, click Analyze, choose nonlinear regression,
choose the panel of Saturation Binding equations, and choose Two sites -- Specific binding.
Model
Site1=BmaxHi*X/(KdHi+X)
Site2=BmaxLo*X/(KdLo+X)
Y=Site1 + Site2
To plot the two straight lines that correspond to the nonlinear regression fit,
create a new data table that defines the two lines as shown below, using Bmax
and Kd values determined by nonlinear regression.
X         Data set A     Data set B
0         Bmax1/Kd1
Bmax1     0
0                        Bmax2/Kd2
Bmax2                    0
Go to the graph of the Scatchard transformed data and drag the new table to
that graph. Use the Format Graph dialog to plot the two data sets from the
table using connecting lines but no symbols.
2.7.6.9
Introduction
In a saturation binding experiment, you vary the concentration of radioligand
and measure binding. The goal is to determine the Kd (ligand concentration that
binds to half the receptor sites at equilibrium) and Bmax (maximum number of
binding sites).
The ligand binds not only to receptor sites, but also to nonspecific sites. There
are three approaches to dealing with nonspecific binding.
Subtract off the nonspecific, and analyze only the specific binding 304 .
Analyze the total binding only, inferring the amount of nonspecific binding
from the shape of the total binding curve. This approach doesn't work well
when the ligand binds to two sites
Globally analyze the total and nonspecific binding at one time. This is the
best approach, and the details are explained below.
Step by step
Create an XY data table. Enter radioligand concentration into X, total binding
into Y, and nonspecific binding into column B.
Use any convenient units for X. The Kd will be reported in those same
concentration units. Use the same units for total and nonspecific binding. The
Bmax will be reported in those same units.
From the data table, click Analyze, choose nonlinear regression, choose the
panel of Saturation Binding equations, and choose Two sites -- Total and
nonspecific binding.
The parameter Background is the measured 'binding' when there is no
radioligand added, so represents the counter background, if there is any.
Consider constraining it to a constant value of zero.
Model
Specific1=BmaxHi*X/(X+KdHi)
Specific2=BmaxLo*X/(X+KdLo)
Nonspecific=NS*X + Background
<A>Y=Specific1 + Specific2 + Nonspecific
<B>Y=Nonspecific
The <A> and <B> syntax means that the fourth line is only used for data set A
(total binding) while the fifth line is used only for data set B (nonspecific).
The parameters NS and Background are shared between the two data sets.
Introduction
An allosteric modulator can reduce radioligand binding. This model fits
experiments where entire radioligand binding curves are measured in the
absence and presence of modulator. The goal is to learn the affinity of the
modulator for binding to its site, and also determine the value of alpha, the
ternary complex constant that quantifies the degree to which binding of the
modulator alters the affinity of the radioligand for the receptor site.
Step by step
Create an XY data table. Enter the concentration of the labeled ligand into X,
using any convenient units (maybe nM). Enter specific binding into Y in any
convenient units. Enter data with no modulator into column A. Enter data
collected with a constant concentration of modulator into column B. Repeat, if
you have data, for column C, D, E, ..., each with a different concentration of
modulator. Enter the modulator concentration (in nanomolar so 1nM is entered
as '1') into the column titles. Don't forget to enter '0' as the column title for
data set A.
From the data table, click Analyze, choose nonlinear regression, choose the
panel of Saturation Binding equations, and choose Allosteric modulator shift.
You don't need to constrain any parameters to constant values.
Model
Hot=X
Alpha=10^logalpha
KB=10^logKB
KApp=KDHot*((1+Allo/KB)/(1+alpha*Allo/KB))
Y=Bmax*Hot/(Hot+KApp)
Notes
This model is designed to analyze data when the unlabeled compound
works via an allosteric site. Since the labeled and unlabeled ligands are
acting via different sites, it is inappropriate (and incorrect) to refer to the
modulator as a competitor.
The model is written to fit the logarithm of alpha, rather than alpha itself.
This is because alpha is asymmetrical: all values from 0 to 1 mean that
the modulator decreases binding, while all values from 1 to infinity mean
that the modulator enhances binding. On a log scale, its values are more
symmetrical, so the confidence interval computed on a log scale (as Prism
does) is more accurate. Prism reports both alpha and log(alpha).
This model assumes that the allosteric modulator is present in excess, so
the concentration you added is very close to its free concentration. This
model won't work when the concentration of allosteric modulator is limiting
(as it is when G proteins alter agonist binding to many receptors). No
explicit model can handle this situation. You need to define the model with
an implicit equation (Y on both sides of the equals sign) and Prism cannot
handle such equations.
Reference
2.7.7
2.7.7.1
Ligand depletion
If a large fraction of the added radioligand binds to the receptors, the ligand is
depleted so the concentration you added is greater than the free concentration.
You need to fit these data to a model that accounts for ligand depletion 318 .
Homologous binding
An homologous binding experiment is one where the labeled and unlabeled
ligands have identical affinities for the receptors. Generally this is because the
two are chemically identical. Receptor number and affinity are determined by
analyzing the competition of varying concentrations of unlabeled ligand for one
(or better, two) concentrations of labeled ligand. Prism offers a special model 321 for this kind of experiment.
Allosteric modulators
Allosteric modulators can alter radioligand binding, even though they bind to
different sites. Since the hot and cold ligands bind to different sites, the term
'competition' is not apt, but we include this model here because the
experimental design is the same as used for competitive binding. Prism can fit
binding inhibition 322 (or augmentation) by an allosteric modulator based on the
ternary complex model. Note that this model assumes the allosteric modulator
is present in excess, so is not depleted by binding to the receptors.
2.7.7.2
Introduction
You can determine the equilibrium dissociation constant of an unlabelled ligand
by measuring its competition for radioligand binding.
Step by step
Create an XY data table. Enter the logarithm of the concentration of the
unlabeled compound into X, and binding into Y. If you have several experimental
conditions, place the first into column A, the second into column B, etc. Use
subcolumns to enter replicates.
From the data table, click Analyze, choose nonlinear regression, choose the
panel of Competition Binding equations, and choose One site - Fit Ki.
You must constrain two parameters to constant values based on your
experimental design:
RadioligandNM is the concentration of labeled ligand in nM. A single
concentration of radioligand is used for the entire experiment.
HotKdNM is the equilibrium dissociation constant of the labeled ligand in nM.
Model
logEC50=log(10^logKi*(1+RadioligandNM/HotKdNM))
Y=Bottom + (Top-Bottom)/(1+10^(X-LogEC50))
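The first line of the model is the Cheng and Prusoff relationship written in log form. A numeric sketch, with arbitrary example values:

import math

def log_ec50_from_log_ki(log_ki, radioligand_nm, hot_kd_nm):
    # logEC50 = log10(10^logKi * (1 + RadioligandNM/HotKdNM))
    return math.log10(10.0 ** log_ki * (1.0 + radioligand_nm / hot_kd_nm))

log_ki = -9.0                 # Ki = 1 nM (hypothetical)
print(log_ec50_from_log_ki(log_ki, radioligand_nm=2.0, hot_kd_nm=1.0))   # about -8.52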
Notes
This model fits the Ki of the unlabelled ligand directly. It does not report the
EC50, so you do not need to apply the Cheng and Prusoff correction(1).
Instead you enter the concentration of radioligand and its Kd as constants, and
Prism directly fits the Ki of your cold compound. If you want to know the IC50,
fit a log(dose)-response curve.
If you want to know the IC50 (which is not very informative in this situation) fit
the data using an alternative equation 314 .
The analysis assumes that you have one site, and that the binding is reversible
and at equilibrium.
1. Cheng, Y. and Prusoff, W. H. Relationship between the inhibition constant (K1) and the
concentration of inhibitor which causes 50 per cent inhibition (I50) of an enzymatic reaction.
Biochem Pharmacol, 22: 3099-3108, 1973.
2.7.7.3
Introduction
You can determine the equilibrium dissociation constant of an unlabelled ligand
by measuring its competition for radioligand binding.
Step by step
Create an XY data table. Enter the logarithm of the concentration of the
unlabeled compound into X, and binding into Y. If you have several experimental
conditions, place the first into column A, the second into column B, etc. Use
subcolumns to enter replicates.
From the data table, click Analyze, choose nonlinear regression, choose the
panel of Competition Binding equations, and choose One site - Fit logIC50.
Model
Y=Bottom + (Top-Bottom)/(1+10^(X-LogIC50))
Notes
This model is the same as an inhibitory dose-response curve. It fits the logIC50,
which is not the same as the Ki of the unlabelled ligand for binding. The Ki
depends on the IC50, the concentration of radioligand, and its Kd for binding.
You can fit the Ki directly using a different equation 312 .
The analysis assumes that you have one site, and that the binding is reversible
and at equilibrium.
2.7.7.4
Introduction
You can determine the equilibrium dissociation constant of an unlabelled ligand
by measuring its competition for radioligand binding. This model assumes that
there are two classes of sites with identical affinity for the radioligand, but
different affinities for the competitor.
Step by step
Create an XY data table. Enter the logarithm of the concentration of the
unlabeled compound into X, and binding into Y. If you have several experimental
conditions, place the first into column A, the second into column B, etc. Use
subcolumns to enter replicates.
From the data table, click Analyze, choose nonlinear regression, choose the
panel of Competition Binding equations, and choose Two sites - Fit Ki.
You must constrain three parameters to constant values based on your
experimental design:
RadioligandNM is the concentration of labeled ligand in nM. A single
concentration of radioligand is used for the entire experiment.
KdHi is the equilibrium dissociation constant of the labeled ligand for the
high-affinity site in nM.
KdLo is the equilibrium dissociation constant of the labeled ligand for the
low-affinity site in nM.
Model
ColdnM=10^(X+9)
KIHinM = 10^(LogKI_Hi+9)
KILonM = 10^(LogKI_Lo+9)
SITE1= HotnM*(Top-Bottom)/(HotnM + KDHotNM_Ki*(1+coldnM/KiHinM))
SITE2= HotnM*(Top-Bottom)/(HotnM + KDHotNM_Lo*(1+coldnM/KiLonM))
Y = SITE1*FractionHi + SITE2*(1-FractionHi) + Bottom
Notes
This model fits the two log(Ki) values of the unlabelled ligand directly. It does
not report the IC50s, so you do not need to apply the Cheng and Prusoff
correction(1). Instead you enter the concentration of radioligand and its Kd as
constants, and Prism directly fits the Ki of your cold compound. If you want to
fit the two IC50 values instead of the Ki values, use a different equation 317 .
The analysis assumes that you know the affinity of both sites for the labeled
ligand. In many cases, the radioligand has the same affinity for both sites. In
that case, simply enter that value twice. If the two sites have different affinities
for the labeled ligand, enter both values (determined from other experiments).
Watch out for the labels. The constant KdHi is the Kd of the hot ligand for the
receptors with the high affinity for the unlabeled ligand, and KdLo is the Kd of
the hot ligand for the receptors with lower affinity for the unlabeled ligand. So
KdHi may be larger or smaller than KdLo.
This analysis assumes that the binding is reversible and at equilibrium. It also
assumes that the labeled and unlabeled ligands compete for the same binding
sites.
1. Cheng, Y. and Prusoff, W. H. Relationship between the inhibition constant (K1) and the
concentration of inhibitor which causes 50 per cent inhibition (I50) of an enzymatic reaction.
Biochem Pharmacol, 22: 3099-3108, 1973.
2.7.7.5
Introduction
You can determine the equilibrium dissociation constant of an unlabelled ligand
by measuring its competition for radioligand binding. This model assumes that
there are two classes of sites with identical affinity for the radioligand, but
different affinities for the competitor.
Step by step
Create an XY data table. Enter the logarithm of the concentration of the
unlabeled compound into X, and binding into Y. If you have several experimental
conditions, place the first into column A, the second into column B, etc. Use
subcolumns to enter replicates.
From the data table, click Analyze, choose nonlinear regression, choose the
panel of Competition Binding equations, and choose Two sites - Fit logIC50.
Model
Span=Top-Bottom
Section1=Span*FractionHi/(1+10^((X-LogIC50HI)))
Section2=Span* (1-FractionHi)/(1+10^((X-LogIC50Lo)))
Y=Bottom + Section1 +Section2
Notes
This model fits the two IC50 values of the unlabelled ligand. It does not report
the two Ki values. The Ki values depend on the IC50s, the concentration of
radioligand, and its Kd for binding. You can fit the Ki values directly using a
different equation 315 .
This analysis assumes that the binding is reversible and at equilibrium. It also
assumes that the labeled and unlabeled ligands compete for the same binding
sites.
2.7.7.6
Introduction
This model for competitive binding is useful when a large fraction of the added
radioligand binds to the receptors, so the concentration you added is greater
than the free concentration.
This equation allows for a substantial fraction of the added ligand to bind. This
only works with radioactive ligands, so the assessment of added ligand and
bound ligand are in the same counts-per-minute units. This method doesn't
work with fluorescent ligands.
Step by step
Create an XY data table. Enter the logarithm of the molar concentration of the
unlabeled compound into X, and binding into Y. The Y values must be in cpm.
If you have several experimental conditions, place the first into column A, the
second into column B, etc. Use subcolumns to enter replicates.
From the data table, click Analyze, choose nonlinear regression, choose the
panel of Competition Binding equations, and choose One site -- Heterologous
with depletion.
You must constrain four parameters to constant values based on your
experimental design:
Hot is the amount of labeled ligand in cpm. A single concentration of
radioligand is used for the entire experiment.
KdNM is the equilibrium dissociation constant of the labeled ligand in nM.
SpAct is the specific radioactivity in cpm/fmol.
Vol is the reaction volume in ml.
Model
KdCPM=KdnM*SpAct*vol*1000
; nmol/L *(cpm/fmol * ml * .001L/ml * 1000000fmol/nmol) = cpm
R=NS+1
S=[1+10^(X-LogKi)]*KdCPM+Hot
a=-1*R
b=R*S+NS*Hot + Bmax
c= -1*Hot*(S*NS + Bmax)
Y= (-1*b + sqrt(b*b-4*a*c))/(2*a)
Notes
This analysis accounts for the fact that a large fraction of the added radioligand
binds to the receptors. If you are able to assume that only a small fraction of
radioligand binds, which means that the concentration you added is virtually
identical to the free concentration, use an alternative analysis 312 .
Reference
This equation came from S. Swillens (Molecular Pharmacology, 47: 1197-1203, 1995)
2.7.7.7
Introduction
An homologous binding experiment is one where the labeled and unlabeled
ligands have identical affinities for the receptors. Generally this is because the
two are chemically identical. Receptor number and affinity are determined by
analyzing the competition of varying concentrations of unlabeled ligand for one
(or better, two) concentrations of labeled ligand.
Step by step
Create an XY data table. Enter the logarithm of the concentration of the
unlabeled compound into X, and binding into Y. Enter the concentration of
labeled ligand (in nM) as the column title. You will get better results if you use
two different concentrations of labeled ligand.
If you have several experimental conditions, place the first into column A, the
second into column B, etc. Use subcolumns to enter replicates.
From the data table, click Analyze, choose nonlinear regression, choose the
panel of Competition Binding equations, and choose One site - Homologous.
Model
ColdNM=10^(x+9) ;Cold concentration in nM
KdNM=10^(logKD+9) ;Kd in nM
Y=(Bmax*HotnM)/(HotnM + ColdNM + KdNM) + Bottom
Notes
This model assumes that the hot and cold ligands bind identically to the
receptor, and that you use two concentrations of hot ligand (in columns A and B,
...) and vary cold. It assumes that a small fraction of added ligand binds, so the
free concentration is close to what you added.
2.7.7.8
Introduction
Fits a curve of "competition" of binding by an allosteric modulator, based on the
ternary complex model. Note that this model assumes the allosteric modulator
is present in excess, so is not depleted by binding to the receptors. Since it
binds to a different site than the radioligand, the term 'competition' is not apt,
but we list it here because the experimental design is the same as used for
competitive binding.
Step by step
Create an XY data table. Enter the logarithm of the concentration of the
unlabeled modulator (in nM) into X, and specific binding into Y in any convenient
units.
From the data table, click Analyze, choose nonlinear regression, choose the
panel of Competition Binding equations, and choose Allosteric modulator
titration.
Model
AlloNM=10^(X+9)
KbNM=10^(logKb +9)
alpha=10^logAlpha
KAppNM=HotKDnm*(((1+(AlloNM/KBNM))/(1+alpha*(AlloNM/KBNM))))
HotOccupancy = RadioligandNM/(RadioligandNM + HotKDnm)
Y=(Y0/HotOccupancy)*(RadioligandNM/(RadioligandNM + KAppNM))
Notes
This model is designed to analyze data when the unlabeled compound works
via an allosteric site. Since the labeled and unlabeled ligands are acting via
different sites, it is inappropriate (and incorrect) to refer to these types of
experiments as competition binding assays. In some cases, in fact, the
allosteric modulator enhances radioligand binding.
The model is written to fit the logarithm of alpha, rather than alpha itself. This is
because alpha is asymmetrical: all values from 0 to 1 mean that the
modulator decreases binding, while all values from 1 to infinity mean that the
modulator enhances binding. On a log scale, its values are more symmetrical,
so the confidence interval computed on a log scale (as Prism does) is more
accurate.
The Y axis plots specific binding. Even at very high concentrations of inhibitor,
the specific binding does not descend to zero. This is the nature of allosteric
inhibition. If alpha is very high, then the binding is inhibited almost to zero. If
alpha is not so high, then the maximum inhibition is more modest. For example,
if alpha=3, the maximum inhibition is down to 33%.
This model assumes that the allosteric modulator is present in excess, so the
concentration you added is very close to its free concentration. This model
does not apply when the concentration of allosteric modulator is limiting (as it is
when G proteins alter agonist binding to many receptors). No explicit model can
handle this situation. You need to define the model with an implicit equation (Y
on both sides of the equals sign) and Prism cannot handle such equations.
Reference
2.7.8
2.7.8.1
Rate of dissociation
A dissociation binding experiment measures the off rate for radioligand
dissociating from the receptor. Initially ligand and receptor are allowed to bind,
perhaps to equilibrium. At that point, you need to block further binding of
radioligand to receptor (by adding an unlabeled drug or by dilution) so you can
measure the rate of dissociation, which follows a one-phase exponential decay
with a rate constant equal to the rate of radioligand dissociation.
Rate of association
In an association experiment, you add radioligand and measure specific binding
at various times thereafter.
Binding increases over time until it plateaus. This plateau is not the same as the
Bmax. The plateau in an association experiment depends on the concentration
of radioligand used, while the Bmax is extrapolated to an infinite concentration
of radioligand.
The rate at which binding increases is determined by three factors (as well as
experimental conditions such as pH and temperature):
The association rate constant, kon or k+1. This is what you are trying to
determine.
The concentration of radioligand. If you use more radioligand, the system
equilibrates faster.
The dissociation rate constant, koff or k-1. Some people are surprised to learn that the observed rate of approach to equilibrium depends on the dissociation rate constant as well: the observed rate constant (kob) equals kon times the radioligand concentration plus koff.
Introduction
A dissociation binding experiment measures the off rate for radioligand
dissociating from the receptor. Initially ligand and receptor are allowed to bind,
perhaps to equilibrium. At that point, you need to block further binding of
radioligand to receptor so you can measure the rate of dissociation. There are
several ways to do this:
If the tissue is attached to a surface, you can remove the buffer containing
radioligand and replace with fresh buffer without radioligand.
Spin the suspension and resuspend in fresh buffer.
Add a very high concentration of an unlabeled ligand. If this concentration is
high enough, it will instantly bind to nearly all the unoccupied receptors and
thus block binding of the radioligand.
Dilute the incubation by a large factor, at least 100 fold dilution. This will
reduce the concentration of radioligand by that factor. At such a low
concentration, new binding of radioligand will be negligible. This method is
only practical when you use a fairly low concentration of radioligand so its
concentration after dilution is far below its Kd for binding.
You then measure binding at various times after that to determine how rapidly
the radioligand falls off the receptors.
Step by step
Create an XY data table. Enter time into X, and total binding into Y. If you have
several experimental conditions, place the first into column A, the second into
column B, etc.
From the table of specific binding, click Analyze, choose nonlinear regression,
choose the panel of Kinetics Binding equations, and choose Dissociation - One
phase exponential decay .
Model
Y=(Y0-NS)*exp(-K*X) + NS
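To illustrate what is being fit, here is a minimal sketch outside Prism using Python and SciPy (assumed available; the time points and counts are placeholders). It fits the same one-phase decay and reports the dissociation half-life, ln(2)/K.

import numpy as np
from scipy.optimize import curve_fit

def dissociation(t, Y0, NS, K):
    # Total binding decays from Y0 at time zero down to the nonspecific level NS
    return (Y0 - NS) * np.exp(-K * t) + NS

t = np.array([0, 2, 5, 10, 20, 40, 60], dtype=float)                # minutes (placeholder)
y = np.array([2000, 1700, 1350, 950, 520, 260, 190], dtype=float)   # cpm (placeholder)

params, cov = curve_fit(dissociation, t, y, p0=[2000.0, 150.0, 0.05])
Y0, NS, K = params
print("koff =", K, "per minute; half-life =", np.log(2) / K, "minutes")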
When radioligand is bound to interacting binding sites with negative cooperativity, ideal dissociation data differ depending on how dissociation was initiated. If dissociation is initiated by infinite
dilution, the dissociation rate will change over time. The dissociation of some
radioligand will leave the remaining ligand bound more tightly. When dissociation
is initiated by addition of cold drug, all the receptors are always occupied by
ligand (some hot, some cold) and dissociation occurs at its maximal unchanging
rate.
2.7.8.3 Equation: Association kinetics - one concentration of hot ligand
Introduction
When you measure the association rate of a radioligand, the rate at which the
binding equilibrates depends not only on the association rate constant and the
amount of ligand you used, but also on its dissociation rate constant. (Why? 325 )
The only way to fit the association rate constant by analyzing association data
from one concentration of radioligand, is to constrain the dissociation rate
constant to a value you determined in a different experiment.
Alternative methods of determining an association rate constant are to globally fit data obtained with multiple radioligand concentrations 329 , or to analyze an experiment that measures both association and dissociation 331 sequentially.
Step by step
Create an XY data table. Enter time in minutes into X, and specific binding into
Y.
From the table of specific binding, click Analyze, choose nonlinear regression,
choose the panel of Kinetics Binding equations, and choose Association kinetics
- One conc. of hot.
You must constrain Hotnm ([radioligand] in nM) and Koff (dissociation rate
constant, in inverse minutes) to constant values.
Model
Kd=Koff/Kon
L=Hotnm*1e-9
Kob=Kon*L+Koff
Occupancy=L/(L+Kd)
Ymax=Occupancy*Bmax
Y=Ymax*(1 - exp(-1*kob*X))
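The same relationships are easy to check outside Prism. In the sketch below (Python with NumPy and SciPy assumed; the data and the constrained values of Hotnm and Koff are placeholders), kob and the plateau are computed from the model, and Kon and Bmax are the only fitted parameters, mirroring the constraints described above.

import numpy as np
from scipy.optimize import curve_fit

HOT_NM = 1.0      # constrained: radioligand concentration in nM (placeholder)
KOFF = 0.05       # constrained: dissociation rate constant in min^-1 (placeholder)

def association(t, Kon, Bmax):
    L = HOT_NM * 1e-9                 # molar
    Kd = KOFF / Kon
    kob = Kon * L + KOFF              # observed rate constant
    occupancy = L / (L + Kd)
    return occupancy * Bmax * (1 - np.exp(-kob * t))

t = np.array([0, 1, 2, 5, 10, 20, 40], dtype=float)                 # minutes
y = np.array([0, 310, 560, 1050, 1500, 1800, 1900], dtype=float)    # placeholder cpm

params, cov = curve_fit(association, t, y, p0=[1e8, 2500.0])
Kon, Bmax = params
print("kon =", Kon, "M^-1 min^-1; kob =", Kon * HOT_NM * 1e-9 + KOFF, "min^-1")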
Notes
2.7.8.4 Equation: Association kinetics - two or more concentrations of hot ligand
Introduction
You cannot determine kon from an association experiment measured at a single concentration of radioligand. The observed association rate depends on the association rate constant, the amount of ligand you used, and its dissociation rate constant. With one concentration of radioligand, the results are ambiguous.
If you perform association kinetic experiments with multiple radioligand concentrations, you can globally fit the data to the association kinetic model to derive a single best-fit estimate for kon and one for koff.
Shown below is an example of an association kinetic experiment conducted
using two concentrations of radioligand. All other conditions (temperature, pH,
etc.) were the same for both runs, of course. Times were entered into the X
column, specific binding for one concentration of radioligand was entered into the first (A) Y column, and binding for the other concentration was entered into column B.
Step by step
Create an XY data table. Enter time in minutes into X, and total binding into Y. Enter binding at one concentration of radioligand into column A, binding at another concentration into column B, etc. Enter the concentrations, in nM, into the column titles.
From the table of specific binding, click Analyze, choose nonlinear regression,
choose the panel of Kinetics Binding equations, and choose Association - Two or
more conc. of hot.
Model
Kd=Koff/Kon
L=Hotnm*1e-9
Kob=Kon*L+Koff
Occupancy=L/(L+Kd)
Ymax=Occupancy*Bmax
Y=Ymax*(1 - exp(-1*kob*X))
Notes
According to the law of mass action, the ratio of koff to kon is the Kd of receptor binding: Kd = Koff/Kon.
Compare the Kd calculated this way (from kinetic experiments) with the Kd
determined from a saturation binding curve. If binding follows the law of mass
action, the two Kd values should be indistinguishable.
2.7.8.5 Equation: Association then dissociation
Introduction
You cannot determine the association rate constant by simply observing the
association of a single concentration of radioligand. The rate at which a ligand
reaches equilibrium is determined not only by the association rate constant and
the ligand concentration, but also by the dissociation constant.
One way to determine the association rate constant is to globally fit data
obtained with two different concentrations of radioligand 328 . An alternative
approach, explained here, is to measure association and dissociation in one
experiment.
Add a radioligand and measure total binding at multiple time points, then at
Time0 initiate dissociation (by adding an antagonist or by massive dilution) and
then measure dissociation at various times.
Step by step
Create an XY data table. Enter time in minutes into X, and total binding into Y.
From the table of specific binding, click Analyze, choose nonlinear regression,
choose the panel of Kinetics Binding equations, and choose Association then
dissociation.
Constrain HotNM (the concentration of radioligand in nM) and Time0 (the time at which dissociation was initiated) to constant values. If you entered specific binding into the Y column, also constrain NS to a constant value of zero.
Model
Radioligand=HotNM*1e-9
Kob=[Radioligand]*Kon+Koff
Kd=Koff/Kon
Eq=Bmax*radioligand/(radioligand + Kd)
Association=Eq*(1-exp(-1*Kob*X))
YatTime0 = Eq*(1-exp(-1*Kob*Time0))
Dissociation= YatTime0*exp(-1*Koff*(X-Time0))
Y=IF(X<Time0, Association, Dissociation) + NS
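As an illustration only, the same piecewise model translates directly into Python with NumPy (assumed available; every number below is a placeholder). The IF() in the Prism model corresponds to numpy.where.

import numpy as np

def assoc_then_dissoc(t, HotNM, Time0, Kon, Koff, Bmax, NS=0.0):
    radioligand = HotNM * 1e-9
    kob = radioligand * Kon + Koff
    Kd = Koff / Kon
    eq = Bmax * radioligand / (radioligand + Kd)      # binding at equilibrium
    association = eq * (1 - np.exp(-kob * t))
    y_at_time0 = eq * (1 - np.exp(-kob * Time0))
    dissociation = y_at_time0 * np.exp(-Koff * (t - Time0))
    return np.where(t < Time0, association, dissociation) + NS

t = np.linspace(0, 60, 121)   # minutes
y = assoc_then_dissoc(t, HotNM=1.0, Time0=30.0, Kon=1e8, Koff=0.05, Bmax=2000.0)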
Notes
2.7.8.6 Equation: Kinetics of competitive binding
Introduction
Kinetics experiments can determine the dissociation and association rate
constants (off-rate and on-rate) of an unlabeled compound. Add labeled and
unlabeled ligand together and measure the binding of the labeled ligand over
time. Fit to the appropriate model described below, constraining the rate
constants of the labeled ligand to constant values determined from other
experiments, and fit the rate constants of the unlabeled compound.
Using only a single concentration of labeled and unlabeled ligand, it is very hard to determine the rate constants with reasonable precision. But if you measure the kinetics at two (or more) concentrations of the unlabeled ligand, the results are much more precise.
Step by step
Create an XY data table. Enter time in minutes into X, and specific binding in
cpm into Y. Enter the binding for one concentration of the unlabeled ligand into column A, and another concentration into column B, etc. Enter the
concentrations, in nM, into the column titles.
From the table of specific binding, click Analyze, choose nonlinear regression,
choose the panel of Kinetics Binding equations, and choose Kinetics of
competitive binding.
Constrain k1 and k2 to constant values determined from kinetic binding experiments. k1 is the association rate constant of the hot ligand in M^-1 min^-1 and k2 is its dissociation rate constant in units of min^-1.
Also constrain L to be a constant value equal to the concentration of labeled
ligand in nM.
I is constrained to be a column constant whose value comes from the column
titles.
Model
KA = K1*L*1E-9 + k2
KB = K3*I*1e-9 + K4
S=SQRT((KA-KB)^2+4*K1*K3*L*I*1e-18)
KF = 0.5 * (Ka + KB + S)
KS = 0.5 * (KA + KB - S)
DIFF=KF - KS
Q=Bmax*K1*L*1e-9/DIFF
Y=Q*(k4*DIFF/(KF*KS)+((K4-Kf)/KF)*exp(-KF*X)-((K4-KS)/KS)*exp(-KS*X))
K1 and K2 are the rate constants of the labeled ligand, constrained to constant values you know from other experiments. The Bmax is the maximum binding at equilibrium with a very high concentration of radioligand. It is usually much larger than any binding seen in the experiment.
The three fitted parameters (Bmax, K3 and K4) are shared, so Prism fits one value of each of the three parameters for all the data sets.
Notes
This equation does not account for ligand depletion. It assumes that only a
small fraction of radioligand binds to receptors, so that the free
concentration of radioligand is very close to the added concentration.
This method will only give reliable results if you have plenty of data points at
early time points.
The ratio K4/K3 is the equilibrium dissociation constant of the cold ligand in
Molar. You should compare this value (determined via kinetics) with the
same value determined by equilibrium competition.
Reference
2.7.9
2.7.9.1
What is an enzyme?
Living systems depend on chemical reactions which, on their own, would occur
at extremely slow rates. Enzymes are catalysts that reduce the needed
activation energy so these reactions proceed at rates that are useful to the cell.
The study of enzyme kinetics can help us understand the function and
regulation of enzymes.
In most cases, an enzyme converts one chemical (the substrate) into another
(the product). A graph of product concentration vs. time follows three phases
marked on the graph below.
[Graph: product concentration ([Product]) vs. time, showing the three phases described below]
1.At very early time points (usually less than a second), the rate of product
accumulation increases over time. Special techniques, not available in Prism,
are needed to study the early kinetics of enzyme action. The graph above
exaggerates this first phase.
2.For an extended period of time, the product concentration increases linearly
with time. All the analyses built-in to Prism use data collected during this
second phase.
3.At later times, the substrate is depleted, so the curve starts to level off.
Terminology
The terminology can be confusing. Note these points:
As mentioned above, almost all studies of enzyme "kinetics" are done by
collecting data at a single time point. The X axis is substrate (or inhibitor)
concentration, not time.
The second phase shown in the graph above is often called the "initial
rate", a phrase that makes sense only if you ignore the short transient
phase that precedes it.
That second phase is also called "steady state", because the concentration
of enzyme-substrate complex doesn't change during that phase. However,
the concentration of product accumulates, so the system is not truly at
steady state until, much later, the concentration of product truly doesn't
change over time.
2.7.9.2
Standard analyses of enzyme kinetics (the only kind discussed here) assume:
The production of product is linear with time during the time interval used.
The Vmax equals the concentration of active enzyme sites times the turnover rate, kcat, which is the number of substrate molecules each enzyme site can convert to product per unit time. If you know the concentration of enzyme sites, you can fit the curve to determine kcat and Km 340 . The curve will be identical to the Michaelis-Menten fit.
If the enzyme has cooperative subunits, the graph of enzyme velocity as a
function of substrate concentration will appear sigmoidal. Prism offers one
empirical equation for fitting sigmoidal substrate-velocity curves 342 .
Introduction
The most common kind of enzyme kinetics experiment is to vary the
concentration of substrate and measure enzyme velocity. The goal is to
determine the enzyme's Km (the substrate concentration that yields a half-maximal
velocity) and Vmax (maximum velocity). If your goal is to determine the
turnover number kcat, rather than the Vmax, use an alternative version 340 of
the equation.
Step by step
Create an XY data table. Enter substrate concentration into X, and enzyme
velocity into Y. If you have several experimental conditions, place the first into
column A, the second into column B, etc.
You can also choose Prism's sample data: Enzyme kinetics -- Michaelis-Menten.
After entering data, click Analyze, choose nonlinear regression, choose the
panel of enzyme kinetics equations, and choose Michaelis-Menten enzyme
kinetics.
Model
Y = Vmax*X/(Km + X)
[Graph: enzyme velocity vs. substrate concentration, rising toward Vmax; Km is the substrate concentration giving half-maximal velocity]
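If you want to reproduce this fit outside Prism, a minimal Python/SciPy sketch looks like the following (the substrate concentrations and velocities are placeholders, not real data).

import numpy as np
from scipy.optimize import curve_fit

def michaelis_menten(S, Vmax, Km):
    return Vmax * S / (Km + S)

S = np.array([1, 2, 4, 8, 16, 32, 64], dtype=float)    # substrate (placeholder units)
v = np.array([2.1, 3.9, 6.4, 9.0, 11.2, 12.6, 13.4])   # velocity (placeholder)

params, cov = curve_fit(michaelis_menten, S, v, p0=[max(v), np.median(S)])
Vmax, Km = params
print("Vmax =", Vmax, " Km =", Km)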
[Lineweaver-Burk plot: 1/Velocity vs. 1/[Substrate]; the X intercept equals -1/Km, the Y intercept equals 1/Vmax, and the slope equals Km/Vmax]
If you create a Lineweaver-Burk plot, use it only to display your data. Don't use
the slope and intercept of a linear regression line to determine values for Vmax
and Km. If you do this, you won't get the most accurate values for Vmax and
Km. The problem is that the transformations (reciprocals) distort the
experimental error, so the double-reciprocal plot does not obey the
assumptions of linear regression. Use nonlinear regression to obtain the most
accurate values of Km and Vmax.
To create a Lineweaver-Burk plot with Prism, use the Transform analysis, then
choose the panel of biochemistry and pharmacology transforms.
To create a Lineweaver-Burk line corresponding to the nonlinear regression fit,
follow these steps:
1.Create a new XY data table, with no subcolumns.
2.Into row 1 enter X=-1/KM, Y =0 (previously determined by nonlinear
regression).
3.Into row 2 enter X=1/Smin (Smin is the smallest value of [substrate] you
want to include on the graph) and Y=(1/Vmax)(1.0 + KM/Smin).
4.Note the name of this data table. Perhaps rename it to something
appropriate.
5.Go to the Lineweaver-Burk graph.
6.Drag the new table from the navigator and drop onto the graph.
7.Double-click on one of the new symbols for that data set to bring up the
Format Graph dialog.
8.Choose to plot no symbols, but to connect with a line.
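Steps 2 and 3 just evaluate two points on the best-fit line. As a quick check outside Prism, this tiny Python sketch computes the same two end points from previously fitted values (the numbers are placeholders).

Km, Vmax = 4.2, 14.0          # placeholders: values from the nonlinear regression
Smin = 1.0                    # smallest [substrate] you want on the graph

x1, y1 = -1.0 / Km, 0.0                                  # row 1: the X intercept of the line
x2, y2 = 1.0 / Smin, (1.0 / Vmax) * (1.0 + Km / Smin)    # row 2: 1/velocity at 1/Smin
print((x1, y1), (x2, y2))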
Notes
See the list of assumptions 335 .
This equation fits exactly the same curve as the equation that fits the
turnover number Kcat 340 rather than the Vmax. The product of Kcat times Et (the
concentration of enzyme sites) equals the Vmax, so if you know Et, Prism can fit kcat.
This equation is a special case of the equation for allosteric enzymes 342 .
That allosteric model adds an additional parameter: the Hill slope h. When h
equals 1.0, the two models are identical.
Note that Km is not a binding constant that measures the strength of
binding between the enzyme and substrate. Its value takes into account the
affinity of substrate for enzyme, and also the rate at which the substrate
bound to the enzyme is converted to product.
2.7.10.3 Equation: Determine kcat
Introduction
Kcat is the turnover number -- the number of substrate molecules each enzyme site converts to product per unit time. If you know the concentration of
enzyme sites, you can fit Kcat instead of Vmax when analyzing a substrate vs.
velocity curve.
The model
Y = Et*kcat*X/(Km + X)
[Graph: enzyme velocity vs. [S] in nM; the curve plateaus at Et*Kcat, and Km is the substrate concentration giving half-maximal velocity]
If you know the concentration of enzyme sites you've added to the assay (Et)
then you can fit the catalytic constant Kcat using the model above.
When calculating Kcat, the concentration units cancel out, so Kcat is expressed
in units of inverse time. It is the turnover number -- the number of substrate molecules each enzyme site converts to product per unit time.
Notes
See the list of assumptions 335 .
This equation fits exactly the same curve as the equation that fits Vmax 338 ,
rather than the turnover number Kcat. The product of Kcat times Et (the
concentration of enzyme sites) equals the Vmax.
This equation is related to the equation for allosteric enzymes. That
allosteric model adds an additional parameter: the Hill slope h. When h
equals 1.0, the two models are identical.
Introduction
If the enzyme has cooperative subunits, the graph of enzyme velocity as a function of substrate concentration will appear sigmoidal. Prism offers one empirical equation for fitting sigmoidal substrate-velocity curves.
The model
Y=Vmax*X^h/(Khalf^h + X^h)
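For reference, the same sigmoidal model can be fit outside Prism with a few lines of Python and SciPy (assumed available; the data and starting values are placeholders). Kprime is then computed from the fitted Khalf and h.

import numpy as np
from scipy.optimize import curve_fit

def allosteric_sigmoidal(S, Vmax, Khalf, h):
    return Vmax * S**h / (Khalf**h + S**h)

S = np.array([1, 2, 4, 8, 16, 32, 64], dtype=float)     # substrate (placeholder)
v = np.array([0.4, 1.5, 4.8, 9.5, 12.6, 13.6, 13.9])    # velocity (placeholder)

params, cov = curve_fit(allosteric_sigmoidal, S, v, p0=[14.0, 8.0, 2.0])
Vmax, Khalf, h = params
Kprime = Khalf**h        # units of X raised to the power h
print(Vmax, Khalf, h, Kprime)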
A Hill slope (h) greater than 1.0 indicates positive cooperativity. The variable h does not always equal the number of interacting binding sites, but its value cannot exceed the number of interacting sites. Think of h as an empirical measure of the steepness of the curve and the presence of cooperativity.
Kprime is related to Khalf: it is computed as Khalf^h, so it is expressed in the units of X raised to the power h.
Reference
Equation 5.47, in RA Copeland, Enzymes, 2nd edition, Wiley, 2000. In this
reference, Copeland shows how to fit Kprime. Via personal communication, he
extended the model to also fit the Khalf.
Prism can fit your data to three models of enzyme inhibition, plus a more
general model which includes the first three as special cases:
A competitive 345 inhibitor reversibly binds to the same site as the substrate,
so its inhibition can be entirely overcome by using a very high concentration
of substrate. The maximum velocity of the enzyme doesn't change (if you
give it enough substrate), but it takes more substrate to get to half maximal
activity. The substrate-velocity curve is shifted to the left but not down.
A noncompetitive 347 inhibitor binds with equal affinity to the enzyme and the enzyme-substrate complex. The inhibition is not surmountable by increasing substrate concentration. The substrate-velocity curve is shifted down but neither to the right nor the left.
An uncompetitive 349 inhibitor reversibly binds to the enzyme-substrate
complex, but not to the enzyme itself. This reduces both the effective Vmax
and the effective Km. The substrate-velocity curve is shifted down and to the
left.
The mixed 350 model is a general model that includes competitive, noncompetitive and uncompetitive models as special cases. The model has one more parameter than the others, and this parameter tells you about the mechanism of inhibition.
In some cases, the substrate of an enzyme also inhibits the enzyme by binding
to a second site on the enzyme. Prism offers a model to fit substrate-velocity
curves when the substrate also inhibits the enzyme 352 .
Reference
Introduction
A competitive inhibitor reversibly binds to the same site as the substrate, so its
inhibition can be entirely overcome by using a very high concentration of
substrate. The Vmax doesn't change, and the effective Km increases. You can
determine the Ki of a competitive inhibitor by measuring substrate-velocity
curves in the presence of several concentrations of inhibitor.
Step by step
Create an XY data table. Enter substrate concentration into the X column, and
enzyme activity into the Y columns. Each data set (Y column) represents data
collected in the presence of a different concentration of inhibitor, starting at
zero. Enter these concentrations into the column titles. Be sure to enter
concentrations, not logarithms of concentration.
Alternatively, choose the competitive enzyme inhibition sample data set.
After entering data, click Analyze, choose nonlinear regression, choose the
panel of enzyme kinetics equations, and choose Competitive enzyme inhibition.
The model
KmObs=Km*(1+[I]/Ki)
Y=Vmax*X/(KmObs+X)
The constant I is the concentration of inhibitor, a value you enter into each
column title. This is constrained to equal a data set constant.
The parameters Vmax, Km and Ki are shared, so Prism fits one best-fit value
for the entire set of data.
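To see what that sharing means in practice, here is a rough sketch of the same global fit outside Prism (Python with NumPy and SciPy assumed; the substrate concentrations, inhibitor concentrations and velocities are placeholders). One Vmax, Km and Ki are fit to all data sets at once, with I fixed for each data set the way a column title constant is.

import numpy as np
from scipy.optimize import least_squares

def competitive(S, I, Vmax, Km, Ki):
    KmObs = Km * (1 + I / Ki)
    return Vmax * S / (KmObs + S)

S = np.array([1, 2, 4, 8, 16, 32], dtype=float)    # substrate (placeholder)
data = {                                           # inhibitor conc -> velocities (placeholder)
    0.0:  np.array([2.6, 4.5, 7.0, 9.6, 11.4, 12.5]),
    3.0:  np.array([1.4, 2.6, 4.5, 7.0, 9.6, 11.4]),
    10.0: np.array([0.6, 1.2, 2.3, 4.1, 6.7, 9.3]),
}

def residuals(p):
    Vmax, Km, Ki = p
    return np.concatenate([v - competitive(S, I, Vmax, Km, Ki) for I, v in data.items()])

fit = least_squares(residuals, x0=[13.0, 4.0, 3.0])
print(dict(zip(["Vmax", "Km", "Ki"], fit.x)))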
Introduction
A noncompetitive inhibitor reversibly binds to both the enzyme-substrate complex and the enzyme itself. This means that the effective Vmax decreases with inhibition but the Km does not change. You can determine the Ki of a noncompetitive inhibitor by measuring substrate-velocity curves in the presence of several concentrations of inhibitor.
The term 'noncompetitive' is used inconsistently. It is usually used as defined
above, when the inhibitor binds with identical affinity to the free enzyme and the
enzyme-substrate complex. Sometimes, however, the term 'noncompetitive' is
used more generally, when the two binding affinities differ, which is more often
called mixed-model inhibition 350 .
Step by step
Create an XY data table. Enter substrate concentration into the X column, and
enzyme activity into the Y columns. Each data set (Y column) represents data
collected in the presence of a different concentration of inhibitor, starting at
zero. Enter these concentrations into the column titles. Be sure to enter
concentrations, not logarithms of concentration.
After entering data, click Analyze, choose nonlinear regression, choose the
panel of enzyme kinetics equations, and choose Noncompetitive enzyme
inhibition.
The model
Vmaxinh=Vmax/(1+I/Ki)
Y=Vmaxinh*X/(Km+X)
The constant I is the concentration of inhibitor, a value you enter into each
column title. This is constrained to be a data set constant.
The parameters Vmax, Km and Ki are shared, so Prism fits one best-fit value
for the entire set of data.
Reference
Introduction
An uncompetitive inhibitor binds to the enzyme-substrate complex, but not the
free enzyme. This reduces both the effective Vmax and the effective Km. The
substrate-velocity curve is shifted down and to the left.
You can determine the Ki of an uncompetitive inhibitor by measuring substrate-velocity curves in the presence of several concentrations of inhibitor.
Step by step
Create an XY data table. Enter substrate concentration into the X column, and
enzyme activity into the Y columns. Each data set (Y column) represents data
collected in the presence of a different concentration of inhibitor, starting at
zero. Enter these concentrations into the column titles. Be sure to enter
concentrations, not logarithms of concentration.
After entering data, click Analyze, choose nonlinear regression, choose the
panel of enzyme kinetics equations, and choose Uncompetitive enzyme
inhibition.
The model
VmaxApp=Vmax/(1+I/AlphaKi)
KmApp=Km/(1+I/AlphaKi)
Y=VmaxApp*X/(Kmapp+X)
The constant I is the concentration of inhibitor, a value you enter into each
column title. This is constrained to equal a data set constant.
The parameters Vmax, Km and AlphaKi are shared, so Prism fits one best-fit value for the entire set of data.
Introduction
The mixed model is a general equation that includes competitive 345 ,
uncompetitive 349 and noncompetitive 347 inhibition as special cases. The model
has one more parameter than the others, and this parameter tells you about
the mechanism of inhibition.
Step by step
Create an XY data table. Enter substrate concentration into the X column, and
enzyme activity into the Y columns. Each data set (Y column) represents data
collected in the presence of a different concentration of inhibitor, starting at
zero. Enter these concentrations into the column titles. Be sure to enter
concentrations, not logarithms of concentration.
After entering data, click Analyze, choose nonlinear regression, choose the
panel of enzyme kinetics equations, and choose Mixed model enzyme
inhibition.
Model
VmaxApp=Vmax/(1+I/(Alpha*Ki))
KmApp=Km*(1+I/Ki)/(1+I/(Alpha*Ki))
Y=VmaxApp*X/(KmApp + X)
The parameter I is the concentration of inhibitor, a value you enter into each
column title. This is constrained to equal a data set constant.
The parameters Alpha, Vmax, Km and Ki are shared, so Prism fits one best-fit
value for the entire set of data.
If Alpha is very small (but greater than zero), binding of the inhibitor enhances substrate binding to the enzyme, and the mixed model becomes nearly identical to an uncompetitive 349 model.
Reference
Introduction
At high concentrations, some substrates also inhibit the enzyme activity.
Substrate inhibition occurs with about 20% of all known enzymes. It happens
when two molecules of substrate can bind to the enzyme, and thus block
activity.
Step by step
Create an XY data table. Enter substrate concentration into the X column, and
enzyme activity into the Y columns. If you have several experimental
conditions, place the first into column A, the second into column B, etc.
After entering data, click Analyze, choose nonlinear regression, choose the
panel of enzyme kinetics equations, and choose Substrate inhibition.
Model
Y=Vmax*X/(Km + X*(1+X/Ki))
Parameters
Vmax is the maximum enzyme velocity that would occur if the substrate did not also inhibit enzyme activity, expressed in the same units as Y.
Km is the Michaelis-Menten constant, expressed in the same units as X.
Ki is the dissociation constant for substrate binding in such a way that two
substrates can bind to an enzyme. It is expressed in the same units as X.
Reference
Introduction
This equation accounts for tight binding, so it does not assume that the free
concentration of inhibitor equals the total concentration.
Step by step
Create an XY data table. Enter inhibitor concentration into the X column (usually
in micromolar, but any concentration unit is fine), and enzyme activity into the
Y columns (any units). If you have several experimental conditions, place the
first into column A, the second into column B, etc.
After entering data, click Analyze, choose nonlinear regression, choose the
panel of enzyme kinetics equations, and choose Morrison Ki.
S is the concentration of substrate you chose to use, expressed in the same units as the X values.
Km is the Michaelis-Menten constant, expressed in the same units as X,
determined in an experiment 338 without competitor.
Prism cannot fit any of these parameters from the graph of activity vs inhibitor
concentration. You must know S from your experimental design, determine Km
and Et in other experiments, and constrain all three to constant values.
Model
Q=(Ki*(1+(S/Km)))
Y=Vo*(1-((((Et+X+Q)-(((Et+X+Q)^2)-4*Et*X)^0.5))/(2*Et)))
Interpreting parameters
Vo is the enzyme velocity with no inhibitor, expressed in the same units as Y. This is not the same as Vmax, which would require a maximal concentration of substrate.
Ki is the inhibition constant, expressed in the same units as X.
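If you ever need to evaluate this model outside Prism, it translates directly into a short function such as the sketch below (Python with NumPy assumed; the values of Vo, Ki, Et, S and Km are placeholders you would replace with your own constants and fitted values).

import numpy as np

def morrison(I, Vo, Ki, Et, S, Km):
    # Morrison tight-binding equation, same algebra as the model above
    Q = Ki * (1 + S / Km)
    term = (Et + I + Q) - np.sqrt((Et + I + Q)**2 - 4 * Et * I)
    return Vo * (1 - term / (2 * Et))

I = np.linspace(0, 1.0, 11)     # inhibitor concentration, e.g. micromolar (placeholder)
print(morrison(I, Vo=100.0, Ki=0.01, Et=0.05, S=10.0, Km=5.0))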
Reference
2.7.12 Exponential
What is exponential?
Processes follow exponential models when the rate at which something happens is proportional to the amount that is present.
The rate constant and time constants are simply reciprocals of each other.
Prism always fits the rate constant (k), but computes the time constant (tau)
as well and reports the standard error and confidence interval of the time
constant just as if the model had been written to fit that constant.
The half-life equals ln(2)/k where ln is the abbreviation for natural logarithm.
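These relationships are simple arithmetic, as the Python snippet below shows (the value of k is a placeholder).

import math

k = 0.10                       # rate constant, in inverse time units (placeholder)
tau = 1.0 / k                  # time constant: the reciprocal of the rate constant
half_life = math.log(2) / k    # ln(2)/k
print(tau, half_life)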
The rate of change at any time equals -k times the current value of Y; that is, dY/dX = -k*Y. When you integrate both sides of that equation, you get the equation for exponential decay:
Y=Y0*exp(-k*X)
The function exp() takes the constant e ( 2.718...) to the power contained
inside the parentheses.
2.7.12.3 Equation: One phase decay
Introduction
An exponential decay equation models many chemical and biological processes.
It is used whenever the rate at which something happens is proportional to the
amount which is left. Here are three examples:
When ligands dissociate from receptors, the number of molecules that
dissociate in any short time interval is proportional to the number that were
bound at the beginning of that interval. Equivalently, each individual molecule
of ligand bound to a receptor has a certain probability of dissociating from
the receptor in any small time interval. That probability does not get higher
as the ligand stays on the receptor longer.
When radioactive isotopes decay, the number of atoms that decay in any
short interval is proportional to the number of undecayed atoms that were
present at the beginning of the interval. This means that each individual
atom has a certain probability of decaying in a small time interval, and that
probability is constant. The probability that any particular atom will decay
does not change over time. The total decay of the sample decreases with
time because there are fewer and fewer undecayed atoms.
When drugs are metabolized by the liver or excreted by the kidney, the rate
of metabolism or excretion is often proportional to the concentration of
drug in the blood plasma. Each drug molecule has a certain probability of
being metabolized or secreted in a small time interval. As the drug
concentration goes down, the rate of its metabolism or excretion goes
down as well.
Entering data
Create an XY data table. Enter time into X, and response (binding,
concentration ..) into Y. If you have several experimental conditions, place the
first into column A, the second into column B, etc.
You can also choose a sample data set for exponential decay.
After entering data, click Analyze, choose nonlinear regression, choose the
panel of exponential equations, and choose One phase decay.
Model
Y=(Y0 - Plateau)*exp(-K*X) + Plateau
The derivative of an exponential decay equals -k*Y. So the initial rate equals -k*Y0.
2.7.12.4 Equation: Plateau followed by one phase decay
Introduction
In the standard one-phase decay 357 equation, the decay starts at time 0. This
equation is used when you measure a baseline for a while, then do some
experimental intervention that starts the decay at some time X0.
Entering data
Create an XY data table. Enter time into X, and response (binding,
concentration ..) into Y. If you have several experimental conditions, place the
first into column A, the second into column B, etc.
After entering data, click Analyze, choose nonlinear regression, choose the
panel of exponential equations, and choose Plateau followed by one phase
decay.
Model
Y= IF( X<X0, Y0, Plateau+(Y0-Plateau)*exp(-K*(X-X0)))
X0 is the time at which the decay begins. Often you will set that to a constant
value based on your experimental design, but otherwise Prism can fit it. It is
expressed in the same time units as X.
Y0 is the average Y value up to time X0. It is expressed in the same units as Y.
Plateau is the Y value at infinite times, expressed in the same units as Y.
K is the rate constant, expressed in reciprocal of the X axis time units. If X is in
minutes, then K is expressed in inverse minutes.
Tau is the time constant, expressed in the same units as the X axis. It is
computed as the reciprocal of K.
Half-life is in the time units of the X axis. It is computed as ln(2)/K.
Span is the difference between Y0 and Plateau, expressed in the same units as
your Y values.
Introduction
An exponential decay equation models many chemical and biological processes.
It is used whenever the rate at which something happens is proportional to the amount which is left. A two-phase model is used when the outcome you measure is the result of the sum of a fast and a slow exponential decay.
Entering data
Create an XY data table. Enter time into X, and response (binding,
concentration ..) into Y. If you have several experimental conditions, place the
first into column A, the second into column B, etc.
After entering data, click Analyze, choose nonlinear regression, choose the
panel of exponential equations, and choose Two phase decay.
Model
SpanFast=(Y0-Plateau)*PercentFast*.01
SpanSlow=(Y0-Plateau)*(100-PercentFast)*.01
Y=Plateau + SpanFast*exp(-KFast*X) + SpanSlow*exp(-KSlow*X)
Introduction
An exponential decay equation models many chemical and biological processes.
It is used whenever the rate at which something happens is proportional to the amount which is left. A three-phase model is used when the outcome you measure is the sum of fast, intermediate and slow exponential decays.
Entering data
Create an XY data table. Enter time into X, and response (binding,
concentration ..) into Y. If you have several experimental conditions, place the
first into column A, the second into column B, etc.
After entering data, click Analyze, choose nonlinear regression, choose the
panel of exponential equations, and choose Three phase decay.
Model
YFast=(Y0-Plateau)*PercentFast*.01*exp(-KFast*X)
YSlow=(Y0-Plateau)*PercentSlow*.01*exp(-KSlow*X)
YMedium=(Y0-Plateau)*(100-PercentFast - PercentSlow)*.01*exp(-Kmedium*X)
Y=Plateau + YFast + YMedium +YSlow
Introduction
This equation describes the pseudo-first order association kinetics of the
interaction between a ligand and its receptor, or a substrate and an enzyme.
During each time interval a certain fraction of the unoccupied receptors become occupied. But as time advances, fewer receptors remain unoccupied, so less ligand binds and the curve levels off.
Entering data
Create an XY data table. Enter time into X, and response (binding,
concentration ..) into Y. If you have several experimental conditions, place the
first into column A, the second into column B, etc.
After entering data, click Analyze, choose nonlinear regression, choose the
panel of exponential equations, and choose One phase association.
Model
Y=Y0 + (Plateau-Y0)*(1-exp(-K*x))
Introduction
In the standard one-phase association 364 equation, the increase starts at time
0. This alternative equation is used when you measure a baseline for a while,
then do some experimental intervention that starts the association at some
time X0.
Entering data
Create an XY data table. Enter time into X, and response (binding,
concentration ..) into Y. If you have several experimental conditions, place the
first into column A, the second into column B, etc.
After entering data, click Analyze, choose nonlinear regression, choose the
panel of exponential equations, and choose Plateau followed by one phase
association.
Model
Y= IF( X<X0, Y0,Y0 + (Plateau-Y0)*(1 - exp(-K*(X-X0))))
X0 is the time at which the association begins. Often you will set that to a
constant value based on your experimental design, but otherwise Prism can fit
it. It is expressed in the same time units as X.
Y0 is the average Y value up to time X0. It is expressed in the same units as Y.
Plateau is the Y value at infinite times, expressed in the same units as Y.
Introduction
An exponential decay equation models many chemical and biological processes.
It is used whenever the rate at which something happens is proportional to the
amount which is left.
A two-phase model is used when the outcome you measure is the result of the
sum of a fast and slow exponential decay.
Entering data
Create an XY data table. Enter time into X, and response (binding,
concentration ..) into Y. If you have several experimental conditions, place the
first into column A, the second into column B, etc.
After entering data, click Analyze, choose nonlinear regression, choose the
panel of exponential equations, and choose Two phase association.
Model
SpanFast=(Plateau-Y0)*PercentFast*.01
SpanSlow=(Plateau-Y0)*(100-PercentFast)*.01
Y=Y0+ SpanFast*(1-exp(-KFast*X)) + SpanSlow*(1-exp(-KSlow*X))
Introduction
This equation describes growth with a constant doubling time.
Entering data
Create an XY data table. Enter time into X, and response (cell number ..) into Y.
If you have several experimental conditions, place the first into column A, the
second into column B, etc.
After entering data, click Analyze, choose nonlinear regression, choose the
panel of exponential equations, and choose Exponential growth.
Model
Y=Y0*exp(k*X)
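The doubling time equals ln(2)/k, which you can verify with a couple of lines of Python (Y0 and k below are placeholders).

import math

Y0, k = 100.0, 0.03                      # starting value and rate constant (placeholders)
doubling_time = math.log(2) / k
print(Y0 * math.exp(k * doubling_time))  # equals 2*Y0 after one doubling time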
2.7.13 Lines
Introduction
Linear regression fits a straight line through your data. Nonlinear regression fits
any model, which includes a straight line model. Prism offers separate analyses
for linear regression and nonlinear regression, so you can choose either one to
fit a line.
Prism's nonlinear regression analysis offers more options than its linear
regression analysis 13 , such as the ability to compare two models 157 , apply
weighting 158 , automatically exclude outliers 59 and perform normality tests 185
on the residuals. See a longer discussion of the advantages of using the
nonlinear regression analysis to fit a straight line 82 .
Step by step
Create an XY data table. There is one X column, and many Y columns. If you
have several experimental conditions, place the first into column A, the second
into column B, etc.
After entering data, click Analyze, choose nonlinear regression, choose the
panel equations for lines, and choose Straight line.
Model
Y= YIntercept + Slope*X
Introduction
Prism's linear regression analysis 76 fits a straight line through your data, and
lets you force the line to go through the origin. This is useful when you are sure
that the line must begin at the origin (X=0 and Y=0).
Prism's nonlinear regression offers the equation Line through origin. It offers
more options than its linear regression analysis 13 , such as the ability to
compare two models 157 , apply weighting 158 , automatically exclude outliers 59
and perform normality tests 185 on the residuals. See a longer discussion of the
advantages of using the nonlinear regression analysis to fit a straight line 82 .
Step by step
Create an XY data table. There is one X column, and many Y columns. If you
have several experimental conditions, place the first into column A, the second
into column B, etc.
After entering data, click Analyze, choose nonlinear regression, choose the
panel equations for lines, and choose Line Through Origin
Model
Y= Slope*X
Slope is the slope of the line, expressed in Y units divided by X units. It
estimates the ratio of Y/X in the entire population.
Weighting
In situations where linear regression through the origin is appropriate, it is common for the variation among replicate Y values to increase as X (and Y) increase. Prism provides two weighting choices (in the Weights tab) for this situation. Weight by 1/X^2 when you think the variance in Y is proportional to the square of X, which means the SD among Y values is proportional to X. Weight by 1/X if you think the variance in Y is proportional to X.
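Outside Prism, the same idea corresponds to weighted least squares. The sketch below (Python with NumPy and SciPy assumed; the data are placeholders) fits Y = Slope*X with the standard deviation assumed proportional to X, which is the 1/X^2 weighting described above.

import numpy as np
from scipy.optimize import curve_fit

def through_origin(x, slope):
    return slope * x

x = np.array([1, 2, 4, 8, 16], dtype=float)
y = np.array([2.2, 3.8, 8.5, 15.2, 33.0])        # placeholder data

# SD assumed proportional to X, so the weights are proportional to 1/X^2
params, cov = curve_fit(through_origin, x, y, sigma=x, absolute_sigma=False)
print("slope =", params[0])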
Reference
1. J. G. Eisenhauer, Regression through the Origin. Teaching Statistics 25, 76-80 (2003).
Introduction
Segmental regression fits one line to all data points with X less than some value
X0, and another line to all points with X greater than X0, while ensuring that the
two lines intersect at X0.
Segmental linear regression is helpful when X is time, and you did something at
time=X0 to change the slope of the line. Perhaps you injected a drug, or rapidly
changed the temperature. In these cases, your model really does have two
slopes with a sharp transition point.
In other cases, the true model has the slope gradually changing. The data fit a
curve, not two straight lines. In this situation, fitting the data with segmental
linear regression is not helpful.
Caution
Don't use segmental linear regression to analyze a biphasic Scatchard or
Lineweaver-Burk plot. A biphasic Scatchard plot follows a curve, not two
intersecting lines. There is no abrupt break point. You should fit the original data
to a two-site binding curve instead.
Step by step
Create an XY data table. Enter time into X, and your measurements into Y. If
you have several experimental conditions, place the first into column A, the
second into column B, etc.
After entering data, click Analyze, choose nonlinear regression, choose the
panel equations for lines, and choose Segmental linear regression.
Model
Y1 = intercept1 + slope1*X
YatX0 = slope1*X0 + intercept1
Y2 = YatX0 + slope2*(X - X0)
Y = IF(X<X0, Y1, Y2)
The first line of the equation defines the first line segment from its intercept and slope. The second line computes the Y value of the first segment at X=X0. The third line defines the second segment, which begins at that point and continues with its own slope, slope2. The last line defines Y as the first segment when X is less than X0, and as the second segment otherwise.
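The same logic is easy to express outside Prism. The sketch below (Python with NumPy and SciPy assumed; the data are simulated placeholders) fits the two connected segments, with X0 as one of the fitted parameters.

import numpy as np
from scipy.optimize import curve_fit

def segmental(x, intercept1, slope1, slope2, x0):
    y1 = intercept1 + slope1 * x                 # first segment
    y_at_x0 = intercept1 + slope1 * x0           # the point where the segments meet
    y2 = y_at_x0 + slope2 * (x - x0)             # second segment
    return np.where(x < x0, y1, y2)

x = np.arange(0, 20, dtype=float)                           # e.g. time (placeholder)
y = np.where(x < 10, 2 + 1.0 * x, 12 - 0.5 * (x - 10))      # simulated "data"

params, cov = curve_fit(segmental, x, y, p0=[0.0, 1.0, -1.0, 8.0])
print(dict(zip(["intercept1", "slope1", "slope2", "X0"], params)))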
3.Click Analyze, choose Nonlinear regression (not Linear regression) and then
choose one of the semi-log or log-log equations from the "Lines" section of
equations.
Equations
Semilog line -- X axis is logarithmic, Y axis is linear
Y=Yintercept + Slope*log(X)
[Graphs: the semilog line plotted on semilog axes and on linear axes; the log-log line plotted on log-log axes and on linear axes]
Since both axes are transformed the same way, the graph is linear on both sets
of axes. But when you fit the data, the two fits will not be quite identical.
Parameters
In all three equations, Y intercept is in units of the Y values, and Slope is in units
of the Y values divided by units of the X values.
3.Click Analyze, choose Nonlinear regression (not Linear regression) and then
choose one of the Cumulative Gaussian distribution equations from the
"Lines" section of equations.
Equations
Cumulative Gaussian - Y values are percentages
Top=100
z=(X-Mean)/SD
Y=Top * zdist(z)
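Here zdist() is the cumulative probability of a standard normal distribution. Outside Prism, the equivalent is a normal CDF such as scipy.stats.norm.cdf, as in this sketch (the X values and percentages are placeholders).

import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

def cumulative_gaussian_pct(x, mean, sd):
    return 100.0 * norm.cdf((x - mean) / sd)     # Top = 100 because Y values are percentages

x = np.array([-2, -1, 0, 1, 2, 3, 4], dtype=float)            # placeholder
y_pct = np.array([2, 10, 31, 60, 84, 95, 99], dtype=float)    # placeholder

params, cov = curve_fit(cumulative_gaussian_pct, x, y_pct, p0=[1.0, 1.0])
print("Mean =", params[0], " SD =", params[1])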
[Graphs: the cumulative Gaussian plotted on a probability axis, on a semilog axis, and on linear axes]
Parameters
Mean is the average of the original distribution, from which the frequency
distribution was created.
SD is the standard deviation of the original distribution.
Both of these parameters are expressed in the same units as the X values
plotted on the graph, which is the same as the Y values in the original
distribution from which the frequency distribution was generated.
2.7.14 Polynomial
Included in Prism are both a set of ordinary polynomial equations and a set of centered polynomial equations. For example, when you look in the list of
polynomials you'll see both 'Second order polynomial' and 'Centered second
order polynomial'. We recommend always choosing one of the centered
equations instead of an ordinary polynomial equation. This page explains why.
If the file is opened in a version of Prism prior to 5.02 or 5.0b, that constraint will be lost, and centered polynomial regression won't work.
2.7.14.3 Equations: Polynomial models
Step by step
Create an XY data table. There is one X column, and many Y columns. If you
have several experimental conditions, place the first into column A, the second
into column B, etc.
After entering data, click Analyze, choose nonlinear regression, choose the
panel equations for polynomial equations, and choose one.
The "order" of a polynomial equation tells you how many terms are in the
equation. Prism offers first to sixth order polynomial equations (and you could
enter higher order equations as user-defined equations if you need them).
Higher order models wiggle more than do lower order models. Since the
equation rarely corresponds to a scientific model, use trial and error. If it isn't
close enough to the data, pick a higher order equation. If it wiggles too much,
pick a lower order equation.
Polynomial models
Order     Equation
First     Y = B0 + B1*X
Second    Y = B0 + B1*X + B2*X^2
Third     Y = B0 + B1*X + B2*X^2 + B3*X^3
Fourth    Y = B0 + B1*X + B2*X^2 + B3*X^3 + B4*X^4
Fifth     Y = B0 + B1*X + B2*X^2 + B3*X^3 + B4*X^4 + B5*X^5
Sixth     Y = B0 + B1*X + B2*X^2 + B3*X^3 + B4*X^4 + B5*X^5 + B6*X^6
There is no general way to interpret the coefficients B0, B1, etc. In most cases,
the goal of fitting a polynomial model is to make a curve that looks good, and
the parameters really don't matter.
2.7.15 Gaussian
Introduction
Data follow a Gaussian distribution when scatter is caused by the sum of many
independent and equally weighted factors.
A frequency distribution (histogram) created from Gaussian data will look like a
bell-shaped Gaussian distribution.
Step-by-step
The data you fit must be in the form of a frequency distribution on an XY table.
The X values are the bin center and the Y values are the number of
observations.
If you start with a column of data, and use Prism to create the frequency
distribution, make sure that you set the graph type to "XY graph", with either
points or histogram spikes. This ensures that Prism creates an XY results table
with the bin centers entered as X values. If you pick a bar graph instead, Prism
creates a column results table, creating row labels from the bin centers. This
kind of table cannot be fit by nonlinear regression, as it has no X values.
Starting from the frequency distribution table, click Analyze, choose Nonlinear
regression from the list of XY analyses, and then choose the "Gaussian"
equation from the "Gaussian" family of equations.
Introduction
Data follow a Gaussian distribution when scatter is caused by the sum of many
independent and equally weighted factors.
When scatter is caused by the product of many independent and equally
weighted factors, data follow a log Gaussian distribution. When plotted on a
linear X axis, this is skewed to the right (see below). When plotted on a
logarithmic X axis, it looks like a bell-shaped Gaussian distribution.
Step-by-step
The data must be in the form of a frequency distribution on an XY table. The X
values are the bin center and the Y values are the number of observations.
If you start with a column of data, and use Prism to create the frequency
distribution, make sure that you set the graph type to "XY graph", with either
points or histogram spikes. This ensures that Prism creates an XY results table
with the bin centers entered as X values. If you pick a bar graph instead, Prism
creates a column results table, creating row labels from the bin centers. This
kind of table cannot be fit by nonlinear regression, as it has no X values.
Starting from the frequency distribution table, click Analyze, choose Nonlinear
regression from the list of XY analyses, and then choose the "logGaussian"
equation from the "Gaussian" family of equations.
Model
Y=Amplitude*exp(-0.5*(ln(X/Center)/Width)^2)
Introduction
A frequency distribution plots the number of observations as a function of
value. A cumulative frequency distribution plots the cumulative number of
observations as a function of value. Each Y value is the number of observations
in the original data set that have a value less than or equal to the X value.
The advantage of creating a cumulative distribution is that you don't have to
make any choice regarding bin width.
If your data follow a Gaussian distribution, the cumulative distribution has a
sigmoidal shape.
Step-by-step
1.Create an XY table, and enter your X and Y values. The X values correspond to the values in the original data set, and the Y values are the number (or fraction or percent) of values in the original data set that are less than or equal to the X value.
Alternatively, enter a stack of values onto a Column data table, and run
the frequency distribution analysis choosing to create a cumulative
frequency distribution with no bins.
2.From the cumulative frequency distribution, click Analyze, choose Nonlinear
regression and then choose one of the Cumulative Gaussian distribution
equations from the "Gaussian" group of equations.
3.If your data are entered as counts (rather than percentages or fractions)
constrain N to a constant value equal to the number of observations.
Models
The details of the model depend on whether the Y values are percentages,
fractions or counts.
Here is the model if the data are percentages, so the last Y value equals 100.
Top=100
z=(X-Mean)/SD
Y=Top * zdist(z)
Here is the model if the data are fractions, so the first line of the model defines
Top to equal 1.00.
Top=1.0
z=(X-Mean)/SD
Y=Top * zdist(z)
And finally, here is the model if the data are numbers of observations, so the
largest value equals the number of observations (N). In this case, you should
constrain N to be a constant value equal to the number of observations.
z=(X-Mean)/SD
Y=N * zdist(z)
Mean is the average of the original distribution, from which the frequency
distribution was created.
SD is the standard deviation of the original distribution.
Both of these parameters are expressed in the same units as the X values
plotted on the graph, which is the same as the Y values in the original
distribution from which the frequency distribution was generated.
Introduction
A Lorentzian distribution is bell shaped, but has much wider tails than does a
Gaussian distribution.
Step-by-step
The data must be in the form of a frequency distribution on an XY table. The X
values are the bin center and the Y values are the number of observations.
If you start with a column of data, and use Prism to create the frequency
distribution, make sure that you set the graph type to "XY graph", with either
points or histogram spikes. This ensures that Prism creates an XY results table
with the bin centers entered as X values. If you pick a bar graph instead,
Prism creates a column results table, creating row labels from the bin centers.
This kind of table cannot be fit by nonlinear regression, as it has no X values.
Starting from the frequency distribution table, click Analyze, choose Nonlinear
regression from the list of XY analyses, and then choose the "Lorentzian"
equation from the "Gaussian" family of equations.
Introduction
Sine waves describe many oscillating phenomena.
Step by step
Create an XY data table. There is one X column, and many Y columns. If you
have several experimental conditions, place the first into column A, the second
into column B, etc.
After entering data, click Analyze, choose nonlinear regression, choose the
panel of equations for sine waves, and choose Standard sine wave.
If you know the Y value is zero at time zero, then constrain PhaseShift to a
constant value of zero.
Model
Y= Amplitude*sin((2*pi*X/Wavelength)+PhaseShift)
Introduction
Sine waves describe many oscillating phenomena. Often the peak of each
wave decreases or dampens as time goes on.
Step by step
Create an XY data table. There is one X column, and many Y columns. If you
have several experimental conditions, place the first into column A, the second
into column B, etc.
After entering data, click Analyze, choose nonlinear regression, choose the
panel of equations for sine waves, and choose Damped sine wave.
If you know the Y value is zero at time zero, then constrain PhaseShift to a
constant value of zero.
Model
Y= Amplitude*exp(-K*X)*sin((2*pi*X/Wavelength)+PhaseShift)
Introduction
The sinc() function appears frequently in signal and image processing because it
is the Fourier transform of a rectangular pulse. It is also called the "sampling" or
"sine cardinal" function.
Step by step
Create an XY data table. There is one X column, and many Y columns. If you
have several experimental conditions, place the first into column A, the second
into column B, etc.
After entering data, click Analyze, choose nonlinear regression, choose the
panel of equations for sine waves, and choose Sinc() function.
Model
Y=IF(X=0,Amplitude,Amplitude*sin(2*pi*X/Wavelength)/(2*pi*X/Wavelength) )
This equation is an extension of the one site binding curve 396 . It shows the
binding of a ligand to two receptors with different affinities (different Kd values).
It also describes the enzyme activity as a function of substrate concentration
when two isozymes are present. The curve in the example has Kd values that
differ by a factor of ten, with equal Bmax values. Even with such a large
difference between Kd values, the curve is not obviously biphasic.
This equation extends the previous equation, but allows for a variable slope.
This equation is also called a four-parameter logistic equation.
Bottom is the Y value at the bottom plateau.
Top is the Y value at the top plateau.
LogEC50 is the X value when the response is halfway between Bottom and
Top. With different kinds of variables, this variable is sometimes called ED50
(effective dose, 50%), or IC50 (inhibitory concentration, 50%, used when
the curve goes downhill).
HillSlope describes the steepness of the curve. This variable is called the Hill
slope, the slope factor, or the Hill coefficient. If it is positive, the curve
increases as X increases. If it is negative, the curve decreases as X
increases. A standard sigmoid dose-response curve (previous equation) has
a Hill Slope of 1.0. When HillSlope is less than 1.0, the curve is more
shallow. When HillSlope is greater than 1.0, the curve is steeper. The Hill
slope has no units.
This equation describes the competition of a ligand for two types of receptors.
The radioligand has identical affinities for both receptors, but the competitor has
a different affinity for each.
Y is binding (total or specific) and X is the logarithm of the concentration of the
unlabeled ligand. FRACTION_1 is the fraction of the receptors that have an
affinity described by LogEC50_1. The remainder of the receptors have an
affinity described by LogEC50_2. If LogEC50_1 is smaller than LogEC50_2,
then Fraction_1 is the fraction of high affinity sites. If LogEC50_1 is larger than
LogEC50_2, then Fraction_1 is the fraction of low affinity sites.
This equation describes kinetic processes such as the decay of radioactive isotopes, the elimination of drugs, and the dissociation of a ligand from a receptor.
X is time.
Y may be concentration, binding, or response. Y starts out equal to SPAN
+PLATEAU and decreases to PLATEAU with a rate constant K.
The half-life of the decay is 0.6932/K.
SPAN and PLATEAU are expressed in the same units as the Y axis. K is
expressed in the inverse of the units used by the X axis. In many
circumstances, the plateau equals zero. When fitting data to this equation,
consider fixing the plateau to a constant value of zero.
2.7.17.9 Equation: Two phase exponential decay
This equation describes a two phase exponential decay. Y starts out equal to
Span1+Span2+PLATEAU and decays to PLATEAU with fast and slow
components. The two half-lives are 0.6932/K1 and 0.6932/K2. In the figure,
the two rate constants differ tenfold, but the spans were equal. The curve is not
obviously biphasic, and it takes a very practiced eye to see that the curve does
not follow a single phase model.
2.7.17.10 Equation: One phase exponential association
Note: It is difficult to fit data to this equation with nonlinear regression, because
a tiny change in the initial values will drastically alter the sum-of-squares. You
may need to override the initial values provided by Prism.
2.7.17.13 Equation: Power series
Fitting data to a power series model can be difficult. The initial values generated
automatically by Prism are not very helpful (all four parameters are set to 1.0).
You will probably need to enter better initial values in order to fit this equation
to data. The initial values of B and D are important, because small changes in
those values can make a huge change in Y.
The equation is not defined, and leads to a floating point error, if X equals zero
and B or D are negative numbers or if X is negative and B or D are between 0.0
and 1.0.
2.7.17.14 Equation: Sine wave
X is in radians. In most cases, you will want to fix BASELINE to a constant value
of zero. AMPLITUDE is the maximum height of the curve away from the
baseline. FREQUENCY is the number of complete oscillations per 1 X unit.
2.8
2.8.1
2.8.2
Entering an equation
Prism comes with many built-in equations, but you will often want to enter a
different equation. To do so, click the New button on the top of the Fit tab of
the Nonlinear regression dialog.
A drop down menu lets you choose to enter a new equation, clone an existing
one, or import an equation from a saved .PZF file.
Default constraints
Use the constraints tab to set default constraints. You can constrain a parameter to a
constant value, constrain to a range of values, share among data sets (global fit), or
define a parameter to be a column constant 424 . These constraints will become the
default every time the equation is selected. But each time the equation is
selected, you (or whoever is selecting the equation) can change the constraints
158 for that one fit.
If a parameter has to be set constant, but the actual value is different for each
experiment, set the constraint "Constant equal to" but leave the value blank. If
someone chooses the equation but forgets to constrain that parameter to a
constant value, Prism will prompt for one.
Transforms to report
Define transforms of the best-fit values on the Transforms to report 438 tab. Unlike initial values and constraints, you cannot override these transforms each time you choose the equation. Transforms are defined with the equation definition and cannot be tweaked each time the equation is selected.
2.8.3
2.A drop down menu lets you choose to enter a new equation, clone an existing one, or import an equation from a saved .PZF file.
2.8.4
Renaming an equation
The equation name helps you choose it in the future. It also appears on the
analysis results. You are not stuck with the name you originally gave it. To
rename an equation, select the equation, and then click Edit. Change the name
and click OK.
2.8.5
General syntax
Variable and parameter names must not be longer than 13 characters.
If you want to use two words to name a variable, separate them with the
underscore character, for example Half_Life. Don't use a space, hyphen or
period.
Prism does not distinguish between upper and lower case letters in variable,
parameter or function names.
Use an asterisk (*) to indicate multiplication. Prism does not always
recognize implied multiplication. To multiply A times B, enter A*B and not
AB.
Use a caret (^) to indicate power. For example, "A^B" is A to the B power.
Use parentheses as necessary to show the order of operations. To increase
readability, substitute brackets [like this] or braces {like this}. Prism
interprets parentheses, brackets, and braces identically. Don't make any
assumptions about the order of precedence. Include enough parentheses so
there is no ambiguity about how the equation is evaluated.
Use a single equals sign to assign a value to a variable.
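Putting these rules together, here is a small illustrative user-defined equation (the parameter names Base_Line, A and B are hypothetical):
Y = Base_Line + A*(X^2) + B*X
The asterisks make each multiplication explicit, the caret raises X to the second power, the underscore joins the two words of Base_Line, and the parentheses remove any ambiguity about the order of operations.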
IF-THEN relationships
Prism allows you to introduce some branching logic through use of the IF
function. The syntax is:
IF (conditional expression, value if true, value if false).
You can precede a conditional expression with NOT, and can connect two
conditional expressions with AND or OR. Examples of conditional expressions:
MAX>100
Ymax=Constraint
(A<B or A<C)
NOT(A<B AND A<C)
FRACTION<>1.0
X<=A and X>=B
Prism's syntax is that of most computer languages: "<>" means not equal to,
"<=" means less than or equal to, and ">=" means greater than or equal to.
Here is an example.
Y= If (X<X0, Plateau, Plateau*exp(-K*X))
In this example, if X is less than X0, then Y is set equal to Plateau. Otherwise Y
is computed as Plateau*exp(-K*X). This approach is useful for segmental
regression 423 .
A conditional expression on its own evaluates to 1.0 when true and 0.0 when false, so you can sometimes avoid the IF function entirely. For example:
Y=(X<4)*1 + (X>4)*10
When X is less than 4, this evaluates to 1*1 + 0*10 = 1. When X is greater than 4, it evaluates to 0*1 + 1*10 = 10.
2.8.6
Multiline models
Equations can be written over several lines. Here is an example, the mixed-model
enzyme inhibition model built into Prism:
VmaxApp=Vmax/(1+I/(Alpha*Ki))
KmApp=Km*(1+I/Ki)/(1+I/(Alpha*Ki))
Y=VmaxApp*X/(KmApp + X)
Prism follows the convention of all computer languages. It starts at the top and
goes down.
First it computes the intermediate variable VmaxApp. It knows this is an
intermediate variable, and not a parameter to fit, because it appears to the left
of the equals sign.
Next Prism computes the value of KmApp.
Finally it uses those two values to compute Y.
Math textbooks tend to write equations in the opposite order. A math text
might first define Y as a function of VmaxApp and KmApp, and then lower on
the page define how to calculate VmaxApp and KmApp from Vmax, Km, Alpha
and I. Prism (like all computer languages) requires that you define an
intermediate variable before you use it.
Here is a second example:
Specific=X*Bmax/(X+Kd)
Nonspecific=NS*X
<A>Y=Specific + Nonspecific
<B>Y=Nonspecific
The first line calculates the intermediate variable Specific. The second line
defines the intermediate variable Nonspecific.
The third line is preceded by <A>, so it applies only to data set A. The fourth
line is preceded by <B>, so it applies only to data set B. This allows the model to
fit a table of data where column A is the total binding and column B is the
nonspecific binding. Read more about the syntax used to specify that a
particular line applies only to selected data sets 423 .
You can define constants right in the multiline equation. This makes sense for
defining true constants, whose value will never change. If the constant is
something like a concentration that changes from experiment to experiment, it
is better to not define it in the equation itself, but rather define it in the
Constrain tab 158 . If it first appears on the right side of the equation, Prism will
treat it like a parameter. You can use the Constrain tab to fix that parameter to
a constant value. If the variable name appears first to the left of the equals
sign, it is used only within the equation and won't appear in the constraints tab.
For example, this line defines Pi:
Pi=3.1415926
2.8.7
Model complexity
Prism compiles your equation into an internal format it uses to calculate the
math efficiently. If the compiled version of your equation won't fit in the space
Prism sets aside for this purpose, it reports that the equation is "too complex".
If you see this message, don't give up. You can usually rewrite an equation to
make it less complex. Do this by defining an intermediate variable that stands for
a combination of parameters. For example, if your equation uses the term "K1+K2"
four times, you can reduce complexity (but keep exactly the same mathematical
meaning) by defining an intermediate variable at the top of your equation (say,
K1P2=K1+K2) and then using that intermediate variable later in the equation. That way
Prism has fewer steps to store.
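As a sketch of this idea (hypothetical parameter names), an equation that repeats K1+K2:
Y=(K1+K2)*X/(1+(K1+K2)*X) + Plateau*(K1+K2)
can be rewritten as:
KSum=K1+K2
Y=KSum*X/(1+KSum*X) + Plateau*KSum
Both versions define exactly the same model; the second is simply cheaper for Prism to store and evaluate.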
2.8.8
Notes
Prism does not understand the other common nomenclature for differential equations,
so don't try to define an equation that starts with "dY/dX = ".
Note that X doesn't actually appear in a differential equation such as Y' = -K*Y.
That's OK. It is there in spirit, since Y' is the derivative of Y with respect to X.
When you look at that equation, there appears to be only one parameter, K.
In fact, the equation has two parameters. Prism generates a parameter Y[0],
which is the value of Y at X=0.
When you go to add constraints and initial values, Y[0] appears just like the
other parameter K.
It is not possible to set this parameter to equal Y at some other X value
(other than 0.0). Let us know if this limitation matters to you.
Fitting a differential equation requires more calculations, so it takes noticeably
more time than fitting the usual kind of equation.
It is only possible to define Y'. It is not possible to use differential equations to
define intermediate variables. This would be useful for fitting compartmental
models, but Prism cannot (yet) fit this kind of model.
2.8.9
That's it. You'll need to define constraints and initial values as with any user-defined equation.
Notes
In this example you don't see Y on both sides of the equals sign in the same
equation line. But the first line puts Y on the right side of the equals sign, and the
fourth line puts it on the left side. That makes the equation implicit.
If you subtract Y from X, as in the example here, then X and Y must both be entered in the
same units. Here both are entered as radioactivity counts per minute (cpm). Of course, it
would make no sense to subtract Y from X if Y were in cpm and X were in nM.
Prism finds it "harder" to fit implicit equations than ordinary ones. You may have to fuss with
initial values and constraints to get it to work. The calculations take much longer, although
this may not be noticeable with small data sets and fast computers.
While Y appears on both sides of the equals sign in an implicit equation, X must appear only
on the right side of the equals sign.
In this particular example, the explicit equation has been derived and is even built into
Prism. The file you can download (link below) fits the data both ways (explicit equation and
implicit equation) and the results are identical. In other cases, it may be difficult or
impossible to derive an explicit equation.
Example Prism file.
Function | Explanation | Excel equivalent
abs(k) | | abs(k)
arccos(k) | | acos(k)
arccosh(k) | | acosh(k)
arcsin(k) | | asin(k)
arcsinh(k) | | asinh(k)
arctan(k) | | atan(k)
arctanh(k) | | atanh(k)
arctan2(x,y) | | atan2(x,y)
besselj(n,x) | | besselj(x,n)
bessely(n,x) | | bessely(x,n)
besseli(n,x) | | besseli(x,n)
besselk(n,x) | | besselk(x,n)
beta(j,k) | Beta function. | exp(gammaln(j)+gammaln(k)-gammaln(j+k))
binomial(k,n,p) | | 1-binomdist(k,n,p,true)+binomdist(k,n,p,false)
chidist(x2,v) | | chidist(x2,v)
ceil(k) | | (no equivalent)
cos(k) | Cosine. K is in radians. | cos(k)
cosh(k) | | cosh(k)
deg(k) | | degrees(k)
erf(k) | Error function. | 2*normsdist(k*sqrt(2))-1
erfc(k) | | 2-2*normsdist(k*sqrt(2))
exp(k) | | exp(k)
floor(k) | | (no equivalent)
fdist(f,v1,v2) | | fdist(f,v1,v2)
gamma(k) | Gamma function. | exp(gammaln(k))
gammaln(k) | | gammaln(k)
hypgeometricm(a,b,x) | Hypergeometric M. | (no equivalent)
hypgeometricu(a,b,x) | Hypergeometric U. | (no equivalent)
hypgeometricf(a,b,c,x) | Hypergeometric F. | (no equivalent)
ibeta(j,k,m) | Incomplete beta. | (no equivalent)
if(condition, j, k) | | (similar in excel)
igamma(j,k) | Incomplete gamma. | (no equivalent)
igammac(j,k) | | (no equivalent)
int(k) | Truncate fraction. INT(3.5)=3. INT(-2.3)=-2. | trunc()
ln(k) | Natural logarithm. | ln(k)
log(k) | | log10(k)
max(j,k) | | max(j,k)
min(j,k) | | min(j,k)
j mod k | |
psi(k) | | (no equivalent)
rad(k) | | radians(k)
sgn(k) | Sign of k. If k>0, sgn(k)=1. If k<0, sgn(k)=-1. If k=0, sgn(k)=0. | sign(k)
sin(k) | Sine. K is in radians. | sin(k)
sinh(k) | | sinh(k)
sqr(k) | Square. | k*k
sqrt(k) | Square root. | sqrt(k)
tan(k) | Tangent. K is in radians. | tan(k)
tanh(k) | | tanh(k)
tdist(t,v) | | tdist(t,v,1)
zdist(z) | | normsdist(z)
In this example, you collected data that established a baseline early in the
experiment, up to "Start". You then added a drug, and followed the outcome
(Y) as it increased towards a plateau. Prior to the injection, the data followed a
horizontal line; after the injection the data formed an exponential association
curve.
Y1=BASELINE
Y2=BASELINE + SPAN*(1-exp(-K*(X-START)))
Y=IF(X<START, Y1, Y2)
It is easiest to understand this equation by reading the bottom line first. For X
values less than START, Y equals Y1, which is the baseline. Otherwise, Y equals
Y2, which is defined by the exponential association equation.
This equation has two intermediate variables (Y1 and Y2). Prism fits the four
parameters: START, SPAN, K, and BASELINE.
In many cases, you will make START a constant equal to the time of the
experimental intervention. If you want Prism to fit START, choose an initial value
carefully.
This kind of model is most appropriate when X is time, and something happens
at a particular time point to change the model. In the example above, a drug
was injected at time=Start.
The Range tab of the nonlinear regression dialog lets you define an X range that
determines which points are fit and which are ignored.
Prefix | Which data sets the line applies to
<C> | Data set C only
<~B> | All data sets except B
<A:D> | Data sets A through D
<~A:D> | All data sets except A through D
<A:J,3> | Data sets A, D, G, and J (every third data set between A and J)
<~A:J,3> | All data sets except A, D, G, and J
Here is an example. It fits column A to a model that defines total binding and
column B to a model that defines nonspecific binding only. The first two lines
of the equation are evaluated for all data sets, the third line is only evaluated
for data set A, while the last line is only evaluated for data set B. To fit this
model, you would want to set the constraint that the parameter NS is shared
between data sets.
Specific=X*Bmax/(X+Kd)
Nonspecific=NS*X
<A>Y=Specific + Nonspecific
<B>Y=Nonspecific
<A>Y=1/(1+Ka*X^h)
<B>Y=(Ka*X^h)/(1+Ka*X^h)
Now the second line is for the second data set, which in this example is data set C
(since only A and C were selected in the Analyze dialog).
Note that Prism reads only the number in the column title. In this example, the
units are specified as micromolar, but Prism ignores this and simply reads the
numbers.
The first line defines an intermediate variable (KmObs, the observed Michaelis-Menten constant in the presence of a competitive inhibitor), which is a function
of the Michaelis-Menten constant of the enzyme (Km), the concentration of
inhibitor (I), and the competitive inhibition constant (Ki).
The second line computes enzyme velocity (Y) as a function of substrate
concentration (X) and KmObs.
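Written out, a model matching that description takes this form (a sketch using the standard competitive-inhibition parameterization, not a quotation of the guide's own listing):
KmObs=Km*(1+I/Ki)
Y=Vmax*X/(KmObs+X)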
This model is defined with I constrained to being a data set constant, which
means its value comes from the column titles. In this example, therefore, I=0
when fitting column A, I=5 when fitting column B, etc. The 'µM' in the title is
ignored by Prism -- it doesn't do any unit conversions.
The other three parameters (Km, Ki and Vmax) are defined to be shared, so
Prism fits one best-fit value that applies to the entire family of datasets.
Prism determined the maximum velocity of the enzyme with no inhibitor (Vmax
in the same units as the Y values you entered), the Michaelis-Menten constant
of the enzyme with no inhibitor (Km, in the units used for X values) and the
competitive inhibition constant (Ki, in units used for the column constants).
Note that I is not a parameter to be fit, but rather takes on constant values
you entered into the column titles. KmObs is not a parameter to be fit, but is
rather an intermediate variable used to define the model.
Learn more about competitive enzyme inhibition 345 .
The graph above shows the sample data for competitive enzyme kinetics, and
shows how to fit a family of curves. Here, each curve shows enzyme activity
as a function of substrate concentration. The curves differ by the presence of
various amounts of an inhibitor. There are, essentially, two independent
variables: substrate concentration, and inhibitor concentration.
The substrate concentration is entered into the X column and the inhibitor
concentration into the column titles.
When fitting the data, the Constrain tab of the nonlinear regression dialog is
used to define the parameter I as dataset constant whose value comes from
the column titles (so is not fit by regression), and to share the values of all the
other parameters so there is one global fit of all the data.
8. Continue with the rest of the data. You'll be entering data diagonally down the
table. Since Prism can only have 104 columns, you will be limited to 104 data
points, each with a Y value and two X values.
9. When you are done, you'll have the same number of rows as Y columns.
Let's imagine that the third variable has the name Z. Add syntax like
this to your equation:
this to your equation:
<A> Z=2.34
<B> Z=45.34
<C>Z= -23.4
<D>Z=12.45
That tells Prism to assign Z one value for column A, a different value for column
B, etc. This approach gives you three independent variables, with some
constraints:
The X column is one independent variable, with one value for each row.
The column titles form a second independent variable with one value per data
set column.
The Z values defined within the equation form a third independent variable, also
with one value per data set.
You want to fit the sigmoidal enzyme kinetics data to a standard model. But
there are two forms of that model that are commonly used:
Y=Vmax*X^h/(Khalf^h + X^h)
Y=Vmax*X^h/(Kprime + X^h)
The two are equivalent, with Kprime equal to Khalf^h, so the two fits will
generate exactly the same curve, with the same sum-of-squares, the same R2,
and the same number of degrees of freedom. Even though the two equations
express the same model, they are written differently. The fancy term is that
they are parameterized differently.
They both fit Vmax (the maximum activity extrapolated to very high
concentrations of substrate) and h (Hill slope, describing the steepness of the
curve). But one model fits Khalf (the concentration needed to obtain a velocity
half of maximal) and the other fits Kprime (a more abstract measure of
substrate action).
Fraction of simulations in which the confidence interval missed the true value:
Interval | Ideal | Kprime | Khalf
"95% CI" | 5.0% | 8.8% | 5.1%
"99% CI" | 1.0% | 4.8% | 1.0%
These results show that Khalf is well behaved, as expected given its symmetry
(see above). The 95% confidence interval is expected to miss the true value in
5.0% of the simulations. In fact, it happened 5.1% of the time. Similarly, the
99% CI is expected to miss the true value in 1.0% of the simulations, which is
exactly what happened. In contrast, Kprime is less well behaved. The intervals
computed to be 95% confidence intervals were not wide enough, so they missed the
true value in 8.8% of the simulations. The 99% intervals were similarly too narrow
and missed the true value in 4.8% of the simulations. Thus the intervals computed
to be 99% confidence intervals actually performed like 95% intervals.
These simulations show the advantage of choosing the equation that fits Khalf,
rather than the one that fits Kprime. Khalf has a symmetrical distribution so the
confidence intervals computed from these fits can be interpreted at face value.
In contrast, Kprime has an asymmetrical distribution and its confidence intervals
cannot be interpreted at face value.
Hougaard's skewness
The results above were obtained by running numerous simulations. There is an
easier way to figure out how symmetrical a parameter is. Hougaard's
skewness 204 (new in Prism 6) quantifies the asymmetry of each parameter,
computed from the equation, the number of data points, the spacing of the X
values, and the values of the parameters.
For the simulated data set, Hougaard's skewness is 0.09 for Khalf and 1.83 for
Kprime. A rule of thumb is to expect problems from asymmetry when the
absolute value of the Hougaard's skewness is greater than 0.25, and big
problems when the value is greater than 1.0. So Hougaard's skewness tells you
that the confidence intervals will be accurate when you fit Khalf, but not be so
accurate when you fit Kprime.
Note that Hougaard's skewness can be reported as part of the results of
nonlinear regression (choose in the Diagnostics tab). No simulations are
required.
Bottom line
Models can often be parameterized in multiple ways. You'll get the same curve
either way, but choosing an optimum parameterization ensures that the
confidence intervals for the parameters are believable. The best way to assess
various parameterizations is to ask Prism to report the value of Hougaard's 204
measure of skewness for every parameter. Simulations take a bit more work,
but let you see how symmetrical a parameter is.
File used for this example
SIGN(YatXmax - YatXmin)
It equals +1 if the curve generally goes up as it goes from left to right: /
It equals -1 if the curve generally goes down as it goes from left to right: \
It is used as the initial value for the Hill Slope in dose response curves.
Here is an explanation of the math. The SIGN() function equals +1 when given
a positive number, and -1 when given a negative number. YatXmax is the Y
value at the largest X value. YatXmin is the Y value at the smallest X value.
(YatXmax - YatXmin) will be positive when the curve goes up, and negative
when the curve goes down.
Example
Y = Vmax*X/(Km + X)
Parameter | Meaning
Vmax | Velocity
Km | Substrate
What is a constraint?
Constraining parameters is often essential to getting useful results. Constraints
can be used to fix parameters to constant 158 values, to share parameters
among data sets (global 40 fitting), and to define one parameter to be a column
constant 424 (whose value comes from the column titles in the data table).
Suppose you want to constrain both rate constants of a two-phase model to be
positive, and also define one rate constant to be larger than the other (Kfast > Kslow).
Prism won't let you enter all of those constraints directly. What you have to do is
define one constraint that Kslow is greater than zero, and another that Kfast is
greater than Kslow. Don't also add the constraint that Kfast is greater than zero.
That is implied, because Kfast is larger than Kslow, which is already constrained to
be positive.
Interpolating transforms
How to interpolate points off the curve
You can also use these 'transforms' to report values from the curve. The
interpolated value and its confidence interval will appear in the results, the same
as other transformed parameters.
Use this syntax:
Y[value]: The Y value of the curve when X is the value you enter within the brackets. You must enter a number within the brackets, not a mathematical expression. The Y value will be computed for any X, but confidence intervals will be calculated only when the X value is within the range of the X axis.
X[value]: The X value of the curve when Y is the value you enter within the brackets. You must enter a number within the brackets, not a mathematical expression. Prism searches for the Y value you entered within the range the curve is plotted (Range tab), extending in each direction a distance equal to half that range. It reports the smallest X value it finds within that range that corresponds to the Y value you entered, and doesn't alert you when the curve oscillates so there are several X values at a particular Y value. If both X and Y are within the axis range, a confidence interval is also calculated.
Example: You fit data to a log(dose) response curve and want to report the
antilog of the X value (dose) when Y=50 (which is not always the same as the
EC50). You would enter 'Dose at Y=50' on the left, and '10^X[50]' on the right.
Confidence intervals
The confidence interval for interpolating transforms is computed by interpolation
off the confidence bands of the regression curve. You don't have a choice of
symmetrical vs. asymmetrical intervals.
2.9
Plotting a function
Graphing a family of theoretical curves, often called plotting a
function, is useful in three contexts:
To understand the properties of a function. After looking at the
graph, go back and change the parameters to see how the graph
changes. Or plot a family of curves, where one parameter varies
from curve to curve.
To create graphs for teaching theory.
To understand what initial values would make sense when fitting a
model to data with nonlinear regression.
2.9.2
In each case, the simulation generates two (or three) data sets. The first (A)
data set plots the entire curve. The equation is written so the second curve
(data set B) only plots values where X is less than a specified cutoff value, and
the third curve (data set B) only plots values when X is greater than the cutoff
value. The second and third data set are plotted with area fill to shade the tails
of the distributions. Remove data set B or C from the graph if you only want to
shade one tail. For example, here is the equation used for the first graph (z
distribution):
G=exp(-0.5*X^2)/sqrt(2*3.1415926)
<A>Y=G
<B>Y=IF(X<-z, G, 0)
<C>Y=IF(X>z, G, 0)
Change the numbers of degrees of freedom and the cutoff values (for shading)
in the Info sheet. This demonstrates how values entered into an info sheet can
be 'hooked' to constants used in analyses.
2.9.3
Mathematical details
Binomial distribution
The equation for the probability of exactly X successes in N trials, when each
trial has probability P of success is:
R=INT(X+0.5)
ExactProb=(P^R)*(1-P)^(N-R) ;probability of one particular sequence of R successes in N trials
NRearrangments=exp(gammaln(N+1) - gammaln(R+1) - gammaln(N - R +1))
;gamma(J) = (J-1) factorial, or (J-1)!, but factorial is not a function within Prism
;NRearrangments = N!/(R!(N-R)!)
H=ExactProb * NRearrangments
<A>Y=H
<B>Y=IF(X>cutoff, H, 0)
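In standard notation (written here with the document's caret convention) this is the binomial probability: Prob(exactly R successes) = [N!/(R!*(N-R)!)] * P^R * (1-P)^(N-R). The gammaln calls simply compute the factorial ratio N!/(R!(N-R)!) in a way that avoids overflow for large N.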
Poisson distribution
The equation for the Poisson distribution is:
Y=exp(-1*Lamda)*Lamda^X/gamma(X+1)
Note the definition of the gamma function:
gamma(i) = factorial(i-1)
gamma(x+1)= factorial(x) = X!
2.10
Don't select a lowess curve unless you have well over twenty data points.
Prism generates the curve as a series of line segments. Enter the number of
segments you want, and check the option box if you need to see the XY
coordinates of each point, or if you want to use the resulting lowess, point-to-point,
or spline curve as a standard curve. Prism always generates a lowess curve
with at least four times more line segments than the number of data points,
and you cannot reduce the number of segments below this value.
1. John Chambers et al., Graphical Methods for Data Analysis, Wadsworth and Brooks, 1983.
Index

-4-
4PL  256, 262

-5-
5PL  267

-A-
Absolute IC50  248
Adjusted R2  216
Advice: How to understand a model  12
AIC  48, 51
AIC. Interpreting.  215
Akaike information criterion  48, 51
Allosteric EC50 shift equation  281
Allosteric modulator defined  311
Allosteric modulator of radioligand binding, equation  322
Ambiguous fits defined  197
Ambiguous nonlinear regression  228
Analysis checklist: Deming regression  103
Analysis checklist: Linear regression  91
Analysis checklist: Nonlinear regression, "Ambiguous"  228
Analysis checklist: Nonlinear regression, "Bad initial values"  226
Analysis checklist: Nonlinear regression, "Don't fit"  232
Analysis checklist: Nonlinear regression, "Hit constraint"  231
Analysis checklist: Nonlinear regression, "Impossible weights"  233
Analysis checklist: Nonlinear regression, "Interrupted"  226
Analysis checklist: Nonlinear regression, "Not converged"  227
Analysis checklist: Nonlinear regression, "Perfect fit"  233
Analysis checklist: Nonlinear regression, "Too few points"  232
Analysis checklist: Nonlinear regression, comparing models  221

-B-
Backfit  104
Bell-shaped dose-response equation  271
Biphasic dose-response equation  269
Boltzmann sigmoid curve  400
Broken line model  374

-E-
EC50 defined  236
EC90  283
ECanything  283
Enzyme inhibition, general  337
Enzyme kinetics, assumptions  336
Enzyme progress curve  335
Equation, cloning  409
Equilibrium binding  9
Exponential association equation, one phase  364
Exponential association equation, two phases  367
Exponential decay  9

-H-
Hill slope  240
Hill slope, in saturation binding  300
Hit constraint. Nonlinear regression.  231
Homologous binding defined  311
Hougaard's measure of skewness  204
How the F test works  50
How to: Linear regression interpolation  76
How to: Linear regression lines  76
How to: Nonlinear regression  126

-I-
IC50  261, 262, 264, 265
IC50 defined  236
IC50, absolute  248
IC50, relative  248, 251
IC90  283
ICanything  283
IF() statement in segmental regression  421
Implicit equations  414, 416
Impossible weights. Nonlinear regression.  233
Information theory  51
Inhibition of enzymes by substrate  352
Initial rate, in enzyme kinetics  335
Initial values tab  162
Initial values, rules for  433
Intercept of linear regression  84
Interpolating from a standard curve  104
Interrupted nonlinear regression  226

-K-
Kcat  340
Kd, meaning of  285
Kinetics of competitive binding, equation  333
Kinetics of radioligand binding  325
Kolmogorov-Smirnov test of residuals  185

-M-
Marquardt method  62
Mass action, law of  285
Mathematical model  9
Michaelis-Menten model  338
Mixed-model inhibition of enzyme  350
Model, defined  9
Models have the same DF  213
Morrison equation of enzyme inhibition  353
Multiple regression  21

-N-
Nested models  213
Noncompetitive inhibition of enzyme  347
Nonlinear regression, how it works  62
Nonlinear regression, unequal weighting  30
Nonlinear regression, when to use for fitting a line  82
Nonlinear vs. linear regression  17
Nonspecific binding  288
Normality test, residuals of nonlinear regression  185
Normality tests  165
Normalized data and weighted nonlinear regression  35
Normalized dose-response models  243
Normalizing data, pros and cons  243

-O-
Occupancy of receptor  285
Off rate of radioligand binding  325
On rates of radioligand binding  325
One phase exponential association equation  364
One phase exponential decay equation  357
One site -- Fit total and nonspecific binding, equation  293
One site -- Specific binding equation  297, 300
One site -- Total binding equation  292
One site -- Total, accounting for ligand depletion equation  295
One site competition binding with ligand depletion  318
One site competition equation  312, 314
One site competition homologous competition  321
One site with allosteric modulator equation  309
Operational model. Partial agonists experiments  275
Operational model. Receptor depletion experiments  272
Optical density  9
Origin, forcing a line to go through  372
Orthogonal regression  100
Outlier removal, nonlinear regression  53, 57
Outlier removal, when to avoid  54
Outliers table  207
Outliers, graphing  176
Outliers, what to do when you find one  207
Output tab, nonlinear regression  164