Diffusion Models With Time-Dependent Parameters
Article info
Article history:
Received 25 August 2022
Received in revised form 21 February 2023
Accepted 26 February 2023
Available online 27 March 2023
Dataset link: https://doi.org/10.5281/zenodo.6970739

Abstract
Drift-diffusion models have become valuable tools in many fields of contemporary psychology and the neurosciences. The present study compares and analyzes different methods (i.e., stochastic differential equation, integral method, Kolmogorov equations, and matrix method) to derive the first-passage time distribution predicted by these models. First, these methods are compared in their accuracy and efficiency. In particular, we address non-standard problems, for example, models with time-dependent drift rates or time-dependent thresholds. Second, a mathematical analysis and a classification of these methods are provided. Finally, we discuss their strengths and caveats.
© 2023 Elsevier Inc. All rights reserved.
Keywords:
Drift-diffusion model
Cognitive modeling
Numerical approximation
Parameter estimation
∗ Corresponding author.
E-mail addresses: [email protected] (T. Richter), [email protected] (R. Ulrich), [email protected] (M. Janczyk).
https://doi.org/10.1016/j.jmp.2023.102756
0022-2496/© 2023 Elsevier Inc. All rights reserved.
T. Richter, R. Ulrich and M. Janczyk Journal of Mathematical Psychology 114 (2023) 102756

1. Introduction

Mathematical models describing the time course of decision-making have become an increasingly valuable tool in psychology and neuroscience. One of the most successful variants of such models is the drift-diffusion model (DDM; Ratcliff, 1978), a class of mathematical models for binary decisions (for reviews, see Forstmann, Ratcliff, & Wagenmakers, 2016; Ratcliff, Smith, Brown, & McKoon, 2016; Schwarz, 2022). This section begins with a description of the standard DDM and then introduces two cases where the parameters of the DDM are time-dependent, that is, where they vary with the progression of time in an experimental trial. Based on this, we then lay out the purpose of the present paper in more detail and sketch the outline of its remainder.

1.1. The standard DDM

The general idea behind the DDM is that the cognitive system accumulates noisy evidence for one or the other of two response options over time (Fig. 1). With two response options, a selection occurs once the evidence exceeds one of the respective thresholds, denoted here as b and −b for response options 1 and 2, respectively. Evidence accumulation starts at zero without a bias toward either response option and is driven toward one threshold with a constant increase per time step, referred to as the drift rate µ. The accumulation process is noisy and is usually modeled as a Wiener process. Because of the noise, the diffusion process {X(t), t > 0} exceeds the correct threshold at a random point in time, and it can also result in an erroneous decision when it exceeds the incorrect threshold. While these parameters cover the decision part of the DDM, perceptual and motor processes are captured by the residual (or non-decision) time, which is added to the decision time. The full model comprises several additional parameters. In particular, the starting point, the residual time, and the drift rate can vary from trial to trial, and the starting point can be biased toward one or the other threshold (for overviews of the full model, see Ratcliff, 1978; Voss, Nagler, & Lerche, 2013; Voss, Voss, & Lerche, 2015). The thresholds (b, −b) and the drift rate µ are assumed to be constant within a trial (although, as mentioned, they can vary across trials). In other words, both parameters are time-independent (or stationary), because their values do not depend on the progression of time within a single trial.

Fig. 1. (a) Illustration of the standard drift-diffusion model (DDM): The straight black line is the expected value E[X(t)] based on the drift rate alone, and the jagged black line is an example of a corresponding diffusion process with added Brownian motion. (b) Illustration of time-dependent drift rates according to the Diffusion Model for Conflict (DMC) tasks (Ulrich, Schröter, Leuthold, & Birngruber, 2015): The straight black line is the expected value based on the controlled drift rate alone, and the dashed green and red lines are the expected values of the automatic activation modeled by a Gamma function for congruent and incongruent trials, respectively. The solid green and red lines are the expected values based on the superimposition of controlled and automatic processes, and the jagged green and red lines are examples of the corresponding diffusion processes with added Brownian motion. (c) Illustration of time-dependent thresholds: The straight black line is the expected value based on the drift rate alone, and the jagged black line is an example of the corresponding diffusion process with added Brownian motion. Thresholds are modeled with a hyperbolic ratio function.

1.2. Time-dependent parameters

While this stationary assumption simplifies mathematical tractability and is sufficient to model many psychological phenomena, further psychological phenomena motivate variants of the DDM in which these parameters depend on time. For example, Heath (1992) considered that the drift rate may diminish with increasing processing time. Moreover, Heath also showed how a time-dependent drift rate could arise within the cascade model by McClelland (1979). In the following, however, we describe more recent modeling work that employed time-dependent changes of the drift rate and time-dependent thresholds.

Time-dependent drift rate. One example of a time-dependent drift rate µ(t) is the Diffusion Model for Conflict (DMC) tasks (Ulrich et al., 2015). This model was developed specifically to account for positive and negative delta functions in typical conflict tasks. Consider, for example, an Eriksen flanker task (Eriksen & Eriksen, 1974). In this task, a centrally presented imperative stimulus requires a left or right manual key press, while it is surrounded by flankers that demand either the same response (in congruent trials) or the other response (in incongruent trials). Response times (RTs) are shorter, and responses are less error-prone, in congruent than in incongruent trials, which is typically referred to as the congruency effect. This congruency effect usually becomes larger with longer RTs, that is, a positive delta function is observed (see de Jong, Liang, & Lauber, 1994; Pratte, Rouder, Morey, & Feng, 2010; Schwarz & Miller, 2012).

The Simon task (Simon, 1969) is another conflict task, in which participants respond to a stimulus (e.g., the identity of a letter or a color) that is presented either in a left or a right location. Here, participants respond faster when stimulus location and (correct) response location match (in congruent trials) than when they mismatch (in incongruent trials). In this case, however, the congruency effect becomes smaller with longer RTs, that is, a negative delta function is observed (e.g., Pratte et al., 2010).

Motivated by dual-process accounts of congruency effects, the DMC explains different delta functions by assuming that two diffusion processes are superimposed: (1) a linear drift function representing controlled response selection and (2) a pulse function toward the same response in congruent trials and toward the alternative response in incongruent trials (see Fig. 1b for an illustration). The latter process represents the automatic activation induced by the task-irrelevant stimulus features in a conflict task (e.g., the flankers or the location of the stimuli).

The superimposed process tends to hit the upper threshold earlier when the automatic activation turns toward the positive direction (in congruent trials) than when the activation points in the negative direction (in incongruent trials). This reflects the congruency effect (see Fig. 1b). In technical terms, the superimposition results in a time-dependent drift rate, because the drift rate changes with increasing time during a trial. Importantly, different delta functions can also result: The superimposition of the two diffusion processes results in a negative delta function when the automatic activation reaches its maximum rather early, but in a positive delta function when the maximum is reached later.

Another example of time-dependent drift rates is the Shrinking Spotlight Model (White, Ratcliff, & Starns, 2011), which addresses conflict processing in the flanker task. This model also accounts for the observation that responses with short RTs are especially error-prone in incongruent trials. Such a result suggests that the flankers exert their influence especially during the early phases of processing. The model accounts for this by assuming that attention is distributed broadly at the onset of a trial and shrinks over time, thereby decreasing the influence of the flankers as time progresses. For a congruent trial, the resulting drift rate is time-independent. In contrast, for incongruent trials, the resulting drift rate is time-dependent and increases with time, as more and more information is processed and the influence of the flankers becomes smaller.

Time-dependent thresholds. The second parameter that may change with time is the threshold b (Fig. 1c). Arguably, in most applications of the DDM, thresholds are assumed to be constant within a trial. However, models in which thresholds vary with time (most often, they are assumed to collapse with increasing time) have been suggested to account for some phenomena (e.g., Churchland, Kiani, & Shadlen, 2008; Ditterich, 2006a; Evans, Hawkins, & Brown, 2020; Hanks, Mazurek, Kiani, Hopp, & Shadlen, 2011). Collapsing thresholds were even suggested to account for slow errors, thereby rendering the between-trial variability of the drift rate unnecessary (e.g., Ditterich, 2006b; Palmer, Huk, & Shadlen, 2005).

Further, a recent study suggested that different experimental methods to induce speed-accuracy tradeoffs affect different parameters of a DDM. More precisely, Katsimpokis, Hawkins, and van Maanen (2020) suggested that instructions affect the initial separation of thresholds, while response deadlines affect the rate of their collapse over time. A very similar idea, though for an
accumulator model, was suggested to account for results obtained with free-choice tasks. In these tasks, the stimulus does not request one particular response; instead, the actor chooses among a set of response options (e.g., Berlyne, 1957; Janczyk, Naefgen, & Kunde, 2020). Based on results from priming experiments, Mattler and Palmer (2012) suggested that the decision criterion relaxes in free-choice trials, an assumption similar to a collapsing threshold in a DDM.

It is, however, not clear whether time-dependent thresholds are psychologically plausible. Moreover, at least three studies (Evans et al., 2020; Hawkins, Forstmann, Wagenmakers, Ratcliff, & Brown, 2015; Voskuilen, Ratcliff, & Smith, 2016) concluded that collapsing thresholds add little, if any, improvement to model fit compared with a DDM with time-independent thresholds. For example, Voskuilen et al. (2016) used a hyperbolic ratio function to model a decrease of the thresholds over time (see Fig. 1c).¹ The best estimates for its parameters resulted in a shape resembling a time-independent threshold (see their Fig. 5).

¹ Hawkins et al. (2015) used the Weibull distribution function as a flexible function that can approximate many candidate shapes of threshold collapse, for example, whether the largest part of the collapse occurs early or rather late during a trial.

Yet there may be tasks and situations where collapsing thresholds are more likely to improve model fit. One such situation is difficult tasks with a very small drift rate, which result in long RTs. While the DDM has mainly been applied to rather simple tasks with short RTs, it also appears to be viable for longer RTs (Lerche & Voss, 2019). In this case, the process will eventually hit a threshold, but with a time-independent threshold this may take much too long (e.g., Voss, Lerche, Mertens, & Voss, 2019). As an additional example, expanded-judgment or deferred decision-making tasks are cases where decisions are made based on increasingly smaller amounts of evidence as time progresses (e.g., Busemeyer & Rapoport, 1988). Data from highly practiced participants may also be better accounted for with collapsing thresholds (Hawkins et al., 2015).

In summary, it is controversial whether or not time-dependent thresholds are valuable to the DDM. Although this controversy is not yet settled, the present article nonetheless includes time-dependent thresholds in its analyses.

1.3. Purpose and outline of the present article

The present article compares different methods for solving the underlying mathematics of diffusion models, particularly with time-dependent parameters. More precisely, we investigate, on the one hand, the direct simulation of the model using the stochastic Euler method. On the other hand, we analyze the integral equation method as well as different methods based on a reformulation of the model into partial differential equations (PDEs), such as the matrix method of Diederich and Busemeyer (2003). We focus on the computational efficiency of the different methods. We therefore present a theoretical analysis of how much more effort a desired error reduction requires, and we investigate the computational times required by the different implementations. In doing so, we highlight the critical differences between these methods and their potential advantages and shortcomings, and we present solutions to some known issues with particular methods.

Shinn, Lam, and Murray (2020) describe and compare different approximation methods for diffusion models. The authors also introduce a Python software package that allows for highly efficient computations in a flexible environment. They further discuss the versatility of the approaches, for example, when applied to time-dependent parameters. While their paper focuses on the practical implementation of the methods, the emphasis of our work is to present the theoretical foundations and to explore the significance of the mathematical background for the application.

The remainder of this article is structured as follows. Section 2 formalizes the DDM and introduces stochastic simulations, random walks, and, with a particular focus, the Kolmogorov Forward Equation (KFE) and the Kolmogorov Backward Equation (KBE) as methods to implement the diffusion model. Section 3 introduces several discretizations of the aforementioned methods, with emphasis on the quality of the approximations and on the computational cost of these numerical approaches. We also introduce a new approach based on a discretization of the KFE, together with a realization that handles time-dependent drift rates and thresholds without a drop in efficiency (see in particular Section 3.3 and Appendix A.1). Section 4 investigates the practical dependence of computational costs (or efficiency) on the desired level of error reduction. In Section 5, we fit our approach to empirical data gathered in the context of DMC, that is, a model with time-dependent drift rates (Ulrich et al., 2015). Section 6 summarizes the conclusions of the present work, including advice for researchers using DDMs with time-dependent parameters. The source code to reproduce all computations is available as a Zenodo repository (Richter, Ulrich, & Janczyk, 2023).

Remark 1 (Terminology: Model, Method, and Discretization). In the following, it is necessary to distinguish between model, method, and discretization. A model describes a cognitive process and is formulated as a stochastic differential equation (i.e., Equation (1)). From this model, we deduce the first-passage time distribution. Several methods are available for doing so, namely stochastic simulations, the KFE (6), the KBE (7), the integral equation (3), and random walks (see Section 2.4). All of these methods can be used to compute the predicted probability density function (PDF) of the first-passage times, and if they were solved exactly, they would yield identical results.

This, however, is not possible in the general case, as it would require infinite computational resources. Hence, for each method, different discretizations are possible. For example, the trapezoidal rule (20) is used to approximate the integral equation, and finite difference discretizations are applied to the KBE and the KFE. Depending on the method and on the specific situation (e.g., time-dependency of parameters), one approach will be superior to another. The subtle differences are discussed in the following sections.

1.4. Transparency and openness

This article describes the mathematical tools to model and discretize diffusion problems and details various numerical approximation techniques. The methods are implemented in our own software in C++ and Python. The scripts are deposited for reproducibility of all results (Richter et al., 2023). This archive also contains the data analysis techniques as well as the scripts for generating the figures of this manuscript. This research does not involve human (or animal) experimentation.

2. Reaction time modeling

2.1. The diffusion model

A fundamental assumption of the DDM is that the decision process evolves continuously in time. Given a starting value X0 at time t = 0 between the lower and the upper threshold, X0 ∈
[−b(0), b(0)], the decision variable X(t) evolves according to the deterministic drift rate µ(t) and a random influence modeled as Brownian motion. This is expressed by the stochastic differential equation (SDE)

$$dX(t) = \mu(t)\,dt + \sigma(t)\,dB(t), \qquad X(0) = X_0, \qquad (1)$$

where µ(t) is the drift rate, B(t) refers to standard Brownian motion, and σ(t) scales the random influence (it is the infinitesimal standard deviation of X(t), also called the "diffusion constant"). If X(t) at some time t > 0 hits the upper threshold, that is, X(t) = b(t), before hitting the lower threshold, the decision is positive (the correct decision if µ > 0). If, however, it first hits the lower threshold, that is, X(t) = −b(t), the decision is negative. All parameters, that is, the drift rate µ(t), the diffusion constant σ(t), and the thresholds b(t), can be time-dependent, although most of them are usually taken as time-independent.

The stochastic Euler (SE) method for simulating (1) divides the temporal interval of interest [0, T] into small time steps of size ∆t and advances the initial state X0 = X(0) via

$$X_{n+1} = X_n + \Delta t\,\mu(t_n) + \sigma(t_n)\,Z(t_n)\,\sqrt{\Delta t}, \qquad t_{n+1} = t_n + \Delta t, \qquad (2)$$

where Z(t_n) follows a Gaussian distribution with zero mean and unit variance. In the limit of infinitely many simulated processes, Ntr → ∞, and an infinitesimally small time step, ∆t → 0, the positive and negative first-passage times define the PDFs. As computer simulations are always limited to a finite number of trials Ntr ∈ ℕ and discretize (1) with a time step ∆t > 0, the derived PDF will, of course, be an error-prone approximation.

In the following sections, different methods for approximating the PDFs are presented and compared. The focus is on efficient numerical realization, that is, on the question of how to obtain the best possible accuracy with the least possible computational effort.

2.2. Reformulation as an integral equation

The integral method for computing the first-passage time distribution of Wiener processes was first used in psychological research by Heath (1992). Since then, this method has been successfully applied in numerous studies on RT modeling (Evans et al., 2020; Jones & Dzhafarov, 2014; Smith, 1995, 2023; Smith & Lilburn, 2020; Smith & Ratcliff, 2009, 2021; Smith, Ratcliff, & Sewell, 2014; Voskuilen et al., 2016). Smith (2000) provides a detailed introduction to this method for cognitive modelers. In more detail, the diffusion model (1) can be reformulated as a Volterra integral equation of the first kind (Durbin, 1971). However, this integral equation is not amenable to a simple numerical treatment, because singularities complicate the approximation. Buonocore, Nobile, and Ricciardi (1987) reformulated the equation as a Volterra integral equation of the second kind to remove the singularities. In a second work, the two-threshold case was considered (Buonocore, Giorno, Nobile, & Ricciardi, 1990). For a survey and an overview of the derivations and of the reformulations into the Volterra equation of the second kind, we refer to Smith (2000). Here, we denote by Pmin(t, x) and Pmax(t, x) the probability densities at time t for reaching the lower and the upper threshold, respectively, when starting in state x ∈ [−b, b] at initial time 0. The coupled set of integral equations is given by

$$\begin{aligned}
P_{\min}(t,x) &= 2\,\Psi[-b(t),t\,|\,x,0] - 2\int_0^t \Big( P_{\min}(s,x)\,\Psi[-b(t),t\,|\,-b(s),s] + P_{\max}(s,x)\,\Psi[-b(t),t\,|\,b(s),s] \Big)\,ds,\\
P_{\max}(t,x) &= -2\,\Psi[b(t),t\,|\,x,0] + 2\int_0^t \Big( P_{\min}(s,x)\,\Psi[b(t),t\,|\,-b(s),s] + P_{\max}(s,x)\,\Psi[b(t),t\,|\,b(s),s] \Big)\,ds,
\end{aligned} \qquad (3)$$

with the kernel function

$$\Psi[b(t),t\,|\,y,s] = \frac{f[b(t),t\,|\,y,s]}{2}\left( b'(t) - \mu(t) - \frac{b(t) - y - \int_s^t \mu(r)\,dr}{t-s} \right) \qquad (4)$$

(for the lower threshold, b(t) and b′(t) are replaced by −b(t) and −b′(t)). Here, f[x, t | y, s] is the transition density for a process that starts in y at time s and reaches x at time t, neglecting all thresholds:

$$f[x,t\,|\,y,s] = \frac{1}{\sqrt{2\pi\sigma^2 (t-s)}}\,\exp\!\left( -\frac{\big(x - y - \int_s^t \mu(r)\,dr\big)^2}{2\sigma^2 (t-s)} \right). \qquad (5)$$

The major advantage of the reformulation as an integral equation is that it can be discretized directly, without the need for a large number of repeated samples. We give details in Section 3.2 and refer to Hackbusch (1995, Chapter 2) for an overview of the theory and numerical treatment of such integral equations.

2.3. Partial differential equation models

By applying Itô's Lemma (see Øksendal, 2000, Section 4) to the SDE (1), one derives the Kolmogorov Equations.² These are two partial differential equations (PDEs), namely the Kolmogorov Forward Equation (KFE)

$$\partial_t p(x,t) + \mu\,\partial_x p(x,t) - \frac{\sigma^2}{2}\,\partial_{xx} p(x,t) = 0 \qquad (6)$$

and the Kolmogorov Backward Equation (KBE)

$$-\partial_t q(x,t) - \mu\,\partial_x q(x,t) - \frac{\sigma^2}{2}\,\partial_{xx} q(x,t) = 0. \qquad (7)$$

² These are, in statistical mechanics, called the Fokker–Planck equations, and they are also closely related to the Black–Scholes equations used in computational finance. They are the prototypical diffusion-transport problems in plain mathematical language.

The solutions p(x, t) and q(x, t) of these two PDEs have the following meaning. For the KFE, let us assume that p0(x) is the PDF of the states at initial time t = 0. Then, p(x, t) denotes the probability density of the process residing in state x at time t. In contrast, for the KBE, the solution q(x, t) is the probability that a state x at time t will, at a later time T > t, be within the target set, denoted by qT. The names forward and backward describe the direction in which information is transported: The KFE starts at time t = 0, where the initial probability p(x, 0) = p0(x) is prescribed, and evolves forward in time, while the KBE starts at time t = T with terminal values q(x, T) = qT(x) and evolves backward in time.

Both approaches have counterparts among the numerical realizations of diffusion modeling. The KBE is the starting point of fast-dm (Voss & Voss, 2007), while the random walk approach (Diederich & Busemeyer, 2003; Diederich & Oswald, 2016) is a numerical approximation of the KFE that is also used in Kiani and Shadlen (2009, Supplementary Materials). Further, Shinn et al. (2020) recently presented a computational framework that is also based on a direct discretization of the KFE.

In principle, the KFE and the KBE approaches are interchangeable and can both be used to calculate PDFs. They differ, however, in details of the application. The KBE is highly efficient if all parameters are time-independent, whereas the KFE approach allows for more flexibility. In the following, we briefly discuss the use of the two PDE methods for deriving the PDF of first-passage times.
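The stochastic Euler scheme (2) of Section 2.1 can be made concrete with a short Python/NumPy sketch. It simulates many trials in parallel and records, for each trial, the first grid time at which X_n leaves [−b(t_n), b(t_n)]. This is a minimal illustration under our own assumptions, not the implementation deposited in the Zenodo repository. The function name simulate_first_passage, the passing of µ, σ, and b as callables of time, and the convention of recording the end of the crossing step as the first-passage time are all hypothetical choices.

```python
import numpy as np


def simulate_first_passage(mu, sigma, b, x0=0.0, dt=1e-3, t_max=5.0,
                           n_trials=20_000, seed=0):
    """Hypothetical sketch of the stochastic Euler scheme (2).

    mu, sigma and b are callables of time t; b(t) is the upper threshold and
    -b(t) the lower one. Returns per-trial first-passage times (NaN if no
    threshold is reached by t_max) and responses (+1 upper, -1 lower, 0 none).
    """
    rng = np.random.default_rng(seed)
    x = np.full(n_trials, float(x0))
    rt = np.full(n_trials, np.nan)
    resp = np.zeros(n_trials, dtype=int)
    active = np.ones(n_trials, dtype=bool)   # trials still between the thresholds
    n_steps = int(round(t_max / dt))
    for n in range(n_steps):
        t = n * dt
        z = rng.standard_normal(active.sum())
        # Euler step (2), applied only to trials that have not been absorbed yet.
        x[active] += dt * mu(t) + sigma(t) * z * np.sqrt(dt)
        t_next = t + dt
        up = active & (x >= b(t_next))
        lo = active & (x <= -b(t_next))
        resp[up] = 1
        resp[lo] = -1
        rt[up | lo] = t_next
        active &= ~(up | lo)
        if not active.any():
            break
    return rt, resp
```

For constant µ = σ = b = 1 and X0 = 0, the probability of an upper response is 1/(1 + e^{−2}) ≈ 0.881, and the mean decision time is tanh(1) ≈ 0.762. The simulated values approach these numbers up to Monte Carlo error and a small upward bias, because the thresholds are only checked at grid times.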
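The coupled Volterra equations (3) to (5) can also be discretized directly on a time grid. The sketch below makes the following assumptions:

- σ is constant, as required by the transition density (5).
- ∫µ(r)dr is integrated with the trapezoidal rule.
- The integrands vanish at s = 0 (no mass has been absorbed yet) and at s = t (the kernel tends to zero for smooth thresholds), so the trapezoidal rule in s reduces to a sum over the interior grid points.

The function name integral_equation_fpt and its argument list (in particular, passing the derivative b′ as a separate callable db) are hypothetical. Section 3.2 discusses the discretization in detail.

```python
import numpy as np


def integral_equation_fpt(mu, b, db, sigma=1.0, x0=0.0, dt=2e-3, t_max=5.0):
    """Hypothetical sketch: solve the coupled Volterra equations (3)-(5).

    mu, b, db are callables of time: drift rate mu(t), upper threshold b(t)
    (the lower threshold is -b(t)) and its derivative b'(t). sigma is a
    constant diffusion constant, as required by the transition density (5).
    Returns the grid t and the densities P_min(t, x0), P_max(t, x0).
    """
    t = np.arange(0.0, t_max + dt / 2, dt)
    n = t.size
    mu_t = np.array([mu(s) for s in t])
    # M[i] approximates the integral of mu over [0, t_i] (trapezoidal rule),
    # so that the integral over [t_j, t_i] is M[i] - M[j].
    M = np.concatenate(([0.0], np.cumsum(0.5 * dt * (mu_t[1:] + mu_t[:-1]))))
    bt = np.array([b(s) for s in t])
    dbt = np.array([db(s) for s in t])

    def psi(a_i, da_i, i, y, j):
        """Kernel (4): boundary value a_i = a(t_i) with slope da_i, start (y, t_j)."""
        tau = t[i] - t[j]
        shift = a_i - y - (M[i] - M[j])
        f = np.exp(-shift**2 / (2 * sigma**2 * tau)) / np.sqrt(2 * np.pi * sigma**2 * tau)
        return 0.5 * f * (da_i - mu_t[i] - shift / tau)

    p_min = np.zeros(n)  # density at the lower threshold, p_min[0] = 0
    p_max = np.zeros(n)  # density at the upper threshold, p_max[0] = 0
    for i in range(1, n):
        # Only interior nodes contribute: the integrands vanish at s = 0 and s = t_i.
        j = np.arange(1, i)
        # Upper threshold b(t): second line of (3).
        k_up_lo = psi(bt[i], dbt[i], i, -bt[j], j)
        k_up_up = psi(bt[i], dbt[i], i, bt[j], j)
        p_max[i] = -2 * psi(bt[i], dbt[i], i, x0, 0) + 2 * dt * np.sum(
            p_min[j] * k_up_lo + p_max[j] * k_up_up)
        # Lower threshold -b(t) with slope -b'(t): first line of (3).
        k_lo_lo = psi(-bt[i], -dbt[i], i, -bt[j], j)
        k_lo_up = psi(-bt[i], -dbt[i], i, bt[j], j)
        p_min[i] = 2 * psi(-bt[i], -dbt[i], i, x0, 0) - 2 * dt * np.sum(
            p_min[j] * k_lo_lo + p_max[j] * k_lo_up)
    return t, p_min, p_max
```

With µ = σ = b = 1 and X0 = 0, the integrated densities should be close to 1/(1 + e^{−2}) ≈ 0.881 (upper) and e^{−2}/(1 + e^{−2}) ≈ 0.119 (lower). A time-dependent threshold enters only through b and db, for instance the illustrative hyperbolic collapse b=lambda s: 1.0 / (1.0 + s) with db=lambda s: -1.0 / (1.0 + s) ** 2.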
Fig. 2. Visualization of the KFE solution for the case of a time-dependent, collapsing threshold b(t): The solution p(x, t) indicates the probability density that, at time t, the state resides in x. Here, one sees the drift toward the upper threshold. The probabilities for reaching the upper and lower thresholds, Pmax(t) and Pmin(t), are obtained from the fluxes of the solution p(x, t) at the thresholds. At time t = 0, p(x, 0) is the probability density of the initial condition. For times t > 0, the volume is less than one, because the thresholds are absorbing (see Section 2.3.1).

The thresholds −b(t) and b(t) are absorbing: Once a threshold is reached, the process is terminated (i.e., absorbed). Thus, we are interested in the first time a threshold is reached (i.e., the first-passage time). Hence, we complete the KFE with homogeneous Dirichlet data, that is, we set the probabilities on the thresholds to zero, p(−b(t), t) = 0 and p(b(t), t) = 0. Fig. 2 shows the solution of the KFE method for an example with a time-independent drift rate µ and time-dependent, collapsing thresholds b(t). The PDE is solved in the complete domain Ω, and at each point (x, t) ∈ Ω the solution p(x, t), indicated by its color, gives the probability density that state x is reached at time t.

The initial probability p0(x) = φ0(x)³ is a probability density such that

$$\int_{-b(0)}^{b(0)} p_0(x)\,dx = 1.$$

³ φ0 could, for instance, be the Dirac distribution or the symmetric Beta(α, α)-distribution. The Dirac delta function φa(x) has the property of concentrating an integral on the point a ∈ ℝ, that is, ∫_{−∞}^{∞} f(x)φa(x) dx = f(a). It therefore models that we always start in the position x = a. Due to its very low regularity (e.g., it is not continuous), it cannot be considered a classical function, but must be regarded as a limit φϵ → φ0.

Assuming for now that b(t) = b is time-independent (the general case is presented in Appendix A.1), the integrated probability evolves by

$$\frac{d}{dt}\int_{-b}^{b} p\,dx = \int_{-b}^{b} \partial_t p\,dx = \int_{-b}^{b} \Big( \frac{\sigma^2}{2}\,\partial_{xx} p - \mu\,\partial_x p \Big)\,dx, \qquad (9)$$

such that the integrated probability decreases as states cross the thresholds. This is illustrated in Fig. 2, where, for large times t → ∞, the integrated probability ∫_{−b}^{b} p(x, t) dx tends to zero. The probability fluxes at the thresholds −b and b are the first-passage time densities at the lower and the upper threshold. Accumulated over time, they yield the probabilities Pmin(t) and Pmax(t) of having reached the lower and the upper threshold by time t:

$$P_{\min}(t) = \int_0^t \frac{\sigma^2}{2}\,\partial_x p(-b,s)\,ds, \qquad P_{\max}(t) = -\int_0^t \frac{\sigma^2}{2}\,\partial_x p(b,s)\,ds. \qquad (10)$$

Both integrands are nonnegative, because p vanishes on the thresholds and is nonnegative inside. These integrals are approximated with the trapezoidal rule. The first-passage time distribution is then obtained by discretizing the KFE (6) in the restricted domain (8) and by evaluating the integrals (10) at the thresholds. This is visualized in Fig. 2, where the fluxes are summed up to give the probability distributions Pmax(t) and Pmin(t) of reaching the upper and lower thresholds by time t.

⁴ This follows from the simple rule ∫_a^b u′v dx = −∫_a^b uv′ dx + u(b)v(b) − u(a)v(a) by choosing the constant function v(x) ≡ 1 with v′(x) ≡ 0, such that the first part vanishes.

2.3.2. The Kolmogorov backward equation and its efficient realization in fast-dm

The KBE is the basis of fast-dm (Voss & Voss, 2007). To compute the probability of reaching the upper threshold by time T, we define the target set (which serves as terminal values for the KBE) as

$$q_T(x) = 1 \ \text{ for } x = b \qquad\text{and}\qquad q_T(x) = 0 \ \text{ for } -b \le x < b. \qquad (11)$$
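Returning briefly to the KFE of Section 2.3.1, the following sketch discretizes (6) with forward Euler in time and central differences in space for constant parameters. It enforces the homogeneous Dirichlet data at ±b and accumulates the threshold fluxes in the spirit of (10). Two choices are our own and hypothetical. First, the flux through each threshold is taken as the amount of probability that the explicit step removes from the grid, a first-order approximation of −(σ²/2)∂x p(b, t) and (σ²/2)∂x p(−b, t) that makes the discrete mass balance exact. Second, these fluxes are accumulated with the left-endpoint rule that matches the explicit step, rather than with the trapezoidal rule mentioned above. The time step uses the form ∆t = ∆x²/(σ²ζ²) of the random-walk method in Section 2.4. The function name kfe_first_passage is hypothetical.

```python
import numpy as np


def kfe_first_passage(mu=1.0, sigma=1.0, b=1.0, x0=0.0, nx=100, t_max=5.0, zeta=1.2):
    """Hypothetical sketch: explicit finite differences for the KFE (6) on
    [-b, b] with absorbing thresholds and constant parameters.

    Returns the time grid, the cumulative probabilities P_min(t), P_max(t)
    and the probability mass still between the thresholds at the end.
    """
    D = 0.5 * sigma**2
    x = np.linspace(-b, b, nx + 1)          # x[0] = -b, ..., x[nx] = b
    dx = x[1] - x[0]
    dt = dx**2 / (sigma**2 * zeta**2)       # same form as Eq. (17); zeta >= 1
    if abs(mu) * dx > sigma**2:
        raise ValueError("grid too coarse: the explicit scheme would lose positivity")
    p = np.zeros_like(x)
    p[np.argmin(np.abs(x - x0))] = 1.0 / dx  # discrete Dirac start (footnote 3)
    n_steps = int(np.ceil(t_max / dt))
    times = np.arange(n_steps + 1) * dt
    P_min = np.zeros(n_steps + 1)
    P_max = np.zeros(n_steps + 1)
    for n in range(n_steps):
        # Probability leaving through each threshold in this explicit step
        # (discrete counterpart of the flux densities in Eq. (10)).
        out_lo = dt * p[1] * (D / dx - mu / 2)
        out_up = dt * p[-2] * (D / dx + mu / 2)
        P_min[n + 1] = P_min[n] + out_lo
        P_max[n + 1] = P_max[n] + out_up
        lap = (p[2:] - 2 * p[1:-1] + p[:-2]) / dx**2
        adv = (p[2:] - p[:-2]) / (2 * dx)
        p[1:-1] = p[1:-1] + dt * (D * lap - mu * adv)   # forward Euler step of (6)
        p[0] = p[-1] = 0.0                               # homogeneous Dirichlet data
    remaining = p.sum() * dx
    return times, P_min, P_max, remaining
```

Because the outflow is bookkept exactly, remaining + P_min[-1] + P_max[-1] equals one up to rounding. For µ = σ = b = 1, P_max at t = 5 lies close to 1/(1 + e^{−2}) ≈ 0.881.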
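For comparison, the backward solve starting from the target set (11) can be sketched with the same explicit differences. For constant parameters, stepping (7) backward from t = T amounts to stepping the forward-running KBE (13), derived below, in r = T − t. A single loop therefore yields q(x, 0) for all starting points x at once. The boundary values of (11), q = 1 at x = b and q = 0 at x = −b, are kept fixed throughout. The function name kbe_upper_probability and its defaults are hypothetical, chosen for illustration only.

```python
import numpy as np


def kbe_upper_probability(mu=1.0, sigma=1.0, b=1.0, nx=100, T=5.0, zeta=1.2):
    """Hypothetical sketch: explicit finite differences for the KBE (7)
    with terminal/boundary data (11) and constant parameters.

    Returns the grid x and q(x, 0), the probability of reaching +b by
    time T when starting in x.
    """
    D = 0.5 * sigma**2
    x = np.linspace(-b, b, nx + 1)
    dx = x[1] - x[0]
    dr = dx**2 / (sigma**2 * zeta**2)    # step in the reversed time r = T - t
    q = np.zeros_like(x)
    q[-1] = 1.0                          # q_T(b) = 1, q_T(x) = 0 for -b <= x < b
    n_steps = int(np.ceil(T / dr))
    for _ in range(n_steps):
        lap = (q[2:] - 2 * q[1:-1] + q[:-2]) / dx**2
        grad = (q[2:] - q[:-2]) / (2 * dx)
        # Forward-running KBE: d_r q = mu d_x q + (sigma^2/2) d_xx q; the drift
        # term has the opposite sign compared with the KFE step.
        q[1:-1] = q[1:-1] + dr * (D * lap + mu * grad)
        q[0], q[-1] = 0.0, 1.0           # boundary values from (11)
    return x, q
```

With µ = σ = b = 1, the value of q at x = 0 should be close to 0.881, which agrees with the upper-threshold probability obtained from the forward computations. Unlike the KFE sketch, however, one pass yields this probability for every starting point on the grid, but only for the single horizon T.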
Then, on the domain (8), but with time-independent thresholds for now, we solve the KBE (7) backward in time, that is, from t = T back to t = 0. The solution q(x, t) denotes the probability that state x at time t will reach the upper threshold by time T. For instance, q(0, 0) denotes the probability that the neutral state at time t = 0 will reach the threshold by time T at the latest. The boundary and final-time conditions (11) introduce a discontinuity at x = b for t = T, which reduces the convergence of numerical approximation schemes. To deal with this, Boehm, Cox, Gantner, and Stevenson (2021) split the KBE into two parts. The first is a time-dependent part that allows for an efficient and accurate solution by a series expansion. The second is a remaining homogeneous KBE that has a smooth solution and allows for an accurate approximation with finite difference methods.

One benefit of the KBE approach is that no initial probability distribution must be chosen beforehand. Rather, the probability that any initial state X0 ∈ [−b, b] will reach the upper threshold by time T is given by Pmax(X0) = q(X0, 0). If the probabilities of reaching the lower threshold by time T are to be estimated, the KBE must be solved again, with the final state qT(x) = 1 for x = −b and qT(x) = 0 for −b < x ≤ b. Still, with only two KBE discretizations, the probabilities for all initial distributions are obtained.

Yet another, and even more striking, problem is that repeated approximations of the KBE are necessary to obtain the complete distribution Pmax(X0, t) of reaching a threshold at different times t. More precisely, we must solve the KBE backward in time from each terminal value qt(·) for t ∈ [0, T]. Then, Pmax(X0, t) = qt(X0, 0) gives the probability that state X0 reaches the upper threshold by time t.

The success of fast-dm is based on a reformulation that is possible for time-independent drift rates and thresholds. In this case, the KBE is transformed into an equation that runs forward in time. By replacing t ↦ s − t, we define p^s(x, t) as

$$p^s(x,t) := q^s(x, s-t) \quad\Longleftrightarrow\quad q^s(x,t) = p^s(x, s-t).$$

Thus, we replace q^s(x, t) by p^s(x, s − t) in the KBE (7) and obtain⁵

$$(\partial_t p^s)(x, s-t) - \mu(t)\,\partial_x p^s(x, s-t) - \frac{\sigma(t)^2}{2}\,\partial_{xx} p^s(x, s-t) = 0,$$

where (∂t p^s)(x, s − t) denotes the derivative with respect to the time argument of p^s, evaluated at s − t. Finally, we relabel the variable and introduce r := s − t:

$$\partial_r p^s(x,r) - \mu(s-r)\,\partial_x p^s(x,r) - \frac{\sigma(s-r)^2}{2}\,\partial_{xx} p^s(x,r) = 0. \qquad (12)$$

If t runs backward in time from s to zero, the new variable r = s − t runs forward from zero to s. This forward-running KBE (12) is still equivalent to the original one (7).

The problem with this formulation is that it still depends on the final time s > 0. This time has to be chosen a priori, because it is required to evaluate the drift rate µ(s − r), the diffusion constant σ(s − r), and the thresholds b(s − r) and −b(s − r). If, however, we assume that these values do not depend on time, they also do not depend on s, and the forward-running KBE simplifies to

$$\partial_r p^s(x,r) - \mu\,\partial_x p^s(x,r) - \frac{\sigma^2}{2}\,\partial_{xx} p^s(x,r) = 0. \qquad (13)$$

Since the final time s does not enter the equation, it is sufficient to solve (13) once from r = 0 to r = T: The probability of reaching the upper threshold by time t when starting at X0 is given by Pmax(X0, t) = p(X0, t). This is the approach that explains the […]

multiple times for deriving the PDF. While this is not the most beneficial approach, efficient repeated approximations of this equation are still possible, and we refer to Appendix C.2 for a discussion. There, we also discuss the proper approximation of rough initial and threshold values, an issue raised by Boehm et al. (2021, Section 2).

2.4. Random walks as an explicit discretization of the Kolmogorov forward equation

Diederich and Busemeyer (2003) and Diederich and Oswald (2016) presented a simple and highly efficient approach based on the forward transportation of probabilities by matrix–vector multiplications. We give only a very brief introduction here for the purpose of classification and refer to the literature mentioned above for details.

Given time-independent thresholds −b and b, the state space −b < x < b is discretized into M equidistant steps of size ∆x, namely x0 < x1 < · · · < xM. The vector Pn = (Pn,1, . . . , Pn,M) ∈ ℝ^M then indicates the probability distribution at time tn. More precisely, the entry Pn,m indicates the probability that at time tn the state is within the interval [xm−1, xm]. In each time step tn ↦ tn+1, these probabilities are transported, Pn ↦ Pn+1, by fluxes between adjacent states. This results in a very efficient algorithm, because each step is equivalent to one matrix–vector multiplication with a tridiagonal matrix.⁶ The boundary fluxes at x0 and xM correspond to the probabilities of reaching the lower and the upper threshold, respectively.

Denoting the transition probabilities by an,− and an,+ (see Diederich & Oswald, 2016), the standard DDM is given by the choices

$$a_{n,-} = \frac{1 - \frac{\mu\Delta x}{\sigma^2}}{2\zeta^2}, \qquad a_{n,+} = \frac{1 + \frac{\mu\Delta x}{\sigma^2}}{2\zeta^2}, \qquad a_{n,0} = 1 - \frac{1}{\zeta^2}, \qquad (14)$$

with a parameter ζ > 0 that controls the stability of the method. Here, an,+ and an,− are the probabilities of moving one interval up or down, and an,0 is the probability of staying. In each step n ↦ n + 1, we compute

$$P_{n+1,m} = a_{n,+}\,P_{n,m-1} + a_{n,0}\,P_{n,m} + a_{n,-}\,P_{n,m+1}, \qquad m = 1, \ldots, M. \qquad (15)$$

This can be realized efficiently by one matrix–vector product with a tridiagonal matrix. Introducing pn,m := Pn,m/∆x, the specific choices of an,∗ from (14) allow us to write (15) as the equivalent difference equation

$$\frac{p_{n+1,m} - p_{n,m}}{\Delta x^2/(\sigma^2\zeta^2)} = \frac{\sigma^2}{2}\cdot\frac{p_{n,m+1} - 2p_{n,m} + p_{n,m-1}}{\Delta x^2} - \mu\cdot\frac{p_{n,m+1} - p_{n,m-1}}{2\Delta x}. \qquad (16)$$

Finally, introducing the time step

$$\Delta t = \frac{\Delta x^2}{\sigma^2\zeta^2} \qquad (17)$$

reveals that this method is equivalent to the forward Euler (in time), central difference (in space) discretization of the KFE (6), as has already been pointed out in Appendix A.4 of Diederich and Busemeyer (2003). The condition ζ ≥ 1 (compare Diederich & Oswald, 2016, and also (17)) corresponds to the classical parabolic stability condition for explicit time-stepping schemes, ∆t ≤ ∆x²/σ² (Fort & Frankel, 1953, and see also
Appendix B).
efficiency of fast-dm.
However, if any parameter is time-dependent, this simplifi-
6 A matrix A ∈ Rn×n is called tridiagonal, if each matrix row i = 1, . . . , n has
cation is not possible and the KBE (7) must be approximated
at most three values that differ from zero: the diagonal entry Aii and the values
left and right of the diagonal Ai,i−1 and Ai,i+1 , respectively. Linear systems with
5 With the chain rule it holds −∂ qs (x, t) = ∂ ps (x, s − t), that is, the sign tridiagonal matrices can be solved efficiently with an effort that is comparable
t t
changes. to the multiplication with the matrix itself.
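The random-walk scheme (14)–(17) can be made concrete with a minimal NumPy sketch for constant µ, σ and fixed thresholds ±b. This is our own illustration, not the authors' implementation. The following are assumptions: the function name `random_walk_fpt`, the node layout x_j = −b + j∆x, and the weight convention. Under that convention the left neighbour carries (1 + µ∆x/σ²)/(2ζ²), so a positive drift moves probability toward the upper threshold. Each step is the tridiagonal matrix–vector product (15), written with array slices.

```python
import numpy as np


def random_walk_fpt(mu, sigma, b, dx, zeta=1.0, t_max=100.0):
    """Hypothetical helper: explicit random walk for a DDM with constant
    drift mu, diffusion sigma and absorbing thresholds at -b and +b.

    Returns the time grid, the probability mass absorbed at the upper and
    lower threshold in each step, and the mass still inside (-b, b).
    """
    if zeta < 1.0:
        raise ValueError("zeta >= 1 is required for stability")
    dt = dx**2 / (sigma**2 * zeta**2)       # time step, cf. (17)
    M = int(round(2 * b / dx))               # nodes x_0 = -b, ..., x_M = +b
    if M % 2:
        raise ValueError("2b/dx must be even so that x = 0 is a node")
    c = mu * dx / sigma**2
    w_left = (1 + c) / (2 * zeta**2)         # weight of neighbour m-1
    w_right = (1 - c) / (2 * zeta**2)        # weight of neighbour m+1
    w_stay = 1 - 1 / zeta**2                 # weight of node m itself
    if w_left < 0 or w_right < 0:
        raise ValueError("negative transition weight: decrease dx")

    P = np.zeros(M - 1)                      # mass on interior nodes x_1..x_{M-1}
    P[(M - 1) // 2] = 1.0                    # all mass starts at x = 0
    n_steps = int(np.ceil(t_max / dt))
    upper = np.zeros(n_steps)
    lower = np.zeros(n_steps)
    for n in range(n_steps):
        upper[n] = w_left * P[-1]            # flux x_{M-1} -> x_M = +b
        lower[n] = w_right * P[0]            # flux x_1 -> x_0 = -b
        P_new = w_stay * P                   # tridiagonal product, cf. (15)
        P_new[1:] += w_left * P[:-1]         # mass arriving from the left
        P_new[:-1] += w_right * P[1:]        # mass arriving from the right
        P = P_new
    t = dt * np.arange(1, n_steps + 1)
    return t, upper, lower, P
```

Summing `upper` and `lower` gives the choice probabilities. Dividing the per-step masses by ∆t gives a first-passage density. With ζ = 1, (17) gives ∆x = σ√∆t, and the stay weight vanishes.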
T. Richter, R. Ulrich and M. Janczyk Journal of Mathematical Psychology 114 (2023) 102756
2.5. Discussion on methods for deriving the first-passage time PDF

Section 2 began by introducing the SDE of the diffusion model and its implementation in simulations with the SE method. We then introduced the integral equation method (see also below) and turned more thoroughly to two PDE solutions, that is, the KFE and the KBE. Although the two are in principle exchangeable, the KBE solution is highly efficient with time-independent parameters (as done in fast-dm; Voss & Voss, 2007), whereas the KFE allows for more flexibility (see also Shinn et al., 2020). An explicit discretization of the KFE is also achieved with random walks, as used in the approach by Diederich and Busemeyer (2003) and Diederich and Oswald (2016).

To derive the integral equation method, the SDE (1) is first transformed into a deterministic differential equation, or here into an integral equation, with the help of the Itô calculus. This allows a direct discretization, that is, without repeated sampling. In contrast to the PDE approaches, the location variable is decoupled in the integral equation: The probabilities of the first-passage times are described independently for each initial state x ∈ [−b, b] at time t = 0.

We will now continue by presenting numerical realizations of these methods, with a particular emphasis on the effort required to achieve a desired error reduction.

3. Numerical discretizations

The methods just introduced and their numerical realization (which we use synonymously with discretizations) must be considered as two separate matters (see also Remark 1). However, it is often the case that a discrete method is used. This is the case with random walks (Diederich & Busemeyer, 2003), and in general this approach, using discrete arguments, can be called the origin of numerical mathematics (Wanner, 2010).

In the following paragraphs, we will introduce and relate dif-

Based on the discrete points in time, the SDE (1) is approximated by the SE method, which is also called the Euler–Maruyama method,

$$ X_n = X_{n-1} + \Delta t\,\mu(t_{n-1}) + \sigma(t_{n-1})\,Z(t_{n-1})\sqrt{\Delta t}. \tag{19} $$

In a deterministic setting, the forward Euler approximation of the first derivative X′(tn) is of first order, giving linear convergence O(∆t). The SE method, by contrast, is only of order $O(\Delta t^{1/2})$ because of the discretization of the diffusion term. To reduce the error by a factor of two, four times as many time steps are therefore required. In addition, by the law of large numbers, many samples of (2) are required to obtain accurate probabilities. A detailed analysis of the error is given in Section 3.5. An extension to higher-order methods with a better scaling in the time step size ∆t is in general not possible (see Øksendal, 2000, for more details).

3.2. Discretization of the integral equation

The coupled set of integral Eqs. (3) describing the probabilities Pmin(t, x) and Pmax(t, x) can be discretized by approximating the integrals with the trapezoidal rule. Hence, as in the case of the SDE (1), we introduce time steps 0 = t0 < t1 < · · · < tN = T (compare (18)). We approximate Pmin,n(x) := Pmin(tn, x) and Pmax,n(x) := Pmax(tn, x). Applying the trapezoidal rule to the integrals yields

$$ P_{\min,n}(x) = -2\Psi^{-,0}_{n} + 2\Delta t \sum_{k=1}^{n-1}\Big(P_{\min,k}(x)\,\Psi^{--}_{n,k} + P_{\max,k}(x)\,\Psi^{-+}_{n,k}\Big), $$
$$ P_{\max,n}(x) = -2\Psi^{+,0}_{n} + 2\Delta t \sum_{k=1}^{n-1}\Big(P_{\min,k}(x)\,\Psi^{+-}_{n,k} + P_{\max,k}(x)\,\Psi^{++}_{n,k}\Big), \tag{20} $$

with vectors Ψ−, Ψ+ ∈ R^{N+1}
Fig. 3. Mesh of the space–time domain Ω and visualization of the flow of information in different discretizations of PDEs. Left: explicit forward Euler, Middle:
implicit backward Euler, Right: implicit Crank–Nicolson. Explicit couplings are shown in red, implicit connections in green. If a scheme is implicit, all states pn+1,m
within one column (i.e., for m = 0, . . . , M) must be solved at once.
Fig. 4. Different methods (center) and their numerical approximations. By random processes we denote the direct discretization of the SDE, by KFE the approach
presented in Shinn et al. (2020), by random walks the approach by Diederich and Busemeyer (2003), and by fast-dm, the framework of Voss and Voss (2007). By
KFE (transformed) we denote the modification of the KFE solution described in the present work. The integral equation method is the one introduced in Buonocore
et al. (1990). Blue labels indicate the technique that is used to show the relation between different models, black arrows refer to the discretization technique. In
violet, we indicate the accuracy of the discretization and finally, in red, we highlight possible limitations.
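The SE update (19) is easy to vectorize over trials. The following NumPy sketch is our own illustration, not the C++ code used for the reported timings. Three things in it are assumptions: the function name, the callable arguments, and the convention of checking the thresholds only at the grid times tn. Because of that convention, crossings between grid points are missed, which contributes to the slow $O(\Delta t^{1/2})$ convergence of the first-passage statistics.

```python
import numpy as np


def euler_maruyama_fpt(mu, sigma, b_upper, b_lower, dt, t_max, n_trials,
                       x0=0.0, seed=None):
    """Hypothetical helper: first-passage times of dX = mu(t) dt + sigma(t) dW
    simulated with the Euler-Maruyama (SE) update (19).

    mu, sigma, b_upper and b_lower are callables of time t. Returns
    (times, choice): choice is +1 (upper), -1 (lower) or 0 (no crossing
    before t_max), and times is NaN where choice is 0.
    """
    rng = np.random.default_rng(seed)
    x = np.full(n_trials, float(x0))
    times = np.full(n_trials, np.nan)
    choice = np.zeros(n_trials, dtype=int)
    active = np.ones(n_trials, dtype=bool)
    for n in range(1, int(round(t_max / dt)) + 1):
        t_prev, t = (n - 1) * dt, n * dt
        z = rng.standard_normal(np.count_nonzero(active))
        # SE step (19) for all trials that have not yet crossed a threshold
        x[active] += dt * mu(t_prev) + sigma(t_prev) * z * np.sqrt(dt)
        up = active & (x >= b_upper(t))      # thresholds checked at grid times only
        lo = active & (x <= b_lower(t))
        times[up | lo] = t
        choice[up], choice[lo] = 1, -1
        active &= ~(up | lo)
        if not active.any():
            break
    return times, choice
```

With constant parameters the callables are simply constants, for example `lambda t: 0.5` for µ. Time-dependent thresholds such as (40) can be passed in the same way.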
the effort, measured in elementary operations, to determine the probability density with an error that does not exceed a bound ϵ > 0? When we talk about errors, we always refer to the relative maximum error of the cumulative distribution function (CDF); see Remark 2 for details.

Instead of giving exact computing times, which would depend on the implementation and the hardware, we analyze the complexity with respect to the error ϵ. To be precise, the question is: How much does the required effort grow if we aim at reducing the error ϵ by a factor of 10? The results of the following analyses are summarized in Fig. 5.

We begin with the case of time-independent parameters. The SDE (1) converges slowly for decreasing step sizes ∆t → 0 and increasing trial counts Ntr → ∞. The error behaves like (Kloeden & Platen, 1999, Theorem 10.2.2)

$$ E_{\mathrm{SDE}}(\Delta t, N_{tr}) = O\big(\Delta t^{1/2} + N_{tr}^{-1/2}\big). \tag{30} $$

The optimal balance between time step and number of trials is to choose ∆t = ϵ² and Ntr = ϵ⁻². The effort of each trial is bounded by T/∆t, so that for a total of Ntr trials we expect the complexity

$$ C_{\mathrm{SDE}}(\epsilon) = O\big(T\cdot\epsilon^{-4}\big). \tag{31} $$

Reducing the desired tolerance ϵ by a factor of 10 thus calls for a 10⁴ = 10 000 times higher effort.

The integral equation method described in Sections 2.2 and 3.2 is based on the trapezoidal rule. It converges quadratically in space and time (Hackbusch, 1995, Theorem 2.2.12):

$$ E_{\mathrm{IE}}(\Delta t, \Delta x) = O\big(\Delta t^2 + \Delta x^2\big). \tag{32} $$

Discretization of the integral equation calls for a nested time loop, so the complexity is quadratic in time, (1/∆t)², and linear in space, M = 1/∆x. For the balanced discretization $\Delta x = \Delta t = \epsilon^{1/2}$, the complexity for reaching the accuracy ϵ > 0 is given by

$$ C_{\mathrm{IE}}(\epsilon) = O\big(T\cdot(b_{\max} - b_{\min})\cdot\epsilon^{-3/2}\big). \tag{33} $$
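The bookkeeping behind (31) and (33) can be automated. If a discretization parameter h enters the effort as (1/h)^w and the error as h^q, then balancing h^q = ϵ contributes a factor ϵ^(−w/q). The short Python check below is our own illustration. The SDE and integral-equation entries follow the derivation above. The entries for a second-order implicit KFE and for an explicit random walk with ∆t ∼ ∆x² are assumptions, chosen to reproduce the "expected" rates O(ϵ⁻¹) and O(ϵ^(−3/2)) quoted for Case I.

```python
from fractions import Fraction as F


def effort_exponent(factors):
    """Return k such that the effort grows like eps**(-k).

    factors: (w, q) pairs, one per discretization parameter h, where the
    effort scales like (1/h)**w and the error like h**q. Each h is balanced
    so that h**q = eps and therefore contributes eps**(-w/q).
    """
    return sum((F(w) / F(q) for w, q in factors), F(0))


METHODS = {
    # 1/dt steps per trial with error dt^(1/2); N_tr = 1/h trials with error h^(1/2)
    "SDE (Euler-Maruyama)": [(1, F(1, 2)), (1, F(1, 2))],
    # nested time loop (1/dt)^2 and 1/dx space points, both second order, cf. (32)
    "integral equation": [(2, 2), (1, 2)],
    # assumed: 1/dt implicit steps times 1/dx unknowns, both second order
    "KFE (implicit, 2nd order)": [(1, 2), (1, 2)],
    # assumed: explicit walk with dt ~ dx^2, effort ~ dx^-3, error O(dx^2)
    "random walk": [(3, 2)],
}

for name, factors in METHODS.items():
    k = effort_exponent(factors)
    print(f"{name:26s} effort ~ eps^-({k}), x{10 ** float(k):,.1f} for eps/10")
```

The printed factors reproduce the 10 000-fold effort for the SDE stated after (31), and a factor of about 31.6 for the integral equation.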
In all cases, we consider the time limit T = 1000 and do not take decision times t > T into account. Finally, in a fourth case, we also take a Dirac distribution as initial value.

Case I For this case, all parameters are assumed to be time-independent with the following values:

µ = 0.5, bmin = −75, bmax = 75, σ = 4.

Case II For this case, both the drift rate and the thresholds are time-dependent at the same time. The time-dependent drift rate µ(t) was modeled as in DMC (see Ulrich et al., 2015) and is the sum of the (time-independent) drift rate of the controlled process, µc, and the (time-dependent) drift rate of the automatic process, µa, thus:

µ(t) = µc + µa(t).

More precisely, the (expected) time course of the automatic process is modeled as a (rescaled) Gamma function, that is,

$$ \mathbb{E}[X_a(t)] = A\cdot e^{-t/\tau}\left(\frac{t\cdot e}{(a-1)\tau}\right)^{a-1}, $$

with A being positive and negative in congruent and incongruent trials, respectively. The first derivative is then the drift rate µa(t):

$$ \mu_a(t) = \frac{d\,\mathbb{E}[X_a(t)]}{dt} = A\cdot e^{-t/\tau}\left(\frac{t\cdot e}{(a-1)\tau}\right)^{a-1}\cdot\left(\frac{a-1}{t} - \frac{1}{\tau}\right). \tag{39} $$

With a = 2 the drift rate then becomes

$$ \mu(t) = \mu_c + A\cdot e^{-t/\tau}\left(\frac{t\cdot e}{\tau}\right)\cdot\left(\frac{1}{t} - \frac{1}{\tau}\right) \quad\text{with}\quad \mu_c = 0.5,\; A = 20,\; \tau = 150,\; \sigma = 4. $$

The time-dependent thresholds were modeled with a hyperbolic ratio function (see Voskuilen et al., 2016), that is, as:

$$ b_{\max}(t) = b_0\left(1 - \kappa\cdot\frac{t}{t + t_{0.5}}\right) \quad\text{with}\quad b_0 = 75,\; \kappa = 0.6,\; t_{0.5} = 150\text{ ms}, \tag{40} $$

and bmin(t) = −bmax(t).

Case III This test case is identical to Case II apart from the parameter τ. Instead of the fixed value τ = 150, here we consider different values in the interval [5, 150]. It turns out that especially small values of τ make the approximation more difficult. We will give an explanation for this behavior later.

Case IV The fourth case is again similar to Case II, but the initial value is fixed to X0 = 0, whereas Cases I–III drew the initial value from the smooth Beta distribution. This irregular case is important in applications (Kiani & Shadlen, 2009). We also include it because the highly non-regular Dirac at time t = 0 is known to be troublesome to approximate (see also Boehm et al., 2021).

To compare the different approaches, we compute the PDF p(t) of reaching the upper threshold at time t > 0. The "true" solution p(t) is obtained by solving the KFE on a very fine discretization. For the different approaches, we compare the relative maximum error, as detailed in Remark 2.

Remark 2 (Realization). The KFE, the random walk approach, and also the integral equation method are implemented in Python, and no parallelization is used. The code is based on NumPy (Harris et al., 2020) and SciPy (Virtanen et al., 2020), so all inner loops are performed efficiently in the corresponding C-backend (see also Appendix C.1). All these methods are run on a MacBook Pro with an M1 Max CPU. All SDE-based computations were coded in C++ and run on an AMD EPYC 7662 CPU using 64 parallel cores. In the figures, we show the average error of 64 repetitions with the indicated number of Ntr = O(1/∆t) trials each.

To compare the accuracy of the different methods, the relative maximum-norm error of the CDFs for reaching the two thresholds is computed. This is the error norm used in all figures:

$$ \text{Error} = \frac{\max_{n=0,\dots,N} \big|p_{\mathrm{CDF}}(t_n) - p^{\mathrm{exact}}_{\mathrm{CDF}}(t_n)\big|}{\max_{n=0,\dots,N} \big|p^{\mathrm{exact}}_{\mathrm{CDF}}(t_n)\big|}. \tag{41} $$

The "exact" CDFs p^exact_CDF(tn) are approximated using the KFE approach on meshes that are refined both in space ∆x and in time ∆t. The code to reproduce all examples is published on Zenodo (Richter et al., 2023).

Remark 3 (Computing CDFs from PDFs). It appears trivial to transform a PDF into a CDF. Usually, it suffices to call the cumsum command of Matlab, Python, or similar packages and to multiply by the step size ∆t. This, however, corresponds to the box rule, also called the rectangle formula for numerical quadrature (Quarteroni, Sacco, & Saleri, 2007, Section 9.2.1), which is of first order only. Where second-order discretizations are used to discretize the KFE (as we do in the present approach) or the KBE (as done in fast-dm), it is also advisable to use higher-order quadrature (like the trapezoidal rule) when transforming the PDF into a CDF. Otherwise, the CDF accuracy would be reduced to first order. Given that PDF(i) for i = 0, 1, . . . , n is the probability in the time steps ti, the trapezoidal rule for computing the CDF is:

$$ \mathrm{CDF}(n) = \frac{\Delta t}{2}\,\mathrm{PDF}(0) + \Delta t\sum_{i=1}^{n-1}\mathrm{PDF}(i) + \frac{\Delta t}{2}\,\mathrm{PDF}(n). $$

In the following, we present the results of the four test cases described above to compare the performance of the different methods.

Case I. Fig. 6 visualizes the resulting error in the CDF versus the time step size ∆t (upper row) and the computational efficiency, that is, the computational effort (measured in seconds) as a function of the error (lower row). For each of the four methods, we choose the spatial discretization parameter or the number of trials, respectively, such that an optimal balance is given. That is, we choose Ntr = 10 000/∆t for the SE method, ∆x ≈ ∆t for the KFE and the integral equation method, and ∆x = σ√∆t for the random walk.

The results show the expected order of convergence (upper row) with respect to the discretization parameter ∆t. If we compare the complexity of the approaches, that is, the resulting error plotted over the computational effort, we observe that the SDE scales as expected (compare (31)). However, both the random walk approach and the KFE behave better than expected. The random walk scales linearly in 1/ϵ instead of the expected $O(\epsilon^{-3/2})$, while the KFE scales nearly like $O(\epsilon^{-1/2})$ instead of the expected linear behavior O(ϵ⁻¹). We attribute this slight discrepancy to the very efficient C-backend of the NumPy library and refer to Appendix C.1 for a discussion.⁷

⁷ The equations considered here are very small. For instance, bmax = 75, bmin = −75 and ∆x = 2.5 mean that only 60 probabilities pn,m must be identified in each time step. For such problems, a large fraction of the computational time is spent on the interface between Python and the C-backend of NumPy, whereas the actual computations are very fast.
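The time-dependent parameter functions (39) and (40) translate directly into code. In this NumPy sketch, the function names and the default values (taken from Case II) are our own choices. µa(t) is evaluated in the expanded form A e^(−t/τ) (e/((a−1)τ))^(a−1) ((a−1) t^(a−2) − t^(a−1)/τ). This equals (39) for t > 0 and avoids the division by t at t = 0 when a ≥ 2.

```python
import numpy as np


def expected_automatic(t, A=20.0, tau=150.0, a=2.0):
    """Rescaled Gamma time course E[X_a(t)] of the automatic process."""
    t = np.asarray(t, dtype=float)
    return A * np.exp(-t / tau) * (t * np.e / ((a - 1) * tau)) ** (a - 1)


def dmc_drift(t, mu_c=0.5, A=20.0, tau=150.0, a=2.0):
    """DMC drift rate mu(t) = mu_c + mu_a(t) with mu_a = dE[X_a]/dt, cf. (39).

    Uses an expanded form of (39) that stays finite at t = 0 for a >= 2.
    """
    t = np.asarray(t, dtype=float)
    scale = A * np.exp(-t / tau) * (np.e / ((a - 1) * tau)) ** (a - 1)
    return mu_c + scale * ((a - 1) * t ** (a - 2) - t ** (a - 1) / tau)


def b_max(t, b0=75.0, kappa=0.6, t_half=150.0):
    """Hyperbolic ratio (collapsing) upper threshold, cf. (40); b_min = -b_max."""
    t = np.asarray(t, dtype=float)
    return b0 * (1.0 - kappa * t / (t + t_half))
```

For Case III one would pass smaller values of `tau`. The growth of |µ′| for small τ can be inspected numerically, for example with `np.gradient(dmc_drift(t, tau=5.0), t)` on a time grid `t`.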
Fig. 6. Visualization of the results for Case I: Comparison of the error versus time step ∆t (upper row) and the computational time required to reach the error ϵ
(lower row). The dashed lines show the complexity.
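Remark 3 and the error measure (41) can be checked on a density whose CDF is known. The sketch below is our own illustration, and the exponential test density is an arbitrary choice. It compares the cumsum-based box rule with the trapezoidal rule: halving ∆t should roughly halve the box-rule error and quarter the trapezoidal error.

```python
import numpy as np


def cdf_box(pdf, dt):
    """First-order box rule: dt * cumsum(PDF), as described in Remark 3."""
    return dt * np.cumsum(np.asarray(pdf, dtype=float))


def cdf_trapezoid(pdf, dt):
    """CDF(n) = dt/2 PDF(0) + dt * sum_{i=1}^{n-1} PDF(i) + dt/2 PDF(n)."""
    pdf = np.asarray(pdf, dtype=float)
    cdf = np.zeros_like(pdf)
    cdf[1:] = np.cumsum(0.5 * dt * (pdf[1:] + pdf[:-1]))
    return cdf


def relative_max_error(cdf, cdf_exact):
    """Relative maximum-norm error, cf. (41)."""
    return np.max(np.abs(cdf - cdf_exact)) / np.max(np.abs(cdf_exact))


def errors(dt, rate=1.0, t_end=10.0):
    t = np.arange(0.0, t_end + dt / 2, dt)
    pdf = rate * np.exp(-rate * t)          # test density with known CDF
    exact = 1.0 - np.exp(-rate * t)
    return (relative_max_error(cdf_box(pdf, dt), exact),
            relative_max_error(cdf_trapezoid(pdf, dt), exact))


for dt in (0.1, 0.05, 0.025):
    box, trap = errors(dt)
    print(f"dt={dt:<6} box={box:.2e} trapezoid={trap:.2e}")
```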
It is clearly evident that the SE method for approximating the SDE is not competitive, particularly as the C++ implementation runs on a server with 64 parallel threads, whereas the other three approaches are sequential. Even though the integral equation and the KFE approach have a slight advantage over random walks, the differences are not substantial. Both, as well as fast-dm (which would behave here like the KFE approach), provide an accuracy of 0.1% in less than a second.

Case II. This test with both time-dependent thresholds and drift rate will show that all methods except the KFE discretization developed here suffer from reduced accuracy. We have argued in Section 2.3.1 that applying the KBE to time-dependent problems calls for multiple solutions and an increased effort. Hence, we do not consider the KBE. As noted in Diederich and Oswald (2016, Section 5.3), the random walk approach suffers from stability problems if time-dependent thresholds are considered, and it shows spikes in the PDFs whenever the threshold b(t) crosses a mesh layer of size ∆x. In Appendix A.2, we present our approach to remove these spikes. However, as noted in Diederich and Oswald (2016), the convergence rate drops to O(∆x). Since the parabolic time step restriction ∆t = O(∆x²) must still be satisfied, the random walk approach is not a balanced discretization and thus yields a non-optimal efficiency (see also Diederich & Busemeyer, 2003; Diederich & Oswald, 2016).

The KFE also calls for adjustments along the thresholds if these are time-dependent. In Shinn et al. (2020), the authors modified the space discretization in a way similar to the approach noted in Diederich and Oswald (2016). They also report an oscillatory behavior that is cured by replacing the second-order Crank–Nicolson method with the first-order backward Euler discretization. The benefit over random walks is that the implicit backward Euler method is unconditionally stable, so no time-step restriction must be satisfied.

In Appendix A.1, we introduce a more subtle modification of the KFE method for time-dependent thresholds that is based on a transformation of the problem onto one with fixed thresholds. Although mathematically more challenging, this approach gives accurate results without compromising efficiency. This trick is well established when handling PDEs, and the same transformation has been applied to the KBE by Boehm et al. (2021).

Fig. 7 visualizes the resulting accuracy for estimating the PDFs of reaching the upper threshold b(t) and the corresponding computational efficiency. The upper panel shows the approximation properties of the different approaches, and the results support the theoretical predictions. The integral method and the KFE approaches maintain the approximation properties they show for time-independent problems. The random walk method, by contrast, falls back to linear convergence with respect to ∆x. By the time step restriction ∆t ≈ ∆x², this corresponds to $O(\Delta t^{1/2})$ convergence with respect to time. The lower panel shows the computational time required to reach the accuracy ϵ. A comparison with the results in the lower row of Fig. 6 shows the advantage of the modified KFE method, which is the only method that attains optimal complexity for time-dependent parameters.

So far, we have seen that when all parameters are time-independent (Case I), the SDE approach is much less efficient than the other discretizations based on the integral equation or the PDEs, which are hardly distinguishable. Yet, with time-dependent parameters (Case II), the random walk approach becomes noticeably less efficient than the KFE approach. Here, both thresholds and drift rate are time-dependent simultaneously; however, the drop in accuracy of the random walk discretization is due only to the time dependence of the thresholds. In fact, Diederich and Oswald (2016) and also Shinn et al. (2020) have shown that time-dependent drift rates alone do not cause any problems.

Case III. Time-dependent drift rates with a non-vanishing derivative at the initial time, µ′(0) ≠ 0, lead to excessive discretization errors. Strictly speaking, however, this effect is not due to a special shape of the drift rate, nor to the non-vanishing derivative at time t = 0, but only to the maximum derivative over the whole interval. It is well known that the deterministic Euler method applied to the drift problem without diffusion, dX(t) = µ(t) dt, yields an approximation with an error behaving like

$$ |X_n - X(t_n)| \le C\cdot|\Delta t|\cdot\max_{t\in[0,1000]}|\mu'(t)|, $$
Fig. 7. Visualization of the results for Case II: Comparison of the error versus time step ∆t (upper panel) and the computational efficiency, that is, required
computational time for reaching the error tolerance ϵ (lower panel).
Fig. 8. Visualization of the results of Case III: Comparison of the performance of all approaches using time step size ∆t = 1 for decreasing values of τ , which
corresponds to an increasing derivative of the drift rate |µ′ (t)|. The upper row shows the error in the CDFs for reaching the thresholds as a function of τ . The
lower row shows the PDFs for τ = 10. The gray lines are the PDFs as derived from a highly resolved discretization. Note that the explicit random walk violates the
hyperbolic time step condition ∆t ≤ ∆x/2 · µ(t) (for details, see Appendix B), which results in negative probabilities for the explicit random walk approach. For the
lower threshold the results obtained with the integral equation method are indistinguishable from the exact solution. (Note that the SE method is not included in
the lower row plots).
that depends on the maximum derivative of the drift rate over the complete interval (see Hairer, Nørsett, & Wanner, 2008, Theorem 3.1). The same error estimate applies to the random walk approach, which is also based on the forward Euler method. The KFE simulations are based on the second-order trapezoidal rule, and thus the error should behave as |∆t|² · max_{t∈[0,1000]} |µ′′(t)|, which involves the second derivative of µ(t). The special form of µ(t) (via (39)) yields the dependencies |µ′| = O(τ⁻¹) and |µ′′| = O(τ⁻²), so small values of τ lead to large derivatives and larger errors.

Fig. 8 visualizes the error in reaching the thresholds depending on the choice of τ, that is, the parameter that determines the peak time of the Gamma function in DMC (Ulrich et al., 2015). The impact of small values of τ is clearly visible. For random walks, too small values of τ lead to a violation of the hyperbolic time step restriction

$$ \Delta t \le \frac{\Delta x}{2}\,\mu(t), $$

which results in an unstable discretization that even gives negative probabilities. The second-order implicit discretization used for the KFE computations has no time-step restriction. However, a dominant drift leads to what is called a singularly perturbed transport–diffusion problem (Johnson, 2009, Chapter 9). For such problems, central differences (as used in fast-dm and in our mapped KFE approach) require a restriction on the spatial step size

$$ \Delta x \le \frac{\sigma}{2\max|\mu|}. $$
Fig. 9. Visualization of the results for Case IV: Comparison of the error versus time step ∆t (upper panel) and the computational efficiency, that is, required
computational time for reaching the error tolerance ϵ (lower panel).
If this condition is violated, implicit discretizations might also produce oscillatory solutions with negative probabilities. Stability can be preserved by modifying the spatial discretization, for example, by using an upwind method instead of central differences (Johnson, 2009, Chapter 9). This comes at the cost of reduced convergence, O(|∆x|) instead of O(|∆x|²). Fig. 8 shows large differences in the approximation properties of random walks, the KFE, and the integral equation method. In particular, the error of the integral method is far below the errors of the other two approaches (and, of course, far below the error of the stochastic simulation). However, all calculations were performed at the very small step size ∆t = 1 ms, and the picture changes when simulation times are taken into account. A simulation with random walks needs about 0.0075 s, whereas it takes 0.058 s for the KFE and 0.72 s for the integral method.

Case IV. In this test case, the initial distribution is a Dirac distribution. In terms of the PDE methods, this means that the initial value is a highly irregular function and the solution has a singularity at time t = 0. As discussed in Boehm et al. (2021), such irregular initial values might lead to problems in the discretization and an increased error. For the KBE, Boehm et al. (2021) suggested splitting the PDE into a regular part, which can be easily discretized with standard finite difference approaches, and an irregular part, which is approximated by a fast converging sequence. Another reason to consider this test case is the construction of the integral equation method, which benefits from concentrated initial values, because here a discretization of the location variable can be omitted.

In Fig. 9 we show the approximation error and the computational efficiency for the integral equation method, random walks, and the KFE. The results seem surprising at first glance, as they do not reflect the assumed difficulties of the KFE method due to the irregular initial distribution. In the following, we explain this behavior.

The KFE, as a simple linear parabolic problem, has a smoothing property (Evans, 2010, Section 2.3). Even if the initial value p0(x) is irregular (e.g., a Dirac as considered here), the solution p(x, t) is smooth for all t ≥ t0 > 0. This means that the solution is differentiable and easy to approximate as long as we "stay away from zero".

A numerical time-stepping scheme must also have this smoothing property. The Crank–Nicolson method does not have this smoothing property and requires additional stabilization. This can be achieved either by the variant discussed in Appendix B or, even better, by a modification called Rannacher time-marching (Rannacher, 1984): The first couple of steps (usually two or four) are replaced by approximations with the implicit backward Euler discretization. It can then be shown that this modified Crank–Nicolson approximation applied to rough initial data gives the error estimate

$$ \|p(t_n) - p_n\| \le \frac{\Delta t^2}{t_n^2}. \tag{42} $$

The error can still "explode" for tn → 0. The convergence is, however, of second order for all positive times. This approach has been refined by Giles and Carter (2006) and extended to Dirac initial values. They suggest replacing the first Crank–Nicolson step by four backward Euler steps with step size ∆t/4. In the context of the drift-diffusion problem, the error singularity for tn → 0 indicated by (42) plays no role. The solution p(x, t) is evaluated only along the thresholds, and high-frequency errors are damped before they reach them. The Rannacher time-marching procedure is the standard method in computational finance for approximating the Black–Scholes equation, a variant of the KBE. The analysis for the simplest case, corresponding to zero drift and constant thresholds, is given in Rannacher (1982), and the general case that directly applies to the KFE is discussed in Rannacher (1984).

In Fig. 10 we demonstrate that such stabilizing modifications are not even required for the typical application of the KFE. In the lower row, we show the smooth results for Rannacher time-marching. The time step is chosen as ∆t = 1, and the spatial step size as ∆x = 3.75 on the left and ∆x = 1.875 on the right. Although the Crank–Nicolson method (upper row) shows highly oscillating errors, these are quickly damped. This is why the non-stabilized Crank–Nicolson method also provides the full approximation order. The probabilities are only needed at the thresholds −b(t) and b(t). By the time the information from the initial value has substantially reached these margins, the standard Crank–Nicolson method has sufficiently smoothed out the oscillations caused by the initial disturbance. The explicit Euler scheme, which is the basis of the random walk approach, is only conditionally stable, requiring ∆t ≤ ∆x²/σ².
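The Giles–Carter variant of Rannacher time-marching can be written down compactly for constant drift and fixed thresholds. The NumPy sketch below is our own illustration, not the paper's transformed KFE code. It uses central differences in space and replaces the first Crank–Nicolson step by four backward Euler steps of size ∆t/4. It starts from a discrete Dirac at x = 0 and reads off the outward threshold fluxes (σ²/2)|∂x p| with one-sided second-order differences. The function name and these choices are assumptions. Dense matrices are used only because the illustrative problem is tiny.

```python
import numpy as np


def kfe_rannacher(mu, sigma, b, dx, dt, t_max):
    """Hypothetical helper: KFE dp/dt = sigma^2/2 p_xx - mu p_x on (-b, b) with
    p(+-b, t) = 0 and a Dirac initial value at x = 0. Crank-Nicolson in time,
    with the first step replaced by four backward Euler steps of size dt/4.
    Returns t, the first-passage densities at +b and -b, and the final p.
    """
    M = int(round(2 * b / dx))              # nodes x_0 = -b, ..., x_M = +b (M even)
    n = M - 1                               # interior unknowns p_1, ..., p_{M-1}
    I = np.eye(n)
    off = np.ones(n - 1)
    D2 = (np.diag(-2.0 * np.ones(n)) + np.diag(off, 1) + np.diag(off, -1)) / dx**2
    D1 = (np.diag(off, 1) - np.diag(off, -1)) / (2 * dx)
    L = 0.5 * sigma**2 * D2 - mu * D1       # central differences in space

    def step_matrix(theta, h):
        # theta-scheme: (I - theta*h*L) p_new = (I + (1 - theta)*h*L) p_old
        return np.linalg.solve(I - theta * h * L, I + (1 - theta) * h * L)

    first = np.linalg.matrix_power(step_matrix(1.0, dt / 4), 4)  # 4 x backward Euler
    cn = step_matrix(0.5, dt)                                    # Crank-Nicolson

    p = np.zeros(n)
    p[n // 2] = 1.0 / dx                    # discrete Dirac: unit mass at x = 0
    c = sigma**2 / (4 * dx)
    n_steps = int(round(t_max / dt))
    f_up = np.zeros(n_steps + 1)
    f_lo = np.zeros(n_steps + 1)
    for k in range(1, n_steps + 1):
        p = (first if k == 1 else cn) @ p
        # outward fluxes -(sigma^2/2) p_x at +b and +(sigma^2/2) p_x at -b,
        # using one-sided second-order differences with p = 0 on the boundary
        f_up[k] = c * (4 * p[-1] - p[-2])
        f_lo[k] = c * (4 * p[0] - p[1])
    return dt * np.arange(n_steps + 1), f_up, f_lo, p
```

Integrating `f_up` and `f_lo` over time with the trapezoidal rule, as recommended in Remark 3, gives the choice probabilities. Setting `first = cn` reproduces the non-stabilized Crank–Nicolson variant for comparison.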
Fig. 10. Case IV: Smoothing property for a Dirac initial distribution. We show the solution p(x, t) on a small subset of the domain for time t ∈ [0, 10] and
x ∈ [−25, 25]. The oscillations have no effect along the thresholds as the solution is already smoothed out there.
4.1. Summary and discussion of the numerical study

The results of the numerical study show that, in principle, all methods and discretizations are suitable for efficiently determining the PDF of the first-passage time. As one could expect, the only exception was the SE method, which clearly fell behind the other methods regarding accuracy and efficiency.

Yet, some differences also need to be noted for the other methods and discretizations. In terms of accuracy, the KFE, random walks, and the integral equation method are comparable if the discretization parameters, that is, the time step size ∆t and the space step size ∆x, are chosen optimally. However, this picture changes when time-dependent thresholds are used. In this case, only the KFE and the integral equation method maintain their accuracy.

A second aspect is the efficiency, that is, the computing time required to achieve the desired accuracy. For a smooth initial distribution, only the KFE is optimal, with an effort that scales linearly in 1/ϵ. For the other methods, the computational time increases faster as the accuracy requirements increase. For the integral equation method, however, this reduced efficiency is marginal and only of theoretical importance.

In the next section, we use the (transformed) KFE method to fit parameters to empirical data.

5. Data fitting

In this section, we apply the KFE discretization developed in the present article to fit DMC (Ulrich et al., 2015), a diffusion model with a time-dependent drift rate, to empirical data. As the thresholds are time-independent, the KFE method described in Shinn et al. (2020) is also applicable without any restriction.

Strictly speaking, we have so far only considered the decision part of a response, and thus the PDF derived from the considered methods is that of the decision time. However, RTs properly comprise perceptual and motor response parts as well. These two contributions are typically summarized as the residual (or non-decision) time. DMC assumes the residual time to be normally distributed with expected value µR and standard deviation σR, and this residual time is added to the decision time to yield RTs. Thus, the PDF of RTs in DMC is the convolution of the PDF of the decision time and a normal distribution.

5.1. Simon and Eriksen task data from Ulrich et al. (2015)

Ulrich et al. (2015) reported an experiment with n = 16 participants performing both a Simon task and an Eriksen flanker task. Briefly, the letters H and S served as the imperative stimuli requiring a left or right response. In the Simon task, the letters were presented to the left or right of the fixation cross, and a trial was congruent when the stimulus and the correct response were on the same side. In the flanker task, the imperative stimulus (i.e., the target) appeared centrally, and two letters appeared on its left and right side as flankers. A trial was congruent if the target and flankers were the same letters.

Overall, a congruency effect was observed for RTs and error rates, and it was roughly of the same size for both tasks. However, the flanker congruency effect became larger with longer RTs (i.e., a positive delta function), and the Simon congruency effect became smaller with longer RTs (i.e., a negative delta function). These results are visualized as filled circles in Fig. 11. Accordingly, the two tasks differed in their estimated values of the peak of the Gamma function, that is, the Simon data yielded a smaller value of τ.
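Adding a normally distributed residual time amounts to a discrete convolution on the time grid. The sketch below is a minimal illustration under our own assumptions: the function name, a uniform grid tn = n∆t starting at zero, and truncation of the result to the original grid length. The normal density's mass below zero is ignored, which is negligible when µR lies several σR above zero.

```python
import numpy as np


def rt_density(decision_pdf, dt, mu_r, sigma_r):
    """Hypothetical helper: RT density = decision-time PDF convolved with a
    normal residual-time density N(mu_r, sigma_r^2), both sampled at n*dt."""
    decision_pdf = np.asarray(decision_pdf, dtype=float)
    t = dt * np.arange(decision_pdf.size)
    residual = np.exp(-0.5 * ((t - mu_r) / sigma_r) ** 2) / (sigma_r * np.sqrt(2 * np.pi))
    # (f * g)(t_n) ~ dt * sum_k f(t_k) g(t_n - t_k); keep only t_n on the input grid
    rt = dt * np.convolve(decision_pdf, residual)[: decision_pdf.size]
    return t, rt
```

`np.convolve` costs O(N²) operations. For long time grids, `scipy.signal.fftconvolve` computes the same discrete convolution faster.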
Fig. 11. Visualization of the observed data provided by Ulrich et al. (2015) (filled circles) separately for congruent (blue) and incongruent (orange) trials. The solid
lines are the estimated CDFs that result from the fitting procedure described in Section 5.2. The computations are obtained with the KFE using the time step size
∆t = 1.25 ms (see also Tables 1 and 2).
Table 1
Fitting results for the Simon task: The table shows the dependency of the identified parameters on the discretization parameter ∆t (with ∆x = ∆t) and provides the required computational time in the last column. The results are in good agreement with the ones published in Ulrich et al. (2015), given here in the last row labeled reference.
∆t (ms)    σ     a   b     µR     σR    A     µc    τ     α     Time
5          4.25  2   52.4  321.0  30.3  21.0  0.63  49.8  1.99  44 s
2.5        4.25  2   54.1  321.1  29.9  19.0  0.62  48.1  2.06  90 s
1.25       4.25  2   58.8  319.2  33.0  14.5  0.69  46.9  1.95  188 s
0.625      4.25  2   58.1  321.2  33.4  14.6  0.69  50.2  1.73  434 s
0.3125     4.25  2   57.2  322.4  35.7  15.0  0.69  45.0  1.97  909 s
reference  4.25  2   54.6  322.8  38.6  16.0  0.69  34.9  2.80  –
Table 2
Fitting results for the Eriksen flanker task: The table shows the dependency of the identified parameters on the discretization parameter ∆t (with ∆x = ∆t) and provides the required computational time in the last column. The results are in good agreement with the ones published in Ulrich et al. (2015), given here in the last row labeled reference.
∆t (ms)    σ     a   b     µR     σR    A     µc    τ      α     Time
5          3.98  2   49.9  336.6  29.0  20.2  0.56  57.4   2.07  48 s
2.5        3.98  2   52.4  335.2  30.4  18.7  0.57  55.0   2.13  92 s
1.25       3.98  2   54.6  334.6  30.0  19.2  0.59  52.8   1.98  182 s
0.625      3.98  2   45.4  339.6  28.9  17.0  0.44  90.3   1.77  459 s
0.3125     3.98  2   49.1  338.1  27.4  17.8  0.50  96.3   1.27  949 s
reference  3.98  2   51.3  331.8  36.6  19.2  0.69  118.3  2.15  –
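The dependence on ∆t in Tables 1 and 2 can be quantified with a few lines of code. The sketch below copies the Simon-task rows of Table 1 and reports, for each refinement step, the ratio of successive runtimes and the largest relative change among the parameters b, τ, and α; the selection of parameters and the helper name are our own choices for illustration. With ∆x = ∆t the number of space-time unknowns grows by a factor of four per halving, yet the recorded runtimes only roughly double, which is consistent with the better-than-expected scaling discussed in Appendix C.1.

```python
# Simon-task rows of Table 1: (dt in ms, b, tau, alpha, runtime in s).
TABLE1_SIMON = [
    (5.0, 52.4, 49.8, 1.99, 44),
    (2.5, 54.1, 48.1, 2.06, 90),
    (1.25, 58.8, 46.9, 1.95, 188),
    (0.625, 58.1, 50.2, 1.73, 434),
    (0.3125, 57.2, 45.0, 1.97, 909),
]


def refinement_report(rows):
    """For each refinement: (fine dt, runtime ratio, largest relative change of b, tau, alpha)."""
    report = []
    for coarse, fine in zip(rows, rows[1:]):
        runtime_ratio = fine[4] / coarse[4]
        rel_changes = [abs(f - c) / abs(c) for c, f in zip(coarse[1:4], fine[1:4])]
        report.append((fine[0], runtime_ratio, max(rel_changes)))
    return report


for dt, ratio, max_rel in refinement_report(TABLE1_SIMON):
    print(f"dt = {dt:7.4f} ms: runtime x{ratio:.2f}, largest relative change {max_rel:.1%}")
```

The same report, computed for three successive discretizations of one's own fit, is a simple way to carry out the check recommended in Section 6.2.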
5.2. Fitting procedure and results

To fit the parameters to the data, we simplified the procedure of Ulrich et al. (2015) and minimized the weighted least-squares error of the CDF measured in 20 equally spaced bins, but did not additionally identify the conditional accuracy function (which can be derived from the full CDF). The problem parameters are similar to those of Case II described in Section 4. In particular, the drift rate is parameterized as described in (39). The exponent was fixed at a = 2. The thresholds are time-independent. The diffusion constant was fixed to the values provided by Ulrich et al. (2015, Table 2), that is, σ = 4.25 for the Simon task and σ = 3.98 for the Eriksen flanker task. The parameter σ cannot be identified along with the threshold b and the drift rate µ, as the resulting system would be under-determined; see Ulrich, Schröter, Leuthold, and Birngruber (2016) and also Remark 4 in Appendix A.1. The initial distribution is a Beta distribution with parameter α. This parameter α and all further parameters, that is, the threshold b, the residual time parameters µR and σR, as well as µc, A, and τ, are left free for the optimization procedure, which is a simple Nelder–Mead solver as implemented in Python's SciPy library.
The identified parameters confirm those reported by Ulrich et al. (2015), however, with slight differences in the exact values (particularly regarding α and τ). To understand the possible origin of this discrepancy, we focus on the dependency of the optimization results on the accuracy of the computations. Tables 1 and 2 list the identified parameters for discretizations of increasing fineness. The last row of each table gives the reference parameters as identified by Ulrich et al. (2015). While the identified parameters still change drastically for ∆t > 1 ms, they converge for finer discretizations. The last column of both tables indicates the complete time used for the optimization procedure. In both cases, Simon and Eriksen flanker, the solver required about 1 000 evaluations of the objective functional, corresponding to 1 000 approximations of the underlying KFE.
In Fig. 6, we can see that to obtain the accuracy the KFE reaches at ∆t ≈ 1 ms, the random walk approach would require ∆t ≈ 0.1 ms, and stochastic simulations would even require ∆t ≪ 0.01 ms, with a corresponding increase in computational time. In particular, the SE method is not feasible for obtaining a PDF accurate enough for parameter identification.

6. Conclusion

The present study's purpose was to compare various approaches to approximate the first-passage time distribution, with
a focus on contemporary diffusion models with time-dependent parameters, usually time-dependent drift rates or time-dependent thresholds. On the one hand, we discussed different methods for obtaining the PDF of the first-passage times of a diffusion process; on the other hand, we discussed the corresponding discretizations of these methods.
The simplest method is the SE, which simulates single trials and whose repeated application yields the PDF. In contrast, applying Itô's lemma leads to the two Kolmogorov equations, two PDEs that can also be used to approximate the PDF. Alternatively, the SDE can be reformulated as an integral equation.
The SE method is the most common and flexible discretization technique for SDEs. Although it is well established and has been used often, the present paper shows that there are always better alternatives that provide a good approximation far more efficiently, at least for the applications discussed here.
The two PDE approaches differ in several respects. First, they differ in their character: in an explicit method, each time step can be written as a matrix–vector multiplication (e.g., random walks), while implicit methods require solving a linear system of equations (e.g., fast-dm or the method pursued in the present paper). The advantage of implicit methods is that no condition on the spatial and temporal step sizes has to be respected; the discretization is always stable. Second, the methods differ in the approximation order. While the Euler methods converge linearly (halving the step size ∆t halves the error), the trapezoidal rule converges quadratically (halving the step size ∆t quarters the error). The explicit Euler method is used in the case of random walks, and the implicit Euler method is the fallback option in Shinn et al. (2020) when applied to time-dependent thresholds. fast-dm, the KFE method of Shinn et al. (2020) for time-independent thresholds, and our KFE approach based on a remapping of the time-dependent domain use the second-order trapezoidal rule. This gives smaller errors for the same number of time steps, resulting in higher computational efficiency.
In the case of time-independent parameters, the methods based on PDEs (i.e., random walks, fast-dm, KFE, our approach) provide almost the same efficiency. Only the SE method falls far behind and is not competitive. Consider, for instance, Fig. 6. There we show that all PDE methods generate approximations with less than 1% error within 0.01 s of computational time, whereas the SE method requires more than 1 s. If we aim at reducing the error to 0.1%, the PDE approaches still manage to produce the result in less than 0.1 s, whereas the SE method was not able to reach such low errors at all. An extrapolation of the results from Fig. 6 suggests that the SE would take more than 1 000 s. Finally, identifying parameters requires many such computations, in our case about 1 000 runs per fit, so these differences are multiplied accordingly. We now turn to time-dependent parameters.

6.1. The efficiency of different methods to realize DDMs with time-dependent parameters

According to our analyses, especially the discretizations of the integral equation and of the KFE can be used flexibly for applications with time-dependent parameters while simultaneously providing very high accuracy. This, however, requires a suitable modification of the KFE discretization to cope with time-dependent thresholds. The modifications of the finite difference approximations suggested in Diederich and Oswald (2016) or Shinn et al. (2020) were, in fact, not able to fully preserve the approximation property and provide a stable result at the same time. Hence, we describe in Appendix A.1 a transformation of the KFE that allows for approximations with optimal accuracy (and efficiency) in that case as well.
In contrast, if only the drift rate is time-dependent, no further modification is necessary. In this case, all PDE approaches (including random walks) can approximate the diffusion problem with the same efficiency and accuracy.
The integral equation gives a robust approximation with high accuracy. Theoretically, its efficiency is non-optimal, as its effort scales quadratically with the inverse of the time step size. In usual applications, however, the time steps are not small enough for this to matter.

6.2. Advice for practical applications

The analyses in this study indicate which methods can or should be used for specific problems. The most straightforward point in this regard is that it is always advisable to use an approach based on one of the two PDEs or the integral equation. The SE discretization of the SDE, in contrast, should only be the fallback option in cases where no PDE or integral equation is known. In general, the PDE should be discretized with an implicit time-stepping scheme. The additional effort is negligible, as only linear systems with tridiagonal matrices must be solved. The benefits of implicit methods are a discretization that is stable independently of the time and spatial step sizes and the possibility of smooth approximations for rough initial and boundary conditions.
Some differences apply, however, depending on whether the parameters are time-independent or not. If the parameters are time-independent, the KBE approach, as implemented in fast-dm, is optimal and has the additional advantage over the KFE methods that the initial distribution can be chosen a posteriori. If, however, this feature is not required, the KBE and KFE are equivalent. In general, with time-independent parameters, all methods considered in the present study are highly efficient and well-suited, except for the SE method. Even if random walks perform theoretically somewhat worse, this is not very important in practice, since the computation times are short anyway. Furthermore, the random walk stays very close to the mathematical model, which allows a simple and intuitive implementation.
The integral method has the smallest error constant and gives the best results for a given time step size ∆t. This largely compensates for its non-optimal efficiency in applications.
In the case of time-dependent parameters, however, the integral approach and the KFE are more flexible and efficient than the other methods. By adjusting the discretization scheme along the thresholds, the KFE preserves its good accuracy and stability and is the only method with theoretically optimal efficiency in all cases.
Parameter fitting applications should keep the accuracy of the underlying discretization in mind. As shown in Tables 1 and 2, the values of the identified parameters strongly depend on the quality of the approximation. As a rule, parameter fitting should always be repeated on three discretizations of increasing fineness. The result should only be accepted if the parameters identified on these discretizations agree within an acceptable tolerance.

6.3. Outlook

In this paper, we have focused on solving a single SDE dX(t) = f(t) dt + dW(t). However, in some cases, the solution of a system of multiple stochastic differential equations is sought, for example, in the leaky competing accumulator model (Usher & McClelland, 2001). In addition, other application cases do not fall into the framework covered by this article. For instance, Koob, Ulrich, and Janczyk (2023) used the diffusion model in a multitasking context and had a second diffusion process start once the first process exceeded the threshold.
In such cases, there is no immediate transition to a PDE that directly gives the PDF. Instead, one will probably need to continue relying on the SE method to analyze the properties of such a system, which, however, comes with extremely long computation times and reduced accuracy. Here, substantial acceleration would be possible using modern hardware such as accelerator cards (GPUs).

A further point for future development concerns the optimization algorithms. Often, and here as well, simple optimization schemes such as the Nelder–Mead method (Nelder & Mead, 1965) are used in parameter fitting tasks. As gradient-free methods, these are straightforward and versatile, but they converge slowly and require many repeated calculations of the PDF. Alternatively, gradient descent or Newton-type methods, for example, BFGS (Kelley, 1999), can solve the data fitting problem with substantially fewer iterations. On the other hand, these optimization methods require the derivatives of the PDF with respect to the parameters, which are given by further PDEs, so-called adjoint and tangent equations, that must also be approximated. Such an approach is discussed by Hartmann and Klauer (2021) using the integral approximation of the PDF.

Data availability

Data and software for the reproduction of the results are available in a Zenodo repository (Richter et al., 2023) at https://doi.org/10.5281/zenodo.6970739.

Appendix A. Handling time-dependent thresholds

Using time-dependent thresholds $b_{\min}(t)$ and $b_{\max}(t)$ gives rise to several technical difficulties for every method except the SE discretization of the SDE (which, however, is always highly inefficient). The problems of the other approaches result from the domain Ω, which is not of tensor-product type; rather, the state range $b_{\min}(t) < x < b_{\max}(t)$ changes with every t. Diederich and Oswald (2016) tackled this problem by considering a brick-like discretization of the domain, including uniform state steps of size ∆x until the threshold is reached. A technical modification is required if the bricks do not precisely represent the threshold. One such approach is provided in Section 5.3 of Diederich and Oswald (2016), and a brief explanation is provided in Appendix A.2. Although robust results can be achieved, we still observe a drop in accuracy, as discussed in Section 3.5. First, in Appendix A.1, we show how the KFE can be applied to the case of time-dependent thresholds without any loss in approximation quality.

A.1. Solving the Kolmogorov forward equation with time-dependent thresholds in optimal complexity by a reference map

going from the simple rectangular domain (−1, 1) × (0, T) to the real domain (see Fig. 12). Further, we define the probabilities $\hat p(\hat x, t)$ on the rectangular domain as

$$\hat p(\hat x, t) = p(\hat T(\hat x, t), t) = p(\hat x \cdot b(t), t) \quad\Leftrightarrow\quad p(x, t) = \hat p(\hat x, t) = \hat p(x / b(t), t).$$

By the chain rule, we can express the derivatives of p with respect to x̂ and formulate the KFE on the rectangle

$$\partial_t \hat p(\hat x, t) + \Big(\frac{\mu(t)}{b(t)} - \frac{\hat x\, b'(t)}{b(t)}\Big)\,\partial_{\hat x} \hat p(\hat x, t) - \frac{\sigma(t)^2}{2\,b(t)^2}\,\partial_{\hat x\hat x} \hat p(\hat x, t) = 0. \tag{43}$$

This transformed equation can be approximated by standard finite differences on a uniform spatial and temporal mesh without loss of accuracy. Such approaches are called Arbitrary Lagrangian Eulerian (ALE) coordinates and are standard in the handling of free-boundary value problems (Richter, 2017).

Remark 4. The mapping of the system to reference coordinates (43) reveals that drift rate µ(t), thresholds b(t), and diffusion constant σ(t) cannot be identified at the same time, as (43) is equivalent to

$$\partial_t \hat p + \tilde\mu(t)\,\partial_{\hat x} \hat p - \frac{\tilde\sigma(t)^2}{2}\,\partial_{\hat x\hat x} \hat p = 0 \quad\text{with}\quad \tilde\mu(t) := \frac{\mu(t)}{b(t)} - \frac{\hat x\, b'(t)}{b(t)}, \qquad \tilde\sigma(t) := \frac{\sigma(t)}{b(t)}.$$

The equation is therefore determined by only two independent quantities: fixing one of µ(t), b(t), or σ(t) still allows both µ̃(t) and σ̃(t) to be chosen freely. The impossibility to identify threshold, drift rate, and diffusion constant at the same time was also noted by Ulrich et al. (2016).

A.2. Random walks on variable domains

As discussed in Section 5.3 of Diederich and Oswald (2016), the random walk approach must be modified to handle time-dependent thresholds. Without such a modification, the PDFs show spikes whenever the threshold b(t) crosses the next mesh layer of size ∆x (see Fig. 13 for a visualization of this defect). Similar to the procedure described in Diederich and Oswald (2016), we propose a simple modification that is based on splitting those control volumes of size ∆t × ∆x that are cut by the threshold (see Fig. 14) into a fraction of size $\delta_n \Delta x$ that is within the domain and a fraction $(1 - \delta_n)\Delta x$ outside. The transport of probabilities is then achieved in two steps (see Fig. 14 for a visualization). First, we step forward n ↦ n + 1 as described in Section 2.4 in all elements that are within the domain or that are cut by the threshold b(t), that is,

$$P_{n+1,m} = a_{n,-}\,P_{n,m-1} + a_{n,0}\,P_{n,m} + a_{n,+}\,P_{n,m+1}.$$
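The forward step above is a three-point stencil. As a minimal sketch for the special case of time-independent parameters and fixed thresholds ±b, the code below uses the explicit-Euler weights implied by the matrix A in (44) of Appendix B, namely a− = ∆t(σ²/(2∆x²) + µ/(2∆x)), a0 = 1 − ∆tσ²/∆x², and a+ = ∆t(σ²/(2∆x²) − µ/(2∆x)); the coefficients of Section 2.4 may be written differently, and the cut-cell modification for moving thresholds is deliberately left out. Mass arriving at a threshold node is removed and recorded as first-passage probability for that step. The function name and the test problem are hypothetical; the closed form 1/(1 + exp(−2µb/σ²)) for the probability of reaching +b first from x = 0 is the standard result for a Wiener process with drift.

```python
import math


def random_walk_first_passage(mu, sigma, b, dx, dt, t_max, x0=0.0):
    """Explicit (theta = 0) random walk on the grid -b, -b+dx, ..., b with absorbing thresholds.

    Returns (upper, lower, remaining): per-step first-passage probabilities at +b and -b,
    and the probability mass still inside the domain at t_max.
    """
    if dt >= dx**2 / sigma**2:
        raise ValueError("parabolic time-step condition dt < dx^2/sigma^2 violated")
    if abs(mu) > sigma**2 / dx:
        raise ValueError("sign condition |mu| <= sigma^2/dx violated")
    n_nodes = int(round(2 * b / dx)) + 1
    # Stencil weights, i.e. the interior rows of I - dt*A with A from (44).
    a_minus = dt * (sigma**2 / (2 * dx**2) + mu / (2 * dx))  # weight of node m-1 (step up)
    a_zero = 1.0 - dt * sigma**2 / dx**2
    a_plus = dt * (sigma**2 / (2 * dx**2) - mu / (2 * dx))   # weight of node m+1 (step down)
    p = [0.0] * n_nodes
    p[int(round((x0 + b) / dx))] = 1.0  # point mass at the starting point
    upper, lower = [], []
    for _ in range(int(round(t_max / dt))):
        new = [0.0] * n_nodes
        for m in range(1, n_nodes - 1):
            new[m] = a_minus * p[m - 1] + a_zero * p[m] + a_plus * p[m + 1]
        # Threshold nodes only receive mass from their interior neighbour; it is absorbed.
        upper.append(a_minus * p[n_nodes - 2])
        lower.append(a_plus * p[1])
        p = new  # threshold entries of `new` stay zero
    return upper, lower, sum(p)


upper, lower, rest = random_walk_first_passage(mu=0.5, sigma=1.0, b=1.0, dx=0.05, dt=0.001, t_max=10.0)
exact = 1.0 / (1.0 + math.exp(-2 * 0.5 * 1.0 / 1.0**2))
print(f"P(upper) = {sum(upper):.4f} (closed form {exact:.4f}), mass left = {rest:.1e}")
```

The lists `upper` and `lower`, divided by ∆t, approximate the first-passage densities at the two thresholds. The two checks at the top mirror the parabolic time-step requirement and the sign condition stated in Appendix B.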
Fig. 12. Mapping from the fixed reference domain (left) with thresholds (−1, 1) to the real domain with variable thresholds $(b_{\min}(t), b_{\max}(t))$. The KFE is discretized in the fixed domain on the left side, but the solution is transformed into the variable domain on the right.
Fig. 13. Known defect of the random walk approach that causes spikes whenever the time-dependent threshold b(t) crosses a discretization line. The left panel visualizes this for the unmodified random walk, the KFE solution (which does not suffer from this defect), and the simple modification presented in Appendix A.2, which yields rather smooth results. The right panel provides a magnification of the interval t ∈ [25, 50].
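Appendix B argues that implicit steps cost only about twice as much as explicit ones because the matrix A in (44) is tridiagonal, so each step can be solved with the Thomas algorithm in O(M) operations. A minimal sketch of one θ-step (I + θ∆tA)p_n = (I + (θ − 1)∆tA)p_{n−1} is given below, assuming time-independent µ and σ, node values that vanish at both thresholds (absorbing), and the interior rows of (44) only; the function names and the test problem are hypothetical. The mean first-passage time computed from the survival mass is compared with the standard closed form (b/µ)·tanh(µb/σ²) for a Wiener process with drift started midway between thresholds ±b.

```python
import math


def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system in O(n) (Thomas algorithm); lower[0] and upper[-1] are unused."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0] if n > 1 else 0.0
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):  # forward elimination
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def theta_step(p, mu, sigma, dx, dt, theta):
    """Solve (I + theta dt A) p_new = (I + (theta - 1) dt A) p on the interior nodes.

    A holds the interior rows of (44); p is zero at both thresholds (absorbing).
    """
    a1 = -sigma**2 / (2 * dx**2) - mu / (2 * dx)
    a2 = sigma**2 / dx**2
    a3 = -sigma**2 / (2 * dx**2) + mu / (2 * dx)
    n = len(p)
    rhs = []
    for m in range(n):
        left = p[m - 1] if m > 0 else 0.0
        right = p[m + 1] if m < n - 1 else 0.0
        rhs.append(p[m] + (theta - 1.0) * dt * (a1 * left + a2 * p[m] + a3 * right))
    return thomas([theta * dt * a1] * n, [1.0 + theta * dt * a2] * n, [theta * dt * a3] * n, rhs)


# Usage: constant drift, thresholds at -b and +b, start at x = 0.
mu, sigma, b, dx, dt = 0.5, 1.0, 1.0, 0.05, 0.01
theta = 0.5 + dt  # modified trapezoidal rule of Appendix B
n_interior = int(round(2 * b / dx)) - 1  # nodes strictly between the thresholds
p = [0.0] * n_interior
p[n_interior // 2] = 1.0  # point mass at x = 0
survival = [1.0]
for _ in range(int(round(10.0 / dt))):  # integrate up to t = 10
    p = theta_step(p, mu, sigma, dx, dt, theta)
    survival.append(sum(p))
# Mean decision time from the per-step first-passage probabilities S_{n-1} - S_n.
mean_rt = sum(n * dt * (survival[n - 1] - survival[n]) for n in range(1, len(survival)))
print(f"mean first-passage time {mean_rt:.4f}, closed form {(b / mu) * math.tanh(mu * b / sigma**2):.4f}")
```

Note that ∆t = 0.01 here is four times larger than the explicit limit ∆x²/σ² = 0.0025, which illustrates why implicit stepping is attractive despite the linear solve.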
Fig. 14. Modification of the random walk approach to time-dependent thresholds. First, probabilities are transported to the next time step via the usual factors $p^n_l$, $p^n_m$, and $p^n_u$. Then, the cut elements are modified. Only $\delta_{n+1}$ of the probability remains and $(1 - \delta_{n+1})$ is transported across the threshold.

Appendix B. Stability and efficiency of time-stepping methods

In this appendix, we briefly consider different time discretizations of the KFE with time-independent parameters. By $p_n = (p_{n,0}, \dots, p_{n,M}) \in \mathbb{R}^{M+1}$ we denote the probabilities at discrete time $t_n$. Then, one step of all PDE methods can be uniformly written as

$$p_n + \theta\,\Delta t\,A\,p_n = p_{n-1} + (\theta - 1)\,\Delta t\,A\,p_{n-1},$$

with a parameter θ ∈ [0, 1] and a matrix A given as

$$A = \begin{pmatrix}
1 & 0 & 0 & 0 & \cdots & 0\\
\alpha_1 & \alpha_2 & \alpha_3 & 0 & \cdots & 0\\
0 & \alpha_1 & \alpha_2 & \alpha_3 & \ddots & \vdots\\
\vdots & \ddots & \ddots & \ddots & \ddots & 0\\
0 & \cdots & 0 & \alpha_1 & \alpha_2 & \alpha_3\\
0 & \cdots & 0 & 0 & 0 & 1
\end{pmatrix},
\qquad
\begin{aligned}
\alpha_1 &= -\frac{\sigma^2}{2\Delta x^2} - \frac{\mu}{2\Delta x},\\
\alpha_2 &= \frac{\sigma^2}{\Delta x^2},\\
\alpha_3 &= -\frac{\sigma^2}{2\Delta x^2} + \frac{\mu}{2\Delta x}.
\end{aligned}
\tag{44}$$

The matrix is tridiagonal, meaning that each row has only the value α2 on the diagonal, α1 to the left of it, and α3 to the right of it (the first and the last rows differ due to the thresholds). For θ = 0, this is the explicit Euler method that yields the random walk approach, and the method is called explicit because the new probability vector $p_n$ is given by a simple matrix–vector product (I is the unit matrix in $\mathbb{R}^{M+1}$):

$$p_n = (I - \Delta t\,A)\,p_{n-1}.$$

The method is implicit whenever θ > 0, which means that a linear system of equations has to be solved in each step:

$$(I + \theta\,\Delta t\,A)\,p_n = (I + (\theta - 1)\,\Delta t\,A)\,p_{n-1}.$$

For θ = 1, this is the implicit Euler method used by Shinn et al. (2020) (for time-dependent thresholds), and for θ = 1/2, this is the trapezoidal rule, sometimes also called the Crank–Nicolson method, which is used in the present paper and by fast-dm (for
the KBE). While it is generally more costly to solve an implicit equation, the tridiagonal form of the matrix A makes it simple here, and the effort of the implicit methods using the Thomas algorithm (Quarteroni et al., 2007, Section 3.7.1) is just twice the effort of the explicit ones.
To obtain stable solutions without any oscillations, the methods have to satisfy certain conditions on their parameters. Most importantly, with (44), the explicit Euler method must fulfill the parabolic time-step requirement

$$1 - \Delta t\,\alpha_2 > 0 \quad\Leftrightarrow\quad \Delta t < \frac{\Delta x^2}{\sigma^2}.$$

Further, all methods must satisfy the sign condition

$$1 + \theta\,\Delta t\,\alpha_2 > 0 \quad\text{and}\quad \theta\,\Delta t\,\alpha_i \le 0 \quad\text{for } i = 1, 3.$$

While the first condition is satisfied for all θ ≥ 0 and ∆t > 0, the second condition (for θ > 0) is only satisfied when

$$|\mu| \le \frac{\sigma^2}{\Delta x}.$$

Violation of this condition means that the problem is too strongly drift-dominated, which leads to the instabilities described in Case III in Section 4.
Finally, implicit methods are stable for all time steps when θ ≥ 1/2, which includes the implicit Euler method and the trapezoidal rule. However, the limit case θ = 1/2 is still not sufficiently stable to damp oscillations coming from rounding errors, which are unavoidable in computer simulations. At the same time, enhanced stability and second-order convergence are given for the choice θ = 1/2 + ∆t. This modified trapezoidal rule is used in our KFE approach (see also Luskin et al., 1982).

Appendix C. Realization in Python

C.1. Efficient implementation

In this Appendix, we briefly comment on the better-than-expected performance of several approximation methods used in Section 4. For instance, both random walks and the Crank–Nicolson discretization of the KFE presented in Fig. 6 show a complexity that is by far superior to the postulated $O(\epsilon^{-3/2})$ in the case of random walks and $O(\epsilon^{-1})$ for the KFE. The same is true for the integral equation method, which exhibits the expected $O(\epsilon^{-3/2})$ complexity only for very small step sizes. The reason for this lies in using NumPy (Harris et al., 2020) for all numerical operations acting on matrices and vectors. We explain this behavior for the integral equation method. For simplicity, we show the single-threshold case (compare (20)):

$$P_n(x) = -2\,\Psi_0^n(x) + 2\,\Delta t \sum_{k=1}^{n-1} P_k(x)\,\Psi_k^n, \qquad n = 1, 2, \dots, N. \tag{45}$$

Here, $P_n(x)$ is the probability at time $t_n$ for the initial value x and

$$\Psi_0^n(x) = \Psi[b(t_n), t_n \mid x, 0], \qquad \Psi_k^n = \Psi[b(t_n), t_n \mid b(t_k), t_k], \qquad n = 1, \dots, N,\; k = 1, \dots, n-1. \tag{46}$$

The effort lies in the repeated evaluation of the kernel function (4) for each n, k, and x. The sum over k in (45) brings along a quadratic scaling in $O(1/\Delta t^2)$. However, if we treat $P \in \mathbb{R}^{(N+1)\times(M+1)}$ and $\Psi^n \in \mathbb{R}^{(M+1)\times(N+1)}$ as matrices, the entries of (46) can be computed by one single call to NumPy. Then, the sum is given by a matrix–vector product

$$P_n = -2\,\Psi_0^n + 2\,\Delta t\,\Psi^n\, P\big|_{1,\dots,n-1}.$$

The benefit lies in the high-speed processing of the compiled backend. For moderate vector sizes, the actual NumPy computations require less time than the overhead of calling the NumPy backend from Python. This is the reason behind the apparently linear (instead of quadratic) scaling of (45) for moderate values of n. While all implementations would benefit from direct coding in C or C++, the advantage is only small compared to using NumPy.

C.2. Implementation of the KBE in the time-dependent setting

In Section 2.3.1, we detailed the disadvantage of the KBE for time-dependent problems (see also Boehm et al., 2021, for a discussion). To determine the whole distribution up to the final time $T = t_N$, N problems of type (12) must be approximated up to $t_1, t_2, \dots, t_N$, respectively. This would result in a quadratic scaling with respect to 1/∆t. While this non-optimal scaling cannot be circumvented in the general case, we sketch the idea for an efficient implementation based on the fast NumPy backend discussed in the previous section. Denoting by $p^s$ for $s \in \{1, \dots, N\}$ the solution for final time $t_s$, a Crank–Nicolson discretization calls for the solution of the linear systems backward in time from $t_s$ to $t_0$:

$$s = 1, 2, \dots, N: \qquad A_{s-n}\,p^s_n = B_{s-n}\,p^s_{n-1}, \qquad n = 1, \dots, s, \tag{47}$$

where the matrices $A_{s-n}, B_{s-n} \in \mathbb{R}^{M\times M}$ come from the discretization of the state space. Eq. (47) reveals the quadratic (in $N = O(1/\Delta t)$) effort with ever-changing matrices that need to be reassembled. By introducing the reversed index r = s − n, we can rewrite (47) as forward-in-time problems from $t_0$ to $t_r$:

$$r = N-1, \dots, 1, 0: \qquad A_r\,p^{r+n}_n = B_r\,p^{r+n}_{n-1}, \qquad n = 1, \dots, N - r. \tag{48}$$

Now, the matrices $A_r, B_r$ are reused to solve N − r linear systems each. Eq. (48) can be written as a sequence of linear matrix equations

$$r = N-1, \dots, 1, 0: \qquad A_r\,\big[p^{r+1}_1, p^{r+2}_2, \dots, p^{N}_{N-r}\big] = B_r\,\big[p^{r+1}_0, p^{r+2}_1, \dots, p^{N}_{N-r-1}\big].$$

After N steps, the solution matrix $(p^1_1, p^2_2, \dots, p^N_N)$ contains the solutions to the KBE for all final times $t_1, t_2, \dots, t_N$. Formally, the complexity has not changed, but the impact on a practical implementation is substantial, as NumPy solves each matrix equation with one call within the compiled backend. The effective runtime will appear linear in N · M as long as M and N are of moderate size, which is the optimal complexity that is also achieved by the Crank–Nicolson discretization of the KFE.

References

Berlyne, D. (1957). Conflict and choice time. British Journal of Psychology, 48, 106–118.
Boehm, U., Cox, S., Gantner, G., & Stevenson, R. (2021). Fast solutions for the first-passage distribution of diffusion models with space-time-dependent drift functions and time-dependent boundaries. Journal of Mathematical Psychology, 105, Article 102613. http://dx.doi.org/10.1016/j.jmp.2021.102613.
Buonocore, A., Giorno, V., Nobile, A., & Ricciardi, L. (1990). On the two-boundary first-crossing-time problem for diffusion processes. Journal of Applied Probability, 27, 102–114.
Buonocore, A., Nobile, A., & Ricciardi, L. (1987). A new integral equation for the evaluation of first-passage-time probability densities. Advances in Applied Probability, 19, 784–800.
Busemeyer, J., & Rapoport, A. (1988). Psychological models of deferred decision making. Journal of Mathematical Psychology, 32, 91–134.
Churchland, A., Kiani, R., & Shadlen, M. (2008). Decision-making with multiple alternatives. Nature Neuroscience, 11, 693–702.
de Jong, R., Liang, C.-C., & Lauber, E. (1994). Conditional and unconditional automaticity: A dual-process model of effects of spatial stimulus-response correspondence. Journal of Experimental Psychology: Human Perception and Performance, 20, 731–750.
Diederich, A., & Busemeyer, J. (2003). Simple matrix methods for analyzing diffusion models of choice probability, choice response time, and simple response time. Journal of Mathematical Psychology, 47, 304–322.
Diederich, A., & Oswald, P. (2016). Multi-stage sequential sampling models with finite or infinite time horizon and variable boundaries. Journal of Mathematical Psychology, 74, 128–145.
Ditterich, J. (2006a). Evidence for time-variant decision making. European Journal of Neuroscience, 24, 3628–3641.
Ditterich, J. (2006b). Stochastic models of decisions about motion direction: Behavior and physiology. Neural Networks, 19, 981–1012.
Durbin, J. (1971). Boundary-crossing probabilities for the Brownian motion and Poisson processes and techniques for computing the power of the Kolmogorov-Smirnov test. Journal of Applied Probability, 8, 431–453. http://dx.doi.org/10.2307/3212169.
Eriksen, B., & Eriksen, C. (1974). Effects of noise letters upon the identification of a target letter in a nonsearch task. Perception & Psychophysics, 1, 143–149.
Evans, L. (2010). Graduate studies in mathematics: vol. 19, Partial differential equations (2nd ed.). American Mathematical Society.
Evans, N. J., Hawkins, G. E., & Brown, S. D. (2020). The role of passing time in decision-making. Journal of Experimental Psychology: Learning, Memory, and Cognition, 46, 316–326. http://dx.doi.org/10.1037/xlm0000725.
Forstmann, B., Ratcliff, R., & Wagenmakers, E.-J. (2016). Sequential sampling models in cognitive neuroscience: Advantages, applications, and extensions. Annual Review of Psychology, 67, 641–666.
Fort, E., & Frankel, S. (1953). Stability conditions in the numerical treatment of parabolic differential equations. Mathematical Tables and Other Aids to Computation, 7, 135–152.
Giles, M., & Carter, R. (2006). Convergence analysis of Crank–Nicolson and Rannacher time-marching. Journal of Computational Finance, 9, 89–112. http://dx.doi.org/10.21314/JCF.2006.152.
Grossmann, C., Roos, H.-G., & Stynes, M. (2007). Numerical treatment of partial differential equations. Springer.
Hackbusch, W. (1995). Integral equations. Theory and numerical treatment. In International series of numerical mathematics: vol. 120, Birkhäuser.
Hairer, E., Nørsett, S., & Wanner, G. (2008). Solving ordinary differential equations I (3rd ed.). Springer.
Hanks, T., Mazurek, M., Kiani, R., Hopp, E., & Shadlen, M. (2011). Elapsed decision time affects the weighting of prior probability in a perceptual decision task. The Journal of Neuroscience, 27, 6339–6352.
Harris, C. R., Millman, K. J., van der Walt, S. J., Gommers, R., Virtanen, P., Cournapeau, D., et al. (2020). Array programming with NumPy. Nature, 585(7825), 357–362. http://dx.doi.org/10.1038/s41586-020-2649-2.
Hartmann, R., & Klauer, K. (2021). Partial derivatives for the first-passage time distribution in Wiener diffusion models. Journal of Mathematical Psychology, 103, Article 102550.
Hawkins, G., Forstmann, B., Wagenmakers, E., Ratcliff, R., & Brown, S. (2015). Revisiting the evidence for collapsing boundaries and urgency signals in perceptual decision-making. The Journal of Neuroscience, 35, 2476–2484.
Heath, R. A. (1992). A general nonstationary diffusion model for two-choice decision-making. Mathematical Social Sciences, 23, 283–309. http://dx.doi.org/10.1016/0165-4896(92)90044-6.
Janczyk, M., Naefgen, C., & Kunde, W. (2020). Are freely chosen actions generated by stimulus codes or effect codes? Attention, Perception, & Psychophysics, 82, 3767–3773.
Johnson, C. (2009). Numerical solution of partial differential equations by the finite element method. Dover Publications.
Jones, M., & Dzhafarov, E. N. (2014). Unfalsifiability and mutual translatability of
Luskin, M., Rannacher, R., & Wendland, W. (1982). On the smoothing property of the Crank-Nicholson scheme. Applicable Analysis, 14, 117–135.
Mattler, U., & Palmer, S. (2012). Time course of free-choice priming effects explained by a simple accumulator model. Cognition, 123, 347–360.
McClelland, J. L. (1979). On the time relations of mental processes: An examination of systems of processes in cascade. Psychological Review, 86, 287–330.
Nelder, J., & Mead, R. (1965). A simplex method for function minimization. Computer Journal, 7, 308–313. http://dx.doi.org/10.1093/comjnl/7.4.308.
Palmer, J., Huk, A., & Shadlen, M. (2005). The effect of stimulus strength on the speed and accuracy of a perceptual decision. Journal of Vision, 5, 376–404.
Pratte, M., Rouder, J., Morey, R., & Feng, C. (2010). Exploring the differences in distributional properties between Stroop and Simon effects using delta plots. Attention, Perception, & Psychophysics, 72, 2013–2025.
Quarteroni, A., Sacco, R., & Saleri, F. (2007). Numerical mathematics (2nd ed.). Springer.
Rannacher, R. (1982). Discretization of the heat equation with singular initial data. Zeitschrift für Angewandte Mathematik und Mechanik, 62, T346–T348.
Rannacher, R. (1984). Finite element solution of diffusion problems with irregular data. Numerische Mathematik, 43, 309–327.
Ratcliff, R. (1978). A theory of memory retrieval. Psychological Review, 85, 59–108.
Ratcliff, R., Smith, P., Brown, S., & McKoon, G. (2016). Diffusion decision model: Current issues and history. Trends in Cognitive Sciences, 20, 260–281.
Richter, T. (2017). Fluid-structure interactions. Models, analysis and finite elements. Lecture notes in computational science and engineering: vol. 118, Springer.
Richter, T., Ulrich, R., & Janczyk, M. (2023). Diffusion models with time-dependent parameters: An analysis of computational effort and accuracy of different numerical methods. Zenodo, http://dx.doi.org/10.5281/zenodo.6970739.
Schwarz, W. (2022). Random walk and diffusion models: An introduction for life and behavioral scientists. Springer.
Schwarz, W., & Miller, J. (2012). Response time models of delta plots with negative-going slopes. Psychonomic Bulletin & Review, 19, 555–574.
Shinn, M., Lam, N., & Murray, J. (2020). A flexible framework for simulating and fitting generalized drift-diffusion models. eLife, 9, Article e56938. http://dx.doi.org/10.7554/eLife.56938.
Simon, J. R. (1969). Reactions towards the source of stimulation. Journal of Experimental Psychology, 55, 270–279.
Smith, P. L. (1995). Psychophysically principled models of visual simple reaction time. Psychological Review, 102, 567–593.
Smith, P. L. (2000). Stochastic dynamic models of response time and accuracy: A foundational primer. Journal of Mathematical Psychology, 44, 408–463.
Smith, P. L. (2023). “Reliable organisms from unreliable components” revisited: The linear drift, linear infinitesimal variance model of decision making. Psychonomic Bulletin & Review, http://dx.doi.org/10.3758/s13423-022-02237-3.
Smith, P. L., & Lilburn, S. D. (2020). Vision for the blind: Visual psychophysics and blinded inference for decision models. Psychonomic Bulletin & Review, 27, 882–910. http://dx.doi.org/10.3758/s13423-020-01742-7.
Smith, P. L., & Ratcliff, R. (2009). An integrated theory of attention and decision making in visual signal detection. Psychological Review, 116, 283–317.
Smith, P. L., & Ratcliff, R. (2021). Modeling evidence accumulation decision processes using integral equations: Urgency-gating and collapsing boundaries. Psychological Review, 129, 235–267. http://dx.doi.org/10.1037/rev0000301.
Smith, P. L., Ratcliff, R., & Sewell, D. K. (2014). Modeling perceptual discrimination in dynamic noise: Time-changed diffusion and release from inhibition. Journal of Mathematical Psychology, 59, 95–113. http://dx.doi.org/10.1016/j.jmp.2013.05.007.
major modeling schemes for choice reaction time. Psychological Review, 121, Ulrich, R., Schröter, H., Leuthold, H., & Birngruber, T. (2015). Automatic and
1–32. http://dx.doi.org/10.1037/a0034190. controlled stimulus processing in conflict tasks: Superimposed diffusion
Katsimpokis, D., Hawkins, G., & van Maanen, L. (2020). Not all speed-accuracy processes and delta functions. Cognitive Psychology, 78, 148–174.
trade-off manipulations have the same psychological effect. Computational Ulrich, R., Schröter, H., Leuthold, H., & Birngruber, T. (2016). Corrigendum to
Brain & Behavior, 3, 252–268. Automatic and controlled stimulus processing in conflict tasks: Superim-
Kelley, C. (1999). Iterative methods for optimization. SIAM, http://dx.doi.org/10. posed diffusion processes and delta functions (Cognitive Psychology, 2015,
1137/1.9781611970920. 78, 148–174). Cognitive Psychology, 91, 150.
Kiani, R., & Shadlen, M. (2009). Representation of confidence associated with Usher, M., & McClelland, J. L. (2001). The time course of perceptual choice: The
a decision by neurons in the parietal cortex. Science, 324, 759–764. http: leaky, competing accumulator model. Psychological Review, 108, 550–592.
//dx.doi.org/10.1126/science.1169405. Virtanen, P., Gommers, R., Oliphant, T. E., Haberland, M., Reddy, T., Courna-
Kloeden, P., & Platen, E. (1999). Applications of mathematics: vol. 23, Numerical peau, D., et al. (2020). SciPy 1.0: Fundamental algorithms for scientific
solution of stochastic differential equations. Springer, Corrected Third Printing. computing in Python. Nature Methods, 17, 261–272. http://dx.doi.org/10.
Koob, V., Ulrich, R., & Janczyk, M. (2023). Response activation and activation- 1038/s41592-019-0686-2.
transmission in response-based backward crosstalk: Analyses and sim- Voskuilen, C., Ratcliff, R., & Smith, P. L. (2016). Comparing fixed and collapsing
ulations with an extended diffusion model. Psychological Review, 130, boundary versions of the diffusion model. Journal of Mathematical Psychology,
102–136. 73, 59–79. http://dx.doi.org/10.1016/j.jmp.2016.04.008.
Lerche, V., & Voss, A. (2019). Experimental validation of the diffusion model Voss, A., Lerche, V., Mertens, U., & Voss, J. (2019). Sequential sampling models
based on a slow response time paradigm. Psychological Research, 83, with variable boundaries and non-normal noise: A comparison of six models.
1194–1209. Psychonomic Bulletin and Review, 26, 813–832.
Voss, A., Nagler, M., & Lerche, V. (2013). Diffusion models in experimental psychology. A practical introduction. Experimental Psychology, 60, 385–402.
Voss, A., & Voss, J. (2007). Fast-dm: A free program for efficient diffusion model analysis. Behavior Research Methods, 39, 767–775.
Voss, A., Voss, J., & Lerche, V. (2015). Assessing cognitive processes with diffusion model analyses: A tutorial based on fast-dm-30. Frontiers in Psychology, 6, Article 336.
Wanner, G. (2010). Kepler, Newton and numerical analysis. Acta Numerica, 561–598.
White, C., Ratcliff, R., & Starns, J. (2011). Diffusion models of the flanker task: Discrete versus gradual attentional selection. Cognitive Psychology, 63, 201–238.
Øksendal, B. (2000). Stochastic differential equations: An introduction with applications (5th ed., corrected printing). Springer.