COMPUTATIONAL MODELING
Adele Diederich
Over the last 50 years or so, computational modeling has become a rapidly growing approach in many scientific disciplines. In psychology, computational modeling is more recent but has […] specific techniques of computational modeling, a psychological journal—Computational Brain and Behavior—has been founded recently as another outlet for the growing work in this field.
Copyright American Psychological Association. Not for further distribution.
APA Handbook of Research Methods in Psychology: Research Designs: Quantitative, Qualitative,
Neuropsychological, and Biological, edited by H. Cooper, M. N. Coutanche, L. M. McMullen, A. T.
Panter, D. Rindskopf, and K. J. Sher
Copyright © 2023 American Psychological Association. All rights reserved.
[…] probabilistic, linear or nonlinear in nature, and so forth. They are designed to describe internal representations, processes, functions, mechanisms, and structures. The goal of a formal model is to derive predictions that can connect to data observed in the real world. These data are often obtained in experiments. Interpreting data in light of the model's predictions is crucial for the modeling process and often leads to a modification of the model. This entire process is referred to as mathematical modeling. When the phenomena of interest stem from psychological questions, the area is called mathematical psychology (Chapter 22, this volume), and the process often is called cognitive modeling.

Sometimes, a mathematical model representing the phenomena or situation is so complex that an analytical solution is not readily available or a closed-form solution does not even exist. Other times, experiments are too expensive or may not even be feasible to conduct. For those cases, a computational approach is considered a valuable alternative. Computational approaches develop computer algorithms and programs, implement them on a computer, and the computer simulations derive the predictions of the model and also generate data. Instead of deriving an analytical solution to the problem, a computer simulation—that is, changing the parameters of the system in the computer—provides the basis for studying and evaluating the model by comparing the simulated data to the outcome of the experiments.

Many computational models are computer simulations of complex mathematical equations that cannot be solved in a simple form and must be simulated by computer. Some other computational models are simply based on if–then production rules and are harder to see as mathematical models. Technically, they are finite-state machines, or, if a probabilistic selection of rules is applied, they become Markov models in a large state space. However, these types of computational models are never expressed mathematically, even though they could be. Models such as ACT-R (Lebière & Anderson, 1993) or Clarion (Sun, 2009) use some mathematical equations, such as power-function decay, linear combinations of inputs to produce activation values, and mathematical reinforcement learning rules. But they are also heavily reliant on production rules. That is, technically, they could be formalized as mathematical models (dynamic systems, stochastic dynamic systems, finite-state machines, Markov processes); however, in practice, it is difficult to see the mathematics in some of them, such as pure production rule systems.

With this description of a computational approach in mind, are mathematical models then a subset of computational models, as Sun (2008b) claimed? Or are they merely a tool in mathematical psychology, one of many methods that can be applied in the modeling process, with the advantage that data can be simulated easily and extremely quickly? After all, computational models require a mathematically and logically formal representation of the problem and could therefore be considered a subset of mathematical models.

In light of the many disciplines involved in computational modeling research, it seems impossible to come up with an agreed-upon definition. There are interesting discussions going on in some disciplines outside of psychology. Indeed, an entire subfield of computer science studies the relationships and differences between computational and mathematical models. I do not dwell on it further here (Diederich & Busemeyer, 2012) and agree with Wilson and Collins (2019) that computational modeling in the behavioral sciences uses precise mathematical models.

Many formal models in (cognitive) psychology are functional in nature. They are built to describe and explain psychological mechanisms and processes and to predict (human) performance—that is, what can be expected to happen under various conditions. The (cognitive) architecture of a model reflects its structure and computational rules—that is, its functional organization. Cognitive architectures are classified as symbolic, subsymbolic (connectionist), or hybrid, depending on the assumed properties of the models and the rules applied to the system.
Symbolic modeling has its origin in computer science, in particular artificial intelligence, which focuses on building enhanced intelligence into computer systems. Patterns of reasoning, such as logical operations or processing systems with their respective specified rules, operate by using symbols. From early on, these models have attracted a broad range of disciplines from philosophy to engineering. They have been applied to semantic representation, language processing, knowledge representation, reasoning, speech recognition, computer vision, robotics, various expert systems, and many more (see Nilsson, 2010, for a history and achievements of artificial intelligence with a focus on engineering, and Boden, 2006, for a comprehensive work on the history of cognitive science, which also includes the philosophical perspective of computational approaches).

Subsymbolic or connectionist models use the analogy to neural networks as observed in the human brain. Activation patterns among large numbers of processing units (neurons) encode knowledge and specific content. Connectionist models are developed in many scientific disciplines, from computer science to neuroscience, natural science, cognitive science, and the behavioral sciences. They have been applied to speech recognition and speech generation, prediction of financial indices, identification of cancerous cells, automatic recognition of handwritten characters, sexing of faces, and many more. Hybrid architectures combine both types of processing and have become more interesting for cognitive modelers (Diederich & Busemeyer, 2012). Two simple connectionist types of models for categorical learning serve as examples of the modeling process later.

THE ADVANTAGE OF FORMAL MODELS AND MODELING

Modeling is part of the scientific method (e.g., Hepburn & Andersen, 2021; van Rooij & Baggio, 2021; Voit, 2019). Roughly, starting with an observation or a general theory, we formulate assumptions (or hypotheses), derive predictions, test them in the laboratory or in the field, and possibly modify our assumptions to give a better account of the data. Formal modeling has several advantages over a merely verbal modeling approach. First, it forces the researcher to give precise definitions and clear statements. This requires a high degree of abstraction: Assumptions about underlying processes, relations between entities, interactions between variables, and so on, all need to be mapped onto mathematical objects and operations. The language of mathematics minimizes the risk of making contradictory statements in the theory. Second, formal modeling allows deriving precise predictions from the underlying assumptions, thereby enabling empirical falsification of these assumptions. Furthermore, deriving predictions is particularly important and useful when they are not obvious. Testable predictions may be useful for deciding between competing theories of a given phenomenon. They are not necessarily quantitative but can also reflect qualitative patterns, which can be observable in the data. Third, mathematical modeling brings together theory and data; it facilitates the analysis and interpretation of complex data and helps generate new hypotheses. Fourth, even rather simple mathematical models often describe data better and are more informative than a statistical test of a verbally phrased hypothesis. Finally, formal models can provide a unifying language and methodology that can be used across disciplines ranging from experimental psychology to cognitive science, computer science, and neuroscience.

USAGE OF COMPUTATIONAL MODELS

Wilson and Collins (2019) identified the application of computational models predominantly in four different fields: simulation, parameter estimation, model comparison, and latent variable inference. These are parts of the modeling process. How to design a good computational/mathematical model is a different topic and goes beyond the scope of this chapter.

Simulation

Simulation, in this context, refers to generating (a) artificial data by inserting specific parameter
values into the model, (b) qualitative and quantitative predictions of the model by inserting a wide range of different parameter values (combinations), or (c) a combination of both. Simulation is particularly important in the model-building phase (Fan, 2012). Wilson and Collins (2019) suggested also including qualitative properties of the model; it may give the modeler a better intuition for the model's behavior and may allow for specific predictions. For instance, take the popular diffusion model for binary choice options, which accounts simultaneously for (mean) choice response times and choice frequencies (Ratcliff, 2012). It assumes that all the information of the stimuli (provided in attributes or dimensions) is mapped onto one so-called drift rate, µ. It reflects the mean tendency to choose one alternative over the other.

[…] decisive step in model falsification, showing that a computational cognitive model is unable to account for a specific (behavioral) phenomenon. This becomes even more important when several model candidates are compared (see the Model Comparison section). Navarro (2019, p. 234) argued that showing how the qualitative patterns in the empirical data emerge from a computational model is often more scientifically useful than presenting a quantified measure of its performance. Simulation also includes parameter recovery; that is, when artificial data are produced with a specific set of parameters, and the model is fitted to those data (see below), the estimated parameters should be close to the ones that generated the artificial data.
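The simulate-and-refit loop just described (parameter recovery) can be sketched in a few lines. The exponential-decay model, its single parameter k, and the grid-search refit below are illustrative assumptions of mine, not a model from this chapter; the point is only the recovery logic: generate artificial data with a known parameter, fit the model to those data, and check that the estimate lands near the generating value.

```python
import random
import math

def simulate(k, n=50, noise=0.05, seed=1):
    """Generate artificial data from an (assumed) exponential-decay model y = exp(-k*t) + noise."""
    rng = random.Random(seed)
    ts = [0.1 * i for i in range(n)]
    return ts, [math.exp(-k * t) + rng.gauss(0, noise) for t in ts]

def fit(ts, ys):
    """Recover k by minimizing the summed squared error over a coarse grid of candidate values."""
    best_k, best_sse = None, float("inf")
    for k in [0.01 * j for j in range(1, 300)]:
        sse = sum((y - math.exp(-k * t)) ** 2 for t, y in zip(ts, ys))
        if sse < best_sse:
            best_k, best_sse = k, sse
    return best_k

true_k = 0.8
ts, ys = simulate(true_k)
recovered = fit(ts, ys)  # should land close to true_k if the model recovers its parameters
```

In a real study, this would be repeated for several parameter sets, as the text notes, to map out where in the parameter space recovery succeeds.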
[…] for making a decision (short and long) and nine gambles (probability and value combinations), Diederich and Trueblood (2018) minimized the function

∑_{i=1}^{72} (RT_i^obs − RT_i^pred)² + ∑_{i=1}^{36} (Pr_i^obs − Pr_i^pred)²

Maximum likelihood parameter estimation seeks to find those parameters that maximize the likelihood of the chosen model for the given set of data. This method plays a major role in Bayesian parameter estimation (Feinberg & Gonzales, 2007; […]
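The summed squared-deviation function above, and the likelihood-based criteria used later for model comparison (Equations 23.4–23.5), take only a few lines. The function names are mine; the caller supplies the observed and predicted values and, for AIC/BIC, the maximized log-likelihood.

```python
import math

def sse_objective(rt_obs, rt_pred, pr_obs, pr_pred):
    """Summed squared deviations for mean RTs and choice probabilities (the function above)."""
    sse_rt = sum((o - p) ** 2 for o, p in zip(rt_obs, rt_pred))
    sse_pr = sum((o - p) ** 2 for o, p in zip(pr_obs, pr_pred))
    return sse_rt + sse_pr

def aic(ll, k):
    """Akaike information criterion (Equation 23.4): -2*LL_i + 2*k_i."""
    return -2.0 * ll + 2.0 * k

def bic(ll, k, n):
    """Bayesian information criterion (Equation 23.5): -2*LL_i + k_i * ln(n)."""
    return -2.0 * ll + k * math.log(n)
```

Among competing models, the one with the smallest criterion value would be preferred, as the Model Comparison discussion spells out.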
Adding more parameters to the model typically gives a better fit of the data. To give a simple and intuitive example: the higher the degree of a polynomial model is, the better it can be fitted to even very noisy data. Several goodness-of-fit measures take the number of parameters into account. They penalize models with many parameters as compared to models with fewer parameters. Examples of measures are listed in the next section on model comparison.

The functional form is defined as the way in which parameters are combined in the model equation. For instance, for psychophysical models mapping the intensity of a physical stimulus, I, onto the perceived magnitude, Stevens's power law, ψ(I) = k • I^a, is more complex than Fechner's law with the same number of parameters, […]

[…] include point or interval predictions, predictive regions, and predictive distributions (p. 291). When a model is used as a measurement model, it is important to know whether the parameters are identifiable—that is, whether the set of parameter values can be determined from the set of data and whether the model can recover the parameters. Parameter recovery includes the following steps: The computational model simulates data with known parameter values. Then the model is fitted to these simulated data. Typically, this procedure is repeated for several sets of parameters. Ideally, the estimated parameters are identical (or close) to the ones that generated the data (see, e.g., Hübner & Pelzer, 2020; Kandil et al., 2014; van Ravenzwaaij & Oberauer, 2009, for examples of parameter recovery studies for […]
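The two functional forms just contrasted can be put side by side. The parameter values below are arbitrary, and the two-parameter logarithmic form used for Fechner's law is my assumption, since the chapter does not print Fechner's equation.

```python
import math

def stevens(I, k=1.0, a=0.5):
    """Stevens's power law: psi(I) = k * I**a."""
    return k * I ** a

def fechner(I, k=1.0, I0=1.0):
    """Fechner's law in a common two-parameter form (assumed here): psi(I) = k * ln(I / I0)."""
    return k * math.log(I / I0)
```

Both forms have two free parameters; the difference in complexity lies in the range of curve shapes each functional form can produce, which is the point the text makes.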
[…] (BIC; Schwarz, 1978). The AIC for model i with k_i free parameters is

AIC_i = −2LL_i + 2k_i, (23.4)

where LL_i is the maximized value of the log-likelihood (Equation 23.3) for the ith model with estimated parameter values û_i that maximized the likelihood function. The model with the smallest AIC should be chosen.

The BIC is similar to the AIC except that it also includes the number of observations n:

BIC_i = −2LL_i + k_i ln n. (23.5)

For fewer than eight observations (exp(2) = 7.39), the BIC penalty is smaller than the AIC penalty, and it grows slowly as n increases (ln(1000) = 6.91). For a detailed discussion of both measures with respect to their philosophical background (information theory versus Bayesian theory) and statistical properties, see Burnham and Anderson (2004) and Vrieze (2012).

Another goodness-of-fit measure is the RMSEA (R*; Browne & Cudeck, 1992; Steiger, 1990):

R* = √(F*/υ). (23.6)

F* is the Population Noncentrality Index (PNI), a measure of badness-of-fit (Steiger, 1990). As Steiger (n.d., p. 5) pointed out, model complexity is directly related to the number of free parameters and inversely to the number of degrees of freedom (υ). Therefore, dividing the PNI by the degrees of freedom accounts for model complexity, and taking the square root of the ratio returns the index to the same metric as the original standardized parameters (Steiger, n.d., p. 5). For example, Schubert et al. (2017) used this measure when fitting quantiles of a diffusion model to the empirical RT quantiles (Ratcliff, 2012). Using a χ² statistic (the squared differences between the observed and predicted quantile values, divided by the predicted quantile values and then summed, slightly different from Equation 23.1) as the objective function, the PNI is defined as F* = χ² − df. The degrees of freedom are defined in terms of the number of experimental conditions, c, the number of response bins, b, and the number of free parameters, k—that is, υ = c(2b − 1) − k. The number of participants, N, is also included in the denominator:

R* = √(max(F*, 0)/(υ(N − 1))). (23.7)

Obviously, the smaller R*, the better the model fit. If χ² is smaller than υ, zero is taken. Summaries and evaluations of goodness-of-fit measures are provided by, for instance, Evren and Tuna (2012), based on statistical entropy, Steiger (n.d.), and many more. Farrell and Lewandowsky (2018) described Bayesian model comparison using Bayes factors in Chapter 11 of their book. Besides comparing competing models on the basis of goodness-of-fit measures, several other methods have been proposed, like cross-validation procedures and likelihood ratio tests, in particular for nested models, and many more (Nezlek, 2012; Rindskopf, 2012).

EXAMPLE OF THE MODELING PROCESS WITH TWO COMPETING CONNECTIONIST MODELS

The following simple example demonstrates, with two competing connectionist models for categorical learning, how the modeling process is performed. Categorizing (perceptual) objects or patterns into distinct classes is important in many real-life situations. Does the functional magnetic resonance imaging scan indicate disease A or B or none? Are these cells cancerous or not? Is this a female or male face? What ethnic group do these people belong to? Often, people are very successful in performing these and similar tasks, but sometimes they are not. How do we learn to categorize objects, and how do we generalize our knowledge to objects we have not seen before?

Step 1

The first step in the modeling process to answer these questions is to come up with a conceptual
theoretical framework. This requires creativity on the part of the researcher and involves hard work. For the current demonstration, two existing and competing theoretical frameworks for categorization are taken: a prototype model and an exemplar model. According to the prototype model, some members of a category are more central than others. The person extracts the central tendency (sometimes referred to as characteristic features) of the examples presented during a learning phase and uses these characteristic features to form a prototype, which serves as the basis for categorizing new objects. That is, when a new target object is presented, it is compared with the prototypical object of each category, and the category with the most similar prototype is chosen. According to the exemplar model, the learner stores specific instances (exemplars) for each category. When a new target stimulus is presented, the similarity of the target to each stored exemplar is computed for each category, and the category with the greatest total similarity is chosen.

Step 2

The second step is to describe the objects in an abstract formal way, translate the assumptions into equations, and describe the response also in a rigorous form. Here, we take a connectionist version of a prototype model and a connectionist version of an exemplar model (e.g., Nosofsky et al., 1992). There has been a long debate about which of these two models (prototype versus exemplar) best represents category learning, and some question whether it is possible to distinguish them empirically.

Digression. Before describing the models in more detail, we define the general framework of key features for connectionist processing (Rumelhart et al., 1986; Thomas & McClelland, 2008) to fix some notation.

1. A set of processing units u_i, organized in layers and often divided into input units, which receive the information to be processed; output units, which provide the results of the processing; and hidden units, in which specific computations necessary for the entire process may be performed. The units can be interpreted as natural and artificial neurons and groups of neurons. For cognitive models, the input units may represent perceptual features, letters, faces, and so on, and output units may represent words, phonemes, or ethnic groups. All the processing is carried out by these units. The system is parallel, as many simple operations are carried out simultaneously. Units in a computer simulation are virtual entities and usually presented by circles.

2. A state of activation for each unit, a_i, at a given time t, a_i(t). The states of a set of units at time t are organized in a vector, a(t) = (a_1(t), a_2(t), . . . , a_i(t), . . . , a_n(t)). The activation values can be any numeric value, but often they are real numbers bounded between 0 and 1. The analogy to neural activity is the neuron's firing rate (rate of action potentials). A zero would indicate the least and a one the most possible activity of a neuron.

3. The pattern of connectivity. To make a network, units need to be connected. If units are analogous to neurons, connections are analogous to synapses. Connections are represented with lines, and arrows indicate the flow of information from one unit to the next. In a standard three-layer feedforward network (described later), activation is sent from all input units to all hidden units to all output units in a single direction—that is, a directed graph with nodes and intermodal connections. The strength or weakness of a connection between any two units determines the extent to which the activation state of one unit can affect the activation state of another unit and can be measured by a connection weight, w. The connection weights of all units are organized in a matrix W = ||w_ij||. Often connection weights are real numbers between −1 and 1. High connection weights represent a strong connection, while low weights represent a weak connection, analogous to excitatory and inhibitory synapses.

4. The propagation rule. This rule determines how activation is propagated through the network. The activation values of the sending units
are combined with the connection weights to produce the net input into the receiving units, usually by a linear function. That is, the inputs from all sending units are multiplied by the connection weights and summed to get the overall input of the receiving units; that is, the net input for the receiving units is net(t) = W • a(t). The net input for a specific unit, i, […]

5. […] F(x) = sgn(x), producing binary (±1) output; F(x) = (sgn(x) + 1)/2, producing binary (0/1) output; F(x) = (1 + e^(−x))^(−1), the sigmoidal (logistic) nonlinearity, producing output between 0 and 1; F(x) = tanh(x), producing output between −1 and 1; and some other forms are also possible. The net input can take on any value, and the function F ascertains that the new activation state does not exceed the maximum or minimum activation values (e.g., above 1 or below 0).

6. The learning rule. Learning in a neural network involves modifying the connection weights, and finding the right weights is at the heart of connectionist models. The learning rule is an algorithm and specifies how the connectivity changes over time as a function of experience—that is, data. For instance, the simplest learning rule assumes that the weight w_ij between two units u_i and u_j changes proportionally to the respective activation values—that is, ∆w_ij = w_ij(t + 1) − w_ij(t) = ηa_i a_j, where the constant η is called the learning rate. Basically, it determines the step size at each iteration. There is a variety of learning algorithms. They differ from each other in the way in which the adjustment of the connection weights of a unit is formulated (see, e.g., Haykin, 1999, for a detailed description of several algorithms). Specific learning rules depend on the architecture of the neural network. In addition, various learning paradigms refer to models of the environment in which the neural network operates. Any given network architecture can usually be employed in any given learning paradigm.

7. Network architectures. There are three fundamentally different classes of network architectures:
■ Single-layer feedforward networks. The input […]
■ Multilayer feedforward networks. One or more hidden layers or hidden units are sandwiched between the input layer and the output layer. The hidden units intervene between the external input and the network output in some way. Typically, the units in each layer of the network have as their inputs the output signals of the preceding layer. If every node in each layer of the network is connected to every other node in the adjacent forward layer, the neural network is fully connected. If a connection is missing, the neural network is partially connected.
■ Recurrent networks. The network has at least one feedback loop. The feedback loops may be self-feedback loops—that is, the output of a neuron is fed back into its own input—or no self-feedback loops, for instance, when the output is fed back to the inputs of all the other neurons.

8. Learning paradigms. There are three major learning paradigms: supervised learning, unsupervised learning, and reinforcement learning.
■ Supervised learning, also referred to as learning with a teacher. In supervised learning, there is given a set of data, the training set, which consists of the input (e.g., object patterns) and the desired output (e.g., classification). That is, the input is given together with the correct output,
also called the target. The parameters of the network are gradually adjusted to match the input and desired output by going through the training set many times. The aim of the supervised neural network is to predict the correct answer for new data that were not included in the training set. That is, the network is expected to learn certain aspects of the input–output pairs in the training set and to apply them to new data.
■ Unsupervised learning, also referred to as self-organized learning or learning without a teacher. In unsupervised learning, a correct or desired output is not known—that is, there is no input–output pair. This type of learning is often used to form natural groups or clusters of objects based on similarity between objects.
■ Reinforcement learning. In reinforcement learning, the only information given for each input–output pair is whether the neural network produced the desired result or not, or the total reward given for an output response. The weights are updated based solely on this global feedback (that is, the Boolean values true or false, or the reward value) (for details, see, e.g., Rojas, 1996).

As an example of a learning rule, consider the single-unit perceptron (Figure 23.1), which is the simplest version of a feedforward neural network and classifies inputs into two distinct categories. That is, it maps a real-valued input vector a (e.g., describing visual objects) to a binary output value y (e.g., category A or B). Furthermore, the neural network applies a reinforcement paradigm. The architecture of this model, in the above-introduced notation, is

y = F(∑_{j=1}^{n} w_j a_j(t) + w_0) = sgn(∑_{j=1}^{n} w_j a_j(t) + w_0), (23.8)

where w_0 is a bias factor. The bias has the effect of increasing or lowering the net input of the activation function, depending on whether it is positive or negative, respectively. Setting a_0 ≡ 1, the above equation can be written as

y = sgn(∑_{j=0}^{n} w_j a_j(t)) = sgn(w′a(t)). (23.9)

Denote the set of training or learning data as D = {[a(t), z(t)], t = 1, . . . , m}, where {z(t)} contains the binary classification variables ±1—the desired activation state at time t—and a(t) = (a_0(t), a_1(t), . . . , a_n(t)) is the state activation vector for the observation at time t, as before. Learning is modeled by updating the weight vector w during m iterations for all training examples. That is, for each pair in D and for each iteration t, the weight w is updated according to

∆w_j = w_j(t + 1) − w_j(t) = η(z − y)a_j, (23.10)

where the constant η (> 0) is the learning rate, and the learning rule is called the delta rule. The delta rule is related to a gradient descent type of method for optimization.

FIGURE 23.1. A simple single-unit perceptron, also known as a McCulloch-Pitts neuron (McCulloch & Pitts, 1943). [The figure shows inputs a_0 = +1, a_1, . . . , a_n feeding through weights w_0, w_1, . . . , w_n into F(∑_{i=0}^{n} w_i a_i), which produces the output y.]

End of digression. For the two connectionist models for categorization, we assume very simple objects characterized by only two dimensions. These could be saturation and brightness, length and orientation, distance between eyes and length of mouth, and so on. The stimuli are conveniently described in the form of vectors. A stimulus is denoted S = (s_1, s_2), where s_1 represents the value of the stimulus on the first dimension and s_2 represents the value of the stimulus on the second dimension. Consider the connectionist version of the prototype model first.
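Equations 23.9–23.10 amount to a short training loop. The toy data set and learning rate below are illustrative assumptions, and sgn(0) is mapped to +1 as a tie-breaking convention the chapter does not specify.

```python
def sgn(x):
    # Tie-breaking assumption: sgn(0) is taken as +1.
    return 1 if x >= 0 else -1

def train_perceptron(data, eta=0.1, epochs=20, n=2):
    """Delta rule (Equation 23.10): w_j <- w_j + eta * (z - y) * a_j.

    Each training pair is (a, z) with a of length n and z in {-1, +1};
    a constant input a_0 = 1 is prepended so that w_0 acts as the bias (Equation 23.9)."""
    w = [0.0] * (n + 1)
    for _ in range(epochs):
        for a, z in data:
            a_full = [1.0] + list(a)                                  # a_0 = 1 absorbs the bias w_0
            y = sgn(sum(wj * aj for wj, aj in zip(w, a_full)))        # Equation 23.9
            for j in range(n + 1):
                w[j] += eta * (z - y) * a_full[j]                     # Equation 23.10
    return w

# Linearly separable toy set (my example, not the chapter's): +1 when both inputs are large.
data = [((0.0, 0.0), -1), ((1.0, 1.0), 1), ((0.2, 0.3), -1), ((0.9, 0.8), 1)]
w = train_perceptron(data)
predictions = [sgn(sum(wj * aj for wj, aj in zip(w, [1.0] + list(a)))) for a, _ in data]
```

Because the toy set is linearly separable, the delta rule stops changing the weights once every pattern is classified correctly, which is the gradient-descent connection the text mentions.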
[…] disparity, frequency. The ideal point value of each unit is not naturally given but needs to be defined. These additional detailed assumptions (called ad hoc assumptions) are necessary in order to complete the model. That is, for the prototype model, assumptions about what features should be used to represent the stimuli to be categorized need to be added and also formulated in an abstract way.

The jth unit in the first set is designed to detect a stimulus value, z_1j—that is, the ideal point of that unit—and the activation of this unit, denoted a_1j(t), is determined by the similarity of s_1 presented at trial t to the ideal point z_1j. Analogously, the jth unit in the second set is designed to detect a stimulus value, z_2j, and the activation of this unit, denoted a_2j(t), is determined by the similarity of the ideal point z_2j to s_2 presented at trial t.

How large the set U is depends on how many specific features are to be coded. For instance, Le Cun et al. (1989) developed a network for zip-code recognition, in which 7,291 handwritten zip-code digits were processed such that they fit into a 16 × 16 pixel image with grey levels in the range of −1 to +1. Thus, the dimensionality of each input is 256.

The similarity between the current stimulus value s_i and the ideal point z_ij, for each unit j, is determined by the following function:

f_sim(z_ij, s_i) = exp(−((z_ij − s_i)/σ)²), (23.11)

where σ is a discriminability parameter. A low discriminability parameter (large σ) makes it hard to discriminate differences between the stimulus value and the ideal point, and a high discriminability parameter (small σ) makes for easy-to-discriminate differences between the stimulus value and the ideal point. That is, it determines the rate at which similarity declines with distance. The values of the function range between 0 and 1. If the stimulus value s_i and the ideal point z_ij are identical, the function takes on the value 1. If the stimulus value s_i is far apart from the ideal point z_ij, the function approaches 0.

The input activation a_ij(t) generated at the jth unit is determined by the similarity of that unit relative to the sum of the similarities of all the units:

a_ij(t) = f_sim(z_ij, s_i) / ∑_{j=1}^{p} f_sim(z_ij, s_i), i = 1, 2; j = 1, . . . , p. (23.12)

The input units are connected to two output units, one for each category. The activation of the two category units is denoted c_1(t) and c_2(t) for category C_1 and C_2, respectively. The connection weight, w_ijk, connects the input activation a_ij(t) to the kth output unit, k = 1, 2. The propagation rule—that is, the function that combines the input activation with the
[…]

∆w_ijk = w_ijk(t + 1) − w_ijk(t) = η[h_k(t) − c_k(t)]a_ij(t), (23.14)

where h_k(t) is the indicator function with h_k(t) = 1 for the desired category and 0 otherwise.

The whole learning process begins with some initial weights, and usually these are randomly assigned to represent a state of ignorance at the beginning of the learning process. Alternatively, if some prior knowledge or training exists, then the initial weights can be set to values that represent this prior knowledge or training.

The architecture of the connectionist version of the prototype model is presented in Figure 23.2.

FIGURE 23.2. Architecture of the connectionist version of the prototype model.

One essential step in computational modeling is to implement the model on the computer—that is, writing code and algorithms for training the model and estimating the parameters. To do so, it is convenient to rewrite the equations in matrix format. Computer languages such as MATLAB, Mathematica, Python, and R have built-in matrix operators, which allows effective programming leading to fast computations.

The deviation between the p ideal points and the stimulus value s_i in Equation 23.11 can be written as a p-dimensional vector d_i, i = 1, 2, with

d_i = (1/σ)(z_i − s_i O), (23.15)

where z_i is a vector of length p with the p ideal points for dimension i and O is a p-dimensional vector containing ones. The similarity function is

f_sim(z_i, s_i) = exp(−d_i^(•2)), (23.16)

where •2 means the elementwise square of the vector. The activation function in Equation 23.12 is a p-dimensional vector with

a_i(t) = f_sim(z_i, s_i) / (O^T • f_sim(z_i, s_i)), i = 1, 2, (23.17)

where O^T is the transposed vector of ones—that is, a row vector. Note that (O^T • f_sim(z_i, s_i)) is the inner product, which produces a scalar. Obviously, the activation for both stimulus dimensions can be expressed in one vector of length 2p with a(t) = (a_1(t); a_2(t)). The weights for each category are arranged in a 2 × p matrix
526
Computational Modeling
w1
T
TABLE 23.1
W = . The propagation rule in Equation 23.6
w2
T
MATLAB Codes for Determining the Similarity
can be written as
Function and Weights for the Prototype Model
c (t ) = Wa (t ) , (23.18)
Program command Comment
function[a] = protoac(Z,S, sigma,j); Input and output of function
c1 (t ) fsim=exp(-((Z-S(j))./sigma).^2); Calculating similarity
where c (t ) = is a vector of length 2 with a=fsim/sum(fsim); Calculation activation
c2 (t )
the activation of the two category units for function[W] = prototype(eta,p)
category C1 and C2, respectively. W = zeros(2,p) Initial connection weights
for j=1:p Loop for stimuli
Finally, the delta rule (Equation 23.14) can be [a1]=protoac(Z1,S1,sigma,j); Activation for category 1
written as [a2]=protoac(Z2,S2,sigma,j); Activation for category 2
c=W*[a1;a2]; Propagation
∆W = W (t + 1) − W (t ) = η • [h (t ) − c (t )] • a, W1=eta*([1;0]-c)*a1; Adjusting weights for c1
W2=eta*([0;1]-c)*a2; Adjusting weights for c2
(23.19) W=W+W1+W2; Updating the weights
End End of loop
h1 (t )
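Rewritten in a general-purpose language, Equations 23.15 through 23.19 amount to only a few array operations. The following NumPy sketch is illustrative rather than the chapter's own code: the function names, ideal points, and the single training stimulus are assumptions, and the weight matrix is taken as 2 × 2p so that c(t) = W a(t) conforms with the stacked activation vector.

```python
import numpy as np

# Illustrative sketch of Equations 23.15-23.19 (prototype model).
# Names and example values are assumptions, not the chapter's.

def proto_activation(z, s, sigma):
    """Eqs. 23.15-23.17: normalized similarity of stimulus value s
    to the p ideal points z on one dimension."""
    d = (z - s) / sigma            # Eq. 23.15, elementwise deviation
    fsim = np.exp(-d ** 2)         # Eq. 23.16, elementwise square
    return fsim / fsim.sum()       # Eq. 23.17, activations sum to 1

def delta_update(W, a, h, eta):
    """Eqs. 23.18-23.19: propagate, then apply the delta rule."""
    c = W @ a                              # Eq. 23.18
    return W + eta * np.outer(h - c, a)    # Eq. 23.19, rank-one update

p, sigma, eta = 5, 1.0, 0.1
z = np.linspace(0.0, 4.0, p)       # hypothetical ideal points (both dims)
W = np.zeros((2, 2 * p))           # initial weights: state of ignorance

# one hypothetical training stimulus with values (1.0, 3.0), desired C1
a = np.concatenate([proto_activation(z, 1.0, sigma),
                    proto_activation(z, 3.0, sigma)])
W = delta_update(W, a, h=np.array([1.0, 0.0]), eta=eta)
```

After this single update toward category C1, the propagated activation c1(t) already exceeds c2(t) for the trained stimulus.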
… distance of the unit from this center. The activation of this unit, aij(t), is determined by the similarity of the stimulus S to the ideal point zij, denoted fsim(zij, S) and defined as

fsim(zij, S) = exp(−((zi − s1)/σ)²) • exp(−((zj − s2)/σ)²), i = 1, . . . , p, j = 1, . . . , p. (23.20)

This is a type of bivariate Gaussian function and is used to form the receptive field. As for the prototype model, the values of the function range between 0 and 1. If the stimulus values of both dimensions (s1, s2) and the ideal point (zi, zj) are identical, the function takes on the value 1. If the stimulus value of at least one dimension, si, i = 1, 2, is far apart from its ideal point, zi, the function approaches 0.

The connection weight, wijk, connects the input activation aij(t) to the kth output unit, k = 1, 2. The propagation rule for the exemplar model is

ck(t) = ∑i ∑j wijk aij(t), i, j = 1, . . . , p, k = 1, 2. (23.22)

This model is called an exemplar model because each receptive field of a training stimulus is associated with the output category units through a separate set of connection weights. Thus, the model simply associates each region of the stimulus space with a response, and similar examples get mapped to similar responses. As for the prototype model, the connection weights are updated according to the delta learning rule:

∆wijk = wijk(t + 1) − wijk(t) = η • [hk(t) − ck(t)] • aij(t).
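The exemplar model's receptive fields lend themselves to the same treatment. The sketch below is an illustrative NumPy version of Equation 23.20 and Equation 23.22, not the chapter's code; the names and example values are assumptions, and the p × p activation grid is normalized by its sum, as in the exempac routine of Table 23.2.

```python
import numpy as np

# Illustrative sketch of Eq. 23.20 (bivariate receptive fields) and
# Eq. 23.22 (propagation). Names and values are assumptions.

def exemplar_activation(z1, z2, s1, s2, sigma):
    """Eq. 23.20 on the full p x p grid of ideal points, normalized."""
    f1 = np.exp(-((z1 - s1) / sigma) ** 2)   # univariate factor, dim 1
    f2 = np.exp(-((z2 - s2) / sigma) ** 2)   # univariate factor, dim 2
    fsim = np.outer(f1, f2)   # product of the two factors in Eq. 23.20
    return fsim / fsim.sum()  # normalize, as in exempac

p, sigma = 5, 1.0
z1 = np.linspace(0.0, 4.0, p)    # hypothetical ideal points, dimension 1
z2 = np.linspace(0.0, 4.0, p)    # hypothetical ideal points, dimension 2
a = exemplar_activation(z1, z2, s1=2.0, s2=2.0, sigma=sigma)

# Eq. 23.22: c_k(t) = sum_ij w_ijk a_ij(t), with weights of shape
# (2, p, p), here still at their zero starting values.
W = np.zeros((2, p, p))
c = np.tensordot(W, a, axes=([1, 2], [0, 1]))
```

The activation grid peaks at the receptive field whose ideal point coincides with the stimulus, here the unit at grid position (2, 2).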
[Figure: architecture of the connectionist version of the exemplar model—a p × p grid of input units a11, . . . , app for stimulus dimensions s1 and s2, connected by weights wijk to the two category units c1 and c2.]

a(t) is a p²-dimensional vector with elements as defined in Equation 23.22. The propagation rule and the delta rule are analogous to Equation 23.18 and Equation 23.19, respectively. The algorithm for this part of the model can be found in Table 23.2.

TABLE 23.2
MATLAB Codes for Determining the Similarity Function and Weights for the Exemplar Model

Program command                               Comment
function [a] = exempac(Z1,Z2,S1,S2,sigma,j)   Input and output of function
fsim1 = exp(-((Z1-S1(j))./sigma).^2);         Calculating similarity, dimension 1
fsim2 = exp(-((Z2-S2(j))./sigma).^2);         Calculating similarity, dimension 2
fsim = kron(fsim1,fsim2);                     Bivariate receptive fields (Equation 23.20)
a = fsim/sum(fsim);                           Calculating activation

Step 3
The third step of the modeling process is to derive the predictions of the models. The predictions can be qualitative and quantitative. Although qualitative predictions do not require specific parameter values of the model (the predictions hold for all possible parameter values), quantitative predictions do require specific values for the free parameters of the model. Probing the model includes both qualitative and quantitative tests. For a qualitative test, the model predicts characteristic patterns which are compared to patterns observed in data. For the quantitative test, the free parameters of the model are estimated from data and a goodness-of-fit measure (see above) provides information about how well the model describes the data in a statistical sense. (For a broader discussion on qualitative versus quantitative tests, see Busemeyer and Diederich, 2010.)

Both models make predictions with respect to two different transfer tests: a generalization test and a recognition test. For the generalization test, new stimuli, not previously presented in the training set, are classified. For the recognition test, new and old stimuli—that is, those presented in the training set—are mixed and classified as new and old.

For the generalization test, the models assume that the probability of choosing category Ck for a new stimulus Snew (i.e., not an element of the training set) is based on a ratio of strength of the output activations. After t trials of training,
the output for category k = 1 is c1(t) and the probability for choosing C1 is

Pr[C1 | Snew] = exp(β • c1(t)) / [exp(β • c1(t)) + exp(β • c2(t))] = 1 / [1 + exp(−β(c1(t) − c2(t)))] (23.26)

and the probability for choosing C2 is

Pr[C2 | Snew] = 1 − Pr[C1 | Snew]. (23.27)

That is, the activation rule, which specifies how the net input of a given unit produces its new activation state, is the logistic function F(x) = (1 + e−x)−1, with x = β(c1(t) − c2(t)). The coefficient β is called a sensitivity parameter. Increasing the sensitivity parameter increases the slope of the function that relates the choice probability to the activation of a category; here, with c1(t) > c2(t), it increases the probability for choosing C1.

The predictions of both models over a range of parameters are presented in Figure 23.4. In particular, the sensitivity parameter β ranged from 0 to 15 in unit steps, the learning parameter η from 0 to 1 in steps of 0.04, and σ in Equation 23.11 and Equation 23.20 is set to 5.

[Figure 23.4: predicted proportion correct (0.5–0.9) for the two models as a function of learning rate η (0–1) and sensitivity β (0–15).]

Suppose the stimuli are defined by two dimensions, and let H and L be sets containing all possible values within the described dimension. Stimuli belonging to category C1 have either low values, L, on both dimensions, (S1 ∈ L, S2 ∈ L) = (l, l), or high values, H, on both dimensions, (S1 ∈ H, S2 ∈ H) = (h, h). Stimuli belonging to category C2 have low values on the first dimension and high values on the second dimension, (S1 ∈ L, S2 ∈ H) = (l, h), or high values on the first dimension and low values on the second dimension, (S1 ∈ H, S2 ∈ L) = (h, l). For the simulation, the stimuli are realizations from Gaussian distributions, N(µ, φ²). In particular, stimuli belonging to category C1 have low values on both dimensions, with mean µ1 = 1 for the first and µ2 = 1 for the second dimension, or high values on both dimensions, with mean µ1 = 10 for the first and µ2 = 9 for the second dimension; stimuli belonging to category C2 have either a low value on the first and a high value on the second dimension, with µ1 = 2 for the first and µ2 = 10 for the second dimension, or a high value on the first and a low value on the second, with µ1 = 9 for the first and µ2 = 1 for the second dimension. For all conditions, φ² is set to 1.

For a recognition task, the models assume that the probability of classifying a stimulus as old—that is, as previously presented in the training set—is an increasing function of the total amount of activation produced by the stimulus in both output units. Again, a logistic function is used to relate total activation to the old–new recognition response probability:

Pr[old | Snew] = exp(γ • (c1(t) + c2(t))) / [δ + exp(γ • (c1(t) + c2(t)))] = 1 / [1 + exp(−γ(c1(t) + c2(t)) + ln δ)] (23.28)

and the probability for choosing new is

Pr[new | Snew] = 1 − Pr[old | Snew]. (23.29)

The sensitivity parameter γ relates the recognition probability to the category activations. Increasing the sensitivity parameter causes the recognition probability to be more strongly influenced by the category activations. The parameter δ is a background-noise constant (Nosofsky et al., 1992). Here, it can be interpreted as a response bias parameter representing the tendency to say new to any stimulus: increasing δ increases the tendency to respond new.

Both models have five model parameters: the discriminability parameter σ, which determines the width of the generalization gradients; the learning rate parameter η for the delta learning rule; the sensitivity parameter β for the categorization choice rule; and two parameters for the recognition response rule, the sensitivity parameter γ and the response bias parameter δ. The main difference between the two models is in terms of the input representation. The prototype model uses two univariate sets of input units, whereas the exemplar model uses a single bivariate grid of input units. The latter is plausible as many neurons are tuned to more than one feature. For instance, neurons in MT are tuned both to direction and spatial frequency, and neurons in V1 and V2 are tuned both to orientation and spatial frequency (e.g., De Valois & De Valois, 1990; Mazer et al., 2002).

Step 4
The fourth step is to test the predictions of the model with data and to compare the predictions of competing models with respect to their ability to explain the empirical results. However, as Roberts and Pashler (2000) pointed out, showing that a model fits the data is not enough. As noted before, a major concern is that if a model is too flexible (fits too much) and does not constrain possible outcomes, then the fit is meaningless; if it is too flexible, it is necessary to penalize it for its complexity (Myung, 2000).

All models are an abstraction from a real-world phenomenon, and they focus only on essential aspects of a complex system. To be tractable and useful, models only reflect a simple and limited representation of the complex phenomenon.
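The two response rules introduced above (Equations 23.26–23.29) reduce to a few lines of code. The following Python sketch is illustrative rather than the chapter's own implementation; the function names, activations, and parameter values are assumptions.

```python
import math

# Illustrative sketch of the categorization choice rule (Eqs. 23.26-23.27)
# and the old-new recognition rule (Eqs. 23.28-23.29).
# Activations and parameter values below are assumptions.

def p_choose_c1(c1, c2, beta):
    """Eq. 23.26: ratio of strengths, equivalently a logistic function
    of beta * (c1 - c2)."""
    e1, e2 = math.exp(beta * c1), math.exp(beta * c2)
    return e1 / (e1 + e2)

def p_old(c1, c2, gamma, delta):
    """Eq. 23.28: recognition probability grows with total activation;
    delta is the background-noise (response bias) constant."""
    e = math.exp(gamma * (c1 + c2))
    return e / (delta + e)

c1, c2 = 0.8, 0.3                    # hypothetical category activations
beta, gamma, delta = 5.0, 2.0, 1.5   # illustrative parameter values

pc1 = p_choose_c1(c1, c2, beta)      # Eq. 23.26
pc2 = 1.0 - pc1                      # Eq. 23.27
po = p_old(c1, c2, gamma, delta)     # Eq. 23.28
pn = 1.0 - po                        # Eq. 23.29
```

Raising beta steepens the choice rule (with c1 > c2, pc1 moves toward 1), and raising delta lowers p_old, that is, it strengthens the bias to respond new.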
That is, a priori, all models are wrong in some details, and a sufficient amount of data will always prove that a model is not true. The question is which among the competing models provides a better representation of the phenomenon under question. Within the present context, the question is which of the two models, the prototype model or the exemplar model, provides a better explanation of how objects are categorized.

To empirically test competing models, it is crucial to design experiments that challenge the models. For instance, designing experimental conditions that lead to opposite qualitative predictions (categorical or ordinal) is an essential step in the model testing process. For example, the prototype model predicts that stimulus S is categorized in category C1 most often, but the exemplar model predicts that stimulus S is categorized in category C2 most often. Qualitative tests are parameter free in the sense that the models are forced to make these predictions for any value of the free parameters. The following briefly describes a design and shows a qualitative test for the two competing models.

Experiments in categorical learning typically are divided in two phases: a learning or training phase followed by a transfer test phase. During the training phase, the participants categorize objects in distinct classes and receive feedback about the performance. The transfer test is either a generalization test or a recognition test, both without feedback (see Step 3).

… response probability, depending on the value of the first dimension (Nosofsky et al., 1992). This crossover interaction effect is critical for a qualitative test of the two competing models. As it turns out, the prototype model cannot predict the crossover effect when fixing one dimension and varying only the second; the exemplar model, however, predicts this crossover for a wide range of parameter values (see Busemeyer & Diederich, 2010). For demonstration, let us take the same parameters for β, η, and σ as in Step 3. The means for the stimulus values, however, are set to µ1 = 1 and µ2 = 1 or µ1 = 10 and µ2 = 10 for category C1, and µ1 = 1 and µ2 = 10 or µ1 = 10 and µ2 = 1 for category C2. Figure 23.5 shows the simulation results. A dot indicates a combination of parameters that successfully reproduced the crossover. If the parameters are sufficiently large, the exemplar model predicts the crossover, here in 337 out of 375 possible cases.

[Figure 23.5: each dot marks a combination of learning rate η and sensitivity β for which the exemplar model reproduced the crossover interaction.]

When we take the previous parameters µ1 = 1 and µ2 = 1 or µ1 = 10 and µ2 = 9 for category C1 and µ1 = 2 and µ2 = 10 or µ1 = 9 and µ2 = 1 for category C2, the simulation reproduces the crossover for both models, as shown in Figure 23.6.
FIGURE 23.6. Predicted patterns of results for the exemplar model (A) and the prototype model (B). Each dot indicates a combination of parameters that correctly reproduced the crossover interaction.

The exemplar model (Figure 23.6A) reproduces the correct pattern in 325 out of 375 possible cases; the prototype model reproduces it in 191 cases. This example shows how crucial it is to design a proper experiment to distinguish between two competing models that make, in general, very similar predictions. A good model should also make quantitative predictions that are more accurate than its competitors. Quantitative predictions of a model are evaluated on the basis of an optimal selection of parameters. Otherwise, a perfectly good model could be rejected due to a poor selection of parameters.

Step 5
The last step in the modeling process is to modify the model in light of the data. Sometimes it is sufficient to make some adjustments to account for the observed data. Sometimes it is necessary to reformulate the theoretical framework—for instance, by modifying assumptions or adding new assumptions; sometimes it is inevitable to abandon the model and construct a completely new model based on the feedback obtained from new experimental results. That is, new experimental findings pose new challenges to previous models. New models trigger new experiments. Modeling is a cyclic process, and progress in the empirical and experimental sciences is made via this cycle: theorizing about the phenomenon and developing a model, deriving predictions from the model, testing the model, revising the model in light of empirical findings, testing the model again, and so on. Thus, the modeling process produces an evolution of models that improve and become more powerful over time as the science in a field progresses.
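The simulation logic of Steps 3 and 4—draw stimuli from the four Gaussian clusters, train with the delta rule, and score a generalization test over a parameter grid—can be sketched end to end. The version below uses the prototype model; the grid values, ideal-point placement, and sample sizes are illustrative choices, not the chapter's settings.

```python
import numpy as np

# Illustrative end-to-end sketch: train the prototype model on stimuli
# drawn from the four Gaussian clusters described in the text
# (means (1,1)/(10,9) for C1 and (2,10)/(9,1) for C2, variance 1), then
# score a generalization test over a small grid of (eta, beta) values.
# Grid, ideal points, and sample sizes are assumptions.

rng = np.random.default_rng(0)
p, sigma = 12, 5.0
z = np.linspace(0.0, 11.0, p)            # shared ideal points per dimension
means = [((1, 1), 0), ((10, 9), 0),      # category C1 clusters
         ((2, 10), 1), ((9, 1), 1)]      # category C2 clusters

def activation(s1, s2):
    """Stacked, normalized univariate activations (Eqs. 23.15-23.17)."""
    f1 = np.exp(-((z - s1) / sigma) ** 2)
    f2 = np.exp(-((z - s2) / sigma) ** 2)
    return np.concatenate([f1 / f1.sum(), f2 / f2.sum()])

def proportion_correct(eta, beta, n_train=200, n_test=200):
    W = np.zeros((2, 2 * p))
    for _ in range(n_train):             # training phase with feedback
        (m1, m2), k = means[rng.integers(4)]
        a = activation(rng.normal(m1, 1.0), rng.normal(m2, 1.0))
        W += eta * np.outer(np.eye(2)[k] - W @ a, a)      # delta rule
    correct = 0
    for _ in range(n_test):              # generalization test, no feedback
        (m1, m2), k = means[rng.integers(4)]
        c = W @ activation(rng.normal(m1, 1.0), rng.normal(m2, 1.0))
        p1 = 1.0 / (1.0 + np.exp(-beta * (c[0] - c[1])))  # Eq. 23.26
        choice = 0 if rng.random() < p1 else 1
        correct += (choice == k)
    return correct / n_test

scores = {(eta, beta): proportion_correct(eta, beta)
          for eta in (0.1, 0.5) for beta in (1.0, 5.0)}
```

A qualitative test in the spirit of Figure 23.5 would replace the accuracy score with a boolean predicate for the predicted pattern and count the parameter combinations for which it holds.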
References

Akaike, H. (1983). Information measures and model selection. Bulletin of the International Statistical Institute, 50, 277–290.

Beasley, W. H., & Rodgers, J. L. (2012). Bootstrapping and Monte Carlo methods. In H. Cooper, P. M. Camic, D. L. Long, A. T. Panter, D. Rindskopf, & K. J. Sher (Eds.), APA handbook of research methods in psychology: Vol. 2. Research designs: Quantitative, qualitative, neuropsychological, and biological (pp. 407–425). American Psychological Association. https://doi.org/10.1037/13620-022

Boden, M. A. (2006). Mind as machine: A history of cognitive science. The Clarendon Press.

Browne, M. W., & Cudeck, R. (1992). Alternative ways of assessing model fit. Sociological Methods & Research, 21(2), 230–258. https://doi.org/10.1177/0049124192021002005

Burnham, K. P., & Anderson, D. R. (2004). Multimodel inference: Understanding AIC and BIC in model selection. Sociological Methods & Research, 33(2), 261–304. https://doi.org/10.1177/0049124104268644

Busemeyer, J. R., & Diederich, A. (2010). Cognitive modeling. SAGE Publishing.

Chechile, R. A. (1999). A vector-based goodness-of-fit metric for interval-scaled data. Communications in Statistics: Theory and Methods, 28(2), 277–296. https://doi.org/10.1080/03610929908832298

Cole, D. A., & Ciesla, J. A. (2012). Latent variable modeling of continuous growth. In H. Cooper, P. M. Camic, D. L. Long, A. T. Panter, D. Rindskopf, & K. Sher (Eds.), APA handbook of research methods in psychology. American Psychological Association. https://doi.org/10.1037/13620-021

De Valois, R. L., & De Valois, K. K. (1990). Spatial vision. Oxford University Press.

Diederich, A., & Oswald, P. (2014). Sequential sampling model for multiattribute choice alternatives with random attention time and processing order. Frontiers in Human Neuroscience, 8, 697. https://doi.org/10.3389/fnhum.2014.00697

Diederich, A., & Trueblood, J. S. (2018). A dynamic dual process model of risky decision making. Psychological Review, 125(2), 270–292. https://doi.org/10.1037/rev0000087

Evren, A., & Tuna, E. (2012). On some properties of goodness of fit measures based on statistical entropy. International Journal of Research and Reviews in Applied Sciences, 13(1), 192–205.

Fan, X. (2012). Designing simulation studies. In H. Cooper, P. M. Camic, D. L. Long, A. T. Panter, D. Rindskopf, & K. J. Sher (Eds.), APA handbook of research methods in psychology: Vol. 2. Research designs: Quantitative, qualitative, neuropsychological, and biological (pp. 427–444). American Psychological Association. https://doi.org/10.1037/13620-021

Farrell, S., & Lewandowsky, S. (2018). Computational modeling of cognition and behavior. Cambridge University Press. https://doi.org/10.1017/CBO9781316272503

Feinberg, F. M., & Gonzalez, R. (2007, March). Bayesian modeling for psychologists: An applied approach [Paper presentation]. Tutorial Workshop on Bayesian Techniques, University of Michigan, Ann Arbor, MI, United States.

Flaherty, B. P., & Kiff, C. J. (2012). Latent class and latent profile models. In H. Cooper, P. M. Camic, D. L. Long, A. T. Panter, D. Rindskopf, & K. Sher (Eds.), APA handbook of research methods in psychology: Vol. 3. Data analysis and research publication (pp. 391–404). American Psychological Association. https://doi.org/10.1037/13621-019

George, G., & Raimond, K. (2013). A survey on optimization algorithms for optimizing the numerical functions. International Journal of Computers and Applications, 61(6), 41–46. https://doi.org/10.5120/9935-4570

Haykin, S. (1999). Neural networks: A comprehensive foundation (2nd ed.). Prentice Hall.

Hepburn, B., & Andersen, H. (2021). Scientific method. In E. N. Zalta (Ed.), The Stanford encyclopedia of philosophy. https://plato.stanford.edu/archives/sum2021/entries/scientific-method/

Hübner, R., & Pelzer, T. (2020). Improving parameter recovery for conflict drift-diffusion models. Behavior Research Methods, 52(5), 1848–1866. https://doi.org/10.3758/s13428-020-01366-8

Kandil, F. I., Diederich, A., & Colonius, H. (2014). Parameter recovery for the time-window-of-integration (TWIN) model of multisensory integration in focused attention. Journal of Vision, 14(11), 1–20. https://doi.org/10.1167/14.11.14

Kruschke, J. K. (1992). ALCOVE: An exemplar-based connectionist model of category learning. Psychological Review, 99(1), 22–44. https://doi.org/10.1037/0033-295X.99.1.22

Le Cun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., & Jackel, L. D. (1989). Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4), 541–551. https://doi.org/10.1162/neco.1989.1.4.541

Lebière, C., & Anderson, J. R. (1993). A connectionist implementation of the ACT-R production system. In Proceedings of the Fifteenth Annual Conference of the Cognitive Science Society (pp. 635–640). Lawrence Erlbaum Associates.

Lewandowsky, S., & Farrell, S. (2011). Computational modeling in cognition: Principles and practice. SAGE Publications. https://doi.org/10.4135/9781483349428

Mazer, J. A., Vinje, W. E., McDermott, J., Schiller, P. H., & Gallant, J. L. (2002). Spatial frequency and orientation tuning dynamics in area V1. Proceedings of the National Academy of Sciences of the United States of America, 99(3), 1645–1650. https://doi.org/10.1073/pnas.022638499

McCulloch, W. S., & Pitts, W. (1943). A logical calculus of ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5, 115–133. https://doi.org/10.1007/BF02478259

Myung, I. J. (2000). The importance of complexity in model selection. Journal of Mathematical Psychology, 44(1), 190–204. https://doi.org/10.1006/jmps.1999.1283

Myung, I. J., & Pitt, M. A. (1997). Applying Occam's razor in modeling cognition: A Bayesian approach. Psychonomic Bulletin & Review, 4(1), 79–95. https://doi.org/10.3758/BF03210778

Navarro, D. J. (2019). Between the devil and the deep blue sea: Tensions between scientific judgment and statistical model selection. Computational Brain & Behavior, 2(1), 28–34. https://doi.org/10.1007/s42113-018-0019-z

Nezlek, J. B. (2012). Multilevel modeling for psychologists. In H. Cooper, P. M. Camic, D. L. Long, A. T. Panter, D. Rindskopf, & K. J. Sher (Eds.), APA handbook of research methods in psychology: Vol. 3. Data analysis and research publication (pp. 219–241). American Psychological Association. https://doi.org/10.1037/13621-011

Nilsson, N. J. (2010). The quest for artificial intelligence: A history of ideas and achievements. Cambridge University Press.

Nosofsky, R. M., Kruschke, J. K., & McKinley, S. C. (1992). Combining exemplar-based category representations and connectionist learning rules. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18(2), 211–233. https://doi.org/10.1037/0278-7393.18.2.211

Palminteri, S., Wyart, V., & Koechlin, E. (2017). The importance of falsification in computational cognitive modeling. Trends in Cognitive Sciences, 21(6), 425–433. https://doi.org/10.1016/j.tics.2017.03.011

Ratcliff, R. (2012). Response time distributions. In H. Cooper, P. M. Camic, D. L. Long, A. T. Panter, D. Rindskopf, & K. J. Sher (Eds.), APA handbook of research methods in psychology: Vol. 2. Research designs: Quantitative, qualitative, neuropsychological, and biological (pp. 429–443). American Psychological Association. https://doi.org/10.1037/13620-021

Rindskopf, D. (2012). Generalized linear models. In H. Cooper, P. M. Camic, D. L. Long, A. T. Panter, D. Rindskopf, & K. J. Sher (Eds.), APA handbook of research methods in psychology: Vol. 3. Data analysis and research publication (pp. 191–206). American Psychological Association. https://doi.org/10.1037/13621-009

Roberts, S., & Pashler, H. (2000). How persuasive is a good fit? A comment on theory testing. Psychological Review, 107(2), 358–367. https://doi.org/10.1037/0033-295X.107.2.358

Rojas, R. (1996). Neural networks: A systematic introduction. Springer.

Rumelhart, D. E., McClelland, J. L., & the PDP Research Group. (1986). Parallel distributed processing: Explorations in the microstructure of cognition: Vol. 1. Foundations. MIT Press.