Decoding Semantic Representations in Mind and Brain
A key goal for cognitive neuroscience is to understand the neurocognitive systems that support semantic memory. Recent multivariate analyses of neuroimaging data have contributed greatly to this effort, but the rapid development of these novel approaches has made it difficult to track the diversity of findings and to understand how and why they sometimes lead to contradictory conclusions. We address this challenge by reviewing cognitive theories of semantic representation and their neural instantiation. We then consider contemporary approaches to neural decoding and assess which types of representation each can possibly detect. The analysis suggests why the results are heterogeneous and identifies crucial links between cognitive theory, data collection, and analysis that can help to better connect neuroimaging to mechanistic theories of semantic cognition.

Highlights

State-of-the-art brain imaging studies have recently produced a variety of sometimes contradictory conclusions about the neural systems that support human semantic memory.

Multivariate techniques deployed in this work adopt implicit or explicit assumptions that limit the types of signal they can detect, and thus the types of hypotheses they can test.

We lay out the space of possible cognitive and neural representations and then critically review contemporary methods to determine which analyses can test which hypotheses.

The results account for the heterogeneity of recent findings and identify an important empirical and methodological gap that makes it difficult to connect the imaging literature to neurocomputational models of semantic processing.

The neurocognitive quest for semantic representations

Cognitive science has long sought to understand the mechanisms underlying human semantic memory – the storehouse of knowledge that supports our ability to comprehend and produce language, recognize and classify objects, and understand everyday events. Recently, cross-fertilization of cognition, neuroscience, and machine learning has generated a plethora of new analysis methods to aid the discovery of neural systems that encode semantic information [1–5]. Although this renaissance has produced a remarkable array of new findings, the evolution of different approaches across research groups makes it difficult to track them all, understand their respective strengths and limitations, and compare results across studies. Consequently, the literature contains sometimes startlingly different conclusions about the nature, structure, and organization of semantic representations in the mind and brain, and the field has little recourse for understanding why the differences arise or how they might be reconciled.
We address this challenge by reviewing hypotheses about how semantic information may be encoded computationally and neurally, then critically evaluating the types of representational structure that contemporary multivariate methods can possibly discover in functional neuroimaging data. Crucially, each method encapsulates assumptions about how neural systems encode mental structure that then constrain the types of neural coding it can, and cannot, detect. Hypothesis, data collection, and analysis are therefore linked in ways that sometimes go unremarked and may explain the heterogeneity of findings in the literature. Through exposition of these points, we present an overview of the current empirical landscape with the aim of both organizing current thinking about semantic representations in mind and brain, and of providing a more general field guide to contemporary multivariate methods for brain imaging.

1 Medical Research Council (MRC) Cognition and Brain Sciences Unit, Chaucer Road, Cambridge CB2 7EF, UK
2 Department of Psychology, Louisiana State University, Baton Rouge, LA 70803, USA
3 Department of Psychology, University of Wisconsin–Madison, 1202 West Johnson Street, Madison, WI 53706, USA
258 Trends in Cognitive Sciences, March 2023, Vol. 27, No. 3 https://doi.org/10.1016/j.tics.2022.12.006
© 2022 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
in appearance (e.g., hummingbird and ostrich), verbal labels (e.g., dog and wolf), or the action plans that engage them (e.g., glue and tape). Children as young as 9 months of age detect such relationships and use them to guide reaches even when they contravene perceptual similarity [6–8]. Adults can reliably judge relatedness in kind and sort items into conceptual groups on this basis [9–11], and both children and adults use conceptual similarity as a primary basis for generalizing names and other properties [12–14]. Second, semantic representations support knowledge retrieval or inference – attributing to an item or event properties that are not directly observed or stated. For instance, when observing a picture of a parrot in a textbook, the student may infer that the item can fly even though the image is static; reading about a trip to the restaurant, she may infer that the diner had to pay even if this is not mentioned; observing the new neighbor's pet, a toddler may call it 'doggie' even if it is an unfamiliar breed, and so on. Semantic representations thus can be defined as the cognitive and neural states that express conceptual structure and support semantic retrieval/inference. Hypotheses about the cognitive mechanisms that support these functions reside within a fairly constrained space of possibilities (Figure 1).

Considering conceptual structure, most approaches adopt one of three positions. The first proposes that semantic memory contains many discrete and independent category (see Glossary) representations, each corresponding roughly to a basic-level natural language concept such as tree or boat [15,16] (Figure 1, top) and possibly to more general (plant, vehicle) or specific (elm, yacht) classes [17,18]. On this view, verbal comprehension involves discerning the category to which a word refers [19] whereas comprehension of visual and other sensory inputs involves correctly classifying a perceived item [4,18,20,21]. Category-based theories explain conceptual structure by proposing that conceptually similar items activate the same category representation – for instance, parrots, hummingbirds, and robins are viewed as being conceptually related because they all activate the mental category bird.

The second view proposes that semantic representations are composed of local features, each independently indicating the presence/absence of a property such as is red, can fly, or has eyes (Figure 1, middle row). Each perceived item or word activates associated features, indicating properties that are likely to be true of the item [22–26]. Conceptual similarity structure arises from property overlap: hummingbirds and ostriches are understood to be similar in kind because they possess many common properties (wings, feathers, etc.), but are also known to be non-identical because they possess individuating properties as well [27,28].

Category-based approaches are often distinguished from feature-based views because of the special role that category representations play in determining conceptual similarity and supporting inference. For instance, prototype theories [29], 'entry-level' [18,30] and spreading-activation views [31], rational approaches [15], and some neurally inspired models of object categorization [32] all propose that access to semantic information depends upon first matching a stimulus (image, word, sound, etc.) to a semantic category. Successful categorization then provides direct access to semantic information or initiates a 'search' of the semantic system, allowing retrieval of other properties. On such views, semantic categories constitute more than merely an additional feature that is attributed to a perceived item.

Nevertheless, under both approaches semantic representations can also be viewed as vectors in a high-dimensional representation space. For categorical theories, dimensions encode membership of distinct and mutually exclusive categories, and the representation of an item is a multinomial probability distribution indicating the probability that a stimulus belongs to each class. For instance, observing an item with wings, feathers, and a beak would generate a high probability density on the bird axis and a low density on axes corresponding to fish, car, boat, etc. because the probability that the item is a bird is high and the probability of it belonging to other categories is low. For feature-based theories, dimensions encode various directly interpretable properties, and the representation of an item indicates, independently on each dimension, the binomial probability that the item possesses the corresponding property. On this view, cardinal is a vector with high values on dimensions such as is red and can fly, but low values on dimensions such as has scales and can swim. Moreover, some such features may directly indicate the semantic category label of an item (e.g., 'bird', 'fish'), although, in contrast to category-based theories, such labels have no special function beyond that of other features. In both cases, conceptual structure reflects the similarity of different points in the vector space.

The third proposal likewise views semantic representations as points in a high-dimensional vector space, but without assigning any directly interpretable meaning to the corresponding dimensions (Figure 1, bottom). Perception of a stimulus or word evokes an activation pattern across an ensemble of representation units, corresponding to a point in the space where the proximity between points expresses conceptual similarity [33–35]. Unlike feature- and category-based approaches, however, one cannot discern what information is encoded in the representation by looking at the activation of each element taken independently. Instead, what matters is the similarity of a given vector to those elicited by other items, taken across all units in the ensemble. On this view, cardinal is a vector with high values on some dimensions and low values on others. Examining each dimension reveals no information about the properties of the cardinal, but information can be gleaned from the fact that cardinal is located very close to goldfinch, reasonably close to ostrich, and far from canoe (Box 1).

Considering retrieval/inference, most approaches adopt one of two proposals, both compatible with the perspectives on conceptual structure outlined above. First, semantic information may be self-contained within the representation such that activation brings retrieval/inference along with it (Figure 1, left column). For categorical models, the category representation might encapsulate knowledge of properties essential to or characteristic of category members, as in classical, prototype, and rational models [36–38]. In feature-based models, because each element of the representation vector corresponds to an explicit property, the system need only 'read off' the vector elements active above some threshold to attribute the corresponding properties to the perceived/named item. Such a view is captured by semantic feature-based neural network models [22–24], spreading-activation models [31,39,40], and distributional semantic models that constrain representations to have interpretable dimensions (such as topic models and non-negative sparse embeddings; Box 1) [41,42]. For vector space models, although the dimensions of the representation space are not independently interpretable, retrieval/inference can still be self-contained by proposing that these functions rely on similarity and/or direction within the representation space [34]. For instance, the system may infer that the cardinal can fly and breathe because the vectors for the words 'fly' and 'breathe' are both near to the vector for 'cardinal' and are situated along a direction in the space that separates behavioral 'can' properties from other property types (such as parts, names, colors, etc.). Such a perspective is captured by distributional semantic models that are not constrained to yield interpretable dimensions (e.g., latent semantic analysis [33], holistic analog to language [43], word2vec [34], and language neural networks [44]) (Box 1).

Self-contained approaches face a significant hurdle, however: retrieving the content of a representation requires a labeling scheme, without which it would be impossible to know which semantic content 'goes with' which representation vectors (sometimes called the symbol grounding problem [45]). The second approach to retrieval/inference (Figure 1, right column) addresses this problem by proposing that semantic content is grounded in perception, action, and language systems that directly encode surface representations of the environment: shapes, colors, parts, movements, affordances, words, and so on [46–48]. On this view, the activation of a categorical, feature-based, or vector space representation does not in itself cause information retrieval/inference. Instead, retrieval/inference arises when these structure-encoding representations activate modality-specific representations that are identical or intimately related to those that directly mediate perception and action. Thus the categorical/featural/vector space representation of canoe is meaningful only in virtue of its ability to generate mental images of what a canoe looks like (including shape, color, parts, etc.), motor actions associated with canoes (e.g., paddling), words used to describe canoes ('boat', 'light', 'floats'), and so on.

Figure 1. Computational hypotheses about semantic representation. There are three ways in which conceptual structure could be encoded. First, information may be encoded in discrete, independent category representations (top row). On this view, sensory inputs recruit discrete and independent category representations which either encapsulate semantic information within themselves [15,20,36,105,106] (top left) or connect and bind modality-specific surface representations encoding characteristics of category members [49,50] (top right). Second, semantic information may be distributed across independent and interpretable semantic feature representations, with featural overlap indicating conceptual similarity (middle). Features may independently and intrinsically encode the presence of stipulated semantic features within a concept [22–24,75] (middle left) or gain meaning via connection to surface representations that directly encode such information [2,25,51,52] (middle right). Third, semantic information may be encoded by a continuous distributed representation space that expresses conceptual similarities among items even though its dimensions are not independently interpretable (bottom). Semantic information may be self-contained by the distances encoded in such a space [33,34,41,44] (bottom left) or grounded via mappings from the space to modality-specific surface representations of specific properties [9,53,54] (bottom right). Black arrows illustrate how information may flow through the network given the stimuli shown. Text on either side indicates well-known perspectives in the literature that characterize each view. For feature-based and vector space representations, representational spaces are schematized on a blue background. Blue arrows point to the type of representational similarity structure encoded by the corresponding layers – note that both self-contained and grounded approaches can encode the same representational space. Abbreviations: GRAPES, grounding representations in action, perception, and emotion systems; NNSE, non-negative sparse embeddings.

Box 1

Feature-based theories cast semantic representations as vectors that denote the properties of a given item, such as is red, can fly, or has blood inside for the concept cardinal. Three methods have been used to construct such vectors.

(i) Semantic norming studies ask participants to list the properties that are true of a given concept. Properties generated and/or verified by many participants are compiled in a matrix with rows corresponding to the tested concepts and columns corresponding to the various properties generated by the participants across all study concepts [28,111] (J. Tanaka and L. Szechter, unpublished data).

(ii) Brain-inspired feature vectors identify semantic properties that, from univariate brain imaging, selectively engage different cortical areas. Participants then rate the strength of association between a given concept and each such property. The procedure produces many fewer features than norming studies, but still captures rich conceptual structure [26,52].

(iii) Non-negative sparse word embeddings (NNSE) estimate feature vectors from text corpora by exploiting the tendency for words with similar meanings to occur in similar contexts. Standard techniques {e.g., latent semantic analysis (LSA) [33,113,114] and word2vec [34]} generate embeddings with uninterpretable dimensions, but, when embeddings are constrained to be both sparse (zeros on most dimensions) and non-negative (only positive values on the rest), the resulting elements are more interpretable and each word can be viewed as a semantic feature vector [115].

Vector spaces cast semantic representations as points in a high-dimensional space where pairwise distances capture conceptual relatedness, but with uninterpretable dimensions. Two methods are used to compute such spaces.

(i) Unconstrained word embeddings adopt the same corpus-based approach as non-negative sparse embeddings without sparsity or positivity constraints. The resulting spaces express comparable structure to NNSE using fewer dimensions, but the dimensions are not typically independently interpretable.

(ii) Deep neural networks trained on natural language and/or large image datasets learn vector space representations for photographs, words, or larger units of language. Deep image classifiers represent color photographs with activation vectors across many serial processing layers [116,117]; sentence-processing networks represent words, phrases, or whole passages of text as activation vectors over internal units {e.g., bidirectional encoder representations from transformers (BERT) [44] and generative pretrained transformer 3 (GPT3) [118]}.

Glossary

Category: (of a representation) composed of discrete, independent units that each correspond to a concept (such as boat, vehicle, or yacht).
Conjoint: (of a representation) consisting of units that express different semantic information depending on the states of other units.
Consistent: (of a representation) associated with the same direction of change in activation across individuals – for example, homologous voxels in different individuals become more active when representing cat.
Contiguous: (of a representation) composed of units residing in the same brain region.
Decoding: predicting the stimulus (or sometimes the properties of the stimulus, or of the task) experienced by a participant using patterns of activity across multiple neural units.
Dispersed: (of a representation) composed of units residing in different brain regions.
Electrocorticography (ECoG): a method of measuring brain activity via intracranial electrodes placed on the cortical surface.
Electroencephalography (EEG): a method of measuring brain activity via electrodes placed on the scalp.
Encoding model: a model that predicts the activity of a single neural unit using multiple independently interpretable features of the stimulus. Multiple encoding models are used to predict activity across multiple neural units.
Feature-based: (of a representation) composed of multiple independently interpretable features (such as is red or can fly).
Functional magnetic resonance imaging (fMRI): a method of measuring brain activity by detecting changes in blood flow.
Grounded: (of a representation) requiring the generation of modality-specific surface representations to produce retrieval/inference.
Heterogeneous: (of a representation) consisting of units that adopt different activation states when representing a concept.
Homogeneous: (of a representation) consisting of units that all adopt the same activation state when representing a concept.
Inconsistent: (of a representation) associated with different directions of change in activation across individuals – for example, homologous voxels in multiple individuals behave differently when representing cat, some becoming more active and others becoming less active.
Independent: (of a representation) consisting of units that express the presence or absence of the same semantic information irrespective of the states of other units.
Labeled data: a dataset specifying both input and output values for fitting an encoding or decoding model.
Magnetoencephalography (MEG): a method of measuring brain activity by measuring magnetic fields generated by neural activity.
Multivariate pattern classification (MVPC): the categorization of stimuli based on the neural patterns they evoke (a form of decoding).
Region of interest (ROI): a subset of neural units, chosen in a hypothesis-guided way, upon which an analysis is conducted.
Regularization: a method of avoiding overfitting by finding classifier weights that jointly minimize classification error and an additional loss which is a function of the classifier weights.
Representational similarity analysis (RSA): a method of investigating representational structure by comparing the similarity structure recorded to that hypothesized.
Self-contained: (of a representation) encapsulating semantic information within itself such that mere activation of the representation brings about retrieval/inference.
Surface representation: a sensory representation of a stimulus that is modality-specific – for example, color (specific to the visual modality) or a paddling action (specific to the motor modality).
Transcranial magnetic stimulation (TMS): the use of magnetic fields to temporarily and reversibly disrupt brain function.
Vector space: (of a representation) composed of a pattern across representational units, the meanings of which cannot be independently interpreted.
In sum, considering how semantic representations might serve their defining functions – expressing
conceptual structure and supporting semantic retrieval/inference – delineates a well-constrained
space of hypotheses in which cognitive theories of semantic representation can be situated. The
different views, and examples of theories aligning with each, are shown in Figure 1. Each cognitive
hypothesis has implications for how neural data are best collected and analyzed; for instance, adju-
dicating grounded versus self-contained theories may require participants to semantically process
stimuli in different modalities. The next section considers how these views constrain the search for
neural systems that encode semantic information.
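The contrast drawn above between interpretable feature vectors and uninterpretable vector spaces can be made concrete with a toy simulation. The sketch below uses hypothetical feature values (not drawn from any norming study) to show that conceptual similarity survives a change of basis that destroys the interpretability of individual dimensions:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two representation vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical feature vectors; each dimension is an interpretable property
# (columns: is_red, can_fly, has_feathers, has_wings, can_swim, floats, is_manmade).
features = {
    "cardinal":  np.array([1, 1, 1, 1, 0, 0, 0], dtype=float),
    "goldfinch": np.array([0, 1, 1, 1, 0, 0, 0], dtype=float),
    "ostrich":   np.array([0, 0, 1, 1, 0, 0, 0], dtype=float),
    "canoe":     np.array([0, 0, 0, 0, 0, 1, 1], dtype=float),
}

# Proximity in the space recovers conceptual structure:
sims = {name: cosine(features["cardinal"], vec)
        for name, vec in features.items() if name != "cardinal"}
# cardinal is closest to goldfinch, farther from ostrich, and far from canoe:
assert sims["goldfinch"] > sims["ostrich"] > sims["canoe"]

# A random rotation scrambles the dimensions (no single dimension remains
# interpretable) while preserving every pairwise similarity -- the hallmark
# of a vector space code.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((7, 7)))  # random orthogonal matrix
rotated = {name: Q @ vec for name, vec in features.items()}
assert np.isclose(cosine(rotated["cardinal"], rotated["goldfinch"]),
                  sims["goldfinch"])
```

In the rotated space, inspecting any one unit reveals nothing about redness or flight, yet cardinal remains nearest to goldfinch and farthest from canoe, exactly as the third proposal requires.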
Figure 2. Hypotheses about the neuro-semantic code. (A) Within individuals a representation may adopt a homogeneous code (all involved units adopt the same
activation change – i.e., all become more active or all become less active) or a heterogeneous code (the units involved adopt different changes to activation – i.e., some
become more active than others, and/or some become more active and some less active). Across individuals the code may be consistent (the same magnitude and
direction of change in all individuals) or inconsistent (different magnitudes and/or directions of change in different individuals). Spatial smoothing and cross-subject
averaging can either help or hinder discovery depending on the code. (B) In the independent code shown, unit 1 activation indicates whether the item is animate, while
unit 2 independently encodes whether it can fly. In the first conjoint code, the two units express the same similarity relations among the four items, but considered
independently, neither unit clearly expresses either dimension. For instance, fish and plane both moderately activate unit 1, whereas bird and boat moderately activate
unit 2. In the second conjoint example, unit 2 activation is difficult to interpret considered independently, but discriminates birds from fish when unit 1 is active, and
fruits from vegetables when unit 1 is inactive. In both conjoint examples, understanding the neural code requires joint consideration of both units. (C) Anatomically, the
units in a representation may be localized to a contiguous region or dispersed across multiple distal areas, and the units may occupy either the same or different
locations across individuals. The two brains within each white box denote two different individuals. Abbreviation: Betw. individuals, between individuals.
processed. In a heterogeneous code, different units express the same information differently –
some voxels representing cat may be greatly activated when a cat is present, some greatly sup-
pressed, and some only moderately active, etc. Approaches that average unit activations within
participants [e.g., via spatial smoothing or region of interest (ROI) averaging] favor the discov-
ery of homogeneous over heterogeneous codes.
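The point can be illustrated with a small simulation (hypothetical numbers, not real BOLD data): when half of the voxels in an ROI respond positively to a concept and half negatively, averaging across the ROI cancels the signal that a voxel-wise multivariate readout recovers easily.

```python
import numpy as np

rng = np.random.default_rng(1)
n_voxels, n_trials = 50, 40

# Heterogeneous code: half the voxels activate for 'cat', half deactivate;
# 'dog' evokes the opposite pattern. Noise is added per trial.
signs = np.concatenate([np.ones(n_voxels // 2), -np.ones(n_voxels // 2)])
cat = signs + rng.normal(0, 0.5, size=(n_trials, n_voxels))   # cat trials
dog = -signs + rng.normal(0, 0.5, size=(n_trials, n_voxels))  # dog trials

# ROI averaging (or heavy smoothing) collapses each pattern to its mean,
# where the positive and negative responses cancel:
roi_cat, roi_dog = cat.mean(axis=1), dog.mean(axis=1)
print(abs(roi_cat.mean() - roi_dog.mean()))  # near 0: conditions look identical

# A readout that weights each voxel separately recovers the signal
# (here projecting onto the known code, purely for illustration):
accuracy = (np.mean(cat @ signs > 0) + np.mean(dog @ signs < 0)) / 2
print(accuracy)  # near 1.0: fully decodable from the unaveraged pattern
```

A real analysis would of course estimate the weights from training data rather than assume them, but the contrast between the averaged and unaveraged signal is the same.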
Across individuals, the neural code may be consistent – a given piece of information is always
expressed with the same activity change in homologous units (e.g., cat always being signaled
by the same activation pattern across aligned voxels of different individuals) – or inconsistent
(cat being signaled by different activation patterns across aligned voxels of different individuals;
Figure 2A). Methods that aggregate or summarize unit activation across individuals – for instance,
fitting a single model to decode all participants, computing the mean blood oxygen level-depen-
dent (BOLD) response at each voxel before applying a decoding model, or averaging predictions
of encoding models across participants before passing the result to further analysis – favor the
discovery of consistent over inconsistent codes. Likewise, methods that align voxels across indi-
viduals on the basis of their having similar activation patterns across stimuli (e.g., hyper-alignment)
[56] implicitly assume a consistent code.
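The cost of this assumption can be sketched in the same simulated fashion (hypothetical numbers): if each individual encodes cat with a reliable but subject-specific sign pattern over homologous voxels, within-subject analyses find the code while the cross-subject average shrinks toward zero.

```python
import numpy as np

rng = np.random.default_rng(2)
n_sub, n_vox = 10, 60

# Inconsistent code: each subject's pattern over homologous voxels has
# random (but stable) signs. Two noisy runs are simulated per subject.
codes = rng.choice([-1.0, 1.0], size=(n_sub, n_vox))
run1 = codes + rng.normal(0, 0.3, size=(n_sub, n_vox))
run2 = codes + rng.normal(0, 0.3, size=(n_sub, n_vox))

# Within each subject the pattern replicates across runs, so per-subject
# analyses can detect it:
within = [np.corrcoef(run1[s], run2[s])[0, 1] for s in range(n_sub)]
print(np.mean(within))  # high cross-run pattern correlation

# Averaging patterns across subjects before analysis cancels the signal:
ratio = np.linalg.norm(run1.mean(axis=0)) / np.linalg.norm(run1[0])
print(ratio)  # group-average pattern is much weaker than any individual's
```

The same logic applies to fitting a single decoder to all participants: the model must find a compromise pattern that, for an inconsistent code, does not exist.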
By contrast, vector space hypotheses suggest that units conjointly encode a representational
space, and that semantic information is expressed in the activity pattern considered across mul-
tiple units such that single-unit activation may not be interpretable without consideration of other
units in the ensemble. Figure 2B shows two examples. In the middle panel, one cannot determine
whether a stimulus is living or whether it can fly solely by inspecting the activation of unit 1
(because fish and plane elicit equal activation) or unit 2 (because boat and cardinal elicit equal
activation). Considering the joint activation of both units clearly separates living and non-living
things along one diagonal, and flying from non-flying things along the other. In the right panel,
unit 1 clearly encodes whether a stimulus is a plant or animal, but the behavior of unit 2 consid-
ered independently might appear to be arbitrary (activating for banana and cardinal, but not for
carrot or fish). Joint consideration of both units makes the interpretation of unit 2 clear: if unit 1
is active, it differentiates birds from fish; if inactive, it differentiates fruit from vegetables.
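The middle panel's logic can be written out directly. The activation values below are hypothetical stand-ins for the schematic in Figure 2B: neither unit separates either dimension on its own, but the two diagonals of the joint space do.

```python
import numpy as np

# Hypothetical two-unit conjoint code for four concepts (cf. Figure 2B, middle):
#                          unit1  unit2
acts = {"bird":  np.array([1.0, 0.5]),
        "fish":  np.array([0.5, 0.0]),
        "plane": np.array([0.5, 1.0]),
        "boat":  np.array([0.0, 0.5])}
living = {"bird", "fish"}
flying = {"bird", "plane"}

# Unit 1 alone cannot separate living from non-living (fish and plane tie):
u1 = {k: v[0] for k, v in acts.items()}
lo_living = min(u1[k] for k in living)
hi_nonliving = max(u1[k] for k in acts if k not in living)
assert not lo_living > hi_nonliving  # ranges overlap: no single-unit threshold

# But the two diagonals of the joint space separate both dimensions:
for name, vec in acts.items():
    assert (vec[0] - vec[1] > 0) == (name in living)   # one diagonal: animacy
    assert (vec[0] + vec[1] > 1) == (name in flying)   # other diagonal: flight
```

Any analysis restricted to one unit at a time would conclude that neither unit carries animacy or flight information, even though the two-unit ensemble encodes both perfectly.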
Together these factors delineate 24 different possibilities for the organization of the neuro-
semantic code within and across individuals (Table 1). These are not mutually exclusive – different
aspects of a representation, or representations in different conceptual domains, may be orga-
nized according to different principles. Understanding which principles best explain which as-
pects of representation thus requires methods capable of finding each variety of signal.
Table 1. Twenty-four hypotheses about the nature and anatomical organization of the neuro-semantic code^a

| Code type | Within-subject code^b | Within-subject location | Across-subject code | Across-subject location | Single voxel (n = 46) | Spatial blurring (n = 40) | ROI/SL (n = 63) | Average before model fitting (n = 45) | Average after model fitting (n = 64) |
|---|---|---|---|---|---|---|---|---|---|
| Independent | Homo | Contiguous | Consistent | Same | 100 | 100 | 100 | 100 | 100 |
| Independent | Homo | Contiguous | Consistent | Different | 100 | 100 | 100 | 100 | 10 |
| Independent | Homo | Contiguous | Inconsistent | Same | 100 | 100 | 100 | 62 | 62 |
| Independent | Homo | Contiguous | Inconsistent | Different | 100 | 100 | 100 | 62 | 9 |
| Independent | Homo | Dispersed | Consistent | Same | 100 | 100 | 42 | 42 | 42 |
| Independent | Homo | Dispersed | Consistent | Different | 100 | 100 | 42 | 42 | 9 |
| Independent | Homo | Dispersed | Inconsistent | Same | 100 | 100 | 42 | 23 | 23 |
| Independent | Homo | Dispersed | Inconsistent | Different | 100 | 100 | 42 | 23 | 8 |
| Independent | Hetero | Contiguous | Consistent | Same | 100 | 60 | 60 | 60 | 60 |
| Independent | Hetero | Contiguous | Consistent | Different | 100 | 60 | 60 | 60 | 9 |
| Independent | Hetero | Contiguous | Inconsistent | Same | 100 | 60 | 60 | 36 | 36 |
| Independent | Hetero | Contiguous | Inconsistent | Different | 100 | 60 | 60 | 36 | 8 |
| Independent | Hetero | Dispersed | Consistent | Same | 100 | 60 | 30 | 30 | 30 |
| Independent | Hetero | Dispersed | Consistent | Different | 100 | 60 | 30 | 30 | 8 |
| Independent | Hetero | Dispersed | Inconsistent | Same | 100 | 60 | 30 | 17 | 17 |
| Independent | Hetero | Dispersed | Inconsistent | Different | 100 | 60 | 30 | 17 | 7 |
| Conjoint | Hetero | Contiguous | Consistent | Same | 46 | 23 | 23 | 23 | 23 |
| Conjoint | Hetero | Contiguous | Consistent | Different | 46 | 23 | 23 | 23 | 2 |
| Conjoint | Hetero | Contiguous | Inconsistent | Same | 46 | 23 | 23 | 15 | 15 |
| Conjoint | Hetero | Contiguous | Inconsistent | Different | 46 | 23 | 23 | 15 | 2 |
| Conjoint | Hetero | Dispersed | Consistent | Same | 46 | 23 | 3 | 3 | 3 |
| Conjoint | Hetero | Dispersed | Consistent | Different | 46 | 23 | 3 | 3 | 1 |
| Conjoint | Hetero | Dispersed | Inconsistent | Same | 46 | 23 | 3 | 3 | 3 |
| Conjoint | Hetero | Dispersed | Inconsistent | Different | 46 | 23 | 3 | 3 | 1 |
^a Each row indicates one hypothesis and the first five columns show corresponding combinations of key factors discussed in the text (code type, within-subject homogeneity and localization, and between-subject consistency and localization). The remaining columns summarize a review of 100 papers using multivariate methods to uncover neuro-semantic representations. Each column represents a common analysis step that entails an implicit assumption about the neural code, including independent analysis of single voxels (assuming an independent code), spatial blurring of BOLD (assuming a homogeneous code), independent consideration of different areas via ROI or searchlight (assuming contiguous localization within area), averaging the neural signal across subjects before model fitting (assuming a consistent code), and averaging of model fit data across subjects (assuming similar localization). The n indicates how many papers adopted the corresponding step. In the published table, emphasis marks hypotheses where the associated step will benefit (bold font) or hinder (italic) discovery. The numbers indicate how many reports are capable of detecting each possible neural code considering the analysis decisions taken at each step from left to right. The final column indicates the number of reports that adopt choices capable of finding each possible code.
^b Abbreviations: Hetero, heterogeneous; Homo, homogeneous.
Because all imaging methods yield thousands of noisy measurements for each stimulus in each
participant, statistical models that seek informative units must be constrained in some way. Mul-
tivariate methods vary in their approach to this problem and thus in their ability to detect different
types of representations. We consider three broad approaches and their variants (Figure 3) with
an eye to highlighting their respective strengths and limitations. Box 2 additionally considers cru-
cial but commonly overlooked issues for collecting the data that feed these different approaches.
Multivariate pattern classification (MVPC) fits models (Gaussian naive Bayes, support vector
machines, logistic/multinomial regression, etc.) to categorize stimuli from the neural activity they
evoke [57,58]. During a training phase, the model receives labeled data consisting of the neural
responses across units to each of many stimuli (e.g., various images of objects) and, for each
item, a label indicating the stimulus category. Training involves fitting classifier weights to output
the correct label for each item in the training set. The trained model is then evaluated by assessing
whether it outputs the correct category label when given neural responses for test stimuli that are
not present in the training set. Where a fitted model reliably classifies held-out items, input units
are interpreted as encoding information about the target categories. The approach is transpar-
ently consistent with category-based semantic representations but will also yield positive results
for both feature-based and vector space representations provided that the target categories are
separable in the corresponding neural activation patterns (i.e., it is possible to fit a flat hyperplane
that reliably divides the target categories in the high-dimensional representation space). Because
the output of a classifier depends on activation patterns across multiple units, MVPC can detect
both independent and conjoint codes. Classifiers assign unique weights to each unit, and the ap-
proach can therefore detect both homogeneous and heterogeneous codes. Because separate
classifiers are typically fitted for each participant, the method can potentially find inconsistent
and variably localized representations as well.
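The core MVPC procedure can be sketched in a few lines. The following is a minimal illustration on simulated data; the counts, labels, and signal structure are invented for demonstration and are not drawn from any study reviewed here:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_items, n_units = 80, 200                 # stimuli x neural measurements
labels = np.repeat([0, 1], n_items // 2)   # e.g., animal vs. tool images

# Simulated evoked activity: a small subset of units carries category signal
X = rng.normal(size=(n_items, n_units))
X[labels == 1, :10] += 1.0                 # units 0-9 respond more to class 1

# Fit on training folds, then classify held-out items the model never saw
clf = LogisticRegression(max_iter=1000)
acc = cross_val_score(clf, X, labels, cv=5).mean()
# Reliable above-chance hold-out accuracy is taken as evidence that the
# input units encode information about the target categories
```

Because the classifier weighs all input units jointly, any linearly separable category boundary can be detected in this way, whether the underlying code is independent or conjoint, homogeneous or heterogeneous.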
A key challenge for MVPC concerns over-fitting. With more predictors (neural measurements)
than datapoints (stimuli), model fitting is underdetermined without additional constraint – even
with random data, an infinite set of coefficients will perfectly predict the category membership
of training items [35]. MVPC variants differ in the constraints they impose to handle this issue;
this has important implications for signal discovery (Figure 3A).
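The problem is easy to demonstrate: with more measurements than stimuli, a weakly regularized classifier fits even random labels perfectly while generalizing at chance. A toy illustration (dimensions invented):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n_items, n_units = 40, 500                   # far more predictors than items
X = rng.normal(size=(n_items, n_units))      # pure noise "neural" data
y = rng.integers(0, 2, size=n_items)         # random category labels

weak = LogisticRegression(C=1e6, max_iter=5000).fit(X, y)
train_acc = weak.score(X, y)                 # near-perfect fit to training noise

cv_acc = cross_val_score(LogisticRegression(C=1e6, max_iter=5000),
                         X, y, cv=5).mean()  # ...but chance-level hold-out
```

Hold-out evaluation exposes the over-fit, but it does not by itself determine which constraint should be imposed during fitting; that is where the variants below diverge.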
One method is to reduce the number of neural features provided as the input to the model by ap-
plying an explicit anatomical constraint. For instance, ROI-based approaches look only at the
units contained in a predefined ROI – discovery therefore requires that the representation is ana-
tomically contiguous and localized similarly across individuals, and also that a sufficient amount of
the representation falls within the preselected region to drive classifier accuracy above chance.
ROI selection also crucially determines how neural evidence can relate to the space of cognitive
hypotheses. For instance, ROIs falling outside modality-specific areas cannot offer evidence rel-
evant to testing grounded theories of representation, whereas those falling solely within a given
modality-specific region cannot evaluate self-contained hypotheses.
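The dependence of ROI-based MVPC on where the ROI falls can be simulated directly (the ROI boundaries and signal locations below are invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n_items, n_units = 60, 300
labels = np.tile([0, 1], n_items // 2)
X = rng.normal(size=(n_items, n_units))
X[labels == 1, 120:140] += 0.8        # category signal confined to units 120-139

def roi_accuracy(lo, hi):
    """Hold-out accuracy using only units inside a predefined ROI."""
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X[:, lo:hi], labels, cv=5).mean()

acc_covering = roi_accuracy(100, 160)  # ROI overlaps the signal: above chance
acc_missing = roi_accuracy(200, 260)   # ROI misses the signal: near chance
```

A representation that is real but falls outside the preselected region, or is split across regions, simply never enters the model.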
Relatedly, searchlight approaches fit a separate classifier at each spatial location in each partici-
pant (e.g., each voxel, source, or electrode), including as predictors all units within a prespecified
anatomical radius ('searchlight') [58,59]. Thus, different brain regions are analyzed separately.
Typically cross-participant univariate statistics at each location assess where in the brain the clas-
sifier hold-out accuracy is reliably better than chance; this approach therefore requires that the
representation is localized similarly across individuals. If this criterion is met, the searchlight can
reveal anatomically dispersed codes, but only if each searchlight independently contains suffi-
cient information to drive classifier accuracy above chance. If accurate classification depends
on joint consideration of units that fall in separate searchlights, the code will be missed. In this sense, the searchlight may fail to find dispersed, conjoint codes [5,60].
Note that, in principle, classifier accuracy for searchlights and ROIs could be analyzed separately in each individual, relaxing the assumption of similar localization across participants. We are not aware of such an approach being applied to semantic decoding, and we therefore focus on the more usual method of using cross-subject univariate statistics to create group-level information maps for these approaches.
A second approach chooses classifier inputs based on a summary univariate statistic that is computed independently for each unit (such as an F-statistic that contrasts unit activation for different category members [3], or a correlation-based metric that assesses the stability of the response of a voxel across stimuli [61]). This avoids the anatomical assumptions of ROI and searchlight approaches but lacks a principled rationale for setting a cut-off threshold and may fail to discover conjoint representations because each included unit must independently survive the preselection criterion.
Figure 3. Approaches to neural decoding. (A) Different solutions to the over-fitting problem faced by multivariate pattern classification (MVPC) and representational similarity analysis (RSA) approaches. Region of interest (ROI) approaches look only at a prespecified area in each participant and evaluate whether the mean model fit (i.e., hold-out error or correlation) across participants differs reliably from chance. Searchlight methods independently evaluate model fit at many 'searchlights' throughout the brain in each participant, then find areas where searchlights produce above-chance fits reliably across participants. Regularization fits a single model in each participant using all neural features, but constrains the model to minimize prediction error jointly with an additional cost that prevents over-fitting (discussed in the main text). Non-zero coefficients in the decoding model of a subject indicate neural units that carry signal; these can be distributed across the brain and can be different for each participant. Group maps indicate areas where non-zero coefficients accumulate more than expected by chance across individuals. (B) Multivariate pattern classification fits a model to predict a stimulus category label from the neural pattern it evokes across selected neural units. Mean hold-out accuracy across participants indicates whether the selected units carry category information, and classifier weights can indicate whether category membership is signaled by increased or decreased neural activation. (C) RSA computes similarity in the neural responses generated across selected units by various stimuli, and then correlates this with a target semantic similarity matrix. Mean correlation across subjects indicates whether the selected neural units encode semantic structure. (D) Generative approaches use regression to fit models that predict the response of each neural unit to various stimuli. After fitting, the regression weights can be inspected to determine the information that each unit encodes, and novel brain responses can be 'decoded' by finding the semantic vector most likely to have generated the observed neural pattern and then comparing this to known semantic vectors. Abbreviations: acc., accuracy; Neg., negative; NSM, neural similarity matrix; Pos., positive; RSM, representational similarity matrix; S1–S3, brains from three different subjects; stim., stimulus.
Box 2
Stimulus selection
Each modality of stimulus has advantages and disadvantages. Words are easily presented in the scanner, allow all concept types to be probed, and have a perceptual/orthographic structure that is unconfounded with semantic structure. However, decoding is generally less successful with words than with pictures [82], and written words generate a strongly asymmetric (left-hemisphere) distribution of activation that contrasts with the bilateral pattern found for pictures and spoken words [119].
Task selection
Tasks used to elicit semantic activation vary across studies in ways that are known to strongly impact the engagement of underlying neural systems, including their overall difficulty [120], the specificity with which an item must be identified for good performance [121], reliance on strongly versus weakly encoded information [122], the aspects of knowledge the task foregrounds [25,123], and the degree to which the task can be performed via alternative, non-semantic processing routes [124].
Image acquisition
The possibility that semantic representations are anatomically dispersed must be tested with whole-brain imaging, which poses a challenge for fMRI acquisition because the signal-to-noise ratio varies substantially across the brain [126]. Standard sequences yield especially poor signal in orbitofrontal and ventral anterior temporal regions that are thought to be crucial for semantic cognition [127]. Strategies for improving the signal, including distortion-corrected spin-echo [127,128] and multi-echo protocols [129,130], have been available for several years but have only rarely been applied in semantic studies [131]. Indeed, many studies have restricted the field of view to exclude the ventral anterior temporal lobe (ATL) completely [132].
A third strategy employs model regularization: all units in the cortex provide input to the classifier,
which avoids over-fitting by jointly minimizing classification error and an additional loss that is itself a
function of the classifier weights [5]. Common losses include the sum of the squared coefficients
(L2-norm, also known as ‘ridge’ regression [62]), the sum of their absolute values {L1-norm, also
known as ‘LASSO’ (least absolute shrinkage and selection operator) [63]}, or a weighted average
of these (also known as 'elastic net' [64]). The approach makes no assumption about the anatom-
ical location of signal-carrying units within or across participants, can detect conjoint representa-
tions (because it does not require independent preselection of classifier units), and offers a
principled way to guide parameterization via nested cross-validation of prediction error [5].
Crucially, however, different regularizers impose different constraints on model fitting, leading to
wildly different solutions [5]. Regularization with the L1 norm zeros out as many predictors as pos-
sible while still maximizing predictive accuracy, and typically 'selects' (i.e., places non-zero coef-
ficients on) a very small proportion of units. By contrast, the L2 norm spreads similar weights
across correlated units and places non-zero weights on all units. The choice of regularizer thus
implements an assumption about the likely nature of the true signal: that signal-carrying units
are sparse and uncorrelated (L1) or that they are dense and highly redundant (L2). An alternative
approach designs loss functions that explicitly incorporate prior knowledge about the likely neural
and cognitive structure. For instance, the sparse overlapping sets (SOS) LASSO penalty encour-
ages patterns of 'structured sparsity' where selected units reside in roughly similar locations
across participants, promoting loose anatomical clustering that still permits some variation in sig-
nal location across participants [65,66].
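The contrast between sparse and dense regularizers is easy to see in simulation. Below, 20 units redundantly carry the same category signal (values are illustrative): an L1-penalized classifier selects only a handful of them, whereas the L2-penalized classifier weights every unit:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n_items, n_units = 100, 200
labels = np.repeat([0, 1], n_items // 2)

X = 0.5 * rng.normal(size=(n_items, n_units))
# Units 0-19 redundantly (and thus correlatedly) encode the category
X[:, :20] += labels[:, None] * 1.0

l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, labels)
l2 = LogisticRegression(penalty="l2", C=0.1).fit(X, labels)

n_nonzero_l1 = int(np.count_nonzero(l1.coef_))
n_nonzero_l2 = int(np.count_nonzero(l2.coef_))
# L1 zeros out most units (sparse, non-redundant assumption);
# L2 places some weight on every unit (dense, redundant assumption)
```

Both models may classify equally well, yet they imply very different maps of signal-carrying units; structured penalties such as the SOS LASSO build further anatomical assumptions into the cost itself.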
These differences can yield radically different views of the neuro-semantic code when applied to
the same data. In Figure 4A, neural representations of face stimuli appear to be increasingly
widely distributed and heterogeneous as analytic methods progressively relax tacit assumptions
about the independence, heterogeneity, and localization of the neural code. Standard univariate
contrast (assuming a consistently localized, independent, and homogeneous code) replicates the
classic finding of a right-lateralized posterior fusiform area that is more active for faces. Search-
light (assuming a similarly localized and contiguous but potentially conjoint and heterogeneous
code) suggests a bilateral representation localized to posterior ventral temporal cortex. Whole-
brain MVPC regularized with the L1 norm (assuming a sparse code that can be dispersed, het-
erogeneous, and differently localized) shows a bilateral face-to-nonface gradient in posterior ven-
tral temporal cortex and a face-selective region in right lateral occipital cortex. Regularization with
the SOS LASSO (allowing dispersed, heterogeneous, and differently localized codes, but preferring
solutions with roughly similar anatomical distributions) suggests a much more broadly distributed
code encompassing anterior temporal, parietal, and prefrontal regions in both hemispheres [5].
Representational similarity analysis (RSA) searches for sets of units whose responses ex-
press semantic similarities among stimuli [58,59,67]. The analysis first computes a target repre-
sentational similarity matrix (RSM; sometimes defined in terms of dissimilarity where it is called
a target representational dissimilarity matrix) that expresses semantic relatedness for all pairs of
stimuli (Box 1). It then estimates a neural similarity matrix (NSM; sometimes called a neural repre-
sentational dissimilarity matrix) that encodes pairwise similarities in stimulus-evoked neural activity
across a set of units. The correlation between RSM and NSM indicates whether the selected units
encode the target structure (Figure 3C).
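A minimal RSA computation looks like this. The semantic vectors and neural responses below are simulated for illustration; a real analysis would use behavioral or corpus-derived RSMs and measured activity:

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(4)
n_stim, n_units = 12, 50
# Hypothetical semantic feature vectors for 12 stimuli (illustrative only)
sem = rng.normal(size=(n_stim, 8))
# Simulated neural responses that partly reflect the semantic structure
neural = sem @ rng.normal(size=(8, n_units)) + 0.5 * rng.normal(size=(n_stim, n_units))

def similarity_matrix(patterns):
    """Pairwise Pearson correlations between stimulus patterns."""
    return np.corrcoef(patterns)

rsm = similarity_matrix(sem)       # target representational similarity matrix
nsm = similarity_matrix(neural)    # neural similarity matrix

# Correlate the lower triangles (off-diagonal pairs) of the two matrices
tri = np.tril_indices(n_stim, k=-1)
rho, _ = spearmanr(rsm[tri], nsm[tri])
# Reliably positive rho across participants is taken as evidence that
# the selected units encode the target semantic structure
```

Only the off-diagonal elements are compared, since the diagonal is 1 by construction; note that the rank correlation is agnostic about which particular units, or which combinations of units, carry the structure.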
Figure 4. Example results from various decoding methods applied to fMRI data. (A) Four different multivariate pattern classification (MVPC) approaches applied
to the same dataset. Participants made pleasantness judgments in response to images of faces, places, or objects, and each analysis sought voxel sets that differentiate
face from non-face stimuli. Approaches that assume consistently localized signals (univariate and searchlight) suggest that representations are localized to posterior ventro-
temporal cortex, whole-brain decoding with sparse regularization suggests a somewhat more distributed representation, whereas decoding with structured sparsity
suggests a widely distributed representation [5]. (B) Searchlight representational similarity analysis (RSA) decoding of semantic structure from pictures, words, or both.
Similarly to MVPC, RSA can detect categorical, feature-based, and vector space representations
provided that the NSM and semantic RSM correlate positively. Because neural similarities are
computed across multiple units, the technique can detect conjoint or independent codes and
heterogeneous or homogeneous codes. A central challenge concerns how neural units are se-
lected and evaluated for significance. Most studies employ either a prespecified ROI or a search-
light technique. The correlation between RSM and NSM is computed for each ROI or searchlight
individually in each participant and, if these are reliably positive across individuals, the ROI/search-
light is interpreted as encoding semantic structure. As with MVPC, information maps could be analyzed separately in each individual, but RSA as typically practiced requires that (i) representations are localized similarly across individuals, (ii) information is not conjointly encoded across different searchlights or ROIs, and (iii) individual searchlights contain sufficient information to drive correlations with the target matrix reliably above chance.
RSA views even small correlations as meaningful provided that they are reliably positive across
participants. Because semantic structure covaries with many confounding factors, the results
can be difficult to interpret. For instance, early studies using visual stimuli suggested that posterior
temporo-occipital areas encode semantic structure [68], but a recent comparative analysis found
that these areas more strongly encode high-order visual structure and semantic structure was
better encoded in more anterior ventro-temporal regions (Figure 4B, top) [69]. Studies that do
not control for visual similarity suggest that semantic structure for both words and pictures is
encoded within a left perisylvian network [70], but when stimuli orthogonally vary semantic and
visual similarity, semantic structure for words appears to be localized to the medial-ventral ante-
rior temporal lobe [71] (Figure 4B, bottom). Thus, very different patterns are obtained depending
upon the target RSMs, the selection of stimuli, and the input modality (Box 2).
(Figure 4 legend, continued.) Results vary remarkably depending on several factors, including the representational similarity matrices (RSMs) considered (semantic similarity alone [68] produces different results from comparing semantic versus visual similarity; top two images [69]) and experimental control of stimulus properties (semantic structure for words appears to be encoded in perisylvian regions when visual structure is uncontrolled [70], but in ventral anterior temporal lobe (ATL) when controlled [71]). (C) Generative approaches for decoding semantic representations of narrative speech/sentences. When predictor vectors have semantically interpretable dimensions, and encoder weights are used to interpret the meaning of a voxel's activation, the results seem to show a mosaic of localized semantic features across cortex within each subject, but callouts show areas where the proposed semantic content is at odds with traditional understanding of function (top; images generated from the online visualization tool at https://gallantlab.org/huth2016/). Approaches that invert encoding models to decode whole-brain states (bottom) can recover sentence meanings with good accuracy, but the nature of the underlying code is difficult to discern because the approach selects thousands of voxels widely distributed across cortex in each participant (right), with approximately equal proportions residing in various pre-defined brain networks [1] (left). In both cases verbal semantic representations appear to be widely distributed across cortex and highly variable across individuals. For references see [1,5,68–71,75]. Abbreviations: ant, anterior; LOC, lateral occipital complex; Pic, picture; post, posterior; pref, preference; PR, perirhinal cortex; Prop., proportion; reg., regularization; TP, temporal pole.
First, generative approaches can fail to predict the independent activity of a unit that forms part of a conjoint code. To see this, consider the second conjoint example in Figure 2B (right), where two units both contribute to a semantic representation. If unit 1 is active, unit 2 differentiates fish from birds; if inactive, unit 2 instead differentiates fruits from vegetables. The 'meaning' of unit 2 is clear when unit 1 is taken into consideration, but might appear arbitrary when considered independently. An encoder model might struggle to predict the independent behavior of unit 2 from semantic features such as can move, has feathers, is sweet, etc., and thus might suggest that it is not involved in semantic representation.
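This failure mode can be reproduced in a few lines. Below, unit 2 carries a feature whose sign depends on unit 1 (an XOR-like conjoint code, invented for illustration): a linear encoder predicts unit 2 at near-zero R², even though the feature is perfectly recoverable from units 1 and 2 jointly:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
n = 400
# Two binary semantic features for many stimuli (illustrative):
# f1 ~ "is an animal", f2 ~ some second property
f1 = rng.integers(0, 2, n)
f2 = rng.integers(0, 2, n)
F = np.column_stack([f1, f2]).astype(float)

# A conjoint two-unit code: unit 1 signals f1; unit 2 carries f2,
# but with its sign set by unit 1
unit1 = f1 + 0.1 * rng.normal(size=n)
unit2 = np.where(f1 == 1, f2, 1 - f2) + 0.1 * rng.normal(size=n)

# An encoder predicting unit 2 independently from the features fails...
r2_unit2 = LinearRegression().fit(F, unit2).score(F, unit2)
# ...yet f2 is recoverable when units 1 and 2 are read out jointly
recovered_f2 = np.where(unit1.round() == 1, unit2.round(), 1 - unit2.round())
match = float(np.mean(recovered_f2 == f2))
```

An analysis that scores each unit by its independent encoder fit would discard unit 2 despite its central role in the code.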
The second challenge concerns interpretation. One strategy fits the encoders using semantic
vectors whose elements are each individually interpretable (such as a semantic feature vector;
Box 1), and then inspects the encoder weights for each unit to understand what content it en-
codes [2,75,76]. For instance, if the activation of a voxel is reliably predicted by semantic features
such as can move, can grow, and has eyes, these features will receive non-zero weights in the
regression model for that voxel, which might then be interpreted as encoding animacy. The
goal is to understand each unit as independently encoding a subset of semantic features, thereby
yielding an interpretable semantic feature map of cortex that is consistent with feature-based
cognitive models. Because there are many potential semantic features, however, the encoder
fit must be regularized using techniques such as those described earlier for MVPC (commonly
L2 norm, e.g., [16], although other approaches are also popular, e.g., [77]). As we have seen, dif-
ferent regularizers can produce dramatically different configurations of weights, and the interpre-
tation of encoder weights therefore hinges crucially upon the choice of the regularizer. Perhaps for
this reason, approaches adopting this strategy have yielded puzzling findings – suggesting a
mosaic-like organization of local semantic features across many cortical areas that is difficult to
reconcile with the wealth of cognitive and clinical neuroscience information about the functions
of these regions [75] (Figure 4C, top).
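In sketch form (the features, weights, and the voxel itself are invented): fit a regularized encoder for one voxel and read off which interpretable features receive large weights:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(6)
n_stim = 60
feature_names = ["can_move", "can_grow", "has_eyes"]
features = rng.integers(0, 2, size=(n_stim, 3)).astype(float)

# Simulated voxel whose response tracks the animacy-related features
voxel = features @ np.array([0.9, 0.1, 0.8]) + 0.2 * rng.normal(size=n_stim)

enc = Ridge(alpha=1.0).fit(features, voxel)
weights = dict(zip(feature_names, enc.coef_))
# Large weights on can_move and has_eyes might invite an 'animacy' reading,
# but with many correlated real features the fitted pattern of weights
# depends on the penalty chosen (here an L2/ridge penalty)
```

With only three orthogonal features the interpretation is stable; with hundreds of correlated semantic features, different regularizers redistribute the same predictive signal across very different feature sets.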
An alternative strategy eschews the effort to identify a 'meaning' for individual units and instead
decodes the full activation pattern evoked across cortical units by inverting the encoder models
to find the semantic vector that is most likely to have generated the whole-brain response. The
recovered vector is interpreted by comparing its similarity to vectors corresponding to known
words or sentences [1,74]. For instance, if the decoded vector is near to the known vectors for
grow, move, eat, eyes, legs, fur, it will be interpreted as encoding a meaning such as animal. Be-
cause no effort is made to interpret each dimension, this method is consistent with vector space
approaches, but can also detect category or feature-based representations. One recent study
showed remarkably good decoding of sentence-level meaning using this approach [1] – but
the implications of the study for understanding neural organization of semantics remain unclear
because the results identified thousands of voxels scattered across the cortex in each individual,
with approximately equal involvement of many different brain networks and no voxels selected in
more than half of the participants (Figure 4C, bottom).
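The inversion strategy can be sketched as follows (the encoder weights, vocabulary, and noise level are all invented): given a fitted linear encoder, a least-squares inverse recovers the semantic vector most consistent with the observed response, which is then interpreted via its nearest known vectors:

```python
import numpy as np

rng = np.random.default_rng(7)
d_sem, n_units = 10, 120
W = rng.normal(size=(d_sem, n_units))        # fitted encoder weights (given)

# Known semantic vectors for a small vocabulary (illustrative)
vocab = {w: rng.normal(size=d_sem) for w in ["dog", "hammer", "apple", "truck"]}

# Observed brain response to an unseen presentation of "dog"
response = vocab["dog"] @ W + 0.3 * rng.normal(size=n_units)

# Invert the encoder: least-squares estimate of the generating vector
v_hat, *_ = np.linalg.lstsq(W.T, response, rcond=None)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Interpret the decoded vector by its nearest known semantic vector
best = max(vocab, key=lambda w: cosine(v_hat, vocab[w]))
```

Note that accurate decoding of this kind demonstrates that the selected units jointly carry semantic information, but says little about where or how the code is organized across them.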
It is worth noting that each general approach encompasses several variants – for instance, in the
particular classification model adopted by MVPC [58] and the specific similarity metric used by
RSA [78,79]. Although a full characterization of each is beyond the scope of this review, it
seems likely that such variation further contributes to the heterogeneity of the findings reported
in the literature.
localized similarly across individuals. Grounded approaches suggest that such areas can encode
semantic information about stimuli, and studies designed specifically to assess whether semantic
structure arises within a given modality [80,81] therefore have good motivation to employ ROI or
searchlight-based feature selection. The anatomical organization of tertiary and association cor-
tices is less well understood and may be more likely to vary across individuals, therefore studies
seeking semantic structure outside the earlier modality-specific regions are better served by the
adoption of approaches that loosen localization, homogeneity, and consistency assumptions.
Assessment of self-contained hypotheses will depend crucially on such methods because they
propose that semantic representations encode information in a modality-independent manner.
Second, adjudication of grounded versus self-contained hypotheses requires studies that probe
semantic information through different stimulus modalities. Self-contained views hold that the
same system of semantic representation is engaged regardless of whether the stimulus is a
word, picture, image, sound, etc. Such a view cannot be disconfirmed by evidence that, for in-
stance, semantic information is decodable from visual areas when a visual stimulus appears be-
cause such a result might also arise if the structure of purely perceptual visual representations is
confounded with semantic structure (e.g., Figure 4B). Evaluating the proposal instead requires
searching for neural systems from which semantic information can be decoded across multiple
different stimulus modalities. Currently, the literature contains relatively few such studies, and
these have yielded mixed findings [70,82–84] (further details are given in the supplemental infor-
mation online).
A central question thus concerns how the field might best proceed given the complexity and het-
erogeneity of contemporary methods and the filtering that inevitably results. No analytic approach
is assumption-free, and we doubt that the universal adoption of any single method will resolve the
issues we have identified. Instead, we believe the field would be well served by adopting some
best practices in the way that studies are designed and results are communicated.
which are important because they allow the reader to understand why a given analysis method
was chosen and how the observed results relate to the working hypothesis.
Figure 5. Recent examples of computational models informing neural decoding. (A) In recurrent models the activation patterns that encode semantic information change
over the course of stimulus processing. In simulated electrocorticography (ECoG, left), classifiers fit to different temporal windows (colored dots) decode well within the same and
neighboring time-windows, but poorly for more distal time-windows (colored lines). A similar pattern arises when the same approach is used to decode ECoG from human anterior
temporal cortex while participants name pictures, suggesting rapid nonlinear change in the neuro-semantic code [133]. (B) Deep convolutional neural networks (DCNNs) may
provide a useful framework for understanding visual object semantics [134,135]. A recent study assessed whether a trained DCNN could classify images when activations at a
given model layer were replaced by neural responses (measured by fMRI) of different visual areas [136]. Neural patterns from each area were successfully decoded, but only
when they were input to the deeper model layers (barplot) – suggesting that the richer semantic structure encoded in such layers is reflected throughout the ventral visual
stream. (C) Other work uses similar models to evaluate individual differences across parts of the vision-to-semantics system [137]. In the plot shown the authors trained several
models, measured similarity in the representational geometry acquired in each layer across models, and embedded these in two dimensions. The proximity of colored circles
indicates the similarity of the representational structure acquired by the corresponding layers. Lines connect layers in the same model. Shallower model layers (light colors)
always learned relatively similar structure, whereas deeper layers – those most likely to express abstract semantic structure – learned more variable structure, suggesting that
neural codes may differ more across individuals in the regions that are most likely to encode semantic structure. For references see [133,136,137]. Abbreviations: AUC, area
under the curve; dim, dimension; LOC, lateral occipital complex; MDS, multidimensional scaling; V1–V4, visual cortex areas 1–4.
Concluding remarks
Our review illustrates that methodological choices in multivariate neuroimaging analysis selec-
tively filter data to promote discovery of some types of neuro-semantic codes over others.
These considerations compel a re-evaluation of the literature. Over three decades many neuroim-
aging studies have reported cortical areas that locally encode a particular type of semantic infor-
mation in a systematic way across individuals. The preponderance and replicability of such
Acknowledgments
This work was supported by an MRC Career Development Award (MR/V031481/1) to A.D.H., by a grant from the Rosetrees
Trust (A1699) to A.D.H. and M.A.L.R., and by an Advanced European Research Council (ERC) award (GAP 670428-30
BRAIN2MIND_NEUROCOMP), MRC programme grant (MR/R023883/1), and intramural funding (MC_UU_00005/18) to
M.A.L.R.
Declaration of interests
The authors declare no conflicts of interest.
Supplemental information
Supplemental information associated with this article can be found online at https://doi.org/10.1016/j.tics.2022.12.006.
References
1. Pereira, F. et al. (2018) Toward a universal decoder of linguistic meaning from brain activation. Nat. Commun. 9, 1–13
2. Popham, S.F. et al. (2021) Visual and linguistic semantic representations are aligned at the border of human visual cortex. Nat. Neurosci. 24, 1628–1636
3. Visconti di Oleggio Castello, M. et al. (2021) Shared neural codes for visual and semantic information about familiar faces in a common representational space. Proc. Natl. Acad. Sci. U. S. A. 118, e2110474118
4. Kriegeskorte, N. (2015) Deep neural networks: a new framework for modeling biological vision and brain information processing. Annu. Rev. Vis. Sci. 1, 417–446
5. Cox, C.R. and Rogers, T.T. (2021) Finding distributed needles in neural haystacks. J. Neurosci. 41, 1019–1032
6. Mandler, J.M. (2006) The Foundations of Mind: Origins of Conceptual Thought (1st edn), Oxford University Press
7. Pauen, S. (2002) Evidence for knowledge-based category discrimination in infancy. Child Dev. 73, 1016–1033
8. Pauen, S. (2002) The global-to-basic shift in infants' categorical thinking: first evidence from a longitudinal study. Int. J. Behav. Dev. 26, 492–499
9. Rogers, T.T. et al. (2004) The structure and deterioration of semantic memory: a computational and neuropsychological investigation. Psychol. Rev. 111, 205–235
10. Lopez, A. et al. (1997) The tree of life: universal and cultural features of folkbiological taxonomies and inductions. Cogn. Psychol. 32, 251–295
11. Hodges, J.R. et al. (1995) Charting the progression in semantic dementia: implications for the organisation of semantic memory. Memory 3, 463–495
12. Waxman, S.R. and Markow, D.B. (1995) Words as invitations to form categories: evidence from 12- to 13-month-old infants. Cogn. Psychol. 29, 257–302
13. Booth, A.E. and Waxman, S.R. (2008) Taking stock as theories of word learning take shape. Dev. Sci. 11, 185–194
14. Lin, E.L. and Murphy, G.L. (2001) Thematic relations in adults' concepts. J. Exp. Psychol. Gen. 130, 3–28
15. Anderson, J.R. (1991) The adaptive nature of human categorization. Psychol. Rev. 98, 409–426
16. Rosch, E. et al. (1976) Basic objects in natural categories. Cogn. Psychol. 8, 382–439
17. Collins, A.M. and Quillian, M.R. (1969) Retrieval time from semantic memory. J. Verbal Learn. Verbal Behav. 8, 240–247
18. Jolicoeur, P. et al. (1984) Pictures and names: making the connection. Cogn. Psychol. 19, 31–53
19. Xu, F. and Tenenbaum, J.B. (2007) Word learning as Bayesian inference. Psychol. Rev. 114, 245–272
20. Serre, T. et al. (2007) A feedforward architecture accounts for rapid categorization. Proc. Natl. Acad. Sci. 104, 6424–6429
21. Humphreys, G.W. and Forde, E.M. (2001) Hierarchies, similarity, and interactivity in object-recognition: on the multiplicity of 'category-specific' deficits in neuropsychological populations. Behav. Brain Sci. 24, 453–509
22. Farah, M.J. and McClelland, J.L. (1991) A computational model of semantic memory impairment: modality-specificity and emergent category-specificity. J. Exp. Psychol. Gen. 120, 339–357
23. Cree, G. et al. (1999) An attractor model of lexical conceptual processing: simulating semantic priming. Cogn. Sci. 23, 371–414
24. Tyler, L. et al. (2000) Conceptual structure and the structure of concepts: a distributed account of category-specific deficits. Brain Lang. 75, 195–231
25. Martin, A. (2007) The representation of object concepts in the brain. Annu. Rev. Psychol. 58, 25–45
26. Anderson, A.J. et al. (2019) An integrated neural decoder of linguistic and experiential meaning. J. Neurosci. 39, 8969–8987
27. McRae, K. et al. (1997) On the nature and scope of featural representations of word meaning. J. Exp. Psychol. Gen. 126, 99–130
28. Ruts, W. et al. (2004) Dutch norm data for 13 semantic categories and 338 exemplars. Behav. Res. Methods Instrum. Comput. 36, 506–515
29. Mervis, C.B. and Rosch, E. (1981) Categorization of natural objects. Annu. Rev. Psychol. 32, 89–115
30. Mack, M. and Palmeri, T. (2011) The timing of visual object categorization. Front. Psychol. 2, 165
31. Collins, A.M. and Loftus, E.F. (1975) A spreading-activation theory of semantic processing. Psychol. Rev. 82, 407–428
32. Riesenhuber, M. and Poggio, T. (1999) Hierarchical models of object recognition in cortex. Nat. Neurosci. 2, 1019–1025
33. Landauer, T.K. and Dumais, S.T. (1997) A solution to Plato's problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychol. Rev. 104, 211–240
34. Mikolov, T. et al. (2013) Distributed representations of words and phrases and their compositionality. In Proceedings of the 26th International Conference on Neural Information Processing Systems (Vol. 2), pp. 3111–3119, Curran Associates Inc.
52. Fernandino, L. et al. (2022) Decoding the information structure underlying the neural representation of concepts. Proc. Natl. Acad. Sci. 119, e2108091119
53. Patterson, K. et al. (2007) Where do you know what you know? The representation of semantic knowledge in the human brain. Nat. Rev. Neurosci. 8, 976–987
54. Lambon Ralph, M.A. et al. (2017) The neural and computational bases of semantic cognition. Nat. Rev. Neurosci. 18, 42–55
55. Rogers, T.T. and McClelland, J.L. (2004) Semantic Cognition: A Parallel Distributed Processing Approach, MIT Press
56. Guntupalli, J.S. et al. (2016) A model of representational spaces in human cortex. Cereb. Cortex 26, 2919–2934
57. Pereira, F. and Botvinick, M. (2011) Information mapping with pattern classifiers: a comparative study. NeuroImage 56, 476–496
58. Norman, K.A. et al. (2006) Beyond mind-reading: multi-voxel pattern analysis of fMRI data. Trends Cogn. Sci. 10, 424–430
59. Kriegeskorte, N. et al. (2006) Information-based functional brain mapping. Proc. Natl. Acad. Sci. 103, 3863–3868
60. Cox, C.R. et al. (2015) Connecting functional brain imaging and parallel distributed processing. Lang. Cogn. Neurosci. 30, 380–394
61. Vargas, R. and Just, M.A. (2020) Neural representations of abstract concepts: identifying underlying neurosemantic dimensions. Cereb. Cortex 30, 2157–2166
62. Hoerl, A.E. and Kennard, R.W. (2000) Ridge regression: biased estimation for nonorthogonal problems. Technometrics 42, 80–86
63. Tibshirani, R. (1996) Regression shrinkage and selection via the
35. Pereira, F. et al. (2016) A comparative evaluation of off-the-shelf lasso. J. R. Stat. Soc. Ser. B Methodol. 58, 267–288
distributed semantic representations for modelling behavioural 64. Jia, J. and Yu, B. (2008) On model selection consistency of the
data. Cogn. Neuropsychol. 33, 175–190 elastic net when p >> n. Stat. Sin. 20, 595–611
36. Katz, J. (1972) Semantic Theory, Addison-Wesley Educational 65. Rao, N. et al. (2013) Sparse overlapping sets lasso for multitask
Publishers learning and its application to fMRI analysis. Adv. Neural Inf.
37. Rosch, E. (1978) Principles of categorization. In Cognition and Proces. Syst. 26, 2202–2210
Categorization (Lloyd, B. and Rosch, E., eds), pp. 27–48, 66. Rao, N. et al. (2016) Classification with the sparse group lasso.
Lawrence Erlbaum Associates IEEE Trans. Signal Process. 64, 448–463
38. Hampton, J.A. (2015) Categories, prototype and exemplars. 67. Pereira, F. et al. (2009) Machine learning classifiers and fMRI: a
In The Routledge Handbook of Semantics (Riemer, N., ed.), tutorial overview. NeuroImage 45, S199–S209
pp. 141–157, Routledge 68. Connolly, A.C. et al. (2012) The representation of biological
39. Rotaru, A.S. et al. (2018) Modeling the structure and dynamics classes in the human brain. J. Neurosci. 32, 2608–2618
of semantic processing. Cogn. Sci. 42, 2890–2917 69. Devereux, B.J. et al. (2018) Integrated deep visual and se-
40. Kumar, A.A. et al. (2022) A critical review of network-based and mantic attractor neural networks predict fMRI pattern-infor-
distributional approaches to semantic memory structure and mation along the ventral object processing pathway. Sci.
processes. Top. Cogn. Sci. 14, 54–77 Rep. 8, 1–12
41. Griffiths, T.L. et al. (2007) Topics in semantic representation. 70. Devereux, B.J. et al. (2013) Representational similarity analysis
Psychol. Rev. 114, 211–244 reveals commonalities and differences in the semantic process-
42. Derby, S. et al. (2018) Using sparse semantic embeddings ing of words and objects. J. Neurosci. 33, 18906–18916
learned from multimodal text and image data to model human 71. Martin, C.B. et al. (2018) Integrative and distinctive coding of vi-
conceptual knowledge. ArXiv Published online September 7, sual and conceptual object features in the ventral visual stream.
2018. https://doi.org/10.48550/arXiv.1809.02534 eLife 7, e31873
43. Burgess, C. and Lund, K. (1997) Modelling parsing constraints 72. Mitchell, T.M. et al. (2008) Predicting human brain activity asso-
with high-dimensional context space. Lang. Cogn. Process. 12, ciated with the meanings of nouns. Science 320, 1191–1195
177–210 73. Just, M.A. et al. (2010) A neurosemantic theory of concrete
44. Devlin, J. et al. (2019) BERT: pre-training of deep bidirectional noun representation based on the underlying brain codes.
transformers for language understanding. ArXiv Published on- PLoS One 5, e8622
line May 24, 2019. https://doi.org/10.48550/arXiv.1810.04805 74. Pereira, F. et al. (2011) Generating text from functional brain
45. Barsalou, L.W. (2008) Grounded cognition. Anuual Rev. images. Front. Hum. Neurosci. 5, 72
Psychol. 59, 617–645 75. Huth, A.G. et al. (2016) Natural speech reveals the semantic
46. Barsalou, L.W. (2003) Situated simulation in the human con- maps that tile human cerebral cortex. Nature 532, 453–458
ceptual system. Lang. Cogn. Process. 18, 513–562 76. Huth, A.G. et al. (2012) A continuous semantic space describes
47. Glenberg, A.M. and Robertson, D.A. (2000) Symbol grounding the representation of thousands of object and action categories
and meaning: a comparison of high-dimensional and embodied across the human brain. Neuron 76, 1210–1224
theories of meaning. J. Mem. Lang. 43, 379–401 77. Nunez-Elizalde, A.O. et al. (2019) Voxelwise encoding models
48. Glenberg, A.M. (2010) Embodiment as a unifying perspective with non-spherical multivariate normal priors. NeuroImage
for psychology. Wiley Interdiscip. Rev. Cogn. Sci. 1, 586–596 197, 482–492
49. Damasio, A.R. (1989) The brain binds entities and events by 78. Haxby, J.V. et al. (2014) Decoding neural representational
multiregional activation from convergence zones. Neural spaces using multivariate pattern analysis. Annu. Rev.
Comput. 1, 123–132 Neurosci. 37, 435–456
50. Damasio, H. et al. (2004) Neural systems behind word and con- 79. Diedrichsen, J. and Kriegeskorte, N. (2017) Representational
cept retrieval. Cognition 92, 179–229 models: a common framework for understanding encoding,
51. Martin, A. (2016) GRAPES – grounding representations in ac- pattern-component, and representational-similarity analysis.
tion, perception, and emotion systems: how object properties PLoS Comput. Biol. 13, e1005508
and categories are represented in the human brain. Psychon. 80. Clarke, A. and Tyler, L.K. (2014) Object-specific semantic cod-
Bull. Rev. 23, 979–990 ing in human perirhinal cortex. J. Neurosci. 34, 4766–4775
81. Carota, F. et al. (2021) Category-specific representational pat- 105. Rosch, E. (1975) Cognitive representations of semantic categories.
terns in left inferior frontal and temporal cortex reflect similarities J. Exp. Psychol. Gen. 104, 192–233
and differences in the sensorimotor and distributional properties 106. Armstrong, S.L. et al. (1983) What some concepts might not
of concepts. BioRxiv Published online September 3, 2021. be. Cognition 13, 263–308
https://doi.org/10.1101/2021.09.03.458378 107. Caramazza, A. and Shelton, J.R. (1998) Domain-specific knowl-
82. Shinkareva, S.V. et al. (2011) Commonality of neural represen- edge systems in the brain: the animate–inanimate distinction.
tations of words and pictures. NeuroImage 54, 2418–2425 J. Cogn. Neurosci. 10, 1–34
83. Simanova, I. et al. (2014) Modality-independent decoding of se- 108. Kanwisher, N. (2010) Functional specificity in the human brain: a
mantic information from the human brain. Cereb. Cortex 24, window into the functional architecture of the mind. Proc. Natl.
426–434 Acad. Sci. U. S. A. 107, 11163–11170
84. Handjaras, G. et al. (2016) How concepts are encoded in the 109. Murphy, G. (2002) The Big Book of Concepts, MIT Press
human brain: a modality independent, category-based cortical 110. Murphy, G. and Medin, D.L. (1985) The role of theories in con-
organization of semantic knowledge. NeuroImage 135, ceptual coherence. Psychol. Rev. 92, 289–316
232–242 111. McRae, K. et al. (2005) Semantic feature production norms for a
85. Rogers, T.T. (2020) Neural networks as a critical level of de- large set of living and nonliving things. Behav. Res. Methods
scription for cognitive neuroscience. Curr. Opin. Behav. Sci. Instrum. Comput. 37, 547–559
32, 167–173 113. Landauer, T.K. (1998) Learning and representing verbal meaning:
86. Yuste, R. (2015) From the neuron doctrine to neural networks. the latent semantic analysis theory. Curr. Dir. Psychol. Sci. 7,
Nat. Rev. Neurosci. 16, 487–497 161–164
87. Yang, G.R. et al. (2019) Task representations in neural networks 114. Landauer, T.K. et al. (1998) An introduction to latent semantic
trained to perform many cognitive tasks. Nat. Neurosci. 22, analysis. Discourse Process. 25, 259–284
297–306 115. Panigrahi, A. et al. (2019) Word2Sense: sparse interpret-
88. Richards, B.A. et al. (2019) A deep learning framework for able word embeddings. In Proceedings of the 57th Annual
neuroscience. Nat. Neurosci. 22, 1761–1770 Meeting of the Association for Computational Linguistics,
89. Patterson, K. and Hodges, J. (2000) Semantic dementia: one pp. 5692–5705, ACL
window on the structure and organisation of semantic memory. 116. Krizhevsky, A. et al. (2017) ImageNet classification with deep
In Handbook of Neuropsychology Vol. 2: Memory and Its Disor- convolutional neural networks. Commun. ACM 60, 84–90
ders (Cermak, J., ed.), pp. 313–333, Elsevier Science 117. Simonyan, K. and Zisserman, A. (2015) Very deep
90. Caramazza, A. and Mahon, B.Z. (2003) The organization of convolutional networks for large-scale image recognition.
conceptual knowledge: the evidence from category-specific se- ArXiv Published online April 10, 2015. https://doi.org/10.
mantic deficits. Trends Cogn. Sci. 7, 354–361 48550/arXiv.1409.1556
91. Mesulam, M.M. et al. (2013) Words and objects at the tip of the 118. Floridi, L. and Chiriatti, M. (2020) GPT-3: its nature, scope,
left temporal lobe in primary progressive aphasia. Brain limits, and consequences. Mind. Mach. 30, 681–694
J. Neurol. 136, 601–618 119. Liuzzi, A.G. et al. (2015) Left perirhinal cortex codes for similarity
92. Jefferies, E. and Lambon Ralph, M.A. (2006) Semantic impair- in meaning between written words: comparison with auditory
ment in stroke aphasia versus semantic dementia: a case- word input. Neuropsychologia 76, 4–16
series comparison. Brain 129, 2132–2147 120. Sabsevitz, D.S. et al. (2005) Modulation of the semantic system
93. Acosta-Cabronero, J. et al. (2011) Atrophy, hypometabolism by word imageability. NeuroImage 27, 188–200
and white matter abnormalities in semantic dementia tell a co- 121. Rogers, T.T. et al. (2006) Anterior temporal cortex and se-
herent story. Brain 134, 2025–2035 mantic memory: reconciling findings from neuropsychology
94. Chen, L. and Rogers, T.T. (2014) Revisiting domain-general ac- and functional imaging. Cogn. Affect. Behav. Neurosci. 6,
counts of category specificity in mind and brain. Wiley 201–213
Interdiscip. Rev. Cogn. Sci. 5, 327–344 122. Noonan, K.A. et al. (2013) Going beyond inferior prefrontal in-
95. Pobric, G. et al. (2010) Category-specific versus category- volvement in semantic control: evidence for the additional con-
general semantic impairment induced by transcranial magnetic tribution of dorsal angular gyrus and posterior middle temporal
stimulation. Curr. Biol. 20, 964–968 cortex. J. Cogn. Neurosci. 25, 1824–1850
96. Pobric, G. et al. (2007) Anterior temporal lobes mediate seman- 123. Chiou, R. et al. (2018) Controlled semantic cognition relies upon
tic representation: mimicking semantic dementia by using rTMS dynamic and flexible interactions between the executive
in normal participants. Proc. Natl. Acad. Sci. U. S. A. 104, 'semantic control' and hub-and-spoke 'semantic representation'
20137–20141 systems. Cortex 103, 100–116
97. Lambon Ralph, M.A. et al. (2009) Conceptual knowledge is 124. Graves, W.W. et al. (2010) Neural systems for reading
underpinned by the temporal pole bilaterally: convergent evi- aloud: a multiparametric approach. Cereb. Cortex 20,
dence from rTMS. Cereb. Cortex 19, 832–838 1799–1815
98. Mahon, B.Z. et al. (2007) Action-related properties shape 125. Lewis-Peacock, J.A. and Postle, B.R. (2008) Temporary activa-
object representations in the ventral stream. Neuron 55, tion of long-term memory supports working memory.
507–520 J. Neurosci. 28, 8765–8771
99. Binney, R.J. et al. (2012) Convergent connectivity and graded 126. Liu, T.T. (2016) Noise contributions to the fMRI signal: an
specialization in the rostral human temporal lobe as revealed overview. NeuroImage 143, 141–151
by diffusion-weighted imaging probabilistic tractography. 127. Embleton, K.V. et al. (2010) Distortion correction for diffusion-
J. Cogn. Neurosci. 24, 1998–2014 weighted MRI tractography and fMRI in the temporal lobes.
100. Chen, L. et al. (2017) A unified model of human semantic knowl- Hum. Brain Mapp. 31, 1570–1587
edge and its disorders. Nat. Hum. Behav. 1, 1–10 128. Binney, R.J. et al. (2010) The ventral and inferolateral aspects
101. Plaut, D.C. and Behrmann, M. (2011) Complementary neural of the anterior temporal lobe are crucial in semantic memory:
representations for faces and words: a computational exploration. evidence from a novel direct comparison of distortion-
Cogn. Neuropsychol. 28, 251–275 corrected fMRI, rTMS, and semantic dementia. Cereb. Cortex
102. Behrmann, M. and Plaut, D.C. (2012) Bilateral hemispheric pro- 20, 2728–2738
cessing of words and faces: evidence from word impairments in 129. Halai, A.D. et al. (2014) A comparison of dual gradient-echo and
prosopagnosia and face impairments in pure alexia. Cereb. spin-echo fMRI of the inferior temporal lobe. Hum. Brain Mapp.
Cortex 24, 1102–1118 35, 4118–4128
103. Van Rullen, R. and Thorpe, S.J. (2001) Is it a bird? Is it a plane? 130. Kundu, P. et al. (2017) Multi-echo fMRI: a review of applications in
Ultra-rapid visual categorization of natural and artifactual fMRI denoising and analysis of BOLD signals. NeuroImage 154,
objects. Perception 30, 655–668 59–80
104. Rogers, T.T. and Patterson, K. (2007) Object categorization: 131. Asyraff, A. et al. (2021) Stimulus-independent neural coding of
reversals and explanations of the basic-level advantage. event semantics: evidence from cross-sentence fMRI decoding.
J. Exp. Psychol. Gen. 136, 451 NeuroImage 236, 118073
132. Visser, M. et al. (2010) Semantic processing in the anterior 140. Mahon, B.Z. et al. (2009) Category-specific organization in the
temporal lobes: a meta-analysis of the functional neuroimaging human brain does not require visual experience. Neuron 63,
literature. J. Cogn. Neurosci. 22, 1083–1094 397–405
133. Rogers, T.T. et al. (2021) Evidence for a deep, distributed and 141. Mahon, B.Z. et al. (2010) The representation of tools in left pa-
dynamic code for animacy in human ventral anterior temporal rietal cortex is independent of visual experience. Psychol. Sci.
cortex. eLife 10, e66276 21, 764–771
134. Kriegeskorte, N. et al. (2008) Matching categorical object repre- 142. Bedny, M. and Saxe, R. (2012) Insights into the origins of
sentations in inferior temporal cortex of man and monkey. Neuron knowledge from the cognitive neuroscience of blindness.
60, 1126–1141 Cogn. Neuropsychol. 29, 56–84
135. Cadieu, C.F. et al. (2014) Deep neural networks rival the repre- 143. Chen, L. and Rogers, T.T. (2015) A model of emergent
sentation of primate IT cortex for core visual object recognition. category-specific activation in the posterior fusiform gyrus of
PLoS Comput. Biol. 10, e1003963 sighted and congenitally blind populations. J. Cogn. Neurosci.
136. Sexton, N.J. and Love, B.C. (2022) Reassessing hierarchical 27, 1981–1999
correspondences between brain and deep networks through 144. Kanwisher, N. (2000) Domain specificity in face perception. Nat.
direct interface. Sci. Adv. 8, eabm2219 Neurosci. 3, 759–763
137. Mehrer, J. et al. (2020) Individual differences among deep neu- 145. Kanwisher, N. et al. (1997) The fusiform face area: a module in
ral network models. Nat. Commun. 11, 1–12 human extrastriate cortex specialized for face perception.
138. Adlam, A.-L.R. et al. (2006) Semantic dementia and fluent pri- J. Neurosci. 17, 4302–4311
mary progressive aphasia: two sides of the same coin? Brain 146. Behrmann, M. et al. (2016) Neural mechanisms of face percep-
129, 3066–3080 tion, their emergence over development, and their breakdown.
139. Rogers, T.T. et al. (2015) Disorders of representation and con- WIREs Cogn. Sci. 7, 247–263
trol in semantic cognition: effects of familiarity, typicality, and 147. Dundas, E.M. et al. (2013) The joint development of hemispheric lat-
specificity. Neuropsychologia 76, 220–239 eralization for words and faces. J. Exp. Psychol. Gen. 142, 348–358