Exploring What Is Encoded in Distributional Word Vectors

This document discusses exploring what types of information are encoded in distributional semantic models (DSMs), which are language models that learn word embeddings from large text corpora. It summarizes previous research that has shown DSMs can capture semantic categories, semantic relations between words, social relationships between fictional characters, and cultural biases. The document proposes analyzing DSMs against a target neurobiologically-motivated conceptual representation comprising 65 attributes to determine what conceptual knowledge is encoded. Recent studies that have compared DSMs' abilities to represent various feature types are also discussed.

Exploring What Is Encoded in Distributional Word Vectors: A Neurobiologically Motivated Analysis
Akira Utsumi

Research in cognitive science also benefits greatly from DSMs (Jones & Mewhort, 2007; Jones, Willits, & Dennis, 2015).

Jones, M. N., Gruenenfelder, T. M., & Recchia, G. (2018). In defense of spatial models of semantic representation. New
Ideas in Psychology, 50, 54–60. https://doi.org/10.1016/j.newideapsych.2017.08.001

Word vectors have been demonstrated to explain a number of cognitive phenomena relevant to semantic memory or the mental lexicon, such as word association (Jones, Gruenenfelder, & Recchia, 2018; Utsumi, 2015), semantic priming (Mandera, Keuleers, & Brysbaert, 2017), semantic transparency (Marelli & Baroni, 2015), and conceptual combination (Vecchi, Marelli, Zamparelli, & Baroni, 2017).

Utsumi, A. (2015). A complex network approach to distributional semantic models. PLoS ONE, 10(8), e0136277.
https://doi.org/10.1371/journal.pone.0136277

In cognitive research on concepts, in particular on embodied versus symbolic processing, DSMs are regarded as a de facto standard language model (Bolognesi & Steen, 2018; de Vega, Glenberg, & Graesser, 2008).

Bolognesi, M., & Steen, G. (2018). Abstract concepts: Structure, processing, and modeling. Topics in Cognitive Science,
10(3), 490–500. https://doi.org/10.1111/tops.12354
de Vega, M., Glenberg, A., & Graesser, A. (2008). Symbols and embodiment: Debates on meaning and cognition. New York:
Oxford University Press.

Furthermore, recent brain imaging studies have demonstrated that distributional word vectors have a powerful ability to predict the neural activity evoked by lexical processing (Anderson et al., 2019; Anderson, Kiela, Clark, & Poesio, 2017; Huth, de Heer, Griffiths, Theunissen, & Gallant, 2016; Mitchell et al., 2008; Pereira et al., 2018).

Although a large number of studies have applied word vectors to a variety of tasks across many research fields, relatively little effort has been made to explore what types of information or knowledge are encoded in word vectors.

Although these studies have focused on specific information, some recent studies (Grand,
Blank, Pereira, & Fedorenko, 2018; Sommerauer & Fokkens, 2018) have compared various
types of knowledge in terms of the representational ability of word vectors.

Grand, G., Blank, I., Pereira, F., & Fedorenko, E. (2018). Semantic projection: Recovering human knowledge of multiple,
distinct object features from word embeddings. arXiv:1802.01241 [cs.CL]

Sommerauer, P., & Fokkens, A. (2018). Firearms and tigers are dangerous, kitchen knives and zebras are not: Testing
whether word embeddings can tell. In T. Linzen, G. Chrupała, & A. Alishahi (Eds.), Proceedings of the
2018 EMNLP workshop BlackboxNLP: Analyzing and interpreting neural networks for NLP (pp. 276–286). Brussels:
Association for Computational Linguistics.
Our approach to this question is to simulate human conceptual representation using text-based distributional word vectors.
As a target human conceptual representation, we use the neurobiologically motivated semantic representation provided by Binder et al. (2016). This conceptual representation comprises 65 attributes based entirely on functional divisions in the human brain. Each word is represented as a 65-dimensional vector, and each element of the vector is the degree of salience of the corresponding attribute.

Binder, J. R., Conant, L. L., Humphries, C. J., Fernandino, L., Simons, S. B., Aguilar, M., & Desai, R. H. (2016). Toward a brain-based
componential semantic representation. Cognitive Neuropsychology, 33(3–4), 130–174. https://doi.org/10.1080/02643294.2016.1147426
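
To make the target representation concrete, here is a minimal sketch of how such a featural representation might be handled in code; the attribute count follows Binder et al. (2016), but the words, salience values, and similarity measure are illustrative assumptions, not their actual norms.

import numpy as np

# Hypothetical illustration of a Binder-style featural representation:
# each word is a 65-dimensional vector whose elements encode the salience
# of neurobiologically motivated attributes (values below are invented
# placeholders, not Binder et al.'s published ratings).
NUM_ATTRIBUTES = 65
rng = np.random.default_rng(0)
binder_vectors = {
    "dog":   rng.uniform(0.0, 6.0, NUM_ATTRIBUTES),
    "piano": rng.uniform(0.0, 6.0, NUM_ATTRIBUTES),
}

def cosine(u, v):
    # Cosine similarity between two featural vectors.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(binder_vectors["dog"], binder_vectors["piano"]))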

In this paper, we consider concepts and word meanings as interchangeable, as is usually assumed in the cognitive science (in particular, concepts and categorization) literature (Jackendoff, 2019; Vigliocco & Vinson, 2007).

Jackendoff, R. (2019). Mental representations for language. In P. Hagoort (Ed.), Human language: From genes and brains to
behavior (pp. 7–20). Cambridge, MA: MIT Press.
Vigliocco, G., & Vinson, D. P. (2007). Semantic representation. In M. G. Gaskell (Ed.), The Oxford handbook of
psycholinguistics (pp. 217–234). Oxford, UK: Oxford University Press.

We assume that a word is a label (or symbol) that refers to a concept, and the concept
defines the meaning of that word.
Hence, Binder et al.’s (2016) featural representation used as a target in our analysis is
regarded as the semantic representation of words, and also as the conceptual
representation of concepts.

2.1. Knowledge about words


Early studies on the representational ability of language statistics focused on semantic
categories and their hierarchical structure (Elman, 1990, 1991; McClelland & Rogers, 2003;
Rogers & McClelland, 2004). Elman (1990, 1991) demonstrated that distributed
representations that emerged in the hidden layers of a simple recurrent network (SRN)
captured structured semantic categories in addition to grammatical knowledge.

Rogers, T., & McClelland, J. (2004). Semantic cognition: A parallel distributed processing approach. Cambridge, MA: MIT
Press.
Elman, J. L. (1991). Distributed representations, simple recurrent networks, and grammatical structure. Machine Learning,
7(2–3), 195–225. https://doi.org/10.1007/BF00114844

Recently, Huebner and Willits (2018) found that a long short-term memory (LSTM) neural
network can represent structured semantic knowledge better than an SRN, and the skip-
gram model (Mikolov, Chen, Corrado, & Dean, 2013) can also capture semantic knowledge,
but in a qualitatively different manner. Additionally, Utsumi (2015) showed, using complex network analysis, that word vectors computed using positive pointwise mutual information (PPMI; Bullinaria & Levy, 2007) can capture the hierarchical structure of semantic knowledge better than word vectors constructed using latent semantic analysis (LSA; Landauer & Dumais, 1997).

Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. In Proceedings of the Workshop at the International Conference on Learning Representations (ICLR).
Utsumi, A. (2015). A complex network approach to distributional semantic models. PLoS ONE, 10(8), e0136277.
https://doi.org/10.1371/journal.pone.0136277
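
As a rough illustration of the PPMI weighting mentioned above, positive pointwise mutual information can be computed from a word-by-context co-occurrence matrix along the following lines; the toy counts are invented, and this is not the exact scheme of Bullinaria and Levy (2007).

import numpy as np

# Toy word-by-context co-occurrence counts (rows: words, columns: contexts).
counts = np.array([[10.0, 2.0, 0.0],
                   [ 3.0, 8.0, 1.0],
                   [ 0.0, 1.0, 6.0]])

total = counts.sum()
p_wc = counts / total                   # joint probabilities P(w, c)
p_w = p_wc.sum(axis=1, keepdims=True)   # marginal probabilities P(w)
p_c = p_wc.sum(axis=0, keepdims=True)   # marginal probabilities P(c)

with np.errstate(divide="ignore"):
    pmi = np.log(p_wc / (p_w * p_c))    # pointwise mutual information

ppmi = np.maximum(pmi, 0.0)             # clip negative (and -inf) values to 0
print(ppmi)                             # each row is a PPMI word vector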

In recent DSMs, such as word2vec and GloVe (Pennington, Socher, & Manning, 2014), the
offsets (i.e., vector differences) between word vectors have been shown to encode some
semantic relations (Mikolov, Yih, & Zweig, 2013; Pennington et al., 2014). More recently, Lu
et al. (2019) demonstrated that word vectors computed by word2vec predicted various
types of semantic relations (e.g., synonym, antonym, and hyponym) using supervised
learning with labeled word pairs.

Pennington, J., Socher, R., & Manning, C. D. (2014). GloVe: Global vectors for word representation. In A. Moschitti, B. Pang,
& W. Daelemans (Eds.), Proceedings of the 2014 conference on empirical methods in natural language processing (pp.
1532–1543). Doha, Qatar: Association for Computational Linguistics.
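
The offset idea can be illustrated directly. The snippet below is a sketch that assumes the gensim library and one of its downloadable pretrained GloVe models (neither is part of the original text), and shows the familiar king - man + woman ≈ queen pattern.

# Sketch only: assumes gensim and a downloadable pretrained GloVe model.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-50")  # small pretrained GloVe embedding

# Vector offsets encode some semantic relations:
# vec("king") - vec("man") + vec("woman") should lie near vec("queen").
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))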

2.2. Featural knowledge about concepts

Hutchinson and Louwerse (2018) reported that social relationships between fictional
characters in three novels could be derived from language statistics. Furthermore, social or
cultural biases toward gender or race (e.g., European-Americans are pleasant vs. African-
Americans are unpleasant) that result in prejudiced decisions have recently been found to be
encoded in word vectors (e.g., Bolukbasi, Chang, Zou, Saligrama, & Kalai, 2016; Caliskan,
Bryson, & Narayanan, 2017).

Caliskan, A., Bryson, J., & Narayanan, A. (2017). Semantics derived automatically from language corpora contain human-like
biases. Science, 356(6334), 183–186. https://doi.org/10.1126/science.aal4230
Bolukbasi, T., Chang, K.-W., Zou, J. Y., Saligrama, V., & Kalai, A. T. (2016). Man is to computer programmer as woman is to
homemaker? Debiasing word embeddings. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances
in neural information processing systems (Vol. 29, pp. 4349–4357). Barcelona, Spain: Neural Information Processing
Systems Foundation.
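
A minimal sketch of how such biases are typically probed, in the spirit of (but much simpler than) Caliskan et al.'s (2017) association test: a target word's mean cosine similarity to one attribute word set is compared with its mean similarity to another. The function and word lists below are illustrative assumptions, not the authors' implementation.

import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(target_vec, attr_set_a, attr_set_b):
    # Difference in mean cosine similarity of a target word to two attribute
    # word sets; positive values indicate a stronger association with set A.
    sim_a = np.mean([cosine(target_vec, v) for v in attr_set_a])
    sim_b = np.mean([cosine(target_vec, v) for v in attr_set_b])
    return sim_a - sim_b

# Usage sketch (vectors would come from a trained DSM such as word2vec or GloVe):
# bias = association(vectors["doctor"],
#                    [vectors[w] for w in pleasant_words],
#                    [vectors[w] for w in unpleasant_words])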

Although the studies described thus far have focused only on specific information, some very recent studies have compared the representational ability of word vectors among various types of features (Grand et al., 2018; Richie, Zou, & Bhatia, 2019; Sommerauer & Fokkens, 2018).

Grand, G., Blank, I., Pereira, F., & Fedorenko, E. (2018). Semantic projection: Recovering human knowledge of multiple,
distinct object features from word embeddings. arXiv:1802.01241 [cs.CL]
Richie, R., Zou, W., & Bhatia, S. (2019). Distributional semantic representations predict high-level human judgment in seven
diverse behavioral domains. In A. K. Goel, C. M. Seifert, & C. Freksa (Eds.), Proceedings of the 41st annual conference of the
Cognitive Science Society (CogSci2019) (pp. 2654–2660). Montreal: Cognitive Science Society.
Sommerauer, P., & Fokkens, A. (2018). Firearms and tigers are dangerous, kitchen knives and zebras are not: Testing
whether word embeddings can tell. In T. Linzen, G. Chrupała, & A. Alishahi (Eds.), Proceedings of the 2018 EMNLP
workshop BlackboxNLP: Analyzing and interpreting neural networks for NLP (pp. 276–286). Brussels: Association for
Computational Linguistics.

Grand et al. (2018) proposed a method of semantic projection by which word vectors were
projected onto a linear dimension that corresponded to each of 17 properties involved in
conceptual knowledge. Using GloVe vectors, they revealed that abstract properties, such as
gender and danger, were significantly predicted across all categories, and even perceptual
properties, such as size, were captured for some relevant categories.
Richie et al. (2019) also showed that 14 properties, most of which are abstract (e.g.,
competent and sincere), were predicted by skip-gram vectors.
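
Semantic projection itself can be sketched in a few lines: a property dimension is defined by the offset between vectors for words naming its poles (e.g., "small" and "large" for size), and each target word is scored by its scalar projection onto that dimension. The pole words and the normalization below are illustrative choices, not necessarily those used by Grand et al. (2018).

import numpy as np

def semantic_projection(word_vec, pole_low, pole_high):
    # Scalar projection of a word vector onto a property dimension defined
    # by the offset between two pole words (e.g., small -> large for size).
    direction = pole_high - pole_low
    direction = direction / np.linalg.norm(direction)
    return float(word_vec @ direction)

# Usage sketch (vectors would come from GloVe or a similar DSM):
# size_score = semantic_projection(vectors["elephant"],
#                                  vectors["small"], vectors["large"])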
