The Semantic Spaces of Child-Directed Speech, Child Speech and Adult-Directed Speech: A Manifold Perspective

Hao Sun ([email protected])
Department of Linguistics, University at Buffalo
Buffalo, NY 14260 USA

John Pate ([email protected])

Department of Linguistics, University at Buffalo
Buffalo, NY 14260 USA

Abstract

Child-directed speech (CDS) is a talking style adopted by caregivers when they talk to toddlers (Snow, 1995). We consider the role of distributional semantic features of CDS in language acquisition. We view semantic structure as a manifold on which words lie. We compare the semantic structure of verbs in CDS to the semantic structure of child speech (CS) and adult-directed speech (ADS) by measuring how easy it is to align the manifolds. We find that it is easier to align verbs in CS to CDS than to align CS to ADS, suggesting that the semantic structure of CDS is reflected in child productions. We also find, by measuring verbs' vertex degrees in a semantic graph, that a mixed initial set of verbs with high and medium degrees gives the best performance among all alignments, suggesting that both semantic generality and diversity may be important for developing semantic representations.

Keywords: child-directed speech; lexical development; manifold learning; distributional semantics; graph theory

Introduction

One of the biggest puzzles in cognitive science is how children learn language from language input, namely child-directed speech. Child-directed speech is characterized by simplified sentence structures, restricted vocabulary, exaggerated intonation, and hyperarticulation, and previous work has proposed that these features facilitate language acquisition (Golinkoff and Alioto, 1995; Snow, 1995; Thiessen, Hill, and Saffran, 2005). Here, we compare the semantic spaces of child speech, child-directed speech and adult-directed speech, spanned by verbs, using state-of-the-art computational tools.

The contributions of this paper are both theoretical and methodological. Theoretically, we explore various proposals about the roles of verb meanings in CDS, represented using a state-of-the-art distributional semantics approach. Distributional methods map each word to a point in high-dimensional space so that words with similar meanings are near each other. We view the semantic structure of the vocabulary as a high-dimensional surface in this space, called a manifold, and compare manifolds estimated from CDS to manifolds estimated from child speech (CS) and adult-directed speech (ADS). Young children often broaden the use of nouns and verbs, and we model such differences in word meaning as a mismatch of data points in a semantic space.

Methodologically, we adapt a novel semi-supervised manifold alignment algorithm to compare semantic spaces (Ham et al., 2005), which maps two manifolds into a common subspace to measure the similarity of these manifolds. This algorithm takes as input a subset of initial points that must be aligned (i.e., pairs of points, one on each manifold, that correspond to the same verb), and produces an alignment for the rest of the verbs. We then measure the similarity of the manifolds in terms of the accuracy of the alignment: how often a verb is mapped to the same region of the common space.

We find that the alignment between CS and CDS is more accurate than the alignment between CS and ADS. Additionally, we obtain more accurate alignments when using verbs with many nearest neighbors (which have broader meanings) as the initial points than verbs with few near neighbors. Together, these results indicate that the semantic structure of CS reflects the semantic structure of CDS, and verbs with broad meanings may provide useful cues to children in acquiring the overall semantic structure of verbs. On the one hand, what children can learn from CDS deviates semantically from unfamiliar conversations in ADS, which suggests that further learning is required. On the other hand, caregivers might align their semantic spaces to children's semantic spaces, which lies within the general framework of conversational alignment (Pickering & Garrod, 2004).

Model Setting

We combine models from two different traditions into a general framework of semantic representation. To compare the semantic spaces of CS, CDS and ADS, we use a manifold-based algorithm. The similarities between semantic spaces are measured by how easy it is to map one semantic space to another. We represent the meaning of each verb by using the global vector model (Pennington, Socher & Manning, 2014) to embed words into a 50-dimensional space, which we call a semantic space. Following the associationist tradition in psychology (Anderson & Bower, 1973), we represent the meaning structure of the verbal lexicon as a whole by considering how a collection of verbs is situated in this space, as expressed by a neighborhood graph (Steyvers & Tenenbaum, 2005).
Estimating verb meanings from different datasets produces different semantic spaces, and we compare the spaces using a semisupervised manifold alignment algorithm (Ham et al., 2005). This algorithm maps verbal semantic graphs into a common semantic space and discovers the data point correspondences by finding pairs of points with the smallest Euclidean distances.

Lexical Semantic Representation

The past three decades saw efforts to model the mental representation of concepts (Landauer & Dumais, 1997). The inspiration for recent computational work on lexical semantics dates back to Harris's (1954) hypothesis that synonymous words appear in similar contexts.

One of the most successful semantic representation models is Latent Semantic Analysis (LSA), proposed by Landauer & Dumais (1997), which uses word-context co-occurrence matrices to produce a low-dimensional representation by singular value decomposition. The lexical semantic representation model used in this paper is based on a state-of-the-art algorithm, GloVe (Pennington, Socher & Manning, 2014), which is an extension of LSA. Instead of explicitly decomposing a word-context co-occurrence matrix, GloVe implicitly decomposes a word-context log-frequency matrix. GloVe uses a weighted regression objective function to reconstruct a log word-context count matrix log(X) with bias terms, as shown in Equation (1), where $w_i$ and $\tilde{w}_j$ are word and context vectors, $b_i$ and $\tilde{b}_j$ are bias terms, $X$ is the co-occurrence matrix and $f$ is a heuristic weighting function. The optimization problem is iteratively solved using AdaGrad (Duchi, Hazan & Singer, 2011).

$$ J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^{T} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2 \qquad (1) $$

Even though GloVe has better performance than traditional singular-value-decomposition-based LSA, careful analysis of the objective function suggests that GloVe is fundamentally a probabilistic matrix factorization (Levy & Goldberg, 2014).

Semantic Graphs

The manifold alignment algorithm we use approximates the underlying manifold by constructing a similarity graph G = (V, E), where the vertex set V is the set of verbs and the edge set E is a set of pairs of verbs that are near to each other. The weight of an edge is set to the cosine similarity between the verbs associated by the edge. The degree of a vertex is the sum of weights of all the edges linking to the vertex. In semantic networks, vertex degrees can be interpreted as contextual diversity. There are several ways to build such a similarity graph. Ozaki et al. (2011) found that undirected mutual k nearest neighbor (mkNN) graphs give good performance for alignment of natural language data, so we use mkNN graphs. An mkNN graph has an edge $(v_1, v_2)$ if either $v_1$ or $v_2$ is within the k nearest neighbors of the other. We set k to 15 for the first experiment. In the second experiment, we increase k to 20 to better investigate the degree effects. The unnormalized graph Laplacian $L$ of a graph with weight matrix $W$ is defined in Equation (2), where $D$ is the degree matrix, a diagonal matrix with vertex degrees on the diagonal.

$$ L = D - W \qquad (2) $$

We use a symmetric graph Laplacian normalized by vertex degree (Shi & Malik, 2000), as

$$ L_{sym} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} W D^{-1/2} \qquad (3) $$

Aligning Semantic Spaces

We compare the semantic spaces of CS, CDS and ADS using the semisupervised manifold alignment algorithm. A manifold is a topological space in which every point has a neighborhood that resembles a Euclidean space. The goal of the manifold alignment algorithm is to pair up data points from two high-dimensional data sets. For example, the algorithm aims to match give in CS to give in CDS. A semisupervised algorithm, using both labeled and unlabeled data as input, combines the strengths of supervised and unsupervised learning. The general goal of manifold alignment is to map two high-dimensional data sets to a common low-dimensional space simultaneously (Ham et al., 2005), which essentially is an extension of manifold-based nonlinear dimensionality reduction (Belkin & Niyogi, 2003). Manifold-based methods are based on the geometric assumption that data in high-dimensional space lie on low-dimensional manifolds.

Ham et al.'s algorithm defines a function $f$ that maps the first manifold to a common space, and a function $g$ that maps the second manifold to the same common space. These functions strike a tradeoff between mapping labeled pairs to the same point in the common space, and respecting local structure on the original manifolds as expressed by the graph Laplacian $L^x$ for the first space and $L^y$ for the second space. As we have both labeled (l) and unlabeled (u) points, $L^x$ and $L^y$ are block matrices:

$$ L^x = \begin{bmatrix} L^x_{ll} & L^x_{lu} \\ L^x_{ul} & L^x_{uu} \end{bmatrix} \qquad (4) $$

The cost of the mapping is then:

$$ C(f, g) = \mu \sum_i |f_i - g_i|^2 + f^T L^x f + g^T L^y g \qquad (5) $$

where $\mu$ expresses the tradeoff between mapping points exactly and preserving local structure on the original manifolds. The first term is the sum of distances between paired data points in the common space, and the second two terms represent faithfulness to the graph Laplacians. Ham et al. point out that Equation (5) is unsuitable for optimization, since it ignores simultaneous scaling of $f$ and $g$, and so instead minimize the Rayleigh quotient:

$$ \tilde{C}(f, g) = \frac{C(f, g)}{f^T f + g^T g} \qquad (6) $$

We set $\mu$ to positive infinity to impose a hard constraint for labeled pairs to be mapped directly on top of each other.
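
The quantities in this section can be checked numerically on a toy graph. The sketch below is an illustration under stated assumptions, not the authors' implementation: it builds the symmetric normalized Laplacian of Equation (3) from a hypothetical weight matrix and evaluates the mapping cost (labeled-pair mismatch weighted by μ, plus the two Laplacian terms) for a finite μ and a one-dimensional embedding. The names `sym_laplacian` and `alignment_cost` and all toy values are assumed.

```python
import numpy as np

def sym_laplacian(W: np.ndarray) -> np.ndarray:
    """Symmetric normalized Laplacian I - D^{-1/2} W D^{-1/2}."""
    d = W.sum(axis=1)                 # vertex degrees
    d_inv_sqrt = 1.0 / np.sqrt(d)     # assumes every vertex has positive degree
    return np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

def alignment_cost(f, g, Lx, Ly, labeled_pairs, mu):
    """mu * sum of squared gaps over labeled pairs + f^T Lx f + g^T Ly g."""
    mismatch = sum((f[i] - g[j]) ** 2 for i, j in labeled_pairs)
    return mu * mismatch + f @ Lx @ f + g @ Ly @ g

# Toy path graph 0 - 1 - 2, used for both data sets (hypothetical data).
W = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
L = sym_laplacian(W)

f = np.array([0.0, 1.0, 2.0])         # 1-D coordinates of the first data set
g = np.array([0.0, 1.0, 2.5])         # 1-D coordinates of the second data set
pairs = [(0, 0), (1, 1)]              # labeled pairs; vertex 2 is unlabeled
cost = alignment_cost(f, g, L, L, pairs, mu=10.0)
```

Because `f` and `g` already agree on the labeled vertices, the mismatch term vanishes here and the cost reduces to the two Laplacian terms, whatever the value of μ. This is the situation that the hard constraint enforces.
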
The analytic solution to the optimization is then given by the generalized graph Laplacian $L^z$ in Equation (7).

$$ L^z = \begin{bmatrix} L^x_{ll} + L^y_{ll} & L^x_{lu} & L^y_{lu} \\ L^x_{ul} & L^x_{uu} & 0 \\ L^y_{ul} & 0 & L^y_{uu} \end{bmatrix} \qquad (7) $$

The semisupervised manifold alignment algorithm adopted from Ham et al. (2005) is described in Algorithm 1.

Algorithm 1: Semisupervised Manifold Alignment Algorithm (Ham et al., 2005)

Input: data points from two data sets, with N initially aligned data point pairs
Output: a matching of data points
1. Construct similarity graphs G1 and G2 for the two data sets, using mkNN
2. Compute the symmetric graph Laplacians of G1 and G2, $L^x$ and $L^y$, using Equation (3)
3. Compute a graph Laplacian for a joint graph, $L^z$, using Equations (6) and (7)
4. Compute the eigenvectors of $L^z$ and take the eigenvectors corresponding to the smallest non-zero eigenvalues, which give the coordinates of the data points in a lower-dimensional space
5. Find the data points with the smallest Euclidean distance, weighted by the inverse of their respective eigenvalues

Experiment Setup

Corpora

The training set for CDS and CS is a combined data set from CHILDES (MacWhinney, 2000), which consists of all the data on American English-speaking monolingual 3- to 7-year-old children with typical language and cognitive development, excluding diary studies. To simplify data collection, only utterances annotated as child are considered child speech and only utterances annotated as mother and father are considered child-directed speech. The CS and CDS corpora contain 5 million and 9 million word tokens, respectively. To prevent the CS from being similar to CDS purely due to priming effects, we divided the data into two halves so that the CDS and CS data were not drawn from the same contexts.

Our ADS data is drawn from the spoken portion of the Corpus of Contemporary American English (COCA, Davies, 2008). Although this data may differ from more casual conversations, it provides a large amount of spontaneous speech in the form of unscripted conversations from 150 television and radio programs.

Materials

The target words used in this model are all verbs, which are understudied in the literature. We included the first 100 English verbs acquired by infants (Fenson et al., 1994), the most frequent 200 English verbs in adult language productions (Davies, 2008) and verbs that appear in three common constructions (Levin, 1993). The three constructions are the ditransitive (John gave Mary a book), the locative (The man loaded hay onto a truck) and the conative (The police shot at the criminal). Since CHILDES suffers from data sparsity, verbs missing in either CS or CDS were excluded from analysis. We end up with 811 data points for each of CS, CDS and spoken COCA.

Data Preprocessing

The adult-directed speech data from spoken COCA and the child speech and child-directed speech data from CHILDES were preprocessed using regular expressions. Verbs in different inflectional forms were treated as separate verb types.

Model Training

Global Vector Training. We used the implementation of GloVe from the Stanford NLP website to train 50-dimensional vectors for each of our three datasets (Pennington, Socher & Manning, 2014). We trained each set of vectors for 50 epochs with a context window size of 10, and used a frequency cut-off of 2 for the CS and CDS datasets and a cut-off of 10 for the ADS dataset.

Similarity Graph Construction. We construct mkNN graphs consistently throughout this paper. In the first simulation, we fix the number of mutual nearest neighbors to 15. In the second simulation, we test the effect of vertex degrees and we set the number of mutual nearest neighbors to 20 to increase the range of vertex degrees.

Manifold Alignment. The parameters that we need to specify in the manifold alignment module include the initial labeled alignments and the dimensionality of the manifold. In addition to the number of labeled data points, the identity of the labeled data can also influence the quality of alignment. The dimensionality of the manifold controls the abstraction of semantic information contained in the word vectors. The lower the dimension, the more abstract the representation.

Evaluation

Because the alignment algorithm pairs up labeled data points exactly, we only evaluate alignments on unlabeled data. We use a random alignment averaged over 5 runs as the baseline condition. Ideally, corresponding data points from two data sets should be mutual nearest neighbors in the lower-dimensional space. We relax the evaluation requirements by giving every alignment a k-nearest neighborhood evaluation radius. If one data point is one of the k nearest neighbors of the corresponding point, we take it as a hit. When the evaluation neighborhood radius equals 1, the measures quantify the exact alignment.

Simulation 1: Mapping CS to CDS and COCA

In this section, we demonstrate that CS-CDS alignment is a less demanding task than CS-COCA alignment even when
potential priming effects from linguistic and non-linguistic contexts are removed. We also predict that the alignment accuracy increases as the amount of labeled data increases.

Method

We performed verb semantic graph alignments of CS to CDS and to ADS for alignment spaces of dimensionality from 5 to 30. The unlabeled precisions are evaluated at window sizes of 1 and 20, as demonstrated in the contour heat maps in Figure 1. The colors of different areas in the contours indicate different levels of unlabeled accuracy, and the data points with the same unlabeled accuracy are connected by the isolines in the maps.

Figure 1: Accuracies of mapping CS to CDS and COCA

The general trend is that the highest unlabeled precisions are found in the upper right corners of the contour maps whereas the lowest unlabeled precisions tend to lie close to the x-axis. The dimensionality of the embedding space can be interpreted as the granularity of children's representations.

The results of the alignments are shown graphically in Figures 1 and 2. In the alignments from CS to CDS and CS to COCA, the CS-COCA alignment achieves only 50% to 60% of the unlabeled precision of the CS-CDS alignment. The unlabeled precision of the CS-CDS alignment is consistently higher than the unlabeled precision of the CS-COCA alignment across all conditions. Both alignments have much larger unlabeled accuracy than the random baseline.

The CS data are aligned to both the spoken COCA and CDS corpora. The CS-CDS alignment precision is higher than the CS-COCA precision across all conditions. In other words, child speech is much easier to map to child-directed speech than to spoken COCA. This easier alignment can be interpreted as similarity in semantic spaces across corpora. Since the CS and the CDS word vectors are trained on speech data from different experiments, the relative similarity between CS and CDS lexical semantics does not reflect mere priming effects. There are two possible interpretations for this result. First, the result can be viewed as an imitation effect in which children mirror child-directed speech semantically. Second, adult caregivers might adapt their mental representations to children's when they talk to children, which sits well with the conversational alignment theory (Pickering & Garrod, 2004). The big semantic gap between initial language input and adult-to-adult conversations on TV or radio shows suggests that learning from CDS alone is not sufficient for real-world language processing. Adapting to TV or radio conversations constitutes one part of further learning, which supports a continuous theory of language development.

Figure 2: Unlabeled accuracies of CS-CDS and CS-COCA alignments with a random alignment as the baseline

Simulation 2: Semantic generality

In Simulation 2, we use a fixed list of labeled data to investigate the effect of initialization in alignment, instead of random initialization. The motivation is that language scientists argue for the importance of a few important “path-breaking” word exemplars in language learning (Ninio, 1999; Goldberg, Casenhiser & Sethuraman, 2004). Some words attract more connections than others, which is known as preferential attachment in network growth (Steyvers & Tenenbaum, 2005). We evaluate the proposal that semantically general verbs are better starting points for language learning than semantically specific verbs, by measuring the vertex degrees.

The degree of a vertex measures the association between a vertex and its neighboring vertices. The prediction is that vertices with large degree are better labeled data than vertices with small degree. Cognitively, the verbs with high degree are semantically general verbs whereas the verbs with low degree are the ones with less general meanings.

Method

Verbs are ranked based on their vertex degree in a semantic network. As shown in Table 1, we use as labeled data the 100 verbs with the largest degrees, the 100 verbs with the smallest degrees, and the medium-degree verbs with degree ranks 201 to 300. We also mixed half of the high-degree verbs with half of the medium-degree verbs in the mixed condition. The baseline condition is averaged over 5 random initializations. We set the number of mutual nearest neighbors, the evaluation radius and the dimensionality all to 20.
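
The fixed initializations described in this Method section can be written down compactly. The sketch below is only an illustration: it assumes a hypothetical dictionary `degree` that maps each verb to its weighted vertex degree, and the helper name `initial_sets` and the toy degrees are likewise assumptions. The text does not say how the halves of the mixed set were chosen, so taking the first 50 verbs of each group is just one possible reading.

```python
from typing import Dict, List

def initial_sets(degree: Dict[str, float]) -> Dict[str, List[str]]:
    """Pick labeled verbs by degree rank (rank 1 = largest degree)."""
    ranked = sorted(degree, key=degree.get, reverse=True)
    high = ranked[:100]             # ranks 1-100
    medium = ranked[200:300]        # ranks 201-300
    low = ranked[-100:]             # the 100 smallest degrees
    mixed = high[:50] + medium[:50] # assumed split; not specified in the text
    return {"high": high, "medium": medium, "low": low, "mixed": mixed}

# Toy degrees for 811 hypothetical verbs, strictly decreasing with the index.
degree = {f"verb{i}": float(811 - i) for i in range(811)}
sets = initial_sets(degree)
```

Each returned list would then serve as the set of labeled pairs for one alignment run.
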

The alignment precisions in Figure 3 show a clear advantage of the high-degree and medium-degree conditions over the low-degree condition, but both the high-degree and the low-degree conditions perform below the random baseline. We can also see an advantage of medium-degree initialization, which parallels basic-level categorization theories. When we use a mixed set of high-degree and medium-degree verbs, we get the best results in all the conditions, which suggests that a diverse-degree initialization facilitates semantic space alignment.

Figure 3: Unlabeled accuracies of alignments with high-degree, medium-degree, low-degree, mixed-degree and random initializations

Table 1: Verbs with the largest, medium and smallest vertex degrees in ADS

largest    medium     smallest
get        giving     tickles
go         tearing    points
want       taken      shooting
put        poured     design
think      tipping    tapping

General Discussion

In Simulation 1, we demonstrate that CS has semantic properties very similar to CDS in comparison to ADS. This result supports a usage-based approach to language acquisition: children imitate their caregivers. The results can be interpreted from multiple perspectives. First, the result suggests that child speech is built upon restricted linguistic contexts. One of the most prominent characteristics of human memory is context-dependency. Early language experience is built upon restricted contexts and usages, and requires further learning to achieve the adult form. Second, child-directed speech is used in young children's living environments. Children seem to use words in ways highly consistent with their caregivers. Third, talking to children in child-directed speech is a double-edged sword. On the one hand, children might have an easier time initializing their language capacities at an early language development stage because their hypothesis space is restricted by child-directed speech. On the other hand, the mismatch between child-directed speech and adult-directed speech requires children to shift their semantic representations at later development stages.

In Simulation 2, we show empirically that semantically moderately general verbs are better starting points for language development. Our simulations show mixed results for the “path-breaking” argument that semantically generic verbs are important for language learning (Ninio, 1999). Our results suggest that both semantic generality and semantic diversity play a role here. Although semantically general verbs help in general, verbs that are semantically too general may not be that helpful.

Speaker Normalization by Manifold Alignment

In speech recognition and perception, speaker normalization is the task of automatically adjusting to acoustic differences between different speakers. Our work is inspired by Plummer et al. (2010), who proposed manifold alignment as an account for how young children learn to handle phonetic variability in vowel production during language acquisition. Aside from working with semantic, rather than acoustic, representations, our work differs from theirs in two respects. First, they used synthesized data as input, while we used naturalistic corpus data. Second, since two token pronunciations of vowels will never be the same, they imposed only a soft alignment constraint that labeled pairs be aligned, while we imposed a hard constraint.

Crosslinguistic Alignment of Polysemous Words

Youn et al. (2016) investigated semantic universals by constructing networks of corresponding polysemous nouns from 81 languages sampled from different language families. Using an approach reminiscent of thesaurus-based synonym induction, they established semantic correspondences between nouns using bilingual dictionaries. The target polysemous words were selected from the Swadesh 200 basic vocabulary list. The procedure described in this paper is automatic and takes into consideration the matching of semantic spaces in one language, whereas Youn and colleagues manually establish semantic correspondences for a few basic words in bilingual data.

Conclusions

The contribution of this paper is a novel integrated framework that compares semantic spaces of children and their caregivers based on naturalistic language productions. We combined methods from three traditions, distributed semantic representations, graph theory, and manifold alignment, into one framework for approaching the semantic structure of the lexicon. We used naturalistic language productions from CHILDES to compare the semantic spaces
spanned by verbs and demonstrated (i) that CS is more similar to CDS than to ADS in terms of the semantic spaces spanned by verbs and (ii) that verbs with relatively large and diverse degrees are especially useful for aligning semantic structures.

While the general computational framework proposed in this paper does not provide an account of how children might exploit this manifold-based and graph-theoretic information, it does suggest that useful information about the structure of the adult lexicon is available to children. Even though our framework is on the computational level, using Marr's terminology (1982), it is very likely that semantic manifold alignment plays a role in children's semantic development. Additionally, this framework may be of use to other fields that are interested in the semantic structure of different lexicons. For example, this approach may be useful for performing semantic comparisons between languages or across time over the course of language change, and understanding the semantic organization of bilingual lexicons.

References

Anderson, J. R., & Bower, G. H. (1973). Human associative memory. Psychology Press.
Belkin, M., Niyogi, P., & Sindhwani, V. (2006). Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. Journal of Machine Learning Research, 7(Nov), 2399-2434.
Davies, M. (2008-). The Corpus of Contemporary American English: 520 million words, 1990-present. Available online at http://corpus.byu.edu/coca/.
Diaz, F., & Metzler, D. (2007, January). Pseudo-aligned multilingual corpora. In IJCAI (pp. 2727-2732).
Duchi, J., Hazan, E., & Singer, Y. (2011). Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(Jul), 2121-2159.
Elman, J. L. (1993). Learning and development in neural networks: The importance of starting small. Cognition, 48(1), 71-99.
Fenson, L., Dale, P. S., Reznick, J. S., Bates, E., Thal, D. J., Pethick, S. J., ... & Stiles, J. (1994). Variability in early communicative development. Monographs of the Society for Research in Child Development, i-185.
Goldberg, A. E., Casenhiser, D. M., & Sethuraman, N. (2004). Learning argument structure generalizations. Cognitive Linguistics, 15(3), 289-316.
Goldfield, B. A., & Reznick, J. S. (1990). Early lexical acquisition: Rate, content, and the vocabulary spurt. Journal of Child Language, 17(01), 171-183.
Golinkoff, R. M., & Alioto, A. (1995). Infant-directed speech facilitates lexical learning in adults hearing Chinese: Implications for language acquisition. Journal of Child Language, 22(3), 703-726.
Ham, J., Lee, D. D., & Saul, L. K. (2005, January). Semisupervised alignment of manifolds. In AISTATS (pp.
Harris, Z. S. (1954). Distributional structure. Word, 10(2-3), 146-162.
Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2), 211.
Levy, O., & Goldberg, Y. (2014). Neural word embedding as implicit matrix factorization. In Advances in Neural Information Processing Systems (pp. 2177-2185).
MacWhinney, B. (2000). The CHILDES Project: Tools for analyzing talk. Third Edition. Mahwah, NJ: Lawrence Erlbaum Associates.
Marr, D. (1982). Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. The MIT Press.
Ninio, A. (1999). Pathbreaking verbs in syntactic development and the question of prototypical transitivity. Journal of Child Language, 26(03), 619-653.
Pennington, J., Socher, R., & Manning, C. D. (2014, October). GloVe: Global Vectors for Word Representation. In EMNLP (Vol. 14, pp. 1532-1543).
Pickering, M. J., & Garrod, S. (2004). Toward a mechanistic psychology of dialogue. Behavioral and Brain Sciences, 27(02), 169-190.
Plummer, A. R., Beckman, M. E., Belkin, M., Fosler-Lussier, E., & Munson, B. (2010). Learning speaker normalization using semisupervised manifold alignment. In INTERSPEECH (pp. 2918-2921).
Shi, J., & Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), 888-905.
Snow, C. (1995). Issues in the study of input: Finetuning, universality, individual and developmental differences, and necessary causes. The handbook of child language, ed. by Paul Fletcher and Brian MacWhinney. Oxford: Blackwell.
Steyvers, M., & Tenenbaum, J. B. (2005). The large-scale structure of semantic networks: Statistical analyses and a model of semantic growth. Cognitive Science, 29(1), 41-78.
Thiessen, E. D., Hill, E. A., & Saffran, J. R. (2005). Infant-directed speech facilitates word segmentation. Infancy, 7(1), 53-71.
Youn, H., Sutton, L., Smith, E., Moore, C., Wilkins, J. F., Maddieson, I., ... & Bhattacharya, T. (2016). On the universal structure of human lexical semantics. Proceedings of the National Academy of Sciences, 113(7), 1766-1771.

