Universal Features in Phonological Neighbor Networks
Abstract
1. Introduction
1.1. Background
1.2. Hypotheses
2. Results
2.1. Empirical Analysis of Phonological Neighbor Networks
Degree Distributions and Topology
2.2. Islands and Frequency Assortativity
2.3. DAS Graphs as Mixtures
3. Pseudolexicons
3.1. Models
- Infinite Temperature (INFT). Each phoneme in the string is drawn uniformly from the target language’s phoneme inventory. This model has no information about relative consonant (C) and vowel (V) frequencies in the target language; all are treated as equally likely.
- Noninteracting, Uniform Field (UNI). Each phoneme in the string is drawn randomly using its observed frequency in the real language’s lexicon. This model receives information about overall C and V frequencies; for example, it is given the relative likelihood of finding /k/ vs. /a/ in English words. However, no positional frequency information is provided. If asked to produce a three-phoneme English pseudoword, UNI is therefore no more likely to produce a CVC (a very common three-phoneme pattern, e.g., /kæt/) than a CCC (vanishingly rare, with some controversy regarding their status as actual words, e.g., /pst/).
- Noninteracting, Consonant/Vowel Uniform Field (CVUNI). Each position in the random string is either a C or a V drawn randomly using observed positional C and V frequencies in the real lexicon. Specifically, we use the real language’s corpus to compute the position-dependent probability that position l is a C or a V. The particular consonant or vowel placed at that position is drawn uniformly from the target language’s list of consonants and vowels. Unlike UNI, CVUNI would produce CVC more often than CCC if asked to generate a three-phoneme English pseudoword. However, the model is provided no knowledge of individual consonant and vowel frequencies, so common and uncommon phonemes will be mixed.
- Noninteracting, Consonant/Vowel Field (CV). Positions are selected to be consonants or vowels exactly as in CVUNI. The particular consonant or vowel placed at each position is selected using observed frequencies of consonants and vowels from the real lexicon.
- Noninteracting, Spatially Varying Field (SP). Each phoneme is drawn randomly from real positional frequencies in the target lexicon. For example, if a language has an inventory of twenty phonemes, we use the real lexicon to compute a table that gives the probability that phoneme x occurs at position l, and then use this table to assign a phoneme to each position of the random string. SP and CV use similar but not identical information from the real language. One important feature of a real language that neither captures is phonotactic constraints. That is, pairs of phonemes occur in real languages with frequencies different from the product of the frequencies of the individual phonemes, and in a manner that depends on location within the word form. For example, /t/ and /b/ are common consonants in English, but the diphone /tb/ rarely appears except in multisyllabic words at syllable boundaries (e.g., the words outbreak, outburst, frostbite).
- Nearest Neighbor Interactions (PAIR). The first phoneme in each string is drawn using a positional probability. Subsequent phonemes are drawn via the following rule: if the phoneme at position k is x, then the phoneme at position k + 1 is drawn using the empirical probability (from the real lexicon) that phoneme y follows phoneme x. PAIR is the model we consider with the most (though not full) linguistic detail; unlike any other model above, PAIR will not produce unobserved diphones even if the constituent phonemes are quite common.
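The sampling rules for these models can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the lexicon is assumed to be a list of phoneme tuples, all function names (`fit_counts`, `gen_inft`, `gen_uni`, `gen_sp`, `gen_pair`) are hypothetical, each lexicon entry is counted once when tabulating frequencies, and the two consonant/vowel models (CVUNI, CV) are left out because they additionally need a consonant/vowel labelling of the inventory.

```python
import random
from collections import Counter, defaultdict


def sample(counter, rng):
    """Draw one key with probability proportional to its count."""
    items = list(counter)
    return rng.choices(items, weights=[counter[i] for i in items], k=1)[0]


def fit_counts(lexicon):
    """Tabulate the frequency tables used by UNI, SP and PAIR.

    lexicon: list of phoneme tuples (an assumed, illustrative format).
    """
    overall = Counter()                # UNI: phoneme counts, any position
    positional = defaultdict(Counter)  # SP: phoneme counts at each position
    follows = defaultdict(Counter)     # PAIR: counts of y directly after x
    for word in lexicon:
        for pos, phoneme in enumerate(word):
            overall[phoneme] += 1
            positional[pos][phoneme] += 1
        for x, y in zip(word, word[1:]):
            follows[x][y] += 1
    return overall, positional, follows


def gen_inft(length, inventory, rng):
    """INFT: every phoneme in the inventory is equally likely."""
    return tuple(rng.choice(inventory) for _ in range(length))


def gen_uni(length, overall, rng):
    """UNI: overall phoneme frequencies, no positional information."""
    return tuple(sample(overall, rng) for _ in range(length))


def gen_sp(length, positional, rng):
    """SP: position-specific frequencies (length must not exceed the
    longest word in the lexicon, or the table for that position is empty)."""
    return tuple(sample(positional[pos], rng) for pos in range(length))


def gen_pair(length, positional, follows, rng):
    """PAIR: first phoneme from the position-0 table, then each next
    phoneme from the observed followers of the previous one."""
    word = [sample(positional[0], rng)]
    while len(word) < length:
        followers = follows.get(word[-1])
        if not followers:  # previous phoneme is never followed by anything
            return None    # dead end; the caller may resample
        word.append(sample(followers, rng))
    return tuple(word)
```

Because `gen_pair` only ever draws from observed followers, every diphone in its output occurs somewhere in the input lexicon, which is the property that separates PAIR from the noninteracting models. Returning `None` on a dead end is a design choice of this sketch; the source does not say how such cases are handled.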
3.2. English Networks
3.3. Five Language Pseudonetworks
3.4. Sensitivity to the Form Length Distribution
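- ZTP(1x). The form length distribution is a zero-truncated Poisson (ZTP) model fit to the empirical distribution. The ZTP distribution has the form $P(l) = \frac{\lambda^{l}}{(e^{\lambda} - 1)\, l!}$ for $l = 1, 2, \ldots$, where $\lambda$ is the rate parameter.
- ZTP(1.5x). This model is a zero-truncated Poisson model for the form length with a mean equal to 1.5 times the mean of the maximum likelihood (ML) fit of ZTP(1x).
- GEO. Here the form length follows a geometric distribution.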
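A short Python sketch may help make these alternative form length distributions concrete. It uses the standard zero-truncated Poisson probability mass function, $\lambda^{l} / ((e^{\lambda} - 1)\, l!)$ on $l \geq 1$, whose mean is $\lambda / (1 - e^{-\lambda})$. Everything else is an assumption for illustration: the function names are hypothetical, the ML fit is done by matching the sample mean with bisection, and the geometric distribution is parameterized on $l = 1, 2, \ldots$ with success probability `p` (the source does not state its parameterization).

```python
import math
import random


def ztp_pmf(l, lam):
    """Zero-truncated Poisson pmf: lam**l / ((e**lam - 1) * l!), l >= 1."""
    return lam ** l / (math.expm1(lam) * math.factorial(l))


def ztp_mean(lam):
    """Mean of the zero-truncated Poisson: lam / (1 - e**(-lam))."""
    return lam / -math.expm1(-lam)


def fit_ztp(sample_mean):
    """ML estimate of lam: solve ztp_mean(lam) = sample_mean by bisection.

    Requires sample_mean > 1, since every ZTP draw is at least 1.
    """
    lo, hi = 1e-9, sample_mean  # ztp_mean(lam) > lam, so the root is below hi
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if ztp_mean(mid) < sample_mean:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sample_ztp(lam, rng):
    """Draw one form length by inverting the ZTP cumulative distribution."""
    u = rng.random()
    l = 1
    p = lam / math.expm1(lam)  # P(1)
    cdf = p
    while u > cdf and p > 0.0:
        l += 1
        p *= lam / l           # P(l) = P(l - 1) * lam / l
        cdf += p
    return l


def sample_geometric(p, rng):
    """Draw l >= 1 with P(l) = p * (1 - p)**(l - 1) (assumed parameterization)."""
    u = rng.random()
    return int(math.log(1.0 - u) / math.log1p(-p)) + 1


# Illustrative use: a ZTP(1x)-style fit to a made-up mean form length of 4.5,
# and a ZTP(1.5x)-style variant whose mean is 1.5 times larger.
lam_1x = fit_ztp(4.5)
lam_15x = fit_ztp(1.5 * ztp_mean(lam_1x))
```

The value 4.5 is a placeholder, not a number taken from the lexicons analyzed here.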
4. Discussion
5. Materials and Methods
5.1. Data
Author Contributions
Funding
Conflicts of Interest
Abbreviations
NAM | Neighborhood Activation Model |
PNN | Phonological Neighbor Network |
DAS | Deletion-Addition-Substitution |
SWR | Spoken Word Recognition |
WS | Watts-Strogatz |
BA | Barabasi-Albert |
CLEARPOND | Cross-Linguistic Easy-Access Resource for Phonological and Orthographic Neighborhood Densities |
FK | Francis and Kucera |
EN | English |
NL | Dutch |
DE | German |
ES | Spanish |
FR | French |
GC | Giant Component |
INFT | Infinite Temperature Pseudolexicon |
UNI | Noninteracting, Uniform Field Pseudolexicon |
CVUNI | Noninteracting, Consonant/Vowel Uniform Field Pseudolexicon |
CV | Noninteracting, Consonant/Vowel Field Pseudolexicon |
SP | Noninteracting, Spatially Varying Field Pseudolexicon |
PAIR | Nearest Neighbor Interactions Pseudolexicon |
EMP | Empirical Form Length Distribution |
ZTP | Zero-Truncated Poisson Distribution |
GEO | Geometric Distribution |
Appendix A. Syllable Level Analysis
| | EN MS + PS | EN MS | EN PS | NL MS + PS | NL MS | NL PS |
|---|---|---|---|---|---|---|
| N | 18,983 | 5979 | 13,004 | 15,360 | 2808 | 12,552 |
| m | 76,092 | 50,232 | 19,808 | 36,158 | 16,785 | 18,396 |
| d | 0.0004 | 0.003 | 0.0002 | 0.0003 | 0.004 | 0.0002 |
| Mean degree | 8.01 | 16.8 | 3.0 | 4.71 | 11.96 | 2.93 |
| GC size | 0.66 | 0.98 | 0.46 | 0.31 | 0.97 | 0.43 |
| C | 0.23/0.28 | 0.3/0.3 | 0.19/0.26 | 0.16/0.23 | 0.31/0.30 | 0.13/0.20 |
| l | 6.68 | 4.63 | 10.3 | 4.62 | 11.8 | 8.73 |
| | 1.0 * | - | 1.04 * | 1.84 * | - | 1.72 |
| r | 0.73/0.70 | 0.65/0.65 | 0.74/0.66 | 0.74/0.69 | 0.59/0.59 | 0.74/0.65 |
| | 0.104(4) | 0.068(4) | 0.089(7) | 0.126(5) | 0.055(8) | 0.100(7) |
Appendix B. Lexical Issues
- Proper Nouns. Any word whose orthographic (written) form begins with a capital letter is assumed to be a proper noun. This rule applies equally well to the PNNs for FK and CLEARPOND. However, we emphasize here that because of the rules for capitalization in German (all nouns are capitalized), we cannot systematically remove proper nouns for all five languages in CLEARPOND.
- Inflected Forms. FK includes lemma numbers for all the words, so we can simply remove any words that are not lemmas. We do not have this information for any words in CLEARPOND and thus cannot remove them. To try to remove inflected forms in CLEARPOND we could, for example, remove all words with word-final phonological “z”. This would remove English plurals but also improperly remove some lemmas (size). Even if this were desirable, we would need different rules for all five languages. Therefore we are forced to keep all inflected forms in the CLEARPOND PNNs.
- Homophones. Homophones are items with identical phonological transcriptions but different orthography. These are relatively simple to remove in both FK and CLEARPOND English, and the same procedure works in any language. We search the nodes for sets of items with identical phonological transcriptions. For example, see and sea would comprise one homophone set in English, and lieu, loo, and Lou another. One of the items from each homophone set, chosen at random, is kept in the PNN and the nodes corresponding to all other items in the set are deleted.
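The proper-noun rule and the homophone procedure described above can be sketched as follows. This is an illustrative sketch only: the lexicon is assumed to be a dictionary from orthographic form to phonological transcription, and the names `is_proper_noun` and `remove_homophones` are hypothetical rather than taken from the authors' code.

```python
import random
from collections import defaultdict


def is_proper_noun(spelling):
    """Capitalization rule: a word whose written form starts with a capital
    letter is treated as a proper noun (not usable for German, where all
    nouns are capitalized)."""
    return spelling[:1].isupper()


def remove_homophones(lexicon, rng):
    """Keep one randomly chosen item from each homophone set.

    lexicon: dict mapping orthographic form -> phonological transcription
             (an assumed format for this sketch).
    Returns a new dict with exactly one spelling per transcription.
    """
    # Group spellings that share an identical transcription.
    by_transcription = defaultdict(list)
    for spelling, transcription in lexicon.items():
        by_transcription[transcription].append(spelling)

    kept = {}
    for transcription, spellings in by_transcription.items():
        # Sort first so the random choice is reproducible for a seeded rng.
        survivor = rng.choice(sorted(spellings))
        kept[survivor] = transcription
    return kept
```

With this representation, the number of nodes removed for a language is simply `len(lexicon) - len(kept)`, and items with a unique transcription always survive.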
Word | Degree |
---|---|
Lea | 68 |
Lee | 68 |
Lew | 66 |
loo | 66 |
lieu | 66 |
Lai | 63 |
lye | 63 |
lie | 63 |
Lowe | 62 |
low | 60 |
male | 60 |
see | 60 |
sea | 60 |
Language | Homophone Sets | Mean Set Size | Nodes Removed |
---|---|---|---|
EN | 731 | 2.09 | 795 |
DE | 440 | 2.10 | 485 |
ES | 1059 | 2.03 | 1123 |
FR | 9013 | 2.63 | 14,735 |
NL | 417 | 2.08 | 449 |
| | EN | NL | DE | ES | FR |
|---|---|---|---|---|---|
| N | 18,983 | 15,360 | 17,227 | 20,728 | 21,177 |
| m | 76,092 | 36,158 | 41,970 | 36,111 | 145,426 |
| Mean degree | 8.01 | 4.71 | 4.87 | 3.48 | 13.7 |
| GC size | 0.66 | 0.56 | 0.58 | 0.43 | 0.74 |
| C (all/GC) | 0.23/0.28 | 0.16/0.23 | 0.21/0.24 | 0.18/0.20 | 0.24/0.25 |
| l | 6.68 | 8.48 | 8.73 | 9.41 | 6.85 |
| | 1.0 * | 1.84 * | 1.2 * | 2.1 * | 1.04 * |
| r (all/GC) | 0.73/0.70 | 0.74/0.69 | 0.75/0.70 | 0.71/0.62 | 0.71/0.68 |
| | FK | INFT | UNI | CVUNI | CV | SP | PAIR |
|---|---|---|---|---|---|---|---|
| N | 7861 | 1891 | 2922 | 1947 | 3022 | 3139 | 4346 |
| m | 22,745 | 2841 | 7501 | 3532 | 8687 | 8811 | 12,319 |
| Mean degree | 5.79 | 3.0 | 5.13 | 3.63 | 5.74 | 5.61 | 5.67 |
| GC size | 0.69 | 0.77 | 0.85 | 0.80 | 0.85 | 0.85 | 0.87 |
| C | 0.21 | 0.19 | 0.25 | 0.22 | 0.27 | 0.25 | 0.25 |
| l | 6.38 | 7.26 | 5.34 | 6.65 | 5.30 | 5.40 | 5.63 |
| | 1.0 * | 1.0 * | 1.0 * | 1.0 * | 1.0 * | 1.0 * | 1.0 * |
| r | 0.67 | 0.60 | 0.45 | 0.64 | 0.48 | 0.49 | 0.45 |
| JSD | 0.0 | 0.079 | 0.011 | 0.013 | 0.046 | 0.0052 | 0.0086 |
| | EN | INFT | UNI | CVUNI | CV | SP | PAIR |
|---|---|---|---|---|---|---|---|
| N | 18,252 | 3192 | 5911 | 3942 | 6219 | 7098 | 8705 |
| m | 59,965 | 3748 | 13,281 | 6857 | 16,373 | 18,821 | 20,922 |
| Mean degree | 6.6 | 2.35 | 4.49 | 3.48 | 5.27 | 5.31 | 4.81 |
| GC size | 0.65 | 0.36 | 0.41 | 0.47 | 0.38 | 0.63 | 0.35 |
| C | 0.21 | 0.19 | 0.25 | 0.22 | 0.27 | 0.25 | 0.25 |
| l | 6.81 | 16.7 | 6.53 | 9.43 | 5.77 | 10.7 | 9.35 |
| | 1.0 * | 1.0 * | 1.0 * | 1.0 * | 1.0 * | 1.0 * | 1.0 * |
| r | 0.70 | 0.83 | 0.74 | 0.85 | 0.68 | 0.71 | 0.72 |
| JSD | 0.0 | 0.11 | 0.034 | 0.030 | 0.069 | 0.025 | 0.046 |
| | FR | pFR | ES | pES | DE | pDE | NL | pNL |
|---|---|---|---|---|---|---|---|---|
| N | 12,164 | 7854 | 20,018 | 2198 | 16,787 | 4141 | 14,943 | 3938 |
| m | 32,753 | 21,577 | 31,812 | 2852 | 35,402 | 8749 | 31,697 | 9408 |
| Mean degree | 5.38 | 5.49 | 3.16 | 2.60 | 4.17 | 4.23 | 4.24 | 4.79 |
| GC size | 0.72 | 0.36 | 0.43 | 0.32 | 0.57 | 0.33 | 0.55 | 0.57 |
| C | 0.28 | 0.27 | 0.19 | 0.19 | 0.21 | 0.25 | 0.16 | 0.27 |
| l | 7.13 | 7.49 | 9.49 | 9.99 | 8.88 | 5.34 | 8.5 | 11.59 |
| | 1.0 * | 1.0 * | 1.0 * | 1.0 * | 1.0 * | 1.0 * | 1.0 * | 1.0 * |
| r | 0.59 | 0.75 | 0.70 | 0.65 | 0.71 | 0.67 | 0.73 | 0.66 |
| JSD | 0.0 | 0.0080 | 0.0 | 0.020 | 0.0 | 0.0068 | 0.0 | 0.0039 |
| | FK | EMP | ZTP(1x) | ZTP(1.5x) | GEO |
|---|---|---|---|---|---|
| N | 7861 | 2959 | 2753 | 211 | 3592 |
| m | 22,745 | 7954 | 13,127 | 304 | 35,938 |
| Mean degree | 5.79 | 5.38 | 9.54 | 2.88 | 20.0 |
| GC size | 0.69 | 0.85 | 0.85 | 0.64 | 0.95 |
| C | 0.21 | 0.24 | 0.28 | 0.19 | 0.35 |
| l | 6.38 | 5.19 | 4.48 | 4.71 | 3.73 |
| | 1.0 * | 1.0 * | 1.0 * | 1.74 | 1.0 * |
| r | 0.67 | 0.44 | 0.53 | 0.46 | 0.49 |
| JSD | 0.0 | 0.011 | 0.040 | 0.086 | 0.16 |
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Brown, K.S.; Allopenna, P.D.; Hunt, W.R.; Steiner, R.; Saltzman, E.; McRae, K.; Magnuson, J.S. Universal Features in Phonological Neighbor Networks. Entropy 2018, 20, 526. https://doi.org/10.3390/e20070526