What are the scope and limits of syntactic variation within and across varieties of English? To address this question, we investigate well-known syntactic variation between the s-genitive (Mr Barnsley’s management) and the of-genitive (the management of Mr Barnsley) in nine varieties of English. We specifically gauge the stability of constraints on this variation by analyzing a richly annotated dataset spanning 10,558 interchangeable genitives from nine components of the International Corpus of English. Regression modeling indicates that constraints such as possessor animacy, constituent length, final sibilancy of the possessor, as well as the effect of medium (spoken vs. written) as a language-external factor differ in strength across varieties. The language-internal constraints, however, never change effect direction. We conclude that the probabilistic grammar fueling genitive variation is surprisingly stable overall, but does exhibit some fluidity along the lines of a distinction between English as a native language (ENL) and English as a second language (ESL) varieties: those constraints that tend to favor s-genitive usage tend to be weakened in ESL varieties.
This study explores variability in particle placement across nine varieties of English around the globe, utilizing data from the International Corpus of English and the Global Corpus of Web-based English. We introduce a quantitative approach for comparative sociolinguistics that integrates linguistic distance metrics and predictive modelling, and use these methods to examine the development of regional patterns in grammatical constraints on particle placement in World Englishes. We find a high degree of uniformity among the conditioning factors influencing particle placement in native varieties (e.g., British, Canadian, and New Zealand English), while English as a second language varieties (e.g., Indian and Singaporean English) exhibit a high degree of dissimilarity with the native varieties and with each other. We attribute the greater heterogeneity among second language varieties to the interaction between general L2 acquisition processes and the varying sociolinguistic contexts of the individual regions. We argue that the similarities in constraint effects represent compelling evidence for the existence of a shared variable grammar, and that variation among grammatical systems is more appropriately analyzed and interpreted as a continuum rather than as multiple distinct grammars.
This special collection brings together research exploring and evaluating probabilistic variation patterns from a comparative perspective, thus highlighting current work situated at the crossroads of research on usage-based theoretical linguistics, variationist linguistics, and sociolinguistics. The contributions in the collection advance our understanding of the plasticity of syntactic knowledge on the part of language users with diverse regional and/or cultural backgrounds, and demonstrate how a probabilistic approach to grammatical variation can offer insight into the scope and limits of language variation. In this general introduction to the special collection, we provide some essential background for perspective, and subsequently summarize the contributions in the collection.
We investigate internal and stylistic factors affecting binary and ternary relativizer choice in subject (that vs which) and non-subject (that vs which vs zero) relative clauses. We employ a novel methodological approach to predicting relativizers: Bayesian regression modeling with the dimensional reduction of model inputs via factor analysis. Our factor analysis is motivated by the high degree of redundancy and collinearity in natural language data, while Bayesian regression models are robust to effects of data sparseness and (near) separation. We find that in both types of relative clauses, the more marked variant (which) is preferred in complex contexts, while the unmarked variant (that, or zero in NSRCs) is favored in contexts where the relative clause is short and more fully integrated with the NP it modifies. We also find that use of which is somewhat more sensitive to stylistic considerations in subject than in non-subject relative clauses, and that which correlates most strongly with features associated with lexical density, e.g. ‘nouniness’, rather than those often associated with formality, e.g. passivization and sentence length.
This paper introduces a new resource designed to facilitate the quantitative investigation of syntactic variation in spoken language from a comparative perspective. The datasets comprise homogeneously annotated collections of "interchangeable" (i.e. competing) genitive and dative variants in four varieties of English: American English, British English, Canadian English, and New Zealand English. To showcase the empirical potential of the data source, we present a suggestive analysis that investigates the extent to which the probabilistic grammar of genitive and dative variant choice differs across varieties. The statistical analysis reveals that while there are a number of subtle probabilistic contrasts between the regional varieties under study, there is overall a striking degree of cross-varietal homogeneity. We conclude by outlining directions for future research.
We sketch a project that marries probabilistic grammar research to scholarship on World Englishes, thus synthesizing two previously rather disjoint lines of research into one unifying project with a coherent focus. We hope that this synthesis will advance usage-based theoretical linguistics by adopting a large-scale comparative and sociolinguistically responsible perspective on grammatical variation. To highlight the descriptive and theoretical benefits of the approach, we present case studies of three syntactic alternations (the particle placement, genitive, and dative alternations) in four varieties of English (British, Canadian, Indian, and Singapore), as represented in the International Corpus of English. We report that the varieties studied share a core probabilistic grammar which is, however, subject to indigenization at various degrees of subtlety, depending on the abstractness of the syntactic patterns studied.
Talk given at the LSA 2011 Annual Meeting, Jan 1, 2011
... 85th Annual Meeting, Pittsburgh, Pennsylvania, January 9, 2011. Stephanie Shih and Jason Grafmiller, Department of Linguistics, Stanford University ... Science Foundation. Contact: Stephanie Shih, Jason Grafmiller ...