Transformer Grammars: Augmenting Transformer Language Models with Syntactic Inductive Biases at Scale

Sartran, Laurent; Barrett, Samuel; Kuncoro, Adhiguna; Stanojević, Miloš; Blunsom, Phil; Dyer, Chris

arXiv:2203.00633v1 (cs)

[Submitted on 1 Mar 2022 (this version), latest version 6 Dec 2022 (v2)]

Abstract: Transformer language models that are trained on vast amounts of data have achieved remarkable success at various NLP benchmarks. Intriguingly, this success is achieved by models that lack explicit modeling of hierarchical syntactic structures, which decades of linguistic research hypothesized to be necessary for good generalization. This naturally raises a question: to what extent can we further improve the performance of Transformer language models through an inductive bias that encourages the model to explain the data through the lens of recursive syntactic compositions? Although the benefits of modeling recursive syntax have been shown at small data and model scales, it remains an open question whether, and to what extent, a similar design principle is still beneficial for powerful Transformer language models that work well at scale. To answer these questions, we introduce Transformer Grammars, a novel class of Transformer language models that combine (i) the expressive power, scalability, and strong performance of Transformers and (ii) recursive syntactic compositions, which here are implemented through a special attention mask. We find that Transformer Grammars outperform various strong baselines on multiple syntax-sensitive language modeling evaluation metrics, in addition to sentence-level language modeling perplexity. Nevertheless, we find that the recursive syntactic composition bottleneck harms perplexity on document-level modeling, providing evidence that a different kind of memory mechanism, one that works independently of syntactic structures, plays an important role in the processing of long-form text.
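The abstract states only that recursive composition is implemented through a special attention mask. As a rough illustration of how a mask can be derived from a bracketed parse, here is a minimal sketch under explicit assumptions: it is not the paper's exact masking scheme, and the function name `composition_mask`, the token format (`(X` opens a constituent, `X)` closes it), and the rule that a closing nonterminal attends only to the constituent it closes are all hypothetical choices made for this example.

```python
from typing import List


def composition_mask(tokens: List[str]) -> List[List[bool]]:
    """Build a boolean attention mask from a linearized parse.

    mask[i][j] is True when position i may attend to position j.
    Hypothetical, simplified scheme (not the exact Transformer Grammars mask):
      * an opening nonterminal "(X" or a terminal is pushed on a stack and
        attends to every position currently on the stack;
      * a closing nonterminal "X)" pops the constituent it closes (children
        plus the matching "(X"), attends only to those positions and itself,
        and is then pushed as the single composed representation.
    """
    n = len(tokens)
    mask = [[False] * n for _ in range(n)]
    stack: List[int] = []
    for i, tok in enumerate(tokens):
        is_closing = tok.endswith(")") and not tok.startswith("(")
        if is_closing:
            popped: List[int] = []
            while stack:
                j = stack.pop()
                popped.append(j)
                if tokens[j].startswith("("):  # reached the matching "(X"
                    break
            else:
                raise ValueError(f"unmatched closing nonterminal at position {i}")
            for j in popped + [i]:  # compose: attend to the closed constituent
                mask[i][j] = True
            stack.append(i)  # the composed constituent replaces its children
        else:
            stack.append(i)
            for j in stack:  # stack attention: open nodes and composed constituents
                mask[i][j] = True
    return mask


tokens = ["(S", "(NP", "the", "dog", "NP)", "(VP", "barks", "VP)", "S)"]
mask = composition_mask(tokens)
for i, tok in enumerate(tokens):
    print(tok, [tokens[j] for j in range(len(tokens)) if mask[i][j]])
# Among the printed lines:
# barks ['(S', 'NP)', '(VP', 'barks']
# VP) ['(VP', 'barks', 'VP)']
```

Under these assumptions the mask is causal (no position attends to a later one), and `barks` cannot attend to `the` or `dog` directly, only to the composed `NP)` position. This is a toy version of the kind of composition bottleneck the abstract refers to.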

Comments: 24 pages, 9 figures, 3 tables and 1 algorithm
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2203.00633 [cs.CL]
(or arXiv:2203.00633v1 [cs.CL] for this version)
https://doi.org/10.48550/arXiv.2203.00633

Submission history

From: Laurent Sartran
[v1] Tue, 1 Mar 2022 17:22:31 UTC (103 KB)
[v2] Tue, 6 Dec 2022 15:20:14 UTC (188 KB)