How Much Do Modifications to Transformer Language Models Affect Their Ability to Learn Linguistic Knowledge?

Simeng Sun, Brian Dillon, Mohit Iyyer


Abstract
Recent progress in large pretrained language models (LMs) has led to a growth of analyses examining what kinds of linguistic knowledge are encoded by these models. Due to computational constraints, existing analyses are mostly conducted on publicly-released LM checkpoints, which makes it difficult to study how various factors during training affect the models’ acquisition of linguistic knowledge. In this paper, we train a suite of small-scale Transformer LMs that differ from each other with respect to architectural decisions (e.g., self-attention configuration) or training objectives (e.g., multi-tasking, focal loss). We evaluate these LMs on BLiMP, a targeted evaluation benchmark of multiple English linguistic phenomena. Our experiments show that while none of these modifications yields significant improvements on aggregate, changes to the loss function result in promising improvements on several subcategories (e.g., detecting adjunct islands, correctly scoping negative polarity items). We hope our work offers useful insights for future research into designing Transformer LMs that more effectively learn linguistic knowledge.
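The abstract mentions loss-function modifications such as focal loss; the paper itself contains the actual training details. Purely as an illustrative sketch (not the authors' implementation), the snippet below shows how focal loss (Lin et al., 2017) can be applied to next-token prediction in PyTorch. The function name focal_lm_loss and the choice of gamma = 2.0 are assumptions made here for illustration.

```python
import torch
import torch.nn.functional as F

def focal_lm_loss(logits: torch.Tensor, targets: torch.Tensor,
                  gamma: float = 2.0, ignore_index: int = -100) -> torch.Tensor:
    """Illustrative focal loss for language modeling (sketch, not the paper's code).

    logits:  (batch, seq_len, vocab_size) unnormalized scores
    targets: (batch, seq_len) gold token ids, with ignore_index marking padding
    """
    log_probs = F.log_softmax(logits, dim=-1)                              # (B, T, V)
    # Log-probability assigned to each gold token; clamp avoids gathering at -100.
    log_pt = log_probs.gather(-1, targets.clamp_min(0).unsqueeze(-1)).squeeze(-1)
    pt = log_pt.exp()
    # Focal modulation: down-weight tokens the model already predicts confidently.
    loss = -((1.0 - pt) ** gamma) * log_pt
    mask = (targets != ignore_index).float()
    return (loss * mask).sum() / mask.sum().clamp_min(1.0)

# Example: a batch of 2 sequences of length 5 over a 100-token vocabulary.
logits = torch.randn(2, 5, 100)
targets = torch.randint(0, 100, (2, 5))
print(focal_lm_loss(logits, targets).item())
```

With gamma = 0 this reduces to standard token-level cross-entropy; larger gamma shifts the training signal toward tokens the model finds hard, which is the general motivation for trying such objectives.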
Anthology ID:
2022.insights-1.6
Volume:
Proceedings of the Third Workshop on Insights from Negative Results in NLP
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Shabnam Tafreshi, João Sedoc, Anna Rogers, Aleksandr Drozd, Anna Rumshisky, Arjun Akula
Venue:
insights
Publisher:
Association for Computational Linguistics
Pages:
46–53
URL:
https://aclanthology.org/2022.insights-1.6
DOI:
10.18653/v1/2022.insights-1.6
Cite (ACL):
Simeng Sun, Brian Dillon, and Mohit Iyyer. 2022. How Much Do Modifications to Transformer Language Models Affect Their Ability to Learn Linguistic Knowledge? In Proceedings of the Third Workshop on Insights from Negative Results in NLP, pages 46–53, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
How Much Do Modifications to Transformer Language Models Affect Their Ability to Learn Linguistic Knowledge? (Sun et al., insights 2022)
PDF:
https://aclanthology.org/2022.insights-1.6.pdf
Video:
https://aclanthology.org/2022.insights-1.6.mp4
Data
BLiMP