Does Liking Yellow Imply Driving a School Bus? Semantic Leakage in Language Models

Gonen, Hila; Blevins, Terra; Liu, Alisa; Zettlemoyer, Luke; Smith, Noah A.

arXiv:2408.06518 (cs)

[Submitted on 12 Aug 2024 (v1), last revised 12 Sep 2024 (this version, v2)]

Abstract: Despite their wide adoption, the biases and unintended behaviors of language models remain poorly understood. In this paper, we identify and characterize a phenomenon never discussed before, which we call semantic leakage, where models leak irrelevant information from the prompt into the generation in unexpected ways. We propose an evaluation setting to detect semantic leakage both by humans and automatically, curate a diverse test suite for diagnosing this behavior, and measure significant semantic leakage in 13 flagship models. We also show that models exhibit semantic leakage in languages besides English and across different settings and generation scenarios. This discovery highlights yet another type of bias in language models that affects their generation patterns and behavior.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2408.06518 [cs.CL]
	(or arXiv:2408.06518v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2408.06518

Submission history

From: Hila Gonen
[v1] Mon, 12 Aug 2024 22:30:55 UTC (5,035 KB)
[v2] Thu, 12 Sep 2024 18:33:33 UTC (5,035 KB)
