Multilingual large language models leak human stereotypes across language boundaries

Cao, Yang Trista; Sotnikova, Anna; Zhao, Jieyu; Zou, Linda X.; Rudinger, Rachel; Daume III, Hal

arXiv:2312.07141 (cs)

[Submitted on 12 Dec 2023 (v1), last revised 8 May 2024 (this version, v2)]

Abstract: Multilingual large language models have become increasingly popular for their proficiency in processing and generating text across various languages. Previous research has shown that the presence of stereotypes and biases in monolingual large language models can be attributed to the nature of their training data, which is collected from humans and reflects societal biases. Multilingual language models undergo the same training procedure as monolingual ones, albeit with training data sourced from various languages. This raises the question: do stereotypes present in one social context leak across languages within the model? In our work, we first define the term "stereotype leakage" and propose a framework for its measurement. With this framework, we investigate how stereotypical associations leak across four languages: English, Russian, Chinese, and Hindi. To quantify stereotype leakage, we employ an approach from social psychology, measuring stereotypes via group-trait associations. We evaluate human stereotypes and the stereotypical associations manifested in multilingual large language models such as mBERT, mT5, and GPT-3.5. Our findings show a noticeable leakage of positive, negative, and non-polar associations across all languages. Notably, Hindi within multilingual models appears to be the most susceptible to influence from other languages, while Chinese is the least. Additionally, GPT-3.5 exhibits better alignment with human scores than the other models. WARNING: This paper contains model outputs which could be offensive in nature.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2312.07141 [cs.CL]
	(or arXiv:2312.07141v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2312.07141

Submission history

From: Anna Sotnikova
[v1] Tue, 12 Dec 2023 10:24:17 UTC (952 KB)
[v2] Wed, 8 May 2024 20:19:09 UTC (1,860 KB)
