You Don't Know My Favorite Color: Preventing Dialogue Representations from Revealing Speakers' Private Personas

Li, Haoran; Song, Yangqiu; Fan, Lixin

Computer Science > Computation and Language

arXiv:2205.10228 (cs)

[Submitted on 26 Apr 2022]

Abstract: Social chatbots, also known as chit-chat chatbots, are evolving rapidly with large pretrained language models. Despite this progress, privacy concerns have recently arisen: the training data of large language models can be extracted via model inversion attacks. Moreover, the datasets used for training chatbots contain many private conversations between two individuals. In this work, we further investigate privacy leakage from the hidden states of chatbots trained by language modeling, which has not yet been well studied. We show that speakers' personas can be inferred with high accuracy by a simple neural network. To address this, we propose effective defense objectives that prevent persona leakage from hidden states. We conduct extensive experiments to demonstrate that our proposed defense objectives can greatly reduce the attack accuracy from 37.6% to 0.5%. Meanwhile, the proposed objectives preserve language models' powerful generation ability.

Comments:	Conference paper accepted by NAACL 2022
Subjects:	Computation and Language (cs.CL); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Cite as:	arXiv:2205.10228 [cs.CL]
	(or arXiv:2205.10228v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2205.10228

Submission history

From: Haoran Li
[v1] Tue, 26 Apr 2022 09:36:18 UTC (1,566 KB)
