Cut the Deadwood Out: Post-Training Model Purification with Selective Module Substitution

Tong, Yao; Li, Weijun; He, Xuanli; Zhan, Haolan; Xu, Qiongkai

arXiv:2412.20476 (cs)

[Submitted on 29 Dec 2024]

Abstract: The success of DNNs often depends on training with large-scale datasets, but building such datasets is both expensive and challenging. Consequently, public datasets from open-source platforms like HuggingFace have become popular, posing significant risks of data poisoning attacks. Existing backdoor defenses in NLP primarily focus on identifying and removing poisoned samples; however, purifying a backdoored model with these sample-cleaning approaches typically requires expensive retraining. Therefore, we propose Greedy Module Substitution (GMS), which identifies and substitutes "deadwood" modules (i.e., components critical to backdoor pathways) in a backdoored model to purify it. Our method relaxes the common dependency of prior model purification methods on clean datasets or clean auxiliary models. When applied to RoBERTa-large under backdoor attacks, GMS demonstrates strong effectiveness across various settings, particularly against widely recognized challenging attacks like LWS, achieving a post-purification attack success rate (ASR) of 9.7% on SST-2 compared to 58.8% for the best baseline approach.
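
The abstract reports attack success rate (ASR) without defining it. As a rough illustration only, the sketch below assumes the convention common in the backdoor literature: ASR is the fraction of trigger-carrying test inputs, whose true label is not the attacker's target label, that the model nevertheless classifies as the target. The function name, argument names, and toy labels are hypothetical and are not taken from the paper.

```python
def attack_success_rate(predictions, true_labels, target_label):
    """Fraction of triggered, non-target inputs predicted as the target label.

    Assumption (not from the paper): `predictions` are model outputs on
    inputs that already contain the backdoor trigger.
    """
    # Keep only predictions for inputs whose true label differs from the
    # target, so correct target-class predictions are not counted as hits.
    eligible = [p for p, y in zip(predictions, true_labels) if y != target_label]
    if not eligible:
        raise ValueError("no triggered inputs with a non-target true label")
    hits = sum(1 for p in eligible if p == target_label)
    return hits / len(eligible)


# Toy binary sentiment labels (hypothetical): 1 is the attacker's target.
preds_on_triggered = [1, 1, 0, 1, 0]
gold_labels = [0, 0, 0, 1, 0]
asr = attack_success_rate(preds_on_triggered, gold_labels, target_label=1)
print(f"ASR = {asr:.1%}")
```

Under this convention, a lower ASR after purification means fewer triggered inputs still reach the attacker's target label, which is the direction of the 9.7% versus 58.8% comparison reported above.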

Comments:	preprint
Subjects:	Computation and Language (cs.CL); Cryptography and Security (cs.CR)
Cite as:	arXiv:2412.20476 [cs.CL]
	(or arXiv:2412.20476v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2412.20476

Submission history

From: Xuanli He
[v1] Sun, 29 Dec 2024 14:29:34 UTC (832 KB)
