CLNX: Bridging Code and Natural Language for C/C++ Vulnerability-Contributing Commits Identification

Qin, Zeqing; Wu, Yiwei; Han, Lansheng

arXiv:2409.07407 (cs)

[Submitted on 11 Sep 2024]

Abstract: Large Language Models (LLMs) have shown great promise in vulnerability identification. As C/C++ accounts for half of the Open-Source Software (OSS) vulnerabilities over the past decade and updates in OSS mainly occur through commits, enhancing LLMs' ability to identify C/C++ Vulnerability-Contributing Commits (VCCs) is essential. However, current studies primarily focus on further pre-training LLMs on massive code datasets, which is resource-intensive and poses efficiency challenges. In this paper, we enhance the ability of BERT-based LLMs to identify C/C++ VCCs in a lightweight manner. We propose CodeLinguaNexus (CLNX) as a bridge facilitating communication between C/C++ programs and LLMs. Based on commits, CLNX efficiently converts the source code into a more natural representation while preserving key details. Specifically, CLNX first applies structure-level naturalization to decompose complex programs, followed by token-level naturalization to interpret complex symbols. We evaluate CLNX on public datasets of 25,872 C/C++ functions with their commits. The results show that CLNX significantly enhances the performance of LLMs in identifying C/C++ VCCs. Moreover, CLNX-equipped CodeBERT achieves a new state of the art and identifies 38 OSS vulnerabilities in the real world.
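The abstract does not spell out how token-level naturalization works, so the following is only a minimal sketch of the general idea of rewriting C/C++ symbols as natural-language words before handing the text to a BERT-style model. The `SYMBOL_WORDS` table and the `naturalize_tokens` function are hypothetical names and choices made for illustration; they are not taken from the paper, and the structure-level step is not covered here.

```python
import re

# Hypothetical symbol-to-phrase table (an assumption for illustration;
# the mapping actually used by CLNX is not given in the abstract).
SYMBOL_WORDS = {
    "->": "points to member",
    "&&": "and",
    "||": "or",
    "!=": "is not equal to",
    "==": "is equal to",
    ">=": "is at least",
    "<=": "is at most",
    "!": "not",
}

# Try longer symbols first so that "!=" is matched before "!".
_PATTERN = re.compile(
    "|".join(re.escape(s) for s in sorted(SYMBOL_WORDS, key=len, reverse=True))
)


def naturalize_tokens(code: str) -> str:
    """Replace each known symbol with its phrase and normalize whitespace."""
    replaced = _PATTERN.sub(lambda m: f" {SYMBOL_WORDS[m.group(0)]} ", code)
    return " ".join(replaced.split())


if __name__ == "__main__":
    print(naturalize_tokens("if (p != NULL && p->len >= n)"))
```

A real pipeline would tokenize the program properly rather than rely on regular expressions, since a symbol such as `*` or `&` means different things depending on context; the sketch deliberately leaves such ambiguous symbols out.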

Comments:	8 pages, 2 figures, conference
Subjects:	Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
MSC classes:	68M25
Cite as:	arXiv:2409.07407 [cs.CR]
	(or arXiv:2409.07407v1 [cs.CR] for this version)
	https://doi.org/10.48550/arXiv.2409.07407

Submission history

From: Zeqing Qin
[v1] Wed, 11 Sep 2024 16:49:46 UTC (946 KB)
