Code Vectors: Understanding Programs Through Embedded Abstracted Symbolic Traces

Henkel, Jordan; Lahiri, Shuvendu K.; Liblit, Ben; Reps, Thomas

doi:10.1145/3236024.3236085

arXiv:1803.06686 (cs)

[Submitted on 18 Mar 2018 (v1), last revised 20 Aug 2018 (this version, v2)]

Abstract: With the rise of machine learning, there is a great deal of interest in treating programs as data to be fed to learning algorithms. However, programs do not start off in a form that is immediately amenable to most off-the-shelf learning techniques. Instead, it is necessary to transform the program to a suitable representation before a learning technique can be applied.
In this paper, we use abstractions of traces obtained from symbolic execution of a program as a representation for learning word embeddings. We trained a variety of word embeddings under hundreds of parameterizations, and evaluated each learned embedding on a suite of different tasks. In our evaluation, we obtain 93% top-1 accuracy on a benchmark consisting of over 19,000 API-usage analogies extracted from the Linux kernel. In addition, we show that embeddings learned from (mainly) semantic abstractions provide nearly triple the accuracy of those learned from (mainly) syntactic abstractions.
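To make the analogy benchmark mentioned in the abstract more concrete, the following is a minimal sketch of how top-1 accuracy on analogies of the form "a is to b as c is to ?" might be computed with vector arithmetic and cosine similarity. It is not the authors' evaluation code. The token names (`lock_a`, `unlock_a`, and so on) and the 3-dimensional vectors are made up for illustration, and the choice of cosine similarity with the query words excluded from the candidates is an assumption borrowed from common word-embedding practice.

```python
import math

# Hypothetical toy embeddings: hand-picked 3-d vectors for made-up tokens.
# Real embeddings would be learned from abstracted symbolic traces.
EMBEDDINGS = {
    "lock_a":   [1.0, 0.0,  1.0],
    "unlock_a": [1.0, 0.0, -1.0],
    "lock_b":   [0.0, 1.0,  1.0],
    "unlock_b": [0.0, 1.0, -1.0],
    "alloc":    [1.0, 1.0,  0.0],  # distractor token
}

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def analogy_top1(emb, a, b, c):
    """Answer 'a is to b as c is to ?' with the word nearest to b - a + c."""
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    # Exclude the three query words, as is conventional for analogy tests.
    candidates = (w for w in emb if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(emb[w], target))

def top1_accuracy(emb, analogies):
    """Fraction of (a, b, c, expected) analogies whose top-1 answer is correct."""
    hits = sum(analogy_top1(emb, a, b, c) == d for a, b, c, d in analogies)
    return hits / len(analogies)

# Hypothetical analogy suite over the toy vocabulary above.
ANALOGIES = [
    ("lock_a", "unlock_a", "lock_b", "unlock_b"),
    ("lock_b", "unlock_b", "lock_a", "unlock_a"),
]

print(top1_accuracy(EMBEDDINGS, ANALOGIES))  # prints 1.0 for these toy vectors
```

With learned embeddings, the same loop would run over the full analogy suite, and the reported figure would be the fraction of queries whose nearest neighbor matches the expected token.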

Subjects: Software Engineering (cs.SE)
Cite as: arXiv:1803.06686 [cs.SE]
(or arXiv:1803.06686v2 [cs.SE] for this version)
https://doi.org/10.48550/arXiv.1803.06686
Related DOI: https://doi.org/10.1145/3236024.3236085

Submission history

From: Jordan Henkel
[v1] Sun, 18 Mar 2018 16:40:13 UTC (88 KB)
[v2] Mon, 20 Aug 2018 16:03:10 UTC (678 KB)
