Automated Vulnerability Detection in Source Code Using Deep Representation Learning
limited datasets (in both size and variety) used by most of the previous works limit the usefulness of the results and prevent them from taking full advantage of the power of deep learning.

III. DATA

Given the complexity and variety of programs, a large number of training examples are required to train machine learning models that can effectively learn the patterns of security vulnerabilities directly from code. We chose to analyze software packages at the function level because it is the lowest level of granularity capturing the overall flow of a subroutine. We compiled a vast dataset of millions of function-level examples of C and C++ code from the SATE IV Juliet Test Suite [6], the Debian Linux distribution [15], and public Git repositories on GitHub [16]. Table I shows the data summary of the number of functions we collected and used from each source in our dataset of over 12 million functions.

TABLE I: Total number of functions obtained from each data source, the number of valid functions remaining after removing duplicates and applying cuts, and the number of functions without and with detected vulnerabilities.

                    SATE IV       GitHub          Debian
Total               121,353       9,706,269       3,046,758
Passing curation    11,896        782,493         491,873
'Not vulnerable'    6,503 (55%)   730,160 (93%)   461,795 (94%)
'Vulnerable'        5,393 (45%)   52,333 (7%)     30,078 (6%)

The SATE IV Juliet Test Suite contains synthetic code examples with vulnerabilities from 118 different Common Weakness Enumeration (CWE) [1] classes and was originally designed to explore the performance of static and dynamic analyzers. While the SATE IV dataset provides labeled examples of many types of vulnerabilities, it is made up of synthetic code snippets that do not sufficiently cover the space of natural code to provide an appropriate training set alone. To provide a vast dataset of natural code to augment the SATE IV data, we mined large numbers of functions from Debian packages and public Git repositories. The Debian package releases provide a selection of very well-managed and curated code which is in use on many systems today. The GitHub dataset provides a larger quantity and wider variety of (often lower-quality) code. Since the open-source functions from Debian and GitHub are not labeled, we used a suite of static analysis tools to generate the labels. Details of the label generation are explained in Subsection III-C.

A. Source lexing

To generate useful features from the raw source code of each function, we created a custom C/C++ lexer designed to capture the relevant meaning of critical tokens while keeping the representation generic and minimizing the total token vocabulary size. Making our lexed representation of code from different software repositories as standardized as possible empowers transfer learning across the full dataset. Standard lexers, designed for actually compiling code, capture far too much detail that can lead to overfitting in ML approaches. Our lexer was able to reduce C/C++ code to representations using a total vocabulary size of only 156 tokens. All base C/C++ keywords, operators, and separators are included in the vocabulary. Code that does not affect compilation, such as comments, is stripped out. String, character, and float literals are lexed to type-specific placeholder tokens, as are all identifiers. Integer literals are tokenized digit by digit, as these values are frequently relevant to vulnerabilities. Types and function calls from common libraries that are likely to have relevance to vulnerabilities are mapped to generic versions. For example, u32, uint32_t, UINT32, uint32, and DWORD are all lexed as the same generic token representing 32-bit unsigned data types. Learned embeddings of these individual tokens would likely distinguish them based on the kind of code they are commonly used in, so care was taken to build in the desired invariance.
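To make the lexing step concrete, here is a minimal Python sketch of the kind of token canonicalization described above. It is our illustration rather than the paper's lexer: the placeholder token names, the tiny keyword and type tables, and the regular expression cover only a small slice of C/C++.

    # Illustrative sketch of the token canonicalization described above
    # (hypothetical token names; not the authors' lexer).
    import re

    KEYWORDS = {"if", "else", "for", "while", "return", "int", "char", "void"}  # subset only
    GENERIC_UINT32 = {"u32", "uint32_t", "UINT32", "uint32", "DWORD"}

    TOKEN_RE = re.compile(r"""
        //[^\n]*|/\*.*?\*/              # comments (dropped)
      | "(?:\\.|[^"\\])*"               # string literal
      | '(?:\\.|[^'\\])*'               # char literal
      | \d+\.\d+(?:[eE][+-]?\d+)?[fF]?  # float literal
      | \d+                             # integer literal
      | [A-Za-z_]\w*                    # identifier or keyword
      | ==|!=|<=|>=|->|\+\+|--|&&|\|\|  # multi-character operators
      | [{}()\[\];,<>=+\-*/%&|!^.~?:]   # single-character operators and separators
    """, re.VERBOSE | re.DOTALL)

    def lex_function(source: str) -> list[str]:
        tokens = []
        for match in TOKEN_RE.finditer(source):
            tok = match.group(0)
            if tok.startswith(("//", "/*")):
                continue                        # comments do not affect compilation
            if tok.startswith('"'):
                tokens.append("<str>")          # type-specific placeholder
            elif tok.startswith("'"):
                tokens.append("<char>")
            elif re.fullmatch(r"\d+\.\d+(?:[eE][+-]?\d+)?[fF]?", tok):
                tokens.append("<float>")
            elif tok.isdigit():
                tokens.extend(tok)              # integer literals kept digit by digit
            elif tok in GENERIC_UINT32:
                tokens.append("<uint32_type>")  # generic 32-bit unsigned type
            elif tok in KEYWORDS:
                tokens.append(tok)              # base keywords kept verbatim
            elif re.fullmatch(r"[A-Za-z_]\w*", tok):
                tokens.append("<id>")           # all other identifiers collapsed
            else:
                tokens.append(tok)              # operators and separators
        return tokens

    # Example: lex_function("uint32_t n = 42; // count")
    #   -> ['<uint32_type>', '<id>', '=', '4', '2', ';']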
B. Data curation

One very important step of our data preparation was the removal of potential duplicate functions. Open-source repositories often have functions duplicated across different packages. Such duplication can artificially inflate performance metrics and conceal overfitting, as training data can leak into test sets. Likewise, there are many functions that are near duplicates, containing trivial changes in source code that do not significantly affect the execution of the function. These near duplicates are challenging to remove, as they can often appear in very different code repositories and can look quite different at the raw source level.

To protect against these issues, we performed an extremely strict duplicate removal process. We removed any function with a duplicated lexed representation of its source code or a duplicated compile-level feature vector. This compile-level feature vector was created by extracting the control flow graph of the function as well as the operations happening in each basic block (opcode vector, or op-vec) and the definition and use of variables (use-def matrix).² Two functions with identical instruction-level behaviors or functionality are likely to have both similar lexed representations and highly correlated vulnerability status.

The "Passing curation" row of Table I reflects the number of functions remaining after the duplicate removal process, about 10.8% of the total number of functions pulled. Although our strict duplicate removal process filters out a significant amount of data, this approach provides the most conservative performance results, closely estimating how well our tool will perform against code it has never seen before.

² Our compile-level feature extraction framework incorporated modified variants of strace and buildbot as well as a custom Clang plugin; we omit the details to focus on the ML aspects of our work.
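The strict duplicate removal described in this subsection can be pictured as hashing each function's lexed token sequence and compile-level feature vector and keeping only the first occurrence of each. The sketch below is ours and assumes a simple dictionary layout for each mined function; the real pipeline is only described at a high level in the text.

    # Minimal sketch of strict duplicate removal: any function whose lexed token
    # sequence or compile-level feature vector has been seen before is discarded.
    import hashlib

    def _digest(items) -> str:
        return hashlib.sha256("\x1f".join(map(str, items)).encode("utf-8")).hexdigest()

    def deduplicate(functions):
        """functions: iterable of dicts with 'tokens' (lexed token list) and
        'compile_features' (numbers from the CFG / op-vec / use-def extraction)."""
        seen_lexed, seen_compile, kept = set(), set(), []
        for fn in functions:
            lexed_key = _digest(fn["tokens"])
            compile_key = _digest(fn["compile_features"])
            if lexed_key in seen_lexed or compile_key in seen_compile:
                continue  # duplicate at the source level or the instruction level
            seen_lexed.add(lexed_key)
            seen_compile.add(compile_key)
            kept.append(fn)
        return kept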
TABLE II: Most frequent CWEs among the curated functions labeled 'vulnerable'.

CWE ID              CWE Description                                                            Frequency %
120/121/122         Buffer Overflow                                                            38.2%
119                 Improper Restriction of Operations within the Bounds of a Memory Buffer   18.9%
476                 NULL Pointer Dereference                                                   9.5%
469                 Use of Pointer Subtraction to Determine Size                               2.0%
20, 457, 805, etc.  Improper Input Validation, Use of Uninitialized Variable, Buffer Access
                    with Incorrect Length Value, etc.                                          31.4%
C. Labels

Labeling code vulnerability at the function level was a significant challenge. The bulk of our dataset was made up of mined open-source code without known ground truth. In order to generate labels, we pursued three approaches: static analysis, dynamic analysis, and commit-message/bug-report tagging.

While dynamic analysis is capable of exposing subtle flaws by executing functions with a wide range of possible inputs, it is extremely resource intensive. Performing a dynamic analysis on the roughly 400 functions in a single module of the LibTIFF 3.8.2 package from the ManyBugs dataset [17] took nearly a day of effort. Therefore, this approach was not realistic for our extremely large dataset.

Commit-message based labeling turned out to be very challenging, providing low-quality labels. In our tests, both humans and ML algorithms were poor at using commit messages to predict corresponding Travis CI [18] build failures or fixes. Motivated by recent work by Zhou et al. [19], we also tried a simple keyword search looking for commit words like "buggy", "broken", "error", "fixed", etc. to label before-and-after pairs of functions, which yielded better results in terms of relevancy. However, this approach greatly reduced the number of candidate functions that we could label and still required significant manual inspection, making it inappropriate for our vast dataset.

As a result, we decided to use three open-source static analyzers, Clang, Cppcheck [20], and Flawfinder [21], to generate labels. Each static analyzer varies in its scope of search and detection. For example, Clang's scope is very broad but also picks up on syntax, programming style, and other findings which are not likely to result in a vulnerability. Flawfinder's scope is geared towards CWEs and does not focus on other aspects such as style. Therefore, we incorporated multiple static analyzers and pruned their outputs to exclude findings that are not typically associated with security vulnerabilities, in an effort to create robust labels.

We had a team of security researchers map each static analyzer's finding categories to the corresponding CWEs and identify which CWEs would likely result in potential security vulnerabilities. This process allowed us to generate binary labels of "vulnerable" and "not vulnerable", depending on the CWE. For example, Clang's "Out-of-bound array access" finding was mapped to "CWE-805: Buffer Access with Incorrect Length Value", an exploitable vulnerability that can lead to program crashes, so functions with this finding were labeled "vulnerable". On the other hand, Cppcheck's "Unused struct member" finding was mapped to "CWE-563: Assignment to Variable without Use", a poor code practice unlikely to cause a security vulnerability, so corresponding functions were labeled "not vulnerable" even though static analyzers flagged them. Of the 390 total types of findings from the static analyzers, 149 were determined to result in a potential security vulnerability. Roughly 6.8% of our curated, mined C/C++ functions triggered a vulnerability-related finding. Table II shows the statistics of frequent CWEs in these "vulnerable" functions.
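The finding-to-label mapping can be sketched as a small lookup table. Only the two example findings quoted above are filled in; the curated mapping of all 390 finding types (149 of them security-relevant) is not reproduced here, so this is an illustration of the scheme rather than the actual table.

    # Sketch of turning static-analyzer findings into binary function labels.
    FINDING_TO_CWE = {
        ("clang", "Out-of-bound array access"): "CWE-805",
        ("cppcheck", "Unused struct member"): "CWE-563",
        # ... remaining (tool, finding) pairs elided ...
    }

    SECURITY_RELEVANT_CWES = {"CWE-805"}  # stand-in for the 149 curated CWEs

    def label_function(findings) -> str:
        """findings: list of (tool, finding_category) pairs reported for one function."""
        for tool, category in findings:
            cwe = FINDING_TO_CWE.get((tool, category))
            if cwe in SECURITY_RELEVANT_CWES:
                return "vulnerable"
        return "not vulnerable"

    # label_function([("cppcheck", "Unused struct member")])    -> "not vulnerable"
    # label_function([("clang", "Out-of-bound array access")])  -> "vulnerable"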
IV. METHODS

Our primary machine learning approach to vulnerability detection, depicted in Figure 1, combines the neural feature representations of lexed function source code with a powerful ensemble classifier, random forest (RF).

A. Neural network classification and representation learning

Since source code shares some commonalities with natural-language writing, and since far less prior work exists on learning from programming languages, we build on approaches developed for natural language processing (NLP) [22]. We leverage feature-extraction approaches similar to those used for sentence sentiment classification with convolutional neural networks (CNNs) and recurrent neural networks (RNNs) for function-level source vulnerability classification.

1) Embedding: The tokens making up the lexed functions are first embedded into a fixed k-dimensional representation (limited to range [-1, 1]) that is learned during classification training via backpropagation to a linear transformation of a one-hot embedding. Several unsupervised word2vec approaches [23] trained on a much larger unlabeled dataset were explored for seeding this embedding, but these yielded minimal improvement in classification performance over randomly-initialized learned embeddings. A fixed one-hot embedding was also tried, but gave diminished results. As our vocabulary size is much smaller than those of natural languages, we were able to use a much smaller embedding than is typical in NLP applications. Our experiments found that k = 13 performed the best for supervised embedding sizes, balancing the expressiveness of the embedding against overfitting. We found that adding a small amount of random Gaussian noise N(μ = 0, σ² = 0.01) to each embedded representation substantially improved resistance to overfitting and was much more effective than other regularization techniques such as weight decay.
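A minimal PyTorch sketch of this embedding step, as we read the description above (not the authors' code): a learned k = 13 embedding with additive Gaussian noise at training time. Using tanh to keep values in [-1, 1] is our assumption; the paper only states that the embedding is limited to that range.

    # Sketch of the token embedding with train-time Gaussian noise (illustrative).
    import torch
    import torch.nn as nn

    class NoisyTokenEmbedding(nn.Module):
        def __init__(self, vocab_size: int = 156, k: int = 13, noise_std: float = 0.1):
            super().__init__()
            # Equivalent to a learned linear transformation of a one-hot encoding.
            self.embed = nn.Embedding(vocab_size, k)
            self.noise_std = noise_std  # sigma^2 = 0.01 -> sigma = 0.1

        def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
            # token_ids: (batch, sequence_length) integer tensor
            x = torch.tanh(self.embed(token_ids))             # keep values in [-1, 1] (assumption)
            if self.training:
                x = x + self.noise_std * torch.randn_like(x)  # additive N(0, 0.01) noise
            return x                                          # (batch, sequence_length, k)

    # embedding = NoisyTokenEmbedding()
    # vectors = embedding(torch.randint(0, 156, (32, 500)))   # -> shape (32, 500, 13)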
2) Feature extraction: We explored both CNNs and RNNs for feature extraction from the embedded source representations.
Fig. 1: Illustration of our convolutional neural representation-learning approach to source code classification. Input source code is lexed into a token sequence of variable length ℓ, embedded into an ℓ × k representation, filtered by n convolutions of size m × k, and maxpooled along the sequence length to a feature vector of fixed size n. The embedding and convolutional filters are learned by weighted cross entropy loss from fully-connected classification layers. The learned n-dimensional feature vector is used as input to a random forest classifier, which improves performance compared to the neural network classifier alone.
Convolutional feature extraction: We use n convolutional filters with shape m × k, so each filter spans the full space of the token embedding. The filter size m determines the number of sequential tokens that are considered together, and we found that a fairly large filter size of m = 9 worked best. A total of n = 512 filters, paired with batch normalization followed by ReLU, was most effective. Recurrent feature extraction: We also explored using recurrent neural networks for feature extraction to allow longer token dependencies to be captured. The embedded representation is fed to a multi-layer RNN, and the output at each step of the length-ℓ sequence is concatenated. We used two-layer Gated Recurrent Unit RNNs with hidden state size n′ = 256, though Long Short-Term Memory RNNs performed equally well.

3) Pooling: As the length of C/C++ functions found in the wild can vary dramatically, both the convolutional and recurrent features are maxpooled along the sequence length in order to generate a fixed-size (n or n′, respectively) representation. In this architecture, the feature extraction layers should learn to identify different signals of vulnerability, and thus the presence of any of these along the sequence is important.
4) Dense layers: The feature extraction layers are followed by a fully-connected classifier. During training, 50% dropout was applied to the connections between the maxpooled feature representation and the first hidden layer. We found that using two hidden layers of 64 and 16 before the final softmax output layer gave the best classification performance.
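Putting the embedding, convolutional feature extraction, pooling, and dense layers together, one plausible PyTorch rendering of the convolutional variant looks like the sketch below (n = 512 filters of size m = 9, batch normalization and ReLU, maxpooling over the sequence, hidden layers of 64 and 16, and 50% dropout). Details the paper does not specify, such as padding, are our assumptions, and the train-time embedding noise from the earlier sketch is omitted for brevity.

    # Illustrative CNN architecture matching the description in the text
    # (our sketch; unspecified layer details are assumptions).
    import torch
    import torch.nn as nn

    class ConvVulnDetector(nn.Module):
        def __init__(self, vocab_size=156, k=13, n_filters=512, m=9, n_classes=2):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, k)
            # Each filter spans the full embedding width k, so a 1-D convolution over
            # the token sequence with kernel size m is equivalent to an m x k filter.
            self.conv = nn.Conv1d(in_channels=k, out_channels=n_filters, kernel_size=m)
            self.bn = nn.BatchNorm1d(n_filters)
            self.relu = nn.ReLU()
            self.dropout = nn.Dropout(p=0.5)        # 50% dropout after maxpooling
            self.classifier = nn.Sequential(
                nn.Linear(n_filters, 64), nn.ReLU(),
                nn.Linear(64, 16), nn.ReLU(),
                nn.Linear(16, n_classes),           # softmax is applied inside the loss
            )

        def features(self, token_ids):
            # token_ids: (batch, sequence_length)
            x = self.embed(token_ids).transpose(1, 2)   # (batch, k, seq_len)
            x = self.relu(self.bn(self.conv(x)))        # (batch, n_filters, seq_len - m + 1)
            return torch.max(x, dim=2).values           # maxpool over sequence -> (batch, n_filters)

        def forward(self, token_ids):
            return self.classifier(self.dropout(self.features(token_ids)))

    # model = ConvVulnDetector()
    # logits = model(torch.randint(0, 156, (128, 500)))  # -> shape (128, 2)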
5) Training: For data batching convenience, we trained only on functions with token length 10 ≤ ℓ ≤ 500, padded to the maximum length of 500. Both the convolutional and recurrent networks were trained with batch size 128, Adam optimization (with learning rates 5 × 10⁻⁴ and 1 × 10⁻⁴, respectively), and a cross entropy loss. Since the dataset was strongly unbalanced, vulnerable functions were weighted more heavily in the loss function. This weight is one of the many hyper-parameters we tuned to get the best performance. We used an 80:10:10 split of our combined SATE IV, Debian, and GitHub dataset to train, validate, and test our models. We tuned and selected models based on the highest validation Matthews Correlation Coefficient (MCC).
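The training recipe in this paragraph (class-weighted cross entropy, Adam, batch size 128) can be sketched as follows; the positive-class weight was a tuned hyper-parameter that the paper does not report, so the value shown is a placeholder.

    # Illustrative training configuration for the convolutional model sketched above.
    import torch
    import torch.nn as nn

    model = ConvVulnDetector()                      # from the sketch above
    # Weight the rare 'vulnerable' class more heavily; the actual weight was tuned.
    class_weights = torch.tensor([1.0, 10.0])       # [not vulnerable, vulnerable] (placeholder)
    criterion = nn.CrossEntropyLoss(weight=class_weights)
    optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)  # 1e-4 for the RNN variant

    def train_epoch(loader):
        model.train()
        for token_ids, labels in loader:            # batches of 128 functions, padded to length 500
            optimizer.zero_grad()
            loss = criterion(model(token_ids), labels)
            loss.backward()
            optimizer.step()

    # Model selection tracked the Matthews Correlation Coefficient on the validation
    # split, e.g. sklearn.metrics.matthews_corrcoef(val_labels, val_predictions).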
B. Ensemble learning on neural representations

While the neural network approaches automatically build their own features, their classification performance on our full dataset was suboptimal. We found that using the neural features (outputs from the sequence-maxpooled convolution layer in the CNN and sequence-maxpooled output states in the RNN) as inputs to a powerful ensemble classifier such as random forest or extremely randomized trees yielded the best results on our full dataset. Having the features and classifier optimized separately seemed to help resist overfitting. This approach also makes it more convenient to quickly retrain a classifier on new sets of features or combinations of features.
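A minimal sketch of this two-stage approach, reusing the features() method from the CNN sketch above and scikit-learn for the ensemble stage. The forest hyper-parameters and the data loaders are assumptions; the paper does not specify them.

    # Train a random forest on the fixed-size neural features (sketch).
    import numpy as np
    import torch
    from sklearn.ensemble import RandomForestClassifier

    def extract_features(model, loader):
        model.eval()
        feats, labels = [], []
        with torch.no_grad():
            for token_ids, y in loader:
                feats.append(model.features(token_ids).cpu().numpy())  # (batch, 512)
                labels.append(y.numpy())
        return np.concatenate(feats), np.concatenate(labels)

    X_train, y_train = extract_features(model, train_loader)   # loaders assumed defined
    forest = RandomForestClassifier(n_estimators=300, class_weight="balanced")
    forest.fit(X_train, y_train)

    X_test, y_test = extract_features(model, test_loader)
    vuln_probability = forest.predict_proba(X_test)[:, 1]      # tunable-threshold scores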
V. RESULTS

To provide a strong benchmark, we trained an RF classifier on a "bag-of-words" (BOW) representation of function source code, which ignores the order of the tokens. An examination of the bag-of-words feature importances shows that the classifier exploits label correlations with (1) indicators of the source length and complexity and (2) combinations of calls which are commonly misused and lead to vulnerabilities (such as memcpy and malloc). Improvements over this baseline can be interpreted as being due to more complex and specific vulnerability indication patterns.
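For comparison, a baseline of this kind can be sketched as a token-count matrix fed to a random forest. The exact BOW construction is not given in the paper, so the vocabulary and train_functions structures below are our assumptions.

    # Bag-of-words baseline: count each of the 156 lexer tokens per function,
    # ignoring token order, and classify the count vectors with a random forest.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def bow_vector(tokens, vocabulary):
        counts = np.zeros(len(vocabulary), dtype=np.float32)
        for tok in tokens:
            counts[vocabulary[tok]] += 1
        return counts

    # vocabulary: dict mapping each of the 156 lexer tokens to a column index
    X_bow = np.stack([bow_vector(fn["tokens"], vocabulary) for fn in train_functions])
    y = np.array([fn["label"] for fn in train_functions])

    bow_forest = RandomForestClassifier(n_estimators=300, class_weight="balanced")
    bow_forest.fit(X_bow, y)
    print(bow_forest.feature_importances_)   # which tokens drive the predictions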
Fig. 2: Precision versus recall of different ML approaches using our lexer representation on Debian and GitHub test data. Vulnerable functions make up 6.5% of the test data.

Fig. 3: SATE IV test data ROC, with true vulnerability labels, compared to the three static analyzers we considered. Vulnerable functions make up 43% of the test data.
Fig. 4: Performance of a multi-label CNN + RF classifier on Debian and GitHub data by vulnerability type (see Table II).

TABLE III: Results on the Debian and GitHub test data for our ML models, corresponding to Figure 2.

TABLE IV: Results on the SATE IV Juliet Suite test data for our ML models and three static analyzers, as in Figure 3.

             PR AUC   ROC AUC   MCC     F1
Clang        –        –         0.227   0.450
Flawfinder   –        –         0.079   0.365
Cppcheck     –        –         0.060   0.050
BOW + RF     0.890    0.913     0.607   0.786
RNN          0.900    0.923     0.646   0.807
CNN          0.944    0.954     0.698   0.840
RNN + RF     0.914    0.934     0.657   0.813
CNN + RF     0.916    0.936     0.672   0.824
Overall, our CNN models performed better than the RNN models as both standalone classifiers and feature generators. In addition, the CNNs were faster to train and required far fewer parameters. On our natural function dataset, the RF classifier trained on neural feature representations performed better than the standalone network for both the CNN and RNN features. Likewise, the RF classifiers trained on neural network representations performed better than the benchmark BOW classifier.

Figure 2 shows the precision-recall performance of the best versions of all of the primary ML approaches on our natural function test dataset. The area under the precision-recall curve (PR AUC) and receiver operating characteristic (ROC AUC), as well as the MCC and F1 score at the validation-optimal thresholds, are shown in Table III. Figure 4 shows the performance of our strongest classifier when trained to detect specific vulnerability types from a shared feature representation. Some CWE types are significantly more challenging than others.

We compare our ML models against our collection of SA tools on the SATE IV Juliet Suite dataset, which has true vulnerability labels. Figure 3 shows the performance of our models alongside the SA findings on this nearly label-balanced dataset. We find that our models, especially the CNN, perform much better on the SATE IV test data than on the natural functions from Debian and GitHub, likely because SATE IV has many examples for each vulnerability it contains and has fairly consistent style and structure. Among the SA tools, Clang performs the best on the SATE IV data, but still finds very few vulnerabilities compared with all of the ML methods. The full SATE IV results are shown in Table IV.
Our ML methods have some additional advantages over traditional static analysis tools. Our custom lexer and ML models can rapidly digest and score large repositories and source code without requiring that the code be compiled. Additionally, since the ML methods all output probabilities, the thresholds can be tuned to achieve the desired precision and recall. The static analyzers, on the other hand, return a fixed number of findings, which may be overwhelmingly large for huge codebases or too small for critical applications. While static analyzers are able to better localize the vulnerabilities they find, we can use visualization techniques, such as the feature activation map shown in Figure 5, to help understand why our algorithms make their decisions.

Fig. 5: Screenshot from our interactive vulnerability detection demo. The convolutional feature activation map [24] for a detected vulnerability is overlaid in red on the original code.
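The threshold tuning mentioned above can be illustrated with scikit-learn: given each function's predicted vulnerability probability, choose the operating point that meets a target precision. This is our sketch, not part of the paper's tooling.

    # Choose a decision threshold that achieves a target precision (sketch).
    import numpy as np
    from sklearn.metrics import precision_recall_curve

    def threshold_for_precision(y_true, scores, target_precision=0.9):
        precision, recall, thresholds = precision_recall_curve(y_true, scores)
        # precision/recall have one more entry than thresholds; ignore the final point.
        ok = np.where(precision[:-1] >= target_precision)[0]
        if len(ok) == 0:
            return None, None      # target precision not reachable on this data
        i = ok[0]                  # lowest qualifying threshold keeps recall highest
        return thresholds[i], recall[i]

    # threshold, achievable_recall = threshold_for_precision(y_test, vuln_probability)
    # flagged = vuln_probability >= threshold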
VI. CONCLUSIONS

We have demonstrated the potential of using ML to detect software vulnerabilities directly from source code. To do this, we built an extensive C/C++ source code dataset mined from Debian and GitHub repositories, labeled it with curated vulnerability findings from a suite of static analysis tools, and combined it with the SATE IV dataset. We created a custom C/C++ lexer to produce a simple, generic representation of function source code ideal for ML training. We applied a variety of ML techniques inspired by classification problems in the natural language domain, fine-tuned them for our application, and achieved the best overall results using features learned via convolutional neural network and classified with an ensemble tree algorithm.

Future work should focus on improved labels, such as those from dynamic analysis tools or mined from security patches. This would allow scores produced from the ML models to be more complementary with static analysis tools. The ML techniques developed in this work for learning directly on function source code can also be applied to any code classification problem, such as detecting style violations, commit categorization, or algorithm/task classification. As larger and better-labeled datasets are developed, deep learning for source code analysis will become more practical for a wider variety of important problems.

ACKNOWLEDGMENT

The authors thank Hugh J. Enxing and Thomas Jost for their efforts creating the data ingestion pipeline. This project was sponsored by the Air Force Research Laboratory (AFRL) as part of the DARPA MUSE program.

REFERENCES

[1] MITRE, Common Weakness Enumeration. https://cwe.mitre.org/data/index.html.
[2] T. D. LaToza, G. Venolia, and R. DeLine, "Maintaining mental models: A study of developer work habits," in Proc. 28th Int. Conf. Software Engineering, ICSE '06, (New York, NY, USA), pp. 492-501, ACM, 2006.
[3] D. Yadron, "After heartbleed bug, a race to plug internet hole," Wall Street Journal, vol. 9, 2014.
[4] C. Foxx, "Cyber-attack: Europol says it was unprecedented in scale." https://www.bbc.com/news/world-europe-39907965, 2017.
[5] C. Arnold, "After Equifax hack, calls for big changes in credit reporting industry." http://www.npr.org/2017/10/18/558570686/after-equifax-hack-calls-for-big-changes-in-credit-reporting-industry, 2017.
[6] NIST, Juliet test suite v1.3, 2017. https://samate.nist.gov/SRD/testsuite.php.
[7] Z. Xu, T. Kremenek, and J. Zhang, "A memory model for static analysis of C programs," in Proc. 4th Int. Conf. Leveraging Applications of Formal Methods, Verification, and Validation, pp. 535-548, 2010.
[8] J. C. King, "Symbolic execution and program testing," Commun. ACM, vol. 19, pp. 385-394, July 1976.
[9] M. Allamanis, E. T. Barr, P. T. Devanbu, and C. A. Sutton, "A survey of machine learning for Big Code and naturalness," CoRR, vol. abs/1709.06182, 2017.
[10] A. Hovsepyan, R. Scandariato, W. Joosen, and J. Walden, "Software vulnerability prediction using text analysis techniques," in Proc. 4th Int. Workshop Security Measurements and Metrics, MetriSec '12, pp. 7-10, 2012.
[11] Y. Pang, X. Xue, and A. S. Namin, "Predicting vulnerable software components through n-gram analysis and statistical feature selection," in 2015 IEEE 14th Int. Conf. Machine Learning and Applications (ICMLA), 2015.
[12] L. Mou, G. Li, Z. Jin, L. Zhang, and T. Wang, "TBCNN: A tree-based convolutional neural network for programming language processing," CoRR, 2014.
[13] Z. Li et al., "VulDeePecker: A deep learning-based system for vulnerability detection," CoRR, vol. abs/1801.01681, 2018.
[14] J. Harer et al., "Learning to repair software vulnerabilities with generative adversarial networks," arXiv preprint arXiv:1805.07475, 2018.
[15] Debian, Debian - the universal operating system. https://www.debian.org/.
[16] GitHub. https://github.com/.
[17] C. Le Goues et al., "The ManyBugs and IntroClass benchmarks for automated repair of C programs," IEEE Transactions on Software Engineering (TSE), vol. 41, pp. 1236-1256, December 2015. http://dx.doi.org/10.1109/TSE.2015.2454513.
[18] Travis CI. https://travis-ci.org/.
[19] Y. Zhou and A. Sharma, "Automated identification of security issues from commit messages and bug reports," in Proc. 2017 11th Joint Meeting on Foundations of Software Engineering, pp. 914-919, 2017.
[20] Cppcheck. http://cppcheck.sourceforge.net/.
[21] D. A. Wheeler, Flawfinder. https://www.dwheeler.com/flawfinder/.
[22] Y. Kim, "Convolutional neural networks for sentence classification," in Proc. 2014 Conf. Empirical Methods in Natural Language Processing (EMNLP), (Doha, Qatar), pp. 1746-1751, Association for Computational Linguistics, October 2014.
[23] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Advances in Neural Information Processing Systems, pp. 3111-3119, 2013.
[24] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, "Learning deep features for discriminative localization," in Computer Vision and Pattern Recognition (CVPR), 2016 IEEE Conference on, pp. 2921-2929, IEEE, 2016.