
IEEE SYSTEMS, MAN AND CYBERNETICS SOCIETY SECTION

Received 8 January 2024, accepted 30 January 2024, date of publication 8 February 2024, date of current version 16 February 2024.
Digital Object Identifier 10.1109/ACCESS.2024.3364351

An Integrated Smart Contract Vulnerability


Detection Tool Using Multi-Layer
Perceptron on Real-Time
Solidity Smart Contracts
LEE SONG HAW COLIN1 , (Member, IEEE), PURNIMA MURALI MOHAN 1 , (Member, IEEE),
JONATHAN PAN2 , (Member, IEEE), AND PETER LOH KOK KEONG 1 , (Senior Member, IEEE)
1 Infocomm Technology Cluster, Singapore Institute of Technology, Singapore 138683
2 Disruptive Technologies Office, Home Team Science and Technology Agency, Singapore 138507
Corresponding author: Purnima Murali Mohan ([email protected])
This work was supported by the Singapore Ministry of Education (MoE) Grant, Singapore Institute of Technology (SIT).

ABSTRACT Smart contract vulnerabilities have led to substantial disruptions, ranging from the DAO attack
to the recent Poolz Finance hack. Although smart contract vulnerability definitions initially lacked
standardization, even with the advancements in Solidity, the potential for deploying malicious contracts
to exploit legitimate ones persists. The abstract syntax tree (AST), opcodes, and control flow graph (CFG)
are the intermediate representations for Solidity contracts. In this paper, we propose an integrated and
efficient smart contract vulnerability detection algorithm based on the Multi-layer perceptron (MLP). We use
feature vectors from the opcodes and CFG for the machine learning (ML) model training. The existing
ML-based approaches for analyzing smart contract code are constrained by the vulnerability detection space,
significantly varying Solidity versions, and the lack of a unified approach to verify against the ground truth.
The primary contributions of this paper are 1) a standardized pre-processing method for smart contract
training data, 2) introducing bugs to create a balanced dataset of flawed files across Solidity versions using
the AST, and 3) standardizing vulnerability identification using the Smart Contract Weakness Classification
(SWC) registry. The ML models employed for benchmarking the proposed MLP and a multi-input model
combining MLP and Long short-term memory (LSTM) in our study are Random forest (RF), XGBoost
(XGB), and Support vector machine (SVM). The performance evaluation on real-time smart contracts
deployed on the Ethereum Blockchain shows an accuracy of up to 91% using MLP with the lowest average
False Positive Rate (FPR) among all tools and models, measuring 0.0125.

INDEX TERMS Blockchain, ethereum, machine learning, multi-layer perceptron, real-time smart contracts,
solidity smart contracts, vulnerability analysis and detection, code analysis, software testing.

I. INTRODUCTION

In 2022 the amount of funds controlled by smart contracts was worth around USD 1750 million, and it is set to reach USD 9850 million by 2030 [1]. The growing popularity of smart contracts is due to their autonomy, trust, and secure environment, and this has given birth to what we know as Decentralised applications (Dapps) today. Dapps are applications that have their logic written in smart contracts and deployed to the blockchain. Dapps cover a wide range of use cases, from logistics to finance, enabling day-to-day monitoring of usage. The transparency and traceability characteristics of a blockchain attract not only private companies but also government agencies. For example, a permissioned or private blockchain can be used to manage the flow of currency [2]. Blockchain also finds application

The associate editor coordinating the review of this manuscript and approving it for publication was Huiyan Zhang.

2024 The Authors. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
VOLUME 12, 2024 For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/ 23549
L. S. H. Colin et al.: Integrated Smart Contract Vulnerability Detection Tool Using MLP

in sharing data between heterogeneous devices using smart contracts. A framework by [3] suggests using smart contracts for recording data in the cloud, for data security, and for accountability of Internet of Things (IoT) devices. Another use case for IoT smart contracts is unmanned aerial vehicles (UAVs) [4], which transform a centralized trusted authority into a secure decentralized network. However, these works use smart contracts as authentication mechanisms for IoT and assume the inherent security of smart contracts.

Smart contracts have not been immune to hacks since early 2016, when major hacks started to surface and gain media attention. These hacks are not only harmful to the industry but also cast a large shadow over the longevity of blockchain applications. In 2022, approximately USD 1.9 billion was lost to various hacks exploiting vulnerabilities in smart contract logic and manipulating human errors in smart contracts [5]. Specifically, Poolz Finance suffered a major arithmetic overflow hack, where one of its methods for pool creation contains a manual summing of token count, which resulted in losing USD 6,650,000 [6], [7].

There have been attempts to prevent such hacks by creating contract standards and defining vulnerabilities. OpenZeppelin [8], Consensys [9], and Trail of Bits [10] are some of the front runners on the quest to support developers and overcome potential security vulnerabilities.

In terms of standardization, smart contract vulnerabilities have been loosely defined since their emergence. There have been different flavors, namely the Smart Contract Weakness Classification (SWC) registry [11], DASP [12], and Crytic's static analyzer, i.e., the Slither detector documentation [13]. These definitions cover most of the notable vulnerabilities; however, there is an existing gap in terms of compliance with any security standard and coverage of the entire smart contract vulnerability space, constrained by the varying Solidity versions and ever-emerging smart contract logic bugs. These vulnerabilities, often resulting from human errors and version changes, underscore the need for more robust and reliable security measures in smart contracts. While such vulnerabilities can be detected by software verification and validation tools using static [13], [14], dynamic [15], and formal verification [14], [16], each method has its own limitations. In the work done by [17], the authors conducted an extensive test on software verification and validation tools and concluded that there is no single analysis tool that can detect all smart contract vulnerabilities. Not only can new vulnerabilities go undetected if they are not predefined within the tool, but existing vulnerability detection also had significant false positive rates. These findings therefore advocate the use of a Machine Learning (ML) approach that can be dynamically trained on newer smart contract bugs and Solidity versions while reducing the false positive rate.

For any ML model, the key factors that decide its reliability are (i) the origin of the dataset used for training that model, (ii) the quality and correctness of the training dataset, (iii) the scalability and run-time of the model, and (iv) standardized verification and validation of the model. Recent works that have claimed to have sourced data from reliable sources [18], [19], [20] are still based on the validation of software verification and validation tools such as Mythril [14], Slither [13], and Oyente [16] to determine the ground truth information about the smart contracts used for training. Firstly, the main concern with such reliance on third-party software verification tools for the training dataset is that it is prone to the inaccuracies inherent to these tools. Secondly, to introduce vulnerabilities into the training dataset, a synthetic data generation method using the Synthetic Minority Oversampling Technique (SMOTE) is being used [19], [20]. However, synthetic data can never be a true representation of a truly vulnerable dataset. Not only does it not keep up with the significantly varying Solidity versions, but it also leads to an imbalance in the training dataset due to the way it is implemented. Thirdly, some existing works simply skip the pre-processing steps (which ensure the quality and correctness of the training dataset) [18], which makes the solution practically not useful, especially when the training dataset, spanning different Solidity versions, contains comments with code-like syntax. This becomes a prime concern since the solc compiler cannot differentiate code-like commented syntax, which is crucial for the bug-insertion algorithm. Also, the existing bug-injection tools [21] suffer from practical issues: no Solidity version check, no syntax check per Solidity version, no exception handling, and no control over the bug injection logic (i.e., the existing tools dump all bugs into a single smart contract). Motivated by these practical concerns, our proposed solution injects known vulnerability patterns into clean contracts, ensuring a clear indication of a vulnerability bug type in each contract, while developing a standardized pre-processing method to generate a balanced, good-quality training dataset and validating against well-defined vulnerability standards.

In this paper, we introduce an ML approach that uses a runtime opcode extraction algorithm for feature extraction and a trigram-based method for vectorization. We reference the SWC registry [11] for smart contract vulnerability categorization, as it has been one of the most well-defined mappings while loosely coupled with the Common Weakness Enumeration (CWE) [22]. To test the efficacy of our solutions and perform benchmarking, we employ Mythril, Slither, and an integrated tool known as MythSlith. We have developed MythSlith, a tool that integrates Mythril and Slither to increase the coverage of the smart contract vulnerability detection space. These tools are compared against the ML model trained with the bug-injected dataset. Our contributions in this paper are summarized as follows:
• We developed a standardized pre-processing algorithm for cleaning smart contract training data to address


the limitation of the solc compiler not being able to differentiate code from commented code-like syntax.
• We developed a practical bug injection algorithm to create a balanced dataset across Solidity versions using the Abstract syntax tree on verified smart contracts that were cleaned using the proposed pre-processing algorithm.
• We model an MLP framework and a multi-input model based on MLP and LSTM for smart contract vulnerability detection. The proposed framework scales up the smart contract vulnerability space by utilizing opcodes and the Control Flow Graph (CFG) extracted during model training. Before vectorization, a simplification method is applied to the opcodes to decrease dimensionality (reducing the running time) and eliminate contract-specific hexadecimal values.
• We thoroughly analyze the time complexity of the proposed algorithms and the running time for bug detection.
• For MLP framework performance benchmarking, we integrate the well-known software verification tools Mythril and Slither to develop an experimental tool known as MythSlith, which has a larger smart contract vulnerability detection space than the individual tools. The machine learning models show superior performance in vulnerability detection over existing software verification tools, with a false positive rate as low as 0.0015.
• We verify the results against the standardized vulnerability identification, the Smart Contract Weakness Classification (SWC) registry, as a common analysis platform.

The rest of the paper is organized as follows: Section II provides a background of smart contract intermediate representations, smart contract vulnerabilities, and the types of software validation techniques used. Section III presents a literature review of recent works in the smart contract vulnerability detection space. In Section IV, we illustrate the proposed methodology and framework, from feature extraction to model training algorithms. Section V reports the experimental findings and inferences made on the efficiency of the proposed methods. Section VI discusses the findings, some possible alternatives, and future development before we conclude our work in Section VII.

II. BACKGROUND
In this section, we provide an overview of the vulnerabilities and tools that will be used in the proposed algorithm for smart contract vulnerability detection.

A. SMART CONTRACTS
Smart contracts are programs written to be executed on the blockchain. They are designed to be autonomous and self-sufficient, and are expected to perform their written or agreed-upon task in code without interference. They are considered to be the key to a decentralized system. However, being in the eye of the public blockchain, any vulnerability could lead to a huge amount of financial loss.

FIGURE 1. Reentrancy snippet.

FIGURE 2. Arithmetic snippet.

Smart contracts can also be represented in different forms with the help of the Solidity compiler. Some of these forms are opcodes, the abstract syntax tree (AST), and the Control Flow Graph (CFG).

Opcodes, also known as operation codes, are instructions for the Ethereum virtual machine (EVM) to execute any sequential and conditional actions. The complete list of opcodes with descriptions can be found in Ethereum's yellow paper [23].

Abstract Syntax Tree is a hierarchical tree representation of the syntactic structure of the source code. Each section is represented as nodes, which capture the details of the real syntax. Such a representation is commonly used for syntax checking, semantic analysis, code generation, and code optimization.

Control Flow Graph is the representation of program flow from the context of the stack. It is derived from the opcodes, where the opcode sequence is broken down into basic blocks by flow conditions such as JUMP, JUMPI, REVERT, etc.

B. TYPES OF BUGS
Smart contract vulnerabilities are often caused by oversights during the programming stage; these bugs may seem harmless when written but can potentially cause huge financial loss. The 7 vulnerabilities below are chosen as they are commonly found in other studies and have become the foundation for vulnerabilities in both empirical and ML-based approaches [19], [21], [24], [25]. The availability of predefined bug snippets in [21] also helps cut down the amount of development time.

Reentrancy was first uncovered in 2016 when a large sum of money was stolen in the DAO contract (Fig 1). This was primarily caused by the action of sending cryptocurrency to an external account and updating the balance only after it was sent.
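The update-after-send flaw can be illustrated with a minimal toy model. The sketch below is in Python rather than Solidity and is not the paper's code; the Bank class, the callback argument, and the recursion depth cap are illustrative assumptions standing in for a contract's balance mapping, the external call, and an attacking contract's fallback function.

```python
# Toy model of reentrancy: the "bank" pays out before updating the
# balance, and the recipient's callback (Solidity's fallback function)
# re-enters withdraw() while the stale balance is still recorded.
class Bank:
    def __init__(self):
        self.balances = {"attacker": 1}
        self.paid_out = 0

    def withdraw(self, who, callback):
        if self.balances[who] > 0:
            self.paid_out += self.balances[who]   # send first...
            callback()                            # external call re-enters
            self.balances[who] = 0                # ...update state last

bank = Bank()
depth = 0

def fallback():
    # Re-enter a bounded number of times, as an attacker's fallback would.
    global depth
    depth += 1
    if depth < 3:
        bank.withdraw("attacker", fallback)

bank.withdraw("attacker", fallback)
print(bank.paid_out)  # 3: a single deposit is drained three times
```

Because the balance is zeroed only after the external call returns, every re-entrant call observes the stale balance, which is precisely the behavior that ordering state updates before external calls prevents.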


FIGURE 3. Unauthorized send snippet.

FIGURE 4. Tx origin snippet.

FIGURE 5. Timestamp dependency snippet.

FIGURE 6. Transaction order dependency.

FIGURE 7. Unhandled exceptions.

This sequence of events might seem normal for traditional software code; however, in the case of Solidity, a fallback function of the external account can re-trigger the same sending function again before an update can happen within the original contract. This could result in an indirect recursive function call.

Arithmetic vulnerabilities are more commonly known as Overflow-Underflow (Fig 2). Overflow occurs when an operation tries to add to a variable that is already at its maximum possible value. Without any sort of guard, this variable will overflow and wrap back to 0 or the minimum possible value. As opposed to Overflow, Underflow happens when an operation tries to subtract from a variable that is at the minimum possible value, resulting in the value jumping to the maximum possible value. This vulnerability can lead to severe security issues, as hackers can use this behavior to alter account balances or change ownership of a contract.

Unauthorized send arises when there is no access control for a function that requires an access check (Fig 3). Such a function may contain withdrawal or reward disbursement functionality.

Transaction origin, also known as tx.origin, arises from the misuse of the global variable 'tx.origin' of Solidity (Fig 4). In all transactions there is an origin and a sender. Origin is the address that started the chain of calls, while the sender is the address that initiated the current call. The tx.origin vulnerability occurs when a contract uses 'tx.origin' to authenticate a user rather than 'msg.sender', which refers to the current immediate caller. This is particularly dangerous when a transferOwnership function uses 'tx.origin' for authentication.

Timestamp dependency arises when a contract uses block variables such as block hash, timestamp, number, difficulty, gaslimit, and coinbase to perform critical operations (Fig 5). Such operations include the generation of random numbers and time-critical applications such as auctions. This is particularly dangerous because miners get to choose the block's timestamp.

Transaction Order Dependency is a type of vulnerability where the sequence of calling transactions can impact the final outcome of the application (Fig 6). It arises when a state is altered based on the order of incoming transactions, which can lead to unintended exploitation.

Unhandled Exceptions occur when checks on send, transfer, or call are not done (Fig 7). This is important because calls can fail, and the intended changes will be reverted. Some of the reasons for failure are out-of-gas exceptions and wrong arithmetic operations such as zero division errors.

III. LITERATURE REVIEW
In this section, we discuss currently available software verification and validation tools for smart contract detection, as well as works that use a machine learning approach. Our review covers the data sampling techniques, feature extraction methods, and smart contract vulnerability standards. An overview of the comparison can be found in Table 1.

A. SOFTWARE VERIFICATION AND VALIDATION TOOLS
In recent years, a number of analysis tools have been introduced, and they can be categorized into 3 different types: Static, Dynamic, and Formal verification. Static tools rely on the static information of the code to derive a prediction without executing the program [27]. Some of the features extracted are the abstract syntax tree (AST), compiled bytecode, and opcodes. Dynamic tools analyze a running program; one such example is the Fuzzer [28]. Formal verification tools rely on mathematical definitions and use a solver such as Z3 to resolve the derived formula [29].

Among the software verification and validation tools for smart contracts are Slither [13], Mythril [14], and DefectChecker [30] for static verification; Manticore [15] for dynamic analysis; and Oyente [16], Mythril, and DefectChecker for formal verification. Although these tools


TABLE 1. Comparison with existing works.

have been instrumental in numerous audits, they are constrained by the predefined patterns of each bug. Should new bugs emerge, an expert update would be necessary.

B. MACHINE LEARNING BASED VULNERABILITY DETECTION
As the name suggests, the ML method is used to construct a set of decisions from the features of the data to make a logical conclusion of a trend or classification. Such a set of decisions is known as a model, and it can be built with supervised or unsupervised learning. Supervised learning is an algorithm that learns with labeled data, while unsupervised learning suggests clusters of possible outcomes, and deriving the outcome then depends largely on the feature set.

1) DATA SAMPLING
As discussed in Section I, data integrity is an integral part of ML model training, but existing works from [18], [19], and [20] did not have a cleaning or pre-processing step to ensure contracts are clean. [26] has done checks using three software validation tools, namely Slither, Oyente, and DefectChecker, to ensure contracts are properly labelled; contracts were removed if any error was present in the tools' output, and, in addition, smart contracts without version numbers were also removed. However, the labeling of data relies on software validation tools, which face the same issue as previous works, i.e., the training dataset is prone to the inaccuracies inherent to these tools.

In our approach, by contrast, bug injection ensures that the injected bugs are identified. However, bug injection from Solidifi [21] lacks post-injection error checking, leading to the creation of an erroneous dataset. Moreover, the code from [21] did not anticipate a situation where the bug count is less than the number of available injection locations. To accommodate such scenarios, a recursive function was incorporated to check the bug count against the number of available locations. In addition, we have also incorporated a validation and error-handling mechanism into the proposed bug injection algorithm.

2) FEATURE EXTRACTION
Using opcodes for feature extraction is one of the more popular methods, as it is agnostic to expert patterns and presents a clear path to identify any malicious act. The Abstract Syntax Tree (AST) is also one of the common methods, as it contains useful semantic information. Reference [19] uses simplified opcodes followed by Bigram for vectorization, while [18] uses features from the AST, and [20] and [26] use a mixture of the AST and simplified opcodes with Bigram.

While traditional models typically process data in a single common format, this does not preclude the combination of vectorized data from various features. The works of [20] and [26] have both mixed their feature embeddings and can produce a good amount of variance for classification. However, in our proposed work, aside from the classical models, we implement a multi-model approach that allows features of different shapes to collaborate effectively.

C. VULNERABILITY STANDARDS
Vulnerability standards in the smart contract field are not yet prevalent, resulting in a slew of different definitions by different organizations [11], [12]. This is not healthy for the industry and will hinder further development. It is also clear from existing works [18], [19], [20], [26] that no vulnerability standards were taken up. Consequently, in our proposed work, we have selected the SWC Registry [11] as the vulnerability standard to provide a clear definition.

In the proposed solution, opcodes will be used alongside CFG features. Trigram is used instead, as it captures more context than Bigram or unigram. Rather than using the AST, the CFG is chosen because it contains flow information which the AST does not. Simplification of opcodes is done differently from previous works: rather than replacing the entire set of PUSH, DUP, and SWAP opcodes with a constant, we leave the first five numbers untouched, allowing more variance to be captured. In addition, we have also removed all hexadecimal values to prevent any contract-specific values from being learned by the models.

This review has highlighted the need for expert knowledge to define new bugs for current software verification and validation tools, significant variations in data sampling and feature extraction methods, and the absence of vulnerability standards. While each work does give clear definitions of its selected vulnerabilities, there is no compliance with a standard. Data sampling in [26] implemented a robust method using validation tools for labeling; others have not, raising concerns about data integrity. Opcode is a popular representation for feature extraction, and mixing of features is a popular method, as seen in the works of [20] and [26], but a multi-model approach


FIGURE 8. Pre-processing Flow Diagram for the ML model training.

TABLE 2. Variables used in the bug injection framework.

has yet to be attempted. The disarray in vulnerability standards indicates the need for a single, universally adopted standard to streamline the development process. The insights gained from this review provide a good foundation for developing a robust and reliable smart contract vulnerability detection method.

The research gap is summarized in Table 1, highlighting the novelty of the paper. In this paper, we perform standardized pre-processing steps by introducing both erroneous Solidity version exclusion and the removal of code-like comments, which could otherwise generate an incorrect AST. For data source labeling, we use bug injection in our work, as it provides a reliable ground truth. This was not done in the existing works referenced in Table 1. While the work in [21] uses bug injection, it does not validate the bug injection nor include any pre-processing steps. The most recent work that performs pre-processing by removing code-like comments is found in [26]. However, it does not verify incorrect Solidity versions (during pre-processing) while relying on third-party tools (such as Mythril, Slither, Oyente, DefectChecker, etc.) to label their datasets.

IV. METHODOLOGY
The approach to this research is detailed in this section in the following sequence: IV-A Preparation of dataset, IV-B Feature extraction, IV-D Machine Learning Models for Classification, IV-E Multi-Model Approach, IV-F Design of MythSlith, IV-G Challenges. A flow diagram of the end-to-end pre-processing steps that lead to the ML model training is illustrated in Figure 8.

A. PREPARE DATASET FOR ML TRAINING
1) DATASET
For any ML model to be efficient and usable, generating an error-free and practical dataset is important to achieve good accuracy and a low false positive rate. As a first step, the clean dataset was initially sourced from the smart contract sanctuary [31], an open-source repository, and we validated the ground truth by compiling each Solidity file using the Solidity compiler (solc) to ensure no errors were present before any bug injection was performed. The version of the compiler used for each file is determined with a JSON file provided by [31], which records the specific version used when the smart contract was actually deployed to the mainnet.

The second step is to take the validated clean dataset and inject it with bugs defined by Solidifi [21], an automated bug injection tool that checks for potential locations using the AST tree to inject a set of predefined bugs. However, Solidifi cannot be directly used to inject bugs since it suffers from practical drawbacks: there is (i) no Solidity version check, (ii) no syntax check, (iii) no exception handling for bugs, and, more importantly, (iv) it injects all the predefined bugs into a single Solidity smart contract, which is not a practical scenario for training the ML model. We hence propose two pre-processing algorithms and a bug injection algorithm below to mitigate the drawbacks of Solidifi.

2) BUG INJECTION
The goal of employing the bug injection technique is to imitate the introduction of bugs by developers [21] in the smart contract logic. The number of bug snippets is determined by a predefined bug density. The bug density is defined as the number of vulnerable lines of code per clean smart contract. Refer to Table 2 for the variables used in the bug injection algorithm. For example, for every 100 lines of code, when we insert 1 line of bug code, the bug density for that smart contract will be 1%. By setting the bug density, we are able to have an even spread of bugs within the entire dataset used for training the ML model. To be uniform across the smart contract vulnerability space, we represent each bug snippet as a function.

The process of injecting bugs has the following steps:
Step 1: Pre-process the source file using Algorithm (1)
Step 2: Obtain source attribute information by generating the abstract syntax tree (AST)


Algorithm 1 Pre-Processing Procedure Algorithm 2 Filtering Function Definition From AST


1: procedure PreProcess(file_path, output_directory) 1: procedure F(AST)
2: lines ← ReadLines(file_path) 2: Initialize LfDef ← ∅
3: new_lines ← [] 3: for n ∈ N do
4: inside_comment_block ← False 4: t ← τ (n)
5: is_tgt_with_start_end_block ← False 5: if t = ‘‘FunctionDefinition" then
6: is_inside_unique_comment_block ← False 6: LfDef ← LfDef ∪ {n}
7: for line ∈ lines do 7: end if
8: if "/*/" in line then 8: end for
9: Toggle is_inside_unique_comment_block 9: return LfDef
10: Continue 10: end procedure
11: end if
12: if is_inside_unique_comment_block then
13: Continue from the list of smart contract bugs to insert the
14: end if bug to the identified random location using the
15: if "/*" and "*/" in line and "/*/" not in line then Algorithm 3
16: Remove inline comment in line
17: end if a: PRE-PROCESSING SOLIDITY FILES
18: if "/*" in line and "//***" not in line then Cleaning Solidity files is a crucial pre-processing step, as the
19: inside_comment_block ← True solc compiler cannot distinguish between commented code
20: Remove text after "/*" in line and code-like syntax. This distinction is especially vital
21: end if in our implementation, as it impacts the injection process
22: if "*/" in line then and, consequently, influences the false positive rates in the
23: inside_comment_block ← False bug classification process. The details of the pre-processing
24: Remove text before "*/" in line algorithm are outlined in Algorithm (1).
25: end if Similar to the approach used in [21] bug locations are
26: if inside_comment_block then derived from AST nodes of the given Solidity files. The
27: if is_tgt_with_start_end_block then injection process is shown in detail in Algorithm (2) wherein
28: is_tgt_with_start_end_block ← False the fuction is filtered from the AST tree and in Algorithm (3)
29: Append line to new_lines where it takes in the file contents, number of bugs, the list of
30: end if all bugs to uniquely insert as inputs to recursively insert the
31: Continue bugs in randomly selected function location until one of the
32: end if below conditions are met.
33: if ‘‘http://’’ or ‘‘https://’’ in line then
• the function locations are exhausted
34: Append line to new_lines
• the bug density condition is satisfied
35: Continue
• all available bug snippets are exhausted
36: end if
37: if "//" in line then The above condition checks will only be done after a
38: Remove text after "//" in line compilation check using solc after every bug injection. This
39: end if ensures that the new file is free of errors, i.e., an error-free
40: Append line to new_lines dataset for training the ML model.
41: end for
42: file ← Join(new_lines) b: AST INSTANCE
43: Write file to output_directory When defining the AST instance in this section, we will
44: return file only take the attribute of interest into consideration. Let T
45: end procedure be the set of node types, including ‘FunctionDefinition’,
‘PragmaDirective’, ‘ContractDefinition’, ‘VariableDeclara-
tion’, etc. Let N be the set of nodes, where node n ∈ N
and n has a type from T . Let S be the set of all possible ‘src’
Step3: Filter the function definition within the smart attributes, representing the location of a particular node in the
contract definition that is generated by the AST code. We define the below functions to define the AST tree
using Algorithm 2 instance per solidity file.
Step4: Generate the number of bugs to be injected per • τ : N → T : Maps each node to its type (corresponding
Solidity file based on a pre-defined bug density to the ‘nodeType’ attribute).
Step5: Randomly select a function location (LfDef ) from • C : N → 2N : Maps each node to its child nodes,
the AST node and randomly choose a bug (bselected ) capturing the hierarchical structure of the code.
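The comment-stripping loop of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under the same line-by-line rules; the function name and the same-line `/* ... */` handling are ours, and the real implementation additionally tracks the `//***` target markers:

```python
def clean_solidity(source: str) -> str:
    """Strip /* ... */ block comments and // line comments from Solidity
    source, while leaving lines containing URLs untouched so that
    "https://..." is not truncated at the "//"."""
    new_lines = []
    inside = False
    for line in source.splitlines():
        if "/*" in line and "*/" in line:
            # Block comment opens and closes on the same line.
            head, _, rest = line.partition("/*")
            line = head + rest.partition("*/")[2]
        elif "/*" in line:
            inside = True
            line = line.partition("/*")[0]   # keep text before "/*"
        elif "*/" in line:
            inside = False
            line = line.partition("*/")[2]   # keep text after "*/"
        elif inside:
            continue                          # skip body of a block comment
        if "//" in line and "http://" not in line and "https://" not in line:
            line = line.partition("//")[0]    # drop the line comment
        new_lines.append(line)
    return "\n".join(new_lines)
```

Feeding a contract fragment through this function removes exactly the text the solc compiler would otherwise mis-handle during injection, while the code itself survives.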

VOLUME 12, 2024 23555


L. S. H. Colin et al.: Integrated Smart Contract Vulnerability Detection Tool Using MLP

• σ : N → S : Maps each node to its ‘src’ attribute.
The AST is then represented as a 4-tuple containing the nodes, their types, child nodes, and source locations:

AST = (N, τ, C, σ)

c: FUNCTION EXTRACTION
Next, we define the function that extracts only the ‘FunctionDefinition’ nodes from the AST. Define a function F : AST → L to extract nodes with the ‘nodeType’ of ‘FunctionDefinition’ from the AST. The function extraction step operates as follows:
• It takes the AST as input, where the AST is represented as AST = (N, τ, C, σ).
• It iterates through the nodes in N, checking the ‘nodeType’ attribute of each node using the function τ : N → T.
• If the ‘nodeType’ of a node is ‘FunctionDefinition’, it adds the node to the resulting list.
• It returns the list of nodes with the ‘nodeType’ of ‘FunctionDefinition’ as the output.

LfDef ← F(AST)

d: BUG DENSITY
Selection of the number of bugs is done with a bug density value, which is a predefined ratio of 0.01, and each clean contract will have at least 1 bug inserted. Let S represent the number of lines of code in the file, and let δ be the predetermined constant ratio representing bug density. The variable nb, which denotes the number of bugs to be injected, is given by:

nb = max(1, ⌊S × δ⌋)

Here, max(1, . . .) ensures that the minimum number of bugs is at least 1, and ⌊. . .⌋ represents the floor function, rounding down to the nearest integer.

Algorithm 3 Bug Injection Algorithm
1: procedure InjectBugRecursively(fc, nb, B, Bused)
2:     AST ← SOLC(fc)
3:     LfDef ← F(AST)
4:     Bunused = B − Bused
5:     fselected ← R(LfDef)
6:     if (fselected = None) or
7:        (|Bused| = nb and |Bunused| = 0) then
8:         Return fc
9:     end if
10:    bselected = R(Bunused)
11:    fc∗ = SliceIn(fc, bselected, fselected)
12:    B∗used = Bused ∪ {bselected}
13:    Comment: Recursive call to continue bug injection
14:    InjectBugRecursively(fc∗, nb, B, B∗used)
15: end procedure

e: INJECTION OF SMART CONTRACT BUGS
With each iteration of bug injection, used bugs are tracked, so no bug is repeated, which would otherwise result in compilation failure. To represent this, let B be the set of all available bug files, and let Bused be the set of bug files that have already been used. The set of remaining bug files Bunused can be represented as:

Bunused = B − Bused

Here, B − Bused denotes the set difference, which results in a new set Bunused containing all the elements that are in B but not in Bused.

A random function definition node, which has the source location, will be selected for each bug. This is represented by fselected, and it is selected from LfDef:

fselected = R(LfDef) if LfDef ≠ ∅, and None otherwise

Here, R(LfDef) represents the random selection of a function definition node from LfDef. If LfDef is empty, fselected will be set to None.

Similar to how a function definition node is selected, a bug is also selected randomly, represented as bselected, which is selected from the set Bunused of remaining unused bugs:

bselected = R(Bunused)

We then slice the selected bug bselected into fc at the given fselected. The slicing location is determined by the values within the source attribute, where the start and length values are represented in sequence, delimited by colons. Given that, slicing is done to place in the bug snippet, and it is represented by the following function. Let fc∗ be the new file content that has the bug injected:

fc∗ = SliceIn(fc, bselected, fselected)

Finally, after each successful injection, bselected will be appended into B∗used. The updated set B∗used of used bugs can be obtained as:

B∗used = Bused ∪ {bselected}

After each injection, B∗used will be saved and kept for feature extraction use later on.

With the bug injection process explained, we now transition to discussing our feature extraction methodologies.

B. FEATURE EXTRACTION
Models will be built with features extracted from smart contracts that have been injected with the predefined bugs as mentioned in the previous section. However, there is one key thing to be noted: as each injection location is randomly selected, we would not have the same set of code for the 7 categories of smart contract bugs. This will be further explained in each feature extraction algorithm section. The vulnerabilities to be injected are Reentrancy, Arithmetic, Unauthorized send, Transaction Origin, Timestamp

dependency, Transaction Order Dependency, and Unhandled Exceptions.

Algorithm 4 Extract Opcodes From Bugged Files
1: procedure SimplifyOpcode(fc, B∗used)
2:     Initialize Lopcodes ← ∅
3:     C ← SOLC(fc)
4:     for (ABI, O) ∈ C do
5:         found_match ← False
6:         for Gi ∈ ABI do
7:             for bug ∈ B∗used do
8:                 if namei = bug.name then
9:                     found_match ← True
10:                    O∗ ← Simplify(O)
11:                    Lopcodes ← Lopcodes ∪ {O∗}
12:                end if
13:            end for
14:            if found_match then
15:                Break
16:            end if
17:        end for
18:    end for
19:    return Lopcodes
20: end procedure

1) OPCODES
Opcodes are obtained from SOLC using the input and output JSON method, which is also the recommended way, as it provides a consistent interface across all compiler versions. As not all contracts within each Solidity file will have a bug injected, a cross-check is needed to take in only the opcodes from injected contracts. Hence, given the file name and B∗used, we sift out contracts by comparing the function name with the Application Binary Interface (ABI). The full algorithm of the opcode extraction process can be found in Algorithm (4).

As a next step, to describe the sifting process, let C be the set of m contracts returned by solc. Each contract in C contains both an Application Binary Interface (ABI) and an ordered list of opcodes. Formally, C is defined as:

C = {(ABI1, O1), (ABI2, O2), . . . , (ABIm, Om)}

where ABIi represents the Application Binary Interface for the ith contract, and Oi represents the ordered list of opcodes for the ith contract. Also, ABIi = {G1, G2, . . . , Gr}, where each Gi is a function descriptor, with a maximum of r function descriptors per ABI. Each function descriptor Gi can be represented as a tuple:

Gi = (constanti, inputsi, namei, outputsi, payablei, stateMutabilityi, typei)

Each element in Gi maps to the corresponding property in the ABI JSON.

Following this, the opcodes are simplified. Simplification is done because opcodes such as PUSH have 32 variations, while SWAP and DUP have 16, where each variation represents the number of bytes to be pushed on the stack, i.e., PUSH1 - 1 byte, PUSH4 - 4 bytes. The simplification rules enforced are similar to [19] and can be found in Table 3. With this simplification, only 77 opcodes remain, thus reducing the dimension of the feature vector. We then employ the n-gram algorithm [32] for feature extraction. In natural language processing and computational linguistics, an n-gram is widely used to compute the frequency distribution of a selected n-number of tokens, where n refers to the number of adjacent elements to consider from a string of tokens. Unigrams, bigrams, and trigrams are some examples of n-grams, where n is 1, 2, or 3, respectively [19]. For this paper, we have chosen the trigram approach for feature extraction. This choice is motivated by the need to capture more information while maintaining scalability, particularly as additional bugs are incorporated, allowing more syntax to be captured for precise classification. Additionally, all hexadecimal values are removed, as they are unique to each smart contract and do not provide any flow information relevant to bug identification.

The vectorizing of text features is done by Term Frequency-Inverse Document Frequency (Tfidf). Tfidf can be broken down into two sections: 1) Term Frequency and 2) Inverse Document Frequency. 1) Term Frequency reflects frequently occurring sections of a document by using a weighting factor, in which the weight increases proportionally to the number of times a word appears in the document. However, this is offset by 2) Inverse Document Frequency, which buries commonly appearing words and highlights rarely occurring words. This allows unique features to surface.

Post extraction, we are left with a sparse matrix. To further reduce the dimension for model training, SelectKBest based on Chi-square and TruncatedSVD from the scikit-learn API [33] are both employed. SelectKBest selects the best 2000 features from the sparse matrix, followed by TruncatedSVD. Unlike the more commonly known dimension reduction method, Principal Component Analysis (PCA), which does not work on a sparse matrix, TruncatedSVD is able to work on it efficiently because it does not center the data before computing the singular value decomposition. The number of components is based on a cumulative variance of 95% from the selected 2000 features. With this method, we obtain the features that have a cumulative variance of at least 95%, thereby reducing the number of features.

Upon completion of these steps, the Opcode trigram feature vector is primed and ready for the machine learning model training.
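The simplification and trigram steps can be illustrated with a few lines of Python. The collapse rules below are a stand-in for Table 3, not a full reproduction of it, and in the actual pipeline the trigram counts feed a Tfidf vectorizer rather than being used raw:

```python
import re
from collections import Counter

def simplify(opcodes: list) -> list:
    """Collapse sized variants (PUSH1..PUSH32, DUP1..DUP16, SWAP1..SWAP16)
    into their base mnemonic and drop hexadecimal operands, which are
    unique per contract and carry no flow information."""
    out = []
    for op in opcodes:
        if re.fullmatch(r"0x[0-9a-fA-F]+", op):
            continue
        out.append(re.sub(r"^(PUSH|DUP|SWAP)\d+$", r"\1", op))
    return out

def trigrams(opcodes: list) -> Counter:
    """Frequency distribution of adjacent opcode triples (n = 3)."""
    return Counter(
        " ".join(opcodes[i:i + 3]) for i in range(len(opcodes) - 2)
    )
```

For instance, `["PUSH1", "0x60", "PUSH2", "0x40", "MSTORE", "DUP1", "SWAP2"]` simplifies to five base opcodes, from which three overlapping trigrams are counted.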

model training.


Algorithm 5 Extract CFG From Bugged Smart Contracts
1: procedure RuntimeBytecode(fc, B∗used)
2:     Initialize Lcfg ← ∅
3:     C ← SOLC(fc)
4:     for (ABI, BC) ∈ C do
5:         found_match ← False
6:         for bug ∈ B∗used do
7:             for Gi ∈ ABI do
8:                 if namei = bug.name then
9:                     found_match ← True
10:                    CFG ← Ethersolve(BC)
11:                    Lcfg ← Lcfg ∪ {CFG}
12:                end if
13:            end for
14:            if found_match then
15:                Break
16:            end if
17:        end for
18:    end for
19:    return Lcfg
20: end procedure

C. CONTROL FLOW GRAPH
Beyond extracting static features from opcodes using n-grams, constructing a CFG has additional benefits, as it contains sequence data from the runtime opcodes, allowing far more intrinsic patterns to surface. In this work, we employ the CFG builder from Ethersolve [34], a tool that uses symbolic stack execution to resolve jump destinations, resulting in accurate edges. As Ethersolve is built in Java, we have utilised its core module by creating a wrapper in Python contained within a Docker instance.

Similar to the contract selection process done for the opcodes, we obtain the deployed bytecode from solc and only process contracts that have a bug function inside for the ML model training. The full algorithm can be found in Algorithm 5. Below we describe each representation.

Let C be the set of contracts returned by solc. Each contract in C contains both an Application Binary Interface (ABI) and an ordered list of opcodes. Formally, C is defined as:

C = {(ABI1, BC1), (ABI2, BC2), . . . , (ABIm, BCm)}

where ABIi represents the Application Binary Interface for the ith contract, and BCi represents the ordered list of deployed bytecode for the ith contract.

Graph information was subsequently extracted using PecanPy [35], a fast, efficient, and parallelized Python implementation of Node2Vec [36]. Although Node2Vec is adept at learning low-dimensional representations of nodes within a graph, it lacks parallelization in its random walks. Addressing this limitation significantly enhances the efficiency of learning in dense networks [35].

FIGURE 9. A snippet of digraph from Ethersolve CFG with parameters p = 2, q = 0.5.

During each walk, the gathered information is input into a Word2Vec model, resulting in node embeddings. For this purpose, the Precomp model is employed. In Precomp, the return parameter p is set to 2, and the in-out parameter q is 0.5. This model represents an optimized version of Node2Vec, precomputing and storing all transition probabilities for random walks. The values of p and q significantly influence the structure of the resultant graph. The return parameter p influences the random walk's likelihood of selecting the immediately preceding node, while the in-out parameter q encourages the walk to remain within its local neighborhood. With a lower value of q, the walk is more likely to visit nodes that are also connected to the previous node.

The final embedded information is constrained by the available resources; therefore, the number of walks from each node is limited to 1, and the walk length is set to the average graph length across all categories, including Clean contracts. Walks shorter than this average are padded when the depth of the selected node is less than the average walk length.

Given the parameters and constraints, we can now work a small example of how Node2Vec walks are done, using a tree snippet generated by the Ethersolve CFG in Figure 9. Path probabilities are grouped into 3 types: the direct return node, nodes that do not have a path to the direct return node, and nodes that have a return edge to the direct return node. Assuming that we start at node 202, it has neighbours 13, 210, 233, and 214, and the return node is 13. The unnormalized transition weights are 1/p to 13, as it is the return node; 1/q each for nodes 210 and 233, as they do not have a return edge to node 13; and 1 for node 214, as it has a return path to node 13. Each weight is then normalized by the sum over all possible paths (0.5 + 2 + 2 + 1 = 5.5 with p = 2, q = 0.5), which gives probabilities of approximately 0.09 for a walk going back to 13, 0.36 each for nodes 210 and 233, and 0.18 for node 214. Given as much, the next step of the walk will most probably be either 210 or 233.

After obtaining the embedded data, a Long Short-Term Memory (LSTM) model will be used for the evaluation.
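The biased-walk step can be computed directly from the Node2Vec definition. The sketch below hardcodes the neighbour classes for the Figure 9 snippet; this classification is an illustrative assumption, not PecanPy's API. Note that normalizing over all four neighbours (0.5 + 2 + 2 + 1 = 5.5) yields probabilities of roughly 0.09, 0.36, 0.36, and 0.18, which correctly sum to 1:

```python
def transition_probs(p: float, q: float, neighbours: dict) -> dict:
    """Node2Vec-style transition probabilities out of one node.

    neighbours maps each candidate next node to its class:
    'return'  -> weight 1/p (step back to the previous node)
    'distant' -> weight 1/q (no edge back to the previous node)
    'shared'  -> weight 1   (shares an edge with the previous node)
    """
    weight = {"return": 1.0 / p, "distant": 1.0 / q, "shared": 1.0}
    raw = {n: weight[cls] for n, cls in neighbours.items()}
    total = sum(raw.values())
    return {n: w / total for n, w in raw.items()}

# Figure 9 example: walking from node 202, having arrived from node 13.
probs = transition_probs(
    p=2, q=0.5,
    neighbours={"13": "return", "210": "distant",
                "233": "distant", "214": "shared"},
)
```

With p = 2 damping the return step and q = 0.5 rewarding outward exploration, nodes 210 and 233 dominate, so the next step is most probably one of them.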


TABLE 3. Opcode simplification.

TABLE 4. Variables used in the MythSlith Algorithm.

Model training was implemented based on the dataset size supportable by the available computational resources.

On algorithmic complexity: for the pre-processing algorithm (Algorithm 1) and the bug insertion algorithm (Algorithm 3), the worst-case complexity depends on the size of the smart contract, O(S), where S represents the number of lines in the smart contract, whereas the complexity of the function filtering algorithm (Algorithm 2) is determined by the number of nodes in the AST tree, O(N). In order to extract the opcodes from a given smart contract, the complexity depends on the total number of function descriptors in an ABI (r) and the total number of ordered lists of ABI opcodes within a contract (m). Hence, the algorithmic complexity for extracting opcodes from a smart contract (Algorithm 4) is in the order of polynomial complexity, O(rm^2). For a smaller number of opcodes within a contract (which is usually true in practical smart contract development), this time is much less. This can be observed from Figure 13. The algorithmic complexity for the feature selection algorithm using the n-gram algorithm is given by O(n^2 log n). Thus, the worst-case complexity for Algorithm 5, which extracts the run-time bytecode, is given by O(rm^2 n^2 log n). For a small number of opcodes within a smart contract and n = 3 (for the trigram feature selection algorithm), the polynomial complexity scales down to a practical running time for detecting smart contract vulnerabilities.

D. MACHINE LEARNING MODELS FOR CLASSIFICATION
Our dataset encompasses eight distinct categories, necessitating the use of multiclass classifier algorithms for the training of classifiers. These algorithms are capable of handling multiple classes without the need to be adjusted or modified. In this research, we will be using the following models from the scikit-learn API [33] for comparison: Random Forest (RF), XGBoost (XGB), Support Vector Machine (SVM), and Multi-Layer Perceptron (MLP), which is a neural-network implementation in scikit-learn.

We will also utilize sequential neural network models such as Long Short-Term Memory (LSTM). LSTM is a variant of the Recurrent Neural Network (RNN), and it is designed to address the problem of vanishing gradients that arises during the back-propagation process. Vanishing gradients are a problem because the weights of edges are adjusted according to the calculated loss; this inhibits long sequences from updating earlier layers, as the update value is derived from the difference at the last layer. The main difference between LSTM and its predecessor is the ability to retain more information within each cell. This is done by having ‘gates’ to control the amount of information flow in the network. However, after a preliminary study, CFG data with LSTM did not yield good results, and we therefore decided to combine the MLP model into the process, making it a multi-model. The MLP model is chosen because the training epochs can be shared between both models, allowing for a simpler implementation. The inputs are the Tfidf and CFG data. This allows increased depth and complexity, resulting in better performance in terms of generalizing the features.

To ensure a well-trained model, we have implemented several measures such as k-fold validation, learning curves, and ROC curves. These checks are done on all models before passing them off to the unseen data test. K-fold validation is a data partitioning method for assessing model performance. This method evaluates how well a model generalises to an independent dataset. This is done by having k folds, where k refers to the number of divided parts of the data. After dividing, 1 part of the k folds is used as the independent test data while the rest forms the training set. This method provides variance and bias reduction, ensuring a more accurate estimation. The purpose of the learning curves is to identify 2 undesirable states during the training phase of the models: underfitting and overfitting. These states are identified by looking at the training scores; since this is a classification task, we use accuracy as the score. Underfitting occurs when both the training and validation scores are low and continue to persist when more training data is added, which infers that the model is too simple and incapable of capturing the underlying pattern. Overfitting is when the training score is high while the validation score is significantly lower, which infers that the model may be too complex and will most likely fail at generalizing the different classes. The learning curves of each model can be found in Figure 11a. Epoch training information is better suited to a neural-network representation; the learning curve comparison for our MLP and MLP + LSTM models can be found in Figure 11b.
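The k-fold partitioning described above can be sketched without any ML library (scikit-learn's `KFold` performs the same index split; this stdlib version is purely for illustration):

```python
def k_fold_indices(n_samples: int, k: int):
    """Partition sample indices into k folds; each fold serves once as
    the held-out validation set while the rest form the training set."""
    indices = list(range(n_samples))
    fold_size, remainder = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = fold_size + (1 if i < remainder else 0)
        folds.append(indices[start:start + size])
        start += size
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val
```

Averaging the validation accuracy over the k train/validation splits gives the variance- and bias-reduced estimate used to compare the models before the unseen-data test.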


Algorithm 6 MythSlith Algorithm
1: mythrilDepth ← 22
2: if Vulnerability = Reentrancy then
3:     Sr ← Slither(sc)
4:     output Sr
5: else if (Vulnerability = Unauthorizedsend) or
6:         (Vulnerability = Txorigin) or
7:         (Vulnerability = Arithmetic) then
8:     Mr ← Mythril(sc, mythrilDepth)
9:     output Mr
10: else
11:    Mr ← Mythril(sc, mythrilDepth)
12:    if Mr.Severity = High then
13:        output Mr
14:    else
15:        mythrilDepth ← 100
16:        Sr ← Slither(sc)
17:        DeepMr ← Mythril(sc, mythrilDepth)
18:        if DeepMr.Severity = High then
19:            output DeepMr
20:        else if Sr.Severity = High then
21:            output Sr
22:        else if (DeepMr.Severity = Other) or
23:                (Sr.Severity = Other) then
24:            if DeepMr.Severity = Other then
25:                output DeepMr
26:            end if
27:            if Sr.Severity = Other and
28:               |DeepMr.MediumSeverity| < |Sr.MediumSeverity| and
29:               |DeepMr.LowSeverity| < |Sr.LowSeverity| then
30:                output Sr
31:            end if
32:        end if
33:    end if
34: end if

FIGURE 10. Mapping of Slither vulnerabilities to SWC.

The ROC curve analysis is utilized to evaluate each model's ability to distinguish specific vulnerabilities from the rest. These curves, shown in Figure 12, reveal that the RF and MLP + LSTM models struggle to effectively differentiate between the classes.

E. MULTI-MODEL APPROACH
Building upon the traditional models, we have further developed a multi-model approach utilizing a combination of Tfidf-vectorized opcodes and PecanPy node embeddings, employing a Keras multiple-input model. This technique enables the use of multiple sub-networks, which can be concatenated or merged into a unified network at a certain point. This method proves particularly advantageous when dealing with diverse data sources, allowing each to be processed by a distinct model.

In the framework of this multi-model architecture, we employed an MLP for the opcode data and an LSTM for the PecanPy data processing. However, due to the extensive size of the dataset, we limited our test to only 50 randomly selected, bugged Solidity contracts from each category.

F. DESIGN OF MYTHSLITH
Analysis tools are considered the go-to when checking for any smart contract vulnerabilities. However, contemporary tools are not yet capable of a full detection [37]. Hence, in order to have a guideline for the ML model, Mythril and Slither are used.

Both Mythril and Slither are considered static tools; however, Mythril uses symbolic execution and an SMT solver, Z3, to determine the satisfiability of the generated symbolic formula, which is a mix of dynamic and formal verification. In [24] an empirical study on the available open-sourced tools was done; the results on their curated dataset, which was collected from real vulnerable contracts or injected with bugs, showed that Mythril yields an accuracy of 27% while Slither yields 17%. Though low, these are the highest among all 9 tools presented. In conclusion, [24] suggested the use of a combination of Mythril and Slither, which could yield a 37% accuracy.

Certainly, there are also other open-sourced tools available, but Mythril and Slither are both actively updated and therefore more suited for a direct comparison. We have designed a new integrated algorithm, MythSlith, a simple and elegant combination of both tools, which takes reference from a normal-depth Mythril analysis to decide on the process. Mythril is used as the baseline for MythSlith because it has a higher detection accuracy as shown

in [24]. Furthermore, specific tools have been designated for addressing vulnerabilities like Reentrancy, Unauthorized send, Tx origin, and Arithmetic. This allocation stems from the outcomes detailed in Table 5. Additionally, although a severity level was initially absent in Slither, it has been incorporated, referencing the SWC registry and aligning with the identified vulnerabilities. The algorithm design can be found in Algorithm 6. The variables used in the MythSlith algorithm are listed in Table 4.

G. CHALLENGES
1) MULTIPROCESSING
Multiprocessing is implemented to reduce the time taken for the tests; however, this spun off a new problem when changing the solc version. Therefore, in order to prevent compilation errors, all processes are dockerized, and solc-select is included inside both the Mythril and Slither dockerfiles. Any action that has to deal with solc will spin up a Docker container to prevent any compiler mismatch issue.

2) UNIFIED VULNERABILITY CLASSIFICATION
Combining Mythril and Slither poses another challenge, which is the definition of vulnerabilities. Though both tools are capable of detecting some common bugs, they do not use the same standard. While Mythril uses the more recognized SWC, Slither uses its own definitions. To resolve this, we have added the SWC definitions into the Slither code and incorporated the SWC ID into the output. This allows both tools to communicate with reference to the same vulnerability definition standard. In this paper, we have mapped the seven different vulnerabilities that can be found in Slither to SWC IDs. The mapping can be found in Figure 10.

V. RESULTS AND PERFORMANCE ANALYSIS
In this section, experiment results from both the analysis tools and our ML-based approach will be put against each other for comparison. The effectiveness of each tool will be based on four parameters, namely, (i) accuracy, (ii) precision, (iii) recall, and (iv) F1 score. These parameters will then form the confusion matrix for better visualization. All experiments were conducted on a machine with the following specifications: R162-ZA1-00 with 16 CPUs x AMD EPYC 7282 16-Core Processor, 64GB of RAM, and 1.9TB of SSD.

In this paper, a total of 4335 manually verified Solidity files were sourced from the smart contract sanctuary [31] repository, which tracks verified smart contracts from the Ethereum mainnet and testnets, such as rinkeby, ropston, kovan, etc., and also from Binance Chain, Polygon/Matic, and Tron. In the mainnet repository, contract versions have a wide spread from 0.4.1 to 0.8.7 as of the latest update. For this experiment, we will initially consider contracts from the mainnet and Solidity versions from 0.4.11 to 0.4.26 to compare against the known ground truths. Any contracts below 0.4.11 will not be considered, due to a known compiler bug where source indexes could be inconsistent between different Solidity compiler versions. Inconsistency between the source indexes will affect the bug injection process, which relies on them to insert bugs.

The unseen test set is prepared by a random selection of 100 Solidity files from the Smart Contract Sanctuary. These files are in addition to the existing 4335 contracts. These contracts will then go through the same bug injection and feature extraction procedures as in Algorithm 3, Algorithm 4, and Algorithm 5.

Complete results can be found in both Table 6 and Table 5. Table 6 illustrates the overall comparison between the analysis tools and the ML models.

A. EVALUATION METRICS
To enable the comparison, the SWC standard is utilized as the benchmark, and it can be found in Fig 10. The results of the analysis tools and ML models are obtained by executing the same set of data with seven different classifications of vulnerabilities. The Clean category is also added for clarity. To ensure a balanced and unbiased result from the analysis tools, each tool will run the clean dataset first, and that result will be used as the baseline, ensuring a reference ground truth. This is because we can never assume each smart contract is free of bugs.

Given the results from the analysis, a confusion matrix will be constructed for the evaluation. The confusion matrix comprises the following values in accordance with the result prediction. These values, with descriptions, can be found in Table 7. With these values, we are able to construct the metrics:

• Accuracy: represents the ratio of correctly predicted values to the total dataset. It is a measure of the overall correctness.

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (1)

• Precision: can also be referred to as the positive predictive value; represents the number of correctly classified positive values relative to the total predicted positives.

Precision = TP / (TP + FP)    (2)

• Recall: also known as the true positive rate, hit rate, or sensitivity; represents the ratio of correctly classified positive instances to all actual positives.

Recall = TP / (TP + FN)    (3)

• F1 Score: more useful for an uneven dataset or class distribution. This metric is the weighted average of Precision and Recall.

F1 Score = 2 × (Precision × Recall) / (Precision + Recall)    (4)

• False Positive Rate: the proportion of negative instances incorrectly classified as positive.

FPR = FP / (FP + TN)    (5)


TABLE 5. Comparison of different tools.

In this study, we will be looking at recall, followed by precision and F1 score, then accuracy. This is because missing FNs has much higher consequences than an increase in FP, as FNs can lead to potential security breaches and


TABLE 6. Overall comparison of different tools.

FIGURE 11. Learning rate with increasing training size (a) and increasing epochs (b).

TABLE 7. Variables used in evaluation metrics calculation.

financial losses. However, it is advised to take both the Precision and F1 scores into consideration to ensure a balanced model.

B. PERFORMANCE OF MACHINE LEARNING MODELS
In this section, we illustrate the performance comparison between our ML models, Mythril, Slither, and MythSlith on the 100 bugged Solidity files. The overall performance analysis of the tools, as summarized in Table 6, shows that the MLP and SVM models exhibited the highest levels of accuracy, precision, recall, and F1 score, with MLP achieving an accuracy of 0.9129 and an F1 score of 0.9127, and SVM showing a slightly lower accuracy of 0.8954 and an F1 score of 0.8979. The FPR of the models sheds further light on their performance: the MLP has the lowest FPR among all the models, at 0.0125, which indicates that it is unlikely to flag false positives.

The detailed results of the individual vulnerability analysis are presented in Table 5. This analysis revealed that MLP consistently demonstrated superior performance across most of the vulnerability detections, particularly for the detection of Clean and Unauthorized send, with F1 scores of 0.8723 and 0.9153, respectively. SVM did well across most of the categories; however, it had some trouble with Reentrancy: while having a high recall of 0.9518, its precision took a toll at 0.7940. The XGB model displayed a consistent performance across the various categories, with a notable F1 score of 0.7716 in the Clean category, though it did not outperform all other tools in this category. Surprisingly, a bagging ensemble learning technique like RF did not do very well; its measures across the board are not more than 0.67. The multi-model of MLP and LSTM produced weak results: while doing well in Transaction order with an F1 score of 0.7176, it is less effective in the other categories. One notable FPR measure for the multi-model is 0.4545 for Unhandled Exceptions, which clearly indicates the low effectiveness of the features used for the models. While MLP excels in


FIGURE 12. ROC Curves for various models. From left to right, top to bottom: Random Forest, XGBoost, SVM, MLP, MLP+LSTM.

Timestamp Dependency with the lowest measure at 0.0037, indicating high confidence in the detection.

C. PERFORMANCE COMPARISON WITH SOFTWARE VERIFICATION AND VALIDATION TOOLS
The results of the tools can be found in Table 5. Right off the bat, it is clear that the current analysis tools are unable to identify some vulnerabilities. Transaction Order proves to be a challenge for the analysis tools, while Arithmetic proves too hard for Slither to handle. In addition, while the three tools look strong from the precision point of view, their recall and F1 scores do not fare well, which once again reinforces the findings from [17]. The current tools also have difficulty identifying Clean contracts, with Slither leading the pack at only 0.375 precision while the rest are well below 0.2. Another vulnerability to highlight is Timestamp Dependency: surprisingly, none of the three tools fared well, with recall below 0.2. The FPR of Mythril on Clean contracts is particularly high at 0.2590, and this behaviour continues in Reentrancy and Arithmetic; however, it did very well in Unhandled Exceptions, where no false positives were raised. Slither did very well in Tx Origin, with an FPR of just 0.0007; however, this performance is not backed by its recall and F1 score, which clearly indicate poor true-positive detection. MythSlith's results were not spectacular, hovering between the Mythril and Slither measures. One example is Timestamp Dependency, where MythSlith has an FPR of 0.0428, while Mythril and Slither have 0.0505 and 0.0179, respectively. Due to its design, MythSlith can never be better than the best measure of either tool.
These findings highlight the varying strengths and limitations of each model or tool, underscoring the influence of vulnerability type on the effectiveness of detection methods.

VI. ANALYSIS AND INFERENCES
A. PRE-PROCESSING AND FEATURE EXTRACTION
In our methodology, opcode sequences are utilized with a simplification technique aimed at enhancing variance. We then employ the TF-IDF technique with trigram-based feature extraction. While this approach is efficient and straightforward, it suffers from a lack of context awareness. This limitation stems from its focus on only three consecutive opcodes at a time, leading to a sparsity issue. The sparsity results from the model's limited exposure to examples, potentially compromising its accuracy and robustness.
Furthermore, the sparse nature of the TF-IDF representation presents computational challenges, particularly as data volumes escalate. In the context of our opcode dataset, this method generates 4,824 feature columns via the TF-IDF vectorizer. Additionally, the process of opcode simplification, while conserving computational resources, introduces significant drawbacks. A primary concern is the discarding of hexadecimal values, resulting in the loss of vital address information crucial for source location tracing.
Looking ahead, the adoption of an alternative vectorizer to TF-IDF could potentially yield improved contextual understanding, enhancing the models' learning capabilities.

B. COMPARISON OF MYTHSLITH AND MLP MODEL
From the tests conducted, it is clear that relying on one tool for smart contract analysis is not ideal. Current software verification tools such as Mythril [14] and Slither [13] tend to err on the safe side: their results have high precision but low recall and F1 scores. In our effort to combine them and weave through the cracks by constructing MythSlith, such behavior still exists. No significant detection progress was


FIGURE 13. Pre-processing time (in ms) for opcode extraction with respect to (from left to right): Line Count, AST Node Count, and Opcode count,
respectively.
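The simplification and trigram TF-IDF steps described in the pre-processing discussion above can be made concrete with a small, dependency-free sketch. This is illustrative only: the actual pipeline uses the scikit-learn TfidfVectorizer [33], the smoothed IDF formula below is a simplified variant, and the two opcode sequences are invented for the example.

```python
import math
import re
from collections import Counter

def simplify(opcodes):
    # Collapse numbered variants (PUSH1..PUSH32 -> PUSH, DUP2 -> DUP) and
    # drop hexadecimal operands, mirroring the opcode simplification step.
    return [re.sub(r"\d+$", "", op) for op in opcodes if not op.startswith("0x")]

def trigrams(ops):
    # Three consecutive opcodes form one feature, as in the trigram setup.
    return [" ".join(ops[i:i + 3]) for i in range(len(ops) - 2)]

def tfidf(corpus):
    # Minimal smoothed TF-IDF over opcode trigrams: one dict per contract.
    docs = [Counter(trigrams(simplify(seq))) for seq in corpus]
    n = len(docs)
    df = Counter(term for doc in docs for term in doc)
    vectors = []
    for doc in docs:
        total = sum(doc.values()) or 1
        vectors.append({term: (count / total) * (math.log((1 + n) / (1 + df[term])) + 1)
                        for term, count in doc.items()})
    return vectors

# Two toy opcode sequences (invented; not from the paper's dataset).
contracts = [
    ["PUSH1", "0x60", "PUSH1", "0x40", "MSTORE", "CALLVALUE", "DUP1", "ISZERO"],
    ["PUSH1", "0x80", "CALLVALUE", "DUP1", "ISZERO", "JUMPI", "REVERT"],
]
vecs = tfidf(contracts)
```

Only `CALLVALUE DUP ISZERO` occurs in both toy contracts, so it receives a lower IDF weight than the contract-specific trigrams; with thousands of real contracts, most trigrams occur in very few documents, which is exactly the sparsity issue noted above.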

made, which again is expected, as no improvement was made to either tool. However, MythSlith did cover Slither's inability to detect Arithmetic vulnerabilities, as can be observed in Table 5.
In our proposed method, we show that detection via features extracted from smart contracts presents viable patterns for machine learning models to learn and classify. However, the suitability of the different models varies for this specific application, as can clearly be seen from the results of RF: unlike the gradient-correction method employed by XGB, the ensemble bagging approach of RF did not yield equally effective results. In terms of neural network methods, MLP did well in all categories; in contrast, the multi-model of MLP + LSTM did not. This is primarily due to the CFG features derived from EtherSolve [34], which did not present a strong pattern for our model to learn effectively. Furthermore, the generation of features using PecanPy [35] takes a substantial amount of time due to random walk generation. However, the limited effectiveness of the EtherSolve features in this context does not inherently diminish their value; the challenge may lie more with PecanPy's processing demands.

C. RUNNING TIME

TABLE 8. Average time taken by each smart contract vulnerability detection tool to analyze a smart contract.

In Table 8, we analyze and compare the running times of the various software validation tools. The running time of MythSlith is comparable to (or sometimes less than) that of Mythril: the average running time of MythSlith is 1102.14 seconds, whereas that of Mythril is 1270.33 seconds. Note that unlike Mythril, the average running time of Slither is only 5.36 seconds, since it does not involve deep symbolic execution. The average running time of MythSlith is slightly less than that of Mythril because the depth of the symbolic execution is only increased for certain types of vulnerabilities, whereas in all other cases, where Slither has better detection accuracy, MythSlith chooses Slither.
As observed in Table 8, the running time to analyze a smart contract using the ML models is much lower (on the order of < 7 seconds) than that of the software verification tools, except for the multi-model MLP+LSTM, since it involves two ML models. It is evident that the time taken to analyze the smart contract features and predict using the ML model is real-time for smart contract analysis, whereas the pre-processing and training of the ML model are offline processes. The pre-processing time for opcode extraction with respect to the number of lines in the smart contract, the number of AST nodes, and the number of CFG opcodes is depicted in Figure 13, from left to right, respectively. The maximum pre-processing time observed, for a smart contract with 3500 lines, is 26 seconds, while most practical smart contracts, with less than 1000 lines of code, take < 10 seconds for pre-processing and opcode extraction.

VII. CONCLUSION AND FUTURE WORKS
Securing smart contracts is no easy feat, as they are unprotected and visible to everyone. Current software verification tools tend to take a defensive stance, flagging bugs only when they are fully sure about them; this, however, lets false negatives slip through. Therefore, we proposed a machine learning approach to effectively and efficiently detect seven types of vulnerabilities while also identifying clean contracts. This was done by employing models such as Random Forest, XGBoost, Support Vector Machine, Multi-Layer Perceptron, and Long Short-Term Memory. To insert practical bugs and increase the vulnerability space, we proposed a practical bug injection technique that injects bugs into verified smart contracts that were cleaned using our proposed pre-processing algorithm. This helps scale up the smart contracts' vulnerability space using features such as the opcodes and CFG that were extracted for the model training. Prior
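As a rough illustration of the CFG feature generation discussed above, the following sketch produces uniform random walks over a toy control-flow graph. This is the p = q = 1 special case of node2vec; PecanPy [35] implements the general biased walks far more efficiently, and the `cfg` adjacency list and block names here are hypothetical.

```python
import random

def random_walks(adj, num_walks=2, walk_length=5, seed=42):
    # Start num_walks walks from every node; each walk moves to a
    # uniformly chosen successor until walk_length or a dead end.
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        for start in adj:
            walk = [start]
            while len(walk) < walk_length:
                successors = adj[walk[-1]]
                if not successors:
                    break  # terminal basic block (e.g., REVERT/RETURN)
                walk.append(rng.choice(successors))
            walks.append(walk)
    return walks

# Toy CFG: basic blocks as nodes, jump targets as edges (illustrative only).
cfg = {
    "entry": ["check"],
    "check": ["body", "revert"],
    "body": ["check", "exit"],
    "revert": [],
    "exit": [],
}
walks = random_walks(cfg)
# Each walk is a "sentence" of basic-block IDs; feeding these walks to a
# Word2Vec model, as in the pipeline, yields the CFG node embeddings.
```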


to vectorization, simplification was done to the opcodes in order to reduce the dimensionality and remove contract-specific hexadecimal values. TF-IDF was utilized with trigrams for the vectorization, and the CFG data was processed by PecanPy, where random walks are generated and vectorized with the Word2Vec model.
The results of the models were then benchmarked against software verification tools such as Mythril, Slither, and an experimental tool proposed in our work, MythSlith. From the results, machine learning models have shown superior performance in vulnerability detection over the existing software verification tools. When testing with real-time smart contracts, the MLP model performs best, with 91% accuracy along with higher recall and F1 scores. The FPR measures show that MLP achieved the best performance, with the lowest average among all the tools at 0.0125, while MLP+LSTM achieved the lowest FPR of 0.0015 for Unauthorized Send.
While the current model improves the accuracy with a significantly lower FPR when detecting contract-wide bugs, it can be further improved by pinpointing the exact source location. One possible solution is to use a combination of fuzzing and formal verification to extend our current model with this feature. This solution, however, faces the constraint that model patterns cannot be traced back to the source location. We can explore viable solutions to this constraint by tagging the source index (which need not be used for model training) to a specific feature, thereby allowing traceback to the source. The bug injection method ensures that newer forms of features and inherent patterns are added. The current bug injection method leverages function descriptors; however, not all bugs are in the form of functions. In future works, we will explore more generic bug injection methods.

REFERENCES
[1] PR Newswire. (2023). Global Smart Contracts Market to Reach USD 9850 Million by 2030 With 24% CAGR | Revolutionizing Contract Management, Exploring the Opportunities and Trends Report By Zion Market Research. Accessed: Sep. 3, 2023. [Online]. Available: https://finance.yahoo.com/news/global-smart-contracts-market-reach-160000824.html
[2] P. Praitheeshan, L. Pan, J. Yu, J. Liu, and R. Doss, "Security analysis methods on Ethereum smart contract vulnerabilities: A survey," 2019, arXiv:1908.08605.
[3] K. Ramana, R. M. Mohana, C. K. Kumar Reddy, G. Srivastava, and T. R. Gadekallu, "A blockchain-based data-sharing framework for cloud based Internet of Things systems with efficient smart contracts," in Proc. IEEE Int. Conf. Commun. Workshops (ICC Workshops), May 2023, pp. 452–457.
[4] W. Wang, Z. Han, T. R. Gadekallu, S. Raza, J. Tanveer, and C. Su, "Lightweight blockchain-enhanced mutual authentication protocol for UAVs," IEEE Internet Things J., early access, Oct. 13, 2023, doi: 10.1109/JIOT.2023.3324543.
[5] J. Korn. (Aug. 2022). Report: $1.9 Billion Stolen in Crypto Hacks So Far This Year. Accessed: Aug. 16, 2022. [Online]. Available: https://edition.cnn.com/2022/08/16/tech/crypto-hack-rise-2022/index.html
[6] Binance. (2023). Poolz Finance Hacked, Token Price Drops 93%. Accessed: May 23, 2023. [Online]. Available: https://www.binance.com/en/feed/post/309330
[7] SolidityScan. (2023). Poolz Finance Hack Analysis: Still Experiencing Overflow. SolidityScan Blog. Accessed: May 23, 2023. [Online]. Available: https://blog.solidityscan.com/poolz-finance-hack-analysis-still-experiencing-overflow-fcf35ab8a6c5
[8] OpenZeppelin. (2022). Developing Smart Contracts. Accessed: Dec. 26, 2022. [Online]. Available: https://docs.openzeppelin.com/learn/developing-smart-contracts
[9] ConsenSys. (2022). Smart Contract Best Practices. Accessed: Dec. 26, 2022. [Online]. Available: https://consensys.github.io/smart-contract-best-practices/
[10] Crytic. (2022). Detector Documentation. Accessed: Dec. 26, 2022. [Online]. Available: https://github.com/crytic/slither/wiki/Detector-Documentation
[11] SmartContractSecurity. (2022). Smart Contract Weakness Classification Registry. Accessed: Dec. 26, 2022. [Online]. Available: https://swcregistry.io/
[12] NCC Group. (2024). Top 10 Decentralized Application Security Risks. Accessed: Dec. 26, 2022. [Online]. Available: https://dasp.co/
[13] J. Feist, G. Grieco, and A. Groce, "Slither: A static analysis framework for smart contracts," in Proc. IEEE/ACM 2nd Int. Workshop Emerg. Trends Softw. Eng. Blockchain (WETSEB), May 2019, pp. 8–15.
[14] B. Mueller, "Smashing Ethereum smart contracts for fun and real profit," HITB SECCONF Amsterdam, vol. 9, p. 54, Apr. 2018.
[15] M. Mossberg, F. Manzano, E. Hennenfent, A. Groce, G. Grieco, J. Feist, T. Brunson, and A. Dinaburg, "Manticore: A user-friendly symbolic execution framework for binaries and smart contracts," in Proc. 34th IEEE/ACM Int. Conf. Automated Softw. Eng. (ASE), Nov. 2019, pp. 1186–1189.
[16] L. Luu, D.-H. Chu, H. Olickel, P. Saxena, and A. Hobor, "Making smart contracts smarter," in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., 2016, pp. 254–269.
[17] X. Tang, K. Zhou, J. Cheng, H. Li, and Y. Yuan, "The vulnerabilities in smart contracts: A survey," in Proc. 7th Int. Conf. Adv. Artif. Intell. Secur. (ICAIS) 2021, Dublin, Ireland. Cham, Switzerland: Springer, Jul. 2021, pp. 177–190.
[18] P. Momeni, Y. Wang, and R. Samavi, "Machine learning model for smart contracts security analysis," in Proc. 17th Int. Conf. Privacy, Secur. Trust (PST), Aug. 2019, pp. 1–6.
[19] W. Wang, J. Song, G. Xu, Y. Li, H. Wang, and C. Su, "ContractWard: Automated vulnerability detection models for Ethereum smart contracts," IEEE Trans. Netw. Sci. Eng., vol. 8, no. 2, pp. 1133–1144, Apr. 2021.
[20] S. Shakya, A. Mukherjee, R. Halder, A. Maiti, and A. Chaturvedi, "SmartMixModel: Machine learning-based vulnerability detection of Solidity smart contracts," in Proc. IEEE Int. Conf. Blockchain (Blockchain), Aug. 2022, pp. 37–44.
[21] A. Ghaleb and K. Pattabiraman, "How effective are smart contract analysis tools? Evaluating smart contract static analysis tools using bug injection," in Proc. 29th ACM SIGSOFT Int. Symp. Softw. Test. Anal., Jul. 2020, pp. 415–427.
[22] T. M. Corporation. Common Weakness Enumeration. Accessed: Dec. 28, 2022. [Online]. Available: https://cwe.mitre.org/
[23] G. Wood, "Ethereum: A secure decentralised generalised transaction ledger," Ethereum Project Yellow Paper, vol. 151, pp. 1–32, Apr. 2014.
[24] T. Durieux, J. F. Ferreira, R. Abreu, and P. Cruz, "Empirical review of automated analysis tools on 47,587 Ethereum smart contracts," in Proc. IEEE/ACM 42nd Int. Conf. Softw. Eng. (ICSE), Montreal, QC, Canada, Oct. 2020, pp. 530–541.
[25] S. S. Kushwaha, S. Joshi, D. Singh, M. Kaur, and H. Lee, "Systematic review of security vulnerabilities in Ethereum blockchain smart contract," IEEE Access, vol. 10, pp. 6605–6621, 2022.
[26] L. Duan, L. Yang, C. Liu, W. Ni, and W. Wang, "A new smart contract anomaly detection method by fusing opcode and source code features for blockchain services," IEEE Trans. Netw. Service Manage., vol. 20, no. 4, pp. 4354–4368, Dec. 2023.
[27] J. Zheng, L. Williams, N. Nagappan, W. Snipes, J. P. Hudepohl, and M. A. Vouk, "On the value of static analysis for fault detection in software," IEEE Trans. Softw. Eng., vol. 32, no. 4, pp. 240–253, Apr. 2006.
[28] T. Ball, "The concept of dynamic analysis," ACM SIGSOFT Softw. Eng. Notes, vol. 24, no. 6, pp. 216–234, Nov. 1999.
[29] R. Calinescu, C. Ghezzi, K. Johnson, M. Pezzé, Y. Rafiq, and G. Tamburrelli, "Formal verification with confidence intervals to establish quality of service properties of software systems," IEEE Trans. Rel., vol. 65, no. 1, pp. 107–125, Mar. 2016.


[30] J. Chen, X. Xia, D. Lo, J. Grundy, X. Luo, and T. Chen, "DefectChecker: Automated smart contract defect detection by analyzing EVM bytecode," IEEE Trans. Softw. Eng., vol. 48, no. 7, pp. 2189–2207, Jul. 2022.
[31] M. Ortner and S. Eskandari. Smart Contract Sanctuary. [Online]. Available: https://github.com/tintinweb/smart-contract-sanctuary
[32] W. B. Cavnar and J. M. Trenkle, "N-gram-based text categorization," in Proc. 3rd Annu. Symp. Document Anal. Inf. Retr. (SDAIR), vol. 161175, Las Vegas, NV, USA, 1994, pp. 1–14.
[33] L. Buitinck, G. Louppe, M. Blondel, F. Pedregosa, A. Mueller, O. Grisel, V. Niculae, P. Prettenhofer, A. Gramfort, J. Grobler, R. Layton, J. VanderPlas, A. Joly, B. Holt, and G. Varoquaux, "API design for machine learning software: Experiences from the scikit-learn project," in Proc. ECML PKDD Workshop Lang. Data Mining Mach. Learn., 2013, pp. 108–122.
[34] F. Contro, M. Crosara, M. Ceccato, and M. D. Preda, "EtherSolve: Computing an accurate control-flow graph from Ethereum bytecode," in Proc. 29th IEEE/ACM Int. Conf. Program Comprehension, May 2021, pp. 127–137.
[35] R. Liu and A. Krishnan, "PecanPy: A fast, efficient and parallelized Python implementation of node2vec," Bioinformatics, vol. 37, no. 19, pp. 3377–3379, Oct. 2021.
[36] A. Grover and J. Leskovec, "node2vec: Scalable feature learning for networks," in Proc. 22nd ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Aug. 2016.
[37] P. Qian, Z. Liu, Q. He, B. Huang, D. Tian, and X. Wang, "Smart contract vulnerability detection technique: A survey," 2022, arXiv:2209.05872.

LEE SONG HAW COLIN (Member, IEEE) received the bachelor's degree in mechatronics engineering from the Singapore Institute of Technology–University of Glasgow, in 2016. He is currently pursuing the Master of Engineering degree in future communications and blockchain with the Singapore Institute of Technology. He is a Research Engineer with the Singapore Institute of Technology. His professional journey includes a significant role in the development of a COVID-19 CMS application with the GP Connect Team. His current research interests include the intersection of AI, blockchain, and their applications in 5G networks.

PURNIMA MURALI MOHAN (Member, IEEE) received the M.S. and Ph.D. degrees in electrical and computer engineering from the National University of Singapore, in 2014 and 2018, respectively. She held a postdoctoral researcher position with the National University of Singapore, until 2018. She is currently an Assistant Professor with the Information and Communications Technology Cluster, Singapore Institute of Technology. She has expertise in Layer 2 and Layer 3 network protocols while working with the industry. Her current research interests include blockchain and AI, security in next-generation networks, optimization, and heuristics algorithm design.

JONATHAN PAN (Member, IEEE) received the Ph.D. degree in information technology and cyber security from Murdoch University, Australia. He is currently the Chief of the Disruptive Technologies Office and the Director of Cybersecurity of the Home Team Science and Technology Agency, which is a statutory board formed under Singapore's Ministry of Home Affairs to develop science and technology capabilities for the Home Team. He is also an Adjunct Associate Professor with Nanyang Technological University, Singapore. His research interests include cybersecurity, AI, and blockchain.

PETER LOH KOK KEONG (Senior Member, IEEE) received the M.Sc. degree in computer science from the University of Manchester, U.K., and the Ph.D. degree in computer engineering from Nanyang Technological University, Singapore. He is currently an Associate Professor with the Information and Communications Technology Cluster, Singapore Institute of Technology. He is a registered Professional Engineer in Singapore and a Chartered Engineer in U.K. He has more than 35 years of professional engineering, research, academic, and consultative experience. To date, he has authored/coauthored more than 100 publications, with several in high-impact, international, and peer-reviewed journals. His research interests include information and cyber security, data analytics and machine learning for digital crime, blockchain and the IoT, and malware analysis and classification.
