Volume 7, Issue 7, July – 2022 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165

Transaction Analysis on Ethereum Network Using Machine Learning: A Proposed System
Mahmooda Khatoon
Student of M.Tech
Dept. of Computer Science & Information Technology
Maulana Azad National Urdu University
Hyderabad, India

Dr. Khaleel Ahmad
Assistant Professor
Dept. of Computer Science & Information Technology
Maulana Azad National Urdu University
Hyderabad, India

Abstract:- Ethereum, one of the best-known blockchain implementations, lets developers build and
deploy decentralized applications in which users transact in cryptocurrency through smart contracts.
The latest technological developments in cryptocurrencies, and the benefits associated with them,
have been overshadowed by a number of illegal activities on the network, such as bribery, phishing
scams, money laundering, and fraud. The pseudo-anonymous nature of participants in the Ethereum
blockchain network makes it difficult to detect illicit user behavior. Anomalies must be detected
and resolved quickly to maintain participant trust in the largest blockchain platforms. There is a
large body of work analyzing how well Ethereum transactions perform, but to the best of our
knowledge this research is the first to detect anomalies in Ethereum transaction records. To achieve
our goal, we extracted a more comprehensive set of features from Ethereum transaction data to
overcome the shortcomings of existing work. We used the SMOTE technique to deal with the highly
imbalanced dataset and implemented five supervised machine learning models (Logistic Regression,
KNN, Decision Tree, Random Forest, and SVC) to assess and compare them and identify the best
performer. Random Forest performs best, with an accuracy of 98%. We evaluated the accuracy,
precision, and F-value of each method and backed them up with experimental results.

Keywords:- Blockchain, Ethereum, Cryptocurrency, Smart Contracts, Anomaly Detection.

I. INTRODUCTION

Blockchain is a growing collection of blocks of digital records of currency transactions, conceived
by Satoshi Nakamoto [1] to create a decentralized payment system. Blockchain has recently become a
major trend in computing, and blockchain technology as a whole is expected to lead to major changes
in national economies [2]. Cryptocurrencies, or digital assets, are the most important and
best-known blockchain applications in economic terms. The Bitcoin project is a successful
large-scale implementation of blockchain technology. Ethereum is considered the second-largest
blockchain application; it supports smart contracts, and its cryptocurrency (Ether) is the
second-largest cryptocurrency [3]. However, with its rapid growth, Ethereum has become a breeding
ground for various cybercrimes owing to its anonymity, low transfer costs, and large economic value
[4]. An initial coin offering (ICO) is a funding technique for blockchain applications in which
investors receive tokens in exchange for their monetary investment. However, it is reported that
more than 10% of ICOs issued on Ethereum are victims of various types of fraud, including phishing
scams and Ponzi schemes [5]. According to a detailed report by a provider of analysis,
investigation, and security-management software for digital currencies, crypto-based crime hit a
new high in 2021, with illicit addresses receiving $14 billion during the year, up from $7.8 billion
in 2020 [6]. This indicates that financial security has become an important concern in the
blockchain environment. Among the many risks associated with blockchain-based cryptocurrencies,
fraudulent transactions accounted for over 50% of all Ethereum cybercrimes in 2017, and this type
of fraud poses a major security threat to Ethereum trading [7]-[8]. When using Ethereum transaction
records to detect fraudsters, we may encounter the following problems, which make fraud detection
difficult.

When dealing with real-time Ethereum transaction data, detecting malicious users is very difficult
because they are very few among such a huge number of transaction records; finding fraudsters is
like searching for a needle in a haystack. We face the same problem because the dataset is
extremely imbalanced.

The diversity of the Ethereum trading system is reflected in the diversity of addresses, such as
wallets, exchanges, and well-known ICOs [9], but the trading of typical loss accounts is relatively
small. Using up-to-date information to distinguish between malicious and non-malicious email
addresses can be very difficult in many networks.

Ethereum fraud detection is a classification problem in machine learning. The success of this
process depends on the approach chosen and the availability of the information provided. Feature
extraction that accurately differentiates between malicious and non-malicious transactions can be
used effectively in classification models [10]-[11]. The main contribution of this article is that
malicious nodes in the Ethereum network can be detected with comparatively high probability. The
proposed method does not
classify the genuine user as malicious. Many tests conducted on the Ethereum trading network
confirmed the success of the dedicated method for detecting fraudulent transactions.

II. RELATED WORK

The concept of blockchain was introduced to store information in ledgers of records that are
distributed over the decentralized networks forming the blockchain infrastructure. The Bitcoin
infrastructure has attracted many applications, and blockchain technology, in both open and closed
forms, is widely used in cryptocurrencies. Ethereum is an open-source platform.

The idea of Ethereum was first introduced at the end of 2013, and development of Ethereum began in
early 2014. It became popular with the rapid growth of blockchain. Scams involving digital
currencies, exchanges, and new technologies are not uncommon. Anomaly detection for the Ethereum
network and other public cryptocurrencies has been reviewed extensively in the existing literature,
and several detection methods have been suggested. In 2016, Pham et al. [13]-[15] extracted Bitcoin
features, examined them using power-law and densification-law analyses on user-graph and
transaction-graph representations of the network, and found three anomalous cases among 30 known
cases using local outlier factor, one-class support vector machine, and Mahalanobis distance
models. Vasek and Moore [16] conducted preliminary research on Bitcoin scams and identified four
types of fraud: 1) Ponzi schemes; 2) mining scams; 3) fake wallets; 4) fraudulent exchanges. Many
smart contract and ICO security studies [17]-[20] have been carried out on the Ethereum platform.
For example, Atzei et al. [17] studied the risks of Ethereum smart contracts and discussed key
attacks and vulnerabilities. A series of fraud detection studies on the Ethereum platform examined
smart contracts and produced the first complete overview of Ethereum Ponzi schemes. Later, Chen et
al. [21] proposed a method to detect complex Ponzi schemes using data mining and machine learning
techniques. In 2017, Toyoda et al. [22] analyzed Bitcoin transaction patterns related to high-yield
investment programs (HYIP). Using large-scale supervised learning on known transaction patterns and
address features, they identified more than fifteen hundred Bitcoin addresses with a true-positive
rate of 83% and a false-positive rate of 4.4%. In 2018, Vasek et al. [23] applied survival analysis
to Bitcoin Ponzi schemes to identify the factors influencing such fraud. They collected 1,424 posts
on Bitcointalk, identified 1780 different Bitcoin Ponzi schemes, and found a positive correlation
between the numbers. Their survival analysis indicated that the interactions between scammers and
their victims determine how long a scheme lasts. The availability of transaction records provides a
rich source of data that can help detect fraud in Ethereum transactions [24]. Although most
fraudsters have traditionally relied on phishing emails and webpages to obtain sensitive data from
consumers, modern anti-fraud techniques focus on identifying emails or webpages that may contain
false information [25]-[26].

III. METHODOLOGY

Here, we first give an overview of the dataset used and then describe the features extracted from
the Ethereum transaction record dataset. The section concludes with a description of the machine
learning classifiers proposed for this study.

A. Data Description
In this study we obtain the Ethereum transaction data from the Infura API, which provides secure
and reliable access to Ethereum APIs and IPFS gateways [28]. We label legitimate and malicious
transactions with 0 and 1, respectively.

B. Feature Extraction
Based on the measured variables provided by the Ethereum network, we introduce more meaningful
features that help the learning algorithm achieve its goal. In total, 16 features were extracted
from the Ethereum transaction network.
• Currency features: maximum value received, average value received, average value sent, total
  Ether sent, total Ether balance
• Network features: average minutes between sent transactions, average minutes between received
  transactions, time difference between first and last transaction (minutes), sent transactions,
  received transactions, number of created contracts
• ERC20 features: ERC20 total Ether received, ERC20 total Ether sent, ERC20 total Ether sent to
  contracts, ERC20 unique sent addresses, ERC20 unique received token names

C. Pre-Processing
Data preprocessing is a crucial step when using machine learning classifiers because raw data
contains noise and redundancy. The data must be cleaned to reduce or eliminate noise and adapted to
the machine learning model to improve its accuracy and efficiency.

D. Proposed Method
The objects in the dataset obtained through the Infura API were unlabeled, as discussed in Section
III-A; hence, in this study, the appropriate algorithm is selected to detect anomalous values by
learning the underlying structure of the clusters present in the network. The compared algorithms
are Decision Tree Classifier, Logistic Regression, KNN, Random Forest, and SVC. The results
obtained with these methods, including their accuracy, are discussed in the "Result" section.

IV. RESULT

In this section, we discuss the experimental results of the selected supervised machine learning
classifiers and present a comparative study with the help of Fig. 7.
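
As an illustration of the feature extraction described in Section III-B, the following minimal sketch computes a subset of the currency and network features for a single address. The paper does not give its extraction code, so the function names, the dictionary keys ("from", "to", "value", "timestamp"), the toy transactions, and the definition of balance as received minus sent are all assumptions made for this example.

```python
from statistics import mean


def avg_gap_minutes(txs):
    """Average minutes between consecutive transactions (0.0 if fewer than two)."""
    if len(txs) < 2:
        return 0.0
    return mean([(b["timestamp"] - a["timestamp"]) / 60 for a, b in zip(txs, txs[1:])])


def extract_features(address, transactions):
    """Compute a subset of per-address features.

    Each transaction is a dict with hypothetical keys:
    "from", "to", "value" (in Ether) and "timestamp" (Unix seconds).
    """
    by_time = sorted(transactions, key=lambda t: t["timestamp"])
    sent = [t for t in by_time if t["from"] == address]
    received = [t for t in by_time if t["to"] == address]
    times = [t["timestamp"] for t in sent + received]
    total_sent = sum(t["value"] for t in sent)
    total_received = sum(t["value"] for t in received)
    return {
        # Network features
        "sent_tx": len(sent),
        "received_tx": len(received),
        "avg_min_between_sent_tx": avg_gap_minutes(sent),
        "avg_min_between_received_tx": avg_gap_minutes(received),
        "time_diff_first_last_min": (max(times) - min(times)) / 60 if times else 0.0,
        # Currency features
        "max_val_received": max((t["value"] for t in received), default=0.0),
        "avg_val_received": total_received / len(received) if received else 0.0,
        "avg_val_sent": total_sent / len(sent) if sent else 0.0,
        "total_ether_sent": total_sent,
        # Assumption: balance approximated as received minus sent, ignoring fees
        "total_ether_balance": total_received - total_sent,
    }


# Toy data, invented for this example only
example_transactions = [
    {"from": "0xA", "to": "0xB", "value": 1.0, "timestamp": 0},
    {"from": "0xA", "to": "0xC", "value": 3.0, "timestamp": 600},
    {"from": "0xB", "to": "0xA", "value": 2.5, "timestamp": 1800},
]
features = extract_features("0xA", example_transactions)
print(features)
```

Each address then becomes one row of the feature table, with the label 0 (legitimate) or 1 (malicious) attached as described in Section III-A.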

• Performance Evaluation Metrics
To evaluate the performance of the techniques on the anomaly detection problem, we adopt three
evaluation metrics: precision, recall, and F-score.

The three metrics are defined as follows:

\[ \text{Precision} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}} \]

\[ \text{Recall} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}} \]

\[ \text{F-score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \]

A. Classification Performance
In this section, we first present the classification report of each method, followed by its
confusion matrix, which together give the accuracy measures of each classifier. Next, we provide a
comparison of the accuracy and F-score of the selected classification methods.

Table 1. Classification Report of Logistic Regression
               Precision (%)   Recall (%)   F1-score (%)   Support
0              97              88           92             1547
1              68              88           77             422
Accuracy       -               -            83             1969
Macro Avg      82              88           85             1969
Weighted Avg   90              88           89             1969

Fig.1. Confusion Matrix of Logistic Regression

Table 2. Classification Report of Decision Tree
               Precision (%)   Recall (%)   F1-score (%)   Support
0              97              88           92             1547
1              68              88           77             422
Accuracy       -               -            96             1969
Macro Avg      82              88           85             1969
Weighted Avg   90              88           89             1969

Fig.2. Confusion Matrix of Decision Tree Classifier

Table 3. Classification Report of Random Forest
               Precision   Recall   F1-score   Support
0              0.99        0.98     0.98       1547
1              0.93        0.95     0.94       422
Accuracy       -           -        0.98       1969
Macro Avg      0.96        0.97     0.96       1969
Weighted Avg   0.97        0.97     0.89       1969

Fig.3. Confusion Matrix of Random Forest
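
As a worked check of the metric definitions, the class-1 row of Table 3 can be recomputed from the Random Forest counts given in the discussion of the results (29 non-fraud cases flagged as fraud, and 21 of the 422 fraud cases missed). This is a minimal sketch; the function name `precision_recall_f` and the variable names are illustrative and not taken from the paper.

```python
def precision_recall_f(tp, fp, fn):
    """Return (precision, recall, F-score) from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


fraud_total = 422      # malicious data points in the test set
false_negatives = 21   # fraud cases Random Forest failed to detect
false_positives = 29   # non-fraud cases flagged as fraud
true_positives = fraud_total - false_negatives  # 401

precision, recall, f_score = precision_recall_f(
    true_positives, false_positives, false_negatives
)
print(round(precision, 2), round(recall, 2), round(f_score, 2))  # 0.93 0.95 0.94
```

The rounded values agree with the precision, recall, and F1-score reported for class 1 in Table 3.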

Table 4. Classification Report of SVC
               Precision   Recall   F1-score   Support
0              0.99        0.98     0.98       1547
1              0.93        0.95     0.94       422
Accuracy       -           -        0.84       1969
Macro Avg      0.96        0.97     0.96       1969
Weighted Avg   0.97        0.97     0.89       1969

Fig.4. Confusion Matrix of Support Vector Classifier

Table 5. Classification Report of KNN
               Precision   Recall   F1-score   Support
0              0.99        0.98     0.98       1547
1              0.93        0.95     0.94       422
Accuracy       -           -        0.84       1969
Macro Avg      0.96        0.97     0.96       1969
Weighted Avg   0.97        0.97     0.89       1969

Fig.5. Confusion Matrix of KNN

Fig.7. Performance Comparison of selected machine learning classifiers
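
The support columns in the classification reports show the imbalance between legitimate (1547) and malicious (422) test points, and the abstract states that SMOTE was used to handle this imbalance. The paper does not describe its SMOTE configuration, so the following is only a minimal plain-Python sketch of the interpolation idea behind SMOTE: each synthetic point lies on the line segment between a minority sample and one of its nearest minority neighbours. The function name `smote_sketch`, the parameter `k`, the seed, and the toy points are assumptions made for this example.

```python
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points by neighbour interpolation.

    minority: list of feature vectors (lists of floats), at least two points.
    k: number of nearest minority neighbours to choose from.
    """
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen sample (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sq_dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class: four points on the corners of the unit square
minority_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
synthetic_points = smote_sketch(minority_points, n_new=6)
print(synthetic_points)
```

In practice a library implementation such as the `SMOTE` class in imbalanced-learn would normally be used, and oversampling is commonly applied to the training split only so that the test set keeps its original class distribution; the paper does not state which implementation or split it used.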

Considering the nature of the dataset, we first chose the Logistic Regression model, which gives an
accuracy of 83%, as shown in Table 1. It correctly identifies 373 of the 422 fraud cases, which is
quite impressive; this improves with SVC, which reaches 84% accuracy when the model is evaluated on
the test set. The test set contains 422 malicious and 1547 non-malicious data points. SVC gives
better results than KNN when tested on the same data points. We obtain 96% accuracy with the
Decision Tree classifier, where we trained the model on ground truth. Figure 2 shows that the
Decision Tree correctly classified 392 of the 422 fraud cases. The Random Forest results show that
it does slightly better than the Decision Tree on non-fraud transactions, flagging 29 cases as
fraud when they were actually non-fraud. When it comes to identifying fraud cases, Random Forest
fails to detect 21 of the 422 fraud cases, as shown in Fig. 3, giving the best recall score.
Considering overall performance, Random Forest is our preferred choice. In this study we are not.
In this paper we care more about the transactions that are actually fraudulent than about those
that were treated as non-fraud.

V. CONCLUSION AND FUTURE WORK

To fulfil the objective of this paper, we have systematically investigated the detection of
fraudulent transactions in Ethereum using machine learning classifiers. Specifically, we applied
the SMOTE method to deal with the highly imbalanced dataset and trained machine learning algorithms
on features extracted from Ethereum transaction history. The study in this article is an
improvement on similar reports in terms of the number of known cases of fraud. Based on the
efficiency of our proposed detection system and the characteristics of the Ethereum transaction
network, the performance on real Ethereum transaction records demonstrates that Random Forest
outperforms the other classifiers. From this paper, we can conclude that fraud detection in
Ethereum trading can be implemented using machine learning methods. The model can determine the
level of suspicion by classifying each address as malicious or non-malicious by tracking potential
data points.

The main challenge in this paper is that the instances are unlabeled, which makes the results
difficult to validate. The issue of detecting malicious transactions in Ethereum has not been
effectively investigated, even though it poses security risks to Ethereum users. With this
research, we want to attract more interest and work in this area. With more domain knowledge and
more accurate analysis, more systematic and general algorithms can be designed for Ethereum and
other transaction wallets. In the future, the parallel segmentation method [24] can be applied to
detect windings.

In this way we find the proper structure of a fixed graph to identify anomalies.

REFERENCES

[1]. “The Truth About Blockchain.” https://hbr.org/2017/01/the-truth-about-blockchain (accessed Jun. 07, 2022).
[2]. Y. Yuan and F. Y. Wang, “Blockchain and Cryptocurrencies: Model, Techniques, and Applications,” IEEE Trans. Syst. Man, Cybern. Syst., vol. 48, no. 9, pp. 1421–1428, Sep. 2018, doi: 10.1109/TSMC.2018.2854904.
[3]. S. Wang, L. Ouyang, Y. Yuan, X. Ni, X. Han, and F. Y. Wang, “Blockchain-Enabled Smart Contracts: Architecture, Applications, and Future Trends,” IEEE Trans. Syst. Man, Cybern. Syst., vol. 49, no. 11, pp. 2266–2277, Nov. 2019, doi: 10.1109/TSMC.2019.2895123.
[4]. A. Holub and J. O’Connor, “COINHOARDER: Tracking a ukrainian bitcoin phishing ring DNS style,” eCrime Res. Summit, eCrime, vol. 2018-May, pp. 1–5, Jun. 2018, doi: 10.1109/ECRIME.2018.8376207.
[5]. “Ethereum under siege: Scammers make $700,000 in 6 days from Slack and Reddit phishing attacks | Cyware Alerts - Hacker News.” https://cyware.com/news/ethereum-under-siege-scammers-make-700000-in-6-days-from-slack-and-reddit-phishing-attacks-ec6c40c1 (accessed Jun. 07, 2022).
[6]. “The 2022 Crypto Crime Report,” no. February, 2022.
[7]. M. Conti, K. E. Sandeep, C. Lal, and S. Ruj, “A survey on security and privacy issues of bitcoin,” IEEE Commun. Surv. Tutorials, vol. 20, no. 4, pp. 3416–3452, Oct. 2018, doi: 10.1109/COMST.2018.2842460.
[8]. M. Khonji, Y. Iraqi, and A. Jones, “Phishing detection: A literature survey,” IEEE Commun. Surv. Tutorials, vol. 15, no. 4, pp. 2091–2121, 2013, doi: 10.1109/SURV.2013.032213.00009.
[9]. D. Lin, J. Wu, Q. Yuan, and Z. Zheng, “Modeling and Understanding Ethereum Transaction Records via a Complex Network Approach,” IEEE Trans. Circuits Syst. II Express Briefs, vol. 67, no. 11, pp. 2737–2741, Nov. 2020, doi: 10.1109/TCSII.2020.2968376.
[10]. P. Zheng, Z. Zheng, J. Wu, and H.-N. Dai, “XBlock-ETH: Extracting and Exploring Blockchain Data From Ethereum,” IEEE Open J. Comput. Soc., vol. 1, pp. 95–106, May 2020, doi: 10.1109/OJCS.2020.2990458.
[11]. I. Alqassem, I. Rahwan, and D. Svetinovic, “The Anti-Social System Properties: Bitcoin Network Data Analysis,” IEEE Trans. Syst. Man, Cybern. Syst., vol. 50, no. 1, pp. 21–31, Jan. 2020, doi: 10.1109/TSMC.2018.2883678.
[12]. P. Monamo, V. Marivate, and B. Twala, “Unsupervised learning for robust Bitcoin fraud detection,” 2016 Inf. Secur. South Africa - Proc. 2016 ISSA Conf., pp. 129–134, 2016, doi: 10.1109/ISSA.2016.7802939.
[13]. M. Vasek and T. Moore, “There’s no free lunch, even using bitcoin: Tracking the popularity and profits of virtual currency scams,” Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), vol. 8975, pp. 44–61, 2015, doi: 10.1007/978-3-662-47854-7_4.
[14]. N. Atzei, M. Bartoletti, and T. Cimoli, “A survey of attacks on Ethereum smart contracts (SoK),” Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), vol. 10204 LNCS, pp. 164–186, 2017, doi: 10.1007/978-3-662-54455-6_8.
[15]. I. Grishchenko, M. Maffei, and C. Schneidewind, “A semantic framework for the security analysis of ethereum smart contracts,” Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), vol. 10804 LNCS, pp. 243–269, 2018, doi: 10.1007/978-3-319-89722-6_10.
[16]. G. Fenu, L. Marchesi, M. Marchesi, and R. Tonelli, “The ICO phenomenon and its relationships with ethereum smart contract environment,” 2018 IEEE 1st Int. Work. Blockchain Oriented Softw. Eng. IWBOSE 2018 - Proc., vol. 2018-January, pp. 1–7, Mar. 2018, doi: 10.1109/IWBOSE.2018.8327568.
[17]. C. Ferreira Torres, M. Steichen, and R. State, “The Art of The Scam: Demystifying Honeypots in Ethereum Smart Contracts,” Accessed: Jun. 10, 2022. [Online]. Available: https://www.usenix.org/conference/usenixsecurity19/presentation/ferreira.
[18]. M. Vasek and T. Moore, “Analyzing the Bitcoin Ponzi scheme ecosystem,” Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), vol. 10958 LNCS, pp. 101–112, 2019, doi: 10.1007/978-3-662-58820-8_8.
[19]. M. Bartoletti, S. Carta, T. Cimoli, and R. Saia, “Dissecting Ponzi schemes on Ethereum: Identification, analysis, and impact,” Futur. Gener. Comput. Syst., vol. 102, pp. 259–277, Jan. 2020, doi: 10.1016/J.FUTURE.2019.08.014.
[20]. W. Chen, Z. Zheng, J. Cui, E. Ngai, P. Zheng, and Y. Zhou, “Detecting ponzi schemes on ethereum: Towards healthier blockchain technology,” Web Conf. 2018 - Proc. World Wide Web Conf. WWW 2018, vol. 4, pp. 1409–1418, Apr. 2018, doi: 10.1145/3178876.3186046.
[21]. N. Abdelhamid, A. Ayesh, and F. Thabtah, “Phishing detection based Associative Classification data mining,” Expert Syst. Appl., vol. 41, no. 13, pp. 5948–5959, Oct. 2014, doi: 10.1016/J.ESWA.2014.03.019.
[22]. E. Medvet, E. Kirda, and C. Kruegel, “Visual-similarity-based phishing detection,” Proc. 4th Int. Conf. Secur. Priv. Commun. Networks, Secur., 2008, doi: 10.1145/1460877.1460905.
[23]. M. Zouina and B. Outtaj, “A novel lightweight URL phishing detection system using SVM and similarity index,” Human-centric Comput. Inf. Sci., vol. 7, no. 1, Dec. 2017, doi: 10.1186/S13673-017-0098-1.
[24]. “Bee Token ICO Stung by $1 Million Phishing Scam - CoinDesk.” https://www.coindesk.com/markets/2018/02/01/bee-token-ico-stung-by-1-million-phishing-scam/ (accessed Jun. 10, 2022).
[25]. “New rule-based phishing detection method | Semantic Scholar.” https://www.semanticscholar.org/paper/New-rule-based-phishing-detection-method-Moghimi-Varjani/b7578f362b7f043c88f49ad707097c271cf7b65c (accessed Jun. 10, 2022).
[26]. “Machine learning based phishing detection from URLs | Semantic Scholar.” https://www.semanticscholar.org/paper/Machine-learning-based-phishing-detection-from-URLs-Sahingoz-Buber/8dd4a8eefa366b1b7d2471c1b8580df5bea23924 (accessed Jun. 10, 2022).

IJISRT22JUL932 www.ijisrt.com 1175
