
Volume 9, Issue 11, November – 2024 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165

Taxes and Finance Field Using Machine Learning Techniques: A Survey

Abeer A Shujaaddeen1*; Fadl Mutaher Ba-Alwi12; Abdulkader M. Al-Badani3
1 Computer Science Department, Faculty of Computer and Information Technology, Sana'a University, Yemen
2 Computer Science Department, Faculty of Computer and Information Technology, Sana'a University, Yemen
3 Faculty of Science and Engineering, Department of Computers, Aljazeera University, Ibb, Yemen

Corresponding Author:- Abeer A Shujaaddeen 1*

Abstract:- Taxes are among the most important sources of revenue for developed and developing countries alike, because they fund the advancement of the country. A tax is an amount that the state imposes on companies and individuals. However, many taxpayers try to evade tax in several ways, such as lying on the declaration form or hiding part of their data. Therefore, many countries have implemented procedures and regulations to reduce tax evasion. Recently, they have turned to artificial intelligence techniques, namely machine learning (ML) and deep learning (DL) methods such as neural networks, decision trees, random forests, and clustering techniques such as K-means. In this paper, we summarize the efforts of a group of countries to detect tax and financial evasion and fraud.

Keywords:- Taxes, Tax Fraud, Taxpayers, Machine Learning, and Deep Learning.

I. INTRODUCTION

Tax can be defined as a monetary payment made to the government by individuals and organizations. Its purpose is to provide funding for public sectors that are under the administration of the government. These sectors include education, which encompasses schools, teachers' salaries, and the salaries of workers in ministries and government institutions. Additionally, tax revenue is used to support various aspects such as hygiene, economic policies, and the maintenance of state infrastructure, including sanitation, dam construction, and unemployment insurance [1].

Tax fraud is a broad term used to describe the intentional actions taken by individuals or organizations to unlawfully evade tax payments. It involves concealing the true financial status of the taxpayer from tax authorities to minimize the amount of taxes owed. This can include submitting false tax reports, such as underreporting profits or providing inaccurate information. Tax fraud is commonly associated with activities conducted in the informal economy. One way to measure tax fraud is through the "tax gap," which represents the discrepancy between the income that should be reported to tax authorities and the actual amount reported. In essence, tax fraud involves providing false information on a tax return form to reduce tax liability [2].

The persistent issue of aggressive tax avoidance and the reluctance of certain tax practitioners to collaborate with tax administrations continue to pose significant challenges. Concurrently, business leaders in some developing countries express concerns about being held to higher standards compared to other taxpayers [3].

Local tax authorities, who are responsible for developing cost-effective solutions to address this issue, place a high priority on identifying and preventing tax fraud. The use of machine learning algorithms has been at the forefront of several recent efforts to detect tax fraud.

Machine learning and artificial intelligence play a crucial role in combating tax and financial evasion. They do so by using algorithms to detect potential wrongdoing and by analyzing transactions in real time, thereby reducing fraud. Machine learning and deep learning techniques are essential for these systems to function well.

Research and Approaches That Use Machine Learning and Deep Learning Techniques to Detect Tax and Financial Fraud:

According to [4], supervised machine learning techniques face challenges in detecting tax fraud, particularly in the Colombian context, due to the limited availability of historically labeled data. Auditing requires significant time and resources, making it difficult to generate labeled data for training supervised algorithms. Consequently, the generalization power of these algorithms is hindered, limiting their effectiveness. The researchers propose a technique that enables tax authorities to prioritize audits based on data-driven methods, without relying on historically labeled data.

The results produced using this method show that the model can identify questionable tax declarations and flag them as suspicious without the need for past labeled data, improving operational efficiency.

Based on tax invoices, the researchers in [5] propose a compositional CNN-RNN structure that incorporates an attention mechanism to classify transaction behavior. This classification provides a fresh viewpoint for examining the regional industrial structure and is essential for tax oversight.

IJISRT24NOV1974 www.ijisrt.com 3035

According to preliminary research, transaction behavior can be classified with an overall accuracy of 75%.

In [6], the paper examines the tax planning landscape in the context of artificial intelligence and big data. It addresses tax planning issues within the framework of big data and suggests utilizing these technologies to optimize tax planning. A new model is created when big data and tax planning are combined.

The paper in [7] reports the preliminary findings of a collaborative scientific research initiative between the Tax Administration and the Faculty of Sciences at the University of Novi Sad. The project's goal is to create algorithms for detecting the risk of tax evasion using advanced big data analytics and artificial intelligence techniques, as well as machine learning. The presented approach is based on an indicator that compares a legal entity's income distribution to the average income distribution in the relevant business sector. The results illustrate the effectiveness of the developed indicator.

In [8], the researchers propose a universal architecture termed the unsupervised conditional adversarial network (UCAN) for identifying tax evasion. This approach is the first attempt to address audit tasks in unlabeled target domains via inter-region transfer. The architecture makes use of an adversarial neural network and incorporates label information into the distribution adapter, which allows for fine-grained adaptation of the data's joint probability distribution. The model applies a constraint based on the conditional maximum mean discrepancy (CMMD) of the extracted features to align the conditional probability distribution (CPD) of the deep representation. The model combines the distribution adapter and the label predictor to allow for end-to-end learning of unsupervised feature transfers. Experimental results illustrate the model's remarkable performance in numerous migration tasks compared to state-of-the-art approaches.

The research in [9] focuses on identifying tax fraud in Spanish personal income tax returns (IRPF). The study makes use of machine learning-based forecasting techniques, notably Multilayer Perceptron neural network (MLP) models. Using neural networks, the researchers were able to segment the taxpayers and assess the probability that a particular taxpayer would attempt to evade taxes. The chosen model outperformed previous tax fraud detection models, with an efficiency rate of 84.3%. The suggested method might be extended to measure a person's propensity for tax fraud with regard to other kinds of taxes. These models can help tax offices make defensible choices.

The study in [10] has two goals. First, it seeks to find out how small and medium-sized enterprises (SMEs) view the existing situation and strategies for cutting administrative expenses. Second, it examines the connection between the costs of tax policies and entrepreneurial activity using descriptive statistics and hierarchical cluster analysis. Datasets from Slovenia and the European Union (EU) are analyzed independently. The results indicate that the total amount of early-stage entrepreneurial activity and the density of new businesses in EU nations are more significantly impacted by tax administrative costs.

Furthermore, the findings for Slovenia highlight the need for a reliable tax system, with an emphasis on information technology and procedural measures.

In [11], the researchers explore the challenge of financial fraud and how financial organizations are using mining tools to counter it. The paper presents an overview of fraud strategies, with a particular emphasis on machine learning, data mining, and preventative techniques like clustering, classification, and regression. The goal is to use mining techniques to create remedies for financial fraud.

The study in [12] addresses the issue of establishing the strategy of a self-interested, risk-averse tax body. The study uses Q-learning and recent advances in Deep Reinforcement Learning to obtain approximate solutions. The research entails identifying the expected tax evasion behavior of taxpayer entities, establishing the risk aversion level of the "average" entity using empirical tax evasion estimates, and evaluating sample tax plans. The model serves as a testbed for tax policies and makes various policy recommendations based on the outcomes.

In [13], the study discusses known strategies for identifying tax evasion in databases using expert systems. It compares the suggested expert system to various strategies for improving tax evasion detection. The study proposes an abstract solution based on an expert system in the domain of tax evasion, complete with performance modeling. The expert system builder acts as an interface for personnel working with the defined expert system. The results show that the suggested expert system detects tax evasion trends with a high level of accuracy.

The study in [14] introduces a conceptual framework that aims to establish a solid methodological and theoretical basis for employing data analytics in the field of taxation. The research primarily concentrates on the use of operational data by tax authorities and identifies machine learning techniques that prove effective in detecting particular forms of fraud.

In [15], the researchers use data mining tools to detect fraud in banking by leveraging the data already collected by the bank. They employ supervised machine learning techniques, specifically support vector machines, to detect fraudulent transactions based on intentional and unintentional client reactions and new transactions. Using a database of credit card transactions, the support vector machine algorithm successfully identifies customers engaged in fraudulent transactions.

The study in [16] addresses the economic impact of unpaid taxes by suggesting an automated system for forecasting tax defaults. The researchers use a variety of feature transformation techniques as well as recent machine learning algorithms. The prediction algorithm is validated using a dataset containing information on tax
defaults and non-defaults in Finnish limited liability enterprises.

In [17], the researchers look into the use of unsupervised and semi-supervised machine learning approaches to detect abnormal tax returns for the Norwegian Tax Administration. They investigate the capabilities of these strategies and examine how different dataset aspects affect their performance. The goal is to discover appropriate ways of detecting new types of errors, resulting in a reduction in tax errors that affect tax revenue.

The research in [18] tackles the scarcity of labeled data in the domain of tax fraud detection. To overcome this challenge, the researchers use unsupervised anomaly detection methods, which are not commonly employed in tax fraud detection studies. They examine a distinctive dataset that incorporates VAT declarations and client listings for all VAT numbers in Belgium across ten sectors.

The study in [19] seeks to review the body of research on audit and tax from the perspective of developing technology while also establishing a research agenda for the future. By combining text analysis and bibliometrics, the researchers use a meta-literature technique to assess 154 notable English papers published in Scopus journals during the last 35 years. The programs used in the study included RStudio, VOSviewer, and Microsoft Excel.

In [20], social planners and economic agents are trained via model-free reinforcement learning (RL) in AI-based economic simulations. The fundamental advantage of model-free RL is its flexibility, which allows the planner to employ any social objective as a reward function. Furthermore, no prior world knowledge is required to design a successful tax policy.

The paper in [21] introduces a method called MALDIVE for assisting tax authorities in tax risk assessment to find tax evasion and avoidance. MALDIVE uses a network model to describe the numerous connections among taxpayers. To help public servants identify problematic taxpayers, an approach that combines data mining and visual analytics methodologies has been developed. The paper provides a four-step implementation process for MALDIVE.

The study in [22] analyzes tax evasion detection as a critical function of tax administration and develops a model for estimating the likelihood of tax evasion that incorporates quantitative and qualitative markers. The study employs research techniques such as systematic analysis, scientific abstraction, logical generalization, expert review, and statistical analysis. The study evaluates the chance of identifying tax evasion in the Republic of Azerbaijan using the proposed model, and the results show a 29% probability. The findings suggest the need for improvements in the tax administration mechanism in Azerbaijan, emphasizing the practical significance of the proposed model in enhancing the effectiveness of tax institutions and impacting state budget revenue through the determination of the probability of detecting tax evasion.

In [23], the study focuses on modeling tax behavior in the expatriate community. The researchers analyze survey results from the "Ethical Obligation to Pay Fair Taxation Survey" to identify possible combinations, resulting in the identification of 18 structures. Using a big data strategy, data on these 18 structures is collected, resulting in 2090 pages of data containing 377,783 words related to tax evasion. The data is pre-processed and analyzed using KH Coder, a text analysis tool. The interpretation of the data leads to the reduction of the 18 structures to seven comprehensive structures. A literature review is conducted based on these seven "basic" structures. The data is analyzed using KH Coder and machine learning techniques, resulting in a new tax evasion model with seven dimensions: 1. Taxation of the Rich, 2. Implementation Strategies, 3. Business Tax Planning, 4. Capital Gains Tax, 5. Inequality of Wealth and Power, 6. Economic Effects of Taxes, and 7. Audits and Materiality.

The study in [24] proposes applying machine learning to decision-making in fiscal audit plans related to service taxes in the municipality of São Paulo. The researchers use machine learning, specifically Random Forests, to forecast crimes against the tax system. The findings show that Random Forests outperform other learning algorithms in terms of tax crime prediction. Random Forests also have strong generalization ability. Improved projections result in more efficient audit strategies, more tax income, and taxpayer compliance with tax regulations.

The study in [25] examines how artificial intelligence (AI) is used in the Indian revenue system. The researchers take into account variables such as tax expertise, tax education, tax complexity, legal penalties, interactions with tax authorities, ethics, perceptions of the tax system's fairness, feelings about paying taxes, knowledge of offenses and penalties, tax compliance, and the likelihood of an audit. The goal of the study is to understand how AI might affect these variables and possibly improve the Indian taxation system.

The study in [26] proposes a novel hybrid machine learning-based technique for mitigating the risk of tax fraud. The approach incorporates domain information into the model, resulting in an explainable decision tree (DT) model that domain experts can verify. It also contains an anomaly validation function that employs two separate anomaly detection methods (K-means and autoencoder). The method is intended to detect tax fraud involving personal income and makes use of big data techniques to improve tax fraud detection.

In [27], the researchers demonstrate the use of machine learning and network science tools to automatically identify patterns of tax evaders. This has potential applications in various areas such as bribery practices, money laundering, and other illegal activities, benefiting society. However, caution should be exercised when applying these methods, and their limitations should be considered.

The paper in [28] describes a machine learning-based approach for detecting tax evasion in Espírito Santo, Brazil. Four classifiers (Random Forest, k-Nearest Neighbors, Neural Network, and Support Vector Machine) are trained using tax and financial data from diverse organizations. The Random Forest classifier performs the best, with a macro-averaged F1 score of 92.98%. The study illustrates Random Forest's ability to produce reliable outcomes.

In [29], the researchers discuss financial statement fraud, which is becoming a major issue for governments, businesses, and investors. They offer a hybrid system that includes a support vector machine, an upgraded ID3 decision tree, multilayer perceptron neural networks, and a genetic algorithm to improve accuracy and performance. The model was evaluated on financial statements from Tehran Stock Exchange-listed companies, and it predicted financial statement fraud with an accuracy of about 80%.

The study in [30] explores the range of applications of machine learning, including recommendation systems, fraud detection, customer behavior prediction, image recognition, speech recognition, black-and-white movie colorization, and accounting fraud detection. The focus is on the use of neural networks in finance, accounting, and research fields. The researchers emphasize that machine learning in accounting research has not yet reached its full potential.

In [31], the researchers discuss the increasing threat of financial fraud and the need for solutions in the financial sector. They present an overview of different fraud techniques and emphasize the importance of continually improving fraud detection systems. Machine learning and data mining techniques, such as classification, clustering, and regression, have been widely used in recent studies for fraud prevention.

In [32], the researchers employed machine learning approaches to address the difficulty of detecting fraud among a varied set of taxpayers. They created a fraud prediction model with gradient boosting as the core method. Despite working with a limited sample size and a broad definition of fraud, the study was able to identify key elements from tax returns with little further information. The results showed that the predicted fraud rate among the top cases was almost 1.85 times higher than the average observed rate. This study demonstrates the usefulness of the proposed model in predicting and identifying potential cases of fraud within the taxpayer community.

The study in [33] used machine learning techniques to detect tax evasion. To find optimal weights, the researchers modified the multilayer perceptron neural network with an improved particle swarm optimization (IPSO) technique. They also improved support vector machine (SVM) classifiers by tuning their parameters. The suggested IPSO-MLP model beat the IPSO-SVM, logistic regression, SVM, Naive Bayes, k-nearest neighbor, AdaBoost, and C5.0 decision tree models in terms of accuracy. The IPSO-MLP model obtained 93.68% accuracy, whereas the IPSO-SVM model achieved 92.24%.

In [34], the researchers proposed machine learning-based predictive analytics as a decision support system for exploiting latent tax opportunities. They developed three machine learning models: decision tree, random forest, and logistic regression. Using trigger data and other predictors, they analyzed 5,562 samples of potential tax income. The random forest model produced the most precise prediction outcomes.

The study in [35] aimed to establish a fraud detection system in tax. The researchers employed predictive techniques and feature extraction to identify fraud trends and anticipate future tax payments. They were able to use the random algorithm to anticipate the amount of future tax each individual should pay.

The purpose of [36] was to identify tax fraud features with a supervised machine learning model. The researchers compared numerous models, including Gaussian NB, XGBoost, Random Forest, Decision Tree, and Logistic Regression. The evaluation metrics showed that artificial neural networks were the most accurate model for predicting tax fraud.

The primary goal of the study in [37] was to improve the effectiveness of detecting tax fraud in Lithuania by using data mining technologies. The researchers created models for segmentation, behavioral templates, risk assessment, and tax criminal detection. The findings demonstrated the capacity of data mining tools to detect tax evasion and access confidential data, which can help reduce revenue losses due to tax evasion. The study's findings can help scientists, professionals, and decision-makers anticipate tax fraud detection in developing countries.

The researchers in [38] offer a framework for identifying tax fraud. There are four modules in the framework:

Supervised Module: A tree-based model is used in this module to draw knowledge from the data. It is trained on labeled data in a supervised manner. The objective is to identify data correlations and trends that may point to probable fraud.

Unsupervised Module: The unsupervised module is responsible for determining anomaly scores. It identifies patterns that deviate significantly from the norm or exhibit unusual behavior. These anomalies can be indicative of fraudulent activities.

Behavioral Module: The behavioral module calculates a taxpayer's compliance score. It assesses the taxpayer's historical behavior, such as past compliance with tax regulations, timely filing of returns, and accuracy of reported information. A low compliance score may suggest a higher likelihood of fraud.

Prediction Module: To ascertain the possibility of fraud for each tax return, the prediction module makes use of the outputs from the previous modules. To produce a thorough fraud prediction score, it incorporates the findings from the behavioral, unsupervised, and supervised modules.

The effectiveness of the framework was demonstrated by testing it on actual tax returns provided by the Saudi tax administration. The researchers evaluated its performance in detecting tax fraud based on the framework's outputs and compared them to known instances of fraud. The results showed the framework's ability to effectively identify potential cases of tax fraud.

Overall, this study presents a comprehensive framework that combines the techniques of supervised and unsupervised learning with behavioral analysis for detecting tax fraud. By integrating multiple modules, it provides a holistic approach to identifying potentially fraudulent activities in tax returns.

II. THE ALGORITHMS AND TECHNIQUES USED IN TAX AND FINANCIAL FRAUD DETECTION

The papers reviewed above use many ML and DL techniques, covering both supervised and unsupervised learning, as follows:

A. Naive Bayes (NB).
B. Decision Tree (DT).
C. Random Forest (RF).
D. K-means Clustering.
E. Artificial Neural Network and Deep Learning.
F. Self-Organizing Map (SOM).
G. Autoencoder.

A. Naive Bayes (NB)
The naive Bayes technique is based on Bayes' theorem and assumes that every pair of features is conditionally independent given the class. It works well and may be used for both binary and multi-class problems, such as text or document classification and spam filtering. The NB classifier can be used to build a reliable prediction model and to classify noisy instances in the data. Its primary advantage is that it needs less training data than more involved approaches, allowing for faster estimation of the parameters. However, because it makes such strong assumptions about the independence of features, its performance may be compromised. The most common NB classifier variants are Gaussian, Complement, Multinomial, Categorical, and Bernoulli [38].

Fig 1 Naive Bayes
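As an illustration of the Gaussian variant mentioned above, the following is a minimal sketch written for this summary, not code from any of the surveyed papers. The feature names (declared income, deductions), the labels, and the toy values are hypothetical. Each feature is modeled independently per class, which is exactly the "naive" assumption.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate the class prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small constant keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (n / len(X), means, variances)
    return model

def predict_gaussian_nb(model, row):
    """Return the class with the highest log-posterior for one sample."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        # Independence assumption: per-feature log-likelihoods simply add up.
        for v, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy data: [declared_income, deductions] in arbitrary units.
X = [[50, 5], [52, 6], [48, 4], [20, 15], [22, 14], [18, 16]]
y = ["compliant"] * 3 + ["suspicious"] * 3
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [51, 5]))
```

Only a mean and a variance per feature and class are estimated, which is why the method needs comparatively little training data.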

B. Decision Tree (DT)

Decision trees are a popular nonparametric supervised learning approach. DT learning techniques are applied to both classification and regression problems. The most prevalent DT algorithms are ID3, C4.5, and CART (classification and regression trees). Furthermore, Sarker et al.'s recently developed BehavDT and IntruDTree are effective in the relevant application fields of user behavior analytics and cybersecurity analytics, respectively. A DT classifies instances by sorting them down the tree from the root to a leaf: starting at the root node, each instance is tested on an attribute and moves along the branch that corresponds to its attribute value. Two typical splitting criteria are "gini" for the Gini impurity and "entropy" for the information gain.


Fig 2 Decision Tree
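The two splitting criteria named above can be sketched as follows. This is an illustrative example under assumed toy data, not an implementation of ID3, C4.5, or CART. The function `best_split` is a hypothetical helper that scans every feature and threshold and keeps the binary split with the lowest weighted impurity, which is the step a tree learner repeats at each node.

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits; information gain is the drop in this value."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(X, y, criterion=gini):
    """Return (feature_index, threshold, weighted_impurity) of the best split."""
    best = (None, None, float("inf"))
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            if not left or not right:
                continue  # a split must send samples to both branches
            score = (len(left) * criterion(left)
                     + len(right) * criterion(right)) / len(y)
            if score < best[2]:
                best = (j, t, score)
    return best

# Hypothetical toy data: [declared_income, deductions]; 1 marks a suspicious return.
X = [[50, 5], [52, 6], [48, 4], [20, 15], [22, 14], [18, 16]]
y = [0, 0, 0, 1, 1, 1]
feature, threshold, impurity = best_split(X, y)
print(feature, threshold, impurity)
```

Passing `criterion=entropy` switches the same search to the entropy criterion.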

C. Random Forest (RF)
A random forest classifier is an ensemble classification technique used in a variety of machine learning and data science applications. The method employs "parallel ensembling": it fits many decision tree classifiers in parallel on different subsamples of the dataset and combines them by majority vote (or by averaging) to reach the final result. As a result, it reduces overfitting while improving prediction accuracy and control, so an RF model with several decision trees generally outperforms a model with only one decision tree. Bootstrap aggregation (bagging) is paired with random feature selection to generate a collection of decision trees with controlled variation.

It may be used for both classification and regression problems, and it works with both continuous and categorical data [39]. A random forest is thus an ensemble variant of the decision tree that generates multiple trees. Each tree is built from a subset of the features rather than from every feature. Each tree predicts a class, and the final class prediction is based on a majority vote among the trees [40].

Fig 3 Random Forest
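The three ingredients described above (bootstrap sampling, random feature selection, and majority voting) can be sketched in a few lines. To keep the example short, each "tree" here is only a one-split stump, which is a simplifying assumption of this sketch; a real random forest grows deeper trees. The data and all function names are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def fit_stump(X, y, features):
    """Fit a one-split tree (stump) restricted to the given feature indices."""
    majority = Counter(y).most_common(1)[0][0]
    best = (float("inf"), None, None, majority, majority)
    for j in features:
        for t in sorted({row[j] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[0]:
                best = (score, j, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    return best[1:]  # (feature, threshold, left_label, right_label)

def predict_stump(stump, row):
    j, t, left_label, right_label = stump
    if j is None:  # no valid split was found: fall back to the majority class
        return left_label
    return left_label if row[j] <= t else right_label

def fit_forest(X, y, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]          # bootstrap sample (bagging)
        feats = rng.sample(range(len(X[0])), n_features)  # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict_forest(forest, row):
    """Majority vote over the predictions of all trees."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: [declared_income, deductions]; 1 marks a suspicious return.
X = [[50, 5], [52, 6], [48, 4], [20, 15], [22, 14], [18, 16]]
y = [0, 0, 0, 1, 1, 1]
forest = fit_forest(X, y)
print(predict_forest(forest, [60, 0]), predict_forest(forest, [10, 20]))
```

Because every stump sees a different bootstrap sample and feature subset, individual stumps may err, while the vote over the ensemble stays stable.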

D. K-means Clustering
K-means clustering is a robust, fast, and simple method that yields reliable results when the clusters in the data are well separated. The method assigns data points to clusters in a way that minimizes the squared distance between the data points and their cluster's centroid. To put it another way, the K-means algorithm computes k centroids and then allocates each data point to the nearest cluster. The results may be inconsistent from run to run because the procedure begins with randomly chosen cluster centers. The K-means clustering approach is also susceptible to outliers because extreme values can easily change the mean [39].

Fig 4 K-means Clustering
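The assignment and update steps described above can be sketched as follows. This is a minimal illustration on hypothetical toy points, not code from the surveyed papers. The optional `init` argument is an assumption added here so that a run can be made repeatable; by default the centers are chosen at random, which is the source of the run-to-run variation mentioned above.

```python
import random

def kmeans(points, k, init=None, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    rng = random.Random(seed)
    centroids = [list(c) for c in (init or rng.sample(points, k))]
    clusters = [[] for _ in range(k)]
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Squared Euclidean distance from p to every centroid.
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[d.index(min(d))].append(p)
        new_centroids = [
            [sum(col) / len(cl) for col in zip(*cl)] if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:  # assignments are stable: converged
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical toy data with two well-separated groups.
points = [[1, 1], [1, 2], [2, 1], [10, 10], [10, 11], [11, 10]]
centroids, clusters = kmeans(points, k=2, init=[[1, 1], [10, 10]])
print(centroids)
```

Adding a single extreme point such as `[100, 100]` to the data would pull one centroid far from its group, which illustrates the sensitivity to outliers.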

E. Artificial Neural Network and Deep Learning
Deep learning is part of a broad family of artificial neural network (ANN) methods that combine machine learning with representation learning. It offers a computational framework for learning from data by stacking many processing layers: input, hidden, and output layers. Its primary advantage is that it outperforms other methods in many circumstances, especially when learning from massive datasets.

Commonly used deep learning methods include Convolutional Neural Networks (CNN), Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN), and the Multi-Layer Perceptron (MLP) [39, 40].

Fig 5 Multilayer Perceptron
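To make the layer structure concrete, the sketch below runs the forward pass of a small MLP with one input layer, one hidden layer, and one output layer. The weights are hand-picked for illustration so that the network computes XOR; they are not learned, and training (for example by backpropagation) is deliberately left out of this sketch.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases):
    """One fully connected layer: sigmoid(W x + b), one output per weight row."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def mlp_forward(x, layers):
    """Propagate x through a list of (weights, biases) layers."""
    activation = x
    for weights, biases in layers:
        activation = dense(activation, weights, biases)
    return activation

# Hand-picked (not learned) weights for a 2-2-1 network that computes XOR.
hidden = ([[20.0, 20.0], [-20.0, -20.0]], [-10.0, 30.0])  # OR-like and NAND-like units
output = ([[20.0, 20.0]], [-30.0])                         # AND-like unit
xor_net = [hidden, output]

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, round(mlp_forward(x, xor_net)[0]))
```

The hidden layer is what lets the network represent XOR, a function that no single-layer perceptron can express.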

F. Self-Organizing Map (SOM)
A self-organizing map (SOM) is an artificial neural network (ANN) trained using unsupervised learning to construct a discretized, low-dimensional (usually two-dimensional) representation of the input space of the training samples. This representation, known as a map, serves to reduce dimensionality. Instead of the error-correction learning used by other ANNs (e.g., backpropagation with gradient descent), SOMs use competitive learning, which relies on a neighborhood function to preserve the topological features of the input space [41].

Fig 6 Self-Organizing Map (SOM)
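The competitive-learning update with a neighborhood function can be sketched as follows. For brevity the map here is one-dimensional (a chain of nodes) rather than the usual two-dimensional grid, and the Gaussian neighborhood and the linear decay schedule are common choices assumed for this illustration, not details taken from [41].

```python
import math

def best_matching_unit(weights, x):
    """Index of the map node whose weight vector is closest to x."""
    dists = [sum((w - v) ** 2 for w, v in zip(node, x)) for node in weights]
    return dists.index(min(dists))

def som_step(weights, x, lr=0.5, sigma=1.0):
    """One competitive-learning update on a 1-D map (nodes indexed 0..n-1)."""
    bmu = best_matching_unit(weights, x)
    updated = []
    for i, node in enumerate(weights):
        # Gaussian neighborhood: nodes close to the winner on the map move more.
        h = math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))
        updated.append([w + lr * h * (v - w) for w, v in zip(node, x)])
    return updated

def train_som(data, weights, epochs=20, lr=0.5, sigma=1.0):
    """Repeat the update while shrinking the learning rate and the radius."""
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs
        for x in data:
            weights = som_step(weights, x, lr * decay, max(sigma * decay, 0.1))
    return weights

# Three map nodes with 2-D weight vectors, and one input sample.
weights = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
updated = som_step(weights, [2.0, 3.0])
print(best_matching_unit(weights, [2.0, 3.0]), updated)
```

Only distances between inputs and weights are used, so no labels or error signal are needed. The winning node moves most, and its map neighbors follow in proportion to their closeness, which is what preserves topology.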

G. Autoencoder

An autoencoder consists of an encoder and a decoder, which are symmetrical, as shown in Fig 7. The encoder extracts features from the raw data, and the decoder reconstructs the data from those extracted features. During training, the divergence between the encoder input and the decoder output gradually decreases. When the decoder successfully reconstructs the data from the extracted features, the features produced by the encoder can be taken to represent the content of the data. It is important to note that no part of this procedure requires labeled data. There are other varieties of autoencoders, such as sparse and denoising ones [7].

Fig 7 An Autoencoder
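
The encode, decode, and compare loop described above can be sketched as follows. This is a deliberately small example under stated assumptions: both the encoder and the decoder are single linear layers, training uses plain stochastic gradient descent on the squared reconstruction error, and the toy data, learning rate, and function names are hypothetical. Practical autoencoders usually add nonlinear activations and more layers.

```python
import random


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def reconstruction_error(data, enc, dec):
    """Mean squared difference between each input and its reconstruction."""
    total = 0.0
    for x in data:
        y = matvec(dec, matvec(enc, x))
        total += sum((yi - xi) ** 2 for yi, xi in zip(y, x))
    return total / len(data)


def train_autoencoder(data, code_size, epochs=500, lr=0.05, seed=0):
    """Fit a linear encoder/decoder pair by stochastic gradient descent."""
    rng = random.Random(seed)
    n = len(data[0])
    enc = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(code_size)]
    dec = [[rng.uniform(-0.5, 0.5) for _ in range(code_size)] for _ in range(n)]
    history = [reconstruction_error(data, enc, dec)]
    for _ in range(epochs):
        for x in data:
            code = matvec(enc, x)   # encoder: input -> features
            y = matvec(dec, code)   # decoder: features -> reconstruction
            err = [yi - xi for yi, xi in zip(y, x)]
            # Gradient of 0.5 * ||y - x||^2 with respect to the code.
            g_code = [
                sum(err[i] * dec[i][j] for i in range(n))
                for j in range(code_size)
            ]
            for i in range(n):
                for j in range(code_size):
                    dec[i][j] -= lr * err[i] * code[j]
            for j in range(code_size):
                for k in range(n):
                    enc[j][k] -= lr * g_code[j] * x[k]
        history.append(reconstruction_error(data, enc, dec))
    return enc, dec, history


# Hypothetical unlabeled data: 4 features that really depend on only 2 factors.
data = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0],
        [1.0, 1.0, 1.0, 1.0], [0.5, 0.0, 0.5, 0.0]]
enc, dec, history = train_autoencoder(data, code_size=2)
print("error before training:", history[0])
print("error after training: ", history[-1])
```

No labels appear anywhere in the training loop: the input itself is the target. The two-value code returned by `matvec(enc, x)` plays the role of the extracted features.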

Table 1 Summary of Previous Studies on Detecting Tax and Financial Evasion Using Artificial Intelligence and Machine Learning
Ref | Year | Technology | Study description
[4] | 2018 | Unsupervised ML techniques | The researchers proposed a system that allows tax authorities to prioritize audits using unlabeled data rather than previously classified data.
[5] | 2018 | Compositional CNN-RNN model framework | Based on the official transaction codes on tax invoices, the researchers proposed a compositional CNN-RNN model architecture with an attention mechanism for describing transaction behavior.
[6] | 2019 | AI and Big Data | The paper examined the issues of tax planning in the age of artificial intelligence and big data.
[7] | 2019 | Advanced big data analytics methodologies and artificial intelligence developed using machine learning | In this research project, the Tax Administration and the University of Novi Sad's Faculty of Sciences worked together to create algorithms that use advanced big data analytics and machine learning to identify the risk of tax evasion. The article presented the project's initial results.
[8] | 2019 | Unsupervised conditional adversarial network (UCAN) | The researchers proposed the unsupervised conditional adversarial network (UCAN) as a universal architecture for detecting tax evasion. This is the first solution for addressing audit tasks in unlabeled target domains via inter-region transfer.
[9] | 2019 | Multilayer Perceptron (MLP) neural network models used to improve predictive tools | The researchers used Multilayer Perceptron (MLP) neural network models and powerful machine learning predictive approaches to aid in detecting tax fraud in personal income tax returns (IRPF, in Spanish) filed in Spain.
[10] | 2019 | Hierarchical clustering and descriptive statistics | The study had a dual research goal. First, it sought to understand how SMEs perceived the current situation and the steps needed to cut back the corresponding red tape. Second, it attempted to establish a link between tax burdens and entrepreneurship using hierarchical cluster analysis and descriptive statistics.
[11] | 2019 | Data mining techniques | The study aimed to use data mining techniques to provide solutions to financial fraud.
[12] | 2020 | Q-learning combined with recent Deep Reinforcement Learning advancements | The paper investigated the problem of determining the strategy of a risk-averse, self-interested tax entity. The authors obtained approximate solutions by combining Q-learning with recent advances in Deep Reinforcement Learning.
[13] | 2020 | An expert system on tax evasion and a performance model | The study used a performance model and an expert system in the area of tax evasion to provide an abstract solution.
[14] | 2020 | Identification of suitable predictive modeling techniques | The study investigated how tax authorities could benefit from their operational data. Its purpose was to identify machine learning algorithms that are good at detecting a specific form of fraud [10].
[15] | 2020 | SVM, a supervised ML technique | The researchers used data mining methods to identify fraud occurring at the bank.
[16] | 2020 | ML approaches | The proposed prediction system was validated using a dataset of tax defaults and non-defaults at Finnish limited liability businesses.
[17] | 2020 | Unsupervised and semi-supervised ML methods | The Norwegian Tax Administration is interested in understanding how to choose unsupervised and semi-supervised machine learning approaches that are effective at detecting abnormal tax returns. The study also examined the detection techniques and how various dataset characteristics affect their performance.
[18] | 2020 | Unsupervised anomaly detection techniques | The researchers' key argument was that, because of sample selection bias, the small number of labeled data points (known fraud/legal cases) in the tax fraud detection domain is not representative of the population as a whole.
[19] | 2020 | AI, Big Data, and Blockchain | The study reviewed existing research on audit and tax in relation to emerging technologies and presented a future research agenda.
[20] | 2020 | Model-free RL in AI-based | 
[21] | 2020 | A suitable combination of data mining (DM) and visual analytics techniques | To help tax administrations uncover tax avoidance and evasion, this study introduced an innovative approach named MALDIVE.

[22] | 2020 | Logical generalization, expert evaluation, statistical analysis | According to the study, identifying tax evasion is one of a tax administration's primary responsibilities. The study also developed a methodology for predicting the risk of tax evasion using both quantitative and qualitative data.
[23] | 2020 | Big data strategy | The study's goal was to model the tax-related behavior of expatriates.
[24] | 2020 | ML, Random Forest | The main motivation of this work was to use machine learning to enhance decision-making in fiscal audit plans for service taxes in the municipality of São Paulo.
[25] | 2021 | Artificial Intelligence (AI) | Based on criteria including the complexity of the tax system, tax education, legal sanctions, and relationships with the tax authorities, the researchers sought to determine the role of AI in the Indian taxation system.
[26] | 2021 | A hybrid ML-based approach: DT model, K-means | The researchers developed a novel strategy for managing the risk of tax fraud using a hybrid machine learning approach with numerous characteristics.
[27] | 2021 | Tools from ML and network science | The researchers showed that, using machine learning and network science tools, it is possible to automatically detect tax evasion patterns comparable to those previously recognized by humans.
[28] | 2021 | Different classifiers: RF, KNN, NN, and SVM | This study introduced a machine learning-based technique that can determine whether a corporation is engaging in fraud. Four distinct classifiers were used to analyze tax and financial data from diverse organizations.
[29] | 2021 | SVM with an improved ID3 decision tree as a hybrid approach; multilayer perceptron neural networks and a genetic algorithm to improve accuracy and performance | The study's objective was to offer a hybrid method for detecting financial fraud that combines a support vector machine with an enhanced ID3 decision tree.
[30] | 2021 | An artificial neural network | The researchers cited fraud detection in recommendation systems, consumer behavior forecasting, image and speech recognition, colorization of black-and-white films, and accounting fraud detection as examples of the wide range of applications in which ML is used.
[31] | 2021 | Clustering, classification, and regression | The researchers presented a state-of-the-art review of different fraud techniques and stressed the need to continually improve fraud detection systems.
[32] | 2022 | Gradient Boosting Machine Learning Tools | The researchers created a machine learning-based fraud prediction model, employing gradient boosting as their first choice.
[33] | 2022 | Improved particle swarm optimization (IPSO) algorithm | The study used robust machine learning methods to address the problem of identifying tax evasion. The authors established the best weights and ideal parameters for support vector machine (SVM) classifiers by optimizing the multilayer perceptron neural network with an improved particle swarm optimization (IPSO) technique.
[34] | 2022 | Three machine learning models: decision tree, random forest, and logistic regression | The study suggested integrating trigger data from taxpayers, as a decision support system, with machine learning-based predictive analytics in order to realize latent tax opportunities. It provided more specific predictive analytics algorithms that can accurately identify which potential taxpayers are most likely to pay their fair share.
[35] | 2022 | Feature extraction and the random algorithm | The objective of this study was to establish a fraud detection system for tax
[36] | 2023 | Supervised machine learning models, including GaussianNB, XGBoost, Random Forest, Decision Tree, and Logistic Regression | The purpose of this study was to identify symptoms of tax fraud using the most reliable supervised machine learning model.
[37] | 2023 | Data Mining Techniques | The study's main objective was to better detect tax fraud by using data mining tools to investigate the effects of wealth in Lithuania.
[38] | 2023 | A supervised module, an unsupervised module, a behavioral module, and a prediction module | The study focused on proposing a framework for detecting tax fraud.

III. CONCLUSION

Many governments around the world rely heavily on taxes, but tax administrations face many problems and challenges, especially tax fraud by taxpayers. Tax administrations around the world therefore seek to support their tax systems with machine learning, including clustering methods such as k-means and the Self-Organizing Map (SOM) and classification methods such as decision trees, random forests, naive Bayes, and neural networks, to increase their efficiency and eliminate fraud and tax evasion.

REFERENCES

[1]. S. Mills, “Chapter 1 Taxation Principles and Theory,” Found. Tax. Law, no. 1908, 1925, [Online]. Available: https://www.oup.com.au/data/assets/file/0014/132062/9780190318529_SC.pdf
[2]. http:\\ar,wikipedia.or
[3]. D. Hartnett, “Tax Administration Challenges in Developing Countries,” 4/4/2016.
[4]. D. De Roux, B. Pérez, A. Moreno, M. Del Pilar Villamil, and C. Figueroa, “Tax fraud detection for under-reporting declarations using an unsupervised machine learning approach,” Proc. ACM SIGKDD Int. Conf. Knowl. Discov. Data Min., pp. 215–222, 2018, doi: 10.1145/3219819.3219878.
[5]. J. Yu, Y. Qiao, K. Sun, H. Zhang, and J. Yang, “Poster: Classification of transaction behavior in tax invoices using compositional CNN-RNN model,” UbiComp/ISWC 2018 - Adjun. Proc. 2018 ACM Int. Jt. Conf. Pervasive Ubiquitous Comput. Proc. 2018 ACM Int. Symp. Wearable Comput., pp. 315–318, 2018, doi: 10.1145/3267305.3267597.
[6]. J. Shan, “Optimization Strategy of Tax Planning System in the Context of Artificial Intelligence and Big Data,” in Journal of Physics: Conference Series, 2019, vol. 1345, no. 5, doi: 10.1088/1742-6596/1345/5/052006.
[7]. J. Atanasijević, D. Jakovetić, N. Krejić, N. Krklec-Jerinkić, and D. Marković, “Using big data analytics to improve the efficiency of tax collection in the Tax Administration of the Republic of Serbia,” Ekon. Preduz., vol. 67, no. 1–2, pp. 115–130, 2019, doi: 10.5937/ekopre1808115a.
[8]. R. Wei, B. Dong, Q. Zheng, X. Zhu, J. Ruan, and H. He, “Unsupervised Conditional Adversarial Networks for Tax Evasion Detection,” Proc. - 2019 IEEE Int. Conf. Big Data, Big Data 2019, pp. 1675–1680, 2019, doi: 10.1109/BigData47090.2019.9005656.
[9]. D. Rodr, “Tax Fraud Detection through Neural Networks: An Application Using a Sample of Personal Income Taxpayers,” 2019, doi: 10.3390/fi11040086.
[10]. D. Ravšelj, P. Kovač, and A. Aristovnik, “Tax-Related Burden on SMEs in the European Union: The Case of Slovenia,” vol. 2117, pp. 69–79, 2019, doi: 10.2478/mjss-2019-0024.
[11]. N. Sael and F. Benabbou, “Performance of machine learning techniques in the detection of financial frauds,” Procedia Comput. Sci., vol. 148, no. Icds 2018, pp. 45–54, 2019, doi: 10.1016/j.procs.2019.01.007.
[12]. N. D. Goumagias, D. Hristu-varsakelis, and Y. M. Assael, “Using deep Q-learning to understand the tax evasion behavior of risk-averse firms,” pp. 1–29, 2020.
[13]. I. S. Conference and E. Sarajevo, “Expert Systems as a Means in Detecting Tax Evasion,” no. September, pp. 18–20, 2020.
[14]. A. Z. Adamov, “Machine Learning and Advanced Analytics in Tax Fraud Detection,” no. October 2019, 2020, doi: 10.1109/AICT47866.2019.8981758.
[15]. C. Reviews, “An income tax fraud detection using AI,” vol. 7, no. 16, pp. 119–124, 2020.
[16]. M. Z. Abedin, H. Mohammad, D. Science, N. Science, and G. Bishwabidyalay, “Tax Default Prediction using Feature Transformation-Based Machine Learning,” no. December, 2020, doi: 10.1109/ACCESS.2020.3048018.
[17]. N. Gedde, I.-S. Sandvik, and J. Andersson, “Unsupervised Machine Learning on Tax Returns Investigating Unsupervised and Semisupervised Machine Learning Methods to Uncover Anomalous Faulty Tax Returns,” 2020.
[18]. V. Jellis, M. David, P. Bruno, J. Vanhoeyveld, D. Martens, and B. Peeters, “Value-added tax fraud detection with scalable anomaly detection techniques,” vol. 86, 2020.
[19]. O. F. Atayah, “Audit and tax in the context of emerging technologies: A retrospective analysis, current trends, and future opportunities,” vol. 21, no. November 2020, pp. 95–128, 2021, doi: 10.4192/1577-8517-v21.
[20]. S. Zheng et al., “The AI Economist: Improving Equality and Productivity with AI-Driven Tax Policies,” Apr. 2020, [Online]. Available: http://arxiv.org/abs/2004.13332.
[21]. W. Didimo, L. Grilli, G. Liotta, F. Montecchiani, and D. Pagliuca, “Combining Network Visualization and Data Mining for Tax Risk Assessment,” pp. 16073–16086, 2020.
[22]. A. Musayev and M. Gazanfarli, “Modeling the Probability of the Detection Process of Tax Evasion Taking into Account Quality and Quantity Indicators,” Asian J. Econ. Bus. Account., vol. 18, no. 4, pp. 28–37, 2020, doi: 10.9734/ajeba/2020/v18i430291.
[23]. A. H. Miller and C. Republic, “Using Database Approach, With Big Data And Unsupervised Machine Learning To Model Tax Behavior In The Expatriate Community,” no. October, 2020.
[24]. A. Ippolito and A. C. G. Lozano, “Tax crime prediction with machine learning: A case study in the municipality of São Paulo,” ICEIS 2020 - Proc. 22nd Int. Conf. Enterp. Inf. Syst., vol. 1, no. Iceis, pp. 452–459, 2020, doi: 10.5220/0009564704520459.

[25]. A. Rathi, S. Sharma, G. Lodha, and M. Srivastava, “A Study on Application of Artificial Intelligence and Machine Learning in Indian Taxation System,” no. February, 2021, doi: 10.17762/pae.v58i2.2265.
[26]. J. Atanasijevi, “Tax Evasion Risk Management Using a Hybrid Unsupervised Outlier Detection Method,” no. 451, p. 30, 2021.
[27]. M. Zumaya et al., “Identifying Tax Evasion in Mexico with Tools from Network Science and Machine Learning,” Underst. Complex Syst., pp. 89–113, 2021, doi: 10.1007/978-3-030-81484-7_6.
[28]. J. P. A. Andrade et al., “A Machine Learning-based System for Financial Fraud Detection,” pp. 165–176, 2021, doi: 10.5753/eniac.2021.18250.
[29]. A. Javadian, A. Ali, P. Aghajan, and M. Hosseini, “A Hybrid Model Based on Machine Learning and Genetic Algorithm for Detecting Fraud in Financial Statements,” J. Optim. Ind. Eng., vol. 14, no. 2, pp. 169–186, 2021, doi: 10.22094/JOIE.2020.1877455.1685.
[30]. X. Zhang, “Construction and Simulation of Financial Audit Model Based on Convolutional Neural Network,” Comput. Intell. Neurosci., vol. 2021, pp. 1–11, 2021.
[31]. M. Vlad and S. Vlad, “The Use of Machine Learning Techniques in Accounting. A Short,” J. Soc. Sci. Fascicle, vol. 4, pp. 1–5, 2021.
[32]. V. Baghdasaryan, H. Davtyan, and A. Sarikyan, “Improving Tax Audit Efficiency Using Machine Learning: The Role of Taxpayer’s Network Data in Fraud Detection,” Appl. Artif. Intell., vol. 00, no. 00, pp. 1–23, 2022, doi: 10.1080/08839514.2021.2012002.
[33]. H. Mojahedi, A. Babazadeh Sangar, and M. Masdari, “Towards Tax Evasion Detection Using Improved Particle Swarm Optimization Algorithm,” Math. Probl. Eng., vol. 2022, 2022, doi: 10.1155/2022/1027518.
[34]. J. Perbendaharaan, K. Negara Dan Kebijakan Publik, R. David Febriminanto, and M. Wasesa, “Indonesian Treasury Review Machine Learning for Predicting Tax Revenue Potential,” Keuangan Negara dan Kebijakan Publik, 2022. [Online]. Available: www.pajak.com
[35]. A. Menon, D. Khator, D. Prajapati, and A. Ekbote, “IPL Prediction Using Machine Learning,” Indian J. Comput. Sci., vol. 7, no. 3, pp. 274–276, 2022, doi: 10.17010/ijcs/2022/v7/i3/171267
[36]. B. F. Murorunkwere, D. Haughton, J. Nzabanita, and I. Kabano, “Predicting tax fraud using supervised machine learning approach,” African J. Sci. Technol. Innov. Dev., vol. 0, no. 0, pp. 1–12, 2023, doi: 10.1080/20421338.2023.2187930.
[37]. T. Ruzgas, L. Kižauskienė, M. Lukauskas, E. Sinkevičius, M. Frolovaitė, and J. Arnastauskaitė, “Tax Fraud Reduction Using Analytics in an East European Country,” Axioms, vol. 12, no. 3, p. 288, Mar. 2023, doi: 10.3390/axioms12030288.
[38]. N. Alsadhan, “A Multi-Module Machine Learning Approach to Detect Tax Fraud,” Comput. Syst. Sci. Eng., vol. 46, no. 1, pp. 241–253, 2023, doi: 10.32604/csse.2023.033375.
[39]. I. Sadgali, N. Sael, and F. Benabbou, “Performance of machine learning techniques in the detection of financial frauds,” Procedia Comput. Sci., vol. 148, no. Icds 2018, pp. 45–54, 2019, doi: 10.1016/j.procs.2019.01.007.
[40]. I. H. Sarker, “Machine Learning: Algorithms, Real-World Applications and Research Directions,” SN Comput. Sci., vol. 2, no. 3, 2021, doi: 10.1007/s42979-021-00592-x.
[41]. A. Shujaaddeen, F. M. Ba-Alwi, and G. Al-Gaphari, “A New Machine Learning Model for Detecting levels of Tax Evasion Based on Hybrid Neural Network,” Int J Intell Syst Appl Eng, vol. 12, no. 11s, pp. 450–468, Jan. 2024.
[42]. T. Germano, “Self Organizing Maps @ davis.wpi.edu,” p. 4, 1999, [Online]. Available: http://davis.wpi.edu/~matt/courses/soms/.
[43]. A. M. Ozbayoglu, M. U. Gudelek, and O. B. Sezer, “Deep learning for financial applications: A survey,” Appl. Soft Comput. J., vol. 93, pp. 1–52, 2020, doi: 10.1016/j.asoc.2020.106384.