
IAES International Journal of Artificial Intelligence (IJ-AI)

Vol. 13, No. 4, December 2024, pp. 4106~4112

ISSN: 2252-8938, DOI: 10.11591/ijai.v13.i4.pp4106-4112

Detecting fraudulent financial statement under imbalanced data using neural network

Hendra Tjahyadi, Yosua Efraim Young

Study Program of Informatics, Faculty of Computer Science, Universitas Pelita Harapan, Jakarta, Indonesia

Article Info

Article history:
Received Dec 29, 2023
Revised Apr 19, 2024
Accepted Jun 8, 2024

Keywords:
Fraudulent financial statements
Machine learning
Neural network
Supervised learning
Synthetic minority over-sampling technique

ABSTRACT

This paper introduces a novel approach for detecting fraudulent financial statements that combines neural networks with the synthetic minority over-sampling technique (SMOTE). The approach is designed to tackle the imbalanced datasets prevalent in fraud cases, which, if left unaddressed, hinder the model from accurately identifying fraud. Three neural network models are developed, each using a different set of fraud predictors as the input layer: 28 raw financial data inputs, 14 financial ratio inputs, and 42 inputs combining both. Experimental validation on an established research dataset is conducted to assess the performance of the proposed method. Performance metrics, namely area under the curve (AUC), precision, and sensitivity, are used to compare the proposed models against existing benchmark models from the literature. Results indicate that the proposed models achieve an AUC of up to 70.6% and a precision of up to 2.89%, comparable to the existing models, and a sensitivity of up to 83%, outperforming all counterparts. The high sensitivity of the proposed model underscores its practical utility for auditors and regulators, as it minimizes the risk of false negatives, thereby enhancing confidence in fraud detection.

This is an open access article under the CC BY-SA license.

Corresponding Author:
Hendra Tjahyadi
Study Program of Informatics, Faculty of Computer Science, Universitas Pelita Harapan
Jakarta, Indonesia
Email: [email protected]

1. INTRODUCTION
Financial statement misstatements may arise from either fraud or error, as stated by the International
Federation of Accountants [1]. It is the auditor’s responsibility to provide reasonable assurance that the
financial statements are free from material misstatement. Misleading financial statements can incur significant
costs, especially for investors, regulators, and society at large, as demonstrated in the Enron scandal—one of
the most notable audit and accounting scandals in history and literature [2]–[4]. It began when Enron shocked
the public by reporting a $638 million loss. This case implicated its auditor, Arthur Andersen, which failed to
detect the misstatement and engaged in document shredding related to Enron audits. This highlights the
difficulty in detecting accounting misstatements.
Detecting accounting misstatements can be challenging for several reasons, such as the complexity of financial transactions, sophisticated fraud schemes, the vast amount of data, and human error and bias. These challenges underscore the need for innovative approaches such as data analytics and machine learning in auditing [3]–[5]. These technologies offer the potential to enhance audit effectiveness, improve risk assessment, and mitigate the impact of human limitations on audit quality.
Although data analytics and machine learning are expected to be superior methods, they
are seldom used in performing audit procedures. It is relatively unknown whether the use of data analytics and

Journal homepage: http://ijai.iaescore.com


machine learning is indeed transformational for the audit [6]. Various studies have addressed the detection of
fraudulent financial statements using both supervised and unsupervised learning. Supervised approaches
include neural networks [7]–[11], genetic algorithms [12], decision trees (DT) [8]–[10], [13], Bayesian
networks [8], [9], support vector machines (SVM) [8], [13]–[15], and logistic regression (LR) [16].
Unsupervised approaches use algorithms such as self-organizing maps [17], [18] and k-means clustering [17].
One significant obstacle in machine learning is the imbalanced data challenge, where unequal class
representation leads to inaccurate detection, with majority classes overshadowing minorities. Publicly
available financial statements often exhibit severe imbalance due to the rarity of fraudulent instances
compared to non-fraudulent ones. Therefore, it is crucial for models to address this imbalance.
This research aims to develop a model for predicting fraudulent financial statements from real public datasets and to tackle the imbalanced data issue. Three neural network models with different types of inputs, namely raw financial data, financial ratios, and a combination thereof, combined with the synthetic minority over-sampling technique (SMOTE), are proposed in this study. The rest of this paper is organized as follows: section 2 outlines previous efforts by researchers to detect fraud in financial statements, using both balanced simulated data and imbalanced real data. Section 3 details the proposed method, which combines neural networks and SMOTE to detect fraud in highly imbalanced real data. In section 4, the experimental results are presented and compared with those of previous researchers. Finally, section 5 concludes with a summary of our findings.

2. LITERATURE REVIEW
The existing literature has devoted considerable attention to financial data as crucial indicators of fraud, encompassing both raw financial data and financial ratios. As fundamental components of financial statements, financial data have the potential to indicate fraud risk. For example, a liquidity ratio derived directly from raw financial data could serve as an effective measure of a company's financial pressure. Certain ratios are more informative than others [19], particularly those closely linked to the fraud triangle theory. Leveraging financial data, several approaches that use machine learning and data mining to detect fraudulent financial statements have been proposed in the literature.
Green and Choi [7] demonstrated the potential of neural network applications in fraud investigation and used a neural network as a detection tool, employing 172 samples, with 86 samples each for fraudulent and non-fraudulent cases. The model achieved an accuracy rate of 74%. Kotsiantis et al. [8] conducted experiments on DT, artificial neural networks (ANNs), Bayesian networks, rule learners, nearest neighbors, and SVM. This study demonstrated that DT outperformed the other models with 91.2% accuracy using a dataset of 164 Greek companies listed on the Athens stock exchange, comprising 41 fraudulent and 123 non-fraudulent cases. In similar work, Kirkos et al. [9] conducted experiments using DT, neural networks, and Bayesian belief networks, revealing that Bayesian belief networks outperformed the others with 90.3% accuracy using a balanced dataset of 76 Greek manufacturing companies, including 38 fraudulent and 38 non-fraudulent cases.
Cecchini et al. [14], using SVM, correctly identified 80% of fraudulent cases and 90.6% of non-fraudulent cases from a dataset comprising 6,427 non-fraudulent and 205 fraudulent samples. This study was considered a pioneering work in the field. Dechow et al. [16] presented an alternative method using LR with financial ratios to detect fraudulent financial statements, signaling the likelihood of misstatement. Perols [10], with a larger dataset of 15,934 non-fraudulent and 51 fraudulent cases, demonstrated that LR and SVM outperformed neural networks, bagging, C4.5, and stacking algorithms. These findings were consistent with those of Yao et al. [20], which showed that SVM had the highest accuracy among various classification methods.
Randhawa et al. [21] investigated the effectiveness of single and hybrid methods, employing under-sampling to detect credit card fraud. Their study revealed that combining AdaBoost and majority voting methods yielded the best results. Bao et al. [15] extended this research by using a large public dataset and compared the results of re-implementing the models proposed in [14], [16] with a new state-of-the-art model using RUSBoost. The proposed method outperformed the previous models with an area under curve (AUC) of 72.5% and sensitivity and precision of 4.88% and 4.48%, respectively. Hoang et al. [22] employed XGBoost and f-XGBoost on the dataset used in [15], resulting in AUC scores of 68.9% and 69.3%, sensitivity of 3.56% and 5.00%, and precision of 3.36% and 4.22%, respectively. Ashtiani and Raahemi [3] found that a single model outperformed both ensemble and hybrid approaches. They highlighted the approach of Temponeras et al. [23], which employed a deep dense multilayer perceptron and achieved an accuracy of 93.7% using a dataset of 164 Greek companies. Craja et al. [24], using a text mining approach to detect fraudulent financial statements from annual reports, demonstrated the effectiveness of ANNs, emphasizing their ability to capture complex relationships among variables. Inspired by the effectiveness of neural networks, this study proposes an approach combining neural networks and SMOTE to detect fraudulent financial statements in an imbalanced dataset.


3. METHOD
The detection model for fraudulent financial statements proposed in this study combines neural networks and SMOTE. Initially, a severely imbalanced public dataset containing real financial statements is acquired. Subsequently, the dataset is preprocessed with SMOTE, which generates synthetic samples for the minority class, to address the class imbalance. The preprocessed data is then used for training and experimentation on the proposed network models. Finally, the results obtained with the neural networks are compared with those achieved by the state-of-the-art algorithms proposed in previous literature. The overall process workflow is illustrated in Figure 1.

Figure 1. Proposed fraud detection workflow

3.1. Data and variables

The dataset is retrieved from previous research by Bao et al. [15], comprising 146,045 records collected from the COMPUSTAT database, covering all publicly listed U.S. firms from 1990 to 2014. It includes 42 features, consisting of 28 raw financial data items derived from research by Cecchini et al. [14] and 14 financial ratios as researched by Dechow et al. [16]. Following the previous study, the training dataset spans from 1991 up to the test year, with a two-year gap between the end of the training period and the test year.
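The year-based split described above can be sketched as follows. This is only an illustration under stated assumptions: the field names (`fyear`, `misstate`) are hypothetical, and the gap is assumed to mean that training ends two years before the test year.

```python
def split_by_year(records, test_year, first_train_year=1991, gap_years=2):
    """Return (train, test) lists for one test year.

    Assumption: training covers first_train_year .. test_year - gap_years,
    so the years between the training window and the test year are skipped.
    """
    last_train_year = test_year - gap_years
    train = [r for r in records if first_train_year <= r["fyear"] <= last_train_year]
    test = [r for r in records if r["fyear"] == test_year]
    return train, test


# Toy firm-year records, one per year; field names are hypothetical.
records = [{"fyear": y, "misstate": 0} for y in range(1990, 2015)]
train, test = split_by_year(records, test_year=2003)
print(min(r["fyear"] for r in train), max(r["fyear"] for r in train))
print([r["fyear"] for r in test])
```

For a test year of 2003, this reading trains on 1991 to 2001 and evaluates on 2003 only.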
Serial fraud, defined as fraudulent cases spanning multiple years, is present in the dataset. The impact
of serial fraud is that it can inflate the model's performance, as the same fraud case may be included in both the
train and test data [15]. Therefore, to prevent overstated results and benchmark against previous literature, the
dataset is preprocessed by recoding all serial fraud as non-fraudulent.
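A minimal sketch of one possible reading of this recoding step is given below. It follows the stated concern that the same fraud case may appear in both training and test data, and resets the label only on training rows whose case also appears among the test-period fraud rows. The `case_id` and `misstate` fields are hypothetical, and the exact rule used in the study may differ.

```python
def recode_serial_fraud(train, test, label="misstate", case="case_id"):
    """Relabel training fraud rows as non-fraud when the same case is in the test rows."""
    test_cases = {r[case] for r in test if r[label] == 1 and r[case] is not None}
    recoded = []
    for row in train:
        row = dict(row)  # copy so the caller's records are not mutated
        if row[label] == 1 and row[case] in test_cases:
            row[label] = 0
        recoded.append(row)
    return recoded


# Toy data: case "A" spans training and test years, case "B" does not.
toy_train = [
    {"fyear": 2000, "case_id": "A", "misstate": 1},
    {"fyear": 2001, "case_id": "B", "misstate": 1},
    {"fyear": 2001, "case_id": None, "misstate": 0},
]
toy_test = [{"fyear": 2003, "case_id": "A", "misstate": 1}]
clean_train = recode_serial_fraud(toy_train, toy_test)
print([r["misstate"] for r in clean_train])
```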
To address the severe imbalance between fraudulent and non-fraudulent cases in the dataset, we employ a minority oversampling technique called SMOTE [25]. This technique is necessary because the minority class is often neglected by the model; here, fraudulent cases, which are the focus of this study, represent only 0.67% of the population. SMOTE generates new synthetic data through an iterative process targeting each point in the minority class. It has proven to be an effective method for addressing class imbalance and improving classification performance [26].
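The core idea of SMOTE [25] is interpolation: each synthetic point is placed at a random position on the line segment between a minority sample and one of its k nearest minority neighbours. The sketch below illustrates this with the standard library only. It is not the implementation used in the study, and the parameter names (`per_point`, `k`, `seed`) are chosen here for illustration.

```python
import math
import random


def smote(minority, per_point=1, k=5, seed=0):
    """Generate per_point synthetic samples for every minority-class vector."""
    rng = random.Random(seed)
    synthetic = []
    for i, base in enumerate(minority):
        # k nearest minority neighbours of the base point, excluding itself
        others = minority[:i] + minority[i + 1:]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        for _ in range(per_point):
            neighbour = rng.choice(neighbours)
            gap = rng.random()  # position along the segment, in [0, 1)
            synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class: the four corners of the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, per_point=3, k=2)
print(len(new_points))
```

Because every synthetic point lies between two real minority samples, the new points stay inside the region already occupied by the minority class.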

3.2. Proposed artificial neural networks

This study experimented with ANNs, specifically feedforward networks, to detect fraudulent financial statements from a severely imbalanced dataset. The experiments involved three networks, each using a different set of fraud predictors as the input layer. The first network used the 28 raw financial data items derived from the fraud predictors of Cecchini et al. [14] as inputs. The second network used the 14 financial ratios derived from the fraud predictors of Dechow et al. [16] as inputs. Lastly, the third network used the combined 42 predictors from Cecchini et al. [14] and Dechow et al. [16] as inputs.
The overall architecture of the proposed networks is illustrated in Figure 2. Each of the three networks comprises an input layer followed by three hidden layers and an output layer. The input layer covers three scenarios, representing different fraud predictors, with 28, 14, and 42 inputs, respectively. Inspired by [23]–[25], the first and second hidden layers are fully connected layers with LeakyReLU (alpha of 0.05) as the activation function, to capture complex patterns and relationships among the fraud predictors and to handle non-linearity. L2 regularization with a coefficient of 0.005 adds a penalty term to the network to avoid overfitting [27]. The Adam optimizer is chosen for its computational efficiency, minimal memory requirements, and suitability for large datasets [28]. Additionally, a dropout layer with a rate of 0.7 randomly drops neurons in an attempt to prevent overfitting [29]. Finally, an output layer with a sigmoid function performs the binary classification.
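To make the layer stack concrete, the forward pass can be sketched in plain Python. The LeakyReLU slope, L2 coefficient, and dropout rate are taken from the text. The hidden widths (64 and 32), the weight initialization, and the reading of the three hidden layers as two dense layers followed by dropout are assumptions made only for illustration. Training with Adam is not shown.

```python
import math
import random

ALPHA = 0.05       # LeakyReLU negative slope (from the text)
L2_COEF = 0.005    # L2 regularization coefficient (from the text)
DROP_RATE = 0.7    # dropout rate (from the text)
HIDDEN = (64, 32)  # hypothetical widths; the text does not state them


def init_layer(n_in, n_out, rng):
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def build(n_inputs, seed=0):
    rng = random.Random(seed)
    sizes = (n_inputs,) + HIDDEN + (1,)
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def dense(x, layer):
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def leaky_relu(z):
    return [v if v > 0 else ALPHA * v for v in z]


def dropout(x, rng, training):
    if not training:
        return x
    keep = 1.0 - DROP_RATE
    # inverted dropout: zero a unit with probability DROP_RATE, rescale the rest
    return [v / keep if rng.random() < keep else 0.0 for v in x]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(layers, x, rng=None, training=False):
    h = leaky_relu(dense(x, layers[0]))
    h = leaky_relu(dense(h, layers[1]))
    h = dropout(h, rng, training)
    return sigmoid(dense(h, layers[2])[0])


def l2_penalty(layers):
    # penalty on the weights of the two hidden dense layers (an assumption)
    return L2_COEF * sum(w * w for weights, _ in layers[:2] for row in weights for w in row)


# One network per input scenario: raw data, ratios, and both combined.
for n_inputs in (28, 14, 42):
    net = build(n_inputs)
    print(n_inputs, predict(net, [0.0] * n_inputs), round(l2_penalty(net), 3))
```

With all-zero inputs and zero biases every pre-activation is zero, so the sigmoid output is 0.5, which gives a quick sanity check on the wiring.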

3.3. Performance evaluation

The performance of the proposed model is evaluated using three metrics: AUC, sensitivity, and precision. AUC, the area under the receiver operating characteristic (ROC) curve, is a metric used to evaluate the performance of binary classifiers [30]. It is used here because the dataset is imbalanced, and standard accuracy metrics do not adequately capture performance on the rare fraudulent samples [31]. The AUC score therefore provides a more representative measure in this context than commonly used accuracy scores.
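As a small illustration, AUC can be computed as the probability that a randomly chosen fraudulent observation receives a higher score than a randomly chosen non-fraudulent one, with ties counted as one half. The function below is a direct pairwise sketch of that definition, and the toy labels and scores are made up.

```python
def auc_score(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


toy_labels = [0, 0, 1, 0, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8, 0.9]
print(round(auc_score(toy_labels, toy_scores), 3))  # 4 of 6 pairs ranked correctly
```

A model that assigns the same score to every observation gets an AUC of 0.5 regardless of how rare the positive class is, which is why the metric is less misleading than plain accuracy on imbalanced data.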


Following previous work [15], sensitivity and precision are measured on the top 1% of observations ranked by decision value. This choice is driven by practical considerations, as regulators may not be able to examine all companies predicted as fraudulent due to resource constraints. Additionally, this decision is influenced by the results of leading research by Cecchini et al. [14], whose SVM correctly classified 80% of fraud cases and 90.6% of non-fraud cases but produced a high number of false positives. Therefore, to avoid allocating excessive resources to many false positives, the focus is on the top 1% of observations.
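The top-1% evaluation can be sketched as follows, assuming that sensitivity is the share of all true fraud cases that fall inside the flagged top 1%, and precision is the share of flagged observations that are truly fraudulent. The function name and the toy data are illustrative only.

```python
import math


def top_fraction_metrics(labels, scores, fraction=0.01):
    """Sensitivity and precision among the highest-scoring fraction of observations."""
    n_flagged = max(1, math.ceil(fraction * len(labels)))
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    true_pos = sum(y for _, y in ranked[:n_flagged])
    total_pos = sum(labels)
    sensitivity = true_pos / total_pos if total_pos else 0.0
    precision = true_pos / n_flagged
    return sensitivity, precision


# Toy test set: 200 observations, 4 frauds, scores increasing with the index.
eval_scores = [i / 200 for i in range(200)]
eval_labels = [0] * 200
for idx in (50, 100, 150, 199):
    eval_labels[idx] = 1

sens, prec = top_fraction_metrics(eval_labels, eval_scores)
print(sens, prec)
```

Here the top 1% contains two observations, one of which is a fraud, so one of the four frauds is caught and one of the two flagged observations is correct.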

Figure 2. The architecture of the proposed networks

4. RESULTS AND DISCUSSION

This study proposed an approach for detecting fraudulent financial statements by combining neural networks and SMOTE. While earlier literature has explored and demonstrated the capability of various algorithms to detect fraudulent financial statements, there is still room for improvement, specifically in terms of accuracy. Therefore, developing a model with better accuracy is important, especially one that can be relied upon for practical adoption.
Three networks with different fraud predictors as the input layer are employed. The results obtained are summarized in Table 1. The first network, employing raw financial data as the fraud predictor, achieved an AUC score of 0.706, with a sensitivity of 50% and a precision of 1.39%. The second network, employing financial ratios as the fraud predictor, achieved an AUC score of 0.693, a sensitivity of 67%, and a precision of 2.89%. Lastly, the third network, employing both raw financial data and financial ratios as the fraud predictor, achieved an AUC of 0.672, a sensitivity of 83%, and a precision of 1.91%.

Table 1. Summary of comparison (test period 2003–2008)

| Fraud predictor | AUC | Sensitivity (%) | Precision (%) |
|---|---|---|---|
| 28 raw financial data | 0.706 | 50 | 1.39 |
| 14 financial ratios | 0.693 | 67 | 2.89 |
| Both | 0.672 | 83 | 1.91 |

As presented in Table 1, the proposed neural network can detect fraudulent financial statements, with raw financial data being the best fraud predictor as measured by AUC score. These results demonstrate that combining raw financial data and financial ratios as the fraud predictor does not yield a higher accuracy, as measured by AUC score. In this experiment, the highest precision is obtained by employing financial ratios as the fraud predictor, whereas the highest sensitivity is obtained by employing both raw financial data and financial ratios.
Table 2 compares the results of the proposed network for the 2003–2008 test period with those from previous literature. The proposed network has an AUC score and precision comparable to those obtained in previous literature using SVM, LR, RUSBoost, XGBoost, and f-XGBoost. In contrast, the proposed network demonstrates a superior sensitivity score, indicating that the model can identify fraud without producing a high number of false negatives, which would translate to undetected fraud. This supports the model’s reliability and practical utility. To show the robustness of the proposed network, three alternative test periods are added. Consistent with previous studies [15], [22], the additional test periods are 2003–2005, 2003–2011, and 2003–2014. The performance metrics for the additional test periods are presented in Tables 3 to 5.

Table 2. Summary of comparison with previous study (test period 2003–2008)

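| Fraud predictor | Model | AUC | Sensitivity (%) | Precision (%) | Reference |
|---|---|---|---|---|---|
| 28 raw financial data | SVM | 0.626 | 2.53 | 1.92 | [15] |
| 28 raw financial data | Logistic | 0.690 | 0.73 | 0.85 | [15] |
| 28 raw financial data | RUSBoost | 0.725 | 4.88 | 4.48 | [15] |
| 28 raw financial data | XGBoost | 0.689 | 3.56 | 3.36 | [22] |
| 28 raw financial data | f-XGBoost | 0.693 | 5.00 | 4.22 | [22] |
| 28 raw financial data | ANN | 0.706 | 50 | 1.39 | This study |
| 14 financial ratios | Logistic | 0.672 | 3.99 | 2.63 | [15] |
| 14 financial ratios | ANN | 0.693 | 67 | 2.89 | This study |
| Both | RUSBoost | 0.696 | 3.19 | 2.54 | [15] |
| Both | ANN | 0.672 | 83 | 1.91 | This study |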

Table 3. Summary of comparison with previous study (test period 2003–2005)

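| Fraud predictor | Model | AUC | Sensitivity (%) | Precision (%) | Reference |
|---|---|---|---|---|---|
| 28 raw financial data | SVM | 0.637 | 2.28 | 2.53 | [15] |
| 28 raw financial data | Logistic | 0.685 | 1.45 | 1.69 | [15] |
| 28 raw financial data | RUSBoost | 0.753 | 7.64 | 7.83 | [15] |
| 28 raw financial data | f-XGBoost | 0.691 | 6.59 | 6.71 | [22] |
| 28 raw financial data | ANN | 0.694 | 100 | 2.79 | This study |
| 14 financial ratios | Logistic | 0.649 | 1.37 | 1.29 | [15] |
| 14 financial ratios | ANN | 0.667 | 67 | 2.53 | This study |
| Both | ANN | 0.656 | 100 | 2.52 | This study |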

Table 4. Summary of comparison with previous study (test period 2003–2011)

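| Fraud predictor | Model | AUC | Sensitivity (%) | Precision (%) | Reference |
|---|---|---|---|---|---|
| 28 raw financial data | SVM | 0.647 | 3.07 | 1.98 | [15] |
| 28 raw financial data | Logistic | 0.702 | 1.87 | 1.19 | [15] |
| 28 raw financial data | RUSBoost | 0.710 | 4.40 | 3.60 | [15] |
| 28 raw financial data | f-XGBoost | 0.678 | 3.69 | 3.02 | [22] |
| 28 raw financial data | ANN | 0.720 | 56 | 1.34 | This study |
| 14 financial ratios | Logistic | 0.672 | 3.49 | 2.23 | [15] |
| 14 financial ratios | ANN | 0.685 | 67 | 2.40 | This study |
| Both | ANN | 0.693 | 89 | 2.45 | This study |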

Table 5. Summary of comparison with previous study (test period 2003–2014)

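| Fraud predictor | Model | AUC | Sensitivity (%) | Precision (%) | Reference |
|---|---|---|---|---|---|
| 28 raw financial data | SVM | 0.628 | 2.30 | 1.48 | [15] |
| 28 raw financial data | Logistic | 0.709 | 1.84 | 1.04 | [15] |
| 28 raw financial data | RUSBoost | 0.717 | 3.30 | 2.70 | [15] |
| 28 raw financial data | f-XGBoost | 0.678 | 2.77 | 2.26 | [22] |
| 28 raw financial data | ANN | 0.718 | 50 | 1.15 | This study |
| 14 financial ratios | Logistic | 0.702 | 3.45 | 1.86 | [15] |
| 14 financial ratios | ANN | 0.694 | 58 | 1.99 | This study |
| Both | ANN | 0.686 | 75 | 2.02 | This study |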

The results obtained are compared with previous literature and summarized in Figure 3. As shown in Figure 3, the SVM model shows fluctuations in performance, while both RUSBoost and XGBoost show a performance decline when the test period is extended. This result accords with Hoang et al. [22], who note that the assumption that undetected fraud grows over time makes a longer test period less reliable. However, in contrast to the SVM, RUSBoost, and XGBoost models, the proposed network and the logistic models show a slight performance improvement as the test period is extended. This indicates the robustness of both models, which are expected to perform stably when tested on new, unseen data.
Employing raw financial data as the fraud predictor, the proposed network achieved the highest AUC score among the compared models on the full 2003–2014 test set, as shown in Table 5, with an AUC of 0.718, a precision of 1.15%, and a sensitivity of 50%. Regarding the stability of AUC as a demonstration of robustness, the proposed network scored AUCs of 0.694 and 0.720 in the test periods of 2003–2005 and 2003–2011, as shown in Tables 3 and 4, respectively. This shows that expanding the dataset improves the performance of the proposed network. In the next scenario, using the period of 2003–2014, the AUC score dropped to 0.718, slightly lower than in the previous scenario of 2003–2011. This indicates that while expanding the dataset improves performance, there may be diminishing returns beyond a certain length of period. Consistent with the results of Bao et al. [15], the results of this study demonstrate that, for the same model or network, using the 28 raw financial data items derived from [14] leads to a better result than using the other fraud predictors, namely the 14 financial ratios derived from [16].
The results of this study show that combining neural networks and SMOTE can detect fraudulent financial statements in a severely imbalanced dataset using raw financial data, financial ratios, or both combined as the fraud predictor. While the proposed network demonstrated promising utility, it is important to acknowledge that the dataset used consists of historical data from specific demographics and time periods. This may limit generalizability, so the model may require further calibration or updates to maintain its effectiveness in the current dynamic environment.

Figure 3. Summary of AUC scores over the additional test periods

5. CONCLUSION
This paper introduces a neural network designed to detect fraudulent financial statements within an imbalanced dataset, addressing the severe imbalance issue through the use of SMOTE. Our experimental results indicate that, across the three networks, the model achieves an AUC of up to 70.6%, a sensitivity of up to 83%, and a precision of up to 2.89%. This study contributes by advocating for the integration of ANNs in auditing practices, particularly during the initial audit phase, such as risk assessment procedures. The proposed model's high sensitivity underscores its advantage over similar models, offering practical utility for auditors and regulators by minimizing the risk of false negatives. However, limitations exist, including the reliance solely on numerical financial data extracted from financial statements. Future research could explore the incorporation of non-financial data and the application of unsupervised learning to address mislabeling issues, potentially through generative artificial intelligence to generate fraudulent data for training purposes or to describe fraud characteristics.

REFERENCES
[1] “International standard on auditing 240: the auditor’s responsibilities relating to fraud in an audit of financial statements,” IFAC, 2013.
Accessed: Dec. 27, 2023. [Online]. Available: https://www.ifac.org/_flysystem/azure-private/publications/files/A012 2013 IAASB
Handbook ISA 240.pdf.
[2] Y. J. Chen, W. C. Liou, Y. M. Chen, and J. H. Wu, “Fraud detection for financial statements of business groups,” International
Journal of Accounting Information Systems, vol. 32, pp. 1–23, 2019, doi: 10.1016/j.accinf.2018.11.004.
[3] M. N. Ashtiani and B. Raahemi, “Intelligent fraud detection in financial statements using machine learning and data mining: a
systematic literature review,” IEEE Access, vol. 10, pp. 72504–72525, 2022, doi: 10.1109/ACCESS.2021.3096799.
[4] W. Xiuguo and D. Shengyong, “An analysis on financial statement fraud detection for Chinese listed companies using deep
learning,” IEEE Access, vol. 10, pp. 22516–22532, 2022, doi: 10.1109/ACCESS.2022.3153478.
[5] D. Botez, “Recent challenge for auditors: using data analytics in the audit of the financial statements,” Brain-broad Research in
Artificial Intelligence and Neuroscience, vol. 9, no. 4, pp. 61–72, 2018.
[6] G. Salijeni, A. S. -Taddei, and S. Turley, “Big data and changes in audit technology: contemplating a research agenda,” Accounting
and Business Research, vol. 49, no. 1, pp. 95–119, 2019, doi: 10.1080/00014788.2018.1459458.
[7] B. P. Green and J. H. Choi, “Assessing the risk of management fraud through neural network technology,” Auditing, vol. 16, no. 1,
pp. 25–28, 1997.
[8] S. Kotsiantis, E. Koumanakos, D. Tzelepis, and V. Tampakas, “Forecasting fraudulent financial statements using data mining,”
International Journal of Computational Intelligence, vol. 3, no. 2, pp. 104–110, 2006.
[9] E. Kirkos, C. Spathis, and Y. Manolopoulos, “Data mining techniques for the detection of fraudulent financial statements,” Expert
Systems with Applications, vol. 32, no. 4, pp. 995–1003, 2007, doi: 10.1016/j.eswa.2006.02.016.

[10] J. Perols, “Financial statement fraud detection: an analysis of statistical and machine learning algorithms,” Auditing, vol. 30, no. 2,
pp. 19–50, 2011, doi: 10.2308/ajpt-50009.
[11] C. L. Jan, “Detection of financial statement fraud using deep learning for sustainable development of capital markets under
information asymmetry,” Sustainability, vol. 13, no. 17, pp. 9879–9898, 2021, doi: 10.3390/su13179879.
[12] T. Kiehl, B. Hoogs, L. Christina, and S. Deniz, “Evolving multi-variate time-series patterns for the discrimination of fraudulent
financial filings,” Genetic and Evolutionary Computation Conference, pp. 1-8, 2005.
[13] J. Bertomeu, E. Cheynel, E. Floyd, and W. Pan, “Using machine learning to detect misstatements,” Review of Accounting Studies,
vol. 26, no. 2, pp. 468–519, 2021, doi: 10.1007/s11142-020-09563-8.
[14] M. Cecchini, H. Aytug, G. J. Koehler, and P. Pathak, “Detecting management fraud in public companies,” Management Science,
vol. 56, no. 7, pp. 1146–1160, 2010, doi: 10.1287/mnsc.1100.1174.
[15] Y. Bao, B. Ke, B. Li, Y. J. Yu, and J. Zhang, “Detecting accounting fraud in publicly traded U.S. firms using a machine learning
approach,” Journal of Accounting Research, vol. 58, no. 1, pp. 199–235, 2020, doi: 10.1111/1475-679X.12292.
[16] P. M. Dechow, W. Ge, C. R. Larson, and R. G. Sloan, “Predicting material accounting misstatements,” 39th Annual Contemporary
Accounting Research Conference, vol. 28, no. 1, pp. 17-82, 2011, doi: 10.1111/j.1911-3846.2010.01041.x.
[17] Q. Deng and G. Mei, “Combining self-organizing map and k-means clustering for detecting fraudulent financial statements,” in
2009 IEEE International Conference on Granular Computing, GRC, 2009, pp. 126–131, doi: 10.1109/GRC.2009.5255148.
[18] S. Y. Huang, R. H. Tsaih, and W. Y. Lin, “Unsupervised neural networks approach for understanding fraudulent financial reporting,”
Industrial Management and Data Systems, vol. 112, no. 2, pp. 224–244, 2012, doi: 10.1108/02635571211204272.
[19] T. R. Izzalqurny, B. Subroto, and A. Ghofar, “Relationship between financial ratio and financial statement fraud risk moderated by auditor
quality,” International Journal of Research in Business and Social Science, vol. 8, no. 4, pp. 34–43, 2019, doi: 10.20525/ijrbs.v8i4.281.
[20] J. Yao, Y. Pan, S. Yang, Y. Chen, and Y. Li, “Detecting fraudulent financial statements for the sustainable development of the
socio-economy in China: a multi-analytic approach,” Sustainability, vol. 11, no. 6, 2019, doi: 10.3390/su11061579.
[21] K. Randhawa, C. K. Loo, M. Seera, C. P. Lim, and A. K. Nandi, “Credit card fraud detection using adaBoost and majority voting,”
IEEE Access, vol. 6, pp. 14277–14284, 2018, doi: 10.1109/ACCESS.2018.2806420.
[22] M. N. Hoang, H. T. L. Nguyen, and H. N. Viet, “A model for detecting accounting frauds by using machine learning,” in The Annual
Hawaii International Conference on System Sciences, 2022, vol. 2022, pp. 1552–1561, doi: 10.24251/hicss.2022.193.
[23] G. S. Temponeras, S. A. N. Alexandropoulos, S. B. Kotsiantis, and M. N. Vrahatis, “Financial fraudulent statements detection
through a deep dense artificial neural network,” in 10th International Conference on Information, Intelligence, Systems and
Applications, IISA 2019, 2019, pp. 1–5, doi: 10.1109/IISA.2019.8900741.
[24] P. Craja, A. Kim, and S. Lessmann, “Deep learning for detecting financial statement fraud,” Decision Support Systems, vol. 139,
2020, doi: 10.1016/j.dss.2020.113421.
[25] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: Synthetic minority over-sampling technique,” Journal
of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002, doi: 10.1613/jair.953.
[26] D. Elreedy and A. F. Atiya, “A comprehensive analysis of synthetic minority oversampling technique (SMOTE) for handling class
imbalance,” Information Sciences, vol. 505, pp. 32–64, 2019, doi: 10.1016/j.ins.2019.07.070.
[27] I. Goodfellow, Y. Bengio, and A. Courville, Deep learning, Cambridge, Massachusetts: MIT Press, 2016.
[28] D. P. Kingma and J. Ba, “Adam: a method for stochastic optimization,” arXiv-Computer Science, pp. 1–15, 2017, doi:
10.48550/arXiv.1412.6980.
[29] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Computer Society
Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778, doi: 10.1109/CVPR.2016.90.
[30] T. Fawcett, “An introduction to ROC analysis,” Pattern Recognition Letters, vol. 27, no. 8, pp. 861–874, 2006, doi:
10.1016/j.patrec.2005.10.010.
[31] N. Japkowicz, “Assessment metrics for imbalanced learning,” Imbalanced Learning: Foundations, Algorithms, and Applications,
pp. 187–206, 2013, doi: 10.1002/9781118646106.ch8.

BIOGRAPHIES OF AUTHORS

Yosua Efraim Young is a graduate candidate in the Informatics Graduate Program
at Universitas Pelita Harapan. He earned his Bachelor’s Degree in Accounting from
Universitas Pelita Harapan in Indonesia. He can be contacted at email:
[email protected].

Hendra Tjahyadi is an Associate Professor in the Informatics Study Program at
Universitas Pelita Harapan. He earned his Bachelor’s Degree in Electrical Engineering from
Universitas Kristen Maranatha, a Master’s Degree in Instrumentation and Control from Institut
Teknologi Bandung, and a Ph.D. in Control Engineering from the School of Engineering, Flinders
University. His research interests are in adaptive control, signal processing, and artificial
intelligence. He can be contacted at email: [email protected].

Int J Artif Intell, Vol. 13, No. 4, December 2024: 4106-4112


IAES International Journal of Artificial Intelligence (IJ-AI)

Vol. 13, No. 4, December 2024, pp. 4106~4112

Detecting fraudulent financial statement under imbalanced data

Hendra Tjahyadi, Yosua Efraim Young

Journal homepage: http://ijai.iaescore.com

Figure 1. Proposed fraud detection workflow

3.1. Data and variables

3.2. Proposed artificial neural networks

3.3. Performance evaluation

Figure 2. The architecture of the proposed networks

4. RESULTS AND DISCUSSION

Table 1. Summary of comparison (test period 2003–2008)

Table 2. Summary of comparison with previous study (test period 2003–2008)

Table 3. Summary of comparison with previous study (test period 2003–2005)

Table 4. Summary of comparison with previous study (test period 2003–2011)

Table 5. Summary of comparison with previous study (test period 2003–2014)

Figure 3. Summary of AUC scores over the additional test periods
