Detecting Fraudulent Financial Statement Under Imbalanced Data Using Neural Network
Corresponding Author:
Hendra Tjahyadi
Study Program of Informatics, Faculty of Computer Science, Universitas Pelita Harapan
Jakarta, Indonesia
Email: [email protected]
1. INTRODUCTION
Financial statement misstatements may arise from either fraud or error, as stated by the International
Federation of Accountants [1]. It is the auditor’s responsibility to provide reasonable assurance that the
financial statements are free from material misstatement. Misleading financial statements can incur significant
costs, especially for investors, regulators, and society at large, as demonstrated in the Enron scandal—one of
the most notable audit and accounting scandals in history and literature [2]–[4]. It began when Enron shocked
the public by reporting a $638 million loss. This case implicated its auditor, Arthur Andersen, which failed to
detect the misstatement and engaged in document shredding related to Enron audits. This highlights the
difficulty in detecting accounting misstatements.
Detecting accounting misstatements is challenging for several reasons, including the complexity
of financial transactions, sophisticated fraud schemes, the vast amount of data, and human error and bias. These
challenges underscore the need for innovative approaches such as data analytics and machine learning in auditing
[3]–[5]. These technologies offer the potential to enhance audit effectiveness, improve risk assessment, and
mitigate the impact of human limitations on audit quality.
Although data analytics and machine learning are expected to be superior methods, they
are seldom used in performing audit procedures, and it remains relatively unknown whether their use is
indeed transformational for the audit [6]. Various studies have investigated the detection of fraudulent
financial statements using both supervised and unsupervised learning. Supervised approaches include
neural networks [7]–[11], genetic algorithms [12], decision trees (DT) [8]–[10], [13], Bayesian networks [8], [9],
support vector machines (SVM) [8], [13]–[15], and logistic regression (LR) [16]. Unsupervised approaches use
algorithms such as the self-organizing map [17], [18] and k-means clustering [17]. One significant obstacle in
machine learning is imbalanced data, where unequal class representation leads to inaccurate detection because
the majority class overshadows the minority class. Publicly available financial statements often exhibit severe
imbalance because fraudulent instances are rare compared with non-fraudulent ones. Therefore, it is crucial for
models to address this imbalance.
This research aims to develop a model for predicting fraudulent financial statements from real public
datasets and to tackle the imbalanced data issue. Three neural network models with different types of inputs,
namely raw financial data, financial ratios, and a combination thereof, combined with the synthetic minority
over-sampling technique (SMOTE), are proposed in this study. The rest of this paper is organized as follows.
Section 2 outlines previous efforts by researchers to detect fraud in financial statements, using both
balanced simulation data and imbalanced real data. Section 3 details the proposed method, which
combines neural networks and SMOTE to detect fraud in highly imbalanced real data.
In section 4, the experimental results are presented and compared with those of previous researchers. Finally,
section 5 concludes with a summary of our findings.
2. LITERATURE REVIEW
The existing literature has devoted considerable attention to financial data as crucial indicators of
fraud, encompassing both raw financial data and financial ratios. As fundamental components of financial
statements, financial data have the potential to indicate fraud risk. For example, a liquidity ratio derived directly
from raw financial data could serve as an effective measure of a company's financial pressure. This underscores
the superiority of certain ratios over others [19], particularly those closely linked to the
fraud triangle theory. Several approaches in the literature leverage financial data with machine learning and
data mining to detect fraudulent financial statements.
Green and Choi [7] demonstrated the potential of neural networks in fraud investigation
and used one as a detection tool, employing 172 samples split evenly into 86 fraudulent and 86
non-fraudulent cases. The model achieved an accuracy rate of 74%. Kotsiantis et al. [8] conducted experiments
on DT, artificial neural networks (ANNs), Bayesian networks, rule learners, nearest neighbors, and SVM. This
study demonstrated that DT outperformed the other models with 91.2% accuracy using a dataset of 164
Greek companies listed on the Athens stock exchange, comprising 41 fraudulent and 123 non-fraudulent cases.
In a similar work, Kirkos et al. [9] conducted experiments using DT, neural networks, and Bayesian belief
networks, revealing that Bayesian belief networks outperformed the others with 90.3% accuracy using a balanced
dataset of 76 Greek manufacturing companies, including 38 fraudulent and 38 non-fraudulent cases.
Cecchini et al. [14], using SVM, correctly identified 80% of fraudulent cases and 90.6% of
non-fraudulent cases from a dataset comprising 6,427 non-fraudulent and 205 fraudulent samples. This study
was considered a pioneering work in the field. Dechow et al. [16] presented an alternative method using LR
with financial ratios to detect fraudulent financial statements, signaling the likelihood of misstatement.
Perols [10], with a larger dataset of 15,934 non-fraudulent and 51 fraudulent cases, demonstrated that LR and
SVM outperformed neural networks, bagging, C4.5, and stacking algorithms. These findings were consistent with
those of Yao et al. [20], which showed that SVM had the highest accuracy among various classification methods.
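To illustrate why accuracy alone can be misleading on datasets this skewed, the following minimal sketch takes the class counts reported above for Perols [10] and evaluates a trivial classifier that labels every case non-fraudulent. The function name and the returned keys are hypothetical and serve only as an illustration; they are not part of any cited study.

```python
# Class counts reported for the dataset used by Perols [10].
NON_FRAUD = 15_934
FRAUD = 51


def always_non_fraud_metrics(n_negative: int, n_positive: int) -> dict:
    """Metrics of a trivial classifier that labels every case non-fraudulent."""
    total = n_negative + n_positive
    true_negatives = n_negative  # every non-fraud case is labelled correctly
    true_positives = 0           # no fraud case is ever flagged
    return {
        "accuracy": (true_negatives + true_positives) / total,
        "sensitivity": true_positives / n_positive,
        "fraud_share": n_positive / total,
    }


metrics = always_non_fraud_metrics(NON_FRAUD, FRAUD)
print(f"accuracy    = {metrics['accuracy']:.4f}")
print(f"sensitivity = {metrics['sensitivity']:.4f}")
print(f"fraud share = {metrics['fraud_share']:.4%}")
```

Such a classifier is correct for more than 99% of cases yet detects no fraud at all, which is why sensitivity, precision, and AUC are more informative than accuracy under severe imbalance.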
Randhawa et al. [21] investigated the effectiveness of single and hybrid methods, employing
under-sampling to detect credit card fraud. Their study revealed that combining AdaBoost and majority voting
methods yielded the best results. Bao et al. [15] extended this research by using a large public dataset and
compared the results of re-implementing the models proposed in [14], [16] with a new state-of-the-art model
using RUSBoost. The proposed method outperformed the previous models with an area under curve (AUC) of
72.5% and sensitivity and precision of 4.88% and 4.48%, respectively. Hoang et al. [22] employed XGBoost
and f-XGBoost on the dataset used in [15], resulting in AUC scores of 68.9% and 69.3%, and precision and
sensitivity of 3.56% and 5%, and 3.36% and 4.22%, respectively. Ashtiani and Raahemi [3] found that a single
model outperformed both ensemble and hybrid approaches. They highlighted the approach of Temponeras et al. [23],
who employed a deep dense multilayer perceptron and achieved an accuracy of 93.7% using a dataset of 164 Greek
companies. Craja et al. [24], using a text mining approach to detect fraudulent financial statements from annual
reports, demonstrated the effectiveness of ANNs, emphasizing their ability to capture complex
relationships among variables. Inspired by the effectiveness of neural networks, this study proposes an approach
combining neural networks and SMOTE to detect fraudulent financial statements in an imbalanced dataset.
Detecting fraudulent financial statement under imbalanced data using neural network (Yosua Efraim Young)
4108 ISSN: 2252-8938
3. METHOD
The detection model for fraudulent financial statements proposed in this study combines
neural networks and SMOTE. Initially, a severely imbalanced public dataset containing real financial
statements is acquired. Subsequently, the dataset is preprocessed to address the class imbalance
using SMOTE, which generates synthetic samples for the minority class. The preprocessed data is then used
for training and experimentation on the proposed network models. Finally, the results obtained with the
neural networks are compared to those achieved by the state-of-the-art algorithms proposed in previous
literature. The overall process workflow is illustrated in Figure 1.
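The core idea of SMOTE, generating a synthetic minority sample by interpolating between a minority sample and one of its nearest minority neighbours, can be sketched as follows. This is a simplified illustration using only the Python standard library, not the implementation used in this study: the function name `smote`, its parameters (`n_synthetic`, `k`, `seed`), and the toy data are assumptions, and the base sample is drawn at random rather than by iterating over every minority sample.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with two hypothetical features per observation.
fraud = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(fraud, n_synthetic=6, k=2)
print(len(new_points), "synthetic fraud samples generated")
```

Because each synthetic point lies on a line segment between two existing minority samples, the minority class is enlarged without simply duplicating observations.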
Following previous work [15], sensitivity and precision are measured on
the top 1% of observations ranked by decision value. This choice is driven by practical considerations, as
regulators may not be able to investigate all companies predicted as fraudulent due to resource constraints.
Additionally, this decision is influenced by the results of leading research by Cecchini et al. [14], which
reported a high number of false positives in their SVM performance, correctly classifying 80% of fraud cases
and 90.6% of non-fraud cases. Therefore, to avoid allocating excessive resources to many false
positives, the focus is on the top 1% of observations.
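The top-1% evaluation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the evaluation code of this study or of [15]: the function name `top_fraction_metrics`, the rounding of the cutoff, and the toy scores and labels are hypothetical.

```python
def top_fraction_metrics(scores, labels, fraction=0.01):
    """Precision and sensitivity when only the top-scoring fraction is flagged."""
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and of equal length")
    # Number of observations flagged as fraudulent (at least one).
    n_flagged = max(1, round(fraction * len(scores)))
    # Rank observations from the highest to the lowest decision value.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_positives = sum(label for _, label in ranked[:n_flagged])
    total_fraud = sum(labels)
    precision = true_positives / n_flagged
    sensitivity = true_positives / total_fraud if total_fraud else 0.0
    return precision, sensitivity


# Toy data: 200 observations, a higher score means higher predicted fraud risk.
scores = list(range(200))
labels = [0] * 200
for fraud_index in (10, 50, 100, 199):
    labels[fraud_index] = 1  # four fraudulent observations

precision, sensitivity = top_fraction_metrics(scores, labels)
print(f"precision = {precision:.2f}, sensitivity = {sensitivity:.2f}")
```

In this toy example only two of the 200 observations are flagged, one of which is fraudulent, so precision is 0.50 while sensitivity is 0.25 because three of the four fraud cases fall outside the top 1%.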
As presented in Table 1, the proposed neural network can detect fraudulent financial
statements, with raw financial data being the best fraud predictor as measured by AUC score. These results
demonstrate that combining raw financial data and financial ratios as fraud predictors does not yield a
higher AUC score. In this experiment, the highest precision is obtained by
employing financial ratios as the fraud predictor, whereas the highest sensitivity is obtained by employing
both raw financial data and financial ratios as the fraud predictor.
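For reference, the AUC score can be read as the probability that a randomly chosen fraudulent observation receives a higher decision value than a randomly chosen non-fraudulent one. The sketch below computes it directly from that definition; the function name `auc_score` and the toy data are assumptions for illustration, and the pairwise loop is chosen for clarity rather than efficiency.

```python
def auc_score(scores, labels):
    """AUC as the share of (fraud, non-fraud) pairs ranked correctly; ties count half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one observation of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0  # fraud case ranked above non-fraud case
            elif p == n:
                wins += 0.5  # tie counts as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Toy decision values with two fraudulent (1) and four non-fraudulent (0) cases.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
toy_labels = [1, 0, 1, 0, 0, 0]
print(f"AUC = {auc_score(toy_scores, toy_labels):.3f}")
```

Here seven of the eight fraud/non-fraud pairs are ordered correctly, giving an AUC of 0.875. An AUC of 0.5 corresponds to random ranking, which is the baseline against which scores around 0.7 should be read.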
Table 2 provides the results obtained from the proposed network for the test period of 2003–2008 in
comparison with previous literature. These results demonstrate that the proposed network has an AUC
score and precision comparable to those reported in previous literature for SVM, LR, RUSBoost,
XGBoost, and f-XGBoost. In contrast, the proposed network demonstrates a
superior sensitivity score, indicating that the model identifies fraud while producing few
false negatives, which would otherwise correspond to undetected fraud. This supports the model's
reliability and practical utility. To show the robustness of the proposed network, three
alternative test periods are added. Consistent with previous studies [15], [22], the additional test periods are
2003–2005, 2003–2011, and 2003–2014. The performance metrics for the additional
test periods are presented in Tables 3 to 5.
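Selecting the observations for each test period can be sketched as a simple filter on the fiscal year. The record layout (the `firm`, `year`, and `fraud` keys), the function name `select_test_period`, and the toy records are hypothetical and do not reflect the actual structure of the dataset used in this study.

```python
# Test periods considered in this study (first year, last year), inclusive.
TEST_PERIODS = [(2003, 2005), (2003, 2008), (2003, 2011), (2003, 2014)]


def select_test_period(records, first_year, last_year):
    """Keep the firm-year records whose year lies in [first_year, last_year]."""
    return [r for r in records if first_year <= r["year"] <= last_year]


# Toy data: one hypothetical firm observed once per year from 2001 to 2015.
records = [{"firm": "A", "year": year, "fraud": 0} for year in range(2001, 2016)]

sizes = {
    period: len(select_test_period(records, *period)) for period in TEST_PERIODS
}
for (first_year, last_year), size in sizes.items():
    print(f"{first_year}-{last_year}: {size} test observations")
```

Because all periods start in 2003, each longer period is a superset of the shorter ones, so differences in the reported metrics reflect the observations added by extending the end year.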
The results obtained are compared with previous literature and summarized in Figure 3. As shown in
Figure 3, the SVM model demonstrates fluctuating performance, while both RUSBoost and XGBoost
show a performance decline when the test period is extended. This result accords with
Hoang et al. [22], who argue that, because the amount of undetected fraud is assumed to grow over time, a
longer test period is less reliable. However, in contrast to the SVM, RUSBoost, and XGBoost models, the
proposed network and the logistic regression model show a slight performance improvement as the test period
is extended. This demonstrates the robustness of both models, which are expected to perform stably when
tested with new, unseen data.
Employing raw financial data as the fraud predictor, the proposed network achieved the best AUC
score on the full 2003–2014 test set, as shown in Table 5, with an AUC of 0.718, a
precision of 1.15%, and a sensitivity of 50%. In terms of AUC stability as an indicator of robustness, the
proposed network scores 0.694 and 0.720 for the test periods 2003–2005 and 2003–2011, as shown in
Tables 3 and 4, respectively. This shows that expanding the dataset improves the performance of the proposed
network. In the next scenario, using the period 2003–2014, the AUC score dropped to 0.718, slightly lower
than in the 2003–2011 scenario. This indicates that while expanding the dataset improves performance,
there may be diminishing returns beyond a certain period length. Consistent with the results of Bao et al. [15],
the results of this study demonstrate that, for the same model or network, using the 28 raw
financial data items derived from [14] leads to better results than using the other fraud predictors, namely
the 14 financial ratios derived from [16].
The results of this study show that combining a neural network and SMOTE can detect fraudulent financial
statements in a severely imbalanced dataset using raw financial data, financial ratios, or both combined as the
fraud predictor. While the proposed network demonstrated promising utility, it is important to acknowledge
that the dataset used consists of historical data from specific demographics and time periods. This
may limit generalizability, and the model may therefore require further calibration or updates to maintain its
effectiveness in the current dynamic environment.
5. CONCLUSION
This paper introduces a neural network designed to detect fraudulent financial statements within an
imbalanced dataset, addressing the severe imbalance issue through the use of SMOTE. Our experimental
results indicate that the model can detect fraudulent financial statements, with an AUC score of 70.6%, a
sensitivity rate of 83%, and a precision rate of 2.89%. This study contributes by advocating for the integration of
ANNs in auditing practices, particularly during the initial audit phase, such as risk assessment procedures. The
proposed model's high sensitivity rate underscores its advantage over similar models, offering practical utility
for auditors and regulators by minimizing the risk of false negatives. However, limitations exist, including the
sole reliance on numerical financial data extracted from financial statements. Future research could
explore the incorporation of non-financial data and the application of unsupervised learning to address
mislabeling issues, potentially through the use of generative artificial intelligence to generate
fraudulent data for training purposes or to describe fraud characteristics.
REFERENCES
[1] “International standard on auditing 240: the auditor’s responsibilities relating to fraud in an audit of financial statements,” IFAC, 2013.
Accessed: Dec. 27, 2023. [Online]. Available: https://www.ifac.org/_flysystem/azure-private/publications/files/A012 2013 IAASB
Handbook ISA 240.pdf.
[2] Y. J. Chen, W. C. Liou, Y. M. Chen, and J. H. Wu, “Fraud detection for financial statements of business groups,” International
Journal of Accounting Information Systems, vol. 32, pp. 1–23, 2019, doi: 10.1016/j.accinf.2018.11.004.
[3] M. N. Ashtiani and B. Raahemi, “Intelligent fraud detection in financial statements using machine learning and data mining: a
systematic literature review,” IEEE Access, vol. 10, pp. 72504–72525, 2022, doi: 10.1109/ACCESS.2021.3096799.
[4] W. Xiuguo and D. Shengyong, “An analysis on financial statement fraud detection for Chinese listed companies using deep
learning,” IEEE Access, vol. 10, pp. 22516–22532, 2022, doi: 10.1109/ACCESS.2022.3153478.
[5] D. Botez, “Recent challenge for auditors: using data analytics in the audit of the financial statements,” Brain-broad Research in
Artificial Intelligence and Neuroscience, vol. 9, no. 4, pp. 61–72, 2018.
[6] G. Salijeni, A. Samsonova-Taddei, and S. Turley, “Big data and changes in audit technology: contemplating a research agenda,” Accounting
and Business Research, vol. 49, no. 1, pp. 95–119, 2019, doi: 10.1080/00014788.2018.1459458.
[7] B. P. Green and J. H. Choi, “Assessing the risk of management fraud through neural network technology,” Auditing, vol. 16, no. 1,
pp. 25–28, 1997.
[8] S. Kotsiantis, E. Koumanakos, D. Tzelepis, and V. Tampakas, “Forecasting fraudulent financial statements using data mining,”
International Journal of Computational Intelligence, vol. 3, no. 2, pp. 104–110, 2006.
[9] E. Kirkos, C. Spathis, and Y. Manolopoulos, “Data mining techniques for the detection of fraudulent financial statements,” Expert
Systems with Applications, vol. 32, no. 4, pp. 995–1003, 2007, doi: 10.1016/j.eswa.2006.02.016.
[10] J. Perols, “Financial statement fraud detection: an analysis of statistical and machine learning algorithms,” Auditing, vol. 30, no. 2,
pp. 19–50, 2011, doi: 10.2308/ajpt-50009.
[11] C. L. Jan, “Detection of financial statement fraud using deep learning for sustainable development of capital markets under
information asymmetry,” Sustainability, vol. 13, no. 17, pp. 9879–9898, 2021, doi: 10.3390/su13179879.
[12] T. Kiehl, B. Hoogs, L. Christina, and S. Deniz, “Evolving multi-variate time-series patterns for the discrimination of fraudulent
financial filings,” Genetic and Evolutionary Computation Conference, pp. 1-8, 2005.
[13] J. Bertomeu, E. Cheynel, E. Floyd, and W. Pan, “Using machine learning to detect misstatements,” Review of Accounting Studies,
vol. 26, no. 2, pp. 468–519, 2021, doi: 10.1007/s11142-020-09563-8.
[14] M. Cecchini, H. Aytug, G. J. Koehler, and P. Pathak, “Detecting management fraud in public companies,” Management Science,
vol. 56, no. 7, pp. 1146–1160, 2010, doi: 10.1287/mnsc.1100.1174.
[15] Y. Bao, B. Ke, B. Li, Y. J. Yu, and J. Zhang, “Detecting accounting fraud in publicly traded U.S. firms using a machine learning
approach,” Journal of Accounting Research, vol. 58, no. 1, pp. 199–235, 2020, doi: 10.1111/1475-679X.12292.
[16] P. M. Dechow, W. Ge, C. R. Larson, and R. G. Sloan, “Predicting material accounting misstatements,” 39th Annual Contemporary
Accounting Research Conference, vol. 28, no. 1, pp. 17-82, 2011, doi: 10.1111/j.1911-3846.2010.01041.x.
[17] Q. Deng and G. Mei, “Combining self-organizing map and k-means clustering for detecting fraudulent financial statements,” in
2009 IEEE International Conference on Granular Computing, GRC, 2009, pp. 126–131, doi: 10.1109/GRC.2009.5255148.
[18] S. Y. Huang, R. H. Tsaih, and W. Y. Lin, “Unsupervised neural networks approach for understanding fraudulent financial reporting,”
Industrial Management and Data Systems, vol. 112, no. 2, pp. 224–244, 2012, doi: 10.1108/02635571211204272.
[19] T. R. Izzalqurny, B. Subroto, and A. Ghofar, “Relationship between financial ratio and financial statement fraud risk moderated by auditor
quality,” International Journal of Research in Business and Social Science, vol. 8, no. 4, pp. 34–43, 2019, doi: 10.20525/ijrbs.v8i4.281.
[20] J. Yao, Y. Pan, S. Yang, Y. Chen, and Y. Li, “Detecting fraudulent financial statements for the sustainable development of the
socio-economy in China: a multi-analytic approach,” Sustainability, vol. 11, no. 6, 2019, doi: 10.3390/su11061579.
[21] K. Randhawa, C. K. Loo, M. Seera, C. P. Lim, and A. K. Nandi, “Credit card fraud detection using adaBoost and majority voting,”
IEEE Access, vol. 6, pp. 14277–14284, 2018, doi: 10.1109/ACCESS.2018.2806420.
[22] M. N. Hoang, H. T. L. Nguyen, and H. N. Viet, “A model for detecting accounting frauds by using machine learning,” in The Annual
Hawaii International Conference on System Sciences, 2022, vol. 2022, pp. 1552–1561, doi: 10.24251/hicss.2022.193.
[23] G. S. Temponeras, S. A. N. Alexandropoulos, S. B. Kotsiantis, and M. N. Vrahatis, “Financial fraudulent statements detection
through a deep dense artificial neural network,” in 10th International Conference on Information, Intelligence, Systems and
Applications, IISA 2019, 2019, pp. 1–5, doi: 10.1109/IISA.2019.8900741.
[24] P. Craja, A. Kim, and S. Lessmann, “Deep learning for detecting financial statement fraud,” Decision Support Systems, vol. 139,
2020, doi: 10.1016/j.dss.2020.113421.
[25] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: Synthetic minority over-sampling technique,” Journal
of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002, doi: 10.1613/jair.953.
[26] D. Elreedy and A. F. Atiya, “A comprehensive analysis of synthetic minority oversampling technique (SMOTE) for handling class
imbalance,” Information Sciences, vol. 505, pp. 32–64, 2019, doi: 10.1016/j.ins.2019.07.070.
[27] I. Goodfellow, Y. Bengio, and A. Courville, Deep learning, Cambridge, Massachusetts: MIT Press, 2016.
[28] D. P. Kingma and J. Ba, “Adam: a method for stochastic optimization,” arXiv-Computer Science, pp. 1-15, 2017, doi:
10.48550/arXiv.1412.6980.
[29] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Computer Society
Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778, doi: 10.1109/CVPR.2016.90.
[30] T. Fawcett, “An introduction to ROC analysis,” Pattern Recognition Letters, vol. 27, no. 8, pp. 861–874, 2006, doi:
10.1016/j.patrec.2005.10.010.
[31] N. Japkowicz, “Assessment metrics for imbalanced learning,” Imbalanced Learning: Foundations, Algorithms, and Applications,
pp. 187–206, 2013, doi: 10.1002/9781118646106.ch8.