IAES International Journal of Artificial Intelligence (IJ-AI)

Vol. 13, No. 4, December 2024, pp. 4106~4112


ISSN: 2252-8938, DOI: 10.11591/ijai.v13.i4.pp4106-4112

Detecting fraudulent financial statement under imbalanced data using neural network

Hendra Tjahyadi, Yosua Efraim Young


Study Program of Informatics, Faculty of Computer Science, Universitas Pelita Harapan, Jakarta, Indonesia

Article Info

Article history:
Received Dec 29, 2023
Revised Apr 19, 2024
Accepted Jun 8, 2024

Keywords:
Fraudulent financial statements
Machine learning
Neural network
Supervised learning
Synthetic minority over-sampling technique

ABSTRACT
In this paper, a novel approach for detecting fraudulent financial statements by combining neural networks
and the synthetic minority over-sampling technique (SMOTE) is introduced. This approach is designed to tackle
the problem of imbalanced datasets prevalent in fraud cases, which, if left unaddressed, will hinder the model
from accurately identifying fraud. Three neural network models are developed, each using a different set of
fraud predictors as the input layer: 28 raw financial data inputs, 14 financial ratio inputs, and 42 inputs
combining both raw financial data and financial ratios. Experimental validation using established research
datasets is conducted to assess the performance of the proposed method. Performance metrics, namely area under
the curve (AUC), precision, and sensitivity, are used for evaluation, comparing the proposed model against
existing benchmark models found in the literature. Results indicate that the proposed model achieves an AUC
score of 70.6% and a precision score of 2.89%, comparable to the existing models, with a sensitivity score of
83% that outperforms all counterparts. The high sensitivity rate of the proposed model underscores its
practical utility for auditors and regulators, as it minimizes the risk of false negatives, thereby enhancing
confidence in fraud detection.
This is an open access article under the CC BY-SA license.

Corresponding Author:
Hendra Tjahyadi
Study Program of Informatics, Faculty of Computer Science, Universitas Pelita Harapan
Jakarta, Indonesia
Email: [email protected]

1. INTRODUCTION
Financial statement misstatements may arise from either fraud or error, as stated by the International
Federation of Accountants [1]. It is the auditor’s responsibility to provide reasonable assurance that the
financial statements are free from material misstatement. Misleading financial statements can incur significant
costs, especially for investors, regulators, and society at large, as demonstrated in the Enron scandal—one of
the most notable audit and accounting scandals in history and literature [2]–[4]. It began when Enron shocked
the public by reporting a $638 million loss. This case implicated its auditor, Arthur Andersen, which failed to
detect the misstatement and engaged in document shredding related to Enron audits. This highlights the
difficulty in detecting accounting misstatements.
Detecting accounting misstatements can be challenging for several reasons, such as the complexity
of financial transactions, sophisticated fraud schemes, vast amounts of data, and human error and bias. These
challenges underscore the need for innovative approaches such as data analytics and machine learning in auditing
[3]–[5]. These technologies offer the potential to enhance audit effectiveness, improve risk assessment, and
mitigate the impact of human limitations on audit quality.
Although data analytics and machine learning are expected to be superior methods, they
are seldom used in performing audit procedures. It is relatively unknown whether the use of data analytics and
machine learning is indeed transformational for the audit [6]. Various studies have been conducted on
fraudulent financial statement detection, utilizing both supervised learning and
unsupervised learning. Supervised learning studies use models such as neural networks [7]–[11],
genetic algorithms [12], decision trees (DT) [8]–[10], [13], Bayesian networks [8], [9], support vector machines
(SVM) [8], [13]–[15], and logistic regression (LR) [16]. Unsupervised learning implementations use algorithms
such as self-organizing maps [17], [18] and k-means clustering [17]. One significant obstacle in machine
learning is the imbalanced data challenge, where unequal class representation leads to inaccurate detection,
with majority classes overshadowing minorities. Publicly available financial statements often exhibit severe
imbalance due to the rarity of fraudulent instances compared to non-fraudulent ones. Therefore, it is crucial for
models to address this imbalance.

Journal homepage: http://ijai.iaescore.com

This research aims to develop a model for predicting fraudulent financial statements from real public
datasets and to tackle the imbalanced data issue. Three neural network models with different types of inputs,
namely raw financial data, financial ratios, and a combination thereof, combined with the synthetic minority
over-sampling technique (SMOTE), are proposed in this study. The rest of this paper is organized as follows.
Section 2 outlines previous efforts by researchers to detect fraud in financial statements, using both
balanced simulation data, as is common, and imbalanced real data. Section 3 details the proposed method, which
combines neural networks and SMOTE to detect fraud in highly imbalanced real data.
Section 4 presents the experimental results and compares them with those of previous researchers. Finally,
section 5 concludes with a summary of our findings.

2. LITERATURE REVIEW
The existing literature has focused considerable attention on financial data as crucial indicators of
fraud, encompassing both raw financial data and financial ratios. As fundamental components of financial
statements, financial data have the potential to indicate fraud risk. For example, a liquidity ratio derived directly
from raw financial data could serve as an effective measure of a company's financial pressure. This underscores
the superiority of certain ratios over others [19], particularly those financial data points closely linked to the
fraud triangle theory. By leveraging financial data, several approaches that use machine learning and data
mining to detect fraudulent financial statements are found in the literature.
Green and Choi [7] demonstrated the potential of neural network applications in fraud investigation
and utilized a neural network as a detection tool, employing 172 samples, with 86 samples each for fraudulent and
non-fraudulent cases. The model achieved an accuracy rate of 74%. Kotsiantis et al. [8] conducted experiments
on DT, artificial neural networks (ANNs), Bayesian networks, rule learners, nearest neighbors, and SVM. This
study demonstrated that DT outperformed other models with 91.2% accuracy using a dataset of 164
Greek companies listed on the Athens stock exchange, comprising 41 fraudulent and 123 non-fraudulent cases.
In a similar work, Kirkos et al. [9] conducted experiments using DT, neural networks, and Bayesian belief
networks, revealing that Bayesian belief networks outperformed the others with 90.3% accuracy using a balanced
dataset of 76 Greek manufacturing companies, including 38 fraudulent and 38 non-fraudulent cases.
Cecchini et al. [14], using SVM, correctly identified 80% of fraudulent cases and 90.6% of
non-fraudulent cases from a dataset comprising 6,427 non-fraudulent and 205 fraudulent samples. This study
was considered a pioneering work in the field. Dechow et al. [16] presented an alternative method using LR
with financial ratios to detect fraudulent financial statements, signaling the likelihood of misstatement.
Perols [10], with a larger dataset of 15,934 non-fraudulent and 51 fraudulent cases, demonstrated that LR and
SVM outperformed neural network, bagging, C4.5, and stacking algorithms. These findings were consistent with
those of Yao et al. [20], which showed that SVM had the highest accuracy among various classification methods.
Randhawa et al. [21] investigated the effectiveness of single and hybrid methods, employing
under-sampling to detect credit card fraud. Their study revealed that combining AdaBoost and majority voting
methods yielded the best results. Bao et al. [15] extended this research by using a large public dataset and
compared the results of re-implementing the models proposed in [14], [16] with a new state-of-the-art model
using RUSBoost. The proposed method outperformed the previous models with an area under the curve (AUC) of
72.5% and sensitivity and precision of 4.88% and 4.48%, respectively. Hoang et al. [22] employed XGBoost
and f-XGBoost on the dataset used in [15], resulting in AUC scores of 68.9% and 69.3%, sensitivity of
3.56% and 5.00%, and precision of 3.36% and 4.22%, respectively. Ashtiani and Raahemi [3] found that a single
model outperformed both ensemble and hybrid approaches. They highlighted the approach of Temponeras et al. [23],
who employed a deep dense multilayer perceptron and achieved an accuracy of 93.7% using a dataset of 164 Greek
companies. Craja et al. [24], using a text mining approach to detect fraudulent financial statements from annual
reports, demonstrated the effectiveness of and preference for ANNs, emphasizing their ability to capture complex
relationships among variables. Inspired by the effectiveness of neural networks, this study proposes an approach
combining neural networks and SMOTE to detect fraudulent financial statements in an imbalanced dataset.


3. METHOD
The detection model for fraudulent financial statements proposed in this study utilizes a combination
of neural networks and SMOTE. Initially, a severely imbalanced public dataset containing real financial
statements is acquired. Subsequently, the dataset undergoes preprocessing to address the class imbalance
using SMOTE, which generates synthetic samples for the minority class. The preprocessed data is then used
for training and experimentation on the proposed network models. Finally, the results obtained from the
neural networks are compared to those achieved by the state-of-the-art algorithms proposed in previous
literature. The overall process workflow is illustrated in Figure 1.

Figure 1. Proposed fraud detection workflow

3.1. Data and variables


The dataset is retrieved from previous research by Bao et al. [15], comprising 146,045 records
collected from the COMPUSTAT database, covering all publicly listed U.S. firms from 1990 to 2014. It
includes 42 features, consisting of 28 raw financial data items derived from research by Cecchini et al. [14] and 14
financial ratios as researched by Dechow et al. [16]. Following the previous study, the training dataset for each
test year starts in 1991 and is separated from the test year by a two-year gap.
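
The year-based split described above can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes that "a two-year gap" means the last training year is the test year minus two, and the record fields (`firm`, `year`, `fraud`) are hypothetical names.

```python
def training_years(test_year, first_year=1991, gap=2):
    """Return the calendar years used for training when testing on `test_year`.

    Assumption (not confirmed by the paper): the last training year is
    `test_year - gap`.
    """
    last_train_year = test_year - gap
    if last_train_year < first_year:
        raise ValueError("test_year is too early for the requested gap")
    return list(range(first_year, last_train_year + 1))


def split_by_year(records, test_year, first_year=1991, gap=2):
    """Split firm-year records (dicts with a 'year' key) into train and test lists."""
    train_years = set(training_years(test_year, first_year, gap))
    train = [r for r in records if r["year"] in train_years]
    test = [r for r in records if r["year"] == test_year]
    return train, test


# Toy firm-year records; the field names are illustrative only.
records = [{"firm": "A", "year": y, "fraud": 0} for y in range(1990, 2006)]
train, test = split_by_year(records, test_year=2003)
print(min(r["year"] for r in train), max(r["year"] for r in train))  # 1991 2001
print([r["year"] for r in test])  # [2003]
```

Under this reading, a test period such as 2003–2008 would be evaluated by repeating the split once per test year.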
Serial fraud, defined as fraudulent cases spanning multiple years, is present in the dataset. The impact
of serial fraud is that it can inflate the model's performance, as the same fraud case may be included in both the
train and test data [15]. Therefore, to prevent overstated results and benchmark against previous literature, the
dataset is preprocessed by recoding all serial fraud as non-fraudulent.
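
The paper does not spell out how the recoding is carried out. One possible reading is that a fraud case appearing in both the training and the test data has its training-period labels set to non-fraudulent, so the same case cannot be learned and then tested. The sketch below follows that reading; the `case_id` and `fraud` field names are hypothetical.

```python
def recode_serial_fraud(train, test, case_key="case_id", label_key="fraud"):
    """Return a copy of `train` in which fraud cases that also occur in `test`
    are relabeled as non-fraudulent (0).

    Assumption: records belonging to the same fraud case share a case identifier.
    """
    test_cases = {
        r[case_key] for r in test
        if r[label_key] == 1 and r[case_key] is not None
    }
    recoded = []
    for record in train:
        record = dict(record)  # copy, so the input list is left untouched
        if record[label_key] == 1 and record[case_key] in test_cases:
            record[label_key] = 0
        recoded.append(record)
    return recoded


train = [
    {"case_id": "c1", "year": 2000, "fraud": 1},   # case c1 continues into the test year
    {"case_id": "c1", "year": 2001, "fraud": 1},
    {"case_id": "c2", "year": 2001, "fraud": 1},   # case c2 ends before the test year
    {"case_id": None, "year": 2001, "fraud": 0},
]
test = [
    {"case_id": "c1", "year": 2003, "fraud": 1},
    {"case_id": None, "year": 2003, "fraud": 0},
]
print([r["fraud"] for r in recode_serial_fraud(train, test)])  # [0, 0, 1, 0]
```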
To address the severe imbalance between fraudulent and non-fraudulent cases in the dataset, we
employ a minority oversampling technique called SMOTE [25]. This technique is necessary to address the
challenge where the minority class is often neglected; for example, fraudulent cases, which are the focus of
our attention, represent only 0.67% of the population. SMOTE generates new synthetic samples by interpolating
between each point in the minority class and its nearest minority-class neighbors. It proves to be an effective
method for addressing class imbalance and improving classification performance [26].
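
The interpolation idea behind SMOTE can be illustrated with a short standard-library sketch. It is not the implementation used in this study and simplifies the original algorithm [25] by drawing base points at random instead of looping over every minority sample; the function name and parameters are illustrative.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate `n_synthetic` points by interpolating between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base` (excluding itself), Euclidean distance
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        lam = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + lam * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Four toy minority (fraud) samples with two features each.
minority = [[1.0, 2.0], [1.5, 2.5], [2.0, 2.0], [1.2, 1.8]]
new_points = smote(minority, n_synthetic=6, k=2)
print(len(new_points))  # 6
```

Because every synthetic point lies on a segment between two real minority samples, oversampling should be applied to the training data only, so that no synthetic fraud cases reach the test set.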

3.2. Proposed artificial neural networks


This study experimented with ANNs, specifically feedforward networks, to detect
fraudulent financial statements from a severely imbalanced dataset. The experiments involved three models or
networks, each using a different set of fraud predictors as the input layer. The first network used the 28 raw
financial data items of Cecchini et al. [14] as the input layer. The second network used the 14
financial ratios of Dechow et al. [16] as the input layer. Lastly, the third network
used the combined 42 predictors from Cecchini et al. [14] and Dechow et al. [16] as the input layer.
The overall architecture of the proposed networks is illustrated in Figure 2. The architecture of the
three networks comprises an input layer followed by three hidden layers and an output layer. The input layer
covers three scenarios, one per set of fraud predictors, with 28, 14, and 42 inputs, respectively.
Inspired by [23]–[25], the first and second hidden layers consist of
fully connected layers with LeakyReLU (alpha of 0.05) as the activation function to capture complex patterns
and relationships among the fraud predictors and to handle non-linearity. L2 regularization
with a coefficient of 0.005 adds a penalty term to the network to avoid overfitting [27]. The Adam
optimizer is chosen for its efficient computation, minimal memory requirements, and suitability
for large datasets [28]. Additionally, a dropout layer with a rate of 0.7 randomly drops neurons
during training in an attempt to prevent overfitting [29]. Finally, an output layer with a sigmoid function
performs the binary classification.
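
To make the building blocks concrete, the following standard-library sketch runs a forward pass through two LeakyReLU layers, a dropout step, and a sigmoid output, and computes the L2 penalty. It is not the authors' implementation: the hidden layer sizes (16 and 8) are hypothetical placeholders because the text does not state them, the weights are random, and training with Adam is omitted.

```python
import math
import random


def leaky_relu(x, alpha=0.05):
    """LeakyReLU: pass positives through, scale negatives by `alpha`."""
    return x if x > 0 else alpha * x


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dense(inputs, weights, biases):
    """Fully connected layer; `weights` holds one row of input weights per unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def dropout(values, rate, rng):
    """Inverted dropout: zero each value with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def l2_penalty(layers, coeff=0.005):
    """L2 term added to the loss: coeff times the sum of squared weights."""
    return coeff * sum(w * w for weights, _ in layers for row in weights for w in row)


def init_layers(n_inputs, hidden=(16, 8), seed=0):
    """Small random weights; the hidden sizes are hypothetical placeholders."""
    rng = random.Random(seed)
    sizes = [n_inputs, *hidden, 1]
    return [([[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)],
             [0.0] * n_out)
            for n_in, n_out in zip(sizes, sizes[1:])]


def forward(x, layers, training=False, rate=0.7, rng=None):
    """Two LeakyReLU hidden layers, dropout (training only), sigmoid output."""
    *hidden_layers, output_layer = layers
    h = x
    for weights, biases in hidden_layers:
        h = [leaky_relu(v) for v in dense(h, weights, biases)]
    if training:
        h = dropout(h, rate, rng if rng is not None else random.Random(0))
    (logit,) = dense(h, *output_layer)
    return sigmoid(logit)  # estimated probability of fraud


for n_inputs in (28, 14, 42):  # raw financial data, financial ratios, both
    layers = init_layers(n_inputs)
    probability = forward([0.5] * n_inputs, layers)
    print(n_inputs, round(probability, 3), round(l2_penalty(layers), 5))
```

Dropout is active only during training; at prediction time the full network is used, which is why `forward` skips it unless `training=True`.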

3.3. Performance evaluation


The performance of the proposed model is evaluated using three metrics: AUC, sensitivity, and
precision. AUC is a metric used to evaluate the performance of binary classification [30]. It is employed to assess
the accuracy of the proposed model due to the imbalance in the dataset, where the occurrence of fraudulent
samples is not adequately captured in standard accuracy metrics [31]. Therefore, the AUC score provides a more
representative measure of accuracy in this context compared to commonly used accuracy scores.
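
A small example shows why plain accuracy is misleading on such data. The sketch computes AUC as the fraction of (fraud, non-fraud) pairs in which the fraud case receives the higher score, counting ties as one half; the function names and toy numbers are illustrative and not taken from the experiments.

```python
def auc_score(labels, scores):
    """AUC as the probability that a random fraud case outranks a random non-fraud case."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    # Pairwise comparison is O(len(pos) * len(neg)); fine for a small illustration.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(labels, predictions):
    """Share of observations whose predicted label matches the true label."""
    return sum(y == p for y, p in zip(labels, predictions)) / len(labels)


# Toy imbalanced set: 1,000 firm-years with 7 fraud cases (0.7%).
labels = [1] * 7 + [0] * 993
always_clean = [0] * 1000          # a "model" that never flags fraud
print(accuracy(labels, always_clean))    # 0.993
constant_scores = [0.0] * 1000     # the same model expressed as scores
print(auc_score(labels, constant_scores))  # 0.5
```

The model that never flags fraud looks 99.3% accurate, yet its AUC of 0.5 shows that it ranks fraud cases no better than chance.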

Following previous work [15], sensitivity and precision are measured on
the top 1% of observations ranked by decision value. This choice is driven by practical considerations, as
regulators may not be able to examine all companies predicted as fraudulent due to resource constraints.
Additionally, this decision is influenced by the results of leading research by Cecchini et al. [14], which
reported a high number of false positives in their SVM performance, correctly classifying 80% of fraud cases
and 90.6% of non-fraud cases. Therefore, to avoid allocating excessive resources to many false
positives, the focus is on the top 1% of observations.
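
The top-1% evaluation can be sketched as below, assuming that observations are ranked by decision value, that sensitivity is the share of all fraud cases found in the top fraction, and that precision is the share of fraud cases within it. The function name, the cut-off rounding, and the toy data are assumptions for illustration.

```python
def top_fraction_metrics(labels, scores, fraction=0.01):
    """Sensitivity and precision when only the highest-scored fraction is flagged."""
    n_top = max(1, int(len(scores) * fraction))  # number of observations inspected
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = sum(label for _, label in ranked[:n_top])
    total_pos = sum(labels)
    sensitivity = true_pos / total_pos if total_pos else 0.0
    precision = true_pos / n_top
    return sensitivity, precision


# Toy data: 1,000 observations, scores decreasing with the index,
# and 5 fraud cases of which 2 fall inside the 10 highest scores.
n = 1000
scores = [1 - i / n for i in range(n)]
labels = [0] * n
for fraud_index in (0, 3, 50, 200, 700):
    labels[fraud_index] = 1

sensitivity, precision = top_fraction_metrics(labels, scores)
print(sensitivity, precision)  # 0.4 0.2
```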

Figure 2. The architecture of the proposed networks

4. RESULTS AND DISCUSSION


This study proposed an approach for detecting fraudulent financial statements by combining neural
networks and SMOTE. While earlier literature has explored and demonstrated the capability of various
algorithms to detect fraudulent financial statements, there is still room for improvement, specifically in
terms of accuracy. Therefore, developing a more accurate model is important, especially one that can be
relied upon for practical adoption.
Three networks with different fraud predictors as the input layer are employed. The results obtained are
summarized in Table 1. The first network, employing raw financial data as the fraud predictor,
achieved an AUC score of 0.706, with a sensitivity of 50% and a precision of 1.39%. The second network,
employing financial ratios as the fraud predictor, achieved an AUC score of 0.693, a sensitivity of 67%,
and a precision of 2.89%. Lastly, the third network, employing both raw financial data and financial ratios
as the fraud predictor, achieved an AUC of 0.672, a sensitivity of 83%, and a precision of 1.91%.

Table 1. Summary of comparison (test period 2003–2008)

Fraud predictor          AUC     Sensitivity (%)   Precision (%)
28 raw financial data    0.706   50                1.39
14 financial ratios      0.693   67                2.89
Both                     0.672   83                1.91

As presented in Table 1, the proposed neural network can detect fraudulent financial
statements, with raw financial data being the best fraud predictor as measured by AUC score. These results
demonstrate that combining both raw financial data and financial ratios as the fraud predictor does not yield
higher accuracy, as measured by AUC score. In this experiment, the highest precision is obtained by
employing financial ratios as the fraud predictor, whereas the highest sensitivity is obtained by employing
both raw financial data and financial ratios as the fraud predictor.
Table 2 provides results obtained from the proposed network for the test period of 2003–2008 in
comparison with previous literature. These results demonstrate that the proposed network has an AUC
score and precision comparable with the results obtained in previous literature using
SVM, LR, RUSBoost, XGBoost, and f-XGBoost. In contrast, the proposed network demonstrates a
superior sensitivity score, indicating that the model is able to identify fraud without producing a high number
of false negatives, which would translate to undetected fraud. This supports the model's reliability and
practical utility. To show the robustness of the proposed network, three
alternative test periods are added. Consistent with previous studies [15], [22], the additional test periods are
2003–2005, 2003–2011, and 2003–2014. The performance metrics for the additional
test periods are presented in Tables 3 to 5.

Table 2. Summary of comparison with previous study (test period 2003–2008)

Fraud predictor          Model       AUC     Sensitivity (%)   Precision (%)   Reference
28 raw financial data    SVM         0.626   2.53              1.92            [15]
28 raw financial data    Logistic    0.690   0.73              0.85            [15]
28 raw financial data    RUSBoost    0.725   4.88              4.48            [15]
28 raw financial data    XGBoost     0.689   3.56              3.36            [22]
28 raw financial data    f-XGBoost   0.693   5.00              4.22            [22]
28 raw financial data    ANN         0.706   50                1.39            This study
14 financial ratios      Logistic    0.672   3.99              2.63            [15]
14 financial ratios      ANN         0.693   67                2.89            This study
Both                     RUSBoost    0.696   3.19              2.54            [15]
Both                     ANN         0.672   83                1.91            This study

Table 3. Summary of comparison with previous study (test period 2003–2005)

Fraud predictor          Model       AUC     Sensitivity (%)   Precision (%)   Reference
28 raw financial data    SVM         0.637   2.28              2.53            [15]
28 raw financial data    Logistic    0.685   1.45              1.69            [15]
28 raw financial data    RUSBoost    0.753   7.64              7.83            [15]
28 raw financial data    f-XGBoost   0.691   6.59              6.71            [22]
28 raw financial data    ANN         0.694   100               2.79            This study
14 financial ratios      Logistic    0.649   1.37              1.29            [15]
14 financial ratios      ANN         0.667   67                2.53            This study
Both                     ANN         0.656   100               2.52            This study

Table 4. Summary of comparison with previous study (test period 2003–2011)

Fraud predictor          Model       AUC     Sensitivity (%)   Precision (%)   Reference
28 raw financial data    SVM         0.647   3.07              1.98            [15]
28 raw financial data    Logistic    0.702   1.87              1.19            [15]
28 raw financial data    RUSBoost    0.710   4.40              3.60            [15]
28 raw financial data    f-XGBoost   0.678   3.69              3.02            [22]
28 raw financial data    ANN         0.720   56                1.34            This study
14 financial ratios      Logistic    0.672   3.49              2.23            [15]
14 financial ratios      ANN         0.685   67                2.40            This study
Both                     ANN         0.693   89                2.45            This study

Table 5. Summary of comparison with previous study (test period 2003–2014)

Fraud predictor          Model       AUC     Sensitivity (%)   Precision (%)   Reference
28 raw financial data    SVM         0.628   2.30              1.48            [15]
28 raw financial data    Logistic    0.709   1.84              1.04            [15]
28 raw financial data    RUSBoost    0.717   3.30              2.70            [15]
28 raw financial data    f-XGBoost   0.678   2.77              2.26            [22]
28 raw financial data    ANN         0.718   50                1.15            This study
14 financial ratios      Logistic    0.702   3.45              1.86            [15]
14 financial ratios      ANN         0.694   58                1.99            This study
Both                     ANN         0.686   75                2.02            This study

The results obtained are compared with previous literature and summarized in Figure 3. As shown in
Figure 3, the performance of the SVM model fluctuates, while both RUSBoost and XGBoost
show a performance decline when the test period is extended. This result accords with
Hoang et al. [22], who argue that, because undetected fraud is assumed to grow over time, a longer test
period is less reliable. However, in contrast to the SVM, RUSBoost, and XGBoost models, the proposed network
and the logistic models show a slight performance improvement as the test period is extended. This demonstrates
the robustness of both models, which are expected to perform stably when tested with new, unseen data.
Employing raw financial data as the fraud predictor, the proposed network achieved the best AUC
score in the scenario using the full test set of 2003–2014, as shown in Table 5, with an AUC of 0.718,
a precision of 1.15%, and a sensitivity of 50%. Considering the stability of AUC as a demonstration of
robustness, the proposed network scores AUCs of 0.694 and 0.720 in the test periods of 2003–2005 and
2003–2011, as shown in Tables 3 and 4, respectively. This shows that expanding the dataset improves the
performance of the proposed network. Then, in the next scenario using the period of 2003–2014, the AUC score
dropped to 0.718, slightly lower than in the previous scenario of 2003–2011. This indicates that while
expanding the dataset improves the performance, there may be diminishing returns beyond a certain period
length. Consistent with the results of Bao et al. [15], the results of this study demonstrate that, for the
same model or network, using the 28 raw financial data items derived from [14] leads to a better result than
using the other fraud predictors, namely the 14 financial ratios derived from [16].
The results of this study show that combining neural networks and SMOTE can detect fraudulent financial
statements in a severely imbalanced dataset using raw financial data, financial ratios, or both combined as the
fraud predictor. While the proposed network demonstrated promising utility, it is important to acknowledge
that the dataset used consists of historical data coming from specific demographics and time periods. This
may limit generalizability, so the model may require further calibration or updates to maintain its
effectiveness in the current dynamic environment.

Figure 3. Summary of AUC scores over the additional test periods

5. CONCLUSION
This paper introduces a neural network designed to detect fraudulent financial statements within an
imbalanced dataset, addressing the severe imbalance issue through the utilization of SMOTE. Our experimental
results indicate that the model achieves detection capabilities, with the best scores across the three networks
being an AUC of 70.6%, a sensitivity rate of 83%, and a precision rate of 2.89%. This study contributes by
advocating for the integration of ANNs in auditing practices, particularly during the initial audit phase, such
as risk assessment procedures. The proposed model's high sensitivity rate underscores its advantage over
similar models, offering practical utility for auditors and regulators by minimizing the risk of false
negatives. However, limitations exist, including the reliance solely on numerical financial data extracted from
financial statements. Future research could explore the combination of non-financial data and the application
of unsupervised learning to address mislabeling issues, potentially through the implementation of generative
artificial intelligence to generate fraudulent data for training purposes or to describe fraud characteristics.

REFERENCES
[1] “International standard on auditing 240: the auditor’s responsibilities relating to fraud in an audit of financial statements,” IFAC, 2013.
Accessed: Dec. 27, 2023. [Online]. Available: https://www.ifac.org/_flysystem/azure-private/publications/files/A012 2013 IAASB
Handbook ISA 240.pdf.
[2] Y. J. Chen, W. C. Liou, Y. M. Chen, and J. H. Wu, “Fraud detection for financial statements of business groups,” International
Journal of Accounting Information Systems, vol. 32, pp. 1–23, 2019, doi: 10.1016/j.accinf.2018.11.004.
[3] M. N. Ashtiani and B. Raahemi, “Intelligent fraud detection in financial statements using machine learning and data mining: a
systematic literature review,” IEEE Access, vol. 10, pp. 72504–72525, 2022, doi: 10.1109/ACCESS.2021.3096799.
[4] W. Xiuguo and D. Shengyong, “An analysis on financial statement fraud detection for Chinese listed companies using deep
learning,” IEEE Access, vol. 10, pp. 22516–22532, 2022, doi: 10.1109/ACCESS.2022.3153478.
[5] D. Botez, “Recent challenge for auditors: using data analytics in the audit of the financial statements,” Brain-broad Research in
Artificial Intelligence and Neuroscience, vol. 9, no. 4, pp. 61–72, 2018.
[6] G. Salijeni, A. Samsonova-Taddei, and S. Turley, “Big data and changes in audit technology: contemplating a research agenda,” Accounting
and Business Research, vol. 49, no. 1, pp. 95–119, 2019, doi: 10.1080/00014788.2018.1459458.
[7] B. P. Green and J. H. Choi, “Assessing the risk of management fraud through neural network technology,” Auditing, vol. 16, no. 1,
pp. 25–28, 1997.
[8] S. Kotsiantis, E. Koumanakos, D. Tzelepis, and V. Tampakas, “Forecasting fraudulent financial statements using data mining,”
International Journal of Computational Intelligence, vol. 3, no. 2, pp. 104–110, 2006.
[9] E. Kirkos, C. Spathis, and Y. Manolopoulos, “Data mining techniques for the detection of fraudulent financial statements,” Expert
Systems with Applications, vol. 32, no. 4, pp. 995–1003, 2007, doi: 10.1016/j.eswa.2006.02.016.
[10] J. Perols, “Financial statement fraud detection: an analysis of statistical and machine learning algorithms,” Auditing, vol. 30, no. 2,
pp. 19–50, 2011, doi: 10.2308/ajpt-50009.
[11] C. L. Jan, “Detection of financial statement fraud using deep learning for sustainable development of capital markets under
information asymmetry,” Sustainability, vol. 13, no. 17, pp. 9879–9898, 2021, doi: 10.3390/su13179879.
[12] T. Kiehl, B. Hoogs, L. Christina, and S. Deniz, “Evolving multi-variate time-series patterns for the discrimination of fraudulent
financial filings,” Genetic and Evolutionary Computation Conference, pp. 1-8, 2005.
[13] J. Bertomeu, E. Cheynel, E. Floyd, and W. Pan, “Using machine learning to detect misstatements,” Review of Accounting Studies,
vol. 26, no. 2, pp. 468–519, 2021, doi: 10.1007/s11142-020-09563-8.
[14] M. Cecchini, H. Aytug, G. J. Koehler, and P. Pathak, “Detecting management fraud in public companies,” Management Science,
vol. 56, no. 7, pp. 1146–1160, 2010, doi: 10.1287/mnsc.1100.1174.
[15] Y. Bao, B. Ke, B. Li, Y. J. Yu, and J. Zhang, “Detecting accounting fraud in publicly traded U.S. firms using a machine learning
approach,” Journal of Accounting Research, vol. 58, no. 1, pp. 199–235, 2020, doi: 10.1111/1475-679X.12292.
[16] P. M. Dechow, W. Ge, C. R. Larson, and R. G. Sloan, “Predicting material accounting misstatements,” 39th Annual Contemporary
Accounting Research Conference, vol. 28, no. 1, pp. 17-82, 2011, doi: 10.1111/j.1911-3846.2010.01041.x.
[17] Q. Deng and G. Mei, “Combining self-organizing map and k-means clustering for detecting fraudulent financial statements,” in
2009 IEEE International Conference on Granular Computing, GRC, 2009, pp. 126–131, doi: 10.1109/GRC.2009.5255148.
[18] S. Y. Huang, R. H. Tsaih, and W. Y. Lin, “Unsupervised neural networks approach for understanding fraudulent financial reporting,”
Industrial Management and Data Systems, vol. 112, no. 2, pp. 224–244, 2012, doi: 10.1108/02635571211204272.
[19] T. R. Izzalqurny, B. Subroto, and A. Ghofar, “Relationship between financial ratio and financial statement fraud risk moderated by auditor
quality,” International Journal of Research in Business and Social Science, vol. 8, no. 4, pp. 34–43, 2019, doi: 10.20525/ijrbs.v8i4.281.
[20] J. Yao, Y. Pan, S. Yang, Y. Chen, and Y. Li, “Detecting fraudulent financial statements for the sustainable development of the
socio-economy in China: a multi-analytic approach,” Sustainability, vol. 11, no. 6, 2019, doi: 10.3390/su11061579.
[21] K. Randhawa, C. K. Loo, M. Seera, C. P. Lim, and A. K. Nandi, “Credit card fraud detection using adaBoost and majority voting,”
IEEE Access, vol. 6, pp. 14277–14284, 2018, doi: 10.1109/ACCESS.2018.2806420.
[22] M. N. Hoang, H. T. L. Nguyen, and H. N. Viet, “A model for detecting accounting frauds by using machine learning,” in The Annual
Hawaii International Conference on System Sciences, 2022, vol. 2022, pp. 1552–1561, doi: 10.24251/hicss.2022.193.
[23] G. S. Temponeras, S. A. N. Alexandropoulos, S. B. Kotsiantis, and M. N. Vrahatis, “Financial fraudulent statements detection
through a deep dense artificial neural network,” in 10th International Conference on Information, Intelligence, Systems and
Applications, IISA 2019, 2019, pp. 1–5, doi: 10.1109/IISA.2019.8900741.
[24] P. Craja, A. Kim, and S. Lessmann, “Deep learning for detecting financial statement fraud,” Decision Support Systems, vol. 139,
2020, doi: 10.1016/j.dss.2020.113421.
[25] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: Synthetic minority over-sampling technique,” Journal
of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002, doi: 10.1613/jair.953.
[26] D. Elreedy and A. F. Atiya, “A comprehensive analysis of synthetic minority oversampling technique (SMOTE) for handling class
imbalance,” Information Sciences, vol. 505, pp. 32–64, 2019, doi: 10.1016/j.ins.2019.07.070.
[27] I. Goodfellow, Y. Bengio, and A. Courville, Deep learning, Cambridge, Massachusetts: MIT Press, 2016.
[28] D. P. Kingma and J. Ba, “Adam: a method for stochastic optimization,” arXiv-Computer Science, pp. 1-15, 2017, doi:
10.48550/arXiv.1412.6980.
[29] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Computer Society
Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778, doi: 10.1109/CVPR.2016.90.
[30] T. Fawcett, “An introduction to ROC analysis,” Pattern Recognition Letters, vol. 27, no. 8, pp. 861–874, 2006, doi:
10.1016/j.patrec.2005.10.010.
[31] N. Japkowicz, “Assessment metrics for imbalanced learning,” Imbalanced Learning: Foundations, Algorithms, and Applications,
pp. 187–206, 2013, doi: 10.1002/9781118646106.ch8.

BIOGRAPHIES OF AUTHORS

Yosua Efraim Young is a graduate candidate of Informatics Graduate Program
from Universitas Pelita Harapan. He earned his Bachelor’s Degree in Accounting from
Universitas Pelita Harapan in Indonesia. He can be contacted at email:
[email protected].

Hendra Tjahyadi is an Associate Professor of Informatics Study Program in
Universitas Pelita Harapan. He earned his Bachelor’s Degree in Electrical Engineering from
Universitas Kristen Maranatha, Master’s Degree in Instrumentation and Control from Institut
Teknologi Bandung, and Ph.D. in Control Engineering from School of Engineering, Flinders
University. His research interests are in adaptive control, signal processing, and artificial
intelligence. He can be contacted at email: [email protected].
