
Expert Systems With Applications


Expert Systems with Applications

Volume 246, 15 July 2024, 123126

Explainable fraud detection of financial statement data driven by
two-layer knowledge graph
Siqi Cai, Zhenping Xie
https://doi.org/10.1016/j.eswa.2023.123126

Abstract
In modern economic activities, financial statement fraud seriously
misleads the economic decisions of investors and regulators, and can
lead to huge investment losses and even corporate bankruptcies.
Although current machine learning methods perform strongly on
financial statement fraud detection, their explainability and their
ability to extract fraudulent patterns remain limited. In this study,
an explainable Financial Statement data Fraud Detection method is
proposed that introduces a Two-Layer Knowledge Graph (FSFD-TLKG) and
a fraudulent pattern mining strategy on that graph. The two-layer
knowledge graph comprises a semantic layer and a syntactic layer:
the subordination relationships among financial variables are
represented in the semantic layer, and their articulation
relationships are represented in the syntactic layer. Moreover, an
explainable approach is designed to extract financial statement
fraudulent patterns for credible fraud assertion. Experimental results
show that FSFD-TLKG can effectively extract explainable financial
statement fraudulent patterns and obtains better detection accuracy
than almost all of the traditional machine learning and deep learning
methods compared, the exception being an unexplainable method,
Extreme Gradient Boosting (XGBoost). Even for XGBoost, the explainable
fraudulent patterns extracted by our method can further improve its
performance. Overall, FSFD-TLKG achieves the best practical
performance among the compared methods.
Introduction
Financial statement fraud refers to manipulation of corporate
financial information through illegal means (Rezaee, 2005). It
includes overstatement of assets, revenue, and profits, or
understatement of liabilities, expenses, and losses (Zhou & Kapoor,
2011). In recent years, numerous corporate bankruptcies and
economic losses have been caused by financial statement fraud (Omar
et al., 2015). Observation, inquiry and analysis of financial statements
are traditional fraud detection methods (Crawford & Weirich, 2011).
Due to the concealment of financial statement fraud and inefficiency
of reviewing financial statements manually, research on intelligent
financial statement fraud detection models is essential (Agrawal &
Cooper, 2015; Lennox et al., 2013).
Financial statements are fundamental documents that reflect a
company's financial status. They include balance sheets, income
statements and cash flow statements (Ravisankar et al., 2011). The
balance sheet presents a company's solvency and financial soundness.
It includes financial variables such as total assets, total
liabilities, and owner's equity. The income statement shows the
operating results of a company, with key financial variables such as
net profit, operating income, and operating costs. The cash flow
statement records cash inflows and outflows during a specific period,
including financial variables like operating cash flow and the net
increase in cash and cash equivalents.
Financial variables, as the basic components of financial statements,
are critical to fraud detection. Operating revenue, accounts receivable,
and operating profit are frequently manipulated in falsified financial
statements (Chen, 2015). The abnormal fluctuations of these variables
or their abnormal correlations with other financial variables may be
indicators of potential financial fraud. Financial ratios measure the
proportional relationships between financial variables (Kanapickienė
& Grundienė, 2015), prompting many researchers to detect fraudulent
financial statements by analyzing financial ratios (Di & Shi, 2019). By
checking key ratios such as liquidity, profitability and solvency
(Zainudin & Hashim, 2016), auditors can identify potential fraudulent
activities (Shi, 2010).
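
As a rough illustration of the ratio checks described above, the sketch below computes one commonly used ratio for each of the three categories. The field names, the choice of ratios (current ratio, return on assets, debt-to-assets) and the sample figures are assumptions made for this example; the source does not say which ratios FSFD-TLKG uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """Hypothetical subset of financial variables, all in the same currency unit."""
    current_assets: float
    current_liabilities: float
    total_assets: float
    total_liabilities: float
    net_profit: float


def safe_div(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator != 0 else float("nan")


def key_ratios(s: Statement) -> dict[str, float]:
    """Compute one illustrative ratio per category named in the text."""
    return {
        # Liquidity: ability to cover short-term obligations.
        "current_ratio": safe_div(s.current_assets, s.current_liabilities),
        # Profitability: profit generated per unit of assets.
        "return_on_assets": safe_div(s.net_profit, s.total_assets),
        # Solvency: share of assets financed by debt.
        "debt_to_assets": safe_div(s.total_liabilities, s.total_assets),
    }


# Invented figures, for illustration only.
example = Statement(
    current_assets=500.0,
    current_liabilities=250.0,
    total_assets=2000.0,
    total_liabilities=800.0,
    net_profit=100.0,
)
ratios = key_ratios(example)
print(ratios)
```

An auditor would typically compare such ratios across periods or against industry peers, since an isolated value says little on its own.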
In addition to the three financial statements, which contain specific
data, several files with textual information can also be utilized for
fraud detection (Minhas & Hussain, 2016). Examples include audit
information of financial statements and related party
transactions. Audit information represents the impartial evaluation of a
company's financial status by auditors. Many studies also detect
fraudulent financial statements by examining the effectiveness of audit
opinions (Hapsoro & Santoso, 2018). Related parties include senior
managers and their family members of listed companies, together with
the companies they hold (Lari Dashtbayaz et al., 2022). Related party
transactions are not subject to external market constraints, making it
difficult to assess the fairness and reasonableness of the transactions.
Many financial fraud cases are associated with related party
transactions (Hope & Lu, 2020).
Articulation is a term used in the field of financial statements; it
refers to the inherent logical relationships between different financial
variables (Wang, 2016). Because certain articulation relationships are
ambiguous, some companies may exploit them to falsify financial
statements (Wang, 2018). Therefore, many studies detect fraudulent
financial statements from the perspective of articulation (Lin, 2020).
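
To make the idea of articulation concrete, the following sketch checks two standard accounting identities that link figures within and across statements. These two identities are chosen here as familiar examples and are not necessarily the articulation relationships modeled in the paper; the dictionary keys, the tolerance and the sample figures are likewise assumptions.

```python
import math


def check_articulation(stmt: dict[str, float], rel_tol: float = 1e-6) -> dict[str, bool]:
    """Return, for each identity, whether the reported figures are consistent."""
    return {
        # Balance sheet identity: assets = liabilities + owner's equity.
        "accounting_equation": math.isclose(
            stmt["total_assets"],
            stmt["total_liabilities"] + stmt["owners_equity"],
            rel_tol=rel_tol,
        ),
        # Cash flow statement vs. balance sheet: the net increase in cash
        # should equal closing cash minus opening cash.
        "cash_reconciliation": math.isclose(
            stmt["net_increase_in_cash"],
            stmt["closing_cash"] - stmt["opening_cash"],
            rel_tol=rel_tol,
            abs_tol=1e-9,
        ),
    }


# Invented figures: the balance sheet balances, but the cash figures do not reconcile.
reported = {
    "total_assets": 2000.0,
    "total_liabilities": 800.0,
    "owners_equity": 1200.0,
    "net_increase_in_cash": 50.0,
    "opening_cash": 240.0,
    "closing_cash": 300.0,
}
checks = check_articulation(reported)
print(checks)
```

A failed check of this kind is only a signal for closer review; it may reflect a data error rather than manipulation.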
With regard to financial statement fraud detection methods, statistical
approaches such as regression analysis and discriminant analysis were
predominantly employed in the early stages. For instance, Persons
(1995) employed regression analysis to identify key financial variables
associated with fraudulent activities. With the advancement of
data mining and machine learning (Kotsiantis et al., 2006), Chen et al.
(2014) utilized association rules and decision trees to identify
fraudulent financial reports. In addition, An & Suh (2020) leveraged
random forests to construct a predictive model for fraud detection. In
recent years, deep learning has also been applied to fraud detection.
Jan (2021) developed a fraud detection model based on a
convolutional neural network (CNN), which exhibited superior
performance over traditional statistical methods. In summary, many
financial statement fraud detection approaches have demonstrated
excellent predictive performance, but the majority of them are
unexplainable and unable to mine fraudulent patterns. As a result,
these approaches are still unable to meet regulators' requirement for
clear explanations in risk assessment and fraud detection.
In this paper, we propose an explainable Financial Statement data
Fraud Detection method driven by a Two-Layer Knowledge Graph
(FSFD-TLKG), whose main innovations and contributions are as
follows:
1) A method for generating derived financial ratios is proposed
according to the articulation relationship of financial statements.
2) A two-layer knowledge graph with a semantic layer and a syntactic
layer is constructed to reveal the potential relationships between
financial variables. The semantic layer is modeled based on the
subordination relationships of financial variables, while the syntactic
layer is built based on the articulation relationships of financial
statements.
3) The process of mining fraudulent patterns from the two-layer
knowledge graph for credible fraud assertion is introduced. The
final rule set is created by combining the fraudulent patterns
extracted from both the semantic and syntactic layers.
4) Experimental results demonstrate that FSFD-TLKG achieves better
practical performance than the traditional machine learning and deep
learning methods compared, with the exception of XGBoost. Even for
XGBoost, performance can still be improved by the explainable
financial statement fraudulent patterns extracted by FSFD-TLKG.
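
The third contribution, merging patterns from both layers into one rule set, can be pictured with the minimal sketch below. It is not the authors' implementation: the rule names, thresholds and variable names are invented, and the decision policy (flag a statement when at least one rule fires, and report the fired rules as the explanation) is an assumption, since this excerpt does not describe how the rules are applied.

```python
from typing import Callable, NamedTuple

Record = dict[str, float]


class Rule(NamedTuple):
    name: str
    layer: str  # "semantic" or "syntactic"
    fires: Callable[[Record], bool]


# Hypothetical patterns; the thresholds are invented for illustration only.
semantic_rules = [
    Rule(
        "receivables_share_high",
        "semantic",
        lambda r: r["accounts_receivable"] / r["total_assets"] > 0.5,
    ),
]
syntactic_rules = [
    Rule(
        "profit_without_cash",
        "syntactic",
        lambda r: r["net_profit"] > 0 and r["operating_cash_flow"] < 0,
    ),
]

# Final rule set: patterns from both layers combined.
final_rule_set = semantic_rules + syntactic_rules


def assert_fraud(record: Record, rules: list[Rule]) -> tuple[bool, list[str]]:
    """Flag the record if any rule fires, and return the names of the fired rules."""
    fired = [rule.name for rule in rules if rule.fires(record)]
    return bool(fired), fired


# Invented record: positive profit but negative operating cash flow.
record = {
    "accounts_receivable": 300.0,
    "total_assets": 2000.0,
    "net_profit": 100.0,
    "operating_cash_flow": -20.0,
}
flagged, reasons = assert_fraud(record, final_rule_set)
print(flagged, reasons)
```

Returning the fired rule names alongside the decision is what makes such a classifier explainable: each positive assertion can be traced back to named patterns.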
The rest of the paper is organized as follows. Section 2 reviews the
related work. Section 3 gives the formulation of financial statement
fraud detection. Section 4 outlines the research framework and
presents the implementation details of FSFD-TLKG. Section 5
describes the experimental process and analyzes the experimental
results. Section 6 concludes the paper and presents future work.

Section snippets

Knowledge graph-based financial statement fraud detection
Knowledge graphs, as a form of knowledge representation, can extract
hidden knowledge from interconnected data more effectively than
traditional machine learning methods. Wu et al. (2022) constructed an
audit information knowledge graph based on the relationships among
audit firms and auditors, and employed sub-feature extraction to
identify potential fraudulent firms. Given the
concealment of related party transactions, Mao et al. (2022) built a
knowledge graph of related party

Task definition
A two-layer knowledge graph $G$ is defined as
$G = (V_1, E_1, V_2, E_2, L)$, in which $V_i$ represents the set of
nodes at layer $i$, $E_i \subseteq V_i \times V_i$ denotes the set of
edges at layer $i$, and $L$ signifies the connections between the two
layers. The two-layer knowledge graph constructed in this paper
comprises the semantic layer and the syntactic layer. The semantic
layer describes the subordination relationships between nodes, while
the syntactic layer describes the articulation relationships between
nodes.
The
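
The five-tuple definition above maps naturally onto a small data structure. The sketch below is one possible encoding, not the paper's: the class and field names are hypothetical, the inter-layer connections are assumed to be pairs from the layer-1 node set to the layer-2 node set (the excerpt only says they connect the two layers), and the sample nodes and edges are invented.

```python
from dataclasses import dataclass, field

Edge = tuple[str, str]


@dataclass
class TwoLayerKG:
    """G = (V1, E1, V2, E2, L), with layer 1 semantic and layer 2 syntactic."""
    v1: set[str] = field(default_factory=set)      # semantic-layer nodes
    e1: set[Edge] = field(default_factory=set)     # subordination edges
    v2: set[str] = field(default_factory=set)      # syntactic-layer nodes
    e2: set[Edge] = field(default_factory=set)     # articulation edges
    links: set[Edge] = field(default_factory=set)  # L, assumed to pair V1 with V2

    def is_valid(self) -> bool:
        """Check that each E_i is a subset of V_i x V_i and that L joins V1 to V2."""
        e1_ok = all(a in self.v1 and b in self.v1 for a, b in self.e1)
        e2_ok = all(a in self.v2 and b in self.v2 for a, b in self.e2)
        links_ok = all(a in self.v1 and b in self.v2 for a, b in self.links)
        return e1_ok and e2_ok and links_ok


variables = {"total_assets", "total_liabilities", "owners_equity"}
kg = TwoLayerKG(
    # Semantic layer: each variable is subordinate to the balance sheet.
    v1=variables | {"balance_sheet"},
    e1={(v, "balance_sheet") for v in variables},
    # Syntactic layer: liabilities and equity articulate with total assets.
    v2=set(variables),
    e2={("total_liabilities", "total_assets"), ("owners_equity", "total_assets")},
    # Inter-layer links: the same variable seen in both layers.
    links={(v, v) for v in variables},
)
print(kg.is_valid())
```

Keeping the two edge sets separate lets patterns be mined from each layer independently before their results are combined.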

Overall framework
The architecture of FSFD-TLKG, illustrated in Fig. 1, comprises four
main parts: the processing of financial statement data, the
construction of a two-layer knowledge graph with a semantic layer and
a syntactic layer, the mining of financial statement fraudulent
patterns, and classification. Data processing aims to convert original financial
statements into the required format of experimental data. The
constructed two-layer knowledge graph comprises a semantic layer
and a syntactic layer. The

Experimental setup
Datasets. The experimental data comprise financial statements of
74 listed companies from 2009 to 2022, sourced from the China Stock
Market & Accounting Research Database (https://www.gtarsc.com/).
To label fraudulent financial statement samples, we rely on
disclosures provided in the administrative penalty decisions issued by
the China Securities Regulatory Commission
(https://www.csrc.gov.cn/). The accounting periods of fraudulent
financial statement data are detailed in Table 1 of the

Conclusion and future work


FSFD-TLKG is an explainable financial statement data fraud detection
method based on a two-layer knowledge graph and a fraudulent
pattern mining strategy. In this paper, we introduce a novel data
modeling approach for financial statements. It adds a syntactic
layer to the traditional single-layer knowledge graph, which generally
describes semantic logic. The constructed two-layer knowledge graph
can capture more comprehensive information about financial
variables, thereby enabling the

Declaration of competing interest


The authors declare that they have no known competing financial
interests or personal relationships that could have appeared to
influence the work reported in this paper.

Acknowledgement
This work is supported by the National Natural Science Foundation of
China under Grant No. 62272201.
References (45)
• A. Agrawal et al.
Insider trading before accounting scandals
Journal of Corporate Finance
(2015)
• P. Hajek et al.
Mining corporate annual reports for intelligent detection of financial
statement fraud–A comparative study of machine learning methods
Knowledge-Based Systems
(2017)
• R. Kanapickienė et al.
The model of fraud detection in financial statements by means of
financial ratios
Procedia-Social and Behavioral Sciences
(2015)
• X. Ma
Knowledge graph construction and application in geosciences: A review
Computers & Geosciences
(2022)
• X. Mao et al.
Financial fraud detection using the related-party transaction knowledge
graph
Procedia Computer Science
(2022)
• N. Omar et al.
Corporate culture and the occurrence of financial statement fraud: A
review of literature
Procedia Economics and Finance
(2015)
• H.M. Proença et al.
Interpretable multiclass classification by MDL-based rule lists
Information Sciences
(2020)
• Z. Rezaee
Causes, consequences, and deterence of financial statement fraud
Critical Perspectives on Accounting
(2005)
• S. Wen et al.
Analysis of financial fraud based on manager knowledge graph
Procedia Computer Science
(2022)
• H. Wu et al.
Financial fraud risk analysis based on audit information knowledge
graph
Procedia Computer Science
(2022)
• W. Zhou et al.
Detecting evolutionary financial statement fraud
Decision Support Systems
(2011)
• B. An et al.
Identifying financial statement fraud with decision rules obtained from
Modified Random Forest
Data Technologies and Applications
(2020)
• N.V. Chawla et al.
SMOTE: synthetic minority over-sampling technique
Journal of Artificial Intelligence Research
(2002)
• F.H. Chen et al.
Application of random forest, rough set theory, decision tree and neural
network to detect financial statement fraud–taking corporate
governance into consideration
• Y.Y. Chen
Analysis of financial fraud and audit countermeasures of listed
companies
Communication of Finance and Accounting
(2015)
• R.L. Crawford et al.
Fraud guidance for corporate counsel reviewing financial statements
and reports
Journal of Financial Crime
(2011)
• G.Y. Di et al.
Analysing the financial statements of listed forestry companies using
ratio analysis - Based on the financial statements of Shanghai Feringer
Wood Industry Co
China Journal of Commerce
(2019)
• Friedman, J. H., & Popescu, B. E. (2008). Predictive learning via
rule ensembles. arXiv preprint...
• L.A. Galárraga et al.
AMIE: Association rule mining under incomplete evidence in
ontological knowledge bases
• Gu, Y., Guan, Y., & Missier, P. (2020a). Towards learning
instantiated logical rules from knowledge graphs. arXiv...
• Gu, Y., Guan, Y., & Missier, P. (2020b). Building rule hierarchies
for efficient logical rule learning from knowledge...
• D. Hapsoro et al.
Does audit quality mediate the effect of auditor tenure, abnormal audit
fee and auditor's reputation on giving going concern opinion?
International Journal of Economics and Financial Issues
(2018)
