ICDM2023 Camera Ready

Uploaded by Ding Rui

Themis: Detecting Anomalies from Disguised Normal Financial Activities

Rui Ding, Xiaochun Yang (corresponding author), Bin Wang
School of Computer Science and Engineering, Northeastern University, Shenyang, China

Abstract—Financial supervision plays a pivotal role in society. It provides early warnings of abnormal financial activities and helps the government detect financial crimes. Detecting anomalous activities among normal financial activities is extremely challenging because of their disguise and complexity. However, existing anomaly-detection methods for real-world financial scenarios typically suffer from several limitations:
(a) their formulations are too simplistic to identify complex anomalies effectively;
(b) machine learning-based anomaly-detection methods lack sufficient training labels, interpretability, and confidence, which makes it difficult to obtain approval from governments or financial institutions;
(c) many of them focus only on the financial transaction itself and ignore both the spatio-temporal characteristics of transactions and social relationships.
To address these challenges, this paper proposes a novel anomaly-detection framework. It detects anomalies disguised as normal financial activities and infers clue chains for them. In particular, we are the first to formalize ten anomalies by reference to actual bank statements. We then propose three types of anomaly-detection algorithms to discover these anomalies in financial activities. Next, we use an intelligent search algorithm to trace the most suspicious activities (clue chains) for institutions, which improves interpretability compared with learning-based methods. More importantly, we developed an anomaly-detection system, Themis, to detect these complex financial anomalies; it has been deployed in several real scenarios. The performance of Themis is demonstrated through comprehensive experiments and case studies on synthetic datasets and real bank statements.

Index Terms—financial activities, anomalies, clue chains

Fig. 1: A bridge-loan case. (a) Charlie has submitted a loan application for $120 (e5) to the bank, but faces a long waiting period. (b) Bob, a bank staff member, promises to help Charlie get the loan as soon as possible on the condition that Charlie pays $20 in interest. Charlie therefore still pays back $120 as usual (e5) but receives only $100. (c) Bob raises $100 from Alice (e1) and passes it to Charlie (e2). He promises to repay Alice $110 in principal and interest (e4) once the bank disburses the loan (e3). In this case, Bob illegally obtained $10.

I. INTRODUCTION

Financial supervision plays a vital role in maintaining the stability of financial systems and supporting sustainable economic growth [1]–[3], since it provides early warning of abnormal financial activities. Given a set of financial activities, one of the major tasks in this field is to detect the anomalous ones. Anomalous financial activities can be defined as a series of illegal transactions forbidden by financial institutions, such as money laundering and bridge loans [4], [5].

Explicit anomalies are known (prior) anomalies that are easy to detect, such as exceeded amounts and limited log-in. They have been well handled by rule-based methods [6]. Implicit anomalies are complex posterior-anomalous patterns hidden in normal financial activities. Here, the individual accounts or transactions may look normal and only turn out to be anomalous when considered together as correlated subgraphs. Implicit anomalies are common core components of illegal financial cases.

For example, Fig. 1 shows an illicit bridge-loan case. Normally, Charlie would wait for the bank to release the loan after submitting his application. Instead, he chooses to obtain the money from Bob, a bank staff member, at high interest during this period. Bob takes advantage of his occupation to earn illicit income through an implicit anomaly (e3 → Bob → e4, with the brokerage as the anomalous part). A group of financial activities with this pattern may indicate that an anomaly is taking place. However, no prior work detects such anomalies, because their patterns are complex and hard to detect. In this paper, we focus on detecting these anomalies disguised as normal financial activities.

Existing financial anomaly-detection methods can be categorized into traditional rule-based methods [6], [7] and deep learning-based methods [2], [8]–[11]. Rule-based approaches mainly rely on manually predefined rules and detect node- or edge-level anomalies based on these rules. Although these methods are simple and intuitive, they suffer from limited flexibility and fail to detect complex anomalies. More recently, deep learning-based methods use supervised learning techniques to predict anomalous nodes (cards) or edges (transactions) on the financial graph [2], [8], [12]. (Outliers and dense subgraphs with unusual topological structure are not regarded as reasonable financial anomalies because they lack transaction information.) However, these methods also suffer from several limitations in real-world scenarios.

(1) Training labels are scarce, which poses a great challenge to supervised detection. These methods formalize anomaly detection as a supervised prediction task, so they depend heavily on labeled financial anomalies, which are precious and rare in real-world scenarios.
(2) Their ability to detect complex anomalies is inadequate. Existing financial anomaly formulations consider only the node or edge level and do not support complex anomalies, such as the bridge loan shown in Fig. 1.
(3) The detected anomalies are poorly interpretable. Although deep learning-based methods achieve good detection accuracy, it is hard to explain why a detected activity is abnormal. Explainability is of great concern to financial institutions: only explainable anomalies attract their attention and can thus serve as early warnings.
Despite this success in detecting financial anomalies, most previous works focus only on detecting node- or edge-level anomalies in transactions. No prior academic work detects complex implicit anomalies disguised as normal financial activities. This is mainly because detecting them poses several intractable practical challenges.

(1) Formalizing and detecting complex and diverse anomalies disguised as normal financial activities is difficult. There are many subgraph anomaly-detection methods, but they mainly focus on the topological structure of the graph (outliers and dense subgraphs). This differs significantly from our task of "detecting anomalies from disguised normal financial activities".
(2) Improving the interpretability and confidence of financial anomalies from an end-to-end supervised perspective is difficult.
(3) Formalizing and modeling heterogeneous information to improve detection accuracy is difficult. This information includes social relationships and the spatio-temporal features of financial activities. It is usually ignored when detecting anomalies because of its complex structure and attributes, yet it is decisive for anomaly detection, as shown in Fig. 2.
(4) Tracing clue chains from anomalies to assist financial institutions is difficult. Discovering clue chains can free analysts from time-consuming and tedious verification tasks, but no prior academic work explores clue-chain tracing in financial activities.

Fig. 2: Complex heterogeneous features in financial activities.

To detect complex anomalies disguised as normal financial activities effectively, we have worked closely with a financial institution. Together we studied implicit anomalies and verified that they are the core components of illegal cases in financial activities. In this paper, we develop a novel uniform framework for detecting anomalies that accounts for heterogeneous and complex spatio-temporal and social features.

Specifically, we are the first to formalize ten complex implicit anomalies by reference to actual bank statements in which every individual account and transaction is legitimate. We then design a family of detection algorithms that dynamically detect these anomalies under a practical uniform framework. In particular, we propose a clue-chain tracing technique that traces potential clues from the detected anomalies. We then evaluate the traced chains and recommend the most suspicious ones to financial institutions. We do not use learning-based methods here because of their poor interpretability and low confidence. More importantly, we developed a practical anomaly-detection system, Themis, which has been deployed and applied in many real institutions. In our experiments, we evaluate the efficiency and effectiveness of Themis on actual bank statements and synthetic datasets.

The main contributions of this paper are as follows.
(1) We design a novel practical framework to detect anomalous financial activities disguised as normal financial activities.
(2) We are the first to formalize three patterns of normal financial activities and ten complex anomalies based on real financial scenarios. These formalizations incorporate the necessary heterogeneous features, including social relations and the spatio-temporal features of financial activities.
(3) We design a family of detection algorithms to dynamically monitor complex anomalies in financial activities.
(4) We propose a clue-chain tracing technique that infers potential clue chains starting from the detected anomalies and recommends the most suspicious ones to financial institutions. These clue chains are highly interpretable, since their funding flows are clear and transparent.
(5) More importantly, we developed an anomaly-detection system, Themis, to detect anomalies disguised as normal financial activities; it has been deployed in several real scenarios. We evaluate the efficiency and effectiveness of Themis on actual bank statements and synthetic datasets.
II. RELATED WORK

Financial supervision plays a vital role in maintaining the stability of financial ecosystems and supporting sustainable economic growth, since it provides early warning of abnormal activities [1], [2], [13]. There is a long line of research in this field [14]–[19].

Traditional anomaly-detection methods in financial supervision detect intuitive node (card)- or edge (transaction)-level anomalies. They use either predefined rules [6], [7] or learning-based strategies [20], [21]. For instance, Kim et al. [6] formalize several anomaly rules for financial activities, such as limited log-in time, limited log-in number, and changed log-in location. Such rules filter many simple node- or edge-level anomalies well.

Other researchers [2], [8] formalize anomaly detection as a classification task and predict node (account)- or edge (transaction)-level anomalies in a learning-based manner [7], [22], [23]. These methods mainly learn representations of nodes and edges from attribute features and the topological structure of the graph [10], [24]–[27]. For instance, SeqFD [28] predicts fraud by aggregating statistical features of historical transactions within a time-based sliding window. Reddy et al. [29] model fraudulent transactions using temporal features together with structural features captured by a GNN. Meng Shen et al. [30] propose the TSRGL framework, which uses R-GCN to learn the topological structure of historical object-relation snapshot graphs in order to predict threats from abnormal transaction behaviors. Cao et al. [12] construct financial transaction networks from historical transactions. They learn users' topological structures in an unsupervised manner and predict anomalies with tree-based classifiers.

All these methods suffer from limited flexibility: they detect only simple (node- or edge-level) anomalies and fail to capture complex abnormal patterns in financial activities. Moreover, the scarcity of training labels poses a huge challenge to learning-based methods, because their detection accuracy depends heavily on the volume of labeled data.

Recently, some learning-based methods regard outliers [15], [19] or dense subgraphs [19], [31], [32] as significant abnormal patterns and mine them from the perspective of structural characteristics. For instance, Zhang et al. [33] designed an anomalous subgraph autoencoder (AS-GAE) to detect outliers from the perspective of topological structure. Anting Zhang et al. [34] designed a subgraph embedding method that identifies fraud communities by treating dense subgraphs as anomalies. All these works define anomalies by structural characteristics and use only topological information to detect significant outliers and dense subgraphs. However, outliers and dense subgraphs are not necessarily abnormal patterns in financial activity. Detecting financial anomalies from topological structure alone is unreasonable in real scenarios and leads to many false positives. The vital factors behind financial anomalies should instead include the transfer amount, the flow of funds (where the money comes from and what it is used for), and the social relations among operators.

In conclusion, existing financial anomaly-detection methods can accurately detect only node- or edge-level anomalies and fail to detect complex anomalies disguised as normal financial activities. Some works do focus on subgraph detection, but the anomalies they define differ significantly from financial anomalies. It is therefore unreasonable to detect financial anomalies based solely on outliers and dense subgraphs. In addition, very few works use heterogeneous information when exploring abnormal financial activities, owing to its complexity, even though such information is necessary for detecting anomalies.

III. PROBLEM FORMULATION

In this section, we formulate the problem we focus on. First, we define the four inputs of the problem: bank account, person, financial activity, and social relation. These are the original data that officers can use to analyze financial crime. We then state the Malicious Financial Activity Detection Problem and explain how solving it reduces costs for institutions.

Definition 1 (Bank Account). A bank account, denoted by $v$, has the following attributes. $\mathrm{owner}(v)$ denotes the person who applied for the card. $\mathrm{regcity}(v)$ is the registration city of the card. $\mathrm{type}(v)$ is the type of account: "bank account", "enterprise account", or "personal account".

Definition 2 (Person). A person, denoted by $p$, has the following attributes. $\mathrm{Accounts}(p) = \{v^p_1, v^p_2, \ldots, v^p_n\}$ represents the bank accounts owned by $p$. $\mathrm{loc}_t(p)$ represents $p$'s location at time slot $t$. $\mathrm{type}(p)$ is the type of person, e.g., "citizens" or "bank staff".

Definition 3 (Financial Activity). A financial activity from bank account $v_i$ to account $v_j$ at time slot $t$, denoted by $e^t_{i,j}$, has the following attributes. $\mathrm{value}(e^t_{i,j})$ is the amount delivered in $e^t_{i,j}$. $\mathrm{actloc}(e^t_{i,j})$ is its activity location. $\mathrm{IP}(e^t_{i,j})$ is its operating IP address. $\mathrm{type}(e^t_{i,j})$ denotes the transaction type: "transfer", "deposit", or "withdraw".

Definition 4 (Relative Relation). A relative relation from person $p_i$ to person $p_j$, denoted by $r_{i,j}$, is a binary variable, i.e., $r_{i,j} \in \{0, 1\}$. $r_{i,j} = 1$ means that $p_i$ is an immediate relative of $p_j$. In particular, $r_{i,i} = 1$.

Malicious Financial Activity Detection Problem. We are given a set of bank accounts $V = \{v_i\}$, the related person set $P = \{p_i\}$, the financial activity set $E = \{e^t_{i,j}\}$, and the social relation set $R = \{r_{i,j} \mid r_{i,j} = (p_i, p_j),\ p_i, p_j \in P\}$. The task is to find abnormal financial activities disguised as normal ones and to infer a clue chain (i.e., a sequence of financial activities among different accounts) for each suspect activity $e^t_{i,j}$. The clue chain should tell where the money $\mathrm{value}(e^t_{i,j})$ comes from and what it is spent on.
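The four inputs of the problem can be held in simple records. The following is a minimal Python sketch that assumes string identifiers for accounts and persons. The class, field, and function names (`BankAccount`, `Person`, `Activity`, `relative_set`, `r`) are hypothetical and are not taken from Themis. Following Definition 4, the relative relation is stored as a set of ordered person pairs, and every reflexive pair is added so that $r_{i,i} = 1$.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class BankAccount:                 # Definition 1 (hypothetical record)
    vid: str
    owner: str                     # owner(v): person who applied for the card
    regcity: str                   # regcity(v): registration city
    type: str                      # "bank account", "enterprise account", or "personal account"


@dataclass
class Person:                      # Definition 2
    pid: str
    type: str                      # "citizens" or "bank staff"
    accounts: List[str] = field(default_factory=list)  # Accounts(p)
    loc: Dict[int, str] = field(default_factory=dict)  # loc_t(p), keyed by time slot t


@dataclass(frozen=True)
class Activity:                    # Definition 3: e^t_{i,j}
    src: str                       # v_i
    dst: str                       # v_j (equals src for a deposit or withdrawal)
    t: int                         # time slot
    value: float                   # value(e)
    actloc: str                    # actloc(e)
    ip: str                        # IP(e)
    type: str                      # "transfer", "deposit", or "withdraw"


def relative_set(pairs: Iterable[Tuple[str, str]], people: Iterable[str]) -> Set[Tuple[str, str]]:
    """Definition 4: the ordered pairs (p_i, p_j) with r_{i,j} = 1, including every (p, p)."""
    rel = set(pairs)
    rel.update((p, p) for p in people)
    return rel


def r(rel: Set[Tuple[str, str]], pi: str, pj: str) -> int:
    """The binary relative relation r_{i,j}."""
    return 1 if (pi, pj) in rel else 0


# Problem inputs V, P, E, R for a toy instance (values are made up).
V = [BankAccount("v1", "p1", "Shenyang", "personal account")]
P = [Person("p1", "citizens", accounts=["v1"])]
E = [Activity("v1", "v1", 0, 50.0, "Shenyang", "10.0.0.1", "deposit")]
R = relative_set([], [p.pid for p in P])
```

Because `r` only tests membership, the relation stays directed: adding ("a", "b") does not imply ("b", "a"), which matches the ordered pairs $(p_i, p_j)$ in the problem statement.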
The practical significance of this problem is that it helps financial institutions reveal clues to suspect activities. Whether a clue chain is involved in a financial crime is finally determined by law officers, and this can be verified in practice. Many anomalies are disguised as normal financial activities, which harms financial ecosystems. It is therefore important to define and detect abnormal financial activities.

IV. MODELING

We formulate a financial activity network to represent the bank transactions in historical databases, including transfers, deposits, and withdrawals. Instead of using one vertex to denote one person, we use the bank account as the smallest atomic unit in the system, since the bank account is the core component of financial activities.

Definition 5 (Financial Activity Network, FAN). A financial activity network $G = (V, E)$ is a directed graph that allows parallel edges and records financial activities. $V = \{v_1, \ldots, v_n\}$ is a set of bank accounts (e.g., debit cards and credit cards). $E = \{e^t_{i,j} \mid e^t_{i,j} = (v_i, v_j)\}$ is a set of directed edges, each representing a financial activity from $v_i$ to $v_j$ at time slot $t$. $E^{in}_{t_1,t_2}(v_i)$ and $E^{out}_{t_1,t_2}(v_i)$ are the sets of incoming and outgoing edges of node $v_i$, respectively, whose timestamps $t$ satisfy $t \in [t_1, t_2)$.

Fig. 3: The transaction type of financial activities: (a) Transfer ($NA_1$); (b) Deposit ($NA_2$); (c) Withdraw ($NA_3$).

To illustrate the FAN, we use a card to represent a bank account and a solid line to represent a financial activity. Fig. 3 shows the three types of financial activities in an FAN.
• Transfer Activity: a directed edge $e^t_{i,j}$ from $v_i$ to $v_j$, as shown in Fig. 3(a). Money is transferred from account $v_i$ to another account $v_j$ at time slot $t$. A transfer has a transferring-in timestamp $t_{in}$ and a transferring-out timestamp $t_{out}$; unless specified otherwise, the transferring-out timestamp is used.
• Deposit Activity: an anticlockwise loop $e^t_{i,i}$ from $v_i$ to itself, as shown in Fig. 3(b). The owner deposits cash with amount $\mathrm{value}(e^t_{i,i})$ into his/her bank account at time slot $t$.
• Withdraw Activity: a clockwise loop $e^t_{i,i}$ from $v_i$ to itself, as shown in Fig. 3(c). The owner withdraws cash with amount $\mathrm{value}(e^t_{i,i})$ from his/her bank account.

Since $G$ contains self-loops and parallel edges, it is not a simple graph. In addition, each $v_i$ represents only a bank account, while a person may hold multiple bank accounts for transactions. These transaction records also reflect people's social behavior. To depict financial activities better, we therefore construct a second graph, a relative network, as an auxiliary graph.

Definition 6 (Relative Networks). A relative network is a directed graph $S = (P, R)$. $P = \{p_1, \ldots, p_m\}$ denotes the person set, and $R = \{r_{i,j} \mid r_{i,j} = (p_i, p_j)\}$ denotes the set of relative relations from $p_i$ to $p_j$. A solid line pointing from $p_i$ to $p_j$ expresses such an immediate relative relationship.

There is a many-to-one mapping between the FAN $G$ and the relative network $S$. A person $p_i$ in $S$ may hold multiple accounts in $G$, while an account $v_j$ belongs to exactly one legal person in $S$. Fig. 4 illustrates these relationships. We define $V(p_i) = \{v^i_1, \ldots, v^i_w\}$ as the set of accounts owned by person $p_i$, and $\mathrm{owner}(v_i) \in P$ as the owner of $v_i$.

Fig. 4: Mapping between FAN and relative network.

With the FAN $G$ and the relative network $S$, we can identify the normal financial activities in a financial system before detecting anomalies. The three basic activities shown in Figs. 3(a)-3(c) are normal activities, and we denote them as $NA_1$, $NA_2$, and $NA_3$, respectively. Table I lists the symbols and notations used in this paper; some of them are defined where they first appear.

V. EXPLORING ANOMALIES

Implicit anomalies disguised as normal financial activities pose a huge challenge to financial supervision. Because individuals conceal their intentions, every single transaction appears legitimate. In this section, we explore these implicit anomalies and formalize ten of them hidden in the three normal financial activities. These implicit anomalies are the core components of financial anomalies and carry potential risks of financial crime.

Definition 7 (Sensitive-Region Anomaly, $AA_1$). A bank account $v_i$ may be anomalous if it is issued in one of the sensitive regions $C_s$ listed by the financial institution. Formally, $v_i$ has $AA_1$ if its registration city satisfies $\mathrm{regcity}(v_i) \in C_s$.

Definition 8 (Transaction-Address Anomaly, $AA_2$). A bank account $v_i$ has a transaction-address anomaly $AA_2$ if $v_i$ has a financial activity in one city while its owner is verified to be in another city at the same time. Formally, $\exists e^t_{i,j} \in E$ such that $\mathrm{actloc}(e^t_{i,j}) \neq \mathrm{loc}_t(\mathrm{owner}(v_i))$.

Definition 9 (Transaction-IP Anomaly, $AA_3$). A transaction-IP anomaly happens if multiple transfer activities $E' \subseteq E_{t,t+\Delta}$ from different account owners $P' \subseteq P$ are operated from the same IP address. Formally, $\forall e^{t_1}_{i,a}, e^{t_2}_{j,b} \in E'$, $\mathrm{IP}(e^{t_1}_{i,a}) = \mathrm{IP}(e^{t_2}_{j,b})$, and $|P'| > \tau_c$. Here, $\tau_c$ is the frequency threshold specified by the financial institution.
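Definitions 5 to 9 can be prototyped with a naive in-memory FAN. The sketch below is hypothetical: the names `Edge`, `FAN`, `e_in`, `e_out`, `accounts_of`, `aa1`, `aa2`, and `aa3` are ours, not the paper's. It scans the whole edge list for every window query, which is fine for illustration but not for bank-scale data.

Several conventions here are our own assumptions:
- Self-loops ($e^t_{i,i}$) are kept as ordinary edges, so a deposit or withdrawal is returned by both `e_in` and `e_out` of its account.
- Transfers are identified as edges with distinct endpoints.
- Owner locations $\mathrm{loc}_t(p)$ are passed as a dictionary keyed by (person, time slot).
- An activity whose owner location is unknown is not flagged for $AA_2$.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Edge:                        # a financial activity e^t_{i,j} (hypothetical record)
    src: str
    dst: str
    t: int
    value: float
    actloc: str = ""
    ip: str = ""


class FAN:
    """Naive FAN G = (V, E): a directed multigraph with parallel edges and self-loops."""

    def __init__(self, owner: Dict[str, str], regcity: Dict[str, str]):
        self.owner = owner         # owner(v): many-to-one map from accounts (G) to persons (S)
        self.regcity = regcity     # regcity(v)
        self.edges: List[Edge] = []

    def add(self, e: Edge) -> None:
        self.edges.append(e)

    def e_in(self, v: str, t1: int, t2: int) -> List[Edge]:
        """E^in_{t1,t2}(v): edges entering v with timestamp in [t1, t2)."""
        return [e for e in self.edges if e.dst == v and t1 <= e.t < t2]

    def e_out(self, v: str, t1: int, t2: int) -> List[Edge]:
        """E^out_{t1,t2}(v): edges leaving v with timestamp in [t1, t2)."""
        return [e for e in self.edges if e.src == v and t1 <= e.t < t2]

    def accounts_of(self, p: str) -> Set[str]:
        """V(p): the accounts owned by person p."""
        return {v for v, o in self.owner.items() if o == p}


def aa1(g: FAN, v: str, sensitive: Set[str]) -> bool:
    """Definition 7: regcity(v) is in the sensitive region set C_s."""
    return g.regcity[v] in sensitive


def aa2(g: FAN, v: str, loc: Dict[Tuple[str, int], str]) -> bool:
    """Definition 8: an activity of v occurs away from its owner's location at that time."""
    p = g.owner[v]
    for e in g.edges:
        where: Optional[str] = loc.get((p, e.t))
        if e.src == v and where is not None and e.actloc != where:
            return True
    return False


def aa3(g: FAN, t: int, delta: int, tau_c: int) -> Set[str]:
    """Definition 9: IPs used by more than tau_c distinct owners for transfers in [t, t + delta)."""
    owners_by_ip: Dict[str, Set[str]] = defaultdict(set)
    for e in g.edges:
        if e.src != e.dst and t <= e.t < t + delta:   # transfers only, not self-loops
            owners_by_ip[e.ip].add(g.owner[e.src])
    return {ip for ip, ps in owners_by_ip.items() if len(ps) > tau_c}
```

The `owner` dictionary doubles as the many-to-one mapping of Fig. 4, so `accounts_of` recovers $V(p_i)$ without a separate index.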
TABLE I: Primary notations

| Notation | Description |
|---|---|
| $G = (V, E)$ | An FAN with account set $V = \{v_1, \ldots, v_n\}$ and financial activity set $E = \{e^t_{i,j} \mid e^t_{i,j} = (v_i, v_j)\}$. |
| $\mathrm{type}(v)$ | Type of account $v \in V$: "bank account", "enterprise account", or "personal account". |
| $\mathrm{type}(e^t_{i,j})$ | Type of financial activity $e^t_{i,j} \in E$: "transfer", "deposit", or "withdraw". |
| $\mathrm{owner}(v)$ | The owner of account $v \in V$. |
| $\mathrm{regcity}(v)$ | The registration city of account $v \in V$. |
| $\mathrm{value}(e^t_{i,j})$ | The transfer amount of $e^t_{i,j} \in E$. |
| $\mathrm{actloc}(e^t_{i,j})$ | The activity location of $e^t_{i,j} \in E$. |
| $\mathrm{IP}(e^t_{i,j})$ | The operating IP address of $e^t_{i,j} \in E$. |
| $\mathcal{P}(v_i, v_j)$ | A financial path $(v_i, v_1, \ldots, v_j)$ in the FAN. |
| $S = (P, R)$ | A relative network with person set $P = \{p_1, \ldots, p_w\}$ and relative relationship set $R = \{r_{i,j} \mid r_{i,j} = (p_i, p_j)\}$. |
| $\mathrm{type}(p_i)$ | Type of person $p_i \in P$: "citizens" or "bank staff". |
| $\mathrm{loc}_t(p_i)$ | The location of person $p_i$ at time slot $t$. |
| $E_{t_1,t_2}$ | The edge set of all activities within the time interval $[t_1, t_2)$. |
| $E^{in}_{t_1,t_2}(v_i)$ | The edge set of transferred-in activities of $v_i$ within $[t_1, t_2)$. |
| $E^{out}_{t_1,t_2}(v_i)$ | The edge set of transferred-out activities of $v_i$ within $[t_1, t_2)$. |
| $\lvert E_{t_1,t_2} \rvert$ | The number of edges in $E_{t_1,t_2}$. |
| $C$ | Evidence chain. |
| $C_s$ | The sensitive region set. |
| $\lambda$ | Upper bound on the legal ratio. |
| $\tau_c$ | A frequency threshold specified by financial institutions. |
| $\epsilon_v$ | A small monetary threshold indicating kickbacks. |
| $\epsilon_t, \epsilon_a$ | A time interval and an amount threshold used when tracing clue chains. |
| $\Delta$ | The threshold of a small time interval. |

Definition 10 (Funding-Frequency-Fluctuation Anomaly, $AA_4$). A bank account $v_i$ has a funding-frequency-fluctuation anomaly $AA_4$ if the frequency of its recent transactions far exceeds that of previous ones, i.e.,
$|E^{in}_{t-\Delta,t}(v_i)| + |E^{out}_{t-\Delta,t}(v_i)| \ge \lambda \cdot \left(|E^{in}_{t'-\Delta,t'}(v_i)| + |E^{out}_{t'-\Delta,t'}(v_i)|\right)$.
Here, $t'$ is the timestamp preceding $t$, and $\lambda$ is a large threshold.

Definition 11 (Funding-Amount-Fluctuation Anomaly, $AA_5$). A bank account $v_i$ has a funding-amount-fluctuation anomaly $AA_5$ if its recent transaction amount far exceeds the previous one, i.e.,
$\sum_{e \in E^{in}_{t-\Delta,t}(v_i) \cup E^{out}_{t-\Delta,t}(v_i)} \mathrm{value}(e) \ge \lambda \cdot \sum_{e \in E^{in}_{t'-\Delta,t'}(v_i) \cup E^{out}_{t'-\Delta,t'}(v_i)} \mathrm{value}(e)$.

Definition 12 (Split Anomaly, $AA_6$). An account $v_i$ has a split anomaly $AA_6$ if the FAN contains a star structure centered at $v_i$. The star consists of outgoing edges $E' \subseteq E^{out}_{t-\Delta,t}(v_i)$ to accounts with different (non-relative) owners. Formally, for every $e^{t_1}_{i,j} \in E'$ with $t_1 \in [t-\Delta, t)$, $r = (\mathrm{owner}(v_i), \mathrm{owner}(v_j)) \notin R$, and $|E'| > \tau_c$.

Definition 13 (Merge Anomaly, $AA_7$). An account $v_i$ has a merge anomaly $AA_7$ if the FAN contains a star structure centered at $v_i$. The star consists of incoming edges $E' \subseteq E^{in}_{t-\Delta,t}(v_i)$ from unfamiliar accounts. Formally, for every $e^{t_1}_{j,i} \in E'$ with $t_1 \in [t-\Delta, t)$, $r = (\mathrm{owner}(v_j), \mathrm{owner}(v_i)) \notin R$, and $|E'| > \tau_c$.

Definition 14 (Immediate In-Out Anomaly, $AA_8$). A bank account $v_i$ has an immediate in-out anomaly $AA_8$ if, within a short period of time, it frequently receives money through $E_1 \subseteq E^{in}_{t-\Delta,t}(v_i)$ and transfers a sum of money out through $E_2 \subseteq E^{out}_{t-\Delta,t}(v_i)$, i.e.,
$\sum_{e \in E_1} \mathrm{value}(e) - \sum_{e' \in E_2} \mathrm{value}(e') \le \epsilon_v$.

Definition 15 (Road-Toll Anomaly, $AA_9$). A road-toll anomaly happens on a path with at least two edges in the FAN when the intermediate account $v_i$ belongs to a bank staff member or his/her relatives and receives a kickback. Formally, $0 < \mathrm{value}(e^{t_1}_{a,i}) - \mathrm{value}(e^{t_2}_{i,b}) < \epsilon_v$, and the owner of $v_i$ meets either of the following conditions:
(i) $\mathrm{type}(\mathrm{owner}(v_i)) =$ "bank staff"; or
(ii) there exists a person $p \in P$ with $\mathrm{type}(p) =$ "bank staff" and $r = (\mathrm{owner}(v_i), p) \in R$.

Definition 16 (One-Way-Transfer Anomaly, $AA_{10}$). Illegal transactions may be present if account $v_i$ transfers money to account $v_j$ through different paths (possibly across other persons, enterprises, or bank accounts) while no path leads from $v_j$ back to $v_i$. Formally, accounts $v_i$ and $v_j$ have $AA_{10}$ if $|\mathcal{P}(v_i, v_j)| > 0$ and $|\mathcal{P}(v_j, v_i)| = 0$. Here, $v_i$ is the transfer-out account and $v_j$ is the transfer-in account. $\mathcal{P}(v_i, v_j)$ denotes the set of all reachable paths from $v_i$ to $v_j$, and $|\mathcal{P}(v_i, v_j)|$ is the number of such paths.

VI. ALGORITHM DESIGN

In this section, we propose a uniform framework to dynamically detect anomalies disguised as normal financial activities. First, we design a family of algorithms that dynamically detect implicit abnormal patterns (anomalies) in normal financial activities. Then, we propose the "Clue Chains Tracing" algorithm. It finds and infers clue chains from these suspect financial anomalies, telling where the money comes from and what it is used for.

A. Overview of the Detection Framework

Detecting anomalies disguised as normal financial activities is a challenging task. In this part, we propose a novel uniform framework that detects such anomalies using database search techniques, and newly discovered anomaly patterns can easily be added to it. We classify the ten implicit anomalies ($AA_1$–$AA_{10}$) into three categories:
(i) single online abnormal activity: a new activity is abnormal by itself;
(ii) composite online abnormal activity: a new activity is abnormal in combination with some historical activities in the FAN;
(iii) composite history abnormal activity: anomalies hidden in historical financial activities in the FAN.
Accordingly, we design trigger-based, monitor-based, and mining-based detection algorithms, respectively. Fig. 5 shows the structure tree used to detect anomalies efficiently.

The framework for malicious financial activity detection consists of three steps.
(1) Anomaly Detection Algorithms. We design a family of detection algorithms that monitor intractable implicit anomalies induced by newly arriving transactions in real time.
(2) Clue Chains Tracing. We trace the detected anomalies (abnormal patterns) to form clue chains. There may be multiple possible chains related to one anomaly.
(3) Clue Chains Ranking and Recommendation. We evaluate the traced clue chains and recommend the most suspicious ones to financial institutions.
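To illustrate Definitions 15 and 16, the sketch below encodes the bridge loan of Fig. 1 and applies brute-force checks for $AA_9$ and $AA_{10}$. All names are hypothetical, and the sketch rests on assumptions the paper does not spell out:
- e3 is the $120 loan paid from a bank account into Bob's account, and e4 is Bob's $110 repayment to Alice. This is consistent with the chain e3 → Bob → e4 and with Bob's $10 gain.
- The timestamps are invented and serve only for ordering.
- $\epsilon_v = 15$ is an arbitrary illustrative threshold.

Path existence for $AA_{10}$ is tested with breadth-first reachability, since $|\mathcal{P}(v_i, v_j)| > 0$ holds exactly when $v_j$ is reachable from $v_i$.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, int, float]          # (src account, dst account, time slot, value)

# Fig. 1 under the stated assumptions; account names and timestamps are hypothetical.
EDGES: List[Edge] = [
    ("alice", "bob", 1, 100.0),     # e1: Alice lends $100 through Bob
    ("bob", "charlie", 2, 100.0),   # e2: Bob passes $100 to Charlie
    ("bank", "bob", 3, 120.0),      # e3: the actual $120 loan reaches Bob (assumed)
    ("bob", "alice", 4, 110.0),     # e4: Bob repays Alice $110
]
OWNER: Dict[str, str] = {"alice": "Alice", "bob": "Bob", "charlie": "Charlie", "bank": "Bank"}
STAFF: Set[str] = {"Bob"}                    # persons p with type(p) = "bank staff"
R: Set[Tuple[str, str]] = set()              # relative relations (p_i, p_j); none in Fig. 1


def staff_linked(person: str) -> bool:
    """Definition 15, condition (i) or (ii): bank staff, or a relative of bank staff."""
    return person in STAFF or any((person, s) in R for s in STAFF)


def aa9(edges: List[Edge], owner: Dict[str, str], eps_v: float) -> List[Tuple[str, int, int]]:
    """Road-toll anomaly: e_{a,i} into v_i, e_{i,b} out of v_i, 0 < difference < eps_v."""
    hits = []
    for a, i, t1, x in edges:
        for i2, b, t2, y in edges:
            if i2 == i and a != i and b != i and 0 < x - y < eps_v and staff_linked(owner[i]):
                hits.append((i, t1, t2))
    return hits


def reachable(edges: List[Edge], src: str) -> Set[str]:
    """Accounts v != src with at least one path src -> v, found by breadth-first search."""
    succ: Dict[str, Set[str]] = defaultdict(set)
    for a, b, _, _ in edges:
        if a != b:
            succ[a].add(b)
    seen: Set[str] = set()
    queue = deque([src])
    while queue:
        for nxt in succ[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(src)
    return seen


def aa10(edges: List[Edge]) -> Set[Tuple[str, str]]:
    """One-way transfer: |P(v_i, v_j)| > 0 and |P(v_j, v_i)| = 0."""
    nodes = {a for a, _, _, _ in edges} | {b for _, b, _, _ in edges}
    reach = {v: reachable(edges, v) for v in nodes}
    return {(vi, vj) for vi in nodes for vj in reach[vi] if vi not in reach[vj]}


print(aa9(EDGES, OWNER, eps_v=15.0))   # [('bob', 3, 4)]
```

Under these assumptions, only the e3/e4 pair around Bob is reported. The e3/e2 pair differs by $20 and falls outside $\epsilon_v$. The similar Alice-centered pair (e4 in, e1 out) is ignored because Alice is neither bank staff nor a relative of bank staff. `aa10(EDGES)` also flags pairs such as ('bob', 'charlie') and ('bank', 'bob'), because in this toy graph no money flows back out of Charlie's account or back to the bank.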
Fig. 5: Structure tree of Themis.
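Definitions 10 and 11 compare an account's activity count and amount in the current window with those in the previous window, and the overview above calls for monitoring such anomalies as new activities arrive. Below is a minimal sketch of doing this incrementally for one account. It assumes that activities arrive in time order and uses our own window convention $(t-\Delta, t]$, which includes the arriving activity; `FluctuationMonitor` and its methods are hypothetical. Because only the expired prefix is evicted, the new count equals old − expired + 1. The $AA_4$ test new ≥ λ·old therefore rearranges to $(\lambda - 1)\cdot \mathit{old} + \mathit{expired} \le 1$. For $AA_5$, the same rearrangement gives $(\lambda - 1)\cdot \mathit{old\_sum} + \mathit{expired\_sum} \le \mathrm{value}(act_n)$.

```python
from collections import deque
from typing import Deque, Tuple


class FluctuationMonitor:
    """Incremental sketch of Definitions 10-11 (AA4, AA5) for a single account v.

    Hypothetical helper, not the paper's code. Each window is (t - delta, t] and
    includes the arriving activity; the "previous" window is the state left by the
    previous arrival t'. Activities must arrive in non-decreasing time order.
    """

    def __init__(self, delta: int, lam: float):
        self.delta = delta
        self.lam = lam
        self.window: Deque[Tuple[int, float]] = deque()  # (timestamp, value), oldest first
        self.total = 0.0                                 # sum of values currently in the window

    def arrive(self, t: int, value: float) -> Tuple[bool, bool]:
        """Slide the window to end at t and return (AA4 flag, AA5 flag)."""
        old_cnt, old_sum = len(self.window), self.total
        exp_cnt, exp_sum = 0, 0.0
        # Evict only the expired prefix instead of rescanning the whole window.
        while self.window and self.window[0][0] <= t - self.delta:
            _, v = self.window.popleft()
            exp_cnt += 1
            exp_sum += v
        self.window.append((t, value))
        self.total = old_sum - exp_sum + value
        new_cnt = old_cnt - exp_cnt + 1
        # Assumption: nothing is flagged while the previous window is empty.
        if old_cnt == 0:
            return False, False
        return new_cnt >= self.lam * old_cnt, self.total >= self.lam * old_sum


if __name__ == "__main__":
    m = FluctuationMonitor(delta=10, lam=2.0)
    print([m.arrive(t, 100.0) for t in (0, 20, 21, 22)])
```

For example, with Δ = 10 and λ = 2, four $100 activities at t = 0, 20, 21, 22 yield the flags (False, False), (False, False), (True, True), (False, False). The burst at t = 21 doubles both the count and the amount relative to the window that ended at t = 20. Each arrival costs time proportional to the number of expired activities rather than the window size.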

B. Anomaly Detection Algorithms

The three types of anomalies are detected in different ways. The three corresponding algorithms are given below.

Trigger-based detection. The trigger-based detection algorithm dynamically checks every newly arriving single financial activity by examining its relative relations and spatio-temporal features, as shown in Algorithm 1. It finds the single online anomalies $AA_1$ and $AA_2$ in constant time.

Algorithm 1: Trigger-based detection ($AA_1$ and $AA_2$)
Input: A newly arriving financial activity $act_n$;
Output: Detected anomalies $AA_1$ and $AA_2$;
    $AA_1, AA_2 \leftarrow \emptyset$;
    if $act_n$ is a "NewAccount" $v$ and $\mathrm{regcity}(v) \in C_s$ then $AA_1$.add($act_n$);
    if $act_n$ is a "NewTransfer" $e = (v_i, v_j, t)$ and $\mathrm{actloc}_t(e) \neq \mathrm{loc}_t(\mathrm{owner}(v_i))$ then $AA_2$.add($act_n$);
    return $AA_1$ and $AA_2$.

Monitor-based detection. The monitor-based detection algorithm detects sets of abnormal activities by combining newly arriving activities with historical ones. In these cases the FAN alone is normal, but "FAN + newly arriving activities" is abnormal. $AA_3$–$AA_8$ are all typical composite online anomalies.

The algorithm checks the sliding time window $[t-\Delta, t)$, where $t$ is the time slot of a newly arriving activity $act_n$. A straightforward approach re-examines the whole window $[t-\Delta, t)$ against the definitions of $AA_3$–$AA_8$. To accelerate the computation, we propose incremental detection (Algorithm 2). It compares the new window with its previous sliding window $[t'-\Delta, t')$, where $act_e$ is the set of expired activities (if any) for the new window $[t-\Delta, t)$.

Algorithm 2: Monitor-based detection ($AA_3$–$AA_8$)
Input: A new activity $act_n$ with time slot $t$; the previous time window $[t'-\Delta, t')$;
Output: Detected anomalies $AA_3$–$AA_8$;
    $AA_3, AA_4, AA_5, AA_6, AA_7, AA_8 \leftarrow \emptyset$;
    foreach $AA_i \in \{AA_3, \ldots, AA_8\}$ do
        Calculate the expired activities $act_e$ within $[t'-\Delta, t-\Delta)$;
        if $act_n$ and $act_e$ satisfy the alarm condition of $AA_i$ then
            $AA_i$.add($act_n$);
    return $AA_3$–$AA_8$;

The alarm conditions of $AA_3$–$AA_8$ are listed as follows.
• Alarm condition of $AA_3$ (Transaction-IP Anomaly). Suppose a new activity $act_n$ is initiated on an account $v$ and completed on a device with IP address $ip$. Let $E^{out}_{t'-\Delta,t'}(v)$ be the set of activities initiated on $v$ within $[t'-\Delta, t')$. We maintain a hash index on $E^{out}_{t'-\Delta,t'}(v)$ keyed by IP address. With it, we obtain in $O(1)$ time the set $E'_{ip} \subseteq E^{out}_{t'-\Delta,t'}(v)$ of activities operated from $ip$ within $[t'-\Delta, t')$. Let $E_e \subseteq E'_{ip}$ be the subset of these activities within $[t'-\Delta, t-\Delta)$. The alarm condition of $AA_3$ is then $|E'_{ip}| - |E_e| + 1 > \tau_c$.
• Alarm condition of $AA_4$ (Funding-Frequency-Fluctuation Anomaly). Let $v$ be the account on which $act_n$ is initiated. The activity count of the new window equals the previous count minus the expired count plus one. The condition of Definition 10 therefore becomes $(\lambda - 1)\cdot\left(|E^{in}_{t'-\Delta,t'}(v)| + |E^{out}_{t'-\Delta,t'}(v)|\right) + \left(|E^{in}_{t'-\Delta,t-\Delta}(v)| + |E^{out}_{t'-\Delta,t-\Delta}(v)|\right) \le 1$.
• Alarm condition of $AA_5$ (Funding-Amount-Fluctuation Anomaly). Let $v$ be the account on which $act_n$ is initiated. The amount of the new window equals the previous amount minus the expired amount plus $\mathrm{value}(act_n)$. The condition of Definition 11 therefore becomes $(\lambda - 1)\cdot\sum_{e \in E^{in}_{t'-\Delta,t'}(v) \cup E^{out}_{t'-\Delta,t'}(v)} \mathrm{value}(e) + \sum_{e \in E^{in}_{t'-\Delta,t-\Delta}(v) \cup E^{out}_{t'-\Delta,t-\Delta}(v)} \mathrm{value}(e) \le \mathrm{value}(act_n)$.
• Alarm conditions of $AA_6$ and $AA_7$ (Split and Merge
Anomaly). For the account $v$ on which $act_n$ is initiated, we could maintain a hash index on $E^{out}_{t'-\Delta,t'}(v)$ for cards with different owners. Then, we can use $O(1)$ time to get the set of activities $E^{out}_{owner} \subseteq E^{out}_{t'-\Delta,t'}(v)$ whose transferred-out cards have the same owner as $v$. Let $E_{out} \subseteq E^{out}_{t'-\Delta,t'}(v) \setminus E^{out}_{owner}$ be the set of transferred-out activities within $[t'-\Delta, t-\Delta)$; then the alarm condition of AA6 is $|E^{out}_{t'-\Delta,t'}(v)| - |E_{out}| + 1 > \tau_c$. Similarly, let $v$ be the transferred-in account of $act_n$. Let $E^{in}_{owner} \subseteq E^{in}_{t'-\Delta,t'}(v)$ be the activities whose transferred-in accounts have the same card owner as $v$, and let $E_{in} \subseteq E^{in}_{t'-\Delta,t'}(v) \setminus E^{in}_{owner}$ be the set of transferred-in activities within $[t'-\Delta, t-\Delta)$; then the alarm condition of AA7 is $|E^{in}_{t'-\Delta,t'}(v)| - |E_{in}| + 1 > \tau_c$.
• Alarm condition of AA8 (Immediate In-Out Anomaly). For the account $v$ on which $act_n$ is initiated, let $e_n$ be the corresponding edge of $act_n$ and let $val_d$ be the difference value
$val_d = \sum_{e \in E^{in}_{t'-\Delta,t'}(v)} value(e) - \sum_{e \in E^{out}_{t'-\Delta,t'}(v)} value(e) - \Big(\sum_{e \in E^{in}_{t'-\Delta,t-\Delta}(v)} value(e) - \sum_{e \in E^{out}_{t'-\Delta,t-\Delta}(v)} value(e)\Big)$;
then the alarm condition of AA8 is $val_d + value(e_n) > \epsilon_v$ or $val_d - value(e_n) > \epsilon_v$.

Mining-based detection method. The mining-based detection algorithm discovers a set of composite historical abnormal activities (AA9 and AA10) by searching historical activities in the FAN, shown as Algorithm 3. The verification of reachability among different accounts is improved by introducing maximum-flow min-cut theory [35], [36], pruning unnecessary search paths, and stopping DFS early when detecting AA10.

Algorithm 3: Mining-based detection (AA9 and AA10)
Input: An FAN $G = (V, E)$;
Output: Detected anomalies AA9 and AA10.
1 AA9, AA10 ← ∅;
2 foreach $v_i \in V$ do
3     foreach $e^{t_1}_{j,i} \in E^{in}(v_i)$ do
4         start ← $e^{t_1}_{j,i}$;
5         foreach $e^{t_2}_{i,k} \in E^{out}(v_i)$ do
6             if $value(e^{t_1}_{j,i}) - value(e^{t_2}_{i,k}) < \epsilon_v$ then end ← $e^{t_2}_{i,k}$;
7                 AA9.add((start, end));
8     DFS from $v_i$, label the visited accounts, and add $v_j$ to an empty set $V_1$ when visiting a labeled $v_j$;
9     foreach $v_j \in V_1$ do
10        if the maximum flow from $v_j$ to $v_i$ is 0 then
11            $E_c$ ← the minimum cut set found by the maximum flow algorithm from $v_i$ to $v_j$;
12            foreach $(v_i, v_k) \in V_1$ do AA10.add($e^{t}_{i,k}$);
13            foreach $(v_k, v_j) \in V_1$ do AA10.add($e^{t}_{k,j}$);
14 Return AA9 and AA10.

C. Clue Chains Tracing
For a detected anomaly involving an amount of money, a financial institution may want to trace where the money comes from and what it is used for. We mine clue chains by tracing the activities in the FAN and require that each discovered chain of clues include the detected abnormal activity. The main idea of the clue chain tracing algorithm is to start from the detected activity and examine its neighboring nodes in the FAN. We trace the source of the money by the similarity of the timestamps and the amounts of money.

To recommend the most suspicious clue chains to institutions, we evaluate these chains by scoring rules and choose the most suspicious ones. The scoring rules are dynamically adjusted according to the business scenario, taking into account time interval, transfer amount, intimacy, and so on, since different clue chains and relationships have different importance for institutions. The framework for chain tracing and recommendation is as follows: (a) set the time and money thresholds; (b) start from the detected suspect anomalies, trace abnormal financial activities according to the human-money relationships in the FAN, and form suspect clue chains; (c) rank these chains based on the dynamically adjusted scoring rules. In the evaluation part, the top-k clue chains are recommended to financial institutions to help them detect anomalies effectively.

VII. EXPERIMENTS
We show the efficiency and effectiveness of our approach. More importantly, we developed an anomaly-detection system, called Themis, to detect abnormal activities. The interface of Themis and case studies are also presented. We conducted all the experiments on a machine with an Intel(R) Core(TM) i7-10710U and 16GB of memory running Windows. All the methods are implemented in C++ and compiled by g++ with -O3 optimization. All the detection algorithms run in memory.

A. Datasets
Synthetic dataset: To evaluate the performance of Themis in detecting anomalies, we apply an existing graph generator, Watts-Strogatz^1, to generate the background financial graph and relative network, including 1,000 nodes and 993,133 edges. Abnormal patterns are then inserted into the graph as the ground truth to evaluate our anomaly detection algorithms, clue chain tracing technology, and Themis.
Real bank statements: To further evaluate the performance of Themis in real-world scenarios, financial activities of a bank are used as a benchmark dataset in our experiments, including 110,509 transfer records and 47 features. The dataset contains the bank statement (transaction amount, transaction category, etc.), cardholder-related information (name, card number, identification number, etc.), counterparty information (account, ID number, etc.), and relative relationships.
New arrival activities: Based on the synthetic dataset, we generate 100 new nodes and 500 new edges with sorted timestamps to simulate newly generated financial activities. The generated timestamps follow the same time distribution as the financial graph.

1 https://github.com/sleepokay/watts-strogatz
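The synthetic background graph comes from an existing Watts-Strogatz generator (footnote 1), whose parameters, edge orientation, and attribute assignment are not given here. As a rough illustration only, the sketch below implements the textbook Watts-Strogatz model (ring lattice plus random rewiring) in pure Python and adds a hypothetical `to_transfers` step that orients each edge at random and attaches a uniformly drawn timestamp and amount; those attribute distributions are our assumptions, not the paper's. It does not reproduce the 1,000-node, 993,133-edge graph: a simple undirected graph on 1,000 nodes has at most 499,500 edges, so that graph necessarily uses the directed (and possibly parallel) edges a FAN allows.

```python
import random


def watts_strogatz(n: int, k: int, p: float, seed=None) -> set:
    """Undirected Watts-Strogatz graph as a set of 2-node frozensets.

    Start from a ring lattice where each node links to its k/2 nearest
    neighbours on each side, then rewire every lattice edge (u, v) to (u, w)
    with probability p, avoiding self-loops and duplicate edges.
    """
    if k % 2 or not 0 < k < n:
        raise ValueError("k must be even and 0 < k < n")
    rng = random.Random(seed)
    lattice = [(u, (u + j) % n) for u in range(n) for j in range(1, k // 2 + 1)]
    edges = {frozenset(e) for e in lattice}
    for u, v in lattice:
        if rng.random() < p:
            choices = [w for w in range(n) if w != u and frozenset((u, w)) not in edges]
            if choices:  # leave the edge alone if u is already linked to everyone
                edges.remove(frozenset((u, v)))
                edges.add(frozenset((u, rng.choice(choices))))
    return edges


def to_transfers(edges, horizon_days: float = 365.0, seed=None):
    """Hypothetical step: orient each edge at random and attach a timestamp
    and an amount; returns (src, dst, day, amount) tuples sorted by time."""
    rng = random.Random(seed)
    transfers = []
    for edge in edges:
        src, dst = rng.sample(sorted(edge), 2)  # random direction
        day = rng.uniform(0.0, horizon_days)
        amount = round(rng.uniform(10.0, 10_000.0), 2)
        transfers.append((src, dst, day, amount))
    transfers.sort(key=lambda tr: tr[2])
    return transfers
```

Each rewiring removes one edge and adds one new edge, so the graph always keeps n·k/2 edges; with `p = 0` it is the plain ring lattice.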
(a) Trigger-based detection. (b) Monitor-based detection. (c) Mining-based detection.

Fig. 6: Performance of detection algorithms on a synthetic dataset.
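Fig. 6(b) evaluates Algorithm 2, whose incremental test avoids rescanning $[t-\Delta, t)$ on every arrival: it only discounts the activities that expired in $[t'-\Delta, t-\Delta)$. A minimal sketch of that idea for AA4 follows. It assumes one deque of in/out activity timestamps per account, non-decreasing timestamps per account, and that an account with an empty previous window never alarms; the class and method names are hypothetical. Writing $N_{old}$ for the number of activities in the previous window and $N_{exp}$ for the expired ones, the AA4 condition $(\lambda-1)\cdot N_{old} + N_{exp} \le 1$ rearranges to $N_{old} - N_{exp} + 1 \ge \lambda \cdot N_{old}$, i.e., the new window holds at least $\lambda$ times as many activities as the previous one.

```python
from collections import defaultdict, deque


class FrequencyFluctuationMonitor:
    """Hypothetical sketch of the incremental AA4 (Funding-Frequency-Fluctuation) test.

    For each account we keep the timestamps of its in/out activities that are
    still inside the sliding window, so a new arrival only pays for the
    activities that expire instead of rescanning the whole window.
    """

    def __init__(self, delta: float, lam: float) -> None:
        self.delta = delta                 # window length Δ
        self.lam = lam                     # fluctuation ratio λ
        self.windows = defaultdict(deque)  # account -> timestamps, oldest first

    def on_activity(self, account: str, t: float) -> bool:
        """Register an activity on `account` at time t and return True if
        it raises the AA4 alarm."""
        window = self.windows[account]
        n_old = len(window)                # activities in the previous window
        n_exp = 0
        while window and window[0] < t - self.delta:
            window.popleft()               # expired: falls in [t'-Δ, t-Δ)
            n_exp += 1
        # AA4: (λ-1)·n_old + n_exp <= 1. Assumption: an empty previous
        # window never alarms, since the ratio is undefined there.
        alarm = n_old > 0 and (self.lam - 1) * n_old + n_exp <= 1
        window.append(t)
        return alarm


if __name__ == "__main__":
    monitor = FrequencyFluctuationMonitor(delta=10, lam=2)
    print([monitor.on_activity("v", t) for t in (0, 1, 2, 20, 21)])
    # → [False, True, False, False, True]
```

Every timestamp is appended once and popped at most once, so the amortized cost per arrival is constant. The same deque bookkeeping would plausibly carry over to the count-based tests of AA3, AA6, and AA7, given the extra hash indexes described for them.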

B. Evaluation of anomaly detecting algorithms

Anomalies analysis on synthetic datasets. Fig. 6 shows the performance of the three detection algorithms on synthetic datasets of different sizes. We record the average number of detected anomalies and the detection time per new arrival activity for AA1–AA8, and the number of detected anomalies and the detection time for AA9–AA10.

Fig. 6(a) shows that the average number of anomalies found by trigger-based detection is small and independent of the data size, because most incoming activities are normal and the count depends only on the distribution of incoming activities. The average detection time is constant for every new arrival activity. The results in Fig. 6(b) show that the monitor-based detection algorithms scale well: the number of detected anomalies and the detection time increase linearly with the dataset size. Fig. 6(c) shows the results of mining-based detection. As the dataset grows, the number of detected anomalies increases linearly for AA9 and quadratically for AA10, since we want to find every One-Way-Transfer pair in the FAN. Mining-based detection requires more time, since its complexity is O(|E|²) for AA9 and O(|V|²|E|) for AA10.

Anomalies analysis on real bank statements. Fig. 7(a) shows the proportion of abnormal patterns detected in real bank statements. These anomalies are rare compared with normal activities, so tracing suspicious clue chains from them can improve the efficiency of institutions. Fig. 7(b) shows the detected funding-amount-fluctuation anomalies: most customers' transfer amounts are positively correlated with their historical transaction amounts, and only a small fraction of users have funding-amount-fluctuation anomalies.

(a) Proportion of anomalies. (b) Funding-Amount-Fluctuation.
Fig. 7: Statistics of anomalies on real bank statements.

C. Evaluation of Clue Chains Tracing Algorithms
1) Efficiency: In this part, we show the efficiency of clue chain tracing on real bank statements.

TABLE II: Number of chains for individual accounts.
| | Alice | Bob | Charlie | David | Emma |
| Total number of chains | 3,975 | 11,313 | 11,654 | 1,576 | 7,029 |
| Chains (ϵ_t = 6, ϵ_a = $10,000) | 217 | 503 | 179 | 47 | 219 |
| Chains (ϵ_t = 1, ϵ_a = $10,000) | 15 | 21 | 8 | 3 | 9 |

Verification benefit for individual accounts. Table II shows the number of chains for the accounts of specific individuals. "Total number of chains" is the total number of financial clues for an account; e.g., the account of David has 1,576 transaction-related chains, which financial institutions had to check manually before using Themis. Chains (ϵ_t, ϵ_a) is the number of clue chains traced by Themis under a given time interval ϵ_t (in months) and amount threshold ϵ_a (in dollars); the top-k of these chains are then recommended to institutions for manual verification (top-10 chains in Themis). For example, after deploying Themis, institutions only need to check 47 and 3 suspect chains for David under the thresholds (ϵ_t = 6, ϵ_a = $10,000) and (ϵ_t = 1, ϵ_a = $10,000), respectively, instead of 1,576.

Verification benefit for anomalies. Table III shows the average number of clue chains and their running time. When the time interval extends from one month to six months, more suspicious clues appear, so financial institutions can use these two parameters to control the scope of clue chain tracing.
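To make the verification benefit in Table II concrete, the short computation below expresses each Chains (ϵ_t, ϵ_a) count as a percentage of the account's total chains. It only reuses the numbers in Table II; the dictionary and the helper name `remaining_share` are ours.

```python
# Chain counts copied from Table II:
# (total chains, chains at eps_t=6 & eps_a=$10,000, chains at eps_t=1 & eps_a=$10,000)
TABLE_II = {
    "Alice": (3_975, 217, 15),
    "Bob": (11_313, 503, 21),
    "Charlie": (11_654, 179, 8),
    "David": (1_576, 47, 3),
    "Emma": (7_029, 219, 9),
}


def remaining_share(traced: int, total: int) -> float:
    """Percentage of an account's chains that Themis still traces for review."""
    return 100.0 * traced / total


for name, (total, six_months, one_month) in TABLE_II.items():
    print(f"{name:8s} 6 months: {remaining_share(six_months, total):5.2f}%   "
          f"1 month: {remaining_share(one_month, total):5.2f}%")
```

For David this gives about 2.98% at (ϵ_t = 6, ϵ_a = $10,000) and 0.19% at (ϵ_t = 1, ϵ_a = $10,000). Across the five accounts the share lies between roughly 1.5% and 5.5% at six months and stays below 0.4% at one month, before the top-10 recommendation narrows it further.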
Fig. 8: Case studies: clue chains detected by our system Themis.

(a) Relation Tracing. (b) Relation Tracing with Abnormal Link. (c) The Personal Financial Overview. (d) Personnel Funds Link.
Fig. 9: Interface-Diagram of Themis.

TABLE III: Performance of chains for anomalies.
| | AA4 | AA5 | AA8 | AA9 | AA10 |
| Average number of chains | 17,647 | 537 | 8,673 | 2,130 | 47,586 |
| Chains (ϵ_t = 6, ϵ_a = $10,000) | 425 | 64 | 328 | 165 | 760 |
| Time (ϵ_t = 6, ϵ_a = $10,000) | 2.87 s | 1.25 s | 2.67 s | 0.78 s | 3.18 s |
| Chains (ϵ_t = 1, ϵ_a = $10,000) | 200 | 45 | 229 | 121 | 521 |
| Time (ϵ_t = 1, ϵ_a = $10,000) | 1.66 s | 0.99 s | 1.54 s | 0.61 s | 2.56 s |

Performance. Table III reports the average running time (in seconds) of clue chain generation in Themis, covering the tracing, evaluation, and recommendation of chains. Clue chain inference is efficient (every setting finishes within 3.2 seconds), which supports anomaly detection on large-scale bank statements.

2) Effectiveness: We conduct case studies of clue chains generated by Themis whose anomalies have been verified by institutions. In Fig. 8, a node represents a person's account and an edge denotes financial activities between two accounts; solid edges are traced anomalies, and dashed edges are activities inferred from the accounts' deposits and withdrawals.

Case 1 (Clue chain detected by tracing AA5): As shown in Fig. 8(a), Hz is Lc's driver. Themis detects a funding-amount fluctuation in his activities, which significantly exceed Hz's income level ($5,000 monthly salary). Our approach traces Hz's transactions and recommends this suspicious clue chain. In this anomaly (AA5), Hz transferred large sums of money to Lc and Ys.

Case 2 (Clue chain detected by tracing AA8): As shown in Fig. 8(b), our algorithms detect that Hz acts as an intermediary and receives two cash deposits, from Sx and Yt respectively. Hz then transferred the money to Ly and to another of his accounts within a short period of time. Our approach traces how, acting as an intermediary, he moved the money from Sx and Yt to Lc and Ys.

Case 3 (Clue chain detected by tracing AA4 and AA8): Fig. 8(c) shows a clue chain that tracks where $400,000 comes from and what it is used for; in the end, Lc spent $388,000 in a luxury shop.

D. Themis: An anomaly-detection system
We developed an anomaly-detection system (Themis) that can detect anomalies from disguised normal financial activities and find suspicious clue chains. It has been deployed in many real scenarios, including banks and financial institutions. The pipeline of Themis is as follows: (a) anomalies are detected by the anomaly detection algorithms; (b) suspect clue chains are traced from these anomalies by the clue chain tracing algorithms; (c) Themis evaluates the suspicious clue chains and recommends them to institutions. The interface of Themis, shown in Fig. 9, includes individuals' assets, bank statements, cash transactions, relative relationships, and so on. With the help of Themis, anomalies such as the red solid lines in Fig. 9(b) can be detected.
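Step (c) ranks traced chains with scoring rules that, according to the clue chain tracing section, are adjusted per business scenario and weigh time interval, transfer amount, intimacy, and so on; the exact rules are not given. The sketch below is therefore only an illustration under explicit assumptions: a chain is a list of hypothetical `Transfer` hops, (ϵ_t, ϵ_a) is read as "each hop follows the previous one within ϵ_t months and moves at least ϵ_a dollars", intimacy is ignored, and the score is an arbitrary 50/50 mix of pass-through speed and amount similarity. Only the defaults k = 10, ϵ_t = 6, and ϵ_a = $10,000 come from the text.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    src: str
    dst: str
    month: float   # time of the transfer, in months (assumed unit)
    amount: float  # dollars


def passes_thresholds(chain, eps_t: float, eps_a: float) -> bool:
    """Assumed reading of (eps_t, eps_a): hops link end to start, each hop
    follows the previous one within eps_t months and moves at least eps_a."""
    for a, b in zip(chain, chain[1:]):
        if a.dst != b.src or not 0 <= b.month - a.month <= eps_t:
            return False
    return all(hop.amount >= eps_a for hop in chain)


def score(chain, w_time: float = 0.5, w_amount: float = 0.5) -> float:
    """Hypothetical score: fast pass-through and similar amounts rank higher."""
    pairs = list(zip(chain, chain[1:]))
    if not pairs:
        return 0.0
    speed = sum(1.0 / (1.0 + max(0.0, b.month - a.month)) for a, b in pairs) / len(pairs)
    similarity = sum(min(a.amount, b.amount) / max(a.amount, b.amount, 1e-9)
                     for a, b in pairs) / len(pairs)
    return w_time * speed + w_amount * similarity


def top_k_chains(chains, k: int = 10, eps_t: float = 6, eps_a: float = 10_000):
    """Keep chains that pass the thresholds, then return the k best scores."""
    candidates = [c for c in chains if passes_thresholds(c, eps_t, eps_a)]
    return heapq.nlargest(k, candidates, key=score)
```

Using `heapq.nlargest` keeps the selection at O(n log k) for n candidate chains, which matters when a single account has thousands of chains, as in Table II.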
VIII. CONCLUSION
In this paper, we design a uniform framework to detect anomalies from disguised normal financial activities. We are the first to formalize and detect complex anomalies while considering heterogeneous features. In particular, we propose a clue chain tracing technology to recommend suspect clue chains to institutions. Moreover, we deploy a system, Themis, to detect anomalies and infer clue chains in real scenarios. Experiments on synthetic datasets and real bank statements show the efficiency and effectiveness of Themis.

IX. ACKNOWLEDGEMENTS
The work is partially supported by the National Natural Science Foundation of China (Nos. U22A2025, 62072088, 62232007), and Liaoning Provincial Science and Technology Plan Project - Key R&D Department of Science and Technology (No. 2023JH2/101300182).

REFERENCES
[1] B. L. Handoko, R. N. A. Putri, and S. Wijaya, "Analysis of fraudulent financial reporting based on fraud heptagon model in transportation and logistic industry listed on idx during covid-19 pandemic," in International Conference on Software and e-Business, 2022, pp. 56–63.
[2] E. Hytis, V. Nastos, C. Gogos, and A. Dimitsas, "Automated identification of fraudulent financial statements by analyzing data traces," in The South-East Europe Design Automation, Computer Engineering, Computer Networks and Social Media Conference (SEEDA-CECNSM). IEEE, 2022, pp. 1–7.
[3] S. Dhankhad, E. Mohammed, and B. Far, "Supervised machine learning algorithms for credit card fraudulent transaction detection: a comparative study," in IEEE International Conference on Information Reuse and Integration (IRI), 2018, pp. 122–125.
[4] J. C. Ying, J. Zhang, C. W. Huang, K. T. Chen, and V. S. Tseng, "Fraudetector+: An incremental graph-mining approach for efficient fraudulent phone call detection," ACM Transactions on Knowledge Discovery from Data, vol. 12, no. 6, pp. 1–35, 2018.
[5] S. Zhou, J. He, H. Yang, D. Chen, and R. Zhang, "Big data-driven abnormal behavior detection in healthcare based on association rules," IEEE Access, vol. PP, no. 99, pp. 1–1, 2020.
[6] A. C. Kim, W. H. Park, and D. H. Lee, "A framework for anomaly pattern recognition in electronic financial transaction using moving average method," in IT Convergence and Security, 2013, pp. 93–99.
[7] J.-S. Chang and W.-H. Chang, "Analysis of fraudulent behavior strategies in online auctions for detecting latent fraudsters," Electronic Commerce Research and Applications, vol. 13, no. 2, pp. 79–97, 2014.
[8] F. Rahmani, C. Valmohammadi, and K. Fathi, "Detecting fraudulent transactions in banking cards using scale-free graphs," Concurrency and Computation: Practice and Experience, vol. 34, no. 19, p. e7028, 2022.
[9] H. Zhang and W. Zhou, "A two-stage virtual machine abnormal behavior-based anomaly detection mechanism," Cluster Computing, vol. 25, no. 1, pp. 203–214, 2022.
[10] M. Y. Turaba, M. Hasan, N. I. Khan, and H. A. Rahman, "Fraud detection during financial transactions using machine learning and deep learning techniques," in IEEE International Conference on Communications, Computing, Cybersecurity, and Informatics, 2022, pp. 1–8.
[11] E. E. Papalexakis, A. Beutel, and P. Steenkiste, "Network anomaly detection using co-clustering," in Encyclopedia of Social Network Analysis and Mining, 2nd Edition, 2018.
[12] S. Cao, X. Yang, J. Zhou, X. Li, Y. Qi, and K. Xiao, "Poster: Actively detecting implicit fraudulent transactions," in The ACM SIGSAC Conference on Computer and Communications Security, 2017, pp. 2475–2477.
[13] X. Gu and H. Wang, "Online anomaly prediction for robust cluster systems," in Proceedings of the 25th International Conference on Data Engineering, 2009, pp. 1000–1011.
[14] Z. Wang, "Abnormal financial transaction detection via ai technology," International Journal of Distributed Systems and Technologies (IJDST), vol. 12, no. 2, pp. 24–34, 2021.
[15] R. A. L. Torres and M. Ladeira, "A proposal for online analysis and identification of fraudulent financial transactions," in IEEE International Conference on Machine Learning and Applications (ICMLA), 2020.
[16] J. He, C.-C. M. Yeh, Y. Wu, L. Wang, and W. Zhang, "Mining anomalies in subspaces of high-dimensional time series for financial transactional data," in Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track: European Conference (ECML PKDD). Springer, 2021, pp. 19–36.
[17] Y. Li, Y. Sun, and N. Contractor, "Graph mining assisted semi-supervised learning for fraudulent cash-out detection," in Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017, 2017, pp. 546–553.
[18] X. Mao, M. Liu, and Y. Wang, "Using gnn to detect financial fraud based on the related party transactions network," Procedia Computer Science, vol. 214, pp. 351–358, 2022.
[19] Y. Pei, F. Lyu, W. V. Ipenburg, and M. Pechenizkiy, "Subgraph anomaly detection in financial transaction networks," 2020.
[20] D. Wang, Y. Qi, J. Lin, P. Cui, Q. Jia, Z. Wang, Y. Fang, Q. Yu, J. Zhou, and S. Yang, "A semi-supervised graph attentive network for financial fraud detection," in 2019 IEEE International Conference on Data Mining, 2019, pp. 598–607.
[21] W. Kudo, M. Nishiguchi, and F. Toriumi, "Gcnext: graph convolutional network with expanded balance theory for fraudulent user detection," Social Network Analysis and Mining, vol. 10, pp. 1–12, 2020.
[22] S. Pathan and V. Shrivastava, "Identifying linked fraudulent activities using graphconvolution network," arXiv:2106.04513, 2021.
[23] X. Wang, Z. Wan, and Y. Zhang, "A dqn-based internet financial fraud transaction detection method," in International Conference on Computer Science and Application Engineering (CSAE), 2021, pp. 1–5.
[24] B. Can, A. G. Yavuz, M. E. Karsligil, and M. A. Güvensan, "A closer look into the characteristics of fraudulent card transactions," IEEE Access, vol. 8, pp. 166095–166109, 2020.
[25] X. Mao, H. Sun, X. Zhu, and J. Li, "Financial fraud detection using the related-party transaction knowledge graph," Procedia Computer Science, vol. 199, pp. 733–740, 2022.
[26] T. Chen, L. Tang, Y. Sun, Z. Chen, and K. Zhang, "Entity embedding-based anomaly detection for heterogeneous categorical events," in Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016, pp. 1396–1403.
[27] H. Xiang, H. Hu, and X. Zhang, "Deepiforest: A deep anomaly detection framework with hashing based isolation forest," in IEEE International Conference on Data Mining (ICDM), 2022, pp. 1251–1256.
[28] C. Wang and H. Zhu, "Wrongdoing monitor: A graph-based behavioral anomaly detection in cyber security," Trans. Info. For. Sec., vol. 17, pp. 2703–2718, Jan. 2022.
[29] S. Reddy, P. Poduval, A. V. S. Chauhan, M. Singh, S. Verma, K. Singh, and T. Bhowmik, "Tegraf: temporal and graph based fraudulent transaction detection framework," in The ACM International Conference on AI in Finance (ICAIF), 2021, pp. 1–8.
[30] M. Shen, A. Sang, P. Duan, H. Yu, and L. Zhu, "Threat prediction of abnormal transaction behavior based on graph convolutional network in blockchain digital currency," in Blockchain and Trustworthy Systems (BlockSys). Springer, 2021, pp. 201–213.
[31] B. Hooi, H. A. Song, A. Beutel, N. Shah, K. Shin, and C. Faloutsos, "FRAUDAR: bounding graph fraud in the face of camouflage," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13-17, 2016, 2016, pp. 895–904.
[32] L. Akoglu, H. Tong, and D. Koutra, "Graph based anomaly detection and description: a survey," Data Min. Knowl. Discov., vol. 29, no. 3, pp. 626–688, 2015.
[33] Z. Zhang and L. Zhao, "Unsupervised deep subgraph anomaly detection," in IEEE International Conference on Data Mining (ICDM), 2022, pp. 753–762.
[34] A. Zhang, B. Wu, and Y. Li, "A heterogeneous graph-based fraudulent community detection system," in IEEE International Conference on e-Business Engineering (ICEBE), 2021, pp. 43–48.
[35] H. Yildirim, V. Chaoji, and M. J. Zaki, "GRAIL: a scalable index for reachability queries in very large graphs," VLDB J., vol. 21, no. 4, pp. 509–534, 2012.
[36] H. Wei, J. X. Yu, C. Lu, and R. Jin, "Reachability querying: An independent permutation labeling approach," Proc. VLDB Endow., vol. 7, no. 12, pp. 1191–1202, 2014.

Doi: 10.5281/zenodo.7922883: ISSN: 1004-9037
No ratings yet
Doi: 10.5281/zenodo.7922883: ISSN: 1004-9037
18 pages
Financial Fraud Detection Using Graph Neural Networks - A Systematic Review
No ratings yet
Financial Fraud Detection Using Graph Neural Networks - A Systematic Review
21 pages
Modern Teaching Methods
75% (4)
Modern Teaching Methods
10 pages
Pulmonology (Q & A) (Medicalstudyzone - Com)
No ratings yet
Pulmonology (Q & A) (Medicalstudyzone - Com)
1,768 pages
Anomaly Detection: World-Leading Research With Real-World Impact!
No ratings yet
Anomaly Detection: World-Leading Research With Real-World Impact!
72 pages
Certified Fraud Examiner Exam Pathway 2025/2026 Version: Practice Smarter With 585+ Targeted Question Sets
From Everand
Certified Fraud Examiner Exam Pathway 2025/2026 Version: Practice Smarter With 585+ Targeted Question Sets
Brittany Deaton
No ratings yet
Examine ML Approaches To Identify Anomalies in Financial Transactions and Operations
No ratings yet
Examine ML Approaches To Identify Anomalies in Financial Transactions and Operations
4 pages
03 Niall Adams
100% (1)
03 Niall Adams
49 pages
A Survey of Anomaly Detection Techniques in Financial (2015)
No ratings yet
A Survey of Anomaly Detection Techniques in Financial (2015)
43 pages
Headspace 2nd Year
100% (1)
Headspace 2nd Year
401 pages
Deep Semi-Supervised Anomaly Detection For Finding Fraud in The Futures Market
No ratings yet
Deep Semi-Supervised Anomaly Detection For Finding Fraud in The Futures Market
35 pages
The Role of AI in Detecting Financial Fraud
No ratings yet
The Role of AI in Detecting Financial Fraud
1 page
Financial Fraud Detection
No ratings yet
Financial Fraud Detection
13 pages
Artificial Intelligence in Fraud Detection and Prevention
100% (1)
Artificial Intelligence in Fraud Detection and Prevention
10 pages
Model Implementation
No ratings yet
Model Implementation
5 pages
Rethinking the Compaction Policies in LSM-trees
No ratings yet
Rethinking the Compaction Policies in LSM-trees
26 pages
Randomized Sketches for Quantile in LSM-tree Based Store
No ratings yet
Randomized Sketches for Quantile in LSM-tree Based Store
26 pages
Data Science Project
No ratings yet
Data Science Project
15 pages
Fraud Detection Research Paper (03,16,33)
No ratings yet
Fraud Detection Research Paper (03,16,33)
12 pages
Pilot Db
No ratings yet
Pilot Db
23 pages
Cecchini 2010
No ratings yet
Cecchini 2010
16 pages
Fraud Detection Techniques For Credit Card Transactions
No ratings yet
Fraud Detection Techniques For Credit Card Transactions
4 pages
Anomaly Detection in Cross-Country Money Transfer - Compressed
No ratings yet
Anomaly Detection in Cross-Country Money Transfer - Compressed
33 pages
Anomaly Detection in Global Financial Markets With Graph Neural Networks and Nonextensive Entropy
No ratings yet
Anomaly Detection in Global Financial Markets With Graph Neural Networks and Nonextensive Entropy
7 pages
Phase 2 New
No ratings yet
Phase 2 New
14 pages
FA and Big Data Term Paper Final Draft
No ratings yet
FA and Big Data Term Paper Final Draft
16 pages
Detecting Anomalies in Financial Statements Using ML
No ratings yet
Detecting Anomalies in Financial Statements Using ML
21 pages
PAD Final Research Paper-1
No ratings yet
PAD Final Research Paper-1
7 pages
Bda Paper 5
No ratings yet
Bda Paper 5
4 pages
Realtime Fraud Detection Using Apache Flink
No ratings yet
Realtime Fraud Detection Using Apache Flink
5 pages
Fraud and Anomaly in Banking
No ratings yet
Fraud and Anomaly in Banking
20 pages
Official Glossary - ISC 2 CC Preparation
No ratings yet
Official Glossary - ISC 2 CC Preparation
12 pages
Annon Pass
No ratings yet
Annon Pass
9 pages
Credit Card Fraud Detection Using Machine Learning
No ratings yet
Credit Card Fraud Detection Using Machine Learning
8 pages
Financial Fraud
No ratings yet
Financial Fraud
34 pages
Audio Technica ATH-M20x
No ratings yet
Audio Technica ATH-M20x
1 page
Mini Project
No ratings yet
Mini Project
12 pages
Ashtiani 2022
No ratings yet
Ashtiani 2022
22 pages
Paper 29
No ratings yet
Paper 29
9 pages
A Study of Banking Risk Management
From Everand
A Study of Banking Risk Management
Michael AK CCBI MCBI Chartered Banker
No ratings yet
Imac Pretty 1
No ratings yet
Imac Pretty 1
8 pages
Banking Fraud Detection Outline
No ratings yet
Banking Fraud Detection Outline
6 pages
Panel Kapasitor Bank-Model - PDF 1
No ratings yet
Panel Kapasitor Bank-Model - PDF 1
1 page
Austin Data Center Project Feasibility Study TACC SECO Final Feasibility Report CM1001
100% (2)
Austin Data Center Project Feasibility Study TACC SECO Final Feasibility Report CM1001
31 pages
ML-1 Research Paper
No ratings yet
ML-1 Research Paper
7 pages
Archive 1
No ratings yet
Archive 1
13 pages
AI-Driven Fraud Detection in Financial Transactions With Graph Neural Networks and Anomaly Detection
No ratings yet
AI-Driven Fraud Detection in Financial Transactions With Graph Neural Networks and Anomaly Detection
6 pages
Credit Card Fraud Detection A Novel Approach Using Aggregation Strategy and Feedback Mechanism
No ratings yet
Credit Card Fraud Detection A Novel Approach Using Aggregation Strategy and Feedback Mechanism
11 pages
Graph Neural Network For Fraud Detection Via Spatial-Temporal Attention
No ratings yet
Graph Neural Network For Fraud Detection Via Spatial-Temporal Attention
14 pages
Chapter Fraud Detection
No ratings yet
Chapter Fraud Detection
14 pages
AI-Powered Fraud Detection in Real-Time Financial Transactions
No ratings yet
AI-Powered Fraud Detection in Real-Time Financial Transactions
11 pages
Prova-Regular Pattern and Anomaly Detection On Corporate Transaction Time Series
No ratings yet
Prova-Regular Pattern and Anomaly Detection On Corporate Transaction Time Series
6 pages
Error Detection On Banking Data
No ratings yet
Error Detection On Banking Data
30 pages
Direct Memory Access - GeeksforGeeks
No ratings yet
Direct Memory Access - GeeksforGeeks
4 pages
DBNex Deep Belief Network and Explainable AI Based Financial Fraud Detection
No ratings yet
DBNex Deep Belief Network and Explainable AI Based Financial Fraud Detection
10 pages
Temporal Knowledge Financial Services 2312.16799
No ratings yet
Temporal Knowledge Financial Services 2312.16799
7 pages
Expert Systems With Applications
No ratings yet
Expert Systems With Applications
11 pages
Enhancing Attribute-Driven Fraud Detection With Risk-Aware Graph Representation
No ratings yet
Enhancing Attribute-Driven Fraud Detection With Risk-Aware Graph Representation
12 pages
Unified Modeling Language (Uml) : Assignment
No ratings yet
Unified Modeling Language (Uml) : Assignment
32 pages
AML Bitcoin
No ratings yet
AML Bitcoin
7 pages
Phase 1 Doc - Fraud Detection in Financial Transaction
No ratings yet
Phase 1 Doc - Fraud Detection in Financial Transaction
6 pages
Case Study Front Page
No ratings yet
Case Study Front Page
11 pages
Group 19 Literature Review
No ratings yet
Group 19 Literature Review
11 pages
Final Eddited Research Paper1
No ratings yet
Final Eddited Research Paper1
6 pages
Dictionary of Mahratta Language
No ratings yet
Dictionary of Mahratta Language
664 pages
MBA Analytics For Finance 08
No ratings yet
MBA Analytics For Finance 08
9 pages
Ielts Practice-Reading-Skimming and Scanning
No ratings yet
Ielts Practice-Reading-Skimming and Scanning
5 pages
Research Paper
No ratings yet
Research Paper
8 pages
Ubc 2020 November Tembrevilla Gerald
No ratings yet
Ubc 2020 November Tembrevilla Gerald
253 pages
Intelligent Fraud Detection in Financial Statements Using Machine Learning and Data Mining: A Systematic Literature Review
No ratings yet
Intelligent Fraud Detection in Financial Statements Using Machine Learning and Data Mining: A Systematic Literature Review
25 pages
DBNex Deep Belief Network and Explainable AI Based Financial Fraud Detection
No ratings yet
DBNex Deep Belief Network and Explainable AI Based Financial Fraud Detection
10 pages
Car Basic Mechanics
No ratings yet
Car Basic Mechanics
3 pages
Entrep Report
No ratings yet
Entrep Report
6 pages
LLM Research Paper
No ratings yet
LLM Research Paper
30 pages
Software Questionbank 1st Edition
No ratings yet
Software Questionbank 1st Edition
3 pages
EBSCO-FullText-03 03 2025
No ratings yet
EBSCO-FullText-03 03 2025
12 pages
Presentasi Bulldozer D6N LGP
No ratings yet
Presentasi Bulldozer D6N LGP
28 pages
Six Sigma Methodology With Fraud Detection: 1 Applications of Data Mining
No ratings yet
Six Sigma Methodology With Fraud Detection: 1 Applications of Data Mining
4 pages
Credit Card Fraud Detection Using Machine Learning
No ratings yet
Credit Card Fraud Detection Using Machine Learning
6 pages
On The Reasonable Effectiveness of Relational Diagrams-模板
No ratings yet
On The Reasonable Effectiveness of Relational Diagrams-模板
27 pages
820.9.5X MLA Style (8th Edition) - S17
No ratings yet
820.9.5X MLA Style (8th Edition) - S17
6 pages
Data Partition Survey
No ratings yet
Data Partition Survey
23 pages
6anomaly Fraud Detection
No ratings yet
6anomaly Fraud Detection
5 pages
Sigmod - 15 - Locality-Aware Partitioning in Parallel Database Systems
No ratings yet
Sigmod - 15 - Locality-Aware Partitioning in Parallel Database Systems
14 pages
Paris Rhone PE AH001 Ultrasonic Cool Mist Humidifier User Manual
No ratings yet
Paris Rhone PE AH001 Ultrasonic Cool Mist Humidifier User Manual
3 pages
Ailibaba-Time-Series DB
No ratings yet
Ailibaba-Time-Series DB
13 pages
ICDE 2018 A Graph-Based Database Partitioning Method For Parallel OLAP Query Processing
No ratings yet
ICDE 2018 A Graph-Based Database Partitioning Method For Parallel OLAP Query Processing
12 pages
PM2-Project Charter
No ratings yet
PM2-Project Charter
23 pages
IntellSys2021 Camera Ready
No ratings yet
IntellSys2021 Camera Ready
9 pages
SDM2020 Camera Ready
No ratings yet
SDM2020 Camera Ready
9 pages
RecSys2023 Camera Ready
No ratings yet
RecSys2023 Camera Ready
7 pages
How To Apply, Submission of Application and Printing of Admit Card
No ratings yet
How To Apply, Submission of Application and Printing of Admit Card
3 pages
Netezza Analytics Transition Service Flyer
No ratings yet
Netezza Analytics Transition Service Flyer
2 pages
IO List
No ratings yet
IO List
2 pages
Recloser-Fuse Coordination of Radial Distribution Systems in Presence of DG: Analysis, Simulation Studies, & An Adaptive Relaying Scheme
No ratings yet
Recloser-Fuse Coordination of Radial Distribution Systems in Presence of DG: Analysis, Simulation Studies, & An Adaptive Relaying Scheme
31 pages
DE Ch21
No ratings yet
DE Ch21
20 pages
Email Invoicing (E-Invoicing) : A Tool For Customer Satisfaction and Logistics Optimization
No ratings yet
Email Invoicing (E-Invoicing) : A Tool For Customer Satisfaction and Logistics Optimization
3 pages
Wpq-105-03 Gmaw 3g Jose A. Rivas
No ratings yet
Wpq-105-03 Gmaw 3g Jose A. Rivas
1 page
Eagle Incident Form: User Information
No ratings yet
Eagle Incident Form: User Information
6 pages
Presentation On Nanda Nilekani
No ratings yet
Presentation On Nanda Nilekani
7 pages
Hardware of The PIC16F877
No ratings yet
Hardware of The PIC16F877
2 pages

ICDM2023 Camera Ready

Uploaded by

ICDM2023 Camera Ready

Uploaded by

Themis: Detecting Anomalies from Disguised

Normal Financial Activities

Abstract—Financial supervision plays a pivotal role in society

Definition 5 (Financial Activity Network, FAN). A financial Relationship

activity network G = (V, E) is a directed parallel graph Transfer Activity

recording financial activities. V = {v1 , . . . , vn } is a set Bank and Bank

and E = {eti,j |eti,j = (vi , vj )} is a set of directed edges, Own

actloc(eti,j ) The activity location of eti,j ∈ E . p ∈ P , type(p)=“bank staff” and r = (owner(vi ), p) ∈ R.

B. Anomaly Detection Algorithms Algorithm 2: Monitor-based detection (AA3 -AA8 )

activity by examining its relative relations and spatio-temporal 2 foreach AA i ∈ AA 3 – AA 8 do

Algorithm 1: Trigger-based detection (AA1 and AA2 ) 6 Return AA3 – AA8 ;

[t′ − ∆, t′ ). We could maintain a hash index on Etout ′ −∆,t′ (v)

for different IP addresses. Then, we can use O(1) time to get

Fig. 6: Performance of detection algorithms on a synthetic dataset.

B. Evaluation of anomaly detecting algorithms

d. Personnel Funds Link

b. Relation Tracing with c. The Personal Financial Overview

You might also like