ICDM2023 Camera Ready
features, shown as Algorithm 1. The algorithm can find single online anomalies AA1 and AA2 in constant time.

    Calculate expired $act_e$ within $[t'-\Delta, t-\Delta)$;
    if $act_n$ and $act_e$ satisfy the alarm condition then
        AA_i.add($act_n$);

where $act_e$ is the set of expired activities (if any) for the new window $[t-\Delta, t)$.

$\sum_{e \in E^{in}_{t'-\Delta,t-\Delta}(v) \cup E^{out}_{t'-\Delta,t-\Delta}(v)} value(e) \le 1$.

• Alarm conditions of AA6 and AA7 (Split and Merge Anomaly). For the account $v$ initiated on $act_n$, we could maintain a hash index on $E^{out}_{t'-\Delta,t'}(v)$ for cards with different owners. Then, we can use $O(1)$ time to get the set of activities $E^{out}_{owner} \subseteq E^{out}_{t'-\Delta,t'}(v)$ whose transferred-out cards have the same owner as $v$. Let $E_{out} \subseteq E^{out}_{t'-\Delta,t'}(v) \setminus E^{out}_{owner}$ be the set of transferred-out activities within $[t'-\Delta, t-\Delta)$, then the alarm condition of AA6 is $|E^{out}_{t'-\Delta,t'}(v)| - |E_{out}| + 1 > \tau_c$. Similarly, let $v$ be the transferred-in account of $act_n$. Let $E^{in}_{owner} \subseteq E^{in}_{t'-\Delta,t'}(v)$ be the activities whose transferred-in accounts have the same card owner as $v$, and $E_{in} \subseteq E^{in}_{t'-\Delta,t'}(v) \setminus E^{in}_{owner}$ be the set of transferred-in activities within $[t'-\Delta, t-\Delta)$, then the alarm condition of AA7 is $|E^{in}_{t'-\Delta,t'}(v)| - |E_{in}| + 1 > \tau_c$.
• Alarm condition of AA8 (Immediate In-Out Anomaly). For the account $v$ initiated on $act_n$, let $e_n$ be the corresponding edge of $act_n$ and $val_d$ be the difference value
$$val_d = \sum_{e \in E^{in}_{t'-\Delta,t'}(v)} value(e) - \sum_{e \in E^{out}_{t'-\Delta,t'}(v)} value(e) - \Big( \sum_{e \in E^{in}_{t'-\Delta,t-\Delta}(v)} value(e) - \sum_{e \in E^{out}_{t'-\Delta,t-\Delta}(v)} value(e) \Big),$$
then the alarm condition of AA8 is $val_d + value(e_n) > \epsilon_v$ or $val_d - value(e_n) > \epsilon_v$.

Mining-based detection method. The mining-based detection algorithm discovers a set of composite historical abnormal activities (AA9 and AA10) by searching historical activities in FAN, shown as Algorithm 3. The verification of reachability among different accounts is improved by introducing maximum-flow min-cut theory [35], [36], pruning unnecessary search paths, and stopping early in DFS when detecting AA10.

Algorithm 3: Mining-based detection (AA9 and AA10)
Input: An FAN $G = (V, E)$
Output: Detected anomalies AA9 and AA10.
    AA9, AA10 ← ∅;
    foreach $v_i \in V$ do
        foreach $e^{t_1}_{j,i} \in E^{in}(v_i)$ do
            start ← $e^{t_1}_{j,i}$;
            foreach $e^{t_2}_{i,k} \in E^{out}(v_i)$ do
                if $value(e^{t_1}_{j,i}) - value(e^{t_2}_{i,k}) < \epsilon_v$ then
                    end ← $e^{t_2}_{i,k}$;
                    AA9.add((start, end));
        DFS from $v_i$, label the visited accounts, and add $v_j$ to an empty set $V_1$ when visiting a labeled $v_j$;
        foreach $v_j \in V_1$ do
            if the maximum flow from $v_j$ to $v_i$ is 0 then
                $E_c$ ← the minimum cut set found by the maximum flow algorithm from $v_i$ to $v_j$;
                foreach $(v_i, v_k) \in V_1$ do AA10.add($e^{t}_{i,k}$);
                foreach $(v_k, v_j) \in V_1$ do AA10.add($e^{t}_{k,j}$);
    Return AA9 and AA10.

C. Clue Chains Tracing
For a detected anomaly with an amount of money, a financial institution may want to trace where the money comes from and what it is used for. We mine clue chains by tracing the activities in FAN and require that the final chain of clues includes the detected abnormal activity. The main idea of the clue chain tracing algorithm is to start from the detected activity and examine its neighborhood nodes in FAN. We trace the source of the money by the similarity of the timestamp and the amount of money.
In order to recommend the most suspicious clue chains to institutions, we evaluate these chains by scoring rules and choose the most suspicious chains. The scoring rules are dynamically adjusted according to the business scenario, taking into account time interval, transfer amount, intimacy, and so on, since different clue chains and relationships have different importance for institutions. The framework for chain tracing and recommendation is as follows: (a) Set the time and money thresholds; (b) Start from the detected suspect anomalies, trace abnormal financial activities according to the human-money relationship in FAN, and form suspect clue chains; (c) Rank these chains based on dynamically adjusted scoring rules. In the evaluation part, top-k clue chains will be recommended to financial institutions to assist in detecting anomalies effectively.

VII. EXPERIMENTS
We show the efficiency and effectiveness of our approach. More importantly, we developed an anomaly-detection system, called Themis, to detect abnormal activities. The interface of Themis and case studies are also presented. We conducted all the experiments on a machine with an Intel(R) Core(TM) i7-10710U and 16GB memory in Windows OS. All the methods are implemented in C++ compiled by g++ with O3 turned on. All the detection algorithms are run in memory.

A. Datasets
Synthetic dataset: In order to evaluate the performance of Themis in detecting anomalies, we apply an existing graph generator, Watts-Strogatz^1, to generate the background financial graph and relative network, including 1,000 nodes and 993,133 edges. Then abnormal patterns are inserted into the graph as the ground truth to evaluate our anomaly detection algorithms, clue chain tracing technology, and Themis.
Real bank statements: To further evaluate the performance of Themis in real-world scenarios, some financial activities of a bank are utilized as a benchmark dataset in our experiments, including 110,509 transfer records and 47 features. The dataset covers the bank statement (transaction amount, transaction category, etc.), cardholder-related information (name, card number, identification number, etc.), counterparty information (account, ID number, etc.), and relative relationships.
New arrival activities: Based on the synthetic dataset, we generate 100 new nodes and 500 new edges with sorted time stamps to simulate newly generated financial activities. The distribution of generated time stamps follows the same time distribution as in the financial graph.

^1 https://github.com/sleepokay/watts-strogatz
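The construction of the new arrival activities can be sketched in Python as follows. This is a minimal sketch and not the generator used in the experiments. It assumes that each activity is a (source, target, timestamp, amount) tuple, that "the same time distribution" means resampling the existing time stamps with replacement, and that every new edge starts at one of the new nodes. The function name simulate_new_arrivals and the node labels are hypothetical.

```python
import random


def simulate_new_arrivals(existing_edges, n_new_nodes=100, n_new_edges=500, seed=0):
    """Return n_new_edges synthetic activities sorted by time stamp.

    existing_edges: list of (source, target, timestamp, amount) tuples.
    Assumption: time stamps and amounts are resampled from the existing
    activities, so they follow the empirical distribution of the graph.
    """
    rng = random.Random(seed)
    # Existing accounts, in first-seen order (deterministic for a fixed input).
    old_nodes = list(dict.fromkeys(n for s, d, _, _ in existing_edges for n in (s, d)))
    new_nodes = [f"new_{i}" for i in range(n_new_nodes)]  # hypothetical labels
    timestamps = [t for _, _, t, _ in existing_edges]
    amounts = [a for _, _, _, a in existing_edges]

    # Sampling with replacement keeps the empirical time distribution;
    # sorting turns the sample into an arrival stream.
    new_timestamps = sorted(rng.choices(timestamps, k=n_new_edges))

    candidates = old_nodes + new_nodes
    arrivals = []
    for t in new_timestamps:
        src = rng.choice(new_nodes)      # each new edge starts at a new node
        dst = rng.choice(candidates)
        while dst == src:                # no self-transfers
            dst = rng.choice(candidates)
        arrivals.append((src, dst, t, rng.choice(amounts)))
    return arrivals
```

Feeding the returned list to the online detectors one activity at a time reproduces the arrival order. Not every new node is guaranteed to appear in an edge under this sampling scheme.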
[Fig. 6 panels: (a) Trigger-based detection. (b) Monitor-based detection. (c) Mining-based detection.]

datasets with different data sizes. We record the average number of detected anomalies and detection time for each new arrival activity for AA1-AA8, and the number of detected anomalies and detection time for AA9-AA10.
Fig. 6(a) shows that the average number of anomalies found by trigger-based detection is small and independent of the data size, because most incoming activities are normal and the number depends only on the distribution of incoming activities. The average detection time is constant for every new arrival activity. The results in Fig. 6(b) show that the monitor-based detection algorithms scale well: the number of detected anomalies and the detection time increase linearly with the dataset size. Fig. 6(c) shows the results of mining-based detection. When increasing the size of the dataset, the number of detected anomalies increases linearly for AA9 and quadratically for AA10, since we want to find every One-Way-Transfer pair in FAN. Mining-based detection requires more time since its complexity is O(|E|^2) for AA9 and O(|V|^2 |E|) for AA10.
Anomalies analysis on real bank statements. Fig. 7(a) demonstrates the proportion of abnormal patterns detected in real bank statements. These anomalies are rare compared with normal activities, so tracing the suspicious clue chains based on anomalies can improve the efficiency of institutions. Fig. 7(b) shows the funding-amount-fluctuation anomalies detected, indicating that most customers' transfer amounts are positively correlated with their historical transaction amounts, and only a small part of users have funding-amount-fluctuation anomalies.

[Fig. 7 panels: (a) Proportion of anomalies. (b) Funding-Amount-Fluctuation. Plot axes: Number, Salary/Transaction Value, Transaction Value.]
Fig. 7: Statistics of anomalies on real bank statements.

C. Evaluation of “Clue Chains Tracing Algorithms”
1) Efficiency: In this part, we show the efficiency of clue chain tracing in real bank statements.

TABLE II: Number of chains for individual accounts.
                                 Alice   Bob      Charlie  David   Emma
Total number of chains           3,975   11,313   11,654   1,576   7,029
Chains (ϵt = 6, ϵa = $10,000)    217     503      179      47      219
Chains (ϵt = 1, ϵa = $10,000)    15      21       8        3       9

Verification benefit for individual accounts. Table II demonstrates the number of chains for a specific account of a person. “Total number of chains” is the account’s total financial clues, e.g., the account of David has 1,576 transaction-related chains, which had to be checked manually by financial institutions before using Themis. Chains (ϵt, ϵa) is the number of clue chains traced by Themis under a given time interval ϵt (in months) and amount threshold ϵa (in dollars); the top-k chains are then recommended to institutions to verify manually (top-10 chains in Themis). For example, the institutions only need to check 47 and 3 suspect chains about David after deploying Themis under the thresholds (ϵt = 6, ϵa = $10,000) and (ϵt = 1, ϵa = $10,000), respectively, improving the efficiency compared with previous verification.
Verification benefit for anomalies. Table III shows the average number of clue chains and their running time. It shows that when the time interval extends from one month to six months, more suspicious clues appear. The financial institution could use these two parameters to trace the clue chain easily.
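How the two parameters ϵt and ϵa narrow the set of chains can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the tracing algorithm of Themis. It assumes that a chain is extended backwards from the detected activity, that a predecessor qualifies when it precedes the current activity by at most ϵt months and its amount differs by at most ϵa dollars, and that only maximal chains are reported. The names Activity, trace_chains, score and recommend, the max_len bound, and the scoring formula are all hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Activity(NamedTuple):
    src: str
    dst: str
    month: float   # time stamp, in months
    amount: float  # transferred amount, in dollars


def trace_chains(activities, anomaly, eps_t, eps_a, max_len=4):
    """Enumerate maximal clue chains that end at the detected `anomaly`."""
    incoming = defaultdict(list)
    for act in activities:
        incoming[act.dst].append(act)

    chains = []

    def extend(chain):
        head = chain[0]
        extended = False
        if len(chain) < max_len:
            for prev in incoming[head.src]:
                close_in_time = 0 <= head.month - prev.month <= eps_t
                close_in_amount = abs(head.amount - prev.amount) <= eps_a
                if close_in_time and close_in_amount and prev not in chain:
                    extended = True
                    extend([prev] + chain)
        # Keep a chain only when it cannot be extended further.
        if not extended and len(chain) > 1:
            chains.append(chain)

    extend([anomaly])
    return chains


def score(chain):
    """Hypothetical scoring rule: tighter time gaps and amounts rank higher."""
    gaps = sum(b.month - a.month for a, b in zip(chain, chain[1:]))
    diffs = sum(abs(b.amount - a.amount) for a, b in zip(chain, chain[1:]))
    return -(gaps + diffs / 10_000)


def recommend(chains, k=10):
    """Return the k highest-scoring chains."""
    return sorted(chains, key=score, reverse=True)[:k]
```

Under these assumptions, widening ϵt from 1 to 6 can only add qualifying predecessors, which is consistent with the larger chain counts reported for ϵt = 6.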
Fig. 8: Case studies: clue chains detected by our system Themis.

TABLE III: Performance of chains for anomalies.
                                 AA4      AA5     AA8     AA9     AA10
Average number of chains         17,647   537     8,673   2,130   47,586
Chains (ϵt = 6, ϵa = $10,000)    425      64      328     165     760
Time (ϵt = 6, ϵa = $10,000)      2.87s    1.25s   2.67s   0.78s   3.18s
Chains (ϵt = 1, ϵa = $10,000)    200      45      229     121     521
Time (ϵt = 1, ϵa = $10,000)      1.66s    0.99s   1.54s   0.61s   2.56s

Performance. Table III demonstrates the average running time (seconds) of clue chain generation in Themis, including the process of tracing, evaluating, and recommending chains. The process of clue chain inference is efficient (every configuration finishes within seconds), supporting anomaly detection in large-scale bank statements.
2) Effectiveness: We conduct some case studies of clue chains generated by Themis, whose anomalies have been verified by institutions. In Fig. 8, each node represents a person’s account, and each edge denotes financial activities between two accounts; activities drawn with solid lines are traced anomalies, and activities drawn with dashed lines are inferred via the account’s deposits and withdrawals.
Case 1 (Clue chain detected by tracing AA5): As shown in Fig. 8(a), Hz is Lc’s driver. He is detected as a funding-amount fluctuation anomaly by Themis. These activities significantly exceed Hz’s income level ($5,000 monthly salary). Our approach traces Hz’s transactions and recommends this suspicious clue chain. In this anomaly (AA5), Hz transferred a large amount of money to Lc and Ys.
Case 2 (Clue chain detected by tracing AA8): As shown in Fig. 8(b), our algorithms detect that Hz acts as an intermediary and receives two cash deposits from Sx and Yt respectively. Then Hz transferred the money to Ly and his other account in a short period of time. Our approach traces that he successfully transfers from Sx and Yt to Lc and Ys as an intermediary.
Case 3 (Clue chain detected by tracing AA4 and AA8): Fig. 8(c) demonstrates a clue chain that tracks where $400,000 comes from and what it is used for. $388,000 was spent by Lc in a luxury shop at the end.

D. Themis: An anomaly-detection system
More importantly, we developed an anomaly-detection system (Themis) that can detect anomalies from disguised normal financial activities and find suspicious clue chains. It has been deployed in many real scenarios, including banks and financial institutions. The pipeline of Themis is as follows: (a) Anomalies are detected by “Anomaly Detecting Algorithm”; (b) Suspect clue chains are traced based on anomalies via “Clue Chains Tracing Algorithms”; (c) Themis evaluates and recommends suspicious clue chains to institutions. The interface of Themis is shown in Fig. 9, including individuals’ assets, bank statements, cash transactions, relative relationships, and so on. With the help of Themis, the anomalies (red solid lines in Fig. 9(b)) can be detected.
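As one illustration of step (a), the pair-matching loop in the first half of Algorithm 3 (mining AA9) can be sketched in Python as follows. This is a minimal sketch, not the C++ implementation evaluated above. It assumes each edge is a (source, target, timestamp, value) tuple and applies the condition value(in) − value(out) < ϵv exactly as written in Algorithm 3, without an absolute value or a time-order check. The function name mine_aa9 is hypothetical.

```python
from collections import defaultdict


def mine_aa9(edges, eps_v):
    """Return (in_edge, out_edge) pairs that satisfy the AA9 condition.

    edges: iterable of (source, target, timestamp, value) tuples.
    For every account v, each transfer into v is compared with each
    transfer out of v, which is O(|E|^2) in the worst case.
    """
    in_edges = defaultdict(list)
    out_edges = defaultdict(list)
    for edge in edges:
        src, dst, _, _ = edge
        out_edges[src].append(edge)
        in_edges[dst].append(edge)

    pairs = []
    for v in list(in_edges):
        for e_in in in_edges[v]:
            for e_out in out_edges.get(v, []):
                # Condition copied from Algorithm 3: value(in) - value(out) < eps_v.
                if e_in[3] - e_out[3] < eps_v:
                    pairs.append((e_in, e_out))
    return pairs
```

The nested loops over incoming and outgoing edges of the same account are what make the AA9 cost quadratic in the number of edges in the worst case, in line with the complexity stated in the evaluation.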
VIII. CONCLUSION
In this paper, we design a uniform framework to detect anomalies from disguised normal financial activities. We are the first to formalize and detect complex anomalies, meanwhile considering heterogeneous features. In particular, we propose a clue chain tracing technology to recommend suspect clue chains for institutions. What’s more, we deploy a system, Themis, to detect anomalies and infer clue chains in some real scenarios. Experiments on synthetic datasets and real bank statements show the efficiency and effectiveness of Themis.

IX. ACKNOWLEDGEMENTS
The work is partially supported by the National Natural Science Foundation of China (Nos. U22A2025, 62072088, 62232007), and Liaoning Provincial Science and Technology Plan Project - Key R&D Department of Science and Technology (No. 2023JH2/101300182).

REFERENCES
[1] B. L. Handoko, R. N. A. Putri, and S. Wijaya, “Analysis of fraudulent financial reporting based on fraud heptagon model in transportation and logistic industry listed on idx during covid-19 pandemic,” in International Conference on Software and e-Business, 2022, pp. 56–63.
[2] E. Hytis, V. Nastos, C. Gogos, and A. Dimitsas, “Automated identification of fraudulent financial statements by analyzing data traces,” in The South-East Europe Design Automation, Computer Engineering, Computer Networks and Social Media Conference (SEEDA-CECNSM). IEEE, 2022, pp. 1–7.
[3] S. Dhankhad, E. Mohammed, and B. Far, “Supervised machine learning algorithms for credit card fraudulent transaction detection: a comparative study,” in IEEE international conference on information reuse and integration (IRI), 2018, pp. 122–125.
[4] J. C. Ying, J. Zhang, C. W. Huang, K. T. Chen, and V. S. Tseng, “Fraudetector +: An incremental graph-mining approach for efficient fraudulent phone call detection,” ACM Transactions on Knowledge Discovery from Data, vol. 12, no. 6, pp. 1–35, 2018.
[5] S. Zhou, J. He, H. Yang, D. Chen, and R. Zhang, “Big data-driven abnormal behavior detection in healthcare based on association rules,” IEEE Access, vol. PP, no. 99, pp. 1–1, 2020.
[6] A. C. Kim, W. H. Park, and D. H. Lee, “A framework for anomaly pattern recognition in electronic financial transaction using moving average method,” in IT Convergence and Security, 2013, pp. 93–99.
[7] J.-S. Chang and W.-H. Chang, “Analysis of fraudulent behavior strategies in online auctions for detecting latent fraudsters,” Electronic Commerce Research and Applications, vol. 13, no. 2, pp. 79–97, 2014.
[8] F. Rahmani, C. Valmohammadi, and K. Fathi, “Detecting fraudulent transactions in banking cards using scale-free graphs,” Concurrency and Computation: Practice and Experience, vol. 34, no. 19, p. e7028, 2022.
[9] H. Zhang and W. Zhou, “A two-stage virtual machine abnormal behavior-based anomaly detection mechanism,” Cluster Computing, vol. 25, no. 1, pp. 203–214, 2022.
[10] M. Y. Turaba, M. Hasan, N. I. Khan, and H. A. Rahman, “Fraud detection during financial transactions using machine learning and deep learning techniques,” in IEEE International Conference on Communications, Computing, Cybersecurity, and Informatics, 2022, pp. 1–8.
[11] E. E. Papalexakis, A. Beutel, and P. Steenkiste, “Network anomaly detection using co-clustering,” in Encyclopedia of Social Network Analysis and Mining, 2nd Edition, 2018.
[12] S. Cao, X. Yang, J. Zhou, X. Li, Y. Qi, and K. Xiao, “Poster: Actively detecting implicit fraudulent transactions,” in The ACM SIGSAC Conference on Computer and Communications Security, 2017, pp. 2475–2477.
[13] X. Gu and H. Wang, “Online anomaly prediction for robust cluster systems,” in Proceedings of the 25th International Conference on Data Engineering, 2009, pp. 1000–1011.
[14] Z. Wang, “Abnormal financial transaction detection via ai technology,” International Journal of Distributed Systems and Technologies (IJDST), vol. 12, no. 2, pp. 24–34, 2021.
[15] R. A. L. Torres and M. Ladeira, “A proposal for online analysis and identification of fraudulent financial transactions,” in IEEE International Conference on Machine Learning and Applications (ICMLA), 2020.
[16] J. He, C.-C. M. Yeh, Y. Wu, L. Wang, and W. Zhang, “Mining anomalies in subspaces of high-dimensional time series for financial transactional data,” in Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track: European Conference, (ECML PKDD). Springer, 2021, pp. 19–36.
[17] Y. Li, Y. Sun, and N. Contractor, “Graph mining assisted semi-supervised learning for fraudulent cash-out detection,” in Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017, 2017, pp. 546–553.
[18] X. Mao, M. Liu, and Y. Wang, “Using gnn to detect financial fraud based on the related party transactions network,” Procedia Computer Science, vol. 214, pp. 351–358, 2022.
[19] Y. Pei, F. Lyu, W. V. Ipenburg, and M. Pechenizkiy, “Subgraph anomaly detection in financial transaction networks,” 2020.
[20] D. Wang, Y. Qi, J. Lin, P. Cui, Q. Jia, Z. Wang, Y. Fang, Q. Yu, J. Zhou, and S. Yang, “A semi-supervised graph attentive network for financial fraud detection,” in 2019 IEEE International Conference on Data Mining, 2019, pp. 598–607.
[21] W. Kudo, M. Nishiguchi, and F. Toriumi, “Gcnext: graph convolutional network with expanded balance theory for fraudulent user detection,” Social Network Analysis and Mining, vol. 10, pp. 1–12, 2020.
[22] S. Pathan and V. Shrivastava, “Identifying linked fraudulent activities using graphconvolution network,” arXiv:2106.04513, 2021.
[23] X. Wang, Z. Wan, and Y. Zhang, “A dqn-based internet financial fraud transaction detection method,” in International Conference on Computer Science and Application Engineering (CSAE), 2021, pp. 1–5.
[24] B. Can, A. G. Yavuz, M. E. Karsligil, and M. A. Güvensan, “A closer look into the characteristics of fraudulent card transactions,” IEEE Access, vol. 8, pp. 166095–166109, 2020.
[25] X. Mao, H. Sun, X. Zhu, and J. Li, “Financial fraud detection using the related-party transaction knowledge graph,” Procedia Computer Science, vol. 199, pp. 733–740, 2022.
[26] T. Chen, L. Tang, Y. Sun, Z. Chen, and K. Zhang, “Entity embedding-based anomaly detection for heterogeneous categorical events,” in Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016, pp. 1396–1403.
[27] H. Xiang, H. Hu, and X. Zhang, “Deepiforest: A deep anomaly detection framework with hashing based isolation forest,” in IEEE International Conference on Data Mining (ICDM), 2022, pp. 1251–1256.
[28] C. Wang and H. Zhu, “Wrongdoing monitor: A graph-based behavioral anomaly detection in cyber security,” Trans. Info. For. Sec., vol. 17, pp. 2703–2718, 2022.
[29] S. Reddy, P. Poduval, A. V. S. Chauhan, M. Singh, S. Verma, K. Singh, and T. Bhowmik, “Tegraf: temporal and graph based fraudulent transaction detection framework,” in The ACM International Conference on AI in Finance (ICAIF), 2021, pp. 1–8.
[30] M. Shen, A. Sang, P. Duan, H. Yu, and L. Zhu, “Threat prediction of abnormal transaction behavior based on graph convolutional network in blockchain digital currency,” in Blockchain and Trustworthy Systems (BlockSys). Springer, 2021, pp. 201–213.
[31] B. Hooi, H. A. Song, A. Beutel, N. Shah, K. Shin, and C. Faloutsos, “FRAUDAR: bounding graph fraud in the face of camouflage,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13-17, 2016, 2016, pp. 895–904.
[32] L. Akoglu, H. Tong, and D. Koutra, “Graph based anomaly detection and description: a survey,” Data Min. Knowl. Discov., vol. 29, no. 3, pp. 626–688, 2015.
[33] Z. Zhang and L. Zhao, “Unsupervised deep subgraph anomaly detection,” in IEEE International Conference on Data Mining (ICDM), 2022, pp. 753–762.
[34] A. Zhang, B. Wu, and Y. Li, “A heterogeneous graph-based fraudulent community detection system,” in IEEE International Conference on e-Business Engineering (ICEBE), 2021, pp. 43–48.
[35] H. Yildirim, V. Chaoji, and M. J. Zaki, “GRAIL: a scalable index for reachability queries in very large graphs,” VLDB J., vol. 21, no. 4, pp. 509–534, 2012.
[36] H. Wei, J. X. Yu, C. Lu, and R. Jin, “Reachability querying: An independent permutation labeling approach,” Proc. VLDB Endow., vol. 7, no. 12, pp. 1191–1202, 2014.