Ju 2009
Research on Credit Card Fraud Detection Model Based on Similar Coefficient Sum
Chun-Hua JU, Na Wang
Computer Science and Information Engineering College
Zhejiang Gongshang University
Hangzhou, China
[email protected], [email protected]
Abstract-This paper analyses the causes of credit fraud risk and presents an anomaly detection method that uses an outlier detection model based on the similar coefficient sum. It finds fraud records by computing the similar coefficient sum of every pair of objects, and an example is given to validate the model. The results show the feasibility and validity of the method. The research work furnishes a basis for further study of applying outlier detection to the analysis and prediction of deception risks.

Keywords-similar coefficient sum; outlier; credit card; fraud detection

I. INTRODUCTION
With the rise of economic and cultural standards and the quickening pace of people's lives, China's credit card market has developed greatly. It is reported that, by the end of 2007, the number of credit cards in mainland China had reached 70,000,000 [1]. Meanwhile, crimes involving credit card fraud are increasing, which seriously disturbs the financial order of the parties involved. Such fraud causes losses to banks and cardholders and affects the development of banks. How to strengthen the ability to identify and prevent credit card fraud has become the focus of banks' risk management [2].

Traditional detection methods discern fraud mainly by relying on the support of the database system and on clients' education level; their disadvantages are that they are untimely, inaccurate and lagging. Methods based on discriminant analysis and on regression analysis were then presented. These two analyses identify fraud by assigning credit grades to cardholders and credit card transactions and are widely used [3], but they still have the shortcoming of requiring a large amount of data. In recent years, data mining has become increasingly important and has been widely applied in the process industry, which has drawn attention to credit card fraud detection models based on data mining. Relative to all transactions, fraudulent credit card transactions are a small minority of abnormal data. In this paper, outlier detection is used to set up a detection model that mines fraudulent transactions as outliers [4], thereby providing decision support for preventing fraud and controlling risk. Many outlier detection algorithms, such as those based on statistics [5] and on distance [6, 7], have been applied successfully.

The above-mentioned algorithms check for outliers in data mining on the basis of the characteristics of item sets, and they are not suitable for outlier checking in MODM and comprehensive evaluation. Therefore, this paper puts forward a credit card fraud detection model based on the similar coefficient sum, which computes the similar coefficient sum between objects to find outliers hidden in the data by using the outlier mining algorithm based on the similar coefficient sum. Compared with other anomaly detection technologies, this model does not need a training process, and thus it overcomes the problem of a high false alarm rate. Experiments have shown that this model is feasible and accurate.

II. CREDIT CARD FRAUD DETECTION MODEL BASED ON SIMILAR COEFFICIENT SUM
A. Clustering Algorithm to Check Outlier Based on Similar Coefficient Sum
The similar coefficient sum-based outlier detection algorithm is described as follows [8]:
Let $X = \{X_1, X_2, \ldots, X_n\}$ be a set of objects to be checked, every object having m indexes, that is, $X_i = \{x_{i1}, x_{i2}, \ldots, x_{im}\}$, $i = 1, 2, \ldots, n$. Using a data matrix, X is presented as follows:

$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1m} \\ x_{21} & x_{22} & \cdots & x_{2m} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nm} \end{pmatrix} \tag{1}$$

Now, the outlier set among the n objects is to be estimated.
Before estimating the dispersion degree of the objects in X, the similar coefficient $r_{ij}$ between every pair of objects should be computed first and assembled into the similar coefficient matrix, that is

$$R = \begin{pmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ r_{21} & r_{22} & \cdots & r_{2n} \\ \vdots & \vdots & & \vdots \\ r_{n1} & r_{n2} & \cdots & r_{nn} \end{pmatrix} \tag{2}$$

$$r_{ij} = 1 - \sqrt{\frac{1}{m}\sum_{k=1}^{m}\left(x_{ik} - x_{jk}\right)^{2}} \tag{3}$$

$$p_i = \sum_{j=1}^{n} r_{ij} \tag{4}$$

$p_i$ is the sum of the ith row of the similar coefficient matrix. The smaller $p_i$ is, the further object i is from the other objects, which means that object i is a candidate item of the outlier set.

$$\lambda_i = \ldots \times 100\% \tag{5}$$
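The similar-coefficient-sum procedure of Section II.A can be illustrated with a minimal Python sketch. Several details are assumptions rather than statements of the paper's method: the similar coefficient is taken as $r_{ij} = 1 - \sqrt{\frac{1}{m}\sum_k (x_{ik}-x_{jk})^2}$, the attributes are min-max scaled to [0, 1] beforehand so that each $r_{ij}$ stays in [0, 1], the distance threshold is taken as $\lambda_i = (p_{\max} - p_i)/p_{\max} \times 100\%$, and an object is flagged when $\lambda_i \ge \lambda$. The function names and the sample records are hypothetical.

```python
import math


def min_max_scale(X):
    """Assumed preprocessing: scale every attribute (column) to [0, 1]."""
    cols = list(zip(*X))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in X
    ]


def similarity_matrix(X):
    """Similar coefficient matrix R (assumed form of the coefficient r_ij)."""
    n, m = len(X), len(X[0])
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq = sum((X[i][k] - X[j][k]) ** 2 for k in range(m))
            R[i][j] = 1.0 - math.sqrt(sq / m)
    return R


def detect_outliers(X, lam_percent):
    """Return (outlier indices, similar coefficient sums p, thresholds lambda_i)."""
    R = similarity_matrix(min_max_scale(X))
    p = [sum(row) for row in R]  # p_i: sum of the ith row of R
    p_max = max(p)
    # ASSUMPTION: lambda_i = (p_max - p_i) / p_max * 100, flagged when >= lam_percent
    lam = [(p_max - p_i) / p_max * 100.0 for p_i in p]
    outliers = [i for i, value in enumerate(lam) if value >= lam_percent]
    return outliers, p, lam


if __name__ == "__main__":
    # Hypothetical records: (amount, number of transactions in the last hour)
    records = [[100, 1], [110, 1], [105, 2], [95, 1], [5000, 9]]
    outliers, p, lam = detect_outliers(records, lam_percent=12.0)
    print("similar coefficient sums:", [round(v, 3) for v in p])
    print("lambda_i (%):", [round(v, 1) for v in lam])
    print("outlier indices:", outliers)
```

No training step appears anywhere in the sketch, which matches the claim that the model works without training. Building the full n×n matrix costs O(n²m) time, so a production system would likely need batching or sampling.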
A. Experiment process
There are two types of mistakes in credit card fraud detection research [9]. One is mistakenly regarding fraudulent transactions as non-fraudulent, which is called the first class error or False Negative error; the other is mistakenly regarding non-fraudulent transactions as fraudulent, which is called the second class error or False Positive error. As the confusion matrix of Figure 2 shows, the first class error rate is $\frac{FN}{TP+FN}$ and the second class error rate is $\frac{FP}{FP+TN}$. The accuracy, which can be expressed as $\frac{TP+TN}{TP+FN+FP+TN}$, reflects only the overall accuracy of the algorithm and cannot reflect its predictive ability on the fraud sample set. So, in this paper, the two class error rates are used as the evaluation criteria.
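The two class error rates and the overall accuracy can be computed directly from the confusion matrix counts. The following is a minimal sketch; the function names and the example counts are hypothetical and are not the paper's experimental results.

```python
def confusion_counts(actual, predicted):
    """Count TP, FN, FP, TN, where 1 marks a fraudulent transaction and 0 a normal one."""
    tp = sum(1 for a, f in zip(actual, predicted) if a == 1 and f == 1)
    fn = sum(1 for a, f in zip(actual, predicted) if a == 1 and f == 0)
    fp = sum(1 for a, f in zip(actual, predicted) if a == 0 and f == 1)
    tn = sum(1 for a, f in zip(actual, predicted) if a == 0 and f == 0)
    return tp, fn, fp, tn


def error_rates(tp, fn, fp, tn):
    """Return (first class error rate, second class error rate, accuracy)."""
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("both fraud and non-fraud samples are required")
    first_class = fn / (tp + fn)    # fraud predicted as non-fraud
    second_class = fp / (fp + tn)   # non-fraud predicted as fraud
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return first_class, second_class, accuracy


if __name__ == "__main__":
    # Hypothetical counts: 10 fraudulent and 990 normal transactions
    first, second, acc = error_rates(tp=8, fn=2, fp=5, tn=985)
    print(f"first class error rate:  {first:.3f}")
    print(f"second class error rate: {second:.4f}")
    print(f"accuracy:                {acc:.3f}")
```

With these made-up counts the accuracy is above 99% even though one fraud case in five is missed, which is why the two class error rates are reported separately.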
                              Forecast Result
                              Fraud      Non-Fraud
Practice Data   Fraud         TP         FN
                Non-Fraud     FP         TN

Figure 2. Confusion matrix

The experimental process can mainly be divided into three parts:
• ① Input a data set of credit card transaction records, with m characteristic attributes in every record. The final sample data are obtained by preprocessing [10].
• ② Compute the similar coefficient $r_{ij}$ between every two credit card transaction records in the data set and assemble the similar coefficient matrix; then compute the similar coefficient sum according to $p_i = \sum_{j=1}^{n} r_{ij}$. The smaller the similar coefficient sum, the further the ith credit card transaction record is from the other objects, which makes it a candidate item of the outlier set.
• ③ Compute the distance threshold $\lambda_i = \ldots \times 100\%$, and set the threshold parameter $\lambda$. All objects whose $\lambda_i$ meets the threshold $\lambda$ are considered outliers and are output.

B. Experiment result analysis
As Table II and Table III show, all five experiments reached high accuracy and low error rates. For example, the first class error rate is lowest and the total accuracy is highest when $\lambda$ is equal to 12, while the second class error rate is lowest when $\lambda$ is equal to 9. It is concluded that the credit card fraud detection model based on the similar coefficient sum is feasible and usable. The research work furnishes a basis for further study of applying outlier detection to the analysis and prediction of deception risks.

TABLE II. EXPERIMENT RESULT

TABLE III. EXPERIMENT RESULT ANALYSIS

IV. CONCLUSION
This paper presents a new credit card fraud detection model based on the similar coefficient sum to forecast whether a credit card transaction is fraudulent or not. The experiment shows that the model can detect fraudulent transactions accurately, and that the result is better than anomaly detection by clustering when the anomalous data are far fewer than the normal data. If the algorithm is used in a bank's credit card fraud detection system, it becomes possible to work out the probability of fraud soon after a transaction. A series of anti-fraud strategies can then be applied purposefully, which will effectively reduce the bank's risk.

ACKNOWLEDGMENT
Chunhua Ju thanks the Zhejiang Xinmiao Talent Project (2008R40G2050025), the Zhejiang Technology Project (2008C14061) and the Project of Graduate Student Science Innovation of Zhejiang Gongshang University (1130XJ1508083) for support.

REFERENCES
[1] Wang Xi. Some Ideas about Credit Card Fraud Prediction. China Trial. Apr. 2008, pp. 74-75.
[2] Chen Lei. Fraud and Prevention of International Credit Card. China Credit Card. Jun. 2004, pp. 43-47.
[3] Liu Ren, Zhang Liping, Zhan Yinqiang. A Study on Construction of Analysis Based CRM System. Computer Applications and Software. Vol.21, Apr. 2004, pp. 46-47.
[4] Han J W, Kamber M. Data Mining: Concepts and Techniques. Beijing: Higher Education Pr. and Morgan Kaufmann Publishers, 2007.
[5] Barnett V, Lewis T. Outliers in Statistical Data. New York: John Wiley & Sons, 1994.
[6] Knorr E, Ng R. A Unified Notion of Outliers: Properties and Computation. In Proc. 1997 Int. Conf. Knowledge Discovery and Data Mining (KDD 97), Newport Beach, CA, 1997, pp. 219-222.
[7] Arning A, Agrawal R, Raghavan P. A Linear Method for Deviation Detection in Large Databases. In Proc. 1996 Int. Conf. Data Mining and Knowledge Discovery (KDD 96), Portland, OR, Aug. 1996, pp. 164-169.
[8] Jiang Lingmin. Clustering Algorithm to Check Outlier Based on Similar Coefficient Sum. Computer Engineering. Vol.29, Nov. 2003, pp. 183-185.
[9] Tong Fengru. Research on Credit Card Fraud Identification Based on Combined Classifier.
[10] Zhai Linghui, Ma Shaoping, Tang Huanling. Data Preprocessing of Classification Mining of Credit Card. Computer Engineering. Vol.29, Dec. 2003, pp. 195-197.