2009 First International Workshop on Database Technology and Applications
978-0-7695-3604-0/09 $25.00 © 2009 IEEE
DOI 10.1109/DBTA.2009.170

Research on Credit Card Fraud Detection Model Based on Similar Coefficient Sum

Chun-Hua JU
Computer Science and Information Engineering College
Zhejiang Gongshang University
Hangzhou, China
[email protected]

Na Wang
Computer Science and Information Engineering College
Zhejiang Gongshang University
Hangzhou, China
[email protected]

Abstract: This paper analyses the causes that give rise to the risk of credit fraud and presents an anomaly detection method that uses an outlier detection model based on similar coefficient sum. The method finds fraud records by computing the similar coefficient sum over every pair of objects, and an example is given to validate the model. The results show the feasibility and validity of the method. The research work furnishes a basis for further study of applying outlier detection to the analysis and prediction of fraud risks.

Keywords: similar coefficient sum; outlier; credit card; fraud detection

I. INTRODUCTION

With rising economic and cultural standards and the quickening pace of life, China's credit card market has developed greatly. By the end of 2007, the number of credit cards in mainland China had reportedly reached 70,000,000 [1]. Meanwhile, crimes involving credit card fraud are increasing, which seriously disturbs the financial order of the parties involved. Fraud causes losses to banks and cardholders and affects the development of banks. How to strengthen the ability to identify and prevent credit card fraud has become the focus of banks' risk management [2].

Traditional detection methods discern fraud mainly with the support of the database system and the clients' education level; their disadvantages are poor timeliness, inaccuracy, and lag. Methods based on discriminant analysis and on regression analysis were then proposed. These two kinds of analysis identify fraud by assigning credit grades to cardholders and credit card transactions and are widely used [3], but the shortcoming of requiring a large amount of data remains. In recent years, data mining has become increasingly important and has been widely applied in the process industry, which has drawn attention to credit card fraud detection models based on data mining. Relative to all transactions, fraudulent credit card transactions are a small minority of abnormal data. In this paper, outlier detection is used to set up a detection model that mines fraud transactions as outliers [4], thereby providing decision support for preventing fraud and controlling risk. Many outlier detection algorithms, such as those based on statistics [5] and distance [6, 7], have been applied successfully.

The above-mentioned algorithms check for outliers based on the characteristics of item sets, and they are not suitable for outlier checking in MODM and comprehensive evaluation. Therefore, this paper puts forward a credit card fraud detection model based on similar coefficient sum, which computes the similar coefficient sum between objects to find outliers hidden in the data, using the outlier mining algorithm based on similar coefficient sum. Compared with other anomaly detection technologies, this model does not need a training process, and thus it overcomes the problem of a high false alarm rate. Experiments have shown that this model is feasible and accurate.

II. CREDIT CARD FRAUD DETECTION MODEL BASED ON SIMILAR COEFFICIENT SUM

A. Clustering Algorithm to Check Outlier Based on Similar Coefficient Sum

The similar coefficient sum-based outlier detection algorithm is described as follows [8]:

Let $X = \{X_1, X_2, \ldots, X_n\}$ be a set of objects to be checked, every object with $m$ indexes, that is, $X_i = \{x_{i1}, x_{i2}, \ldots, x_{im}\}$, $i = 1, 2, \ldots, n$. The data matrix is then:

$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1m} \\ x_{21} & x_{22} & \cdots & x_{2m} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nm} \end{pmatrix} \tag{1}$$

Now, the outlier set of the $n$ objects is to be determined.

Before estimating the dispersion degree of the objects in $X$, the similar coefficient $r_{ij}$ between every pair of objects is computed first, and these coefficients form the similar coefficient matrix:

$$R = \begin{pmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ r_{21} & r_{22} & \cdots & r_{2n} \\ \vdots & \vdots & & \vdots \\ r_{n1} & r_{n2} & \cdots & r_{nn} \end{pmatrix} \tag{2}$$

$$r_{ij} = 1 - \sqrt{\frac{1}{m} \sum_{k=1}^{m} (x_{ik} - x_{jk})^2} \tag{3}$$

$$p_i = \sum_{j=1}^{n} r_{ij} \tag{4}$$

$p_i$ is the sum of the $i$th row of the similar coefficient matrix. The smaller $p_i$ is, the further object $i$ is from the other objects, which means object $i$ is a candidate item of the outlier set.

$$\lambda_i = [\,\ldots\,] \times 100\% \tag{5}$$
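The computation behind equations (2) to (5) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: equation (3) is assumed to be one minus the root-mean-square attribute difference, attributes are assumed to be scaled to [0, 1] beforehand, and `shortfall_percent` is a hypothetical stand-in for equation (5) (the percentage by which $p_i$ falls below the largest sum), chosen only so that a smaller $p_i$ gives a larger value. All function names are illustrative.

```python
import math

def similar_coefficient(a, b):
    """Assumed form of equation (3): 1 minus the RMS attribute difference."""
    m = len(a)
    return 1.0 - math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / m)

def similar_coefficient_sums(objects):
    """Row sums p_i of the similar coefficient matrix R (equations (2) and (4))."""
    R = [[similar_coefficient(a, b) for b in objects] for a in objects]
    return [sum(row) for row in R]

def shortfall_percent(p):
    """Stand-in for equation (5), an assumption rather than the paper's formula:
    how far each p_i falls below the largest sum, in percent."""
    p_max = max(p)
    return [(p_max - p_i) / p_max * 100.0 for p_i in p]

def outliers(objects, threshold):
    """Indices whose stand-in value is at least the threshold parameter."""
    lam = shortfall_percent(similar_coefficient_sums(objects))
    return [i for i, v in enumerate(lam) if v >= threshold]

# Four toy objects with attributes already scaled to [0, 1];
# the last one lies far from the other three.
toy = [
    [0.10, 0.20, 0.10],
    [0.12, 0.18, 0.11],
    [0.09, 0.22, 0.10],
    [0.90, 0.95, 0.85],
]
p = similar_coefficient_sums(toy)
candidate = min(range(len(p)), key=p.__getitem__)  # object with the smallest p_i
flagged = outliers(toy, threshold=20.0)
```

On this toy data the fourth object has by far the smallest row sum, so it is both the top candidate and the only object flagged at a threshold of 20. No training step is involved; the only tunable input is the threshold, which matches the role the text gives it.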
In equation (5), $\lambda$ is the threshold: every object whose $\lambda_i$ meets the threshold $\lambda$ is placed in the outlier set.

B. Design Thought

Figure 1 shows the basic framework of the credit card fraud detection model based on similar coefficient sum. First, the model preprocesses every attribute of the transaction samples to convert them all to numerical attributes. Then, the outlier-checking algorithm based on similar coefficient sum calculates the similar coefficient between sample objects and obtains the similar coefficient sum. Finally, a threshold is set, and the outlier set is obtained by comparing the similar coefficient sum with the threshold.

The main design idea of the model is to find the outlier set by computing the similar coefficient sum and setting the threshold. The threshold is just an input parameter and can be changed for different application areas. Because of this property, the model is suitable for outlier checking in MODM and comprehensive evaluation.

Figure 1. Credit Card Fraud Detection Model Structure

III. EXPERIMENT AND RESULT ANALYSIS

In this paper, real credit card data from a domestic commercial bank is chosen as the study object. The sample set consists of 16,584 transaction records of 67 cardholders in the database, of which 15,135 records are non-fraudulent transactions and 1,449 are fraudulent transactions. Their Fraud attribute is marked 0 and 1 respectively. The difference in consumer behavior between fraudsters and cardholders is obvious. Moreover, cardholders' account data and transaction data are highly relevant to, and reflect, their consumption habits. So, in this paper, account data and transaction data are among the object attributes. Similarly, consumption habits are closely associated with the characteristics of the cardholder. It is difficult to determine accurately whether a transaction is fraudulent from the transaction information alone. Therefore, some attributes describing the cardholder are also chosen as object attributes. Combining the above, every sample has a total of 51 attributes. According to business experience, the attributes that have little or no connection with fraud are deleted, and finally 26 attributes are left as model input, as shown in Table I.

TABLE I. ATTRIBUTES OF TRAINING SAMPLE SET

Attribute number   Attribute
1                  Customer income
2                  Customer age
3                  Customer profession
4                  Customer position
5                  Marriage status
6                  Working years
7                  Number of card used
8                  Housing type
9                  Credit card type
10                 Credit grade
11                 Credit line
12                 Book balance
13                 Times of using card
14                 Times of overdraft
15                 Time bracket
16                 Times of overdraft
17                 Times of bad debt
18                 Times of overdraft but not bad debt
19                 Using card frequency
20                 Overdraft rate
21                 Growth rate of shopping
22                 Average of book balance
23                 Average daily spending
24                 Average daily overdraft
25                 Average amount per transaction
26                 Average number of days per overdraft

A. Experiment process

There are two types of mistake in credit card fraud detection research [9]. One is mistakenly regarding fraudulent transactions as non-fraudulent transactions, called the first class error or False Negative error. The other is mistakenly regarding non-fraudulent transactions as fraudulent transactions, called the second class error or False Positive error. As the confusion matrix in Figure 2 shows, the first class error rate is $\frac{FN}{TP + FN}$, the second class error rate is $\frac{FP}{FP + TN}$, and the accuracy, which can be expressed as $\frac{TP + TN}{TP + TN + FP + FN}$, reflects only the total accuracy of the algorithm and cannot reflect the ability to predict the fraud sample set. So, in this paper, the two class error rates are used to evaluate the algorithm.

                             Forecast Result
                             Fraud     Non-Fraud
Practice Data   Fraud        TP        FN
                Non-Fraud    FP        TN

Figure 2. Confusion Matrix

The experimental process can be divided into three parts:
• ① Input a data set of credit card transaction records, with $m$ characteristic attributes in every record. The final sample data is obtained by preprocessing [10].
• ② Compute the similar coefficient $r_{ij}$ between every two credit card transaction records in the data set to make up the similar coefficient matrix, then compute the similar coefficient sum according to $p_i = \sum_{j=1}^{n} r_{ij}$. The smaller the similar coefficient sum, the further the $i$th credit card transaction record is from the other objects, which makes it a candidate item of the outlier set.
• ③ Compute the distance threshold $\lambda_i = [\,\ldots\,] \times 100\%$, and set the threshold parameter $\lambda$. All objects whose $\lambda_i$ meets the threshold $\lambda$ are considered outliers and output.

B. Experiment result analysis

As Table II and Table III show, all five experiments reached high accuracy and low error rates. For example, the first class error rate is lowest and the total accuracy is highest when the threshold $\lambda$ is equal to 12, while the second class error rate is lowest when $\lambda$ is equal to 9. It is concluded that the credit card fraud detection model based on similar coefficient sum is feasible and effective. The research work furnishes a basis for further study of applying outlier detection to the analysis and prediction of fraud risks.

TABLE II. EXPERIMENT RESULT

TABLE III. EXPERIMENT RESULT ANALYSE

IV. CONCLUSION

This paper presents a new credit card fraud detection model based on similar coefficient sum to forecast whether a credit card transaction is fraudulent. The experiment shows that the model can detect fraud transactions accurately, and that the result is better than anomaly detection by clustering when the anomalous data is far less than the normal data. If the algorithm is used in a bank's credit card fraud detection system, it becomes possible to work out the probability of fraud soon after a transaction. A series of anti-fraud strategies can then be applied purposefully, which will reduce the bank's risk effectively.

ACKNOWLEDGMENT

Chunhua Ju thanks the Zhejiang Xinmiao Talent Project (2008R40G2050025), the Zhejiang Technology Project (2008C14061), and the Project of Graduate Student Science Innovation of Zhejiang Gongshang University (1130XJ1508083) for support.

REFERENCES

[1] Wang Xi. Some Ideas about Credit Card Fraud Prediction. China Trial. Apr. 2008, pp. 74-75.
[2] Chen Lei. Fraud and Prevention of International Credit Card. China Credit Card. Jun. 2004, pp. 43-47.

[3] Liu Ren, Zhang Liping, Zhan Yinqiang. A Study on Construction of Analysis Based CRM System. Computer Applications and Software. Vol. 21, Apr. 2004, pp. 46-47.
[4] Han J W, Kamber M. Data Mining: Concepts and Techniques. Beijing: Higher Education Pr. and Morgan Kaufmann Publishers, 2007.
[5] Barnett V, Lewis T. Outliers in Statistical Data. New York: John Wiley & Sons, 1994.
[6] Knorr E, Ng R. A Unified Notion of Outliers: Properties and Computation. In Proc. 1997 Int. Conf. Knowledge Discovery and Data Mining (KDD 97), Newport Beach, CA, 1997, pp. 219-222.
[7] Arning A, Agrawal R, Raghavan P. A Linear Method for Deviation Detection in Large Databases. In Proc. 1996 Int. Conf. Data Mining and Knowledge Discovery (KDD 96), Portland, OR, Aug. 1996, pp. 164-169.
[8] Jiang Lingmin. Clustering Algorithm to Check Outlier Based on Similar Coefficient Sum. Computer Engineering. Vol. 29, Nov. 2003, pp. 183-185.
[9] Tong Fengru. Research on Credit Card Fraud Identification Based on Combined Classifier.
[10] Zhai Linghui, Ma Shaoping, Tang Huanling. Data Preprocessing of Classification Mining of Credit Card. Computer Engineering. Vol. 29, Dec. 2003, pp. 195-197.
