
Main Project

The document summarizes a research project comparing different machine learning algorithms for credit card fraud detection. It discusses the challenges with existing fraud detection systems, including imbalanced data and lack of standard metrics. The proposed system will compare Support Vector Machine, Artificial Neural Network, and Random Forest algorithms on measures like accuracy, sensitivity and AUPRC to determine the best performing algorithm for credit card fraud classification.

Uploaded by

Harsha Gupta
Copyright
© All Rights Reserved

VISVESVARAYA TECHNOLOGICAL UNIVERSITY

Jnanasangama, Belagavi-590018

VIJAYA VITTALA INSTITUTE OF TECHNOLOGY


Dept. Of CSE

Performance Evaluation of Different
Decision-Making Algorithms on the
Credit Card Fraud Detection Paradigm

Submitted By
Chabikant Sarnakar (1VJ16CS016)
Chaithra Kishan (1VJ16CS017)
Harsha Saw (1VJ16CS024)

DEPT OF CSE, VVIT

Contents

• Introduction
• Literature Survey
• Problem Statement
  ◦ Existing System
  ◦ Drawbacks of Existing System
  ◦ Proposed System
  ◦ Objective
• System Requirements
• Architecture Diagram
• Algorithms
• Methodology
• Result Analysis
• Conclusion
• Future Enhancement
• References



Introduction

• Fraud detection concerns a large number of financial institutions and banks, as this crime costs them around $67 billion per year.
• There are different types of fraud: insurance fraud, credit card fraud, statement fraud, securities fraud, etc.
• Of all of them, credit card fraud is the most common type.
• It is defined as the unauthorized use of a credit card account. It occurs when neither the cardholder nor the card issuer is aware that the card is being used by a third party.
• Fraud detection and prevention are costly, time-consuming, and labor-intensive tasks. A significant body of research has been dedicated to developing innovative solutions to detect different types of fraud. According to a survey, 33,305 cases of credit card identity fraud were reported between January and June 2018.



Fraudsters can obtain goods without paying, or gain illegal access to funds from an account. Credit card fraud is classified into different types based on the nature of the fraudulent activity. They are briefly introduced below:
◦ Simple theft (offline fraud): a stolen card is the most straightforward type of credit card fraud. It is also the fastest to be detected.
◦ Application fraud: individuals obtain new credit cards using false personal information.
◦ Bankruptcy fraud: a credit card is used while the holder is insolvent, to purchase goods knowing that they cannot be paid for. This type can be prevented with credit scoring techniques.
◦ Internal fraud: bank employees steal card details to use them remotely.
◦ Counterfeit fraud / behavioral fraud / cardholder-not-present fraud: when transactions are made remotely (mobile sales, online, etc.), the cardholder does not need to be present; only the details of a legitimate credit card are needed. The card’s details can be obtained by skimming or shoulder surfing. Detecting this type of credit card fraud can take time and needs sophisticated methods that capture transaction patterns.

• In this paper, we focus on counterfeit fraud, as it is far more challenging to detect and the damage it causes is irrevocable.
Literature Survey
Authors: S. Ghosh and D. L. Reilly
Algorithm: Credit card fraud detection with a neural network
Advantages:
1) Ability to learn from the past
2) No need to be reprogrammed
3) Ability to extract rules and predict future activities based on the current situation
4) High accuracy, portability, high detection speed, and ease of building and operating
5) Effectiveness in dealing with noisy data, predicting patterns, solving complex problems, and processing new instances
6) Adaptability and maintainability
Disadvantages:
1) Difficulty confirming the network structure
2) High processing time for large neural networks and excessive training
3) High expense
4) Non-numerical data need to be converted and normalized
5) Sensitivity to data format



Authors: Y. Sahin and E. Duman
Algorithm: Detecting credit card fraud by decision trees and support vector machines
Advantages:
1) Used in many applications, such as handwriting analysis, face analysis, and especially pattern-based applications
2) SVM can be robust
Disadvantages:
1) Poor at processing large datasets
2) Low detection speed
3) More computation time

Authors: Y. Sahin, S. Bulkan, and E. Duman
Algorithm: A cost-sensitive decision tree approach for fraud detection
Advantages:
1) Impressive versatility, parallelizable, great with high dimensionality
2) Quick prediction and training speed; handles unbalanced data
Disadvantages:
1) The number of trees must be chosen manually
2) Cannot be interpreted
3) Consumes more memory



Problem Statement
◦ Existing System
The existing research was based on unsupervised learning. Its significance was in finding new methods for fraud detection and increasing the accuracy of results. However, the accuracy of the results obtained from these methods is lower than that of the proposed system.
• Because the classification problem is imbalanced, the number of false alarms generated is higher than the number of frauds detected.
• Another problem in credit card fraud detection is the scarcity of available data: confidentiality issues give the community little chance to share real datasets and assess existing techniques.
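Why imbalance inflates false alarms can be shown with a rough calculation. The sketch below uses hypothetical numbers (100,000 transactions with a 0.2% fraud rate, a classifier that catches 90% of frauds and wrongly flags 1% of legal transactions), not figures from this project; `expected_alerts` is an illustrative helper.

```python
def expected_alerts(n_fraud, n_legal, tpr, fpr):
    """Return (true alarms, false alarms) expected for a classifier.

    tpr: true-positive rate, the share of frauds that are flagged
    fpr: false-positive rate, the share of legal transactions that are flagged
    """
    true_alarms = tpr * n_fraud
    false_alarms = fpr * n_legal
    return true_alarms, false_alarms


# Hypothetical: 200 frauds among 100,000 transactions.
tp, fp = expected_alerts(n_fraud=200, n_legal=99_800, tpr=0.90, fpr=0.01)
print(round(tp), round(fp))  # 180 998
print(f"share of alerts that are real fraud: {tp / (tp + fp):.1%}")  # 15.3%
```

Even a seemingly low 1% false-positive rate yields more than five false alarms per fraud caught, because legal transactions outnumber frauds by roughly 500 to 1.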
◦ Drawbacks of Existing System
• Fraud detection systems are prone to several difficulties and challenges, enumerated below. An effective fraud detection technique should be able to address these difficulties in order to achieve the best performance.
• Imbalanced data: Credit card fraud data is imbalanced by nature: only a very small percentage of all credit card transactions are fraudulent. This makes detecting fraudulent transactions very difficult and imprecise.
• Fraud detection cost: The system should take into account both the cost of fraudulent behavior that is detected and the cost of preventing it.
• Nonexistence of a standard algorithm: No algorithm in the credit card fraud literature is known to outperform all others. Each technique has its own advantages and disadvantages, so combining effective algorithms so that they complement each other's strengths and cover their weaknesses would be of great interest.
• Nonexistence of suitable metrics: The lack of good metrics for evaluating the results of fraud detection systems is still an open issue. Without such metrics, researchers and practitioners cannot compare different approaches or determine which fraud detection systems are most efficient.
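The metrics problem is easy to demonstrate: on imbalanced data, plain accuracy can look excellent for a useless model. This minimal sketch uses hypothetical labels (2 frauds in 1,000 transactions) and a trivial classifier that never predicts fraud.

```python
# Hypothetical labels: 1 = fraud, 0 = legal; 2 frauds in 1,000 transactions.
y_true = [1] * 2 + [0] * 998
# A trivial "classifier" that labels every transaction as legal.
y_pred = [0] * len(y_true)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    # Sensitivity: the share of fraudulent transactions that were flagged.
    positives = sum(y_true)
    caught = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return caught / positives if positives else 0.0


print(f"accuracy = {accuracy(y_true, y_pred):.3f}")  # accuracy = 0.998
print(f"recall   = {recall(y_true, y_pred):.3f}")    # recall   = 0.000
```

A 99.8% accurate model that catches no fraud at all is why sensitivity and AUPRC, not accuracy alone, are needed to compare fraud detectors.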

◦ Proposed System
• A large body of work has proposed solutions which, to the best of our knowledge, are built on machine learning algorithms.
• As this is a classification problem, classification algorithms are used to differentiate fraudulent from non-fraudulent transactions.
• Support Vector Machine, Artificial Neural Network, and Random Forest algorithms are used, as they are the most suitable methods according to the three considered performance measures (Accuracy, Sensitivity and AUPRC).
• These algorithms are compared on parameters such as the confusion matrix, F-measure, precision, accuracy, intervention and recall.
• We will develop a model for the class imbalance problem to find a trade-off between sensitivity and accuracy.
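A minimal scikit-learn sketch of the proposed three-way comparison follows. It assumes synthetic data from `make_classification` (about 2% positives) in place of the real transaction dataset, `MLPClassifier` as the Artificial Neural Network, `class_weight="balanced"` as one simple way to counter class imbalance, and `average_precision_score` as the AUPRC estimate. All hyperparameters are illustrative, not the project's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, average_precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a fraud dataset: roughly 2% positive (fraud) class.
X, y = make_classification(n_samples=3000, n_features=10, weights=[0.98],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

models = {
    "SVM": make_pipeline(StandardScaler(), SVC(class_weight="balanced")),
    "ANN": make_pipeline(StandardScaler(),
                         MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000,
                                       random_state=0)),
    "RF": RandomForestClassifier(n_estimators=100, class_weight="balanced",
                                 random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # AUPRC needs a continuous score; SVC without probability=True has none,
    # so its signed distance to the margin is used instead.
    if hasattr(model, "predict_proba"):
        scores = model.predict_proba(X_test)[:, 1]
    else:
        scores = model.decision_function(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "sensitivity": recall_score(y_test, y_pred),
        "AUPRC": average_precision_score(y_test, scores),
    }

for name, metrics in results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

Stratifying the split keeps the rare fraud class present in both partitions, which AUPRC requires.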



◦ Objectives

• The objectives of credit card fraud detection are to reduce losses due to payment fraud for both merchants and issuing banks, and to increase revenue opportunities for merchants.
• The aim is to reduce false alarms and thereby increase accuracy.
• Our goal is to identify the issues that must be solved to produce a highly efficient solution for the class imbalance problem.
• Our aim here is to detect 100% of the fraudulent transactions while minimizing incorrect fraud classifications.
• The algorithms are compared by evaluating the confusion matrix, F-measure, precision, accuracy, intervention and recall. The implementation is carried out in Python with scikit-learn (sklearn).

Software and Hardware Requirements
• Software Requirements
  ◦ Python with scikit-learn (sklearn)
  ◦ Jupyter cloud platform
• Hardware Requirements
  ◦ 4 GB RAM
  ◦ i3 or i5 processor

System Architecture



Algorithms
Logistic Regression
• Logistic Regression is a type of generalized linear model. Simple linear regression is not suitable when the variable to be predicted is
binary.
• The vector $\alpha = (\alpha_0, \alpha_1, \alpha_2, \ldots, \alpha_n)$ represents the coefficients, $X = (1, X_1, X_2, \ldots, X_n)$ the explanatory variables, and $\varepsilon$ the model's error. The linear model is defined as follows: $Y = \alpha_0 + \alpha_1 X_1 + \alpha_2 X_2 + \cdots + \alpha_n X_n + \varepsilon = X\alpha + \varepsilon$
• In logistic regression, a logit link function $g: (0, 1) \to \mathbb{R}$ is introduced so that the linear combination of the variables, which can take any real value, corresponds to a probability between 0 and 1: $g(p) = X\alpha$, where $p$ is the probability of fraud risk that we are estimating.
• The logit function is defined as $g(p) = \ln\frac{p}{1 - p}$, which gives $p = \frac{e^{X\alpha}}{1 + e^{X\alpha}}$.
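The relation between the linear score and the fraud probability can be checked numerically. This sketch uses hypothetical coefficients α and a single transaction with two explanatory variables; `linear_score`, `logit` and `inverse_logit` are illustrative helper names.

```python
import math


def linear_score(alpha, x):
    """X·alpha, where the intercept alpha[0] multiplies the constant 1."""
    return alpha[0] + sum(a * xi for a, xi in zip(alpha[1:], x))


def logit(p):
    # g(p) = ln(p / (1 - p)), defined for 0 < p < 1
    return math.log(p / (1 - p))


def inverse_logit(z):
    # p = e^z / (1 + e^z)
    return math.exp(z) / (1 + math.exp(z))


# Hypothetical coefficients (alpha_0, alpha_1, alpha_2) and one transaction.
alpha = [-4.0, 0.8, 1.5]
x = [2.0, 1.0]
z = linear_score(alpha, x)  # about -0.9
p = inverse_logit(z)        # estimated probability of fraud
print(round(p, 4))          # 0.2891
print(abs(logit(p) - z) < 1e-9)  # g(p) = X·alpha holds: True
```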

Support Vector Machines


• Support vector machine is a method used in pattern recognition and classification.
• It is a classifier that predicts or classifies patterns into two categories: fraudulent or non-fraudulent.
• It is well suited for binary classification.
• Like any artificial intelligence tool, it has to be trained to obtain a learned model.
• SVM has been used in many classification and pattern recognition problems, such as text categorization, bioinformatics and face detection.
• SVM is related to neural networks and machine learning.
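A minimal sketch of training an SVM and using the learned model to classify new transactions, with scikit-learn's `SVC`. The six two-feature transactions are hypothetical toy data, and the linear kernel and `class_weight="balanced"` setting are illustrative assumptions rather than the project's configuration.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical training set: two features per transaction, label 1 = fraud.
X = np.array([[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.2, 0.4],
              [0.9, 0.8], [0.8, 0.9]])
y = np.array([0, 0, 0, 0, 1, 1])

# Training produces the learned model; class_weight="balanced" up-weights
# the rarer fraud class.
svm = SVC(kernel="linear", class_weight="balanced").fit(X, y)

new = np.array([[0.15, 0.25], [0.85, 0.85]])
print(svm.predict(new))            # [0 1]
print(svm.decision_function(new))  # positive score means class 1 (fraud)
print(svm.support_vectors_)        # training points that define the margin
```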
Random Forest Algorithm
• The Random Forest Algorithm (RFA), also called Random Decision Forest, is used for classification, regression and other tasks, and works by constructing multiple decision trees.
• The Random Forest Algorithm is based on supervised learning, and its major advantage is that it can be used for both classification and regression.
• Random Forest is reported to give better accuracy than the existing systems, and it is one of the most commonly used algorithms.
• It uses its decision trees, trained on the labeled data, to classify the dataset.
• After the dataset is classified, a confusion matrix is obtained.
• The performance of the Random Forest Algorithm is evaluated based on the confusion matrix.
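To illustrate how a random forest combines its trees, the sketch below fits scikit-learn's `RandomForestClassifier` on synthetic data (an assumption; not the project's dataset) and checks that the forest's class probabilities are the average of its individual trees' probabilities. scikit-learn uses this soft-voting average rather than a strict majority vote.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic, imbalanced stand-in data (about 5% positives).
X, y = make_classification(n_samples=500, n_features=8, weights=[0.95],
                           random_state=1)

forest = RandomForestClassifier(n_estimators=25, random_state=1).fit(X, y)
print(len(forest.estimators_))  # 25 decision trees

# The forest averages its trees' class probabilities (soft voting) and
# predicts the class with the highest mean probability.
tree_mean = np.mean([tree.predict_proba(X) for tree in forest.estimators_], axis=0)
print(np.allclose(tree_mean, forest.predict_proba(X)))  # True
```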



Methodology
Exploratory Data Analysis
• In this module, we will first collect the credit card dataset and store it in a database.
• Then we will perform descriptive analysis of the dataset.
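A first descriptive pass can be sketched with pandas. The slides do not name the dataset, so the column names `Time`, `Amount` and `Class` (as in the public ULB credit card dataset) and the eight rows are assumptions for illustration only; real fraud data is far more imbalanced than this toy sample.

```python
import pandas as pd

# Hypothetical sample; Class 1 = fraud.
df = pd.DataFrame({
    "Time":   [0, 10, 35, 60, 90, 120, 150, 200],
    "Amount": [12.5, 80.0, 3.2, 150.0, 9.99, 60.0, 1.0, 420.0],
    "Class":  [0, 0, 0, 0, 1, 0, 0, 1],
})

print(df.shape)                    # (rows, columns)
print(df["Class"].value_counts())  # class balance
print(df["Amount"].describe())     # count, mean, std, min, quartiles, max
fraud_share = df["Class"].mean()
print(f"fraud share: {fraud_share:.1%}")  # fraud share: 25.0%
```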

Data Cleaning
• In the next step, after the dataset has been analyzed, the data is cleaned.
• In this cleaning process, all duplicate values and null values present in the dataset are removed, and a new dataset is obtained.
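The cleaning step maps directly onto two pandas calls. The small DataFrame below is hypothetical, built to contain one exact duplicate row and one missing value; treating "null values" as dropping the affected rows is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: rows 1 and 2 are duplicates, row 3 has a null Amount.
raw = pd.DataFrame({
    "Time":   [0, 10, 10, 35, 60],
    "Amount": [12.5, 80.0, 80.0, np.nan, 150.0],
    "Class":  [0, 0, 0, 1, 0],
})

# Drop duplicate rows, then rows with any null value, giving a new dataset.
clean = raw.drop_duplicates().dropna().reset_index(drop=True)
print(len(raw), "->", len(clean))  # 5 -> 3
```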

Preprocessing of dataset
• In this module, the cleaned dataset is preprocessed, and the dataset is divided based on transaction amount and transaction time.
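The slide does not say exactly how amount and time are handled during preprocessing. One common treatment, shown here purely as an assumption, is to standardize the `Amount` and `Time` columns with scikit-learn's `StandardScaler` so that they share a comparable scale; the column names and values are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical cleaned data; Class 1 = fraud.
df = pd.DataFrame({
    "Time":   [0.0, 10.0, 35.0, 60.0, 90.0, 120.0],
    "Amount": [12.5, 80.0, 3.2, 150.0, 9.99, 60.0],
    "Class":  [0, 0, 0, 0, 1, 0],
})

# Assumed step: rescale Time and Amount to zero mean and unit variance.
# The label column is left untouched.
scaler = StandardScaler()
df[["Time", "Amount"]] = scaler.fit_transform(df[["Time", "Amount"]])
print(df.round(3))
```

In a real pipeline the scaler should be fitted on the training partition only and then applied to the test partition, to avoid leaking test statistics.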



Dataset Partition

• In this module, the dataset is first divided into two partitions: a training dataset and a testing dataset.
• After partitioning, the Random Forest Algorithm is applied.
• Finally, a confusion matrix is obtained from the Random Forest Algorithm's predictions.
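The partition, train and confusion-matrix steps can be sketched with scikit-learn as follows. Synthetic data stands in for the real dataset, and the 70/30 stratified split and 100 trees are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, weights=[0.97],
                           random_state=42)

# Hold out 30% for testing; stratify so both partitions keep the fraud ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Rows are actual classes, columns are predicted classes, in label order [0, 1]:
# [[TN, FP],
#  [FN, TP]]
cm = confusion_matrix(y_test, model.predict(X_test), labels=[0, 1])
tn, fp, fn, tp = cm.ravel()
print(cm)
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
```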

Evaluation
• The resulting confusion matrix is then evaluated and presented graphically, which makes the classifier's accuracy easier to assess.
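The slides do not specify which graph is used. For imbalanced fraud data, one common choice (assumed here) is the precision-recall curve, whose area corresponds to the AUPRC measure. The sketch computes the curve's points with scikit-learn on hypothetical labels and scores; plotting them, for example with matplotlib, is omitted.

```python
from sklearn.metrics import average_precision_score, precision_recall_curve

# Hypothetical labels (1 = fraud) and classifier scores.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

precision, recall, thresholds = precision_recall_curve(y_true, y_score)
for p, r in zip(precision, recall):
    print(f"recall={r:.2f} precision={p:.2f}")  # points on the curve

# Average precision summarizes the curve as sum_n (R_n - R_{n-1}) * P_n.
print(round(average_precision_score(y_true, y_score), 4))  # 0.8333
```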



Module Diagram



Result Analysis
• The performance of the proposed classifier is evaluated in terms of four classification metrics. They are computed from the quantities below, where P = TP + FN is the number of fraud transactions and N = TN + FP is the number of legal transactions:
1. True positives (TP): number of fraud transactions predicted as fraud.
2. True negatives (TN): number of legal transactions predicted as legal.
3. False positives (FP): number of legal transactions predicted as fraud.
4. False negatives (FN): number of fraud transactions predicted as legal.
• The performance of the proposed fraud detection model (Fraud Miner) is also compared with three other state-of-the-art classifiers used for credit card fraud detection: support vector machine (SVM), logistic regression (LR), and random forest (RF).
• These are the base classifiers used in the state-of-the-art financial fraud detection models described in the literature review.
• The metrics used to assess the results are: Sensitivity/Fraud Catching Rate, False Alarm Rate, Balanced Classification Rate (BCR), and Matthews Correlation Coefficient (MCC).
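The four metrics can be computed directly from the confusion-matrix counts using their standard definitions. `fraud_metrics` is a hypothetical helper, and the counts are made-up illustrative values, not results from this project.

```python
import math


def fraud_metrics(tp, tn, fp, fn):
    sensitivity = tp / (tp + fn)           # fraud catching rate, TP / P
    false_alarm_rate = fp / (fp + tn)      # legal transactions flagged, FP / N
    specificity = tn / (tn + fp)
    bcr = (sensitivity + specificity) / 2  # balanced classification rate
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"sensitivity": sensitivity, "false_alarm_rate": false_alarm_rate,
            "BCR": bcr, "MCC": mcc}


# Hypothetical counts: 100 frauds and 910 legal transactions.
m = fraud_metrics(tp=80, tn=900, fp=10, fn=20)
print({k: round(v, 4) for k, v in m.items()})
```

Unlike accuracy, BCR and MCC stay informative when legal transactions vastly outnumber frauds, since both account for performance on each class.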

Conclusion
• This work proposes a fraud detection model whose performance is evaluated on an anonymized dataset, and the proposed model is found to work well with this kind of data since it is independent of attribute values.
• The second feature of the proposed model is its ability to handle class imbalance. This is incorporated in the model by creating two separate pattern databases for fraud and legal transactions.
• Both customer and fraudulent behaviors are found to change gradually over a longer period of time. This may degrade the performance of the fraud detection model.
• Therefore, the fraud detection model should adapt to these behavioral changes. They can be incorporated into the proposed model by updating the fraud and legal pattern databases.
• This can be done by running the proposed pattern recognition algorithm at fixed points, such as once every three or six months, or once every one lakh (100,000) transactions.



Future enhancement
• Advances in technology give criminals increasingly powerful tools to commit fraud, especially using credit cards or internet bots. To combat the evolving face of fraud, researchers are developing increasingly sophisticated tools, with algorithms and data structures capable of handling large-scale complex data analysis and storage.
• The most popular area of current fraud detection research has been credit card fraud, but we see online bots and ad-click fraud as growing concerns for the future. With the rapid reduction in the cost of computing power, publishers can exploit vulnerabilities by creating bots that click on ads to generate more revenue.
• We designed an electronic payment system to prevent fraud in “card not present” transactions. This system is capable of providing most of the essential features required to block fraudulent transactions while allowing legitimate ones.
• As technology changes, it becomes difficult to track the behavior and patterns of fraudulent transactions. Preventing known and unknown fraud in real time is not easy, but it is feasible.
• Further enhancement can be done by securing this system with certificates for both merchant and customer, and new checks can be added as technology changes.



References

[1] P. Richhariya and P. K. Singh, “Evaluating and emerging payment card fraud challenges and resolution,” Int. J. Comput. Appl., vol. 107, no. 14, pp. 5–10, Jan. 2014.
[2] S. Bhattacharyya, S. Jha, K. Tharakunnel, and J. C. Westland, “Data mining for credit card fraud: A comparative study,” Decis. Support Syst., vol. 50, no. 3, pp. 602–613, 2011.
[13] A. Dal Pozzolo, O. Caelen, Y.-A. L. Borgne, S. Waterschoot, and G. Bontempi, “Learned lessons in credit card fraud detection from a practitioner perspective,” Expert Syst. Appl., vol. 41, no. 10, pp. 4915–4928, 2014.
[25] S. Ghosh and D. L. Reilly, “Credit card fraud detection with a neural-network,” in Proc. 27th Hawaii Int. Conf. Syst. Sci., Jan. 1994, pp. 621–630.
[35] H. Hormozi, M. K. Akbari, E. Hormozi, and M. S. Javan, “Credit cards fraud detection by negative selection algorithm on Hadoop (To reduce the training time),” in Proc. 5th Conf. Inf. Knowl. Technol., May 2013, pp. 40–43.
