CSEIT1833119

The document discusses techniques for detecting financial fraud using data mining. It addresses common fraud types and techniques like logistic models, decision trees, Bayesian belief networks, outlier detection, and neural networks. Each technique can be applied to different fraud types based on their characteristics and patterns.

International Journal of Scientific Research in Computer Science, Engineering and Information Technology

© 2018 IJSRCSEIT | Volume 3 | Issue 4 | ISSN : 2456-3307

Data Mining Techniques for Financial Fraud Detection

1 Aditi Satpute, 2 Anuj Shenoy

1 Department of Computer Science, Fergusson College, Pune, Maharashtra, India

2 Department of Computer Science, Kaveri College of Arts, Science, Commerce, Pune, Maharashtra, India

ABSTRACT

This article considers the increasing number of frauds in recent times. Fraud can involve deliberately causing an
accident to obtain a payout, or causing intentional losses. Given all the different methods of fraud,
detection remains an uphill task. In this article, we shed light on the various frauds that take place
and on the fraud detection techniques that use data mining. A broad definition of data
mining is 'processes and activities designed to obtain and evaluate data to extract useful information'. Data
mining is important for detection because it makes it possible to take immediate action
to minimize cost.
Keywords: Data mining, Financial Fraud, Fraud detection techniques

I. INTRODUCTION

Fraud implies intentional deception to gain an advantage through unfair means. According to the Kroll Annual Global Fraud & Risk Report, 92% of companies experienced a fraud incident in 2017. The report also reveals that the financial sector is one of the sectors most vulnerable to fraud, which causes losses of billions every year.

Fraud detection means identifying a fraud or predicting an expected fraud. There are various ways to detect fraud, data mining being the most popular among them. More and more data is being generated in every aspect we can think of. For most transactions we make, some sort of data is stored in a database. Organizations are storing, processing and analysing more data than at any time in history. It is a trend that is going to continue to grow for a very long time.

Every fraud has a specific pattern. Data mining techniques are used to identify those patterns and give results. In terms of data mining, behaviour can be classified as fraudulent or non-fraudulent. Treating non-fraudulent behaviour as normal and the rest as exceptions, we can identify fraudulent behaviour.

Financial fraud can be mainly grouped into three types, as shown in Figure 1.

Financial Fraud
    Bank fraud
        Credit card fraud
        Money laundering
        Mortgage fraud
    Corporate fraud
        Financial statement fraud
        Securities and commodities fraud
    Insurance fraud
        Automobile insurance fraud
        Health insurance fraud

Figure 1. A flowchart showing different types of fraud under financial fraud
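The hierarchy in Figure 1 can also be written down as a small lookup structure, which is convenient when fraud cases have to be tagged by family. This is only an illustrative sketch: the names `FRAUD_TAXONOMY` and `family_of` are hypothetical and do not come from the paper.

```python
# Fraud families and their subtypes, as listed in Figure 1.
FRAUD_TAXONOMY = {
    "Bank fraud": ["Credit card fraud", "Money laundering", "Mortgage fraud"],
    "Corporate fraud": ["Financial statement fraud",
                        "Securities and commodities fraud"],
    "Insurance fraud": ["Automobile insurance fraud", "Health insurance fraud"],
}


def family_of(subtype):
    """Return the fraud family a subtype belongs to, or None if it is not listed."""
    for family, subtypes in FRAUD_TAXONOMY.items():
        if subtype in subtypes:
            return family
    return None


print(family_of("Mortgage fraud"))
```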

CSEIT1833119 | Received : 05 March 2018 | Accepted : 14 March 2018 | March-April-2018 [ (3) 4 : 147-150 ] 147
This paper addresses the most common techniques that are used to detect the above frauds. The techniques we discuss are neural networks, text mining, decision trees, Bayesian belief networks, logistic models, outlier detection and fuzzy logic.

We have also classified the frauds according to the techniques.

II. TECHNIQUES USED FOR FRAUD DETECTION

In the context of financial fraud detection, we discuss the most efficient techniques below.

A. Logistic Model
This model is suitable for classification. The idea of the logistic model is to make linear regression produce probabilities. It computes a linear function of the inputs, maps the result to a probability, and then applies a threshold in order to classify. The model can be binomial, ordinal or multinomial. The resulting method is logistic regression, a popular regression technique.

B. Decision Tree
Decision trees are among the easiest techniques to use. They are used for prediction and classification. Classification rules are represented by the paths from root to leaf. They are best suited for choice-model or selection-model algorithms. Another use of decision trees is calculating conditional probabilities. Decision trees extract information in human-understandable form. The rules used for extracting this information are expressions of the form 'if condition1 and condition2 and condition3 then outcome', which explain the decisions that lead to the prediction. Decision trees are effective because they clearly lay out the problem so that all options can be challenged. They also allow us to analyse all the possible consequences of a decision.

C. Outlier Detection
Outlier detection is also known as anomaly detection. It is used to identify transactions that show abnormal behavior. An outlier detection method, such as a density-based model, can help us detect anomalies. Patterns can be used to identify fraudulent behavior by using statistical methods. As each fraud has a pattern, it can be identified using this method. For example, if spending on expensive items is not typical for an account, the transaction can be considered for outlier detection. Standard statistical methods are used in outlier detection to observe how two variables interact and so ascertain normal behavior. However, it has been observed that such techniques are used where anomalies are few in number.

D. Bayesian Belief Network
The Bayesian belief network is probably the most widely used data mining technique as well as the most popular technique used in fraud detection. It is based on conditional probability. It represents a set of variables and their conditional dependencies via a Directed Acyclic Graph (DAG). The nodes of the DAG may be observable quantities, latent variables, unknown parameters or hypotheses. Conditional dependencies are shown on edges. It also works on the causality principle, i.e. it takes the prior event into consideration. The variants of Bayesian probability differ mainly in their interpretation and construction of the prior probability. This prior probability is also known as marginal, depending on the probability direction.

The probability of event 'A' happening given that event 'B' has already happened is given by Bayes' formula:

P(A|B) = P(B|A) P(A) / P(B)

where,
• P(A|B) represents the Posterior: the probability of our hypothesis being true, given the data collected.
• P(B|A) represents the Likelihood: the probability of collecting this data when our hypothesis is true.
• P(A) represents the Prior: the probability of our hypothesis being true before data collection.
• P(B) represents the Marginal: the probability of collecting this data under all possible hypotheses.

Example: drawing cards from a deck
The probability of drawing 2 queens in a row from a deck of cards, without replacement:

P(2 queens) = 4/52 × 3/51 ≈ 0.45%
Prior probability: 4 queens in the set of 52 cards
Second probability: only 3 queens in a set of 51 cards

In spite of its complexity, it is one of the most reliable methods used for detection.

E. Text Mining
Generally, text is counted as unstructured data, which must be converted into structured data before applying any data mining technique such as classification or clustering in order to detect fraudulent content. Text mining is capable of studying plain text, which gives a different approach to the problem. It is typically used in clustering and anomaly detection. The flow chart below explains the proposed text mining approach for financial fraud detection.

[Flowchart boxes: Financial statements; Retrieve and preprocess documents; Information extraction; Fraud detection; Evaluation]

Figure 2. A flowchart of the text mining approach for financial fraud detection

F. Neural Network
Neural networks are computing systems inspired by biological neural networks. They consist of an assembly of connected units called artificial neurons. Fraud detection using a neural network is modelled on the working of the human brain. The neural network method gives the computer a technological way to think like a brain. A neural network learns from experience and accumulated knowledge in order to make decisions, which here means evaluating whether a fraud has occurred.

Pattern -> Feature extraction -> Feature -> Classification -> Class

Figure 3. Pattern recognition

Figure 3 shows that when a pattern is detected, its features are extracted and then used for classification. This classification is used for detecting the actual fraud.

G. Fuzzy Logic
Fuzzy logic is a method quite similar to human reasoning. It imitates the human way of thinking, which involves all possibilities between the digital values YES and NO. The approach is based on degrees of truth. When data does not suggest a clear answer, decision makers look for patterns or groups in it. In this setting fuzzy logic is applied through cluster analysis, which is used for finding groups in data. Fuzzy clustering allows for some ambiguity in the data, where it might not be clear which cluster a point belongs to. Fuzzy logic is mainly used to model partial categorizations. Although identifying the clusters is a very tedious task, it becomes easier with the help of the fuzzy c-means algorithm, one of the most popular objective-function-based methods. The different methods of fuzzy logic are becoming ubiquitous.
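To make the idea of partial categorization concrete, the fuzzy c-means clustering mentioned above can be sketched in a few lines. This is a minimal one-dimensional illustration, not the authors' implementation: the function name `fuzzy_c_means`, the sample transaction amounts, the initial centres and the fuzzifier `m = 2` are all assumptions made for the example. Note that the algorithm takes the number of clusters as an input, through the list of initial centres.

```python
def fuzzy_c_means(points, centers, m=2.0, iterations=100):
    """Cluster 1-D points; return (centers, memberships).

    memberships[i][j] is the degree (0..1) to which points[i] belongs
    to cluster j; each row sums to 1.
    """
    centers = list(centers)
    power = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(iterations):
        # Step 1: update memberships from distances to the current centres.
        memberships = []
        for x in points:
            dists = [abs(x - c) for c in centers]
            zeros = [d == 0.0 for d in dists]
            if any(zeros):
                # The point sits exactly on a centre: it belongs fully
                # to the coinciding centre(s).
                row = [1.0 / sum(zeros) if z else 0.0 for z in zeros]
            else:
                row = [1.0 / sum((dj / dk) ** power for dk in dists)
                       for dj in dists]
            memberships.append(row)
        # Step 2: move each centre to the membership-weighted mean.
        # (Assumes every cluster keeps some non-zero membership.)
        for j in range(len(centers)):
            weights = [row[j] ** m for row in memberships]
            centers[j] = sum(w * x for w, x in zip(weights, points)) / sum(weights)
    return centers, memberships


# Hypothetical transaction amounts: many small ones, a few large ones,
# and one in between.
amounts = [20.0, 22.0, 25.0, 28.0, 30.0, 500.0, 950.0, 1000.0, 1050.0]
centers, memberships = fuzzy_c_means(amounts, centers=[0.0, 1000.0])
for amount, row in zip(amounts, memberships):
    print(amount, [round(u, 2) for u in row])
```

In this sketch the small and large amounts receive memberships close to 1 in their own cluster, while the in-between amount is split between the two clusters. That split is the kind of ambiguity a hard YES/NO assignment cannot express.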

Volume 3, Issue 4 | March-April-2018 | http:// ijsrcseit.com 149

These methods of fuzzy logic can be used further in various other applications.

III. CLASSIFICATION

We have further classified all the techniques according to the type of fraud to get a clearer picture.

Table 1. Classification of techniques according to the types of fraud

Sr. no | Technique               | Type of fraud
1      | Logistic Models         | Credit card fraud, Insurance fraud, Financial statement fraud
2      | Neural Network          | Financial statement fraud
3      | Bayesian Belief Network | Insurance fraud, Financial statement fraud, Corporate fraud
4      | Outlier Detection       | Credit card fraud
5      | Text Mining             | Financial statement fraud
6      | Decision Tree           | Credit card fraud, Financial statement fraud
7      | Fuzzy Logic             | Credit card fraud

IV. CONCLUSION

[...] carried out with their fraudulent behavior and some effective methods to detect these frauds. Financial fraud is a major concern in the world today.

Information received in annual reports should be carefully investigated by decision makers so as to take measures in identifying any potential threats of a fraud. The only flaw of these methods in detecting a fraud is that they need some actual data from the organisation before they can come to a decision. Understanding the problem is the most important aspect here, followed by lexical modeling, and with the use of technology we can predict or detect a fraud.