A MINOR PROJECT by SHIVANG CHADHA


A MINOR PROJECT

REPORT ON
“CREDIT CARD FRAUD DETECTION
SYSTEM”
SUBMITTED IN PARTIAL FULFILLMENT FOR THE AWARD OF THE DEGREE
OF BACHELOR OF BUSINESS ADMINISTRATION (CAM): 2021-24

UNDER THE GUIDANCE OF:
MS. REKHA JAIN
ASSOCIATE PROFESSOR, CPJCHS

SUBMITTED BY:
SHIVANG CHADHA
ENROLLMENT NO: 00424201921
BATCH: 2021-24

CHANDERPRABHU JAIN COLLEGE OF HIGHER STUDIES & SCHOOL OF LAW An ISO 9001: 2015
Certified Institute (Approved by the Govt of NCT of Delhi Affiliated to Guru Gobind Singh
Indraprastha University, Delhi) Plot No OCF Sector A-8, Narela New Delhi - 110040
DECLARATION
This is to certify that Minor Project Report entitled
“Credit Card Fraud Detection System” which is
submitted by me in partial fulfillment of the
requirement for the award of degree BBA (CAM) to
GGSIP University, Dwarka, Delhi comprises only my
original work and due acknowledgement has been
made in the text to all other material used.

Date: NAME AND SIGNATURE OF STUDENT

APPROVED BY:
NAME OF THE SUBJECT TEACHER/SUPERVISOR
CERTIFICATE

This is to certify that the Minor Project Report
entitled “Credit Card Fraud Detection System”, which is
submitted by SHIVANG CHADHA in partial
fulfillment of the requirement for the award of the
degree BBA (CAM) to GGSIP University,
Dwarka, Delhi, is a record of the candidate’s own
work carried out by him under my/our
supervision.

Date: Supervisor’s signature
ACKNOWLEDGEMENT
I offer my sincere thanks and humble regards to
Chander Prabhu Jain College of Higher Studies & School
of Law, GGSIP University, New Delhi for imparting
very valuable professional learning through the Minor
Project Report of BBA (CAM). I pay my gratitude and sincere
regards to Ms. REKHA JAIN, my project guide, for
imparting her knowledge. I am thankful to her as
she has been a constant source of advice, motivation and
inspiration. I am also thankful to her for her
suggestions and encouragement throughout the project
work. I take this opportunity to express my gratitude
and thanks to our computer lab staff and library staff
for giving me the opportunity to use their resources
for the completion of the project. I am also thankful to
my family and friends for constantly motivating me to
complete the project and providing me an environment
which enhanced my knowledge.

Student’s Signature
TABLE OF CONTENTS
• DECLARATION
• CERTIFICATE
• ACKNOWLEDGEMENT

1. INTRODUCTION
2. SOURCES OF FRAUD
3. NEGLIGENCE WE DO
4. HOW IT WORKS
5. IDENTITY THEFT
6. TRANSACTION LAUNDERING
7. AN INVESTIGATION ON CREDIT CARD FRAUD DETECTION USING MACHINE LEARNING
8. CREDIT CARD FRAUD DETECTION WITH CLASSIFICATION ALGORITHMS IN PYTHON
9. TABLE OF CONTENTS
10. WHY DO WE NEED TO FIND FRAUD TRANSACTIONS
11. FALSE POSITIVES IN CREDIT CARD FRAUD DETECTION
12. DATA INGESTION AND EXPLORATORY DATA ANALYSIS

INTRODUCTION

We use credit cards frequently in our daily lives, but
sometimes we face unwelcome issues such as transactions
that we did not make ourselves. This is called credit
card fraud.
Sources of fraud

Fraud often originates with unemployed people who have
plenty of free time and who apply their skills
efficiently to carry out these scams. These people
generally operate from local areas, villages, or even
modern urban areas.

They sometimes open offices in local areas and even
involve government officers, which helps them increase
their power.
NEGLIGENCE WE DO

WE TEND TO TRUST THESE PEOPLE AND MISTAKENLY SHARE OUR
CARD DETAILS, FOR EXAMPLE: CARD NUMBER, CVV, AND OTP.
HOW IT WORKS

THE PROCESS IS DESCRIBED IN THE ACCOMPANYING DIAGRAM.
What is credit card
fraud and who becomes
a target of scams
According to the FBI, credit card fraud is “the
unauthorized use of a credit or debit card, or similar
payment tool to fraudulently obtain money or
property.” All players involved in the card-based payment
process can potentially fall victim to scammers,
including:

• cardholders,
• online merchants,
• payment gateway providers,
• payment processing companies,
• credit card payment systems,
• card issuers (issuing banks), and
• acquirers (acquiring banks)
Except for cardholders, whose anti-fraud
measures come down to vigilance and
timely reporting of lost or stolen cards,
all other players rely on various digital tools
designed to combat scams. The importance
of these tools is hard to overstate. For example, if an
online business shows a fraud rate greater
than one percent, card networks like
Mastercard or AmEx may cancel its permission
to accept and process credit card payments.

With all the variety of fraudulent schemes
involving credit cards, they can be roughly
divided into two large groups: identity
theft and transaction laundering.
Identity theft
Credit card fraud is the most common form
of identity theft, affecting more than 10.7
million people annually. It occurs when
someone steals a card or snatches personal
information to perform so-called card-not-
present (CNP) transactions.

Most commonly, ID thieves use a victim’s
identity and payment credentials to

• make purchases a cardholder doesn’t authorize,
• withdraw money from a victim’s existing
account (account takeover),
• apply for a new credit card (fraudulent
application), or
• open a new account.

Criminals may obtain sensitive information
via phishing emails, skimming devices
embedded into card readers, and cyber-
attacks on banks or retailers with a low
fraud control level. But often scammers use
far simpler methods — such as rummaging
through papers carelessly dumped in the
trash or just looking over a person’s
shoulder when a potential victim enters a
PIN.
Transaction laundering
This relatively new and advanced method of money
laundering is also known as undisclosed
aggregation, factoring, or credit card laundering. The
fraud involves a legitimate merchant whose
credentials are used to process payments for illicit or
illegal products and services through a payment
card network.

Criminals may exploit huge online marketplaces to
launder dirty money via fake transactions. Another
scenario is to create an innocent-looking shell
website (say, a toy or clothing store) to actually sell
illegal substances.
Big players: what they do to
protect online payments
Large online merchants and payment service
providers are no strangers to credit card fraud and its
consequences. They have been building their risk
management strategies for years, being among early
adopters of machine learning. Some of these pioneers
share experience with the general public, even giving
open access to their antifraud solutions.
American Express: achieving the
lowest fraud rate
Operating as a credit card issuer, network, and
merchant acquirer, AmEx handles 25 percent of
the credit card activity in the US. This 170-year-old
company deployed its first machine learning
models in 2014, and now also uses deep learning
models to capitalize on the huge datasets
available. AmEx tools monitor $1.2 trillion worth of
transactions a year in real time, demonstrating the
lowest fraud rates in the credit card industry.

For small businesses, the financial giant offers a
free ML-fueled solution called Enhanced
Authorization. Merchants using the technology
report a 60 percent reduction in fraudulent
transactions.
An investigation on credit
card fraud detection using
machine learning
In this article we will analyze how
various machine learning algorithms
perform on balanced and unbalanced
datasets, taking credit card fraud
detection as an example.
CREDIT CARD FRAUD DETECTION
WITH CLASSIFICATION
ALGORITHMS IN PYTHON
Fraudulent transactions and other fraudulent activities are
significant issues in many industries, such as banking and
insurance. For the banking industry in particular,
credit card fraud detection is a pressing issue.
Fraudulent activities hurt these industries’ revenue growth
and cost them their customers’ trust. So these companies
need to find fraudulent transactions before they become a
big problem.

Unlike many other machine learning problems, in credit
card fraud detection the target classes are not
equally distributed. This is popularly known as the class
imbalance problem or the unbalanced data issue, and it
makes the problem even more challenging to solve.
In this article, we will explain how to build
credit card fraud detection using different machine
learning classification algorithms, such as:
• Decision tree algorithm
• Random forest algorithm
You will also get an idea of the impact of unbalanced
data on the model’s performance.
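
To make the class imbalance problem concrete, here is a minimal sketch using scikit-learn. It is not the report's own implementation: the data is synthetic (generated with `make_classification`, roughly 1% "fraud"), and the sample size, feature counts, and `class_weight="balanced"` setting are illustrative assumptions. It shows why accuracy alone is misleading: a baseline that never predicts fraud still scores very high accuracy while catching no fraud at all.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for transaction data: roughly 1% of rows are "fraud" (class 1).
X, y = make_classification(
    n_samples=10_000,
    n_features=10,
    n_informative=5,
    weights=[0.99, 0.01],
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Baseline that always predicts "genuine": high accuracy, zero fraud recall.
baseline_pred = [0] * len(y_test)
baseline_accuracy = accuracy_score(y_test, baseline_pred)
baseline_recall = recall_score(y_test, baseline_pred, zero_division=0)

# The two classifiers named in the text; class_weight is an illustrative choice.
models = {
    "decision tree": DecisionTreeClassifier(class_weight="balanced", random_state=0),
    "random forest": RandomForestClassifier(
        n_estimators=30, class_weight="balanced", random_state=0
    ),
}
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "fraud_recall": recall_score(y_test, pred),
    }

print(f"baseline: accuracy={baseline_accuracy:.3f}, fraud recall={baseline_recall:.3f}")
for name, scores in results.items():
    print(f"{name}: accuracy={scores['accuracy']:.3f}, "
          f"fraud recall={scores['fraud_recall']:.3f}")
```

Comparing fraud recall rather than accuracy is one common way to judge models on unbalanced data; other metrics, such as precision or the area under the precision-recall curve, may suit a given use case better.
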
Why do we need to find
fraud transactions?

For many companies, fraud detection is a big problem
because they discover fraudulent activities only after
suffering heavy losses.
Fraud happens in all industries. We can't say that
only particular companies or industries suffer from
fraudulent activities or transactions.
But for finance-related companies, fraudulent
transactions are an especially serious problem.
So these companies want to detect fraudulent transactions
before they cause significant damage.
Even today, with high-end technology available, 13% of
credit card transactions reportedly involve fraudulent
activity, according to the creditcards website.
A survey paper mentioned that in 1997,
63% of companies had experienced one fraud in the past
two years, and that in 1999, 57% of companies
had experienced at least one fraud in the past year.
The point is not only that fraud is increasing, but
that the ways of carrying out scams are multiplying too.
Reducing false
positives in credit
card fraud detection
Model extracts granular behavioral
patterns from transaction data to
more accurately flag suspicious
activity.
Have you ever used your credit card at a new store or
location only to have it declined? Has a sale ever been
blocked because you charged a higher amount than
usual?

Consumers’ credit cards are declined surprisingly often
in legitimate transactions. One cause is that fraud-
detecting technologies used by a consumer’s bank have
incorrectly flagged the sale as suspicious. Now MIT
researchers have employed a new machine-learning
technique to drastically reduce these false positives,
saving banks money and easing customer frustration.

Using machine learning to detect financial fraud dates
back to the early 1990s and has advanced over the
years. Researchers train models to extract behavioral
patterns from past transactions, called “features,” that
signal fraud. When you swipe your card, the card pings
the model and, if the features match fraud behavior, the
sale gets blocked.

Behind the scenes, however, data scientists must dream
up those features, which mostly center on blanket rules
for amount and location. If any given customer spends
more than, say, $2,000 on one purchase, or makes
numerous purchases in the same day, they may be
flagged. But because consumer spending habits vary,
even in individual accounts, these models are sometimes
inaccurate: A 2015 report from Javelin Strategy and
Research estimates that only one in five fraud
predictions is correct and that the errors can cost a bank
$118 billion in lost revenue, as declined customers then
refrain from using that credit card.
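
The blanket rules described above can be sketched in a few lines of Python. This is only an illustration: the $2,000 limit comes from the text, while the daily purchase cutoff, the toy transactions, and the function name `flag_transactions` are hypothetical. It shows how such rules flag many legitimate purchases, which drives precision (the share of fraud predictions that are correct) down.

```python
from collections import Counter

AMOUNT_LIMIT = 2000.0      # dollars; the "say, $2,000" threshold from the text
MAX_DAILY_PURCHASES = 5    # hypothetical cutoff for "numerous purchases"

# Toy transactions: (card_id, day, amount, is_actually_fraud)
transactions = [
    ("A", "2024-01-01", 2500.0, True),
    ("B", "2024-01-01", 3100.0, False),   # large but legitimate purchase
    ("C", "2024-01-01", 40.0, False),
    ("D", "2024-01-02", 2200.0, False),   # large but legitimate purchase
] + [("E", "2024-01-03", 12.0, False)] * 6  # six small purchases in one day


def flag_transactions(rows):
    """Return one boolean per row: True if either blanket rule fires."""
    per_day = Counter((card, day) for card, day, _, _ in rows)
    return [
        amount > AMOUNT_LIMIT or per_day[(card, day)] > MAX_DAILY_PURCHASES
        for card, day, amount, _ in rows
    ]


flags = flag_transactions(transactions)
flagged = [row for row, hit in zip(transactions, flags) if hit]
true_positives = sum(1 for row in flagged if row[3])
false_positives = len(flagged) - true_positives
precision = true_positives / len(flagged)

print(f"flagged={len(flagged)}, false positives={false_positives}, "
      f"precision={precision:.2f}")
```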

The MIT researchers have developed an “automated
feature engineering” approach that extracts more than
200 detailed features for each individual transaction,
such as whether a user was present during purchases and
the average amount spent on certain days at certain
vendors. By doing so, it can better pinpoint when a
specific card holder’s spending habits deviate from the
norm.

Tested on a dataset of 1.8 million transactions from a
large bank, the model reduced false positive predictions
by 54 percent over traditional models, which the
researchers estimate could have saved the bank
190,000 euros (around $220,000) in lost revenue.

“The big challenge in this industry is false positives,”
says Kalyan Veeramachaneni, a principal research
scientist at MIT’s Laboratory for Information and Decision
Systems (LIDS) and co-author of a paper describing the
model, which was presented at the recent European
Conference for Machine Learning. “We can say there’s a
direct connection between feature engineering and
[reducing] false positives. … That’s the most impactful
thing to improve accuracy of these machine-learning
models.”

Paper co-authors include: lead author Roy Wedge '15, a
former researcher in the Data to AI Lab at LIDS; James
Max Kanter ’15, SM ’15; and Sergio Iglesias Perez of
Banco Bilbao Vizcaya Argentaria.

Extracting “deep” features

Three years ago, Veeramachaneni and Kanter
developed Deep Feature Synthesis (DFS), an automated
approach that extracts highly detailed features from any
data, and decided to apply it to financial transactions.

Enterprises will sometimes host competitions where they
provide a limited dataset along with a prediction problem
such as fraud. Data scientists develop prediction models,
and a cash prize goes to the most accurate model. The
researchers entered one such competition and achieved
top scores with DFS.

However, they realized the approach could reach its full
potential if trained on several sources of raw data. “If you
look at what data companies release, it’s a tiny sliver of
what they actually have,” Veeramachaneni says. “Our
question was, ‘How do we take this approach to actual
businesses?’”

Backed by the Defense Advanced Research Projects
Agency’s Data-Driven Discovery of Models program,
Kanter and his team at Feature Labs (a spinout
commercializing the technology) developed an
open-source library for automated feature extraction,
called Featuretools, which was used in this research.

The researchers obtained a three-year dataset provided
by an international bank, which included granular
information about transaction amount, times, locations,
vendor types, and terminals used. It contained about 900
million transactions from around 7 million individual
cards. Of those transactions, around 122,000 were
confirmed as fraud. The researchers trained and tested
their model on subsets of that data.
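
As a quick worked calculation, the figures quoted above imply an extremely unbalanced dataset. The snippet below only restates the numbers from the text (about 900 million transactions, around 122,000 confirmed frauds); the variable names are illustrative.

```python
total_transactions = 900_000_000   # "about 900 million", from the text
confirmed_fraud = 122_000          # "around 122,000", from the text

fraud_share = confirmed_fraud / total_transactions
transactions_per_fraud = total_transactions / confirmed_fraud

print(f"fraud share: {fraud_share:.4%}")
print(f"roughly one fraud per {transactions_per_fraud:,.0f} transactions")
```
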
In training, the model looks for patterns of transactions,
and among cards, that match cases of fraud. It then
automatically combines all the different variables it finds
into “deep” features that provide a highly detailed look at
each transaction. From the dataset, the DFS model
extracted 237 features for each transaction. Those
represent highly customized variables for card holders,
Veeramachaneni says. “Say, on Friday, it’s usual for a
customer to spend $5 or $15 at Starbucks,” he
says. “That variable will look like, ‘How much money was
spent in a coffee shop on a Friday morning?’”
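
A feature like the coffee shop example can be written by hand with pandas, which gives a feel for what an automated tool would generate in bulk. This is a hand-written sketch, not Featuretools or DFS itself; the column names (`card_id`, `timestamp`, `vendor_type`, `amount`) and the sample rows are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical transaction log; column names are illustrative assumptions.
tx = pd.DataFrame(
    {
        "card_id": ["c1", "c1", "c1", "c1", "c2"],
        "timestamp": pd.to_datetime(
            [
                "2024-01-05 08:15",  # Friday morning
                "2024-01-12 09:40",  # Friday morning
                "2024-01-12 18:30",  # Friday evening
                "2024-01-13 08:00",  # Saturday morning
                "2024-01-05 07:50",  # Friday morning
            ]
        ),
        "vendor_type": [
            "coffee_shop", "coffee_shop", "coffee_shop", "coffee_shop", "grocery",
        ],
        "amount": [5.0, 15.0, 7.0, 6.0, 42.0],
    }
)

# dayofweek uses Monday=0, so Friday is 4; "morning" is taken to mean before noon.
is_friday_morning = (tx["timestamp"].dt.dayofweek == 4) & (tx["timestamp"].dt.hour < 12)
is_coffee = tx["vendor_type"] == "coffee_shop"

# Per-card feature: total and average spend in coffee shops on Friday mornings.
friday_coffee_spend = (
    tx[is_friday_morning & is_coffee].groupby("card_id")["amount"].agg(["sum", "mean"])
)
print(friday_coffee_spend)
```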

It then creates an if/then decision tree for that account of
features that do and don’t point to fraud. When a new
transaction is run through the decision tree, the model
decides in real time whether or not the transaction is
fraudulent.
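
The if/then structure of a decision tree can be illustrated with scikit-learn, whose `export_text` function prints the learned splits as nested rules. This is a toy sketch, not the researchers' model: the two features (`amount`, `miles_from_home`) and the eight labelled transactions are invented for illustration.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy features per transaction: [amount, miles_from_home]; label 1 means fraud.
X = [[5, 1], [12, 2], [15, 1], [900, 300], [1200, 450], [8, 3], [1500, 500], [20, 2]]
y = [0, 0, 0, 1, 1, 0, 1, 0]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Print the learned splits as human-readable if/then rules.
rules = export_text(tree, feature_names=["amount", "miles_from_home"])
print(rules)

# Run a new, unseen transaction through the tree.
new_transaction = [[1100, 400]]
prediction = int(tree.predict(new_transaction)[0])
print("fraud" if prediction == 1 else "genuine")
```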

Pitted against a traditional model used by a bank, the
DFS model generated around 133,000 false positives
versus the traditional model’s 289,000, about 54 percent
fewer incidents. That, along with a smaller number of false
negatives (actual fraud that wasn’t detected), could save
the bank an estimated 190,000 euros, according to the
researchers.
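
The 54 percent figure follows directly from the two false positive counts quoted above, as this short calculation shows (variable names are illustrative):

```python
traditional_false_positives = 289_000  # from the text
dfs_false_positives = 133_000          # from the text

avoided = traditional_false_positives - dfs_false_positives
relative_reduction = avoided / traditional_false_positives

print(f"false positives avoided: {avoided:,}")
print(f"relative reduction: {relative_reduction:.1%}")
```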

Iglesias notes he and his colleagues at BBVA have
consistently been able to reproduce the MIT team’s
results using the DFS model with additional card and
business data, with a minimum increase in computational
cost.

Stacking primitives

The backbone of the model consists of creatively
stacked “primitives,” simple functions that take two inputs
and give an output. For example, calculating an average
of two numbers is one primitive. That can be combined
with a primitive that looks at the time stamp of two
transactions to get an average time between
transactions. Stacking another primitive that calculates
the distance between two addresses from those
transactions gives an average time between two
purchases at two specific locations. Another primitive
could determine if the purchase was made on a weekday
or weekend, and so on.
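
The idea of stacking primitives can be sketched with plain Python functions. This is an illustration of the concept only, not the DFS implementation: the function names `time_between`, `average`, and `is_weekend`, and the three sample timestamps, are hypothetical.

```python
from datetime import datetime


def time_between(earlier, later):
    """Primitive: minutes elapsed between two transaction timestamps."""
    return (later - earlier).total_seconds() / 60


def average(a, b):
    """Primitive: average of two numbers."""
    return (a + b) / 2


def is_weekend(timestamp):
    """Primitive: True if the purchase fell on a Saturday or Sunday."""
    return timestamp.weekday() >= 5


# Three hypothetical purchases on one card.
t1 = datetime(2024, 1, 5, 8, 0)    # Friday
t2 = datetime(2024, 1, 5, 8, 30)   # Friday
t3 = datetime(2024, 1, 6, 10, 0)   # Saturday

# Stacking: the outputs of one primitive become the inputs of another,
# giving an "average time between transactions" feature.
avg_gap_minutes = average(time_between(t1, t2), time_between(t2, t3))
weekend_flags = [is_weekend(t) for t in (t1, t2, t3)]

print(f"average gap: {avg_gap_minutes:.0f} minutes")
print(f"weekend purchases: {weekend_flags}")
```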

“Once we have those primitives, there is no stopping us
for stacking them … and you start to see these
interesting variables you didn’t think of before. If you dig
deep into the algorithm, primitives are the secret sauce,”
Veeramachaneni says.

One important feature that the model generates,
Veeramachaneni notes, is the distance between the
locations of two purchases and whether they happened
in person or remotely. If someone buys something
at, say, the Stata Center in person and, a half hour later,
buys something in person 200 miles away, then there is a
high probability of fraud. But if one purchase occurred
through mobile phone, the fraud probability drops.
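
A rough version of this distance-and-time check can be written with the haversine formula for great-circle distance. This is a sketch under stated assumptions, not the feature the model actually generates: the 100 mph plausibility cutoff, the dictionary layout, and the approximate coordinates (Cambridge, MA and New York City, a bit under 200 miles apart) are all chosen for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8
MAX_PLAUSIBLE_MPH = 100.0  # hypothetical cutoff for realistic travel speed


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


def looks_impossible(first, second):
    """True if both purchases were in person and the implied speed is implausible."""
    if not (first["in_person"] and second["in_person"]):
        return False  # a remote purchase does not require the card holder to travel
    hours = second["hour"] - first["hour"]
    if hours <= 0:
        return False
    miles = haversine_miles(first["lat"], first["lon"], second["lat"], second["lon"])
    return miles / hours > MAX_PLAUSIBLE_MPH


# Approximate coordinates, used only as an example.
cambridge = {"lat": 42.3616, "lon": -71.0906, "hour": 10.0, "in_person": True}
new_york = {"lat": 40.7128, "lon": -74.0060, "hour": 10.5, "in_person": True}
new_york_mobile = dict(new_york, in_person=False)

distance = haversine_miles(cambridge["lat"], cambridge["lon"],
                           new_york["lat"], new_york["lon"])
print(f"distance: {distance:.0f} miles")
print("both in person:", looks_impossible(cambridge, new_york))
print("second via mobile:", looks_impossible(cambridge, new_york_mobile))
```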

“There are so many features you can extract that
characterize behaviors you see in past data that relate to
fraud or nonfraud use cases,” Veeramachaneni says.

“In fact, this automated feature synthesis technique, and
the overall knowledge provided by MIT in this project,
has shown us a new way of refocusing research in other
challenges in which we initially have a reduced set of
features. For example, we are obtaining equally
promising results in the detection of anomalous behavior
in internal network traffic or in market operations, just to
mention two [examples],” Iglesias adds.

Rules-based fraud detection (top) vs.
classification decision tree-based
detection (bottom): The risk scoring in
the former model is calculated using
policy-based, manually crafted rules and
their corresponding weights. In contrast,
the decision tree classifies observations
based on attribute splits learned from the
statistical properties of the training data.
Automated credit card fraud detection is generally
implemented using one of the following methods:
• Rule-based detection - based on hard-coded rules, this
approach requires a substantial amount of manual work to
define the majority of the possible fraud conditions and to
put rules in place that trigger alarms or block the
suspicious transaction. An advantage of this approach is
that its decisions are inherently explainable - it is
straightforward to identify the rule that flagged a
specific transaction as fraudulent. The drawbacks are that
rule-based detection is computationally intensive and is
usually implemented as batch (or offline) scoring.
Keeping the rules updated and constantly scanning for
false negatives that slip through the cracks is also
challenging from a maintenance perspective.
• Machine Learning-based detection - using statistical
learning is another approach that is gaining popularity,
mostly because it is less laborious. It can be implemented
as either an unsupervised (e.g. anomaly detection) or a
supervised (classification) model, and requires less
maintenance as the model can be automatically retrained
to keep its associations up to date. It is also suitable for
online applications, as the scoring function is usually very
lightweight. A drawback of the ML approach is that
for certain algorithms (e.g. deep learning) there is no
guaranteed explainability.
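
The rule-based approach, with manually crafted rules and weights, can be sketched as follows. Every rule, weight, and threshold here is a hypothetical example rather than a real policy. The sketch also shows why such decisions are easy to explain: the function returns the names of the rules that fired alongside the risk score.

```python
# Each rule: (name, check function, weight). All values are hypothetical.
RULES = [
    ("amount_over_2000", lambda t: t["amount"] > 2000, 0.5),
    ("foreign_country", lambda t: t["country"] != t["home_country"], 0.3),
    ("night_time", lambda t: t["hour"] < 6, 0.2),
]
BLOCK_THRESHOLD = 0.6  # hypothetical score at which a transaction is blocked


def score(transaction):
    """Return (risk score, names of the rules that fired)."""
    fired = [(name, weight) for name, check, weight in RULES if check(transaction)]
    risk = sum(weight for _, weight in fired)
    return risk, [name for name, _ in fired]


suspicious = {"amount": 2500.0, "country": "FR", "home_country": "US", "hour": 14}
ordinary = {"amount": 35.0, "country": "US", "home_country": "US", "hour": 12}

for label, tx in (("suspicious", suspicious), ("ordinary", ordinary)):
    risk, reasons = score(tx)
    decision = "BLOCK" if risk >= BLOCK_THRESHOLD else "ALLOW"
    print(f"{label}: risk={risk:.1f}, decision={decision}, reasons={reasons}")
```
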
In this blog post we will show an implementation of an
ML-based anomaly detection based on XGBoost. We will
go through a typical ML pipeline, where we do data
ingestion, exploratory data analysis, feature engineering,
model training and evaluation.
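
The overall shape of such a pipeline can be sketched as below. Note the substitutions: to keep the example runnable without extra packages, it uses scikit-learn's `GradientBoostingClassifier` as a stand-in for XGBoost (with the `xgboost` package installed, its `XGBClassifier` could be swapped in), and it trains on synthetic data rather than the real dataset. The sample size, class ratio, and hyperparameters are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

# Synthetic, unbalanced stand-in for the transaction data (about 3% "fraud").
X, y = make_classification(
    n_samples=5_000,
    n_features=10,
    n_informative=5,
    weights=[0.97, 0.03],
    random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Gradient-boosted trees; a stand-in for the XGBoost model named in the text.
model = GradientBoostingClassifier(n_estimators=50, random_state=42)
model.fit(X_train, y_train)

# Score the held-out transactions and evaluate with the area under the
# precision-recall curve, which is more informative than accuracy here.
fraud_probability = model.predict_proba(X_test)[:, 1]
pr_auc = average_precision_score(y_test, fraud_probability)
fraud_share = y_test.mean()  # the score a random ranking would roughly achieve

print(f"fraud share in test set: {fraud_share:.3f}")
print(f"average precision (PR AUC): {pr_auc:.3f}")
```
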
Data Ingestion and Exploratory Data Analysis
We start by importing the Python libraries needed for the
pipeline. We then read and inspect the sample dataset.
This dataset contains anonymized credit card transaction
data and is freely available from Kaggle. The data has
been collected as part of a research collaboration between
Worldline and the Machine Learning Group of
Université Libre de Bruxelles. The dataset contains
transactions made by European credit card holders in
September 2013, and has been anonymized - Features V1,
V2, ..., V28 are results from applying PCA on the raw
data. The only intact features are Time and Amount. The
class label is titled Class where 0 denotes a genuine
transaction and 1 signifies fraud.
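
A minimal ingestion step for a file with this layout might look like the sketch below. The function name `load_transactions` is hypothetical, and the three in-memory rows are made-up placeholders with the column layout described above, so the example runs without downloading anything. With the real data, the function could be pointed at the downloaded CSV file instead (the file name `creditcard.csv` is an assumption).

```python
import io

import pandas as pd

# Column layout described in the text: Time, V1..V28, Amount, Class.
EXPECTED_COLUMNS = ["Time"] + [f"V{i}" for i in range(1, 29)] + ["Amount", "Class"]


def load_transactions(source):
    """Read the CSV and check that it has the expected columns."""
    df = pd.read_csv(source)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return df


# Tiny in-memory stand-in with placeholder values (not real data).
header = ",".join(EXPECTED_COLUMNS)
rows = [
    ",".join(["0"] + ["0.1"] * 28 + ["149.62", "0"]),
    ",".join(["1"] + ["-0.2"] * 28 + ["2.69", "0"]),
    ",".join(["2"] + ["1.5"] * 28 + ["529.00", "1"]),
]
sample_csv = io.StringIO("\n".join([header] + rows))

df = load_transactions(sample_csv)  # or load_transactions("creditcard.csv")

# Basic exploratory checks: shape, class balance, and amount statistics.
class_counts = df["Class"].value_counts().to_dict()
fraud_share = df["Class"].mean()
print(df.shape)
print(class_counts, f"fraud share = {fraud_share:.2%}")
print(df["Amount"].describe())
```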
