
CREDIT CARD FRAUD DETECTION USING MACHINE LEARNING

Ramya R

Assistant Professor, Department of Computer Science and Engineering, Dhanalakshmi Srinivasan College of Engineering.

Abstract:

Credit card fraud is one of the most common types of fraud, and it causes significant
financial losses to individuals and companies. Machine learning algorithms have been widely
used to detect credit card fraud due to their ability to learn from historical data and identify
fraudulent patterns. The dataset used in this study was obtained from a financial institution
and contained both fraudulent and non-fraudulent transactions.

We used various pre-processing techniques such as data cleaning, feature selection, and
normalization to prepare the dataset for the algorithm. We then trained the random forest
algorithm on the pre-processed dataset and evaluated its performance using various metrics
such as accuracy, precision, recall, and F1-score. The results showed that the random forest
algorithm performed well in detecting credit card fraud, achieving an accuracy of 98.5%. Our
findings suggest that the random forest algorithm can be an effective tool for credit card fraud
detection and can help financial institutions prevent fraudulent transactions and minimize
financial losses.

Keywords: Cybercrime, Dataset, Data Pre-processing, Random Forest Algorithm, Credit Card.

I. INTRODUCTION

Financial fraud is increasing in the modern communication world, where transactions are completed within seconds and billions of dollars are stolen. In this way companies and financial institutions lose their profit. Most bank transactions are now carried out online, protected by a username and password, and banks issue credit cards for purchasing and transaction purposes. The credit card has become more and more relevant to a person's daily life, and a person's behaviour can be understood from their credit card usage. Different software systems have been developed to detect such fraud cases, but they may not remain effective for many more years. We therefore move to the next stage and detect fraudulent credit card cases with a machine learning approach. A machine learning approach depends on the performance of the chosen algorithm, so here we use the highly accurate Random Forest algorithm, one of the best algorithms for classification. The analysis is carried out by choosing different attributes of the credit card data.

Machine learning (ML) is the scientific study of algorithms and statistical models that computer systems use to perform a specific task effectively without explicit instructions, relying on models and inference instead. It is seen as a subset of artificial intelligence. Machine learning algorithms build a mathematical model of sample data, known as "training data", in order to make predictions or decisions without being explicitly programmed to perform the task. Machine learning algorithms are used in applications such as email filtering, detection of network intruders, and computer vision, where it is infeasible to develop an algorithm of specific instructions for performing the task. Machine learning is closely related to computational statistics, which focuses on making predictions using computers. The study of mathematical optimization delivers methods, theory and application domains to the field of machine learning. Data mining is a field of study within machine learning that focuses on exploratory data analysis through unsupervised learning. In its application across business problems, machine learning is also referred to as predictive analytics.

Fig. 1: Machine learning process

Traditionally, data analysis was characterized by trial and error, an approach that becomes impossible when data sets are large and heterogeneous. Machine learning offers a solution by proposing clever alternatives for analyzing huge volumes of data. By developing fast and efficient algorithms and data-driven models for real-time processing of data, machine learning is able to produce accurate results and analysis.

II. MACHINE LEARNING TECHNIQUES

Machine learning tasks are classified into several broad categories. In supervised learning, the algorithm builds a mathematical model from a set of data that contains both the inputs and the desired outputs. For example, if the task were determining whether an image contained a certain object, the training data for a supervised learning algorithm would include images with and without that object (the input), and each image would have a label (the output) designating whether it contained the object. In special cases, the input may be only partially available or restricted to special feedback. Semi-supervised learning algorithms develop mathematical models from incomplete training data, where a portion of the sample inputs are missing the desired output.

Fig. 2: Supervised learning method
III. PROPOSED SYSTEM

In the proposed system, we apply the random forest algorithm to classify the credit card dataset. Random Forest is an algorithm for classification and regression; in summary, it is a collection of decision tree classifiers. Random forest has an advantage over a single decision tree in that it corrects the habit of overfitting to the training set. A subset of the training set is sampled randomly to train each individual tree and a decision tree is built; each node then splits on a feature selected from a random subset of the full feature set. Even for large data sets with many features and data instances, training is extremely fast in random forest because each tree is trained independently of the others. The Random Forest algorithm has been found to provide a good estimate of the generalization error and to be resistant to overfitting.

IV. SYSTEM APPROACH

➢ Data Pre-Processing

• Data preprocessing mainly includes data cleaning, integration, transformation and reduction, and obtains the training sample data needed.

• It is a data mining technique that transforms raw data into an understandable format.

Steps in Data Preprocessing (a minimal sketch of these steps is given after the list):

1. Import libraries
2. Read data
3. Check for missing values
4. Check for categorical data
5. Standardize the data
6. PCA transformation
7. Data splitting
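The paper does not include its implementation code, so the following is only a minimal sketch of steps 1–6 under the assumption of a pandas/scikit-learn workflow; the file name "creditcard.csv" and the label column "Class" are illustrative placeholders rather than details taken from the paper.

```python
# Minimal preprocessing sketch (assumed pandas/scikit-learn toolchain;
# "creditcard.csv" and the "Class" label column are placeholders).
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Steps 1-2: import libraries and read the data.
data = pd.read_csv("creditcard.csv")

# Step 3: check for missing values in each column.
print(data.isnull().sum())

# Step 4: check column types for categorical data.
print(data.dtypes)

# Step 5: standardize the feature columns.
X = data.drop("Class", axis=1)
y = data["Class"]
X_scaled = StandardScaler().fit_transform(X)

# Step 6: PCA transformation, keeping enough components for 95% of the variance.
X_pca = PCA(n_components=0.95).fit_transform(X_scaled)
```

Step 7 (data splitting) is illustrated after the next subsection.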
➢ Training Data and Test Data

The training data set in machine learning is used to train the model to carry out various actions. Detailed features are fetched from the training set and combined into the prototype; if the training set is prepared correctly, the model will be able to learn from it. The test data are then used to check whether the trained model responds correctly or not.
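Step 7 of the preprocessing list (data splitting) can be sketched as a continuation of the example above; the stratified split is an assumption added here so that the rare fraud class is represented in both subsets, not a detail stated in the paper.

```python
from sklearn.model_selection import train_test_split

# Hold out 20% of the data for testing; stratifying on the class label keeps
# the small fraction of fraudulent transactions in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X_pca, y, test_size=0.2, stratify=y, random_state=42
)
print(X_train.shape, X_test.shape)
```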

➢ Model Creation

• Contextualise machine learning in your organisation.
• Explore the data and choose the type of algorithm.
• Prepare and clean the dataset.
• Split the prepared dataset and perform cross validation.
• Perform machine learning optimisation.
• Deploy the model.

➢ Model Prediction

Predictive modeling is a statistical technique that uses machine learning and data mining to predict and forecast likely future outcomes with the aid of historical and existing data. It works by analyzing current and historical data and projecting what it learns onto a model generated to forecast likely outcomes. In this project, the final prediction is whether a transaction is legitimate or fraudulent. In future studies, the success ratio can be increased by strengthening the data set, and performance can be improved further by developing different deep learning models.

➢ Algorithm Implementation

Random Forest Algorithm: Random Forest is one of the most popular and commonly used algorithms among data scientists. It is a supervised machine learning algorithm that is widely used in classification and regression problems. It builds decision trees on different samples and takes their majority vote for classification and their average in the case of regression.

One of the most important features of the Random Forest algorithm is that it can handle data sets containing continuous variables, as in the case of regression, and categorical variables, as in the case of classification. It performs well for both classification and regression tasks. In this work, we describe the working of random forest and apply it to a classification task.

Fig. 3: Working of the Random Forest algorithm
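The paper does not show how the model was trained, so the sketch below assumes a scikit-learn implementation and continues from the split produced in the earlier preprocessing sketch; the hyperparameter values are illustrative, not the authors' settings. It trains the random forest and reports the four metrics used in the study.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

# Train a random forest on the pre-processed training split.
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

# Evaluate on the held-out test split with the metrics used in the paper.
y_pred = rf.predict(X_test)
print("accuracy :", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred))
print("recall   :", recall_score(y_test, y_pred))
print("f1-score :", f1_score(y_test, y_pred))
```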
Working of the Random Forest Algorithm

Before understanding the working of the random forest algorithm in machine learning, we must look into the ensemble learning technique. Ensemble simply means combining multiple models; thus, a collection of models is used to make predictions rather than an individual model.

Fig. 4: Block diagram

Ensemble uses two types of methods:

1. Bagging – It creates different training subsets from the sample training data with replacement, and the final output is based on majority voting. Random Forest is an example.

2. Boosting – It combines weak learners into strong learners by creating sequential models such that the final model has the highest accuracy. AdaBoost and XGBoost are examples.

Steps Involved in the Random Forest Algorithm (a small illustrative sketch follows these steps):

Step 1: In the random forest model, a subset of data points and a subset of features is selected for constructing each decision tree. Simply put, n random records and m features are taken from a data set having k records.

Step 2: Individual decision trees are constructed for each sample.

Step 3: Each decision tree generates an output.

Step 4: The final output is based on majority voting for classification or averaging for regression.
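As an illustration only (not the library's internal implementation and not the authors' code), Steps 1–4 can be mirrored by a small function that builds each tree on a bootstrap sample with a random subset of features and then takes a majority vote; class labels are assumed to be small non-negative integers such as 0/1.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def simple_random_forest(X, y, X_new, n_trees=25, seed=0):
    """Illustrative bagging + majority vote over decision trees."""
    X, y, X_new = np.asarray(X), np.asarray(y), np.asarray(X_new)
    rng = np.random.default_rng(seed)
    k, n_features = X.shape
    m = max(1, int(np.sqrt(n_features)))      # m features per tree
    votes = []
    for _ in range(n_trees):
        # Step 1: sample n (= k) random records with replacement and m random features.
        rows = rng.integers(0, k, size=k)
        cols = rng.choice(n_features, size=m, replace=False)
        # Step 2: build an individual decision tree on this sample.
        tree = DecisionTreeClassifier(random_state=0)
        tree.fit(X[rows][:, cols], y[rows])
        # Step 3: each tree generates its own output for the new records.
        votes.append(tree.predict(X_new[:, cols]))
    # Step 4: majority vote across the trees (classification).
    votes = np.array(votes)
    return np.apply_along_axis(lambda v: np.bincount(v.astype(int)).argmax(), 0, votes)
```

In practice, RandomForestClassifier performs the sampling, feature selection, and voting internally; the function above only makes the four steps explicit, for example via simple_random_forest(X_train, y_train, X_test).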
V. OBSERVATIONAL RESULTS AND ANALYSIS

In conclusion, credit card fraud is a serious problem that affects individuals and companies worldwide. Machine learning algorithms have shown great potential in detecting fraudulent transactions and minimizing financial losses. In this study, we used the random forest algorithm to detect credit card fraud, and our results showed that it performed well in identifying fraudulent patterns in the dataset. The accuracy achieved by the algorithm was 98.5%, indicating that it can be a useful tool for financial institutions in detecting credit card fraud. However, it is important to note that no algorithm is perfect, and there is always room for improvement. Future research can focus on improving the performance of the random forest algorithm by using advanced preprocessing techniques, incorporating more features, and exploring other machine learning algorithms. Overall, the use of machine learning algorithms for credit card fraud detection can help financial institutions prevent fraudulent transactions and protect their customers from financial losses.

Fig. 5: Observation of the result

Fig. 6(a): Logistic Regression

Fig. 6(b): SVM

Fig. 6(c): Random Forest Classifier
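The comparison summarised in Fig. 5 and Fig. 6(a)–(c) could be reproduced along the following lines. This is a hedged sketch that assumes the same scikit-learn workflow and the train/test split from the preprocessing sketch; it is not the authors' exact experimental setup, and the SVM is instantiated with default settings purely for illustration.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# Fit the three classifiers compared in Fig. 6 on the same split
# and print per-class precision, recall, and F1-score.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name)
    print(classification_report(y_test, model.predict(X_test), digits=4))
```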

