
FRAUD DETECTION IN FINANCIAL SECTOR

By

Anupam Kant Chaudhary (2000910130017)

Akash Ojha (2000910130007)

Akhil Kumar Singh (2000910130008)

Mayank Majila (2000910130063)

Submitted to the Department of Information Technology

in partial fulfillment of the requirements

for the degree of

Bachelor of Technology

in

Information Technology

JSS Academy Of Technical Education

Dr APJ Abdul Kalam Technical University

DECLARATION

We hereby declare that this submission is our own work and that, to the best of our
knowledge and belief, it contains no material previously published or written by another
person nor material which to a substantial extent has been accepted for the award of
any other degree or diploma of the university or other institute of higher learning, except
where due acknowledgment has been made in the text.

Name : Anupam Kant Chaudhary

Roll No.: 2000910130017

Date :

Name : Akash Ojha

Roll No.: 2000910130007

Date :

Name : Akhil Kumar Singh

Roll No.: 2000910130008

Date :

Name : Mayank Majila

Roll No.: 2000910130063

Date :

CERTIFICATE

This is to certify that the Project Report entitled “Fraud Detection In Financial Sector”, which is
submitted by Anupam, Akash, Akhil and Mayank in partial fulfillment of the requirement for the
award of the degree of B. Tech. in the Department of Information Technology of Dr APJ Abdul Kalam
Technical University, is a record of the candidates' own work carried out by them under my
supervision. The matter embodied in this thesis is original and has not been submitted for the
award of any other degree.

Supervisor Mrs Vaishali Tyagi

Date 16th December’23

ACKNOWLEDGEMENT

It gives us a great sense of pleasure to present the report of the B. Tech. Project undertaken during
the B. Tech. Final Year. We owe a special debt of gratitude to Professor Vaishali Tyagi, Department of
Information Technology, JSS Academy Of Technical Education, Noida, for her constant support and
guidance throughout the course of our work. Her sincerity, thoroughness and perseverance have been a
constant source of inspiration for us. It is only through her cognizant efforts that our endeavors
have seen the light of day.

We also take the opportunity to acknowledge the contribution of Professor Dhiraj Pandey, Head,
Department of Information Technology, JSS Academy Of Technical Education, Noida, for his full
support and assistance during the development of the project.

We would also not like to miss the opportunity to acknowledge the contribution of all faculty members
of the department for their kind assistance and cooperation during the development of our project.
Last but not least, we acknowledge our friends for their contribution to the completion of the project.

Name : Anupam Kant Chaudhary

Roll No.: 2000910130017

Date :

Name : Akash Ojha

Roll No.: 2000910130007

Date :

Name : Akhil Kumar Singh

Roll No.: 2000910130008

Date :

Name : Mayank Majila

Roll No.: 2000910130063

Date :
ABSTRACT

Payments-related fraud is a key concern for cyber-crime agencies, and recent research has shown
that machine learning techniques can be applied successfully to detect fraudulent transactions
in large amounts of payments data. Such techniques can detect fraudulent transactions that
human auditors may not be able to catch, and can do so in real time.

In this project, we apply multiple supervised machine learning techniques to the problem of
fraud detection using a publicly available simulated payment transactions dataset. We aim to
demonstrate how supervised ML techniques can be used to classify highly class-imbalanced data
with high accuracy.

We demonstrate that exploratory analysis can be used to separate fraudulent and non-fraudulent
transactions. We also demonstrate that, for a well-separated dataset, tree-based algorithms
like Random Forest work much better than Logistic Regression.

TABLE OF CONTENTS

DECLARATION
CERTIFICATE
ACKNOWLEDGEMENT
ABSTRACT
LIST OF TABLES
LIST OF FIGURES
LIST OF SYMBOLS
LIST OF ABBREVIATIONS
CHAPTER 1
CHAPTER 2
CHAPTER 3
CHAPTER 4 (CONCLUSIONS)
APPENDIX A
APPENDIX B
REFERENCES

LIST OF TABLES

LIST OF FIGURES

LIST OF SYMBOLS

[x] Integer value of x.

≠ Not Equal

∈ Belongs to

€ Euro- A Currency

_ Optical distance

_o Optical thickness or optical half thickness

(Example)

LIST OF ABBREVIATIONS

AAM Active Appearance Model

ICA Independent Component Analysis

ISC Increment Sign Correlation

PCA Principal Component Analysis

ROC Receiver Operating Characteristics

CHAPTER 1

1.1 Introduction
Digital payments of various forms are rapidly increasing across the world, and payments companies
are experiencing rapid growth in their transaction volumes. For example, PayPal processed
~$578 billion in total payments in 2018. Along with this transformation, there is also a rapid
increase in the financial fraud that happens in these payment systems.

Preventing online financial fraud is a vital part of the work done by cyber security and
cyber-crime teams. Most banks and financial institutions have dedicated teams of dozens of analysts
building automated systems to analyze transactions taking place through their products and
flag potentially fraudulent ones. It is therefore essential to explore approaches to detecting
fraudulent entries/transactions in large amounts of data in order to be better prepared to
solve cyber-crime cases.

1.1.1 Motivation

1. Escalating Fraud Incidents: The financial industry has witnessed a surge in
fraudulent activities in recent years, driven by increasingly sophisticated tactics
employed by criminals. This necessitates the constant evolution of fraud detection
methodologies to stay ahead of perpetrators.
2. Financial Implications: Fraudulent transactions result in substantial financial losses
for individuals, businesses, and financial institutions. These losses have wide-ranging
economic implications and undermine confidence in the financial system.
3. Technological Advancements: Advances in technology, including artificial
intelligence, machine learning, big data analytics, and real-time processing, have
opened new avenues for the development of more efficient and accurate fraud
detection systems.

1.1.2 Objective

Advanced Algorithm Development: Design and implement cutting-edge algorithms
capable of identifying fraudulent transactions and activities, even as fraudsters evolve
their tactics.

False Positive Minimization: Minimize the occurrence of false positives to ensure
legitimate transactions are not erroneously flagged as fraudulent, preserving the
customer experience.

Customer Security Enhancement: Improve the overall security of financial
transactions to protect customers from falling victim to financial fraud schemes. The
system will use machine learning techniques to generate alerts, recommendations, actions, or
decisions based on the fraud risk level of each entity.

Regulatory Compliance: Ensure that financial institutions are equipped with the tools
and systems required to meet regulatory obligations regarding fraud detection and
reporting.

Improved Accuracy: The system will use machine learning models to classify, score, rank, or
predict the fraud risk level of each transaction, customer, account, device, or location.

1.1.3 Scope

In this project we designed a model to detect fraudulent activity in financial transactions.
The system is intended to provide most of the essential features required to distinguish
fraudulent from legitimate transactions. As technology changes, it becomes difficult to model
and track the patterns of fraudulent transactions. With the rise of machine learning,
artificial intelligence and other relevant fields of information technology, it becomes
feasible to automate this process and to save some of the intensive labor that is put into
detecting credit card fraud.
Phase 1: Business Understanding
Credit card fraud is increasing drastically every year. Many people have their credit breached
by fraudsters, which affects their daily lives, since paying by credit card is similar to
taking a loan. If the problem is not solved, many people will be left with large debts that
they cannot pay back, making it hard for them to afford necessary products; in the long run,
failing to repay might even lead to imprisonment. The problem addressed is therefore the
detection of fraudulent credit card transactions, in order to stop these breaches and ensure
customers' security.
Phase 2: Data Understanding
In the data understanding phase, it was critical to obtain a high-quality dataset, as the model
is based on it. The dataset was explored closely, and the description of the whole dataset and
of each attribute was read, which gave the knowledge needed to confirm its quality. It is also
important to have a dataset that contains a mix of transaction types (fraudulent and real), a
class label that states the type of each transaction, and identifiers that clarify the reason
behind the classification of the transaction type. We made sure to follow all of these points
during the search for the most suitable dataset.
Phase 3: Data Preparation
After the most suitable dataset has been chosen, the preparation phase begins. Preparing the
dataset includes selecting the wanted attributes or variables, cleaning the data by excluding
null rows, deleting duplicated variables, treating outliers if necessary, and transforming
data types to the wanted type. Data merging can be performed as well, where two or more
attributes are merged. All of these alterations lead to the wanted result, which is to make
the data ready to be modelled.
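
The preparation steps described above can be sketched in plain Python. This is only an
illustration under stated assumptions, not the project's actual pipeline: the column names
(amount, type, is_fraud) and the sample rows are hypothetical.

```python
# Minimal data-preparation sketch. Column names and rows are hypothetical.

# Wanted attributes and the type each one should be converted to.
CONVERTERS = {"amount": float, "type": str, "is_fraud": int}

RAW_ROWS = [
    {"amount": "120.50", "type": "PAYMENT", "is_fraud": "0", "note": "x"},
    {"amount": "120.50", "type": "PAYMENT", "is_fraud": "0", "note": "y"},  # duplicate
    {"amount": None, "type": "TRANSFER", "is_fraud": "1", "note": "z"},     # null value
    {"amount": "9800.00", "type": "TRANSFER", "is_fraud": "1", "note": ""},
    {"amount": "35.00", "type": "CASH_OUT", "is_fraud": "0", "note": ""},
]


def prepare(rows):
    """Select wanted columns, drop null and duplicate rows, convert types."""
    cleaned, seen = [], set()
    for row in rows:
        # Keep only the wanted attributes.
        selected = {name: row.get(name) for name in CONVERTERS}
        # Exclude rows with any missing (null or empty) value.
        if any(value is None or value == "" for value in selected.values()):
            continue
        # Exclude rows that duplicate one already kept.
        key = tuple(selected[name] for name in CONVERTERS)
        if key in seen:
            continue
        seen.add(key)
        # Transform each field to its wanted type.
        cleaned.append({name: CONVERTERS[name](selected[name]) for name in CONVERTERS})
    return cleaned


prepared = prepare(RAW_ROWS)
print(len(prepared), "rows kept out of", len(RAW_ROWS))
```

Outlier treatment and attribute merging are left out of the sketch because the text does not
say how they were performed.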
Phase 4: Modelling
Four machine learning models will be used in the modelling phase: KNN, SVM, Logistic
Regression and Naïve Bayes. A comparison of the results will be presented later in the report
to determine which technique is most suited to detecting fraudulent credit card transactions.
The dataset will be split in a ratio of 80:20: the training set will be 80% of the data and
the testing set the remaining 20%.
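
The 80:20 split can be sketched as follows. Stratifying by class is an assumption added here
(the text only gives the ratio); it is a common choice for imbalanced data because it keeps
the rare fraud cases present in both sets. The label counts are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_ratio=0.2, seed=42):
    """Return (train_idx, test_idx), splitting each class at the same ratio."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Keep at least one sample of every class in the test set.
        n_test = max(1, round(len(indices) * test_ratio))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical imbalanced labels: 990 legitimate (0) and 10 fraudulent (1).
labels = [0] * 990 + [1] * 10
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), "training rows,", len(test_idx), "testing rows")
print(sum(labels[i] for i in test_idx), "fraud cases in the testing set")
```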
Phase 5: Evaluation and Deployment
The final phase will evaluate the models by presenting their efficiency. The accuracies of the
models will be presented, along with any observations, to find the best and most suitable
model for detecting fraudulent transactions made by credit card.
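
When fraud is rare, accuracy alone can be misleading, so it may help to report precision and
recall next to it. The sketch below uses hypothetical predictions (not results from this
project) to show a model that never flags fraud still scoring 99.5% accuracy.

```python
def confusion_counts(y_true, y_pred):
    """Count outcomes with fraud (1) as the positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def evaluate(y_true, y_pred):
    """Return accuracy, precision, recall and F1 as a dictionary."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical testing set: 995 legitimate and 5 fraudulent transactions.
y_true = [0] * 995 + [1] * 5
# Baseline that labels everything legitimate.
baseline = evaluate(y_true, [0] * 1000)
# Detector with 6 false alarms that catches 4 of the 5 frauds.
detector = evaluate(y_true, [1] * 6 + [0] * 989 + [1] * 4 + [0] * 1)
print("baseline:", baseline)
print("detector:", detector)
```

The detector has slightly lower accuracy than the baseline but far higher recall, which is
the property that matters for catching fraud.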

1.2 Related Work

Significant research and practical applications have been conducted in the field of fraud
detection within the financial sector. Key contributions include:

Machine Learning Approaches: Research exploring the use of machine learning algorithms
such as Random Forests, Neural Networks, Support Vector Machines, and Gradient Boosting
for fraud detection.

Behavioural Analysis: Studies investigating behavioural analysis of customers to detect
unusual patterns and deviations from typical transaction behaviours, a crucial aspect of fraud
detection.

Anomaly Detection: Research into the application of anomaly detection techniques, including
clustering, outlier detection, and network analysis, to identify irregularities in financial data.

Fraud Detection Platforms: The development and implementation of comprehensive fraud
detection platforms and software solutions that integrate various detection techniques and
data sources.

CHAPTER 2

2.1 Literature Survey


Considerable literature is available on financial fraud detection, owing to its importance both
in reducing cyber-crime and from a business point of view. A few researchers have also
conducted literature reviews of articles published in the 2000s and 2010s.

To detect financial fraud, researchers typically use outlier detection techniques (Jayakumar
et al., 2013) with highly imbalanced datasets. Different types of financial fraud are also
possible. One article suggests four categories of financial fraud – financial statement fraud,
transaction fraud, insurance fraud and credit fraud (Jans et al., 2011). In this project, the
focus is on transaction fraud, specifically as it applies to mobile payments and deepfake
voice fraud.
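
As a small illustration of the outlier-detection idea mentioned above, the sketch below flags
transaction amounts with a large modified z-score (based on the median and the median absolute
deviation). The choice of this particular technique, the 3.5 threshold and the amounts are
assumptions made for illustration; the cited work is not claimed to use this method.

```python
import statistics


def mad_outliers(values, threshold=3.5):
    """Return indices whose modified z-score exceeds the threshold."""
    median = statistics.median(values)
    deviations = [abs(v - median) for v in values]
    mad = statistics.median(deviations)  # median absolute deviation
    if mad == 0:
        return []  # no spread, so nothing can be scored
    # 0.6745 rescales the MAD to be comparable with a standard deviation.
    return [i for i, d in enumerate(deviations) if 0.6745 * d / mad > threshold]


# Hypothetical transaction amounts with one unusually large payment.
amounts = [25.0, 40.0, 32.5, 18.0, 55.0, 47.0, 29.0, 9500.0, 36.0, 22.0]
flagged = mad_outliers(amounts)
print("flagged indices:", flagged)
```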

2.2 A variety of techniques have been tested to detect financial fraud.

2.2.1 Deepfake Audio Detection via MFCC Features Using Machine
Learning

Deepfake content is created or altered synthetically using artificial intelligence (AI)
approaches to appear real. It can include synthesized audio, video, images, and text.
Deepfakes can now produce natural-looking content, making them harder to identify. Much
progress has been achieved in identifying video deepfakes in recent years; nevertheless, most
investigations in detecting audio deepfakes have employed the ASVSpoof or AVSpoof
dataset and various machine learning and deep learning algorithms. This research uses
machine learning and deep learning-based approaches to identify deepfake audio. The
Mel-frequency cepstral coefficients (MFCCs) technique is used to acquire the most useful
information from the audio.
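
To make the MFCC step concrete, the following NumPy sketch walks through the usual stages:
framing and windowing, power spectrum, mel filterbank, log compression and a DCT. The frame
size, hop length, filter count and the synthetic test tone are assumptions chosen for
illustration and are not taken from the cited paper; in practice a library routine such as
librosa's `librosa.feature.mfcc` would normally be used instead.

```python
import numpy as np


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):       # rising edge of the triangle
            bank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):      # falling edge of the triangle
            bank[m - 1, k] = (right - k) / (right - center)
    return bank


def mfcc(signal, sample_rate, n_fft=512, hop=256, n_filters=26, n_coeffs=13):
    """Return MFCCs as an array of shape (n_frames, n_coeffs)."""
    # 1. Slice the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] for i in range(n_frames)])
    frames = frames * np.hamming(n_fft)
    # 2. Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    # 3. Mel filterbank energies, then log compression (epsilon avoids log(0)).
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(energies + 1e-10)
    # 4. DCT-II decorrelates the log energies; keep the first coefficients.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_coeffs), 2 * n + 1) / (2 * n_filters))
    return log_energies @ basis.T


# Synthetic one-second 440 Hz tone standing in for a speech recording.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
features = mfcc(np.sin(2 * np.pi * 440.0 * t), sample_rate)
print(features.shape)
```

Each row of the result describes one short frame of audio and can be fed to a classifier
directly or averaged over time to give one feature vector per clip.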

2.2.2 Deep fake Audio Detection: A Deep Learning Based Solution for Group
Conversations

We built Deep Neural Network models and integrated them into a single solution using
different datasets, including but not limited to UrbanSound8K (5.6GB), Conversational
(12.2GB), AMI-Corpus (5GB), and FakeOrReal (4GB).

Our proposed approach consists of four main components. The speech-denoising component
cleans and preprocesses the audio using Multilayer-Perceptron and Convolutional Neural
Network architectures, with 93% and 94% accuracy respectively.

The speaker diarization was implemented using two different approaches, Natural Language
Processing for text conversion with 93% accuracy and Recurrent Neural Network model for
speaker labeling with 80% accuracy and 0.52 Diarization-Error-Rate.

The final component distinguishes between real and fake audio using a CNN architecture
with 94% accuracy. With these findings, this research will contribute immensely to the
domain of speech analysis.

2.2.3 Detecting Deep fake Voice Using Explainable Deep Learning
Techniques

In this paper, we present a human perception level of interpretability for deepfake audio
detection.

Based on their characteristics, we implement several explainable artificial intelligence (XAI)
methods used for image classification on an audio-related task. In addition, by examining the
human cognitive process of XAI on image classification, we suggest the use of a
corresponding data format for providing interpretability. Using this novel concept, a fresh
interpretation using attribution scores can be provided.

For the general interpretation of the detection model, two datasets with exclusive
characteristics were used. The first set of experiments was conducted upon the ASVspoof
2021 Logical Access dataset. ASVspoof consists of 2580 bona fide user speech data collected
from 107 speakers and the corresponding 22,800 synthesized speech data generated using 19
synthesizers.

As mentioned earlier, visualized interpretation with XAI methods for image classification
that provides the output in the form of a heatmap is often acceptable. If the classification
accuracy is high enough, the ensuing XAI result also tends to proceed properly. However,
current XAI methods are not perfect, and often fail to separate an object from the
background, eventually only highlighting the high-contrast object contour.

2.2.4 The Effect of Deep Learning Methods on Deepfake Audio Detection
for Digital Investigation

Voice cloning methods have been used in a range of ways, from customized speech interfaces
for marketing to video games. Current voice cloning systems are smart enough to learn speech
characteristics from a few samples and produce speech that is perceptually indistinguishable
from the real speaker. These systems pose new protection and privacy risks to voice-driven
interfaces. Fake audio has been used for malicious purposes, and it is difficult to tell real
from fake audio during a digital forensic investigation. This paper reviews the issue of
deep-fake audio classification and evaluates the current methods of deep-fake audio detection
for forensic investigation. Audio file features were extracted and visually presented using
MFCC, Mel-spectrum, Chromagram, and spectrogram representations to further study the
differences.

2.2.5 REAL-TIME DETECTION OF AI-GENERATED SPEECH FOR
DEEPFAKE VOICE CONVERSION
There are growing implications surrounding generative AI in the speech domain that enable
voice cloning and real-time voice conversion from one individual to another. This technology
poses a significant ethical threat and could lead to breaches of privacy and misrepresentation,
thus there is an urgent need for real-time detection of AI-generated speech for DeepFake
Voice Conversion. To address the above emerging issues, the DEEP-VOICE dataset is
generated in this study, comprised of real human speech from eight well-known figures and
their speech converted to one another using Retrieval-based Voice Conversion. Presenting as
a binary classification problem of whether the speech is real or AI-generated, statistical
analysis of temporal audio features through t-testing reveals that there are significantly
different distributions. Hyperparameter optimisation is implemented for machine learning
models to identify the source of speech. Following the training of 208 individual machine
learning models over 10-fold cross validation, it is found that the Extreme Gradient Boosting
model can achieve an average classification accuracy of 99.3% and can classify speech in
real-time, at around 0.004 milliseconds given one second of speech. All data generated for
this study is released publicly for future research on AI speech detection.
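
The t-testing step mentioned above can be illustrated with a two-sample comparison of one
audio feature between real and AI-generated clips. The use of Welch's variant, the feature
and all the values below are assumptions made for illustration; they are not data from the
DEEP-VOICE study.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # Squared standard error contributed by each sample.
    se_a = statistics.variance(sample_a) / len(sample_a)
    se_b = statistics.variance(sample_b) / len(sample_b)
    t_stat = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    dof = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(sample_a) - 1) + se_b ** 2 / (len(sample_b) - 1)
    )
    return t_stat, dof


# Hypothetical per-clip values of one temporal feature (for example, mean zero-crossing rate).
real_clips = [0.082, 0.091, 0.078, 0.088, 0.095, 0.084]
fake_clips = [0.061, 0.066, 0.058, 0.070, 0.063, 0.065]
t_stat, dof = welch_t(real_clips, fake_clips)
print(f"t = {t_stat:.2f}, degrees of freedom = {dof:.1f}")
```

A large absolute t value relative to the degrees of freedom suggests the feature is
distributed differently in the two classes and may be useful to a classifier.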

System Design And Methodology

System Diagram For Audio Classification

Methodology for audio classification


1. Data Collection & Preprocessing:

- Collect labeled audio data and extract relevant features (e.g., MFCC, spectrograms).

- Normalize and potentially augment the data for diversity.

2. Model Selection & Training:

- Choose a suitable model (CNN, RNN, CRNN) for audio classification.

- Split data into train, validation, and test sets.

- Train the model, tune hyperparameters, and validate its performance.

3. Model Evaluation & Optimization:

- Evaluate model performance on the test set.

- Optimize the model based on evaluation results, fine-tuning for better accuracy.

4. Deployment & Maintenance:

- Deploy the model and monitor its performance.

- Periodically update or retrain the model with new data to maintain accuracy.

5. Compliance & Regulation:

- Ensure the solution complies with financial regulations and industry standards throughout
the process.
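
The normalization step in the methodology above can be sketched as follows. A common
convention, assumed here rather than stated in the report, is to compute the scaling
statistics on the training set only and reuse them for the validation and test sets, so
that no information leaks from held-out data. The feature values are hypothetical.

```python
import statistics


def fit_scaler(train_rows):
    """Compute per-feature mean and standard deviation from training rows."""
    columns = list(zip(*train_rows))
    means = [statistics.mean(column) for column in columns]
    # Fall back to 1.0 for constant features to avoid dividing by zero.
    stds = [statistics.pstdev(column) or 1.0 for column in columns]
    return means, stds


def transform(rows, means, stds):
    """Apply z-score scaling using previously fitted statistics."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Hypothetical two-feature rows (for example, two MFCC coefficients per clip).
train_rows = [[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]]
test_rows = [[2.0, 400.0], [4.0, 800.0]]

means, stds = fit_scaler(train_rows)
train_scaled = transform(train_rows, means, stds)
test_scaled = transform(test_rows, means, stds)  # reuses the training statistics
print(test_scaled)
```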
