Fraud Transaction Prediction

The document cleans fraud transaction data and explores outliers and multicollinearity. It removes outliers using the IQR method and encodes categorical variables. A high correlation is found between oldbalanceDest and newbalanceDest, but both are kept because they are considered important variables.

Uploaded by

Devanshu Mishra


import numpy as np

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"]="0"

data = pd.read_csv('Fraud.csv')

data.head()

   step      type    amount     nameOrig  oldbalanceOrg  newbalanceOrig  \
0     1   PAYMENT   9839.64  C1231006815       170136.0       160296.36
1     1   PAYMENT   1864.28  C1666544295        21249.0        19384.72
2     1  TRANSFER    181.00  C1305486145          181.0            0.00
3     1  CASH_OUT    181.00   C840083671          181.0            0.00
4     1   PAYMENT  11668.14  C2048537720        41554.0        29885.86

      nameDest  oldbalanceDest  newbalanceDest  isFraud  isFlaggedFraud
0  M1979787155             0.0             0.0        0               0
1  M2044282225             0.0             0.0        0               0
2   C553264065             0.0             0.0        1               0
3    C38997010         21182.0             0.0        1               0
4  M1230701703             0.0             0.0        0               0

1. Data cleaning including missing values, outliers and multi-collinearity.

data.isna().sum()

step 0
type 0
amount 0
nameOrig 0
oldbalanceOrg 0
newbalanceOrig 0
nameDest 0
oldbalanceDest 0
newbalanceDest 0
isFraud 0
isFlaggedFraud 0
dtype: int64

data = data.drop(['nameOrig','nameDest'],axis=1)

data.head()

   step      type    amount  oldbalanceOrg  newbalanceOrig  oldbalanceDest  \
0     1   PAYMENT   9839.64       170136.0       160296.36             0.0
1     1   PAYMENT   1864.28        21249.0        19384.72             0.0
2     1  TRANSFER    181.00          181.0            0.00             0.0
3     1  CASH_OUT    181.00          181.0            0.00         21182.0
4     1   PAYMENT  11668.14        41554.0        29885.86             0.0

   newbalanceDest  isFraud  isFlaggedFraud
0             0.0        0               0
1             0.0        0               0
2             0.0        1               0
3             0.0        1               0
4             0.0        0               0

from sklearn.preprocessing import LabelEncoder

# Named `encoder` rather than `ord` to avoid shadowing Python's built-in ord()
encoder = LabelEncoder()
data['type'] = encoder.fit_transform(data['type'])

data.head()

   step  type    amount  oldbalanceOrg  newbalanceOrig  oldbalanceDest  \
0     1     3   9839.64       170136.0       160296.36             0.0
1     1     3   1864.28        21249.0        19384.72             0.0
2     1     4    181.00          181.0            0.00             0.0
3     1     1    181.00          181.0            0.00         21182.0
4     1     3  11668.14        41554.0        29885.86             0.0

   newbalanceDest  isFraud  isFlaggedFraud
0             0.0        0               0
1             0.0        0               0
2             0.0        1               0
3             0.0        1               0
4             0.0        0               0
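
The integer codes in the `type` column come from `LabelEncoder`, which numbers the labels in sorted order. The sketch below shows one way to recover that mapping. The sample list is hypothetical: only PAYMENT, TRANSFER and CASH_OUT appear in the output above, so CASH_IN and DEBIT are assumed labels, chosen because they would give the codes seen in the table.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample of the `type` column; CASH_IN and DEBIT are assumptions
types = ["PAYMENT", "TRANSFER", "CASH_OUT", "DEBIT", "CASH_IN", "PAYMENT"]

encoder = LabelEncoder()
codes = encoder.fit_transform(types)

# classes_ holds the labels in sorted order; the position of a label is its code
mapping = {label: code for code, label in enumerate(encoder.classes_)}
print(mapping)

# inverse_transform turns the integer codes back into the original labels
decoded = list(encoder.inverse_transform(codes))
print(decoded)
```

Keeping the fitted encoder around makes it possible to encode new transactions the same way at prediction time.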

Checking the Outliers

for i in data.columns:
    plt.figure(figsize=(10,10))
    sns.boxplot(data[i],orient='h')
    plt.xlabel(i)

C:\Users\devan\.conda\envs\MachineLearning\lib\site-packages\seaborn\
_decorators.py:36: FutureWarning: Pass the following variable as a
keyword arg: x. From version 0.12, the only valid positional argument
will be `data`, and passing other arguments without an explicit
keyword will result in an error or misinterpretation.
warnings.warn(

Removing the Outliers with the Interquartile Range Method

columns_to_analyze = ['amount','oldbalanceOrg','newbalanceOrig','oldbalanceDest','newbalanceDest']
# plt.figure(figsize=(30,30))

# Create boxplots for each selected column

# data[columns_to_analyze].boxplot(figsize = (100,100))

# Identify and remove outliers by Using Interquartile range Concept

q1 = data[columns_to_analyze].quantile(0.25)
q3 = data[columns_to_analyze].quantile(0.75)
iqr = q3 - q1

# Filtering the outliers

outliers = ((data[columns_to_analyze] < (q1 - 1.5 * iqr)) |
            (data[columns_to_analyze] > (q3 + 1.5 * iqr)))
df_no_outliers = data[~outliers.any(axis=1)]

df_no_outliers.shape

(4393187, 9)

data.shape

(6362620, 9)

total_outlier = data.shape[0] - df_no_outliers.shape[0]

total_outlier

1969433
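
The filter above drops every row that has at least one value outside the range [Q1 - 1.5·IQR, Q3 + 1.5·IQR]. The following is a minimal sketch of the same rule on a toy frame, with an extra check of how many fraud rows survive. The toy values and the helper name `iqr_mask` are assumptions for illustration; the notebook does not report how many `isFraud == 1` rows were removed from `Fraud.csv`.

```python
import pandas as pd

def iqr_mask(frame, columns, k=1.5):
    """Return a boolean Series that is True for rows with no IQR outlier."""
    q1 = frame[columns].quantile(0.25)
    q3 = frame[columns].quantile(0.75)
    iqr = q3 - q1
    outside = (frame[columns] < (q1 - k * iqr)) | (frame[columns] > (q3 + k * iqr))
    return ~outside.any(axis=1)

# Toy data (assumed): the single fraud row also has the extreme amount
toy = pd.DataFrame({
    "amount": [10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 10.0, 5000.0],
    "isFraud": [0, 0, 0, 0, 0, 0, 0, 1],
})

keep = iqr_mask(toy, ["amount"])
kept = toy[keep]

# Compare the number of fraud rows before and after filtering
fraud_before = int(toy["isFraud"].sum())
fraud_after = int(kept["isFraud"].sum())
print(len(toy), len(kept), fraud_before, fraud_after)
```

In this toy case the only fraud row is removed together with the outlier, so running the same comparison on the real data before training may be worthwhile.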

cleaned_data = df_no_outliers

cleaned_data.head()

   step  type    amount  oldbalanceOrg  newbalanceOrig  oldbalanceDest  \
0     1     3   9839.64       170136.0       160296.36             0.0
1     1     3   1864.28        21249.0        19384.72             0.0
2     1     4    181.00          181.0            0.00             0.0
3     1     1    181.00          181.0            0.00         21182.0
4     1     3  11668.14        41554.0        29885.86             0.0

   newbalanceDest  isFraud  isFlaggedFraud
0             0.0        0               0
1             0.0        0               0
2             0.0        1               0
3             0.0        1               0
4             0.0        0               0

Checking the Multicollinearity

from statsmodels.stats.outliers_influence import variance_inflation_factor

def calc_VIF(x):
    # Compute the variance inflation factor of every column in x
    vif = pd.DataFrame()
    vif['variables'] = x.columns
    vif["VIF"] = [variance_inflation_factor(x.values, i) for i in range(x.shape[1])]
    return vif

x = cleaned_data.drop('isFraud', axis=1)
calc_VIF(x)

C:\Users\devan\.conda\envs\MachineLearning\lib\site-packages\
statsmodels\regression\linear_model.py:1783: RuntimeWarning: invalid
value encountered in double_scalars
return 1 - self.ssr/self.uncentered_tss

variables VIF
0 step 2.964152
1 type 2.435823
2 amount 4.805709
3 oldbalanceOrg 2.420535
4 newbalanceOrig 2.996540
5 oldbalanceDest 53.449834
6 newbalanceDest 68.756255
7 isFlaggedFraud NaN

As we can see, oldbalanceDest and newbalanceDest have very high VIF values (about 53 and 69), which indicates that they are highly correlated.

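The VIF of a column is 1 / (1 - R²), where R² comes from regressing that column on all the others. The sketch below recomputes it with NumPy on synthetic data, using the uncentered R² that appears in the warning message above (`1 - ssr/uncentered_tss`). The function name and the data are assumptions. It also suggests one plausible reason for the NaN on isFlaggedFraud: if that column is all zeros after outlier removal, R² becomes 0/0. The notebook does not confirm this.

```python
import numpy as np

def vif_uncentered(X, i):
    """VIF of column i from a no-intercept regression on the other columns."""
    y = X[:, i]
    others = np.delete(X, i, axis=1)
    coef, *_ = np.linalg.lstsq(others, y, rcond=None)
    ssr = float(np.sum((y - others @ coef) ** 2))   # residual sum of squares
    tss = float(np.sum(y ** 2))                     # uncentered total sum of squares
    if tss == 0.0:
        return float("nan")    # all-zero column: R^2 = 1 - 0/0 is undefined
    if ssr == 0.0:
        return float("inf")    # perfect fit: R^2 = 1
    return tss / ssr           # equals 1 / (1 - R^2) with R^2 = 1 - ssr/tss

rng = np.random.default_rng(0)
old = rng.uniform(0, 1e6, 200)            # stand-in for oldbalanceDest
new = old + rng.normal(0, 1e4, 200)       # nearly a copy, like newbalanceDest
other = rng.uniform(0, 1e6, 200)          # an unrelated feature
flag = np.zeros(200)                      # constant column, like an all-zero flag

X = np.column_stack([old, new, other, flag])
vifs = [vif_uncentered(X, i) for i in range(X.shape[1])]
print(vifs)
```

The two nearly identical columns get very large values, the unrelated column stays small, and the constant column yields NaN.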
plt.figure(figsize=(10,10))
corr = cleaned_data.corr()
sns.heatmap(corr,annot=True)

<AxesSubplot:>

The heatmap also shows that the columns oldbalanceDest and newbalanceDest are highly correlated. Both are kept, however, because the initial and final balances on the destination side are important to know.

cleaned_data.head()

   step  type    amount  oldbalanceOrg  newbalanceOrig  oldbalanceDest  \
0     1     3   9839.64       170136.0       160296.36             0.0
1     1     3   1864.28        21249.0        19384.72             0.0
2     1     4    181.00          181.0            0.00             0.0
3     1     1    181.00          181.0            0.00         21182.0
4     1     3  11668.14        41554.0        29885.86             0.0

   newbalanceDest  isFraud  isFlaggedFraud
0             0.0        0               0
1             0.0        0               0
2             0.0        1               0
3             0.0        1               0
4             0.0        0               0

sns.kdeplot(cleaned_data['oldbalanceOrg'])

<AxesSubplot:xlabel='oldbalanceOrg', ylabel='Density'>
sns.kdeplot(cleaned_data['oldbalanceDest'])

<AxesSubplot:xlabel='oldbalanceDest', ylabel='Density'>

sns.kdeplot(cleaned_data['newbalanceDest'])
<AxesSubplot:xlabel='newbalanceDest', ylabel='Density'>

Splitting the data into training and testing

from sklearn.model_selection import train_test_split
X = cleaned_data.drop('isFraud',axis=1)
y= cleaned_data['isFraud']
# test_size=0.9 keeps only 10% of the rows for training; stratify=y preserves the fraud ratio
X_train,X_test,Y_train,Y_test = train_test_split(X,y,test_size=0.9,random_state=42,stratify=y)

X_test.head()

         step  type     amount  oldbalanceOrg  newbalanceOrig  oldbalanceDest  \
1609976   156     3    9545.78       26355.00        16809.22            0.00
2220960   186     3    4790.25           0.00            0.00            0.00
4596673   328     3      51.52      108710.26       108658.74            0.00
19284       8     1   32966.31       59607.00        26640.69      1450296.94
5990985   419     3  106993.87      157767.00        50773.13            0.00

         newbalanceDest  isFlaggedFraud
1609976            0.00               0
2220960            0.00               0
4596673            0.00               0
19284        1236584.82               0
5990985            0.00               0
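
To illustrate what `stratify=y` does in the split above, here is a small sketch on synthetic labels. The 2% positive rate and the array sizes are assumptions chosen to keep the numbers easy to read; they are not taken from `Fraud.csv`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels (assumed): 1,000 rows, 20 of them positive
y = np.array([1] * 20 + [0] * 980)
X = np.arange(len(y)).reshape(-1, 1)

# Same arguments as in the notebook: 90% test, fixed seed, stratified on y
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.9, random_state=42, stratify=y
)

# Both parts keep the same share of positive labels
print(len(y_tr), len(y_te))
print(y_tr.mean(), y_te.mean())
```

With a rare positive class, stratification avoids a training part that happens to contain very few or no fraud rows.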

Describe your fraud detection model in elaboration.

Model Training

from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score,confusion_matrix,ConfusionMatrixDisplay,precision_score

lg = LogisticRegression()
lg.fit(X_train,Y_train)
print("accuracy",accuracy_score(lg.predict(X_test),Y_test))

accuracy 0.9994499059022947

xgb = XGBClassifier()
xgb.fit(X_train,Y_train)
pred1 = xgb.predict(X_test)
print("accuracy",accuracy_score(pred1,Y_test))

accuracy 0.9997152156533259

How did you select variables to be included in the model?

• The variables were selected on the basis of EDA and the feature correlations shown in the heatmap, using a threshold of 0.5.
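
The notebook does not show the selection code, and the models above are trained on all remaining columns. One possible reading of a correlation threshold of 0.5 is sketched below: keep the features whose absolute correlation with the target is at least 0.5. The helper name and the toy columns `strong` and `weak` are hypothetical.

```python
import pandas as pd

def features_above_threshold(frame, target, threshold=0.5):
    """Return the columns whose absolute correlation with `target` meets the threshold."""
    corr = frame.corr()[target].drop(target)
    return corr[corr.abs() >= threshold].index.tolist()

# Toy frame (assumed): `strong` tracks the target, `weak` does not
toy = pd.DataFrame({
    "strong": [1, 2, 1, 2, 9, 8, 9, 8],
    "weak":   [1, 2, 1, 2, 1, 2, 1, 2],
    "isFraud": [0, 0, 0, 0, 1, 1, 1, 1],
})

selected = features_above_threshold(toy, "isFraud", threshold=0.5)
print(selected)
```

Another reading would be to drop one feature of each pair whose mutual correlation exceeds the threshold; the source does not say which was intended.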

Demonstrate the performance of the model by using the best set of tools.

# confusion_matrix expects (y_true, y_pred), so rows are true labels and columns are predictions
conf = confusion_matrix(Y_test, pred1)
disp = ConfusionMatrixDisplay(confusion_matrix=conf, display_labels=[False, True])

disp.plot()

<sklearn.metrics._plot.confusion_matrix.ConfusionMatrixDisplay at
0x244498d7700>

From the above, even with only 0.1 of the dataset used for training, the XGBoost model reaches an accuracy of about 0.9997 on the remaining 0.9 used for testing, so the model performs well by this measure.

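Because fraud is rare, accuracy alone can look high even for a model that never flags fraud. The sketch below computes accuracy, precision and recall from plain label lists and applies them to a baseline that always predicts "not fraud". The 0.1% fraud rate is an assumption for illustration; the notebook does not report the class balance of `cleaned_data`.

```python
def rates(y_true, y_pred):
    """Return accuracy, precision and recall for binary labels (1 = fraud)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Assumed class balance: 10 fraud rows out of 10,000
y_true = [1] * 10 + [0] * 9990

# Baseline that never predicts fraud
always_negative = [0] * len(y_true)
baseline = rates(y_true, always_negative)
print(baseline)
```

Here the baseline is 99.9% accurate while catching no fraud at all, which is why the confusion matrix, precision and recall are worth reading alongside the accuracy score.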
What are the key factors that predict a fraudulent customer?
Predicting fraudulent customers is crucial for businesses to protect themselves from financial
losses and maintain trust with legitimate customers. Several key factors can help in identifying
potentially fraudulent customers:
1. Abnormal transaction patterns: Look for unusual or atypical behavior, such as a
sudden increase in transaction volume, larger-than-usual purchases, or multiple
transactions from different locations within a short timeframe.

2. Unusual login activity: Frequent failed login attempts, multiple login locations, or
suspicious IP addresses could indicate potential unauthorized access.

3. Geographical incongruities: Analyze the geographic location of the customer's
transactions compared to their usual location. Rapid changes in locations can be a
red flag.

4. Payment discrepancies: Monitor for inconsistencies between the billing address,
shipping address, and the customer's location. Mismatched or incomplete
information may raise suspicions.

5. Velocity checks: Identify customers with an unusually high number of transactions
in a short period. This could suggest automated or bot-driven activities.

6. Device fingerprinting: Track and analyze the characteristics of the customer's
device used for transactions. Sudden changes in device information might indicate
suspicious behavior.

7. Account age and history: New accounts with a large number of transactions or
customers with little history may pose a higher risk.

8. Unusual purchase timing: Transactions made during non-business hours or
holidays may warrant additional scrutiny.

9. Unusual product combinations: Customers purchasing an unusual mix of
products or a large quantity of high-value items may raise suspicion.

10. Customer behavior changes: Look for changes in a customer's behavior, such as a
sudden shift in spending habits or a switch to higher-risk products.

11. Social network analysis: Investigate relationships between customers and identify
connections between potentially fraudulent accounts.

12. Watchlists and databases: Check against internal and external fraud databases or
watchlists for known fraudulent customers.

13. Payment methods: Some payment methods, such as virtual credit cards or prepaid
cards, are associated with higher fraud risk.

14. Machine learning models: Implement advanced machine learning algorithms that
can analyze large amounts of data and identify patterns indicative of fraud.

It's important to remember that no single factor can reliably predict fraudulent customers. A
combination of these factors, along with continuous monitoring and analysis, will provide a
more accurate assessment of potential fraud. Moreover, it's essential to maintain a balance
between fraud detection and customer experience to avoid false positives that could harm
genuine customers.
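
As an illustration of the velocity check mentioned in the list, the sketch below flags a customer once they exceed a maximum number of transactions inside a sliding time window. The customer ids, the integer time index, the window and the limit are all hypothetical; the notebook itself drops `nameOrig` and does not build such a feature.

```python
from collections import defaultdict, deque

def velocity_flags(events, window, max_count):
    """Flag each (customer, step) event that pushes the customer over
    max_count transactions within the last `window` steps.
    `events` must be sorted by step."""
    recent = defaultdict(deque)
    flags = []
    for customer, step in events:
        steps = recent[customer]
        steps.append(step)
        # Discard transactions that fell out of the window
        while steps[0] <= step - window:
            steps.popleft()
        flags.append(len(steps) > max_count)
    return flags

# Hypothetical events: (customer id, time step)
events = [("C1", 1), ("C2", 1), ("C1", 1), ("C1", 2), ("C1", 2), ("C2", 5), ("C1", 9)]
flags = velocity_flags(events, window=2, max_count=3)
print(flags)
```

Only the fourth transaction of C1 within two steps is flagged; later activity is judged afresh once the older transactions leave the window.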

What kind of prevention should be adopted while the company updates its infrastructure?
When a company updates its infrastructure for handling fraud transactions, it should implement
a multi-layered approach to prevention. This will help to minimize the risk of fraud and protect
both the company and its customers. Here are some key prevention strategies to adopt:

1. Advanced authentication methods: Implement multi-factor authentication (MFA)
to add an extra layer of security. This can include something the user knows
(password), something they have (OTP or token), and something they are
(fingerprint or facial recognition).

2. Encryption and secure communication: Ensure that all data transmissions are
encrypted using industry-standard protocols like TLS (Transport Layer Security) to
protect sensitive information during transit.

3. Real-time transaction monitoring: Utilize sophisticated fraud detection systems
that can analyze transactions in real-time and identify suspicious patterns or
anomalies.

4. Behavioral analysis: Implement machine learning algorithms to analyze customer
behavior and create profiles for normal usage. Any deviations from these profiles
can trigger alerts for potential fraud.

5. IP geolocation and device profiling: Use geolocation data and device
fingerprinting to detect suspicious login attempts or transactions from unfamiliar
locations or devices.

6. Velocity checks: Set thresholds for transaction volume to identify and block
multiple transactions occurring in quick succession.

7. Blacklists and whitelists: Maintain lists of known fraudulent customers or
high-risk regions to block or flag suspicious activities.

8. Regular security audits and vulnerability assessments: Conduct periodic
security audits to identify and address potential weaknesses in the system.

9. Employee training and awareness: Educate employees about fraud prevention,
cybersecurity best practices, and how to recognize potential threats.

10. Secure APIs and third-party integrations: If the company integrates with
third-party services or APIs, ensure that they have robust security measures in place to
prevent data breaches.

11. Fraud analysis and reporting: Establish a process for reporting and investigating
suspected fraudulent activities promptly.

12. Customer communication: Keep customers informed about security measures,
potential risks, and the steps they can take to protect themselves.

13. Compliance with industry standards and regulations: Ensure that the company
complies with relevant data protection laws and industry security standards.

14. Regular system updates and patches: Keep all software and systems up to date
with the latest security patches to minimize vulnerabilities.

15. Continuous improvement: Regularly review and update fraud prevention
strategies to stay ahead of evolving fraud tactics.

By adopting these prevention measures, the company can create a robust and secure
infrastructure that safeguards against fraud transactions and builds trust with its customers.

Assuming these actions have been implemented, how would you determine if they work?
To determine the effectiveness of the implemented actions in fraud detection, you can follow a
comprehensive evaluation process. Here are some steps to assess the success of the fraud
prevention measures:

1. Data analysis and metrics: Collect and analyze data related to fraud detection and
prevention. This includes the number of detected fraud incidents, false positives,
true positives, and the overall accuracy of the system.

2. Benchmarking: Establish benchmarks based on historical data before
implementing the new fraud prevention measures. This will allow you to compare
the current performance with past performance.

3. False positive rate: Measure the rate of false positives, i.e., legitimate transactions
flagged as fraudulent. A high false positive rate can inconvenience customers and
impact the company's revenue.

4. True positive rate: Measure the rate of true positives, i.e., actual fraudulent
transactions correctly identified by the system. A high true positive rate indicates
effective fraud detection.

5. Reduction in fraud losses: Calculate the reduction in financial losses due to fraud
after implementing the prevention measures.

6. Customer feedback and complaints: Gather feedback from customers to gauge
their experience with the new security measures. Address any complaints or
concerns promptly.

7. Comparison with industry standards: Compare the company's fraud detection
performance with industry benchmarks and best practices.

8. Time-to-detect and response time: Measure the time taken to detect potential
fraud and respond to suspicious activities. A faster response can minimize damage.

9. Adaptability to new fraud patterns: Assess how well the system adapts to
evolving fraud patterns and whether it can detect new types of fraud.

10. Cost-effectiveness: Evaluate the cost-effectiveness of the fraud prevention
measures. The benefits of preventing fraud should outweigh the expenses
associated with implementing and maintaining the system.

11. External validation: Consider seeking third-party validation or conducting
penetration tests to assess the system's resilience against potential attacks.

12. Continuous improvement: Establish a feedback loop for ongoing improvement.
Regularly review and fine-tune the fraud detection system based on the analysis of
new data and emerging fraud trends.

13. Comparing against control groups: Use control groups to compare the
performance of the fraud detection system with areas where the new measures
haven't been implemented. This can help isolate the impact of the prevention
measures.

By conducting a thorough evaluation using these metrics, the company can determine the
effectiveness of its fraud detection measures. This information can be used to make informed
decisions on refining existing strategies or implementing new ones to further strengthen the
fraud prevention system. It's important to note that fraudsters continually evolve their tactics, so
the evaluation process should be ongoing and adaptive.
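
A before/after comparison of the rates named in the list can be sketched as follows. All counts and loss figures are hypothetical and serve only to show the arithmetic for the false positive rate, the true positive rate and the relative reduction in fraud losses.

```python
def false_positive_rate(fp, tn):
    """Share of legitimate transactions that were flagged as fraud."""
    return fp / (fp + tn)

def true_positive_rate(tp, fn):
    """Share of fraudulent transactions that were caught."""
    return tp / (tp + fn)

def loss_reduction(loss_before, loss_after):
    """Relative reduction in fraud losses between two periods."""
    return (loss_before - loss_after) / loss_before

# Hypothetical figures for one period before and one period after the changes
before = {"tp": 60, "fn": 40, "fp": 500, "tn": 99400, "loss": 200000.0}
after = {"tp": 85, "fn": 15, "fp": 300, "tn": 99600, "loss": 80000.0}

tpr_before = true_positive_rate(before["tp"], before["fn"])
tpr_after = true_positive_rate(after["tp"], after["fn"])
fpr_before = false_positive_rate(before["fp"], before["tn"])
fpr_after = false_positive_rate(after["fp"], after["tn"])
reduction = loss_reduction(before["loss"], after["loss"])

print(tpr_before, tpr_after)
print(fpr_before, fpr_after)
print(reduction)
```

Comparing periods of equal length and similar transaction volume keeps such figures meaningful; a control group, as in item 13, helps separate the effect of the measures from seasonal changes.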

Predictive System

a = ['8','1','32966.31','59607.00','26640.69','1450296.94','1236584.82','0']
a = pd.DataFrame([a],columns=X_test.columns,dtype='float')

   step  type    amount  oldbalanceOrg  newbalanceOrig  oldbalanceDest  \
0   8.0   1.0  32966.31        59607.0        26640.69      1450296.94

   newbalanceDest  isFlaggedFraud
0      1236584.82             0.0

prediction = xgb.predict(a)[0]

if prediction==0:
    print("Not A Fraud Transaction")
else:
    print("Is a Fraud Transaction")

Not A Fraud Transaction
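
The prediction step above could be wrapped in a small reusable function, sketched below. The function name, the `FEATURES` list (copied from the column order of `X_test` shown earlier) and the `AlwaysLegit` stand-in model are assumptions for illustration; in the notebook the model would be the trained `xgb` classifier.

```python
import pandas as pd

# Column order assumed to match X_test in the notebook
FEATURES = ["step", "type", "amount", "oldbalanceOrg", "newbalanceOrig",
            "oldbalanceDest", "newbalanceDest", "isFlaggedFraud"]

def classify_transaction(model, values, columns=FEATURES):
    """Build a one-row frame from `values` and return a readable label."""
    if len(values) != len(columns):
        raise ValueError("expected one value per feature column")
    row = pd.DataFrame([values], columns=columns, dtype="float")
    label = int(model.predict(row)[0])
    return "Is a Fraud Transaction" if label == 1 else "Not A Fraud Transaction"

class AlwaysLegit:
    """Stand-in model for demonstration only; not the trained XGBClassifier."""
    def predict(self, frame):
        return [0] * len(frame)

sample = [8, 1, 32966.31, 59607.00, 26640.69, 1450296.94, 1236584.82, 0]
result = classify_transaction(AlwaysLegit(), sample)
print(result)
```

Checking the number of values up front catches a row whose columns do not line up with the features the model was trained on.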
