Fraud Transaction Prediction
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"]="0"
data = pd.read_csv('Fraud.csv')
data.head()
data.isnull().sum()
step 0
type 0
amount 0
nameOrig 0
oldbalanceOrg 0
newbalanceOrig 0
nameDest 0
oldbalanceDest 0
newbalanceDest 0
isFraud 0
isFlaggedFraud 0
dtype: int64
data = data.drop(['nameOrig','nameDest'],axis=1)
data.head()
C:\Users\devan\.conda\envs\MachineLearning\lib\site-packages\seaborn\
_decorators.py:36: FutureWarning: Pass the following variable as a
keyword arg: x. From version 0.12, the only valid positional argument
will be `data`, and passing other arguments without an explicit
keyword will result in an error or misinterpretation.
warnings.warn(
Removing the Outliers: Interquartile Range Method
columns_to_analyze = ['amount','oldbalanceOrg','newbalanceOrig','oldbalanceDest','newbalanceDest']
# plt.figure(figsize=(30,30))
df_no_outliers.shape
(4393187, 9)
data.shape
(6362620, 9)
total_outlier
1969433
cleaned_data = df_no_outliers
cleaned_data.head()
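The cell that built df_no_outliers and total_outlier is not included in this export. A minimal sketch of the interquartile range method follows, assuming a row is dropped when any analyzed column falls outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]. The function name remove_iqr_outliers, the multiplier k=1.5, and the toy data are assumptions made for illustration and are not taken from the original notebook.

```python
import pandas as pd

def remove_iqr_outliers(frame, columns, k=1.5):
    """Drop rows where any listed column lies outside [Q1 - k*IQR, Q3 + k*IQR].

    Hypothetical helper: the original notebook's implementation is not shown.
    Returns the filtered frame and the number of rows removed.
    """
    q1 = frame[columns].quantile(0.25)
    q3 = frame[columns].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # A row is kept only if every analyzed column is inside its fences.
    inside = frame[columns].ge(lower, axis=1) & frame[columns].le(upper, axis=1)
    keep = inside.all(axis=1)
    return frame[keep], int((~keep).sum())

# Toy data (made up): one extreme value in 'amount'.
toy = pd.DataFrame({
    "amount": [10.0, 12.0, 11.0, 13.0, 12.5, 5000.0],
    "oldbalanceOrg": [100.0, 110.0, 105.0, 95.0, 102.0, 98.0],
})
toy_no_outliers, toy_total_outlier = remove_iqr_outliers(
    toy, ["amount", "oldbalanceOrg"]
)
print(toy_no_outliers.shape, toy_total_outlier)
```

Dropping a row when any single column is out of range is the strictest variant of the method. The shapes reported above (4393187 rows kept plus 1969433 removed equals the original 6362620) are consistent with row-wise removal.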
return(vif)
x=cleaned_data.drop('isFraud',axis=1)
calc_VIF(x)
C:\Users\devan\.conda\envs\MachineLearning\lib\site-packages\
statsmodels\regression\linear_model.py:1783: RuntimeWarning: invalid
value encountered in double_scalars
return 1 - self.ssr/self.uncentered_tss
variables VIF
0 step 2.964152
1 type 2.435823
2 amount 4.805709
3 oldbalanceOrg 2.420535
4 newbalanceOrig 2.996540
5 oldbalanceDest 53.449834
6 newbalanceDest 68.756255
7 isFlaggedFraud NaN
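Only the final return(vif) of calc_VIF survives in this export. As a rough sketch of what a variance inflation factor computes, the block below regresses each column on the others and reports 1 / (1 - R^2). It is written with NumPy least squares and an intercept term, which is an assumption: the warning above suggests the original used statsmodels, whose numbers can differ because it does not add an intercept by itself. The name calc_vif_sketch and the demo data are hypothetical.

```python
import numpy as np
import pandas as pd

def calc_vif_sketch(frame):
    """Return a table of VIF = 1 / (1 - R^2), one row per column.

    Each column is regressed on all other columns plus an intercept.
    A constant column has no variance to explain, so its VIF is NaN.
    """
    rows = []
    for col in frame.columns:
        y = frame[col].to_numpy(dtype=float)
        others = frame.drop(columns=col).to_numpy(dtype=float)
        design = np.column_stack([np.ones(len(frame)), others])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        residual = y - design @ coef
        ss_res = float(residual @ residual)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        if np.isclose(ss_tot, 0.0):
            vif = float("nan")
        else:
            r2 = 1.0 - ss_res / ss_tot
            vif = float("inf") if r2 >= 1.0 else 1.0 / (1.0 - r2)
        rows.append({"variables": col, "VIF": vif})
    return pd.DataFrame(rows)

# Demo data (made up): 'b' is almost a copy of 'a', 'c' is independent,
# and 'flag' is constant.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
demo = pd.DataFrame({
    "a": a,
    "b": a + 0.01 * rng.normal(size=200),
    "c": rng.normal(size=200),
    "flag": 0.0,
})
demo_vif = calc_vif_sketch(demo)
print(demo_vif)
```

A NaN entry such as the one for isFlaggedFraud above is consistent with a column that is constant in the cleaned data, although the export does not confirm this.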
<AxesSubplot:>
The heatmap also shows that the columns oldbalanceDest and
newbalanceDest are highly correlated. Both are kept, however,
because the initial and final balances on the destination
side are important to know.
cleaned_data.head()
sns.kdeplot(cleaned_data['oldbalanceOrg'])
<AxesSubplot:xlabel='oldbalanceOrg', ylabel='Density'>
sns.kdeplot(cleaned_data['oldbalanceDest'])
<AxesSubplot:xlabel='oldbalanceDest', ylabel='Density'>
sns.kdeplot(cleaned_data['newbalanceDest'])
<AxesSubplot:xlabel='newbalanceDest', ylabel='Density'>
X_test.head()
newbalanceDest isFlaggedFraud
1609976 0.00 0
2220960 0.00 0
4596673 0.00 0
19284 1236584.82 0
5990985 0.00 0
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
lg = LogisticRegression()
lg.fit(X_train,Y_train)
print("accuracy",accuracy_score(Y_test,lg.predict(X_test)))
accuracy 0.9994499059022947
from xgboost import XGBClassifier
xgb = XGBClassifier()
xgb.fit(X_train,Y_train)
pred1 = xgb.predict(X_test)
print("accuracy",accuracy_score(Y_test,pred1))
accuracy 0.9997152156533259
disp.plot()
<sklearn.metrics._plot.confusion_matrix.ConfusionMatrixDisplay at
0x244498d7700>
From the above, even with only 0.1 of the dataset used for training,
the model performs well on the remaining 0.9 used for testing.
Hence our model is working well.
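Accuracy alone can be hard to interpret when one class is rare, as fraud labels typically are. The sketch below uses hypothetical labels (10 fraud cases in 10000 transactions, not taken from Fraud.csv) to show that a rule predicting "not fraud" for every transaction still scores very high accuracy while catching no fraud at all. The helper names are assumptions for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that were predicted positive."""
    predicted_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not predicted_for_positives:
        return float("nan")
    hits = sum(p == positive for p in predicted_for_positives)
    return hits / len(predicted_for_positives)

# Hypothetical labels: 10 fraud cases among 10000 transactions.
y_true = [1] * 10 + [0] * 9990
# A trivial rule that labels everything as "not fraud".
always_legit = [0] * len(y_true)

baseline_accuracy = accuracy(y_true, always_legit)
baseline_recall = recall(y_true, always_legit)
print("accuracy", baseline_accuracy, "recall", baseline_recall)
```

Comparing the models against such a baseline, and reading the confusion matrix for the fraud class specifically, gives a clearer picture than the accuracy figure by itself.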
What are the key factors that predict a fraudulent customer?
Predicting fraudulent customers is crucial for businesses to protect themselves from financial
losses and maintain trust with legitimate customers. Several key factors can help in identifying
potentially fraudulent customers:
1. Abnormal transaction patterns: Look for unusual or atypical behavior, such as a
sudden increase in transaction volume, larger-than-usual purchases, or multiple
transactions from different locations within a short timeframe.
2. Unusual login activity: Frequent failed login attempts, multiple login locations, or
suspicious IP addresses could indicate potential unauthorized access.
7. Account age and history: New accounts with a large number of transactions or
customers with little history may pose a higher risk.
10. Customer behavior changes: Look for changes in a customer's behavior, such as a
sudden shift in spending habits or a switch to higher-risk products.
11. Social network analysis: Investigate relationships between customers and identify
connections between potentially fraudulent accounts.
12. Watchlists and databases: Check against internal and external fraud databases or
watchlists for known fraudulent customers.
13. Payment methods: Some payment methods, such as virtual credit cards or prepaid
cards, are associated with higher fraud risk.
14. Machine learning models: Implement advanced machine learning algorithms that
can analyze large amounts of data and identify patterns indicative of fraud.
It's important to remember that no single factor can reliably predict fraudulent customers. A
combination of these factors, along with continuous monitoring and analysis, will provide a
more accurate assessment of potential fraud. Moreover, it's essential to maintain a balance
between fraud detection and customer experience to avoid false positives that could harm
genuine customers.
2. Encryption and secure communication: Ensure that all data transmissions are
encrypted using industry-standard protocols like TLS (Transport Layer Security) to
protect sensitive information during transit.
6. Velocity checks: Set thresholds for transaction volume to identify and block
multiple transactions occurring in quick succession.
10. Secure APIs and third-party integrations: If the company integrates with
third-party services or APIs, ensure that they have robust security measures
in place to prevent data breaches.
11. Fraud analysis and reporting: Establish a process for reporting and investigating
suspected fraudulent activities promptly.
12. Customer communication: Keep customers informed about security measures,
potential risks, and the steps they can take to protect themselves.
13. Compliance with industry standards and regulations: Ensure that the company
complies with relevant data protection laws and industry security standards.
14. Regular system updates and patches: Keep all software and systems up to date
with the latest security patches to minimize vulnerabilities.
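The velocity check in item 6 can be sketched as a sliding-window counter per account. This is an illustration under stated assumptions, not a production design: the class name VelocityCheck, the thresholds, the account id, and the use of plain numeric timestamps (in seconds, non-decreasing per account) are all hypothetical.

```python
from collections import defaultdict, deque

class VelocityCheck:
    """Block an account that exceeds max_events within window_seconds."""

    def __init__(self, max_events, window_seconds):
        self.max_events = max_events
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # account id -> accepted timestamps

    def allow(self, account_id, timestamp):
        """Return True and record the attempt, or False if it breaches the limit."""
        recent = self._events[account_id]
        # Drop timestamps that have fallen out of the sliding window.
        while recent and timestamp - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_events:
            return False  # blocked attempts are not recorded
        recent.append(timestamp)
        return True

# Hypothetical policy: at most 3 transactions per account per 60 seconds.
check = VelocityCheck(max_events=3, window_seconds=60)
decisions = [check.allow("C123", t) for t in (0, 10, 20, 30, 70)]
print(decisions)
```

In this run the fourth attempt (at t=30) is blocked because three transactions already sit inside the window, and the attempt at t=70 is allowed again once the earliest ones have expired.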
1. Data analysis and metrics: Collect and analyze data related to fraud detection and
prevention. This includes the number of detected fraud incidents, false positives,
true positives, and the overall accuracy of the system.
3. False positive rate: Measure the rate of false positives, i.e., legitimate transactions
flagged as fraudulent. A high false positive rate can inconvenience customers and
impact the company's revenue.
4. True positive rate: Measure the rate of true positives, i.e., actual fraudulent
transactions correctly identified by the system. A high true positive rate indicates
effective fraud detection.
5. Reduction in fraud losses: Calculate the reduction in financial losses due to fraud
after implementing the prevention measures.
9. Adaptability to new fraud patterns: Assess how well the system adapts to
evolving fraud patterns and whether it can detect new types of fraud.
13. Comparing against control groups: Use control groups to compare the
performance of the fraud detection system with areas where the new measures
haven't been implemented. This can help isolate the impact of the prevention
measures.
By conducting a thorough evaluation using these metrics, the company can determine the
effectiveness of its fraud detection measures. This information can be used to make informed
decisions on refining existing strategies or implementing new ones to further strengthen the
fraud prevention system. It's important to note that fraudsters continually evolve their tactics, so
the evaluation process should be ongoing and adaptive.
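The rates named in the list above follow directly from confusion-matrix counts. The sketch below computes the true positive rate, the false positive rate, and precision, plus a simple relative reduction in fraud losses. All counts and amounts are hypothetical and the function names are assumptions for illustration.

```python
def _ratio(numerator, denominator):
    """Safe division that returns NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")

def detection_rates(tp, fp, tn, fn):
    """Compute detection rates from confusion-matrix counts.

    tp: fraud correctly flagged      fp: legitimate flagged as fraud
    tn: legitimate correctly passed  fn: fraud that was missed
    """
    return {
        "true_positive_rate": _ratio(tp, tp + fn),
        "false_positive_rate": _ratio(fp, fp + tn),
        "precision": _ratio(tp, tp + fp),
    }

def loss_reduction(loss_before, loss_after):
    """Relative reduction in fraud losses after the measures were introduced."""
    return _ratio(loss_before - loss_after, loss_before)

# Hypothetical monitoring period: 100 fraud cases among 10000 transactions.
rates = detection_rates(tp=80, fp=50, tn=9850, fn=20)
reduction = loss_reduction(loss_before=200000.0, loss_after=150000.0)
print(rates, reduction)
```

Tracking these figures per period, rather than once, matches the point that the evaluation should be ongoing as fraud patterns change.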
Predictive System
# One transaction, given in the column order of X_test
a = ['8','1','32966.31','59607.00','26640.69','1450296.94','1236584.82','0']
a = pd.DataFrame([a],columns=X_test.columns,dtype='float')
newbalanceDest isFlaggedFraud
0 1236584.82 0.0
prediction = xgb.predict(a)[0]
if prediction==0:
    print("Not A Fraud Transaction")
else:
    print("Is a Fraud Transaction")