ML Final
Online Payment Fraud Detection using Machine Learning
DONE BY:
GAUTHAM GD
AISHWARYA R
INTRODUCTION
In the digital age, online payments have become an
integral part of our daily lives. With the increasing
trend of online transactions, fraud cases are also
rising, resulting in significant financial losses. To
combat this issue, we have developed a machine
learning-based system for online payment fraud
detection. This project aims to provide a robust and
accurate solution to detect fraudulent transactions in
real time, reducing financial losses and increasing
confidence in online payments. By leveraging machine
learning algorithms and historical data, our system can
identify patterns and anomalies to flag potential fraud
cases, providing a secure and streamlined transaction
experience for users.
PROBLEM DEFINITION:
Online payment fraud detection is a critical
issue in the digital payment ecosystem.
With the increasing trend of online
transactions, fraudulent activities are also
rising, resulting in significant financial
losses. The problem is to develop a system
that can accurately detect fraudulent
transactions in real time, preventing
financial losses and enhancing customer
trust.
PROPOSED SYSTEM:
5. Visualization:
This module plots training and testing accuracy
for each model, providing a visual comparison of
their performance.
The correlation matrix is also visualized, helping
to identify relationships between numeric
columns.
ARCHITECTURE
DATA PREPROCESSING → FEATURE SELECTION → MODEL TRAINING AND EVALUATION → USER INPUT AND PREDICTION → VISUALIZATION
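The stages above can be sketched with scikit-learn's Pipeline and ColumnTransformer. This is an illustration under stated assumptions, not the project's program: the toy DataFrame is made up (only its column names come from the dataset), the ColumnTransformer drops the same three non-numeric columns the program drops, and RandomForestClassifier stands in for any of the five models. Visualization and user input are left out.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline

# Columns the program drops before training (identifiers and the text column 'type').
NON_NUMERIC = ["nameOrig", "nameDest", "type"]

def build_pipeline() -> Pipeline:
    # Feature selection: drop the non-numeric columns, pass the rest through unchanged.
    select = ColumnTransformer(
        transformers=[("drop_ids", "drop", NON_NUMERIC)],
        remainder="passthrough",
    )
    # Model training: a small random forest as an illustrative stand-in model.
    return Pipeline([
        ("select", select),
        ("model", RandomForestClassifier(n_estimators=10, random_state=0)),
    ])

# Hypothetical toy data using the dataset's column names (all values are made up).
toy = pd.DataFrame({
    "step": [1, 1, 2, 2, 3, 3],
    "type": ["PAYMENT", "TRANSFER", "CASH_OUT", "PAYMENT", "TRANSFER", "CASH_OUT"],
    "amount": [100.0, 9000.0, 9000.0, 50.0, 8000.0, 8000.0],
    "nameOrig": ["C1", "C2", "C3", "C4", "C5", "C6"],
    "oldbalanceOrg": [500.0, 9000.0, 9000.0, 300.0, 8000.0, 8000.0],
    "newbalanceOrig": [400.0, 0.0, 0.0, 250.0, 0.0, 0.0],
    "nameDest": ["M1", "C7", "C8", "M2", "C9", "C10"],
    "oldbalanceDest": [0.0] * 6,
    "newbalanceDest": [0.0, 0.0, 9000.0, 0.0, 0.0, 8000.0],
    "isFlaggedFraud": [0] * 6,
    "isFraud": [0, 1, 1, 0, 1, 1],
})

# Data preprocessing: drop rows with missing values, as the program does.
toy = toy.dropna()

X = toy.drop(columns=["isFraud"])
y = toy["isFraud"]
pipe = build_pipeline().fit(X, y)
print(pipe.predict(X.head(2)))
```

Bundling the column selection with the model means the same columns are dropped automatically whenever the pipeline predicts, instead of being dropped by hand before each call.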
HARDWARE REQUIREMENTS:
PROCESSOR: Intel Core i5 or equivalent
RAM: 8 GB or more
STORAGE: 500 GB or more
SOFTWARE REQUIREMENTS:
Libraries: Pandas, NumPy, Matplotlib, Seaborn, XGBoost, Scikit-learn
IDE: Jupyter Notebook or equivalent
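Before running the notebook, a short script can confirm that the listed libraries are installed. This sketch uses only the standard library's importlib.metadata (Python 3.8+); the names in REQUIRED are the PyPI distribution names, which assumes the libraries were installed with pip or a similar tool (note that Scikit-learn is installed as "scikit-learn" but imported as "sklearn").

```python
from importlib.metadata import PackageNotFoundError, version

# PyPI distribution names of the required libraries.
REQUIRED = ["pandas", "numpy", "matplotlib", "seaborn", "xgboost", "scikit-learn"]

def check_requirements(packages=REQUIRED):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

for name, ver in check_requirements().items():
    print(f"{name:15} {ver or 'NOT INSTALLED'}")
```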
01 XGBoost: Gradient-boosted decision trees for classification.
02 Logistic Regression: Predicts the probability of fraud.
03 Random Forest: Ensemble learning for classification.
04 K-Neighbors: Finds transactions similar to known fraudulent ones.
05 AdaBoost: Combines multiple weak models to improve accuracy.
MACHINE LEARNING MODELS:
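1. XGBoost:
import xgboost as xgb
Initialize: model = xgb.XGBClassifier()
Train: model.fit(X_train, y_train)
Predict: model.predict(X_test)
2. Logistic Regression:
from sklearn.linear_model import LogisticRegression
Initialize: model1 = LogisticRegression()
Train: model1.fit(X_train, y_train)
Predict: model1.predict(X_test)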
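Every model follows the same scikit-learn pattern: construct, fit, then predict. The sketch below runs that pattern end to end for the four scikit-learn models. It uses synthetic, imbalanced data from make_classification in place of the transaction CSV (an assumption so the example runs on its own), and max_iter=1000 for LogisticRegression is an illustrative choice, not a project setting. XGBClassifier exposes the same fit/predict/predict_proba methods; it is left out only so the sketch runs without xgboost installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic imbalanced binary data standing in for the transactions (about 5% positives).
X, y = make_classification(n_samples=2000, n_features=7, weights=[0.95, 0.05], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=7, criterion="entropy", random_state=7),
    "K-Neighbors": KNeighborsClassifier(),
    "AdaBoost": AdaBoostClassifier(random_state=42),
}

results = {}
for name, clf in models.items():
    clf.fit(X_train, y_train)                 # a model must be fitted before predict()
    labels = clf.predict(X_test)              # hard 0/1 labels
    proba = clf.predict_proba(X_test)[:, 1]   # estimated probability of class 1 (fraud)
    results[name] = (labels, proba)
    print(f"{name}: {labels.sum()} of {len(labels)} test rows predicted as fraud")
```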
STEPS TO IMPLEMENT
2. Create a new notebook by clicking on "File" > "New Notebook" or "File" >
"Upload Notebook" if you have a notebook file.
3. If you are creating a new notebook, you will see a new cell. You can start typing
code in this cell.
from google.colab import drive
drive.mount('/content/drive')
2. Follow the prompt that appears and allow access to your Google Drive. (Older Colab versions instead show a link: open it, allow access, copy the authorization code, paste it into the cell and press Enter.)
2. Run the cell, either by clicking the play button next to the cell or pressing
Shift+Enter.
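Once Drive is mounted, it can help to confirm that the CSV exists and contains the columns the program relies on before starting a long training run. load_transactions is a hypothetical helper, not part of the project; the required column names are the ones the program reads or drops, and the commented usage line uses the program's own path.

```python
from pathlib import Path

import pandas as pd

# Columns the program reads, drops, or predicts.
REQUIRED_COLUMNS = {
    "step", "type", "amount", "nameOrig", "oldbalanceOrg", "newbalanceOrig",
    "nameDest", "oldbalanceDest", "newbalanceDest", "isFraud", "isFlaggedFraud",
}

def load_transactions(source):
    """Read the CSV and fail early if the file or any required column is missing."""
    if isinstance(source, (str, Path)) and not Path(source).exists():
        raise FileNotFoundError(f"CSV not found: {source} (is Google Drive mounted?)")
    df = pd.read_csv(source)
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"CSV is missing columns: {sorted(missing)}")
    return df

# Usage in Colab:
# df = load_transactions('/content/drive/MyDrive/ML /onlinefraud.csv')
```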
Step 8: Check Results
1. After running the code, you will see the results of model evaluation and
predictions in the output cells.
2. Look for the prediction result for the user input transaction to see if it's predicted
as fraud or non-fraud.
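Because fraudulent transactions are usually a small minority of the data, a high accuracy can hide a model that misses most fraud. The program below imports classification_report, confusion_matrix and roc_auc_score but prints only accuracy and log loss; this sketch shows how those metrics could be read for one model. It uses synthetic, imbalanced data from make_classification rather than the project's CSV, so its numbers say nothing about the project's results.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, roc_auc_score)
from sklearn.model_selection import train_test_split

# Synthetic data with roughly 2% positives (the "fraud" class).
X, y = make_classification(n_samples=5000, n_features=7, weights=[0.98, 0.02], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
pred = clf.predict(X_test)
proba = clf.predict_proba(X_test)[:, 1]

# Baseline: always predicting "non-fraud" already scores a high accuracy here.
baseline_acc = accuracy_score(y_test, [0] * len(y_test))
print(f"All-non-fraud baseline accuracy: {baseline_acc:.3f}")
print(f"Model accuracy: {accuracy_score(y_test, pred):.3f}")

# Rows are the true class (0, 1); columns are the predicted class (0, 1).
cm = confusion_matrix(y_test, pred)
print(cm)
# Per-class precision, recall and F1; recall on class 1 is the share of fraud caught.
print(classification_report(y_test, pred, digits=3, zero_division=0))
# ROC AUC is computed from predicted probabilities, not hard labels.
print(f"ROC AUC: {roc_auc_score(y_test, proba):.3f}")
```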
PROGRAM TO RUN
# Import libraries
import random
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import xgboost as xgb
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, ConfusionMatrixDisplay,
                             roc_curve, auc, log_loss)
from sklearn.metrics import roc_auc_score as ras
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.neighbors import KNeighborsClassifier
df = pd.read_csv('/content/drive/MyDrive/ML /onlinefraud.csv')
print(df.shape)
print(df.head(5))
# Check for missing values
missing_values = df.isnull().sum()
print("Missing Values:\n", missing_values)
df = df.dropna() # Remove rows with missing values
# Create a correlation matrix
correlation_matrix = df.corr(numeric_only=True)  # Correlation over numeric columns only
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=".2f")
plt.title("Correlation Matrix")
plt.show()
# Feature selection and splitting
X = df.drop(['isFraud'], axis=1)
y = df['isFraud']
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)
# Exclude non-numeric columns from the training and testing data
non_numeric_columns = ['nameOrig', 'nameDest', 'type']
X_train = X_train.drop(columns=non_numeric_columns)
X_test = X_test.drop(columns=non_numeric_columns)
model = xgb.XGBClassifier()
model1 = LogisticRegression()
model2 = RandomForestClassifier(n_estimators=7, criterion='entropy', random_state=7)
model3 = KNeighborsClassifier()
model4 = AdaBoostClassifier(random_state=42)
models = [model, model1, model2, model3, model4]
model_names = ['XGBoost', 'Logistic Regression', 'Random Forest', 'K-Neighbors', 'AdaBoost']
train_accuracy = []
test_accuracy = []
train_losses = []
test_losses = []
# Use a separate loop variable so that `model` still refers to XGBoost afterwards
for clf, name in zip(models, model_names):
    clf.fit(X_train, y_train)
    # Training accuracy and loss
    train_pred = clf.predict(X_train)
    train_acc = accuracy_score(y_train, train_pred)
    train_loss = log_loss(y_train, clf.predict_proba(X_train))
    train_accuracy.append(train_acc)
    train_losses.append(train_loss)
    # Testing accuracy and loss
    test_pred = clf.predict(X_test)
    test_acc = accuracy_score(y_test, test_pred)
    test_loss = log_loss(y_test, clf.predict_proba(X_test))
    test_accuracy.append(test_acc)
    test_losses.append(test_loss)
    print(f"Accuracy for {name}: {test_acc}, Loss: {test_loss}")
# Plotting
plt.figure(figsize=(12, 8))
plt.plot(model_names, train_accuracy, marker='o', label='Training Accuracy')
plt.plot(model_names, test_accuracy, marker='o', label='Testing Accuracy')
plt.title('Training and Testing Accuracies for Different Models')
plt.xlabel('Models')
plt.ylabel('Accuracy')
plt.legend()
plt.xticks(rotation=45)
plt.grid(True)
plt.show()
# Example transaction entered by the user (keys must match the training columns)
user_input = {
    'step': 1,
    'amount': 10000.00,
    'oldbalanceOrg': 30000.00,
    'newbalanceOrig': 60000.0,
    'oldbalanceDest': 3000.00,
    'newbalanceDest': 33000.00,
    'isFlaggedFraud': df['isFlaggedFraud'].values[0]  # Extract from your dataset
}
# Create a DataFrame from user input, with columns in the training order
user_df = pd.DataFrame([user_input])[X_train.columns]
# Make predictions using the XGBoost model (the first entry in models)
user_predictions = models[0].predict(user_df)
# Check if the user input resulted in fraud or not
if user_predictions[0] == 1:
    print("The transaction is predicted as fraud.")
else:
    print("The transaction is predicted as non-fraud.")
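A single-row prediction only works if the row carries exactly the feature columns the model was trained on, in the same order; a misspelled or missing key typically makes the fitted model raise an error about mismatched feature names. make_user_row is a hypothetical helper that checks the keys and reorders them before prediction. The train_columns list assumes the column order of the dataset the program reads; in the notebook, passing list(X_train.columns) avoids that assumption.

```python
import pandas as pd

def make_user_row(user_input: dict, feature_columns) -> pd.DataFrame:
    """Build a one-row DataFrame whose columns match the training features exactly."""
    feature_columns = list(feature_columns)
    missing = [c for c in feature_columns if c not in user_input]
    unexpected = [k for k in user_input if k not in feature_columns]
    if missing or unexpected:
        raise KeyError(f"missing={missing}, unexpected={unexpected}")
    # Reorder the values into the training column order.
    return pd.DataFrame([user_input])[feature_columns]

# Assumed column order of X_train after dropping nameOrig, nameDest and type.
train_columns = ["step", "amount", "oldbalanceOrg", "newbalanceOrig",
                 "oldbalanceDest", "newbalanceDest", "isFlaggedFraud"]

# Keys deliberately given out of order; the helper restores the training order.
row = make_user_row({"amount": 10000.0, "step": 1, "oldbalanceOrg": 30000.0,
                     "newbalanceOrig": 60000.0, "oldbalanceDest": 3000.0,
                     "newbalanceDest": 33000.0, "isFlaggedFraud": 0}, train_columns)
print(list(row.columns))
# In the notebook: row = make_user_row(user_input, X_train.columns); models[0].predict(row)
```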