ML Final

The document describes a machine learning model for detecting fraudulent online payments. It discusses preprocessing the transaction data, training various models like XGBoost and evaluating their performance, and using the best model to predict fraud. The system aims to help secure online transactions.

Uploaded by

4023 Keerthana
ML PROJECT

Online Payment Fraud Detection using Machine Learning

DONE BY:
GAUTHAM G D
AISHWARYA R
INTRODUCTION
• Online payments have become an integral part of daily life. As the volume of online transactions grows, fraud cases are rising with it, resulting in significant financial losses. To combat this, we have developed a machine learning-based system for online payment fraud detection. The project aims to detect fraudulent transactions accurately and in real time, reducing financial losses and increasing confidence in online payments. By applying machine learning algorithms to historical transaction data, the system identifies patterns and anomalies and flags potential fraud cases, giving users a secure and streamlined transaction experience.
PROBLEM DEFINITION:
• Online payment fraud is a critical issue in the digital payment ecosystem. As online transactions increase, fraudulent activity rises with them, resulting in significant financial losses. The problem is to develop a system that accurately detects fraudulent transactions in real time, preventing financial losses and strengthening customer trust.
PROPOSED SYSTEM:

• Our proposed system detects fraudulent transactions in real time using advanced machine learning algorithms and multi-parameter analysis. It automatically updates models with new data and algorithms, staying ahead of evolving fraud patterns. This improves detection accuracy and reduces false positives, enhancing online payment security.
MODULES:
1. Data Preprocessing:

• Checks for missing values and removes missing values, outliers, and inconsistencies, which ensures data quality and prevents errors in later stages.

• Creates a correlation matrix to identify relationships between numeric columns, which can inform feature selection.

2. Feature Engineering:

• Applies feature engineering techniques to extract relevant features from data such as transaction amount, location, and time, then transforms and selects the key attributes.

3. Model Training and Evaluation:

• Trains five machine learning models (XGBoost, Logistic Regression, Random Forest, K-Neighbors, and AdaBoost) on the training data.

• Evaluates each model on the testing data using accuracy and log loss, giving insight into its effectiveness.

4. User Input and Prediction:

• Takes user input for transaction details, such as step, amount, old balance, and new balance.

• Creates a DataFrame from the user input that mimics the format of the training data. The trained model then predicts whether the transaction is fraudulent.

5. Visualization:

• Plots training and testing accuracy for each model, giving a visual comparison of their performance.

• Visualizes the correlation matrix, helping to identify relationships between numeric columns.
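
The preprocessing module described above can be sketched on a toy table. This is only an illustration under stated assumptions: the six rows are made up, and the 1.5 × IQR outlier rule is an assumed choice, not something the project specifies. Outliers are flagged rather than dropped here, since unusually large amounts may be exactly the transactions of interest in fraud detection.

```python
import pandas as pd

# Toy transactions (made-up values); column names follow the project's dataset.
df = pd.DataFrame({
    "step": [1, 1, 2, 2, 3, 3],
    "amount": [100.0, 250.0, None, 180.0, 90000.0, 120.0],
    "oldbalanceOrg": [500.0, 900.0, 300.0, 400.0, 90000.0, 700.0],
    "type": ["PAYMENT", "TRANSFER", "PAYMENT", "CASH_OUT", "TRANSFER", "PAYMENT"],
})

# 1. Missing values: count them per column, then drop incomplete rows.
missing_per_column = df.isnull().sum()
clean = df.dropna().reset_index(drop=True)

# 2. Outliers: flag amounts outside 1.5 * IQR (assumed rule, not from the project).
q1, q3 = clean["amount"].quantile([0.25, 0.75])
iqr = q3 - q1
clean["amount_outlier"] = ~clean["amount"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

# 3. Correlation matrix over the numeric columns only.
numeric = clean.select_dtypes(include="number")
correlation_matrix = numeric.corr()

print(missing_per_column)
print(clean)
print(correlation_matrix.round(2))
```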
ARCHITECTURE

DATA PREPROCESSING → FEATURE SELECTION → MODEL TRAINING AND EVALUATION → USER INPUT AND PREDICTION → VISUALIZATION
HARDWARE REQUIREMENTS:

PROCESSOR: Intel Core i5 or equivalent

RAM: 8 GB or more

STORAGE: 500 GB or more

GRAPHICS CARD: NVIDIA GeForce GTX 1060 or equivalent (for visualization)
SOFTWARE REQUIREMENTS:

Operating System: Windows 10 or macOS High Sierra or later

Python: Version 3.8 or later

Libraries: Pandas, NumPy, Matplotlib, Seaborn, XGBoost, Scikit-learn

IDE: Jupyter Notebook or equivalent

Database: CSV file or equivalent (for storing the dataset)
LIBRARIES:

The project uses the following libraries:

• Pandas: For data manipulation and analysis

• NumPy: For numerical computations

• Matplotlib and Seaborn: For data visualization

• Scikit-learn: For machine learning models

• XGBoost: For gradient boosting

FEATURE DESCRIPTION:
1. step: The unit of time (1 hour) in which the transaction occurred. It can help identify patterns or anomalies in transaction behavior over time.

2. amount: The transaction amount, which can be a key indicator of fraud. Large or unusual transaction amounts may be flagged as potential fraud.

3. oldbalanceOrg: The balance of the origin account before the transaction occurred. It can help identify changes in account behavior or unusual activity.

4. newbalanceOrig: The balance of the origin account after the transaction occurred. It can help identify changes in account behavior or unusual activity.

5. oldbalanceDest: The balance of the destination account before the transaction occurred. It can help identify changes in account behavior or unusual activity.

6. newbalanceDest: The balance of the destination account after the transaction occurred. It can help identify changes in account behavior or unusual activity.

7. isFlaggedFraud: A flag (1 or 0) already present in the dataset. The program keeps it as an input feature. The target variable that the machine learning models are trained to predict is the separate isFraud column, which indicates whether the transaction is fraudulent (1) or not (0).

• These features are used together to train machine learning models to detect fraudulent transactions. By analyzing patterns and relationships between these features, the models can identify potential fraud and flag it for further investigation.
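
One way to make the relationships between these features explicit is to derive balance-consistency columns. The sketch below is a hypothetical illustration, not part of the project's program: the two rows are made up, and the derived column names (orig_balance_gap, dest_balance_gap, drains_account) are assumptions introduced here.

```python
import pandas as pd

# Two made-up transactions using the dataset's column names.
tx = pd.DataFrame({
    "amount": [100.0, 5000.0],
    "oldbalanceOrg": [500.0, 5000.0],
    "newbalanceOrig": [400.0, 0.0],
    "oldbalanceDest": [0.0, 200.0],
    "newbalanceDest": [100.0, 200.0],
})

# Hypothetical derived features: how far each balance change is from the amount.
# A gap of 0 means the balances are consistent with the transferred amount.
tx["orig_balance_gap"] = tx["oldbalanceOrg"] - tx["amount"] - tx["newbalanceOrig"]
tx["dest_balance_gap"] = tx["oldbalanceDest"] + tx["amount"] - tx["newbalanceDest"]

# Hypothetical indicator: the transaction empties the origin account exactly.
tx["drains_account"] = (tx["newbalanceOrig"] == 0) & (tx["amount"] == tx["oldbalanceOrg"])

print(tx[["orig_balance_gap", "dest_balance_gap", "drains_account"]])
```

In the second row the destination balance does not move although 5000 was sent, so dest_balance_gap is 5000. Whether such gaps actually help the models would need to be checked on the real data.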
PROGRAM FUNCTIONALITY:
• Loads a dataset of transactions

• Preprocesses the data by removing missing values and selecting numeric columns

• Splits the data into training and testing sets

• Trains five different machine learning models (XGBoost, Logistic Regression, Random Forest, K-Neighbors, and AdaBoost) on the training data

• Evaluates the performance of each model on the testing data

• Plots the training and testing accuracies for each model

• Takes transaction details as input (a sample transaction is hard-coded in the program)

• Makes a prediction using the trained model based on the user input

• Indicates whether the transaction is predicted as fraud or non-fraud
ARCHITECTURE:
• Data Preprocessing: The dataset is loaded, and missing values are removed.
• Feature Selection: Non-numeric columns are dropped.
• Model Training: Five machine learning models are trained on the dataset.
• Model Evaluation: The models are evaluated on the testing set.
• Prediction: The best model is used to predict fraudulent transactions.
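
For the Model Evaluation step, accuracy alone can look very high when fraud is rare, so it may help to inspect precision and recall as well. The sketch below uses made-up labels (990 legitimate and 10 fraudulent transactions, an assumed ratio) and a hypothetical set of predictions. It is not the project's evaluation code.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

# Made-up ground truth: 990 legitimate (0) and 10 fraudulent (1) transactions.
y_true = np.array([0] * 990 + [1] * 10)

# Baseline that never predicts fraud: high accuracy, but it catches nothing.
always_legit = np.zeros_like(y_true)
baseline_accuracy = accuracy_score(y_true, always_legit)
baseline_recall = recall_score(y_true, always_legit, zero_division=0)

# Hypothetical model output: 5 false alarms and 2 missed frauds.
y_pred = y_true.copy()
y_pred[:5] = 1        # five legitimate transactions wrongly flagged
y_pred[990:992] = 0   # two frauds missed

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
model_accuracy = accuracy_score(y_true, y_pred)
model_precision = precision_score(y_true, y_pred)
model_recall = recall_score(y_true, y_pred)

print(f"baseline: accuracy={baseline_accuracy:.3f}, recall={baseline_recall:.3f}")
print(f"model: accuracy={model_accuracy:.3f}, precision={model_precision:.3f}, recall={model_recall:.3f}")
```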
MACHINE LEARNING MODELS:

01 XGBoost: Gradient boosted decision trees for classification.
02 Logistic Regression: Predicts probability of fraud.
03 Random Forest: Ensemble learning for classification.
04 K-Neighbors: Finds similar transactions to known fraudulent ones.
05 AdaBoost: Combines multiple weak models to improve accuracy.
1. XGBoost:
• Import: import xgboost as xgb
• Initialize: model = xgb.XGBClassifier()
• Train: model.fit(X_train, y_train)
• Predict: model.predict(X_test)
• Hyperparameter tuning: Use xgb.XGBClassifier() with GridSearchCV or RandomizedSearchCV to optimize parameters like max_depth, learning_rate, and n_estimators
2. Logistic Regression:
• Import: from sklearn.linear_model import LogisticRegression
• Initialize: model = LogisticRegression()
• Train: model.fit(X_train, y_train)
• Predict: model.predict(X_test)
• Hyperparameter tuning: Use LogisticRegression() with GridSearchCV or RandomizedSearchCV to optimize parameters like C and penalty
3. Random Forest:
• Import: from sklearn.ensemble import RandomForestClassifier
• Initialize: model = RandomForestClassifier()
• Train: model.fit(X_train, y_train)
• Predict: model.predict(X_test)
• Hyperparameter tuning: Use RandomForestClassifier() with GridSearchCV or RandomizedSearchCV to optimize parameters like n_estimators, max_depth, and min_samples_split
4. K-Neighbors:
• Import: from sklearn.neighbors import KNeighborsClassifier
• Initialize: model = KNeighborsClassifier()
• Train: model.fit(X_train, y_train)
• Predict: model.predict(X_test)
• Hyperparameter tuning: Use KNeighborsClassifier() with GridSearchCV or RandomizedSearchCV to optimize parameters like n_neighbors and weights
5. AdaBoost:
• Import: from sklearn.ensemble import AdaBoostClassifier
• Initialize: model = AdaBoostClassifier()
• Train: model.fit(X_train, y_train)
• Predict: model.predict(X_test)
• Hyperparameter tuning: Use AdaBoostClassifier() with GridSearchCV or RandomizedSearchCV to optimize parameters like n_estimators and learning_rate
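
The hyperparameter tuning mentioned for each model could look roughly like the sketch below, shown for Random Forest. It runs on synthetic data from make_classification, since the real dataset is not assumed to be available. The grid values, the "f1" scoring choice, and cv=3 are illustrative assumptions rather than settings taken from the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, imbalanced stand-in for the transaction data (about 10% positives).
X, y = make_classification(n_samples=400, n_features=6, weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Small illustrative grid; a real search would likely cover more values.
param_grid = {"n_estimators": [10, 25], "max_depth": [3, None]}

# F1 is used for scoring because accuracy is dominated by the majority class.
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, scoring="f1", cv=3)
search.fit(X_train, y_train)

best_model = search.best_estimator_
test_predictions = best_model.predict(X_test)

print("Best parameters:", search.best_params_)
print("Best cross-validated F1:", round(search.best_score_, 3))
```

The same pattern applies to the other four models by swapping the estimator and the parameter grid.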
Correlation Matrix
• correlation_matrix = df.corr(numeric_only=True)
• plt.figure(figsize=(10, 8))
• sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=".2f")
• plt.title("Correlation Matrix")
• plt.show()
OUTPUT
• Accuracy for XGBoost: 0.9996605172083198, Loss: 0.0010340594057733003
• Accuracy for Logistic Regression: 0.9983128019589415, Loss: 0.01829306096263862
• Accuracy for Random Forest: 0.9995677881124443, Loss: 0.0051479299492394335
• Accuracy for K-neighbours: 0.9994499121431109, Loss: 0.009307683764686884
• Accuracy for AdaBoost: 0.9992015867677152, Loss: 0.5698750028631996
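
The comparison behind these figures can be made explicit with a few lines of plain Python. The numbers below are copied from the output above. The dictionary layout and variable names are just one possible way to organize them.

```python
# Test accuracy and log loss as reported in the output above.
results = {
    "XGBoost": {"accuracy": 0.9996605172083198, "loss": 0.0010340594057733003},
    "Logistic Regression": {"accuracy": 0.9983128019589415, "loss": 0.01829306096263862},
    "Random Forest": {"accuracy": 0.9995677881124443, "loss": 0.0051479299492394335},
    "K-neighbours": {"accuracy": 0.9994499121431109, "loss": 0.009307683764686884},
    "AdaBoost": {"accuracy": 0.9992015867677152, "loss": 0.5698750028631996},
}

# Highest accuracy and lowest loss, looked up separately.
best_by_accuracy = max(results, key=lambda name: results[name]["accuracy"])
best_by_loss = min(results, key=lambda name: results[name]["loss"])

# Full ranking from lowest to highest loss.
ranking_by_loss = sorted(results, key=lambda name: results[name]["loss"])

print("Best by accuracy:", best_by_accuracy)
print("Best by loss:", best_by_loss)
print("Ranking by loss:", ranking_by_loss)
```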
CONCLUSION:

• We conclude that the XGBoost classifier gives the highest test accuracy (0.99966) and the lowest test loss (0.00103) of the five models for online payment fraud detection.



INNOVATION:

• Machine learning algorithms adapt to new fraud patterns
• Real-time detection responds to transactions as they occur
• External data integration enhances detection accuracy
• User feedback mechanism improves model accuracy
STEPS TO IMPLEMENT

Step 1: Upload Dataset to Google Drive

1. Go to https://drive.google.com/ and log in with your Google account.

2. Upload your dataset ('onlinefraud.csv') to your Google Drive. Note the path where you upload the file.

Step 2: Set Up Google Colab

1. Go to https://colab.research.google.com/ and log in with the same Google account.

2. Create a new notebook by clicking "File" > "New Notebook", or choose "File" > "Upload Notebook" if you already have a notebook file.

3. A new notebook opens with an empty cell. You can start typing code in this cell.

Step 3: Mount Google Drive

1. In a new cell in Colab, run the following code:

from google.colab import drive
drive.mount('/content/drive')

2. When prompted, allow access to your Google Drive. If Colab shows a link and asks for an authentication code, open the link, copy the code, paste it into the cell, and press Enter.

Step 4: Install Required Libraries

1. In a new cell in Colab, run the following code to install the necessary libraries:

!pip install xgboost

Step 5: Copy the Code

1. Copy the entire program from the PROGRAM TO RUN section.

Step 6: Modify File Path

1. Find the line where the dataset is loaded (`df = pd.read_csv('/content/drive/MyDrive/ML /onlinefraud.csv')`).

2. Replace `'/content/drive/MyDrive/ML /onlinefraud.csv'` with the path to your dataset in your Google Drive. The path should start with `'/content/drive/MyDrive/'`.
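
As a small aid for this step, the path can be assembled from its parts so that the required prefix is always present. This is a minimal sketch: the helper name dataset_path and the folder name "fraud_project" are hypothetical, so substitute the folder you actually uploaded to.

```python
from pathlib import PurePosixPath

# Root of the mounted Drive inside Colab, as stated in Step 6.
DRIVE_ROOT = PurePosixPath("/content/drive/MyDrive")

def dataset_path(*parts):
    """Join folder and file names under the mounted Drive root."""
    return str(DRIVE_ROOT.joinpath(*parts))

# "fraud_project" is a hypothetical folder name used only for illustration.
csv_path = dataset_path("fraud_project", "onlinefraud.csv")
print(csv_path)

# In the program, the result would then be passed to pandas:
# df = pd.read_csv(csv_path)
```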

Step 7: Run the Code

1. Paste the modified code into a new cell in Colab.

2. Run the cell, either by clicking the play button next to the cell or by pressing Shift+Enter.

Step 8: Check Results

1. After running the code, the model evaluation results and predictions appear in the output cells.

2. Look for the prediction result for the user input transaction to see whether it is predicted as fraud or non-fraud.

PROGRAM TO RUN
# Import libraries
import random
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import xgboost as xgb
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import (
    accuracy_score, classification_report, confusion_matrix,
    roc_curve, auc, log_loss, ConfusionMatrixDisplay,
)
from sklearn.metrics import roc_auc_score as ras
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.neighbors import KNeighborsClassifier

# Load the dataset (adjust the path to where the file sits in your Drive)
df = pd.read_csv('/content/drive/MyDrive/ML /onlinefraud.csv')
print(df.shape)
print(df.head(5))

# Check for missing values
missing_values = df.isnull().sum()
print("Missing Values:\n", missing_values)
df = df.dropna()  # Remove rows with missing values

# Create a correlation matrix (numeric columns only)
correlation_matrix = df.corr(numeric_only=True)
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=".2f")
plt.title("Correlation Matrix")
plt.show()

# Feature selection and splitting
X = df.drop(['isFraud'], axis=1)
y = df['isFraud']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Exclude non-numeric columns from the training and testing data
non_numeric_columns = ['nameOrig', 'nameDest', 'type']
X_train = X_train.drop(columns=non_numeric_columns)
X_test = X_test.drop(columns=non_numeric_columns)

# Models to compare
model = xgb.XGBClassifier()
model1 = LogisticRegression()
model2 = RandomForestClassifier(n_estimators=7, criterion='entropy', random_state=7)
model3 = KNeighborsClassifier()
model4 = AdaBoostClassifier(random_state=42)
models = [model, model1, model2, model3, model4]
model_names = ['XGBoost', 'Logistic Regression', 'Random Forest', 'K-neighbours', 'AdaBoost']

train_accuracy = []
test_accuracy = []
train_losses = []
test_losses = []

# A separate loop variable (clf) keeps the XGBoost instance in `model` from being overwritten
for clf, name in zip(models, model_names):
    clf.fit(X_train, y_train)

    # Training accuracy and loss
    train_pred = clf.predict(X_train)
    train_acc = accuracy_score(y_train, train_pred)
    train_loss = log_loss(y_train, clf.predict_proba(X_train))
    train_accuracy.append(train_acc)
    train_losses.append(train_loss)

    # Testing accuracy and loss
    test_pred = clf.predict(X_test)
    test_acc = accuracy_score(y_test, test_pred)
    test_loss = log_loss(y_test, clf.predict_proba(X_test))
    test_accuracy.append(test_acc)
    test_losses.append(test_loss)
    print(f"Accuracy for {name}: {test_acc}, Loss: {test_loss}")

# Plotting
plt.figure(figsize=(12, 8))
plt.plot(model_names, train_accuracy, marker='o', label='Training Accuracy')
plt.plot(model_names, test_accuracy, marker='o', label='Testing Accuracy')
plt.title('Training and Testing Accuracies for Different Models')
plt.xlabel('Models')
plt.ylabel('Accuracy')
plt.legend()
plt.xticks(rotation=45)
plt.grid(True)
plt.show()

# Sample transaction standing in for user input
user_input = {
    'step': 1,
    'amount': 10000.00,
    'oldbalanceOrg': 30000.00,
    'newbalanceOrig': 60000.0,
    'oldbalanceDest': 3000.00,
    'newbalanceDest': 33000.00,
    'isFlaggedFraud': df['isFlaggedFraud'].values[0]  # Extract from your dataset
}

# Create a DataFrame from user input, with columns in the same order as the training data
user_df = pd.DataFrame([user_input])[X_train.columns]

# Use the model with the highest test accuracy for the prediction
best_model = models[int(np.argmax(test_accuracy))]
user_predictions = best_model.predict(user_df)

# Check if the user input resulted in fraud or not
if user_predictions[0] == 1:
    print("The transaction is predicted as fraud.")
else:
    print("The transaction is predicted as non-fraud.")
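
A misspelled key in the input dictionary, or columns in a different order from the training data, can make the prediction step fail or behave unexpectedly. The sketch below shows one possible guard. The helper build_input_frame is hypothetical, and the column list is assumed to match the numeric features that remain after the program drops the non-numeric columns.

```python
import pandas as pd

# Assumed feature order: the numeric columns left after dropping
# 'isFraud', 'nameOrig', 'nameDest', and 'type'.
feature_columns = [
    "step", "amount", "oldbalanceOrg", "newbalanceOrig",
    "oldbalanceDest", "newbalanceDest", "isFlaggedFraud",
]

def build_input_frame(user_input, columns):
    """Return a one-row DataFrame whose columns match `columns` exactly."""
    missing = [c for c in columns if c not in user_input]
    unexpected = [k for k in user_input if k not in columns]
    if missing or unexpected:
        raise KeyError(f"missing={missing}, unexpected={unexpected}")
    return pd.DataFrame([user_input])[list(columns)]

# Keys given in a different order are realigned to the training order.
sample = {
    "isFlaggedFraud": 0, "amount": 10000.0, "step": 1,
    "oldbalanceOrg": 30000.0, "newbalanceOrig": 60000.0,
    "oldbalanceDest": 3000.0, "newbalanceDest": 33000.0,
}
user_df = build_input_frame(sample, feature_columns)

# A misspelled key is reported instead of silently reaching the model.
typo_input = {("isFlagge1dFraud" if k == "isFlaggedFraud" else k): v for k, v in sample.items()}
try:
    build_input_frame(typo_input, feature_columns)
    typo_detected = False
except KeyError:
    typo_detected = True

print(user_df)
print("Typo detected:", typo_detected)
```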
