0% found this document useful (0 votes)
21 views22 pages

Fraud Detection Using ML

The document discusses developing machine learning models for fraud detection in e-commerce transactions. It describes preprocessing a dataset, training models like Random Forest, SVM, Decision Tree and Gradient Boosting, and evaluating their performance on classification metrics. A Flask application was also created to allow users to input transactions and receive fraud predictions.

Uploaded by

vikrant kumar
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
21 views22 pages

Fraud Detection Using ML

The document discusses developing machine learning models for fraud detection in e-commerce transactions. It describes preprocessing a dataset, training models like Random Forest, SVM, Decision Tree and Gradient Boosting, and evaluating their performance on classification metrics. A Flask application was also created to allow users to input transactions and receive fraud predictions.

Uploaded by

vikrant kumar
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 22

FRAUD DETECTION USING

ML IN E-COMMERCE
A synopsis submitted
in partial fulfillment for the Degree of

Bachelor of Technology
In
Computer Science and Engineering
By
VIKRANT KUMAR (20033440030)

DEVENDRA KUMAR (20033440009)


ANKIT KUMAR (20033440006)
AMAN KUMAR SINGH (20033440003)
ANKUR KUMAR THAKUR (20033440007)

Pursuing in
Department of Computer Science and Engineering

RAMGARH ENGINEERING COLLEGE


(Estd. By Govt. of Jharkhand & run by Techno India under PPP)

January,2024
INDEX
DESCRIPTION PAGE NO.
1. ABSTRACT 1
2. INTRODUCTION 2
3. METHODOLOGY 3
4.MODELING AND ANALYSIS 4
5. RESULTS AND DISCUSSION 5
7. CONCLUSION 6
8. REFERENCE 7
1

ABSTRACT
In recent years, the rapid growth of e-commerce has been accompanied by a
significant increase in fraudulent transactions, posing severe financial risks
to both businesses and consumers. This paper presents a comprehensive
study on the detection of fraudulent transactions in e-commerce using
various machine learning algorithms, including Random Forest, Support
Vector Machine (SVM), Decision Tree, and Gradient Boosting. The objective
is to develop a robust and efficient fraud detection model that can accurately
identify fraudulent activities with minimal false positives.

The dataset utilized in this study comprises a large number of transaction


records, which were pre-processed to handle missing values, encode
categorical variables, and scale numerical features. The processed data was
then split into training and testing sets to evaluate the performance of the
different machine learning models.

A Random Forest classifier was initially implemented and fine-tuned,


resulting in an impressive accuracy of 95%, significantly outperforming the
other models. The SVM and Decision Tree classifiers demonstrated
competitive performance but fell short in accuracy compared to the Random
Forest. Gradient Boosting, while powerful, also did not surpass the accuracy
of the Random Forest model. Detailed evaluation metrics, including
precision, recall, and F1-score, were calculated to ensure a thorough
comparison of the models.

In addition to model development and evaluation, a simple yet effective


Flask-based web interface was created to facilitate user interaction with the
fraud detection system. This interface allows users to input transaction
details and receive instant feedback on the likelihood of the transaction
being fraudulent. The Flask application was designed with user-friendliness
in mind, ensuring that even non-technical users can leverage the fraud
detection system with ease.

The high accuracy achieved by the Random Forest model, coupled with the
simplicity of the Flask interface, makes this approach highly practical for
real-world implementation. This study highlights the potential of machine
learning algorithms in combating e-commerce fraud and provides a scalable
solution that can be adapted to various e-commerce platforms.

Future work will focus on further enhancing the model's performance by


incorporating more sophisticated feature engineering techniques and
exploring additional machine learning algorithms. Additionally, the
integration of real-time transaction monitoring and the development of more
advanced user interfaces will be prioritized to improve the system's usability
and effectiveness.
1
Overall, this research demonstrates a successful application of machine
learning to e-commerce fraud detection, offering a viable solution to a
critical issue faced by the industry. The implementation of such a system
can significantly mitigate financial losses due to fraud, thereby fostering a
safer and more secure online shopping environment.
1

INTRODUCTION
E-commerce fraud detection is an essential aspect of ensuring the security and
integrity of online transactions in the rapidly expanding digital marketplace. As e-
commerce continues to grow, so does the sophistication and frequency of
fraudulent activities. E-commerce fraud can take various forms, including identity
theft, account takeover, fraudulent transactions, and payment fraud, among others.
These activities not only result in significant financial losses for businesses and
consumers but also undermine trust in online commerce.

Fraudulent activities in e-commerce typically involve exploiting vulnerabilities in


payment systems, user authentication processes, and data handling mechanisms.
Fraudsters employ a range of tactics, from using stolen credit card information to
creating fake accounts and manipulating transaction details. The increasing
complexity and volume of e-commerce transactions make traditional fraud
detection methods, such as manual reviews and simple rule-based systems,
insufficient and inefficient.

To address these challenges, modern e-commerce platforms are turning to


advanced technologies and methodologies to enhance fraud detection capabilities.
Machine learning, a subset of artificial intelligence, has emerged as a powerful tool
in the fight against e-commerce fraud. By analyzing large datasets and identifying
patterns indicative of fraudulent behavior, machine learning models can detect and
prevent fraud in real-time with high accuracy.

Machine learning models used in e-commerce fraud detection include supervised


learning algorithms, such as Random Forest, Support Vector Machines (SVM),
Decision Trees, and Gradient Boosting. These models are trained on historical
transaction data, learning to distinguish between legitimate and fraudulent
transactions based on various features and patterns. The ability to continuously
learn and adapt to new fraud tactics makes machine learning models particularly
effective in dynamic e-commerce environments.

In addition to machine learning, other technological advancements, such as big


data analytics, blockchain, and biometric authentication, are being integrated into
fraud detection systems. These technologies provide additional layers of security
and improve the robustness of fraud detection mechanisms.
The goal of e-commerce fraud detection is not only to identify fraudulent
transactions but also to do so in a way that minimizes disruption to legitimate
customers. Striking a balance between security and user experience is crucial, as
1

overly stringent fraud detection measures can lead to false positives, where
legitimate transactions are mistakenly flagged as fraudulent. Therefore, continuous
improvement and fine-tuning of fraud detection systems are necessary to achieve
optimal performance.
In summary, e-commerce fraud detection is a critical component of modern online
commerce, aimed at protecting businesses and consumers from the financial and
reputational damage caused by fraud. Leveraging advanced technologies like
machine learning, alongside other innovative solutions, enables e-commerce
platforms to effectively combat fraud while maintaining a seamless and secure
shopping experience for users. As the e-commerce landscape continues to evolve,
so too must the strategies and tools used to detect and prevent fraudulent activities.
1
1

Modules Explanation
1. Data Cleaning and Processing Module: This module focuses on preparing
the dataset obtained from Kaggle for training the fraud detection models.
Initially, the dataset undergoes thorough cleaning to handle missing values
effectively. Techniques such as imputation or removal of missing values are
applied to ensure data integrity. Next, categorical variables within the dataset
are encoded into numerical representations suitable for machine learning
algorithms. This encoding step enables the algorithms to interpret and learn
from categorical data accurately. Additionally, numerical features are scaled
to ensure uniformity in scale, which aids in the convergence of certain
algorithms like Support Vector Machine (SVM). Scaling helps prevent
features with larger magnitudes from dominating those with smaller
magnitudes during model training. By performing these preprocessing steps,
the dataset becomes ready for training the fraud detection models effectively.

2. Random Forest Algorithm Module: The Random Forest algorithm module


is dedicated to implementing, fine-tuning, and evaluating the Random Forest
classifier for fraud detection. In the implementation phase, the Random
Forest classifier is instantiated with default hyperparameters or initial
configurations. Subsequently, the model is fine-tuned using techniques such
as grid search or random search to optimize its performance. Fine-tuning
involves adjusting hyperparameters such as the number of trees in the forest,
maximum depth of trees, and minimum samples per leaf node. Once the
model is trained, its performance is evaluated using various metrics such as
accuracy, precision, recall, and F1-score. These metrics provide insights into
the classifier's ability to accurately identify fraudulent transactions while
minimizing false positives.
1

3. Support Vector Machine (SVM) Algorithm Module: The Support Vector


Machine (SVM) algorithm module is designed to implement, fine-tune, and
evaluate the SVM classifier for fraud detection. In the implementation
phase, the SVM classifier is instantiated with appropriate kernel functions,
such as linear, polynomial, or radial basis function (RBF), based on the
dataset characteristics. Hyperparameters like the regularization parameter
(C) and kernel coefficients are then fine-tuned to optimize the classifier's
performance. Following training, the SVM classifier's effectiveness is
assessed using evaluation metrics such as accuracy, precision, recall, and F1-
score. These metrics gauge the classifier's ability to distinguish between
fraudulent and legitimate transactions accurately.
1

4. Decision Tree Algorithm Module: The Decision Tree algorithm module


focuses on implementing, fine-tuning, and evaluating the Decision Tree
classifier for fraud detection. Initially, the Decision Tree classifier is
constructed based on the dataset's features, with splits made at each node to
maximize information gain or minimize impurity. Hyperparameters such as
the maximum depth of the tree and minimum samples per leaf are adjusted
during the fine-tuning phase to enhance the classifier's performance.
Subsequently, the trained Decision Tree classifier is evaluated using metrics
like accuracy, precision, recall, and F1-score to assess its ability to detect
1

fraudulent transactions effectively.

5. Gradient Boosting Algorithm Module: The Gradient Boosting algorithm


module is dedicated to implementing, fine-tuning, and evaluating the
Gradient Boosting classifier for fraud detection. During implementation, the
Gradient Boosting classifier is constructed sequentially by combining
multiple weak learners, typically decision trees, to create a strong ensemble
model. Hyperparameters such as the learning rate, number of estimators, and
maximum depth of trees are fine-tuned to optimize the classifier's
performance. Following training, the Gradient Boosting classifier's
performance is evaluated using metrics such as accuracy, precision, recall,
and F1-score to measure its effectiveness in identifying fraudulent
transactions accurately.
1

6. Evaluation Module: This module encompasses the evaluation of all


implemented machine learning algorithms for fraud detection. Each
algorithm's performance is assessed using metrics such as accuracy,
precision, recall, and F1-score, providing a comprehensive understanding of
their effectiveness in identifying fraudulent transactions. These metrics
enable comparisons between different algorithms and aid in selecting the
most suitable approach for real-world implementation. Additionally, the
evaluation module ensures that the selected fraud detection model achieves
high accuracy while minimizing false positives, thereby mitigating financial
risks for businesses and consumers.

7. Flask Web Interface Module: The Flask Web Interface module focuses on
developing a user-friendly web interface for interacting with the fraud
detection system. The frontend development involves designing an intuitive
user interface using HTML, CSS, and JavaScript, allowing users to input
transaction details easily. On the backend, a Flask application is
implemented to handle user inputs, process requests, and provide instant
feedback on the likelihood of a transaction being fraudulent. Integration with
the trained fraud detection model enables real-time detection of fraudulent
activities. The Flask web interface is designed with user-friendliness in
mind, ensuring that both technical and non-technical users can leverage the
fraud detection system effortlessly.

Source code
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, f1_score, recall_score,
precision_score
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import GradientBoostingClassifier
import time
1

import warnings

warnings.filterwarnings("ignore")

# Load the dataset


df = pd.read_csv('creditCardFraud.csv')

df['Transaction Date'] = pd.to_datetime(df['Transaction Date'])


# Define the label mapping dictionary

product_label = {
'electronics' : 0,
'home & garden': 1,
'clothing': 2,
'toys & games': 3,
'health & beauty': 4
}

devices_labels = {
'tablet': 0,
'desktop': 1,
'mobile': 2
}

payment_labels = {
'bank transfer': 0,
'debit card': 1,
'PayPal': 2,
'credit card': 3
}

# Map the label column using the dictionary


df['Device Used'] = df['Device Used'].map(devices_labels)
df['Product Category'] = df['Product Category'].map(product_label)
df['Payment Method'] = df['Payment Method'].map(payment_labels)

# Drop irrelevant or high cardinality columns


df = df.drop(columns=['Transaction ID', 'Customer ID', 'Transaction Date',
'Customer Location', 'IP Address', 'Shipping Address',
'Billing Address', 'Transaction Hour'])

plt.figure(figsize=(5,5))
df['Is Fraudulent'].value_counts().plot(kind='pie',autopct="%.1f%%")
1

plt.show()

df.head()

x = df.drop('Is Fraudulent',axis=1)
y = df['Is Fraudulent']

X_train,X_test,y_train,y_test =
train_test_split(x,y,test_size=0.20,random_state=42)

Algorithms
# Random Forest
start_time = time.time()
rf_model = RandomForestClassifier(random_state=42)
rf_model.fit(X_train, y_train)
train_time_rf = time.time() - start_time

y_pred_train_rf = rf_model.predict(X_train)
y_pred_test_rf = rf_model.predict(X_test)

train_accuracy_rf = accuracy_score(y_train, y_pred_train_rf)


test_accuracy_rf = accuracy_score(y_test, y_pred_test_rf)
1

f1_rf = f1_score(y_test, y_pred_test_rf, average='weighted')


recall_rf = recall_score(y_test, y_pred_test_rf, average='weighted')
precision_rf = precision_score(y_test, y_pred_test_rf, average='weighted')

print("Random Forest:")
print(f"Training Accuracy: {train_accuracy_rf}")
print(f"Testing Accuracy: {test_accuracy_rf}")
print(f"F1 Score: {f1_rf}")
print(f"Recall: {recall_rf}")
print(f"Precision: {precision_rf}")
print(f"Training Time: {train_time_rf}\n")

Random Forest:
Training Accuracy: 0.9999471095361506
Testing Accuracy: 0.9532473027290036
F1 Score: 0.936013542377074
Recall: 0.9532473027290036
Precision: 0.9472335221004647
Training Time: 2.634901285171509

# Support Vector Machine


start_time = time.time()
svm_model = SVC()
svm_model.fit(X_train, y_train)
train_time_svm = time.time() - start_time

y_pred_train_svm = svm_model.predict(X_train)
y_pred_test_svm = svm_model.predict(X_test)

train_accuracy_svm = accuracy_score(y_train, y_pred_train_svm)


test_accuracy_svm = accuracy_score(y_test, y_pred_test_svm)

f1_svm = f1_score(y_test, y_pred_test_svm, average='weighted')


recall_svm = recall_score(y_test, y_pred_test_svm, average='weighted')
precision_svm = precision_score(y_test, y_pred_test_svm, average='weighted')

print("Support Vector Machine:")


print(f"Training Accuracy: {train_accuracy_svm}")
print(f"Testing Accuracy: {test_accuracy_svm}")
print(f"F1 Score: {f1_svm}")
print(f"Recall: {recall_svm}")
print(f"Precision: {precision_svm}")
print(f"Training Time: {train_time_svm}\n")

Support Vector Machine:


Training Accuracy: 0.9534563918125561
1

Testing Accuracy: 0.9524011000634652


F1 Score: 0.933936325946068
Recall: 0.9524011000634652
Precision: 0.9461072593552824
Training Time: 2.641307830810547

# Decision Tree
dtr_model = DecisionTreeClassifier()

# Train the model and measure the training time


start_time = time.time()
dtr_model.fit(X_train, y_train)
train_time_dtr = time.time() - start_time

print(f"Precision: {precision_dtr}")y_pred_train_dtr =
dtr_model.predict(X_train)
y_pred_test_dtr = dtr_model.predict(X_test)

train_accuracy_dtr = accuracy_score(y_train, y_pred_train_dtr)


test_accuracy_dtr = accuracy_score(y_test, y_pred_test_dtr)

f1_dtr = f1_score(y_test, y_pred_test_dtr, average='weighted')


recall_dtr = recall_score(y_test, y_pred_test_dtr, average='weighted')
precision_dtr = precision_score(y_test, y_pred_test_dtr, average='weighted')

print("Decision Tree:")
print(f"Training Accuracy: {train_accuracy_dtr}")
print(f"Testing Accuracy: {test_accuracy_dtr}")
print(f"F1 Score: {f1_dtr}")
print(f"Recall: {recall_dtr}")
print(f"Training Time: {train_time_dtr}\n")

Decision Tree:
Training Accuracy: 1.0
Testing Accuracy: 0.912418024116776
F1 Score: 0.9155646953821494
Recall: 0.912418024116776
Precision: 0.9188736602303765
Training Time: 0.1372387409210205
# Gradient Boosting
gradient_boosting_model = GradientBoostingClassifier()

# Train the model and measure the training time


start_time = time.time()
gradient_boosting_model.fit(X_train, y_train)
train_time_gradient_boosting = time.time() - start_time
1

# Make predictions
y_pred_train_gradient_boosting = gradient_boosting_model.predict(X_train)
y_pred_test_gradient_boosting = gradient_boosting_model.predict(X_test)

# Calculate accuracy, F1 score, recall, and precision


train_accuracy_gradient_boosting = accuracy_score(y_train,
y_pred_train_gradient_boosting)
test_accuracy_gradient_boosting = accuracy_score(y_test,
y_pred_test_gradient_boosting)
f1_gradient_boosting = f1_score(y_test, y_pred_test_gradient_boosting,
average='weighted')
recall_gradient_boosting = recall_score(y_test, y_pred_test_gradient_boosting,
average='weighted')
precision_gradient_boosting = precision_score(y_test,
y_pred_test_gradient_boosting, average='weighted')

# Print results
print("Gradient Boosting:")
print(f"Training Accuracy: {train_accuracy_gradient_boosting}")
print(f"Testing Accuracy: {test_accuracy_gradient_boosting}")
print(f"F1 Score: {f1_gradient_boosting}")
print(f"Recall: {recall_gradient_boosting}")
print(f"Precision: {precision_gradient_boosting}")
print(f"Training Time: {train_time_gradient_boosting}\n")

Gradient Boosting:
Training Accuracy: 0.9565769291796689
Testing Accuracy: 0.951977998730696
F1 Score: 0.9345763615073522
Recall: 0.951977998730696
Precision: 0.9414915744169469
Training Time: 2.767469644546509

from joblib import dump, load

# Save the Multinomial Naive Bayes model


dump(rf_model, 'models.joblib')

['models.joblib']

df.head()

# Load the saved Multinomial Naive Bayes model


from joblib import load
1

loaded_model = load('models.joblib')

product_label = {
'electronics' : 0,
'home & garden': 1,
'clothing': 2,
'toys & games': 3,
'health & beauty': 4
}

devices_labels = {
'tablet': 0,
'desktop': 1,
'mobile': 2
}

payment_labels = {
'bank transfer': 0,
'debit card': 1,
'PayPal': 2,
'credit card': 3
}
Transaction_Amount = 42
Payment_Method = 'PayPal'
Product_Category = 'electronics'
Quantity = 1
Customer_Age = 40
Device_Used = 'desktop'
Account_Age_Days = 282
Month = 3
Day = 24
Hour = 23

# Transaction_Amount = 222
# Payment_Method = 'bank transfer'
# Product_Category = 'home & garden'
# Quantity = 1
# Customer_Age = 51
# Device_Used = 'tablet'
# Account_Age_Days = 194
# Month = 3
# Day = 25
# Hour = 19
res_labels = ['Not a Fraud Transaction', 'Fraud Transaction']
1

Payment_Method = payment_labels[Payment_Method]
Product_Category = product_label[Product_Category]
Device_Used = devices_labels[Device_Used]

def fraud(Transaction_Amount, Payment_Method, Product_Category, Quantity,


Customer_Age, Device_Used, Account_Age_Days, Month, Day, Hour):

prediction = loaded_model.predict([[Transaction_Amount, Payment_Method,


Product_Category, Quantity, Customer_Age, Device_Used, Account_Age_Days, Month,
Day, Hour]])
return prediction[0]

# Get the recommended crop


prediction = fraud(Transaction_Amount, Payment_Method, Product_Category,
Quantity, Customer_Age, Device_Used, Account_Age_Days, Month, Day, Hour)
print("Prediction:", res_labels[prediction])

Prediction: Not a Fraud Transaction

InterFace Of a Website

*Home Page
1

*Not a Fraud Transaction

*Fraud Transaction
1

Objectives:
1. Develop a robust fraud detection model for e-commerce transactions using
machine learning algorithms, including Random Forest, Support Vector
Machine (SVM), Decision Tree, and Gradient Boosting.

2. Preprocess the dataset obtained from Kaggle by handling missing values,


encoding categorical variables, and scaling numerical features to ensure data
integrity and suitability for model training.

3. Implement and fine-tune machine learning algorithms to achieve high


accuracy in identifying fraudulent activities while minimizing false
positives.

4. Evaluate the performance of each algorithm using metrics such as accuracy,


precision, recall, and F1-score to determine the most effective approach for
fraud detection in e-commerce.

5. Create a user-friendly Flask-based web interface to facilitate easy interaction


with the fraud detection system, allowing users to input transaction details
and receive instant feedback on the likelihood of fraud.

Future Scope:
1. Incorporate more sophisticated feature engineering techniques, such as
feature selection and creation of new features, to further enhance the fraud
detection model's performance and robustness.

2. Explore additional machine learning algorithms beyond the ones studied,


such as neural networks or ensemble methods, to identify potential
improvements in fraud detection accuracy.

3. Integrate real-time transaction monitoring capabilities into the system to


detect and prevent fraudulent activities as they occur, enhancing the system's
responsiveness and effectiveness.
1

4. Enhance the user interface of the Flask-based web application with advanced
features such as data visualization, transaction history, and personalized
recommendations, improving user experience and engagement.

5. Collaborate with e-commerce platforms and financial institutions to deploy


the developed fraud detection system at scale, leveraging their domain
expertise and data resources to address evolving fraud patterns and
challenges effectively.

You might also like