Fraud Detection Using ML
ML IN E-COMMERCE
A synopsis submitted
in partial fulfillment for the Degree of
Bachelor of Technology
In
Computer Science and Engineering
By
VIKRANT KUMAR (20033440030)
Pursuing in
Department of Computer Science and Engineering
January, 2024
INDEX
1. ABSTRACT
2. INTRODUCTION
3. METHODOLOGY
4. MODELING AND ANALYSIS
5. RESULTS AND DISCUSSION
7. CONCLUSION
8. REFERENCE
ABSTRACT
In recent years, the rapid growth of e-commerce has been accompanied by a
significant increase in fraudulent transactions, posing severe financial risks
to both businesses and consumers. This paper presents a comprehensive
study on the detection of fraudulent transactions in e-commerce using
various machine learning algorithms, including Random Forest, Support
Vector Machine (SVM), Decision Tree, and Gradient Boosting. The objective
is to develop a robust and efficient fraud detection model that can accurately
identify fraudulent activities with minimal false positives.
Among the models whose results are reported, Random Forest achieved the
highest test accuracy (about 95.3%, with Gradient Boosting close behind at
about 95.2%). Combined with a simple Flask web interface for entering
transaction details and receiving a prediction, this makes the approach
practical for real-world implementation. This study highlights the potential
of machine learning algorithms in combating e-commerce fraud and provides a
scalable solution that can be adapted to various e-commerce platforms.
INTRODUCTION
E-commerce fraud detection is an essential aspect of ensuring the security and
integrity of online transactions in the rapidly expanding digital marketplace. As e-
commerce continues to grow, so does the sophistication and frequency of
fraudulent activities. E-commerce fraud can take various forms, including identity
theft, account takeover, fraudulent transactions, and payment fraud, among others.
These activities not only result in significant financial losses for businesses and
consumers but also undermine trust in online commerce.
At the same time, overly stringent fraud detection measures can lead to false
positives, where legitimate transactions are mistakenly flagged as fraudulent.
Fraud detection systems therefore need continuous improvement and fine-tuning
to balance catching fraud against disrupting genuine customers.
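The trade-off between catching fraud and flagging legitimate transactions is commonly controlled through the decision threshold applied to a classifier's fraud probability. The sketch below is illustrative only: it uses synthetic, imbalanced data from scikit-learn's make_classification instead of the project's dataset, and the helper name flagged_stats is hypothetical. Raising the threshold flags fewer transactions, which can only reduce (or keep equal) both the false positives and the number of frauds caught.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for transaction data (about 5% "fraud")
X, y = make_classification(n_samples=4000, n_features=10,
                           weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
fraud_prob = model.predict_proba(X_test)[:, 1]  # probability of class 1 (fraud)


def flagged_stats(threshold):
    """Hypothetical helper: (false positives, frauds caught) at a given threshold."""
    y_pred = (fraud_prob >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
    return fp, tp


for t in (0.2, 0.5, 0.8):
    fp, tp = flagged_stats(t)
    print(f"threshold={t}: false positives={fp}, frauds caught={tp}")
```

In practice the threshold would be chosen on validation data according to the cost a business assigns to a blocked customer versus a missed fraud.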
In summary, e-commerce fraud detection is a critical component of modern online
commerce, aimed at protecting businesses and consumers from the financial and
reputational damage caused by fraud. Leveraging advanced technologies like
machine learning, alongside other innovative solutions, enables e-commerce
platforms to effectively combat fraud while maintaining a seamless and secure
shopping experience for users. As the e-commerce landscape continues to evolve,
so too must the strategies and tools used to detect and prevent fraudulent activities.
Modules Explanation
1. Data Cleaning and Processing Module: This module focuses on preparing
the dataset obtained from Kaggle for training the fraud detection models.
Initially, the dataset undergoes thorough cleaning to handle missing values
effectively. Techniques such as imputation or removal of missing values are
applied to ensure data integrity. Next, categorical variables within the dataset
are encoded into numerical representations suitable for machine learning
algorithms. This encoding step enables the algorithms to interpret and learn
from categorical data accurately. Additionally, numerical features are scaled
to ensure uniformity in scale, which aids in the convergence of certain
algorithms like Support Vector Machine (SVM). Scaling helps prevent
features with larger magnitudes from dominating those with smaller
magnitudes during model training. By performing these preprocessing steps,
the dataset becomes ready for training the fraud detection models effectively.
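The three preprocessing steps described above (imputation, categorical encoding and scaling) can be sketched with a scikit-learn ColumnTransformer. This is a minimal sketch under assumptions: the toy rows and column names are hypothetical rather than the Kaggle schema, median and most-frequent imputation are one possible choice, and OrdinalEncoder assigns alphabetical integer codes, so its codes will not match the hand-written mapping dictionaries used in the source code.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder, StandardScaler

# Hypothetical toy rows with some missing values (column names are assumptions)
raw = pd.DataFrame({
    "Transaction Amount": [42.0, 222.0, np.nan, 80.0],
    "Customer Age": [40, 51, 33, np.nan],
    "Payment Method": ["PayPal", "bank transfer", np.nan, "credit card"],
    "Device Used": ["desktop", "tablet", "mobile", "mobile"],
})
numeric_cols = ["Transaction Amount", "Customer Age"]
categorical_cols = ["Payment Method", "Device Used"]

preprocess = ColumnTransformer([
    # Numeric: fill gaps with the median, then standardise to zero mean, unit variance
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric_cols),
    # Categorical: fill gaps with the most frequent value, then map categories to integers
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("encode", OrdinalEncoder())]), categorical_cols),
])

features = preprocess.fit_transform(raw)  # numeric columns first, then categorical
print(features.shape)
```

Fitting the transformer on the training split only, and reusing it on the test split and in the web interface, keeps the same imputation values and scaling parameters everywhere.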
7. Flask Web Interface Module: The Flask Web Interface module focuses on
developing a user-friendly web interface for interacting with the fraud
detection system. The frontend development involves designing an intuitive
user interface using HTML, CSS, and JavaScript, allowing users to input
transaction details easily. On the backend, a Flask application is
implemented to handle user inputs, process requests, and provide instant
feedback on the likelihood of a transaction being fraudulent. Integration with
the trained fraud detection model enables real-time detection of fraudulent
activities. The Flask web interface is designed with user-friendliness in
mind, ensuring that both technical and non-technical users can leverage the
fraud detection system effortlessly.
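A minimal sketch of the backend described above, assuming a JSON request rather than the project's HTML form. The route name /predict, the factory create_app and the helper encode are hypothetical, and the feature order is an assumption that must match the order of the training columns. The mappings and result labels are copied from the source code that follows.

```python
from flask import Flask, jsonify, request

# Encodings and result labels copied from the synopsis's source code
PRODUCT_LABELS = {"electronics": 0, "home & garden": 1, "clothing": 2,
                  "toys & games": 3, "health & beauty": 4}
DEVICE_LABELS = {"tablet": 0, "desktop": 1, "mobile": 2}
PAYMENT_LABELS = {"bank transfer": 0, "debit card": 1, "PayPal": 2, "credit card": 3}
RES_LABELS = ["Not a Fraud Transaction", "Fraud Transaction"]


def encode(data):
    """Hypothetical helper: turn one request payload into a feature row.

    The column order is an assumption and must match the order used in training.
    """
    return [
        float(data["Transaction_Amount"]),
        PAYMENT_LABELS[data["Payment_Method"]],
        PRODUCT_LABELS[data["Product_Category"]],
        int(data["Quantity"]),
        int(data["Customer_Age"]),
        DEVICE_LABELS[data["Device_Used"]],
        int(data["Account_Age_Days"]),
        int(data["Month"]),
        int(data["Day"]),
        int(data["Hour"]),
    ]


def create_app(model):
    """Hypothetical app factory; `model` is any object with a scikit-learn style predict()."""
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        data = request.get_json(force=True)
        try:
            row = encode(data)
        except (KeyError, TypeError, ValueError) as exc:
            # Missing field, unknown category or non-numeric value
            return jsonify(error=f"invalid input: {exc}"), 400
        label = int(model.predict([row])[0])
        return jsonify(result=RES_LABELS[label])

    return app
```

In the project, `model` would be the object returned by `load('models.joblib')`, and `create_app(model).run()` would start a development server. A form-based frontend would read the fields from `request.form` instead of `request.get_json()`.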
Source code
import time
import warnings

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from joblib import dump, load
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, f1_score, recall_score, precision_score

warnings.filterwarnings("ignore")
product_label = {
'electronics' : 0,
'home & garden': 1,
'clothing': 2,
'toys & games': 3,
'health & beauty': 4
}
devices_labels = {
'tablet': 0,
'desktop': 1,
'mobile': 2
}
payment_labels = {
'bank transfer': 0,
'debit card': 1,
'PayPal': 2,
'credit card': 3
}
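The mapping dictionaries above are defined but the listing does not show them being applied to the dataset. One way to do this, sketched under assumptions, is pandas `Series.map`; the toy rows and the column names ("Product Category", "Device Used", "Payment Method") are hypothetical. A useful side effect is that any category missing from a dictionary becomes NaN, so unexpected values surface and can be handled by the data cleaning step.

```python
import pandas as pd

# Mappings copied from the source code above
product_label = {'electronics': 0, 'home & garden': 1, 'clothing': 2,
                 'toys & games': 3, 'health & beauty': 4}
devices_labels = {'tablet': 0, 'desktop': 1, 'mobile': 2}
payment_labels = {'bank transfer': 0, 'debit card': 1, 'PayPal': 2, 'credit card': 3}

# Hypothetical rows; "crypto" is deliberately absent from payment_labels
df = pd.DataFrame({
    "Product Category": ["electronics", "clothing", "toys & games"],
    "Device Used": ["desktop", "mobile", "tablet"],
    "Payment Method": ["PayPal", "credit card", "crypto"],
})

column_maps = {"Product Category": product_label,
               "Device Used": devices_labels,
               "Payment Method": payment_labels}

encoded = df.copy()
for column, mapping in column_maps.items():
    # Series.map returns NaN for any value that is not a key of the dictionary
    encoded[column] = df[column].map(mapping)

unknown = encoded[list(column_maps)].isna()
print(unknown.sum())  # count of unmapped values per column
```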
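# NOTE: `df` is assumed to hold the cleaned Kaggle dataset with its categorical
# columns already encoded using the mappings above (loading code not shown).
# Share of fraudulent vs. legitimate transactions
plt.figure(figsize=(5, 5))
df['Is Fraudulent'].value_counts().plot(kind='pie', autopct="%.1f%%")
plt.show()
df.head()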
# Features and target; hold out 20% of the rows for testing
x = df.drop('Is Fraudulent', axis=1)
y = df['Is Fraudulent']
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.20, random_state=42)
Algorithms
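# Random Forest
start_time = time.time()
rf_model = RandomForestClassifier(random_state=42)
rf_model.fit(X_train, y_train)
train_time_rf = time.time() - start_time
y_pred_train_rf = rf_model.predict(X_train)
y_pred_test_rf = rf_model.predict(X_test)
# Metric computation (not shown in the original listing); weighted averaging is
# assumed, which is consistent with the reported recall equalling test accuracy
train_accuracy_rf = accuracy_score(y_train, y_pred_train_rf)
test_accuracy_rf = accuracy_score(y_test, y_pred_test_rf)
f1_rf = f1_score(y_test, y_pred_test_rf, average='weighted')
recall_rf = recall_score(y_test, y_pred_test_rf, average='weighted')
precision_rf = precision_score(y_test, y_pred_test_rf, average='weighted')
print("Random Forest:")
print(f"Training Accuracy: {train_accuracy_rf}")
print(f"Testing Accuracy: {test_accuracy_rf}")
print(f"F1 Score: {f1_rf}")
print(f"Recall: {recall_rf}")
print(f"Precision: {precision_rf}")
print(f"Training Time: {train_time_rf}\n")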
Random Forest:
Training Accuracy: 0.9999471095361506
Testing Accuracy: 0.9532473027290036
F1 Score: 0.936013542377074
Recall: 0.9532473027290036
Precision: 0.9472335221004647
Training Time: 2.634901285171509
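In all three reported result sets, Recall is identical to Testing Accuracy. That is exactly what happens when recall is averaged over both classes weighted by their support (scikit-learn's `average="weighted"`): each class contributes (n_c / N) x (TP_c / n_c), and the sum of TP_c / N over classes is simply the accuracy. Whether the original code used weighted averaging is not shown, so this is an inference. The toy labels below are hypothetical; they show that the recall on the fraud class alone, which matters most here, can be much lower than the weighted figure.

```python
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical labels: 1 = fraud, 0 = legitimate (not project data)
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

accuracy = accuracy_score(y_true, y_pred)
# Support-weighted mean of per-class recall; always equals accuracy
weighted_recall = recall_score(y_true, y_pred, average="weighted")
# Binary recall for the fraud class only (share of frauds that were caught)
fraud_recall = recall_score(y_true, y_pred, pos_label=1)
print(accuracy, weighted_recall, fraud_recall)
```

Here 8 of 10 predictions are correct, so accuracy and weighted recall are both 0.8, while only 1 of the 2 frauds is caught (fraud recall 0.5). Reporting per-class recall, for example with `sklearn.metrics.classification_report`, would make the fraud-detection performance explicit.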
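# Support Vector Machine
# (model construction and fitting are not shown in the original listing;
#  default SVC settings are assumed here, and no SVM results are reported)
start_time = time.time()
svm_model = SVC()
svm_model.fit(X_train, y_train)
train_time_svm = time.time() - start_time
y_pred_train_svm = svm_model.predict(X_train)
y_pred_test_svm = svm_model.predict(X_test)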
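# Decision Tree
start_time = time.time()
dtr_model = DecisionTreeClassifier()
dtr_model.fit(X_train, y_train)
train_time_dtr = time.time() - start_time
y_pred_train_dtr = dtr_model.predict(X_train)
y_pred_test_dtr = dtr_model.predict(X_test)
# Metric computation (not shown in the original listing; weighted averaging assumed)
train_accuracy_dtr = accuracy_score(y_train, y_pred_train_dtr)
test_accuracy_dtr = accuracy_score(y_test, y_pred_test_dtr)
f1_dtr = f1_score(y_test, y_pred_test_dtr, average='weighted')
recall_dtr = recall_score(y_test, y_pred_test_dtr, average='weighted')
precision_dtr = precision_score(y_test, y_pred_test_dtr, average='weighted')
print("Decision Tree:")
print(f"Training Accuracy: {train_accuracy_dtr}")
print(f"Testing Accuracy: {test_accuracy_dtr}")
print(f"F1 Score: {f1_dtr}")
print(f"Recall: {recall_dtr}")
print(f"Precision: {precision_dtr}")
print(f"Training Time: {train_time_dtr}\n")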
Decision Tree:
Training Accuracy: 1.0
Testing Accuracy: 0.912418024116776
F1 Score: 0.9155646953821494
Recall: 0.912418024116776
Precision: 0.9188736602303765
Training Time: 0.1372387409210205
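# Gradient Boosting
start_time = time.time()
gradient_boosting_model = GradientBoostingClassifier()
gradient_boosting_model.fit(X_train, y_train)
train_time_gradient_boosting = time.time() - start_time
# Make predictions
y_pred_train_gradient_boosting = gradient_boosting_model.predict(X_train)
y_pred_test_gradient_boosting = gradient_boosting_model.predict(X_test)
# Compute metrics (not shown in the original listing; weighted averaging assumed)
train_accuracy_gradient_boosting = accuracy_score(y_train, y_pred_train_gradient_boosting)
test_accuracy_gradient_boosting = accuracy_score(y_test, y_pred_test_gradient_boosting)
f1_gradient_boosting = f1_score(y_test, y_pred_test_gradient_boosting, average='weighted')
recall_gradient_boosting = recall_score(y_test, y_pred_test_gradient_boosting, average='weighted')
precision_gradient_boosting = precision_score(y_test, y_pred_test_gradient_boosting, average='weighted')
# Print results
print("Gradient Boosting:")
print(f"Training Accuracy: {train_accuracy_gradient_boosting}")
print(f"Testing Accuracy: {test_accuracy_gradient_boosting}")
print(f"F1 Score: {f1_gradient_boosting}")
print(f"Recall: {recall_gradient_boosting}")
print(f"Precision: {precision_gradient_boosting}")
print(f"Training Time: {train_time_gradient_boosting}\n")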
Gradient Boosting:
Training Accuracy: 0.9565769291796689
Testing Accuracy: 0.951977998730696
F1 Score: 0.9345763615073522
Recall: 0.951977998730696
Precision: 0.9414915744169469
Training Time: 2.767469644546509
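The four algorithm blocks repeat the same steps: time the fit, predict on both splits, and compute the same metrics. A sketch of that loop as one reusable function is shown below. It is an illustration under assumptions: the function name evaluate is hypothetical, it runs on synthetic data from make_classification instead of the Kaggle dataset, and weighted averaging of F1, recall and precision is assumed.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def evaluate(name, model, X_train, X_test, y_train, y_test):
    """Hypothetical helper: fit one model and return the metrics the listing prints."""
    start = time.time()
    model.fit(X_train, y_train)
    train_time = time.time() - start
    y_pred = model.predict(X_test)
    return {
        "model": name,
        "train_accuracy": accuracy_score(y_train, model.predict(X_train)),
        "test_accuracy": accuracy_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred, average="weighted"),
        "recall": recall_score(y_test, y_pred, average="weighted"),
        "precision": precision_score(y_test, y_pred, average="weighted", zero_division=0),
        "train_time_s": train_time,
    }


# Synthetic stand-in data (the project uses the Kaggle dataset instead)
X, y = make_classification(n_samples=1000, n_features=10, weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

models = {
    "Random Forest": RandomForestClassifier(random_state=42),
    "SVM": SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
}
results = [evaluate(name, m, X_train, X_test, y_train, y_test) for name, m in models.items()]

# Rank the models by test accuracy
for row in sorted(results, key=lambda r: r["test_accuracy"], reverse=True):
    print(f"{row['model']:<18} test acc={row['test_accuracy']:.4f} "
          f"f1={row['f1']:.4f} time={row['train_time_s']:.2f}s")
```

Passing every model through the same function guarantees that all of them are scored with identical metric settings, which makes the comparison between Random Forest, SVM, Decision Tree and Gradient Boosting easier to trust.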
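# Save the trained model (the saving call is not shown in the original; saving
# rf_model is an assumption based on the abstract's choice of Random Forest)
dump(rf_model, 'models.joblib')
['models.joblib']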
df.head()
loaded_model = load('models.joblib')
Transaction_Amount = 42
Payment_Method = 'PayPal'
Product_Category = 'electronics'
Quantity = 1
Customer_Age = 40
Device_Used = 'desktop'
Account_Age_Days = 282
Month = 3
Day = 24
Hour = 23
# Transaction_Amount = 222
# Payment_Method = 'bank transfer'
# Product_Category = 'home & garden'
# Quantity = 1
# Customer_Age = 51
# Device_Used = 'tablet'
# Account_Age_Days = 194
# Month = 3
# Day = 25
# Hour = 19
res_labels = ['Not a Fraud Transaction', 'Fraud Transaction']
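# Encode the categorical inputs with the same mappings used for training
Payment_Method = payment_labels[Payment_Method]
Product_Category = product_label[Product_Category]
Device_Used = devices_labels[Device_Used]
# Predict (not shown in the original listing); assumes the training columns of
# `x` appear in the same order as the variables listed here
sample = pd.DataFrame([[Transaction_Amount, Payment_Method, Product_Category,
                        Quantity, Customer_Age, Device_Used, Account_Age_Days,
                        Month, Day, Hour]], columns=x.columns)
prediction = loaded_model.predict(sample)[0]
print(res_labels[prediction])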
Interface of the Website
*Home Page [screenshot]
*Fraud Transaction [screenshot of the result page]
Objectives:
1. Develop a robust fraud detection model for e-commerce transactions using
machine learning algorithms, including Random Forest, Support Vector
Machine (SVM), Decision Tree, and Gradient Boosting.
Future Scope:
1. Incorporate more sophisticated feature engineering techniques, such as
feature selection and creation of new features, to further enhance the fraud
detection model's performance and robustness.
4. Enhance the user interface of the Flask-based web application with advanced
features such as data visualization, transaction history, and personalized
recommendations, improving user experience and engagement.
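The first future-scope item (creating new features and selecting among them) can be sketched as follows. Everything here is a hypothetical illustration: the derived features (amount per item, a night-time flag, a new-account flag), the 30-day cutoff and the hour range are assumptions, the data and the label are synthetic, and scikit-learn's SelectFromModel with Random Forest importances is only one of several possible selection methods.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel


def add_features(df):
    """Hypothetical derived features built from columns the project already uses."""
    out = df.copy()
    out["Amount_Per_Item"] = out["Transaction_Amount"] / out["Quantity"].clip(lower=1)
    out["Is_Night"] = out["Hour"].between(0, 5).astype(int)          # 00:00 to 05:59
    out["Is_New_Account"] = (out["Account_Age_Days"] < 30).astype(int)
    return out


# Synthetic transactions (not project data)
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "Transaction_Amount": rng.uniform(5, 500, n),
    "Quantity": rng.integers(1, 5, n),
    "Hour": rng.integers(0, 24, n),
    "Account_Age_Days": rng.integers(1, 365, n),
})
# Synthetic label for illustration only: night-time purchases on new accounts
y = ((df["Hour"] < 6) & (df["Account_Age_Days"] < 30)).astype(int)

X = add_features(df)
# Keep the features whose Random Forest importance is at least the median importance
selector = SelectFromModel(RandomForestClassifier(random_state=0), threshold="median")
selector.fit(X, y)
print(list(X.columns[selector.get_support()]))
```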