Assignment1 921275
Assignment 1
Dataset Overview
The dataset for this project is the Credit Card Fraud Detection dataset from Kaggle. It
consists of 284,807 transactions, of which only 0.172% are fraudulent. Each transaction has 30
numerical features: 'Time', 'Amount', and 28 anonymized PCA components (V1-V28).
The target variable ('Class') is 1 for fraudulent transactions and 0 for legitimate transactions.
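To make the class imbalance concrete, the short calculation below converts the stated fraud rate into an approximate number of fraudulent transactions. It also computes the accuracy a trivial classifier would reach by labelling every transaction as legitimate, which is one reason precision and recall are more informative than accuracy for this data. The variable names are illustrative, and the count is only an approximation derived from the rounded 0.172% figure.

```python
# Figures taken from the dataset overview above
total_transactions = 284_807
fraud_rate = 0.172 / 100  # 0.172% expressed as a fraction

# Approximate number of fraudulent transactions implied by the rounded rate
approx_fraud_count = round(total_transactions * fraud_rate)

# Accuracy of a baseline that always predicts "legitimate" (Class 0)
majority_baseline_accuracy = 1 - fraud_rate

print(f"Approximate fraud cases: {approx_fraud_count}")
print(f"Always-legitimate baseline accuracy: {majority_baseline_accuracy:.3%}")
```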
Preprocessing Steps
Managing Missing Values: The dataset contained no missing values, so no imputation was needed.
Feature Scaling:
'Amount' and 'Time' were standardized with StandardScaler to improve model performance.
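Standardization rescales a feature to zero mean and unit variance. The sketch below reproduces that formula in plain Python, using the population standard deviation as StandardScaler does. The `standardize` helper and the sample amounts are hypothetical and serve only as an illustration; they are not the project's code or data.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Return z-scores: (value - mean) / population standard deviation."""
    mean = fmean(values)
    std = pstdev(values, mu=mean)  # population std (divides by n)
    if std == 0:
        # A constant feature carries no information; map it to zeros
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical transaction amounts, for illustration only
amounts = [2.69, 149.62, 378.66, 123.50, 69.99]
scaled_amounts = standardize(amounts)

print([round(v, 3) for v in scaled_amounts])
```

In a train/test workflow, the mean and standard deviation would normally be computed on the training data and then reused for the test data and for new inputs, so that scaling stays consistent at prediction time.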
Data Splitting:
The trained Logistic Regression model was assessed using standard classification
performance metrics:
Recall: The proportion of actual fraud cases that the model identifies correctly.
Precision: 0.92
Recall: 0.85
Purnima Gosain
F1 Score: 0.88
ROC-AUC: 0.98
These scores suggest that the model identifies most fraudulent transactions while producing
relatively few false positives.
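As a reference for how these metrics relate, the sketch below computes precision, recall, and F1 from confusion-matrix counts. The counts are hypothetical and chosen only for illustration; they are not the model's actual confusion matrix. The last line checks that the reported F1 score is consistent with the reported precision and recall, since F1 is their harmonic mean. ROC-AUC is not covered here because it depends on the full ranking of predicted probabilities rather than on a single set of counts.

```python
def precision(tp, fp):
    """Share of predicted fraud cases that are truly fraud."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of actual fraud cases that the model caught."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts: true positives, false positives, false negatives
tp, fp, fn = 85, 7, 15
p = precision(tp, fp)
r = recall(tp, fn)
f1 = f1_score(p, r)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")

# Consistency check on the reported scores (precision 0.92, recall 0.85)
reported_f1_check = f1_score(0.92, 0.85)
print(f"F1 implied by the reported precision and recall: {reported_f1_check:.2f}")
```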
An interactive web application built with Streamlit was created to accept transaction details and
predict whether a transaction is fraudulent.
Key Features
User Inputs:
Users enter values for 'Time', 'Amount', and the anonymized PCA features (V1-V28).
Fraud Prediction in Real Time: The model labels each transaction as Fraud (Class 1) or
Legitimate (Class 0).
Probability Display: Alongside the predicted label, the app shows the probability that the
transaction is fraudulent and the probability that it is legitimate.
Input Validation: The app validates user inputs before a prediction is made.
Easy-to-Use Interface: A simple UI with buttons and color-coded messages (green for legitimate,
red for fraudulent).
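The labelling and display logic described in these features can be sketched as a small helper that turns a predicted fraud probability into the label, both class probabilities, and the message color. The `describe_prediction` function, its dictionary keys, and the 0.5 decision threshold are assumptions made for illustration, not the app's actual code. In a Streamlit app, the resulting message could be rendered with `st.error` for fraud and `st.success` for legitimate transactions, which display red and green boxes respectively.

```python
def describe_prediction(fraud_probability, threshold=0.5):
    """Map a fraud probability to a label, class probabilities, and a color.

    The threshold of 0.5 is an assumed default, not a value from the report.
    """
    if not 0.0 <= fraud_probability <= 1.0:
        raise ValueError("fraud_probability must be between 0 and 1")

    is_fraud = fraud_probability >= threshold
    return {
        "class": 1 if is_fraud else 0,
        "label": "Fraud" if is_fraud else "Legitimate",
        "fraud_probability": fraud_probability,
        "legitimate_probability": 1.0 - fraud_probability,
        "color": "red" if is_fraud else "green",
    }


# Hypothetical model output for a single transaction
result = describe_prediction(0.9)
print(
    f"{result['label']} (Class {result['class']}): "
    f"fraud={result['fraud_probability']:.1%}, "
    f"legitimate={result['legitimate_probability']:.1%}"
)
```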
4. Conclusion
This project delivers a machine learning pipeline for fraud detection: a trained Logistic
Regression model and an interactive web app that can be used for real-time predictions.
Possible improvements include:
Experimenting with more complex models (Random Forest, XGBoost, Neural Networks, etc.).
Overall, the project offers a practical approach to the efficient and accurate detection of
fraudulent credit card transactions.