Assignment1 921275

The document outlines a project on credit card fraud detection using a dataset from Kaggle, consisting of 284,807 transactions with a focus on preprocessing steps, model performance, and a front-end application. A Logistic Regression model was trained, achieving a precision of 0.92, recall of 0.85, and an ROC-AUC score of 0.98, indicating effective fraud detection. Additionally, an interactive web application was developed to predict fraud in real-time, with features for user input and probability display.

Purnima Gosain

921275
Assignment 1

1. Data Preparation and Preprocessing Steps

Dataset Overview

The dataset for this project is the Credit Card Fraud Detection dataset from Kaggle. It
consists of 284,807 transactions, of which only 0.172% are fraudulent. Each transaction has 30
numerical features: 'Time', 'Amount', and 28 anonymized PCA components (V1-V28).
The target variable ('Class') is 1 for fraudulent transactions and 0 for legitimate transactions.
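
To make the imbalance concrete, the short calculation below turns the quoted fraud rate into an approximate count and shows the accuracy a model would reach by labelling every transaction as legitimate. It uses only the two figures stated above; the count is an estimate because the quoted percentage is itself rounded.

```python
# Figures quoted in the dataset overview.
total_transactions = 284_807
fraud_rate = 0.172 / 100  # 0.172% expressed as a fraction

# Approximate number of fraudulent rows (an estimate, since the
# quoted percentage is rounded).
approx_fraud = round(total_transactions * fraud_rate)
approx_legit = total_transactions - approx_fraud

# Accuracy of a trivial baseline that predicts "legitimate" for every row.
baseline_accuracy = approx_legit / total_transactions

print(f"approximate fraud count: {approx_fraud}")
print(f"always-legitimate accuracy: {baseline_accuracy:.4%}")
```

Since such a baseline already exceeds 99.8% accuracy without detecting a single fraud, plain accuracy says little on this dataset.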

Preprocessing Steps

Managing Missing Values: There were no missing values, so no imputation was needed.

Feature Scaling:

To improve model performance, 'Amount' and 'Time' were standardized using StandardScaler.
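
Standardization rescales a column to zero mean and unit variance. The sketch below reproduces that computation in plain Python so the arithmetic is visible. The function name `standardize` and the sample amounts are made up for illustration and are not taken from the project.

```python
from statistics import fmean, pstdev

def standardize(values):
    """Return z-scores: (x - mean) / population standard deviation."""
    mean = fmean(values)
    std = pstdev(values)  # population std (ddof=0), as StandardScaler uses
    if std == 0:
        # A constant column has no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(x - mean) / std for x in values]

# Made-up transaction amounts, for illustration only.
amounts = [2.50, 10.00, 49.99, 120.00, 999.00]
scaled = standardize(amounts)
print([round(z, 3) for z in scaled])
```

With scikit-learn, the same result would come from `StandardScaler().fit_transform(...)` applied to the two columns.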

Class Imbalance Handling:

Synthetic Minority Over-sampling Technique (SMOTE) was used to handle the class imbalance,
since fraudulent transactions make up only a very small share of the dataset.
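
The core idea of SMOTE is to create new minority samples by interpolating between an existing minority sample and one of its nearest minority neighbours. The following is a simplified illustration of that idea in plain Python, not the imbalanced-learn implementation. The function name `smote_sketch`, the parameters `k` and `seed`, and the toy points are all assumptions made for the example.

```python
import math
import random

def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic

# Toy 2-D minority class (made-up points, not taken from the dataset).
fraud_points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(fraud_points, n_new=6)
print(len(new_points), new_points[0])
```

Every synthetic point lies on a line segment between two real minority points, so the new samples stay inside the region the minority class already occupies.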

Data Splitting:

The dataset was split into 80% training and 20% testing.
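
A plain random 80/20 split can be sketched as below. The report does not say whether the split was stratified or which random seed was used, so the seed and the function name `train_test_split_indices` are assumptions.

```python
import math
import random

def train_test_split_indices(n_rows, test_fraction=0.2, seed=42):
    """Shuffle row indices and split them into train and test lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seed is an arbitrary choice
    n_test = math.ceil(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = train_test_split_indices(284_807)
print(len(train_idx), len(test_idx))
```

With so few fraud cases, a stratified split (one that keeps the fraud rate equal in both parts) is a common alternative, though the report does not state which variant was used.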

2. Model Performance Metrics

Logistic Regression Model

The trained Logistic Regression model was assessed using standard classification
performance metrics:

Precision: The fraction of predicted frauds that were actually fraudulent.

Recall: The fraction of actual fraud cases that were identified correctly.

F1 Score: The harmonic mean of precision and recall.

ROC-AUC Score: Measures how well the model separates fraudulent from legitimate
transactions across all classification thresholds.

Precision: 0.92

Recall: 0.85

F1 Score: 0.88

ROC-AUC: 0.98

These scores suggest that the model identifies most fraudulent transactions while producing
relatively few false positives.
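
The metric definitions above can be written out directly. The sketch below computes precision, recall and F1 from confusion-matrix counts, computes ROC-AUC as the fraction of (fraud, legitimate) pairs that the scores rank correctly, and checks that the reported F1 follows from the reported precision and recall. The counts, labels and scores are toy values for illustration, not results from the project.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

def roc_auc(labels, scores):
    """AUC as the share of (fraud, legitimate) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Consistency check using the precision and recall reported above.
reported_precision, reported_recall = 0.92, 0.85
f1_from_reported = (2 * reported_precision * reported_recall
                    / (reported_precision + reported_recall))
print(round(f1_from_reported, 2))  # 0.88

# Toy counts and scores (made up, not the project's confusion matrix).
print(precision_recall_f1(tp=8, fp=2, fn=4))
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```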

3. Key Features of the Front-End Application

An interactive web application built with Streamlit was created to accept transaction details and
predict whether a transaction is fraudulent or not.

Key Features

User Inputs:

Users enter values for 'Time', 'Amount', and the anonymized PCA features (V1-V28).

Fraud Prediction in Real Time: The model labels the transaction as Fraud (Class 1) or
Legitimate (Class 0).

Probability Display: Alongside the predicted label, the app shows the probability that the
transaction is fraudulent and the probability that it is legitimate.

Input Validation: The app validates the values entered by the user.

Easy-to-Use Interface: A basic UI with buttons and color-coded messages (green for legitimate,
red for fraudulent).
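
The features listed above could be backed by logic along the following lines. This is a sketch under stated assumptions, not the app's actual code. The function names, the validation rules (exactly 30 finite numbers) and the 0.5 decision threshold are all hypothetical, since the report does not describe them.

```python
import math

def validate_inputs(raw_values, expected_count=30):
    """Convert inputs to floats and reject missing, non-numeric or non-finite values.

    Hypothetical rule set: the report does not list the app's actual checks.
    """
    if len(raw_values) != expected_count:
        raise ValueError(f"expected {expected_count} values, got {len(raw_values)}")
    values = []
    for raw in raw_values:
        try:
            value = float(raw)
        except (TypeError, ValueError):
            raise ValueError(f"not a number: {raw!r}") from None
        if not math.isfinite(value):
            raise ValueError(f"not a finite number: {raw!r}")
        values.append(value)
    return values

def describe_prediction(prob_fraud, threshold=0.5):
    """Map a fraud probability to the label, both probabilities and the message color."""
    is_fraud = prob_fraud >= threshold  # the 0.5 cut-off is an assumption
    return {
        "label": "Fraud (Class 1)" if is_fraud else "Legitimate (Class 0)",
        "prob_fraud": prob_fraud,
        "prob_legit": 1.0 - prob_fraud,
        "color": "red" if is_fraud else "green",
    }

# 'Time', 'Amount' and V1-V28 as a user might type them (placeholder zeros).
features = validate_inputs(["0"] * 30)
print(describe_prediction(0.93))
```

In a Streamlit app, the resulting dictionary could be rendered with `st.error` for the red case and `st.success` for the green case.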

4. Conclusion

The project delivers a machine learning pipeline for fraud detection, consisting of a trained
Logistic Regression model and an interactive web app that can be used for real-time
predictions. Possible improvements include:

Experimenting with more complex models (Random Forest, XGBoost, Neural Networks, etc.).

Adding real-time transaction monitoring in a production system.
