B17 Discrete Report
CONTENTS
Abstract
1. INTRODUCTION
1.1. FRAUD DETECTION
1.2. MODEL EXPLAINABILITY
1.3. ML FOR FRAUD DETECTION
2. DATASET
3. METHODOLOGY
3.1. Data Loading and Exploration
3.2. Data Preparation
3.3. Model Creation and Training
3.4. Model Evaluation
3.5. Predictions on New Data
4. RESULTS
4.1. Results
4.2. Accuracy
4.3. Classification Report
4.4. Confusion Matrix
5. ANALYSIS
5.1. High Accuracy
5.2. Class Imbalance Consideration
5.3. Precision and Recall Trade-off
5.4. Generalization to New Data
6. EXPLANATION OF CODE
6.1. DATA LOADING AND EXPLORATION
6.2. DATA SPLITTING AND MODEL TRAINING
6.3. MODEL EVALUATION
6.4. NEW DATA PREDICTION
7. ANALYSIS TABLE
8. CONCLUSION
9. REFERENCES
ABSTRACT
1. INTRODUCTION
2. DATASET
The IBM credit card transaction dataset is a publicly available dataset that
contains information about credit card transactions. It is often used for
research and testing of fraud detection models. The dataset includes a variety
of features such as the amount of the transaction, the type of card used, and
the location of the transaction. It also includes a label indicating whether the
transaction was fraudulent. The dataset is designed to be representative of
real-world transactions and is therefore imbalanced, with far more
non-fraudulent transactions than fraudulent ones. The dataset is provided by
IBM and the data is simulated, but the exact simulation process is not
specified. Because the data is simulated, it is not linked to any real
customer or financial institution.
The data set contains:
We build our model on Kaggle, so we use the copy of the dataset that is hosted
on Kaggle.
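The class imbalance described in this section can be checked as soon as the data is loaded. The following is a minimal sketch on a toy table, not the actual dataset; the column names 'Amount' and 'Class' and all values are assumptions made for illustration.

```python
import pandas as pd

# Toy stand-in for the transaction table (hypothetical values).
# 'Class' is assumed to be the label: 1 = fraudulent, 0 = not fraudulent.
df = pd.DataFrame({
    "Amount": [12.5, 80.0, 5.2, 300.0, 42.1, 9.9, 15.0, 60.3, 7.7, 1200.0],
    "Class":  [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
})

# Count the transactions per label, then express each count as a share.
counts = df["Class"].value_counts()
shares = df["Class"].value_counts(normalize=True)

print(counts.to_dict())
print(shares.round(2).to_dict())
```

On the real data the share of fraudulent rows would be read off in the same way.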
3. METHODOLOGY
Split the dataset into features (X) and the target variable (y).
Divide the data into training and testing sets
3.3. Model Creation and Training:
4. RESULTS
4.1. Results
The Random Forest model for fraud detection was evaluated on the test set.
The following key metrics were used to assess the model's performance:
4.2. Accuracy
The model achieved an accuracy score of [insert accuracy score], reflecting the
proportion of correctly classified instances in the test set.
4.3. Classification Report
5. ANALYSIS
Given the imbalanced nature of the dataset (with appreciably more
non-fraudulent transactions), extra attention is needed on metrics like
precision, recall, and the F1-score to evaluate the model's effectiveness in
identifying fraudulent cases.
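A small worked example shows why accuracy alone is not enough on imbalanced data. This is a sketch with made-up counts (990 legitimate and 10 fraudulent transactions), not the report's results: a classifier that never flags fraud still scores very high accuracy while catching no fraud at all.

```python
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical test set: 990 legitimate (0) and 10 fraudulent (1) transactions.
y_true = [0] * 990 + [1] * 10

# A trivial "model" that labels every transaction as legitimate.
y_pred = [0] * 1000

accuracy = accuracy_score(y_true, y_pred)  # 990 correct out of 1000
recall = recall_score(y_true, y_pred)      # 0 of the 10 fraud cases found

print(f"accuracy = {accuracy:.2f}, recall (fraud) = {recall:.2f}")
```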
While the model demonstrates excellent precision (few false positives), the
recall for the fraudulent class is noticeably lower. Striking a balance
between precision and recall is important in fraud detection, as missing real
fraud instances (false negatives) is a significant problem.
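One common way to explore this trade-off is to vary the probability threshold at which a transaction is flagged. The report does not say this was done, so the sketch below is an illustration only. The probabilities are hypothetical values; with a fitted scikit-learn classifier they would typically come from `model.predict_proba(X_test)[:, 1]`.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Hypothetical true labels and predicted fraud probabilities.
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
fraud_prob = np.array([0.05, 0.10, 0.20, 0.30, 0.45, 0.60, 0.35, 0.55, 0.80, 0.90])

results = {}
for threshold in (0.3, 0.5, 0.7):
    # Flag a transaction as fraud when its probability reaches the threshold.
    y_pred = (fraud_prob >= threshold).astype(int)
    results[threshold] = (
        precision_score(y_true, y_pred),
        recall_score(y_true, y_pred),
    )
    precision, recall = results[threshold]
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

In this toy data, lowering the threshold raises recall at the cost of precision, and raising it does the opposite.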
The model's robustness can be further assessed by evaluating its performance
on new data ('m1.csv').
6. EXPLANATION OF CODE
This part of the code shows how the data is collected and passed on to the
next processing step. Here the data is loaded from the file that was
submitted.
X contains the features of the dataset, excluding the 'Class' column. Each row in
X represents a set of features for a specific data point.
y contains the target variable, which is the 'Class' column in this case. This
column typically contains labels indicating whether a transaction is fraudulent
(1) or not fraudulent (0).
‘train_test_split’ is a function from scikit-learn that splits the dataset into
training and testing sets.
‘X_train’ and ‘y_train’ represent the features and labels of the training set,
respectively.
‘X_test’ and ‘y_test’ represent the features and labels of the testing set,
respectively. ‘test_size=0.2’ indicates that 20% of the data will be used for
testing, and the remaining 80% will be used for training.
‘random_state=42’ ensures reproducibility by fixing the random seed for
the split.
‘RandomForestClassifier’ is a machine learning algorithm used for
classification tasks, and it belongs to the ensemble learning family.
‘n_estimators=100’ specifies the number of trees in the forest (you can
adjust this number based on your needs).
‘random_state=42’ ensures reproducibility by fixing the random seed for the
algorithm.
The fit method is used to train the Random Forest Classifier on the training data
(X_train and y_train).
After this step, the model has learned the patterns in the training data and is
ready to make predictions on new, unseen data.
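The splitting and training steps described above can be sketched as follows. The parameter values (test_size=0.2, n_estimators=100, random_state=42) and the 'Class' column come from the text. The data is a synthetic stand-in generated with `make_classification`, and the feature names `V0` to `V7` are assumptions, since the report loads its data from a file instead.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the transaction data (assumption).
features, labels = make_classification(
    n_samples=1000, n_features=8, weights=[0.95, 0.05], random_state=42
)
df = pd.DataFrame(features, columns=[f"V{i}" for i in range(8)])
df["Class"] = labels

# Features are every column except the label; the label is 'Class'.
X = df.drop("Class", axis=1)
y = df["Class"]

# Hold out 20% of the rows for testing; the seed makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Build a forest of 100 trees and fit it on the training portion only.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

print(X_train.shape, X_test.shape)
```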
6.3. MODEL EVALUATION
instances to the total instances.
Accuracy:
The percentage of correctly classified instances.
Precision:
The ratio of true positive predictions to the total predicted positives. It
measures the accuracy of positive predictions.
Recall:
The ratio of true positive predictions to the total actual positives. It
measures how many of the real fraud cases the model finds.
F1-score:
The harmonic mean of precision and recall. It provides a balance between
precision and recall.
Confusion Matrix:
A table showing the true positive, true negative, false positive, and false
negative counts. It gives insights into the types of errors the model makes.
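The metrics defined above can be computed with scikit-learn's `sklearn.metrics` functions. The sketch below uses a small hand-made set of labels and predictions (hypothetical values, not the report's results) so that each number can be verified by hand: 6 true negatives, 1 false positive, 1 false negative, and 2 true positives.

```python
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Hypothetical true labels and model predictions (1 = fraud, 0 = not fraud).
y_test = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

accuracy = accuracy_score(y_test, y_pred)    # (6 + 2) / 10
precision = precision_score(y_test, y_pred)  # 2 / (2 + 1)
recall = recall_score(y_test, y_pred)        # 2 / (2 + 1)
f1 = f1_score(y_test, y_pred)                # harmonic mean of the two

# Rows are actual classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
cm = confusion_matrix(y_test, y_pred)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
print(cm)
print(classification_report(y_test, y_pred))
```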
6.4 NEW DATA PREDICTION
If the 'Class' column is not present, the code treats the entire new data as
the feature set. It then uses the trained Random Forest model (model) to make
predictions on the new data features (new_data_features).
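The prediction step can be sketched as below. The names `model` and `new_data_features` follow the text. The training data is synthetic, the feature names `V0` to `V5` are assumptions, and `new_data` is a hypothetical stand-in for the table that the report reads from a file. Dropping 'Class' when it is present is inferred from the description rather than stated in it.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Train a model on synthetic data so the example is self-contained (assumption).
features, labels = make_classification(
    n_samples=500, n_features=6, weights=[0.9, 0.1], random_state=42
)
columns = [f"V{i}" for i in range(6)]
train_df = pd.DataFrame(features, columns=columns)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(train_df, labels)

# Hypothetical new data; here it has no 'Class' column.
new_data = train_df.tail(5).copy()

# Use every column as a feature unless a 'Class' label column is present.
if "Class" in new_data.columns:
    new_data_features = new_data.drop("Class", axis=1)
else:
    new_data_features = new_data

# Predict a label (0 or 1) for each new transaction.
new_predictions = model.predict(new_data_features)
print(new_predictions)
```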
7. ANALYSIS TABLE
0.6 0.9912
0.7 0.9849
0.8 0.9868
0.9 0.9883
8. CONCLUSION
In fraud detection, the Random Forest algorithm proved highly effective at
identifying fraudulent activity in credit card transactions. The model
achieved a commendable accuracy, underscoring its ability to classify
instances correctly and distinguish between legitimate and fraudulent
transactions.
While the high precision indicates a low false positive rate, ensuring that
legitimate transactions are not incorrectly flagged as fraudulent, there is
room for improvement in recall to reduce false negatives. Achieving a balanced
precision-recall trade-off is vital in fraud detection, where both minimizing
false positives and capturing as many real fraud cases as possible are
critical objectives.
In conclusion, the Random Forest model offers a robust foundation for fraud
detection, and its performance can be further improved through fine-tuning and
model optimization. The dynamic nature of fraud strategies necessitates a
continuous commitment to research and development, ensuring that the model
stays adaptive and resilient to emerging threats in the ever-evolving
landscape of financial fraud. As technology evolves, the integration of
advanced methodologies will play a pivotal role in strengthening security
measures and maintaining the integrity of financial transactions.
9. REFERENCES
https://github.com/Sivaramasaran2773/Credit-Card-Fraud-Detection-using-Machine-Learning-Models
Used to learn how to apply ML in this project.
https://www.kaggle.com/code/kabure/credit-card-fraud-prediction-rf-smote
Used for a detailed understanding of Random Forest analysis for fraud detection.