
Phase 4: Artificial Intelligence - Project Development - Fraud Detection in Financial Transactions

Introduction
Developing a fraud detection model for financial transactions involves several key stages, including
model development, selection of evaluation metrics, and finally, model selection. Here's an overview of
each stage:

1. Model Development:

• Data Collection and Preprocessing: Gather historical transaction data, including features such as transaction amount, merchant ID, and time of transaction. Preprocess the data by handling missing values, encoding categorical variables, and scaling numerical features.
• Feature Engineering: Create new features or transform existing ones that might help in identifying fraudulent transactions, for example transaction frequency, average transaction amount, or a flag for transactions that deviate significantly from a user's typical behavior.
• Model Training: Select an appropriate machine learning algorithm for anomaly detection, such as Isolation Forest, One-Class SVM, or ensemble methods, and train it on the preprocessed data.
• Model Tuning: Fine-tune hyperparameters to optimize model performance, using techniques like grid search or randomized search. A sketch of these steps appears after this list.
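To make these stages concrete, here is a minimal sketch of the pipeline. The column names (amount, merchant_id, hour) and the synthetic data frame are illustrative assumptions standing in for real transaction data:

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import IsolationForest

# Toy transaction data; the columns are illustrative placeholders.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'amount': rng.exponential(scale=100, size=1000),
    'merchant_id': rng.integers(0, 20, size=1000),
    'hour': rng.integers(0, 24, size=1000),
})

# Feature engineering: per-merchant average amount and the deviation from it.
df['merchant_avg'] = df.groupby('merchant_id')['amount'].transform('mean')
df['amount_deviation'] = (df['amount'] - df['merchant_avg']).abs()

# Preprocessing: scale the numerical features.
features = ['amount', 'hour', 'merchant_avg', 'amount_deviation']
X = StandardScaler().fit_transform(df[features])

# Model training: Isolation Forest flags outlying transactions.
# Hyperparameters such as n_estimators and contamination would normally be
# tuned with grid or randomized search against labeled validation data.
model = IsolationForest(n_estimators=200, contamination=0.05, random_state=42)
model.fit(X)

# predict() returns -1 for suspected anomalies (potential fraud), 1 for inliers.
df['flagged'] = model.predict(X) == -1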

2. Evaluation Metrics:

• Precision, Recall, and F1-score: These metrics evaluate the model's ability to correctly identify fraudulent transactions while minimizing false positives. Precision measures the proportion of correctly identified frauds among all flagged transactions, recall measures the proportion of correctly identified frauds among all actual frauds, and F1-score is the harmonic mean of precision and recall (a worked example follows this list).
• Accuracy: Measures the overall correctness of the model's predictions, considering both true positives and true negatives.
• AUC-ROC: The Area Under the Receiver Operating Characteristic curve measures the model's ability to distinguish between fraud and legitimate transactions across different threshold settings.
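As a quick worked example of these definitions, using made-up counts: suppose a model flags 100 transactions, 80 of which are truly fraudulent, and the data contains 90 actual frauds.

tp, fp, fn = 80, 20, 10  # made-up counts: true positives, false positives, false negatives

precision = tp / (tp + fp)  # 80 / 100 = 0.80
recall = tp / (tp + fn)     # 80 / 90 ≈ 0.889
f1 = 2 * precision * recall / (precision + recall)  # ≈ 0.842

print(f"Precision: {precision:.3f}, Recall: {recall:.3f}, F1: {f1:.3f}")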

3. Model Selection:

• Compare Performance: Evaluate each model using the selected evaluation metrics, considering the trade-offs between accuracy, novelty detection, and computational complexity.
• Select the Best Model: Choose the model that best aligns with project objectives and user requirements. This could involve selecting the model with the highest F1-score, or considering a combination of metrics based on the specific needs of the application (a sketch of the weighted-combination idea follows this list).
• Iterate if Necessary: If none of the models meet the desired criteria, iterate by fine-tuning parameters, exploring different algorithms, or preprocessing the data differently.

By following these stages, one can develop an effective fraud detection model for financial transactions, ensuring accurate identification of fraudulent activity while minimizing false positives.
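As a small sketch of the "combination of metrics" idea: the metric values and weights below are hypothetical, with recall weighted highest on the assumption that a missed fraud costs more than a false alarm.

# Hypothetical per-model metrics; in practice these come from the evaluation stage.
results = {
    'Logistic Regression': {'Precision': 0.71, 'Recall': 0.62, 'F1 Score': 0.66},
    'Random Forest': {'Precision': 0.68, 'Recall': 0.74, 'F1 Score': 0.71},
}

# Illustrative weights reflecting application needs.
weights = {'Precision': 0.3, 'Recall': 0.5, 'F1 Score': 0.2}

def weighted_score(metrics):
    # Weighted sum of the metrics for one model.
    return sum(weights[name] * value for name, value in metrics.items())

best = max(results, key=lambda name: weighted_score(results[name]))
print(f"Selected model: {best}")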

Algorithm selection
Here's a simple Python program that demonstrates algorithm selection for fraud detection in financial transactions. The program uses three different algorithms: Logistic Regression, Decision Tree, and Random Forest, and selects the best one based on performance metrics such as accuracy, precision, recall, and F1-score.

To run this program, you'll need to have the following Python libraries installed: pandas, numpy, and scikit-learn.

Installation:

pip install pandas numpy scikit-learn

Here’s the Python program:

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

def load_data():
    # Synthetic stand-in for real transaction data: four random features
    # and a random binary fraud label.
    data = pd.DataFrame({
        'feature1': np.random.randn(1000),
        'feature2': np.random.randn(1000),
        'feature3': np.random.randn(1000),
        'feature4': np.random.randn(1000),
        'fraud': np.random.randint(0, 2, 1000)
    })
    return data

def evaluate_model(model, X_train, X_test, y_train, y_test):
    # Fit the model and compute the four evaluation metrics on the test set.
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred)
    recall = recall_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred)
    return accuracy, precision, recall, f1

data = load_data()
X = data.drop('fraud', axis=1)
y = data['fraud']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

models = {
    'Logistic Regression': LogisticRegression(max_iter=1000),
    'Decision Tree': DecisionTreeClassifier(),
    'Random Forest': RandomForestClassifier()
}

results = {}
for name, model in models.items():
    accuracy, precision, recall, f1 = evaluate_model(model, X_train, X_test, y_train, y_test)
    results[name] = {
        'Accuracy': accuracy,
        'Precision': precision,
        'Recall': recall,
        'F1 Score': f1
    }

for model_name, metrics in results.items():
    print(f"Model: {model_name}")
    for metric_name, metric_value in metrics.items():
        print(f"  {metric_name}: {metric_value:.4f}")
    print()

best_model_name = max(results, key=lambda name: results[name]['F1 Score'])
best_model = models[best_model_name]
print(f"The best model is: {best_model_name} with an F1 Score of "
      f"{results[best_model_name]['F1 Score']:.4f}")

best_model.fit(X_train, y_train)
final_predictions = best_model.predict(X_test)

Output: per-model accuracy, precision, recall, and F1 values, followed by the name of the best model (the values vary from run to run because the data is random).
Model evaluation

Installation:

pip install pandas numpy scikit-learn

Python code (the original listing breaks off inside preprocess_data; the remainder below is a minimal completion consistent with the imports):

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score, confusion_matrix

def load_data():
    # Synthetic stand-in data: three random features and a random binary label.
    data = pd.DataFrame({
        'feature1': np.random.rand(1000),
        'feature2': np.random.rand(1000),
        'feature3': np.random.rand(1000),
        'label': np.random.randint(0, 2, size=1000)
    })
    return data

def preprocess_data(data):
    X = data.drop('label', axis=1)
    y = data['label']
    # Scale the features to zero mean and unit variance.
    X = StandardScaler().fit_transform(X)
    return X, y

data = load_data()
X, y = preprocess_data(data)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
y_pred_prob = model.predict_proba(X_test)[:, 1]

print(f"Accuracy: {accuracy_score(y_test, y_pred):.4f}")
print(f"Precision: {precision_score(y_test, y_pred):.4f}")
print(f"Recall: {recall_score(y_test, y_pred):.4f}")
print(f"F1 Score: {f1_score(y_test, y_pred):.4f}")
print(f"ROC AUC: {roc_auc_score(y_test, y_pred_prob):.4f}")
print(f"Confusion Matrix:\n{confusion_matrix(y_test, y_pred)}")

Evaluation metrics: accuracy metrics

We will use the scikit-learn library to calculate these metrics:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score, confusion_matrix, roc_curve, auc

# Ground-truth labels, hard predictions, and predicted probabilities (for ROC AUC).
y_true = np.array([0, 1, 0, 0, 1, 0, 1, 1, 0, 1])
y_pred = np.array([0, 0, 0, 0, 1, 1, 1, 1, 0, 1])
y_pred_prob = np.array([0.1, 0.4, 0.2, 0.1, 0.9, 0.8, 0.7, 0.95, 0.3, 0.85])

accuracy = accuracy_score(y_true, y_pred)
print(f'Accuracy: {accuracy:.2f}')

precision = precision_score(y_true, y_pred)
print(f'Precision: {precision:.2f}')

recall = recall_score(y_true, y_pred)
print(f'Recall: {recall:.2f}')

f1 = f1_score(y_true, y_pred)
print(f'F1 Score: {f1:.2f}')

roc_auc = roc_auc_score(y_true, y_pred_prob)
print(f'ROC AUC: {roc_auc:.2f}')

conf_matrix = confusion_matrix(y_true, y_pred)
print(f'Confusion Matrix:\n{conf_matrix}')

# Plot the ROC curve.
fpr, tpr, _ = roc_curve(y_true, y_pred_prob)
roc_auc = auc(fpr, tpr)
plt.figure()
plt.plot(fpr, tpr, color='darkorange', lw=2, label=f'ROC curve (area = {roc_auc:.2f})')
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC) Curve')
plt.legend(loc="lower right")
plt.show()

Output (up to rounding):

Accuracy: 0.80
Precision: 0.80
Recall: 0.80
F1 Score: 0.80
ROC AUC: 0.92
Confusion Matrix:
[[4 1]
 [1 4]]

followed by the ROC curve plot.

Ranking metrics

Python code

import numpy as np
import pandas as pd
from sklearn.metrics import average_precision_score, roc_auc_score

np.random.seed(42)
n_transactions = 1000

data = pd.DataFrame({
    'transaction_id': range(1, n_transactions + 1),
    'amount': np.random.rand(n_transactions) * 1000,  # transaction amounts between 0 and 1000
    'is_fraud': np.random.binomial(1, 0.05, n_transactions)  # 5% fraud transactions
})

# Assign random fraud scores, with higher scores for the fraudulent transactions.
data['fraud_score'] = np.random.rand(n_transactions)
data.loc[data['is_fraud'] == 1, 'fraud_score'] = np.random.rand(sum(data['is_fraud'])) * 0.5 + 0.5

# Rank transactions from most to least suspicious.
data = data.sort_values(by='fraud_score', ascending=False).reset_index(drop=True)

def precision_at_k(y_true, y_scores, k):
    """Calculate Precision at K: the fraction of frauds among the top-K scored transactions."""
    order = np.argsort(y_scores)[::-1]
    y_true = np.asarray(y_true)[order][:k]
    return np.mean(y_true)

def mean_average_precision(y_true, y_scores):
    """Calculate Mean Average Precision (MAP)."""
    return average_precision_score(y_true, y_scores)

def roc_auc(y_true, y_scores):
    """Calculate ROC AUC."""
    return roc_auc_score(y_true, y_scores)

y_true = data['is_fraud'].values
y_scores = data['fraud_score'].values

p_at_10 = precision_at_k(y_true, y_scores, 10)
p_at_100 = precision_at_k(y_true, y_scores, 100)
map_score = mean_average_precision(y_true, y_scores)
roc_auc_score_val = roc_auc(y_true, y_scores)

print(f"Precision at 10: {p_at_10:.4f}")
print(f"Precision at 100: {p_at_100:.4f}")
print(f"Mean Average Precision (MAP): {map_score:.4f}")
print(f"ROC AUC: {roc_auc_score_val:.4f}")

Output: the printed Precision at 10, Precision at 100, MAP, and ROC AUC values for the ranked transactions.

Diversity metrics

Python code

from sklearn.metrics import precision_score, recall_score, f1_score

def evaluate_diversity_metrics(true_labels, predicted_labels):
    # Standard classification metrics, used here as simple proxies.
    precision = precision_score(true_labels, predicted_labels)
    recall = recall_score(true_labels, predicted_labels)
    f1 = f1_score(true_labels, predicted_labels)
    return precision, recall, f1

true_labels = [0, 1, 1, 0, 1, 0]
predicted_labels = [0, 1, 0, 0, 1, 1]

precision, recall, f1 = evaluate_diversity_metrics(true_labels, predicted_labels)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)


Output:

Precision: 0.6666666666666666
Recall: 0.6666666666666666
F1 Score: 0.6666666666666666

Novelty metrics

Make sure you have scikit-learn installed (pip install scikit-learn).

Python code

from sklearn.metrics import precision_score, recall_score, f1_score

true_labels = [0, 1, 0, 0, 1, 1, 0, 0, 0, 1]
predicted_labels = [0, 0, 1, 0, 1, 1, 0, 1, 0, 1]

precision = precision_score(true_labels, predicted_labels)
recall = recall_score(true_labels, predicted_labels)
f1 = f1_score(true_labels, predicted_labels)

print("Precision:", precision)
print("Recall:", recall)
print("F1-score:", f1)

Output:

Precision: 0.6
Recall: 0.75
F1-score: 0.6666666666666666

Model selection

Here's a Python code example that follows the steps outlined above:

from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, f1_score, accuracy_score, roc_auc_score
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM
from sklearn.neighbors import LocalOutlierFactor

# Step 1: Choose Evaluation Metrics
evaluation_metrics = {
    'Precision': precision_score,
    'Recall': recall_score,
    'F1-score': f1_score,
    'Accuracy': accuracy_score,
    'AUC-ROC': roc_auc_score
}

# Step 2: Train and Evaluate Models
# X is the feature matrix and y the binary fraud labels (1 = fraud),
# defined elsewhere from your own dataset.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    'Isolation Forest': IsolationForest(),
    'One-Class SVM': OneClassSVM(),
    # novelty=True lets LocalOutlierFactor score unseen data with predict().
    'Local Outlier Factor': LocalOutlierFactor(novelty=True)
}

results = {}
for name, model in models.items():
    # Anomaly detectors are fitted without labels.
    model.fit(X_train)
    # These estimators return -1 for anomalies and 1 for inliers;
    # map them to 1 = fraud, 0 = legitimate to match the labels.
    y_pred = (model.predict(X_test) == -1).astype(int)
    results[name] = {}
    for metric_name, metric_func in evaluation_metrics.items():
        if metric_name == 'AUC-ROC' and name == 'Local Outlier Factor':
            # AUC-ROC is not computed for Local Outlier Factor in this example.
            results[name][metric_name] = None
        else:
            results[name][metric_name] = metric_func(y_test, y_pred)

for name, metrics in results.items():
    print(f"Model: {name}")
    for metric_name, value in metrics.items():
        print(f"{metric_name}: {value}")
    print()

best_model = max(results, key=lambda x: results[x]['F1-score'])
print(f"Best Model: {best_model}")


Output: per-model metric values and the name of the best model (these depend on your dataset).

Make sure to replace X and y with your feature matrix and target vector respectively. This code trains
and evaluates three anomaly detection models (Isolation Forest, One-Class SVM, and Local Outlier
Factor) using various evaluation metrics. Finally, it selects the best model based on the F1-score.

Conclusion

In conclusion, the process of model selection for fraud detection in financial transactions involves
several key steps:

1. Choosing Evaluation Metrics: Defining evaluation metrics that align with project objectives and user
requirements, such as precision, recall, F1-score, accuracy, and AUC-ROC.

2. Training and Evaluating Models: Training different anomaly detection models, such as Isolation
Forest, One-Class SVM, and Local Outlier Factor, and evaluating their performance using the chosen
evaluation metrics.

3. Considering Trade-offs: Analyzing trade-offs between accuracy, diversity, and novelty. Models with
high accuracy may not necessarily excel in detecting novel fraud patterns, and vice versa.

4. Examining Performance Across Metrics: Comparing the performance of each model across all evaluation metrics to understand their strengths and weaknesses.

5. Selecting the Best Model: Selecting the model that best balances the trade-offs and meets the project objectives and user requirements. This could involve choosing the model with the highest F1-score or considering a combination of metrics.

6. Iterating if Necessary: If none of the models meet the desired criteria, iterating by fine-tuning
parameters, exploring different algorithms, or preprocessing the data differently.

By following these steps, one can systematically evaluate and select the most suitable model for fraud
detection in financial transactions, considering both performance metrics and trade-offs.
