
Data Wrangling and Analysis

Introduction
In the intricate world of financial transactions, fraud poses a constant threat to the
integrity of systems and the trust of stakeholders. As technology advances, so do the
methods of fraudulent actors, necessitating a dynamic and adaptive approach to
detection and prevention. This paper examines the evolving landscape of fraud
detection, exploring modern techniques and collaborative efforts aimed at thwarting
illicit activity. By working at the intersection of data analytics, machine learning,
and regulatory frameworks, we seek to bolster defenses, minimize risk, and uphold the
integrity of financial systems, with innovation and vigilance as the guiding
principles in safeguarding assets and preserving trust in financial transactions.

Objectives:
1. Early Detection: Implement systems capable of identifying fraudulent
activities at the earliest possible stage to minimize financial losses and
mitigate damage.
2. Accuracy: Strive for high accuracy in fraud detection algorithms to reduce
false positives and negatives, ensuring efficient allocation of resources for
investigation and prevention.
3. Adaptability: Develop flexible and adaptive fraud detection systems
capable of evolving alongside emerging fraud schemes and changing
regulatory landscapes.
4. Compliance: Ensure compliance with relevant laws, regulations, and
industry standards governing fraud detection and prevention to mitigate legal
risks and maintain trust.
5. Collaboration: Foster collaboration and information sharing among
financial institutions, regulatory bodies, law enforcement agencies, and
technology providers to enhance the collective ability to detect and combat
fraudulent activities effectively.
Dataset Description
The dataset comprises a collection of financial transactions spanning various types,
such as credit card transactions, wire transfers, and online payments. Each
transaction entry includes relevant features such as transaction amount, timestamp,
merchant information, and customer details. Additionally, the dataset contains
labels indicating whether each transaction is fraudulent or legitimate. With a
diverse range of transaction types and associated attributes, this dataset provides a
rich resource for training and evaluating fraud detection algorithms in real-world
scenarios.

Data Wrangling Techniques

1. Data Cleaning: Identify and handle missing values, outliers, and inconsistencies
in the dataset to ensure data quality and reliability for accurate fraud detection
models.

Code:
import pandas as pd

# Example data
data = {
    'transaction_id': [1, 2, 3, 4, 5, 6],
    'amount': [100, -200, 300, 400, 500, 600],
    'merchant': ['A', 'B', 'C', 'A', 'B', 'C'],
    'transaction_type': ['purchase', 'refund', 'purchase', 'purchase', 'purchase', 'refund'],
    'is_fraud': [0, 1, 0, 0, 0, 1]
}

# Create DataFrame
df = pd.DataFrame(data)

# Output before cleaning
print("Before Data Cleaning:")
print(df)

# Remove transactions with non-positive amounts
df = df[df['amount'] > 0]

# Remove duplicate rows
df = df.drop_duplicates()

# Remove rows with missing values
df = df.dropna()

# Output after cleaning
print("\nAfter Data Cleaning:")
print(df)

Output
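Dropping rows is not the only way to handle missing values; numeric fields are often imputed instead. A minimal sketch, with the median as an illustrative choice (the toy data above happens to contain no missing amounts):

# Impute missing amounts with the column median rather than dropping the rows
df['amount'] = df['amount'].fillna(df['amount'].median())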
2. Feature Engineering: Creating new features: Derived features such as
transaction frequency, transaction amount variability, and time-based features like
day of the week or time of day can provide valuable information for fraud
detection.

Code:
import pandas as pd

# Sample data
data = {
    'transaction_id': [1, 2, 3, 4, 5],
    'amount': [100, 200, 150, 300, 400],
    'merchant': ['A', 'B', 'C', 'A', 'B'],
    'transaction_type': ['purchase', 'purchase', 'refund', 'purchase', 'purchase'],
    'timestamp': ['2024-05-01 08:00:00', '2024-05-01 09:00:00', '2024-05-01 10:00:00',
                  '2024-05-01 11:00:00', '2024-05-01 12:00:00'],
    'is_fraud': [0, 0, 1, 0, 0]
}

# Create DataFrame
df = pd.DataFrame(data)

# Convert timestamp strings to datetime objects
df['timestamp'] = pd.to_datetime(df['timestamp'])

# Feature engineering: time-based and per-merchant features
df['hour_of_day'] = df['timestamp'].dt.hour
df['day_of_week'] = df['timestamp'].dt.dayofweek
df['transaction_count_merchant'] = df.groupby('merchant')['transaction_id'].transform('count')
df['transaction_total_merchant'] = df.groupby('merchant')['amount'].transform('sum')

# Print the DataFrame after feature engineering
print("DataFrame after Feature Engineering:")
print(df)

Output
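The section above also lists transaction amount variability, which the sample code does not compute. A minimal sketch using the same df (the column name amount_std_merchant is an illustrative choice):

# Per-merchant variability of transaction amounts (sample standard deviation);
# merchants with a single transaction yield NaN, filled here with 0
df['amount_std_merchant'] = df.groupby('merchant')['amount'].transform('std').fillna(0)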

3. Outlier Detection and Treatment: Identifying outliers: Outliers in transaction
amounts or other features can indicate potentially fraudulent behavior.

Code:

import pandas as pd
import numpy as np

# Sample financial transactions data
data = {
    'transaction_id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'amount': [100, 200, 150, 300, 400, 500, 600, 700, 800, 900],
    'is_fraud': [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]  # assuming the 6th transaction is fraudulent
}

# Create DataFrame
df = pd.DataFrame(data)

# Detect outliers using the z-score method
threshold = 3
mean = np.mean(df['amount'])
std_dev = np.std(df['amount'])
df['z_score'] = (df['amount'] - mean) / std_dev

# Keep only transactions whose absolute z-score is within the threshold
df_filtered = df[df['z_score'].abs() <= threshold]

# Print DataFrame after outlier treatment
print("DataFrame after Outlier Detection and Treatment:")
print(df_filtered)

Output
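Dropping outlying rows is only one treatment, and in fraud detection it risks discarding the very transactions of interest. A minimal alternative sketch that caps amounts at the conventional 1.5 x IQR fences instead of removing them (the column name amount_capped is illustrative):

# Cap extreme amounts at the IQR fences instead of dropping the rows
q1, q3 = df['amount'].quantile([0.25, 0.75])
iqr = q3 - q1
df['amount_capped'] = df['amount'].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)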

4. Data Transformation: Normalization or standardization: Scaling numerical features
to a similar range can improve the performance of certain algorithms, such as
distance-based methods.
Code:
import pandas as pd

# Sample financial transactions data
data = {
    'transaction_id': [1, 2, 3, 4, 5],
    'amount': [100, 200, 150, 300, 400],
    'merchant': ['A', 'B', 'C', 'A', 'B'],
    'transaction_type': ['purchase', 'purchase', 'refund', 'purchase', 'purchase'],
    'is_fraud': [0, 0, 1, 0, 0]
}

# Create DataFrame
df = pd.DataFrame(data)

# Data transformation: min-max normalization of the 'amount' column
df['amount_normalized'] = (df['amount'] - df['amount'].min()) / \
                          (df['amount'].max() - df['amount'].min())

# Print DataFrame after data transformation
print("DataFrame after Data Transformation:")
print(df)

Output
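The code above demonstrates min-max normalization only. For standardization (zero mean, unit variance), a minimal sketch using scikit-learn's StandardScaler (the column name amount_standardized is an illustrative choice):

from sklearn.preprocessing import StandardScaler

# Standardize 'amount' to zero mean and unit variance
scaler = StandardScaler()
df['amount_standardized'] = scaler.fit_transform(df[['amount']]).ravel()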
5. Data Aggregation and Summarization: Aggregating transactions: Grouping
transactions by attributes such as customer ID, merchant, or time period to
calculate summary statistics like total transaction amount, average transaction
amount, etc.

Code:

import pandas as pd

# Sample financial transactions data
data = {
    'transaction_id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'amount': [100, 200, 150, 300, 400, 500, 600, 700, 800, 900],
    'merchant': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C', 'A'],
    'is_fraud': [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]  # assuming the 5th transaction is fraudulent
}

# Create DataFrame
df = pd.DataFrame(data)

# Group by 'merchant' and calculate total transaction amount and count
summary_df = df.groupby('merchant').agg(
    {'amount': 'sum', 'transaction_id': 'count'}).reset_index()
summary_df.columns = ['merchant', 'total_transaction_amount', 'transaction_count']

# Print summary DataFrame
print("Summary DataFrame after Data Aggregation and Summarization:")
print(summary_df)

Output
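The section also mentions grouping by time period. A minimal sketch, assuming a datetime 'timestamp' column like the one used in the feature-engineering example (this section's sample df does not include one):

# Daily totals, averages, and counts per merchant
df['timestamp'] = pd.to_datetime(df['timestamp'])
daily_summary = (df.set_index('timestamp')
                   .groupby('merchant')
                   .resample('D')['amount']
                   .agg(['sum', 'mean', 'count'])
                   .reset_index())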

6. Handling Imbalanced Data: Resampling techniques: Addressing class imbalance by
oversampling minority class instances, undersampling majority class instances, or
using more advanced techniques like SMOTE (Synthetic Minority Over-sampling
Technique).

Code:

import pandas as pd
from sklearn.utils import resample

# Sample financial transactions data
data = {
    'transaction_id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'amount': [100, 200, 150, 300, 400, 500, 600, 700, 800, 900],
    'merchant': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C', 'A'],
    'is_fraud': [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]  # assuming the 5th transaction is fraudulent
}

# Create DataFrame
df = pd.DataFrame(data)

# Separate majority and minority classes
df_majority = df[df['is_fraud'] == 0]
df_minority = df[df['is_fraud'] == 1]

# Upsample the minority class with replacement to match the majority class size
df_minority_upsampled = resample(df_minority, replace=True,
                                 n_samples=len(df_majority), random_state=42)

# Combine majority class with upsampled minority class
df_upsampled = pd.concat([df_majority, df_minority_upsampled])

# Display class distribution after resampling
print("Class distribution after resampling:")
print(df_upsampled['is_fraud'].value_counts())

Output
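The section names SMOTE as a more advanced option. A minimal sketch using the imbalanced-learn package (an assumption, since it is not used elsewhere in this document); SMOTE interpolates between minority-class neighbors, so it needs several fraud examples and numeric features, which is why this sketch uses a slightly larger toy dataset:

import numpy as np
from imblearn.over_sampling import SMOTE  # pip install imbalanced-learn

# Toy numeric data: 20 legitimate and 4 fraudulent transaction amounts
rng = np.random.default_rng(42)
X = np.concatenate([rng.normal(300, 100, (20, 1)), rng.normal(900, 50, (4, 1))])
y = np.array([0] * 20 + [1] * 4)

# k_neighbors must be smaller than the number of minority samples
smote = SMOTE(random_state=42, k_neighbors=3)
X_res, y_res = smote.fit_resample(X, y)
print(np.bincount(y_res))  # both classes now have 20 samples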

7. Data Splitting: Splitting the data into training, validation, and test sets to
evaluate model performance effectively. The sample code below performs a simple
train/test split; a three-way split is sketched after its output.

Code:

import pandas as pd
from sklearn.model_selection import train_test_split

# Sample financial transactions data
data = {
    'transaction_id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'amount': [100, 200, 150, 300, 400, 500, 600, 700, 800, 900],
    'merchant': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C', 'A'],
    'is_fraud': [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]  # assuming the 5th transaction is fraudulent
}

# Create DataFrame
df = pd.DataFrame(data)

# Separate features and target variable
X = df.drop('is_fraud', axis=1)
y = df['is_fraud']

# Split the data into training and test sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Display the shape of training and test sets
print("Shape of training set:", X_train.shape)
print("Shape of test set:", X_test.shape)

Output
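To obtain the validation set named above, a common pattern is two successive splits. A minimal sketch of a 60/20/20 split (the ratio is an illustrative choice; passing stratify=y would additionally preserve the fraud ratio in each split, but requires at least two fraud cases):

# First split off the test set, then carve a validation set out of the remainder
X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_temp, y_temp, test_size=0.25,
                                                  random_state=42)  # 0.25 of 80% = 20% overall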

Conclusion

Data wrangling techniques are vital for preparing raw data for analysis and
modeling. From cleaning data to handling imbalanced datasets, each step ensures
data integrity and model accuracy. Feature engineering uncovers valuable patterns,
while outlier detection prevents skewed results. Data transformation normalizes
features, enhancing model performance, and splitting data aids in robust
evaluation. Overall, data wrangling lays the foundation for effective fraud
detection, enabling accurate identification and prevention of fraudulent activities in
financial transactions.

Code:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Sample financial transactions data
data = {
    'transaction_id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'amount': [100, 200, 150, 300, 400, 500, 600, 700, 800, 900],
    'merchant': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C', 'A'],
    'timestamp': ['2024-05-01 08:00:00', '2024-05-01 09:00:00', '2024-05-01 10:00:00',
                  '2024-05-01 11:00:00', '2024-05-01 12:00:00', '2024-05-01 13:00:00',
                  '2024-05-01 14:00:00', '2024-05-01 15:00:00', '2024-05-01 16:00:00',
                  '2024-05-01 17:00:00'],
    'is_fraud': [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]  # assuming the 5th transaction is fraudulent
}

# Create DataFrame
df = pd.DataFrame(data)

# Data cleaning
df_cleaned = df.drop_duplicates().dropna().reset_index(drop=True)

# Feature engineering
df_cleaned['timestamp'] = pd.to_datetime(df_cleaned['timestamp'])
df_cleaned['hour_of_day'] = df_cleaned['timestamp'].dt.hour
df_cleaned['day_of_week'] = df_cleaned['timestamp'].dt.dayofweek
df_cleaned['transaction_count_merchant'] = df_cleaned.groupby('merchant')['transaction_id'].transform('count')
df_cleaned['total_transaction_amount'] = df_cleaned.groupby('merchant')['amount'].transform('sum')

# Handling imbalanced data: upsample the minority (fraudulent) class
fraudulent = df_cleaned[df_cleaned['is_fraud'] == 1]
non_fraudulent = df_cleaned[df_cleaned['is_fraud'] == 0]
fraudulent_upsampled = resample(fraudulent, replace=True,
                                n_samples=len(non_fraudulent), random_state=42)
df_balanced = pd.concat([non_fraudulent, fraudulent_upsampled])

# Data transformation: min-max normalization of 'amount'
df_balanced['normalized_amount'] = (df_balanced['amount'] - df_balanced['amount'].min()) / \
    (df_balanced['amount'].max() - df_balanced['amount'].min())

# Data splitting
X = df_balanced.drop('is_fraud', axis=1)
y = df_balanced['is_fraud']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Output
print("Cleaned and Engineered Data:")
print(df_balanced)

Output
