Sibi 5

Uploaded by

Viththagi Kirishnarajah


UNIVERSITY COLLEGE OF ENGINEERING (BIT CAMPUS) – TIRUCHIRAPALLI

DEPARTMENT OF INFORMATION TECHNOLOGY

Completed the AI project named as

Fraud Detection on Online Transactions

Submitted by

Sibani Selvi P – 810022205056

PROJECT TITLE: AI-FRAUD DETECTION ON
TRANSACTIONS

Introduction:

Fraud detection in online transactions is a critical aspect of modern e-commerce
and financial services, aiming to safeguard both businesses and consumers from
fraudulent activities. With the rapid growth of digital transactions, the need for
robust and efficient fraud detection systems has become paramount. These
systems leverage advanced machine learning algorithms and data analytics to
identify and prevent fraudulent behavior in real time. By analyzing patterns and
anomalies in transaction data, they ensure the security and integrity of online
financial interactions, thereby fostering trust and reliability in digital commerce.

Project Objectives:

1. Accurate Identification of Fraudulent Transactions:

• Develop a robust model that can accurately distinguish between
fraudulent and legitimate transactions. This involves optimizing the
model to minimize both false positives (legitimate transactions
flagged as fraud) and false negatives (fraudulent transactions not
detected).

2. Efficient Real-Time Detection:

• Ensure the model can process and evaluate transactions in real
time, providing immediate responses to prevent fraudulent activities
from causing significant harm.

3. Handling Data Imbalance:

• Implement techniques to effectively manage the typically
imbalanced nature of fraud detection datasets, where fraudulent
transactions are rare compared to legitimate ones.

4. Scalability and Performance:

• Build a model that can scale with increasing transaction
volumes and maintain high performance and accuracy, even as the
data grows over time.
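
The trade-off between false positives and false negatives named in objective 1 can be made concrete with a small calculation. The sketch below is illustrative only: the labels are made-up values and `confusion_counts` is a hypothetical helper, not part of the project code. It shows how accuracy can look acceptable on imbalanced data while precision and recall expose the missed and wrongly flagged transactions.

```python
# Hypothetical labels: 1 = fraud, 0 = legitimate (illustrative values only).
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fraud caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # legitimate flagged as fraud
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fraud missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate passed
    return tp, fp, fn, tn


tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision = tp / (tp + fp)           # share of flagged transactions that are fraud
recall = tp / (tp + fn)              # share of fraud that was flagged
accuracy = (tp + tn) / len(y_true)   # share of all transactions classified correctly

print("TP, FP, FN, TN:", tp, fp, fn, tn)
print("precision:", round(precision, 3), "recall:", round(recall, 3), "accuracy:", accuracy)
```

In this toy case accuracy is 0.8 even though one in three fraudulent transactions is missed, which is why precision, recall, and F1 are worth reporting alongside accuracy.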

System Requirements:
Data:

• "fraud detection" dataset: a dataset recording the type of each
transaction, the account balances of the users involved, and a label
indicating whether the transaction is fraudulent.

• Features: step, type, amount, nameOrig, oldbalanceOrg, newbalanceOrig,
nameDest, oldbalanceDest, newbalanceDest, isFraud.

Hardware:

• Processor: Intel Core i5 or equivalent (i7 or better recommended)
• RAM: 8 GB minimum (16 GB or more recommended for larger
datasets)
• Hard Drive: 50 GB free space (more space may be needed
depending on data size)
• Internet Connection (for downloading libraries and data)

Software:
• Operating System: Windows 10 (64-bit), macOS, or Linux (e.g.,
Ubuntu)
• Python (version 3.6 or later): https://www.python.org/downloads/
• Text editor or IDE with Python support

Python Libraries:

• pandas: https://pandas.pydata.org/ (data manipulation)
• scikit-learn: https://scikit-learn.org/ (machine learning)
• TensorFlow: https://www.tensorflow.org/ (deep learning)
• NumPy: https://numpy.org/ (numerical computing; installed
automatically as a dependency of pandas and scikit-learn)
• Matplotlib: https://matplotlib.org/ (data visualization)
• seaborn, Plotly, Dash, tabulate, and imbalanced-learn (imported in
the implementation code)

Methodology:

Data Preprocessing:

1. Data Cleaning:

• Identify missing values in the dataset.
• Decide on appropriate strategies to handle missing values,
such as imputation (mean, median, mode) or removal of
records with missing data.
• Detect outliers using statistical methods (e.g., Z-score, IQR).
• Handle outliers by either capping values, transforming them,
or removing the outlier data points if necessary.
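
The cleaning steps above can be sketched as follows. This is a minimal illustration on made-up values, assuming median imputation and IQR-based capping; the `toy` frame and the `amount_capped` column are hypothetical names, and the 1.5 × IQR fence is the common convention rather than a setting taken from the project.

```python
import pandas as pd

# Toy data (illustrative values only): one missing amount and one extreme amount.
toy = pd.DataFrame({"amount": [10.0, 12.0, None, 11.0, 13.0, 500.0]})

# Impute the missing value with the median, which is robust to the outlier.
toy["amount"] = toy["amount"].fillna(toy["amount"].median())

# Detect outliers with the IQR rule and cap them at the fences.
q1 = toy["amount"].quantile(0.25)
q3 = toy["amount"].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
toy["amount_capped"] = toy["amount"].clip(lower=lower, upper=upper)

print(toy)
```

Capping keeps the row in the dataset, which matters for fraud detection because unusually large amounts may themselves be a fraud signal; removing such rows outright could discard the cases of interest.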

2. Data Transformation:
• Normalize numerical features to a standard range (e.g., 0 to
1) or standardize them to have a mean of 0 and standard
deviation of 1.
• Convert categorical variables into numerical form using
techniques such as one-hot encoding or label encoding.
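
A short sketch of the two transformations, assuming scikit-learn's `MinMaxScaler` and `StandardScaler` and pandas' `get_dummies`. The `toy` frame and the new column names are hypothetical; in practice the scalers would be fitted on the training split only and then applied to the test split.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Toy data (illustrative values only).
toy = pd.DataFrame({
    "type": ["PAYMENT", "TRANSFER", "CASH_OUT", "PAYMENT"],
    "amount": [100.0, 200.0, 300.0, 400.0],
})

# Normalize to the range [0, 1].
toy["amount_minmax"] = MinMaxScaler().fit_transform(toy[["amount"]]).ravel()

# Standardize to mean 0 and standard deviation 1.
toy["amount_std"] = StandardScaler().fit_transform(toy[["amount"]]).ravel()

# One-hot encode the categorical column: one indicator column per transaction type.
encoded = pd.get_dummies(toy, columns=["type"])

print(encoded)
```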

3. Data Splitting:

• Split the dataset into training and testing sets (e.g., 80%
training, 20% testing) to evaluate model performance.
• Further split the training set into a validation set to fine-tune
model parameters and prevent overfitting.
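
One way to realise this split is sketched below, assuming scikit-learn's `train_test_split` with stratification so that the rare fraud class keeps the same proportion in every subset. The synthetic arrays, the 5% fraud rate, and the 60/20/20 proportions are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data (illustrative only): 1000 rows, 5% labelled as fraud.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(1000, 3))
y_toy = np.zeros(1000, dtype=int)
y_toy[:50] = 1

# 80% train+validation, 20% test, preserving the class ratio.
X_trainval, X_te, y_trainval, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, stratify=y_toy, random_state=42)

# Split the remaining 80% again: 25% of it (20% overall) becomes the validation set.
X_tr, X_val, y_tr, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, stratify=y_trainval, random_state=42)

print("train/val/test sizes:", len(X_tr), len(X_val), len(X_te))
print("fraud counts:", y_tr.sum(), y_val.sum(), y_te.sum())
```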

4. Feature Engineering:

• Extract relevant features from raw data that may be useful
for prediction (e.g., transaction type, amount, account balances).
• Create new features based on domain knowledge and
exploratory data analysis (e.g., the change in the originating
account's balance, or whether the account was emptied).
• Select the most relevant features using techniques such as
correlation analysis, mutual information, or feature
importance from models like random forests.
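
For the transaction columns used in this project, engineered features might look like the sketch below. The rows are the five sample transactions listed in the implementation code; the derived columns `origBalanceDelta` and `accountEmptied` are hypothetical names introduced here for illustration, and a correlation computed on five rows is only a demonstration of the mechanics, not evidence about the full dataset.

```python
import pandas as pd

# The five sample transactions from the implementation code.
sample = pd.DataFrame({
    "amount": [9839.64, 1864.28, 181.0, 181.0, 11668.14],
    "oldbalanceOrg": [170136.0, 21249.0, 181.0, 181.0, 41554.0],
    "newbalanceOrig": [160296.36, 19384.72, 0.0, 0.0, 29885.86],
    "isFraud": [0, 0, 1, 1, 0],
})

# Hypothetical feature: how much the originating account's balance dropped.
sample["origBalanceDelta"] = sample["oldbalanceOrg"] - sample["newbalanceOrig"]

# Hypothetical feature: the transaction moved the whole balance out of the account.
sample["accountEmptied"] = (
    (sample["newbalanceOrig"] == 0) & (sample["amount"] == sample["oldbalanceOrg"])
).astype(int)

# Correlation of each feature with the label, as a first filter for feature selection.
corr = sample.corr()["isFraud"].drop("isFraud")
print(corr.sort_values(ascending=False))
```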

5. Handling Imbalanced Data:

• Apply techniques such as oversampling the minority class
(e.g., SMOTE) or undersampling the majority class to
balance the dataset.
• Generate synthetic data points for the minority class to
improve model training on imbalanced datasets.
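
As a minimal sketch of rebalancing, the example below uses plain random oversampling with `sklearn.utils.resample`. This is a simpler stand-in for SMOTE, not SMOTE itself: it duplicates existing minority rows instead of synthesising new ones. The `toy` frame and the 5% fraud rate are made-up values.

```python
import pandas as pd
from sklearn.utils import resample

# Toy data (illustrative only): 5 fraud rows out of 100.
toy = pd.DataFrame({"amount": range(100), "isFraud": [1] * 5 + [0] * 95})

majority = toy[toy["isFraud"] == 0]
minority = toy[toy["isFraud"] == 1]

# Sample the minority class with replacement until it matches the majority class.
minority_up = resample(minority, replace=True, n_samples=len(majority), random_state=42)

# Combine and shuffle.
balanced = pd.concat([majority, minority_up]).sample(frac=1, random_state=42)
counts = balanced["isFraud"].value_counts()
print(counts)
```

Resampling should be applied to the training split only; rebalancing the test split would make the reported metrics unrepresentative of real transaction traffic.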

6. Data Augmentation:

• If applicable, apply data augmentation techniques to increase
the diversity of the training set without collecting new data
(e.g., noise addition, rotations for image data).

7. Data Integration:

• If multiple data sources are available, integrate them into a
single dataset, ensuring consistency and alignment of
features.
• Identify and remove duplicate records to avoid redundancy
and potential bias in the dataset.
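
The integration and de-duplication steps can be sketched with pandas as follows. The two source frames and their values are hypothetical; only the column names `nameOrig` and `amount` come from the project's dataset.

```python
import pandas as pd

# Two hypothetical sources with the same schema and one overlapping record.
source_a = pd.DataFrame({"nameOrig": ["C1", "C2"], "amount": [100.0, 250.0]})
source_b = pd.DataFrame({"nameOrig": ["C2", "C3"], "amount": [250.0, 75.0]})

# Stack the sources into a single dataset.
combined = pd.concat([source_a, source_b], ignore_index=True)

# Count and remove exact duplicate rows.
n_dupes = int(combined.duplicated().sum())
deduped = combined.drop_duplicates().reset_index(drop=True)

print("duplicates found:", n_dupes)
print(deduped)
```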

8. Data Annotation:

• Ensure all data points are correctly labeled, especially in
supervised learning scenarios, to maintain the integrity of the
model training process.

9. Dimensionality Reduction (if necessary):

• Apply techniques like Principal Component Analysis (PCA)
or t-SNE to reduce the dimensionality of the dataset while
preserving important information.
• Reduce computational complexity and improve model
performance by eliminating irrelevant or redundant features.
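
A small PCA sketch, assuming scikit-learn's `PCA` applied after standardization. The synthetic matrix is constructed so that two of its four columns are linear combinations of the other two, which makes the effect of removing redundant dimensions easy to see; none of the values come from the project's data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data (illustrative only): 4 features, but only 2 independent directions.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
X_toy = np.column_stack([
    base[:, 0],
    base[:, 1],
    base[:, 0] + base[:, 1],   # redundant: sum of the first two columns
    base[:, 0] - base[:, 1],   # redundant: difference of the first two columns
])

# PCA is sensitive to feature scale, so standardize first.
X_scaled = StandardScaler().fit_transform(X_toy)

# Keep two components and check how much variance they retain.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)
explained = pca.explained_variance_ratio_.sum()

print("reduced shape:", X_reduced.shape)
print("variance retained:", round(float(explained), 4))
```

t-SNE is normally used for visualising data in two or three dimensions; it does not provide a transform for unseen data, so PCA is usually the more practical choice as a preprocessing step for a classifier.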

10. Data Pipeline Automation:

• Create automated data preprocessing pipelines using tools
such as Python scripts, Scikit-learn pipelines, or data flow
management tools to ensure reproducibility and consistency.
• Implement monitoring mechanisms to continuously check
and validate the quality of incoming data.
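
A scikit-learn pipeline along these lines might look like the sketch below. It chains scaling, one-hot encoding, and a Random Forest into one object so that the same preprocessing is applied at training and prediction time. The rows are the five sample transactions from the implementation code; the step names and `n_estimators=20` are arbitrary choices for illustration, and fitting on five rows only demonstrates the wiring.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# The five sample transactions from the implementation code.
toy = pd.DataFrame({
    "type": ["PAYMENT", "PAYMENT", "TRANSFER", "CASH_OUT", "PAYMENT"],
    "amount": [9839.64, 1864.28, 181.0, 181.0, 11668.14],
    "oldbalanceOrg": [170136.0, 21249.0, 181.0, 181.0, 41554.0],
    "isFraud": [0, 0, 1, 1, 0],
})
features = toy.drop(columns="isFraud")
labels = toy["isFraud"]

# Scale numeric columns and one-hot encode the transaction type.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["amount", "oldbalanceOrg"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["type"]),
])

# Preprocessing and model in one reproducible object.
pipeline = Pipeline([
    ("preprocess", preprocess),
    ("model", RandomForestClassifier(n_estimators=20, class_weight="balanced",
                                     random_state=42)),
])

pipeline.fit(features, labels)
predictions = pipeline.predict(features)
print(predictions)
```

Because the scaler and encoder are fitted inside the pipeline, passing the pipeline to cross-validation refits them on each training fold, which avoids leaking statistics from the validation fold.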

Documentation and Reporting:

• Document the data preprocessing steps, including methods
used, transformations applied, and any assumptions made.
• Prepare detailed reports and visualizations to communicate
the preprocessing steps and their impact on the dataset
quality and model performance.

Existing work:

Fraud detection in online transactions is a critical area of research aimed at
identifying and preventing unauthorized or illegitimate activities. Existing work
in this field leverages a variety of techniques, including machine learning,
statistical analysis, and rule-based systems. Machine learning approaches often
employ supervised learning models, such as decision trees, random forests, and
neural networks, which are trained on labeled datasets containing both
fraudulent and legitimate transactions. These models can identify complex
patterns and anomalies indicative of fraud. Unsupervised learning methods, like
clustering and anomaly detection, are also used to detect outliers in transaction
data that may suggest fraudulent behavior. Statistical methods, including
logistic regression and Bayesian networks, provide probabilistic frameworks for
assessing the likelihood of fraud. Rule-based systems, which rely on predefined
heuristics and expert knowledge, offer straightforward but often less flexible
solutions. Additionally, hybrid models that combine multiple techniques are
increasingly popular, as they can leverage the strengths of each method to
improve detection accuracy. Ongoing advancements in data analytics, real-time
processing, and the integration of external data sources continue to enhance the
efficacy of fraud detection systems in mitigating online transaction fraud.

Proposed Work:

The core of the project is the selection and training of a machine learning
model. We use Random Forest, a well-established ensemble algorithm. The AI
model is designed to determine whether a transaction is fraudulent based on
the user's recent transactions with other users and on the complaints
registered by them.

Flow Chart:

Implementation:

#Data Description
import pandas as pd
import numpy as np
data=pd.read_csv("/content/fraud1.csv")
data.head()
data.tail()
data.info()
data.describe()
#Null Data Handling
data.isnull()
data.notnull()
data.isnull().sum()
data.dropna()
data.fillna(0)
#Data Validation
data["type"].unique()
data["oldbalanceOrg"].unique()
data["isFraud"].unique()
#Data Reshaping
df_stacked=data.stack()
print(df_stacked.head(10))
df_unstacked=df_stacked.unstack()
print(df_unstacked.head(5))
df_melt=data.melt(id_vars=['type','isFraud'])
print(df_melt.head(10))
transposed_data=data.T
print(transposed_data)
#data merging
data1=pd.read_csv("/content/crd.csv")
merged_data=pd.merge(data, data1, on="type", how="inner")
print(merged_data)
#Data Aggregation
aggregated_df = data.groupby('type').agg({'amount': ['mean', 'sum']})
print(aggregated_df)
#data Groupby
mean_value = data.groupby('type')['amount'].mean()
sum_value = data.groupby('type')['amount'].sum()

print("Mean:", mean_value)
print("Sum:", sum_value)
#Data Analysis Techniques
#Univariate Analysis
import matplotlib.pyplot as plt
import seaborn as sns
sns.histplot(data['amount'].tail(15),bins=20)
plt.title("univariate analysis")
plt.show()
#Bivariate analysis
x=data["amount"].head(10)
y=data["oldbalanceOrg"].head(10)
plt.scatter(x,y)
plt.title("Bivariate analysis")
plt.show()
#multivariate analysis
sns.pairplot(data.head(10))
plt.title("multivariate analysis")
plt.show()
#Histogram
import matplotlib.pyplot as plt
import pandas as pd
path="/content/drive/MyDrive/fraud1.csv"
df=pd.read_csv(path)
plt.hist(df['amount'].head(10),bins=20)
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.title('Histogram')
plt.show()
#Bar Chart
plt.bar(df['type'].value_counts().index,df['type'].value_counts().values)
plt.xlabel('Category')
plt.ylabel('Frequency')
plt.title('Bar Chart')
plt.show()
#Scatter Plot
plt.scatter(df['type'].head(25), df['amount'].head(25))
plt.xlabel('type')
plt.ylabel('amount')
plt.title('Scatter Plot')
plt.show()
#Box Plot
plt.boxplot(df['amount'])
plt.xlabel('Amount')
plt.ylabel('Value')
plt.title('Box Plot')
plt.show()
#Plot Pairs
sns.pairplot(df)
plt.title('Pair Plot')
plt.show()
#Interactive Scatter Plot
import plotly.express as px
fig = px.scatter(df.head(10), x='amount', y='type')
fig.show()
#Interactive Dashboards
import dash
from dash import dcc, html  # Dash 2.0+; replaces dash_core_components / dash_html_components
import pandas as pd
app = dash.Dash(__name__)
app.layout = html.Div([
    dcc.Graph(
        id='interactive-plot',
        figure={
            'data': [
                {'x': df['amount'], 'y': df['type'],
                 'mode': 'markers', 'type': 'scatter'}
            ],
            'layout': {
                'title': 'Interactive Scatter Plot',
                'xaxis': {'title': 'amount'},
                'yaxis': {'title': 'type'}
            }
        }
    )
])
if __name__ == '__main__':
    app.run(debug=True)  # older Dash releases use app.run_server(debug=True)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import scipy as sp
from tabulate import tabulate
import random
import tensorflow as tf
df = pd.read_csv('/content/drive/MyDrive/onlinefraud.csv')
df.head()
df.drop('isFlaggedFraud', axis=1, inplace=True)
df.info()
df.sample(5)
df.describe()
df.isnull().sum()
fraud_min_max = [
['amount', df.amount.min(), df.amount.max()],
['oldbalanceOrg', df.oldbalanceOrg.min(), df.oldbalanceOrg.max()],
['newbalanceOrig', df.newbalanceOrig.min(), df.newbalanceOrig.max()],
['oldbalanceDest', df.oldbalanceDest.min(), df.oldbalanceDest.max()],
['isFraud', df.isFraud.min(), df.isFraud.max()]
]

print(
tabulate(
fraud_min_max,
headers=['columns', 'min value', 'max value'],
showindex=True,
tablefmt='github',
numalign='right'
))
# Downcast numerical columns with smaller dtype
for col in df.columns:
    if df[col].dtype == 'float64':
        df[col] = pd.to_numeric(df[col], downcast='float')
    if df[col].dtype == 'int64':
        df[col] = pd.to_numeric(df[col], downcast='unsigned')

# Use category dtype for categorical column

df['type'] = df['type'].astype('category')
# Check duplicate values
df.duplicated().sum()
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (8,6)
df['step'].value_counts()
ax = sns.countplot(x='type', data=df, palette='PuBu')
for container in ax.containers:
    ax.bar_label(container)
plt.title('Count plot of transaction type')
plt.legend(bbox_to_anchor=(1.05,1), loc='upper left')
plt.ylabel('Number of transactions')
sns.kdeplot(df['amount'], linewidth=4)
plt.title('Distribution of transaction amount')
fig, ax = plt.subplots(1,2,figsize=(20,5))

sns.countplot(x='type', data=df, hue='isFraud', palette='PuBu', ax=ax[0])

for container in ax[0].containers:
    ax[0].bar_label(container)
ax[0].set_title('Count plot of transaction type')
ax[0].legend(loc='best')
ax[0].set_ylabel('Number of transactions')
# Share of fraud and non-fraud within each transaction type, in percent
df2 = df.groupby(['type', 'isFraud']).size().unstack()
df2.apply(lambda x : round(x/sum(x)*100, 2), axis=1).plot(kind='barh', stacked=True,
    color=['lightsteelblue', 'steelblue'], ax=ax[1])
for container in ax[1].containers:
    ax[1].bar_label(container, label_type='center')
ax[1].set_title('Percentage of fraud by transaction type')
ax[1].legend(bbox_to_anchor=(1.05,1), loc='upper left')
ax[1].set_xlabel('Percentage of transactions')
ax[1].set_ylabel('Transaction type')
ax[1].grid(axis='y')
df1 = df[df['isFraud']==1]
df2 = df1['step'].value_counts().head(10)
ax = df2.plot(kind='bar', color='lightsteelblue')
for container in ax.containers:
    ax.bar_label(container)
plt.title('Top 10 steps that often lead to fraudulent transactions')
plt.ylabel('Number of fraudulent transactions')
plt.xlabel('Step')
plt.grid(axis='x')

del ax, df2

# Data preprocessing
# Encode the transaction type as an integer code
df['type'] = df['type'].map({'PAYMENT':0, 'CASH_IN':1, 'DEBIT':2, 'CASH_OUT':3,
                             'TRANSFER':4}).astype('int8')
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from imblearn.under_sampling import RandomUnderSampler
from sklearn.model_selection import cross_val_score
from sklearn.metrics import classification_report, roc_curve, auc, ConfusionMatrixDisplay

seed = 42
np.random.seed(seed)
random.seed(seed)
tf.random.set_seed(seed)

X = df.copy()
X.drop(['nameOrig', 'newbalanceOrig', 'nameDest', 'newbalanceDest'], axis=1, inplace=True)
y = X.pop('isFraud')

# Assumption: an 80/20 stratified train/test split and 5 stratified folds,
# following the Data Splitting step of the methodology.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=seed)
skfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)

def model_comparison_evaluate(classifiers, X, y):
    """Print the cross-validated mean and standard deviation of each metric per model."""
    print('K-Fold Cross-Validation:\n')
    for name, model in classifiers.items():
        print('{}:'.format(name))

        scoring = ['accuracy', 'precision', 'recall', 'f1', 'roc_auc']

        for score in scoring:
            scores = cross_val_score(model, X, y, scoring=score, cv=skfold, n_jobs=-1)
            print('Mean {} score: {:.3f} ({:.3f})'.format(score, scores.mean(), scores.std()))

        print('\n')

classifiers = {'Random Forest Classifier': RandomForestClassifier(class_weight='balanced',
                                                                  random_state=seed)}
model_comparison_evaluate(classifiers, X_train, y_train)
model = RandomForestClassifier(class_weight='balanced', random_state=seed)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
y_pred_score = model.predict_proba(X_test)[:,1]
print('Random Forest Classifier:')
# classification_report expects the true labels first, then the predictions
print(classification_report(y_test, y_pred, labels=[0,1],
                            target_names=['Non-Fraud [0]', 'Fraud [1]']), '\n')

fig, ax = plt.subplots(1, 2, figsize=(20,5))

ax[0].set_title('Confusion Matrix of Random Forest Model:')
ConfusionMatrixDisplay.from_predictions(y_test, y_pred, colorbar=False, values_format='',
cmap='crest', ax=ax[0])
ax[0].grid(False)

fpr, tpr, thresholds = roc_curve(y_test, y_pred_score)

roc_auc = auc(fpr, tpr)
ax[1].set_title('ROC Curve - Random Forest Classifier')
ax[1].plot(fpr, tpr, label = 'AUC = %0.3f' % roc_auc, c='steelblue')
ax[1].plot([0,1],[0,1],'--', c='lightsteelblue')
ax[1].legend(loc='lower right')
ax[1].set_ylabel('True Positive Rate')
ax[1].set_xlabel('False Positive Rate')
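# Calculate Mean Reciprocal Rank (MRR)
def reciprocal_rank(y_true, y_score):
    """Return 1 / rank of the first true fraud when items are sorted by descending score."""
    y_true = np.atleast_1d(np.asarray(y_true))
    y_score = np.atleast_1d(np.asarray(y_score))
    order = np.argsort(y_score)[::-1]
    ranks = np.where(y_true[order] == 1)[0]
    if len(ranks) > 0:
        return 1.0 / (ranks[0] + 1)
    else:
        return 0.0

# The test set is treated as a single ranked list. Scoring each transaction on its
# own would give 1 for every fraud and 0 otherwise, so the mean would only
# reproduce the fraud rate of the test set.
rr_list = [reciprocal_rank(y_test, y_pred_score)]

# Calculate Mean Reciprocal Rank (MRR)
mrr = np.mean(rr_list)
print("Mean Reciprocal Rank (MRR):", mrr)

# Calculate Normalized Discounted Cumulative Gain (NDCG)
from sklearn.metrics import ndcg_score
# ndcg_score expects 2D arrays of shape (n_lists, n_items); y_test is a pandas Series
ndcg = ndcg_score(np.asarray(y_test).reshape(1, -1), y_pred_score.reshape(1, -1))
print("Normalized Discounted Cumulative Gain (NDCG):", ndcg)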

import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

def intra_list_diversity(recommendation_lists):
    intra_diversities = []
    for recommendation_list in recommendation_lists:
        list_length = len(recommendation_list)
        if list_length <= 1:
            intra_diversities.append(0)  # A list with at most one element has diversity 0
        else:
            # Reshape to a single row; cosine_similarity then returns a 1x1 matrix
            # holding the similarity of that row with itself
            list_array = np.array(recommendation_list).reshape(1, -1)
            similarity_matrix = cosine_similarity(list_array)
            intra_diversity = 1 - np.mean(similarity_matrix)
            intra_diversities.append(intra_diversity)
    return intra_diversities

def inter_list_diversity(recommendation_lists):
    # Note: despite its name, this returns the mean similarity between pairs of lists
    inter_diversity = []
    for i in range(len(recommendation_lists)):
        for j in range(i + 1, len(recommendation_lists)):
            # Reshape each list to a 2D array with one column (one element per row)
            list_i = np.array(recommendation_lists[i]).reshape(-1, 1)
            list_j = np.array(recommendation_lists[j]).reshape(-1, 1)
            similarity_matrix = cosine_similarity(list_i, list_j)
            # Use the similarity between the first element of each list
            avg_similarity = similarity_matrix[0][0]
            inter_diversity.append(avg_similarity)
    return np.mean(inter_diversity)

# Sample data
data = {
    'step': [1, 1, 1, 1, 1],
    'type': ['PAYMENT', 'PAYMENT', 'TRANSFER', 'CASH_OUT', 'PAYMENT'],
    'amount': [9839.64, 1864.28, 181.0, 181.0, 11668.14],
    'nameOrig': ['C1231006815', 'C1666544295', 'C1305486145', 'C840083671',
                 'C2048537720'],
    'oldbalanceOrg': [170136.0, 21249.0, 181.0, 181.0, 41554.0],
    'newbalanceOrig': [160296.36, 19384.72, 0.0, 0.0, 29885.86],
    'nameDest': ['M1979787155', 'M2044282225', 'C553264065', 'C38997010',
                 'M1230701703'],
    'oldbalanceDest': [0.0, 0.0, 0.0, 21182.0, 0.0],
    'newbalanceDest': [0.0, 0.0, 0.0, 0.0, 0.0],
    'isFraud': [0, 0, 1, 1, 0],
    'isFlaggedFraud': [0, 0, 0, 0, 0]
}

# Create DataFrame
df = pd.DataFrame(data)

# Group transactions by type
recommendation_lists = df.groupby('type')['isFraud'].apply(list).tolist()

# Calculate Intra-List Diversity
intra_diversities = intra_list_diversity(recommendation_lists)
print("Intra-List Diversities:", intra_diversities)

# Calculate Inter-List Diversity
inter_diversities = inter_list_diversity(recommendation_lists)
print("Inter-List Diversities:", inter_diversities)

# Calculate Inter-List Diversity, reusing the sample data, recommendation_lists
# and the inter_list_diversity function defined above
inter_list_div = inter_list_diversity(recommendation_lists)
print("Inter-List Diversity:", inter_list_div)

# Calculate Average Popularity
def average_popularity(y_pred, popularity_scores):
    # Treat a 1-D prediction vector as a single recommendation list
    y_pred = np.atleast_2d(y_pred)
    total_popularity = 0
    num_recommendations = 0
    for pred_labels in y_pred:
        indices = np.where(pred_labels == 1)[0]  # Indices of recommended (flagged) items
        if len(indices) > 0:
            total_popularity += np.sum(popularity_scores[indices])
            num_recommendations += len(indices)
    if num_recommendations > 0:
        return total_popularity / num_recommendations
    else:
        return 0

# Calculate Novelty Score
def novelty_score(y_pred, popularity_scores):
    avg_popularity = average_popularity(y_pred, popularity_scores)
    if avg_popularity > 0:
        return 1 / avg_popularity
    else:
        return float('inf')  # Return infinity for cases where no recommendations are made

# popularity_scores must hold one score per item, aligned with y_pred.
# Here the transaction amount of each test row is used as its popularity score.
popularity_scores = X_test['amount'].to_numpy()

# Calculate novelty metrics
average_pop = average_popularity(y_pred, popularity_scores)
novelty_scr = novelty_score(y_pred, popularity_scores)

print("Average Popularity:", average_pop)
print("Novelty Score:", novelty_scr)

OUTPUT:

[Output screenshots, in order: Data Description; Null Data Handling; Data
Validation; isFraud.unique(); Data Reshaping; Transpose; Data Merging; Data
Aggregation; Data Analysis; Histogram; Bar Chart; Scatter Plot; Box Plot;
Multivariate Analysis; Interactive Scatterplots; Interactive Dashboards;
Count Plot; Module Training]

Future Enhancements:

Future enhancements of this fraud detection project could include the
integration of more sophisticated machine learning techniques, such as deep
learning models, to improve accuracy and adaptability. Additionally,
incorporating real-time data streams and enhancing the system's ability to learn
from new fraud patterns continuously would further strengthen its effectiveness.
Expanding the dataset to include a more diverse range of transaction types and
geographical locations can also improve the model's generalizability.
Furthermore, implementing advanced user behavior analytics and anomaly
detection mechanisms could provide deeper insights into fraudulent activities.
Lastly, ensuring the system's scalability and compliance with evolving
regulatory standards will be crucial for its long-term viability and success.

Conclusion:

In conclusion, our project on fraud detection in online transactions using AI and
machine learning has demonstrated significant potential in enhancing financial
security. By leveraging advanced algorithms and real-time data analysis, we
have developed a robust model capable of accurately identifying fraudulent
activities, thereby minimizing financial losses and protecting users. The
successful implementation of this project underscores the critical role of AI in
combating online fraud, and it paves the way for further innovations and
improvements in the field of cybersecurity.
