
PROJECT TITLE

Submitted in partial fulfillment of the requirements for the degree of

BACHELOR OF TECHNOLOGY
in
Computer Science and Engineering

by

NAME OF THE STUDENT


ROLL NUMBER

Under the guidance of


GUIDE NAME
DESIGNATION

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
TKR COLLEGE OF ENGINEERING AND TECHNOLOGY (AUTONOMOUS)
(Accredited by NBA and NAAC with 'A+' Grade)
Medbowli, Meerpet, Balapur (M), Hyderabad-500097
DECLARATION BY THE CANDIDATE

I, Mr./Ms. NAME OF THE STUDENT, bearing Hall Ticket Number ROLL NUMBER, hereby declare that the main project report titled PROJECT TITLE, carried out under the guidance of Dr./Mr./Ms. GUIDE NAME, Designation, Department of Computer Science and Engineering, is submitted in partial fulfillment of the requirements for the award of the degree of Bachelor of Technology in Computer Science and Engineering.

Signature of the Candidate


Roll Number

Place: Meerpet

Date: DD/MM/YYYY

Note: Take printout of this page on college letter head


CERTIFICATE

This is to certify that the main project report entitled PROJECT TITLE, being submitted by Mr./Ms. Student Name, bearing Roll No. XXK9XA05XX, in partial fulfillment of the requirements for the award of the degree of Bachelor of Technology in Computer Science and Engineering, to the TKR College of Engineering and Technology, is a record of bona fide work carried out by him/her under my guidance and supervision.

Name and Signature of the Guide HoD


(Dr. A. Suresh Rao)

Place: Meerpet
Date: DD/MM/YYYY

Note: Take printout of this page on college letter head


TABLE OF CONTENTS

ABSTRACT
ACKNOWLEDGEMENT
LIST OF FIGURES
LIST OF TABLES
LIST OF SYMBOLS AND ABBREVIATIONS

1 INTRODUCTION
Motivation
Problem definition
Limitations of existing system
Proposed system

2 LITERATURE REVIEW
Review of Literature

3 REQUIREMENTS ANALYSIS
Functional Requirements
Non-Functional Requirements

4 DESIGN
DFDs and UML diagrams
Algorithm
Sample Data

5 CODING
Pseudo Code
6 IMPLEMENTATION and RESULTS
Explanation of Key functions
Method of Implementation
Forms
Output Screens
Result Analysis
7 TESTING and VALIDATION
Design of Test Cases and Scenarios
Validation
Use of Abbreviations

8 CONCLUSION
REFERENCES
ABSTRACT

Fraudsters are now more active in their attacks on credit card transactions than ever before.
With advances in data science and machine learning, various algorithms have been developed
to determine whether a transaction is fraudulent. This is where a machine learning model comes
in handy, allowing banks and major financial institutions to predict whether the customer they
are lending to will default or not.
We study the performance of two different machine learning models, Random Forest and
KNN (k-nearest neighbors), to classify, predict, and detect fraudulent credit card transactions.
We compare these models' performance and show that KNN (k-nearest neighbors) produces the
maximum accuracy in predicting and detecting fraudulent credit card transactions. Thus, we
recommend KNN as the most appropriate machine learning algorithm for predicting and
detecting fraud in credit card transactions. Credit card holders above 60 years of age were found
to be the most frequent victims of these fraudulent transactions.

Keywords: Random forest; KNN (k-nearest neighbors); Credit card; Fraud detection and prediction
ACKNOWLEDGEMENT

The satisfaction and euphoria that accompanies the successful completion of any task
would be incomplete without the mention of the people who made it possible and whose
encouragement and guidance have crowned my efforts with success.

I am indebted to the Internal Guide, Dr./Mr./Ms. Name of Guide, Designation,


Dept. of Computer Science and Engineering, TKR College of Engineering and Technology,
for his/her support and guidance throughout my Thesis/Dissertation.

I am also indebted to the Head of the Department, Dr. A. Suresh Rao, Professor,
Computer Science and Engineering, TKR College of Engineering and Technology, for his
support and guidance throughout my Thesis/Dissertation.

I extend my deep sense of gratitude to the Principal, Dr. D. V. Ravi Shankar, TKR
College of Engineering and Technology, for permitting me to undertake this Thesis/Dissertation.

Finally, I express my thanks to one and all who have helped me in successfully
completing this Thesis/Dissertation. Furthermore, I would like to thank my family and
friends for their moral support and encouragement.

NAME OF THE STUDENT


Roll Number

Place: Meerpet

Date: DD/MM/YYYY

LIST OF FIGURES

Figure number Figure name Page number

LIST OF TABLES

Table number Table name Page number

Chapter 1

INTRODUCTION

Banks used to provide only in-person services to customers until 1996, when the first internet
banking applications were introduced in the United States of America by Citibank and Wells Fargo
Bank. After the introduction of internet banking, the use of credit cards over the internet was
adopted. This has increased rapidly during the past decade, and services like e-commerce, online
payment systems, working from home, online banking, and social networking have also been
introduced and widely used. Due to this, fraudsters have intensified their efforts to target online
transactions utilizing various payment systems.
In recent times, improvements in digital technologies, particularly for cash transactions, have
changed the way people manage money in their daily activities. Many payment systems have
transitioned tremendously from physical pay points to digital platforms. To sustain productivity
and competitive advantage, the use of technology in digital transactions has been a game-changer,
and many economies have resorted to it. Hence, internet banking and other online
transactions have been a convenient avenue for customers to carry out their financial and other
banking transactions from the comfort of their homes or offices, particularly through the use of
credit cards.
A credit card is designed as a piece of plastic with personal information incorporated, issued
by financial service providers to enable customers to purchase goods and services at their
convenience worldwide. The unlawful use of another person's credit card to get money or
property, either physically or digitally, is known as credit card fraud. Events involving credit
card fraud often end in enormous financial losses. It is simpler to commit fraud now than it was
in the past because an online transaction environment does not require the actual card: the
card's information suffices to complete a payment. Researchers postulate that monetary policy,
as well as the business plans and methods used by big and small businesses alike, have been
impacted by the introduction of credit cards.

The Bank of Ghana (BoG) reported an estimated loss of GH¢ 1.26 million ($250,000) in
2019 due to credit card fraud, which increased to GH¢ 8.20 million ($1.46 million) in 2020
(BoG, 2021). This represented an estimated 548.0% increase in losses in year-to-year terms. All
payment channels have experienced persistent increases in fraud in recent years, with digital
transactions seeing the largest rise. One such instance is payment fraud, which includes checks,
deposits, person-to-person (P2P) transfers, wire transfers, automated clearing house transfers,
internet payments, automated bill payments, debit and credit card transactions, and Automated
Teller Machine (ATM) transactions.
Following similar patterns, compliance and risk management services employed to identify
online fraud have shown a lot of interest in AI and machine learning models. Some of these
models include Random Forest and k-nearest neighbors. This has become necessary because
credit card fraud detection is a classification and prediction problem, and supervised machine
learning models have proved to be the best models for detecting fraud using the above-mentioned
algorithms. This study therefore seeks to compare two classification and prediction techniques,
namely Random Forest and k-nearest neighbors, in classifying and predicting financial
transactions as either fraudulent or not fraudulent.
Credit acceptance remains a challenge for moneylenders, as it is difficult to forecast
whether consumers pose an acceptable credit risk and should be granted credit. This is
especially true in emerging nations, where established rules and models from industrialized
nations may not apply. Therefore, productive methods for automatic credit approval that can aid
bankers in analysing consumer credit must be investigated.
Each bank receives tens of thousands of credit card applications each month. Banks have to
manually skim through each of these applications while paying close attention to these factors
to determine whether the applicant is to be granted a credit card or not. Due to the time-intensive
nature of this activity and the growing likelihood of error as the number of applications
increases, banks are seeking prediction-based algorithms that can do this task effectively and
accurately.

PROPOSED SYSTEM:

1) Dataset: The dataset has been taken from Kaggle's Credit Card Approval Prediction
page. We merged two datasets containing the application and credit records of the
applicants on the primary key 'ID'. After the merge, the columns contain a variety of
information about each applicant, through which the lending corporation can easily
decide whether to lend to a particular candidate.
2) Pre-processing: The dataset had column names in camel case, which we converted
into a more readable format.
3) In addition, integrating a K-Nearest Neighbors (KNN) algorithm into the proposed
system can offer several advantages; it is essential to consider factors such as feature
selection, distance metric choice, handling of missing values, and model evaluation
techniques to ensure optimal performance. A sketch of the dataset preparation step
follows this list.
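
A minimal sketch of steps 1 and 2 using pandas is shown below; the file names
application_record.csv and credit_record.csv are assumptions based on the Kaggle page:

import pandas as pd

# Assumed file names for the two Kaggle files (application + credit records)
apps = pd.read_csv("application_record.csv")
credit = pd.read_csv("credit_record.csv")

# Merge the application and credit records on the primary key 'ID'
merged = apps.merge(credit, on="ID", how="inner")

# Convert the original column names into a more readable format,
# e.g. 'AMT_INCOME_TOTAL' -> 'amt_income_total'
merged.columns = [c.strip().lower() for c in merged.columns]

print(merged.shape)
print(merged.columns.tolist())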

EXISTING SYSTEM:

The existing work compares the prediction accuracy of logistic regression, random forest,
and decision tree classifiers in the credit card approval process, with balanced accuracy as
the performance criterion. The dataset contains two types of features, numerical and
categorical; some of them include debt, age, income, and education. Based on the model
implementation, random forest showcased the best prediction performance among the
models, with a balanced accuracy of around 98.9%. However, the performance of each
model fluctuates slightly depending on the data processing, the parameter tuning process,
and the data features. One limitation of this work is that further comprehensive factors,
such as computational efficiency, reject inference, and outlier handling, are not included
in assessing the prediction performance.
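
For reference, the balanced accuracy criterion mentioned above can be computed with
scikit-learn, as in this small sketch (synthetic data, not the study's dataset):

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

# Synthetic, imbalanced stand-in for a credit approval dataset
X, y = make_classification(n_samples=5000, n_features=10,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Balanced accuracy averages recall over both classes, so it is not
# inflated by the majority class the way plain accuracy can be
print(balanced_accuracy_score(y_test, model.predict(X_test)))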

Chapter 2

LITERATURE REVIEW

Literature Review Student Reference Manual.

2.1 A supervised machine learning algorithm for detecting and


predicting fraud in credit card transactions
Logistic regression is a technique used to predict an outcome variable that is binary. This
technique does not demand that the explanatory variables follow a normal distribution or be
uncorrelated. The outcome variable in logistic regression models is qualitative, while the
explanatory variables may be numerical or categorical. Numerous scholars have used logistic
regression to detect financial bankruptcies.
A decision tree is a non-linear classification technique that divides a sample into increasingly
smaller subgroups using a collection of explanatory variables. At each branch of the tree, the
process iteratively chooses the explanatory variable that, in accordance with a predetermined
criterion, has the strongest correlation with the outcome variable. It is nonparametric, so there
is no requirement to choose unimodal training data, and it is simple to add a variety of
quantitative or qualitative data structures. However, when applied to the entire data set, decision
trees have a tendency to overfit the training data, which can produce bad results. Decision
trees can be used to filter spam emails and also, in the area of medicine, to predict the kinds of
persons who will be vulnerable to a certain virus.
Random forests add a level of randomness to bagging. In addition to employing different
bootstrap samples of the data for each tree's construction, random forests alter how the
classification or regression trees are built. In conventional trees, each node is split using the
optimal split among all variables; in a random forest, each node is split using the best predictor
among a subset that was randomly selected at that node. The output for any observation is then
the aggregate of all trees' predictions. The randomForest package in R can be used to create
bagging and random forest models, and each feature's significance in relation to the training
data set can be measured. However, for data including qualitative variables with differing
numbers of levels, random forests are biased towards attributes with more levels. Random
forests can be applied to complex biological data analysis in bioinformatics, segmentation of
video, and classification of images for pixel analysis.
The categories of credit card fraud recognized in the literature are bankruptcy fraud,
counterfeit fraud, application fraud, and behavioural fraud. Depending on the sort of fraud that
banks or credit card companies are dealing with, different precautions can be created and
implemented. For identifying fraudulent transactions in other jurisdictions, machine learning
methods like logistic regression, naive Bayes, random forest, k-nearest neighbours, gradient
boosting, support vector machines, and neural network algorithms have been used. To choose
the top features for the model, one study used a feature importance approach and reported an
accuracy of 95.9%, with gradient boosting performing better than the other algorithms.
A machine learning-based technique for identifying credit card fraud has been developed
around hybrid models with AdaBoost and majority voting strategies. The authors added noise
of around 10% and 30% to their hybrid models to evaluate the approach; a good score of 0.942
was achieved by the majority voting approach on the sample data with 30% added noise. As a
result, they settled on the voting system as the most effective technique in the presence of
noise. Another study proposed two different types of random forests, which were used to learn
the behavioural characteristics of typical and abnormal transactions, and examined how well
these two random forests perform, in terms of their classifiers, in identifying credit card fraud.
Data from a Chinese e-commerce company was used to analyse the performance of the two
random forest models. According to the findings, even though the suggested random forests
perform well on small datasets, problems such as imbalanced data prevent them from being as
effective on other datasets. Further research on practical methods for detecting credit card
fraud, which affects financial institutions, employed different machine learning algorithms to
determine the best algorithm for predicting fraudulent transactions. Two resampling methods
(under-sampling and over-sampling) were used to train the algorithms. Of the many algorithms
trained, the best models for predicting credit card fraud were found to be random forest,
XGBoost, and decision tree, with AUC values of 1.00, 0.99, and 0.99, respectively.
Machine learning algorithms can help detect fraudulent transactions, classify them, and,
if required, stop the transaction process. Credit card fraud detection consists of modelling
past credit card transactions, including records of transactions known to be fraudulent,
after which the model is used on new transactions to detect whether each one is genuine
or fraudulent.

Chapter 3

REQUIREMENTS ANALYSIS

The project involved analyzing the design of a few applications so as to make the
application more user-friendly. To do so, it was important to keep the navigation
from one screen to another well ordered and, at the same time, to reduce the
amount of typing the user needs to do. To make the application more accessible,
the browser version had to be chosen so that it is compatible with most browsers.

REQUIREMENT SPECIFICATION

Functional Requirements

Data Collection and Preprocessing:


– Data Sources: Specify the sources of transaction data, including historical and
real-time data.
– Data Cleaning: Define processes for handling missing data, outliers, and
inconsistencies.
– Feature Engineering: Specify the features relevant to fraud detection, such as
transaction amount, location, time, etc.
Model Training:
– Algorithm Selection: Identify and justify the choice of a specific supervised
machine learning algorithm (e.g., logistic regression, decision trees, random
forests, etc.).
– Training Set: Define the criteria for selecting and splitting the dataset into training
and validation sets.
– Model Evaluation: Establish metrics (precision, recall, F1 score, ROC-AUC, etc.)
for assessing model performance, as in the sketch after this list.

 Graphical User Interface: provide a graphical interface for interaction with the user.
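
A minimal sketch of computing these evaluation metrics with scikit-learn follows; the
arrays are illustrative stand-ins for a fitted model's outputs:

from sklearn.metrics import (precision_score, recall_score, f1_score,
                             roc_auc_score)

# Illustrative arrays: true labels, hard predictions, and fraud probabilities
y_test = [0, 0, 1, 1, 0, 1]
y_pred = [0, 1, 1, 0, 0, 1]
y_score = [0.1, 0.6, 0.9, 0.4, 0.2, 0.8]

print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1 score :", f1_score(y_test, y_pred))
print("ROC-AUC  :", roc_auc_score(y_test, y_score))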

Non-Functional Requirements:

Security:
Data Encryption: All sensitive data transmitted between the user, merchant, and financial
institutions should be encrypted using secure protocols to prevent interception and tampering.
Access Control: Implement stringent access controls to ensure that only authorized personnel
can access sensitive systems and data related to credit card transactions.
Authentication: Employ multi-factor authentication mechanisms to verify the identity of users
and mitigate unauthorized access.

Reliability:
System Availability: Ensure high availability of the credit card transaction system to minimize
downtime and ensure that legitimate transactions can be processed without interruption.
Fault Tolerance: Implement redundancy and failover mechanisms to ensure that the system can
continue to operate in the event of hardware or software failures.
Data Integrity: Guarantee the integrity of transaction data throughout the entire process to
prevent unauthorized modifications or tampering.

Performance:
Response Time: Ensure that transaction processing times meet acceptable thresholds to provide a
seamless user experience and minimize delays for both merchants and customers.
Throughput: Optimize system performance to support a high volume of concurrent transactions
without degradation in speed or reliability.

Scalability:
Growth Planning: Ensure that the system architecture is designed to accommodate future growth
in transaction volume and user base without compromising performance or security.

Software Requirements
For developing the application the following are the Software Requirements:
1. Python
Operating Systems supported
1. Windows 7
2. Windows XP
3. Windows 8

Technologies and Languages used to Develop


1. Python
2. Flask

Debugger and Emulator


 Any Browser (Particularly Chrome)

Hardware Requirements
For developing the application the following are the Hardware Requirements:
 Processor: Pentium IV or higher
 RAM: 256 MB
 Space on Hard Disk: minimum 512MB

Chapter 4

DESIGN

DFDs and UML diagrams


Algorithm

Random forest algorithm

Random Forest is a supervised machine learning algorithm that uses a group of decision
tree models for classification and prediction. Each decision tree is a weak learner because
it has low predictive power on its own. Random Forest is based on ensemble learning,
which uses many decision tree classifiers to classify a problem and improve the accuracy
of the model. As a result, the random forest employs a bagging method to generate a
forest of decision trees. Given a dataset (X, Y) with N total observations, where X holds
the predictor variables and Y the outcome variable, the random forest algorithm first
creates K_i random vectors (i = 1, 2, ..., N) and then converts each random vector K_i
into a decision tree, obtaining the decision trees d_K1(X), d_K2(X), ..., d_KN(X), whose
aggregated output is the forest's prediction.
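
The sketch below illustrates this construction with scikit-learn on synthetic data;
n_estimators sets the number of bootstrapped trees and max_features the size of the
random predictor subset tried at each node (illustrative values):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, Y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Each of the 100 trees is grown on a bootstrap sample of (X, Y); at every
# node only a random subset of sqrt(p) predictors is considered for the split
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                bootstrap=True, random_state=0)
forest.fit(X, Y)

# The forest's output aggregates the predictions of the individual trees
print(forest.predict(X[:5]))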

Accuracy, as a measurement metric, measures the ratio of the total number of correct
predictions of fraud to the total number of predictions (both fraud and not fraud)
made by the model [43]. It is calculated as

Accuracy = (TN + TP) / (TN + TP + FN + FP)
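
The same quantity can be computed from a confusion matrix, as in this small sketch
with hypothetical label arrays:

from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 0, 1, 0, 0]   # hypothetical true labels (1 = fraud)
y_pred = [0, 0, 1, 0, 0, 1, 1, 0]   # hypothetical model predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tn + tp) / (tn + tp + fn + fp)
print(accuracy)  # 0.75 for these sample labels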

KNN (k-nearest neighbors) algorithm:

The k-nearest neighbors algorithm (k-NN) is a non-parametric method used
for classification and regression [1]. In both cases, the input consists of
the k closest training examples in the feature space. The output depends on
whether k-NN is used for classification or regression.
 In k-NN classification, the output is a class membership. An object is classified
by a plurality vote of its neighbors, with the object being assigned to the class most
common among its k nearest neighbors (k is a positive integer, typically small).
If k = 1, then the object is simply assigned to the class of that single nearest
neighbor.
 In k-NN regression, the output is the property value for the object. This value is
the average of the values of its k nearest neighbors.

k-NN is a type of instance-based learning, or lazy learning, where the function is
only approximated locally and all computation is deferred until classification.
The k-NN algorithm is among the simplest of all machine learning algorithms.
For both classification and regression, a useful technique is to assign weights to
the contributions of the neighbors, so that the nearer neighbors contribute more
to the average than the more distant ones. For example, a common weighting
scheme consists of giving each neighbor a weight of 1/d, where d is the distance to
the neighbor. The neighbors are taken from a set of objects for which the class
(for k-NN classification) or the object property value (for k-NN regression) is
known. This can be thought of as the training set for the algorithm, though no
explicit training step is required.
A peculiarity of the k-NN algorithm is that it is sensitive to the local structure of
the data.
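
A minimal sketch of distance-weighted k-NN classification with scikit-learn, where
weights='distance' implements the 1/d scheme described above (synthetic data):

from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# weights='distance' gives each of the k neighbors a weight of 1/d,
# so nearer neighbors contribute more to the vote than distant ones
knn = KNeighborsClassifier(n_neighbors=5, weights="distance",
                           metric="minkowski", p=2)  # p=2 -> Euclidean
knn.fit(X, y)  # "lazy" learning: fit just stores the training set
print(knn.predict(X[:5]))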


ARCHITECTURE

UML DIAGRAMS

UML stands for Unified Modeling Language. UML is a standardized
general-purpose modeling language in the field of object-oriented software
engineering. The standard is managed, and was created by, the Object
Management Group.

The goal is for UML to become a common language for creating models of
object-oriented computer software. In its current form, UML comprises two
major components: a meta-model and a notation. In the future, some form of
method or process may also be added to, or associated with, UML.

The Unified Modeling Language is a standard language for specifying,
visualizing, constructing, and documenting the artifacts of a software system, as
well as for business modeling and other non-software systems.

The UML represents a collection of best engineering practices that have


proven successful in the modeling of large and complex systems.

The UML is a very important part of developing object-oriented software
and the software development process. The UML uses mostly graphical notations
to express the design of software projects.

GOALS:

The Primary goals in the design of the UML are as follows:


1. Provide users a ready-to-use, expressive visual modeling Language so that
they can develop and exchange meaningful models.
2. Provide extendibility and specialization mechanisms to extend the core
concepts.
3. Be independent of particular programming languages and development
processes.
4. Provide a formal basis for understanding the modeling language.
5. Encourage the growth of the OO tools market.
6. Support higher level development concepts such as collaborations,
frameworks, patterns and components.
7. Integrate best practices.

CLASS DIAGRAM:
In software engineering, a class diagram in the Unified Modeling Language
(UML) is a type of static structure diagram that describes the structure of a system
by showing the system's classes, their attributes, operations (or methods), and the
relationships among the classes. It explains which class contains information.

USE CASE DIAGRAM:


A use case diagram in the Unified Modeling Language (UML) is a type of
behavioral diagram defined by and created from a Use-case analysis. Its purpose is
to present a graphical overview of the functionality provided by a system in terms
of actors, their goals (represented as use cases), and any dependencies between
those use cases. The main purpose of a use case diagram is to show what system
functions are performed for which actor. Roles of the actors in the system can be
depicted.

SEQUENCE DIAGRAM:
A sequence diagram in Unified Modeling Language (UML) is a kind of interaction
diagram that shows how processes operate with one another and in what order. It is
a construct of a Message Sequence Chart. Sequence diagrams are sometimes called
event diagrams, event scenarios, and timing diagrams.

ACTIVITY DIAGRAM:
Activity diagrams are graphical representations of workflows of stepwise activities
and actions with support for choice, iteration and concurrency. In the Unified
Modeling Language, activity diagrams can be used to describe the business and
operational step-by-step workflows of components in a system. An activity
diagram shows the overall flow of control.

Table 4.1
Basic statistics for the character variables.

Name | Count | Unique | Top | Frequency
Transaction date and time | 555719 | 544760 | 2020-12-19 16:02:22 | 4
Merchant | 555719 | 693 | fraud_Kilback LLC | 1859
Category | 555719 | 14 | gas_transport | 56370
First | 555719 | 341 | Christopher | 11443
Last | 555719 | 471 | Smith | 12146
Gender | 555719 | 2 | F | 304886
Street | 555719 | 924 | 444 Robert Mews | 1474
City | 555719 | 849 | Birmingham | 2423
State | 555719 | 50 | TX | 40393
Job | 555719 | 478 | Film/video editor | 4119
Date of birth | 555719 | 910 | 1977-03-23 | 2408
Transaction number | 555719 | 555719 | 2da90c7d74bd46a | 1

Table 4.2
Basic statistics for the numeric variables.

Name | Count | Mean | Std | Min | 25% | 50% | 75% | Max
Unique identifier | 555719 | 277859 | 160422.4 | 0 | 138929.5 | 277859 | 416788.5 | 555718
Credit card number of customers | 555719 | 4178387 | 1309837 | 6041621 | 1800429 | 3521417 | 4635331 | 4992346
Amount | 555719 | 69.39 | 156.75 | 1 | 9.63 | 47.29 | 83.01 | 22768.11
Zip | 555719 | 48842.63 | 26855.28 | 1257 | 26292 | 48174 | 72011 | 99921
Latitude | 555719 | 38.54 | 5.061 | 20.03 | 34.67 | 39.37 | 41.89 | 65.69
Longitude | 555719 | −90.23 | 13.72 | −165.67 | −96.8 | −87.48 | −80.18 | −67.95
City population | 555719 | 88221.89 | 300390.9 | 23 | 741 | 2408 | 19685 | 2906700
Time (s) | 555719 | 1380679 | 5201104 | 1371817 | 1376029 | 1380762 | 1385867 | 1388534
Merchant latitude | 555719 | 38.54 | 5.1 | 19.03 | 34.76 | 39.38 | 41.95 | 66.68
Merchant longitude | 555719 | −90.23 | 13.73 | −166.67 | −96.91 | −87.45 | −80.27 | −66.95
Fraud status | 555719 | 0.0039 | 0.062 | 0 | 0 | 0 | 0 | 1

Chapter 5

CODING

from flask import Flask, redirect, url_for, render_template, request, session
import os
import sqlite3
import tkinter
from tkinter import *
from tkinter import filedialog
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

UPLOAD_FOLDER = os.path.join('static', 'images')

app = Flask(__name__)
app.config['UPLOAD_FOLDER'] = UPLOAD_FOLDER
app.secret_key = 'This is your secret key to utilize session in Flask'
@app.route("/")
def main():
return render_template("main.html")
def logout():
return redirect(url_for("main"))
@app.route("/passhome")
def passhome():

main = tkinter.Tk()
main.title("Machine Learning Algorithm for detecting and Predicting Fraud in
Credit Card Transactions") #designing main screen
main.geometry("1300x1200")

18
global filename
global cls
global X, Y, X_train, X_test, y_train, y_test
global random_acc # all global variables names define in above lines
global clean
global attack
global total

    def traintest(train):
        # Split the dataframe into features (columns 0-28) and the Class
        # label (column 30), then into 70% train / 30% test partitions
        X = train.values[:, 0:29]
        Y = train.values[:, 30]
        print(X)
        print(Y)
        X_train, X_test, y_train, y_test = train_test_split(
            X, Y, test_size=0.3, random_state=0)
        return X, Y, X_train, X_test, y_train, y_test

    def generateModel():
        # Read the uploaded dataset and generate the train/test split
        global X, Y, X_train, X_test, y_train, y_test
        train = pd.read_csv(filename)
        X, Y, X_train, X_test, y_train, y_test = traintest(train)
        text.insert(END, "Train & Test Model Generated\n\n")
        text.insert(END, "Total Dataset Size : " + str(len(train)) + "\n")
        text.insert(END, "Split Training Size : " + str(len(X_train)) + "\n")
        text.insert(END, "Split Test Size : " + str(len(X_test)) + "\n")

    def upload():
        # Ask the user for the training dataset file
        global filename
        filename = filedialog.askopenfilename(initialdir="dataset")
        text.delete('1.0', END)
        text.insert(END, filename + " loaded\n")

    def prediction(X_test, cls):
        # Run the fitted classifier and print the first 50 predictions
        y_pred = cls.predict(X_test)
        for i in range(50):
            print("X=%s, Predicted=%s" % (X_test[i], y_pred[i]))
        return y_pred

    def cal_accuracy(y_test, y_pred, details):
        # Compute accuracy as a percentage and show it in the text box
        accuracy = accuracy_score(y_test, y_pred) * 100
        text.insert(END, details + "\n\n")
        text.insert(END, "Accuracy : " + str(accuracy) + "\n\n")
        return accuracy

    def runRandomForest():
        # Fit a Random Forest on the training split and report its accuracy
        global random_acc, cls
        global X, Y, X_train, X_test, y_train, y_test
        cls = RandomForestClassifier(n_estimators=50, max_depth=2,
                                     random_state=0,
                                     class_weight='balanced')
        cls.fit(X_train, y_train)
        text.insert(END, "Prediction Results\n\n")
        prediction_data = prediction(X_test, cls)
        random_acc = cal_accuracy(y_test, prediction_data,
                                  'Random Forest Accuracy')

    def runKNeighborsClassifier():
        # Fit a distance-weighted KNN on the training split and report
        # its accuracy
        global random_acc, cls
        global X, Y, X_train, X_test, y_train, y_test
        cls = KNeighborsClassifier(n_neighbors=50, weights='distance',
                                   algorithm='auto', p=2,
                                   metric='minkowski')
        cls.fit(X_train, y_train)
        text.insert(END, "Prediction Results\n\n")
        prediction_data = prediction(X_test, cls)
        random_acc = cal_accuracy(y_test, prediction_data, 'KNN Accuracy')

    def predicts():
        # Classify an uploaded test file and report each row's prediction
        global clean, attack, total
        clean = 0
        attack = 0
        text.delete('1.0', END)
        filename = filedialog.askopenfilename(initialdir="dataset")
        test = pd.read_csv(filename)
        test = test.values[:, 0:29]
        total = len(test)
        text.insert(END, filename + " test file loaded\n")
        y_pred = cls.predict(test)
        for i in range(len(test)):
            if str(y_pred[i]) == '1.0':
                attack = attack + 1
                text.insert(END, "X=%s, Predicted = %s" % (
                    test[i], 'Contains Fraud Transaction Signature') + "\n\n")
            else:
                clean = clean + 1
                text.insert(END, "X=%s, Predicted = %s" % (
                    test[i], 'Transaction Contains Cleaned Signatures') + "\n\n")

    def graph():
        # Bar chart of total vs. normal vs. fraud counts from the last run
        height = [total, clean, attack]
        bars = ('Total Transactions', 'Normal Transaction', 'Fraud Transaction')
        y_pos = np.arange(len(bars))
        plt.bar(y_pos, height)
        plt.xticks(y_pos, bars)
        plt.show()

    font = ('times', 16, 'bold')
    title = Label(main, text='Machine Learning Algorithm for Detecting and '
                             'Predicting Fraud in Credit Card Transactions')
    title.config(bg='greenyellow', fg='dodger blue')
    title.config(font=font)
    title.config(height=3, width=120)
    title.place(x=0, y=5)

    font1 = ('times', 12, 'bold')
    text = Text(main, height=20, width=150)
    scroll = Scrollbar(text)
    text.configure(yscrollcommand=scroll.set)
    text.place(x=50, y=120)
    text.config(font=font1)

    font1 = ('times', 14, 'bold')
    uploadButton = Button(main, text="Upload Credit Card Dataset",
                          command=upload)
    uploadButton.place(x=50, y=550)
    uploadButton.config(font=font1)

    modelButton = Button(main, text="Generate Train & Test Model",
                         command=generateModel)
    modelButton.place(x=350, y=550)
    modelButton.config(font=font1)

    runrandomButton = Button(main, text="Run Random Forest Algorithm",
                             command=runRandomForest)
    runrandomButton.place(x=650, y=550)
    runrandomButton.config(font=font1)

    runknnButton = Button(main, text="Run KNN Algorithm",
                          command=runKNeighborsClassifier)
    runknnButton.place(x=950, y=550)
    runknnButton.config(font=font1)

    predictButton = Button(main, text="Detect Fraud From Test Data",
                           command=predicts)
    predictButton.place(x=50, y=600)
    predictButton.config(font=font1)

    graphButton = Button(main, text="Clean & Fraud Transaction Detection Graph",
                         command=graph)
    graphButton.place(x=350, y=600)
    graphButton.config(font=font1)

    exitButton = Button(main, text="Exit", command=exit)
    exitButton.place(x=770, y=600)
    exitButton.config(font=font1)

    main.config(bg='LightSkyBlue')
    main.mainloop()
    # Once the Tkinter window is closed, send the browser back to the
    # main page so the Flask view returns a valid response
    return redirect(url_for("main"))

@app.route("/about")
def about():
return render_template('about.html')
@app.route("/contact")
def contact():
return render_template('contact.html')
@app.route("/register",methods=["POST","GET"])
def register():

if request.method=="POST":
name=request.form["firstname"]
lname=request.form["lastname"]
uemail=request.form["email"]
Password=request.form["password"]
print(name,lname,uemail,Password)
import sqlite3
con=sqlite3.connect("test.db")
cur=con.cursor()
table="CREATE TABLE if not exists user (name varchar(255),lastname
varchar(255),email varchar(255),password varchar(255))"

22
cur.execute(table)
a=f"select email from user where email='{uemail}'"
cur.execute(a)
result=cur.fetchone()
if result!=None:
return "email alredy registered"
else:
#a="create table emp(name varchar(100),lastname varchar(100),email
varchar(100),password varchar(100))"
cur.execute("INSERT INTO user('name', 'lastname', 'email',
'password') VALUES (?,?,?,?)",(name,lname,uemail,Password))
con.commit()
con.close()
return "successfully registered"
return render_template("register.html")

@app.route("/login",methods=["POST","GET"])
def login():
if request.method=="POST":
uemail=request.form["email"]
upassword=request.form["password"]
print(uemail,upassword)
import sqlite3
con=sqlite3.connect("test.db")
cur=con.cursor()
a = "SELECT *FROM user WHERE email='"+uemail+"' AND
password='"+upassword+"'"
cur.execute(a)
result=cur.fetchone()
print("database",result)
if result==None:
return "enter valid details"
else:
return redirect(url_for("passhome"))
return render_template("login.html")

@app.route("/logout")
def logout():
return redirect(url_for("main"))

@app.route("/home",methods=["POST","GET"])
def home():
return render_template('home.html')

##########

if __name__ == '__main__':

23
#DEBUG is SET to TRUE. CHANGE FOR PROD
app.run(debug=True)
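
One caveat in the routes above: passwords are stored and compared in plain text, which
conflicts with the security requirements in Chapter 3. A possible improvement, sketched
below with Werkzeug's password helpers (already a Flask dependency), would be to store
generate_password_hash(password) in register() and verify it with check_password_hash()
in login():

from werkzeug.security import generate_password_hash, check_password_hash

# At registration time, store only a salted hash of the password
hashed = generate_password_hash("s3cret-password")

# At login time, verify the submitted password against the stored hash
print(check_password_hash(hashed, "s3cret-password"))  # True
print(check_password_hash(hashed, "wrong-password"))   # False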

Pseudo Code

1. Load dataset of credit card transactions

2. Preprocess data:
   - Remove irrelevant features (e.g., customer name, transaction ID)
   - Handle missing values (e.g., imputation, removal)
   - Normalize/standardize features (e.g., scaling)

3. Split the dataset into training and testing sets:
   - Training set (e.g., 70% of data)
   - Testing set (e.g., 30% of data)

4. Train a machine learning model:
   - Choose an appropriate algorithm (e.g., Random Forest, KNN)
   - Train the model using the training data

5. Evaluate the model:
   - Predict fraud labels for the testing set
   - Calculate performance metrics (accuracy)
   - Analyze the confusion matrix to understand true positives, false positives,
     true negatives, and false negatives

6. Tune model parameters (if necessary):
   - Adjust hyperparameters to improve performance (e.g., grid search, random
     search, cross-validation)

7. Deploy the model:
   - Integrate the model into a production environment
   - Monitor model performance over time
   - Retrain the model periodically with new data
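
The pseudocode above maps directly onto scikit-learn. Below is a compact sketch of
steps 1-5, assuming a CSV laid out like the Kaggle credit card fraud file (anonymized
features V1-V28 plus Amount and a Class label; the file name creditcard.csv is an
assumption):

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# 1. Load the dataset (assumed file name/layout: V1..V28, Amount, Class)
data = pd.read_csv("creditcard.csv")

# 2. Preprocess: drop rows with missing values, scale the features
data = data.dropna()
X = data.drop(columns=["Class"])
y = data["Class"]
X = StandardScaler().fit_transform(X)

# 3. Split into 70% training and 30% testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

# 4. Train a model (Random Forest here; KNN would be analogous)
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# 5. Evaluate: accuracy and confusion matrix on the test set
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))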

Graph:

MODULES

1. Load Dataset:

Load the data set using the pandas read_csv() method. Here we read the CSV
data and store it in a variable.

2. Split Data Set:

Split the data set into two parts: a training data set and a test data set.
Here we also remove missing values from the dataset.

3. Train Data Set:

The training data set is used to train the model via the fit() method. 70% of
the data from the dataset is used for training the algorithm.

4. Test Data Set:

The test data set is used to evaluate the algorithm. 30% of the data from the
dataset is used for testing the algorithm.

5. Predict Data Set:

The predict() method predicts the results. In this step we predict whether each
transaction in the test data is fraudulent.

Chapter 6

IMPLEMENTATION and RESULTS

Explanation of Key functions


Method of Implementation
Forms:

Index page:

Register Page:

Login Page:

Screen:

In the above screen, click on the 'Upload Credit Card Dataset' button to upload
the dataset.

After uploading the dataset, the screen below appears. Now click on 'Generate
Train & Test Model' to generate the training model for the Random Forest
classifier.

In the above screen, after generating the model, we can see the total records
available in the dataset and how many records the application uses for training
and how many for testing. Now click on the 'Run Random Forest Algorithm' button
to build the Random Forest model on the train and test data, and on the 'Run KNN
Algorithm' button to build the KNN model on the train and test data.

In the above screen we can see that Random Forest achieves 99.78% accuracy and
KNN 99.83% accuracy while building the model on the train and test data. Now
click on the 'Detect Fraud From Test Data' button to upload test data and predict
whether it contains normal or fraudulent transactions.

In the above screen, a test dataset is uploaded; after uploading, the prediction
details shown below appear.

In the above screen, next to each test record, the application displays whether
the transaction contains clean or fraudulent signatures. Now click on the
'Clean & Fraud Transaction Detection Graph' button to see the total test
transactions with clean and fraud signatures in graphical format, as in the
screen below.

Result Analysis:

In the above graph we can see the total test data and the number of normal and
fraud transactions detected. The x-axis represents the transaction type and the
y-axis the count of clean and fraud transactions.

Chapter 7

TESTING and VALIDATION

Testing is a procedure that uncovers errors in a program.

Software testing is a critical element of software quality assurance and
represents the ultimate review of specification, design, and coding. The
increasing visibility of software as a system element, and the costs associated
with software failure, are motivating factors for well-planned testing. Testing
is the process of executing a program with the intent of finding an error. The
design of tests for software and other engineered products can be as challenging
as the initial design of the product itself. It is the major quality measure
employed during software development. During testing, the program is executed
with a set of test cases, and the output of the program for those test cases is
evaluated to determine whether the program is performing as expected.
7.2 TESTING STRATEGIES
A strategy for software testing integrates the design of software test cases
into a well-planned series of steps that result in successful development of
the product. The strategy provides a road map that describes the steps to be
taken, when they are to be taken, and how much effort, time, and resources will
be required. The strategy combines test planning, test case design, test
execution, and test result collection and evaluation. It provides guidance for
the practitioner and a set of milestones for the manager. Because of time
pressures, progress must be measurable and problems must surface as early as
possible.
To ensure that the system is free of errors, the following levels of testing
techniques are applied at different phases of software development:
Unit Testing

Unit testing is performed on individual modules as they are completed and
become executable. It is limited to the designer's requirements and focuses
testing on the function or software module, concentrating on the internal
processing logic and data structures. It is simplified when a module is
designed with high cohesion:
• It reduces the number of test cases.
• It allows errors to be more easily anticipated and uncovered.
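
As a concrete illustration, a pytest-style unit test for this project's
train/test split logic might look like the following sketch (synthetic data;
the 70/30 split mirrors the code in Chapter 5):

import numpy as np
from sklearn.model_selection import train_test_split

def test_train_test_split_proportions():
    # Synthetic stand-in for the 29-feature transaction matrix and labels
    X = np.random.rand(1000, 29)
    y = np.random.randint(0, 2, size=1000)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)
    # The split should keep 70/30 proportions with aligned labels
    assert len(X_train) == 700
    assert len(X_test) == 300
    assert len(X_train) == len(y_train)
    assert len(X_test) == len(y_test)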
Black Box Testing
Black box testing is also known as functional testing: a software testing
technique whereby the internal workings of the item being tested are not known
to the tester. For instance, in a black box test on a software design, the
tester only knows the inputs and what the expected results should be, not how
the program arrives at those outputs. The tester never examines the programming
code and does not require any further knowledge of the program other than its
specifications. In this technique, test cases are generated as input conditions
that fully exercise every functional requirement of the program. This testing
is used to discover errors in the following categories:
• Incorrect or missing functions
• Interface errors
• Errors in data structures or external database access
• Performance errors
• Initialization and termination errors
In this testing, only the output is checked for correctness.
White Box Testing
White box testing is also known as glass box, structural, clear box, and open
box testing: a software testing technique whereby explicit knowledge of the
internal workings of the item being tested is used to select the test data.
Unlike black box testing, white box testing uses specific knowledge of the
programming code to examine outputs. The test is accurate only if the tester
knows what the program is supposed to do; he or she can then check whether the
program diverges from its intended goal. White box testing does not account for
errors caused by omission, and all visible code must also be readable. For a
complete software examination, both white box and black box tests are required.

Here, the test cases are generated from the logic of each module by drawing
flow graphs of that module, and logical decisions are tested on all of their
cases. It is used to generate test cases in the following situations:

• Guarantee that all independent paths have been executed.
• Execute all logical decisions on their true and false sides.
Integration Testing
Integration testing ensures that software and subsystems work together as a
whole. It tests the interfaces of all the modules to make sure that the modules
behave properly when integrated together. It is defined as a systematic
technique for constructing the software architecture. While integration is
taking place, tests are conducted to uncover errors associated with interfaces.
Its objective is to take unit-tested modules and build a program structure
based on the prescribed design.
Two approaches to integration testing:
• Non-incremental integration testing
• Incremental integration testing
System Testing

System testing involves in-house testing of the entire system before delivery
to the customer. Its aim is to satisfy the customer that the system meets all
the requirements of the client's specifications. This testing evaluates the
working of the system from the user's point of view, with the help of the
specification document. It does not require any internal knowledge of the
system, such as the design or structure of the code.

It covers functional and non-functional areas of the application or product.
System testing is considered a superset of all other types of testing, as all
the major types of testing are covered in it, although the focus may vary based
on the product, organizational processes, timelines, and requirements. System
testing is the beginning of real testing, where the product is tested as a
whole and not as a module or feature.

Acceptance Testing
Acceptance testing is a technique performed to determine whether the software
system has met the requirement specifications. The main purpose of this test is
to evaluate the system's compliance with the business requirements and verify
whether it has met the required criteria for delivery to end users. It is
pre-delivery testing in which the entire system is tested at the client's site
on real data to find errors. The acceptance test cases are executed against the
test data or using an acceptance test script, and the results are then compared
with the expected ones.
The acceptance test activities are carried out in phases. First, the basic
tests are executed; if the test results are satisfactory, then the execution of
more complex scenarios is carried out.
7.3 TEST APPROACH
A test approach is the implementation of the test strategy for a project; it
defines how testing will be carried out. The choice of test approach or test
technique is one of the most powerful factors in the success of the test effort
and the accuracy of the test plans and estimates.
Testing can be done in two ways:
• Bottom-up approach
• Top-down approach
Bottom-up Approach
Testing can be performed starting from the smallest and lowest-level modules
and proceeding one at a time. In this approach, testing is conducted from the
sub-modules to the main module; if the main module is not yet developed, a
temporary program called a DRIVER is used to simulate the main module. Once the
bottom-level modules are tested, attention turns to those on the next level
that use the lower-level ones: they are tested individually and then linked
with the previously examined lower-level modules.
Top-down Approach
In this approach, testing is conducted from the main module to the sub-modules.
If a sub-module is not yet developed, a temporary program called a STUB is used
to simulate it. This kind of testing starts from the upper-level modules. Since
the detailed activities usually performed in the lower-level routines are not
provided, stubs are written. A stub is a module shell called by the upper-level
module; when reached properly, it returns a message to the calling module
indicating that proper interaction occurred.
7.4 VALIDATION
Validation is the process of evaluating software during the development
process, or at the end of the development process, to determine whether it
satisfies the specified business requirements. Validation testing ensures that
the product actually meets the client's needs. It can also be defined as
demonstrating that the product fulfills its intended use when deployed in an
appropriate environment.
The system has been tested and implemented successfully, thereby ensuring that
all the requirements listed in the software requirements specification are
completely fulfilled.
7.5 Test Cases
Test cases involve a set of steps, conditions, and inputs that can be used
while performing testing tasks. The main purpose of this activity is to
determine whether the software passes or fails in terms of functionality and
other aspects. The process of developing test cases can also help find problems
in the requirements or design of an application. A test case acts as the
starting point for test execution; after applying a set of input values, the
application has a definitive outcome and leaves the system at some end point,
also known as the execution post-condition.

Chapter 8

CONCLUSION

In this project, we have applied various machine learning methods to predict whether a credit
card will be approved for an individual. Several parameters were taken into consideration, as
these parameters make the model more effective and help institutions make better decisions to
avoid fraud and losses. We applied a number of data pre-processing techniques, since a good
amount of data pre-processing contributes effectively to the performance of traditional machine
learning models. During exploratory data analysis, we plotted many graphs and charts to study
the dataset deeply and gain a better understanding of it. This was done so that we could decide
which models to apply that would perform well on this dataset and correctly predict whether to
approve a credit card. This prediction system can be helpful to various banks, as it makes their
task easier and increases efficiency compared to the manual system currently used by many
banks, and it is cost-effective.

The KNN model performed better than the Random Forest model.

REFERENCES

[1] Alishahi, K., Marvasti, F., Aref, V. and Pad, P. [2009], 'Bounds on the sum capacity of synchronous binary CDMA channels', IEEE Transactions on Information Theory 55(8), 3577–3593.

[2] Alred, G. J., Brusaw, C. T. and Oliu, W. E. [2019], Handbook of Technical Writing, Bedford/St. Martin's Macmillan Learning.

[3] Babington, P. [1993], The title of the work, Vol. 4 of 10, 3 edn, The name of the publisher, The address. An optional note.

[4] Caxton, P. [1993], 'The title of the work', How it was published, The address of the publisher. An optional note.

[5] Conley, T. G. and Galenson, D. W. [1998], 'Nativity and wealth in mid-nineteenth-century cities', Journal of Economic History, pp. 468–493.

[6] Doan, A., Madhavan, J., Domingos, P. and Halevy, A. [2002], Learning to map between ontologies on the semantic web, in 'Proceedings of the 11th International Conference on World Wide Web', ACM, pp. 662–673.

[7] Doe, R. [2009], 'This is a test entry of type @ONLINE'. URL: http://www.test.org/doe/

[8] Draper, P. [1993], The title of the work, in T. editor, ed., 'The title of the book', Vol. 4 of 5, The organization, The publisher, The address of the publisher, p. 213. An optional note.

[9] Duzdevich, D., Redding, S. and Greene, E. C. [2014], 'DNA dynamics and single-molecule biology', Chemical Reviews 114(6), 3072–3086.

[10] Farindon, P. [1993], The title of the work, in T. editor, ed., 'The title of the book', 3 edn, Vol. 4 of 5, The name of the publisher, The address of the publisher, chapter 8, pp. 201–213. An optional note.

[11] Gainsford, P. [1993], The title of the work, 3 edn, The organization, The address of the publisher. An optional note.

[12] Ganesh, S., Jayaprakash, A., Mohanaprasad, K. and Sivanantham, S. [2016], Optimized-fuzzy-logic-based bit loading algorithms, IGI Global.

[13] Gilbarg, D. and Trudinger, N. S. [2015], Elliptic Partial Differential Equations of Second Order, Springer Publications.

[14] Harwood, P. [1993], The title of the work, Master's thesis, The school of the thesis, The address of the publisher. An optional note.

[15] Haykin, S. [2004], Kalman Filtering and Neural Networks, Vol. 47, John Wiley and Sons.

[16] Haykin, S. [2005], 'Cognitive radio: brain-empowered wireless communications', IEEE Journal on Selected Areas in Communications 23(2), 201–220.

[17] Isley, P. [1993], 'The title of the work', How it was published. An optional note.

[18] Joslin, P. [1993], The title of the work, PhD thesis, The school of the thesis, The address of the publisher. An optional note.

[19] Kidwelly, P., ed. [1993], The title of the work, Vol. 4 of 5, The organization, The name of the publisher, The address of the publisher. An optional note.

[20] Kothari, C. R. [2004], Research Methodology: Methods and Techniques, New Age International Publications.

[21] Marcheford, P. [1993], The title of the work. An optional note.

[22] Neil, W. and David, H. [2016], CMOS VLSI Design, Pearson Education.

[23] Waldron, S. [2008a], 'Generalized Welch bound equality sequences are tight frames', IEEE Transactions on Information Theory 49(2), 2307–2309.

[24] Waldron, S. [2008b], 'Ontology learning for the semantic web', International Journal of Mathematics 16(2), 72–79.
