A PROJECT REPORT

ON
A MACHINE LEARNING MODEL FOR
ONLINE FRAUD DETECTION

Submitted in partial fulfillment of the requirements for the award of the degree

Bachelor of Technology
in
COMPUTER SCIENCE & ENGINEERING

Submitted by BATCH-2
AVULAPATI LAHARI 18G01A0509
MANGATI LAHARI 18G01A0549
A.R.KALAI SELVAN 18G01A0503
D.HEMA SAMEERA 18G01A0520
Under the guidance of
Dr. V. Janardhan Babu, M. Tech, Ph. D.

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

SRI VENKATESA PERUMAL COLLEGE OF ENGINEERING AND TECHNOLOGY (AUTONOMOUS)

RVS Nagar, KN Road, Puttur, Chittoor(Dist)-517583 Andhra Pradesh.

www.svpcet.org

(2018-2022)

Institute Vision and Mission

Vision: To emerge as a Centre of Excellence for Learning and Research in the domains
of Engineering, Technology, Computing and Management.

Mission:

M1: To provide congenial academic ambience with state-of-art resources for learning
and research.

M2: Ignite the students to acquire self-reliance in the latest technologies.

M3: Unleash and encourage the innate potential and creativity of students.

M4: Inculcate confidence to face and experience new challenges.

M5: Foster enterprising spirit among students and work collaboratively with technical
Institutes / Universities / Industries of National and International repute.

CSE Department

Vision: To contribute to society through excellence in Computer Science and
Engineering with a deep passion for wisdom, culture and values.

Mission:

M1: Provide congenial academic ambience with necessary infrastructure and learning
resources.

M2: Inculcate confidence to face and experience new challenges from industry and
society.

M3: Ignite the students to acquire self-reliance in the latest technologies.

M4: Foster Enterprising spirit among students.

Program Educational Objectives:

PEO1: Excel in Computer Science and Engineering program through quality studies,
enabling success in computing industry.

PEO2: Surpass in one’s career by critical thinking towards successful services and
growth of the organization, or as an entrepreneur or in higher studies. (Successful Career
Goals).
PEO3: Enhance knowledge by updating advanced technological concepts for facing the
rapidly changing world and contribute to society through innovation and creativity
(Continuing Education and Contribution to Society).

Program Specific Outcomes:

PSO1: Have the ability to understand, analyse and develop computer programs in areas
like algorithms, system software, web design, big data analytics, and networking.

PSO2: Deploy the modern computer languages, environment, and platforms in creating
innovative products and solutions.
SRI VENKATESA PERUMAL COLLEGE OF ENGINEERING AND
TECHNOLOGY (AUTONOMOUS)
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

CERTIFICATE
This is to certify that the Major Project Phase-II report entitled “A MACHINE
LEARNING MODEL FOR ONLINE FRAUD DETECTION” is being
submitted by the members of batch no. CS18A2

AVULAPATI LAHARI 18G01A0509
MANGATI LAHARI 18G01A0549
A R KALAISELVAN 18G01A0503
D HEMA SAMEERA 18G01A0520

in partial fulfillment of the requirements for the award of the degree Bachelor of
Technology in Computer Science & Engineering from Sri Venkatesa
Perumal College of Engineering and Technology, Puttur, affiliated to
Jawaharlal Nehru Technological University Anantapur, Anantapuram. This is
the bona fide work carried out by them under my guidance and supervision
during the academic year 2021-22.

PROJECT GUIDE: Dr. V. Janardhan Babu, M.Tech, Ph.D., Professor
HEAD OF THE DEPARTMENT: Dr. V. Janardhan Babu, M.Tech, Ph.D., Professor
Submitted for the viva-voce examination held on……………………

Internal Examiner External Examiner

DECLARATION BY PROJECT GUIDE

I hereby declare that the major project phase-II report entitled “A MACHINE
LEARNING MODEL FOR ONLINE FRAUD DETECTION” is the bona fide
work carried out by the members of batch no. CS18A2 of Sri Venkatesa Perumal College of
Engineering and Technology (Autonomous), Puttur, for the award of the degree
Bachelor of Technology in Computer Science & Engineering during the academic
year 2021-2022. It is original work, and the project has not formed the basis for the
award of any degree, diploma, associate fellowship or any other similar title submitted
previously.

PROJECT GUIDE

Dr. V. Janardhan Babu, M.Tech, Ph. D.

Professor

DECLARATION BY PROJECT MEMBERS

We hereby jointly declare that the project phase-II report entitled “A MACHINE
LEARNING MODEL FOR ONLINE FRAUD DETECTION”, submitted by batch
no. CS18A2 for the award of our degree in B. Tech Computer Science &
Engineering, is our original work, and that the project has not formed the basis for the
award of any degree, diploma, associate fellowship or any other similar title submitted
previously.

AVULAPATI LAHARI (18G01A0509)
MANGATI LAHARI (18G01A0549)
A R KALAISELVAN (18G01A0503)
D HEMA SAMEERA (18G01A0520)

Place:
Date:

ACKNOWLEDGEMENT

The satisfaction and euphoria that accompany the successful completion of a task
would be incomplete without mention of the people who made it possible,
whose constant guidance and encouragement crowned all our efforts with success.

We wish to express our deepest sense of gratitude and sincere thanks to our
project phase-II guide, Dr. V. JANARDHAN BABU, M. Tech, Ph. D., Head of the
Department of CSE, who evinced keen interest in our efforts and provided his
valuable guidance throughout our project work.

We also express our sincere gratitude to Dr. V. JANARDHAN BABU, M. Tech, Ph. D.,
HOD of CSE, for his great encouragement and valuable support throughout our study.

We owe our gratitude to our principal, Dr. T. SUNIL KUMAR REDDY, M. Tech., Ph.D.,
for his kind attention and valuable guidance given to us throughout this course.

We sincerely and wholeheartedly thank our beloved Sri. RAVURI. V. BALAJI,
Vice-Chairman, for providing state-of-the-art infrastructure facilities throughout our course
of study, leading to the successful completion of our project.

We are very thankful to our beloved Dr. R. VENKATASWAMY, Chairman of
Sri Venkatesa Perumal College of Engineering & Technology, Puttur, for his kind
attention and valuable guidance to us throughout the course.

We are also thankful to all staff members of the CSE Department for helping us complete
this project work with their valuable suggestions.

We would like to thank the members of our family who supported the preparation of
this report financially.

Last but not least, we express our sincere thanks to all our friends who
supported us in the accomplishment of this project.

ABSTRACT

This project focuses on credit card fraud detection in the real world. We first
collect credit card datasets to serve as the training data, and then supply user
credit card queries as the test data. A random forest classifier is trained on the
already analysed dataset and applied to the current data provided by the user, and
the accuracy of the result is then optimized. Selected attributes are processed to
identify those that affect fraud detection, and the outcome is presented through
graphical visualization. The performance of the technique is evaluated in terms of
accuracy, sensitivity, specificity, and precision. The results indicate an optimal
accuracy of 98.6% for Random Forest.

CONTENTS

TITLE PAGE
CERTIFICATE
DECLARATION BY PROJECT GUIDE
DECLARATION BY PROJECT MEMBERS
ACKNOWLEDGEMENT
ABSTRACT

1 INTRODUCTION
1.1 PROBLEM DEFINITION
1.2 EXISTING SYSTEM
1.3 PROPOSED SYSTEM
2 LITERATURE SURVEY
3 SYSTEM REQUIREMENT ANALYSIS
3.1 NON FUNCTIONAL REQUIREMENTS
3.2 FUNCTIONAL REQUIREMENTS
4 FEASIBILITY STUDY
5 SYSTEM DESIGN
5.1 SYSTEM ARCHITECTURE
5.1.1 USE CASE DIAGRAM
5.1.2 CLASS DIAGRAM
5.1.3 ACTIVITY DIAGRAM
5.1.4 COLLABORATION DIAGRAM
5.2 DATA FLOW DIAGRAM
6 CODING
6.1 IMPLEMENTATION
6.2 SOURCE CODE
7 SYSTEM TESTING
8 OUTPUT SCREENSHOTS
9 CONCLUSION
10 FUTURE ENHANCEMENT
11 BIBLIOGRAPHY
11.1 REFERENCE
12 APPENDIX
12.1 INSTALLATION STEPS
12.2 USER MANUAL


LIST OF FIGURES

S. No. NAME OF FIGURE
1. Fig: 3.1 System Architecture
2. Fig: 5.1.1 Use Case Diagram
3. Fig: 5.1.2 Class Diagram
4. Fig: 5.1.3 Activity Diagram
5. Fig: 5.1.4 Collaboration Diagram
6. Fig: 8.1 Rows and Columns in Dataset
7. Fig: 8.2 Normalized amount in Dataset
8. Fig: 8.3 Class Values
9. Fig: 8.4 Train and Test Data
10. Fig: 8.5 Confusion matrix of y_test & y_pred

LIST OF TABLES

S. No. NAME OF TABLE
1. Table: 6.1.1 Categories of Supervised Learning
2. Table: 6.1.2 Categories of Unsupervised Learning

CHAPTER 1

INTRODUCTION
A MACHINE LEARNING MODEL FOR
ONLINE FRAUD DETECTION INTRODUCTION

1. INTRODUCTION
Fraudulent credit card transactions cause billions of dollars in losses every year.
Fraud is as old as humanity itself and can take an unlimited variety of forms. The PwC
Global Economic Crime Survey of 2017 suggests that approximately 48% of organizations
experienced economic crime. There is therefore a clear need to solve the problem of
credit card fraud detection. Moreover, the development of new technologies provides
additional ways in which criminals may commit fraud. The use of credit cards is
prevalent in modern society, and credit card fraud has kept growing in recent years.
The resulting financial losses affect not only merchants and banks but also the
individuals who use the cards. Fraud may also affect the reputation and image of a
merchant, causing non-financial losses that, though difficult to quantify in the short
term, may become visible in the long run. For example, if a cardholder is a victim of
fraud with a certain company, he may no longer trust its business and may choose a
competitor.
1.1. Problem Definition
Credit card fraud is the illicit use of credit card data to make purchases. When a
user pays by credit card, a fraudster may capture the password or other important user
details and then use them in further transactions to spend the funds on the card, while
remaining difficult to identify. A credit card transaction can be completed either
physically or digitally. In a physical transaction the card itself is presented during
the exchange, whereas a digital transaction takes place only over the phone or the web.
Cardholders typically provide important details such as the card number, expiry
date, and card validation number over the phone or the web. Credit cards are now used
throughout the technological world, so the number of credit card transactions increases
every day, and with the rise of e-commerce a card is used every second. The volume of
credit card business grows every year. Technology has developed and brought great
benefit to people, but it has also increased the number of credit card fraud cases,
which is now a serious problem worldwide. Logical and numerical authentication methods
are applied against credit card fraud, but they do not detect it reliably, because
fraudsters hide details such as their identity and location on the internet. The
problem therefore has a large impact on the financial industry as well. Credit card
fraud affects both sides, that is, the administrator and the user.

CS18A2 2021-2022 1
It affects (a) issuer fees, (b) charges, and (c) administrative charges, all of
which are lost as fees. Merchants respond by raising the prices of goods or by
reducing discounts. The aim of the proposed system is to reduce the losses from credit
card fraud and to eliminate fraud cases. The machine learning techniques used for this
purpose include (i) artificial neural networks, (ii) rule-detection techniques,
(iii) decision trees, (iv) logistic regression, and (v) support vector machines (SVM).
These models can also be combined into hybrid methods. For example, AdaBoost and
majority voting strategies have been applied to detect credit card fraud. The main
contribution of the paper is as follows:
1.2. Existing system:
The existing system is based on a case study of credit card fraud detection in
which data normalization is applied before cluster analysis. The results obtained from
using cluster analysis and Artificial Neural Networks for fraud detection show that
clustering attributes can minimize the number of neuronal inputs, and that promising
results can be obtained by training an MLP on normalized data. This research was based
on unsupervised learning. The significance of the paper was to find new methods for
fraud detection and to increase the accuracy of results. The data set is based on
real-life transactional data from a large European company, and personal details in the
data are kept confidential. The accuracy of the algorithm is around 50%. The
significance of the paper was also to find an algorithm and to reduce the cost measure.
The result obtained was an improvement of 23%, and the algorithm found was Bayes
minimum risk.

Disadvantages:

• In this paper, a new comparative measure that reasonably represents the gains and
losses due to fraud detection is proposed.
• A cost-sensitive method based on Bayes minimum risk is presented using
the proposed cost measure.
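
To make the cost-sensitive idea concrete, the following is a minimal sketch of a Bayes minimum risk decision. The function name `bayes_min_risk`, the fixed review cost, and the assumption that a missed fraud loses the full transaction amount are all illustrative choices and are not taken from the cited study.

```python
def bayes_min_risk(p_fraud, amount, admin_cost=5.0):
    """Return 1 (flag as fraud) or 0 (accept) by comparing expected costs.

    Assumed cost model (hypothetical, for illustration only):
      - flagging a transaction always costs `admin_cost` (manual review);
      - accepting a fraudulent transaction loses the full `amount`;
      - accepting a legitimate transaction costs nothing.
    """
    risk_if_flagged = admin_cost           # paid whether or not it is fraud
    risk_if_accepted = p_fraud * amount    # expected loss from a missed fraud
    return 1 if risk_if_flagged < risk_if_accepted else 0


# Hypothetical (estimated fraud probability, transaction amount) pairs.
transactions = [(0.02, 40.0), (0.02, 900.0), (0.60, 12.0)]
decisions = [bayes_min_risk(p, amt) for p, amt in transactions]
print(decisions)
```

Under this cost model the same fraud probability can lead to different decisions: a low-probability but high-value transaction is flagged, while a low-value one is accepted because a review would cost more than the expected loss.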


1.3. Proposed System:


In the proposed system, we apply the random forest algorithm to classify the
credit card dataset. Random forest is an algorithm for classification and regression;
in short, it is a collection of decision tree classifiers. It has an advantage over a
single decision tree because it corrects the tendency of decision trees to overfit
their training set. To train each individual tree, a subset of the training set is
sampled randomly and a decision tree is built on it, with each node splitting on a
feature selected from a random subset of the full feature set. Training is very fast
even for large data sets with many features and data instances, because each tree is
trained independently of the others. The random forest algorithm has also been found to
provide a good estimate of the generalization error.
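
The training procedure described above (bootstrap sampling, a random feature subset per split, and voting across trees) can be sketched in plain Python. This is a deliberately simplified stand-in, not the implementation used in the project: each tree is limited to a single split (a decision stump), the data are made up, and the names `fit_stump`, `fit_forest` and `predict` are hypothetical.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, feature_ids):
    """Fit a one-split tree: choose the (feature, threshold) with fewest errors."""
    best = None
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            left_label = majority(left)  # never empty: t is an observed value
            right_label = majority(right) if right else left_label
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    return best[1:]


def fit_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw len(rows) indices with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        # Random subset of features that this tree's split may use.
        feats = rng.sample(range(len(rows[0])), n_features)
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Majority vote over all trees."""
    return majority([left if row[f] <= t else right
                     for f, t, left, right in forest])


# Made-up (amount, hour of day) rows; label 1 = fraud, 0 = non-fraud.
rows = [(20, 10), (35, 14), (15, 9), (50, 20), (900, 3),
        (1200, 2), (40, 11), (1500, 4), (25, 16), (1100, 1)]
labels = [0, 0, 0, 0, 1, 1, 0, 1, 0, 1]

forest = fit_forest(rows, labels)
print(predict(forest, (10, 23)), predict(forest, (2000, 0)))
```

Because every tree sees a different bootstrap sample and feature subset, the trees are fitted independently of one another, which is what makes the training easy to parallelize.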

1.4. Scope of the System:

• Random Forest ranks the importance of variables in a regression or classification
problem in a natural way.
• The 'amount' feature is the transaction amount. The 'class' feature is the target
class for the binary classification; it takes the value 1 for the positive case
(fraud) and 0 for the negative case (non-fraud).
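
As a small illustration of the two features named above, the sketch below scales a hypothetical 'amount' column and counts the 'class' labels. The records are invented, and z-score scaling is only one common way to normalize an amount column; the report does not state which method was used.

```python
from statistics import mean, pstdev

# Hypothetical rows: (amount, class); class 1 = fraud, 0 = non-fraud.
records = [(12.5, 0), (250.0, 0), (3.2, 0), (1899.0, 1), (47.9, 0), (720.0, 1)]

amounts = [a for a, _ in records]
mu, sigma = mean(amounts), pstdev(amounts)

# Z-score scaling: the scaled column has mean 0 and unit standard deviation.
normalized = [(a - mu) / sigma for a in amounts]

fraud_count = sum(c for _, c in records)
fraud_share = fraud_count / len(records)

print([round(z, 3) for z in normalized])
print(f"fraud: {fraud_count}, non-fraud: {len(records) - fraud_count}, "
      f"share: {fraud_share:.2%}")
```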

CHAPTER 2

LITERATURE SURVEY

2. LITERATURE SURVEY

J. Esmaily and R. Moradinezhad [2] proposed a hybrid of Artificial Neural Network
and Decision Trees in 2015. One of the reasons for using this model was that it
promises reliability by giving a very low false detection rate. Their model consists of
a two-phase approach, in which the first phase produces the classification results of
Decision Trees and a Multilayer Perceptron. This first layer was used to generate a new
dataset, which in turn was fed into a Multilayer Perceptron in the second layer to
classify the data. In 2011, Siddhartha Bhattacharyya, Sanjeev Jha, Kurian Tharakunnel
and J. Christopher Westland [3] conducted a comprehensive comparative study of Support
Vector Machines (SVM) and Random Forest along with Logistic Regression. They concluded
from their experiments that the Random Forest methodology is the most accurate,
followed by Logistic Regression and Support Vector Machines.
This model is very accurate, with very low false detection rates. In 2011,
Raghavendra Patidar and Lokesh Sharma [4] proposed a hybrid of Artificial Neural Network
and Genetic Algorithms. They utilized neural nets to classify transactions and genetic
algorithms so that the solution is optimized and the system is not trained. In 2015,
Tanmay Kumar and Suvasini Panigrahi [5] proposed a novel approach to credit card fraud
detection in which the detection is done in three phases.
The first phase performs the initial user authentication and verification of card
details. If the check is cleared successfully, the transaction is passed to the next
phase, where a fuzzy c-means clustering algorithm is applied to find the normal usage
patterns of credit card users based on their past activity.
In another paper, Wen-Fang Yu and Na Wang [6] proposed a distance-based method,
which judges whether a data object is an outlier according to its nearest neighbours.
They reported only the highest accuracy, about 89.4 percent, and did not discuss false
positives (FP) and false negatives (FN). Ayushi Agrawal and others [7] proposed testing
a transaction using a Hidden Markov Model to maintain the record of previous
transactions, a behaviour-based technique for grouping datasets, and lastly a genetic
algorithm for optimization, i.e. calculating the threshold value. Sam Maes [8] proposed
detecting credit card fraud using two machine learning techniques, namely Bayesian
Networks and Artificial Neural Networks.


The paper discussed how Bayesian Networks gave good results after a short training
period and how their speed was enhanced by the use of ANN. Y. Sahin and E. Duman
[9] proposed credit card fraud detection using a combination of Support Vector
Machines and Decision Trees. Geoffrey F. Miller, Peter M. Todd and Sailesh Hegde [10]
elaborated the concept of designing Neural Networks using Genetic Algorithms, which
aims to free the network design process from the constraints of human biases. They
built a system with applications in biological, neurological and psychological
modelling as well as in engineering and design, using automated network design.
Ekrem Duman and M. Hamdi Ozcelik [11] proposed a system that assigns each
transaction a score and judges the transaction on that score; to implement it they
combined Neural Networks with Scatter Search. Alireza Pouramirarsalani, Majid
Khalilian and Alireza Nikravanshalmani [12] proposed a new method of fraud detection
that used a hybrid of feature selection and genetic algorithm. They observed the
salient features of the transactions and used them to detect any unusual feature and
flag the transaction as fraudulent.
Pooja Chougule and others [13] proposed simple K-means and a simple Genetic
Algorithm for fraud detection. They showed how the k-means algorithm grouped the
transactions based on distinct attribute values, while the genetic algorithm was used
for optimization, since the k-means algorithm produced outliers as the size of the
input increased. S. Fashoto, O. Adeleye and J. Wandera [14] used a hybrid of K-means
clustering with Multilayer Perceptron (MLP) and the Hidden Markov Model (HMM). They
used K-means clustering to group the suspected fraudulent transactions together into a
similar cluster.
The output of this stage is used to train the HMM and the MLP, which then
classify the incoming transactions. M.R. Harati Nik, M. Akrami, S. Khadivi and M.
Shajari [15] proposed a fusion of a Fuzzy expert system and Fogg behavioral analysis,
naming it the Fuzzy hybrid model. The Fogg behavioral model describes merchant behavior
in two dimensions: motivation and ability to commit fraud. The fraud tendency weight is
then calculated for each merchant, followed by the degree of suspicion for the incoming
transactions. Krishna K. Tripathi and Mahesh A. Pavaskar [16] carried out a comparative
study of different techniques, one of which is a fusion of Dempster-Shafer theory and
Bayesian learning that combines evidence or datasets from past as well as current
behavior.
The rule-based filter, transaction history and Bayesian learner are among the 4
stages of this system, through which suspicious and unsuspicious transactions are
distinguished. In the first component, the extent to which the incoming transaction has
deviated is determined so as to obtain the suspicion level.
TITLE: The Use of Predictive Analytics Technology to Detect Credit Card Fraud in
Canada

AUTHOR: Kosemani Temitayo Hafiz, Dr. Shaun Aghili, Dr. Pavol Zavarsky

ABSTRACT: This research paper focuses on the creation of a scorecard from relevant
evaluation criteria, features, and capabilities of predictive analytics vendor solutions
currently being used to detect credit card fraud. The scorecard provides a side-by-side
comparison of five credit card predictive analytics vendor solutions adopted in Canada.
From the ensuing research findings, a list of credit card fraud PAT vendor solution
challenges, risks, and limitations was outlined.

TITLE: BLAST-SSAHA Hybridization for Credit Card Fraud Detection

AUTHOR: Amlan Kundu, Suvasini Panigrahi, Shamik Sural, and Arun K. Majumdar

ABSTRACT: In this paper, we propose to use two-stage sequence alignment in which a
profile analyser (PA) first determines the similarity of an incoming sequence of
transactions on a given credit card with the genuine cardholder’s past spending
sequences. The unusual transactions traced by the profile analyser are next passed on to
a deviation analyser (DA) for possible alignment with past fraudulent behaviour. The
final decision about the nature of a transaction is taken on the basis of the observations
by these two analysers. In order to achieve online response time for both PA and DA,
we suggest a new approach for combining two sequence alignment algorithms BLAST
and SSAHA.


TITLE: Research on Credit Card Fraud Detection Model Based on Distance Sum

AUTHOR: Wen-Fang YU, Na Wang

ABSTRACT: Along with increasing credit cards and growing trade volume in China,
credit card fraud rises sharply. How to enhance the detection and prevention of credit
card fraud becomes the focus of risk control of banks. This paper proposes a credit card
fraud detection model using outlier detection based on distance sum according to the
infrequency and unconventionality of fraud in credit card transaction data, applying
outlier mining into credit card fraud detection. Experiments show that this model is
feasible and accurate in detecting credit card fraud.
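
The distance-sum idea in this abstract can be sketched as follows: a transaction that lies far from all the others accumulates a large total distance and is flagged as an outlier. The median-based threshold, the function names and the sample data are assumptions made for this sketch; the cited paper's exact procedure is not reproduced here.

```python
import math


def distance_sums(points):
    """For each point, sum its Euclidean distances to all points in the set."""
    return [sum(math.dist(p, q) for q in points) for p in points]


def flag_outliers(points, factor=2.0):
    """Flag points whose distance sum exceeds `factor` times the median sum.

    The median-based threshold is an assumption made for this sketch.
    """
    sums = distance_sums(points)
    median = sorted(sums)[len(sums) // 2]
    return [s > factor * median for s in sums]


# Hypothetical (amount, hour of day) pairs; the last one is far from the rest.
transactions = [(20, 10), (25, 11), (22, 9), (30, 12), (27, 10), (950, 3)]
flags = flag_outliers(transactions)
print(flags)
```

In practice the features would normally be brought onto a comparable scale first, since otherwise the feature with the largest range (here the amount) dominates the distance.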

TITLE: Fraudulent Detection in Credit Card System Using SVM & Decision Tree

AUTHOR: Vijayshree B. Nipane, Poonam S. Kalinge, Dipali Vidhate, Kunal War,
Bhagyashree P. Deshpande

ABSTRACT: With growing advancement in the electronic commerce field, fraud is
spreading all over the world, causing major financial losses. In the current scenario,
a major cause of financial losses is credit card fraud; it affects not only tradespeople
but also individual clients. Decision tree, genetic algorithm, meta-learning strategy,
neural network and HMM are the existing methods used to detect credit card fraud. In
the contemplated system for fraud detection, the artificial intelligence concepts of
Support Vector Machine (SVM) and decision tree are used to solve the problem. Thus, by
implementing this hybrid approach, financial losses can be reduced to a greater
extent.

TITLE: Supervised Machine (SVM) Learning for Credit Card Fraud Detection

AUTHOR: Sitaram Patel, Sunita Gond

ABSTRACT: In this thesis we are proposing the SVM (Support Vector Machine) based
method with multiple kernel involvement which also includes several fields of user
profile instead of only spending profile. The simulation result shows improvement in TP
(true positive), TN (true negative) rate, & also decreases the FP (false positive) & FN
(false negative) rate.


TITLE: Detecting Credit Card Fraud by Decision Trees and Support Vector Machines

AUTHOR: Y. Sahin and E. Duman

ABSTRACT: In this study, classification models based on decision trees and support
vector machines (SVM) are developed and applied to the credit card fraud detection
problem. This study is one of the first to compare the performance of SVM and
decision tree methods in credit card fraud detection with a real data set.

TITLE: Machine Learning based Approach to Financial Fraud Detection Process in
Mobile Payment System

AUTHOR: Dahee Choi and Kyungho Lee

ABSTRACT: Mobile payment fraud is the unauthorized use of mobile transactions
through identity theft or credit card stealing to fraudulently obtain money. Mobile
payment fraud is a fast-growing issue driven by the emergence of smartphones and
online transaction services. In the real world, a highly accurate process for mobile
payment fraud detection is needed, since financial fraud causes financial loss.
Therefore, our approach proposes an overall process for detecting mobile payment fraud
based on machine learning, using supervised and unsupervised methods to detect fraud
and process large amounts of financial data. Moreover, our approach performs sampling
and feature selection for fast processing of large volumes of transaction data and to
achieve high accuracy in mobile payment fraud detection. F-measure and ROC curve are
used to validate our proposed model.

TITLE: Credit Card Fraud Detection Using Decision Tree Induction Algorithm

AUTHOR: Snehal Patil, Harshada Somavanshi, Jyoti Gaikwad, Amruta Deshmane,
Rinku Badgujar

ABSTRACT: A new cost-sensitive decision tree approach, which reduces the sum of
misclassification costs while selecting the splitting attribute at each non-terminal
node, is put forward, and the performance of this approach is compared with well-known
traditional classification models on a real-world credit card data set. This research
is concerned with credit card application fraud detection by asking security questions
of the persons involved in the transactions and by eliminating real-time data faults.

TITLE: Data Mining Techniques for Credit Card Fraud Detection: Empirical Study

AUTHOR: Marwan Fahmi, Abeer Hamdy, Khaled Nagati

ABSTRACT: Fraud detection is a crucial problem that has been facing the e-commerce
industry for decades. Financial institutions throughout the world lose billions due to
credit card fraud, which necessitates the use of credit card fraud prevention. Several
models have been proposed in the literature; however, the accuracy of the model is
crucial. In this paper four fraud detection models based on data mining techniques
(Support vector machine, K-nearest neighbours, Decision Trees, Naïve Bayes) were
developed and their performances were compared when applied on a real life
anonymised data set of transactions (“UCSD-FICO Data Mining Contest 2009”). Four
relevant metrics were used in evaluating the performance of the classifiers which are
True positive rate (TPR), False Positive Rate (FPR), Balanced Classification Rate
(BCR) and Matthews Correlation Coefficient (MCC).
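
The four metrics named in this abstract can all be computed from the counts of a binary confusion matrix. The sketch below uses their usual definitions, with BCR taken as the mean of the true positive and true negative rates; the function name and the counts are hypothetical and are not results from the cited study.

```python
import math


def classifier_metrics(tp, fp, tn, fn):
    """Compute TPR, FPR, BCR and MCC from confusion-matrix counts."""
    tpr = tp / (tp + fn)      # true positive rate (sensitivity, recall)
    fpr = fp / (fp + tn)      # false positive rate
    tnr = tn / (tn + fp)      # true negative rate (specificity)
    bcr = (tpr + tnr) / 2     # balanced classification rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when a row or column of the matrix is empty; use 0.0.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"TPR": tpr, "FPR": fpr, "BCR": bcr, "MCC": mcc}


# Hypothetical counts for an imbalanced fraud data set.
m = classifier_metrics(tp=80, fp=10, tn=890, fn=20)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

BCR and MCC are useful for fraud data because plain accuracy can look high on an imbalanced data set even when most fraud cases are missed.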

TITLE: Card Fraud Detection Using Learning Machines

AUTHOR: “Gheorghe Asachi” din Iaşi

ABSTRACT: Searching for “card fraud” on the Internet returns approximately 180 million
results. The total level of fraud reached 1.26 billion euro in Europe in 2010, according
to the European Central Bank (BCE). The ingenuity of thieves has reached highly
sophisticated forms. Modelling this behaviour mathematically requires a classification
method derived from a supervised learning algorithm that can separate the fraudulent
class with a high degree of accuracy. By its definition, the technique of Support Vector
Machines is characterized by two strong hypotheses: margin optimization and kernel
representation. So, I chose the technique of SVM with non-linear kernels. We propose
the Gaussian kernel function for measuring the similarities between features in a new
linear space as the best approach to detect fraud patterns.
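
For reference, the Gaussian (RBF) kernel mentioned in this abstract is commonly written as k(x, y) = exp(-gamma * ||x - y||^2). The sketch below evaluates it on made-up feature vectors; the value of `gamma` and the vectors themselves are assumptions for illustration only.

```python
import math


def gaussian_kernel(x, y, gamma=0.5):
    """k(x, y) = exp(-gamma * ||x - y||^2); gamma is a free parameter."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Made-up feature vectors: b is close to a, c is far from a.
a, b, c = (1.0, 2.0), (1.0, 2.5), (4.0, 6.0)
k_same = gaussian_kernel(a, a)
k_near = gaussian_kernel(a, b)
k_far = gaussian_kernel(a, c)
print(k_same, k_near, k_far)
```

The kernel value is 1 for identical vectors and decays towards 0 as they move apart, which is how it acts as a similarity measure inside an SVM.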

CHAPTER 3

SYSTEM REQUIREMENTS ANALYSIS



3. SYSTEM REQUIREMENTS ANALYSIS


3.1. Non- Functional requirements:
If you think of functional requirements as those that define what a system is
supposed to do, non functional requirements (NFRs) define constraints which affect how
the system should do it. While a system can still work if NFRs are not met, it may not
meet user or stakeholder expectations, or the needs of the business. NFRs also keep
functional requirements in line, so to speak. Attributes that make the product affordable,
easy to use, and accessible

 Types of Non Functional Requirements

There are many common categories of non functional requirements .NFRs are
often thought of as the “itys.” While the specifics will vary between products, having a
list of these NFR(non functional requirements) types defined up front provides a handy
checklist to make sure you’re not missing critical requirements. This is not an
exhaustive list, but here’s what we mean: NFR “Itys”

Security — Does your product store or transmit sensitive information? Does


your IT department require adherence to specific standards? What security best practices
are used in your industry?

Capacity — What are your system’s storage requirements, today and in the
future? How will your system scale up for increasing volume demands?

Compatibility — What are the minimum hardware requirements? What


operating systems and their versions must be supported?

Reliability and Availability — What is the critical failure time under normal usage?
Does a user need access to this all hours of every day?

Maintainability + Manageability—How much time does it take to fix


components, and how easily can an administrator manage the system? Under this
umbrella, you could also define Recoverability and Serviceability.

Scalability – The Black Friday test. What are the highest workloads under which
the system will still perform as expected?

CS18A2 2021-2022 10

Usability: How easy is it to use the product? What defines the experience of using the
product?

3.1.1. Hardware Requirements

● Windows 10 Operating System
● 4 GB of RAM
● Intel Core i5 processor
3.1.2. Software Requirements

● Anaconda Navigator (Jupyter Notebook)

3.2. Functional requirements

● Functional requirements define what a product must do and what its features and
functions are.
● Functional requirements are product features or functions that developers must
implement to enable users to accomplish their tasks, so it is important to make
them clear to both the development team and the stakeholders. Generally,
functional requirements describe system behavior under specific conditions. For
example:

The system sends an approval request after the user enters personal information.

CHAPTER 4

FEASIBILITY STUDY

4. FEASIBILITY STUDY
The feasibility of the project is analyzed in this phase, and a business proposal is
put forth with a very general plan for the project and some cost estimates. During
system analysis, the feasibility study of the proposed system is carried out to ensure
that the proposed system is not a burden to the company. For feasibility analysis, some
understanding of the major requirements for the system is essential. Three key
considerations are involved in the feasibility analysis:

● Economical Feasibility
● Technical Feasibility
● Social Feasibility
Economical Feasibility
This study is carried out to check the economic impact that the system will have on
the organization. The amount of funds that the company can pour into the research and
development of the system is limited, so the expenditures must be justified. The
developed system is well within the budget, which was achieved because most of the
technologies used are freely available. Only the customized products had to be
purchased.
Technical Feasibility
This study is carried out to check the technical feasibility, that is, the technical
requirements of the system. Any system developed must not place a high demand on the
available technical resources, as this would lead to high demands being placed on the
client. The developed system must have modest requirements, as only minimal or no
changes are required for implementing this system.


Social Feasibility
This aspect of the study checks the level of acceptance of the system by the user. It
includes the process of training the user to use the system efficiently. The user must
not feel threatened by the system, but must instead accept it as a necessity. The level
of acceptance by the users depends solely on the methods that are employed to educate
the users about the system and to make them familiar with it. Their level of confidence
must be raised so that they are also able to offer constructive criticism, which is
welcomed, as they are the final users of the system.

CHAPTER 5

SYSTEM DESIGN

5. SYSTEM DESIGN

Fig 5.1: SYSTEM DESIGN


5.1. USE CASE DIAGRAM:

A use case diagram in the Unified Modeling Language (UML) is a type of behavioural
diagram defined by and created from a use-case analysis. Its main purpose is to show
which system functions are performed for which actor. The roles of the actors in the
system can also be depicted.

Fig 5.1.1: Use Case Diagram


5.2. CLASS DIAGRAM:

In software engineering, a class diagram in the Unified Modelling Language (UML) is a
type of static structure diagram that describes the structure of a system by showing
the system's classes, their attributes, operations (or methods), and the relationships
among the classes. It explains which class contains information.

Fig 5.1.2: Class Diagram

The class diagram is the main building block of object-oriented modelling. It is used
for general conceptual modelling of the structure of the application, and for detailed
modelling, translating the models into programming code. Class diagrams can also be
used for data modelling.[1] The classes in a class diagram represent both the main
elements, interactions in the application, and the classes to be programmed.

In the design of a system, a number of classes are identified and grouped together in a
class diagram that helps to determine the static relations between them. In detailed
modelling, the classes of the conceptual design are often split into subclasses.


5.3. ACTIVITY DIAGRAM:


Activity diagrams are graphical representations of workflows of stepwise
activities and actions with support for choice, iteration and concurrency. In the Unified
Modelling Language, activity diagrams can be used to describe the business and
operational step-by-step workflows of components in a system. An activity diagram
shows the overall flow of control.

Fig 5.1.3: Activity Diagram


5.4. COLLABORATION DIAGRAM:


The collaboration diagram is used to show the relationship between the objects
in a system. Both the sequence and the collaboration diagrams represent the same
information but differently. Instead of showing the flow of messages, it depicts the
architecture of the object residing in the system as it is based on object-oriented
programming. An object consists of several features. Multiple objects present in the
system are connected to each other. The collaboration diagram, which is also known as a
communication diagram, is used to portray the object's architecture in the system.

Fig 5.1.4: Collaboration Diagram


5.5. DATA FLOW DIAGRAM

LEVEL 0

CHAPTER 6

CODING

6. CODING
6.1. DESCRIPTION OF TECHNOLOGY USED:

6.1.1. MACHINE LEARNING

Machine learning is a system that can learn from examples through self-improvement,
without being explicitly coded by a programmer. The breakthrough comes with the idea
that a machine can learn on its own from the data (i.e., examples) to produce accurate
results.

Machine learning combines data with statistical tools to predict an output. This
output is then used by companies to derive actionable insights. Machine learning is
closely related to data mining and Bayesian predictive modelling. The machine receives
data as input and uses an algorithm to formulate answers.

A typical machine learning task is to provide a recommendation. For those who have a
Netflix account, all recommendations of movies or series are based on the user's
historical data. Tech companies are using unsupervised learning to improve the user
experience with personalized recommendations. Machine learning is also used for a
variety of tasks such as fraud detection, predictive maintenance, portfolio
optimization, task automation and so on.

Machine Learning vs. Traditional Programming

Traditional programming differs significantly from machine learning. In traditional
programming, a programmer codes all the rules in consultation with an expert in the
industry for which the software is being developed. Each rule is based on a logical
foundation; the machine produces an output by following the logical statements. When
the system grows complex, more rules need to be written, and the rule set can quickly
become unsustainable to maintain.
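
The contrast can be sketched in a few lines of Python. This is only an illustration: the transaction amounts, the fixed limit of 1000 and the function names are hypothetical, and the learned rule is a deliberately simple one-feature threshold, not the model used in this project.

```python
# Traditional programming: the rule is written by hand.
def rule_based_is_fraud(amount):
    return amount > 1000  # limit chosen by a domain expert (hypothetical value)


# Machine learning: the rule (here, a threshold) is derived from labelled examples.
def learn_threshold(amounts, labels):
    """Return the candidate threshold that misclassifies the fewest examples."""
    best_threshold, best_errors = None, len(amounts) + 1
    for candidate in sorted(set(amounts)):
        # Predict "fraud" when the amount is at or above the candidate threshold.
        errors = sum((a >= candidate) != bool(y) for a, y in zip(amounts, labels))
        if errors < best_errors:
            best_threshold, best_errors = candidate, errors
    return best_threshold


# Toy labelled data: 1 = fraud, 0 = genuine (made-up values).
amounts = [20, 45, 60, 300, 2500, 4000]
labels = [0, 0, 0, 0, 1, 1]
threshold = learn_threshold(amounts, labels)
```

In the first function the limit has to be revised by hand whenever behaviour changes; in the second, re-running `learn_threshold` on new labelled data updates the rule.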


How does Machine learning work?

Machine learning is the brain where all the learning takes place. The way the machine
learns is similar to the way a human being learns. Humans learn from experience: the
more we know, the more easily we can predict. By analogy, when we face an unknown
situation, the likelihood of success is lower than in a known situation. Machines are
trained in the same way. To make an accurate prediction, the machine first sees
examples. When we then give the machine a similar example, it can figure out the
outcome. However, like a human, if it is fed a previously unseen kind of example, the
machine has difficulty predicting.

The core objectives of machine learning are learning and inference. First of all, the
machine learns through the discovery of patterns, and this discovery is made from the
data. One crucial part of the data scientist's job is to choose carefully which data to
provide to the machine. The list of attributes used to solve a problem is called a
feature vector. You can think of a feature vector as a subset of data that is used to
tackle a problem.

The machine uses algorithms to simplify reality and transform this discovery into a
model. The learning stage is therefore used to describe the data and summarize it into
a model.

For instance, the machine is trying to understand the relationship between the wage of
an individual and the likelihood of going to a fancy restaurant. It turns out the
machine finds a positive relationship between wage and going to a high-end restaurant:
this is the model.

Inferring

When the model is built, it is possible to test how powerful it is on never-seen-before
data. The new data are transformed into a feature vector, go through the model and give
a prediction. This is the beautiful part of machine learning: there is no need to
update the rules or retrain the model. You can use the previously trained model to make
inferences on new data.


The life cycle of a machine learning program is straightforward and can be summarized
in the following points:

1. Define a question
2. Collect data
3. Visualize data
4. Train algorithm
5. Test the Algorithm
6. Collect feedback
7. Refine the algorithm
8. Loop 4-7 until the results are satisfying
9. Use the model to make a prediction

Once the algorithm gets good at drawing the right conclusions, it applies that
knowledge to new sets of data.
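
As a rough sketch of steps 2, 4, 5 and 9, the snippet below generates synthetic data, holds part of it out for testing, trains a very simple model and measures its accuracy. The data, the 70/30 split and the nearest-class-mean model are assumptions made for illustration; they are not the dataset or algorithm used in this project.

```python
import random

# Step 2: collect data (synthetic here): (amount, label) pairs with 1 = fraud.
random.seed(0)
data = [(random.gauss(100, 30), 0) for _ in range(50)]
data += [(random.gauss(900, 100), 1) for _ in range(50)]
random.shuffle(data)

# Hold out part of the data so the model is tested on examples it has not seen.
train, test = data[:70], data[70:]


# Step 4: "train" a very simple model: the mean amount of each class.
def fit(rows):
    means = {}
    for label in (0, 1):
        values = [x for x, y in rows if y == label]
        means[label] = sum(values) / len(values)
    return means


# Step 9: predict the class whose mean is closest to the new amount.
def predict(means, x):
    return min(means, key=lambda label: abs(x - means[label]))


# Step 5: test the model on the held-out data.
model = fit(train)
accuracy = sum(predict(model, x) == y for x, y in test) / len(test)
```

If `accuracy` were unsatisfactory, steps 6 to 8 would correspond to collecting feedback, changing the features or the model, and repeating the train and test steps.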

Machine learning Algorithms and where they are used?

Machine learning can be grouped into two broad learning tasks: supervised and
unsupervised. Each covers many algorithms.


Fig 6.1.1: Machine Learning Diagram

Supervised learning :

An algorithm uses training data and feedback from humans to learn the relationship
between given inputs and a given output. For instance, a practitioner can use marketing
expense and weather forecast as input data to predict the sales of cans.

You can use supervised learning when the output data is known. The trained algorithm
then predicts the output for new data.

There are two categories of supervised learning:

● Classification task
● Regression task

Table 6.1.1 : Supervised Learning Categories

Algorithm Name | Description | Type
Linear regression | Finds a way to correlate each feature to the output to help predict future values. | Regression
Logistic regression | Extension of linear regression that's used for classification tasks. The output variable is binary (e.g., only black or white) rather than continuous (e.g., an infinite list of potential colors). | Classification
Decision tree | Highly interpretable classification or regression model that splits data-feature values into branches at decision nodes (e.g., if a feature is a color, each possible color becomes a new branch) until a final decision output is made. | Regression, Classification
Naive Bayes | The Bayesian method is a classification method that makes use of the Bayesian theorem. The theorem updates the prior knowledge of an event with the independent probability of each feature that can affect the event. | Regression, Classification
Support vector machine | Support Vector Machine, or SVM, is typically used for the classification task. The SVM algorithm finds a hyperplane that optimally divides the classes. It is best used with a non-linear solver. | Regression, Classification
Random forest | The algorithm is built upon a decision tree to improve the accuracy drastically. Random forest generates many simple decision trees and uses the 'majority vote' method to decide which label to return. For the classification task, the final prediction is the one with the most votes; for the regression task, the average prediction of all the trees is the final prediction. | Regression, Classification
AdaBoost | Classification or regression technique that uses a multitude of models to come up with a decision but weighs them based on their accuracy in predicting the outcome. | Regression, Classification
Gradient-boosting trees | Gradient-boosting trees is a state-of-the-art classification/regression technique. It focuses on the error committed by the previous trees and tries to correct it. | Regression, Classification
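
The way a random forest combines its trees, as described in the table, can be illustrated with a small sketch. The per-tree outputs below are made-up values standing in for the predictions of already trained trees; the function names are hypothetical.

```python
from collections import Counter
from statistics import mean


def forest_classify(tree_votes):
    """Classification: return the label predicted by most trees (majority vote)."""
    return Counter(tree_votes).most_common(1)[0][0]


def forest_regress(tree_predictions):
    """Regression: return the average of the trees' numeric predictions."""
    return mean(tree_predictions)


# Hypothetical outputs of five classification trees and three regression trees.
label = forest_classify(["fraud", "genuine", "fraud", "fraud", "genuine"])
value = forest_regress([10.0, 12.0, 14.0])
```

Here three of the five trees vote "fraud", so that label is returned, and the regression result is the mean of the three tree outputs.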

Classification

Imagine you want to predict the gender of a customer for a commercial. You start by
gathering data on the height, weight, job, salary, purchasing basket, etc. from your
customer database. You know the gender of each of your customers, which in this example
can only be male or female. The objective of the classifier is to assign a probability
of being male or female (i.e., the label) based on the information (i.e., the features)
you have collected. Once the model has learned how to recognize male or female, you can
use new data to make a prediction. For instance, you have just received new information
from an unknown customer, and you want to know whether the customer is male or female.
If the classifier predicts male = 70%, it means the algorithm is 70% confident that
this customer is male and 30% confident that the customer is female.


The label can have two or more classes. The above example has only two classes, but if
a classifier needs to recognize objects, it may have dozens of classes (e.g., glass,
table, shoes, etc., where each object represents a class).
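
One common way for a classifier to turn a raw score into a probability like the 70% above is the logistic (sigmoid) function used by logistic regression. The sketch below assumes the score has already been computed from the features; the function names and the 0.5 decision threshold are illustrative choices.

```python
import math


def sigmoid(score):
    """Map any real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-score))


def classify(score, threshold=0.5):
    """Return the predicted class and the probability of the positive class."""
    probability = sigmoid(score)
    label = "positive" if probability >= threshold else "negative"
    return label, probability


# A score of ln(7/3) corresponds to a probability of 0.7 for the positive class.
label, probability = classify(math.log(7 / 3))
```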
Regression
When the output is a continuous value, the task is a regression. For instance, a
financial analyst may need to forecast the value of a stock based on a range of
features such as equity, previous stock performance and macroeconomic indices. The
system is trained to estimate the price of the stock with the lowest possible error.
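
As a minimal illustration of regression, the sketch below fits a straight line to a handful of points by ordinary least squares, i.e. it picks the slope and intercept that minimize the squared error. The data points are made up, and a single input feature is assumed for simplicity.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up points that lie exactly on y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)

# Predict a continuous value for an unseen input.
prediction = slope * 5 + intercept
```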
Unsupervised learning
In unsupervised learning, an algorithm explores input data without being given an
explicit output variable (e.g., it explores customer demographic data to identify
patterns).
You can use it when you do not know how to classify the data and you want the algorithm
to find patterns and group the data for you.

Table 6.1.2 : Unsupervised Learning Categories

Algorithm | Description | Type
K-means clustering | Puts data into some groups (k) that each contain data with similar characteristics (as determined by the model, not in advance by humans). | Clustering
Gaussian mixture model | A generalization of k-means clustering that provides more flexibility in the size and shape of groups (clusters). | Clustering
Hierarchical clustering | Splits clusters along a hierarchical tree to form a classification system. Can be used to cluster loyalty-card customers. | Clustering
Recommender system | Helps to define the relevant data for making a recommendation. | Clustering
PCA/T-SNE | Mostly used to decrease the dimensionality of the data. The algorithms reduce the number of features to 3 or 4 vectors with the highest variances. | Dimension Reduction
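
The k-means row of the table can be illustrated with a small one-dimensional sketch: each value is assigned to its nearest centroid, and each centroid is then moved to the mean of its group. The data, the two starting centroids and the fixed number of iterations are assumptions made for this example.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Very small k-means for one-dimensional data."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = kmeans_1d(values, centroids=[0.0, 5.0])
```

No labels are supplied: the two groups emerge from the data alone, which is what distinguishes this from the supervised examples.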

Application of Machine learning

Augmentation:

● Machine learning assists humans with their day-to-day tasks, personally or
commercially, without having complete control of the output. Such machine learning
is used in different ways, such as virtual assistants, data analysis and software
solutions. The primary use is to reduce errors due to human bias.

Automation:

● Machine learning works entirely autonomously in any field without the need for any
human intervention. For example, robots perform the essential process steps in
manufacturing plants.

Finance Industry

● Machine learning is growing in popularity in the finance industry. Banks mainly use
ML to find patterns inside the data, but also to prevent fraud.

Government organization

● The government makes use of ML to manage public safety and utilities. Take the
example of China with its massive face recognition: the government uses artificial
intelligence to deter jaywalking.


Healthcare industry

● Healthcare was one of the first industries to use machine learning, with image
detection.

Marketing

● AI is used broadly in marketing thanks to abundant access to data. Before the age of
mass data, researchers developed advanced mathematical tools like Bayesian analysis
to estimate the value of a customer. With the boom of data, marketing departments
rely on AI to optimize customer relationships and marketing campaigns.

Example of application of Machine Learning in Supply Chain

Machine learning gives very good results for visual pattern recognition, opening up
many potential applications in physical inspection and maintenance across the entire
supply chain network.

Unsupervised learning can quickly search for comparable patterns in a diverse dataset.
In turn, the machine can perform quality inspection throughout the logistics hub,
detecting shipments with damage and wear.

For instance, IBM's Watson platform can determine shipping container damage. Watson
combines visual and systems-based data to track, report and make recommendations in
real time. In past years, stock managers relied extensively on basic methods to
evaluate and forecast inventory. By combining big data and machine learning, better
forecasting techniques have been implemented (an improvement of 20 to 30% over
traditional forecasting tools). In terms of sales, this means an increase of 2 to 3%
due to the potential reduction in inventory costs.

Example of Machine Learning Google Car

For example, everybody knows the Google car. The car has lasers on the roof which tell
it where it is relative to the surrounding area. It has radar in the front, which
informs the car of the speed and motion of all the cars around it. It uses all of that
data to figure out not only how to drive the car but also to predict what the drivers
around the car are going to do. What is impressive is that the car processes almost a
gigabyte of data per second.

Reinforcement Learning

Reinforcement learning is a subfield of machine learning in which systems are trained
by receiving virtual "rewards" or "punishments," essentially learning by trial and
error. Google's DeepMind has used reinforcement learning to beat a human champion at
the game of Go. Reinforcement learning is also used in video games to improve the
gaming experience by providing smarter bots.

Some of the most famous algorithms are:

● Q-learning
● Deep Q network
● State-Action-Reward-State-Action (SARSA)
● Deep Deterministic Policy Gradient (DDPG)
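
To make the idea of learning from rewards concrete, the sketch below applies a single tabular Q-learning update, Q(s, a) ← Q(s, a) + α [r + γ max Q(s', a') − Q(s, a)]. The two states, the actions, the reward and the values of the learning rate α and discount factor γ are hypothetical and chosen only for illustration.

```python
def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Apply one Q-learning update to the table q and return the new value."""
    best_next = max(q[next_state].values())  # value of the best next action
    target = reward + gamma * best_next      # reward plus discounted future value
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


# Hypothetical Q-table with two states and two actions per state.
q = {
    "s0": {"left": 0.0, "right": 0.0},
    "s1": {"left": 0.0, "right": 1.0},
}

# The agent takes "right" in s0, receives a reward of 1.0 and lands in s1.
new_value = q_update(q, "s0", "right", reward=1.0, next_state="s1")
```

Repeating this update over many trial-and-error steps is what gradually moves the table towards values that favour rewarding actions.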

Examples of deep learning applications

AI in Finance:

The financial technology sector has already started using AI to save time, reduce
costs, and add value. Deep learning is changing the lending industry by enabling more
robust credit scoring. Credit decision-makers can use AI in credit lending applications
to achieve faster, more accurate risk assessment, using machine intelligence to factor
in the character and capacity of applicants.

Underwrite.ai is a fintech company providing an AI solution for credit-granting
companies. It uses AI to detect which applicants are more likely to pay back a loan.
Its approach is reported to outperform traditional methods.

AI in HR:


Under Armour, a sportswear company, revolutionized hiring and modernized the candidate
experience with the help of AI, reducing hiring time for its retail stores by 35%.
Under Armour faced growing interest back in 2012 and received, on average, 30,000
resumes a month. Reading all of those applications and starting the screening and
interview process was taking too long. The lengthy process to get people hired and
on-boarded affected Under Armour's ability to have its retail stores fully staffed,
ramped up and ready to operate.

At that time, Under Armour had all of the 'must have' HR technology in place, such as
transactional solutions for sourcing, applying, tracking and onboarding, but those
tools were not useful enough. Under Armour chose HireVue, a provider of AI solutions
for HR, for both on-demand and live interviews. The results were striking: the company
managed to decrease the time to fill positions by 35% and, in return, hired
higher-quality staff.

AI in Marketing:

AI is a valuable tool for customer service management and personalization challenges.
Improved speech recognition in call-center management and call routing, resulting from
the application of AI techniques, allows a more seamless experience for customers.

For example, deep-learning analysis of audio allows systems to assess a customer's
emotional tone. If the customer is responding poorly to the AI chatbot, the system can
reroute the conversation to real, human operators who take over the issue.

Apart from the three examples above, AI is widely used in other sectors and
industries.

Difference between Machine Learning and Deep Learning

Table 6.1.3 : Difference between Machine Learning and Deep Learning

Aspect | Machine Learning | Deep Learning
Data dependencies | Excellent performance on a small/medium dataset | Excellent performance on a big dataset
Hardware dependencies | Works on a low-end machine | Requires a powerful machine, preferably with a GPU: DL performs a significant amount of matrix multiplication
Feature engineering | Need to understand the features that represent the data | No need to understand the best feature that represents the data
Execution time | From a few minutes to hours | Up to weeks. A neural network needs to compute a significant number of weights
Interpretability | Some algorithms are easy to interpret (logistic, decision tree), some are almost impossible (SVM, XGBoost) | Difficult to impossible


When to use ML or DL?

In the table below, we summarize the difference between machine learning and deep
learning.

Aspect | Machine learning | Deep learning
Training dataset | Small | Large
Choose features | Yes | No
Number of algorithms | Many | Few
Training time | Short | Long

With machine learning, you need less data to train the algorithm than with deep
learning. Deep learning requires an extensive and diverse set of data to identify the
underlying structure. Besides, machine learning provides a faster-trained model: the
most advanced deep learning architectures can take days to a week to train. The
advantage of deep learning over machine learning is that it can be highly accurate.
You do not need to understand which features best represent the data; the neural
network learns how to select critical features. In machine learning, you need to choose
for yourself which features to include in the model.

PYTHON 3
Python is a high-level, interpreted, interactive and object-oriented scripting
language. Python is designed to be highly readable. It frequently uses English keywords
whereas other languages use punctuation, and it has fewer syntactical constructions
than other languages.

● Python is Interpreted: Python is processed at runtime by the interpreter. You do not
need to compile your program before executing it. This is similar to Perl and PHP.
● Python is Interactive: You can sit at a Python prompt and interact with the
interpreter directly to write your programs.
● Python is Object-Oriented: Python supports the object-oriented style or technique of
programming that encapsulates code within objects.
● Python is a Beginner's Language: Python is a great language for beginner-level
programmers and supports the development of a wide range of applications, from simple
text processing to WWW browsers to games.
History of Python:

Python was developed by Guido van Rossum in the late eighties and early
nineties at the National Research Institute for Mathematics and Computer Science in the
Netherlands.

Python is derived from many other languages, including ABC, Modula-3, C, C++, Algol-68,
SmallTalk, Unix shell, and other scripting languages.

Python is copyrighted. Its source code is available under the Python Software
Foundation License, an open-source license that is compatible with the GNU General
Public License (GPL).

Python is now maintained by a core development team, although Guido van Rossum still
holds a vital role in directing its progress.

Python Features

Python's features include:

● Easy-to-learn: Python has few keywords, a simple structure, and a clearly defined
syntax. This allows the student to pick up the language quickly.

● Easy-to-read: Python code is more clearly defined and visible to the eyes.

● Easy-to-maintain: Python's source code is fairly easy to maintain.

● A broad standard library: The bulk of Python's library is very portable and
cross-platform compatible on UNIX, Windows, and Macintosh.

● Interactive Mode: Python has support for an interactive mode which allows
interactive testing and debugging of snippets of code.

● Portable: Python can run on a wide variety of hardware platforms and has the same
interface on all platforms.

● Extendable: You can add low-level modules to the Python interpreter. These modules
enable programmers to add to or customize their tools to be more efficient.

● Databases: Python provides interfaces to all major commercial databases.

● GUI Programming: Python supports GUI applications that can be created and ported to
many system calls, libraries, and windows systems, such as Windows MFC, Macintosh,
and the X Window system of Unix.

● Scalable: Python provides a better structure and support for large programs than
shell scripting.
Apart from the above-mentioned features, Python has a long list of good features; a few
are listed below:

● It supports functional and structured programming methods as well as OOP.

● It can be used as a scripting language or can be compiled to byte-code for building
large applications.
Python libraries used (third-party packages, not built-in modules):
1. NumPy
2. Pandas
3. Matplotlib
4. Scikit-learn (sklearn)
5. Seaborn


NUMPY:
NumPy is the fundamental package for scientific computing in Python. It is a Python
library that provides a multidimensional array object, various derived objects (such as
masked arrays and matrices), and an assortment of routines for fast operations on
arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O,
discrete Fourier transforms, basic linear algebra, basic statistical operations, random
simulation and much more. At the core of the NumPy package is the ndarray object. This
encapsulates n-dimensional arrays of homogeneous data types, with many operations being
performed in compiled code for performance. There are several important differences
between NumPy arrays and the standard Python sequences:
● NumPy arrays have a fixed size at creation, unlike Python lists (which can grow
dynamically). Changing the size of an array will create a new array and delete the
original.
● The elements in a NumPy array are all required to be of the same data type, and thus
will be the same size in memory. The exception: one can have arrays of (Python,
including NumPy) objects, thereby allowing for arrays of different sized elements.
● NumPy arrays facilitate advanced mathematical and other types of operations on large
amounts of data. Typically, such operations are executed more efficiently and with
less code than is possible using Python's built-in sequences.
● A growing plethora of scientific and mathematical Python-based packages are using
NumPy arrays; though these typically support Python-sequence input, they convert such
input to NumPy arrays prior to processing, and they often output NumPy arrays. In
other words, in order to efficiently use much (perhaps even most) of today's
scientific/mathematical Python-based software, just knowing how to use Python's
built-in sequence types is insufficient: one also needs to know how to use NumPy
arrays.
The points about sequence size and speed are particularly important in scientific
computing.
As a simple example, consider the case of multiplying each element in a 1-D
sequence with the corresponding element in another sequence of the same length. If the
data are stored in two Python lists, a and b, we could iterate over each element:
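
A minimal version of that loop is shown below; the list contents are example values chosen for illustration.

```python
# Two plain Python lists of the same length (example values).
a = [1, 2, 3]
b = [4, 5, 6]

# Multiply element by element with an explicit Python loop.
c = []
for i in range(len(a)):
    c.append(a[i] * b[i])
```

This gives the correct result, but for sequences holding millions of numbers the cost of looping in Python becomes significant. If `a` and `b` were NumPy arrays instead of lists, the same operation could be written as `c = a * b`, with the element-by-element loop carried out in compiled code.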
The Numeric Python extensions (NumPy henceforth) is a set of extensions to the Python
programming language which allows Python programmers to efficiently manipulate large
sets of objects organized in grid-like fashion. These sets of objects are called
arrays, and they can have any number of dimensions: one-dimensional arrays are similar
to standard Python sequences, and two-dimensional arrays are similar to matrices from
linear algebra. Note that one-dimensional arrays are also different from any other
Python sequence, and that two-dimensional matrices are also different from the matrices
of linear algebra, in ways which we will mention later in this text. Why are these
extensions needed? The core reason is a very prosaic one: manipulating a set of a
million numbers in Python with the standard data structures such as lists, tuples or
classes is much too slow and uses too much space.
Anything which we can do in NumPy we can do in standard Python – we just may
not be alive to see the program finish. A more subtle reason for these extensions
however is that the kinds of operations that programmers typically want to do on arrays,
while sometimes very complex, can often be decomposed into a set of fairly standard
operations. This decomposition has been developed similarly in many array languages.
In some ways, NumPy is simply the application of this experience to the Python
language – thus many of the operations described in NumPy work the way they do
because experience has shown that way to be a good one, in a variety of contexts. The
languages which were used to guide the development of NumPy include the infamous
APL family of languages, Basis, MATLAB, FORTRAN, S and S+, and others. This
heritage will be obvious to users of NumPy who already have experience with these
other languages. This tutorial, however, does not assume any such background, and all
that is expected of the reader is a reasonable working knowledge of the standard Python
language. This document is the “official” documentation for NumPy. It is both a tutorial
and the most authoritative source of information about NumPy with the exception of the
source code. The tutorial material will walk you through a set of manipulations of
simple, small arrays of numbers, as well as image files. This choice was made because:

• A concrete data set makes the behavior of some functions much easier to
explain and motivate than simply talking about abstract operations on abstract data sets.

• Every reader will have at least an intuition as to the meaning of the data and the
organization of image files.

• The results of various manipulations can be displayed simply, since the data set
has a natural graphical representation.

All users of NumPy, whether interested
in image processing or not, are encouraged to follow the tutorial with a working
NumPy installation at their side, testing the examples and, more importantly,
transferring the understanding gained by working on images to their specific
domain. The best way to learn is by doing; the aim of this tutorial is to guide you
along this “doing.”
PANDAS:
Pandas is a fast, powerful, flexible and easy-to-use open-source data analysis and
manipulation library written for the Python programming language. In particular, it
offers data structures and operations for manipulating numerical tables and time
series, designed to make working with structured (tabular,
multidimensional, potentially heterogeneous) and time series data both easy and
intuitive. It aims to be the fundamental high-level building block for doing practical,
real-world data analysis in Python. Additionally, it has the broader goal of becoming the
most powerful and flexible open-source data analysis and manipulation tool available in
any language. It is already well on its way toward this goal.
Pandas is well suited for many different kinds of data:
1. Tabular data with heterogeneously-typed columns, as in an SQL table or Excel
spreadsheet.
2. Ordered and unordered (not necessarily fixed-frequency) time series data.
3. Arbitrary matrix data (homogeneously typed or heterogeneous) with row and
column labels.
4. Any other form of observational / statistical data sets. The data need not
be labelled at all to be placed into a pandas data structure.

The two primary data structures of pandas, Series (1-dimensional) and
DataFrame (2-dimensional), handle the vast majority of typical use cases in finance,
statistics, social science, and many areas of engineering. For R users, DataFrame
provides everything that R’s data.frame provides and much more. Pandas is built on top of
NumPy and is intended to integrate well within a scientific computing
environment with many other third-party libraries.
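As a minimal sketch of the two data structures, the snippet below builds a Series and a DataFrame. The column names echo the fraud dataset used later (Amount, Class), but the values and the Merchant column are invented for illustration.

```python
import pandas as pd

# A Series is a labelled one-dimensional array.
amounts = pd.Series([120.0, 35.5, 980.0], index=["t1", "t2", "t3"], name="Amount")

# A DataFrame is a labelled two-dimensional table; columns may have different types.
transactions = pd.DataFrame(
    {
        "Amount": [120.0, 35.5, 980.0],
        "Merchant": ["grocer", "cafe", "electronics"],  # hypothetical column
        "Class": [0, 0, 1],
    },
    index=["t1", "t2", "t3"],
)

print(amounts["t2"])                     # label-based access on a Series
print(transactions.dtypes)               # one dtype per column
print(transactions.loc["t3", "Class"])   # row label + column label
```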
Here are just a few of the things that pandas does well:


1. Easy handling of missing data (represented as NaN) in floating point as well as
non-floating point data
2. Size mutability: columns can be inserted and deleted from DataFrame and
higher dimensional objects
3. Automatic and explicit data alignment: objects can be explicitly aligned to a
set of labels, or the user can simply ignore the labels and let Series, DataFrame,
etc. automatically align the data in computations
4. Powerful, flexible group-by functionality to perform split-apply-combine
operations on data sets, for both aggregating and transforming data
5. Easy conversion of ragged, differently-indexed data in other Python and
NumPy data structures into DataFrame objects
6. Intelligent label-based slicing, fancy indexing, and subsetting of large data sets
7. Intuitive merging and joining of data sets
8. Flexible reshaping and pivoting of data sets
9. Hierarchical labelling of axes (possible to have multiple labels per tick)
10. Robust IO tools for loading data from flat files (CSV and delimited), Excel files,
and databases, and for saving and loading data in the ultrafast HDF5 format
11. Time series-specific functionality: date range generation and frequency
conversion, moving window statistics, date shifting and lagging.
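A few of the features listed above (missing-data handling, group-by, and size mutability) can be illustrated on a toy table. The data values and the LogAmount column are assumptions made for this sketch and are not taken from the project dataset.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Class": [0, 0, 1, 1, 0],
        "Amount": [10.0, np.nan, 250.0, 400.0, 30.0],
    }
)

# Missing data: NaN marks the absent value; fill it with the column median.
n_missing = int(df["Amount"].isna().sum())
df["Amount"] = df["Amount"].fillna(df["Amount"].median())

# Split-apply-combine: mean amount per class.
mean_by_class = df.groupby("Class")["Amount"].mean()

# Size mutability: insert a derived column, then delete it again.
df["LogAmount"] = np.log1p(df["Amount"])
df = df.drop(columns=["LogAmount"])

print(n_missing)
print(mean_by_class)
```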

MATPLOTLIB:
Matplotlib is a comprehensive library for creating static, animated, and
interactive visualizations in Python. It produces publication-quality figures in a
variety of hardcopy formats and interactive environments across platforms.
Matplotlib can be used in Python scripts, the Python and IPython shells, web
application servers, and various graphical user interface toolkits.

SEABORN:

Seaborn is a library for making statistical graphics in Python. It is built on top of
matplotlib and closely integrated with pandas data structures. Here is some of the
functionality that seaborn offers:

1. A dataset-oriented API for examining relationships between multiple variables
2. Specialized support for using categorical variables to show observations or
aggregate statistics
3. Options for visualizing univariate or bivariate distributions and for comparing
them between subsets of data
4. Automatic estimation and plotting of linear regression models for different
kinds of dependent variables
5. Convenient views onto the overall structure of complex datasets
6. High-level abstractions for structuring multi-plot grids that let you easily build
complex visualizations
7. Concise control over matplotlib figure styling with several built-in themes
8. Tools for choosing color palettes that faithfully reveal patterns in your data

Seaborn aims to make visualization a central part of exploring and
understanding data. Its dataset-oriented plotting functions operate on dataframes
and arrays containing whole datasets and internally perform the necessary
semantic mapping and statistical aggregation to produce informative plots.


6.2. SOURCE CODE:

# Imports
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix

np.random.seed(2)

# Load the transactions. The original listing never defines `data`; the file
# name below is an assumption and should be replaced with the actual path.
data = pd.read_csv('creditcard.csv')
data.head(130000)

# Standardize the transaction amount, then drop the raw Amount and Time columns.
data['normalizedAmount'] = StandardScaler().fit_transform(data['Amount'].values.reshape(-1, 1))
data = data.drop(['Amount'], axis=1)
data.head()
data = data.drop(['Time'], axis=1)
data.head()

# Separate the features (X) from the target label (y).
X = data.iloc[:, data.columns != 'Class']
y = data.iloc[:, data.columns == 'Class']
y.head()

# Hold out 30% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
X_train.shape
X_test.shape

# Train a random forest of 100 trees and score it on the held-out test set.
random_forest = RandomForestClassifier(n_estimators=100)
random_forest.fit(X_train, y_train.values.ravel())
y_pred = random_forest.predict(X_test)
random_forest.score(X_test, y_test)

# Confusion matrix on the test set.
cnf_matrix = confusion_matrix(y_test, y_pred)
labels = [0, 1]
sns.heatmap(cnf_matrix, annot=True, cmap="YlGnBu", fmt=".3f", xticklabels=labels, yticklabels=labels)
plt.show()

# Confusion matrix on the full dataset. This includes the training rows, so it
# overstates performance on unseen data.
y_pred = random_forest.predict(X)
print(y_pred)
cnf_matrix = confusion_matrix(y, y_pred.round())
sns.heatmap(cnf_matrix, annot=True, cmap="YlGnBu", fmt=".3f", xticklabels=labels, yticklabels=labels)
plt.show()

CHAPTER 7

SYSTEM TESTING

7. SYSTEM TESTING
The purpose of testing is to discover errors. Testing is the process of trying to
discover every conceivable fault or weakness in a work product. It provides a way to
check the functionality of components, sub-assemblies, assemblies and/or a finished
product. It is the process of exercising software with the intent of ensuring that the
software system meets its requirements and user expectations and does not fail in an
unacceptable manner.
There are various types of tests; each test type addresses a specific testing requirement.

7.1. TESTING ACTIVITIES


7.1.1. Unit Testing
Unit testing involves the design of test cases that validate that the internal
program logic is functioning properly, and that program inputs produce valid outputs.
All decision branches and internal code flow should be validated. It is the testing of
individual software units of the application. It is done after the completion of an
individual unit and before integration. This is structural testing that relies on knowledge of
the unit's construction and is invasive. Unit tests perform basic tests at component level and
test a specific business process, application, and/or system configuration. Unit tests
ensure that each unique path of a business process performs accurately to the
documented specifications and contains clearly defined inputs and expected results.
Test strategy and approach
Field testing will be performed manually and functional tests will be written in detail.
Test objectives
• All field entries must work properly.
• Pages must be activated from the identified link.
• The entry screen, messages and responses must not be delayed.
Features to be tested
• Verify that the entries are of the correct format
• No duplicate entries should be allowed
• All links should take the user to the correct page.
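For the fraud detection code itself, a unit test might exercise one step in isolation. The sketch below uses Python's standard unittest module on a hypothetical helper, standardize(), that mimics the amount-normalization step; the helper and the test names are assumptions for illustration and are not part of the project's code.

```python
import math
import unittest


def standardize(values):
    """Hypothetical helper: rescale values to zero mean and unit variance."""
    if not values:
        raise ValueError("values must not be empty")
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(variance)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


class StandardizeTest(unittest.TestCase):
    def test_zero_mean_unit_variance(self):
        scaled = standardize([10.0, 20.0, 30.0, 40.0])
        self.assertAlmostEqual(sum(scaled) / len(scaled), 0.0)
        self.assertAlmostEqual(sum(s * s for s in scaled) / len(scaled), 1.0)

    def test_constant_input_maps_to_zeros(self):
        self.assertEqual(standardize([5.0, 5.0]), [0.0, 0.0])

    def test_empty_input_is_rejected(self):
        with self.assertRaises(ValueError):
            standardize([])


# Run the test case explicitly so the snippet also works inside a notebook.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(StandardizeTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Each test method covers one decision branch of the helper (normal input, constant input, empty input), which is the branch coverage the paragraph above asks for.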


7.1.2. Integration testing

Integration tests are designed to test integrated software components to
determine if they actually run as one program. Testing is event driven and is more
concerned with the basic outcome of screens or fields. Integration tests demonstrate that
although the components were individually satisfactory, as shown by successful unit
testing, the combination of components is correct and consistent. Integration testing is
specifically aimed at exposing the problems that arise from the combination of
components.

7.1.3. Functional testing

Functional tests provide systematic demonstrations that the functions tested are
available as specified by the business and technical requirements, system
documentation, and user manuals.

Functional testing is centred on the following items:

• Valid Input : identified classes of valid input must be accepted.
• Invalid Input : identified classes of invalid input must be rejected.
• Functions : identified functions must be exercised.
• Output : identified classes of application outputs must be exercised.
• Systems/Procedures : interfacing systems or procedures must be invoked.
Organization and preparation of functional tests is focused on requirements, key
functions, or special test cases. In addition, systematic coverage pertaining to identified
business process flows, data fields, predefined processes, and successive processes must
be considered for testing. Before functional testing is complete, additional tests are
identified and the effective value of current tests is determined.
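The valid-input and invalid-input items above can be sketched as a small functional check. The validate_transaction() function and its rules are hypothetical; they only assume that a transaction record carries numeric Time and Amount fields, as the dataset columns used in the source code suggest.

```python
REQUIRED_FIELDS = ("Time", "Amount")  # assumed subset of the dataset's columns


def validate_transaction(record):
    """Hypothetical input check: return a list of problems (empty if valid)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in record:
            problems.append(f"missing field: {field}")
        elif isinstance(record[field], bool) or not isinstance(record[field], (int, float)):
            problems.append(f"non-numeric field: {field}")
    if not problems and record["Amount"] < 0:
        problems.append("Amount must not be negative")
    return problems


# One class of valid input and three classes of invalid input.
valid_case = {"Time": 0.0, "Amount": 149.62}
invalid_cases = [
    {"Time": 0.0},                      # missing Amount
    {"Time": 0.0, "Amount": "149.62"},  # Amount given as text
    {"Time": 0.0, "Amount": -5.0},      # negative Amount
]

accepted = validate_transaction(valid_case) == []
rejected = [bool(validate_transaction(case)) for case in invalid_cases]
print(accepted, rejected)
```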
7.1.4. System Testing
System testing ensures that the entire integrated software system meets
requirements. It tests a configuration to ensure known and predictable results. An
example of system testing is the configuration-oriented system integration test. System
testing is based on process descriptions and flows, emphasizing pre-driven process links
and integration points.


7.1.5. Acceptance Testing

User Acceptance Testing is a critical phase of any project and requires
significant participation by the end user. It also ensures that the system meets the
functional requirements.
Test Results: All the test cases mentioned above passed successfully. No defects
were encountered.

7.2. TYPES OF TESTING


7.2.1. White Box Testing
White Box Testing is testing in which the software tester has
knowledge of the inner workings, structure and language of the software, or at least its
purpose. It is used to test areas that cannot be reached from a black box
level.
Simpler models such as linear regression and decision trees, on the other hand,
provide less predictive capacity and are not always capable of modelling the inherent
complexity of the dataset (i.e. feature interactions). They are, however, significantly
easier to explain and interpret. White-box models are models whose behavior and
predictions can be clearly explained.
7.2.2. Black Box Testing
Black Box Testing is testing the software without any knowledge of the inner
workings, structure or language of the module being tested. Black box tests, like most
other kinds of tests, must be written from a definitive source document, such as a
specification or requirements document. It is testing in which the software under test
is treated as a black box: you
cannot “see” into it. The test provides inputs and responds to outputs without
considering how the software works.
Black-box models such as neural networks, gradient boosting models or
complicated ensembles often provide great accuracy. The inner workings of these
models are harder to understand, and they don’t provide a direct estimate of the importance of
each feature on the model predictions. With black-box models, users can only observe the
input-output relationship: for example, a customer profile goes in and a customer
churn propensity score comes out, but the underlying reasons or processes that produce
the output are not available. Black-box models often result in 1% to 3% better accuracy than
white-box models, but at the cost of transparency and accountability.

CHAPTER 8

OUTPUT SCREEN SHOTS



8. OUTPUT SCREEN SHOTS


data.head()

Figure 8.1: Rows and columns in dataset

Figure 8.2: Normalized amount in dataset


Figure 8.3: Class values

Figure 8.4: Train and test data

Figure 8.5: Confusion matrix of y_test and y_pred


Figure 8.6: Confusion matrix of y and y_pred (full dataset)

CHAPTER 9

CONCLUSION

9. CONCLUSION
In this project, machine learning techniques, namely logistic regression, decision
tree and random forest, were used to detect fraud in a credit card system. Sensitivity,
specificity, accuracy and error rate are used to evaluate the performance of the
proposed system. The accuracies of the logistic regression, decision tree and random forest
classifiers are 90.0%, 94.3%, and 99.9% respectively. Comparing the three methods, we
found that the random forest classifier performs better than logistic regression and the decision tree.
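The four evaluation measures named above all follow from the cells of a 2x2 confusion matrix. The sketch below shows the arithmetic; the function name is our own, and the counts passed to it are illustrative only, not results from this project.

```python
def classification_metrics(tn, fp, fn, tp):
    """Compute the four measures from the cells of a 2x2 confusion matrix."""
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    return {
        "sensitivity": tp / (tp + fn),  # share of actual frauds that were caught
        "specificity": tn / (tn + fp),  # share of genuine transactions passed
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,
    }


# Illustrative counts only; these are not results from the report.
metrics = classification_metrics(tn=990, fp=2, fn=3, tp=5)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With scikit-learn, the four cells can be read from a binary confusion matrix as tn, fp, fn, tp = cnf_matrix.ravel(). On heavily imbalanced fraud data, accuracy alone can look high even when sensitivity is low, which is why the measures are reported together.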

CHAPTER 10

FUTURE ENHANCEMENTS

10. FUTURE ENHANCEMENT


While we couldn’t reach our goal of 100% accuracy in fraud detection, we did
end up creating a system that can, with enough time and data, get very close to that goal.
As with any such project, there is some room for improvement here.

The very nature of this project allows for multiple algorithms to be integrated
together as modules and their results can be combined to increase the accuracy of the
final result.

This model can further be improved with the addition of more algorithms into it.
However, the output of these algorithms needs to be in the same format as the others.
Once that condition is satisfied, the modules are easy to add as done in the code. This
provides a great degree of modularity and versatility to the project.
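One simple way to combine module outputs that share a format is a majority vote over their predicted labels. The sketch below is an assumption about how such a combination could look, not the method used in this project; the three prediction lists are hypothetical.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists (same length, same label format) by majority vote."""
    lengths = {len(p) for p in predictions_per_model}
    if len(lengths) != 1:
        raise ValueError("all models must predict the same number of samples")
    combined = []
    for sample_votes in zip(*predictions_per_model):
        label, _ = Counter(sample_votes).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical outputs of three modules for five transactions (1 = fraud).
logistic_regression = [0, 0, 1, 0, 1]
decision_tree = [0, 1, 1, 0, 0]
random_forest = [0, 0, 1, 1, 1]

final = majority_vote([logistic_regression, decision_tree, random_forest])
print(final)
```

Using an odd number of binary classifiers avoids ties; a new module can be added by appending its label list, provided it uses the same 0/1 encoding.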

More room for improvement can be found in the dataset. As demonstrated
before, the precision of the algorithms increases when the size of the dataset is increased.
Hence, more data should make the model more accurate in detecting frauds and
reduce the number of false positives. However, this requires official support from the
banks themselves.

CHAPTER 12

APPENDIX

12. APPENDIX

12.1. INSTALLATION MANUAL

ANACONDA NAVIGATOR

Anaconda Navigator is a desktop graphical user interface (GUI) included in the
Anaconda distribution that allows you to launch applications and easily manage conda
packages, environments and channels without using command-line commands. Navigator
can search for packages on Anaconda Cloud or in a local Anaconda Repository. It is
available for Windows, macOS and Linux.

Why use Navigator?

In order to run, many scientific packages depend on specific versions of
other packages. Data scientists often use multiple versions of many packages, and use
multiple environments to separate these different versions.

The command line program conda is both a package manager and an
environment manager, which helps data scientists ensure that each version of each package has
all the dependencies it requires and works correctly.

Navigator is an easy, point-and-click way to work with packages and
environments without needing to type conda commands in a terminal window. You
can use it to find the packages you want, install them in an environment, run the packages
and update them, all inside Navigator.

WHAT APPLICATIONS CAN I ACCESS USING NAVIGATOR?

The following applications are available by default in Navigator:

• Jupyter Notebook
• QTConsole
• Spyder
• VSCode
• Glueviz
• Orange 3 App
• Rodeo
• RStudio

Advanced conda users can also build their own Navigator applications.

Python Installation Tutorial

In this tutorial, we will show how to install Python on your system. Python is free of
charge.

Step 1: Open your browser and go to the Anaconda
website (https://www.anaconda.com/distribution/) to download and install Anaconda.
You will see a page like this. Click on download.

Step 2: You will see that the following page appears. By default, Anaconda shows you the
download link for the Mac operating system. If you have a Mac, you can click “64-Bit
Graphical Installer” under the Python 3.7 version to start downloading the file. In this
computer, Windows is the operating system, so we will select Windows as shown below.
If you have Linux as your operating system, you can select the Linux option and download
the file in a similar manner as for Mac. Mac and Linux users can skip Steps 3 and 4.


After clicking on Windows button, the following page will appear.

Step 3: You can see that there are two options for Windows: 64-Bit and 32-Bit. You need
to find out whether your system is 64-Bit or 32-Bit and accordingly you need to select the
file for your system. To do so, go to your desktop home screen, right click on
‘Computer’ icon, then select Properties.


This will show you basic information about your system. Look for “System Type” as
shown below and check whether it is 64-bit or 32-bit. For this computer, we see that
the Windows system type is 64-bit.

Step 4: Now, go back to your browser and then click “64-Bit Graphical Installer (662
MB)” as this computer is 64 bit (as identified in Step 3)


The installer will start downloading (this may take a while), and the file will appear at the
bottom left of your browser (if you are using Google Chrome) as shown below.

Step 5: When the file is completely downloaded, click on the file. You will see that the
following window appears. Click on ‘Run’, and then click the ‘Next’ button.


A new window will appear asking you to accept the terms of agreement;
select “I Agree”.

Select ‘Just Me’, which is recommended, and then click Next.


Step 6: Make sure you have the required free space for software installation, which you
can check as shown below. Then click Next. (If you don’t have the required space, you
need to delete some of your items to free up space.)

Step 7: You will see that the following window appears. Click on Install.


This will lead you to installation page showing the progress of installation. It will take
some time for the software to get installed.

After all the files are extracted, the “Next” button will get enabled. Click on Next button.


Then the following window will appear. Click on the Finish button to complete the installation.
Anaconda has now been installed on your computer.

Step 8: Type ‘anaconda navigator’ in search box and click on the icon indicated
below.


Step 9: You will see that the Anaconda Navigator icon appears on the
bottom toolbar. Click on the icon to see the contents of Navigator.

You will see that the following page appears, showing the different options available
which you can use. For CRE, we need Spyder. So, click on ‘Launch’ under the
Spyder section to install Spyder on your computer.


Step 10: Type ‘spyder’ in search box and click on the icon indicated below.


A pop-up window will appear asking your permission to allow access to Python.
Click on “Allow access”.

Step 11: The following window will appear showing the Spyder interface. Now, you
are ready to run Python LEP codes or create a new Python code.


12.2. USER MANUAL:

Python is a high-level, interpreted, interactive and object-oriented scripting
language. Python is designed to be highly readable. It frequently uses English keywords
whereas other languages use punctuation, and it has fewer syntactical constructions than
other languages.

• Python is Interpreted: Python is processed at runtime by the interpreter. You
do not need to compile your program before executing it. This is similar to Perl
and PHP.
• Python is Interactive: You can actually sit at a Python prompt and interact
with the interpreter directly to write your programs.
• Python is Object-Oriented: Python supports the object-oriented style or
technique of programming that encapsulates code within objects.
• Python is a Beginner's Language: Python is a great language for
beginner-level programmers and supports the development of a wide range of
applications from simple text processing to WWW browsers to games.

History of Python:
Python was developed by Guido van Rossum in the late eighties and early nineties
at the National Research Institute for Mathematics and Computer Science in the
Netherlands.

Python is derived from many other languages, including ABC, Modula-3, C, C++,
Algol-68, SmallTalk, Unix shell, and other scripting languages.

Python is copyrighted. Its source code is available under the Python Software
Foundation License, an open-source license that is compatible with the GNU General
Public License (GPL).

Python is now maintained by a core development team, and Guido van Rossum
directed its progress for many years.

Python Features

Python's features include:


• Easy-to-learn: Python has few keywords, a simple structure, and a clearly
defined syntax. This allows the student to pick up the language quickly.

• Easy-to-read: Python code is clearly defined and easy on the eyes.

• Easy-to-maintain: Python's source code is fairly easy to maintain.

• A broad standard library: The bulk of Python's library is very portable and
cross-platform compatible on UNIX, Windows, and Macintosh.

• Interactive Mode: Python has support for an interactive mode which allows
interactive testing and debugging of snippets of code.

• Portable: Python can run on a wide variety of hardware platforms and has the
same interface on all platforms.

• Extendable: You can add low-level modules to the Python interpreter. These
modules enable programmers to add to or customize their tools to be more
efficient.

• Databases: Python provides interfaces to all major commercial databases.

• GUI Programming: Python supports GUI applications that can be created
and ported to many system calls, libraries, and windows systems, such as
Windows MFC, Macintosh, and the X Window system of Unix.

• Scalable: Python provides a better structure and support for large programs than
shell scripting.


Apart from the above-mentioned features, Python has a long list of good features; a few are
listed below:

• It supports functional and structured programming methods as well as OOP.
• It can be used as a scripting language or can be compiled to byte-code for building
large applications.
• It provides very high-level dynamic data types and supports dynamic type
checking.
• It supports automatic garbage collection.
• It can be easily integrated with C, C++, COM, ActiveX, CORBA, and Java.

Python libraries used:

1. NumPy
2. Pandas
3. Matplotlib
4. Scikit-learn (sklearn)
5. Seaborn
NUMPY:
NumPy is the fundamental package for scientific computing in Python. It is a Python
library that provides a multidimensional array object, various derived objects (such as
masked arrays and matrices), and an assortment of routines for fast operations on arrays,
including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete
Fourier transforms, basic linear algebra, basic statistical operations, random simulation
and much more. At the core of the NumPy package is the array object. This encapsulates
n-dimensional arrays of homogeneous data types, with many operations being performed in
compiled code for performance. There are several important differences between NumPy
arrays and the standard Python sequences:

• NumPy arrays have a fixed size at creation, unlike Python lists (which can grow
dynamically). Changing the size of an array will create a new array and delete the original.


 The elements in a NumPy array are all required to be of the same data type, and
thus will be the same size in memory. The exception: one can have arrays of
(Python, including NumPy) objects, thereby allowing for arrays of different sized
elements.

 NumPy arrays facilitate advanced mathematical and other types of operations on


large numbers of data. Typically, such operations are executed more efficiently
and with less code than is possible using Python’s built-in sequences.

 A growing plethora of scientific and mathematical Python-based packages are
using NumPy arrays; though these typically support Python-sequence input, they
convert such input to NumPy arrays prior to processing, and they often output
NumPy arrays. In other words, in order to efficiently use much (perhaps even
most) of today’s scientific/mathematical Python-based software, just knowing
how to use Python’s built-in sequence types is insufficient; one also needs to
know how to use NumPy arrays.

 The points about sequence size and speed are particularly important in scientific
computing.
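
The first two differences listed above (fixed size and a single element type) can be
checked in a short session. The following is a minimal sketch; the variable names and
sample values are chosen here for illustration and are not part of the project code.

```python
import numpy as np

# A NumPy array has a fixed size; "growing" it returns a new array object.
a = np.array([1, 2, 3])
b = np.append(a, 4)          # a is left unchanged; b is a separate, larger array

print(a.shape, b.shape)      # (3,) (4,)
print(b is a)                # False

# All elements share one dtype, so each element occupies the same number of bytes.
f = np.array([1, 2.5, 3])    # the integers are upcast to float64
print(f.dtype, f.itemsize)   # float64 8

# The exception: an object array stores references to arbitrary Python objects.
o = np.array([1, "two", 3.0], dtype=object)
print(o.dtype)               # object
```

Because np.append copies the data into a new array on every call, repeatedly
appending inside a loop is usually slower than collecting values in a list and
converting once.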
As a simple example, consider the case of multiplying each element in a 1-D
sequence with the corresponding element in another sequence of the same length. If the
data are stored in two Python lists, a and b, we could iterate over each element:
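
A minimal sketch of such a loop, alongside the equivalent NumPy expression, is given
here. The contents of a and b are sample values assumed for illustration.

```python
import numpy as np

# Two sequences of the same length (sample values, chosen for illustration).
a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]

# Pure-Python version: iterate over each index and multiply the pair.
c = []
for i in range(len(a)):
    c.append(a[i] * b[i])

# NumPy version: one expression multiplies the arrays element by element,
# with the loop running in compiled code.
c_np = np.array(a) * np.array(b)

print(c)               # [4.0, 10.0, 18.0]
print(c_np.tolist())   # [4.0, 10.0, 18.0]
```

Both versions produce the same values; for sequences with millions of elements the
array form is typically much faster because the per-element work is not interpreted.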
The Numeric Python extensions (NumPy henceforth) are a set of extensions to the
Python programming language which allow Python programmers to efficiently
manipulate large sets of objects organized in a grid-like fashion. These sets of objects are
called arrays, and they can have any number of dimensions: one-dimensional arrays are
similar to standard Python sequences, and two-dimensional arrays are similar to matrices
from linear algebra. Note that one-dimensional arrays are also different from any other
Python sequence, and that two-dimensional arrays are also different from the matrices of
linear algebra, in ways which we will mention later in this text. Why are these extensions
needed? The core reason is a very prosaic one: manipulating a set of a million numbers in
Python with the standard data structures such as lists, tuples or classes is much too slow
and uses too much space.
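
The specific differences are not spelled out at this point in the text. As an
illustration only, the sketch below shows two commonly cited ones: arithmetic on a
one-dimensional array acts on every element (whereas * on a list repeats it), and * on
two-dimensional arrays is element-wise rather than a matrix product. The array names
and values are assumptions made for the example.

```python
import numpy as np

v = np.arange(6)              # one-dimensional array: 0, 1, 2, 3, 4, 5
m = v.reshape(2, 3)           # the same data arranged as a 2 x 3 grid
print(v.ndim, m.ndim)         # 1 2
print(m.shape)                # (2, 3)

# Unlike a list, arithmetic on an array acts on every element.
print((v * 2).tolist())       # [0, 2, 4, 6, 8, 10]
print([0, 1, 2] * 2)          # [0, 1, 2, 0, 1, 2]  (list repetition)

# Unlike a linear-algebra matrix, * on 2-D arrays is element-wise;
# the matrix product is written with the @ operator.
p = np.array([[1, 2], [3, 4]])
q = np.array([[0, 1], [1, 0]])
print((p * q).tolist())       # [[0, 2], [3, 0]]
print((p @ q).tolist())       # [[2, 1], [4, 3]]
```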


This decomposition has been developed similarly in many array languages. In some
ways, NumPy is simply the application of this experience to the Python language – thus
many of the operations described in NumPy work the way they do because experience has
shown that way to be a good one, in a variety of contexts. The languages which were used
to guide the development of NumPy include the infamous APL family of languages,
Basis, MATLAB, FORTRAN, S and S+, and others. This heritage will be obvious to
users of NumPy who already have experience with these other languages. This tutorial,
however, does not assume any such background, and all that is expected of the reader is a
reasonable working knowledge of the standard Python language. This document is the
“official” documentation for NumPy. It is both a tutorial and the most authoritative source
of information about NumPy with the exception of the source code. The tutorial material
will walk you through a set of manipulations of simple, small arrays of numbers, as well
as image files. This choice was made because:

• A concrete data set makes explaining the behavior of some functions much easier to
motivate than simply talking about abstract operations on abstract data sets.

• Every reader will have at least an intuition as to the meaning of the data and
organization of image files.

• The result of various manipulations can be displayed simply, since the data set has
a natural graphical representation.

All users of NumPy, whether interested in image processing or not, are encouraged to
follow the tutorial with a working NumPy installation at their side, testing the
examples and, more importantly, transferring the understanding gained by working on
images to their specific domain. The best way to learn is by doing; the aim of this
tutorial is to guide you along this “doing.”

PO MAPPING

PROGRAM OUTCOMES        DESCRIPTION

Engineering knowledge
Problem analysis
Design/development of solutions
Conduct investigation of complex problems
Modern tool usage
The engineer and society
Environment and sustainability
Ethics
Individual and team work
Communication
Project management and finance
Life-long learning
