ON
A MACHINE LEARNING MODEL FOR
ONLINE FRAUD DETECTION
Submitted in partial fulfillment of the requirements for the award of the degree
Bachelor of Technology
in
COMPUTER SCIENCE & ENGINEERING
Submitted by BATCH-2
AVULAPATI LAHARI 18G01A0509
MANGATI LAHARI 18G01A0549
A.R.KALAI SELVAN 18G01A0503
D.HEMA SAMEERA 18G01A0520
Under the guidance of
Dr. V. Janardhan Babu, M. Tech, Ph. D.
www.svpcet.org
(2018-2022)
Institute Vision and Mission
Vision: To emerge as a Centre of Excellence for Learning and Research in the domains
of Engineering, Technology, Computing and Management.
Mission:
M1: To provide a congenial academic ambience with state-of-the-art resources for learning
and research.
M3: Unleash and encourage the innate potential and creativity of students.
M5: Foster an enterprising spirit among students and work collaboratively with technical
Institutes / Universities / Industries of National and International repute.
CSE Department
Vision: To contribute to society through excellence in Computer Science and
Engineering with a deep passion for wisdom, culture and values.
Mission:
M1: Provide congenial academic ambience with necessary infrastructure and learning
resources.
M2: Inculcate confidence to face and experience new challenges from industry and
society.
PEO1: Excel in Computer Science and Engineering program through quality studies,
enabling success in computing industry.
PEO2: Surpass in one’s career by critical thinking towards successful services and
growth of the organization, or as an entrepreneur or in higher studies. (Successful Career
Goals).
PEO3: Enhance knowledge by updating advanced technological concepts for facing the
rapidly changing world and contribute to society through innovation and creativity
(Continuing Education and Contribution to Society).
PSO1: Have the ability to understand, analyse and develop computer programs in areas
such as algorithms, system software, web design, big data analytics, and networking.
PSO2: Deploy the modern computer languages, environment, and platforms in creating
innovative products and solutions.
SRI VENKATESA PERUMAL COLLEGE OF ENGINEERING AND
TECHNOLOGY (AUTONOMOUS)
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
CERTIFICATE
*************
This is to certify that the Major Project Phase-II report entitled “A MACHINE
LEARNING MODEL FOR ONLINE FRAUD DETECTION” is being
submitted by members of batch no: CS18A2
DECLARATION BY PROJECT GUIDE
PROJECT GUIDE
Professor
DECLARATION BY PROJECT MEMBERS
(18G01A0509) (18G01A0549)
(18G01A0503) (18G01A0520)
Place:
Date:
ACKNOWLEDGEMENT
The satisfaction and euphoria that accompany the successful completion of any task
would be incomplete without the mention of the people who made it possible,
whose constant guidance and encouragement crowned all our efforts with success.
We wish to express our deepest sense of gratitude and pay our sincere thanks to our
project phase-II guide Dr. V. JANARDHAN BABU, M. Tech, Ph. D., Head of the
Department of CSE, who evinced keen interest in our efforts and provided his
valuable guidance throughout our project work.
We also express our sincere gratitude to Dr. V. JANARDHAN BABU, M. Tech, Ph. D.,
HOD of CSE, for his great encouragement and valuable support throughout our study.
We owe our gratitude to our principal Dr. T. SUNIL KUMAR REDDY, M. Tech., Ph.D.,
for his kind attention and the valuable guidance given to us throughout this course.
We sincerely and wholeheartedly thank our beloved Sri. RAVURI. V. BALAJI,
Vice-Chairman, for providing state-of-the-art infrastructure facilities to us throughout
our course of study, leading to the successful completion of our project.
We are also thankful to all staff members of the CSE Department for helping us to
complete this project work by giving valuable suggestions.
We would like to thank the members of our families who supported the preparation of
this report financially.
Last but not least, we express our sincere thanks to all our friends who
supported us in the accomplishment of this project.
ABSTRACT
Our project focuses mainly on credit card fraud detection in the real world. First, we
collect credit card datasets to use as the training dataset. User credit card queries are
then provided as the testing dataset. The Random Forest algorithm classifies the current
data provided by the user using the already analysed dataset, and the accuracy of the
result is then optimised. Finally, some of the provided attributes are processed, and the
transactions affected by fraud can be identified through graphical model visualization.
The performance of the techniques is evaluated based on accuracy, sensitivity,
specificity and precision. The results indicate an optimal accuracy of 98.6% for
Random Forest.
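The evaluation described in the abstract can be sketched in Python with scikit-learn. This is a minimal illustration under stated assumptions, not the project's actual code: the dataset is synthetic (generated with `make_classification` to mimic a rare fraud class), and the hyperparameters are hypothetical. Sensitivity, specificity and precision are derived from the confusion matrix.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for a credit card dataset (assumption:
# roughly 2% of transactions are fraudulent, labelled 1).
X, y = make_classification(n_samples=5000, n_features=10, n_informative=6,
                           weights=[0.98, 0.02], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

# Hypothetical hyperparameters; they would need tuning on real data.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# With labels=[0, 1], ravel() yields tn, fp, fn, tp in that order.
tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()


def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


metrics = {
    "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
    "sensitivity": safe_div(tp, tp + fn),   # recall on the fraud class
    "specificity": safe_div(tn, tn + fp),   # recall on the legitimate class
    "precision": safe_div(tp, tp + fp),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

On a highly imbalanced dataset, accuracy alone can look high even for a model that misses most frauds, which is why sensitivity and specificity are worth reporting alongside it.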
CONTENTS
TITLE PAGE i
CERTIFICATE iii
ACKNOWLEDGEMENT v
ABSTRACT vi
1 INTRODUCTION 1
2 LITERATURE SURVEY 4
3 SYSTEM REQUIREMENT ANALYSIS 10
4 FEASIBILITY STUDY 12
5 SYSTEM DESIGN 14
6 CODING 20
6.1 IMPLEMENTATION 21
6.2 SOURCE CODE 41
7 SYSTEM TESTING 43
8 OUTPUT SCREENSHOTS 47
9 CONCLUSION 49
10 FUTURE ENHANCEMENT 50
11 BIBLIOGRAPHY 51
11.1 REFERENCES 51
12 APPENDIX 52
CHAPTER 1
INTRODUCTION
A MACHINE LEARNING MODEL FOR
ONLINE FRAUD DETECTION INTRODUCTION
1. INTRODUCTION
Fraudulent credit card transactions cause billions of dollars of losses every
year. Fraud is as old as humanity itself and can take an unlimited variety of
forms. The PwC Global Economic Crime Survey of 2017 suggests that
approximately 48% of organizations experienced economic crime. There is therefore
a clear need to solve the problem of credit card fraud detection. Moreover, the
development of new technologies provides additional ways in which criminals may
commit fraud. The use of credit cards is prevalent in modern society, and credit card
fraud has kept growing in recent years. The resulting financial losses affect not only
merchants and banks but also the individuals who use the cards. Fraud may also damage
the reputation and image of a merchant, causing non-financial losses that, though
difficult to quantify in the short term, may become visible in the long term. For example,
if a cardholder is the victim of fraud with a certain company, he may no longer trust its
business and may choose a competitor.
1.1. Problem Definition
Credit card fraud is the illicit use of credit card data to make purchases charged to
the card. When the user pays with the card, the fraudster traces the password or other
important user details and then uses them to spend the card's funds in their own
transactions, and the fraudster is difficult to identify. Credit card transactions are
completed either physically or virtually. In a physical transaction the card itself is
presented during the exchange, whereas a virtual transaction uses the card details only
over the phone or the web.
In virtual transactions, cardholders typically provide important details such as the card
number, expiry date and card verification number over the phone or the web. Credit card
use keeps increasing: the number of transactions grows every day, and with the rise of
e-commerce, cards are used every second. The number of credit card transactions
increases every year. Technology therefore brings people many benefits, but it also
increases credit card fraud, which has become one of the most serious problems in the
world. Logical and numerical authentication methods have been applied to credit card
fraud, but they do not detect most cases, because fraudsters hide details such as their
identity and location on the
CS18A2 2021-2022 1
internet, so the problem also has a big impact on the financial industry. Credit card
fraud affects both sides, that is, the admin side and the user side.
It causes losses in (a) issuer fees, (b) charges and (c) administrative charges. As a
result, merchants decide to raise the prices of goods or reduce discounts. The proposed
system aims to reduce the losses from credit card fraud and to eliminate fraud cases.
Several machine learning techniques are used: (i) artificial neural networks, (ii)
rule-induction techniques, (iii) decision trees, (iv) logistic regression, and (v) support
vector machines (SVM). These models are combined into hybrid methods, and AdaBoost
and majority voting are applied to detect credit card fraud. The main contribution
of the paper is as follows:
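The hybrid approach described in this section (several base classifiers combined with AdaBoost and majority voting) can be sketched with scikit-learn's `VotingClassifier` using hard, that is majority, voting. The choice of base models, their parameters and the synthetic data are assumptions for illustration; rule induction is left out because scikit-learn has no rule-induction classifier.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for labelled transactions (1 = fraud); not the project data.
X, y = make_classification(n_samples=2000, n_features=8, weights=[0.9, 0.1],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Base learners for four of the listed techniques, plus AdaBoost.
estimators = [
    ("ann", make_pipeline(StandardScaler(),
                          MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000,
                                        random_state=0))),
    ("tree", DecisionTreeClassifier(max_depth=5, random_state=0)),
    ("logreg", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("svm", make_pipeline(StandardScaler(), SVC(kernel="rbf"))),
    # AdaBoost's default base learner is a depth-1 decision tree (a stump).
    ("adaboost", AdaBoostClassifier(n_estimators=100, random_state=0)),
]

# voting="hard": each model casts one vote per transaction and the majority wins.
# Five voters on a binary problem cannot produce a tie.
ensemble = VotingClassifier(estimators=estimators, voting="hard")
ensemble.fit(X_train, y_train)
print("majority-vote accuracy:", ensemble.score(X_test, y_test))
```

An odd number of voters is a deliberate choice here: with hard voting and an even number of models, ties are possible and scikit-learn resolves them in favour of the lower class label.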
1.2. Existing system:
In the existing system, a case study involving credit card fraud detection applied data
normalization before cluster analysis. The results obtained from using cluster analysis
and artificial neural networks for fraud detection showed that clustering the attributes
can minimize the number of neuronal inputs, and that promising results can be obtained
by training an MLP on normalized data. This research was based on unsupervised
learning. The significance of this paper was to find new methods for fraud detection and
to increase the accuracy of results. The dataset for this paper is based on real-life
transactional data from a large European company, and personal details in the data are
kept confidential. The accuracy of the algorithm is around 50%. The significance of this
paper was to find an algorithm and to reduce the cost measure. The result obtained was
by 23% and the algorithm they find
Disadvantages:
In this paper, a new cost-based comparison measure that reasonably represents the
gains and losses due to fraud detection is proposed.
A cost-sensitive method based on Bayes minimum risk is presented using
the proposed cost measure.
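The Bayes minimum risk idea can be illustrated with a small decision rule. This sketch assumes a cost matrix of the kind commonly used in cost-sensitive fraud detection: missing a fraud costs the transaction amount, while flagging a transaction (correctly or not) costs a fixed administrative amount. The function name and cost values are hypothetical. A transaction is flagged when the expected cost of flagging it does not exceed the expected cost of approving it.

```python
def bayes_minimum_risk(p_fraud: float, amount: float, admin_cost: float) -> bool:
    """Return True if flagging the transaction has the lower (or equal) expected cost.

    Assumed cost matrix (hypothetical):
      - predict fraud (true or false positive): admin_cost
      - predict legitimate, actually fraud (false negative): amount
      - predict legitimate, actually legitimate (true negative): 0
    """
    # Expected cost of flagging: the administrative cost, whichever class is true.
    risk_flag = admin_cost * p_fraud + admin_cost * (1.0 - p_fraud)
    # Expected cost of approving: the amount is lost only if it is a fraud.
    risk_approve = amount * p_fraud + 0.0 * (1.0 - p_fraud)
    return risk_flag <= risk_approve


# The same fraud probability leads to different decisions for different amounts.
print(bayes_minimum_risk(p_fraud=0.05, amount=20.0, admin_cost=5.0))    # False
print(bayes_minimum_risk(p_fraud=0.05, amount=500.0, admin_cost=5.0))   # True
```

Unlike a fixed 0.5 probability threshold, this rule flags a low-probability transaction when the amount at stake is large enough.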
CHAPTER 2
LITERATURE SURVEY
2. LITERATURE SURVEY
The paper discussed how Bayesian networks gave good results after a short training
period and how their speed was enhanced by the use of ANNs. Y. Sahin and E. Duman
[9] proposed fraud detection in credit cards using a combination of Support Vector
Machines and Decision Trees. Geoffrey F. Miller, Peter M. Todd and Sailesh Hegde [10]
elaborated the concept of designing neural networks using genetic algorithms.
It aims to free the network design process from the constraints of human biases. They
built a system with applications in biological, neurological and psychological
modelling as well as in engineering and design, using automated network design.
Ekrem Duman and M. Hamdi Ozcelik [11] proposed a system that assigns each
transaction a score on which the transaction is judged; to implement this, they
combined neural networks with scatter search. Alireza Pouramirarsalani, Majid
Khalilian and Alireza Nikravanshalmani [12] proposed a new method of fraud detection
that used a hybrid of feature selection and a genetic algorithm. They observed the
salient features of the transactions and used them to detect any unusual feature and
flag the transaction as fraudulent.
Pooja Chougule and others [13] proposed simple K-means and a simple genetic
algorithm for fraud detection. They showed how the k-means algorithm grouped the
transactions based on distinct attribute values, while the genetic algorithm was used for
optimization because, as the size of the input increased, the k-means algorithm
produced outliers. S. Fashoto, O. Adeleye and J. Wandera [14] used a hybrid of
K-means clustering with a Multilayer Perceptron (MLP) and a Hidden Markov Model
(HMM). They used K-means clustering to group the suspected fraudulent transactions
into a similar cluster.
The output of this stage is used to train the HMM and the MLP, which then
classify the incoming transactions. M.R. Harati Nik, M. Akrami, S. Khadivi and M.
Shajari [15] proposed a fusion of a fuzzy expert system and Fogg behavioral analysis,
naming it the Fuzzy hybrid model. The Fogg behavioral model describes merchant
behavior in two dimensions: motivation and ability to commit fraud. The fraud
tendency weight is then calculated for each merchant, followed by the degree of
suspicion for the incoming transactions. Krishna K. Tripathi and Mahesh A.
Pavaskar [16] carried out a comparative study of different techniques, and one of the
techniques they worked on is a fusion of Dempster-Shafer theory and Bayesian
learning, which combines evidence from past as well as current behavior.
The rule-based filter, the transaction history and the Bayesian learner are among the
four stages of this system, through which transactions are classified as suspicious or
unsuspicious. In the first component, the extent to which the incoming transaction
deviates is determined in order to obtain its suspicion level.
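To make the Dempster-Shafer part concrete, the sketch below implements Dempster's rule of combination over the frame {fraud, legit}. The two evidence sources and their mass values are hypothetical (for example, a rule-based filter and a transaction-history check). Mass assigned to the whole frame expresses "don't know", and mass falling on contradictory pairs is treated as conflict and normalised away.

```python
from itertools import product

FRAUD, LEGIT = frozenset({"fraud"}), frozenset({"legit"})
EITHER = FRAUD | LEGIT  # mass on the whole frame means "uncertain"


def combine(m1: dict, m2: dict) -> dict:
    """Dempster's rule: multiply masses of intersecting focal sets, renormalise by 1 - conflict."""
    combined = {}
    conflict = 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb  # the two sources contradict each other
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {s: m / (1.0 - conflict) for s, m in combined.items()}


# Hypothetical evidence from two independent sources.
rule_filter = {FRAUD: 0.6, EITHER: 0.4}
history_check = {FRAUD: 0.5, LEGIT: 0.3, EITHER: 0.2}
belief = combine(rule_filter, history_check)
print({tuple(sorted(s)): round(m, 3) for s, m in belief.items()})
```

Here the conflicting mass is 0.6 × 0.3 = 0.18, so the combined belief in fraud is 0.62 / 0.82, roughly 0.756, higher than either source alone.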
TITLE: The Use of Predictive Analytics Technology to Detect Credit Card Fraud in
Canada
AUTHOR: Kosemani Temitayo Hafiz, Dr. Shaun Aghili, Dr. Pavol Zavarsky
ABSTRACT: This research paper focuses on the creation of a scorecard from relevant
evaluation criteria, features, and capabilities of predictive analytics vendor solutions
currently being used to detect credit card fraud. The scorecard provides a side-by-side
comparison of five credit card predictive analytics vendor solutions adopted in Canada.
From the ensuing research findings, a list of credit card fraud PAT vendor solution
challenges, risks, and limitations was outlined.
AUTHOR: Amlan Kundu, Suvasini Panigrahi, Shamik Sural, Senior Member, IEEE,
and Arun K. Majumdar
TITLE: Research on Credit Card Fraud Detection Model Based on Distance Sum
ABSTRACT: Along with increasing credit cards and growing trade volume in China,
credit card fraud rises sharply. How to enhance the detection and prevention of credit
card fraud becomes the focus of risk control of banks. This paper proposes a credit card
fraud detection model using outlier detection based on distance sum according to the
infrequency and unconventionality of fraud in credit card transaction data, applying
outlier mining into credit card fraud detection. Experiments show that this model is
feasible and accurate in detecting credit card fraud.
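The distance-sum idea can be sketched as follows: for each transaction, sum its distances to all other transactions, and treat the records with the largest sums as outliers (candidate frauds). Euclidean distance, the toy feature values and the top-k cut-off are assumptions; the paper's exact distance measure and threshold are not given here.

```python
import numpy as np


def distance_sum_outliers(X: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k rows whose summed Euclidean distance to all rows is largest."""
    # Pairwise differences via broadcasting: shape (n, n, d).
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    scores = dists.sum(axis=1)  # distance sum per record (self-distance is 0)
    return np.argsort(scores)[::-1][:k]


# Hypothetical transactions: (amount, hour of day); the last one is unusual.
transactions = np.array([
    [25.0, 13], [30.0, 14], [22.0, 12], [28.0, 15], [27.0, 13],
    [950.0, 3],
])
print(distance_sum_outliers(transactions, k=1))  # [5]
```

The pairwise matrix grows quadratically with the number of transactions, so large datasets would need a chunked or approximate computation, and features would normally be scaled first so that the amount does not dominate the distance.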
TITLE: Fraudulent Detection in Credit Card System Using SVM & Decision Tree
TITLE: Supervised Machine (SVM) Learning for Credit Card Fraud Detection
ABSTRACT: In this thesis we are proposing the SVM (Support Vector Machine) based
method with multiple kernel involvement which also includes several fields of user
profile instead of only spending profile. The simulation result shows improvement in TP
(true positive), TN (true negative) rate, & also decreases the FP (false positive) & FN
(false negative) rate.
TITLE: Detecting Credit Card Fraud by Decision Trees and Support Vector Machines
ABSTRACT: In this study, classification models based on decision trees and support
vector machines (SVM) are developed and applied on credit card fraud detection
problem. This study is one of the firsts to compare the performance of SVM and
decision tree methods in credit card fraud detection with a real data set.
TITLE: Credit Card Fraud Detection Using Decision Tree Induction Algorithm
ABSTRACT: A new cost-sensitive decision tree approach, which minimizes the sum of
misclassification costs while selecting the splitting attribute at each nonterminal node,
is proposed, and its performance is compared with well-known traditional
classification models on a real-world credit card data set. This research is concerned
with credit card application fraud detection, performed by asking security questions
to the persons involved in the transactions as well as by eliminating real-time data
faults.
TITLE: Data Mining Techniques for Credit Card Fraud Detection: Empirical Study
ABSTRACT: Fraud detection is a crucial problem that has been facing the e-commerce
industry for decades. Financial institutions throughout the world lose billions due to
credit card fraud, which necessitates the use of credit card fraud prevention. Several
models have been proposed in the literature, however, the accuracy of the model is
crucial. In this paper four fraud detection models based on data mining techniques
(Support vector machine, K-nearest neighbours, Decision Trees, Naïve Bayes) were
developed and their performances were compared when applied on a real life
anonymised data set of transactions (“UCSD-FICO Data Mining Contest 2009”). Four
relevant metrics were used in evaluating the performance of the classifiers which are
True positive rate (TPR), False Positive Rate (FPR), Balanced Classification Rate
(BCR) and Matthews Correlation Coefficient (MCC).
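The four metrics listed above can all be computed from confusion-matrix counts. This is a small sketch with hypothetical counts; BCR is taken as the mean of the true positive rate and the true negative rate, which is its usual definition.

```python
import math


def classifier_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """TPR, FPR, balanced classification rate and Matthews correlation coefficient."""
    tpr = tp / (tp + fn) if tp + fn else 0.0   # sensitivity
    fpr = fp / (fp + tn) if fp + tn else 0.0
    tnr = 1.0 - fpr                            # specificity
    bcr = (tpr + tnr) / 2.0                    # mean of the per-class recalls
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"TPR": tpr, "FPR": fpr, "BCR": bcr, "MCC": mcc}


# Hypothetical counts for an imbalanced test set (100 frauds, 9900 legitimate).
print(classifier_metrics(tp=80, fp=30, tn=9870, fn=20))
```

BCR and MCC are useful here because, unlike plain accuracy, they do not reward a classifier that simply labels every transaction as legitimate.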
ABSTRACT: Searching Card Fraud via Internet will return approximately 180 million
results. The total level of fraud reached 1.26 billion euro in 2010 in Europe, according
to the BCE (European Central Bank). The ingenuity of thieves has reached highly sophisticated forms. To model
mathematically this behaviour requires a classification method derived from supervised
learning algorithm which must be able to separate the class of fraudulent with a high
degree of accuracy. Following his definition, the technique of Support Vector Machines
is characterized by two strong hypotheses: margin optimization and kernel
representation. So, I chose the techniques of SVM with non-linear kernels. We propose
the Gaussian kernel function for measuring the similarities between features in a new
linear space as the best approach to detect the fraud patterns.
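The Gaussian (RBF) kernel measures similarity as K(x, z) = exp(-gamma * ||x - z||^2). The sketch below computes it directly, checks it against scikit-learn's `rbf_kernel`, and trains an `SVC` with that kernel. The value of gamma, the class weighting and the synthetic data are assumptions, not values from the cited work.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

GAMMA = 0.1  # hypothetical kernel width; normally tuned by cross-validation


def gaussian_kernel(x: np.ndarray, z: np.ndarray, gamma: float = GAMMA) -> float:
    """Similarity in (0, 1]: 1 for identical points, near 0 for distant ones."""
    return float(np.exp(-gamma * np.sum((x - z) ** 2)))


a, b = np.array([1.0, 2.0]), np.array([2.0, 0.0])
# The hand-written kernel agrees with scikit-learn's implementation.
assert np.isclose(gaussian_kernel(a, b), rbf_kernel([a], [b], gamma=GAMMA)[0, 0])

# Synthetic imbalanced data (1 = fraud); class_weight="balanced" up-weights the rare class.
X, y = make_classification(n_samples=1500, n_features=6, weights=[0.95, 0.05],
                           random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1)
svm = make_pipeline(StandardScaler(),
                    SVC(kernel="rbf", gamma=GAMMA, class_weight="balanced"))
svm.fit(X_train, y_train)
print("test accuracy:", svm.score(X_test, y_test))
```

Scaling the features before the RBF kernel matters, because the kernel depends on raw distances and a single large-valued feature (such as the amount) would otherwise dominate the similarity.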
CHAPTER 3
SYSTEM REQUIREMENT ANALYSIS
There are many common categories of non-functional requirements (NFRs). NFRs are
often thought of as the “itys.” While the specifics will vary between products, having a
list of these NFR types defined up front provides a handy checklist to make sure
you’re not missing critical requirements. The list below is not exhaustive, but it
shows what we mean by the NFR “itys”:
Capacity — What are your system’s storage requirements, today and in the
future? How will your system scale up for increasing volume demands?
Reliability and Availability — What is the critical failure time under normal usage?
Does a user need access to this all hours of every day?
Scalability – The Black Friday test. What are the highest workloads under which
the system will still perform as expected?
Usability — How easy is it to use the product? What defines the experience of
using the product?
Functional requirements define what a product must do and what its features and
functions are. Functional requirements are product features or functions that
developers must implement to enable users to accomplish their tasks, so it is important
to make them clear both for the development team and for the stakeholders. Generally,
functional requirements describe system behavior under specific conditions. For
example:
The system sends an approval request after the user enters personal information.
CHAPTER 4
FEASIBILITY STUDY
4. FEASIBILITY STUDY
In this phase, the feasibility of the project is analyzed and a business
proposal is put forth with a very general plan for the project and some cost estimates.
During system analysis, the feasibility study of the proposed system is carried out to
ensure that the proposed system is not a burden to the company. For feasibility
analysis, some understanding of the major requirements of the system is essential.
Three key considerations are involved in the feasibility analysis:
Economical Feasibility
Technical Feasibility
Social Feasibility
Economical Feasibility
This study is carried out to check the economic impact that the system will have
on the organization. The amount of funds that the company can pour into the research
and development of the system is limited, so the expenditures must be justified. The
developed system is well within the budget, which was achieved because most of
the technologies used are freely available. Only the customized products had to be
purchased.
Technical Feasibility
This study is carried out to check the technical feasibility, that is, the technical
requirements of the system. Any system developed must not place a high demand on the
available technical resources, because this would in turn place high demands on the
client. The developed system must therefore have modest requirements, as only
minimal or no changes are required to implement this system.
Social Feasibility
This aspect of the study checks the level of acceptance of the system by the user.
It includes the process of training the user to use the system efficiently. The user must
not feel threatened by the system, but must instead accept it as a necessity. The level of
acceptance by the users depends solely on the methods employed to educate the users
about the system and to make them familiar with it. Their level of confidence must be
raised so that they are also able to offer constructive criticism, which is welcome, as
they are the final users of the system.
CHAPTER 5
SYSTEM DESIGN
5. SYSTEM DESIGN
LEVEL 0
CHAPTER 6
CODING
6. CODING
6.1. DESCRIPTION OF TECHNOLOGY USED:
Machine learning is a system that can learn from examples through self-improvement,
without being explicitly coded by a programmer. The breakthrough comes with the idea
that a machine can learn on its own from data (i.e., examples) to produce accurate
results.
Machine learning combines data with statistical tools to predict an output. This
output is then used by businesses to derive actionable insights. Machine learning is
closely related to data mining and Bayesian predictive modelling. The machine receives
data as input and uses an algorithm to formulate answers.
Machine learning is the brain where all the learning takes place. The way the
machine learns is similar to the way a human being learns. Humans learn from
experience. The more we know, the more easily we can predict. By analogy, when we
face an unknown situation, the likelihood of success is lower than in a known situation.
Machines are trained the same way. To make an accurate prediction, the machine sees
examples. When we give the machine a similar example, it can figure out the outcome.
However, like a human, if it is fed a previously unseen example, the machine has
difficulty predicting.
The core objectives of machine learning are learning and inference. First of
all, the machine learns through the discovery of patterns, and this discovery is made
thanks to the data. One crucial part of the data scientist's job is to choose carefully
which data to provide to the machine. The list of attributes used to solve a problem is
called a feature vector. You can think of a feature vector as a subset of data that is used
to tackle a problem.
The machine uses algorithms to simplify reality and transform this discovery into a
model. Therefore, the learning stage is used to describe the data and summarize it into
a model.
For instance, suppose the machine is trying to understand the relationship between the
wage of an individual and the likelihood of going to a fancy restaurant. It turns out the
machine finds a positive relationship between wage and going to a high-end restaurant:
this is the model.
Inferring
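The wage example can be reproduced on made-up data, covering both learning and inference: a logistic regression learns the relationship between wage and the probability of visiting a high-end restaurant, and a positive coefficient is the "positive relationship" described above. The data-generating rule and all numbers are purely hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Hypothetical monthly wages (in thousands) for 500 individuals.
wage = rng.uniform(1, 10, size=500)
# Made-up ground truth: the chance of a high-end restaurant visit grows with wage.
p_visit = 1 / (1 + np.exp(-(wage - 5.5)))
visited = rng.random(500) < p_visit  # boolean labels

# Learning: summarise the data into a model.
model = LogisticRegression().fit(wage.reshape(-1, 1), visited)
print("coefficient on wage:", model.coef_[0, 0])

# Inference: apply the model to individuals it has not seen.
new_wages = np.array([[2.0], [9.0]])
print("visit probability:", model.predict_proba(new_wages)[:, 1])
```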
1. Define a question
2. Collect data
3. Visualize data
4. Train algorithm
5. Test the Algorithm
6. Collect feedback
7. Refine the algorithm
8. Loop 4-7 until the results are satisfying
9. Use the model to make a prediction
Once the algorithm gets good at drawing the right conclusions, it applies that
knowledge to new sets of data.
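Steps 4 to 9 of the list (train, test, refine, repeat, then predict) can be sketched as a simple loop. Here "refining" is assumed to mean growing a random forest until validation accuracy reaches a target or a size budget runs out; the target, the doubling schedule and the data are hypothetical choices, and a real project would usually also use cross-validation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=3)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=3)

TARGET_ACCURACY = 0.90   # hypothetical "satisfying" threshold
n_trees, best_score, best_model = 5, 0.0, None

while n_trees <= 200:
    # Step 4: train the algorithm
    model = RandomForestClassifier(n_estimators=n_trees, random_state=3)
    model.fit(X_train, y_train)
    # Step 5: test it on held-out data
    score = model.score(X_val, y_val)
    # Steps 6-7: use the feedback to keep the best model so far and refine
    if score > best_score:
        best_score, best_model = score, model
    if score >= TARGET_ACCURACY:
        break            # Step 8: stop once the results are satisfying
    n_trees *= 2

# Step 9: use the model to make predictions (validation rows stand in for new data)
print(best_model.n_estimators, round(best_score, 3))
print(best_model.predict(X_val[:5]))
```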
Machine learning can be grouped into two broad learning tasks: Supervised and
Unsupervised. There are many other algorithms
Supervised learning:
An algorithm uses training data and feedback from humans to learn the
relationship between given inputs and a given output. For instance, a practitioner can
use marketing expense and weather forecast as input data to predict the sales of cans.
You can use supervised learning when the output data is known. The algorithm
then predicts the output for new data.
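The can-sales example can be illustrated with a linear regression on invented data, where marketing expense and forecast temperature are the inputs and sales are the known output. The coefficients used to generate the data are hypothetical; the point is that the model recovers them from labelled examples and then predicts unseen cases.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 200
marketing = rng.uniform(0, 100, n)      # hypothetical spend (thousands)
temperature = rng.uniform(10, 35, n)    # hypothetical forecast (deg C)
# Invented ground truth plus noise: sales rise with spend and with heat.
sales = 50 + 3.0 * marketing + 8.0 * temperature + rng.normal(0, 5, n)

X = np.column_stack([marketing, temperature])
model = LinearRegression().fit(X, sales)
print("learned coefficients:", model.coef_.round(2))

# Predict sales for a new week (spend 40, temperature 30).
print("forecast:", model.predict([[40.0, 30.0]]))
```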
● Classification task
● Regression task
Classification
Imagine you want to predict the gender of a customer for a commercial. You would
start gathering data on the height, weight, job, salary, purchasing basket, etc. from your
customer database. You know the gender of each of your customers; it can only be male
or female. The objective of the classifier is to assign a probability of being male
or female (i.e., the label) based on the information (i.e., the features you have collected).
Once the model has learned how to recognize male or female, you can use new data to
make a prediction. For instance, you have just obtained new information about an
unknown customer, and you want to know whether this customer is male or female. If
the classifier predicts male = 70%, it means the algorithm is 70% sure that this customer
is male and 30% sure that the customer is female.
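The probability output described above corresponds to `predict_proba` in scikit-learn. This sketch keeps the document's customer example but uses only two invented numeric features (height and weight) and a tiny made-up training set; any real model would need far more data and features.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented training data: [height_cm, weight_kg] with known labels.
X = np.array([[160, 55], [165, 60], [158, 52], [170, 65],
              [178, 80], [185, 88], [175, 77], [182, 85]])
y = np.array(["female", "female", "female", "female",
              "male", "male", "male", "male"])

clf = LogisticRegression(max_iter=1000).fit(X, y)

# A new, unseen customer: the classifier returns one probability per class.
new_customer = np.array([[174, 72]])
for label, prob in zip(clf.classes_, clf.predict_proba(new_customer)[0]):
    print(f"{label}: {prob:.0%}")
print("predicted label:", clf.predict(new_customer)[0])
```

The predicted label is simply the class with the highest probability, so a "male = 70%" output would be reported as male.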
The label can have two or more classes. The above example has only two
classes, but if a classifier needs to predict objects, it may have dozens of classes (e.g.,
glass, table, shoes, etc., where each object represents a class).
Regression
When the output is a continuous value, the task is a regression. For instance, a
financial analyst may need to forecast the value of a stock based on a range of features
such as equity, previous stock performance and macroeconomic indices. The system will
be trained to estimate the price of the stocks with the lowest possible error.
Unsupervised learning
In unsupervised learning, an algorithm explores input data without being given
an explicit output variable (e.g., it explores customer demographic data to identify
patterns).
You can use it when you do not know how to classify the data and you want the
algorithm to find patterns and classify the data for you.
Algorithm | Description | Type
K-means clustering | Puts data into k groups, each of which contains data with similar characteristics (as determined by the model, not in advance by humans). | Clustering
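K-means can be sketched with scikit-learn's `KMeans`. The two-feature transaction data and the choice k = 2 are hypothetical; note that no labels are passed to `fit`, since the grouping is discovered by the model itself.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(7)
# Two invented groups of transactions: (amount, hour of day).
small_daytime = rng.normal([30, 14], [5, 2], size=(50, 2))
large_nighttime = rng.normal([800, 3], [50, 1], size=(10, 2))
X = np.vstack([small_daytime, large_nighttime])

# No labels are given: the algorithm groups similar points by itself.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("cluster sizes:", np.bincount(kmeans.labels_))
print("cluster centres:\n", kmeans.cluster_centers_.round(1))
```

The cluster numbers themselves carry no meaning; only the grouping does, which is why a human still has to interpret whether a small, distant cluster corresponds to suspicious activity.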
Augmentation:
● Machine learning that assists humans with their day-to-day tasks, personally
or commercially, without having complete control of the output. Such machine
learning is used in different ways, such as virtual assistants, data analysis and
software solutions. The primary use is to reduce errors due to human bias.
Automation:
● Machine learning, which works entirely autonomously in any field without the
need for any human intervention. For example, robots performing the essential
process steps in manufacturing plants.
Finance Industry
Government organization
● The government makes use of ML to manage public safety and utilities. Take the
example of China and its massive use of face recognition. The government uses
artificial intelligence to prevent jaywalking.
Healthcare industry
● Healthcare was one of the first industries to use machine learning, with image
detection.
Marketing
Machine learning gives terrific results for visual pattern recognition, opening up
many potential applications in physical inspection and maintenance across the entire
supply chain network.
Unsupervised learning can quickly search for comparable patterns in a diverse
dataset. In turn, the machine can perform quality inspection throughout the logistics
hub, detecting shipments with damage and wear.
For instance, IBM's Watson platform can determine shipping container damage.
Watson combines visual and systems-based data to track, report and make
recommendations in real time. In the past, stock managers relied extensively on
primary methods to evaluate and forecast inventory. By combining big data and
machine learning, better forecasting techniques have been implemented (an
improvement of 20 to 30% over traditional forecasting tools). In terms of sales, this
means an increase of 2 to 3% due to the potential reduction in inventory costs.
For example, everybody knows the Google car. The car has lasers on the roof that
tell it where it is relative to the surrounding area. It has radar in the front, which
informs the car of the speed and motion of all the cars around it. It uses all of that
data to figure out not only how to drive the car but also to predict what potential
drivers around the car are going to do. What's impressive is that the car is processing
almost a gigabyte of data a second.
Reinforcement Learning
● Q-learning
● Deep Q network
● State-Action-Reward-State-Action (SARSA)
● Deep Deterministic Policy Gradient (DDPG)
AI in Finance:
The financial technology sector has already started using AI to save time, reduce
costs, and add value. Deep learning is changing the lending industry by using more
robust credit scoring. Credit decision-makers can use AI for robust credit lending
applications to achieve faster, more accurate risk assessment, using machine intelligence
to factor in the character and capacity of applicants.
AI in HR:
At that time, Under Armour had all of the 'must have' HR technology in place,
such as transactional solutions for sourcing, applying, tracking and onboarding, but
those tools weren't useful enough. Under Armour chose HireVue, an AI provider of
HR solutions, for both on-demand and live interviews. The results were impressive;
they managed to decrease the time to fill by 35%. In return, they hired higher-quality
staff.
AI in Marketing:
In the table below, we summarize the difference between machine learning and deep
learning.
Aspect | Machine learning | Deep learning
Feature engineering | Need to understand the features that represent the data | No need to understand the best feature that represents the data
Execution time | From a few minutes to hours | Up to weeks; a neural network needs to compute a significant number of weights
With machine learning, you need less data to train the algorithm than with deep
learning. Deep learning requires an extensive and diverse set of data to identify the
underlying structure. Besides, machine learning produces a model that trains faster; the most
advanced deep learning architectures can take days to a week to train. The advantage of
deep learning over machine learning is its potential for higher accuracy. You do not need to
understand which features best represent the data; the neural network
learns how to select critical features. In machine learning, you need to choose
yourself which features to include in the model.
PYTHON 3
Python is a high-level, interpreted, interactive and object-oriented scripting
language. Python is designed to be highly readable. It uses English keywords frequently
whereas other languages use punctuation, and it has fewer syntactical constructions
than other languages.
Python was developed by Guido van Rossum in the late eighties and early
nineties at the National Research Institute for Mathematics and Computer Science in the
Netherlands.
Python is copyrighted. Its source code is available under the Python Software
Foundation License, an open-source license that is compatible with the GNU General Public License (GPL).
Python Features
Easy-to-learn: Python has few keywords, simple structure, and a clearly defined
syntax. This allows the student to pick up the language quickly.
Easy-to-read: Python code is clearly laid out and visually uncluttered.
A broad standard library: The bulk of Python's library is very portable and
cross-platform compatible on UNIX, Windows, and Macintosh.
Interactive Mode: Python has support for an interactive mode which allows
interactive testing and debugging of snippets of code.
Portable: Python can run on a wide variety of hardware platforms and has the
same interface on all platforms.
Extendable: You can add low-level modules to the Python interpreter. These
modules enable programmers to add to or customize their tools to be more
efficient.
GUI Programming: Python supports GUI applications that can be created and
ported to many system calls, libraries, and windowing systems, such as Windows
MFC, Macintosh, and the X Window System of Unix.
Scalable: Python provides a better structure and support for large programs than
shell scripting.
Apart from the features mentioned above, Python has many other good features; a few
are listed below:
NUMPY:
NumPy is the fundamental package for scientific computing in Python. It is a Python
library that provides a multidimensional array object, various derived objects (such as
masked arrays and matrices), and an assortment of routines for fast operations on arrays,
including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete
Fourier transforms, basic linear algebra, basic statistical operations, random simulation
and much more. At the core of the NumPy package is the ndarray object. This
encapsulates n-dimensional arrays of homogeneous data types, with many operations
being performed in compiled code for performance. There are several important
differences between NumPy arrays and the standard Python sequences. NumPy arrays
have a fixed size at creation, unlike Python lists (which can grow dynamically). Changing
the size of an array will create a new array and delete the original.
The elements in a NumPy array are all required to be of the same data type, and
thus will be the same size in memory. The exception: one can have arrays of (Python,
including NumPy) objects, thereby allowing for arrays of different sized elements.
NumPy arrays facilitate advanced mathematical and other types of operations on
large numbers of data. Typically, such operations are executed more efficiently and
with less code than is possible using Python’s built-in sequences.
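The three properties described above (fixed size, a single element type, and the object-array exception) can be checked directly. This is a minimal sketch; the array contents are arbitrary examples.

```python
import numpy as np

# Fixed size: np.append does not grow `a` in place; it returns a new, larger array
a = np.array([1, 2, 3])
b = np.append(a, 4)
print(a.size, b.size)  # 3 4

# Homogeneous type: mixing ints and a float upcasts every element to float64
c = np.array([1, 2, 3.5])
print(c.dtype)  # float64

# The exception: an object array holds arbitrary Python objects of different sizes
d = np.empty(2, dtype=object)
d[0] = [1, 2, 3]
d[1] = "fraud"
print(d.dtype)  # object
```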
A growing plethora of scientific and mathematical Python-based packages are
using NumPy arrays; though these typically support Python-sequence input, they
convert such input to NumPy arrays prior to processing, and they often output
NumPy arrays. In other words, in order to efficiently use much (perhaps even most)
of today’s scientific/mathematical Python-based software, just knowing how to use
Python’s built-in sequence types is insufficient; one also needs to know how to use
NumPy arrays.
The points about sequence size and speed are particularly important in scientific
computing.
As a simple example, consider the case of multiplying each element in a 1-D
sequence with the corresponding element in another sequence of the same length. If the
data are stored in two Python lists, a and b, we could iterate over each element and
multiply the pairs one at a time.
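A minimal sketch of that element-by-element loop, next to the equivalent single NumPy expression, follows; the list values are arbitrary examples.

```python
import numpy as np

a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]

# Pure-Python version: an explicit loop over both lists
c = []
for i in range(len(a)):
    c.append(a[i] * b[i])

# NumPy version: the same loop runs in compiled code behind one expression
c_np = np.array(a) * np.array(b)

print(c)              # [10.0, 40.0, 90.0, 160.0]
print(c_np.tolist())  # [10.0, 40.0, 90.0, 160.0]
```

For four elements the difference is invisible, but for a million elements the vectorized form avoids a million interpreted loop iterations.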
The Numeric Python extensions (NumPy henceforth) is a set of extensions to the
Python programming language which allows Python programmers to efficiently
manipulate large sets of objects organized in a grid-like fashion. These sets of objects are
called arrays, and they can have any number of dimensions: one dimensional arrays are
similar to standard Python sequences, two-dimensional arrays are similar to matrices
from linear algebra. Note that one-dimensional arrays are also different from any other
Python sequence, and that two-dimensional matrices are also different from the matrices
of linear algebra, in ways which we will mention later in this text. Why are these
extensions needed? The core reason is a very prosaic one, and that is that manipulating a
set of a million numbers in Python with the standard data structures such as lists, tuples
or classes is much too slow and uses too much space.
Anything which we can do in NumPy we can do in standard Python – we just may
not be alive to see the program finish. A more subtle reason for these extensions
however is that the kinds of operations that programmers typically want to do on arrays,
while sometimes very complex, can often be decomposed into a set of fairly standard
operations. This decomposition has been developed similarly in many array languages.
In some ways, NumPy is simply the application of this experience to the Python
language – thus many of the operations described in NumPy work the way they do
because experience has shown that way to be a good one, in a variety of contexts. The
languages which were used to guide the development of NumPy include the infamous
APL family of languages, Basis, MATLAB, FORTRAN, S and S+, and others. This
heritage will be obvious to users of NumPy who already have experience with these
other languages. This tutorial, however, does not assume any such background, and all
that is expected of the reader is a reasonable working knowledge of the standard Python
language. This document is the “official” documentation for NumPy. It is both a tutorial
and the most authoritative source of information about NumPy with the exception of the
source code. The tutorial material will walk you through a set of manipulations of
simple, small, arrays of numbers, as well as image files. This choice was made because:
• A concrete data set makes explaining the behavior of some functions much easier to
motivate than simply talking about abstract operations on abstract data sets;
• Every reader will have at least an intuition as to the meaning of the data and
organization of image files.
• The result of various manipulations can be displayed simply since the data set
has a natural graphical representation. All users of NumPy, whether interested
in image processing or not, are encouraged to follow the tutorial with a working
NumPy installation at their side, testing the examples, and, more importantly,
transferring the understanding gained by working on images to their specific
domain. The best way to learn is by doing; the aim of this tutorial is to guide you
along this “doing.”
PANDAS:
Pandas is a fast, powerful, flexible and easy-to-use open-source data analysis and
manipulation tool, built on top of the Python programming language. In particular, it
offers data structures and operations for manipulating numerical tables and time series.
It provides fast, flexible, and expressive data structures designed to make working with
structured (tabular, multidimensional, potentially heterogeneous) and time series data
both easy and intuitive. It aims to be the fundamental high-level building block for
doing practical, real-world data analysis in Python. Additionally, it has the broader goal
of becoming the most powerful and flexible open-source data analysis/manipulation tool
available in any language. It is already well on its way toward this goal.
Pandas is well suited for many different kinds of data:
1. Tabular data with heterogeneously-typed columns, as in an SQL table or Excel
spreadsheet.
2. Ordered and unordered (not necessarily fixed-frequency) time series data.
3. Arbitrary matrix data (homogeneously typed or heterogeneous) with row and
column labels.
4. Any other form of observational / statistical data sets. The data actually need not
be labelled at all to be placed into a pandas data structure.
The two primary data structures of pandas, Series (1-dimensional) and DataFrame
(2-dimensional), handle the vast majority of typical use cases in finance,
statistics, social science, and many areas of engineering. For R users, DataFrame
provides everything that R’s data.frame provides and much more. pandas is built on top of
NumPy and is intended to integrate well within a scientific computing
environment with many other 3rd party libraries.
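A minimal sketch of the two primary structures follows. The toy transactions are hypothetical; the column names `Time`, `Amount` and `Class` are assumed to mirror the credit card data used in the source code.

```python
import pandas as pd

# Hypothetical toy transactions (not real data)
transactions = pd.DataFrame({
    "Time": [0, 1, 2, 3],
    "Amount": [12.5, 300.0, 7.25, 980.0],
    "Class": [0, 0, 0, 1],
})

amounts = transactions["Amount"]                 # selecting one column gives a Series
features = transactions.drop(columns=["Time"])   # dropping a column gives a new DataFrame
mean_by_class = transactions.groupby("Class")["Amount"].mean()

print(type(amounts).__name__)  # Series
print(transactions.shape)      # (4, 3)
print(mean_by_class)
```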
Here are just a few of the things that pandas does well:
MATPLOTLIB:
Matplotlib is a comprehensive library for creating static, animated, and
interactive visualizations in Python. It can be used in Python scripts, the Python and
IPython shell, web application servers, and various graphical user interface toolkits.
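A minimal sketch of typical Matplotlib usage in a script follows: a bar chart of class counts saved to a PNG file. The counts and the file name are hypothetical, and the non-interactive `Agg` backend is selected so the script also runs where no display is available.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt

labels = ["Legitimate", "Fraud"]
counts = [990, 10]  # hypothetical counts, for illustration only

fig, ax = plt.subplots(figsize=(4, 3))
ax.bar(labels, counts, color=["tab:blue", "tab:red"])
ax.set_ylabel("Number of transactions")
ax.set_title("Class distribution (illustrative)")
fig.tight_layout()
fig.savefig("class_distribution.png")
plt.close(fig)
```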
SEABORN
5. High-level abstractions for structuring multi-plot grids that let you easily build
complex visualizations
6. Concise control over matplotlib figure styling with several built-in themes
7. Tools for choosing color palettes that faithfully reveal patterns in your data
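As a small illustration of the theme and palette features listed above, the sketch below selects a built-in theme with `sns.set_theme` and draws a 2 x 2 confusion matrix as an annotated heatmap with `sns.heatmap`. The matrix values are hypothetical, and the output file name is an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_theme(style="white")  # one of seaborn's built-in themes

# Hypothetical confusion matrix (rows: actual label, columns: predicted label)
cnf_matrix = [[85, 3], [2, 10]]
names = ["Legitimate", "Fraud"]

fig, ax = plt.subplots(figsize=(4, 3))
sns.heatmap(cnf_matrix, annot=True, fmt="d", cmap="Blues", cbar=False,
            xticklabels=names, yticklabels=names, ax=ax)
ax.set_xlabel("Predicted label")
ax.set_ylabel("Actual label")
fig.tight_layout()
fig.savefig("confusion_matrix.png")
plt.close(fig)
```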
6.2.SOURCE CODE:
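import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

np.random.seed(2)  # seeds the global NumPy RNG, which scikit-learn uses when random_state is None

# Assumption: the loading step is not shown in the original; 'creditcard.csv' is a hypothetical path
data = pd.read_csv('creditcard.csv')
print(data.head())

# Standardize the transaction amount, then drop the raw 'Amount' and 'Time' columns
data['normalizedAmount'] = StandardScaler().fit_transform(data['Amount'].values.reshape(-1, 1)).ravel()
data = data.drop(['Amount', 'Time'], axis=1)
print(data.head())

# Assumption: the fraud label is stored in a column named 'Class' (not shown in the original)
X = data.drop(['Class'], axis=1)
y = data[['Class']]
print(y.head())

# Assumption: a 70/30 split; the original split parameters are not shown
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
print(X_train.shape, X_test.shape)

# Train a random forest with 100 trees
random_forest = RandomForestClassifier(n_estimators=100)
random_forest.fit(X_train, y_train.values.ravel())

# Evaluate on the held-out test set
y_pred = random_forest.predict(X_test)
print(random_forest.score(X_test, y_test))  # mean accuracy
cnf_matrix = confusion_matrix(y_test, y_pred)
labels = [0, 1]
ConfusionMatrixDisplay(cnf_matrix, display_labels=labels).plot()
plt.show()

# Predict on the full data set; this includes the training rows, so the result is optimistic
y_pred = random_forest.predict(X)
print(y_pred)
cnf_matrix = confusion_matrix(y, y_pred)
ConfusionMatrixDisplay(cnf_matrix, display_labels=labels).plot()
plt.show()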
CHAPTER 7
SYSTEM TESTING
7. SYSTEM TESTING
The purpose of testing is to discover errors. Testing is the process of trying to
discover every conceivable fault or weakness in a work product. It provides a way to
check the functionality of components, sub-assemblies, assemblies and/or a finished
product. It is the process of exercising software with the intent of ensuring that the
software system meets its requirements and user expectations and does not fail in an
unacceptable manner.
There are various types of tests; each test type addresses a specific testing requirement.
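As one concrete example of a test type, a unit test checks a single component in isolation. The sketch below tests the preprocessing step of the source code; `preprocess` is a hypothetical helper that repeats those steps (standardize `Amount` into `normalizedAmount`, then drop `Amount` and `Time`), and the toy data frame with its `Class` column is an assumption.

```python
import unittest

import pandas as pd
from sklearn.preprocessing import StandardScaler


def preprocess(data):
    """Hypothetical helper repeating the preprocessing steps of the source code."""
    out = data.copy()
    out["normalizedAmount"] = StandardScaler().fit_transform(out[["Amount"]]).ravel()
    return out.drop(["Amount", "Time"], axis=1)


class PreprocessTest(unittest.TestCase):
    def setUp(self):
        # Toy input; real transaction data has many more columns
        self.raw = pd.DataFrame({
            "Time": [0, 1, 2, 3],
            "Amount": [10.0, 20.0, 30.0, 40.0],
            "Class": [0, 0, 1, 0],
        })

    def test_raw_columns_are_dropped(self):
        result = preprocess(self.raw)
        self.assertNotIn("Amount", result.columns)
        self.assertNotIn("Time", result.columns)

    def test_amount_is_standardized(self):
        scaled = preprocess(self.raw)["normalizedAmount"]
        self.assertAlmostEqual(scaled.mean(), 0.0)
        self.assertAlmostEqual(scaled.std(ddof=0), 1.0)

    def test_labels_are_unchanged(self):
        result = preprocess(self.raw)
        self.assertEqual(result["Class"].tolist(), [0, 0, 1, 0])
```

Saved in a file such as `test_preprocess.py`, the tests can be run with `python -m unittest`.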
are not available. Black-box models often result in 1% to 3% better accuracy than
white-box models, but you sacrifice transparency and accountability.
CHAPTER 8
OUTPUT SCREENSHOTS
CHAPTER 9
CONCLUSION
9. CONCLUSION
In this paper, machine learning techniques such as logistic regression, decision
tree and random forest were used to detect fraud in a credit card system. Sensitivity,
specificity, accuracy and error rate were used to evaluate the performance of the
proposed system. The accuracies of the logistic regression, decision tree and random forest
classifiers are 90.0%, 94.3% and 99.9%, respectively. Comparing all three methods, we
found that the random forest classifier performs better than logistic regression and the decision tree.
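The four evaluation measures named above can all be derived from the counts of true/false positives and negatives. A minimal pure-Python sketch follows, assuming label 1 means fraud; `classification_rates` is a hypothetical helper, and for binary labels scikit-learn's `confusion_matrix(y_true, y_pred).ravel()` returns the same counts in the order tn, fp, fn, tp.

```python
def classification_rates(y_true, y_pred, positive=1):
    """Return sensitivity, specificity, accuracy and error rate for binary labels."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        return num / den if den else 0.0

    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    return {
        "sensitivity": ratio(tp, tp + fn),  # share of fraud cases that were caught
        "specificity": ratio(tn, tn + fp),  # share of legitimate cases passed as legitimate
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,
    }


rates = classification_rates([0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 1, 0])
print(rates)
```

Reporting sensitivity and specificity next to accuracy matters on fraud data: because fraud is rare, a model can reach high accuracy while still missing many fraud cases.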
CHAPTER 10
FUTURE ENHANCEMENTS
The very nature of this project allows multiple algorithms to be integrated
together as modules, and their results can be combined to increase the accuracy of the
final result.
This model can be further improved by adding more algorithms to it.
However, the output of each added algorithm needs to be in the same format as the others.
Once that condition is satisfied, the modules are easy to add, as done in the code. This
provides a great degree of modularity and versatility to the project.
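One hedged way to realize this modular design is majority voting: each algorithm is a module exposing the same `fit`/`predict` interface and emitting labels in the same 0/1 format, and the final prediction is the most common label. The sketch below uses scikit-learn's `VotingClassifier` with the three classifiers named in the conclusion, trained on synthetic imbalanced data from `make_classification` as a stand-in for the project's data set. Hard voting is our choice for illustration, not necessarily how the authors combine results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for transaction data (about 5% positive class)
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Each module shares the same fit/predict interface and emits 0/1 labels
modules = [
    ("logistic_regression", LogisticRegression(max_iter=1000)),
    ("decision_tree", DecisionTreeClassifier(random_state=0)),
    ("random_forest", RandomForestClassifier(n_estimators=100, random_state=0)),
]

# Hard voting: the final label is the majority vote of the modules
ensemble = VotingClassifier(estimators=modules, voting="hard")
ensemble.fit(X_train, y_train)

y_pred = ensemble.predict(X_test)
accuracy = ensemble.score(X_test, y_test)
print(f"Ensemble accuracy: {accuracy:.3f}")
```

Adding another module amounts to appending one more `(name, estimator)` pair to `modules`, provided it predicts the same 0/1 labels.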
CHAPTER 11
BIBLIOGRAPHY
A MACHINE LEARNING MODEL FOR
ONLINE FRAUD DETECTION BIBLIOGRAPHY
11. REFERENCES
[1] Andrew. Y. Ng, Michael. I. Jordan, "On discriminative vs. generative classifiers: A
comparison of logistic regression and naive bayes", Advances in neural information
processing systems, vol. 2, pp. 841-848, 2002.
[4] B.Meena, I.S.L.Sarwani, S.V.S.S.Lakshmi,” Web Service mining and its techniques
in Web Mining” IJAEGT,Volume 2,Issue 1 , Page No.385-389.
[7] K. Chaudhary, B. Mallick, "Credit Card Fraud: The study of its impact and detection
techniques", International Journal of Computer Science and Network (IJCSN), vol. 1,
no. 4, pp. 31-35, 2012, ISSN ISSN: 2277-5420.
CS18A2 2021-2022 52
A MACHINE LEARNING MODEL FOR
ONLINE FRAUD DETECTION BIBLIOGRAPHY
[13] Y. Sahin, E. Duman, "Detecting credit card fraud by ANN and logistic regression",
Innovations in Intelligent Systems and Applications (INISTA) 2011 International
Symposium, pp. 315-319, 2011.
[16] Y. Sahin, S. Bulkan, E. Duman, "A cost-sensitive decision tree approach for fraud
detection", Expert Systems with Applications, vol. 40, no. 15, pp. 5916- 5923, 2013.
[17] Y. Kou, C-T. Lu, S. Sinvongwattana, Y-P. Huang, "Survey of Fraud Detection
Techniques", Proceedings of the 2004 IEEE International Conference on Networking
Sensing & Control, 2004.
[18] Y. Sahin, E. Duman, "Detecting Credit Card Fraud by Decision Trees and Support
Vector Machines", Proceedings of International Multi-Conference of Engineers.
CHAPTER 12
APPENDIX
12. APPENDIX
ANACONDA NAVIGATOR
Jupyter Notebook
Qt Console
Spyder
VSCode
Glueviz
Orange 3 App
Rodeo
RStudio
Advanced Anaconda users can also build their own Navigator applications.
Python Installation Tutorial
In this tutorial, we show how to install Python on your system. Python is free of
cost.
Step 2: The following page appears. By default, Anaconda shows you the
download link for the Mac operating system. If you have a Mac, click “64-Bit
Graphical Installer” under the Python 3.7 version to start downloading the file. On this
computer, Windows is the operating system, so we select Windows as shown below.
If you have Linux as your operating system, select the Linux option and download the
file in a similar manner as for Mac. Mac and Linux users can skip Steps 3 and 4.
Step 3: There are two options for Windows: 64-Bit and 32-Bit. You need
to find out whether your system is 64-bit or 32-bit and select the corresponding
file for your system. To do so, go to your desktop home screen, right-click the
‘Computer’ icon, then select Properties.
This shows basic information about your system. Look for “System Type” as
shown below and check whether it is 64-bit or 32-bit. For this computer, the
Windows system type is 64-bit.
Step 4: Now, go back to your browser and click “64-Bit Graphical Installer (662
MB)”, since this computer is 64-bit (as identified in Step 3).
The installer will start downloading the file (this may take a while), and the file will appear
in the bottom left of your browser (if you are using Google Chrome) as shown below.
Step 5: When the file has finished downloading, click on it. The
following window appears. Click ‘Run’, and then click the ‘Next’ button.
A new window appears asking you to accept the terms of the agreement;
select “I Agree”.
Step 6: Make sure you have the required free space for the software installation, which you
can check as shown below. Then click Next. (If you do not have the required space, you
need to delete some of your items to free up space.)
Step 7: The following window appears. Click Install.
This leads you to the installation page showing the progress of the installation. It will take
some time for the software to be installed.
After all the files are extracted, the “Next” button becomes enabled. Click the Next button.
Then the following window appears. Click the Finish button to complete the installation.
Anaconda is now installed on your computer.
Step 8: Type ‘anaconda navigator’ in the search box and click on the icon indicated
below.
Step 9: The Anaconda Navigator icon appears on the
bottom toolbar. Click on the icon to see the contents of Navigator.
The following page appears, showing the different options available
for use. For CRE, we need Spyder, so click ‘Launch’ under the
Spyder section to install Spyder on your computer.
Step 10: Type ‘spyder’ in the search box and click on the icon indicated below.
A pop-up window appears asking your permission to allow access to Python.
Click “Allow access”.
Step 11: The following window appears, showing the Spyder interface. You
are now ready to run Python LEP code or write new Python code.
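Once Spyder is running, a short check such as the one below (our own sketch, not part of the installation steps) can confirm which versions of Python and of the libraries used in this project are available. The package list is an assumption based on the libraries described earlier; `sklearn` is the import name of scikit-learn.

```python
import importlib
import sys


def report_versions(packages=("numpy", "pandas", "sklearn", "matplotlib", "seaborn")):
    """Map each package name to its version string, or 'not installed'."""
    versions = {"python": sys.version.split()[0]}
    for name in packages:
        try:
            module = importlib.import_module(name)
            versions[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            versions[name] = "not installed"
    return versions


for name, version in report_versions().items():
    print(f"{name}: {version}")
```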
History of Python:
Python was developed by Guido van Rossum in the late eighties and early nineties
at the National Research Institute for Mathematics and Computer Science in the
Netherlands.
Python is derived from many other languages, including ABC, Modula-3, C, C++,
Algol-68, SmallTalk, Unix shell, and other scripting languages.
Python is copyrighted. Its source code is available under the Python Software
Foundation License, an open-source license that is compatible with the GNU General Public License (GPL).
Python Features
Easy-to-read: Python code is clearly laid out and visually uncluttered.
A broad standard library: The bulk of Python's library is very portable and
cross-platform compatible on UNIX, Windows, and Macintosh.
Interactive Mode: Python has support for an interactive mode which allows
interactive testing and debugging of snippets of code.
Portable: Python can run on a wide variety of hardware platforms and has the
same interface on all platforms.
Extendable: You can add low-level modules to the Python interpreter. These
modules enable programmers to add to or customize their tools to be more
efficient.
Scalable: Python provides a better structure and support for large programs than
shell scripting.
Apart from the features mentioned above, Python has many other good features; a few are
listed below:
The elements in a NumPy array are all required to be of the same data type, and
thus will be the same size in memory. The exception: one can have arrays of
(Python, including NumPy) objects, thereby allowing for arrays of different sized
elements.
The points about sequence size and speed are particularly important in scientific
computing.
As a simple example, consider the case of multiplying each element in a 1-D
sequence with the corresponding element in another sequence of the same length. If the
data are stored in two Python lists, a and b, we could iterate over each element and
multiply the pairs one at a time.
The Numeric Python extensions (NumPy henceforth) is a set of extensions to the
Python programming language which allows Python programmers to efficiently
manipulate large sets of objects organized in a grid-like fashion. These sets of objects are
called arrays, and they can have any number of dimensions: one dimensional arrays are
similar to standard Python sequences, two-dimensional arrays are similar to matrices from
linear algebra. Note that one-dimensional arrays are also different from any other Python
sequence, and that two-dimensional matrices are also different from the matrices of linear
algebra, in ways which we will mention later in this text. Why are these extensions
needed? The core reason is a very prosaic one, and that is that manipulating a set of a
million numbers in Python with the standard data structures such as lists, tuples or classes
is much too slow and uses too much space.
This decomposition has been developed similarly in many array languages. In some
ways, NumPy is simply the application of this experience to the Python language – thus
many of the operations described in NumPy work the way they do because experience has
shown that way to be a good one, in a variety of contexts. The languages which were used
to guide the development of NumPy include the infamous APL family of languages,
Basis, MATLAB, FORTRAN, S and S+, and others. This heritage will be obvious to
users of NumPy who already have experience with these other languages. This tutorial,
however, does not assume any such background, and all that is expected of the reader is a
reasonable working knowledge of the standard Python language. This document is the
“official” documentation for NumPy. It is both a tutorial and the most authoritative source
of information about NumPy with the exception of the source code. The tutorial material
will walk you through a set of manipulations of simple, small, arrays of numbers, as well
as image files. This choice was made because:
• A concrete data set makes explaining the behavior of some functions much easier to
motivate than simply talking about abstract operations on abstract data sets;
• Every reader will have at least an intuition as to the meaning of the data and
organization of image files.
• The result of various manipulations can be displayed simply since the data set has
a natural graphical representation. All users of NumPy, whether interested in
image processing or not, are encouraged to follow the tutorial with a working NumPy
installation at their side, testing the examples, and, more importantly, transferring the
understanding gained by working on images to their specific domain. The best way to
learn is by doing; the aim of this tutorial is to guide you along this “doing.”
PO MAPPING
PROGRAM OUTCOMES
Engineering knowledge
Problem analysis
Design/development of solutions
Conduct investigation of complex problems
Communication
Project management and finance
Life-long learning