NC Report
Bachelor of Engineering
in
Students Name
NES’s
2024 – 2025
NES’s
GANGAMAI COLLEGE OF ENGINEERING NAGAON
DEPARTMENT OF COMPUTER ENGINEERING
CERTIFICATE
This is to certify that the Project entitled “Credit Card Fraud Detection
machine” has been carried out by “Nakshatra Chaudhri” under my
guidance in partial fulfillment of the degree of Bachelor of Engineering
in Computer Engineering of North Maharashtra University, Jalgaon
during the academic year 2024-2025. To the best of my knowledge and
belief this work has not been submitted elsewhere for the award of any
other degree.
Date: Guide
Place: Nagaon
Nakshatra Chaudhri
Students Name
ABSTRACT
It is vital that credit card companies are able to identify fraudulent credit card transactions so
that customers are not charged for items that they did not purchase. Such problems can be
tackled with Data Science and its importance, along with Machine Learning, cannot be
overstated. This project intends to illustrate the modelling of a data set using machine learning
with Credit Card Fraud Detection. The Credit Card Fraud Detection Problem includes
modelling past credit card transactions with the data of the ones that turned out to be fraud.
This model is then used to recognize whether a new transaction is fraudulent or not. Our
objective here is to detect 100% of the fraudulent transactions while minimizing the incorrect
fraud classifications. Credit Card Fraud Detection is a typical example of a classification
problem. In this project, we have focused on analysing and pre-processing the data set as well
as deploying multiple anomaly detection algorithms, such as Local Outlier Factor and Isolation
Forest, on the PCA-transformed credit card transaction data.
Credit card usage has increased drastically across the world. People now prefer
going cashless and depend heavily on online transactions. The credit card has made
digital transactions easier and more accessible. Fraudulent credit card
transactions cause huge financial losses every year. Fraud is as old as mankind
itself and can take an unlimited variety of forms. The PwC Global Economic Crime
Survey of 2017 suggests that approximately 48% of organizations experienced
economic crime. There is therefore a clear need to solve the problem of credit card
fraud detection. Moreover, the growth of new technologies provides additional ways
in which criminals may commit fraud.
The use of credit cards is predominant in modern society, and credit card fraud has
kept increasing in recent years. Fraud causes huge financial losses, affecting not
only merchants and banks but also the individuals who use the cards. Fraud may also
affect the reputation and image of a merchant, causing non-financial losses. For
example, if a cardholder is a victim of fraud with a certain company, he may no
longer trust their business and choose a competitor. Fraud detection is the process
of monitoring the transaction behavior of a cardholder to decide whether an incoming
transaction is authentic and authorized; otherwise it is flagged as illicit. In the
proposed system, we apply the random forest algorithm to classify the credit card
dataset. Random Forest is an algorithm for classification and regression. It is a
collection of decision tree classifiers. Random forest has an advantage over a
single decision tree because it corrects the tree's habit of overfitting to its
training set. To train each individual tree, a subset of the training set is sampled
randomly and a decision tree is built on it; each node then splits on a feature
selected from a random subset of the complete feature set. Even for large data sets
with many features and data instances, training is fast because each tree is trained
independently of the others. The Random Forest algorithm has been found to provide a
good estimate of the generalization error and to be resistant to overfitting.
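The sampling and voting steps described above can be sketched in plain Python. This is only an illustration, not the project's implementation: each "tree" is reduced to a single split (a decision stump) to keep the code short, and the function names, the toy transactions (amount, hour of day) and the parameter values are all assumptions made for the example.

```python
import random
from collections import Counter


def majority(labels):
    """Return the most common label in a list."""
    return Counter(labels).most_common(1)[0][0]


def train_stump(rows, labels, feature_ids):
    """Train a one-split tree that may only look at the given feature indices."""
    best = None
    for f in feature_ids:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, l_lab, r_lab)
    if best is None:  # no usable split: always predict the majority class
        m = majority(labels)
        return (feature_ids[0], float("inf"), m, m)
    return best[1:]


def train_forest(rows, labels, n_trees=25, features_per_tree=1, seed=0):
    """Each tree sees a bootstrap sample of rows and a random subset of features."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feats = rng.sample(range(len(rows[0])), features_per_tree)
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Majority vote over the independently trained trees."""
    votes = Counter(l if row[f] <= t else r for f, t, l, r in forest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: [amount, hour of day]; label 1 = fraud, 0 = legitimate.
rows = [[20, 10], [35, 14], [15, 9], [50, 20], [900, 3],
        [1200, 2], [40, 13], [1500, 4], [25, 11], [1100, 1]]
labels = [0, 0, 0, 0, 1, 1, 0, 1, 0, 1]

forest = train_forest(rows, labels)
print(predict(forest, [1300, 1]), predict(forest, [10, 12]))
```

Because every tree is built from its own sample, the loop in `train_forest` could be run in parallel, which is the reason training scales well to large data sets.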
1.5 Objective and scope
LITERATURE SURVEY
Fraudulent Detection in Credit Card System Using SVM & Decision Tree
(Nakshatra Chaudhri): With the growing advancement of electronic commerce, fraud is
spreading all over the world and causing major financial losses. At present, credit
card fraud is a major cause of financial loss; it affects not only merchants but
also individual clients. Decision trees, genetic algorithms, meta-learning
strategies, neural networks and HMMs are among the methods presented for detecting
credit card fraud. The system considered for fraud detection uses the Support Vector
Machine (SVM) and the decision tree together to solve the problem. By implementing
this hybrid approach, financial losses can be reduced to a greater extent.
Machine Learning Based Approach to Financial Fraud Detection Process in Mobile
Payment System (Dahee Choi and Kyungho Lee): Mobile payment fraud is the
unauthorized use of mobile transactions, through identity theft or credit card
stealing, to fraudulently obtain money. It is a fast-growing issue driven by the
emergence of smartphones and online transaction services. In the real world, a
highly accurate mobile payment fraud detection process is needed, since financial
fraud causes financial loss. The authors therefore propose an overall process for
detecting mobile payment fraud based on machine learning, using supervised and
unsupervised methods to detect fraud and process large amounts of financial data.
They also apply sampling and feature selection for fast processing of large volumes
of transaction data and to achieve high accuracy in mobile payment fraud detection.
The F-measure and the ROC curve are used to validate the proposed model.
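The F-measure mentioned above combines precision (how many flagged transactions were really fraud) and recall (how many frauds were caught). The following is a minimal sketch with made-up labels; the function names and the sample data are assumptions for illustration and are not taken from the surveyed paper.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for the positive (fraud) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def f_measure(y_true, y_pred, beta=1.0):
    """Return (precision, recall, F-beta); beta=1 gives the usual F1 score."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    score = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, score


# Hypothetical example: 3 frauds among 10 transactions, one missed, one false alarm.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

precision, recall, f1 = f_measure(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(precision, recall, f1, accuracy)
```

On imbalanced fraud data, accuracy can look high even when many frauds are missed, which is why precision, recall and the F-measure are usually reported alongside it.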
PURPOSE OF THE PROJECT
We propose a machine learning model to detect fraudulent credit card activity in
online financial transactions. Analyzing fraudulent transactions manually is
impracticable due to the vast amount of data and its complexity. However, given
adequately informative features, it could be made possible using machine learning.
This hypothesis is explored in the project. The aims are to classify fraudulent and
legitimate credit card transactions with a supervised learning algorithm such as
random forest, and to raise awareness of fraudulent transactions so that they can
be handled without financial loss. There have also been efforts to approach the
problem from a completely new angle. Attempts have been made to improve the
alert-feedback interaction in case of a fraudulent transaction: the authorised
system is alerted and feedback is sent to deny the ongoing transaction. The
artificial genetic algorithm, one of the approaches that shed new light on this
domain, countered fraud from a different direction.
CHAPTER -3
The feasibility of the project is analyzed in this phase, and a business proposal is
put forth with a very general plan for the project and some cost estimates. During
system analysis, the feasibility study of the proposed system is carried out to
ensure that the proposed system is not a burden to the company. For feasibility
analysis, some understanding of the major requirements of the system is essential.
The key considerations involved in the feasibility analysis are:
Economical Feasibility: This study is carried out to check the economic impact
that the system will have on the organization. The amount of funds that the company
can pour into the research and development of the system is limited. The
expenditures must be justified. The developed system is well within the budget,
which was achieved because most of the technologies used are freely available. Only
the customized products had to be purchased.
Technical Feasibility: This study is carried out to check the technical feasibility,
that is, the technical requirements of the system. Any system developed must not
place a high demand on the available technical resources, as this would lead to high
demands being placed on the client. The developed system must have modest
requirements, as only minimal or no changes are required to implement it.
Effort Allocation:
1. Lack of Adaptability: Conventional methods had difficulty adjusting to the
dynamic nature of evolving fraud practices. Once those engaging in fraudulent
activities were familiar with the established rules and limits, they could readily
adapt their strategies to circumvent them.
1. Damage Costs
The damage cost in the cost model characterizes the amount of damage caused by an
attack when anomaly detection is unavailable. A cost function defined per attack
should be used to measure the cost of damage. This means that, rather than simply
measuring False Negatives (FN) as a rate of missed anomalies, we measure the total
loss based on DCost(s, a), which varies with the service (s) and the particular
type of attack (a).
2. Challenge Costs
The challenge cost is the cost of acting upon an alarm when there is an indication
of a potential intrusion. For an Intrusion Detection System (IDS), one may consider
suspending a suspicious connection and checking, by analysing the service request
(SR), whether any system resources have been blocked from other legitimate users.
As a first cut, these costs can be estimated by the amount of CPU and disk resources
needed to challenge a suspicious connection. For simplicity, instead of estimating
the challenge cost for each intrusive connection, we can average the challenge costs
into a single challenge cost per potentially intrusive connection, i.e., the
overhead.
3. Operational Costs
The major issue in operational costs for an IDS (Intrusion Detection System) is the
amount of resources needed to extract and test features from raw traffic data. Some
features are costlier to gather than others, and at times the costlier features are
more informative for detecting intrusions.
True Positive (TP) cost: When the damage from an attack is smaller than the
overhead of challenging it, ignoring the attack saves cost. Hence, for a true
positive, if overhead > DCost(s, a), the intrusion is not challenged and the loss is
DCost(s, a). But if overhead < DCost(s, a), the intrusion is challenged and the loss
is limited to the overhead.
False Positive (FP) cost: When the IDS falsely flags an event as an attack and the
attack type is regarded as high cost, a challenge will ensue. This loss should be
measured when evaluating an IDS. For a false positive, the loss is just the
overhead.
True Negative (TN) cost: The IDS correctly decides that a connection is normal and
truly not an attack.
So far we have considered only costs that depend on the outcome of the IDS. We now
add the operational cost, Op-Cost, which measures the cost of computing the values
of the features used by the IDS.
5. Card Association Fees
Visa and MasterCard have adopted fairly strict programs that penalize merchants
generating excessive chargebacks. Generally, if a merchant exceeds established
chargeback rates for any three-month period, the merchant could be penalized with a
fee for every chargeback.
6. Merchant Bank Fees
In addition to the penalties charged by card associations, the merchant has to pay
an additional processing fee to the acquiring bank for every chargeback claimed.
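The loss rules given for the IDS cost model (damage cost DCost(s, a), challenge overhead and operational cost) can be written out as a small calculation. This is a sketch under stated assumptions: the function names and the numbers are hypothetical, a true negative is assumed to cost nothing, and every false positive is assumed to be challenged.

```python
def outcome_loss(outcome, dcost, overhead):
    """Loss for one connection, given its damage cost and the challenge overhead."""
    if outcome == "FN":   # missed attack: the full damage is suffered
        return dcost
    if outcome == "TP":   # challenged only when that is cheaper than the damage
        return min(dcost, overhead)
    if outcome == "FP":   # false alarm (assumed challenged): overhead is wasted
        return overhead
    if outcome == "TN":   # assumption: a correctly passed connection costs nothing
        return 0.0
    raise ValueError(f"unknown outcome: {outcome}")


def total_cost(events, op_cost_per_event=0.0):
    """Sum outcome-dependent losses and add the per-event operational cost."""
    consequential = sum(outcome_loss(o, d, h) for o, d, h in events)
    return consequential + op_cost_per_event * len(events)


# Hypothetical events: (outcome, DCost(s, a), overhead)
events = [
    ("FN", 500, 20),  # missed attack
    ("TP", 500, 20),  # detected, worth challenging
    ("TP", 10, 20),   # detected, but cheaper to ignore
    ("FP", 500, 20),  # false alarm
    ("TN", 0, 20),    # normal traffic
]

print(total_cost(events), total_cost(events, op_cost_per_event=1.0))
```

Comparing this total across detectors rewards a model for avoiding expensive misses, not just for a low error rate.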
--Hardware
OS – Windows 7, 8 and 10 (32 and 64 bit)
RAM – 4GB
--Software
Python
Anaconda
Performance Requirements
Performance requirements describe the ability of the software to respond to user
actions, such as:
• The application should take no more than 3 seconds to start.
• Data validation should take no more than 5 seconds.
• Result generation should complete within 5 seconds.
SOFTWARE REQUIREMENT SPECIFICATION:
Operating system : Windows 8/10.
IDE Tool : PyCharm
Coding Language : Python 3.6
APIs : NumPy, Pandas, PySpark, Matplotlib
CHAPTER -5 SYSTEM DESIGN
SYSTEM ARCHITECTURE
The figure above shows the process of the CCFDS. The system model accepts a
real-time database of customer credit card transactions. It is most important to
find the fraud rate among credit card transactions.
Outlier detection: It measures the distance between each data point and the cluster
of similar data. Values that do not follow the training data are considered outliers.
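One simple way to realise distance-based outlier detection is to measure how far each point lies from the centre of the training data and flag points that are unusually far away. The sketch below assumes a single cluster and a threshold of mean distance plus k standard deviations; both choices, the function names and the sample points are illustrative assumptions rather than the method used in the project.

```python
import math
import statistics


def centroid(points):
    """Component-wise mean of a list of equal-length points."""
    return [statistics.mean(column) for column in zip(*points)]


def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def find_outliers(train, candidates, k=3.0):
    """Return candidates lying farther from the training centroid than
    (mean training distance + k standard deviations)."""
    center = centroid(train)
    train_dist = [euclidean(p, center) for p in train]
    threshold = statistics.mean(train_dist) + k * statistics.pstdev(train_dist)
    return [p for p in candidates if euclidean(p, center) > threshold]


# Hypothetical two-feature training data clustered around (1.0, 1.0).
train = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.0], [0.9, 1.0]]
candidates = [[1.05, 0.95], [5.0, 4.0]]

print(find_outliers(train, candidates))
```

A smaller `k` flags more points as outliers, so the value trades missed frauds against false alarms.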
Classification: As the dataset is imbalanced, many classifiers show a bias towards
the majority class. The PySpark library is used for SQL-like analysis of large
amounts of structured or semi-structured data. The GBT classifier classifies the
data arriving through the stream.
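A common way to reduce the majority-class bias mentioned above is to give each class a weight inversely proportional to its frequency, so that both classes contribute equally to training. The following is a minimal sketch of that heuristic; the function name and the label counts are hypothetical, and how the weights are passed to a classifier depends on the library in use.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical imbalanced labels: 98 legitimate (0) and 2 fraudulent (1) transactions.
labels = [0] * 98 + [1] * 2
weights = balanced_class_weights(labels)

# Per-row weights that a classifier accepting sample weights could use.
row_weights = [weights[y] for y in labels]
print(weights)
```

With these weights, the summed weight of each class is the same, so errors on the rare fraud class are no longer drowned out by the legitimate transactions.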
DATA FLOW DIAGRAM: The DFD is used as a communication tool between the system and
the user. It is a simple representation of the complete project process. The
transaction detection activity follows three phases: 1. Data exploration 2. Data
preprocessing 3. Data classification.
UML DIAGRAM