
Enhancing Financial Security with Deep Learning and OpenCV for PAN Card Integrity


Vyshnavi Pamidi, Uppalapati Priyanka, Rajyalakshmi, Jyosthna Mallu, Lasya Priya Pallapu
Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Vaddeswaram, India
[email protected], [email protected], @gmail.com, [email protected], [email protected]

Abstract— The study is focused on "Detecting fake PAN cards leveraging deep learning, OpenCV-Python, and computer vision". The primary objective is to facilitate the identification of any tampering with PAN cards by analyzing images using computer vision. The result of this study is expected to provide significant assistance to organizations in independently verifying the authenticity of PAN cards submitted by their employees, clients, or any other individuals. For this purpose, the project will employ a structural similarity algorithm to compare images of original PAN cards with those submitted by the users. The outcome of this study has the potential to be an important step in preventing fraudulent activities related to PAN cards. With more than 2500 image processing algorithms at its disposal, OpenCV is the go-to tool for the job. The ANN provides remarkable precision and efficiency in data processing, emulating biological neurons; the input, hidden, and output layers of neural networks offer unmatched computational capacity. To process images, CNNs, a class of deep learning algorithm, are required: in addition to classifying various classes, they apply filters with learnable weights and biases. The structure of the human brain served as the model for deep learning, a branch of artificial intelligence and machine learning that can tackle complicated problems that conventional algorithms cannot. Python 3.x, OpenCV 4.4.0, Numpy 1.19.3, Tensorflow 2.5.0, Scikit-learn 0.24.2, MediaPipe 0.8.5, and Tqdm 4.60.0 are among the prerequisites needed for the study. The proposed work will train a classifier model that can distinguish between real and fake PAN cards. The study will utilize the TensorFlow library, a powerful deep learning platform developed by Google.

Keywords— OpenCV, Numpy, Tensorflow, MediaPipe, Tqdm, Convolutional Neural Networks (CNN), Artificial Neural Network (ANN), Support Vector Regression (SVR), Deep Learning, Fraud Prevention

I. INTRODUCTION

This endeavour intends to create a computer vision-based system that will assist enterprises in identifying any PAN card tampering. The principal aim is to furnish a dependable and impartial verification system for PAN cards furnished by staff members, customers, or other persons. In order to do this, a structural similarity algorithm that can compare user-submitted photos with original PAN card images will be used in the proposed study. To find any differences or changes, this program will examine the texture, colour, and form of the photos, among other things.

Real-time image processing is made possible by the popular open-source computer vision package OpenCV. It is very lightweight and efficient, and it is based on the C++ programming language. The fact that OpenCV has over 2500 image processing methods that are useful for a variety of applications is one of its key features. In the field of artificial intelligence, computer vision focuses on giving machines the ability to understand visual input, including pictures and videos. It entails creating methods and algorithms that let computers interpret, process, and analyse visual data. The application of artificial neural networks, which are based on biological neurons, is one of the main ideas in computer vision. An input layer, one or more hidden layers, and an output layer make up a neural network's layers. Together, these layers perform intricate mathematical calculations to interpret and extract meaningful information from visual data.

Realizing that visuals are more complicated than text-based data is crucial. Therefore, processing images requires a particular kind of algorithm, and CNNs are useful in this situation. A CNN is a kind of deep learning technique that is commonly utilized for image processing applications. CNNs can distinguish between several types of images since they are particularly made to identify patterns in images. They accomplish this by applying a sequence of filters to input images, together with learnable weights and biases applied to the filters. These filters aid in identifying particular visual elements including forms, lines, and edges.

On the other hand, deep learning is a branch of artificial intelligence and machine learning that draws inspiration from the structure of the human brain. It solves complicated issues that conventional machine learning methods are unable to handle by using artificial neural networks. Deep learning algorithms are perfect for tasks requiring great precision and sophisticated decision-making because they can automatically learn from experience and improve. It is necessary to establish specific prerequisites: Python 3.x, OpenCV 4.4.0, Numpy 1.19.3, Tensorflow 2.5.0, Scikit-learn 0.24.2, MediaPipe 0.8.5, and Tqdm 4.60.0 are among these requirements. These tools and libraries are essential for building a classifier model that can accurately distinguish between real and fake PAN cards.
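The prerequisites listed above can be captured in a single requirements file. The listing below is only one possible sketch: the pinned versions follow the text, while matplotlib and scikit-image are additions assumed here for the plotting and structural-similarity steps described later.

# Possible requirements file for the environment described above
opencv-python~=4.4.0
numpy==1.19.3
tensorflow==2.5.0
scikit-learn==0.24.2
mediapipe==0.8.5
tqdm==4.60.0
matplotlib        # assumed: used for the accuracy/loss plots
scikit-image      # assumed: provides the SSIM implementation used later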
II. LITERATURE SURVEY

The research paper "Machine Learning for PAN Card Fraud Detection" was released by Rohini Hanchate, Shreyas Yerole, Nazir Lalloti, and Piyush Mahajan and published by the Journal of Emerging Technologies and Innovative Research (JETIR). As the demand for PAN cards has grown, so too have fraudulent operations involving phony PAN cards. CNNs are the method suggested to recognize authentic and phony PAN cards in order to solve this problem. The suggested solution trains the CNN model using a dataset of actual and fraudulent PAN cards, enabling it to classify PAN cards as authentic or false with a high degree of accuracy. In addition to preventing tax fraud and duplicate personal information, the suggested approach may offer a dependable and effective solution to the issue of identifying phony PAN cards. This would enhance the overall integrity of India's tax system.

The paper "Credit Card Fraud Detection System Using CNN", written by P. Shanmugapriya, R. Shupraja, and V. Madhumitha and published in IJRASET, examined the application of machine learning techniques for credit card fraud detection. The identification of credit card fraud in actual situations is the main goal of that project. Fraud is one of the most important ethical problems facing the credit card business. Furthermore, credit card use has become an indispensable and handy aspect of everyday finances due to the quick development of e-commerce. Credit card fraud occurs when someone uses another person's credit card or information without that person's knowledge. People are readily duped and lose their money in two main categories of fraud: application fraud and behavioral fraud. The proposed work categorizes the credit card dataset through the use of both supervised and unsupervised techniques. Utilizing CNN, the proposed work correlates the training data to determine the correctness of the model.

In the IEEE publication "Credit Card Fraud Detection with a Neural-Network", authored by Sushmito Ghosh and Douglas L. Reilly, a neural network-based fraud detection system was trained on a large sample of labeled credit card account transactions using data from a credit card issuer. It was then evaluated on a holdout data set that included all account activity over a following two-month period. When compared to rule-based fraud detection processes, the network found many more fraudulent accounts. For the purpose of detecting fraud on the bank's portfolio of CLIC cards, the system has been implemented on an IBM 3090 at Mellon Bank.

A paper titled "Financial Fraud Detection Using Deep Learning Based on Modified Tabular Learning" was published at ECIT in 2022 and was written by Meiying Huang and Wenxuan Li. There are currently around two ways to identify fraud. First, comparable criteria for manual detection in the financial domain can be established; its flaws are a high false positive rate, delayed updating, and poor detection speed. The automated recognition of the machine is an additional technique. Consequently, there is a lot of promise for using artificial intelligence techniques in fraud detection given the current developments in the financial industry. SVR and CNN are now the two most used intelligent techniques for fraud detection. The outcomes demonstrate the model's strong performance in identifying fraudulent activity and attest to its viability.

III. METHODOLOGY

The phases for developing a PAN card identification system are outlined in the study.

Preserving the Authenticity of the Guidelines:

In order to maintain the integrity of the standard, the study emphasizes the use of deep learning and computer vision to handle crucial aspects of PAN card fraud detection. It ensures that the strictest fraud detection criteria are considered during design.

Choosing the Correct Library and Tools:

To ensure uniformity and accessibility, the study contains a list of the specific Python libraries and frameworks utilized in the research work. This promotes ease of use by enabling developers to utilize the same environment and tools. When the tools needed to complete the project are readily available, it is simpler to follow the instructions in the manual.

STEPS TO BUILD A PAN CARD FRAUD IDENTIFICATION SYSTEM USING PYTHON, DEEP LEARNING AND COMPUTER VISION:

To construct the classification algorithm, adhere to these steps:

1) Bring along the required packages.
2) Data preprocessing.
3) Design the architecture model.
4) Make the model more proficient.
5) Put the model to the test.

A. Bring along the required packages:

The script begins by importing the necessary libraries: os, cv2, numpy, tqdm, tensorflow, and matplotlib.pyplot, along with train_test_split from sklearn.model_selection and layers and models from tensorflow.keras. Using a function called PreProcess, the script reads and preprocesses photos from a given directory in order to preprocess the data. Following that, the dataset is separated into training and testing sets using train_test_split.

An appropriate optimizer, loss function, and metrics are included in the compilation of the model. The model is then trained with the training data, with a predetermined batch size and number of epochs. The test data is used to assess the model's performance when training is complete. The script also shows how to export the trained model and predict labels from example photos. Using the TensorFlow and OpenCV libraries, this Python script summarizes the process of developing, testing, and evaluating a deep learning model for PAN card fraud detection.
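As a concrete illustration, the import block below is a minimal sketch of this step. It covers only the modules named above; the aliases and grouping are assumptions.

import os
import cv2                                    # OpenCV for image I/O and processing
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models   # building blocks for the CNN
from sklearn.model_selection import train_test_split
from tqdm import tqdm                          # progress bars over image-loading loops
import matplotlib.pyplot as plt                # accuracy/loss plots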

The neural network model is then built using the Sequential API from TensorFlow and includes convolutional layers (Conv2D), a dropout layer (Dropout), a flattening layer (Flatten), and dense layers (Dense). TensorFlow modules are used by the study to build the model architecture. Dense is a popular deep learning layer and one of the key modules. Connecting every output produced by the preceding layer to every one of its neurons is the primary function of the dense layer. After receiving the inputs from the preceding layer, every neuron inside the Dense layer sends one output to the layer below. This process of taking in inputs and transmitting outputs is the primary role of a Dense layer in deep learning.

In neural networks, the flatten layer is a kind of layer that is used to convert multi-dimensional data into a one-dimensional tensor. When feeding the output of one layer into another that requires a flattened input, this is helpful. For instance, if a layer has an input shape of (batch_size, 100, 100, 3), then 30,000 is the total number of elements per sample in the input tensor, and the output shape is (batch_size, 30000).

Conversely, dropout is a regularization strategy in neural networks that keeps the model from overfitting. Dropout randomly disregards (drops out) a portion of the network's neurons during the training process. As a result, the model is strengthened and becomes less prone to overfitting the training set. The likelihood of dropping out each neuron is a hyperparameter that may be adjusted to maximize performance. Dropout is generally applied to the fully connected layers of the network.

B. Data preprocessing:

The variable img_size, which indicates the target size for resizing the photos during preprocessing, is set to 100 in this section of the code. The path 'images', denoting the root directory containing the picture data, is assigned to the variable datadir. A list of categories (subdirectories) inside the root data directory is acquired using os.listdir(datadir), and this list is saved in the variable CATEGORIES. Lastly, the categories included in the root data directory are shown using print(CATEGORIES). By specifying the image size and determining the categories included in the dataset for additional processing, this code segment essentially prepares for the next step of the data preprocessing procedure.

The root data directory has two folders, as may be seen above, and every folder holds the data that goes with it. Two arguments are passed to the PreProcess method in the code: path, which is the path to the directory containing the image data, and img_size, which is the size to which images should be scaled. Two empty lists, x and y, which will hold the processed images and the corresponding class labels, are initialized by the function.

Figure 1: Tampered PAN card image used for comparison.

Figure 1 shows the example of a tampered PAN card image used for comparison. A list of all the files and directories in the specified path is returned by os.listdir(). The output ['real', 'fake'] identifies the two unique categories that are contained in the dataset: 'real' and 'fake', which stand for authentic and counterfeit PAN card images, respectively.

Figure 2: Original PAN card image used for comparison.

Figure 2 shows the example of the original PAN card image used for the comparison. Each picture is resized to 100x100 using the cv2.resize() function and converted to a numpy array, and the data is then divided into training and testing sets using the sklearn.model_selection.train_test_split method in order to prepare it for a deep learning model that detects PAN card fraud. Each image is read from its local path with cv2.imread() and stored in memory as a numpy array, while Tqdm, a loop progress visualizer, shows the progress of the loop.

The highest precision is achieved with the float32 data type in deep learning, so the array of scaled images is divided by 255. The train_test_split method of sklearn.model_selection is then used to divide the dataset into training and testing sets; it takes the features x and labels y as inputs, along with random_state=42. The data is transformed into numpy arrays for the deep learning framework, and finally a numpy array is created from each list for model assessment. OpenCV is also used to convert pictures to grayscale. Grayscale photos only have one channel, whereas colored images have three, making them more difficult for machines to grasp; this is why many image processing tools work on grayscale images when picking out the crucial details.
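The routine below is a minimal sketch of the preprocessing described in this section, building on the imports shown earlier. The images/real and images/fake folder layout, the 100x100 resize, the division by 255, and random_state=42 follow the text; the label encoding and the handling of unreadable files are assumptions.

import os
import cv2
import numpy as np
from tqdm import tqdm
from sklearn.model_selection import train_test_split

img_size = 100
datadir = 'images'
CATEGORIES = os.listdir(datadir)          # expected output: ['real', 'fake']
print(CATEGORIES)

def PreProcess(path, img_size):
    x, y = [], []                         # processed images and class labels
    for label, category in enumerate(CATEGORIES):
        folder = os.path.join(path, category)
        for fname in tqdm(os.listdir(folder)):        # tqdm visualizes loop progress
            img = cv2.imread(os.path.join(folder, fname))
            if img is None:                            # skip unreadable files
                continue
            x.append(cv2.resize(img, (img_size, img_size)))   # 100x100x3
            y.append(label)
    x = np.array(x, dtype='float32') / 255.0           # float32, scaled to [0, 1]
    y = np.array(y)
    return x, y

x, y = PreProcess(datadir, img_size)
X_train, X_test, y_train, y_test = train_test_split(x, y, random_state=42)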
C. Design the architecture model:

With the aid of over 2500 algorithms, OpenCV is a computer vision library that can handle image processing jobs. Because the backend is C++, it is lightweight and quick. The discipline of computer vision uses AI to extract data from visual inputs.

A computing system called an ANN is modeled on the neural network of the human brain; it is the foundation of deep learning algorithms. The network is designed to resemble real neurons, which exchange electrical messages with one another. ANNs, like real neurons, consist of linked layers of nodes that execute intricate calculations.

Figure 3: Artificial Neural Network

As shown in figure 3, an input layer, one or more hidden layers, and an output layer make up these layers. After the data is received by the input layer and processed by the hidden layers, the output layer generates the final output. Applications for ANNs include speech recognition, picture identification, natural language processing, and many more.

Since this work deals with pictures, a CNN is required. Convolutional neural networks are an algorithmic class that is often employed in machine learning, particularly in the field of deep learning, for image processing applications.

Figure 4: Convolutional Neural Networks

As shown in figure 4, the CNN's architecture is built to be capable of efficiently analyzing and extracting characteristics from pictures. This is accomplished by running the input image through a succession of filters, each of which extracts a distinct pattern or feature that corresponds to a different part of the image. The most common method for learning these filters is backpropagation, in which the network modifies its internal weights and biases to enhance its capacity to accurately classify various picture classes. From self-driving vehicles to medical image analysis, CNNs are extensively employed in a number of applications, since they have generally shown to be very successful for tasks including image classification, object recognition, and image segmentation.

Figure 5: Deep Learning

As illustrated in figure 5, artificial intelligence and machine learning encompass the very sophisticated technology of deep learning. It is a kind of neural network approach that draws inspiration from the composition and operations of the human brain. Deep learning's main goal is to solve very difficult and complicated issues that conventional machine learning algorithms are unable to handle. Several layers of artificial neurons are used in this technology to learn and interpret large volumes of data in various formats, including text, audio, and pictures. Deep learning models are perfect for applications like speech recognition, picture and video analysis, natural language processing, and autonomous driving because they can recognize patterns, extract features, and make predictions with a high degree of accuracy.

In terms of xy coordinate location, the structural similarity index aids in pinpointing the precise place of the picture differences. This is where the altered image is compared with the original; the similarity decreases with a decreasing SSIM score.

SSIM: 0.31678790332739426

PAN card fraud detection utilizes a CNN model built with TensorFlow's Keras Sequential API. To avoid overfitting, the model consists of several convolutional layers followed by dropout regularization. After that, the output is flattened and sent through a series of dense layers to the final softmax layer for classification. The Adam optimizer and sparse categorical cross-entropy loss function are used to compile the model. Finally, model.summary() is used to provide an overview of the architecture and parameters.
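The SSIM comparison discussed above can be sketched as follows. The paper does not name an SSIM implementation, so scikit-image's structural_similarity is assumed here, and the file names are hypothetical placeholders.

import cv2
from skimage.metrics import structural_similarity

original = cv2.imread('original_pan.png')          # hypothetical file names
suspect = cv2.imread('submitted_pan.png')
suspect = cv2.resize(suspect, (original.shape[1], original.shape[0]))

gray_orig = cv2.cvtColor(original, cv2.COLOR_BGR2GRAY)
gray_susp = cv2.cvtColor(suspect, cv2.COLOR_BGR2GRAY)

# full=True also returns the per-pixel similarity map, which localizes the
# differences by their xy coordinates
score, diff = structural_similarity(gray_orig, gray_susp, full=True)
diff = (diff * 255).astype('uint8')
print('SSIM:', score)          # e.g. roughly 0.3168 for the tampered sample in the text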
The threshold function is a potent tool in computer vision that enables the application of an adaptive threshold to an image that is stored as an array. This function's main goal is to use a mathematical technique to convert a grayscale image into a binary image. The threshold value is used to establish whether a pixel belongs to the image's foreground or background.

The contours of the binary picture can be obtained by using the findContours function after it has been produced. These contours are really a collection of curves: the points that make up the image's borders combine to generate them. They may be applied to recognize and identify different items in the image, making them a valuable tool for shape analysis and recognition.

The correct contour values are extracted using the grabContours function. The image can then be processed using these values in a number of ways, such as cropping it or highlighting particular areas. The combination of the threshold, findContours, and grabContours functions is a popular tool for image processing and analysis in a variety of domains, such as computer vision, robotics, and machine learning.
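A minimal sketch of this threshold-and-contour step is shown below, continuing from the SSIM difference map computed earlier. The grabContours helper is assumed to correspond to imutils.grab_contours (imutils is not among the listed prerequisites), Otsu thresholding is an assumption, and drawing bounding rectangles on both images anticipates the comparison step described in Section E.

import cv2
import imutils                                  # assumed: provides grab_contours

# 'diff', 'original' and 'suspect' come from the SSIM sketch above
thresh = cv2.threshold(diff, 0, 255,
                       cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]   # grayscale -> binary
cnts = cv2.findContours(thresh.copy(), cv2.RETR_EXTERNAL,
                        cv2.CHAIN_APPROX_SIMPLE)
cnts = imutils.grab_contours(cnts)              # extract the contour list itself

for c in cnts:
    x, y, w, h = cv2.boundingRect(c)            # bounding rectangle of each difference region
    cv2.rectangle(original, (x, y), (x + w, y + h), (0, 0, 255), 2)
    cv2.rectangle(suspect, (x, y), (x + w, y + h), (0, 0, 255), 2)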
Table 1: Model Architecture Summary

Layer (type)          Output Shape         Param #
conv2d_16 (Conv2D)    (None, 98, 98, 16)   448
conv2d_17 (Conv2D)    (None, 48, 48, 32)   4640
conv2d_18 (Conv2D)    (None, 46, 46, 64)   18496
conv2d_19 (Conv2D)    (None, 22, 22, 8)    4616
dropout_4 (Dropout)   (None, 22, 22, 8)    0
flatten_4 (Flatten)   (None, 3872)         0
dense_12 (Dense)      (None, 50)           193650
dense_13 (Dense)      (None, 30)           1530
dense_14 (Dense)      (None, 2)            62

As shown in Table 1, the CNN layers and their matching output shapes are broken down in depth in the model architecture overview.

Total params: 223,442
Trainable params: 223,442
Non-trainable params: 0

The parameterization of the model, which has 223,442 parameters in total, is summarized in this section. Since every parameter is trainable, the entire architecture of the model may be adjusted while it is being trained.
D. Train the model and make it more proficient:

A simple model architecture is built using the Sequential API; it is a simple stack of layers with one input and one output for each layer. Employing model.add(), the study may expand the model's layers. The study has employed 4 Convolution layers, 1 Dropout layer, 1 Flatten layer, and 3 Dense layers in this model architecture. A batch of arrays is sent into the first convolution layer. Two outputs come from the last Dense layer, since there are only two classes. Due to the categorical nature of the data, the "softmax" activation function and "sparse_categorical_crossentropy" loss function were employed in the final layer, and the model is then assembled with model.compile().

With a batch size of two, the training call trains the model for 15 epochs using the training data (X_train and y_train). During training, progress updates can be shown thanks to the verbose=1 setting. The machine learning model is trained using the model.fit() method, which is given two photos at a time while training; this is a hyperparameter that may be changed to maximize model performance and is referred to as the batch size.

The number of epochs is another crucial hyperparameter. The number of times the model iterates over the whole dataset during training is determined by this. Trial and error is frequently used to establish the ideal number of epochs, which might vary according to the particular issue and dataset.

Using the matplotlib package, the accuracy and loss are plotted over time to track the model's training progress. This enables us to see the extent to which the model is picking up patterns in the data and to pinpoint any problems that require attention. Through meticulous adjustment of these hyperparameters and close observation of the training procedure, the study may develop a machine learning model that excels at the task.
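The sketch below illustrates this architecture and training step. The filter counts, strides and dense layer sizes are chosen so that the output shapes and parameter counts reproduce Table 1; the ReLU activations and the dropout rate are assumptions, and X_train and y_train come from the preprocessing sketch shown earlier.

import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(16, (3, 3), activation='relu', input_shape=(100, 100, 3)),  # -> 98x98x16
    layers.Conv2D(32, (3, 3), strides=2, activation='relu'),                  # -> 48x48x32
    layers.Conv2D(64, (3, 3), activation='relu'),                             # -> 46x46x64
    layers.Conv2D(8, (3, 3), strides=2, activation='relu'),                   # -> 22x22x8
    layers.Dropout(0.5),                        # regularization; the rate is an assumption
    layers.Flatten(),                           # -> 3872 features
    layers.Dense(50, activation='relu'),
    layers.Dense(30, activation='relu'),
    layers.Dense(2, activation='softmax'),      # two classes: real / fake
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.summary()                                 # prints a summary like Table 1

# batch_size=2 and epochs=15 follow the text; verbose=1 shows progress per epoch
history = model.fit(X_train, y_train, batch_size=2, epochs=15, verbose=1)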
E. Put the model to the test:

Determining the width-to-height bounding rectangle of an item might be helpful when comparing two photos. To do this, the bounding box of the object's contour in both photos must first be calculated. Upon obtaining the bounding boxes, the proposed work can effortlessly discern the differences or similarities between the two input photos by drawing the boxes on them. A number of image comparison tasks, including recognizing changes in photos over time or spotting abnormalities in a collection of photographs, might benefit from this method.

Using the testing dataset (X_test and y_test), the evaluate() function measures the accuracy and loss of the trained model. This aids in evaluating the model's performance in real-world scenarios and its ability to generalize to new data.

[0.3114681839942932, 0.8571428656578064]

The first item shows the loss on the test dataset, while the second item shows the accuracy that the same dataset managed to attain. It is noteworthy that the model performs satisfactorily in predicting the outcomes, as evidenced by its accuracy of over 85% on the test dataset.

It is crucial to handle the image in the same manner as the training photos when making predictions directly from an image. This guarantees that the model is generating predictions based on the same kind of data and attributes. The np.expand_dims function, which lets us increase the picture dimension, may be used to do this.

For instance, the np.expand_dims function may be used to increase the image's dimension if its shape is (100, 100, 3). Once np.expand_dims has been applied, the picture shape will be (1, 100, 100, 3). Even when only one image is being predicted, the model anticipates a batch of photos, which is why this additional dimension is required.

The photos in this part serve as a visual representation of the data, along with the corresponding ground truth labels. This makes it easier to comprehend how the data is categorized and labeled, which helps with dataset analysis and interpretation.

The model's output indicates whether or not the PAN card is authentic by labeling it as "real" or "fake", which emphasizes how well the algorithm predicts the validity of PAN cards. Using the model.save() function, the study can export the model and store it in the h5 format once the model training is over. As a result, the study will be able to use the trained model to do various tasks or make predictions as needed.
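A minimal sketch of this evaluation, prediction and export step is given below; the sample file name is a hypothetical placeholder, and the label lookup reuses the CATEGORIES list from the preprocessing sketch.

import cv2
import numpy as np

# evaluate() returns [loss, accuracy] on the held-out test set
print(model.evaluate(X_test, y_test))          # e.g. [0.311..., 0.857...]

# predict a single image: preprocess it exactly like the training photos
sample = cv2.imread('sample_pan.png')          # hypothetical path
sample = cv2.resize(sample, (100, 100)).astype('float32') / 255.0
sample = np.expand_dims(sample, axis=0)        # (100, 100, 3) -> (1, 100, 100, 3)

pred = model.predict(sample)
print(CATEGORIES[int(np.argmax(pred))])        # prints 'real' or 'fake'

# export the trained model in h5 format
model.save('pan_card_classifier.h5')           # hypothetical file name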
IV. RESULT AND ANALYSIS

Determining an image's structural resemblance can be a useful method for examining and identifying its form. The study can ascertain the degree of similarity or difference between two photographs by analyzing their structural similarities. The study can also improve analysis of the picture form by defining a threshold and then using that threshold to find contours for images that have been converted to grayscale binary form.

The Structural Similarity Index Measure, or SSIM, for the submitted picture is around 31.2%. This implies that the picture may not be a true copy of the original or that it has been altered.

Figure 5: Visualization of Training Progress: Epoch-wise Accuracy and Loss

In figure 5, the given data illustrates the model's training over a period of 15 epochs. Every epoch denotes a full run over the training data. The metrics for accuracy (the percentage of properly identified samples) and loss (the difference between the model's predictions and the actual labels) are shown for each epoch. Following each epoch, the loss and accuracy values are calculated and updated to represent the learning dynamics of the model and its performance gains throughout training. This data is used to monitor the convergence of the model and evaluate how well it learns the underlying patterns in the dataset.

Figure 6: Accuracy Over Time

Figure 6 is a visual representation of the accuracy for every iteration, plotted in order to monitor the development of the model. The proposed work utilizes the widely used matplotlib visualization package to do this. The study may obtain a more comprehensive insight into the model's performance and whether it is improving by charting the accuracy and loss over time. This assists in making well-informed choices on how to optimize the model and raise its level of performance.

Figure 7: Loss Over Time

Figure 7 is a visual representation of the loss for every iteration, plotted in order to monitor the development of the model. To ensure optimal training for the model, the study has opted for a batch size of 2. This means that during training, the model is fed two pictures simultaneously.

Furthermore, epochs are essential to the training of the model. The epoch count is a hyperparameter that controls how many times the model iterates across the whole dataset while being trained. As a result, the number of epochs directly affects how well the model predicts, and determining the ideal number of epochs is frequently an essential step in developing a reliable and accurate model.

[0.3114681839942932, 0.8571428656578064]
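The curves in Figures 6 and 7 can be produced from the history object returned by model.fit() in the training sketch above; the plotting code below is a minimal version of that step, with titles and axis labels chosen as assumptions.

import matplotlib.pyplot as plt

# 'history' is the object returned by model.fit() in the training sketch above
plt.plot(history.history['accuracy'])
plt.title('Accuracy Over Time')        # corresponds to Figure 6
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.show()

plt.plot(history.history['loss'])
plt.title('Loss Over Time')            # corresponds to Figure 7
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.show()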
Two significant assessment measures that were obtained after the machine learning model was tested on the validation dataset are included in the output [0.311, 0.857]. The model's loss on the validation set is shown by the first value (0.311). Stated differently, it represents the degree to which the model's predictions agree with the actual labels. Better performance is shown by a smaller loss value, which indicates that the model's predictions are closer to the actual labels.

Conversely, the accuracy of the model on the validation set is shown by the second value (0.857). It displays the percentage of samples in the validation set that were properly categorized out of all the samples. An increased accuracy value is indicative of a well-performing model, as it shows that the majority of the samples in the validation set were accurately predicted by the model.

Understanding the machine learning model's performance and its capacity to generalize to new data depends on these two indicators. It is often believed that a model doing well on the validation set would also perform well on other unknown datasets. Consequently, these measures aid in establishing whether the machine learning model is appropriate for the intended use case and whether any more enhancements are needed.

Future Directions: The model architecture that was employed here may be applied to create a classifier that can predict more than two classes. Essentially, the study can make the classifier more capable of recognizing and differentiating between more than simply two groups. This may be accomplished by adding more output nodes to the model, each of which is associated with a distinct class label. By doing this, the study may train the model to identify and categorize data points into several classes rather than just two.

Real-World Implications: Many different types of enterprises that demand identity verification from their users or customers can benefit from this research. Organizations may quickly ascertain whether the IDs given by users or customers are authentic or fake with the use of this initiative. Any kind of identity document, including voter ID and Aadhaar cards, can be used for this. Organizations may improve their security protocols and thwart identity theft and fraud by putting this study into practice. This will guarantee that all of their users and customers have a safe and secure experience.

Together, these elements make up the SSIM metric, which evaluates the overall structural similarity of the two pictures. The SSIM index has a range of -1 to 1, and a value nearer 1 indicates a stronger structural similarity. Therefore, the greater the similarity between the structures of the given photos, the closer the SSIM value is to 1.

V. CONCLUSION

The primary goal is to use cutting-edge Python deep learning algorithms to develop a PAN card fraud detection system. TensorFlow's Sequential API, a potent tool for creating deep learning models, is used in the model architecture. Furthermore, OpenCV is utilized to apply several image processing methods and raise the accuracy of the model. Through this research, the designers have gained a great deal of expertise and hands-on experience in developing deep learning image classifiers. The model architecture created for this research has potential applications beyond detecting PAN card fraud, including object identification, facial recognition, and more. All things considered, the research has been effective in offering a thorough grasp of deep learning methods and their practical applications.

The complexity of differentiating between authentic and counterfeit PAN cards has increased with technological advancements. The development of computer vision and deep learning techniques has made it possible for phishers to produce counterfeit cards that closely mimic genuine ones. The government must intervene and conduct thorough verification processes in order to ensure the validity of PAN cards. This will safeguard not only the interests of those who use PAN cards for various transactions but also those of businesses that utilize them in order to avoid fraudulent behaviour.

The study was able to determine the differences or similarities in the photos' shapes by determining the structural similarity between them. In a similar vein, determining the threshold and the contours based on that threshold for the grayscale binary pictures also aided the ability to recognize and analyze shapes. Since the SSIM score is only about 31.2%, the proposed study may conclude that the submitted image has been altered.

VI. ACKNOWLEDGMENT

In acknowledgment of the PAN card fraud detection study, we would like to sincerely thank everyone who helped make the project a success. We are appreciative of the vital contributions that the conference attendees, data producers, open-source communities, and our project team have made to this endeavour. The support of our academic institution, our supervisors' advice, and the help of other students have been priceless. We expect to work together in the future and see our research advance.

VII. REFERENCES

[1] Rohini Hanchate, Shreyas Yerole, Nazir Lalloti, Piyush Mahajan, "Fraud Detection of PAN Card Using Machine Learning", Journal of Emerging Technologies and Innovative Research (JETIR), India, May 2023.
[2] P. Shanmugapriya, R. Shupraja, V. Madhumitha, "Credit Card Fraud Detection System Using CNN", International Journal for Research in Applied Science & Engineering Technology (IJRASET), March 2022.
[3] Sushmito Ghosh and Douglas L. Reilly, "Credit Card Fraud Detection with a Neural-Network", International Conference on System Sciences, 1994.
[4] Meiying Huang and Wenxuan Li, "Financial Fraud Detection Using Deep Learning Based on Modified Tabular Learning", ECIT 2022, AHE 11, pp. 550–558, 2023.
[5] Vaishnavi Nath, Geetha S, "Credit Card Fraud Detection using Machine Learning Algorithms", IJCSMC, 2019.
[6] Salomi Hurriya Anjum, Geeta Patil, "Cheat Detection for Credit Cards Using Artificial Intelligence", 2022 IEEE North Karnataka Subsection Flagship International Conference (NKCon), 2022.
[7] S P Maniraj, Aditya Saini, Swarna Deep Sarkar, Shadab Ahmed, "Credit Card Fraud Detection Using Machine Learning and Data Science", IJERT, 2019.
[8] Pradheepan Raghavan, Neamat El Gayar, "Fraud Detection using Machine Learning and Deep Learning", December 2019.
[9] Aayushi Agarwal, Md Iqbal, Baldivya Mitra, "Survey of Various Techniques Used for Credit Card Fraud Detection", Department of Computer Science & Engineering, MIET, Meerut, U.P, India, 8 July 2020.
[10] R. T. Trippi, E. Turban (eds), Neural Networks in Finance and Investing, Probus Publishing Company, 1993.
[11] S. Shang, N. Memon, and X. Kong, "Detecting documents forged by printing and copying", EURASIP Journal on Advances in Signal Processing, vol. 2014, no. 1, p. 140, 2014.
[12] G. Ulutas, G. Muzaffer, "A new copy move forgery detection method resistant to object removal with uniform background forgery", Math. Probl. Eng., 2016, pp. 1–19.
[13] I. D. Mienye and Y. Sun, "A Deep Learning Ensemble With Data Resampling for Credit Card Fraud Detection", IEEE Access, vol. 11, pp. 30628–30638, 2023.
[14] Balasundaram Ananthakrishnan, Ayesha Shaik, Soham Kumar, S. O. Narendran, Khushi Mattu, Muthu Subash Kavitha, "Automated Detection and Classification of Oral Squamous Cell Carcinoma Using Deep Neural Networks", Diagnostics, 2023.
[15] "SMOTE Oversampling and ExtraTreeClassifier Feature Selection", 2023 10th International Conference on Electrical Engineering, Computer Science and Informatics (EECSI), 2023.
