Volume 5, Issue 10, October – 2020 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165

Banknote Authentication Analysis Using Python K-Means Clustering
Ragavi E
Final year Engineering Student,
Dept. of Computer Science and Engineering.
Prathyusha Engineering College, Thiruvallur 602 025

Abstract:- The objective is to analyze the features V1 and V2 from bank_authentication_notes.csv, taken from the OpenML datasets, and to identify forged and real notes using K-Means clustering, forming two distinct clusters of real and forged notes. K-means is a simple unsupervised learning method for solving clustering problems. It partitions the given data into groups of clusters based on similarity. The major goal is defining k centers, one for each cluster. The ultimate aim is to use this dataset to train a machine to detect fake notes automatically. However, before implementation, it is important to assess whether this dataset can sufficiently distinguish forged banknotes from genuine ones. Hence, in this report, we perform k-means cluster analysis, an unsupervised machine learning technique, on the dataset, visualize and outline the results, and make recommendations accordingly.

Keywords:- K-Means Clustering, Unsupervised Learning, Clusters, Banknotes.

I. INTRODUCTION

Forged banknotes are no longer just a problem for merchants, but for banks as well. In recent years, all across the UK and the Eurozone, the service of direct cash deposit at a cash machine has been rolled out. It is of the utmost importance to find a solution to stop criminal action. Here, we have developed a system that identifies forged notes by using just two features and clustering them in order to predict the forgeries. The dataset is taken from https://www.openml.org/d/1462. It consists of data extracted from images with the Wavelet Transform. The images were taken from genuine and forged banknote specimens (n=1372). Two attributes of this dataset are used (V1: variance of the Wavelet Transformed image, and V2: skewness of the Wavelet Transformed image).

In mathematics, a wavelet series is a representation of a square-integrable (real- or complex-valued) function by a certain orthonormal series generated by a wavelet. This provides a formal, mathematical definition of an orthonormal wavelet and of the integral wavelet transform. Here, the variance and skewness of the wavelet-transformed image serve as the feature values.

The input csv file consists of 1372 instances, each with 4 values: V1, V2, V3, and V4. Here we are using the V1 and V2 values only. The instances are classified into two classes, 1 and 2.

II. METHODOLOGY

Table 1:- The values V1 and V2 obtained from the dataset

The algorithm used here is K-Means clustering, as it is the simplest method for forming clusters to detect forged and genuine banknotes based on the variance and skewness (i.e. V1 and V2). Here, the data is widely spread, so it is important to normalize the data using

\[ \text{data}_{\text{normalized}} = \frac{\text{data} - \min(\text{data})}{\max(\text{data}) - \min(\text{data})} \]

After normalization, the data lies within the range 0 to 1. The instances can be summarized with the data describe function, which gives the number of instances and summary statistics such as the mean, the standard deviation, and the minimum and maximum values among all the instances, which are further helpful in the normalization of the data.

Fig 1:- The properties of data in the dataset
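The summary statistics and the min-max formula described in the Methodology can be sketched as follows. This is a minimal illustration in plain Python rather than the paper's own code: the helper names `describe` and `min_max_normalize` and the sample `v1` readings are assumptions made for the example, and the values are not taken from the real dataset.

```python
from statistics import mean, stdev


def describe(values):
    """Return count, mean, sample standard deviation, min and max of a column."""
    return {
        "count": len(values),
        "mean": mean(values),
        "std": stdev(values),
        "min": min(values),
        "max": max(values),
    }


def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map it to 0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical V1 (variance) readings, ranging from negative to positive.
v1 = [-7.0, -1.5, 0.0, 3.5, 7.0]

summary = describe(v1)
v1_scaled = min_max_normalize(v1)

print(summary)
print(v1_scaled)
```

The minimum and maximum reported by the summary are exactly the two quantities the normalization formula needs, which is why the summary step comes first.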

IJISRT20OCT060 www.ijisrt.com 80
Figure 1 above gives an overall description of the data in the dataset. We see that the values are widely spread, ranging from negative to positive. So, we do not form clusters from these raw values directly; we first convert them into values within a well-defined range in order to group them.

The data is normalized using the above formula so that it lies in a specific interval, which makes it easy to plot the data in the scatter plot given below:

Fig 2:- The initial plotting of V1 and V2 before clustering.

III. MODELLING AND ANALYSIS

The model used here is K-Means clustering. The major objective of K-means is to group similar data points together and discover underlying patterns. K-means clustering starts by choosing k centres to form k clusters. A cluster is a group of data points aggregated together because of certain similarities. A centroid is an imaginary or real location at the centre of a cluster. K-means assigns every data point to its nearest centroid, forming k clusters while keeping the distances between the points and their centroids as small as possible.

The 'means' in K-means refers to averaging the data points of a cluster to find its centroid.

The k-means clustering implementation is imported from the Scikit-learn library.

The following libraries are used in our project:

• Pandas - used to read and write the csv file
• Numpy - used to perform mathematical operations
• Matplotlib - used to visualize the data in graphs
• Scikit-learn - used in k-means clustering

After normalization, the two clusters obtained are plotted as forged and real bank notes.

The graph forms two clusters with centres at positions
[[0.65504068 0.48596745]
[-0.85034594 -0.63086227]]

The two clusters formed by plotting the data before normalization and after normalization are:

Fig 3:- Cluster before normalizing data

Fig 4:- Visualization of K-Means cluster Analysis

Fig 4 shows the clusters formed after normalizing the data, where the yellow region represents forged bank notes and the green region represents real bank notes, with the cluster centroids represented by black dots.
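The assign-and-average iteration described in Section III can be sketched in plain Python. This is only an illustration of the idea, not the Scikit-learn implementation used in the paper: the function names, the six sample points, and the hand-picked starting centroids are all assumptions made for the example.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def k_means(points, initial_centroids, max_iterations=100):
    """Alternate between assigning points and averaging clusters until stable."""
    centroids = list(initial_centroids)
    k = len(centroids)
    labels = [0] * len(points)
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                mean_x = sum(m[0] for m in members) / len(members)
                mean_y = sum(m[1] for m in members) / len(members)
                new_centroids.append((mean_x, mean_y))
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # Centroids stopped moving, so the clustering is stable.
        centroids = new_centroids
    return centroids, labels


# Hypothetical normalized (V1, V2) pairs forming two loose groups.
points = [
    (0.10, 0.20), (0.20, 0.10), (0.15, 0.30),
    (0.80, 0.90), (0.90, 0.70), (0.70, 0.80),
]
# Starting centroids picked by hand, one from each group (an assumption).
centroids, labels = k_means(points, [(0.10, 0.20), (0.80, 0.90)])

print(centroids)
print(labels)
```

The result of k-means depends on where the centroids start, which is why the clustering is usually run several times and the outcomes compared.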
IV. RESULTS AND DISCUSSION

We need to quantify how good our model is before you implement it in your network. Given that we have just two outcomes, real and forged, we proceeded to cluster the data around two points, where the points that belong to a cluster share similar features. After calibrating the model, we obtained the two clusters. After running the k-means algorithm a few times, we found that the clusters were more or less stable. The clustering is not fully reliable, as some margin of error remains.

With just two parameters, our model passed only 3 forged notes as genuine out of 1,372 banknotes, an error of roughly 0.22% (3/1372 ≈ 0.0022). The next step would be to improve the dataset, especially by gathering more information on genuine bank notes. Going back to the last plot, we see that the data points of the real bank notes are less dispersed than those of the counterfeit ones.
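The error figure quoted above can be checked with a few lines of Python, together with a sketch of how a two-cluster result might be scored against known classes. The helper `clustering_error` and the small label lists are assumptions made for illustration, and the labels are coded as 0 and 1 here rather than the dataset's classes 1 and 2.

```python
def clustering_error(true_labels, cluster_ids):
    """Error rate of a two-cluster result against known two-class labels.

    Cluster numbers are arbitrary, so a direct comparison may be inverted.
    Both ways of matching clusters to classes are considered and the
    smaller mismatch count is used.
    """
    n = len(true_labels)
    mismatches = sum(1 for t, c in zip(true_labels, cluster_ids) if t != c)
    return min(mismatches, n - mismatches) / n


# Figures stated in the paper: 3 forged notes passed as genuine out of 1,372.
total_notes = 1372
misclassified = 3
reported_error = misclassified / total_notes
print(f"{reported_error:.4f} ({reported_error:.2%})")

# Hypothetical labels: the cluster ids are inverted relative to the classes,
# and one note ends up in the wrong cluster.
true_labels = [0, 0, 0, 1, 1, 1, 1, 1]
cluster_ids = [1, 1, 1, 0, 0, 0, 0, 1]
example_error = clustering_error(true_labels, cluster_ids)
print(example_error)
```

Because k-means numbers its clusters arbitrarily, the score has to account for a possible swap before mismatches are counted.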

V. CONCLUSION

The client can easily identify the real and fake banknotes from the scatter plot, which forms two clusters. The analysis was done for the given data set. The data could be processed better if more features were available for each note. Suppose that a note is labelled as fake in our plot: it may in fact be genuine and wrongly classified as fake. Similarly, a forged note may be classified as a real bank note. So, in order to give the right classification, we need a few more parameters.

For example, if 4 features were used instead of two, we could have analyzed which feature has more effect than the others by plotting them against each other, which could improve the accuracy. I would suggest that the client perform additional tests on the banknotes classified as forged and monitor the real banknotes.

