Banknote Authentication Analysis Using Python K-Means Clustering
ISSN No:-2456-2165
Abstract:- The objective is to analyze the data sets V1 and V2 from bank_authentication_notes.csv, which is taken from the OpenML datasets, and to identify forged and real notes using the K-Means clustering concept, forming two distinct clusters of real and forged notes. K-means is an easy and simple unsupervised learning algorithm for solving clustering problems. It partitions the given data into a group of clusters based on their similarity. The major goal is to define k centers, one for each cluster. The ultimate aim is to use this dataset to train a machine to detect fake notes automatically. However, before implementation, it is important to assess whether this dataset can sufficiently distinguish forged banknotes from genuine ones. Hence, in this report, k-means cluster analysis, an unsupervised machine learning method, is performed on the dataset; we visualize and outline the results and make recommendations accordingly.

Keywords:- K-Means Clustering, Unsupervised Learning, Clusters, Banknotes.

I. INTRODUCTION

Forged banknotes are no longer just a problem for merchants, but for banks as well. In recent years, the service of direct cash deposit at a cash machine has been rolled out all across the UK and the Eurozone, so it is of the utmost importance to find a way to stop this criminal activity. Here, we have developed a system to identify forged notes by using just two features and clustering them in order to predict forgeries. The dataset is taken from https://www.openml.org/d/1462. It consists of data extracted with the wavelet transform from images taken of genuine and forged banknote specimens (n = 1372). Two of its attributes are used here: V1, the variance of the wavelet-transformed image, and V2, the skewness of the wavelet-transformed image.

In mathematics, a wavelet series is a representation of a square-integrable (real- or complex-valued) function by a certain orthonormal series generated by a wavelet. This provides a formal, mathematical definition of an orthonormal wavelet and of the integral wavelet transform. The variance and skewness values in the dataset are computed from the wavelet-transformed images.

Here, the input csv file consists of 1372 instances, each described by four values: V1, V2, V3 and V4. We use only the V1 and V2 values. The instances are classified into two classes, 1 and 2.

II. METHODOLOGY

Table -1: The values V1 and V2 obtained from the dataset

The algorithm used here is K-Means clustering, as it is one of the simplest methods for forming clusters and can easily separate forged and clean banknotes based on the variance and skewness (i.e. V1 and V2). Here, the data is widely diversified, so it is important to normalize it using min-max normalization, applied to each attribute separately:

$$x_{\text{norm}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

where $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of that attribute. After normalization, every value lies in the range 0 to 1. The instances can be summarized using the describe function: it gives the number of instances and statistics such as the mean, standard deviation, minimum value and maximum value among all the instances. These statistics are further helpful in normalizing the data, because the minimum and maximum are exactly the quantities the formula needs.

Fig 1:- The properties of data in the dataset
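The normalization formula and the describe step above can be sketched in pandas as follows. This is a minimal illustration, not the authors' code: the function name `min_max_normalize` is hypothetical, the column names V1 and V2 follow the text, and the guard for constant columns is an added assumption.

```python
import pandas as pd


def min_max_normalize(data: pd.DataFrame, columns=("V1", "V2")) -> pd.DataFrame:
    """Scale each selected column to [0, 1] with (x - min) / (max - min).

    Hypothetical helper; the default column names V1/V2 follow the paper.
    """
    subset = data[list(columns)].astype(float)
    col_min = subset.min()  # per-column minimum (a Series indexed by column)
    col_max = subset.max()  # per-column maximum
    span = col_max - col_min
    # Assumption: a constant column (max == min) is mapped to 0 instead of
    # dividing by zero.
    span = span.where(span != 0, 1.0)
    # DataFrame minus Series aligns on column names, so each column uses its own min/max.
    return (subset - col_min) / span
```

Assuming the CSV has columns named V1 and V2, one might call `df = pd.read_csv("bank_authentication_notes.csv")`, then `df[["V1", "V2"]].describe()` for the summary statistics and `min_max_normalize(df)` for the scaled values; calling `describe()` on the result should then report a minimum of 0 and a maximum of 1 for each column.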
IJISRT20OCT060 www.ijisrt.com 80
Volume 5, Issue 10, October – 2020 International Journal of Innovative Science and Research Technology
The above Figure -1 depicts the overall description of the data in the given dataset. We see that the values are extremely diversified, ranging from negative to positive, so we cannot form clusters from these values directly. We must convert them into values within a well-defined range in order to group them. The data is normalized using the above formula, which maps every value into a specific interval and makes it easy to plot the data in a scatter plot.

After normalization, we plot the two clusters obtained as forged and real banknotes. The graph forms two clusters with centroids at the positions (0.65504068, 0.48596745) and (-0.85034594, -0.63086227). Scatter plots of the two clusters were produced both before and after normalization of the data.
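A sketch of the two-cluster fit with scikit-learn's `KMeans`, under stated assumptions: the input is an (n, 2) NumPy array of normalized V1 and V2 values, and `n_init=10` and `random_state=0` are illustrative choices rather than settings reported in the paper. The helper name `fit_two_clusters` is hypothetical. The returned `cluster_centers_` array has the same layout as the centroid positions quoted in the text, one row per cluster.

```python
import numpy as np
from sklearn.cluster import KMeans


def fit_two_clusters(points: np.ndarray, seed: int = 0):
    """Cluster an (n, 2) array of normalized (V1, V2) values into two groups.

    Hypothetical helper: n_init and random_state are illustrative choices.
    Returns the cluster label of each row and the (2, 2) array of centroids.
    """
    model = KMeans(n_clusters=2, n_init=10, random_state=seed)
    labels = model.fit_predict(points)  # labels are 0 or 1; numbering is arbitrary
    return labels, model.cluster_centers_
```

Because k-means does not know which cluster is forged, the labels 0 and 1 are arbitrary; deciding which cluster corresponds to forged notes requires inspecting the clusters or comparing them with the class column.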
In the clusters formed after normalizing the data, the yellow region represents forged banknotes and the green region represents real banknotes; the centroids of the two clusters are represented by black dots.

The following libraries are used in our project:
Pandas - used to read and write the csv file
NumPy - used to perform mathematical operations
Matplotlib - used to visualize the data in graphs
Scikit-learn - used for k-means clustering
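The cluster plot described in this section (notes coloured by cluster, centroids as black dots) can be sketched with Matplotlib. The colour map `ListedColormap(["green", "yellow"])` is an assumption chosen to mirror the green/yellow description; which colour ends up on the forged notes depends on the arbitrary k-means label numbering. The function name `plot_clusters` is hypothetical, and the `Agg` backend is set only so the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import ListedColormap


def plot_clusters(points, labels, centers):
    """Scatter-plot notes coloured by cluster, with centroids as black dots.

    Hypothetical helper; the green/yellow colours mirror the text, but which
    cluster is "forged" depends on the arbitrary k-means label numbering.
    """
    points = np.asarray(points)
    centers = np.asarray(centers)
    fig, ax = plt.subplots(figsize=(6, 5))
    # Label 0 is drawn green and label 1 yellow.
    ax.scatter(points[:, 0], points[:, 1], c=labels,
               cmap=ListedColormap(["green", "yellow"]), s=10)
    # Centroids drawn on top as large black dots.
    ax.scatter(centers[:, 0], centers[:, 1], c="black", s=120)
    ax.set_xlabel("V1 (variance, normalized)")
    ax.set_ylabel("V2 (skewness, normalized)")
    ax.set_title("K-means clusters of banknotes")
    return fig, ax
```

With an interactive backend, `plt.show()` displays the figure; otherwise `fig.savefig("clusters.png")` writes it to a file.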
IV. RESULTS AND DISCUSSION

We need to quantify how good our model is before it can be implemented in a bank's network. Given that there are just two outcomes, real and forged, we clustered the data around two points, where the points that belong to a cluster share similar features. After fitting the model, we obtained the two clusters. After running the k-means algorithm a few times, we found that the clusters were more or less stable. However, the clustering is not fully reliable, as it may have some margin of error.
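One way to check the statement that repeated k-means runs give "more or less stable" clusters is to rerun the algorithm with different random seeds and compare the labelings with the adjusted Rand index (ARI), which is 1.0 for identical partitions regardless of how the labels are numbered. This is a sketch, not the procedure used in the paper: the seeds, `n_init=1` (so each seed is a genuinely separate initialization) and the name `clustering_stability` are assumptions.

```python
from itertools import combinations

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score


def clustering_stability(points, seeds=(0, 1, 2, 3, 4)):
    """Run k-means (k=2) once per seed and compare every pair of runs.

    Hypothetical helper. ARI ignores label numbering, so two runs that only
    swap labels 0 and 1 still score 1.0. Returns (worst score, all scores).
    """
    points = np.asarray(points)
    runs = [
        KMeans(n_clusters=2, n_init=1, random_state=s).fit_predict(points)
        for s in seeds
    ]
    scores = [adjusted_rand_score(a, b) for a, b in combinations(runs, 2)]
    return min(scores), scores
```

A worst-case score close to 1.0 on the normalized V1 and V2 values would support the stability claim; noticeably lower scores would indicate that the result depends on the initialization.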
V. CONCLUSION
The client can easily identify real and fake banknotes from the scatter graph, in which the notes form two clusters. The analysis was done for the given dataset. The data could be processed better if more features were available for each item. Suppose that a note is termed fake according to our plot; that note may still be genuine and simply be wrongly classified as fake. Similarly, a forged note may be classified as a real banknote. So, in order to give the right classification, we need a few more parameters.
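The two kinds of error described above (a note of one class placed in the cluster of the other class) can be counted if the class column is used only for evaluation. The sketch below, a hypothetical helper `misclassification_report`, maps each cluster to the majority class of its members and counts, per true class, the notes that land in a cluster mapped to a different class. It assumes the class labels are the values 1 and 2 mentioned in the introduction and does not assume which of them is genuine.

```python
import numpy as np


def misclassification_report(cluster_labels, true_classes):
    """Map each cluster to its majority class and count errors per true class.

    Hypothetical helper; true_classes is assumed to hold the dataset's class
    values (1 and 2). Returns (predicted class per note, {class: error count}).
    """
    clusters = np.asarray(cluster_labels)
    truth = np.asarray(true_classes)
    predicted = np.empty_like(truth)
    for c in np.unique(clusters):
        in_cluster = clusters == c
        values, counts = np.unique(truth[in_cluster], return_counts=True)
        # Every note in this cluster is predicted as the cluster's majority class.
        predicted[in_cluster] = values[np.argmax(counts)]
    errors = {
        cls.item(): int(np.sum((truth == cls) & (predicted != cls)))
        for cls in np.unique(truth)
    }
    return predicted, errors
```

For example, if the class column were named `Class` (an assumption about the CSV), `misclassification_report(labels, df["Class"])` would report how many notes of each class fall on the wrong side of the two clusters; rerunning it with more features is one way to test whether extra parameters reduce those counts.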