Banknote Authentication
In [1]: # Import the needed Python libraries
        import pandas as pd
        import numpy as np
        import seaborn as sns
        import matplotlib.pyplot as plt
        from sklearn.cluster import KMeans

Step 1: Load the given banknote authentication dataset.

In [2]: data = pd.read_csv("Banknote-authentication-dataset-.csv")

Step 2: Calculate statistical measures, e.g. mean and standard deviation.

In [3]: data.describe()

Out[3]:
                     V1           V2
        count  1372.000000  1372.000000
        mean      0.433735     1.922353
        std       2.842763     5.869047
        min      -7.042100   -13.773100
        25%      -1.773000    -1.708200
        50%       0.496180     2.319650
        75%       2.821475     6.814625
        max       6.824800    12.951600

In [4]: mean = np.mean(data, 0)
        print(mean)
        std_dev = np.std(data, 0)
        print(std_dev)

        V1    0.433735
        V2    1.922353
        dtype: float64
        V1    2.841726
        V2    5.866907
        dtype: float64

In [5]: # Step 3: Visualise your data as you consider fit.
        plt.plot(data['V1'], data['V2'], 'rx')
        plt.plot(mean['V1'], mean['V2'], '*')
        plt.xlabel('V1')
        plt.ylabel('V2')
        plt.show()

        [Scatter plot of V2 against V1, with the mean marked by a star.]

Step 4: Evaluate if the given dataset is suitable for the K-Means clustering task.

Visually, a few clusters can be extracted from the graph, and some points are distinctly separated from the rest of the data. The dataset is therefore suitable for clustering, but the data should be normalised first for better results.

In [6]: # Normalise the data to the range [0, 1] (min-max scaling)
        data_min = np.min(data, 0)
        data_max = np.max(data, 0)
        # print(data_min, data_max)
        normed = (data - data_min) / (data_max - data_min)
        print(normed)

In [7]: # Perform the clustering with two clusters
        v1 = normed['V1']
        v2 = normed['V2']
        km_res = KMeans(n_clusters=2)
        normed_predicted = km_res.fit_predict(normed[['V1', 'V2']])
        # normed_predicted
        normed['cluster'] = normed_predicted
        # normed.head()
        km_res.cluster_centers_

Out[7]: array([[0.67378548, 0.69821998],
               [0.36988789, 0.4479234 ]])

In [8]: # Plot each cluster in its own colour and mark the cluster centres
        df1 = normed[normed.cluster == 0]
        df2 = normed[normed.cluster == 1]
        plt.scatter(df1.V1, df1['V2'], color='red')
        plt.scatter(df2.V1, df2['V2'], color='green')
        plt.scatter(km_res.cluster_centers_[:, 0], km_res.cluster_centers_[:, 1],
                    color='black', marker='*', label='centroid', s=400)
        plt.xlabel('V1')
        plt.ylabel('V2')
        plt.show()

        [Scatter plot of the two clusters in red and green on the normalised V1 and V2 axes (0 to 1), with the cluster centroids marked by black stars.]
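The notebook fixes n_clusters=2 without further justification. One common way to sanity-check that choice is the elbow method: fit K-Means for a range of k values and look for the point where the inertia (the within-cluster sum of squares) stops dropping sharply. The snippet below is a minimal sketch of that check, not part of the original notebook; it assumes the min-max normalised DataFrame normed with columns 'V1' and 'V2' built in the cells above, and the n_init and random_state values are illustrative choices.

        from sklearn.cluster import KMeans
        import matplotlib.pyplot as plt

        inertias = []
        k_values = range(1, 9)
        for k in k_values:
            # n_init and random_state are illustrative, not from the original notebook
            km = KMeans(n_clusters=k, n_init=10, random_state=0)
            km.fit(normed[['V1', 'V2']])
            inertias.append(km.inertia_)  # within-cluster sum of squared distances

        plt.plot(list(k_values), inertias, 'bo-')
        plt.xlabel('number of clusters k')
        plt.ylabel('inertia')
        plt.show()

A clear bend at k = 2 would support the two-cluster model used above; a smoother curve would suggest trying other values of k.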

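Another quick, quantitative check on the clustering quality is the silhouette coefficient, which ranges from -1 to 1 and is higher when points sit close to their own cluster and far from the other cluster. The following is a minimal sketch, again assuming normed and the fitted labels normed_predicted from the cells above.

        from sklearn.metrics import silhouette_score

        # Average silhouette over all points for the two-cluster solution
        score = silhouette_score(normed[['V1', 'V2']], normed_predicted)
        print(f"silhouette score for k=2: {score:.3f}")

Values well above 0 indicate reasonably separated clusters; values near 0 suggest the two clusters overlap heavily.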