0% found this document useful (0 votes)

46 views32 pages

DWDM Lab Manual

Uploaded by

bharathkatamneni

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

46 views32 pages

DWDM Lab Manual

Uploaded by

bharathkatamneni

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

You are on page 1/ 32

Exercise 1: INTRODUCTION

1. Introduction to Python libraries for Data Mining: NumPy, Pandas, Matplotlib etc.
Write a Python program to do the following operations: Library: NumPy
a) Create multi-dimensional arrays and find its shape and dimension
b) Create a matrix full of zeros and ones
c) Reshape and flatten data in the array
d) Append data vertically and horizontally
e) Apply indexing and slicing on array
f) Use statistical functions on array - Min, Max, Mean, Median and Standard Deviation
PROCEDURE:
1. Create: Open a new file in Python shell, write a program and save the program with .py
extension.
2. Execute: Go to Run -> Run module (F5)
a) Create multi-dimensional arrays and find its shape and dimension
import numpy as np
#creation of multi-dimensional array
a=np.array([[1,2,3],[2,3,4],[3,4,5]])
#shape
b=a.shape
print("shape:")
print(a.shape)
#dimension
c=a.ndim
print("dimensions:")
print(a.ndim)
b) Create a matrix full of zeros and ones
import numpy as np
#matrix full of zeros
z=np. zeros ((2,2))

DEPARTMENT OF AI Page 1
print("zeros:")
print(z)
#matrix full of ones
o=np. ones ((2,2))
print("ones:")
print(o)

c) Reshape and flatten data in the

import numpy as np
a=np.array([[1,2,3,4],[2,3,4,5],[3,4,5,6],[4,5,6,7]])
b=a.reshape(4,2,2)
print("reshape:")
print(b)
#matrix flatten
c=a.flatten()
print("flatten:")
print(c)
d) Append data vertically and horizontally
import numpy as np
#Appending data vertically
x=np.array([[10,20],[80,90]])
y=np.array([[30,40],[60,70]])
v=np.vstack((x,y))
print("vertically:")
print(v)
#Appending data horizontally
h=np.hstack((x,y))
print("horizontally:")
print(h)
DEPARTMENT OF AI Page 2
e) Apply indexing and slicing on array #indexing
import numpy as np
a=np.array([[1,2,3,4],[2,3,4,5],[3,4,5,6],[4,5,6,7]])
temp = a[[0, 1, 2, 3],[1, 1, 1, 1]]
print('indexing:')
print(temp)
#slicing
i=a[:4,::2]
print('slicing:')
print(i)

f) Use statistical functions on array - Min, Max, Mean, Median and Standard Deviation

import numpy as np
#min for finding minimum of an array
a=np.array([[1,3,-1,4],[3,-2,1,4]])
b=a.min()
print("minimum:",b)
#max for finding maximum of an array
c=a.max()
print("maximum:",c)
#mean
a=np.array([1,2,3,4,5])
d=a.mean()
print("mean:",d)
#median
e=np.median(a)
print("median:",e)
#standard deviation
f=a.std()
print("standard deviation:",f)

DEPARTMENT OF AI Page 3
Exercise 2: UNDERSTANDING DATA
Write Python programs to do the following operations:
1. Loading data from CSV file
2. Compute the basic statistics of given data - shape, no. of columns, mean
3. Splitting a data frame on values of categorical variables
4. Visualize data using Scatter plot
Dataset: brain_size.csv
Library: Pandas, matplotlib
a) Loading data from CSV file
b) Compute the basic statistics of given data - shape, no. of columns, mean
c) Splitting a data frame on values of categorical variables
d) Visualize data using Scatter plot
1. Loading data from CSV file
import pandas as pd
a=pd.read_csv("D:/data.csv")
print(a)
2. Compute the basic statistics of given data - shape, no. of columns, mean
import pandas as pd
a=pd.read_csv("D:/data.csv")
print('shape :',a.shape)
#no of columns
cols=len(a.axes[1])
print('no of columns:',cols)
#mean of data

m=a["marks"].mean()

DEPARTMENT OF AI Page 4
print('mean of marks:',m)

3. Splitting a data frame on values of categorical variables

import pandas as pd

a=pd.read_csv("D:\data.csv")

print("Before:")

print(a)

a_split=a['address'].str.split(' ',1)

a['district']=a_split.str.get(0)

a['state']=a_split.str.get(1)

del(a['address'])

print("After:")

print(a)

4. Visualize data using Scatter plot

import pandas as pd

import matplotlib.pyplot as plt

a=pd.read_csv("D:\data.csv")

print("Before",a)

a_split=a['address'].str.split(',',1)

a['district']=a_split.str.get(0)

a['state']=a_split.str.get(1)

del(a['address'])

print("After=",a)

a.plot(kind='scatter',x='marks',y='rollno',c='red')
DEPARTMENT OF AI Page 5
plt.show()

Exercise 3: CORRELATION MATRIX

Write a python programs to load the dataset and understand the input data
1. Load data, describe the given data and identify missing, outlier data items
2. Find correlation among all attributes
3. Visualize correlation matrix
1. Load data
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.read_csv("D:/diabetes.csv")
print(df. describe())
print(df.head(10))
print(df.isnull())
2. Find correlation among all attributes
import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats
# Making data frame from the csv file
df = pd.read_csv("nba.csv")
# Printing the first 10 rows of the data frame for visualization
print(df[:10])

DEPARTMENT OF AI Page 6
# To find the correlation among columns # using pearson method
print(df.corr(method ='pearson'))
# using „kendall‟ method.
print(df.corr(method ='kendall'))

3. Visualize correlation matrix

import pandas as pd
df = pd.read_csv("D:/diabetes.csv")
print(df. describe())
print(df.head(10))

DEPARTMENT OF AI Page 7
Exercise 4: DATA PREPROCESSING – HANDLING MISSING VALUES

Write a python program to impute missing values with various techniques on given dataset.
1. Remove rows/ attributes
2. Replace with mean or mode
3. Write a python program to perform transformation of data using Discretization (Binning)
and normalization (MinMaxScaler or MaxAbsScaler) on given dataset.
1. Remove rows/ attributes
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.read_csv("D:/diabetes.csv")
# filling missing value using fillna()
print(df.fillna(0))
# filling a missing value with previous value
print(df.fillna(method ='pad'))
#Filling null value with the next ones
print(df.fillna(method ='bfill'))
# filling a null values using fillna()

DEPARTMENT OF AI Page 8
print(df["gender"].fillna("No Gender", inplace = True))
# will replace Nan value in dataframe with value -99
print(df.replace(to_replace = np.nan, value = -99))
# using dropna() function to remove rows having one Nan
print(df.dropna())
# using dropna() function to remove rows with all Nan
print(df.dropna(how = 'all'))
# using dropna() function to remove column having one Nan
print(df.dropna(axis = 1))

2. Replace with mean or mode

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.read_csv("D:/diabetes.csv")
# filling missing value using fillna()
print(df.fillna(0))
# filling a missing value with previous value
print(df.fillna(method ='pad'))
#Filling null value with the next ones
print(df.fillna(method ='bfill'))
# filling a null values using fillna()
print(df["gender"].fillna("No Gender", inplace = True))
# will replace Nan value in dataframe with value -99
print(df.replace(to_replace = np.nan, value = -99))
# using dropna() function to remove rows having one Nan
print(df.dropna())
DEPARTMENT OF AI Page 9
# using dropna() function to remove rows with all Nan
print(df.dropna(how = 'all'))
# using dropna() function to remove column having one Nan
print(df.dropna(axis = 1))
print(df["Age"].fillna(df["Age"].mean()))

3. Perform transformation of data using Discretization (Binning)

Binning can also be used as a discretization technique. Discretization refers to the
process of converting or partitioning continuous attributes, features or variables to
discretized or nominal attributes/ features/ variables/ intervals.
For example, attribute values can be discretized by applying equal-width or equal-
frequency binning, and then replacing each bin value by the bin mean or median, as in
smoothing by bin means or smoothing by bin medians, respectively. Then the continuous
values can be converted to a nominal or discretized value which is same as the value of
their corresponding bin.
There are basically two types of binning approaches –
Equal width (or distance) binning: The simplest binning approach is to partition the
range of the variable into k equal-width intervals. The interval width is simply the range
[A, B] of the variable divided by k, w = (B-A) / k
Thus, ith interval range will be [A + (i-1)w, A + iw] where i = 1, 2, 3…..k
Skewed data cannot be handled well by this method.
Equal depth (or frequency) binning : In equal-frequency binning we divide the range
[A, B] of the variable into intervals that contain (approximately) equal number of points;
equal frequency may not be possible due to repeated values.
DEPARTMENT OF AI Page 10
There are three approaches to perform smoothing –
Smoothing by bin means : In smoothing by bin means, each value in a bin is replaced
by the mean value of the bin.
Smoothing by bin median : In this method each bin value is replaced by its bin
median value.
Smoothing by bin boundary : In smoothing by bin boundaries, the minimum and
maximum values in a given bin are identified as the bin boundaries. Each bin value is then
replaced by the closest boundary value.
Example:
Sorted data for price(in dollar) : 2, 6, 7, 9, 13, 20, 21, 25, 30

Smoothing by bin means:

import numpy as np

from sklearn.linear_model import LinearRegression

from sklearn import linear_model

# import statsmodels.api as sm

DEPARTMENT OF AI Page 11
import statistics

import math

from collections import OrderedDict

x =[]

print("enter the data:")

x = list(map(float, input().split()))

print("enter the number of bins:")

bi = int(input())

# X_dict will store the data in sorted order

X_dict = OrderedDict()

# x_old will store the original data

x_old ={}

# x_new will store the data after binning

x_new ={}

for i in range(len(x)):

X_dict[i]= x[i]

x_old[i]= x[i]

x_dict = sorted(X_dict.items(), key = lambda x: x[1])

# list of lists(bins)

binn =[]

# a variable to find the mean of each bin

DEPARTMENT OF AI Page 12
avrg = 0

i=0

k=0

num_of_data_in_each_bin = int(math.ceil(len(x)/bi))

# performing binning

for g, h in X_dict.items():

if(i<num_of_data_in_each_bin):

avrg = avrg + h

i=i+1

elif(i == num_of_data_in_each_bin):

k=k+1

i=0

binn.append(round(avrg / num_of_data_in_each_bin, 3))

avrg = 0

avrg = avrg + h

i=i+1

rem = len(x)% bi

if(rem == 0):

binn.append(round(avrg / num_of_data_in_each_bin, 3))

else:

binn.append(round(avrg / rem, 3))

# store the new value of each data

DEPARTMENT OF AI Page 13
i=0

j=0

for g, h in X_dict.items():

if(i<num_of_data_in_each_bin):

x_new[g]= binn[j]

i=i+1

else:

i=0

j=j+1

x_new[g]= binn[j]

i=i+1

print("number of data in each bin")

print(math.ceil(len(x)/bi))

for i in range(0, len(x)):

print('index {2} old value {0} new value {1}'.format(x_old[i], x_new[i], i))

Smoothing by bin medians:

import numpy as np

from sklearn.linear_model import LinearRegression

from sklearn import linear_model

# import statsmodels.api as sm

import statistics

import math

DEPARTMENT OF AI Page 14
from collections import OrderedDict

x =[]

print("enter the data")

x = list(map(float, input().split()))

print("enter the number of bins")

bi = int(input())

# X_dict will store the data in sorted order

X_dict = OrderedDict()

# x_old will store the original data

x_old ={}

# x_new will store the data after binning

x_new ={}

for i in range(len(x)):

X_dict[i]= x[i]

x_old[i]= x[i]

x_dict = sorted(X_dict.items(), key = lambda x: x[1])

# list of lists(bins)

binn =[]

# a variable to find the mean of each bin

avrg =[]

i=0

k=0

DEPARTMENT OF AI Page 15
num_of_data_in_each_bin = int(math.ceil(len(x)/bi))

# performing binning

for g, h in X_dict.items():

if(i<num_of_data_in_each_bin):

avrg.append(h)

i=i+1

elif(i == num_of_data_in_each_bin):

k=k+1

i=0

binn.append(statistics.median(avrg))

avrg =[]

avrg.append(h)

i=i+1

binn.append(statistics.median(avrg))

# store the new value of each data

i=0

j=0

for g, h in X_dict.items():

if(i<num_of_data_in_each_bin):

x_new[g]= round(binn[j], 3)

i=i+1

else:

DEPARTMENT OF AI Page 16
i=0

j=j+1

x_new[g]= round(binn[j], 3)

i=i+1

print("number of data in each bin")

print(math.ceil(len(x)/bi))

for i in range(0, len(x)):

print('index {2} old value {0} new value {1}'.format(x_old[i], x_new[i], i))

Smoothing by bin boundaries:

import numpy as np

from sklearn.linear_model import LinearRegression

from sklearn import linear_model

# import statsmodels.api as sm

import statistics

import math

from collections import OrderedDict

x =[]

print("enter the data")

x = list(map(float, input().split()))

print("enter the number of bins")

bi = int(input())

# X_dict will store the data in sorted order

DEPARTMENT OF AI Page 17
X_dict = OrderedDict()

# x_old will store the original data

x_old ={}

# x_new will store the data after binning

x_new ={}

for i in range(len(x)):

X_dict[i]= x[i]

x_old[i]= x[i]

x_dict = sorted(X_dict.items(), key = lambda x: x[1])

# list of lists(bins)

binn =[]

# a variable to find the mean of each bin

avrg =[]

i=0

k=0

num_of_data_in_each_bin = int(math.ceil(len(x)/bi))

for g, h in X_dict.items():

if(i<num_of_data_in_each_bin):

avrg.append(h)

i=i+1

elif(i == num_of_data_in_each_bin):

k=k+1

i=0

DEPARTMENT OF AI Page 18
binn.append([min(avrg), max(avrg)])

avrg =[]

avrg.append(h)

i=i+1

binn.append([min(avrg), max(avrg)])

i=0

j=0

for g, h in X_dict.items():

if(i<num_of_data_in_each_bin):

if(abs(h-binn[j][0]) >= abs(h-binn[j][1])):

x_new[g]= binn[j][1]

i=i+1

else:

x_new[g]= binn[j][0]

i=i+1

else:

i=0

j=j+1

if(abs(h-binn[j][0]) >= abs(h-binn[j][1])):

x_new[g]= binn[j][1]

else:

x_new[g]= binn[j][0]

i=i+1

DEPARTMENT OF AI Page 19
print("number of data in each bin")

print(math.ceil(len(x)/bi))

for i in range(0, len(x)):

print('index {2} old value {0} new value {1}'.format(x_old[i], x_new[i], i))

3. Perform transformation of data using normalization (MinMaxScaler or MaxAbsScaler) on given

dataset. In preprocessing, standardization of data is one of the transformation task. Standardization is
scaling features to lie between a given minimum and maximum value, often between zero and one, or
so that the maximum absolute value of each feature is scaled to unit size. This can be achieved using
MinMaxScaler or MaxAbsScaler, respectively.

The motivation to use this scaling include robustness to very small standard deviations of features and
preserving zero entries in sparse data.

# example of a normalization
from numpy import asarray
from sklearn.preprocessing import MinMaxScaler
# define data
data = asarray([[100, 0.001],[8, 0.05],[50, 0.005],[88, 0.07],[4, 0.1]])
print(data)

# define min max scaler

scaler = MinMaxScaler()

# transform data
scaled = scaler.fit_transform(data)
print(scaled)

DEPARTMENT OF AI Page 20
Exercise 5: ASSOCIATION RULE MINING- APRIORI

Write a python program to find rules that describe associations by using Apriori algorithm

Steps in Apriori:

1. Set a minimum value for support and confidence. This means that we are only interested in finding rules for
the items that have certain default existence (e.g. support) and have a minimum value for co-occurrence with
other items (e.g. confidence).

DEPARTMENT OF AI Page 21
2. Extract all the subsets having higher value of support than minimum threshold.

3. Select all the rules from the subsets with confidence value higher than minimum threshold.

4. Order the rules by descending order of Lift.

Example:

from apyori import apriori

transactions = [ ['beer', 'nuts'], ['beer', 'cheese'], ]

#CASE1:

results = list(apriori(transactions))

association_results = list(results)

print(results[0])

#CASE2:

min support=.5,minconfidence=.8

results = list(apriori(transactions,min_support=0.5, min_confidence=0.8))

association_results = list(results)

print(len(results))

print(association_results)

OUTPUT: 5

RelationRecord(items=frozenset({'beer'}), support=1.0,
ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'beer'}),
confidence=1.0, lift=1.0)])

Case 2:

[RelationRecord(items=frozenset({'beer'}), support=1.0,
ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'beer'}),
confidence=1.0, lift=1.0)]),
DEPARTMENT OF AI Page 22
RelationRecord(items=frozenset({'cheese', 'beer'}), support=0.5,
ordered_statistics=[OrderedStatistic(items_base=frozenset({'cheese'}), items_add=frozenset({'beer'}),
confidence=1.0, lift=1.0)]),

RelationRecord(items=frozenset({'nuts', 'beer'}), support=0.5,

ordered_statistics=[OrderedStatistic(items_base=frozenset({'nuts'}), items_add=frozenset({'beer'}),
confidence=1.0, lift=1.0)])]

Three major measures to validate Association Rules:

• Support

• Confidence

• Lift

Suppose a record of 1 thousand customer transactions. Consider two items e.g. burgers and ketchup.
Out of one thousand transactions, 100 contain ketchup while 150 contain a burger. Out of 150
transactions where a burger is purchased, 50 transactions contain ketchup as well. Using this data,
Find the support, confidence, and lift.

Support:

Support(B) = (Transactions containing (B))/(Total Transactions)

For instance if out of 1000 transactions, 100 transactions contain Ketchup then the support for item
Ketchup can be calculated as:

Support(Ketchup) = (Transactions containingKetchup)/(Total Transactions)

Support(Ketchup) = 100/1000 = 10%

Confidence:

Confidence refers to the likelihood that an item B is also bought if item A is bought. It can be
calculated by finding the number of transactions where A and B are bought together, divided by total
number of transactions where A is bought.

Confidence(A→B) = (Transactions containing both (A and B))/(Transactions containing A)

DEPARTMENT OF AI Page 23
A total of 50 transactions where Burger and Ketchup were bought together. While in 150 transactions,
burgers are bought. Then we can find likelihood of buying ketchup when a burger is bought can be
represented as confidence of Burger -> Ketchup and can be mathematically written as:

Confidence (Burger→Ketchup) = (Transactions containing both (Burger and Ketchup))/(Transactions

containing A)

Confidence(Burger→Ketchup) = 50/150 = 33.3%

Lift :

Lift (A -> B) refers to the increase in the ratio of sale of B when A is sold. Lift(A –> B) can be
calculated by dividing Confidence(A -> B) divided by Support(B). Mathematically it can be
represented as:

Lift (A→B) = (Confidence (A→B))/(Support (B))

In Burger and Ketchup problem, the Lift (Burger -> Ketchup) can be calculated as:

Lift (Burger → Ketchup) = (Confidence (Burger → Ketchup))/(Support (Ketchup))

Lift(Burger → Ketchup) = 33.3/10 = 3.33

a) Display top 5 rows of data

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from apyori import apriori
store_data = pd.read_csv("D:/datasets/store_data.csv")
print(store_data.head())
print('Structure of store data\n',str(store_data))
b) Find the rules with min_confidence : .2, min_support= 0.0045, min_lift=3, min_length=2
Let's suppose that we want rules for only those items that are purchased at least 5 times a day,
or 7 x 5 = 35 times in one week, since our dataset is for a one-week time period.
The support for those items can be calculated as 35/7500 = 0.0045.
The minimum confidence for the rules is 20% or 0.2.
DEPARTMENT OF AI Page 24
Similarly, the value for lift as 3 and finally min_length is 2 since at least two products should
exist in every rule.
#Converting data frame to list
records = []
for i in range(0, 7500):
records.append([str(store_data.values[i,j]) for j in range(0, 20)])
#Generating association rules using apriori()
#association_rules = apriori(records, min_support=0.0045, min_confidence=0.2, min_lift=3,
min_length=2)
association_rules = apriori(records, min_support=0.0045, min_confidence=0.2, min_lift=3,
min_length=5)
association_results = list(association_rules)
print(len(association_results))
print(association_results[0])
for item in association_rules:
# first index of the inner list
# Contains base item and add item
pair = item[0]
items = [x for x in pair]
print("Rule: " + items[0] + " -> " + items[1])
#second index of the inner list
print("Support: " + str(item[1]))
#third index of the list located at 0th
#of the third index of the inner list
print("Confidence: " + str(item[2][0][2]))
print("Lift: " + str(item[2][0][3])) print("=====================================")

DEPARTMENT OF AI Page 25
\

Exercise 6: CLASSIFICATION - DECISION TREES

Write a python program

1. To build a decision tree classifier to determine the kind of flower by using given dimensions.

DEPARTMENT OF AI Page 26
2. Training with various split measures (Gini index, Entropy and Information Gain)
3. Compare the accuracy

A decision tree is a machine learning algorithm that uses a tree-like model of decisions and their
subsequent consequences to arrive at a particular decision. It is a Supervised Machine Learning model,
where the data is continuously split according to a certain parameter, and finally, a decision is made.

Usually, a decision tree is drawn upside down, with the root node at the top and the leaf nodes at the
bottom. A decision tree usually contains 3 types of nodes.

1. Root node: The very top node that represents the entire population or sample.
2. Decision nodes: Sub-nodes that split from the root node.
3. Leaf nodes: Nodes with no children, also known as terminal nodes.

DEPARTMENT OF AI Page 27
Build a Decision Tree using IRIS dataset in Python:
import pandas as pd
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score

# Reading the Iris.csv file

data = load_iris()

# Extracting Attributes / Features

X = data.data

# Extracting Target / Class Labels

y = data.target

# Import Library for splitting data

from sklearn.model_selection import train_test_split

# Creating Train and Test datasets

X_train, X_test, y_train, y_test = train_test_split(X,y, random_state = 50, test_size = 0.25)

# Creating Decision Tree Classifier

from sklearn.tree import DecisionTreeClassifier
clf = DecisionTreeClassifier()
clf.fit(X_train,y_train)

# Predict Accuracy Score

y_pred = clf.predict(X_test)
print("Train data accuracy:",accuracy_score(y_true = y_train, y_pred=clf.predict(X_train)))
print("Test data accuracy:",accuracy_score(y_true = y_test, y_pred=y_pred))

DEPARTMENT OF AI Page 28
Decision Tree

DEPARTMENT OF AI Page 29
Exercise 7: CLASSIFICATION –BAYESIAN NETWORK

1. Build Bayesian network model using existing default data

2. Visualize Tree Augmented Naïve Bayes model

o Naïve Bayes Classifier is one of the simple and most effective Classification algorithms which
helps in building the fast machine learning models that can make quick predictions.
o It is a probabilistic classifier, which means it predicts on the basis of the probability of an
object.

Where,

P(A|B) is Posterior probability: Probability of hypothesis A on the observed event B.

P(B|A) is Likelihood probability: Probability of the evidence given that the probability of a
hypothesis is true.

P(A) is Prior Probability: Probability of hypothesis before observing the evidence.

P(B) is Marginal Probability: Probability of Evidence.

PROGRAM:-

from sklearn import metrics

from sklearn.naive_bayes import GaussianNB
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

DEPARTMENT OF AI Page 30
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
dataset=pd.read_csv("Iris.csv")
X=dataset.iloc[:,:4].values
Y=dataset['Species'].values
X_train,X_test,Y_train,Y_test=train_test_split(X,Y,test_size=0.3)
classifier=GaussianNB()
classifier.fit(X_train,Y_train)
print(X_test[0])
y_pred=classifier.predict(X_test)
print(y_pred)
accuracy=accuracy_score(Y_test,y_pred)
print("Accuracy:",accuracy)

DEPARTMENT OF AI Page 31
Exercise 8: CLUSTERING – K-MEANS

Write a python program

1. To perform Preprocessing

2. To perform clustering using k-means algorithm to cluster the records

Program:

import matplotlib.pyplot as plt

import numpy as np

from sklearn.cluster import KMeans

X=np.array([[1,1],[1.5,2],[3,4],[5,7],[3.5,5],[4.5,5],[3.5,4.5]])

print(X)

plt.scatter(X[:,0],X[:,1])

kmeans=KMeans(n_clusters=2)

kmeans.fit(X)

DEPARTMENT OF AI Page 32

Data Analytics Using Python Lab Manual
50% (2)
Data Analytics Using Python Lab Manual
8 pages
Python Cheat Sheet: Pandas - Numpy - Sklearn Matplotlib - Seaborn BS4 - Selenium - Scrapy
100% (3)
Python Cheat Sheet: Pandas - Numpy - Sklearn Matplotlib - Seaborn BS4 - Selenium - Scrapy
9 pages
Apriori Algorithm
No ratings yet
Apriori Algorithm
3 pages
Unit 4 - Data Mining - WWW - Rgpvnotes.in
No ratings yet
Unit 4 - Data Mining - WWW - Rgpvnotes.in
12 pages
Long Quiz 3
100% (1)
Long Quiz 3
18 pages
Vi Sem Bca Qbank - Wcms - Fds
50% (2)
Vi Sem Bca Qbank - Wcms - Fds
11 pages
Dataware Q&a Bank
100% (1)
Dataware Q&a Bank
42 pages
1813 Sanjeev MarketBasketAnalysis
0% (1)
1813 Sanjeev MarketBasketAnalysis
45 pages
Apriori
No ratings yet
Apriori
34 pages
Apriori Algorithm
No ratings yet
Apriori Algorithm
13 pages
Discover Frequent Items in Small Stationary
No ratings yet
Discover Frequent Items in Small Stationary
16 pages
Write An ALP For All Arithematic Operations and Write ALP For Product of Two Numbers Withoutusing MUL Operation
No ratings yet
Write An ALP For All Arithematic Operations and Write ALP For Product of Two Numbers Withoutusing MUL Operation
3 pages
Mining Association Rule With WEKA Explorer: Lab Exercise Two
No ratings yet
Mining Association Rule With WEKA Explorer: Lab Exercise Two
4 pages
Data Science Python Cheat Sheet
No ratings yet
Data Science Python Cheat Sheet
25 pages
Exercises 695 Ar
No ratings yet
Exercises 695 Ar
1 page
A Frequent Pattern Mining Algorithm Based On Fp-Tree Structure Andapriori Algorithm
No ratings yet
A Frequent Pattern Mining Algorithm Based On Fp-Tree Structure Andapriori Algorithm
3 pages
Design and Implementation of Efficient APRIORI Algorithm
No ratings yet
Design and Implementation of Efficient APRIORI Algorithm
4 pages
Module 4 BDA NOTES
No ratings yet
Module 4 BDA NOTES
75 pages
6.DMBI Question Bank PDF
No ratings yet
6.DMBI Question Bank PDF
12 pages
Data Analysis Lab - Final - 23-24
No ratings yet
Data Analysis Lab - Final - 23-24
11 pages
ML Lab Records
No ratings yet
ML Lab Records
101 pages
Data Science
No ratings yet
Data Science
18 pages
3rd Semester DDM AI DAA DEV Print Pages For Spiral Record 25-1-24 - Removed
No ratings yet
3rd Semester DDM AI DAA DEV Print Pages For Spiral Record 25-1-24 - Removed
28 pages
Shalvin
No ratings yet
Shalvin
9 pages
EE2211 CheatSheet
No ratings yet
EE2211 CheatSheet
15 pages
1.2 Association Rule Mining: Abdulfetah Abdulahi A
No ratings yet
1.2 Association Rule Mining: Abdulfetah Abdulahi A
43 pages
Mining Frequent Patterns, Associations and Correlations: Basic Concepts and Methods
No ratings yet
Mining Frequent Patterns, Associations and Correlations: Basic Concepts and Methods
20 pages
Assignment 4 - Jupyter Notebook
No ratings yet
Assignment 4 - Jupyter Notebook
6 pages
Analisis Pola Pembelian Konsumen Pada PT Indoritel Makmur Internasional TBK Menggunakan Metode Algoritma Apriori
No ratings yet
Analisis Pola Pembelian Konsumen Pada PT Indoritel Makmur Internasional TBK Menggunakan Metode Algoritma Apriori
6 pages
DM Unit-II
No ratings yet
DM Unit-II
80 pages
Research and Case Analysis of Apriori Algorithm Based On Mining Frequent Item-Sets
No ratings yet
Research and Case Analysis of Apriori Algorithm Based On Mining Frequent Item-Sets
11 pages
DM Lab Progrmas 35
No ratings yet
DM Lab Progrmas 35
38 pages
DMC Lab Ex - 1 To 15 (31.03.2024)
No ratings yet
DMC Lab Ex - 1 To 15 (31.03.2024)
52 pages
Week 10
No ratings yet
Week 10
50 pages
LAB EXERCISE 2 - Data Preprocessing
No ratings yet
LAB EXERCISE 2 - Data Preprocessing
10 pages
Ap Python
No ratings yet
Ap Python
12 pages
DAV Practicals
No ratings yet
DAV Practicals
26 pages
Hierarchical Clustering
No ratings yet
Hierarchical Clustering
23 pages
Abhiml ML File
No ratings yet
Abhiml ML File
74 pages
Python
No ratings yet
Python
32 pages
Data Pre Processing
No ratings yet
Data Pre Processing
2 pages
Data Science Practical Book - Ipynb
No ratings yet
Data Science Practical Book - Ipynb
21 pages
EDA - Exploratory Data Analysis
No ratings yet
EDA - Exploratory Data Analysis
16 pages
Python Class 6 Assignment Solution
No ratings yet
Python Class 6 Assignment Solution
9 pages
LAB EXERCISE 2 - Data Preprocessing
No ratings yet
LAB EXERCISE 2 - Data Preprocessing
10 pages
Dal Programs With Output
No ratings yet
Dal Programs With Output
11 pages
Python For Machine Learning
No ratings yet
Python For Machine Learning
66 pages
ModuleAr Merged
No ratings yet
ModuleAr Merged
42 pages
DAV Guidelines
No ratings yet
DAV Guidelines
4 pages
Practical File 2024
No ratings yet
Practical File 2024
25 pages
Data Visualization EDA-print
No ratings yet
Data Visualization EDA-print
18 pages
Data Mining Lab 03
No ratings yet
Data Mining Lab 03
10 pages
MACHINE LEARNING Manual
No ratings yet
MACHINE LEARNING Manual
36 pages
Data Preprocessing For Clustering
No ratings yet
Data Preprocessing For Clustering
40 pages
Exploratory Data Analysis
No ratings yet
Exploratory Data Analysis
48 pages
AD3411
No ratings yet
AD3411
28 pages
Machine Learning Lab Manual
No ratings yet
Machine Learning Lab Manual
42 pages
Data Analyticskit601 Unit 4 Notes
No ratings yet
Data Analyticskit601 Unit 4 Notes
178 pages
Association Rule Mapping - Unit-4
No ratings yet
Association Rule Mapping - Unit-4
11 pages
DA Unit 4
100% (1)
DA Unit 4
125 pages
Project Paarth
No ratings yet
Project Paarth
21 pages
5 Data Preprocessing III Editted Notes
No ratings yet
5 Data Preprocessing III Editted Notes
17 pages
FDS Record-1-4
No ratings yet
FDS Record-1-4
18 pages
Sowmi DS
No ratings yet
Sowmi DS
27 pages
Practical 1
No ratings yet
Practical 1
5 pages
A Survey of Utility-Oriented Pattern Mining
No ratings yet
A Survey of Utility-Oriented Pattern Mining
22 pages
Market Basket Analysis Using Association Rules Unit 5
No ratings yet
Market Basket Analysis Using Association Rules Unit 5
21 pages
Data Science Algorithmen Master - 02 Data Handling
No ratings yet
Data Science Algorithmen Master - 02 Data Handling
76 pages
Market Basket Analysis Using Apriori Algorithm Gro
No ratings yet
Market Basket Analysis Using Apriori Algorithm Gro
9 pages
Eda Code Snippets
No ratings yet
Eda Code Snippets
17 pages
Edp 3
No ratings yet
Edp 3
16 pages
4 - Basics in Statistics and Linear Algebra
No ratings yet
4 - Basics in Statistics and Linear Algebra
7 pages
Khadeeja - DS - PRACTICAL 4
No ratings yet
Khadeeja - DS - PRACTICAL 4
24 pages
CS-3361-Data-science-lab Manual
No ratings yet
CS-3361-Data-science-lab Manual
36 pages
A Survey of Key Technologies For High Utility Patterns Mining
No ratings yet
A Survey of Key Technologies For High Utility Patterns Mining
17 pages
3 - Pandas
No ratings yet
3 - Pandas
87 pages
DWDM Lab Manual 28.04.25-9-14
No ratings yet
DWDM Lab Manual 28.04.25-9-14
6 pages
DAV Practicle File
No ratings yet
DAV Practicle File
28 pages
Data Analytics Lab Manual
No ratings yet
Data Analytics Lab Manual
26 pages
Lecture 5
No ratings yet
Lecture 5
27 pages
Machine Learning Lab File
No ratings yet
Machine Learning Lab File
45 pages
PP DWDM 4 5
No ratings yet
PP DWDM 4 5
26 pages
Vanshika Goyal Gec Practicals
No ratings yet
Vanshika Goyal Gec Practicals
31 pages
23bet10114 Naman Gupta Assignment-1
No ratings yet
23bet10114 Naman Gupta Assignment-1
17 pages
Gec Practicals
No ratings yet
Gec Practicals
31 pages
ML Notes
No ratings yet
ML Notes
44 pages
Unit 4 Data Analytics
No ratings yet
Unit 4 Data Analytics
11 pages
NumPy and Pandas Step
No ratings yet
NumPy and Pandas Step
9 pages
Advanced C Concepts and Programming: First Edition
From Everand
Advanced C Concepts and Programming: First Edition
Gayatri
3/5 (1)
Profound Python Data Science
From Everand
Profound Python Data Science
Onder Teker
No ratings yet