
BDM Tool - Weka: Example 6: K-Means Clustering

This document discusses using the K-Means clustering algorithm in WEKA to analyze the bank-data.csv dataset. It explains that WEKA's SimpleKMeans implementation can handle categorical and numerical attributes using Euclidean distance. The user is instructed to choose SimpleKMeans, evaluate the cluster assignments, and check the number of clusters and incorrectly clustered instances. The document also mentions another example that uses the Air Traffic Passenger Statistics.csv dataset.

BDM Tool - WEKA

Example 6 : K-Means clustering


K-Means Introduction (slides 1 to 9)
Example 6 : K-Means Clustering
• http://facweb.cs.depaul.edu/mobasher/classes/ect584/weka/k-means.html

• The WEKA SimpleKMeans algorithm automatically handles a mixture of categorical and numerical attributes.

• The SimpleKMeans algorithm uses the Euclidean distance measure to compute distances between instances and clusters.
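One common way to apply a Euclidean-style distance to mixed attributes is sketched below in Python. This is a minimal illustration, not WEKA's source code: it assumes numeric differences are scaled by the attribute's observed range and that categorical attributes contribute 0 when the values match and 1 otherwise. The function name, attribute names, and sample values are hypothetical.

```python
import math

def mixed_euclidean(a, b, numeric_ranges):
    """Euclidean-style distance between two instances with mixed attributes.

    a, b: dicts mapping attribute name -> value.
    numeric_ranges: dict mapping each numeric attribute name -> (min, max);
    any attribute not listed there is treated as categorical.
    """
    total = 0.0
    for name in a:
        if name in numeric_ranges:
            lo, hi = numeric_ranges[name]
            span = hi - lo
            # Scale the numeric difference by the range so that no single
            # attribute dominates the sum (assumed normalization).
            diff = (a[name] - b[name]) / span if span else 0.0
        else:
            # Categorical attribute: 0 if the values match, 1 otherwise.
            diff = 0.0 if a[name] == b[name] else 1.0
        total += diff * diff
    return math.sqrt(total)

# Hypothetical instances and ranges, for illustration only.
ranges = {"age": (20, 60), "income": (0.0, 100.0)}
x = {"age": 30, "income": 25.0, "sex": "FEMALE", "married": "YES"}
y = {"age": 50, "income": 75.0, "sex": "MALE", "married": "YES"}
d = mixed_euclidean(x, y, ranges)
```

Here the two numeric attributes each contribute a scaled difference of 0.5, the mismatched categorical attribute contributes 1, and the matching one contributes 0.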

• Dataset : bank-data.csv
• Cluster
– Choose : SimpleKMeans / EM / HierarchicalClusterer
– Classes to clusters evaluation : select the class variable, a nominal (Nom) attribute
– Check the number of clusters and Incorrectly clustered instances
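The idea behind classes to clusters evaluation can be sketched in Python as follows. This is a simplified illustration that assumes each cluster is labeled with its majority class and every instance whose class differs from that label counts as incorrectly clustered; WEKA's own class-to-cluster mapping may be computed differently. The function name and the sample data are hypothetical.

```python
from collections import Counter, defaultdict

def classes_to_clusters(cluster_ids, class_labels):
    """Label each cluster with its majority class and count the mismatches."""
    by_cluster = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, class_labels):
        by_cluster[cluster][label] += 1

    # Majority class per cluster (simplifying assumption).
    mapping = {c: counts.most_common(1)[0][0] for c, counts in by_cluster.items()}

    # An instance is "incorrectly clustered" if its class differs from
    # the class assigned to its cluster.
    incorrect = sum(
        1 for c, label in zip(cluster_ids, class_labels) if mapping[c] != label
    )
    return mapping, incorrect

# Hypothetical cluster assignments and class values, for illustration only.
clusters = [0, 0, 0, 1, 1, 1, 1, 0]
labels = ["YES", "YES", "NO", "NO", "NO", "YES", "NO", "YES"]
mapping, incorrect = classes_to_clusters(clusters, labels)
error_rate = incorrect / len(labels)
```

The class attribute is used only to score the result after clustering, which is why it has to be nominal.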
Example 7 – Air Traffic Passenger Statistics
• TRY IT
• Data set - Air Traffic Passenger Statistics.csv

• Activity Period
• Operating Airline
• Operating Airline IATA Code
• Published Airline
• Published Airline IATA Code
• GEO Summary
• GEO Region
• Activity Type Code
• Price Category Code
• Terminal
• Boarding Area
• Passenger Count
• Adjusted Activity Type Code
• Adjusted Passenger Count
• Year
• Month
