Data Mining With Weka - Demo

The document provides an introduction to using the Weka data mining tool. It covers downloading and installing Weka, exploring datasets, using classification and clustering algorithms on sample datasets, evaluating models with training and testing as well as cross validation, and finding association rules. Visualization techniques are also discussed.

Data Mining with Weka

Instructor: Solomon (Ph.D)


Getting started with Weka
Introduction
• Download from:
– https://waikato.github.io/weka-wiki/downloading_weka/ (for Windows, Mac, Linux)
• Weka 3.8.6 (the latest stable version of Weka; includes sample datasets)
Exploring the Explorer
• Install Weka
• Get datasets
– Convert .xls to .csv (in Excel: Save As -> CSV (MS-DOS))
– Convert .csv to .arff (in the Explorer: Open file -> select the .csv -> Edit to check it -> Ok -> Save with a .arff extension)
• Open Explorer
• Open a dataset (weather.nominal.arff)
• Look at attributes and their values
• Edit the dataset
• Save it?
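The .csv-to-.arff conversion above can also be scripted. A minimal sketch in Python (not part of Weka; file names and the `csv_to_arff` helper are illustrative) that treats every column as a nominal attribute:

```python
import csv

def csv_to_arff(csv_path, arff_path, relation="converted"):
    """Convert a CSV file with a header row into a nominal-only ARFF file."""
    with open(csv_path, newline="") as f:
        rows = list(csv.reader(f))
    header, data = rows[0], rows[1:]
    with open(arff_path, "w") as out:
        out.write(f"@relation {relation}\n\n")
        for i, name in enumerate(header):
            # The distinct values of each column become the nominal value set
            values = sorted({row[i] for row in data})
            out.write(f"@attribute {name} {{{','.join(values)}}}\n")
        out.write("\n@data\n")
        for row in data:
            out.write(",".join(row) + "\n")
```

Real data may need numeric attributes or quoting; Weka's own loaders handle those cases, so this is only a sketch of the file format.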
Exploring datasets
• The classification problem (weather.numeric.arff)
• weather.nominal.arff, weather.numeric.arff
• Nominal vs. numeric attributes
Training and testing
• Use J48 to analyse the segment dataset
– Open file segment-challenge.arff
– Choose J48 decision tree learner
– Supplied test set segment-test.arff
– Run it: 96% accuracy
– Evaluate on training set: 99% accuracy
– Evaluate on percentage split: 95% accuracy
– Do it again: get exactly the same result!
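The "exactly the same result" comes from the fixed random seed used to shuffle the data before the percentage split. A small Python sketch (independent of Weka; `percentage_split` is a made-up helper) showing that the same seed reproduces the same split while a different seed does not:

```python
import random

def percentage_split(n_instances, train_pct=66, seed=1):
    """Shuffle instance indices with a fixed seed, then cut into train/test."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)  # deterministic for a given seed
    cut = n_instances * train_pct // 100
    return indices[:cut], indices[cut:]

train_a, test_a = percentage_split(1500, train_pct=66, seed=1)
train_b, test_b = percentage_split(1500, train_pct=66, seed=1)  # same seed
train_c, test_c = percentage_split(1500, train_pct=66, seed=2)  # new seed
```

Runs a and b produce identical splits, so the classifier sees identical data and reports identical accuracy; run c shuffles differently, which is why changing the seed (next slide) changes the result.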
Repeated training and testing
• Evaluate J48 on segment-challenge
– With segment-challenge.arff …
– and J48
– Set percentage split to 90%
– Run it: 96.7% accuracy (seed = 1)
– Repeat with seed 2, 3, 4, 5, 6, 7, 8, 9, 10 -> 0.940, 0.940, 0.967, 0.953, 0.967, 0.920, 0.947, 0.933, 0.947
• Sample mean: x̄ = Σxᵢ / n = 0.949
• Standard deviation: σ = √( Σ(xᵢ − x̄)² / (n − 1) ) = 0.018
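These two statistics can be checked with a few lines of Python over the ten accuracies listed above (expect small rounding differences from the slide figures):

```python
from statistics import mean, stdev

# Accuracies from the ten percentage-split runs (seeds 1 to 10)
accuracies = [0.967, 0.940, 0.940, 0.967, 0.953,
              0.967, 0.920, 0.947, 0.933, 0.947]

x_bar = mean(accuracies)   # sample mean
s = stdev(accuracies)      # sample standard deviation (n - 1 denominator)
```

`stdev` uses the same n − 1 denominator as the formula above, which is the right choice when estimating spread from a sample of runs rather than the full population.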
Cross-validation
• 10-fold cross-validation
– Divide dataset into 10 parts (folds)
– Hold out each part in turn
– Average the results
– Each data point used once for testing, 9 times for training

• Stratified cross-validation: ensure that each fold has the right proportion of each class value
• Practical rule of thumb:
– Lots of data? – use percentage split
– Else stratified 10-fold cross-validation
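The fold-building step of stratified cross-validation can be sketched as follows (a simplified illustration, not Weka's actual implementation): group instances by class, then deal each group round-robin into the folds so every fold gets roughly the right class proportions.

```python
from collections import defaultdict

def stratified_folds(labels, k=10):
    """Assign instance indices to k folds, preserving class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    counter = 0
    for indices in by_class.values():
        for idx in indices:
            folds[counter % k].append(idx)  # deal each class round-robin
            counter += 1
    return folds
```

Each instance lands in exactly one fold; holding out each fold in turn then gives the "tested once, trained on 9 times" property described above.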
Clustering

• With clustering, there is no “class” attribute
• Try to divide the instances into natural groups, or “clusters”
• Example:
– Examine iris.arff in the Explorer
– Imagine deleting the class attribute
– Could you recover the classes by clustering the data?

[Images: Iris Setosa, Iris Versicolor, Iris Virginica]
Visualizing clusters

• Iris data (iris.arff), SimpleKMeans, specify 3 clusters
– 3 clusters with 50 instances each
• Visualize cluster assignments (right-click menu)
– Plot Cluster against Instance_number to see what the errors are
• Perfect? – surely not!
– Ignore the class attribute: 3 clusters, with 61, 50, 39 instances
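A bare-bones version of what SimpleKMeans does, as a Python sketch (illustrative only; Weka's implementation additionally normalizes attributes and handles nominal values):

```python
import random

def kmeans(points, k, iterations=20, seed=1):
    """Minimal k-means: random initial centroids, then alternate
    assignment and centroid-update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sum(
                (a - b) ** 2 for a, b in zip(p, centroids[i])))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        for i, cluster in enumerate(clusters):
            if cluster:
                centroids[i] = tuple(sum(c) / len(cluster)
                                     for c in zip(*cluster))
    return clusters
```

Like Weka's seed option, the random initial centroids mean different seeds can give different clusterings, which is one reason the 61/50/39 split above is not the "perfect" 50/50/50.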

• Which instances does a cluster contain?
– Use the AddCluster unsupervised attribute filter
– Try it with SimpleKMeans; Apply, then click Edit
• Hard to evaluate clustering
– It should really be evaluated with respect to an application
Association rules
• Weather data (weather.nominal.arff) has 336 rules with confidence 100%
– But only 8 have support >= 3; only 58 have support >= 2
• In Weka, specify the minimum confidence level (minMetric, default 90%) and the number of rules sought (numRules, default 10)
• Support is expressed as a proportion of the number of instances
• Weka runs the Apriori algorithm several times:
– starts at upperBoundMinSupport (usually left at 100%)
– decreases by delta at each iteration (default 5%)
– stops when numRules is reached …
– … or at lowerBoundMinSupport (default 10%)
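Support and confidence for a candidate rule can be computed directly. A Python sketch over the standard 14-instance weather data (the `support_confidence` helper is illustrative), using one of the 100%-confidence rules mentioned above, humidity=normal and windy=FALSE ⇒ play=yes:

```python
# The 14 instances of weather.nominal as (outlook, temperature, humidity, windy, play)
DATA = [
    ("sunny", "hot", "high", "FALSE", "no"),
    ("sunny", "hot", "high", "TRUE", "no"),
    ("overcast", "hot", "high", "FALSE", "yes"),
    ("rainy", "mild", "high", "FALSE", "yes"),
    ("rainy", "cool", "normal", "FALSE", "yes"),
    ("rainy", "cool", "normal", "TRUE", "no"),
    ("overcast", "cool", "normal", "TRUE", "yes"),
    ("sunny", "mild", "high", "FALSE", "no"),
    ("sunny", "cool", "normal", "FALSE", "yes"),
    ("rainy", "mild", "normal", "FALSE", "yes"),
    ("sunny", "mild", "normal", "TRUE", "yes"),
    ("overcast", "mild", "high", "TRUE", "yes"),
    ("overcast", "hot", "normal", "FALSE", "yes"),
    ("rainy", "mild", "high", "TRUE", "no"),
]
ATTRS = ("outlook", "temperature", "humidity", "windy", "play")

def support_confidence(antecedent, consequent):
    """antecedent and consequent are dicts of attribute -> value."""
    def matches(row, conditions):
        return all(row[ATTRS.index(a)] == v for a, v in conditions.items())
    n_antecedent = sum(matches(r, antecedent) for r in DATA)
    n_both = sum(matches(r, {**antecedent, **consequent}) for r in DATA)
    # Support: instances matching the whole rule; confidence: n_both / n_antecedent
    return n_both, (n_both / n_antecedent if n_antecedent else 0.0)

sup, conf = support_confidence({"humidity": "normal", "windy": "FALSE"},
                               {"play": "yes"})
```

Four instances match the antecedent and all four have play=yes, so this rule has support 4 and confidence 100%, which is why it survives the support >= 3 cut above.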
Thank You!
