Lab 01-Form
1.1. Introduction
Weka is open-source software available at www.cs.waikato.ac.nz/ml/weka. Weka stands for the
Waikato Environment for Knowledge Analysis. It offers clean, spare implementations of the simplest
techniques, designed to aid understanding of data mining. It also provides a workbench
that includes full, working, state-of-the-art implementations of many popular learning schemes that can
be used for practical data mining or for research.
In the first class, we get started with Weka: exploring the “Explorer” interface, exploring
some datasets, building a classifier, using filters, and visualizing your dataset. (See the Class 1 lecture
by Ian H. Witten [1].)
Task: Take notes on how you find the Explorer, and answer the questions in the following sections.
In the dataset weather.nominal.arff, how many attributes are there in the relation? What are their values?
What is the class, and what are its values? Count the instances for each attribute value.
Dataset            Attributes    Values     #Instances
Relation:          outlook       sunny      5
weather.symbolic                 overcast   4
#Instances: 14                   rainy      5
#Attributes: 5                   Distinct   3
                   temperature   hot        4
                                 mild       6
                                 cool       4
                                 Distinct   3
                   humidity      high
                                 normal
                                 Distinct   2
                   windy         TRUE
                                 FALSE
                                 Distinct   2
                   Class: play   yes
                                 no
                                 Distinct   2
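The counting task above can also be checked outside the Explorer. The following is a minimal Python sketch, assuming a simple ARFF text with nominal attributes only and no quoted values. The function name count_values and the five-row SAMPLE_ARFF are hypothetical and typed in for illustration; the sample is not the full weather.nominal.arff file.

```python
from collections import Counter

# Hypothetical five-row sample in ARFF layout (not the full weather.nominal.arff file).
SAMPLE_ARFF = """\
@relation weather.symbolic
@attribute outlook {sunny, overcast, rainy}
@attribute temperature {hot, mild, cool}
@attribute humidity {high, normal}
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
"""


def count_values(arff_text):
    """Return {attribute name: Counter of its values} for a simple nominal ARFF text."""
    names = []
    counts = {}
    in_data = False
    for raw in arff_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        lower = line.lower()
        if lower.startswith("@attribute"):
            name = line.split()[1]  # second token is the attribute name
            names.append(name)
            counts[name] = Counter()
        elif lower.startswith("@data"):
            in_data = True
        elif in_data:
            # One instance per line, values in attribute order.
            for name, value in zip(names, line.split(",")):
                counts[name][value.strip()] += 1
    return counts


counts = count_values(SAMPLE_ARFF)
for name, counter in counts.items():
    print(name, dict(counter), "Distinct:", len(counter))
```

Running the same function on the text of the real file should give the per-value counts that the Explorer shows when an attribute is selected, provided the file follows the simple layout assumed here.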
weather.numeric.arff
glass.arff
Dataset            Attributes    Values                 #Instances
Relation: Glass    RI            Minimum                1.511
#Instances: 214                  Maximum                1.534
#Attributes: 10                  Mean                   1.518
                                 StdDev                 0.003
                                 Distinct               178
                   Na            Minimum                10.73
                                 Maximum                17.38
                                 Mean                   13.408
                                 StdDev                 0.817
                                 Distinct               142
                   Mg            Minimum                0
                                 Maximum                4.49
                                 Mean                   2.685
                                 StdDev                 1.442
                                 Distinct               94
                   Al            Minimum                0.29
                                 Maximum                3.5
                                 Mean                   1.445
                                 StdDev                 0.499
                                 Distinct               118
                   Si            Minimum                69.81
                                 Maximum                75.41
                                 Mean                   72.651
                                 StdDev                 0.775
                                 Distinct               133
                   K             Minimum                0
                                 Maximum                6.21
                                 Mean                   0.497
                                 StdDev                 0.652
                                 Distinct               65
                   Ca            Minimum                5.43
                                 Maximum                16.19
                                 Mean                   8.957
                                 StdDev                 1.423
                                 Distinct               143
                   Ba            Minimum                0
                                 Maximum                3.15
                                 Mean                   0.175
                                 StdDev                 0.497
                                 Distinct               34
                   Fe            Minimum                0
                                 Maximum                0.51
                                 Mean                   0.057
                                 StdDev                 0.097
                                 Distinct               32
                   Class: Type   build wind float       70
                                 build wind non-float   76
                                 vehic wind float       17
                                 vehic wind non-float   0
                                 containers             13
                                 tableware              9
                                 headlamps              29
                                 Distinct               6
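For numeric attributes, the five statistics recorded above (Minimum, Maximum, Mean, StdDev, Distinct) can be recomputed by hand as a sanity check. The following is a minimal Python sketch. The function name summarize and the example readings are hypothetical, and the use of the sample standard deviation (dividing by n - 1) is an assumption about how the Explorer computes StdDev.

```python
import statistics


def summarize(values):
    """Summarize one numeric attribute: minimum, maximum, mean, standard deviation, distinct count."""
    return {
        "Minimum": min(values),
        "Maximum": max(values),
        "Mean": statistics.mean(values),
        # Assumption: sample standard deviation (divides by n - 1).
        "StdDev": statistics.stdev(values),
        "Distinct": len(set(values)),
    }


# Hypothetical readings, not taken from glass.arff.
example = [1.0, 2.0, 2.0, 4.0]
summary = summarize(example)
for label, value in summary.items():
    print(f"{label}: {round(value, 3)}")
```

If the recomputed StdDev differs slightly from the Explorer's value on real data, the divisor (n versus n - 1) is the first thing to check; statistics.pstdev gives the population form.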
                   temperature   Minimum    20
                                 Distinct   10

                   humidity      Minimum    50
                                 Maximum    90
                                 Mean       70.8
                                 StdDev     13.155
                                 Distinct   10

                                 Maximum    800
                                 Mean       535
                                 StdDev     171.675
                                 Distinct   9

                   wind_speed    Minimum    2
                                 Maximum    7
                                 Mean       4.1
                                 StdDev     1.663
                                 Distinct   6

                                 moderate   3
                                 high       3
                                 Distinct   3
Evaluate the confusion matrix every time you run an algorithm.
J48:
The algorithm is skewed towards classifying into a = build wind float and b = build wind non-float.
RandomTree:
The algorithm is skewed towards classifying into a = build wind float and b = build wind non-float.
However, RandomTree provides better results than J48.
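To make the "skewed towards a and b" reading concrete, it helps to see how a confusion matrix is built and what its column totals mean. The following is a minimal Python sketch. The function names and the label lists are hypothetical and made up for illustration; they are not output from J48 or RandomTree.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Return rows of counts: row = actual class, column = predicted class."""
    pairs = Counter(zip(actual, predicted))
    return [[pairs[(a, p)] for p in labels] for a in labels]


def predicted_totals(matrix):
    """Column sums: how many instances were assigned to each class."""
    return [sum(column) for column in zip(*matrix)]


def accuracy(matrix):
    """Fraction of instances on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Hypothetical labels for illustration; not output from J48 or RandomTree.
labels = ["build wind float", "build wind non-float", "headlamps"]
actual = ["build wind float", "build wind float", "build wind non-float",
          "build wind non-float", "headlamps", "headlamps"]
predicted = ["build wind float", "build wind non-float", "build wind non-float",
             "build wind non-float", "build wind float", "headlamps"]

matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(row, "<- actual", label)
print("predicted totals:", predicted_totals(matrix))
print("accuracy:", accuracy(matrix))
```

A classifier is skewed towards a class when that class's column total is large relative to its row total, that is, when it is predicted more often than it actually occurs.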
Use a filter to remove an attribute
Follow the instructions in [1] and review the outputs of J48 applied to glass.arff:
Original
Remove Fe
Remove all attributes except RI and Mg
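The three variants listed above differ only in which columns reach the classifier. The following is a minimal Python sketch of that column selection, not Weka's own filter. The function names are hypothetical, the two data rows are made-up values, and keeping the class attribute Type in the "except RI and Mg" case is an assumption (a classifier still needs the class column).

```python
def remove_attributes(names, rows, to_remove):
    """Drop the named attributes from every row; return (new names, new rows)."""
    keep = [i for i, name in enumerate(names) if name not in to_remove]
    new_names = [names[i] for i in keep]
    new_rows = [[row[i] for i in keep] for row in rows]
    return new_names, new_rows


def keep_attributes(names, rows, to_keep):
    """Inverse selection: keep only the named attributes."""
    return remove_attributes(names, rows, [n for n in names if n not in to_keep])


# Hypothetical two-row excerpt with made-up values; the attribute names follow glass.arff.
names = ["RI", "Na", "Mg", "Al", "Si", "K", "Ca", "Ba", "Fe", "Type"]
rows = [
    [1.52, 13.6, 4.5, 1.1, 71.8, 0.1, 8.8, 0.0, 0.0, "build wind float"],
    [1.51, 13.9, 3.6, 1.4, 72.7, 0.5, 7.8, 0.0, 0.1, "build wind non-float"],
]

# Variant "Remove Fe".
without_fe, _ = remove_attributes(names, rows, ["Fe"])
# Variant "Remove all attributes except RI and Mg" (class attribute kept by assumption).
ri_mg_names, ri_mg_rows = keep_attributes(names, rows, ["RI", "Mg", "Type"])
print(without_fe)
print(ri_mg_names, ri_mg_rows)
```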