Lab 01-PhamBinhDuong ITCSIU21054
1.1. Introduction
Weka is open-source software available at www.cs.waikato.ac.nz/ml/weka. Weka stands for the
Waikato Environment for Knowledge Analysis. It offers clean, spare implementations of the simplest
techniques, designed to aid understanding of data mining techniques. It also provides a workbench
that includes full, working, state-of-the-art implementations of many popular learning schemes that can
be used for practical data mining or for research.
In the first class, we get started with Weka: exploring the “Explorer” interface, exploring
some datasets, building a classifier, using filters, and visualizing a dataset (see the class 1 lecture
by Ian H. Witten [1]).
Task: Take notes on how you find the Explorer, and answer the questions in the following sections.
In the dataset weather.nominal.arff, how many attributes are there in the relation? What are their values?
What is the class, and what are its values? Count the instances for each attribute value.
1
Relation: weather.symbolic   #Instances: 14   #Attributes: 5

Attribute      Value      #Instances
outlook        sunny      5
               overcast   4
               rainy      5
               (distinct: 3)
temperature    hot        4
               mild       6
               cool       4
               (distinct: 3)
humidity       high       7
               normal     7
               (distinct: 2)
windy          TRUE       6
               FALSE      8
               (distinct: 2)
play (class)   yes        9
               no         5
               (distinct: 2)
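The Preprocess panel computes these counts for us. As a cross-check, here is a minimal Python sketch (not Weka code) that reads the 14 instances of weather.nominal.arff, reproduced inline from the copy distributed with Weka, and tallies every attribute value. The helper `parse_nominal_arff` is hypothetical and only handles ARFF text in which every attribute is nominal.

```python
from collections import Counter

# weather.nominal.arff, reproduced inline so the sketch runs without Weka.
WEATHER_ARFF = """\
@relation weather.symbolic

@attribute outlook {sunny, overcast, rainy}
@attribute temperature {hot, mild, cool}
@attribute humidity {high, normal}
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}

@data
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no
"""


def parse_nominal_arff(text):
    """Hypothetical minimal reader for ARFF text whose attributes are all nominal.

    Returns (relation, attribute_names, rows). It ignores quoting, sparse
    data and numeric/string attributes, so it is not a general ARFF parser.
    """
    relation, names, rows, in_data = None, [], [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        keyword = line.split()[0].lower()
        if keyword == "@relation":
            relation = line.split(None, 1)[1]
        elif keyword == "@attribute":
            names.append(line.split()[1])
        elif keyword == "@data":
            in_data = True
        elif in_data:
            rows.append([value.strip() for value in line.split(",")])
    return relation, names, rows


relation, names, rows = parse_nominal_arff(WEATHER_ARFF)
# One Counter per attribute: value -> number of instances having that value.
counts = {name: Counter(row[i] for row in rows) for i, name in enumerate(names)}

print(f"Relation: {relation}  #Instances: {len(rows)}  #Attributes: {len(names)}")
for name in names:
    print(name, dict(counts[name]), "distinct:", len(counts[name]))
```

Each printed line should agree with one block of the table above, including the "distinct" figure.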
2
Relation: Glass   #Instances: 214   #Attributes: 10

Attribute   Minimum   Maximum   Mean     StdDev   Distinct
RI                    1.534     1.518    0.003
Na          10.73     17.38     13.408   0.817    142
Mg          0         4.49      2.685    1.441    94
Al          0.29      3.5       1.445    0.499    118
Si          69.81     75.41     72.651   0.775    133
K           0         6.21      0.497    0.652    65
Ca          5.43      16.19     8.957    1.423    143
Ba          0         3.15      0.175    0.497    34
Fe          0         0.51      0.057    0.097    32

Class: Type (distinct: 6)
Value                    #Instances
build wind float         70
build wind non-float     76
vehic wind float         17
vehic wind non-float     0
containers               13
tableware                9
headlamps                29
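For numeric attributes the Preprocess panel reports Minimum, Maximum, Mean, StdDev and Distinct instead of value counts. The Python sketch below computes the same five fields for one column. It is an illustration, not Weka's implementation: it assumes there are no missing values and uses the sample standard deviation (dividing by n - 1), so check Weka's documentation if exact agreement in the last digit matters. The column values are hypothetical, not taken from glass.arff.

```python
import statistics


def numeric_summary(values):
    """Summary of one numeric attribute, with the fields shown in the table above.

    Sketch only: assumes no missing values and a sample standard deviation.
    """
    return {
        "Minimum": min(values),
        "Maximum": max(values),
        "Mean": statistics.mean(values),
        "StdDev": statistics.stdev(values),  # sample standard deviation (n - 1)
        "Distinct": len(set(values)),  # number of different values present
    }


# Hypothetical toy column (NOT real glass.arff data), shaped like Fe:
# mostly zeros with a few small positive values.
toy_column = [0.0, 0.0, 0.0, 0.12, 0.24, 0.51]
summary = numeric_summary(toy_column)
for field, value in summary.items():
    print(f"{field}: {round(value, 3)}")
```

A column with many repeated zeros, like Ba or Fe above, ends up with far fewer distinct values than instances, which is why those attributes show only 34 and 32 distinct values out of 214.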
3
Relation: gameandgrade   #Instances: 770   #Attributes: 10

Attribute          Minimum   Maximum   Mean    StdDev   Distinct
Sex                0         1         0.499   0.5      2
School Code        1         11        4.944   3        11
Playing Years      0         4         1.584   1.407    5
Playing Often      0         5         2.243   1.924    6
Playing Hours      0         5         1.488   1.338    6
Playing Games      0         2         0.706   0.459    3
Parent Revenue     0         4         1.838   1.064    5
Father Education   0         6         3.718   1.172    7
Mother Education   0         6         3.41    1.176    7

Class: Grade (distinct: 105)
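Sex has Minimum 0, Maximum 1 and 2 distinct values, so it is a 0/1 code that Weka treats as numeric. For such an attribute the mean is simply the fraction of instances coded 1, which is why its StdDev lands at about 0.5. The Python sketch below back-calculates that count from the rounded figures in the table; the resulting count is an inference from the reported mean, not a number Weka displays, and the sketch assumes a sample standard deviation.

```python
import statistics

N = 770                # #Instances reported for gameandgrade
REPORTED_MEAN = 0.499  # reported Mean of Sex (coded 0/1)

# For a 0/1 attribute the mean is the fraction of 1s, so search for the
# counts k of 1s whose fraction rounds to the reported 3-decimal mean.
candidates = [k for k in range(N + 1) if round(k / N, 3) == REPORTED_MEAN]
print("possible numbers of 1s:", candidates)

# Rebuild a column with that many 1s and recompute the summary statistics.
ones = candidates[0]
column = [1] * ones + [0] * (N - ones)
print("mean:", round(statistics.mean(column), 3))
print("sample std dev:", round(statistics.stdev(column), 3))
```

Only one count is consistent with a mean of 0.499, and the recomputed standard deviation rounds to the reported 0.5, so the Sex row is internally consistent.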
4
Algorithm   Pruned/Unpruned   minNumObj   No. of Leaves   Correctly Classified Instances
J48         Pruned            2           30              143
J48         Unpruned          2           30              144
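Weka reports the correctly classified count together with a percentage. Assuming these runs used glass.arff (214 instances) with the default 10-fold cross-validation, under which every instance is tested exactly once, the Python sketch below converts the counts to accuracies and shows how much a single instance is worth.

```python
def accuracy_pct(correct, total):
    """Percentage of correctly classified instances."""
    return 100.0 * correct / total


TOTAL = 214  # assumption: glass.arff, each instance tested once (10-fold CV)
runs = {"J48 pruned": 143, "J48 unpruned": 144}

for name, correct in runs.items():
    print(f"{name}: {correct}/{TOTAL} = {accuracy_pct(correct, TOTAL):.2f}%")

# Size of a one-instance difference, in percentage points.
print(f"one instance = {accuracy_pct(1, TOTAL):.2f} points")
```

The pruned and unpruned trees differ by one instance, i.e. less than half a percentage point, so this table alone says little about which setting is better.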
5
Random Tree 150
6
Random Tree
7
Fewer attributes, better classification:
Follow the instructions in [1] and review the outputs of J48 applied to glass.arff:
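In the Explorer, attributes are removed in the Preprocess panel (tick them and click Remove) or with the unsupervised attribute filter Remove, whose index option accepts 1-based lists such as `first-3,5,last`. The Python sketch below imitates that index syntax to show what removing attributes does to the data; it does not run J48 or measure accuracy. The helpers `parse_index_list` and `remove_attributes`, the chosen range, and the single data row are hypothetical.

```python
def parse_index_list(spec, num_attributes):
    """Convert a 1-based index list such as "first-3,5,last" into sorted
    0-based column indices. Simplified: no inversion, no error checking."""

    def to_number(token):
        token = token.strip().lower()
        if token == "first":
            return 1
        if token == "last":
            return num_attributes
        return int(token)

    selected = set()
    for part in spec.split(","):
        if "-" in part:
            start, end = part.split("-", 1)
            selected.update(range(to_number(start), to_number(end) + 1))
        else:
            selected.add(to_number(part))
    return sorted(i - 1 for i in selected)


def remove_attributes(names, rows, spec):
    """Drop the attributes selected by `spec` from the header and every row."""
    drop = set(parse_index_list(spec, len(names)))
    keep = [i for i in range(len(names)) if i not in drop]
    return [names[i] for i in keep], [[row[i] for i in keep] for row in rows]


# Attribute names of glass.arff; the single row of values is made up.
glass_names = ["RI", "Na", "Mg", "Al", "Si", "K", "Ca", "Ba", "Fe", "Type"]
toy_rows = [[1.52, 13.6, 4.5, 1.1, 71.8, 0.06, 8.8, 0.0, 0.0, "build wind float"]]

# Hypothetical choice: drop Mg, Al and Si (attributes 3-5) but keep the class.
kept_names, kept_rows = remove_attributes(glass_names, toy_rows, "3-5")
print(kept_names)
print(kept_rows)
```

Whatever is removed, the class attribute (Type) must stay, otherwise J48 has nothing to predict; comparing J48's output before and after each removal is what the exercise asks for.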
8
-
15
Iris-virginica Iris-versicolor
73
Iris-virginica Iris-versicolor
119
Iris-versicolor Iris-virginica
92
Iris-versicolor Iris-virginica
109
98 Iris-versicolor Iris-setosa