Datawarehouse Pract 2
[Figure: Weka GUI Chooser. Waikato Environment for Knowledge Analysis, Version 3.8.0, (c) 1999-2016, The University of Waikato, Hamilton, New Zealand. Application buttons: Explorer, Experimenter, KnowledgeFlow, Workbench, Simple CLI.]
[Figure: Weka Explorer screenshots. The Preprocess tab with no data loaded (all fields under Current relation and Selected attribute read "None"), and the Select attributes tab (tabs: Preprocess, Classify, Cluster, Associate, Select attributes, Visualize) with the generic object editor open for weka.attributeSelection.BestFirst. About text: "Searches the space of attribute subsets by greedy hillclimbing augmented with a backtracking facility." Options shown: direction = Forward, lookupCacheSize = 1, searchTermination = 5, startSet. Evaluator options shown: debug = False, doNotCheckCapabilities = False, locallyPredictive = True, missingSeparate = False, numThreads = 1, poolSize.]
Information Technology
Inside the Weka Explorer window there are six tabs:
1. Preprocess - used to choose the data file to be used by the application.
Open File - allows the user to select files residing on the local machine or on recorded media.
Open URL - provides a mechanism to locate a file or data source at a different location specified by the user.
Open Database - allows the user to retrieve files or data from a database source provided by the user.
2. Classify - used to train and test different learning schemes on the preprocessed data file under experimentation.
3. Cluster - used to apply different tools that identify clusters within the data file. The Cluster tab opens the process that is used to identify commonalities or clusters of occurrences within the data set and produce information for the user to analyze.
[Figure: Cluster tab of the Weka Explorer with the EM clusterer chosen. Cluster mode options: Use training set, Supplied test set, Percentage split, Classes to clusters evaluation, Store clusters for visualization. Also shown: Ignore attributes, Start, Result list (right-click for options), Clusterer output, Status, Log.]
4. Associate - used to apply different rules to the data file that identify associations within the data. The Associate tab opens a window to select the options for associations within the dataset.
5. Select attributes - used to apply different rules to reveal changes based on the inclusion or exclusion of selected attributes from the experiment.
6. Visualize - used to see what the various manipulations produced on the data set in a 2D format, as scatter plot and bar graph output.
2. Experimenter - this option allows users to conduct different experimental variations on data sets and perform statistical manipulation. The Weka Experiment Environment enables the user to create, run, modify, and analyze experiments in a more convenient manner than is possible when processing the schemes individually. For example, the user can create an experiment that runs several schemes against a series of datasets and then analyze the results to determine whether one of the schemes is (statistically) better than the others.
[Figure: Weka Experiment Environment, Setup tab, with Experiment Configuration Mode set to Simple. Panels: Results Destination, Experiment Type, Iteration Control (Number of repetitions), Datasets, Algorithms.]
Results destination: ARFF file, CSV file, JDBC database.
Experiment type: Cross-validation (default), Train/Test Percentage Split (data randomized).
Iteration control: Number of repetitions, Data sets first/Algorithms first.
Algorithms: filters
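The two experiment types named above, cross-validation and a randomized train/test percentage split, differ only in how the instances are divided between training and testing. The following Python sketch illustrates that difference on instance indices. It is not Weka's implementation (for example, it does no stratification), and the function names, the 66% split, the 10 folds, and the seed are assumptions chosen for illustration.

```python
import random


def percentage_split(n_instances, train_percent=66.0, seed=1):
    """Shuffle instance indices, then cut them into train and test lists."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)          # "data randomized"
    n_train = round(n_instances * train_percent / 100.0)
    return indices[:n_train], indices[n_train:]


def cross_validation_folds(n_instances, n_folds=10, seed=1):
    """Yield (train, test) index lists; every instance is tested exactly once."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    for k in range(n_folds):
        test = indices[k::n_folds]                # every n_folds-th index
        test_set = set(test)
        train = [i for i in indices if i not in test_set]
        yield train, test


# Usage with a hypothetical dataset of 150 instances
train, test = percentage_split(150)
print(len(train), len(test))
for fold_train, fold_test in cross_validation_folds(150, n_folds=10):
    print(len(fold_train), len(fold_test))
```

With a percentage split each scheme is evaluated once on a single held-out part, whereas cross-validation evaluates it once per fold and averages, which is why repeated runs over several datasets are usually set up in the Experimenter rather than by hand.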
3. Knowledge Flow - basically the same functionality as Explorer, with drag and drop functionality. The advantage of this option is that it supports incremental learning from previous results.
4. Simple CLI - provides users without a graphic interface option the ability to execute commands from a terminal window.
b. Explore the default datasets in the Weka tool.
Weka provides a number of small common machine learning datasets that you can use to practice on.
Click the "Open file..." button to open a data set and double click on the "data" directory.
Select the "iris.arff" file to load the Iris dataset.
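An ARFF file such as the Iris dataset is plain text: a header with @relation and @attribute declarations, followed by an @data section with one comma-separated instance per line. To make that layout concrete, here is a minimal Python sketch that reads a small dense ARFF text. The function name parse_arff and the inline sample are assumptions for illustration; the sketch ignores quoted names, sparse data, and other features that Weka's own loader supports.

```python
def parse_arff(text):
    """Parse a simple dense ARFF document into (relation, attributes, rows)."""
    relation = None
    attributes = []      # list of (name, type string or list of nominal values)
    rows = []            # each row is a list of raw string values
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):      # skip blanks and comments
            continue
        if in_data:
            rows.append([v.strip() for v in line.split(",")])
            continue
        keyword = line.split(None, 1)[0].lower()  # ARFF keywords are case-insensitive
        if keyword == "@relation":
            relation = line.split(None, 1)[1]
        elif keyword == "@attribute":
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):              # nominal attribute: list its values
                spec = [v.strip() for v in spec.strip("{}").split(",")]
            attributes.append((name, spec))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, rows


# Hypothetical excerpt written in the style of iris.arff (not the full file)
SAMPLE = """\
% Small illustrative excerpt
@RELATION iris
@ATTRIBUTE sepallength NUMERIC
@ATTRIBUTE sepalwidth NUMERIC
@ATTRIBUTE petallength NUMERIC
@ATTRIBUTE petalwidth NUMERIC
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
"""

relation, attributes, rows = parse_arff(SAMPLE)
print(relation, len(attributes), len(rows))
```

The attribute list recovered here corresponds to what the Preprocess tab shows under "Attributes" once the file is loaded.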
References:
[1] Witten, I.H. and Frank, E. (2005). Data Mining: Practical machine learning tools and techniques. 2nd edition. Morgan Kaufmann, San Francisco.
[2] Ross Quinlan (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, San Mateo, CA.
[3] CVS - http://weka.sourceforge.net/wiki/index.php/CVS
[4] Weka Doc - http://weka.sourceforge.net/wekadoc/
Exercise:
1. Normalize the data using min-max normalization.
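As a reference point for this exercise, min-max normalization rescales each numeric attribute to a target range with x' = (x - min) / (max - min) * (new_max - new_min) + new_min. The Python sketch below applies that formula to one column of values. The function name, the sample values, and the handling of a constant column (mapped to new_min) are assumptions of this sketch, not a description of Weka's behavior. In the Explorer, the corresponding operation is normally done with the unsupervised attribute filter Normalize on the Preprocess tab.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale a list of numbers linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column carries no spread, so map it to new_min.
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [(v - lo) * scale + new_min for v in values]


# Hypothetical attribute values, chosen only to make the arithmetic easy to check
column = [2.0, 4.0, 6.0, 10.0]
print(min_max_normalize(column))   # [0.0, 0.25, 0.5, 1.0]
```

Each attribute is normalized independently, using its own minimum and maximum, so the function would be called once per numeric column.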