
Datawarehouse Pract 2

Dwm

Uploaded by

Niraj Dandge
Copyright
© All Rights Reserved

Experiment: Installation of WEKA Tool

Aim: A. Investigate the application interfaces of the Weka tool.

Introduction:
Weka (pronounced to rhyme with Mecca) is a workbench that contains a collection of
visualization tools and algorithms for data analysis and predictive modeling, together with
graphical user interfaces for easy access to these functions. The original non-Java version of
Weka was a Tcl/Tk front-end to (mostly third-party) modeling algorithms implemented in other
programming languages, plus data preprocessing utilities in C, and a Makefile-based system for
running machine learning experiments. This original version was primarily designed as a tool for
analyzing data from agricultural domains, but the more recent fully Java-based version (Weka 3),
for which development started in 1997, is now used in many different application areas, in
particular for educational purposes and research.
Advantages of Weka include:
Free availability under the GNU General Public License.
Portability, since it is fully implemented in the Java programming language and thus runs on almost any modern computing platform.
A comprehensive collection of data preprocessing and modeling techniques.
Ease of use due to its graphical user interfaces.
Description:
Open the program. Once the program has been loaded on the user's machine, it is opened by
navigating to the program's start option, which depends on the user's operating system.
Figure 1.1 is an example of the initial opening screen on a computer.
There are four options available on this initial screen:
[Figure: Weka GUI Chooser window (Waikato Environment for Knowledge Analysis, Version 3.8.0, (c) 1999-2016, The University of Waikato, Hamilton, New Zealand) with the menus Program, Visualization, Tools, and Help and the application buttons Explorer, Experimenter, KnowledgeFlow, Workbench, and Simple CLI]
Fig: 1.1 Weka GUI


1. Explorer - the graphical interface used to conduct experimentation on raw data. After clicking
the Explorer button, the Weka Explorer interface appears.
[Figure: Weka Explorer, Preprocess tab before any data is loaded, with the buttons Open file, Open URL, Open DB, and Generate; Filter set to None; the Current relation and Selected attribute panels showing None; status line reading Welcome to the Weka Explorer]

Fig: 1.2 Pre-processor

Information Technology Page 2


[Figure: Weka Explorer, Select attributes tab, option dialog for the attribute evaluator CfsSubsetEval. About text: evaluates the worth of a subset of attributes by considering the individual predictive ability of each feature along with the degree of redundancy between them. Options shown: debug False, doNotCheckCapabilities False, locallyPredictive True, missingSeparate False, numThreads 1, poolSize.]

[Figure: Weka Explorer, Select attributes tab, with the attribute evaluator CfsSubsetEval -P 1 -E 1 and the option dialog for the search method weka.attributeSelection.BestFirst. About text: searches the space of attribute subsets by greedy hillclimbing augmented with a backtracking facility. Options shown: direction Forward, lookupCacheSize 1, searchTermination 5, startSet.]
Inside the Weka Explorer window there are six tabs:
1. Preprocess - used to choose the data file to be used by the application.
Open File - allows the user to select files residing on the local machine or recorded medium.
Open URL - provides a mechanism to locate a file or data source from a different location
specified by the user.
Open Database - allows the user to retrieve files or data from a database source provided by the user.
2. Classify - used to train and test different learning schemes on the preprocessed data file under
experimentation.

Fig: 1.3 Choosing ZeroR from Classify


Again, there are several options to be selected inside the Classify tab. Test options gives the user
the choice of four different test modes on the data set:
1. Use training set
2. Supplied test set
3. Cross-validation
4. Percentage split
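
The percentage split and cross-validation modes can be illustrated outside Weka. The following Python sketch is not Weka's implementation; it only shows how the two modes partition instance indices into training and test parts. The function names, the seed, the 66% split, and the 10 folds are assumptions chosen for illustration.

```python
import random


def percentage_split(n_instances, train_percent=66.0, seed=1):
    """Shuffle instance indices and split them into (train, test) lists."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    n_train = round(n_instances * train_percent / 100.0)
    return indices[:n_train], indices[n_train:]


def cross_validation_folds(n_instances, k=10, seed=1):
    """Yield (train, test) index lists; each instance is tested exactly once."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    train, test = percentage_split(150)
    print("percentage split:", len(train), "train,", len(test), "test")
    for fold_no, (train, test) in enumerate(cross_validation_folds(150), 1):
        print("fold", fold_no, ":", len(train), "train,", len(test), "test")
```

With "Use training set" the same instances are used for both training and testing, and with "Supplied test set" the test instances come from a separate file, so neither mode needs a partitioning step like the ones above.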

3. Cluster - used to apply different tools that identify clusters within the data file.
The Cluster tab opens the process that is used to identify commonalities or clusters of occurrences
within the data set and produce information for the user to analyze.

[Figure: Weka Explorer, Cluster tab, with the EM clusterer chosen; cluster mode options Use training set, Supplied test set, Percentage split, and Classes to clusters evaluation; Store clusters for visualization; Ignore attributes; Start button; Result list; Clusterer output]

4. Associate - used to apply different rules to the data file that identify associations within the
data. The Associate tab opens a window to select the options for
associations within the dataset.
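
Association rules are usually judged by their support and confidence. The Python sketch below is an illustration only, not Weka's associator code: it computes both measures for one rule over a tiny, made-up list of transactions. The function names and the basket data are assumptions introduced for this example.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of antecedent and consequent together, divided by the
    support of the antecedent alone (antecedent must occur at least once)."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


# Hypothetical market-basket data, made up for this example.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

if __name__ == "__main__":
    print("support(bread, milk):", support(baskets, {"bread", "milk"}))
    print("confidence(bread -> milk):", confidence(baskets, {"bread"}, {"milk"}))
```

An associator reports only the rules whose support and confidence reach the thresholds chosen in its options.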

[Figure: Weka Explorer, Associate tab, with the associator Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1 chosen; Start button; Result list; Associator output]
5. Select attributes - used to apply different rules to reveal changes based on the inclusion or
exclusion of selected attributes from the experiment.
6. Visualize - used to see what the various manipulations produced on the data set in a 2D format,
in scatter plot and bar graph output.

2. Experimenter - this option allows users to conduct different experimental variations on data
sets and perform statistical manipulation. The Weka Experiment Environment enables the user to
create, run, modify, and analyze experiments in a more convenient manner than is possible when
processing the schemes individually. For example, the user can create an experiment that runs
several schemes against a series of datasets and then analyze the results to determine if one of the
schemes is (statistically) better than the other schemes.
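
To give a feel for what "statistically better" means, the Python sketch below computes a plain paired t statistic from the per-run accuracies of two schemes. It is a simplified stand-in and is not claimed to be the test the Experimenter itself applies; the function name and the accuracy values are made up for illustration.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """t statistic of the per-run differences between two schemes.

    Assumes the differences are not all identical; otherwise the
    standard deviation is zero and the statistic is undefined.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both schemes need one score per run")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return mean_diff / (sd_diff / math.sqrt(len(diffs)))


if __name__ == "__main__":
    # Hypothetical accuracies (percent correct) over four runs.
    scheme_a = [90, 92, 91, 93]
    scheme_b = [88, 89, 90, 90]
    print("t =", paired_t_statistic(scheme_a, scheme_b))
```

A large absolute value of t, judged against the t distribution with n - 1 degrees of freedom, suggests that the difference between the two schemes is unlikely to be due to chance alone.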

[Figure: Weka Experiment Environment, Setup tab, with the sections Experiment Configuration Mode, Results Destination, Experiment Type, Iteration Control (Number of repetitions), Datasets (Use relative paths), and Algorithms]

Fig: 1.6 Weka experiment

Results destination: ARFF file, CSV file, JDBC database.
Experiment type: Cross-validation (default), Train/Test Percentage Split (data randomized).
Iteration control: Number of repetitions, Data sets first/Algorithms first.
Algorithms: filters

3. Knowledge Flow - basically the same functionality as Explorer, with drag and drop
functionality. The advantage of this option is that it supports incremental learning from previous
results.
4. Simple CLI - provides users without a graphic interface option the ability to execute
commands from a terminal window.
B. Explore the default datasets in the Weka tool.
Click the "Open file..." button to open a data set and double click on the "data" directory.
Weka provides a number of small common machine learning datasets that you can use to practice on.
Select the "iris.arff" file to load the Iris dataset.
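
The bundled datasets are plain-text ARFF files: a header declares the relation and its attributes, and the data rows follow. As a rough illustration of that layout, the Python sketch below reads a tiny hand-typed sample. It is a simplified reader written for this example, not Weka's own loader; the function name and the sample text are assumptions, and quoting, sparse rows, and date attributes are not handled.

```python
def parse_arff(text):
    """Parse a simple dense ARFF document into (relation, attributes, rows).

    Simplified sketch: ignores quoting, sparse rows, and date attributes.
    """
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # blank line or comment
            continue
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
            continue
        keyword = line.split(None, 1)[0].lower()  # keywords are case-insensitive
        if keyword == "@relation":
            relation = line.split(None, 1)[1]
        elif keyword == "@attribute":
            _, name, attr_type = line.split(None, 2)
            attributes.append((name, attr_type))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, rows


# A few rows in the style of the Iris data, typed in by hand for this example.
SAMPLE = """\
% tiny ARFF sample
@relation iris
@attribute sepallength numeric
@attribute petallength numeric
@attribute class {Iris-setosa,Iris-versicolor}
@data
5.1,1.4,Iris-setosa
7.0,4.7,Iris-versicolor
"""

if __name__ == "__main__":
    relation, attributes, rows = parse_arff(SAMPLE)
    print(relation)
    for name, attr_type in attributes:
        print(name, attr_type)
    print(len(rows), "instances")
```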

[Figure: file chooser showing the data directory under Program Files, Weka-3-8 (25 items), listing ARFF data files including airline.arff, breast-cancer.arff, contact-lenses.arff, cpu.arff, cpu.with.vendor.arff, credit-g.arff, diabetes.arff, glass.arff, hypothyroid.arff, ionosphere.arff, iris.2D.arff, labor.arff, ReutersCorn-test.arff, ReutersCorn-train.arff, ReutersGrain-test.arff, ReutersGrain-train.arff, segment-challenge.arff, segment-test.arff, soybean.arff, and supermarket.arff]

Fig: 1.7 Different Data Sets in Weka


Exercise:
1. Normalize the data using min-max normalization.
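
As a starting point for the exercise, min-max normalization rescales each value x of an attribute to x' = (x - min) / (max - min), so that the smallest value maps to 0 and the largest to 1. The Python sketch below applies this to one attribute. The function name, the optional target range, and the sample values are assumptions made for illustration, and the sketch does not show how Weka performs the step.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so min maps to new_min and max to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant attribute: avoid division by zero
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [(v - lo) * scale + new_min for v in values]


if __name__ == "__main__":
    # Hypothetical attribute values, made up for this example.
    print(min_max_normalize([10, 20, 30, 50]))
```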
