DMW 05
PRACTICAL - 1
AIM : Explore the machine learning tool “WEKA”.
The foundation of any machine learning application is data: not just a little data, but a very
large volume of it, which is commonly called Big Data.
To train a machine to analyze big data, you need to take several considerations about the data
into account.
WEKA is open-source software that provides tools for data preprocessing, implementations of
several machine learning algorithms, and visualization tools, so that you can develop
machine learning techniques and apply them to real-world data mining problems.
When you start WEKA, the GUI Chooser application opens.
The GUI Chooser allows you to run five different types of applications, as listed here:
Explorer
Experimenter
KnowledgeFlow
Workbench
Simple CLI
When you click on the Explorer button in the Applications selector, the Explorer window opens
with the following tabs:
Preprocess
Classify
Cluster
Associate
Select Attributes
Visualize
Preprocess Tab
When you first open the Explorer, only the Preprocess tab is enabled, because the first step in
machine learning is to preprocess the data. In the Preprocess tab, you select the data file,
process it, and make it fit for applying the various machine learning algorithms.
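To make the idea of "making the data fit" concrete, here is a minimal sketch of one common
preprocessing step: filling in missing numeric values with the column mean. It is plain Python,
not WEKA code; the function name fill_missing_with_mean and the toy data are assumptions made
for illustration only.

```python
from statistics import mean


def fill_missing_with_mean(rows, missing=None):
    """Replace missing numeric values in each column with that column's mean."""
    columns = list(zip(*rows))  # transpose rows into columns
    filled_columns = []
    for column in columns:
        present = [v for v in column if v is not missing]
        # Assumption: fall back to 0.0 if a column has no observed values at all.
        column_mean = mean(present) if present else 0.0
        filled_columns.append(
            [column_mean if v is missing else v for v in column]
        )
    # Transpose back into rows.
    return [list(row) for row in zip(*filled_columns)]


# Hypothetical toy data: each row is (temperature, humidity); None marks a missing value.
data = [
    [20.0, 60.0],
    [None, 80.0],
    [30.0, None],
]
cleaned = fill_missing_with_mean(data)
print(cleaned)
```

In the WEKA Explorer you would normally do this kind of cleaning by choosing a filter in the
Preprocess tab rather than by writing code.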
Classify Tab
The Classify tab provides several machine learning algorithms for the classification of
your data. To list a few, you may apply algorithms such as Linear Regression, Logistic
Regression, Support Vector Machines, Decision Trees, RandomTree, RandomForest,
NaiveBayes, and so on. The list is extensive. These are supervised learning algorithms;
unsupervised methods such as clustering are found under the Cluster tab.
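As a rough illustration of what the simplest algorithm in that list does, the sketch below fits
a straight line to a few points by ordinary least squares and uses it to predict a new value.
It is plain Python written for this practical, not WEKA's implementation; the function name
fit_line and the sample points are assumptions.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (one input variable)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data that lies exactly on the line y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
slope, intercept = fit_line(xs, ys)

# Predict the target for an unseen input.
prediction = slope * 4.0 + intercept
print(slope, intercept, prediction)
```

In the Explorer the same workflow is point-and-click: choose an algorithm in the Classify tab,
pick a test option, and start the run.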
Cluster Tab
The Cluster tab provides several clustering algorithms, such as SimpleKMeans,
FilteredClusterer, and HierarchicalClusterer.
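The idea behind k-means clustering can be sketched in a few lines of plain Python. This is a
simplified illustration, not the code behind WEKA's SimpleKMeans: the function names, the fixed
starting centroids, and the sample points are all assumptions chosen to keep the example small
and repeatable.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, centroids, iterations=10):
    """Basic k-means: alternate assignment and centroid update steps."""
    centroids = [tuple(c) for c in centroids]
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: squared_distance(p, centroids[i]),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D points forming two obvious groups.
points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
centroids, clusters = k_means(points, centroids=[points[0], points[2]])
print(centroids)
print(clusters)
```

Real implementations usually pick the starting centroids randomly and offer other distance
functions; this sketch fixes them so the result is the same on every run.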
Associate Tab
Under the Associate tab, you will find Apriori, FilteredAssociator, and FPGrowth.
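Association algorithms such as Apriori start by finding itemsets that occur together often
enough. The sketch below shows that first stage in plain Python, under simplifying assumptions:
support is an absolute count, rule generation is left out, and the function name
frequent_itemsets and the shopping-basket data are made up for illustration. It is not WEKA's
implementation.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({item for t in transactions for item in t})
    # Level 1 candidates: every single item.
    candidates = [frozenset([i]) for i in items]
    frequent = {}
    size = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Build the next level from items that are still frequent.
        size += 1
        surviving_items = sorted({i for c in level for i in c})
        candidates = [
            frozenset(combo)
            for combo in combinations(surviving_items, size)
            # Apriori pruning: every (size - 1)-subset must itself be frequent.
            if all(
                frozenset(sub) in level
                for sub in combinations(combo, size - 1)
            )
        ]
    return frequent


# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
result = frequent_itemsets(baskets, min_support=2)
for itemset, count in result.items():
    print(sorted(itemset), count)
```

The pruning step is the key idea: an itemset can only be frequent if all of its smaller
subsets are frequent, so most candidates never need to be counted.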
Select Attributes Tab
The Select Attributes tab lets you perform feature selection using several algorithms, such as
ClassifierSubsetEval, PrincipalComponents, etc.
Visualize Tab
Lastly, the Visualize option allows you to visualize your processed data for analysis.