0% found this document useful (0 votes)

17 views88 pages

Unit-7 Tools of AI (April 9, 2024)

Uploaded by

Arin Daniel

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

17 views88 pages

Unit-7 Tools of AI (April 9, 2024)

Uploaded by

Arin Daniel

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 88

Study material of Unit-7

Syllabus – Tools for Machine learning: Introduction to Weka and KNIME

tools. Decision tree introduction of real-world data using the tools. Clustering
of data using tools. Learning through neural network using the tools.

1
Machine Learning Tools

Machine learning is one of the most revolutionary technologies that is making lives
simpler. It is a subfield of Artificial Intelligence, which analyses the data, build
the model, and make predictions. Due to its popularity and great applications,
every tech enthusiast wants to learn and build new machine learning Apps.
However, to build ML models, it is important to master machine learning tools.
Mastering machine learning tools will enable you to play with the data, train your
models, discover new methods, and create algorithms.

There are different tools, software, and platform available for machine learning,
and also new software and tools are evolving day by day. Although there are many
options and availability of Machine learning tools, choosing the best tool per your
model is a challenging task. If you choose the right tool for your model, you can
make it faster and more efficient and choosing the tool completely depends on the
requirement for one's project, skills, and price to the tool. Most of these tools are
freely available, except for some tools such as Rapid Miner. Each tool works in a
different language and provides some specifications. There are some popular and
commonly used Machine learning tools are as below :

1.Weka
2. KNIME
3.TensorFlow
4. PyTorch
5. Google Cloud ML Engine
6. Amazon Machine Learning (AML)
7. Accord.NET
8. Apache Mahout
9. Shogun
10. Oryx2
11. Apache Spark MLlib
12. Google ML kit for Mobile

2
Weka –
(Machine Learning Tool/Clustering Tool/Decision Tree Tool)

Weka is a popular open-source machine learning tool that provides a collection of

algorithms for data preprocessing, classification, regression, clustering, and
visualization. It is widely used in academic and industrial settings and supports a
variety of file formats.

WEKA - An open source software provides tools for data preprocessing,

implementation of several Machine Learning algorithms, and visualization tools
so that you can develop machine learning techniques and apply them to real-world
data mining problems. What WEKA offers is summarized in the following
diagram −

If you observe the beginning of the flow of the image, you will understand that
there are many stages in dealing with Big Data to make it suitable for machine
learning −
3
First, you will start with the raw data collected from the field. This data may
contain several null values and irrelevant fields. You use the data preprocessing
tools provided in WEKA to cleanse the data.
Then, you would save the preprocessed data in your local storage for applying ML
algorithms.
Next, depending on the kind of ML model that you are trying to develop you would
select one of the options such as Classify, Cluster, or Associate. The Attributes
Selection allows the automatic selection of features to create a reduced dataset.
Note that under each category, WEKA provides the implementation of several
algorithms. You would select an algorithm of your choice, set the desired
parameters and run it on the dataset.
Then, WEKA would give you the statistical output of the model processing. It
provides you a visualization tool to inspect the data.
The various models can be applied on the same dataset. You can then compare the
outputs of different models and select the best that meets your purpose.
Thus, the use of WEKA results in a quicker development of machine learning
models on the whole.
Now that we have seen what WEKA is and what it does, in the next chapter let us
learn how to install WEKA on your local computer.
Some of the key features of Weka are:

• User-friendly interface: Weka provides a graphical user interface that

allows users to easily explore and analyze data.
• Extensibility: Weka allows users to develop and integrate their own
algorithms and extensions into the tool.
• Comprehensive documentation and support: Weka has a large
community of users and developers, and provides extensive
documentation, tutorials, and forums for support.
These machine learning algorithms help in data mining.

4
Features:
• Data preparation
• Classification
• Regression
• Clustering
• Visualization and
• Association rules mining.
Pros:
• Provides online courses for training.
• Easy to understand algorithms.
• It is good for students as well.
Cons:
• Not much documentation and online support are available.

Weka - Installation
To install WEKA on your machine, visit WEKA’s official website and download
the installation file. WEKA supports installation on Windows, Mac OS X and
Linux. You just need to follow the instructions on this page to install WEKA for
your OS.
The steps for installing on Mac are as follows −
• Download the Mac installation file.
• Double click on the downloaded weka-3-8-3-corretto-jvm.dmg file.
You will see the following screen on successful installation.

• Click on the weak-3-8-3-corretto-jvm icon to start Weka.

• Optionally you may start it from the command line −

java -jar weka.jar

5
The WEKA GUI Chooser application will start and you would see the following
screen −

The GUI Chooser application allows you to run five different types of applications
as listed here −
• Explorer
• Experimenter
• KnowledgeFlow
• Workbench
• Simple CLI

Weka - Launching Explorer

When you click on the Explorer button in the Applications selector, it opens the
following screen −

6
On the top, you will see several tabs as listed here −
• Preprocess
• Classify
• Cluster
• Associate
• Select Attributes
• Visualize
Under these tabs, there are several pre-implemented machine learning algorithms.
Let us look into each of them in detail now.

Preprocess Tab

Initially as you open the explorer, only the Preprocess tab is enabled. The first
step in machine learning is to preprocess the data. Thus, in the Preprocess option,
you will select the data file, process it and make it fit for applying the various
machine learning algorithms.

Classify Tab

The Classify tab provides you several machine learning algorithms for the
classification of your data. To list a few, you may apply algorithms such as Linear
Regression, Logistic Regression, Support Vector Machines, Decision Trees,
RandomTree, RandomForest, NaiveBayes, and so on. The list is very exhaustive
and provides both supervised and unsupervised machine learning algorithms.

Cluster Tab

Under the Cluster tab, there are several clustering algorithms provided - such as
SimpleKMeans, FilteredClusterer, HierarchicalClusterer, and so on.

Associate Tab

Under the Associate tab, you would find Apriori, FilteredAssociator and
FPGrowth.

Select Attributes Tab

7
Select Attributes allows you feature selections based on several algorithms such
as ClassifierSubsetEval, PrinicipalComponents, etc.

Visualize Tab

Lastly, the Visualize option allows you to visualize your processed data for
analysis.
As you noticed, WEKA provides several ready-to-use algorithms for testing and
building your machine learning applications. To use WEKA effectively, you must
have a sound knowledge of these algorithms, how they work, which one to choose
under what circumstances, what to look for in their processed output, and so on. In
short, you must have a solid foundation in machine learning to use WEKA
effectively in building your apps.
Weka - Loading Data
We start with the first tab that you use to preprocess the data. This is common to
all algorithms that you would apply to your data for building the model and is a
common step for all subsequent operations in WEKA.
For a machine learning algorithm to give acceptable accuracy, it is important that
you must cleanse your data first. This is because the raw data collected from the
field may contain null values, irrelevant columns and so on.
In this chapter, you will learn how to preprocess the raw data and create a clean,
meaningful dataset for further use.
First, you will learn to load the data file into the WEKA explorer. The data can be
loaded from the following sources −
• Local file system
• Web
• Database
Here, we will see all the three options of loading data in detail.

Loading Data from Local File System

Just under the Machine Learning tabs that you studied in the previous lesson, you
would find the following three buttons −

8
• Open file ...
• Open URL ...
• Open DB ...
Click on the Open file ... button. A directory navigator window opens as shown in
the following screen −

Now, navigate to the folder where your data files are stored. WEKA installation
comes up with many sample databases for you to experiment. These are available
in the data folder of the WEKA installation.
For learning purpose, select any data file from this folder. The contents of the file
would be loaded in the WEKA environment. We will very soon learn how to
inspect and process this loaded data. Before that, let us look at how to load the data
file from the Web.

Loading Data from Web

Once you click on the Open URL ... button, you can see a window as follows −

9
We will open the file from a public URL Type the following URL in the popup
box −
https://storm.cis.fordham.edu/~gweiss/data-mining/weka-
data/weather.nominal.arff
You may specify any other URL where your data is stored. The Explorer will load
the data from the remote site into its environment.

Loading Data from DB

Once you click on the Open DB ... button, you can see a window as follows −

10
Set the connection string to your database, set up the query for data selection,
process the query and load the selected records in WEKA.
Weka - File Formats
WEKA supports a large number of file formats for the data. Here is the complete
list −
• arff
• arff.gz
• bsi
• csv
• dat
• data
• json
• json.gz
• libsvm
• m
• names
• xrff
• xrff.gz
The types of files that it supports are listed in the drop-down list box at the bottom
of the screen. This is shown in the screenshot given below.

11
As you would notice it supports several formats including CSV and JSON. The
default file type is Arff.

Arff Format

An Arff file contains two sections - header and data.

• The header describes the attribute types.
• The data section contains a comma separated list of data.
As an example for Arff format, the Weather data file loaded from the WEKA
sample databases is shown below −

From the screenshot, you can infer the following points −

12
• The @relation tag defines the name of the database.
• The @attribute tag defines the attributes.

• The @data tag starts the list of data rows each containing the comma

separated fields.
• The attributes can take nominal values as in the case of outlook shown

here −
@attribute outlook (sunny, overcast, rainy)
• The attributes can take real values as in this case −

@attribute temperature real

• You can also set a Target or a Class variable called play as shown here

−
@attribute play (yes, no)
• The Target assumes two nominal values yes or no.

Other Formats

The Explorer can load the data in any of the earlier mentioned formats. As arff is
the preferred format in WEKA, you may load the data from any format and save
it to arff format for later use. After preprocessing the data, just save it to arff format
for further analysis.
Now that you have learned how to load data into WEKA, in the next chapter, you
will learn how to preprocess the data.
Weka - Preprocessing the Data
The data that is collected from the field contains many unwanted things that leads
to wrong analysis. For example, the data may contain null fields, it may contain
columns that are irrelevant to the current analysis, and so on. Thus, the data must
be preprocessed to meet the requirements of the type of analysis you are seeking.
This is the done in the preprocessing module.
To demonstrate the available features in preprocessing, we will use
the Weather database that is provided in the installation.
Using the Open file ... option under the Preprocess tag select the weather-
nominal.arff file.

13
When you open the file, your screen looks like as shown here −

This screen tells us several things about the loaded data, which are discussed
further in this chapter.

Understanding Data
14
Let us first look at the highlighted Current relation sub window. It shows the
name of the database that is currently loaded. You can infer two points from this
sub window −
• There are 14 instances - the number of rows in the table.
• The table contains 5 attributes - the fields, which are discussed in the
upcoming sections.
On the left side, notice the Attributes sub window that displays the various fields
in the database.

The weather database contains five fields - outlook, temperature, humidity, windy
and play. When you select an attribute from this list by clicking on it, further details
on the attribute itself are displayed on the right hand side.
Let us select the temperature attribute first. When you click on it, you would see
the following screen −

15
In the Selected Attribute subwindow, you can observe the following −
• The name and the type of the attribute are displayed.
• The type for the temperature attribute is Nominal.
• The number of Missing values is zero.
• There are three distinct values with no unique value.
• The table underneath this information shows the nominal values for this
field as hot, mild and cold.
• It also shows the count and weight in terms of a percentage for each
nominal value.
At the bottom of the window, you see the visual representation of the class values.
If you click on the Visualize All button, you will be able to see all features in one
single window as shown here −

16
Removing Attributes

Many a time, the data that you want to use for model building comes with many
irrelevant fields. For example, the customer database may contain his mobile
number which is relevant in analysing his credit rating.

To remove Attribute/s select them and click on the Remove button at the bottom.
The selected attributes would be removed from the database. After you fully
preprocess the data, you can save it for model building.
Next, you will learn to preprocess the data by applying filters on this data.

17
Applying Filters

Some of the machine learning techniques such as association rule mining requires
categorical data. To illustrate the use of filters, we will use weather-
numeric.arff database that contains two numeric attributes
- temperature and humidity.
We will convert these to nominal by applying a filter on our raw data. Click on
the Choose button in the Filter subwindow and select the following filter −
weka→filters→supervised→attribute→Discretize

Click on the Apply button and examine

the temperature and/or humidity attribute. You will notice that these have
changed from numeric to nominal types.

18
Let us look into another filter now. Suppose you want to select the best attributes
for deciding the play. Select and apply the following filter −
weka→filters→supervised→attribute→AttributeSelection
You will notice that it removes the temperature and humidity attributes from the
database.

After you are satisfied with the preprocessing of your data, save the data by
clicking the Save ... button. You will use this saved file for model building.
In the next chapter, we will explore the model building using several predefined
ML algorithms.
Weka - Classifiers
Many machine learning applications are classification related. For example, you
may like to classify a tumor as malignant or benign. You may like to decide
whether to play an outside game depending on the weather conditions. Generally,
this decision is dependent on several features/conditions of the weather. So you
may prefer to use a tree classifier to make your decision of whether to play or not.
In this chapter, we will learn how to build such a tree classifier on weather data to
decide on the playing conditions.

Setting Test Data

We will use the preprocessed weather data file from the previous lesson. Open the
saved file by using the Open file ... option under the Preprocess tab, click on
the Classify tab, and you would see the following screen −

19
Before you learn about the available classifiers, let us examine the Test options.
You will notice four testing options as listed below −
• Training set
• Supplied test set
• Cross-validation
• Percentage split
Unless you have your own training set or a client supplied test set, you would use
cross-validation or percentage split options. Under cross-validation, you can set
the number of folds in which entire data would be split and used during each
iteration of training. In the percentage split, you will split the data between training
and testing using the set split percentage.
Now, keep the default play option for the output class −

20
Next, you will select the classifier.

Selecting Classifier

Click on the Choose button and select the following classifier −

weka→classifiers>trees>J48
This is shown in the screenshot below −

21
Click on the Start button to start the classification process. After a while, the
classification results would be presented on your screen as shown here −

22
Let us examine the output shown on the right hand side of the screen.
It says the size of the tree is 6. You will very shortly see the visual representation
of the tree. In the Summary, it says that the correctly classified instances as 2 and
the incorrectly classified instances as 3, It also says that the Relative absolute error
is 110%. It also shows the Confusion Matrix. Going into the analysis of these
results is beyond the scope of this tutorial. However, you can easily make out from
these results that the classification is not acceptable and you will need more data
for analysis, to refine your features selection, rebuild the model and so on until you
are satisfied with the model’s accuracy. Anyway, that’s what WEKA is all about.
It allows you to test your ideas quickly.

Visualize Results

To see the visual representation of the results, right click on the result in the Result
list box. Several options would pop up on the screen as shown here −

Select Visualize tree to get a visual representation of the traversal tree as seen in
the screenshot below −

23
Selecting Visualize classifier errors would plot the results of classification as
shown here −

A cross represents a correctly classified instance while squares represents

incorrectly classified instances. At the lower left corner of the plot you see
a cross that indicates if outlook is sunny then play the game. So this is a correctly
24
classified instance. To locate instances, you can introduce some jitter in it by
sliding the jitter slide bar.
The current plot is outlook versus play. These are indicated by the two drop down
list boxes at the top of the screen.

Now, try a different selection in each of these boxes and notice how the X & Y
axes change. The same can be achieved by using the horizontal strips on the right
hand side of the plot. Each strip represents an attribute. Left click on the strip sets
the selected attribute on the X-axis while a right click would set it on the Y-axis.
There are several other plots provided for your deeper analysis. Use them
judiciously to fine tune your model. One such plot of Cost/Benefit analysis is
shown below for your quick reference.

25
Explaining the analysis in these charts is beyond the scope of this tutorial. The
reader is encouraged to brush up their knowledge of analysis of machine learning
algorithms.
In the next chapter, we will learn the next set of machine learning algorithms, that
is clustering.

Weka - Clustering
A clustering algorithm finds groups of similar instances in the entire dataset.
WEKA supports several clustering algorithms such as EM, FilteredClusterer,
HierarchicalClusterer, SimpleKMeans and so on. You should understand these
algorithms completely to fully exploit the WEKA capabilities.
As in the case of classification, WEKA allows you to visualize the detected clusters
graphically. To demonstrate the clustering, we will use the provided iris database.
The data set contains three classes of 50 instances each. Each class refers to a type
of iris plant.

Loading Data

26
In the WEKA explorer select the Preprocess tab. Click on the Open file ... option
and select the iris.arff file in the file selection dialog. When you load the data, the
screen looks like as shown below −

You can observe that there are 150 instances and 5 attributes. The names of
attributes are listed
as sepallength, sepalwidth, petallength, petalwidth and class. The first four
attributes are of numeric type while the class is a nominal type with 3 distinct
values. Examine each attribute to understand the features of the database. We will
not do any preprocessing on this data and straight-away proceed to model building.

Clustering

Click on the Cluster TAB to apply the clustering algorithms to our loaded data.
Click on the Choose button. You will see the following screen −

27
Now, select EM as the clustering algorithm. In the Cluster mode sub window,
select the Classes to clusters evaluation option as shown in the screenshot below
−

Click on the Start button to process the data. After a while, the results will be
presented on the screen.

28
Next, let us study the results.

Examining Output

The output of the data processing is shown in the screen below −

From the output screen, you can observe that −

• There are 5 clustered instances detected in the database.
• The Cluster 0 represents setosa, Cluster 1 represents
virginica, Cluster 2 represents versicolor, while the last two clusters do
not have any class associated with them.
If you scroll up the output window, you will also see some statistics that gives the
mean and standard deviation for each of the attributes in the various detected
clusters. This is shown in the screenshot given below −

29
Next, we will look at the visual representation of the clusters.

Visualizing Clusters

To visualize the clusters, right click on the EM result in the Result list. You will
see the following options −

30
Select Visualize cluster assignments. You will see the following output −

31
As in the case of classification, you will notice the distinction between the correctly
and incorrectly identified instances. You can play around by changing the X and
Y axes to analyze the results. You may use jittering as in the case of classification
to find out the concentration of correctly identified instances. The operations in
visualization plot are similar to the one you studied in the case of classification.

Applying Hierarchical Clusterer

To demonstrate the power of WEKA, let us now look into an application of another
clustering algorithm. In the WEKA explorer, select the HierarchicalClusterer as
your ML algorithm as shown in the screenshot shown below −

Choose the Cluster mode selection to Classes to cluster evaluation, and click on
the Start button. You will see the following output −

32
Notice that in the Result list, there are two results listed: the first one is the EM
result and the second one is the current Hierarchical. Likewise, you can apply
multiple ML algorithms to the same dataset and quickly compare their results.
If you examine the tree produced by this algorithm, you will see the following
output −

33
Here, you will study the Associate type of ML algorithms.
Weka - Feature Selection
When a database contains a large number of attributes, there will be several
attributes which do not become significant in the analysis that you are currently
seeking. Thus, removing the unwanted attributes from the dataset becomes an
important task in developing a good machine learning model.
You may examine the entire dataset visually and decide on the irrelevant attributes.
This could be a huge task for databases containing a large number of attributes like
the supermarket case that you saw in an earlier lesson. Fortunately, WEKA
provides an automated tool for feature selection.
This chapter demonstrate this feature on a database containing a large number of
attributes.

Loading Data

In the Preprocess tag of the WEKA explorer, select the labor.arff file for loading
into the system. When you load the data, you will see the following screen −

34
Notice that there are 17 attributes. Our task is to create a reduced dataset by
eliminating some of the attributes which are irrelevant to our analysis.

Features Extraction

Click on the Select attributesTAB.You will see the following screen −

Under the Attribute Evaluator and Search Method, you will find several
options. We will just use the defaults here. In the Attribute Selection Mode, use
full training set option.
Click on the Start button to process the dataset. You will see the following output
−

35
At the bottom of the result window, you will get the list of Selected attributes. To
get the visual representation, right click on the result in the Result list.
The output is shown in the following screenshot −

36
Clicking on any of the squares will give you the data plot for your further analysis.
A typical data plot is shown below −

This is similar to the ones we have seen in the earlier chapters. Play around with
the different options available to analyze the results.
You have seen so far the power of WEKA in quickly developing machine learning
models. What we used is a graphical tool called Explorer for developing these
models. WEKA also provides a command line interface that gives you more power
than provided in the explorer.
Clicking the Simple CLI button in the GUI Chooser application starts this
command line interface which is shown in the screenshot below −

37
Type your commands in the input box at the bottom. You will be able to do all that
you have done so far in the explorer plus much more. Refer to
WEKA documentation (https://www.cs.waikato.ac.nz/ml/weka/documentation.ht
ml) for further details.
Lastly, WEKA is developed in Java and provides an interface to its API. So if you
are a Java developer and keen to include WEKA ML implementations in your own
Java projects, you can do so easily.

38
KNIME
(Machine Learning Tool/Clustering Tool/Decision Tree Tool)

KNIME is a tool for data analytics, reporting and integration platform. Using the
data pipelining concept, it combines different components for machine learning
and data mining.

Features:
• It can integrate the code of programming languages like C, C++, R,
Python, Java, JavaScript etc.
• It can be used for business intelligence, financial data analysis, and
CRM.
Pros:
• It can work as a SAS alternative.
• It is easy to deploy and install.
• Easy to learn.
Cons:
• Difficult to build complicated models.
• Limited visualization and exporting capabilities.

KNIME - Introduction
Developing Machine Learning models is always considered very challenging due
to its cryptic nature. Generally, to develop machine learning applications, you must
be a good developer with an expertise in command-driven development. The
introduction of KNIME has brought the development of Machine Learning models
in the purview of a common man.
KNIME provides a graphical interface (a user friendly GUI) for the entire
development. In KNIME, you simply have to define the workflow between the
various predefined nodes provided in its repository. KNIME provides several
predefined components called nodes for various tasks such as reading data,

39
applying various ML algorithms, and visualizing data in various formats. Thus, for
working with KNIME, no programming knowledge is required. Isn’t this exciting?
The upcoming chapters of this tutorial will teach you how to master the data
analytics using several well-tested ML algorithms.

KNIME - Installation
KNIME Analytics Platform is available for Windows, Linux and MacOS. In this
chapter, let us look into the steps for installing the platform on the Mac. If you use
Windows or Linux, just follow the installation instructions given on the KNIME
download page. The binary installation for all three platforms is available
at KNIME’s page.

Mac Installation

Download the binary installation from the KNIME official site. Double click on
the downloaded dmg file to start the installation. When the installation completes,
just drag the KNIME icon to the Applications folder as seen here −

KNIME - First Run

Double-click the KNIME icon to start the KNIME Analytics Platform. Initially,
you will be asked to setup a workspace folder for saving your work. Your screen
will look like the following −

40
You may set the selected folder as default and the next time you launch KNIME,
it will not

show up this dialog again.

After a while, the KNIME platform will start on your desktop. This is the
workbench where you would carry your analytics work. Let us now look at the
various portions of the workbench.
KNIME - Workbench
When KNIME starts, you will see the following screen −

41
As has been marked in the screenshot, the workbench consists of several views.
The views which are of immediate use to us are marked in the screenshot and listed
below −
• Workspace
• Outline
• Nodes Repository
• KNIME Explorer
• Console
• Description
As we move ahead in this chapter, let us learn these views each in detail.

Workspace View

The most important view for us is the Workspace view. This is where you would
create your machine learning model. The workspace view is highlighted in the
screenshot below −

The screenshot shows an opened workspace. You will soon learn how to open an
existing workspace.

42
Each workspace contains one or more nodes. You will learn the significance of
these nodes later in the tutorial. The nodes are connected using arrows. Generally,
the program flow is defined from left to right, though this is not required. You may
freely move each node anywhere in the workspace. The connecting lines between
the two would move appropriately to maintain the connection between the nodes.
You may add/remove connections between nodes at any time. For each node a
small description may be optionally added.

Outline View

The workspace view may not be able to show you the entire workflow at a time.
That is the reason, the outline view is provided.

The outline view shows a miniature view of the entire workspace. There is a zoom
window inside this view that you can slide to see the different portions of the
workflow in the Workspace view.

Node Repository

This is the next important view in the workbench. The Node repository lists the
various nodes available for your analytics. The entire repository is nicely
categorized based on the node functions. You will find categories such as −
43
• IO
• Views
• Analytics

Under each category you would find several options. Just expand each category
view to see what you have there. Under the IO category, you will find nodes to
read your data in various file formats, such as ARFF, CSV, PMML, XLS, etc.

Depending on your input source data format, you will select the appropriate node
for reading your dataset.
By this time, probably you have understood the purpose of a node. A node defines
a certain kind of functionality that you can visually include in your workflow.
44
The Analytics node defines the various machine learning algorithms, such as
Bayes, Clustering, Decision Tree, Ensemble Learning, and so on.

The implementation of these various ML algorithms is provided in these nodes.

To apply any algorithm in your analytics, simply pick up the desired node from
the repository and add it to your workspace. Connect the output of the Data reader
node to the input of this ML node and your workflow is created.
We suggest you to explore the various nodes available in the repository.
KNIME Explorer
The next important view in the workbench is the Explorer view as shown in the
screenshot below −

The first two categories list the workspaces defined on the KNIME server. The
third option LOCAL is used for storing all the workspaces that you create on your
local machine. Try expanding these tabs to see the various predefined workspaces.
Especially, expand EXAMPLES tab.

45
KNIME provides several examples to get you started with the platform. In the next
chapter, you will be using one of these examples to get yourself acquainted with
the platform.

Console View

As the name indicates, the Console view provides a view of the various console
messages while executing your workflow.

The Console view is useful in diagnosing the workflow and examining the
analytics results.

Description View

The last important view that is of immediate relevance to us is

the Description view. This view provides a description of a selected item in the
workspace. A typical view is shown in the screenshot below −
46
The above view shows the description of a File Reader node. When you select
the File Reader node in your workspace, you will see its description in this view.
Clicking on any other node shows the description of the selected node. Thus, this
view becomes very useful in the initial stages of learning when you do not
precisely know the purpose of the various nodes in the workspace and/or the nodes
repository.

Toolbar

Besides the above described views, the workbench has other views such as toolbar.
The toolbar contains various icons that facilitate a quick action. The icons are
enabled/disabled depending on the context. You can see the action that each icon
performs by hovering mouse on it. The following screen shows the action taken
by Configure icon.

47
Enabling/Disabling Views
The various views that you have seen so far can be turned on/off easily. Clicking
the Close icon in the view will close the view. To reinstate the view, go to
the View menu option and select the desired view. The selected view will be added
to the workbench.

Now, as you have been acquainted with the workbench, I will show you how to
run a workflow and study the analytics performed by it.
KNIME - Running Your First Workflow
KNIME has provided several good workflows for ease of learning. In this chapter,
we shall pick up one of the workflows provided in the installation to explain the
various features and the power of analytics platform. We will use a simple
classifier based on a Decision Tree for our study.

Loading Decision Tree Classifier

In the KNIME Explorer locate the following workflow −

LOCAL / Example Workflows / Basic Examples / Building a Simple Classifier
This is also shown in the screenshot below for your quick reference −

48
Double click on the selected item to open the workflow. Observe the Workspace
view. You will see the workflow containing several nodes. The purpose of this
workflow is to predict the income group from the democratic attributes of the adult
data set taken from UCI Machine Learning Repository. The task of this ML model
is to classify the people in a specific region as having income greater or lesser than
50K.
The Workspace view along with its outline is shown in the screenshot below −

49
Notice the presence of several nodes picked up from the Nodes repository and
connected in a workflow by arrows. The connection indicates that the output of
one node is fed to the input of the next node. Before we learn the functionality of
each of the nodes in the workflow, let us first execute the entire workflow.

Executing Workflow

Before we look into the execution of the workflow, it is important to understand

the status report of each node. Examine any node in the workflow. At the bottom
of each node you would find a status indicator containing three circles. The
Decision Tree Learner node is shown in the screenshot below −

The status indicator is red indicating that this node has not been executed so far.
During the execution, the center circle which is yellow in color would light up. On
successful execution, the last circle turns green. There are more indicators to give
you the status information in case of errors. You will learn them when an error
occurs in the processing.
Note that currently the indicators on all nodes are red indicating that no node is
executed so far. To run all nodes, click on the following menu item −
Node → Execute All

50
After a while, you will find that each node status indicator has now turned green
indicating that there are no errors.
In the next chapter, we will explore the functionality of the various nodes in the
workflow.
KNIME - Exploring Workflow
If you check out the nodes in the workflow, you can see that it contains the
following −
• File Reader,
• Color Manager
• Partitioning
• Decision Tree Learner
• Decision Tree Predictor
• Score
• Interactive Table
• Scatter Plot
• Statistics
These are easily seen in the Outline view as shown here −

Each node provides a specific functionality in the workflow. We will now look
into how to configure these nodes to meet up the desired functionality. Please note
that we will discuss only those nodes that are relevant to us in the current context
of exploring the workflow.

File Reader

The File Reader node is depicted in the screenshot below −

51
There is some description at the top of the window that is provided by the creator
of the workflow. It tells that this node reads the adult data set. The name of the file
is adult.csv as seen from the description underneath the node symbol. The File
Reader has two outputs - one goes to Color Manager node and the other one goes
to Statistics node.
If you right click the File Manager, a popup menu would show up as follows −

52
The Configure menu option allows for the node configuration.
The Execute menu runs the node. Note that if the node has already been run and
if it is in a green state, this menu is disabled. Also, note the presence of Edit Note
Description menu option. This allows you to write the description for your node.
Now, select the Configure menu option, it shows the screen containing the data
from the adult.csv file as seen in the screenshot here −

When you execute this node, the data will be loaded in the memory. The entire
data loading program code is hidden from the user. You can now appreciate the
usefulness of such nodes - no coding required.
Our next node is the Color Manager.

Color Manager

Select the Color Manager node and go into its configuration by right clicking on
it. A colors settings dialog would appear. Select the income column from the
dropdown list.
Your screen would look like the following −

53
Notice the presence of two constraints. If the income is less than 50K, the datapoint
will acquire green color and if it is more it gets red color. You will see the data
point mappings when we look at the scatter plot later in this chapter.

Partitioning
In machine learning, we usually split the entire available data in two parts. The
larger part is used in training the model, while the smaller portion is used for
testing. There are different strategies used for partitioning the data.
To define the desired partitioning, right click on the Partitioning node and select
the Configure option. You would see the following screen −

54
In the case, the system modeller has used the Relative (%) mode and the data is
split in 80:20 ratio. While doing the split, the data points are picked up randomly.
This ensures that your test data may not be biased. In case of Linear sampling, the
remaining 20% data used for testing may not correctly represent the training data
as it may be totally biased during its collection.
If you are sure that during data collection, the randomness is guaranteed, then you
may select the linear sampling. Once your data is ready for training the model, feed
it to the next node, which is the Decision Tree Learner.

Decision Tree Learner

The Decision Tree Learner node as the name suggests uses the training data and
builds a model. Check out the configuration setting of this node, which is depicted
in the screenshot below −

As you see the Class is income. Thus the tree would be built based on the income
column and that is what we are trying to achieve in this model. We want a
separation of people having income greater or lesser than 50K.
After this node runs successfully, your model would be ready for testing.

55
Decision Tree Predictor

The Decision Tree Predictor node applies the developed model to the test data set
and appends the model predictions.

The output of the predictor is fed to two different nodes - Scorer and Scatter Plot.
Next, we will examine the output of prediction.

Scorer

This node generates the confusion matrix. To view it, right click on the node. You
will see the following popup menu −

56
Click the View: Confusion Matrix menu option and the matrix will pop up in a
separate window as shown in the screenshot here −

57
It indicates that the accuracy of our developed model is 83.71%. If you are not
satisfied with this, you may play around with other parameters in model building,
especially, you may like to revisit and cleanse your data.

Scatter Plot

To see the scatter plot of the data distribution, right click on the Scatter Plot node
and select the menu option Interactive View: Scatter Plot. You will see the
following plot −

The plot gives the distribution of different income group people based on the
threshold of 50K in two different colored dots - red and blue. These were the colors
set in our Color Manager node. The distribution is relative to the age as plotted
on the x-axis. You may select a different feature for x-axis by changing the
configuration of the node.
The configuration dialog is shown here where we have selected the marital-
status as a feature for x-axis.

58
This completes our discussion on the predefined model provided by KNIME. We
suggest you to take up the other two nodes (Statistics and Interactive Table) in the
model for your self-study.
Let us now move on to the most important part of the tutorial – creating your own
model.

KNIME - Building Your Own Model

Here, you will build your own machine learning model to categorize the plants
based on a few observed features. We will use the well-known iris dataset
from UCI Machine Learning Repository for this purpose. The dataset contains
three different classes of plants. We will train our model to classify an unknown
plant into one of these three classes.
We will start with creating a new workflow in KNIME for creating our machine
learning models.

Creating Workflow

To create a new workflow, select the following menu option in the KNIME
workbench.
59
File → New
You will see the following screen −

Select the New KNIME Workflow option and click on the Next button. On the
next screen, you will be asked for the desired name for the workflow and the
destination folder for saving it. Enter this information as desired and
click Finish to create a new workspace.
A new workspace with the given name would be added to the Workspace view as
seen here −

60
You will now add the various nodes in this workspace to create your model.
Before, you add nodes, you have to download and prepare the iris dataset for our
use.

Preparing Dataset

Download the iris dataset from the UCI Machine Learning Repository
site Download Iris Dataset. The downloaded iris.data file is in CSV format. We
will make some changes in it to add the column names.
Open the downloaded file in your favorite text editor and add the following line at
the beginning.
sepal length, petal length, sepal width, petal width, class
When our File Reader node reads this file, it will automatically take the above
fields as column names.
Now, you will start adding various nodes.

Adding File Reader

Go to the Node Repository view, type “file” in the search box to locate the File
Reader node. This is seen in the screenshot below −

61
Select and double click the File Reader to add the node into the workspace.
Alternatively, you may use drag-n-drop feature to add the node into the workspace.
After the node is added, you will have to configure it. Right click on the node and
select the Configure menu option. You have done this in the earlier lesson.
The settings screen looks like the following after the datafile is loaded.

To load your dataset, click on the Browse button and select the location of your
iris.data file. The node will load the contents of the file which are displayed in the
lower portion of the configuration box. Once you are satisfied that the datafile is
located properly and loaded, click on the OK button to close the configuration
dialog.
You will now add some annotation to this node. Right click on the node and
select New Workflow Annotation menu option. An annotation box would appear
on the screen as shown in the screenshot here:

62
Click inside the box and add the following annotation −
Reads iris.data
Click anywhere outside the box to exit the edit mode. Resize and place the box
around the node as desired. Finally, double click on the Node 1 text underneath
the node to change this string to the following −
Loads data
At this point, your screen would look like the following −

We will now add a new node for partitioning our loaded dataset into training and
testing.
63
Adding Partitioning Node

In the Node Repository search window, type a few characters to locate

the Partitioning node, as seen in the screenshot below −

Add the node to our workspace. Set its configuration as follows −

Relative (%) : 95
Draw Randomly
The following screenshot shows the configuration parameters.

64
Next, make the connection between the two nodes. To do so, click on the output
of the File Reader node, keep the mouse button clicked, a rubber band line would
appear, drag it to the input of Partitioning node, release the mouse button. A
connection is now established between the two nodes.
Add the annotation, change the description, position the node and annotation view
as desired. Your screen should look like the following at this stage −

65
Next, we will add the k-Means node.

Adding k-Means Node

Select the k-Means node from the repository and add it to the workspace. If you
want to refresh your knowledge on k-Means algorithm, just look up its description
in the description view of the workbench. This is shown in the screenshot below −

66
Incidentally, you may look up the description of different algorithms in the
description window before taking a final decision on which one to use.
Open the configuration dialog for the node. We will use the defaults for all fields
as shown here −

Click OK to accept the defaults and to close the dialog.

Set the annotation and description to the following −
• Annotation: Classify clusters
• Description:Perform clustering
Connect the top output of the Partitioning node to the input of k-Means node.
Reposition your items and your screen should look like the following −

67
Next, we will add a Cluster Assigner node.

Adding Cluster Assigner

The Cluster Assigner assigns new data to an existing set of prototypes. It takes
two inputs - the prototype model and the datatable containing the input data. Look
up the node’s description in the description window which is depicted in the
screenshot below −

Thus, for this node you have to make two connections −

• The PMML Cluster Model output of Partitioning node → Prototypes
Input of Cluster Assigner
• Second partition output of Partitioning node → Input data of Cluster
Assigner
These two connections are shown in the screenshot below −

68
The Cluster Assigner does not need any special configuration. Just accept the
defaults.
Now, add some annotation and description to this node. Rearrange your nodes.
Your screen should look like the following −

At this point, our clustering is completed. We need to visualize the output

graphically. For this, we will add a scatter plot. We will set the colors and shapes
for three classes differently in the scatter plot. Thus, we will filter the output of

69
the k-Means node first through the Color Manager node and then through Shape
Manager node.

Adding Color Manager

Locate the Color Manager node in the repository. Add it to the workspace. Leave
the configuration to its defaults. Note that you must open the configuration dialog
and hit OK to accept the defaults. Set the description text for the node.
Make a connection from the output of k-Means to the input of Color Manager.
Your screen would look like the following at this stage −

Adding Shape Manager

Locate the Shape Manager in the repository and add it to the workspace. Leave
its configuration to the defaults. Like the previous one, you must open the
configuration dialog and hit OK to set defaults. Establish the connection from the
output of Color Manager to the input of Shape Manager. Set the description for
the node.
Your screen should look like the following −

70
Now, you will be adding the last node in our model and that is the scatter plot.

Adding Scatter Plot

Locate Scatter Plot node in the repository and add it to the workspace. Connect
the output of Shape Manager to the input of Scatter Plot. Leave the configuration
to defaults. Set the description.
Finally, add a group annotation to the recently added three nodes
Annotation: Visualization
Reposition the nodes as desired. Your screen should look like the following at this
stage.

This completes the task of model building.

KNIME - Testing the Model
To test the model, execute the following menu options: Node → Execute All
71
If everything goes correct, the status signal at the bottom of each node would turn
green. If not, you will need to look up the Console view for the errors, fix them up
and re-run the workflow.
Now, you are ready to visualize the predicted output of the model. For this, right
click the Scatter Plot node and select the following menu options: Interactive
View: Scatter Plot
This is shown in the screenshot below −

You would see the scatter plot on the screen as shown here −

72
You can run through different visualizations by changing x- and y- axis. To do so,
click on the settings menu at the top right corner of the scatter plot. A popup menu
would appear as shown in the screenshot below −

You can set the various parameters for the plot on this screen to visualize the data
from several aspects.
This completes our task of model building.
KNIME - Summary and Future Work
KNIME provides a graphical tool for building Machine Learning models. Here,
you learned how to download and install KNIME on your machine.
Summary
You learned the various views provided in the KNIME workbench. KNIME
provides several predefined workflows for your learning. We used one such
workflow to learn the capabilities of KNIME. KNIME provides several pre-
programmed nodes for reading data in various formats, analyzing data using
several ML algorithms, and finally visualizing data in many different ways.
Towards the end of the tutorial, you created your own model starting from scratch.
We used the well-known iris dataset to classify the plants using k-Means
algorithm.
By this way , You are now ready to use these techniques for your own analytics.
73
Decision Tree and Tools
(Weka and KNIME tools already explained above)
A decision tree is a flowchart that starts with one main idea — or question —
and branches out with potential outcomes of each decision. By using a decision
tree, you can identify the best possible course of action.

When it comes to marketing, decision-making can feel particularly risky. What is

my colleague is so attached to a new product, she doesn’t want to mention any of
its shortcomings? What if my marketing team doesn’t mind office growth, but they
haven’t considered how it will affect our strategy long-term?

The visual element of a decision tree helps you include more potential actions and
outcomes than you might’ve if you just talked about it, mitigating risks of
unforeseen consequences.

Plus, the diagram allows you to include smaller details and create a step-by-step
plan, so once you choose your path, it’s already laid out for you to follow.

A decision tree contains four elements: the root node, decision nodes, leaf nodes,
and branches that connect them together.
• The root node is where the tree starts. It's the big issue or decision you are

addressing.
• As the name suggests, the decision nodes represent a decision in your tree.

They are possible avenues to "solve" your main problem.

• The lead nodes represent possible outcomes of a decision. For instance, if

you're deciding where to eat for lunch, a potential decision node is eat a

74
hamburger at McDonald's. A corresponding leaf node could be: Save money
by spending less than $5.
• Branches are the arrows that connect each element in a decision tree.
Follow the branches to understand the risks and rewards of each decision.

Now let's explore how to read and analyze the decisions in the tree.

Decision Tree Analysis [Example]

Let’s say you’re deciding where to advertise your new campaign:

1. On Facebook, using paid ads, or

2. On Instagram, using influencer sponsorships.

For the sake of simplicity, we’ll assume both options appeal to your ideal
demographic and make sense for your brand.

Here’s a preliminary decision tree you’d draw for your advertising campaign:

As you can see, you want to put your ultimate objective at the top — in this
case, Advertising Campaign is the decision you need to make.

Next, you’ll need to draw arrows (your branches) to each potential action you
could take (your leaves).

For our example, you only have two initial actions to take: Facebook Paid Ads, or
Instagram Sponsorships. However, your tree might include multiple alternative
options depending on the objective.

75
Now, you’ll want to draw branches and leaves to compare costs. If this were the
final step, the decision would be obvious: Instagram costs $10 less, so you’d likely
choose that.

However, that isn't the final step. You need to figure out the odds for success versus
failure. Depending on the complexity of your objective, you might examine
existing data in the industry or from prior projects at your company, your team’s
capabilities, budget, time-requirements, and predicted outcomes. You might also
consider external circumstances that could affect success.

Evaluating Risk Versus Reward

In the Advertising Campaign example, there’s a 50% chance of success or failure

for both Facebook and Instagram. If you succeed with Facebook, your ROI is
around $1,000. If you fail, you risk losing $200.

Instagram, on the other hand, has an ROI of $900. If you fail, you risk losing $50.

To evaluate risk versus reward, you need to find out Expected Value for both
avenues. Here’s how you’d figure out your Expected Value:

• Take your predicted success (50%) and multiply it by the potential amount
of money earned ($1000 for Facebook). That’s 500.
• Then, take your predicted chance of failure (50%) and multiply it by the
amount of money lost (-$200 for Facebook). That’s -100.
• Add those two numbers together. Using this formula, you’ll see Facebook’s
Expected Value is 400, while Instagram’s Expected Value is 425.

With this predictive information, you should be able to make a better, more
confident decision — in this case, it looks like Instagram is a better option. Even
though Facebook has a higher ROI, Instagram has a higher Expected Value, and
you risk losing less money.

How to Create a Decision Tree

You can create a decision tree using the following steps. Remember: once you
complete your tree, you can begin analyzing each decision to find the best course
of action.

76
1. Define your main idea or question.
The first step is identifying your root node. This is the main issue, question, or idea
you want to explore. Write your root node at the top of your flowchart.

2. Add potential decisions and outcomes.

Next, expand your tree by adding potential decisions. Connect these decisions to
the root node with branches. From here, write the obvious and potential outcomes
of each decision.

3. Expand until you hit end points.

Remember to flesh out each decision in your tree. Each decision should eventually
hit an end point, ensuring all outcomes rise to the surface. In other words, there's
no room for surprises.

4. Calculate risk and reward.

Now it's time to crunch the numbers.
The most effective decision trees incorporate quantitative data. This allows you to
calculate the expected value of each decision. The most common data is monetary.

5. Evaluate outcomes.
The last step is evaluating outcomes. In this step, you are determining which
decision is most ideal based on the amount of risk you're willing to take.
77
Remember, the highest-value decision may not be the best course of action. Why?
Although it comes with a high reward, it may also bring a high level of risk.
It's up to you — and your team — to determine the best outcome based on your
budget, timeline, and other factors.

While the Advertising Campaign example had qualitative numbers to use as

indicators of risk versus reward, your decision tree might be more subjective.

For instance, perhaps you’re deciding whether your small startup should merge
with a bigger company. In this case, there could be math involved, but your
decision tree might also include more quantitative questions, like: Does this
company represent our brand values? Yes/No. Do our customers benefit from the
merge? Yes/No.

To clarify this point, let’s take a look at some diverse decision tree examples.

Decision Tree Examples

The following example is from SmartDraw, a free flowchart maker:
Example One: Project Development

Decision Tree Software Tools

Over the past few years, open source decision tree software tools have been in high
demand for solving analytics and predictive data mining problems.

78
Decision trees are a popular type of supervised learning algorithm that builds
classification or regression models in the shape of a tree (that’s why they are also
known as regression and classification trees). They work for both categorical
data and continuous data.
Here best open source license classification tree software solutions that run on
Windows, Linux, and Mac OS X.

1. Weka
2. KNIME
3. Rapid Miner
4. SilverDecisions
5. Orange
6. Rattle
7. SMILES
8. Scikit-learn
9. OC1 Decision Tree Software System
10. Simple Decision Tree

1. Weka
This is a Java-based free and open source tool for Windows, Linux, and Mac OS
X. Weka is a powerful collection of machine learning algorithms for data mining
purposes.

The algorithms can be applied directly to a dataset as well as called from Java
code. Weka contains tools for classification, regression, visualization and
clustering, and association rules.

Moreover, Weka has free online courses that teach machine learning and data
mining using Weka.

79
2. KNIME

KNIME Analytics Platform is one of the best open solutions for data-driven
innovation. The platform is fast to deploy, easy to scale, and very intuitive to learn.

KNIME will provide you with 1500 modules, hundreds of ready-to-run examples
(including decision tree examples), a variety of integrated tools, and an extremely
wide choice of advanced algorithms. Great software for any data scientist.

In addition, KNIME has great examples of how to build prediction or classification

model using a decision tree algorithm.

Knime is available for Windows, Linux, and Mac OSX.

3. Rapid Miner
This is a very powerful and popular data mining software solution which provides
you with predictive advanced analytics. And it is one of the best open source
decision tree software tool with no-coding required.
Written in Java, it holds a variety of data mining functions such as visualization,
data pre-processing, cleansing, filtering, clustering, and predictive analysis. Its
Decision Tree operator generates a decision tree model, which can be used for
classification and regression.
Also, it is easily integrated with WEKA.

80
Rapid Miners runs on Windows, Linux, and Mac OSX.

4. SilverDecisions

SilverDecisions is a free and open source decision tree software with a great set of
layout options. It is a specialized software for creating and analyzing decision
trees.

The decision tree can be easily exported to JSON, PNG or SVG format.

In addition, they will provide you with a rich set of examples of decision trees in
different areas such as research and development project decision tree, city council
management software and etc.

The software works on Windows, Linux, and Mac OS X.

5. Orange

81
Orange is a free and open source data visualization software and machine learning
tool for novice and expert. Orange will surprise you!

It allows you to explore statistical distributions, box and whisker plots, scatter
plots, or dive much deeper with decision trees, heatmaps, MDS, hierarchical
clustering, and linear regression models.

In addition, Orange graphic user interface allows you to focus on exploratory data
analysis instead of coding. Orange is used for teaching at schools, universities and
in professional training courses all over the world.

When it comes to decision trees, Orange has numerous implementations of

classification tree learners: TreeLearner, SimpleTreeLearner, and a C45Learner.

Orange works on Windows, Linux, and Mac OS X.

10. Rattle

Rattle is a popular GUI for data mining using R. It is also a great solution when it
comes to open source decision tree software. The tool runs on Linux, Mac OS, and
Windows.

82
Rattle is a powerful tool that presents statistical and visual summaries of data,
builds both unsupervised and supervised machine learning models from the data,
presents the performance of models graphically and etc.
Except for business and commercial enterprises purposes, Rattle is used also for
teaching in Australian and American universities.

In addition, Rattle has a number of decision tree examples you can use.

ARTIFICIAL NEURAL NETWORK TOOLS/SOFTWARE

Artificial Neural network software apply concepts adapted from biological neural
networks, artificial intelligence and machine learning and is used to simulate,
research, develop Artificial Neural network. Neural network simulators are
software applications that are used to simulate the behavior of artificial or
biological neural networks which focus on one or a limited number of specific
types of neural networks.

They are typically stand-alone and not intended to produce general neural
networks that can be integrated in other software. Simulators usually have some
form of built-in visualization to monitor the training process and some simulators
also visualize the physical structure of the neural network. In order for neural
network models to be shared by different applications, Predictive Model Markup
Language (PMML) is used. PMML is an XML-based language which provides a
way for applications to define and share neural network models and other data
mining models between PMML compliant application.

83
What are Artificial Neural Network Tools (Software)?

Artificial Neural Network Software is used to simulate, research, develop, and

apply artificial neural networks, software concepts adapted from biological neural
networks.

Artificial Neural Network Software are intended for practical applications of

artificial neural networks with the primary focus is on data mining and forecasting.
These data analysis simulators usually have some form of preprocessing
capabilities and use a relatively simple static neural network that can be
configured.

Artificial Neural Network Software/Tools : Below are mentioned some of the

Top Artificial Neural Network Software .
Neural Designer, Neuroph, Darknet, Keras, NeuroSolutions, Tflearn, ConvNetJS,
Torch, NVIDIA DIGITS, Stuttgart Neural Network Simulator, DeepPy, MLPNeuralNet,
DNNGraph, AForge.Neuro, NeuralN, NeuralTalk2, Knet, cuda-convnet2, DN2A,
Mocha, HNN, Lasagne, neon, LambdaNet, gobrain, RustNN, deeplearn-rs
1.Neural Designer
Neural Designer software is developed by the startup company called Artelnics,
headquartered in Spain. The company was founded by Roberto Lopez and Ismael
Santana. Neural Designer is a desktop application for data mining that uses neural
networks which is a paradigm of machine learning. Neural networks in Neural
84
Designs are mathematical models of the brain functions, computational models
which are inspired by central nervous systems in the brain that can be trained to
perform certain tasks. Neural Designer has most of the advanced techniques for
data preparations, machine learning and model deployment. Its visual graphical
user interface provides comprehensive and visual results without the need to write
code or assemble blocks. The software implements multicore processing to analyse
larger amounts of data in less time.

Neural Designer is a machine learning software designed to help businesses in

engineering, banking, insurance, healthcare, retail, and consumer industries utilize
data to create, train, and deploy neural network models.

With these models, you can easily discover relationships, recognize patterns and
make predictions.

The graphical user interface lets teams run and view tasks using charts, tables, or
graphs, build predictive models, and make complex operations.

Key features of Neural Designer include data mining, data visualization, model
training, and predictive analytics.

Engineers can use the platform to monitor system behavior, identify the occurrence
of faults, predict equipment failure, and improve product quality using predictive
analytics.

In addition, the application allows financial entities to categorize groups of people

with similar features and ensure relevant interaction, manage risk assessment by
evaluating the impact of unfortunate events, and identify customers likely to leave
to take action in advance.

Neural Designer enables the healthcare sector to use machine learning technology
to study environmental influence, genetic factors, and physiological data to treat
diseases effectively. In addition, it facilitates drug design, letting physicians
predict the behavior of a molecule and its characteristics.

Neural Designer helps retailers collect customer data and use artificial intelligence
to streamline sales forecasting and client targeting.

85
Key benefits of using Neural Designer

- Easy to use: Neural Designer follows a very well-defined protocol for building
neural network models and has an intuitive graphical user interface. This allows
you to develop AI-powered applications without programming or building block
diagrams.

- State-of-the-art algorithms: Neural Designer contains the most advanced

methods for data preparation, model training, feature selection, testing analysis,
response optimization, and model deployment. As a result, you will get the best
results from your data.

- High performance: Neural Designer is developed in C++ and optimized with

CPU parallelization and GPU acceleration. This allows you to analyze much larger
data sets in much less time.

2.Neuroph
Neuroph is an object-oriented artificial neural network framework written
in Java. It can be used to create and train neural networks in Java programs.
86
Neuroph provides Java class library as well as GUI tool easyNeurons for creating
and training neural networks.
It is an open-source project hosted at SourceForge under the Apache License.
Versions before 2.4 were licensed under LGPL 3, from this version the license is
Apache 2.0 License.
Neuroph is an open-source project hosted at SourceForge under the Apache
License. It is a library for creating neural networks and utilizing machine learning.
Neuroph is a lightweight Java neural network framework to develop common
neural network architectures.
Users can interact with Neuroph using :
• A GUI-based tool
• A Java library Both approaches rely on an underlying class hierarchy which
builds artificial neural networks out of layers of neurons.
Neuroph contains has nice GUI neural network editor to quickly create Java neural
network components. The software simplifies the development of a neural network
by providing Java neural network library and GUI tool that supports creating,
training and saving neural networks.

Features
• Neuroph's core classes correspond to basic neural network concepts
like artificial neuron, neuron layer, neuron connections, weight, transfer
function, input function, learning rule etc.
• Neuroph supports common neural network architectures such as Multilayer
perceptron with Backpropagation, Kohonen and Hopfield networks.
• All these classes can be extended and customized to create custom neural
networks and learning rules. Neuroph has built-in support for image
recognition.
3.Darknet
Darknet is an open-source neural network framework written in C and CUDA and
supports CPU and GPU computation. It is a convolutional neural network that is
nineteen layers deep. The pretrained network can classify images into 1000 object
87
categories such as keyboard, mouse, pencil and many animals. As a result, the
network has learned rich feature representation for a wide range of images.
Darknet is installed with only two optional dependencies like OpenCV if users
want a wider variety of support image types or CUDA if they want GPU
computation. The users can start by just installing the base system which has only
been tested on Linux and Mac computers.
4.Keras
Keras is a deep learning library for Theano and TensorFlow. The high-level neural
networks library is written in Python and capable of running on the top of both
applications. Keras is an API designed for human beings, not machines. The
software follows best practices for reducing cognitive load. It offers consistent and
simple APIs and minimizes the number of user actions required for common use
cases. Keras provides clear and actionable error messages and has extensive
documentation and developer guides. Keras deep learning library allows easy and
fast prototyping through total modularity, minimalism, and extensibility. It
supports convolutional neural networks and recurrent networks, as well as
combinations of the two.
5.NeuroSolutions
NeuroSolutions is a neural network software development environment designed
by NeuroDimension. It combines a modular, icon-based network design interface
with an implementation of advanced learning procedures, such as conjugate
gradients, Levenberg Marquardt and backpropagation through time.
NeuroSolutions product family is leading-edge neural network software for data
mining to create highly accurate and predictive models using advanced processing
techniques, intelligent automated neural network topology search through cutting-
edge distributed computing. It is a design interface with advanced artificial
intelligence and learning algorithms using intuitive wizards or an easy-to-use
Excel interface.
The software provides three separate wizards for automatically building neural
network models : • Data Manager • Neural Builder • Neural Expert

Data Warehousing Lab Exp 1-3
No ratings yet
Data Warehousing Lab Exp 1-3
24 pages
Laboratory Manual On: Data Mining
No ratings yet
Laboratory Manual On: Data Mining
41 pages
WEKA Practical Protocol
No ratings yet
WEKA Practical Protocol
40 pages
Lab Manual - DM
No ratings yet
Lab Manual - DM
56 pages
Dinesh DM
No ratings yet
Dinesh DM
34 pages
Weka Tutorial
100% (1)
Weka Tutorial
58 pages
Data Warehousing Lab Excercise
No ratings yet
Data Warehousing Lab Excercise
45 pages
CS-703 (B) Data Warehousing and Data Mining Lab
No ratings yet
CS-703 (B) Data Warehousing and Data Mining Lab
50 pages
WEKA Lab Record
No ratings yet
WEKA Lab Record
69 pages
Data Mining Example (Using Weka)
50% (2)
Data Mining Example (Using Weka)
59 pages
AI-43 Data Mining
No ratings yet
AI-43 Data Mining
96 pages
Weka DW&DM Lab Notes
No ratings yet
Weka DW&DM Lab Notes
37 pages
Final Weka Lab Tutorial
No ratings yet
Final Weka Lab Tutorial
142 pages
32013105-BDA LabManual
No ratings yet
32013105-BDA LabManual
122 pages
NOTES
No ratings yet
NOTES
45 pages
DWM1 Riya
No ratings yet
DWM1 Riya
16 pages
Aiml Final
No ratings yet
Aiml Final
45 pages
Weka Tutorial
100% (1)
Weka Tutorial
55 pages
DWDM Lab File
No ratings yet
DWDM Lab File
29 pages
Weka - Launching Explorer3
No ratings yet
Weka - Launching Explorer3
3 pages
Mooc On Weka
No ratings yet
Mooc On Weka
59 pages
Itdw
No ratings yet
Itdw
44 pages
Data Warehousing
No ratings yet
Data Warehousing
54 pages
Aiml Manual
No ratings yet
Aiml Manual
27 pages
Data Warehousing Laboratory
0% (1)
Data Warehousing Laboratory
28 pages
DW 9 Exp 1
No ratings yet
DW 9 Exp 1
43 pages
Data Mining Lab Manual
No ratings yet
Data Mining Lab Manual
50 pages
Lab Manual
No ratings yet
Lab Manual
24 pages
Weka Installation Steps Final
No ratings yet
Weka Installation Steps Final
7 pages
Data Mining Complete Lab Manual - DRSNR
No ratings yet
Data Mining Complete Lab Manual - DRSNR
27 pages
Datawarehouse Pract 2
No ratings yet
Datawarehouse Pract 2
7 pages
Your Energy Bill
No ratings yet
Your Energy Bill
4 pages
Lab 02
No ratings yet
Lab 02
4 pages
Priyadarshini J. L. College of Engineering, Nagpur: Session 2022-23 Semester-V
No ratings yet
Priyadarshini J. L. College of Engineering, Nagpur: Session 2022-23 Semester-V
31 pages
Weka U5
No ratings yet
Weka U5
3 pages
DMW LabFile 0901CS243D11 Swastik
No ratings yet
DMW LabFile 0901CS243D11 Swastik
25 pages
DMW Lab Print
No ratings yet
DMW Lab Print
21 pages
ASTM A249 Stainless Steel Tubes
No ratings yet
ASTM A249 Stainless Steel Tubes
10 pages
Data Warehousing and Data Mining Lab Manual
0% (1)
Data Warehousing and Data Mining Lab Manual
30 pages
What Is Weka
No ratings yet
What Is Weka
2 pages
Data Mining (WEKA) en Formatted
No ratings yet
Data Mining (WEKA) en Formatted
52 pages
Data Warehousing and Data Mining Lab Manual
100% (1)
Data Warehousing and Data Mining Lab Manual
30 pages
Weka Activity Report
No ratings yet
Weka Activity Report
30 pages
DWM1
No ratings yet
DWM1
19 pages
Data Mining (WEKA) en
No ratings yet
Data Mining (WEKA) en
51 pages
Wekappt
No ratings yet
Wekappt
58 pages
Data Warehousing Full
No ratings yet
Data Warehousing Full
41 pages
Exp 6
No ratings yet
Exp 6
9 pages
Weka Data Miningvsem
No ratings yet
Weka Data Miningvsem
7 pages
ExplorerGuide A Version 3-5-8
No ratings yet
ExplorerGuide A Version 3-5-8
22 pages
Weka Tutorial
No ratings yet
Weka Tutorial
45 pages
Weka Overview Slides
No ratings yet
Weka Overview Slides
31 pages
Aim Theory::: Study and Working of WEKA Tool
No ratings yet
Aim Theory::: Study and Working of WEKA Tool
3 pages
Weka Software Manuala
No ratings yet
Weka Software Manuala
20 pages
WEKA Explorer Tutorial
No ratings yet
WEKA Explorer Tutorial
45 pages
Machine Learning: Algorithms and Applications: Quang Nhat Nguyen
No ratings yet
Machine Learning: Algorithms and Applications: Quang Nhat Nguyen
16 pages
Technology-Plan BPP Frelyn
100% (2)
Technology-Plan BPP Frelyn
4 pages
Data Base Management Key Points
No ratings yet
Data Base Management Key Points
8 pages
Arduino Energy Meter PDF
100% (2)
Arduino Energy Meter PDF
16 pages
Advanced View of Atmega Microcontroller Projects List - ATMega32 AVR
No ratings yet
Advanced View of Atmega Microcontroller Projects List - ATMega32 AVR
146 pages
Abbreviation: SR-Suspension Revoked May of 2015.: 31/08/2012 Temporarily Deactivated Due To Not Applied For Renewal
No ratings yet
Abbreviation: SR-Suspension Revoked May of 2015.: 31/08/2012 Temporarily Deactivated Due To Not Applied For Renewal
186 pages
Foundation Load (Reactions) Data FOR 45 M Diameter Thickener
No ratings yet
Foundation Load (Reactions) Data FOR 45 M Diameter Thickener
88 pages
JNCIS-SP Certification - Juniper Networks US
No ratings yet
JNCIS-SP Certification - Juniper Networks US
6 pages
I2ml2e Chap4 v1 0
No ratings yet
I2ml2e Chap4 v1 0
27 pages
DAA Presentation Greedy Aproch of Coloring
No ratings yet
DAA Presentation Greedy Aproch of Coloring
11 pages
DL QB With Ans
No ratings yet
DL QB With Ans
38 pages
HV Assignment 1 (Section 2a) (1 47)
No ratings yet
HV Assignment 1 (Section 2a) (1 47)
5 pages
Microprocessor in Agriculture
No ratings yet
Microprocessor in Agriculture
2 pages
Project On IMS-BT
No ratings yet
Project On IMS-BT
10 pages
UNIT 2.2 Functional Modeling
No ratings yet
UNIT 2.2 Functional Modeling
23 pages
PCworth Product Pricelist
No ratings yet
PCworth Product Pricelist
22 pages
FIBARO - Net - Prices For Dealers
100% (1)
FIBARO - Net - Prices For Dealers
9 pages
K039-Pic-Checklist For JCB
No ratings yet
K039-Pic-Checklist For JCB
1 page
Proposal - NorthTrend - N2N Renewal
No ratings yet
Proposal - NorthTrend - N2N Renewal
5 pages
10.1016 J.enganabound.2017.04.005 Creep Crack Analysis of Viscoelastic Material by Numerical Manifold Method
No ratings yet
10.1016 J.enganabound.2017.04.005 Creep Crack Analysis of Viscoelastic Material by Numerical Manifold Method
15 pages
Geotechnical Earthquake Engineering: Dr. Deepankar Choudhury
No ratings yet
Geotechnical Earthquake Engineering: Dr. Deepankar Choudhury
40 pages
Kolom Distilasi Tinjauan Umum
No ratings yet
Kolom Distilasi Tinjauan Umum
22 pages
Water Body Extraction From Sentinel-3 Image With Multiscale Spatiotemporal Super-Resolution Mapping
No ratings yet
Water Body Extraction From Sentinel-3 Image With Multiscale Spatiotemporal Super-Resolution Mapping
20 pages
RIL - List of Subsidiaries
No ratings yet
RIL - List of Subsidiaries
7 pages
102-00094-I RIO ZUNI Operators Manual
No ratings yet
102-00094-I RIO ZUNI Operators Manual
46 pages
Arwa Alrezehi - Shahad Sultan
No ratings yet
Arwa Alrezehi - Shahad Sultan
1 page
JP-Finance Officer
No ratings yet
JP-Finance Officer
2 pages
Tsarouchas Anastasios Resume
No ratings yet
Tsarouchas Anastasios Resume
1 page
Which Control The Pitch Angle of The Tail Rotor Blades: by Pressing On The Right Pedal, The Pitch Is
No ratings yet
Which Control The Pitch Angle of The Tail Rotor Blades: by Pressing On The Right Pedal, The Pitch Is
5 pages
Innovating HRM in The Local Government - The Northern Samar Experience - BATULA, FLORENCIO A
No ratings yet
Innovating HRM in The Local Government - The Northern Samar Experience - BATULA, FLORENCIO A
1 page

Unit-7 Tools of AI (April 9, 2024)

Uploaded by

Unit-7 Tools of AI (April 9, 2024)

Uploaded by

Study material of Unit-7

Syllabus – Tools for Machine learning: Introduction to Weka and KNIME

Weka is a popular open-source machine learning tool that provides a collection of

WEKA - An open source software provides tools for data preprocessing,

• User-friendly interface: Weka provides a graphical user interface that

• Click on the weak-3-8-3-corretto-jvm icon to start Weka.

java -jar weka.jar

Weka - Launching Explorer

Select Attributes Tab

Loading Data from Local File System

Loading Data from Web

Loading Data from DB

An Arff file contains two sections - header and data.

From the screenshot, you can infer the following points −

@attribute temperature real

Click on the Apply button and examine

Setting Test Data

Click on the Choose button and select the following classifier −

A cross represents a correctly classified instance while squares represents

The output of the data processing is shown in the screen below −

From the output screen, you can observe that −

Applying Hierarchical Clusterer

Click on the Select attributesTAB.You will see the following screen −

KNIME - First Run

show up this dialog again.

The implementation of these various ML algorithms is provided in these nodes.

The last important view that is of immediate relevance to us is

Loading Decision Tree Classifier

In the KNIME Explorer locate the following workflow −

Before we look into the execution of the workflow, it is important to understand

The File Reader node is depicted in the screenshot below −

Decision Tree Learner

KNIME - Building Your Own Model

Adding File Reader

In the Node Repository search window, type a few characters to locate

Add the node to our workspace. Set its configuration as follows −

Adding k-Means Node

Click OK to accept the defaults and to close the dialog.

Adding Cluster Assigner

Thus, for this node you have to make two connections −

At this point, our clustering is completed. We need to visualize the output

Adding Color Manager

Adding Shape Manager

Adding Scatter Plot

This completes the task of model building.

When it comes to marketing, decision-making can feel particularly risky. What is

They are possible avenues to "solve" your main problem.

Decision Tree Analysis [Example]

Let’s say you’re deciding where to advertise your new campaign:

1. On Facebook, using paid ads, or

2. On Instagram, using influencer sponsorships.

Evaluating Risk Versus Reward

In the Advertising Campaign example, there’s a 50% chance of success or failure

How to Create a Decision Tree

2. Add potential decisions and outcomes.

3. Expand until you hit end points.

4. Calculate risk and reward.

While the Advertising Campaign example had qualitative numbers to use as

Decision Tree Examples

Decision Tree Software Tools

In addition, KNIME has great examples of how to build prediction or classification

Knime is available for Windows, Linux, and Mac OSX.

The software works on Windows, Linux, and Mac OS X.

When it comes to decision trees, Orange has numerous implementations of

Orange works on Windows, Linux, and Mac OS X.

ARTIFICIAL NEURAL NETWORK TOOLS/SOFTWARE

Artificial Neural Network Software is used to simulate, research, develop, and

Artificial Neural Network Software are intended for practical applications of

Artificial Neural Network Software/Tools : Below are mentioned some of the

Neural Designer is a machine learning software designed to help businesses in

In addition, the application allows financial entities to categorize groups of people

- State-of-the-art algorithms: Neural Designer contains the most advanced

- High performance: Neural Designer is developed in C++ and optimized with

You might also like