Discipulus Owners Manual
Discipulus
Owner’s Manual
By Frank D. Francone
Information in this document is subject to change without notice. The software described in this document is
furnished under a license agreement. The software may be used and copied only in accordance with the terms
of that agreement. No part of this publication may be reproduced, stored in a retrieval system, or transmitted
in any form or by any means, electronic or mechanical, including photocopying and recording, for any purpose
other than the purchaser’s personal use, without the written permission of Register Machine Learning
Technologies, Inc.
Discipulus, Notitia, Solution Analytics, Speed Matters, and RML are trademarks of Register Machine Learning
Technologies, Inc., Littleton, Colorado.
Table of Chapters
Table of Contents
Starting Up
Index
Table of Contents
Starting Up
Discipulus Overview
Problem Types and Fitness Functions
Data Import and Preparation using Notitia
Evolved Program Graphic and Statistical Analysis using Solution Analytics Software
Sample Data Sets Available
Minimum System Requirements
Installation
Uninstall
Enter an Activation Code
Convert the Demonstration Version to a Purchased Version
Extend an Expiring or Expired License
Upgrade or Add-On to an Existing License
Deactivate or Move a License
Technical Support
Important License Agreement Reminder
Is There an Easy Way to Create a Data File for Direct Text File Import?
Does Discipulus Come with Sample Data Sets I Can Run?
How Do I Start a Project?
Is There a Way to Find Out which Input Variables Are the Most Important?
How Do I Use the Models Discipulus Has Created?
How to Deploy Discipulus Models from within Discipulus
How to Deploy Discipulus Models as Source Code
Main Window
The Main Window Menu Bar
The Main Window Toolbar
The Main Window Status Bar
Index
Starting Up
Thank you for your purchase of Discipulus, Notitia and Solution
Analytics, the world’s first and fastest commercial Genetic
Programming and data analysis software. Discipulus writes computer
programs--automatically--in Java, C, C Sharp, Delphi and Intel
assembler code, all on a desktop computer.
• Installation on page 5;
• Uninstall on page 7
Discipulus Overview
Discipulus writes computer programs from examples you give it. These
examples are contained in "training data," "validation data" and "applied
(or testing) data" that you provide to Discipulus.
But you do not have to deal with the intricacies of machine code and
genetic programming directly. Discipulus configures itself intelligently
as it writes programs.
For the advanced user, Discipulus gives you detailed control over every
aspect of its operation. In fact, Discipulus wraps its low-level operations
in a high-level interface that lets you get as close to the machine code as
you want, or stay as far away as you like.
For specialized models that do not fall into one of the above categories,
both the Enterprise and Enterprise Plus versions of Discipulus allow you
to design your own fitness function. This permits you to configure
Discipulus to solve almost any modeling problem you can imagine.
Discipulus evolved model to new data, Notitia will apply exactly the
same transforms to the new data.
Notitia opens directly from the Discipulus Project Wizard and returns
data directly to Discipulus when you are done with it.
For more information, please see the Notitia tutorial and manual, which
were installed on your hard disk when you installed Discipulus.
C:\Program Files\AimApps\Discipulus5\Data\
C:\Users\Public\AimApps\Discipulus5\Data\
"Discipulus_Notitia_Tutorial_Data_File.xls"
4. The last file is an Excel file that you would import with Notitia. It
is organized like the tutorial file into training and scoring tabs. It
too has deliberate data errors for you to experiment with fixing in
Notitia. It is a regression problem. It is called:
"Fractionating_Column_With_Phase.xls"
Installation
This section documents how to install Discipulus, together with the
companion Notitia data preparation and Solution Analytics graphic
analysis applications. No separate installation is required for
Notitia or Solution Analytics; all three applications install together.
The install process is as follows:
• Double-click the installer file and the install process will begin on
your computer. You will see this screen:
Important Note: DO NOT install more than one copy of any version
of Discipulus on a single computer. It will cause unpredictable and
irregular performance for both copies.
Uninstall
Uninstall Discipulus, Notitia and Solution Analytics from the Windows
Control Panel as you would a typical Windows program.
Just enter your activation code into the four white boxes and click the
Activate button. Your activation code comes in the email you will
receive after you purchase your license.
When you finish, your version of Discipulus will now conform to the
license for which you just entered the activation code.
After the purchase, you will get an email containing your activation
code. Enter the activation code as described in Enter an Activation Code
on page 7.
After the extension, you will get an email containing your activation
code. Enter the activation code as described in Enter an Activation Code
on page 7.
After the upgrade or extension, you will get an email containing your
activation code. Enter the activation code as described in Enter an
Activation Code on page 7.
Click the "Yes" box and then click the "Deactivate" button. The
following message box should then display:
Technical Support
For simple technical support problems, please contact us at
[email protected].
Here are some of the questions that are often raised by users about
Discipulus, Notitia and Solution Analytics. (Note, this is a Frequently
Asked Questions document, not full documentation of these three
products. For full documentation, please refer to the Discipulus, Notitia,
and Solution Analytics Owner’s Manuals.)
• How Do I Set-up Text Data Files for Direct Text File Import Into
Discipulus? on page 28
• What Values Should I Use for the Target Outputs for Classification
Problems, Ranking Problems, and Logistic Regression Problems? on
page 30
• Is There an Easy Way to Create a Data File for Direct Text File
Import? on page 33
• Does Discipulus Come with Sample Data Sets I Can Run? on page 33
• Is There a Way to Find Out which Input Variables Are the Most
Important? on page 68
f) How Do I Set-up Text Data Files for Direct Text File Import Into
Discipulus? on page 28;
3. Stop the Project. When the project has produced results to your
satisfaction, use the Finish Project button to stop the project (you
may resume it later). See: How and When Do I Terminate a
Project? on page 42;
Evolved program models map your inputs to your target output. Put
another way, program models map the independent variables to your
dependent variable. So if you have two inputs (independent variables)
and one output, Discipulus finds and parameterizes an optimal form for
the function f, where:
TgtOutput = f (InputOne,InputTwo)
So, in essence, you provide the data files that contain matched inputs
and outputs. Each row of data contains the matched input and outputs.
From them, Discipulus creates models that allow you to predict outputs
from similar inputs.
When you are finished, you have a high-precision model that lets you
predict outputs for new data.
Then, after you train your models, Notitia applies the same transforms to
new data so that it may be scored.
When you are setting up a Discipulus project for training, you open
Notitia directly from the Discipulus Project Wizard as shown in
Figure 1.
Click the "GO" button and Notitia will open. In Notitia, you will be able
to access Excel files, database connections, and text files, find missing
values and outliers, transform and group data, and split your data up into
training, validation and applied data for use in Discipulus. In addition,
Notitia will store all your transformation settings on the data when you
exit and return to Discipulus.
On the other hand, when you are using Discipulus to score new data
after you have trained models, then you open Notitia from the main file
menu as follows:
When you click on Load New Applied Data, the following window pops
up if you originally imported your training data from Notitia XML files.
Otherwise, Notitia just opens up directly. If this window pops up, make
the selections shown in Figure 3.
Once you click the "Launch Notitia" button, Notitia will open and all of
the stored transforms from your training data will be active. Select a
Data Set in Notitia (if you already loaded the scoring data) or Import a
data set for scoring. The active data set will be returned to Discipulus
with all of the stored transforms applied to it.
You may select among these three methods in the Project Wizard in the
Select Data Sets window, shown below:
For more information about using Notitia with Discipulus, please see the
Notitia Owner’s Manual and the help files that accompany Notitia. The
key thing to understand here is that Discipulus and Notitia link
automatically
information about how to use this option, please see the Notitia Owner’s
Manual and the help files accompanying Notitia.
Notitia lets you import data from virtually any data source that has
uniform rows and columns. For more information about how to use this
option, please see the Notitia Owner’s Manual and the help files
accompanying Notitia.
Columns. Each column represents either one of the inputs or the target
output.
Column Names. If you import data from Notitia or using Notitia XML
files, you may associate a name (or column heading) with each column
and that column name will be used in the evolved programs and
reporting.
On the other hand, if you use direct text file import, no column names
are permitted and Discipulus will name your inputs v000, v001,
v002 . . . in the order of the columns. There are special rules for
setting up files for direct text file import, which you may review here:
How Do I Set-up Text Data Files for Direct Text File Import Into
Discipulus? on page 28.
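The naming pattern described above can be sketched in a few lines of Python. This is only an illustration for matching your own column order to the generated names. The function name default_input_names is invented here, and the sketch assumes the three-digit, zero-padded pattern shown above (v000, v001, v002) continues for every input column.

```python
def default_input_names(num_inputs):
    """Return the assumed default names v000, v001, ... for num_inputs columns."""
    # Assumption: names are zero-padded to three digits, as in v000, v001, v002.
    return [f"v{index:03d}" for index in range(num_inputs)]


print(default_input_names(3))
```

Keeping such a list next to your original column headings makes it easier to read evolved programs that refer to inputs only by these generated names.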
If you use Notitia to import data to Discipulus, Notitia will let you split
the data into training, validation and applied data sets in a variety of
ways. If you use direct text file import, you must split your data into
separate data sets yourself.
Data Sets for Scoring. For scoring data after you have finished training
a model, Discipulus requires a file with the same number of inputs as
the training data. This scoring data may optionally have a target output
column.
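Before you load a scoring file, you may want to confirm that its column count is consistent with your training data. The Python sketch below is a hypothetical pre-check, not part of Discipulus. It assumes that a scoring file with a target output carries exactly one more column than the number of training inputs; the function name scoring_layout and its return values are invented for this example.

```python
def scoring_layout(num_training_inputs, num_scoring_columns):
    """Classify a scoring file by its column count.

    Returns "inputs_only" or "inputs_and_target".
    Raises ValueError when the count matches neither layout.
    """
    if num_scoring_columns == num_training_inputs:
        return "inputs_only"
    # Assumption: the optional target output adds exactly one column.
    if num_scoring_columns == num_training_inputs + 1:
        return "inputs_and_target"
    raise ValueError(
        f"expected {num_training_inputs} or {num_training_inputs + 1} columns, "
        f"found {num_scoring_columns}"
    )


print(scoring_layout(2, 3))
```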
Data Set Dimensions. Each data set must have at least two rows and at
least two columns. The maximum number of inputs is 64.
Data Types. Discipulus accepts only numeric data. The only non-
numeric characters that are permissible are a decimal point and "E"
when used in correct exponential floating point notation.
If you import data to Discipulus using Notitia, Notitia will accept non-
numeric inputs and help you convert them, consistently, to numeric
values. If you use direct text file import, you must make that conversion
yourself outside of Discipulus.
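If you prepare files for direct text file import yourself, a small pre-check can catch the problems described above before you start a project. The Python sketch below is a hypothetical helper, not part of Discipulus. It assumes the data has already been split into rows of text values, that every row has the same number of columns, that the rightmost column is the target output, and that a leading sign is acceptable (the manual uses -1 as a target value). The names find_problems, NUMERIC and MAX_INPUTS are invented for this example.

```python
import re

# Digits, an optional decimal point, and an optional exponent such as E5 or e-3.
# Assumption: a leading + or - sign is also accepted.
NUMERIC = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?")
MAX_INPUTS = 64  # maximum number of inputs stated in the manual


def find_problems(rows):
    """Return a list of human-readable problems found in rows of text values."""
    problems = []
    if len(rows) < 2:
        problems.append("fewer than two rows")
    widths = {len(row) for row in rows}
    if len(widths) > 1:
        problems.append("rows have differing column counts")
    width = max(widths, default=0)
    if width < 2:
        problems.append("fewer than two columns")
    # Assumption: the rightmost column is the target output, the rest are inputs.
    if width - 1 > MAX_INPUTS:
        problems.append("more than 64 inputs")
    for row_number, row in enumerate(rows, start=1):
        for column_number, token in enumerate(row, start=1):
            if not NUMERIC.fullmatch(token.strip()):
                problems.append(
                    f"row {row_number}, column {column_number}: "
                    f"non-numeric value {token!r}"
                )
    return problems


print(find_problems([["2.0", "4.0", "6.0"], ["3.1", "abc", "8.1"]]))
```

An empty list means none of these particular checks failed; it does not guarantee that Discipulus will accept the file.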
Figure 10. The Select Data Sets Window in the Project Wizard
with the Text File Import Section Highlighted
Direct text file import occurs when you select the "Import from Text
Files" button and then browse for training, validation, and (optionally)
applied data files for import.
This method is fast; but it requires that you split the data into training,
validation and applied data before import. In addition, if you use this
method, you may not use column titles. Instead, you import unnamed
columns and Discipulus will name them v000, v001, v002 etc.
The following topics discuss how to set up your data files for direct text
import:
• What Values Should I Use for the Target Outputs for Classification
Problems, Ranking Problems, and Logistic Regression Problems? on
page 30;
• What Are Training, Validation, and Applied Data Files? on page 30;
• Is There an Easy Way to Create a Data File for Direct Text File
Import? on page 33.
In addition, the sample regression and classification files that come with
Discipulus will import correctly into Discipulus. You may find it useful
to review them. You may find out more about the sample files in Does
Discipulus Come with Sample Data Sets I Can Run? on page 33.
• 1.0
• 100
• 2345.67
Here are some values that will not read into Discipulus:
The only hard and fast rule for binary classification problems is that the
target output column in your data files should contain only two numeric
values. Typically, the values 0,1 or -1,1 are used.
In any event, Discipulus refers to the lower value you give it as Class
Zero and to the higher value you give it as Class One.
For Ranking and Logistic Regression problem types, the only values
permitted in the target output column are 0 and 1.
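The rules above are easy to check before import. The following Python sketch is an illustration only; the function names class_labels and valid_for_ranking are invented for this example and are not part of Discipulus. It reports which of your two target values would be treated as Class Zero and which as Class One, and tests whether a target column uses only 0 and 1.

```python
def class_labels(targets):
    """Map the two distinct target values to Class Zero (lower) and Class One (higher)."""
    values = sorted(set(targets))
    if len(values) != 2:
        raise ValueError(f"expected exactly two distinct values, found {len(values)}")
    low, high = values
    return {low: "Class Zero", high: "Class One"}


def valid_for_ranking(targets):
    """Return True when every target value is 0 or 1."""
    return set(targets) <= {0, 1}


print(class_labels([-1, 1, 1, -1]))
print(valid_for_ranking([-1, 1, 1, -1]))
```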
You must load training and validation data to run Discipulus because
they are used in model creation. You may load applied data before or
after a project is finished. Applied data plays no part in building models.
Instead, applied data lets you see how the models work on data that
played no role in building the models.
For example, a very small training file with two inputs and one output
might look the way it appears in Table 1 (the first two columns are the
inputs and the third column is the output).[1]

Table 1.
Input 1    Input 2    Target Output
2.0        4.0        6.0
3.1        5.0        8.1
1.3        3.2        4.5

This file has only three examples for Discipulus to learn from.
Ordinarily, your data files should have many more examples than this.
The target output you want to predict is always in the rightmost column.
Even though the columns in Table 1 have headings, you should not use
headings in Discipulus direct text file input data files. If you use Notitia
to import data, you may use column headings.
From this data file, Discipulus would build a model that predicts the
output from the inputs. An evolved program containing only one line of
code:
would produce the output column from the input columns in this table.
[1] The lines and column labels would not appear in a Discipulus data file.
They appear in the above table only for clarity.
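As an illustration only, the Python sketch below checks that one simple candidate relationship, the sum of the two inputs, reproduces the Target Output column of Table 1. The names table_1 and candidate_model are invented for this example, and the code is not output generated by Discipulus.

```python
import math

# Rows of Table 1: (Input 1, Input 2, Target Output).
table_1 = [
    (2.0, 4.0, 6.0),
    (3.1, 5.0, 8.1),
    (1.3, 3.2, 4.5),
]


def candidate_model(input_one, input_two):
    """Hypothetical one-line model: the target is the sum of the two inputs."""
    return input_one + input_two


# Compare with a tolerance because the values are floating point numbers.
matches = all(
    math.isclose(candidate_model(a, b), target) for a, b, target in table_1
)
print(matches)
```

With only three examples, many other formulas would fit equally well, which is one reason your data files should contain many more examples.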
Each of the three data sets should be representative of the problem you
are trying to model.
If your data set is very small, you may elect not to use an applied data
file. In that case, divide the data equally between the training and
validation data sets. If you have to make one bigger than the other, it is
usually better to have the training data larger.
1. Take the last third of your examples in sequence and put them into
your applied data set. That way, you can evaluate the performance
of your evolved models on the applied data, which is later in time
than the training data. This is very important for good model
building.
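For data that is ordered in time, the hold-out described in this step can be sketched as follows. This Python example is a minimal illustration, and the function name hold_out_last_third is invented here. It only separates the final third of the rows as applied data; how you divide the earlier rows between training and validation data is left to you.

```python
def hold_out_last_third(rows):
    """Split rows, kept in their original order, into (earlier_rows, applied_rows).

    The last third of the rows, rounded down, becomes the applied data set.
    """
    cut = len(rows) - len(rows) // 3
    return rows[:cut], rows[cut:]


earlier, applied = hold_out_last_third(list(range(9)))
print(earlier, applied)
```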
Then create a tab delimited text file from your spreadsheet as follows. In
Excel, make the following menu selections:
• Then in the dialog box that pops up, you should select Text Only
from the Save As Type Box and name the text file you want to create.
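If your data lives in a script rather than a spreadsheet, you can write a similar tab-delimited file directly. The Python sketch below is one possible approach using the standard csv module, and the function name write_tab_delimited and the file name in the usage line are invented for this example. It writes no heading row, because headings are not used in direct text file import.

```python
import csv


def write_tab_delimited(path, rows):
    """Write rows of numbers to path as tab-delimited text with no heading row."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerows(rows)


write_tab_delimited("training_example.txt", [(2.0, 4.0, 6.0), (3.1, 5.0, 8.1)])
```

Write the training, validation, and applied data to separate files, since direct text file import expects the data to be split before import.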
1. Every time you start Discipulus, the Project Setup Wizard comes
up automatically. Figure 11 shows Discipulus right after it has
started. The first page of the Project Setup Wizard is showing:
2. Alternatively, to begin a new project, click New on the File Menu
and the Project Setup Wizard will start:
1. Name and Save Your Project. When you first start the project
wizard, you see the Project Name and Location Window. Click on
the Browse button to select a folder and name for your project file.
The project file stores all information about the project you will
run.
4. Start the Project. The fourth window in the Project Wizard is the
"Customize Parameters and Start Project" window. To start the
project, click on the "GO" button. Discipulus is entirely
configured. As the project proceeds, Discipulus will intelligently
adjust its own configuration. Two notes about the other buttons on
this page:
* The "Options" button allows advanced users to set run and project
parameters. It takes you to the Advanced Options Window.
• Project History Tab. This tab charts the performance of the project.
It shows the error rate (fitness) for the best program model (the red
line) and the best team model (the green line) at each point in your
project.
• Project Detail Tab. This tab shows detailed information about how
long it has taken individual runs to reach the various error rates found
during the project.
• Current Run Tab. This tab shows the parameters used and progress
of the current run in the project. So each time the project starts a new
run, the values in this tab will change.
For more information about the Monitor Project Window, see The
Monitor Project Window on page 88.
If you want, you can set the run termination criterion manually in the
Advanced Options window shown in Figure 17. What you set will be
applied to all runs in the project.
You can get to the Advanced Options Window in two different ways--
from the main menu and from the Project Wizard:
Figure 19. The Discipulus Toolbar with the Finish Run Tool
Highlighted
You can get to the Advanced Options Window in two different ways--
from the main menu and from the Project Wizard:
halting the project and doing another project in Fixed Mode with a
shorter termination criterion--perhaps two times the level at which
the best program was found. This may shorten your run time.
If no project is currently open, a file dialog box will pop up. Choose the
project file that you want to continue. The project will continue where it
left off. If there is an open project, that project will continue from where
it left off.
To see the outputs of one of your best program models, select that
model in the Best Programs Tab of the Reports Window. Then click the
View Results button as shown in Figure 24.
When you click the View Results button, the Data Window will open. In
Data Window chart view, the output of the program you just selected
will be shown in the "Selected Program" data series. In the Data
Window spreadsheet view, the output of the program you just selected
will be shown in the "Selected Program" column or columns as shown
in Figure 25.
You can view the outputs of a selected Team Model in a similar way.
Figure 26 shows the Team Solutions Tab of the Reports Window with
the best five member team selected.
If you click on View Results, the output of the selected team is sent to
the Data Window in the "Selected Program" data series. Thus, in Data
Window chart view, you would see that selected program data as a line
on the chart labeled "Selected Program." In spreadsheet view, you
would see the output of that selected team as a column of outputs in a
spreadsheet. The spreadsheet view is shown above in Figure 25 with the
Selected Program column highlighted.
At this point, you may save your selected best program using two
different methods. One method saves the selected program as object
code. The other saves it in a format that lets you reload the program into
Discipulus later:
Just select a computer language to save your best program in. Then
select Browse to designate the folder and file name for your selected
program. Discipulus automatically adds the correct extension for the
particular computer language you select.
To view the code of the best, three-program team from this window,
select the "View Code" button.
The Team Solutions Code window will pop up. An example is shown in
Figure 33, which shows an evolved team solution in the C language.
Note that when you are in the Team Solutions Code window, you may
select between various computer languages to view the evolved team in,
such as CSharp, Java, and Delphi. You do that by using the Language
drop down box, which is highlighted in the above figure.
To save the object code of the selected team model, click on the Save
Program button (highlighted). That will bring up the window shown in
Figure 35.
Just select a computer language to save your best team in and then select
Browse to designate the folder and file name for your selected team.
From the Best Programs Tab, select the program you wish to view in
interactive evaluator and either double click on that program or click the
"Analyze Program" button. Interactive Evaluator will open with that
program loaded. Figure 37 shows the Interactive Evaluator window
containing the selected regression program.
Starting at the Best Programs Tab, there are five different tools for
analyzing and graphing the outputs of your best programs. They are
covered in the following topics:
• Use the Data Window to Graph the Predicted Outputs of Your Best
Program vs. the Target Outputs for the Best Program on page 63
• Use the Data Window to View the Numeric Predicted Outputs of Your
Best Program on page 64
In this figure, each row represents a single program and the programs
are ordered from best to worst. The combined training and validation
data sets (see highlighted section) are used to compute the fitness values
for the 30 best programs.
The summary statistics shown are labeled in this window and will vary
depending on your problem type. For regression problems, the Best
Programs tab shows the R² statistic for each program, the fitness value
computed for each best program, and the run number in the current
project in which the best program was found.
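If you want to check a regression statistic yourself from exported outputs, the Python sketch below computes the coefficient of determination, one common definition of R squared. This is an assumption: the manual does not state which formula Discipulus uses, and the squared correlation between targets and predictions is another common definition that can give a different number. The function name r_squared is invented for this example.

```python
def r_squared(targets, predictions):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_target = sum(targets) / len(targets)
    ss_total = sum((t - mean_target) ** 2 for t in targets)
    ss_residual = sum((t - p) ** 2 for t, p in zip(targets, predictions))
    # Note: ss_total is zero when every target is identical; this sketch
    # does not handle that case.
    return 1.0 - ss_residual / ss_total


print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
```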
The summary statistics shown in the Best Programs tab of the Reports
Window pertain to particular data sets. The data set used to compute the
fitness statistic (training, validation, etc.) is shown in the Statistic
Displayed drop down box, which is highlighted in Figure 39.
You may change the data set used to compute fitness by clicking on this
Statistic Displayed drop down box and selecting one of the options. For
example, to view the performance of one of the best programs on the
Validation Data, first choose the program and then select Validation
Data in the Statistic Displayed drop down box.
First: Select the program you wish to view in the Best Programs Tab of
the Reports Window as shown in Figure 40.
The Solution Analytics application will open and the program you were
viewing in Interactive Evaluator will be automatically loaded into it. A
regression problem in Solution Analytics looks like Figure 42.
Note that the tabs in the above figure show you graphics that are particular to
regression problems. On the other hand, had you used Solution
Analytics to handle a ranking problem, Solution Analytics would have
looked like Figure 43.
The outputs of the program you just selected are displayed in the
Selected Program Column as shown in Figure 44.
For more information about the operation of the Data Window, see the
Discipulus Owner’s Manual.
five different team sizes from which you may select and the fitness of
each.
To see the detailed breakdown by team vote for any particular team,
select the team. The detailed breakdown will appear in the lower
window.
you wish to view graphic analytics. Then click on the "Start Solution
Analytics" button. The Solution Analytics Application will open and
your best team will be loaded automatically into Solution Analytics. For
a regression problem, Solution Analytics will look like Figure 45 when
it opens.
For more information about the operation of the Data Window, see the
Discipulus Owner’s Manual chapter on this window.
The outputs of the team you just selected are displayed in the Selected
Program Column as shown in Figure 46.
For more information about the operation of the Data Window, see that
chapter in the Discipulus Owner’s Manual.
• Select the team you wish to view in the Best Teams Tab of the
Reports Window.
• The Team Solution Code window will pop up. Choose the language
you wish to view in the combo box at the bottom.
To save the code displayed in the Team Solution Code window, click on
the Save Button. Assembler and C code is saved in .cpp files. Java code
is saved in .java files.
You may sort the inputs on any column in the Input Impacts Tab.
Finally, you may save the input impacts report by clicking on the Save
button on the Input Impacts Tab.
* On the file menu, click Open. Select the project file that contains
the program you are interested in.
* On the file menu, choose Load New Applied Data. Select the file
containing the data you wish to apply the program to or use Notitia
to import the new data. This file may contain Target Outputs or it
may not. Discipulus will ask you whether the file has Target
Outputs in it and will adjust its behavior according to your answer.
* Click on the Applied Tab of the Data Window. The outputs of the
program on the new applied data appear as the Selected Program
in the Data Window.
You may save and load programs from the Interactive Evaluator.
To deploy the program that you have previously saved via the
Interactive Evaluator, take the following steps:
* On the file menu, choose Load New Applied Data. Select the file
containing the data you wish to apply the program to or import the
data from Notitia.
* Click on the Applied Tab of the Data Window. The outputs of the
program on the new applied data appear as the Selected Program
in the Data Window.
Once you have saved source code files in this manner, you may call
them from your own programs and send new data to them. The source
code files return the output of the best programs and best teams.
The interfaces by which you call the evolved source code programs are
described in detail in a separate document installed with Discipulus named
Decompiled_Program_Interface.PDF.
On the other hand, if you use Notitia to get your data into Discipulus,
Notitia will use the column names you have assigned to input columns
in the decompiled best programs and best teams.
There is, unfortunately, no hard and fast rule that can describe the
appropriate relationship between rows and columns of data. Each data
set has its own distribution, and that distribution strongly affects
whether you have enough data points. For an excellent discussion of the
sufficiency of the size of a particular data set, see Pyle, Dorian, Data
Preparation for Data Mining, Morgan Kaufmann Publishers, Inc.
1999.
But there are some techniques you can try to eliminate it on your data.
They are:
1. Get more data. This is the best single approach and, if more data
can be obtained, this should have the best effect on your
performance.
* From the information on the Input Impacts Tab, you can decide
which inputs had a real impact on your best solutions and which
did not.
* You may reduce the number of inputs in two ways: (1) Redo your data
set, without the unwanted inputs; or (2) Disable the inputs in the Data
Window--see Excluding Inputs from a Project on page 111.
* Reduce the Max Program size from 512 to either 128 or 256.
• The Wizard appears. Follow the steps in the Wizard until you get to
the screen that has an Options button. Click that button. The
Advanced Options Page appears;
• Click the Set Button. Set whatever individual run parameters you
want for the individual run. Click OK;
• Click the Randomize Button. Make sure all boxes are unchecked.
Click OK.
In general, Discipulus sets good defaults for these parameters and you
will often not need to change them. If you do need to change them, this
chapter tells you how.
You may open the Advanced Options Window in one of two ways:
• Use the Set Up Learning menu on the main menu and select Options;
or
• In Stepping Mode, Discipulus starts with short runs and then steps up
the length of the runs during the project.
• In Fixed Mode, the length of all runs in the project is the same.
button. Then, you may set the actual value for terminating the runs by
typing in the box to the right.
7. The Single Run Options Window. See The Single Run Advanced
Options Window on page 113 and
Main Window
When you first start Discipulus, you will see the Main Window. It will
contain the first page of the Project Setup Wizard. If you cancel the
project wizard, the Main Window will appear as shown in Figure 50.
1. Menus (at the top). See The Main Window Menu Bar on page 82;
2. Toolbar (just below the menu bar). See The Main Window Toolbar
on page 87;
3. Status Bar (at the bottom of the screen). See The Main Window
Status Bar on page 88; and
4. The Project Setup Wizard (not shown above). See The Project
Setup Wizard Windows on page 88.
The Main Window is visible at all times when you are running
Discipulus.
File Menu
Edit Menu
The edit menu lets you select and copy data from the Spreadsheet View
in the Data Window. This menu is not active unless the Spreadsheet
View of the Data Window is active.
View Menu
The view menu lets you toggle the Toolbar and the Status Bar on and off.
Fitness Sub-Menu
Using the Fitness Sub-Menu, you may view or change the parameters of
the currently active Discipulus project. This menu is not active until you
have created a project using the Project Wizard. The operation of this
menu varies somewhat depending on whether a project is running or it is
finished.
When a project is running, you may use it to view the parameters of the
project, but not to change them. In that situation, the Set Up Learning
Menu looks like Figure 51:
This figure shows the menu fully expanded for the ranking fitness
function type. Please note three things:
• The check mark by Classification tells you that the current project is
running a classification fitness function.
• A menu item that is not grayed out (for example, "Regression") tells
you that you could select this menu item and it may be further
expanded for more information.
When the project is not running, you may use the items in this menu to
view or change the project parameters. Again, the check mark shows you
the most recently used fitness function. However, when the project is not
running, you may select any available fitness function and set its
parameters. In that case, you may then start the project over using the
new fitness function.
Options Sub-Menu
The Options sub-menu of the Set Up Learning Menu takes you to the
Advanced Options page, where you set project and run parameters. See
Controlling Discipulus Projects on page 75 and The Advanced Options
Window on page 112.
Run Menu
From this menu, you start, end, and continue projects. In addition, you
may use this menu to jump to the Reports Window.
Registration Menu
This menu lets you manage your license for Discipulus. There are four
sub-menus that you may use:
Window Menu
From this menu, you may switch back and forth between the various
windows in Discipulus.
Help Menu
This gives you information about how to use Discipulus.
3. It appears each time you click the New File icon in the toolbar.
For a detailed description of how the Project Setup Wizard works, see
How Do I Start a Project? on page 34.
Figure 52. The Overview Tab of the Monitor Project Window for
a Classification Problem
The Best Program and Best Team boxes show the program that
performs best on the combined training and validation data for the
project thus far.
The Best Program and Best Team boxes provide appropriate statistics
for your problem type. For example, while Figure 52 shows the
Overview Tab for a classification problem, Figure 53 shows the same
tab for a ranking (ROC curve type) problem. Note that the ranking
problem displays a completely different set of statistics for the best
program and the best team because fitness is computed differently for
the ranking and classification problem types.
Figure 53. The Overview Tab of the Monitor Project Window for
a Ranking (ROC Curve) Problem
Regardless of problem type, the Project Status box in the Overview Tab
shows how many runs have been performed, how many programs have
been evaluated for fitness, the time elapsed and the current step of the
current project.
The red line shows the performance of the best program as more runs
are completed in the project. The green line shows the performance of the
best team as the project proceeds. In both cases, the number reported is
error; so lower is better.
The Project Detail Tab shows the value of the best program found as
individual runs in a project get longer and longer. So it gives you
information about how long runs should be to find the best program for
your problem.
You can sort the best program values in the detail tab in two ways:
either you can view the information by how many generations a run has
been going without improving (Generations Without Improvement) or
you can view it by how many generations a run has been going since the
start of the run (Generations Since Start).
• Only one run in the project has gone as long as 160 generations
without improvement. It was a regression problem; so the best fitness
over all programs (mean squared error) was 0.055817.
The column labeled Best Five Average shows the average of the fitness
of the best five programs in the project.
Note that the Project Detail Tab only shows information where there has
been a change in the Best Program or Best Five Average columns.
Finally, in the combo box at the bottom of the Project Detail Tab, you
can select between displaying information based on Generations Since
Start (shown) and Generations Without Improvement.
The Status Box shows how long the current run has been going.
The Performance box shows the fitness of the best program in the
current run to date.
The Best Programs Tab is the starting point for analysis, graphing,
saving, simplification and editing of your best programs. Detailed
information about how to save, graph, analyze, simplify and edit your
best programs may be found in: (1) How Do I Graph and Analyze the
Outputs of a Selected Best Program Created by Discipulus? on page 57;
(2) How Do I View an Evolved Best Program Created by Discipulus? on
page 48; and (3) How Do I Save an Evolved Best Program Created by
Discipulus? on page 50.
Each line in the Best Programs Tab represents one of the thirty best
programs of the project. When you first open the window, the best fit
program is in the top row.
You may select any of the best programs by clicking on that program in
the "Hit-Rate" column. In Figure 57, the third best program has been
selected.
You may sort the best programs by clicking at the top of a column.
Doing so sorts the best programs by the entries in that column.
You may display the best programs’ statistics on various data sets (e.g.
training, validation, training & validation, and applied data) by making
the appropriate selection in the "Statistic Displayed" Box. Note that the
statistics displayed vary by problem type, i.e., regression,
classification, ranking, and logistic regression.
• View and save the code created by Discipulus for any of these thirty
best programs (click on the Analyze Program Button). See: Use
Interactive Evaluator to View Code, Simplify, Edit, and Optimize a
Best Program on page 60; or
Figure 58. The Team Solutions Tab Showing the Training &
Validation Data Performance of the Five-Program
Team on a Classification Problem
2. You may select which data set the performance data applies to by
making an appropriate selection in the "Data Set Used" Box.
* For 31.84% of the rows in your data, the best five member team
voted 0:5--that is, zero votes for class one and five votes for class
zero. In calculating this vote, each program in the five-program
team has one vote. So in this case, a 0:5 vote means that all five
programs voted for class zero 31.84% of the time.
* The accuracy of 0:5 votes was about 94.76% on the selected data
set.
6. From the Team Solutions Tab, you can take the following steps:
* View the numeric output of the selected team for all data
examples. To do so, click on the View Results Button. The Data
Window will open up and the outputs of the selected program will
be displayed in the Data Window as "Selected Program." When you
open the Data Window, if you see a graph, click the "Spreadsheet"
radio button in the lower left-hand corner to see a spreadsheet view of
these data.
* You can view the code of the selected team by clicking on the
View Code Button. From the window that pops up, you can save
that code as a C, C Sharp, Delphi, Inline Assembler, or Java
function.
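The vote breakdown reported in the Team Solutions Tab (for example, the share of rows that received a 0:5 vote and the accuracy of those votes) can be illustrated with a short Python sketch. This is not Discipulus code. The function name vote_breakdown and the sample votes are hypothetical, and the sketch assumes each program's output has already been converted to a class label of 0 or 1.

```python
from collections import defaultdict

def vote_breakdown(team_votes, targets):
    """Summarize team votes per row.

    team_votes: list of rows; each row holds one 0/1 vote per program.
    targets: the true class (0 or 1) for each row.
    Returns {"ones:zeros": (share_of_rows, accuracy_of_majority_vote)}.
    """
    counts = defaultdict(int)
    correct = defaultdict(int)
    for votes, target in zip(team_votes, targets):
        ones = sum(votes)
        zeros = len(votes) - ones
        key = f"{ones}:{zeros}"          # e.g. "0:5" = no votes for class one
        majority = 1 if ones > zeros else 0
        counts[key] += 1
        correct[key] += int(majority == target)
    total = len(team_votes)
    return {k: (counts[k] / total, correct[k] / counts[k]) for k in counts}

# Hypothetical votes from a five-program team on four rows of data.
votes = [[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [1, 1, 1, 0, 0], [1, 1, 1, 1, 1]]
targets = [0, 1, 1, 1]
summary = vote_breakdown(votes, targets)
```

In this toy data, half of the rows receive a 0:5 vote, and the team is right on half of those rows.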
Note the Average Impact and Maximum Impact columns are empty. To
compute those columns, click on the "Calculate Impacts" button. This
can take a while as it is very computationally intensive. Now the Input
Impacts Tab looks like Figure 60.
Figure 60. The Input Impacts Tab after Computing the Average
and Maximum Columns
1. Each line in the tab represents a different input variable from your
data set. Put another way, each line represents a single column in
your input data. If you imported your data from Notitia, and there
were column names, the column names you imported appear here.
If, on the other hand, you imported text files directly into
Discipulus, Input001 represents the first column in your text file,
Input002 the second, and so forth.
2. You may sort the data in this tab by clicking at the top of the
column you wish to sort by.
• Indices for your rows of data. You only see this information if you
have used the Notitia application to import your data to Discipulus;
• The inputs from the training, validation and applied data files;
• The Target Output from your training, validation, and applied data
files;
• Various predicted outputs from best programs and best teams evolved
by Discipulus. What outputs the Data Window shows depends on
your problem type as shown in the following table.
The predicted outputs in the Data Window are a particularly useful part
of Discipulus. Each output prediction is made for every row of data in
three columns: (1) A column for the Best Program of the project
selected by Discipulus; (2) A column for the Best Team of the project
selected by Discipulus; and (3) A column for evolved programs and
teams selected by you called the "Selected Program" column.
In addition, the Data Window lets you save the predicted outputs from
evolved programs and teams created by Discipulus for use outside
Discipulus.
• The Data Window is always open in Discipulus. You can just find it
on your screen and click on it; or
* The Best Team Output column contains the evolved team output
for the Best Team of the project, as selected by Discipulus.
The "Save" button (highlighted in red) lets you save data from the
current tab in the Data Window to text files.
• To select Chart View, click Chart in the lower part of the Data
Window;
* The "Best Team Output" column contains the evolved team output
for the Best Team of the project, as selected by Discipulus.
The meaning of the Best Program, Best Team, and Selected Program
columns has been explained elsewhere.
applied outputs fall outside that range. They are displayed with a value
above one or below zero, as appropriate.
• Save all Files at Once. This option lets you save training, validation
and applied data with a single click to the "Folder" and "File Name"
you designate.
• The Training, Validation, and Applied check boxes let you save
each data file to a separately designated file and location.
• Save Column Titles will make the first row of the output file the
column titles shown in the spreadsheet view of the Data Window.
Unchecked, this option saves just the numeric data.
• Append to file adds the data you are about to save to an existing file.
• Include Indices will include the RI (Row Index) and ID (User Index)
columns if they exist.
• Save Selected Columns means that only the columns you selected
before opening this window will be saved.
• All means that all columns in the spreadsheet view will be saved.
• Save Outputs means that just the target and predicted output columns
will be saved.
Here is how you may monitor the various programs and teams from the
Data Window:
The Chart Selection Box lets you control which of these three output(s)
is displayed in the Chart View of the Data Window. Figure 61 shows the
entire Data Window including the Chart Selection Box. Figure 63 is a
blow up of the Chart Selection Box:
• Make sure you are in the Chart View of the Data Window (from the
Window Menu, select Data; then check the Chart Button at the
bottom of the screen);
• Scroll to the top of the Chart Selection Box (see Figure 63);
• Click on the box labeled V0 at the top of the Chart Selection Box.
You will see a new line appear on the chart. This new line shows you
the values of the first input variable from your data files, which
Discipulus calls V0. You may repeat this step for your other inputs, if
there are other inputs in your data files.
You may control the project level parameters and many of those single
run target parameters from the Advanced Options Window.
You may open the Advanced Options window in one of two ways:
• From the Discipulus Project Setup Wizard, select Options. See Using
the Project Setup Wizard on page 36.
1. Choose between Stepping Mode and Fixed Mode for your project
(these modes determine how Discipulus sets the duration of each
run in your project);
5. Set the target parameters for the individual runs and determine
whether each run’s parameters will be randomized.
For more information about setting the target parameters for single runs
in a project, see The Single Run Advanced Options Window on
page 113.
• How to Open the Single Run Advanced Options Window on page 113;
• How the Single Run Parameters Affect a Project on page 114; and
• How to Use the Single Run Advanced Options Window on page 115.
The Single Run Advanced Options Window will pop up. It looks like
Figure 65:
• If you do not randomize any run level parameters, then the resultant
Discipulus project will consist of many runs using the exact
parameters set in the Single Run Advanced Options page.
* Best ROC Curve then Cost Fitness Function for Ranking Problems
on page 185
Check each parameter you want to have randomized. Discipulus will use
the target value set in the Single Run Advanced Options page for that
parameter (see The Single Run Advanced Options Window on page 113)
and will randomize run parameters during your project around that
target value.
By default, the random seed for parameter randomization is set from the
system clock. This is a different random number generator from the one
used for the project.
Interactive Evaluator
Interactive Evaluator is a powerful software engineering tool that goes
to work after a project is over. It lets you look at evolved programs, edit
them, simplify them, optimize them, and then explore the effects of your
changes. Some of the highlights of the Interactive Evaluator are:
• Manual and automatic editing features let you simplify your evolved
programs (see Editing a Program in Interactive Evaluator on
page 131, Automatic Intron Removal in Interactive Evaluator on
page 144, and Automatic Simplification in Interactive Evaluator on
page 145);
• The Performance Box lets you track the effect of your edits on
fitness, hit-rate and program statistics (The Performance Box in
Interactive Evaluator on page 130);
Of course, your use of these options, like all the other features of
Discipulus, must be consistent with the Discipulus License Agreement
that applies to your version of Discipulus.
After you open it, Interactive Evaluator looks like Figure 67 for
regression problem types.
You will note that the statistics reported in Interactive Evaluator in the
upper right hand corner vary depending on the problem type.
• Target Output vs. Predicted output plots and statistics, including Q-Q
plots
• Confusion Matrices
• ROC Curve
• ROC Curve
• Pseudo-ROC Curve
Figure 69 and Figure 70 illustrate how the use of the Back and Forward
Queue buttons change which program from the Queue is displayed in
the Program Body Window. (The shaded boxes show which program
from the Queue is displayed in the Program Body Window.)
Figure 69. The Program Queue, Before and After You Click on
the Back Queue Button
Figure 70. The Program Queue, Before and After You Click on
the Forward Queue Button
fitness (and for classification problems, the hit-rate) for that program. If
you are using a large training set, you may find that the pauses are quite
noticeable because the fitness calculation is time consuming.
You will see the changes in fitness (and hit-rate) among the different
programs in the Queue as you browse through the Queue. These fitness
and hit-rate values appear in the Performance Window (located in the
upper right portion of the Interactive Evaluator Window).
Regardless of how you load a program, Discipulus adds the new program
to the end of the Queue.
As you make such changes, two types of shifts will occur in the Queue.
Which of these shifts occurs depends on where you are in the Program
Queue when you make such a change: For example, Changing a
Program from the End of the Interactive Evaluator Program Queue on
page 125 is different than Changing a Program from the Beginning or
Middle of the Interactive Evaluator Program Queue on page 126.
When you make changes to the program that is at the end of the Queue,
the new program(s) you just created is appended to the end of the Queue
and the program that was in the Program Body window before you
made the changes is moved toward the beginning of the Queue.
Figure 71, Figure 72, and Figure 73 illustrate how Discipulus changes
the Queue when you make changes to the program at the end of the
Queue. (The shaded box shows which program from the Queue is
displayed in the Program Body Window.)
Figure 71. The Queue, Just after You Edit, Remove, or Add an
Instruction to the Last Program in the Queue
Figure 72. The Queue, Just after You Optimize the Constants in
the Last Program in the Queue
Figure 73. The Queue, Just after You Simplify the Last Program
in the Queue
Figure 74 illustrates how this works when you edit a program that is in
the middle of the Queue. (The shaded boxes show which program from
the Queue is displayed in the Program Body Window.)
Figure 74. The Queue, Before and After You Edit a Program
from the Middle of the Queue
• Choose where and under what name to save the current
program in the Program Body Window;
• Click OK.
Saved program files are named: “*.ind.” You may later reload this
program into Interactive Evaluator. (See Loading a Saved Program on
page 129.)
• The saved assembler file is Intel 486 assembler. It will not run on
machines that are not compatible with a Pentium Pro or higher chip.
• Browse to and highlight the program you wish to load. It will have
the file extension "*.ind."
• Click OK.
The evolved program you have selected will appear in the Program
Body Window in Interactive Evaluator.
Doing this will cause the appropriate fitness figures and statistics to be
displayed in the Performance Box. The figures displayed will depend on
the problem type in your project (see Figure 67):
In addition, the change in fitness and other statistics between the
previous time you clicked Run and the most recent time you clicked Run
is shown in parentheses. This figure appears in red if the fitness
(or hit-rate) has gotten worse and in black if it has improved.
As you browse through the Queue, you will also notice numbers in
parentheses below the fitness values displayed. These numbers represent
the change in fitness and other reported statistics between the current
evolved program displayed and the previous evolved program displayed.
As you browse back and forth through the Queue, you may notice a
short pause before the next program displays. This is to permit
Discipulus to calculate the fitness (and for classification problems, the
hit-rate) for the new program displayed in the Queue. (If you are using a
very large training set, you may find that the pauses are quite noticeable
because the fitness calculation is time-consuming.)
• Highlight the line of code in the Program Body Window just above
the place you want the new code to appear;
• Click Add;
• The Edit Instruction Box pops up. In it, you may select among all
available instructions and set their parameters (for more information
about selecting instructions and parameters, see Selecting Among
Available Instructions in Interactive Evaluator on page 134); and
• Click OK.
• Click on the Edit button (or you may replace these first two steps by
double clicking on the line of code you wish to edit);
• The Edit Instruction Box pops up. In it, you may select among all
available instructions and set their parameters. Select the instruction
you want and choose the parameter(s) you want (for more
information about selecting instructions and parameters, see Selecting
Among Available Instructions in Interactive Evaluator on page 134);
and
• Click OK.
The line of code you just selected will replace the original line of code.
You can find out more about how to use the Queue in the section
entitled Using the Interactive Evaluator Program Queue on page 123.
• Click OK.
The instruction you just selected will now be shown in the Program
Body Window.
in the upper part of the Edit Instruction Box. (See Figure 75.) The
Discipulus Owner's Manual contains extensive documentation of the
available instructions, what they are and what function they perform
(see Instruction Set Reference on page 215), and of the types of
instructions you may encounter in this box (see Types of Instructions
and Types of Parameters in Interactive Evaluator on page 136).
• You may type a real number into the Parameter Box and click on OK.
The real value you typed in will become the constant parameter for the
instruction. The Instructions that accept a constant as a parameter are:
• FADD constant;
• FDIV constant;
• FMUL constant;
• FSUB constant.
You may find detailed information about each of these instructions and
how they work. See Instruction Set Reference on page 215.
The instructions that accept an input from your data set as a parameter
are:
• FADD [ESI+%d1];
• FDIV [ESI+%d1];
• FSUB [ESI+%d1].
f[1]+=f[0];
• NOP;
• F2XM1;
• FABS;
• FCHS;
• FCOS;
• FDECSTP;
• FINCSTP;
• FLDZ;
• FPREM;
• FSCALE;
• FSIN;
• FSQRT;
• JB EPI+6; and
• JNB EPI+6.
Here are some examples of how you can combine structural changes
with constant optimization to simplify and improve your evolved
programs in Interactive Evaluator:
f[0]+=Input024;
with the equivalent instruction that contains a constant (you choose the
constant value to insert). For example:
f[0]+=2.0.
Then click on the Optimize button. (The optimization may well change
the value of the constant, "2.0.") The effect of this change on fitness will
help you determine if Discipulus is using the input, Input024, in this
program in a useful way to predict output.
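The following Python sketch illustrates the idea behind this test on a toy program. It is an illustration only: the two-instruction program, the data values, and the helper names are hypothetical, and the closed-form optimize_constant function merely stands in for the Optimize button, whose actual algorithm is not described here. Fitness is assumed to be mean squared error, so lower is better.

```python
def mse(program, rows, targets):
    """Mean squared error of a program over the data (lower is better)."""
    errors = [(program(row) - t) ** 2 for row, t in zip(rows, targets)]
    return sum(errors) / len(errors)

def with_input(row):
    f0 = row["Input001"] * 2.0
    f0 += row["Input024"]          # original instruction: f[0]+=Input024;
    return f0

def make_with_constant(c):
    def program(row):
        f0 = row["Input001"] * 2.0
        f0 += c                    # replacement instruction: f[0]+=constant;
        return f0
    return program

def optimize_constant(rows, targets):
    """Best additive constant under squared error: the mean residual."""
    residuals = [t - row["Input001"] * 2.0 for row, t in zip(rows, targets)]
    return sum(residuals) / len(residuals)

# Hypothetical data in which Input024 genuinely helps predict the target.
rows = [{"Input001": 1.0, "Input024": 0.5},
        {"Input001": 2.0, "Input024": 1.5},
        {"Input001": 3.0, "Input024": 2.5}]
targets = [2.5, 5.5, 8.5]

c = optimize_constant(rows, targets)
fitness_with_input = mse(with_input, rows, targets)
fitness_with_constant = mse(make_with_constant(c), rows, targets)
```

Here the error rises once the input is replaced by even the best possible constant, which suggests the program is using Input024 in a useful way. If the error had stayed about the same, the input would be a candidate for removal.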
worse, you may still want to eliminate this instruction in the interest of
simplifying your solution.
For example, using the Remove, Edit and/or Add buttons in Interactive
Evaluator, you might replace the following sequence of instructions:
f[0]+=3.45678989;
f[0]*=.013457890;
f[0]+=f[0] – (equivalent to multiplying by 2);
f[0]/=3.45678989;
f[0]/=3.45678989;
Then click the Optimize button. (The optimization will probably change
the value of the two constants you just added.) You may end up with an
optimized and simplified program with fitness as good or better than
before.
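As a hedged illustration of why a sequence like the five instructions above can be shortened, the Python sketch below shows that the sequence as a whole is an affine function of f[0]. It can therefore be represented by one multiplication and one addition, each with its own constant. The names original, simplified, scale, and offset are hypothetical, and the sketch is not the Discipulus simplification procedure.

```python
A = 3.45678989
B = 0.013457890

def original(f0):
    """The five-instruction sequence, applied to the value in f[0]."""
    f0 += A
    f0 *= B
    f0 += f0        # equivalent to multiplying by 2
    f0 /= A
    f0 /= A
    return f0

# An affine function is fully determined by its values at 0 and 1.
offset = original(0.0)
scale = original(1.0) - offset

def simplified(f0):
    """Two-instruction equivalent: one multiply, one add."""
    f0 *= scale
    f0 += offset
    return f0
```

The two constants computed here reproduce the original sequence exactly (up to rounding). Running the optimizer afterward may move them to values that give better fitness.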
Here is an example of this process. Using the Remove, Edit and/or Add
buttons in Interactive Evaluator, replace the following complex
instruction:
f[0]=sqrt(f[0]);
with the following two instructions, (you choose the constant values to
insert):
f[0]*=1.000;
f[0]+=0.000.
The Simplify button can often reduce the size of an evolved program by
up to 90%, without losing fitness. Plus, the simpler program is
algebraically much simpler than the original program and executes
much faster. But automatic simplification is very computationally
intensive. You should be prepared to let your computer run only on the
simplification process over lunch (or even longer) when you click on the
Simplify Button.
• The last program in the Queue is the best program at the end of the
simplification process; and
The old program (the one from before you clicked on optimize) will be
moved to the preceding position in the Queue. Accordingly you can
return to the previous program (before the edit) by clicking on the
Queue Back button three times. Figure 73 illustrates this operation of
the Queue.
Here, you will find all of the parameters that are particular to Genetic
Programming together on one page.
Generally speaking, a run will take longer with a larger population. But,
also generally speaking, a larger population can solve more difficult
problems. One of the big advantages of Discipulus over other learning
systems is that Discipulus is fast enough to evolve very large
populations in realistic time frames.
You may access this parameter from the Genetic Programming Tab of
the Advanced Options Window (Figure 77). See Accessing Genetic
Programming Parameters on page 150.
Once the overall mutation rate is set, the particulars of the application of
the mutation operator are controlled by other parameters. Those
particulars are discussed in Advanced Mutation Parameters on
page 165.
You may access this parameter from the Genetic Programming Page of
the Advanced Options Window (Figure 77). See Accessing Genetic
Programming Parameters on page 150.
Once the overall crossover rate is set, the particulars of the application
of the crossover operator are controlled by other parameters. Those
particulars are discussed in Crossover in Genetic Programming on
page 204.
You may access this parameter from the Genetic Programming Page of
the Advanced Options Window (Figure 77). See Accessing Genetic
Programming Parameters on page 150.
The reproduction rate in a run is what is left over after the application of
the crossover and mutation operators. The reproduction rate may be
calculated (in percentages) as follows:
You may access the crossover and mutation parameters from the
Genetic Programming Page of the Advanced Options Window
(Figure 77). See Accessing Genetic Programming Parameters on
page 150.
You may access the Genetic Programming Demes parameters from the
Genetic Programming Tab of the Single Run Advanced Options Page.
The number of Demes may not exceed half the number of programs in
the population.
You may set this parameter either on the Genetic Programming Tab of
the Advanced Options Window (Figure 77) or on the Advanced Tab of
the Genetic Programming Page. See Accessing Genetic Programming
Parameters on page 150.
• Two programs are chosen from each of the selected Demes and the
better program from each Deme is selected for crossover;
• The selected programs from each Deme are crossed over. The
offspring of this crossover replace the two losers in the tournament.
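The two steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the Discipulus implementation: programs are represented as plain lists, fitness is an error value (lower is better), one_point_crossover is a hypothetical stand-in for the real crossover operator, and the choice of which offspring replaces which loser is also an assumption.

```python
import random

def tournament(deme, fitness, rng):
    """Pick two programs from a deme; return (winner_index, loser_index)."""
    i, j = rng.sample(range(len(deme)), 2)
    return (i, j) if fitness(deme[i]) <= fitness(deme[j]) else (j, i)

def one_point_crossover(a, b, rng):
    """Hypothetical crossover: swap tails at a random cut point."""
    cut = rng.randint(1, min(len(a), len(b)) - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def interdeme_crossover(deme_a, deme_b, fitness, rng):
    """Cross the tournament winners of two demes; offspring replace the losers."""
    win_a, lose_a = tournament(deme_a, fitness, rng)
    win_b, lose_b = tournament(deme_b, fitness, rng)
    child_a, child_b = one_point_crossover(deme_a[win_a], deme_b[win_b], rng)
    deme_a[lose_a] = child_a
    deme_b[lose_b] = child_b

# Toy demes and a toy error measure (smaller sum is better).
rng = random.Random(0)
deme_a = [[1, 1, 1, 1], [9, 9, 9, 9]]
deme_b = [[2, 2, 2, 2], [8, 8, 8, 8]]
toy_error = lambda program: sum(program)
interdeme_crossover(deme_a, deme_b, toy_error, rng)
```

After the call, each deme still holds its winner, and each loser has been replaced by an offspring that mixes material from both demes.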
You may set this parameter to any value from 0 to 100%. Generally
speaking, you should start a demes setup with no crossover and low
migration rates – migration on the order of 1% seems to work well. If
you set the migration rate too high, it effectively cancels out the effect
of having separate Demes.
You may set this parameter either on the Genetic Programming Tab of
the Advanced Options Window (Figure 77) or on the Advanced Tab of
the Genetic Programming Page. See Accessing Genetic Programming
Parameters on page 150.
You may set this parameter to any value from 0 to 100%. A low value,
from 0.1% to 10% is recommended. Generally speaking, you should
start a demes setup with no crossover and low migration rates –
migration on the order of 1% seems to work well. If you set the
migration rate too high, it effectively cancels out the effect of having
separate Demes.
You may set this parameter either on the Genetic Programming Tab of
the Advanced Options Window (Figure 77) or on the Advanced Tab of
the Genetic Programming Page. See Accessing Genetic Programming
Parameters on page 150.
Advanced Options
NOTE: The default settings for a Discipulus project work quite well for
most projects. In fact, a Discipulus project automatically sets,
randomizes and optimizes the parameters for all runs in that project.
Thus, the matters covered in this chapter should be considered
advanced subject matters that most users need not consider.
Discipulus has many advanced features. You may access them in the
Single Run Advanced Options Window. The following topics document
all features of Discipulus that have not been discussed elsewhere:
• The Advanced Options Window will pop up. Then click on the Set
button.
DSS may not be enabled when you have written your own custom
fitness function.
DSS chooses the Training Subset from the overall set of training
examples (fitness cases) that you are using for your training data set. It
chooses the Training Subset using three criteria:
• Age. The "age" of the training example (that is, how long it has been
since a particular training example was used in a Training Subset).
• Randomly.
You may set the relative importance of the above criteria. Thus, if you
selected a "Target Subset Size" of 200 you could, for example, select the
elements of that subset in the following proportions: 20% (40 training
examples) by the age of the training example, 70% (140 training
examples) by the difficulty of the training example and 10% (20 training
examples) randomly. In practice, we find that 50% by age and 50% by
difficulty works quite well.
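The arithmetic in the example above can be checked with a small Python sketch. The function name subset_quota is hypothetical, and the sketch assumes whole-number percentages; it only shows how a Target Subset Size divides among the three criteria.

```python
def subset_quota(target_size, pct_age, pct_difficulty, pct_random):
    """Split a Training Subset of target_size examples among the criteria."""
    if pct_age + pct_difficulty + pct_random != 100:
        raise ValueError("percentages must sum to 100")
    by_age = target_size * pct_age // 100
    by_difficulty = target_size * pct_difficulty // 100
    # The random share takes the remainder so the total always matches.
    by_random = target_size - by_age - by_difficulty
    return by_age, by_difficulty, by_random

quota = subset_quota(200, 20, 70, 10)   # the example from the text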
DSS periodically discards the current Training Subset and selects a new
one. You may set the frequency with which this occurs.
You may find the DSS Page with the following menu and tab selections:
You are now in the DSS Page. Here are the parameters that you may use
to control the Dynamic Subset Selection Algorithm.
Practically speaking, this means that the longer it has been since a
training example was included in a Training Subset, the more likely it is
to be chosen in that portion of the next Training Subset that is chosen by
age.
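A minimal Python sketch of age-based selection follows. The text says only that older examples are more likely to be chosen; the specific weighting used here (weight equal to age plus one) and the function name pick_by_age are assumptions made for illustration, not the Discipulus algorithm.

```python
import random

def pick_by_age(ages, k, rng):
    """Draw k distinct example indices; older examples are more likely.

    ages[i] is how long it has been since example i was last in a
    Training Subset (in any consistent unit).
    """
    candidates = list(range(len(ages)))
    chosen = []
    for _ in range(min(k, len(candidates))):
        weights = [ages[i] + 1 for i in candidates]   # +1 keeps age 0 selectable
        pick = rng.choices(candidates, weights=weights, k=1)[0]
        chosen.append(pick)
        candidates.remove(pick)                        # no example is picked twice
    return chosen

rng = random.Random(1)
sample = pick_by_age([0, 3, 10, 50], 2, rng)
```

With these weights, an example that has waited 50 units is far more likely to be drawn than one that was used in the most recent Training Subset.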
Parsimony Pressure
Parsimony pressure is a term used to refer to techniques that tend to
make the evolved programs in Discipulus shorter--that is, more
parsimonious. Parsimony pressure causes “natural selection” in
evolutionary learning systems to favor the selection of shorter and more
compact evolved programs.
programs is less than a set percentage (you may set the percentage)
different from the average fitness of the two programs. If the difference
in the two programs’ fitness is less than that threshold percentage, then
parsimony is applied to that tournament by selecting the shorter of the
two programs as the tournament winner.
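The threshold rule described above can be sketched in Python as follows. This is a simplified illustration, not the Discipulus implementation: the function name tournament_winner is hypothetical, fitness is assumed to be a non-negative error value (lower is better), and program length is taken as the length of a list.

```python
def tournament_winner(prog_a, prog_b, fit_a, fit_b, threshold_pct):
    """Return the winner of a two-program tournament with parsimony pressure."""
    average = (fit_a + fit_b) / 2.0
    if average > 0:
        # Difference in fitness as a percentage of the average fitness.
        diff_pct = abs(fit_a - fit_b) / average * 100.0
    else:
        diff_pct = 0.0   # both errors are zero: treat the programs as equal
    if diff_pct < threshold_pct and len(prog_a) != len(prog_b):
        # Fitness is nearly the same, so the shorter program wins.
        return prog_a if len(prog_a) < len(prog_b) else prog_b
    # Otherwise the ordinary rule applies: the fitter program wins.
    return prog_a if fit_a <= fit_b else prog_b

short_program = ["a", "b"]
long_program = ["a", "b", "c", "d"]
near_tie = tournament_winner(long_program, short_program, 0.100, 0.101, 5.0)
clear_win = tournament_winner(long_program, short_program, 0.10, 0.20, 5.0)
```

In the near tie, the shorter program wins even though its error is slightly higher. When the fitness gap exceeds the threshold, the longer but clearly fitter program wins.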
Ratio of Constants/Inputs
This parameter sets the relative weight accorded to constants and to
inputs during the initialization of the population and in the mutation
operator. A value greater than 50% results in a relatively larger use of
constants relative to inputs during evolution. A value less than 50%
results in the reverse.
Program Size
There are two parameters that control the size of the programs you
evolve using Discipulus. Program Size parameters are measured in
bytes. They represent the length of the body of the programs in the
population. For more information about how Discipulus programs are
constructed, see Population, Program, Instruction Block and Instruction
Reference on page 211.
Table 3.
Minimum Maximum
4 bytes Initial Program
Size
• Fill in the Program Size-Max Box either by typing a value into the
box or using the slider.
• On the Advanced Options Page, click Set. The Single Run Advanced
Options Page pops up;
* If you want to set the Random Seed for a run based on the system
clock, click on System Time.
Measuring Fitness
The Genetic Programming algorithm uses a “fitness function” to
determine which evolved programs survive and reproduce. The fitness
function used depends on what type of problem you want to solve. So, if
you have two classes and want to classify rows into them, you would
use a classification fitness function. If you wanted to rank them
(common in CRM and credit scoring), you could use a ranking fitness
function. If you want to predict numeric outputs, you could use a
regression or function fitting fitness function.
Not all fitness functions are available in all versions of Discipulus. The
advanced fitness function package comprises our new ranking and
logistic-regression fitness functions. They are available in the Enterprise
Plus version and as an upgrade to other versions of Discipulus.
In General:
• Best ROC Curve then Cost Fitness Function for Ranking Problems
on page 185
* Ranking. There are four ranking fitness functions: (1) Best ROC
Curve Fitness Function for Ranking Problems on page 184; (2)
Best ROC Curve (Compare) Fitness Function for Ranking
Problems on page 184; (3) Best ROC Curve then Cost Fitness
You can choose your problem type and fitness function in two ways:
The four problem types appear plus "Custom Fitness Functions." Select
the appropriate one for your problem and then follow the remaining
menus that open to configure that fitness function as you see fit.
In the Project Wizard, the "Select Problem Type and Fitness Function"
window appears after you have imported data. First, click on the
appropriate problem type in the "Select Problem Type" box. All
available fitness functions appear in the "Select Preset Fitness Function"
box. Select the one you prefer and set any necessary parameters for that
fitness function in the area to the right of that box. Discipulus will only
let you change parameters that are appropriate for your selected fitness
function.
• For ranking problems, Discipulus uses the "Best ROC Curve" fitness
measure described below (see Best ROC Curve Fitness Function for
Ranking Problems on page 184).
As you will see below, Discipulus gives you a good deal of power to
adjust and modify these default settings for fitness functions. Please see
Using the Project Setup Wizard on page 36.
To do so:
• A menu will open up that shows all problem types. Select a problem
type.
• A new menu will open up showing all available fitness functions for
that problem type. Select one.
• If, and only if, there are parameters to be set for that fitness function,
a window will open up and you can set the parameters there. If no
window opens up, you are done.
The next sections describe, respectively, the fitness function used for
function fitting problems and the different fitness functions used for
classification problems.
evolved program match your target output in the training data. The
closer the match, the more fit the evolved program.
The raw error for each training example is the difference between the
output of an evolved program and the target output from your training
data file. (See Training, Validation, and Applied Data on page 205.)
Discipulus has two ways to average the raw error – it uses either
“absolute” or “squared” error measurements. The squared error method
is the default method in Discipulus. Here is a description of the two
different ways in which Discipulus implements the measurement of
errors:
You may choose between Absolute Error and Minimum Squared Error
in the Project Wizard or from the Main Menu as described in Accessing
Fitness Measurement Parameters after the Project Wizard is Complete
on page 176.
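The two error measurements can be sketched in C as follows. This is a minimal illustration, not the Discipulus implementation: it assumes that fitness is the plain average of the per-example error over the training data, and the function names mean_absolute_error and mean_squared_error are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Average of |output - target| over n training examples (assumed form). */
double mean_absolute_error(const double *output, const double *target, size_t n)
{
    double sum = 0.0;
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        double raw = output[i] - target[i];   /* raw error for example i */
        sum += (raw < 0.0) ? -raw : raw;
    }
    return sum / (double)n;
}

/* Average of (output - target)^2 over n training examples (assumed form). */
double mean_squared_error(const double *output, const double *target, size_t n)
{
    double sum = 0.0;
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        double raw = output[i] - target[i];
        sum += raw * raw;
    }
    return sum / (double)n;
}
```

In both cases a smaller value means a closer match to the target output. The squared form penalizes a few large errors more heavily than many small ones.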
when you run classification problems. Instead, you are looking for high
accuracy of classification – that is, a high “hit-rate.”
Table 4 shows various combinations of values that you may use for the
target outputs for the two classes.
Then do three separate projects and make the classification based on the
results of the three separate evolved programs. Decomposing the
problem in this manner usually results in much better classification.
You may set that parameter in the Select Problem Type and Fitness
Function window of the Project Wizard or from the Main Menu as
described in Accessing Fitness Measurement Parameters after the
Project Wizard is Complete on page 176.
Discipulus uses the weights you set in the fitness function as follows.
Let $\text{weight}_{\text{neg}}$ be the weight you assign to negative examples. Let
$\text{weight}_{\text{pos}}$ be the weight you assign to positive examples. Let
$\text{Hit-Rate}_{\text{neg}}$ be the hit-rate for negative examples and let
$\text{Hit-Rate}_{\text{pos}}$ be the hit-rate for positive examples. Then the overall
hit-rate for the purpose of calculating fitness is determined as follows:
$$\text{Hit-Rate}_{\text{weighted}} = (\text{Hit-Rate}_{\text{pos}} \times \text{weight}_{\text{pos}}) + (\text{Hit-Rate}_{\text{neg}} \times \text{weight}_{\text{neg}})$$
Note: If you are using Dynamic Subset Selection (DSS) with the difficulty parameter set to
at least 50%, you probably will not need to adjust for the difference between the
number of positive and negative examples. (See Dynamic Subset Selection on
page 158.)
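The weighted hit-rate calculation above can be written out in C as a short sketch. The function names hit_rate and weighted_hit_rate are hypothetical, and the sketch assumes that each hit-rate is simply the fraction of examples of that class that were classified correctly.

```c
#include <assert.h>

/* Fraction of examples in one class that were classified correctly. */
double hit_rate(int hits, int examples)
{
    assert(examples > 0 && hits >= 0 && hits <= examples);
    return (double)hits / (double)examples;
}

/* Weighted sum of the positive and negative hit-rates. */
double weighted_hit_rate(double hit_rate_pos, double weight_pos,
                         double hit_rate_neg, double weight_neg)
{
    return (hit_rate_pos * weight_pos) + (hit_rate_neg * weight_neg);
}
```

For example, with a positive hit-rate of 0.5, a negative hit-rate of 1.0, and both weights set to 0.5, the weighted hit-rate is 0.75. Raising the positive weight makes mistakes on positive examples count for more.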
Our four new ranking fitness functions blend Area under the Curve of a
ROC curve and minimum cost to provide four innovative fitness
functions. With these fitness functions, you can harness Genetic
Programming to solve the exact problem you have, instead of trying to
shoehorn classification or regression to do ranking.
1. If we are 95% confident that Program One has a better ROC curve
AUC than Program Two, Program One wins the tournament.
2. If we are 95% confident that Program Two has a better ROC curve
AUC than Program One, Program Two wins the tournament.
The only parameter for this fitness function is the Confidence Level.
You may set this parameter in the Select Problem Type and Fitness
3. If neither of the above is true, we calculate the Cost for each of
the programs at every decision threshold along the ranking, given
the Cost of a False Negative and the Cost of a False Positive. The
program that has the lowest cost at any threshold is the tournament
winner.
You may set these parameters in the Select Problem Type and Fitness
Function window of the Project Wizard or from the Main Menu as
described in Accessing Fitness Measurement Parameters after the
Project Wizard is Complete on page 176.
given the Cost of a False Negative and the Cost of a False Positive,
Discipulus calculates the Cost at each decision threshold in the ranking.
The fitness is the minimum cost across all decision thresholds.
You may set these parameters in the Select Problem Type and Fitness
Function window of the Project Wizard or from the Main Menu as
described in Accessing Fitness Measurement Parameters after the
Project Wizard is Complete on page 176.
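The minimum-cost calculation described above can be sketched in C. This is an illustration under stated assumptions, not the Discipulus code: it assumes that a higher program output ranks an example as more likely to be positive, that each distinct output value (plus one threshold above all outputs) is tried as a decision threshold, and that the cost at a threshold is the number of false negatives times the Cost of a False Negative plus the number of false positives times the Cost of a False Positive. The function name min_cost is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Minimum total cost over all decision thresholds along the ranking.
   score[i] is the program output, label[i] is 1 (positive) or 0 (negative). */
double min_cost(const double *score, const int *label, size_t n,
                double cost_fn, double cost_fp)
{
    size_t positives = 0;
    double best;
    assert(n > 0);

    /* Threshold above every score: all examples are called negative. */
    for (size_t i = 0; i < n; i++)
        if (label[i]) positives++;
    best = (double)positives * cost_fn;

    /* Try each score as the threshold: score >= threshold means positive. */
    for (size_t t = 0; t < n; t++) {
        size_t fn = 0, fp = 0;
        for (size_t i = 0; i < n; i++) {
            int predicted_positive = score[i] >= score[t];
            if (predicted_positive && !label[i]) fp++;
            if (!predicted_positive && label[i]) fn++;
        }
        double cost = (double)fn * cost_fn + (double)fp * cost_fp;
        if (cost < best) best = cost;
    }
    return best;
}
```

The double loop keeps the sketch short. Sorting the examples once by output would give the same result faster on large data sets.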
The outputs of the program are transformed using the logistic transform
for each row:
We then interpret P for class one as a probability and calculate the log-
likelihood for the output of the program, given the target outputs.
These probabilities are output to the three Probability of Class One rows
in the Data Window.
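For reference, the standard form of the logistic transform and of the log-likelihood is sketched below. The symbols are assumptions introduced here for illustration: $o_i$ is the program output for row $i$, $t_i$ is the target output (1 for class one, 0 for class zero), and $N$ is the number of rows. Whether Discipulus scales or offsets the program output before the transform is not stated in this section.

```latex
P_i = \frac{1}{1 + e^{-o_i}}
\qquad
\ln L = \sum_{i=1}^{N} \bigl[\, t_i \ln P_i + (1 - t_i) \ln (1 - P_i) \,\bigr]
```

A program output of zero gives a probability of 0.5, large positive outputs approach 1, and large negative outputs approach 0. A higher log-likelihood means the probabilities agree more closely with the target outputs.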
Although a full treatment of custom fitness functions is beyond the scope
of this Manual, materials are available for those who wish to use this
advanced feature. Those materials are found in the Custom Fitness
Function folder installed with any version that has the Custom Fitness
Function capability in a file called:
Discipulus_Custom_Fitness_Functions_Interface.pdf
Using these two vectors, you calculate a fitness for the evolved program
in your DLL. Then you return that calculated fitness to Discipulus.
Fitness must be smaller as the program gets better.
This fitness function lets you interact with real systems and perform
other specialized fitness measures not possible if Discipulus handles
your data.
f[0] -= Input001;
f[0] -= 3.12345;
In this example, the computation variable and the constant are terminals
while subtraction is the operator from the function set.
• Addition;
• Subtraction;
• Square Root;
• Absolute Value.
You can control the function set in your runs by using the Instruction
Set Box on the Instruction Page of the Advanced Options Window to
designate the function set in great detail. See: Choosing the Function Set
on page 197 and Weighting the Function Set on page 198.
The Terminal Set Defined. By itself, an operator from the function set
(such as addition) is useless. An addition operator must have values to
add together and a place to put the sum. The function set operators,
therefore, cannot act alone. They must have values to operate upon.
The terminal set for a run is made up of the values on which the
function set operates. For example, it includes the values that the
addition operator adds together.
Inputs (Example: Input001, Input002. . .). These are just the inputs from
your data file. Discipulus calls the first input column from your data file
Input001, the second Input002, and so forth. Note that if you import
data to Discipulus using Notitia, Discipulus will use the column names from
your files instead of "Input001." Here, we use the convention that those
input columns have been named by Discipulus as "Input001" etc.
You control how many and which constants are available from the
Program Size and Constants Page of the Advanced Options Window.
See The Terminal Set: Configuring Constants on page 192.
You are now on the Program Size and Constants Tab, which is shown in
Figure 84. All configuration of constants is done from here.
The Constant List Box displays the constants that Discipulus will use as
terminals in the next run. The Constant List Box appears to the left of
the Randomize Constants Box on the Program Size and Constants Page.
There are three ways to change the constants in the terminal set:
You may remove a constant from the Constant List Box by highlighting
the constant in the Constant List Box and clicking on the Remove
Button in the Program Size and Constants Tab (Figure 84).
The number of constants in the Constant List Box may not be greater
than sixty-four minus the number of inputs in your training data set.
• The Single Run Advanced Options Window pops up. Click on the
Instructions Tab;
l13: f[0]*=f[0];
This line of code takes the value in the temporary computation variable,
f[0], squares it and places the result back into f[0].
The Instruction Set Box appears in the lower left hand corner of the
page that you see when you make these selections.
For example, in the Addition group, you will find that there are three
different addition type instructions available. By way of example, the
first two addition instructions listed in the Addition Group perform the
following simple addition operations:
The Instruction Set Box then appears in the lower left hand corner of the
page that you see when you make these selections.
General Reference
This section contains the following reference materials:
• Data Files for Direct Text File Import Reference on page 204
• Block Mutation;
• Data Mutation.
Block Mutation
Instruction Mutation
Data Mutation
Ratio of Constants/Inputs
Discipulus learns and from which you may evaluate the quality of the
programs that Discipulus has evolved.
You may load data files from the Project Setup Wizard. See Starting the
Project Setup Wizard on page 35 and Using the Project Setup Wizard on
page 36:
Practice Note: Discipulus will not allow you to start a run unless you
have loaded both a training and a validation data set.
Practice Note: Discipulus will allow you to start a run without applied
data loaded. Applied data may be loaded at any time, before or
after a run, using File, Load New Applied Data.
The "Data" subdirectory contains sample training and validation data set
files for the user to experiment with.
Table 5.
Input 1    Input 2    Output
2.0        4.0        6.0
3.1        5.0        8.1
1.3        3.2        4.5
Note: The lines and column labels would not appear in a Discipulus data file.
They appear in the above table only for clarity.
would produce the output column from the input columns in this table.
Discipulus would evolve this trivially simple program from the above
training data set almost immediately.
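As an illustration, each Output value in Table 5 is the sum of the two inputs in its row, so a two-line program is enough to reproduce the table. The sketch below is written in the style of the C pseudocode used in this manual. The function names evolved_program and check_table5 are hypothetical, and the program shown is inferred from the table values rather than taken from an actual Discipulus run.

```c
#include <assert.h>

/* A candidate program for Table 5: Output = Input001 + Input002. */
double evolved_program(double input001, double input002)
{
    double f[8] = {0.0};   /* all eight computation variables start at zero */
    f[0] += input001;
    f[0] += input002;
    return f[0];
}

/* Checks the program against every row of Table 5 (small tolerance
   because 3.1 and 1.3 are not exactly representable in binary). */
void check_table5(void)
{
    static const double rows[3][3] = {
        {2.0, 4.0, 6.0},
        {3.1, 5.0, 8.1},
        {1.3, 3.2, 4.5}
    };
    for (int i = 0; i < 3; i++) {
        double diff = evolved_program(rows[i][0], rows[i][1]) - rows[i][2];
        assert(diff < 1e-9 && diff > -1e-9);
    }
}
```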
• What Are Training, Validation and Applied Data Files? on page 206;
To view the training file, click on the Training Tab in the data window.
The validation file should contain examples that are of the same type
and structure as the training examples and that comprise a good
Discipulus will not run until you have loaded both training and
validation files. If you do not want to use a separate validation file, just
load the training file as both the training and the validation data.
Discipulus will run this way just fine.
Discipulus will not train on the examples in the validation file. That is,
Discipulus will not use the examples in the validation file as part of the
fitness function used for natural selection. Instead, it will use the
validation data to provide information to you on how well the programs
evolved by Discipulus will work on data they did not train on.
(Validation is an essential step in automatic learning and is discussed in
greater detail in Chapter 1 and Section 8.5 of Banzhaf, Nordin, Keller
and Francone, Genetic Programming, An Introduction (1998).)
The applied data file should contain examples that are of the same type
and structure as the training examples and that comprise a good
representative set of samples from the learning domain. The only
exception is that applied data may, but is not required to, contain a
column for the Target Output.
To view the applied data file, click on the Applied Tab in the Data
Window.
• Data files must be ASCII text files. You may create ASCII files using
Word Pad (this is a utility program that comes with Windows 95/98/
• Each row in the training, validation, and applied data files must be
a separate "example" that contains both inputs and one projected
output.
• The columns of data in the training, validation, and applied data files
must be separated by a tab or a space on each row.
• The output data that you want to have Discipulus learn must be in the
rightmost column.
• The training, validation, and applied data files must have the same
number of columns of data in each row and must have two or more
rows and two or more columns of data.
• Every value in the training, validation, and applied data files must be
an integer or a real number.
• You should not put any non-printing characters at the end of a line or
the end of a file. For example, do not leave extra spaces or tabs at the
end of a line.
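Several of the rules above can be checked mechanically before you import a file. The C sketch below tests a single row: values separated by tabs or spaces, every value numeric, at least two columns, and no spaces or tabs at the start or end of the line. The function name count_columns is hypothetical and this is not a Discipulus utility. It relies on the standard strtod function, which also accepts forms such as "inf" or hexadecimal numbers, so it is somewhat more permissive than the rules require.

```c
#include <assert.h>
#include <stdlib.h>

/* Returns the number of numeric columns in one row of a data file,
   or -1 if the row breaks one of the formatting rules. */
int count_columns(const char *row)
{
    const char *p = row;
    int cols = 0;
    assert(row != NULL);

    if (*p == '\0') return -1;                  /* empty row */
    for (;;) {
        char *end;
        if (*p == ' ' || *p == '\t') return -1; /* leading white space */
        (void)strtod(p, &end);
        if (end == p) return -1;                /* value is not a number */
        cols++;
        p = end;
        if (*p == '\0') break;                  /* clean end of row */
        if (*p != ' ' && *p != '\t') return -1; /* junk after the number */
        while (*p == ' ' || *p == '\t') p++;    /* skip the separator */
        if (*p == '\0') return -1;              /* trailing spaces or tabs */
    }
    return (cols >= 2) ? cols : -1;             /* need two or more columns */
}
```

To check a whole file, call the function on each line (with the line ending removed) and confirm that every row returns the same positive count and that there are at least two rows.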
• 1.0
• 100
• 2345.67
Here are some values that will not read into Discipulus:
Then create a text file from your spreadsheet as follows. In Excel, make
the following menu selections:
• Then in the dialog box that pops up, you should select Text Only
from the Save As Type Box and name the text file you want to create.
This procedure will create a properly formatted text file that may be
read directly into Discipulus.
Three files are included in your Data Folder, one for training, one for
validation and one for testing.
The Gaussian Classes. For the Gaussian problem, the classes are
generated mathematically. Class zero has eight inputs all with normal
distribution with zero mean and standard deviation equal to one. Class
one likewise has eight inputs, all with normal distributions with zero
mean but with a standard deviation equal to two. Thus, the two classes
overlap considerably (in technical terms, they are linearly inseparable),
making it difficult to distinguish among different class members.
For more information about this data set, see ESPRIT Basic Research
Project Number 6891, ELENA, Enhanced Learning for Evolutive
Population
In genetic programming, the population is a collection of computer
programs on which the learning algorithm operates. The smallest
population possible in Discipulus has five programs. The maximum
population size is limited only by the RAM in your computer.
Program
In Genetic Programming, the term “program” refers to a computer
program that is subject to learning. In Discipulus, the program is a
native machine code function that runs directly on the floating point
processor unit. In this manual, these native machine code functions are
referred to as “programs” or “evolved programs.”
The Header
In Discipulus, the program’s header initializes the floating point unit
(using the FINIT instruction) and then loads the value of zero into each
of the eight FPU registers.
The Footer
In Discipulus, the footer of a program contains instructions that
“tidy-up” after program execution. The footer is followed by a return
instruction.
The Body
In Discipulus, the body of a program is where learning takes place. The
body of a program is comprised of Instruction Blocks which are in turn
comprised of Instructions.
Instructions
Instruction Blocks
Literature Reference
Banzhaf, W., Nordin, J., Keller, R.E. and Francone, F.D. (1998).
Genetic Programming – An Introduction To the Automatic Evolution of
Computer Programs and its Applications. Morgan Kaufmann, San
Francisco, CA, USA, and dpunkt.verlag, Heidelberg, Germany.
Sanchez and Canton (1995). Numerical Programming the 386, 486 and
Pentium. McGraw-Hill, New York, NY, USA.
This chapter describes the instructions that appear in the Instruction Set
Box on the Instruction Page of the Single Run Advanced Options
Window. You may find them as follows:
The Instruction Set Box appears on this page and is shown here as
Figure 85. It is organized into Instruction Groups. For example,
addition, arithmetic, and trigonometry are all Instruction Groups you
may use in the Instruction Set Box.
You can see and access the instructions that are contained in any
Instruction Group by clicking on the plus sign by the Instruction
Group’s name. The following topics provide additional information
about the various types of instructions that may be used in Discipulus
programs:
• Add two registers: See FADD ST(%r), ST(0) on page 218; and
C Code Description
This operator is equivalent to the following C pseudocode:
Assembler Description
This instruction adds the value in the top of the FPU stack (ST(0)) to the
value in variable FPU register designated as (%r). It places the sum into
the top of the stack (ST(0)). The value in %r is variable and is set during
evolution.
Stack Operation
None.
C Code Description
This instruction is equivalent to the following C pseudocode:
Assembler Description
This instruction adds the value in the top of the FPU stack (ST(0)) to the
value in variable FPU register designated as (%r). It places the sum into
the variable FPU register designated as (%r). The value in %r is variable
and is set during evolution.
Stack Operation
None.
FADD [ESD+%d1]
This instruction will put two different operators into your evolved
programs:
• The first adds f[0] to one of the inputs from your data file and places
the result into f[0];
• The second adds f[0] to one of the constants from the Terminal Set
and places the result into f[0].
C Code Description
The two operators referred to above are equivalent to the following lines
of C pseudocode in evolved programs:
f[0] is, of course, the temporary calculation register. The input will show
up in your evolved programs as Input001, Input002 . . . The constant
will show up as a real valued constant, such as 9.1234567.
Assembler Description
This instruction adds the value in the top of the FPU stack (ST(0)) to the
value of one of the inputs in your training data set or one of the
constants. It places the sum into the top of the stack (ST(0)). The value
in %d1 is variable (that is, which variable or which constant) and is set
during evolution.
Stack Operation
None.
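The three forms of FADD described in this group can be sketched as small C functions. The array f stands for the eight temporary computation variables f[0] through f[7], n is the register chosen during evolution, and value stands for whichever input or constant is chosen during evolution. The function names are hypothetical and are used here only for illustration.

```c
#include <assert.h>

/* FADD ST(0), ST(%r): add f[n] to f[0], result in f[0]. */
void fadd_st0_r(double f[8], int n)
{
    assert(n >= 0 && n < 8);
    f[0] += f[n];
}

/* FADD ST(%r), ST(0): add f[0] to f[n], result in f[n]. */
void fadd_r_st0(double f[8], int n)
{
    assert(n >= 0 && n < 8);
    f[n] += f[0];
}

/* FADD [ESD+%d1]: add an input (such as Input001) or a constant to f[0]. */
void fadd_mem(double f[8], double value)
{
    f[0] += value;
}
```

In an evolved program these appear as single lines such as f[0]+=f[1]; or f[0]+=Input001; rather than as function calls.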
FABS
This instruction takes the absolute value of f[0] and places the result
into f[0].
C Code Description
It is equivalent to this C pseudocode:
f[0]=fabs(f[0]);
Assembler Description
Takes the absolute value of the top of the FPU stack (ST(0)). It places
that absolute value back into the top of the stack (ST(0)).
Stack Operation
None.
FCHS
This instruction changes the sign of f[0] and places the result into f[0].
C Code Description
This instruction is equivalent to this C pseudocode:
f[0]=-(f[0]);
Assembler Description
Changes the sign of the value in the top of the stack register, ST(0).
Stack Operation
None.
FSCALE
This instruction multiplies f[0] by two raised to the power, f[1]. It then
places the result back into f[0].
C Code Description
It is equivalent to this pseudocode:
f[0]=f[0]*pow(2,f[1]);
Assembler Description
Calculates ST(0)*2^ST(1) and places the result into ST(0).
Stack Operation
None.
FSQRT
This instruction takes the square root of f[0] and places the result into
f[0].
C Code Description
This instruction is equivalent to the following C pseudocode:
f[0]=sqrt(f[0]);
Assembler Description
Takes the square root of ST(0) and places the result into ST(0).
Stack Operation
None.
C Code Description
This instruction is equivalent to the following C pseudocode:
cflag=(f[0]<f[n]);
Where cflag is a Boolean variable that can have only the values of 0 or 1
and where f[n] is the value in one of the n temporary computation
variables.
Assembler Description
Compares the contents of register ST(0) and ST(n) and sets the status
flags ZF, PF, and CF in the EFLAGS register according to the results.
Stack Operation
None.
C Code Description
This instruction is equivalent to the following C pseudocode:
Assembler Description
Tests the CF status flag and moves the source operand (ST(n)) to the
destination operand (ST(0)), if CF=1.
Stack Operation
None.
C Code Description
Equivalent C pseudocode is:
Assembler Description
Tests the CF status flag and moves the source operand (ST(n)) to the
destination operand (ST(0)), if CF=0.
Stack Operation
None.
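The comparison instruction and the two conditional moves described above work together, and the following C sketch shows them side by side. It follows the descriptions in this chapter: the comparison sets cflag to 1 when f[0] is less than f[n], and each conditional move copies f[n] into f[0] only when cflag has the stated value. The function names are hypothetical, and the handling of special values such as NaN is not covered.

```c
#include <assert.h>

/* Comparison: returns cflag, which is 1 when f[0] < f[n] and 0 otherwise. */
int fcomi(const double f[8], int n)
{
    assert(n >= 0 && n < 8);
    return f[0] < f[n];
}

/* Conditional move when cflag == 1 (CF=1): f[0] = f[n]. */
void fcmovb(double f[8], int n, int cflag)
{
    assert(n >= 0 && n < 8);
    if (cflag) f[0] = f[n];
}

/* Conditional move when cflag == 0 (CF=0): f[0] = f[n]. */
void fcmovnb(double f[8], int n, int cflag)
{
    assert(n >= 0 && n < 8);
    if (!cflag) f[0] = f[n];
}
```

Because cflag keeps its value until the next comparison, a single comparison can control several conditional instructions that follow it.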
JB EPI+6
This instruction causes the program to skip execution of the next
Instruction Block if the conditional flag (cflag) equals 1. (The
conditional flag is set by the Comparison Group instructions.)
C Code Description
A C code example follows. This code tests whether cflag=1. If it does,
the program skips over line 12:
Assembler Description
Tests the CF status flag and jumps program execution by 6 bytes if
CF=1.
Stack Operation
None.
JNB EPI+6
This instruction causes the program to skip execution of the next
Instruction Block if the conditional flag (cflag) equals 0. (The
conditional flag is set by the Comparison Group instructions.)
C Code Description
A C code example follows. This code tests whether cflag=0. If it does,
the program skips over line 12.
Assembler Description
Tests the CF status flag and jumps program execution by 6 bytes if
CF=0.
Stack Operation
None.
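The two conditional jumps can be sketched in C with a goto that skips one line. The line labels L11 to L13 and the addition on line 12 are illustrative only; in an evolved program the skipped Instruction Block is whatever instruction happens to follow the jump. The function names are hypothetical.

```c
#include <assert.h>

/* JB: skip the next Instruction Block when cflag equals 1. */
double run_with_jb(double f0, double f1, int cflag)
{
    assert(cflag == 0 || cflag == 1);
    if (cflag) goto L13;    /* L11: the JB instruction            */
    f0 += f1;               /* L12: skipped when cflag equals 1   */
L13:
    return f0;
}

/* JNB: skip the next Instruction Block when cflag equals 0. */
double run_with_jnb(double f0, double f1, int cflag)
{
    assert(cflag == 0 || cflag == 1);
    if (!cflag) goto L13;   /* L11: the JNB instruction           */
    f0 += f1;               /* L12: skipped when cflag equals 0   */
L13:
    return f0;
}
```

With f0 = 1 and f1 = 2, run_with_jb returns 1 when cflag is 1 and 3 when cflag is 0; run_with_jnb does the opposite.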
FXCH ST(%r)
The FXCH instruction swaps the values in f[0] and f[n]. This is an
important instruction in Register Machine configurations because it
allows the system to move values to and from the higher f[n] variables
for temporary storage while other calculations are performed in f[0].
C Code Description
The FXCH instruction is equivalent to the following C pseudocode:
tmp=f[0];
f[0]=f[n];
f[n]=tmp;
Assembler Description
Swap the values in ST(0) and ST(n).
Stack Operation
None.
• Divide one register by another; place the result in f[0]: See FDIV
ST(0), ST(%r) on page 225;
• Divide one register by another; place the result in f[n]; See FDIV
ST(%r), ST(0) on page 226;
C Code Description
This operator is equivalent to the following C pseudocode:
Assembler Description
This instruction divides the value in the top of the FPU stack (ST(0)) by
the value in the variable FPU register designated as (%r). It places the
quotient into the top of the stack (ST(0)). The value in %r is variable
and is set during evolution.
Stack Operation
None.
C Code Description
This instruction is equivalent to the following C pseudocode:
Assembler Description
This instruction divides the value in the variable FPU register
designated as (%r) by the value in the top of the FPU stack (ST(0)). It
places the quotient into the variable FPU register designated as (%r).
The value in %r is variable and is set during evolution.
Stack Operation
None.
FPREM
This operator causes an evolved program to calculate the remainder left
when f[0] is divided by f[1] and to place the result into f[0]. This
instruction is useful for periodic data.
C Code Description
This instruction is equivalent to the following C pseudocode:
f[0]=f[0]-((int)(f[0]/f[1])*f[1]);
Assembler Description
Computes the remainder obtained from dividing the value in the ST(0)
register (the dividend) by the value in the ST(1) register (the divisor or
modulus), and stores the result in ST(0). The remainder represents the
following value:
Stack Operation
None.
FDIV [ESD+%d1]
This instruction will put two different types of code into your evolved
programs:
• The first divides f[0] by one of the inputs from your data file and
places the result into f[0];
• The second divides f[0] by one of the constants from the Terminal
Set and places the result into f[0].
C Code Description
The two operators referred to above are equivalent to the following
lines of C pseudocode in evolved programs:
f[0] is, of course, the temporary calculation register. The input will show
up in your evolved programs as Input001, Input002. . . The constant will
show up as a real valued constant, such as 9.1234567.
Assembler Description
This instruction divides the value in the top of the FPU stack (ST(0)) by
the value of one of the inputs in your training data set or one of the
constants. It places the quotient into the top of the stack (ST(0)).
The value in %d1 represents which value is the divisor (that is, which
variable or which constant) and is set during evolution.
Stack Operation
None.
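The three forms of FDIV in this group can be sketched as small C functions. The function names are hypothetical. The operand order for the register forms follows the Intel definition of the instruction, which is an assumption about how Discipulus presents it: FDIV ST(0), ST(i) divides ST(0) by ST(i), and FDIV ST(i), ST(0) divides ST(i) by ST(0). Division by zero is not handled in this sketch.

```c
#include <assert.h>

/* FDIV ST(0), ST(%r): f[0] = f[0] / f[n]. */
void fdiv_st0_r(double f[8], int n)
{
    assert(n >= 0 && n < 8);
    f[0] /= f[n];
}

/* FDIV ST(%r), ST(0): f[n] = f[n] / f[0]. */
void fdiv_r_st0(double f[8], int n)
{
    assert(n >= 0 && n < 8);
    f[n] /= f[0];
}

/* FDIV [ESD+%d1]: divide f[0] by an input (such as Input001) or a constant. */
void fdiv_mem(double f[8], double value)
{
    f[0] /= value;
}
```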
C Code Description
This operator is equivalent to the following C pseudocode:
if (fabs(f[0])<1) f[0]=pow(2,f[0])-1;
Assembler Description
Calculates the exponential value of 2 to the power of the source operand
minus 1. The source operand is located in register ST(0) and the result is
also stored in ST(0). The value of the source operand must lie in the
range –1.0 to +1.0. If the source value is outside this range, the result is
undefined.
Stack Operation
None.
• Multiply two registers and place the result in f[0]. See FMUL ST(0),
ST(%r) on page 229;
• Multiply two registers and place the result in f[n]. See FMUL
ST(%r), ST(0) on page 229; and
C Code Description
This operator is equivalent to the following C pseudocode:
Assembler Description
This instruction multiplies the value in the top of the FPU stack (ST(0))
and the value in variable FPU register designated as (%r). It places the
product into the top of the stack (ST(0)). The value in %r is variable and
is set during evolution.
Stack Operation
None.
C Code Description
This instruction is equivalent to the following C pseudocode:
Assembler Description
This instruction multiplies the value in the top of the FPU stack (ST(0))
and the value in variable FPU register designated as (%r). It places the
product into the variable FPU register designated as (%r). The value in
%r is variable and is set during evolution.
Stack Operation
None.
FMUL [ESD+%d1]
This instruction will put two related operators into your evolved
programs:
• The first multiplies f[0] and one of the inputs from your data file and
places the result into f[0];
• The second multiplies f[0] and one of the constants from the
Terminal Set and places the result into f[0].
C Code Description
The two related operators referred to above are equivalent to the
following lines (one at a time) of C pseudocode in evolved programs:
f[0] is, of course, the temporary calculation register. The input will show
up in your evolved programs as Input001, Input002. . . . etc. Or, if you
name the input columns and use Notitia to import the data to Discipulus,
your input names will appear in the evolved programs. The constant will
show up as a real valued constant, such as 9.1234567.
Assembler Description
This instruction multiplies the value in the top of the FPU stack
(ST(0)) by the value of one of the inputs in your training data or one of
the constants. It places the product into the top of the stack (ST(0)).
The value in %d1 represents which value is the multiplier (that is, which
variable or which constant) and is set during evolution.
Stack Operation
None.
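The three forms of FMUL in this group follow the same pattern as the other arithmetic groups and can be sketched in C as shown below. The function names are hypothetical; value stands for whichever input or constant is selected during evolution.

```c
#include <assert.h>

/* FMUL ST(0), ST(%r): multiply f[0] by f[n], result in f[0]. */
void fmul_st0_r(double f[8], int n)
{
    assert(n >= 0 && n < 8);
    f[0] *= f[n];
}

/* FMUL ST(%r), ST(0): multiply f[n] by f[0], result in f[n]. */
void fmul_r_st0(double f[8], int n)
{
    assert(n >= 0 && n < 8);
    f[n] *= f[0];
}

/* FMUL [ESD+%d1]: multiply f[0] by an input (such as Input001) or a constant. */
void fmul_mem(double f[8], double value)
{
    f[0] *= value;
}
```

Calling fmul_st0_r with n equal to 0 squares f[0], which matches the f[0]*=f[0]; line of evolved code shown earlier in this manual.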
FINCSTP
This instruction increments the FPU stack pointer by 1. It makes no
changes to the contents of the registers.
• Subtract two registers and put the result in f[0]. See FSUB ST(0),
ST(%r) on page 232;
• Subtract two registers and put the result in f[n]. See FSUB ST(%r),
ST(0) on page 232; and
C Code Description
This operator is equivalent to the following C pseudocode:
Assembler Description
This instruction subtracts the value in the variable FPU register
designated as (%r) from the value in the top of the FPU stack (ST(0)). It
places the difference into the top of the stack (ST(0)). The value in %r
is variable and is set during evolution.
Stack Operation
None.
C Code Description
This instruction is equivalent to the following C pseudocode:
Assembler Description
This instruction subtracts the value in the top of the FPU stack (ST(0))
from the value in variable FPU register designated as (%r). It places the
difference into the variable FPU register designated as (%r). The value
in %r is variable and is set during evolution.
Stack Operation
None.
FSUB [ESD+%d1]
This instruction will put two related operators into your evolved
programs:
• The first subtracts one of the inputs from your data file from f[0] and
places the result into f[0];
• The second subtracts one of the constants from the Terminal Set from
f[0] and places the result into f[0].
C Code Description
The two related operators referred to above are equivalent to the
following lines of C pseudocode in evolved programs:
f[0] is, of course, the temporary calculation register. The input will show
up in your evolved programs as Input001, Input002, etc. Or, if you
assigned column names for your inputs and used Notitia to import the
data, your column names will be used in the evolved programs. The
constant will show up as a real valued constant, such as 9.1234567.
Assembler Description
This instruction subtracts an input or a constant from the value in the top
of the FPU stack (ST(0)). It places the result into the top of the stack
(ST(0)). The value in %d1 represents which value is subtracted (that is,
which variable or which constant) and is set during evolution.
Stack Operation
None.
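The three forms of FSUB in this group can be sketched in C as follows. The function names are hypothetical. The operand order for the register forms follows the Intel definition of the instruction, which is an assumption about how Discipulus presents it: the destination register is always the value that is subtracted from.

```c
#include <assert.h>

/* FSUB ST(0), ST(%r): f[0] = f[0] - f[n]. */
void fsub_st0_r(double f[8], int n)
{
    assert(n >= 0 && n < 8);
    f[0] -= f[n];
}

/* FSUB ST(%r), ST(0): f[n] = f[n] - f[0]. */
void fsub_r_st0(double f[8], int n)
{
    assert(n >= 0 && n < 8);
    f[n] -= f[0];
}

/* FSUB [ESD+%d1]: subtract an input (such as Input001) or a constant from f[0]. */
void fsub_mem(double f[8], double value)
{
    f[0] -= value;
}
```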
FCOS
This instruction calculates the cosine of f[0] and puts the result into f[0].
C Code Description
This operator is equivalent to the following C pseudocode:
f[0]=cos(f[0]);
Assembler Description
Calculates the cosine of the source operand in register ST(0) and stores
the result in ST(0).
Stack Operation
None.
FSIN
This instruction calculates the sine of f[0] and puts the result into f[0].
C Code Description
This operator is equivalent to the following C pseudocode:
f[0]=sin(f[0]);
Assembler Description
Calculates the sine of the source operand in register ST(0) and stores the
result in ST(0). The source operand must be given in radians.
Stack Operation
None.
Index
A
Addition Instruction Group 217
Advanced Options Window 114, 158
finding 157
Age 159
Analysis
evolved program 14
models 14
Arithmetic Instruction Group 219
Assembler 190
equivalent 196
mnemonics 197
Atom 189
Automatic programming
function set 189
terminal set 189
B
Best Evolved Program 110
Best Programs
deployment 69
Block Mutation
block rate 165
in genetic programming 203
Body 211, 212
C
C Code 190
saving for compilation into other programs 128
cflag 191, 221, 222, 223, 224
Chart Selection Box 111
Chart View 111
Classifications
three or more 180
Code
adding a line of 131
eliminating excess lines 142
Comparison Instruction Group 221
Comparison Instructions 191
Condition Instruction Group 222
Conditional Branching 191
Conditional Flags 191
Constant
optimizing
combining with manual simplification 141
Constants 167–170, 191–194
eliminating stacked 143
how Discipulus optimizes 139
inputting 193
letting Discipulus create 194
optimizing
how to 139
parameters 192
ratio of to inputs 167, 203
removing 194
weight 167
Crossover 152
advanced 165
frequency 152, 202
homologous 168, 204
in genetic programming 165, 202
non-homologous 168, 204
and program size 171
rate 152
Crossover rate 152
D
Darwinian Natural Selection 14
Data
Time Series 32
Data File
creating 33
creating with Microsoft Excel 33
example 30
general rule for splitting up data 32
Splitting between training, validation and applied 32
Data Files 28, 204
loading 204
opening 83, 204
order of examples 109
outputs 30
sorting 109
types of data 29
Data Mutation 202, 203
Data Transfer Instruction Group 224
Data Window 101
chart selection box 111
continuous output 109
displaying inputs 101
displaying outputs of best programs 101
displaying target output 101
training tabs 109
validation tabs 109
Deme
crossover percentage between 154
enabled/not enabled 154
migration rate between 155
number of 154
parameters 154–155
usefulness of 153
Deployment
from project file 69
Difficulty 159
Division Instruction Group 225
DSS
age 159
defined 159
difficulty 159
enabled 161
frequency of changing subset 162
random 159
selection by age 161
selection by difficulty 161
stochastic selection 161
target subset size 161
training subset 159
Dynamic Subset Selection. See also DSS 158–162
E
Error Measurements
linear 178
squared 178
Evolution 191, 192, 198
natural 168
speeding up 158
Evolved Program
analysis 14
best of run on training data 110
during reporting period 110
best of run on validation data 110
C code in 190
calculation variable in 195
cflag in 191
computation variable in 189
constants in 189, 191
display of 110
inputs in 190
line of code in 189
loading into interactive evaluator 128
outputs 111
saving for later use 128
selecting in chart selection box 111
temporary computation variables in 191
Evolved program
deployment 14
Evolved Programs
analysis 14
defined 14
deployment
programming interface 70
determining if two are tied 182
saving from interactive evaluator 127
Examples
assigning weights to 182
positive and negative 182
Exponential Instruction Group 228
F
FABS 219
FADD 217, 218
FCHS 220
FCMOVB 222
FCMOVNB 223
FCOMI 221
FCOS 234
FDECSTP 231
FDIV 225, 226, 227
File Menu 83
FINCSTP 231
Fitness Function
custom 187
DSS 158
hits-then-error 181
linear error 177
linear error measurement 177
overview 177
square of the error 177
squared error measurement 177
FMUL 229, 230
Footer 212
FPREM 226
FPU 195
how to get detailed information about 196
preset files 190
Frequency
crossover 152
in generation equivalents 162
mutation 152
Frequently Asked Questions 11
FSCALE 220
FSIN 234
FSQRT 221
FSUB 232, 233
Function Set 189
choosing 197
defined 190
weighting 198
FXCH 224
G
Genetic Programming
algorithm 201
deme 153
initial population 169
mutation frequency 152
parameters
crossover rate 152
maximum number of FPU registers in 195
mutation rate 151
population size 151, 152, 153
reproduction rate 153
reference 201
search operators 202
H
Header 212
Hit-Rate
definition 180
positive and negative 180
reporting of 181
Homologous crossover 168
I
Individual 211
Initial Population 167, 169
Initial Program Size 169
Input
sensitivity analysis 68
Input Impacts Tab 68
Inputs 190
detecting spurious 142
ratio of to constants 167, 203
weight 167
Installation 5
Instruction 203
mutation 203
mutation rate 166
ratio of constants/inputs 167
Instruction Block
crossover 204
homologous crossover 168
length 165
non-homologous crossover 204
reference 211
Instruction Data Mutation Rate 167
Instruction Group
addition 217
arithmetic 219
comparison 221
condition 222
data transfer 224
division 225
exponential 228
multiplication 229
rotate stack 231
subtraction 231
trigonometric 234
Instruction Rate Mutation Box 166
Instructions
choosing 133, 134
FABS 219
FADD 217, 218
FCHS 220
FCMOVB 222
FCMOVNB 223
FCOMI 221
FCOS 234
FDECSTP 231
FDIV 225, 226, 227
FINCSTP 231
FMUL 229, 230
FPREM 226
FSCALE 220
FSIN 234
FSQRT 221
FSUB 232, 233
FXCH 224
installation 5
JB 223
JNB 224
reference 215
that accept a constant 136
that accept a register 137
that accept an input 136
that have no parameters 138
types 136
Intel FPU Registers. See also FPU 196
Interactive Evaluator
adding a line of code in 131
calculating program fitness 129
default program load 129
editing program in 131
initial queue 123
loading a saved program 129
loading programs into 128
opening 121
performance box 130
running program on new (applied) data 130
saving evolved programs 127
saving programs for later use 128
viewing program outputs 130
Introns 144
J
JB 223
JNB 224
L
License Agreement 10
Linear Fitness Function 177
M
Main Window 81
menu bar 82
menus 82
status bar 82, 88
toolbar 82, 87
Maximum Program Size 170, 171
Menu Bar 82
Menus
set up learning 84
Microsoft WordPad 207
Minimum System Requirements 5
Model Building
analysis 57
steps 13
team models 23
Models
analysis 14
Monitor Project Window 91
current run tab 93
overview 40
overview tab 89
Multiplication Instruction Group 229
Mutation
in genetic programming 151
of data 167
of instruction blocks 165
of instructions 166
rate of constants/inputs 167
Mutation Rate
in genetic programming 151
N
Natural Selection 14
Non-homologous crossover 168, 171
O
Operator
as part of function set 189
complex 143
examples of 190
linear 143
register machine 199
replacing complex with linear 143
stack 199
using preset files to configure 190
Outputs
class one 180
class zero 180
classifying 179
continuous 109
controlling display of 110
of best evolved programs 101
target values 179
Overfitting
detecting 72
eliminating 72
how to address 71
P
Parameters
advanced 157
block mutation 165
choosing 133
data mutation 167
in custom DLL 188
initial program size 169
maximum program size 170
Program Size
and constants 191
initial 169
maximum 170
Project
continuing where you left off 45
defined 15
file 15
finish 14
information available while running 39, 40
runs included 15
starting 14, 33
starting with project setup wizard 39
when to stop a project 42
project detail tab 91
Project File
naming project file with the project setup wizard 39
Project Setup Wizard
starting 88
using 36
R
RAM 5
Random Seed
system clock 171
Ratio of Constants/Inputs 167, 203
Ratio of Constants/Inputs Box 167
Raw Error 178
Register Machine
operator 199
Removing Introns 144
Replacement
instruction block 203
instructions 203
Reporting Period
best of run on training data 110
continuous output 109
Reports Window 94
S
Saved Programs
deployment from interactive evaluator 70
Search Operator
block mutation rate 166
homologous crossover 169
in genetic programming 165
instruction data mutation rate 167
instruction mutation rate 166
Selection
by age 161
by difficulty 161
dynamic subset 158
stochastic 161
Set Up Learning Menu 84
Simplification
automatic 145
choosing standard or thorough 145
speeding up 145
Simulated Annealing
reference 204
Small Data Set 32
Sorting 112
Square Fitness Function 177
Stack
operator 199
Statistics Window 113
Status Bar 88
Stochastic Selection 161
T
Target Subset Size 161
Team Models
output
viewing graph of 67
viewing numeric outputs 67
viewing C, Java or Assembler Code 68
viewing performance statistics 65
Technical Support 10
Temporary Computation Variables
FPU register equivalents in 195
number of 195
role of 195
special role of 195
Terminal Set 189
cflag in 191
choosing 192
conditional flags in 191
constants in 191
defined 190
inputs 190
temporary computation variables in 191, 195
weighting 196
Testing Data 206
Testing File 206
Toolbar 82, 87
Tournaments
selection 162
Training
subset 159
tabs 109
Training Data
best of run on 110
defined 30
Training File
ASCII text in 207
creating 207
defined 206
using Microsoft Excel to create 207
Training Subset 159
Training Tabs 109
Trigonometric Instruction Group 234
V
Validation
tabs 109
Validation Data
best of run on 110
Validation File
ASCII text in 207
creating 207
defined 206
using Microsoft Excel to create 207
Validation Tabs 109
Viewing Inputs 101, 111
W
Weighting
function set 198
terminal set 196
Windows
data 101
main 81
statistics 113
Windows 2000 5
Windows 98 5
Windows NT 5