“Speed Matters”

Discipulus

with Notitia and Solution Analytics

Owner’s Manual
By Frank D. Francone

Genetic Programming, Data Preparation, and Graphic Analysis

Discipulus, Notitia, Solution Analytics, Speed Matters, and RML are trademarks of
Register Machine Learning Technologies, Inc.

Copyright 1998-2010 Register Machine Learning Technologies, Inc.

Information in this document is subject to change without notice. The software described in this document is
furnished under a license agreement. The software may be used and copied only in accordance with the terms
of that agreement. No part of this publication may be reproduced, stored in a retrieval system, or transmitted
in any form or by any means, electronic or mechanical, including photocopying and recording, for any purpose
other than the purchaser’s personal use without the written permission of Register Machine Learning
Technologies, Inc.

Discipulus, Notitia, Solution Analytics, Speed Matters, and RML are trademarks of Register Machine Learning
Technologies, Inc., Littleton, Colorado.

Register Machine Learning Technologies, Inc.

7606 S. Newland St. Littleton, CO 80128
www.rmltech.com

Logo design by Joann Kalmus-Leonard

Page layout by Frank Francone and William Metzger

Table of Chapters
Table of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii

Starting Up . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

Frequently Asked Questions . . . . . . . . . . . . . . . . . . . . . 11

Controlling Discipulus Projects . . . . . . . . . . . . . . . . . . . 75

Discipulus Window Workspaces . . . . . . . . . . . . . . . . . . 81

Interactive Evaluator . . . . . . . . . . . . . . . . . . . . . . . . . . 119

Advanced Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157

Measuring Fitness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173

Function And Terminal Sets . . . . . . . . . . . . . . . . . . . . 189

General Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237

Discipulus™ Software Owner’s Manual

Table of Contents
Starting Up . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

Discipulus Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Problem Types and Fitness Functions . . . . . . . . . . . . . . . . . . . . . . . . 3
Data Import and Preparation using Notitia . . . . . . . . . . . . . . . . . . . . 3
Evolved Program Graphic and Statistical Analysis using Solution
Analytics Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Sample Data Sets Available . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Minimum System Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Uninstall . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Enter an Activation Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Convert the Demonstration Version to a Purchased Version . . . . . . 8
Extend an Expiring or Expired License . . . . . . . . . . . . . . . . . . . . . . 8
Upgrade or Add-On to an Existing License . . . . . . . . . . . . . . . . . . . . 8
Deactivate or Move a License . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Technical Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Important License Agreement Reminder . . . . . . . . . . . . . . . . . . . . . 10

Frequently Asked Questions . . . . . . . . . . . . . . . . . . . . . 11

How Do I Use Discipulus to Build Models? . . . . . . . . . . . . . . . . . . 13

What Does Discipulus Do? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
How Does the Notitia Data Preparation Module Work with
Discipulus? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
What Does the Solution Analytics Visualization Module Do? . . . . 18

What Kinds of Problems Will Discipulus Handle? . . . . . . . . . . . . . 22

What is the Difference between a Project and a Run in
Discipulus? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
What Are "Program Models" and "Team Models?" . . . . . . . . . . . 23
How Do I Import Data into Discipulus? . . . . . . . . . . . . . . . . . . . . . 24
How Do I Transform, Clean-up, and Split My Data before I
Model with Discipulus? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
How Do I Find and Handle Outliers and Missing Values before I
Model with Discipulus? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Can I Get Data into Discipulus from Excel Files, Databases or
Windows Connections? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
What Kind of Data Does Discipulus Use? . . . . . . . . . . . . . . . . . . . . 26
How Do I Set-up Text Data Files for Direct Text File Import
Into Discipulus? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
What Types of Data May I Use in Data Files? . . . . . . . . . . . . . . . . 29
What Values Should I Use for the Target Outputs for
Classification Problems, Ranking Problems, and Logistic
Regression Problems? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
What Are Training, Validation, and Applied Data Files? . . . . . . . 30
What Does an Input Data File Look Like? . . . . . . . . . . . . . . . . . . . 31
How Should I Split My Data among the Training, Validation,
and Applied Data Files? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Is There a General Rule for Dividing Data? . . . . . . . . . . . . . . . . . . 32
What if I Have a Very Small Data Set? . . . . . . . . . . . . . . . . . . . . . 32
How Should I Split Time Series or Sequential Data? . . . . . . . . . . . 32

Is There an Easy Way to Create a Data File for Direct Text File
Import? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Does Discipulus Come with Sample Data Sets I Can Run? . . . . . . 33
How Do I Start a Project? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

Starting the Project Setup Wizard . . . . . . . . . . . . . . . . . . . . . . . . . 35

Using the Project Setup Wizard . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

What Should I Look for when a Project Is Running? . . . . . . . . . . 40

How Do I Make Runs in a Project Terminate? . . . . . . . . . . . . . . . . 41
How and When Do I Terminate a Project? . . . . . . . . . . . . . . . . . . . 42
Manual Project Termination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
Automatic Project Termination . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Guidelines for Project Termination . . . . . . . . . . . . . . . . . . . . . . . . 44

How Do I Continue a Project After I Have Stopped It? . . . . . . . . . 45

Can I View the Predicted Outputs of an Evolved Best Program
or a Best Team? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
How Do I View an Evolved Best Program Created by
Discipulus? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
How Do I Save an Evolved Best Program Created by
Discipulus? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
How Do I View an Evolved Best Team Created by Discipulus? . . . 52
How Do I Save an Evolved Best Team Created by Discipulus? . . . 54
How Do I Open the Interactive Evaluator Window? . . . . . . . . . . . 56
How Do I Graph and Analyze the Outputs of a Selected Best
Program Created by Discipulus? . . . . . . . . . . . . . . . . . . . . . . . . . 57
View Summary Best Program Statistics in the Best Programs
Tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
Use Interactive Evaluator to View Code, Simplify, Edit, and
Optimize a Best Program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
Use Solution Analytics to Graph and Analyze the Outputs of a
Best Program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
Use the Data Window to Graph the Predicted Outputs of Your
Best Program vs. the Target Outputs for the Best Program . . . 63
Use the Data Window to View the Numeric Predicted Outputs of Your
Best Program. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

How Do I Graph and Analyze the Outputs of a Selected Best
Team . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
Viewing Summary Statistics for the Best Team Models . . . . . . . . . 65
Viewing Graphic Analytics of the Output of a Selected Best
Team Model in Solution Analytics . . . . . . . . . . . . . . . . . . . . . . . 65
Viewing a Graph of the Outputs of a Selected Best Team Model
in the Data Window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
Viewing Numeric Outputs of a Selected Best Team Model in
the Data Window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
Viewing and Saving C, Java, or Assembler Code of a Selected
Best Team Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

Is There a Way to Find Out which Input Variables Are the Most
Important? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
How Do I Use the Models Discipulus Has Created? . . . . . . . . . . . . 69
How to Deploy Discipulus Models from within Discipulus . . . . . . . 69
How to Deploy Discipulus Models as Source Code . . . . . . . . . . . . 70

What Do Input001, Input002, etc. Represent in the Interactive
Evaluator? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
What Do f[0], f[1] Represent in Interactive Evaluator? . . . . . . . . . 71
What about Overfitting? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
How to Detect Overfitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
How to Eliminate Overfitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

How Do I Do a Single Run? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

Controlling Discipulus Projects . . . . . . . . . . . . . . . . . . 75

Where to Control Project-Level Settings . . . . . . . . . . . . . . . . . . . . . 76

Choosing Stepping Mode or Fixed Mode for Run Termination . . . 77
Setting the Stepping Mode Parameters . . . . . . . . . . . . . . . . . . . . . . 78
Setting the Fixed Mode Parameters . . . . . . . . . . . . . . . . . . . . . . . . . 78

Setting Run Randomization Targets (the "Set" Button) . . . . . . . . . 79

Setting which Run Parameters Randomize between Runs (the
"Randomize" Button) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

Discipulus Window Workspaces . . . . . . . . . . . . . . . . . . 81

Main Window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
The Main Window Menu Bar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
The Main Window Toolbar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
The Main Window Status Bar . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

The Project Setup Wizard Windows . . . . . . . . . . . . . . . . . . . . . . . . . 88

The Monitor Project Window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
The Overview Tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
The Project History Tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
The Project Detail Tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
The Current Run Tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

The Reports Window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94

The Best Programs Tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
The Team Solutions Tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
The Input Impacts Tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

The Data Window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

Opening the Data Window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
The Data Window in Chart View . . . . . . . . . . . . . . . . . . . . . . . . . 103
The Data Window in Spreadsheet View . . . . . . . . . . . . . . . . . . . 104
Switching between Chart View and Spreadsheet View in the
Data Window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
Best Program, Best Team, and Selected Program Output
Columns in Spreadsheet View of Data Window . . . . . . . . . . . 106
The Three "Probability of Class One" Columns in Spreadsheet
View for Classification and Regression Problems . . . . . . . . . . 106

The Three "Ranking" Columns in Spreadsheet View for
Ranking and Logistic Problems . . . . . . . . . . . . . . . . . . . . . . . . 107
Saving Data to File from the Spreadsheet View . . . . . . . . . . . . . 108
Copying Data from the Spreadsheet View . . . . . . . . . . . . . . . . . . 109
Making Chart View More Useful by Sorting Your Training and
Validation Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
Switching between Training, Validation, and Applied Data in
the Data Window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
Turning on Continuous Display in the Data Window . . . . . . . . . . 109
Refreshing the Data Window Manually . . . . . . . . . . . . . . . . . . . . 110
Controlling the Display of the Outputs of the Three Best
Evolved Programs in the Data Window . . . . . . . . . . . . . . . . . . 110
Controlling which Inputs the Data Window Chart Displays . . . . . 111
Excluding Inputs from a Project . . . . . . . . . . . . . . . . . . . . . . . . . . 111

The Advanced Options Window . . . . . . . . . . . . . . . . . . . . . . . . . . . 112

The Single Run Advanced Options Window . . . . . . . . . . . . . . . . . 113
How to Open the Single Run Advanced Options Window . . . . . . 113
How the Single Run Parameters Affect a Project . . . . . . . . . . . . 114
How to Use the Single Run Advanced Options Window . . . . . . . 115

The Randomize Parameters Window . . . . . . . . . . . . . . . . . . . . . . . 117

The Interactive Evaluator Window . . . . . . . . . . . . . . . . . . . . . . . . 118

Interactive Evaluator . . . . . . . . . . . . . . . . . . . . . . . . . . 119

Opening Interactive Evaluator . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

Opening Solution Analytics from Interactive Evaluator . . . . . . . . 122
Using the Interactive Evaluator Program Queue . . . . . . . . . . . . . 123
The Initial Interactive Evaluator Program Queue . . . . . . . . . . . . . 123
Moving Around in the Interactive Evaluator Program Queue . . . . 124
What Happens to the Interactive Evaluator Program Queue
When You Load Programs into Interactive Evaluator? . . . . . . 125

What Happens to the Interactive Evaluator Program Queue
When You Make Changes to the Program Displayed in the
Program Body Window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

Saving and Loading Evolved Programs in Interactive
Evaluator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
Saving Evolved Programs from Interactive Evaluator . . . . . . . . . 127
Loading Evolved Programs into Interactive Evaluator . . . . . . . . . 128

Calculating the Fitness of a Program in Interactive Evaluator . . 129

Viewing the Outputs of a Program from Interactive Evaluator . . 130
The Performance Box in Interactive Evaluator . . . . . . . . . . . . . . 130
Viewing Changes in the Fitness and other Statistics as You
Browse through the Interactive Evaluator Queue . . . . . . . . . . 130

Editing a Program in Interactive Evaluator . . . . . . . . . . . . . . . . . 131

Add a Line of Code to the Interactive Evaluator Program . . . . . . 131
Remove a Line of Code from the Interactive Evaluator Program . . 132
Change a Line of Code in the Interactive Evaluator Program . . . 132
Effect of Editing a Program on the Interactive Evaluator
Program Queue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133

Choosing Instructions and Parameters While Editing Programs
in Interactive Evaluator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
Selecting Among Available Instructions in Interactive
Evaluator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
Selecting Parameters for an Instruction in Interactive
Evaluator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
Types of Instructions and Types of Parameters in Interactive
Evaluator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136

Optimizing Constants of Evolved Programs in Interactive
Evaluator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
How to Optimize Constants in Interactive Evaluator . . . . . . . . . . 139
How Discipulus Optimizes Constants in Interactive Evaluator . . 139
Effect of Optimizing Constants on the Interactive Evaluator
Program Queue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140

Controlling the Speed and Intensity of Constant Optimization in
Interactive Evaluator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
Combining Constant Optimization with Manual Program
Simplification in Interactive Evaluator . . . . . . . . . . . . . . . . . . . 141
Detect Spurious Inputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
Eliminate Excess Lines of Code . . . . . . . . . . . . . . . . . . . . . . . . . . 142
Eliminate Stacked Constants . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
Replacing Complex Operators with Linear Operators . . . . . . . . . 143

Automatic Intron Removal in Interactive Evaluator . . . . . . . . . . . 144

Automatic Simplification in Interactive Evaluator . . . . . . . . . . . . 145
Configuring Automatic Simplification in Interactive Evaluator --
the Use of the Options Page . . . . . . . . . . . . . . . . . . . . . . . . . . 145
The Effect of Automatic Simplification on the Interactive
Evaluator Program Queue . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146

Practice Tips for Interactive Evaluator . . . . . . . . . . . . . . . . . . . . . 147

Genetic Programming Parameters . . . . . . . . . . . . . . . 149

Accessing Genetic Programming Parameters . . . . . . . . . . . . . . . . 150

Basic Genetic Programming Parameters . . . . . . . . . . . . . . . . . . . . 151
Genetic Programming: Population Size . . . . . . . . . . . . . . . . . . . . 151
Genetic Programming: Mutation Rate . . . . . . . . . . . . . . . . . . . . . 151
Genetic Programming: Crossover Rate . . . . . . . . . . . . . . . . . . . . 152
Genetic Programming: Reproduction Rate . . . . . . . . . . . . . . . . . 153

Advanced Genetic Programming Deme Parameters . . . . . . . . . . . 153

Genetic Programming Demes: Enabled/Not Enabled Check
Box . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
Genetic Programming Demes: Number of Demes . . . . . . . . . . . . 154
Genetic Programming Demes: Crossover Percentage Between
Demes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154

Genetic Programming Demes: Migration Rate Between
Demes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
Genetic Programming Demes: Practice Note Regarding
Crossover and Migration Rates between Demes . . . . . . . . . . 156

Advanced Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157

Finding the Single Run Advanced Options Window . . . . . . . . . . . 157

Dynamic Subset Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
Dynamic Subset Selection Overview . . . . . . . . . . . . . . . . . . . . . . 158
Dynamic Subset Selection Parameters . . . . . . . . . . . . . . . . . . . . 159

Parsimony Pressure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162

Advanced Crossover and Mutation . . . . . . . . . . . . . . . . . . . . . . . . 165
Advanced Mutation Parameters . . . . . . . . . . . . . . . . . . . . . . . . . 165
Advanced Crossover Parameters – Homologous Crossover . . . 168

Program Size . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169

Initial Program Size . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
Maximum Program Size . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
Maximum Program Size and Non-Homologous Crossover . . . . . 171

Setting the Random Seed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171

Measuring Fitness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173

Choosing a Problem Type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174

Default Fitness Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
Accessing Fitness Measurement Parameters after the Project
Wizard is Complete . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
Fitness Function Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
Fitness Measures for Regression Problems . . . . . . . . . . . . . . . . . . 177
Fitness Measure for Classification Problems . . . . . . . . . . . . . . . . 178

Classification Problems: How Discipulus Classifies Evolved
Program Outputs. Setting the Threshold . . . . . . . . . . . . . . . . . 179
Classification Problems: How to Handle Problems with Three or
More Classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
Classification Problems: Hit-Rates Defined . . . . . . . . . . . . . . . . . 180
Classification Problems: Reporting of Overall, Positive, and
Negative Hit-Rates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
How the Hits-then-Error Fitness Function Works . . . . . . . . . . . . . 181
How Discipulus Determines if Two Evolved Programs are Tied
in the Hits-then-Error Fitness Function . . . . . . . . . . . . . . . . . . 182
Assigning Different Weights to Positive and Negative Examples
for Hits-then-Error Fitness Functions . . . . . . . . . . . . . . . . . . . . 182

Fitness Measures for Ranking Problem Types . . . . . . . . . . . . . . . 183

The Four Ranking Fitness Functions . . . . . . . . . . . . . . . . . . . . . . 184
Best ROC Curve Fitness Function for Ranking Problems . . . . . . 184
Best ROC Curve (Compare) Fitness Function for Ranking
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
Best ROC Curve then Cost Fitness Function for Ranking
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
Minimum Cost Fitness Function for Ranking Problems . . . . . . . . 185

Fitness Measure for Logistic-Regression Binary Target Output
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
Custom Fitness Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187

Function And Terminal Sets . . . . . . . . . . . . . . . . . . . . 189

Function and Terminal Sets Defined . . . . . . . . . . . . . . . . . . . . . . . 190

Choosing the Terminal Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
The Terminal Set: Configuring Constants . . . . . . . . . . . . . . . . . . 192
The Terminal Set: Configuring Temporary Computation
Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195

Weighting the Terminal Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196

Choosing the Function Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197

The Function Set: Types of Instructions Available . . . . . . . . . . . . 197
The Function Set: Choosing Instructions for a Run . . . . . . . . . . . 197

Weighting the Function Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198

General Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201

Genetic Programming Reference . . . . . . . . . . . . . . . . . . . . . . . . . . 201

The Genetic Programming Algorithm . . . . . . . . . . . . . . . . . . . . . 201
Genetic Programming Search Operators . . . . . . . . . . . . . . . . . . 202

Data Files for Direct Text File Import Reference . . . . . . . . . . . . . 204

Training, Validation, and Applied Data . . . . . . . . . . . . . . . . . . . . . 205
What Are Training, Validation and Applied Data Files? . . . . . . . . 206
Creating Training, Validation, and Applied Data Files . . . . . . . . . 207
A Shortcut for Creating Data Files . . . . . . . . . . . . . . . . . . . . . . . . 209

Sample Data Sets Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209

The Fractionating Column Regression Problem . . . . . . . . . . . . . 209
The Gaussian Classification Problem . . . . . . . . . . . . . . . . . . . . . 210

Population, Program, Instruction Block and Instruction
Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
Population . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
Program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
The Structure of a Program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211

Literature Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213

Instruction Set Reference . . . . . . . . . . . . . . . . . . . . . . . 215

Addition Instruction Group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217

FADD ST(0), ST(%r) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
FADD ST(%r), ST(0) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
FADD [ESD+%d1] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218

Arithmetic Instruction Group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219

FABS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
FCHS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
FSCALE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
FSQRT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221

Comparison Instruction Group . . . . . . . . . . . . . . . . . . . . . . . . . . . 221

FCOMI ST(0), ST(%r) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221

Condition Instruction Group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222

FCMOVB ST(0), ST(%r) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
FCMOVNB ST(0), ST(%r) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
JB EPI+6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
JNB EPI+6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224

Data Transfer Instruction Group . . . . . . . . . . . . . . . . . . . . . . . . . . 224

FXCH ST(%r) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224

Division Instruction Group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225

FDIV ST(0), ST(%r) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
FDIV ST(%r), ST(0) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
FPREM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
FDIV [ESD+%d1] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227

Exponential Instruction Group . . . . . . . . . . . . . . . . . . . . . . . . . . . 228

Multiplication Instruction Group . . . . . . . . . . . . . . . . . . . . . . . . . . 229
FMUL ST(0), ST(%r) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
FMUL ST(%r), ST(0) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
FMUL [ESD+%d1] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230

Rotate Stack Instruction Group . . . . . . . . . . . . . . . . . . . . . . . . . . . 231

FDECSTP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
FINCSTP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231

Subtraction Instruction Group . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231

FSUB ST(0), ST(%r) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
FSUB ST(%r), ST(0) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
FSUB [ESD+%d1] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233

Trigonometric Instruction Group . . . . . . . . . . . . . . . . . . . . . . . . . . 234

FCOS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
FSIN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237

Starting Up
Thank you for your purchase of Discipulus, Notitia, and Solution
Analytics, the world’s first and fastest commercial Genetic
Programming and data analysis software. Discipulus automatically
writes computer programs in Java, C, C#, Delphi, and Intel
assembler code, all on a desktop computer.

Discipulus uses a multi-run evolutionary algorithm to evolve computer
programs from your data ("evolved programs"). These evolved
programs are high-precision models built from your data and map your
inputs (the independent variables) to your output (the dependent
variable).
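
To make the idea of an evolved program concrete, the following is a minimal, hypothetical sketch of a program that maps a row of inputs to a single output. It is not actual Discipulus output. The register names f[0] and f[1] echo a heading in the table of contents, but their use here, the instructions chosen, and the constant 0.5 are all assumptions made purely for illustration.

```python
# Hypothetical illustration only: this is NOT code generated by Discipulus.
# It mimics the general shape of a short, linear sequence of instructions
# that update working registers (named f[0], f[1] here by assumption).

def evolved_program(inputs):
    """Map one row of inputs (independent variables) to one predicted output."""
    f = [0.0, 0.0]        # working registers, assumed to start at zero
    f[0] += inputs[0]     # bring the first input into register 0
    f[0] *= 0.5           # scale by an illustrative constant
    f[1] += inputs[1]     # bring the second input into register 1
    f[0] -= f[1]          # combine the two registers
    f[0] = abs(f[0])      # take the absolute value
    return f[0]           # by assumption, register 0 holds the output


# Two made-up rows of input data, each with two independent variables.
rows = [(4.0, 1.0), (2.0, 3.0)]
predictions = [evolved_program(row) for row in rows]
```

The point of the sketch is only that an evolved program is an ordinary function: given the independent variables for one example, it returns a predicted value of the dependent variable.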

Unlike statistical techniques, neural networks, decision trees and the
like, Discipulus builds models without any user tuning. It is self-tuning
and self-parameterizing. In fact, you do not have to know anything
about Genetic Programming to produce results that rival those of
experts.

All versions of Discipulus come with Notitia, a data access, data
cleaning, and data preparation module, and Solution Analytics, which
provides statistics and graphics for your evolved programs. You install
and uninstall the three applications together, and all three are
integrated through their respective GUIs.

You may find additional introductory information about using
Discipulus on your system at the following locations:

• Discipulus Overview on page 2;
• Problem Types and Fitness Functions on page 3;
• Data Import and Preparation using Notitia on page 3;
• Evolved Program Graphic and Statistical Analysis using Solution
  Analytics Software on page 4;
• Sample Data Sets Available on page 4;
• Minimum System Requirements on page 5;
• Installation on page 5;
• Uninstall on page 7;
• Enter an Activation Code on page 7;
• Convert the Demonstration Version to a Purchased Version on
  page 8;
• Extend an Expiring or Expired License on page 8;
• Upgrade or Add-On to an Existing License on page 8;
• Deactivate or Move a License on page 9;
• Technical Support on page 10; and
• Important License Agreement Reminder on page 10.

Discipulus Overview
Discipulus writes computer programs from examples you give it. These
examples are contained in "training data," "validation data" and "applied
(or testing) data" that you provide to Discipulus.

Several years of research and development went into Discipulus. It
combines innovative techniques that have never before been available in
commercial software.

Among other things, several fundamental technical breakthroughs make
Discipulus about 60 to 200 times faster than previous software. For
example, Discipulus evolves native machine code directly. This
accounts for the blazing speed of Discipulus. This and other techniques
are protected under U.S. Patents 6,493,686, 6,098,059 and 5,946,673.

But you do not have to deal with the intricacies of machine code and
genetic programming directly. Discipulus configures itself intelligently
as it writes programs.

For the advanced user, Discipulus gives you detailed control over every
aspect of its operation. In fact, Discipulus wraps its low level operations
in a high-level interface that lets you get as close to the machine code as
you want – or you may stay far away.

For those advanced users, we recommend Genetic Programming – An
Introduction: On the Automatic Evolution of Computer Programs and
its Applications by Banzhaf, Nordin, Keller and Francone (1998) as the
starting point for furthering your understanding of Genetic
Programming. It is a good textbook-style overview of the field. It may
be obtained from bookstores and online booksellers.

Problem Types and Fitness Functions

Discipulus will build four types of models out of the box: regression,
classification, ranking and logistic. You choose the type of model you
want to build by choosing a "fitness function." The fitness function is
the method that Discipulus uses to determine how "fit" an evolving
program is during a project. In more detail, the four types of models you
may build are:

1. Regression models or curve fitting. Regression is available in all
   versions of Discipulus;

2. Classification models, which distinguish category X from category
   Y. Classification is available in all versions of Discipulus;

3. Ranking models, which rank items in terms of the likelihood
   they are in a target class designated by you. Ranking models are
   available in the Enterprise Plus version of Discipulus and available
   as an add-on to other versions; and

4. Logistic Regression Models, which model the probability that an
   item is in category X or not in category X. Logistic Regression
   models are available in the Enterprise Plus version of Discipulus
   and available as an add-on to other versions.

For specialized models that do not fall into one of the above categories,
both the Enterprise and Enterprise Plus versions of Discipulus allow you
to design your own fitness function. This permits you to configure
Discipulus to solve almost any modeling problem you can imagine.

Data Import and Preparation using Notitia

Discipulus is bundled with Notitia, which is data preparation, cleaning,
and import software. Notitia lets you import, clean-up, transform and
split data for use in Discipulus. Then, when you want to apply a
Discipulus evolved model to new data, Notitia will apply exactly the
same transforms to the new data.

Notitia opens directly from the Discipulus Project Wizard and returns
data directly to Discipulus when you are done with it.

For more information, please see the Notitia tutorial and manual, which
were installed on your hard disk when you installed Discipulus.

Evolved Program Graphic and Statistical Analysis using Solution Analytics Software
Discipulus comes bundled with Solution Analytics software. Solution
Analytics provides detailed graphic analysis, statistics, and comparison
of the outputs of your evolved programs.

Solution Analytics opens directly from the Discipulus Interactive
Evaluator and Team Solutions windows. Solution Analytics was
installed on your hard disk when you installed Discipulus.

For more information, please see the Solution Analytics Owner’s
Manual, which was installed on your hard disk when you installed
Discipulus.

Sample Data Sets Available

Discipulus comes with four sample data sets located in the /Data folder.
On Windows 2003 and earlier systems, the /Data folder is located in:

C:\Program Files\AimApps\Discipulus5\Data\

On Windows Vista and Windows 7 operating systems, the /Data folder
is located in:

C:\Users\Public\AimApps\Discipulus5\Data\

The four data sets may be described as follows:

1. The "fractionating column" data set is an example of a regression
   problem. This data set is pre-split for you and contained in three
   text files, labeled training, validation and applied. You would use
   these files for direct text file import.

2. The "gaussian" data set is an example of a classification problem.
   It may also be used for Ranking and Logistic Regression type
   problems. This data set is pre-split for you and contained in three
   text files, labeled training, validation and applied. You would use
   these files for direct text file import.

3. The tutorial data set is in an Excel file. A set of training data is in
   one tab worksheet. A set of scoring data is in a different tab. You
   would import these data using Notitia. There are deliberate data
   mistakes in this file that are used in the tutorial. It is a
   classification problem and may also be used as a ranking or
   logistic regression problem. The file is called:

   "Discipulus_Notitia_Tutorial_Data_File.xls"

4. The last file is an Excel file that you would import with Notitia. It
   is organized like the tutorial file into training and scoring tabs. It
   too has deliberate data errors for you to experiment with fixing in
   Notitia. It is a regression problem. It is called:

   "Fractionating_Column_With_Phase.xls"

Minimum System Requirements

You must have at least the following equipment to run Discipulus.

• A computer containing an Intel 486 (with floating point coprocessor)
  or any chip that implements the Intel 486 (with math coprocessor)
  instruction set.
• Windows 2000 SP2 or a more recent 32-bit Windows operating
  system.
• 128 Megabytes of RAM. 1 Gigabyte preferred.

Installation
This section documents how to install Discipulus, together with the
companion Notitia data preparation and Solution Analytics graphic
analysis applications. Note that no separate installation is required for
Notitia or Solution Analytics; all three install together. The install
process is as follows:

• IMPORTANT!!! If you have a previous version of Discipulus
  installed, uninstall it. Before doing so, save any project files and
  Program (*.ind) files you may have stored in the
  \\AIMAPPS\DISCIPULUS folders.
• Download the installer file from RMLTECH.COM to your local
  computer. You do this when you download the demo version or
  one of the purchased versions.
• Double-click the installer file and the install process will begin on
  your computer. You will see this screen:
• Click "Next" and follow the prompts as in a typical Windows
  application.

Important Note: DO NOT install more than one copy of any version
of Discipulus on a single computer. It will cause unpredictable and
irregular performance for both copies.

Uninstall
Uninstall Discipulus, Notitia and Solution Analytics from the Windows
control panel as you would a typical Windows program.

Enter an Activation Code

You will get an activation code by email each time you purchase,
extend, upgrade, or add-on to a Discipulus license. To enter that
activation code, select the Registration menu and then select "Activate
License." You will see the following window:

Just enter your activation code into the four white boxes and click the
Activate button. Your activation code comes in the email you will
receive after you purchase your license.

For the demonstration version of Discipulus, no activation code is
required.

When you finish, your version of Discipulus will now conform to the
license for which you just entered the activation code.

Convert the Demonstration Version to a Purchased Version
The demonstration version of Discipulus is active for 15 days. During
that time, it provides all features included in the Enterprise Plus version.
At the end of 15 days, it will stop working. You may convert it to one of
the licensed versions of Discipulus such as Professional, Enterprise, or
Enterprise Plus from Discipulus or by logging directly onto
WWW.RMLTECH.COM.

To do so from Discipulus, select the Registration menu and then select
the Purchase menu item. You will be directed to the RML website
where you may purchase a license.

After the purchase, you will get an email containing your activation
code. Enter the activation code as described in Enter an Activation Code
on page 7.

Extend an Expiring or Expired License

Discipulus is licensed for one year. You will be warned before your
license expires. To extend your license, select the Registration menu and
click on the Upgrade/Add-On/Extend menu item. You will be directed
to the RML website where you may extend your license.

After the extension, you will get an email containing your activation
code. Enter the activation code as described in Enter an Activation Code
on page 7.

Upgrade or Add-On to an Existing License

You may upgrade to a more advanced version of Discipulus or you may
add features (such as advanced ranking and logistic fitness functions)
onto your existing Discipulus version. To do so, select the Registration
menu and click on the Upgrade/Add-On/Extend menu item. You will be
directed to the RML website where you may upgrade your license.

After the upgrade or extension, you will get an email containing your
activation code. Enter the activation code as described in Enter an
Activation Code on page 7.

Deactivate or Move a License

You may wish to move your Discipulus license from a first computer to
a second computer. To do so, select the Registration main menu and
then click Deactivate. You will see the following message box:

Click the "Yes" box and then click the "Deactivate" button. The
following message box should then display:

You may now install Discipulus on another computer. To activate it on
the second computer, just use the same Activation Code you used for the
first computer (you got that Activation Code in an email when you
purchased the license you just deactivated).

Warning: Deactivating Discipulus on a computer disables that
computer permanently from running Discipulus. It cannot be
reinstalled on that computer or reactivated on that computer.

Technical Support
For simple technical support problems, please contact us at
[email protected].

For more complex issues, an hourly-charge technical support package is
available for purchasers of the Professional, Engineering, or Enterprise
versions of Discipulus.

Important License Agreement Reminder

Discipulus creates computer programs. Your use, licensing and
exploitation of these evolved programs may be subject to a run-time
deployment license and is constrained by the Discipulus License
Agreement, which you may find in the installable files you acquired
from us. Please review the Discipulus License Agreement carefully to
determine the scope of your license.

Frequently Asked Questions

Discipulus, and its companion applications, Notitia data preparation and
Solution Analytics, are an integrated suite of applications that provide
advanced statistical, analytic and predictive modeling capabilities.

Here are some of the questions that are often raised by users about
Discipulus, Notitia and Solution Analytics. (Note, this is a Frequently
Asked Questions document, not full documentation of these three
products. For full documentation, please refer to the Discipulus, Notitia,
and Solution Analytics Owner’s Manuals.)

• How Do I Use Discipulus to Build Models? on page 13
• What Does Discipulus Do? on page 14
• How Does the Notitia Data Preparation Module Work with
  Discipulus? on page 15
• What Does the Solution Analytics Visualization Module Do? on
  page 18
• What Kinds of Problems Will Discipulus Handle? on page 22
• What is the Difference between a Project and a Run in Discipulus?
  on page 23
• What Are "Program Models" and "Team Models?" on page 23
• How Do I Import Data into Discipulus? on page 24
• How Do I Transform, Clean-up, and Split My Data before I Model
  with Discipulus? on page 25
• How do I Find and Handle Outliers and Missing Values before I
  Model with Discipulus? on page 26
• Can I Get Data into Discipulus from Excel Files, Databases or
  Windows Connections? on page 26
• What Kind of Data Does Discipulus Use? on page 26
• How Do I Set-up Text Data Files for Direct Text File Import Into
  Discipulus? on page 28
• What Types of Data May I Use in Data Files? on page 29
• What Values Should I Use for the Target Outputs for Classification
  Problems, Ranking Problems, and Logistic Regression Problems? on
  page 30
• What Are Training, Validation, and Applied Data Files? on page 30
• What Does an Input Data File Look Like? on page 31
• How Should I Split My Data among the Training, Validation, and
  Applied Data Files? on page 32
• Is There a General Rule for Dividing Data? on page 32
• What if I Have a Very Small Data Set? on page 32
• How Should I Split Time Series or Sequential Data? on page 32
• Is There an Easy Way to Create a Data File for Direct Text File
  Import? on page 33
• Does Discipulus Come with Sample Data Sets I Can Run? on page 33
• How Do I Start a Project? on page 34
• What Should I Look for when a Project Is Running? on page 40
• How Do I Make Runs in a Project Terminate? on page 41
• How and When Do I Terminate a Project? on page 42
• How Do I Continue a Project After I Have Stopped It? on page 45
• Can I View the Predicted Outputs of an Evolved Best Program or a
  Best Team? on page 45
• How Do I View an Evolved Best Program Created by Discipulus? on
  page 48
• How Do I Save an Evolved Best Program Created by Discipulus? on
  page 50
• How Do I View an Evolved Best Team Created by Discipulus? on
  page 52
• How Do I Save an Evolved Best Team Created by Discipulus? on
  page 54
• How Do I Open the Interactive Evaluator Window? on page 56
• How Do I Graph and Analyze the Outputs of a Selected Best
  Program Created by Discipulus? on page 57
• How Do I Graph and Analyze the Outputs of a Selected Best Team on
  page 64
• Is There a Way to Find Out which Input Variables Are the Most
  Important? on page 68
• How Do I Use the Models Discipulus Has Created? on page 69
• What Do Input001, Input002, etc. Represent in the Interactive
  Evaluator? on page 71
• What Do f[0], f[1] Represent in Interactive Evaluator? on page 71
• What about Overfitting? on page 71
• How Do I Do a Single Run? on page 74

How Do I Use Discipulus to Build Models?

There are five easy steps:

1. Import your data files. See:

   a) How Do I Import Data into Discipulus? on page 24;
   b) How Do I Transform, Clean-up, and Split My Data before I Model
      with Discipulus? on page 25;
   c) How do I Find and Handle Outliers and Missing Values before I
      Model with Discipulus? on page 26;
   d) Can I Get Data into Discipulus from Excel Files, Databases or
      Windows Connections? on page 26;
   e) What Kind of Data Does Discipulus Use? on page 26; and
   f) How Do I Set-up Text Data Files for Direct Text File Import Into
      Discipulus? on page 28;

2. Start a Project. Use the Project Setup Wizard to start a project.
   See: How Do I Start a Project? on page 34;

3. Stop the Project. When the project has produced results to your
   satisfaction, use the Finish Project button to stop the project (you
   may resume it later). See: How and When Do I Terminate a
   Project? on page 42;

4. Analyze the Evolved Models. After the project is finished,
   Discipulus builds a comprehensive report of the results of the
   project. You can choose among the thirty best evolved programs
   and the five best evolved team models. You can look at graphs of
   the performance of these models and the C, Java, C Sharp, Delphi
   or Intel Assembler code of the evolved models. See: How Do I
   Graph and Analyze the Outputs of a Selected Best Program
   Created by Discipulus? on page 57.

5. Deploy the Evolved Models. You can deploy the models
   evolved by Discipulus (subject to your license agreement) in two
   ways: (1) from within Discipulus using the Data Window; or (2)
   as C, Java, C Sharp, Delphi, or Intel Assembler code. For more
   information about deploying Discipulus models, see How Do I
   Use the Models Discipulus Has Created? on page 69.

What Does Discipulus Do?

Discipulus creates models of data that you provide to it. The models are
created as computer programs in Java, C, C Sharp, Delphi, or Intel
assembler. Because Discipulus uses Darwinian natural selection to
create these programs, they are referred to here as "evolved programs."

Evolved program models map your inputs to your target output. Put
another way, program models map the independent variables to your
dependent variable. So if you have two inputs (independent variables)
and one output, Discipulus finds and parameterizes an optimal form for
the function f, where:

TgtOutput = f (InputOne,InputTwo)

You give each pairing of inputs and target output to Discipulus in a

single row of data.
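
To make the row layout concrete, here is a minimal Python sketch of three rows for the two-input case above. The data values and the function `f` are hypothetical and chosen only for illustration. In practice Discipulus evolves the form of f from your data, and this is not an actual evolved program.

```python
# Each row pairs the inputs with the target output:
# (InputOne, InputTwo, TgtOutput). Values are made up for illustration.
rows = [
    (1.0, 2.0, 5.0),
    (2.0, 3.0, 8.0),
    (3.0, 1.0, 5.0),
]


def f(input_one, input_two):
    """Hypothetical model form; not an actual Discipulus evolved program."""
    return input_one + 2.0 * input_two


def max_abs_error(model, data):
    """Largest absolute difference between the model output and the target."""
    return max(abs(model(a, b) - target) for a, b, target in data)
```

Calling `max_abs_error(f, rows)` measures how closely the candidate function reproduces the target output column over all rows.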

Discipulus keeps all information about evolved programs in a "Project
File." Thus, your work in Discipulus is organized into "Projects." Each
project summarizes the models built during up to hundreds of individual
Genetic Programming runs. Project files have the extension "*.BST."

Discipulus is a “supervised learning” system. That means that you must
provide training, validation and applied data files to Discipulus that
contain matched inputs and outputs.

So, in essence, you provide the data files that contain matched inputs
and outputs. Each row of data contains the matched input and outputs.
From them, Discipulus creates models that allow you to predict outputs
from similar inputs.

When you are finished, you have a high-precision model that lets you
predict outputs for new data.

How Does the Notitia Data Preparation Module Work with Discipulus?
Before you train your models, Notitia performs data access, data
preparation, data transformation, and data splitting.

Then, after you train your models, Notitia applies the same transforms to
new data so that it may be scored.

When you are setting up a Discipulus project for training, you open
Notitia directly from the Discipulus Project Wizard as shown in
Figure 1.

Figure 1. Discipulus Project Wizard with Notitia Data Import Highlighted

Click the "GO" button and Notitia will open. In Notitia, you will be able
to access Excel, database, connections, and text files, find missing
values and outliers, transform and group data, and split your data up into
training, validation and applied data for use in Discipulus. In addition,
Notitia will store all your transformation settings on the data when you
exit and return to Discipulus.

On the other hand, when you are using Discipulus to score new data
after you have trained models, then you open Notitia from the main file
menu as follows:

Figure 2. Launching Notitia from the File/Load New Applied Data Menu

When you click on Load New Applied Data, the following window pops
up if you originally imported your training data from Notitia XML files.
Otherwise, Notitia just opens up directly. If this window pops up, make
the selections shown in Figure 3.

Figure 3. Select Import Data From Notitia and Click Launch Notitia

Once you click the "Launch Notitia" button, Notitia will open and all of
the stored transforms from your training data will be active. Select a
Data Set in Notitia (if you already loaded the scoring data) or Import a
data set for scoring. The active data set will be returned to Discipulus
with all of the stored transforms applied to it.

What Does the Solution Analytics Visualization Module Do?
Solution Analytics may be opened from several places in Discipulus. It
provides graphing and statistics for selected best program models or best
team models.

For example, in a logistic regression problem, Solution Analytics would
show charts appropriate to that problem type. An example is shown in
Figure 4.

Figure 4. Solution Analytics Chart for a Logistic Regression Problem.
Figure Shows the ROC Chart for a Best Program Model

On the other hand, for a regression problem, Solution Analytics graphs
both the predicted outputs and the residuals. An example of one of the
regression charts is shown in Figure 5.

Figure 5. Solution Analytics Chart for Regression Residuals. Figure
Shows a Q-Q plot for the Residuals

Solution Analytics starts from two different places in Discipulus,
depending on whether you want to chart a team model or a program
model.

For Best Team Model Charting. When a project is finished, the
Reports window opens up automatically. Click on the "Team Solutions"
tab. Then select one of the teams with your mouse and click on the
"Start Solution Analytics" button as shown in Figure 6.

Figure 6. Starting Solution Analytics from the Team Solutions Tab to
View Charting for Best Team Models

For Best Program Model Charting. When a project is finished, the
Reports Window opens up automatically. Select one of the programs in
the "Best Programs" tab and click the "Analyze" Button as shown in
Figure 7.

Figure 7. Selecting a Best Program for Analysis from the Best Programs Tab

The "Interactive Evaluator" window will open up and the selected
program will be displayed as shown in Figure 8.

Figure 8. The Interactive Evaluator Window with the Start Solution
Analytics Button Highlighted

Click on the "Start Solution Analytics" button above and whatever
program is shown in the "Program Body" is sent to Solution Analytics for
display.

What Kinds of Problems Will Discipulus Handle?

Discipulus handles the following types of problems:

• Regression Problems (fitting a curve).
• Binary Classification Problems. These are classification problems
  where the task is to classify the data set into one of two categories.
• Ranking Problems. The task is to rank rows of data according to the
  criterion in your target output. For example, rank customers by the
  ones most likely to respond to a mail solicitation or rank credit card
  transactions by likelihood that they are fraudulent.
• Logistic Regression Problems. Map your inputs to the probability that
  a row of data is in class 1. This is much like traditional logistic
  regression except that Discipulus defines and maps the functional
  form.
• Custom Problems. Discipulus lets you write a DLL that contains a
  fitness function of your choosing that is suited to the problem you
  have. The two custom fitness function interfaces are very flexible and
  permit you to handle almost any data or problem type.

Regression and Classification are available in all versions of Discipulus.

The advanced fitness functions (ranking, logistic regression, and custom
fitness functions) will be available depending on what version you
license. The advanced fitness functions may also be purchased as an
add-on.

What is the Difference between a Project and a Run in Discipulus?
A Discipulus "run" starts with a single population of evolving programs,
evolves them into high-precision models, and stops.

A Discipulus "project" is a collection of runs performed one after
another or in parallel. Generally, each of the runs in a project is
performed with a different random seed and may be performed with
different run parameters.

Although it is possible to force Discipulus to perform a project that
consists of a single run, you will usually want to perform multiple runs
during a project. Extensive research by RML has established that the
multiple run approach is much more likely to produce good results than
the single run approach.
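
The relationship between runs and a project can be illustrated with a toy Python sketch. Everything in it is an assumption made for illustration: plain random search stands in for Genetic Programming, the target problem (minimize (x - 3)^2) is made up, and the names `single_run` and `project` are hypothetical. It only shows why keeping the best result over several differently seeded runs can never do worse than any one of those runs.

```python
import random


def single_run(seed, trials=200):
    """Stand-in for one run: random search for x minimizing (x - 3)^2.

    Discipulus evolves programs; random search is used here only to keep
    the sketch short. Returns (best_error, best_x).
    """
    rng = random.Random(seed)  # each run gets its own random seed
    best_x = rng.uniform(-10.0, 10.0)
    best_err = (best_x - 3.0) ** 2
    for _ in range(trials - 1):
        x = rng.uniform(-10.0, 10.0)
        err = (x - 3.0) ** 2
        if err < best_err:
            best_x, best_err = x, err
    return best_err, best_x


def project(seeds):
    """A toy project: several runs with different seeds; keep the best."""
    return min(single_run(seed) for seed in seeds)
```

For example, `project([1, 2, 3, 4, 5])` returns the lowest-error result found across five runs, while `project([7])` is the single-run case.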

What Are "Program Models" and "Team Models?"

Discipulus writes computer programs that model your data. The
programs are available to you in C, Java, C Sharp, Delphi or Intel
Assembler.

A "Program Model" or an "Evolved Program" is a single program
written by Discipulus that models your data.

A "Team Model" is a combination of single Program Models that
Discipulus has combined to produce a better result than any of the single
Program Models.

Discipulus assembles both Program Models and Team Models during
each project. You may find these models and information about their
performance in the Reports Window. See The Reports Window on
page 94.

How Do I Import Data into Discipulus?

There are three ways to import data into Discipulus:

1. Method 1. Direct import of text files;

2. Method 2. Import data using the Notitia data import and
   preparation module; or

3. Method 3. Create XML data files for import using Notitia as a
   standalone program (usually helpful when you are doing
   cross-validation) and then import those XML files into Discipulus.

You may select among these three methods in the Project Wizard in the
Select Data Sets window, shown below:

Figure 9. Select Data Sets Window in the Project Wizard

Choose Method 1, direct import of text files, by selecting the "Import
from Text Files" radio button. For more information about how to use
this method of text file import, please see How Do I Set-up Text Data
Files for Direct Text File Import Into Discipulus? on page 28.

Choose Method 2, the Notitia data preparation and splitting module, by
selecting the "Import Using Notitia Data Import and Preparation
Software" button. This method opens Notitia immediately and gives you
access to Notitia’s data analysis, cleanup, transformation,
outlier analysis, missing value location, and data set splitting
capabilities. Notitia will import files from text files, Excel files, database
files, and existing Windows connections. Once you finish using Notitia,
it will return the transformed, split, and prepared data to Discipulus for
modeling. In addition, Notitia saves the complete path of
transformations of the data. When you return to Notitia from a
Discipulus project, those same transforms are applied to new data to
which you are applying evolved Discipulus models.

For more information about using Notitia with Discipulus, please see the
Notitia Owner’s Manual and the help files that accompany Notitia. The
key thing to understand here is that Discipulus and Notitia link
automatically.

Choose Method 3, importing a Notitia XML file, by selecting the
"Import Notitia XML File" button. To use this option, you must have
already created the Notitia XML file using Notitia as a standalone
application. For more information about how to use this option, please
see the Notitia Owner’s Manual and the help files accompanying
Notitia.

How Do I Transform, Clean-up, and Split My Data before I Model with Discipulus?
You do that by choosing to import data via Notitia. You make that
choice in the Discipulus project wizard. See How Do I Import Data into
Discipulus? on page 24.

Notitia lets you change character data to numeric, transform your
numeric data, find and handle outliers and missing values, group values
like "true" and "TRUE", and then split the transformed data into
training, validation and applied data for Discipulus. For more
information about how to use this option, please see the Notitia Owner’s
Manual and the help files accompanying Notitia.

How do I Find and Handle Outliers and Missing Values before I Model with Discipulus?
You do that by choosing to import data via Notitia from the Discipulus
project wizard. See How Do I Import Data into Discipulus? on page 24.

Notitia lets you change character data to numeric, transform your
numeric data, group values like "true" and "TRUE" as one value, find
and handle outliers and missing values, and then split the transformed
data into training, validation and applied data for Discipulus. For more
information about how to use this option, please see the Notitia Owner’s
Manual and the help files accompanying Notitia.
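
The idea of grouping values such as "true" and "TRUE" and then changing character data to numeric can be sketched in a few lines of Python. This is not Notitia's implementation, which you operate through its GUI. The function name `group_and_encode` and the choice of numbering groups in order of first appearance are assumptions made for illustration.

```python
def group_and_encode(values):
    """Group case variants as one value, then map each group to a number.

    Hypothetical illustration only; Notitia's own grouping and encoding
    rules are described in the Notitia Owner's Manual.
    """
    codes = {}
    encoded = []
    for value in values:
        key = value.strip().lower()  # "true" and "TRUE" fall into one group
        if key not in codes:
            # Assign numeric codes in order of first appearance.
            codes[key] = float(len(codes))
        encoded.append(codes[key])
    return encoded, codes
```

Calling `group_and_encode(["true", "TRUE", "false"])` yields one numeric code shared by both spellings of "true" and a second code for "false".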

Can I Get Data into Discipulus from Excel Files, Databases or Windows Connections?
Yes. You do that by choosing to import data via Notitia in the
Discipulus project wizard. See How Do I Import Data into Discipulus?
on page 24.

Notitia lets you import data from virtually any data source that has
uniform rows and columns. For more information about how to use this
option, please see the Notitia Owner’s Manual and the help files
accompanying Notitia.

What Kind of Data Does Discipulus Use?

Discipulus accepts spreadsheet-type data sets, that is, files with a fixed
number of columns and rows. Only rectangular spreadsheets are
accepted--that is, every row has the same number of columns and every
column has the same number of rows. Every cell in the spreadsheet must
have a meaningful value in it.

Rows. For training, each row should contain inputs (independent

variables) paired with a single target output (dependent variable). For
scoring new data after your models are trained, there need not be a
target output.

Columns. Each column represents either one of the inputs or the target
output.

Column Names. If you import data from Notitia or using Notitia XML
files, you may associate a name (or column heading) with each column
and that column name will be used in the evolved programs and
reporting.

On the other hand, if you use direct text file import, no column names
are permitted and Discipulus will name your inputs v000, v001,
v002 . . . in the order of the columns. There are special rules for
setting up files for direct text file import, which you may review here:
How Do I Set-up Text Data Files for Direct Text File Import Into
Discipulus? on page 28.

Data Sets for Training. For training models, Discipulus requires
training data and validation data sets, at a minimum. Both of these data
sets are required for training and for selecting best programs. The
training and validation data may be identical, but we generally
recommend that only for large data sets where overfitting is not an
issue. Optionally, you may provide "applied" data to Discipulus.
Applied data plays no role in training or selecting the best program, so
performance on the applied data shows how your evolved program
performs on truly unseen data.

If you use Notitia to import data to Discipulus, Notitia will let you split
the data into training, validation and applied data sets in a variety of
ways. If you use direct text file import, you must split your data into
separate data sets yourself.

Data Sets for Scoring. For scoring data after you have finished training
a model, Discipulus requires a file with the same number of inputs as
the training data. This scoring data may optionally have a target output
column.

Data Set Dimensions. Each data set must have at least two rows and at
least two columns. The maximum number of inputs is 64.

Data Types. Discipulus accepts only numeric data. The only non-
numeric characters that are permissible are a decimal point and "E"
when used in correct exponential floating point notation.

If you import data to Discipulus using Notitia, Notitia will accept non-
numeric inputs and help you convert them, consistently, to numeric
values. If you use direct text file import, you must make that conversion
yourself outside of Discipulus.

Target Output Column Location. Discipulus will always interpret the
rightmost column in a data set as the target output (unless you are
scoring data and there is no target output column).
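
The layout rules in this section can be checked before import with a short script. The following Python sketch is an illustration only and is not part of Discipulus. It assumes a tab-delimited data set that has been read into a string, and the function name check_layout is hypothetical. It tests that the data set is rectangular, that no cell is empty, and that the row, column and input counts stay within the limits given above. It does not test whether each value is numeric.

```python
MAX_INPUTS = 64  # maximum number of inputs stated in this manual


def check_layout(text, has_target=True):
    """Return the number of inputs, or raise ValueError on a layout problem.

    Assumes one example per line with tab-separated cells.
    """
    rows = [line.split("\t") for line in text.splitlines()]
    if len(rows) < 2:
        raise ValueError("a data set needs at least two rows")
    width = len(rows[0])
    if width < 2:
        raise ValueError("a data set needs at least two columns")
    for number, row in enumerate(rows, start=1):
        if len(row) != width:
            raise ValueError(f"row {number}: {len(row)} columns, expected {width}")
        if any(cell.strip() == "" for cell in row):
            raise ValueError(f"row {number}: empty cell")
    # The rightmost column is the target output when one is present.
    inputs = width - 1 if has_target else width
    if inputs > MAX_INPUTS:
        raise ValueError(f"{inputs} inputs exceed the maximum of {MAX_INPUTS}")
    return inputs
```

For scoring data that has no target output column, call the function with has_target=False so that every column is counted as an input.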

How Do I Set-up Text Data Files for Direct Text File Import Into Discipulus?
Discipulus creates models using data that you provide to it. For training,
you provide it in training, validation and (optionally) applied data files.
One of the ways to do that is via direct import of text files in the Project
Wizard.

Figure 10. The Select Data Sets Window in the Project Wizard
with the Text File Import Section Highlighted

Direct text file import occurs when you select the "Import from Text
Files" button and then browse for training, validation, and (optionally)
applied data files for import.

This method is fast; but it requires that you split the data into training,
validation and applied data before import. In addition, if you use this
method, you may not use column titles. Instead, you import unnamed
columns and Discipulus will name them v000, v001, v002 etc.

The following topics discuss how to set up your data files for direct text
import:

• What Types of Data May I Use in Data Files? on page 29;

• What Values Should I Use for the Target Outputs for Classification
Problems, Ranking Problems, and Logistic Regression Problems? on
page 30;

• What Are Training, Validation, and Applied Data Files? on page 30;

• What Does an Input Data File Look Like? on page 31;

• How Should I Split My Data among the Training, Validation, and
Applied Data Files? on page 32;

• Is There an Easy Way to Create a Data File for Direct Text File
Import? on page 33.

In addition, the sample regression and classification files that come with
Discipulus will import correctly into Discipulus. You may find it useful
to review them. You may find out more about the sample files in Does
Discipulus Come with Sample Data Sets I Can Run? on page 33.

What Types of Data May I Use in Data Files?

Discipulus accepts files with numeric data only. Integers and floating
point numbers are OK.

Here are some values that Discipulus will accept:

• 1.0

• 100

• 2345.67

Here are some values that will not read into Discipulus:

• $1.00 (dollar sign not allowed)

• 1,235 (comma not allowed)

• True (letters not allowed)

• "Single" (neither letters nor quotation marks are allowed).
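
If you prepare files with a script, a check like the following Python sketch can flag values that would not read into Discipulus. It is an illustration under stated assumptions, not the parser Discipulus uses. The function name is hypothetical, and allowing a leading plus or minus sign is an assumption, made because this manual lists -1 as a usual target value for classification.

```python
import re

# Digits with an optional decimal point and an optional "E" exponent.
# The optional leading sign is an assumption (see the text above).
NUMERIC = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?")


def is_numeric_value(token):
    """Return True when the whole token is a plain number."""
    return NUMERIC.fullmatch(token) is not None


# The example values from the two lists above.
for token in ["1.0", "100", "2345.67", "$1.00", "1,235", "True", '"Single"']:
    print(token, is_numeric_value(token))
```

The first three values pass the check and the last four fail it, which matches the two lists above.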

What Values Should I Use for the Target Outputs for Classification Problems, Ranking Problems, and Logistic Regression Problems?
Discipulus performs binary classification. That is, it will create models
that classify your data into either one class or the other.

The only hard and fast rule for binary classification problems is that the
target output column in your data files should contain only two numeric
values. Typically, the values 0,1 or -1,1 are used.

In any event, Discipulus refers to the lower value you give it as Class
Zero and to the higher value you give it as Class One.

For Ranking and Logistic Regression problem types, the only values
permitted in the target output column are 0 and 1.
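
The rules above can be expressed as a short Python sketch. It is an illustration only; the function names are hypothetical and the code is not taken from Discipulus. The first function checks that a target column holds exactly two values and reports which one would be called Class Zero and which Class One. The second checks the stricter rule for Ranking and Logistic Regression problem types.

```python
def class_labels(targets):
    """Map the two target values to Class Zero (lower) and Class One (higher)."""
    values = sorted(set(targets))
    if len(values) != 2:
        raise ValueError(f"expected exactly two target values, found {len(values)}")
    low, high = values
    return {low: "Class Zero", high: "Class One"}


def valid_for_ranking_or_logistic(targets):
    """Ranking and Logistic Regression permit only 0 and 1 as target values."""
    return set(targets) <= {0, 1}


print(class_labels([-1, 1, 1, -1]))
print(valid_for_ranking_or_logistic([-1, 1, 1, -1]))
```

A target column coded as -1,1 is therefore usable for classification but would need to be recoded to 0,1 before it is used for ranking or logistic regression.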

What Are Training, Validation, and Applied Data Files?

Discipulus takes three different types of data files. The format of all
three is identical. They are:

• Training data files;

• Validation data files; and

• Applied data files.

You must load training and validation data to run Discipulus because
they are used in model creation. You may load applied data before or
after a project is finished. Applied data plays no part in building models.
Instead, applied data lets you see how the models work on data that
played no role in building the models.

What Does an Input Data File Look Like?

Each line of data in your data files contains a matched pair of inputs and
outputs. (Inputs are the information you always know about your model.
The Target Output is what you would like to predict).

For example, a very small training file with two inputs and one output
might look like Table 1. The first two columns are the inputs and the
third column is the target output. The lines and column labels would not
appear in a Discipulus data file; they appear in the table only for
clarity.

Table 1.
Input 1    Input 2    Target Output
2.0        4.0        6.0
3.1        5.0        8.1
1.3        3.2        4.5

This file has only three examples for Discipulus to learn from.
Ordinarily, your data files should have many more examples than this.

The target output you want to predict is always in the rightmost column.
Even though the columns in Table 1 have headings, you should not use
headings in Discipulus direct text file input data files. If you use Notitia
to import data, you may use column headings.

From this data file, Discipulus would build a model that predicts the
output from the inputs. An evolved program containing only one line of
code:

Output = Input 1 + Input 2;

would produce the output column from the input columns in this table.
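
The claim that the one-line program reproduces Table 1 can be checked with a few lines of Python. This sketch is an illustration only; the names table_1 and evolved_program are hypothetical, and the code is not output from Discipulus.

```python
import math

# Rows of Table 1: (Input 1, Input 2, Target Output).
table_1 = [
    (2.0, 4.0, 6.0),
    (3.1, 5.0, 8.1),
    (1.3, 3.2, 4.5),
]


def evolved_program(input_1, input_2):
    """The one-line program from the text: Output = Input 1 + Input 2."""
    return input_1 + input_2


# Compare with a tolerance because these are floating point values.
matches = [
    math.isclose(evolved_program(a, b), target) for a, b, target in table_1
]
print(matches)
```

Every row matches, so the sum of the two inputs predicts the target output for all three examples.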

How Should I Split My Data among the Training, Validation, and Applied Data Files?
This section covers the following topics:

• Is There a General Rule for Dividing Data? on page 32;

• What if I Have a Very Small Data Set? on page 32; and

• How Should I Split Time Series or Sequential Data? on page 32.

Is There a General Rule for Dividing Data?

Yes. A good start is to divide your data into three equal parts--one each
for the training, validation and applied data sets. (For neural network
practitioners, we would discourage splitting your data 90%-10% as is
typical for training and testing data sets in neural networks.)

Each of the three data sets should be representative of the problem you
are trying to model.
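
As a starting point, the three-way split can be done with a script such as the Python sketch below. It is an illustration only and is not part of Discipulus or Notitia. The function name is hypothetical, and shuffling the rows first is an assumption, made to help each part be representative. Do not shuffle this way if your data is sequential.

```python
import random


def split_in_thirds(rows, seed=0):
    """Shuffle the rows, then cut them into three near-equal parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    third = len(shuffled) // 3
    training = shuffled[:third]
    validation = shuffled[third:2 * third]
    applied = shuffled[2 * third:]  # takes any leftover rows
    return training, validation, applied
```

Each part would then be written to its own file for direct text file import.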

What if I Have a Very Small Data Set?

Small data sets can cause models to be overfit. Overfit means that the
model works very well on the training and validation data; but not so
well on applied data (which is not used in training).

Discipulus has many built-in protections against overfitting. In general,
Discipulus is much less likely to overfit data than other modeling
software. But it does happen.

If your data set is very small, you may elect not to use an applied data
file. In that case, divide the data equally between the training and
validation data sets. If you have to make one bigger than the other, it is
usually better to have the training data larger.

How Should I Split Time Series or Sequential Data?

Where your data set contains examples that are sequential, such as time
series data, it is good practice to take the following steps:

1. Take the last third of your examples in sequence and put them into
your applied data set. That way, you can evaluate the performance
of your evolved models on the applied data, which is later in time
than the training data. This is very important for good model
building.

2. Take the first two-thirds of your examples (in sequence) and
randomize them. Then split these data into two equal size data
sets--one for training and one for validation.
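
The steps above can be sketched in Python as follows. This is an illustration only, with a hypothetical function name, and it assumes the rows are already in time order.

```python
import random


def split_sequential(rows, seed=0):
    """Return (training, validation, applied) for time-ordered rows."""
    rows = list(rows)
    cut = (2 * len(rows)) // 3
    # The last third, still in sequence, becomes the applied data.
    earlier, applied = rows[:cut], rows[cut:]
    # The first two-thirds are randomized and then halved.
    random.Random(seed).shuffle(earlier)
    half = len(earlier) // 2
    return earlier[:half], earlier[half:], applied
```

Because only the earlier examples are shuffled, every applied example is later in time than every training and validation example.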

Is There an Easy Way to Create a Data File for Direct Text File Import?
Yes. Although there are many methods, the easiest way is to use
Microsoft Excel. Create a spreadsheet containing your data for a single
file. Each input is a column of data in the spreadsheet. The output is a
single column of data on the farthest right data column of the
spreadsheet.

Then create a tab delimited text file from your spreadsheet as follows. In
Excel, make the following menu selections:

• On the File menu, click Save As;

• Then in the dialog box that pops up, you should select Text Only
from the Save As Type Box and name the text file you want to create.
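
If your data is already held in a program rather than in a spreadsheet, one of the other methods is to write the tab delimited file with a script. The Python sketch below is an illustration only; the function name is hypothetical. It writes no heading line and puts the target output last in each row. Python's csv module ends each line with a carriage return and line feed by default; this manual does not say whether Discipulus requires a particular line ending, so treat that as an assumption to verify.

```python
import csv


def write_data_file(path, rows):
    """Write numeric rows as tab-delimited text with no heading line."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        for row in rows:
            # Each row holds the inputs followed by the target output.
            writer.writerow(row)
```

For example, write_data_file("training.txt", [(2.0, 4.0, 6.0), (3.1, 5.0, 8.1)]) would create a two-line training file with two inputs and one target output. The file name here is only an example.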

Does Discipulus Come with Sample Data Sets I Can Run?
Yes. In the /Data directory, you can find samples of a regression
problem and a classification problem. Two of these sample sets have
already been split into training, validation and applied data files for your
convenience and are designed for you to test text file import and
modeling. In addition, two Excel files are included: (1) One containing a
classification problem that is used in the tutorial; and (2) The other
containing a regression problem with deliberate data errors introduced
for you to experiment with in learning the Notitia data preparation
module. See What Are Training, Validation, and Applied Data Files? on
page 30 and How Should I Split My Data among the Training,
Validation, and Applied Data Files? on page 32.

Here is a description of the two sample text data files:

• The pre-split text files containing regression data are called
"fractionating column" data. These data are from an industrial
process and are made up of past known pairings of present and future
column states; and

• The pre-split text files containing classification data are called
"gaussian" data. This is a binary classification problem. So you may
also use the gaussian data set to test ranking and logistic regression
problem types. The gaussian data set has 24 inputs. The first 8 inputs
all contribute to solving the classification problem. The last 16 inputs
are random numbers. This data set demonstrates Discipulus’ ability to
distinguish relevant from irrelevant inputs.

Here is a description of the Excel file containing sample regression data.

• The name of the file is Fractionating_Column_With_Phase.XLS.

• This file has deliberate data-cleansing errors in it to demonstrate the
data preparation capabilities of Notitia. The errors are described in
the tab labeled "Notes on Intentional Errors."

• The "Training_Data" tab comprises 800 rows of data containing an
index column, four numeric input columns and one categorical input
column containing the values "Low", "LOW" and "High." It also
contains two missing values (one blank cell and one "?" in a numeric
column) and 14 outliers.

• The "Scoring_Data" tab contains data on which to score a finished
model. This tab contains target outputs.

The tutorial file is named "Discipulus_Notitia_Tutorial_Data_File.xls."
It is described at some length in the tutorial about how to use Discipulus
and Notitia together.

How Do I Start a Project?

All projects begin with the Project Setup Wizard. It helps you load in
your data, name the project, configure your project and start your
project. Other than identifying the data files and naming the project, all
parameters for the project are set automatically and intelligently by the
project setup wizard. This section covers the following topics:

• Starting the Project Setup Wizard on page 35; and

• Using the Project Setup Wizard on page 36.

Starting the Project Setup Wizard

You may start the Project Wizard in two different ways:

1. Every time you start Discipulus, the Project Setup Wizard comes
up automatically. Figure 11 shows Discipulus right after it has
started. The first page of the Project Setup Wizard is showing:

Figure 11. The Project Name and Location Window of the
Project Wizard

2. Alternatively, to begin a new project, click New on the File Menu
and the Project Setup Wizard will start:

Figure 12. Starting a New Project from the File Menu

Using the Project Setup Wizard

The project setup wizard takes you through four simple steps:

1. Name and Save Your Project. When you first start the project
wizard, you see the Project Name and Location Window. Click on
the Browse button to select a folder and name for your project file.
The project file stores all information about the project you will
run.

Figure 13. The Project Name and Location Window. Click on
Browse to Name and Locate your Project File

2. Select Data for Training. The second window in the project
wizard lets you tell Discipulus how and where to get data for
training. That window is shown in Figure 14.

Figure 14. Selecting Data for Training

You can get data for Discipulus training by three different
methods: (1) direct import of text files; (2) the Notitia data
import and preparation software; or (3) direct import of Notitia
XML files. Much more information is available on the various
methods of importing data at How Do I Import Data into
Discipulus? on page 24.

3. Identify the Problem Type and Fitness Function. The third
window in the project setup wizard is the Select Problem Type and
Fitness Function Window. This window automatically detects
which problem types are appropriate for your target output and
also detects which fitness functions are part of the license you
acquired. Thus, you may find Classification problem types greyed
out if your target output has many different values. And, if you
own the Professional version of Discipulus (with no add-ons), the
advanced fitness functions (ranking and logistic) are not available.
Figure 15 shows the window that lets you choose the problem type
and the fitness function.

Figure 15. The Problem Type and Fitness Function Window of
the Project Wizard

4. Start the Project. The fourth window in the Project Wizard is the
"Customize Parameters and Start Project" window. To start the
project, click on the "GO" button. Discipulus is entirely
configured. As the project proceeds, Discipulus will intelligently
adjust its own configuration. Two notes about the other buttons on
this page:

* The "Options" button allows advanced users to set run and project
parameters. It takes you to the Advanced Options Window.

* More information about setting run and project parameters in the
Advanced Options Window may be found in the Discipulus
Owners Manual, in the sections entitled: (1) Controlling
Discipulus Projects; (2) The Advanced Options Window; and (3)
The Single Run Advanced Options Window.

Figure 16. Click "GO" to start your project.

What Should I Look for when a Project Is Running?

The Monitor Project Window appears automatically when you start a
project. This window gives you four types of information about your
project:

• Overview Tab. The Overview Tab contains information such as how
many runs have been completed, the performance of the best program
model found, and the performance of the best team model found so
far. The performance statistics are customized to the problem type
you are running.

• Project History Tab. This tab charts the performance of the project.
It shows the error rate (fitness) for the best program model (the red
line) and the best team model (the green line) at each point in your
project.

• Project Detail Tab. This tab shows detailed information about how
long it has taken individual runs to reach the various error rates found
during the project.

• Current Run Tab. This tab shows the parameters used and progress
of the current run in the project. So each time the project starts a new
run, the values in this tab will change.

For more information about the Monitor Project Window, see The
Monitor Project Window on page 88.

How Do I Make Runs in a Project Terminate?

The default settings for Discipulus start a project in "stepping" mode.
Discipulus starts with very short runs and then increases the length of
the runs as the project continues. In other words, by default, Discipulus
handles run termination for you.

If you want, you can set the run termination criterion manually in the
Advanced Options window shown in Figure 17. What you set will be
applied to all runs in the project.

Figure 17. The Advanced Options Window

In the highlighted area, you may select a run termination of either
"generations since start" (that is, since the start of the run) or
"generations without improvement" (that is, generations since there has
been an improvement in the best program in the run).
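
The difference between the two criteria can be pictured with a short Python sketch. It illustrates the two stopping rules only and is not Discipulus code. The function name, the criterion strings and the per-generation list of best error rates are assumptions made for the example.

```python
def should_stop(best_error_by_generation, criterion, limit):
    """Decide whether a run should stop after the generations seen so far.

    best_error_by_generation holds the best error rate for each generation
    (lower is better). criterion is "since_start" or "without_improvement".
    """
    generations = len(best_error_by_generation)
    if generations == 0:
        return False
    if criterion == "since_start":
        # Stop once the run has lasted for the chosen number of generations.
        return generations >= limit
    if criterion == "without_improvement":
        # Count the generations since the best error rate was first reached.
        best = min(best_error_by_generation)
        last_improvement = best_error_by_generation.index(best)
        return generations - 1 - last_improvement >= limit
    raise ValueError(f"unknown criterion: {criterion!r}")
```

With the history [5.0, 4.0, 4.0, 4.0] the last improvement came in the second generation, so a "generations without improvement" limit of 2 would stop the run, while a limit of 3 would let it continue.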

You can get to the Advanced Options Window in two different ways--
from the main menu and from the Project Wizard:

Method 1. From the Set Up Learning Menu, select Options.

Method 2. The final window in the Project Wizard is the Customize
Parameters and Start Run window. Select Options from that window as
shown in Figure 18.

Figure 18. The Customize Parameters and Start Run Window in
the Project Wizard. Options Button Highlighted.

How and When Do I Terminate a Project?

You can terminate a Discipulus Project manually or automatically. This
section provides the following additional information about Discipulus
Project Termination:

• Manual Project Termination on page 42;

• Automatic Project Termination on page 43; and

• Guidelines for Project Termination on page 44.

Manual Project Termination

You can always stop a project manually by clicking on the "Finish
Project" button on the Monitor Project Window. Alternatively, you may
click on the Finish tool on the toolbar shown in Figure 19.

Figure 19. The Discipulus Toolbar with the Finish Run Tool
Highlighted

By default, Discipulus runs in "stepping" mode. It starts with a series of
very short runs and then increases the length of the runs as the project
proceeds. In this mode, a project may only be terminated manually.

Automatic Project Termination

You may elect to terminate your project after a fixed number of runs. To
do so, go to the Advanced Options Window, make sure "stepping" is
unchecked, and enter a value for Maximum Number of Runs as shown
in Figure 20.

Figure 20. Using the Advanced Options Window to set a
Maximum Number of Runs in a Project to 300

You can get to the Advanced Options Window in two different ways--
from the main menu and from the Project Wizard:

Method 1. From the Set Up Learning Menu, select Options.

Method 2. The final window in the Project Wizard is the Customize
Parameters and Start Run window. Select Options from that window as
shown in Figure 21.

Figure 21. The Customize Parameters and Start Run Window in
the Project Wizard. Options Button Highlighted.

Guidelines for Project Termination

This section provides some practical guidelines for determining when to
terminate a project manually. You should try to stop your projects when
it appears that no further improvement is likely. To make this
determination, a few rules of thumb are useful:

1. Look at the Overview Tab of the Monitor Project Window. Is the
performance of the best program or the best team sufficient for
your purposes? If so, you can stop the run.

2. Look at the Project History Tab of the Monitor Project Window. Is
the project still showing steady improvement? If so, you may want
to let the project continue.

3. Look at the Project Detail Tab of the Monitor Project Window.
Select the type of termination criterion (generations without
improvement or generations since start) you wish to examine. Was
the best program located less than one-fourth of the way to the
maximum value shown on that tab? If so, you might consider
halting the project and doing another project in Fixed Mode with a
shorter termination criterion--perhaps two times the level at which
the best program was found. This may shorten your run time.
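
The arithmetic in the third rule of thumb can be written out as a small Python sketch. It is an illustration only; the function and parameter names are hypothetical, and the values would be read by eye from the Project Detail Tab.

```python
def suggest_shorter_limit(found_at, maximum):
    """Apply the third rule of thumb to one project.

    found_at is the termination-criterion value at which the best program
    was found. maximum is the largest value shown on the Project Detail Tab.
    Returns a suggested shorter limit, or None to keep the current setup.
    """
    if found_at < maximum / 4:
        # "Perhaps two times the level at which the best program was found."
        return 2 * found_at
    return None


print(suggest_shorter_limit(50, 400))
print(suggest_shorter_limit(150, 400))
```

In the first call the best program appeared at 50 out of a maximum of 400, which is less than one-fourth of the way, so the sketch suggests a limit of 100. In the second call it appeared at 150, so no shorter limit is suggested.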

How Do I Continue a Project After I Have Stopped It?

On the Run Menu, click on Continue.

Figure 22. Continuing a Run from the Run Menu

If no project is currently open, a file dialog box will pop up. Choose the
project file that you want to continue. The project will continue where it
left off. If there is an open project, that project will continue from where
it left off.

Alternatively, you may continue a run by clicking on the Continue tool
on the toolbar.

Figure 23. Continuing a Run using the Continue Tool on the
Toolbar (Circled)

Can I View the Predicted Outputs of an Evolved Best Program or a Best Team?
Yes. After a project is over, Discipulus saves the thirty best evolved
program models and the five best team models and automatically brings
up the Reports Window.

To see the outputs of one of your best program models, select that
model in the Best Programs Tab of the Reports Window. Then click the
View Results button as shown in Figure 24.

Figure 24. The Second Best Evolved Program Model Is
Selected in the Best Programs Tab. View Results
Sends that Program’s Outputs to the Data Window.

When you click the View Results button, the Data Window will open. In
Data Window chart view, the output of the program you just selected
will be shown in the "Selected Program" data series. In the Data
Window spreadsheet view, the output of the program you just selected
will be shown in the "Selected Program" column or columns as shown
in Figure 25.

Figure 25. The Output of a Selected Program Model is Shown in
the Selected Program Output Column of the Data
Window (Highlighted)

You can view the outputs of a selected Team Model in a similar way.
Figure 26 shows the Team Solutions Tab of the Reports Window with
the best five member team selected.

Figure 26. Best Five Member Team Selected in the Team
Solutions Tab of the Reports Window

If you click on View Results, the output of the selected team is sent to
the Data Window in the "Selected Program" data series. Thus, in Data
Window chart view, you would see that selected program data as a line
on the chart labeled "Selected Program." In spreadsheet view, you
would see the output of that selected team as a column of outputs in a
spreadsheet. The spreadsheet view is shown above in Figure 25 with the
Selected Program column highlighted.

How Do I View an Evolved Best Program Created by Discipulus?
When a Discipulus project finishes, Discipulus automatically opens a
Reports Window. The Best Programs Tab provides a list of the 30 best
programs from that project. To view one of those programs, select it
with your mouse and then click the Analyze Program button, as shown
in Figure 27. In that figure, the third best program of the project has
been selected.

Figure 27. Best Programs Tab Showing Third Best Program
Selected and Analyze Program Button Highlighted

When you click Analyze Program, the Interactive Evaluator Window
will open and the program you just selected will be displayed along with
performance statistics for that program as shown in Figure 28.

Figure 28. Interactive Evaluator Window Displaying a Best
Program

The functionality of the Interactive Evaluator Window is documented in
the chapter of the Discipulus Owners Manual devoted to that subject.

How Do I Save an Evolved Best Program Created by Discipulus?
When a Discipulus project finishes, Discipulus automatically opens a
Reports Window. The Best Programs Tab provides a list of the 30 best
programs from that project. To save one of those programs, select it
with your mouse and then click the Analyze Program button, as shown
in Figure 29. In that figure, the third best program of the project has
been selected.

Figure 29. Best Programs Tab Showing Third Best Program
Selected and Analyze Program Button Highlighted

When you click Analyze Program, the Interactive Evaluator Window
will open and the program you just selected will be displayed along with
performance statistics for that program as shown in Figure 30.

Figure 30. Interactive Evaluator Window Displaying a Best
Program

At this point, you may save your selected best program using two
different methods. One method saves the selected program as object
code. The other saves it in a format that lets you reload the program into
Discipulus later:

Method 1--Save Program as Object Code: To save your program as
object code, click on the Save Decompiled Program button. That will
bring up the window shown in Figure 31.

Figure 31. Save Decompiled Program Window

Just select a computer language to save your best program in. Then
select Browse to designate the folder and file name for your selected
program. Discipulus automatically adds the correct extension for the
particular computer language you select.

Method 2--Save Program in Reusable Format: To save your program
in a format that can be loaded back into Discipulus for further use, click
on the Save Program button in the Interactive Evaluator Window
(Figure 30). This takes you to a Windows browser in which you may
designate the folder and file name for the program.

How Do I View an Evolved Best Team Created by Discipulus?
At the end of a project, Discipulus opens the Reports window. The
Team Solutions Tab in the Reports Window shows the best team models
that evolved during that project. Accordingly, Figure 32 shows that the
best team model composed of three evolved programs has been selected
in that window.

Figure 32. The Team Solutions Tab of the Reports Window. In
this Figure, the Best Three-Program Team Solution
is Selected and the View Code Button is Highlighted.

To view the code of the best three-program team from this window,
select the "View Code" button.

The Team Solutions Code window will pop up. An example is shown in
Figure 33, which shows an evolved team solution in the C language.

Figure 33. The Team Solutions Code Window. In this Figure,
the Language Drop Down Box is Highlighted.

Note that when you are in the Team Solutions Code window, you may
choose the computer language in which to view the evolved team,
such as CSharp, Java, or Delphi. You do that by using the Language
drop down box, which is highlighted in the above figure.

How Do I Save an Evolved Best Team Created by Discipulus?
At the end of a project, Discipulus opens the Reports window. The
Team Solutions Tab in the Reports Window shows the best team models
that evolved during that project. Accordingly, Figure 34 shows that the
best team model composed of five evolved programs has been selected.

Figure 34. The Team Solutions Tab of the Reports Window

To save the object code of the selected team model, click on the Save
Program button (highlighted). That will bring up the window shown in
Figure 35.

Figure 35. Save Decompiled Program Window

Just select a computer language to save your best team in and then select
Browse to designate the folder and file name for your selected team.
Discipulus automatically adds the correct extension for the particular
computer language you select.

How Do I Open the Interactive Evaluator Window?

There are two ways to open the Interactive Evaluator Window:

* From the main menu, click Interactive Evaluator and select
"Start". The Interactive Evaluator Window opens and the best
program of the project is automatically sent to the window and
displayed.

* After you finish a project, Discipulus automatically creates
comprehensive reports (the Reports Window) that let you select
and analyze the 30 best program models. A sample of the Best
Programs Tab is shown in Figure 36.

Figure 36. Best Programs Tab for a Regression Problem

From the Best Programs Tab, select the program you wish to view in
interactive evaluator and either double click on that program or click the
"Analyze Program" button. Interactive Evaluator will open with that
program loaded. Figure 37 shows the Interactive Evaluator window
containing the selected regression program.

Figure 37. Interactive Evaluator with a Regression Program
Loaded

Interactive Evaluator gives you a powerful suite of tools to analyze,
optimize, shorten and edit your evolved programs. It is documented at
length in the Discipulus Owners Manual.

How Do I Graph and Analyze the Outputs of a Selected Best Program Created by Discipulus?
After you finish a project, Discipulus automatically creates
comprehensive reports (the Reports Window) that let you select and
analyze the 30 best program models. The "Best Programs" tab of the
Reports window is the starting point to analyze the best programs
evolved by Discipulus. An example is shown in Figure 38 for a
regression problem. You will see different statistics reported for other
problem types in this tab.


Figure 38. The Best Programs Tab for a Regression Problem

Starting at the Best Programs Tab, there are five different tools for
analyzing and graphing the outputs of your best programs. They are
covered in the following topics:

• View Summary Best Program Statistics in the Best Programs Tab on page 59

• Use Interactive Evaluator to View Code, Simplify, Edit, and Optimize a Best Program on page 60

• Use Solution Analytics to Graph and Analyze the Outputs of a Best Program on page 61

• Use the Data Window to Graph the Predicted Outputs of Your Best Program vs. the Target Outputs for the Best Program on page 63

• Use the Data Window to View the Numeric Predicted Outputs of Your Best Program on page 64

View Summary Best Program Statistics in the Best Programs Tab

The Best Programs tab shows summary statistics for the thirty best
programs of the project. An example of this tab for a regression problem
is shown in Figure 39.

Figure 39. Best Programs Tab for a Regression Problem. This Figure Highlights the Data Set Selection Dropdown Box.

In this figure, each row represents a single program and the programs
are ordered from best to worst. The combined training and validation
data sets (see highlighted section) are used to compute the fitness values
for the 30 best programs.

The summary statistics shown are labeled in this window and will vary
depending on your problem type. For regression problems, the Best
Programs tab shows the R² statistic for each program, the fitness value
computed for each best program, and the run number in the current
project in which the best program was found.
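As a rough illustration of the R-squared column, the sketch below computes the conventional coefficient of determination from target and predicted outputs. It assumes the statistic Discipulus reports is 1 - SS_res / SS_tot; this manual section does not state the exact formula, and the function name r_squared is hypothetical.

```python
def r_squared(targets, predictions):
    """Conventional coefficient of determination: 1 - SS_res / SS_tot.

    Assumption: this may differ from the exact statistic Discipulus reports.
    """
    if not targets or len(targets) != len(predictions):
        raise ValueError("targets and predictions must be non-empty and the same length")
    mean_target = sum(targets) / len(targets)
    # Sum of squared residuals between target and predicted outputs.
    ss_res = sum((t - p) ** 2 for t, p in zip(targets, predictions))
    # Total sum of squares of the targets around their mean.
    ss_tot = sum((t - mean_target) ** 2 for t in targets)
    if ss_tot == 0:
        raise ValueError("targets are constant, so R-squared is undefined")
    return 1.0 - ss_res / ss_tot
```

Computing this separately on the training, validation, and applied rows mirrors what the Statistic Displayed drop down box lets you do inside the Reports Window.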

The summary statistics shown in the Best Programs tab of the Reports
Window pertain to particular data sets. The data set used to compute the
fitness statistic (training, validation, etc.) is shown in the Statistic
Displayed drop down box, which is highlighted in Figure 39.


You may change the data set used to compute fitness by clicking on this
Statistic Displayed drop down box and selecting one of the options. For
example, to view the performance of one of the best programs on the
Validation Data, first choose the program and then select Validation
Data in the Statistic Displayed drop down box.

Use Interactive Evaluator to View Code, Simplify, Edit, and Optimize a Best Program

To view, edit, simplify or optimize the C Code of one of the thirty best
programs, take the following steps:

First: Select the program you wish to view in the Best Programs Tab of
the Reports Window as shown in Figure 40.

Figure 40. Best Program Window Showing Third Best Program Selected for Analysis


Second: Click on the "Analyze Program" button or double-click the
program itself. In either case, the Interactive Evaluator window will open
with the selected program loaded, as shown in Figure 41.

Figure 41. Interactive Evaluator Window with Selected Regression Program Loaded

Interactive Evaluator is a powerful tool that enables editing,
optimization, and simplification and is described in detail in Interactive
Evaluator.

Use Solution Analytics to Graph and Analyze the Outputs of a Best Program

To open the Solution Analytics graphic analysis application for a
particular program, start at the Best Programs tab of the Reports
window and open Interactive Evaluator, as described in Use Interactive
Evaluator to View Code, Simplify, Edit, and Optimize a Best Program
on page 60. Then, click the "Start Solution Analytics" button.

The Solution Analytics application will open and the program you were
viewing in Interactive Evaluator will be automatically loaded into it. A
regression problem in Solution Analytics looks like Figure 42.


Figure 42. Solution Analytics with a Regression Problem Loaded

Note that the tabs in the above figure show graphics that are particular
to regression problems. Had you used Solution Analytics on a ranking
problem instead, Solution Analytics would have looked like Figure 43.


Figure 43. Solution Analytics with a Ranking Problem Loaded

Use the Data Window to Graph the Predicted Outputs of Your Best Program vs. the Target Outputs for the Best Program

To view a graph of the outputs of one of the thirty best programs from a
project, first select the program you wish to view in the Best Programs
Tab of the Reports Window and then click on the View Results button.
The Data Window will pop up. The program you just selected is
displayed as the Selected Program. It is plotted against the Target
Output from your Data Files.


Use the Data Window to View the Numeric Predicted Outputs of Your Best Program

To view numeric outputs of one of the thirty best programs, first select
the program you wish to view in the Best Programs Tab of the Reports
Window and click on the View Results button. The Data Window will
pop up. Choose Spreadsheet View.

The outputs of the program you just selected are displayed in the
Selected Program Column as shown in Figure 44.

Figure 44. Discipulus Data Window Containing the Outputs of a Best Program Just Selected

For more information about the operation of the Data Window, see the
Discipulus Owner’s Manual.

How Do I Graph and Analyze the Outputs of a Selected Best Team?

During a project, Discipulus assembles the best programs into teams.
The outputs of all of the programs that make up a team are combined
into one collective output that is frequently better than the output of
any particular member of the team.

After your project is complete, the Reports window automatically opens.
Click on the "Team Solutions" tab of the Reports window. You will see
five different team sizes from which you may select and the fitness of
each.

The following subjects are treated here:

• Viewing Summary Statistics for the Best Team Models on page 65

• Viewing Graphic Analytics of the Output of a Selected Best Team Model in Solution Analytics on page 65

• Viewing a Graph of the Outputs of a Selected Best Team Model in the Data Window on page 67

• Viewing Numeric Outputs of a Selected Best Team Model in the Data Window on page 67; and

• Viewing and Saving C, Java, or Assembler Code of a Selected Best Team Model on page 68.

Viewing Summary Statistics for the Best Team Models

The statistics shown in the Best Teams Tab of the Reports Window
show the performance of each team on particular data sets. You can
change the data set in the combo box at the bottom of the Tab. For
example, to view the performance of one of the best teams on the
Validation Data, first choose the team and then select Validation Data in
the combo box.

For classification problems, this Tab also shows a breakdown of the
performance of the Team by the team vote. For example, a team of 5
members may vote 5:0 for class 1, or 4:1, and so forth. The strength of
the vote is usually a very good predictor of the probability that the class
prediction is correct.

To see the detailed breakdown by team vote for any particular team,
select the team. The detailed breakdown will appear in the lower
window.
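To make the idea of a team vote concrete, here is a minimal sketch that tallies the class predictions of the team members for one row of data and reports the winning class together with the vote split. It assumes a simple majority vote; the manual does not describe how Discipulus combines member outputs internally, and the function name team_vote is hypothetical.

```python
from collections import Counter


def team_vote(member_predictions):
    """Return (winning_class, votes_for, votes_against) for one data row.

    Assumption: a plain majority vote. In the case of a tie, the class
    that was seen first among the tied classes is returned.
    """
    if not member_predictions:
        raise ValueError("a team needs at least one member")
    counts = Counter(member_predictions)
    winner, votes_for = counts.most_common(1)[0]
    return winner, votes_for, len(member_predictions) - votes_for


# A five-member team in which four members predict class 1.
print(team_vote([1, 1, 1, 1, 0]))
```

A 5:0 split and a 3:2 split both produce the same winning class, which is why the vote strength is reported separately: it lets you judge how much confidence to place in each prediction.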

Viewing Graphic Analytics of the Output of a Selected Best Team Model in Solution Analytics

When a Project is over, the Reports Window opens automatically. Select
the "Team Solutions" tab. You will see summary statistics of the five
best teams of different sizes for the project. Select the team for which
you wish to view graphic analytics. Then click on the "Start Solution
Analytics" button. The Solution Analytics Application will open and
your best team will be loaded automatically into Solution Analytics. For
a regression problem, Solution Analytics will look like Figure 45 when
it opens.

Figure 45. Solution Analytics Loaded with a Regression Problem, Best Team Model

Note that, because this was a regression problem, Solution Analytics shows
charts appropriate to regression-type problems. Had this been a
classification, ranking, or logistic regression problem, the charts would
be appropriate to that problem type.

See the Solution Analytics Owner’s Manual for more information.

Viewing a Graph of the Outputs of a Selected Best Team Model in the Data Window

To view a graph of one of the best team models, first select the team you
wish to view on the Best Teams Tab of the Reports Window. Then click
on the View Results button. The Data Window will pop up. The
team you just selected is displayed as the Selected Program. It is
plotted against the Target Output from your Data Files.

For more information about the operation of the Data Window, see the
Discipulus Owner’s Manual chapter on this window.

Viewing Numeric Outputs of a Selected Best Team Model in the Data Window

To view numeric outputs of one of the best team models, first select the
team you wish to view in the Best Teams Tab of the Reports
Window. Then click on the View Results button. The Data Window will
pop up. Choose Spreadsheet View.

The outputs of the team you just selected are displayed in the Selected
Program Column as shown in Figure 46.

Figure 46. Discipulus Data Window Containing the Outputs of a Program or Team Just Selected


For more information about the operation of the Data Window, see that
chapter in the Discipulus Owner’s Manual.

Viewing and Saving C, Java, or Assembler Code of a Selected Best Team Model

To view the source code for the best teams:

• Select the team you wish to view in the Best Teams Tab of the
Reports Window.

• Then click on the View Code button.

• The Team Solution Code window will pop up. Choose the language
you wish to view in the combo box at the bottom.

To save the code displayed in the Team Solution Code window, click on
the Save Button. Assembler and C code are saved in .cpp files. Java code
is saved in .java files.

Is There a Way to Find Out which Input Variables Are the Most Important?

Yes. After you finish a project, Discipulus looks through all of the best
programs and analyzes how many times each input appears in a way that
contributes to the fitness of the programs that contain them. Those
results are shown on the Input Impacts Tab of the Reports Window. A
value of 1.00 in the Frequency column indicates that this input variable
appeared in 100% of the best thirty programs.
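The Frequency column can be pictured with the short sketch below, which computes the fraction of best programs in which each input appears. This is a simplification: Discipulus counts only appearances that contribute to fitness, whereas the sketch counts any appearance. The function name input_frequencies and the data layout (one set of input names per program) are assumptions made for illustration.

```python
def input_frequencies(programs_inputs, all_inputs):
    """Fraction of programs in which each input name appears.

    programs_inputs: list of sets, one per best program, holding the
    names of the inputs that program uses (hypothetical layout).
    all_inputs: every input name in the data set.
    """
    n_programs = len(programs_inputs)
    if n_programs == 0:
        raise ValueError("at least one program is required")
    return {
        name: sum(name in used for used in programs_inputs) / n_programs
        for name in all_inputs
    }


best_programs = [{"Input001", "Input002"}, {"Input001"}]
print(input_frequencies(best_programs, ["Input001", "Input002", "Input003"]))
```

An input with a frequency of 0.0 never appears in any best program and is a natural candidate for removal when you are trying to reduce the number of inputs.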

You may sort the inputs on any column in the Input Impacts Tab.

In addition to frequencies, you may calculate the average and maximum
impact that removing each input has on the programs from which it is
removed. To do so, click on the Calculate Impacts button. The results
are scaled between 0 and 1.0. A value of 1.0 represents the largest
impact value possible.

Finally, you may save the input impacts report by clicking on the Save
button on the Input Impacts Tab.


How Do I Use the Models Discipulus Has Created?

You may deploy best program or best team models to new data in either
of two ways: (1) From within Discipulus by loading new applied data;
or (2) By saving C, C Sharp, Java, Delphi or Assembler code of the
model and calling that code from your own programs.

This section treats the following subjects:

• How to Deploy Discipulus Models from within Discipulus on

page 69; and

• How to Deploy Discipulus Models as Source Code on page 70

How to Deploy Discipulus Models from within Discipulus

You may deploy Discipulus models to new data from within Discipulus
in two different ways:

1. Deploying Best Program or Best Team Models from a Project
File. All of the best programs and best teams in a project are
stored in the project file. To run any evolved program or team
stored in this manner on new data, take the following steps:

* On the file menu, click Open. Select the project file that contains
the program you are interested in.

* On the file menu, choose Load New Applied Data. Select the file
containing the data you wish to apply the program to or use Notitia
to import the new data. This file may contain Target Outputs or it
may not. Discipulus will ask you whether the file has Target
Outputs in it and will adjust its behavior according to your answer.

* Once you have loaded the new applied data, go to the Best Programs
Tab or the Team Solutions Tab of the Reports Window (depending on
whether you are applying a program or a team). Select the program
or team you want to apply to the new data.

* Click on View Results in the Best Programs or Team Solutions

Tab. The Data Window will pop up.

* Alternatively, click on View Results in the Interactive Evaluator.

The Data Window will pop up.


* Click on the Applied Tab of the Data Window. The outputs of the
program on the new applied data appear as the Selected Program
in the Data Window.

You may view these outputs in a graph or in a spreadsheet form in the
Data Window. For more information on the Data Window, see The Data
Window on page 101.

2. Deploying a Saved Program from the Interactive Evaluator

You may save and load programs from the Interactive Evaluator.

To deploy the program that you have previously saved via the
Interactive Evaluator, take the following steps:

* In Interactive Evaluator, click Load Program. Locate and load the

program you wish to apply to new data.

* On the file menu, choose Load New Applied Data. Select the file
containing the data you wish to apply the program to or import the
data from Notitia.

* Click on View Results. The Data Window will pop up.

* Click on the Applied Tab of the Data Window. The outputs of the
program on the new applied data appear as the Selected Program
in the Data Window.

How to Deploy Discipulus Models as Source Code

You may save best programs or best team models as C, C Sharp, Java,
Delphi or Intel Inline Assembler source code. See: How Do I Save an
Evolved Best Program Created by Discipulus? on page 50 and How Do
I Save an Evolved Best Team Created by Discipulus? on page 54.

Once you have saved source code files in this manner, you may call
them from your own programs and send new data to them. The source
code files return the output of the best programs and best teams.

The interfaces by which you call the evolved source code programs are
described in detail in a separate document installed with Discipulus named
Decompiled_Program_Interface.PDF.


What Do Input001, Input002, etc. Represent in the Interactive Evaluator?

If you use direct text file import to get your data into Discipulus, you
will see labels like Input001, Input002 in the Interactive Evaluator
display of your evolved program. The labels Input001, Input002, etc. are
names Discipulus assigns to the input variables in your data files.
Input001 represents the first (leftmost) input variable in your data
files, Input002 represents the second input variable, and so forth.
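The labeling scheme for direct text file import can be sketched as follows. The three-digit, one-based numbering is taken from the labels shown in this section; the function name default_input_labels is hypothetical and is not part of Discipulus.

```python
def default_input_labels(n_inputs):
    """Build labels Input001, Input002, ... for the input columns,
    numbered from left to right starting at 1."""
    return [f"Input{position:03d}" for position in range(1, n_inputs + 1)]


# A data file with three input columns.
print(default_input_labels(3))
```

If you keep a list of your own column names in the same left-to-right order, pairing it with these labels gives you a quick lookup table for reading evolved programs.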

Discipulus uses these names throughout its Data, Reports, and

Interactive Evaluator windows to refer to your input variables.

On the other hand, if you use Notitia to get your data into Discipulus,
Notitia will use the column names you have assigned to input columns
in the decompiled best programs and best teams.

What Do f[0], f[1] Represent in Interactive Evaluator?

When you look at programs created by Discipulus, you will see values
like f[0], f[1] and so forth. They are temporary computation variables in
the programs Discipulus creates. The output of these programs is the
value remaining in f[0] after the program executes.

Discipulus initializes the values in all f variables to zero before
executing a program.
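The role of the f variables can be illustrated with a small register-machine sketch. The instruction format, the set of operations, and the count of eight registers are all assumptions made for this example; they do not reproduce the code Discipulus actually emits. What the sketch does follow from the text above is that every f variable starts at zero and that the program output is whatever remains in f[0].

```python
# Hypothetical operation set for the illustration.
OPERATIONS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}


def run_program(program, inputs, n_registers=8):
    """Execute a list of (op, dst, kind, ref) instructions.

    Each instruction computes f[dst] = f[dst] <op> operand, where the
    operand is an input value, another f register, or a constant.
    """
    f = [0.0] * n_registers  # all f variables start at zero
    for op, dst, kind, ref in program:
        if kind == "input":
            operand = inputs[ref]
        elif kind == "register":
            operand = f[ref]
        elif kind == "const":
            operand = ref
        else:
            raise ValueError(f"unknown operand kind: {kind}")
        f[dst] = OPERATIONS[op](f[dst], operand)
    return f[0]  # the program output is the value left in f[0]


example = [
    ("add", 0, "input", 0),     # f[0] = f[0] + first input
    ("mul", 0, "input", 1),     # f[0] = f[0] * second input
    ("add", 1, "const", 2.0),   # f[1] = f[1] + 2.0
    ("sub", 0, "register", 1),  # f[0] = f[0] - f[1]
]
print(run_program(example, [3.0, 4.0]))
```

Because the registers start at zero, a program that never writes to f[0] simply returns zero.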

What about Overfitting?

Overfitting is a very common problem in modeling software. It occurs
when the models developed perform very well on training and
validation data, but do not perform so well on the unseen, applied data.

Discipulus has many built-in protections against overfitting. You will
find that Discipulus overfits less than other modeling tools. However,
Discipulus is not immune.

The topics covered here are:

• How to Detect Overfitting on page 72; and

• How to Eliminate Overfitting on page 72.


How to Detect Overfitting

Overfitting is easy to detect. Here are three common symptoms:

1. If the training performance is considerably better than the
validation performance and if the validation performance is
considerably better than the applied data performance, you
probably have an overfitting problem.

2. Another sign of overfitting is if, during the course of a project, you
notice that as the training and validation performance gets better,
the applied data performance becomes worse.

3. Another sign of overfitting is if, during the course of a project, you
notice that the applied data value initially improves but later
becomes worse.

How to Eliminate Overfitting

The principal cause of overfitting is having too many inputs relative to
the number of data points in your training and validation data. Another
way of thinking about it is that you have too many columns in your data
relative to the number of rows.

There is, unfortunately, no hard and fast rule that can describe the
appropriate relationship between rows and columns of data. Each data
set has its own distribution; and that distribution strongly affects
whether you have enough data points. For an excellent discussion of the
sufficiency of the size of a particular data set, see Pyle, Dorian, Data
Preparation for Data Mining, Morgan Kaufmann Publishers, Inc.,
1999.

However, there are some techniques you can try to reduce overfitting on
your data. They are:

1. Get more data. This is the best single approach and, if more data
can be obtained, this should have the best effect on your
performance.

2. Reduce the number of inputs. After you finish a project that

produces overfit results, Discipulus produces a detailed report on
the importance of the various inputs. It is in the Input Impacts tab
on the Reports window. See: Is There a Way to Find Out which
Input Variables Are the Most Important? on page 68. Here is how
to use the information on that tab to reduce overfitting:


* From the information on the Input Impacts Tab, you can decide
which inputs had a real impact on your best solutions and which
did not.

* Once you have made that determination, eliminate the least
important of the inputs and redo the project.

* You may reduce the number of inputs in two ways: (1) Redo your data
set, without the unwanted inputs; or (2) Disable the inputs in the Data
Window (see Excluding Inputs from a Project on page 111).

3. Reduce the target program size for the project. By default,

Discipulus projects create runs with a maximum program size that
is randomized around 512 bytes in length. You can shorten that
maximum program size and prevent it from randomizing as
follows:

* On the Set Up Learning Menu, click Options. The Advanced

Options Page appears.

* To reset the target maximum program size, click on the Set
Button. The Single Run Advanced Options page appears. Choose
the Program Size and Constants button.

* Reduce the Max Program size from 512 to either 128 or 256.

* To prevent Max Program size from randomizing, on the Advanced

Options Page, click Randomize. The Randomize Parameters page
will appear. Make sure that the Maximum Program Size box is
unchecked.

* Reducing the size of the programs is also a good technique when

you are performing a project for the sole purpose of narrowing the
number of inputs. It tends to make Discipulus very choosy about
using a particular input.

4. Reduce the DSS Subset Size and Eliminate Selection by

Difficulty in Choosing the Subset. By default, the DSS Subset
Size is set to 50. You can reduce that figure and eliminate
selection by difficulty as described in Dynamic Subset Selection
Parameters on page 159.


How Do I Do a Single Run?

Discipulus performs multiple configured runs by default. If you want to
perform a single run with a given set of parameters, take the following
steps:

• Start a new project (click File, New);

• The Wizard appears. Follow the steps in the Wizard until you get to
the screen that has an Options button. Click that button. The
Advanced Options Page appears;

• Uncheck the Stepping, Enabled box;

• Set the Maximum Number of Runs to 1;

• Click the Set Button. Set whatever individual run parameters you
want for the individual run. Click OK;

• Click the Randomize Button. Make sure all boxes are unchecked.
Click OK.

• On the Advanced Options Page, click OK.

• Continue with the Project Setup Wizard.


Controlling Discipulus Projects

This chapter describes how to control project-level settings in
Discipulus. These are settings that affect the entire project, such as
project termination, run termination, randomization of parameters across
runs, and the like. These settings are controlled from the Advanced
Options Window.

In general, Discipulus sets good defaults for these parameters and you
will often not need to change them. If you do need to change them, this
chapter tells you how.

In Discipulus, the "project" is the key organizing concept. In a single

project, you import data and perform both modeling and subsequent
scoring of new data with the evolved model. A project is saved with a
*.bst extension. All parameters you set, all data you import and all
evolved program or team models are saved in the project file for later
use.

A project is different from a Genetic Programming "run." A Discipulus
project may consist of a single Genetic Programming run or thousands
of Genetic Programming runs. By default, Discipulus conducts projects
using many runs per project. During those runs, and by default,
Discipulus creates one run after another by choosing different
randomized parameter settings for each run. These are project level
settings as opposed to "run" level settings.

This section covers the following topics:

• Where to Control Project-Level Settings on page 76

• Choosing Stepping Mode or Fixed Mode for Run Termination on

page 77

• Setting the Stepping Mode Parameters on page 78

• Setting the Fixed Mode Parameters on page 78

• Setting Run Randomization Targets (the "Set" Button) on page 79

• Setting which Run Parameters Randomize between Runs (the

"Randomize" Button) on page 80.


Where to Control Project-Level Settings

All project-level parameters may be set in the Advanced Options
Window, which is shown in Figure 47.

Figure 47. The Advanced Options Window

You may open the Advanced options window in one of two ways:

• Use the Set Up Learning menu on the main menu and select Options;
or

• From the Discipulus Project Setup Wizard, select Options, as shown

in Figure 48. For more information about using the Project Setup
wizard, please see Using the Project Setup Wizard on page 36.


Figure 48. The Customize Parameters Window of the Project Wizard

Choosing Stepping Mode or Fixed Mode for Run Termination

Discipulus projects run in either Stepping Mode or Fixed Mode. The
default value is Stepping Mode.

• In Stepping Mode, Discipulus starts with short runs and then steps up
the length of the runs during the project.

• In Fixed Mode, the length of all runs in the project is the same.

You may choose between these two modes as follows:

• On the Advanced Options Window, check or uncheck the Stepping
Enabled box. When it is checked, Stepping Mode is enabled. When it is
not checked, Fixed Mode is enabled.


Figure 49. The Advanced Options Window with Stepping Enabled

Setting the Stepping Mode Parameters

The only parameters necessary to define the Stepping Mode are:

• Initial Generations without Improvement. This parameter sets the

beginning number of generations that a run may perform without an
improvement in fitness before it is terminated;

• Runs to Complete at each Level. This parameter tells Discipulus

how many runs to complete before doubling the length of the runs.
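The way these two parameters interact can be sketched as a simple schedule. The sketch assumes that "doubling the length of the runs" means doubling the generations-without-improvement limit after each level; the function and argument names are hypothetical and the code is not taken from Discipulus.

```python
def stepping_schedule(initial_generations, runs_per_level, total_runs):
    """Return the generations-without-improvement limit for each run, in order.

    Assumption: the limit doubles each time runs_per_level runs have
    been completed at the current level.
    """
    if initial_generations < 1 or runs_per_level < 1:
        raise ValueError("both parameters must be at least 1")
    limits = []
    limit = initial_generations
    for run_index in range(total_runs):
        # Step up once a full level of runs has been completed.
        if run_index > 0 and run_index % runs_per_level == 0:
            limit *= 2
        limits.append(limit)
    return limits


# Start at 10 generations and complete 3 runs at each level.
print(stepping_schedule(10, 3, 7))
```

Under this reading, a larger Runs to Complete at each Level value keeps the project exploring with short runs for longer before it commits to long ones.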

Setting the Fixed Mode Parameters

In Fixed Mode, Discipulus terminates all runs in the project using the
same criterion. The parameters with which you may control this process
are:

• Generations Without Improvement. You may choose to terminate
each run based on how many generations that run has gone without
an improvement in fitness by clicking on the appropriate radio
button. Then, you may set the actual value for terminating the runs by
typing in the box to the right.

• Generations Since Start. You may choose to terminate each run

based on how many generations that run has gone since the run
started by clicking the appropriate radio button. Then, you may set
the actual value for terminating the runs by typing in the box to the
right.

• Maximum Number of Runs. Whatever number you enter in this box

will cause the project to terminate after it has performed that number
of runs.

• Adaptive Termination Enabled. When you click the Enabled box,
Discipulus terminates poorly performing runs earlier than
better-performing runs.

• Adaptive Termination Settings. You may set the number of levels
at which Discipulus will terminate the worse-performing runs. You
may also set what percentage of runs should be terminated at each
level. For example, suppose you were doing a Fixed Mode project
with a termination criterion of 600 generations without improvement.
The default Adaptive Termination Settings of 4 levels and 50% at
each level mean that: (1) At 32 generations without improvement, the
worst 50% of runs will be terminated; (2) At 75 generations without
improvement, the worst 50% of runs will be terminated; (3) At 150
generations without improvement, the worst 50% of runs will be
terminated; and (4) At 300 generations without improvement, the
worst 50% of runs will be terminated.
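The Fixed Mode criteria above can be summarized in a short sketch. The first function checks the two termination criteria for a single run. The second computes what fraction of runs would reach the full termination criterion under Adaptive Termination, assuming each level removes the stated percentage of the runs that reach it. All names are hypothetical, and the comparison with the limit (greater than or equal) is an assumption.

```python
def should_terminate(generation, last_improvement, criterion, limit):
    """Decide whether a run should stop under a Fixed Mode criterion.

    criterion is "without_improvement" or "since_start" (hypothetical
    names for the two radio buttons described above).
    """
    if criterion == "without_improvement":
        return generation - last_improvement >= limit
    if criterion == "since_start":
        return generation >= limit
    raise ValueError(f"unknown criterion: {criterion}")


def surviving_fraction(levels=4, fraction_terminated=0.5):
    """Fraction of runs left after every adaptive termination level.

    Assumption: each level terminates the same fraction of the runs
    that are still alive when they reach it.
    """
    if not 0.0 <= fraction_terminated <= 1.0:
        raise ValueError("fraction_terminated must be between 0 and 1")
    return (1.0 - fraction_terminated) ** levels


print(should_terminate(700, 100, "without_improvement", 600))
print(surviving_fraction())
```

With the default of 4 levels and 50% at each level, only about one run in sixteen continues all the way to the full criterion under this reading, which is where the time savings come from.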

Setting Run Randomization Targets (the "Set" Button)

During a project, Discipulus will by default randomize parameters from
run to run. You may control the target around which randomization
takes place by clicking the "Set" button on the Advanced Options
Window (see The Single Run Advanced Options Window on page 114
for how to use this window once it is opened).


Setting which Run Parameters Randomize between Runs (the "Randomize" Button)

During a project, Discipulus will by default randomize parameters from
run to run. You may control whether randomization occurs at all and, if
so, you may choose which of those parameters should be randomized by
clicking on the "Randomize" button in the Advanced Options Window
(see The Randomize Parameters Window on page 117 for more
information about how to use this window when it opens).


Discipulus Window Workspaces

Discipulus has eight working windows you will use. They are:

1. The Main Window. See Main Window on page 81

2. The Project Setup Wizard Windows. See The Project Setup

Wizard Windows on page 88

3. The Monitor Project Window. See The Monitor Project Window

on page 88

4. The Reports Window. See The Reports Window on page 94

5. The Data Window. See The Data Window on page 101

6. The Advanced Options Window. See The Advanced Options

Window on page 112

7. The Single Run Options Window. See The Single Run Advanced
Options Window on page 113 and

8. The Interactive Evaluator Window. See The Interactive Evaluator

Window on page 118.

Main Window
When you first start Discipulus, you will see the Main Window. It will
contain the first page of the Project Setup Wizard. If you cancel the
project wizard, the Main Window will appear as shown in Figure 50.


Figure 50. The Main Window before any data is loaded

The Main Window is comprised of:

1. Menus (at the top). See The Main Window Menu Bar on page 82;

2. Toolbar (just below the menu bar). See The Main Window Toolbar
on page 87;

3. Status Bar (at the bottom of the screen). See The Main Window
Status Bar on page 88; and

4. The Project Setup Wizard (not shown above). See The Project
Setup Wizard Windows on page 88.

The Main Window is visible at all times when you are running
Discipulus.

The Main Window Menu Bar

The menu bar is how you access most of Discipulus’ features. This section
briefly summarizes the functionality provided by each main menu
selection. There are nine main menu items. Not all of them are active at
all times; each is active only when it provides useful functionality. They
are described in the following sections:


• File Menu on page 83

• Edit Menu on page 84

• View Menu on page 84

• Set Up Learning Menu on page 84

• Run Menu on page 86

• Interactive Evaluator Menu on page 86

• Registration Menu on page 86

• Window Menu on page 87

• Help Menu on page 87

File Menu

The file menu lets you take the following actions:

• Project Operations. Start a new project file (File, New), open an
existing project file (File, Open), close the current project file (File,
Close), or save the current project file under its existing name or a
new file name (File, Save or Save-as).


• Load New Applied Data. To deploy existing Discipulus models to

new data, you may use this menu item to load new data into the
Applied Data Window.

• Exit Discipulus. To exit Discipulus, click File, Exit.

Edit Menu
The edit menu lets you select and copy data from the Spreadsheet View
in the Data Window. This menu is not active unless the Spreadsheet
View of the Data Window is active.

View Menu
The view menu lets you toggle the Toolbar and the Status Bar on and off.

Set Up Learning Menu

The Set Up Learning Menu has two important sub-menus: The Fitness
Functions sub-menu and the Options sub-menu. They are discussed
separately in the following sections:

• Fitness Sub-Menu on page 84

• Options Sub-Menu on page 85

Fitness Sub-Menu

Using the Fitness Sub-Menu, you may view or change the parameters of
the currently active Discipulus project. This menu is not active until you
have created a project using the Project Wizard. The operation of this
menu varies somewhat depending on whether a project is running or it is
finished.

When a project is running, you may use it to view the parameters of the
project, but not to change them. In that situation, the Set Up Learning
Menu looks like Figure 51:


Figure 51. Set Up Learning, Fitness Function Menu when a Project is Running

This figure shows the menu fully expanded for the ranking fitness
function type. Please note three things:

• The check mark by Classification tells you that the current project is
running a classification fitness function.

• A menu item that is not grayed out (for example, "Regression") tells
you that you could select this menu item and it may be further
expanded for more information.

• A menu item that is grayed out (for example, "Logistic Regression")

tells you that the menu item may not be further expanded for more
information.

When the project is not running, you may use the items in this menu to
view or change the parameters of the project. Again, the check mark
shows you the most recently used fitness function. However, when the
project is not running, you may select any available fitness function and
set its parameters. In that case, you may then start the project over
using the new fitness function.

Options Sub-Menu

The Options sub-menu of the Set Up Learning Menu takes you to the
Advanced Options page, where you set project and run parameters. See
Controlling Discipulus Projects on page 75 and The Advanced Options
Window on page 112.

If a project is running, you may use the Advanced Options page to
review the parameter selections you have made for the project, but you
cannot change them.

If a project is not running, you may review or change the parameters
you have set for the project. If you change the parameters, you may
restart the project and it will run with your new parameter settings.

Run Menu

From this menu, you start, end, and continue projects. In addition, you
may use this menu to jump to the Reports Window.

Interactive Evaluator Menu

From this menu, you may open the Interactive Evaluator Window. When
you do this, the best program of the current project is automatically
loaded into the Interactive Evaluator Window. Interactive Evaluator is
the principal tool you use after a project is finished for analyzing,
graphing, and refining the best programs of the run.

Registration Menu

This menu lets you manage your license for Discipulus. There are four
sub-menus that you may use:

• Enter Activation Code. Use this menu if you have to enter an

activation code for Discipulus. You may encounter the need to do this
when the Demonstration period runs out or when you extend or
upgrade your license. You will receive your Activation Codes by
email or from the "Account" section of our website.

• Buy Add-Ons/Upgrade/Renew. Use this menu to extend your

license, to add features to your license or to upgrade your license
version (for example, upgrading from Professional to Enterprise).

• Purchase. Use this menu to purchase a new license for Discipulus.

• Deactivate Discipulus. Use this menu to move Discipulus to a
different machine. Use this cautiously. Once you have deactivated
Discipulus on a machine, it may not be reinstalled.

Window Menu
From this menu, you may switch back and forth between the various
windows in Discipulus.

Help Menu
This gives you information about how to use Discipulus.

The Main Window Toolbar

The toolbar has seven icons. They are:

This icon creates a new project. It is equivalent to the menu
selection: File, New.

This icon opens an existing project. It is equivalent to the menu
selection: File, Open.

This icon saves the current project. It overwrites any existing
project with the same path and file name. It is equivalent to the
menu selection: File, Save.

This icon copies the currently selected columns in the Spreadsheet
View of the Data Window to the Windows Clipboard. It is equivalent
to the menu selection: Edit, Copy.

This icon starts the currently loaded project running. It is
equivalent to the menu selection: Run, Start Run.

This icon continues a project from where it was halted. It is
equivalent to the menu selection: Run, Continue. If a project is
loaded in Discipulus, it continues that project. If no project is
loaded in Discipulus, this button opens a file browser. When you
designate a project file, it continues that project from where it
left off.

This icon halts the current project. It is equivalent to the menu
selection: Run, Finish.

The Main Window Status Bar

The status bar runs across the bottom of your screen. It provides
information about what processes Discipulus is performing, such as:
"Running Engine" or "Calculating Teams."

The Project Setup Wizard Windows

Every Discipulus project begins with the Project Setup Wizard. You may
get to the Project Setup Wizard in three ways:

1. It appears each time you start Discipulus;

2. It appears each time you click File, New; or

3. It appears each time you click the New File icon in the toolbar.

For a detailed description of how the Project Setup Wizard works, see
How Do I Start a Project? on page 34.

The Monitor Project Window

The Monitor Project Window is the principal source of information you
will want while Discipulus is running a project. It has four tabs. They
are described in the following topics:

1. The Overview Tab on page 89;

2. The Project History Tab on page 90;

3. The Project Detail Tab on page 91;

4. The Current Run Tab on page 93.

The Overview Tab

The Overview Tab gives a summary of the current project. Figure 52
shows this tab while Discipulus is running a classification problem.

Figure 52. The Overview Tab of the Monitor Project Window for
a Classification Problem

The Best Program and Best Team boxes show the program and the team
that perform best on the combined training and validation data for the
project thus far.

The Best Program and Best Team boxes provide appropriate statistics
for your problem type. For example, while Figure 52 shows the
Overview Tab for a classification problem, Figure 53 shows the same
tab for a ranking (ROC curve) problem. Note that the ranking
problem displays a completely different set of statistics for the best
program and the best team because fitness is computed differently for
the ranking and classification problem types.

Figure 53. The Overview Tab of the Monitor Project Window for
a Ranking (ROC Curve) Problem

Regardless of problem type, the Project Status box in the Overview Tab
shows how many runs have been performed, how many programs have
been evaluated for fitness, the time elapsed and the current step of the
current project.

The Project History Tab

The Project History Tab shows, in graphic form, the error rate for the
best program and the best team found during the project.

Figure 54. The Project History Tab of the Monitor Project Window

The red line shows the performance of the best program as more runs
are completed in the project. The green line shows the performance of
the best team as the project proceeds. In both cases, the number
reported is error; so lower is better.
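
A best-so-far curve of this kind can be reproduced from a list of per-run errors with a running minimum. The following Python sketch is illustrative only: the function name `best_so_far` and the sample error values are hypothetical and are not produced by Discipulus.

```python
from itertools import accumulate
from typing import List


def best_so_far(run_errors: List[float]) -> List[float]:
    """Return the lowest error seen up to and including each run.

    The values are errors, so lower is better and the curve never rises.
    """
    return list(accumulate(run_errors, min))


# Hypothetical best-program error at the end of each of five runs.
errors_per_run = [0.30, 0.25, 0.28, 0.20, 0.22]
print(best_so_far(errors_per_run))  # [0.3, 0.25, 0.25, 0.2, 0.2]
```

A run that does worse than an earlier run (the third and fifth runs here) leaves the curve flat, which is why the plotted lines only step downward.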

The Project Detail Tab

Typically, very short runs in a project do not find the best program
possible. It is necessary to perform longer runs to find the best possible
programs.

The Project Detail Tab shows the value of the best program found as
individual runs in a project get longer and longer. So it gives you
information about how long runs should be to find the best program for
your problem.

You can sort the best program values in the detail tab in two ways:
either you can view the information by how many generations a run has
been going without improving (Generations Without Improvement) or
you can view it by how many generations a run has been going since the
start of the run (Generations Since Start).

Figure 55. The Project Detail Tab

For the current project, Figure 55 shows the following:

• Only one run in the project has gone as long as 160 generations
without improvement. This is a regression problem, so fitness is mean
squared error; the best fitness over all programs was 0.055817.

• Twenty-one runs in the project went as long as 40 generations
without improvement. At that point, the project had also found a
program with a fitness of 0.055817.

• At 30 generations without improvement, no run had produced a best
program with a fitness as good as 0.055817.

So we can tentatively conclude that the termination criterion for runs
should be at least 40 generations without improvement.
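
The reasoning above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a Discipulus feature: the function name `shortest_sufficient_patience` and the table layout are hypothetical, and the fitness value on the 30-generation row is invented for the example (the text only says it is worse than 0.055817).

```python
from typing import List, Tuple


def shortest_sufficient_patience(rows: List[Tuple[int, float]]) -> int:
    """Return the smallest 'generations without improvement' value at which
    the best fitness (an error, so lower is better) already equals the best
    fitness found anywhere in the table."""
    best_overall = min(fitness for _, fitness in rows)
    return min(gens for gens, fitness in rows if fitness == best_overall)


# (generations without improvement, best fitness found by that point)
detail_rows = [
    (30, 0.056100),   # hypothetical value, only known to be worse than 0.055817
    (40, 0.055817),   # from the example above
    (160, 0.055817),  # from the example above
]

print(shortest_sufficient_patience(detail_rows))  # prints 40
```

The result is a lower bound on the termination criterion, which matches the tentative wording above: more runs could still reveal a better program at a longer setting.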

The column labeled Best Five Average shows the average of the fitness
of the best five programs in the project.

Note that the Project Detail Tab only shows information where there has
been a change in the Best Program or Best Five Average columns.

Finally, in the combo box at the bottom of the Project Detail Tab, you
can select between displaying information based on Generations Since
Start (shown) and Generations Without Improvement.

The Current Run Tab

Discipulus projects are normally made up of many individual Genetic
Programming runs. The Current Run Tab shows information about the
performance and parameters being used in the current run.

Figure 56. The Current Run Tab

The Status Box shows how long the current run has been going.

The Performance box shows the fitness of the best program in the
current run to date.

The Selected Parameters box shows selected parameters used in the
current run. You will notice that these parameters often change from
run to run. This occurs when the current project is set to randomize and
optimize the parameters between runs. Randomization and optimization
of parameters between runs is the default setting for all new Discipulus
projects.

The Reports Window

After you finish a project, Discipulus automatically creates
comprehensive reports (the Reports Window) that let you select and
analyze the 30 best program models. The Reports Window appears after
a project has been finished. It gives detailed information about:

• The thirty Best Programs found during the project;

• The five Best Teams found during the project; and

• The relative importance of the inputs in your data files.

The following three topics are treated below:

1. The Best Programs Tab on page 94;

2. The Team Solutions Tab on page 96; and

3. The Input Impacts Tab on page 99.

The Best Programs Tab

The Best Programs Tab shows performance detail of the thirty best
programs found during a project. Figure 57 shows the Best Programs
Tab after a project is complete for a classification problem.

Figure 57. The Best Programs Tab for a Classification Problem

The Best Programs Tab is the starting point for analysis, graphing,
saving, simplification and editing of your best programs. Detailed
information about how to save, graph, analyze, simplify and edit your
best programs may be found in: (1) How Do I Graph and Analyze the
Outputs of a Selected Best Program Created by Discipulus? on page 57;
(2) How Do I View an Evolved Best Program Created by Discipulus? on
page 48; and (3) How Do I Save an Evolved Best Program Created by
Discipulus? on page 50.

Each line in the Best Programs Tab represents one of the thirty best
programs of the project. When you first open the window, the best fit
program is in the top row.

You may select any of the best programs by clicking on that program in
the "Hit-Rate" column. In Figure 57, the third best program has been
selected.

You may sort the best programs by clicking at the top of a column.
Doing so sorts the best programs by the entries in that column.

You may display the best programs’ statistics on various data sets (e.g.,
training, validation, training & validation, and applied data) by making
the appropriate selection in the "Statistic Displayed" Box. Note that the
statistics displayed vary by problem type (regression, classification,
ranking, and logistic regression).

In addition to seeing the performance information for the best thirty
programs, you may also perform the following steps from this tab:

• View detailed graphics and statistical analysis of any of these thirty

programs in the Solution Analytics application. See: Viewing
Graphic Analytics of the Output of a Selected Best Team Model in
Solution Analytics on page 65; or

• View the outputs of any of these thirty programs graphically.
To do so, click on the View Results Button. The Data Window will
open up and the outputs of the selected program will be displayed in
the Data Window as "Selected Program." See: Use the Data Window
to Graph the Predicted Outputs of Your Best Program vs. the Target
Outputs for the Best Program on page 63; or

• View the numeric outputs of any of these thirty programs. To do so,
click on the View Results Button. The Data Window will open up
and the outputs of the selected program will be displayed in the Data
Window as "Selected Program." See: Use the Data Window to View the
Numeric Predicted Outputs of Your Best Program on page 64; or

• View and save the code created by Discipulus for any of these thirty
best programs (click on the Analyze Program Button). See: Use
Interactive Evaluator to View Code, Simplify, Edit, and Optimize a
Best Program on page 60; or

• Move any of these thirty programs to the Interactive Evaluator

Window for editing, optimization and simplification (click on the
Analyze Program Button). See: Use Interactive Evaluator to View
Code, Simplify, Edit, and Optimize a Best Program on page 60.

The Team Solutions Tab

After you finish a project, Discipulus automatically creates
comprehensive reports (the Reports Window) that let you select and
analyze the 30 best program models. One of the report tabs is the Team
Solutions Tab.

Discipulus assembles team solutions from the best programs in a
project. Team solutions combine individual programs from the project
into an ensemble solution. The output is the joint output of all programs
in the team. These team solutions frequently perform better than
individual program solutions. The Team Solutions Tab shows the best
team solutions found during a project.

Figure 58. The Team Solutions Tab Showing the Training &
Validation Data Performance of the Five-Program
Team on a Classification Problem

The Team Solutions Tab operates as follows:

1. The top panel of the Team Solutions Tab shows summary
performance data for the five best teams from the project,
specifically the best team of each size: one program, three programs,
five programs, seven programs, and nine programs. Each row in
the top panel of the Team Solutions Tab represents one team.

2. You may select which data set the performance data applies to by
making an appropriate selection in the "Data Set Used" Box.

3. You may select which team to view by clicking on the line

representing the team in the top panel in the "Team Size" column.

4. When you select a team in the top panel, detailed information
about that team appears in the bottom panel for classification
problems (the bottom panel is empty for regression problems).

5. For classification problems, teams make the classification by
having the team members vote. Each line in the lower panel of the
Team Solutions Tab summarizes the performance of that team
when the vote is as set forth in the vote column. You may
ascertain the following information from the first line in the lower
panel of Figure 58:

* For 31.84% of the rows in your data, the best five-member team
voted 0:5; that is, zero votes for class one and five votes for class
zero. In calculating this vote, each program in the five-program
team has one vote. So in this case, a 0:5 vote means that all five
programs voted for class zero on 31.84% of the rows.

* The accuracy of 0:5 votes was about 94.76% on the selected data
set.

6. From the Team Solutions Tab, you can take the following steps:

* Perform detailed graphic analytics of the outputs of any of the best

teams. To do so, click the "Start Solution Analytics" tab and the
Solution Analytics application will open with the selected best
team automatically loaded. See: Viewing Graphic Analytics of the
Output of a Selected Best Team Model in Solution Analytics on
page 65

* View the numeric output of the selected team for all data
examples. To do so, click on the View Results Button. The Data
Window will open up and the outputs of the selected team will
be displayed in the Data Window as "Selected Program." When you
open the Data Window, if you see a graph, click the "Spreadsheet"
radio button in the lower left hand corner to see a spreadsheet
view of these data.

* View the empirically measured probability that a particular
prediction for a particular data point by this best team is correct.
To do so, click on the View Results Button. That probability now
appears in the "Selected Program, Probability of Class 1" column
in the Data Window. When you open the Data Window, if you see a
graph, click the "Spreadsheet" radio button in the lower left hand
corner to see a spreadsheet view of these data.

* View the code of the selected team by clicking on the
View Code Button. From the window that pops up, you can save
that code as a C, C Sharp, Delphi, Inline Assembler, or Java
function.
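
The team voting described in step 5 can be sketched in Python as follows. This is a minimal illustration, not the Discipulus implementation: the function name `vote_summary` and the sample votes and targets are hypothetical. Each program casts one vote per row, rows are grouped by vote pattern (written as votes for class one, then votes for class zero, as in "0:5"), and for each pattern the sketch reports the share of rows and the accuracy of the majority decision.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def vote_summary(votes: List[List[int]],
                 targets: List[int]) -> Dict[str, Tuple[float, float]]:
    """Group rows by vote pattern 'ones:zeros' and return, for each pattern,
    (share of rows with that pattern, accuracy of the majority vote)."""
    counts: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)
    for row_votes, target in zip(votes, targets):
        ones = sum(row_votes)              # programs voting for class one
        zeros = len(row_votes) - ones      # programs voting for class zero
        pattern = f"{ones}:{zeros}"
        predicted = 1 if ones > zeros else 0  # odd team sizes cannot tie
        counts[pattern] += 1
        correct[pattern] += int(predicted == target)
    total = len(targets)
    return {p: (counts[p] / total, correct[p] / counts[p]) for p in counts}


# Hypothetical votes of a five-program team on four rows of data.
team_votes = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1],
]
true_classes = [0, 1, 1, 1]

summary = vote_summary(team_votes, true_classes)
print(summary["0:5"])  # (0.5, 0.5): half the rows, half of those correct
```

Team sizes of one, three, five, seven, and nine are all odd, so a two-class vote always has a strict majority.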

The Input Impacts Tab

The Input Impacts Tab contains information about how important the
inputs in your data file were in creating the thirty best programs in your
project. When the Input Impacts Tab first opens, it looks like Figure 59.

Figure 59. Input Impacts Tab when it First Opens

Note that the Average Impact and Maximum Impact columns are empty. To
compute those columns, click on the "Calculate Impacts" button. This
can take a while as it is very computationally intensive. When it
finishes, the Input Impacts Tab looks like Figure 60.

Figure 60. The Input Impacts Tab after Computing the Average
and Maximum Columns

The Input Impacts Tab operates as follows:

1. Each line in the tab represents a different input variable from your
data set. Put another way, each line represents a single column in
your input data. If you imported your data from Notitia, and there
were column names, the column names you imported appear here.
If, on the other hand, you imported text files directly into
Discipulus, Input001 represents the first column in your text file,
Input002 the second, and so forth.

2. You may sort the data in this tab by clicking at the top of the
column you wish to sort by.

3. The Frequency Column shows what percentage of the best thirty

programs from the project contained the referenced input.

4. The Average Impact Column shows the average effect of

removing that input from the thirty best programs of the project
and replacing it with a permuted version of that input. The greater
the value, the more impact removal had.

5. The Maximum Impact Column shows the maximum effect of

removing that input from each of the thirty best programs and
replacing it with a permuted version of that input. The greater the
value, the higher the maximum impact of removal.
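
The idea of replacing an input with a permuted version of itself can be sketched in Python. This is a generic permutation-impact illustration, not the Discipulus calculation: the text does not say how "effect" is measured, so the sketch assumes it is the increase in mean squared error, and every name here (`permutation_impact`, `impact_summary`, the sample programs and data) is hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Program = Callable[[Sequence[float]], float]


def mean_squared_error(program: Program, rows: List[List[float]],
                       targets: List[float]) -> float:
    """Average squared difference between program outputs and targets."""
    return sum((program(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_impact(program: Program, rows: List[List[float]],
                       targets: List[float], column: int, seed: int = 0) -> float:
    """Increase in error after shuffling one input column across rows."""
    rng = random.Random(seed)
    shuffled = [r[column] for r in rows]
    rng.shuffle(shuffled)
    permuted = [r[:column] + [v] + r[column + 1:] for r, v in zip(rows, shuffled)]
    return (mean_squared_error(program, permuted, targets)
            - mean_squared_error(program, rows, targets))


def impact_summary(programs: List[Program], rows: List[List[float]],
                   targets: List[float], column: int) -> Tuple[float, float]:
    """Return (average impact, maximum impact) over a set of programs."""
    impacts = [permutation_impact(p, rows, targets, column) for p in programs]
    return sum(impacts) / len(impacts), max(impacts)


# Hypothetical data: the target depends only on the first input.
rows = [[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [4.0, 1.0]]
targets = [2.0 * r[0] for r in rows]
programs = [lambda r: 2.0 * r[0], lambda r: 2.0 * r[0] + 0.0 * r[1]]

print(impact_summary(programs, rows, targets, column=1))  # (0.0, 0.0)
```

Shuffling the second input changes nothing because neither program depends on it, so both impacts are zero. Shuffling the first input would raise the error, giving it a larger impact.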

The Data Window

The Data Window displays up to four different types of information in
either chart or spreadsheet formats. The information displayed changes
depending on your problem type. It also lets you save inputs to and
outputs from your evolved programs.

The information displayed by the Data Window falls into four categories:

• Indices for your rows of data. You only see this information if you
have used the Notitia application to import your data to Discipulus;

• The inputs from the training, validation and applied data files;

• The Target Output from your training, validation, and applied data
files;

• Various predicted outputs from best programs and best teams evolved
by Discipulus. What outputs the Data Window shows depends on
your problem type as shown in the following table.

The predicted outputs in the Data Window are a particularly useful part
of Discipulus. Each output prediction is made for every row of data in
three columns: (1) A column for the Best Program of the project
selected by Discipulus; (2) A column for the Best Team of the project
selected by Discipulus; and (3) A column for evolved programs and
teams selected by you called the "Selected Program" column.

In brief summary, Table 2 shows what types of predicted output

columns are available in the Data Window for each of the problem types
supported by Discipulus.

Table 2. Predicted Outputs in the Data Window by Problem Type

In addition, the Data Window lets you save the predicted outputs from
Discipulus created evolved programs and teams for use outside
Discipulus.

The following subjects are addressed in this section:

• Opening the Data Window on page 103

• The Data Window in Chart View on page 103

• The Data Window in Spreadsheet View on page 104

• Switching between Chart View and Spreadsheet View in the Data

Window on page 106

• Best Program, Best Team, and Selected Program Output Columns in

Spreadsheet View of Data Window on page 106

• The Three "Probability of Class One" Columns in Spreadsheet View

for Classification and Regression Problems on page 106

• The Three "Ranking" Columns in Spreadsheet View for Ranking and

Logistic Problems on page 107

• Saving Data to File from the Spreadsheet View on page 108

• Copying Data from the Spreadsheet View on page 109

• Making Chart View More Useful by Sorting Your Training and

Validation Files on page 109

• Switching between Training, Validation, and Applied Data in the

Data Window on page 109

• Turning on Continuous Display in the Data Window on page 109

• Refreshing the Data Window Manually on page 110

• Controlling the Display of the Outputs of the Three Best Evolved

Programs in the Data Window on page 110

• Controlling which Inputs the Data Window Chart Displays on

page 111

• Excluding Inputs from a Project on page 111.

Opening the Data Window

You may open the Data Window in three different ways:

• On the Window menu, click Data; or

• The Data Window is always open in Discipulus. You can just find it
on your screen and click on it; or

• Click the "View Results" button in either the Interactive Evaluator
window or in the Team Solutions Tab of the Reports Window. The Data
Window will automatically open and the outputs of the selected best
program or best team will be automatically loaded into the "Selected
Program" column or columns in the Data Window.

The Data Window in Chart View

The Data Window displays your data as a chart or as a spreadsheet,
depending on the settings you choose. In Chart View, the Data Window
looks like Figure 61:

Figure 61. The Data Window in Chart View

The functionality of the various chart view sections is highlighted and

described in magenta.

The Data Window in Spreadsheet View

The Data Window in Spreadsheet View shows the same information as
the chart view, except that it does so in a spreadsheet format. Figure 62
shows a simple two input regression problem where the data was
imported to Discipulus using the Notitia Data Preparation application.
The user’s column names for the inputs (independent variables) are
"In0" and "In1" (highlighted in green). The various portions of the
spreadsheet view that are different than the chart view are highlighted
and annotated.

Figure 62. The Data Window in Spreadsheet View on a Two-Input Regression Problem

The RI and ID columns (highlighted in magenta) only appear if you
imported a data file using the Notitia data preparation application. That
lets you label each row with unique indices.

The Target Output column (highlighted in blue) is your dependent

variable that Discipulus tries to predict.

The three columns highlighted in brown contain the actual output of

different programs and teams evolved by Discipulus. These columns
appear in the Data Window for all problem types and may be described
as follows:

* The Best Program Output column contains the evolved program

output for the Best Program of the project, as selected by
Discipulus.

* The Best Team Output column contains the evolved team output
for the Best Team of the project, as selected by Discipulus.

* The Selected Program Output column contains the evolved
output for a program or team that you select by clicking on one of
the "View Result" buttons that appear elsewhere in Discipulus.
That sends the selected program or team output to the Data
Window in this column.

The "Save" button (highlighted in red) lets you save data from the
current tab in the Data Window to text files.

Switching between Chart View and Spreadsheet View in the Data

Window
You may move back and forth between the Chart View (Figure 61) and
the Spreadsheet View (Figure 62) as follows:

• To select Chart View, click Chart in the lower part of the Data
Window;

• To select Spreadsheet View, click Spreadsheet in the lower part of

the Data Window.

Best Program, Best Team, and Selected Program Output

Columns in Spreadsheet View of Data Window
The three columns highlighted in brown contain the actual output of
different programs and teams evolved by Discipulus. These columns
appear in the Data Window for all problem types and may be described
as follows:

* The "Best Program Output" column contains the evolved program

output for the Best Program of the project, as selected by
Discipulus.

* The "Best Team Output" column contains the evolved team output
for the Best Team of the project, as selected by Discipulus.

* The "Selected Program Output" column contains the evolved

output for a program or team that you select by clicking on one of
the "View Result" buttons that appear elsewhere in Discipulus.
That sends the selected program or team output to the Data
Window in this column.

The Three "Probability of Class One" Columns in Spreadsheet

View for Classification and Regression Problems
For classification and logistic regression problem types, three additional
columns appear in the spreadsheet view of the Data Window. They are:

* The "Best Program Probability of Class One" column contains the

probability that the row containing it is a member of Class One,
given the outputs of the Best Program of the project, as selected by
Discipulus.

* The "Best Team Probability of Class One" column contains the
probability that the row containing it is a member of
Class One, given the outputs of the Best Team of the project, as
selected by Discipulus.

* The "Selected Program Probability of Class One" column contains

the predicted probability that the row containing it is a member of
Class One, given the outputs of a program or team that you have
selected by clicking on one of the "View Result" buttons that
appear elsewhere in Discipulus.

The Three "Ranking" Columns in Spreadsheet View for Ranking

and Logistic Problems
Ranking Problems have three additional columns in the data window.
Just those columns are shown below:

The meaning of the Best Program, Best Team, and Selected Program
columns has been explained elsewhere.

The ranking columns are computed by taking the predicted outputs of
the respective best program, best team, and selected program over all
training and validation data and ranking them. The value shown on
each row is the rank of the output for that row across that entire
data set, normalized to the zero to one range. It is possible that
applied outputs fall outside that range. They are displayed with a value
above one or below zero, as appropriate.
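
A normalized ranking of this kind can be sketched in Python. The sketch covers only ranking within a single data set. The function name `normalized_ranks`, the tie handling, and the division by the number of rows minus one are assumptions made for illustration, and the way Discipulus places applied outputs above one or below zero is not modeled.

```python
from bisect import bisect_left
from typing import List


def normalized_ranks(outputs: List[float]) -> List[float]:
    """Rank each output within the list and scale the ranks to [0, 1].

    With no ties, the lowest output gets 0.0 and the highest gets 1.0.
    Tied outputs share the lowest rank among them (an assumption).
    """
    if len(outputs) < 2:
        return [0.0 for _ in outputs]
    ordered = sorted(outputs)
    top = len(outputs) - 1
    return [bisect_left(ordered, value) / top for value in outputs]


# Hypothetical predicted outputs over training and validation rows.
print(normalized_ranks([0.2, 0.9, 0.5]))  # [0.0, 1.0, 0.5]
```

Only the order of the outputs matters here, so rescaling the raw outputs by any increasing function leaves the ranking columns unchanged.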

Saving Data to File from the Spreadsheet View

To save data in the Data Window to file, select the data file (training,
validation, or applied) and the columns therein you wish to save. Then
click the "Save" button highlighted in red on Figure 62. When you do,
the following window appears:

The following options may be selected:

• Save all Files at Once. This option lets you save training, validation
and applied data with a single click to the "Folder" and "File Name"
you designate.

• The Training, Validation, and Applied check boxes let you save
each data file to a separately designated file and location.

• Save Column Titles will make the first row of the output file the
column titles shown in the spreadsheet view of the Data Window.
Unchecked, this option saves just the numeric data.

• Append to file adds the data you are about to save to an existing file.

• Include Indices will include the RI (Row Index) and ID (User Index)
columns if they exist.

• Save Selected Columns means that only the columns you selected
before opening this window will be saved.

• All means that all columns in the spreadsheet view will be saved.

• Save Outputs means that just the target and predicted output columns
will be saved.

Copying Data from the Spreadsheet View

Select the columns you want to copy in the spreadsheet view. From the
main menu, click Edit, Copy. The columns have been copied to the
Windows clipboard.

Making Chart View More Useful by Sorting Your Training and

Validation Files
For most training and validation data sets, it is possible to sort the
training and validation examples so that the output appears as a
continuous line in the Chart View. How to do that depends on your data.
But you will find that it is much easier to view how well the best
evolved program is matching the Target Output (i.e. the output column
of your training and validation files) if you do this.

Learning in Discipulus is completely independent of the order of the

training and validation examples in your data file. So you may sort your
data in any way you choose without concern that it will hurt the learning
capabilities of Discipulus.
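
One simple ordering that often gives a continuous line is to sort the rows by the Target Output before importing them. The Python sketch below assumes, purely for illustration, that each row is a list of numbers with the target output in the last column. The function name `sort_rows_by_target` and the sample rows are hypothetical, and other orderings (for example, by time) may suit your data better.

```python
from typing import List


def sort_rows_by_target(rows: List[List[float]],
                        target_index: int = -1) -> List[List[float]]:
    """Return a copy of the rows sorted by the target output column."""
    return sorted(rows, key=lambda row: row[target_index])


# Hypothetical rows: two inputs followed by the target output.
data = [[1.0, 4.0, 0.9], [2.0, 5.0, 0.1], [3.0, 6.0, 0.5]]
print(sort_rows_by_target(data))
# [[2.0, 5.0, 0.1], [3.0, 6.0, 0.5], [1.0, 4.0, 0.9]]
```

Each row moves as a unit, so the inputs stay paired with their target output.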

Switching between Training, Validation, and Applied Data in the

Data Window
You may shift back and forth between viewing the training data, the
validation data, and the applied data by selecting the appropriately
named tab in the Data Window. These tabs are highlighted in magenta
in Figure 61.

Turning on Continuous Display in the Data Window

If the Refresh After Every Run box is selected, then the various data
displays will be updated at the end of each run during your project. This
will cause projects with very large data sets to run considerably slower.

Refreshing the Data Window Manually

The Refresh button updates the various data displays in the Data
Window after you click it.

Controlling the Display of the Outputs of the Three Best Evolved

Programs in the Data Window
Discipulus displays predicted outputs of three evolved programs during
every project. For Ranking, Logistic Regression, or Classification
problem types, it will display additional predicted outputs, as described
above. During and after a project, you may look at the outputs of all
three of these programs in the Data Window.

Here is how you may monitor the various programs and teams from the
Data Window:

1. The best program of the entire project is displayed as "Best
Program" in the Data Window;

2. The best team solution of the entire project is displayed as
"Best Team" in the Data Window;

3. While a project is running, the Data Window displays the best
program of the project as the "Selected Program";

4. After a project is finished only: when you select a program in
the Best Programs Tab of the Reports Window (see The Best
Programs Tab on page 94) and click the View Results Button,
Discipulus displays the output of the program you selected as the
"Selected Program" in the Data Window;

5. After a project is finished only: when you select a team in the
Team Solutions Tab of the Reports Window (see The Team
Solutions Tab on page 96) and click the View Results Button,
Discipulus displays the output of the team you selected as the
"Selected Program" in the Data Window;

6. After a project is finished only: when you click on View Results
in the Interactive Evaluator (see Viewing the Outputs of a Program
from Interactive Evaluator on page 130), Discipulus displays the
outputs of the program in the program box of Interactive Evaluator
as the "Selected Program" in the Data Window.

The Chart Selection Box lets you control which of these outputs are
displayed in the Chart View of the Data Window. Figure 61 shows the
entire Data Window including the Chart Selection Box. Figure 63 is a
close-up of the Chart Selection Box:

Figure 63. The Chart Selection Box in the Data Window

You turn different evolved programs’ outputs on and off by checking
and unchecking the boxes in the Chart Selection Box.

Controlling which Inputs the Data Window Chart Displays

The Chart Selection Box also lets you control which inputs from the
training and validation data files are displayed in the Chart View. By
default, all inputs are turned off in the Chart View. You may cause one
or more of the inputs to be displayed as follows:

• Make sure you are in the Chart View of the Data Window (from the
Window Menu, select Data; then check the Chart Button at the
bottom of the screen);

• Scroll to the top of the Chart Selection Box (see Figure 63);

• Click on the box labeled V0 at the top of the Chart Selection Box.

You will see a new line appear on the chart. This new line shows you
the values of the first input variable from your data files, which
Discipulus calls V0. You may repeat this step for your other inputs, if
there are other inputs in your data files.

Excluding Inputs from a Project

Make sure you are in the Chart View of the Data Window. Double click
on the input column you do not wish to use in your project. It will
display "Disabled" at the top of the column. To reverse this, just double
click on the same column again--the "Disabled" label will disappear.

The Advanced Options Window

The Advanced Options Window lets you set project level parameters. A
Discipulus project consists of up to hundreds of runs. In Discipulus
projects, Discipulus creates one run after another by randomly choosing
different parameter settings for each run. Normally, Discipulus projects
randomize the individual run parameters around a set of target
parameters for single runs.

You may control the project level parameters and many of those single
run target parameters from the Advanced Options Window.

Figure 64. The Advanced Options Window

You may open the Advanced Options window in one of two ways:

• From the Set Up Learning menu, choose Options; or

• From the Discipulus Project Setup Wizard, select Options. See Using
the Project Setup Wizard on page 36.

You can accomplish the following in the Advanced Options Window:

1. Choose between Stepping Mode and Fixed Mode for your project
(these modes determine how Discipulus sets the duration of each
run in your project);

2. Choose detail parameters that control the duration of each run in
your project;

3. Limit the total number of runs in your project;

4. Determine which output files to generate while the project is
running; and

5. Set the target parameters for the individual runs and determine
whether each run’s parameters will be randomized.

For more information about setting project level parameters, see
Controlling Discipulus Projects on page 75.

For more information about setting the target parameters for single runs
in a project, see The Single Run Advanced Options Window on
page 113.

For more information about randomizing parameters during a project,
see The Randomize Parameters Window on page 117.

The Single Run Advanced Options Window

The Discipulus parameters that apply to a single run appear in the Single
Run Advanced Options Window. The following topics are treated here:

• How to Open the Single Run Advanced Options Window on page 113;

• How the Single Run Parameters Affect a Project on page 114; and

• How to Use the Single Run Advanced Options Window on page 115.

How to Open the Single Run Advanced Options Window

To get to the Single Run Advanced Options Window, make the following
menu selections:

• From the Set Up Learning Menu, select Options. The Advanced
Options Window will pop up.

• On the Advanced Options Window, click the "Set" button in the
Single Run Parameters panel.

The Single Run Advanced Options Window will pop up. It looks like
Figure 65:

Figure 65. The Single Run Advanced Options Window

To navigate around the Single Run Advanced Options Window, just
select the appropriately labeled tab and access the features you want.

How the Single Run Parameters Affect a Project

Usually, a project includes many individual runs. How the single run
parameters affect a project depends on whether you choose to
randomize parameters (see The Randomize Parameters Window on
page 117). This works as follows:

• If you randomize any single run level parameters, then Discipulus
will create runs in that project that are randomized around the single
run parameter settings in the Single Run Advanced Options Window.
So, for example, if the population size in the Single Run Advanced
Options Window is 500 and if population size is set to be randomized
in the Advanced Options Window, then a project will consist of many
runs that are randomized around population size 500.

• If you do not randomize any run level parameters, then the resultant
Discipulus project will consist of many runs using the exact
parameters set in the Single Run Advanced Options Window.

See The Randomize Parameters Window on page 117 for how to
randomize run level parameters during a project.
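
The two cases above can be pictured with a short sketch. This manual does not state how far or with what distribution Discipulus randomizes a parameter around its target, so the uniform draw within plus or minus 50% of the target used here is purely an assumption for illustration, as are the function name and the parameter names.

```python
import random


def make_run_parameters(targets, randomized, spread=0.5, rng=None):
    """Build the parameters for one run from the single run target values.

    Parameters named in `randomized` are drawn uniformly within
    +/- `spread` of their target (an assumed scheme, not the one
    Discipulus necessarily uses); all others keep the exact target.
    """
    if rng is None:
        rng = random.Random()
    params = {}
    for name, target in targets.items():
        if name in randomized:
            low, high = target * (1 - spread), target * (1 + spread)
            params[name] = round(rng.uniform(low, high))
        else:
            params[name] = target
    return params


# Hypothetical target values for a single run.
targets = {"population_size": 500, "max_program_size": 256}

rng = random.Random(1)
randomized_runs = [
    make_run_parameters(targets, {"population_size"}, rng=rng) for _ in range(5)
]
exact_runs = [make_run_parameters(targets, set(), rng=rng) for _ in range(5)]
```

In `randomized_runs` only the population size varies from run to run, while every entry of `exact_runs` repeats the target values unchanged.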

How to Use the Single Run Advanced Options Window

The Single Run Advanced Options Window has seven tabs that
group the parameters into workable sets. Detailed information
about the parameters contained in the Single Run Advanced Options
Window may be found in the following locations:

Run Control Parameters:

• Setting the Random Seed on page 171

Genetic Programming Parameters

• Accessing Genetic Programming Parameters on page 150

• Basic Genetic Programming Parameters on page 151

• Advanced Genetic Programming Deme Parameters on page 153 and

• Advanced Crossover and Mutation on page 165

Fitness Function Parameters

• Accessing Fitness Measurement Parameters after the Project Wizard
is Complete on page 176

• For Regression Problem Types

* Fitness Measures for Regression Problems on page 177

• For Classification Problem Types

* How the Hits-then-Error Fitness Function Works on page 181

• For Ranking Problem Types

* The Four Ranking Fitness Functions on page 184

* Best ROC Curve (Compare) Fitness Function for Ranking
Problems on page 184

* Best ROC Curve then Cost Fitness Function for Ranking Problems
on page 185

* Minimum Cost Fitness Function for Ranking Problems on
page 185

• For Logistic Regression Problem Types

* Fitness Measure for Logistic-Regression Binary Target Output
Problems on page 186

• For Custom Fitness Functions

* Custom Fitness Functions on page 187

• Other Useful Fitness Function Parameters

* Dynamic Subset Selection on page 158

* Parsimony Pressure on page 162

The Function and Terminal Set Parameters

• Choosing the Terminal Set on page 192

• Weighting the Terminal Set on page 196

• Choosing the Function Set on page 197

• Weighting the Function Set on page 198 and

• Program Size on page 169

The Randomize Parameters Window

The Randomize Parameters Window lets you determine which, if any,
run level parameters a project should randomize. You get to this
window from the Advanced Options Window, by clicking on the button
labeled "Randomize."

Figure 66. The Randomize Parameters Window

Check each parameter you want to have randomized. Discipulus will use
the target value set in the Single Run Advanced Options page for that
parameter (see The Single Run Advanced Options Window on page 113)
and will randomize run parameters during your project around that
target value.

By default, the random seed for parameter randomization is set from the
system clock. This is a different random number generator than is used
for the project.
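
The practical effect of keeping two generators can be sketched as follows. This is only an illustration using Python's standard library; the function name and the seed value are hypothetical, and nothing here reflects how Discipulus implements its generators internally.

```python
import random
import time


def project_draws(project_seed, count, parameter_draws):
    """Return `count` draws from a project generator.

    Before drawing, `parameter_draws` values are taken from a separate,
    clock-seeded generator to show that this leaves the project
    sequence undisturbed.
    """
    project_rng = random.Random(project_seed)      # stands in for the project's seeded generator
    parameter_rng = random.Random(time.time_ns())  # stands in for the clock-seeded generator
    for _ in range(parameter_draws):
        parameter_rng.random()
    return [project_rng.random() for _ in range(count)]


# The project sequence is identical however often the other generator is used.
same_sequence = project_draws(171, 3, 0) == project_draws(171, 3, 100)
print(same_sequence)
```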

The Interactive Evaluator Window

Interactive Evaluator is a powerful software engineering tool that goes
to work after a project is over. It lets you look at programs evolved by
Discipulus, edit them, simplify them, optimize them, save and load
them, and then explore the effects of your changes. It has both manual
and automatic features for simplifying your evolved programs.

A Program Queue saves the programs you create in Interactive
Evaluator. You can move back and forth through these programs in the
Queue.

Interactive Evaluator also gives you flexibility in importing and
exporting programs evolved by Discipulus. Any of the programs in the
Best Programs report may be sent to the Interactive Evaluator. You may
save programs you have created in Interactive Evaluator. Finally,
Interactive Evaluator lets you save and reuse evolved programs that you
have changed.

You may also use Interactive Evaluator to test evolved programs on
applied data (see Training, Validation, and Applied Data on page 205) –
that is, data that were not included in the run.

An entire section is devoted to Interactive Evaluator (see Interactive
Evaluator on page 119) and we refer the user to that section for further
information.

Interactive Evaluator
Interactive Evaluator is a powerful software engineering tool that goes
to work after a project is over. It lets you look at evolved programs, edit
them, simplify them, optimize them, and then explore the effects of your
changes. Some of the highlights of the Interactive Evaluator are:

• Manual and automatic editing features let you simplify your evolved
programs (see Editing a Program in Interactive Evaluator on
page 131, Automatic Intron Removal in Interactive Evaluator on
page 144, and Automatic Simplification in Interactive Evaluator on
page 145);

• The Performance Box lets you track the effect of your edits on
fitness, hit-rate and program statistics (The Performance Box in
Interactive Evaluator on page 130);

• A Program Queue saves the programs you create in Interactive
Evaluator (The Initial Interactive Evaluator Program Queue on
page 123). You can move back-and-forth through these programs in
the Queue;

• Disk Operations options give you flexibility in loading and saving
programs evolved by Discipulus. In fact, the programs you and
Discipulus create in Interactive Evaluator may be saved in either a
proprietary *.ind format or as C++, C Sharp, Java, Delphi, or Intel
Inline Assembler files (Saving and Loading Evolved Programs in
Interactive Evaluator on page 127). Programs saved with the *.ind
format may be reloaded into Discipulus Interactive Evaluator at any
later time;

• One click startup of the Solution Analytics application gives you
access to a powerful suite of graphic and analytic tools for the
programs sent to or created in Interactive Evaluator (see Opening
Solution Analytics from Interactive Evaluator on page 122);

• Finally, you may also use Interactive Evaluator to test evolved
programs on applied data (see Training, Validation, and Applied
Data on page 205) – that is, data that were not included in the run.

Of course, your use of these options, like all the other features of
Discipulus, must be consistent with the Discipulus License Agreement
that applies to your version of Discipulus.

The following topics contain additional information about the use of
Interactive Evaluator:

• Opening Interactive Evaluator on page 121;

• The Initial Interactive Evaluator Program Queue on page 123;

• Saving and Loading Evolved Programs in Interactive Evaluator on
page 127;

• Calculating the Fitness of a Program in Interactive Evaluator on
page 129;

• Viewing the Outputs of a Program from Interactive Evaluator on
page 130;

• The Performance Box in Interactive Evaluator on page 130;

• Editing a Program in Interactive Evaluator on page 131;

• Choosing Instructions and Parameters While Editing Programs in
Interactive Evaluator on page 133;

• Optimizing Constants of Evolved Programs in Interactive Evaluator
on page 138;

• Controlling the Speed and Intensity of Constant Optimization in
Interactive Evaluator on page 140;

• Combining Constant Optimization with Manual Program
Simplification in Interactive Evaluator on page 141;

• Automatic Intron Removal in Interactive Evaluator on page 144;

• Automatic Simplification in Interactive Evaluator on page 145; and

• Practice Tips for Interactive Evaluator on page 147.

Opening Interactive Evaluator

You may open Interactive Evaluator in two different ways:

1. Open Interactive Evaluator with no program loaded into it as
follows: On the Interactive Evaluator Menu, click Start. You may
wish to do this if you have saved programs from Interactive
Evaluator that you want to reuse.

2. Open Interactive Evaluator with a program loaded in as follows:
From the Best Programs Tab of the Reports Window, select a
program and click Analyze Program. Interactive Evaluator will
open up and the selected program will be in Interactive Evaluator.

After you open it, Interactive Evaluator looks like Figure 67 for
regression problem types.

Figure 67. The Interactive Evaluator Main Window for a
Regression Problem

On the other hand, Interactive Evaluator looks like Figure 68 for
ranking problem types.

Figure 68. Interactive Evaluator Window for Ranking Problem
Types

You will note that the statistics reported in Interactive Evaluator in the
upper right hand corner vary depending on the problem type.

Opening Solution Analytics from Interactive Evaluator

Open the Solution Analytics application by clicking on the "Start
Solution Analytics" button. Whatever program is active in the Program
Queue and shown in the program body at that time will be loaded
automatically into Solution Analytics for graphing and analysis.

For Regression problem types, Solution Analytics provides:

• Target Output vs. Predicted Output plots and statistics, including Q-Q
plots;

• Residuals plots and analysis, including a normal probability plot.

For Classification problem types, Solution Analytics provides:

• Confusion Matrices

• ROC Curve

• Pseudo-ROC Curve

• Binned Probability Plots

For Ranking and Logistic Regression problem types, Solution
Analytics provides:

• ROC Curve

• Pseudo-ROC Curve

• Binned Probability Plots

Using the Interactive Evaluator Program Queue

The Program Queue is a sequence of programs in Interactive Evaluator
that you may move through, both forward and backward. These
programs enter the Queue: (1) As you load programs into Interactive
Evaluator; and (2) As you create new programs using the Interactive
Evaluator tools.

The following topics contain additional information about the
Interactive Evaluator Program Queue:

• The Initial Interactive Evaluator Program Queue on page 123;

• Moving Around in the Interactive Evaluator Program Queue on
page 124;

• What Happens to the Interactive Evaluator Program Queue When
You Load Programs into Interactive Evaluator? on page 125; and

• What Happens to the Interactive Evaluator Program Queue When
You Make Changes to the Program Displayed in the Program Body
Window on page 125.

The Initial Interactive Evaluator Program Queue

When you first open Interactive Evaluator, the program you transferred
to Interactive Evaluator from the Best Programs Tab of the Reports
Window loads automatically into the Program Body Window. That
program is the only evolved program in the Queue at that time.

Moving Around in the Interactive Evaluator Program Queue

Discipulus stores the evolved programs you load or create while using
Interactive Evaluator in the Program Queue. You can move back-and-
forth among those stored programs using the Back and Forward Queue
buttons (these are the two buttons with right and left facing triangles to
the right of the Program Body Window).

Figure 69 and Figure 70 illustrate how the Back and Forward Queue
buttons change which program from the Queue is displayed in
the Program Body Window. (The shaded boxes show which program
from the Queue is displayed in the Program Body Window.)

Figure 69. The Program Queue, Before and After You Click on
the Back Queue Button

Figure 70. The Program Queue, Before and After You Click on
the Forward Queue Button

As you browse back-and-forth through the Queue, you may notice a
pause before Discipulus displays the program you are browsing to in the
Program Body Window. This is to permit Discipulus to calculate the
fitness (and for classification problems, the hit-rate) for that program. If
you are using a large training set, you may find that the pauses are quite
noticeable because the fitness calculation is time consuming.

You will see the changes in fitness (and hit-rate) among the different
programs in the Queue as you browse through the Queue. These fitness
and hit-rate values appear in the Performance Box (located in the
upper right portion of the Interactive Evaluator Window).

What Happens to the Interactive Evaluator Program Queue When You Load Programs into Interactive Evaluator?
You may load existing programs into Interactive Evaluator. Examples of
loading a program are:

• Transferring a program from the Best Programs Tab of the Reports
Window to Interactive Evaluator.

• Loading a previously saved program.

Regardless of how you load a program, Discipulus adds the new program
to the end of the Queue.

What Happens to the Interactive Evaluator Program Queue When You Make Changes to the Program Displayed in the Program Body Window
As you use the Interactive Evaluator, Discipulus adds programs to and
deletes programs from the Queue. This happens each time you click the
Add, Edit, Remove, Remove Introns, Simplify or Optimize buttons in
Interactive Evaluator.

As you make such changes, one of two types of shifts will occur in the
Queue. Which shift occurs depends on where you are in the Program
Queue when you make the change: Changing a Program from the End of
the Interactive Evaluator Program Queue on page 125 is different from
Changing a Program from the Beginning or Middle of the Interactive
Evaluator Program Queue on page 126.

Changing a Program from the End of the Interactive Evaluator Program Queue
The Queue changes each time you click the Add, Edit, Remove,
Remove Introns, Simplify or Optimize buttons in Interactive Evaluator.

When you make changes to the program that is at the end of the Queue,
the new program(s) you just created are appended to the end of the Queue
and the program that was in the Program Body Window before you
made the changes moves toward the beginning of the Queue.

Figure 71, Figure 72, and Figure 73 illustrate how Discipulus changes
the Queue when you make changes to the program at the end of the
Queue. (The shaded box shows which program from the Queue is
displayed in the Program Body Window.)

Figure 71. The Queue, Just after You Edit, Remove, or Add an
Instruction to the Last Program in the Queue

Figure 72. The Queue, Just after You Optimize the Constants in
the Last Program in the Queue

Figure 73. The Queue, Just after You Simplify the Last Program
in the Queue

Changing a Program from the Beginning or Middle of the Interactive Evaluator Program Queue
When you change a program from the beginning or middle of the
Queue, Discipulus operates slightly differently. Discipulus deletes all
programs that are later in the Queue than the program you changed.
Then it adds the new, changed program(s) to the end of the Queue.

Figure 74 illustrates how this works when you edit a program that is in
the middle of the Queue. (The shaded boxes show which program from
the Queue is displayed in the Program Body Window.)

Figure 74. The Queue, Before and After You Edit a Program
from the Middle of the Queue
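
The Queue rules described above (loading appends to the end, changing the last program appends the result, and changing an earlier program first discards everything after it) behave much like an undo history. The following is a minimal sketch of those rules only; the class and method names are hypothetical, each change is assumed to produce a single new program, and the code is not taken from Discipulus.

```python
class ProgramQueue:
    """Toy model of the Interactive Evaluator Program Queue."""

    def __init__(self):
        self.programs = []
        self.current = -1  # index of the program shown in the Program Body Window

    def shown(self):
        """Return the program currently displayed, or None if the Queue is empty."""
        return self.programs[self.current] if self.programs else None

    def load(self, program):
        """Loading always adds the program to the end of the Queue."""
        self.programs.append(program)
        self.current = len(self.programs) - 1

    def back(self):
        if self.current > 0:
            self.current -= 1

    def forward(self):
        if self.current < len(self.programs) - 1:
            self.current += 1

    def change(self, new_program):
        """Record the result of Add, Edit, Remove, Simplify, and so on.

        Programs later in the Queue than the one being changed are
        deleted (none exist when the change is made at the end), then
        the new program is appended.
        """
        del self.programs[self.current + 1:]
        self.programs.append(new_program)
        self.current = len(self.programs) - 1


queue = ProgramQueue()
queue.load("P1")
queue.change("P2")         # change made at the end: P1, P2
queue.change("P3")         # change made at the end: P1, P2, P3
queue.back()               # now showing P2, in the middle of the Queue
queue.change("P2-edited")  # P3 is deleted, then the new program is appended
print(queue.programs)
```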

Saving and Loading Evolved Programs in Interactive Evaluator
The purpose of the Interactive Evaluator module is to allow you to
evaluate, modify, and simplify existing programs evolved by Discipulus.
From Interactive Evaluator, you may save and load your evolved
programs and your modifications in various ways and in various
formats.

The following topics contain additional information about saving and
loading programs in Interactive Evaluator:

• Saving Evolved Programs from Interactive Evaluator on page 127;
and

• Loading Evolved Programs into Interactive Evaluator on page 128.

Saving Evolved Programs from Interactive Evaluator

Interactive Evaluator provides two ways to save evolved programs,
depending on how you intend to use them later. They are:

• Saving Evolved Programs for Later Use in Interactive Evaluator on
page 128; and

• Saving Evolved C, C Sharp, Java, Delphi, or Assembler Code for
Compilation into other Programs on page 128.

Saving Evolved Programs for Later Use in Interactive Evaluator

In Interactive Evaluator, the evolved program in the Program Body
Window may be saved to hard disk. To do so:

• Click on the Save Program button in Interactive Evaluator;

• Choose where and under what name to save the current
program in the Program Body Window;

• Click OK.

Saved program files carry the "*.ind" extension. You may later reload this
program into Interactive Evaluator. (See Loading a Saved Program on
page 129.)

Saving Evolved C, C Sharp, Java, Delphi, or Assembler Code for Compilation into other Programs
In Interactive Evaluator, evolved programs in the Program Body
Window may be saved as C, C Sharp, Java, Delphi, or inline Assembler
files. To do so:

• Click on the appropriately named button in Interactive Evaluator.
You may choose the location and the name for the file in the dialog
box that appears.

Some notes on the saved source code files:

• The saved program code may be compiled with an appropriate compiler;

• The saved assembler file is Intel 486 assembler. It will not run on
machines that are not compatible with a Pentium Pro or higher chip.

• If you are running your evolved programs on Intel processors,
consider using the assembler version. It is faster and does not involve
rounding errors in making its computations.

Loading Evolved Programs into Interactive Evaluator

When you load a program into Interactive Evaluator, the C Code
representing that program will be displayed in the Program Body
Window. There are two different ways an evolved program may be
loaded into the Interactive Evaluator Module. They are:

• Default Program Load on page 129;

• Loading a Saved Program on page 129.

Default Program Load

When you activate Interactive Evaluator from the Team Solutions Tab or the
Best Programs Tab of the Reports Window, the program selected on that
tab transfers over to Interactive Evaluator.

Loading a Saved Program

Interactive Evaluator lets you save programs you have modified. (See
Saving Evolved Programs from Interactive Evaluator on page 127.) The
programs are saved in a file bearing the *.ind extension. You may load
these saved *.ind files back into Interactive Evaluator as follows:

• Click on the Load Program button in the Interactive Evaluator
Window.

• Browse to and highlight the program you wish to load. It will have
the file extension "*.ind."

• Click OK.

The evolved program you have selected will appear in the Program
Body Window in Interactive Evaluator.

Calculating the Fitness of a Program in Interactive Evaluator
You may determine the fitness of the evolved program shown in the
Program Body Window as follows:

• In Interactive Evaluator, click on the Run button.

Doing this will cause the appropriate fitness figures and statistics to be
displayed in the Performance Box. The figures displayed will depend on
the problem type in your project (see Figure 67).

In addition, the change in fitness and other statistics between the last
time you clicked Run and the most recent time you clicked Run is
shown in parentheses. This change appears in red if the fitness
(or hit-rate) has gotten worse and in black if better.
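
The red and black convention can be sketched in a few lines. The sketch assumes that a lower value is better for error-type fitness measures and a higher value is better for hit-rate, and that nothing is shown when the value is unchanged; the function name and return format are illustrative only.

```python
def describe_change(previous, current, lower_is_better=True):
    """Return (change, color) for the parenthetical figure, or None if unchanged.

    `lower_is_better` should be True for error-style fitness values and
    False for hit-rate, where a larger value is an improvement.
    """
    change = current - previous
    if change == 0:
        return None
    improved = change < 0 if lower_is_better else change > 0
    return change, "black" if improved else "red"


print(describe_change(0.25, 0.20))                         # error fell: shown in black
print(describe_change(0.80, 0.75, lower_is_better=False))  # hit-rate fell: shown in red
```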

Viewing the Outputs of a Program from Interactive Evaluator
When you click on View Results in Interactive Evaluator, Discipulus
displays the outputs of the program in the Program Box of Interactive
Evaluator as the "Selected Program" in the Data Window. (For more on
the Data Window, see The Data Window on page 101.)

The Performance Box in Interactive Evaluator

The Performance Box in Interactive Evaluator gives you information for
the current program displayed in the Program Body Box. Three types of
information appear in the Performance Box, depending on what type of
problem you are running.

You may see an example of this in Figure 67 (regression problem type)
and Figure 68 (ranking problem type). For this regression problem,
Discipulus informs you that the fitness function is "Mean Squared
Error." For this ranking problem, Discipulus informs you that the fitness
function is "1-Area under the ROC Curve."

On the other hand, for a classification problem, the fitness will be
labeled "Hit Rate" and for a logistic regression problem, the fitness will
be labeled "-2*Log-Likelihood."
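
For readers who want to see what the four labels refer to, here is a small sketch using the standard textbook definitions. It is not Discipulus's own code: the function names, the 0.5 classification threshold, and the sample data are assumptions, and Discipulus may compute these figures differently in detail.

```python
import math


def mean_squared_error(targets, outputs):
    """Average of the squared differences between target and output."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / len(targets)


def hit_rate(targets, outputs, threshold=0.5):
    """Fraction of examples classified correctly (threshold is assumed)."""
    hits = sum((o >= threshold) == (t == 1) for t, o in zip(targets, outputs))
    return hits / len(targets)


def one_minus_auc(targets, scores):
    """1 - area under the ROC curve, via pairwise comparison of scores."""
    positives = [s for t, s in zip(targets, scores) if t == 1]
    negatives = [s for t, s in zip(targets, scores) if t == 0]
    # A positive ranked above a negative counts 1; a tie counts one half.
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return 1.0 - wins / (len(positives) * len(negatives))


def minus_two_log_likelihood(targets, probabilities):
    """-2 * log-likelihood of binary targets under predicted probabilities."""
    return -2.0 * sum(
        math.log(p) if t == 1 else math.log(1.0 - p)
        for t, p in zip(targets, probabilities)
    )


# Hypothetical binary targets and program outputs.
targets = [0, 0, 1, 1]
outputs = [0.1, 0.4, 0.35, 0.8]

print(mean_squared_error(targets, outputs))
print(hit_rate(targets, outputs))
print(one_minus_auc(targets, outputs))
print(minus_two_log_likelihood(targets, outputs))
```

Note that a larger hit rate is better, whereas for the other three measures a smaller value is better.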

Viewing Changes in the Fitness and other Statistics as You Browse through the Interactive Evaluator Queue
The fitness and statistics calculated for the current program in the
Program Body Window are always shown in the Performance Box, which
you may find in the upper right part of the Interactive Evaluator
Window.

As you browse through the Queue, you will also notice numbers in
parentheses below the fitness values displayed. These numbers represent
the change in fitness and other reported statistics between the current
evolved program displayed and the previous evolved program displayed.

Black parenthetical values represent improved fitness while red
parenthetical values represent worse fitness. If no value appears, that
means there was no change in fitness.

As you browse back and forth through the Queue, you may notice a
short pause before the next program displays. This is to permit
Discipulus to calculate the fitness (and for classification problems, the
hit-rate) for the new program displayed in the Queue. (If you are using a
very large training set, you may find that the pauses are quite noticeable
because the fitness calculation is time-consuming.)

Editing a Program in Interactive Evaluator

You may edit the program that appears in the Program Body Window of
Interactive Evaluator. Editing may include: adding a line of code,
removing a line of code, or changing a line of code. In other words,
Interactive Evaluator is a compact programming environment. The
programs you create here by editing may be saved, loaded, and saved as
decompiled code in the same manner as a program created completely
by Discipulus.

The following topics contain additional information about editing
programs in the Interactive Evaluator Window:

• Add a Line of Code to the Interactive Evaluator Program on
page 131;

• Remove a Line of Code from the Interactive Evaluator Program on
page 132;

• Change a Line of Code in the Interactive Evaluator Program on
page 132;

• Effect of Editing a Program on the Interactive Evaluator Program
Queue on page 133; and

• Choosing Instructions and Parameters While Editing Programs in
Interactive Evaluator on page 133.

Add a Line of Code to the Interactive Evaluator Program

To add a line of code to the program shown in the Program Body
Window, do the following:

• Highlight the line of code in the Program Body Window just above
the place you want the new code to appear;

• Click Add;

• The Edit Instruction Box pops up. In it, you may select among all
available instructions and set their parameters (for more information
about selecting instructions and parameters, see Selecting Among
Available Instructions in Interactive Evaluator on page 134); and

• Click OK.

The line of code will be added to your program.

Remove a Line of Code from the Interactive Evaluator Program

To remove a line of code from the Program Body Window, do the
following:

• Highlight the line of code you wish to remove; and

• Click on the Remove button.

The selected line of code will disappear.

Change a Line of Code in the Interactive Evaluator Program

To change an existing line of code in the Program Body Window, do the
following:

• Highlight the line of code you wish to change;

• Click on the Edit button (or you may replace these first two steps by
double clicking on the line of code you wish to edit);

• The Edit Instruction Box pops up. In it, you may select among all
available instructions and set their parameters. Select the instruction
you want and choose the parameter(s) you want (for more
information about selecting instructions and parameters, see Selecting
Among Available Instructions in Interactive Evaluator on page 134);
and

• Click OK.

The line of code you just selected will replace the original line of code.
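
The three edits described above (add, remove, and change) amount to simple operations on an ordered list of instructions. The sketch below models a program as a list of text lines; the function names and the instruction strings are placeholders for illustration and do not reproduce the code Discipulus displays.

```python
def add_line(program, highlighted, new_line):
    """Insert new_line just below the highlighted line."""
    return program[:highlighted + 1] + [new_line] + program[highlighted + 1:]


def remove_line(program, highlighted):
    """Drop the highlighted line."""
    return program[:highlighted] + program[highlighted + 1:]


def change_line(program, highlighted, new_line):
    """Replace the highlighted line with new_line."""
    return program[:highlighted] + [new_line] + program[highlighted + 1:]


# Placeholder instructions, not actual evolved code.
original = ["instruction A", "instruction B", "instruction C"]

added = add_line(original, 0, "instruction X")
removed = remove_line(original, 1)
changed = change_line(original, 2, "instruction Y")

print(added)
print(removed)
print(changed)
```

Each function returns a new list and leaves `original` untouched, in the same spirit as Interactive Evaluator keeping the pre-edit program available in the Queue.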

Effect of Editing a Program on the Interactive Evaluator Program Queue
Each time you change a program, the new program containing the
changed instruction will be added as the last program in the Queue. The
old program (the one from before you made your edit) will be moved to
the next-to-last position in the Queue. Accordingly you can return to the
previous program (before the edit) by clicking on the Queue Back
button.

You can find out more about how to use the Queue in the section
entitled Using the Interactive Evaluator Program Queue on page 123.
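The Queue behavior described above can be pictured with a minimal sketch. The structure, the function names, and the fixed capacity are all hypothetical and are not taken from Discipulus; the sketch also assumes the program being edited was already the last one in the Queue.

```c
#define QUEUE_CAPACITY 64

/* Hypothetical model of the Program Queue. */
typedef struct {
    int programs[QUEUE_CAPACITY];  /* hypothetical program identifiers */
    int count;                     /* number of programs in the Queue */
    int current;                   /* index of the program being displayed */
} ProgramQueue;

/* An edit adds the new program as the last entry and displays it.
   Returns 0 on success, -1 if the (assumed) capacity is exhausted. */
int queue_append(ProgramQueue *q, int program_id)
{
    if (q->count >= QUEUE_CAPACITY) {
        return -1;
    }
    q->programs[q->count] = program_id;
    q->current = q->count;
    q->count++;
    return 0;
}

/* Queue Back steps to the preceding program, if there is one,
   and returns the identifier of the program now displayed. */
int queue_back(ProgramQueue *q)
{
    if (q->current > 0) {
        q->current--;
    }
    return q->programs[q->current];
}
```

In this model, appending the edited program and then calling queue_back once lands on the program from before the edit, which mirrors the role of the Queue Back button.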

Choosing Instructions and Parameters While Editing Programs in Interactive Evaluator

When you add an instruction (Add a Line of Code to the Interactive
Evaluator Program on page 131) or change an instruction (Change a
Line of Code in the Interactive Evaluator Program on page 132) in the
current Interactive Evaluator program, you may have to perform two
different tasks:

• Select among different available instructions -- e.g., you might have
to choose between a plus and a minus instruction; or

• Choose which parameter to use with a particular instruction -- e.g.,
you might have to choose a constant value for an instruction.

The following topics provide information about instructions and
parameters in Interactive Evaluator:

• Selecting Among Available Instructions in Interactive Evaluator on
page 134;

• Selecting Parameters for an Instruction in Interactive Evaluator on
page 135; and

• Types of Instructions and Types of Parameters in Interactive
Evaluator on page 136.

Selecting Among Available Instructions in Interactive Evaluator

When you click on the Edit or Add buttons in Interactive Evaluator, the
Edit Instruction Box pops up. This box lets you select the instruction
you want and, where appropriate, the parameters for that instruction.
This section describes how to select among available instructions.

• In the Edit Instruction Box, there is a pull-down combo box that
provides a complete list of all instructions you may choose (in
alphabetical order). Page down to the instruction you want and click
on it; and

• Click OK.

The instruction you just selected will now be shown in the Program
Body Window.

Figure 75. The Edit Instruction Box with the Pull-Down Instruction Menu Opened

The section highlighted in magenta shows the C code of the instruction
in its initial or, after editing, current state.

The "Instruction" section highlighted in orange is a drop-down box
containing all of the available instructions from which you may select.
The instructions are set out in assembler code. However, you may find it
helpful that the C code equivalent of the assembler instruction appears
in the upper part of the Edit Instruction Box. (See Figure 75.) The
Discipulus Owner's Manual contains extensive documentation of the
available instructions and the functions they perform (see Instruction
Set Reference on page 215) and of the types of instructions you may
encounter in this box (see Types of Instructions and Types of
Parameters in Interactive Evaluator on page 136).

Selecting Parameters for an Instruction in Interactive Evaluator

When you click on the Edit or Add buttons in Interactive Evaluator, the
Edit Instruction Box pops up. This section tells you how to select
parameters for an instruction in that Edit Instruction Box.

The section highlighted in light blue is a drop-down box of all
parameters you can select for the chosen instruction. In Figure 75,
FMUL is the currently selected instruction. It multiplies the value in
register 0 by the value in a variable register. The parameter "3" selects
register 3 as the variable register. If you wanted to use a different
register, you would click on the drop-down box and choose from the
available parameters.

The Parameter Box provides a complete list of all parameters available
for the instruction that you have selected. For instructions that do not
have parameters, the box is empty. If parameters are available, simply
page down to the parameter you want and click on it.

These instructions and parameters are set out in assembler code.
However, you may find it helpful that the C code equivalent of the
currently chosen assembler instruction appears in the upper part of the
Edit Instruction Box (highlighted in magenta in Figure 75). The
Discipulus Owner's Manual contains extensive documentation of the
available instructions and the functions they perform (see Instruction
Set Reference on page 215) and of the types of parameters you may
encounter in this box (see Types of Instructions and Types of
Parameters in Interactive Evaluator on page 136).

The instruction you have chosen determines which parameters are
available. The following section describes the types of parameters
available for different instructions and what they mean.

Types of Instructions and Types of Parameters in Interactive Evaluator

Different types of instructions accept different types of parameters. In
this regard, Discipulus’ instructions may be classified into four types of
instructions:

• Instructions that Accept a Constant as a Parameter on page 136;

• Instructions that Accept an Input as a Parameter on page 136;

• Instructions that Accept a Register as a Parameter on page 137; and

• Instructions that Have No Parameters on page 138.

Instructions that Accept a Constant as a Parameter

Four of the available instructions accept a real valued constant as a
parameter. When you select such an instruction, the parameter combo-
box suggests a value for the constant. You have two choices here:

• You may accept the suggested value by clicking on OK; or

• You may type a real number into the Parameter Box and click on OK.

The real value you typed in will become the constant parameter for the
instruction. The instructions that accept a constant as a parameter are:

• FADD constant;

• FDIV constant;

• FMUL constant; and

• FSUB constant.

You can find detailed information about each of these instructions and
how they work in the Instruction Set Reference on page 215.

Instructions that Accept an Input as a Parameter

Many Discipulus instructions accept an input from your data set as a
parameter. When you select such an instruction, a number
corresponding to each of the available independent variables in your
data set appears in the parameter combo-box. Select the appropriate
input from that list.

The instructions that accept an input from your data set as a parameter
are:

• FADD [ESI+%d1];

• FDIV [ESI+%d1];

• FMUL [ESI+%d1]; and

• FSUB [ESI+%d1].

Although the bracketed materials in these instructions look complex,
they just represent the addresses of the inputs from your data file. You
can find detailed documentation of each of these instructions in the
Instruction Set Reference on page 215.
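As a rough illustration only, the four input instructions can be read as arithmetic between register 0 and one value from the current row of the data set. The function below is a hypothetical sketch: the name, the operator codes, and the inputs array are assumptions made for this example, and the exact C code Discipulus displays should be taken from the Edit Instruction Box.

```c
/* Hypothetical sketch: 'inputs' stands in for one row of the data set and
   'k' is the input number chosen in the parameter combo-box. The return
   value is the new content of register 0. */
double apply_input_instruction(char op, double f0, const double *inputs, int k)
{
    switch (op) {
    case '+': return f0 + inputs[k];  /* FADD with an input operand */
    case '-': return f0 - inputs[k];  /* FSUB with an input operand */
    case '*': return f0 * inputs[k];  /* FMUL with an input operand */
    case '/': return f0 / inputs[k];  /* FDIV with an input operand */
    default:  return f0;              /* unknown code: leave f0 unchanged */
    }
}
```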

Instructions that Accept a Register as a Parameter

A number of available instructions take a register designation as a
parameter. There are eight floating-point registers in a Pentium Chip –
so you may select among eight different registers, labeled 0-7. The
instructions that take a register as a parameter are:

• FADD ST(%0), ST(%r);

• FADD ST(%r), ST(%0);

• FCMOVB ST(%0), ST(%r);

• FCMOVNB ST(%0), ST(%r);

• FCOMI ST(%0), ST(%r);

• FDIV ST(%0), ST(%r);

• FDIV ST(%r), ST(%0);

• FLD ST(%r);

• FMUL ST(%0), ST(%r);

• FMUL ST(%r), ST(%0);

• FSUB ST(%0), ST(%r); and

• FSUB ST(%r), ST(%0).

In these instructions, the symbol "%r" designates the variable register in
the instruction for which you may select a parameter. The “%0”
represents the f[0] register. Thus, the instruction:

FADD ST(%r), ST(%0);

when parameterized with the value 1, is equivalent to the following C
code:

f[1]+=f[0];

where f[1] and f[0] represent registers 1 and 0, respectively, of the
Pentium Floating Point Unit. You may find detailed documentation of
each of these instructions and the parameters they accept in this
Discipulus Owner's Manual. (See Instruction Set Reference on
page 215.)
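To make the role of the register parameter concrete, the sketch below models the eight registers as an array f[0] through f[7] and applies FADD in both operand orders. The array model and the function names are assumptions made for this illustration; the first operand is treated as the destination, which matches the f[1]+=f[0] example above.

```c
/* Hypothetical model of the eight floating-point registers, f[0]..f[7]. */
#define NUM_REGISTERS 8

/* FADD ST(%0), ST(%r): register 0 is the destination, so f[0] += f[r]. */
void fadd_st0_str(double f[NUM_REGISTERS], int r)
{
    f[0] += f[r];
}

/* FADD ST(%r), ST(%0): register r is the destination, so f[r] += f[0]. */
void fadd_str_st0(double f[NUM_REGISTERS], int r)
{
    f[r] += f[0];
}
```

With f[0] = 2 and f[1] = 3, calling fadd_str_st0(f, 1) leaves f[0] unchanged and sets f[1] to 5.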

Instructions that Have No Parameters

Many of the available instructions take no parameters. They are:

• NOP;

• F2XM1;

• FABS;

• FCHS;

• FCOS;

• FDECSTP;

• FINCSTP;

• FLDZ;

• FPREM;

• FSCALE;

• FSIN;

• FSQRT;

• JB EPI+6; and

• JNB EPI+6.

You can find detailed documentation of each of these instructions in
this Discipulus Owner's Manual. (See Instruction Set Reference on
page 215.)

Optimizing Constants of Evolved Programs in Interactive Evaluator

Discipulus usually evolves programs that contain constants. In addition,
using the Edit or Add features in Interactive Evaluator, you may add
constants to evolved programs or modify these constants. You may
often find it useful to optimize those constants to give better
performance.

The following topics contain information about optimizing constants in
Interactive Evaluator:

• How to Optimize Constants in Interactive Evaluator on page 139;

• How Discipulus Optimizes Constants in Interactive Evaluator on
page 139; and

• Effect of Optimizing Constants on the Interactive Evaluator Program
Queue on page 140.

How to Optimize Constants in Interactive Evaluator

Discipulus can learn a more optimal set of constants for the program
that appears in the Program Body Window of Interactive Evaluator. To
cause Discipulus to optimize constants, do the following:

• In the Interactive Evaluator Window, click on the Optimize button.

Discipulus will attempt to calculate a more optimal set of constants for
that program and display the results to you in the Program Body
Window. In addition, Discipulus will show you the change in fitness and
hit-rate caused by the optimization in the parentheticals in the
Performance Box (red indicates worse performance and black indicates
better performance).

Many evolved programs from the Genetic Programming module of
Discipulus are already highly optimized, so it may be impossible to
improve them further. In that case, you will find that using the Optimize
feature has no effect.

How Discipulus Optimizes Constants in Interactive Evaluator

Discipulus optimizes constants using an Evolution Strategies (ES)
algorithm with adaptive step sizes. This is a very powerful and efficient
real valued constant optimizing algorithm.

Four notes on constant optimization using ES:

• It is possible, of course, that the constants are already optimized so
improvement may be impossible;

• ES is a stochastic algorithm and clicking on Optimize may result in
worse fitness or worse hit-rates. This is not common but it can
happen;

• Because the constant optimization stops after a set number of
generations (you set that figure in the Interactive Evaluator Options
Window), you may find that clicking on Optimize a second time
results in even more improvement; and

• The result in ES is dependent on initial conditions. Thus, if you think
you should be getting better results, try changing one of the existing
constants in the program and optimizing again. This changes the
initial conditions and may move the optimizing algorithm past local
optima.

Effect of Optimizing Constants on the Interactive Evaluator Program Queue

Each time you click on the Optimize button, the new program
containing optimized constants will be added as the last program in the
Queue. The old program (the one from before you clicked on Optimize)
will be moved to the next-to-last position in the Queue. Accordingly, you
can return to the previous program (before the optimization) by clicking
on the Queue Back button. Figure 72 illustrates how optimizing constants
affects the Program Queue. You can find out more about how to use the
Queue by reviewing the section entitled The Initial Interactive
Evaluator Program Queue on page 123.

Controlling the Speed and Intensity of Constant Optimization in Interactive Evaluator

Discipulus lets you control the speed and intensity of the constant
optimization in Interactive Evaluator. You do so with the Optimization
slider that appears when you click on the Options button in Interactive
Evaluator. Figure 76 shows the Interactive Evaluator Options Page.

Figure 76. The Options Page of Interactive Evaluator

For slower but more thorough optimization, slide the Optimization
Slider to the left. For faster but less thorough optimization, slide it to
the right.

Combining Constant Optimization with Manual Program Simplification in Interactive Evaluator

Interactive Evaluator lets you edit evolved programs manually. (See
Editing a Program in Interactive Evaluator on page 131.) Combining
this feature with constant optimization is often very useful. This section
contains some suggestions about how to do this.

Interactive Evaluator lets you change the structure of evolved programs
manually. While a particular change may alter the shape of the curve
that fits your data in a good way, it may also move that “curve” so that
the fitness of the program is worse than before the change. But if you
optimize the constants after the structural change, the curve may be
shifted properly and give a better fit than before the structural change.

Here are some examples of how you can combine structural changes
with constant optimization to simplify and improve your evolved
programs in Interactive Evaluator:

• Detect Spurious Inputs on page 142;

• Eliminate Excess Lines of Code on page 142;

• Eliminate Stacked Constants on page 143; and

• Replacing Complex Operators with Linear Operators on page 143.

Detect Spurious Inputs

Evolved programs frequently contain instructions that utilize inputs
from your data set. But the inputs in these instructions can turn out, on
analysis, to be unrelated to the target output. Discipulus seems to use
such inputs as constants.

You can determine if a particular instruction uses an input as a constant
by editing the instruction to replace the input with a constant. After
replacing this instruction, click on the Optimize button. If the
fitness of the modified program (containing a constant) is the same as or
better than the fitness of the original program (containing the input) after
the constants are optimized, then the input you just replaced is probably
a spurious input. Replacing that input with a constant simplifies your
solution.

Here is an example of finding a spurious input. Using the Edit button in
Interactive Evaluator, replace the following instruction, which contains
an input:

f[0]+=Input024;

with the equivalent instruction that contains a constant (you choose the
constant value to insert). For example:

f[0]+=2.0;

Then click on the Optimize button. (The optimization may well change
the value of the constant, "2.0.") The effect of this change on fitness will
help you determine if Discipulus is using the input, Input024, in this
program in a useful way to predict output.

Eliminate Excess Lines of Code

Evolved programs frequently contain lines of code that have little
practical effect on the output. To test a line of code in an evolved
program, delete it using the Remove button. Then click on Optimize. If
the fitness stays the same or improves, you can simplify your program by
eliminating this line of code. In fact, even if the fitness gets only a little
worse, you may still want to eliminate this instruction in the interest of
simplifying your solution.

Eliminate Stacked Constants

Discipulus often evolves programs that contain sequences of many
instructions, each of which modifies a register with a constant value. You
can often simplify these programs by deleting these instructions (use the
Remove button) and replacing them with a shorter linear transformation
sequence.

For example, using the Remove, Edit and/or Add buttons in Interactive
Evaluator, you might replace the following sequence of instructions:

f[0]+=3.45678989;
f[0]*=.013457890;
f[0]+=f[0]; //equivalent to multiplying by 2
f[0]/=3.45678989;
f[0]/=3.45678989;

with the following instructions (you choose the constant values to
insert):

f[0]*=1.000; //this is just to provide a constant to optimize
f[0]+=0.0; //this is just to provide a constant to optimize

Then click the Optimize button. (The optimization will probably change
the value of the two constants you just added.) You may end up with an
optimized and simplified program with fitness as good or better than
before.
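The reason a two-instruction replacement can match the longer sequence is that any chain of additions, multiplications, and divisions by constants collapses to a single linear transformation, slope * x + intercept. The sketch below checks this for the five-instruction example above; the function names are made up for this illustration.

```c
/* The five-instruction sequence from the example, applied to a starting
   register value x. */
double stacked_sequence(double x)
{
    double f0 = x;
    f0 += 3.45678989;
    f0 *= .013457890;
    f0 += f0;            /* equivalent to multiplying by 2 */
    f0 /= 3.45678989;
    f0 /= 3.45678989;
    return f0;
}

/* Recover the slope and intercept of the equivalent transformation
   f0 = slope * x + intercept by evaluating the sequence at 0 and 1. */
void equivalent_linear(double *slope, double *intercept)
{
    *intercept = stacked_sequence(0.0);
    *slope = stacked_sequence(1.0) - *intercept;
}
```

In practice you would not compute these two values by hand. Entering neutral constants such as 1.000 and 0.0 and clicking Optimize lets Discipulus search for suitable values.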

Replacing Complex Operators with Linear Operators

You will find that complex operators like Square Root, Sine, and Log are
often used by Discipulus in a manner that may be very closely
approximated by a linear transformation. In that case, the algebraic
simplicity of evolved programs is greatly improved by using the linear
approximation instead of the complex operator.

To test whether Discipulus has used a complex instruction as a linear
transformation, replace the complex operator in the evolved program
with a linear transformation.

Here is an example of this process. Using the Remove, Edit and/or Add
buttons in Interactive Evaluator, replace the following complex
instruction:

f[0]=sqrt(f[0]);

with the following two instructions (you choose the constant values to
insert):

f[0]*=1.000;
f[0]+=0.000;

These replacement instructions comprise a simple (but initially neutral)
linear transformation of the register f[0].

After completing the replacement described above, click the Optimize
button. (The optimization may well change the value of the two
constants you just added.) You will frequently end up with an optimized
program that has fitness as good as or better than before. In addition, the
optimized program may be algebraically simpler and will execute faster
than it did before the replacement.

Automatic Intron Removal in Interactive Evaluator

The term “introns” in Genetic Programming refers to program
instructions that are included in evolved programs but have little or no
effect on the output of the evolved program.

The button labeled “Remove Introns” in the Interactive Evaluator
executes a proprietary algorithm that simplifies your code by removing
such instructions. You may find that not only do your programs get
shorter, but their fitness often improves as well.

After the removal process, Discipulus adds the newly simplified
program as the last program in the Queue. The old program (the one
from before you simplified) will be moved to the next-to-last position in
the Queue. Accordingly, you can return to the previous program (before
the simplification) by clicking on the Queue Back button.

Removing introns has the same effect on the Queue as optimizing
constants. Figure 72 illustrates this effect. You can find out more about
how to use the Queue by reviewing the section entitled The Initial
Interactive Evaluator Program Queue on page 123.

In general, we recommend using the "Remove Introns" button on a
program before deploying it for real world use.

Automatic Simplification in Interactive Evaluator

The Simplify button in the Interactive Evaluator Window is a powerful
tool. It activates one of two proprietary algorithms (“Standard” and
“Thorough”) that can often greatly simplify the programs Discipulus has
evolved. To simplify your programs automatically, just click on the
Simplify button in Interactive Evaluator.

The Simplify button can often reduce the size of an evolved program by
up to 90% without losing fitness. In addition, the simplified program is
algebraically much simpler than the original program and executes
much faster. But automatic simplification is very computationally
intensive. When you click on the Simplify button, you should be
prepared to let your computer run nothing but the simplification process
over lunch (or even longer).

The following topics contain additional information about
automatic simplification:

• Configuring Automatic Simplification in Interactive Evaluator -- the
Use of the Options Page on page 145; and

• The Effect of Automatic Simplification on the Interactive Evaluator
Program Queue on page 146.

Configuring Automatic Simplification in Interactive Evaluator -- the Use of the Options Page

To speed up Automatic Simplification, use the two sliders on the
Options Page of Interactive Evaluator. (To get to the Options Page, click
on the Options Button in Interactive Evaluator.) Figure 76 shows the
Options Page. Moving either of the sliders on the Options Page toward
“Faster” will speed up simplification by making the algorithm work less
intensely. Moving either of these sliders toward “Slower” will do the
reverse.

The Effect of Automatic Simplification on the Interactive Evaluator Program Queue

Each time you click on the Simplify button, three new programs
containing optimized constants will be added as the last three programs
in the Queue. They are:

• The last program in the Queue is the best program at the end of the
simplification process;

• The preceding program in the Queue is the best program found
during the simplification process on the validation set; and

• The program preceding that one is the program that remains
after all simplification steps have been completed.

The old program (the one from before you clicked on Simplify) will be
moved to the position just before these three programs in the Queue.
Accordingly, you can return to the previous program (before the
simplification) by clicking on the Queue Back button three times.
Figure 73 illustrates this operation of the Queue.

Practice Tips for Interactive Evaluator

The most important thing to remember about Interactive Evaluator is
that it is interactive. You have manual and automatic tools. You can use
either or both of them over-and-over again on a single program in any
sequence you choose. Test various sequences. Two examples of
sequences that are repeatedly useful are:

• If the automatic algorithms (Remove Introns, Automatic
Simplification, and Constant Optimization) seem stuck, make a
sensible change to the program manually and then try the automatic
simplification again. The change can be as simple as a small change
in a constant in a single instruction or the removal of several
instructions. What change is appropriate depends on the program that
confronts you at the time. But try making a change, then optimize
constants, and, optionally, remove introns or simplify. If the result is
bad, you can always return to your starting point in the Queue.

• Once the automatic simplification or intron removal has had some
effect, look at the program carefully. You often will find
simplifications that the automatic algorithm missed. Make that
simplification, optimize constants and then start a new round of
intron removal and/or automatic simplification.

Genetic Programming Parameters

NOTE: The default settings for a Discipulus project work quite well for
most projects. In fact, Discipulus automatically sets, randomizes, and
optimizes the Genetic Programming parameters for the runs that
comprise a project. Thus, the matters covered in this chapter should be
considered advanced subject matters that most users need not consider.

Genetic Programming is a powerful automatic learning technique. The
Genetic Programming Reference on page 201 contains a detailed
description of the Genetic Programming algorithm.

The following additional topics are addressed here regarding Genetic
Programming parameters:

• Accessing Genetic Programming Parameters on page 150;

• Basic Genetic Programming Parameters on page 151, including:

* Genetic Programming: Population Size on page 151;

* Genetic Programming: Mutation Rate on page 151;

* Genetic Programming: Crossover Rate on page 152; and

* Genetic Programming: Reproduction Rate on page 153;

• Advanced Genetic Programming Deme Parameters on page 153,
including:

* Genetic Programming Demes: Enabled/Not Enabled Check Box
on page 154;

* Genetic Programming Demes: Number of Demes on page 154;

* Genetic Programming Demes: Crossover Percentage Between
Demes on page 154; and

* Genetic Programming Demes: Migration Rate Between Demes on
page 155.

Accessing Genetic Programming Parameters

You may access the Genetic Programming parameters from the Genetic
Programming Page of the Single Run Advanced Options Window.

To do so, make the following menu selections:

• On the Set Up Learning menu, click Options;

• The Advanced Options Window appears. Click on the Set button.

• Click on the Genetic Programming Tab (Figure 77).

At this point, the Genetic Programming Page of the Advanced Options
Window appears:

Figure 77. The Genetic Programming Page of the Advanced Options Window

Here, you will find all of the parameters that are particular to Genetic
Programming together on one page.

Basic Genetic Programming Parameters

The basic Genetic Programming parameters are:

• Genetic Programming: Population Size on page 151

• Genetic Programming: Mutation Rate on page 151

• Genetic Programming: Crossover Rate on page 152 and

• Genetic Programming: Reproduction Rate on page 153

Genetic Programming: Population Size

The population size parameter sets the number of programs in the
population that Discipulus will evolve.

There is no upper limit on population size in the program. The
maximum population size you may use will be determined by the
amount of RAM on your computer and the maximum length of the
programs in the population.

Generally speaking, a run will take longer with a larger population. But,
also generally speaking, a larger population can solve more difficult
problems. One of the big advantages of Discipulus over other learning
systems is that Discipulus is fast enough to evolve very large
populations in realistic time frames.

Populations of 500,000 have been successfully evolved on desktop PCs
that have 256 Megabytes of RAM. This is a very large population by
Genetic Programming standards. However, before you use huge
populations, you should try much smaller populations because
Discipulus is often able to solve difficult problems with populations of
100 to 1000.

You may access this parameter from the Genetic Programming Tab of
the Advanced Options Window (Figure 77). See Accessing Genetic
Programming Parameters on page 150.

Genetic Programming: Mutation Rate

Mutation is one of the principal "search operators" used to transform
programs in the Genetic Programming algorithm. Mutation causes
random changes in programs from the population that have won a
tournament.

The "Mutation Frequency" parameter sets the overall probability of
mutation of the programs that have been selected as winners in a
tournament by Discipulus.

Once the overall mutation rate is set, the particulars of the application of
the mutation operator are controlled by other parameters. Those
particulars are discussed in Advanced Mutation Parameters on
page 165.

The allowable range for the Mutation Frequency parameter is 0% to
100%. Although most genetic programming systems use a very low
mutation rate, it is our experience that Discipulus benefits from a much
higher mutation rate. We use a "Mutation Frequency" setting of 90% on
many of our runs.

Technical note: The mutation operator is applied probabilistically to all
programs that have won tournaments. Thus, it is applied regardless of
whether a tournament winner has also been selected for crossover.

You may access this parameter from the Genetic Programming Page of
the Advanced Options Window (Figure 77). See Accessing Genetic
Programming Parameters on page 150.

Genetic Programming: Crossover Rate

Like mutation, crossover is one of the principal "search operators" used
in genetic programming to transform programs in the population.
Crossover operates by exchanging sequences of instructions between
two tournament winners. The result of that exchange produces two
offspring that are then inserted into the population in place of the losers
in the tournament.
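As a rough picture of what "exchanging sequences of instructions" means, the sketch below swaps one segment between two offspring that start out as copies of the tournament winners. Representing a program as an array of integer instruction codes, and swapping equal-length segments at the same position, are simplifying assumptions for this example; the crossover variants Discipulus actually uses are described in Crossover in Genetic Programming.

```c
/* Hypothetical representation: a program is an array of instruction codes.
   child_a and child_b start as copies of the two tournament winners. The
   caller must ensure start >= 0 and start + length fits in both arrays. */
void crossover_segment(int *child_a, int *child_b, int start, int length)
{
    for (int i = start; i < start + length; ++i) {
        int tmp = child_a[i];   /* swap instruction i between the children */
        child_a[i] = child_b[i];
        child_b[i] = tmp;
    }
}
```

After the call, each child carries the other parent's instructions inside the swapped segment and its own parent's instructions everywhere else.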

The "Crossover Frequency" parameter sets the overall probability that
crossover will occur between the two winners in a tournament by
Discipulus. The allowable range for this parameter is 0% to 100%.

Once the overall crossover rate is set, the particulars of the application
of the crossover operator are controlled by other parameters. Those
particulars are discussed in Crossover in Genetic Programming on
page 204.

Although most genetic programming systems use a very high crossover
rate, it is our experience that Discipulus (and most other Genetic
Programming systems) benefits from a lower crossover rate. We use a
“Crossover Frequency” setting of 50% on many of our runs.

You may access this parameter from the Genetic Programming Page of
the Advanced Options Window (Figure 77). See Accessing Genetic
Programming Parameters on page 150.

Genetic Programming: Reproduction Rate

Reproduction is also a Discipulus search operator in Genetic
Programming. Reproduction just copies a program and places the copy
into the population in addition to the original program.

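The reproduction rate in a run is what is left over after the application of
the crossover and mutation operators. With all rates expressed as
percentages, the reproduction rate may be calculated as follows:

100 – mutation rate – (crossover rate * [1 – mutation rate / 100])

For example, with a mutation rate of 90% and a crossover rate of 50%,
the reproduction rate is 100 – 90 – (50 * 0.10) = 5%.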
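The formula can be wrapped in a small helper for experimenting with different settings. This sketch assumes that both rates are given as percentages from 0 to 100 and that the bracketed term is a fraction, which is the reading under which the units are consistent. The function name is hypothetical.

```c
/* Reproduction rate in percent, given mutation and crossover rates in
   percent (0..100). Algebraically this equals the chance that a tournament
   winner is neither mutated nor crossed over when the two operators are
   applied independently. */
double reproduction_rate(double mutation_pct, double crossover_pct)
{
    return 100.0 - mutation_pct
         - crossover_pct * (1.0 - mutation_pct / 100.0);
}
```

For instance, reproduction_rate(90.0, 50.0) evaluates to approximately 5, so with those settings only about one tournament winner in twenty is copied unchanged.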

You may access the crossover and mutation parameters from the
Genetic Programming Page of the Advanced Options Window
(Figure 77). See Accessing Genetic Programming Parameters on
page 150.

Advanced Genetic Programming Deme Parameters

Biologists have suggested that genetic diversity is enhanced in natural
populations by the fact that natural populations of the same species may
be isolated from each other geographically. The amount of blending of
genetic material among the members of the species by crossover is,
therefore, limited to migration among these isolated locales. These
isolated locales are called "Demes."

The Demes feature of Discipulus mimics the geographic isolation of
natural Demes. In Discipulus the Demes are arranged in a circle.
Movement of evolved programs among the Demes occurs only between
adjacent Demes on the circle.
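The circular arrangement means that every Deme has exactly two neighbors, with the last Deme wrapping around to the first. A minimal sketch of that adjacency, assuming Demes are numbered from 0 (the numbering and the function name are illustrative, not taken from Discipulus):

```c
/* Index of the Deme adjacent to 'deme' on a circle of 'num_demes' Demes.
   'direction' is +1 for one neighbor and -1 for the other. */
int adjacent_deme(int deme, int direction, int num_demes)
{
    /* The extra "+ num_demes" keeps the result non-negative in C. */
    return ((deme + direction) % num_demes + num_demes) % num_demes;
}
```

With five Demes, the neighbors of Deme 0 are Demes 1 and 4, so programs in Deme 0 never interact directly with Demes 2 or 3.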

You may access the Genetic Programming Demes parameters from the
Genetic Programming Tab of the Single Run Advanced Options Page.

The following topics are covered here regarding Genetic Programming
deme parameters:

• Genetic Programming Demes: Enabled/Not Enabled Check Box on
page 154;

• Genetic Programming Demes: Number of Demes on page 154;

• Genetic Programming Demes: Crossover Percentage Between
Demes on page 154; and

• Genetic Programming Demes: Migration Rate Between Demes on
page 155.

Genetic Programming Demes: Enabled/Not Enabled Check Box

This checkbox determines whether the Demes feature is turned on or
off. You may find it on the Genetic Programming Tab of the Advanced
Options Window (Figure 77). See Accessing Genetic Programming
Parameters on page 150.

Genetic Programming Demes: Number of Demes

This parameter determines the number of Demes into which the Genetic
Programming Population is divided. You should choose this number so
that each Deme, by itself, has enough programs in it to engage in useful
evolution. Populations of fifty to one hundred programs per Deme
probably represent the smallest useful size for each Deme.

The number of Demes may not exceed half the number of programs in
the population.
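
The two sizing rules above can be checked with a small Python sketch. The function name is hypothetical, and the code assumes the population is split evenly among the Demes, which the text does not state.

```python
def programs_per_deme(population_size: int, num_demes: int) -> int:
    """Return the approximate number of programs in each Deme.

    Assumption: the population is divided evenly among the Demes.
    """
    if num_demes < 1:
        raise ValueError("there must be at least one Deme")
    if num_demes > population_size // 2:
        raise ValueError("the number of Demes may not exceed half the population")
    return population_size // num_demes
```

For example, a population of 500 divided into 5 Demes gives about 100 programs per Deme, which is within the suggested lower range of useful Deme sizes.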

You may set this parameter either on the Genetic Programming Tab of
the Advanced Options Window (Figure 77) or on the Advanced Tab of
the Genetic Programming Page. See Accessing Genetic Programming
Parameters on page 150.

Genetic Programming Demes: Crossover Percentage Between Demes

Ordinarily, crossover occurs between programs in the same Deme. But
Discipulus can also perform crossover between programs in adjacent
Demes.

The "Crossover Percentage Between Demes" parameter sets the percent
of tournaments that will result in crossover between programs in
adjacent Demes. When a tournament has been selected for crossover
between Demes, Discipulus takes the following steps:

• A Deme is selected at random;

• One of the two adjacent Demes is selected at random;

• Two programs are chosen from each of the selected Demes and the
better program from each Deme is selected for crossover;

• The selected programs from each Deme are crossed over. The
offspring of this crossover replace the two losers in the tournament.
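
The steps above can be sketched in Python as follows. This is an illustration, not the Discipulus implementation: the function and parameter names are hypothetical, programs are represented as plain lists, lower fitness is assumed to be better, and the assignment of each child to a particular Deme is an assumption.

```python
import random
from typing import Callable, List, Tuple

Program = List[int]  # hypothetical stand-in for an evolved program


def crossover_between_demes(
    demes: List[List[Program]],
    fitness: Callable[[Program], float],
    crossover: Callable[[Program, Program], Tuple[Program, Program]],
    rng: random.Random,
) -> None:
    """Run one between-Demes crossover tournament, updating demes in place.

    Assumptions: lower fitness is better, there are at least two Demes,
    and every Deme holds at least two programs.
    """
    num_demes = len(demes)
    first = rng.randrange(num_demes)                    # a Deme chosen at random
    second = (first + rng.choice((-1, 1))) % num_demes  # one of its two neighbours
    winners = []
    loser_slots = []
    for deme in (first, second):
        i, j = rng.sample(range(len(demes[deme])), 2)   # two programs per Deme
        if fitness(demes[deme][i]) > fitness(demes[deme][j]):
            i, j = j, i                                 # i now indexes the better one
        winners.append(demes[deme][i])
        loser_slots.append((deme, j))
    children = crossover(winners[0], winners[1])
    # Assumption: the first child goes to the first Deme, the second to the second.
    for (deme, slot), child in zip(loser_slots, children):
        demes[deme][slot] = child
```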

You may set this parameter to any value from 0 to 100%. Generally
speaking, you should start a demes setup with no crossover and low
migration rates – migration on the order of 1% seems to work well. If
you set the migration rate too high, it effectively cancels out the effect
of having separate Demes.

Genetic Programming Demes: Migration Rate Between Demes

Ordinarily, programs and their offspring stay in the same Deme. But
Discipulus can cause them to migrate between adjacent Demes.

The "Migration Rate Between Demes" parameter sets the percent of
tournaments that result in migration of programs between adjacent
Demes.

When a tournament has been selected for migration, Discipulus
performs the following steps:

• A Deme is selected at random;

• One of the two adjacent Demes is selected at random;

• Two programs are chosen randomly from each of the selected
Demes. The better program from each Deme replaces the worse
program from the other Deme.
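
A migration tournament of the kind described in the steps above might look like this in Python. The names are hypothetical, lower fitness is assumed to be better, and the sketch assumes that the better program is copied (it also remains in its home Deme), which the text does not spell out.

```python
import random
from typing import Callable, List

Program = List[int]  # hypothetical stand-in for an evolved program


def migrate_between_demes(
    demes: List[List[Program]],
    fitness: Callable[[Program], float],
    rng: random.Random,
) -> None:
    """Run one migration tournament, updating demes in place.

    Assumptions: lower fitness is better, there are at least two Demes,
    and every Deme holds at least two programs.
    """
    num_demes = len(demes)
    first = rng.randrange(num_demes)                    # a Deme chosen at random
    second = (first + rng.choice((-1, 1))) % num_demes  # one of its two neighbours
    picks = {}
    for deme in (first, second):
        i, j = rng.sample(range(len(demes[deme])), 2)   # two programs per Deme
        if fitness(demes[deme][i]) > fitness(demes[deme][j]):
            i, j = j, i
        picks[deme] = (i, j)                            # (better slot, worse slot)
    # Copy the better programs before overwriting anything.
    best_first = list(demes[first][picks[first][0]])
    best_second = list(demes[second][picks[second][0]])
    demes[second][picks[second][1]] = best_first        # first's winner replaces second's loser
    demes[first][picks[first][1]] = best_second         # second's winner replaces first's loser
```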

You may set this parameter to any value from 0 to 100%. A low value,
from 0.1% to 10%, is recommended. Generally speaking, you should
start a demes setup with no crossover and low migration rates –
migration on the order of 1% seems to work well. If you set the
migration rate too high, it effectively cancels out the effect of having
separate Demes.

Genetic Programming Demes: Practice Note Regarding Crossover and
Migration Rates between Demes

Generally speaking, you should start a demes setup with no crossover
and low migration rates – migration on the order of 1% seems to work
well. If you set the migration rate too high, it effectively cancels out the
effect of having separate Demes.

Advanced Options
NOTE: The default settings for a Discipulus project work quite well for
most projects. In fact, a Discipulus project automatically sets,
randomizes and optimizes the parameters for all runs in that project.
Thus, the matters covered in this chapter should be considered
advanced subject matters that most users need not consider.

Discipulus has many advanced features. You may access them in the
Single Run Advanced Options Window. The following topics document
all features of Discipulus that have not been discussed elsewhere:

• Finding the Single Run Advanced Options Window on page 157;

• Dynamic Subset Selection on page 158;

• Parsimony Pressure on page 162;

• Advanced Crossover and Mutation on page 165;

• Controlling Program Size on page 169; and

• Setting the Random Seed on page 171.

Finding the Single Run Advanced Options Window

You may access all Discipulus parameters from the Advanced Options
Window. To get to the Advanced Options Window, make the following
menu selections:

• From the Set Up Learning Menu, select Options.

• The Advanced Options Window will pop up. Then click on the Set
button.

• Figure 78 shows the Single Run Advanced Options Window:

Figure 78. The Advanced Options Window

To navigate around the Single Run Advanced Options Window, just
select the appropriately labeled tab and access the features you want.

Dynamic Subset Selection

If you selected Genetic Programming as the learning algorithm, you can
use a variant of the Dynamic Subset Selection algorithm (“DSS”) to
enhance training for Genetic Programming learning. DSS speeds up
evolution and usually provides better generalization capabilities.

The following topics contain more information about the DSS
algorithm:

• Dynamic Subset Selection Overview on page 158; and

• Dynamic Subset Selection Parameters on page 159.

Dynamic Subset Selection Overview

You may use DSS for classification, function fitting (regression),
logistic-regression type problems, and ranking problem types.

DSS cannot be enabled when you have written your own custom
fitness function.

DSS calculates the fitness of evolved programs by using a constantly
changing subset of the complete training data file (a "Training Subset").
As an example, even though you may have a training data file with
10,000 training examples, Dynamic Subset Selection causes the actual
fitness evaluations to occur on a smaller Training Subset of, say, 200 of
those 10,000 training examples. The 200 cases in the Training Subset
are regularly changed by the DSS algorithm.

DSS chooses the Training Subset from the overall set of training
examples (fitness cases) that you are using for your training data set. It
chooses the Training Subset using three criteria:

• Age. The "age" of the training example (that is, how long it has been
since a particular training example was used in a Training Subset).

• Difficulty. A measure of how difficult the algorithm finds a
particular training example.

• Random selection.

You may set the relative importance of the above criteria. Thus, if you
selected a "Target Subset Size" of 200 you could, for example, select the
elements of that subset in the following proportions: 20% (40 training
examples) by the age of the training example, 70% (140 training
examples) by the difficulty of the training example and 10% (20 training
examples) randomly. In practice, we find that 50% by age and 50% by
difficulty works quite well.
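
The arithmetic in the example above can be reproduced with a short Python sketch. The function name is hypothetical, and the requirement that the three percentages sum to 100 is an assumption made for this illustration.

```python
from typing import Tuple


def dss_quotas(
    target_subset_size: int,
    age_pct: float,
    difficulty_pct: float,
    random_pct: float,
) -> Tuple[int, int, int]:
    """Split a Training Subset into (by age, by difficulty, random) counts.

    Assumption: the three percentages must add up to 100.
    """
    if abs(age_pct + difficulty_pct + random_pct - 100.0) > 1e-9:
        raise ValueError("the three percentages must sum to 100")
    by_age = round(target_subset_size * age_pct / 100.0)
    by_difficulty = round(target_subset_size * difficulty_pct / 100.0)
    by_random = target_subset_size - by_age - by_difficulty  # remainder
    return by_age, by_difficulty, by_random
```

Calling dss_quotas(200, 20, 70, 10) yields 40, 140, and 20 training examples, matching the proportions given in the text.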

DSS periodically discards the current Training Subset and selects a new
one. You may set the frequency with which this occurs.

Dynamic Subset Selection Parameters

You can set the parameters for Dynamic Subset Selection from the DSS
Tab of the Advanced Options Window. Figure 79 shows the DSS Page.

Figure 79. The DSS Page

You may find the DSS Page with the following menu and tab selections:

• On the Set Up Learning menu, click Advanced;

• Click the DSS Tab (Figure 79).

You are now in the DSS Page. Here are the parameters that you may use
to control the Dynamic Subset Selection Algorithm.

• DSS Parameter -- Enabled on page 161

• DSS Parameter -- Target Subset Size on page 161

• DSS Parameter -- Selection By Age on page 161

• DSS Parameter -- Selection By Difficulty on page 161;

• DSS Parameter -- Stochastic Selection on page 161 and

• DSS Parameter -- Frequency (in generation equivalents) on page 162

DSS Parameter -- Enabled

The Enabled Check Box toggles Dynamic Subset Selection on and off.
When the box is checked, DSS is on.

DSS Parameter -- Target Subset Size

You set the size of the Training Subset with the Target Subset Size
parameter. This parameter must be less than the number of examples in
your training data file. Generally, values between 35 and 200 are
recommended.

DSS Parameter -- Selection By Age

Use the selection by age parameter to choose what percentage of the
Training Subset will be selected by the age criterion. With age selection
in DSS, the longer it has been since a particular training example has
been included in a Training Subset, the higher its age value. Then, the
portion of the Training Subset that is selected by age is chosen
probabilistically from all training examples in proportion to their age
value.

Practically speaking, this means that the longer it has been since a
training example was included in a Training Subset, the more likely it is
to be chosen in that portion of the next Training Subset that is chosen by
age.
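
Selection in proportion to age can be illustrated with the following Python sketch. It is not the DSS code used by Discipulus: the function name is hypothetical, the sketch draws distinct examples one at a time, and it falls back to uniform selection when every remaining age value is zero.

```python
import random
from typing import List, Sequence


def select_by_age(ages: Sequence[float], count: int, rng: random.Random) -> List[int]:
    """Pick up to `count` distinct example indices, weighted by age value."""
    candidates = list(range(len(ages)))
    weights = [float(age) for age in ages]
    chosen: List[int] = []
    while candidates and len(chosen) < count:
        if sum(weights) <= 0.0:
            # Assumption: if all remaining ages are zero, choose uniformly.
            weights = [1.0] * len(candidates)
        position = rng.choices(range(len(candidates)), weights=weights, k=1)[0]
        chosen.append(candidates.pop(position))  # remove so it cannot repeat
        weights.pop(position)
    return chosen
```

An example that has not appeared in a Training Subset for a long time carries a larger weight, so it is more likely to be drawn, but younger examples can still be chosen.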

DSS Parameter -- Selection By Difficulty

Use this parameter to choose what percentage of the Training Subset
will be selected by the Difficulty criterion.

NOTE: You cannot set selection by difficulty in Ranking fitness
functions as it has no meaning in that context.

WARNING: Do not set this parameter to 100%. Training will not be
effective.

DSS Parameter -- Stochastic Selection

Use this parameter to choose what percentage of the Training Subset
will be selected randomly.

DSS Parameter -- Frequency (in generation equivalents)

Use this parameter to set how frequently the Training Subset is changed.
You set this parameter by entering a "generation equivalent." Because
Discipulus works by tournament selection, a population of 10,000
evolves through one full "generation equivalent" in 5,000 tournaments.
Similarly, a population of 2,000 evolves through one full "generation
equivalent" in 1,000 tournaments.

So, by way of example, if your population was 10,000, setting the
Frequency parameter to 1 would mean that the Training Subset is
refreshed every 5,000 tournaments (1*10,000/2). If the frequency
parameter were, in the same situation, set to 0.1, the Training Subset
would be refreshed every 500 tournaments (0.1*10,000/2).
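
The conversion from generation equivalents to tournaments used in the examples above is simple enough to sketch in Python. The function name is hypothetical.

```python
def tournaments_per_refresh(population_size: int, frequency: float) -> int:
    """Convert a Frequency setting (in generation equivalents) to tournaments.

    One generation equivalent is population_size / 2 tournaments.
    """
    return int(round(frequency * population_size / 2))


if __name__ == "__main__":
    print(tournaments_per_refresh(10_000, 1))    # the first example in the text
    print(tournaments_per_refresh(10_000, 0.1))  # the second example in the text
```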

Parsimony Pressure
Parsimony pressure is a term used to refer to techniques that tend to
make the evolved programs in Discipulus shorter--that is, more
parsimonious. Parsimony pressure causes “natural selection” in
evolutionary learning systems to favor the selection of shorter and more
compact evolved programs.

Discipulus applies parsimony pressure by randomly choosing a portion
of tournaments for application of parsimony. For each tournament
chosen, Discipulus determines if the fitness of the two evolved programs
in the tournament is within a certain threshold. If the difference in the
two programs’ fitness is less than that threshold percentage, then
parsimony is applied to that tournament by selecting the shorter of the
two programs as the tournament winner.
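
The following Python sketch shows one way the decision described above could be expressed. It is an illustration under assumptions, not the Discipulus code: all names are hypothetical, lower fitness is assumed to be better, and the fitness difference is measured as a percentage of the average fitness of the two programs.

```python
import random


def tournament_winner(
    fit_a: float, len_a: int,
    fit_b: float, len_b: int,
    threshold_pct: float,
    effect_pct: float,
    rng: random.Random,
) -> str:
    """Return "a" or "b" as the tournament winner.

    Assumptions: lower fitness is better; effect_pct is the share of
    tournaments subject to parsimony; threshold_pct is compared with the
    fitness difference expressed as a percentage of the average fitness.
    """
    fitter = "a" if fit_a <= fit_b else "b"
    if rng.random() * 100.0 >= effect_pct:
        return fitter                      # tournament not chosen for parsimony
    difference = abs(fit_a - fit_b)
    average = (fit_a + fit_b) / 2.0
    within = difference == 0.0 or (
        average != 0.0 and 100.0 * difference / abs(average) < threshold_pct
    )
    if within and len_a != len_b:
        return "a" if len_a < len_b else "b"   # the shorter program wins
    return fitter
```

With a small threshold, only near-ties in fitness are decided by length, which is consistent with the advice to apply parsimony pressure sparingly.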

You may access parsimony pressure parameters as follows:

• On the Set Up Learning menu, click Select Advanced Options;

• Click on the Miscellaneous Tab (Figure 80).

Figure 80. The Miscellaneous Tab of the Single Run Advanced
Options Page

The following topics describe the parameters available to control
parsimony:

• Parsimony Pressure Parameters -- Enabled on page 163;

• Parsimony Pressure Parameters -- Threshold on page 163;

• Parsimony Pressure Parameters -- Delay (in generation equivalents)
on page 164; and

• Parsimony Pressure Parameters -- Effect % on page 164.

Parsimony Pressure Parameters -- Enabled

Clicking the Enabled Check Box turns parsimony pressure on or off for
the run. Checked is on; unchecked is off.

Parsimony Pressure Parameters -- Threshold

This value sets the Parsimony Pressure threshold as a percentage.
Discipulus uses the threshold to determine whether the fitness of each
of the two evolved programs in a tournament differs from the average
fitness of the two programs by less than the percentage you set. If the
difference in the two programs’ fitness is less than that threshold
percentage, then parsimony is applied to that tournament by selecting
the shorter of the two programs as the tournament winner.

Parsimony Pressure Parameters -- Delay (in generation equivalents)

This parameter allows you to delay the application of parsimony
pressure until a certain number of generations has passed in a run. This
is a very useful way to allow a run to evolve good solutions without
parsimony and then, at the end of the run, apply parsimony pressure to
simplify the evolved solutions.

This parameter is set in “generation equivalents.” Because Discipulus
uses tournament selection and a steady state algorithm, a population of
10,000 evolves through one full “generation equivalent” in 5,000
tournaments. Similarly, a population of 2,000 evolves through one full
“generation equivalent” in 1,000 tournaments.

Parsimony Pressure Parameters -- Effect %

This number allows you to set what percentage of tournaments will be
subject to the parsimony algorithm. If, for example, this value is set to
33%, then about 1/3 of all tournaments will be subject to the parsimony
pressure algorithm.

PRACTICE NOTE: In general, a very small amount of parsimony
pressure should be used; otherwise it may cause Discipulus to find very
short, but very bad, solutions. Reducing either the Threshold parameter
or the Effect % parameter will tend to reduce the amount of parsimony
pressure.

Advanced Crossover and Mutation

The Search Operators used in Discipulus are crossover, mutation, and
reproduction. You may control the basic levels of crossover, mutation
and reproduction as follows:

• For Genetic Programming, the overall crossover, mutation, and
reproduction rates may be set in the Genetic Programming Page of
the Advanced Options Window.

• These basic parameters are addressed elsewhere and will not be
detailed again here. See generally: Basic Genetic Programming
Parameters on page 151, and Advanced Genetic Programming Deme
Parameters on page 153.

The following topics address advanced crossover and mutation issues:

• Advanced Mutation Parameters on page 165;

* Block Mutation Rate on page 165;

* Instruction Mutation Rate on page 166;

* Instruction Data Mutation Rate on page 167; and

* Ratio of Constants/Inputs on page 167.

• Advanced Crossover Parameters – Homologous Crossover on
page 168.

Advanced Mutation Parameters

The following parameters affect the mutation operator in Genetic
Programming and Simulated Annealing.

Block Mutation Rate

Evolved programs in Discipulus maintain their instructions inside of
Instruction Blocks that are 32 bits in length. See Population, Program,
Instruction Block and Instruction Reference on page 211. The Block
Mutation Rate parameter sets what percentage of mutation operations
replace an entire Instruction Block with a new randomly generated
Instruction Block.

You set this parameter as follows:

• On the Set Up Learning Menu, click Advanced;

• Click on the Search Operators Tab (Figure 81);

• Fill in the Block Mutation Rate Box.

Figure 81. The Search Operators Tab

Instruction Mutation Rate

The Instruction Mutation Rate parameter sets the percent of mutation
operations that result in a single instruction being replaced by a new,
randomly chosen instruction of the same length. You set this parameter
as follows:

• On the Set Up Learning Menu, click Advanced;

• Click on the Search Operators Tab (Figure 81);

• Fill in the Instruction Mutation Rate Box.

Instruction Data Mutation Rate

This parameter determines the probability that the mutation operator
will change the temporary computation variable, input, or constant to
which an instruction refers, replacing it with another temporary
computation variable, input, or constant.

You set this parameter as follows:

• On the Set Up Learning Menu, click Advanced;

• Click on the Search Operators Tab (Figure 81);

• Fill in the Instruction Data Mutation Rate Box.

Ratio of Constants/Inputs
This parameter sets the relative weight accorded to constants and to
inputs during the initialization of the population and in the mutation
operator. A value greater than 50% results in a relatively larger use of
constants relative to inputs during evolution. A value less than 50%
results in the reverse.

You set this parameter as follows:

• On the Set Up Learning Menu, click Advanced;

• Click on the Instruction Tab (Figure 82);

• Fill in the Ratio of Constants/Inputs Box.

Figure 82. The Instruction Tab

Advanced Crossover Parameters – Homologous Crossover

Homologous crossover is an innovation that attempts to duplicate
natural evolution more closely than does traditional crossover. In
homologous crossover, the two evolved programs are lined up next to
each other. Crossover occurs by exchanging groups of contiguous
Instruction Blocks between the two evolved programs. The groups of
contiguous Instruction Blocks are chosen so that the groups from each
parent evolved program are the same length and are taken from the same
position in both of the two parent evolved programs.

Non-Homologous crossover occurs when Instruction Blocks are
exchanged between two evolved programs with no reference to the size
and location of the two sets of Instruction Blocks. (Non-homologous
crossover is the only method Discipulus has to change the length of
evolved programs. So you should rarely turn it off completely.)

The Homologous Crossover parameter sets the percentage of crossover
events that are "homologous" as opposed to the percentage that are
"non-homologous." We recommend a setting of 50% to 95% for most
problems.
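
The difference between the two kinds of crossover can be sketched in Python. This is an illustration only: programs are modeled as lists of Instruction Blocks, the function names are hypothetical, and the way the exchanged groups are chosen (uniformly at random, at least one block long) is an assumption.

```python
import random
from typing import List, Tuple

Program = List[int]  # each element stands for one Instruction Block


def homologous_crossover(
    p1: Program, p2: Program, rng: random.Random
) -> Tuple[Program, Program]:
    """Swap groups of the same length taken from the same position."""
    limit = min(len(p1), len(p2))
    start = rng.randrange(limit)
    end = rng.randrange(start + 1, limit + 1)
    child1 = p1[:start] + p2[start:end] + p1[end:]
    child2 = p2[:start] + p1[start:end] + p2[end:]
    return child1, child2          # both children keep their parent's length


def non_homologous_crossover(
    p1: Program, p2: Program, rng: random.Random
) -> Tuple[Program, Program]:
    """Swap groups whose sizes and positions are chosen independently."""
    s1 = rng.randrange(len(p1))
    e1 = rng.randrange(s1 + 1, len(p1) + 1)
    s2 = rng.randrange(len(p2))
    e2 = rng.randrange(s2 + 1, len(p2) + 1)
    child1 = p1[:s1] + p2[s2:e2] + p1[e1:]
    child2 = p2[:s2] + p1[s1:e1] + p2[e2:]
    return child1, child2          # child lengths may differ from the parents
```

In this sketch only the non-homologous operator can change program length, which mirrors the reason given above for rarely turning it off completely.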

You may set the value of the Homologous Crossover parameter as
follows:

• On the Set Up Learning Menu, click Options;

• On the Advanced Options Page, click Set;

• Click on the Search Operators Tab (Figure 81);

• Fill in the Homologous Crossover Box.

Program Size
There are two parameters that control the size of the programs you
evolve using Discipulus. Program Size parameters are measured in
bytes. They represent the length of the body of the programs in the
population. For more information about how Discipulus programs are
constructed, see Population, Program, Instruction Block and Instruction
Reference on page 211.

The following topics give additional information about Program Size:

• Initial Program Size on page 169;

• Maximum Program Size on page 170; and

• Maximum Program Size and Non-Homologous Crossover on
page 171.

Initial Program Size

This parameter (in bytes) sets the size of the programs in the first
population created by Discipulus at the start of a run. In other words,
Discipulus creates an initial population with programs that have lengths
chosen by a uniform pseudo-random distribution between the following
minimum and maximum sizes (Table 3):

Table 3.

Minimum     Maximum
4 bytes     Initial Program Size
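
The range in Table 3 can be illustrated with a short Python sketch of how initial program lengths might be drawn. The function name is hypothetical, and the sketch assumes that sizes are whole Instruction Blocks of 4 bytes (32 bits) each.

```python
import random
from typing import List

BLOCK_BYTES = 4  # one Instruction Block is 32 bits


def initial_program_sizes(
    population_size: int, initial_program_size: int, rng: random.Random
) -> List[int]:
    """Draw a size in bytes for each program in the first population.

    Sizes are uniform between 4 bytes and initial_program_size.
    Assumption: every size is a whole number of Instruction Blocks.
    """
    max_blocks = initial_program_size // BLOCK_BYTES
    if max_blocks < 1:
        raise ValueError("Initial Program Size must be at least 4 bytes")
    return [BLOCK_BYTES * rng.randint(1, max_blocks) for _ in range(population_size)]
```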

You may set this parameter as follows:

• On the Set Up Learning Menu, click Advanced;

• Click on the Program Size and Constants Tab (Figure 83);

• Fill in the Program Size-Initial Box.

Figure 83. The Program Size and Constants Tab

Maximum Program Size

This parameter sets the maximum length of the body of an evolved
program in the population (in bytes).

You may set this parameter as follows:

• On the Set Up Learning Menu, click Options;

• On the Advanced Options Page, click Set;

• Click on the Program Size and Constants Tab (Figure 83);

• Fill in the Program Size-Max Box either by typing a value into the
box or using the slider.

Maximum Program Size and Non-Homologous Crossover

Non-Homologous crossover causes programs to change in length. Thus,
it may cause a child created by the crossover event to be longer than
the Maximum Program Size parameter. If this happens, the crossover
operator is performed again, in the following manner, so that both of
the children are of permissible length:

• The smaller child of the first crossover is retained as one of the
children of the crossover.

• Crossover is performed a second time and the smaller child of this
second crossover is retained as the second child of the crossover.
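
The two steps above can be sketched in Python. The function and parameter names are hypothetical, programs are modeled as lists of Instruction Blocks, and the size limit is expressed in blocks rather than bytes for simplicity.

```python
from typing import Callable, List, Tuple

Program = List[int]  # each element stands for one Instruction Block


def size_limited_crossover(
    p1: Program,
    p2: Program,
    crossover: Callable[[Program, Program], Tuple[Program, Program]],
    max_blocks: int,
) -> Tuple[Program, Program]:
    """Apply crossover, repeating it once if a child is too long."""
    c1, c2 = crossover(p1, p2)
    if len(c1) <= max_blocks and len(c2) <= max_blocks:
        return c1, c2                      # both children are permissible
    first = min(c1, c2, key=len)           # keep the smaller child
    d1, d2 = crossover(p1, p2)             # perform crossover a second time
    second = min(d1, d2, key=len)          # keep its smaller child as well
    return first, second
```

If the crossover operator conserves the total number of blocks and both parents are within the limit, the smaller child holds at most half of the combined blocks and therefore cannot exceed the limit.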

Setting the Random Seed

You may set the Random Seed used in individual Discipulus runs either
by referencing the system clock or by entering it manually. You may
do so with the following menu and tab selections:

• On the Set Up Learning menu, click Advanced. The Advanced
Options Page pops up.

• On the Advanced Options Page, click Set. The Single Run Advanced
Options Page pops up;

• On the Single Run Advanced Options Page, click on the
Miscellaneous Tab:

* If you want to set the Random Seed for a run based on the system
clock, click on System Time.

* If you want to enter an explicit seed, do so here.

Measuring Fitness
The Genetic Programming algorithm uses a “fitness function” to
determine which evolved programs survive and reproduce. The fitness
function used depends on what type of problem you want to solve. So, if
you have two classes and want to classify rows into them, you would
use a classification fitness function. If you wanted to rank them
(common in CRM and credit scoring), you could use a ranking fitness
function. If you want to predict numeric outputs, you could use a
regression or function fitting fitness function.

Generally speaking, the better an evolved program models your training
data, the more fit it will be.

Not all fitness functions are available in all versions of Discipulus. The
advanced fitness function package comprises our new ranking and
logistic-regression fitness functions. They are available in the Enterprise
Plus version and as an upgrade to other versions of Discipulus.

The following topics regarding fitness functions may be found here:

In General:

• Choosing a Problem Type on page 174

• Default Fitness Functions on page 176

• Accessing Fitness Measurement Parameters after the Project Wizard
is Complete on page 176

• Fitness Function Overview on page 177

For Regression Problem Types

• Fitness Measures for Regression Problems on page 177

For Classification Problem Types

• Fitness Measure for Classification Problems on page 178

• How the Hits-then-Error Fitness Function Works on page 181

For Ranking Problem Types

• Fitness Measures for Ranking Problem Types on page 183

• The Four Ranking Fitness Functions on page 184

• Best ROC Curve (Compare) Fitness Function for Ranking Problems
on page 184

• Best ROC Curve then Cost Fitness Function for Ranking Problems
on page 185

• Minimum Cost Fitness Function for Ranking Problems on page 185

For Logistic Regression Problem Types

• Fitness Measure for Logistic-Regression Binary Target Output
Problems on page 186

For Custom Fitness Functions

• Custom Fitness Functions on page 187

Choosing a Problem Type

In most circumstances, Discipulus detects your problem type and
automatically configures itself to handle your problem appropriately.
You can, however set the problem type directly, if you choose.
Discipulus is set up to handle two different types of problems:

• Binary Target Output Problems. Where your data has target
outputs that have only two values, you have three choices (depending
on version) for a fitness function. They are:

* Classification. See Fitness Measure for Classification Problems
on page 178;

* Ranking. There are four ranking fitness functions: (1) Best ROC
Curve Fitness Function for Ranking Problems on page 184; (2)
Best ROC Curve (Compare) Fitness Function for Ranking
Problems on page 184; (3) Best ROC Curve then Cost Fitness
Function for Ranking Problems on page 185; and (4) Minimum
Cost Fitness Function for Ranking Problems on page 185; and

* Logistic-Regression. Use logistic-regression when it is important
that you produce a true probabilistic prediction. Logistic-Regression
produces evolved programs and teams that optimize
the maximum likelihood estimator of probability, given binary
targets.

• Function Fitting (or Regression) Problems. In function fitting
problems, the learning task is to fit the actual values of the Target
Outputs in the data files (see Fitness Measures for Regression
Problems on page 177). Most problems where the target is numeric
(many values) may be approached with the function fitting
capabilities of Discipulus. There are two fitness functions available in
Regression type problems: (1) Mean Squared Error; and (2) Absolute
Error. See Fitness Measures for Regression Problems on page 177.

You can choose your problem type and fitness function in two ways:

From the main menu:

• On the Set Up Learning menu, click Problem Category;

• On the menu that appears, click Fitness Function.

The four problem types appear plus "Custom Fitness Functions." Select
the appropriate one for your problem and then follow the remaining
menus that open to configure that fitness function as you see fit.

From the Project Wizard:

In the Project Wizard, the "Select Problem Type and Fitness Function"
window appears after you have imported data. First, click on the
appropriate problem type in the "Select Problem Type" box. All
available fitness functions appear in the "Select Preset Fitness Function"
box. Select the one you prefer and set any necessary parameters for that
fitness function in the area to the right of that box. Discipulus will only
let you change parameters that are appropriate for your selected fitness
function.

Default Fitness Functions

The easiest way to choose a fitness function is to let Discipulus do it for
you. Depending on the problem type for a project (function fitting or
binary target outputs), Discipulus automatically selects an appropriate
fitness function. (See Fitness Measures for Regression Problems on
page 177.) Discipulus automatically sets the fitness function as follows,
when you select a problem type:

• For function fitting problems, Discipulus uses the squared error
measurement for the fitness functions described below (see Fitness
Measures for Regression Problems on page 177);

• For classification problems, Discipulus uses the “Hits-then-Error”
fitness measure described below (see How the Hits-then-Error
Fitness Function Works on page 181);

• For ranking problems, Discipulus uses the "Best ROC Curve" fitness
measure described below (see Best ROC Curve Fitness Function for
Ranking Problems on page 184).

As you will see below, Discipulus gives you a good deal of power to
adjust and modify these default settings for fitness functions. Please see
Using the Project Setup Wizard on page 36.

Accessing Fitness Measurement Parameters after the Project Wizard
is Complete

You will normally set problem type, fitness function and the parameters
of the fitness function in the Project Wizard. However, you may also
access them after you have finished with the Project Wizard.

To do so:

• Click "Set Up Learning" on the Main Menu. Then select "Fitness
Function."

• A menu will open up that shows all problem types. Select a problem
type.

• A new menu will open up showing all available fitness functions for
that problem type. Select one.

• If, and only if, there are parameters to be set for that fitness function,
a window will open up and you can set the parameters there. If no
window opens up, you are done.

Fitness Function Overview

If you understand the following five points, you will understand
Discipulus fitness functions.

1. Discipulus makes different fitness functions available for function
fitting problems (regression) than it does for binary target output
problems (ranking, classification, and logistic-regression).

2. You may choose between two different fitness functions for
function fitting (regression) problems – Absolute Error or "Min
Squared Error." (See Fitness Measures for Regression Problems
on page 177.)

3. When you solve classification problems, Discipulus uses the
Hits-then-Error approach. In this approach, you may assign
different weights (or costs) to the positive and negative examples
in the training set (see Assigning Different Weights to Positive and
Negative Examples for Hits-then-Error Fitness Functions on
page 182).

4. You may choose between four different fitness functions for
ranking problems. See Fitness Measures for Ranking Problem
Types on page 183.

5. The Logistic-Regression fitness function works for target outputs
that are binary (two-valued). It produces a true probability
estimate for each row of data.

The next sections describe, respectively, the fitness function used for
function fitting problems and the different fitness functions used for
classification problems.

Fitness Measures for Regression Problems

Generally speaking, Discipulus calculates the fitness of evolved
programs by determining how closely the predicted outputs of the
evolved program match your target output in the training data. The
closer the match, the more fit the evolved program.

For function fitting (regression) problems, Discipulus calculates the
amount of the raw error made by the evolved program for each example
in the data set. Then it averages those errors over the whole set to
calculate the overall fitness of the evolved program.

The raw error for each training example is the difference between the
output of an evolved program and the target output from your training
data file. (See Training, Validation, and Applied Data on page 205.)

Discipulus has two ways to average the raw error – it uses either
“absolute” or “squared” error measurements. The squared error method
is the default method in Discipulus. Here is a description of the two
different ways in which Discipulus implements the measurement of
errors:

• Absolute Error Measurement. Discipulus calculates the fitness of
an evolved program by taking the average of the absolute value of the
raw errors over the examples in the data set.

• Minimum Squared Error Measurement. With a squared error
function, Discipulus calculates the fitness of an evolved program by
taking the average of the squared raw errors over the examples in the
data set.
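
The two error measurements described above can be written out in a few lines of Python. The function names are hypothetical; the calculations follow the descriptions in the text (average of absolute raw errors, and average of squared raw errors).

```python
from typing import Sequence


def absolute_error_fitness(outputs: Sequence[float], targets: Sequence[float]) -> float:
    """Average of the absolute raw errors over the data set."""
    return sum(abs(o - t) for o, t in zip(outputs, targets)) / len(targets)


def squared_error_fitness(outputs: Sequence[float], targets: Sequence[float]) -> float:
    """Average of the squared raw errors over the data set."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(targets)
```

Because the raw errors are squared, the squared error measurement penalizes a few large errors more heavily than the absolute error measurement does.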

You may choose between Absolute Error and Minimum Squared Error
in the Project Wizard or from the Main Menu as described in Accessing
Fitness Measurement Parameters after the Project Wizard is Complete
on page 176.

Fitness Measure for Classification Problems

Generally speaking, Discipulus calculates the fitness of evolved
programs by determining how closely the outputs of the evolved
program and the target outputs in the training data match up. The closer
the match, the fitter the evolved program.

But unlike function fitting problems, classification problems are hard to
solve well using a simple error-based measurement of the closeness of
fit. The reason for this is simple – you are not looking for a “function”
when you run classification problems. Instead, you are looking for high
accuracy of classification – that is, a high “hit-rate.”

The following topics contain additional information about the ins-and-outs
of classification fitness functions:

• Classification Problems: How Discipulus Classifies Evolved Program
Outputs. Setting the Threshold on page 179;

• Classification Problems: How to Handle Problems with Three or
More Classes on page 180;

• Classification Problems: Hit-Rates Defined on page 180;

• Classification Problems: Reporting of Overall, Positive, and
Negative Hit-Rates on page 181; and

• How the Hits-then-Error Fitness Function Works on page 181.

Classification Problems: How Discipulus Classifies Evolved Program Outputs. Setting the Threshold

Discipulus will evolve programs that classify training, validation, and
applied data into two classes – class zero and class one. You determine
the categories to which class zero and class one correspond. But when
you set up your data files, the outputs in your training, validation, and
applied data sets should be two distinct values that correspond
respectively to classes zero and one.

Table 4 shows various combinations of values that you may use for the
target outputs for the two classes.

Table 4. Target Output Values

Output Value       Output Value       Suggested
For Class Zero     For Class One      Threshold
0.0                1.0                0.5
–1.0               1.0                0.0
10.0               30.0               20.0
17                 18                 17.5

Discipulus classifies the output of a program as a class zero or a class
one using a threshold. Discipulus automatically sets the threshold
halfway between the target outputs for classes zero and one, as shown in
Table 4. Generally, you should not change that setting.

Discipulus uses the threshold to classify the outputs of evolved program
models as follows:

• If an evolved program’s output for a training example is greater than
or equal to the classification threshold, that output is counted as a
class one output.

• If the output of an evolved program for a training example is less
than the classification threshold, that output is counted as a class zero
output.
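
The threshold rule above can be sketched in a few lines of C. The function names are hypothetical, and the sketch assumes, as in every row of Table 4, that the class one target value is the larger of the two.

```c
#include <assert.h>

/* Threshold halfway between the two target output values. */
double midpoint_threshold(double class_zero_value, double class_one_value)
{
    assert(class_zero_value != class_one_value);
    return (class_zero_value + class_one_value) / 2.0;
}

/* Returns 1 (class one) when output >= threshold, otherwise 0 (class zero). */
int classify_output(double output, double threshold)
{
    return output >= threshold ? 1 : 0;
}
```

Note that an output exactly equal to the threshold is counted as class one.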

Classification Problems: How to Handle Problems with Three or More Classes

You may encounter problems where a data set must be classified into
more than two classes. For example, suppose you want to classify
objects from satellite images as trucks, cars, or motorcycles. The best
way to solve such a problem is to decompose it into three separate
classification problems as follows:

• Car vs. not a Car;

• Truck vs. not a Truck; and

• Motorcycle vs. not a Motorcycle.

Then do three separate projects and make the classification based on the
results of the three separate evolved programs. Decomposing the
problem in this manner usually results in much better classification.

Classification Problems: Hit-Rates Defined

Hit-Rate Defined. The “hit-rate” is the percentage of all examples
(training, validation, or applied) that are correctly classified. So, for
example, if an evolved program correctly classifies 45 out of 50 training
examples, the “hit-rate” for the training data set is 90%.

Positive and Negative Hit-Rates Defined. Classification problems are
often posed as positive examples (class one) and negative examples
(class zero). For example, in a medical application, the positive
examples may be instances where a patient had diabetes. The negative
examples would be patients for whom there was no diagnosis of diabetes.
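
As a rough illustration, the hit-rates can be computed as in the C sketch below. The overall hit-rate follows the definition given above. The positive and negative hit-rates are read here as the percentage of positive (respectively negative) examples classified correctly, which is an assumption. The struct and function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    double overall;   /* percent of all examples classified correctly      */
    double positive;  /* percent of class one examples classified correctly  */
    double negative;  /* percent of class zero examples classified correctly */
} HitRates;

/* predicted[i] and target[i] are class labels: 1 (positive) or 0 (negative). */
HitRates compute_hit_rates(const int *predicted, const int *target, size_t n)
{
    assert(n > 0);
    size_t hits = 0, pos = 0, pos_hits = 0, neg = 0, neg_hits = 0;
    for (size_t i = 0; i < n; ++i) {
        size_t hit = (predicted[i] == target[i]) ? 1u : 0u;
        hits += hit;
        if (target[i] == 1) { ++pos; pos_hits += hit; }
        else                { ++neg; neg_hits += hit; }
    }
    HitRates r;
    r.overall  = 100.0 * (double)hits / (double)n;
    /* A class with no examples is reported as 0 here (a choice made for the sketch). */
    r.positive = pos ? 100.0 * (double)pos_hits / (double)pos : 0.0;
    r.negative = neg ? 100.0 * (double)neg_hits / (double)neg : 0.0;
    return r;
}
```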

Classification Problems: Reporting of Overall, Positive, and Negative Hit-Rates

You will often find it useful to know how well evolved classification
programs perform overall and separately on the positive and negative
examples. Discipulus gives you that information in a number of places –
for example, the Monitor Project Window shows that information for
the project on the Overview tab. Discipulus uses the Hits-then-Error
approach to calculate fitness for classification problems by default.

You may set that parameter in the Select Problem Type and Fitness
Function window of the Project Wizard or from the Main Menu as
described in Accessing Fitness Measurement Parameters after the
Project Wizard is Complete on page 176.

The following topics contain more information about the hits-then-error
fitness function for classification problems:

• How the Hits-then-Error Fitness Function Works on page 181;

• How Discipulus Determines if Two Evolved Programs are Tied in the
Hits-then-Error Fitness Function on page 182; and

• Assigning Different Weights to Positive and Negative Examples for
Hits-then-Error Fitness Functions on page 182.

How the Hits-then-Error Fitness Function Works

Discipulus evolves programs by conducting tournaments in which two
evolved programs compete against one another. In each tournament, the
“fitter” program wins the tournament.

When you enable the Hits-then-Error fitness function, Discipulus
determines the winner of these tournaments by the following steps:

1. Discipulus first calculates the weighted hit-rate of both evolved
programs. For more information about weighted hit-rates, see
Assigning Different Weights to Positive and Negative Examples
for Hits-then-Error Fitness Functions on page 182;

2. If the two programs are “tied” in their hit-rates, Discipulus
proceeds to step three. (See How Discipulus Determines if Two
Evolved Programs are Tied in the Hits-then-Error Fitness
Function on page 182.) If they are not tied, Discipulus skips step
three; the program with the higher hit-rate is chosen as the winner
of the tournament;

3. Discipulus calculates the “error” of the outputs of the two
programs as described in the section entitled Fitness Measures for
Regression Problems on page 177. The winner of the tournament
is the program with the lower error measurement. If the errors of
the two programs are identical, Discipulus selects the winner
randomly.

How Discipulus Determines if Two Evolved Programs are Tied in the Hits-then-Error Fitness Function

Discipulus allows you to set a range to determine if two programs are
tied in their hit-rates. You do that by setting the “Tied Tournament
Threshold %” parameter.
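
The Hits-then-Error tournament can be sketched in C as below. This is a hedged illustration only. The manual does not spell out how the tied range is applied, so the sketch assumes two programs are tied when their weighted hit-rates differ by no more than the Tied Tournament Threshold, expressed in percentage points. All type and function names are hypothetical.

```c
#include <assert.h>
#include <math.h>

typedef enum { WINNER_FIRST, WINNER_SECOND, WINNER_RANDOM } TournamentResult;

typedef struct {
    double weighted_hit_rate; /* percent; higher is better */
    double error;             /* regression-style error; lower is better */
} ProgramScore;

TournamentResult hits_then_error(ProgramScore a, ProgramScore b,
                                 double tie_threshold_pct)
{
    assert(tie_threshold_pct >= 0.0);

    /* Steps 1 and 2: compare weighted hit-rates unless they are "tied". */
    if (fabs(a.weighted_hit_rate - b.weighted_hit_rate) > tie_threshold_pct)
        return a.weighted_hit_rate > b.weighted_hit_rate ? WINNER_FIRST
                                                         : WINNER_SECOND;

    /* Step 3: hit-rates are tied, so the lower error wins. */
    if (a.error < b.error) return WINNER_FIRST;
    if (b.error < a.error) return WINNER_SECOND;

    /* Identical errors: the caller should pick a winner at random. */
    return WINNER_RANDOM;
}
```

Returning WINNER_RANDOM instead of drawing a random number inside the function keeps the sketch deterministic and easy to check by hand.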

Assigning Different Weights to Positive and Negative Examples for Hits-then-Error Fitness Functions

It is often useful to assign different weights to positive and negative
examples in the training set. Two examples of such situations are:

• Where the cost of a false positive is different from the cost of a
false negative. For example, suppose the classification problem was
a preliminary screening to diagnose cancer. Most patients would
probably agree that the “cost” of an incorrect diagnosis that the
patient does not have cancer is very high because they do not get
treated. On the other hand, an incorrect diagnosis that the patient has
cancer just leads to further testing.

• Where the number of positive examples and negative examples
are very unbalanced in the training data. For example, suppose the
number of diagnoses of cancer in the training data comprised only
five percent of all training cases. In this case, it is easy for
computerized learning systems to get stuck classifying everything as
being not cancerous.¹

Discipulus allows you to assign different weights to positive and
negative examples. You may set that parameter in the Select Problem
Type and Fitness Function window of the Project Wizard.

Discipulus uses the weights you set in the fitness function as follows.
Let $\text{weight}_{\text{neg}}$ be the weight you assign to negative examples. Let
$\text{weight}_{\text{pos}}$ be the weight you assign to positive examples. Let
$\text{Hit-Rate}_{\text{neg}}$ be the hit-rate for negative examples and let
$\text{Hit-Rate}_{\text{pos}}$ be the hit-rate for positive examples. Then the overall
hit-rate for the purpose of calculating fitness is determined as follows:

$$\text{Hit-Rate}_{\text{weighted}} = (\text{Hit-Rate}_{\text{pos}} \times \text{weight}_{\text{pos}}) + (\text{Hit-Rate}_{\text{neg}} \times \text{weight}_{\text{neg}})$$
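
A direct C transcription of this formula is shown below as a minimal sketch. The function name is hypothetical, and the manual does not say whether Discipulus normalizes the weights, so none is applied here.

```c
#include <assert.h>

/* Weighted hit-rate exactly as in the formula above. */
double weighted_hit_rate(double hit_rate_pos, double weight_pos,
                         double hit_rate_neg, double weight_neg)
{
    assert(weight_pos >= 0.0 && weight_neg >= 0.0);
    return hit_rate_pos * weight_pos + hit_rate_neg * weight_neg;
}
```

For instance, with a positive hit-rate of 80, a negative hit-rate of 60, and both weights set to 0.5, the weighted hit-rate is 40 + 30 = 70. Raising the positive weight relative to the negative weight makes mistakes on positive examples count for more in the tournament.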

Fitness Measures for Ranking Problem Types

Many ranking problems are mistakenly run as classification problems.
In fact, classification is quite different from ranking. The goal in a
ranking problem is to rank rows of data in terms of the likelihood that
they are a class one. In other words, the goal is to put as many class
ones high in the ranking as possible.

Ranking problems are frequently encountered in customer relationship
management, credit scoring, financial portfolio optimization and the
like. For example, mortgage loan applications can be ranked by likely
profitability. The lender can then keep making loans to the best ranked
borrowers until the cost of lending exceeds the expected profitability.

The most common statistical measure of the quality of a binary ranking
is the area under the ROC curve generated by the ranking ("AUC").

¹ If you are using Dynamic Subset Selection (DSS) with the difficulty parameter set to
at least 50%, you probably will not need to adjust for the difference between the
number of positive and negative examples. (See Dynamic Subset Selection on
page 158.)

Another way to measure the quality of a ranking is the Cost of the
ranking, given a selection threshold along the ranking. Everything above
the threshold is a Class One. Everything below is a Class Zero. Given a
cost for a false negative and a false positive, every threshold for every
ranking has a cost.

Our four new ranking fitness functions blend Area under the Curve of a
ROC curve and minimum cost to provide four innovative fitness
functions. With these fitness functions, you can harness Genetic
Programming to solve the exact problem you have, instead of trying to
shoehorn classification or regression to do ranking.

The Four Ranking Fitness Functions

Best ROC Curve Fitness Function for Ranking Problems

This is the default fitness function for ranking problems. It
computes the area under the curve of the ROC curve ("AUC") defined
by the outputs of an evolved program. The formal fitness function is:
1 - AUC.

There are no parameters required for this fitness function.
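
To make the 1 - AUC fitness concrete, here is a small C sketch. It uses the standard pairwise definition of AUC: the fraction of (class one, class zero) pairs in which the class one row receives the higher output, with ties counted as half. The function names are hypothetical, and how Discipulus computes AUC internally is not stated in this manual.

```c
#include <assert.h>
#include <stddef.h>

/* AUC by direct pairwise comparison; O(n^2), fine for illustration. */
double roc_auc(const double *outputs, const int *targets, size_t n)
{
    double wins = 0.0;
    size_t pairs = 0;
    for (size_t i = 0; i < n; ++i) {
        if (targets[i] != 1) continue;          /* i ranges over class one rows  */
        for (size_t j = 0; j < n; ++j) {
            if (targets[j] != 0) continue;      /* j ranges over class zero rows */
            ++pairs;
            if (outputs[i] > outputs[j])       wins += 1.0;
            else if (outputs[i] == outputs[j]) wins += 0.5;
        }
    }
    assert(pairs > 0); /* needs at least one row of each class */
    return wins / (double)pairs;
}

/* Fitness is smaller for better rankings: 1 - AUC. */
double best_roc_fitness(const double *outputs, const int *targets, size_t n)
{
    return 1.0 - roc_auc(outputs, targets, n);
}
```

A program that ranks every class one row above every class zero row gets a fitness of 0. A program whose ranking is no better than chance gets a fitness near 0.5.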

Best ROC Curve (Compare) Fitness Function for Ranking Problems

This fitness function operates on two programs that are in a tournament.
The task is to determine which program should win the tournament and
proceed to crossover and mutation. That determination is made at a
confidence level that you set, say, 95% confident for this example. The
decision metric is:

1. If we are 95% confident that Program One has a better ROC curve
AUC than Program Two, Program One wins the tournament.

2. If we are 95% confident that Program Two has a better ROC curve
AUC than Program One, Program Two wins the tournament.

3. If neither of the above is true, we randomly pick Program One or
Program Two as the tournament winner.

The only parameter for this fitness function is the Confidence Level.
You may set this parameter in the Select Problem Type and Fitness
Function window of the Project Wizard or from the Main Menu as
described in Accessing Fitness Measurement Parameters after the
Project Wizard is Complete on page 176.

Best ROC Curve then Cost Fitness Function for Ranking Problems

1. If we are 95% confident that the ranking generated by Program
One has a better ROC curve AUC than the ranking generated by
Program Two, Program One wins the tournament.

2. If we are 95% confident that the ranking generated by Program
Two has a better ROC curve AUC than the ranking generated by
Program One, Program Two wins the tournament.

3. If neither of the above is true, we calculate the Cost for each of
the programs at every decision threshold along the ranking, given
the Cost of a False Negative and the Cost of a False Positive. The
program that has the lowest cost at any threshold is the tournament
winner.

The parameters required for this fitness function are:

• Cost of a False Negative;

• Cost of a False Positive;

• Confidence level for ROC curve AUC comparison.

You may set these parameters in the Select Problem Type and Fitness
Function window of the Project Wizard or from the Main Menu as
described in Accessing Fitness Measurement Parameters after the
Project Wizard is Complete on page 176.

Minimum Cost Fitness Function for Ranking Problems

The Minimum Cost fitness function ranks the rows of data in the order
of the output of the program being evaluated. Given that ranking, and
given the Cost of a False Negative and the Cost of a False Positive,
Discipulus calculates the Cost at each decision threshold in the ranking.
The fitness is the minimum cost across all decision thresholds.

The parameters required for this fitness function are:

• Cost of a False Negative; and

• Cost of a False Positive.
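
The following C sketch shows one way the Minimum Cost calculation could work. It rests on assumptions the manual does not state: the cost at a threshold is taken to be (number of false negatives × Cost of a False Negative) + (number of false positives × Cost of a False Positive), and every position in the ranking, including "above everything" and "below everything", is treated as a candidate threshold. The Row type and function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    double output; /* output of the evolved program for this row */
    int    target; /* 1 = class one, 0 = class zero */
} Row;

/* qsort comparator: highest output first. */
static int by_output_desc(const void *a, const void *b)
{
    double x = ((const Row *)a)->output;
    double y = ((const Row *)b)->output;
    return (x < y) - (x > y);
}

/* Sorts rows in place, then sweeps the threshold down the ranking. */
double minimum_cost(Row *rows, size_t n, double cost_fn, double cost_fp)
{
    assert(n > 0);
    qsort(rows, n, sizeof rows[0], by_output_desc);

    /* Threshold above every row: all class one rows are false negatives. */
    size_t fn = 0, fp = 0;
    for (size_t i = 0; i < n; ++i)
        if (rows[i].target == 1) ++fn;
    double best = cost_fn * (double)fn;

    /* Move the threshold below one more row at a time. */
    for (size_t k = 0; k < n; ++k) {
        if (rows[k].target == 1) --fn; /* now correctly above the threshold */
        else                     ++fp; /* class zero row now above it       */
        double cost = cost_fn * (double)fn + cost_fp * (double)fp;
        if (cost < best) best = cost;
    }
    return best;
}
```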

Fitness Measure for Logistic-Regression Binary Target Output Problems

The logistic-regression fitness for a program is computed with the
following steps:

The outputs of the program are transformed using the logistic transform
for each row:

We then interpret P for class one as a probability and calculate the
log-likelihood for the output of the program, given the target outputs.

The formal fitness function used is -2*Log-Likelihood summed over all
rows of data on which fitness is being computed. Like traditional
logistic regression, this generates a true, maximum likelihood
probability estimate for each row of data.

These probabilities are output to the three Probability of Class One rows
in the Data Window.

There are no parameters to set for this fitness function.

Custom Fitness Functions

The Enterprise and Enterprise Plus versions of Discipulus permit you to
write your own fitness function. You do that by creating a dynamic-link
library (DLL), which you may call from Discipulus.

Although a full treatment of custom fitness functions is beyond the scope
of this Manual, materials are available for those who wish to use this
advanced feature. Those materials are found in the Custom Fitness
Function folder installed with any version that has the Custom Fitness
Function capability in a file called:

Discipulus_Custom_Fitness_Functions_Interface.pdf

This section summarizes the capabilities of the Discipulus custom
fitness functions feature and how to obtain full-featured Custom Fitness
Function capabilities.

There are two types of custom fitness functions in Discipulus: Vector
Style and Pointer Style custom fitness functions.

Vector Style Custom Fitness Functions. In the Vector Style custom
fitness function, Discipulus passes two vectors to your DLL. The first
one contains the outputs of the evolved program for which fitness is
being calculated. The second one contains the target outputs from your
data file.

Using these two vectors, you calculate a fitness for the evolved program
in your DLL. Then you return that calculated fitness to Discipulus.
Fitness must be smaller as the program gets better.

Pointer Style Custom Fitness Functions. In the Pointer Style custom
fitness function, Discipulus passes a pointer to the evolved program to
your DLL. Your DLL can execute the evolved program as machine code
using that pointer. Your DLL is entirely responsible for handling all
training and validation data and must calculate and return a fitness for
both the training and validation sets.

This fitness function lets you interact with real systems and perform
other specialized fitness measures not possible if Discipulus handles
your data.

Parameters in Your Custom DLL. You may have custom parameters
in your DLL. Your DLL may publish those parameters, and the
Discipulus GUI will then let you set them. Discipulus also
stores the parameter settings you choose in Discipulus project files.

Function And Terminal Sets

NOTE: Discipulus projects start with a standard setting for the function
and terminal set that works quite well without any changes by the user.
Accordingly, this chapter should be regarded as an advanced subject
that most users never need to consider.

Discipulus builds programs out of small bits of code, each of which
performs a simple operation – you could think of these simple operations
as the atoms that Discipulus assembles into programs. In Automatic
Programming, these atoms are made up of the function set and the
terminal set. Two examples should clarify this point. Here is a line of
code from a program Discipulus has evolved:

f[0] -= Input001;

This "atom" of code performs one simple operation – it subtracts the
value of the first input from your data file from the First Computation
Variable and then places that difference back into the First Computation
Variable. The computation variable and the first input are parts of the
terminal set. The subtraction operator is part of the function set.

Alternatively, the above line of code might instead perform a subtraction
operation with a constant as follows:

f[0] -= 3.12345;

In this example, the computation variable and the constant are terminals
while subtraction is the operator from the function set.

Many other combinations of operators (from the function set) and
inputs, constants and computation variables (from the terminal set) are
possible. Each of these combinations is an "atom" that Discipulus can
assemble with other "atoms" into a program.

The following topics contain information about how to control which
operators are in the function set, and which constants, variables and
inputs are in the terminal set.

• Function and Terminal Sets Defined on page 190;

• Choosing the Terminal Set on page 192;

• Weighting the Terminal Set on page 196;

• Choosing the Function Set on page 197; and

• Weighting the Function Set on page 198.

Function and Terminal Sets Defined

The Function Set Defined. The function set for a run consists of
all operators that you configure Discipulus to use in evolving programs.
Examples of operators you can include in the function set are:

• Addition;

• Subtraction;

• Square Root;

• Absolute Value.

There are many other operators available in Discipulus.

You can control the function set in your runs by using the Instruction
Set Box on the Instruction Page of the Advanced Options Window to
designate the function set in great detail. See: Choosing the Function Set
on page 197 and Weighting the Function Set on page 198.

The Terminal Set Defined. By itself, an operator from the function set
(such as addition) is useless. An addition operator must have values to
add together and a place to put the sum. The function set operators,
therefore, cannot act alone. They must have values to operate upon.

The terminal set for a run is made up of the values on which the
function set operates. For example, it includes the values that the
addition operator adds together.

You will use four different types of terminals in Discipulus programs:

Inputs (Example: Input001, Input002. . .). These are just the inputs from
your data file. Discipulus calls the first column input from your data file
Input001; it calls the second Input002, and so forth. Note: if you import
data to Discipulus using Notitia, Discipulus will use column names from
your files instead of "Input001." Here, we use the convention that those
input columns have been named by Discipulus as "Input001" etc.

You do not need to do anything to configure the inputs as part of your
terminal set. All inputs from your data file are automatically available to
Discipulus as terminals.

Constants (Example: 9.35452664). You may make constants available to
Discipulus during evolution. Having constants available usually helps
Discipulus evolve good solutions.

You control how many and which constants are available from the
Program Size and Constants Page of the Advanced Options Window.
See The Terminal Set: Configuring Constants on page 192.

Temporary Computation Variables (Example: f[0], f[1]...). Like
human programmers, Discipulus uses temporary computation variables
to store values short term, while it performs calculations. You may
control how many temporary computation variables are available to
Discipulus from the Instructions Page of the Advanced Options
Window. See The Terminal Set. Configuring Temporary Computation
Variables on page 195.

Conditional Flags (Example: cflag). Discipulus has the ability to evolve
if-then type structures by using the Comparison and Conditional
Branching and Conditional Move type instructions from the function
set. If you include such instructions in the function set, you will see that
your evolved programs use a "cflag" (conditional flag) variable. Cflag
variables hold true or false values depending on the result of a
Comparison operator. The Conditional Branching operators then read
the value in cflag to determine what step to take.

If you use Comparison and Conditional Branching Operators in the
function set, the proper use of the cflag variable is configured for you
automatically by Discipulus.

If you use conditional jump instructions, no Java decompilation will be
available for the evolved programs. For this reason, only the conditional
move instructions are included in the default instruction set used by
Discipulus. You will find that conditional move instructions are
sufficient for almost all applications.

Choosing the Terminal Set

Discipulus configures two types of "terminals" automatically – inputs
and cflag terminals. You do not have to worry about these elements of
the terminal set. But you can configure the other types of terminals –
constants and temporary computation variables.

The following topics contain more information about the configuration
of constants and temporary computation variables:

• The Terminal Set: Configuring Constants on page 192;

• The Terminal Set. Configuring Temporary Computation Variables on
page 195.

The Terminal Set: Configuring Constants

Discipulus automatically sets up your constants for you. Thus, this
capability is an advanced option that rarely needs to be used.

Giving Discipulus a range of constants in the terminal set to use in
evolution often helps it evolve good solutions. You may input your own
constants, let Discipulus generate constants for you, or combine the two
approaches. You control the constants in a run from the Program Size
and Constants Page of the Advanced Options Window. You get to that
window as follows:

• From the Set Up Learning menu, click Options;

• On the Advanced Options Window, click Set;

• The Single Run Advanced Options Window appears. Click the
Program Size and Constants Tab (Figure 84).

You are now on the Program Size and Constants Tab, which is shown in
Figure 84. All configuration of constants is done from here.

Figure 84. The Program Size and Constants Tab

The Constant List Box displays the constants that Discipulus will use as
terminals in the next run. The Constant List Box appears to the left of
the Randomize Constants Box on the Program Size and Constants Page.
There are three ways to change the constants in the terminal set:

• Input your own constants manually (see Configuring Constants
before a Run -- Input Your Own Constants on page 193); or

• Let Discipulus generate random constants for you (see Configuring
Constants before a Run -- Let Discipulus Create Constants For You
on page 194); or

• Combine approaches 1 and 2 (see Configuring Constants before a Run --
Combining Two Approaches to Creating Constants on page 194).

Configuring Constants before a Run -- Input Your Own Constants

You may add a constant to the Constant List Box by typing that constant
into the Edited Value Box and clicking on the Add Button in the
Program Size and Constants Tab (Figure 84).

You may remove a constant from the Constant List Box by highlighting
the constant in the Constant List Box and clicking on the Remove
Button in the Program Size and Constants Tab (Figure 84).

You may edit an existing constant by performing the following three
steps in the Program Size and Constants Tab (Figure 84):

1. Remove it from the Constant List Box;

2. Change its value in the Edited Value Box; and

3. Click on the Add Button to return the edited constant to the
Constant List Box.

Configuring Constants before a Run -- Let Discipulus Create Constants For You

Alternatively, you may have Discipulus create constants for you. To do
so, perform the following steps in the Program Size and Constants Tab
(Figure 84):

1. Enter the number of constants you want Discipulus to create in the
Amount Box.

2. Enter the maximum and minimum constant values in the boxes
with those names.

3. Click on the Randomize Button.

Discipulus will create that number of random constants in the assigned
range. The constants will be distributed uniformly.
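
What the Randomize Button does can be pictured with the C sketch below. It draws the requested number of constants uniformly between the minimum and maximum. It is only an illustration: the function name and the seed parameter are invented for the example, and the standard library rand() stands in for whatever generator Discipulus actually uses.

```c
#include <assert.h>
#include <stdlib.h>

/* Fills constants[0..amount-1] with uniform random values in [minimum, maximum]. */
void randomize_constants(double *constants, size_t amount,
                         double minimum, double maximum, unsigned seed)
{
    assert(minimum <= maximum);
    srand(seed);
    for (size_t i = 0; i < amount; ++i) {
        double u = (double)rand() / (double)RAND_MAX; /* uniform in [0, 1] */
        constants[i] = minimum + u * (maximum - minimum);
    }
}
```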

Configuring Constants before a Run -- Combining Two Approaches to Creating Constants

You may have Discipulus create some constants for you and then add
your own constants. You would do that as follows:

1. Create automatic randomized constants (see Configuring
Constants before a Run -- Let Discipulus Create Constants For
You on page 194); and then

2. Add one or more constants manually (see Configuring Constants
before a Run -- Input Your Own Constants on page 193).

The number of constants in the Constant List Box may not be greater
than sixty-four minus the number of inputs in your training data set.

The Terminal Set. Configuring Temporary Computation Variables

Like human programmers, Discipulus uses temporary computation
variables to store values while it performs calculations. The f[0] terms
in the programs Discipulus evolves are the temporary computation
variables.

You can configure Discipulus to use up to eight computation variables.
Depending on how many you choose, these variables will be named
f[0], f[1], etc. (all the way up to f[7]) in the C programs produced by
Discipulus. (For the technically minded, the temporary computations are
performed in the eight Intel Floating Point Unit registers. The f[..] terms
correspond to those eight registers.) Here is how to set the number of
temporary computation variables:

• On the Set Up Learning menu, click Options;

• On the Advanced Options Window, click Set;

• The Single Run Advanced Options Window pops up. Click on the
Instructions Tab;

• Input the number of temporary computation variables you want into
the "Maximum Number Of FPU Registers Parameter" box (or use the
slider).

A few more points about the temporary computation variables may be of
interest:

Less Is More. It may be tempting to use all eight computation variables.
We have found that this is rarely useful and that it often interferes with
learning. Instead, start with one or two computation variables.

The Dual Role of the f[0] Computation Variable. The f[0] computation
variable is special in Discipulus because it has two roles. It is, of course,
a temporary computation variable. But when an evolved program is
finished executing, the value then in f[0] is also treated by Discipulus as
the output of the program for testing fitness.

Positioning. Temporary Computation Variables (f[..] terms) may appear
on both sides of a line of code. Here is a sample line from a program
evolved by Discipulus:

l13: f[0]*=f[0];

This line of code takes the value in the temporary computation variable,
f[0], squares it and places the result back into f[0].
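
Putting these points together, a tiny evolved program might look like the hand-written C sketch below. It reuses the three sample lines shown in this chapter and returns f[0] as the program output. The function wrapper, the inputs array, and the assumption that f[0] starts at zero are illustrative only and are not taken from Discipulus output.

```c
#include <assert.h>

/* Hypothetical wrapper around three "atoms"; inputs[0] plays the role of Input001. */
double run_sample_program(const double *inputs, int num_inputs)
{
    assert(num_inputs >= 1);
    double f[1] = {0.0};   /* one temporary computation variable (assumed to start at 0) */

    f[0] -= inputs[0];     /* f[0] -= Input001; */
    f[0] -= 3.12345;       /* subtract a constant from the terminal set */
    f[0] *= f[0];          /* square the value in place */

    return f[0];           /* the value left in f[0] is the program's output */
}
```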

Assembler Equivalent. All of our discussion of the temporary
computation variables above has related to how they appear in
decompiled C programs. But you can also decompile to Intel inline
assembler, C Sharp, Delphi and Java. In the case of assembler, the eight
calculation registers are referred to as ST(0), ST(1) . . .

How to Get Detailed Information About The Intel FPU Registers
and the Assembler Instructions Used by Discipulus. To get more
information, we recommend the Intel Architecture Software
Developer’s Manual.

For more information on the Intel FPU processor, we recommend
Sanchez and Canton, "Numerical Programming the 387, 486 and
Pentium", McGraw-Hill, Inc., 1995.

Weighting the Terminal Set

You may set a weight on the relative frequency of constants (e.g. 1.23456)
and inputs (e.g. Input001...) in the population by changing the Ratio
Constant/Inputs parameter as follows:

• On the Set Up Learning menu, click Options;

• On the Advanced Options Window, click Set;

• On the Single Run Advanced Options Window, click the Instructions
Tab;

• The Ratio Constant/Inputs parameter appears in the Instruction
Control Box.

This number sets the proportion of constants to inputs in the initial
population and also sets the bias for mutations to constants versus
mutations to inputs.

Choosing the Function Set

The Instruction Set Box on the Instructions Tab of the Single Run
Advanced Options window allows you to select from among the various
instructions Discipulus may use during learning. You get to the
Instruction Set Box as follows:

• On the Set Up Learning menu, select Options;

• Select the Set button; and

• Select the Instructions Tab.

The Instruction Set Box appears in the lower left-hand corner of the
page that you see when you make these selections.

The following topics contain additional information about the
Instruction Set Box:

• The Function Set: Types of Instructions Available on page 197; and

• The Function Set: Choosing Instructions for a Run on page 197.

The Function Set: Types of Instructions Available

Instructions are grouped by type in the Instruction Set Box. These
Instruction Groups are shown in the display in the Instruction Set Box as
the top outline level. Two examples of Instruction Groups are:

• The "Addition" Group. Discipulus implements four different
addition operators and they are grouped together in the addition
group.

• The "Trigonometric" Group. Discipulus implements two different
trigonometric operators. They are grouped together in this group.

The Function Set: Choosing Instructions for a Run

Click on the "plus" symbol beside an Instruction Group in the
Instruction Set Box. Discipulus will show the commonly used assembler
mnemonic used for operators that are available in that Group.
(Assembler mnemonics can be non-intuitive, to say the least. So we
have provided a comprehensive guide to what each instruction does in
much simpler terminology. See Instruction Set Reference on page 215.)

Now click on a box beside one of the assembler mnemonics. Discipulus
will place a check mark beside that instruction. When the box
beside an instruction is checked, that instruction will be used by
Discipulus during evolution.

For example, in the Addition group, you will find that there are three
different addition-type instructions available. By way of example, the
first two addition instructions listed in the Addition Group perform the
following simple addition operations:

• FADD ST(0), ST(%r). This instruction does the following:

f[0]=f[0]+f[n] (or f[0]+=f[n]);

Where f[0] represents the first temporary computation variable and
where f[n] represents any of the temporary computation variables you
have configured Discipulus to use. The value of n is variable and is set
during evolution.

• FADD ST(%r), ST(0). This instruction does the following:

f[n]=f[0]+f[n] (or f[n]+=f[0]);

Where f[0] represents the first temporary computation variable and
where f[n] represents any one of the temporary computation variables
you have configured Discipulus to use. The value of n is variable and is
set during evolution.

These two examples are for illustration only.

Although the names of the various instructions appear complex and
intimidating, each instruction actually performs a single very simple
operation that may be easily understood. The Instruction Set Reference
Chapter below lists all of the instructions in the Instruction Set
Box and describes what each of them does in plainer English.
(See Instruction Set Reference on page 215.)

Weighting the Function Set

You may set the relative frequency of each of the instructions in the
function set of a run by using Discipulus’ Instruction Weights feature.

You get to the Instruction Weights feature as follows:

• On the Set Up Learning menu, select Options.

• On the Advanced Options Window, click Set.

• Select the Instructions Tab.

The Instruction Set Box then appears in the lower left hand corner of the
page that you see when you make these selections.

You will notice a number between 01 and 20 beside each Instruction in
the Instruction Set Box. This number determines the relative weight
given to that instruction in the next run. It affects both the initialization
of the population and the mutation operator. For example, if the weight
beside an instruction is 02, the mutation operator will be twice as likely
to choose that instruction as it would be if the weight were 01.

To change the weight assigned to a particular instruction, click in the
box by that instruction so that the box is checked. (If the box is already
checked, you will have to click it twice: once to uncheck it and once to
recheck it.) At this point, click on the Instruction Weight slider until you
see the weight beside the instruction change. The maximum weight you
can assign to an instruction is 20.
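The effect of the weights can be pictured as weighted random selection. The following is a minimal sketch only, not Discipulus's internal code: the function name pick_weighted is hypothetical, and the caller is assumed to supply a uniform random integer between 0 and the sum of the weights minus one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: choose an instruction index with probability
 * proportional to its weight (01..20). 'draw' is a uniform random
 * integer in [0, sum of weights) supplied by the caller. */
size_t pick_weighted(const int *weights, size_t count, int draw)
{
    assert(count > 0 && draw >= 0);
    for (size_t i = 0; i < count; i++) {
        if (draw < weights[i])
            return i;          /* draw fell inside this instruction's share */
        draw -= weights[i];    /* move on to the next instruction's share */
    }
    return count - 1;          /* draw was out of range; clamp to last entry */
}
```

With weights of 01, 02 and 01, the second instruction owns two of the four possible draws, so it is chosen twice as often as either of the others.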


General Reference
This section contains the following reference materials:

• Genetic Programming Reference on page 201

• Data Files for Direct Text File Import Reference on page 204

• Training, Validation, and Applied Data on page 205

• Sample Data Sets Reference on page 209

• Population, Program, Instruction Block and Instruction Reference on
page 211

• Literature Reference on page 213

Genetic Programming Reference

Genetic Programming uses Darwinian natural selection on a population
of programs to evolve a program that predicts the target output in
your data file from the inputs in that file.

The following topics contain further information about Genetic
Programming:

• The Genetic Programming Algorithm on page 201

• Genetic Programming Search Operators on page 202

The Genetic Programming Algorithm

Here are the steps in Discipulus Genetic Programming for a single run:

1. Initialize the population. Discipulus creates a population of
programs randomly. The number of programs is set by the
Population Size parameter. The average length of the programs is
set by the Program Size: Initial parameter.

2. Run a Tournament. Discipulus picks four programs randomly out
of the population of programs. It compares them and picks two
winners and two losers based on fitness.

3. Apply the Search Operators. Discipulus then applies search
operators like crossover and mutation to the winners and produces
two "Children" or "Offspring." Discipulus creates the offspring as
follows:

a) Copy the two winners and replace the losers with the copies;

b) With Crossover Frequency, cross over the copies of the winners;

c) With Mutation Frequency, mutate one of the programs
resulting from performing step 3a; and

d) With Mutation Frequency, mutate the other of the programs
resulting from performing step 3a.

4. Repeat until Termination. Discipulus then repeats steps 2 and 3
until the run is terminated.
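Steps 2 and 3a can be sketched in a few lines of C. This is an illustration under stated assumptions, not the Discipulus implementation: the Program record and both function names are hypothetical, each program is reduced to an id and a training error, and a lower error is assumed to mean a fitter program.

```c
#include <assert.h>

#define TOURNAMENT_SIZE 4

/* Hypothetical program record: the id stands in for the instructions. */
typedef struct {
    int id;
    double error;    /* assumption: lower error means higher fitness */
} Program;

/* Step 2: order the four contestants so the two fittest come first. */
static void rank_contestants(const Program *pop, int idx[TOURNAMENT_SIZE])
{
    for (int i = 1; i < TOURNAMENT_SIZE; i++) {   /* insertion sort */
        int key = idx[i];
        int j = i - 1;
        while (j >= 0 && pop[idx[j]].error > pop[key].error) {
            idx[j + 1] = idx[j];
            j--;
        }
        idx[j + 1] = key;
    }
}

/* Steps 2 and 3a: run one tournament, then overwrite the two losers
 * with copies of the two winners. */
void tournament_step(Program *pop, int idx[TOURNAMENT_SIZE])
{
    rank_contestants(pop, idx);
    pop[idx[2]] = pop[idx[0]];   /* first loser becomes a copy of winner 1 */
    pop[idx[3]] = pop[idx[1]];   /* second loser becomes a copy of winner 2 */
    /* Steps 3b to 3d would now apply crossover and mutation to the two
     * copies, each at its configured frequency. */
}
```

The caller is expected to fill idx with four randomly chosen population indices before each call and to repeat the call until the run terminates.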

Genetic Programming Search Operators

Discipulus uses two different search operators to create child programs
(or offspring) from parent programs. These search operators are
mutation and crossover.

The following topics contain further information about Genetic
Programming search operators:

• Mutation in Genetic Programming on page 202; and

• Crossover in Genetic Programming on page 204.

Mutation in Genetic Programming

In Genetic Programming, mutation causes a random change in the
program that is the subject of the mutation operator. There are three
different types of mutation in Discipulus:

• Block Mutation;

• Instruction Mutation; and

• Data Mutation.

Here are more detailed descriptions of each type of mutation, arranged in
the following topics:

• Block Mutation on page 203;

• Instruction Mutation on page 203;

• Data Mutation on page 203; and

• Ratio of Constants/Inputs on page 203.

Block Mutation

Discipulus keeps its instructions inside of Instruction Blocks that are 32
bits in length. See Instruction Blocks on page 212. In Block Mutation,
Discipulus replaces an entire Instruction Block with a new randomly
generated Instruction Block.

The Ratio of Constants/Inputs parameter affects the choice of terminals
(constants vs. inputs) placed into these replacement Instruction Blocks.

Instruction Mutation

In Instruction Mutation, Discipulus replaces an existing instruction with
a new, randomly chosen instruction of the same length.

The Ratio of Constants/Inputs parameter affects the choice of terminals
(constants vs. inputs) placed into these replacement Instructions.

Data Mutation

Data Mutation modifies an existing instruction by leaving the existing
operator intact but changing one of the terminals to a randomly selected
terminal.

The Ratio of Constants/Inputs parameter affects the choice of the
replacement terminal (constant vs. input).

Ratio of Constants/Inputs

This parameter sets the relative weight accorded to constants and to
inputs during the initialization of the population and in the mutation
operator. A value greater than 50% results in a relatively larger use of
constants relative to inputs during evolution. A value less than 50%
results in the reverse.
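As an illustration of how this ratio might steer Data Mutation, here is a hedged sketch. It assumes the parameter is the percentage chance of choosing a constant; the Instruction record, the data_mutate function and its arguments are all hypothetical, and the two random numbers (uniform in the range 0 to 1) are supplied by the caller.

```c
#include <assert.h>

typedef enum { TERM_INPUT, TERM_CONSTANT } TermKind;

/* Hypothetical instruction record used only for this sketch. */
typedef struct {
    int opcode;        /* the operator, left intact by Data Mutation */
    TermKind kind;     /* whether the terminal is an input or a constant */
    int index;         /* which input or which constant */
} Instruction;

/* Replace the terminal of 'ins' with a randomly selected terminal.
 * const_ratio_pct is assumed to be the percentage chance of a constant.
 * u_kind and u_index are uniform random numbers in [0, 1). */
void data_mutate(Instruction *ins, double const_ratio_pct,
                 int n_inputs, int n_constants,
                 double u_kind, double u_index)
{
    assert(n_inputs > 0 && n_constants > 0);
    ins->kind = (u_kind * 100.0 < const_ratio_pct) ? TERM_CONSTANT
                                                   : TERM_INPUT;
    int pool = (ins->kind == TERM_CONSTANT) ? n_constants : n_inputs;
    ins->index = (int)(u_index * pool);   /* pick one terminal from the pool */
    /* ins->opcode is deliberately not touched. */
}
```

Raising const_ratio_pct above 50 makes the TERM_CONSTANT branch more likely, which matches the behavior described above.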

Crossover in Genetic Programming

In Genetic Programming, crossover is either Homologous or Non-
Homologous. All crossover in Genetic Programming occurs between
Instruction Blocks, never inside of Instruction Blocks. See Instruction
Blocks on page 212.

Homologous Crossover. In Homologous Crossover, Discipulus selects
a sequence of Instruction Blocks from one of the parent programs. The
position and length of the sequence are chosen randomly. This
sequence of Instruction Blocks is then swapped with a sequence of
Instruction Blocks from the other parent program with the same position
and length.

In other words, Homologous Crossover maintains the position of the
swapped code in the two parent programs even though the code is
swapped between the two parents.
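A minimal sketch of this swap, assuming each program body is stored as an array of 32-bit Instruction Blocks. The function name and parameters are hypothetical, and the position and length are chosen by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Swap 'len' Instruction Blocks starting at block 'pos' between two
 * parent bodies. Position and length are the same in both parents,
 * so neither program changes length. */
void homologous_crossover(uint32_t *a, size_t a_len,
                          uint32_t *b, size_t b_len,
                          size_t pos, size_t len)
{
    assert(pos + len <= a_len && pos + len <= b_len);
    for (size_t i = pos; i < pos + len; i++) {
        uint32_t tmp = a[i];   /* whole blocks are exchanged; */
        a[i] = b[i];           /* nothing inside a block is altered */
        b[i] = tmp;
    }
}
```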

Non-Homologous Crossover. In Non-Homologous Crossover,
Discipulus selects a sequence of Instruction Blocks from one of the
parent programs. The position and length of the sequence are
chosen randomly. Call this sequence of Instruction Blocks
"Sequence 1."

Discipulus then selects a sequence of Instruction Blocks from the other
parent program. The position and length of this sequence are also
chosen randomly. Call this sequence of Instruction Blocks
"Sequence 2."

Sequence 2 then replaces Sequence 1 in the first parent program.
Sequence 1 replaces Sequence 2 in the second parent program.

Note that Non-Homologous crossover can change the length of either or
both of the parent programs.
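The exchange of Sequence 1 and Sequence 2 can be sketched as follows. The Body type, the MAX_BLOCKS capacity and both function names are assumptions made for this illustration; positions and lengths are chosen by the caller.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_BLOCKS 64   /* hypothetical capacity for this sketch */

typedef struct {
    uint32_t block[MAX_BLOCKS];   /* 32-bit Instruction Blocks */
    size_t len;                   /* number of blocks in use */
} Body;

/* Replace dst blocks [pos, pos+len) with the seq_len blocks in seq. */
static void splice(Body *dst, size_t pos, size_t len,
                   const uint32_t *seq, size_t seq_len)
{
    assert(pos + len <= dst->len);
    assert(dst->len - len + seq_len <= MAX_BLOCKS);
    size_t tail = dst->len - (pos + len);   /* blocks after the old span */
    memmove(&dst->block[pos + seq_len], &dst->block[pos + len],
            tail * sizeof(uint32_t));
    memcpy(&dst->block[pos], seq, seq_len * sizeof(uint32_t));
    dst->len = dst->len - len + seq_len;
}

/* Exchange Sequence 1 (in p1) with Sequence 2 (in p2). */
void nonhomologous_crossover(Body *p1, size_t pos1, size_t len1,
                             Body *p2, size_t pos2, size_t len2)
{
    uint32_t seq1[MAX_BLOCKS], seq2[MAX_BLOCKS];
    assert(pos1 + len1 <= p1->len && pos2 + len2 <= p2->len);
    memcpy(seq1, &p1->block[pos1], len1 * sizeof(uint32_t));
    memcpy(seq2, &p2->block[pos2], len2 * sizeof(uint32_t));
    splice(p1, pos1, len1, seq2, len2);   /* Sequence 2 replaces Sequence 1 */
    splice(p2, pos2, len2, seq1, len1);   /* Sequence 1 replaces Sequence 2 */
}
```

Because len1 and len2 may differ, a parent that gives up three blocks and receives one ends up two blocks shorter, while the other parent grows by two.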

Data Files for Direct Text File Import Reference

Discipulus allows three types of data files: training, validation and
applied data files. Data files contain the inputs and outputs from which
Discipulus learns and from which you may evaluate the quality of the
programs that Discipulus has evolved.

You may load data files from the Project Setup Wizard. See Starting the
Project Setup Wizard on page 35 and Using the Project Setup Wizard on
page 36.

Practice Note: Discipulus will not allow you to start a run unless you
have loaded both training and validation data sets.

Practice Note: Discipulus will allow you to start a run without applied
data loaded. Applied data may be loaded at any time, before or
after a run, using File, Load New Applied Data.

The "Data" subdirectory contains sample training and validation data set
files for the user to experiment with.

Training, Validation, and Applied Data

Discipulus is a “supervised learning” system. So you must provide
training, validation and applied data files that contain matched inputs
and outputs from which you want Discipulus to learn. Each matched
set of inputs and its output appears as a line in your data files. For
example, a very small training file with two inputs and one output might
look the way it appears in Table 5 (the first two columns are the inputs
and the third column is the output):

Table 5.
Input 1    Input 2    Output
2.0        4.0        6.0
3.1        5.0        8.1
1.3        3.2        4.5

Note: The lines and column labels would not appear in a Discipulus data
file. They appear in the above table only for clarity.

An evolved program containing only one line of code:

Output = Input 1 + Input 2;

would produce the output column from the input columns in this table.
Discipulus would evolve this trivially simple program from the above
training data set almost immediately.
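To make the idea concrete, the sketch below scores the one-line program against the rows of Table 5. The error measure used here (a sum of absolute errors) is an assumption chosen for illustration; the Example type and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    double input1, input2, output;
} Example;

/* The rows of Table 5. */
static const Example table5[3] = {
    {2.0, 4.0, 6.0},
    {3.1, 5.0, 8.1},
    {1.3, 3.2, 4.5},
};

/* The one-line evolved program from the text. */
double evolved_program(double input1, double input2)
{
    return input1 + input2;
}

/* Sum of absolute errors of the evolved program over a data set. */
double total_abs_error(const Example *rows, size_t n)
{
    double err = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = evolved_program(rows[i].input1, rows[i].input2)
                   - rows[i].output;
        err += (d < 0.0) ? -d : d;   /* absolute value without math.h */
    }
    return err;
}
```

A program that reproduces the output column, as this one does, scores an error of essentially zero, so it would be ranked above any program that misses one of the three rows.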

The following topics contain additional information about Data Files:

• What Are Training, Validation and Applied Data Files? on page 206;

• Creating Training, Validation, and Applied Data Files on page 207;
and

• A Shortcut for Creating Data Files on page 209.

What Are Training, Validation and Applied Data Files?

If you choose direct text file import (as opposed to importing data via
Notitia) you must provide at least two ASCII text data files to
Discipulus each time you want Discipulus to learn. One of the files is a
training file (see The Training File on page 206). The other is a
validation file (see The Validation File on page 206). In addition, you
may load a separate applied data file with new or different data at any
time before or after a run (see The Applied Data File on page 207).

The Training File

The training file should contain the data (the examples) you want
Discipulus to use when it is learning. In other words, the fitness function
in Discipulus is calculated on the training file. Put another way, an
evolved program will be ranked as more fit during learning the better it
generates the output column from the input columns in your training
file.

To view the training file, click on the Training Tab in the data window.

The Validation File

The validation file is used by Discipulus to pick the best programs from
the population.

The validation file should contain examples that are of the same type
and structure as the training examples and that comprise a good
representative set of samples from the learning domain. To view the
validation file, click on the Validation Tab in the data window.

Discipulus will not run until you have loaded both training and
validation files. If you do not want to use a separate validation file, just
load the training file as both the training and the validation data.
Discipulus will run this way just fine.

Discipulus will not train on the examples in the validation file. That is,
Discipulus will not use the examples in the validation file as part of the
fitness function used for natural selection. Instead, it will use the
validation data to provide information to you on how well the programs
evolved by Discipulus will work on data they did not train on.
(Validation is an essential step in automatic learning and is discussed in
greater detail in Chapter 1 and Section 8.5 of Banzhaf, Nordin, Keller
and Francone, Genetic Programming, An Introduction (1998).)

The Applied Data File

The whole point of Discipulus is to let you evolve programs that are
useful on data that you did not have when you trained the program.
Applied data files allow you to do that.

The applied data file should contain examples that are of the same type
and structure as the training examples and that comprise a good
representative set of samples from the learning domain. The only
exception to this is that applied data may, but is not required to,
contain a column for the Target Output.

To view the applied data file, click on the Applied Tab in the Data
Window.

Regardless of when you load applied data, it has no effect whatsoever
on training or reporting during a run.

Creating Training, Validation, and Applied Data Files

Discipulus requires a text file to be in a precise format before it may be
imported as a training file, a validation file, or an applied data file.
However, these files are not hard to create. Here are the rules for
creating them:

• Data files must be ASCII text files. You may create ASCII files using
WordPad (a utility program that comes with Windows 95/98/NT
and may often be found in Windows Accessories). Alternatively,
you may use a spreadsheet such as Microsoft Excel (as described in A
Shortcut for Creating Data Files on page 209).

• The training, validation, and applied data files should be identical in
structure, with the same number of inputs and outputs, i.e. the same
number of columns. These files may, however, have a different
number of examples, that is, a different number of rows in the file.

• Your data must be arranged in columns in the training, validation,
and applied data files. Each column represents an input or an output
(the output being held in the farthest right column).

• Each row in the training, validation, and applied data files must have
a separate "example" that contains both inputs and one projected
output.

• The columns of data in the training, validation, and applied data files
must be separated by a tab or a space on each row.

• The output data that you want to have Discipulus learn must be the
right hand column.

• The training, validation, and applied data files must have the same
number of columns of data in each row and must have two or more
rows and two or more columns of data.

• Every value in the training, validation, and applied data files must be
an integer or a real number.

• You should not put any non-printing characters at the end of a line or
the end of a file. For example, do not leave extra spaces or tabs at
the end of a line.
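Several of these rules can be checked mechanically before a file is loaded. The sketch below is a hypothetical helper, not part of Discipulus. It assumes that columns are separated by exactly one tab or space and it takes one row of text without its line ending.

```c
#include <assert.h>
#include <string.h>

static int is_separator(char c)
{
    return c == ' ' || c == '\t';
}

/* Count the columns in one row. Returns -1 if the row is empty, starts
 * or ends with a separator, or contains two separators in a row. */
int count_columns(const char *line)
{
    size_t n = strlen(line);
    int cols = 1;
    if (n == 0 || is_separator(line[0]) || is_separator(line[n - 1]))
        return -1;
    for (size_t i = 0; i < n; i++) {
        if (is_separator(line[i])) {
            if (is_separator(line[i + 1]))
                return -1;         /* empty field between separators */
            cols++;
        }
    }
    return cols;
}
```

Calling count_columns on every row and confirming that each call returns the same value of two or more covers the column-count rules and the rule about trailing spaces and tabs.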

Here are some values that Discipulus will accept:

• 1.0

• 100

• 2345.67

Here are some values that will not read into Discipulus:


• $1.00 (dollar sign not allowed)

• 1,235 (comma not allowed)

• True (letters not allowed)

• "Single" (neither letters nor quotation marks are allowed)
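The accepted and rejected values above can be told apart with a simple check. This is a sketch under stated assumptions: the manual does not say whether a leading sign or exponent notation is accepted, so this hypothetical function allows an optional sign and rejects everything except digits and a single decimal point.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Returns 1 if 's' looks like a plain integer or real number such as
 * 100 or 2345.67, and 0 otherwise. */
int is_plain_number(const char *s)
{
    int digits = 0;
    assert(s != NULL);
    if (*s == '+' || *s == '-')        /* assumption: a sign is allowed */
        s++;
    while (isdigit((unsigned char)*s)) {
        s++;
        digits++;
    }
    if (*s == '.') {                   /* optional fractional part */
        s++;
        while (isdigit((unsigned char)*s)) {
            s++;
            digits++;
        }
    }
    /* Valid only if at least one digit was seen and nothing is left over. */
    return digits > 0 && *s == '\0';
}
```

Dollar signs, commas, letters and quotation marks all stop the scan before the end of the string, so values like those in the second list are rejected.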

A Shortcut for Creating Data Files

The easiest way to create a training or validation file is to use Microsoft
Excel. Create a spreadsheet containing your data. Each input is a column
of data in the spreadsheet. The output is a single column of data in the
rightmost data column of the spreadsheet.

Then create a text file from your spreadsheet as follows. In Excel, make
the following menu selections:

• On the File menu, click Save As;

• Then in the dialog box that pops up, you should select Text Only
from the Save As Type Box and name the text file you want to create.

This procedure will create a properly formatted text file that may be
read directly into Discipulus.

Sample Data Sets Reference

Two sample data sets are included with your program disk, one
regression problem and one classification problem. For information
about how to use Discipulus on these data sets, please see Does
Discipulus Come with Sample Data Sets I Can Run? on page 33.

Each of these data sets is described below.

The Fractionating Column Regression Problem

The fractionating column regression problem is an industrial, chemical,
batch-process control problem. Initial settings for the batch process are
the inputs and the task is to predict the desired results, which are
included as the output data.

Three files are included in your Data Folder, one for training, one for
validation and one for testing.

The Gaussian Classification Problem

The Gaussian problem is a difficult classification problem. To solve it,
Discipulus must evolve a program that will distinguish one type of input
(class zero) from another (class one).

Classification. When we refer to a classification problem, we mean that
the task is to evolve a program that will decide whether a set of inputs
should be in class zero or class one.

When you do your own classification problems, you determine what
classes zero and one mean. For example, in a credit prediction problem,
class zero might represent a rating of "creditworthy" while class one
might represent a rating of "not creditworthy."

The Gaussian Classes. For the Gaussian problem, the classes are
generated mathematically. Class zero has eight inputs, all normally
distributed with zero mean and a standard deviation equal to one. Class
one likewise has eight inputs, all normally distributed with zero
mean but with a standard deviation equal to two. Thus, the two classes
overlap considerably (in technical terms, they are linearly inseparable),
making it difficult to tell the members of one class from the other.

False Inputs. The Gaussian problem is a well studied problem.
Normally researchers have studied this problem with only eight inputs.

In our version of the Gaussian problem, we have made the problem
much more difficult by adding sixteen inputs that are, for lack of a better
term, “false” inputs. That is, these sixteen inputs are unrelated to the
output. So in total, there are twenty-four inputs for this problem. Eight
of them are “real” inputs and the remainder are “false” inputs. To solve
the problem, Discipulus must first determine what inputs are the good
ones and then determine how to use the good ones in a program that will
produce useful results. To see how Discipulus does in this regard, look
at the Input Impacts Tab in the Reports Window after you finish the
project. Input001 through Input008 should be highlighted as the
important inputs.

For more information about this data set, see ESPRIT Basic Research
Project Number 6891, ELENA, Enhanced Learning for Evolutive
Neural Architecture, June 30, 1995. It is available on the World Wide
Web at http://www.dice.ucl.ac.be/neural-nets/ELENA/ELENA.html.

Population, Program, Instruction Block and Instruction Reference
Understanding Discipulus and the search operators it uses requires a
basic understanding of how Discipulus evolves computer programs. A
brief definition of several terms appears in the following topics:

• Population on page 211;

• An Evolved Program on page 211;

• The Structure of a Program on page 211;

* The Header on page 212;

* The Footer on page 212; and

* The Body on page 212.

Population
In genetic programming, the population is the collection of computer
programs on which the learning algorithm operates. The smallest
population possible in Discipulus has five programs. The maximum
population size is limited only by the RAM in your computer.

Program
In Genetic Programming, the term “program” refers to a computer
program that is subject to learning. In Discipulus, the program is a
native machine code function that runs directly on the floating point
processor unit. In this manual, these native machine code functions are
referred to as “programs” or “evolved programs.”

The Structure of a Program

A program breaks down into four sections: the header, the body, the
footer, and a return instruction. Neither the header, the footer, nor the
return instruction may be crossed over or mutated. Rather, all search
operators act on the body of the program.

The Header
In Discipulus, the program’s header initializes the floating point unit
(using the FINIT instruction) and then loads the value of zero into each
of the eight FPU registers.

The Footer
In Discipulus, the footer of a program contains instructions that
“tidy up” after program execution. The footer is followed by a return
instruction.

The Body
In Discipulus, the body of a program is where learning takes place. The
body of a program is comprised of Instruction Blocks which are in turn
comprised of Instructions.

Instructions

An Instruction is the basic unit of evolution. It is one of the native
machine code instructions that runs on the floating point processor unit.
A complete list of available instructions may be found in the Instruction
Set Reference on page 215.

Instruction Blocks

Because your computer has a CISC processor, different Instructions
have different lengths. Discipulus imposes some uniformity on this
arrangement by "gluing" Instructions together into Instruction Blocks.
The crossover operator does not act inside of Instruction Blocks.

Each Instruction Block is 32 bits long and may be composed of one or
more native processor instructions.

Literature Reference
Banzhaf, W., Nordin, J., Keller, R.E. and Francone, F.D. (1998).
Genetic Programming – An Introduction To the Automatic Evolution of
Computer Programs and its Applications. Morgan Kaufmann, San
Francisco, CA, USA, and dpunkt.verlag, Heidelberg, Germany.

Koza, J. (1992). Genetic Programming, On the Programming of
Computers by Means of Natural Selection. MIT Press, Cambridge, MA,
USA.

Masters, T. (1995). Advanced Algorithms for Neural Networks. John
Wiley & Sons, New York, NY, USA.

Sanchez and Canton (1995). Numerical Programming the 386, 486 and
Pentium. McGraw-Hill, New York, NY, USA.

INTEL Inc. Architecture Software Developer’s Manual, Volume 1-3.
Available as Adobe Acrobat files from Intel’s web site,
http://developer.intel.com/design/Pentium II/manuals/.

Gathercole, C. and Ross, P. (1994). Dynamic Training Subset Selection
for Supervised Learning in Genetic Programming. In: Proceedings of
PPSN III. Yuval Davidor and Hans-Paul Schwefel (Eds.), Volume 866
of Lecture Notes in Computer Science. Springer, Berlin, Germany.

Andre, D. and Koza, J. (1996). A parallel implementation of Genetic
Programming that achieves superlinear Performance. In: Proceedings of
the International Conference on Parallel and Distributed Processing
Techniques and Applications, Volume III. H.R. Arabnia (Ed.), CSREA.


Instruction Set Reference

NOTE: The default settings for a Discipulus project work quite well for
most projects. Thus, the matters covered in this chapter should be
considered advanced subject matter that most users need not consider.

This chapter describes the instructions that appear in the Instruction Set
Box on the Instruction Page of the Single Run Advanced Options
Window. You may find them as follows:

• On the Set Up Learning menu, click Options. The Advanced Options
Window pops up;

• On the Advanced Options Window, click on the Single Run
Advanced Options button;

• On the Single Run Advanced Options Window, click the Instruction
Tab (Figure 85).

The Instruction Set Box appears on this page and is shown here as
Figure 85. It is organized into Instruction Groups. For example,
addition, arithmetic, and trigonometry are all Instruction Groups you
may use in the Instruction Set Box.


Figure 85. The Instruction Tab

You can see and access the instructions that are contained in any
Instruction Group by clicking on the plus sign by the Instruction
Group’s name. The following topics provide additional information
about the various types of instructions that may be used in Discipulus
programs:

• Addition Instruction Group on page 217;

• Arithmetic Instruction Group on page 219;

• Comparison Instruction Group on page 221;

• Condition Instruction Group on page 222;

• Data Transfer Instruction Group on page 224;

• Division Instruction Group on page 225;

• Exponential Instruction Group on page 228;

• Multiplication Instruction Group on page 229;


• Rotate Stack Instruction Group on page 231;

• Subtraction Instruction Group on page 231; and the

• Trigonometric Instruction Group on page 234.

Addition Instruction Group

The addition instruction group includes three instructions, which are
discussed in the following topics:

• Add two registers: See FADD ST(0), ST(%r) on page 217;

• Add two registers: See FADD ST(%r), ST(0) on page 218; and

• Add register and input or register and constant: See FADD
[ESD+%d1] on page 218.

FADD ST(0), ST(%r)

This instruction adds any one of the temporary computation variables
(f[n]) to the value in f[0] and puts the sum into f[0].

C Code Description
This operator is equivalent to the following C pseudocode:

f[0]=f[0]+f[n] (or f[0]+=f[n]);

Where f[0] represents the first temporary computation variable and
f[n] represents any of the temporary computation variables you
have configured Discipulus to use. The value of n is variable and is set
during evolution.

Assembler Description
This instruction adds the value in the variable FPU register designated
as ST(%r) to the value in the top of the FPU stack (ST(0)). It places the
sum into the top of the stack (ST(0)). The value of %r is variable and is
set during evolution.

Stack Operation
None.


FADD ST(%r), ST(0)

This instruction adds any one of the temporary computation variables
(f[n]) to the value in f[0] and puts the sum into f[n].

C Code Description
This instruction is equivalent to the following C pseudocode:

f[n]=f[0]+f[n] (or f[n]+=f[0]);

Where f[0] represents the first temporary computation variable and
f[n] represents any one of the temporary computation variables
you have configured Discipulus to use. The value of n is variable and is
set during evolution.

Assembler Description
This instruction adds the value in the top of the FPU stack (ST(0)) to the
value in the variable FPU register designated as ST(%r). It places the sum
into that same register, ST(%r). The value of %r is variable
and is set during evolution.

Stack Operation
None.

FADD [ESD+%d1]
This instruction will put two different operators into your evolved
programs:

• The first adds f[0] to one of the inputs from your data file and places
the result into f[0];

• The second adds f[0] to one of the constants from the Terminal Set
and places the result into f[0].

C Code Description
The two operators referred to above are equivalent to the following lines
of C pseudocode in evolved programs:

f[0]=f[0]+input (or f[0]+=input)

f[0]=f[0]+constant (or f[0]+=constant)


f[0] is, of course, the temporary calculation register. The input will show
up in your evolved programs as Input001, Input002 . . . The constant
will show up as a real valued constant, such as 9.1234567.

During evolution, an input can be changed by the mutation operator to a
constant and vice versa. Similarly, which input or constant is referenced
in this operator may be changed by the mutation operator.

Assembler Description
This instruction adds the value in the top of the FPU stack (ST(0)) to the
value of one of the inputs in your training data set or one of the
constants. It places the sum into the top of the stack (ST(0)). The value
of %d1 (that is, which input or which constant is used) is variable and
is set during evolution.

Stack Operation
None.
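The three addition instructions can be modeled in ordinary C for experimentation. This is a sketch only: the function names are hypothetical, the array f stands in for the temporary computation variables, and eight registers are assumed, matching the eight FPU registers initialized by the program header.

```c
#include <assert.h>

#define NUM_REGS 8              /* assumption: all eight FPU registers */

static double f[NUM_REGS];      /* temporary computation variables */

/* FADD ST(0), ST(%r): f[0] = f[0] + f[n] */
void fadd_st0_str(int n)
{
    assert(n >= 0 && n < NUM_REGS);
    f[0] += f[n];
}

/* FADD ST(%r), ST(0): f[n] = f[0] + f[n] */
void fadd_str_st0(int n)
{
    assert(n >= 0 && n < NUM_REGS);
    f[n] += f[0];
}

/* FADD [ESD+%d1]: f[0] = f[0] + input, or f[0] = f[0] + constant.
 * The caller passes the value of the chosen input or constant. */
void fadd_mem(double operand)
{
    f[0] += operand;
}
```

In all three cases only the destination changes; the other operand keeps its value, which is why no stack operation is listed for these instructions.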

Arithmetic Instruction Group

The Arithmetic Instruction Group contains four instructions that are
described in the following topics:

• Absolute Value. See FABS on page 219;

• Change Sign. See FCHS on page 220;

• Scaling. See FSCALE on page 220; and

• Square root. See FSQRT on page 221.

FABS
This instruction takes the absolute value of f[0] and places the result
into f[0].

C Code Description
It is equivalent to this C pseudocode:

f[0]=ABS(f[0]);


Assembler Description
Takes the absolute value of the top of the FPU stack (ST(0)). It places
that absolute value back into the top of the stack (ST(0)).

Stack Operation
None.

FCHS
This instruction changes the sign of f[0] and places the result into f[0].

C Code Description
This instruction is equivalent to this C pseudocode:

f[0]=-(f[0]);

Assembler Description
Changes the sign of the value in the top of the stack register, ST(0).

Stack Operation
None.

FSCALE
This instruction multiplies f[0] by two raised to the power, f[1]. It then
places the result back into f[0].

C Code Description
It is equivalent to this C pseudocode:

f[0]=f[0]*pow(2,f[1]);

Assembler Description
Calculates ST(0)*2^ST(1) and places the result into ST(0).

Stack Operation
None.
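On Intel x87 hardware, FSCALE truncates the value in ST(1) toward zero before using it as the power of two, so the pseudocode above matches the hardware exactly only when f[1] holds a whole number. The sketch below models that behavior with the standard C function ldexp. The function name fscale is hypothetical, and the range check is there to keep the integer conversion well defined.

```c
#include <assert.h>
#include <math.h>

/* Model of FSCALE: returns f0 * 2^trunc(f1). */
double fscale(double f0, double f1)
{
    /* Keep the conversion to int well defined. */
    assert(f1 > -32768.0 && f1 < 32768.0);
    int exponent = (int)f1;        /* conversion discards the fraction */
    return ldexp(f0, exponent);    /* f0 times 2 to the power exponent */
}
```

For example, with f[0] = 3 the result is 12 both when f[1] = 2.0 and when f[1] = 2.9.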


FSQRT
This instruction takes the square root of f[0] and places the result into
f[0].

C Code Description
This instruction is equivalent to the following C pseudocode:

f[0]=SQRT(f[0]);

Assembler Description
Takes the square root of ST(0) and places the result into ST(0).

Stack Operation
None.

Comparison Instruction Group

The comparison instruction group contains only one instruction, which
compares the values in two floating point registers. See FCOMI ST(0),
ST(%r) on page 221.

FCOMI ST(0), ST(%r)

Compares the values in f[0] and f[n]. If f[0] is less than f[n], it sets the
temporary variable cflag to 1. Otherwise, it sets cflag to 0.

C Code Description
This instruction is equivalent to the following C pseudocode:

cflag=(f[0]<f[n]);

Where cflag is a Boolean variable that can have only the values of 0 or 1
and where f[n] is the value in one of the n temporary computation
variables.

Assembler Description
Compares the contents of register ST(0) and ST(n) and sets the status
flags ZF, PF, and CF in the EFLAGS register according to the results.


Stack Operation
None.

Condition Instruction Group

The Condition Instructions work with the Comparison Instruction
Group. The Comparison Instructions set the value of cflag by comparing
the values in f[0] and f[n]. Then the Condition Instructions use the value
in cflag to decide whether or not to take one of two steps:

• Move the value in f[n] to f[0]; or

• Jump over one Instruction Block.

The following topics describe the Conditional Instructions you may
include in Discipulus programs:

• Conditional copy of value from another register to f[0] if cflag = 1:
See FCMOVB ST(0), ST(%r) on page 222;

• Conditional copy of value from another register to f[0] if cflag = 0:
See FCMOVNB ST(0), ST(%r) on page 223;

• Conditional jump of an Instruction Block if cflag = 1: See JB EPI+6
on page 223; and

• Conditional jump of an Instruction Block if cflag = 0: See JNB EPI+6
on page 224.

FCMOVB ST(0), ST(%r)

This instruction moves the value in f[n] to f[0] if the conditional flag
(cflag) is equal to 1. (The conditional flag is set by the Comparison
Group instructions.)

C Code Description
This instruction is equivalent to the following C pseudocode:

if (cflag) f[0] = f[n];
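One way to see why this instruction is useful: a comparison followed by FCMOVB leaves the larger of the two values in f[0]. The sketch below assumes a register array of 8 entries and a hypothetical function name.

```c
#include <assert.h>

#define NUM_REGS 8  /* assumed register count for this sketch */

/* Hypothetical helper: a comparison followed by FCMOVB.
   After the call, f[0] holds the larger of the original f[0] and f[n]. */
void compare_then_fcmovb(double f[NUM_REGS], int n)
{
    int cflag;

    assert(n >= 0 && n < NUM_REGS);
    cflag = (f[0] < f[n]);      /* comparison step sets cflag */
    if (cflag) f[0] = f[n];     /* FCMOVB: move only when cflag is 1 */
}
```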

Assembler Description
Tests the CF status flag and moves the source operand (ST(n)) to the
destination operand (ST(0)), if CF=1.

Stack Operation
None.

FCMOVNB ST(0), ST(%r)

This instruction moves the value in f[n] to f[0] if the conditional flag
(cflag) is equal to 0. (The conditional flag is set by the Comparison
Group instructions.)

C Code Description
Equivalent C pseudocode is:

if (!cflag) f[0] = f[n];

Assembler Description
Tests the CF status flag and moves the source operand (ST(n)) to the
destination operand (ST(0)), if CF=0.

Stack Operation
None.

JB EPI+6
This instruction causes the program to skip execution of the next
Instruction Block if the conditional flag (cflag) equals 1. (The
conditional flag is set by the Comparison Group instructions.)

C Code Description
A C code example follows. This code tests whether cflag=1. If it does,
the program skips over line 12:

11: if (cflag) goto 13;
12: f[0]+=1.234567;
13: f[0]*=f[0];
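The same three numbered lines can be expressed without goto. This is only a sketch; the function name is hypothetical and x plays the role of f[0].

```c
#include <assert.h>

/* Hypothetical helper: structured form of the three numbered lines above.
   cflag must be 0 or 1; x plays the role of f[0]. */
double jb_example(int cflag, double x)
{
    assert(cflag == 0 || cflag == 1);
    if (!cflag) {
        x += 1.234567;   /* line 12 runs only when the jump is not taken */
    }
    x *= x;              /* line 13 always runs */
    return x;
}
```

Called with cflag = 1 and x = 2, the addition is skipped and the result is 4.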

Assembler Description
Tests the CF status flag and jumps program execution by 6 bytes if
CF=1.

Stack Operation
None.

JNB EPI+6
This instruction causes the program to skip execution of the next
Instruction Block if the conditional flag (cflag) equals 0. (The
conditional flag is set by the Comparison Group instructions.)

C Code Description
A C code example follows. This code tests whether cflag=0. If it does,
the program skips over line 12.

11: if (!cflag) goto 13;
12: f[0]+=1.234567;
13: f[0]*=f[0];

Assembler Description
Tests the CF status flag and jumps program execution by 6 bytes if
CF=0.

Stack Operation
None.

Data Transfer Instruction Group

The Data Transfer Instructions move values around without changing
the values. The one such instruction implemented in Discipulus
exchanges values between f[0] and another register. See FXCH ST(%r)
on page 224.

FXCH ST(%r)
The FXCH instruction swaps the values in f[0] and f[n]. This is an
important instruction in Register Machine configurations because it
allows the system to move values to and from the higher f[n] variables
for temporary storage while other calculations are performed in f[0].

C Code Description
The FXCH instruction is equivalent to the following C pseudocode:

tmp=f[0];
f[0]=f[n];
f[n]=tmp;
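Wrapped in a function, the swap might look like the sketch below. The function name and the register count of 8 are assumptions for this example.

```c
#include <assert.h>

#define NUM_REGS 8  /* assumed register count for this sketch */

/* Hypothetical helper: exchange f[0] with f[n], as in the pseudocode above. */
void fxch_sketch(double f[NUM_REGS], int n)
{
    double tmp;

    assert(n >= 0 && n < NUM_REGS);
    tmp = f[0];
    f[0] = f[n];
    f[n] = tmp;
}
```

A program can use this to park an intermediate result in a higher register, continue working in f[0], and later swap the parked value back.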

Assembler Description
Swap the values in ST(0) and ST(n).

Stack Operation
None.

Division Instruction Group

The Division Instruction Group includes four instructions that are
detailed in the following topics:

• Divide one register by another; place the result in f[0]: See FDIV
ST(0), ST(%r) on page 225;

• Divide one register by another; place the result in f[n]: See FDIV
ST(%r), ST(0) on page 226;

• Calculate a remainder: See FPREM on page 226; and

• Divide f[0] by either a constant or an input value: See FDIV
[ESD+%d1] on page 227.

FDIV ST(0), ST(%r)

This instruction divides f[0] by the value in one of the temporary
computation variables (f[n]) and puts the quotient into f[0].

C Code Description
This operator is equivalent to the following C pseudocode:

f[0]=f[0]/f[n] (or f[0]/=f[n]);

Where f[0] represents the first temporary computation variable and
where f[n] represents any of the temporary computation variables you
have configured Discipulus to use. The value of n is variable and is set
during evolution.

Assembler Description
This instruction divides the value in the top of the FPU stack (ST(0)) by
the value in the FPU register designated by %r. It places the
quotient into the top of the stack (ST(0)). The value of %r is variable
and is set during evolution.

Stack Operation
None.

FDIV ST(%r), ST(0)

This instruction divides one of the temporary computation variables
(f[n]) by the value in f[0] and puts the quotient into f[n].

C Code Description
This instruction is equivalent to the following C pseudocode:

f[n]=f[n]/f[0] (or f[n]/=f[0]);

Where f[0] represents the first temporary computation variable and
where f[n] represents any one of the temporary computation variables
you have configured Discipulus to use. The value of n is variable and is
set during evolution.

Assembler Description
This instruction divides the value in the FPU register designated by %r
by the value in the top of the FPU stack (ST(0)). It places the quotient
into the FPU register designated by %r. The value of %r is variable and
is set during evolution.

Stack Operation
None.

FPREM
This operator causes an evolved program to calculate the remainder left
when f[0] is divided by f[1] and to place the result into f[0]. This
instruction is useful for periodic data.

C Code Description
This instruction is equivalent to the following C pseudocode:

f[0]=f[0]-((int)(f[0]/f[1])*f[1]);

f[0] and f[1] are, of course, temporary calculation variables.

Assembler Description
Computes the remainder obtained from dividing the value in the ST(0)
register (the dividend) by the value in the ST(1) register (the divisor or
modulus), and stores the result in ST(0). The remainder represents the
following value:

Remainder = ST(0) - (Q * ST(1))

Here, Q is an integer value that is obtained by truncating the real-number
quotient of [ST(0)/ST(1)] toward zero. The sign of the
remainder is the same as the sign of the dividend. The magnitude of the
remainder is less than that of the modulus.

Stack Operation
None.

FDIV [ESD+%d1]
This instruction will put two different types of code into your evolved
programs:

• The first divides f[0] by one of the inputs from your data file and
places the result into f[0];

• The second divides f[0] by one of the constants from the Terminal
Set and places the result into f[0].

C Code Description
This operator causes an evolved program to include either of the
following lines of C pseudocode:

f[0]=f[0]/input (or f[0]/=input); and

f[0]=f[0]/constant (or f[0]/=constant);

f[0] is, of course, the temporary calculation register. The input will show
up in your evolved programs as Input001, Input002, etc. The constant will
show up as a real valued constant, such as 9.1234567.

During evolution, an input can be changed by the mutation operator to a
constant and vice versa. Similarly, which input or constant is referenced
in this operator may be changed by the mutation operator.
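To make the input-or-constant idea concrete, here is a minimal sketch of an operand that can refer to either kind of value and can be switched from one to the other. Every type and function name below is hypothetical; this is not Discipulus's internal representation, only one way to picture it.

```c
#include <assert.h>

/* Hypothetical operand encoding; not Discipulus's internal representation. */
typedef enum { OPERAND_INPUT, OPERAND_CONSTANT } OperandKind;

typedef struct {
    OperandKind kind;  /* whether index refers to an input or a constant */
    int index;         /* which input column or which constant */
} Operand;

/* Look up the operand's value from the current example's inputs
   or from the constant table. */
double operand_value(Operand op, const double inputs[], int num_inputs,
                     const double constants[], int num_constants)
{
    if (op.kind == OPERAND_INPUT) {
        assert(op.index >= 0 && op.index < num_inputs);
        return inputs[op.index];
    }
    assert(op.index >= 0 && op.index < num_constants);
    return constants[op.index];
}

/* One illustrative mutation: switch an input operand to a constant operand
   or vice versa, keeping the new index within range. */
Operand flip_operand_kind(Operand op, int num_inputs, int num_constants)
{
    assert(num_inputs > 0 && num_constants > 0 && op.index >= 0);
    if (op.kind == OPERAND_INPUT) {
        op.kind = OPERAND_CONSTANT;
        op.index %= num_constants;
    } else {
        op.kind = OPERAND_INPUT;
        op.index %= num_inputs;
    }
    return op;
}

/* FDIV [ESD+%d1] in terms of the sketch above: divide f[0] by the operand. */
void fdiv_operand(double f[], Operand op, const double inputs[], int num_inputs,
                  const double constants[], int num_constants)
{
    f[0] = f[0] / operand_value(op, inputs, num_inputs, constants, num_constants);
}
```

The same operand structure would serve the multiplication and subtraction forms that take an input or a constant; only the arithmetic in the last function would change.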

Assembler Description
This instruction divides the value in the top of the FPU stack (ST(0)) by
the value of one of the inputs in your training data set or one of the
constants. It places the quotient into the top of the stack (ST(0)).
The value in %d1 represents which value is the divisor (that is, which
variable or which constant) and is set during evolution.

Stack Operation
None.

Exponential Instruction Group

This instruction group implements only one instruction, F2XM1. The
F2XM1 instruction calculates two raised to the f[0] power, minus one
and puts the result into f[0].

C Code Description
This operator is equivalent to the following C pseudocode:

if (fabs(f[0])<1) f[0]=pow(2,f[0])-1;

Where f[0] represents the first temporary computation variable.

Assembler Description
Calculates the exponential value of 2 to the power of the source operand
minus 1. The source operand is located in register ST(0) and the result is
also stored in ST(0). The value of the source operand must lie in the
range –1.0 to +1.0. If the source value is outside this range, the result is
undefined.

Stack Operation
None.

Multiplication Instruction Group

The three multiplication instructions implemented in Discipulus are
discussed in the following topics:

• Multiply two registers and place the result in f[0]. See FMUL ST(0),
ST(%r) on page 229;

• Multiply two registers and place the result in f[n]. See FMUL
ST(%r), ST(0) on page 229; and

• Multiply a register by an input or a constant. See FMUL [ESD+%d1]
on page 230.

FMUL ST(0), ST(%r)

This instruction multiplies one of the temporary computation variables
(f[n]) and f[0] and puts the product into f[0].

C Code Description
This operator is equivalent to the following C pseudocode:

f[0]=f[0]*f[n] (or f[0]*=f[n]);

Where f[0] represents the first temporary computation variable and
where f[n] represents any of the temporary computation variables you
have configured Discipulus to use. The value of n is variable and is set
during evolution.

Assembler Description
This instruction multiplies the value in the top of the FPU stack (ST(0))
and the value in variable FPU register designated as (%r). It places the
product into the top of the stack (ST(0)). The value in %r is variable and
is set during evolution.

Stack Operation
None.

FMUL ST(%r), ST(0)

This instruction multiplies the values in f[0] and f[n] together and places
the result in f[n].

C Code Description
This instruction is equivalent to the following C pseudocode:

f[n]=f[0]*f[n] (or f[n]*=f[0]);

Where f[0] represents the first temporary computation variable and
where f[n] represents any one of the temporary computation variables
you have configured Discipulus to use. The value of n is variable and is
set during evolution.

Assembler Description
This instruction multiplies the value in the top of the FPU stack (ST(0))
and the value in variable FPU register designated as (%r). It places the
product into the variable FPU register designated as (%r). The value in
%r is variable and is set during evolution.

Stack Operation
None.

FMUL [ESD+%d1]
This instruction will put two related operators into your evolved
programs:

• The first multiplies f[0] and one of the inputs from your data file and
places the result into f[0];

• The second multiplies f[0] and one of the constants from the
Terminal Set and places the result into f[0].

C Code Description
The two related operators referred to above are equivalent to the
following lines (one at a time) of C pseudocode in evolved programs:

f[0]=f[0]*input (or f[0]*=input).

f[0]=f[0]*constant (or f[0]*=constant).

f[0] is, of course, the temporary calculation register. The input will show
up in your evolved programs as Input001, Input002, etc. Or, if you
name the input columns and use Notitia to import the data to Discipulus,
your input names will appear in the evolved programs. The constant will
show up as a real valued constant, such as 9.1234567.

During evolution, an input can be changed by the mutation operator to a
constant and vice versa. Similarly, which input or constant is referenced
in this operator may be changed by the mutation operator.

Assembler Description
This instruction multiplies the value in the top of the FPU stack (ST(0))
by the value of one of the inputs in your training data or one of the
constants. It places the product into the top of the stack (ST(0)). The value
in %d1 represents which value is the multiplier (that is, which variable or
which constant) and is set during evolution.

Stack Operation
None.

Rotate Stack Instruction Group

FDECSTP
This instruction decrements the FPU stack pointer by 1. It makes no
changes to the contents of the registers.

FINCSTP
This instruction increments the FPU stack pointer by 1. It makes no
changes to the contents of the registers.
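The effect of moving the stack pointer can be pictured with a small model of the eight x87 data registers, in which ST(i) is the register i places from the current top, wrapping around. The struct and function names are assumptions for this sketch.

```c
#include <assert.h>

#define FPU_REGS 8  /* the x87 FPU has eight data registers */

/* Hypothetical model of the FPU register stack for illustration. */
typedef struct {
    double regs[FPU_REGS];  /* physical registers; never moved by rotation */
    int top;                /* index of the register currently seen as ST(0) */
} FpuStack;

/* Read ST(i): the register i places from the top, wrapping around. */
double st(const FpuStack *s, int i)
{
    assert(i >= 0 && i < FPU_REGS);
    return s->regs[(s->top + i) % FPU_REGS];
}

/* FDECSTP: decrement the stack pointer, wrapping from 0 to 7. */
void fdecstp(FpuStack *s)
{
    s->top = (s->top + FPU_REGS - 1) % FPU_REGS;
}

/* FINCSTP: increment the stack pointer, wrapping from 7 to 0. */
void fincstp(FpuStack *s)
{
    s->top = (s->top + 1) % FPU_REGS;
}
```

In this model no register contents change, yet after FINCSTP the value previously seen as ST(1) is seen as ST(0), which is why the group is described as rotating the stack.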

Subtraction Instruction Group

The three subtraction instructions implemented by Discipulus are
discussed in the following topics:

• Subtract two registers and put the result in f[0]. See FSUB ST(0),
ST(%r) on page 232;

• Subtract two registers and put the result in f[n]. See FSUB ST(%r),
ST(0) on page 232; and

• Subtract an input or a constant from a register. See FSUB
[ESD+%d1] on page 233.

FSUB ST(0), ST(%r)

This instruction subtracts one of the temporary computation variables
(f[n]) from the value in f[0] and puts the difference into f[0].

C Code Description
This operator is equivalent to the following C pseudocode:

f[0]=f[0]-f[n] (or f[0]-=f[n]);

Where f[0] represents the first temporary computation variable and
where f[n] represents any of the temporary computation variables you
have configured Discipulus to use. The value of n is variable and is set
during evolution.

Assembler Description
This instruction subtracts the value in the FPU register designated by %r
from the value in the top of the FPU stack (ST(0)). It places the
difference into the top of the stack (ST(0)). The value of %r is variable
and is set during evolution.

Stack Operation
None.

FSUB ST(%r), ST(0)

This instruction subtracts f[0] from f[n] and places the result into f[n].

C Code Description
This instruction is equivalent to the following C pseudocode:

f[n]=f[n]-f[0] (or f[n]-=f[0]);
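Because subtraction is not commutative, the two register forms of FSUB give different results from the same starting values. The sketch below places them side by side; the function names and the register count of 8 are assumptions for illustration.

```c
#include <assert.h>

#define NUM_REGS 8  /* assumed register count for this sketch */

/* FSUB ST(0), ST(%r): f[0] = f[0] - f[n]; f[n] is unchanged. */
void fsub_into_f0(double f[NUM_REGS], int n)
{
    assert(n >= 0 && n < NUM_REGS);
    f[0] -= f[n];
}

/* FSUB ST(%r), ST(0): f[n] = f[n] - f[0]; f[0] is unchanged
   (for n other than 0). */
void fsub_into_fn(double f[NUM_REGS], int n)
{
    assert(n >= 0 && n < NUM_REGS);
    f[n] -= f[0];
}
```

Starting from f[0] = 5 and f[1] = 2, the first form leaves 3 in f[0], while the second leaves -3 in f[1].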

Where f[0] represents the first temporary computation variable and
where f[n] represents any one of the temporary computation variables
you have configured Discipulus to use. The value of n is variable and is
set during evolution.

Assembler Description
This instruction subtracts the value in the top of the FPU stack (ST(0))
from the value in the FPU register designated by %r. It places the
difference into the FPU register designated by %r. The value
of %r is variable and is set during evolution.

Stack Operation
None.

FSUB [ESD+%d1]
This instruction will put two related operators into your evolved
programs:

• The first subtracts one of the inputs from your data file from f[0] and
places the result into f[0];

• The second subtracts one of the constants from the Terminal Set from
f[0] and places the result into f[0].

C Code Description
The two related operators referred to above are equivalent to the
following lines of C pseudocode in evolved programs:

f[0]=f[0]-input (or f[0]-=input).

f[0]=f[0]-constant (or f[0]-=constant).

f[0] is, of course, the temporary calculation register. The input will show
up in your evolved programs as Input001, Input002, etc. Or, if you
assigned column names for your inputs and used Notitia to import the
data, your column names will be used in the evolved programs. The
constant will show up as a real valued constant, such as 9.1234567.

During evolution, an input can be changed by the mutation operator to a
constant and vice versa. Similarly, which input or constant is referenced
in this operator may be changed by the mutation operator.

Assembler Description
This instruction subtracts an input or a constant from the value in the top
of the FPU stack (ST(0)). It places the result into the top of the stack
(ST(0)). The value in %d1 represents which value is subtracted (that is,
which variable or which constant) and is set during evolution.

Stack Operation
None.

Trigonometric Instruction Group

The two trigonometric functions implemented in Discipulus are
discussed in the following topics:

• Cosine function. See FCOS on page 234; and

• Sine function. See FSIN on page 234.

FCOS
This instruction calculates the cosine of f[0] and puts the result into f[0].

C Code Description
This operator is equivalent to the following C pseudocode:

f[0]=cos(f[0]);

Assembler Description
Calculates the cosine of the source operand in register ST(0) and stores
the result in ST(0).

Stack Operation
None.

FSIN
This instruction calculates the sine of f[0] and puts the result into f[0].

C Code Description
This operator is equivalent to the following C pseudocode:

f[0]=sin(f[0]);

Assembler Description
Calculates the sine of the source operand in register ST(0) and stores the
result in ST(0). The source operand must be given in radians.

Stack Operation
None.

Index
A
Addition Instruction Group 217
Advanced Options Window 114, 158
finding 157
Age 159
Analysis
evolved program 14
models 14
Arithmetic Instruction Group 219
Assembler 190
equivalent 196
mnemonics 197
Atom 189
Automatic programming
function set 189
terminal set 189

B
Best Evolved Program 110
Best Programs
deployment 69
Block Mutation
block rate 165
in genetic programming 203
Body 211, 212

C
C Code 190
saving for compilation into other programs 128
cflag 191, 221, 222, 223, 224
Chart Selection Box 111
Chart View 111
Classifications
three or more 180

Code
adding a line of 131
eliminating excess lines 142
Comparison Instruction Group 221
Comparison Instructions 191
Condition Instruction Group 222
Conditional Branching 191
Conditional Flags 191
Constant
optimizing
combining with manual simplification 141
Constants 167–170, 191–194
eliminating stacked 143
how Discipulus optimizes 139
inputting 193
letting Discipulus create 194
optimizing
how to 139
parameters 192
ratio of to inputs 167, 203
removing 194
weight 167
Crossover 152
advanced 165
frequency 152, 202
homologous 168, 204
in genetic programming 165, 202
non-homologous 168, 204
and program size 171
rate 152
Crossover rate 152

D
Darwinian Natural Selection 14
Data
Time Series 32

Data File
creating 33
creating with Microsoft Excel 33
example 30
general rule for splitting up data 32
Splitting between training, validation and applied 32
Data Files 28, 204
loading 204
opening 83, 204
order of examples 109
outputs 30
sorting 109
types of data 29
Data Mutation 202, 203
Data Transfer Instruction Group 224
Data Window 101
chart selection box 111
continuous output 109
displaying inputs 101
displaying outputs of best programs 101
displaying target output 101
training tabs 109
validation tabs 109
Deme
crossover percentage between 154
enabled/not enabled 154
migration rate between 155
number of 154
parameters 154–155
usefulness of 153
Deployment
from project file 69
Difficulty 159
Division Instruction Group 225

DSS
age 159
defined 159
difficulty 159
enabled 161
frequency of changing subset 162
random 159
selection by age 161
selection by difficulty 161
stochastic selection 161
target subset size 161
training subset 159
Dynamic Subset Selection. See also DSS 158–162

E
Error Measurements
linear 178
squared 178
Evolution 191, 192, 198
natural 168
speeding up 158
Evolved Program
analysis 14
best of run on training data 110
during reporting period 110
best of run on validation data 110
C code in 190
calculation variable in 195
cflag in 191
computation variable in 189
constants in 189, 191
display of 110
inputs in 190
line of code in 189
loading into interactive evaluator 128

outputs 111
saving for later use 128
selecting in chart selection box 111
temporary computation variables in 191
Evolved program
deployment 14
Evolved Programs
analysis 14
defined 14
deployment
programming interface 70
determining if two are tied 182
saving from interactive evaluator 127
Examples
assigning weights to 182
positive and negative 182
Exponential Instruction Group 228

F
FABS 219
FADD 217, 218
FCHS 220
FCMOVB 222
FCMOVNB 223
FCOMI 221
FCOS 234
FDECSTP 231
FDIV 225, 226, 227
File Menu 83
FINCSTP 231

Fitness Function
custom 187
DSS 158
hits-then-error 181
linear error 177
linear error measurement 177
overview 177
square of the error 177
squared error measurement 177
FMUL 229, 230
Footer 212
FPREM 226
FPU 195
how to get detailed information about 196
preset files 190
Frequency
crossover 152
in generation equivalents 162
mutation 152
Frequently Asked Questions 11
FSCALE 220
FSIN 234
FSQRT 221
FSUB 232, 233
Function Set 189
choosing 197
defined 190
weighting 198
FXCH 224

G
Genetic Programming
algorithm 201
deme 153
initial population 169
mutation frequency 152

parameters
crossover rate 152
maximum number of FPU registers in 195
mutation rate 151
population size 151, 152, 153
reproduction rate 153
reference 201
search operators 202

H
Header 212
Hit-Rate
definition 180
positive and negative 180
reporting of 181
Homologous crossover 168

I
Individual 211
Initial Population 167, 169
Initial Program Size 169
Input
sensitivity analysis 68
Input Impacts Tab 68
Inputs 190
detecting spurious 142
ratio of to constants 167, 203
weight 167
Installation 5
Instruction 203
mutation 203
mutation rate 166
ratio of constants/inputs 167

Instruction Block
crossover 204
homologous crossover 168
length 165
non-homologous crossover 204
reference 211
Instruction Data Mutation Rate 167
Instruction Group
addition 217
arithmetic 219
comparison 221
condition 222
data transfer 224
division 225
exponential 228
multiplication 229
rotate stack 231
subtraction 231
trigonometric 234
Instruction Rate Mutation Box 166
Instructions
choosing 133, 134
FABS 219
FADD 217, 218
FCHS 220
FCMOVB 222
FCMOVNB 223
FCOMI 221
FCOS 234
FDECSTP 231
FDIV 225, 226, 227
FINCSTP 231
FMUL 229, 230
FPREM 226
FSCALE 220
FSIN 234
FSQRT 221
FSUB 232, 233

FXCH 224
installation 5
JB 223
JNB 224
reference 215
that accept a constant 136
that accept a register 137
that accept an input 136
that have no parameters 138
types 136
Intel FPU Registers. See also FPU 196
Interactive Evaluator
adding a line of code in 131
calculating program fitness 129
default program load 129
editing program in 131
initial queue 123
loading a saved program 129
loading programs into 128
opening 121
performance box 130
running program on new (applied) data 130
saving evolved programs 127
saving programs for later use 128
viewing program outputs 130
Introns 144

J
JB 223
JNB 224

L
License Agreement 10
Linear Fitness Function 177

M
Main Window 81
menu bar 82
menus 82
status bar 82, 88
toolbar 82, 87
Maximum Program Size 170, 171
Menu Bar 82
Menus
set up learning 84
Microsoft Word Pad 207
Minimum System Requirements 5
Model Building
analysis 57
steps 13
team models 23
Models
analysis 14
Monitor Project Window 91
current run tab 93
overview 40
overview tab 89
Multiplication Instruction Group 229
Mutation
in genetic programming 151
of data 167
of instruction blocks 165
of instructions 166
rate of constants/inputs 167
Mutation Rate
in genetic programming 151

N
Natural Selection 14
Non-homologous crossover 168, 171

O
Operator
as part of function set 189
complex 143
examples of 190
linear 143
register machine 199
replacing complex with linear 143
stack 199
using preset files to configure 190
Outputs
class one 180
class zero 180
classifying 179
continuous 109
controlling display of 110
of best evolved programs 101
target values 179
Overfitting
detecting 72
eliminating 72
how to address 71

P
Parameters
advanced 157
block mutation 165
choosing 133
data mutation 167
in custom DLL 188
initial program size 169
maximum program size 170

program size and non-homologous crossover 171

selecting 135
types 136
Pentium II 5
Pentium Pro 5
Performance Box 130
Problem Type
automatic detection with project setup wizard 38
regression 22
Program
body 212
calculating fitness of 129
changing in program queue 125, 126
default load 129
editing 131
effect of adding or removing an instruction 133
effect of editing 133
footer 212
header 212
loading a saved 129
manual simplification 141
running on new (applied) data 130
viewing outputs of 130
Program Model
deployment from interactive evaluator 70
Program Models
deployment 69
Program Queue
changing a program from beginning or middle of 126
changing a program from end of 125
effect of optimizing on 140
moving around in 124
viewing fitness and hit-rate changes while browsing 130
what happens when programs are loaded 125
what happens when changes are made to displayed programs 125

Program Size
and constants 191
initial 169
maximum 170
Project
continuing where you left off 45
Defined 15
File 15
finish 14
information available while running 39, 40
runs included 15
starting 14, 33
starting with project setup wizard 39
when to stop a project 42
project detail tab 91
Project File
naming project file with the project setup wizard 39
Project Setup Wizard
starting 88
using 36

R
RAM 5
Random Seed
system clock 171
Ratio of Constants/Inputs 167, 203
Ratio of Constants/Inputs Box 167
Raw Error 178
Register Machine
operator 199
Removing Introns 144
Replacement
instruction block 203
instructions 203
Reporting Period
best of run on training data 110
continuous output 109
Reports Window 94

Reproduction Rate 153

Rotate Stack Instruction Group 231
Run
performing a single run 74
Run Control Parameters
random seed 171

S
Saved Programs
deployment from interactive evaluator 70
Search Operator
block mutation rate 166
homologous crossover 169
in genetic programming 165
instruction data mutation rate 167
instruction mutation rate 166
Selection
by age 161
by difficulty 161
dynamic subset 158
stochastic 161
Set Up Learning Menu 84
Simplification
automatic 145
choosing standard or thorough 145
speeding up 145
Simulated Annealing
reference 204
Small Data Set 32
Sorting 112
Square Fitness Function 177
Stack
operator 199
Statistics Window 113
Status Bar 88
Stochastic Selection 161

Subtraction Instruction Group 231

Supervised Learning 15
System Time 171

T
Target Subset Size 161
Team Models
output
viewing graph of 67
viewing numeric outputs 67
viewing C, Java or Assembler Code 68
viewing performance statistics 65
Technical Support 10
Temporary Computation Variables
FPU register equivalents in 195
number of 195
role of 195
special role of 195
Terminal Set 189
cflag in 191
choosing 192
conditional flags in 191
constants in 191
defined 190
inputs 190
temporary computation variables in 191, 195
weighting 196
Testing Data 206
Testing File 206
Toolbar 82, 87
Tournaments
selection 162
Training
subset 159
tabs 109

Training Data
best of run on 110
defined 30
Training File
ASCII text in 207
creating 207
defined 206
using Microsoft Excel to create 207
Training Subset 159
Training Tabs 109
Trigonometric Instruction Group 234

V
Validation
tabs 109
Validation Data
best of run on 110
Validation File
ASCII text in 207
creating 207
defined 206
using Microsoft Excel to create 207
Validation Tabs 109
Viewing Inputs 101, 111

W
Weighting
function set 198
terminal set 196
Windows
data 101
main 81
statistics 113
Windows 2000 5
Windows 98 5
Windows NT 5