Enterprise Historian: Pavilion Insights
Pavilion Insights
Version 5.0
User’s Guide
3BUF 000 497R0001 REV A
NOTICE
The information in this document is subject to change without notice and should not be construed as a commitment by
ABB Automation, Inc. ABB Automation, Inc. assumes no responsibility for any errors that may appear in this document.
In no event shall ABB Automation, Inc. be liable for direct, indirect, special, incidental, or consequential damages of any
nature or kind arising from the use of this document, nor shall ABB Automation, Inc. be liable for incidental or
consequential damages arising from use of any software or hardware described in this document.
This document and parts thereof must not be reproduced or copied without ABB Automation, Inc.’s written permission,
and the contents thereof must not be imparted to a third party nor be used for any unauthorized purpose.
The software described in this document is furnished under a license and may be used, copied, or disclosed only in
accordance with the terms of such license.
TRADEMARKS
Advant, AdvaCommand, AdvaInform, and AdvaBuild are registered trademarks of ABB Asea Brown Boveri Ltd.,
Switzerland.
See additional legal notices in Preface section.
User’s Guide
Version 5.0
June 1999
Pavilion Technologies, Inc. has made substantial efforts to ensure the accuracy of this document. Pavilion
Technologies, Inc. assumes no responsibility for any damages that may result from errors or omissions in
this document. The information in this document is subject to change without notice and should not be
construed as a commitment by Pavilion Technologies, Inc.
The software described in this document is furnished under a license and may be used or copied only in
accordance with the terms of such license.
Copyright Pavilion Technologies, Inc., 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999. All Rights
Reserved.
The following are registered trademarks of Pavilion Technologies, Inc.: Data Insights, OnLine Transform
Processor, Pavilion, Pavilion Data Interface, Process Insights, Process Perfecter, Sensor Validator,
Simulation Insights, Soft CEM, Soft Sensor, Soft Sensor Insights, Software CEM, Pavilion - Turning Your
Data Into Gold.
The following are trademarks of Pavilion Technologies, Inc.: BOOST, Boiler OnLine Optimization
Software Technology, DataIns, Economic Insights, Insights, OnLine Learning, Pavilion OnLine
Applications, Pavilion RunTime Products, PDI, Plant Optimizer, Power Insights, Power Insights Suite,
Process Optimizer, ProcIns, Production Chain Optimization, Programless OnLine Engine, Property
Predictor, RunTime Application Engine, RunTime Software Controller, Simulation Insights, Soft Sensor
Insights, Virtual OnLine Analyzer, VOA.
Advant, AdvaCommand, AdvaInform, and AdvaBuild are registered trademarks of ABB Asea Brown
Boveri Ltd, Switzerland. Enterprise Historian is a trademark of ABB Asea Brown Boveri Ltd, Switzerland.
AIM is a trademark of W.R. Biles & Associates, Inc.
CM50S and TDC 3000 are trademarks of Honeywell Inc.
Exceed is a trademark of Hummingbird Communications, Ltd.
Foxboro and I/A Series are registered trademarks of The Foxboro Company.
GLOBEtrotter, GLOBEtrotter Software, FLEXlm and Flexible License Manager are registered trademarks
of GLOBEtrotter Software, Inc.
HP, Apollo, and HP-UX are registered trademarks of Hewlett-Packard Company.
IBM, RS-6000, and AIX are trademarks of International Business Machines Corporation.
Motif, OSF/1, UNIX and the "X" device are registered trademarks and IT DialTone and The Open Group
are trademarks of The Open Group in the US and other countries.
OpenVMS, VAX, DEC, and DECnet are trademarks of Digital Equipment Corporation.
PI and PI-ProcessBook are trademarks of OSI Software, Inc.
PostScript is a trademark of Adobe Systems Incorporated.
Sentinel and Sentinel SuperPro are registered trademarks of Rainbow Technologies, Inc.
SUN, Sun Microsystems, and Solaris are registered trademarks of Sun Microsystems, Inc., and SunOS is a
trademark of Sun Microsystems, Inc.
Windows NT, Windows, Excel for Windows, and Notepad are registered trademarks of Microsoft
Corporation.
X Window System is a trademark of the Massachusetts Institute of Technology.
Contents
Preface xiii
How This Manual Is Organized xiii
Release Notes xiv
Year 2000 xv
Input Data Restriction xv
Number of Windows xv
Technical Support xv
Chapter 1: Introduction
Getting Started 1-2
Product Overview 1-3
File Pull-Down Menu 1-3
Edit Pull-Down Menu 1-5
View Pull-Down Menu 1-5
Tools Pull-Down Menu 1-7
File Editor 1-7
File Formatter 1-7
Chapter 5: Spreadsheet
About Datasets 5-1
Transforms 5-2
Status 5-2
Date/Time Reference 5-3
Preprocessing Data 5-3
Creating a Dataset 5-4
Loading a Dataset 5-10
Spreadsheet Window 5-12
Header Information 5-14
Selecting a Region 5-14
Changing Spreadsheet Contents 5-15
Displaying Variable Statistics 5-17
Appendix D: Files
Log Files D-1
Data Dictionary File D-2
Print Files D-2
Format File Suffixes D-2
Dataset File Suffixes D-3
Dataset Report File Suffixes D-3
Model File Suffixes D-3
Model Report and Data File Suffixes D-4
Index
Appendix B, Error Measures, contains definitions and equations for relative error, R²,
and other error measures.
Appendix C, Frequently-Asked Questions, provides tips and hints.
Appendix D, Files, lists and briefly describes the files created or required by the
product.
Appendix E, User-Defined Transforms, explains how to develop and add your own
transforms to the transform calculator.
Some features require the purchase of additional licensing; for more information,
contact your sales representative.
Release Notes
Current release notes are provided with every installation. Be sure to consult the release
notes before using the product.
Year 2000
All Pavilion products (except versions of Process Insights® and Software CEM® earlier
than version 1.5) have been rigorously tested for Year 2000 Compliance. For further
details, contact your customer support representative.
Number of Windows
This product does not limit the number of windows that you can display simultaneously,
but many windowing systems have such a limit, often about 20. Consult your system
administrator for more details.
Technical Support
If you have problems or questions about the product, contact your customer support
representative.
The Pavilion software distribution includes a script, pav_info, that displays
information about your computer’s hardware and software configuration. If you report a
problem with your Pavilion installation, your customer support representative may
request that you run this script and send the output to help with diagnosis. You may run
the script at any time if you feel curious. The pav_info script is located in the same
directory as other Pavilion executables.
Pavilion Insights™ gives you the power to analyze complex linear and nonlinear
processes. Using just your historical process data, Insights analysis tools can:
• Plot process data in a variety of revealing formats: chronological order, variable
against variable, histogram, principal components analysis (P.C.A.), and
correlation.
• Compute basic statistics or use the transform calculator to perform advanced
analysis.
• Rank the effect of process inputs on the outputs.
• Perform what-if scenarios (predictions) using a linear or nonlinear model.
Insights provides all the tools required for complete process analysis: data formatter,
data spreadsheet, data plotter, nonlinear model builder and trainer, model analysis tools,
and what-if scenario screen. The spreadsheet and plotter are integrated with a transform
calculator that not only captures your modifications as equations for further tuning, but
also offers an extensive library of mathematical and practical functions.
When you have learned all you can from your data using the advanced statistical
and plotting capabilities, use the auto modeler wizard to build a model quickly and
easily.
Getting Started
Start Insights from the Start menu: select Start > Programs > Pavilion Technologies >
Insights.
Product Overview
The main screen provides tools and other features through pull-down menus and a
quick-access toolbar. It can also bring up your browser for displaying instructions and
help.
Click the Tutorial or Road Map links to display them in your browser. For more
information on the tutorial and road map, see the Insights Tutorial.
Save
Write the dataset or model to disk. The dataset or model remains loaded. You do not
have to save a model during or after training because the model trainer saves the
training to disk as it proceeds.
Exit
Unload any dataset or model and terminate Insights. If you have made changes to
an open dataset or model but not saved the changes, Insights prompts you to save
them before exiting. If you have been training a model, you do not have to save the
training; the model trainer saves the training to disk as it proceeds.
In the Customize Toolbar window, the Toolbar Buttons list shows the current layout
of the toolbar. The Available Buttons list shows the toolbar elements that you can
add. By default, all tool buttons appear in the toolbar, so the only available toolbar
element is the separator. Use the Add and Remove buttons in the middle to move
toolbar buttons into or out of the toolbar. Use the Move Up and Move Down
buttons to change the location of buttons in the toolbar.
Your changes to the toolbar remain until you terminate Insights. The next time you
start Insights, the toolbar is reset to its default.
File Editor
Create, display, or modify an ASCII text file such as a raw data file. The file editor is the
only Pavilion utility that can change your raw data files. The formatter does not change
raw data files. For more information, see Chapter 3, File Editor.
File Formatter
Use the formatter to describe the format of data files so that the spreadsheet can include
them in datasets.
The formatter can read data files in a wide variety of formats, allowing you to specify
column separators, designate row types, set column data types, associate data columns
with date-time columns, and more. When you finish specifying the format of a file, the
formatter writes a format file describing the organization of the data file. The formatter
does not change the original raw data file.
The File Formatter submenu provides these operations:
New
Format a file starting with the default format settings. This operation performs the
same function as File > New > File Format.
Copy
Format a file starting with format settings already defined for another file.
Edit
Modify the format for an already-formatted file.
Delete
Delete the format file for a data file.
For more information, see Chapter 4, File Formatter.
Data Spreadsheet
Use the spreadsheet to display and modify the contents of a dataset. The data
spreadsheet supports mathematical transform-generated columns and provides access to
data both in its before-transforms state and in its after-transforms state. For more
information, see Chapter 5, Spreadsheet.
Data Plotter
Display and modify an existing dataset graphically. The plotter offers a variety of plot
types, including chronological order (time series), variable against variable (XY),
histogram, principal components analysis (P.C.A.), and correlation. The plotter also
provides a variety of tools for modifying data. The plotter is closely integrated with the
spreadsheet and transform calculator so that any change you make in one is immediately
visible in the others. For more information, see Chapter 6, Data Plotter.
Transform Calculator
Display and modify the transform list for an existing dataset. Any modifications you
make to the dataset using the plotter or, in some cases, the spreadsheet, are accessible as
equations called transforms. You can, for example, clip ranges of values simply by
selecting them in a plot; then you can use the transform calculator to review the
transform implementing the clip and modify it as needed. You can create new variables
by applying mathematical transforms to existing variables. The library of transforms is
extensive, and you can add your own transforms if required. The transform calculator
also allows you to time-merge data so that sampling intervals are uniform and missing
values have been restored using your choice of interpolation or extrapolation technique.
The transform calculator is closely integrated with the spreadsheet and plotter so that
any change you make in one is immediately visible in the others. For more information,
see Chapter 7, Transform Calculator.
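The following sketch illustrates the general idea of a time merge, not Pavilion's
implementation; it assumes Python with the pandas library, and the tag names and
values are hypothetical:

    # Illustration only: merge two irregularly sampled variables onto a
    # uniform one-minute grid. The product also offers extrapolation;
    # this sketch only interpolates, so points before the first sample
    # of a variable remain missing (NaN).
    import pandas as pd

    temp = pd.Series([70.2, 71.0, 70.8],
                     index=pd.to_datetime(["1999-06-01 00:00:10",
                                           "1999-06-01 00:01:40",
                                           "1999-06-01 00:03:05"]))
    flow = pd.Series([5.1, 5.4],
                     index=pd.to_datetime(["1999-06-01 00:00:30",
                                           "1999-06-01 00:02:45"]))

    grid = pd.date_range("1999-06-01 00:00:00", periods=4, freq="1min")

    def time_merge(series):
        # Interpolate on the union of old and new stamps, then keep only the grid.
        return series.reindex(series.index.union(grid)).interpolate("time").reindex(grid)

    merged = pd.DataFrame({"temp": time_merge(temp), "flow": time_merge(flow)})
    print(merged)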
Model Builder
Build a model using either the auto modeler wizard, which uses default settings for a
number of model configuration and training parameters, or the manual model
configuration tool. You must load a dataset before you can build a model.
For more information, see Chapter 8, Building a Model.
Model Trainer
Tune a new or existing model using an existing dataset. For more information, see
Chapter 9, Model Trainer.
Model Analysis
Use the model analysis tools to check the fidelity of your models after they have been
trained and to perform process analyses. Select one of:
Predicted vs. Actual
Plot predicted output values against corresponding actual output values. This plot
shows the accuracy of the model over the range of data used to train the model and
also allows you to calculate residuals, the differences between predicted values and
actual values. Analyzing the distribution of the residuals can provide insight into how
well the model has generalized to the process being modeled (a simple sketch of the
residual calculation follows this list).
Sensitivity vs. Rank
Plot the sensitivity of the output to inputs.
Output vs. Percent
Plot output values against the full range of input values.
Sensitivity vs. Percent
Plot the sensitivity of the output against the full range of input values.
What Ifs
Perform what-if scenarios (predict outputs) using an existing model.
For more information, see Chapter 10, Model Analysis Tools and Chapter 11, What Ifs.
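As a worked illustration of the residual calculation mentioned under Predicted vs.
Actual (plain Python, not the product's internals; the values are hypothetical):

    # Residuals are the differences between actual and predicted outputs;
    # for a well-generalized model they are small and centered on zero.
    actual    = [10.0, 12.5, 11.0]
    predicted = [10.2, 12.1, 11.3]
    residuals = [round(a - p, 1) for a, p in zip(actual, predicted)]
    print(residuals)   # [-0.2, 0.4, -0.3]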
Plug-Ins
Pavilion plug-in technology makes it easy to add wizards, data extractors, and other
functionality to Pavilion products. Plug-in software is released separately from other
Pavilion products. Display the plug-ins by selecting Tools > Plug-Ins. For data extractor
plug-ins, see File > New > Dataset. For modeling plug-ins, see File > New > Model or
Tools > Model Builder. For help, see Help > Plug-Ins. For more information on the
currently-available plug-ins, contact your customer support representative.
Wizards
Using a data extractor wizard to read data straight from your DCS or historian allows
you to build a dataset without having to generate and format raw data files. All you do is
specify the range of time for which you need data and the time interval separating the
data samples. The data extractor wizard reads the required data and builds the dataset.
To run a data extractor wizard, select File > New > Dataset.
The New Dataset window shows icons for data extractors. It also has an icon for the file
formatter, which you use to prepare raw data files if necessary. For more information,
see Chapter 4, File Formatter.
To start the wizard, select the icon and click OK, or simply double-click the icon.
The default values are probably correct for your computer. If not, contact your system
administrator.
In the File Setup section, the history object file should list logs that you intend to access.
If necessary, use the Browse button to locate an alternative history object file. If the logs
you need to access do not appear in the history object file, click Edit to bring up an
ASCII text editor so that you can add the desired entries to the file.
Click OK to close the Options window.
Login Window
In the Introduction window of the wizard, click Next to proceed to the Login window.
Note: Searches are case sensitive. The search strings must match the
case of the log names in the historian.
In the Mask field, use the asterisk (*) to match any number of characters in the name.
For example:
pav*
List all logs starting with pav.
Pav*
List all logs starting with Pav.
In this manner, you can continue to search and select logs for extraction.
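The asterisk matching behaves like case-sensitive filename globbing. The following
Python fragment is an analogy only (the log names are hypothetical); the historian
itself performs the actual matching:

    # Analogy only: case-sensitive glob matching as in the Mask field.
    from fnmatch import fnmatchcase

    logs = ["pav_flow", "pav_temp", "Pav_level"]
    print([name for name in logs if fnmatchcase(name, "pav*")])
    # ['pav_flow', 'pav_temp'] -- "Pav_level" is excluded by case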
The set of logs that you can access is determined by your computer’s history object file,
which you can specify by clicking Options. The Options window (described on page
2-3) also allows you to edit a history object file, adding entries for logs you need to
access.
If you intend to extract data from the same logs again in the future, you may want to
save the current log list by clicking Save Project Logs. This operation saves the list of
logs to a file. Later, instead of searching and selecting logs again, you can load this list
using the Load Project Logs operation.
When the list on the right, Project Logs To Be Extracted, lists all the logs you need,
click Next to continue to the Select Date and Times window.
Note: The start and end times and the extraction interval determine
the size of your dataset. Make sure you have enough disk space for
the amount of data you intend to extract.
When the Use Multiple Intervals feature is turned on, only the intervals appearing in the
list will be extracted. If Use Multiple Intervals is turned off, only the interval specified
in the date, time, and interval fields at the top of the window will be extracted.
After specifying the intervals to extract, click Next to continue to the Data Validation
window.
To validate all logs, click Validate All Logs. The wizard indicates the results of the
validation check in the Status field:
Unknown
Validation not yet performed for this log. Status is set to Unknown whenever you
enter the Data Validation window from a previous window in the wizard.
Good
Extraction of all 500 validation values was successful.
Missing
Some validation values were missing. Review the sample data (see below), and
consider returning to preceding wizard windows to modify your log list and interval
settings.
Bad
The log was not found on the historian.
Note: Data extraction fails if a log is not found on the historian. You
must resolve a Bad status before continuing.
When you select a single log, the operations for validating selected logs, removing logs,
and viewing sample data become active. When you select multiple logs, only the
operations for validating and removing selected logs become active.
When validation has returned a Good or, where acceptable, Missing status for each log,
click Next to continue to the Extract and Save Data window.
The Dataset format is intended for use with Pavilion products and tools such as the
spreadsheet, plotter, model trainer, and so forth.
The ASCII format is a common text file, or raw file. For the ASCII format, you can
specify column headings and separators, and the string to substitute for error values.
If you choose to extract the data into the ASCII format, you can still use the ASCII files
to build a Pavilion dataset later.
For either format, specify a file or dataset name and path name. If the file or dataset
already exists, you are prompted before overwriting it.
Click Finish to begin the extraction.
Click Options to verify and change the port used for communicating with the PDI
network server.
The PDI network server handles the data connection between the data extractor and the
Enterprise Historian. If the server is not available at port 8764 on your computer,
contact your system administrator. After correcting the problem, restart data extraction
by clicking Finish in the Extract and Save Data window.
To interrupt extraction at any time and return to the wizard, click Cancel Data
Extraction.
If you are extracting into a dataset, the wizard loads the dataset into the spreadsheet
upon completion. If the dataset does not exist in your data dictionary, a prompt appears,
asking if you wish to add it.
Use the editor to display and modify common ASCII text files. To invoke the editor,
select Tools > File Editor.
The editor is line oriented and allows you to view and manipulate special characters. It
also includes a global search and replace feature. The editor is intended principally for
making quick corrections to raw data files.
Selecting a File to Edit
The File Browser is used to select a file to be loaded into the editor. It displays a list of
files and subdirectories in your current directory (on OpenVMS systems, it displays and
can access only the most recent version of any file).
There are several methods to change the current directory:
• After you have made one or more directory changes, click Previous to move back
to your most recent previous directory selection, or click First to move back to the
first directory you were in when you entered the File Browser.
• Click on any subdirectory and click Load to move down to that subdirectory.
• Double-click on any subdirectory to move down to that subdirectory.
• Click in the Directory text box, type in the full path of any directory, and press the
Return key to move to that directory (on Windows NT systems, this field is also
used to navigate to undisplayed network drives).
Loading a File
You can enter a File Mask to display only those file names that match a particular
format. The wildcard character is asterisk (*) for any number of arbitrary characters.
Anything that you type in this text box, including backspacing over its contents, is not
processed unless you press the Return key while the cursor is still in the text box.
If the file that you want to edit appears in the list, double-click on it, or click on it and
click Load; or you can type in the name of any file in the displayed directory, or the full
path and filename of any file, and press the Return key. The selected file’s contents will
appear in the editor window.
After you type a name and click Create, the editor is filled with a single line containing
the end-of-line sequence. You can use the Append function in the Edit menu, described
on page 3-14, to add more lines.
Editing a File
The contents of your file are displayed. The Show Special Characters toggle controls
whether nonprintable characters are represented in the file display area. These are the
codes that indicate special characters:
With Show Special Characters turned on, the example file shown on page 3-5 would
look like this:
To select one row of the file, click on it (either on its contents or on its row number). To
select a contiguous group of rows, drag through them, or click on the first one, scroll if
necessary, and shift-click on the last one. The line number(s) selected are displayed in
the information area above the line edit field.
You can edit one row at a time by selecting it. If only one row is selected, its contents
are copied into the line Edit field. Special characters are always displayed in the edit
field, regardless of the state of the Show Special Characters toggle. To change the
contents of a line, select it, then click in the edit field and backspace and type. To apply
changes from the edit field, click in the edit field and press the Return key; to cancel
changes, make a new row selection without pressing the return key.
Note: Pressing the return key simply enters the changes you have
made; it does not automatically put the end-of-line sequence at the
end of the line. Be sure to leave the proper end-of-line characters at
the end of the line, or the line will be combined with the one that
follows it.
box for you to fill in the row number. When you click Go, the file is scrolled.
For additional editing functions, see “Edit Pull-Down Menu” on page 3-14.
You can use this window simply to search, or to search and replace. Type in the text that
you want to Search For and Replace With. If you want to specify any special characters,
use the codes listed in the table on page 3-6.
If you use the asterisk (*) wildcard in the Search For text, for example a*b,
the search will find any instance of a, followed by any number of any characters except
newline, carriage return, or formfeed, followed by b. For a wildcard search across lines
in the file, you must explicitly specify the end-of-line sequence.
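In regular-expression terms, the wildcard behaves roughly as follows (Python shown
purely as an analogy; the editor does not actually use regular expressions):

    # Analogy only: a*b matches "a", then any run of characters other than
    # newline, carriage return, or formfeed, then "b".
    import re

    pattern = re.compile(r"a[^\n\r\f]*b")
    print(pattern.search("xxx a123b yyy"))   # finds "a123b"
    print(pattern.search("a\nb"))            # None: the match cannot cross a line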
The following sample screen shows the end-of-line sequence for UNIX and OpenVMS
systems:
The following sample screen shows the end-of-line sequence for Windows systems:
The Case Sensitive toggle controls whether the search is case-sensitive. The Replace
With text can be left empty if you want to search for text and delete it.
The Search Region specifies the rows affected by the search. If you had already selected
a group of rows before you opened this window, that selection is the default Search
Region; otherwise, the default Search Region is the entire file. Searching can be
Forward or Backward from the current position.
The Search button finds and highlights the next occurrence of the specified Search For
string, without changing it. The Replace button replaces the next single occurrence. The
Replace All button displays the number of occurrences found, and allows you to
Replace them all or cancel. The Count button displays the number of occurrences,
without any replacement. The Done button closes the Search and Replace window.
Menu Bar
The Editor menu bar provides the File pull-down menu and the Edit pull-down menu.
Save File
Save File writes the current contents of the editor back to your file on disk. A
message tells you how many characters and lines were written.
Save File As
Save File As invokes a prompt box for you to type in a file name. You can type just
the file name, or the full path and name. After you specify a name, you are warned
about overwriting, and told how many characters and lines were written.
Delete File
Delete File invokes the file browser, for you to select which file to delete.
The file browser is used to traverse your directory structure and select one file to be
deleted. It functions as described in “Selecting a File to Edit” on page 3-3. After
you select a file, you are asked to confirm.
This chapter explains how to use the formatter to describe the format of your data files.
If you use a data extractor wizard to acquire raw data for building a dataset, you do not
need to use the formatter. To start a data extractor, select File > New > Dataset.
Most manufacturing processes have a mechanism for storing historical process data.
These “data historians” come in a multitude of forms from many different vendors, but
they all tend to have one thing in common: they can write out the data into ASCII text
flat files (columns of data with new items on each row). These files may be of a wide
variety of formats, but most commonly they are space-, tab-, or comma-separated.
Often they have one or more header lines (lines at the top of the file) that
describe the file and the columns.
Before the spreadsheet can read data, it must have information about the format of the
file so that it can read the data correctly. It is the job of the formatter to specify
information about flat files so that the spreadsheet can read the files. The formatter
stores this information in a format file, which contains simple information about the
format such as number of columns and rows, column delimiter, and name and data type
of each column. A formatted file is a data file that has been described in a format file.
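For illustration, the sketch below shows what such a flat file might contain and how its
rows and header lines relate. The tag names and values are hypothetical, and the Python
code merely mimics what the formatter records in a format file:

    # A hypothetical comma-separated historian export: a Tag Name header
    # row, a Units header row, then data rows.
    import csv

    lines = [
        "Date,Time,FIC101,TIC202",
        ",,kg/h,degC",
        "06/01/99,00:00:00,512.3,71.0",
        "06/01/99,00:01:00,509.8,70.8",
    ]

    rows = list(csv.reader(lines))
    tags, units, data = rows[0], rows[1], rows[2:]
    print(tags)      # ['Date', 'Time', 'FIC101', 'TIC202']
    print(units)     # ['', '', 'kg/h', 'degC']
    print(data[0])   # ['06/01/99', '00:00:00', '512.3', '71.0']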
Required
As you prepare the data files, adhere to the following requirements for structure and
date/time information.
Structure
The basic assumptions about your data are:
• It is arranged in rows and columns.
• It is in an ASCII text file.
• Each “variable” (information from a particular data collection point in your
process) is stored in a different column.
Generally, successive rows contain information from successive points in time,
although this is not strictly necessary. Any given variable must be recorded in the same
column in every row. Beyond this basic requirement, the formatter and spreadsheet
allow wide flexibility in data file formats, as described below.
These special requirements apply only to a data file that does not include date and time
information attached to the variables.
Optional
The data file may include header rows that contain information about the variables’ Tag
Names, Comments, and Units.
Every variable is identified by a tag name. An optional comment may also be attached
to a variable, but the comment is not always visible throughout the product. If you will
ever use the Pavilion Data Interface™ to access your data in real time, we recommend
that you use the DCS tags as either the tag names or the comments. It is often
convenient to use the DCS tags as the comments, and brief descriptions such as
“flowrate” as the tag names.
You can also define Units for each variable, but except for date and time variables, this
information is ignored by the product and may be omitted.
If the data file has any other header rows, they can be skipped easily and do not cause a
problem.
Formatter Options
Use the formatter to describe the format of a data file, or change a format that you have
already specified; you can also remove a formatted file’s name from the data dictionary,
deleting its associated format file from the disk.
The formatter displays a data file interpreted according to its current format
specification, and allows you to change the format specification if you can see from the
display that the data file is not described correctly or sufficiently. The formatter is
described in detail beginning on page 4-5.
Changing Directories
If the file that you want to format is not located in your current directory, you can
change to another directory, and its files and subdirectories will be displayed in the list.
There are a number of methods to change the current directory:
• Drag on the Directory option menu to display the complete directory structure
above the current directory. Use this menu to move up one or more directory levels.
• After you have made one or more directory changes, click Previous to move back
to your most recent previous directory selection, or click First to move back to the
first directory you were in when you entered the file browser.
• Click on any subdirectory and click Select to move down to that subdirectory.
• Double-click on any subdirectory to move down to that subdirectory.
• Click in the Directory text box, type in the full path of any directory, and press the
Return key, to move to that directory.
Loading a File
You can enter a File Mask to display only those file names that match a particular
format. The wildcard character is asterisk (*) for any number of arbitrary characters.
Anything that you type in this text box, including backspacing over its contents, is not
processed unless you press the Return key while the cursor is still in the text box.
When the file that you want to format appears in the list, double-click on it, or click on it
and click Select. The formatter will open and load the selected file.
New Format
When you create a new format for a data file, the formatter scans and tries to interpret
the data file’s contents. The data file is then displayed according to this format. You can
change the format specification if you can see from the display that the data file is not
described correctly or sufficiently. When you click Done in any of the formatter steps,
the format specification is saved in a format file and recorded in the data dictionary.
The names of the file being formatted, its format file, and the data dictionary are
displayed. To save the format file into a different file, or to use a different data
dictionary, type its name in the text box and press the Return key.
The first item to inspect is the Column Separator. You have the following choices:
Spaces (and Tabs)
Data items are separated by any nonzero number of spaces and/or tabs.
Comma (and spaces)
Data items are separated by one comma and any number of spaces.
Just One Tab
Data items are separated by exactly one tab. Two consecutive tabs are interpreted as
a missing data point. (Spaces are ignored.)
Fixed Width
Values begin at specific character positions; there are not necessarily any characters
between values.
Special
Data items are separated by some other character not listed above, such as “#”, and
spaces are ignored.
If you choose Fixed Width, then later, when you get to Step 2, you will need to set the
column widths. If you choose Special, click in the text box to the right, and type in it the
character that is the column separator.
The system fills in the number of columns based on the current value of the Column
Separator. You should always check the number of columns because, like the column
separator, it may have been derived incorrectly; if it is wrong, click in the text box,
backspace over the old value, and type in the correct value.
If the value for a particular point is missing or unavailable, some data systems will just
leave it blank, but some data systems will replace it with a character string such as
“####”. If your data file contains any particular character string that takes the place of
missing data, you should type it in the Missing Data box.
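The sketch below restates the separator and missing-data rules in Python; it is
illustrative only (the “####” sentinel is the example string from above, not a fixed
default):

    # "Spaces (and Tabs)" treats any run of blanks as one separator, while
    # "Just One Tab" makes two consecutive tabs a missing data point; a
    # sentinel string such as "####" is also treated as missing.
    MISSING = "####"

    def split_spaces_and_tabs(line):
        return line.split()        # any nonzero run of spaces and/or tabs

    def split_just_one_tab(line):
        return line.split("\t")    # an empty field means a missing point

    def clean(fields):
        return [None if f in ("", MISSING) else f for f in fields]

    print(clean(split_spaces_and_tabs("1.0   2.0\t####")))   # ['1.0', '2.0', None]
    print(clean(split_just_one_tab("1.0\t\t3.0")))           # ['1.0', None, '3.0']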
Row Flags
Row flags are displayed in this screen to the left of each row in the data file. Every row
may be assigned a flag, or its flag may be left blank to indicate that the row contains
data. If header rows in the data file contain Tag Names, Units, or Comments (described
in “Data File Contents” on page 4-2), set those rows’ flags accordingly.
If more than one row is flagged Tag Name, the values in all Tag Name rows will be
appended to form the variable’s tag name; the same is true for Units and Comments.
In the formatter you cannot change any data values, but you can set the row flag so that
rows of bad data are skipped. To move to any particular row, click Go To, and a dialog
box will ask what row number you want to move to. When you click Go, the file is
scrolled until that row number is displayed.
After you have checked all parameters in Step 1 and made any necessary corrections,
move to the top area labelled Move to Step, and click 2. Format Columns.
The display is divided into two areas, separated by horizontal double lines. The top area
is the column information area. In a new format file the Tag Name, Comment, and Units
shown in this area are taken from the header rows that you marked with row flags, if
any; otherwise they are left blank. If the column separations are not marked correctly,
go back to Step 1 (move the mouse to Move to Step at the top of the window, and click
1. Format Rows), make the correction, and then return to Step 2. If there is an error in
the row flags, you can correct it in either Step 1 or Step 2.
If you specified the column separator as Fixed Width, you must adjust the column
boundaries. In the column information area, move the mouse onto the vertical line
between columns; the pointer changes shape to a bidirectional arrow:
With the arrow pointer, press the mouse button and drag the column boundary to the
desired location.
The column information attached to each variable includes its tag name, comment,
units, type, and date/time reference. Even if this information was copied from flagged
header rows, you can still change it. The system fills in its interpretation of type and
date/time reference; you should always check these, and change them if necessary.
To change a variable’s tag name, comment, units, or date/time reference, click in its cell.
The name of the cell that you are editing will appear in the Selection area, its current
contents (if any) will appear in the Edit text box, and the cursor will move into the Edit
box.
Backspace and type as necessary to edit the text, and press the Return key to apply the
change. You can also make changes using the Edit menu functions, as described in “Edit
Menu” on page 4-16. If you enter new values in any cell, either by typing or by the Edit
menu functions, that cell is no longer affected by any row flags; if you want to change it
back, you must type or edit the old value back in.
Units
Units are ignored except for date and time variables; they are provided solely for your
convenience, and you may leave them blank if you wish. The system can understand
many common date/time formats, and for those you can leave the Units blank. If you
have a date/time format that the system does not automatically understand, you simply
type into the Units field a description of the format. Characters such as commas, colons,
hyphens, slashes, and so forth should appear exactly as they occur in the data.
Components of a date or time are indicated by the keys in the following tables. The keys
are not case sensitive.
Key Meaning
m Month number (1-12), no leading zeros
mm Month number (01-12), always 2 digits
mmm Three letter abbreviation for month name (Jan-Dec)
mmmm Month name fully spelled out
d Day number (1-31), no leading zeros
dd Day number (01-31), always 2 digits
y Year (1-2000), no leading zeros, as many digits as needed
yy Last two digits of year (00-99). See Note below.
yyyy Four digit year
www Three letter abbreviation for weekday name (Sun-Sat)
wwww Weekday name fully spelled out
j Day number in the year (1-366), no leading zeros
jjj Day number in the year (001-366), always 3 digits
k Week number (1-54), no leading zeros (see “Week Number,” below)
kk Week number (01-54), always 2 digits (see “Week Number,” below)
h Hour (1-12 or 1-24), no leading zeros.
hh Hour (01-12 or 01-24), always 2 digits.
m Minute (1-59), no leading zeros.
mm Minute (01-59), always 2 digits.
s Second (1-59), no leading zeros.
ss Second (01-59), always 2 digits.
t Tenths of a second (0-9), one digit only.
tt Hundredths of a second (00-99), always 2 digits.
ttt Thousandths of a second (000-999), always 3 digits.
a a.m. and p.m. indicated by a single a or p
p Alternative for “a”.
am a.m. and p.m. indicated by the letters am or pm.
pm Alternative for “am”.
a.m. a.m. and p.m. indicated by the characters “a.m.” or “p.m.”, with periods
and no spaces.
p.m. Alternative for “a.m.”.
Examples
File Contents            Interpretation                               Type in These Units
11/5/96                  November 5, 1996                             m/d/yy
11/5/96                  May 11, 1996                                 d/m/yy
96032                    February 1, 1996                             yyjjj
960201                   February 1, 1996                             yymmdd
233045                   11:30 p.m. and 45 seconds                    hhmmss
1.0, 1.15, 1.30, 1.45    hours and minutes: 1:00, 1:15, 1:30, 1:45    h.m
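Many of these unit strings have close equivalents in other date-parsing notations.
Purely as an illustration, the Python fragment below parses two of the example values;
the mapping from these unit keys to strptime codes is our own, not part of the product:

    # "m/d/yy" corresponds roughly to strptime's "%m/%d/%y",
    # and "yyjjj" (two-digit year plus day-in-year) to "%y%j".
    from datetime import datetime

    print(datetime.strptime("11/5/96", "%m/%d/%y"))   # 1996-11-05 00:00:00
    print(datetime.strptime("96032", "%y%j"))         # 1996-02-01 00:00:00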
If you encounter a date or time in a format that cannot be expressed in this notation, you
can use the editor to change the data file; or in some cases it may be better to tell the
formatter that the values are strings or numbers, and then use transforms to convert them
to date and time (see “Date/Time” on page A-34).
Type
When the formatter creates a new format file, it scans the contents of each column in the
data file and tries to interpret the data Type. You should always check these and change
them if necessary. To change a variable’s Type, click on its Type cell (or drag or shift-
click through multiple adjacent Type cells), and the Edit text box will change to an
option menu listing the available Types.
Most of the Types are self-explanatory. Double means double-precision real. DateTime
and TimeDate refer to columns with both date and time data, in the specified order, in a
single column. EurReal and EurDouble are real and double-precision numbers written in
the European style, with commas rather than periods as decimal indicators.
Date/Time
As explained in “Data File Contents” on page 4-2, data files do not always have to
include date and time information. If date and time information is provided, it can be in
two different columns or combined in one column (but if it is in two columns now, they
will be combined into a single column in the spreadsheet). Either date or time can occur
first. There can be more than one date/time point in each row of the data file; variables
in a single row of the data file do not have to be sampled all at the same time if each
variable is recorded with its sampling time.
The Date/Time reference of a date or time column is filled in as “n/a” and cannot be
changed. The Date/Time reference of a data column is the column number of its
corresponding Date and Time. (If the Date and Time are in two different columns, the
syntax is Date column number, slash, Time column number.) The system will fill in
default values which are the closest Date and Time columns to the left of the data
column. If the system was unable to recognize Date or Time columns correctly, you
must fill in the correct values.
Edit Menu
The Edit menu is available in Steps 2 and 3. Edit functions can be applied to all header
rows except Types (but not to the data rows). You must select a cell in one of these rows
before you invoke any Edit function (except Undo). Only one cell can be edited at a
time.
Undo (Ctrl-u)
Reverses the most recent editing change.
Cut (Ctrl-x)
Copies the contents of the selected cell to the edit buffer, and then erases the cell.
Copy (Ctrl-y)
Copies the contents of the selected cell to the edit buffer.
Paste (Ctrl-p)
Copies the contents of the edit buffer into the selected cell. If it already contained a
value, that value is lost (not appended).
Clear (Ctrl-b)
Erases the contents of the selected cell, without saving to the edit buffer.
Insert (Ctrl-i)
Inserts a new, empty cell to the left of the selected cell, moving the rest of the row
one cell to the right, and losing the contents of the rightmost cell.
Delete (Ctrl-d)
Copies the contents of the selected cell to the edit buffer, deletes this cell, and
moves the rest of the row one cell to the left, leaving the rightmost cell blank.
Note: Any changes that you specify using this Edit menu are applied
to the format file, not to your data file. The formatter never changes
the contents of your data files.
Completing Step 2
After you have checked all parameters in Step 2 and made any necessary corrections,
move to the top area labelled Move to Step, and click 3. Verify Format.
You cannot change a variable’s type in this step, but you can change any of the other
information that you set in Step 2; you can also change each variable’s display format.
If values in a date/time column are displayed as Error or as FmtErr, it means that the
units for that column do not match the column’s contents (or you left the Units blank,
and the file’s actual units are not one of the common formats that the system can
understand by default). Return to Step 2 so you can view the column’s contents, and
type in the correct Units; see “Units” on page 4-12. If you cannot specify the units to
clear these errors, it may be necessary to read the file into the editor and adjust its
contents; see Chapter 3, File Editor.
Display Format
Display format is provided solely for your convenience. A variable’s display format
does not affect how it is read, interpreted, or stored internally; the display format simply
specifies how you want the data to appear on the screen. Display format does not apply
to Integer or String variables.
The display format for a Real or Double variable is an integer between 0 and 20 that
tells how many decimal places to display. The default display for these types is derived
from the number of decimals that occur in that column in the first few data rows in the
data file.
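For example, a Real value stored internally as 3.14159 with a display format of 2
appears as 3.14 on screen, while the stored value is untouched; in Python terms
(illustration only):

    # Display format affects presentation only, never the stored value.
    x = 3.14159
    print(f"{x:.2f}")   # "3.14" is displayed; x itself is unchanged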
The display format for a Date variable has the same syntax as Date Units, as described
in “Units” on page 4-12, except that month and weekday name specifications are case
sensitive: the case of the first letter is copied, and the case of the second letter is used for
all subsequent letters. For example, with a display format of Mmm, January appears as
Jan; with MMM, as JAN; and with mmm, as jan.
If the Date’s Units are left blank, its default Display format is mm/dd/yy; if the Date’s
Units are filled in, its default Display format is the same as its Units.
The display format for a Time variable has the same syntax as Time Units, as described
in “Units” on page 4-12, except that times with am or pm indicators copy the case of the
am or pm specifier. If the Time’s units are left blank, its default display format is
Completing Step 3
When you have finished verifying that the file will be interpreted correctly, click Done
to save the file specification in a format file, record it in the data dictionary, and close
the formatter.
Copying a Format
The Copy Format option is for situations in which multiple data files have identical or
very similar formats. This option makes a copy of a format that you have already
specified. You can then edit this copy if necessary.
The format is copied from one file that has already been formatted, to one or more files
that are not already formatted. Toggle buttons are used to select whether the files have
Identical or Similar formats.
This window lists all of the formatted files recorded in your current data dictionary,
sorted by directory, with number of rows and columns, and start date if known. The
name of your data dictionary is displayed; to change to a different data dictionary, type
its name in the text box. To select a formatted file from this list, double-click on it, or
click on it and click Select. Its name and path will be inserted in the Copy Format
window.
When making a selection from a list, if you notice that any listed file is obsolete, you
can click on it and then click Delete. The format file is deleted from your disk and the
data file’s name is removed from the data dictionary, but the data file itself remains on
your disk. If you remove a format file from this list, the window remains open so you
can continue the original purpose of selecting a formatted file.
This window is similar to the file browser described in “Selecting a File” on page 4-4,
but it has the ability to select multiple files. Click on any file in the Files list and click
the right arrow button, or just double-click on the file, and its name will be copied into
the Selection list. To remove a file name from the Selection list, click on it and click the
left arrow button, or just double-click on it. When the Selection list contains one or
more files that you want to copy the format onto, click Select, to close this window and
return to the Copy Format window.
When you select Edit and Apply to copy formats to multiple files, the formatter comes
up containing the first file in the list; after you click Cancel or Done, the formatter
comes up again containing the next file in the list. In this situation, the formatter is the
same as described in “New Format” on page 4-5, except that it includes two additional
control buttons, Cancel All and Done All. The regular Cancel and Done buttons pertain
only to the format file currently being edited; the Cancel All and Done All buttons will
cancel or finish all files remaining.
Editing a Format
Use the Edit Format option to review and change any format that you have already
specified. Selecting the Edit Format option displays the Select File Format window.
This window lists all of the files that you have already formatted. It operates as
described in “Selecting a Data File that Has Been Formatted” on page 4-22. After you
select a file, the formatter is opened, containing the selected file and its current format
specification. You can cancel, or make changes and save them with the Done button.
Note: If a format has been saved with non-blank values in Tag Name,
Units, or Comment, and you edit the format, the values that you
already saved can be changed only by editing the individual header
cells; changing the Row Flags at this point has no effect.
Deleting Formats
If you no longer need to reference some of your formatted files, you can delete their
formats. The Delete Formats option invokes the Delete Formats window, for you to
select files. The selected files’ names are removed from the data dictionary, and their
format files are deleted from the disk.
The Delete Formats window is used to delete format files that the formatter generated to
describe your data files, and to remove their entries from the data dictionary. It does not
delete your data files.
The name of the current data dictionary is displayed. If you want to use a different data
dictionary file, click in this box, type in its name, and press the Return key.
The Files area lists all formatted files that are recorded in the data dictionary, sorted by
directory, with their corresponding format files. The Selection area lists the files that
you select. There are several methods to move files from one list to the other:
• click on a file and click the arrow button
• double-click on a file
• drag on a group of files and click the arrow button
• click the all=> or <=all button
When the Selection list contains all the formats that you want to delete, click Delete. A
question box will ask you to confirm, and if you do, the formats are deleted and this
window is closed.
Key Concepts
• In Step 1, check & correct the column separators; mark the header (Tag Name,
Units, Comment) rows.
• In Step 2, check & correct the variable types and date/time pointers; optionally
override header values.
• In Step 3, check the interpretation of dates and times; if misinterpreted or Error, go
back to Step 2 and type in the Units (see the table on page 4-12 for syntax). In
extreme cases, use the editor to change the data file, or call them strings and parse
them using transforms.
• After you save a format file that assigns a name to a variable, that name takes
precedence even if you Edit Format and change the Row Flags. The only way that
Edit Format can change saved names is by manually editing the header cells in
Step 2.
This chapter explains how to use the spreadsheet to read data from formatted files (data
files that have been described to the formatter, as recorded in the data dictionary), and to
manipulate the data in the spreadsheet.
Invoke the spreadsheet by selecting Tools > Data Spreadsheet.
About Datasets
Use the spreadsheet to gather variables from formatted files into an internal data
structure called a dataset. When you save a dataset, its name is stored in the data
dictionary. A Pavilion model can only read data that is in a dataset.
A dataset consists of the original (raw) data values (obtained from the process history
through formatted files), and a list of functions or transforms that have been applied to
the data, producing a set of transformed data values. The transformed variables can
include variables that are unchanged from their raw values, variables whose raw values
have been modified by the transforms, and newly created variables generated by
transforms. After you have applied any transforms to the dataset, you can still view and
manipulate the data in its before-transforms state.
Transforms
All transforms on the dataset are kept in one ordered list. This transform list includes
functions that you apply directly from the Transform window (see Chapter 7, Transform
Calculator), and transforms that are automatically generated by a number of user actions
in the spreadsheet and plot windows. Whenever you load a dataset, or perform any
action that requires the transforms to be recalculated, a message displays each transform
as it is being applied. Transforms can be modified or deleted. Any action that generates
a transform can be undone by deleting the transform.
Status
Every data cell in a column has both a value and a status. If the status is OK, you see
only the value; if the status is not OK, the value may be undefined. Categories of bad
status include Cut, Blank, Break, Missing, and Error. The spreadsheet allows you to
change statuses as well as values.
Date/Time Reference
If date and time were in two separate columns in the formatter, they are always
combined into a single date/time variable in the dataset.
Every numeric or string variable is optionally associated with a date/time variable; this
is referred to as its date/time pointer or date/time reference. You can change the date/
time pointer of a raw or independent column, but a computed column inherits its date/
time pointer from the variable(s) on which it depends (thus you cannot build a transform
using two variables with different date/time pointers).
Preprocessing Data
In the preprocessing phase, your dataset is displayed in a spreadsheet format or as a plot;
the default view is spreadsheet. You can have more than one copy of each view open at
the same time, up to your system limits on the maximum number of windows allowed,
but they all contain the same dataset.
You can remove bad data by changing its status or value as described in “Changing
Spreadsheet Contents” on page 5-15; by using plot cuts or clips, described in “Tools” on
page 6-28; or with transform functions described in Appendix A, Transform Reference.
For more information, see “Before Transforms Properties: Analysis Variable Range” on
page 5-45 and “Time Merge” on page 5-35.
The icon window shows any available data extractors in addition to the Formatted
ASCII Files option.
The data extractors create a dataset by reading data directly from the DCS or historian,
thus allowing you to skip the formatting phase and proceed straight to the preprocessing
phase. The Formatted ASCII Files option allows you to select formatted files for
creating the dataset. If you select Formatted ASCII Files, the Select Files window
appears.
This dialog lists all formatted files recorded in the data dictionary, sorted by directory.
The complete path name of the current data dictionary is displayed; if you want to use a
different data dictionary, you can click in the box and type its name; when you press the
Return key, the formatted files recorded in that data dictionary will be listed.
This window is used to select the formatted files from which you will read variables.
The Selection area lists the files that you select. There are several methods to move files
from one list to the other:
• click on a file and click the arrow button
• double-click on a file
• drag on a group of files and click the arrow button
• click the all=> or <=all button
This window displays all variables in all of the formatted files that you selected.
Variables are grouped under the name of their file. The Show Directory toggle button
controls whether you see the full path name of the formatted file.
Each variable is listed by its column number in the data file, tag name, comment if any,
and type. If any variable was not given a tag name in the formatter, it is marked “(no
name)”.
To change the tag name of any variable in the Selection list, click on it, then click in the
New Name box, type the name, and press the Return key. Name changes do not affect
your data files or their associated format files; they only set the name that a variable will
have when it is read into the current dataset.
When you click OK, the Selection list is checked for duplicate or bad variable names. If
duplicate names are found, it displays the first one, and asks whether to correct all the
names automatically or cancel the read:
Variable names may not include an exclamation point (!), a double quote ("), a left brace
({), or a right brace (}). As with duplicate names, if any bad names are found, it asks
whether to correct all the names automatically or cancel the read:
If you select Automatically, long names are truncated, the characters _2, _3, _4, and so
forth, are appended to duplicates, exclamation points and other illegal characters are
replaced with underscores (_), and reading continues. If you do not like the default
resolution names, you can change the names later.
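The following sketch shows the kind of correction described above; it is an assumed
illustration of the stated rules (underscores for illegal characters, _2, _3, and so forth
for duplicates), not the product's actual algorithm, and it omits the truncation of long
names:

    def sanitize(names, illegal='!"{}'):
        # Replace illegal characters with underscores, then number duplicates.
        seen, result = {}, []
        for name in names:
            clean = "".join("_" if c in illegal else c for c in name)
            if clean in seen:
                seen[clean] += 1
                clean = f"{clean}_{seen[clean]}"
            else:
                seen[clean] = 1
            result.append(clean)
        return result

    print(sanitize(["flow", "flow", "temp!"]))   # ['flow', 'flow_2', 'temp_']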
Loading a Dataset
When you click Load a Dataset, the Select Dataset dialog is invoked.
This window is used to select an existing dataset to be loaded. It displays the name of
your current data dictionary, and a scrolled list of all datasets recorded in the data
dictionary, sorted by directory. If you want to use a different data dictionary, click in the
box, type in its name, and press the Return key, and datasets in the new data dictionary
will be listed.
When loading a dataset, if you notice an obsolete dataset name on the list, you can click on it and then click Delete. You will be asked whether to Delete the dataset, Just Remove its name from the data dictionary, or Cancel. Regardless of whether you confirm or cancel, the Select Dataset window remains open so you can continue selecting a dataset to load.
You can load any dataset, regardless of whether it is displayed in the data dictionary
listing:
• If the dataset is listed in this window double-click on it, or click on it and then click
Select; or
• Type the full path name in the Dataset Name box and click Select; or
• Click Browse to invoke the common File Browser, described in “Selecting a File to
Edit” on page 3-3. The Browser displays datasets in the directory that you most
recently accessed, or in the directory from which you started running the program.
A File Mask will automatically be applied so that only dataset names (and directory
names) are displayed. Click on a dataset name and click Select in the Browser, to
enter the name in the Dataset Name box in this window, then click Select in this
window.
After any of these selection methods, the dataset is loaded. If you load a dataset not recorded in your data dictionary, you will be asked whether to add it, and what comment to attach to it. While the raw data is being loaded, a message periodically reports how much of the dataset has been read. If you click Cancel, the dataset loading process is cancelled (you cannot load just a portion of a dataset).
If you click Stop, the dataset is cleared from memory; you cannot load a dataset without
applying its transform list. After the raw data is read and all transforms are applied, the
Spreadsheet window is opened.
You can also use the Select Dataset window to delete a dataset or remove it from the
data dictionary. To do so, select the dataset name and then click Delete. A dialog
appears, prompting you to select one of the following options:
Delete
Delete the dataset files as well as the dataset’s entry in the data dictionary,
Just Remove from DD
Delete the dataset’s entry in the data dictionary without deleting the dataset files, or
Cancel
Cancel the operation without deleting the dataset files or changing the data
dictionary.
Spreadsheet Window
The Spreadsheet window displays a dataset in the format of a spreadsheet. Each column
contains a single variable (if any date and time information was in separate columns in
the formatter, it is automatically combined into a single date/time column). Header
information about a variable is displayed at the top of a column, above the data values.
Row and column numbers in the dataset do not necessarily correspond to row and
column numbers in the raw data file(s) as displayed in the formatter.
The Show Dataset menu allows you to view and manipulate the dataset either in its
normal state after the transform list has been applied (after-transforms), or in the
before-transforms state, showing the raw variables before any transforms are applied.
In the before-transforms state, the background color changes, the menu bar contents
change, and several of the operations buttons become inactive.
Header Information
The information area at the top of each column displays the variable’s Tag Name
(described on page 4-3), Comment (described on page 4-3), Units (described on page
4-12), Display format, and Time Column number. For raw variables, this information is
inherited from the format file. You can change this information as described in
“Changing Spreadsheet Contents” on page 5-15 or “Variable Properties” on page 5-43.
Display format for date/time, real, and double-precision variables is as described on
page 4-19. Display format for integers is “(Int)”, and for string variables is “(Str)”; these
cannot be changed.
Time Col is the column number of the variable’s date/time pointer. For a date/time
variable, this value is “n/a”. If a variable has no associated date/time information, this
value is “none”. If you rearrange the order of columns such that a date/time column is in
a different position, all Time Col values are updated automatically.
Selecting a Region
• To select a single cell, click on it.
• To select a rectangular group of cells, click in one corner of the rectangle and drag
to the opposite corner; or click in one corner, scroll if necessary, and shift-click in
the opposite corner.
• To select an entire row, click on its row number.
• To select a group of rows, click on the first row number and drag to the last row
number; or click on the first row number, scroll if necessary, and shift-click on the
last row number.
• To select an entire column, click on its column number.
• To select a group of columns, click on the first column number and drag to the last
column number; or click on the first column number, scroll if necessary, and shift-
click on the last column number.
• To undo a selection, click in the white space to the right of or below the spreadsheet; for example, on the word “Row” at the far left side of the header section.
The selected area is highlighted. The arrow keys on your keyboard will change your
selection in the arrow’s direction. If the selected area is not visible on the screen, and
you use an arrow key to make a new selection, the dataset will be scrolled such that the
new selection is visible. If a row is selected, the left and right arrow keys will scroll to
the leftmost or rightmost column of the dataset; if a column is selected, the up and down
arrow keys will scroll to the top or bottom of the column. This can be used as a shortcut:
to find the last row in a column that is shorter than the other columns, select the entire
column and press the down arrow.
You can display statistics for either the after-transforms (normal) or the before-
transforms state.
Mean
    $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
Standard Deviation
    $s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$
Skew
    $\frac{1}{n} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^3$
Kurtosis
    $\left( \frac{1}{n} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^4 \right) - 3$
Median (q2)
    $\int_{-\infty}^{q_2} f(x)\,dx = 0.5$
1st Quartile (q1)
    $\int_{-\infty}^{q_1} f(x)\,dx = 0.25$
3rd Quartile (q3)
    $\int_{-\infty}^{q_3} f(x)\,dx = 0.75$
    where f(x) is the distribution of the variable's values.
Trimmed Mean
    Mean of all values inclusively between the 1st and 3rd quartiles.
Trimmed Standard Deviation
    Standard deviation of all values inclusively between the 1st and 3rd quartiles.
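As a cross-check of the definitions above, the basic statistics can be computed with a short Python sketch (an illustration only, not the product's implementation; the quartile interpolation rule here is an assumption):

    import math

    def basic_stats(xs):
        n = len(xs)
        mean = sum(xs) / n
        s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
        skew = sum(((x - mean) / s) ** 3 for x in xs) / n
        kurt = sum(((x - mean) / s) ** 4 for x in xs) / n - 3
        ordered = sorted(xs)
        q1 = ordered[int(0.25 * (n - 1))]     # crude quartile positions;
        q3 = ordered[int(0.75 * (n - 1))]     # the product's rule may differ
        trimmed = [x for x in xs if q1 <= x <= q3]
        t_mean = sum(trimmed) / len(trimmed)
        return mean, s, skew, kurt, q1, q3, t_mean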
You can save the displayed statistics to an ASCII text report file by clicking Save
Report. This invokes a prompt box for you to enter the file name. A default name is
provided.
The Print button is used to print only the statistics that are visible; be sure to calculate
the ones you want before you print. The Print button invokes the Statistics Print setup
window, which sends a PostScript® specification to a file or directly to a printer.
Time Statistics
The Time Statistics button on the spreadsheet invokes the Time Statistics window.
This window displays information about the date/time variables in your dataset. You
can select either the after-transforms (normal) or before-transforms state of the dataset.
The information includes the location of invalid values, and the numbers, location, and
size of increasing, constant, and decreasing intervals. The use of time statistics
information is discussed in more detail in “Time Merge” on page 5-35.
To write the time statistics information to an ASCII text report file, click Save Report.
This invokes a prompt box for you to enter the file name. A default name is provided.
You can specify a row number (within the range of the dataset), or a column tag name,
comment, or number. The tag name or comment is case-sensitive, and must be typed in
exactly as it occurs in the variable. When you click Go, this dialog is closed, the
spreadsheet is scrolled to display your selection, and the selection is highlighted.
The Value parameter is the number to which your data is compared. The Within
parameter is a small tolerance value determining the exactness of a numeric
comparison. This tolerance is applied to Equal, Not Equal, Less Than Or Equal, and
Greater Than Or Equal searches.
For example, Find Value Equal To Value 506.0 Within 0.01 would find any number between 505.99 and 506.01. If you leave Within blank, it defaults to 0.5 times the precision of the Value that you enter; for example, if you specify Value 3.200, Within defaults to 0.0005. Within is used because of round-off errors between the internal precision and your specified display format, and because of the round-off errors inherent in computer storage of real numbers.
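The default tolerance can be pictured as follows; this Python sketch assumes the precision is implied by the number of decimal places typed, as in the example above:

    def default_within(value_text):
        # "3.200" has precision 0.001, so Within defaults to 0.0005.
        decimals = len(value_text.split(".")[1]) if "." in value_text else 0
        return 0.5 * 10 ** (-decimals)

    def equal_within(x, value_text, within=None):
        tol = within if within is not None else default_within(value_text)
        return abs(x - float(value_text)) <= tol

    equal_within(506.005, "506.0", within=0.01)   # True
    equal_within(3.2004, "3.200")                 # True: default Within 0.0005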
All other comparison types require you to specify a Date, or Date and Time, to which
the dataset values are compared.
Do not put quotes around the value being searched for, or the quotes will be considered
as part of the search value.
This type of search is useful for finding rows in the dataset when a combination of
values from several variables indicates a specific state in the process.
A condition is an expression built by combining variable names, constants, parentheses,
relational operators (=, <>, <, <=, >, >=), logical operators (and, or, not), and transform
functions. A condition is built using the same syntax as a transform, as described in
detail in “Syntax” on page 7-7 and Appendix A, Transform Reference. The transform
functions that are allowed are those that operate on a single data point at a time; for
example, $sin(x) is allowed, but $average(x) is not. As in the transform
calculator (Chapter 7, Transform Calculator), variable names that contain characters
other than alphanumerics, or that consist solely of numerals, must be surrounded with
exclamation points; exclamation points are optional for any other variable names;
functions and operators must be preceded by a dollar sign if they have the same name as
a variable; and dollar signs are optional for any other functions or operators. For
example, if a dataset contains variables called pressure, temperature, and flow, then you
could specify a Find Row search with the Condition
((!pressure!>1232.) $and (!temperature!>194.5) $and (!flow!<508.))
Click Count to count the number of matching rows, or click Search to find the first occurrence. After a successful search, this window is
closed and the matching cell or row is selected and highlighted.
Repeating a Search
The Search Again button is used to repeat a search without invoking the Search Window
again. If the most recent search was a Count, Search Again looks for the first match.
Select the orientation, paper size, number of copies, and destination. Consult your
system administrator for a list of valid printer names on your system. If you write to a
file, you can specify it by file name only, or by its full directory path. Select the row and
column numbers to be printed. You can optionally print System Labels, showing the
dataset name and the current date and time; you can also print labels on the top and/or
bottom of each page. Fonts and sizes can be specified independently for the table
contents and the labels.
Copying Variables
There are two methods by which you can copy a variable: Duplicate and Copy Values.
Duplicate copies the history of all transforms that have been applied to the variable, and,
for a raw variable, copies the before-transforms values into a new raw variable. Copy
Values copies the variable’s current transformed values and statuses into a new raw
variable, with no record of the history by which the current values were generated or
calculated.
For example, if a dataset’s transform list contains transforms that modify the variable flow1, and you duplicate flow1, the new transform list will also contain copies of those transforms, applied to the new duplicate variable.
A duplicated variable inherits all properties of the original; copied values do not inherit
the comment or date/time pointer. For more information, see “Variable Properties” on
page 5-43.
Identical selection windows are invoked when you click Duplicate or Copy Values. The
following window is for the Duplicate operation.
There is a scrolled list of all variables in the after-transforms state of the dataset (even if
you are currently viewing the before-transforms state). When you select a variable it is
highlighted, and a default tag name for the new variable appears in the New Name box.
If you do not want to use the default new name you can type in a different one. If you
had already selected a variable in the spreadsheet before you invoked this dialog box,
that variable is the default selection, but you can change it.
After you click Duplicate or Copy, the transform list is applied to the dataset.
Meanwhile, a working message displays each transform as it is applied.
Deleting Variables
The Delete Variables window is invoked when you click Delete in the spreadsheet or in
the plot window.
After you select the variables to be deleted and click Delete, you are asked to confirm.
After you confirm, the variables and any transforms applied to them are deleted from
the dataset.
A variable cannot be deleted if:
• it is a date/time variable that is the date/time reference for any other variable (see
“Header Information” on page 5-14), either in the after-transforms or the before-
transforms state of the dataset;
Time Merge
The time merge function transforms the dataset so that in terms of time stamps and
sampling intervals, the entire dataset is uniform and consistent: each row has a single
time stamp, and rows are separated by a single, consistent time interval. The time merge
function provides several options for the interpolation or extrapolation method used.
A time merge is implemented using the $TimeMerge transform (see “System-
Generated Transforms” on page A-30).
If a date/time value has an Error status, leave the Error in place, and the entire data row will be skipped; replacement values for that time will be generated when you time merge.
If any date/time variable has a value that is legal but erroneous, a time merge will cause
that error to affect the data values negatively. Before doing a time merge, you should
always check the Time Statistics to identify any peculiar values, and correct or remove
them. It can sometimes be convenient to round off the date/time values (see the
$dtRound transform described on page A-10) before doing a time merge.
It is usually preferable to remove bad data values (“outliers”) from the dataset before
doing a time merge. If you time merge first, and then remove the outliers, you should
also remove any interpolated or extrapolated value that was based on the outlier.
For interpolated points, the LinearExtend and SplineExtend options function the same as ordinary Linear and Spline, but they also repeat the last original value until the specified ending time is reached.
Interval is the time difference between successive rows in the resulting time column.
You type in the amount, and select its units from the option menu.
Maximum Time Gap is an optional value used to control whether a gap in the data
(before the merge) is filled by the time merge or is left blank. If a gap in the data is
smaller than the specified Maximum Time Gap, the time merge expands data to fill the
gap. If a gap in the data is larger than the Maximum Time Gap, the time merge indicates
the gap by a data point with Blank status. If all columns are filled with multiple rows of
blanks, time merge collapses them into a single row of points with Break status.
To set the Maximum Time Gap, type in the amount, and select its units from the option
menu; if you will be using this dataset to model a process, the maximum time gap
should be set to a value greater than the time merge interval but less than the time frame
over which extrapolation or interpolation would introduce significant errors. This
maximum time gap is therefore a function of the speed of the process as well as the time
merge interval.
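The gap-handling rule can be outlined in a few lines of Python (a conceptual sketch only; interpolation details and the product's exact statuses are simplified):

    def rows_for_gap(t0, t1, interval, max_gap=None):
        # Rows a time merge emits between two known samples at t0 and t1.
        rows = []
        t = t0 + interval
        while t < t1:
            if max_gap is None or (t1 - t0) <= max_gap:
                rows.append((t, "interpolated"))   # small gap: fill it
            else:
                rows.append((t, "Blank"))          # large gap: leave blank
            t += interval
        # In the product, rows where every column is Blank are collapsed
        # into a single row of points carrying Break status.
        return rows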
Time merge calculations can only be applied to date/time values in strictly increasing
order. Two option menus are used to specify how to handle values out of order: Cut Data
throws out any data that is in decreasing time order; Sort Data sorts the entire dataset
while making the time merge calculations.
If the time values are in random order, sorting can be extremely slow; it may be
preferable to sort the dataset first (using a series of $sort transforms, see Chapter 7,
Transform Calculator and $sort described on page A-23), then write a dataset report
(described on page 5-52) and create a new dataset that is already sorted.
When you click Merge, a $TimeMerge transform is generated and applied to the
dataset (the transform syntax is described on page A-31). This transform may force
some other transforms already on the dataset to be re-evaluated. A working message
indicates which transform is currently being evaluated.
Certainty
Time merge may generate new data by extrapolation or interpolation. For each point, a
record is kept of whether it existed before the time merge or whether it was generated,
and, if generated, how far away it was from known data; this record is called certainty.
After a time merge, the positional help at the bottom of the spreadsheet displays the
certainty for a given point in the dataset.
You can query certainty values with the $certainty transform described on page
A-5, or change them with $setcert described on page A-22. When you train a
model, you can use the certainty values to pay more attention to points with higher
certainty; see “Sparse Data Algorithm” on page 9-16.
The distance from known data that causes a certainty of zero is set with a parameter
called maxCert. This parameter cannot be set in the Time Merge window, but it is
automatically inserted in the $TimeMerge transform (described on page A-31), so you
can modify it (see “Editing the Transform List” on page 7-18). If you use the optional
Max Time Gap, maxCert defaults to the same value; if you do not use Max Time Gap,
maxCert defaults to 12 times the time merge Interval.
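A plausible reading of this behavior in Python (the linear falloff is an assumption for illustration; the guide does not specify the exact shape of the certainty curve):

    def certainty(distance, interval, max_time_gap=None):
        # maxCert defaults to Max Time Gap if one was given,
        # otherwise to 12 times the time merge Interval.
        max_cert = max_time_gap if max_time_gap is not None else 12 * interval
        if distance <= 0:
            return 1.0          # point existed before the time merge
        return max(0.0, 1.0 - distance / max_cert)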
Certainty values are generated by the $TimeMerge transform and do not use any other
information about the variable; in particular, if you time merge the same variable twice,
its certainties after the second time merge do not take into account any poor certainties it
may have had resulting from the first time merge. If you will be making use of certainty
information, never time merge the same variable more than once.
Variable Properties
Use the Variable Properties window to set Analysis Values and Analysis Variable
Ranges (AVR), and to view and change other properties. Invoke it by clicking
Properties in the spreadsheet or in the plot window.
There is a scrolled list of all variables in the current view. Click on the tag name of any
variable in this list, and its properties will be displayed.
Common Properties
For all variables, the Column number, Type, and number of Values are displayed. These
properties cannot be changed. The number of values is the total number of points that
this variable has in the dataset, including cut, blank, missing, error, and break points.
All other properties are displayed in text boxes. To change any of them, click in the text
box, type in the change, and press the Return key. Properties for all variables include the
column header information: tag name, comment, units, display format, and (for
variables other than date/times) the date/time column pointer.
The Show Transforms button brings up the Transform window, described on page 7-3.
If you select any variable in this window, and then click Show Transforms, the
Transform window comes up with that selected variable applied as a Mask, as described
on page 7-7.
After correcting any bad data, you can reset the AVR manually by typing in new values,
or you can use the convenience buttons (or the AVR menu in the before-transforms view
of the spreadsheet) to change the AVR to the current range. Convenience buttons are
provided to set the AVR at any time to the current values of the data range either before
or after transforms; note that either of these is a one-time-only setting to current values,
and does not change the AVR again later if the range changes again later. You should not
normally set the AVR to the after-transforms range if you have applied transforms, other
than clips and cuts, that change the range.
Menu Bar
The spreadsheet and the plot window share most menu bar functions, although some
menus in the spreadsheet are omitted from the plot. In the after-transforms state of the
dataset, the menus are Dataset, Window, Edit, and Reorder; in the before-transforms
state of the dataset, the Reorder menu is replaced with an AVR menu.
Dataset Menu
The Dataset menu provides the following operations:
Create New Dataset page 5-47
Load Dataset page 5-47
Add New Variables page 5-47
Add New Rows (Before Transforms) page 5-48
Add Dataset page 5-50
Inherit Transforms page 5-50
Save Dataset and Save Dataset As page 5-50
Save Dataset Report page 5-52
Clear Dataset page 5-55
Delete Dataset File page 5-55
Load Dataset
This function operates as described in “Loading a Dataset” on page 5-10. If you cancel
the process at any point, a currently loaded dataset will remain loaded; otherwise, the
dataset is cleared immediately before the new one is loaded.
Add New Rows (Before Transforms)
Note that you cannot append to any one column from more than one data file at the same time.
You cannot read new rows into an independent or computed variable; new rows can be
appended only to raw variables (variables that exist before-transforms).
You can append to multiple columns at the same time, but if they have the same date/time pointer, they must have the same length in the before-transforms state of the dataset.
Inherit Transforms
The Inherit Transforms function lets you start with one dataset, for example ds1, that
does not yet have any transforms, and select a second dataset, for example ds2, to
inherit transforms from. The raw columns from ds2 are not touched, only the complete
transform list from ds2 is copied to become the transform list of ds1; the transform list
is then applied to the raw variables of ds1. The Inherit Transforms operation will fail if
any transform from ds2 cannot be applied to the variables in ds1.
When you select Inherit Transforms from the Dataset menu, the common Select Dataset
dialog is invoked. This dialog functions exactly as described in “Loading a Dataset” on
page 5-10.
After inheriting transforms, you should carefully review any functions that are based on
a variable’s row number or date/time value, to be sure they are still appropriate in the
dataset that inherited them. Examples of such functions include, but are not limited to,
$override, $MarkCut, and $TimeCut.
Save Dataset and Save Dataset As
Save Dataset saves the current dataset under its existing name, replacing the version on disk from the last time you saved it. If the dataset does not already have a name, Save Dataset is identical to Save Dataset As.
Save Dataset As saves the current dataset, and stores its name in a data dictionary. The
Save Dataset dialog provides text boxes for you to enter a directory and dataset name,
data dictionary name, and an optional comment.
The dataset name can consist of letters, numbers, and underscores only; no other
characters are allowed. If you enter a name that is already recorded in the data
dictionary, you will be asked whether to overwrite the existing dataset.
You can use any valid data dictionary name. It defaults to the data dictionary that you
have most recently specified, or to your default file.
The comment is stored in the data dictionary with the dataset name, and is displayed
whenever the Select Dataset dialog is opened.
You can save datasets in either binary or ASCII format. Binary data is faster to save and load. ASCII can be viewed with a text editor and transferred between computers more easily.
You can save the dataset in the format used by the current version of Pavilion software
or in the format used by a previous version. This feature is useful if you intend to use the
dataset on a computer that you have not updated with the latest version of Pavilion
software. A given version of the software can use datasets saved in an earlier format, but
the software may not be able to use a dataset saved in a later format.
A separator character is written between those header values. You can select Space, Tab, Comma, or any other single character that you specify.
Default values are provided for the Start and End Row, and the Directory and Filename
of the report file. The default End Row is based on the length of the dataset, not of the
column(s) that are being written.
The Write Format File option is used if you want to read the report file back into a new
dataset; it allows you to bypass the formatter by automatically writing a format file and
recording it in a data dictionary. The next time you select Create New Dataset or Add
New Variables, this dataset report file will appear in your list of formatted files. If you
select Write Format File, you can type in the name of a data dictionary.
Clear Dataset
Clear Dataset erases the internal storage image of the current dataset, giving you an
opportunity to save if you have not yet done so. It does not affect any datasets that you
have saved to disk.
Delete Dataset File
This dialog displays the name of the current data dictionary, and a scrolled list of all
datasets recorded in it (sorted by directory). If you want to use a different file as the data
dictionary, click in the box, type in its name, and press the Return key, and datasets in
the new data dictionary will be listed. To select a dataset from the list, click on it and
then click the Delete button (or, as a shortcut, just double-click on the dataset). You can
type in the full path name of a dataset instead.
Delete
This operation permanently removes the files from your disk.
Just Remove From DD
This operation removes the data dictionary entry without affecting the disk files.
Window Menu
The Window menu contains two entries, New Spreadsheet and New Plot. All
spreadsheet (described starting at Chapter 5, Spreadsheet) and plot windows (described
starting at Chapter 6, Data Plotter) that are open at the same time contain the same
dataset.
Edit Menu
The Edit menu is found only on the spreadsheet. Edit functions can be used on data cells
or header cells; but for data cells they can be used only in the before-transforms state of
the dataset. Edit functions do not generate transforms, so you have no record of any
editing that has been performed. (Similar functionality is available in the
$clearRows, $copyRows, $deleteRows, $dupRows, and $insertRows
transforms.) Only the most recent edit operation can be undone.
Edit Regions
Before selecting any Edit operation (except Undo), you must click or drag in the
spreadsheet to select the region to which the editing will be applied. Except as noted
below, the region can be a single cell, a rectangular group of cells, one or more complete
rows, or one or more complete columns. To select a region, drag from one corner to the
opposite corner; or click in one corner, scroll the window if necessary, and shift-click in
the opposite corner. To select a complete row, click in its row number; to select multiple
rows, drag through their row numbers; to select a complete column, click in its column
number; to select multiple columns, drag through their column numbers.
Note: For edit operations, selecting all of the cells in a row or column
is not the same as selecting the row or column by its number.
The cut/copy/paste operations treat header cells and data cells independently, and they
do not share the same buffering system; you cannot copy from data and paste into a
header, or vice versa.
Edit Operations
Undo (Ctrl-u)
Undoes the most recent editing operation since the current dataset was loaded.
Cut (Ctrl-x)
Moves the contents of the selected cells into the internal editing buffer, and leaves
the selected cells Blank. You can cut all of the cells in a column, but you cannot
select a column by number and cut the entire column. You cannot cut a Tag Name
cell.
Copy (Ctrl-y)
Copies the contents of the selected cells into the internal editing buffer.
Paste (Ctrl-p)
Copies the contents of the internal editing buffer into the selected region. The
region being pasted into must have the same variable types as the values in the
buffer. You can select a single cell that will be used as the upper left corner of the
paste region, or you can select the entire region. If you select a rectangular region,
its size in at least one dimension must match the size of the buffer. You can
lengthen a column by pasting beyond the bottom of a column, but you cannot create
new columns by pasting to the right of the dataset. You can paste into all of the cells
in a column, but you cannot paste when the selected region is one or more complete
columns.
Clear (Ctrl-b)
Removes the selected cells’ contents and leaves them Blank, without saving into
the internal editing buffer.
Reorder Menu
The Reorder menu is available in the spreadsheet only when you are viewing the
dataset’s after-transforms state.
Move Variables
To move one or more adjacent variables, first select the column(s) by clicking or
dragging on their column numbers (as described in “Selecting a Region” on
page 5-14), then select Move Variables from the Reorder menu. Now, when you
move the mouse into the spreadsheet, a moving indicator appears to show the
position into which the columns will be moved when you click. Scroll if necessary,
and when the indicator is where you want to move the columns, click, and the
columns will be moved. To “cancel” the move, just move the pointer back into the
original position
.
Sort Alphabetically
Sort Alphabetically first warns you that you cannot undo the operation, and then it
sorts the transformed variables.
AVR Menu
The AVR menu is available only when you are viewing the dataset’s before-transforms
state. There are two operations: Set to Data Range Before Transforms and Set to Data
Range After Current Transforms.
The concept of AVR applies only to raw variables before they are transformed. Every
raw numeric variable has an Analysis Variable Range (AVR). For more information on
AVR, see “Variable Properties” on page 5-43.
For one or more variables that you select, you can set the AVR at any time to the current
values of the data range in either the before- or after-transforms state; note that either of
these is a one-time-only setting to current values, and does not change the AVR again
later if the range changes again later. You should not normally set the AVR to the after-
transforms range if you have applied transforms, other than clips and cuts, that change
the range.
Either option from this window brings up a dialog used to select the variables to which
the AVR setting is to be applied.
This chapter explains how to use the plotter to display graphical representations of data
in a dataset. Display the plot window by selecting New Plot from the Window pull-
down menu in the spreadsheet.
The Plot window consists of a plotting area surrounded by plot controls, plus plotting
menus and operations buttons that are shared with the Spreadsheet window. The
operations buttons are explained beginning on page 5-32, and the Dataset menu is described in “Dataset Menu” in Chapter 5, Spreadsheet.
You can plot and manipulate the dataset either in its normal state after the transform list
has been applied, or in the “before-transforms” state, showing the raw variables before
any transforms are applied. In the before-transforms state, the background color
changes, the AVR menu appears in the menu bar, and the Duplicate, Time-Merge, and
Transform operations buttons become inactive.
The controls to the left of the plotting area set the plot type and parameters. The specific
parameters that are to be set change with the plot type selection. The controls to the
right of the plotting area are used with any plot type (with some minor exceptions as
noted below).
The Continuous Update toggle button controls whether changes that you make to any
plot controls or selections are drawn immediately, or saved until you click the Draw
button.
If Continuous Update is turned off, a highlight appears around the Draw button
whenever you have specified changes that have not yet been drawn.
Depending on the amount of information that is being plotted, it can take a significant
amount of time to draw a plot. While a plot is being drawn, the Draw button changes to
a Stop button and is highlighted in red. If you click the Stop button, the drawing process
stops.
To make a plot, you select the Line Type, Graph Type, and Plot Type, and set the
parameters that appear for the selected plot type; as soon as you select the variables to
be plotted, the plot is drawn.
Graph Type
Graph Type is used when you select more than one variable to be plotted. For some Plot
Types, you cannot control the Graph Type, so the menu is grayed out.
A Stacked plot draws every selected variable on a separate Y axis and with a common
X axis, except in the histogram plot, where each X axis may be different. An Overlay
plot draws every selected variable on a single pair of axes. A Normalized plot draws
every selected variable on the same X axis, but with independent Y axis scalings, such
that each variable’s minimum and maximum are drawn to the same height on the plot.
These three plot types are illustrated below.
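For the Normalized case, each variable is rescaled so that its own minimum and maximum span the same plot height. A one-function Python sketch of that scaling (illustrative names only):

    def normalized_height(y, y_min, y_max):
        # Scale a value so its variable's min maps to 0.0 and max to 1.0.
        return (y - y_min) / (y_max - y_min) if y_max > y_min else 0.5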
This box lists all variables that are currently being plotted. To display the Y axis
labelling for a different variable, double-click it, or click it and click OK.
When you select a different variable, the Y axis labels are changed to display the scaling
for that variable. The scaling of the plot is not changed.
Display
Legends are drawn above stacked plots, and to the right of overlay or normalized plots;
by turning off the legends, more space is available for drawing the plots.
Selecting Y Variables
All types of plots include in their parameters a button for selecting the variables to plot
on the Y axis, with up-arrow and down-arrow buttons for quick scrolling through the
list of variables.
When you click the Y Variables button, the Y Variable Selection dialog box is invoked.
When you select variables and click OK, the plot is drawn. After you have drawn a plot,
if you click a different plot type, the new plot is drawn with the same Y variable
selections.
The up-arrow and down-arrow buttons are used for quick scrolling through the variables
in the dataset. After you have selected any n variables, the up-arrow (or down-arrow)
button will select the next (or previous) n variables, without opening the Y Variable
Selection dialog. To keep a selected set of variables in the plot while scrolling through
other variables, select the Freeze tool and select the variables you want to keep in the
plot.
Row Number Plot
Row Count and % Rows Visible display corresponding information, controlling the size
of the X axis. If you change either of these parameters, the other one is automatically
updated. To display all rows, click the 100% button. Y variables are selected as
described in “Selecting Y Variables”, above. First Row and Row Count control which
rows of the dataset are plotted. To change either of them after you have drawn a plot,
click in its text box, type the new number, and either press the Return key while the
mouse is still in the text box, or click Draw. If you scroll through the plot using the
scrollbar or the left and right arrow buttons, the value of First Row is automatically
updated.
Time Series Plot
Start Date and Time, Increment, and % Time Visible control the portion of the dataset that is plotted. Increment, which is expressed as a typed quantity with units selected from an option menu, corresponds exactly to the % Time Visible. If you change either of these two parameters, the other one is automatically updated. To display all times, click the 100%
button. Start Date and Time are used as an alternative to the scrollbar or the left and
right arrow buttons, to scroll through the dataset. The default values are Start Date and
Time at the beginning of the dataset, and 100% of the dataset time visible.
To change any of these parameters, click in its text box, type the new number, and either
press the Return key or click Draw.
Y Variables are selected as described in “Selecting Y Variables” on page 6-10. You may
not select any variable that does not have a valid date/time reference. You can view the
date/time reference of a variable in the spreadsheet, or by checking its properties as
described on page 5-43.
Increment and % Time Visible control the scale of the X axis. If you change the Start
Date/Time to values outside the bounds of your dataset, the displayed Increment and
% Time Visible do not change. If you scroll through the plot using the scrollbar or the
left and right arrow buttons, the starting date and time are automatically updated.
XY Plot
XY plots one variable on the X axis and one or more variables on the Y axis. When you
select an XY plot, the parameters that appear are X Variable, Y Variables, First Row,
Row Count, and % Rows Visible.
Row Count and % Rows Visible display corresponding information, and if you change
either of these parameters, the other one is automatically updated. To display all rows,
click the 100% button. You may select X and Y variables in any order; plotting will
occur as soon as you select them both. Y variables are selected as described in
“Selecting Y Variables” on page 6-10.
When you click X Variable, the X Variable Selection box appears. This box simply lists
all variables in your dataset and allows you to select one of them for the X axis. You can
click your selection and then click OK, or double-click your selection, or click Cancel to
cancel the selection.
First Row and Row Count control which rows of the dataset are plotted. To change
either of these values before you draw a plot, click in its box and type the new number.
Important: If you display less than 100% of an XY plot, the portion that
is displayed is selected by row number, not by X value. This is
different from some other commercially available plotting packages.
For an XY plot, the default Line Type is Lines off and Points on. If you turn Lines on,
the lines will connect the data points in order by row number.
Probability Plot
The probability plot, or theoretical quantile-quantile plot, plots the quantiles of the
selected variable(s) against the quantiles of a theoretical normal distribution. This plot is
useful for identifying outliers and data clusters or multiple populations. If the data were
perfectly normally distributed, it would plot in a straight line, with the slope of the line
indicating the standard deviation. The outliers are those points farthest from a line
followed by the rest of the points. If the data consists of multiple populations, there will
be multiple groups of points, each roughly a straight line. This can indicate that the
variable stabilizes around two or more distinct states.
For a rigorous definition of the quantile-quantile plot, consult a textbook or reference on
statistical methods for data analysis. As an approximation only, you can think of the
quantile-quantile plot like this:
Make a plot of the cumulative frequency distribution of your data, with percentage
of data on the X axis and data value at that percentage on the Y axis. Make a similar
plot for a normal curve, with percentage of data on the X axis and number of
standard deviations from the mean on the Y axis. Since these two plots share the
same X axis, you can group the plotted values in triplets of (X, Y, Y'). Drop the X value from these triplets and you will have pairs consisting of the quantile within your data and the corresponding quantile of the normal curve, both measured at the same percentage of data. A plot of these pairs, with normal curve values on the X axis and values within your data on the Y axis, is a quantile-quantile plot.
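The construction can be sketched in Python; the plotting positions used here are one common convention, not necessarily the product's exact rule:

    from statistics import NormalDist

    def qq_pairs(xs):
        # Pair each sample quantile with the normal quantile at the
        # same cumulative percentage of data.
        ordered = sorted(xs)
        n = len(ordered)
        pairs = []
        for i, x in enumerate(ordered):
            p = (i + 0.5) / n                 # cumulative percentage
            z = NormalDist().inv_cdf(p)       # std. deviations from the mean
            pairs.append((z, x))              # (X axis, Y axis)
        return pairs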
When you select a Probability plot, the only parameter is Y Variables. Y variables are selected as described in “Selecting Y Variables” on page 6-10, but note that you cannot select date/time or string variables.
Histogram Plot
When you select a histogram plot, the parameters that appear are Y Variables, First
Row, Row Count, Number of Bins, Bin Size and Offset, Cumulative, and % Data
Shown.
P.C.A. Plot
A Principal Components Analysis (P.C.A.) plot shows component number on the X axis
and the product of the principal component vector with its weight on the Y axis.
You must select at least two Y variables to make a P.C.A. plot, and they must be
numeric variables, not date/times or strings. Select Y variables as described in
“Selecting Y Variables” on page 6-10.
The Normalize toggle controls whether the calculations are based on normalized values,
which are divided by the standard deviation of the data, or unnormalized values, which
are not. Selecting Normalize does not normalize the results.
The default graphical representation of P.C.A. data is as a series of colored squares. The
range from -1 to +1 times the value of largest magnitude is divided into a number of
equal portions, and each portion is assigned a color; the default colors are shades of blue
for negative values, and shades of red for positive values. The number of portions is
equal to the number of colors specified as color resources. The size of each square is
proportional to the magnitude of the value.
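The encoding can be pictured with a small Python sketch (the number of color bins per sign is arbitrary here; in the product it is set by the color resources):

    def square_for(value, largest_magnitude, bins_per_sign=5):
        # Map a value to a color bin and a relative square size.
        frac = value / largest_magnitude               # in [-1, +1]
        shade = min(bins_per_sign, int(abs(frac) * bins_per_sign) + 1)
        color = ("blue", shade) if value < 0 else ("red", shade)
        return color, abs(frac)                        # size ~ magnitude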
If the Vary Size toggle is turned off, all squares are the same size; if Vary Color is off, all
squares are black; if both are off, no squares appear. The Line Type controls for a
P.C.A. plot default to Points and Lines both turned off, but they can be turned on; if on,
they are plotted on an unmarked Y axis that is identical to the range of the calculated values.
The operation adds transforms that calculate the scores for each component. You specify
a prefix to identify the new variables, or you may accept the default.
Note: The algorithms used for the Correlation plot do not detect
nonlinear correlations. Variables in your dataset may be strongly
related nonlinearly even though the Correlation plot indicates no
correlation. To analyze nonlinear correlations in your data, use the
model builder’s Find Time Delays feature (described on page 8-12).
In the Correlation plot, data is represented in the same style of colored boxes as the
P.C.A. plot, described above. The Normalize, Vary Size, and Vary Color toggle buttons,
and Line Type and Graph Type selections, function as described above for a P.C.A. plot.
The parameters for a Correlation plot are an option menu to select Y vs. Y or X vs. Y plotting; X Variable and Y Variable selection buttons; toggle buttons to set Normalize, Vary Size, and Vary Color; a Use Time toggle; and other parameters that depend on the setting for Use Time.
Y vs. Y is a shortcut for making identical selections for the X and Y variables; X vs. Y
lets you make independent selections for X and Y variables. If Y vs. Y is selected, the
X Variables button is grayed out.
If Use Time is turned off, the additional parameters are X Tau and Y Tau. These values
specify an integer number of rows that the X and Y variables should be shifted before
the correlation calculations are made.
If Use Time is turned on, the analysis calculates correlations for a group of Y variables
against a single X variable that is time-shifted through a range of time steps. The
additional parameters are Start Tau, Stop Tau, and Interval, the Y vs. Y toggle
disappears, and the X Variables (plural) button becomes an X Variable (singular) button.
For example, if you set Start Tau to -10, Stop Tau to +10, and Interval to 1, correlations are calculated with the X variable shifted by every offset from -10 rows through +10 rows.
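Conceptually, each tau shifts the X variable by that many rows before an ordinary correlation is computed. A Python sketch under that reading (the sign convention for tau is an assumption; requires Python 3.10+ for statistics.correlation):

    from statistics import correlation

    def lagged_correlations(x, y, start_tau=-10, stop_tau=10, interval=1):
        # Correlation of y against x shifted by each tau, in rows.
        results = {}
        for tau in range(start_tau, stop_tau + 1, interval):
            if tau >= 0:
                xs, ys = x[:len(x) - tau], y[tau:]
            else:
                xs, ys = x[-tau:], y[:tau]
            results[tau] = correlation(xs, ys)
        return results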
To generate a report file of correlation statistics, click Write Corr Report. The operation
prompts you to specify a file name.
Correlation values can be written into columns in the dataset using any of the transforms
$correlation, $corr, $covariance, or $covarTD, described on page A-8.
Y Axis Limits
To change the limits of a Y axis, move the mouse over the Y axis legend until it is
highlighted, and click; the Axis Limits box will appear (note that you must click on the
highlighted region and not on the axis line itself).
You can set a variable’s Y axis limits to the limits of its displayed values, or the limits of
all of its values in the dataset, or to any arbitrary minimum and maximum value. If you
choose to display data’s min and max or variable’s min and max, the Display Cuts
toggle button in the Plot window controls whether the values of Cut and Break points
are considered in calculating the min and max. When you click Apply, this dialog is
closed and the plot axis is changed as specified.
Crosshairs
After a plot is drawn, two black arrows appear in the lower left corner of the drawing
area.
Tools
The plotter provides a number of tools to help you examine data and select data for
cutting.
The informational tools are Info (on page 6-31) and Zoom (on page 6-37). The tools that
change the dataset, and thus generate transforms, are Clip, Cut (three variations), and
Uncut (on page 6-30). The Freeze tool (on page 6-38) is used to select variables to retain
in the plot while cycling through the other variables. The currently-selected tool is
outlined and highlighted. The default tool is Info. Not all tools are available for all types
of plots; if a tool is not available, its icon is grayed out.
In the after-transforms state of the dataset, clips, cuts, and uncuts are applied to the
dataset as transforms, so they can be viewed, modified, or deleted, like any other
transform. This can be especially useful if you are having difficulty getting precise
mouse movement in a crowded plot; just use the mouse to mark approximately the area
that you want, then open the Transform window, select the transform, and modify it to
the exact values that you want.
However, in the before-transforms state of the dataset, clips, cuts, and uncuts change the
data directly, with no record of the change that was made, and no direct Undo capability.
Cuts and uncuts can still be rescinded with an opposite action (either before- or after-
transforms), or with a $changestat transform, but there is no Undo capability for
before-transforms clips. If you destroy data by a before-transforms clip, the only way
to recover is to re-create the dataset from the raw data file. Points that have been clipped
or cut are indicated in a plot by colored dots. Cut points can be hidden if you turn off the
Display Cuts toggle button, but clipped points are always visible. When a point has been
cut, its value is still known, but its status is Cut so the value is not used in calculations.
You can restore a cut point to OK status by marking it with the Uncut tool, or by
removing the transform that cut it, or by clicking in its cell in the spreadsheet and typing
OK.
Uncut: $UnMarkCut, applied from a Row Number plot
Uncut: $UnTimeCut, applied from a Time Series plot
Uncut: $UnScatCut, applied from an XY plot
When you Cut Y or Clip, the transform is based only on the variable’s value. When you
Cut Box, Cut X, or Uncut, the transform applies the cut or uncut according to the X axis
in the current plot type: $MarkCut and $UnMarkCut are based on row numbers in
the dataset, $TimeCut and $UnTimeCut are based on date and time in the variable’s
date/time column reference, and $ScatCut and $UnScatCut are based on row
numbers combined with pairs of values in the selected variable and the X axis variable.
Any method of uncut can be used after any method of cut.
When you apply a clip, cut, or uncut to an Overlay or Normalized plot, the system
generates one transform for every variable in the plot, even if a particular variable does
not have any points that are affected by the action.
The plot tools provide a quick and easy means to cut data from the dataset that you are
currently using; but cuts from Cut X or Cut Box, which are based on a row number or
date and time in the current dataset, will have no effect (or even an undesirable effect) if
you inherit the transforms onto another dataset, or if you save the dataset to be used in
real time. It is almost always preferable to remove data from a dataset by using a transform that describes the conditions under which data should be ignored.
For a large dataset with a large number of transforms, it can take a considerable amount
of time to apply the dataset’s transform list. You can minimize the calculation time if
you apply cuts and uncuts carefully. For example, suppose that you intend to cut points
from rows 4167 through 4200, but find that you missed and accidentally cut rows 4150
through 4200. You could simply apply an uncut to rows 4150-4166; or you could open
the transform window, find the cut in the transform list, and modify it to apply only to
the correct rows. Modifying the transform is more work for you, but it means that when
the dataset’s transform list is evaluated, only one transform (from these actions) will be
calculated, instead of one cut and one uncut.
Info
Info is used to display information about a plotted point.
Click on the Info icon, move the mouse to the desired point, and push and hold the
mouse button. While the button is held, the coordinates of the point and any other
pertinent information will be displayed.
Information for row number plot:
Clip
Clip is used with numeric variables only, to change data values above or below a
selected value to be equal to that value.
It affects the entire variable, even if not all of it is visible in the plot. In a single clip
action you can set either an upper limit or a lower limit, but not both at the same time.
To set an upper limit, click on the Clip icon, move the mouse anywhere in the top half of
the plot to be clipped, push and hold the mouse button, and drag the mouse vertically on
the plot. As you drag the mouse, the Y value of its current position is displayed, the area
to be clipped is highlighted, and every affected point is marked with a yellow vertical
arrow.
The vertical arrows are difficult to distinguish when the plot shows a large number of points, but they become easy to see when fewer points are displayed.
Cut Y
Cut Y is used with numeric variables only, to cut all data values above or below a
selected value.
It affects the entire variable, even if not all of it is visible in the plot. In a single Cut Y
action you can set either an upper limit or a lower limit, but not both at the same time.
The mouse actions are the same as described above for clipping. To set an upper limit,
click on the Cut Y icon, move the mouse anywhere in the top half of the plot to be cut,
push and hold the mouse button, and drag the mouse vertically on the plot. As you drag
the mouse, the Y value of its current position is displayed, the area to be cut is
highlighted, and every affected point is marked. When you reach the Y value that you
want to set as the upper limit of the data, release the mouse button, and click Apply; all
data points greater than that value will be cut. A lower limit is set similarly, starting in
the bottom half of the plot.
Cut Box
Cut Box is used to define a rectangular area of a plot within which all data points are cut
from the dataset, based on time or row number.
Click on the Cut Box icon, move the mouse to one corner of the rectangular area you
wish to cut, push and hold the mouse button, drag the mouse to the opposite corner of
the rectangle, and release the button. The rectangle will be highlighted, and all data
points within it will be marked with a red box.
The red boxes are difficult to distinguish when the plot shows a large number of points, but they become easy to see when fewer points are displayed.
When you click Apply, all data points within the rectangle will be cut from the dataset.
Cut X
Cut X is used to cut all values within a range that is defined on the X axis, based on time
or row number.
To cut from all plots, move the mouse into the area between any two plots and drag
through the desired area.
As you drag the mouse, the area to be cut is highlighted, and every affected point is
marked with a red box. After marking the area to be cut, release the mouse button, and
click Apply; all data points within the marked range will be cut.
Uncut
Uncut is used to restore the previous value of some points that have been cut using any
of the cut tools.
Uncut is not the same as undoing a Cut; to undo a Cut, find it in the transform list and
delete it. Click on the Uncut icon, and drag through a rectangular area (as described
above for Cut Box). It’s OK if the region includes points that were not already cut. Click
Apply and any previously cut points within the rectangle will be uncut.
Zoom
Zoom is used to magnify or “zoom in on” a portion of a plot.
Click on the Zoom icon, and use the mouse to drag through a rectangular area (as
described above for Cut Box). Cancel and Apply Zoom buttons will appear.
Click Apply Zoom and the selected region will expand to fill the entire plotting area, and an Undo Zoom button will appear. You may repeat Zoom as many times as necessary. When you click Undo Zoom, the original plot will be restored.
Freeze
Use the Freeze tool to select variables to retain in the plot so that you can use the up-
arrow and down-arrow buttons to cycle through the other variables, comparing them to
the “frozen” variables.
First select the Freeze tool, then click one or more variables in the plot. The border on variables selected for freezing changes from a dotted line to a solid line. Now use the up-arrow and down-arrow buttons to the left of the plot area to cycle through the other variables. The frozen variables remain in place, so you can compare them against each new page of variables.
Printing a Plot
The Print button is used to print a copy of the plotting area (not the entire window). It
invokes the Plot Print setup window, which sends a PostScript specification to a file or
directly to a printer.
Select the orientation, paper size, number of copies, and destination, and whether to
print in color (Color On) or black & white (Color Off). Consult your system
administrator for a list of valid printer names on your system. If you write to a file, you
can specify it by file name only, or by its full directory path. You can optionally print
System Labels, showing the dataset name and the current date and time; you can also
print labels on the top of the page, and along the X and Y axes. Fonts and sizes can be
specified independently for the labels.
This chapter describes how to use the transform calculator to add, display, and delete
transforms in your dataset. The effects of transforms appear immediately in the after-
transforms view of the dataset. To invoke the transform calculator, click Transform at
the bottom of the spreadsheet.
A transform function is a mathematical or logical operation that may be applied to a
variable or group of variables in the dataset. The result of the function can replace the
values of an existing variable or can be stored as a new variable. You can apply more
than one transform sequentially to the same variable.
All transforms of a dataset are kept in one ordered list, which is displayed in the
Transform window. The transform list includes functions that you apply directly from
the Transform window, and transforms that are automatically generated by a number of
user actions in the spreadsheet and plot windows (see “System-Generated Transforms”
on page A-30). When transforms are being applied to a dataset, a working window
displays each transform as it is being calculated. In some circumstances this message
includes a Stop button that allows you to cancel the evaluation, but this feature is not
always available.
Transform Order
The internal architecture of spreadsheet software is typically either equation-based (also
called formula-based) or data flow.
In an equation-based system, each column (or, in some applications, each cell) has its
own formula or list of formulas. The system keeps track of which columns reference
which other columns, so that calculations are done in the correct order. If A already
exists in the spreadsheet, and you create a new column B=2*A, then any time you
change a value in A, B is automatically recalculated. No matter what you do to A, B
always retains the relationship 2*A. The user never has to be concerned with the order
in which formulas are applied, or with maintaining the relationships among variables.
Many commercial spreadsheet programs use an equation-based architecture.
The simple-to-use and easy-to-understand equation-based architecture, however, is not
compatible with modifying or deleting any transforms that were done before a time
merge. These capabilities require a data flow architecture.
In a data flow system, the formulas function like a programming language. There is one
ordered list of formulas for the entire spreadsheet, and the formulas are calculated
strictly in the order in which they appear in the list. If you enter the formula B=2*A, it
only means “set B to 2 times the value that A has right now”; if you subsequently enter
any formula that changes A’s value, B is not affected. This architecture is far more
powerful than equation-based, but it requires considerable attention and understanding
from the user to ensure that formulas are entered in the correct order to produce the
desired results.
The transform calculator implements a restricted data flow architecture that emulates an
equation-based system. When you enter a new transform, instead of being placed at the
bottom of the list, it is automatically inserted in a position which retains the
relationships that you have already specified. If you enter the formula B=2*A, and
subsequently enter any formula that changes A’s value, that new formula is inserted in
the list above B=2*A, so B’s value is forced to change, in a manner that is compatible
with the data flow concept. For any particular transform, there may be a number of
different “legal” positions; if necessary, you can rearrange the order of the transform
list, but you cannot rearrange it so that it stops emulating an equation-based system.
This architecture supports modifying or deleting transforms that were done before a
time merge.
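For example, suppose the transform list already contains
!B! = 2 * !A!
and you then enter
!A! = !A! + 10
Because the new transform changes A’s value, it is automatically inserted above !B! = 2 * !A! in the list, and B is recalculated, just as in an equation-based system.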
Transform Window
The Transform window is used to build, modify, rearrange, and delete transforms. It is
invoked when you click the Transform button in the spreadsheet or in the plot window.
At the top of the window are text boxes for specifying the variable name and the
expression, with control buttons to clear the text boxes or enter the transform into the
transform list.
The calculator buttons, and lists of variables and transform functions and constants, are
used solely as a convenience in constructing a transform; their only action is to place
text into the Variable and Expression text boxes. If you prefer, you can type the text
from the keyboard, without using these buttons and lists.
The Multiple button invokes the common Variable Selection box. This is a convenience
for placing multiple variable names into a transform, automatically provided with the
required syntax (as described on page 7-7).
A trigger button, labeled with the ellipsis (“...”), appears next to some of the more
complex functions such as the $TimeMerge transform. Click the trigger button to
display a dialog for specifying arguments.
The transform list is the list of all transforms that have been applied to the dataset (and
that match the mask, if any). In this display, the complete transform list is numbered
sequentially, but the index numbers are for display only and are not part of the transform
syntax.
The bidirectional arrow button is used to expand the transform list vertically to fill the
window.
Masking is used to selectively display only a portion of the transform list. It is primarily
used to display all transforms that have been applied to a specified variable, but it can
also be used for any other lexical matching. Masking is applied from the Current Mask
text field; type in any text and press the Return key, and the transform list display will be
limited to transforms that contain that text (ignoring case). As a convenience for
displaying transforms that have been applied to a specified variable, click on that
variable in the Mask list; its name and the appropriate punctuation will be inserted in the
Current Mask text field.
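For example, typing flow1 in the Current Mask text field limits the display to transforms whose text contains flow1.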
If you select a spreadsheet column and then open the Transform window, or if you open
the Properties window, select a variable, and click Show Transforms to open the
Transform window, a mask is automatically applied to display transforms that have that
variable as output.
The Show All button removes the current mask and displays the complete transform list.
Syntax
The syntax of each transform function is given in Appendix A, Transform Reference;
this section discusses general syntactic rules. The syntax of each function is also
displayed in the positional help at the bottom of the window when you move the mouse
over the function’s name in the list. For example, placing the mouse over the average function displays its syntax in the positional help.
A semicolon (;) comments out the remainder of a transform (see “Miscellaneous Buttons” later in this chapter). Note, however, that you cannot make a transform that consists solely of a comment, without a valid expression. For example, to suppress the transform
!A! = $ln(!A!)
you could modify it to
!A! = !A!;$ln(!A!)
When you create a new variable, its tag name can be any valid name that you wish; it
does not have to match or indicate any tag in your process. Valid names must not be
longer than 72 characters, and may not contain an exclamation point (!), a double
quote ("), a left curly brace ({), or a right curly brace (}).
Expressions can include extra blank spaces, tabs, and redundant parentheses. Names of
variables, functions, and constants are not case sensitive; sometimes we display and
document them in uppercase or mixed case, but this is solely for ease of recognition.
If the output of a transform is the same variable as one of its inputs, you can substitute
the symbol $self instead of repeating the name of the input.
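For example, instead of
!A! = $ln(!A!)
you can write
!A! = $ln($self)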
Numeric constants can be entered in decimal or scientific notation.
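For example, 1234.5 and 1.2345e3 denote the same value.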
Date/time constants are specified as date followed by time, separated by at least one
space, surrounded by backslash (\) characters. The date and time must be in a form that
the parser can recognize; a number of these are documented in “Units” on page 4-12.
The recommended form is
\mm/dd/yy hh:mm:ss.ttt\
with seconds and thousandths optional. There are additional forms that can be
recognized, including
\3 days\
and
\4.1 hours\
Character string constants must be surrounded with either double quote (") or single
quote (') characters. There are two options for typing a character string constant with
embedded quotes; either type the embedded value twice, or surround the string with the
other quote style. For example, the string
ab"cd
could be typed as
"ab""cd"
A shortcut syntax lets a single entry create several transforms at once. For example, one form calculates five new variables containing fft information on an input variable called signal, while another form omits the second and third outputs, calculating only the frequency, magnitude, and phase information.
The five-output form generates five different transforms, each placed in the transform list in its own correct position (not necessarily the order in which you typed the output variables).
When you use this syntax, each output variable must be surrounded with exclamation
points and the list must be separated with commas, but you must not surround the list
with parentheses. The syntax that uses parentheses is only for a single transform that has
multiple outputs.
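For illustration only, such a comma-separated list might look like this (the output variable names are hypothetical; see Appendix A for the actual $fft output list):
!freq!, !mag!, !phase! = $fft(!signal!)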
The default number of rows is the length of the dataset at the time the transform was
created. If you wish to specify a different length, you can do so when you create the
transform, or you can modify the transform after it has been created. (If you modify the transform
and remove the length completely, the default length will automatically be put back in.)
Arithmetic Operators
+ unary positive or binary addition
- unary negative or binary subtraction
* multiplication
/ division
^ raise to the power (not necessarily a positive integer)
% percent; N% is replaced internally with N/100 (N must be a constant)
These operators function as on an ordinary calculator.
After a comparison transform that tests whether A is less than 123, the value of New will be 1 in any data row in which A is less than 123, and New will be 0 in all other data rows.
After a similar transform involving both A and B, the value of New will be 0 in any data row in which both A and B are less than 123, and New will be 123 when A or B or both reach 123.
A nested conditional transform can be expressed verbally as “if A is less than 10, the result is 1; otherwise, if A is less than 100, the result is 2; otherwise, if A is less than 1000, the result is 3; otherwise, the result is 0”. See also the $findLE transform, which functions as a multiply-nested if-less-than-or-equal.
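The same logic can be written as a single transform (a hedged sketch, assuming $findLE takes alternating threshold/value pairs followed by a default value, as in the pattern-flag example under “Use Variable” in Chapter 8, and noting that $findLE compares less-than-or-equal rather than strictly less-than):
!New! = $findle(!A!, 10, 1, 100, 2, 1000, 3, 0)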
Miscellaneous Buttons
\ Special key symbol used to surround date/time constants.
: Used in time values, and to specify the column length of new independent
variables.
space Used at your convenience to insert blank spaces for easier reading.
" Used to surround character string constants; interchangeable with single
quote (').
; Comments out the remainder of the transform, as described in “Syntax” on
page 7-7.
For example, if you click average in the Functions and Constants list and then click on flow1 in the Variables list, the corresponding terms appear in the Expression field.
When you first open this window, the window manager’s focus should be in the
Variable text box; if not, move the mouse outside the window and back in, and, if
necessary, click in the text box. After you fill in the Variable, if you press the Return
key, the focus will automatically move to the Expression. At any time you can click the
Clear button; both text boxes will be erased, and the focus will move back to the
Variable. After you fill in the Expression, if you press the Return key or click the Enter
button, the new transform is inserted in its correct position within the transform list, and
any affected transforms are recalculated. A message informs you which transform is
currently being calculated.
(If you enter a transform that is parsed but cannot be evaluated, the entire transform list
is reapplied before a message notifies you of the failure, to ensure that the dataset is
restored to a safe condition.) After the transform list has been calculated, the text of the
new transform remains in the Variable and Expression fields, because it is common to
make several similar transforms sequentially. If you don’t want to re-use any of this text,
simply click the Clear button to erase it.
The transform list Edit functions are in the Edit pull-down menu.
While you are editing the transform list, it is likely to take several steps to complete the
editing that you intend; therefore, recalculation is automatically suspended until you
select it. When you edit the transform list, the Cancel Changes and Update Dataset
buttons, which are normally grayed out, become enabled (and are highlighted in red).
Note: When these buttons are enabled, you can continue to edit the
transform list, but you cannot enter new transforms for automatic
insertion, or save the dataset, or perform any action in the
spreadsheet or Plot window that would generate a transform, until you
click Cancel Changes, Update Dataset (recalculate), or Done
(recalculate and close the window).
Exceptions to this are Delete All, which is automatically followed by recalculate, and
Copy, which does not change the list. Cancel Changes only cancels the transform list
editing that has not yet been applied; it does not cancel any changes that have already
been applied to the dataset.
Append
Append changes the Enter button to Append, and allows you to create one new
transform that is placed at the bottom of the transform list.
The Append operation bypasses the rules that would insert it only in a valid position.
The Clear button cancels the Append operation.
Insert Before
You must select (click on) a transform in the list before you can select Insert Before.
Insert Before changes the Enter button to Insert, and allows you to create one new
transform that is placed immediately before the selected transform, bypassing the rules
that would insert it only in a valid position. The Clear button cancels the Insert Before
operation.
Modify
You must select (click on) a transform in the list before you can select Modify. As a
shortcut, just double-click on the transform. Modify changes the Enter button to Modify
and the Clear button to Cancel, and copies the selected transform back up into the Variable and Expression text boxes.
The result of a modified transform is placed in the transform list in the same position it
was in before it was modified, which may no longer be valid for it if you have changed
any variable names in it. (If you modify one transform, but you don’t change any of the
variable names that it uses for input or output, its position in the list will almost always
still be valid.)
When you modify a transform, you may not use the shortcut syntax described on page
7-11 to enter multiple transforms simultaneously.
Windows that are invoked from trigger buttons, as described on page 7-6, are used only
for creating new transforms and cannot be used to modify them.
Delete
You can select one or more transforms from the list and then Delete them. To select one
transform, click on it; to unselect it, control-click. To select a contiguous group, drag
through them, or click on the first one and shift-click on the last one. After selecting
transforms, select Delete from the menu.
Delete All
If the full transform list is displayed, Delete All deletes them all; if the transform list is
masked, Delete All deletes only the ones that are visible. You are asked to confirm, and
the transforms are deleted and the transform list (if any of it remains) is immediately
recalculated.
Cut
You can select one or more transforms from the list and then Cut them. Cut removes
them from the list, but saves them in an internal Cut/Copy/Paste buffer so you can Paste
them later. The cut transforms remain in the buffer until you exit the Pavilion product or
until you cut or copy again. The contents of the buffer remain intact even if you load a
different dataset.
Copy
You can select one or more transforms from the list and then Copy them. Copy saves
them into the internal Cut/Copy/Paste buffer without removing them from the list. As
with Cut, the copied transforms remain in the buffer until you exit the Pavilion product
or until you cut or copy again. The contents of the buffer remain intact even if you load
a different dataset.
Paste
Paste inserts the transforms from the Cut/Copy/Paste buffer back into the transform list. The only restriction of order is that no dataset variable can be used as an output after it has already been used as an input to a different variable.
User-Defined Transforms
There is some limited capability for the transform calculator to access C functions that
you write, treating them as special-purpose, customized transform functions. Some
examples are provided, and appear in the User-Defined group of functions (but do not
appear in the All functions list). For more information, see Appendix E, User-Defined
Transforms, or contact your customer support representative.
General Rules
Control statements can reference or change variables in the dataset; they can also assign
and query values for Tcl variables, which are variables that do not appear in the dataset
and are known only to the set of control statements.
This chapter explains how to build a model. As an alternative, you may want to use the
auto modeler wizard instead (select File > New > Model).
Building a model involves selecting a dataset to use as the basis for the model,
specifying which variables are inputs and which are outputs, identifying any time delay
relationships that exist between inputs and outputs, and selecting portions of the dataset
for training, testing, and validation sets.
The features for building models are in the Model window. You invoke the Model
window by selecting Tools > Model Builder.
Before building a model, you may need to time-merge the dataset that you will use for
training the model. For more information, see “When Time Merging Is Required” on
page 5-35. After building and saving a model, proceed to Chapter 9, Model Trainer.
Prediction
In a prediction model, the variables are categorized as input or output. You can use a
prediction model for predicting outputs and for performing control and optimization
(also called predict inputs and predict setpoints).
FANN
FANN (focused-attention neural network) models are useful for analyzing processes
having intermediate measurements or indicators that are dependent on the manipulated
controls or external measured disturbance variables.
In a FANN model, the variables are categorized as independent input, dependent input,
or output.
Independent variables include controls that can be manipulated by the user (for
example, by downloading a new setpoint to a PID as long as the PID is not in manual
mode), as well as external measured disturbance variables input to the system, such as
feedstock, that come from outside the system, are measured and known, but are not
dependent on any of the other variables, and cannot be changed by the operator.
Dependent variables are intermediate measurements or states of the process that change when one of the controls is changed. A FANN model handles these relationships in two stages, identified as Phase 1 and Phase 2.
[Figure: FANN model structure, showing the Phase 1 model (independent controls/externals to dependent states) and the Phase 2 model (initial and predicted states to outputs).]
Custom
A custom model allows advanced users to specify the number of hidden nodes for the neural network; hidden nodes are of interest only to advanced users.
PCR
Principal components regression (PCR) is a standard multivariate-statistical variant of
ordinary least squares regression where the PCA (principal components analysis)
components of the original inputs are used as inputs to the linear regression. PCR is
useful for reducing the dimensionality of input data in cases where there is an
insufficient amount of data available relative to the number of variables. A complete
discussion of PCR is beyond the scope of this manual. The model builder feature for
PCR models is discussed in “Select PCR Components” on page 8-58.
External
An external model is any set of equations or algorithms that you have prepared and
compiled yourself. If you wish to develop an external model, contact a customer service
representative.
File Menu
For all models, the menu bar contains the File pull-down menu. The operations Load
Dataset, Load Model, and Delete Model File are described in “Pull-Down Menus” on
page 9-2. The operations Save Model, Save Model As, and Clear Model are described
in “Saving or Abandoning the Model” on page 8-62.
Use the variable selection list to choose which variables will appear in the Build Model window. Selecting a variable here does not commit it to being in the model, but simply allows it to appear in the grid.
In the Build Model window, below each variable name is a grid cell that is used to set
the variable’s usage in the model. In the simple case of a process with no time delays,
the variable type definitions are set in the single row of cells immediately below the row
of variable tag names. To mark variables, first click on the variable type, and then click
in the cell under the variable names; you can mark a group of adjacent variables by
dragging through all the cells. To unmark a variable, first click the Clear button, and
then click in the cell under its tag name. The type name will appear in the cell. Any
variable left blank is not used in the model. If you make a mistake, you can change or clear any variable type as often as necessary. For example, to select flow1 as an input, click the input variable type and then click in the cell below the flow1 tag name.
You must specify a model interval. It should be the same as the time interval between
rows in the dataset. If this interval can be determined automatically, it is displayed;
otherwise, a field is provided for you to fill in; select the units from an option menu.
Time Delays
To model a process having significant time delays between process variable
interactions, specify a time delay value for each variable. The time delay value adjusts
the temporal relationship of a process variable with respect to the other variables in the
process.
The terms tau and time delay are used interchangeably to mean a quantity of sampling
intervals.
[Figure: an example process in which a change in Input1 reaches Output1 after a 2-hour delay, and a change in Input2 reaches Output1 after a 1-hour delay.]
In the example process illustrated above, a change in Input1 will affect Output1
approximately two hours later, and a change in Input2 will affect Output1
approximately one hour later. The time delay values assigned to these three variables
depend on the data sampling interval and on the position of the zero time delay, the
point where you decide to run the model.
You can also use the same variable more than once in a model. Every process input or
output variable can have as many time delays as necessary, that is, the same variable
may be used with different time delays to serve as multiple inputs and/or outputs in a
model. However, steady-state models typically have only one time delay per variable.
When you choose to identify time delays, a confirmation prompt appears. Click No to return to the Model Builder; click Yes to continue to the Time Delay Identification window.
The left side of this window lists all variables in the dataset. The right side contains two
lists, one for model inputs and one for outputs. To copy variables from the dataset list
into either of the model variable lists, click or drag on the variable names, then click on
one of the right arrow buttons. An alternative is to click in one of the model variable
lists, which will mark its right arrow button with a light blue highlight; then double-
clicking a variable in the dataset list will copy it into that selected model variable list.
When you finish selecting model inputs and outputs, click Done. A grid appears in the
Time Delay Identification window listing the input variables along the left side and the
outputs along the top.
There is one grid cell for each input/output pair. Each cell indicates the time delay for
the pair and the method for determining the time delay. You set the method by selecting
one or more cells and then selecting the method in the Edit Delay Settings section.
For each cell, specify one of the following methods for identifying time delays:
Automatic
The time delay identification algorithm determines the best time delay based on
nonlinear correlations in the dataset. Specify the range to be searched as a Min and
Max Time Delay, and specify an Increment (a number of time delays) at which to
search.
For example, the range 0-10 and default increment 1 cause the time delay
identification facility to evaluate the nonlinear correlations at time delays 0, 1, 2,
and so on up to 10, after which it selects the time delay having the greatest
correlation. In the grid, automatic delays that you have not yet calculated are
represented by three asterisks (***).
Examine each input/output pair, and use your knowledge of the process as much as
possible to set the range. It is crucial that the range cover the entire area where you
believe the best time delay resides; on the other hand, you can minimize execution
time by not specifying an unnecessarily broad range.
Automatic time delay identification generates sensitivity statistics that you can use
later (in “Finding the Best Inputs” on page 8-21) in determining which inputs, if
any, are not necessary in the model. To take full advantage of the Find Best Inputs
feature later, you may wish to select automatic time delay identification for as many
inputs as you have time and resources to calculate now, or all inputs if possible.
Incremental
For incremental delays, type in a Min and Max Time Delay and an Increment. For example, with a Min of 1, a Max of 8, and an Increment of 2, the variable is defined as an input at time delays 1, 3, 5, and 7.
None
Do not use automatic time delay identification for this input/output pair, nor specify
an explicit time delay. As a result, the pair has no time delay, and the model has no
connection between this input and output. If you determine later that you need to
create a connection between the input and output, do so by adding the input at the
desired time delay in the Model Builder and then use the Edit Connectivity facility
(described on page 8-32) to specify the required connections. In the Model Builder
window, non-fully-connected variables are marked with an asterisk (*).
Note: For use in this Time Delay Identification window, the time delay
values refer simply to the relative time between the two variables in
the selected grid cell, so they should be zero or positive; this differs
from the absolute time delay that will be assigned to each variable
within the model, in which all input variables are zero or negative. The
absolute time delays will be automatically assigned later, when you
create the model from the Model Output window, described on page
8-21.
Writing a Report
The Write Report button can be used to write a file showing, for every pair of input and
output variables, all of the time delays that were considered and the calculated nonlinear
correlation value for each. This button invokes the Enter Filename dialog, which is used
to enter a file Name and Directory for the report.
When you have calculated all automatic time delays, click Configure Model. The Model Output window appears.
The Find Best Inputs plot shows only the inputs for which automatic time delay identification has been
performed. Inputs with manually-specified time delays or no time delays do not appear.
The sensitivities plotted are the peaks detected within the Min and Max time delay
boundaries you specified for automatic time delay identification.
Use the Info tool to display variable name, time delay, and peak sensitivity for points in
the plot.
The sensitivities shown in the Find Best Inputs plot may not match the sensitivities you
generate later using the Sensitivity vs. Percent model analysis tool (described on page
10-36) because different algorithms and methods are used in each case. The values shown by Find Best Inputs, generated by automatic time delay identification, are purely statistical, whereas the methods used for Sensitivity vs. Percent analysis combine statistical techniques with the information available from a fully-trained neural network model. The values shown in the Find Best Inputs plot are
not as accurate as those generated for Sensitivity vs. Percent analysis, but the Find Best
Inputs values are still sufficient to represent the relative sensitivities of the inputs.
To mark the lower-sensitivity inputs (those to the right of the marked area) for removal, click Apply.
Red points mark the inputs that will be removed from the model when you click Done.
By default all output variables are set to time delay zero, but you can change them to
any other time delay. It is also possible to specify more than one delay per output
variable, but it is not typically appropriate to do so.
When you click the expand button, it changes appearance and the grid expands to a matrix. You cannot expand the grid before a dataset is loaded.
The names of the variables are displayed across the top. The relative time is listed along
the left side, displayed both as increments of the model interval, and as time units if
known. Boxes appear for you to specify the vertical scrolling limits and the increment
between displayed rows; you can specify these as time units or as rows.
For example, consider a model that uses the antifoam variable twice, once at time delay -8 and again at time delay -4.
Each instance of the antifoam variable would be considered a different variable and
counted separately in the Model Statistics window and any other display that listed
model variables.
Edit Connectivity
This window displays the model variables, with their time delays, in a grid. Each grid
cell marks the intersection of one input and one output. Click one of the top three
buttons on the left to set connections throughout the entire model. If you want to specify
both linear and nonlinear connections in the same model, click Custom Connectivity.
You can select an individual grid cell and specify the type of relationship between that
one input and output. You can specify a nonlinear connection, a linear connection, or no
connection at all. All three selections can be used within the same model. As in the model builder grid, you first select the type of connection, and then click a grid cell to set it to that type. Changes that you make in this window are applied to
the model immediately, with no opportunity to cancel. If any cell in a row or column is
empty (connectivity “None”), the affected input and output are marked with asterisks in
the Build Model grid.
Internal Parameters
The Internal Parameter Count fields appear only for custom models having nonlinear
connections or nonlinear outputs.
The Internal Parameter Count indicates the number of hidden units (nodes) used in the
network. The quality of the model is ordinarily not very sensitive to this number;
however, in difficult problems, the overall accuracy of the model can be sensitive to the
number of hidden units.
Given a good testing set, it is impossible to use too many hidden units because the
number of effective parameters in the network always starts at the number of input
variables and grows with training time; over-fitting (too many effective parameters) is
avoided by stopping training when performance on the testing set begins to degrade.
Ultimately, the model reflects the parameters and other settings that produced the
greatest accuracy in the testing set. Given a good testing set, the only drawback to using
a large number of hidden units is that larger networks are slower to train.
The Internal Parameter Count can be set to the default value or to a custom number that
you specify. Before you begin to define a model, there is no default, so it is displayed as three asterisks. The default is calculated and displayed only when you
click the Update Default button. To use a number different from the default, click and
type in the box labeled Use Custom.
Gain Constraints and Extrapolation Training
CAUTION: Do not use the gain constraints feature unless you know
the true process gains and can specify them adequately. Enforcing
incorrect gains can result in an unreliable model.
The Gain Constraints feature allows you to improve your model in two ways:
Enforce accurate gains for optimization
When using a steady-state model for optimization (predict inputs) or as part of a
Process Perfecter dynamic control model, it is crucial that the steady-state model’s
gains be correct. You can specify minimum and maximum gain boundaries and
other parameters that the trainer uses in forcing the model to take on the proper
gains.
Perform extrapolation training
Ordinarily, a model functions reliably only in the operating range represented in its
training data. Using the Gain Constraints feature, however, you can enforce gains
across the entire input space, thus dramatically improving the accuracy of the
model outside operating regions covered by the training data. Extrapolation
training occurs by default when you specify gain constraints.
To display the gains in a model, use Output vs. Percent model analysis (described on
page 10-22) or Sensitivity vs. Percent model analysis (described on page 10-36). If the
gains differ from what you know to be correct, you may want to utilize the Gain
Constraints feature and retrain the model.
To evaluate your model’s accuracy when extrapolating, or generating predictions
outside the range of its training data, use the What Ifs tool (see Chapter 11, What Ifs).
The display shows gain-constrained training parameters for each input/output pair. The
defaults shown indicate that there are no constraints on the gains. For this default
condition, the model trainer derives the gains entirely from the training data.
To change the gain constraint parameters, select one or more input/output pairs. With
pairs selected, the window displays the parameters.
Note: Extrapolation training does not occur unless you have also specified gain constraints. Gain constraints may be specified for any input/output pair.
The Gain Constraints window allows you to specify how many random extrapolation
patterns to generate for training.
With the parameters set to generate one extrapolation pattern per ten rows of training
data, roughly 10% of the data is randomly generated throughout the input space
indicated by the variable bounds (described on page 8-65). The extrapolation patterns
are not included when calculating the training and testing errors. They serve only to
expand the response surface across areas having no data. To turn off extrapolation
training, set Number of Extrapolation Patterns to 0.
A warning may appear stating that the gain constraints you specified cannot be achieved across the entire range of the indicated input variable(s).
If the Output Function of the output variable is Nonlinear, the maximum gain that the
model can achieve across the entire range of the input variable is the dataset range of the
output variable divided by the dataset range of the input variable:
(Max_output - Min_output) / (Max_input - Min_input)
where Max_output is the variable bound for the output variable, and so forth.
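For example, if the output variable’s bounds are 0 to 100 and the input variable’s bounds are 0 to 50, the maximum gain that can be achieved across the entire input range is (100 - 0) / (50 - 0) = 2.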
If the Output Function of the output variable is Linear, however, the model can accommodate any (finite and reasonable) gain constraint range. These ideas may be understood as follows.
The Nonlinear Output Function saturates at the minimum and maximum of the historical dataset. Therefore, the maximum gain that can exist across the entire range of the input variable is equal to the slope of the line running from the lower left corner to the upper right corner of the variable bounds.
Monitoring Training
In the Training window, there is a Gain Constraints button in the Monitors area; it opens a window that displays the Percent of Training Patterns Violating the Gain Constraints for
each input/output pair. This information allows you to see to what extent your specified
gain constraints are being met by the trained model. If, for any input/output pair, the
percentage is close to or less than the Pattern Tolerance parameter you specified for that
pair, the model is conforming to your gain constraint specifications (while fitting the data as well as possible within those constraints). This information is also used to
determine the best epoch (see below). If the percent of patterns in violation is
significantly exceeding your Pattern Tolerance specification, see “Trouble Shooting
Gain Constraints” on page 8-47.
Best Epoch
In gain constrained training, Autostop (as well as the Auto Learning Rate Adjustment)
will be fixed in the Off position. In determining the best epoch in gain-constrained
training, constraint satisfaction must be considered in addition to the test relative error.
The gain constraints have priority; therefore, the best epoch is determined as follows: if the gain constraints have not yet been satisfied (within tolerance; see above) in any epoch, the epoch with the lowest test relative error is the best epoch; otherwise, the best epoch is the epoch having the lowest test relative error among all of the epochs satisfying the gain constraints.
Setting Patterns
When a dataset is used in a model, each row of the dataset must be a single sample of all
the input and output variables for the process at a given point in time. All the variables
in a dataset row must be sampled at the same time. (See “Time Merge” on page 5-35 for
information on how to achieve this.) A data pattern, also called a model pattern, is a
complete set of inputs and outputs as presented to the model. If there are time delays
among the variables, a data pattern will contain values gathered from different rows in
the dataset.
During training, the error is measured between the target output (from the dataset) and the predicted output (from the model). The testing pass of the epoch does not alter the model, but compares its output to the target output of the test patterns to provide a means of determining the success of the training.
In addition to the required sets of testing and training patterns, it is advisable to reserve
a portion of the dataset for validation. The model is never exposed to the validation
patterns while it is being trained. The final measure of a model’s performance (after
training is completed) should be how closely it calculates the output values for the
validation set (see “Predicted vs. Actual” on page 10-2). The validation set should
normally be a block of data at the end of the dataset (this is the default behavior). It is
not necessary that the validation set statistics match the statistics of the rest of the
dataset (unlike the test set, which should closely match the training set); however, you
must make sure that the validation set does not contain input or output points that are
outside the range of the remainder of the data. It is possible to specify a validation set
from patterns distributed throughout the dataset, instead of using a block of data at the
end, but this is not generally recommended. The validation set is typically optional (for
required use, see “Stiff” on page 9-13); when used, it should ordinarily be roughly
between 5% and 10% of the data.
The training performance of a model can depend on which portions of the dataset are
chosen as the test set. After you save a model, the Model Statistics window (described
on page 8-64) is invoked. You should examine the mean and standard deviation of the
testing and training sets; they should differ by no more than about 5%. The test set
should be at least 10% of the total of training and testing data; between 10% and 20% is
generally sufficient.
Filtering
The model will learn the relationships in the training data. If the training data contains redundant (or nearly redundant) values, the model will pay more attention to the patterns that occur most often. If your process has more than one operational mode, but spends more time in one mode than the others, this causes “mode dominance”, in which the model does not learn all operational modes equally well. If this problem occurs, you can filter out the
redundant data (after the dataset is divided into testing, training, and validation
patterns).
This window is used to specify testing/training/validation sets and filtering. When you
load a dataset and a model, the dataset is first divided into the testing/training/validation
sets, and then the filtering is applied, and finally, any unusable rows are discarded (for
example, at the beginning and end of the dataset, around breaks, and rows containing
invalid points). Because the unusable rows are not discarded until last, the number of
patterns in each group is not necessarily what you initially specified. The Model
Statistics window, which comes up when you save or load a model, contains an option
to write the final pattern selection into a column in the dataset; for more information,
see “Model Statistics Window” on page 8-64.
The test set can be selected at fixed intervals in the dataset, or at random, or according to
values specified in a variable in the dataset. With interval or random selection of the test
set, the validation set is a block of rows; for a customized validation set, you must select
Use Variable for the test set.
Interval
An interval test set is selected by dividing the dataset into blocks. The validation set is
selected first, by specifying a Start row and a number of Samples (or 0 to suppress the
validation set). The number of samples can be greater than will actually fit in the dataset
(for your convenience in specifying “from here to the end” without having to calculate
an exact number of rows).
After the validation set has been reserved, the testing set is specified from the remainder
of the dataset as blocks of data at regular intervals, and all other rows are used for
training. Start is the row number of the first testing row. Testing Samples is the number
of rows in each block of test rows; Training Samples is the number of rows to skip
between blocks of test rows (that is, the number of data rows to use for training). At
least one block of test rows is always included in the test set; Iterations is the number of
repetitions in addition to the first block. Iterations can be greater than will actually fit in
the dataset, but at least one specified block must fit.
The default values can be used in a simple system without time delays. If your process
includes time delays, the number of testing samples in a block should, as a rule of
thumb, be at least twice the size of the span of time delays in the process (that is, the difference between the smallest and largest time delays in the model). For example, a 200-row dataset might be divided as follows:
Rows 1-15: Test
Rows 16-100: Train
Rows 101-115: Test
Rows 116-190: Train
Rows 191-200: Validate
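One parameter setting that would produce this layout (the numbers are inferred from the layout, not prescribed): a validation Start of 191 with 10 Samples, a testing Start of 1, Testing Samples of 15, Training Samples of 85, and Iterations of 1 (one test block in addition to the first).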
Random
With Random test set selection, you specify the Validation Set exactly as for an Interval
test set; then you select a percentage of the remainder of the data to be used for testing.
The default is 15%. If the dataset has a relatively small number of rows, it may be
difficult with Random selection to achieve a good statistical balance between the testing
and training sets; if this occurs, you should select Use Variable instead of Random.
Use Variable
The Use Variable option for test and validation sets is a customized selection used for
unusual situations in which neither Interval nor Random selection is suitable; instead,
you create a variable in the dataset with code values that indicate how each row is to be
used. The variable can be created in the preprocessor using the transform calculator, or
it can be a Pattern Flag variable that was generated automatically and appended to the dataset (see “Model Statistics Window” on page 8-64). The variable must contain only the pattern-use code values; no other values are allowed. You can create the variable by assigning
numeric values to it, or by using the transform constants $m_ignore, $m_train,
$m_test, or $m_valid (described on page A-17).
You must specify which variable to use.
You can enter the variable’s name, or you can click Select Variable to invoke the Test
Set Variable Selection window. This window lists all variables in the dataset; however,
only numeric variables can be selected. When you click OK, this window is closed and
the selected variable’s name is entered in the Select Test Set window.
For example, suppose that you want to divide a dataset into groups of 500 rows, such
that in each group the first 100 rows are used for testing, the next 300 rows for training,
and the final 100 rows for validation. You can create variables called PatternFlags
and showPatternFlags, and assign their values using these transforms:
!PatternFlags! = $findle($row $mod 500, 0, $m_valid, 100, $m_test, 400, $m_train, $m_valid)
!showPatternFlags! = $ttv(!PatternFlags!)
Train Filter
The filter selection can be FIFO (first in, first out), Nearest Neighbor, or None. A FIFO
filter concentrates on the most recent data; a Nearest Neighbor filter groups data by
similar values in n-dimensional space. The filtering parameters are Cells per Input and
Patterns per Cell. Cells per Input is the number of divisions to make along each axis in
n-dimensional pattern space; Patterns per Cell is the maximum number of similar
patterns allowed before the filter begins to discard patterns. These parameters are
calculated by default, and should not be changed without first consulting your customer
support representative. For more information, see “Filtering” on page 8-49.
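As a rough illustration of how these parameters interact (the numbers are hypothetical): with 3 inputs and Cells per Input of 10, the pattern space is divided into 10^3 = 1000 cells, and with Patterns per Cell of 5, the filter begins discarding patterns from any cell that already contains 5 similar patterns.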
If you specify a filter, every time you load the model, you will be asked whether to Use
or Disable the filter.
Additional Features
In addition to the common features described above, each model type has additional
features, which are accessed by controls in the Build Model window.
Select PCR Components
By default, a PCR model uses all components. Click Calculate PCR to calculate and graph the components according to cumulative variance.
Optionally, you can select unnormalized scaling instead of normalized, the default. In
normalized scaling, calculations are based on values divided by the standard deviation
of the data; in unnormalized scaling, the values are not divided by the standard
deviation. If you change the scaling, you need to click Calculate PCR again. Both
options yield components and scores based on mean-centered inputs (x - <x>, where
<x> is the average of x).
Information appears in the positional help field at the bottom of the window.
Use the Include Left tool to select the number of components for inclusion in the model.
You can also select PCR components by entering a number in the Number of
Components text box. Implicitly, specifying a number of components indicates the
strongest (leftmost) components.
The report produced here is the same as the one produced when you select Write PCA Report in a PCA plot of a dataset (on page 6-18).
After you have selected the number of components that you want included in the model,
click Done to return to the Build Model window.
When you save the model, the model builder calculates the PCR statistics, effectively
training the model. The PCR model is immediately ready for use. It does not require the
epoch-by-epoch training that other model types require.
To review PCR scores for the dataset, bring up the Predicted vs. Actual model analysis
tool (on page 10-2), specifying that internal parameters should be appended to the
current dataset. For PCR models, the columns added to the dataset are the scores.
Other tools for principal components analysis (PCA) are the $pca transform (on page
A-20) and the PCA plot (on page 6-18).
Model Statistics Window
This window displays the number of patterns in each of the training, testing, and
validation sets, and the total of these three sets; for a number of reasons, this total often
will not equal the number of rows in the dataset (if any variable has unusable values, if
the model has any time delays, if a training filter is being used, and so forth). The mean
and standard deviation of the output variables in each group are also displayed.
If you are interested in seeing how the dataset is grouped into training, testing,
validation, and ignored patterns, you can click the Append Pattern Column button. This
creates a new column in the dataset containing values that indicate each pattern’s use
(-1=ignore, 0=train, 1=test, 2=validation). A message notifies you when the new
column is complete. You can go back to the preprocessor and view the new column; you
can even modify its values, then return to the model builder and modify the pattern set
specification to use this variable. The $ttv transform can be used to create an
additional variable displaying the interpretation of the codes.
When you have just built a new model, it is necessary to check:
• the number of each type of variable, to be sure that the model has been specified
correctly;
• the number of testing, training, and validation patterns: if a validation set is used, it
should be between 5% and 10% of the size of the dataset, and the test set should be at least 10% (between 10% and 20% is generally sufficient) of the total of testing and training patterns;
• the means and standard deviations of the testing and training patterns: if they are
different by more than about 5%, the test set is not representative and should be
changed;
• if a validation set is used, it must not contain points that are outside the range of the
remainder of the data (but it is not necessary, or generally expected, that the
validation set mean and standard deviation should match the testing or training
sets).
If any of these requirements are not met, you should return to the Build Model window,
change the test set specification, and save the model again (if the test set is specified
from a variable in the dataset, you must go to the preprocessor and change the variable’s
values, and then load the model, but the model does not have to be saved again). After
these requirements are met, we recommend that you click Variable Bounds to invoke the
Variable Bounds window.
After a model has been trained, the Model Statistics window also displays some of the
statistics of the dataset on which the model was trained (which is not necessarily the
dataset currently loaded). These values are provided for information only; there is no
need for you to check them.
Variable Bounds
When a model is trained or run, the internal calculations are not made using the actual
dataset values of the variables; instead, each variable is linearly scaled to an internal
working range. This is done invisibly to the user, and all data displayed on the screen is shown in the original, unscaled values.
You can also use the gain constraints feature to improve a model’s extrapolation
accuracy, but you can do so only if you are also specifying gain constraints. For more
information, see “Gain Constraints and Extrapolation Training” on page 8-34.
The Variable Bounds window is invoked from the menu or by clicking Variable Bounds in the Model Statistics window that appears when you save or load a model.
This window is used to display the scaling bounds of the variables in a model, and
change them if the model has not yet been trained. It consists of a menu bar, an Edit
area, a Defaults area, and a scrolled list showing all of the variables in the model, their
lower and upper scaling bounds, and their minimum and maximum in the current
dataset.
If the model has already been trained, you can view the values, but the File menu and
Edit area are disabled, and the Defaults area can be viewed but not modified.
By default, every bound is set to a specified percentage beyond the values in the dataset;
this percentage can be either a percentage of the variable’s values, or of its range. You
can also set any bound to ignore the default and instead use a value that you type in.
For example, if a variable’s dataset values run from 0 to 100 and the default is 10% as a % of Range, the bounds will be -10 to 110. If you set the same bounds as a % of Value, however, the bounds will be 0 to 110, because 10% of the minimum value 0 is 0.
To edit the bounds for any variable individually, click on it and its current bounds and
default flags will appear in the Edit area. If you type a number, that number will be used
instead of the default percentage, and the Use Default toggle button will be turned off; if
you turn on the Use Default toggle button, the corresponding number will be filled in.
You cannot set a bound to be inside the range of the variable’s values in the dataset; if
the dataset contains extreme values that you want the model to ignore, you must use the
preprocessor to change or remove them.
The Save Report button invokes a prompt box asking for a file name and directory path.
When you type in a file name and click OK, the variable bounds information is saved to
an ASCII file. Use the editor (Chapter 3, File Editor) to view or print this file.
This chapter explains how to train a model. If you built your model using the auto
modeler wizard, you do not need to train the model because the auto modeler has
already done so.
The model trainer uses historical data that has been made into a dataset. You should
have already preprocessed the training data according to the instructions in Chapter 5,
Spreadsheet, and the following chapters. You should have already built and saved the
model according to the instructions in Chapter 8, Building a Model.
To train a model, you first load a dataset and the model, select a training type (described
on page 9-12), optionally set training parameters (described on page 9-14), and start the
training. Numeric information is always presented during training, and you can also
view two types of plots (see “Training Monitors” on page 9-18). Training continues
until you stop it, or until the training parameters that you set cause it to stop
automatically.
Pull-Down Menus
The model trainer provides the following pull-down menus:
File Pull-Down Menu page 9-3
Epoch Pull-Down Menu page 9-10
Phase Pull-Down Menu page 9-10
Load Dataset
When you click Load Dataset, the Select Dataset dialog is invoked. This is the common
Select Dataset dialog that was used in the preprocessor. It functions as described in
“Loading a Dataset” on page 5-10.
Load Model
This dialog lists all models that have been recorded in your current data dictionary, with
their associated comments, if any. It functions exactly like the Select Dataset dialog,
described in “Loading a Dataset” on page 5-10. After you select a model, it is loaded.
When the model is loaded:
• The Patterns selection is applied to the current dataset.
• If you applied a filter to the model (when you created the model, described in
“Filtering” on page 8-49), you will be asked (every time you load the model)
whether to Use or Disable the filter. For a model that has a filter but has not yet
been trained, you should select Use Filter; at other times you may choose Use or
Disable.
If any variable in the current dataset is outside the variable bounds recorded in the model, a warning message appears. This is only a warning, and does not prevent your using the model if desired. Only one
message is produced, even if more than one variable is out of range. You can view the
variable bounds to determine which variables are out of range.
Copy Model
Use Copy Model to save the current model under a new name, without affecting the
model files having the old name. It invokes a version of the Save Model window.
This window is similar to the Save Model window described on page 8-63, but it also
has a Save Training toggle button. If the model that is being copied has already been
trained, this toggle button controls whether the copy retains the training or is copied
without training; if the model that is being copied has not already been trained, this
toggle button is grayed out.
You can save the model in the format used by the current version of the software or in
the format used by a previous version. This feature is useful if you intend to use the
model on a computer that you have not updated with the latest version of the software.
A given version of the software can use models saved in an earlier format, but the
software may not be able to use a model saved in a later format.
Save Model As
This operation is the same as Copy Model, above.
Delete Model File
This dialog displays the name of the current data dictionary, and a scrolled list of all
models recorded in it (sorted by directory). If you want to use a different data dictionary,
click in the box, type in its name, and press the Return key, and models in the new data
dictionary will be listed. To select a model from the list, click on it and then click the
Delete button (or, as a shortcut, just double-click on the model). You can type in the full
path name of a model instead. When you click the Delete button, you are asked whether
to delete the model, just remove it from the data dictionary, or Cancel.
Delete permanently removes the files from your disk; Just Remove From DD removes the data dictionary entry without affecting the disk files. The data dictionary can have entries for model files that no longer exist on disk.
R2 = 1 - (rel-err)^2
If the relative error is greater than 1.0, R2 is undefined and is displayed as zero (0.0).
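For example, a relative error of 0.2 corresponds to R2 = 1 - (0.2)^2 = 0.96.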
After an extended exposure to the training patterns, the model may begin to “memorize”
or overtrain on the training data. This reduces the total relative error for the training
data, but also reduces the model’s ability to predict accurately from new data. When this
happens, the total relative error for the test data increases. Typically, there is an optimal
training point, at which the test relative error reaches its lowest value. The trainer
detects this optimal training point and saves the best internal parameters for the model.
Later, when you use the model for prediction and optimization, the model uses the
optimal internal parameters.
If the model has more than one output variable, the error measures that are displayed are
a composite over all output variables. To see the relative error and R2 for individual
variables, run a Predicted vs. Actual analysis and view the report file; for more
information, see “Predicted vs. Actual” on page 10-2.
If you add gain constraints to a model and then retrain it, the training and testing errors
will probably change. For more information on gain constraints, see “Gain Constraints
and Extrapolation Training” on page 8-34.
In general, a relative error of 0.035 or less may indicate a useful model.
Training Types
There are three types of training: regular, stiff, and ridge. They are not all appropriate
for every type of model, and are grayed out when they should not be used. You can
partially train a model with one type, and then change to regular or stiff and continue
training; but if you change to ridge, it ignores any previous training and starts over.
Regular
The regular, general-purpose neural net trainer uses gradient descent
(“backpropagation”). It can be used with any model.
Stiff
Stiff training is an alternative method for training neural networks, using the patented
Stiff Differential Equation Solver algorithm developed at Du Pont and licensed from
Du Pont. You cannot use stiff training for linear models.
If the data in the system is “stiff,” as defined in any standard textbook on numerical
methods (many chemical processes are known to be stiff), it may train to a better result
with stiff training than with regular. The regular trainer only uses first derivative
information to seek a solution; the stiff trainer also uses second order partial derivatives
in conjunction with a stiff differential equation solver.
However, the stiff trainer is too compute-intensive to be useful for problems with more
than about 30 total input and output variables, depending on the computational power
available to you. Stiff training can be considerably slower than regular training, and
requires more memory. Time and memory consumption increase rapidly with the
number of variables, and linearly with the number of patterns in the dataset. Stiff
training also defines epochs differently, processing successive epochs at differing
speeds, but will generally converge within ten or fewer epochs. (The first reported
epoch completes almost immediately, but typically has a relatively high error; this
corresponds to the initial state or “zero-th” epoch of regular training.)
The stiff trainer may display a message when it stops; this message may or may not indicate an error condition. If you get this message, you
should first check the Error History plot or the error values displayed on the Train
window. If the stiff trainer has converged to a good solution (to a low relative error or
high R2), there is no need for further training.
Ridge
The ridge trainer, which performs ridge regression, is available only for linear models.
If the data from your system is actually linear, the ridge trainer will train it faster and
better than the regular trainer. If a linear model does not train well (to a low relative
error or high R2), then either there is insufficient data, or the data is not linear and you
should go back to the model builder and specify a new model that is not linear.
Training Parameters
Training parameters are set by default; you may, but do not have to, change them. You
can view them at any time, but they cannot be changed while the model is training. If
you want to change any parameters after training has begun, you have to stop training,
make the changes, and restart the training. If you select Stiff or Ridge training, all
parameters except Final Epoch and Train Rel Error are ignored. To inspect or change
training parameters, click Edit Parameters to invoke the Training Parameters window.
Stopping Criteria
No matter what stopping criteria you set, you can always stop training at any time by
clicking Stop Training in the Train window.
The Autostop algorithm is used to recognize that training has stopped improving; it will
cause training to stop if the training relative error begins and continues to increase, or if
it remains essentially unchanged for an extended period. This is useful if you want to let
a model train for a long time (overnight or over a weekend) without watching it, but you
don’t want to consume computer resources unnecessarily. Autostop applies only to
regular training, and is ignored for stiff or ridge. Autostop defaults on, but you are
warned if you start training with it on.
You can train for a particular number of epochs by setting a Final Epoch number. If you
stop training and later re-start it, the epoch numbers continue from where they stopped,
so the Final Epoch number is a cumulative total. The default is 10000.
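The interplay of Final Epoch, Autostop, and the best-epoch memory can be sketched as a training loop. Here train_one_epoch, eval_test_error, and the parameter save/restore callables are hypothetical stand-ins for the trainer’s internals, and the patience rule is one plausible reading of “remains essentially unchanged for an extended period”:

    def train_loop(train_one_epoch, eval_test_error,
                   save_params, restore_params,
                   final_epoch=10000, patience=50):
        # Track the epoch with the lowest test error (the "best" epoch)
        # and stop early if the training error stops improving.
        best_test = float("inf")
        best_train = float("inf")
        stale = 0
        for epoch in range(1, final_epoch + 1):
            train_err = train_one_epoch()
            test_err = eval_test_error()
            if test_err < best_test:          # best-epoch memory
                best_test = test_err
                save_params(epoch)
            if train_err < best_train:
                best_train, stale = train_err, 0
            else:
                stale += 1                    # no improvement this epoch
            if stale >= patience:             # "autostop"
                break
        restore_params()                      # use the best epoch's weights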
Sparse Data Algorithm
Sparse data occurs when the model input variables are sampled frequently, but the
outputs are sampled relatively infrequently; for example, you may have 10 minute
samples of the process, but the output comes from a lab analysis performed only every 2
hours. The time merge function (“Time Merge” on page 5-35) interpolates or
extrapolates the output values to the time interval that you specify. The certainty of a
value records whether it already existed before the time merge or whether it was
generated, and, if generated, how far away it was from known data. For more
information, see “Certainty” on page 5-42. If Sparse Data Algorithm is turned on, the
training will be weighted by the certainties, paying more attention to the actual values
provided in the original data than to merged values.
Note: The sparse data algorithm weights the error term on each
pattern by the certainty of the outputs; therefore, in effect it ignores
patterns whose outputs have zero or near-zero certainty. This
weighting method may appear to inflate the R2 numbers from what
would be anticipated from an un-weighted error term.
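The weighting idea can be sketched as follows; the exact weighting formula is not documented here, so treat this as one plausible form:

    import numpy as np

    def certainty_weighted_error(actual, predicted, certainty):
        # Each pattern's squared error is scaled by its output certainty,
        # so patterns with zero certainty contribute nothing.
        actual, predicted = np.asarray(actual), np.asarray(predicted)
        certainty = np.asarray(certainty)
        weighted_sse = np.sum(certainty * (actual - predicted) ** 2)
        return np.sqrt(weighted_sse / np.sum(certainty))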
When you have finished viewing or setting the training parameters, click Done to close
this window and return to the Train window.
For regular training, if you have Autostop turned on, you will be asked to verify it.
After you have chosen to set autostop either on or off, training begins. When training
begins, the Start Training button will gray out, and the Stop Training button will become
active.
If you use a filter with a FANN model (see “FANN” on page 8-2), the filter is applied at
the beginning and end of training of each phase.
Training Monitors
During training you can monitor training performance using the Error History,
Prediction, and if applicable, Gain Constraints monitors, all accessible from the Train
window.
Error History
This window plots the relative error for both the training data and the testing data, at
each epoch. The X axis is epoch number, and the Y axis is relative error. If the
Continuous Update toggle button is turned on, the plot is updated at the end of every
epoch; if it is turned off, the plot is updated only when you click the Update button. The
Print button invokes the common Plot Print setup window, described in “Printing a Plot”
on page 6-39.
For a FANN model, you can view and print this plot for either phase only while the
model is being trained for that phase.
For models having multiple outputs, click Select Outputs to specify individual outputs
whose relative errors you wish to plot in addition to the average relative errors. When
displaying multiple plots, you can display them either stacked or overlaid.
Click Show Table to display the training history. The history shows the train relative
error and the test relative error at each epoch. The Relative Test Error and Relative Train
Error columns show composite errors for all outputs in the model. Subsequent columns
show relative error for each output individually. The best epoch, the one whose training
is used by the model, is marked with an asterisk and colored blue.
Prediction
This window contains stripcharts of the original dataset values and the model’s
predicted values, for every output variable in the model.
There is an Update toggle button associated with each individual stripchart, as well as
an Update All toggle button. If Update All is turned off, it overrides the individual
Update toggles and prevents all plots from being updated; if Update All is turned on, the
Update toggle on each stripchart controls whether it is updated.
The Selection button invokes the Training Stripchart Outputs dialog, where you select
which output variables to plot. This dialog lists all of the model output variables, for you
to select which ones to be plotted. Note that, for phase 2 of a FANN model, the model
outputs are the predicted values of the dependent variables (rather than the process
outputs); for more information, see “Model Types and Variable Types” on page 8-2.
Gain Constraints
For every input/output pair for which you have defined gain constraints, the display
shows the percentage of training patterns in the dataset that do not conform to the
constraints. The figures on the left are for the current epoch, and the figures on the right
are for the best epoch.
In this example, it is obvious that the results of the model at epoch 10 are not as good as
in later epochs when the testing and training errors converged. However, the low test
relative error at that point would cause epoch 10 to be remembered as the best epoch.
If you see from the Error History plot that this rare situation has occurred, you should
force the model to stop remembering the aberrant point as the best epoch. This is done
with the Replace Best with Current command, accessed from the Epoch menu.
You must stop training before you can access this command, and then restart the
training. When you select Replace Best with Current, you are asked to confirm.
If you click Replace Best Epoch, a message appears to let you know when the
replacement is complete.
Note that it is not necessary to wait until the model has fully trained before you replace
the best epoch. You can continue to train, and if a better epoch (with lower relative
error) occurs, it will override the epoch that you replaced.
This chapter describes the tools used to analyze a model. The model analysis tools are:
Predicted vs. Actual
Evaluate how well the model predicts process behaviors over the range of data used
to train the model. You can also calculate the residuals of the model. See “Predicted
vs. Actual” on page 10-2.
Sensitivity vs. Rank
Analyze the sensitivity of the outputs to each of the inputs; that is, determine how
much effect each input has on each output. Plot the inputs in order from greatest
sensitivity to lowest. See “Sensitivity vs. Rank” on page 10-9.
Output vs. Percent
From the lower boundary of a given input’s range to its upper boundary and while
holding all other inputs at their average (or other selected) values, plot a curve
showing how the input determines the output. See “Output vs. Percent” on
page 10-22.
This window is used to set up and initiate a Predicted vs. Actual analysis. If you do not
already have a dataset and model loaded, you must load them using the File menu
(described in “Pull-Down Menus” on page 9-2).
In this plot, predicted values are on the Y axis, and actual values are on the X axis.
The reporting and printing options are described in “Generating Reports and Data Files”
on page 10-2.
To display the coordinates of any point, click the Info tool, move the mouse onto the
point, and push and hold the mouse button; the variable’s name and time delay, the
Actual and Predicted values, and the row number of the actual output, will be displayed.
Selecting Variables
This is the common window that was used in the plotter; but note that the list contains
only the model outputs (rather than all variables in the dataset), and that every variable
is identified both by its tag name and by its time delay. Predicted vs. Actual points are
plotted only for the variables that you select.
Cutting Points
It is best to remove all bad data from the dataset before you begin to build a model, but
sometimes this is not possible. When a Predicted vs. Actual plot shows that most points
in a dataset were modeled well (close to the Perfect Model line), but a few points are
farther off, occasionally the cause is that those points’ values are bad, even though they
are within the range of that variable’s good values. If you investigate such points and
find that they are, in fact, bad values, you can use the Cut Box tool to remove them from
the dataset. You would then have to build and train a new model, that would not use
those points.
CAUTION: Do not cut good values that were not modeled well.
Doing so may invalidate your model. Avoid cutting a point without
understanding why it is bad.
To use the Cut Box or Uncut tool, click on its icon, drag through a rectangular area of
the plot (as you do in the Plot window, Chapter 6, Data Plotter), and then click Apply
(or click a different tool to cancel the marked area). Cuts and uncuts that you apply are
displayed on this plot, but they are not actually applied to the dataset yet. When you
have finished marking cuts and uncuts, you can click Cancel to discard them (and close
the window), or click Done to apply them as transforms to the dataset. You will be
warned that applying the transforms will cause the model to be cleared.
If you continue, the transform calculator may have to reapply some or all of the dataset’s
transform list.
Sensitivity Measures
The three types of sensitivity measures are Average Absolute, Average, and Peak.
Average absolute sensitivity is the average of the absolute values of the partial
derivatives,

$$\text{Average Absolute} = \frac{\sum_{k=1}^{N_{pats}} \left| \dfrac{\partial o_{k,i}}{\partial x_{k,j}} \right|}{N_{pats}}$$

where $N_{pats}$ is the number of patterns in the dataset over which the distribution is
calculated, $x_{k,j}$ is the $j$th input for the $k$th pattern, and $o_{k,i}$ is the $i$th
output for the $k$th pattern.
Average sensitivity is the average of the partial derivatives (actual, not absolute, values),
$$\text{Average} = \frac{\sum_{k=1}^{N_{pats}} \dfrac{\partial o_{k,i}}{\partial x_{k,j}}}{N_{pats}}$$

and Peak sensitivity is the largest-magnitude partial derivative over all patterns:

$$\text{Peak} = \max_{k \in \{1, 2, \ldots, N_{pats}\}} \left| \frac{\partial o_{k,i}}{\partial x_{k,j}} \right|$$
All of these measures are scaled and are calculated for each pair of input and output
variables.
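Given the per-pattern partial derivatives for one input/output pair, the three measures reduce to a few lines (a sketch; the derivatives are assumed to be scaled already):

    import numpy as np

    def sensitivity_measures(derivs):
        # derivs[k] = scaled partial derivative of output i with respect
        # to input j at pattern k; shape (Npats,).
        derivs = np.asarray(derivs)
        avg_abs = np.mean(np.abs(derivs))
        avg = np.mean(derivs)
        k_peak = int(np.argmax(np.abs(derivs)))
        return avg_abs, avg, derivs[k_peak], k_peak  # peak value + pattern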
This window is used to set up and initiate a Sensitivity vs. Rank analysis. Sensitivity
may also be referred to as gain. If you do not already have a dataset and model loaded,
you must load them using the File menu (described in “Pull-Down Menus” on
page 9-2).
“Setting Patterns” on page 8-48 describes how the dataset is apportioned into these
groups. (Remember that the Model Statistics window, described on page 8-64, indicates
how many patterns there are of each type. If you select a patterns type that is empty, the
model cannot run.)
If the model was saved with a filter (described in “Filtering” on page 8-49), and you
applied the filter when you most recently loaded the model, then the designation “All”
patterns should be understood to mean “All patterns that remain after filtering.” To
disable a filter that you have already applied to a model, you must load the model again.
The sparse data algorithm, used in training a model, is explained on page 9-16. You
should turn on Sparse Data Algorithm if and only if you used it for training this model.
If you write a report file, it can be Summary or Detailed.
• A Summary report shows, for each model output variable, a list of the model
inputs, ranked in order of Average Absolute sensitivity. Each input variable’s
Average Absolute, Average, and Peak sensitivity are shown, with the pattern
number of the pattern in which the Peak occurred.
• A Detailed report includes, for every selected pattern in the dataset, the per-pattern
sensitivity of every output to every input; thus the length of this file is the number
of patterns times the number of output variables times the number of input
variables, plus header and summary rows.
You can also write the sensitivity values (either scaled or unscaled) to a data file. The
data file will contain one column for every pair of input and output variables in the
model, with the sensitivity (gain) values for this pair at every pattern in the dataset. The
data file can be formatted and read back into the spreadsheet and plotter as a dataset for
plotting and analysis.
After you have finished setting up the Sensitivity vs. Rank analysis, click Run
Sensitivity to run it (or click Cancel to abandon it). Sensitivity vs. Rank analysis can
take as long as training the model for one epoch.
Three toggle buttons control which type of sensitivity is displayed. One line is plotted
for each selected output variable. The Y axis is the sensitivity values; each plotted point
on a line represents that output’s sensitivity to one input variable. On each line, the
inputs are ordered by magnitude of sensitivity (of whichever type is being displayed).
If you are displaying Peak sensitivity, the pattern number at which the peak occurred
will also be shown.
To review the information for all points, generate a report or print the data. See
“Generating Reports and Data Files” on page 10-2.
Selecting Variables
The Select Outputs button invokes the Sens. vs. Rank Outputs dialog. This is the
common window that was used in the plotter; but note that the list contains only the
model outputs (rather than all variables in the dataset), and that every variable is
identified both by its tag name and by its time delay. Sensitivity results are plotted only
for the output variables that you select.
Average absolute sensitivity, then, gives a general indication of the strength of the
influence of an input on an output. Combined with average sensitivity, it can be used to
tell you whether the input-output relationship is linear, monotonic, or without a causal
connection.
The following illustrations show examples of X-Y plots (see “Output vs. Percent” on
page 10-22) of an input and an output variable over the extent of their range, with the
corresponding sensitivity relationships indicated.
[Figure: X-Y plots of an input vs. an output. Where Average Absolute Sens. >
|Average Sens.|, the relationship is nonlinear and nonmonotonic; where Average
Absolute Sens. = Average Sens. = 0, there is no causal relationship.]
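These rules can be captured in a small heuristic; the tolerance below is an arbitrary illustration, not a product threshold:

    def classify_relationship(avg_abs, avg, tol=1e-6):
        # Interpret the pair of measures per the rules above.
        if avg_abs < tol:
            return "no causal relationship"
        if abs(avg_abs - abs(avg)) < tol:
            return "monotonic"          # possibly linear
        return "nonlinear, nonmonotonic"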
To use the Include Left tool, click on it; it will be highlighted, and the Apply button will
be enabled (not grayed out). Click at any point within the plotting area, and all of the
plot to the left of the point where you clicked will be highlighted.
Click Apply, or click any other tool button to cancel this selection. If you click Apply,
every input variable that corresponds to at least one point in the highlighted region will
be marked Include, and every input variable that is totally outside the highlighted region
will be marked Exclude.
In the example illustrated above, the Include region was drawn to the right of the fifth
point, so the first five points on each output line are Included, but several other points
are also Included. An input variable is Excluded only if all of its corresponding points
are outside the Include region.
The Include Box tool is used to Include all variables within a specified rectangular
region of the plot, without changing the status of variables totally outside the region. To
use the Include Box tool, click on it and it will be highlighted, and the Apply button will
be enabled. Move the mouse onto the plot, to one corner of the rectangular region that
you want to specify. Push and hold the mouse button, dragging the mouse diagonally to
the opposite corner, and release the button.
Click Apply, or click any other tool button to cancel this selection. If you click Apply,
every input variable that corresponds to at least one point in the highlighted region will
be marked Include; variables totally outside the region are not affected.
In the example illustrated above, the selected point is now Included, as well as other
points that correspond to the same input variable.
The Exclude tool is used to Exclude all variables within a specified rectangular region
of the plot, without changing the status of variables totally outside the region. To use the
Exclude tool, click on it and it will be highlighted, and the Apply button will be enabled.
As for the Include Box tool, move the mouse onto the plot, to one corner of the
rectangular region that you want to specify. Push and hold the mouse button, dragging
the mouse diagonally to the opposite corner, and release the button. Click Apply, or
click any other tool button to cancel this selection. If you click Apply, every input
variable that corresponds to at least one point in the highlighted region will be marked
Exclude; variables totally outside the region are not affected.
This dialog lists all of the output variables in your model, by name and time delay. To
cancel the sort, click Cancel. To select a variable, double-click on it, or click on it and
click Select. The input variables in your dataset will be sorted according to the selected
output variable’s sensitivity to them (if an input variable occurs in the model with
multiple time delays, it is sorted according to the largest sensitivity at any of its time
delays). A message will be displayed when the sort is complete. The dataset will remain
sorted for as long as it is loaded. The sorted version is not saved unless you save it
explicitly using one of the Save Dataset commands in the File pull-down menu.
CAUTION: If you use the Include and Exclude tools to modify the
model, your work will be lost if you do not save the modified model
from the File menu. If you select Save Model from any other window
that you may have open, only the current model will be saved, not the
modified model that you specified in this window.
Save Dataset saves the dataset under its current name; Save Dataset As invokes the
common Save Dataset dialog (described in “Save Dataset and Save Dataset As” on
page 5-50) for you to specify a new name.
Save Model saves the modified model under the same name as the original model,
destroying the original model; Save Model As invokes the common Save Model dialog
(described in “Save Model and Save Model As” on page 8-63) for you to specify a new
name. If you modified the model, you must train it before you can use it.
When two variables are selected, one is stepped while the other is held constant, and
then the first one is held constant while the second one is stepped.
If analysis results do not reflect what you know to be true of your process, consider
using gain-constrained training. For more information, see “Gain Constraints and
Extrapolation Training” on page 8-34.
Stepping a Variable
For Output vs. Percent, there are two ways to define which variable is stepped, and
through what range it is stepped. Recall that a dataset consists of a set of raw variables
obtained from the process history, which are then operated on by transform functions,
producing a set of transformed values from which a model is built. The dataset in its
after-transforms state can include variables that are unchanged from their raw values,
variables whose raw values have been modified by the transforms, and newly created
variables generated by transforms.
• You can choose to work with only the model variables. One model input variable is
stepped from the minimum to the maximum of its Variable Bounds (dataset range
bounds by default), while all other model inputs are held at a constant value (either
the average in the dataset on which the model was trained, or any value that you
specify). The combination of values at each step is used as a data pattern which is
input to the model. This is suitable for most models, but it does not preserve any
interrelationships that transforms define among variables, and it defines the step
interval linearly according to the variable’s transformed values.
• Alternatively, you can “look behind” the transforms and work with the before-
transforms values that are input to the transform list. One dataset variable is stepped
from the minimum to the maximum of its analysis variable range (AVR) while all
the other dataset variables are held at their analysis value (see “Before Transforms
Properties: Analysis Variable Range” on page 5-45). The combination of values at
each step is then fed through the transform list to calculate transformed values, and
the data pattern input to the model is extracted from these transformed dataset
values. This method preserves the variable interrelationships defined by transforms,
and it defines the step interval linearly according to the variable’s actual values in
the before-transforms data.
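The first (model-variables) method can be sketched as a simple stepping loop; model is a hypothetical callable that maps one input pattern to an output value:

    import numpy as np

    def output_vs_percent(model, averages, lo, hi, k, steps=21):
        # Step input k from its lower to its upper Variable Bound while
        # holding all other inputs at their training averages.
        base = np.asarray(averages, dtype=float)
        curve = []
        for pct in np.linspace(0.0, 100.0, steps):
            x = base.copy()
            x[k] = lo[k] + (pct / 100.0) * (hi[k] - lo[k])
            curve.append((pct, model(x)))
        return curve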
Example
For the dataset and model in this example, we will suppose that:
• The dataset and the model include additional variables that are not relevant to this
discussion.
• Raw variables, which correspond to tags in the data system, include A, B, C, D, T1,
T2, and T3. We will not specify what type of measurement is represented by A, B,
C, or D; T1, T2, and T3 are three temperature measurements around the same part
of the process, so they would be expected to have similar values.
• A actually ranges from 70. to 130., and B actually ranges from 1485. to 1603.; but
the file that was input contains some erroneous or invalid values that appear as
99999. The bad values were removed from A with a $MarkCut transform, which
removes points by their row number; the bad values were removed from B with a
$CutAbove transform, which removes points by their value.
• C has a large range in which small variations are significant, so a $ln transform
has been applied to it.
• A computed variable, 2D, was created as twice the value of D. The model uses as
input both D and 2D.
• A computed variable, avgT, was created as the average of the three temperature
measurements. The model uses this average instead of any one of the measured
temperatures.
Raw Data Values (Before Transforms): A, B, C, D, T1, T2, T3

Variable   Range of Raw Data    Analysis Variable Range (modified)
A          70. to 99999.        70. to 130.
B          1485. to 99999.      1485. to 1603.
C          .1 to 1000.          .1 to 1000.
D          711. to 1492.        711. to 1492.
T1         512. to 768.         512. to 768.
T2         508. to 760.         508. to 760.
T3         515. to 771.         515. to 771.
Transform List
!A!=$MarkCut(!A!,100,101)
!B!=$CutAbove(!B!,2000.)
!C!=$ln(!C!)
!avgT!=(!T1!+!T2!+!T3!)/3.
!2D!=2*!D!
Transformed Dataset: A, B, C, D, T1, T2, T3, 2D, avgT

Variable   Range of Transformed Data   Variable Bounds
A          70. to 130.                 70. to 130.
B          1485. to 1603.              1485. to 1603.
C          -1. to 3.                   -1. to 3.
D          711. to 1492.               711. to 1492.
T1         512. to 768.                not used in model
T2         508. to 760.                not used in model
T3         515. to 771.                not used in model
2D         1422. to 2986.              1422. to 2986.
avgT       511.7 to 766.3              511.7 to 766.3
This window is used to select one output variable and one or more input variables for
the Output vs. Percent analysis.
If you do not already have a dataset and model loaded, you must load them using the
File menu (described in “Pull-Down Menus” on page 9-2).
If the Source is Transformed Averages or Current Model Values, the lower half of the
window is used for Model Input Selection, and lists all input variables in the model, as
shown above (however, for a FANN model, only the Independent variables are
analyzed, so the Dependent variables are not listed; for more information, see “Model
Types and Variable Types” on page 8-2). If the Source is Raw Analysis Values, the
lower half of the window is used for Raw Variable Selection, and lists all raw variables
in the before-transforms view of the dataset.
In each case, you can select one or more variables to be stepped. (If you don’t select any,
they will all be selected automatically.) It is possible to select any raw variable in the
dataset, but for the purpose of this analysis, it only makes sense to select a variable that
is an input to the model, either directly or through transforms.
Every variable that is not being stepped is held at a constant value. When the Source is
Transformed Averages, the constant is each model input variable’s average value (in the
dataset on which the model was trained). When the Source is not Transformed
Averages, the Initialize menu in the menu bar is used to set values for all variables that
are not stepped.
• When the Source is Current Model Values, the constant is each model input
variable’s Current Value. The Current Values of the variables can be changed
whenever you run the model, so they will vary depending on what you have done
with the model most recently. They can be set by selecting Model Inputs from the
Initialize menu.
• When the Source is Raw Analysis Values, the constant is each raw variable’s
Analysis Value; the Analysis Value can be set in the Preprocessor Properties
window (see “Variable Properties” on page 5-43), or by selecting Raw Variables
from the Initialize menu.
This window is used to set the Current Values for model input variables. Each model
input variable is listed by name and time delay (tau), with a slider bar ranging from the
minimum to the maximum of its Variable Bounds, and a text field containing its Current
Value. You can change any Current Value either by dragging on the slider bar, or by
typing in the text field. The Reset to Averages button will change every variable’s
Current Value back to its average value in the dataset on which the model was trained.
The File menu contains two entries, Save Model and Copy Model. When you save a
model, its Current Values are saved with it. If you specify a set of Current Values that
you want to retain permanently with the model, you can save it under its current name
with Save Model, or save it under a new name with Copy Model, which operates as
described in “Save Model and Save Model As” on page 8-63.
For numeric variables, there is also a slider that can be used to change the Analysis
Value, as an alternative to typing it. The AVR was specified in the preprocessor when
the dataset was created. To specify a set of raw values, type in the text boxes or drag the
sliders. To change the Analysis Values of all numeric variables back to the midpoint
value of their AVR, click Reset to Midpoint.
The window also displays the name of every model input variable, with its Current Value and minimum
and maximum Variable Bounds. After you set Analysis Values for the raw variables,
you can click Apply; the dataset’s transform list will be applied to the Analysis Values
of the raw variables, generating transformed values for all variables in the dataset.
These transformed values become the Current Values of the model input variables. (If
any variable occurs in the model with multiple time delays, this process causes the
Current Value at every time delay to be the same, so the model input variables are each
listed only once, without time delay information.)
However, if any raw variable’s AVR has not been set to appropriate values, it is possible
to assign to it an Analysis Value which, after the transform list has been applied to it,
results in a transformed value that is outside the range of the model’s Variable Bounds.
If you get an error message stating that a variable is out of its allowed range and that you
need to change the AVR to put the variable back within the variable bounds, you must
look at the transforms to see which raw variables are inputs to that model variable (it
may be that there is only itself), and be sure to set a valid AVR for each of the raw
variables. You can change them from the AVR menu, which operates as described in
“AVR Menu” on page 5-59; for more information about AVR and Analysis Value, see
also “Before Transforms Properties: Analysis Variable Range” on page 5-45.
The File menu contains entries Save Dataset and Save Dataset As. If you change any
Analysis Values or Analysis Variable Ranges, you can save the changes permanently
with the dataset. Save Dataset writes the changes into the current dataset; Save Dataset
As invokes the Save Dataset dialog (described in “Save Dataset and Save Dataset As”
on page 5-50) so you can save the changed dataset under a different name.
After you have finished setting Analysis Values, click Done to close this window.
This window displays the results for one model output variable, showing how the output
is affected by selected variables as they move through their range of values. Each line in
the plot corresponds to one of the selected variables, either a model input variable, or a
before-transforms raw variable, depending on the Source that you selected in the Output
vs. % Selection window. As a given variable is stepped through its range of values, the
other variables are held at fixed values. Each point on the line corresponds to one value
through which the variable was stepped. The Y axis is the output variable’s calculated
value. If plotting multiple inputs, the X axis is the percent of the stepped variable’s
range, from 0 to 100. If plotting one input, the X axis is the input’s value. If Source was
Transformed Averages or Current Model Values, this range is the variable bounds; if
Source was Raw Analysis Values, this range is the analysis variable range. If the error
conditions described on the previous page cause any points to be invalid, those points
are plotted with a large colored dot; any other points on the plot can still contain useful
information.
To display the coordinates of any point, move the mouse onto the point and press and
hold the mouse button.
The display shows the name of the input variable (and time delay, if it is a transformed
model variable), name and time delay of the output variable, value of the output, and
percent of the input’s range.
The reporting and printing options are described in “Generating Reports and Data Files”
on page 10-2.
Selecting Variables
The Select Inputs button invokes the Output vs. % Model Input Selection window, or
the Plot Variables Selection window, depending on whether you stepped model input
variables or raw variables. In either case, the variables listed are only the ones that you
have already calculated; this selection is to determine which calculated variables are to
be displayed.
Sensitivity is the partial derivative of an output with respect to one input, with all of
the other inputs held constant:

$$\frac{\partial}{\partial x_k} o(\mathbf{x}) \equiv \frac{\partial o(\mathbf{x})}{\partial x_k}, \quad (x_1, x_2, \ldots, x_{j \neq k}, \ldots, x_{N_{in}}) = \text{constant}$$
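Numerically, this derivative can be estimated by a central difference around the current pattern; model is a hypothetical callable used for illustration:

    import numpy as np

    def point_sensitivity(model, x, k, eps=1e-4):
        # Central-difference estimate of d(output)/d(x_k), with all
        # other inputs held constant.
        x_hi = np.array(x, dtype=float)
        x_lo = np.array(x, dtype=float)
        x_hi[k] += eps
        x_lo[k] -= eps
        return (model(x_hi) - model(x_lo)) / (2.0 * eps)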
The major operational difference between Sensitivity vs. Percent and Output vs. Percent
is that only the model input variables can be stepped. Each selected variable is stepped
within its Variable Bounds; each variable not currently being stepped is held constant,
either at its average value, or at an arbitrary Current Value that you set.
This window displays the results for one model output variable. Each line in the plot
corresponds to one model input variable whose values were stepped. Each point on the
line corresponds to one value through which the variable was stepped. The Y axis is the
output variable’s sensitivity to the input; the X axis is the percent of the variable’s range,
from 0 to 100. This range is the Variable Bounds.
The reporting and printing options are described in “Generating Reports and Data Files”
on page 10-2.
Selecting Variables
The Select Inputs button invokes the Sens. vs. % Model Input Selection window. The
variables listed are only the ones that you have already calculated; this selection is to
determine which calculated variables are to be displayed.
In a What If, or predict outputs, study, you specify a value for every input variable in
your model, and the model predicts values for all of the output variables.
Use the What Ifs tool after you have built, trained, and analyzed your model according
to the instructions in Chapter 8, Building a Model, Chapter 9, Model Trainer, and
Chapter 10, Model Analysis Tools.
To invoke the What Ifs tool, select Tools > Model What Ifs.
If a FANN model is loaded, the variables are grouped as Independents, Initial values of
Dependents, Outputs, and Predicted values of Dependents.
Display Components
The display for each variable consists of its name and tau (time delay), Utilities button,
setpoint display bar, numeric display, and Action Menu Selection region.
Input Setpoint Editors are described on page 11-20; Output Setpoint Editors are
described on page 11-23; Stripcharts are described on page 11-17; and Clamping is
described on page 11-21.
To the right of each variable listing is the setpoint display bar (“setpoint widget”). This
display bar is your visual key to the behavior of the system. It is a graphical interface
that allows you to set initial and desired values, and other parameters that control the
calculations. When the mouse is over any indicator in this area, its name and value are
displayed in the positional help at the bottom of the window. You can change these
values by dragging on the indicators. While you drag, a popup shows the parameter’s
name and value. As calculations are made and a variable’s value is changed, the
indicators move to display the new value. All of the values that can be displayed and set
in this display bar can also be set in the Setpoint Editor, so they are described in
“Setpoint Editor: Inputs” on page 11-20 and “Setpoint Editor: Outputs” on page 11-23.
The Setpoint Editor for each variable is accessed from its Utilities menu, as shown
above.
To the right of each setpoint display bar is a numeric display that shows a value for each
model variable. When you have just been using the graphical interface to change values
of components of the setpoint display bar, the value that you set most recently is
displayed here. After you set up and initiate a calculation, the calculated final value is
displayed here.
The numeric display and the area behind it are also used as the Action Menu Selection
region to select and unselect model variables; a highlight in this region indicates that the
variable is selected for the Action menu. All selected variables are affected by any
actions that you take using the Action menu in the menu bar, described on page 11-6. To
select a variable, click on its Action Menu Selection region; to select multiple
contiguous variables, drag on them; to select additional noncontiguous variables, make
an original selection, then control-click or control-drag on the additional variable(s). To
unselect a variable, control-click on it. To unselect all variables in a section of the
screen, click the Clear Selection button in that section. The setpoint widget below shows
the selected variable highlighted at the right:
Menu Bar
The menu bar provides the following pull-down menus and option menus:
File Pull-Down Menu page 11-6
Action Pull-Down Menu page 11-6
Edit Pull-Down Menu page 11-6
Mode Option Menu page 11-6
Raw Midpoints
One pattern of inputs is generated by applying the dataset’s Transform List to the
midpoint of each raw variable’s Analysis Variable Range (which was specified in
the Preprocessor, and can be changed in the Raw Table Editor; for more
information, see “Raw Table Editor” on page 10-31).
Raw Analysis Values
One pattern of inputs is generated by applying the dataset’s Transform List to the
raw variables’ Analysis Values. The Analysis Values were originally set in the
Preprocessor, and can be changed using the Raw Table Editor. The Edit menu in the
menu bar can be used to open the Raw Table Editor.
DCS
This option is not supported in the Insights system.
Note: Raw Analysis Values and Raw Midpoints use the dataset’s
Transform List, so they automatically preserve the interrelationships
defined by transforms among the model variables.
Other Controls
Continuous Update
The Continuous Update toggle button is useful when Source is Transformed Dataset
and you use the Run button (described below) to process many patterns sequentially
(rather than stepping one pattern at a time).
If Continuous Update is turned off, the display on this window is not updated until the
run is complete (either you reach the end of the dataset, or you click the Stop button); if
Continuous Update is on, the display is updated at every pattern. Using Continuous
Update will significantly increase the processing time.
Edit
The parameters that appear in the setpoint display bars are explained in “Setpoint
Editor: Inputs” on page 11-20 and “Setpoint Editor: Outputs” on page 11-23. You can
change any of these parameters by dragging on its indicator in the setpoint display bar.
When you do so, its name and value appear in the Edit area.
You can fine-tune a parameter value by typing in the Edit area and pressing the Return
key.
This window is used to specify reporting options. The label on the Reporting button in
the Setpoints & What Ifs main window changes to indicate whether you have turned
reporting on or off in this window.
You can write a Report File containing Summary information only, or information on
the model Outputs by Pattern, or information on the model Inputs and Outputs by
Pattern. The default filename is displayed; to use a different file or directory, click in the
box, type its name, and press the Return key while the mouse is still in the box.
You can write a Data File at any time, but generally it is most useful only when Source
is Dataset (see “Source Option Menu” on page 11-7). For every pattern in the dataset, it
writes a record containing the initial and final values of every input variable, and the
predicted and original values of every output variable. A single record at the top of the
file identifies each variable. This file can then be formatted and read into a dataset. The
default filename is displayed; to use a different file, click in the box, type its name, and
press the Return key while the mouse is still in the box.
After the calculations have been made, you can click Show Report to invoke the Report
window, to display the Report file. You can also view this file with the editor
(Chapter 3, File Editor), and you can view or print it using your system resources.
Report File
All report files begin with the common information listing of all constraints and other
parameters for all variables.
If Mode is Predict Outputs and Source is not Transformed Dataset:
• The Summary and Outputs by Pattern reports both show the predicted and actual
(original) values, and Relative Error, for each output variable.
• The Inputs and Outputs by Pattern report includes the initial and final values for
input variables.
If Mode is Predict Outputs and Source is Transformed Dataset:
• The Summary report shows the Relative Error and R2 for each output variable, as a
composite over the entire dataset.
• The Outputs by Pattern report includes the predicted and actual (original) values,
and Relative Error, for each output variable at each pattern in the dataset.
• The Inputs and Outputs by Pattern report includes the initial and final values of
each input variable, at each pattern.
The toggle switches control whether some of the setpoint parameters that you set are
visible in the setpoint display bars on the screen. (All parameters that you set are
operational, regardless of whether they are visible, but they cannot be changed on the
main screen if they are not visible there.) Settings are made separately for each type of
variable. When Mode is Predict Outputs, you cannot make Desired Value or Range
visible.
If you turn on Confidence Interval, the dataset is checked; if the current dataset is not
the one on which the model was trained, it may be impossible to compute the
confidence matrix. It is safest to generate the confidence matrix only with the dataset on
which the model was trained. Next, you will be warned that it may take several minutes
to initialize, and you will have the opportunity to cancel. This is a one-time initialization
for each model, which usually takes about twice as long as a run
through all patterns in the dataset. The confidence information is saved permanently
with the model. If you turn Confidence Interval on, you can change the reporting
interval for each output variable in its Setpoint Editor (they all default to 90%), or you
can change it for one or more variables using the Action menu. For more information,
see “Confidence” on page 11-25.
If Source is any selection except Transformed Dataset, the Step button is used to
make calculations; Stop, Run, Run Speed, and Current Row are ignored.
If Source is Transformed Dataset:
The Step button is used to make calculations for one pattern at a time. The Run button is
used to initiate continuous calculations through all selected patterns in the dataset. The
Stop button stops a continuous calculation. The Run Speed slider controls relative run
speed (slowest at the left, and fastest at the right); slow is useful if you have Continuous
Update turned on and want to view the step-by-step changes in values.
The Current Row number in the dataset is displayed; to move to a different row number
(for example, to move back to the beginning of the dataset), click in the box and type in
a new number. Since Current Row displays the row number that has just been
processed, if you type in a number and click Step, the row that is processed will be the
next valid row after the number that you typed. If you have selected Train, Test, or
Validation in the Patterns menu, or if there are breaks in the data, additional rows are
skipped.
When Current Row is 0, the “Current Row” label is grayed out, and you cannot change
to any other row until you Step or Run the model to move to an active row. Current Row
is set to 0 when you initially load a model, or any time you type in a row number that is
out of range.
Note: The examples below assume that you used the default Test Set,
such that the first 15 of every 100 rows are reserved for Testing, and
the other 85 rows are used for Training; and that you did not use a
Validation set. If you did not use the default Test Set, or if you did use
a Validation set, the examples will change accordingly. The dataset in
the example has 300 rows.
If you select Train in the Patterns menu, you can process only dataset rows that are part
of the Training Set (that is, 16-100, 116-200, 216-300). If you type a row number that is
at the end of one block of Training rows, the next row processed will be the first row in
the next Training block; for example, if you type 100, the next step will be row 116. If
you type a row number that is outside the range of the Training Set (any number less
than 16 or greater than 300), it will be changed to 0, and the next step will be the first
Training row (16). If you type a row number that is within the range of the Training Set
but is not a Training row, it will automatically be changed to the next valid Training row
number, and when you step, the following row will be processed; for example, if you
type 112, it will be changed to 116, and the next step will process row 117. Behavior for
these cases, and for comparable cases when you select Test in the Patterns menu, is
summarized in the table below. Boundaries around breaks in the data are handled
similarly.
Patterns Menu   Row Number Typed   Current Row Becomes   Next Row Processed
All             0                  0                     1
All             1                  1                     2
Test            300                0                     1
Train           0                  0                     16
Train           1                  0                     16
Train           16                 16                    17
Stripcharts
You can display two styles of stripchart plots: each variable can be in a separate
window, or all selected variables can be in a single scrolled window. Be aware that
updating stripcharts requires considerable resources and will significantly slow the
calculations.
To display stripcharts in individual windows, select Stripchart in each variable’s
Utilities menu.
Plot lines will appear in the stripchart the next time you step or run the model.
The Update toggle button on each plot, and the Update All toggle button at the top of
the window, work together to control whether plots are updated continuously: if Update
All is off, nothing is updated, regardless of the state of the individual Update buttons;
if Update All is on, the individual Update buttons control updating on each plot. The
stripcharts within this scrolled window cannot be resized vertically.
For Input variables, the Initial and Final values (see “Values, Clamping, Priority” on
page 11-21) are plotted; for Output variables, the Original and Predicted values are
plotted.
Note: There is no limit to the number of variables that you can display
in individual stripchart plots, but most window manager systems limit
the number of windows that you can have open simultaneously. If you
attempt to exceed this system limitation, you may crash your
machine.
The upper left region of the Setpoint Editor contains the variable’s tag name and time
delay, and an enlarged view of the setpoint display bar with its min and max scaling
values. The setpoint display bar is a graphical interface that allows you to set initial and
desired values, and other parameters that control the calculations. Values of these
parameters can be changed by dragging on their indicator (either here in the Setpoint
Editor, or in the Setpoints & What Ifs main window), or by typing in the text boxes
provided in this window. Changes that you make in this window do not appear in the
Setpoints main window until you click Apply or Done. Scaling can be changed to any
values that you wish.
Move the mouse over any of these indicators, and its name and value will be displayed
in the positional help area at the bottom of the window.
If any variable in the dataset is specified in the model with multiple time delays, any
changes that you make to Hard Constraints or Rate Constraints are applied to all time
delay instances of that variable.
For Current Value clamping, the clamp indicators point outward from the Final Value.
Statistics
The variable’s statistics in the model and in the current dataset are displayed for
information only.
Constraints
Constraints have no effect in Predict Outputs mode.
Error Computation
Error computation has no effect on Predict Outputs.
Confidence
Confidence appears in the Setpoint Editor, but is not currently operational for any input
variables.
Control Buttons
Cancel closes the Setpoint Editor window without applying any changes. Apply applies
the changes without closing the window. Done applies the changes and closes the
window.
The upper left region of the Setpoint Editor contains the variable’s tag name and time
delay, and an enlarged view of the setpoint display bar with its min and max scaling
values. The setpoint display bar is a graphical interface that allows you to set original
and desired values, and other parameters that control the calculations. Values of these
parameters can be changed by dragging on their indicator (either here in the Setpoint
Editor, or in the Setpoints & What Ifs main window), or by typing in the text boxes
provided in this window. Changes that you make in this window do not appear in the
Setpoints main window until you click Apply or Done. Scaling can be changed to any
values that you wish.
Move the mouse over any of these indicators, and its name and value will be displayed
in the positional help area at the bottom of the window.
Values
Original Value is used after you run a pattern through the model, to display the original
value that came from the Source. If Source is Current Screen, Original Value has no
meaning. Original Value is represented by a red triangle in the top half of the setpoint
display bar.
Final Value is used for display only, and cannot be changed. When Mode is Predict
Outputs, Final Value is the predicted output. Final Value is represented by a blue bar
bisected by a vertical red line.
Statistics
The variable’s statistics in the model and in the current dataset are displayed for
information only.
Constraints
Fuzzy constraints have no effect when Mode is Predict Outputs. Hard Constraints and
Rate Constraints are not operational for any output variables.
Error Computation
Error computation for output variables has no effect on Predict Outputs.
Confidence
Confidence is operational only for output variables. Confidence can be controlled in the
Setpoint Editor only if it has already been turned on in the Setpoint View Parameters
dialog, described on page 11-12.
The Confidence range is based on the same concept as the standard Student’s t-statistic.
It means that there is a certain confidence, or percentage, that the true process output
will fall within the given interval or “error bars” that bracket the model’s predicted
output. For example, if the Confidence selection is 50%, the error bars are typically
relatively small, and there is a 50/50 chance that the true output will fall within the
interval around the predicted output. For the same prediction at a Confidence of 99.9%,
the error bars would be relatively larger, and there would be a 99.9% probability that
the true output is within the error bars of the predicted output.
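The t-based interval can be sketched as below; this is only the textbook construction, not the product’s confidence-matrix computation, and residual_std is an assumed input:

    from scipy import stats

    def confidence_bars(predicted, residual_std, n_patterns, level=0.90):
        # Two-sided interval: predicted +/- t * residual_std.
        t = stats.t.ppf(0.5 + level / 2.0, df=n_patterns - 1)
        half_width = t * residual_std
        return predicted - half_width, predicted + half_width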
After the run or step is completed, the setpoint display bars are updated to display a
black horizontal bar that indicates the lower and upper limits of the Confidence range at
the selected percentage, and the size of the range is displayed in the Setpoint Editor;
these values are positive numbers interpreted as a range around the calculated result.
Most commonly, both of these numbers will be the same. A narrow range generally
corresponds to a lower confidence selection.
If you change the Confidence percentage selection after stepping the model, it has no
effect on the displayed confidence range; you must run or step the model in order to
calculate the confidence range for the changed percentage selection.
Control Buttons
Cancel closes the Setpoint Editor window without applying any changes. Apply applies
the changes without closing the window. Done applies the changes and closes the
window.
When Source is Transformed Dataset: the dataset does not have to be the one on which
the model was trained, but it has to include variables that have the same names, and the
dataset mean and standard deviation should not differ from those of the training dataset
by more than about 5%. Set Source to Transformed Dataset, and make a selection from
the Patterns menu.

When Source is any other selection: the dataset does not have to be the one on which
the model was trained, but it has to include variables that have the same names; the
values contained in this dataset are not used. Select a Source; and in all cases, set
Patterns to All.
• If Source is Transformed Averages, inputs to the model are the average of each
variable’s transformed values that were used to train the model.
• If Source is Current Screen, inputs to the model are the values that you set as each
variable’s Initial Value, either in the setpoint display bar on the Setpoints & What Ifs
window, or in the Setpoint Editor.
• If Source is Raw Midpoints, one pattern of inputs is generated by applying the
dataset’s Transform List to the midpoint of each variable’s Analysis Variable Range.
The AVR was specified in the Preprocessor, and can be changed using the Raw Table
Editor, which is accessed from the Edit menu in the menu bar, and is described on
page 10-31.
• If Source is Raw Analysis Values, one pattern of inputs is generated by applying the
dataset’s Transform List to the raw variables’ Analysis Values. The Analysis Values
were originally set in the Preprocessor, and can be changed using the Raw Table
Editor.
7. Turn on any stripchart plots that you may want to view, using the Action menu or
each variable’s Utilities menu. Stripcharts are not generally used if Source is any value
other than Transformed Dataset.
For input variables, the Initial and Final values are plotted; Initial is the value specified
in the Source, and Final is the same value unless clamping forces it to a different value.
For output variables, the Original and Predicted values are plotted; Original is the value
specified in the Source, and Predicted is the model’s calculated result.
When Source is Transformed Dataset: if you have modified any input variable from its
values as contained in the dataset, you should not expect the Predicted output values to
match the Original values from the dataset. You can step through the dataset one
selected pattern at a time, or run through all selected patterns continuously.
• If you want to run only one pattern, click Step. The model calculates the output
values that would be produced from the inputs at the Current Row, and updates all
setpoint display bars and numerical displays to show the Final values of inputs
(clamping applied to the Source values) and Predicted values of outputs.
• To run through all selected patterns continuously, set Continuous Update and Run
Speed as you wish, and click Run. Current Row will be updated as each pattern is
processed. If Continuous Update is on, the setpoint display bars and numerical
displays will also be updated at each pattern. If you want to stop before the end of
the dataset is reached, click Stop.
To move back to the beginning of the dataset, or to move to any other particular row
number, change the Current Row as explained in “Stop, Step, Run, Current Row, Run
Speed” on page 11-14.

When Source is any other selection: the Original value does not provide useful
information. Click Step, and the model calculates the output values that would be
produced from the specified settings of the input variables. The setpoint display bars
and numerical displays will be updated.
compare(string1,string2[,codes])
Compares the values of two string variables or constants, and returns a whole
number. The optional codes are expressed as a single argument within a single
pair of quotes. The returned values are 1 if equal and 0 if not equal, unless code
'L' is specified. The codes are not case-sensitive.
L Lexical comparison, returning -1 if string1 is lexically less than
string2, 0 if they are equal, and +1 if string1 is lexically
greater.
S Short compare: if string2 is shorter than string1, compare only
up to the length of string2.
T Before comparing, trim leading and trailing spaces from string1.
U Ignore case.
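The semantics above can be rendered in plain Python as a reference sketch:

    def compare(string1, string2, codes=""):
        codes = codes.upper()                  # codes are not case-sensitive
        if "T" in codes:                       # trim string1 only
            string1 = string1.strip()
        if "U" in codes:                       # ignore case
            string1, string2 = string1.upper(), string2.upper()
        if "S" in codes and len(string2) < len(string1):
            string1 = string1[:len(string2)]   # short compare
        if "L" in codes:                       # lexical: -1, 0, or +1
            return (string1 > string2) - (string1 < string2)
        return 1 if string1 == string2 else 0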
copyBreak(var)
If variable has no date/time pointer, returns value and status of variable. If
variable’s date/time pointer is Break, returns value of variable with status
Break. If variable’s date/time pointer is not Break, returns value and status of
variable.
copyRows(copyToVar,copyToStartRow,copyToEndRow,copyFromVar,copyFromStartRow[,copyFromEndRow])
The arguments are variables to copy to and from, with starting and ending row
numbers in each. The variables can be of any type but both must be of the same
type. This function copies one or more cells from the second variable and
pastes them into the first variable. If only one cell is copied, the
copyFromEndRow can be omitted, and if the copyToEndRow is greater
than copyToStartRow, the cell is pasted multiple times. If more than one
cell is copied, the copyFromEndRow must be specified, and the length from
copyToStartRow to copyToEndRow must equal the length from
copyFromStartRow to copyFromEndRow. The output of this transform
must be the same as the copyToVar.
corr(var1,var2[,count])
This transform takes as input two numeric variables and an optional count,
which defaults to 10 if omitted. It calculates the correlation coefficients of
var1 with respect to var2, for time delays (row number shifts) from -count to
count (so the result has count*2+1 rows).
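A sketch of the computation (which variable is shifted relative to the other is an assumption here):

    import numpy as np

    def corr(var1, var2, count=10):
        # Correlation coefficient at each lag in [-count, count];
        # returns count*2 + 1 values.
        var1 = np.asarray(var1, dtype=float)
        var2 = np.asarray(var2, dtype=float)
        n = len(var1)
        out = []
        for lag in range(-count, count + 1):
            if lag < 0:
                a, b = var1[-lag:], var2[:n + lag]
            else:
                a, b = var1[:n - lag], var2[lag:]
            out.append(np.corrcoef(a, b)[0, 1])
        return np.array(out)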
day(dtVal)
Returns a whole number indicating the day of the month (1-31) of the input
date/time value.
deleterows(var,startrow[,endrow])
The arguments are a variable of any type and one or two row numbers. It
deletes rows from the variable, shifts any remaining rows up, and reduces the
variable’s length. If the endrow is omitted, only one row is deleted. The
output of this transform must be the same as the input variable.
delta(var[,n])
The two arguments are a variable and an integer n. The second argument may
be omitted and is then assumed to be 1. The result is the value of the specified
variable in row current+n minus the value in the current row; except that, if
row current+n is beyond a cell with Break status, or beyond the end of the
column, then the last valid result is repeated.
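In Python terms (ignoring Break status for simplicity, and assuming 0.0 before any valid result exists):

    def delta(col, n=1):
        # result[k] = col[k+n] - col[k]; past the end of the column,
        # the last valid result is repeated.
        out, last = [], 0.0
        for k in range(len(col)):
            if 0 <= k + n < len(col):
                last = col[k + n] - col[k]
            out.append(last)
        return out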
descend
See $sort described on page A-23.
differs(var)
Takes as input a variable of any type; returns 1 if the value in the current row is
different from the value in the previous row, and 0 if they are the same. If the
current row has any bad status, the result is Error; if the current row has a good
status but the previous row has a bad status, the result is 1.
double(x)
Causes the result of the variable, constant, or calculation x to be in double-
precision. If the output is a new variable, it will be of type double-precision,
but if the output variable already exists, its type is not changed.
dt(dtVal)
Causes the result of the date/time variable, constant, or calculation dtVal to
be a date/time. If the output is a new variable, it will be of type date/time, but if
the output variable already exists, its type is not changed. For example, from
!newdt! = \4/25/80 08:30:00\
the type of newdt is real, but from
!newdt! = $dt(\4/25/80 08:30:00\)
the type of newdt is date/time.
dtadd(dtVal,increment)
Returns a date/time value generated by adding increment to dtVal.
Increment is specified as described on page A-2, and can be negative.
dtwrite(dtVal[,format])
Writes the date/time value dtVal to a string in the
specified format. The format is specified as for the formatter (see “Display
Format” on page 4-19), and must be surrounded by quote (") characters. If the
format is omitted, the default format is used (mm/dd/yy hh:mm:ss).
dup(string,n)
Duplicates the contents of a string n times, where n is a positive integer
constant. For example, $dup('abc',3) returns the string abcabcabc.
The output is a string variable.
duprows(var,startrow,endrow)
The arguments are a variable of any type, and two row numbers. It copies the
variable’s value in the first row of the specified region and duplicates it into all
rows of the region. The output of this transform must be the same as the input
variable.
dwt(var,direction)
The arguments are a numeric variable, and a direction constant that is
either $forward or $inverse. The output is the Discrete Wavelet
Transform of the input variable.
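For example, !coef! = $dwt(!signal!,$forward) computes the forward transform
of !signal!, and applying $dwt(!coef!,$inverse) should recover the original
signal (the variable names are illustrative).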
e The constant e=2.71828...
encode(var,proximity,type,value1[,value2,...valueN])
Takes as input a numeric variable, a real constant proximity, a constant
indicating the encoding type, and N constant values, such that N ≥ 1. This
special transform requires up to N output variable names (except for
type=e_range, there are only N-1 outputs), as described in “Transforms
With Multiple Outputs” on page 7-10.
For proximity=0., these are the encoding types and what they mean:
e_exact
Each output column i contains 1.0 if the input variable exactly equals
valuei, and 0.0 otherwise, for i=1,…,N.
e_lessthan
Each output column i contains 1.0 if the input variable is less than or
equal to valuei, and 0.0 otherwise, for i=1,…,N.
e_range
Each output column i contains 1.0 if valuei ≤ the input variable
≤ valuei+1, and 0.0 otherwise, for i=1,…,N-1.
heartBeat(var,N)
Write the integer value N to variable var. If N is 0, alternate between writing 0
and 1. This transform is useful for setting up a heartbeat signal for on-line
applications so that your DCS can verify that the application is running.
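For example, $heartBeat(!watchdog!,0) alternates the value of the
illustratively named !watchdog! variable between 0 and 1 on successive
execution cycles.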
holdLast(var [,num_cycles])
If variable var has a bad status, use the last good value for var instead. If you
do not specify the optional num_cycles argument, the last good value is
used indefinitely, or until status returns to good. If you specify the
num_cycles argument, the last good value is used for at most that number of
execution cycles or until status returns to good, whichever comes first.
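For example, $holdLast(!flow1!,3) substitutes the last good value of !flow1!
for at most three consecutive execution cycles of bad status.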
This transform provides improved robustness in the on-line environment
where you may need a model to function in spite of a bad status. In the off-line
environment, a cycle corresponds to a row in the dataset. The num_cycles
argument may be greater than the size of the dataset; the transform maintains
its state history from row to row regardless of dataset size.
hour(dtVal)
Returns a whole number indicating the hour (0-23) of the input date/time.
if(expression,trueValue,falseValue)
See “Conditional Expression Constructors” on page 7-14.
ifft(realVar,imagVar,length)
This function takes as input two variables that are interpreted as real and
imaginary values, and an FFT length that is a constant power of 2, and
returns a single real column that is the raw inverse FFT.
inf Constant, signifying infinity.
insertrows(var,startrow[,numberOfRows])
The arguments are a variable of any type, a starting row number, and an
optional number of rows. It inserts the specified numberOfRows above the
startrow, increasing the variable’s length. The new rows are filled with
Blanks. The output variable must be the same as the input.
int(x)
Causes the result of the numeric variable, constant, or calculation x to be an
integer. If the output is a new variable, it will be of type integer, but if the
output variable already exists, its type is not changed.
inverse
See $dwt described on page A-11.
log(x[,y])
The second argument may be omitted and is then assumed to be 10. The result
is the logarithm of x in base y.
lookup(var1,var2)
This function uses the values of the second argument as row numbers, and
returns the value of the first argument in the row number specified by the
second argument. The first argument can be of any variable type. Note that
both arguments must be variables. For example, suppose there is a variable
called !key!, and its values in the first four rows of the dataset are 3, 27, 49,
31; if you type the transform
$lookup(!flow1!,!key!)
then the results in the first four rows of the dataset will be the original values of
!flow1! from rows 3, 27, 49, and 31.
lookupRel(var1,var2)
This relative lookup function returns the value of the first argument in a row
number calculated by adding the values of the second argument to the current
row number. The first argument can be of any variable type. Note that both
arguments must be variables. For example, suppose there is a variable called
!key!, and its values in the first four rows of the dataset are 3, 27, 49, 31; if
you type the transform
$lookupRel(!flow1!,!key!)
then the results in the first four rows of the dataset will be the original values of
!flow1! from rows 4, 29, 52, and 35.
m_test, m_train, m_valid, m_ignore
These constants are used to set values in a variable that you designate in the
Set Patterns window to specify patterns for a newly built model (for more
information, see “Use Variable” on page 8-55). The numeric values can be
interpreted using the $ttv transform, described on page A-25.
markcut
See “System-Generated Transforms” on page A-30.
max(x[,y,z,…])
Takes any number of arguments and finds the maximum. If there is only one
argument, and it is a variable, the result at every row is the variable’s maximum
over all rows in the dataset. In any other case, the result in each row is the
maximum of the arguments’ values in that row.
moveexp(var,size[,alignment,decayRate[,threshold]])
Exponential average in the moving window. The decayRate is as defined for
the $expave transform, described on page A-13. See also “Moving Window
Transforms” on page A-2.
movegauss(var,size[,alignment,stddev[,threshold]])
Gaussian filter applied within a moving window. The standard deviation
argument must be a positive constant. See also “Moving Window Transforms”
on page A-2.
movels(var,size[,alignment[,threshold]])
Moving least squares fit to the points in the moving window. See also “Moving
Window Transforms” on page A-2.
movemax(var,size[,alignment[,threshold]])
Maximum value within the moving window. See also “Moving Window
Transforms” on page A-2.
movemed(var,size[,alignment[,threshold]])
Median value within the moving window. See also “Moving Window
Transforms” on page A-2.
movemeda(var,size[,alignment[,threshold]])
Approximating median within the moving window. See also “Moving Window
Transforms” on page A-2.
movemin(var,size[,alignment[,threshold]])
Minimum value within the moving window. See also “Moving Window
Transforms” on page A-2.
movesd(var,size[,alignment[,threshold]])
Standard deviation within the moving window. See also “Moving Window
Transforms” on page A-2.
movevalid(var,size[,alignment])
Number of valid points within the moving window (even when all points are
valid, the count shrinks at breaks and boundaries because the effective
window size shrinks). See also
“Moving Window Transforms” on page A-2. Note that this transform, unlike
all other Moving Window transforms, does not take a threshold argument.
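For example, for any of the moving window transforms above (the variable
names here are illustrative), !med7! = $movemed(!temp!,7,$center) returns at
each row the median of a seven-row window centered on that row, assuming the
$center alignment constant described in “Moving Window Transforms” on page
A-2.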
none A general-purpose constant used as an input to a number of transforms,
documented with any function that can use it.
now A date/time constant indicating the current date and time.
sigmoid(x[,a[,b]])
Returns the sigmoid of x:
sigmoid(x) = 1 / (1 + e^(a(b - x)))
where a controls the slope and defaults to 1, and b controls the location and
defaults to 0. The result is centered around b: it is 0.5 at x = b, about 0 at
b - 6/a, and about 1 at b + 6/a.
sign(x)
Returns -1 if x<0; returns 0 if x=0; and returns +1 if x>0.
sin(x)
Sine of a value that is in radians.
sind(x)
Sine of a value that is in degrees.
sort(var1,var2[,direction])
This function takes as input two variables and an optional direction. It
sorts the second column in the order of the first column, in either ascending or
descending order. The direction is the constant $ascend or $descend, and
defaults to $ascend if omitted.
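For example, !sorted! = $sort(!key!,!flow1!,$descend) returns the values of
!flow1! arranged in descending order of !key! (the variable names are
illustrative).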
spec(var1,[var2,]fftlen,displen,overlap)
This transform calculates a power spectrum. The input column is divided into
segments (which may overlap). The mean is subtracted from each segment, a
tapering window is applied, and an FFT is calculated. The results are averaged
and returned in a single output column. If the optional second variable is
omitted, the result is auto-power rather than cross-power. fftlen is the
length of each segment, displen is the length of the generated columns, and
overlap is the amount to overlap each segment.
sqrt(x)
Square root.
status(var)
The input variable can be of any type. The output is a string variable
displaying the status of the input variable in each row.
tan(x)
Tangent of a value that is in radians.
tand(x)
Tangent of a value that is in degrees.
timecut
See “System-Generated Transforms” on page A-30.
timemerge
See “System-Generated Transforms” on page A-30.
tm_average, tm_boxcar, and so forth: all keywords beginning with tm_
See $timemerge described on page A-31.
tmerge(valueVar,dtVar,method,interval,maxTimeGap,
handleDuplicates, handleOutOfOrder,maxCert)
This transform is similar to $TimeMerge. It takes as input a variable that has
numeric or string values and a date/time variable, and merges the value
variable to match the date/time variable. Unlike the system-generated
$TimeMerge, it does not delete the value variable’s original date/time
column. The additional arguments are the same as for $timemerge
described on page A-31. The output variable must be the same as the input
value variable.
today
A date/time constant indicating the current date, with time set to midnight.
trend(var)
The result of this transform is a new variable that is the best linear fit to the
input variable.
trunc(x)
Truncate the fractional portion of a number. For example, $trunc(4.9)=4,
and $trunc(-4.9)=-4.
ttv(var)
This transform is provided for your convenience in interpreting the numeric
values underlying the $m_test, $m_train, $m_valid, and $m_ignore
constants. It takes as input one numeric variable containing these constants’
values, and returns a string variable indicating the interpretation of each
constant, "Test", "Train", "Valid", "Ignore".
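For example, !set! = $ttv(!useVar!) returns "Train" in each row where !useVar!
contains the value of $m_train, "Test" where it contains the value of $m_test,
and so on (the variable names are illustrative).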
rand (No arguments.) At each row, generates a random real number between 0.0 and
1.0.
random([a,]b)
At each row, generates a random whole number inclusively between arguments
a and b. If a is omitted, it is assumed to be 1. a must be ≤ b.
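For example, $random(1,6) generates whole numbers between 1 and 6 inclusive;
$random(6) is equivalent, because a defaults to 1.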
randomS(seed,[a,]b)
At each row, generates a seeded random whole number inclusively between
arguments a and b. The seed is any positive integer. If a is omitted, it is
assumed to be 1. a must be ≤ b.
randS(seed)
At each row, generates a seeded random real number between 0.0 and 1.0. The
seed is any positive integer.
Gaussian Random
The random number transforms generate uniform sequences. If you want to use random
numbers with gaussian distribution instead, you must start by generating two random
variables, for example u1 and u2:
!u1! = $randS(42580)
!u2! = $randS(80191)
From these two variables, you can apply the following transform to generate a gaussian
sequence with mean 0. and std dev 1., for example g1:
!g1! = $sqrt(-2. * $ln(!u1!)) * $cos(2.* $pi * !u2!)
You can generate a second gaussian sequence, for example g2, from the same two
uniform variables:
!g2! = $sqrt(-2. * $ln(!u1!)) * $sin(2.* $pi * !u2!)
b_min
Minimum value (*).
b_nvalid
Number of valid values.
b_slope
Slope (based on a Least Squares fit).
b_std
Standard deviation.
Features that are marked with an asterisk (*) can be qualified by these codes.
b_index
Return the row number of the value, rather than its value.
b_time
Return the time of the value, rather than its value.
b_value
Return the value (default).
batchBreak(index,data)
This transform is used to insert Breaks in a data column between the end of
each batch and beginning of the next batch. It is used only when you wish to
perform some type of filtering (such as moving average, $moveAve), on the
data before extracting batch feature information. The index argument is the
column output from $batchIndex.
batchIndex(var,type[,tolerance])
This transform takes as input a variable containing batch identifiers and
generates a new column containing the row numbers in which each new batch
starts. This new index column is then used as input to $batch or $batchX
transforms. The batch identifier may be of any variable type. All records
pertaining to a single batch must be contiguous. The type argument is one of
these three constants, which control when to mark the start of a new batch:
b_change
Mark when a value differs from the previous value.
b_fall
Mark when a value decreases from the previous value.
b_rise
Mark when a value increases from the previous value.
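For example, !index! = $batchIndex(!batchID!,$b_change) marks a new batch
wherever the batch identifier changes; the resulting index column can then be
supplied to $batchBreak, $batch, or $batchX (the variable names are
illustrative).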
System-Generated Transforms
These transform functions are generated automatically from user actions in the
spreadsheet or the Preprocessor Plot Window. When these functions have been applied
to a dataset, they appear in the Transform List and you can delete or modify them as
necessary. You can also type them directly into the Transform Window.
override(var,startrow,[endrow,]value)
This transform is generated when you change the value of a cell by clicking in
it and typing a new value. If endrow is omitted, it is assumed to be the same
as startrow. The transform generated by the system omits the endrow
because it is applied to a single cell, but you can modify the transform to apply
to a group of rows from startrow to a different endrow.
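For example, the system might generate $override(!temp!,10,99.5) when you type
99.5 into row 10 of !temp!; changing it to $override(!temp!,10,20,99.5)
applies the value to rows 10 through 20 (the names and values shown are
illustrative).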
timemerge(start,end,interval,method,maxTimeGap,
handleDuplicates, handleOutOfOrder, maxCert,
var1[,…,varN])
This transform is generated when you apply a Time Merge from the Time
Merge window; see “Transform Window” on page 7-3 for more information on
the meanings of the parameters. The output of this transform is the new date/
time variable.
start is $tm_early_start (earliest start of any date/time column in the
dataset), $tm_late_start (latest start of any date/time column in the
dataset), or a date/time constant; end is $tm_late_end (latest end of any
date/time column in the dataset), $tm_early_end (earliest end of any date/
time column in the dataset), or a date/time constant. (The format for date/time
constants is described on page A-1).
interval is the Time Merge interval that you specified (in the format
described for Date/Time increments described on page A-2).
method is any of the constants $tm_boxcar, $tm_linear,
$tm_spline, $tm_linearExtend, $tm_splineExtend,
corresponding to your menu selection for the Time Merge method.
maxTimeGap is the Max Time Gap value that you specified, in the same
format for Date/Time increments.
handleDuplicates is one of the constants $tm_first,
$tm_average, or $tm_last, corresponding to your menu selection for
handling duplicate time values.
handleOutOfOrder is one of the constants $tm_cut or $tm_sort,
corresponding to your menu selection for handling out-of-order time values.
maxCert does not correspond to any field on the Time Merge window; it
controls how far a value can be from known data before its certainty is set to 0.
For more information, see “Certainty” on page 5-42.
unscatcut(Yvar,Ylow,Yhigh,Xvar,Xlow,Xhigh,startrow,endrow)
This transform is generated when you apply an Uncut to an XY plot. Within
the specified row number range, it uncuts points from the Y variable when both
Xvar and Yvar fall within the specified range of values. The output variable
must be the same as the input Y variable.
untimecut(var,dtVal1,dtVal2,low,high)
This transform is generated when you apply an Uncut to a Time Series plot. It
uncuts points within the range specified by the date/time values and the low
and high values. The output variable must be the same as the input.
Transform Finder
This section is provided to help you find transforms by topic. Consult the reference
pages above for syntax.
Editing
change a value $override
clear (set to blank) $clearRows
clip $clipAbove, $clipBelow
Plot Cuts
cut points from a plot $scatcut, $timecut, $markcut
uncut points $unscatcut, $untimecut, $unmarkcut
Status
break status, assign $break
certainty, query or set $certainty, $setCert
change status $changestat, $forcestat
cut status, assign $cutStat
error status, assign $error
query status $status, $valid
Strings
create string $str
combine multiple strings into a single string $str
compare string values $compare
convert ASCII codes to characters $char
convert number to string $fmt
convert string to number $val
duplicate string contents $dup
find position of a character in a string $pos
make new string $str
read date/time from string $dtread
search & replace within string $subst
substring $left, $mid, $midN, $right
write date/time to string $dtwrite
Type Conversion
See also “Type Forcing”, below.
Type Forcing
force to date/time $dt
force to double precision $double
force to integer $int
force to real number $real
force to string $str
When training a model or making What If calculations, the model trainer computes a
relative error that indicates the discrepancy between the actual output values in the
dataset and the predicted output values generated by the model.
A relative error, if computable, is a real number greater than or equal to 0. A relative
error of 0. would indicate that a model can perfectly predict outputs from inputs. A
relative error of 1. would indicate that a model predicts only as well as always
predicting the mean of the data. Generally, a model with a relative error less than about
0.8 can be useful. If a relative error is not computable (for example, if there is only a
single pattern in the dataset), it is assigned a value of -1.
The relative error is computed at three levels:
• For individual output variables: rel-errout
• For the output variables as a composite for each pattern: rel-errpat
• For the patterns as a composite for the entire dataset: rel-errtot
Squared errors are used to compute relative errors. Definitions of these error measures
follow.
sq-errout
The squared error for an individual output in a pattern is:
sq-errout = (yout - ŷout)²
where yout is the actual value for the output variable in the pattern and ŷout is the output
value derived by the model.
sq-errpat
The squared error for a pattern is the sum of the squared errors for each output in the
pattern:
sq-errpat = Σ sq-errout, summed over out = 1, …, Nouts
where out is the output index and Nouts is the total number of outputs in the pattern.
sq-errtot
The squared error for a dataset is the sum of the squared errors for each pattern in the
dataset:
sq-errtot = Σ sq-errpat, summed over pat = 1, …, Npats
where pat is the pattern index and Npats is the total number of patterns in the dataset.
rel-errout
The relative error for an individual output variable is defined as:
rel-errout = sq-errout / σ²out
where σ²out is the variance of the actual values of the output.
The standard deviation for an output is computed from all the values for the output in
the dataset used during training.
rel-errpat
The relative error for the output variables as a composite for a pattern is defined as:
rel-errpat = sq-errpat / (Nouts × σ²all-outs)
where
σ²all-outs = (1 / Nouts) × Σ σ²out, summed over out = 1, …, Nouts
rel-errtot
The relative error for an entire dataset is defined as:
rel-errtot = sq-errtot / (Npats × Nouts × σ²all-outs)
Relationship to R²
Relative error is not the same as the commonly used statistical measure R². If relative
error is less than or equal to 1., the two measures are related as follows:
R² = 1 - rel-err²
If relative error is greater than 1., R² is undefined and is displayed as zero (0.).
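For example, under this relation a relative error of 0.6 corresponds to
R² = 1 - 0.6² = 0.64, and a relative error of exactly 1. corresponds to R² = 0.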
Questions
These are some frequently-asked questions (FAQs) plus additional tips and hints.
5. There is a date & time variable that the formatter can’t interpret.
1. First, check the format keys in “Units” on page 4-12, to make sure that the date/time
style really cannot be understood by the formatter. Most date/time styles can be
understood if you specify the Units correctly.
2. If the date/time truly cannot be interpreted by the formatter, try using the editor to
patch the file. For example, the formatter can read a date followed by a time, or a
time followed by a date, but not a style such as
Thu Aug 1 08:30:00 1991
where the time is inserted between parts of the date. If even patching the file is not
possible, in the formatter set the variable Type (described on page 4-15) to String.
4. Look in the spreadsheet to find the column number of the new date/time variable
that was created by the transform.
5. In the After Transforms view of the dataset, in the header rows at the top of the
dataset, select the Time Col cells for all of the data variables that should use this
date/time variable, and then, in the Edit area, type in its column number.
CAUTION: At this point, do not save the dataset again, or you will
destroy all of its transforms.
3. Display the Before Transforms view of the dataset, and change the Date/Time
pointers that you want to change.
4. From the File menu in the spreadsheet, select Inherit Transforms. Select the name
of the dataset that you just saved. This brings the transforms back in from disk,
evaluating them only once.
5. If any problem occurs, simply abandon the changed version of the dataset and
reload the saved version. If the transforms are evaluated correctly with no
problems, you may now save the dataset.
2. Since flow1_num is a computed column, use the Copy Values function (described
on page 5-32) to copy it into a raw column, flow1_raw.
8. The dataset already has a Time Merge, but now I have added some
more variables, and they have to be Time Merged also. Do I have to sit
through two Time Merges every time it evaluates the Transform List?
No. Time Merge is implemented as a transform, so you can modify the original Time
Merge to include the new variables also. The syntax of the Time Merge transform
(described on page A-31) is fairly complex, so you generally use the Time Merge
window (described on page 5-38) to create it; but it is not difficult to add a new variable.
1. In the Transform window, locate the Time Merge in the transform list. Double-click
on it to bring it up into the Expression box to be modified. It will look
approximately like this:
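!merged_time! = $timemerge(start, end, interval, method, maxTimeGap,
handleDuplicates, handleOutOfOrder, maxCert, !var1!, ..., !varN!)
with your own argument values in place of the names shown; the syntax is
described on page A-31, and the variable names here are only illustrative.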
3. Click in the transform, just to the left of the closing parenthesis. Type a comma,
followed by the name of the new variable that you want included in the merge.
4. Hit the Return key, or click Modify; then click Update Dataset. The transform list
will be reevaluated. All of your variables will be merged now, in a single transform.
9. Two variables have the same Date/Time column, but I want them to
use different Time Merge methods.
This can be done, but it requires two Time Merges. For this example, we will use a
dataset with one date/time column, dt, and two data columns, data1 and data2.
1. Use the Duplicate function in the Operations buttons at the bottom of the
spreadsheet window to duplicate the date/time column, dt, into dt2. For this
example, the new date/time column will be column number 4.
4. Select only dt, then fill in the rest of the Time Merge information, and merge.
6. Now change the date/time pointer of data2 so that it is the same as the date/time
pointer of data1. You can do this from the spreadsheet, as described in several
other Hints above; or you can simply add the transform directly:
!data2! = $changeDate(!data2!, !merged_time!)
An alternative method would be to use the $tmerge transform, described on page
A-25.
2. In the Plot window, make a Histogram plot of the variable. Turn on the Cumulative
toggle button. Set the Number of Bins to a number so large that you cannot see the
individual bins.
4. Zoom again if necessary, then use the Info tool to find the range of values in the
histogram bin at this point. This value is the first decile.
5. Similarly, find the tenth decile where the distribution is 90% of the total valid
points.
When the prediction column is created, its date/time pointer (displayed in the
spreadsheet in the header row called Time Col) is none. Simply click in this cell and
type in the column number of the date/time column.
This Appendix documents all files created and used by Insights. All filenames are case-
sensitive on computer systems that are case-sensitive. All files are ASCII, except that
datasetname.pi_data can be either ASCII or binary, depending on how you
saved the dataset; you need to know which format it is if you use FTP to transfer a
dataset from one machine to another. If you move or copy a dataset or model from one
directory or machine to another, you must move or copy all files that are marked
“required” in this table.
Log Files
session_n.pi_script
A record is kept of all actions that you take during each session of Insights. This
record is saved in a file called session_n.pi_script, where n is a sequence
number that increments with every session until it reaches the limit that is specified
by the symbolic name PAVILION_SCRIPTS, and then restarts at 1. The current
file must be retained, but old files may be deleted. The file is a Visual Basic (VB)
script. If you know VB, you can edit the file as needed. You can use the script to run
Insights from another application, or to rerun a specific session.
Print Files
file.ps
When you print from Insights, you have the choice of printing directly to a printer
or writing a PostScript file. If you write to a file, the default filename is file.ps.
This chapter explains how to develop and add your own transforms for use by the
transform calculator.
If the transform calculator does not provide all the services you need, you can add your
own transforms. Adding a transform requires that you write the transform in the C
programming language.
Some example user-defined transforms are provided, and appear in the User-Defined
group of functions in the transform calculator’s Functions and Constants list (but do not
appear in the All functions list).
You can add at most ten user-defined transforms.
A user-defined transform must return one vector as output. This restriction means, for
example, that you cannot develop a fast Fourier transform (FFT) because it returns two
vectors, one real and one imaginary. You could, however, create two user-defined
transforms, fft_real and fft_imaginary.
The inputs to the user-defined transforms have no restrictions.
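As a minimal sketch of the one-output-vector restriction (the function name and
signature below are hypothetical illustrations, not the actual Insights
registration interface, which is described later in this chapter):

#include <stddef.h>
#include <math.h>

/* Hypothetical user-defined transform: any number of input vectors,
   exactly one output vector. Computes the row-by-row root mean square
   of two inputs. */
void udt_rms2(const double *in1, const double *in2,
              double *out, size_t nrows)
{
    for (size_t i = 0; i < nrows; i++) {
        out[i] = sqrt((in1[i] * in1[i] + in2[i] * in2[i]) / 2.0);
    }
}

A companion pair such as fft_real and fft_imaginary would follow the same
pattern, each returning one of the two FFT result vectors.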
Symbols
! 4-12, 7-8, 7-24
" 7-25
$ 7-8, 7-24
: 7-12, 7-16
; 7-8, 7-16
[ ] 7-24
’ 7-25
Numerics
2000 year xv
A
ABB AEH data extractor 2-1
$abs A-3
$acos A-3
$acosd A-3
Action menu 11-6
selection region 11-5
B
$b_change A-29
$b_dcont A-28
$b_dfirst A-28
$b_dlast A-28
$b_dnum A-28
$b_fall A-29
$b_index A-29
$b_last A-28
$b_max A-28
$b_mean A-28
$b_min A-29
$b_nvalid A-29
$b_rise A-30
$b_slope A-29
$b_std A-29
$b_time A-29
$b_value A-29
Back button 1-3
$batch A-28
batch transforms A-27
$batchBreak A-29
$batchIndex A-29
$batchX A-30
Before Transforms 5-14
changing dataset 5-16, 6-29
best epoch 9-21, 9-23
best inputs 8-21
$biasSensor A-5
Binary data type 5-52
bounds checking A-6
boxcar 5-39
braces, curly { } 7-24
brackets, square 7-24
$break A-5
breakpoint 7-22
browser 1-3
button layout 1-5
C
carriage return 3-1
$center A-2
$certainty A-5
certainty 5-42, 9-16, A-22, A-31
$changeDate A-30
$changelen A-6
$changestat 5-16, 6-29, A-30
$char A-6
$checkFlatline A-6
$checkRange A-6
$checkRate A-6
clamping 11-21
in Utilities menu 11-4
clear
dataset 5-55
model 8-62
$clearRows 5-56, A-6
Clip tool 6-33
$clipabove 6-30, A-32
$clipbelow 6-30, A-32
closed-loop control data 8-46
colon (:) 7-12, 7-16
colored dot 6-10, 10-18
D
data
maximum value xv
raw 5-1
transformed 5-1
data dictionary 5-56, 9-9, C-2
delete dataset 5-12
delete model 9-6
data extractor 2-1, 5-4
data extractor wizard 1-4
data file 1-3
creating 3-5
deleting 3-13
editing 3-6
data pattern 8-48
data plotter 1-8
data spreadsheet 1-7
data type 5-52
dataset 5-1
adding new rows 5-48
adding new variables 5-47
adding together 5-50
After Transforms 5-16, 6-29
Before Transforms 5-14, 5-16, 6-29
clearing 5-55
copy see Save Dataset As
creating 1-4, 5-4
delete 5-12
deleting 5-55
editing 5-56
inheriting transforms 5-50
$dtcreate A-10
$dtdiff A-10
$dtmake A-10
$dtread A-10
$dtround A-10
$dtwrite A-10
$dup A-11
$duprows 5-56, A-11
$dwt A-11
E
$e A-11
$e_exact A-11
$e_lessthan A-11
$e_range A-11
Edit menu 1-5
editing
data file 3-6
dataset 5-15
format 4-25
format headers 4-16
model connections 8-32
transforms 7-18
transforms for A-34
editor 3-1
saving file 3-12
search and replace 3-9
eigenvalue 6-21, 8-62
eigenvector 8-62
$encode A-11
Enterprise Historian data extractor 2-1
epoch 8-48, 9-21, 9-23
$error A-12
error computation 11-22, 11-25
Error History plot 9-19
$etread A-12
$etwrite A-12
exclamation point 4-12, 7-8, 7-24
Exclude 10-20
$exp A-13
$expave A-13
external model 8-4
extractor 2-1
extractor see data extractor 5-4
extrapolate 5-35
extrapolation 5-39
F
FANN model 8-2
filtering 9-17
phase 8-3
time delay 8-12
$fft A-14
file editor 3-1
menu 1-7
file format 1-3, 1-7
File menu 1-3, 9-3
File Transfer Protocol 5-52
filenames D-1
filter 8-49, 8-57, 8-64
disable 8-57
use 8-57
with FANN model 9-17
$filter_disable A-4, A-13
$filter_freeze A-13
$filter_smooth A-4, A-13
filtering transforms A-36
Final Epoch 9-15
final value 11-21, 11-25
find best inputs 8-21
$findle 7-15, 8-56, A-14
flatline detection A-6
$fmt A-14
focus 7-17
$forcestat A-14
Format 4-1
column separator 4-7
columns 4-10
copy 4-20
delete 4-25
edit 4-25
error 4-18
key concepts 4-27
new 4-3
row flags 4-7
rows 4-6
verify 4-18
Format File 4-1
format file 1-7
Formatted File 4-1, 5-1
formatter 1-3
menu 1-7
$forward A-11
Forward button 1-3
Fourier transform A-14
inverse A-15
freeze tool 6-38
FTP 5-52
fuzzy constraint 11-25
G
gain 10-12
gain constraint 8-34
gain constraints 9-23
gap 5-41
Gaussian random A-27
get 7-24
graph type 6-4
H
hard constraints 11-21
$heartBeat A-15
help
positional 1-10
Help menu 1-9
histogram plot 6-17
$holdLast A-15
Home button 1-3
home page 1-3
$hour A-15
I
IEEE standard xv
$if 7-14, A-15
if 7-25
$ifft A-15
Include Box 10-18
Include Left 10-17
independent variable (dataset) 5-2, 7-12
independent variable (model) 8-2, 11-3
$inf A-15
infinity A-15
Info
Predicted vs. Actual plot 10-7
preprocessor plot 6-31
J
$join A-16
L
$lag A-2
$lead A-2
$left A-16
legends
display 6-9
$len A-16
length
of column 7-12, A-6
of string A-16
line type 6-4
linear extend 5-39
linear model
custom 8-34
prediction 8-32
$ln A-16
loading
dataset 5-10, 9-3
editor file 3-3
model 9-4
$log A-17
logarithm A-16, A-17
$lookup A-17
$lookupRel A-17
M
$m_ignore 8-56, A-17
$m_test 8-56, A-17
$m_train 8-56, A-17
$m_valid 8-56, A-17
main window 1-3
$markcut 5-50, 6-30, A-32
mask 7-7
math transforms A-36
$max A-17
max gain 8-37
maximum data value xv
Maximum Time Gap 5-41
$mean A-18
$median A-18
median, approximating A-19
menu 1-3
method 5-39
$mid A-18
$midn A-18
$millisec A-18
$min A-18
min gain 8-37
$minute A-18
$mod A-18
model
analyzer 1-9
auto modeler wizard 1-4
builder 1-8
building 8-1
checklist 8-7
clearing 8-62
connectivity 8-32
converting type 8-5
copy 9-6, 9-7
custom 8-3, 8-34
delete 9-6
deleting 9-9
editing connections 8-32
external 8-4
extrapolation 8-34
FANN 8-2
gain constraint 8-34
internal parameters 8-33
linear 8-32, 8-34
loading 9-4
N
n/a 4-16, 5-14
new format 4-3
newline 3-1
noise A-26
$none A-19
nonlinear correlation 8-12
normalized plot 6-4
setting Y axis 6-8
$not 7-14
$now A-19
$nrows A-20
$nvalid A-20
O
$or 7-14
$ord A-20
original value 11-24
output
error 9-19
relative error 9-19
Output vs. % 10-22
errors 10-33
example 10-24
plot 10-34
selection window 10-27
stepping a variable 10-23
overlay 6-4
$override 5-50, A-31
overtrain 9-11
P
P.C.A. plot 6-18
paste 1-5
pattern 8-48
pav_info script xv
PCA 8-4
$pca A-20
PCR 8-4
PCR model 8-58
Peak sensitivity 10-10
phase 8-3
$pi A-20
plot
Before Transforms 6-2
Clip 6-33
colored dot 6-10
continuous update 6-3
correlation 6-22
Q
quantile-quantile plot 6-16
quotes 7-25
R
R2 9-11, 10-2
$rand A-27
$random A-27
random numbers A-26
$randomS A-27
$randS A-27
range violation detection A-6
$rank A-21
rate constraints 11-21
rate of change violation detection A-6
raw data file 1-3
Raw Table Editor 10-31
raw variable 5-2
$real A-21
regular training 9-12
Relative Error 9-11, 10-2, B-1
relative error 9-19, B-5
release notes xiv
Replace Best with Current 9-24
S
sampling interval 5-35
save
dataset 5-50
dataset report 5-52
edited file 3-12
model 8-62
Save Dataset As 5-50
Save Model As see Copy Model
$scale A-21
$scatcut 6-30, A-32
score 6-21
Search
dataset 5-24, 5-25, 5-26, 5-27, 5-28
search
editor 3-9
$second A-21
$self 7-9, A-22
semicolon (;) 7-8, 7-16
sensitivity 8-21
Sensitivity vs. Rank 10-9
interpretation 10-14
measures 10-9
plot 10-14, 10-17, 10-18, 10-20
removing model variables 10-16
T
tag name 4-3, 4-12, 5-8, 7-9
displaying 11-4
restrictions 4-12, 5-9
$tan A-25
$tand A-25
tau 8-11
Tcl variable 7-23
test set 8-48
interval 8-53
random 8-55
stiff training 9-13
variable 8-55
verifying 8-65
testing patterns see test set
text editor 3-1
Time Col see Date/Time reference
time delay plot 8-20
time delays 5-35, 8-10, 8-11
calculating automatically 8-12
mapping into model 8-29
specifying manually 8-29
time gap 5-41
time interval 5-35
Time Merge 5-35, C-8, C-11
interval 5-35
spikes 5-40
transform see $TimeMerge
when required 5-35
window 5-38
time series plot 6-13
transforms
batch A-27
braces, curly 7-24
brackets, square 7-24
breakpoint 7-22
colon (:) 7-12, 7-16
comment 7-8
conditional expressions 7-14
curly braces 7-24
Date/Time reference 7-2
debugging 7-22
deleting 7-20
depend 7-2
dollar sign 7-8, 7-24
editing 7-18
entering 7-16
entering multiple 7-11
errors 7-22
exclamation point 7-8, 7-24
for converting A-41
for editing A-34
for filtering A-36
for smoothing A-36
for strings A-40
for type forcing A-42
from plot tools 6-30
index numbers 7-6
inheriting 5-50
input 7-2
invalid mix 7-2
list 7-6
mask 7-7
math A-36
miscellaneous A-38
modifying 7-19
moving window A-2
multiple outputs 7-10
on date/times A-34
order 7-2
output 7-2
random numbers A-26
semicolon (;) 7-8, 7-16
signal processing A-40
square brackets 7-24
statistics A-36
status A-40
syntax 7-7
system-generated A-30
U
unattached variable 5-39
Uncut
Predicted vs. Actual plot 10-9
preprocessor plot 6-37
units 4-12
$unmarkcut 6-30, A-32
$unscatcut 6-30, A-33
$untimecut 6-30, A-33
update initial button 11-8
user-defined transforms 7-23, A-2, E-1
Utilities button 11-4
Utilities menu 11-4
V
$val A-26
$valid A-26
validation patterns see test set
validation set see test set 8-49
variable 5-2
adding new 5-47
computed 5-2
copy 5-32
deleting 5-34
depend 7-2
dependent 8-2, 11-3
displaying name 11-4
duplicate 5-32
for test set 8-55
in a model 8-30
independent 5-2, 7-12, 8-2, 11-3
mapping into model 8-8
moving 5-58
origin 5-39
properties 5-43
raw 5-2
rename 5-15, 9-7
reordering 5-58
sorting 5-58
tag name and comment 4-12, 5-2, 5-8, 5-9, 7-9
Tcl 7-23
time delay 8-10, 8-29, 8-30
type 4-15, 5-39, A-42, C-15
types in models 8-2
unattached 5-39
units 4-12
variable bounds 8-65, 9-5, 10-23
disabled 8-67
gain constraints 8-66
report 8-69
setting 8-66
variables
sorting 10-21
view 1-5
view parameters 11-12
W
wavelet transform A-11
web browser 1-3
$weekday A-26
Welcome page 1-3
What Ifs 11-1
what ifs 1-9
window layout 1-5
window, moving A-2
$withinpct 7-15, A-26
wizard
auto modeler 1-4
data extractor 1-4, 5-4
working directory C-2
X
$xor 7-14
XY plot 6-15
Y
Y axis limits 6-27
Y variables 6-10
$year A-26
Z
Zoom tool 6-37