
Enterprise Historian™

Pavilion Insights
Version 5.0

User’s Guide
3BUF 000 497R0001 REV A
NOTICE
The information in this document is subject to change without notice and should not be construed as a commitment by
ABB Automation, Inc. ABB Automation, Inc. assumes no responsibility for any errors that may appear in this document.
In no event shall ABB Automation, Inc. be liable for direct, indirect, special, incidental, or consequential damages of any
nature or kind arising from the use of this document, nor shall ABB Automation, Inc. be liable for incidental or
consequential damages arising from use of any software or hardware described in this document.
This document and parts thereof must not be reproduced or copied without ABB Automation, Inc.'s written permission,
and the contents thereof must not be imparted to a third party nor be used for any unauthorized purpose.
The software described in this document is furnished under a license and may be used, copied, or disclosed only in
accordance with the terms of such license.

TRADEMARKS
Advant, AdvaCommand, AdvaInform, and AdvaBuild are registered trademarks of ABB Asea Brown Boveri Ltd.,
Switzerland.
See additional legal notices in Preface section.

Copyright © ABB Automation, Inc. 2000



INSIGHTS™

What you can understand...


You can improve...

User’s Guide
Version 5.0
June 1999
Pavilion Technologies, Inc. has made substantial efforts to ensure the accuracy of this document. Pavilion
Technologies, Inc. assumes no responsibility for any damages that may result from errors or omissions in
this document. The information in this document is subject to change without notice and should not be
construed as a commitment by Pavilion Technologies, Inc.
The software described in this document is furnished under a license and may be used or copied only in
accordance with the terms of such license.
Copyright Pavilion Technologies, Inc., 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999. All Rights
Reserved.
The following are registered trademarks of Pavilion Technologies, Inc.: Data Insights, OnLine Transform
Processor, Pavilion, Pavilion Data Interface, Process Insights, Process Perfecter, Sensor Validator,
Simulation Insights, Soft CEM, Soft Sensor, Soft Sensor Insights, Software CEM, Pavilion - Turning Your
Data Into Gold.
The following are trademarks of Pavilion Technologies, Inc.: BOOST, Boiler OnLine Optimization
Software Technology, DataIns, Economic Insights, Insights, OnLine Learning, Pavilion OnLine
Applications, Pavilion RunTime Products, PDI, Plant Optimizer, Power Insights, Power Insights Suite,
Process Optimizer, ProcIns, Production Chain Optimization, Programless OnLine Engine, Property
Predictor, RunTime Application Engine, RunTime Software Controller, Simulation Insights, Soft Sensor
Insights, Virtual OnLine Analyzer, VOA.

Advant, AdvaCommand, AdvaInform, and AdvaBuild are registered trademarks of ABB Asea Brown
Boveri Ltd, Switzerland. Enterprise Historian is a trademark of ABB Asea Brown Boveri Ltd, Switzerland.
AIM is a trademark of W.R. Biles & Associates, Inc.
CM50S and TDC 3000 are trademarks of Honeywell Inc.
Exceed is a trademark of Hummingbird Communications, Ltd.
Foxboro and I/A Series are registered trademarks of The Foxboro Company.
GLOBEtrotter, GLOBEtrotter Software, FLEXlm and Flexible License Manager are registered trademarks
of GLOBEtrotter Software, Inc.
HP, Apollo, and HP-UX are registered trademarks of Hewlett-Packard Company.
IBM, RS-6000, and AIX are trademarks of International Business Machines Corporation.
Motif, OSF/1, UNIX and the "X" device are registered trademarks and IT DialTone and The Open Group
are trademarks of The Open Group in the US and other countries.
OpenVMS, VAX, DEC, and DECnet are trademarks of Digital Equipment Corporation.
PI and PI-ProcessBook are trademarks of OSI Software, Inc.
PostScript is a trademark of Adobe Systems Incorporated.
Sentinel and Sentinel SuperPro are registered trademarks of Rainbow Technologies, Inc.
SUN, Sun Microsystems, and Solaris are registered trademarks of Sun Microsystems, Inc., and SunOS is a
trademark of Sun Microsystems, Inc.
Windows NT, Windows, Excel for Windows, and Notepad are registered trademarks of Microsoft
Corporation.
X Window System is a trademark of the Massachusetts Institute of Technology.
Contents

Preface xiii
How This Manual Is Organized xiii
Release Notes xiv
Year 2000 xv
Input Data Restriction xv
Number of Windows xv
Technical Support xv

Chapter 1: Introduction
Getting Started 1-2
Product Overview 1-3
File Pull-Down Menu 1-3
Edit Pull-Down Menu 1-5
View Pull-Down Menu 1-5
Tools Pull-Down Menu 1-7
File Editor 1-7
File Formatter 1-7



Data Spreadsheet 1-7
Data Plotter 1-8
Transform Calculator 1-8
Model Builder 1-8
Model Trainer 1-8
Model Analysis 1-9
Plug-Ins 1-9
Help Pull-Down Menu 1-9

Chapter 2: Data Extractor Wizards


ABB Enterprise™ Historian 2-1
DataDirect and PDI Options Window 2-3
Login Window 2-3
Select Logs Window 2-5
Select Date and Times Window 2-7
Data Validation Window 2-9
Extract and Save Data Window 2-11
What the Wizard Does 2-12

Chapter 3: File Editor


Selecting a File to Edit 3-3
Changing Directories 3-4
Loading a File 3-4
Creating a New File 3-5
Editing a File 3-6
Search and Replace 3-9
Menu Bar 3-11
File Pull-Down Menu 3-11
Edit Pull-Down Menu 3-14
Exiting the Editor 3-14

Chapter 4: File Formatter


Data File Contents 4-2
Required 4-2
Structure 4-2
Date and Time Information 4-2
Optional 4-3


Formatter Options 4-3


Creating a New Format 4-3
Selecting a File 4-4
Changing Directories 4-4
Loading a File 4-5
New Format 4-5
Step 1: Format Rows 4-6
Row Flags 4-7
Step 2: Format Columns 4-10
Tag Name and Comment 4-12
Units 4-12
Type 4-15
Date/Time 4-16
Edit Menu 4-16
Completing Step 2 4-17
Step 3: Verify Format 4-18
Display Format 4-19
Completing Step 3 4-20
Copying a Format 4-20
Selecting a Data File that Has Been Formatted 4-22
Selecting a Data File that Has Not Been Formatted 4-23
Copying the Format 4-24
Editing a Format 4-25
Deleting Formats 4-25
Key Concepts 4-27

Chapter 5: Spreadsheet
About Datasets 5-1
Transforms 5-2
Status 5-2
Date/Time Reference 5-3
Preprocessing Data 5-3
Creating a Dataset 5-4
Loading a Dataset 5-10
Spreadsheet Window 5-12
Header Information 5-14
Selecting a Region 5-14
Changing Spreadsheet Contents 5-15
Displaying Variable Statistics 5-17



Time Statistics 5-22
Viewing Data by Row or Column 5-23
Viewing Data by Value or Status 5-24
Search Types and Parameters 5-24
Scope and Order of Search 5-29
Invoking the Search 5-30
Repeating a Search 5-30
Printing the Spreadsheet 5-31
Variable Operations 5-32
Copying Variables 5-32
Deleting Variables 5-34
Time Merge 5-35
When Time Merging Is Required 5-35
Time Interval and Time Delays 5-35
Preparing the Dataset for Time Merge 5-36
Time Merge Window 5-38
Variable Properties 5-43
Common Properties 5-44
Before Transforms Properties: Analysis Variable Range 5-45
Menu Bar 5-46
Dataset Menu 5-47
Create New Dataset 5-47
Load Dataset 5-47
Add New Variables 5-47
Add New Rows (Before Transforms) 5-48
Add Dataset 5-50
Inherit Transforms 5-50
Save Dataset and Save Dataset As 5-50
Save Dataset Report 5-52
Clear Dataset 5-55
Delete Dataset File 5-55
Window Menu 5-56
Edit Menu 5-56
Edit Regions 5-56
Edit Operations 5-57
Reorder Menu 5-58
AVR Menu 5-59
Exiting the Spreadsheet 5-60


Chapter 6: Data Plotter


Plot Appearance 6-4
Line Type 6-4
Graph Type 6-4
Display 6-9
Plot Types and Parameters 6-10
Selecting Y Variables 6-10
Row Number Plot 6-12
Time Series Plot 6-13
XY Plot 6-15
Probability Plot 6-16
Histogram Plot 6-17
P.C.A. Plot 6-18
Correlation Plot 6-22
Y Axis Limits 6-27
Crosshairs 6-27
Tools 6-28
Transform Tools 6-30
Info 6-31
Clip 6-33
Cut Y 6-34
Cut Box 6-34
Cut X 6-35
Uncut 6-37
Zoom 6-37
Freeze 6-38
Printing a Plot 6-39
Exiting the Plotter 6-39

Chapter 7: Transform Calculator


Transform Order 7-2
Transform Window 7-3
Syntax 7-7
Transforms With Multiple Outputs 7-10
Entering Multiple Transforms Simultaneously 7-11
Column Length of Independent Variables 7-12
Arithmetic Operators 7-13
Relational Operators 7-14



Conditional Expression Constructors 7-14
Miscellaneous Buttons 7-16
Building a New Transform 7-16
Editing the Transform List 7-18
Append 7-19
Insert Before 7-19
Modify 7-19
Delete 7-20
Delete All 7-20
Cut 7-21
Copy 7-21
Paste Before, Paste After 7-21
Breakpoints: Debugging the Transform List 7-22
Transform Errors and Panic Stop 7-22
User-Defined Transforms 7-23
Transform Control Statements 7-23
General Rules 7-23
Commands and Examples 7-24

Chapter 8: Building a Model


Model Types and Variable Types 8-2
Prediction 8-2
FANN 8-2
Custom 8-3
PCR 8-4
External 8-4
Using the Model Builder 8-4
File Menu 8-8
Mapping Dataset Variables to the Model 8-8
Time Delays 8-10
Finding Time Delays Automatically 8-12
Specifying Time Delays Manually 8-29
Variables in the Model 8-30
Editing Connectivity 8-32
Internal Parameters 8-33
Gain Constraints and Extrapolation Training 8-34
Specifying Gain Constraints 8-36
Specifying Extrapolation Training Parameters 8-39
Saving the Model 8-40


Training the Model 8-45


Monitoring Training 8-45
Best Epoch 8-45
How Gain Constraints Affect the Training and Testing Errors 8-45
Troubleshooting Gain Constraints 8-47
Setting Patterns 8-48
Testing, Training, Validation 8-48
Filtering 8-49
Set Patterns Window 8-52
Additional Features 8-58
Select PCR Components 8-58
Saving or Abandoning the Model 8-62
Save Model and Save Model As 8-63
Model Statistics Window 8-64
Variable Bounds 8-65
Modifying an Existing Model 8-70

Chapter 9: Model Trainer


Pull-Down Menus 9-2
File Pull-Down Menu 9-3
Load Dataset 9-3
Load Model 9-4
Copy Model 9-6
Save Model As 9-7
Rename Model Variables 9-7
Delete Model File 9-9
Show Model Statistics 9-10
Show Variable Bounds 9-10
Epoch Pull-Down Menu 9-10
Phase Pull-Down Menu 9-10
Loading the Dataset and Model 9-11
Measures of Training Performance 9-11
Training Types 9-12
Regular 9-12
Stiff 9-13
Test and Validation Set Requirements 9-13
Completing Stiff Training 9-13
Ridge 9-14
Training Parameters 9-14



Stopping Criteria 9-15
Auto Learning Rate Adjustment 9-16
Sparse Data Algorithm 9-16
Starting and Stopping Training 9-17
Training Monitors 9-18
Error History Plot 9-19
Prediction Plots (Training Stripcharts) 9-21
Gain Constraint Monitor 9-23
Best Epoch Adjustment 9-23

Chapter 10: Model Analysis Tools


Generating Reports and Data Files 10-2
Predicted vs. Actual 10-2
Run Model Window 10-3
Predicted vs. Actual Window 10-6
Displaying Model Performance 10-7
Selecting Variables 10-8
Cutting Points 10-8
Sensitivity vs. Rank 10-9
Sensitivity Measures 10-9
Run Sensitivity Window 10-11
Sensitivity vs. Rank Window 10-13
Selecting Variables 10-14
Interpreting the Results 10-14
Removing Model Variables 10-16
Sorting the Dataset 10-21
File Menu 10-22
Output vs. Percent 10-22
Stepping a Variable 10-23
Example 10-24
Output vs. % Selection Window 10-27
Model Input Editor 10-30
Raw Table Editor 10-31
Initiating the Output vs. Percent Analysis 10-33
Output vs. % Plot 10-34
Selecting Variables 10-35
Changing the Output Response 10-35
Sensitivity vs. Percent 10-36
Sens. vs. % Selection Window 10-37


Initiating the Sensitivity vs. Percent Analysis 10-38


Sensitivity vs. % Plot 10-39
Selecting Variables 10-40
Changing the Sensitivity 10-40

Chapter 11: What Ifs


What Ifs Main Window 11-1
Display Components 11-3
Menu Bar 11-5
File Pull-Down Menu 11-6
Action Pull-Down Menu 11-6
Edit Pull-Down Menu 11-6
Mode Option Menu 11-6
Patterns Option Menu 11-7
Source Option Menu 11-7
Other Controls 11-9
Continuous Update 11-9
Edit 11-9
Report 11-10
View 11-12
Stop, Step, Run, Current Row, Run Speed 11-14
Stripcharts 11-17
Setpoint Editor: Inputs 11-20
Values, Clamping, Priority 11-21
Statistics 11-22
Constraints 11-22
Error Computation 11-22
Confidence 11-22
Control Buttons 11-23
Setpoint Display Bar: Inputs 11-23
Setpoint Editor: Outputs 11-23
Values 11-24
Statistics 11-25
Constraints 11-25
Error Computation 11-25
Confidence 11-25
Control Buttons 11-27
Setpoint Display Bar: Outputs 11-27
What Ifs Checklist 11-27



To Predict Outputs 11-28

Appendix A: Transform Reference


General Information and Conventions A-1
Moving Window Transforms A-2
Common Transforms and Constants A-3
Random Number Transforms A-26
Gaussian Random A-27
Transforms for Batch Processes A-27
System-Generated Transforms A-30
Transform Finder A-33
Date/Time A-34
Editing A-34
Filtering/Smoothing A-36
Math & Statistics A-36
Miscellaneous A-38
Plot Cuts A-38
Signal Processing A-40
Status A-40
Strings A-40
Type Conversion A-41
Type Forcing A-42

Appendix B: Error Measures

Appendix C: Frequently-Asked Questions

Appendix D: Files
Log Files D-1
Data Dictionary File D-2
Print Files D-2
Format File Suffixes D-2
Dataset File Suffixes D-3
Dataset Report File Suffixes D-3
Model File Suffixes D-3
Model Report and Data File Suffixes D-4


Appendix E: User-Defined Transforms

Index

Preface
• How This Manual Is Organized, page xiii
• Release Notes, page xiv
• Year 2000, page xv
• Input Data Restriction, page xv
• Number of Windows, page xv
• Technical Support, page xv

How This Manual Is Organized


This manual is organized into the following chapters and appendices:
Chapter 1, Introduction, provides a system overview. The chapter also includes a
general road map to guide you through process analysis.
Chapter 2, Data Extractor Wizards, describes the data extractor wizards, which you use
to read raw data straight from a DCS or historian when building a dataset.
Chapter 3, File Editor, explains how to use the text editor.
Chapter 4, File Formatter, explains how to use the formatter to describe the format of
your data files.
Chapter 5, Spreadsheet, explains how to create a dataset from one or more formatted
files, and to display the dataset and manipulate it in the spreadsheet.
Chapter 6, Data Plotter, explains how to display graphical representations of
spreadsheet data.



Chapter 7, Transform Calculator, explains how to use the transform calculator to add
mathematical transforms to a dataset.
Chapter 8, Building a Model, explains how to build a model.
Chapter 9, Model Trainer, explains how to train a model.
Chapter 10, Model Analysis Tools, explains how to use a trained model to analyze your
system.
Chapter 11, What Ifs, explains the advanced What Ifs analysis feature.
Appendix A, Transform Reference, is the reference pages for the transform functions.

Appendix B, Error Measures, contains definitions and equations for relative error, R²,
and other error measures.
Appendix C, Frequently-Asked Questions, provides tips and hints.
Appendix D, Files, lists and briefly describes the files created or required by the
product.
Appendix E, User-Defined Transforms, explains how to develop and add your own
transforms to the transform calculator.

Note: Figures in this document are intended only to illustrate the
appearance of the graphical user interface. Files and data that appear
in the illustrations are chosen solely for display purposes, and are not
intended to present a complete or consistent example. Title bars of
windows appearing in figures may not match the title bars you see on
your system.

Some features require the purchase of additional licensing; for more information,
contact your sales representative.

Release Notes
Current release notes are provided with every installation. Be sure to consult the release
notes before using the product.


Year 2000
All Pavilion products (except versions of Process Insights® and Software CEM® earlier
than version 1.5) have been rigorously tested for Year 2000 Compliance. For further
details, contact your customer support representative.

Input Data Restriction


This product has been extensively tested using data values no larger than the IEEE
standard single precision maximum real number, 3.40282347e+38. On some machines,
raw input data any larger than the square root of this number may cause undefined
results. You should always use the statistics function (located in the data spreadsheet) to
check data values before using a dataset to build a model.
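
If you want to automate this kind of screening outside the product, a minimal sketch along the following lines can flag oversized raw values before you build a dataset. This is an illustration using NumPy, not an Insights feature; the function name and sample values are hypothetical.

    import numpy as np

    # Flag raw values whose magnitude exceeds the square root of the IEEE
    # single-precision maximum; larger raw inputs may cause undefined
    # results on some machines. Illustrative pre-check, not an Insights tool.
    FLT_MAX = np.finfo(np.float32).max      # 3.40282347e+38
    SAFE_LIMIT = np.sqrt(FLT_MAX)           # approximately 1.84e+19

    def oversized_indices(values):
        """Return the indices of values whose magnitude exceeds the limit."""
        data = np.asarray(values, dtype=np.float64)
        return np.flatnonzero(np.abs(data) > SAFE_LIMIT)

    print(oversized_indices([1.0, 2.5e20, -3.0]))   # -> [1]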

Number of Windows
This product does not limit the number of windows that you can display simultaneously,
but many windowing systems have such a limit, often about 20. Consult your system
administrator for more details.

Technical Support
If you have problems or questions about the product, contact your customer support
representative.
The Pavilion software distribution includes a script, pav_info, that displays
information about your computer’s hardware and software configuration. If you report a
problem with your Pavilion installation, your customer support representative may
request that you run this script and send the output to help with diagnosis. You may run
the script at any time if you feel curious. The pav_info script is located in the same
directory as other Pavilion executables.



Legal Notices June 1999
Pavilion Technologies, Inc. has made substantial efforts to ensure the accuracy of this document. Pavilion
Technologies, Inc. assumes no responsibility for any damages that may result from errors or omissions in
this document. The information in this document is subject to change without notice and should not be
construed as a commitment by Pavilion Technologies, Inc.
The software described in this document is furnished under a license and may be used or copied only in
accordance with the terms of such license.
Copyright Pavilion Technologies, Inc., 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999. All Rights
Reserved.
The following are registered trademarks of Pavilion Technologies, Inc.: Data Insights, OnLine Transform
Processor, Pavilion, Pavilion Data Interface, Process Insights, Process Perfecter, Sensor Validator,
Simulation Insights, Soft CEM, Soft Sensor, Soft Sensor Insights, Software CEM, Pavilion - Turning Your
Data Into Gold.
The following are trademarks of Pavilion Technologies, Inc.: BOOST, Boiler OnLine Optimization
Software Technology, DataIns, Economic Insights, Insights, OnLine Learning, Pavilion OnLine
Applications, Pavilion RunTime Products, PDI, Plant Optimizer, Power Insights, Power Insights Suite,
Process Optimizer, ProcIns, Production Chain Optimization, Programless OnLine Engine, Property
Predictor, RunTime Application Engine, RunTime Software Controller, Simulation Insights, Soft Sensor
Insights, Virtual OnLine Analyzer, VOA.

Advant, AdvaCommand, AdvaInform, and AdvaBuild are registered trademarks of ABB Asea Brown
Boveri Ltd, Switzerland. Enterprise Historian is a trademark of ABB Asea Brown Boveri Ltd, Switzerland.
AIM is a trademark of W.R. Biles & Associates, Inc.
CM50S and TDC 3000 are trademarks of Honeywell Inc.
Exceed is a trademark of Hummingbird Communications, Ltd.
Foxboro and I/A Series are registered trademarks of The Foxboro Company.
GLOBEtrotter, GLOBEtrotter Software, FLEXlm and Flexible License Manager are registered trademarks
of GLOBEtrotter Software, Inc.
HP, Apollo, and HP-UX are registered trademarks of Hewlett-Packard Company.
IBM, RS-6000, and AIX are trademarks of International Business Machines Corporation.
Motif, OSF/1, UNIX and the "X" device are registered trademarks and IT DialTone and The Open Group
are trademarks of The Open Group in the US and other countries.
OpenVMS, VAX, DEC, and DECnet are trademarks of Digital Equipment Corporation.
PI and PI-ProcessBook are trademarks of OSI Software, Inc.
PostScript is a trademark of Adobe Systems Incorporated.

Sentinel and Sentinel SuperPro are registered trademarks of Rainbow Technologies, Inc.
SUN, Sun Microsystems, and Solaris are registered trademarks of Sun Microsystems, Inc., and SunOS is a
trademark of Sun Microsystems, Inc.
Windows NT, Windows, Excel for Windows, and Notepad are registered trademarks of Microsoft
Corporation.
X Window System is a trademark of the Massachusetts Institute of Technology.

1 Introduction
• Getting Started, page 1-2
• Product Overview, page 1-3

Pavilion Insights™ gives you the power to analyze complex linear and nonlinear
processes. Using just your historical process data, Insights analysis tools can:
• Plot process data in a variety of revealing formats: chronological order, variable
against variable, histogram, principal components analysis (P.C.A.), and
correlation.
• Compute basic statistics or use the transform calculator to perform advanced
analysis.
• Rank the effect of process inputs on the outputs.
• Perform what-if scenarios (predictions) using a linear or nonlinear model.
Insights provides all the tools required for complete process analysis: data formatter,
data spreadsheet, data plotter, nonlinear model builder and trainer, model analysis tools,
and what-if scenario screen. The spreadsheet and plotter are integrated with a transform
calculator that not only captures your modifications as equations for further tuning, but
also offers an extensive library of mathematical and practical functions.
When you have learned all you can learn from your data using the advanced statistical
and plotting capabilities, use the auto modeler wizard to build a model quickly and



easily. The model reveals details not readily visible in the raw data or available using
traditional statistical analyzers. Using the model, you can investigate:
• Time delays.
• Cause and effect relationships.
• Response of an output variable to an input variable throughout the entire operating
range.
• Sensitivity of an output variable to each input variable.
• Nonlinear correlations between input and output variables.
• Process behavior under hypothetical circumstances.
Learn to use Insights for process analysis by following the tutorial. The tutorial starts
with historical data because that is where process analysis begins. You build a single
dataset from the provided sample data files, and then you plot the data in various
formats and perform advanced statistical analysis. The tutorial then introduces the auto
modeler wizard, which you use to build a linear or nonlinear neural network model of
the process. Using the model, you can investigate the time delays in the model as well as
the sensitivity of the outputs to the inputs. Finally, you use the model for performing
what-if scenarios. After completing the tutorial, you are ready to perform analysis on
your own process.
Designed using advanced COM technology, Insights is an automation server that you
can embed in other applications such as Microsoft Excel for Windows®.

Getting Started
Start Insights from the Start menu: select Start > Programs > Pavilion Technologies >
Insights.


Product Overview
The main screen provides tools and other features through pull-down menus and a
quick-access toolbar. It can also bring up your browser for displaying instructions and
help.

Click the Tutorial or Road Map links to display them in your browser. For more
information on the tutorial and road map, see the Insights Tutorial.

File Pull-Down Menu


The File pull-down menu provides the following operations:
New
Create one of the following:
File Format
Use the formatter to prepare a raw data file for inclusion in a dataset. This
operation performs the same function as Tools > File Formatter > New. For
more information, see Chapter 4, File Formatter.



Dataset
Create a dataset from existing formatted files or from data read directly from a
DCS or historian using a data extractor wizard. For more information, see
Chapter 4, File Formatter and Chapter 2, Data Extractor Wizards.
Model (active only if a dataset is loaded)
Build a model using either the auto modeler wizard (prediction model only) or
the full-feature model building tools. The wizard is faster because it prompts
you only for the most essential parameters, using default values for the others,
before proceeding through the configuration and training steps without
requiring further interaction. The full-feature model building tools, on the
other hand, provide access to all parameters, including time delays, linear/
nonlinear connections, training modes, and more. For more information on the
auto modeler wizard, see the tutorial. For more information on building and
training models by hand, see Chapter 8, Building a Model, and Chapter 9,
Model Trainer.
Open
Load one of the following:
File Format
Opening a file format is equivalent to selecting Tools > File Formatter > Edit.
See also Chapter 4, File Formatter.
Dataset
Once a dataset is open, it is available for use by tools such as the spreadsheet,
plotter, transform calculator, model builder, model trainer, and auto modeler
wizard. Opening a dataset brings up the spreadsheet. If you close the
spreadsheet, the dataset remains open.
Model
Once a model is open, it is available for use by tools such as the model builder,
model trainer, model analysis tools, and What Ifs scenarios tool. Opening a
model displays the model statistics window. If you close the window, the
model remains open.
Clear
Unload (close) a dataset or model. If you changed the dataset or model during the
session without saving it, the operation prompts you to save it before closing it. You
do not have to save a model during or after training because the model trainer saves
training as it proceeds.


Save
Write the dataset or model to disk. The dataset or model remains loaded. You do not
have to save a model during or after training because the model trainer saves
training as it proceeds.
Exit
Unload any dataset or model and terminate Insights. If you have made changes to
an open dataset or model but not saved the changes, Insights prompts you to save
them before exiting. If you have been training a model, you do not have to save the
training; the model trainer saves the training to disk as it proceeds.

Edit Pull-Down Menu


The Edit pull-down menu provides the following operations:
Cut
If you have selected text or objects, copy the text or object into the internal editing
buffer and then remove the text or object from its current location.
Copy
If you have selected text or objects, copy the text or object into the internal editing
buffer without removing the text or object from its current location.
Paste
If any text or object is in the internal editing buffer, insert it at the cursor location.

View Pull-Down Menu


Display and modify window layout.
Customize Toolbar
Change the selection or arrangement of tool-invocation buttons appearing in the
toolbar. The toolbar is the row of buttons at the top of the main screen.



Use the toolbar as a shortcut to invoke the various tools, which are also accessible
in the Tools pull-down menu. To display the label for a toolbar button, hold the
mouse pointer over the button:

Another way to bring up the Customize Toolbar window is by double-clicking in a
blank area of the toolbar, such as the area between the Transform Calculator and
Predicted vs. Actual buttons:

In the Customize Toolbar window, the Toolbar Buttons list shows the current layout
of the toolbar. The Available Buttons list shows the toolbar elements that you can
add. By default, all tool buttons appear in the toolbar, so the only available toolbar
element is the separator. Use the Add and Remove buttons in the middle to move
toolbar buttons into or out of the toolbar. Use the Move Up and Move Down
buttons to change the location of buttons in the toolbar.

Your changes to the toolbar remain until you terminate Insights. The next time you
start Insights, the toolbar is reset to its default.


Tools Pull-Down Menu


The Tools pull-down menu provides access to the utilities described in the following
sections.

File Editor
Create, display, or modify an ASCII text file such as a raw data file. The file editor is the
only Pavilion utility that can change your raw data files. The formatter does not change
raw data files. For more information, see Chapter 3, File Editor.

File Formatter
Use the formatter to describe the format of data files so that the spreadsheet can include
them in datasets.
The formatter can read data files in a wide variety of formats, allowing you to specify
column separators, designate row types, set column data types, associate data columns
with date-time columns, and more. When you finish specifying the format of a file, the
formatter writes a format file describing the organization of the data file. The formatter
does not change the original raw data file.
The File Formatter submenu provides these operations:
New
Format a file starting with the default format settings. This operation performs the
same function as File > New > File Format.
Copy
Format a file starting with format settings already defined for another file.
Edit
Modify the format for an already-formatted file.
Delete
Delete the format file for a data file.
For more information, see Chapter 4, File Formatter.

Data Spreadsheet
Use the spreadsheet to display and modify the contents of a dataset. The data
spreadsheet supports mathematical transform-generated columns and provides access to
data both in its before-transforms state and in its after-transforms state. The transforms



themselves are accessible through the transform calculator, described below. The
spreadsheet is closely integrated with the plotter and transform calculator so that any
change you make in one is immediately visible in the others. For more information, see
Chapter 5, Spreadsheet.

Data Plotter
Display and modify an existing dataset graphically. The plotter offers a variety of plot
types, including chronological order (time series), variable against variable (XY),
histogram, principal components analysis (P.C.A.), and correlation. The plotter also
provides a variety of tools for modifying data. The plotter is closely integrated with the
spreadsheet and transform calculator so that any change you make in one is immediately
visible in the others. For more information, see Chapter 6, Data Plotter.

Transform Calculator
Display and modify the transform list for an existing dataset. Any modifications you
make to the dataset using the plotter or, in some cases, the spreadsheet, are accessible as
equations called transforms. You can, for example, clip ranges of values simply by
selecting them in a plot; then you can use the transform calculator to review the
transform implementing the clip and modify it as needed. You can create new variables
by applying mathematical transforms to existing variables. The library of transforms is
extensive, and you can add your own transforms if required. The transform calculator
also allows you to time-merge data so that sampling intervals are uniform and missing
values have been restored using your choice of interpolation or extrapolation technique.
The transform calculator is closely integrated with the spreadsheet and plotter so that
any change you make in one is immediately visible in the others. For more information,
see Chapter 7, Transform Calculator.
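
As a conceptual analogue of the time merge (the product's built-in implementation differs in detail), the following sketch uses pandas to place irregularly sampled data on a uniform one-minute grid and fill the gaps by interpolation. The column name and interval are illustrative only.

    import pandas as pd

    # Illustrative analogue of a time merge, not the Insights implementation:
    # resample irregular samples onto a uniform interval, then fill missing
    # values by time-weighted linear interpolation.
    raw = pd.DataFrame(
        {"flowrate": [10.0, 12.0, 11.5]},
        index=pd.to_datetime(
            ["1999-06-01 00:00", "1999-06-01 00:03", "1999-06-01 00:04"]
        ),
    )
    merged = raw.resample("1min").mean().interpolate(method="time")
    print(merged)   # uniform one-minute rows; 00:01 and 00:02 interpolated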

Model Builder
Build a model using either the auto modeler wizard, which uses default settings for a
number of model configuration and training parameters, or build a model using the
manual model configuration tool. You must load a dataset before you can build a model.
For more information, see Chapter 8, Building a Model.

Model Trainer
Tune a new or existing model using an existing dataset. For more information, see
Chapter 9, Model Trainer.


Model Analysis
Use the model analysis tools to check the fidelity of your models after they have been
trained and to perform process analyses. Select one of:
Predicted vs. Actual
Plot predicted output values against corresponding actual output values. This plot
shows the accuracy of the model over the range of data used to train the model and
also allows you to calculate residuals, the differences between predicted values and
actual values. Analyzing the distribution of the residuals can provide insight into
how well models have generalized to the process being modeled.
Sensitivity vs. Rank
Plot the sensitivity of the output to inputs.
Output vs. Percent
Plot output values against the full range of input values.
Sensitivity vs. Percent
Plot the sensitivity of the output against the full range of input values.
What Ifs
Perform what-if scenarios (predict outputs) using an existing model.
For more information, see Chapter 10, Model Analysis Tools and Chapter 11, What Ifs.

Plug-Ins
Pavilion plug-in technology makes it easy to add wizards, data extractors, and other
functionality to Pavilion products. Plug-in software is released separately from other
Pavilion products. Display the plug-ins by selecting Tools > Plug-Ins. For data extractor
plug-ins, see File > New > Dataset. For modeling plug-ins, see File > New > Model or
Tools > Model Builder. For help, see Help > Plug-Ins. For more information on the
currently-available plug-ins, contact your customer support representative.

Help Pull-Down Menu


The Help pull-down menu provides the following options:
Tutorial
A series of exercises that teach the procedures and tools used for process analysis.
See also the Insights Tutorial printed manual.



Road Map
A task-oriented approach to process analysis. The road map also appears in the
Insights Tutorial printed manual.
Help Topics
Complete reference documentation for software features and tools. These are the
same topics found in the Insights User’s Guide.
Plug-Ins
How to use plug-ins such as model building wizards and data extraction wizards. For
more information, see “Plug-Ins” on page 1-9 in the preceding section.
Pavilion Home Page
Visit our web site, http://www.pavtech.com, for general information
including product and industry briefs.
About Pavilion Insights
Software copyright and version information.
You can also enter the help system by clicking the Help button in any window.
By default, the help system displays information for the window where you invoked
help. You can select help on any other topic in the Index of Help Topics. If you want to
select from the list, double-click on a topic, or click on it once and then click Help on
Selected Topic. All topics (except “Using the Help System”) are identified by a number
as well as a title; if you prefer, instead of clicking on a topic in the index, you may type
its number in the Selected Topic box. This method may be preferable if you are tracing
related topics cited in cross-references.
If you want help on a word or phrase that does not exactly match one of the titles listed
in the index (for example, a transform function name), simply type it in the Selected
Topic box and click Help on Selected Topic. The help system finds any help text that
contains your selection.
To restore the complete topic index, click Show All Topics.
In addition to the help system, many windows show, at the bottom of the window,
positional help for whatever part of the window is indicated by your mouse pointer.



2 Data Extractor Wizards

Using a data extractor wizard to read data straight from your DCS or historian allows
you to build a dataset without having to generate and format raw data files. All you do is
specify the range of time for which you need data and the time interval separating the
data samples. The data extractor wizard reads the required data and builds the dataset.
To run a data extractor wizard, select File > New > Dataset.
The New Dataset window shows icons for data extractors. It also has an icon for the file
formatter, which you use to prepare raw data files if necessary. For more information,
see Chapter 4, File Formatter.

ABB Enterprise™ Historian


Use the ABB Enterprise™ historian data extractor to create a Pavilion dataset or a raw
data file using data for specified logs and time intervals. To access the data extractor,
bring up the New Dataset window: select File > New > Dataset.



In the New Dataset window, the following icon represents the ABB Enterprise historian
data extractor wizard.

To start the wizard, select the icon and click OK, or simply double-click the icon.


DataDirect and PDI Options Window


In any window of the wizard, you can click Options to display and set a variety of
options for accessing the historian.

The default values are probably correct for your computer. If not, contact your system
administrator.
In the File Setup section, the history object file should list logs that you intend to access.
If necessary, use the Browse button to locate an alternative history object file. If the logs
you need to access do not appear in the history object file, click Edit to bring up an
ASCII text editor so that you can add the desired entries to the file.
Click OK to close the Options window.

Login Window
In the Introduction window of the wizard, click Next to proceed to the Login window.



If you do not have a valid user name and password, or if you do not know the host name
for the historian, see your system administrator.

To specify the port number for login, click Options.


In the Login window, click Next to log onto the Enterprise Historian and continue to the
Select Logs window.


Select Logs Window


Use the Select Logs window to specify the logs you want extracted.

First, enter a search mask to match log names.

Note: Searches are case sensitive. The search strings must match the
case of the log names in the historian.

In the Mask field, use the asterisk (*) to match any number of characters in the name.
For example:
pav*
List all logs starting with pav.
Pav*
List all logs starting with Pav.



*pav*flow*
List all logs containing the strings pav and flow, in that order.
*
List all logs.
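
The mask rules resemble shell-style globbing, so you can try a mask out before running a search. The sketch below is illustrative only (the log names are hypothetical); Python's fnmatchcase gives the same case-sensitive, asterisk-matches-anything behavior.

    from fnmatch import fnmatchcase

    # Case-sensitive matching, with * matching any number of characters,
    # as in the wizard's Mask field. Log names are hypothetical.
    logs = ["pav_temp", "Pav_flow", "unit1_pav_flow_log"]
    for mask in ("pav*", "Pav*", "*pav*flow*", "*"):
        hits = [name for name in logs if fnmatchcase(name, mask)]
        print(mask, "->", hits)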
After entering a mask, click Search. Matching log names appear in the list on the left,
Historian Logs Matching Mask. Examine the list. Select those you need to extract by
clicking and dragging on them. Then click the right-pointing arrow to move selected
logs to the list on the right, Project Logs To Be Extracted. You can also move a log to
the list on the right by double-clicking it.

In this manner, you can continue to search and select logs for extraction.
The set of logs that you can access is determined by your computer’s history object file,
which you can specify by clicking Options. The Options window (described on page
2-3) also allows you to edit a history object file, adding entries for logs you need to
access.


If you intend to extract data from the same logs again in the future, you may want to
save the current log list by clicking Save Project Logs. This operation saves the list of
logs to a file. Later, instead of searching and selecting logs again, you can load this list
using the Load Project Logs operation.
When the list on the right, Project Logs To Be Extracted, lists all the logs you need,
click Next to continue to the Select Date and Times window.

Select Date and Times Window


Use this window to specify the start and end times for data extraction. You also specify
the time interval between data samples.

Note: The start and end times and the extraction interval determine
the size of your dataset. Make sure you have enough disk space for
the amount of data you intend to extract.



By default, the wizard extracts values for one interval, the one that you specify in the
fields in the top of the window. If you need to extract data for multiple intervals, click
Use Multiple Intervals and click Add To List to compile a list of intervals for extraction.

When the Use Multiple Intervals feature is turned on, only the intervals appearing in the
list will be extracted. If Use Multiple Intervals is turned off, only the interval specified
in the date, time, and interval fields at the top of the window will be extracted.
After specifying the intervals to extract, click Next to continue to the Data Validation
window.


Data Validation Window


Use this window to verify that the historian contains data for the logs and intervals that
you have specified. The wizard validates a log by retrieving a sample of 500 values for
each extraction interval.

To validate all logs, click Validate All Logs. The wizard indicates the results of the
validation check in the Status field:
Unknown
Validation not yet performed for this log. Status is set to Unknown whenever you
enter the Data Validation window from a previous window in the wizard.
Good
Extraction of all 500 validation values was successful.
Missing
Some validation values were missing. Review the sample data (see below), and
consider returning to preceding wizard windows to modify your log list and interval



specifications. This status does not indicate a fatal error: data extraction succeeds
even if values are missing.
Bad
The log was not found. Verify that the desired log is on the historian. Keep in mind
that log names are case sensitive. This status indicates a fatal error: data extraction
fails if a log is not found.

Note: Data extraction fails if a log is not found on the historian. You
must resolve a Bad status before continuing.

When you select a single log, the operations for validating selected logs, removing logs,
and viewing sample data become active. When you select multiple logs, only the
operations for validating and removing selected logs become active.
When validation has returned a Good or, where acceptable, Missing status for each log,
click Next to continue to the Extract and Save Data window.


Extract and Save Data Window


Select a format and name for the extracted data.

The Dataset format is intended for use with Pavilion products and tools such as the
spreadsheet, plotter, model trainer, and so forth.
The ASCII format is a common text file, or raw file. For the ASCII format, you can
specify column headings and separators, and the string to substitute for error values.
If you choose to extract the data into the ASCII format, you can still use the ASCII files
to build a Pavilion dataset later.
For either format, specify a file or dataset name and path name. If the file or dataset
already exists, you are prompted before overwriting it.
Click Finish to begin the extraction.



What the Wizard Does
During data extraction, a progress window appears.
If the PDI network server is not running on the port specified in the Options window, an
error resembling the following appears instead of the progress window.

Click Options to verify and change the port used for communicating with the PDI
network server.
The PDI network server handles the data connection between the data extractor and the
Enterprise Historian. If the server is not available at port 8764 on your computer,
contact your system administrator. After correcting the problem, restart data extraction
by clicking Finish in the Extract and Save Data window.
To interrupt extraction at any time and return to the wizard, click Cancel Data
Extraction.
If you are extracting into a dataset, the wizard loads the dataset into the spreadsheet
upon completion. If the dataset does not exist in your data dictionary, a prompt appears,
asking if you wish to add it.



3 File Editor
• Selecting a File to Edit, page 3-3
• Creating a New File, page 3-5
• Editing a File, page 3-6
• Menu Bar, page 3-11
• Exiting the Editor, page 3-14

Use the editor to display and modify common ASCII text files. To invoke the editor,
select Tools > File Editor.

Note: Where the term “end-of-line sequence” appears in this


document, it represents whatever character sequence marks the end
of a line of text on your operating system. On UNIX and OpenVMS
systems, text lines end with the newline character (ASCII 10 decimal).
On Windows systems, text lines end with the carriage return and
newline characters (ASCII 13 and 10 decimal, respectively). The
Pavilion text editor represents the newline character as \nl\ and the
carriage return character as \ret\.

The editor is line oriented and allows you to view and manipulate special characters. It
also includes a global search and replace feature. The editor is intended principally for



adjusting raw data files that cannot be parsed by the formatter; but it can also be used to
view report files, or to view and edit any other files.


Selecting a File to Edit


You can edit an existing file or create a new file. Creating a new file is explained in
“Creating a New File” on page 3-5. To select an existing file, drag on the File menu in
the menu bar and select Load File. The File Browser appears.

The File Browser is used to select a file to be loaded into the editor. It displays a list of
files and subdirectories in your current directory (on OpenVMS systems, it displays and
can access only the most recent version of any file).



Changing Directories
If the file that you want to edit is not located in your current directory, you can change to
another directory, and its files and subdirectories will be displayed in the list. There are
a number of methods to change the current directory:
• Drag on the Directory Option menu to display the complete directory structure
above the current directory. Use this menu to move up one or more directory levels.

• After you have made one or more directory changes, click Previous to move back
to your most recent previous directory selection, or click First to move back to the
first directory you were in when you entered the File Browser.
• Click on any subdirectory and click Load to move down to that subdirectory.
• Double-click on any subdirectory to move down to that subdirectory.
• Click in the Directory text box, type in the full path of any directory, and press the
Return key to move to that directory (on Windows NT systems, this field is also
used to navigate to undisplayed network drives).

Loading a File
You can enter a File Mask to display only those file names that match a particular
format. The wildcard character is asterisk (*) for any number of arbitrary characters.
Anything that you type in this text box, including backspacing over its contents, is not
processed unless you press the Return key while the cursor is still in the text box.
If the file that you want to edit appears in the list, double-click on it, or click on it and
click Load; or you can type in the name of any file in the displayed directory, or the full


path and filename of any file, and press the Return key. The selected file’s contents will
appear in the editor window.

Creating a New File


You can edit an existing file or create a new file. Selecting an existing file is explained
in “Selecting a File to Edit” on page 3-3. To create a new file, drag on the File menu in



the menu bar and select Create New File. This first invokes a prompt box for you to
enter the name of the file to be created.

After you type a name and click Create, the editor is filled with a single line containing
the end-of-line sequence. You can use the Append function in the Edit menu, described
on page 3-14, to add more lines.

Editing a File
The contents of your file are displayed. The Show Special Characters toggle controls
whether nonprintable characters are represented in the file display area. These are the
codes that indicate special characters:

Displayed Code         Additional Codes That Are         Note
                       Recognized If You Type Them
\\                                                       this means a single backslash character
\ff\                   \formfeed\
\nl\                   \newline\
\nul\                  \null\                            this means the null character
\ret\                  \cr\ or \return\                  this means a carriage return character
\tab\
\376\ (for example)                                      other nonprintable characters are
                                                         displayed as their octal representation
                                                         between backslashes
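
As an illustration of this display convention (a sketch, not the editor's actual code), the following renders raw text the way the table describes, assuming ASCII input:

    # Known special characters use their named codes, printable ASCII
    # passes through, and anything else appears in octal between
    # backslashes (for example, \376\). Illustrative sketch only.
    NAMED = {
        "\\": "\\\\",
        "\f": "\\ff\\",
        "\n": "\\nl\\",
        "\0": "\\nul\\",
        "\r": "\\ret\\",
        "\t": "\\tab\\",
    }

    def show_special(text):
        out = []
        for ch in text:
            if ch in NAMED:
                out.append(NAMED[ch])
            elif 32 <= ord(ch) < 127:
                out.append(ch)
            else:
                out.append("\\%03o\\" % ord(ch))
        return "".join(out)

    print(show_special("a\tb\r\n"))   # -> a\tab\b\ret\\nl\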


With Show Special Characters turned on, the example file shown on page 3-5 would
look like this:

To select one row of the file, click on it (either on its contents or on its row number). To
select a contiguous group of rows, drag through them, or click on the first one, scroll if
necessary, and shift-click on the last one. The line number(s) selected are displayed in
the information area above the line edit field.
You can edit one row at a time by selecting it. If only one row is selected, its contents
are copied into the line Edit field. Special characters are always displayed in the edit
field, regardless of the state of the Show Special Characters toggle. To change the
contents of a line, select it, then click in the edit field and backspace and type. To apply
changes from the edit field, click in the edit field and press the Return key; to cancel
changes, make a new row selection without pressing the return key.

Note: Pressing the return key simply enters the changes you have
made; it does not automatically put the end-of-line sequence at the
end of the line. Be sure to leave the proper end-of-line characters at
the end of the line, or the line will be combined with the one that



follows it. On Windows systems, indicate end-of-line with the carriage
return-newline sequence (\ret\\nl\) . On UNIX and OpenVMS
systems, indicate end-of-line with just the newline character (\nl\).

The Go To button is used to scroll to a row number that you specify. It invokes a prompt
box for you to fill in the row number. When you click Go, the file is scrolled.

For additional editing functions, see “Edit Pull-Down Menu” on page 3-14.


Search and Replace


The Search and Replace button in the Editor window invokes the Search and Replace
window.

You can use this simply to search, or to search and replace. Type in the text that you
want to Search For and Replace With. If you want to specify any special characters, use
the codes listed in the table on page 3-6.



You can optionally type in one character to be used as a “wildcard” matching any
number of characters, and/or one character as a wildcard matching exactly one
character. If you search like this:

the search will find any instance of a, followed by any number of any characters except
newline, carriage return, or formfeed, followed by b. For a wildcard search across lines
in the file, you must explicitly specify the end-of-line sequence.
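
A wildcard pattern of this kind behaves like a restricted regular expression. The sketch below is a hypothetical translation (you choose your own wildcard characters in the window; % and ? here are only examples) that mirrors the rule that the multi-character wildcard stops at newline, carriage return, and formfeed:

    import re

    # Translate an editor-style wildcard pattern into a regular expression.
    # Both wildcards refuse to cross line boundaries, matching the rule
    # described above. Illustrative sketch, not the editor's implementation.
    def wildcard_to_regex(pattern, many="%", one="?"):
        parts = []
        for ch in pattern:
            if ch == many:
                parts.append(r"[^\n\r\f]*")
            elif ch == one:
                parts.append(r"[^\n\r\f]")
            else:
                parts.append(re.escape(ch))
        return re.compile("".join(parts))

    rx = wildcard_to_regex("a%b")
    print(bool(rx.search("axxxb")))   # True
    print(bool(rx.search("a\nb")))    # False: wildcard stops at end-of-line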
The following sample screen shows the end-of-line sequence for UNIX and OpenVMS
systems:

The following sample screen shows the end-of-line sequence for Windows systems:


The Case Sensitive toggle controls whether the search is case-sensitive. The Replace
With text can be left empty if you want to search for text and delete it.
The Search Region specifies the rows affected by the search. If you had already selected
a group of rows before you opened this window, that selection is the default Search
Region; otherwise, the default Search Region is the entire file. Searching can be
Forward or Backward from the current position.
The Search button finds and highlights the next occurrence of the specified Search For
string, without changing it. The Replace button replaces the next single occurrence. The
Replace All button displays the number of occurrences found, and allows you to
Replace them all or cancel. The Count button displays the number of occurrences,
without any replacement. The Done button closes the Search and Replace window.

Menu Bar
The Editor menu bar provides the File pull-down menu and the Edit pull-down menu.

File Pull-Down Menu


The File pull-down menu provides the following operations:
Load File
Load File invokes the File Browser, as described in “Selecting a File to Edit” on
page 3-3. The file that you select is loaded into the editor.
Create New File
Create New File asks you to specify a name for the new file, and then creates it as a
single line consisting only of the end-of-line sequence (carriage return-newline on
Windows systems; newline on UNIX and OpenVMS systems). This is described in
“Creating a New File” on page 3-5. You can use the Append function in the Edit
menu, described on page 3-14, to add more blank lines.
Append File
Append File invokes the file browser, as described in “Selecting a File to Edit” on
page 3-3. The file that you select is appended to the file currently being edited.



Save File
Save File attempts to save the file with its current name. If it is not a new file that
you just created, you will be reminded that it already exists and asked about
overwriting.

A message tells you how many characters and lines were written.

Save File As
Save File As invokes a prompt box for you to type in a file name. You can type just
the file name, or the full path and name. After you specify a name, you are warned
about overwriting, and told how many characters and lines were written.


Delete File
Delete File invokes the file browser, for you to select which file to delete.

The file browser is used to traverse your directory structure and select one file to be
deleted. It functions as described in “Selecting a File to Edit” on page 3-3. After
you select a file, you are asked to confirm.



If you click Delete, the file is permanently removed from your disk; on OpenVMS
systems, all versions of the file are removed.

Edit Pull-Down Menu


You must select one or more rows in the file before you can use the edit functions,
except Undo and Append. The Edit menu provides the following operations:
Undo
Revokes the one most recent editing change, whether applied from this menu, the
Search and Replace window, or the Edit text field.
Cut
Removes the currently selected lines from the file and saves them into an internal
buffer.
Copy
Saves the currently selected lines into an internal buffer, without removing them
from their current position.
Paste Before
Inserts the contents of the internal buffer above the currently selected line.
Insert
Puts new blank lines above the current selection; the number of lines inserted is
equal to the number of lines that were selected.
Delete
Removes the selected lines from the file, without copying them into the internal
buffer.
Append
Invokes a prompt box, asking how many lines to append. Type in a number and
click Append, and that many new blank lines are added to the end of the file.

Exiting the Editor


Click Done to end the editing session and close the Editor window. If you have made
any changes to the file and have not yet saved them, you will be asked about saving.



4 File Formatter
• Data File Contents, page 4-2
• Formatter Options, page 4-3
• Creating a New Format, page 4-3
• Copying a Format, page 4-20
• Editing a Format, page 4-25
• Deleting Formats, page 4-25
• Key Concepts, page 4-27

This chapter explains how to use the formatter to describe the format of your data files.
If you use a data extractor wizard to acquire raw data for building a dataset, you do not
need to use the formatter. To start a data extractor, select File > New > Dataset.
Most manufacturing processes have a mechanism for storing historical process data.
These “data historians” come in a multitude of forms from many different vendors, but
they all tend to have one thing in common: they can write out the data into ASCII text
flat files (columns of data with new items on each row). These files may be of a wide
variety of formats, but most commonly they are space-, tab-, or comma-separated.
Often they have one or more header lines (lines at the top of the file) that
describe the file and the columns.
Before the spreadsheet can read data, it must have information about the format of the
file so that it can read the data correctly. It is the job of the formatter to specify
information about flat files so that the spreadsheet can read the files. The formatter
stores this information in a format file, which contains simple information about the
format such as number of columns and rows, column delimiter, and name and data type
of each column. A formatted file is a data file that has been described in a format file.

The data dictionary includes a list of your formatted files and their associated format
files.
You invoke the formatter by selecting one of the options in the Tools > File Formatter
submenu.

Data File Contents


Data files contain the historical process data that you will use to build your model.

Required
As you prepare the data files, adhere to the following requirements for structure and
date/time information.

Structure
The basic assumptions about your data are:
• It is arranged in rows and columns.
• It is in an ASCII text file.
• Each “variable” (information from a particular data collection point in your
process) is stored in a different column.
Generally, successive rows contain information from successive points in time,
although this is not strictly necessary. Any given variable must be recorded in the same
column in every row. Beyond this basic requirement, the formatter and spreadsheet
allow wide flexibility in data file formats, as described below.
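
For illustration, a small comma-separated file of this kind might look as follows. The
tag names, units, and values here are invented for the example, not taken from any
particular historian:

    Date,Time,TIC101.PV,FIC205.PV,AI310.PV
    mm/dd/yy,hh:mm:ss,degF,gpm,pct
    01/05/99,08:00:00,412.7,118.3,56.2
    01/05/99,08:01:00,413.1,117.9,56.4
    01/05/99,08:02:00,412.9,118.6,56.1

The first header row holds tag names and the second holds units; each later row holds
one sample, with its date and time recorded in the first two columns.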

Date and Time Information


Many systems that store process data record the date and time with each sample. If a
data file does not include date and time information, then
• all data in a single row of the file must be sampled at the same time (or for the same
set of information),
• and, if there are time delays between the variables, then
– the sampling time-interval between rows must be uniform throughout the file,
– and you must know what that interval is.


These special requirements apply only to a data file that does not include date and time
information attached to the variables.

Optional
The data file may include header rows that contain information about the variables’ Tag
Names, Comments, and Units.
Every variable is identified by a tag name. An optional comment may also be attached
to a variable, but the comment is not always visible throughout the product. If you will
ever use the Pavilion Data Interface™ to access your data in real time, we recommend
that you use the DCS tags as either the tag names or the comments. It is often
convenient to use the DCS tags as the comments, and brief descriptions such as
“flowrate” as the tag names.
You can also define Units for each variable, but except for date and time variables, this
information is ignored by the product and may be omitted.
If the data file has any other header rows, they can be skipped easily and do not cause a
problem.

Formatter Options
Use the formatter to describe the format of a data file, or change a format that you have
already specified; you can also remove a formatted file’s name from the data dictionary,
deleting its associated format file from the disk.

The formatter displays a data file interpreted according to its current format
specification, and allows you to change the format specification if you can see from the
display that the data file is not described correctly or sufficiently. The formatter is
described in detail beginning on page 4-5.

Creating a New Format


If you select New Format, the formatter begins with no knowledge of your file structure.
It scans your data file and tries to discern its format, and displays this interpretation for
you to verify and change as necessary. When you use this option, you should examine
every setting and verify that the format is specified correctly.

Selecting a File
The file browser appears when you select New Format. It is used to select a file to be
formatted. It lists all files and subdirectories in your current directory (on OpenVMS
systems, it displays and can access only the most recent version of any file).

Changing Directories
If the file that you want to format is not located in your current directory, you can
change to another directory, and its files and subdirectories will be displayed in the list.
There are a number of methods to change the current directory:


• Drag on the Directory option menu to display the complete directory structure
above the current directory. Use this menu to move up one or more directory levels.

• After you have made one or more directory changes, click Previous to move back
to your most recent previous directory selection, or click First to move back to the
first directory you were in when you entered the file browser.
• Click on any subdirectory and click Select to move down to that subdirectory.
• Double-click on any subdirectory to move down to that subdirectory.
• Click in the Directory text box, type in the full path of any directory, and press the
Return key, to move to that directory.

Loading a File
You can enter a File Mask to display only those file names that match a particular
format. The wildcard character is asterisk (*) for any number of arbitrary characters.
Anything that you type in this text box, including backspacing over its contents, is not
processed unless you press the Return key while the cursor is still in the text box.
When the file that you want to format appears in the list, double-click on it, or click on it
and click Select. The formatter will open and load the selected file.

New Format
When you create a new format for a data file, the formatter scans and tries to interpret
the data file’s contents. The data file is then displayed according to this format. You can
change the format specification if you can see from the display that the data file is not
described correctly or sufficiently. When you click Done in any of the formatter
windows, the current format specification is saved to a format file. If you click Cancel,
the formatter is closed, and changes to the format specification are not saved.
The formatter is implemented as a series of three windows that successively refine a
format definition. From any of the three Steps, you can move to any of the other Steps,
or return to the Format Window (normally you will process each Step in order).

Step 1: Format Rows


Step 1 of the formatter displays your data file separated into rows. Based on the contents
of the rows, you may mark which rows are headers and which contain data, and fill in
information about how the data should be separated into columns. This is the
appearance of Step 1 when the formatter creates a new format file for a sample data file:


The names of the file being formatted, its format file, and the data dictionary, are
displayed. To save the format file into a different file, or to use a different data
dictionary, type its name in the text box and press the Return key.
The first item to inspect is the Column Separator. You have the following choices:
Spaces (and Tabs)
Data items are separated by any nonzero number of spaces and/or tabs.
Comma (and spaces)
Data items are separated by one comma and any number of spaces.
Just One Tab
Data items are separated by exactly one tab. Two consecutive tabs are interpreted as
a missing data point. (Spaces are ignored.)
Fixed Width
Values begin at specific character positions; there are not necessarily any characters
between values.
Special
Data items are separated by some other character not listed above, such as “#”, and
spaces are ignored.
If you choose Fixed Width, then later, when you get to Step 2, you will need to set the
column widths. If you choose Special, click in the text box to the right, and type in it the
character that is the column separator.
The system fills in the number of columns based on the current value of the Column
Separator. You should always check the number of columns because, like the column
separator, it may have been derived incorrectly; if it is wrong, click in the text box,
backspace over the old value, and type in the correct value.
If the value for a particular point is missing or unavailable, some data systems will just
leave it blank, but some data systems will replace it with a character string such as
“####”. If your data file contains any particular character string that takes the place of
missing data, you should type it in the Missing Data box.
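
To make the idea concrete, the following sketch (Python, illustrative only; it is not the
product's parser) splits one comma-separated row and flags a hypothetical "####"
missing-data string:

    MISSING_MARKER = "####"   # assumed marker; use whatever your historian writes

    def parse_row(line, separator=","):
        """Return a list of (value, status) pairs for one data row."""
        parsed = []
        for cell in (c.strip() for c in line.split(separator)):
            if cell == "" or cell == MISSING_MARKER:
                parsed.append((None, "Missing"))    # no usable value here
            else:
                try:
                    parsed.append((float(cell), "OK"))
                except ValueError:
                    parsed.append((cell, "OK"))     # leave strings as-is
        return parsed

    print(parse_row("412.7,####,56.2"))
    # [(412.7, 'OK'), (None, 'Missing'), (56.2, 'OK')]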

Row Flags
Row flags are displayed in this screen to the left of each row in the data file. Every row
may be assigned a flag, or its flag may be left blank to indicate that the row contains
data. If header rows in the data file contain Tag Names, Units, or Comments (described
on page 4-3), they should be flagged as such. Any other type of header row cannot be
processed, and should be flagged Skip.
When the formatter creates a new format file, it can only distinguish between readable
data rows (flag left blank) and nondata rows (automatically flagged Skip). If your data
file contains information such as tags, descriptions, and units, you should change the
row flags to mark it.
When you click in one Row Flag cell, or drag through the Row Flag cells for a group of
rows, the Flag option menu is enabled.


Drag on this option menu to select the appropriate flag.

If more than one row is flagged Tag Name, the values in all Tag Name rows will be
appended to form the variable’s tag name; the same is true for Units and Comments.
In the formatter you cannot change any data values, but you can set the row flag so that
rows of bad data are skipped. To move to any particular row, click Go To, and a dialog
box will ask what row number you want to move to. When you click Go, the file is
scrolled until that row number is displayed.

After you have checked all parameters in Step 1 and made any necessary corrections,
move to the top area labelled Move to Step, and click 2. Format Columns.

Step 2: Format Columns
Step 2 displays your data file separated into rows and columns, based on the Column
Separator that you specified in Step 1.

The display is divided into two areas, separated by horizontal double lines. The top area
is the column information area. In a new format file the Tag Name, Comment, and Units
shown in this area are taken from the header rows that you marked with row flags, if
any; otherwise they are left blank. If the column separations are not marked correctly,
go back to Step 1 (move the mouse to Move to Step at the top of the window, and click
1. Format Rows), make the correction, and then return to Step 2. If there is an error in
the row flags, you can correct it in either Step 1 or Step 2.
If you specified the column separator as Fixed Width, you must adjust the column
boundaries. In the column information area, move the mouse onto the vertical line
between columns; the pointer changes shape to a bidirectional arrow:

With the arrow pointer, press the mouse button and drag the column boundary to the
desired location.
The column information attached to each variable includes its tag name, comment,
units, type, and date/time reference. Even if this information was copied from flagged
header rows, you can still change it. The system fills in its interpretation of type and
date/time reference; you should always check these, and change them if necessary.
To change a variable’s tag name, comment, units, or date/time reference, click in its cell.
The name of the cell that you are editing will appear in the Selection area, its current
contents (if any) will appear in the Edit text box, and the cursor will move into the Edit
box.

Backspace and type as necessary to edit the text, and press the Return key to apply the
change. You can also make changes using the Edit menu functions, as described in “Edit
Menu” on page 4-16. If you enter new values in any cell, either by typing or by the Edit
menu functions, that cell is no longer affected by any row flags; if you want to change it
back, you must type or edit the old value back in.

Tag Name and Comment
Every process variable must be given a tag name eventually, and it may also have a
comment, as described on page 4-3. If you will ever use the Pavilion Data Interface to
access your data in real time, you should always use the actual tags as either the tag
name or the comment. If you will never access data in real time, you may specify any
names that you wish.
The formatter does not restrict the contents of tag names, but there are restrictions that
are enforced when the spreadsheet reads variables into a dataset. Tag names in a dataset
may not contain exclamation points (!), double-quotes ("), left curly braces ({), or right
curly braces (}), and may not be longer than 72 characters, and variables read into a
single dataset may not have duplicate names. If you have variable tag names that do not
conform to these restrictions, they can be corrected automatically as they are being read
into the spreadsheet (without having to return to the formatter).

Units
Units are ignored except for date and time variables; they are provided solely for your
convenience, and you may leave them blank if you wish. The system can understand
many common date/time formats, and for those you can leave the Units blank. If you
have a date/time format that the system does not automatically understand, you simply
type into the Units field a description of the format. Characters such as commas, colons,
hyphens, slashes, and so forth should appear exactly as they occur in the data.
Components of a date or time are indicated by the keys in the following tables. The keys
are not case sensitive.

Key Meaning
m Month number (1-12), no leading zeros
mm Month number (01-12), always 2 digits
mmm Three letter abbreviation for month name (Jan-Dec)
mmmm Month name fully spelled out
d Day number (1-31), no leading zeros
dd Day number (01-31), always 2 digits
y Year (1-2000), no leading zeros, as many digits as needed
yy Last two digits of year (00-99). See Note below.

yyyy Four digit year
www Three letter abbreviation for weekday name (Sun-Sat)
wwww Weekday name fully spelled out
j Day number in the year (1-366), no leading zeros
jjj Day number in the year (001-366), always 3 digits
k Week number (1-54), no leading zeros (see “Week Number,” below)
kk Week number (01-54), always 2 digits (see “Week Number,” below)
h Hour (1-12 or 1-24), no leading zeros.
hh Hour (01-12 or 01-24), always 2 digits.
m Minute (1-59), no leading zeros.
mm Minute (01-59), always 2 digits.
s Second (1-59), no leading zeros.
ss Second (01-59), always 2 digits.
t Tenths of a second (0-9), one digit only.
tt Hundredths of a second (00-99), always 2 digits.
ttt Thousandths of a second (000-999), always 3 digits.
a a.m. and p.m. indicated by a single a or p
p Alternative for “a”.
am a.m. and p.m. indicated by the letters am or pm.
pm Alternative for “am”.
a.m. a.m. and p.m. indicated by the characters “a.m.” or “p.m.”, with periods
and no spaces.
p.m. Alternative for “a.m.”.

Note: In two-digit year notation (yy), values between 00 and 49, inclusive, are
interpreted as corresponding to years 2000 through 2049. Values between 50 and 99,
inclusive, are interpreted as corresponding to years 1950 through 1999.
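
A minimal Python sketch of this rule, for illustration only:

    def expand_two_digit_year(yy):
        # 00-49 map to 2000-2049; 50-99 map to 1950-1999 (the Note's rule).
        if not 0 <= yy <= 99:
            raise ValueError("yy must be between 00 and 99")
        return 2000 + yy if yy <= 49 else 1900 + yy

    assert expand_two_digit_year(49) == 2049
    assert expand_two_digit_year(50) == 1950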

Week Number
Week 1 begins on January 1; the week number increments on each subsequent Sunday
and is not affected by changing from one month to the next. The week number can be 54
only when a leap year begins on a Saturday.
For example, in 1999, January 1 is Week 1 Day 1; Jan 7 is Week 2 Day 7; Jan 31 is
Week 6 Day 31; Feb 1 is Week 6 Day 1.
January 1999
 S  M Tu  W Th  F  S
                 1  2
 3  4  5  6  7  8  9
10 11 12 13 14 15 16
17 18 19 20 21 22 23
24 25 26 27 28 29 30
31

February 1999
 S  M Tu  W Th  F  S
    1  2  3  4  5  6
 7  8  9 10 11 12 13
14 15 16 17 18 19 20
21 22 23 24 25 26 27
28
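
The following Python sketch implements the week-number rule as stated above (week 1
begins on January 1; the number increments on each subsequent Sunday). It illustrates
the rule and is not product source code:

    import datetime

    def week_number(d):
        # Week 1 covers January 1 up to (but not including) the first
        # Sunday that falls on or after January 2; each later Sunday
        # starts a new week.
        jan2 = datetime.date(d.year, 1, 2)
        if d < jan2:
            return 1
        # weekday(): Monday=0 ... Sunday=6
        first_sunday = jan2 + datetime.timedelta(days=(6 - jan2.weekday()) % 7)
        if d < first_sunday:
            return 1
        return 2 + (d - first_sunday).days // 7

    assert week_number(datetime.date(1999, 1, 1)) == 1
    assert week_number(datetime.date(1999, 1, 7)) == 2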

Examples

File Contents           Interpretation                                Type in These Units
11/5/96                 November 5, 1996                              m/d/yy
11/5/96                 May 11, 1996                                  d/m/yy
96032                   February 1, 1996                              yyjjj
960201                  February 1, 1996                              yymmdd
233045                  11:30 p.m. and 45 seconds                     hhmmss
1.0, 1.15, 1.30, 1.45   hours and minutes: 1:00, 1:15, 1:30, 1:45     h.m
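
As an illustration of how a fixed-width Units pattern such as yymmdd or hhmmss is
interpreted, here is a Python sketch that handles only the two-digit keys from the table
above; the real formatter accepts far more patterns, so treat this as a simplified
assumption:

    import datetime

    def parse_fixed(value, pattern):
        # Consume the value two characters at a time, pairing each slice
        # with the two-letter key at the same position in the pattern.
        fields = {}
        for pos in range(0, len(pattern), 2):
            fields[pattern[pos:pos + 2]] = int(value[pos:pos + 2])
        if "hh" in fields:                       # a time pattern (hhmmss)
            return datetime.time(fields["hh"], fields["mm"], fields["ss"])
        yy = fields["yy"]                        # a date pattern (yymmdd)
        year = 2000 + yy if yy <= 49 else 1900 + yy
        return datetime.date(year, fields["mm"], fields["dd"])

    print(parse_fixed("960201", "yymmdd"))   # 1996-02-01
    print(parse_fixed("233045", "hhmmss"))   # 23:30:45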

If you encounter a date or time in a format that cannot be expressed in this notation, you
can use the editor to change the data file; or in some cases it may be better to tell the
formatter that the values are strings or numbers, and then use transforms to convert them
to date and time (see “Date/Time” on page A-34).

Type
When the formatter creates a new format file, it scans the contents of each column in the
data file and tries to interpret the data Type. You should always check these and change
them if necessary. To change a variable’s Type, click on its Type cell (or drag or
shift-click through multiple adjacent Type cells), and the Edit text box will change to an
option menu listing the available Types.

Most of the Types are self-explanatory. Double means double-precision real. DateTime
and TimeDate refer to columns with both date and time data, in the specified order, in a
single column. EurReal and EurDouble are real and double-precision numbers with
decimals marked by a comma, and thousands, millions, and so forth, optionally marked
by periods.

Date/Time
As explained in “Data File Contents” on page 4-2, data files do not always have to
include date and time information. If date and time information is provided, it can be in
two different columns or combined in one column (but if it is in two columns now, they
will be combined into a single column in the spreadsheet). Either date or time can occur
first. There can be more than one date/time point in each row of the data file; variables
in a single row of the data file do not have to be sampled all at the same time if each
variable is recorded with its sampling time.
The Date/Time reference of a date or time column is filled in as “n/a” and cannot be
changed. The Date/Time reference of a data column is the column number of its
corresponding Date and Time. (If the Date and Time are in two different columns, the
syntax is Date column number, slash, Time column number.) The system will fill in
default values which are the closest Date and Time columns to the left of the data
column. If the system was unable to recognize Date or Time columns correctly, you
must fill in the correct values.

Edit Menu
The Edit menu is available in Steps 2 and 3. Edit functions can be applied to all header
rows except Types (but not to the data rows). You must select a cell in one of these rows
before you invoke any Edit function (except Undo). Only one cell can be edited at a
time.
Undo (Ctrl-u)
Reverses the most recent editing change.
Cut (Ctrl-x)
Copies the contents of the selected cell to the edit buffer, and then erases the cell.
Copy (Ctrl-y)
Copies the contents of the selected cell to the edit buffer.
Paste (Ctrl-p)
Copies the contents of the edit buffer into the selected cell. If it already contained a
value, that value is lost (not appended).
Clear (Ctrl-b)
Erases the contents of the selected cell, without saving to the edit buffer.


Insert (Ctrl-i)
Inserts a new, empty cell to the left of the selected cell, moving the rest of the row
one cell to the right, and losing the contents of the rightmost cell.
Delete (Ctrl-d)
Copies the contents of the selected cell to the edit buffer, deletes this cell, and
moves the rest of the row one cell to the left, leaving the rightmost cell blank.

Note: Any changes that you specify using this Edit menu are applied
to the format file, not to your data file. The formatter never changes
the contents of your data files.

Completing Step 2
After you have checked all parameters in Step 2 and made any necessary corrections,
move to the top area labelled Move to Step, and click 3. Verify Format.

Step 3: Verify Format
Step 3 of the formatter displays your data file as it will be read by the spreadsheet, so
you can verify that you have set all values correctly.

You cannot change a variable’s type in this step, but you can change any of the other
information that you set in Step 2; you can also change each variable’s display format.
If values in a date/time column are displayed as Error or as FmtErr, it means that the
units for that column do not match the column’s contents (or you left the Units blank,
and the file’s actual units are not one of the common formats that the system can
understand by default). Return to Step 2 so you can view the column’s contents, and
type in the correct Units; see “Units” on page 4-12. If you cannot specify the units to
clear these errors, it may be necessary to read the file into the editor and adjust its
contents; see Chapter 3, File Editor.

Display Format
Display format is provided solely for your convenience. A variable’s display format
does not affect how it is read, interpreted, or stored internally; the display format simply
specifies how you want the data to appear on the screen. Display format does not apply
to Integer or String variables.
The display format for a Real or Double variable is an integer between 0 and 20 that
tells how many decimal places to display. The default display for these types is derived
from the number of decimals that occur in that column in the first few data rows in the
data file.
The display format for a Date variable has the same syntax as Date Units, as described
in “Units” on page 4-12, except that month and weekday name specifications are case
sensitive: the case of the first letter is copied, and the case of the second letter is used for
all subsequent letters. For example:

This Unit    Displays in This Style
mmm          jan
Mmmm         January
MMM          JAN
www          mon
Wwww         Monday
WWW          MON
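
A small Python sketch of this case rule, illustrative only:

    def apply_name_case(key, name):
        # The first letter of the key sets the case of the name's first
        # letter; the second letter of the key sets the case of the rest.
        first = name[0].upper() if key[0].isupper() else name[0].lower()
        rest = name[1:].upper() if key[1].isupper() else name[1:].lower()
        return first + rest

    assert apply_name_case("mmm", "Jan") == "jan"
    assert apply_name_case("Mmmm", "January") == "January"
    assert apply_name_case("MMM", "Jan") == "JAN"
    assert apply_name_case("Wwww", "Monday") == "Monday"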

If the Date’s Units are left blank, its default Display format is mm/dd/yy; if the Date’s
Units are filled in, its default Display format is the same as its Units.
The display format for a Time variable has the same syntax as Time Units, as described
in “Units” on page 4-12, except that times with am or pm indicators copy the case of the
am or pm specifier. If the Time’s units are left blank, its default display format is
hh:mm:ss; if the Time’s units are filled in, its default display format is the same as its
units.

Completing Step 3
When you have finished verifying that the file will be interpreted correctly, click Done
to save the file specification in a format file, record it in the data dictionary, and close
the formatter.

Copying a Format
The Copy Format option is for situations in which multiple data files have identical or
very similar formats. This option makes a copy of a format that you have already
specified. You can then edit this copy if necessary.


The format is copied from one file that has already been formatted, to one or more files
that are not already formatted. Toggle buttons are used to select whether the files have
Identical or Similar formats.

Identical Format copies all values in the format file.

Similar Format Copies These Items           Similar Format Does Not Copy These Items
Row Flag (step 1 or 2)                      Number of Columns (step 1)
Missing Data characters (step 1)            Changes to header rows by typing or by Edit menu (step 2 or 3)
Column Separator (step 1)                   Changes to variable types (step 2)
Changes to column widths (step 2 or 3)      Changes to Display format (step 3)

Selecting a Data File that Has Been Formatted
The format is copied from one file that has already been formatted. You can either type
its name and full directory path, or you can click Select Format to invoke the Select File
Format window.

This window lists all of the formatted files recorded in your current data dictionary,
sorted by directory, with number of rows and columns, and start date if known. The
name of your data dictionary is displayed; to change to a different data dictionary, type
its name in the text box. To select a formatted file from this list, double-click on it, or
click on it and click Select. Its name and path will be inserted in the Copy Format
window.
When making a selection from a list, if you notice that any listed file is obsolete, you
can click on it and then click Delete. The format file is deleted from your disk and the
data file’s name is removed from the data dictionary, but the data file itself remains on
your disk. If you remove a format file from this list, the window remains open so you
can continue the original purpose of selecting a formatted file.


Selecting a Data File that Has Not Been Formatted


The format is copied to one or more data files. The files are selected by clicking Select
Files, which brings up the file browser.

This window is similar to the file browser described in “Selecting a File” on page 4-4,
but it has the ability to select multiple files. Click on any file in the Files list and click
the right arrow button, or just double-click on the file, and its name will be copied into
the Selection list. To remove a file name from the Selection list, click on it and click the
left arrow button, or just double-click on it. When the Selection list contains one or
more files that you want to copy the format onto, click Select, to close this window and
return to the Copy Format window.

Copying the Format
After you have filled in the names of files to copy the format from and to, and specified
Identical or Similar formats, you can click Edit and Apply to view all the files in the
formatter and make any necessary adjustments, or you can click Apply to make exact
copies of the format file without viewing the results.

When you select Edit and Apply to copy formats to multiple files, the formatter comes
up containing the first file in the list; after you click Cancel or Done, the formatter
comes up again containing the next file in the list. In this situation, the formatter is the
same as described in “New Format” on page 4-5, except that it includes two additional
control buttons, Cancel All and Done All. The regular Cancel and Done buttons pertain
only to the format file currently being edited; the Cancel All and Done All buttons will
cancel or finish all files remaining.

Editing a Format
Use the Edit Format option to review and change any format that you have already
specified. Selecting the Edit Format option displays the Select File Format window.
This window lists all of the files that you have already formatted. It operates as
described in “Selecting a Data File that Has Been Formatted” on page 4-22. After you
select a file, the formatter is opened, containing the selected file and its current format
specification. You can cancel, or make changes and save them with the Done button.

Note: If a format has been saved with non-blank values in Tag Name,
Units, or Comment, and you edit the format, the values that you
already saved can be changed only by editing the individual header
cells; changing the Row Flags at this point has no effect.

Deleting Formats
If you no longer need to reference some of your formatted files, you can delete their
formats. The Delete Formats option invokes the Delete Formats window, for you to
select files. The selected files’ names are removed from the data dictionary, and their
format files are removed from the disk. Your data files themselves are not removed or
otherwise affected.

The Delete Formats window is used to delete format files that the formatter generated
to describe your data files, and to remove their entries from the data dictionary. It does
not delete your data files.
The name of the current data dictionary is displayed. If you want to use a different data
dictionary file, click in this box, type in its name, and press the Return key.
The Files area lists all formatted files that are recorded in the data dictionary, sorted by
directory, with their corresponding format files. The Selection area lists the files that
you select. There are several methods to move files from one list to the other:
• click on a file and click the arrow button
• double-click on a file
• drag on a group of files and click the arrow button
• click the all=> or <=all button


When the Selection list contains all the formats that you want to delete, click Delete. A
question box will ask you to confirm, and if you do, the formats are deleted and this
window is closed.

Key Concepts
• In Step 1, check & correct the column separators; mark the header (Tag Name,
Units, Comment) rows.
• In Step 2, check & correct the variable types and date/time pointers; optionally
override header values.
• In Step 3, check the interpretation of dates and times; if misinterpreted or Error, go
back to Step 2 and type in the Units (see the table on page 4-12 for syntax). In
extreme cases, use the editor to change the data file, or call them strings and parse
them using transforms.
• After you save a format file that assigns a name to a variable, that name takes
precedence even if you Edit Format and change the Row Flags. The only way that
Edit Format can change saved names is by manually editing the header cells in
Step 2.

5 Spreadsheet

About Datasets, page 5-1
Preprocessing Data, page 5-3
• Creating a Dataset, page 5-4
• Loading a Dataset, page 5-10
• Spreadsheet Window, page 5-12
• Variable Operations, page 5-32
• Menu Bar, page 5-46
• Exiting the Spreadsheet, page 5-60

This chapter explains how to use the spreadsheet to read data from formatted files (data
files that have been described to the formatter, as recorded in the data dictionary), and to
manipulate the data in the spreadsheet.
Invoke the spreadsheet by selecting Tools > Data Spreadsheet.

About Datasets
Use the spreadsheet to gather variables from formatted files into an internal data
structure called a dataset. When you save a dataset, its name is stored in the data
dictionary. A Pavilion model can only read data that is in a dataset.
A dataset consists of the original (raw) data values (obtained from the process history
through formatted files), and a list of functions or transforms that have been applied to
the data, producing a set of transformed data values. The transformed variables can
include variables that are unchanged from their raw values, variables whose raw values
have been modified by the transforms, and newly created variables generated by
transforms. After you have applied any transforms to the dataset, you can still view and
even modify the original data as it was before the transforms were applied, but only the
transformed values are used to build or run a model.
The terms column and variable are used interchangeably. A raw variable, in most
cases, is a process variable that was read into the dataset from a formatted file; there are
some other types of variables that are treated as raw variables, which will be discussed
later. If you apply a transform to a raw variable, it is still considered to be a raw
variable, but it has both raw and transformed values. A computed variable is a variable
that was created by applying a transform function to any variable in the dataset; the
computed variable is said to depend on all other variables from which it was
transformed. An independent variable is a variable created by applying a transform
that generates new values without reference to any variable that already existed in the
dataset; examples would be generating constants, row numbers, random numbers
(noise), or date/time values.
Every variable in a dataset must be given a tag name, and may also have a comment. If
you will ever use the Pavilion Data Interface (PDI) to access your data in real time, we
recommend that you use the DCS tags as either the tag name or the comment. If you
will never access data in real time, you may specify any names that you wish. Computed
variables may be given any valid name that you wish.

Transforms
All transforms on the dataset are kept in one ordered list. This transform list includes
functions that you apply directly from the Transform window (see Chapter 7, Transform
Calculator), and transforms that are automatically generated by a number of user actions
in the spreadsheet and plot windows. Whenever you load a dataset, or perform any
action that requires the transforms to be recalculated, a message displays each transform
as it is being applied. Transforms can be modified or deleted. Any action that generates
a transform can be undone by deleting the transform.

Status
Every data cell in a column has both a value and a status. If the status is OK, you see
only the value; if the status is not OK, the value may be undefined. Categories of bad
status include Cut, Blank, Break, Missing, and Error. The spreadsheet allows you to
change statuses as well as values.


Date/Time Reference
If date and time were in two separate columns in the formatter, they are always
combined into a single date/time variable in the dataset.
Every numeric or string variable is optionally associated with a date/time variable; this
is referred to as its date/time pointer or date/time reference. You can change the date/
time pointer of a raw or independent column, but a computed column inherits its date/
time pointer from the variable(s) on which it depends (thus you cannot build a transform
using two variables with different date/time pointers).

Preprocessing Data
In the preprocessing phase, your dataset is displayed in a spreadsheet format or as a plot;
the default view is spreadsheet. You can have more than one copy of each view open at
the same time, up to your system limits on the maximum number of windows allowed,
but they all contain the same dataset.

CAUTION: If your dataset will be used in a Pavilion model, you
should examine the data and remove or correct any bad values,
examine every raw variable’s analysis variable range (AVR), and
usually time merge the dataset. There is no substitute for looking at
every variable; the quality of the data will make or break the quality of
the model.

You can remove bad data by changing its status or value as described in “Changing
Spreadsheet Contents” on page 5-15; by using plot cuts or clips, described in “Tools” on
page 6-28; or with transform functions described in Appendix A, Transform Reference.
For more information, see “Before Transforms Properties: Analysis Variable Range” on
page 5-45 and “Time Merge” on page 5-35.

Creating a Dataset
You can create a dataset either from existing formatted files or from data read directly
from the DCS or historian using a data extractor. To create a dataset, select File > New >
Dataset from the menu bar. An icon window containing your options appears.

The icon window shows any available data extractors in addition to the Formatted
ASCII Files option.
The data extractors create a dataset by reading data directly from the DCS or historian,
thus allowing you to skip the formatting phase and proceed straight to the preprocessing
phase. The Formatted ASCII Files option allows you to select formatted files for
creating the dataset. If you select Formatted ASCII Files, the Select Files window
appears.

This dialog lists all formatted files recorded in the data dictionary, sorted by directory.
The complete path name of the current data dictionary is displayed; if you want to use a
different data dictionary, you can click in the box and type its name; when you press the
Return key, the formatted files recorded in that data dictionary will be listed.
This window is used to select the formatted files from which you will read variables.
The Selection area lists the files that you select. There are several methods to move files
from one list to the other:
• click on a file and click the arrow button
• double-click on a file
• drag on a group of files and click the arrow button
• click the all=> or <=all button

When the Selection list contains all the files from which you want to read variables,
click OK. This window is closed and the Select Variables window is invoked.


This window displays all variables in all of the formatted files that you selected.
Variables are grouped under the name of their file. The Show Directory toggle button
controls whether you see the full path name of the formatted file.

Each variable is listed by its column number in the data file, tag name, comment if any,
and type. If any variable was not given a tag name in the formatter, it is marked “(no
name)”.

Any variable that was not given a tag name in the formatter is given a default name
when you select it, as shown in the example below.

To change the tag name of any variable in the Selection list, click on it, then click in the
New Name box, type the name, and press the Return key. Name changes do not affect
your data files or their associated format files; they only set the name that a variable will
have when it is read into the current dataset.


When you click OK, the Selection list is checked for duplicate or bad variable names. If
duplicate names are found, it displays the first one, and asks whether to correct all the
names automatically or cancel the read:

Variable names may not include an exclamation point (!), a double quote ("), a left brace
({), or a right brace (}). As with duplicate names, if any bad names are found, it asks
whether to correct all the names automatically or cancel the read:

It does the same for names longer than 72 characters:

If you select Automatically, long names are truncated, the characters _2, _3, _4, and so
forth, are appended to duplicates, exclamation points and other illegal characters are
replaced with underscores (_), and reading continues. If you do not like the default
resolution names, you can change the names later.
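
The following Python sketch mirrors these automatic corrections (illegal characters
replaced with underscores, names truncated to 72 characters, duplicates suffixed with
_2, _3, and so forth). It illustrates the stated rules; the product applies its own
equivalent logic:

    MAX_LEN = 72
    ILLEGAL = set('!"{}')

    def fix_names(names):
        fixed, seen = [], {}
        for name in names:
            name = "".join("_" if c in ILLEGAL else c for c in name)[:MAX_LEN]
            if name in seen:
                seen[name] += 1
                name = f"{name}_{seen[name]}"    # flow, flow_2, flow_3, ...
            else:
                seen[name] = 1
            fixed.append(name)
        return fixed

    print(fix_names(["flow", "flow", 'temp"high']))
    # ['flow', 'flow_2', 'temp_high']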

If a duplicate name occurs because you have two or more files containing data for the
same variables at different times, for example, January’s data in one file and February’s
data in another file, you should only use the first file (January) to create the dataset, and
then use Add New Rows to append the next file (February). For more information, see
“Add New Rows (Before Transforms)” on page 5-48.
After any duplicate or bad variable names are resolved, the selected variables are read
into a new dataset and displayed in the spreadsheet. While the data is being read, a
message is displayed that periodically tells the number of rows that have been read.
After the raw data is read, the spreadsheet window is opened; see “Spreadsheet
Window” on page 5-12.

Loading a Dataset
When you click Load a Dataset, the Select Dataset dialog is invoked.


This window is used to select an existing dataset to be loaded. It displays the name of
your current data dictionary, and a scrolled list of all datasets recorded in the data
dictionary, sorted by directory. If you want to use a different data dictionary, click in the
box, type in its name, and press the Return key, and datasets in the new data dictionary
will be listed.
When loading a dataset, if you notice on the list any dataset name that is obsolete, you
can click on it and then click Delete. You will be asked whether to Delete the dataset,
Just Remove its name from the data dictionary, or Cancel. Regardless of whether you
confirm or cancel, the Select Dataset window remains open so you can continue the
original purpose of selecting a dataset to be loaded.
You can load any dataset, regardless of whether it is displayed in the data dictionary
listing:
• If the dataset is listed in this window, double-click on it, or click on it and then click
Select; or
• Type the full path name in the Dataset Name box and click Select; or
• Click Browse to invoke the common File Browser, described in “Selecting a File to
Edit” on page 3-3. The Browser displays datasets in the directory that you most
recently accessed, or in the directory from which you started running the program.
A File Mask will automatically be applied so that only dataset names (and directory
names) are displayed. Click on a dataset name and click Select in the Browser, to
enter the name in the Dataset Name box in this window, then click Select in this
window.
After any of these selection methods, the dataset is loaded. If you load a dataset not
recorded in your data dictionary, you will be asked whether to add it, and what comment
to attach to it. While the raw data is being loaded, a message is displayed which
periodically tells the portion of the dataset that has been read. If you click Cancel, the
dataset loading process will be cancelled (you cannot load just a portion of a dataset).

(Depending on how the dataset was saved, it may read rows rather than columns.) After
all of the raw data values have been read, the dataset’s transform list is applied. A
message displays each transform as it is being applied.

If you click Stop, the dataset is cleared from memory; you cannot load a dataset without
applying its transform list. After the raw data is read and all transforms are applied, the
Spreadsheet window is opened.
You can also use the Select Dataset window to delete a dataset or remove it from the
data dictionary. To do so, select the dataset name and then click Delete. A dialog
appears, prompting you to select one of the following options:
Delete
Delete the dataset files as well as the dataset’s entry in the data dictionary,
Just Remove from DD
Delete the dataset’s entry in the data dictionary without deleting the dataset files, or
Cancel
Cancel the operation without deleting the dataset files or changing the data
dictionary.

Spreadsheet Window
The Spreadsheet window displays a dataset in the format of a spreadsheet. Each column
contains a single variable (if any date and time information was in separate columns in
the formatter, it is automatically combined into a single date/time column). Header
information about a variable is displayed at the top of a column, above the data values.


Row and column numbers in the dataset do not necessarily correspond to row and
column numbers in the raw data file(s) as displayed in the formatter.

If a data point is Blank, Missing, or in Error, it is marked as such. If data appears as a
row of asterisks (*), it only means that the default width of the spreadsheet column is
not wide enough to display the data. To change the width of one column, move the
mouse to the information area at the top of the column, on the line drawn at the right of
the column, until the pointer changes shape to a bidirectional arrow, and drag to
the right until the asterisks are replaced with numbers. To change the width of multiple
contiguous columns, select them all (see “Selecting a Region” on page 5-14) and
change the width of any one of them.

As you move the mouse around the spreadsheet, the variable name and the row number
of the cell under the mouse are displayed in the positional help area at the bottom of the
window.

The Show Dataset menu allows you to view and manipulate the dataset either in its
normal state after the transform list has been applied (after-transforms), or in the
before-transforms state, showing the raw variables before any transforms are applied.
In the before-transforms state, the background color changes, the menu bar contents
change, and several of the operations buttons become inactive.

Header Information
The information area at the top of each column displays the variable’s Tag Name
(described on page 4-3), Comment (described on page 4-3), Units (described on page
4-12), Display format, and Time Column number. For raw variables, this information is
inherited from the format file. You can change this information as described in
“Changing Spreadsheet Contents” on page 5-15 or “Variable Properties” on page 5-43.
Display format for date/time, real, and double-precision variables is as described on
page 4-19. Display format for integers is “(Int)”, and for string variables is “(Str)”; these
cannot be changed.
Time Col is the column number of the variable’s date/time pointer. For a date/time
variable, this value is “n/a”. If a variable has no associated date/time information, this
value is “none”. If you rearrange the order of columns such that a date/time column is in
a different position, all Time Col values are updated automatically.

Selecting a Region
• To select a single cell, click on it.


• To select a rectangular group of cells, click in one corner of the rectangle and drag
to the opposite corner; or click in one corner, scroll if necessary, and shift-click in
the opposite corner.
• To select an entire row, click on its row number.
• To select a group of rows, click on the first row number and drag to the last row
number; or click on the first row number, scroll if necessary, and shift-click on the
last row number.
• To select an entire column, click on its column number.
• To select a group of columns, click on the first column number and drag to the last
column number; or click on the first column number, scroll if necessary, and
shift-click on the last column number.
• To undo a selection, click in the white space to the right of or below the
spreadsheet, for example, on the word “Row” at the far left side of the header section.
The selected area is highlighted. The arrow keys on your keyboard will change your
selection in the arrow’s direction. If the selected area is not visible on the screen, and
you use an arrow key to make a new selection, the dataset will be scrolled such that the
new selection is visible. If a row is selected, the left and right arrow keys will scroll to
the leftmost or rightmost column of the dataset; if a column is selected, the up and down
arrow keys will scroll to the top or bottom of the column. This can be used as a shortcut:
to find the last row in a column that is shorter than the other columns, select the entire
column and press the down arrow.

Changing Spreadsheet Contents


To change the contents of any header information or data cell, click on it, type a new
value, and press the Return key. When you click on the cell, its name is displayed in the
Selection area; its old value, and the new value that you type, appear in the Edit area.

To apply a change, press the Return key after typing the new value; to cancel a change
without applying it, click in another cell or region, without pressing the Return key.
Header contents can also be changed using the Variable Properties window, as described
on page 5-43. Note, however, that the date/time pointer of a computed column is
inherited from the variable that it was transformed from, and cannot be changed
directly.
To change a data cell’s status, type in one of the words Cut, Uncut, Blank,
Missing, Error, Break, or OK. However, this method does not work for string
variables, which consider the word that you type to be a new value; for string variables,
use the $changestat transform described on page A-30. See “Transform Finder” on
page A-33 for information on additional transform functions to change values and
statuses.

Important: When you make a change to a value, status, or date/time
pointer, there is a crucial difference in how changes are applied in the
before- and after-transforms states of the dataset:
• In the normal (after-transforms) state, these changes automatically
generate transforms, which are recorded in the transform list; you
have a record of them, and you can modify or undo them at any time.
• In the before-transforms state, changes are treated as though you
had edited the original data file before you brought it into the
spreadsheet; there is no record of the changes, and you cannot undo
them.

CAUTION: You cannot undo changes made to the before-transforms state of the
dataset. If there is any chance that you may
want to undo a dataset change in the future, make the change to the
after-transforms state of the dataset.


Displaying Variable Statistics


The Statistics button invokes the Statistics Report window.

You can display statistics for either the after-transforms (normal) or the
before-transforms state.

You should always look at the min and max values of each variable to make sure they
are reasonable values, and investigate any values that are obviously invalid. In many
cases it can be useful to view mean and standard deviation also, to be sure that the data
is representative of a known state of your process. Additional measures are provided for
more unusual situations; for information on using and interpreting them, see any
standard textbook on statistical analysis.
Each variable’s name and total number of points are always displayed; other statistics
are shown only when requested. To calculate the value in a cell or region, select it (click
on it or drag through it), or click Calculate All to calculate values for all displayed
statistics types. To display any statistics type, turn on its toggle button.
String variables have no additional statistics, and date/time variables have only min and
max (use the Time Statistics function for more information on date/time variables). For
numeric variables, the number of decimals displayed for min and max is the same as the
variable’s display format; the number of decimals displayed for all other statistics is that
number plus two. These statistics can be selected for any numeric variable x with n valid
points:

Mean:
    \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i

Standard Deviation:
    s = \sqrt{ \frac{1}{n-1} \sum_{i=1}^{n} ( x_i - \bar{x} )^2 }

Range:
    R = \max - \min

Standard Deviation as a Percentage of the Range:
    s / R

Skew:
    \frac{1}{n} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^3

Kurtosis:
    \left( \frac{1}{n} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^4 \right) - 3

Median (the “middle” value):
    \xi such that \int_{-\infty}^{\xi} f(x) \, dx = 0.5, where f(x) is the
    distribution function of the variable x.

1st Quartile (middle value of the lower half of the distribution):
    q_1 such that \int_{-\infty}^{q_1} f(x) \, dx = 0.25, where f(x) is the
    distribution function of the variable x.

3rd Quartile (middle value of the upper half of the distribution):
    q_3 such that \int_{-\infty}^{q_3} f(x) \, dx = 0.75, where f(x) is the
    distribution function of the variable x.

Trimmed Mean:
    Mean of all values inclusively between the 1st and 3rd quartiles.

Trimmed Standard Deviation:
    Standard deviation of all values inclusively between the 1st and 3rd quartiles.
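
For readers who want to reproduce these measures outside the product, the following
NumPy sketch computes them for one numeric variable. It is illustrative only; in
particular, the quartile interpolation used by np.percentile may differ slightly from
the product's:

    import numpy as np

    def variable_statistics(x):
        x = np.asarray(x, dtype=float)
        mean = x.mean()
        s = x.std(ddof=1)                      # divisor n-1, as in the formula
        q1, median, q3 = np.percentile(x, [25, 50, 75])
        z = (x - mean) / s                     # standardized values
        trimmed = x[(x >= q1) & (x <= q3)]     # inclusive, per the definitions
        return {
            "mean": mean,
            "std": s,
            "range": x.max() - x.min(),
            "std_over_range": s / (x.max() - x.min()),
            "skew": np.mean(z ** 3),
            "kurtosis": np.mean(z ** 4) - 3,
            "median": median,
            "q1": q1,
            "q3": q3,
            "trimmed_mean": trimmed.mean(),
            "trimmed_std": trimmed.std(ddof=1),
        }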


You can save the displayed statistics to an ASCII text report file by clicking Save
Report. This invokes a prompt box for you to enter the file name. A default name is
provided.

The Print button is used to print only the statistics that are visible; be sure to calculate
the ones you want before you print. The Print button invokes the Statistics Print setup
window, which sends a PostScript® specification to a file or directly to a printer.

Select the orientation, paper size, number of copies, and destination. Consult your
system administrator for a list of valid printer names on your system. If you write to a
file, you can specify it by file name only, or by its full directory path. You can optionally
print system labels, showing the dataset name and the current date and time; you can
also print labels on the top and/or bottom of each page. Fonts and sizes can be specified
independently for the table contents and the labels.

Time Statistics
The Time Statistics button on the spreadsheet invokes the Time Statistics window.

This window displays information about the date/time variables in your dataset. You
can select either the after-transforms (normal) or before-transforms state of the dataset.


The information includes the location of invalid values, and the numbers, location, and
size of increasing, constant, and decreasing intervals. The use of time statistics
information is discussed in more detail in “Time Merge” on page 5-35.
To write the time statistics information to an ASCII text report file, click Save Report.
This invokes a prompt box for you to enter the file name. A default name is provided.

Viewing Data by Row or Column


The Go To button is used to scroll the spreadsheet to display a particular row or column.
It invokes the Go To dialog.

You can specify a row number (within the range of the dataset), or a column tag name,
comment, or number. The tag name or comment is case-sensitive, and must be typed in
exactly as it occurs in the variable. When you click Go, this dialog is closed, the
spreadsheet is scrolled to display your selection, and the selection is highlighted.

Viewing Data by Value or Status
The Search button is used to find a specified data value, combination of values, or
status. It invokes the Search Dataset dialog box, which is used to specify the type of
search, and its parameters, scope, and direction.

Search Types and Parameters


The search types are Search for Value, Date, String, Status, or Row. Search parameters
vary by search type.


Search for Value


In a Search for Value, the parameters are a comparison type, Value, and Within.

The Value parameter is the number to which your data is compared. The Within
parameter is a small tolerance value determining the exactness of a numeric
comparison. This tolerance is applied to Equal, Not Equal, Less Than Or Equal, and
Greater Than Or Equal searches.

For example, Find Value Equal To Value 506.0 Within 0.01 would find any number
between 505.99 and 506.01. If you leave Within blank, it defaults to .5 times the
precision of the Value that you enter; for example, if you specify Value 3.200, Within
defaults to 0.0005. Within is used because of round-off errors between the internal
precision and your specified display format, and because of the round-off errors
inherent in computer storage of real numbers.

Search for Date
In a Search for Date, the parameters are a comparison type, Date, and Time.

An Increasing, Constant, or Decreasing search compares each date/time variable’s value
in the current row with its value in the previous row to find the specified type of
interval.

All other comparison types require you to specify a Date, or Date and Time, to which
the dataset values are compared.


Search for String


In a Search for String, the parameters are a comparison type, a Value, and a toggle
button directing whether the search is case-sensitive.

The tolerances for the search are the following:

Do not put quotes around the value being searched for, or the quotes will be considered
as part of the search value.

Search for Status
Search for Status lets you select whether to search for points with a status of Error,
Blank, Missing, Cut, Break (see “Time Merge Parameters” on page 5-40), and/or Zero
Certainty (see “Certainty” on page 5-42).

Search for Row


In a Search for Row, the one parameter is a Condition.

This type of search is useful for finding rows in the dataset when a combination of
values from several variables indicates a specific state in the process.
A condition is an expression built by combining variable names, constants, parentheses,
relational operators (=, <>, <, <=, >, >=), logical operators (and, or, not), and transform
functions. A condition is built using the same syntax as a transform, as described in
detail in “Syntax” on page 7-7 and Appendix A, Transform Reference. The transform
functions that are allowed are those that operate on a single data point at a time; for
example, $sin(x) is allowed, but $average(x) is not. As in the transform
calculator (Chapter 7, Transform Calculator), variable names that contain characters
other than alphanumerics, or that consist solely of numerals, must be surrounded with
exclamation points; exclamation points are optional for any other variable names;
functions and operators must be preceded by a dollar sign if they have the same name as
a variable; and dollar signs are optional for any other functions or operators. For
example, if a dataset contains variables called pressure, temperature, and flow, then you
could specify a Find Row search with the Condition
((!pressure!>1232.) $and (!temperature!>194.5) $and (!flow!<508.))
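
As a further sketch (these tag names are hypothetical, and any single-point function
from Appendix A could take the place of $sin), a Condition can also combine a
transform function with the logical operators:

(($sin(!angle!)>0.5) $or (!feed-rate!<=25.))

Here feed-rate must be surrounded with exclamation points because it contains a
hyphen; angle could be written with or without them.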

Scope and Order of Search


You can restrict a search to consider only certain row and/or column numbers. If you
had selected a region (larger than a single cell) in the spreadsheet before you opened the
Search window, that region is the default search area; otherwise the default search area
is the entire dataset.
Searching always “wraps around”, that is, if a search reaches the end of the dataset
without finding a match, it goes back to the beginning of the dataset and continues
searching until a match is found (or until the entire dataset has been searched
unsuccessfully).
Toggle buttons control whether searching is done forward or backward from the current
position, and whether columns or rows are searched first. The defaults are Forward and
By Column (by column is faster than by row).



Invoking the Search
You can click Count to display a message telling how many matches there are, or click
Search to find the first occurrence. After a successful search, this window is
closed and the matching cell or row is selected and highlighted.

Repeating a Search
The Search Again button is used to repeat a search without opening the Search Dataset
dialog again. If the most recent search was a Count, Search Again looks for the first match.


Printing the Spreadsheet


The Print button is used to print all or a portion of the spreadsheet. It invokes the
Spreadsheet Print setup window, which sends a PostScript specification to a file or
directly to a printer.

Select the orientation, paper size, number of copies, and destination. Consult your
system administrator for a list of valid printer names on your system. If you write to a
file, you can specify it by file name only, or by its full directory path. Select the row and
column numbers to be printed. You can optionally print System Labels, showing the
dataset name and the current date and time; you can also print labels on the top and/or
bottom of each page. Fonts and sizes can be specified independently for the table
contents and the labels.



Variable Operations
The spreadsheet and the plot share a common set of Variable Operations buttons. The
Variable Operations are Duplicate, Copy Values, Delete (described on page 5-34), Time
Merge (described on page 5-35), Transform (Chapter 7, Transform Calculator), and
Properties (described on page 5-43).

Copying Variables
There are two methods by which you can copy a variable: Duplicate and Copy Values.
Duplicate copies the history of all transforms that had been applied to the variable, and,
for a raw variable, copies the before-transforms values into a new raw variable. Copy
Values copies the variable’s current transformed values and statuses into a new raw
variable, with no record of the history by which the current values were generated or
calculated.
For example, suppose the dataset’s transform list contains a transform applied to the
variable flow1, and you duplicate flow1: the new transform list repeats that transform
for the duplicated variable, as sketched below.
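
As an illustration only (the tag name flow1_dup and the $sin transform here are
hypothetical, not taken from an actual dataset), the original transform list might be:

!flow1!=$sin(!flow1!)

and after duplicating flow1 as flow1_dup, the new transform list would be:

!flow1!=$sin(!flow1!)
!flow1_dup!=$sin(!flow1_dup!)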

A duplicated variable inherits all properties of the original; copied values do not inherit
the comment or date/time pointer. For more information, see “Variable Properties” on
page 5-43.

Note: There are some special-purpose transform functions that
generate more than one output variable from a single transform (these
are all documented in Appendix A, Transform Reference). Any variable
that was created as the output from one of these transforms cannot be
Duplicated; the Duplicate operation will fail when it tries to apply the
transform list.


Identical selection windows are invoked when you click Duplicate or Copy Values. The
following window is for the Duplicate operation.

There is a scrolled list of all variables in the after-transforms state of the dataset (even if
you are currently viewing the before-transforms state). When you select a variable it is
highlighted, and a default tag name for the new variable appears in the New Name box.
If you do not want to use the default new name you can type in a different one. If you
had already selected a variable in the spreadsheet before you invoked this dialog box,
that variable is the default selection, but you can change it.
After you click Duplicate or Copy, the transform list is applied to the dataset.
Meanwhile, a working message displays each transform as it is applied.



The Stop button that was illustrated in the Working dialog described on page 5-12 does
not appear during this operation. To cancel a Duplicate or Copy Values, you must wait
until the operation is complete, then Delete the new variable.

Deleting Variables
The Delete Variables window is invoked when you click Delete in the spreadsheet or in
the plot window.

After you select the variables to be deleted and click Delete, you are asked to confirm.
After you confirm, the variables and any transforms applied to them are deleted from
the dataset.
A variable cannot be deleted if:
• it is a date/time variable that is the date/time reference for any other variable (see
“Header Information” on page 5-14), either in the after-transforms or the before-
transforms state of the dataset;


• it is used as input to a transform with output to a different variable;


• it was generated by a transform with multiple output columns.
In the last case, you can delete the variable by deleting or modifying the transform that
created it.
Occasionally, deleting a computed variable fails even if there are no other variables that
depend on it; if there are no dependencies, you can delete the variable by deleting the
transform that created it.

Time Merge
The time merge function transforms the dataset so that in terms of time stamps and
sampling intervals, the entire dataset is uniform and consistent: each row has a single
time stamp, and rows are separated by a single, consistent time interval. The time merge
function provides several options for the interpolation or extrapolation method used.
A time merge is implemented using the $TimeMerge transform (see “System-
Generated Transforms” on page A-30).

When Time Merging Is Required


Input data files are subject to very few restrictions on date and time format and
information; see “Date and Time Information” on page 4-2 for more details. However,
before a dataset can be used by a model, it must be made “row synchronous” (that is, all
data in a single row of the file must be associated with the same sampling time), and, if
the process includes time delays, the time interval between rows must be made constant,
and any breaks in the data must be marked. Time merging is a process that converts a
dataset to this required format, by expanding and/or compressing the data as necessary.
Time merging also fills in data to replace points with bad status, and can be used to
extrapolate data after the ending date/time of a dataset.

Time Interval and Time Delays


You can model quasi-steady-state processes having significant time delays among
process variable interactions by specifying a time delay value for each variable. The
time delay value adjusts the temporal relationship of a process variable with respect to
the other variables in the process. A positive time delay value advances the variable in
time; a negative value moves it back in time. One time delay unit represents the time
interval between rows in the after-transforms dataset; for example, if rows are five
minutes apart, a time delay of +2 advances the variable by ten minutes. Time delays are
discussed in more detail in “Time Delays” on page 8-10.



Before executing a time merge, you have to determine the time interval that you will
want between rows in the resulting dataset. Time statistics information (see “Time
Statistics” on page 5-22) on the time intervals currently in the data may help you to
decide on an appropriate interval. As a general rule, the time interval should simply
reflect the rate at which you want to obtain output predictions; however, the time
interval should never be smaller than the sampling interval of the most frequently
sampled process input variable.
In the special case of a process that does not include time delays, the only reason to time
merge would be if the dataset were not row synchronous; in this case, any convenient
time interval could be used.

Preparing the Dataset for Time Merge


If any date/time variable point has a bad status, all data values in that row (in columns
pointing to that date/time variable) will be discarded. It is a good idea to search the date/
time columns for Blank, Missing, and Error points (searching is described on page
5-24). If you find any such points, you can correct them by hand (click on a cell, type a
new value, and press the Return key) if you want to recover the rest of the data in the
row. The figures below illustrate some conditions for changing or retaining an Error.
In the following case, it should be safe to change the Error to 01/01/92 01:00:00, and
use the data values in the rest of the row.


In the following case, leave the Error in place, and the entire data row will be skipped.
Values for 01:00 will be generated when you time merge.

If any date/time variable has a value that is legal but erroneous, a time merge will cause
that error to affect the data values negatively. Before doing a time merge, you should
always check the Time Statistics to identify any peculiar values, and correct or remove
them. It can sometimes be convenient to round off the date/time values (see the
$dtRound transform described on page A-10) before doing a time merge.
It is usually preferable to remove bad data values (“outliers”) from the dataset before
doing a time merge. If you time merge first, and then remove the outliers, you should
also remove any interpolated or extrapolated value that was based on the outlier.



Time Merge Window
The Time Merge window is invoked when you click Time Merge in the spreadsheet or
in the plot window.


Time Merge Variables


In a dataset, the variable types are date/time, numeric (real, integer, or double
precision), or string, and variable origins are raw, independent, or computed (see
“About Datasets” on page 5-1). Numerics and strings can be attached to a date/time
variable, or they can be unattached, without any date/time information. You can change
the date/time pointer of raw or independent columns (see “Changing Spreadsheet
Contents” on page 5-15), but not of computed columns; a computed column inherits the
date/time pointer of the column from which it is transformed.
You can time merge all or some of the variables in a dataset. You can select the date/
times and the variables that are unattached. When you merge a date/time variable, all
other variables that reference it are also merged; the original date/time variable is
removed from the after-transforms dataset and replaced with a new merged time, and
the date/time pointers are changed to the new merged time. When an unattached column
is time merged, its values are not changed, but its date/time pointer is set to the new
merged time variable. If you don’t bother selecting any variables from the list, all date/
time variables will be selected automatically, but the unattached variables will not be
selected.
The Show Dependent Variables toggle button changes the display so that, in addition to
the date/times and unattached variables, it shows the other variables that reference each
date/time. These variables are always affected by a time merge of their date/time
reference, regardless of whether they are displayed. Time merge calculations are applied
directly to raw and independent variables; computed variables inherit the effect of the
time merge on the variables from which they are computed.
The new time variable must be given a unique name. A default name is provided.

Time Merge Method


The time merge method can be Boxcar, Linear, Spline, LinearExtend, or SplineExtend.



Boxcar simply repeats the most recent value; Linear and Spline interpolate. Keep in
mind that interpolation can cause one or more data values to be lost at the endpoint:
for example, six original points between the starting time and the ending time may
yield only five interpolated points.

For interpolated points, the LinearExtend and SplineExtend options function the same
as ordinary Linear and Spline, but they also repeat the last original value until the
specified ending time is reached: in the same example, six original points yield five
interpolated points plus one extended point at the ending time.

Any time merge method appropriate for your application is acceptable.


Spline interpolation over relatively large gaps tends to produce spikes in the resulting
values. If gaps in the data are large, you will get better results by doing a Linear time
merge first, followed by a Spline.

Time Merge Parameters


The range of the time merge can be set to the Earliest Start and Latest End, or the Latest
Start and Earliest End, of all date/time columns in the dataset at the time the time merge
is executed. You can override these choices and fill in a specific starting and ending date
and time; but those values will be retained in the dataset’s transform list, and will
prevent the use of any data outside that range.


Interval is the time difference between successive rows in the resulting time column.
You type in the amount, and select its units from the option menu.

Maximum Time Gap is an optional value used to control whether a gap in the data
(before the merge) is filled by the time merge or is left blank. If a gap in the data is
smaller than the specified Maximum Time Gap, the time merge expands data to fill the
gap. If a gap in the data is larger than the Maximum Time Gap, the time merge indicates
the gap by a data point with Blank status. If multiple consecutive rows are blank in all
columns, the time merge collapses them into a single row of points with Break status.
To set the Maximum Time Gap, type in the amount, and select its units from the option
menu; if you will be using this dataset to model a process, the maximum time gap
should be set to a value greater than the time merge interval but less than the time frame
over which extrapolation or interpolation would introduce significant errors. This
maximum time gap is therefore a function of the speed of the process as well as the time
merge interval.
Time merge calculations can only be applied to date/time values in strictly increasing
order. Two option menus are used to specify how to handle values out of order: Cut Data
throws out any data that is in decreasing time order; Sort Data sorts the entire dataset
while making the time merge calculations.

If the time values are in random order, sorting can be extremely slow; it may be
preferable to sort the dataset first (using a series of $sort transforms, see Chapter 7,
Transform Calculator and $sort described on page A-23), then write a dataset report
(described on page 5-52) and create a new dataset that is already sorted.
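
As a hedged sketch only (the actual argument syntax for $sort is documented on page
A-23, and the tag name time1 is hypothetical), a sort on a single date/time column
might be entered in the transform calculator as:

$sort(!time1!)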



If the dataset includes any duplicate times, Use First uses the data at the first occurrence
of the time and ignores all repetitions of the time; Use Last uses the data at the last
occurrence, and ignores any earlier reports; and Use Average saves all values (per
variable) at duplicate times and averages them.

When you click Merge, a $TimeMerge transform is generated and applied to the
dataset (the transform syntax is described on page A-31). This transform may force
some other transforms already on the dataset to be re-evaluated. A working message
indicates which transform is currently being evaluated.

Certainty
Time merge may generate new data by extrapolation or interpolation. For each point, a
record is kept of whether it existed before the time merge or whether it was generated,
and, if generated, how far away it was from known data; this record is called certainty.
After a time merge, the positional help at the bottom of the spreadsheet displays the
certainty for a given point in the dataset.

You can query certainty values with the $certainty transform described on page
A-5, or change them with $setcert described on page A-22. When you train a
model, you can use the certainty values to pay more attention to points with higher
certainty; see “Sparse Data Algorithm” on page 9-16.
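
For example (a sketch only; the tag names are hypothetical, and the exact signature of
$certainty is documented on page A-5), you could copy the certainty values of a
variable into a new column for inspection with a transform such as:

!flowcert!=$certainty(!flow!)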
The distance from known data that causes a certainty of zero is set with a parameter
called maxCert. This parameter cannot be set in the Time Merge window, but it is
automatically inserted in the $TimeMerge transform (described on page A-31), so you
can modify it (see “Editing the Transform List” on page 7-18). If you use the optional
Max Time Gap, maxCert defaults to the same value; if you do not use Max Time Gap,
maxCert defaults to 12 times the time merge Interval (for example, with a one-minute
Interval and no Max Time Gap, maxCert defaults to 12 minutes).
Certainty values are generated by the $TimeMerge transform and do not use any other
information about the variable; in particular, if you time merge the same variable twice,
its certainties after the second time merge do not take into account any poor certainties it
may have had resulting from the first time merge. If you will be making use of certainty
information, never time merge the same variable more than once.

Variable Properties
Use the Variable Properties window to set Analysis Values and Analysis Variable
Ranges (AVR), and to view and change other properties. Invoke it by clicking
Properties in the spreadsheet or in the plot window.



You can display properties for the dataset in its after-transforms (normal) or before-
transforms state.

There is a scrolled list of all variables in the current view. Click on the tag name of any
variable in this list, and its properties will be displayed.

Common Properties
For all variables, the Column number, Type, and number of Values are displayed. These
properties cannot be changed. The number of values is the total number of points that
this variable has in the dataset, including cut, blank, missing, error, and break points.


All other properties are displayed in text boxes. To change any of them, click in the text
box, type in the change, and press the Return key. Properties for all variables include the
column header information: tag name, comment, units, display format, and (for
variables other than date/times) the date/time column pointer.
The Show Transforms button brings up the Transform window, described on page 7-3.
If you select any variable in this window, and then click Show Transforms, the
Transform window comes up with that selected variable applied as a Mask, as described
on page 7-7.

Before Transforms Properties: Analysis Variable Range


All raw variables have a property called analysis value, and numeric raw variables have
a minimum and maximum analysis variable range (AVR). You can set these values
using the Properties window, but they are used only when you use the dataset with a
model to perform Output vs. Percent (described on page 10-22) or What Ifs (described
on page 11-1) analysis.
In general, a variable’s AVR should be the range of its valid values. However, if you
intend to model only a subset of all possible operating conditions, you should set the
AVR to correspond to that subset. If the raw data files from which the dataset was
created contained only valid data values, you should not have to change any variable’s
AVR; but if the data included bad values outside the normal range, then the default AVR
will reflect those bad values and you will have to correct it and then reset the AVR (you
can remove bad values from the dataset in its before-transforms state using the
spreadsheet Edit menu functions and plot cuts, or you can remove them from the after-
transforms state using plot cuts and clips, or transforms).



In the before-transforms view of the dataset, the Properties window includes fields for
setting the analysis value and analysis variable range (AVR), and convenience buttons
for setting the AVR.

After correcting any bad data, you can reset the AVR manually by typing in new values,
or you can use the convenience buttons (or the AVR menu in the before-transforms view
of the spreadsheet) to change the AVR to the current range. Convenience buttons are
provided to set the AVR at any time to the current values of the data range either before
or after transforms; note that either of these is a one-time-only setting to current values,
and does not update the AVR automatically if the range changes later. You should not
normally set the AVR to the after-transforms range if you have applied transforms, other
than clips and cuts, that change the range.

Menu Bar
The spreadsheet and the plot window share most menu bar functions, although some
menus in the spreadsheet are omitted from the plot. In the after-transforms state of the
dataset, the menus are Dataset, Window, Edit, and Reorder; in the before-transforms
state of the dataset, the Reorder menu is replaced with an AVR menu.

Dataset Menu
The Dataset menu provides the following operations:
Create New Dataset page 5-47
Load Dataset page 5-47
Add New Variables page 5-47
Add New Rows (Before Transforms) page 5-48
Add Dataset page 5-50
Inherit Transforms page 5-50
Save Dataset and Save Dataset As page 5-50
Save Dataset Report page 5-52
Clear Dataset page 5-55
Delete Dataset File page 5-55

Create New Dataset


This function operates as described in “Creating a Dataset” on page 5-4. If you cancel
the process at any point, a currently loaded dataset will remain loaded; otherwise, the
current dataset is cleared immediately before the new one is created.

Load Dataset
This function operates as described in “Loading a Dataset” on page 5-10. If you cancel
the process at any point, a currently loaded dataset will remain loaded; otherwise, the
dataset is cleared immediately before the new one is loaded.

Add New Variables


Add New Variables is used to read additional variables from formatted data files into
new raw columns in the before-transforms state of the current dataset. It invokes the
Select Files dialog, and the operation continues as described in “Creating a Dataset” on
page 5-4. These new variables will not be affected by any transforms that have already
been applied to the dataset, even a $TimeMerge. If you have already applied a time
merge and then you Add New Variables, you should modify the $TimeMerge
transform (as described in Hint 8 on page C-8) to reference the date/time pointer(s) of
the new variables, rather than applying an additional time merge.



Add New Rows (Before Transforms)
Add New Rows is used to add additional data for existing variables in the dataset. The
additional data is read into new rows appended to raw variables that already exist in the
before-transforms state of the current dataset. When you Add New Rows, every variable
that you select to read:
• must already exist in the dataset, with the same tag name—duplicate tag names
cannot be resolved automatically;
• must match a raw variable—you cannot read new rows into a variable that you
created using the transform calculator;
• and must have the same variable type as the matching variable in the dataset.
In addition, if you select more than one variable with the same date/time pointer, their
corresponding columns in the before-transforms state of the dataset must all be of the
same length.
If a dataset includes two raw variables X and Y, and you have transformed Y to depend
on X, then the length of Y after transforms are applied will be no longer than the length
of X. (An example of this would be the transform
!Y!=$if($valid(!X!),!Y!,$error), which sets an error in Y when the point
in X is not valid.) If you add new rows to Y but not to X, the length of Y is increased
before-transforms, but after-transforms it is not changed because it is limited by the
length of X. If you add new rows to a variable, you should also add to any other
variable on which it depends.
Before selecting the Add New Rows operation, prepare the new rows of data as one or
more formatted files (see Chapter 4, File Formatter).
When you select the Add New Rows operation, it first displays the Select Files window
so you can select the formatted files. Then it scans the formatted files to build a list of
variables found there. Next it displays the Select Variables window so you can select
variables whose rows you wish to add. The selected variable names should match
existing dataset variable names exactly. If they do not, you can enter new names for the
selected variables in the New Name text box in the Select Variables window.
After you have verified that the variable names for all added rows match existing
variable names in the current dataset, changing names using the New Name text box as
necessary, click OK in the Select Variables window to add the new rows. If any new
variables are not found in the current dataset, the operation fails.
If any of the restrictions are violated, an error message appears, and the dataset remains
unchanged. If there is more than one violation, only the first one detected is noted.


Note that you cannot append to any one column from more than one data file at the
same time.

Sample Error Messages


If you try to append data columns without selecting their associated date/time
column(s), the date/times will be read anyway.

You cannot read new rows into an independent or computed variable; new rows can be
appended only to raw variables (variables that exist before-transforms).

You can append to multiple columns at the same time, but if they have the same date/
time pointer, they must have the same length in the before-transforms state of the
dataset.



Add Dataset
In its simplest conceptual form, a dataset consists of raw data plus a list of transforms;
what you normally see on the screen is the calculated values that result from applying
the transforms to the raw data.
The Add Dataset function lets you start with one dataset, for example ds1, and select a
second dataset, for example ds2, to be combined with it. The raw columns from ds2
are appended to the right side of the raw columns of ds1, and the complete transform
list from ds2 is combined with the transform list of ds1; the combined transform list is
then applied to the combined raw data. The Add Dataset operation will fail if the two
datasets have any variable names in common at any point, including the names of date/
time variables.
When you select Add Dataset from the Dataset menu, the common Select Dataset dialog
is invoked. This dialog functions as described in “Loading a Dataset” on page 5-10.
After an Add Dataset operation, all transforms will be recalculated on the combined
dataset.

Inherit Transforms
The Inherit Transforms function lets you start with one dataset, for example ds1, that
does not yet have any transforms, and select a second dataset, for example ds2, to
inherit transforms from. The raw columns from ds2 are not touched, only the complete
transform list from ds2 is copied to become the transform list of ds1; the transform list
is then applied to the raw variables of ds1. The Inherit Transforms operation will fail if
any transform from ds2 cannot be applied to the variables in ds1.
When you select Inherit Transforms from the Dataset menu, the common Select Dataset
dialog is invoked. This dialog functions exactly as described in “Loading a Dataset” on
page 5-10.
After inheriting transforms, you should carefully review any functions that are based on
a variable’s row number or date/time value, to be sure they are still appropriate in the
dataset that inherited them. Examples of such functions include, but are not limited to,
$override, $MarkCut, and $TimeCut.

Save Dataset and Save Dataset As


When you save a dataset, you are saving its raw data values, the names and properties of
its variables, and its transform list. If the dataset already has a name, Save Dataset asks
you to confirm, then writes the dataset to disk files in the same format as the last time
you saved it. If the dataset does not already have a name, Save Dataset is identical to
Save Dataset As.
Save Dataset As saves the current dataset, and stores its name in a data dictionary. The
Save Dataset dialog provides text boxes for you to enter a directory and dataset name,
data dictionary name, and an optional comment.

The dataset name can consist of letters, numbers, and underscores only; no other
characters are allowed. If you enter a name that is already recorded in the data
dictionary, you will be asked whether to overwrite the existing dataset.
You can use any valid data dictionary name. It defaults to the data dictionary that you
have most recently specified, or to your default file.
The comment is stored in the data dictionary with the dataset name, and is displayed
whenever the Select Dataset dialog is opened.
You can save datasets in either binary or ASCII format. Binary data is faster to save and
load. ASCII can be viewed with a text editor and transferred between computers more
reliably. If you intend to transfer the dataset between computers, be sure to note the data
type so that you can select the correct mode for the transfer.

CAUTION: Some file transfer facilities, such as the File Transfer
Protocol (FTP), require you to select a data type mode (Binary/Image
or ASCII/Text) before transferring files between computers. Dataset
files other than DatasetName.pi_data are ASCII files.
DatasetName.pi_data is whichever data type you selected when
you saved the dataset. Be sure to specify the correct data type when
transferring files. If you transfer a file using the wrong data type, you
may corrupt the file permanently.

You can save the dataset in the format used by the current version of Pavilion software
or in the format used by a previous version. This feature is useful if you intend to use the
dataset on a computer that you have not updated with the latest version of Pavilion
software. A given version of the software can use datasets saved in an earlier format, but
the software may not be able to use a dataset saved in a later format.

Save Dataset Report


The Save Dataset Report function is a powerful tool that allows you to save all or any
portion of the raw data values or the transformed dataset to an ASCII text report file, in
a wide variety of formats that are compatible with the formatter and with a number of
commercially-available spreadsheet programs.
The option menu at the top of the Dataset Report window allows you to select whether
the report is made from the after-transforms (normal) or before-transforms state of the
dataset. The Variables and Selection lists allow you to select which variables are written
to the report, and in what order. If you do not bother selecting any variables, all
variables in the selected state of the dataset are written.
The Write Options toggle buttons allow you to select additional information to be
written with the data values. Pavilion Header is a row at the top of the report file that
identifies it as having been written by Pavilion products, with the version number and
current date. Tag Name, Comment, and Units each write one row at the top of the report
file, containing this information about the report columns. Row Numbers writes the
dataset’s row number at the beginning of each row in the report file.
The Separator Character is one character written between the data values in each row. If
you select the Write Options of Tag Name, Comment, or Units, then the same Separator
Character is written between those header values. You can select Space, Tab, Comma,
or any other single character that you specify.
Default values are provided for the Start and End Row, and the Directory and Filename
of the report file. The default End Row is based on the length of the dataset, not of the
column(s) that are being written.
The Write Format File option is used if you want to read the report file back into a new
dataset; it allows you to bypass the formatter by automatically writing a format file and
recording it in a data dictionary. The next time you select Create New Dataset or Add
New Variables, this dataset report file will appear in your list of formatted files. If you
select Write Format File, you can type in the name of a data dictionary.



When you click Save, the report file is written, and a message notifies you when it is
complete.


Clear Dataset
Clear Dataset erases the internal storage image of the current dataset, giving you an
opportunity to save if you have not yet done so. It does not affect any datasets that you
have saved to disk.

Delete Dataset File


Delete Dataset File is used to delete a dataset’s files from your disk and/or remove its
name from the data dictionary file. It invokes the Delete Dataset dialog.

This dialog displays the name of the current data dictionary, and a scrolled list of all
datasets recorded in it (sorted by directory). If you want to use a different file as the data
dictionary, click in the box, type in its name, and press the Return key, and datasets in
the new data dictionary will be listed. To select a dataset from the list, click on it and
then click the Delete button (or, as a shortcut, just double-click on the dataset). You can
type in the full path name of a dataset instead.



When you click the Delete button, you are asked whether to delete the dataset, just
remove it from the data dictionary, or Cancel.

Delete
This operation permanently removes the files from your disk.
Just Remove From DD
This operation removes the data dictionary entry without affecting the disk files.

Window Menu
The Window menu contains two entries, New Spreadsheet and New Plot. All
spreadsheet windows (described in Chapter 5, Spreadsheet) and plot windows
(described in Chapter 6, Data Plotter) that are open at the same time contain the same
dataset.

Edit Menu
The Edit menu is found only on the spreadsheet. Edit functions can be used on data cells
or header cells; but for data cells they can be used only in the before-transforms state of
the dataset. Edit functions do not generate transforms, so you have no record of any
editing that has been performed. (Similar functionality is available in the
$clearRows, $copyRows, $deleteRows, $dupRows, and $insertRows
transforms.) Only the most recent edit operation can be undone.

Edit Regions
Before selecting any Edit operation (except Undo), you must click or drag in the
spreadsheet to select the region to which the editing will be applied. Except as noted
below, the region can be a single cell, a rectangular group of cells, one or more complete
rows, or one or more complete columns. To select a region, drag from one corner to the
opposite corner; or click in one corner, scroll the window if necessary, and shift-click in
the opposite corner. To select a complete row, click in its row number; to select multiple
rows, drag through their row numbers; to select a complete column, click in its column
number; to select multiple columns, drag through their column numbers.

Note: For edit operations, selecting all of the cells in a row or column
is not the same as selecting the row or column by its number.

The cut/copy/paste operations treat header cells and data cells independently, and they
do not share the same buffering system; you cannot copy from data and paste into a
header, or vice versa.

Edit Operations
Undo (Ctrl-u)
Undoes the most recent editing operation since the current dataset was loaded.
Cut (Ctrl-x)
Moves the contents of the selected cells into the internal editing buffer, and leaves
the selected cells Blank. You can cut all of the cells in a column, but you cannot
select a column by number and cut the entire column. You cannot cut a Tag Name
cell.
Copy (Ctrl-y)
Copies the contents of the selected cells into the internal editing buffer.
Paste (Ctrl-p)
Copies the contents of the internal editing buffer into the selected region. The
region being pasted into must have the same variable types as the values in the
buffer. You can select a single cell that will be used as the upper left corner of the
paste region, or you can select the entire region. If you select a rectangular region,
its size in at least one dimension must match the size of the buffer. You can
lengthen a column by pasting beyond the bottom of a column, but you cannot create
new columns by pasting to the right of the dataset. You can paste into all of the cells
in a column, but you cannot paste when the selected region is one or more complete
columns.
Clear (Ctrl-b)
Removes the selected cells’ contents and leaves them Blank, without saving into
the internal editing buffer.



Insert (Ctrl-i)
Puts new, blank cells into the dataset, pushing the selected cells down to lengthen
the columns. You cannot insert when the selected region is one or more complete
columns.
Delete (Ctrl-d)
Copies the selected cells’ contents into the internal editing buffer, and removes the
cells from the dataset, making their columns shorter. You can delete all of the cells
in a column, but you cannot delete one or more complete columns from the Edit
menu; to delete columns, use the Delete button on the spreadsheet window. You
cannot delete a Tag Name cell.
Duplicate (Ctrl-a)
Operation depends on the type of cell and region that you select.
• If the region is a group of data cells, it must contain cells from at least two
rows. Duplicate functions as a “Fill Down”, copying the values from the top
row into all selected cells below it.
• If the region is one or more complete data rows, selected by row number,
Duplicate functions as a “Copy and Paste”; it makes an identical copy of the
selected rows, and inserts them below the selection, making all columns in the
dataset longer.
• You cannot Duplicate if the region is one or more complete data columns.
• If the region is a group of header cells, Duplicate functions as a “Fill Right”,
copying the values from the leftmost cell into all selected cells beside it. You
cannot duplicate Tag Name cells.

Reorder Menu
The Reorder menu is available in the spreadsheet only when you are viewing the
dataset’s after-transforms state.

CAUTION: You cannot undo a Reorder function. Always save the
dataset before you reorder its columns.

Move Variables
To move one or more adjacent variables, first select the column(s) by clicking or
dragging on their column numbers (as described in “Selecting a Region” on
page 5-14), then select Move Variables from the Reorder menu. Now, when you
move the mouse into the spreadsheet, a moving indicator appears to show the
position into which the columns will be moved when you click. Scroll if necessary,
and when the indicator is where you want to move the columns, click, and the
columns will be moved. To “cancel” the move, just move the pointer back into the
original position.

Sort Alphabetically
Sort Alphabetically first warns you that you cannot undo the operation, and then it
sorts the transformed variables.

AVR Menu
The AVR menu is available only when you are viewing the dataset’s before-transforms
state. There are two operations: Set to Data Range Before Transforms and Set to Data
Range After Current Transforms.
The concept of AVR applies only to raw variables before they are transformed. Every
raw numeric variable has an Analysis Variable Range (AVR). For more information on
AVR, see “Variable Properties” on page 5-43.

Note: It is extremely important to set correct values for the AVRs
before using the dataset with a model to perform Output vs. Percent or
What Ifs analysis.

For one or more variables that you select, you can set the AVR at any time to the current
values of the data range in either the before- or after-transforms state; note that either of
these is a one-time-only setting to current values, and does not update the AVR
automatically if the range changes later. You should not normally set the AVR to the after-
transforms range if you have applied transforms, other than clips and cuts, that change
the range.
Either option from this window brings up a dialog used to select the variables to which
the AVR setting is to be applied.



Exiting the Spreadsheet
The Done button closes the spreadsheet. If you have more than one spreadsheet open,
they all have to be closed separately; closing one of them does not affect the others.
When you close the last spreadsheet window, the dataset that was loaded in it remains
loaded in the system and will appear the next time you open a spreadsheet window. The
dataset remains loaded even if you open any of the model-related functions such as the
trainer or the model analysis tools. However, a dataset that you have created or modified
is not automatically saved; you must explicitly save it or it will be lost when you exit. If
you close the spreadsheet without saving the dataset, you will be warned.



6 Data Plotter

• Plot Appearance, page 6-4
• Plot Types and Parameters, page 6-10
• Y Axis Limits, page 6-27
• Crosshairs, page 6-27
• Tools, page 6-28
• Printing a Plot, page 6-39
• Exiting the Plotter, page 6-39

This chapter explains how to use the plotter to display graphical representations of data
in a dataset. Display the plot window by selecting New Plot from the Window pull-
down menu in the spreadsheet.
The Plot window consists of a plotting area surrounded by plot controls, plus plotting
menus and operations buttons that are shared with the Spreadsheet window. The
operations buttons are explained beginning on page 5-32, the Dataset menu is on page
5-47, the Window menu is on page 5-56, and the AVR menu (before-transforms) is on
page 5-59.

You can plot and manipulate the dataset either in its normal state after the transform list
has been applied, or in the “before-transforms” state, showing the raw variables before
any transforms are applied. In the before-transforms state, the background color
changes, the AVR menu appears in the menu bar, and the Duplicate, Time-Merge, and
Transform operations buttons become inactive.
The controls to the left of the plotting area set the plot type and parameters. The specific
parameters that are to be set change with the plot type selection. The controls to the
right of the plotting area are used with any plot type (with some minor exceptions as
noted below).


The Continuous Update toggle button controls whether changes that you make to any
plot controls or selections are drawn immediately, or saved until you click the Draw
button.

If Continuous Update is turned off, a highlight appears around the Draw button
whenever you have specified changes that have not yet been drawn.

Note: In all documentation for the plot window, any description of an
action that is followed by “…the plot is drawn” should be understood
to mean “…if Continuous Update is turned On, the plot is drawn
immediately, but if Continuous Update is turned Off, the plot change is
remembered and the Draw button is highlighted.”

Depending on the amount of information that is being plotted, it can take a significant
amount of time to draw a plot. While a plot is being drawn, the Draw button changes to
a Stop button and is highlighted in red. If you click the Stop button, the drawing process
stops.

To make a plot, you select the Line Type, Graph Type, and Plot Type, and set the
parameters that appear for the selected plot type; as soon as you select the variables to
be plotted, the plot is drawn.



Plot Appearance
Line Type
Plots can be made with Points, Lines, or both. Default values for Line Type are set
according to the Plot Type selection.

Graph Type
Graph Type is used when you select more than one variable to be plotted. For some Plot
Types, you cannot control the Graph Type, so the menu is grayed out.

A Stacked plot draws every selected variable on a separate Y axis with a common
X axis, except in the histogram plot, where each X axis may be different. An Overlay
plot draws every selected variable on a single pair of axes. A Normalized plot draws
every selected variable on the same X axis, but with independent Y axis scalings, such
that each variable’s minimum and maximum are drawn to the same height on the plot.
These three plot types are illustrated below.


Example of a stacked plot:



Example of an overlay plot:


Example of a normalized plot:



For a normalized plot, the Y axis labelling pertains only to one variable at a time; its
name is displayed above the axis. To display the axis for a different variable, click on
the variable name, and the Y Axis Variable Selection box will appear.

This box lists all variables that are currently being plotted. To display the Y axis
labelling for a different variable, double-click it, or click it and click OK.


When you select a different variable, the Y axis labels are changed to display the scaling
for that variable. The scaling of the plot is not changed.

Display
Legends are drawn above stacked plots, and to the right of overlay or normalized plots;
by turning off the legends, more space is available for drawing the plots.



Points that have a status of Cut or Break have a value, but the value is ignored in
calculations. When Display Cuts is turned on, these points are plotted at their known
value, with a colored dot that indicates their status. When Display Cuts is turned off, Cut
and Break points are not plotted.

Plot Types and Parameters


A number of different plot types are available. When you select a Plot Type, controls for
that plot type appear.

Selecting Y Variables
All types of plots include in their parameters a button for selecting the variables to plot
on the Y axis, with up-arrow and down-arrow buttons for quick scrolling through the
list of variables.

When you click the Y Variables button, the Y Variable Selection dialog box is invoked.
When you select variables and click OK, the plot is drawn. After you have drawn a plot,
if you click a different plot type, the new plot is drawn with the same Y variable
selections.

The up-arrow and down-arrow buttons are used for quick scrolling through the variables
in the dataset. After you have selected any n variables, the up-arrow (or down-arrow)
button will select the next (or previous) n variables, without opening the Y Variable
Selection dialog. To keep a selected set of variables in the plot while scrolling through
other variables, select the Freeze tool and select the variables you want to keep in the
plot.



Row Number Plot
Row Number is the default plot type. It plots the row number on the X axis and one or
more variables on the Y axis. When you select a Row Number plot, the parameters that
appear are Y Variables, First Row, Row Count, and % Rows Visible.

Row Count and % Rows Visible display corresponding information, controlling the size
of the X axis. If you change either of these parameters, the other one is automatically
updated. To display all rows, click the 100% button. Y variables are selected as
described in “Selecting Y Variables”, above. First Row and Row Count control which
rows of the dataset are plotted. To change either of them after you have drawn a plot,
click in its text box, type the new number, and either press the Return key while the
mouse is still in the text box, or click Draw. If you scroll through the plot using the
scrollbar or the left and right arrow buttons, the value of First Row is automatically
updated.


Time Series Plot


The time series plot shows time on the X axis and one or more variables on the Y axis.
You cannot make a time series plot of a variable that does not have an associated date/
time column. When you select a time series plot, the parameters that appear are Y
Variables, Start Date and Time, Increment, and % Time Visible.

Start Date and Time, Increment, and % Time Visible control the portion of the dataset
that is plotted. Increment, which is expressed as a typed quantity with units selected
from an option menu, controls the amount of time displayed on the screen (scale of the
X axis).

This corresponds exactly to the % Time Visible. If you change either of these two
parameters, the other one is automatically updated. To display all times, click the 100%
button. Start Date and Time are used as an alternative to the scrollbar or the left and
right arrow buttons, to scroll through the dataset. The default values are Start Date and
Time at the beginning of the dataset, and 100% of the dataset time visible.
To change any of these parameters, click in its text box, type the new number, and either
press the Return key or click Draw.
Y Variables are selected as described in “Selecting Y Variables” on page 6-10. You may
not select any variable that does not have a valid date/time reference. You can view the
date/time reference of a variable in the spreadsheet, or by checking its properties as
described on page 5-43.
Increment and % Time Visible control the scale of the X axis. If you change the Start
Date/Time to values outside the bounds of your dataset, the displayed Increment and
% Time Visible do not change. If you scroll through the plot using the scrollbar or the
left and right arrow buttons, the starting date and time are automatically updated.


XY Plot
XY plots one variable on the X axis and one or more variables on the Y axis. When you
select an XY plot, the parameters that appear are X Variable, Y Variables, First Row,
Row Count, and % Rows Visible.

Row Count and % Rows Visible display corresponding information, and if you change
either of these parameters, the other one is automatically updated. To display all rows,
click the 100% button. You may select X and Y variables in any order; plotting will
occur as soon as you select them both. Y variables are selected as described in
“Selecting Y Variables” on page 6-10.
When you click X Variable, the X Variable Selection box appears. This box simply lists
all variables in your dataset and allows you to select one of them for the X axis. You can
click your selection and then click OK, or double-click your selection, or click Cancel to
cancel the selection.
First Row and Row Count control which rows of the dataset are plotted. To change
either of these values before you draw a plot, click in its box and type the new number.



To change either of them after you have drawn a plot, click in its text box, type the new
number, and either press the Return key or click Draw.

Important: If you display less than 100% of an XY plot, the portion that
is displayed is selected by row number, not by X value. This is
different from some other commercially available plotting packages.

For an XY plot, the default Line Type is Lines off and Points on. If you turn Lines on,
the lines will connect the data points in order by row number.

Probability Plot
The probability plot, or theoretical quantile-quantile plot, plots the quantiles of the
selected variable(s) against the quantiles of a theoretical normal distribution. This plot is
useful for identifying outliers and data clusters or multiple populations. If the data were
perfectly normally distributed, it would plot in a straight line, with the slope of the line
indicating the standard deviation. The outliers are those points farthest from a line
followed by the rest of the points. If the data consists of multiple populations, there will
be multiple groups of points, each roughly a straight line. This can indicate that the
variable stabilizes around two or more distinct states.
For a rigorous definition of the quantile-quantile plot, consult a textbook or reference on
statistical methods for data analysis. As an approximation only, you can think of the
quantile-quantile plot like this:
Make a plot of the cumulative frequency distribution of your data, with percentage
of data on the X axis and data value at that percentage on the Y axis. Make a similar
plot for a normal curve, with percentage of data on the X axis and number of
standard deviations from the mean on the Y axis. Since these two plots share the
same X axis, you can group the plotted values in triplets of (X, Y, Y'). Drop the X
value from these triplets and you will have pairs consisting of cumulative frequency
within your data and cumulative frequency within a normal curve, both measured at
the same percentage of data. A plot of these pairs, with normal curve values on the
X axis and values within your data on the Y axis, is a quantile-quantile plot.
When you select a Probability plot, the only parameter is Y Variables; Y variables are
selected as described in “Selecting Y Variables” on page 6-10, but note that you cannot
make a probability plot of a string or date/time variable. Depending on the underlying
frequency distribution, this plot can take a significant amount of time to generate.

Histogram Plot
When you select a histogram plot, the parameters that appear are Y Variables, First
Row, Row Count, Number of Bins, Bin Size and Offset, Cumulative, and % Data
Shown.



Row Count and % Data Shown display corresponding information, and if you change
either of these parameters, the other one is automatically updated. To display all data,
click the 100% button. Y variables are selected as described in “Selecting Y Variables”
on page 6-10. First Row and Row Count control which rows of the dataset are plotted.
To change either of them after you have drawn a plot, click in its text box, type the new
number, and either press the Return key while the mouse is still in the text box, or click
Draw. If you scroll through the plot using the scrollbar or the left and right arrow
buttons, the value of First Row is automatically updated.
You can specify the histogram bin divisions as a Number of Bins, or as a Bin Size and
Offset, such that the first bin starts at the offset plus a multiple of the bin size. If the
Cumulative toggle is turned on, the count in each bin includes the count in all earlier
bins.
For a Histogram plot, Graph Type cannot be selected.

P.C.A. Plot
A Principal Components Analysis (P.C.A.) plot shows component number on the X axis
and the product of the principal component vector with its weight on the Y axis.

You must select at least two Y variables to make a P.C.A. plot, and they must be
numeric variables, not date/times or strings. Select Y variables as described in
“Selecting Y Variables” on page 6-10.


The Normalize toggle controls whether the calculations are based on normalized values,
which are divided by the standard deviation of the data, or unnormalized values, which
are not. Selecting Normalize does not normalize the results.
The default graphical representation of P.C.A. data is as a series of colored squares. The
range from -1 to +1 times the value of largest magnitude is divided into a number of
equal portions, and each portion is assigned a color; the default colors are shades of blue
for negative values, and shades of red for positive values. The number of portions is
equal to the number of colors specified as color resources. The size of each square is
proportional to the magnitude of the value.

If the Vary Size toggle is turned off, all squares are the same size; if Vary Color is off, all
squares are black; if both are off, no squares appear. The Line Type controls for a
P.C.A. plot default to Points and Lines both turned off, but they can be turned on; if on,
they are plotted on an unmarked Y axis that is identical to the range of calculated values. The example below not only has Points and Lines turned on, but it also has Vary Size and Vary Color turned off.

The Graph Type selection has no effect on a P.C.A. plot.


To add scores to your dataset, click Create PCA Projection Transforms.

The operation adds transforms that calculate the scores for each component. You specify
a prefix to identify the new variables, or you may accept the default.

A working dialog appears as transforms are added.


To generate a report showing the eigenvalues and weights, click Write PCA Report.
There is also a transform, $pca (described on page A-20), that you can use to perform
PCA analysis. To build a PCR model, see “PCR” on page 8-4.



Correlation Plot
A Correlation plot shows linear correlation values. For the equations, see any standard
textbook on statistics.

Note: The algorithms used for the Correlation plot do not detect
nonlinear correlations. Variables in your dataset may be strongly
related nonlinearly even though the Correlation plot indicates no
correlation. To analyze nonlinear correlations in your data, use the
model builder’s Find Time Delays feature (described on page 8-12).

In the Correlation plot, data is represented in the same style of colored boxes as the
P.C.A. plot, described above. The Normalize, Vary Size, and Vary Color toggle buttons,
and Line Type and Graph Type selections, function as described above for a P.C.A. plot.
The parameters for a Correlation plot are an option menu to select Y vs. Y or X vs. Y plotting; X Variable and Y Variable selection buttons; toggle buttons to set Normalize, Vary Size, and Vary Color; a Use Time toggle; and other parameters that depend on the setting for Use Time.


Options for Y vs. Y correlation plot:



Options for X vs. Y correlation plot:


Options for correlation plot with Use Time option:

Y vs. Y is a shortcut for making identical selections for the X and Y variables; X vs. Y
lets you make independent selections for X and Y variables. If Y vs. Y is selected, the
X Variables button is grayed out.
If Use Time is turned off, the additional parameters are X Tau and Y Tau. These values
specify an integer number of rows that the X and Y variables should be shifted before
the correlation calculations are made.
If Use Time is turned on, the analysis calculates correlations for a group of Y variables
against a single X variable that is time-shifted through a range of time steps. The
additional parameters are Start Tau, Stop Tau, and Interval, the Y vs. Y toggle
disappears, and the X Variables (plural) button becomes an X Variable (singular) button.
For example, if you set Start Tau -10, Stop Tau +10, and Interval 1, then all of the Y variables will be correlated against 21 transient variables created by shifting the X variable by -10 rows, -9 rows, -8 rows, …, and +10 rows.

To generate a report file of correlation statistics, click Write Corr Report. The operation
prompts you to specify a file name.
Correlation values can be written into columns in the dataset using any of the transforms
$correlation, $corr, $covariance, or $covarTD, described on page A-8.


Y Axis Limits
To change the limits of a Y axis, move the mouse over the Y axis legend until it is
highlighted, and click; the Axis Limits box will appear (note that you must click on the
highlighted region and not on the axis line itself).

You can set a variable’s Y axis limits to the limits of its displayed values, or the limits of
all of its values in the dataset, or to any arbitrary minimum and maximum value. If you
choose to display data’s min and max or variable’s min and max, the Display Cuts
toggle button in the Plot window controls whether the values of Cut and Break points
are considered in calculating the min and max. When you click Apply, this dialog is
closed and the plot axis is changed as specified.

Crosshairs
After a plot is drawn, two black arrows appear in the lower left corner of the drawing
area.



Drag the vertical arrow to move a horizontal line across the plot, and move the
horizontal arrow to move a vertical line across the plot. By placing the intersection of
the lines, the “cross-hairs,” over specific points, you can display the exact X or Y value
of the point. To make the lines disappear, drag them off of the plot.
In the example below, the vertical arrow has been moved upward into the press1 plot.
The information bar at the top of the plot shows the press1 value at the line’s position.

Tools

The plotter provides a number of tools to help you examine data and select data for
cutting.


The informational tools are Info (on page 6-31) and Zoom (on page 6-37). The tools that
change the dataset, and thus generate transforms, are Clip, Cut (three variations), and
Uncut (on page 6-30). The Freeze tool (on page 6-38) is used to select variables to retain
in the plot while cycling through the other variables. The currently-selected tool is
outlined and highlighted. The default tool is Info. Not all tools are available for all types
of plots; if a tool is not available, its icon is grayed out.

In the after-transforms state of the dataset, clips, cuts, and uncuts are applied to the
dataset as transforms, so they can be viewed, modified, or deleted, like any other
transform. This can be especially useful if you are having difficulty getting precise
mouse movement in a crowded plot; just use the mouse to mark approximately the area
that you want, then open the Transform window, select the transform, and modify it to
the exact values that you want.
However, in the before-transforms state of the dataset, clips, cuts, and uncuts change the
data directly, with no record of the change that was made, and no direct Undo capability.
Cuts and uncuts can still be rescinded with an opposite action (either before- or after-
transforms), or with a $changestat transform, but there is no Undo capability for
before-transforms clips. If you destroy data by a before-transforms clip, the only way
to recover is to re-create the dataset from the raw data file. Points that have been clipped
or cut are indicated in a plot by colored dots. Cut points can be hidden if you turn off the
Display Cuts toggle button, but clipped points are always visible. When a point has been
cut, its value is still known, but its status is Cut so the value is not used in calculations.
You can restore a cut point to OK status by marking it with the Uncut tool, or by
removing the transform that cut it, or by clicking in its cell in the spreadsheet and typing
OK.



Transform Tools
These are the transforms generated when you use the plot tools on an after-transforms
plot:

Clip: $ClipAbove or $ClipBelow (applied from a Row Number, Time Series, XY, or Probability plot)

Cut Y: $CutAbove or $CutBelow (applied from a Row Number, Time Series, XY, or Probability plot)

Cut Box or Cut X: $MarkCut (applied from a Row Number plot)

Cut Box or Cut X: $TimeCut (applied from a Time Series plot)

Cut Box or Cut X: $ScatCut (applied from an XY plot)

Uncut: $UnMarkCut (applied from a Row Number plot)

Uncut: $UnTimeCut (applied from a Time Series plot)

Uncut: $UnScatCut (applied from an XY plot)

When you Cut Y or Clip, the transform is based only on the variable’s value. When you
Cut Box, Cut X, or Uncut, the transform applies the cut or uncut according to the X axis
in the current plot type: $MarkCut and $UnMarkCut are based on row numbers in
the dataset, $TimeCut and $UnTimeCut are based on date and time in the variable’s
date/time column reference, and $ScatCut and $UnScatCut are based on row
numbers combined with pairs of values in the selected variable and the X axis variable.
Any method of uncut can be used after any method of cut.


When you apply a clip, cut, or uncut to an Overlay or Normalized plot, the system
generates one transform for every variable in the plot, even if a particular variable does
not have any points that are affected by the action.
The plot tools provide a quick and easy means to cut data from the dataset that you are
currently using; but cuts from Cut X or Cut Box, which are based on a row number or
date and time in the current dataset, will have no effect (or even an undesirable effect) if
you inherit the transforms onto another dataset, or if you save the dataset to be used in real time. It is almost always preferable to remove data from a dataset by using a
transform that describes conditions when data should be ignored.
For a large dataset with a large number of transforms, it can take a considerable amount
of time to apply the dataset’s transform list. You can minimize the calculation time if
you apply cuts and uncuts carefully. For example, suppose that you intend to cut points
from rows 4167 through 4200, but find that you missed and accidentally cut rows 4150
through 4200. You could simply apply an uncut to rows 4150-4166; or you could open
the transform window, find the cut in the transform list, and modify it to apply only to
the correct rows. Modifying the transform is more work for you, but it means that when
the dataset’s transform list is evaluated, only one transform (from these actions) will be
calculated, instead of one cut and one uncut.

Info
Info is used to display information about a plotted point.

Click on the Info icon, move the mouse to the desired point, and push and hold the
mouse button. While the button is held, the coordinates of the point and any other
pertinent information will be displayed.
Information for row number plot:



Information for time series plot:

Information for XY plot:

Information for probability plot:

Information for histogram plot:

Information for P.C.A. plot:

Information for correlation plot:


Clip
Clip is used with numeric variables only, to change data values above or below a
selected value to be equal to that value.

It affects the entire variable, even if not all of it is visible in the plot. In a single clip
action you can set either an upper limit or a lower limit, but not both at the same time.
To set an upper limit, click on the Clip icon, move the mouse anywhere in the top half of
the plot to be clipped, push and hold the mouse button, and drag the mouse vertically on
the plot. As you drag the mouse, the Y value of its current position is displayed, the area
to be clipped is highlighted, and every affected point is marked with a yellow vertical
arrow.

The vertical arrows are difficult to distinguish when the plot shows a large number of
points, but when displaying fewer points, you can distinguish them.



When you reach the Y value that you want to set as the upper limit of the data, release
the mouse button, and Apply and Cancel buttons will appear; click Apply, and all data
points greater than that value will be changed to that value.
A lower limit is set similarly: drag starting in the bottom half of the plot.

Cut Y
Cut Y is used with numeric variables only, to cut all data values above or below a
selected value.

It affects the entire variable, even if not all of it is visible in the plot. In a single Cut Y
action you can set either an upper limit or a lower limit, but not both at the same time.
The mouse actions are the same as described above for clipping. To set an upper limit,
click on the Cut Y icon, move the mouse anywhere in the top half of the plot to be cut,
push and hold the mouse button, and drag the mouse vertically on the plot. As you drag
the mouse, the Y value of its current position is displayed, the area to be cut is
highlighted, and every affected point is marked. When you reach the Y value that you
want to set as the upper limit of the data, release the mouse button, and click Apply; all
data points greater than that value will be cut. A lower limit is set similarly, starting in
the bottom half of the plot.

Cut Box
Cut Box is used to define a rectangular area of a plot within which all data points are cut
from the dataset, based on time or row number.

Click on the Cut Box icon, move the mouse to one corner of the rectangular area you
wish to cut, push and hold the mouse button, drag the mouse to the opposite corner of the rectangle, and release the button. The rectangle will be highlighted, and all data
points within it will be marked with a red box.

The red boxes are difficult to distinguish when the plot shows a large number of points,
but when displaying fewer points, you can distinguish them.

When you click Apply, all data points within the rectangle will be cut from the dataset.

Cut X
Cut X is used to cut all values within a range that is defined on the X axis, based on time
or row number.



Click on the Cut X icon. If you have stacked plots, you can cut from just one at a time,
or from all of them together. To cut from just one plot, move the mouse into that plot
and drag through the desired area.

To cut from all plots, move the mouse into the area between any two plots and drag
through the desired area.


As you drag the mouse, the area to be cut is highlighted, and every affected point is
marked with a red box. After marking the area to be cut, release the mouse button, and
click Apply; all data points within the marked range will be cut.

Uncut
Uncut is used to restore the previous value of some points that have been cut using any
of the cut tools.

Uncut is not the same as undoing a Cut; to undo a Cut, find it in the transform list and
delete it. Click on the Uncut icon, and drag through a rectangular area (as described
above for Cut Box). It is OK if the region includes points that were never cut. Click Apply, and any previously cut points within the rectangle will be uncut.

Zoom
Zoom is used to magnify or “zoom in on” a portion of a plot.

Click on the Zoom icon, and use the mouse to drag through a rectangular area (as
described above for Cut Box). Cancel and Apply Zoom buttons will appear.

Click Apply Zoom and the selected region will expand to fill the entire plotting area,
and an Undo Zoom button will appear. You may repeat Zoom as many times as
necessary. When you click Undo Zoom, the original plot will be restored. The other plot tools are still available when the plot is zoomed; however, some features do not function
while zoomed, and others function normally and undo the Zoom setting.

Freeze
Use the Freeze tool to select variables to retain in the plot so that you can use the up-
arrow and down-arrow buttons to cycle through the other variables, comparing them to
the “frozen” variables.

First select the Freeze tool, then click one or more variables in the plot. The border on variables selected for freeze changes from a dotted line to a solid line. Now use the up-arrow and down-arrow buttons to the left of the plot area to cycle through the other variables. The frozen variables remain in place so that you can compare them with the other variables, one page at a time.


Printing a Plot
The Print button is used to print a copy of the plotting area (not the entire window). It
invokes the Plot Print setup window, which sends a PostScript specification to a file or
directly to a printer.

Select the orientation, paper size, number of copies, and destination, and whether to
print in color (Color On) or black & white (Color Off). Consult your system
administrator for a list of valid printer names on your system. If you write to a file, you
can specify it by file name only, or by its full directory path. You can optionally print
System Labels, showing the dataset name and the current date and time; you can also
print labels on the top of the page, and along the X and Y axes. Fonts and sizes can be
specified independently for the labels.

Exiting the Plotter


The Done button closes the plotter. If you have more than one plotter open, they all have
to be closed separately; closing one of them does not affect the others. When you close the last plotter window, the dataset that was loaded in it remains loaded in the system.
The dataset remains loaded even if you open any of the model-related functions such as
the trainer or the model analysis tools. However, a dataset that you have created or
modified is not automatically saved; you must explicitly save it or it will be lost when
you exit the program. If you close the plotter without saving the dataset, you will be
warned.

7 Transform Calculator

• Transform Order, page 7-2
• Transform Window, page 7-3
• Syntax, page 7-7
• Building a New Transform, page 7-16
• Editing the Transform List, page 7-18
• Transform Errors and Panic Stop, page 7-22
• User-Defined Transforms, page 7-23
• Transform Control Statements, page 7-23

This chapter describes how to use the transform calculator to add, display, and delete
transforms in your dataset. The effects of transforms appear immediately in the after-
transforms view of the dataset. To invoke the transform calculator, click Transform at
the bottom of the spreadsheet.
A transform function is a mathematical or logical operation that may be applied to a
variable or group of variables in the dataset. The result of the function can replace the
values of an existing variable or can be stored as a new variable. You can apply more
than one transform sequentially to the same variable.
All transforms of a dataset are kept in one ordered list, which is displayed in the
Transform window. The transform list includes functions that you apply directly from
the Transform window, and transforms that are automatically generated by a number of
user actions in the spreadsheet and plot windows (see “System-Generated Transforms”
on page A-30). When transforms are being applied to a dataset, a working window
displays each transform as it is being calculated. In some circumstances this message
includes a Stop button that allows you to cancel the evaluation, but this feature is not
always available.



Any transform in the list can be modified or deleted. Any action that generates a
transform can be undone by deleting the transform.
Transforms are recorded in the form
Variable = Expression
such that Expression is a formula built from the transform functions, dataset
variables (the inputs), and numeric, date/time, or string constants; and Variable (the
output) is the name of one or more variables into which the results of the calculation
will be stored. If A is an input to B, then B is said to depend on A. All transforms require
exactly one output variable, unless specifically noted otherwise in Appendix A,
Transform Reference.
If the output variable already exists in the dataset, the transform expression must use
that same variable as an input; you cannot simply replace an existing variable’s values
with something unrelated. If any input to a transform is a computed variable, that
transform’s output cannot be a raw variable. Some transform functions require that the
output be the same variable as one of the inputs.
Variables with different date/time references cannot be combined in a single transform
(but if a date/time reference is “none”, it is exempt from this rule). You can change the
date/time reference of a raw or independent variable (by typing in its header cell or in
the Properties window), but not of a computed variable, which inherits its date/time
from its input variable.
In general, you can combine as many functions as you wish into a single, complex
transform expression. However, some transform functions cannot be combined with
certain others in a single expression, and will result in an error message similar to
“invalid mix of vector functions with scalar function”. If this message appears, apply
the functions sequentially instead of combined in a single transform.
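As a schematic sketch of this workaround (the names $vecFunc, $scalFunc, in, temp, and out are placeholders for illustration, not actual transform functions or dataset variables): instead of entering the combined expression

!out! = $vecFunc($scalFunc(!in!))

you could first apply

!temp! = $scalFunc(!in!)

and then apply

!out! = $vecFunc(!temp!)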

Transform Order
The internal architecture of spreadsheet software is typically either equation-based (also
called formula-based) or data flow.
In an equation-based system, each column (or, in some applications, each cell) has its
own formula or list of formulas. The system keeps track of which columns reference
which other columns, so that calculations are done in the correct order. If A already
exists in the spreadsheet, and you create a new column B=2*A, then any time you
change a value in A, B is automatically recalculated. No matter what you do to A, B always retains the relationship 2*A. The user never has to be concerned with the order
in which formulas are applied, or with maintaining the relationships among variables.
Many commercial spreadsheet programs use an equation-based architecture.
The simple-to-use and easy-to-understand equation-based architecture, however, is not
compatible with modifying or deleting any transforms that were done before a time
merge. These capabilities require a data flow architecture.
In a data flow system, the formulas function like a programming language. There is one
ordered list of formulas for the entire spreadsheet, and the formulas are calculated
strictly in the order in which they appear in the list. If you enter the formula B=2*A, it
only means “set B to 2 times the value that A has right now”; if you subsequently enter
any formula that changes A’s value, B is not affected. This architecture is far more
powerful than equation-based, but it requires considerable attention and understanding
from the user to ensure that formulas are entered in the correct order to produce the
desired results.
The transform calculator implements a restricted data flow architecture that emulates an
equation-based system. When you enter a new transform, instead of being placed at the
bottom of the list, it is automatically inserted in a position which retains the
relationships that you have already specified. If you enter the formula B=2*A, and
subsequently enter any formula that changes A’s value, that new formula is inserted in
the list above B=2*A, so B’s value is forced to change, in a manner that is compatible
with the data flow concept. For any particular transform, there may be a number of
different “legal” positions; if necessary, you can rearrange the order of the transform
list, but you cannot rearrange it so that it stops emulating an equation-based system.
This architecture supports modifying or deleting transforms that were done before a
time merge.

Transform Window
The Transform window is used to build, modify, rearrange, and delete transforms. It is
invoked when you click the Transform button in the spreadsheet or in the plot window, or the Show Transforms button in the Properties window (see “Variable Properties” on
page 5-43).

At the top of the window are text boxes for specifying the variable name and the
expression, with control buttons to clear the text boxes or enter the transform into the
transform list.
The calculator buttons, and lists of variables and transform functions and constants, are
used solely as a convenience in constructing a transform; their only action is to place
text into the Variable and Expression text boxes. If you prefer, you can type the text
from the keyboard, without using these buttons and lists.


The Multiple button invokes the common Variable Selection box. This is a convenience
for placing multiple variable names into a transform, automatically provided with the
required syntax (as described on page 7-7).



The complete transform function list contains a large number of functions and
constants. The option menu is a convenience for displaying a subset of the list.

A trigger button, labeled with the ellipsis (“...”), appears next to some of the more
complex functions such as the $TimeMerge transform. Click the trigger button to
display a dialog for specifying arguments.
The transform list is the list of all transforms that have been applied to the dataset (and
that match the mask, if any). In this display, the complete transform list is numbered
sequentially, but the index numbers are for display only and are not part of the transform
syntax.


The bidirectional arrow button is used to expand the transform list vertically to fill the window, and to return the window to its original state.

Masking is used to selectively display only a portion of the transform list. It is primarily
used to display all transforms that have been applied to a specified variable, but it can
also be used for any other lexical matching. Masking is applied from the Current Mask
text field; type in any text and press the Return key, and the transform list display will be
limited to transforms that contain that text (ignoring case). As a convenience for
displaying transforms that have been applied to a specified variable, click on that
variable in the Mask list; its name and the appropriate punctuation will be inserted in the
Current Mask text field.
If you select a spreadsheet column and then open the Transform window, or if you open
the Properties window, select a variable, and click Show Transforms to open the
Transform window, a mask is automatically applied to display transforms that have that
variable as output.
The Show All button removes the current mask and displays the complete transform list.

Syntax
The syntax of each transform function is given in Appendix A, Transform Reference;
this section discusses general syntactic rules. The syntax of each function is also
displayed in the positional help at the bottom of the window when you move the mouse
over the function’s name in the list. For example, placing the mouse over the average transform in the Functions and Constants list shows its syntax at the bottom of the
window.

The symbols $, !, and ; have special meanings in transforms.


• $ is a special key symbol provided in case there is ever a name conflict between
one of your variables and the name of a transform function or keyword; in that
situation, when the word appears without an initial $ it is taken to refer to your
variable, and when the word appears with an initial $ it is taken to refer to the
function or keyword. If there is no name conflict, the $ is allowed but not
necessary.
• ! must be used to surround variable names that contain characters other than
alphanumerics, or that consist solely of numerals. Except as noted below, !s are
optional for any other variable names.
• ; in a transform expression (but not inside a variable name surrounded with !s)
causes the remainder of the expression to be treated as a comment. This feature can
be used to attach initials or explanations to transforms; it can also be used to
suppress or “comment out” a transform without removing it from the transform list.


However, note that you cannot make a transform that consists solely of a comment,
without a valid expression. For example, to suppress a transform
!A! = $ln(!A!)
you could modify it to
!A! = !A!;$ln(!A!)
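To illustrate the exclamation-point rule above (the tag names are invented for illustration): a variable named FLOW RATE-1 must be written !FLOW RATE-1!, because its name contains spaces and a hyphen, and a variable named 1234 must be written !1234!, because its name consists solely of numerals; a variable named temp3 may be written either as temp3 or as !temp3!.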
When you create a new variable, its tag name can be any valid name that you wish; it
does not have to match or indicate any tag in your process. Valid names must not be
longer than 72 characters, and may not contain an exclamation point (!), a double
quote ("), a left curly brace ({), or a right curly brace (}).
Expressions can include extra blank spaces, tabs, and redundant parentheses. Names of
variables, functions, and constants are not case sensitive; sometimes we display and
document them in uppercase or mixed case, but this is solely for ease of recognition.
If the output of a transform is the same variable as one of its inputs, you can substitute
the symbol $self instead of repeating the name of the input.
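For example, using flow1 (a variable name that appears elsewhere in this guide) purely as an illustration, the following two transforms are equivalent:

!flow1! = $ln(!flow1!)
!flow1! = $ln($self)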
Numeric constants can be entered in decimal or scientific notation.
Date/time constants are specified as date followed by time, separated by at least one
space, surrounded by backslash (\) characters. The date and time must be in a form that
the parser can recognize; a number of these are documented in “Units” on page 4-12.
The recommended form is
\mm/dd/yy hh:mm:ss.ttt\
with seconds and thousandths optional. There are additional forms that can be
recognized, including
\3 days\
and
\4.1 hours\
Character string constants must be surrounded with either double quote (") or single
quote (') characters. There are two options for typing a character string constant with
embedded quotes; either type the embedded value twice, or surround the string with the
other quote style. For example, the string
ab"cd
could be typed as
"ab""cd"



or it could be typed as
'ab"cd'
and the string
ab'cd
could be typed as
'ab''cd'
or it could be typed as
"ab'cd"
Also note that if your keyboard features a back-quote character (`), also called left-
quote, that character is not interchangeable with the regular single quote or double
quote.

Transforms With Multiple Outputs


For almost all cases, the output of a transform is just one variable. However, there are a
few special-purpose transform functions (documented in Appendix A, Transform
Reference) that produce more than one output variable. For these few exceptional
functions, each output variable name must be surrounded with exclamation points (!),
and the list of output variables must be separated with commas and surrounded with
parentheses. Each such transform produces a fixed number of output variables; if you
want to suppress any of the outputs, omit its name but retain the comma as a place-
holder. For example, a transform whose output list names all five output variables calculates five new variables containing fft information on an input variable called signal; but a transform whose output list omits the second and third names (retaining the commas as place-holders) calculates only the frequency, magnitude, and phase information.
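As a schematic sketch of the multiple-output syntax (the function name $multiOut and the variable names are placeholders, not an actual transform; see Appendix A for the real multiple-output functions): a full output list has the form

(!out1!, !out2!, !out3!, !out4!, !out5!) = $multiOut(!signal!)

and suppressing the second and third outputs, with the commas retained as place-holders, has the form

(!out1!, , , !out4!, !out5!) = $multiOut(!signal!)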

Entering Multiple Transforms Simultaneously


There is a shortcut for applying similar transforms to a number of variables
simultaneously. It only works if the output of each transform is the same as one of its
input variables. If, for example, you want to take the log of five variables, A, B, C, D,
and E, you could type in five transforms,
!A!=$ln(!A!)
!B!=$ln(!B!)
!C!=$ln(!C!)
!D!=$ln(!D!)
!E!=$ln(!E!)
Alternatively, you could use $self,
!A!=$ln($self)
!B!=$ln($self)
!C!=$ln($self)
!D!=$ln($self)
!E!=$ln($self)



As a shortcut, you can enter these five transforms simultaneously, like this:

!A!,!B!,!C!,!D!,!E! = $ln($self)
This syntax generates five different transforms, each placed in the transform list in its
own correct position (not necessarily the order in which you typed the output variables).

When you use this syntax, each output variable must be surrounded with exclamation
points and the list must be separated with commas, but you must not surround the list
with parentheses. The syntax that uses parentheses is only for a single transform that has
multiple outputs.

Column Length of Independent Variables


When you create a new independent variable (it includes functions or constants, but no
variable names), the transform that appears in the list automatically includes a colon and
number of rows. For example (the variable name and the exact display form here are illustrative), if your dataset currently has 11885 rows, and you type

!New! = 1

it will appear in the transform list as

!New! = 1 : 11885

The default number of rows is the length of the dataset at the time the transform was created. If you wish to specify a different length, such as !New! = 1 : 5000, you can do so when you create the transform,

or you can modify the transform after it has been created. (If you modify the transform
and remove the length completely, the default length will automatically be put back in.)

Arithmetic Operators
+ unary positive or binary addition
- unary negative or binary subtraction
* multiplication
/ division
^ raise to the power (not necessarily a positive integer)
These operators function as on an ordinary calculator.
% percent; N% is replaced internally with N/100 (N must be a constant)
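For example (the output variable name is invented for illustration), the transform

!scaled! = !flow1! * 5%

is evaluated internally as !flow1! * 0.05, because 5% is replaced with 5/100.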



Relational Operators
< less than
<= less than or equal to
= equal to
<> not equal to
>= greater than or equal to
> greater than
These operators are used to form an expression that is evaluated as 0 if it is false and 1 if
it is true.
Example: if A is a variable in your dataset, and you create a new variable such that

!New! = !A! < 123

then the value of New will be 1 in any data row in which A is less than 123, and New
will be 0 in all other data rows.

Conditional Expression Constructors


$if
$and
$or inclusive or
$xor exclusive or (not available on a button, you have to type it in)
$not
(
)
,


A conditional expression is of the form


$if (Expression, trueValue, falseValue)
If the expression reduces to a value of false (zero), the result is falseValue; if the
expression reduces to a value of true (nonzero), the result is trueValue.
Example: if A and B are variables in your dataset, and you create a New variable such that

!New! = $if((!A!<123)$and(!B!<123), 0, 123)

then the value of New will be 0 in any data rows in which both A and B are less than
123, and New will be 123 when A or B or both reach 123.

Note: Unlike the spreadsheet Search function (described on page 5-25), a conditional expression of the form $if(!x!=22.1) does not
tolerate inexactness in numeric comparisons. $if(!x!=22.1) will fail
if x is actually 22.099 or 22.101. For a “close to equal” test, use the
$withinPct transform (described on page A-26), or construct an
explicit test such as $if((22.09<!x!)$and(!x!<22.11)).

Conditional expressions can be nested; for example

!New! = $if(!A!<10, 1, $if(!A!<100, 2, $if(!A!<1000, 3, 0)))

which is expressed verbally as “if A is less than 10, the result is 1; otherwise, if A is less
than 100, the result is 2; otherwise, if A is less than 1000, the result is 3; otherwise, the
result is 0”. See also the $findLE transform, which functions as a multiply-nested if-
less-than-or-equal.



See also the transforms $isBadStatus, $isBreak, and $isValid, described in
Appendix A, Transform Reference.

Miscellaneous Buttons
\ Special key symbol used to surround date/time constants.
: Used in time values, and to specify the column length of new independent
variables.
space Used at your convenience to insert blank spaces for easier reading.
" Used to surround character string constants; interchangeable with single
quote (').
; Comments out the remainder of the transform, as described in “Syntax” on
page 7-7.

Building a New Transform


The output variable name (or list of names) goes in the Variable text box, and the
transform expression goes in the Expression text box. You can type directly into these
fields if you prefer, or you can build the text by clicking on the variable names,
functions and constants, and calculator buttons. For example, if you first click on average in the Functions and Constants list and then click on flow1 in the Variables list, the corresponding terms appear in the Expression field.

When you first open this window, the window manager’s focus should be in the
Variable text box; if not, move the mouse outside the window and back in, and, if
necessary, click in the text box. After you fill in the Variable, if you press the Return
key, the focus will automatically move to the Expression. At any time you can click the
Clear button; both text boxes will be erased, and the focus will move back to the
Variable. After you fill in the Expression, if you press the Return key or click the Enter
button, the new transform is inserted in its correct position within the transform list, and
any affected transforms are recalculated. A message informs you which transform is
currently being calculated.
(If you enter a transform that is parsed but cannot be evaluated, the entire transform list
is reapplied before a message notifies you of the failure, to ensure that the dataset is
restored to a safe condition.) After the transform list has been calculated, the text of the
new transform remains in the Variable and Expression fields, because it is common to
make several similar transforms sequentially. If you don’t want to re-use any of this text,
simply click the Clear button to erase it.



Editing the Transform List
Important: Before editing the transform list, be sure to read
“Transform Order” on page 7-2.

The transform list Edit functions are in the Edit pull-down menu.

CAUTION: When you enter a new transform, it is guaranteed to be placed in a valid position in the list, provided you have never edited
the list. If you edit the list, you must do so carefully, or you can create
a condition where there is no valid position for a new transform. Such
a condition may not become evident until after you apply a number of
additional transforms that are all inserted correctly. When you edit the
list, you are responsible for making sure that the edited order is valid.

While you are editing the transform list, it is likely to take several steps to complete the
editing that you intend; therefore, recalculation is automatically suspended until you
select it. When you edit the transform list, the Cancel Changes and Update Dataset
buttons, which are normally grayed out, become enabled (and are highlighted in red).

Note: When these buttons are enabled, you can continue to edit the
transform list, but you cannot enter new transforms for automatic
insertion, or save the dataset, or perform any action in the
spreadsheet or Plot window that would generate a transform, until you
click Cancel Changes, Update Dataset (recalculate), or Done
(recalculate and close the window).

Exceptions to this are Delete All, which is automatically followed by recalculate, and
Copy, which does not change the list. Cancel Changes only cancels the transform list
editing that has not yet been applied; it does not cancel any changes that have already
been applied to the dataset.


Append
Append changes the Enter button to Append, and allows you to create one new
transform that is placed at the bottom of the transform list.

The Append operation bypasses the rules that would insert it only in a valid position.
The Clear button cancels the Append operation.

Insert Before
You must select (click on) a transform in the list before you can select Insert Before.

Insert Before changes the Enter button to Insert, and allows you to create one new
transform that is placed immediately before the selected transform, bypassing the rules
that would insert it only in a valid position. The Clear button cancels the Insert Before
operation.

Modify
You must select (click on) a transform in the list before you can select Modify. As a
shortcut, just double-click on the transform. Modify changes the Enter button to Modify
and the Clear button to Cancel, and copies the selected transform back up into the Variable and Expression text boxes, allowing you to change it. The Cancel button
cancels the Modify operation.

The result of a modified transform is placed in the transform list in the same position it
was in before it was modified, which may no longer be valid for it if you have changed
any variable names in it. (If you modify one transform, but you don’t change any of the
variable names that it uses for input or output, its position in the list will almost always
still be valid.)
When you modify a transform, you may not use the shortcut syntax described on page
7-11 to enter multiple transforms simultaneously.
Windows that are invoked from trigger buttons, as described on page 7-6, are used only
for creating new transforms and cannot be used to modify them.

Delete
You can select one or more transforms from the list and then Delete them. To select one
transform, click on it; to unselect it, control-click. To select a contiguous group, drag
through them, or click on the first one and shift-click on the last one. After selecting
transforms, select Delete from the menu.

Delete All
If the full transform list is displayed, Delete All deletes them all; if the transform list is
masked, Delete All deletes only the ones that are visible. You are asked to confirm, and the transforms are deleted and the transform list (if any of it remains) is immediately
recalculated.

Cut
You can select one or more transforms from the list and then Cut them. Cut removes
them from the list, but saves them in an internal Cut/Copy/Paste buffer so you can Paste
them later. The cut transforms remain in the buffer until you exit the Pavilion product or
until you cut or copy again. The contents of the buffer remain intact even if you load a
different dataset.

Copy
You can select one or more transforms from the list and then Copy them. Copy saves
them into the internal Cut/Copy/Paste buffer without removing them from the list. As
with Cut, the copied transforms remain in the buffer until you exit the Pavilion product
or until you cut or copy again. The contents of the buffer remain intact even if you load a different dataset.

Paste Before, Paste After


If the transform list is empty, you can select Paste Before or Paste After to paste the Cut/
Copy/Paste buffer into it. If the transform list is not empty, you must first select a
transform to reference the Before or After.



Breakpoints: Debugging the Transform List
To help debug a transform list, you can set breakpoints that pause processing so you can
examine the dataset. The debugging operations are:
Run with BreakPoints
Begin processing at the first transform, applying the transforms to the raw data until
reaching the first breakpoint.
Set BreakPoint
Define breakpoints at the currently-selected transforms. A breakpoint causes
processing to pause before the transform is processed.
Clear BreakPoint
Remove the breakpoints, if any, defined at the currently-selected transforms.
Clear All BreakPoints
Remove all breakpoints from the transform list.
When the transform calculator pauses at a breakpoint, a message box indicates the
breakpoint transform number. Click OK in the message box. While processing is paused
at a breakpoint, you can review the spreadsheet. The after-transforms table reflects the
state of the dataset before executing the transform at the current breakpoint.
To continue processing of the transform list from the current breakpoint, click Update
Dataset in the transform calculator. If you modify the transform list while execution is
paused at a breakpoint, clicking Update Dataset restarts processing from the first
transform. To halt processing when paused at a breakpoint, click Cancel Changes in the
transform calculator.
The breakpoints remain in the transform list until you remove them or until you unload
the current dataset. If you save the dataset, either as a normal offline dataset or as an
OnLine dataset, the breakpoints are not included in the saved version of the dataset.

Transform Errors and Panic Stop


Evaluation of the transform list will stop if a transform expression is invalid, or if a
transform is out of order, or if you click the panic Stop button in the working dialog that
displays transforms as they are calculated. When any of these situations occurs, the
Cancel Changes and Update Dataset buttons become enabled, and you must either
abandon the changes or repair the transform list. You can repair the list by modifying,
rearranging, or deleting transforms; see “Transform Order” on page 7-2. The basic rule of order is that no dataset variable can be used as an output after it has already been used
as an input to a different variable.

User-Defined Transforms
There is some limited capability for the transform calculator to access C functions that
you write, treating them as special-purpose, customized transform functions. Some
examples are provided, and appear in the User-Defined group of functions (but do not
appear in the All functions list). For more information, see Appendix E, User-Defined
Transforms, or contact your customer support representative.

Transform Control Statements


The transform calculator includes a number of restrictions that protect the integrity of a
dataset. However, operational requirements sometimes make it necessary to bypass
these restrictions. We have provided the capability to bypass the standard transform
restrictions, using the control statements listed below, which have been excerpted from
“Tcl,” a general-purpose, publicly-available scripting language. Tcl is described in John
K. Ousterhout, Tcl and the Tk Toolkit (Reading, MA: Addison-Wesley Publishing
Company, 1994).

CAUTION: When a dataset’s transform list contains any of these control statements, some of the safeguards that generally operate on
a dataset are disabled. It then becomes possible for you to delete or
rename a variable on which other variables depend, causing the
dataset to become unusable. To avoid these difficulties, (1) always use
exclamation points (!) around spreadsheet variables that are
referenced in Tcl commands, and (2) never use a deleted or renamed
variable in a file that is referenced from a source command. When
you use these Tcl commands, you are responsible for ensuring that
the transform list can still be evaluated.

General Rules
Control statements can reference or change variables in the dataset; they can also assign
and query values for Tcl variables, which are variables that do not appear in the dataset
and are known only to the set of control statements.



Dataset variables are referenced within exclamation points(!), as throughout the
transform calculator. However, when a value is assigned to a Tcl variable, the variable is
referenced by name only, without any special symbols; but when the value of a Tcl
variable is used, the variable is preceded by a dollar sign ($). Tcl variable names consist
of letters, numbers, and underscores.
All control statements must be surrounded with curly braces ({ }). Within a control
statement, the curly braces are used for grouping. If statements are nested such that two
curly braces occur consecutively, they must be separated by at least one blank space.
If portions of a control statement are enclosed within square brackets ([ ]), those
portions are forced to be evaluated before the remainder of the statement is evaluated.
Square brackets can be nested.
Date/time constants cannot be used with transform control statements. Date/time
constants have the following form:
\30 min\ or \4.5 hr\

Commands and Examples


In these examples, !flow! is a variable in the dataset, and x and y are Tcl variables.
All of these commands can be combined and nested. Tcl commands are entered in the
transform calculator as expressions (on the right side of the equal sign), without any
entry in the Variable field (on the left side of the equal sign).
set Assign a value to a Tcl variable.
= {set x 1}
assigns the value 1 to x;
= {set y $x}
assigns the value that is already in x to y.
get Query the value in a dataset variable for a specified row number.
= {get !flow! 2}
returns the value in row 2 of !flow!;
= {get !flow! 0}
returns the value in the last row of !flow!;
= {get !flow! -1}
returns the value in the next-to-last row of !flow!;
= {set x [get !flow! 0]}
first gets the value in the last row of !flow! and then assigns it to the Tcl
variable x.


apply Apply any transform.


= {apply !flow! = !flow! + 1}
applies the transform !flow!=!flow!+1
= {apply !flow! = !flow! + $x}
creates a transform containing the current value of the Tcl variable x, and
applies it;
= {apply !flow! = !flow! + [get !flow! 0]}
first gets the value in the last row of !flow!, then creates a transform using
that value, and then applies the transform.
Note that if the transform arguments include any string constants, the
transform calculator generally allows them to be enclosed in either double
quotes (") or single quotes ('); but when the transform is used like this, in a Tcl
apply statement, it can only be enclosed in single quotes ('), and not double
quotes (").
if The keyword if is followed by a condition and two expressions, each within
curly braces. If the condition is true, then the first expression is executed;
otherwise the second expression is executed.
= {if {$x > 1} {set y 10} {set y 8} }
if the value already in the Tcl variable x is greater than one, assign the value 10
to y, otherwise assign the value 8 to y.
= {if {[get !flow! 0] > 40}
{apply =RunModel('model1','prediction','_1',
$m_outpred)}
{apply =RunModel('model2','prediction','_1',
$m_outpred)} }
if the last value in !flow! is greater than 40, then run model1, otherwise run
model2.
source
The source command is followed by the full path name of a file that contains
one or more Tcl commands. That file is opened and the command(s) are read
from it and executed.
= {source /usr/example/Tcl_commands}

Important: In future versions of the product, we will continue to support the statements listed above. Other Tcl commands not listed
here may currently be functional, but we do not intend to support
them in the future.

8 Building a Model

• Model Types and Variable Types, page 8-2
• Using the Model Builder, page 8-4

This chapter explains how to build a model. As an alternative, you may want to use the
auto modeler wizard instead (select File > New > Model).
Building a model involves selecting a dataset to use as the basis for the model,
specifying which variables are inputs and which are outputs, identifying any time delay
relationships that exist between inputs and outputs, and selecting portions of the dataset
for training, testing, and validation sets.
The features for building models are in the Model window. You invoke the Model
window by selecting Tools > Model Builder.
Before building a model, you may need to time-merge the dataset that you will use for
training the model. For more information, see “When Time Merging Is Required” on
page 5-35. After building and saving a model, proceed to Chapter 9, Model Trainer.



Model Types and Variable Types
There are numerous types of models that you can use for analyzing and predicting
process behavior. The model types are prediction, FANN, custom, PCR (principal
components regression), and external (user-defined).

Prediction
In a prediction model, the variables are categorized as input or output. You can use a
prediction model for predicting outputs and for performing control and optimization
(also called predict inputs and predict setpoints).

FANN
FANN (focused-attention neural network) models are useful for analyzing processes
having intermediate measurements or indicators that are dependent on the manipulated
controls or external measured disturbance variables.
In a FANN model, the variables are categorized as independent input, dependent input,
or output.
Independent variables include controls that can be manipulated by the user (for
example, by downloading a new setpoint to a PID as long as the PID is not in manual
mode), as well as external measured disturbance variables input to the system, such as
feedstock, that come from outside the system, are measured and known, but are not
dependent on any of the other variables, and cannot be changed by the operator.

Note: “Independent,” in this context, refers to the process variable.


We also use the term “independent” to describe the origin or source
of some variables in a dataset, as described in “About Datasets” on
page 5-1. There is no relationship between these two sorts of
independence.

Dependent variables are intermediate measurements or states, such as a temperature that is dependent on the control settings, the external measured disturbance variable settings,
and other, unknown, or unmeasured influences. The output variables are affected by
both the Independent and the Dependent inputs.
For FANN models, it is essential to build a model of how the Dependent state variables
will change as a function of the Independent control and external measured disturbance
variables; only then will we have a correct representation of how the output will change when one of the controls is changed. These two stages of the model are identified as
Phase 1 and Phase 2.
A prediction model is structured like this: the Inputs feed the Model, which produces the Outputs.

A FANN model is structured in two stages: one stage models how the Dependent states change as a function of the Independent controls/externals and the initial states, producing the predicted states; the other stage takes the Independent variables together with the Dependent states and produces the Outputs.

A more detailed and complete description of FANN models is available in a technical report that you can obtain from your customer support representative.

Custom
A custom model allows advanced users to specify the number of hidden nodes for the
neural network. Hidden nodes are of interest only to advanced users. The custom model allows ordinary inputs and outputs as well as in/out variables as defined for sensor
validation models.

PCR
Principal components regression (PCR) is a standard multivariate-statistical variant of
ordinary least squares regression where the PCA (principal components analysis)
components of the original inputs are used as inputs to the linear regression. PCR is
useful for reducing the dimensionality of input data in cases where there is an
insufficient amount of data available relative to the number of variables. A complete
discussion of PCR is beyond the scope of this manual. The model builder feature for
PCR models is discussed in “Select PCR Components” on page 8-58.

External
An external model is any set of equations or algorithms that you have prepared and
compiled yourself. If you wish to develop an external model, contact a customer service
representative.

Using the Model Builder


When you select a model type, the Build Model window appears if no model is
currently loaded or if the loaded model is the same as the model type you selected. If the
currently loaded model is not the same as the model type you selected, however, the
model builder prompts you before proceeding.


You have these choices:


Edit Current Model
Bring up the model builder for the same type of model as the currently-loaded
model.
Build ModelType Model
Unload the current model before bringing up the model builder for the type of
model you selected.
Build Model using Current Model’s VariableTypes
Convert the current model to the selected model type before bringing up the model
builder. To convert a model from one type to another, the model builder changes
variable types in the currently-loaded model to their logical equivalents, where
possible, in the selected model type. For example, variables of type Input in a
prediction model become variables of type In/Out in a sensor validation model.
Examine the model after conversion to verify that variable types are correct.
The Build Model window includes controls and settings that are common to all model
types, as well as some controls that are unique to each model type. The initial
appearance of the Build Model window depends on the type of model that you selected,
and on whether a dataset was already loaded. If a dataset is loaded, its variables’ names
are listed horizontally across the Build Model window; if no dataset was loaded, this
area is empty. Following are the variations of the window when a dataset is loaded.
Build Model window for a prediction model:



Build Model window for a FANN model:

Build Model window for a custom model:


Build Model window for a PCR model:

Build Model window for an external model:

To build a model, follow these steps:


• Load a dataset (if you already loaded a dataset in the preprocessor, it is still loaded).
A model is always associated with a dataset. After the model has been built and
trained, you can run it with data from any dataset that includes variables with the
same names as the dataset from which it was built. Datasets are loaded from the
Load Dataset option in the File menu in the menu bar; for more information, see
“Loading a Dataset” on page 5-10.

• Map the dataset variables to the model, accounting for any time delays that exist in
the process. You can specify the time delays manually or calculate them
automatically. See “Mapping Dataset Variables to the Model” on page 8-8 and
“Time Delays” on page 8-10.
• Optionally, you can edit the connectivity between pairs of input and output
variables. Each input and output can be connected nonlinearly, or linearly, or not at
all. See “Editing Connectivity” on page 8-32.
• Select dataset patterns for testing, training, and validation, and, if desired, for
filtering: see “Setting Patterns” on page 8-48.
• Select any additional features; see “Additional Features” on page 8-58.
• Save the model as described in “Saving or Abandoning the Model” on page 8-62. If
you have specified a model but not saved it, and you click Done to close this
window, you will be asked whether you want to save the model. You cannot use a
model that has not been saved (unlike datasets).
• Check the model statistics and variable bounds; if necessary, change the pattern
selection and/or variable bounds, and save the model again. See “Model Statistics
Window” on page 8-64 and “Variable Bounds” on page 8-65.

File Menu
For all models, the menu bar contains the File pull-down menu. The operations Load
Dataset, Load Model, and Delete Model File are described in “Pull-Down Menus” on
page 9-2. The operations Save Model, Save Model As, and Clear Model are described
in “Saving or Abandoning the Model” on page 8-62.

Mapping Dataset Variables to the Model


You must specify how each variable in the dataset functions in the model. The types of
variables differ with the model type.
All of the variables in your dataset are listed horizontally across the screen, in a grid,
with a horizontal scrollbar if necessary. Date/Time variables, string variables, and
variables with constant values are listed but may not be used in a model.
As a convenience, you can click Select Variables to invoke the Select Model Builder
Variables window. This window lists all variables in the dataset, and allows you to
select which ones will appear in the Build Model window. Selecting a variable here does
not commit it to being in the model, but simply allows it to appear in the grid.

In the Build Model window, below each variable name is a grid cell that is used to set
the variable’s usage in the model. In the simple case of a process with no time delays,
the variable type definitions are set in the single row of cells immediately below the row
of variable tag names. To mark variables, first click on the variable type, and then click
in the cell under the variable names; you can mark a group of adjacent variables by
dragging through all the cells. To unmark a variable, first click the Clear button, and
then click in the cell under its tag name. The type name will appear in the cell. Any
variable left blank is not used in the model. If you make a mistake, you can change or
clear any variable type as often as necessary. For example, to select flow1 as an input
below, you would first click the Input button in the Prediction Model section on the left
and then click in the cell under flow1:

You must specify a model interval, which should be the same as the time interval
between rows in the dataset. If this interval can be determined automatically, it is
displayed; otherwise, a field is provided for you to fill in the value and select units from
an option menu.

Time Delays
To model a process having significant time delays between process variable
interactions, specify a time delay value for each variable. The time delay value adjusts
the temporal relationship of a process variable with respect to the other variables in the
process.

The terms tau and time delay are used interchangeably to mean a quantity of sampling
intervals.

(Diagram: a process time line in hours, with Input1 at 0, Input2 at +1, and Output1 at
+2, showing a 2-hour delay from Input1 to Output1 and a 1-hour delay from Input2 to
Output1.)

In the example process illustrated above, a change in Input1 will affect Output1
approximately two hours later, and a change in Input2 will affect Output1
approximately one hour later. The time delay values assigned to these three variables
depend on the data sampling interval and on the position of the zero time delay, the
point where you decide to run the model.

    Time Delay Values                Model Interval      Model Interval
                                     Is One Hour         Is 5 Minutes

    Run model immediately when       Input1:  -1         Input1:  -12
    Input2 is measured               Input2:   0         Input2:    0
                                     Output1: +1         Output1: +12

    Run model at the time            Input1:  -2         Input1:  -24
    Output1 is modeled               Input2:  -1         Input2:  -12
                                     Output1:  0         Output1:   0
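
The arithmetic behind this table can be stated in a few lines. A minimal Python sketch
(the helper function is illustrative, not part of Insights; it assumes the delay and the
model interval are expressed in the same time units):

    # A tau is a signed count of model intervals (negative = earlier,
    # positive = later), per the table above.
    def to_tau(delay_hours, interval_hours):
        return round(delay_hours / interval_hours)

    # Running the model immediately when Input2 is measured, 5-minute interval:
    to_tau(-1.0, 5 / 60)    # Input1  -> -12
    to_tau(0.0, 5 / 60)     # Input2  ->   0
    to_tau(1.0, 5 / 60)     # Output1 -> +12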

You can also use the same variable more than once in a model. Every process input or
output variable can have as many time delays as necessary, that is, the same variable
may be used with different time delays to serve as multiple inputs and/or outputs in a
model. However, steady-state models typically have only one time delay per variable.

For help selecting the best time delays (prediction-type models only), use the Time
Delay Identification tool (see “Finding Time Delays Automatically,” below).
For FANN models, you must identify all time delays yourself. To enter time delays
yourself, see “Specifying Time Delays Manually” on page 8-29. In a FANN model, you
must not specify an output variable with a time delay earlier than all inputs, or an input
variable that is later than all outputs.

Finding Time Delays Automatically


For models of type prediction and custom (including any prediction or custom model
used for predict inputs), you can use the model builder to identify time delays
automatically; otherwise, you enter time delays yourself based on your knowledge of
the process. The model builder’s time delay identification feature erases any time delays
already specified in the model; therefore, you should run the time delay identification
feature before entering any time delays yourself.
The time delay identification feature uses a nonlinear correlation algorithm, which gives
the feature the ability to detect and measure correlations not evident to common linear
correlation algorithms, such as the Correlation plot available through the data plotter.
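
The nonlinear correlation algorithm itself is proprietary and not documented here, but
the overall search is easy to picture. The following Python sketch scans a range of
candidate delays and keeps the strongest one; ordinary linear correlation from numpy
stands in for the nonlinear measure, and the function name is an assumption, not part of
the product:

    import numpy as np

    def best_delay(inp, out, min_d=0, max_d=10, step=1):
        # Shift the input earlier by d rows and score it against the output;
        # keep the delay with the strongest (absolute) correlation.
        best_d, best_score = None, -1.0
        for d in range(min_d, max_d + 1, step):
            x = inp[:len(inp) - d] if d > 0 else inp
            score = abs(np.corrcoef(x, out[d:])[0, 1])
            if score > best_score:
                best_d, best_score = d, score
        return best_d, best_score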
To invoke the time delay identification feature, click Find Time Delays in the model
builder.
If you have already specified any variables in the Build Model window and then you
invoke the Time Delay Identification window, you will be warned that the time delays
already specified will be erased.

Click No to return to the Model Builder. Click Yes to continue to the Time Delay
Identification window.

Selecting Model Variables
In the Time Delay Identification window, click the Select Inputs and Outputs button to
invoke the Select Model Variables window.

The left side of this window lists all variables in the dataset. The right side contains two
lists, one for model inputs and one for outputs. To copy variables from the dataset list
into either of the model variable lists, click or drag on the variable names, then click on
one of the right arrow buttons. An alternative is to click in one of the model variable
lists, which will mark its right arrow button with a light blue highlight; then
double-clicking a variable in the dataset list will copy it into that selected model
variable list.
When you finish selecting model inputs and outputs, click Done. A grid appears in the
Time Delay Identification window listing the input variables along the left side and the
outputs along the top.

There is one grid cell for each input/output pair. Each cell indicates the time delay for
the pair and the method for determining the time delay. You set the method by selecting
one or more cells and then selecting the method in the Edit Delay Settings section.

Click the vertical arrow button to reduce the grid to an abbreviated format.

Specifying the Time Delay Settings


The Edit Delay Settings selections appear when you select one or more cells.

For each cell, specify one of the following methods for identifying time delays:
Automatic
The time delay identification algorithm determines the best time delay based on
nonlinear correlations in the dataset. Specify the range to be searched as a Min and
Max Time Delay, and specify an Increment (a number of time delays) at which to
search.

For example, the range 0-10 and default increment 1 cause the time delay
identification facility to evaluate the nonlinear correlations at time delays 0, 1, 2,
and so on up to 10, after which it selects the time delay having the greatest
correlation. In the grid, automatic delays that you have not yet calculated are
represented by three asterisks (***).
Examine each input/output pair, and use your knowledge of the process as much as
possible to set the range. It is crucial that the range cover the entire area where you
believe the best time delay resides; on the other hand, you can minimize execution
time by not specifying an unnecessarily broad range.
Automatic time delay identification generates sensitivity statistics that you can use
later (in “Finding the Best Inputs” on page 8-21) in determining which inputs, if
any, are not necessary in the model. To take full advantage of the Find Best Inputs
feature later, you may wish to select automatic time delay identification for as many
inputs as you have time and resources to calculate now, or all inputs if possible.

Manual
You specify one or more time delays explicitly, without using automatic time delay
identification. Select either a single time delay or multiple, incremental delays. For
a single time delay, type in its value.

For incremental delays, type in a Min and Max Time Delay and an Increment.

For example, with Min of 1, Max of 8, and Increment of 2, the variable is defined
as an input at time delays 1, 3, 5, and 7.
None
Do not use automatic time delay identification for this input/output pair, nor specify
an explicit time delay. As a result, the pair has no time delay, and the model has no
connection between this input and output. If you determine later that you need to
create a connection between the input and output, do so by adding the input at the
desired time delay in the Model Builder and then use the Edit Connectivity facility
(described on page 8-32) to specify the required connections. In the Model Builder
window, non-fully-connected variables are marked with an asterisk (*).

Note: For use in this Time Delay Identification window, the time delay
values refer simply to the relative time between the two variables in
the selected grid cell, so they should be zero or positive; this differs
from the absolute time delay that will be assigned to each variable
within the model, in which all input variables are zero or negative. The
absolute time delays will be automatically assigned later, when you
create the model from the Model Output window, described on page
8-21.
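
Restated as a small sketch (the pairs and delays here are hypothetical): if an output is
placed at absolute delay 0, each input's absolute delay is just the negation of the
relative delay found in the grid:

    # Relative delays from the grid are zero or positive.
    relative = {("flow1", "ppm"): 3, ("temp1", "ppm"): 7}      # hypothetical
    output_position = 0                                        # output at delay 0
    # Inputs end up at zero or negative absolute delays.
    absolute = {pair: output_position - d for pair, d in relative.items()}
    # absolute == {("flow1", "ppm"): -3, ("temp1", "ppm"): -7}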

Finding the Time Delays


You can run the automatic delay finder for all grid cells that are marked Automatic, or
only for cells that are selected. A working dialog informs you of progress during
calculations, and you can Cancel at any time.
When you run the automatic delay finder, the results are posted in this window, but they
are not automatically used to generate a model. You have the opportunity to append the
results to the dataset for inspection.

Writing a Report
The Write Report button can be used to write a file showing, for every pair of input and
output variables, all of the time delays that were considered and the calculated nonlinear
correlation value for each. This button invokes the Enter Filename dialog, which is used
to enter a file Name and Directory for the report.

Adding to Dataset
To add the time delay identification results to your dataset for plotting and further
analysis, click Add to Dataset. A window appears so that you can specify a suffix for the
time delay variables.

For each input/output pair, the operation adds two columns:


invar_outvar_suffix
The nonlinear correlation values. Example: flow2_ppm_TDI
invar_outvar_suffix_d
The time delays for which results were calculated. Example: flow2_ppm_TDI_d
In the data plotter, you can create an XY plot where you plot the correlation values
against the time delays. Examine the plots to see if you need to change any of the
automatically-derived time delays.

Configuring the Model


After you have specified a method for determining the time delay for each cell (either
automatic, manual, or none) and after you have run the automatic delay finder to
calculate all automatic time delays, click Configure Model. The Model Output window
appears.

Finding the Best Inputs


In general, you can improve the accuracy of a model by removing low-sensitivity
inputs. You can determine which inputs have low sensitivities using the information
generated for automatic time delay identification.
To plot the sensitivities of input variables at their strongest time delays, click Find Best
Inputs in the Model Output window.

The Find Best Inputs window plots the inputs according to peak sensitivity, derived
during automatic time delay identification.

The plot shows only the inputs for which automatic time delay identification has been
performed. Inputs with manually-specified time delays or no time delays do not appear.
The sensitivities plotted are the peaks detected within the Min and Max time delay
boundaries you specified for automatic time delay identification.

Use the Info tool to display variable name, time delay, and peak sensitivity for points in
the plot.

The sensitivities shown in the Find Best Inputs plot may not match the sensitivities you
generate later using the Sensitivity vs. Percent model analysis tool (described on page
10-36) because different algorithms and methods are used in each case. The values
shown by Find Best Inputs, generated by automatic time delay identification, are
statistical estimates in the common sense, whereas the methods used for Sensitivity vs.
Percent analysis combine statistical techniques with the information available from a
fully-trained neural network model. The values shown in the Find Best Inputs plot are
not as accurate as those generated for Sensitivity vs. Percent analysis, but the Find Best
Inputs values are still sufficient to represent the relative sensitivities of the inputs.

Use the Include Left tool to indicate which inputs to include in the model.

To mark the lower-sensitivity inputs (those to the right of the marked area) for removal,
click Apply.

Red points mark the inputs that will be removed from the model when you click Done.

You can also select points for removal by specifying a value in the Number of Inputs
text box. When you enter text in the box, OK and Cancel buttons appear.

Click OK to apply the cut.


Cutting by specifying Number of Inputs may undo selections made using the Include
Left tool.
Using the Include Left tool or the Number of Inputs specification, mark the inputs, if
any, that you want removed from the model. Optionally, click Print Plot to send the plot
to the printer. When finished, click Done to remove the marked inputs from the model
and return to the Model Output window.

Specifying Output Time Delays


The Time Delay Identification window specified relative times between input and output
variables; in this window, you specify an absolute time delay in the model for every
output variable. The grid in this window operates just like the grid in the Build Model
window, described in "Specifying Time Delays Manually" on page 8-29, except that you
use it to mark the positions of output variables only. Click the vertical arrow button on
the left to expand the display.

By default all output variables are set to time delay zero, but you can change them to
any other time delay. It is also possible to specify more than one delay per output
variable, but it is not typically appropriate to do so.

Creating the Model
After making any needed changes in the output time delays and selection of inputs, click
Create Model. The model will be created with the output variables in the positions that
you specified, and selected inputs automatically placed in the correct positions relative
to the outputs. A message notifies you that the model has been created.

The model then appears in the Build Model window.

Note: This model is fully connected; that is, there is a relationship
between every input variable and every output variable, for each of the
time delays indicated. For the model illustrated above, the calculated
time delay for press1 and impurity is 8, the time delay for press1
and yield is 7, and the time delay for press1 and quality is 6; but
when the model is created, all three time delays of press1 are each
connected to all three output variables. If you want to break some of
these connections, see "Editing Connectivity" on page 8-32.

Specifying Time Delays Manually


You can identify and enter time delays yourself for all model types. For prediction and
custom models, you can also use the model builder's automatic time delay identification
feature. The time delay identification feature erases any existing time delays in the
model, so if you intend to use it, use it before entering any time delays manually. For
more information, see "Finding Time Delays Automatically" on page 8-12. For FANN
models, you must identify and enter all time delays yourself as described in this section.
If your process includes time delays, expand the display by clicking on the bidirectional
arrow button on the left side of the Build Model window.

The button changes appearance, and the grid expands to a matrix. You cannot expand
the grid before a dataset is loaded.
The names of the variables are displayed across the top. The relative time is listed along
the left side, displayed both as increments of the model interval, and as time units if
known. Boxes appear for you to specify the vertical scrolling limits and the increment
between displayed rows; you can specify these as time units or as rows.

The mechanics of selecting a variable type and then clicking or dragging in a grid cell
are exactly as described in “Mapping Dataset Variables to the Model” on page 8-8; the
only new consideration is making sure that you are in the grid row corresponding to the
correct relative time. The positional help at the bottom of the window displays your
current tau within the grid.
In a prediction model, you must not specify an output variable with a time delay earlier
than all inputs, or an input variable that is later than all outputs.
When you have finished mapping the variables, you can click on the arrow button to
shrink the grid back to its original size and restore the button’s appearance.

Variables in the Model


A variable in a dataset can be mapped into a model with as many different time delay
specifications as necessary. Every combination of variable name and time delay value is
counted in the model as another variable; remember this when checking model statistics
that display the number of variables of each type. If any variable in the model has a
nonzero time delay, the Analysis windows that display variable names usually include
the associated time delays (taus).

The following sample display shows a model that uses the antifoam variable twice,
once at time delay -8 and again at time delay -4.

Each instance of the antifoam variable would be considered a different variable and
counted separately in the Model Statistics window and any other display that listed
model variables.

Editing Connectivity
After you have selected variables and specified time delays, you may optionally click
the Edit Connectivity button to invoke the Edit Model Connections window (this feature
is available only for prediction and custom models).

This window displays the model variables, with their time delays, in a grid. Each grid
cell marks the intersection of one input and one output. Click one of the top three
buttons on the left to set connections throughout the entire model. If you want to specify
both linear and nonlinear connections in the same model, click Custom Connectivity.

You can select an individual grid cell and specify the type of relationship between that
one input and output. You can specify a nonlinear connection, a linear connection, or no
connection at all. All three selections can be used within the same model. Like the
model builder grid, to use this window you first select the type of connection, and then
click to set a grid cell to that type. Changes that you make in this window are applied to
the model immediately, with no opportunity to cancel. If any cell in a row or column is
empty (connectivity “None”), the affected input and output are marked with asterisks in
the Build Model grid.

Internal Parameters
The Internal Parameter Count fields appear only for custom models having nonlinear
connections or nonlinear outputs.

The Internal Parameter Count indicates the number of hidden units (nodes) used in the
network. The quality of the model is ordinarily not very sensitive to this number;
however, in difficult problems, the overall accuracy of the model can be sensitive to the
number of hidden units.
Given a good testing set, it is impossible to use too many hidden units because the
number of effective parameters in the network always starts at the number of input
variables and grows with training time; over-fitting (too many effective parameters) is
avoided by stopping training when performance on the testing set begins to degrade.
Ultimately, the model reflects the parameters and other settings that produced the
greatest accuracy in the testing set. Given a good testing set, the only drawback to using
a large number of hidden units is that larger networks are slower to train.
The Internal Parameter Count can be set to the default value or to a custom number that
you specify. Before you begin to define a model, there is no default, so it is displayed as
three asterisks, as shown above. The default is calculated and displayed only when you
click the Update Default button. To use a number different from the default, click and
type in the box labeled Use Custom.

To get the best training results for the model, several different models can be built with
differing numbers of internal parameters; the best model is the one that achieves the best
testing error (see “Measures of Training Performance” on page 9-11).
Setting the Internal Parameter Count to zero is equivalent to having all linear
connections and a nonlinear output.
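
The same experiment can be pictured with any neural-network library. A sketch using
scikit-learn, offered only as an analogy (Insights' internal parameters are not
scikit-learn hidden units, and the function here is hypothetical):

    from sklearn.neural_network import MLPRegressor

    def best_hidden_count(X_train, y_train, X_test, y_test, counts=(2, 5, 10, 20)):
        # Train otherwise-identical networks with differing hidden-node counts
        # and keep the count that scores best on the testing set.
        scores = {}
        for n in counts:
            net = MLPRegressor(hidden_layer_sizes=(n,), max_iter=2000,
                               random_state=0).fit(X_train, y_train)
            scores[n] = net.score(X_test, y_test)    # R^2 on the test set
        return max(scores, key=scores.get)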

Gain Constraints and Extrapolation Training


To obtain correct model gains both inside and outside of the operational range reflected
in your training data, use the model builder’s Gain Constraints feature.
A model’s input-output gain values, or sensitivities, represent the influence that an input
has on an output. A gain is expressed as the ratio of the change in output to the change
in input (in raw, or engineering, units).
Using your physical knowledge of the process, you may know at least roughly what
some or all the input-output gains actually are in the process you are modeling. A
simplistic example may be that the temperature of a container of water (in the absence
of complicating factors) is a strictly-increasing function of the amount of heat applied to
it; therefore, the temperature-to-heater gain is positive.
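
Because a gain is just the ratio of output change to input change in engineering units, it
can be estimated numerically from any trained model. A minimal sketch, assuming a
hypothetical predict(inputs) callable and a baseline input vector (neither is an Insights
API):

    def estimated_gain(predict, baseline, i, delta=1e-3):
        # Finite-difference gain of the output with respect to input i,
        # in raw (engineering) units.
        lo, hi = list(baseline), list(baseline)
        lo[i] -= delta / 2
        hi[i] += delta / 2
        return (predict(hi) - predict(lo)) / delta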
Depending on the quality and nature of your training data, the gains derived during
training may not reflect the true process gains. Incorrect gains will not cause inaccurate
predictions (predict outputs) as long as you avoid running the model outside of the
operational ranges reflected in the training data. In models used for optimization and
control (predict inputs), however, correct gains are crucial for good model results.
Steady-state models used with Process Perfecter dynamic control models also require
correct gains.
The gain constraints feature allows you to force the model to take on whatever input-
output gain values you specify; therefore, it is very important that the gain constraints
you specify correctly represent the process. The model trainer fits the data as well as
possible within the specified gain constraints.

CAUTION: Do not use the gain constraints feature unless you know
the true process gains and can specify them adequately. Enforcing
incorrect gains can result in an unreliable model.

The Gain Constraints feature allows you to improve your model in two ways:
Enforce accurate gains for optimization
When using a steady-state model for optimization (predict inputs) or as part of a
Process Perfecter dynamic control model, it is crucial that the steady-state model’s
gains be correct. You can specify minimum and maximum gain boundaries and
other parameters that the trainer uses in forcing the model to take on the proper
gains.
Perform extrapolation training
Ordinarily, a model functions reliably only in the operating range represented in its
training data. Using the Gain Constraints feature, however, you can enforce gains
across the entire input space, thus dramatically improving the accuracy of the
model outside operating regions covered by the training data. Extrapolation
training occurs by default when you specify gain constraints.
To display the gains in a model, use Output vs. Percent model analysis (described on
page 10-22) or Sensitivity vs. Percent model analysis (described on page 10-36). If the
gains differ from what you know to be correct, you may want to utilize the Gain
Constraints feature and retrain the model.
To evaluate your model’s accuracy when extrapolating, or generating predictions
outside the range of its training data, use the What Ifs tool (see Chapter 11, What Ifs).

Specifying Gain Constraints
With inputs and outputs specified in the Build Model window, click Gain Constraints.
The Gain Constraints window appears.

The display shows gain-constrained training parameters for each input/output pair. The
defaults shown indicate that there are no constraints on the gains. For this default
condition, the model trainer derives the gains entirely from the training data.

To change the gain constraint parameters, select one or more input/output pairs. With
pairs selected, the window displays the parameters.

Min Gain and Max Gain


Specifying Min Gain and/or Max Gain constraint values for an input-output pair tells
the model trainer to make the gain for that input-output pair fall within the Min Gain/
Max Gain range for every training and extrapolation pattern (see “Specifying
Extrapolation Training Parameters” on page 8-39).
The Min Gain and Max Gain are expressed as the ratio of the change in output to the
change in input (in raw, or engineering, units). The Min Gain default is negative infinity.
The Max Gain default is positive infinity.
Typical datasets contain at least some patterns that behave as outliers with respect to
enforcing gain constraints. Therefore, for any gain bound that you want strictly enforced
for all patterns, it is advisable to specify a somewhat tighter bound than you actually
desire. For example, if you want to constrain a gain to be strictly positive for all
patterns, then instead of specifying MinGain = 0.0 and MaxGain = +Infinity, specify a
small positive value for MinGain (0.05, for example). For a better understanding of this
topic, see “Tolerance Parameters,” below.

Tolerance Parameters
Tolerance parameters allow you to adjust the strength with which your gain constraints
are imposed on the model, in the case that the gain constraints are inconsistent with the
data (see “Causes of Inaccurate Gains” on page 8-46).
Gain Tolerance
The gain constraints that you specify affect model training for any training pattern
(or extrapolation pattern as described in “Specifying Extrapolation Training
Parameters” on page 8-39) for which any input-output gain falls outside its
MinGain/MaxGain range. However, the Gain Tolerance parameter allows you to
adjust the strength of that effect. The Gain Tolerance parameter is a percentage of
the MinGain/MaxGain range, with a default of 20% (or 0.20), which defines an
extension of the MinGain/MaxGain range by the specified percentage
(0.20*|MinGain| is subtracted from MinGain, and 0.20*|MaxGain| is added to
MaxGain). If, during training, a gain for a pattern falls outside its extended range,
the strength of the gain constraint is increased.
Example 1: MinGain = 0.1, MaxGain = 0.20, and Gain Tolerance = 20% (default).
Then the extended MinGain/MaxGain range = [0.08, 0.24].
Example 2: MinGain = 0.0, MaxGain = 1.0, and Gain Tolerance = 20% (default).
Then the extended MinGain/MaxGain range = [0.0, 1.2]. (A code sketch of this
extension appears after the Pattern Tolerance description, below.)
Pattern Tolerance
This parameter is similar to Gain Tolerance, except that it applies to the percentage
of training patterns. Again, the gain constraints that you specify affect model
training for any pattern having any gain that falls outside its MinGain/MaxGain
range. The Pattern Tolerance parameter allows you to adjust the strength of that
effect. If, during training, more than the Pattern Tolerance percentage of training
patterns have any gains that violate their extended ranges, the strength of the gain
constraint is increased. Extrapolation patterns (see “Specifying Extrapolation
Training Parameters” on page 8-39) are not included in this test.
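
A sketch of the bookkeeping behind these two tolerances (the function names are
illustrative, and the per-pattern gains would come from the trainer, which Insights does
not expose this way):

    def extended_range(min_gain, max_gain, gain_tol=0.20):
        # Widen [min_gain, max_gain] by the Gain Tolerance percentage.
        return (min_gain - gain_tol * abs(min_gain),
                max_gain + gain_tol * abs(max_gain))

    def too_many_violations(gains, min_gain, max_gain,
                            gain_tol=0.20, pattern_tol=0.20):
        # True if more than the Pattern Tolerance fraction of per-pattern
        # gains fall outside the extended range.
        lo, hi = extended_range(min_gain, max_gain, gain_tol)
        bad = sum(1 for g in gains if g < lo or g > hi)
        return bad / len(gains) > pattern_tol

    extended_range(0.1, 0.20)    # Example 1 above: (0.08, 0.24)
    extended_range(0.0, 1.0)     # Example 2 above: (0.0, 1.2)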

Note: The Gain Constraint Monitor (see "Gain Constraint Monitor" on
page 9-23) displays, for each input/output variable pair, both during
and after training, the percentage of training patterns that are violating
their specified gain constraints. This information is used in
determining the best epoch (see "Best Epoch" on page 8-45).

When Your Gain Constraints Strongly Contradict The Data


In typical practice, the defaults of 20% for Gain Tolerance and Pattern Tolerance result
in nice response surfaces with the average gains within the specified range. However, in
cases where the constraints strongly contradict the data (see “Causes of Inaccurate
Gains” on page 8-46), the default tolerances can sometimes yield average gains that are
not completely within the specified limits. This is often perfectly acceptable, and may
be judged by viewing the “Output vs. %” plot under the Analyze function.
To enforce the gain constraints more strongly, you may lower both the Gain Tolerance
and the Pattern Tolerance parameters. However, if the gain constraints contradict the
data strongly enough, lowering these parameters to 10% (or 0.10) or below (depending
on the Min Gain and Max Gain settings) can sometimes make it difficult or impossible
for the training algorithm to converge. (See “Trouble Shooting Gain Constraints” on
page 8-47).
In some cases, it may be impossible for the MinGain/MaxGain range and the Pattern
Tolerance parameter to be simultaneously satisfied. Depending on what gain constraints
you specify, the MinGain/MaxGain range and the achieved Pattern Fraction may be
forced to trade-off with one another. For example, if the data dictates a nonlinear
response (non-constant gains), and you specify a tight MinGain/MaxGain range, then
after training, the average gains may lie within the specified limits, but at the expense of
violating the specified Pattern Tolerance parameter (the percentage of patterns in
violation may be viewed with the Gain Constraint Monitor, found in the Train window).
Adjusting the MinGain/MaxGain range and the Pattern Tolerance parameter allows you
to trade off between these goals when they are in conflict, based on which goal is more
important in your particular model.
The average gains (see “Sensitivity vs. Percent” on page 10-36) and the percentage of
patterns in violation of constraints as reported by the Gain Constraint Monitor (see
“Gain Constraint Monitor” on page 9-23) should be considered together in assessing the
quality of the model.

Specifying Extrapolation Training Parameters


In extrapolation training, patterns are generated at random throughout the entire input
space, and the gain constraints are applied to these extrapolation patterns. Gain
constraints can be applied to the random patterns because the gain range is known and
no target output value is needed.

Note: Extrapolation training does not occur unless you have also
specified gain constraints. Gain constraints may be specified for the
purpose of triggering extrapolation training even if the model gains
are already correct. To specify gain constraints, return to “Gain
Constraints and Extrapolation Training” on page 8-34.

The Gain Constraints window allows you to specify how many random extrapolation
patterns to generate for training.

With the parameters set to generate one extrapolation pattern per ten rows of training
data, roughly 10% of the data is randomly generated throughout the input space
indicated by the variable bounds (described on page 8-65). The extrapolation patterns
are not included when calculating the training and testing errors. They serve only to
expand the response surface across areas having no data. To turn off extrapolation
training, set Number of Extrapolation Patterns to 0.
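
A hedged sketch of how such patterns might be generated (the actual generator is
internal to the trainer; bounds is a hypothetical mapping of each input variable to its
variable bounds):

    import random

    def extrapolation_patterns(bounds, n_training_rows, rows_per_pattern=10):
        # One random pattern per rows_per_pattern training rows, drawn
        # uniformly within the variable bounds. No target output is needed;
        # only the gain constraints are applied to these patterns.
        n = n_training_rows // rows_per_pattern
        return [{var: random.uniform(lo, hi) for var, (lo, hi) in bounds.items()}
                for _ in range(n)]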

Saving the Model


After specifying the desired parameters for gain-constrained training and extrapolation
training, click Done in the Gain Constraints window.

Infeasible Gain Constraints Warning Message


When you save a model for which you have specified gain constraints, and if you are
using the Nonlinear Output Function option for any output variable (see “Editing
Connectivity” on page 8-32), a dialog box like the following example may warn you
that your gain constraints are infeasible.

This warning means that the gain constraints you specified cannot be achieved across
the entire range of the indicated input variable(s).
If the Output Function of the output variable is Nonlinear, the maximum gain that the
model can achieve across the entire range of the input variable is the dataset range of the
output variable divided by the dataset range of the input variable:
(Max_output - Min_output) / (Max_input - Min_input)
where Max_output is the variable bound for the output variable, and so forth.
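
A hedged sketch of this feasibility test (function and argument names are hypothetical;
the sign symmetry follows from the discussion of minimum and maximum slopes below):

    def gain_constraints_infeasible(min_gain, max_gain,
                                    out_min, out_max, in_min, in_max):
        # Largest |slope| a saturating (nonlinear-output) model can hold
        # across the entire input range, per the formula above.
        g = (out_max - out_min) / (in_max - in_min)
        # Infeasible if the requested range lies beyond the achievable slopes.
        return min_gain > g or max_gain < -g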
If the Output Function of the output variable is Linear, however, the model can
accommodate any (finite and reasonable) gain constraint range. These ideas are
illustrated in Figure 1 and Figure 2, below, and may be understood as follows.
The Nonlinear Output Function saturates at the minimum and maximum of the
historical dataset. Therefore, the maximum gain that can exist across the entire range of
the input variable is equal to the slope of the line running from the lower left corner to
the upper right corner of the input-output response plot. Similarly, the minimum slope
that can exist across the entire range of the input variable is equal to the slope of the line
running from the lower right corner to the upper left corner. (The min and max slopes
have the same magnitude.) In Figure 1, if a Min gain constraint is specified as 2.0, the
trainer fits the data as well as possible while attempting to minimize the number of data
points that violate the gain constraints. As can be seen in Figure 1, the result is that the
gain achievable by the model has to be less than the specified Min Gain of 2.0 for at
least some portions of the range of the input variable.
The warning message indicates your options under these conditions. Do one of the
following:
• change the Nonlinear Output Function to a Linear Output Function for the affected
outputs (this is the recommended action),
• change the output variable bounds as specified in the warning message (see
“Variable Bounds and Gain Constraints” on page 8-44),
• change the gain constraints, or
• continue (and expect an input-output response that only partially reflects the
desired gain constraints, as shown in Figure 1).
Any of these options may be appropriate depending on your particular modeling
situation. Assuming that you want the specified gain constraints to be enforced across
the entire input variable range, the simplest solution is perhaps to use a linear instead of
nonlinear output function.
If you choose to change the variable bounds (whether or not the warning message
occurred during gain constraint specification), the same warning message will occur at
that time, if appropriate.

Figure 1: Nonlinear Output Function (plot of the data, the data fit, the Min Gain
specified, and the max gain possible across the entire input axis)

In the case of a Linear Output Function, the output does not saturate and may take on
any value, so the response may cross the input-output response plot at any slope (gain).
This is illustrated in Figure 2.

Figure 2: Linear Output Function (plot of the data and a data fit at the Min Gain
specified)

Variable Bounds and Gain Constraints


Upon saving the model, the warning for infeasible gain constraints may appear if you
changed the variable bounds after defining the gain constraints. See “Infeasible Gain
Constraints Warning Message” on page 8-40.

Training the Model


After you have made any other required modifications to your model, save it and train it
normally.
Gain constraints add requirements to (or decrease the “degrees of freedom” of) the
training task; therefore, it is usual for more epochs to be required for training to
complete as compared to training the same model without gain constraints. (See
“Trouble Shooting Gain Constraints” on page 8-47).

Monitoring Training
In the Training window, there is a Gain Constraints button in the Monitors area. This
window displays the Percent of Training Patterns Violating the Gain Constraints for
each input/output pair. This information allows you to see to what extent your specified
gain constraints are being met by the trained model. If, for any input/output pair, the
percentage is close to or less than the Pattern Tolerance parameter you specified for that
pair, the model is conforming to your gain constraint specifications (as well as fitting
the data as well as possible within those constraints). This information is also used to
determine the best epoch (see below). If the percent of patterns in violation is
significantly exceeding your Pattern Tolerance specification, see “Trouble Shooting
Gain Constraints” on page 8-47.

Best Epoch
In gain constrained training, Autostop (as well as the Auto Learning Rate Adjustment)
will be fixed in the Off position. In determining the best epoch in gain-constrained
training, constraint satisfaction must be considered in addition to the test relative error.
The gain constraints have priority; therefore, the best epoch is determined as follows: if
in no epoch the gain constraints have yet been satisfied (within tolerance; see above),
the epoch with the lowest test relative error is the best epoch; otherwise, the epoch
having the lowest test relative error among all of the epochs satisfying the gain
constraints is the best epoch.
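
That selection rule can be written out directly. A sketch, assuming each epoch record
carries its test relative error and a flag for whether the gain constraints were satisfied
within tolerance (the record layout is hypothetical):

    def best_epoch(epochs):
        # epochs: list of dicts with "test_error" and "satisfied" keys.
        satisfied = [e for e in epochs if e["satisfied"]]
        pool = satisfied if satisfied else epochs
        return min(pool, key=lambda e: e["test_error"])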

How Gain Constraints Affect the Training and Testing Errors


If you have trained the same model twice, once with gain constraints and once without,
you can compare training/testing errors to understand what characteristics of the
training data may have caused inaccurate gains in the model.

Causes of Inaccurate Gains
Inaccurate gains typically result from one or more of the following causes:
Correlated inputs
Correlated inputs are any pair of input variables whose time series are correlated to
some significant degree; consequently, the variables have complementary effects
on model predictions, making it difficult for the model to distinguish the individual
gain of each variable. For a given pair of correlated variables, any number of
complementary sets of gains could be derived, and although the gains may not
reflect the true gains of the variables, the model can still produce accurate
predictions. Use the Correlation plot (described on page 6-22) to examine the
degree of correlation between pairs of input variables. Typical datasets have at least
a few correlated inputs and may have several clusters of them.
Closed-loop control data
When parts of the process are under closed-loop control, the controller (automatic
or human) essentially represents an inverse model of the process; consequently,
data gathered while under closed-loop control can result in gains having the wrong
sign in the model.
Bad data
Inaccurate, corrupted, or incomplete data can result in incorrect gains.

Interpreting Differences in Training/Testing Error


The causes of inaccurate gains divide into two scenarios with respect to the change in
training/testing errors (Relative Error or R2) when gain constraints are added to a
model. Datasets often contain characteristics from both categories.
No increase in training/testing error
Results from correlated inputs. Correlated inputs contain common information;
therefore, many different sets of gains may be consistent with the data and thus
yield equivalent training/testing errors (data-fitting quality). Roughly speaking, you
can use gain constraints in this case to break the correlations between the correlated
inputs by specifying the gains that you know are correct from your knowledge of
the process. In this case, many different sets of gains are consistent with the data,
and you are simply specifying which of these sets of gains is correct. Therefore,
because you are specifying gain constraints that are consistent with the data,
training/testing error (data-fitting accuracy) does not increase. In other words, the
gain constraints impose the correct gains on the model without causing the data-
fitting quality to suffer.

Increase in training/testing error


Results from data gathered from a process under closed-loop control, or inaccurate,
corrupted, or incomplete data. In these cases, you typically will be specifying gain
constraints that are not consistent with the data. When you apply gain constraints in
this case, you are purposefully contradicting the data with information that you
know more correctly represents the process than does the data (with respect to the
gains in question). The model trainer fits the data as well as possible while
satisfying the specified constraints. Because gain constraints in this case are
inconsistent with the data, the training error can naturally be expected to be worse
than in a model trained without such gain constraints. This is perfectly acceptable
and desirable, provided that the gain constraints you specify are in fact true of the
process.
If specifying gain constraints results in no significant increase in training/testing error,
you may conclude that the gain constraints simply broke input correlations. A dramatic
increase in training/testing error likely indicates that the specified constraints severely
contradict the training data. A moderate increase in the training/testing error likely
indicates that a combination of correlation breaking and data-contradiction occurred. No
increase at all indicates pure correlation breaking.

Trouble Shooting Gain Constraints


Infeasible Gain Constraints Warning Message
This warning message may appear after you have specified gain constraints or after you
have adjusted the variable bounds. This message appears only if you use nonlinear
output function(s) in your model. The message lists all input/output pairs for which the
gain constraints you specified cannot be achieved by the model across the entire range
of the input variable. For more information, see “Infeasible Gain Constraints Warning
Message” on page 8-40.

Output response is more linear than expected


Training for additional epochs may be required. In the case that the gain constraints
conflict at all with the data, the gain constraints remove degrees of freedom from the
training process, which make it likely that more epochs will be required to fully train the
model than if the gain constraints were not included.
A second possibility is to create a Custom Model and increase the number of internal
parameters from the default number (see “Editing Connectivity” on page 8-32). This
approach increases the internal degrees of freedom available to the model, which helps
offset the degrees of freedom removed by the gain constraints.

Model is diverging (relative error or Gain Constraints violations are increasing):
Training may diverge (continually worsen) either in data-fitting error (relative error) or
in gain satisfaction (the Percentage of Training Patterns Violating Gain Constraints
shown by the Gain Constraint Monitor during training). Almost certainly this problem
occurs because one or both of the Tolerance Parameters (see “Tolerance Parameters” on
page 8-38) is too low (too “tight”). The most likely cause is that, for one or more input/
output pairs, the Gain Tolerance is set too low (probably 10% or less). This low setting
alone will not necessarily cause training to diverge; "widening" your Min and Max gain
constraint settings for the input/output variables in question may also alleviate the
problem. In rare cases, increasing the Pattern Tolerance (for the input/output pairs in
question) might alleviate the problem, especially if it is set near 0%, or if
you have a very small dataset.

Model is not sufficiently respecting the Gain Constraints


If the Tolerance Parameter settings are at the default values of 20%, decrease the Gain
Tolerance and/or Pattern Tolerance settings for the input/output pairs that are violating
the constraints beyond your satisfaction. Try decreasing the Gain Tolerance by 5% to
start with. If the Gains are still not respected in your model to your satisfaction, you may
have to tighten the Gain Tolerance even more. It may also help to lower the Pattern
Tolerance setting.

Setting Patterns
When a dataset is used in a model, each row of the dataset must be a single sample of all
the input and output variables for the process at a given point in time. All the variables
in a dataset row must be sampled at the same time. (See “Time Merge” on page 5-35 for
information on how to achieve this.) A data pattern, also called a model pattern, is a
complete set of inputs and outputs as presented to the model. If there are time delays
among the variables, a data pattern will contain values gathered from different rows in
the dataset.

Testing, Training, Validation


In order to train the model, the dataset must be apportioned into training patterns and
testing patterns. During training, the model alternates between a training (learning) pass
and a testing pass. In the training pass, the training patterns are run through the model,
and during the testing pass, the test patterns are run through the model. Each training/
testing cycle is called an epoch. The training pass of the epoch modifies the internal
structure of the model based on the error between the actual output value (from the
dataset) and the predicted output (from the model). The testing pass of the epoch does
not alter the model, but compares its output to the target output of the test patterns to
provide a means of determining the success of the training.
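
Schematically, the epoch cycle looks like the following sketch (the three callables are
hypothetical stand-ins for the trainer's internals, not an Insights API):

    def train_epochs(fit_one_pass, evaluate, snapshot, max_epochs=500):
        # Each epoch: a training pass that modifies the model, then a
        # testing pass that only measures error. The state at the epoch
        # with the lowest testing error is remembered.
        best_err, best_state = float("inf"), None
        for _ in range(max_epochs):
            fit_one_pass()               # training pass: updates the model
            err = evaluate()             # testing pass: does not alter it
            if err < best_err:
                best_err, best_state = err, snapshot()
        return best_state, best_err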
In addition to the required sets of testing and training patterns, it is advisable to reserve
a portion of the dataset for validation. The model is never exposed to the validation
patterns while it is being trained. The final measure of a model’s performance (after
training is completed) should be how closely it calculates the output values for the
validation set (see “Predicted vs. Actual” on page 10-2). The validation set should
normally be a block of data at the end of the dataset (this is the default behavior). It is
not necessary that the validation set statistics match the statistics of the rest of the
dataset (unlike the test set, which should closely match the training set); however, you
must make sure that the validation set does not contain input or output points that are
outside the range of the remainder of the data. It is possible to specify a validation set
from patterns distributed throughout the dataset, instead of using a block of data at the
end, but this is not generally recommended. The validation set is typically optional (for
required use, see “Stiff” on page 9-13); when used, it should ordinarily be roughly
between 5% and 10% of the data.
The training performance of a model can depend on which portions of the dataset are
chosen as the test set. After you save a model, the Model Statistics window (described
on page 8-64) is invoked. You should examine the mean and standard deviation of the
testing and training sets; they should differ by no more than about 5%. The test set
should be at least 10% of the total of training and testing data; between 10% and 20% is
generally sufficient.
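
The mean/standard-deviation comparison can be automated outside the product. A sketch
with numpy (illustrative; the 5% figure is the rule of thumb from the text, measured here
relative to the training set):

    import numpy as np

    def balanced(train_col, test_col, tol=0.05):
        # True if the test set's mean and standard deviation are within
        # about 5% of the training set's values.
        m_train, m_test = np.mean(train_col), np.mean(test_col)
        s_train, s_test = np.std(train_col), np.std(test_col)
        return (abs(m_test - m_train) <= tol * abs(m_train) and
                abs(s_test - s_train) <= tol * s_train)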

Filtering
The model will learn the relationships in the training data. If the training data contains
redundant (or nearly redundant) values, the model will pay more attention to patterns
with more occurrences than to patterns with fewer occurrences. Usually, this is exactly
what you want: the model should pay more attention to normal values than to outliers.

(Diagram: Learning All Data. In the n-dimensional process state space, the model is
most accurate where the data is most dense.)

However, if your process has more than one operational mode, but spends more time in
one mode than the others, it will cause “mode dominance”, in which the model does not
learn all operational modes equally well. If this problem occurs, you can filter out the
redundant data (after the dataset is divided into testing, training, and validation
patterns).

(Diagram: Learning Filtered Data. In the n-dimensional process state space, the model
learns all regions of the process.)

Important: Do not apply a filter unnecessarily. Filtering can cause the
model to learn undesired outliers as well as good data.

Set Patterns Window
If you do not specify otherwise, the test and validation sets will be selected by default,
and there will be no filtering. If you want to examine or change the defaults, click the
Set Patterns button, which invokes the Set Patterns window.

This window is used to specify testing/training/validation sets and filtering. When you
load a dataset and a model, the dataset is first divided into the testing/training/validation
sets, and then the filtering is applied, and finally, any unusable rows are discarded (for
example, at the beginning and end of the dataset, around breaks, and rows containing
invalid points). Because the unusable rows are not discarded until last, the number of
patterns in each group is not necessarily what you initially specified. The Model
Statistics window, which comes up when you save or load a model, contains an option
to write the final pattern selection into a column in the dataset; for more information,
see “Model Statistics Window” on page 8-64.
The test set can be selected at fixed intervals in the dataset, or at random, or according to
values specified in a variable in the dataset. With interval or random selection of the test
set, the validation set is a block of rows; for a customized validation set, you must select
Use Variable for the test set.

Interval
An interval test set is selected by dividing the dataset into blocks. The validation set is
selected first, by specifying a Start row and a number of Samples (or 0 to suppress the
validation set). The number of samples can be greater than will actually fit in the dataset
(for your convenience in specifying “from here to the end” without having to calculate
an exact number of rows).
After the validation set has been reserved, the testing set is specified from the remainder
of the dataset as blocks of data at regular intervals, and all other rows are used for
training. Start is the row number of the first testing row. Testing Samples is the number
of rows in each block of test rows; Training Samples is the number of rows to skip
between blocks of test rows (that is, the number of data rows to use for training). At
least one block of test rows is always included in the test set; Iterations is the number of
repetitions in addition to the first block. Iterations can be greater than will actually fit in
the dataset, but at least one specified block must fit.
The default values can be used in a simple system without time delays. If your process
includes time delays, the number of testing samples in a block should, as a rule of
thumb, be at least twice the size of the span of time delays in the process (that is, the
largest positive time delay minus the smallest negative time delay); this is to ensure that
the training and testing sets are relatively independent.

Example: Default Test/Train/Validation Sets for a dataset with 200 rows

    rows   1-15     Test
    rows  16-100    Train
    rows 101-115    Test
    rows 116-190    Train
    rows 191-200    Validate
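
The layout above can be reproduced in a few lines of code. A sketch (parameter defaults
chosen to match this 200-row example; the Iterations cap is ignored for simplicity, and
the flag codes follow the Use Variable convention described later: 0 = train, 1 = test,
2 = validation):

    def interval_flags(n_rows=200, test_start=1, test_samples=15,
                       train_samples=85, val_start=191, val_samples=10):
        flags = [0] * n_rows                        # default everything to train
        for r in range(val_start - 1, min(n_rows, val_start - 1 + val_samples)):
            flags[r] = 2                            # validation block at the end
        r = test_start - 1
        while r < val_start - 1:                    # alternate test/train blocks
            for t in range(r, min(r + test_samples, val_start - 1)):
                flags[t] = 1
            r += test_samples + train_samples
        return flags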

Random
With Random test set selection, you specify the Validation Set exactly as for an Interval
test set; then you select a percentage of the remainder of the data to be used for testing.

The default is 15%. If the dataset has a relatively small number of rows, it may be
difficult with Random selection to achieve a good statistical balance between the testing
and training sets; if this occurs, you should select Use Variable instead of Random.

Use Variable
The Use Variable option for test and validation sets is a customized selection used for
unusual situations in which neither Interval nor Random selection is suitable; instead,
you create a variable in the dataset with code values that indicate how each row is to be
used. The variable can be created in the preprocessor using the transform calculator, or
it can be a Pattern Flag variable that was generated automatically and appended to the
dataset (see "Model Statistics Window" on page 8-64). The acceptable values in this
variable are

    -1    don't use this row
     0    train
     1    test
     2    validation

No other values are allowed in the variable. You can create the variable by assigning
numeric values to it, or by using the transform constants $m_ignore, $m_train,
$m_test, or $m_valid (described on page A-17).
You must specify which variable to use.

You can enter the variable’s name, or you can click Select Variable to invoke the Test
Set Variable Selection window. This window lists all variables in the dataset; however,
only numeric variables can be selected. When you click OK, this window is closed and
the selected variable’s name is entered in the Select Test Set window.
For example, suppose that you want to divide a dataset into groups of 500 rows, such
that in each group the first 100 rows are used for testing, the next 300 rows for training,
and the final 100 rows for validation. You can create variables called PatternFlags
and showPatternFlags, and assign their values using these transforms:
!PatternFlags! = $findle($row $mod 500, 0, $m_valid, 100, $m_test, 400, $m_train, $m_valid)
!showPatternFlags! = $ttv(!PatternFlags!)

In the spreadsheet, these two variables would appear like this:

In this example, PatternFlags contains the pattern information and would be
selected for Use Variable; showPatternFlags is a convenience for display
purposes only, to interpret the numeric values contained in PatternFlags.

Train Filter
The filter selection can be FIFO (first in, first out), Nearest Neighbor, or None. A FIFO
filter concentrates on the most recent data; a Nearest Neighbor filter groups data by
similar values in n-dimensional space. The filtering parameters are Cells per Input and
Patterns per Cell. Cells per Input is the number of divisions to make along each axis in
n-dimensional pattern space; Patterns per Cell is the maximum number of similar
patterns allowed before the filter begins to discard patterns. These parameters are
calculated by default, and should not be changed without first consulting your customer
support representative. For more information, see “Filtering” on page 8-49.
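
As a rough sketch of the idea behind a Nearest Neighbor filter (assumed logic; the
product's actual filter and its default parameter calculation are not reproduced here):

    import numpy as np

    def nearest_neighbor_filter(data, cells_per_input, patterns_per_cell):
        # Divide each input axis into cells_per_input cells, then keep at most
        # patterns_per_cell rows per cell, discarding further similar patterns.
        lo = data.min(axis=0)
        span = np.ptp(data, axis=0)
        span[span == 0] = 1.0
        cells = np.minimum(((data - lo) / span * cells_per_input).astype(int),
                           cells_per_input - 1)
        counts, keep = {}, []
        for i, cell in enumerate(map(tuple, cells)):
            counts[cell] = counts.get(cell, 0) + 1
            if counts[cell] <= patterns_per_cell:
                keep.append(i)
        return data[keep]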
If you specify a filter, every time you load the model, you will be asked whether to Use
or Disable the filter.



When the model has just been created and has not yet been trained, or if you have
partially trained it and intend to train it further, you should select Use Filter; after the
model has been trained and you are ready to use it for analysis, you may select Use or
Disable.
Be aware that if you disable a filter, and then re-save the model while the filter is
disabled, the filter is permanently removed and you can never re-apply it without
re-training the model.
A message notifies you that the filter is being constructed.

Additional Features
In addition to the common features described above, each model type has additional
features, which are accessed by controls in the Build Model window.

Select PCR Components


For principal components regression (PCR) models, you can select how many
components to include in the model. For a brief description of PCR, see “PCR” on
page 8-4. A complete discussion of PCR is beyond the scope of this manual.
Building a PCR model allows you to reduce the dimensionality of the input state space
and orthogonalize input axes, which helps you to decorrelate variables. It also allows
you to generate eigenvalues and scores that you can analyze.
In the Build Model window, mark at least two variables as inputs. Then click Select
PCR Components. Use the Select Principal Components window to calculate and graph
components and choose how many of them to include in the model. You can specify the
Number of Components to calculate, or by not specifying a number, you can calculate


all components. Click Calculate PCR to calculate and graph the components according
to cumulative variance.

Optionally, you can select unnormalized scaling instead of normalized, the default. In
normalized scaling, calculations are based on values divided by the standard deviation
of the data; in unnormalized scaling, the values are not divided by the standard
deviation. If you change the scaling, you need to click Calculate PCR again. Both
options yield components and scores based on mean-centered inputs (x - <x>, where
<x> is the average of x).
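
A hedged sketch of the calculation just described (illustrative only; the product's
numerics may differ):

    import numpy as np

    def pcr_components(X, normalized=True):
        Xc = X - X.mean(axis=0)           # mean-centered inputs (x - <x>)
        if normalized:
            Xc = Xc / Xc.std(axis=0)      # normalized scaling divides by std deviation
        eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
        order = np.argsort(eigvals)[::-1]             # strongest components first
        eigvals, eigvecs = eigvals[order], eigvecs[:, order]
        cum_var = np.cumsum(eigvals) / eigvals.sum()  # cumulative variance, as plotted
        scores = Xc @ eigvecs                         # data scores
        return eigvals, eigvecs, cum_var, scores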



Using the Info tool, you can click on a point in the graph to display the cumulative
variance for the corresponding component.

Information appears in the positional help field at the bottom of the window.


Use the Include Left tool to select the number of components for inclusion in the model.

You can also select PCR components by entering a number in the Number of
Components text box. Implicitly, specifying a number of components indicates the
strongest (leftmost) components.



To review the eigenvalues and eigenvectors of the components, click Report. The
operation first prompts you for a file name so that it can write the report to disk. Then it
displays the report.

This report is the same as the one produced when you select Write PCA Report in a
PCA plot of a dataset (on page 6-18).
After you have selected the number of components that you want included in the model,
click Done to return to the Build Model window.
When you save the model, the model builder calculates the PCR statistics, effectively
training the model. The PCR model is immediately ready for use. It does not require the
epoch-by-epoch training that other model types require.
To review PCR scores for the dataset, bring up the Predicted vs. Actual model analysis
tool (on page 10-2), specifying that internal parameters should be appended to the
current dataset. For PCR models, the columns added to the dataset are the scores.
Other tools for principal components analysis (PCA) are the $pca transform (on page
A-20) and the PCA plot (on page 6-18).

Saving or Abandoning the Model


When you have specified a model, save it. Select Save Model or Save Model As from
the File pull-down menu. To erase the current model without saving it, select Clear
Model from the File menu.
A warning appears if you try to exit the Build Model window without having saved the
current dataset or model. If you build a model on a dataset that is not saved, and then
exit without saving the dataset, you may never be able to load the model again if you
cannot recreate the dataset variables used in the model. It is always risky to build a
model on an unsaved dataset.


CAUTION: If you entered the Build Model window only to inspect a model that you
have already built and trained, do not save it or the training files will be deleted
and you will have to train it again.

Save Model and Save Model As


When you save a model, if it does not already have a name, Save Model is identical to
Save Model As. Save Model As saves the current model, and stores its name in a data
dictionary. The Save Model dialog provides text boxes for you to enter a directory and
model name, data dictionary name, and an optional comment.
The model name can consist of letters, numbers, and underscores only; no other
characters are allowed. If you enter a name that is already recorded in the data
dictionary, you will be asked whether to overwrite the existing model.
You can use any valid data dictionary name. It defaults to the data dictionary that you
have most recently specified, or to your default file.
The comment is stored in the data dictionary with the model name, and is displayed
whenever the Select Model dialog is opened.
When you save a model, it is immediately re-loaded.


Loading the Model


When the model is loaded:
• The patterns selection is applied to the current dataset.
• If you applied a filter to the model when you created it, you will be asked whether
to use or disable the filter.



For a model that has a filter but has not yet been trained, you should select Use
Filter; at other times, you may select Use or Disable.
• The Model Statistics window is invoked.

Model Statistics Window


The Model Statistics window is automatically invoked whenever you save or load a
model (it can also be invoked from the common File menu described on page 9-2).

This window displays the number of patterns in each of the training, testing, and
validation sets, and the total of these three sets; for a number of reasons, this total often
will not equal the number of rows in the dataset (if any variable has unusable values, if
the model has any time delays, if a training filter is being used, and so forth). The mean
and standard deviation of the output variables in each group are also displayed.


If you are interested in seeing how the dataset is grouped into training, testing,
validation, and ignored patterns, you can click the Append Pattern Column button. This
creates a new column in the dataset containing values that indicate each pattern’s use
(-1=ignore, 0=train, 1=test, 2=validation). A message notifies you when the new
column is complete. You can go back to the preprocessor and view the new column; you
can even modify its values, then return to the model builder and modify the pattern set
specification to use this variable. The $ttv transform can be used to create an
additional variable displaying the interpretation of the codes.
When you have just built a new model, it is necessary to check:
• the number of each type of variable, to be sure that the model has been specified
correctly;
• the number of testing, training, and validation patterns: if a validation set is used, it
should be between 5% and 10% of the size of the dataset, and the test set should be
10%–20% or more of the total of testing and training patterns;
• the means and standard deviations of the testing and training patterns: if they are
different by more than about 5%, the test set is not representative and should be
changed;
• if a validation set is used, it must not contain points that are outside the range of the
remainder of the data (but it is not necessary, or generally expected, that the
validation set mean and standard deviation should match the testing or training
sets).
If any of these requirements are not met, you should return to the Build Model window,
change the test set specification, and save the model again (if the test set is specified
from a variable in the dataset, you must go to the preprocessor and change the variable’s
values, and then load the model, but the model does not have to be saved again). After
these requirements are met, we recommend that you click Variable Bounds to invoke the
Variable Bounds window.
After a model has been trained, the Model Statistics window also displays some of the
statistics of the dataset on which the model was trained (which is not necessarily the
dataset currently loaded). These values are provided for information only; there is no
need for you to check them.

Variable Bounds
When a model is trained or run, the internal calculations are not made using the actual
dataset values of the variables; instead, each variable is linearly scaled to an internal
working range. This is done invisibly to the user, and all data displayed on the screen is



unscaled back to its original units. The standard scaling maps the variable’s range in the
dataset to the scaling range.
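
Conceptually, the mapping is a linear rescale; the sketch below assumes an internal
working range of [0, 1] purely for illustration (the actual internal range is not
documented here):

    def to_internal(x, lower, upper):
        # Map the variable's scaling bounds [lower, upper] onto the working range.
        return (x - lower) / (upper - lower)

    def to_engineering_units(y, lower, upper):
        # Inverse map used when values are displayed on the screen.
        return lower + y * (upper - lower)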
The standard scaling is appropriate for most models, and it should never be changed
unnecessarily. However, it is sometimes necessary to run a model online using values
slightly outside the bounds of the data on which it was trained; in this case you must
make certain that the scaling range for each variable, especially each output variable,
encompasses the entire data range on which the model will be run.
The standard scaling maps to the training data range; extrapolation of up to about
±20% beyond this range is acceptable when running the model. Do not attempt to
extrapolate far beyond the data region.

CAUTION: If you change the variable bounds, Output vs. % and Sensitivity vs. %
analysis will be incorrect outside the training data range.

You can also improve a model’s extrapolation accuracy with extrapolation training, but
only if you are also specifying gain constraints. For more information, see “Gain
Constraints and Extrapolation Training” on page 8-34.

Setting Variable Bounds


You can view the variable bounds for any model, but you can change them only for
models that have not yet been trained. If you want to change the variable bounds for a
model that already has been trained, you must copy it into an untrained model first,
change the variable bounds on the new model, and then train the new model.
If you specify gain constraints that violate variable bounds, a message may appear when
you save or train the model stating that you may need to change variable bounds. Before
changing variable bounds for this reason, see “Infeasible Gain Constraints Warning
Message” on page 8-40.
To view or change the variable scaling bounds for a model, invoke the Variable Bounds
window, either by selecting Show Variable Bounds from the common File pull-down


menu or by clicking Variable Bounds in the Model Statistics window that appears when
you save or load a model.

This window is used to display the scaling bounds of the variables in a model, and
change them if the model has not yet been trained. It consists of a menu bar, an Edit
area, a Defaults area, and a scrolled list showing all of the variables in the model, their
lower and upper scaling bounds, and their minimum and maximum in the current
dataset.
If the model has already been trained, you can view the values, but the File menu and
Edit area are disabled, and the Defaults area can be viewed but not modified.
By default, every bound is set to a specified percentage beyond the values in the dataset;
this percentage can be either a percentage of the variable’s values, or of its range. You
can also set any bound to ignore the default and instead use a value that you type in.



Bounds that are set to use the default are marked with an asterisk (*). If you type in a
different default percentage, every bound that is set to use the default is changed.
To change the default percentage, click in the text box and type in a new value. The
percentage must be a nonnegative number, and either % of Value or % of Range must
be selected. For example, if you set the default lower bound to 5% of Range and the
upper bound to 10%, and your dataset has a variable that ranges from 0. to 100., then its
bounds will be -5. and 110.:


If you set the same bounds as a % of Value, however, the bounds will be 0. to 110:
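
A small sketch (not the product's code) reproducing the two examples above:

    def default_bounds(vmin, vmax, lower_pct, upper_pct, mode):
        if mode == "range":     # percentage of the variable's range
            span = vmax - vmin
            return (vmin - lower_pct / 100.0 * span,
                    vmax + upper_pct / 100.0 * span)
        else:                   # "value": percentage of the value itself
            return (vmin - lower_pct / 100.0 * abs(vmin),
                    vmax + upper_pct / 100.0 * abs(vmax))

    print(default_bounds(0.0, 100.0, 5, 10, "range"))  # (-5.0, 110.0)
    print(default_bounds(0.0, 100.0, 5, 10, "value"))  # (0.0, 110.0)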

To edit the bounds for any variable individually, click on it and its current bounds and
default flags will appear in the Edit area. If you type a number, that number will be used
instead of the default percentage, and the Use Default toggle button will be turned off; if
you turn on the Use Default toggle button, the corresponding number will be filled in.
You cannot set a bound to be inside the range of the variable’s values in the dataset; if
the dataset contains extreme values that you want the model to ignore, you must use the
preprocessor to change or remove them.
The Save Report button invokes a prompt box asking for a file name and directory path.

When you type in a file name and click OK, the variable bounds information is saved to
an ASCII file. Use the editor (Chapter 3, File Editor) to view or print this file.



Saving Variable Bounds
If the model has not been trained, the File menu in the menu bar contains two entries,
Save Model and Save Model As.
Save Model saves the displayed variable bounds values into the current model; Save
Model As invokes the common Save Model dialog box so you can save the changed
model under a new name.
The Cancel button closes this window without saving any new changes that you have
made since the last time you saved the model. The Done button saves the model under
its current name and closes the window.

Modifying an Existing Model


If a model has been built but not yet trained, you can modify it as necessary. This is
useful if there is a mistake to be corrected, or if you are resuming work on a model that
had to be saved before you had time to finish defining it.
If a model has already been trained, and you make any changes to its definition, the
training is no longer valid. If you save the changed model without giving it a new name,
the old model is deleted and a new model is created. If you want to make a new model
that is similar to an existing one, without deleting the original, be sure to use the Copy
Model option in the File menu to save the changed model under a new name.



9  Model Trainer

• Pull-Down Menus, page 9-2
• Loading the Dataset and Model, page 9-11
• Measures of Training Performance, page 9-11
• Training Types, page 9-12
• Training Parameters, page 9-14
• Starting and Stopping Training, page 9-17
• Training Monitors, page 9-18
• Best Epoch Adjustment, page 9-23

This chapter explains how to train a model. If you built your model using the auto
modeler wizard, you do not need to train the model because the auto modeler has
already done so.
The model trainer uses historical data that has been made into a dataset. You should
have already preprocessed the training data according to the instructions in Chapter 5,
Spreadsheet, and the following chapters. You should have already built and saved the
model according to the instructions in Chapter 8, Building a Model.



Invoke the Train window by selecting Tools > Model Trainer. It is used to control and
monitor the training process.

To train a model, you first load a dataset and the model, select a training type (described
on page 9-12), optionally set training parameters (described on page 9-14), and start the
training. Numeric information is always presented during training, and you can also
view two types of plots (see “Training Monitors” on page 9-18). Training continues
until you stop it, or until the training parameters that you set cause it to stop
automatically.

Pull-Down Menus
The model trainer provides the following pull-down menus:
File Pull-Down Menu page 9-3
Epoch Pull-Down Menu page 9-10
Phase Pull-Down Menu page 9-10


File Pull-Down Menu


The File pull-down menu in the Model window provides the following operations:
Load Dataset page 9-3
Load Model page 9-4
Copy Model page 9-6
Save Model As page 9-7
Rename Model Variables page 9-7
Delete Model File page 9-9
Show Model Statistics page 9-10
Show Variable Bounds page 9-10
Where the File pull-down menu appears in other windows, such as the Train window
and the Build Model window, it may provide only some of the operations documented
in this section.

Load Dataset
When you click Load Dataset, the Select Dataset dialog is invoked. This is the common
Select Dataset dialog that was used in the preprocessor. It functions as described in
“Loading a Dataset” on page 5-10.



Load Model
The Load Model option on the File menu is used to load a model that you have already
built and saved. You cannot load a saved model until a dataset has been loaded. When
you click Load Model, the Select Model dialog appears.

This dialog lists all models that have been recorded in your current data dictionary, with
their associated comments, if any. It functions exactly like the Select Dataset dialog,
described in “Loading a Dataset” on page 5-10. After you select a model, it is loaded.
When the model is loaded:
• The Patterns selection is applied to the current dataset.
• If you applied a filter to the model (when you created the model, described in
“Filtering” on page 8-49), you will be asked (every time you load the model)
whether to Use or Disable the filter. For a model that has a filter but has not yet


been trained, you should select Use Filter; at other times you may choose Use or
Disable.

• The Model Statistics window, described on page 8-64, appears.


You can load a model with the dataset on which it was created, or with any other dataset
that includes variables with the same names. If you try to load a model with no dataset
loaded, or with a dataset that does not have all of the variables needed by the model, an
error message is displayed and the model is not loaded.
Process analysis using a model may not give good results if the model is run on data that
is more than 5% outside the range of the data to which the model was scaled. When you
load a model, the range of each variable in the currently loaded dataset is compared to
the variable bounds, and a warning message is displayed when any variable is out of
range.

This is only a warning, and does not prevent your using the model if desired. Only one
message is produced, even if more than one variable is out of range. You can view the
variable bounds to determine which variables are out of range.



You can also use the Select Model window to delete a model or remove it from the data
dictionary. To do so, select the model name and then click Delete. A dialog appears,
prompting you to select one of the following options:
Delete
Delete the model files as well as the model’s entry in the data dictionary,
Just Remove from DD
Delete the model’s entry in the data dictionary without deleting the model files, or
Cancel
Cancel the operation without deleting the model files or changing the data
dictionary.

Copy Model
Use Copy Model to save the current model under a new name, without affecting the
model files having the old name. It invokes a version of the Save Model window.


This window is similar to the Save Model window described on page 8-63, but it also
has a Save Training toggle button. If the model that is being copied has already been
trained, this toggle button controls whether the copy retains the training or is copied
without training; if the model that is being copied has not already been trained, this
toggle button is grayed out.
You can save the model in the format used by the current version of the software or in
the format used by a previous version. This feature is useful if you intend to use the
model on a computer that you have not updated with the latest version of the software.
A given version of the software can use models saved in an earlier format, but the
software may not be able to use a model saved in a later format.

Save Model As
This operation is the same as Copy Model, above.

Rename Model Variables


In the Rename Model Variables window, rename a variable by selecting it in the
variable list and then entering a name in the New Variable Name field. Then click the
OK button to enter the change.



After renaming variables, enter a new name for the changed model. Optionally, you can
also enter a new directory for the model. If you rename variables, you must also rename
the model.
Keep in mind that you will not be able to load the new model unless the
currently-loaded dataset has matching variable names.


Delete Model File


Delete Model File is used to delete a model’s files from your disk and remove its name
from your data dictionary. It invokes the Delete Model dialog.

This dialog displays the name of the current data dictionary, and a scrolled list of all
models recorded in it (sorted by directory). If you want to use a different data dictionary,
click in the box, type in its name, and press the Return key, and models in the new data
dictionary will be listed. To select a model from the list, click on it and then click the
Delete button (or, as a shortcut, just double-click on the model). You can type in the full
path name of a model instead. When you click the Delete button, you are asked whether
to delete the model, just remove it from the data dictionary, or Cancel.

Delete permanently removes the files from your disk; Just Remove From DD removes
the data dictionary entry without affecting the disk files. The data dictionary can have



multiple entries for the same model if you use the Save Model As command and specify
the same file under both absolute and relative pathnames, or if you specify it with
differing case under operating systems that are not case-sensitive. However, only one
copy of the model exists on disk, so deleting the duplicate entry with Delete will delete
the files. Just Remove From DD removes the duplicate entry without deleting the files.

Show Model Statistics


Show Model Statistics invokes the Model Statistics window, which is also invoked
automatically when you load a model. It is described on page 8-64.

Show Variable Bounds


Show Variable Bounds invokes the Variable Bounds window, which can also be invoked
from the Model Statistics window. It is described on page 8-65.

Epoch Pull-Down Menu


The operation in this menu, Replace Best with Current, forces the model to use the
training from the latest epoch rather than from the epoch having the lowest test error.
For more information, see “Best Epoch Adjustment” on page 9-23.

Phase Pull-Down Menu


This menu appears only for FANN models. A FANN model consists of two elements,
identified as Phase 1 and Phase 2, as illustrated on page 8-2. The two phases must be
trained separately. When a FANN model is loaded, the menu bar contains a Phase menu,
which is used to select which phase is to be trained. If you try to leave the Train window
without training both phases, you will be warned. You cannot use a FANN model for
analysis until both of its phases have been trained.


Loading the Dataset and Model


If you did not already have a dataset and model loaded before you entered this window,
you must load them before you can begin training. Load Dataset and Load Model are
entries in the common File menu, described on page 9-2.

Measures of Training Performance


“Setting Patterns” on page 8-48 described how to set aside portions of the dataset for
training, testing, and validation. The validation data is not used during the training
process.
When training begins, the discrepancy between the model’s predicted values and the
dataset’s actual values is relatively high. At this point, the model cannot predict outputs
very well from the data. As training progresses, the model continues to modify its
internal structure to better represent the relationships between input variables and output
variables. As the model becomes better attuned to these relationships, both testing and
training relative error decrease, indicating that the model can more accurately predict
output values. (For equations, see Appendix B, Error Measures). Note that relative error
is not the commonly used statistical measure, R2. Unlike R2, a lower relative error
indicates better performance. If relative error is less than or equal to 1., the relationship
between the two measures is

R^2 = 1 - (\text{relative error})^2

If relative error is greater than 1., R2 is undefined and is displayed as zero (0.).
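
In code form, the displayed relationship is simply:

    def displayed_r2(rel_err):
        # R2 is undefined (and shown as 0.) when relative error exceeds 1.
        return 1.0 - rel_err ** 2 if rel_err <= 1.0 else 0.0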
After an extended exposure to the training patterns, the model may begin to “memorize”
or overtrain on the training data. This reduces the total relative error for the training
data, but also reduces the model’s ability to predict accurately from new data. When this
happens, the total relative error for the test data increases. Typically, there is an optimal
training point, at which the test relative error reaches its lowest value. The trainer
detects this optimal training point and saves the best internal parameters for the model.
Later, when you use the model for prediction and optimization, the model uses the
optimal internal parameters.
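
The trainer's bookkeeping can be pictured with a sketch like this (assumed logic;
train_one_epoch and test_relative_error are hypothetical helpers):

    best_err, best_params = float("inf"), None
    for epoch in range(1, final_epoch + 1):
        train_one_epoch(model)               # one pass over the training patterns
        err = test_relative_error(model)     # composite error on the test set
        if err < best_err:                   # a new optimal training point
            best_err, best_params = err, model.save_parameters()
    model.restore_parameters(best_params)    # the best internal parameters are kept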



While the model is being trained, the epoch number, relative error, and R2 for the
current and the best epochs are displayed in the Train window.

If the model has more than one output variable, the error measures that are displayed are
a composite over all output variables. To see the relative error and R2 for individual
variables, run a Predicted vs. Actual analysis and view the report file; for more
information, see “Predicted vs. Actual” on page 10-2.
If you add gain constraints to a model and then retrain it, the training and testing errors
will probably change. For more information on gain constraints, see “Gain Constraints
and Extrapolation Training” on page 8-34.
In general, a relative error of 0.035 or less may indicate a useful model.

Training Types
There are three types of training: regular, stiff, and ridge. They are not all appropriate
for every type of model, and are grayed out when they should not be used. You can
partially train a model with one type, and then change to regular or stiff and continue
training; but if you change to ridge, it ignores any previous training and starts over.

Regular
The regular, general-purpose neural net trainer uses gradient descent
(“backpropagation”). It can be used with any model.


Stiff
Stiff training is an alternative method for training neural networks, using the patented
Stiff Differential Equation Solver algorithm developed at and licensed from Du Pont.
You cannot use stiff training for linear models.
If the data in the system is “stiff,” as defined in any standard textbook on numerical
methods (many chemical processes are known to be stiff), it may train to a better result
with stiff training than with regular. The regular trainer only uses first derivative
information to seek a solution; the stiff trainer also uses second order partial derivatives
in conjunction with a stiff differential equation solver.
However, the stiff trainer is too compute-intensive to be useful for problems with more
than about 30 total input and output variables, depending on the computational power
available to you. Stiff training can be considerably slower than regular training, and
requires more memory. Time and memory consumption increase rapidly with the
number of variables, and linearly with the number of patterns in the dataset. Stiff
training also defines epochs differently, processing successive epochs at differing
speeds, but will generally converge within ten or fewer epochs. (The first reported
epoch completes almost immediately, but typically has a relatively high error; this
corresponds to the initial state or “zero-th” epoch of regular training.)

Test and Validation Set Requirements


The stiff trainer has a greater risk of overtraining than does the regular trainer. This risk
is greatest when a random test set is selected, but also is present with the interval test
set. Whenever you use the stiff trainer, and especially when you use a random test set,
you should increase the size of the test set, and be sure to use a validation set of at least
5%–10% of the dataset (see “Setting Patterns” on page 8-48). After training the model,
we strongly recommend that you run a Predicted vs. Actual analysis using the validation
set, as described in “Predicted vs. Actual” on page 10-2. If the model trained well, but
cannot predict well on the validation set, then you know it has overtrained and you
should start the training over with a larger test set.

Completing Stiff Training


A model that has been partially trained using any available method can be further
trained using regular or stiff. In many cases the best results can be obtained by training



until regular begins to converge, and then changing to stiff training. When the stiff
trainer cannot train further, it produces a message similar to this:

This message may or may not indicate an error condition. If you get this message, you
should first check the Error History plot or the error values displayed on the Train
window. If the stiff trainer has converged to a good solution (to a low relative error or
high R2), there is no need for further training.

Ridge
The ridge trainer, which performs ridge regression, is available only for linear models.
If the data from your system is actually linear, the ridge trainer will train it faster and
better than the regular trainer. If a linear model does not train well (to a low relative
error or high R2), then either there is insufficient data, or the data is not linear and you
should go back to the model builder and specify a new model that is not linear.
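
For reference, ridge regression itself has a simple closed form; a minimal sketch
(illustrative only, not the product's trainer):

    import numpy as np

    def ridge_fit(X, y, lam=1e-3):
        # Solve (X^T X + lam * I) w = X^T y for the weight vector w.
        n = X.shape[1]
        return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)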

Training Parameters
Training parameters are set by default; you may, but do not have to, change them. You
can view them at any time, but they cannot be changed while the model is training. If
you want to change any parameters after training has begun, you have to stop training,
make the changes, and restart the training. If you select Stiff or Ridge training, all


parameters except Final Epoch and Train Rel Error are ignored. To inspect or change
training parameters, click Edit Parameters to invoke the Training Parameters window.

Stopping Criteria
No matter what stopping criteria you set, you can always stop training at any time by
clicking Stop Training in the Train window.
The Autostop algorithm is used to recognize that training has stopped improving; it will
cause training to stop if the training relative error begins and continues to increase, or if
it remains essentially unchanged for an extended period. This is useful if you want to let
a model train for a long time (overnight or over a weekend) without watching it, but you
don’t want to consume computer resources unnecessarily. Autostop applies only to
regular training, and is ignored for stiff or ridge. Autostop is on by default, but you are
asked to verify it when you start training with it on.
You can train for a particular number of epochs by setting a Final Epoch number. If you
stop training and later re-start it, the epoch numbers continue from where they stopped,
so the Final Epoch number is a cumulative total. The default is 10000.



If you specify a Train Rel Error, training will stop when the training relative error
becomes as low as the specified value. The default is 0.0.
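
Taken together, the stopping criteria behave roughly like this sketch (assumed logic):

    def should_stop(epoch, final_epoch, train_rel_err, target_err, autostop_fired):
        if epoch >= final_epoch:         # cumulative Final Epoch reached (default 10000)
            return True
        if train_rel_err <= target_err:  # Train Rel Error target reached (default 0.0)
            return True
        return autostop_fired            # Autostop: error rising or flat for a long time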

Auto Learning Rate Adjustment


This parameter applies only to regular training, and is ignored for stiff or ridge. The
regular training algorithm essentially consists of gradient descent with added noise.
Gradient descent requires a choice of step size. The auto learning rate option adjusts the
step size dynamically according to how learning is progressing. When this option is
turned off, a fixed step size is used. Auto learning rate is the default option, and on the
average will outperform a fixed step size.
Auto Learning Rate Adjustment should normally be left On. If it is turned Off, training
will ordinarily be slower, and perhaps significantly so, but there is a small chance that a
slightly better model might result.
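
The general idea can be sketched as follows (the adjustment rule and multipliers are
assumptions for illustration; train_one_epoch is a hypothetical helper):

    def train_with_auto_rate(model, n_epochs, step=0.01):
        prev_err = float("inf")
        for epoch in range(n_epochs):
            err = train_one_epoch(model, step)        # one gradient-descent pass
            step *= 1.1 if err < prev_err else 0.5    # grow while improving, else shrink
            prev_err = err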

Sparse Data Algorithm


Note: Do not confuse the sparse data algorithm with training filters,
which are designed for redundant data as described on page 8-49.

Sparse data occurs when the model input variables are sampled frequently, but the
outputs are sampled relatively infrequently; for example, you may have 10 minute
samples of the process, but the output comes from a lab analysis performed only every 2
hours. The time merge function (“Time Merge” on page 5-35) interpolates or
extrapolates the output values to the time interval that you specify. The certainty of a
value records whether it already existed before the time merge or whether it was
generated, and, if generated, how far away it was from known data. For more
information, see “Certainty” on page 5-42. If Sparse Data Algorithm is turned on, the
training will be weighted by the certainties, paying more attention to the actual values
provided in the original data than to merged values.

Note: The sparse data algorithm weights the error term on each
pattern by the certainty of the outputs; therefore, in effect it ignores
patterns whose outputs have zero or near-zero certainty. This
weighting method may appear to inflate the R2 numbers from what
would be anticipated from an un-weighted error term.
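
In effect (a minimal sketch of certainty weighting, not the exact error term):

    def certainty_weighted_error(predicted, actual, certainty):
        # Each pattern's squared error is scaled by the certainty of its output,
        # so patterns with zero or near-zero certainty contribute almost nothing.
        return sum(c * (p - a) ** 2
                   for p, a, c in zip(predicted, actual, certainty))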

When you have finished viewing or setting the training parameters, click Done to close
this window and return to the Train window.


Starting and Stopping Training


When you have set the training parameters and selected a training type, click Start
Training. For stiff and ridge training, you are reminded that most training parameters are
ignored.

For regular training, if you have Autostop turned on, you will be asked to verify it.

After you have chosen to set autostop either on or off, training begins. When training
begins, the Start Training button will gray out, and the Stop Training button will become
active.

If you use a filter with a FANN model (see “FANN” on page 8-2), the filter is applied at
the beginning and end of training of each phase.



You can wait for training to stop automatically based on the training parameters that you
have set, or you can stop it at any time by clicking Stop Training. If you stop, you can
later continue from where you stopped. If you stop training and later restart it, the
resulting model will not normally be exactly identical to a model trained on the same
data without stopping.
If you want to change any training parameters during training, you must stop the
training, make the changes, and start training again. During training, you can use the
editor (Chapter 3, File Editor) or formatter (Chapter 4, File Formatter), and you can
view the dataset in the preprocessor, but you cannot change the dataset or model, or
perform any Analysis.
You can train a model, leave this window (or even exit), and later train some more, but
do not try to train using a dataset different from the one on which it was originally
trained.
When training is complete, click Done to close the Train window. You cannot close this
window while a model is being trained. You will be warned if you close it while an
untrained model is loaded.

CAUTION: If you built a filter into the model (see “Filtering” on page 8-49), and used
the filter while you were training the model, you may or may not wish to continue to
use the filter while you use the model for analysis. If you no longer wish to use the
filter, you should copy the model (using the Copy Model command in the File menu),
and disable the filter only in the copy. Be aware that if you disable a filter, and then
re-save the model while the filter is disabled, the filter is permanently removed and
you can never re-apply it without re-training the model.

Training Monitors
During training you can monitor training performance using the Error History,
Prediction, and if applicable, Gain Constraints monitors, all accessible from the Train
window.


Error History Plot


The Error History Plot window appears when you click Error History in the Train
window.

This window plots the relative error for both the training data and the testing data, at
each epoch. The X axis is epoch number, and the Y axis is relative error. If the
Continuous Update toggle button is turned on, the plot is updated at the end of every
epoch; if it is turned off, the plot is updated only when you click the Update button. The
Print button invokes the common Plot Print setup window, described in “Printing a Plot”
on page 6-39.
For a FANN model, you can view and print this plot for either phase only while the
model is being trained for that phase.
For models having multiple outputs, click Select Outputs to specify individual outputs
whose relative errors you wish to plot in addition to the average relative errors. When
displaying multiple plots, you can display them either stacked or overlaid, as indicated



in the option menu at the top of the plot window. The following sample window shows a
stacked plot.

Click Show Table to display the training history. The history shows the train relative
error and the test relative error at each epoch. The Relative Test Error and Relative Train
Error columns show composite errors for all outputs in the model. Subsequent columns


show relative error for each output individually. The best epoch, the one whose training
is used by the model, is marked with an asterisk and colored blue.

Prediction Plots (Training Stripcharts)


Prediction Plots can be viewed while you train with the regular trainer. These plots are
stripcharts that compare each output variable’s original values in the dataset with its
predicted values from the model. At the beginning of training, the predicted output
values will simply track the mean of the original output; as training progresses and the
model begins to learn the process dynamics, the predicted output will more closely
approximate the original values.



The Prediction plots button invokes the Training Stripcharts window.

This window contains stripcharts of the original dataset values and the model’s
predicted values, for every output variable in the model.
There is an Update toggle button associated with each individual stripchart, as well as
an Update All toggle button. If Update All is turned off, it overrides the individual
Update toggles and prevents all plots from being updated; if Update All is turned on, the
Update toggle on each stripchart controls whether it is updated.

Note: Drawing the Prediction plots in Update mode consumes computer resources and
slows the training process. For faster training, turn off Update mode and only update
the plot occasionally.

The Selection button invokes the Training Stripchart Outputs dialog, where you select
which output variables to plot. This dialog lists all of the model output variables, for you
to select which ones to be plotted. Note that, for phase 2 of a FANN model, the model
outputs are the predicted values of the dependent variables (rather than the process
outputs); for more information, see “Model Types and Variable Types” on page 8-2.


Gain Constraint Monitor


If you specified gain constraints (described in “Gain Constraints and Extrapolation
Training” on page 8-34) when you built the model, you can display gain constraints
information while training.

For every input/output pair for which you have defined gain constraints, the display
shows the percentage of training patterns in the dataset that do not conform to the
constraints. The figures on the left are for the current epoch, and the figures on the right
are for the best epoch.

Best Epoch Adjustment


Very early in the training process, before the model begins to converge on the best
internal parameters, the relative error for the testing patterns will typically fluctuate
more greatly than later in the training process. In certain rare cases, the test relative error
can momentarily fluctuate to a value that is lower than the model will ever converge to,



and that one aberrant value will be saved as the “best” epoch. An example of this
situation appears in the following Error History plot.

In this example, it is obvious that the results of the model at epoch 10 are not as good as
in later epochs when the testing and training errors converged. However, the low test
relative error at that point would cause epoch 10 to be remembered as the best epoch.
If you see from the Error History plot that this rare situation has occurred, you should
force the model to stop remembering the aberrant point as the best epoch. This is done
with the Replace Best with Current command, accessed from the Epoch menu.


You must stop training before you can access this command, and then restart the
training. When you select Replace Best with Current, you are asked to confirm.

If you click Replace Best Epoch, a message appears to let you know when the
replacement is complete.

Note that it is not necessary to wait until the model has fully trained before you replace
the best epoch. You can continue to train, and if a better epoch (with lower relative
error) occurs, it will override the epoch that you replaced.



10  Model Analysis Tools

• Predicted vs. Actual, page 10-2
• Sensitivity vs. Rank, page 10-9
• Output vs. Percent, page 10-22
• Sensitivity vs. Percent, page 10-36

This chapter describes the tools used to analyze a model. The model analysis tools are:
Predicted vs. Actual
Evaluate how well the model predicts process behaviors over the range of data used
to train the model. You can also calculate the residuals of the model. See “Predicted
vs. Actual” on page 10-2.
Sensitivity vs. Rank
Analyze the sensitivity of the outputs to each of the inputs; that is, determine how
much effect each input has on each output. Plot the inputs in order from greatest
sensitivity to lowest. See “Sensitivity vs. Rank” on page 10-9.
Output vs. Percent
From the lower boundary of a given input’s range to its upper boundary and while
holding all other inputs at their average (or other selected) values, plot a curve
showing how the input determines the output. See “Output vs. Percent” on
page 10-22.



Sensitivity vs. Percent
From the lower boundary of a given input’s range to its upper boundary and while
holding all other inputs at their average (or other selected) values, show how the
output’s sensitivity to the input varies. This plot is similar to the Output vs. Percent
plot except that it shows the partial derivative instead of the value of the output
variable. See “Sensitivity vs. Percent” on page 10-36.
The analysis tools are in the Tools > Model Analysis submenu.

Generating Reports and Data Files


All types of analysis allow you to write results to a report file and/or a data file. The
report file is intended for a human to view, while the data file is intended to be formatted
and read into a dataset. From the analysis plot windows, you can view the report file;
you can also view it in the editor (described in Chapter 3, File Editor), and you can view
or print it using your system resources.
The Print (or Print Plot) button on an analysis plot window invokes the common Plot
Print window, described in “Printing a Plot” on page 6-39.
Default filenames, based on the model name, are provided for the report and data files.
You can change these to any other names that you wish.

Predicted vs. Actual


A Predicted vs. Actual analysis is used to run a dataset through a trained model and
compare the model’s predicted output values with the actual output values recorded in
the dataset. The principal reasons for doing this are:
• to validate a model by running it on data that it never saw during training;
• to identify points that a trained model did not learn well;
• to view the training/testing relative error and R2 for individual variables in a model
with multiple output variables;
• to append predicted values to a dataset for residuals analysis. The residuals are the
difference between actual values and model calculated values. Analyzing the
distribution of the residuals can provide insight into how well models have
generalized to the process being modeled.
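
For the residuals analysis mentioned in the last item, the computation itself is simple
(illustrative sketch only):

    import numpy as np

    def residuals(actual, predicted):
        # Residual = actual value - model-calculated value, per pattern.
        r = np.asarray(actual) - np.asarray(predicted)
        return r, r.mean(), r.std()   # summaries of the residual distribution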


Run Model Window


Invoke the Predicted vs. Actual Run Model window by selecting Tools > Model
Analysis > Predicted vs. Actual.

This window is used to set up and initiate a Predicted vs. Actual analysis. If you do not
already have a dataset and model loaded, you must load them using the File menu
(described in “Pull-Down Menus” on page 9-2).



The Patterns selection can be All, Test, Train, or Validation. “Setting Patterns” on
page 8-48 describes how the dataset is apportioned into these groups. (Remember that
the Model Statistics Window, described on page 8-64, indicates how many patterns
there are of each type; if you select a Patterns type that is empty, the model cannot run.)

The Patterns selections have the following uses:


All
View points that a trained model did not learn well. If the model was saved with a
filter (described in “Filtering” on page 8-49), and you applied the filter when you
most recently loaded the model, the designation “All” patterns should be
understood to mean “All patterns that remain after filtering.” To disable a filter that
you have already applied to a model, you must load the model again.
Test or Train
View the training/testing Relative Error and R2 for individual variables in a model
with multiple output variables.
Validation
Validate a model using the validation set in the same dataset on which it was
trained.
If you write a report file, the options are Summary, Outputs by Pattern, and Inputs and
Outputs by Pattern.
Summary report
A Summary report shows the Relative Error and R2 for each output variable, and
for all output variables combined, over all selected patterns.
Outputs by Pattern report
An Outputs by Pattern report includes the model’s Predicted value, the dataset’s
Actual value, and the Relative Error, for each output variable, and for all output
variables combined, at each selected pattern.


Inputs and Outputs by Pattern report


An Inputs and Outputs by Pattern report includes the value of every input variable
at every selected pattern. (Because this type of report file is also generated by other
types of analysis, the input variables are reported with an initial and final value, but
in a Predicted vs. Actual analysis, the initial and final values are always identical.)
If you write a data file, it will contain the Actual outputs from the dataset and Predicted
outputs from the model. Each variable is written in a column, and each pattern is written
in a row, with several header rows at the beginning. Each variable’s time delay is
appended to its name (because any dataset variable may appear in the model at more
than one time delay); the characters _A are appended to Actual values, and _P are
appended to Predicted values.
You can write the Predicted Values as new columns appended to the current dataset (one
new column for each model output variable). The tag name given to each new column is
formed by combining the variable name, its time delay, the letter P, and an index
number. For models other than FANN models, you can also append the Internal
Parameters as new columns in the dataset; this can be useful for detecting regions of
process operation. These new columns can then be transformed, or used in models, or
deleted, just like any other column in the dataset; be sure to save the dataset if you want
to keep them. For PCR models, the internal parameters are the data scores.
The Predicted vs. Actual plot provides different plot tools from those available in the
plotter; if you want to display Predicted vs. Actual values in a plot, you must append the
Predicted Values column, then make an XY plot with the original variable on the X axis
and the appended column on the Y axis; for more information, see Chapter 6, Data
Plotter.
After you have finished setting up the Predicted vs. Actual analysis, click Run Model to
run it (or click Cancel to abandon it).



Predicted vs. Actual Window
After the selected patterns have been run through the model, the Predicted vs. Actual
plot window is invoked (if the model has more than one output variable, the Select
Outputs window, described below, comes up in front of the plot).

In this plot, predicted values are on the Y axis, and actual values are on the X axis.


The reporting and printing options are described in “Generating Reports and Data Files”
on page 10-2.

Displaying Model Performance


For each selected output variable, this window plots one point for every data pattern that
was run through the model (if there is only one model output variable, it is selected by
default). The Actual values from the dataset are on the X axis, and the Predicted values
from the model are on the Y axis.
The toggle buttons allow you to display lines indicating Perfect Model, and, for each
variable, Best Fit, and one, two and three times the Standard Deviation. Perfect Model is
a 45° line showing the theoretical results of a perfect model (if this line is slightly
different from 45°, resize the window until the plot is square). Best Fit is a least-squares
fit to all the points for each variable. Standard Deviations are computed from the Best
Fit line.
If only one output variable is selected, the Slope of its Best Fit line, Standard Deviation,
and R2 (defined in “Measures of Training Performance” on page 9-11) are displayed.
The R2 is calculated only for this variable, for the patterns that were selected for this
run, so it will not necessarily match the R2 shown on the Train model window.
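
A sketch consistent with the description above (assumed computation, not the
product's code):

    import numpy as np

    def best_fit_stats(actual, predicted):
        slope, intercept = np.polyfit(actual, predicted, 1)  # least-squares Best Fit
        resid = np.asarray(predicted) - (slope * np.asarray(actual) + intercept)
        return slope, resid.std()  # Standard Deviation computed from the Best Fit line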

To display the coordinates of any point, click the Info tool, move the mouse onto the
point, and push and hold the mouse button; the variable’s name and time delay, the
Actual and Predicted values, and the row number of the actual output, will be displayed.



Selecting Variables
The Select Outputs button invokes the Select Outputs dialog.

This is the common window that was used in the plotter; but note that the list contains
only the model outputs (rather than all variables in the dataset), and that every variable
is identified both by its tag name and by its time delay. Predicted vs. Actual points are
plotted only for the variables that you select.

Cutting Points
It is best to remove all bad data from the dataset before you begin to build a model, but
sometimes this is not possible. When a Predicted vs. Actual plot shows that most points
in a dataset were modeled well (close to the Perfect Model line), but a few points are
farther off, occasionally the cause is that those points’ values are bad–even though they
are within the range of that variable’s good values. If you investigate such points and
find that they are, in fact, bad values, you can use the Cut Box tool to remove them from
the dataset. You would then have to build and train a new model, that would not use
those points.

CAUTION: Do not cut good values that were not modeled well.
Doing so may invalidate your model. Avoid cutting a point without
understanding why it is bad.


To use the Cut Box or Uncut tool, click on its icon, drag through a rectangular area of
the plot (as you do in the Plot window, Chapter 6, Data Plotter), and then click Apply
(or click a different tool to cancel the marked area). Cuts and uncuts that you apply are
displayed on this plot, but they are not actually applied to the dataset yet. When you
have finished marking cuts and uncuts, you can click Cancel to discard them (and close
the window), or click Done to apply them as transforms to the dataset. You will be
warned that applying the transforms will cause the model to be cleared.

If you continue, the transform calculator may have to reapply some or all of the dataset’s
transform list.

Sensitivity vs. Rank


A Sensitivity vs. Rank analysis calculates the sensitivity of output variables to input
variables (that is, the effect or influence the inputs have on the outputs) over the patterns
in the dataset, and, for each output, ranks the inputs in order of sensitivity. In the
iterative modeling process, you can build a model using every input that you think could
possibly affect the outputs, run a Sensitivity vs. Rank analysis and remove from the
model those inputs that have negligible sensitivity, then train this reduced model
(perhaps repeating the cycle).

Sensitivity Measures
The three types of sensitivity measures are Average Absolute, Average, and Peak.



Average Absolute sensitivity is the distribution-averaged sum of the absolute values of
the partial derivatives of the input-output pairs,

\text{Average Absolute} = \frac{1}{N_{\text{pats}}} \sum_{k=1}^{N_{\text{pats}}} \left| \frac{\partial o_{k,i}}{\partial x_{k,j}} \right|

where N_pats is the number of patterns in the dataset over which the distribution is
calculated, x_k,j is the jth input for the kth pattern, and o_k,i is the ith output for the
kth pattern.

Average sensitivity is the average of the partial derivatives (actual, not absolute,
values),

\text{Average} = \frac{1}{N_{\text{pats}}} \sum_{k=1}^{N_{\text{pats}}} \frac{\partial o_{k,i}}{\partial x_{k,j}}

Peak sensitivity is the maximum (in absolute value) of the partials over all patterns,

\text{Peak} = \max\left( \left| \frac{\partial o_{k,i}}{\partial x_{k,j}} \right|,\; k \in \{1, 2, \ldots, N_{\text{pats}}\} \right)

All of these measures are scaled and are calculated for each pair of input and output
variables.
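
Expressed as code (a sketch; the partial derivatives would come from the trained
model):

    import numpy as np

    def sensitivity_measures(partials):
        # partials: d(output i)/d(input j) evaluated at each of the N_pats patterns.
        partials = np.asarray(partials)
        avg_abs = np.abs(partials).mean()      # Average Absolute
        avg = partials.mean()                  # Average
        k = int(np.argmax(np.abs(partials)))   # pattern where the Peak occurs
        return avg_abs, avg, abs(partials[k]), k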


Run Sensitivity Window


Invoke the Sensitivity vs. Rank Run Sensitivity window by selecting Tools > Model
Analysis > Sensitivity vs. Rank.

This window is used to set up and initiate a Sensitivity vs. Rank analysis. Sensitivity
may also be referred to as gain. If you do not already have a dataset and model loaded,
you must load them using the File menu (described in “Pull-Down Menus” on
page 9-2).



The Patterns selection can be All, Test, Train, or Validation.

“Setting Patterns” on page 8-48 describes how the dataset is apportioned into these
groups. (Remember that the Model Statistics window, described on page 8-64, indicates
how many patterns there are of each type. If you select a patterns type that is empty, the
model cannot run.)
If the model was saved with a filter (described in “Filtering” on page 8-49), and you
applied the filter when you most recently loaded the model, then the designation “All”
patterns should be understood to mean “All patterns that remain after filtering.” To
disable a filter that you have already applied to a model, you must load the model again.
The sparse data algorithm, used in training a model, is explained on page 9-16. You
should turn on Sparse Data Algorithm if and only if you used it for training this model.
If you write a report file, it can be Summary or Detailed.
• A Summary report shows, for each model output variable, a list of the model
inputs, ranked in order of Average Absolute sensitivity. Each input variable’s
Average Absolute, Average, and Peak sensitivity are shown, with the pattern
number of the pattern in which the Peak occurred.
• A Detailed report includes, for every selected pattern in the dataset, the per-pattern
sensitivity of every output to every input; thus the length of this file is the number
of patterns times the number of output variables times the number of input
variables, plus header and summary rows.
You can also write the sensitivity values (either scaled or unscaled) to a data file. The
data file will contain one column for every pair of input and output variables in the
model, with the sensitivity (gain) values for this pair at every pattern in the dataset. The
data file can be formatted and read back into the spreadsheet and plotter as a dataset for
plotting and analysis.

After you have finished setting up the Sensitivity vs. Rank analysis, click Run
Sensitivity to run it (or click Cancel to abandon it). Sensitivity vs. Rank analysis can
take as long as training the model for one epoch.

Sensitivity vs. Rank Window


After the sensitivity has been calculated over the selected patterns, the Sensitivity vs.
Rank plot window is invoked (if the model has more than one output variable, the Sens.
vs. Rank Outputs dialog, described below, comes up in front of the plot).

Three toggle buttons control which type of sensitivity is displayed. One line is plotted
for each selected output variable. The Y axis is the sensitivity values; each plotted point
on a line represents that output’s sensitivity to one input variable. On each line, the
inputs are ordered by magnitude of sensitivity (of whichever type is being displayed);
this means that input variables do not necessarily occur in the same order on each line,
or from one plot type to another. The X axis is simply the rank order. To identify any
point, click the Info tool, then move the mouse onto the point and push and hold the
mouse button; the input and output variable names, time delays, and sensitivity will be
displayed.

If you are displaying Peak sensitivity, the pattern number at which the peak occurred
will also be shown.

To review the information for all points, generate a report or print the data. See
“Generating Reports and Data Files” on page 10-2.

Selecting Variables
The Select Outputs button invokes the Sens. vs. Rank Outputs dialog. This is the
common window that was used in the plotter; but note that the list contains only the
model outputs (rather than all variables in the dataset), and that every variable is
identified both by its tag name and by its time delay. Sensitivity results are plotted only
for the output variables that you select.

Interpreting the Results


Average sensitivity is the (scaled) average change of the output variable as the input
variable increases from its minimum to its maximum value. A positive average
sensitivity value indicates that, on average, the output value increases as the input
variable increases. A negative average sensitivity value indicates that, on average, the
output value decreases as the input value increases. Average absolute sensitivity is the
average of the magnitude (absolute value) of the change of the output variable as the
input variable increases from its minimum to its maximum value. (Note that this is not
in general the magnitude of the average.) Thus, average absolute sensitivity is always
positive, and is greater than or equal to the magnitude of average sensitivity.

Average absolute sensitivity, then, gives a general indication of the strength of the
influence of an input on an output. Combined with average sensitivity, it can be used to
tell you whether the input-output relationship is linear, monotonic, or without a causal
connection.
The following illustrations show examples of X-Y plots (see “Output vs. Percent” on
page 10-22) of an input and an output variable over the extent of their range, with the
corresponding sensitivity relationships indicated.
[Four X-Y plots of an output vs. an input over their ranges, with the corresponding
sensitivity relationships:
  Linear, Monotonic: Average Absolute Sens. = |Average Sens.| ≠ 0
  Nonlinear, Monotonic: Average Absolute Sens. = |Average Sens.| ≠ 0
  Nonlinear, Nonmonotonic: Average Absolute Sens. > |Average Sens.|
  No Causal Relationship: Average Absolute Sens. = Average Sens. = 0]

If Average Absolute Sensitivity = |Average Sensitivity| ≠ 0, then the relationship is
monotonic. Monotonic means that the output variable does not change direction as the
input variable increases (no change in the sign of the sensitivity). The relationship is not
necessarily linear; strong nonlinearities can exist over small ranges of the input. The
degree of nonlinearity can be determined using the Output vs. Percent analysis tool
(described on page 10-22), which generates plots similar to those shown above, and the
Sensitivity vs. Percent analysis tool (described on page 10-36).
If Average Absolute Sensitivity > |Average Sensitivity|, then the relationship is
nonmonotonic and therefore nonlinear. The greater the inequality, the greater the
nonlinearity.
If Average Absolute Sensitivity = Average Sensitivity = 0, then there is no causal
relationship between the input variable and the output variable.
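
Taken together, these three rules form a simple decision procedure. The following
Python sketch applies them to one input-output pair; the tolerance eps is an
illustrative addition, since computed sensitivities are rarely exactly equal or
exactly zero:

def classify_relationship(avg_abs, avg, eps=1e-9):
    """Apply the three rules above to one input-output pair."""
    if avg_abs < eps:                            # Avg Abs = Avg = 0
        return "no causal relationship"
    if abs(avg_abs - abs(avg)) < eps:            # Avg Abs = |Avg| != 0
        return "monotonic (possibly nonlinear)"
    return "nonmonotonic, therefore nonlinear"   # Avg Abs > |Avg|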
If Sensitivity vs. Rank results do not match what you know to be true of your process,
you should consider using gain-constrained training. For more information, see “Gain
Constraints and Extrapolation Training” on page 8-34.

Removing Model Variables


The Include and Exclude tools on the Sensitivity vs. Rank plot window are used to mark
input variables to be removed from the model, so you can build a new model that omits
the inputs to which no output is very sensitive. After marking variables to be removed,
you must save the modified model from this window using one of the Save Model
commands described in “File Menu” on page 10-22.
In a Sensitivity vs. Rank plot, every plotted line corresponds to one output variable.
Every plotted point corresponds to an input variable in the current model; every input
variable is represented by one or more plotted points—one point in each line. When you
mark any point as Included or Excluded, all other points that represent the same input
variable are automatically marked in the same way. All variables are initially Included.

To use the Include Left tool, click on it; it will be highlighted, and the Apply button will
be enabled (not grayed out). Click at any point within the plotting area, and all of the
plot to the left of the point where you clicked will be highlighted.

Click Apply, or click any other tool button to cancel this selection. If you click Apply,
every input variable that corresponds to at least one point in the highlighted region will
be marked Include, and every input variable that is totally outside the highlighted region
will be marked Exclude. Variables that are Included retain their original appearance;
variables that are Excluded are highlighted with a large colored dot.

In the example illustrated above, the Include region was drawn to the right of the fifth
point, so the first five points on each output line are Included, but several other points
are also Included. An input variable is Excluded only if all of its corresponding points
are outside the Include region.
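
The marking rule is plain set logic: a variable is Included if at least one of its
points falls inside the highlighted region, and Excluded only if every one of its
points falls outside. An illustrative Python sketch for the Include Left case (the
names are hypothetical, not the product's interface):

def mark_include_left(points, x_click):
    """points maps each input variable to the x positions of its
    plotted points (one per output line)."""
    return {var: "Include" if any(x <= x_click for x in xs) else "Exclude"
            for var, xs in points.items()}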
The Include Box tool is used to Include all variables within a specified rectangular
region of the plot, without changing the status of variables totally outside the region. To
use the Include Box tool, click on it and it will be highlighted, and the Apply button will
be enabled. Move the mouse onto the plot, to one corner of the rectangular region that
you want to specify. Push and hold the mouse button, dragging the mouse diagonally to
the opposite corner, and release the button.

Click Apply, or click any other tool button to cancel this selection. If you click Apply,
every input variable that corresponds to at least one point in the highlighted region will
be marked Include; but (unlike the Include Left tool) variables totally outside the region
are not affected.

In the example illustrated above, the selected point is now Included, as well as other
points that correspond to the same input variable.
The Exclude tool is used to Exclude all variables within a specified rectangular region
of the plot, without changing the status of variables totally outside the region. To use the
Exclude tool, click on it and it will be highlighted, and the Apply button will be enabled.
As for the Include Box tool, move the mouse onto the plot, to one corner of the
rectangular region that you want to specify. Push and hold the mouse button, dragging
the mouse diagonally to the opposite corner, and release the button. Click Apply, or
click any other tool button to cancel this selection. If you click Apply, every input
variable that corresponds to at least one point in the highlighted region will be marked
Exclude; variables totally outside the region are not affected.

Sorting the Dataset


Use the Sort button to change the order of columns in the dataset, such that input
variables appear in the order of their influence on any one output variable.
Sorting is always done in the order of Average Absolute Sensitivity, regardless of which
type of sensitivity is currently being displayed. If your model has only one output
variable, sorting is done immediately when you click the Sort button; otherwise, the
Outputs for Sens. vs. Rank dialog is invoked, for you to make a selection.

This dialog lists all of the output variables in your model, by name and time delay. To
cancel the sort, click Cancel. To select a variable, double-click on it, or click on it and
click Select. The input variables in your dataset will be sorted according to the selected
output variable’s sensitivity to them (if an input variable occurs in the model with
multiple time delays, it is sorted according to the largest sensitivity at any of its time
delays). A message will be displayed when the sort is complete. The dataset will remain
sorted for as long as it is loaded. The sorted version is not saved unless you save it
explicitly using one of the Save Dataset commands in the File pull-down menu.
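
The sort rule (Average Absolute sensitivity, with each variable represented by its
largest sensitivity over all of its time delays) can be stated compactly. A Python
sketch with hypothetical names:

def sorted_tags(avg_abs_sens):
    """avg_abs_sens maps (tag, tau) -> Average Absolute sensitivity for
    the selected output. Order tags most to least influential, taking
    the largest value over each tag's time delays."""
    per_tag = {}
    for (tag, tau), s in avg_abs_sens.items():
        per_tag[tag] = max(per_tag.get(tag, 0.0), s)
    return sorted(per_tag, key=per_tag.get, reverse=True)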

File Menu
The File menu is used to save a dataset or model that you have modified while in this
window.

Note: A sorted dataset (described on page 10-21) is not saved to disk.
To save the sorted version of a dataset, you must save it explicitly.

CAUTION: If you use the Include and Exclude tools to modify the
model, your work will be lost if you do not save the modified model
from the File menu. If you select Save Model from any other window
that you may have open, only the current model will be saved, not the
modified model that you specified in this window.

Save Dataset saves the dataset under its current name; Save Dataset As invokes the
common Save Dataset dialog (described in “Save Dataset and Save Dataset As” on
page 5-50) for you to specify a new name.
Save Model saves the modified model under the same name as the original model,
destroying the original model; Save Model As invokes the common Save Model dialog
(described in “Save Model and Save Model As” on page 8-63) for you to specify a new
name. If you modified the model, you must train it before you can use it.

Output vs. Percent


The Output vs. Percent analysis generates plots similar to those illustrated on page
10-15. The Sensitivity vs. Rank data summarizes the sensitivity information into a set of
numbers for each input variable (Average Absolute, Average, and Peak sensitivity); the
Output vs. Percent data provides the details of how the output varies across the range of
each input variable. This detail allows you to distinguish whether a monotonic variable
is linear or nonlinear, and the extent of the nonlinearities.
Output vs. Percent calculations are not based on the values in the current dataset
(although a dataset must be loaded before the model can be loaded). The calculations
are made by stepping one variable through its range while holding all other variables at
fixed values, passing each resulting pattern through the model, and recording the value
of one output variable at each step. The steps are made at intervals of 5% of the
variable’s range. Only one output variable is considered. Only one variable is stepped at
a time; if you select more than one variable to be stepped, the first one is stepped while
the others are held constant, and then the first one is held constant while the second one
is stepped, and so on.
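
The stepping procedure amounts to the loop sketched below in Python, assuming a
predict callable that maps one input pattern to output values and a bounds mapping
from variable name to its (minimum, maximum); both names are hypothetical stand-ins
for the product's internal machinery:

import numpy as np

def output_vs_percent(predict, fixed_values, bounds, stepped_var, output_index):
    """Step one input through its range at 5% intervals while every
    other input is held at fixed_values; record one output per step."""
    lo, hi = bounds[stepped_var]
    curve = []
    for pct in np.arange(0.0, 100.1, 5.0):           # 0, 5, ..., 100 percent
        pattern = dict(fixed_values)
        pattern[stepped_var] = lo + (hi - lo) * pct / 100.0
        outputs = predict(pattern)                   # hypothetical model call
        curve.append((pct, outputs[output_index]))
    return curve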

Note: Output vs. Percent is intended for steady-state analysis. If any
variable is specified in the model with multiple time delays, this
analysis considers each time delay to be an independent variable,
stepped independently of that same variable's value at any other time
delay.

If analysis results do not reflect what you know to be true of your process, consider
using gain-constrained training. For more information, see "Gain Constraints and
Extrapolation Training" on page 8-34.

Stepping a Variable
For Output vs. Percent, there are two ways to define which variable is stepped, and
through what range it is stepped. Recall that a dataset consists of a set of raw variables
obtained from the process history, which are then operated on by transform functions,
producing a set of transformed values from which a model is built. The dataset in its
after-transforms state can include variables that are unchanged from their raw values,
variables whose raw values have been modified by the transforms, and newly created
variables generated by transforms.
• You can choose to work with only the model variables. One model input variable is
stepped from the minimum to the maximum of its Variable Bounds (dataset range
bounds by default), while all other model inputs are held at a constant value (either
the average in the dataset on which the model was trained, or any value that you
specify). The combination of values at each step is used as a data pattern which is
input to the model. This is suitable for most models, but it does not preserve any
interrelationships that transforms define among variables, and it defines the step
interval linearly according to the variable’s transformed values.
• Alternatively, you can “look behind” the transforms and work with the before-
transforms values that are input to the transform list. One dataset variable is stepped
from the minimum to the maximum of its analysis variable range (AVR) while all
the other dataset variables are held at their analysis value (see “Before Transforms
Properties: Analysis Variable Range” on page 5-45). The combination of values at
each step is then fed through the transform list to calculate transformed values, and
the data pattern input to the model is extracted from these transformed dataset
values. This method preserves the variable interrelationships defined by transforms,
and it defines the step interval linearly according to the variable's actual values in
the plant; but it requires you to set correct Analysis Variable Range values, and it
steps redundant or correlated variables independently.
(If neither of these methods is suitable for your model, you can simply create a new
dataset that contains whatever model input values you want to analyze, assign any
values to the model outputs, and run this dataset through a Predicted vs. Actual
analysis.)

Example
For the dataset and model in this example, we will suppose that:
• The dataset and the model include additional variables that are not relevant to this
discussion.
• Raw variables, which correspond to tags in the data system, include A, B, C, D, T1,
T2, and T3. We will not specify what type of measurement is represented by A, B,
C, or D; T1, T2, and T3 are three temperature measurements around the same part
of the process, so they would be expected to have similar values.
• A actually ranges from 70. to 130., and B actually ranges from 1485. to 1603.; but
the file that was input contains some erroneous or invalid values that appear as
99999. The bad values were removed from A with a $MarkCut transform, which
removes points by their row number; the bad values were removed from B with a
$CutAbove transform, which removes points by their value.
• C has a large range in which small variations are significant, so a $ln transform
has been applied to it.
• A computed variable, 2D, was created as twice the value of D. The model uses as
input both D and 2D.
• A computed variable, avgT, was created as the average of the three temperature
measurements. The model uses this average instead of any one of the measured
temperatures.

Raw Data Values (Before Transforms): A, B, C, D, T1, T2, T3

Variable   Range of Raw Data     Analysis Variable Range (modified)
A          70.      99999.       70.      130.
B          1485.    99999.       1485.    1603.
C          .1       1000.        .1       1000.
D          711.     1492.        711.     1492.
T1         512.     768.         512.     768.
T2         508.     760.         508.     760.
T3         515.     771.         515.     771.

Transform List
!A!=$MarkCut(!A!,100,101)
!B!=$CutAbove(!B!,2000.)
!C!=$ln(!C!)
!avgT!=(!T1!+!T2!+!T3!)/3.
!2D!=2*!D!

Transformed Dataset: A, B, C, D, T1, T2, T3, 2D, avgT

Variable   Range of Transformed Data   Variable Bounds
A          70.      130.               70.      130.
B          1485.    1603.              1485.    1603.
C          -1.      3.                 -1.      3.
D          711.     1492.              711.     1492.
T1         512.     768.               not used in model
T2         508.     760.               not used in model
T3         515.     771.               not used in model
2D         1422.    2986.              1422.    2986.
avgT       511.7    766.3              511.7    766.3

If You Choose To Step Model Inputs


You can step one or more variables from A, B, C, D, 2D, and avgT (but not T1, T2, or
T3). Each stepped variable moves from the minimum to the maximum of its Variable
Bounds.
• If you step A, B, or avgT, there are no problems.
• If you step C, its stepped values are linear increments of $ln(!C!), which do not
correspond to linear intervals of the actual process measurement.

• If you step D, 2D is held constant, and if you step 2D, D is held constant, so they do
not retain the transform relationship 2D=2*D. This will cause misleading results.

If You Choose To Step Raw Variables


You can step one or more variables from A, B, C, D, T1, T2, and T3 (but not 2D or
avgT). Each stepped variable moves from the minimum to the maximum of its Analysis
Variable Range.
• If you step A without modifying its Analysis Variable Range (AVR) as shown in the
illustration, it is stepped through the range of its raw data, from 70. to 99999. The
$MarkCut transform has no effect on this analysis, since it is based on row
number. All transformed values after the first one will be outside the model’s
Variable Bounds, producing invalid results (and a number of error messages).
• If you modify A’s Analysis Variable Range, it can be stepped with no problems.
• If you step B without modifying its AVR as shown in the illustration, it is stepped
through the range of its raw data, from 1485. to 99999. The $CutAbove transform
will throw out all values above 2000, so most points will not be calculated (and
there will be a number of error messages).
• If you modify B’s AVR, it can be stepped with no problems.
• If you step C, its stepped values are linear intervals of the process measurement,
with no problems.
• If you step D, the model variable 2D will step with it, retaining the transform
relationship, with no problems.
• If you step any one of T1, T2, and T3, the other two are held constant, which does
not reflect their known relationship in the process. This will cause misleading
results.
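
The practical difference between the two methods is whether the transform list is
reapplied at every step. The following Python sketch is a hand-written analogue of
this example's transform list, applied to a single stepped pattern, for illustration
only:

import math

def apply_transforms(raw):
    """Reapply this example's transform list to one pattern of raw
    values. ($MarkCut cuts by row number and $CutAbove cuts by value,
    so within a properly set AVR they leave one pattern unchanged.)"""
    t = dict(raw)
    t["C"] = math.log(raw["C"])                      # !C!=$ln(!C!)
    t["avgT"] = (raw["T1"] + raw["T2"] + raw["T3"]) / 3.0
    t["2D"] = 2.0 * raw["D"]                         # 2D tracks D at every step
    return t

Stepping raw D through such a function keeps 2D equal to 2*D at every step; stepping
the transformed D directly would hold 2D constant and break the relationship.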

Output vs. % Selection Window


Invoke the Output vs. Percent Selection window by selecting Tools > Model Analysis >
Output vs. Percent.

This window is used to select one output variable and one or more input variables for
the Output vs. Percent analysis.
If you do not already have a dataset and model loaded, you must load them using the
File menu (described in “Pull-Down Menus” on page 9-2).

The Model Output Selection lists all of the output variables in the model, with their time
delays. Select any one by clicking on it. The first one in the list is selected by default.
The report file shows the initial value and Variable Bounds of every model variable; and
then, for the variables that were stepped, the output value at each step. The complete set
of values that are run as the model patterns is available in the data file.
As described in “Stepping a Variable” on page 10-23, you can step the model inputs, or
the before-transforms values of the dataset’s raw variables. The Source option menu
selects which variables are stepped, and, for model inputs, how you will specify the
values of the variables that are not stepped.

If the Source is Transformed Averages or Current Model Values, the lower half of the
window is used for Model Input Selection, and lists all input variables in the model, as
shown above (however, for a FANN model, only the Independent variables are
analyzed, so the Dependent variables are not listed; for more information, see “Model
Types and Variable Types” on page 8-2). If the Source is Raw Analysis Values, the
lower half of the window is used for Raw Variable Selection, and lists all raw variables
in the before-transforms view of the dataset.

In each case, you can select one or more variables to be stepped. (If you don’t select any,
they will all be selected automatically.) It is possible to select any raw variable in the
dataset, but for the purpose of this analysis, it only makes sense to select a variable that
is an input to the model, either directly or through transforms.
Every variable that is not being stepped is held at a constant value. When the Source is
Transformed Averages, the constant is each model input variable’s average value (in the
dataset on which the model was trained). When the Source is not Transformed
Averages, the Initialize menu in the menu bar is used to set values for all variables that
are not stepped.
• When the Source is Current Model Values, the constant is each model input
variable’s Current Value. The Current Values of the variables can be changed
whenever you run the model, so they will vary depending on what you have done
with the model most recently. They can be set by selecting Model Inputs from the
Initialize menu.
• When the Source is Raw Analysis Values, the constant is each raw variable’s
Analysis Value; the Analysis Value can be set in the Preprocessor Properties
window (see “Variable Properties” on page 5-43), or by selecting Raw Variables
from the Initialize menu.

Model Input Editor
When you select Model Inputs from the Initialize menu, the Model Input Editor is
invoked.

This window is used to set the Current Values for model input variables. Each model
input variable is listed by name and time delay (tau), with a slider bar ranging from the
minimum to the maximum of its Variable Bounds, and a text field containing its Current
Value. You can change any Current Value either by dragging on the slider bar, or by
typing in the text field. The Reset to Averages button will change every variable’s
Current Value back to its average value in the dataset on which the model was trained.

The File menu contains two entries, Save Model and Copy Model. When you save a
model, its Current Values are saved with it. If you specify a set of Current Values that
you want to retain permanently with the model, you can save it under its current name
with Save Model, or save it under a new name with Copy Model, which operates as
described in “Save Model and Save Model As” on page 8-63.

Raw Table Editor


When you select Raw Variables from the Initialize menu, the Raw Table Editor is
invoked.
The left side of this window displays a scrolled area with each raw variable’s Tag Name,
Minimum, and Maximum of its AVR (except that string variables have no min and
max), and a text box containing the variable’s Analysis Value.

For numeric variables, there is also a slider that can be used to change the Analysis
Value, as an alternative to typing it. The AVR was specified in the preprocessor when
the dataset was created. To specify a set of raw values, type in the text boxes or drag the
sliders. To change the Analysis Values of all numeric variables back to the midpoint
value of their AVR, click Reset to Midpoint.

The right side of this window is used to display the model input variable Current Values
that will result from applying the dataset’s transform list to a configuration of raw
variable Analysis Values.

It displays the name of every model input variable, with its Current Value and minimum
and maximum Variable Bounds. After you set Analysis Values for the raw variables,
you can click Apply; the dataset’s transform list will be applied to the Analysis Values
of the raw variables, generating transformed values for all variables in the dataset.
These transformed values become the Current Values of the model input variables. (If
any variable occurs in the model with multiple time delays, this process causes the
Current Value at every time delay to be the same, so the model input variables are each
listed only once, without time delay information.)
However, if any raw variable’s AVR has not been set to appropriate values, it is possible
to assign to it an Analysis Value which, after the transform list has been applied to it,
results in a transformed value that is outside the range of the model’s Variable Bounds.
If you get an error message stating that a variable is out of its allowed range and that you
need to change the AVR to put the variable back within the variable bounds, you must
look at the transforms to see which raw variables are inputs to that model variable (the
only input may be the variable itself), and be sure to set a valid AVR for each of those
raw variables. You can change them from the AVR menu, which operates as described in
“AVR Menu” on page 5-59; for more information about AVR and Analysis Value, see
also “Before Transforms Properties: Analysis Variable Range” on page 5-45.
The File menu contains entries Save Dataset and Save Dataset As. If you change any
Analysis Values or Analysis Variable Ranges, you can save the changes permanently
with the dataset. Save Dataset writes the changes into the current dataset; Save Dataset
As invokes the Save Dataset dialog (described in “Save Dataset and Save Dataset As”
on page 5-50) so you can save the changed dataset under a different name.
After you have finished setting Analysis Values, click Done to close this window.

Initiating the Output vs. Percent Analysis


After you have selected one model output, the analysis source (Transformed Averages,
Current Model Values, or Raw Analysis Values), and at least one variable to be stepped,
click Run to run the analysis.
If you run from Current Model Values or Raw Analysis Values, and you provide input
values that are outside the variable bounds, you will see one or more error messages.
The Output vs. % plot appears when the analysis is complete.

Output vs. % Plot
The Output vs. % plot appears when the Output vs. Percent analysis is complete.

This window displays the results for one model output variable, showing how the output
is affected by selected variables as they move through their range of values. Each line in
the plot corresponds to one of the selected variables, either a model input variable, or a
before-transforms raw variable, depending on the Source that you selected in the Output
vs. % Selection window. As a given variable is stepped through its range of values, the
other variables are held at fixed values. Each point on the line corresponds to one value
through which the variable was stepped. The Y axis is the output variable’s calculated
value. If plotting multiple inputs, the X axis is the percent of the stepped variable’s
range, from 0 to 100. If plotting one input, the X axis is the input’s value. If Source was
Transformed Averages or Current Model Values, this range is the variable bounds; if
Source was Raw Analysis Values, this range is the analysis variable range. If the error
conditions described on the previous page cause any points to be invalid, those points
are plotted with a large colored dot; any other points on the plot can still contain useful
information.
To display the coordinates of any point, move the mouse onto the point and press and
hold the mouse button.

The display shows the name of the input variable (and time delay, if it is a transformed
model variable), name and time delay of the output variable, value of the output, and
percent of the input’s range.
The reporting and printing options are described in “Generating Reports and Data Files”
on page 10-2.

Selecting Variables
The Select Inputs button invokes the Output vs. % Model Input Selection window, or
the Plot Variables Selection window, depending on whether you stepped model input
variables or raw variables. In either case, the variables listed are only the ones that you
have already calculated; this selection is to determine which calculated variables are to
be displayed.

Changing the Output Response


If your model’s output response does not reflect what you know to be true about the
process, consider defining enforced gain constraints for the model and then retraining it.
For more information on gain constraints and related concepts, see “Gain Constraints
and Extrapolation Training” on page 8-34.

Sensitivity vs. Percent
Sensitivity vs. Percent analysis is similar to Output vs. Percent: calculations are made
by stepping one model variable through its range while holding all other variables at a
fixed value, and passing each resulting pattern through the model; the difference is that
in Output vs. Percent analysis we calculate the value of the output variable at each step,
but in Sensitivity vs. Percent we calculate the partial derivative. Thus the Sensitivity vs.
Percent plot of an input variable is the derivative of the Output vs. Percent plot for that
same input.
For each input variable, the analysis generates the stepped input values and evaluates
the partial derivative at each of those values. A partial derivative may be defined as
an ordinary derivative with the other independent variables held constant:

\[
\frac{\partial}{\partial x_k}\, o(x) \;\equiv\; \frac{d}{d x_k}\, o(x)\,, \qquad
(x_1, x_2, \ldots, x_{j \neq k}, \ldots, x_{N_{\text{in}}}) = \text{constant}
\]

that is, the other $x_j$ are held constant for $j \neq k$.
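
Numerically, such a partial can be pictured as a central finite difference around the
current point. This is only a generic sketch; the product's internal differentiation
method is not documented here, and model is a hypothetical callable:

def partial(model, x, k, output_index, h=1e-4):
    """Central-difference approximation of d(output)/d(x_k) at x,
    with every other input held constant."""
    x_hi = list(x); x_hi[k] += h
    x_lo = list(x); x_lo[k] -= h
    return (model(x_hi)[output_index] - model(x_lo)[output_index]) / (2.0 * h)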


As for an Output vs. Percent analysis, the calculations are not generated from values
from the current dataset (although a dataset must be loaded before the model can be
loaded). The steps are made at intervals of 5% of the range of the Variable Bounds. Only
one output variable is considered. Only one variable is stepped at a time; if you select
more than one variable to be stepped, the first one is stepped while the others are held
constant, and then the second one is stepped while the others are held constant, and so
on.

Note: Sensitivity vs. Percent is intended for steady-state analysis. If
any variable is specified in the model with multiple time delays, this
analysis considers each time delay to be an independent variable,
stepped independently of that same variable's value at any other time
delay.

The major operational difference between Sensitivity vs. Percent and Output vs. Percent
is that only the model input variables can be stepped. Each selected variable is stepped
within its Variable Bounds; each variable not currently being stepped is held constant,
either at its average value, or at an arbitrary Current Value that you set.

Sens. vs. % Selection Window


Invoke the Sensitivity vs. Percent Selection window by selecting Tools > Model
Analysis > Sensitivity vs. Percent.

This window is used to select one output variable and one or more model input variables
for the Sensitivity vs. Percent analysis. If you do not already have a dataset and model
loaded, you must load them using the File menu (described in “Pull-Down Menus” on
page 9-2).
The Model Output Selection lists all of the output variables in the model, with their time
delays. Select any one by clicking on it. The first one in the list is selected by default.
The report file shows the initial value and Variable Bounds of every model variable; and
then, for the variables that were stepped, the sensitivity value at each step. The complete
set of values that are run as the model patterns is available in the data file.
The Model Input Selection lists all of the input variables in the model, with their time
delays. You can select any one or more of them to be stepped. If you don’t select any
variable to be stepped, they will all be selected automatically. Every variable that is not
being stepped is held at a constant value; the Source menu controls how the constants
are specified.
When the Source is Transformed Averages, the constant is each model input variable’s
average value (in the dataset on which the model was trained). When the Source is
Current Model Values, the Initialize menu in the menu bar is used to set values for all
variables that are not stepped.
The Model Inputs selection in the Initialize menu invokes the Model Input Editor,
described on page 10-30.

Initiating the Sensitivity vs. Percent Analysis


After you have selected one Model Output, the Source, and at least one Model Input to
be stepped, click Run to run the analysis. The Sensitivity vs. % plot appears when the
analysis is complete.

Sensitivity vs. % Plot


The Sensitivity vs. % plot appears when the Sensitivity vs. Percent analysis is complete.

This window displays the results for one model output variable. Each line in the plot
corresponds to one model input variable whose values were stepped. Each point on the
line corresponds to one value through which the variable was stepped. The Y axis is the
output variable’s sensitivity to the input; the X axis is the percent of the variable’s range,
from 0 to 100. This range is the Variable Bounds.

To display the coordinates of any point, move the mouse onto the point and press and
hold the mouse button.

The reporting and printing options are described in “Generating Reports and Data Files”
on page 10-2.

Selecting Variables
The Select Inputs button invokes the Sens. vs. % Model Input Selection window. The
variables listed are only the ones that you have already calculated; this selection is to
determine which calculated variables are to be displayed.

Changing the Sensitivity


If your model’s sensitivities do not reflect what you know to be true about the process,
consider defining enforced gain constraints for the model and then retraining it. For
more information on gain constraints and related concepts, see “Gain Constraints and
Extrapolation Training” on page 8-34.

11 What Ifs

• What Ifs Main Window, page 11-1
• Setpoint Editor: Inputs, page 11-20
• Setpoint Editor: Outputs, page 11-23
• What Ifs Checklist, page 11-27

In a What If, or predict outputs, study, you specify a value for every input variable in
your model, and the model predicts values for all of the output variables.
Use the What Ifs tool after you have built, trained, and analyzed your model according
to the instructions in Chapter 8, Building a Model, Chapter 9, Model Trainer, and
Chapter 10, Model Analysis Tools.
To invoke the What Ifs tool, select Tools > Model What Ifs.

What Ifs Main Window


The initial state of the What Ifs window depends on whether you already have a dataset
and model loaded, and if so, what model type it is.
If no model is loaded, the central part of the window is blank. Load a dataset and model
now using the operations in the File pull-down menu.
When a dataset and model are loaded, all variables in the model appear on the screen.

The variables are grouped as Inputs and Outputs.

If a FANN model is loaded, the variables are grouped as Independents, Initial values of
Dependents, Outputs, and Predicted values of Dependents.

Display Components
The display for each variable consists of its name and tau (time delay), Utilities button,
setpoint display bar, numeric display, and Action Menu Selection region.

Any variable that you assigned multiple time delays in the Build Model window appears
once for each of its taus, as described in “Variables in the Model” on page 8-30. If a
variable name does not fit in its display area, move the mouse onto it and press the
mouse button to display the complete name. You can also see the complete name in the
positional help area (at the bottom of the window) when the mouse is over the name.
Below each variable’s name is its Utilities button, which you can drag to display its
Utilities menu. For Independent inputs in a FANN model, and for all inputs in other
models, the Utilities menu has a submenu for Clamping.
Utilities menu for independent inputs:

Utilities menu for dependent inputs and outputs:

Input Setpoint Editors are described on page 11-20; Output Setpoint Editors are
described on page 11-23; Stripcharts are described on page 11-17; and Clamping is
described on page 11-21.
To the right of each variable listing is the setpoint display bar (“setpoint widget”). This
display bar is your visual key to the behavior of the system. It is a graphical interface
that allows you to set initial and desired values, and other parameters that control the
calculations. When the mouse is over any indicator in this area, its name and value are
displayed in the positional help at the bottom of the window. You can change these
values by dragging on the indicators. While you drag, a popup shows the parameter’s
name and value. As calculations are made and a variable’s value is changed, the
indicators move to display the new value. All of the values that can be displayed and set
in this display bar can also be set in the Setpoint Editor, so they are described in
“Setpoint Editor: Inputs” on page 11-20 and “Setpoint Editor: Outputs” on page 11-23.
The Setpoint Editor for each variable is accessed from its Utilities menu, as shown
above.
To the right of each setpoint display bar is a numeric display that shows a value for each
model variable. When you have just been using the graphical interface to change values
of components of the setpoint display bar, the value that you set most recently is
displayed here. After you set up and initiate a calculation, the calculated final value is
displayed here.
The numeric display and the area behind it are also used as the Action Menu Selection
region to select and unselect model variables; a highlight in this region indicates that the
variable is selected for the Action menu. All selected variables are affected by any
actions that you take using the Action menu in the menu bar, described on page 11-6. To
select a variable, click on its Action Menu Selection region; to select multiple
contiguous variables, drag on them; to select additional noncontiguous variables, make
an original selection, then control-click or control-drag on the additional variable(s). To
unselect a variable, control-click on it. To unselect all variables in a section of the
screen, click the Clear Selection button in that section. The setpoint widget below shows
the selected variable highlighted at the right:

Menu Bar
The menu bar provides the following pull-down menus and option menus:
File Pull-Down Menu page 11-6
Action Pull-Down Menu page 11-6
Edit Pull-Down Menu page 11-6
Mode Option Menu page 11-6

Patterns Option Menu page 11-7
Source Option Menu page 11-7

File Pull-Down Menu


The File menu is the same as the common File menu described on page 9-2, with the
addition of the Save Model command.
Values for parameters and constraints are saved with a model, so you can change them
here in the Setpoints & What Ifs system and save the model with the changed values.

Action Pull-Down Menu


The Action menu provides shortcuts for applying certain Setpoint Editor selections to
multiple variables concurrently, without having to open the Setpoint Editor.
To use the Action menu, first select the variables to which the Action will be applied, by
clicking or dragging on each variable’s Action Menu Selection region (to the right of the
setpoint display bar), as described on page 11-5. Then select an operation from the
Action menu.
The menu provides the following operations:
Open Stripchart
See “Stripcharts” on page 11-17.
Clamping operations
See “Values, Clamping, Priority” on page 11-21.
Confidence
See “Confidence” on page 11-25.
Error Computation
See “Error Computation” on page 11-22 and “Error Computation” on page 11-25.

Edit Pull-Down Menu


The Raw Table Editor, invoked from the Edit menu, is described in “Raw Table Editor”
on page 10-31. It is useful in a Setpoints & What Ifs analysis when you set the Source
menu (described in “Source Option Menu,” below) to Raw Analysis Values.

Mode Option Menu


Select Predict Outputs for prediction (“What If” scenarios).

Patterns Option Menu


The Patterns menu is used in conjunction with the Source menu, described below.
If the Source is Transformed Dataset, use the Patterns menu to select whether to use all
patterns in the dataset, or only the Testing, Training, or Validation patterns (apportioned
when you built the model, as described in “Setting Patterns” on page 8-48). The default
is All.

Note: If the model was saved with a filter (described in "Filtering" on
page 8-49), and you applied the filter when you most recently loaded
the model, then the designation "All" patterns should be understood
to mean "All patterns that remain after filtering". To disable a filter
that you have already applied to a model, you must load the model
again.

If Source is not Transformed Dataset, you should set Patterns to All.

Source Option Menu


The Source menu is used to control how you specify the values of input variables to run
through the model. You can use all patterns in the current dataset, or you can create just
one pattern and use it. That one pattern can be defined by assigning values directly to
the model input variables, or to the raw variables in their Before Transforms state.
If you assign values to the raw variables in their Before Transforms state, the dataset’s
Transform List is then automatically applied, to produce a set of transformed values that
is fed into the model. Depending on the values that you assign to the raw variables, it is
possible to produce a set of transformed values that are outside the range of the model’s
Variable Bounds, and thus cannot result in reliable predictions.
Note that the values specified in the Source are not necessarily the values that are run
through the model; values from the Source can be modified by constraints and by
clamping before they are input to the model. For information on constraints and
clamping, see “Values, Clamping, Priority” on page 11-21 and “Constraints” on
page 11-22.
The Source option can have one of the following values:
Transformed Dataset
Input values are taken from the current dataset. You can either Step through the
patterns one at a time, or you can Run through them sequentially. The Patterns
menu, described above, can be used to select a subset of the dataset.

Transformed Averages
Inputs to the model are the average of each variable’s transformed values that were
used to train the model.
Current Screen
Inputs to the model are the values that you set as each variable’s Initial Value, either
in the setpoint display bar here on the Setpoints & What Ifs window, or in the
Setpoint Editor. When you select this source setting, the Update Initial button
appears next to the source option menu. Click Update Initial to set the initial values
of inputs to the current values.

Raw Midpoints
One pattern of inputs is generated by applying the dataset’s Transform List to the
midpoint of each raw variable’s Analysis Variable Range (which was specified in
the Preprocessor, and can be changed in the Raw Table Editor; for more
information, see “Raw Table Editor” on page 10-31).
Raw Analysis Values
One pattern of inputs is generated by applying the dataset’s Transform List to the
raw variables’ Analysis Values. The Analysis Values were originally set in the
Preprocessor, and can be changed using the Raw Table Editor. The Edit menu in the
menu bar can be used to open the Raw Table Editor.
DCS
This option is not supported in the Insights system.

Note: Source Transformed Averages and Source Current Screen
bypass the dataset's Transform List and see each model variable
independently, so any interrelationships defined by transforms among
the model variables are not handled automatically. For example, if the
Before Transforms dataset included a raw variable X, and you created
a new computed variable 2X=2*X, then, if you use Current Screen and
set a value for X, the corresponding value for 2X is not set
automatically; you have to calculate and set it by hand. Both Raw
Analysis Values and Raw Midpoints use the dataset's Transform List,
so they automatically preserve the interrelationships defined by
transforms among the model variables.

Other Controls
Continuous Update
The Continuous Update toggle button is useful when Source is Transformed Dataset
and you use the Run button (described below) to process many patterns sequentially
(rather than stepping one pattern at a time).

If Continuous Update is turned off, the display on this window is not updated until the
run is complete (either you reach the end of the dataset, or you click the Stop button); if
Continuous Update is on, the display is updated at every pattern. Using Continuous
Update will significantly increase the processing time.

Edit
The parameters that appear in the setpoint display bars are explained in “Setpoint
Editor: Inputs” on page 11-20 and “Setpoint Editor: Outputs” on page 11-23. You can
change any of these parameters by dragging on its indicator in the setpoint display bar.
When you do so, its name and value appear in the Edit area.

You can fine-tune a parameter value by typing in the Edit area and pressing the Return
key.

Report
The Reporting button invokes the Setpoint Reporting Parameters window.

This window is used to specify reporting options. The label on the Reporting button in
the Setpoints & What Ifs main window changes to indicate whether you have turned
reporting on or off in this window.
You can write a Report File containing Summary information only, or information on
the model Outputs by Pattern, or information on the model Inputs and Outputs by
Pattern. The default filename is displayed; to use a different file or directory, click in the
box, type its name, and press the Return key while the mouse is still in the box.
You can write a Data File at any time, but generally it is most useful only when Source
is Dataset (see “Source Option Menu” on page 11-7). For every pattern in the dataset, it
writes a record containing the initial and final values of every input variable, and the
predicted and original values of every output variable. A single record at the top of the
file identifies each variable. This file can then be formatted and read into a dataset. The
default filename is displayed; to use a different file, click in the box, type its name, and
press the Return key while the mouse is still in the box.
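
Because the data file has one identifying record followed by one record per pattern,
it is straightforward to post-process outside the product. A hedged Python sketch,
assuming whitespace-delimited numeric columns (the actual delimiter and column
naming may differ):

def read_whatifs_datafile(path):
    """Read a What Ifs data file: a header record naming each column,
    then one record of values per pattern."""
    with open(path) as fh:
        rows = [line.split() for line in fh if line.strip()]
    header, records = rows[0], rows[1:]
    return [dict(zip(header, map(float, rec))) for rec in records]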

After the calculations have been made, you can click Show Report to invoke the Report
window, to display the Report file. You can also view this file with the editor
(Chapter 3, File Editor), and you can view or print it using your system resources.

Report File
All report files begin with the common information listing of all constraints and other
parameters for all variables.
If Mode is Predict Outputs and Source is not Transformed Dataset:
• The Summary and Outputs by Pattern reports both show the predicted and actual
(original) values, and Relative Error, for each output variable.
• The Inputs and Outputs by Pattern report includes the initial and final values for
input variables.
If Mode is Predict Outputs and Source is Transformed Dataset:
• The Summary report shows the Relative Error and R2 for each output variable, as a
composite over the entire dataset.
• The Outputs by Pattern report includes the predicted and actual (original) values,
and Relative Error, for each output variable at each pattern in the dataset.
• The Inputs and Outputs by Pattern report includes the initial and final values of
each input variable, at each pattern.
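
For reference, both summary statistics can be computed from predicted and actual
values over the patterns. A Python sketch using standard definitions (the product's
exact Relative Error convention is not spelled out here; taking it as sqrt(SSE/SST)
is an assumption):

import numpy as np

def summary_stats(predicted, actual):
    """Composite R2 and Relative Error over a set of patterns."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    sse = np.sum((actual - predicted) ** 2)    # sum of squared errors
    sst = np.sum((actual - actual.mean()) ** 2)
    r2 = 1.0 - sse / sst
    rel_err = np.sqrt(sse / sst)               # assumption: sqrt(1 - R2)
    return rel_err, r2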

View
The View button invokes the Setpoint View dialog. There are two versions of this
dialog, depending on whether the model is a FANN model. For a FANN model, the
dialog looks like this:

For other model types, the dialog looks like this:

The toggle switches control whether some of the setpoint parameters that you set are
visible in the setpoint display bars on the screen. (All parameters that you set are
operational, regardless of whether they are visible, but they cannot be changed on the
main screen if they are not visible there.) Settings are made separately for each type of
variable. When Mode is Predict Outputs, you cannot make Desired Value or Range
visible.
If you turn on Confidence Interval, the dataset is checked; if the current dataset is not
the one on which the model was trained, it may be impossible to compute the
confidence matrix. It is safest to generate the confidence matrix only with the dataset on
which the model was trained. Next, you will be warned that it may take several minutes
to initialize, and you will have the opportunity to cancel. This is a one-time initialization
for each model, which usually takes about twice the amount of time as to do a run
through all patterns in the dataset. The confidence information is saved permanently
with the model. If you turn Confidence Interval on, you can change the reporting
interval for each output variable in its Setpoint Editor (they all default to 90%), or you
can change it for one or more variables using the Action menu. For more information,
see “Confidence” on page 11-25.

Stop, Step, Run, Current Row, Run Speed

If Source is any selection except Transformed Dataset, the Step button is used to
make calculations; Stop, Run, Run Speed, and Current Row are ignored.
If Source is Transformed Dataset:
The Step button is used to make calculations for one pattern at a time. The Run button is
used to initiate continuous calculations through all selected patterns in the dataset. The
Stop button stops a continuous calculation. The Run Speed slider controls relative run
speed (slowest at the left, and fastest at the right); slow is useful if you have Continuous
Update turned on and want to view the step-by-step changes in values.
The Current Row number in the dataset is displayed; to move to a different row number
(for example, to move back to the beginning of the dataset), click in the box and type in
a new number. Since Current Row displays the row number that has just been
processed, if you type in a number and click Step, the row that is processed will be the
next valid row after the number that you typed. If you have selected Train, Test, or
Validation in the Patterns menu, or if there are breaks in the data, additional rows are
skipped.
When Current Row is 0, the “Current Row” label is grayed out, and you cannot change
to any other row until you Step or Run the model to move to an active row. Current Row
is set to 0 when you initially load a model, or any time you type in a row number that is
out of range.

Note: The examples below assume that you used the default Test Set,
such that the first 15 of every 100 rows are reserved for Testing, and
the other 85 rows are used for Training; and that you did not use a
Validation set. If you did not use the default Test Set, or if you did use
a Validation set, the examples will change accordingly. The dataset in
the example has 300 rows.

If you select Train in the Patterns menu, you can process only dataset rows that are part
of the Training Set (that is, 16-100, 116-200, 216-300). If you type a row number that is
at the end of one block of Training rows, the next row processed will be the first row in
the next Training block; for example, if you type 100, the next step will be row 116. If
you type a row number that is outside the range of the Training Set (any number less
than 16 or greater than 300), it will be changed to 0, and the next step will be the first
Training row (16). If you type a row number that is within the range of the Training Set
but is not a Training row, it will automatically be changed to the next valid Training row
number, and when you step, the following row will be processed; for example, if you
type 112, it will be changed to 116, and the next step will process row 117. Behavior for
these cases, and for comparable cases when you select Test in the Patterns menu, is
summarized in the table below. Boundaries around breaks in the data are handled
similarly.

[Diagram: dataset rows 1-15, 101-115, and 201-215 form the Test Set;
rows 16-100, 116-200, and 216-300 form the Training Set.]

Patterns   Row that   Current Row after   Row Processed when
Menu       You Type   You Press Return    You Click Step
All        0          0                   1
All        1          1                   2
All        300        300                 1
Test       0          0                   1
Test       1          1                   2
Test       15         15                  101
Test       16         101                 102
Test       101        101                 102
Test       215        215                 1
Test       300        0                   1
Train      0          0                   16
Train      1          0                   16
Train      16         16                  17
Train      100        100                 116
Train      112        116                 117
Train      116        116                 117
Train      300        300                 16
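
The table follows a consistent rule: a typed number outside the selected pattern
set's overall range resets Current Row to 0; a number inside the range snaps forward
to the next valid row; and Step processes the first valid row after Current Row,
wrapping past the end of the dataset. A Python sketch of that rule for the default
apportionment (hypothetical helper names; 300 rows, no Validation set, no data
breaks):

def is_valid(row, patterns, n_rows=300):
    """Default apportionment: the first 15 of every 100 rows are Test."""
    if row < 1 or row > n_rows:
        return False
    in_test = (row - 1) % 100 < 15         # rows 1-15, 101-115, 201-215
    return {"All": True, "Test": in_test, "Train": not in_test}[patterns]

def press_return(typed, patterns, n_rows=300):
    """Snap a typed row number to Current Row, as in the table above."""
    valid = [r for r in range(1, n_rows + 1) if is_valid(r, patterns)]
    if typed < valid[0] or typed > valid[-1]:
        return 0                           # out of range: reset to 0
    return min(r for r in valid if r >= typed)

def click_step(current, patterns, n_rows=300):
    """Process the next valid row after Current Row, wrapping at the end."""
    for r in list(range(current + 1, n_rows + 1)) + list(range(1, n_rows + 1)):
        if is_valid(r, patterns):
            return r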

Stripcharts
You can display two styles of stripchart plots: each variable can be in a separate
window, or all selected variables can be in a single scrolled window. Be aware that
updating stripcharts requires considerable resources and will significantly slow the
calculations.
To display stripcharts in individual windows, select Stripchart in each variable’s
Utilities menu.

The stripchart appears in a separate window.

Plot lines will appear in the stripchart the next time you step or run the model.

To display stripcharts in a single scrolled window, select the variables (by clicking in the
Action Menu Selection area, where the numeric value is displayed) and then select
Open Stripchart from the Action menu in the menu bar.

The stripcharts appear together in a separate window.

The Update toggle button on each plot, and the Update All toggle button at the top of
the window, work together to control whether plots are updated continuously: if Update
All is off, nothing is updated, regardless of the state of the individual Update buttons;
if Update All is on, the individual Update buttons control updating on each plot. The
stripcharts within this scrolled window cannot be resized vertically.
For Input variables, the Initial and Final values (see “Values, Clamping, Priority” on
page 11-21) are plotted; for Output variables, the Original and Predicted values are
plotted.

Note: There is no limit to the number of variables that you can display
in individual stripchart plots, but most window manager systems limit
the number of windows that you can have open simultaneously. If you
attempt to exceed this system limitation, you may crash your
machine.

Setpoint Editor: Inputs
The Setpoint Editor for any variable is invoked when you drag on its Utilities menu and
select Setpoint Editor. This window displays information about input variables, and can
be used to set parameters for a Setpoint or What If analysis. For output variables and
predicted values of Dependent variables, see “Setpoint Editor: Outputs” on page 11-23.
Some settings are not applicable to Initial values of Dependent inputs in a FANN model,
so their fields are omitted or grayed out.

The upper left region of the Setpoint Editor contains the variable’s tag name and time
delay, and an enlarged view of the setpoint display bar with its min and max scaling
values. The setpoint display bar is a graphical interface that allows you to set initial and
desired values, and other parameters that control the calculations. Values of these
parameters can be changed by dragging on their indicator (either here in the Setpoint
Editor, or in the Setpoints & What Ifs main window), or by typing in the text boxes
provided in this window. Changes that you make in this window do not appear in the
Setpoints main window until you click Apply or Done. Scaling can be changed to any
values that you wish.


Move the mouse over any of these indicators, and its name and value will be displayed
in the positional help area at the bottom of the window.
If any variable in the dataset is specified in the model with multiple time delays, any
changes that you make to Hard Constraints or Rate Constraints are applied to all time
delay instances of that variable.

Values, Clamping, Priority


If Source is Current Screen, Initial Value is used before you run a pattern through the
model, to assign an initial value to the variable. If Source is any value other than Current
Screen, Initial Value is used after you run a pattern through the model, to display the
initial value that came from the Source. Initial Value is represented by a red triangle in
the top half of the setpoint display bar.
When Mode is Predict Outputs, you specify an Initial Value for every input variable in
your model, and the model calculates predicted values for all of the output variables.
Final Value is used for display only, and cannot be changed. When Mode is Predict
Outputs, Final Value shows the input variable’s value that was actually used to calculate
the output prediction; this can differ from the Input Value from the Source if you have
applied Clamping (described below). Final Value is represented by a blue bar bisected
by a vertical red line.
Clamping is available for all input variables except Initial Dependents in a FANN
model. Clamping selections are Clamp to Screen Value, Clamp to Current (Disturbance)
Value, and No Clamping.
When Mode is Predict Outputs, Clamp to Screen Value can be used to override the input
values found in the Source and instead use a value that you specify by dragging the
Initial Value on the screen. This has no effect if Source is Current Screen (when all
values are taken from the screen anyway). If you Clamp to Screen Value, then drag the
Initial Value to a new position, the clamped value moves with it. When Mode is Predict
Outputs, Clamp to Current (Disturbance) Value has no effect.
Clamping can be set here in the Setpoint Editor, or in the Setpoints & What Ifs main
window using each variable’s Utilities menu or using the Action menu in the menu bar.
When clamping has been applied, it is represented by two small blue triangles bounding
the Final Value indicator. For Screen Value clamping, they point inward toward the
Final Value; for Current Value clamping, they point outward from the Final Value.

Priority is ignored when Mode is Predict Outputs.

Statistics
The variable’s statistics in the model and in the current dataset are displayed for
information only.

Constraints
Constraints have no effect in Predict Outputs mode.

Error Computation
Error computation has no effect on Predict Outputs.

Confidence
Confidence appears in the Setpoint Editor, but is not currently operational for any input
variables.


Control Buttons
Cancel closes the Setpoint Editor window without applying any changes. Apply applies
the changes without closing the window. Done applies the changes and closes the
window.

Setpoint Display Bar: Inputs

The setpoint display bar indicators for inputs are:
• Initial value (red triangle)
• Final value (blue bar)
• Hard constraints (red-striped regions)
• Rate-of-change constraints (yellow bars)
• Fuzzy constraints (orange-striped regions)
• Desired range (light green region)
• Desired value (green bar)

Setpoint Editor: Outputs


The Setpoint Editor for any variable is invoked when you drag on its Utilities menu and
select Setpoint Editor. This window displays information about output variables and can
be used to set parameters for a Setpoint or What If analysis. For input variables, see
“Setpoint Editor: Inputs” on page 11-20.

The upper left region of the Setpoint Editor contains the variable’s tag name and time
delay, and an enlarged view of the setpoint display bar with its min and max scaling
values. The setpoint display bar is a graphical interface that allows you to set original
and desired values, and other parameters that control the calculations. Values of these
parameters can be changed by dragging on their indicator (either here in the Setpoint
Editor, or in the Setpoints & What Ifs main window), or by typing in the text boxes
provided in this window. Changes that you make in this window do not appear in the
Setpoints main window until you click Apply or Done. Scaling can be changed to any
values that you wish.
Move the mouse over any of these indicators, and its name and value will be displayed
in the positional help area at the bottom of the window.

Values
Original Value is used after you run a pattern through the model, to display the original
value that came from the Source. If Source is Current Screen, Original Value has no
meaning. Original Value is represented by a red triangle in the top half of the setpoint
display bar.
Final Value is used for display only, and cannot be changed. When Mode is Predict
Outputs, Final Value is the predicted output. Final Value is represented by a blue bar
bisected by a vertical red line.

Statistics
The variable’s statistics in the model and in the current dataset are displayed for
information only.

Constraints
Fuzzy constraints have no effect when Mode is Predict Outputs. Hard Constraints and
Rate Constraints are not operational for any output variables.

Error Computation
Error computation for output variables has no effect on Predict Outputs.

Confidence
Confidence is operational only for output variables. Confidence can be controlled in the
Setpoint Editor only if it has already been turned on in the Setpoint View Parameters
dialog, described on page 11-12.
The Confidence range is based on the same concept as the standard Student's t-statistic.
It means that there is a certain confidence, or percentage, that the true process
output will fall within the given interval or “error bars” that bracket the model's
predicted output. For example, if the Confidence selection is 50%, the error bars are
typically relatively small, and there is a 50/50 chance that the true output will fall
within the interval around the predicted output. For the same prediction, if we
selected a Confidence of 99.9%, the error bars would be relatively larger, and we
would have a 99.9% probability that the true output is within the error bars of the
predicted output.



Before running or stepping the model, drag on the Confidence Selection option menu to
select the percent confidence that you want to calculate and display (or you can use the
Action menu to set the Confidence Selection for a group of variables).

After the run or step is completed, the setpoint display bars are updated to display a
black horizontal bar that indicates the lower and upper limits of the Confidence range at
the selected percentage, and the size of the range is displayed in the Setpoint Editor;
these values are positive numbers interpreted as a range around the calculated result.
Most commonly, both of these numbers will be the same. A narrow range generally
corresponds to a lower confidence selection; a wider range generally corresponds to a
higher confidence selection.


If you change the Confidence percentage selection after stepping the model, it has no
effect on the displayed confidence range; you must run or step the model in order to
calculate the confidence range for the changed percentage selection.

Control Buttons
Cancel closes the Setpoint Editor window without applying any changes. Apply applies
the changes without closing the window. Done applies the changes and closes the
window.

Setpoint Display Bar: Outputs

The setpoint display bar indicators for outputs are:
• Initial value (red triangle)
• Final value (blue bar)
• Confidence ranges (black bars)
• Fuzzy constraints (orange-striped regions)
• Desired range (light green region)
• Desired value (green bar)

What Ifs Checklist


This section summarizes how to use all the pieces of the What Ifs system to perform an
analysis. It assumes that you have read in detail all What Ifs information in this chapter.



To Predict Outputs
When Mode is Predict Outputs, you specify a value for every input variable in your
model, and the model calculates predicted values for all of the output variables. You can
make calculations for each pattern in a dataset, or for a single specific pattern.

This checklist covers two cases: predicting outputs from a dataset (Source
Transformed Dataset), and predicting outputs from a single pattern (Source
Transformed Averages, Current Values, Raw Midpoints, or Raw Analysis Values).
Steps that apply to only one case are labeled Dataset or Single Pattern.

1. Load a dataset and a model.
Dataset: The dataset does not have to be the one on which the model was trained,
but it has to include variables that have the same names, and the dataset mean and
standard deviation should not differ from those of the training dataset by more
than about 5%.
Single Pattern: The dataset does not have to be the one on which the model was
trained, but it has to include variables that have the same names. The values
contained in this dataset are not used.

2. Set Mode to Predict Outputs.

3. All Constraints of every type are ignored in a Predict Outputs analysis. You may
wish to open the Setpoint View Parameters window and make the Constraints
invisible.

4. If you wish to display error bars around any of the predicted output values, open
the Setpoint View Parameters window and turn on Confidence. Then use the
Setpoint Editors or the Action menu to specify the percent confidence that you
wish to have calculated. Confidence calculations are always optional.

5. Consider the values that you want to use for the input variables, and set the input
Source.
Dataset: Set Source to Transformed Dataset, and make a selection from the
Patterns menu.
Single Pattern: Select a Source; and in all cases, set Patterns to All.
• If Source is Transformed Averages, inputs to the model are the average of each
variable's transformed values that were used to train the model.
• If Source is Current Screen, inputs to the model are the values that you set as
each variable's Initial Value, either in the setpoint display bar on the Setpoints &
What Ifs window, or in the Setpoint Editor.
• If Source is Raw Midpoints, one pattern of inputs is generated by applying the
dataset's Transform List to the midpoint of each variable's Analysis Variable
Range. The AVR was specified in the Preprocessor, and can be changed using
the Raw Table Editor, which is accessed from the Edit menu in the menu bar,
and is described on page 10-31.
• If Source is Raw Analysis Values, one pattern of inputs is generated by applying
the dataset's Transform List to the raw variables' Analysis Values. The Analysis
Values were originally set in the Preprocessor, and can be changed using the
Raw Table Editor.

6. If you want to override any variable's values from the specified Source and instead
use a fixed, constant value: in the setpoint display bar set its Initial value, and then
set Clamp to Screen Value. You can set Initial values in the Setpoints & What Ifs
main window or in the Setpoint Editor, and you can set clamping in the Utilities
menu, the Action menu, or the Setpoint Editor.
Note that clamping has no effect if Source is Current Screen. Clamp to Current
(Disturbance) Value is not used in Predict Outputs.
If Source is Transformed Dataset and you do not set any clamping, this analysis is
identical to running Predicted vs. Actual (described on page 10-2).

7. Turn on any stripchart plots that you may want to view, using the Action menu or
each variable's Utilities menu. Stripcharts are not generally used if Source is any
value other than Transformed Dataset.
For input variables, the Initial and Final values are plotted; Initial is the value
specified in the Source, and Final is the same value unless clamping forces it to a
different value.
For output variables, the Original and Predicted values are plotted; Original is the
value specified in the Source, and Predicted is the model's calculated result.
Dataset: If you have modified any input variable from its values as contained in
the dataset, you should not expect the Predicted output values to match the
Original values from the dataset.
Single Pattern: If Source is not Dataset, the Original value does not provide useful
information.

8. If you want to produce a report file, set the reporting parameters.

9. Run the model.
Dataset: You can step through the dataset one selected pattern at a time, or run
through all selected patterns continuously.
• If you want to run only one pattern, click Step. The model calculates the output
values that would be produced from the inputs at the Current Row, and updates
all setpoint display bars and numerical displays to show the Final values of
inputs (clamping applied to the Source values) and Predicted values of outputs.
• To run through all selected patterns continuously, set Continuous Update and
Run Speed as you wish, and click Run. Current Row will be updated as each
pattern is processed. If Continuous Update is on, the setpoint display bars and
numerical displays will also be updated at each pattern. If you want to stop
before the end of the dataset is reached, click Stop.
To move back to the beginning of the dataset, or to move to any other particular
row number, change the Current Row as explained in “Stop, Step, Run, Current
Row, Run Speed” on page 11-14.
Single Pattern: Click Step, and the model calculates the output values that would
be produced from the specified settings of the input variables. The setpoint display
bars and numerical displays will be updated.

10. If you turned on View Confidence Interval, the confidence range for each output
variable is displayed graphically in the setpoint display bars, and numerically in
the Setpoint Editors.

A Transform Reference

• General Information and Conventions, page A-1
• Common Transforms and Constants, page A-3
• Random Number Transforms, page A-26
• Transforms for Batch Processes, page A-27
• System-Generated Transforms, page A-30
• Transform Finder, page A-33

General Information and Conventions


This appendix is the reference pages for transform functions and constants. For general
rules of syntax, see “Syntax” on page 7-7.
Unless noted otherwise:
• Transforms operate only on numeric values.
• x refers to any numeric constant or variable, and var is any numeric variable.
• dtVal is any date/time constant or variable, and dtVar is any date/time variable.
• string is any string constant or variable, and stringVar is any string variable.
String constants are surrounded by double quotes (") or single quotes (').
• Output of transforms is a single real variable that can be the same as or different
from an input variable.
Date/time constants are specified as date followed by time, separated by at least one
space, surrounded by backslash (\) characters. The date and time must be in a form that
the parser can recognize; for more information, see “Units” on page 4-12. The
recommended form is \ mm/dd/yy hh:mm:ss.ttt \ with seconds and
thousandths optional.
Date/time increments are specified as a value and a unit, surrounded by backslash (\)
characters. The value is a real number. The unit can be milliseconds, seconds, minutes,
hours, or days (but not months or years); it can be spelled out, or abbreviated to the first
unique character.
Some status transforms operate on a variable’s status only, and some on its status and
value. For more information, see “Status” on page 5-2.
Function and constant names are not case sensitive. Sometimes we display and
document them in uppercase or mixed case, but this is solely for ease of recognition.
There is some limited capability for the transform calculator to access C functions that
you write, treating them as special-purpose, customized transform functions. Some
examples are provided, and appear in the User-Defined group of functions in the
transform calculator’s Functions and Constants list (but do not appear in the All
functions list). For more information, see Appendix E, User-Defined Transforms, or
contact your customer support representative.

Moving Window Transforms


Moving window transforms operate on the portion of an input variable that is within a
“moving window”. They all take as input at least a variable, an integer window size,
an alignment constant, and a threshold (some of them take additional input
arguments, specified below; the threshold is placed after all other arguments). The
size automatically shrinks as boundaries and breaks are approached. The
alignment constant must be one of $lag, $center, or $lead. If the alignment
is omitted, it is set to $lag. If alignment is $lag, each output value is formed from
input values in a window extending back in time, ending with the current value. If
alignment is $center, each output value is formed from input values in a window
centered on the current value. If alignment is $lead, each output value is formed
from input values in a window extending forward in time, beginning with the current
value. The threshold is useful in special situations where you may not want to use a
value based on fewer points than the complete window size; if the current window has
fewer valid input points than the specified threshold, the transform value is still
calculated but the point’s status is set to Cut. The default value of the threshold is
the same as the window size.
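
For example, a transform like the following (the tag names are hypothetical) computes a
five-point moving average centered on each row; wherever fewer than three valid input
points fall in the window, the result is still calculated but marked Cut:
!smooth! = $moveave(!temp!, 5, $center, 3)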


CAUTION: For alignment=$center or alignment=$lead, moving window transforms
use future data values. Be careful that your model does not rely on future values that
will not be available when the model is used with current, rather than historic, data.
Compensate by time-lagging future-windowed inputs.

Common Transforms and Constants


abs(x)
Absolute value.
acos(x)
Arccosine, result in radians. Inputs must be in the range [-1., 1] and outputs
range from 0. to π.
acosd(x)
Arccosine, result in degrees. Inputs must be in the range [-1., 1] and outputs
range from 0. to 180.
arx(var, D, b0, b1, …, bN-1, a1, a2, …, aN, mode)
This transform implements a causal linear filter of arbitrary degree with dead-
time. The equation implemented is
res[t]=-a1[t]*res[t-1]-a2[t]*res[t-2]-…
+b0[t]*var[t-D]+b1[t]*var[t-1-D]+…
such that var[t] is the filter input, res[t] is the filter output, D is a
constant integer delay, and the parameters ai and bi are either constants or
variables.
The filter degree, N, is inferred from the total number of arguments with the
assumption that an equal number of a and b parameters is specified. In order to
implement an unbalanced transfer function, the value 0.0 is specified for
unused parameters. There is no restriction against specifying unstable filters. If
a Break status is encountered in either var or one of the parameter variables
as part of transform evaluation, the filter is put into an “off-line” state. This
state propagates a Break status to the output and requires that the filter be “re-
primed” before normal filtering resumes. This re-priming occurs automatically
when both var and the parameters return to OK status and causes the internal
filter states to re-initialize to steady-state values consistent with the new input
value var[t]. Note that if the filter parameters define any discrete-time
poles at z=1 (embedded integrators), the steady-state condition of the filter has
all internal states set to 0.0.



The mode argument is used to control processing of Error statuses; it can be
FILTER_DISABLE or FILTER_SMOOTH. FILTER_DISABLE causes the
filter to go off-line and propagate the Error status to the output. This is the
typical mode used during signal shaping and noise filtering applications where
errors are not a systematic part of the process. FILTER_SMOOTH is more
applicable when frequent errors are expected as part of normal behavior, such
as when a set of sparse lab measurements occurs with intervening error values.
In this mode, the filter returns an OK status and attempts to filter smoothly over
the error condition.
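As a sketch (the tag names are hypothetical, and the mode constant is assumed to
take the usual $ prefix), a balanced first-order filter with no dead time could be
written
!filtered! = $arx(!flow!, 0, 0.2, -0.8, $FILTER_DISABLE)
which implements res[t] = 0.8*res[t-1] + 0.2*var[t] and goes off-line on Error
inputs.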
ascend
See $sort described on page A-23.
asin(x)
Arcsine, result in radians. Inputs must be in the range [-1., 1] and outputs range
from -π/2 to π/2.
asind(x)
Arcsine, result in degrees. Inputs must be in the range [-1., 1] and outputs
range from -90. to 90.
atan(x) or atan(y,x)
Arctangent, result in radians. This function can take one or two arguments. For
one argument, the result is between -π/2 and π/2. For two arguments (y, x) the
result is arctangent of y/x, ranging between -π and π, with the choice of
quadrant taken from the signs of x and y. For example: $atan(0,-1) = π,
$atan(1,-1) = 3π/4, $atan(1,0) = π/2, $atan(1,1) = π/4,
$atan(0,1) = 0, $atan(-1,1) = -π/4, $atan(-1,0) = -π/2,
$atan(-1,-1) = -3π/4, $atan(-.001,-1) = -3.14
atand(x) or atand(y,x)
Arctangent, result in degrees. Like atan, this function can take one or two
arguments. For one argument, the result is between -90 and +90; for two
arguments, the result is between -179.9999... and +180.
average(var,size[,alignment[,threshold]])
Synonym for $moveave; average value in a moving window. See “Moving
Window Transforms” on page A-2.
b_change, b_dcont, b_dfirst, b_dlast, b_dnum, b_fall, b_index,
b_last, b_max, b_mean, b_min, b_nvalid, b_rise, b_slope,
b_std, b_time, b_value
See $batch described on page A-28.


batch, batchbreak, batchindex, batchx


See “Transforms for Batch Processes” on page A-27.
biasSensor(var,lab,tolerance,filter[,lab_delay],max_delay)
Intended for Virtual OnLine Analyzer applications. Generates a bias value for
correcting the predicted sensor value, var. A lab analysis value must be
available as variable lab. Specify a tolerance to prevent recalculation of the
bias for small changes or noise in the lab value. Specify a filter of 0.0 to use
only the most recent lab value in calculations; otherwise, specify a filter
coefficient between 0.0 and 1.0 to generate a weighted average of recent lab
values, to compensate for a noisy lab value or a lab value delayed by an
uncertain amount. If the lab measurement value is delayed, specify the delay in
the max_delay argument, in units equal to the time interval. If the delay is to
be made available at runtime through a variable, specify the variable name as
the lab_delay argument; the value passed through this variable should be in
units equal to the time interval. If the value of the lab delay variable is less than
zero, the delay is set to zero; if the value of the lab delay variable is greater
than max_delay, the delay is set to max_delay.
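As a sketch (hypothetical tag names), the following generates a bias from the most
recent lab value only (filter 0.0), recalculating it only when the lab value moves by
more than 0.5, with a lab delay of at most 10 time intervals:
!bias! = $biasSensor(!pred_quality!, !lab_quality!, 0.5, 0.0, 10)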
break(x)
Sets the output point to the specified value but with Break status; generally
used as the value resulting from a Conditional Expression (for more
information, see “Conditional Expression Constructors” on page 7-14). The
input value can be a numeric or date/time variable or constant, but not a string;
use $changestat to change the status of strings.
center
See “Moving Window Transforms” on page A-2.
certainty(var)
This function takes as input a variable (of any type) that has been Time
Merged. It returns real numbers ranging from 0 to 1, specifying on a row-by-
row basis how close each value is to completely extrapolated data (0.) or to
original data (1.). If the variable has not been Time Merged, the certainty of
every value is 1. The certainty of invalid points has Error status. (This same
information is available in the positional help area of the spreadsheet when the
mouse moves over a spreadsheet cell.) See also $setcert described on page
A-22, and the maxcert parameter to the $TimeMerge transform described
on page A-31.
changedate
See “System-Generated Transforms” on page A-30.



changelen(var,length)
Changes the length (number of rows) of the input variable, which can be of any
type. If the new length is longer than the original length, the output is
padded with Blanks; if the new length is shorter, values are deleted from the
bottom of the column. The output variable must be the same as the input.
changestat
See “System-Generated Transforms” on page A-30.
char(int[,int[,int...]])
Builds a string by converting one or more numeric variables or constants to
integers and interpreting them as ASCII character codes. If any character in the
string is non-printable, the result is changed to a question mark (?). The output
is a string variable.
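For example (a sketch based on the description above), $char(72,105) returns the
string Hi, since 72 and 105 are the ASCII codes for H and i.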
checkFlatline(var,tolerance,window)
Sets the output to Error if the variable flatlines. If, in the number of rows
indicated by window, the difference between the variable’s maximum and
minimum values is less than tolerance, the variable is considered flatlined.
Specify a tolerance value to account for any insignificant amount of variation
caused by noise in the variable. For example, if the tolerance is 0 and the
window is 20, the variable may have the same value for 19 rows without
generating an error, but if the variable continues unchanged for 20 rows, the
20th row will be an error.
checkRange(var,min,max)
Sets the output to Error if the variable is outside the range indicated by min
and max.
checkRate(var,tolerance)
Sets the output to Error if the variable changes by more than tolerance
from one row to the next.
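For example (hypothetical tag name), !checked! = $checkRate(!flow!, 10.) sets a
row to Error whenever !flow! changes by more than 10 from the previous row.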
clearRows(var,startrow[,endrow])
Changes the variable’s status in the specified row(s) to Blank. The arguments
are a variable of any type, and one or two row numbers. If the endrow is
omitted, it is assumed to be the same as the startrow. The output variable
must be the same as the input.
clipabove, clipbelow
See “System-Generated Transforms” on page A-30.


compare(string1,string2[,codes])
Compares the values of two string variables or constants, and returns a whole
number. The optional codes are expressed as a single argument within a single
pair of quotes. The returned values are 1 if equal and 0 if not equal, unless code
'L' is specified. The codes are not case-sensitive.
L Lexical comparison, returning -1 if string1 is lexically less than
string2, 0 if they are equal, and +1 if string1 is lexically
greater.
S Short compare: if string2 is shorter than string1, compare only
up to the length of string2.
T Before comparing, trim leading and trailing spaces from string1.
U Ignore case.
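For example (a sketch), $compare('Alpha','alpha','U') returns 1, while
$compare('abc','abd','L') returns -1 because 'abc' is lexically less than 'abd'.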
copyBreak(var)
If variable has no date/time pointer, returns value and status of variable. If
variable’s date/time pointer is Break, returns value of variable with status
Break. If variable’s date/time pointer is not Break, returns value and status of
variable.
copyRows(copyToVar, copyToStartRow, copyToEndRow, copyFromVar,
copyFromStartRow[, copyFromEndRow])
The arguments are variables to copy to and from, with starting and ending row
numbers in each. The variables can be of any type but both must be of the same
type. This function copies one or more cells from the second variable and
pastes them into the first variable. If only one cell is copied, the
copyFromEndRow can be omitted, and if the copyToEndRow is greater
than copyToStartRow, the cell is pasted multiple times. If more than one
cell is copied, the copyFromEndRow must be specified, and the length from
copyToStartRow to copyToEndRow must equal the length from
copyFromStartRow to copyFromEndRow. The output of this transform
must be the same as the copyToVar.
corr(var1,var2[,count])
This transform takes as input two numeric variables and an optional count,
which defaults to 10 if omitted. It calculates the correlation coefficients of
var1 with respect to var2, for time delays (row number shifts) from -count to
count (so the result has count*2+1 rows).
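For example (hypothetical tag names), !cc! = $corr(!feed!,!quality!,5) generates
an 11-row column containing the correlation of !feed! with !quality! at row shifts
of -5 through +5.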



correlation(var1,var2,…,varN)
This transform takes as input any number N of numeric variables, and
generates N columns each of length N containing the correlation matrix values.
These values are the same as the normalized Correlation plot; for more
information, see “Correlation Plot” on page 6-22. Correlation values are
normalized; use $covariance for unnormalized values. This special
transform requires up to N output variable names, as described in “Transforms
With Multiple Outputs” on page 7-10.
cos(x)
Cosine of a value that is in radians.
cosd(x)
Cosine of a value that is in degrees.
covariance(var1,var2,…,varN)
This transform takes as input any number N of numeric variables, and
generates N columns each of length N containing the covariance matrix values.
These values are the same as the unnormalized Correlation plot; for more
information, see “Correlation Plot” on page 6-22. Covariance values are
unnormalized; use $correlation for normalized values. This special
transform requires up to N output variable names, as described in “Transforms
With Multiple Outputs” on page 7-10.
covarTD(numX,numY,x1,t1,x2,t2,…,xN,tN,y1,s1,y2,s2,…,yM,sM)
This transform generates a cross covariance matrix (numX columns each of
length numY) with time delays (row number shifts). numX and numY are the
numbers of X and Y variables. x1,t1,… is a list of X variables each with a
time delay (number of rows); y1,s1,… is a list of Y variables each with a time
delay. The X and Y variables must be numeric. This special transform requires
up to numX output variable names, as described in “Transforms With Multiple
Outputs” on page 7-10.
cutabove, cutbelow
See “System-Generated Transforms” on page A-30.
cutstat(value)
Sets the output point to the specified value but with Cut status; generally used
as the value resulting from a Conditional Expression (for more information,
see “Conditional Expression Constructors” on page 7-14). The input value can
be a numeric or date/time, but not a string; use $changestat to change the
status of string variables.


day(dtVal)
Returns a whole number indicating the day of the month (1-31) of the input
date/time value.
deleterows(var,startrow[,endrow])
The arguments are a variable of any type and one or two row numbers. It
deletes rows from the variable, shifts any remaining rows up, and reduces the
variable’s length. If the endrow is omitted, only one row is deleted. The
output of this transform must be the same as the input variable.
delta(var[,n])
The two arguments are a variable and an integer n. The second argument may
be omitted and is then assumed to be 1. The result is the value of the specified
variable in row current+n minus the value in the current row; except that, if
row current+n is beyond a cell with Break status, or beyond the end of the
column, then the last valid result is repeated.
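For example (hypothetical tag name), $delta(!flow!) returns, at each row, the value
of !flow! in the next row minus its value in the current row.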
descend
See $sort described on page A-23.
differs(var)
Takes as input a variable of any type; returns 1 if the value in the current row is
different from the value in the previous row, and 0 if they are the same. If the
current row has any bad status, the result is Error; if the current row has a good
status but the previous row has a bad status, the result is 1.
double(x)
Causes the result of the variable, constant, or calculation x to be in double-
precision. If the output is a new variable, it will be of type double-precision,
but if the output variable already exists, its type is not changed.
dt(dtVal)
Causes the result of the date/time variable, constant, or calculation dtVal to
be a date/time. If the output is a new variable, it will be of type date/time, but if
the output variable already exists, its type is not changed. For example, from
!newdt! = \4/25/80 08:30:00\
the type of newdt is real, but from
!newdt! = $dt(\4/25/80 08:30:00\)
the type of newdt is date/time.
dtadd(dtVal,increment)
Returns a date/time value generated by adding increment to dtVal.
Increment is specified as described on page A-2, and can be negative. For example,
!process_time! = $dtadd(!process_time!,\5 min\)
adds 5 minutes to the values in the variable process_time.
dtcreate(dtStart,dtEnd,increment)
Returns a new date/time variable generated by stepping from the constant
dtStart to the constant dtEnd by increment. The constants and the
increment are specified as described on page A-2. For example,
!dtVar!=$dtcreate(\01/01/94 00:00:00\,
\01/02/94 12:00:00\, \1.5 h\)
creates a new date/time variable beginning at midnight on January 1, 1994,
stepping by ninety minute intervals to noon on January 2.
dtdiff(dtVal1,dtVal2)
Returns dtVal1 minus dtVal2, expressed as a (real) number of days.
dtmake(year[,month[,day[,hour[,minute[,second[,milliseconds]]]]]])
Returns a new date/time variable generated from the specified whole numbers,
which may be numeric variables or constants. Default values are provided for
any omitted arguments. For example,
!dtVar!=$dtmake(95,2,1,0,10*$row)
creates a new date/time variable, with values February 1, 1995, at 00:10; 2/1/
95 at 00:20; 2/1/95 at 00:30; etc. See also $dtcreate.
dtread(string[,format])
Returns a new date/time variable generated by reading a string. The string can
be a variable, a constant, or the output of the $str function (described on page
A-24). The format is specified as for Units in the formatter (see “Units” on
page 4-12), and must be surrounded by quote (") characters. The format may
be omitted if the default parser can understand the string without it.
dtround(dtVal,increment)
Returns the input date/time value rounded to the nearest multiple of
increment. The increment must be less than one day, and should be an
even divisor of a whole day; it is specified as described on page A-2. For
example,
!dtVar! = $dtround(!dtVar!,\1 sec\)
rounds the values of dtVar to the nearest second.
dtwrite(dtVal[,format])
Returns a string variable created by writing the input date/time according to the
specified format. The format is specified as for the formatter (see “Display
Format” on page 4-19), and must be surrounded by quote (") characters. If the
format is omitted, the default format is used (mm/dd/yy hh:mm:ss).
dup(string,n)
Duplicates the contents of a string n times, where n is a positive integer
constant. For example, $dup('abc',3) returns the string abcabcabc.
The output is a string variable.
duprows(var,startrow,endrow)
The arguments are a variable of any type, and two row numbers. It copies the
variable’s value in the first row of the specified region and duplicates it into all
rows of the region. The output of this transform must be the same as the input
variable.
dwt(var,direction)
The arguments are a numeric variable, and a direction constant that is
either $forward or $inverse. The output is the Discrete Wavelet
Transform of the input variable.
e The constant e=2.71828...
encode(var,proximity,type,value1[,value2,...valueN])
Takes as input a numeric variable, a real constant proximity, a constant
indicating the encoding type, and N constant values, such that N ≥ 1 . This
special transform requires up to N output variable names (except for
type=e_range, there are only N-1 outputs), as described in “Transforms
With Multiple Outputs” on page 7-10.
For proximity=0., these are the encoding types and what they mean:
e_exact
Each output column i contains 1.0 if the input variable exactly equals
valuei, and 0.0 otherwise, for i=1,…,N.
e_lessthan
Each output column i contains 1.0 if the input variable is less than or
equal to valuei, and 0.0 otherwise, for i=1,…,N.

e_range
Each output column i contains 1.0 if valuei ≤ the input variable
≤ valuei+1, for i=1,…,N-1.



For proximity>0., in addition to the standard encoding described above,
the output is proportionately between 0 and 1 if (valuei-
proximity) ≤ the input variable’s value ≤ (valuei+proximity).
For example, from
(!a!,!b!,!c!,!d!)=$encode(!var!,0.,$E_RANGE,2.,4.,6.,8.,10.)
the output column !a! will contain 1. when !var! is inclusively between 2.
and 4., and 0. otherwise; !b! will contain 1. when !var! is inclusively
between 4. and 6., and 0. otherwise, etc. See also $strcode described on
page A-24.
error This function has no input argument; it simply returns a point with Error status.
Generally, it is used as the value resulting from a Conditional Expression (for
more information, see “Conditional Expression Constructors” on page 7-14). It
can be applied to a variable of any type.
etread(string,format)
Reads elapsed time from the input string according to the specified
format, and outputs a real number of elapsed days. The elapsed time can be
specified in any combination of hours, minutes, seconds, milliseconds, and
fractions, but cannot include days, months, or years. The format is as
described in “Units” on page 4-12, with one addition: the symbol f indicates
fractions of hours or fractions of minutes, whichever was immediately
preceding; f means there is exactly one digit, ff means there are exactly two
digits, and fff means there are exactly three digits. For example,
$etread('18:00','hh:mm') = .75
$etread('11:45','hh:mm') = .49
$etread('11.75','h.ff') =.49
etwrite(x,format)
Interprets the input x as a real number of days, and writes it as an elapsed time
according to the specified format. The value x must be a real (floating-point)
number; to convert an integer to real, use the $real transform. The elapsed
time can be specified in any combination of hours, minutes, seconds,
milliseconds, and fractions, but cannot include days, months, or years. The
format is as described in “Units” on page 4-12, with one addition: the
symbol f indicates fractions of hours or fractions of minutes, whichever was
immediately preceding; f means there is exactly one digit, ff means there are
exactly two digits, and fff means there are exactly three digits. The output is
a string variable. For example,
$etwrite(.75,'hh:mm') = '18:00'
$etwrite(.49,'hh:mm') = '11:45'
$etwrite(.49,'h.ff') = '11.75'
exp(x)
Exponential function: the constant e to the power of the argument.
expave(var, ff [, mode])
This is a simple single pole filter, defined as
res[t] = (1 - ff[t]) * res[t-1] + ff[t] * var[t]
such that var is the initial (input) column, res is the resulting (output)
column, t is the current row number, ff is the filter factor, and mode is used
to control processing of errors. The valid range of ff is 0. <= ff <= 1.
This transform is equivalent to this general filter implemented with $arx:
$arx(var, 0, ff, -(1-ff), mode), except that $arx does not place
restrictions on the values of ff and has slightly different error-handling
modes.
Relationship to correlation: There are two common representations of a
first order filter. The filter factor (ff) representation as defined above is
commonly used by Chemical Engineers. Users with a signal processing or
statistics background are typically more comfortable with specifying c, which
is called the correlation coefficient, the discrete pole, or the smoothing factor.
ff is equivalent to 1-c.
Relationship to time constant: It is often convenient to filter based on a time
constant. If you want to specify the filter factor in terms of a time constant,
use this formula: ff=1-exp(-dt/tau) such that dt is the sampling
frequency of the system, and tau is the time constant of the filter.
The mode argument is used to control processing of Error statuses; it
is any one of FILTER_DISABLE, FILTER_SMOOTH, or
FILTER_FREEZE. FILTER_DISABLE causes the filter to go off-line and
propagate the Error status to the output. This is the typical mode used during
signal shaping and noise filtering applications where errors are not a
systematic part of the process. FILTER_SMOOTH is more applicable when
frequent errors are expected as part of normal behavior, such as when a set of
sparse lab measurements occurs with intervening error values. In this mode,
the filter returns an OK status and attempts to filter smoothly over the error
condition. FILTER_FREEZE clamps the last good value of res[t] during
an input Error condition but does not propagate the Error status to the
output; when a new valid input is available, the filter output is reset to this new
value.
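As a sketch (hypothetical tag name; the mode constant is assumed to take the
usual $ prefix): with a 1-minute sample interval and a 5-minute time constant,
ff = 1-exp(-1/5), or about 0.18, so a smoothed temperature could be calculated as
!filt! = $expave(!temp!, 0.18, $FILTER_SMOOTH)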



fft(var,length)
Calculates a raw Fast Fourier Transform (FFT) with no data windowing or
concern about aliasing. You specify an input variable and a length which
must be a constant power of 2. The input variable must not have any invalid
points. This special transform requires a list of up to five output variables (that
correspond to the frequency, real, imaginary, magnitude, and phase values), as
described in “Transforms With Multiple Outputs” on page 7-10.
filter_freeze
See expave.
filter_disable, filter_smooth
See arx described on page A-3 and expave, above.
findle(var,a1,a2[,b1,b2[,c1,c2,...]],z)
This transform is a shortcut for a multiply-nested $if statement. It takes as
input one variable, then any number of pairs of arguments, followed by one
final argument. For example,
$findle(!var!,a1,a2,b1,b2,c1,c2,z)
is equivalent to
$if(!var!<=a1, a2, $if(!var!<=b1, b2, $if(!var!<=c1,
c2, z)))
Both cases are verbalized as “if var is less than or equal to a1, then use value
a2, otherwise if var is less than or equal to b1, then use value b2, otherwise
if var is less than or equal to c1, then use value c2, otherwise use value z”.
fmt(x,format)
Builds a string by writing a numeric value according to the specified format.
The format is specified as “w.r”, where w is the width of the output string
and r is the number of places printed to the right of the decimal point. For
example, $fmt(123.4567,'5.2') returns the string 123.45. The output
is a string variable.
forcestat(x,status)
Takes as input a variable of any type and a character string that contains the
name of a status (Blank, Break, Cut, Error, OK), and changes the variable
to have the specified status. Unlike the $changestat transform,
$forcestat can be combined with $if to change statuses on a conditional
basis.
forward
See $dwt described on page A-11.


heartBeat(var,N)
Writes the integer value N to variable var. If N is 0, alternates between writing 0
and 1. This transform is useful for setting up a heartbeat signal for on-line
applications so that your DCS can verify that the application is running.
holdLast(var [,num_cycles])
If variable var has a bad status, use the last good value for var instead. If you
do not specify the optional num_cycles argument, the last good value is
used indefinitely, or until status returns to good. If you specify the
num_cycles argument, the last good value is used for at most that number of
execution cycles or until status returns to good, whichever comes first.
This transform provides improved robustness in the on-line environment
where you may need a model to function in spite of a bad status. In the off-line
environment, a cycle corresponds to a row in the dataset. The num_cycles
argument may be greater than the size of the dataset; the transform maintains
its state history from row to row regardless of dataset size.
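For example (hypothetical tag name), !safe_temp! = $holdLast(!temp!, 10)
substitutes the last good value of !temp! for at most 10 cycles of bad status, after
which the bad status passes through.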
hour(dtVal)
Returns a whole number indicating the hour (0-23) of the input date/time.
if(expression,trueValue,falseValue)
See “Conditional Expression Constructors” on page 7-14.
ifft(realVar,imagVar,length)
This function takes as input two variables that are interpreted as real and
imaginary values, and an FFT length that is a constant power of 2, and
returns a single real column that is the raw inverse FFT.
inf Constant, signifying infinity.
insertrows(var,startrow[,numberOfRows])
The arguments are a variable of any type, a starting row number, and an
optional number of rows. It inserts the specified numberOfRows above the
startrow, increasing the variable’s length. The new rows are filled with
Blanks. The output variable must be the same as the input.
int(x)
Causes the result of the numeric variable, constant, or calculation x to be an
integer. If the output is a new variable, it will be of type integer, but if the
output variable already exists, its type is not changed.
inverse
See $dwt described on page A-11.



isbadstatus(var)
For use as an expression in an $if transform (see “Conditional Expression
Constructors” on page 7-14). Returns true (nonzero) if status is Blank,
Missing, Error, or Cut. Returns false (zero) if status is OK or Break.
isbreak(var)
For use as an expression in an $if transform (see “Conditional Expression
Constructors” on page 7-14). Returns true (nonzero) if status is Break. Returns
false (zero) otherwise.
isvalid(var[,var2,...,varn])
For use as an expression in an $if transform (see “Conditional Expression
Constructors” on page 7-14). Same as $valid.
join(var1,var2[,var3,…])
This function takes any number of variables as arguments, and outputs a new
column formed by appending all of the inputs vertically. The length of the
output is the sum of the lengths of all of the inputs. The inputs may be of any
variable type, but if they are not all numerics, they must all be the same type.
lag, lead
See “Moving Window Transforms” on page A-2.
left(x,n) or left(x,characterList)
The first argument is a variable or value of any type. This function converts the
first argument to a string; then either extracts the leftmost n characters from the
string, where n is a positive integer constant; or extracts all characters to the
left of any character in the specified characterList, where
characterList is one or more characters listed within quote (") characters.
The output is a string variable. For example,
$left ("abcdefg",4) = abcd
$left("abcdefg","cxn") = ab
If nothing in the characterList is found in the string, the result is the
entire input string.
len(string)
Returns a whole number that is the string length (not the column length) of the
input string value.
ln(x) Natural logarithm (base e).


log(x[,y])
The second argument may be omitted and is then assumed to be 10. The result
is the logarithm of x in base y.
lookup(var1,var2)
This function uses the values of the second argument as row numbers, and
returns the value of the first argument in the row number specified by the
second argument. The first argument can be of any variable type. Note that
both arguments must be variables. For example, suppose there is a variable
called !key!, and its values in the first four rows of the dataset are 3, 27, 49,
31; if you type the transform
$lookup(!flow1!,!key!)
then the results in the first four rows of the dataset will be the original values of
!flow1! from rows 3, 27, 49, and 31.
lookupRel(var1,var2)
This relative lookup function returns the value of the first argument in a row
number calculated by adding the values of the second argument to the current
row number. The first argument can be of any variable type. Note that both
arguments must be variables. For example, suppose there is a variable called
!key!, and its values in the first four rows of the dataset are 3, 27, 49, 31; if
you type the transform
$lookupRel(!flow1!,!key!)
then the results in the first four rows of the dataset will be the original values of
!flow1! from rows 4, 29, 52, and 35.
m_test, m_train, m_valid, m_ignore
These constants are used to set values in a variable that you designate in the
Set Patterns window to specify patterns for a newly built model (for more
information, see “Use Variable” on page 8-55). The numeric values can be
interpreted using the $ttv transform, described on page A-25.
markcut
See “System-Generated Transforms” on page A-30.
max(x[,y,z,…])
Takes any number of arguments and finds the maximum. If there is only one
argument, and it is a variable, the result at every row is the variable’s maximum
over all rows in the dataset. In any other case, the result at each row is
evaluated using a variable’s value in that row.



mean(var)
Mean of all values in the column.
median(var)
Median of all values in the column.
mid(string,numstart,numend)
Returns a portion of the string beginning at character number numstart
and ending with character number numend, where numstart and numend
are positive integer constants. If this segment is beyond the length of the input
string, the result is an empty string, not an error. The output is a string
variable.
midn(string,length,numStart)
Returns a portion of the string beginning at character number numStart
and continuing for the specified length, where numStart and length are
positive integer constants. If this segment is beyond the length of the input
string, the result is an empty string, not an error. The output is a string
variable.
millisec(dtVal)
Returns a whole number indicating the milliseconds (0-999) of the input date/
time.
min(x[,y,z,…])
Takes any number of arguments and finds the minimum. If there is only one
argument, and it is a variable, the result at every row is the variable’s minimum
over all rows in the dataset. In any other case, the result at each row is
evaluated using a variable’s value in that row.
minute(dtVal)
Returns a whole number indicating the minutes (0-59) of the input date/time.
x mod y
Modulo function (remainder after division). Examples: 21 $mod 5 evaluates
to 1, and .93 $mod .9 evaluates to .03
month(dtVal)
Returns a whole number indicating the month (1-12) of the input date/time.
moveave(var,size[,alignment[,threshold]])
Average value within the moving window. See also “Moving Window
Transforms” on page A-2.


moveexp(var,size[,alignment,decayRate[,threshold]])
Exponential average in the moving window. The decayRate is as defined for
the $expave transform, described on page A-13. See also “Moving Window
Transforms” on page A-2.
movegauss(var,size[,alignment,stddev[,threshold]])
Gaussian filter applied within a moving window. The standard deviation
argument must be a positive constant. See also “Moving Window Transforms”
on page A-2.
movels(var,size[,alignment[,threshold]])
Moving least squares fit to the points in the moving window. See also “Moving
Window Transforms” on page A-2.
movemax(var,size[,alignment[,threshold]])
Maximum value within the moving window. See also “Moving Window
Transforms” on page A-2.
movemed(var,size[,alignment[,threshold]])
Median value within the moving window. See also “Moving Window
Transforms” on page A-2.
movemeda(var,size[,alignment[,threshold]])
Approximating median within the moving window. See also “Moving Window
Transforms” on page A-2.
movemin(var,size[,alignment[,threshold]])
Minimum value within the moving window. See also “Moving Window
Transforms” on page A-2.
movesd(var,size[,alignment[,threshold]])
Standard deviation within the moving window. See also “Moving Window
Transforms” on page A-2.
movevalid(var,size[,alignment])
Number of valid points within the moving window (if all points are valid, at
breaks and boundaries it shrinks as the window size shrinks). See also
“Moving Window Transforms” on page A-2. Note that this transform, unlike
all other Moving Window transforms, does not take a threshold argument.
none A general-purpose constant used as an input to a number of transforms,
documented with any function that can use it.
now A date/time constant indicating the current date and time.



nrows(var)
Returns the number of rows in the input variable, which can be of any variable
type.
nvalid(var)
Returns the number of valid points in the input variable, which can be of any
variable type.
ord(string)
Returns the ordinal (numeric) value of the ASCII character code for the first
character in the string. For example, $ord('a') returns 97.
override
See “System-Generated Transforms” on page A-30.
pca(var1,var2,…,varN)
Runs a Principal Components Analysis on any number N of numeric variables,
and generates N columns each of length N containing the product of the
principal component vector with its weight. These values are the same as the
PCA plot; for more information, see “P.C.A. Plot” on page 6-18. This special
transform requires up to N output variable names, as described in “Transforms
With Multiple Outputs” on page 7-10.
pi The constant 3.14159265...
pos(character,string)
Searches for a character (a string of length 1) within a string, and
returns a whole number specifying its position in the string, or 0 if not
found. The character and string can be constants or variables.
preserveRow(var,row)
Applies only to online datasets (has no effect in offline use). Retains value of
variable var up to the specified row through successive runs of the dataset.
This feature is particularly useful when running a time-delayed model in a
runtime application because previous model results are not saved; instead, they
are recomputed each time the dataset is run. Using this transform, you can
preserve previous model results in the generated dataset.
prev, prev2
These keywords are used to create transforms with recursive definitions. They
access the output value from the previous row, or from two rows previous. For
example, Fibonacci numbers could be calculated by
$if($row<=2,1,$prev+$prev2)


rand, random, randomS, randS


See “Random Number Transforms” on page A-26.
rank(var)
Takes as input a numeric or string variable, and returns an integer column
containing the rank order of the values in the input variable; that is, rank of the
smallest value in the input is 1, rank of the next-smallest value in the input is 2,
etc. If two or more values in the input variable are identical, their rank is also
identical.
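For example (a sketch), if !x! contains only the values 5.1, 2.3, and 9.9, then
$rank(!x!) returns 2, 1, and 3 in those rows.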
real(x)
Causes the result of the numeric variable, constant, or calculation x to be a
single-precision real number. If the output is a new variable, it will be of type
real, but if the output variable already exists, its type is not changed.
right(x,n) or right(x,characterList)
The first argument is a variable or value of any type. This function converts the
first argument to a string; then either extracts the rightmost n characters from a
string, where n is a positive integer constant; or extracts all characters to the
right of any character in the characterList, where characterList is
one or more characters specified within quote (") characters. The output is a
string variable. For example,
$right ("abcdefg",4) = defg
$right("abcdefg","dxn") = efg
If nothing in the input characterList is found in the string, the result is an
empty string.
round(x[,y])
Round x to the nearest multiple of y, which is assumed to be 1.0 if omitted. For
example, $round(.1)=0, and $round(-.9)=-1.
row (No arguments.) Returns the current row number.
scale(var[,a,b])
Linearly scales the values of the variable to the range defined by the constants
a and b. If a and b are omitted, they are set to 0. and 1.
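For example (hypothetical tag name), !pct! = $scale(!level!, 0., 100.) rescales
!level! linearly so that its minimum becomes 0 and its maximum becomes 100.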
scatcut
See “System-Generated Transforms” on page A-30.
second(dtVal)
Returns a whole number indicating the seconds (0-59) of the input date/time.



self Used when applying a single transform to multiple variables, to indicate that
each named output variable uses itself as input. For example, if you type in the
transform
!a!,!b! = $self * 2
the transforms actually generated into the Transform List will be
!a! = !self! * 2
!b! = !self! * 2
which are computationally equivalent to
!a! = !a! * 2
!b! = !b! * 2
Note that if you use $self like this, with multiple variables on the left side of
the transform, those variable names must be surrounded with exclamation
points (!) and separated by commas.
setcert(var,certaintyvar)
This transform sets the certainty characteristic of an input variable (of any
type) to the values contained in the column certaintyvar. This transform
is useful for modifying the certainty values generated by time merge. The
output variable must be the same as the input variable. The certainty
characteristic, the value of certaintyvar, is inclusively between 0 and 1,
and has “granularity,” that is, it can only be set to a group of predetermined
values; the values that you set are automatically rounded to the nearest valid
certainty.
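For example, assuming hypothetical variable names,
!temp! = $setcert(!temp!, !temp_cert!)
replaces the certainty values of temp with the (rounded) values in the column
temp_cert.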
sgram(var1,[var2,]fftlen,displen,overlap)
This transform calculates a power spectrogram. The input column is divided
into segments (which may overlap). The mean is subtracted from each
segment, a tapering window is applied, and an FFT is calculated. The result of
each segment is output in a new column. If the optional second variable is
omitted, the result is auto-power rather than cross-power. fftlen is the
length of each segment, displen is the length of the generated columns, and
overlap is the amount to overlap each segment.
shift(var,n)
The arguments are a variable, and an integer n. If n is positive, the variable is
shifted down n rows, and Blanks are placed in the first n rows. If n is negative,
the variable is shifted up |n| rows, and Blanks are placed in the last |n|
rows. In either case, the length of the variable is not changed.
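For example, assuming a numeric variable named flow,
!flow_lag3! = $shift(!flow!, 3)
produces a copy of flow delayed by three rows, with Blanks in rows 1 through 3.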


sigmoid(x[,a[,b]])
Returns the sigmoid of x,

    sigmoid(x) = 1 / (1 + e^(a(b - x)))

where a controls the slope and defaults to 1, and b controls the location and
defaults to 0. The result is centered around b; it will be about 0 at b-6a and
about 1 at b+6a.
sign(x)
Returns -1 if x<0; returns 0 if x=0; and returns +1 if x>0.
sin(x)
Sine of a value that is in radians.
sind(x)
Sine of a value that is in degrees.
sort(var1,var2[,direction])
This function takes as input two variables and an optional direction. It
sorts the second column in the order of the first column, in either ascending or
descending order. The direction is the constant $ascend or $descend, and
defaults to $ascend if omitted.
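For example, assuming hypothetical variable names,
!flow_by_temp! = $sort(!temp!, !flow!, $descend)
returns the values of flow reordered so that the corresponding values of temp
run from largest to smallest.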
spec(var1,[var2,]fftlen,displen,overlap)
This transform calculates a power spectrum. The input column is divided into
segments (which may overlap). The mean is subtracted from each segment, a
tapering window is applied, and an FFT is calculated. The results are averaged
and returned in a single output column. If the optional second variable is
omitted, the result is auto-power rather than cross-power. fftlen is the
length of each segment, displen is the length of the generated column, and
overlap is the amount to overlap each segment.
sqrt(x)
Square root.
status(var)
The input variable can be of any type. The output is a string variable displaying



the status of the input. If the input has been Time Merged, the certainty is
displayed instead of OK status.
stdev(var)
Standard deviation of all values in the column.
str(x1[,x2[,x3,…]])
Takes any number of arguments of any type, variables or constants, converts
any numerics to strings, and concatenates all inputs into a single output string.
If any argument is a date/time variable or constant, it is treated as the double-
precision number that is its internal representation. All numbers other than
integers are written with 6 decimals (for more control, use $fmt or
$dtwrite). The output is a string variable.
strcode(stringVar,value1[,…,valueN])
Takes as input a string variable and N constant values, such that N ≥ 1 . Each
output column i contains 1.0 if the input variable exactly equals valuei, and
0.0 otherwise. This special transform requires up to N output variable names,
as described in “Transforms With Multiple Outputs” on page 7-10.
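For example, assuming a string variable named valve_state,
!vs_open!,!vs_closed! = $strcode(!valve_state!, "OPEN", "CLOSED")
generates two columns of 1.0 and 0.0 values flagging the rows where
valve_state is exactly "OPEN" or exactly "CLOSED".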
subcol(var,startrow[,stoprow[,increment]])
The arguments are a variable (of any type), and one, two, or three integers.
This function returns a portion (subcolumn) of the input variable, from the
startrow to the stoprow, counting by the increment. If startrow
and stoprow are at least 1 they are interpreted as absolute row numbers; if
they are less than 1, they are counted backward from the end of the column
(that is, 0 is the last row in the column, -1 is the next-to-last row, etc.). The
increment defaults to 1 if stoprow >startrow, and -1 if
startrow>stoprow.
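For example, assuming a variable named flow, $subcol(!flow!,1,100) returns its
first 100 rows, and $subcol(!flow!,-99,0) returns its last 100 rows.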
subst(stringVar,pattern,replacement[,'u'])
Takes as input a string variable, and pattern and replacement string
constants. In every instance where the specified pattern is found in the
input variable, the replacement string is substituted for it. If the "U" code
is specified, the pattern match operation is folded to upper-case; that is, it is not
case-sensitive. The output is a string variable.
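For example, assuming a string variable named tag,
!tag! = $subst(!tag!, "FIC", "FT")
replaces every occurrence of "FIC" in tag with "FT"; adding the 'u' code as a
fourth argument would also match "fic" or "Fic".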
sum(var)
The result of this transform at each row is the sum of the input variable in all
rows from 1 to current, except that the result restarts at zero after a Break in the
input.


tan(x)
Tangent of a value that is in radians.
tand(x)
Tangent of a value that is in degrees.
timecut
See “System-Generated Transforms” on page A-30.
timemerge
See “System-Generated Transforms” on page A-30.
tm_average, tm_boxcar, and so forth: all keywords beginning with tm_
See $timemerge described on page A-31.
tmerge(valueVar,dtVar,method,interval,maxTimeGap,
handleDuplicates, handleOutOfOrder,maxCert)
This transform is similar to $TimeMerge. It takes as input a variable that has
numeric or string values and a date/time variable, and merges the value
variable to match the date/time variable. Unlike the system-generated
$TimeMerge, it does not delete the value variable’s original date/time
column. The additional arguments are the same as for $timemerge
described on page A-31. The output variable must be the same as the input
value variable.
today
A date/time constant indicating the current date, with time set to midnight.
trend(var)
The result of this transform is a new variable that is the best linear fit to the
input variable.
trunc(x)
Truncate the fractional portion of a number. For example, $trunc(4.9)=4,
and $trunc(-4.9)=-4.
ttv(var)
This transform is provided for your convenience in interpreting the numeric
values underlying the $m_test, $m_train, $m_valid, and $m_ignore
constants. It takes as input one numeric variable containing these constants’
values, and returns a string variable indicating the interpretation of each
constant, "Test", "Train", "Valid", "Ignore".



unmarkcut, unscatcut, untimecut
See “System-Generated Transforms” on page A-30.
val(string)
If the input string is the representation of a number (for example, "123.45"),
this function returns its numeric value; otherwise, returns 0.
valid(var1[, var2, …, varn])
Returns 1 when all input variables contain a valid value (OK status); 0 if any
input variable is Error, Missing, Blank, Cut, Break. The variables may be of
any type.
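For example, assuming hypothetical variable names,
!row_ok! = $valid(!flow!, !temp!)
returns 1 in every row where both flow and temp have OK status, and 0 elsewhere.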
weekday(dtVal)
Returns a whole number indicating the day of the week (1=Sunday, …,
7=Saturday) of the input date/time.
withinPct(x,y,z)
Takes three arguments, x, y, and z, constants or variables, tests whether x is
within y percent of z, and returns 0 for false and 1 for true. For example,
$withinPct(x,5,z) tests whether (.95*z ≤ x ≤ 1.05*z).
year(dtVal)
Returns a whole number indicating the year of the input date/time.
yearday(dtVal)
Returns a whole number indicating the day of the year (1=Jan 1, 2=Jan 2, …,
365(or 366)=Dec 31) of the input date/time.

Random Number Transforms


Random number transforms can be used with or without a specified seed. Loading a
dataset, and a number of other transform operations, cause the dataset’s Transform List
to be applied in order to the Before Transforms values of the raw variables. Every time
an unseeded random number transform is applied it is recalculated, and thus generates a
different random sequence. The starting point or “seed” is taken from the system clock.
The seeded random number functions force the seed to a specified value. If the seed
argument ≤ 0, a system constant is used as the seed; if the argument is any positive
integer, that input value is used as the seed. When you apply a seeded transform, it
affects all $rand and $random sequences until you apply another seeded transform or
exit the program, even if you insert them in the Transform List above the seeded
transform, and even if you delete it after it has been applied.


rand (No arguments.) At each row, generates a random real number between 0.0 and
1.0.
random([a,]b)
At each row, generates a random whole number inclusively between arguments
a and b. If a is omitted, it is assumed to be 1. a must be ≤ b.
randomS(seed,[a,]b)
At each row, generates a seeded random whole number inclusively between
arguments a and b. The seed is any positive integer. If a is omitted, it is
assumed to be 1. a must be ≤ b.
randS(seed)
At each row, generates a seeded random real number between 0.0 and 1.0. The
seed is any positive integer.

Gaussian Random
The random number transforms generate uniform sequences. If you want to use random
numbers with gaussian distribution instead, you must start by generating two random
variables, for example u1 and u2:
!u1! = $randS(42580)
!u2! = $randS(80191)
From these two variables, you can apply the following transform to generate a gaussian
sequence with mean 0. and std dev 1., for example g1:
!g1! = $sqrt(-2. * $ln(!u1!)) * $cos(2.* $pi * !u2!)
You can generate a second gaussian sequence, for example g2, from the same two
uniform variables:
!g2! = $sqrt(-2. * $ln(!u1!)) * $sin(2.* $pi * !u2!)

Transforms for Batch Processes


These special transforms are used only with batch processes. One column in the dataset
must contain a batch identifier (of any variable type, but usually a string), and all
records pertaining to a single batch must be contiguous.
• You first apply a $batchIndex transform, that takes the batch identifiers and
generates a new column containing the row numbers in which each new batch
starts. This new column of batch indices is then used as input to other transforms.



• Optionally, you may wish to filter the batch data using standard filtering transforms
such as $moveAve. Because you do not want to average data from different
batches together, you must use $batchBreak to insert breaks between batches.
After performing the filtering, you would have to update the batch index variable to
include the breaks, using a transform such as
!batch_index! = $self + $row - 1
• Finally, apply as many $batch or $batchX transforms as necessary to extract
feature information about the batch; a short example follows this list.
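For example (the variable names batch_id and temp are hypothetical), the
maximum value of temp in each batch could be extracted with:
!bidx! = $batchIndex(!batch_id!, $b_change)
!batch_max! = $batch(!bidx!, !temp!, $b_max)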
batch(index,var,feature[,code])
This transform is used to extract one feature from one variable, for output into
a new variable. The index column is output from $batchIndex, or can be
created by you by hand; it is a numeric variable containing the row numbers at
which every new batch starts (so its length equals the number of batches in the
dataset). The length of the output variable is the same as the length of the
index variable. var is the numeric variable containing the data. feature is
a code indicating the feature to be extracted from var. For some features, you
can set a code to return the time or location of a value rather than the value
itself. These are the batch features that can be queried:
b_dcont
Length of the first delta pulse.
b_dfirst
Value of the first delta (*).
b_dlast
Value of the last delta (*).
b_dnum
Number of deltas.
b_first
First value (*).
b_last
Last value (*).
b_max
Maximum value (*).
b_mean
Average value.


b_min
Minimum value (*).
b_nvalid
Number of valid values.
b_slope
Slope (based on a Least Squares fit).
b_std
Standard deviation.
Features that are marked with an asterisk (*) can be qualified by these codes.
b_index
Return the row number of the value, rather than its value.
b_time
Return the time of the value, rather than its value.
b_value
Return the value (default).
batchBreak(index,data)
This transform is used to insert Breaks in a data column between the end of
each batch and beginning of the next batch. It is used only when you wish to
perform some type of filtering (such as moving average, $moveAve), on the
data before extracting batch feature information. The index argument is the
column output from $batchIndex.
batchIndex(var,type[,tolerance])
This transform takes as input a variable containing batch identifiers and
generates a new column containing the row numbers in which each new batch
starts. This new index column is then used as input to $batch or $batchX
transforms. The batch identifier may be of any variable type. All records
pertaining to a single batch must be contiguous. The type argument is one of
these three constants, which control when to mark the start of a new batch:
b_change
Mark when a value differs from the previous value.
b_fall
Mark when a value decreases from the previous value.



b_rise
Mark when a value increases from the previous value.
The optional tolerance argument is a real number that is used when the
batch identifier is numeric, to specify how far two values can differ and still be
considered the same batch.
batchX(index,var,feature[,code])
This transform is identical to $batch, except that it returns results only from
every other batch, returning Blanks from the alternate batches.

System-Generated Transforms
These transform functions are generated automatically from user actions in the
spreadsheet or the Preprocessor Plot Window. When these functions have been applied
to a dataset, they appear in the Transform List and you can delete or modify them as
necessary. You can also type them directly into the Transform Window.

Important: If you perform an action that generates one of these
transforms, and then you want to rescind that action, you should
delete the transform, rather than performing an opposite action that
generates another transform; for example, delete a Cut instead of
applying an Uncut.

Unless specified otherwise, the input variables can be of any type.

Applied from the Spreadsheet


changedate(var,dtVar) or changedate(var,$none)
This transform is generated when you change the date/time reference of a
variable by clicking in its date/time reference cell and typing in the current
column number of the dtVar, or the word “none”.
changestat(var,startrow,[endrow,]status)
This transform is generated when you change the status of a cell by clicking in
it and typing a new status (Blank, Break, Cut, Error, OK). If endrow is
omitted, it is assumed to be the same as startrow. The transform generated
by the system omits the endrow because it is applied to a single cell, but you
can modify the transform to apply to a group of rows from startrow to a
different endrow. See also $forcestat described on page A-14.


override(var,startrow,[endrow,]value)
This transform is generated when you change the value of a cell by clicking in
it and typing a new value. If endrow is omitted, it is assumed to be the same
as startrow. The transform generated by the system omits the endrow
because it is applied to a single cell, but you can modify the transform to apply
to a group of rows from startrow to a different endrow.
timemerge(start,end,interval,method,maxTimeGap,
handleDuplicates, handleOutOfOrder, maxCert,
var1[,…,varN])
This transform is generated when you apply a Time Merge from the Time
Merge window; see “Transform Window” on page 7-3 for more information on
the meanings of the parameters. The output of this transform is the new date/
time variable.
start is $tm_early_start (earliest start of any date/time column in the
dataset), $tm_late_start (latest start of any date/time column in the
dataset), or a date/time constant; end is $tm_late_end (latest end of any
date/time column in the dataset), $tm_early_end (earliest end of any date/
time column in the dataset), or a date/time constant. (The format for date/time
constants is described on page A-1).
interval is the Time Merge interval that you specified (in the format
described for Date/Time increments described on page A-2).
method is any of the constants $tm_boxcar, $tm_linear,
$tm_spline, $tm_linearExtend, $tm_splineExtend,
corresponding to your menu selection for the Time Merge method.
maxTimeGap is the Max Time Gap value that you specified, in the same
format for Date/Time increments.
handleDuplicates is one of the constants $tm_first,
$tm_average, or $tm_last, corresponding to your menu selection for
handling duplicate time values.
handleOutOfOrder is one of the constants $tm_cut or $tm_sort,
corresponding to your menu selection for handling out-of-order time values.
maxCert does not correspond to any field on the Time Merge window; it
controls how far a value can be from known data before its certainty is set to 0.
For more information, see “Certainty” on page 5-42.



The list of one or more input variables can contain only date/time variables and
unattached variables without any date/time reference.

Applied from the Plot Window


Cuts, Uncuts, and Clips are applied from the Preprocessor plot window, as described in
“Tools” on page 6-28 and “Transform Tools” on page 6-30. Cut X can be applied to
variables of any type; Cut Box and all types of Uncut can be applied to numerics and
date/times, but not strings; and Clip and Cut Y can be applied to numerics only. In the
specifications that follow, var is the tag name of a variable; startrow and endrow
are starting and ending row numbers; low and high are a range of the variable’s
values; dtVal is a date/time constant in the format \mm/dd/yy hh:mm:ss\.
clipabove(var,value) or clipbelow(var,value)
This transform is generated when you apply a Clip to any plot. For the
specified variable, it changes any value above or below the given value to
equal the value. The output variable must be the same as the input.
cutabove(var,value) or cutbelow(var,value)
This transform is generated when you apply a Cut Y to any plot. For the
specified variable, it cuts any point above or below the given value. The output
variable must be the same as the input.
markcut(var,startrow,endrow[,low,high])
This transform is generated when you apply a Cut X or Cut Box to a Row
Number plot. It cuts points within the range specified by the row numbers and
values. The output variable must be the same as the input.
scatcut(Yvar,[Ylow,Yhigh,]Xvar,Xlow,Xhigh,startrow,endrow)
This transform is generated when you apply a Cut X or Cut Box to an XY plot.
Within the specified row number range, it cuts points from the Y variable (not
from X) when both Xvar and Yvar fall within the specified range of values.
The output variable must be the same as the input Y variable.
timecut(var,dtVal1,dtVal2[,low,high])
This transform is generated when you apply a Cut X or Cut Box to a Time
Series plot. It cuts points within the range specified by the date/time values and
the low and high values. The output variable must be the same as the input.
unmarkcut(var,startrow,endrow,low,high)
This transform is generated when you apply an Uncut to a Row Number plot. It
uncuts points within the range specified by the row numbers and values. The
output variable must be the same as the input.


unscatcut(Yvar,Ylow,Yhigh,Xvar,Xlow,Xhigh,startrow,endrow)
This transform is generated when you apply an Uncut to an XY plot. Within
the specified row number range, it uncuts points from the Y variable when both
Xvar and Yvar fall within the specified range of values. The output variable
must be the same as the input Y variable.
untimecut(var,dtVal1,dtVal2,low,high)
This transform is generated when you apply an Uncut to a Time Series plot. It
uncuts points within the range specified by the date/time values and the low
and high values. The output variable must be the same as the input.

Transform Finder
This section is provided to help you find transforms by topic. Consult the reference
pages above for syntax.



Date/Time
add time to a date/time $dtadd
change a variable’s date/time pointer $changedate
current date only, or date & time $today, $now
create a new date/time variable $dtcreate, $dtmake
day of month, extract from date/time $day
day number within week, extract from date/time $weekday
day number within year, extract from date/time $yearday
difference between two date/times $dtdiff
hour, extract from date/time $hour
make a new date/time variable $dtcreate, $dtmake
milliseconds, extract from date/time $millisec
minutes, extract from date/time $minute
month, extract from date/time $month
new date/time variable $dtcreate, $dtmake
now, date & time $now
pointer, change date/time $changedate
read date/time from a string $dtread
read elapsed time from a string $etread
seconds, extract from date/time $second
subtract two date/times $dtdiff
today’s date $today
type, set to date/time $dt
write date/time into a string $dtwrite
write elapsed time into a string $etwrite
year number, extract from date/time $year

Editing
change a value $override
clear (set to blank) $clearRows
clip $clipAbove, $clipBelow


copy & paste $copyRows, $join


cut, uncut $cutAbove, $cutBelow,
$cutStat, $markCut,
$scatCut, $timeCut,
$UnMarkCut, $UnScatCut,
$UnTimeCut
delete $deleteRows
duplicate $dupRows
insert blanks $insertRows
scoot values up or down in the column $shift
search & replace within string $subst



Filtering/Smoothing
moving average $average, $moveAve
moving exponential filter $moveExp
moving Gaussian filter $moveGauss
moving least squares filter $moveLS
moving maximum filter $moveMax
moving median filter $moveMed
moving median approximating filter $moveMedA
moving minimum filter $moveMin
moving standard deviation filter $moveSD
moving number-valid filter $moveValid

Math & Statistics


absolute value $abs
best fit $trend
correlation, covariance $corr, $correlation,
$covariance, $covarTD
discrete wavelet $dwt
difference between two rows $delta, $differs
exponential $exp
exponential average $expAve
fit, best $trend
integrate $sum
linear trend (best fit) $trend
logarithm $ln, $log
modulus $mod
normalize $scale
round off, truncate $round, $trunc
sigmoid $sigmoid
sign, query $sign
square root $sqrt

statistics $max, $mean, $median, $min, $stdev
subtract two rows $delta
sum of successive rows $sum
trigonometry $sin, $sind, $cos, $cosd,
$tan, $tand, $asin,
$asind, $acos, $acosd,
$atan, $atand



Miscellaneous
change date/time pointer of a variable $changedate
change length of column $changelen
close to (is one value close to another) $withinPct
create multiple new columns, each containing only 0’s and 1’s, coding whether the input had a certain value or range $encode, $strcode
date/time pointer, change $changedate
difference between two rows $delta, $differs
extract a subset of a column $subCol
interpret test/train/validation set codes $ttv
length of column, change $changelen
lookup value by row number $lookup, $lookupRel
near (is one value near another) $withinPct
nested if $findle
model (applicable only with RunTime license; other licenses may also be required) $RunVOA, $RunCEM, $RunSV, $RunBOOST, $RunModel
order, find without sorting $rank
order, sort into $sort
previous result in a recursive definition $prev, $prev2
random numbers $rand, $randS, $random,
$randomS
row number, current $row
rows, count the number of $nRows, $nValid
test/train/validation set codes, interpret $ttv
sort into order $sort
sorting, find order without $rank
subset of a column $subCol
time merge $timeMerge, $tMerge
valid points, number found within a moving window $moveValid

Plot Cuts


See “Editing” on page A-34.



Signal Processing
correlation, covariance $corr, $correlation,
$covariance, $covarTD
discrete wavelet $dwt
fft, inverse fft $fft, $ifft
power spectrogram $sgram
power spectrum $spec
principal component matrix, weighted $pca

Status
break status, assign $break
certainty, query or set $certainty, $setCert
change status $changestat, $forcestat
cut status, assign $cutStat
error status, assign $error
query status $status, $valid

Strings
create string $str
combine multiple strings into a single string $str
compare string values $compare
convert ASCII codes to characters $char
convert number to string $fmt
convert string to number $val
duplicate string contents $dup
find position of a character in a string $pos
make new string $str
read date/time from string $dtread
search & replace within string $subst
substring $left, $mid, $midN, $right
write date/time to string $dtwrite


Type Conversion
See also “Type Forcing”, below.

any type to string $str


ASCII codes from characters $ord
ASCII codes to characters $char
characters from ASCII $char
characters to ASCII $ord
date/time from numerics $dtmake
date/time from string $dtread
date/time to numeric day of month $day
date/time to numeric day within week $weekday
date/time to numeric day within year $yearday
date/time to numeric hour $hour
date/time to numeric milliseconds $millisec
date/time to numeric minutes $minute
date/time to numeric month $month
date/time to numeric seconds $second
date/time to numeric year $year
date/time to string $dtwrite
day of month (numeric) from date/time $day
day number within week from date/time $weekday
day number within year from date/time $yearday
elapsed time from string $etread
elapsed time to string $etwrite
hour (numeric) from date/time $hour
length of string $len
milliseconds (numeric) from date/time $millisec
minutes (numeric) from date/time $minute
month (numeric) from date/time $month
numerics to date/time $dtmake
number from string $val
number to string $fmt



seconds from date/time $second
string from any type $str
string from date/time $dtwrite
string from elapsed time $etwrite
string from number $fmt
string to date/time $dtread
string to elapsed time $etread
string to number $val
year number from date/time $year

Type Forcing
force to date/time $dt
force to double precision $double
force to integer $int
force to real number $real
force to string $str



B  Error Measures

When training a model or making What If calculations, the model trainer computes a
relative error that indicates the discrepancy between the actual output values in the
dataset and the predicted output values generated by the model.
A relative error, if computable, is a real number greater than 0. A relative error of 0.
would indicate that a model can perfectly predict outputs from inputs. A relative error of
1. would indicate that a model predicts as well as predicting the mean of the data.
Generally, a model with a relative error less than about 0.8 can be useful. If a relative
error is not computable, for example, if there is only a single pattern in the dataset, it is
assigned a value of -1.
The relative error is computed at three levels:
• For individual output variables: rel-err_out
• For the output variables as a composite for each pattern: rel-err_pat
• For the patterns as a composite for the entire dataset: rel-err_tot
Squared errors are used to compute relative errors. Definitions of these error measures
follow.



sq-err_out
The squared error for an individual output variable is defined as:

    sq-err_out = (y_out − ŷ_out)²

where y_out is the actual value for the output variable in the pattern and ŷ_out is the
output value derived by the model.

sq-err_pat
The squared error for a pattern is the sum of the squared errors for each output in the
pattern:

    sq-err_pat = Σ(out = 1 … N_outs) sq-err_out

where out is the output index and N_outs is the total number of outputs in the pattern.


sq-err_tot
The squared error for a dataset is the sum of the squared errors for each pattern in the
dataset:

    sq-err_tot = Σ(pat = 1 … N_pats) sq-err_pat

where pat is the pattern index and N_pats is the total number of patterns in the dataset.

rel-err_out
The relative error for an individual output variable is defined as:

    rel-err_out = sq-err_out / σ²_out

where σ²_out is the variance of the actual values of the output.

The standard deviation for an output is computed from all the values for the output in
the dataset used during training.



rel-err_pat
The relative error for a pattern is defined as:

    rel-err_pat = sq-err_pat / (N_outs × σ²_all-outs)

where

    σ²_all-outs = (1 / N_outs) Σ(out = 1 … N_outs) σ²_out

rel-err_tot
The relative error for an entire dataset is defined as:

    rel-err_tot = sq-err_tot / (N_pats × N_outs × σ²_all-outs)


Relationship to R²
Relative error is not the same as the commonly-used statistical measure, R². If relative
error is less than or equal to 1., the two measures are related as follows:

    R² = 1 − rel-err²

If relative error is greater than 1., R² is undefined and is displayed as zero (0.).
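For example, a model with a relative error of 0.6 has R² = 1 − 0.6² = 0.64.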

C  Frequently-Asked Questions

These are some frequently-asked questions (FAQs) plus additional tips and hints.

1. How do I print from Insights? What is my default printer?


To print, click the Print button that appears in the Spreadsheet window, Plotter window,
and various other windows in Insights. In the Print window that appears, you need to
specify the printer’s complete path name in the “Send to printer” field.
If you do not know the complete path name of your printer, you can display it by
bringing up the Word Pad accessory: go to the Windows NT Start menu, and select Start
> Programs > Accessories > Word Pad. When the Word Pad window appears, select File
> Print. In the Word Pad Print window that appears, the Name field shows the full path
name of your default printer. Copy the complete printer path name, exactly as it appears,
into the “Send to printer” field in the Insights print window.

2. How do I create a directory or folder?


Bring up the Windows NT Explorer from the Start menu: select Start > Programs >
Windows NT Explorer.



Using the Windows NT Explorer, locate a folder where you can create a new work
folder. Create your work folder there by clicking the right mouse button and selecting
New > Folder from the pop-up menu. When the new folder appears, its name, New
Folder, is highlighted so that you can type a new name for it. Type the new name and
press return.

3. How do I change the working directory and data dictionary?


To set the working directory, bring up the properties window for the shortcut that you
use to start Insights. Set the “Start in” field to the desired directory. The working
directory is in effect whenever you start Insights using the shortcut. By default, the data
dictionary is created in the working directory that you have defined for Insights. Most
Insights browser windows, which appear when you load a file format, dataset or model,
allow you to change the data dictionary.

4. How do I add variables or data to an existing dataset?


To add variables or data from a file to an existing dataset, follow these steps:
1. Make sure the new data includes date/time information. Date/time information is
necessary to merge the new data with the data already in the dataset.
2. If you have not formatted the file using the formatter, do so now. Invoke the
formatter by selecting either the New or the Copy operation in the Tools > File
Formatter menu.
3. Load the existing dataset in the spreadsheet if not already loaded. To invoke the
spreadsheet, select Tools > Data Spreadsheet.
4. In the spreadsheet, add the new data as variables by selecting Add New Variables
from the Dataset menu. To add the new data as rows, select Add New Rows from
the Dataset menu. In either case, the operation prompts you to select the new
formatted file.
5. Time-merge the dataset.

5. There is a date & time variable that the formatter can’t interpret.
1. First, check the format keys in “Units” on page 4-12, to be sure that the date/time
style can’t be understood by the formatter. Most date/time styles can be understood
if you specify the Units correctly.
2. If the date/time truly cannot be interpreted by the formatter, try to use the editor to
patch the file. If even this is not possible, in the formatter set the variable Type


(described on page 4-15) to String. For example, the formatter can read date
followed by time, or time followed by date, but not the style
Thu Aug 1 08:30:00 1991
where the time is inserted between parts of the date.



3. After you finish formatting the file and create a dataset from it, use a transform to
build a date/time variable from the input string. For example,
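you might use the $dtread transform (described on page A-10); in this sketch
the variable names are hypothetical, and additional arguments may be needed
depending on the string’s format:
!date_time! = $dtread(!dt_string!)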

4. Look in the spreadsheet to find the column number of the new date/time variable
that was created by the transform.


5. In the After Transforms view of the dataset, in the header rows at the top of the
dataset, select the Time Col cells for all of the data variables that should use this
date/time variable, and then, in the Edit area, type in its column number.



6. I’m trying to change the Date/Time pointers for several columns in
the Before Transforms view of the dataset, but after every change it
reevaluates the entire transform list.
This hint works only when you change the Date/Time pointers in the Before Transforms
view; it cannot be used when you change them in the After Transforms view, as
described above.
1. First, save the dataset. This is critical.
2. Open the Transform window. In the Edit menu (above the transform list), select
Delete All. This will remove all transforms from the copy of the dataset that is
loaded into the program, but will not affect the transforms that were saved on disk.

CAUTION: At this point, do not save the dataset again, or you will
destroy all of its transforms.

3. Display the Before Transforms view of the dataset, and change the Date/Time
pointers that you wanted to.
4. From the File menu in the spreadsheet, select Inherit Transforms. Select the name
of the dataset that you just saved. This will bring the transforms back in from disk,
evaluating them once only.
5. If any problem occurs, simply abandon the changed version of the dataset and
reload the saved version. If the transforms are evaluated correctly with no
problems, you may now save the dataset.

7. I made a dataset, and it thinks that some of my numbers are strings.


It is best to correct this problem in the formatter, before you create the dataset. There are
a number of situations that can cause a column of raw data to be misinterpreted as
strings. In Step 2 of the formatter (described on page 4-10), you should check the Type
of every column, and correct them if necessary.

6 Insights User’s Guide


Section

An alternative method is available if it is not practical to go back to the formatter. For


example, if the variable flow1 is a string, but should have been numeric:
1. Use the $val transform (described on page A-26) to create a new numeric
variable, flow1_num.
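In transform syntax, that is:
!flow1_num! = $val(!flow1!)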

2. Since flow1_num is a computed column, use the Copy Values function (described
on page 5-32) to copy it into a raw column, flow1_raw.



3. You no longer need flow1_num, so delete the transform that created it (described
on page 7-20).
4. You no longer need the original string variable flow1, so delete it (described on
page 5-34).
5. Optionally, you may find it convenient to move to the Before Transforms view of
the spreadsheet, and change the name flow1_raw to flow1.

8. The dataset already has a Time Merge, but now I have added some
more variables, and they have to be Time Merged also. Do I have to sit
through two Time Merges every time it evaluates the Transform List?
No. Time Merge is implemented as a transform, so you can modify the original Time
Merge to include the new variables also. The syntax of the Time Merge transform


(described on page A-31) is fairly complex, so you generally use the Time Merge
window (described on page 5-38) to create it; but it is not difficult to add a new variable.
1. In the Transform window, locate the Time Merge in the transform list. Double-click
on it to bring it up into the Expression box to be modified. It will look
approximately like this:
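As a sketch only (the interval, maxTimeGap, and maxCert values and the
variable names are placeholders; the full argument list is described on page
A-31), it has the general shape:
!merged_time! = $timemerge($tm_early_start, $tm_late_end, interval, $tm_linear, maxTimeGap, $tm_average, $tm_sort, maxCert, !dt1!, !dt2!)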



2. This transform function has a number of arguments; the last argument is the list of
date/time variables or unattached variables that are being Time Merged. Scroll to
the right until you find the end of the transform.

3. Click in the transform, just to the left of the closing parenthesis. Type a comma,
followed by the name of the new date/time variable.

4. Hit the Return key, or click Modify; then click Update Dataset. The transform list
will be reevaluated. All of your variables will be merged now, in a single transform.


9. Two variables have the same Date/Time column, but I want them to
use different Time Merge methods.
This can be done, but it requires two Time Merges. For this example, we will use a
dataset with one date/time column, dt, and two data columns, data1 and data2.
1. Use the Duplicate function in the Operations buttons at the bottom of the
spreadsheet window to duplicate the date/time column, dt, into dt2. For this
example, the new date/time column will be column number 4.



2. Move to the Before Transforms view of the dataset, and change the date/time
pointer for data2 to the column number of dt2, column 4: click in the TimeCol
cell for data2, type 4, and press the Return key.
3. From the After Transforms view of the dataset, open the Time Merge window. You
can click Show Dependent Variables to show which data column uses each date/
time column.


4. Select only dt, then fill in the rest of the Time Merge information, and merge.



5. Open the Time Merge window again. dt has been removed from the dataset and
replaced with merged_time, but dt2 and data2 have not been affected. Now
select only dt2, then fill in the rest of the Time Merge information, and merge.


6. Now change the date/time pointer of data2 so that it is the same as the date/time
pointer of data1. You can do this from the spreadsheet, as described in several
other Hints above; or you can simply add the transform directly:
!data2! = $changeDate(!data2!, !merged_time!)
An alternative method would be to use the $tmerge transform, described on page
A-25.

10. I want a transform to make a real number, but it comes out as an integer.
Use the $real transform to force the output to be a real number. For example, for a
dataset in which !real! is a real number and !igr! is an integer,
!new! = $if(!real! > !igr!, !real!, !igr!)
produces an output variable that is an integer, but
!new! = $real($if(!real! > !igr!, !real!, !igr!))
produces an output variable that is a real number.



11. The Statistics window tells about Quartiles, but I need to find the
“Deciles,” that is, the top and bottom 10% of the data distribution.
1. Look in the statistics window to find the total number of valid points in the
variable. Figure out what numbers are 10% and 90% of this number. In this example,
the variable flow1 has 5650 valid points, so remember 565 and 5085.


2. In the Plot window, make a Histogram plot of the variable. Turn on the Cumulative
toggle button. Set the Number of Bins to a number so large that you cannot see the
individual bins.



3. Drag the horizontal crosshair up until it reaches the number that is 10% of the total
valid points (you may need to Zoom the plot to see this exactly).


4. Zoom again if necessary, then use the Info tool to find the range of values in the
histogram bin at this point. This value is the first decile.


5. Similarly, find the tenth decile where the distribution is 90% of the total valid
points.



12. I ran a Predicted vs. Actual analysis, and wrote the predictions into
the dataset. Now I want to draw a Time Series plot of the predicted and
actual data, but the prediction column doesn’t have a date/time pointer.

When the prediction column is created, its date/time pointer (displayed in the
spreadsheet in the header row called Time Col) is none. Simply click in this cell and
type in the column number of the date/time column.

13. My data dictionary doesn’t match my files.


If you move a dataset or model from one directory to another, the data dictionary will
still look for it in its former location. You can remove the obsolete entries by selecting
Delete Dataset or Delete Model, as appropriate, and then selecting “Just Remove From
DD” when asked to confirm. You can always load any dataset or model, regardless of
whether it is listed in the data dictionary, by using the Browse button, which appears in
all windows used to load file formats, datasets, or models.



D  Files

This Appendix documents all files created and used by Insights. All filenames are case-
sensitive on computer systems that are case-sensitive. All files are ASCII, except that
datasetname.pi_data can be either ASCII or binary, depending on how you
saved the dataset; you need to know which format it is if you use FTP to transfer a
dataset from one machine to another. If you move or copy a dataset or model from one
directory or machine to another, you must move or copy all files that are marked
“required” below.

Log Files
session_n.pi_script
A record is kept of all actions that you take during each session of Insights. This
record is saved in a file called session_n.pi_script, where n is a sequence
number that increments with every session until it reaches the limit that is specified
by the symbolic name PAVILION_SCRIPTS, and then restarts at 1. The current
file must be retained but old files may be deleted. The file is a Visual Basic (VB)
script. If you know VB, you can edit the file as needed. You can use the script to run
Insights from another application. You can run a specific session either by selecting
the Tools > Run Script operation or by executing a command line like the
following:
insights -run session_n.pi_script
Pavilion_date.trace
A record is kept of major actions that you take during all sessions of Insights on the
same date. This record is saved in a file called Pavilion_date.trace, in the
directory specified by the symbolic name PAVILION_DIR. These files may be
deleted.

Data Dictionary File


Pavilion.pi_dd
You can set the data dictionary to be any file by setting the symbolic name
PAVILION_DICT. If you do not assign a value to PAVILION_DICT, the data
dictionary defaults to a file named Pavilion.pi_dd in the directory specified
by the symbolic name PAVILION_DIR. While Insights is running, you can
change the current data dictionary to any file, in any window that reads from or
writes to the data dictionary.

Print Files
file.ps
When you print from Insights, you have the choice of printing directly to a printer
or writing a PostScript file. If you write to a file, the default filename is file.ps.

Format File Suffixes


.pi_fmt
.pi_fmt_n
Default suffix for format files, defined on page 4-1. The prefix is the same as the
prefix of the raw data file. The form with a sequence number is used if another
format file already exists with the same name. You can delete format files with
Delete Format described on page 4-25.


Dataset File Suffixes


.pi_data
.pi_logical
.pi_transform
Required files for a dataset. If you move the dataset from one directory (or
machine) to another, you must move all three of these files.

Dataset Report File Suffixes


.pi_dsr
Default name of an optional dataset report file.
.pi_sr
Default name of an optional statistics report file.
.pi_tdi
Default name of an optional time delay identification report file.
.pi_tsr
Default name of an optional time statistics report file.

Model File Suffixes


.pi_model1, …, .pi_model4
Always required for all models.
.pi_model5, …, .pi_model7
.pi_trainhst1
Required for all trained models
.pi_model10
Optional file that contains confidence information; required if you want to calculate
confidence values in What Ifs or at runtime.
.pi_NSLog
Optional text file created if the Stiff trainer is used, containing information about
the training.



Model Report and Data File Suffixes
All of these output files are optional; the names are defaults and any other names can be
used. Report files are organized for a person to view; data files are organized to be
formatted and read into a dataset.
.outpct
Output vs. % report file.
.outpct_data
Output vs. % data file.
.pi_vbr
Variable bounds report file.
.pred
Predicted vs. Actual report file.
.preddata
Predicted vs. Actual data file.
.sens
Sensitivity report file.
.sens_vals
Sensitivity values (data) file.
.setp
What Ifs report file.
.setpdata
What Ifs data file.
.snspct
Sensitivity vs. % report file.
.snspct_data
Sensitivity vs. % data file.



E  User-Defined Transforms

This chapter explains how to develop and add your own transforms for use by the
transform calculator.
If the transform calculator does not provide all the services you need, you can add your
own transforms. Adding a transform requires that you write the transform in the C
programming language.
Some example user-defined transforms are provided, and appear in the User-Defined
group of functions in the transform calculator’s Functions and Constants list (but do not
appear in the All functions list).
You can add at most ten user-defined transforms.
A user-defined transform must return one vector as output. This restriction means, for
example, that you cannot develop a fast Fourier transform (FFT) because it returns two
vectors, one real and one imaginary. You could, however, create two user-defined
transforms, fft_real and fft_imaginary.
The inputs to the user-defined transforms have no restrictions.



There is a file in the Pavilion software distribution, examples/userdef/
userdt.c, that contains sample user-defined transforms. This file is the primary one
that you will modify.
Basic instructions for compiling userdt.c and building the .DLL are in
userdef_pnt1.bat, which is also located in the examples/userdef/
directory.

2 Insights User’s Guide


Index

Symbols
! 4-12, 7-8, 7-24
" 7-25
$ 7-8, 7-24
: 7-12, 7-16
; 7-8, 7-16
[ ] 7-24
’ 7-25

Numerics
2000 year xv

A
ABB AEH data extractor 2-1
$abs A-3
$acos A-3
$acosd A-3
Action menu 11-6
selection region 11-5



adding datasets 5-50
adding new rows 5-48
AEH data extractor 2-1
After Transforms 5-14
Analysis Value 5-45, 10-23, 10-31
Analysis Variable Range see AVR
analyzer 1-9
$and 7-14
apply 7-25
arccosine A-3
arcsine A-4
arctangent A-4
$arx A-3
$ascend A-23
ASCII data type 5-52
ASCII editor 3-1
$asin A-4
$asind A-4
$atan A-4
$atand A-4
Auto Learning Rate Adjustment 9-16
auto modeler wizard 1-4
Autostop 9-15
$average A-4
Average Absolute sensitivity 10-10
Average sensitivity 10-10
AVR 5-45, 10-23, 10-31
menu 5-59, 10-32

B
$b_change A-29
$b_dcont A-28
$b_dfirst A-28
$b_dlast A-28
$b_dnum A-28
$b_fall A-29
$b_index A-29
$b_last A-28
$b_max A-28
$b_mean A-28
$b_min A-29
$b_nvalid A-29
$b_rise A-30
$b_slope A-29
$b_std A-29
$b_time A-29


$b_value A-29
Back button 1-3
$batch A-28
batch transforms A-27
$batchBreak A-29
$batchIndex A-29
$batchX A-30
Before Transforms 5-14
changing dataset 5-16, 6-29
best epoch 9-21, 9-23
best inputs 8-21
$biasSensor A-5
Binary data type 5-52
bounds checking A-6
boxcar 5-39
braces, curly{ } 7-24
brackets, square 7-24
$break A-5
breakpoint 7-22
browser 1-3
button layout 1-5

C
carriage return 3-1
$center A-2
$certainty A-5
certainty 5-42, 9-16, A-22, A-31
$changeDate A-30
$changelen A-6
$changestat 5-16, 6-29, A-30
$char A-6
$checkFlatline A-6
$checkRange A-6
$checkRate A-6
clamping 11-21
in Utilities menu 11-4
clear
dataset 5-55
model 8-62
$clearRows 5-56, A-6
Clip tool 6-33
$clipabove 6-30, A-32
$clipbelow 6-30, A-32
closed-loop control data 8-46
colon (:) 7-12, 7-16
colored dot 6-10, 10-18



column 5-2
length 7-12, A-6
width 4-7, 4-11, 5-13
comment
dataset 5-51, 8-63
model 8-63
transform 7-8
variable 4-3
$compare A-7
computed variable 5-2
Date/Time reference 5-16
conditional expressions 7-14
confidence 11-13, 11-25
view 11-13
connections 8-32
constraint 11-21, 11-22, 11-25
gain 8-34
continuous update 6-3, 11-9
converting
character to number A-20
date/time to string A-10
number to character A-6
number to date/time A-10
number to string A-14
string to date/time A-10
string to number A-26
to date/time A-9
to double precision A-9
to integer A-15
to real A-21
to string A-24
transforms for A-41
copy 1-5
copy dataset see Save Dataset As
$copybreak A-7
copyright 2, xvi
$copyrows 5-56, A-7
$corr A-7
correlated inputs 8-46
$correlation 6-22, A-8
correlation plot 6-22
$cos A-8
$cosd A-8
$covariance 6-22, A-8
$covarTD A-8
crosshairs 6-27
curly braces 7-24


current row 11-14


custom model 8-3
internal parameters 8-33
linear 8-34
Customize Toolbar window 1-5
cut 1-5
Cut Box tool
Predicted vs. Actual plot 10-9
preprocessor plot 6-34
Cut X tool 6-35
Cut Y tool 6-34
$cutabove 6-30, A-32
$cutbelow 6-30, A-32
$cutstat A-8

D
data
maximum value xv
raw 5-1
transformed 5-1
data dictionary 5-56, 9-9, C-2
delete dataset 5-12
delete model 9-6
data extractor 2-1, 5-4
data extractor wizard 1-4
data file 1-3
creating 3-5
deleting 3-13
editing 3-6
data pattern 8-48
data plotter 1-8
data spreadsheet 1-7
data type 5-52
dataset 5-1
adding new rows 5-48
adding new variables 5-47
adding together 5-50
After Transforms 5-16, 6-29
Before Transforms 5-14, 5-16, 6-29
clearing 5-55
copy see Save Dataset As
creating 1-4, 5-4
delete 5-12
deleting 5-55
editing 5-56
inheriting transforms 5-50



loading 5-10, 9-3
mapping into model 8-8
name 5-51
reordering 5-58
report file 5-52
saving 5-50
searching 5-24, 5-25, 5-26, 5-27, 5-28
sorting 5-58, 10-21
transferring 5-52
writing model predictions into 10-5
date units 4-12
Date/Time
combined in a single column 4-16, 5-12
Error 4-18
pointer 5-3, 5-14, C-5, C-6, C-20
reference 4-16, 5-3, 5-34, 7-2, A-30
transforms A-34
units 7-9
$day A-9
debugging transforms 7-22
deciles C-16
delete
data file 3-13
dataset 5-55
format 4-25
model 9-9
variable 5-34
$deleterows 5-56, A-9
$delta A-9
depend 5-2
dependent variable 8-2
initial 11-3
predicted 11-3
$descend A-23
dictionary C-2
$differs A-9
directory C-2
discrete wavelet transform A-11
display
cuts 6-10
legends 6-9
display format 4-19
dollar sign 7-8, 7-24
$double A-9
double quotes 7-25
$dt A-9
$dtadd A-9


$dtcreate A-10
$dtdiff A-10
$dtmake A-10
$dtread A-10
$dtround A-10
$dtwrite A-10
$dup A-11
$duprows 5-56, A-11
$dwt A-11

E
$e A-11
$e_exact A-11
$e_lessthan A-11
$e_range A-11
Edit menu 1-5
editing
data file 3-6
dataset 5-15
format 4-25
format headers 4-16
model connections 8-32
transforms 7-18
transforms for A-34
editor 3-1
saving file 3-12
search and replace 3-9
eigenvalue 6-21, 8-62
eigenvector 8-62
$encode A-11
Enterprise Historian data extractor 2-1
epoch 8-48, 9-21, 9-23
$error A-12
error computation 11-22, 11-25
Error History plot 9-19
$etread A-12
$etwrite A-12
exclamation point 4-12, 7-8, 7-24
Exclude 10-20
$exp A-13
$expave A-13
external model 8-4
extractor 2-1
extractor see data extractor 5-4
extrapolate 5-35
extrapolation 5-39



extrapolation training 8-34, 8-39

F
FANN model 8-2
filtering 9-17
phase 8-3
time delay 8-12
$fft A-14
file editor 3-1
menu 1-7
file format 1-3, 1-7
File menu 1-3, 9-3
File Transfer Protocol 5-52
filenames D-1
filter 8-49, 8-57, 8-64
disable 8-57
use 8-57
with FANN model 9-17
$filter_disable A-4, A-13
$filter_freeze A-13
$filter_smooth A-4, A-13
filtering transforms A-36
Final Epoch 9-15
final value 11-21, 11-25
find best inputs 8-21
$findle 7-15, 8-56, A-14
flatline detection A-6
$fmt A-14
focus 7-17
$forcestat A-14
Format 4-1
column separator 4-7
columns 4-10
copy 4-20
delete 4-25
edit 4-25
error 4-18
key concepts 4-27
new 4-3
row flags 4-7
rows 4-6
verify 4-18
Format File 4-1
format file 1-7
Formatted File 4-1, 5-1
formatter 1-3


menu 1-7
$forward A-11
Forward button 1-3
Fourier transform A-14
inverse A-15
freeze tool 6-38
FTP 5-52
fuzzy constraint 11-25

G
gain 10-12
gain constraint 8-34
gain constraints 9-23
gap 5-41
Gaussian random A-27
get 7-24
graph type 6-4

H
hard constraints 11-21
$heartBeat A-15
help
positional 1-10
Help menu 1-9
histogram plot 6-17
$holdLast A-15
Home button 1-3
home page 1-3
$hour A-15

I
IEEE standard xv
$if 7-14, A-15
if 7-25
$ifft A-15
Include Box 10-18
Include Left 10-17
independent variable (dataset) 5-2, 7-12
independent variable (model) 8-2, 11-3
$inf A-15
infinity A-15
Info
Predicted vs. Actual plot 10-7
preprocessor plot 6-31



Sensitivity vs. Rank plot 10-14
initial value 11-21
input validation A-6
inputs
best 8-21
selecting 8-21
$insertrows 5-56, A-15
$int A-15
internal parameters 8-33
interpolate 5-35
interpolation 5-39
invalid mix 7-2
$inverse A-11
$isbadstatus A-16
$isvalid A-16

J
$join A-16

L
$lag A-2
$lead A-2
$left A-16
legends
display 6-9
$len A-16
length
of column 7-12, A-6
of string A-16
line type 6-4
linear extrend 5-39
linear model
custom 8-34
prediction 8-32
$ln A-16
loading
dataset 5-10, 9-3
editor file 3-3
model 9-4
$log A-17
logarithm A-16, A-17
$lookup A-17
$lookupRel A-17


M
$m_ignore 8-56, A-17
$m_test 8-56, A-17
$m_train 8-56, A-17
$m_valid 8-56, A-17
main window 1-3
$markcut 5-50, 6-30, A-32
mask 7-7
math transforms A-36
$max A-17
max gain 8-37
maximum data value xv
Maximum Time Gap 5-41
$mean A-18
$median A-18
median, approximating A-19
menu 1-3
method 5-39
$mid A-18
$midn A-18
$millisec A-18
$min A-18
min gain 8-37
$minute A-18
$mod A-18
model
analyzer 1-9
auto modeler wizard 1-4
builder 1-8
building 8-1
checklist 8-7
clearing 8-62
connectivity 8-32
converting type 8-5
copy 9-6, 9-7
custom 8-3, 8-34
delete 9-6
deleting 9-9
editing connections 8-32
external 8-4
extrapolation 8-34
FANN 8-2
gain constraint 8-34
internal parameters 8-33
linear 8-32, 8-34
loading 9-4



mapping dataset variables 8-8
methodology 8-7
modifying 8-70
pattern 8-48
PCR 8-4, 8-58
performance 9-11, 10-7
phase 8-3
prediction 8-2
removing variables based on sensitivity 10-16
rename 9-7
rename variables 9-7
saving 8-62
setting patterns 8-48
test set 9-13
time delays 8-29
trainer 1-8
training 8-34, 9-1, 9-11, 9-12, 9-13, 9-14
types 8-2
validating 10-2
variable bounds 8-65, 8-66, 8-67, 8-69, 9-5
variables 8-30
verifying test set 8-65
what ifs 1-9
writing predictions into dataset 10-5
Model Input Editor 10-30
model interval 8-10
Model Statistics 8-64
monitor training 9-18
$month A-18
$moveAve A-18
$moveExp A-19
$moveGauss A-19
$moveLS A-19
$moveMax A-19
$moveMed A-19
$moveMedA A-19
$moveMin A-19
$moveSD A-19
$moveValid A-19
moving files 5-52
moving window transforms A-2

N
n/a 4-16, 5-14
new format 4-3
newline 3-1


noise A-26
$none A-19
nonlinear correlation 8-12
normalized plot 6-4
setting Y axis 6-8
$not 7-14
$now A-19
$nrows A-20
$nvalid A-20

O
$or 7-14
$ord A-20
original value 11-24
output
error 9-19
relative error 9-19
Output vs. % 10-22
errors 10-33
example 10-24
plot 10-34
selection window 10-27
stepping a variable 10-23
overlay 6-4
$override 5-50, A-31
overtrain 9-11

P
P.C.A. plot 6-18
paste 1-5
pattern 8-48
pav_info script xv
PCA 8-4
$pca A-20
PCR 8-4
PCR model 8-58
Peak sensitivity 10-10
phase 8-3
$pi A-20
plot
Before Transforms 6-2
Clip 6-33
colored dot 6-10
continuous update 6-3
correlation 6-22



crosshairs 6-27
Cut Box 6-34
Cut X 6-35
Cut Y 6-34
display 6-9, 6-10
draw 6-3
Error History 9-19
Freeze 6-38
gain constraints 9-23
graph type 6-4
histogram 6-17
Info 6-31
legends 6-9
line type 6-4
lines 6-4
menus 6-1
normalized 6-4, 6-8
opening 5-56, 6-1
overlay 6-4
P.C.A. 6-18
points 6-4
prediction 9-21
printing 6-39
probability 6-16
row number 6-12
selecting Y variables 6-10
setting Y axis 6-8
stacked 6-4
stop 6-3
time series 6-13
tools 6-28
types 6-10
Uncut 6-37
window 6-1
XY 6-15
Y axis limits 6-27
Zoom 6-37
plot time delays 8-20
plotter 1-8
$pos A-20
positional help 1-10
Predict Inputs 11-1, 11-6
predict inputs 11-1
Predict Outputs 11-1, 11-6
usage checklist 11-28
Predicted vs. Actual 10-2
plot 10-6, 10-7
report file 10-4
validating a model 10-2
writing predictions into dataset 10-5
prediction model 8-2
prediction plot 9-21
Preprocess 5-1
$preserveRow A-20
$prev A-20
$prev2 A-20
principal components analysis 6-18, 8-4, A-20
principal components regression 8-4, 8-58
printing
plot 6-39
spreadsheet 5-31
statistics 5-21
priority 11-22
probability plot 6-16
Properties 5-43
properties C-2
pull-down menu 1-3

Q
quantile-quantile plot 6-16
quotes 7-25

R
R2 9-11, 10-2
$rand A-27
$random A-27
random numbers A-26
$randomS A-27
$randS A-27
range violation detection A-6
$rank A-21
rate constraints 11-21
rate of change violation detection A-6
raw data file 1-3
Raw Table Editor 10-31
raw variable 5-2
$real A-21
regular training 9-12
Relative Error 9-11, 10-2, B-1
relative error 9-19, B-5
release notes xiv
Replace Best with Current 9-24
report file
correlation 6-26
dataset 5-52
Output vs. % 10-28
Predicted vs. Actual 10-4
Sensitivity vs. % 10-38
Sensitivity vs. Rank 10-12
Setpoints & What Ifs 11-10
statistics 5-21
time delay identification 8-19
time statistics 5-23
variable bounds 8-69
residuals analysis 10-2
ridge training 9-14
$right A-21
$round A-21
$row A-21
row flags 4-7
row number plot 6-12
R-squared B-5

S
sampling interval 5-35
save
dataset 5-50
dataset report 5-52
edited file 3-12
model 8-62
Save Dataset As 5-50
Save Model As see Copy Model
$scale A-21
$scatcut 6-30, A-32
score 6-21
Search
dataset 5-24, 5-25, 5-26, 5-27, 5-28
search
editor 3-9
$second A-21
$self 7-9, A-22
semicolon (;) 7-8, 7-16
sensitivity 8-21
Sensitivity vs. Rank 10-9
interpretation 10-14
measures 10-9
plot 10-14, 10-17, 10-18, 10-20
removing model variables 10-16
report file 10-12
Sensitivity vs. % 10-36
measures 10-36
plot 10-39
selection window 10-37
stepping a variable 10-36
sensor bias A-5
set 7-24
$setcert A-22
setpoint display bar 11-4
legend 11-23, 11-27
scaling 11-20, 11-24
Setpoint Editor
accessed from Utilities menu 11-5
inputs 11-20
outputs 11-23
setpoint study 11-1
Setpoints & What Ifs 11-1
Action menu 11-5, 11-6
clamping 11-4, 11-21
confidence 11-13
constraints 11-21, 11-22, 11-25
continuous update 11-9
current row 11-14
display bar 11-4, 11-20, 11-23, 11-24, 11-27
Edit field 11-9
error computation 11-22, 11-25
final value 11-21, 11-25
initial value 11-21
original value 11-24
Predict Inputs 11-1, 11-6
Predict Outputs 11-1, 11-6
priority 11-22
report file 11-10
setpoint display bar 11-4, 11-20, 11-23, 11-24, 11-27
Setpoint Editor 11-5, 11-20, 11-23
Source 11-7
stripcharts 11-17
usage checklist 11-28
Utilities button 11-4
Utilities menu 11-4
variable, displaying name 11-4
view parameters 11-12
$sgram A-22
$shift A-22
shortcut C-2
$sigmoid A-23
$sign A-23
signal processing transforms A-40
$sin A-23
$sind A-23
single quotes 7-25
smoothing transforms A-36
$sort A-23
sorting dataset
alphabetically 5-58
by sensitivity 10-21
source 7-25
Source, Setpoints & What Ifs 11-7
sparse data algorithm 9-16, 10-12
spikes in Time Merge 5-40
spline extend 5-39
spreadsheet 1-7
After Transforms 5-14
Before Transforms 5-14
changing contents 5-15
column width 5-13
Go To 5-23
positional help 5-14
printing 5-31
reordering 5-58
selecting multiple cells 5-14
statistics 5-17
window 5-12
$sqrt A-23
square brackets 7-24
stacked 6-4
start in directory C-2
starting 1-2
statistics
deciles C-16
equations 5-18
model see Model Statistics
printing 5-21
report file 5-21
time 5-22
variable 5-17
statistics transforms A-36
$status A-23
status 5-2, 5-16, A-6, A-8, A-26, A-30
searching for 5-28
transforms A-40
$stdev A-24
stepping a variable
Output vs. % 10-23
Sensitivity vs. % 10-36
stiff training 9-13
$str A-24
$strcode A-24
strength of inputs 8-21
string substitution A-24
string transforms A-40
stripcharts
Setpoints & What Ifs 11-17
training 9-22
$subcol A-24
$subst A-24
$sum A-24

T
tag name 4-3, 4-12, 5-8, 7-9
displaying 11-4
restrictions 4-12, 5-9
$tan A-25
$tand A-25
tau 8-11
Tcl variable 7-23
test set 8-48
interval 8-53
random 8-55
stiff training 9-13
variable 8-55
verifying 8-65
testing patterns see test set
text editor 3-1
Time Col see Date/Time reference
time delay plot 8-20
time delays 5-35, 8-10, 8-11
calculating automatically 8-12
mapping into model 8-29
specifying manually 8-29
time gap 5-41
time interval 5-35
Time Merge 5-35, C-8, C-11
interval 5-35
spikes 5-40
transform see $TimeMerge
when required 5-35
window 5-38
time series plot 6-13
time statistics 5-22
$timecut 5-50, 6-30, A-32
Time-Merge 5-35
$TimeMerge 5-47, A-31
$tm_average A-31
$tm_boxcar A-31
$tm_cut A-31
$tm_early_end A-31
$tm_early_start A-31
$tm_first A-31
$tm_last A-31
$tm_late_end A-31
$tm_late_start A-31
$tm_linear A-31
$tm_linearExtend A-31
$tm_sort A-31
$tm_spline A-31
$tm_splineExtend A-31
$tmerge A-25
$today A-25
toolbar 1-5
Tools menu 1-7
Tools, preprocessor plot 6-28
Train Rel Error 9-16
trainer 1-8
training 9-1
extrapolation 8-34
gain constraint 8-34
monitor 9-18
parameters 9-14
performance 9-11
regular 9-12
ridge 9-14
sparse 9-16
stiff 9-13
training patterns see test set
training set see test set
transferring files 5-52
transform calculator 1-8, 7-1
Transform List 5-2, 7-22
transforms 5-1, 5-2, 7-1
! 7-8, 7-24
$ 7-8, 7-24
: 7-12, 7-16
; 7-8, 7-16
[ ] 7-24
{ } 7-24
batch A-27
braces, curly 7-24
brackets, square 7-24
breakpoint 7-22
colon (:) 7-12, 7-16
comment 7-8
conditional expressions 7-14
curly braces 7-24
Date/Time reference 7-2
debugging 7-22
deleting 7-20
depend 7-2
dollar sign 7-8, 7-24
editing 7-18
entering 7-16
entering multiple 7-11
errors 7-22
exclamation point 7-8, 7-24
for converting A-41
for editing A-34
for filtering A-36
for smoothing A-36
for strings A-40
for type forcing A-42
from plot tools 6-30
index numbers 7-6
inheriting 5-50
input 7-2
invalid mix 7-2
list 7-6
mask 7-7
math A-36
miscellaneous A-38
modifying 7-19
moving window A-2
multiple outputs 7-10
on date/times A-34
order 7-2
output 7-2
random numbers A-26
semicolon (;) 7-8, 7-16
signal processing A-40
square brackets 7-24
statistics A-36
status A-40
syntax 7-7
system-generated A-30
user-defined 7-23, A-2, E-1
window 7-3
$trend A-25
trigger button 7-6
$trunc A-25
$ttv 8-56, A-25
type
model 8-2
variable 4-15

U
unattached variable 5-39
Uncut
Predicted vs. Actual plot 10-9
preprocessor plot 6-37
units 4-12
$unmarkcut 6-30, A-32
$unscatcut 6-30, A-33
$untimecut 6-30, A-33
update initial button 11-8
user-defined transforms 7-23, A-2, E-1
Utilities button 11-4
Utilities menu 11-4

V
$val A-26
$valid A-26
validation patterns see test set
validation set 8-49, see test set
variable 5-2
adding new 5-47
computed 5-2
copy 5-32
deleting 5-34
depend 7-2
dependent 8-2, 11-3
displaying name 11-4
duplicate 5-32
for test set 8-55
in a model 8-30
independent 5-2, 7-12, 8-2, 11-3
mapping into model 8-8
moving 5-58
origin 5-39
properties 5-43
raw 5-2
rename 5-15, 9-7
reordering 5-58
sorting 5-58
tag name and comment 4-12, 5-2, 5-8, 5-9, 7-9
Tcl 7-23
time delay 8-10, 8-29, 8-30
type 4-15, 5-39, A-42, C-15
types in models 8-2
unattached 5-39
units 4-12
variable bounds 8-65, 9-5, 10-23
disabled 8-67
gain constraints 8-66
report 8-69
setting 8-66
variables
sorting 10-21
view 1-5
view parameters 11-12

W
wavelet transform A-11
web browser 1-3
$weekday A-26
Welcome page 1-3
What Ifs 11-1
what ifs 1-9
window layout 1-5
window, moving A-2
$withinpct 7-15, A-26
wizard
auto modeler 1-4
data extractor 1-4, 5-4
working directory C-2

X
$xor 7-14
XY plot 6-15

Y
Y axis limits 6-27
Y variables 6-10
$year A-26
year 2000 xv
$yearday A-26

Z
Zoom tool 6-37