IBM SPSS Modeler 18.3
Python Scripting and Automation Guide
IBM
Note
Before you use this information and the product it supports, read the information in “Notices” on page
453.
Product Information
This edition applies to version 18, release 3, modification 0 of IBM® SPSS® Modeler and to all subsequent releases and
modifications until otherwise indicated in new editions.
© Copyright International Business Machines Corporation .
US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with
IBM Corp.
Contents
Setting properties.................................................................................................................................30
Creating nodes and modifying streams.....................................................................................................31
Creating nodes......................................................................................................................................31
Linking and unlinking nodes.................................................................................................................31
Importing, replacing, and deleting nodes........................................................................................... 33
Traversing through nodes in a stream................................................................................................. 34
Clearing, or removing, items......................................................................................................................34
Getting information about nodes.............................................................................................................. 35
cognosimport Node Properties................................................................................................................. 87
databasenode properties.......................................................................................................................... 91
datacollectionimportnode Properties....................................................................................................... 92
excelimportnode Properties...................................................................................................................... 96
extensionimportnode properties...............................................................................................................97
fixedfilenode Properties............................................................................................................................ 99
gsdata_import Node Properties..............................................................................................................104
jsonimportnode Properties......................................................................................................................104
sasimportnode Properties.......................................................................................................................104
simgennode properties........................................................................................................................... 105
statisticsimportnode Properties..............................................................................................................107
tm1odataimport Node Properties........................................................................................................... 107
tm1import Node Properties (deprecated).............................................................................................. 108
twcimport node properties......................................................................................................................109
userinputnode properties........................................................................................................................110
variablefilenode Properties..................................................................................................................... 111
xmlimportnode Properties...................................................................................................................... 116
Graph node common properties............................................................................................................. 185
collectionnode Properties....................................................................................................................... 186
distributionnode Properties.................................................................................................................... 187
evaluationnode Properties...................................................................................................................... 188
graphboardnode Properties.................................................................................................................... 190
histogramnode Properties.......................................................................................................................195
mapvisualization properties.................................................................................................................... 196
multiplotnode Properties........................................................................................................................ 200
plotnode Properties................................................................................................................................. 201
timeplotnode Properties......................................................................................................................... 204
eplotnode Properties...............................................................................................................................205
tsnenode Properties................................................................................................................................ 206
webnode Properties................................................................................................................................ 208
treeas properties..................................................................................................................................... 317
twostepnode Properties.......................................................................................................................... 319
twostepAS Properties..............................................................................................................................320
Microsoft Model Nugget Properties .................................................................................................. 345
Node Properties for Oracle Modeling......................................................................................................347
Oracle Modeling Node Properties .....................................................................................................347
Oracle Model Nugget Properties .......................................................................................................354
Node Properties for IBM Netezza Analytics Modeling............................................................................355
Netezza Modeling Node Properties................................................................................................... 355
Netezza Model Nugget Properties..................................................................................................... 369
xgboostlinearnode Properties................................................................................................................. 424
xgboosttreenode Properties....................................................................................................................426
Notices..............................................................................................................453
Trademarks.............................................................................................................................................. 454
Terms and conditions for product documentation................................................................................. 454
Index................................................................................................................ 457
Chapter 1. Scripting and the Scripting Language
Scripting overview
Scripting in IBM SPSS Modeler is a powerful tool for automating processes in the user interface. Scripts
can perform the same types of actions that you perform with a mouse or a keyboard, and you can use
them to automate tasks that would be highly repetitive or time consuming to perform manually.
You can use scripts to:
• Impose a specific order for node executions in a stream.
• Set properties for a node as well as perform derivations using a subset of CLEM (Control Language for
Expression Manipulation).
• Specify an automatic sequence of actions that normally involves user interaction--for example, you can
build a model and then test it.
• Set up complex processes that require substantial user interaction--for example, cross-validation
procedures that require repeated model generation and testing.
• Set up processes that manipulate streams—for example, you can take a model training stream, run it,
and produce the corresponding model-testing stream automatically.
This chapter provides high-level descriptions and examples of stream-level scripts, standalone scripts,
and scripts within SuperNodes in the IBM SPSS Modeler interface. More information on scripting
language, syntax, and commands is provided in the chapters that follow.
Note:
You cannot import and run scripts created in IBM SPSS Statistics within IBM SPSS Modeler.
Types of Scripts
IBM SPSS Modeler uses three types of scripts:
• Stream scripts are stored as a stream property and are therefore saved and loaded with a specific
stream. For example, you can write a stream script that automates the process of training and applying
a model nugget. You can also specify that whenever a particular stream is executed, the script should
be run instead of the stream's canvas content.
• Standalone scripts are not associated with any particular stream and are saved in external text files.
You might use a standalone script, for example, to manipulate multiple streams together.
• SuperNode scripts are stored as a SuperNode stream property. SuperNode scripts are only available in
terminal SuperNodes. You might use a SuperNode script to control the execution sequence of the
SuperNode contents. For nonterminal (source or process) SuperNodes, you can define properties for the
SuperNode or the nodes it contains in your stream script directly.
Stream Scripts
Scripts can be used to customize operations within a particular stream, and they are saved with that
stream. Stream scripts can be used to specify a particular execution order for the terminal nodes within a
stream. You use the stream script dialog box to edit the script that is saved with the current stream.
To access the stream script tab in the Stream Properties dialog box:
1. From the Tools menu, choose:
Stream Properties > Execution
2. Click the Execution tab to work with scripts for the current stream.
Use the toolbar icons at the top of the stream script dialog box for the following operations:
• Import the contents of a preexisting stand-alone script into the window.
• Save a script as a text file.
• Print a script.
• Append default script.
• Edit a script (undo, cut, copy, paste, and other common edit functions).
• Execute the entire current script.
• Execute selected lines from a script.
• Stop a script during execution. (This icon is only enabled when a script is running.)
• Check the syntax of the script and, if any errors are found, display them for review in the lower pane of
the dialog box.
Note: From version 16.0 onwards, SPSS Modeler uses the Python scripting language. All versions before
16.0 used a scripting language unique to SPSS Modeler, now referred to as Legacy scripting. Depending
on the type of script you are working with, on the Execution tab select the Default (optional script)
execution mode and then select either Python or Legacy.
You can specify whether a script is or is not run when the stream is executed. To run the script each time
the stream is executed, respecting the execution order of the script, select Run this script. This setting
provides automation at the stream level for quicker model building. However, the default setting is to
ignore this script during stream execution. Even if you select the option Ignore this script, you can always
run the script directly from this dialog box.
The script editor includes the following features that help with script authoring:
• Syntax highlighting; keywords, literal values (such as strings and numbers), and comments are
highlighted.
• Line numbering.
• Block matching; when the cursor is placed by the start of a program block, the corresponding end block
is also highlighted.
• Suggested auto-completion.
The colors and text styles that are used by the syntax highlighter can be customized by using the IBM
SPSS Modeler display preferences. To access the display preferences, choose Tools > Options > User
Options and select the Syntax tab.
A list of suggested syntax completions can be accessed by selecting Auto-Suggest from the context
menu, or pressing Ctrl + Space. Use the cursor keys to move up and down the list, then press Enter to
insert the selected text. To exit from auto-suggest mode without modifying the existing text, press Esc.
The Debug tab displays debugging messages and can be used to evaluate script state once the script is
executed. The Debug tab consists of a read-only text area and a single-line input text field. The text area
displays text that is sent to either standard output or standard error by the scripts, for example through
error message text. The input text field takes input from the user. This input is then evaluated within the
context of the script that was most recently executed within the dialog (known as the scripting context).
The text area contains the command and resulting output so that the user can see a trace of commands.
The text input field always contains the command prompt (--> for Legacy scripting).
A new scripting context is created in the following circumstances:
• A script is executed by using either Run this script or Run selected lines.
• The scripting language is changed.
If a new scripting context is created, the text area is cleared.
Note: Executing a stream outside of the script pane does not modify the script context of the script pane.
The values of any variables that are created as part of that execution are not visible within the script
dialog box.
For example, the following stream script builds and tests a model: it runs a Neural Net node, inserts the resulting model nugget between the Type node and a new Analysis node, and then runs the Analysis node.
stream = modeler.script.stream()
neuralnetnode = stream.findByType("neuralnetwork", None)
results = []
neuralnetnode.run(results)
appliernode = stream.createModelApplierAt(results[0], "Drug", 594, 187)
analysisnode = stream.createAt("analysis", "Drug", 688, 187)
typenode = stream.findByType("type", None)
stream.linkBetween(appliernode, typenode, analysisnode)
analysisnode.run([])
Standalone Scripts
The Standalone Script dialog box is used to create or edit a script that is saved as a text file. It displays the
name of the file and provides facilities for loading, saving, importing, and executing scripts.
To access the standalone script dialog box:
From the main menu, choose:
Tools > Standalone Script
The same toolbar and script syntax-checking options are available for standalone scripts as for stream
scripts. See the topic “Stream Scripts” on page 1 for more information.
For example, the following standalone script fragment uses the session task runner to open a model-building stream from a file and run its C5.0 node, then opens a second stream and inserts a saved model into it.
taskrunner = modeler.script.session().getTaskRunner()
# First load the model builder stream from file and build a model
druglearn_stream = taskrunner.openStreamFromFile(installation + "streams/druglearn.str", True)
results = []
druglearn_stream.findByType("c50", None).run(results)
# Now load the plot stream, read the model from file and insert it into the stream
drugplot_stream = taskrunner.openStreamFromFile(installation + "streams/drugplot.str", True)
model = taskrunner.openModelFromFile("rule.gm", True)
modelapplier = drugplot_stream.createModelApplier(model, "Drug")
Note: To learn more about scripting language in general, see “Scripting language overview” on page 15.
stream = modeler.script.session().createProcessorStream("featureselection", True)
stream.link(statisticsimportnode, typenode)
stream.link(typenode, featureselectionnode)
models = []
featureselectionnode.run(models)
# Assumes the stream automatically places model apply nodes in the stream
applynode = stream.findByType("applyfeatureselection", None)
The script creates a source node to read in the data, uses a Type node to set the role (direction) for the
response_01 field to Target, and then creates and executes a Feature Selection node. The script also
connects the nodes and positions each on the stream canvas to produce a readable layout. The resulting
model nugget is then connected to a Table node, which lists the 15 most important fields as determined
by the selection_mode and top_n properties. See the topic “featureselectionnode properties” on page
245 for more information.
SuperNode Scripts
You can create and save scripts within any terminal SuperNodes using the IBM SPSS Modeler scripting
language. These scripts are only available for terminal SuperNodes and are often used when creating
template streams or to impose a special execution order for the SuperNode contents. SuperNode scripts
also enable you to have more than one script running within a stream.
For example, let's say you needed to specify the order of execution for a complex stream, and your
SuperNode contains several nodes including a SetGlobals node, which needs to be executed before
deriving a new field used in a Plot node. In this case, you can create a SuperNode script that executes the
SetGlobals node first. Values calculated by this node, such as the average or standard deviation, can then
be used when the Plot node is executed.
Within a SuperNode script, you can specify node properties in the same manner as other scripts.
Alternatively, you can change and define the properties for any SuperNode or its encapsulated nodes
directly from a stream script. See the topic Chapter 21, “SuperNode properties,” on page 435 for more
information. This method works for source and process SuperNodes as well as terminal SuperNodes.
Note: Since only terminal SuperNodes can execute their own scripts, the Scripts tab of the SuperNode
dialog box is available only for terminal SuperNodes.
To open the SuperNode script dialog box from the main canvas:
Select a terminal SuperNode on the stream canvas and, from the SuperNode menu, choose:
SuperNode Script...
To open the SuperNode script dialog box from the zoomed-in SuperNode canvas:
Right-click the SuperNode canvas, and from the context menu, choose:
SuperNode Script...
Looping in streams
With looping you can automate repetitive tasks in streams; examples may include the following:
• Run the stream a given number of times and change the source each time.
• Run the stream a given number of times, changing the value of a variable each time.
• Run the stream a given number of times, entering one extra field on each execution.
• Build a model a given number of times and change a model setting each time.
You set up the conditions to be met on the Looping subtab of the stream Execution tab. To display the
subtab, select the Looping/Conditional Execution execution mode.
Any looping requirements that you define will take effect when you run the stream, if the Looping/
Conditional Execution execution mode has been set. Optionally, you can generate the script code for
your looping requirements and paste it into the script editor by clicking Paste... in the bottom right corner
of the Looping subtab; the main Execution tab display changes to show the Default (optional script)
execution mode with the script in the top part of the tab. This means that you can define a looping
structure using the various looping dialog box options before generating a script that you can customize
further in the script editor. Note that when you click Paste... any conditional execution requirements you
have defined will also be displayed in the generated script.
Important: The looping variables that you set in an SPSS Modeler stream may be overridden if you run the
stream in an IBM SPSS Collaboration and Deployment Services job. This is because the IBM SPSS
Collaboration and Deployment Services job editor entry overrides the SPSS Modeler entry. For example, if
you set a looping variable in the stream to create a different output file name for each loop, the files are
correctly named in SPSS Modeler but are overridden by the fixed entry entered on the Result tab of the
IBM SPSS Collaboration and Deployment Services Deployment Manager.
To set up a loop
1. Create an iteration key to define the main looping structure to be carried out in a stream. See Create an
iteration key for more information.
2. Where needed, define one or more iteration variables. See Create an iteration variable for more
information.
3. The iterations and any variables you created are shown in the main body of the subtab. By default,
iterations are executed in the order they appear; to move an iteration up or down the list, click on it to
select it then use the up or down arrow in the right hand column of the subtab to change the order.
To set up conditional execution
1. In the right hand column of the Conditional subtab, click the Add New Condition button to open
the Add Conditional Execution Statement dialog box. In this dialog you specify the condition that
must be met in order for the node to be executed.
2. In the Add Conditional Execution Statement dialog box, specify the following:
a. Node. Select the node for which you want to set up conditional execution. Click the browse button
to open the Select Node dialog and choose the node you want; if there are too many nodes listed
you can filter the display to show nodes by one of the following categories: Export, Graph,
Modeling, or Output node.
b. Condition based on. Specify the condition that must be met for the node to be executed. You can
choose from one of four options: Stream parameter, Global variable, Table output cell, or Always
true. The details you enter in the bottom half of the dialog box are controlled by the condition you
choose.
• Stream parameter. Select the parameter from the list available and then choose the Operator for
that parameter; for example, the operator may be More than, Equals, Less than, Between, and so
on. You then enter the Value, or minimum and maximum values, depending on the operator.
• Global variable. Select the variable from the list available; for example, this might include: Mean,
Sum, Minimum value, Maximum value, or Standard deviation. You then select the Operator and
values required.
• Table output cell. Select the table node from the list available and then choose the Row and
Column in the table. You then select the Operator and values required.
• Always true. Select this option if the node must always be executed. If you select this option,
there are no further parameters to select.
3. Repeat steps 1 and 2 as often as required until you have set up all the conditions you require. The node
you selected and the condition to be met before that node is executed are shown in the main body of
the subtab in the Execute Node and If this condition is true columns respectively.
4. By default, nodes and conditions are executed in the order they appear; to move a node and condition
up or down the list, click on it to select it then use the up or down arrow in the right hand column of the
subtab to change the order.
In addition, you can set the following options at the bottom of the Conditional subtab:
• Evaluate all in order. Select this option to evaluate each condition in the order in which they are shown
on the subtab. The nodes for which conditions have been found to be "True" will all be executed once all
the conditions have been evaluated.
• Execute one at a time. Only available if Evaluate all in order is selected. Selecting this means that if a
condition is evaluated as "True", the node associated with that condition is executed before the next
condition is evaluated.
• Evaluate until first hit. Selecting this means that only the first node that returns a "True" evaluation
from the conditions you specified will be run.
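These three evaluation modes can be illustrated outside the product with a plain Python sketch; the node names and condition results below are invented for illustration:

```python
# Each pair is (node name, whether its condition evaluated to "True")
conditions = [("table1", False), ("graph1", True), ("export1", True)]

# Collect the nodes whose conditions are met, in the order shown
true_nodes = [node for node, met in conditions if met]

# Evaluate all in order: every node whose condition is True is executed.
# (With "Execute one at a time", each such node runs before the next
# condition is evaluated, but the same nodes are selected.)
evaluate_all = true_nodes           # ["graph1", "export1"]

# Evaluate until first hit: only the first node that returns a "True"
# evaluation is run
until_first_hit = true_nodes[:1]    # ["graph1"]
```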
The "Run selected lines" button executes a single line, or a block of adjacent lines, that you have selected
in the script.
Python Scripting
This guide to the Python scripting language is an introduction to the components that are most likely to be
used when scripting in IBM SPSS Modeler, including concepts and programming basics. This will provide
you with enough knowledge to start developing your own Python scripts to use within IBM SPSS Modeler.
Operations
Assignment is done using an equals sign (=). For example, to assign the value "3" to a variable called "x"
you would use the following statement:
x = 3
The equals sign is also used to assign string data to a variable. For example, to assign the value "a
string value" to the variable "y" you would use the following statement:
y = "a string value"
The following table lists some commonly used comparison and numeric operations, and their
descriptions.
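As an illustrative sketch (the variable names here are invented, and the selection of operators is not exhaustive), the most common numeric and comparison operations behave as follows:

```python
x = 9
y = 4

# Numeric operations
total = x + y        # addition: 13
diff = x - y         # subtraction: 5
product = x * y      # multiplication: 36
quotient = x // y    # integer division: 2
remainder = x % y    # modulo (remainder): 1
power = x ** y       # exponentiation: 6561

# Comparison operations; each evaluates to a Boolean value
less = x < y         # False
not_equal = x != y   # True
at_least = x >= y    # True
```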
Lists
Lists are sequences of elements. A list can contain any number of elements, and the elements of the list
can be any type of object. Lists can also be thought of as arrays. The number of elements in a list can
increase or decrease as elements are added, removed, or replaced.
Examples
A list is defined as a bracketed, comma-separated sequence of elements:
mylist = ["one", "two", "three"]
You can then access specific elements of the list, for example:
mylist[0]
This results in the following output:
one
The number in the brackets ([]) is known as an index and refers to a particular element of the list. The
elements of a list are indexed starting from 0.
You can also select a range of elements of a list; this is called slicing. For example, x[1:3] selects the
second and third elements of x. The end index is one past the selection.
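The list behavior described above can be sketched as follows; the variable names are invented for this example:

```python
mylist = ["one", "two", "three"]

first = mylist[0]            # indexing starts at 0, so this is "one"
mylist.append("four")        # the list grows as elements are added
mylist.remove("two")         # ...and shrinks as elements are removed
count = len(mylist)          # number of elements, now 3
middle = mylist[1:3]         # slicing: the second and third elements
mixed = [1, "text", [2, 3]]  # elements can be any type of object
```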
Strings
A string is an immutable sequence of characters that is treated as a value. Strings support all of the
immutable sequence functions and operators that result in a new string. For example, "abcdef"[1:4]
results in the output "bcd".
In Python, characters are represented by strings of length one.
String literals are defined by the use of single or triple quoting. Strings that are defined using single
quotes cannot span lines, while strings that are defined using triple quotes can. A string can be enclosed
in single quotes (') or double quotes ("). A string may contain the other quoting character unescaped, or
its own quoting character escaped, that is, preceded by the backslash (\) character.
Examples
"This is a string"
'This is also a string'
"It's a string"
'This book is called "Python Scripting and Automation Guide".'
"This is an escape quote (\") in a quoted string"
Multiple strings separated by white space are automatically concatenated by the Python parser. This
makes it easier to enter long strings and to mix quote types in a single string, for example:
"This string uses ' and " 'that string uses ".'
Strings support several useful methods. Some of these methods are given in the following table.
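As an illustrative sketch (the methods shown here are a common subset, not the full table), several string methods behave as follows:

```python
s = "  IBM SPSS Modeler  "

trimmed = s.strip()           # remove leading and trailing white space
upper = trimmed.upper()       # "IBM SPSS MODELER"
lower = trimmed.lower()       # "ibm spss modeler"
words = trimmed.split(" ")    # split on a separator into a list
pos = trimmed.find("SPSS")    # index of the first occurrence, here 4
swapped = trimmed.replace("Modeler", "Statistics")
starts = trimmed.startswith("IBM")  # True
```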
Statement Syntax
The statement syntax for Python is very simple. In general, each source line is a single statement. Except
for expression and assignment statements, each statement is introduced by a keyword name, such as
if or for. Blank lines or remark lines can be inserted anywhere between any statements in the code. If
there is more than one statement on a line, each statement must be separated by a semicolon (;).
Very long statements can continue on more than one line. In this case the statement that is to continue on
to the next line must end with a backslash (\), for example:
x = "A very long string that continues " + \
    "on the next line"
When a structure is enclosed by parentheses (()), brackets ([]), or curly braces ({}), the statement can
be continued on to a new line after any comma, without having to insert a backslash, for example:
x = (1, 2, 3, "hello",
"goodbye", 4, 5, 6)
Identifiers
Identifiers are used to name variables, functions, classes and keywords. Identifiers can be any length, but
must start with either an alphabetical character of upper or lower case, or the underscore character (_).
Names that start with an underscore are generally reserved for internal or private names. After the first
character, the identifier can contain any number and combination of alphabetical characters, numbers
from 0-9, and the underscore character.
There are some reserved words in Jython that cannot be used to name variables, functions, or classes.
They fall under the following categories:
• Statement introducers: assert, break, class, continue, def, del, elif, else, except, exec,
finally, for, from, global, if, import, pass, print, raise, return, try, and while
• Parameter introducers: as, import, and in
• Operators: and, in, is, lambda, not, and or
Improper keyword use generally results in a SyntaxError.
Blocks of Code
Blocks of code are groups of statements that are used where single statements are expected. Blocks of
code can follow any of the following statements: if, elif, else, for, while, try, except, def, and
class. These statements introduce the block of code with the colon character (:), for example:
if x == 1:
    y = 2
    z = 3
elif x == 2:
    y = 4
    z = 5
Indentation is used to delimit code blocks (rather than the curly braces that are used in Java). All lines in a
block must be indented to the same position. This is because a change in the indentation indicates the
end of a code block. It is usual to indent by four spaces per level. It is recommended that spaces are used
rather than tabs. Simple statements can also be written on the same line as the colon, separated by
semicolons, for example:
if x == 1: y = 2; z = 3
import sys
print "test1"
print sys.argv[0]
print sys.argv[1]
print len(sys.argv)
In this example, the import command imports the entire sys module so that the attributes that exist for
this module, such as argv, can be used.
The script in this example can be invoked using the following line:
Examples
The print keyword prints the arguments immediately following it. If the statement is followed by a
comma, a new line is not included in the output. For example:
print "hello",
print "world"
Both words are printed on the same line.
The for statement is used to iterate through a block of code. For example:
mylist1 = ["one", "two", "three"]
for lv in mylist1:
    print lv
In this example, three strings are assigned to the list mylist1. The elements of the list are then printed,
with one element on each line. This will result in the following output:
one
two
three
In this example, the iterator lv takes the value of each element in the list mylist1 in turn as the for loop
implements the code block for each element. An iterator can be any valid identifier of any length.
The for loop can also be combined with an if statement, for example:
for lv in mylist1:
    if lv == "two":
        print "This element is two"
    else:
        print "This element is not two"
In this example, the value of the iterator lv is evaluated. If the value of lv is two, a different string is
printed than the string that is printed when the value of lv is not two. This results in the following output:
This element is not two
This element is two
This element is not two
Mathematical Methods
From the math module you can access useful mathematical methods. Some of these methods are given
in the following table. Unless specified otherwise, all values are returned as floats.
In addition to the mathematical functions, there are some useful trigonometric methods. These methods
are shown in the following table.
There are also two mathematical constants. The value of math.pi is the mathematical constant pi. The
value of math.e is the mathematical constant e.
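As an illustrative sketch (the selection of methods is not exhaustive), several math module methods and the two constants can be used as follows:

```python
import math

root = math.sqrt(16.0)        # square root: 4.0
up = math.ceil(2.3)           # smallest integral value >= 2.3, i.e. 3
down = math.floor(2.7)        # largest integral value <= 2.7, i.e. 2
raised = math.pow(2.0, 10.0)  # 2 to the power 10: 1024.0

# Trigonometric methods take their arguments in radians
zero = math.sin(0.0)          # 0.0
one = math.cos(0.0)           # 1.0

# The two mathematical constants
pi = math.pi                  # 3.14159...
e = math.e                    # 2.71828...
```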
The label is incorrect because the string literal itself has been converted to an ASCII string by Python.
Python allows Unicode string literals to be specified by adding a u character prefix before the string literal:
This will create a Unicode string, and the label will appear correctly.
Using Python and Unicode is a large topic which is beyond the scope of this document. Many books and
online resources are available that cover this topic in great detail.
Object-Oriented Programming
Object-oriented programming is based on the notion of creating a model of the target problem in your
programs. Object-oriented programming reduces programming errors and promotes the reuse of code.
Python is an object-oriented language. Objects defined in Python have the following features:
• Identity. Each object must be distinct, and this must be testable. The is and is not tests exist for this
purpose.
• State. Each object must be able to store state. Attributes, such as fields and instance variables, exist for
this purpose.
• Behavior. Each object must be able to manipulate its state. Methods exist for this purpose.
Python includes the following features for supporting object-oriented programming:
• Class-based object creation. Classes are templates for the creation of objects. Objects are data
structures with associated behavior.
• Inheritance with polymorphism. Python supports single and multiple inheritance. All Python instance
methods are polymorphic and can be overridden by subclasses.
Defining a Class
Within a Python class, both variables and methods can be defined. Unlike in Java, in Python you can
define any number of public classes per source file (or module). Therefore, a module in Python can be
thought of as similar to a package in Java.
In Python, classes are defined using the class statement. The class statement has the following form:
or
When you define a class, you have the option to provide zero or more assignment statements. These
create class attributes that are shared by all instances of the class. You can also provide zero or more
function definitions. These function definitions create methods. The superclasses list is optional.
The class name should be unique in the same scope, that is, within a module, function, or class. You can
define multiple variables to reference the same class.
class MyClass:
    pass
Here, the pass statement is used because a statement is required to complete the class, but no action is
required programmatically.
The following statement creates an instance of the class MyClass:
x = MyClass()
x.attr1 = 1
x.attr2 = 2
...
x.attrN = n
class MyClass:
    attr1 = 10        # class attributes
    attr2 = "hello"

    def method1(self):
        print MyClass.attr1   # reference the class attribute

    def method2(self):
        print MyClass.attr2   # reference the class attribute
Inside a class, you should qualify all references to class attributes with the class name; for example,
MyClass.attr1. All references to instance attributes should be qualified with the self variable; for
example, self.text. Outside the class, you should qualify all references to class attributes with the
class name (for example MyClass.attr1) or with an instance of the class (for example x.attr1, where
x is an instance of the class). Outside the class, all references to instance variables should be qualified
with an instance of the class; for example, x.text.
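These qualification rules can be seen in a short standard-Python example (the class and attribute names here are invented for illustration):

```python
class MyClass:
    attr1 = 10                 # class attribute, shared by all instances

    def __init__(self):
        self.text = "hello"    # instance attribute, one per instance

x = MyClass()
print(MyClass.attr1)   # 10, class attribute via the class name
print(x.attr1)         # 10, class attribute via an instance
print(x.text)          # hello, instance attribute via an instance
```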
Hidden Variables
Data can be hidden by creating private variables. Private variables can be accessed only by the class itself.
If you declare names of the form __xxx or __xxx_yyy, that is, with two leading underscores, the
Python parser will automatically add the class name to the declared name, creating hidden variables. For
example:
class MyClass:
    __attr = 10   # private class attribute

    def method1(self):
        pass
Unlike in Java, in Python all references to instance variables must be qualified with self; there is no
implied use of this.
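The following standard-Python example (the accessor method name is invented) shows the name mangling at work:

```python
class MyClass:
    __attr = 10                # stored on the class as _MyClass__attr

    def method1(self):
        return MyClass.__attr  # inside the class the short name works

x = MyClass()
print(x.method1())         # 10
print(x._MyClass__attr)    # 10, the mangled name remains reachable
```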
Inheritance
The ability to inherit from classes is fundamental to object-oriented programming. Python supports both
single and multiple inheritance. Single inheritance means that there can be only one superclass. Multiple
inheritance means that there can be more than one superclass.
Inheritance is implemented by subclassing other classes. Any number of Python classes can be
superclasses. In the Jython implementation of Python, only one Java class can be directly or indirectly
inherited from. It is not required for a superclass to be supplied.
Any attribute or method in a superclass is also in any subclass and can be used by the class itself, or by
any client, as long as the attribute or method is not hidden. Any instance of a subclass can be used
wherever an instance of a superclass can be used; this is an example of polymorphism. These features
enable reuse and ease of extension.
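A short standard-Python sketch (the class names are invented) illustrates multiple inheritance and a polymorphic override:

```python
class Base:
    def greet(self):
        return "base"

class Mixin:
    def extra(self):
        return "extra"

# Child uses multiple inheritance and overrides greet() (polymorphism)
class Child(Base, Mixin):
    def greet(self):
        return "child"

c = Child()
print(c.greet())            # child
print(c.extra())            # extra, inherited from Mixin
print(isinstance(c, Base))  # True: usable wherever a Base is expected
```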
Types of scripts
In IBM SPSS Modeler there are three types of script:
• Stream scripts are used to control execution of a single stream and are stored within the stream.
• SuperNode scripts are used to control the behavior of SuperNodes.
• Stand-alone or session scripts can be used to coordinate execution across a number of different
streams.
Various methods are available in IBM SPSS Modeler scripts that give you access to a wide range of SPSS
Modeler functionality. These methods are also used in Chapter 4, “The Scripting API,”
on page 37 to create more advanced functions.
Streams
A stream is the main IBM SPSS Modeler document type. It can be saved, loaded, edited and executed.
Streams can also have parameters, global values, a script, and other information associated with them.
SuperNode streams
A SuperNode stream is the type of stream used within a SuperNode. Like a normal stream, it contains
nodes which are linked together. SuperNode streams have a number of differences from a normal stream:
• Parameters and any scripts are associated with the SuperNode that owns the SuperNode stream, rather
than with the SuperNode stream itself.
• SuperNode streams have additional input and output connector nodes, depending on the type of
SuperNode. These connector nodes are used to flow information into and out of the SuperNode stream,
and are created automatically when the SuperNode is created.
Diagrams
The term diagram covers the functions that are supported by both normal streams and SuperNode
streams, such as adding and removing nodes, and modifying connections between the nodes.
Executing a stream
The following example runs all executable nodes in the stream, and is the simplest type of stream script:
modeler.script.stream().runAll(None)
The following example also runs all executable nodes in the stream:
stream = modeler.script.stream()
stream.runAll(None)
In this example, the stream is stored in a variable called stream. Storing the stream in a variable is useful
because a script is typically used to modify either the stream or the nodes within a stream. Creating a
variable that stores the stream results in a more concise script.
The modeler.script module also defines a way of terminating the script with an exit code. The
exit(exit-code) function stops the script from executing and returns the supplied integer exit code.
One of the methods that is defined for a stream is runAll(List). This method runs all executable
nodes. Any models or outputs that are generated by executing the nodes are added to the supplied list.
It is common for a stream execution to generate outputs such as models, graphs, and other output. To
capture this output, a script can supply a variable that is initialized to a list, for example:
stream = modeler.script.stream()
results = []
stream.runAll(results)
When execution is complete, any objects that are generated by the execution can be accessed from the
results list.
Finding nodes
Streams provide a number of ways of locating an existing node. These methods are summarized in the
following table.
As an example, if a stream contained a single Filter node that the script needed to access, the Filter node
can be found by using the following script:
stream = modeler.script.stream()
node = stream.findByType("filter", None)
...
Alternatively, if the ID of the node (as shown on the Annotations tab of the node dialog box) is known, the
ID can be used to find the node, for example:
stream = modeler.script.stream()
node = stream.findByID("id32FJT71G2") # the filter node ID
...
Setting properties
Nodes, streams, models, and outputs all have properties that can be accessed and, in most cases, set.
Properties are typically used to modify the behavior or appearance of the object. The methods that are
available for accessing and setting object properties are summarized in the following table.
stream = modeler.script.stream()
node = stream.findByType("variablefile", None)
node.setPropertyValue("full_filename", "$CLEO/DEMOS/DRUG1n")
...
Alternatively, you might want to filter a field from a Filter node. In this case, the value is also keyed on the
field name, for example:
stream = modeler.script.stream()
# Locate the filter node ...
node = stream.findByType("filter", None)
# ... and filter out the "Na" field
node.setKeyedPropertyValue("include", "Na", False)
Creating nodes
Streams provide a number of ways of creating nodes. These methods are summarized in the following
table.
For example, to create a new Type node in a stream you can use the following script:
stream = modeler.script.stream()
# Create a new type node
node = stream.create("type", "My Type")
stream = modeler.script.stream()
filenode = stream.createAt("variablefile", "My File Input ", 96, 64)
filternode = stream.createAt("filter", "Filter", 192, 64)
tablenode = stream.createAt("table", "Table", 288, 64)
session = modeler.script.session()
session.getStreamManager().removeAll()
session = modeler.script.session()
session.getDocumentOutputManager().removeAll()
session = modeler.script.session()
session.getModelOutputManager().removeAll()
Table 16. Methods to obtain the ID, name, and label of a node

n.getLabel()
Return type: string. Returns the display label of the specified node. The label is the value of the property
custom_name only if that property is a non-empty string and the use_custom_name property is not set;
otherwise, the label is the value of getName().

n.setLabel(label)
Return type: not applicable. Sets the display label of the specified node. If the new label is a non-empty
string it is assigned to the property custom_name, and False is assigned to the property
use_custom_name so that the specified label takes precedence; otherwise, an empty string is assigned to
the property custom_name and True is assigned to the property use_custom_name.

n.getName()
Return type: string. Returns the name of the specified node.

n.getID()
Return type: string. Returns the ID of the specified node. A new ID is created each time a new node is
created. The ID is persisted with the node when it is saved as part of a stream so that when the stream is
opened, the node IDs are preserved. However, if a saved node is inserted into a stream, the inserted node
is considered to be a new object and will be allocated a new ID.
Methods that can be used to obtain other information about a node are summarized in the following table.
import modeler.api
import modeler.api

class CacheFilter(modeler.api.NodeFilter):
    """A node filter for nodes with caching enabled"""
    def accept(this, node):
        return node.isCacheEnabled()
import modeler.api

stream = modeler.script.stream()
sourceNode = stream.findByID('')
session = modeler.script.session()
fileSystem = session.getServerFileSystem()
parameter = stream.getParameterValue('VPATH')
serverDirectory = fileSystem.getServerFile(parameter)
files = fileSystem.getFiles(serverDirectory)
for f in files:
    if f.isDirectory():
        print 'Directory:'
    else:
        print 'File:'
        sourceNode.setPropertyValue('full_filename', f.getPath())
        break
    print f.getName(), f.getPath()
stream.execute()
import modeler.api

stream = modeler.script.stream()
filternode = stream.findByType("filter", None)
typenode = stream.findByType("type", None)
c50node = stream.findByType("c50", None)

# Always use a custom model name
c50node.setPropertyValue("use_model_name", True)

lastRemoved = None
fields = typenode.getOutputDataModel()
for field in fields:
    # If this is the target field then ignore it
    if field.getModelingRole() == modeler.api.ModelingRole.OUT:
        continue

    # Set the name of the new model then run the build
    c50node.setPropertyValue("model_name", "Exclude " + lastRemoved)
    c50node.run([])
The DataModel object provides a number of methods for accessing information about the fields or
columns within the data model. These methods are summarized in the following table.
Table 18. DataModel object methods for accessing information about fields or columns

d.getColumnCount()
Return type: int. Returns the number of columns in the data model.
Each field (Column object) includes a number of methods for accessing information about the column.
The table below shows a selection of these.
Table 19. Column object methods for accessing information about the column

c.getColumnName()
Return type: string. Returns the name of the column.

c.getColumnLabel()
Return type: string. Returns the label of the column, or an empty string if there is no label associated with
the column.

c.getMeasureType()
Return type: MeasureType. Returns the measure type for the column.

c.getStorageType()
Return type: StorageType. Returns the storage type for the column.

c.isMeasureDiscrete()
Return type: Boolean. Returns True if the column is discrete. Columns that are either a set or a flag are
considered discrete.

c.isModelOutputColumn()
Return type: Boolean. Returns True if the column is a model output column.
Note that most of the methods that access information about a column have equivalent methods defined
on the DataModel object itself. For example the two following statements are equivalent:
dataModel.getColumn("someName").getModelingRole()
dataModel.getModelingRole("someName")
import modeler.api
stream = modeler.script.stream()
label = model.getLabel()
algorithm = model.getModelDetail().getAlgorithmName()
The task runner class provides a convenient way of running various common tasks. The methods that are
available in this class are summarized in the following table.
Table 20. Methods of the task runner class for performing common tasks

t.createStream(name, autoConnect, autoManage)
Return type: Stream. Creates and returns a new stream. Note that code that must create streams privately
without making them visible to the user should set the autoManage flag to False.

t.exportDocumentToFile(documentOutput, filename, fileFormat)
Return type: not applicable. Exports the stream description to a file using the specified file format.
Handling Errors
The Python language provides error handling via the try...except code block. This can be used within
scripts to trap exceptions and handle problems that would otherwise cause the script to terminate.
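As a plain-Python illustration of the pattern, independent of the Modeler API (the function here is invented):

```python
def parse_int(text):
    """Return the integer value of text, or None if it cannot be parsed."""
    try:
        return int(text)
    except ValueError:
        # The exception is trapped here instead of terminating the script
        return None

print(parse_int("42"))     # 42
print(parse_int("oops"))   # None
```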
In the example script below, an attempt is made to retrieve a model from an IBM SPSS Collaboration and
Deployment Services Repository. This operation can cause an exception to be thrown; for example, the
repository login credentials might not have been set up correctly, or the repository path is wrong. In the
script, this may cause a ModelerException to be thrown (all exceptions that are generated by IBM
SPSS Modeler are derived from modeler.api.ModelerException).
import modeler.api

session = modeler.script.session()
try:
    repo = session.getRepository()
    m = repo.retrieveModel("/some-non-existent-path", None, None, True)
    # print goes to the Modeler UI script panel Debug tab
    print "Everything OK"
except modeler.api.ModelerException, e:
    print "An error occurred:", e.getMessage()
Note: Some scripting operations may cause standard Java exceptions to be thrown; these are not derived
from ModelerException. In order to catch these exceptions, an additional except block can be used to
catch all Java exceptions, for example:
import modeler.api

session = modeler.script.session()
try:
    repo = session.getRepository()
    m = repo.retrieveModel("/some-non-existent-path", None, None, True)
    # print goes to the Modeler UI script panel Debug tab
    print "Everything OK"
except modeler.api.ModelerException, e:
    print "An error occurred:", e.getMessage()
except java.lang.Exception, e:
    print "A Java exception occurred:", e.getMessage()
In the following example, the script aggregates some Telco data to find which region has the lowest
average income data. A stream parameter is then set with this region. That stream parameter is then used
in a Select node to exclude that region from the data, before a churn model is built on the remainder.
The example is artificial because the script generates the Select node itself and could therefore have
generated the correct value directly into the Select node expression. However, streams are typically
pre-built, so setting parameters in this way provides a useful example.
The first part of the example script creates the stream parameter that will contain the region with the
lowest average income. The script also creates the nodes in the aggregation branch and the model
building branch, and connects them together.
import modeler.api
# First create the aggregation branch to compute the average income per region
statisticsimportnode = stream.createAt("statisticsimport", "SPSS File", 114, 142)
statisticsimportnode.setPropertyValue("full_filename", "$CLEO_DEMOS/telco.sav")
statisticsimportnode.setPropertyValue("use_field_format_for_storage", True)
stream.link(statisticsimportnode, aggregatenode)
stream.link(aggregatenode, tablenode)
stream.link(statisticsimportnode, selectnode)
stream.link(selectnode, typenode)
stream.link(typenode, c50node)
The following part of the example script executes the Table node at the end of the aggregation branch.
The following part of the example script accesses the table output that was generated by the execution of
the Table node. The script then iterates through rows in the table, looking for the region with the lowest
average income.
# table output contains a RowSet so we can access values as rows and columns
rowset = table.getRowSet()
min_income = 1000000.0
min_region = None
# From the way the aggregate node is defined, the first column
# contains the region and the second contains the average income
The following part of the script uses the region with the lowest average income to set the "LowestRegion"
stream parameter that was created earlier. The script then runs the model builder with the specified
region excluded from the training data.
import modeler.api
stream = modeler.script.stream()
# First create the aggregation branch to compute the average income per region
statisticsimportnode = stream.createAt("statisticsimport", "SPSS File", 114, 142)
statisticsimportnode.setPropertyValue("full_filename", "$CLEO_DEMOS/telco.sav")
statisticsimportnode.setPropertyValue("use_field_format_for_storage", True)
stream.link(statisticsimportnode, aggregatenode)
stream.link(aggregatenode, tablenode)
stream.link(statisticsimportnode, selectnode)
stream.link(selectnode, typenode)
stream.link(typenode, c50node)
# table output contains a RowSet so we can access values as rows and columns
rowset = table.getRowSet()
min_income = 1000000.0
min_region = None
# From the way the aggregate node is defined, the first column
# contains the region and the second contains the average income
row = 0
rowcount = rowset.getRowCount()
while row < rowcount:
    if rowset.getValueAt(row, 1) < min_income:
        min_income = rowset.getValueAt(row, 1)
        min_region = rowset.getValueAt(row, 0)
    row += 1
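The same minimum search can be seen on ordinary Python data (the region names and income figures here are invented):

```python
# Invented sample rows: (region, average income)
rows = [("North", 42000.0), ("South", 31000.0), ("East", 55000.0)]

min_income = 1000000.0
min_region = None
for region, income in rows:
    if income < min_income:
        min_income = income
        min_region = region

print(min_region, min_income)   # South 31000.0
```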
Global Values
Global values are used to compute various summary statistics for specified fields. These summary values
can be accessed anywhere within the stream. Global values are similar to stream parameters in that they
are accessed by name through the stream. They are different from stream parameters in that the
associated values are updated automatically when a Set Globals node is run, rather than being assigned
by scripting or from the command line. The global values for a stream are accessed by calling the stream's
getGlobalValues() method.
The GlobalValues object defines the functions that are shown in the following table.
GlobalValues.Type defines the type of summary statistics that are available. The following summary
statistics are available:
• MAX: the maximum value of the field.
• MEAN: the mean value of the field.
• MIN: the minimum value of the field.
• STDDEV: the standard deviation of the field.
• SUM: the sum of the values in the field.
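These summary statistics follow their usual definitions. On invented sample values (note: whether STDDEV uses the sample or population formula is not stated here, so the sample form is assumed below):

```python
import statistics

# Invented field values
values = [2.0, 4.0, 6.0, 8.0]

print(max(values))                  # MAX    -> 8.0
print(sum(values) / len(values))    # MEAN   -> 5.0
print(min(values))                  # MIN    -> 2.0
print(statistics.stdev(values))     # STDDEV (sample form assumed) -> ~2.582
print(sum(values))                  # SUM    -> 20.0
```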
For example, the following script accesses the mean value of the "income" field, which is computed by a
Set Globals node:
import modeler.api
globals = modeler.script.stream().getGlobalValues()
mean_income = globals.getValue(modeler.api.GlobalValues.Type.MEAN, "income")
session = modeler.script.session()
tasks = session.getTaskRunner()
# Open the model build stream, locate the C5.0 node and run it
buildstream = tasks.openStreamFromFile(demosDir + "druglearn.str", True)
c50node = buildstream.findByType("c50", None)
results = []
c50node.run(results)
# Now open the plot stream, find the Na_to_K derive and the histogram
plotstream = tasks.openStreamFromFile(demosDir + "drugplot.str", True)
derivenode = plotstream.findByType("derive", None)
histogramnode = plotstream.findByType("histogram", None)
# Create a model applier node, insert it between the derive and histogram nodes
# then run the histogram
applyc50 = plotstream.createModelApplier(results[0], results[0].getName())
applyc50.setPositionBetween(derivenode, histogramnode)
plotstream.linkBetween(applyc50, derivenode, histogramnode)
histogramnode.setPropertyValue("color_field", "$C-Drug")
histogramnode.run([])
The following example shows how you can also iterate over the open streams (all the streams open in the
Streams tab). Note that this is only supported in standalone scripts.
This section provides an overview of tips and techniques for using scripts, including modifying stream
execution, using an encoded password in a script, and accessing objects in the IBM SPSS Collaboration
and Deployment Services Repository.
The script loops through all nodes in the current stream, and checks whether each node is a Filter. If so,
the script loops through each field in the node and uses either the field.upper() or
field.getColumnName().upper() function to change the name to upper case.
repo = modeler.script.session().getRepository()
For example, you can retrieve a stream from the repository with the following function:
This example retrieves the risk_score.str stream from the specified folder. The label production
identifies which version of the stream to retrieve, and the last parameter specifies that SPSS Modeler is to
manage the stream (for example, so the stream appears in the Streams tab if the SPSS Modeler user
interface is visible). As an alternative, to use a specific, unlabeled version:
Note: If both the version and label parameters are None, then the latest version is returned.
For example, you can store a new version of the risk_score.str stream with the following function:
This example stores a new version of the stream, associates the "test" label with it, and returns the
version marker for the newly created version.
Note: If you do not want to associate a label with the new version, pass None for the label.
This example creates a new folder that is called "cross-sell" in the "/projects" folder. The function
returns the full path to the new folder.
To rename a folder, use the renameFolder() function:
repo.renameFolder("/projects/cross-sell", "cross-sell-Q1")
The first parameter is the full path to the folder to be renamed, and the second is the new name to give
that folder.
To delete an empty folder, use the deleteFolder() function:
repo.deleteFolder("/projects/cross-sell")
repo.lockFile(REPOSITORY_PATH)
repo.lockFile(URI)
repo.unlockFile(REPOSITORY_PATH)
repo.unlockFile(URI)
As with storing and retrieving objects, the REPOSITORY_PATH gives the location of the object in the
repository. The path must be enclosed in quotation marks and use forward slashes as delimiters. It is not
case sensitive.
repo.lockFile("/myfolder/Stream1.str")
repo.unlockFile("/myfolder/Stream1.str")
Alternatively, you can use a Uniform Resource Identifier (URI) rather than a repository path to give the
location of the object. The URI must include the prefix spsscr: and must be fully enclosed in quotation
marks:
repo.lockFile("spsscr:///myfolder/Stream1.str")
repo.unlockFile("spsscr:///myfolder/Stream1.str")
Note that object locking applies to all versions of an object; you cannot lock or unlock individual versions.
Script checking
You can quickly check the syntax of all types of scripts by clicking the red check button on the toolbar of
the Standalone Script dialog box.
Script checking alerts you to any errors in your code and makes recommendations for improvement. To
view the line with errors, click on the feedback in the lower half of the dialog box. This highlights the error
in red.
The -script flag loads the specified script, while the -execute flag executes all commands in the script
file.
stream = modeler.script.stream()
# Assume the stream contains a single C5.0 model builder node
# and that the datasource, predictors and targets have already been
# set up
modelbuilder = stream.findByType("c50", None)
results = []
modelbuilder.run(results)
modeloutput = results[0]
# Now that we have the C5.0 model output object, access the
# relevant content model
cm = modeloutput.getContentModel("PMML")
API
Table 25. API

getRowCount()
Returns: int. Returns the number of rows in this table.

getColumnCount()
Returns: int. Returns the number of columns in this table.

getColumnName(int columnIndex)
Returns: String. Returns the name of the column at the specified column index. The column index starts
at 0.

getStorageType(int columnIndex)
Returns: StorageType. Returns the storage type of the column at the specified index. The column index
starts at 0.

getValueAt(int rowIndex, int columnIndex)
Returns: Object. Returns the value at the specified row and column index. The row and column indices
start at 0.

reset()
Returns: void. Flushes any internal storage associated with this content model.
Example script
stream = modeler.script.stream()
from modeler.api import StorageType

# Assumes varfilenode, a Variable File source node, was created earlier
# Next create the aggregate node and connect it to the variable file node
aggregatenode = stream.createAt("aggregate", "Aggregate", 192, 96)
stream.link(varfilenode, aggregatenode)

# Then create the table output node and connect it to the aggregate node
tablenode = stream.createAt("table", "Table", 288, 96)
stream.link(aggregatenode, tablenode)

# Execute the table node and capture the resulting table output object
results = []
tablenode.run(results)
tableoutput = results[0]

# Access the table content model (content model key assumed to be "table")
tablecontent = tableoutput.getContentModel("table")

# For each column, print column name, type and the first row
# of values from the table content
col = 0
while col < tablecontent.getColumnCount():
    print tablecontent.getColumnName(col), \
        tablecontent.getStorageType(col), \
        tablecontent.getValueAt(0, col)
    col = col + 1
The output in the scripting Debug tab will look something like this:
Age_Min Integer 15
Age_Max Integer 74
Na_Mean Real 0.730851098901
Na_SDev Real 0.116669731242
Drug String drugY
Record_Count Integer 91
API
Table 27. API

getXMLAsString()
Returns: String. Returns the XML as a string.

getNumericValue(String xpath)
Returns: number. Returns the result of evaluating the path with return type of numeric (for example,
count the number of elements that match the path expression).
Example script
The Python scripting code to access the content might look like this:
results = []
modelbuilder.run(results)
modeloutput = results[0]
cm = modeloutput.getContentModel("PMML")
API
Table 29. API

getJSONAsString()
Returns: String. Returns the JSON content as a string.

getObjectAt(<List of object> path, JSONArtifact artifact) throws Exception
Returns: Object. Returns the object at the specified path. The supplied root artifact may be null, in which
case the root of the content is used. The returned value may be a literal string, integer, real or boolean, or
a JSON artifact (either a JSON object or a JSON array).

getChildValuesAt(<List of object> path, JSONArtifact artifact) throws Exception
Returns: Hash table (key:object, value:object). Returns the child values of the specified path if the path
leads to a JSON object, or null otherwise. The keys in the table are strings, while the associated value may
be a literal string, integer, real or boolean, or a JSON artifact (either a JSON object or a JSON array).

getChildrenAt(<List of object> path, JSONArtifact artifact) throws Exception
Returns: List of objects. Returns the list of objects at the specified path if the path leads to a JSON array,
or null otherwise. The returned values may be a literal string, integer, real or boolean, or a JSON artifact
(either a JSON object or a JSON array).

reset()
Returns: void. Flushes any internal storage associated with this content model (for example, a cached
DOM object).
Example script
If there is an output builder node that creates output based on JSON format, the following could be used
to access information about a set of books:

results = []
outputbuilder.run(results)
output = results[0]
# Access the JSON content model (content model key assumed to be "json")
cm = output.getContentModel("json")

# Get the third book entry. Assumes the top-level "books" value
# contains a JSON array which can be indexed
bookInfo = cm.getObjectAt(["books", 2], None)
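The path ["books", 2] addresses the third element of the books array. The equivalent lookup in plain Python, with invented sample data, is:

```python
import json

# Invented sample content; a real script reads it from the output object
text = '{"books": [{"title": "First"}, {"title": "Second"}, {"title": "Third"}]}'
doc = json.loads(text)

# Index the "books" array, then take the element at index 2
bookInfo = doc["books"][2]
print(bookInfo["title"])   # Third
```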
ColumnStatsContentModel API
Table 30. ColumnStatsContentModel API

getAvailableStatistics()
Returns: List<StatisticType>. Returns the available statistics in this model. Not all fields will necessarily
have values for all statistics.

getAvailableColumns()
Returns: List<String>. Returns the column names for which statistics were computed.

getStatistic(String column, StatisticType statistic)
Returns: Number. Returns the statistic values associated with the column.

reset()
Returns: void. Flushes any internal storage associated with this content model.
PairwiseStatsContentModel API
Table 31. PairwiseStatsContentModel API

getAvailableStatistics()
Returns: List<StatisticType>. Returns the available statistics in this model. Not all fields will necessarily
have values for all statistics.

getAvailablePrimaryColumns()
Returns: List<String>. Returns the primary column names for which statistics were computed.

getAvailablePrimaryValues()
Returns: List<Object>. Returns the values of the primary column for which statistics were computed.

getAvailableSecondaryColumns()
Returns: List<String>. Returns the secondary column names for which statistics were computed.

getStatistic(String primaryColumn, String secondaryColumn, StatisticType statistic)
Returns: Number. Returns the statistic values associated with the columns.
"means" (Means node): output "means", content model "pairwiseStatistics"
"dataaudit" (Data Audit node): output "means", content model "columnStatistics"
Example script
from modeler.api import StatisticType
stream = modeler.script.stream()
results = []
statisticsnode.run(results)
statsoutput = results[0]
statscm = statsoutput.getContentModel("pairwiseStatistics")
if (statscm != None):
    pcols = statscm.getAvailablePrimaryColumns()
    scols = statscm.getAvailableSecondaryColumns()
    stats = statscm.getAvailableStatistics()
    corr = statscm.getStatistic(pcols[0], scols[0], StatisticType.Pearson)
    print "Pairwise stats:", pcols[0], scols[0], " Pearson = ", corr
The available arguments (flags) allow you to connect to a server, load streams, run scripts, or specify
other parameters as needed.
For example, you can use the -server, -stream and -execute flags to connect to a server and then
load and run a stream, as follows:
Note that when running against a local client installation, the server connection arguments are not
required.
Parameter values that contain spaces can be enclosed in double quotes—for example:
You can also execute IBM SPSS Modeler states and scripts in this manner, using the -state and
-script flags, respectively.
Note: If you use a structured parameter in a command, you must precede quotation marks with a
backslash. This prevents the quotation marks being removed during interpretation of the string.
System arguments
The following table describes system arguments available for command line invocation of the user
interface.
Note: Default directories can also be set in the user interface. To access the options, from the File menu,
choose Set Working Directory or Set Server Directory.
Loading objects from the IBM SPSS Collaboration and Deployment Services
Repository
Because you can load certain objects from a file or from the IBM SPSS Collaboration and Deployment
Services Repository (if licensed), the filename prefix spsscr: and, optionally, file: (for objects on disk)
tells IBM SPSS Modeler where to look for the object. The prefix works with the following flags:
• -stream
• -script
• -output
• -model
• -project
You use the prefix to create a URI that specifies the location of the object—for example, -stream
"spsscr:///folder_1/scoring_stream.str". The presence of the spsscr: prefix requires that a
valid connection to the IBM SPSS Collaboration and Deployment Services Repository has been specified
in the same command. So, for example, the full command would look like this:
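A sketch of such a command (the hostname, port, and credentials are placeholders, not values from this guide; the stream URI is the one shown above):

```shell
modelerclient -spsscr_hostname myhost -spsscr_port 8080
   -spsscr_username myuser -spsscr_epassword myencodedpassword
   -stream "spsscr:///folder_1/scoring_stream.str" -execute
```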
Note that from the command line, you must use a URI. The simpler REPOSITORY_PATH is not supported.
(It works only within scripts.) For more details about URIs for objects in the IBM SPSS Collaboration and
Deployment Services Repository, see the topic “Accessing Objects in the IBM SPSS Collaboration and
Deployment Services Repository ” on page 49.
Parameter arguments
Parameters can be used as flags during command line execution of IBM SPSS Modeler. In command line
arguments, the -P flag is used to denote a parameter of the form -P <name>=<value>.
Parameters can be any of the following:
• Simple parameters (or parameters used directly in CLEM expressions).
• Slot parameters, also referred to as node properties. These parameters are used to modify the settings
of nodes in the stream. See the topic “Node properties overview” on page 73 for more information.
• Command line parameters, used to alter the invocation of IBM SPSS Modeler.
For example, you can supply data source user names and passwords as a command line flag, as follows:
The format is the same as that of the datasource parameter of the databasenode node property. For
more information, see: “databasenode properties” on page 91.
The last parameter should be set to true if you're passing an encoded password. Also note that no
leading spaces should be used in front of the database user name and password (unless, of course, your
user name or password actually contains a leading space).
A backslash is also required in front of the quotes that identify a structured parameter, as in the following
TM1 datasource example:
Note: If the database name (in the datasource property) contains one or more spaces, periods (also
known as a "full stop"), or underscores, you can use the "backslash double quote" format to treat it as
string. For example: "{\"db2v9.7.6_linux\"}" or: "{\"TDATA 131\"}". In addition, always
enclose datasource string values in double quotes and curly braces, as in the following example:
"{\"SQL Server\",spssuser,abcd1234,false}".
Examples
To connect to a public server:
Note that connecting to a server cluster requires the Coordinator of Processes through IBM SPSS
Collaboration and Deployment Services, so the -cluster argument must be used in combination with
the repository connection options (spsscr_*). See the topic “ IBM SPSS Collaboration and Deployment
Services Repository Connection Arguments” on page 67 for more information.
-epassword <encodedpasswordstring> The encoded password with which to log on to the
server. Available in server mode only.
Note: An encoded password can be generated from the Tools menu of the
IBM SPSS Modeler application.
-domain <name> The domain used to log on to the server. Available in server mode only.
-P <name>=<value> Used to set a startup parameter. Can also be used to set node properties
(slot parameters).
The following table lists the arguments that can be used to set up the connection.
Table 36. IBM SPSS Collaboration and Deployment Services Repository connection arguments
Argument Behavior/Description
-spsscr_hostname <hostname or IP address> The hostname or IP address of the server on which
the IBM SPSS Collaboration and Deployment Services Repository is installed.
-spsscr_port <number> The port number on which the IBM SPSS Collaboration and
Deployment Services Repository accepts connections (typically,
8080 by default).
-spsscr_use_ssl Specifies that the connection should use SSL (secure socket
layer). This flag is optional; the default setting is not to use SSL.
-spsscr_username <name> The user name with which to log on to the IBM SPSS Collaboration
and Deployment Services Repository.
modelerclient @<commandFileName>
Enclose the filename and path to the command file in quotation marks if spaces are required, as follows:
The command file can contain all arguments previously specified individually at startup, with one
argument per line. For example:
-stream report.str
-Porder.full_filename=APR_orders.dat
-Preport.filename=APR_report.txt
-execute
When writing and referencing command files, be sure to follow these constraints:
• Use only one command per line.
OBJECT.setPropertyValue(PROPERTY, VALUE)
or:
VARIABLE = OBJECT.getPropertyValue(PROPERTY)
or:
where OBJECT is a node or output, PROPERTY is the name of the node property that your expression
refers to, and KEY is the key value for keyed properties. For example, the following syntax is used to find
the filter node, and then set the default to include all fields and filter the Age field from downstream data:
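A minimal sketch of that filter example. Inside Modeler's scripting environment the node would come from the stream itself; the stub class here only mimics the two property methods so the calls can be followed outside Modeler:

```python
# Inside Modeler the node would be obtained with:
#   filternode = modeler.script.stream().findByType("filter", None)
# The stub below stands in for a Filter node outside that environment.
class FilterNodeStub(object):
    def __init__(self):
        self.properties = {}

    def setPropertyValue(self, prop, value):
        self.properties[prop] = value

    def setKeyedPropertyValue(self, prop, key, value):
        # keyed properties hold one value per key (here, per field name)
        self.properties.setdefault(prop, {})[key] = value

filternode = FilterNodeStub()
filternode.setPropertyValue("default_include", True)       # pass all fields by default
filternode.setKeyedPropertyValue("include", "Age", False)  # filter Age downstream
```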
All nodes used in IBM SPSS Modeler can be located using the stream findByType(TYPE, LABEL)
function. At least one of TYPE or LABEL must be specified.
Structured properties
There are two ways in which scripting uses structured properties for increased clarity when parsing:
• To give structure to the names of properties for complex nodes, such as Type, Filter, or Balance nodes.
• To provide a format for specifying multiple properties at once.
Structuring for Complex Interfaces
The scripts for nodes with tables and other complex interfaces (for example, the Type, Filter, and Balance
nodes) must follow a particular structure in order to parse correctly. These properties need a name that is
more complex than the name for a single identifier, this name is called the key. For example, within a Filter
node, each available field (on its upstream side) is switched on or off. In order to refer to this information,
the Filter node stores one item of information per field (whether each field is true or false). This property
may have (or be given) the value True or False. Suppose that a Filter node named mynode has (on its
upstream side) a field called Age. To switch this to off, set the property include, with the key Age, to the
value False, as follows:
Another advantage that structured properties have is their ability to set several properties on a node
before the node is stable. By default, a multiset sets all properties in the block before taking any action
based on an individual property setting. For example, when defining a Fixed File node, using two steps to
set field properties would result in errors because the node is not consistent until both settings are valid.
Defining properties as a multiset circumvents this problem by setting both properties before updating the
data model.
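As an illustration of a multiset value (the field names and column positions are hypothetical, and the [name, start, length] triple shown is the layout used by the Fixed File node's fields slot), both field definitions are supplied in a single call so the node is never left in an inconsistent state:

```python
# Hypothetical Fixed File field layout: [field name, start column, length].
# Supplying both fields in one "fields" multiset sets them before the
# node updates its data model.
fields_value = [
    ["Age", 1, 3],
    ["Sex", 5, 7],
]
# Inside Modeler: node.setPropertyValue("fields", fields_value)
```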
Abbreviations
Standard abbreviations are used throughout the syntax for node properties. Learning the abbreviations is
helpful in constructing scripts.
The example s:sample.max_size illustrates that you do not need to spell out node types in full.
The example t.direction.Age illustrates that some slot names can themselves be structured—in
cases where the attributes of a node are more complex than simply individual slots with individual values.
Such slots are called structured or complex properties.
SuperNode-specific properties are discussed separately, as with all other nodes. See the topic Chapter
21, “SuperNode properties,” on page 435 for more information.
A variety of stream properties can be controlled by scripting. To reference stream properties, you must set
the execution method to use scripts:
stream = modeler.script.stream()
stream.setPropertyValue("execute_method", "Script")
Example
The nodes property is used to refer to the nodes in the current stream. The following stream script
provides an example:
stream = modeler.script.stream()
annotation = stream.getPropertyValue("annotation")
annotation = annotation + "\n\nThis stream contains the following nodes:\n"
for node in stream.iterator():
    annotation = annotation + "\n" + node.getTypeName() + " node called \"" + node.getLabel() + "\""
stream.setPropertyValue("annotation", annotation)
The above example builds a list of all nodes in the stream and writes that list into the stream
annotation. The annotation produced looks like this:
Script
Table 41. Stream properties (continued)
Property name Data type Property description
date_format
"DDMMYY"
"MMDDYY"
"YYMMDD"
"YYYYMMDD"
"YYYYDDD"
DAY
MONTH
"DD-MM-YY"
"DD-MM-YYYY"
"MM-DD-YY"
"MM-DD-YYYY"
"DD-MON-YY"
"DD-MON-YYYY"
"YYYY-MM-DD"
"DD.MM.YY"
"DD.MM.YYYY"
"MM.DD.YYYY"
"DD.MON.YY"
"DD.MON.YYYY"
"DD/MM/YY"
"DD/MM/YYYY"
"MM/DD/YY"
"MM/DD/YYYY"
"DD/MON/YY"
"DD/MON/YYYY"
MON YYYY
q Q YYYY
ww WK YYYY
date_baseline number
date_2digit_baseline number
time_format
"HHMMSS"
"HHMM"
"MMSS"
"HH:MM:SS"
"HH:MM"
"MM:SS"
"(H)H:(M)M:(S)S"
"(H)H:(M)M"
"(M)M:(S)S"
"HH.MM.SS"
"HH.MM"
"MM.SS"
"(H)H.(M)M.(S)S"
"(H)H.(M)M"
"(M)M.(S)S"
time_rollover flag
import_datetime_as_string flag
decimal_places number
decimal_symbol Default
Period
Comma
angles_in_radians flag
use_max_set_size flag
max_set_size number
FirstHit
refresh_source_nodes flag Use to refresh source nodes
automatically upon stream
execution.
script string
annotation string
name string Note: This property is read-only.
If you want to change the name
of a stream, you should save it
with a different name.
"UTF-8"
stream_rewriting boolean
stream_rewriting_maximise_sql boolean
stream_rewriting_optimise_clem_execution boolean
stream_rewriting_optimise_syntax_execution boolean
enable_parallelism boolean
sql_generation boolean
database_caching boolean
sql_logging boolean
sql_generation_logging boolean
sql_log_native boolean
sql_log_prettyprint boolean
record_count_suppress_input boolean
record_count_feedback_interval integer
Example 1
varfilenode = modeler.script.stream().create("variablefile", "Var. File")
varfilenode.setPropertyValue("full_filename", "$CLEO_DEMOS/DRUG1n")
varfilenode.setKeyedPropertyValue("check", "Age", "None")
varfilenode.setKeyedPropertyValue("values", "Age", [1, 100])
varfilenode.setKeyedPropertyValue("type", "Age", "Range")
varfilenode.setKeyedPropertyValue("direction", "Age", "Input")
Example 2
This script assumes that the specified data file contains a field called Region that represents a multi-line
string.
# Create a Variable File node that reads the data set containing
# the "Region" field
varfilenode = modeler.script.stream().create("variablefile", "My Geo Data")
varfilenode.setPropertyValue("full_filename", "C:/mydata/mygeodata.csv")
varfilenode.setPropertyValue("treat_square_brackets_as_lists", True)
Both NODE.direction.FIELDNAME
Split
Frequency
RecordID
type Range Type of field. Setting this property to Default will
clear any values property setting, and if
value_mode is set to Specify, it will be reset to
Read. If value_mode is already set to Pass or
Read, it will be unaffected by the type setting.
Flag
Set
Typeless
Discrete
Ordered Set
Default
Usage format:
NODE.type.FIELDNAME
storage Unknown Read-only keyed property for field storage type.
String
Integer
Real
Time
Date
Timestamp
Usage format:
NODE.storage.FIELDNAME
check None Keyed property for field type and range checking.
Nullify
Coerce
Discard
Warn
Abort
Usage format:
NODE.check.FIELDNAME
values [value value] For a continuous (range) field, the first value is the
minimum, and the last value is the maximum. For
nominal (set) fields, specify all values. For flag
fields, the first value represents false, and the last
value represents true. Setting this property
automatically sets the value_mode property to
Specify. The storage is determined based on the
first value in the list; for example, if the first value is
a string then the storage is set to String.
Usage format:
NODE.values.FIELDNAME
value_mode Read Determines how values are set for a field on the
next data pass.
Pass
Read+
Current
Specify
Usage format:
NODE.value_mode.FIELDNAME
Note that you cannot set this property to Specify
directly; to use specific values, set the values
property.
default_value_mode Read Specifies the default method for setting values for
all fields.
Pass
Usage format:
NODE.default_value_mode
extend_values flag Applies when value_mode is set to Read. Set to T
to add newly read values to any existing values for
the field; set to F to discard existing values in favor
of the newly read values.
Usage format:
NODE.extend_values.FIELDNAME
value_labels string Used to specify a value label. Note that values must
be specified first.
enable_missing flag When set to T, activates tracking of missing values
for the field.
Usage format:
NODE.enable_missing.FIELDNAME
missing_values [value value ...] Specifies data values that denote missing data.
Usage format:
NODE.missing_values.FIELDNAME
range_missing flag When this property is set to T, specifies whether a
missing-value (blank) range is defined for a field.
Usage format:
NODE.range_missing.FIELDNAME
missing_lower string When range_missing is true, specifies the lower
bound of the missing-value range.
Usage format:
NODE.missing_lower.FIELDNAME
missing_upper string When range_missing is true, specifies the upper
bound of the missing-value range.
Usage format:
NODE.missing_upper.FIELDNAME
Usage format:
NODE.null_missing.FIELDNAME
whitespace_missing flag When this property is set to T, values containing
only white space (spaces, tabs, and new lines) are
considered missing values.
Usage format:
NODE.whitespace_missing.FIELDNAME
description string Used to specify a field label or description.
default_include flag Keyed property to specify whether the default
behavior is to pass or filter fields:
NODE.default_include
Example:
node.setPropertyValue("default_include", False)
include flag Keyed property used to determine whether
individual fields are included or filtered:
NODE.include.FIELDNAME.
new_name string
Flag / MeasureType.FLAG
Set / MeasureType.SET
OrderedSet / MeasureType.ORDERED_SET
Typeless / MeasureType.TYPELESS
Collection / MeasureType.COLLECTION
Geospatial / MeasureType.GEOSPATIAL
collection_measure Range / MeasureType.RANGE For collection fields (lists with a depth of 0), this
keyed property defines the measurement type
associated with the underlying values.
Flag / MeasureType.FLAG
Set / MeasureType.SET
OrderedSet / MeasureType.ORDERED_SET
Typeless / MeasureType.TYPELESS
LineString
MultiLineString
Polygon
MultiPolygon
has_coordinate_system boolean For geospatial fields, this property defines whether
this field has a coordinate system.
coordinate_system string For geospatial fields, this keyed property defines
the coordinate system for this field.
custom_storage_type Unknown / StorageType.UNKNOWN This keyed property is similar to custom_storage
in that it can be used to define the override storage
for the field. The difference is that in Python
scripting, the setter function can also be passed
one of the StorageType values, while the getter
will always return one of the StorageType values.
String / StorageType.STRING
Integer / StorageType.INTEGER
Real / StorageType.REAL
Time / StorageType.TIME
Date / StorageType.DATE
Timestamp / StorageType.TIMESTAMP
List / StorageType.LIST
Integer / StorageType.INTEGER
Real / StorageType.REAL
Time / StorageType.TIME
Date / StorageType.DATE
Timestamp / StorageType.TIMESTAMP
custom_list_depth integer For list fields, this keyed property specifies the
depth of the field
max_list_length integer Only available for data with a measurement level of
either Geospatial or Collection. Set the maximum
length of the list by specifying the number of
elements the list can contain.
max_string_length integer Only available for typeless data and used when you
are generating SQL to create a table. Enter the
value of the largest string in your data; this
generates a column in the table that is big enough
to contain the string.
asimport Properties
The Analytic Server source enables you to run a stream on Hadoop Distributed File System (HDFS).
Example
node.setPropertyValue("use_default_as", False)
node.setPropertyValue("connection",
["false","9.119.141.141","9080","analyticserver","ibm","admin","admin","false
","","","",""])
Example
where stored_credential_name is the name of a Cognos credential in the repository.
/Public Folders/GOSALES
cognos_items ["field","field", ... ,"field"] The name of one or more data objects
to be imported. The format of field is
[namespace].[query_subject].
[query_item]
cognos_filters field The name of one or more filters to apply
before importing data.
cognos_data_parameters list Values for prompt parameters for data.
Name-and-value pairs are enclosed in
square brackets, and multiple pairs are
separated by commas and the whole
string enclosed in square brackets.
Format:
[["param1", "value"],…,["paramN",
"value"]]
cognos_report_directory field The Cognos path of a folder or package
from which to import reports, for
example:
/Public Folders/GOSALES
Format:
[["param1", "value"],…,["paramN",
"value"]]
Example
import modeler.api
stream = modeler.script.stream()
node = stream.create("database", "My node")
node.setPropertyValue("mode", "Table")
node.setPropertyValue("query", "SELECT * FROM drug1n")
node.setPropertyValue("datasource", "Drug1n_db")
node.setPropertyValue("username", "spss")
node.setPropertyValue("password", "spss")
node.setPropertyValue("tablename", ".Drug1n")
Right
Both
use_quotes AsNeeded Specify whether table and column names
are enclosed in quotation marks when
queries are sent to the database (for
example, if they contain spaces or
punctuation).
Always
Never
query string Specifies the SQL code for the query you
want to submit.
Note: If the database name (in the datasource property) contains spaces, then instead of individual
properties for datasource, username and password, you can also use a single datasource property in
the following format:
[database_name,username,password[,true | false]]
Use this format also if you are changing the data source; however, if you just want to change the username
or password, you can use the username or password properties.
datacollectionimportnode Properties
The Data Collection Data Import node imports survey data based on the Data
Collection Data Model used by market research products. The Data Collection Data
Library must be installed to use this node.
Example
mrADODsc
mrI2dDsc
mrLogDsc
mrQdiDrsDsc
mrQvDsc
mrSampleReportingMDSC
mrSavDsc
mrSCDsc
mrScriptMDSC
mrADODsc
mrI2dDsc
mrLogDsc
mrPunchDSC
mrQdiDrsDsc
mrQvDsc
mrRdbDsc2
mrSavDsc
mrScDSC
mrXmlDsc
File
Folder
UDL
DSN
casedata_file string When casedata_source_type is File,
specifies the file containing the case data.
casedata_folder string When casedata_source_type is
Folder, specifies the folder containing the
case data.
casedata_udl_string string When casedata_source_type is UDL,
specifies the OLE DB connection string
for the data source containing the case
data.
Latest
Specify
specific_version string When version_import_mode is Specify,
defines the version of the case data to be
imported.
use_language string Defines whether labels of a specific
language should be used.
language string If use_language is true, defines the
language code to use on import. The
language code should be one of those
available in the case data.
use_context string Defines whether a specific context should
be imported. Contexts are used to vary
the description associated with
responses.
context string If use_context is true, defines the
context to import. The context should be
one of those available in the case data.
use_label_type string Defines whether a specific type of label
should be imported.
label_type string If use_label_type is true, defines the
label type to import. The label type should
be one of those available in the case data.
user_id string For databases requiring an explicit login,
you can provide a user ID and password
to access the data source.
password string
import_system_variables Common Specifies which system variables are
imported.
None
All
import_codes_variables flag
Single
excelimportnode Properties
The Excel Import node imports data from Microsoft Excel in the .xlsx file format. An
ODBC data source is not required.
Examples
extensionimportnode properties
python_script = """
import spss.pyspark
from pyspark.sql.types import *
cxt = spss.pyspark.runtime.getContext()
if cxt.isComputeDataModelOnly():
    cxt.setSparkOutputSchema(_schema)
else:
    df = cxt.getSparkInputData()
    if df is None:
        drugList = [(1,23,'F','HIGH','HIGH',0.792535,0.031258,'drugY'), \
            (2,47,'M','LOW','HIGH',0.739309,0.056468,'drugC'), \
            (3,47,'M','LOW','HIGH',0.697269,0.068944,'drugC'), \
            (4,28,'F','NORMAL','HIGH',0.563682,0.072289,'drugX')]
        # ... remaining rows and creation of df from drugList omitted in this excerpt
    cxt.setSparkOutputData(df)
"""
node.setPropertyValue("python_syntax", python_script)
R example
#### Script example for R
node.setPropertyValue("syntax_type", "R")
R_script = """# 'JSON Import' Node v1.0 for IBM SPSS Modeler
# 'RJSONIO' package created by Duncan Temple Lang - http://cran.r-project.org/web/packages/RJSONIO
# 'plyr' package created by Hadley Wickham http://cran.r-project.org/web/packages/plyr
# Node developer: Danil Savine - IBM Extreme Blue 2014
# Description: This node allows you to import into SPSS a table data from a JSON.
# Install function for packages
packages <- function(x){
    x <- as.character(match.call()[[2]])
    if (!require(x,character.only=TRUE)){
        install.packages(pkgs=x,repos="http://cran.r-project.org")
        require(x,character.only=TRUE)
    }
}
# packages
packages(RJSONIO)
packages(plyr)
### This function is used to generate automatically the dataModel
getMetaData <- function (data) {
if (dim(data)[1]<=0) {
} else {
"""
node.setPropertyValue("r_syntax", R_script)
fixedfilenode Properties
The Fixed File node imports data from fixed-field text files—that is, files whose fields
are not delimited but start at the same position and are of a fixed length. Machine-
generated or legacy data are frequently stored in fixed-field format.
Example
Period
skip_header number Specifies the number of lines to ignore at
the beginning of the first record. Useful for
ignoring column headers.
auto_recognize_datetime flag Specifies whether dates or times are
automatically identified in the source
data.
lines_to_scan number
fields list Structured property.
full_filename string Full name of file to read, including
directory.
strip_spaces None Discards leading and trailing spaces in
strings on import.
Left
Right
Both
invalid_char_mode Discard Removes invalid characters (null, 0, or any
character non-existent in current
Replace encoding) from the data input or replaces
invalid characters with the specified one-
character symbol.
invalid_char_replacement string
use_custom_values flag
Table 50. fixedfilenode properties (continued)
fixedfilenode properties Data type Property description
custom_storage Unknown
String
Integer
Real
Time
Date
Timestamp
"YYMMDD"
"YYYYMMDD"
"YYYYDDD"
DAY
MONTH
"DD-MM-YY"
"DD-MM-YYYY"
"MM-DD-YY"
"MM-DD-YYYY"
"DD-MON-YY"
"DD-MON-YYYY"
"YYYY-MM-DD"
"DD.MM.YY"
"DD.MM.YYYY"
"MM.DD.YY"
"MM.DD.YYYY"
"DD.MON.YY"
"DD.MON.YYYY"
"DD/MM/YY"
"DD/MM/YYYY"
"MM/DD/YY"
"MM/DD/YYYY"
"DD/MON/YY"
"DD/MON/YYYY"
MON YYYY
q Q YYYY
ww WK YYYY
custom_time_format "HHMMSS" This property is applicable only if a
custom storage has been specified.
"HHMM"
"MMSS"
"HH:MM:SS"
"HH:MM"
"MM:SS"
"(H)H:(M)M:(S)S"
"(H)H:(M)M"
"(M)M:(S)S"
"HH.MM.SS"
"HH.MM"
"MM.SS"
"(H)H.(M)M.(S)S"
"(H)H.(M)M"
"(M)M.(S)S"
SystemDefault
"UTF-8"
jsonimportnode Properties
The JSON source node imports data from a JSON file.
sasimportnode Properties
The SAS Import node imports SAS data into IBM SPSS Modeler.
Example
UNIX
Transport
SAS7
SAS8
SAS9
full_filename string The complete filename that you enter,
including its path.
member_name string Specify the member to import from the
specified SAS transport file.
read_formats flag Reads data formats (such as variable
labels) from the specified format file.
full_format_filename string
import_names NamesAndLabels Specifies the method for mapping variable
names and labels on import.
LabelsasNames
simgennode properties
The Simulation Generate node provides an easy way to generate simulated data—either
from scratch using user-specified statistical distributions or automatically using the
distributions obtained from running a Simulation Fitting node on existing historical data.
This is useful when you want to evaluate the outcome of a predictive model in the
presence of uncertainty in the model inputs.
fields example
This is a structured slot parameter with the following syntax:
simgennode.setPropertyValue("fields", [
[field1, storage, locked, [distribution1], min, max],
[field2, storage, locked, [distribution2], min, max],
[field3, storage, locked, [distribution3], min, max]
])
distribution is a declaration of the distribution name followed by a list containing pairs of attribute
names and values. Each distribution is defined in the following way:
For example, to create a node that generates a single field with a Binomial distribution, you might use the
following script:
The Binomial distribution takes 2 parameters: n and prob. Since Binomial does not support minimum and
maximum values, these are supplied as an empty string.
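A sketch of that script (the field name and the n and prob values are illustrative; the create call assumes Modeler's scripting environment, so it appears in comments):

```python
# One field definition: [name, storage, locked, [distribution], min, max].
# Binomial takes the parameters n and prob; min and max are empty strings
# because Binomial does not support minimum and maximum values.
binomial_field = ["Education", "integer", False,
                  ["Binomial", [["n", 32], ["prob", 0.7]]],
                  "", ""]
# Inside Modeler:
# simgennode = modeler.script.stream().create("simgen", "Sim Gen")
# simgennode.setPropertyValue("fields", [binomial_field])
```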
Note: You cannot set the distribution directly; you use it in conjunction with the fields property.
The following examples show all the possible distribution types. Note that the threshold is entered as
thresh in both NegativeBinomialFailures and NegativeBinomialTrial.
stream = modeler.script.stream()
simgennode.setPropertyValue("fields", [\
beta_dist, \
binomial_dist, \
categorical_dist, \
dice_dist, \
exponential_dist, \
fixed_dist, \
gamma_dist, \
lognormal_dist, \
negbinomialfailures_dist, \
negbinomialtrial_dist, \
normal_dist, \
poisson_dist, \
range_dist, \
triangular_dist, \
uniform_dist, \
weibull_dist
])
correlations example
This is a structured slot parameter with the following syntax:
simgennode.setPropertyValue("correlations", [
[field1, field2, correlation],
[field1, field3, correlation],
[field2, field3, correlation]
])
Correlation can be any number between +1 and -1. You can specify as many or as few correlations as you
like. Any unspecified correlations are set to zero. If any fields are unknown, the correlation value should
be set on the correlation matrix (or table) and is shown in red text. When there are unknown fields, it is
not possible to execute the node.
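For instance (the field names and coefficients are hypothetical), two of the three possible pairs are specified and the third defaults to zero:

```python
# Hypothetical correlations structure: [field1, field2, correlation].
correlations_value = [
    ["Age", "Income", 0.45],
    ["Age", "Score", -0.20],
]
# Every coefficient must lie between -1 and +1; the unspecified
# Income/Score pair is treated as zero.
all_in_range = all(-1 <= c <= 1 for _, _, c in correlations_value)
# Inside Modeler: simgennode.setPropertyValue("correlations", correlations_value)
```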
statisticsimportnode Properties
The IBM SPSS Statistics File node reads data from the .sav file format used by IBM
SPSS Statistics, as well as cache files saved in IBM SPSS Modeler, which also use
the same format.
The properties for this node are described under “statisticsimportnode Properties” on page 409.
Note: This node was deprecated in Modeler 18.0. The replacement node script name is tm1odataimport.
Table 56. tm1import node properties
tm1import node properties Data type Property description
pm_host string Note: Only for version 16.0 and 17.0
For example:
TM1_import.setPropertyValue("tm1_c
onnection", ['Planning Sample',
"admin", "apple"])
selected_view ["field" "field"] A list property containing the details of the
selected TM1 cube and the name of the cube
view from where data will be imported into
SPSS. For example:
TM1_import.setPropertyValue("selec
ted_view", ['plan_BudgetPlan',
'Goal Input'])
selected_column ["field" ] Specify the selected column; only one item
can be specified.
For example:
setPropertyValue("selected_columns
", ["Measures"])
selected_rows ["field" "field"] Specify the selected rows.
For example:
setPropertyValue("selected_rows",
["Dimension_1_1", "Dimension_2_1",
"Dimension_3_1", "Periods"])
Hybrid
TWCDataImport.dataType Historical Specifies the type of weather data to
input. Possible values are Historical
or Forecast. Historical is the default.
Forecast
TWCDataImport.startDate Integer If Historical is specified for
TWCDataImport.dataType, specify a
start date in the format yyyyMMdd.
TWCDataImport.endDate Integer If Historical is specified for
TWCDataImport.dataType, specify
an end date in the format yyyyMMdd.
TWCDataImport.forecastHour 6 If Forecast is specified for
TWCDataImport.dataType, specify
6, 12, 24, or 48 for the hour.
12
24
48
userinputnode properties
The User Input node provides an easy way to create synthetic data—either from
scratch or by altering existing data. This is useful, for example, when you want to
create a test dataset for modeling.
Example
Table 58. userinputnode properties
userinputnode properties Data type Property description
data
names Structured slot that sets or returns a list of
field names generated by the node.
custom_storage Unknown Keyed slot that sets or returns the storage
for a field.
String
Integer
Real
Time
Date
Timestamp
data_mode Combined If Combined is specified, records are
generated for each combination of set
Ordered values and min/max values. The number
of records generated is equal to the
product of the number of values in each
field. If Ordered is specified, one value is
taken from each column for each record in
order to generate a row of data. The
number of records generated is equal to
the largest number of values associated with
a field. Any fields with fewer data values
will be padded with null values.
values Note: This property has been deprecated
in favor of userinputnode.data and
should no longer be used.
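The record counts described for data_mode can be reproduced with a small calculation (the field names and values here are made up for illustration):

```python
# Hypothetical User Input node values for two fields.
field_values = {
    "sex": ["M", "F"],
    "dose": [10, 20, 30],
}

# Combined: one record per combination of values -> product of the counts.
combined_count = 1
for values in field_values.values():
    combined_count *= len(values)

# Ordered: one value taken from each column per record -> longest column,
# with shorter columns padded with null values.
ordered_count = max(len(values) for values in field_values.values())
```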
variablefilenode Properties
The Variable File node reads data from free-field text files—that is, files whose
records contain a constant number of fields but a varied number of characters. This
node is also useful for files with fixed-length header text and certain types of
annotations.
Example
Period
multi_blank flag Treats multiple adjacent blank delimiter
characters as a single delimiter.
read_field_names flag Treats the first row in the data file as
labels for the columns.
strip_spaces None Discards leading and trailing spaces in
strings on import.
Left
Right
Both
Table 59. variablefilenode properties (continued)
variablefilenode properties Data type Property description
invalid_char_mode Discard Removes invalid characters (null, 0, or any
character non-existent in current
Replace encoding) from the data input or replaces
invalid characters with the specified one-
character symbol.
invalid_char_replacement string
break_case_by_newline flag Specifies that the line delimiter is the
newline character.
lines_to_scan number Specifies how many lines to scan for
specified data types.
auto_recognize_datetime flag Specifies whether dates or times are
automatically identified in the source
data.
quotes_1 Discard Specifies how single quotation marks are
treated upon import.
PairAndDiscard
IncludeAsText
quotes_2 Discard Specifies how double quotation marks are
treated upon import.
PairAndDiscard
IncludeAsText
full_filename string Full name of file to be read, including
directory.
use_custom_values flag
custom_storage Unknown
String
Integer
Real
Time
Date
Timestamp
"YYMMDD"
"YYYYMMDD"
"YYYYDDD"
DAY
MONTH
"DD-MM-YY"
"DD-MM-YYYY"
"MM-DD-YY"
"MM-DD-YYYY"
"DD-MON-YY"
"DD-MON-YYYY"
"YYYY-MM-DD"
"DD.MM.YY"
"DD.MM.YYYY"
"MM.DD.YY"
"MM.DD.YYYY"
"DD.MON.YY"
"DD.MON.YYYY"
"DD/MM/YY"
"DD/MM/YYYY"
"MM/DD/YY"
"MM/DD/YYYY"
"DD/MON/YY"
"DD/MON/YYYY"
MON YYYY
q Q YYYY
ww WK YYYY
custom_time_format "HHMMSS" Applicable only if a custom storage has
been specified.
"HHMM"
"MMSS"
"HH:MM:SS"
"HH:MM"
"MM:SS"
"(H)H:(M)M:(S)S"
"(H)H:(M)M"
"(M)M:(S)S"
"HH.MM.SS"
"HH.MM"
"MM.SS"
"(H)H.(M)M.(S)S"
"(H)H.(M)M"
"(M)M.(S)S"
SystemDefault
"UTF-8"
xmlimportnode Properties
The XML source node imports data in XML format into the stream. You can import a
single file, or all files in a directory. You can optionally specify a schema file from which
to read the XML structure.
Example
116 IBM SPSS Modeler 18.3 Python Scripting and Automation Guide
Table 60. xmlimportnode properties (continued)
xmlimportnode properties Data type Property description
fields List of items (elements and attributes) to
import. Each item in the list is an XPath
expression.
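A minimal sketch of setting these properties follows. The StubNode class is only a stand-in for a real Modeler node, and the file name and XPath expressions are hypothetical values for illustration.

```python
# Stand-in for a Modeler node; records property calls so the sketch runs outside Modeler.
class StubNode:
    def __init__(self):
        self.props = {}
    def setPropertyValue(self, name, value):
        self.props[name] = value

node = StubNode()
node.setPropertyValue("full_filename", "c:/import/ebooks.xml")   # hypothetical path
node.setPropertyValue("fields", ["author", "title/@isbn"])       # illustrative XPath items
```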
appendnode properties
The Append node concatenates sets of records. It is useful for combining datasets
with similar structures but different data.
Example
All
create_tag_field flag
tag_field_name string
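The two properties above work together: create_tag_field adds a field recording which input dataset each record came from, and tag_field_name names it. A sketch (StubNode is a stand-in for a real Modeler node, and the field name is an illustrative assumption):

```python
# Stand-in for a Modeler node; records property calls so the sketch runs outside Modeler.
class StubNode:
    def __init__(self):
        self.props = {}
    def setPropertyValue(self, name, value):
        self.props[name] = value

node = StubNode()
node.setPropertyValue("create_tag_field", True)   # tag each record with its source dataset
node.setPropertyValue("tag_field_name", "Input")  # name of the generated tag field
```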
aggregatenode properties
The Aggregate node replaces a sequence of input records with summarized,
aggregated output records.
Example
aggregatenode.setKeyedPropertyValue("aggregate_exprs", "Na_MAX", "MAX('Na')")
Prefix
inc_record_count flag Creates an extra field that specifies how
many input records were aggregated to form
each aggregate record.
count_field string Specifies the name of the record count field.
allow_approximation Boolean Allows approximation of order statistics
when aggregation is performed in Analytic
Server
bin_count integer Specifies the number of bins to use in
approximation
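aggregate_exprs is a keyed property: each call supplies the property name, a key (the name of the generated field), and an aggregate expression. A runnable sketch, where StubNode stands in for a real Modeler node and the Age_MIN entry is an illustrative assumption:

```python
# Stand-in for a Modeler node; records keyed property calls so the sketch runs outside Modeler.
class StubNode:
    def __init__(self):
        self.keyed = {}
    def setKeyedPropertyValue(self, name, key, value):
        self.keyed.setdefault(name, {})[key] = value

node = StubNode()
node.setKeyedPropertyValue("aggregate_exprs", "Na_MAX", "MAX('Na')")
node.setKeyedPropertyValue("aggregate_exprs", "Age_MIN", "MIN('Age')")  # illustrative
```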
balancenode properties
The Balance node corrects imbalances in a dataset, so it conforms to a specified
condition. The balancing directive adjusts the proportion of records where a
condition is true by the factor specified.
Example
Table 63. balancenode properties
balancenode properties Data type Property description
directives Structured property to balance proportion of
field values based on number specified (see
example below).
training_data_only flag Specifies that only training data should be
balanced. If no partition field is present in
the stream, then this option is ignored.
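Each balancing directive pairs a boost or reduction factor with a CLEM condition. Assuming the [factor, condition] structured shape used by similar properties in this guide (the factors and conditions below are illustrative), a sketch:

```python
# Stand-in for a Modeler node; records property calls so the sketch runs outside Modeler.
class StubNode:
    def __init__(self):
        self.props = {}
    def setPropertyValue(self, name, value):
        self.props[name] = value

node = StubNode()
# Boost records where Age > 60 by 1.3x, and where Na > 0.5 by 1.5x (illustrative values).
node.setPropertyValue("directives", [[1.3, "Age > 60"], [1.5, "Na > 0.5"]])
node.setPropertyValue("training_data_only", True)
```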
cplexoptnode properties
The CPLEX Optimization node provides the ability to use complex mathematical
(CPLEX) based optimization via an Optimization Programming Language (OPL)
model file. This functionality was previously available in the IBM Analytical Decision
Management product, which is no longer supported; however, you can use the CPLEX
node in SPSS Modeler without requiring IBM Analytical Decision Management.
Table 64. cplexoptnode properties (continued)
cplexoptnode properties Data type Property description
[[0,0,'Product','Type','Products','prod_id_tup','int','prod_id'],
[0,0,'Product','Type','Products','prod_name_tup','string','prod_name'],
[1,1,'Components','Type','Components','comp_id_tup','int','comp_id'],
[1,1,'Components','Type','Components','comp_name_tup','string','comp_name']]
[['Production','int','res'],
['Remark','string','res_1'],
['Cost','float','res_2']]
derive_stbnode properties
The Space-Time-Boxes node derives Space-Time-Boxes from latitude, longitude
and timestamp fields. You can also identify frequent Space-Time-Boxes as
hangouts.
Example
# Hangouts mode
node.setPropertyValue("mode", "Hangouts")
node.setPropertyValue("hangout_density", "STB_GH7_30MINS")
node.setPropertyValue("id_field", "Event")
node.setPropertyValue("qualifying_duration", "30MINUTES")
node.setPropertyValue("min_events", 4)
node.setPropertyValue("qualifying_pct", 65)
latitude_field field
longitude_field field
Table 65. Space-Time-Boxes node properties (continued)
derive_stbnode properties Data type Property description
timestamp_field field
hangout_density density A single density. See densities for valid
density values.
densities [density,density,..., Each density is a string, for example
density] STB_GH8_1DAY.
Note: There are limits to which densities are
valid. For the geohash, values from GH1 to
GH15 can be used. For the temporal part,
the following values can be used:
EVER
1YEAR
1MONTH
1DAY
12HOURS
8HOURS
6HOURS
4HOURS
3HOURS
2HOURS
1HOUR
30MINS
15MINS
10MINS
5MINS
2MINS
1MIN
30SECS
15SECS
10SECS
5SECS
2SECS
1SEC
id_field field
qualifying_duration Must be a string.
1DAY
12HOURS
8HOURS
6HOURS
4HOURS
3HOURS
2HOURS
1HOUR
30MIN
15MIN
10MIN
5MIN
2MIN
1MIN
30SECS
15SECS
10SECS
5SECS
2SECS
1SECS
name_extension string
Example
default_ascending flag
low_distinct_key_count flag Specifies that you have only a small number
of records and/or a small number of unique
values of the key field(s).
keys_pre_sorted flag Specifies that all records with the same key
values are grouped together in the input.
disable_sql_generation flag
Examples:
Because the custom options require more than one argument, they are added as a list, for example:
node.setPropertyValue("composite_values", [
[FIELD1, [FILLOPTION1]],
[FIELD2, [FILLOPTION2]],
.
.
])
Example:
node.setPropertyValue("composite_values", [
["Age", ["First"]],
["Name", ["MostFrequent", "First"]],
["Pending", ["IncludesValue", "T"]],
["Marital", ["FirstMatch", "Married", "Divorced", "Separated"]],
["Code", ["Concatenate", "Comma"]]
])
extensionprocessnode properties
Python for Spark example
process_script = """
import spss.pyspark.runtime
from pyspark.sql.types import *

cxt = spss.pyspark.runtime.getContext()
"""
node.setPropertyValue("python_syntax", process_script)
R example
#### script example for R
node.setPropertyValue("syntax_type", "R")
node.setPropertyValue("r_syntax", """day<-as.Date(modelerData$dob, format="%Y-%m-%d")
next_day<-day + 1
modelerData<-cbind(modelerData,next_day)
var1<-c(fieldName="Next day",fieldLabel="",fieldStorage="date",fieldMeasure="",fieldFormat="",
fieldRole="")
modelerDataModel<-data.frame(modelerDataModel,var1)""")
mergenode properties
The Merge node takes multiple input records and creates a single output record
containing some or all of the input fields. It is useful for merging data from different
sources, such as internal customer data and purchased demographic data.
Example
FullOuter
PartialOuter
Anti
outer_join_tag.n flag In this property, n is the tag name as
displayed in the Select Dataset dialog box.
Note that multiple tag names may be
specified, as any number of datasets could
contribute incomplete records.
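A sketch of configuring a full outer join with one tagged dataset allowed to contribute incomplete records. StubNode is only a stand-in for a real Modeler node, and the tag name 1 in outer_join_tag.1 is an illustrative assumption:

```python
# Stand-in for a Modeler node; records property calls so the sketch runs outside Modeler.
class StubNode:
    def __init__(self):
        self.props = {}
    def setPropertyValue(self, name, value):
        self.props[name] = value

node = StubNode()
node.setPropertyValue("join", "FullOuter")       # keep incomplete records from all inputs
node.setPropertyValue("outer_join_tag.1", True)  # dataset tagged "1" may be incomplete
```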
rfmaggregatenode properties
The Recency, Frequency, Monetary (RFM) Aggregate node enables you to take
customers' historical transactional data, strip away any unused data, and combine
all of their remaining transaction data into a single row that lists when they last dealt
with you, how many transactions they have made, and the total monetary value of
those transactions.
Example
Table 69. rfmaggregatenode properties
rfmaggregatenode properties Data type Property description
relative_to Fixed Specify the date from which the recency of
transactions will be calculated.
Today
reference_date date Only available if Fixed is chosen in
relative_to.
contiguous flag If your data are presorted so that all records
with the same ID appear together in the
data stream, selecting this option speeds up
processing.
id_field field Specify the field to be used to identify the
customer and their transactions.
date_field field Specify the date field to be used to calculate
recency against.
value_field field Specify the field to be used to calculate the
monetary value.
extension string Specify a prefix or suffix for duplicate
aggregated fields.
add_as Suffix Specify if the extension should be added
as a suffix or a prefix.
Prefix
discard_low_value_records flag Enable use of the
discard_records_below setting.
discard_records_below number Specify a minimum value below which any
transaction details are not used when
calculating the RFM totals. The units of
value relate to the value field selected.
only_recent_transactions flag Enable use of either the
specify_transaction_date or
transaction_within_last settings.
specify_transaction_date flag
transaction_date_after date Only available if
specify_transaction_date is selected.
Specify the transaction date after which
records will be included in your analysis.
transaction_within_last number Only available if
transaction_within_last is selected.
Specify the number and type of periods
(days, weeks, months, or years) back from
the Calculate Recency relative to date after
which records will be included in your
analysis.
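A sketch of an RFM Aggregate configuration using the properties in Table 69. StubNode stands in for a real Modeler node, and the field names and date are illustrative assumptions:

```python
# Stand-in for a Modeler node; records property calls so the sketch runs outside Modeler.
class StubNode:
    def __init__(self):
        self.props = {}
    def setPropertyValue(self, name, value):
        self.props[name] = value

node = StubNode()
node.setPropertyValue("relative_to", "Fixed")
node.setPropertyValue("reference_date", "2022-10-12")  # illustrative fixed date
node.setPropertyValue("id_field", "CardID")            # identifies the customer
node.setPropertyValue("date_field", "Date")            # recency is calculated against this
node.setPropertyValue("value_field", "Amount")         # monetary value
node.setPropertyValue("only_recent_transactions", True)
```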
Rprocessnode Properties
The R Transform node enables you to take data from an IBM® SPSS® Modeler
stream and modify the data using your own custom R script. After the data is
modified, it is returned to the stream.
Example
convert_datetime flag
convert_datetime_class
POSIXct
POSIXlt
convert_missing flag
use_batch_size flag Enable use of batch processing
Table 70. Rprocessnode properties (continued)
Rprocessnode properties Data type Property description
batch_size integer Specify the number of data records to be
included in each batch
samplenode properties
The Sample node selects a subset of records. A variety of sample types are
supported, including stratified, clustered, and nonrandom (structured) samples.
Sampling can be useful to improve performance, and to select groups of related
records or transactions for analysis.
Example
Complex
mode Include Include or discard records that meet the
specified condition.
Discard
sample_type First Specifies the sampling method.
OneInN
RandomPct
first_n integer Records up to the specified cutoff point will
be included or discarded.
one_in_n number Include or discard every nth record.
rand_pct number Specify the percentage of records to include
or discard.
use_max_size flag Enable use of the maximum_size setting.
Systematic
sample_units Proportions
Counts
sample_size_proportions Fixed
Custom
Variable
sample_size_counts Fixed
Custom
Variable
fixed_proportions number
fixed_counts integer
variable_proportions field
variable_counts field
use_min_stratum_size flag
minimum_stratum_size integer This option only applies when a Complex
sample is taken with Sample
units=Proportions.
use_max_stratum_size flag
maximum_stratum_size integer This option only applies when a Complex
sample is taken with Sample
units=Proportions.
clusters field
stratify_by [field1 ... fieldN]
specify_input_weight flag
input_weight field
new_output_weight string
Table 71. samplenode properties (continued)
samplenode properties Data type Property description
sizes_proportions [[string string value] If sample_units=proportions and
[string string value]…] sample_size_proportions=Custom,
specifies a value for each possible
combination of values of stratification fields.
default_proportion number
sizes_counts [[string string value] Specifies a value for each possible
[string string value]…] combination of values of stratification fields.
Usage is similar to sizes_proportions
but specifying an integer rather than a
proportion.
default_count number
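A sketch of a simple random-percentage sample using the properties above. StubNode stands in for a real Modeler node:

```python
# Stand-in for a Modeler node; records property calls so the sketch runs outside Modeler.
class StubNode:
    def __init__(self):
        self.props = {}
    def setPropertyValue(self, name, value):
        self.props[name] = value

node = StubNode()
node.setPropertyValue("mode", "Include")          # keep records that fall in the sample
node.setPropertyValue("sample_type", "RandomPct") # random percentage sampling
node.setPropertyValue("rand_pct", 25)             # include roughly 25% of records
```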
selectnode properties
The Select node selects or discards a subset of records from the data stream based
on a specific condition. For example, you might select the records that pertain to a
particular sales region.
Example
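A minimal sketch of selecting records by condition. The property names mode and condition are assumptions based on this node's documented options elsewhere in the guide, StubNode stands in for a real Modeler node, and the CLEM condition is illustrative:

```python
# Stand-in for a Modeler node; records property calls so the sketch runs outside Modeler.
class StubNode:
    def __init__(self):
        self.props = {}
    def setPropertyValue(self, name, value):
        self.props[name] = value

node = StubNode()
node.setPropertyValue("mode", "Include")        # keep matching records (vs. Discard)
node.setPropertyValue("condition", "Age < 18")  # CLEM expression (illustrative)
```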
sortnode properties
The Sort node sorts records into ascending or descending order based on the values
of one or more fields.
Example
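A sketch of a multi-key sort. The [[field, direction], ...] structure for the keys property is an assumption based on similar structured properties in this guide; StubNode stands in for a real Modeler node and the field names are illustrative:

```python
# Stand-in for a Modeler node; records property calls so the sketch runs outside Modeler.
class StubNode:
    def __init__(self):
        self.props = {}
    def setPropertyValue(self, name, value):
        self.props[name] = value

node = StubNode()
node.setPropertyValue("keys", [["Age", "Ascending"], ["Sex", "Descending"]])
node.setPropertyValue("default_ascending", False)  # default direction for unlisted keys
```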
spacetimeboxes properties
Space-Time-Boxes (STB) are an extension of Geohashed spatial locations. More
specifically, an STB is an alphanumeric string that represents a regularly shaped
region of space and time.
latitude_field field
longitude_field field
timestamp_field field
Table 74. spacetimeboxes properties (continued)
spacetimeboxes properties Data type Property description
densities [density, density, Each density is a string. For example:
density…] STB_GH8_1DAY
Note there are limits to which densities are
valid.
For the geohash, values from GH1-GH15
can be used.
For the temporal part, the following values
can be used:
EVER
1YEAR
1MONTH
1DAY
12HOURS
8HOURS
6HOURS
4HOURS
3HOURS
2HOURS
1HOUR
30MINS
15MINS
10MINS
5MINS
2MINS
1MIN
30SECS
15SECS
10SECS
5SECS
2SECS
1SEC
field_name_extension string
add_extension_as Prefix
Suffix
Unknown
Year
Quarter
Month
Week
Day
Hour
Hour_nonperiod
Minute
Minute_nonperiod
Second
Second_nonperiod
Table 75. streamingtimeseries properties (continued)
streamingtimeseries Properties Values Property description
period_field field
period_start_value integer
num_days_per_week integer
start_day_of_week Sunday
Monday
Tuesday
Wednesday
Thursday
Friday
Saturday
num_hours_per_day integer
start_hour_of_day integer
timestamp_increments integer
cyclic_increments integer
cyclic_periods list
output_interval None
Year
Quarter
Month
Week
Day
Hour
Minute
Second
is_same_interval flag
cross_hour flag
aggregate_and_distribute list
Sum
Mode
Min
Max
distribute_default Mean
Sum
group_default Mean
Sum
Mode
Min
Max
missing_imput Linear_interp
Series_mean
K_mean
K_median
Linear_trend
k_span_points integer
use_estimation_period flag
estimation_period Observations
Times
date_estimation list Only available if you use
date_time_field
period_estimation list Only available if you use
use_period
observations_type Latest
Earliest
observations_num integer
observations_exclude integer
method ExpertModeler
Exsmooth
Arima
expert_modeler_method ExpertModeler
Exsmooth
Arima
consider_seasonal flag
detect_outliers flag
expert_outlier_additive flag
expert_outlier_level_shift flag
expert_outlier_innovational flag
expert_outlier_transient flag
expert_outlier_seasonal_additive flag
expert_outlier_local_trend flag
expert_outlier_additive_patch flag
consider_newesmodels flag
HoltsLinearTrend
BrownsLinearTrend
DampedTrend
SimpleSeasonal
WintersAdditive
WintersMultiplicative
DampedTrendAdditive
DampedTrendMultiplicative
MultiplicativeTrendAdditive
MultiplicativeSeasonal
MultiplicativeTrendMultiplicative
MultiplicativeTrend
futureValue_type_method Compute
specify
exsmooth_transformation_type None
SquareRoot
NaturalLog
arima.p integer
arima.d integer
arima.q integer
arima.sp integer
arima.sd integer
arima.sq integer
arima_transformation_type None
SquareRoot
NaturalLog
arima_include_constant flag
tf_arima.p.fieldname integer For transfer functions.
tf_arima.d.fieldname integer For transfer functions.
tf_arima.q.fieldname integer For transfer functions.
tf_arima.sp.fieldname integer For transfer functions.
tf_arima.sd.fieldname integer For transfer functions.
tf_arima.sq.fieldname integer For transfer functions.
tf_arima.delay.fieldname integer For transfer functions.
tf_arima.transformation_type.fieldname None For transfer functions.
SquareRoot
NaturalLog
arima_detect_outliers flag
arima_outlier_additive flag
arima_outlier_level_shift flag
arima_outlier_innovational flag
arima_outlier_transient flag
arima_outlier_seasonal_additive flag
arima_outlier_local_trend flag
arima_outlier_additive_patch flag
conf_limit_pct real
events fields
forecastperiods integer
extend_records_into_future flag
conf_limits flag
noise_res flag
Example
calculate_conf flag
conf_limit_pct real
use_time_intervals_node flag If use_time_intervals_node=true,
then the settings from an upstream Time
Intervals node are used. If
use_time_intervals_node=false,
interval_offset_position,
interval_offset, and interval_type
must be specified.
interval_offset_position LastObservation LastObservation refers to Last valid
LastRecord observation. LastRecord refers to Count
back from last record.
interval_offset number
interval_type
Periods
Years
Quarters
Months
WeeksNonPeriodic
DaysNonPeriodic
HoursNonPeriodic
MinutesNonPeriodic
SecondsNonPeriodic
events fields
Table 76. streamingts properties (continued)
streamingts properties Data type Property description
expert_modeler_method
AllModels
Exsmooth
Arima
consider_seasonal flag
detect_outliers flag
expert_outlier_additive flag
expert_outlier_level_shift flag
expert_outlier_innovational flag
expert_outlier_transient flag
expert_outlier_seasonal_additive flag
expert_outlier_local_trend flag
expert_outlier_additive_patch flag
exsmooth_model_type Simple
HoltsLinearTrend
BrownsLinearTrend
DampedTrend
SimpleSeasonal
WintersAdditive
WintersMultiplicative
exsmooth_transformation_type None
SquareRoot
NaturalLog
tf_arima_transformation_type.fieldname None
SquareRoot
NaturalLog
arima_detect_outlier_mode None
Automatic
arima_outlier_additive flag
arima_outlier_level_shift flag
arima_outlier_innovational flag
arima_outlier_transient flag
arima_outlier_seasonal_additive flag
arima_outlier_local_trend flag
arima_outlier_additive_patch flag
deployment_force_rebuild flag
deployment_rebuild_mode
Count
Percent
deployment_rebuild_count number
deployment_rebuild_pct number
deployment_rebuild_field <field>
Chapter 11. Field Operations Node Properties
anonymizenode properties
The Anonymize node transforms the way field names and values are represented
downstream, thus disguising the original data. This can be useful if you want to
allow other users to build models using sensitive data, such as customer names or
other details.
Example
stream = modeler.script.stream()
varfilenode = stream.createAt("variablefile", "File", 96, 96)
varfilenode.setPropertyValue("full_filename", "$CLEO_DEMOS/DRUG1n")
node = stream.createAt("anonymize", "My node", 192, 96)
# Anonymize node requires the input fields while setting the values
stream.link(varfilenode, node)
node.setKeyedPropertyValue("enable_anonymize", "Age", True)
node.setKeyedPropertyValue("transformation", "Age", "Random")
node.setKeyedPropertyValue("set_random_seed", "Age", True)
node.setKeyedPropertyValue("random_seed", "Age", 123)
node.setKeyedPropertyValue("enable_anonymize", "Drug", True)
node.setKeyedPropertyValue("use_prefix", "Drug", True)
node.setKeyedPropertyValue("prefix", "Drug", "myprefix")
set_random_seed flag When set to True, the specified seed value will be used
(if transformation is also set to Random).
random_seed integer When set_random_seed is set to True, this is the seed
for the random number.
scale number When transformation is set to Fixed, this value is
used for "scale by." The maximum scale value is normally
10 but may be reduced to avoid overflow.
Table 77. anonymizenode properties (continued)
anonymizenode properties Data type Property description
translate number When transformation is set to Fixed, this value is
used for "translate." The maximum translate value is
normally 1000 but may be reduced to avoid overflow.
autodataprepnode properties
The Automated Data Preparation (ADP) node can analyze your data and identify
fixes, screen out fields that are problematic or not likely to be useful, derive new
attributes when appropriate, and improve performance through intelligent screening
and sampling techniques. You can use the node in fully automated fashion, allowing
the node to choose and apply fixes, or you can preview the changes before they are
made and accept, reject, or amend them as desired.
Example
Speed
Accuracy
Custom
custom_fields flag If true, allows you to specify target, input,
and other fields for the current node. If
false, the current settings from an
upstream Type node are used.
target field Specifies a single target field.
inputs [field1 ... fieldN] Input or predictor fields used by the
model.
use_frequency flag
frequency_field field
use_weight flag
weight_field field
excluded_fields Filter
None
Table 78. autodataprepnode properties (continued)
autodataprepnode properties Data type Property description
if_fields_do_not_match StopExecution
ClearAnalysis
prepare_dates_and_times flag Controls access to all the date and time
fields
compute_time_until_date flag
reference_date Today
Fixed
fixed_date date
units_for_date_durations Automatic
Fixed
fixed_date_units Years
Months
Days
compute_time_until_time flag
reference_time CurrentTime
Fixed
fixed_time time
units_for_time_durations Automatic
Fixed
fixed_date_units Hours
Minutes
Seconds
extract_year_from_date flag
extract_month_from_date flag
extract_day_from_date flag
extract_hour_from_time flag
extract_minute_from_time flag
extract_second_from_time flag
exclude_low_quality_inputs flag
exclude_too_many_missing flag
Delete
rescale_continuous_inputs flag
rescaling_method MinMax
ZScore
min_max_minimum number
min_max_maximum number
z_score_final_mean number
z_score_final_sd number
rescale_continuous_target flag
target_final_mean number
target_final_sd number
transform_select_input_fields flag
maximize_association_with_target flag
p_value_for_merging number
merge_ordinal_features flag
merge_nominal_features flag
minimum_cases_in_category number
bin_continuous_fields flag
p_value_for_binning number
perform_feature_selection flag
p_value_for_selection number
perform_feature_construction flag
transformed_target_name_extension string
transformed_inputs_name_extension string
constructed_features_root_name string
years_duration_name_extension string
months_duration_name_extension string
days_duration_name_extension string
hours_duration_name_extension string
minutes_duration_name_extension string
seconds_duration_name_extension string
astimeintervalsnode properties
Use the Time Intervals node to specify intervals and derive a new time field for
estimating or forecasting. A full range of time intervals is supported, from seconds to
years.
binningnode properties
The Binning node automatically creates new nominal (set) fields based on the
values of one or more existing continuous (numeric range) fields. For example, you
can transform a continuous income field into a new categorical field containing
groups of income as deviations from the mean. Once you have created bins for the
new field, you can generate a Derive node based on the cut points.
Example
node.setPropertyValue("fixed_width_add_as", "Suffix")
node.setPropertyValue("fixed_bin_method", "Count")
node.setPropertyValue("fixed_bin_count", 10)
node.setPropertyValue("fixed_bin_width", 3.5)
node.setPropertyValue("tile10", True)
Rank
SDev
Optimal
recalculate_bins Always Specifies whether the bins are
recalculated and the data placed in the
IfNecessary relevant bin every time the node is
executed, or whether data is added only
to existing bins and any new bins that
have been added.
fixed_width_name_extension string The default extension is _BIN.
fixed_width_add_as Suffix Specifies whether the extension is added
to the end (suffix) of the field name or to
Prefix the start (prefix). For example:
income_BIN.
fixed_bin_method Width
Count
fixed_bin_count integer Specifies an integer used to determine
the number of fixed-width bins
(categories) for the new field(s).
fixed_bin_width real Value (integer or real) for calculating
width of the bin.
equal_count_name_extension string The default extension is _TILE.
equal_count_add_as Suffix Specifies an extension, either suffix or
prefix, used for the field name generated
Prefix by using standard p-tiles. The default
extension is _TILE plus N, where N is the
tile number.
tile4 flag Generates four quantile bins, each
containing 25% of cases.
Prefix
custom_tile integer
equal_count_method RecordCount The RecordCount method seeks to
assign an equal number of records to
ValueSum each bin, while ValueSum assigns
records so that the sum of the values in
each bin is equal.
tied_values_method Next Specifies which bin tied value data is to
be put in.
Current
Random
rank_order Ascending This property includes Ascending
(lowest value is marked 1) or
Descending Descending (highest value is marked 1).
extension
rank_pct flag Each rank is divided by the number of
records with valid values and multiplied
by 100. Percentage fractional ranks fall in
the range of 1–100.
rank_pct_name_extension string The default extension is _P_RANK.
sdev_name_extension string
Table 80. binningnode properties (continued)
binningnode properties Data type Property description
sdev_add_as Suffix
Prefix
sdev_count One
Two
Three
optimal_name_extension string The default extension is _OPTIMAL.
optimal_add_as Suffix
Prefix
optimal_supervisor_field field Field chosen as the supervisory field to
which the fields selected for binning are
related.
optimal_merge_bins flag Specifies that any bins with small case
counts will be added to a larger,
neighboring bin.
optimal_small_bin_threshold integer
optimal_pre_bin flag Indicates that prebinning of dataset is to
take place.
optimal_max_bins integer Specifies an upper limit to avoid creating
an inordinately large number of bins.
optimal_lower_end_point Inclusive
Exclusive
optimal_first_bin Unbounded
Bounded
optimal_last_bin Unbounded
Bounded
derivenode properties
The Derive node modifies data values or creates new fields from one or more
existing fields. It creates fields of type formula, flag, nominal, state, count, and
conditional.
Example 1
# Create and configure a Flag Derive field node
node = stream.create("derive", "My node")
node.setPropertyValue("new_name", "DrugX_Flag")
node.setPropertyValue("result_type", "Flag")
node.setPropertyValue("flag_expr", "'Drug' == \"drugX\"")
Example 2
This script assumes that there are two numeric columns called XPos and YPos that represent the X and Y
coordinates of a point (for example, where an event took place). The script creates a Derive node that
computes a geospatial column from the X and Y coordinates representing that point in a specific
coordinate system:
stream = modeler.script.stream()
# Other stream configuration code
node = stream.createAt("derive", "Location", 192, 96)
node.setPropertyValue("new_name", "Location")
node.setPropertyValue("formula_expr", "['XPos', 'YPos']")
node.setPropertyValue("formula_type", "Geospatial")
# Now we have set the general measurement type, define the
# specifics of the geospatial object
node.setPropertyValue("geo_type", "Point")
node.setPropertyValue("has_coordinate_system", True)
node.setPropertyValue("coordinate_system", "ETRS_1989_EPSG_Arctic_zone_5-47")
Multiple
fields list Used in Multiple mode only to select
multiple fields.
name_extension string Specifies the extension for the new
field name(s).
add_as Suffix Adds the extension as a prefix (at
the beginning) or as a suffix (at the
Prefix end) of the field name.
Table 81. derivenode properties (continued)
derivenode properties Data type Property description
result_type Formula The six types of new fields that you
can create.
Flag
Set
State
Count
Conditional
formula_expr string Expression for calculating a new
field value in a Derive node.
flag_expr string
flag_true string
flag_false string
set_default string
set_value_cond string Structured to supply the condition
associated with a given value.
state_on_val string Specifies the value for the new field
when the On condition is met.
state_off_val string Specifies the value for the new field
when the Off condition is met.
state_on_expression string
state_off_expression string
state_initial On Assigns each record of the new field
an initial value of On or Off. This
Off value can change as each condition
is met.
count_initial_val string
count_inc_condition string
count_inc_expression string
count_reset_condition string
cond_if_cond string
cond_then_expr string
cond_else_expr string
Set / MeasureType.SET
OrderedSet /
MeasureType.ORDERED_SET
Typeless /
MeasureType.TYPELESS
Collection /
MeasureType.COLLECTION
Geospatial /
MeasureType.GEOSPATIAL
collection_measure Range / MeasureType.RANGE For collection fields (lists with a
depth of 0), this property defines the
Flag / MeasureType.FLAG measurement type associated with
the underlying values.
Set / MeasureType.SET
OrderedSet /
MeasureType.ORDERED_SET
Typeless /
MeasureType.TYPELESS
geo_type Point For geospatial fields, this property
defines the type of geospatial object
MultiPoint represented by this field. This
should be consistent with the list
depth of the values
LineString
MultiLineString
Polygon
MultiPolygon
has_coordinate_system boolean For geospatial fields, this property
defines whether this field has a
coordinate system
coordinate_system string For geospatial fields, this property
defines the coordinate system for
this field
ensemblenode properties
The Ensemble node combines two or more model nuggets to obtain more accurate
predictions than can be gained from any one model.
Example
AdjustedPropensityWeightedVoting
HighestConfidence
AverageRawPropensity
AverageAdjustedPropensity
set_ensemble_method Voting Specifies the method used to
determine the ensemble score.
ConfidenceWeightedVoting This setting applies only if the
selected target is a nominal field.
HighestConfidence
RawPropensity
AdjustedPropensity
set_voting_tie_selection Random If a voting method is selected,
specifies how ties are resolved.
HighestConfidence This setting applies only if the
selected target is a nominal field.
calculate_standard_error flag If the target field is continuous, a
standard error calculation is run
by default to calculate the
difference between the measured
or estimated values and the true
values, and to show how closely
those estimates match.
fillernode properties
The Filler node replaces field values and changes storage. You can choose to replace
values based on a CLEM condition, such as @BLANK(@FIELD). Alternatively, you can
choose to replace all blanks or null values with a specific value. A Filler node is often
used together with a Type node to replace missing values.
Example
Blank
Null
BlankAndNull
Table 83. fillernode properties (continued)
fillernode properties Data type Property description
condition string
replace_with string
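A sketch of replacing blank values in a field with a fixed value, using the condition and replace_with properties from Table 83. The fields property is an assumption based on this node's field chooser, StubNode stands in for a real Modeler node, and the field name is illustrative:

```python
# Stand-in for a Modeler node; records property calls so the sketch runs outside Modeler.
class StubNode:
    def __init__(self):
        self.props = {}
    def setPropertyValue(self, name, value):
        self.props[name] = value

node = StubNode()
node.setPropertyValue("fields", ["Age"])               # field(s) to fill (assumed name)
node.setPropertyValue("condition", "@BLANK(@FIELD)")   # CLEM condition to match
node.setPropertyValue("replace_with", "0")             # replacement value
```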
filternode properties
The Filter node filters (discards) fields, renames fields, and maps fields from one
source node to another.
Example:
Using the default_include property. Note that setting the value of the default_include property does
not automatically include or exclude all fields; it simply determines the default for the current selection.
This is functionally equivalent to clicking the Include fields by default button in the Filter node dialog
box. For example, suppose you run the following script:
node.setPropertyValue("default_include", False)
# Include these two fields in the list
for f in ["Age", "Sex"]:
    node.setKeyedPropertyValue("include", f, True)
This will cause the node to pass the fields Age and Sex and discard all others. After running the previous
script, now suppose you add the following lines to the script to name two more fields:
node.setPropertyValue("default_include", False)
# Include these two fields in the list
for f in ["BP", "Na"]:
node.setKeyedPropertyValue("include", f, True)
This will add two more fields to the filter so that a total of four fields are passed (Age, Sex, BP, Na). In
other words, resetting the value of default_include to False doesn't automatically reset all fields.
Alternatively, if you now change default_include to True, either using a script or in the Filter node
dialog box, this would flip the behavior so the four fields listed above would be discarded rather than
included. When in doubt, experimenting with the controls in the Filter node dialog box may be helpful in
understanding this interaction.
historynode properties
The History node creates new fields containing data from fields in previous records.
History nodes are most often used for sequential data, such as time series data.
Before using a History node, you may want to sort the data using a Sort node.
Example
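As a plain-Python illustration (not the Modeler API; the field names and offsets are made up), this is the kind of output a History node produces when it pulls values from one and two records back:

```python
# Plain-Python illustration of a History node's output: for each record, new
# fields carry the value of an existing field from previous records.
values = [3, 5, 8, 13]

history = [
    {
        "value": v,
        "value_1": values[i - 1] if i >= 1 else None,  # one record back
        "value_2": values[i - 2] if i >= 2 else None,  # two records back
    }
    for i, v in enumerate(values)
]

print(history[2])  # → {'value': 8, 'value_1': 5, 'value_2': 3}
```

Records near the start of the data have no history to draw on, which is why the first entries hold None; in the node itself, how such records are handled is configurable.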
partitionnode properties
The Partition node generates a partition field, which splits the data into separate
subsets for the training, testing, and validation stages of model building.
Example
reclassifynode properties
The Reclassify node transforms one set of categorical values to another.
Reclassification is useful for collapsing categories or regrouping data for analysis.
Example
Prefix
reclassify string Structured property for field values.
use_default flag Use the default value.
default string Specify a default value.
Table 87. reclassifynode properties (continued)
reclassifynode properties Data type Property description
pick_list [string string … string] Allows a user to import a list of known new values to populate the drop-down list in the table.
reordernode properties
The Field Reorder node defines the natural order used to display fields downstream.
This order affects the display of fields in a variety of places, such as tables, lists, and
the Field Chooser. This operation is useful when working with wide datasets to make
fields of interest more visible.
Example
Type
Storage
ascending flag
start_fields [field1 field2 … fieldn] New fields are inserted after these fields.
end_fields [field1 field2 … fieldn] New fields are inserted before these fields.
reprojectnode properties
Within SPSS Modeler, items such as the Expression Builder spatial functions, the
Spatio-Temporal Prediction (STP) Node, and the Map Visualization Node use the
projected coordinate system. Use the Reproject node to change the coordinate
system of any data that you import that uses a geographic coordinate system.
Specify
coordinate_system string The name of the coordinate system to be applied to the fields. Example:
reprojectnode.setPropertyValue("coordinate_system", "WGS_1984_World_Mercator")
restructurenode properties
The Restructure node converts a nominal or flag field into a group of fields that can
be populated with the values of yet another field. For example, given a field named
payment type, with values of credit, cash, and debit, three new fields would be
created (credit, cash, debit), each of which might contain the value of the actual
payment made.
Example
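The payment-type example above can be sketched in plain Python (this is a toy illustration of the node's output, not the Modeler API; the record layout is made up):

```python
# Toy illustration of what the Restructure node produces: one new field per
# category of "payment type", filled from the payment amount of each record.
records = [
    {"payment_type": "credit", "amount": 120.0},
    {"payment_type": "cash", "amount": 35.5},
]
categories = ["credit", "cash", "debit"]

restructured = [
    {c: (r["amount"] if r["payment_type"] == c else None) for c in categories}
    for r in records
]

print(restructured[0])  # → {'credit': 120.0, 'cash': None, 'debit': None}
```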
all
include_field_name flag Indicates whether to use the field name
in the restructured field name.
value_mode OtherFields
Flags
Indicates the mode for specifying the values for the restructured fields. With OtherFields, you must specify which fields to use (see below). With Flags, the values are numeric flags.
value_fields list Required if value_mode is
OtherFields. Specifies which fields to
use as value fields.
rfmanalysisnode properties
The Recency, Frequency, Monetary (RFM) Analysis node enables you to determine
quantitatively which customers are likely to be the best ones by examining how
recently they last purchased from you (recency), how often they purchased
(frequency), and how much they spent over all transactions (monetary).
Example
recalculate_bins Always
IfNecessary
add_outliers flag Available only if recalculate_bins is set to IfNecessary. If set, records that lie below the lowest bin will be added to the lowest bin, and records above the highest bin will be added to the highest bin.
Frequency
Monetary
recency_thresholds value value Available only if recalculate_bins is set to Always. Specify the upper and lower thresholds for the recency bins. The upper threshold of one bin is used as the lower threshold of the next; for example, [10 30 60] would define two bins, the first bin with lower and upper thresholds of 10 and 30, and the second bin with thresholds of 30 and 60.
frequency_thresholds value value Available only if recalculate_bins is set
to Always.
monetary_thresholds value value Available only if recalculate_bins is set
to Always.
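The way a threshold list maps to bins can be checked with a few lines of plain Python (an illustration of the rule described above, not the Modeler API):

```python
def bins_from_thresholds(thresholds):
    """Turn a threshold list such as [10 30 60] into (lower, upper) bin pairs.

    The upper threshold of one bin doubles as the lower threshold of the next.
    """
    return list(zip(thresholds, thresholds[1:]))

print(bins_from_thresholds([10, 30, 60]))  # → [(10, 30), (30, 60)]
```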
settoflagnode properties
The Set to Flag node derives multiple flag fields based on the categorical values
defined for one or more nominal fields.
Example
all
true_value string Specifies the true value used by the node
when setting a flag. The default is T.
false_value string Specifies the false value used by the
node when setting a flag. The default is F.
Table 92. settoflagnode properties (continued)
settoflagnode properties Data type Property description
use_extension flag Use an extension as a suffix or prefix to
the new flag field.
extension string
add_as Suffix
Prefix
Specifies whether the extension is added as a suffix or prefix.
aggregate flag Groups records together based on key
fields. All flag fields in a group are
enabled if any record is set to true.
keys list Key fields.
statisticstransformnode properties
The Statistics Transform node runs a selection of IBM SPSS Statistics syntax
commands against data sources in IBM SPSS Modeler. This node requires a licensed
copy of IBM SPSS Statistics.
The properties for this node are described under “statisticstransformnode properties” on page 409.
Example
timeintervalsnode properties
The Time Intervals node specifies intervals and creates labels (if needed) for modeling time series data.
Periods
CyclicPeriods
Years
Quarters
Months
DaysPerWeek
DaysNonPeriodic
HoursPerDay
HoursNonPeriodic
MinutesPerDay
MinutesNonPeriodic
SecondsPerDay
SecondsNonPeriodic
mode Label
Create
Specifies whether you want to label records consecutively or build the series based on a specified date, timestamp, or time field.
field field When building the series from the data,
specifies the field that indicates the
date or time for each record.
period_start integer Specifies the starting interval for periods or cyclic periods.
cycle_start integer Starting cycle for cyclic periods.
year_start integer For interval types where applicable,
year in which the first interval falls.
quarter_start integer For interval types where applicable,
quarter in which the first interval falls.
Table 93. timeintervalsnode properties (continued)
timeintervalsnode properties Data type Property description
month_start January
February
March
April
May
June
July
August
September
October
November
December
For interval types where applicable, month in which the first interval falls.
day_start integer
hour_start integer
minute_start integer
second_start integer
periods_per_cycle integer For cyclic periods, number within each
cycle.
fiscal_year_begins January
February
March
April
May
June
July
August
September
October
November
December
For quarterly intervals, specifies the month when the fiscal year begins.
Tuesday
Wednesday
Thursday
Friday
Saturday
Sunday
10
15
20
30
field_name_extension string
field_name_extension_as_prefix flag
Table 93. timeintervalsnode properties (continued)
timeintervalsnode properties Data type Property description
date_format
"DDMMYY"
"MMDDYY"
"YYMMDD"
"YYYYMMDD"
"YYYYDDD"
DAY
MONTH
"DD-MM-YY"
"DD-MM-YYYY"
"MM-DD-YY"
"MM-DD-YYYY"
"DD-MON-YY"
"DD-MON-YYYY"
"YYYY-MM-DD"
"DD.MM.YY"
"DD.MM.YYYY"
"MM.DD.YYYY"
"DD.MON.YY"
"DD.MON.YYYY"
"DD/MM/YY"
"DD/MM/YYYY"
"MM/DD/YY"
"MM/DD/YYYY"
"DD/MON/YY"
"DD/MON/YYYY"
MON YYYY
q Q YYYY
ww WK YYYY
time_format
"HHMMSS"
"HHMM"
"MMSS"
"HH:MM:SS"
"HH:MM"
"MM:SS"
"(H)H:(M)M:(S)S"
"(H)H:(M)M"
"(M)M:(S)S"
"HH.MM.SS"
"HH.MM"
"MM.SS"
"(H)H.(M)M.(S)S"
"(H)H.(M)M"
"(M)M.(S)S"
Mode
Min
Max
First
Last
TrueIfAnyTrue
MeanOfRecentPoints
True
False
agg_mode All
Specify
Specifies whether to aggregate or pad all fields with default functions as needed, or specify the fields and functions to use.
agg_range_default Mean
Sum
Mode
Min
Max
Specifies the default function to use when aggregating continuous fields.
agg_set_default Mode
First
Last
Specifies the default function to use when aggregating nominal fields.
agg_flag_default TrueIfAnyTrue
Mode
First
Last
pad_range_default Blank
MeanOfRecentPoints
Specifies the default function to use when padding continuous fields.
pad_set_default Blank
MostRecentValue
pad_flag_default Blank
True
False
Table 93. timeintervalsnode properties (continued)
timeintervalsnode properties Data type Property description
max_records_to_create integer Specifies the maximum number of
records to create when padding the
series.
estimation_from_beginning flag
estimation_to_end flag
estimation_start_offset integer
estimation_num_holdouts integer
create_future_records flag
num_future_records integer
create_future_field flag
future_field_name string
transposenode properties
The Transpose node swaps the data in rows and columns so that records become
fields and fields become records.
Example
typenode properties
The Type node specifies field metadata and properties. For example, you can specify
a measurement level (continuous, nominal, ordinal, or flag) for each field, set
options for handling missing values and system nulls, set the role of a field for
modeling purposes, specify field and value labels, and specify values for a field.
Example
Note that in some cases you may need to fully instantiate the Type node in order for other nodes to work correctly, such as the fields_from property of the Set to Flag node. You can simply connect a Table node and execute it to instantiate the fields.
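A sketch of that connect-and-run step follows. The calls mirror the stream API used elsewhere in this guide (createAt, link, run); to keep the snippet runnable outside Modeler, minimal stand-ins are defined first — inside Modeler, the stream object would come from modeler.script.stream() instead.

```python
# Minimal stand-ins so the sketch runs outside SPSS Modeler.
class NodeStub:
    def __init__(self, node_type, label):
        self.node_type = node_type
        self.label = label

    def run(self, results):
        # Inside Modeler this executes the node (instantiating upstream
        # Type nodes); here it is a no-op placeholder.
        pass

class StreamStub:
    def __init__(self):
        self.nodes = []
        self.links = []

    def createAt(self, node_type, label, x, y):
        node = NodeStub(node_type, label)
        self.nodes.append(node)
        return node

    def link(self, source, target):
        self.links.append((source, target))

stream = StreamStub()
typenode = stream.createAt("type", "Type", 100, 100)
tablenode = stream.createAt("table", "Table", 250, 100)
stream.link(typenode, tablenode)  # connect the Type node to a Table node
tablenode.run(None)               # executing the Table node reads the data
```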
None
Partition
Split
Frequency
RecordID
Integer
Real
Time
Date
Timestamp
check None Keyed property for field type and range
checking.
Nullify
Coerce
Discard
Warn
Abort
Table 95. typenode properties (continued)
typenode Data type Property description
properties
values [value value] For continuous fields, the first value is the
minimum, and the last value is the
maximum. For nominal fields, specify all
values. For flag fields, the first value
represents false, and the last value
represents true. Setting this property
automatically sets the value_mode
property to Specify.
value_mode Read
Pass
Read+
Current
Specify
Determines how values are set. Note that you cannot set this property to Specify directly; to use specific values, set the values property.
extend_values flag Applies when value_mode is set to Read. Set to T to add newly read values to any existing values for the field. Set to F to discard existing values in favor of the newly read values.
enable_missing flag When set to T, activates tracking of missing values for the field.
missing_values [value value ...] Specifies data values that denote missing data.
range_missing flag Specifies whether a missing-value (blank) range is defined for a field.
missing_lower string When range_missing is true, specifies the lower bound of the missing-value range.
missing_upper string When range_missing is true, specifies the upper bound of the missing-value range.
null_missing flag When set to T, nulls (undefined values that are displayed as $null$ in the software) are considered missing values.
whitespace_missing flag When set to T, values containing only white space (spaces, tabs, and new lines) are considered missing values.
description string Specifies the description for a field.
value_labels [[Value LabelString] [Value LabelString] ...] Used to specify labels for value pairs.
date_format Sets the date format for the field (applies only to fields with DATE or TIMESTAMP storage). Valid formats:
"DDMMYY"
"MMDDYY"
"YYMMDD"
"YYYYMMDD"
"YYYYDDD"
DAY
MONTH
"DD-MM-YY"
"DD-MM-YYYY"
"MM-DD-YY"
"MM-DD-YYYY"
"DD-MON-YY"
"DD-MON-YYYY"
"YYYY-MM-DD"
"DD.MM.YY"
"DD.MM.YYYY"
"MM.DD.YYYY"
"DD.MON.YY"
"DD.MON.YYYY"
"DD/MM/YY"
"DD/MM/YYYY"
"MM/DD/YY"
"MM/DD/YYYY"
"DD/MON/YY"
"DD/MON/YYYY"
MON YYYY
q Q YYYY
ww WK YYYY
Table 95. typenode properties (continued)
typenode Data type Property description
properties
number_format DEFAULT
STANDARD
SCIENTIFIC
CURRENCY
Sets the number display format for the field.
standard_places integer Sets the number of decimal places for the field when displayed in standard format. A value of –1 will use the stream default. Note that the existing display_places slot will also change this but is now deprecated.
scientific_places integer Sets the number of decimal places for the field when displayed in scientific format. A value of –1 will use the stream default.
currency_places integer Sets the number of decimal places for the field when displayed in currency format. A value of –1 will use the stream default.
grouping_symbol DEFAULT
NONE
LOCALE
PERIOD
COMMA
SPACE
Sets the grouping symbol for the field.
column_width integer Sets the column width for the field. A value of –1 will set column width to Auto.
justify AUTO Sets the column justification for the field.
CENTER
LEFT
RIGHT
OrderedSet /
MeasureType.ORDERED_SET
Typeless / MeasureType.TYPELESS
Collection /
MeasureType.COLLECTION
Geospatial /
MeasureType.GEOSPATIAL
collection_measure Range / MeasureType.RANGE
Flag / MeasureType.FLAG
Set / MeasureType.SET
OrderedSet / MeasureType.ORDERED_SET
Typeless / MeasureType.TYPELESS
For collection fields (lists with a depth of 0), this keyed property defines the measurement type associated with the underlying values.
geo_type Point
MultiPoint
LineString
MultiLineString
Polygon
MultiPolygon
For geospatial fields, this keyed property defines the type of geospatial object represented by this field. This should be consistent with the list depth of the values.
has_coordinate_system boolean For geospatial fields, this property defines whether this field has a coordinate system.
coordinate_system string For geospatial fields, this keyed property defines the coordinate system for this field.
Table 95. typenode properties (continued)
typenode Data type Property description
properties
custom_storage_type Unknown / StorageType.UNKNOWN
String / StorageType.STRING
Integer / StorageType.INTEGER
Real / StorageType.REAL
Time / StorageType.TIME
Date / StorageType.DATE
Timestamp / StorageType.TIMESTAMP
List / StorageType.LIST
This keyed property is similar to custom_storage in that it can be used to define the override storage for the field. The difference is that, in Python scripting, the setter function can also be passed one of the StorageType values, while the getter will always return one of the StorageType values.
custom_list_storage_type String / StorageType.STRING
Integer / StorageType.INTEGER
Real / StorageType.REAL
Time / StorageType.TIME
Date / StorageType.DATE
Timestamp / StorageType.TIMESTAMP
For list fields, this keyed property specifies the storage type of the underlying values.
custom_list_depth integer For list fields, this keyed property specifies the depth of the field.
max_list_length integer Only available for data with a measurement level of either Geospatial or Collection. Set the maximum length of the list by specifying the number of elements the list can contain.
max_string_length integer Only available for typeless data and used when you are generating SQL to create a table. Enter the value of the largest string in your data; this generates a column in the table that is big enough to contain the string.
PNG
HTML
output (.cou)
full_filename string Specifies the target path and filename for output
generated from the graph node.
use_graph_size flag Controls whether the graph is sized explicitly,
using the width and height properties below.
Affects only graphs that are output to screen.
Not available for the Distribution node.
graph_width number When use_graph_size is True, sets the graph
width in pixels.
graph_height number When use_graph_size is True, sets the graph
height in pixels.
plotnode.setPropertyValue("color_field", "")
Specifying colors
The colors for titles, captions, backgrounds, and labels can be specified by using the hexadecimal strings
starting with the hash (#) symbol. For example, to set the graph background to sky blue, you would use
the following statement:
mygraphnode.setPropertyValue("graph_background", "#87CEEB")
Here, the first two digits, 87, specify the red content; the middle two digits, CE, specify the green content;
and the last two digits, EB, specify the blue content. Each digit can take a value in the range 0–9 or A–F.
Together, these values can specify a red-green-blue, or RGB, color.
Note: When specifying colors in RGB, you can use the Field Chooser in the user interface to determine the
correct color code. Simply hover over the color to activate a ToolTip with the desired information.
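The two-hex-digits-per-channel encoding described above can be produced with a few lines of plain Python (a convenience sketch, not part of the Modeler API):

```python
def rgb_hex(red, green, blue):
    """Format 0-255 channel values as a "#RRGGBB" string for graph color properties."""
    for channel in (red, green, blue):
        if not 0 <= channel <= 255:
            raise ValueError("each channel must be in the range 0-255")
    return "#%02X%02X%02X" % (red, green, blue)

# Sky blue: red 0x87 (135), green 0xCE (206), blue 0xEB (235)
print(rgb_hex(135, 206, 235))  # → #87CEEB
```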
collectionnode Properties
The Collection node shows the distribution of values for one numeric field relative to
the values of another. (It creates graphs that are similar to histograms.) It is useful
for illustrating a variable or field whose values change over time. Using 3-D graphing,
you can also include a symbolic axis displaying distributions by category.
Example
Table 97. collectionnode properties (continued)
collectionnode properties Data type Property description
by_label string
operation Sum
Mean
Min
Max
SDev
color_field string
panel_field string
animation_field string
range_mode Automatic
UserDefined
range_min number
range_max number
bins ByNumber
ByWidth
num_bins number
bin_width number
use_grid flag
graph_background color Standard graph colors are described at the
beginning of this section.
page_background color Standard graph colors are described at the
beginning of this section.
distributionnode Properties
The Distribution node shows the occurrence of symbolic (categorical) values, such
as mortgage type or gender. Typically, you might use the Distribution node to show
imbalances in the data, which you could then rectify using a Balance node before
creating a model.
Example
Flags
x_field field
color_field field Overlay field.
normalize flag
sort_mode ByOccurence
Alphabetic
use_proportional_scale flag
evaluationnode Properties
The Evaluation node helps to evaluate and compare predictive models. The
evaluation chart shows how well models predict particular outcomes. It sorts
records based on the predicted value and confidence of the prediction. It splits the
records into groups of equal size (quantiles) and then plots the value of the
business criterion for each quantile from highest to lowest. Multiple models are
shown as separate lines in the plot.
Example
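The quantile split described above can be sketched in plain Python (a toy illustration of the chart's grouping step, not the Modeler API; the scores are made up):

```python
# Sort records by predicted confidence, then cut them into equal-sized
# groups (quartiles here), highest scores first.
scores = [0.91, 0.15, 0.77, 0.60, 0.42, 0.88, 0.33, 0.25]

ranked = sorted(scores, reverse=True)
n_tiles = 4
size = len(ranked) // n_tiles
quantiles = [ranked[i * size:(i + 1) * size] for i in range(n_tiles)]

print(quantiles[0])  # → [0.91, 0.88]
```

The business criterion (gains, lift, profit, and so on) is then computed per group and plotted from the top quantile down.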
Table 99. evaluationnode properties
evaluationnode properties Data type Property description
chart_type
Gains
Response
Lift
Profit
ROI
ROC
inc_baseline flag
field_detection_method Metadata
Name
use_fixed_cost flag
cost_value number
cost_field string
use_fixed_revenue flag
revenue_value number
revenue_field string
use_fixed_weight flag
weight_value number
weight_field field
n_tile Quartiles
Quintiles
Deciles
Vingtiles
Percentiles
1000-tiles
cumulative flag
style Line
Point
export_data flag
data_filename string
delimiter string
new_line flag
inc_field_names flag
inc_best_line flag
inc_business_rule flag
business_rule_condition string
plot_score_fields flag
score_fields [field1 ... fieldN]
target_field field
use_hit_condition flag
hit_condition string
use_score_expression flag
score_expression string
caption_auto flag
graphboardnode Properties
The Graphboard node offers many different types of graphs in one single node.
Using this node, you can choose the data fields you want to explore and then select
a graph from those available for the selected data. The node automatically filters out
any graph types that would not work with the field choices.
Note: If you set a property that is not valid for the graph type (for example, specifying y_field for a
histogram), that property is ignored.
Note: In the UI, on the Detailed tab of many different graph types, there is a Summary field; this field is
not currently supported by scripting.
Example
3DArea
3DBar
3DDensity
3DHistogram
3DPie
3DScatterplot
Area
ArrowMap
Bar
BarCounts
BarCountsMap
BarMap
BinnedScatter
Boxplot
Bubble
ChoroplethMeans
ChoroplethMedians
ChoroplethSums
ChoroplethValues
Table 100. graphboardnode properties (continued)
graphboard Data type Property description
properties
ChoroplethCounts
CoordinateMap
CoordinateChoroplethMeans
CoordinateChoroplethMedians
CoordinateChoroplethSums
CoordinateChoroplethValues
CoordinateChoroplethCounts
Dotplot
Heatmap
HexBinScatter
Histogram
Line
LineChartMap
LineOverlayMap
Parallel
Path
Pie
PieCountMap
PieCounts
PieMap
PolygonOverlayMap
Ribbon
Scatterplot
SPLOM
Surface
x_field field Specifies a custom label for the x
axis. Available only for labels.
y_field field Specifies a custom label for the y
axis. Available only for labels.
z_field field Used in some 3-D graphs.
color_field field Used in heat maps.
size_field field Used in bubble plots.
categories_field field
values_field field
rows_field field
columns_field field
fields field
start_longitude_field field Used with arrows on a reference map.
end_longitude_field field
start_latitude_field field
end_latitude_field field
data_key_field field Used in various maps.
panelrow_field string
panelcol_field string
animation_field string
longitude_field field Used with co-ordinates on maps.
latitude_field field
map_color_field field
histogramnode Properties
The Histogram node shows the occurrence of values for numeric fields. It is often
used to explore the data before manipulations and model building. Similar to the
Distribution node, the Histogram node frequently reveals imbalances in the data.
Example
UserDefined
range_min number
range_max number
bins ByNumber
ByWidth
num_bins number
bin_width number
normalize flag
separate_bands flag
x_label_auto flag
x_label string
y_label_auto flag
y_label string
use_grid flag
mapvisualization properties
The Map Visualization node can accept multiple input connections and display
geospatial data on a map as a series of layers. Each layer is a single geospatial field;
for example, the base layer might be a map of a country, then above that you might
have one layer for roads, one layer for rivers, and one layer for towns.
Table 102. mapvisualization properties (continued)
mapvisualization Data type Property description
properties
color string If standard is selected for color_type,
the drop-down contains the same color
palette as the chart category color order on
the user options Display tab.
Default is chart category color 1.
Table 102. mapvisualization properties (continued)
mapvisualization Data type Property description
properties
color_aggregation and transp_aggregation string If you select an overlay field for a points
for that field must be aggregated for all
points within the hexagon. Therefore, you
must specify an aggregation function for any
overlay fields you want to apply to the map.
The available aggregation functions are:
Continuous (Real or Integer storage):
• Sum
• Mean
• Min
• Max
• Median
• 1st Quartile
• 3rd Quartile
Continuous (Time, Date, or Timestamp
storage):
• Mean
• Min
• Max
Nominal/Categorical:
• Mode
• Min
• Max
Flag:
• True if any true
• False if any false
multiplotnode Properties
The Multiplot node creates a plot that displays multiple Y fields over a single X field.
The Y fields are plotted as colored lines; each is equivalent to a Plot node with Style
set to Line and X Mode set to Sort. Multiplots are useful when you want to explore
the fluctuation of several variables over time.
Example
Table 103. multiplotnode properties (continued)
multiplotnode properties Data type Property description
use_overlay_expr flag
overlay_expression string
records_limit number
if_over_limit PlotBins
PlotSample
PlotAll
x_label_auto flag
x_label string
y_label_auto flag
y_label string
use_grid flag
graph_background color Standard graph colors are described at the
beginning of this section.
page_background color Standard graph colors are described at the
beginning of this section.
plotnode Properties
The Plot node shows the relationship between numeric fields. You can create a plot
by using points (a scatterplot) or lines.
Example
Function
overlay_expression string Specifies the expression used when
overlay_type is set to Function.
style Point
Line
Table 104. plotnode properties (continued)
plotnode properties Data type Property description
point_type
Rectangle
Dot
Triangle
Hexagon
Plus
Pentagon
Star
BowTie
HorizontalDash
VerticalDash
IronCross
Factory
House
Cathedral
OnionDome
ConcaveTriangle
OblateGlobe
CatEye
FourSidedPillow
RoundRectangle
Fan
x_mode Sort
Overlay
AsRead
x_range_mode Automatic
UserDefined
x_range_min number
x_range_max number
y_range_mode Automatic
UserDefined
y_range_min number
y_range_max number
z_range_mode Automatic
UserDefined
z_range_min number
z_range_max number
jitter flag
records_limit number
if_over_limit PlotBins
PlotSample
PlotAll
timeplotnode Properties
The Time Plot node displays one or more sets of time series data. Typically, you
would first use a Time Intervals node to create a TimeLabel field, which would be
used to label the x axis.
Example
Models
use_custom_x_field flag
x_field field
y_fields list
panel flag
normalize flag
line flag
Table 105. timeplotnode properties (continued)
timeplotnode properties Data type Property description
points flag
point_type
Rectangle
Dot
Triangle
Hexagon
Plus
Pentagon
Star
BowTie
HorizontalDash
VerticalDash
IronCross
Factory
House
Cathedral
OnionDome
ConcaveTriangle
OblateGlobe
CatEye
FourSidedPillow
RoundRectangle
Fan
smoother flag You can add smoothers to the plot only if you
set panel to True.
use_records_limit flag
records_limit integer
symbol_size number Specifies a symbol size.
panel_layout Horizontal
Vertical
eplotnode Properties
The E-Plot (Beta) node shows the relationship between numeric fields. It is similar
to the Plot node, but its options differ and its output uses a new graphing interface
specific to this node. Use the beta-level node to play around with new graphing
features.
tsnenode Properties
t-Distributed Stochastic Neighbor Embedding (t-SNE) is a tool for visualizing high-dimensional data.
Table 107. tsnenode properties (continued)
tsnenode properties Data type Property description
early_exaggeration float Controls how tight the natural clusters in
the original space are in the embedded
space, and how much space will be
between them. Default is 12.0.
learning_rate float Default is 200.
n_iter integer Maximum number of iterations for the
optimization. Set to at least 250. Default is
1000.
angle float The angular size of the distant node as
measured from a point. Specify a value in
the range of 0-1. Default is 0.5.
enable_random_seed Boolean Set to true to enable the random_seed
parameter. Default is false.
random_seed integer The random number seed to use. Default
is None.
n_iter_without_progress integer Maximum iterations without progress.
Default is 300.
min_grad_norm string If the gradient norm is below this
threshold, the optimization will be
stopped. Default is 1.0E-7. Possible
values are:
• 1.0E-1
• 1.0E-2
• 1.0E-3
• 1.0E-4
• 1.0E-5
• 1.0E-6
• 1.0E-7
• 1.0E-8
webnode properties
The Web node illustrates the strength of the relationship between values of two or more symbolic (categorical) fields. The graph uses lines of various widths to indicate connection strength.
Example
OverallPct
PctLarger
PctSmaller
strong_links_heavier flag
num_links ShowMaximum
ShowLinksAbove
ShowAll
max_num_links number
Table 108. webnode properties (continued)
webnode properties Data type Property description
links_above number
discard_links_min flag
links_min_records number
discard_links_max flag
links_max_records number
weak_below number
strong_above number
link_size_continuous flag
web_display Circular
Network
Directed
Grid
graph_background color Standard graph colors are described at the
beginning of this section.
symbol_size number Specifies a symbol size.
Expert
anomalydetectionnode properties
The Anomaly Detection node identifies unusual cases, or outliers, that do not
conform to patterns of “normal” data. With this node, it is possible to identify
outliers even if they do not fit any previously known patterns and even if you are not
exactly sure what you are looking for.
Example
Simple
anomaly_method IndexLevel
PerRecords
NumRecords
Specifies the method used to determine the cutoff value for flagging records as anomalous.
index_level number Specifies the minimum cutoff value
for flagging anomalies.
percent_records number Sets the threshold for flagging
records based on the percentage of
records in the training data.
num_records number Sets the threshold for flagging
records based on the number of
records in the training data.
num_fields integer The number of fields to report for
each anomalous record.
impute_missing_values flag
adjustment_coeff number Value used to balance the relative
weight given to continuous and
categorical fields in calculating the
distance.
Table 110. anomalydetectionnode properties (continued)
anomalydetectionnode Values Property description
Properties
peer_group_num_auto flag Automatically calculates the number
of peer groups.
min_num_peer_groups integer Specifies the minimum number of
peer groups used when
peer_group_num_auto is set to
True.
max_num_per_groups integer Specifies the maximum number of
peer groups.
num_peer_groups integer Specifies the number of peer groups
used when peer_group_num_auto
is set to False.
noise_level number Determines how outliers are treated
during clustering. Specify a value
between 0 and 0.5.
noise_ratio number Specifies the portion of memory
allocated for the component that
should be used for noise buffering.
Specify a value between 0 and 0.5.
apriorinode properties
The Apriori node extracts a set of rules from the data, pulling out the rules with the
highest information content. Apriori offers five different methods of selecting rules
and uses a sophisticated indexing scheme to process large data sets efficiently. For
large problems, Apriori is generally faster to train; it has no arbitrary limit on the
number of rules that can be retained, and it can handle rules with up to 32
preconditions. Apriori requires that input and output fields all be categorical but
delivers better performance because it is optimized for this type of data.
Example
Memory
use_transactional_data flag When the value is true, the score for each
transaction ID is independent from other
transaction IDs. When the data to be
scored is too large to obtain acceptable
performance, we recommend separating
the data.
contiguous flag
id_field string
content_field string
mode Simple
Expert
evaluation RuleConfidence
DifferenceToPrior
ConfidenceRatio
InformationDifference
NormalizedChiSquare
lower_bound number
Table 111. apriorinode properties (continued)
apriorinode Properties Values Property description
optimize Speed
Memory
Use to specify whether model building should be optimized for speed or for memory.
associationrulesnode properties
The Association Rules Node is similar to the Apriori Node; however, unlike Apriori,
the Association Rules Node can process list data. In addition, the Association Rules
Node can be used with IBM SPSS Analytic Server to process big data and take
advantage of faster parallel processing.
Lift
Conditionsupport
Deployability
true_flags Boolean Setting as Y determines that only the true
values for flag fields are considered during
rule building.
rule_criterion Boolean Setting as Y determines that the rule
criterion values are used for excluding rules
during model building.
Table 112. associationrulesnode properties (continued)
associationrulesnode Data type Property description
properties
rules_to_display upto
all
The maximum number of rules to display in
the output tables.
display_upto integer If upto is set in rules_to_display, set
the number of rules to display in the output
tables. Minimum 1.
field_transformations Boolean
records_summary Boolean
rule_statistics Boolean
most_frequent_values Boolean
most_frequent_fields Boolean
word_cloud Boolean
word_cloud_sort Confidence
Rulesupport
Lift
Conditionsupport
Deployability
word_cloud_display integer Minimum 1, maximum 20
max_predictions integer The maximum number of rules that can be
applied to each input to the score.
criterion Confidence Select the measure used to determine the
strength of rules.
Rulesupport
Lift
Conditionsupport
Deployability
allow_repeats Boolean Determine whether rules with the same
prediction are included in the score.
check_input NoPredictions
Predictions
NoCheck
autoclassifiernode properties
Example
ranking_measure Accuracy
Area_under_curve
Profit
Lift
Num_variables
ranking_dataset Training
Test
number_of_models integer Number of models to include in
the model nugget. Specify an
integer between 1 and 100.
calculate_variable_importance flag
enable_accuracy_limit flag
accuracy_limit integer Integer between 0 and 100.
enable_area_under_curve_limit flag
Table 113. autoclassifiernode properties (continued)
autoclassifiernode Properties Values Property description
area_under_curve_limit number Real number between 0.0 and
1.0.
enable_profit_limit flag
profit_limit number Integer greater than 0.
enable_lift_limit flag
lift_limit number Real number greater than 1.0.
enable_number_of_variables_limit flag
number_of_variables_limit number Integer greater than 0.
use_fixed_cost flag
fixed_cost number Real number greater than 0.0.
variable_cost field
use_fixed_revenue flag
fixed_revenue number Real number greater than 0.0.
variable_revenue field
use_fixed_weight flag
fixed_weight number Real number greater than 0.0
variable_weight field
lift_percentile number Integer between 0 and 100.
enable_model_build_time_limit flag
model_build_time_limit number Integer set to the number of
minutes to limit the time taken to
build each individual model.
enable_stop_after_time_limit flag
stop_after_time_limit number Real number set to the number of
hours to limit the overall elapsed
time for an auto classifier run.
enable_stop_after_valid_model_produced flag
use_costs flag
<algorithm> flag Enables or disables the use of a
specific algorithm.
<algorithm>.<property> string Sets a property value for a specific
algorithm. See the topic “Setting
Algorithm Properties” on page
220 for more information.
For example:
node.setKeyedPropertyValue("neuralnetwork", "method", "MultilayerPerceptron")
Algorithm names for the Auto Classifier node are cart, chaid, quest, c50, logreg, decisionlist,
bayesnet, discriminant, svm and knn.
Algorithm names for the Auto Numeric node are cart, chaid, neuralnetwork, genlin, svm,
regression, linear and knn.
Algorithm names for the Auto Cluster node are twostep, k-means, and kohonen.
Property names are standard as documented for each algorithm node.
Algorithm properties that contain periods or other punctuation must be wrapped in single quotes, for
example:
node.setKeyedPropertyValue("logreg", "tolerance", '1.0E-5')
Note: In cases where certain algorithm options are not available in the Auto Classifier node, or when only
a single value can be specified rather than a range of values, the same limits apply with scripting as when
accessing the node in the standard manner.
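As a sketch of the rules above, a script for an Auto Classifier node might enable or disable individual algorithms and set an algorithm-specific property with the keyed form. The `stream` object comes from the Modeler scripting environment:

```python
node = stream.create("autoclassifier", "My node")
node.setPropertyValue("chaid", True)   # enable the CHAID algorithm
node.setPropertyValue("svm", False)    # disable the SVM algorithm
# set a property of one specific algorithm via the keyed form
node.setKeyedPropertyValue("c50", "use_boost", True)
```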
autoclusternode properties
The Auto Cluster node estimates and compares clustering models, which identify
groups of records that have similar characteristics. The node works in the same
manner as other automated modeling nodes, allowing you to experiment with
multiple combinations of options in a single modeling pass. Models can be
compared using basic measures that attempt to filter and rank the usefulness of
the cluster models, and a measure based on the importance of
particular fields.
Example
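The example script was lost in extraction; a minimal sketch, using properties from the table below, might read as follows (the `stream` object is assumed from the Modeler scripting environment, and `Drug` is a hypothetical field):

```python
node = stream.create("autocluster", "My node")
node.setPropertyValue("evaluation", "Drug")
node.setPropertyValue("ranking_measure", "Silhouette")
node.setPropertyValue("summary_limit", 10)
node.setPropertyValue("enable_silhouette_limit", True)
node.setPropertyValue("silhouette_limit", 5)
```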
Table 114. autoclusternode properties
autoclusternode Properties Values Property description
evaluation field Note: Auto Cluster node only.
Identifies the field for which an
importance value will be calculated.
Alternatively, it can be used to identify
how well the cluster differentiates the
value of this field and, therefore, how
well the model will predict this field.
ranking_measure Silhouette
Num_clusters
Size_smallest_cluster
Size_largest_cluster
Smallest_to_largest
Importance
ranking_dataset Training
Test
summary_limit integer Number of models to list in the report.
Specify an integer between 1 and 100.
enable_silhouette_limit flag
silhouette_limit integer Integer between 0 and 100.
enable_number_less_limit flag
number_less_limit number Real number between 0.0 and 1.0.
enable_number_greater_limit flag
number_greater_limit number Integer greater than 0.
enable_smallest_cluster_limit flag
smallest_cluster_units Percentage
Counts
smallest_cluster_limit_percentage number
smallest_cluster_limit_count integer Integer greater than 0.
enable_largest_cluster_limit flag
largest_cluster_units Percentage
Counts
largest_cluster_limit_percentage number
largest_cluster_limit_count integer
enable_smallest_largest_limit flag
smallest_largest_limit number
enable_importance_limit flag
importance_limit_condition Greater_than
Less_than
importance_limit_greater_than number Integer between 0 and 100.
importance_limit_less_than number Integer between 0 and 100.
<algorithm> flag Enables or disables the use of a
specific algorithm.
<algorithm>.<property> string Sets a property value for a specific
algorithm. See the topic “Setting
Algorithm Properties” on page 220 for
more information.
autonumericnode properties
The Auto Numeric node estimates and compares models for continuous numeric
range outcomes using a number of different methods. The node works in the same
manner as the Auto Classifier node, allowing you to choose the algorithms to use
and to experiment with multiple combinations of options in a single modeling pass.
Supported algorithms include neural networks, C&R Tree, CHAID, linear regression,
generalized linear regression, and support vector machines (SVM). Models can be
compared based on correlation, relative error, or number of variables used.
Example
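The example script for this node is missing here; a sketch built from the properties in the table below could look like this (the `stream` object comes from the Modeler scripting environment, and the field names are hypothetical):

```python
node = stream.create("autonumeric", "My node")
node.setPropertyValue("custom_fields", True)
node.setPropertyValue("target", "Na")
node.setPropertyValue("inputs", ["Age", "K", "BP"])
node.setPropertyValue("ranking_measure", "Correlation")
node.setPropertyValue("number_of_models", 3)
```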
Table 115. autonumericnode properties
autonumericnode Properties Values Property description
custom_fields flag If True, custom field settings will be
used instead of type node settings.
target field The Auto Numeric node requires a
single target and one or more input
fields. Weight and frequency fields can
also be specified. See the topic
“Common modeling node properties”
on page 211 for more information.
inputs [field1 … field2]
partition field
use_frequency flag
frequency_field field
use_weight flag
weight_field field
use_partitioned_data flag If a partition field is defined, only the
training data are used for model
building.
ranking_measure Correlation
NumberOfFields
ranking_dataset Test
Training
number_of_models integer Number of models to include in the
model nugget. Specify an integer
between 1 and 100.
calculate_variable_importance flag
enable_correlation_limit flag
correlation_limit integer
enable_number_of_fields_limit flag
number_of_fields_limit integer
enable_relative_error_limit flag
relative_error_limit integer
enable_model_build_time_limit flag
model_build_time_limit integer
enable_stop_after_time_limit flag
bayesnetnode properties
The Bayesian Network node enables you to build a probability model by combining
observed and recorded evidence with real-world knowledge to establish the
likelihood of occurrences. The node focuses on Tree Augmented Naïve Bayes (TAN)
and Markov Blanket networks that are primarily used for classification.
Example
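The example script was dropped in extraction; a minimal sketch based on the properties in the table below might be (the `stream` object is assumed from the Modeler scripting environment, and the field names are hypothetical):

```python
node = stream.create("bayesnet", "My node")
node.setPropertyValue("mode", "Expert")
node.setPropertyValue("missing_values", True)
node.setPropertyValue("all_probabilities", True)
node.setPropertyValue("independence", "Pearson")
node.setPropertyValue("inputs_always_selected", ["Age", "BP"])
```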
Table 116. bayesnetnode properties (continued)
bayesnetnode Properties Values Property description
mode Expert
Simple
missing_values flag
all_probabilities flag
independence Likelihood Specifies the method used to
determine whether paired
Pearson observations on two variables are
independent of each other.
significance_level number Specifies the cutoff value for
determining independence.
maximal_conditioning_set number Sets the maximal number of
conditioning variables to be used for
independence testing.
inputs_always_selected [field1 ... fieldN] Specifies which fields from the
dataset are always to be used when
building the Bayesian network.
buildr properties
The R Building node enables you to enter custom R
script to perform model building and model scoring
that can be deployed in IBM SPSS Modeler.
Example
c50node properties
The C5.0 node builds either a decision tree or a rule set. The model works by
splitting the sample based on the field that provides the maximum information gain
at each level. The target field must be categorical. Multiple splits into more than two
subgroups are allowed.
Example
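The example script is missing here; a sketch using the properties from the table below might read (the `stream` object comes from the Modeler scripting environment, and `Drug` is a hypothetical field):

```python
node = stream.create("c50", "My node")
node.setPropertyValue("target", "Drug")
node.setPropertyValue("output_type", "DecisionTree")
node.setPropertyValue("use_boost", True)
node.setPropertyValue("boost_num_trials", 10)
node.setPropertyValue("favor", "Accuracy")
```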
Table 118. c50node properties
c50node Properties Values Property description
target field C50 models use a single target field
and one or more input fields. A weight
field can also be specified. See the
topic “Common modeling node
properties” on page 211 for more
information.
output_type DecisionTree
RuleSet
group_symbolics flag
use_boost flag
boost_num_trials number
use_xval flag
xval_num_folds number
mode Simple
Expert
favor Accuracy Favor accuracy or generality.
Generality
expected_noise number
min_child_records number
pruning_severity number
use_costs flag
costs structured This is a structured property.
use_winnowing flag
use_global_pruning flag On (True) by default.
calculate_variable_importance flag
calculate_raw_propensities flag
calculate_adjusted_propensities flag
adjusted_propensity_partition Test
Validation
carmanode properties
Example
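The example script for the CARMA node (Table 119 below) was lost in extraction; a minimal sketch using properties from that table might be (the `stream` object is assumed from the Modeler scripting environment):

```python
node = stream.create("carma", "My node")
node.setPropertyValue("mode", "Expert")
node.setPropertyValue("use_pruning", True)
node.setPropertyValue("pruning_value", 300)
node.setPropertyValue("exclude_multiple", True)
```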
Table 119. carmanode properties (continued)
carmanode Properties Values Property description
mode Simple The default is Simple.
Expert
exclude_multiple flag Excludes rules with multiple
consequents. The default is False.
use_pruning flag The default is False.
pruning_value number The default is 500.
vary_support flag
estimated_transactions integer
rules_without_antecedent flag
s
cartnode properties
The Classification and Regression (C&R) Tree node generates a decision tree that
allows you to predict or classify future observations. The method uses recursive
partitioning to split the training records into segments by minimizing the impurity at
each step, where a node in the tree is considered “pure” if 100% of cases in the
node fall into a specific category of the target field. Target and input fields can be
numeric ranges or categorical (nominal, ordinal, or flags); all splits are binary (only
two subgroups).
Example
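The example script is missing here; a sketch built from the cartnode properties below could read (the `stream` object comes from the Modeler scripting environment, and `Drug` is a hypothetical field):

```python
node = stream.create("cart", "My node")
node.setPropertyValue("target", "Drug")
node.setPropertyValue("use_max_depth", "Custom")
node.setPropertyValue("max_depth", 5)
node.setPropertyValue("prune_tree", True)
node.setPropertyValue("use_std_err", True)
node.setPropertyValue("std_err_multiplier", 1.0)
```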
objective Standard
Boosting
Bagging
psm
model_output_type Single
InteractiveBuilder
use_tree_directives flag
tree_directives string Specify directives for growing the tree.
Directives can be wrapped in triple
quotes to avoid escaping newlines or
quotes. Note that directives may be
highly sensitive to minor changes in
data or modeling options and may not
generalize to other datasets.
use_max_depth Default
Custom
max_depth integer Maximum tree depth, from 0 to 1000.
Used only if use_max_depth =
Custom.
prune_tree flag Prune tree to avoid overfitting.
use_std_err flag Use maximum difference in risk (in
Standard Errors).
std_err_multiplier number Maximum difference.
max_surrogates number Maximum surrogates.
use_percentage flag
min_parent_records_pc number
min_child_records_pc number
min_parent_records_abs number
min_child_records_abs number
Table 120. cartnode properties (continued)
cartnode Properties Values Property description
use_costs flag
costs structured Structured property.
priors Data
Equal
Custom
custom_priors structured Structured property.
adjust_priors flag
trails number Number of component models for
boosting or bagging.
set_ensemble_method Voting Default combining rule for categorical
targets.
HighestProbability
HighestMeanProbability
range_ensemble_method Mean Default combining rule for continuous
targets.
Median
large_boost flag Apply boosting to very large data sets.
min_impurity number
impurity_measure Gini
Twoing
Ordered
train_pct number Overfit prevention set.
set_random_seed flag Replicate results option.
seed number
calculate_variable_importance flag
calculate_raw_propensities flag
calculate_adjusted_propensities flag
adjusted_propensity_partition Test
Validation
Example
node.setPropertyValue("custom_fields", True)
node.setPropertyValue("target", "Drug")
node.setPropertyValue("inputs", ["Age", "Na", "K", "Cholesterol", "BP"])
node.setPropertyValue("use_model_name", True)
node.setPropertyValue("model_name", "CHAID")
node.setPropertyValue("method", "Chaid")
node.setPropertyValue("model_output_type", "InteractiveBuilder")
node.setPropertyValue("use_tree_directives", True)
node.setPropertyValue("tree_directives", "Test")
node.setPropertyValue("split_alpha", 0.03)
node.setPropertyValue("merge_alpha", 0.04)
node.setPropertyValue("chi_square", "Pearson")
node.setPropertyValue("use_percentage", False)
node.setPropertyValue("min_parent_records_abs", 40)
node.setPropertyValue("min_child_records_abs", 30)
node.setPropertyValue("epsilon", 0.003)
node.setPropertyValue("max_iterations", 75)
node.setPropertyValue("split_merged_categories", True)
node.setPropertyValue("bonferroni_adjustment", True)
objective Standard
Boosting
Bagging
psm
Table 121. chaidnode properties (continued)
chaidnode Properties Values Property description
model_output_type Single
InteractiveBuilder
use_tree_directives flag
tree_directives string
method Chaid
ExhaustiveChaid
use_max_depth Default
Custom
max_depth integer Maximum tree depth, from 0 to 1000.
Used only if use_max_depth =
Custom.
use_percentage flag
min_parent_records_pc number
min_child_records_pc number
min_parent_records_abs number
min_child_records_abs number
use_costs flag
costs structured Structured property.
trails number Number of component models for
boosting or bagging.
set_ensemble_method Voting Default combining rule for categorical
targets.
HighestProbability
HighestMeanProbability
range_ensemble_method Mean Default combining rule for continuous
targets.
Median
large_boost flag Apply boosting to very large data sets.
split_alpha number Significance level for splitting.
merge_alpha number Significance level for merging.
bonferroni_adjustment flag Adjust significance values using
Bonferroni method.
split_merged_categories flag Allow resplitting of merged categories.
coxregnode properties
The Cox regression node enables you to build a survival model for time-to-event
data in the presence of censored records. The model produces a survival function
that predicts the probability that the event of interest has occurred at a given time (t)
for given values of the input variables.
Example
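The example script was lost in extraction; a minimal sketch using properties from the coxregnode table below might be (the `stream` object is assumed from the Modeler scripting environment):

```python
node = stream.create("coxreg", "My node")
node.setPropertyValue("method", "Stepwise")
node.setPropertyValue("model_type", "MainEffects")
node.setPropertyValue("mode", "Expert")
node.setPropertyValue("survival", True)
node.setPropertyValue("hazard", True)
```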
Table 122. coxregnode properties (continued)
coxregnode Properties Values Property description
method Enter
Stepwise
BackwardsStepwise
groups field
model_type MainEffects
Custom
custom_terms ["BP*Sex" "BP*Age"]
mode Expert
Simple
max_iterations number
p_converge 1.0E-4
1.0E-5
1.0E-6
1.0E-7
1.0E-8
0
l_converge 1.0E-1
1.0E-2
1.0E-3
1.0E-4
1.0E-5
0
removal_criterion LR
Wald
Conditional
probability_entry number
probability_removal number
output_display EachStep
LastStep
ci_enable flag
ci_value 90
95
99
correlation flag
display_baseline flag
survival flag
hazard flag
log_minus_log flag
one_minus_survival flag
separate_line field
value number or string If no value is specified for a field, the
default option "Mean" will be used
for that field.
decisionlistnode properties
The Decision List node identifies subgroups, or segments, that show a higher or
lower likelihood of a given binary outcome relative to the overall population. For
example, you might look for customers who are unlikely to churn or are most likely
to respond favorably to a campaign. You can incorporate your business knowledge
into the model by adding your own custom segments and previewing alternative
models side by side to compare the results. Decision List models consist of a list of
rules in which each rule has a condition and an outcome. Rules are applied in order,
and the first rule that matches determines the outcome.
Example
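The example script is missing here; a sketch built from the decisionlistnode properties below could read (the `stream` object comes from the Modeler scripting environment, and the target value is hypothetical):

```python
node = stream.create("decisionlist", "My node")
node.setPropertyValue("search_direction", "Up")
node.setPropertyValue("target_value", "1")
node.setPropertyValue("max_rules", 4)
node.setPropertyValue("min_group_size_pct", 15)
```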
model_output_type Model
InteractiveBuilder
search_direction Up
Down
Relates to finding segments, where Up
is the equivalent of High Probability
and Down is the equivalent of Low
Probability.
target_value string If not specified, will assume true value
for flags.
max_rules integer The maximum number of segments
excluding the remainder.
min_group_size integer Minimum segment size.
min_group_size_pct number Minimum segment size as a
percentage.
confidence_level number Minimum threshold that an input field
has to improve the likelihood of
response (give lift), to make it worth
adding to a segment definition.
max_segments_per_rule integer
mode Simple
Expert
bin_method EqualWidth
EqualCount
bin_count number
max_models_per_cycle integer Search width for lists.
max_rules_per_cycle integer Search width for segment rules.
segment_growth number
include_missing flag
final_results_only flag
reuse_fields flag Allows attributes (input fields which
appear in rules) to be re-used.
max_alternatives integer
calculate_raw_propensities flag
calculate_adjusted_propensities flag
adjusted_propensity_partition Test
Validation
discriminantnode properties
Discriminant analysis makes more stringent assumptions than logistic regression
but can be a valuable alternative or supplement to a logistic regression analysis
when those assumptions are met.
Example
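The example script was lost in extraction; a minimal sketch using properties from the discriminantnode table below might be (the `stream` object is assumed from the Modeler scripting environment, and `Drug` is a hypothetical field):

```python
node = stream.create("discriminant", "My node")
node.setPropertyValue("target", "Drug")
node.setPropertyValue("method", "Stepwise")
node.setPropertyValue("prior_probabilities", "ComputeFromSizes")
node.setPropertyValue("summary_table", True)
```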
Table 124. discriminantnode properties (continued)
discriminantnode Values Property description
Properties
method Enter
Stepwise
mode Simple
Expert
prior_probabilities AllEqual
ComputeFromSizes
covariance_matrix WithinGroups
SeparateGroups
means flag Statistics options in the Advanced
Output dialog box.
univariate_anovas flag
box_m flag
within_group_covariance flag
within_groups_correlation flag
separate_groups_covariance flag
total_covariance flag
fishers flag
unstandardized flag
casewise_results flag Classification options in the Advanced
Output dialog box.
limit_to_first number Default value is 10.
summary_table flag
leave_one_classification flag
combined_groups flag
separate_groups_covariance flag Matrices option Separate-groups covariance.
territorial_map flag
combined_groups flag Plot option Combined-groups.
separate_groups flag Plot option Separate-groups.
summary_of_steps flag
F_pairwise flag
stepwise_method WilksLambda
UnexplainedVariance
MahalanobisDistance
SmallestF
RaosV
V_to_enter number
criteria UseValue
UseProbability
F_value_entry number Default value is 3.84.
F_value_removal number Default value is 2.71.
probability_entry number Default value is 0.05.
probability_removal number Default value is 0.10.
calculate_variable_importance flag
calculate_raw_propensities flag
calculate_adjusted_propensities flag
adjusted_propensity_partition Test
Validation
extensionmodelnode properties
build_script = """
import json
import spss.pyspark.runtime
from pyspark.mllib.regression import LabeledPoint
from pyspark.mllib.linalg import DenseVector
from pyspark.mllib.tree import DecisionTree
cxt = spss.pyspark.runtime.getContext()
df = cxt.getSparkInputData()
schema = df.dtypes[:]
target = "Drug"
predictors = ["Age","BP","Sex","Cholesterol","Na","K"]
def metaMap(row, schema):
    col = 0
    meta = []
    for (cname, ctype) in schema:
        if ctype == 'string':
            meta.append(set([row[col]]))
        else:
            meta.append((row[col], row[col]))
        col += 1
    return meta

def metaReduce(meta1, meta2, schema):
    col = 0
    meta = []
    for (cname, ctype) in schema:
        if ctype == 'string':
            meta.append(meta1[col].union(meta2[col]))
        else:
            meta.append((min(meta1[col][0], meta2[col][0]), max(meta1[col][1], meta2[col][1])))
        col += 1
    return meta

def setToList(v):
    if isinstance(v, set):
        return list(v)
    return v

lookup = {}
for i in range(0, len(schema)):
    lookup[schema[i][0]] = i

def row2LabeledPoint(dm, lookup, target, predictors, row):
    target_index = lookup[target]
    tval = dm[target_index].index(row[target_index])
    pvals = []
    for predictor in predictors:
        predictor_index = lookup[predictor]
        if isinstance(dm[predictor_index], list):
            pval = dm[predictor_index].index(row[predictor_index])
        else:
            pval = row[predictor_index]
        pvals.append(pval)
    return LabeledPoint(tval, DenseVector(pvals))

# lps, predictorClassCount, metadata and getCategoricalFeatureInfo are built
# from the input data in portions of the script not shown here
treeModel = DecisionTree.trainClassifier(
    lps,
    numClasses=predictorClassCount,
    categoricalFeaturesInfo=getCategoricalFeatureInfo(metadata, lookup, predictors),
    impurity='gini',
    maxDepth=5)
_outputPath = cxt.createTemporaryFolder()
treeModel.save(cxt.getSparkContext(), _outputPath)
cxt.setModelContentFromPath("TreeModel", _outputPath)
cxt.setModelContentFromString("model.dm",json.dumps(metadata), mimeType="application/json")\
.setModelContentFromString("model.structure",treeModel.toDebugString())
"""
node.setPropertyValue("python_build_syntax", build_script)
R example
#### script example for R
node.setPropertyValue("syntax_type", "R")
node.setPropertyValue("r_build_syntax", """modelerModel <- lm(modelerData$Na~modelerData$K,modelerData)
modelerDataModel
modelerModel
""")
factornode properties
The PCA/Factor node provides powerful data-reduction techniques to reduce the
complexity of your data. Principal components analysis (PCA) finds linear
combinations of the input fields that do the best job of capturing the variance in the
entire set of fields, where the components are orthogonal (perpendicular) to each
other. Factor analysis attempts to identify underlying factors that explain the pattern
of correlations within a set of observed fields. For both approaches, the goal is to
find a small number of derived fields that effectively summarize the information in
the original set of fields.
Example
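The example script is missing here; a sketch built from the factornode properties below could read (the `stream` object comes from the Modeler scripting environment):

```python
node = stream.create("factor", "My node")
node.setPropertyValue("mode", "Expert")
node.setPropertyValue("extract_factors", "ByEigenvalues")
node.setPropertyValue("min_eigenvalue", 1.0)
node.setPropertyValue("rotation", "Varimax")
```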
method PC
ULS
GLS
ML
PAF
Alpha
Image
mode Simple
Expert
max_iterations number
complete_records flag
matrix Correlation
Covariance
extract_factors ByEigenvalues
ByFactors
min_eigenvalue number
max_factor number
rotation None
Varimax
DirectOblimin
Equamax
Quartimax
Promax
delta number If you select DirectOblimin as your
rotation data type, you can specify a
value for delta.
Table 126. factornode properties (continued)
factornode Properties Values Property description
kappa number If you select Promax as your rotation
data type, you can specify a value for
kappa.
featureselectionnode properties
The Feature Selection node screens input fields for removal based on a set of
criteria (such as the percentage of missing values); it then ranks the importance of
remaining inputs relative to a specified target. For example, given a data set with
hundreds of potential inputs, which are most likely to be useful in modeling patient
outcomes?
Example
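The example script was lost in extraction; a minimal sketch using properties from the featureselectionnode table below might be (the `stream` object is assumed from the Modeler scripting environment, and the threshold values are hypothetical):

```python
node = stream.create("featureselection", "My node")
node.setPropertyValue("selection_mode", "TopN")
node.setPropertyValue("top_n", 15)
node.setPropertyValue("unimportant_below", 0.8)
node.setPropertyValue("important_above", 0.9)
```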
For a more detailed example that creates and applies a Feature Selection model, see the standalone script example in this guide.
CramersV
Lambda
unimportant_below number Specifies the threshold p values
used to rank variables as important,
marginal, or unimportant. Accepts
values from 0.0 to 1.0.
important_above number Accepts values from 0.0 to 1.0.
unimportant_label string Specifies the label for the
unimportant ranking.
marginal_label string
important_label string
selection_mode ImportanceLevel
ImportanceValue
TopN
Table 127. featureselectionnode properties (continued)
featureselectionnode Values Property description
Properties
select_important flag When selection_mode is set to
ImportanceLevel, specifies
whether to select important fields.
select_marginal flag When selection_mode is set to
ImportanceLevel, specifies
whether to select marginal fields.
select_unimportant flag When selection_mode is set to
ImportanceLevel, specifies
whether to select unimportant fields.
importance_value number When selection_mode is set to
ImportanceValue, specifies the
cutoff value to use. Accepts values
from 0 to 100.
top_n integer When selection_mode is set to
TopN, specifies the cutoff value to
use. Accepts values from 0 to 1000.
genlinnode properties
The Generalized Linear model expands the general linear model so that the
dependent variable is linearly related to the factors and covariates through a
specified link function. Moreover, the model allows for the dependent variable to
have a non-normal distribution. It covers the functionality of a wide range of
statistical models, including linear regression, logistic regression, loglinear models
for count data, and interval-censored survival models.
Example
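The example script is missing here; a sketch built from the genlinnode properties below could read (the `stream` object comes from the Modeler scripting environment):

```python
node = stream.create("genlin", "My node")
node.setPropertyValue("model_type", "MainEffects")
node.setPropertyValue("distribution", "POISSON")
node.setPropertyValue("link_function", "IDENTITY")
node.setPropertyValue("max_iterations", 200)
```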
trials_type Variable
FixedValue
trials_field field Field type is continuous, flag, or
ordinal.
trials_number number Default value is 10.
model_type MainEffects
MainAndAllTwoWayEffec
ts
offset_type Variable
FixedValue
offset_field field Field type is only continuous.
offset_value number Must be a real number.
base_category Last
First
include_intercept flag
mode Simple
Expert
distribution BINOMIAL IGAUSS: Inverse Gaussian.
IGAUSS
NEGBIN
NORMAL
POISSON
TWEEDIE
MULTINOMIAL
negbin_para_type Specify
Estimate
Table 128. genlinnode properties (continued)
genlinnode Properties Values Property description
negbin_parameter number Default value is 1. Must contain a non-
negative real number.
tweedie_parameter number
link_function IDENTITY CLOGLOG: Complementary log-log.
CUMCAUCHIT
CUMCLOGLOG
CUMLOGIT
CUMNLOGLOG
CUMPROBIT
power number Value must be a real, nonzero number.
method Hybrid
Fisher
NewtonRaphson
max_fisher_iterations number Default value is 1; only positive
integers allowed.
scale_method MaxLikelihoodEstimate
Deviance
PearsonChiSquare
FixedValue
scale_value number Default value is 1; must be greater
than 0.
covariance_matrix ModelEstimator
RobustEstimator
max_iterations number Default value is 100; non-negative
integers only.
max_step_halving number Default value is 5; positive integers
only.
check_separation flag
start_iteration number Default value is 20; only positive
integers allowed.
estimates_change flag
estimates_change_min number Default value is 1E-006; only positive
numbers allowed.
estimates_change_type Absolute
Relative
loglikelihood_change flag
loglikelihood_change_min number Only positive numbers allowed.
loglikelihood_change_type Absolute
Relative
hessian_convergence flag
hessian_convergence_min number Only positive numbers allowed.
hessian_convergence_type Absolute
Relative
case_summary flag
contrast_matrices flag
descriptive_statistics flag
estimable_functions flag
model_info flag
Table 128. genlinnode properties (continued)
genlinnode Properties Values Property description
iteration_history flag
goodness_of_fit flag
print_interval number Default value is 1; must be positive
integer.
model_summary flag
lagrange_multiplier flag
parameter_estimates flag
include_exponential flag
covariance_estimates flag
correlation_estimates flag
analysis_type TypeI
TypeIII
TypeIAndTypeIII
statistics Wald
LR
citype Wald
Profile
tolerancelevel number Default value is 0.0001.
confidence_interval number Default value is 95.
loglikelihood_function Full
Kernel
singularity_tolerance 1E-007
1E-008
1E-009
1E-010
1E-011
1E-012
Descending
DataOrder
calculate_variable_importance flag
calculate_raw_propensities flag
calculate_adjusted_propensities flag
adjusted_propensity_partition Test
Validation
glmmnode properties
A generalized linear mixed model (GLMM) extends the linear model so that the
target can have a non-normal distribution, is linearly related to the factors and
covariates via a specified link function, and so that the observations can be
correlated. Generalized linear mixed models cover a wide variety of models, from
simple linear regression to complex multilevel models for non-normal longitudinal
data.
Table 129. glmmnode properties (continued)
glmmnode Properties Values Property description
residual_covariance_type Diagonal Specifies covariance structure for
residuals.
AR1
ARMA11
COMPOUND_SYMMETRY
IDENTITY
TOEPLITZ
UNSTRUCTURED
VARIANCE_COMPONENTS
custom_target flag Indicates whether to use target defined
in upstream node (false) or custom
target specified by target_field
(true).
target_field field Field to use as target if
custom_target is true.
use_trials flag Indicates whether additional field or
value specifying number of trials is to
be used when target response is a
number of events occurring in a set of
trials. Default is false.
use_field_or_value Field
Value
Indicates whether field (default) or
value is used to specify number of
trials.
dist_link_combination GammaLog
BinomialLogit
PoissonLog
BinomialProbit
NegbinLog
BinomialLogC
Custom
target_distribution Normal Distribution of values for target when
dist_link_combination is Custom.
Binomial
Multinomial
Gamma
Inverse
NegativeBinomial
Poisson
Table 129. glmmnode properties (continued)
glmmnode Properties Values Property description
link_function_type Identity
LogC
Log
CLOGLOG
Logit
NLOGLOG
PROBIT
POWER
CAUCHIT
Link function to relate target values to predictors.
If target_distribution is Binomial you can use any
of the listed link functions. If target_distribution
is Multinomial you can use CLOGLOG, CAUCHIT, LOGIT,
NLOGLOG, or PROBIT. If target_distribution is
anything other than Binomial or Multinomial you can
use IDENTITY, LOG, or POWER.
link_function_param number Link function parameter value to use.
Only applicable if
normal_link_function or
link_function_type is POWER.
use_predefined_inputs flag Indicates whether fixed effect fields
are to be those defined upstream as
input fields (true) or those from
fixed_effects_list (false).
Default is false.
fixed_effects_list structured If use_predefined_inputs is
false, specifies the input fields to use
as fixed effect fields.
use_intercept flag If true (default), includes the
intercept in the model.
random_effects_list structured List of fields to specify as random
effects.
regression_weight_field field Field to use as analysis weight field.
use_offset None Indicates how offset is specified. Value
None means no offset is used.
offset_value
offset_field
offset_value number Value to use for offset if use_offset
is set to offset_value.
offset_field field Field to use for offset value if
use_offset is set to offset_field.
inputs_category_order Ascending
Descending
Data
Sorting order for categorical predictors.
Value Data specifies using the sort
order found in the data. Default is
Ascending.
max_iterations integer Maximum number of iterations the
algorithm will perform. A non-negative
integer; default is 100.
confidence_level integer Confidence level used to compute
interval estimates of the model
coefficients. A non-negative integer;
maximum is 100, default is 95.
degrees_of_freedom_method Fixed
Varied
Specifies how degrees of freedom are
computed for significance test.
test_fixed_effects_coeffecients Model
Robust
Method for computing the parameter
estimates covariance matrix.
use_p_converge flag Option for parameter convergence.
p_converge number Blank, or any positive value.
p_converge_type
Absolute
Relative
max_fisher_steps integer
singularity_tolerance number
256 IBM SPSS Modeler 18.3 Python Scripting and Automation Guide
Table 129. glmmnode properties (continued)
glmmnode Properties Values Property description
use_model_name flag Indicates whether to specify a custom
name for the model (true) or to use
the system-generated name (false).
Default is false.
model_name string If use_model_name is true, specifies
the model name to use.
confidence onProbability
onIncrease
Basis for computing scoring confidence value: highest predicted probability, or difference between highest and second highest predicted probabilities.
score_category_probabilities flag If true, produces predicted probabilities for categorical targets. Default is false.
max_categories integer If
score_category_probabilities
is true, specifies maximum number of
categories to save.
score_propensity flag If true, produces propensity scores
for flag target fields that indicate
likelihood of "true" outcome for field.
emeans structure For each categorical field from the
fixed effects list, specifies whether to
produce estimated marginal means.
covariance_list structure For each continuous field from the
fixed effects list, specifies whether to
use the mean or a custom value when
computing estimated marginal means.
mean_scale Original Specifies whether to compute
estimated marginal means based on
Transformed the original scale of the target (default)
or on the link function transformation.
comparison_adjustment_method LSD
SEQBONFERRONI
SEQSIDAK
Adjustment method to use when performing hypothesis tests with multiple contrasts.
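The glmmnode scoring and offset options above follow the same setPropertyValue pattern as the node's other properties. A minimal sketch (the stand-in class below only records calls so the snippet runs outside Modeler; in a real script the node would come from stream.createAt, and the field name "exposure" is hypothetical):

```python
# Stand-in for a Modeler node: records setPropertyValue calls so this
# sketch runs anywhere. Inside SPSS Modeler, `node` would be created with
# stream.createAt("glmm", "GLMM", 96, 96).
class StubNode(object):
    def __init__(self):
        self.properties = {}

    def setPropertyValue(self, name, value):
        self.properties[name] = value

node = StubNode()
node.setPropertyValue("use_offset", "offset_field")   # take the offset from a field
node.setPropertyValue("offset_field", "exposure")     # hypothetical field name
node.setPropertyValue("use_model_name", True)
node.setPropertyValue("model_name", "glmm_demo")
node.setPropertyValue("score_category_probabilities", True)
node.setPropertyValue("max_categories", 25)
```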
gle properties
A GLE extends the linear model so that the target can have a non-normal
distribution, is linearly related to the factors and covariates via a specified link
function, and so that the observations can be correlated. Generalized linear mixed
models cover a wide variety of models, from simple linear regression to complex
multilevel models for non-normal longitudinal data.
NegbinLog
TweedieIdentity
NominalLogit
BinomialLogit
BinomialProbit
BinomialLogC
CUSTOM
Table 130. gle properties (continued)
gle Properties Values Property description
target_distribution Normal Distribution of values for target when
dist_link_combination is Custom.
Binomial
Multinomial
Gamma
INVERSE_GAUSS
NEG_BINOMIAL
Poisson
TWEEDIE
UNKNOWN
link_function_type UNKNOWN
IDENTITY
LOG
LOGIT
PROBIT
COMPL_LOG_LOG
POWER
LOG_COMPL
NEG_LOG_LOG
ODDS_POWER
NEG_BINOMIAL
GEN_LOGIT
CUMUL_LOGIT
CUMUL_PROBIT
CUMUL_COMPL_LOG_LOG
CUMUL_NEG_LOG_LOG
CUMUL_CAUCHIT
If target_distribution is Binomial you can use UNKNOWN, IDENTITY, LOG, LOGIT, PROBIT, COMPL_LOG_LOG, POWER, LOG_COMPL, NEG_LOG_LOG, or ODDS_POWER. If target_distribution is NEG_BINOMIAL you can use NEG_BINOMIAL. If target_distribution is UNKNOWN, you can use GEN_LOGIT, CUMUL_LOGIT, CUMUL_PROBIT, CUMUL_COMPL_LOG_LOG, CUMUL_NEG_LOG_LOG, or CUMUL_CAUCHIT.
link_function_param number Link function parameter value to use. Only applicable if normal_link_function or link_function_type is POWER.
tweedie_param number Tweedie parameter value to use. Only applicable if dist_link_combination is set to TweedieIdentity, or link_function_type is TWEEDIE.
use_predefined_inputs flag Indicates whether model effect fields are to
be those defined upstream as input fields
(true) or those from
fixed_effects_list (false).
model_effects_list structured If use_predefined_inputs is false,
specifies the input fields to use as model
effect fields.
use_intercept flag If true (default), includes the intercept in
the model.
regression_weight_field field Field to use as analysis weight field.
use_offset None Indicates how offset is specified. Value None
means no offset is used.
Value
Variable
offset_value number Value to use for offset if use_offset is set to Value.
offset_field field Field to use for offset value if use_offset is set to Variable.
target_category_order Ascending Sorting order for categorical targets. Default
is Ascending.
Descending
inputs_category_order Ascending Sorting order for categorical predictors.
Default is Ascending.
Descending
max_iterations integer Maximum number of iterations the algorithm
will perform. A non-negative integer; default
is 100.
confidence_level number Confidence level used to compute interval
estimates of the model coefficients. A non-
negative integer; maximum is 100, default is
95.
test_fixed_effects_coeffecients Model
Robust
Method for computing the parameter estimates covariance matrix.
detect_outliers flag When true the algorithm finds influential
outliers for all distributions except
multinomial distribution.
conduct_trend_analysis flag When true the algorithm conducts trend
analysis for the scatter plot.
estimation_method FISHER_SCORING
NEWTON_RAPHSON
HYBRID
Specify the maximum likelihood estimation algorithm.
max_fisher_iterations integer If using the FISHER_SCORING
estimation_method, the maximum
number of iterations. Minimum 0, maximum
20.
scale_parameter_method MLE
FIXED
DEVIANCE
PEARSON_CHISQUARE
Specify the method to be used for the estimation of the scale parameter.
method LASSO
ELASTIC_NET
FORWARD_STEPWISE
RIDGE
Determines the model selection method or, with RIDGE, the regularization method used.
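Putting the gle properties above together, a minimal sketch (the stand-in class only records calls so the snippet runs outside Modeler; inside Modeler the node would be created with stream.createAt("gle", "GLE", 96, 96)):

```python
# Stand-in that records setPropertyValue calls (not the real Modeler API).
class StubNode(object):
    def __init__(self):
        self.properties = {}

    def setPropertyValue(self, name, value):
        self.properties[name] = value

gle = StubNode()
gle.setPropertyValue("target_distribution", "Poisson")  # custom distribution
gle.setPropertyValue("link_function_type", "LOG")       # log link
gle.setPropertyValue("use_intercept", True)
gle.setPropertyValue("max_iterations", 100)
gle.setPropertyValue("method", "LASSO")                 # model selection method
```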
kmeansnode properties
The K-Means node clusters the data set into distinct groups (or clusters). The
method defines a fixed number of clusters, iteratively assigns records to clusters,
and adjusts the cluster centers until further refinement can no longer improve the
model. Instead of trying to predict an outcome, k-means uses a process known as
unsupervised learning to uncover patterns in the set of input fields.
Example
node.setPropertyValue("optimize", "Speed")
# "Expert" tab
node.setPropertyValue("mode", "Expert")
node.setPropertyValue("stop_on", "Custom")
node.setPropertyValue("max_iterations", 10)
node.setPropertyValue("tolerance", 3.0)
node.setPropertyValue("encoding_value", 0.3)
cluster_label String
Number
label_prefix string
mode Simple
Expert
stop_on Default
Custom
max_iterations number
tolerance number
encoding_value number
optimize Speed Use to specify whether model building
should be optimized for speed or for
Memory memory.
kmeansasnode properties
K-Means is one of the most commonly used clustering algorithms. It clusters data
points into a predefined number of clusters. The K-Means-AS node in SPSS Modeler
is implemented in Spark. For details about K-Means algorithms, see https://
spark.apache.org/docs/2.2.0/ml-clustering.html. Note that the K-Means-AS node
performs one-hot encoding automatically for categorical variables.
knnnode properties
The k-Nearest Neighbor (KNN) node associates a new case with the category or
value of the k objects nearest to it in the predictor space, where k is an integer.
Similar cases are near each other and dissimilar cases are distant from each other.
Example
node.setPropertyValue("automatic_k_selection", False)
node.setPropertyValue("fixed_k", 2)
node.setPropertyValue("weight_by_importance", True)
# Settings tab - Analyze panel
node.setPropertyValue("save_distances", True)
IdentifyNeighbors
objective Balance
Speed
Accuracy
Custom
normalize_ranges flag
use_case_labels flag Check box to enable next option.
case_labels_field field
identify_focal_cases flag Check box to enable next option.
focal_cases_field field
automatic_k_selection flag
fixed_k integer Enabled only if automatic_k_selection is False.
minimum_k integer Enabled only if automatic_k_selection is True.
maximum_k integer
distance_computation Euclidean
CityBlock
weight_by_importance flag
range_predictions Mean
Median
perform_feature_selection flag
forced_entry_inputs [field1 ... fieldN]
stop_on_error_ratio flag
number_to_select integer
minimum_change number
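The neighbor-selection properties above can be combined as in the following sketch (the stand-in class records calls so the snippet runs outside Modeler; a real script would set these on a knn node from stream.createAt):

```python
# Stand-in that records setPropertyValue calls (not the real Modeler API).
class StubNode(object):
    def __init__(self):
        self.properties = {}

    def setPropertyValue(self, name, value):
        self.properties[name] = value

knn = StubNode()
knn.setPropertyValue("objective", "Custom")
knn.setPropertyValue("automatic_k_selection", True)
knn.setPropertyValue("minimum_k", 3)   # enabled because selection is automatic
knn.setPropertyValue("maximum_k", 7)
knn.setPropertyValue("distance_computation", "Euclidean")
knn.setPropertyValue("weight_by_importance", True)
```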
kohonennode properties
The Kohonen node generates a type of neural network that can be used to cluster
the data set into distinct groups. When the network is fully trained, records that are
similar should be close together on the output map, while records that are different
will be far apart. You can look at the number of observations captured by each unit
in the model nugget to identify the strong units. This may give you a sense of the
appropriate number of clusters.
Example
node.setPropertyValue("phase2_eta", 0.2)
node.setPropertyValue("phase2_cycles", 75)
Time
time number
optimize Speed Use to specify whether model building
should be optimized for speed or for
Memory memory.
cluster_label flag
mode Simple
Expert
width number
length number
decay_style Linear
Exponential
phase1_neighborhood number
phase1_eta number
phase1_cycles number
phase2_neighborhood number
phase2_eta number
phase2_cycles number
linearnode properties
Linear regression models predict a continuous target based on linear relationships
between the target and one or more predictors.
Boosting
psm
use_auto_data_preparation flag
confidence_level number
model_selection ForwardStepwise
BestSubsets
None
criteria_forward_stepwise AICC
Fstatistics
AdjustedRSquare
ASE
probability_entry number
probability_removal number
use_max_effects flag
max_effects number
use_max_steps flag
max_steps number
Table 135. linearnode properties (continued)
linearnode Properties Values Property description
criteria_best_subsets AICC
AdjustedRSquare
ASE
combining_rule_continuous Mean
Median
component_models_n number
use_random_seed flag
random_seed number
use_custom_model_name flag
custom_model_name string
use_custom_name flag
custom_name string
tooltip string
keywords string
annotation string
linearasnode properties
Linear regression models predict a continuous target based on linear relationships
between the target and one or more predictors.
none
criteria_for_forward_stepwise AICC
Fstatistics
AdjustedRSquare
ASE
The statistic used to determine whether an effect should be added to or removed from the model. The default value is AdjustedRSquare.
pin number The effect that has the smallest p-
value less than this specified pin
threshold is added to the model. The
default value is 0.05.
pout number Any effects in the model with a p-value
greater than this specified pout
threshold are removed. The default
value is 0.10.
use_custom_max_effects flag Whether to use max number of effects
in the final model. The default value is
FALSE.
max_effects number Maximum number of effects to use in
the final model. The default value is 1.
use_custom_max_steps flag Whether to use the maximum number
of steps. The default value is FALSE.
max_steps number The maximum number of steps before
the stepwise algorithm stops. The
default value is 1.
criteria_for_best_subsets AICC
AdjustedRSquare
ASE
The mode of criteria to use. The default value is AdjustedRSquare.
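The stepwise thresholds above work together: pin controls entry and pout controls removal. A minimal sketch (the stand-in class records calls so the snippet runs outside Modeler):

```python
# Stand-in that records setPropertyValue calls (not the real Modeler API).
class StubNode(object):
    def __init__(self):
        self.properties = {}

    def setPropertyValue(self, name, value):
        self.properties[name] = value

linearas = StubNode()
linearas.setPropertyValue("criteria_for_forward_stepwise", "AICC")
linearas.setPropertyValue("pin", 0.05)               # entry threshold
linearas.setPropertyValue("pout", 0.10)              # removal threshold
linearas.setPropertyValue("use_custom_max_effects", True)
linearas.setPropertyValue("max_effects", 5)
```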
logregnode properties
Logistic regression is a statistical technique for classifying records based on values
of input fields. It is analogous to linear regression but takes a categorical target field
instead of a numeric range.
Multinomial Example
Binomial Example
Multinomial
include_constant flag
mode Simple
Expert
method Enter
Stepwise
Forwards
Backwards
BackwardsStepwise
binomial_method Enter
Forwards
Backwards
Table 137. logregnode properties (continued)
logregnode Properties Values Property description
model_type MainEffects When FullFactorial is specified as
the model type, stepping methods will
FullFactorial not be run, even if specified. Instead,
Enter will be the method used.
Custom
If the model type is set to Custom but
no custom fields are specified, a main-
effects model will be built.
custom_terms [[BP Sex][BP][Age]]
multinomial_base_category string Specifies how the reference category is determined.
binomial_categorical_input string
binomial_input_contrast Indicator
Simple
Difference
Helmert
Repeated
Polynomial
Deviation
Keyed property for categorical input that specifies how the contrast is determined.
binomial_input_category First
Last
Keyed property for categorical input that specifies how the reference category is determined.
scale None
UserDefined
Pearson
Deviance
scale_value number
all_probabilities flag
1.0E-6
1.0E-7
1.0E-8
1.0E-9
1.0E-10
min_terms number
use_max_terms flag
max_terms number
entry_criterion Score
LR
removal_criterion LR
Wald
probability_entry number
probability_removal number
binomial_probability_entry number
binomial_probability_removal number
requirements HierarchyDiscrete
HierarchyAll
Containment
None
max_iterations number
max_steps number
p_converge 1.0E-4
1.0E-5
1.0E-6
1.0E-7
1.0E-8
0
l_converge 1.0E-1
1.0E-2
1.0E-3
1.0E-4
1.0E-5
0
delta number
iteration_history flag
history_steps number
summary flag
likelihood_ratio flag
asymptotic_correlation flag
goodness_fit flag
parameters flag
confidence_interval number
asymptotic_covariance flag
classification_table flag
stepwise_summary flag
info_criteria flag
monotonicity_measures flag
binomial_output_display at_each_step
at_last_step
binomial_goodness_of_fit flag
all
binomial_residual_enable flag
binomial_outlier_threshold number
binomial_classification_cutoff number
binomial_removal_criterion LR
Wald
Conditional
calculate_variable_importance flag
calculate_raw_propensities flag
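A minimal logregnode sketch combining the method and output properties above (the stand-in class records calls so the snippet runs outside Modeler):

```python
# Stand-in that records setPropertyValue calls (not the real Modeler API).
class StubNode(object):
    def __init__(self):
        self.properties = {}

    def setPropertyValue(self, name, value):
        self.properties[name] = value

logreg = StubNode()
logreg.setPropertyValue("include_constant", True)
logreg.setPropertyValue("method", "Stepwise")
logreg.setPropertyValue("model_type", "MainEffects")
logreg.setPropertyValue("calculate_variable_importance", True)
```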
lsvmnode properties
The Linear Support Vector Machine (LSVM) node enables you to classify data into
one of two groups without overfitting. LSVM is linear and works well with wide data
sets, such as those with a very large number of records.
Table 138. lsvmnode properties (continued)
lsvmnode Properties Values Property description
precision number Used only if measurement level of
target field is Continuous.
Specifies the parameter related to
the sensitiveness of the loss for
regression. Minimum is 0 and
there is no maximum. Default
value is 0.1.
exclude_missing_values flag When True, a record is excluded if any single value is missing. The default value is False.
penalty_function L1
L2
Specifies the type of penalty function used. The default value is L2.
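A short lsvmnode sketch for the properties above (the stand-in class records calls so the snippet runs outside Modeler):

```python
# Stand-in that records setPropertyValue calls (not the real Modeler API).
class StubNode(object):
    def __init__(self):
        self.properties = {}

    def setPropertyValue(self, name, value):
        self.properties[name] = value

lsvm = StubNode()
lsvm.setPropertyValue("precision", 0.1)               # continuous targets only
lsvm.setPropertyValue("exclude_missing_values", False)
lsvm.setPropertyValue("penalty_function", "L2")       # the default penalty
```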
neuralnetnode properties
Important: A newer version of the Neural Net modeling node, with enhanced features, is available in this
release and is described in the next section (neuralnetwork). Although you can still build and score a
model with the previous version, we recommend updating your scripts to use the new version. Details of
the previous version are retained here for reference.
Example
Dynamic
Multiple
Prune
ExhaustivePrune
RBFN
prevent_overtrain flag
train_pct number
set_random_seed flag
random_seed number
mode Simple
Expert
stop_on Default Stopping mode.
Accuracy
Cycles
Time
accuracy number Stopping accuracy.
cycles number Cycles to train.
time number Time to train (minutes).
continue flag
show_feedback flag
binary_encode flag
Table 139. neuralnetnode properties (continued)
neuralnetnode Properties Values Property description
use_last_model flag
gen_logfile flag
logfile_name string
alpha number
initial_eta number
high_eta number
low_eta number
eta_decay_cycles number
hid_layers One
Two
Three
hl_units_one number
hl_units_two number
hl_units_three number
persistence number
m_topologies string
m_non_pyramids flag
m_persistence number
p_hid_layers One
Two
Three
p_hl_units_one number
p_hl_units_two number
p_hl_units_three number
p_persistence number
p_hid_rate number
p_hid_pers number
p_inp_rate number
p_inp_pers number
p_overall_pers number
r_persistence number
r_num_clusters number
neuralnetworknode properties
The Neural Net node uses a simplified model of the way the human brain processes
information. It works by simulating a large number of interconnected simple
processing units that resemble abstract versions of neurons. Neural networks are
powerful general function estimators and require minimal statistical or
mathematical knowledge to train or apply.
Example
Table 140. neuralnetworknode properties (continued)
neuralnetworknode Values Property description
Properties
use_partition flag If a partition field is defined, this option
ensures that only data from the
training partition is used to build the
model.
continue flag Continue training existing model.
objective Standard psm is used for very large datasets, and
requires a Server connection.
Bagging
Boosting
psm
method MultilayerPerceptron
RadialBasisFunction
use_custom_layers flag
first_layer_units number
second_layer_units number
use_max_time flag
max_time number
use_max_cycles flag
max_cycles number
use_min_accuracy flag
min_accuracy number
combining_rule_categorical Voting
HighestProbability
HighestMeanProbability
combining_rule_continuous Mean
Median
component_models_n number
overfit_prevention_pct number
use_random_seed flag
random_seed number
missingValueImputation
use_model_name boolean
model_name string
confidence onProbability
onIncrease
score_category_probabilities flag
max_categories number
score_propensity flag
use_custom_name flag
custom_name string
tooltip string
keywords string
annotation string
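A minimal neuralnetworknode sketch for the architecture and stopping properties above (the stand-in class records calls so the snippet runs outside Modeler; the layer sizes are illustrative, not defaults):

```python
# Stand-in that records setPropertyValue calls (not the real Modeler API).
class StubNode(object):
    def __init__(self):
        self.properties = {}

    def setPropertyValue(self, name, value):
        self.properties[name] = value

net = StubNode()
net.setPropertyValue("method", "MultilayerPerceptron")
net.setPropertyValue("use_custom_layers", True)
net.setPropertyValue("first_layer_units", 10)   # illustrative sizes
net.setPropertyValue("second_layer_units", 5)
net.setPropertyValue("use_max_cycles", True)
net.setPropertyValue("max_cycles", 200)
```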
questnode properties
The QUEST node provides a binary classification method for building decision trees,
designed to reduce the processing time required for large C&R Tree analyses while
also reducing the tendency found in classification tree methods to favor inputs that
allow more splits. Input fields can be numeric ranges (continuous), but the target
field must be categorical. All splits are binary.
Example
Table 141. questnode properties
questnode Properties Values Property description
target field QUEST models require a single target
and one or more input fields. A
frequency field can also be specified.
See the topic “Common modeling node
properties” on page 211 for more
information.
continue_training_existing_model flag
objective Standard psm is used for very large datasets, and
requires a Server connection.
Boosting
Bagging
psm
model_output_type Single
InteractiveBuilder
use_tree_directives flag
tree_directives string
use_max_depth Default
Custom
max_depth integer Maximum tree depth, from 0 to 1000.
Used only if use_max_depth =
Custom.
prune_tree flag Prune tree to avoid overfitting.
use_std_err flag Use maximum difference in risk (in
Standard Errors).
std_err_multiplier number Maximum difference.
max_surrogates number Maximum surrogates.
use_percentage flag
min_parent_records_pc number
min_child_records_pc number
min_parent_records_abs number
min_child_records_abs number
use_costs flag
costs structured Structured property.
priors Data
Equal
Custom
custom_priors structured Structured property.
adjust_priors flag
trails number Number of component models for
boosting or bagging.
set_ensemble_method Voting
HighestProbability
HighestMeanProbability
Default combining rule for categorical targets.
range_ensemble_method Mean Default combining rule for continuous
targets.
Median
large_boost flag Apply boosting to very large data sets.
split_alpha number Significance level for splitting.
train_pct number Overfit prevention set.
set_random_seed flag Replicate results option.
seed number
calculate_variable_importance flag
calculate_raw_propensities flag
calculate_adjusted_propensities flag
adjusted_propensity_partition Test
Validation
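A short questnode sketch for the tree-growth properties above (the stand-in class records calls so the snippet runs outside Modeler; the depth value is illustrative):

```python
# Stand-in that records setPropertyValue calls (not the real Modeler API).
class StubNode(object):
    def __init__(self):
        self.properties = {}

    def setPropertyValue(self, name, value):
        self.properties[name] = value

quest = StubNode()
quest.setPropertyValue("use_max_depth", "Custom")
quest.setPropertyValue("max_depth", 8)            # illustrative, range 0-1000
quest.setPropertyValue("prune_tree", True)
quest.setPropertyValue("set_ensemble_method", "Voting")
```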
randomtrees properties
The Random Trees node is similar to the existing C&RT node; however, the Random
Trees node is designed to process big data to create a single tree and displays the
resulting model in the output viewer that was added in SPSS Modeler version 17.
The Random Trees tree node generates a decision tree that you use to predict or
classify future observations. The method uses recursive partitioning to split the
training records into segments by minimizing the impurity at each step, where a
node in the tree is considered pure if 100% of cases in the node fall into a specific
category of the target field. Target and input fields can be numeric ranges or
categorical (nominal, ordinal, or flags); all splits are binary (only two subgroups).
tree.setPropertyValue("costs",
[["drugA", "drugB", 3.0], ["drugX",
"drugY", 4.0]])
default_cost_increase none
square
custom
Note: only enabled for ordinal targets.
max_pct_missing integer If the percentage of missing values in
any input is greater than the value
specified here, the input is excluded.
Minimum 0, maximum 100.
exclude_single_cat_pct integer If one category value represents a
higher percentage of the records than
specified here, the entire field is
excluded from model building.
Minimum 1, maximum 99.
max_category_number integer If the number of categories in a field
exceeds this value, the field is
excluded from model building.
Minimum 2.
min_field_variation number If the coefficient of variation of a
continuous field is smaller than this
value, the field is excluded from model
building.
Table 142. randomtrees properties (continued)
randomtrees Properties Values Property description
num_bins integer Only used if the data is made up of
continuous inputs. Set the number of
equal frequency bins to be used for the
inputs; options are: 2, 4, 5, 10, 20, 25,
50, or 100.
topN integer Specifies the number of rules to report.
Default value is 50, with a minimum of
1 and a maximum of 1000.
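The field-screening thresholds above can be set together, as in this sketch (the stand-in class records calls so the snippet runs outside Modeler; the threshold values are illustrative):

```python
# Stand-in that records setPropertyValue calls (not the real Modeler API).
class StubNode(object):
    def __init__(self):
        self.properties = {}

    def setPropertyValue(self, name, value):
        self.properties[name] = value

tree = StubNode()
tree.setPropertyValue("max_pct_missing", 70)        # drop inputs over 70% missing
tree.setPropertyValue("exclude_single_cat_pct", 95) # drop near-constant fields
tree.setPropertyValue("num_bins", 10)               # continuous inputs only
tree.setPropertyValue("topN", 50)                   # rules to report (default)
```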
regressionnode properties
Linear regression is a common statistical technique for summarizing data and
making predictions by fitting a straight line or surface that minimizes the
discrepancies between predicted and actual output values.
Note: The Regression node is due to be replaced by the Linear node in a future release. We recommend
using Linear models for linear regression from now on.
Example
Stepwise
Backwards
Forwards
include_constant flag
use_weight flag
weight_field field
mode Simple
Expert
complete_records flag
tolerance 1.0E-1 Use double quotes for arguments.
1.0E-2
1.0E-3
1.0E-4
1.0E-5
1.0E-6
1.0E-7
1.0E-8
1.0E-9
1.0E-10
1.0E-11
1.0E-12
Table 143. regressionnode properties (continued)
regressionnode Properties Values Property description
stepping_method useP useP: use probability of F.
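Note that tolerance takes a quoted string argument. A minimal regressionnode sketch (the stand-in class records calls so the snippet runs outside Modeler; the weight field name is hypothetical):

```python
# Stand-in that records setPropertyValue calls (not the real Modeler API).
class StubNode(object):
    def __init__(self):
        self.properties = {}

    def setPropertyValue(self, name, value):
        self.properties[name] = value

reg = StubNode()
reg.setPropertyValue("method", "Stepwise")
reg.setPropertyValue("include_constant", True)
reg.setPropertyValue("use_weight", True)
reg.setPropertyValue("weight_field", "sample_weight")  # hypothetical field name
reg.setPropertyValue("tolerance", "1.0E-4")            # note: quoted string
reg.setPropertyValue("stepping_method", "useP")
```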
sequencenode properties
The Sequence node discovers association rules in sequential or time-oriented data.
A sequence is a list of item sets that tends to occur in a predictable order. For
example, a customer who purchases a razor and aftershave lotion may purchase
shaving cream the next time he shops. The Sequence node is based on the CARMA
association rules algorithm, which uses an efficient two-pass method for finding
sequences.
Example
Expert
use_max_duration flag
max_duration number
use_gaps flag
min_item_gap number
max_item_gap number
use_pruning flag
pruning_value number
set_mem_sequences flag
mem_sequences integer
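A short sequencenode sketch for the expert options above (the stand-in class records calls so the snippet runs outside Modeler; the gap and duration values are illustrative):

```python
# Stand-in that records setPropertyValue calls (not the real Modeler API).
class StubNode(object):
    def __init__(self):
        self.properties = {}

    def setPropertyValue(self, name, value):
        self.properties[name] = value

seq = StubNode()
seq.setPropertyValue("use_max_duration", True)
seq.setPropertyValue("max_duration", 30.0)   # illustrative value
seq.setPropertyValue("use_gaps", True)
seq.setPropertyValue("min_item_gap", 1.0)
seq.setPropertyValue("max_item_gap", 10.0)
seq.setPropertyValue("use_pruning", True)
seq.setPropertyValue("pruning_value", 10.0)
```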
slrmnode properties
The Self-Learning Response Model (SLRM) node enables you to build a model in
which a single new case, or small number of new cases, can be used to reestimate
the model without having to retrain the model using all data.
Example
model_reliability flag
calculate_variable_importance flag
The properties for this node are described under “statisticsmodelnode properties” on page 410.
stpnode properties
The Spatio-Temporal Prediction (STP) node uses data that contains location data,
input fields for prediction (predictors), a time field, and a target field. Each location
has numerous rows in the data that represent the values of each predictor at each
time of measurement. After the data is analyzed, it can be used to predict target
values at any location within the shape data that is used in the analysis.
Quarters
Months
Weeks
Days
Hours
Minutes
Seconds
Table 146. stpnode properties (continued)
stpnode properties Data type Property description
interval_type_date Years
Quarters
Months
Weeks
Days
interval_type_time Hours
Minutes
Seconds
Limits the number of days per week that are taken into account when creating the time index that STP uses for calculation.
interval_type_integer Periods (Time index fields only, Integer storage) The interval to which the data set will be converted. The selection available is dependent on the storage type of the field that is chosen as the time_field for the model.
period_start integer
start_month January
February
March
April
May
June
July
August
September
October
November
December
The month the model will start to index from (for example, if set to March but the first record in the data set is January, the model will skip the first two records and start indexing at March).
Tuesday
Wednesday
Thursday
Friday
Saturday
days_per_week integer Minimum 1, maximum 7, in
increments of 1
hours_per_day integer The number of hours the model
accounts for in a day. If this is set
to 10, the model will start
indexing at the day_begins_at
time and continue indexing for 10
hours, then skip to the next value
matching the day_begins_at
value, etc.
day_begins_at 00:00 Sets the hour value that the model
starts indexing from.
01:00
02:00
03:00
...
23:00
interval_increment 1
2
3
4
10
12
15
20
30
This increment setting is for minutes or seconds. This determines where the model creates indexes from the data. So with an increment of 30 and interval type seconds, the model will create an index from the data every 30 seconds.
data_matches_interval Boolean If set to N, the conversion of the
data to the regular
interval_type occurs before
the model is built.
Median
1stQuartile
3rdQuartile
custom_agg [[field, aggregation method], ...] Structured property (script parameter: custom_agg). For example: [['x5' 'FirstQuartile'] ['x4' 'Sum']]
estimation_method Parametric The method for modeling the
spatial covariance matrix
Nonparametric
parametric_model Gaussian Order parameter for Parametric
spatial covariance model
Exponential
PoweredExponential
exponential_power number Power level for
PoweredExponential model.
Minimum 1, maximum 2.
Advanced tab
max_missing_values integer The maximum percentage of
records with missing values
allowed in the model.
significance number The significance level for
hypotheses testing in the model
build. Specifies the significance
value for all the tests in STP model
estimation, including two
Goodness of Fit tests, effect F-
tests, and coefficient t-tests.
Output tab
model_specifications flag
temporal_summary flag
location_summary flag Determines whether the Location
Summary table is included in the
model output.
model_quality flag
test_mean_structure flag
mean_structure_coefficients flag
autoregressive_coefficients flag
test_decay_space flag
parametric_spatial_covariance flag
correlations_heat_map flag
correlations_map flag
location_clusters flag
similarity_threshold number The threshold at which output
clusters are considered similar
enough to be merged into a single
cluster.
svmnode properties
The Support Vector Machine (SVM) node enables you to classify data into one of two
groups without overfitting. SVM works well with wide data sets, such as those with a
very large number of input fields.
Example
stopping_criteria 1.0E-3 (default)
1.0E-4
1.0E-5
1.0E-6
regularization number Also known as the C parameter.
precision number Used only if measurement level of
target field is Continuous.
Table 147. svmnode properties (continued)
svmnode Properties Values Property description
kernel RBF(default) Type of kernel function used for
the transformation.
Polynomial
Sigmoid
Linear
rbf_gamma number Used only if kernel is RBF.
gamma number Used only if kernel is
Polynomial or Sigmoid.
bias number
degree number Used only if kernel is
Polynomial.
calculate_variable_importance flag
calculate_raw_propensities flag
calculate_adjusted_propensities flag
adjusted_propensity_partition Test
Validation
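A short svmnode sketch for the kernel properties above (the stand-in class records calls so the snippet runs outside Modeler; the parameter values are illustrative):

```python
# Stand-in that records setPropertyValue calls (not the real Modeler API).
class StubNode(object):
    def __init__(self):
        self.properties = {}

    def setPropertyValue(self, name, value):
        self.properties[name] = value

svm = StubNode()
svm.setPropertyValue("kernel", "RBF")               # the default kernel
svm.setPropertyValue("rbf_gamma", 0.1)              # used only with RBF
svm.setPropertyValue("regularization", 10)          # the C parameter
svm.setPropertyValue("calculate_variable_importance", True)
```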
tcmnode Properties
Temporal causal modeling attempts to discover key causal relationships in time
series data. In temporal causal modeling, you specify a set of target series and a set
of candidate inputs to those targets. The procedure then builds an autoregressive
time series model for each target and includes only those inputs that have the most
significant causal relationship with the target.
Single
metric_fields fields
both_target_and_input [f1 ... fN]
targets [f1 ... fN]
Period
input_interval None
Unknown
Year
Quarter
Month
Week
Day
Hour
Hour_nonperiod
Minute
Minute_nonperiod
Second
Second_nonperiod
period_field string
period_start_value integer
num_days_per_week integer
Table 148. tcmnode properties (continued)
tcmnode Properties Values Property description
start_day_of_week Sunday
Monday
Tuesday
Wednesday
Thursday
Friday
Saturday
num_hours_per_day integer
start_hour_of_day integer
timestamp_increments integer
cyclic_increments integer
cyclic_periods list
output_interval None
Year
Quarter
Month
Week
Day
Hour
Minute
Second
is_same_interval Same
Notsame
cross_hour Boolean
aggregate_and_distribute list
aggregate_default Mean
Sum
Mode
Min
Max
distribute_default Mean
Sum
group_default Mean
Sum
Mode
Min
Max
missing_imput Linear_interp
Series_mean
K_mean
K_median
Linear_trend
None
k_mean_param integer
k_median_param integer
missing_value_threshold integer
conf_level integer
max_num_predictor integer
max_lag integer
epsilon number
threshold integer
is_re_est Boolean
num_targets integer
percent_targets integer
fields_display list
series_display list
network_graph_for_target Boolean
sign_level_for_target number
fit_and_outlier_for_target Boolean
sum_and_para_for_target Boolean
impact_diag_for_target Boolean
impact_diag_type_for_target Effect
Cause
Both
impact_diag_level_for_target integer
series_plot_for_target Boolean
res_plot_for_target Boolean
top_input_for_target Boolean
forecast_table_for_target Boolean
same_as_for_target Boolean
network_graph_for_series Boolean
sign_level_for_series number
fit_and_outlier_for_series Boolean
sum_and_para_for_series Boolean
impact_diagram_for_series Boolean
impact_diagram_type_for_series Effect
Cause
Both
impact_diagram_level_for_series integer
series_plot_for_series Boolean
residual_plot_for_series Boolean
Pivot
Both
rmsp_error Boolean
bic Boolean
r_square Boolean
outliers_over_time Boolean
series_transormation Boolean
use_estimation_period Boolean
estimation_period Times
Observation
observations list
observations_type Latest
Earliest
observations_num integer
observations_exclude integer
extend_records_into_future Boolean
forecastperiods integer
max_num_distinct_values integer
display_targets FIXEDNUMBER
PERCENTAGE
goodness_fit_measure ROOTMEAN
BIC
RSQUARE
top_input_for_series Boolean
aic Boolean
rmse Boolean
ts properties
The Time Series node estimates exponential smoothing, univariate Autoregressive
Integrated Moving Average (ARIMA), and multivariate ARIMA (or transfer function)
models for time series data and produces forecasts of future performance. This
Time Series node is similar to the previous Time Series node that was deprecated in
SPSS Modeler version 18. However, this newer Time Series node is designed to
harness the power of IBM SPSS Analytic Server to process big data, and to display
the resulting model in the output viewer that was added in SPSS Modeler version 17.
Unknown
Year
Quarter
Month
Week
Day
Hour
Hour_nonperiod
Minute
Minute_nonperiod
Second
Second_nonperiod
period_field field
period_start_value integer
num_days_per_week integer
start_day_of_week Sunday
Monday
Tuesday
Wednesday
Thursday
Friday
Saturday
num_hours_per_day integer
start_hour_of_day integer
timestamp_increments integer
Table 149. ts properties (continued)
ts Properties Values Property description
cyclic_increments integer
cyclic_periods list
output_interval None
Year
Quarter
Month
Week
Day
Hour
Minute
Second
is_same_interval flag
cross_hour flag
aggregate_and_distribute list
aggregate_default Mean
Sum
Mode
Min
Max
distribute_default Mean
Sum
group_default Mean
Sum
Mode
Min
Max
Series_mean
K_mean
K_median
Linear_trend
k_span_points integer
use_estimation_period flag
estimation_period Observations
Times
date_estimation list Only available if you use
date_time_field
period_estimation list Only available if you use
use_period
observations_type Latest
Earliest
observations_num integer
observations_exclude integer
method ExpertModeler
Exsmooth
Arima
expert_modeler_method ExpertModeler
Exsmooth
Arima
consider_seasonal flag
detect_outliers flag
expert_outlier_additive flag
expert_outlier_level_shift flag
expert_outlier_innovational flag
expert_outlier_transient flag
expert_outlier_seasonal_additive flag
expert_outlier_local_trend flag
expert_outlier_additive_patch flag
consider_newesmodels flag
exsmooth_model_type Simple Specifies the Exponential
Smoothing method.
HoltsLinearTrend Default is Simple.
BrownsLinearTrend
DampedTrend
SimpleSeasonal
WintersAdditive
WintersMultiplicative
DampedTrendAdditive
DampedTrendMultiplicative
MultiplicativeTrendAdditive
MultiplicativeSeasonal
MultiplicativeTrendMultiplicative
MultiplicativeTrend
set :ts.futureValue_type_method="specify"
set :ts.extend_metric_values=[{'Market_1','USER_SPECIFY',[1,2,3]},
{'Market_2','MOST_RECENT_VALUE',''},
{'Market_3','RECENT_POINTS_MEAN',''}]
exsmooth_transformation_type None
SquareRoot
NaturalLog
arima.p integer
arima.d integer
arima.q integer
arima.sp integer
arima.sd integer
arima.sq integer
arima_transformation_type None
SquareRoot
NaturalLog
arima_include_constant flag
tf_arima.p. fieldname integer For transfer functions.
tf_arima.d. fieldname integer For transfer functions.
tf_arima.q. fieldname integer For transfer functions.
tf_arima.sp. fieldname integer For transfer functions.
tf_arima.sd. fieldname integer For transfer functions.
tf_arima.sq. fieldname integer For transfer functions.
tf_arima.delay. fieldname integer For transfer functions.
tf_arima.transformation_type. fieldname None For transfer functions.
SquareRoot
NaturalLog
arima_detect_outliers flag
arima_outlier_additive flag
arima_outlier_level_shift flag
arima_outlier_innovational flag
arima_outlier_transient flag
arima_outlier_seasonal_additive flag
arima_outlier_local_trend flag
arima_outlier_additive_patch flag
max_lags integer
cal_PI flag
conf_limit_pct real
events fields
continue flag
scoring_model_only flag Use for models with very
large numbers (tens of
thousands) of time
series.
forecastperiods integer
extend_records_into_future flag
extend_metric_values fields Allows you to provide
future values for
predictors.
conf_limits flag
noise_res flag
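The extend_metric_values structured property in the table above takes one specification per series. A sketch of setting it through the Python scripting API, where node is assumed to be a handle to a ts node obtained elsewhere in the script; the Modeler API calls are commented out so the value structure itself can be checked outside Modeler:

```python
# Each entry names a series, a future-value method, and (for USER_SPECIFY)
# the explicit future values to use.
extend_values = [
    ["Market_1", "USER_SPECIFY", [1, 2, 3]],
    ["Market_2", "MOST_RECENT_VALUE", ""],
    ["Market_3", "RECENT_POINTS_MEAN", ""],
]
# node.setPropertyValue("extend_records_into_future", True)
# node.setPropertyValue("forecastperiods", 3)
# node.setPropertyValue("extend_metric_values", extend_values)
```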
Example
Exsmooth
Arima
Reuse
expert_modeler_method flag
Table 150. timeseriesnode properties (continued)
timeseriesnode Properties Values Property description
consider_seasonal flag
detect_outliers flag
expert_outlier_additive flag
expert_outlier_level_shift flag
expert_outlier_innovational flag
expert_outlier_transient flag
expert_outlier_seasonal_additive flag
expert_outlier_local_trend flag
expert_outlier_additive_patch flag
exsmooth_model_type Simple
HoltsLinearTrend
BrownsLinearTrend
DampedTrend
SimpleSeasonal
WintersAdditive
WintersMultiplicative
exsmooth_transformation_type None
SquareRoot
NaturalLog
arima_p integer
arima_d integer
arima_q integer
arima_sp integer
arima_sd integer
arima_sq integer
arima_transformation_type None
SquareRoot
NaturalLog
arima_detect_outlier_mode None
Automatic
arima_outlier_additive flag
arima_outlier_level_shift flag
arima_outlier_innovational flag
arima_outlier_transient flag
arima_outlier_seasonal_additive flag
arima_outlier_local_trend flag
arima_outlier_additive_patch flag
conf_limit_pct real
max_lags integer
events fields
scoring_model_only flag Use for models with very
large numbers (tens of
thousands) of time
series.
treeas properties
The Tree-AS node is similar to the existing CHAID node; however, the Tree-AS node
is designed to process big data to create a single tree and to display the resulting
model in the output viewer that was added in SPSS Modeler version 17. The node
generates a decision tree by using chi-square statistics (CHAID) to identify optimal
splits. This use of CHAID can generate nonbinary trees, meaning that some splits
have more than two branches. Target and input fields can be numeric range
(continuous) or categorical. Exhaustive CHAID is a modification of CHAID that does
a more thorough job of examining all possible splits but takes longer to compute.
exhaustive_chaid
max_depth integer Maximum tree depth, from 0 to 20. The
default value is 5.
num_bins integer Only used if the data is made up of
continuous inputs. Set the number of
equal frequency bins to be used for the
inputs; options are: 2, 4, 5, 10, 20, 25,
50, or 100.
record_threshold integer The number of records at which the
model will switch from using p-values
to Effect sizes while building the tree.
The default is 1,000,000; increase or
decrease this in increments of 10,000.
split_alpha number Significance level for splitting. The
value must be between 0.01 and 0.99.
merge_alpha number Significance level for merging. The
value must be between 0.01 and 0.99.
bonferroni_adjustment flag Adjust significance values using
Bonferroni method.
effect_size_threshold_cont number Set the Effect size threshold when splitting nodes and merging categories when using a continuous target. The value must be between 0.01 and 0.99.
effect_size_threshold_cat number Set the Effect size threshold when splitting nodes and merging categories when using a categorical target. The value must be between 0.01 and 0.99.
split_merged_categories flag Allow resplitting of merged categories.
minimum_record_use use_percentage
use_absolute
min_parent_records_pc number Default value is 2. Minimum 1,
maximum 100, in increments of 1.
Parent branch value must be higher
than child branch.
min_child_records_pc number Default value is 1. Minimum 1,
maximum 100, in increments of 1.
min_parent_records_abs number Default value is 100. Minimum 1,
maximum 100, in increments of 1.
Parent branch value must be higher
than child branch.
min_child_records_abs number Default value is 50. Minimum 1,
maximum 100, in increments of 1.
epsilon number Minimum change in expected cell frequencies.
max_iterations number Maximum iterations for convergence.
use_costs flag
costs structured Structured property. The format is a list
of 3 values: the actual value, the
predicted value, and the cost if that
prediction is wrong. For example:
tree.setPropertyValue("costs",
[["drugA", "drugB", 3.0], ["drugX",
"drugY", 4.0]])
default_cost_increase none Note: only enabled for ordinal targets.
square
custom
calculate_conf flag
display_rule_id flag Adds a field in the scoring output that
indicates the ID for the terminal node
to which each record is assigned.
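The costs structured property shown above is simply a list of [actual, predicted, cost] triples, so it can be assembled and inspected in plain Python before being passed to setPropertyValue. A minimal, illustrative sketch; the helper function is hypothetical (not part of the Modeler API), and the default cost of 1.0 for unlisted pairs is an assumption made here purely for illustration:

```python
# Each entry is [actual value, predicted value, cost of that misclassification],
# matching the structured format expected by tree.setPropertyValue("costs", ...).
costs = [
    ["drugA", "drugB", 3.0],
    ["drugX", "drugY", 4.0],
]

def misclassification_cost(cost_table, actual, predicted):
    """Illustrative helper: look up the configured cost for a wrong prediction.
    Correct predictions cost 0.0; unlisted pairs default to 1.0 here."""
    if actual == predicted:
        return 0.0
    for a, p, c in cost_table:
        if a == actual and p == predicted:
            return c
    return 1.0
```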
twostepnode Properties
The TwoStep node uses a two-step clustering method. The first step makes a single
pass through the data to compress the raw input data into a manageable set of
subclusters. The second step uses a hierarchical clustering method to progressively
merge the subclusters into larger and larger clusters. TwoStep has the advantage of
automatically estimating the optimal number of clusters for the training data. It can
handle mixed field types and large data sets efficiently.
Example
Number
label_prefix string
distance_measure Euclidean
Loglikelihood
BIC
twostepAS Properties
TwoStep Cluster is an exploratory tool that is designed to reveal natural groupings
(or clusters) within a data set that would otherwise not be apparent. The algorithm
that is employed by this procedure has several desirable features that differentiate
it from traditional clustering techniques, such as handling of categorical and
continuous variables, automatic selection of number of clusters, and scalability.
BIC
automatic_clustering_method use_clustering_criterion_setting
Distance_jump
Minimum
Maximum
feature_importance_method use_clustering_criterion_setting
effect_size
use_random_seed Boolean
random_seed integer
Table 153. twostepAS properties (continued)
twostepAS Properties Values Property description
distance_measure Euclidean
Loglikelihood
include_outlier_clusters Boolean Default=True
num_cases_in_feature_tree_leaf_is_less_than integer Default=10
top_perc_outliers integer Default=5
initial_dist_change_threshold integer Default=0
leaf_node_maximum_branches integer Default=8
non_leaf_node_maximum_branches integer Default=8
max_tree_depth integer Default=3
adjustment_weight_on_measurement_level integer Default=6
memory_allocation_mb number Default=512
delayed_split Boolean Default=True
fields_to_standardize [f1 ... fN]
adaptive_feature_selection Boolean Default=True
featureMisPercent integer Default=70
coefRange number Default=0.05
percCasesSingleCategory integer Default=95
numCases integer Default=24
include_model_specifications Boolean Default=True
include_record_summary Boolean Default=True
include_field_transformations Boolean Default=True
excluded_inputs Boolean Default=True
evaluate_model_quality Boolean Default=True
show_feature_importance_bar_chart Boolean Default=True
show_feature_importance_word_cloud Boolean Default=True
show_outlier_clusters_interactive_table_and_chart Boolean Default=True
show_outlier_clusters_pivot_table Boolean Default=True
across_cluster_feature_importance Boolean Default=True
across_cluster_profiles_pivot_table Boolean Default=True
Number
label_prefix String
Chapter 14. Model nugget node properties
Model nugget nodes share the same common properties as other nodes. See the topic “Common Node
Properties” on page 73 for more information.
applyanomalydetectionnode Properties
Anomaly Detection modeling nodes can be used to generate an Anomaly Detection model nugget. The
scripting name of this model nugget is applyanomalydetectionnode. For more information on scripting the
modeling node itself, see “anomalydetectionnode properties” on page 212.
ScoreOnly
num_fields integer Fields to report.
discard_records flag Indicates whether records are discarded
from the output or not.
discard_anomalous_records flag Indicator of whether to discard the
anomalous or non-anomalous records. The
default is off, meaning that non-anomalous
records are discarded. Otherwise, if on,
anomalous records will be discarded. This
property is enabled only if the
discard_records property is enabled.
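The interaction between discard_records and discard_anomalous_records described above can be sketched as a plain-Python filter. This is illustrative only; the nugget applies this logic internally, and the (value, is_anomalous) record shape is an assumption made here for the sketch:

```python
def filter_records(records, discard_records, discard_anomalous_records):
    """Illustrative sketch of the discard flags. Each record is a
    (value, is_anomalous) pair. When discard_records is off, everything is
    kept; when it is on, discard_anomalous_records selects which group is
    dropped (off drops non-anomalous records, on drops anomalous ones)."""
    if not discard_records:
        return records
    if discard_anomalous_records:
        # discard the anomalous records, keep the rest
        return [r for r in records if not r[1]]
    # default: discard the non-anomalous records
    return [r for r in records if r[1]]
```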
applyapriorinode Properties
Apriori modeling nodes can be used to generate an Apriori model nugget. The scripting name of this
model nugget is applyapriorinode. For more information on scripting the modeling node itself,
see “apriorinode properties” on page 213.
Predictions
NoCheck
Table 155. applyapriorinode properties (continued)
applyapriorinode Properties Values Property description
criterion Confidence
Support
RuleSupport
Lift
Deployability
applyassociationrulesnode Properties
The Association Rules modeling node can be used to generate an association rules model nugget. The
scripting name of this model nugget is applyassociationrulesnode. For more information on scripting the
modeling node itself, see “associationrulesnode properties” on page 215.
Lift
Conditionsupport
Deployability
allow_repeats Boolean Determine whether rules with the same
prediction are included in the score.
check_input NoPredictions
Predictions
NoCheck
applyautoclassifiernode Properties
Auto Classifier modeling nodes can be used to generate an Auto Classifier model nugget. The scripting
name of this model nugget is applyautoclassifiernode. For more information on scripting the modeling
node itself, see “autoclassifiernode properties” on page 218.
Table 157. applyautoclassifiernode properties
applyautoclassifiernode Values Property description
Properties
flag_ensemble_method Voting Specifies the method used to
determine the ensemble score.
EvaluationWeightedVoting This setting applies only if the
selected target is a flag field.
ConfidenceWeightedVoting
RawPropensityWeightedVoting
HighestConfidence
AverageRawPropensity
flag_evaluation_selection Accuracy This option is for flag targets only; it decides which evaluation measure is used for evaluation-weighted voting.
AUC_ROC
filter_individual_model_output flag Specifies whether scoring results from individual models should be suppressed.
is_ensemble_update flag Enables continuous auto
machine learning mode, which
adds new component models
into an existing auto model set
instead of replacing the existing
auto model, and re-evaluates
measures of existing component
models using newly available
data.
is_auto_ensemble_weights_reevaluation flag Enables automatic model weights reevaluation.
use_accumulated_factor flag Accumulated factor is used to
compute accumulated measures.
accumulated_factor number (double) Max value is 0.99, and min value
is 0.85.
use_accumulated_reducing flag Performs model reducing based
on accumulated limit during
model refresh.
accumulated_reducing_limit number (double) Max value is 0.7, and min value is 0.1.
use_accumulated_weighted_evaluation flag Accumulated evaluation measure is used for voting when the evaluation-weighted voting method is selected for the ensemble method.
RawPropensity
set_ensemble_method Voting Specifies the method used to
determine the ensemble score.
EvaluationWeightedVoting This setting applies only if the
selected target is a set field.
ConfidenceWeightedVoting
HighestConfidence
set_voting_tie_selection Random If a voting method is selected,
specifies how ties are resolved.
HighestConfidence This setting applies only if the
selected target is a nominal field.
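The difference between plain voting, confidence-weighted voting, and highest-confidence selection for a flag target can be illustrated in a few lines of plain Python. This is a sketch of the general idea only, not the nugget's exact computation:

```python
def ensemble_score(predictions, method="Voting"):
    """Illustrative combination of (predicted_flag, confidence) pairs from
    component models. "Voting" counts one vote per model,
    "ConfidenceWeightedVoting" weights each vote by the model's confidence,
    and "HighestConfidence" returns the most confident model's prediction."""
    if method == "HighestConfidence":
        # take the prediction of the single most confident component model
        return max(predictions, key=lambda p: p[1])[0]
    weights = {}
    for flag, conf in predictions:
        w = conf if method == "ConfidenceWeightedVoting" else 1.0
        weights[flag] = weights.get(flag, 0.0) + w
    return max(weights, key=weights.get)

# Two models predict False with low confidence; one predicts True strongly.
preds = [(True, 0.9), (False, 0.3), (False, 0.4)]
```

Plain voting favors the majority (False here), while confidence weighting lets the single strong model win.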
applyautoclusternode Properties
Auto Cluster modeling nodes can be used to generate an Auto Cluster model nugget. The scripting name
of this model nugget is applyautoclusternode. No other properties exist for this model nugget. For more
information on scripting the modeling node itself, see “autoclusternode properties” on page 220.
applyautonumericnode Properties
Auto Numeric modeling nodes can be used to generate an Auto Numeric model nugget. The scripting
name of this model nugget is applyautonumericnode. For more information on scripting the modeling node
itself, see “autonumericnode properties” on page 222.
applybayesnetnode Properties
Bayesian network modeling nodes can be used to generate a Bayesian network model nugget. The
scripting name of this model nugget is applybayesnetnode. For more information on scripting the
modeling node itself, see “bayesnetnode properties” on page 224.
Table 159. applybayesnetnode properties (continued)
applybayesnetnode Values Property description
Properties
calculate_raw_propensities flag
calculate_adjusted_propensities flag
applyc50node Properties
C5.0 modeling nodes can be used to generate a C5.0 model nugget. The scripting name of this model
nugget is applyc50node. For more information on scripting the modeling node itself, see “c50node
properties” on page 226.
NoMissingValues
calculate_conf flag Available when SQL generation is
enabled; this property includes
confidence calculations in the
generated tree.
calculate_raw_propensities flag
calculate_adjusted_propensities flag
applycarmanode Properties
CARMA modeling nodes can be used to generate a CARMA model nugget. The scripting name of this
model nugget is applycarmanode. No other properties exist for this model nugget. For more information
on scripting the modeling node itself, see “carmanode properties” on page 228.
applycartnode Properties
C&R Tree modeling nodes can be used to generate a C&R Tree model nugget. The scripting name of this
model nugget is applycartnode. For more information on scripting the modeling node itself, see “cartnode
properties” on page 229.
NoMissingValues
applychaidnode Properties
CHAID modeling nodes can be used to generate a CHAID model nugget. The scripting name of this model
nugget is applychaidnode. For more information on scripting the modeling node itself, see “chaidnode
properties” on page 232.
applycoxregnode Properties
Cox modeling nodes can be used to generate a Cox model nugget. The scripting name of this model
nugget is applycoxregnode. For more information on scripting the modeling node itself, see “coxregnode
properties” on page 234.
Fields
time_interval number
num_future_times integer
Table 163. applycoxregnode properties (continued)
applycoxregnode Properties Values Property description
time_field field
past_survival_time field
all_probabilities flag
cumulative_hazard flag
applydecisionlistnode Properties
Decision List modeling nodes can be used to generate a Decision List model nugget. The scripting name of
this model nugget is applydecisionlistnode. For more information on scripting the modeling node itself,
see “decisionlistnode properties” on page 237.
applydiscriminantnode Properties
Discriminant modeling nodes can be used to generate a Discriminant model nugget. The scripting name of
this model nugget is applydiscriminantnode. For more information on scripting the modeling node itself,
see “discriminantnode properties” on page 238.
applyextension properties
Extension Model nodes can be used to generate an Extension model nugget. The scripting name of this
model nugget is applyextension. For more information on scripting the modeling node itself, see
“extensionmodelnode properties” on page 240.
score_script = """
import json
import spss.pyspark.runtime
from pyspark.mllib.regression import LabeledPoint
from pyspark.mllib.linalg import DenseVector
from pyspark.mllib.tree import DecisionTreeModel
from pyspark.sql.types import StringType, StructField

cxt = spss.pyspark.runtime.getContext()

if cxt.isComputeDataModelOnly():
    _schema = cxt.getSparkInputSchema()
    _schema.fields.append(StructField("Prediction", StringType(), nullable=True))
    cxt.setSparkOutputSchema(_schema)
else:
    df = cxt.getSparkInputData()
    _modelPath = cxt.getModelContentToPath("TreeModel")
    metadata = json.loads(cxt.getModelContentToString("model.dm"))
    schema = df.dtypes[:]
    target = "Drug"
    predictors = ["Age","BP","Sex","Cholesterol","Na","K"]

    lookup = {}
    for i in range(0,len(schema)):
        lookup[schema[i][0]] = i

    def row2LabeledPoint(dm,lookup,target,predictors,row):
        target_index = lookup[target]
        tval = dm[target_index].index(row[target_index])
        pvals = []
        for predictor in predictors:
            predictor_index = lookup[predictor]
            if isinstance(dm[predictor_index],list):
                pval = row[predictor_index] in dm[predictor_index] and dm[predictor_index].index(row[predictor_index]) or -1
            else:
                pval = row[predictor_index]
            pvals.append(pval)
        return LabeledPoint(tval, DenseVector(pvals))

    def addPrediction(x,dm,lookup,target):
        result = []
        for _idx in range(0, len(x[0])):
            result.append(x[0][_idx])
        result.append(dm[lookup[target]][int(x[1])])
        return result

    # restore the saved tree model and score each input row
    treeModel = DecisionTreeModel.load(cxt.getSparkContext(), _modelPath)
    rdd = df.rdd.map(lambda row: row2LabeledPoint(metadata, lookup, target, predictors, row))
    predictions = treeModel.predict(rdd.map(lambda lp: lp.features))

    _schema = cxt.getSparkInputSchema()
    _schema.fields.append(StructField("Prediction", StringType(), nullable=True))
    rdd2 = df.rdd.zip(predictions).map(lambda x: addPrediction(x, metadata, lookup, target))
    outDF = cxt.getSparkSQLContext().createDataFrame(rdd2, _schema)
    cxt.setSparkOutputData(outDF)
"""
applyModel.setPropertyValue("python_syntax", score_script)
R example
#### script example for R
applyModel.setPropertyValue("r_syntax", """
result<-predict(modelerModel,newdata=modelerData)
modelerData<-cbind(modelerData,result)
var1<-c(fieldName="NaPrediction",fieldLabel="",fieldStorage="real",fieldMeasure="",
fieldFormat="",fieldRole="")
modelerDataModel<-data.frame(modelerDataModel,var1)""")
applyfactornode Properties
PCA/Factor modeling nodes can be used to generate a PCA/Factor model nugget. The scripting name of
this model nugget is applyfactornode. No other properties exist for this model nugget. For more
information on scripting the modeling node itself, see “factornode properties” on page 243.
applyfeatureselectionnode Properties
Feature Selection modeling nodes can be used to generate a Feature Selection model nugget. The
scripting name of this model nugget is applyfeatureselectionnode. For more information on scripting the
modeling node itself, see “featureselectionnode properties” on page 245.
applygeneralizedlinearnode Properties
Generalized Linear (genlin) modeling nodes can be used to generate a Generalized Linear model nugget.
The scripting name of this model nugget is applygeneralizedlinearnode. For more information on scripting
the modeling node itself, see “genlinnode properties” on page 247.
applyglmmnode Properties
GLMM modeling nodes can be used to generate a GLMM model nugget. The scripting name of this model
nugget is applyglmmnode. For more information on scripting the modeling node itself, see “glmmnode
properties” on page 252.
applygle Properties
The GLE modeling node can be used to generate a GLE model nugget. The scripting name of this model
nugget is applygle. For more information on scripting the modeling node itself, see “gle properties” on
page 257.
Table 170. applygle properties
applygle Properties Values Property description
enable_sql_generation udf Used to set SQL generation options during stream execution. Choose udf to pushback to the database and score using a SPSS Modeler Server scoring adapter (if connected to a database with a scoring adapter installed), or native to score within SPSS Modeler.
native
applygmm properties
The Gaussian Mixture node can be used to generate a Gaussian Mixture model nugget. The scripting name
of this model nugget is applygmm. The properties in the following table are available in version 18.2.1.1
and later. For more information on scripting the modeling node itself, see “gmm properties” on page 413.
applykmeansnode Properties
K-Means modeling nodes can be used to generate a K-Means model nugget. The scripting name of this
model nugget is applykmeansnode. No other properties exist for this model nugget. For more information
on scripting the modeling node itself, see “kmeansnode properties” on page 264.
applyknnnode Properties
KNN modeling nodes can be used to generate a KNN model nugget. The scripting name of this model
nugget is applyknnnode. For more information on scripting the modeling node itself, see “knnnode properties”
on page 266.
applykohonennode Properties
Kohonen modeling nodes can be used to generate a Kohonen model nugget. The scripting name of this
model nugget is applykohonennode. No other properties exist for this model nugget. For more information
on scripting the modeling node itself, see “kohonennode properties”.
applylinearasnode Properties
Linear-AS modeling nodes can be used to generate a Linear-AS model nugget. The scripting name of this
model nugget is applylinearasnode. For more information on scripting the modeling node itself, see
“linearasnode properties” on page 271.
applylogregnode Properties
Logistic Regression modeling nodes can be used to generate a Logistic Regression model nugget. The
scripting name of this model nugget is applylogregnode. For more information on scripting the modeling
node itself, see “logregnode properties” on page 272.
applylsvmnode Properties
LSVM modeling nodes can be used to generate an LSVM model nugget. The scripting name of this model
nugget is applylsvmnode. For more information on scripting the modeling node itself, see “lsvmnode
properties” on page 278.
applyneuralnetnode Properties
Neural Net modeling nodes can be used to generate a Neural Net model nugget. The scripting name of
this model nugget is applyneuralnetnode. For more information on scripting the modeling node itself, see
“neuralnetnode properties” on page 279.
Caution: A newer version of the Neural Net nugget, with enhanced features, is available in this release
and is described in the next section (applyneuralnetwork). Although the previous version is still available,
we recommend updating your scripts to use the new version. Details of the previous version are retained
here for reference, but support for it will be removed in a future release.
SoftMax
calculate_raw_propensities flag
calculate_adjusted_propensities flag
applyneuralnetworknode properties
Neural Network modeling nodes can be used to generate a Neural Network model nugget. The scripting
name of this model nugget is applyneuralnetworknode. For more information on scripting the modeling
node itself, see neuralnetworknode Properties.
onIncrease
score_category_probabilities flag
max_categories number
score_propensity flag
enable_sql_generation udf Used to set SQL generation options during stream execution. The options are udf, to pushback to the database and score using a SPSS® Modeler Server scoring adapter (if connected to a database with a scoring adapter installed); native, to score within SPSS Modeler; or puresql, to pushback to the database and score using SQL.
native
puresql
applyocsvmnode properties
One-Class SVM nodes can be used to generate a One-Class SVM model nugget. The scripting name of this
model nugget is applyocsvmnode. No other properties exist for this model nugget. For more information
on scripting the modeling node itself, see “ocsvmnode properties” on page 418.
applyquestnode Properties
QUEST modeling nodes can be used to generate a QUEST model nugget. The scripting name of this model
nugget is applyquestnode. For more information on scripting the modeling node itself, see “questnode
properties” on page 284.
NoMissingValues
calculate_conf flag
display_rule_id flag Adds a field in the scoring output that
indicates the ID for the terminal node
to which each record is assigned.
Table 179. applyquestnode properties (continued)
applyquestnode Properties Values Property description
calculate_raw_propensities flag
calculate_adjusted_propensities flag
applyr Properties
R Building nodes can be used to generate an R model nugget. The scripting name of this model nugget is
applyr. For more information on scripting the modeling node itself, see “buildr properties” on page 225.
applyrandomtrees Properties
The Random Trees modeling node can be used to generate a Random Trees model nugget. The scripting
name of this model nugget is applyrandomtrees. For more information on scripting the modeling node
itself, see “randomtrees properties” on page 287.
applyregressionnode Properties
Linear Regression modeling nodes can be used to generate a Linear Regression model nugget. The
scripting name of this model nugget is applyregressionnode. No other properties exist for this model
nugget. For more information on scripting the modeling node itself, see “regressionnode properties” on page
289.
applyselflearningnode properties
Self-Learning Response Model (SLRM) modeling nodes can be used to generate a SLRM model nugget.
The scripting name of this model nugget is applyselflearningnode. For more information on scripting the
modeling node itself, see “slrmnode properties” on page 293.
applysequencenode Properties
Sequence modeling nodes can be used to generate a Sequence model nugget. The scripting name of this
model nugget is applysequencenode. No other properties exist for this model nugget. For more
information on scripting the modeling node itself, see “sequencenode properties” on page 291.
applysvmnode Properties
SVM modeling nodes can be used to generate an SVM model nugget. The scripting name of this model
nugget is applysvmnode. For more information on scripting the modeling node itself, see “svmnode
properties” on page 300.
Table 183. applysvmnode properties
applysvmnode Properties Values Property description
all_probabilities flag
calculate_raw_propensities flag
calculate_adjusted_propensities flag
applystpnode Properties
The STP modeling node can be used to generate an associated model nugget, which displays the model
output in the Output Viewer. The scripting name of this model nugget is applystpnode. For more
information on scripting the modeling node itself, see “stpnode properties” on page 294.
applytcmnode Properties
Temporal Causal Modeling (TCM) modeling nodes can be used to generate a TCM model nugget. The
scripting name of this model nugget is applytcmnode. For more information on scripting the modeling
node itself, see “tcmnode Properties” on page 301.
applyts Properties
The Time Series modeling node can be used to generate a Time Series model nugget. The scripting name
of this model nugget is applyts. For more information on scripting the modeling node itself, see “ts
properties” on page 307.
applytreeas Properties
Tree-AS modeling nodes can be used to generate a Tree-AS model nugget. The scripting name of this
model nugget is applytreeas. For more information on scripting the modeling node itself, see “treeas
properties” on page 317.
applytwostepnode Properties
TwoStep modeling nodes can be used to generate a TwoStep model nugget. The scripting name of this
model nugget is applytwostepnode. No other properties exist for this model nugget. For more information
on scripting the modeling node itself, see “twostepnode Properties” on page 319.
applytwostepAS Properties
TwoStep AS modeling nodes can be used to generate a TwoStep AS model nugget. The scripting name of
this model nugget is applytwostepAS. For more information on scripting the modeling node itself, see
“twostepAS Properties” on page 320.
applyxgboosttreenode properties
The XGBoost Tree node can be used to generate an XGBoost Tree model nugget. The scripting name of
this model nugget is applyxgboosttreenode. The properties in the following table were added in 18.2.1.1.
For more information on scripting the modeling node itself, see “xgboosttreenode Properties” on page
426.
applyxgboostlinearnode properties
XGBoost Linear nodes can be used to generate an XGBoost Linear model nugget. The scripting name of
this model nugget is applyxgboostlinearnode. No other properties exist for this model nugget. For more
information on scripting the modeling node itself, see “xgboostlinearnode Properties” on page 424.
hdbscannugget properties
The HDBSCAN node can be used to generate an HDBSCAN model nugget. The scripting name of this
model nugget is hdbscannugget. No other properties exist for this model nugget. For more information
on scripting the modeling node itself, see “hdbscannode properties” on page 414.
kdeapply properties
The KDE Modeling node can be used to generate a KDE model nugget. The scripting name of this model
nugget is kdeapply. For information on scripting the modeling node itself, see “kdemodel properties” on
page 415.
Chapter 15. Database modeling node properties
IBM SPSS Modeler supports integration with data mining and modeling tools available from database
vendors, including Microsoft SQL Server Analysis Services, Oracle Data Mining, and IBM Netezza®
Analytics. You can build and score models using native database algorithms, all from within the IBM SPSS
Modeler application. Database models can also be created and manipulated through scripting using the
properties described in this section.
For example, the following script excerpt illustrates the creation of a Microsoft Decision Trees model by
using the IBM SPSS Modeler scripting interface:
stream = modeler.script.stream()
msbuilder = stream.createAt("mstreenode", "MSBuilder", 200, 200)
msbuilder.setPropertyValue("analysis_server_name", 'localhost')
msbuilder.setPropertyValue("analysis_database_name", 'TESTDB')
msbuilder.setPropertyValue("mode", 'Expert')
msbuilder.setPropertyValue("datasource", 'LocalServer')
msbuilder.setPropertyValue("target", 'Drug')
msbuilder.setPropertyValue("inputs", ['Age', 'Sex'])
msbuilder.setPropertyValue("unique_field", 'IDX')
msbuilder.setPropertyValue("custom_fields", True)
msbuilder.setPropertyValue("model_name", 'MSDRUG')
Common Properties
The following properties are common to the Microsoft database modeling nodes.
MS Decision Tree
There are no specific properties defined for nodes of type mstreenode. See the common Microsoft
properties at the start of this section.
MS Clustering
There are no specific properties defined for nodes of type msclusternode. See the common Microsoft
properties at the start of this section.
MS Association Rules
The following specific properties are available for nodes of type msassocnode:
MS Naive Bayes
There are no specific properties defined for nodes of type msbayesnode. See the common Microsoft
properties at the start of this section.
MS Linear Regression
There are no specific properties defined for nodes of type msregressionnode. See the common
Microsoft properties at the start of this section.
MS Neural Network
There are no specific properties defined for nodes of type msneuralnetworknode. See the common
Microsoft properties at the start of this section.
MS Logistic Regression
There are no specific properties defined for nodes of type mslogisticnode. See the common Microsoft
properties at the start of this section.
MS Time Series
There are no specific properties defined for nodes of type mstimeseriesnode. See the common
Microsoft properties at the start of this section.
MS Sequence Clustering
The following specific properties are available for nodes of type mssequenceclusternode:
Algorithm Parameters
Each Microsoft database model type has specific parameters that can be set using the
msas_parameters property. For example:
stream = modeler.script.stream()
msregressionnode = stream.findByType("msregression", None)
msregressionnode.setPropertyValue("msas_parameters",
[["MAXIMUM_INPUT_ATTRIBUTES", 255],
["MAXIMUM_OUTPUT_ATTRIBUTES", 255]])
These parameters are derived from SQL Server. To see the relevant parameters for each node:
1. Place a database source node on the canvas.
2. Open the database source node.
3. Select a valid source from the Data source drop-down list.
4. Select a valid table from the Table name list.
5. Click OK to close the database source node.
6. Attach the Microsoft database modeling node whose properties you want to list.
7. Open the database modeling node.
8. Select the Expert tab.
The available msas_parameters properties for this node are displayed.
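Once set, the structured value can also be inspected from a script. The sketch below builds the same value standalone; the getPropertyValue call is commented out because it only works inside a Modeler session.

```python
# Inside Modeler you could read the property back with:
#   params = msregressionnode.getPropertyValue("msas_parameters")
# The structured value is a list of [name, value] pairs:
params = [["MAXIMUM_INPUT_ATTRIBUTES", 255],
          ["MAXIMUM_OUTPUT_ATTRIBUTES", 255]]
lookup = dict((name, value) for name, value in params)  # name -> value map
print(lookup["MAXIMUM_OUTPUT_ATTRIBUTES"])
```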
MS Decision Tree
Table 195. MS Decision Tree properties
applymstreenode Properties Values Description
analysis_database_name string This node can be scored directly in a stream.
udf
MS Linear Regression
Table 196. MS Linear Regression properties
applymsregressionnode Properties Values Description
analysis_database_name string This node can be scored directly in a stream.
MS Neural Network
Table 197. MS Neural Network properties
applymsneuralnetworknode Properties Values Description
analysis_database_name string This node can be scored directly in a stream.
MS Logistic Regression
Table 198. MS Logistic Regression properties
applymslogisticnode Properties Values Description
analysis_database_name string This node can be scored directly in a stream.
MS Time Series
Table 199. MS Time Series properties
applymstimeseriesnode Properties Values Description
analysis_database_name string This node can be scored directly in a stream.
historical_prediction
MS Sequence Clustering
Table 200. MS Sequence Clustering properties
applymssequenceclusternode Properties Values Description
analysis_database_name string This node can be scored directly in a stream.
use_prediction_probability flag
prediction_probability string
use_prediction_set flag
Equal
Custom
custom_priors structured Structured property in the form:
set :oranbnode.custom_priors = [[drugA 1][drugB 2][drugC 3][drugX 4][drugY 5]]
* Property ignored if mode is set to Simple.
MultiFeature
NaiveBayes
use_execution_time_limit flag *
execution_time_limit integer Value must be greater than 0.*
max_naive_bayes_predictors integer Value must be greater than 0.*
max_predictors integer Value must be greater than 0.*
priors Data
Equal
Custom
custom_priors structured Structured property in the form:
set :oraabnnode.custom_priors = [[drugA 1][drugB 2][drugC 3][drugX 4][drugY 5]]
Disable
kernel_function Linear
Gaussian
System
minmax
none
kernel_cache_size integer Gaussian kernel only. Value must
be greater than 0.*
convergence_tolerance number Value must be greater than 0.*
use_standard_deviation flag Gaussian kernel only.*
standard_deviation number Value must be greater than 0.*
use_epsilon flag Regression models only.*
epsilon number Value must be greater than 0.*
use_complexity_factor flag *
complexity_factor number *
use_outlier_rate flag One-Class variant only.*
outlier_rate number One-Class variant only. 0.0–1.0.*
weights Data
Equal
Custom
custom_weights structured Structured property in the form:
set :orasvmnode.custom_weights = [[drugA 1][drugB 2][drugC 3][drugX 4][drugY 5]]
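The set example above uses the legacy scripting language. In Python scripting, structured values of this kind are expressed as lists of lists, following the same convention as the msas_parameters example elsewhere in this guide. The sketch below builds the value standalone; the setPropertyValue call is commented because it requires a Modeler session, and the node variable name is an assumption:

```python
# List-of-lists form of the custom_weights structured property.
custom_weights = [["drugA", 1], ["drugB", 2], ["drugC", 3],
                  ["drugX", 4], ["drugY", 5]]
# Inside Modeler, assuming svmnode is an Oracle SVM modeling node:
# svmnode.setPropertyValue("custom_weights", custom_weights)
print(len(custom_weights))
```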
minmax
none
Table 205. oraglmnode properties (continued)
oraglmnode Properties Values Property Description
missing_value_handling ReplaceWithMean
UseCompleteRecords
use_row_weights flag *
row_weights_field field *
save_row_diagnostics flag *
row_diagnostics_table string *
coefficient_confidence number *
use_reference_category flag *
reference_category string *
ridge_regression Auto *
Off
On
parameter_value number *
vif_for_ridge flag *
Gini
term_max_depth integer 2–20.*
term_minpct_node number 0.0–10.0.*
term_minpct_split number 0.0–20.0.*
term_minrec_node integer Value must be greater than 0.*
term_minrec_split integer Value must be greater than 0.*
display_rule_ids flag *
Oracle KMeans
The following properties are available for nodes of type orakmeansnode.
minmax
none
distance_function Euclidean
Cosine
Size
num_bins integer Value must be greater than 0.*
block_growth integer 1–5.*
min_pct_attr_support number 0.0–1.0.*
Oracle NMF
The following properties are available for nodes of type oranmfnode.
Table 209. oranmfnode properties
oranmfnode Properties Values Property Description
normalization_method minmax
none
use_num_features flag *
num_features integer 0–1. Default value is estimated from the data by
the algorithm.*
random_seed number *
num_iterations integer 0–500.*
conv_tolerance number 0.0–0.5.*
display_all_features flag *
Oracle Apriori
The following properties are available for nodes of type oraapriorinode.
ImportanceValue
TopN
select_important flag When selection_mode is set to
ImportanceLevel, specifies whether to select
important fields.
important_label string Specifies the label for the "important" ranking.
select_marginal flag When selection_mode is set to
ImportanceLevel, specifies whether to select
marginal fields.
marginal_label string Specifies the label for the "marginal" ranking.
important_above number 0.0–1.0.
select_unimportant flag When selection_mode is set to
ImportanceLevel, specifies whether to select
unimportant fields.
unimportant_label string Specifies the label for the "unimportant"
ranking.
unimportant_below number 0.0–1.0.
importance_value number When selection_mode is set to
ImportanceValue, specifies the cutoff value
to use. Accepts values from 0 to 100.
top_n number When selection_mode is set to TopN,
specifies the cutoff value to use. Accepts values
from 0 to 1000.
Table 212. applyoradecisiontreenode properties
applyoradecisiontreenode Properties Values Property Description
use_costs flag
display_rule_ids flag
Oracle O-Cluster
There are no specific properties defined for nodes of type applyoraoclusternode.
Oracle KMeans
There are no specific properties defined for nodes of type applyorakmeansnode.
Oracle NMF
The following property is available for nodes of type applyoranmfnode:
Oracle Apriori
This model nugget cannot be applied in scripting.
Oracle MDL
This model nugget cannot be applied in scripting.
where:
Table 215. netezzadectreenode properties (continued)
netezzadectreenode Properties Values Property Description
max_tree_depth integer Maximum number of levels to
which tree can grow. Default is
62 (the maximum possible).
min_improvement_splits number Minimum improvement in
impurity for split to occur. Default
is 0.01.
min_instances_split integer Minimum number of unsplit
records remaining before split
can occur. Default is 2 (the
minimum possible).
weights structured Relative weightings for classes. Structured property in the form:
set :netezza_dectree.weights = [[drugA 0.3][drugB 0.6]]
Canberra
maximum
num_clusters integer Number of clusters to be created; default is 3.
max_iterations integer Number of algorithm iterations after which to stop
model training; default is 5.
rand_seed integer Random seed to be used for replicating analysis
results; default is 12345.
nn-neighbors
Table 218. netezzanaivebayesnode properties (continued)
netezzanaivebayesnode Properties Values Property Description
use_m_estimation flag If true, uses m-estimation technique for avoiding
zero probabilities during estimation.
Netezza KNN
The following properties are available for nodes of type netezzaknnnode.
Canberra
Maximum
num_nearest_neighbors integer Number of nearest neighbors for a particular case;
default is 3.
standardize_measurements flag If true, standardizes measurements for continuous
input fields before calculating distance values.
use_coresets flag If true, uses core set sampling to speed up
calculation for large data sets.
Canberra
Maximum
max_iterations integer Maximum number of algorithm iterations to
perform before model training stops; default is 5.
Netezza PCA
The following properties are available for nodes of type netezzapcanode.
Table 222. netezzaregtreenode properties (continued)
netezzaregtreenode Properties Values Property Description
pruning_measure mse Method to be used for pruning.
r2
pearson
spearman
prune_tree_options allTrainingData Default is to use
allTrainingData to estimate
partitionTrainingData model accuracy. Use
partitionTrainingData to
specify a percentage of training
useOtherTable data to use, or useOtherTable
to use a training data set from a
specified database table.
perc_training_data number If prune_tree_options is set
to PercTrainingData,
specifies percentage of data to
use for training.
prune_seed integer Random seed to be used for
replicating analysis results when
prune_tree_options is set to
PercTrainingData; default is
1.
pruning_table string Table name of a separate pruning
dataset for estimating model
accuracy.
compute_probabilities flag If true, specifies that variances of
assigned classes should be
included in output.
ExponentialSmoothing or esmoothing
ARIMA
SeasonalTrendDecomposition or std
trend_name N Trend type for exponential smoothing:
A
DA
M
DM
N - none
A - additive
DA - damped additive
M - multiplicative
DM - damped multiplicative
Table 224. netezzatimeseriesnode properties (continued)
netezzatimeseriesnode Properties Values Property Description
seasonality_type N Seasonality type for exponential smoothing:
A
M
N - none
A - additive
M - multiplicative
interpolation_method linear Interpolation method to be used.
cubicspline
exponentialspline
timerange_setting SD Setting for time range to use:
SP
SD - system-determined (uses full range of time series data)
SP - user-specified via earliest_time and latest_time
earliest_time
latest_time integer
date
time Start and end values, if timerange_setting is SP. Format should follow the time_points value.
Example:
set NZ_DT1.timerange_setting = 'SP'
set NZ_DT1.earliest_time = '1921-01-01'
set NZ_DT1.latest_time = '2121-01-01'
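A hedged Python-scripting equivalent of the legacy set statements above: the property/value pairs can be checked standalone, while the node lookup and assignment (commented) require a Modeler session and assume a Netezza Time Series node is present in the stream.

```python
# Property/value pairs equivalent to the legacy set statements.
settings = [("timerange_setting", "SP"),
            ("earliest_time", "1921-01-01"),
            ("latest_time", "2121-01-01")]
# Inside Modeler:
# nz_dt1 = stream.findByType("netezzatimeseries", None)
# for name, value in settings:
#     nz_dt1.setPropertyValue(name, value)
print(settings[0][1])
```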
SD - system-determined
SP - user-specified
set NZ_DT1.algorithm_name = 'arima'
set NZ_DT1.arima_setting = 'SP'
set NZ_DT1.p_symbol = 'lesseq'
set NZ_DT1.d_symbol = 'lesseq'
set NZ_DT1.q_symbol = 'lesseq'
Table 224. netezzatimeseriesnode properties (continued)
netezzatimeseriesnode Properties Values Property Description
d integer ARIMA - non-seasonal derivation value.
sp integer ARIMA - seasonal degrees of autocorrelation.
sq integer ARIMA - seasonal number of moving average orders in the model.
sd integer ARIMA - seasonal derivation value.
advanced_setting SD Determines how advanced settings are to be handled:
SP
SD - system-determined
SP - user-specified
Example:
set NZ_DT1.advanced_setting = 'SP'
set NZ_DT1.period = 5
set NZ_DT1.units_period = 'd'
period integer Length of seasonal cycle,
specified in conjunction with
units_period. Not applicable
for spectral analysis.
y - years
timestamp
For example, if the
time_points field contains a
date, this should also be a date.
forecast_times integer
date
time
timestamp If forecast_setting = forecasttimes, specifies values to use for making forecasts. Format should follow the time_points value.
Table 224. netezzatimeseriesnode properties (continued)
netezzatimeseriesnode Properties Values Property Description
include_history flag Indicates if historical values are
to be included in output.
include_interpolated_values flag Indicates if interpolated values are to be included in output. Not applicable if include_history is false.
poisson
negativebinomial
wald
gamma
dist_params number Distribution parameter value to
use. Only applicable if
distribution is
Negativebinomial.
trials integer Only applicable if
distribution is Binomial.
When target response is a
number of events occurring in a
set of trials, target field
contains number of events, and
trials field contains number of
trials.
model_table field Name of database table where
Netezza generalized linear model
will be stored.
maxit integer Maximum number of iterations
the algorithm should perform;
default is 20.
invnegative
invsquare
sqrt
power
oddspower
log
clog
loglog
cloglog
logit
probit
gaussit
cauchit
canbinom
cangeom
cannegbinom
Table 225. netezzaglmnode properties (continued)
netezzaglmnode Properties Values Property Description
link_params number Link function parameter value to
use. Only applicable if
link_function is power or
oddspower.
interaction [[[colnames1],[levels1]], [[colnames2],[levels2]], ..., [[colnamesN],[levelsN]]] Specifies interactions between fields. colnames is a list of input fields, and level is always 0 for each field.
Example:
[[["K","BP","Sex","K"],[0,0,0,0]], [["Age","Na"],[0,0]]]
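As a sketch, the interaction value from the example can be built and sanity-checked in plain Python before being assigned; the assignment itself is a Modeler-only call (commented), and the node variable name is an assumption.

```python
# Each entry pairs a list of field names with a list of levels
# (always 0 per field), matching the example above.
interaction = [[["K", "BP", "Sex", "K"], [0, 0, 0, 0]],
               [["Age", "Na"], [0, 0]]]
for colnames, levels in interaction:
    assert len(colnames) == len(levels)        # one level entry per field
    assert all(level == 0 for level in levels)
# Inside Modeler, assuming glmnode is a netezzaglmnode:
# glmnode.setPropertyValue("interaction", interaction)
```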
Other model nugget properties are the same as those for the corresponding modeling node.
The script names of the model nuggets are as follows.
Chapter 16. Output node properties
Output node properties differ slightly from those of other node types. Rather than referring to a particular
node option, output node properties store a reference to the output object. This is useful, for example, for
taking a value from a table and then setting it as a stream parameter.
This section describes the scripting properties available for output nodes.
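As a hedged sketch of that pattern: the Modeler-only calls below are commented out, and the getValueAt accessor on the table output object is an assumption to verify against your release.

```python
# Pattern inside a Modeler script (commented; requires a Modeler session):
#   stream = modeler.script.stream()
#   tablenode = stream.findByType("table", None)
#   results = []
#   tablenode.run(results)                  # collects the output objects
#   table_output = results[0]               # reference to the built table
#   value = table_output.getValueAt(0, 0)   # row 0, column 0 (assumed accessor)
#   stream.setParameterValue("first_value", value)

# Standalone stand-in for the cell-extraction step:
rows = [["drugY", 0.82], ["drugX", 0.11]]
first_value = rows[0][0]
print(first_value)
```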
analysisnode properties
The Analysis node evaluates predictive models' ability to generate accurate
predictions. Analysis nodes perform various comparisons between predicted values
and actual values for one or more model nuggets. They can also compare predictive
models to each other.
Example
Output (.cou)
by_fields list
Table 228. analysisnode properties (continued)
analysisnode properties Data type Property description
full_filename string If disk, data, or HTML output,
the name of the output file.
coincidence flag
performance flag
evaluation_binary flag
confidence flag
threshold number
improve_accuracy number
field_detection_method Metadata
Name Determines how predicted fields are matched to the original target field. Specify Metadata or Name.
inc_user_measure flag
user_if expr
user_then expr
user_else expr
user_compute [Mean Sum Min Max SDev]
dataauditnode properties
The Data Audit node provides a comprehensive first look at the data, including
summary statistics, histograms and distribution for each field, as well as information
on outliers, missing values, and extremes. Results are displayed in an easy-to-read
matrix that can be sorted and used to generate full-size graphs and data preparation
nodes.
Example
Table 229. dataauditnode properties
dataauditnode properties Data type Property description
custom_fields flag
fields [field1 … fieldN]
overlay field
display_graphs flag Used to turn the display of
graphs in the output matrix on
or off.
basic_stats flag
advanced_stats flag
median_stats flag
calculate Count
Breakdown Used to calculate missing values. Select either, both, or neither calculation method.
outlier_detection_std_outlier number If outlier_detection_method is std, specifies the number to use to define outliers.
outlier_detection_std_extreme number If outlier_detection_method is std, specifies the number to use to define extreme values.
outlier_detection_iqr_outlier number If outlier_detection_method is iqr, specifies the number to use to define outliers.
outlier_detection_iqr_extreme number If outlier_detection_method is iqr, specifies the number to use to define extreme values.
use_output_name flag Specifies whether a custom
output name is used.
output_name string If use_output_name is true,
specifies the name to use.
output_mode Screen
File Used to specify target location for output generated from the output node.
HTML (.html)
Output (.cou)
paginate_output flag When the output_format is
HTML, causes the output to be
separated into pages.
lines_per_page number When used with
paginate_output, specifies
the lines per page of output.
full_filename string
extensionoutputnode properties
The Extension Output node enables you to analyze
data and the results of model scoring using your
own custom R or Python for Spark script. The
output of the analysis can be text or graphical. The
output is added to the Output tab of the manager
pane; alternatively, the output can be redirected to
a file.
python_script = """
import json
import spss.pyspark.runtime
cxt = spss.pyspark.runtime.getContext()
df = cxt.getSparkInputData()
schema = df.dtypes[:]
print(df)
"""
node.setPropertyValue("python_syntax", python_script)
R example
#### script example for R
node.setPropertyValue("syntax_type", "R")
node.setPropertyValue("r_syntax", "print(modelerData$Age)")
Table 230. extensionoutputnode properties
extensionoutputnode properties Data type Property description
syntax_type R
Python Specify which script runs, R or Python (R is the default).
r_syntax string R scripting syntax for model
scoring.
python_syntax string Python scripting syntax for
model scoring.
convert_flags StringsAndDoubles
LogicalValues Option to convert flag fields.
kdeexport properties
Kernel Density Estimation (KDE) uses the Ball Tree or KD Tree algorithms for
efficient queries, and combines concepts from unsupervised learning, feature
engineering, and data modeling. Neighbor-based approaches such as KDE are some
of the most popular and useful density estimation techniques. The KDE Modeling
and KDE Simulation nodes in SPSS Modeler expose the core features and commonly
used parameters of the KDE library. The nodes are implemented in Python.
matrixnode properties
The Matrix node creates a table that shows relationships between fields. It is most
commonly used to show the relationship between two symbolic fields, but it can
also show relationships between flag fields or numeric fields.
Example
node.setPropertyValue("highlight_top", 1)
node.setPropertyValue("highlight_bottom", 5)
node.setPropertyValue("display", ["Counts", "Expected", "Residuals"])
node.setPropertyValue("include_totals", True)
# "Output" tab
node.setPropertyValue("full_filename", "C:/output/matrix_output.html")
node.setPropertyValue("output_format", "HTML")
node.setPropertyValue("paginate_output", True)
node.setPropertyValue("lines_per_page", 50)
Flags
Numerics
row field
column field
include_missing_values flag Specifies whether user-missing
(blank) and system missing
(null) values are included in the
row and column output.
cell_contents CrossTabs
Function
function_field string
function Sum
Mean
Min
Max
SDev
sort_mode Unsorted
Ascending
Descending
highlight_top number If non-zero, then true.
highlight_bottom number If non-zero, then true.
Expected
Residuals
RowPct
ColumnPct
TotalPct]
include_totals flag
use_output_name flag Specifies whether a custom
output name is used.
output_name string If use_output_name is true,
specifies the name to use.
output_mode Screen
File Used to specify target location for output generated from the output node.
Output (.cou)
paginate_output flag When the output_format is
HTML, causes the output to be
separated into pages.
lines_per_page number When used with
paginate_output, specifies
the lines per page of output.
full_filename string
meansnode properties
The Means node compares the means between independent groups or between
pairs of related fields to test whether a significant difference exists. For example,
you could compare mean revenues before and after running a promotion or compare
revenues from customers who did not receive the promotion with those who did.
Example
node.setPropertyValue("label_correlations", True)
node.setPropertyValue("output_view", "Advanced")
node.setPropertyValue("output_mode", "File")
node.setPropertyValue("output_format", "HTML")
node.setPropertyValue("full_filename", "C:/output/means_output.html")
...]
label_correlations flag Specifies whether correlation
labels are shown in output.
This setting applies only when
means_mode is set to
BetweenFields.
correlation_mode Probability
Absolute Specifies whether to label correlations by probability or absolute value.
weak_label string
medium_label string
strong_label string
weak_below_probability number When correlation_mode is
set to Probability, specifies
the cutoff value for weak
correlations. This must be a
value between 0 and 1—for
example, 0.90.
strong_above_probability number Cutoff value for strong
correlations.
weak_below_absolute number When correlation_mode is
set to Absolute, specifies the
cutoff value for weak
correlations. This must be a
value between 0 and 1—for
example, 0.90.
strong_above_absolute number Cutoff value for strong
correlations.
Delimited (.csv)
HTML (.html)
Output (.cou)
full_filename string
output_view Simple
Advanced Specifies whether the simple or advanced view is displayed in the output.
reportnode properties
The Report node creates formatted reports containing fixed text as well as data and
other expressions derived from the data. You specify the format of the report using
text templates to define the fixed text and data output constructions. You can
provide custom text formatting by using HTML tags in the template and by setting
options on the Output tab. You can include data values and other conditional output
by using CLEM expressions in the template.
Example
Table 234. reportnode properties
reportnode properties Data type Property description
output_mode Screen
File Used to specify target location for output generated from the output node.
Output (.cou)
format Auto
Custom Used to choose whether output is automatically formatted or formatted using HTML included in the template. To use HTML formatting in the template, specify Custom.
use_output_name flag Specifies whether a custom
output name is used.
output_name string If use_output_name is true,
specifies the name to use.
text string
full_filename string
highlights flag
title string
lines_per_page number
routputnode properties
The R Output node enables you to analyze data and
the results of model scoring using your own custom
R script. The output of the analysis can be text or
graphical. The output is added to the Output tab of
the manager pane; alternatively, the output can be
redirected to a file.
convert_datetime flag
convert_datetime_class
POSIXct
POSIXlt
convert_missing flag
custom_name string
output_to
Screen
File
output_type
Graph
Text
full_filename string
graph_file_type
HTML
COU
text_file_type
HTML
TEXT
COU
setglobalsnode properties
The Set Globals node scans the data and computes summary values that can be
used in CLEM expressions. For example, you can use this node to compute statistics
for a field called age and then use the overall mean of age in CLEM expressions by
inserting the function @GLOBAL_MEAN(age).
Example
node.setKeyedPropertyValue("globals", "Age", ["Max", "Sum", "Mean", "SDev"])
clear_first flag
show_preview flag
simevalnode properties
The Simulation Evaluation node evaluates a specified predicted target field, and
presents distribution and correlation information about the target field.
category_groups
Categories
Iterations
create_pct_table boolean
pct_table
Quartiles
Intervals
Custom
pct_intervals_num number
pct_custom_values [number1...numberN]
use_source_node_name boolean
source_node_name string The custom name of the source
node that is either being
generated or updated.
use_cases
All
LimitFirstN
use_case_limit integer
fit_criterion
AndersonDarling
KolmogorovSmirnov
num_bins integer
parameter_xml_filename string
generate_parameter_import boolean
statisticsnode properties
The Statistics node provides basic summary information about numeric fields. It
calculates summary statistics for individual fields and correlations between fields.
Example
Table 239. statisticsnode properties
statisticsnode properties Data type Property description
use_output_name flag Specifies whether a custom
output name is used.
output_name string If use_output_name is true,
specifies the name to use.
output_mode Screen
File Used to specify target location for output generated from the output node.
Output (.cou)
full_filename string
examine list
correlate list
statistics [count mean sum
min max range
variance sdev
semean median
mode]
correlation_mode Probability
Absolute Specifies whether to label correlations by probability or absolute value.
label_correlations flag
weak_label string
medium_label string
strong_label string
weak_below_probability number When correlation_mode is
set to Probability, specifies
the cutoff value for weak
correlations. This must be a
value between 0 and 1—for
example, 0.90.
strong_above_probability number Cutoff value for strong
correlations.
weak_below_absolute number When correlation_mode is
set to Absolute, specifies the
cutoff value for weak
correlations. This must be a
value between 0 and 1—for
example, 0.90.
strong_above_absolute number Cutoff value for strong
correlations.
The properties for this node are described under “statisticsoutputnode Properties” on page 411.
tablenode properties
The Table node displays the data in table format, which can also be written to a file.
This is useful anytime that you need to inspect your data values or export them in an
easily readable form.
Example
Delimited (.csv)
HTML (.html)
Output (.cou)
transpose_data flag Transposes the data before export
so that rows represent fields and
columns represent records.
paginate_output flag When the output_format is HTML,
causes the output to be separated
into pages.
Table 240. tablenode properties (continued)
tablenode properties Data type Property description
lines_per_page number When used with
paginate_output, specifies the
lines per page of output.
highlight_expr string
output string A read-only property that holds a
reference to the last table built by
the node.
value_labels [[Value LabelString] Used to specify labels for value pairs.
COMMA
date_format "DDMMYY"
"MMDDYY"
"YYMMDD" Sets the date format for the field (applies only to fields with DATE or TIMESTAMP storage).
"YYYYMMDD"
"YYYYDDD"
DAY
MONTH
"DD-MM-YY"
"DD-MM-YYYY"
"MM-DD-YY"
"MM-DD-YYYY"
"DD-MON-YY"
"DD-MON-YYYY"
"YYYY-MM-DD"
"DD.MM.YY"
"DD.MM.YYYY"
"MM.DD.YYYY"
"DD.MON.YY"
"DD.MON.YYYY"
"DD/MM/YY"
"DD/MM/YYYY"
"MM/DD/YY"
"MM/DD/YYYY"
"DD/MON/YY"
"DD/MON/YYYY"
MON YYYY
q Q YYYY
ww WK YYYY
"MMSS"
"HH:MM:SS"
"HH:MM"
"MM:SS"
"(H)H:(M)M:(S)S"
"(H)H:(M)M"
"(M)M:(S)S"
"HH.MM.SS"
"HH.MM"
"MM.SS"
"(H)H.(M)M.(S)S"
"(H)H.(M)M"
"(M)M.(S)S"
column_width integer Sets the column width for the field. A
value of –1 will set column width to
Auto.
justify AUTO Sets the column justification for the
field.
CENTER
LEFT
RIGHT
transformnode properties
The Transform node allows you to select and visually preview the results of
transformations before applying them to selected fields.
Example
Chapter 17. Export Node Properties
asexport Properties
The Analytic Server export enables you to run a stream on Hadoop Distributed File System (HDFS).
Example
node.setPropertyValue("use_default_as", False)
node.setPropertyValue("connection",
["false","9.119.141.141","9080","analyticserver","ibm","admin","admin","false","","","",""])
cognosexportnode Properties
The IBM Cognos Export node exports data in a format that can be read by Cognos
databases.
For this node, you must define a Cognos connection and an ODBC connection.
Cognos connection
The properties for the Cognos connection are as follows.
Table 244. cognosexportnode properties

cognos_connection (["string","flag","string","string","string"]): A list property containing the connection details for the Cognos server. The format is: ["Cognos_server_URL", login_mode, "namespace", "username", "password"]. The login mode can also be storedCredentialMode, for example: ['Cognos_server_url', 'storedCredentialMode', "stored_credential_name"], where stored_credential_name is the name of a Cognos credential in the repository.
cognos_package (string): The name of the Cognos package to which you are exporting, for example: /Public Folders/MyPackage
cognos_datasource (string)
cognos_export_mode (Publish, ExportFile)
cognos_filename (string)
ODBC connection
The properties for the ODBC connection are identical to those listed for databaseexportnode in the
next section, with the exception that the datasource property is not valid.
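No scripted example appears for this node in this copy. A minimal sketch following the createAt pattern used elsewhere in this chapter and the connection format shown above; the node script name "cognosexport", the URL, and the anonymous-login settings are illustrative assumptions:

```python
stream = modeler.script.stream()
node = stream.createAt("cognosexport", "Cognos Export", 200, 200)
# URL and credentials are placeholders; True selects anonymous login mode
node.setPropertyValue("cognos_connection",
    ["http://myserver:9300/p2pd/servlet/dispatch", True, "", "", ""])
node.setPropertyValue("cognos_export_mode", "Publish")
```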
databaseexportnode properties
The Database export node writes data to an ODBC-compliant relational data source.
In order to write to an ODBC data source, the data source must exist and you must
have write permission for it.
Example
'''
Assumes a datasource named "MyDatasource" has been configured
'''
stream = modeler.script.stream()
db_exportnode = stream.createAt("databaseexport", "DB Export", 200, 200)
applynn = stream.findByType("applyneuralnetwork", None)
stream.link(applynn, db_exportnode)
# Export tab
db_exportnode.setPropertyValue("username", "user")
db_exportnode.setPropertyValue("datasource", "MyDatasource")
db_exportnode.setPropertyValue("password", "password")
db_exportnode.setPropertyValue("table_name", "predictions")
db_exportnode.setPropertyValue("write_mode", "Create")
db_exportnode.setPropertyValue("generate_import", True)
db_exportnode.setPropertyValue("drop_existing_table", True)
db_exportnode.setPropertyValue("delete_existing_rows", True)
db_exportnode.setPropertyValue("default_string_size", 32)
# Schema dialog
db_exportnode.setKeyedPropertyValue("type", "region", "VARCHAR(10)")
db_exportnode.setKeyedPropertyValue("export_db_primarykey", "id", True)
db_exportnode.setPropertyValue("use_custom_create_table_command", True)
db_exportnode.setPropertyValue("custom_create_table_command", "My SQL Code")
# Indexes dialog
db_exportnode.setPropertyValue("use_custom_create_index_command", True)
db_exportnode.setPropertyValue("custom_create_index_command",
    "CREATE BITMAP INDEX <index-name> ON <table-name> <(index-columns)>")
db_exportnode.setKeyedPropertyValue("indexes", "MYINDEX", ["fields", ["id", "region"]])
Append
Merge
map string Maps a stream field name to a
database column name (valid
only if write_mode is Merge).
Add
drop_existing_table flag
delete_existing_rows flag
default_string_size integer
type Structured property used to set
the schema type.
generate_import flag
use_custom_create_table_command (flag): Use the custom_create_table_command slot to modify the standard CREATE TABLE SQL command.
custom_create_table_command (string): Specifies a string command to use in place of the standard CREATE TABLE SQL command.
use_batch (flag): The following properties are advanced options for database bulk-loading. A True value for use_batch turns off row-by-row commits to the database.
batch_size number Specifies the number of records
to send to the database before
committing to memory.
bulk_loading Off Specifies the type of bulk-
loading. Additional options for
ODBC ODBC and External are listed
below.
External
not_logged flag
odbc_binding Row Specify row-wise or column-wise
binding for bulk-loading via
Column ODBC.
Other
loader_other_delimiter
Table 245. databaseexportnode properties (continued)
databaseexportnode Data type Property description
properties
specify_data_file flag A True flag activates the
data_file property below,
where you can specify the
filename and path to write to
when bulk-loading to the
database.
data_file string
specify_loader_program flag A True flag activates the
loader_program property
below, where you can specify the
name and location of an external
loader script or program.
loader_program string
gen_logfile flag A True flag activates the
logfile_name below, where
you can specify the name of a file
on the server to generate an error
log.
logfile_name string
check_table_size flag A True flag allows table checking
to ensure that the increase in
database table size corresponds
to the number of rows exported
from IBM SPSS Modeler.
loader_options string Specify additional arguments,
such as -comment and -
specialdir, to the loader
program.
export_db_primarykey flag Specifies whether a given field is
a primary key.
use_custom_create_index_command (flag): If true, enables custom SQL for all indexes.
custom_create_index_command (string): Specifies the SQL command used to create indexes when custom SQL is enabled. (This value can be overridden for specific indexes as indicated below.)
indexes.INDEXNAME.fields: Creates the specified index if necessary and lists field names to be included in that index.
indexes.INDEXNAME."use_custom_create_index_command" (flag): Used to enable or disable custom SQL for a specific index. See examples after the following table.
Note: For some databases, you can specify that database tables are created for export with compression
(for example, the equivalent of CREATE TABLE MYTABLE (...) COMPRESS YES; in SQL). The
properties use_compression and compression_mode are provided to support this feature, as follows.
The available compression_mode values are: All_Operations, Basic, OLTP, Query_High, Query_Low, Archive_High, Archive_Low.
Example showing how to change the CREATE INDEX command for a specific index:
db_exportnode.setKeyedPropertyValue("indexes", "MYINDEX",
    ["use_custom_create_index_command", True])
db_exportnode.setKeyedPropertyValue("indexes", "MYINDEX",
    ["custom_create_index_command",
     "CREATE BITMAP INDEX <index-name> ON <table-name> <(index-columns)>"])
datacollectionexportnode Properties
The Data Collection export node outputs data in the format used by Data Collection
market research software. A Data Collection Data Library must be installed to use
this node.
Example
stream = modeler.script.stream()
datacollectionexportnode = stream.createAt("datacollectionexport", "Data Collection", 200, 200)
datacollectionexportnode.setPropertyValue("metadata_file", "c:\\museums.mdd")
datacollectionexportnode.setPropertyValue("merge_metadata", "Overwrite")
datacollectionexportnode.setPropertyValue("casedata_file", "c:\\museumdata.sav")
datacollectionexportnode.setPropertyValue("generate_import", True)
datacollectionexportnode.setPropertyValue("enable_system_variables", True)
merge_metadata (Overwrite, MergeCurrent)
enable_system_variables (flag): Specifies whether the exported .mdd file should include Data Collection system variables.
casedata_file (string): The name of the .sav file to which case data is exported.
generate_import (flag)
Example
stream = modeler.script.stream()
excelexportnode = stream.createAt("excelexport", "Excel", 200, 200)
excelexportnode.setPropertyValue("full_filename", "C:/output/myexport.xlsx")
excelexportnode.setPropertyValue("excel_file_type", "Excel2007")
excelexportnode.setPropertyValue("inc_field_names", True)
excelexportnode.setPropertyValue("inc_labels_as_cell_notes", False)
excelexportnode.setPropertyValue("launch_application", True)
excelexportnode.setPropertyValue("generate_import", True)
Append
inc_field_names flag Specifies whether field names
should be included in the first
row of the worksheet.
start_cell string Specifies starting cell for
export.
worksheet_name string Name of the worksheet to be
written.
launch_application flag Specifies whether Excel should
be invoked on the resulting file.
Note that the path for
launching Excel must be
specified in the Helper
Applications dialog box (Tools
menu, Helper Applications).
generate_import flag Specifies whether an Excel
Import node should be
generated that will read the
exported data file.
extensionexportnode properties
Python for Spark example
#### script example for Python for Spark
import modeler.api
stream = modeler.script.stream()
node = stream.create("extension_export", "extension_export")
node.setPropertyValue("syntax_type", "Python")

python_script = """import spss.pyspark.runtime

cxt = spss.pyspark.runtime.getContext()
df = cxt.getSparkInputData()
print df.dtypes[:]
_newDF = df.select("Age","Drug")
print _newDF.dtypes[:]
"""
node.setPropertyValue("python_syntax", python_script)
R example
#### script example for R
node.setPropertyValue("syntax_type", "R")
node.setPropertyValue("r_syntax", """write.csv(modelerData, "C:/export.csv")""")
jsonexportnode Properties
The JSON export node outputs data in JSON format.
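No scripted example appears for this node in this copy. A minimal sketch following the pattern of the other export nodes; the node script name "jsonexport" and the property name "full_filename" are assumed by analogy rather than confirmed here:

```python
stream = modeler.script.stream()
node = stream.createAt("jsonexport", "JSON Export", 200, 200)
# "full_filename" is assumed by analogy with the other export nodes
node.setPropertyValue("full_filename", "c:/output/export.json")
```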
outputfilenode Properties
The Flat File export node outputs data to a delimited text file. It is useful for
exporting data that can be read by other analysis or spreadsheet software.
Example
stream = modeler.script.stream()
outputfile = stream.createAt("outputfile", "File Output", 200, 200)
outputfile.setPropertyValue("full_filename", "c:/output/flatfile_output.txt")
outputfile.setPropertyValue("write_mode", "Append")
outputfile.setPropertyValue("inc_field_names", False)
outputfile.setPropertyValue("use_newline_after_records", False)
outputfile.setPropertyValue("delimit_mode", "Tab")
outputfile.setPropertyValue("other_delimiter", ",")
outputfile.setPropertyValue("quote_mode", "Double")
outputfile.setPropertyValue("other_quote", "*")
outputfile.setPropertyValue("decimal_symbol", "Period")
outputfile.setPropertyValue("generate_import", True)
Append
inc_field_names flag
use_newline_after_records flag
delimit_mode Comma
Tab
Space
Other
Table 251. outputfilenode properties (continued)
outputfilenode properties Data type Property description
other_delimiter char
quote_mode None
Single
Double
Other
other_quote flag
generate_import flag
encoding StreamDefault
SystemDefault
"UTF-8"
sasexportnode Properties
The SAS export node outputs data in SAS format, to be read into SAS or a SAS-
compatible software package. Three SAS file formats are available: SAS for
Windows/OS2, SAS for UNIX, or SAS Version 7/8.
Example
stream = modeler.script.stream()
sasexportnode = stream.createAt("sasexport", "SAS Export", 200, 200)
sasexportnode.setPropertyValue("full_filename", "c:/output/SAS_output.sas7bdat")
sasexportnode.setPropertyValue("format", "SAS8")
sasexportnode.setPropertyValue("export_names", "NamesAndLabels")
sasexportnode.setPropertyValue("generate_import", True)
UNIX
SAS7
SAS8
full_filename string
export_names NamesAndLabels Used to map field names from
IBM SPSS Modeler upon export
NamesAsLabels to IBM SPSS Statistics or SAS
variable names.
statisticsexportnode Properties
The Statistics Export node outputs data in IBM SPSS Statistics .sav or .zsav format.
The .sav or .zsav files can be read by IBM SPSS Statistics Base and other
products. This is also the format used for cache files in IBM SPSS Modeler.
The properties for this node are described under “statisticsexportnode Properties” on page 411.
Table 253. tm1odataexport node properties (continued)
tm1odataexport node Data type Property description
properties
spss_field_to_tm1_element_mapping list The TM1 element to be mapped to must be part of the column dimension for the selected
cube view. The format is: [[[Field_1,
Dimension_1, False], [Element_1,
Dimension_2, True], ...],
[[Field_2, ExistMeasureElement,
False], [Field_3,
NewMeasureElement, True], ...]]
There are 2 lists to describe the mapping
information. Mapping a leaf element to a
dimension corresponds to example 2 below:
Example 1: The first list: ([[Field_1,
Dimension_1, False], [Element_1,
Dimension_2, True], ...]) is used for
the TM1 Dimension map information.
Each 3 value list indicates dimension mapping
information. The third Boolean value is used to
indicate if it selects an element of a
dimension. For example: "[Field_1,
Dimension_1, False]" means that
Field_1 is mapped to Dimension_1;
"[Element_1, Dimension_2, True]"
means that Element_1 is selected for
Dimension_2.
Example 2: The second list: ([[Field_2,
ExistMeasureElement, False],
[Field_3, NewMeasureElement,
True], ...]) is used for the TM1 Measure
Dimension Element map information.
Each 3 value list indicates measure element
mapping information. The third Boolean value
is used to indicate the need to create a new
element. "[Field_2,
ExistMeasureElement, False]" means
that Field_2 is mapped to the
ExistMeasureElement; "[Field_3,
NewMeasureElement, True]" means the
NewMeasureElement needs to be the
measure dimension chosen in
selected_measure and that Field_3 is
mapped to it.
selected_measure string Specify the measure dimension. Example:
setPropertyValue("selected_measure", "Measures")
connection_type AdminServer Indicates the connection type. Default is
TM1Server AdminServer.
admin_host string The URL for the host name of the REST API.
Required if the connection_type is
AdminServer.
Note: This node was deprecated in Modeler 18.0. The replacement node script name is tm1odataexport.
For example:
TM1_export.setPropertyValue("tm1_connection", ['Planning Sample', "admin", "apple"])
selected_cube field The name of the cube to which you are exporting data. For example:
TM1_export.setPropertyValue("selected_cube", "plan_BudgetPlan")
Table 254. tm1export node properties (continued)
tm1export node properties Data type Property description
spssfield_tm1element_mapping list The TM1 element to be mapped to must be part of the column dimension for the selected
cube view. The format is: [[[Field_1,
Dimension_1, False], [Element_1,
Dimension_2, True], ...],
[[Field_2, ExistMeasureElement,
False], [Field_3,
NewMeasureElement, True], ...]]
Example:
setPropertyValue("selected_measure", "Measures")
Example
stream = modeler.script.stream()
xmlexportnode = stream.createAt("xmlexport", "XML Export", 200, 200)
xmlexportnode.setPropertyValue("full_filename", "c:/export/data.xml")
xmlexportnode.setPropertyValue("map", [["/catalog/book/genre", "genre"], ["/catalog/book/title", "title"]])
Chapter 18. IBM SPSS Statistics Node Properties
statisticsimportnode Properties
The Statistics File node reads data from the .sav or .zsav file format used by IBM SPSS
Statistics, as well as cache files saved in IBM SPSS Modeler, which also use the same
format.
Example
stream = modeler.script.stream()
statisticsimportnode = stream.createAt("statisticsimport", "SAV Import",
200, 200)
statisticsimportnode.setPropertyValue("full_filename", "C:/data/drug1n.sav")
statisticsimportnode.setPropertyValue("import_names", True)
statisticsimportnode.setPropertyValue("import_data", True)
LabelsAsData
use_field_format_for_storage Boolean Specifies whether to use IBM SPSS Statistics field format information when importing.
statisticstransformnode properties
The Statistics Transform node runs a selection of IBM SPSS Statistics syntax
commands against data sources in IBM SPSS Modeler. This node requires a licensed
copy of IBM SPSS Statistics.
Example
stream = modeler.script.stream()
statisticstransformnode = stream.createAt("statisticstransform",
"Transform", 200, 200)
statisticstransformnode.setPropertyValue("syntax", "COMPUTE NewVar = Na +
K.")
statisticstransformnode.setKeyedPropertyValue("new_name", "NewVar", "Mixed Drugs")
statisticstransformnode.setPropertyValue("check_before_saving", True)
statisticsmodelnode properties
The Statistics Model node enables you to analyze and work with your data by
running IBM SPSS Statistics procedures that produce PMML. This node requires a
licensed copy of IBM SPSS Statistics.
Example
stream = modeler.script.stream()
statisticsmodelnode = stream.createAt("statisticsmodel", "Model", 200, 200)
statisticsmodelnode.setPropertyValue("syntax", "COMPUTE NewVar = Na + K.")
statisticsmodelnode.setKeyedPropertyValue("new_name", "NewVar", "Mixed Drugs")
statisticsoutputnode Properties
The Statistics Output node allows you to call an IBM SPSS Statistics procedure to
analyze your IBM SPSS Modeler data. A wide variety of IBM SPSS Statistics
analytical procedures is available. This node requires a licensed copy of IBM SPSS
Statistics.
Example
stream = modeler.script.stream()
statisticsoutputnode = stream.createAt("statisticsoutput", "Output", 200,
200)
statisticsoutputnode.setPropertyValue("syntax", "SORT CASES BY Age(A) Sex(A) BP(A) Cholesterol(A)")
statisticsoutputnode.setPropertyValue("use_output_name", False)
statisticsoutputnode.setPropertyValue("output_mode", "File")
statisticsoutputnode.setPropertyValue("full_filename", "Cases by Age, Sex and Medical History")
statisticsoutputnode.setPropertyValue("file_type", "HTML")
File
full_filename string
file_type HTML
SPV
SPW
statisticsexportnode Properties
The Statistics Export node outputs data in IBM SPSS Statistics .sav or .zsav format.
The .sav or .zsav files can be read by IBM SPSS Statistics Base and other
products. This is also the format used for cache files in IBM SPSS Modeler.
Example
stream = modeler.script.stream()
statisticsexportnode = stream.createAt("statisticsexport", "Export", 200,
200)
statisticsexportnode.setPropertyValue("full_filename", "c:/output/export.zsav")  # file name is illustrative
statisticsexportnode.setPropertyValue("file_type", "zsav")
encrypt_file flag Whether or not the file is password protected.
password string The password.
launch_application flag
export_names NamesAndLabels Used to map field names from IBM SPSS Modeler upon
export to IBM SPSS Statistics or SAS variable names.
NamesAsLabels
generate_import flag
Chapter 19. Python Node Properties
gmm properties
A Gaussian Mixture© model is a probabilistic model that assumes all the data points
are generated from a mixture of a finite number of Gaussian distributions with
unknown parameters. One can think of mixture models as generalizing k-means
clustering to incorporate information about the covariance structure of the data as
well as the centers of the latent Gaussians. The Gaussian Mixture node in SPSS
Modeler exposes the core features and commonly used parameters of the Gaussian
Mixture library. The node is implemented in Python.
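A minimal creation sketch for the node. The script name "gmm" follows the section title, and the property shown is an illustrative assumption rather than a confirmed slot name:

```python
stream = modeler.script.stream()
node = stream.create("gmm", "Gaussian Mixture")
# "covariance_type" is an assumed property name, for illustration only
node.setPropertyValue("covariance_type", "full")
```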
Table 261. hdbscannode properties (continued)
hdbscannode properties Data type Property description
allow_single_cluster boolean Specify true if you want to allow single
cluster results. Default is false.
p_value double Specify the p value to use if you're using
minkowski for the metric. Default is 1.5.
leaf_size integer If using a space tree algorithm
(boruvka_kdtree, or
boruvka_balltree), specify the
number of points in a leaf node of the tree.
Default is 40.
outputValidity boolean Specify true or false to control whether
the Validity Index chart is included in the
model output.
outputCondensed boolean Specify true or false to control whether
the Condensed Tree chart is included in
the model output.
outputSingleLinkage boolean Specify true or false to control whether
the Single Linkage Tree chart is included in
the model output.
outputMinSpan boolean Specify true or false to control whether
the Min Span Tree chart is included in the
model output.
is_split Added in version 18.2.1.1.
kdemodel properties
Kernel Density Estimation (KDE)© uses the Ball Tree or KD Tree algorithms for
efficient queries, and combines concepts from unsupervised learning, feature
engineering, and data modeling. Neighbor-based approaches such as KDE are some
of the most popular and useful density estimation techniques. The KDE Modeling
and KDE Simulation nodes in SPSS Modeler expose the core features and commonly
used parameters of the KDE library. The nodes are implemented in Python.
kdeexport properties
Kernel Density Estimation (KDE)© uses the Ball Tree or KD Tree algorithms for
efficient queries, and combines concepts from unsupervised learning, feature
engineering, and data modeling. Neighbor-based approaches such as KDE are some
of the most popular and useful density estimation techniques. The KDE Modeling
and KDE Simulation nodes in SPSS Modeler expose the core features and commonly
used parameters of the KDE library. The nodes are implemented in Python.
Table 263. kdeexport properties (continued)
kdeexport properties Data type Property description
kernel string The kernel to use: gaussian or tophat.
Default is gaussian.
algorithm string The tree algorithm to use: kd_tree,
ball_tree, or auto. Default is auto.
metric string The metric to use when calculating
distance. For the kd_tree algorithm,
choose from: Euclidean, Chebyshev,
Cityblock, Minkowski, Manhattan,
Infinity, P, L2, or L1. For the
ball_tree algorithm, choose from:
Euclidean, Braycurtis, Chebyshev,
Canberra, Cityblock, Dice, Hamming,
Infinity, Jaccard, L1, L2,
Minkowski, Matching, Manhattan, P,
Rogersanimoto, Russellrao,
Sokalmichener, Sokalsneath, or
Kulsinski. Default is Euclidean.
atol float The desired absolute tolerance of the
result. A larger tolerance will generally
lead to faster execution. Default is 0.0.
rtol float The desired relative tolerance of the
result. A larger tolerance will generally
lead to faster execution. Default is 1E-8.
breadthFirst boolean Set to True to use a breadth-first
approach. Set to False to use a depth-
first approach. Default is True.
LeafSize integer The leaf size of the underlying tree.
Default is 40. Changing this value may
significantly impact the performance.
pValue double Specify the P Value to use if you're using
Minkowski for the metric. Default is 1.5.
gmm properties
A Gaussian Mixture© model is a probabilistic model that assumes all the data points
are generated from a mixture of a finite number of Gaussian distributions with
unknown parameters. One can think of mixture models as generalizing k-means
clustering to incorporate information about the covariance structure of the data as
well as the centers of the latent Gaussians. The Gaussian Mixture node in SPSS
Modeler exposes the core features and commonly used parameters of the Gaussian
Mixture library. The node is implemented in Python.
ocsvmnode properties
The One-Class SVM node uses an unsupervised learning algorithm. The node can be
used for novelty detection. It will detect the soft boundary of a given set of samples,
to then classify new points as belonging to that set or not. This One-Class SVM
modeling node in SPSS Modeler is implemented in Python and requires the
scikit-learn© Python library.
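A minimal creation sketch using properties from the table below; the script name "ocsvm" is assumed from the section title:

```python
stream = modeler.script.stream()
node = stream.create("ocsvm", "One-Class SVM")
node.setPropertyValue("mode_type", "expert")
node.setPropertyValue("kernel", "rbf")
node.setPropertyValue("enable_gamma", True)
node.setPropertyValue("gamma", 0.1)
```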
Table 265. ocsvmnode properties (continued)
ocsvmnode properties Data type Property description
mode_type string The mode. Possible values are simple or
expert. All parameters on the Expert tab
will be disabled if simple is specified.
stopping_criteria string A string of scientific notation. Possible
values are 1.0E-1, 1.0E-2, 1.0E-3,
1.0E-4, 1.0E-5, or 1.0E-6. Default is
1.0E-3.
precision float The regression precision (nu). Bound on
the fraction of training errors and support
vectors. Specify a number greater than 0
and less than or equal to 1.0. Default is
0.1.
kernel string The kernel type to use in the algorithm.
Possible values are linear, poly, rbf,
sigmoid, or precomputed. Default is
rbf.
enable_gamma Boolean Enables the gamma parameter. Specify
true or false. Default is true.
gamma float This parameter is only enabled for the
kernels rbf, poly, and sigmoid. If the
enable_gamma parameter is set to
false, this parameter will be set to auto.
If set to true, the default is 0.1.
coef0 float Independent term in the kernel function.
This parameter is only enabled for the
poly kernel and the sigmoid kernel.
Default value is 0.0.
degree integer Degree of the polynomial kernel function.
This parameter is only enabled for the
poly kernel. Specify any integer. Default
is 3.
shrinking Boolean Specifies whether to use the shrinking
heuristic option. Specify true or false.
Default is false.
enable_cache_size Boolean Enables the cache_size parameter.
Specify true or false. Default is false.
cache_size float The size of the kernel cache in MB. Default
is 200.
pc_type string The type of the parallel coordinates
graphic. Possible options are
independent or general.
lines_amount integer Maximum number of lines to include on
the graphic. Specify an integer between 1
and 1000.
rfnode properties
Table 266. rfnode properties (continued)
rfnode properties Data type Property description
n_estimators integer Number of trees to build. Default is 10.
specify_max_depth Boolean Specify custom max depth. If false,
nodes are expanded until all leaves are
pure or until all leaves contain less than
min_samples_split samples. Default is
false.
max_depth integer The maximum depth of the tree. Default is
10.
min_samples_leaf integer Minimum leaf node size. Default is 1.
max_features string The number of features to consider when
looking for the best split:
• If auto, then
max_features=sqrt(n_features)
for classification and
max_features=n_features for
regression.
• If sqrt, then
max_features=sqrt(n_features).
• If log2, then max_features=log2
(n_features).
Default is auto.
bootstrap Boolean Use bootstrap samples when building
trees. Default is true.
oob_score Boolean Use out-of-bag samples to estimate the
generalization accuracy. Default value is
false.
extreme Boolean Use extremely randomized trees. Default
is false.
use_random_seed Boolean Specify this to get replicated results.
Default is false.
random_seed integer The random number seed to use when
building trees. Specify any integer.
cache_size float The size of the kernel cache in MB. Default
is 200.
enable_random_seed Boolean Enables the random_seed parameter.
Specify true or false. Default is false.
enable_hpo Boolean Specify true or false to enable or
disable the HPO options. If set to true,
Rbfopt will be applied to determine the
"best" Random Forest model
automatically, which reaches the target
objective value defined by the user with
the following target_objval parameter.
smotenode Properties
The Synthetic Minority Over-sampling Technique (SMOTE) node provides an over-
sampling algorithm to deal with imbalanced data sets. It provides an advanced
method for balancing data. The SMOTE process node in SPSS Modeler is
implemented in Python and requires the imbalanced-learn© Python library.
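The node wraps the imbalanced-learn implementation, but the core idea is simple: SMOTE creates each synthetic minority sample by interpolating between a minority point and one of its minority-class neighbors. The following standard-library sketch illustrates only that interpolation step; it is a conceptual illustration, not the node's actual code:

```python
import random

def synthesize(x, neighbor, rng):
    # Place one synthetic sample at a random position on the
    # line segment between x and its minority-class neighbor.
    gap = rng.random()  # uniform in [0, 1)
    return [xi + gap * (ni - xi) for xi, ni in zip(x, neighbor)]

rng = random.Random(0)
sample = synthesize([1.0, 2.0], [3.0, 6.0], rng)
# each coordinate of the synthetic sample lies between the two originals
```

Repeating this for many minority points (with different random gaps and neighbors) is what rebalances the class distribution.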
Table 267. smotenode properties (continued)
smotenode properties Data type Property description
algorithm_kind string The type of SMOTE algorithm: regular, borderline1, or borderline2. Renamed to algorithm starting with version 18.2.1.1.
usepartition Boolean If set to true, only training data will be used for model building. Default is true. Renamed to use_partition starting with version 18.2.1.1.
tsnenode Properties
xgboostlinearnode Properties
XGBoost Linear© is an advanced implementation of a gradient boosting algorithm
with a linear model as the base model. Boosting algorithms iteratively learn weak
classifiers and then add them to a final strong classifier. The XGBoost Linear node in
SPSS Modeler is implemented in Python.
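A minimal creation sketch; TargetField comes from the table below, while the script name "xgboostlinear" is assumed from the section title:

```python
stream = modeler.script.stream()
node = stream.create("xgboostlinear", "XGBoost Linear")
node.setPropertyValue("TargetField", "Drug")
```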
Table 269. xgboostlinearnode properties
xgboostlinearnode properties Data type Property description
TargetField field
Table 270. xgboosttreenode properties (continued)
xgboosttreenode properties Data type Property description
objectiveType string The objective type for the learning task. Possible values are reg:linear, reg:logistic, reg:gamma, reg:tweedie, count:poisson, rank:pairwise, binary:logistic, or multi. Note that for flag targets, only binary:logistic or multi can be used. If multi is used, the score result will show the multi:softmax and multi:softprob XGBoost objective types. Renamed to objective_type starting with version 18.2.1.1.
earlyStopping Boolean Whether to use the early stopping function. Default is False. Renamed to early_stopping starting with version 18.2.1.1.
earlyStoppingRounds integer Validation error needs to decrease at least every early stopping round(s) to continue training. Default is 10. Renamed to early_stopping_rounds starting with version 18.2.1.1.
evaluationDataRatio Double Ratio of input data used for validation errors. Default is 0.3. Renamed to evaluation_data_ratio starting with version 18.2.1.1.
random_seed integer The random number seed. Any number
between 0 and 9999999. Default is 0.
sampleSize Double The subsample ratio, used to control overfitting. Specify a value between 0.1 and 1.0. Default is 0.1. Renamed to sample_size starting with version 18.2.1.1.
eta Double The eta, used to control overfitting. Specify a value between 0 and 1. Default is 0.3.
gamma Double The gamma, used to control overfitting. Specify any number 0 or greater. Default is 6.
colsSampleRatio Double The colsample by tree, used to control overfitting. Specify a value between 0.01 and 1. Default is 1. Renamed to col_sample_ratio starting with version 18.2.1.1.
colsSampleLevel Double The colsample by level, used to control overfitting. Specify a value between 0.01 and 1. Default is 1. Renamed to col_sample_level starting with version 18.2.1.1.
lambda Double The lambda, used to control overfitting. Specify any number 0 or greater. Default is 1.
Chapter 20. Spark Node Properties
isotonicasnode Properties
kmeansasnode properties
K-Means is one of the most commonly used clustering algorithms. It clusters data
points into a predefined number of clusters. The K-Means-AS node in SPSS Modeler
is implemented in Spark. For details about K-Means algorithms, see
https://spark.apache.org/docs/2.2.0/ml-clustering.html. Note that the K-Means-AS node
performs one-hot encoding automatically for categorical variables.
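The automatic one-hot encoding mentioned above simply expands each categorical field into a set of binary indicator fields before clustering. A minimal standard-library sketch of the idea (not the node's internal implementation):

```python
def one_hot(values):
    # Expand a categorical column into binary indicator vectors,
    # one position per distinct category (sorted for stability).
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    return [[1 if index[v] == i else 0 for i in range(len(categories))]
            for v in values]

encoded = one_hot(["red", "blue", "red"])
# → [[0, 1], [1, 0], [0, 1]]  (columns: blue, red)
```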
multilayerperceptronnode Properties
Multilayer perceptron is a classifier based on the feedforward artificial neural
network and consists of multiple layers. Each layer is fully connected to the next
layer in the network. The MultiLayerPerceptron-AS node in SPSS Modeler is
implemented in Spark. For details about the multilayer perceptron classifier (MLPC),
see https://spark.apache.org/docs/latest/ml-classification-regression.html#multilayer-perceptron-classifier.
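A minimal creation sketch; maxiter comes from the table below, while the script name "multilayerperceptron" is assumed from the section title:

```python
stream = modeler.script.stream()
node = stream.create("multilayerperceptron", "MLP-AS")
node.setPropertyValue("maxiter", 100)
```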
Table 273. multilayerperceptronnode properties (continued)
multilayerperceptronnode Data type Property description
properties
maxiter integer The maximum number of iterations to
perform. Default is 10.
xgboostasnode Properties
XGBoost is an advanced implementation of a gradient boosting algorithm. Boosting
algorithms iteratively learn weak classifiers and then add them to a final strong
classifier. XGBoost is very flexible and provides many parameters that can be
overwhelming to most users, so the XGBoost-AS node in SPSS Modeler exposes the
core features and commonly used parameters. The XGBoost-AS node is
implemented in Spark.
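A minimal creation sketch; the dart-related properties come from the table below, while the script name "xgboostas" is assumed from the section title:

```python
stream = modeler.script.stream()
node = stream.create("xgboostas", "XGBoost-AS")
node.setPropertyValue("normalizeType", "tree")
node.setPropertyValue("sampleType", "uniform")
```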
Table 274. xgboostasnode properties (continued)

xgboostasnode properties   Data type   Property description
colsSampleLevel            Double      The subsample ratio of columns for each split, in each level. Specify a value between 0.01 and 1. Default is 1.
normalizeType              string      If the dart booster type is used, this parameter and the following three dart parameters are available. This parameter sets the normalization algorithm. Specify tree or forest. Default is tree.
sampleType                 string      The sampling algorithm type. Specify uniform or weighted. Default is uniform.
rateDrop                   Double      The dropout rate dart booster parameter. Specify a value between 0.0 and 1.0. Default is 0.0.
skipDrop                   Double      The dart booster parameter for the probability of skip dropout. Specify a value between 0.0 and 1.0. Default is 0.0.
SuperNode Properties

Properties that are specific to SuperNodes are described in the following tables. Note that common node
properties also apply to SuperNodes.
SuperNode properties   Data type
script                 string      The SuperNode script; used when the SuperNode's execution method is set to "Script" rather than "Normal".
SuperNode Parameters
You can use scripts to create or set SuperNode parameters using the general format:

mySuperNode.setParameterValue("minvalue", 30)

You can retrieve the parameter value with:

value = mySuperNode.getParameterValue("minvalue")
You can access nodes within the SuperNode through its child diagram, for example:

childDiagram = source_supernode.getChildDiagram()
varfilenode = childDiagram.findByType("variablefile", None)
varfilenode.setPropertyValue("full_filename", "c:/mydata.txt")
Creating SuperNodes
If you want to create a SuperNode and its content from scratch, you can do so in a similar way: create the
SuperNode, access its child diagram, and create the nodes you want. You must also ensure that the nodes
within the SuperNode diagram are linked to the input and/or output connector nodes. This applies, for
example, when you create a process SuperNode.
Appendix A. Node names reference
This section provides a reference for the scripting names of the nodes in IBM SPSS Modeler.
Table 277. Model Nugget Names (Database Modeling Palette) (continued)
Model name Model
netezzatimeseries Netezza Time Series
oraabn Oracle Adaptive Bayes
oraai Oracle AI
oradecisiontree Oracle Decision Tree
oraglm Oracle GLM
orakmeans Oracle k-Means
oranb Oracle Naive Bayes
oranmf Oracle NMF
oraocluster Oracle O-Cluster
orasvm Oracle SVM
Table 278. Output object types and the nodes that create them
Output object type Node
analysisoutput Analysis
collectionoutput Collection
dataauditoutput Data Audit
distributionoutput Distribution
evaluationoutput Evaluation
Appendix B. Migrating from legacy scripting to
Python scripting
General differences
Legacy scripting owes much of its design to OS command scripts. Legacy scripting is line oriented, and
although there are some block structures, for example if...then...else...endif and
for...endfor, indentation is generally not significant.
In Python scripting, indentation is significant and lines belonging to the same logical block must be
indented by the same level.
Note: You must take care when copying and pasting Python code. A line that is indented using tabs might
look the same in the editor as a line that is indented using spaces. However, the Python script will
generate an error because the lines are not considered as equally indented.
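As a minimal illustration (not taken from the guide), the following loop runs only because every line of its body shares the same indentation:

```python
# Indentation defines block structure in Python: lines in the same
# logical block must be indented by the same amount, and mixing tabs
# with spaces can make visually identical lines unequal.
values = [5, 12, 9]
total = 0
for v in values:
    doubled = v * 2   # same indentation, so these two lines
    total += doubled  # form a single loop body
print(total)  # prints 52
```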
s = modeler.script.stream()

Stream-related functions can then be invoked through the returned object.
Python uses functions that are usually invoked through an object (a module, class, or object) that defines
the function, for example:
stream = modeler.script.stream()
typenode = stream.findByType("type", "Type")
filternode = stream.findByType("filter", None)
stream.link(typenode, filternode)
derive.setLabel("Compute Total")
Literals and comments
Some literal and comment commands that are commonly used in IBM SPSS Modeler have equivalent
commands in Python scripting. This might help you to convert your existing SPSS Modeler Legacy scripts
to Python scripts for use in IBM SPSS Modeler 17.
Table 279. Legacy scripting to Python scripting mapping for literals and comments
Legacy scripting Python scripting
Integer, for example 4 Same
Float, for example 0.003 Same
Single quoted strings, for example ‘Hello’ Same
Note: String literals containing non-ASCII
characters must be prefixed by a u to ensure that
they are represented as Unicode.
"""This is a string
that spans multiple
lines"""
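These literal forms can be checked in plain Python (a standalone sketch, not from the guide):

```python
# The u prefix marks a Unicode string literal; it is required for
# non-ASCII characters in Jython (Python 2) and is harmless in Python 3.
greeting = u"café"

# Triple-quoted strings may span multiple lines.
message = """This is a string
that spans multiple
lines"""

print(len(greeting))              # 4
print(len(message.splitlines()))  # 3
```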
Operators
Some operator commands that are commonly used in IBM SPSS Modeler have equivalent commands in
Python scripting. This might help you to convert your existing SPSS Modeler Legacy scripts to Python
scripts for use in IBM SPSS Modeler 17.
Table 280. Legacy scripting to Python scripting mapping for operators

Legacy scripting    Python scripting
= or ==             ==
/= or /==           !=
X ** Y              X ** Y
X < Y               X < Y
X <= Y              X <= Y
X > Y               X > Y
X >= Y              X >= Y
X div Y             X // Y
X rem Y             X % Y
X mod Y             X % Y
and                 and
or                  or
not(EXPR)           not EXPR
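The arithmetic rows of this mapping can be verified directly in Python (an illustrative snippet, not from the guide):

```python
# Legacy div maps to Python's floor division //, legacy rem and mod
# both map to %, and the legacy inequality /= maps to !=.
print(17 // 5)   # 3
print(17 % 5)    # 2
print(3 != 4)    # True
print(2 ** 10)   # 1024 (exponentiation is ** in both languages)
```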
Table 281. Legacy scripting to Python scripting mapping for conditionals and looping
Legacy scripting               Python scripting

for VAR from INT1 to INT2      VAR = INT1
...                            while VAR <= INT2:
endfor                             ...
                                   VAR += 1

if...then                      if ...:
...                                ...
elseif...then                  elif ...:
...                                ...
else                           else:
...                                ...
endif

with TYPE OBJECT               No equivalent
...
endwith
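A minimal sketch of the if/elif/else form that replaces if...then / elseif...then / endif (the function, values, and thresholds are made up for illustration):

```python
# Python ends a conditional block by dedenting; there is no endif.
def classify(score):
    if score >= 90:
        return "high"
    elif score >= 50:
        return "medium"
    else:
        return "low"

print(classify(95))  # high
print(classify(60))  # medium
print(classify(10))  # low
```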
Variables
In legacy scripting, variables are declared before they are referenced, for example:
var mynode
set mynode = create typenode at 96 96
In Python scripting, variables are created when they are first referenced, for example:

mynode = stream.createAt("type", "Type", 96, 96)
In legacy scripting, references to variables must be explicitly removed using the ^ operator, for example:
var mynode
set mynode = create typenode at 96 96
set ^mynode.direction."Age" = Input
As in most scripting languages, this is not necessary in Python scripting, for example:

mynode.setKeyedPropertyValue("direction", "Age", "Input")
The IBM SPSS Modeler API in Python does not include the node suffix, so the Derive node has the type
derive. The only difference between type names in legacy and Python scripting is the lack of this suffix.
Property names
Property names are the same in both legacy and Python scripting. For example, in the Variable File node,
the property that defines the file location is full_filename in both scripting environments.
Node references
Many legacy scripts use an implicit search to find and access the node to be modified. For example, the
following commands search the current stream for a Type node with the label "Type", then set the
direction (or modeling role) of the "Age" field to Input and the "Drug" field to Target, that is, the value to
be predicted.
In Python scripting, node objects have to be located explicitly before calling the function to set the
property value, for example:
typenode = stream.findByID("id65EMPB9VL87")
typenode.setKeyedPropertyValue("direction", "Age", "Input")
In Python scripting, the same result is achieved by using the functions setPropertyValue() and
setKeyedPropertyValue(), for example:
object.setPropertyValue(property, value)
object.setKeyedPropertyValue(keyed-property, key, value)
In legacy scripting, accessing property values can be achieved using the get command, for example:
var n v
set n = get node :filternode
set v = ^n.name
In Python scripting, the same result is achieved by using the function getPropertyValue(), for
example:
n = stream.findByType("filter", None)
v = n.getPropertyValue("name")
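The Modeler runtime is not available outside the product, but the shape of these three calls can be mimicked with a toy class (a plain-Python sketch; this Node class is not the real Modeler API):

```python
class Node(object):
    """Toy stand-in that mimics the call shapes of the Modeler node API."""

    def __init__(self):
        self._props = {}

    def setPropertyValue(self, prop, value):
        self._props[prop] = value

    def setKeyedPropertyValue(self, keyed_prop, key, value):
        # Keyed properties hold one value per key, like "direction".
        self._props.setdefault(keyed_prop, {})[key] = value

    def getPropertyValue(self, prop):
        return self._props[prop]

typenode = Node()
typenode.setPropertyValue("custom_name", "Type")
typenode.setKeyedPropertyValue("direction", "Age", "Input")
typenode.setKeyedPropertyValue("direction", "Drug", "Target")
print(typenode.getPropertyValue("direction")["Drug"])  # Target
```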
Editing streams
In legacy scripting, the create command is used to create a new node.
In Python scripting, streams have various methods for creating nodes, for example:
stream = modeler.script.stream()
agg = stream.createAt("aggregate", "Aggregate", 96, 96)
select = stream.createAt("select", "Select", 164, 96)
In legacy scripting, the connect command is used to create links between nodes.
In Python scripting, the link method is used to create links between nodes, for example:
stream.link(agg, select)
In legacy scripting, the disconnect command is used to remove links between nodes.
In Python scripting, the unlink method is used to remove links between nodes, for example:
stream.unlink(agg, select)
In legacy scripting, the position command is used to position nodes on the stream canvas or between
other nodes. In Python scripting, the same result is achieved by using two separate methods,
setXYPosition and setPositionBetween, for example:
agg.setXYPosition(256, 256)
agg.setPositionBetween(myselect, mydistinct)
Node operations
Some node operation commands that are commonly used in IBM SPSS Modeler have equivalent
commands in Python scripting. This might help you to convert your existing SPSS Modeler Legacy scripts
to Python scripts for use in IBM SPSS Modeler 17.
Table 282. Legacy scripting to Python scripting mapping for node operations

Legacy scripting          Python scripting
create nodespec at x y    stream.create(type, name)
                          stream.createAt(type, name, x, y)
                          stream.createBetween(type, name, preNode, postNode)
                          stream.createModelApplier(model, name)
Looping
In legacy scripting, there are two main looping options that are supported:
• Counted loops, where an index variable moves between two integer bounds.
• Sequence loops that loop through a sequence of values, binding the current value to the loop variable.
for i from 1 to 10
println ^i
endfor
var items
set items = [a b c d]
for i in items
println ^i
endfor
i = 1
while i <= 10:
    print i
    i += 1
The sequence loop is very flexible, and when it is combined with IBM SPSS Modeler API methods it can
support the majority of legacy scripting use cases. For example, a sequence loop can iterate through the
fields that come out of a node.
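Since the Modeler API cannot run outside the product, here is the sequence-loop pattern over a plain list of made-up field names standing in for a node's output fields:

```python
# A sequence loop binds each value in turn to the loop variable,
# just as "for i in items" does in legacy scripting.
field_names = ["Age", "Sex", "BP", "Cholesterol", "Drug"]

input_fields = []
for name in field_names:
    if name != "Drug":          # treat the last field as the target
        input_fields.append(name)

print(input_fields)
```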
Executing streams
During stream execution, model or output objects that are generated are added to one of the object
managers. In legacy scripting, the script must either locate the built objects from the object manager, or
access the most recently generated output from the node that generated the output.
Stream execution in Python is different, in that any model or output objects that are generated from the
execution are returned in a list that is passed to the execution function. This makes it simpler to access
the results of the stream execution.
Legacy scripting supports three stream execution commands:
• execute_all executes all executable terminal nodes in the stream.
• execute_script executes the stream script regardless of the stream's script execution setting.
• execute node executes the specified node.
Python scripting supports a similar set of functions:
• stream.runAll(results-list) executes all executable terminal nodes in the stream.
• stream.runScript(results-list) executes the stream script regardless of the stream's script
execution setting.
• stream.runSelected(node-array, results-list) executes the specified set of nodes in the
order that they are supplied.
• node.run(results-list) executes the specified node.
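The results-list convention described above can be sketched in plain Python (run_all and the object strings are made up; this is not the Modeler API):

```python
# The caller supplies a list; the execution function appends any
# generated model or output objects to it, so results are easy to
# access after execution finishes.
def run_all(results):
    results.append("model: Drug [C5.0]")
    results.append("output: Analysis")

results = []
run_all(results)
for obj in results:
    print(obj)
```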
In legacy scripting, a stream execution can be terminated using the exit command with an optional
integer code, for example:
exit 1
In Python scripting, the same result can be achieved with the following script:
modeler.script.exit(1)
In legacy scripting, a stream is opened with the open command, for example:

var s
set s = open stream "c:/my streams/modeling.str"
In Python scripting, there is the TaskRunner class that is accessible from the session and can be used to
perform similar tasks, for example:
taskrunner = modeler.script.session().getTaskRunner()
s = taskrunner.openStreamFromFile("c:/my streams/modeling.str", True)
To save an object in legacy scripting, you can use the save command. The equivalent Python approach
uses the TaskRunner class, for example taskrunner.saveStreamToFile(stream, path).
IBM SPSS Collaboration and Deployment Services Repository based operations are supported in legacy
scripting through the retrieve and store commands, for example:
var s
set s = retrieve stream "/my repository folder/my_stream.str"
store stream ^s as "/my repository folder/my_stream_copy.str"
In Python scripting, the equivalent functionality would be accessed through the Repository object that is
associated with the session, for example:
session = modeler.script.session()
repo = session.getRepository()
s = repo.retrieveStream("/my repository folder/my_stream.str", None, None, True)
repo.storeStream(s, "/my repository folder/my_stream_copy.str", None)
Note: Repository access requires that the session has been configured with a valid repository connection.
Table 283. Legacy scripting to Python scripting mapping for stream operations
Legacy scripting                 Python scripting
create stream DEFAULT_FILENAME   taskrunner.createStream(name, autoConnect, autoManage)
close stream                     stream.close()
clear stream                     stream.clear()
get stream stream                No equivalent
load stream path                 No equivalent
open stream path                 taskrunner.openStreamFromFile(path, autoManage)
save stream as path              taskrunner.saveStreamToFile(stream, path)
retrieve stream path             repository.retrieveStream(path, version, label, autoManage)
store stream as path             repository.storeStream(stream, path, label)
Model operations
Some model operation commands that are commonly used in IBM SPSS Modeler have equivalent
commands in Python scripting. This might help you to convert your existing SPSS Modeler Legacy scripts
to Python scripts for use in IBM SPSS Modeler 17.
Table 284. Legacy scripting to Python scripting mapping for model operations
Legacy scripting       Python scripting
open model path        taskrunner.openModelFromFile(path, autoManage)
save model as path     taskrunner.saveModelToFile(model, path)
retrieve model path    repository.retrieveModel(path, version, label, autoManage)
store model as path    repository.storeModel(model, path, label)
Document output operations
Some document output operation commands that are commonly used in IBM SPSS Modeler have
equivalent commands in Python scripting. This might help you to convert your existing SPSS Modeler
Legacy scripts to Python scripts for use in IBM SPSS Modeler 17.
Table 285. Legacy scripting to Python scripting mapping for document output operations
Legacy scripting        Python scripting
open output path        taskrunner.openDocumentFromFile(path, autoManage)
save output as path     taskrunner.saveDocumentToFile(output, path)
retrieve output path    repository.retrieveDocument(path, version, label, autoManage)
store output as path    repository.storeDocument(output, path, label)
For license inquiries regarding double-byte (DBCS) information, contact the IBM Intellectual Property
Department in your country or send inquiries, in writing, to:
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business
Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be
trademarks of IBM or other companies. A current list of IBM trademarks is available on the web at
"Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.
Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or
trademarks of Adobe Systems Incorporated in the United States, and/or other countries.
Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon,
Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or
its subsidiaries in the United States and other countries.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the
United States, other countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or
its affiliates.
Applicability
These terms and conditions are in addition to any terms of use for the IBM website.
Personal use
You may reproduce these publications for your personal, noncommercial use provided that all proprietary
notices are preserved. You may not distribute, display or make derivative work of these publications, or
any portion thereof, without the express consent of IBM.
Commercial use
You may reproduce, distribute and display these publications solely within your enterprise provided that
all proprietary notices are preserved. You may not make derivative works of these publications, or
reproduce, distribute or display these publications or any portion thereof outside your enterprise, without
the express consent of IBM.
Rights
Except as expressly granted in this permission, no other permissions, licenses or rights are granted, either
express or implied, to the publications or any information, data, software or other intellectual property
contained therein.
IBM reserves the right to withdraw the permissions granted herein whenever, in its discretion, the use of
the publications is detrimental to its interest or, as determined by IBM, the above instructions are not
being properly followed.
You may not download, export or re-export this information except in full compliance with all applicable
laws and regulations, including all United States export laws and regulations.
IBM MAKES NO GUARANTEE ABOUT THE CONTENT OF THESE PUBLICATIONS. THE PUBLICATIONS ARE
PROVIDED "AS-IS" AND WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED,
INCLUDING BUT NOT LIMITED TO IMPLIED WARRANTIES OF MERCHANTABILITY, NON-INFRINGEMENT,
AND FITNESS FOR A PARTICULAR PURPOSE.
Index
Association Rules node nugget (continued) command line (continued)
properties 324 parameters 65
associationrulesnode properties 215 running IBM SPSS Modeler 63
astimeintervalsnode properties 152 scripting 52
Auto Classifier models conditional execution of streams 6, 9
node scripting properties 324 coordinate system reprojection
Auto Classifier node properties 165
node scripting properties 218 Cox regression models
Auto Cluster models node scripting properties 234, 328
node scripting properties 326 coxregnode properties 234
Auto Cluster node CPLEX Optimization node
node scripting properties 220 properties 121
auto numeric models cplexoptnode properties 121
node scripting properties 222 creating a class 24
Auto Numeric models creating nodes 31, 33
node scripting properties 326
autoclassifiernode properties 218
autoclusternode properties 220
D
autodataprepnode properties 148 Data Audit node
automatic data preparation properties 372
properties 148 Data Collection export node
autonumericnode properties 222 properties 399
Data Collection source node
B properties 92
dataauditnode properties 372
Balance node Database export node
properties 120 properties 394
balancenode properties 120 database modeling 343
bayesian network models Database node
node scripting properties 224 properties 91
Bayesian Network models databaseexportnode properties 394
node scripting properties 326 databasenode properties 91
bayesnet properties 224 datacollectionexportnode properties 399
Binning node datacollectionimportnode properties 92
properties 152 decision list models
binningnode properties 152 node scripting properties 237, 329
blocks of code 19 decisionlist properties 237
buildr properties 225 defining a class 24
defining attributes 25
defining methods 25
C Derive node
C&R tree models properties 155
node scripting properties 229, 327 derive_stbnode
C5.0 models properties 124
node scripting properties 226, 327 derivenode properties 155
c50node properties 226 diagrams 27
CARMA models Directed Web node
node scripting properties 228, 327 properties 208
carmanode properties 228 directedwebnode properties 208
cartnode properties 229 discriminant models
CHAID models node scripting properties 238, 329
node scripting properties 232, 328 discriminantnode properties 238
chaidnode properties 232 Distinct node
clear generated palette command 52 properties 126
CLEM distinctnode properties 126
scripting 1 Distribution node
cognosimport node properties 87 properties 187
Collection node distributionnode properties 187
properties 186
collectionnode properties 186 E
command line
list of arguments 64, 66–68 E-Plot node
multiple arguments 68 properties 205
encoded passwords Fixed File node (continued)
adding to scripts 52 properties 99
Ensemble node fixedfilenode properties 99
properties 159 flags
ensemblenode properties 159 combining multiple flags 68
eplotnode properties 205 command line arguments 63
error checking Flat File node
scripting 52 properties 402
Evaluation node flatfilenode properties 402
properties 188 for command 49
evaluationnode properties 188 functions
examples 20 comments 442
Excel export node conditionals 444
properties 400, 401 document output operations 451
Excel source node literals 442
properties 96 looping 444
excelexportnode properties 400, 401 model operations 450
excelimportnode properties 96 node operations 447
executing scripts 11 object references 442
Executing streams 27 operators 443
execution order stream operations 450
changing with scripts 49
export nodes
node scripting properties 391
G
exportModelToFile 40 Gaussian Mixture node
Extension Export node properties 413, 417
properties 400 generalized linear models
Extension Import node node scripting properties 247, 331
properties 97 generated keyword 52
Extension Model node generated models
node scripting properties 240 scripting names 437, 439
Extension Output node genlinnode properties 247
properties 374 Geospatial source node
Extension Transform node properties 104
properties 127 GLE models
extensionexportnode properties 400 node scripting properties 257, 332
extensionimportnode properties 97 gle properties 257
extensionmodelnode properties 240 GLMM models
extensionoutputnode properties 374 node scripting properties 252, 332
extensionprocessnode properties 127 glmmnode properties 252
gmm properties 413, 417
F graph nodes
scripting properties 185
factornode properties 243 Graphboard node
feature selection models properties 190
node scripting properties 245, 331 graphboardnode properties 190
Feature Selection models gsdata_import node properties 104
applying 4
scripting 4
featureselectionnode properties 4, 245
H
field names HDBSCAN node
changing case 49 properties 414
Field Reorder node hdbscannode properties 414
properties 165 hdbscannugget properties 341
fields hidden variables 25
turning off in scripting 185 Histogram node
Filler node properties 195
properties 160 histogramnode properties 195
fillernode properties 160 History node
Filter node properties 162
properties 161 historynode properties 162
filternode properties 161
finding nodes 29
Fixed File node
I kohonen models
node scripting properties 268
IBM Cognos source node Kohonen models
properties 87 node scripting properties 333
IBM Cognos TM1 source node kohonennode properties 268
properties 107, 108
IBM SPSS Analytic Server Repository
command line arguments 68
L
IBM SPSS Collaboration and Deployment Services linear models
Repository node scripting properties 269, 334
command line arguments 67 linear properties 269
scripting 49 linear regression models
IBM SPSS Modeler node scripting properties 289, 337, 338
running from command line 63 linear support vector machine models
IBM SPSS Statistics export node node scripting properties 278, 335
properties 411 linear-AS models
IBM SPSS Statistics models node scripting properties 271, 334
node scripting properties 410 linear-AS properties 271
IBM SPSS Statistics Output node lists 16
properties 411 logistic regression models
IBM SPSS Statistics source node node scripting properties 272, 334
properties 409 logregnode properties 272
IBM SPSS Statistics Transform node looping in streams 6, 7
properties 409 loops
identifiers 19 using in scripts 49
inheritance 25 lowertoupper function 49
interrupting scripts 11 LSVM models
Isotonic-AS node node scripting properties 278
properties 429 lsvmnode properties 278
isotonicasnode properties 429
iteration key
looping in scripts 7 M
iteration variable
Map Visualization node
looping in scripts 8
properties 196
mapvisualization properties 196
J mathematical methods 21
Matrix node
JSON content model 57 properties 376
JSON source node matrixnode properties 376
properties 104 Means node
jsonimportnode properties 104 properties 378
Jython 15 meansnode properties 378
Merge node
K properties 129
mergenode properties 129
K-Means models Microsoft models
node scripting properties 264, 333 node scripting properties 343, 345
K-Means-AS models Migrating
node scripting properties 265, 429 accessing objects 449
KDE Modeling node clear streams, output, and models managers 34
properties 415 commands 441
KDE models editing streams 446
node scripting properties 341 executing streams 448
KDE Simulation node file system 449
properties 375, 416 functions 441
kdeapply properties 341 general differences 441
kdeexport properties 375, 416 getting properties 446
kdemodel properties 415 looping 447
kmeansasnode properties 265, 429 miscellaneous 451
kmeansnode properties 264 model types 445
KNN models node references 445
node scripting properties 333 node types 445
knnnode properties 266 output types 445
Migrating (continued) Netezza models
overview 441 node scripting properties 355
property names 445 Netezza Naive Bayes models
repository 449 node scripting properties 355
scripting context 441 Netezza Naive Bayesmodels
setting properties 446 node scripting properties 369
variables 444 Netezza PCA models
model nuggets node scripting properties 355, 369
node scripting properties 323 Netezza Regression Tree models
scripting names 437, 439 node scripting properties 355, 369
model objects Netezza Time Series models
scripting names 437, 439 node scripting properties 355
modeling nodes netezzabayesnode properties 355
node scripting properties 211 netezzadectreenode properties 355
models netezzadivclusternode properties 355
scripting names 437, 439 netezzaglmnode properties 355
modifying streams 31, 34 netezzakmeansnode properties 355
MS Decision Tree netezzaknnnode properties 355
node scripting properties 343, 345 netezzalineregressionnode properties 355
MS Linear Regression netezzanaivebayesnode properties 355
node scripting properties 343, 345 netezzapcanode properties 355
MS Logistic Regression netezzaregtreenode properties 355
node scripting properties 343, 345 netezzatimeseriesnode properties 355
MS Neural Network neural network models
node scripting properties 343, 345 node scripting properties 279, 335
MS Sequence Clustering neural networks
node scripting properties 345 node scripting properties 282, 335
MS Time Series neuralnetnode properties 279
node scripting properties 345 neuralnetworknode properties 282
msassocnode properties 343 node scripting properties
msbayesnode properties 343 export nodes 391
msclusternode properties 343 model nuggets 323
mslogisticnode properties 343 modeling nodes 211
msneuralnetworknode properties 343 nodes
msregressionnode properties 343 deleting 33
mssequenceclusternode properties 343 importing 33
mstimeseriesnode properties 343 information 35
mstreenode properties 343 linking nodes 31
MultiLayerPerceptron-AS node looping through in scripts 49
properties 430 names reference 437
multilayerperceptronnode properties 430 replacing 33
Multiplot node unlinking nodes 31
properties 200 non-ASCII characters 22
multiplotnode properties 200 nuggets
multiset command 71 node scripting properties 323
numericpredictornode properties 222
N
O
nearest neighbor models
node scripting properties 266 object oriented 23
Netezza Bayes Net models ocsvmnode properties 418
node scripting properties 355, 369 One-Class SVM node
Netezza Decision Tree models properties 418
node scripting properties 355, 369 operations 16
Netezza Divisive Clustering models oraabnnode properties 347
node scripting properties 355, 369 oraainode properties 347
Netezza Generalized Linear models oraapriorinode properties 347
node scripting properties 355 Oracle Adaptive Bayes models
Netezza K-Means models node scripting properties 347, 354
node scripting properties 355, 369 Oracle AI models
Netezza KNN models node scripting properties 347
node scripting properties 355, 369 Oracle Apriori models
Netezza Linear Regression models node scripting properties 347, 354
node scripting properties 355, 369 Oracle Decision Tree models
Oracle Decision Tree models (continued) Q
node scripting properties 347, 354
Oracle Generalized Linear models QUEST models
node scripting properties 347 node scripting properties 284, 336
Oracle KMeans models questnode properties 284
node scripting properties 347, 354
Oracle MDL models
node scripting properties 347, 354
R
Oracle models R Build node
node scripting properties 347 node scripting properties 225
Oracle Naive Bayes models R Output node
node scripting properties 347, 354 properties 381
Oracle NMF models R Transform node
node scripting properties 347, 354 properties 132
Oracle O-Cluster Random Forest node
node scripting properties 347, 354 properties 420
Oracle Support Vector Machines models Random Trees models
node scripting properties 347, 354 node scripting properties 287, 337
oradecisiontreenode properties 347 randomtrees properties 287
oraglmnode properties 347 Reclassify node
orakmeansnode properties 347 properties 164
oramdlnode properties 347 reclassifynode properties 164
oranbnode properties 347 referencing nodes
oranmfnode properties 347 finding nodes 29
oraoclusternode properties 347 setting properties 30
orasvmnode properties 347 regressionnode properties 289
output nodes remarks 19
scripting properties 371 Reorder node
output objects properties 165
scripting names 439 reordernode properties 165
outputfilenode properties 402 Report node
properties 380
P reportnode properties 380
Reprojection node
parameters properties 165
scripting 15 reprojectnode properties 165
SuperNodes 435 Restructure node
Partition node properties 166
properties 163 restructurenode properties 166
partitionnode properties 163 retrieve command 49
passing arguments 20 RFM Aggregate node
passwords properties 130
adding to scripts 52 RFM Analysis node
encoded 66 properties 167
PCA models rfmaggregatenode properties 130
node scripting properties 243, 331 rfmanalysisnode properties 167
PCA/Factor models rfnode properties 420
node scripting properties 243, 331 routputnode properties 381
Plot node Rprocessnode properties 132
properties 201
plotnode properties 201
properties
S
common scripting 73 Sample node
database modeling nodes 343 properties 133
filter nodes 71 samplenode properties 133
scripting 71, 73, 211, 323, 391 SAS export node
stream 75 properties 403
SuperNodes 435 SAS source node
Python properties 104
scripting 15 sasexportnode properties 403
Python models sasimportnode properties 104
Gaussian Mixture node scripting properties 333 scripting
node scripting properties 336, 341 abbreviations used 72
scripting (continued)
  common properties 73
  compatibility with earlier versions 52
  conditional execution 6, 9
  context 28
  diagrams 27
  error checking 52
  executing 11
  Feature Selection models 4
  from the command line 52
  graph nodes 185
  in SuperNodes 5
  interrupting 11
  iteration key 7
  iteration variable 8
  legacy scripting 442–444, 447, 450, 451
  output nodes 371
  overview 1, 15
  Python scripting 442–444, 447, 450, 451
  selecting fields 9
  standalone scripts 1, 27
  stream execution order 49
  streams 1, 27
  SuperNode scripts 1, 27
  SuperNode streams 27
  syntax 15–17, 19–25
  user interface 1, 3, 5
  visual looping 6, 7
Scripting API
  accessing generated objects 40
  example 37
  getting a directory 37
  global values 46
  handling errors 42
  introduction 37
  metadata 38
  multiple streams 47
  searching 37
  session parameters 42
  standalone scripts 47
  stream parameters 42
  SuperNode parameters 42
scripts
  conditional execution 6, 9
  importing from text files 1
  iteration key 7
  iteration variable 8
  looping 6, 7
  saving 1
  selecting fields 9
security
  encoded passwords 52, 66
Select node
  properties 135
selectnode properties 135
Self-Learning Response models
  node scripting properties 293, 338
sequence models
  node scripting properties 291, 338
sequencenode properties 291
server
  command line arguments 66
Set Globals node
  properties 382
Set to Flag node
  properties 168
setglobalsnode properties 382
setting properties 30
settoflagnode properties 168
Sim Eval node
  properties 383
Sim Fit node
  properties 384
Sim Gen node
  properties 105
simevalnode properties 383
simfitnode properties 384
simgennode properties 105
Simulation Evaluation node
  properties 383
Simulation Fit node
  properties 384
Simulation Generate node
  properties 105
slot parameters 5, 71, 73
SLRM models
  node scripting properties 293, 338
slrmnode properties 293
SMOTE node
  properties 422
smotenode properties 422
Sort node
  properties 135
sortnode properties 135
source nodes
  properties 79
Space-Time-Boxes node
  properties 124, 136
Space-Time-Boxes node properties 124
spacetimeboxes properties 136
Spatio-Temporal Prediction node
  properties 294
standalone scripts 1, 3, 27
statements 19
Statistics node
  properties 384
statisticsexportnode properties 411
statisticsimportnode properties 4, 409
statisticsmodelnode properties 410
statisticsnode properties 384
statisticsoutputnode properties 411
statisticstransformnode properties 409
store command 49
STP node
  properties 294
STP node nugget
  properties 339
stpnode properties 294
stream execution order
  changing with scripts 49
stream.nodes property 49
Streaming Time Series models
  node scripting properties 138
Streaming Time Series node
  properties 144
streamingtimeseries properties 138
streamingts properties 144
streams
Index 463
streams (continued)
  conditional execution 6, 9
  execution 27
  looping 6, 7
  modifying 31
  multiset command 71
  properties 75
  scripting 1, 27
string functions 49
strings
  changing case 49
structured properties 71
supernode 71
SuperNode
  stream 27
SuperNodes
  parameters 435
  properties 435
  scripting 435
  scripts 1, 5, 27
  setting properties within 435
  streams 27
support vector machine models
  node scripting properties 338
Support vector machine models
  node scripting properties 300
SVM models
  node scripting properties 300
svmnode properties 300
system
  command line arguments 64

T

t-SNE node
  properties 206, 423
table content model 54
Table node
  properties 386
tablenode properties 386
tcm models
  node scripting properties 339
tcmnode properties 301
Temporal Causal models
  node scripting properties 301
Time Intervals node
  properties 169
Time Plot node
  properties 204
time series models
  node scripting properties 307, 314, 340
Time Series models
  node scripting properties 307, 339
timeintervalsnode properties 169
timeplotnode properties 204
timeseriesnode properties 314
tm1import node properties 108
tm1odataimport node properties 107
Transform node
  properties 388
transformnode properties 388
Transpose node
  properties 175
transposenode properties 175
traversing through nodes 34
Tree-AS models
  node scripting properties 317, 340
treeas properties 317
ts properties 307
tsnenode properties 206, 423
TWC Import source node
  properties 109
twcimport node properties 109
TwoStep AS models
  node scripting properties 320, 341
TwoStep models
  node scripting properties 319, 340
twostepAS properties 320
twostepnode properties 319
Type node
  properties 176
typenode properties 4, 176

U

User Input node
  properties 110
userinputnode properties 110

V

Variable File node
  properties 111
variablefilenode properties 111
variables
  scripting 15

W

Web node
  properties 208
webnode properties 208

X

XGBoost Linear node
  properties 424
XGBoost Tree node
  properties 426
XGBoost-AS node
  properties 431
xgboostasnode properties 431
xgboostlinearnode properties 424
xgboosttreenode properties 426
XML content model 55
XML export node
  properties 408
XML source node
  properties 116
xmlexportnode properties 408
xmlimportnode properties 116